1223 Commits

Author SHA1 Message Date
goldsteinn
a56ba1fab0
[ValueTracking] Handle recursive select/PHI in ComputeKnownBits (#114689)
Finish porting #114008 to `KnownBits` (Follow up to #113707).
2025-01-22 11:51:18 -06:00
Teresa Johnson
3a423a10ff
[MemProf][PGO] Prevent dropping of profile metadata during optimization (#121359)
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and one place where PGO
metadata (!prof) was being dropped.

All were due to instances of combineMetadata() being invoked. That
function drops all metadata not in the list provided by the client, and
also drops any not in its switch statement.

Memprof metadata needed a case in the combineMetadata switch statement.
For now we simply keep the metadata of the instruction being kept, which
doesn't retain all the profile information when two calls with
memprof metadata are being combined, but at least retains some.

For the memprof metadata being dropped during call CSE, add memprof and
callsite metadata to the list of known ids in combineMetadataForCSE.

Neither memprof nor regular prof metadata were in the list of known ids
for the callsite in MemCpyOptimizer, which was added to combine AA
metadata after optimization of byval arguments fed by memcpy
instructions, and similar types of optimizations of memcpy uses.

There is one other callsite of combineMetadata, but it is only invoked
on load instructions, which do not carry these types of metadata.
2025-01-02 12:11:59 -08:00
DaPorkchop_
cea738bc9a
[SimplifyCFG] Replace unreachable switch lookup table holes with poison (#94990)
As discussed in #94468, this causes switch lookup table entries which
are unreachable to be poison instead of filling them with a value from
one of the reachable cases.

---------

Co-authored-by: DianQK <dianqk@dianqk.net>
2024-12-26 07:47:26 +08:00
Dominik Steenken
fa9cef50b1
Only guard loop metadata that has non-debug info in it (#118825)
This PR is motivated by a mismatch we discovered between compilation
results with vs. without `-g3`. We noticed this when compiling SPEC2017
testcases. The specific instance we saw is fixed in this PR by modifying
a guard (see below), but it is likely similar instances exist elsewhere
in the codebase.

The specific case fixed in this PR manifests itself in the `SimplifyCFG`
pass doing different things depending on whether DebugInfo is generated
or not. At the end of this comment, there is reduced example code that
shows the behavior in question.

The differing behavior has two root causes:
1. Commit https://github.com/llvm/llvm-project/commit/c07e19b adds loop
metadata including debug locations to loops that otherwise would not
have loop metadata
2. Commit https://github.com/llvm/llvm-project/commit/ac28efa6c100 adds
a guard to a simplification action in `SImplifyCFG` that prevents it
from simplifying away loop metadata

So, the change in 2. does not consider that when compiling with debug
symbols, loops that otherwise would not have metadata that needs
preserving, now have debug locations in their loop metadata. Thus, with
`-g3`, `SimplifyCFG` behaves differently than without it.

The larger issue is that while debug info is not supposed to influence
the final compilation result, commits like 1. blur the line between what
is and is not debug info, and not all optimization passes account for
this.

This PR does not address that and rather just modifies this particular
guard in order to restore equivalent behavior between debug and
non-debug builds in this one instance.

---

Here is a reduced version of a file from `f526.blender_r` that showcases
the behavior in question:
```C
struct LinkNode;
typedef struct LinkNode {
 struct LinkNode *next;
 void *link;
} LinkNode;

void do_projectpaint_thread_ph_v_state() {
  int *ps = do_projectpaint_thread_ph_v_state;
  LinkNode *node;
  while (do_projectpaint_thread_ph_v_state)
    for (node = ps; node; node = node->next)
      ;
}
```
Compiling this with and without DebugInfo, and then disassembling the
results, leads to different outcomes (tested on SystemZ and X86). The
reason for this is that the `SimplifyCFG` pass does different things in
either case.
2024-12-20 15:15:51 +01:00
Florian Hahn
c4a78b6fe3
[SimplifyCFG] Always allow hoisting if all instructions match. (#97158)
Generalize hoistCommonCodeFromSuccessors's `EqTermsOnly` to
`AllInstsEqOnly` and always allow hoisting if all instructions match.

In that case, all instructions can be hoisted and the
original branch will be replaced and selects for PHIs are added. This
allows preserving metadata in more cases, using the existing hoisting
logic, whereas previously FoldTwoEntryPHINode would drop the metadata.


https://llvm-compile-time-tracker.com/compare.php?from=716360367fbdabac2c374c19b8746f4de49a5599&to=986b2c47df516b31d998c055400e4f62aa76edc6&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/97158
2024-12-13 21:26:27 +00:00
Antonio Frighetto
d26df32255 [SimplifyCFG] Consider preds to switch in simplifyDuplicateSwitchArms
Allow a duplicate basic block with multiple predecessors to the
jump table to be simplified, by considering that the same basic
block may appear in more switch cases.
2024-12-13 09:07:24 +01:00
Antonio Frighetto
e32c428bec [SimplifyCFG] Precommit tests for PR118955 (NFC) 2024-12-13 09:07:24 +01:00
Nikita Popov
462cb3cd6c
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-05 14:36:40 +01:00
Lee Wei
9bf6365237
[llvm] Remove br i1 undef from some regression tests [NFC] (#118419)
This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`.
After this PR, I'll continue to fix tests under `llvm/tests/CodeGen`,
which has more UB tests than `llvm/tests/Transforms`.
2024-12-03 20:54:36 +00:00
AdityaK
39601a6e54
Bail out jump threading on indirect branches only (#117778)
Remove check for PHI in pred as pointed out in #103688 
Reduced the testcase to remove redundant phi in pred

Fixes: #102351
2024-11-26 14:57:28 -08:00
Matt Arsenault
4028bb10c3
Local: Handle noalias_addrspace in combineMetadata (#103938)
This should act like range.

Previously ConstantRangeList assumed a 64-bit range. Now query from the
actual entries. This also means that the empty range has no bitwidth, so
move asserts to avoid checking the bitwidth of empty ranges.
2024-11-26 09:13:34 -05:00
Phoebe Wang
2568e52a73
[X86,SimplifyCFG] Support hoisting load/store with conditional faulting (Part II) (#108812)
This is a follow up of #96878 to support hoisting load/store from BBs
have the same predecessor, if load/store are the only instructions and
the branch is unpredictable, e.g.:

```
void test (int a, int *c, int *d) {
  if (a)
   *c = a;
  else
   *d = a;
}
```
2024-11-25 15:19:28 +08:00
Stephen Tozer
2188a56a75
[DebugInfo][SimplifyCFG] Fully propagate merged invoke DILocations (#114235)
Currently when we merge invokes as part of SimplifyCFG we apply a merge
of the invoke DILocations to the merged invoke. We also insert an
unconditional branch to the merged invoke at the positions previously
occupied by the original invokes; as this branch is part of the
substitution for the invoke it has replaced, we should propagate the
original invoke DebugLoc to it.
2024-11-15 17:20:55 +00:00
Michael Maitland
6b9952759f
[SimplifyCFG] Simplify switch instruction that has duplicate arms (#114262)
I noticed that the two C functions emitted different IR:

```
int switch_duplicate_arms(int switch_val, int v, int w) {
  switch (switch_val) {
  default:
    break;
  case 0:
    w = v;
    break;
  case 1:
    w = v;
    break;
  }
  return w;
}

int if_duplicate_arms(int switch_val, int v, int w) {
  if (switch_val == 0)
    w = v;
  else if (switch_val == 1)
    w = v;
  return v0;
}
```

We generate IR that looks like this:

```
define i32 @switch_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
  switch i32 %1, label %7 [
    i32 0, label %5
    i32 1, label %6
  ]

5:
  br label %7

6:
  br label %7

7:
  %8 = phi i32 [ %3, %4 ], [ %2, %6 ], [ %2, %5 ]
  ret i32 %8
}

define i32 @if_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
  %5 = icmp ult i32 %1, 2
  %6 = select i1 %5, i32 %2, i32 %3
  ret i32 %6
}
```

For `switch_duplicate_arms`, taking case 0 and 1 are the same since %5
and %6
branch to the same location and the incoming values for %8 are the same
from
those blocks. We could remove one on the duplicate switch targets and
update
the switch with the single target.

On RISC-V, prior to this patch, we generate the following code:
```
switch_duplicate_arms:
        li      a4, 1
        beq     a1, a4, .LBB0_2
        mv      a0, a3
        bnez    a1, .LBB0_3
.LBB0_2:
        mv      a0, a2
.LBB0_3:
        ret

if_duplicate_arms:
        li      a4, 2
        mv      a0, a2
        bltu    a1, a4, .LBB1_2
        mv      a0, a3
.LBB1_2:
        ret
```

After this patch, the O3 code is optimized to the icmp + select pair,
which
gives us the same code gen as `if_duplicate_arms`, as desired. This
results
is one less branch instruction in the final assembly.

This may help with both code size and further switch simplification. I
found
that this patch causes no significant impact to spec2006/int/ref and
spec2017/intrate/ref.

---------

Co-authored-by: Min Hsu <min@myhsu.dev>
2024-11-15 15:38:34 +01:00
Florian Hahn
40c75426a9
[SimplifyCFG] Add test for updating llvm.access.group when hoisting.
Add extra test coverage for preserving llvm.access.group metadata when
hoisting.
2024-11-12 13:14:30 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
elhewaty
9efb07f261
[IR] Add samesign flag to icmp instruction (#111419)
Inspired by
https://discourse.llvm.org/t/rfc-signedness-independent-icmps/81423
2024-10-15 17:11:25 +08:00
Noah Goldstein
82ac399733 [SimplifyCFG] Allow merging invoke's with different attrs
Same logic as other callsites, if the attributes are intersectable, we
merge.

Closes #111713
2024-10-10 01:07:59 -05:00
Noah Goldstein
cd04a9d401 [SimplifyCFG] Add/update tests for merging invokes with different attrs; NFC 2024-10-10 01:07:59 -05:00
Matt Arsenault
a8e1311a1c
[RFC] IR: Define noalias.addrspace metadata (#102461)
This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For
AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We
therefore
need to codegen this by inserting a runtime address space check,
performing
the private case without atomics, and fallback to issuing the real
atomic
otherwise. This metadata allows us to avoid this extra check and branch.

Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.
2024-10-07 23:21:42 +04:00
Noah Goldstein
e343af777e [SimplifyCFG][Attributes] Enabling sinking calls with differing number of attrsets
Prior impl would fail if the number of attribute sets on the two calls
wasn't the same which is unnecessary as long as we aren't throwing
away and must-preserve attrs.

Closes #110896
2024-10-02 15:15:07 -05:00
Noah Goldstein
baf008ac29 [SimplifyCFG] Add tests for sinking calls with differing number of attrs; NFC 2024-10-02 15:15:07 -05:00
Noah Goldstein
4d4beeb43c [SimplifyCFG] Supporting hoisting/sinking callbases with differing attrs
Some (many) attributes can safely be dropped to enable sinking. For
example removing `nonnull` on a return/param can't affect correctness.

Closes #109472
2024-10-01 18:27:08 -05:00
Noah Goldstein
c42659417f [SimplifyCFG] Add tests for hoisting/sinking callbases with differing attrs; NFC 2024-10-01 18:27:08 -05:00
Nikita Popov
f445e39ab2
[SimplifyCFG] Use isWritableObject() API (#110127)
SimplifyCFG store speculation currently has some homegrown code to check
for a writable object, handling the alloca special case only.

Switch it to use the generic isWritableObject() API, which means that we
also support byval arguments, allocator return values, and writable
arguments.

I've adjusted isWritableObject() to also check for the noalias attribute
when handling writable. Otherwise, I don't think that we can generalize
from at-entry writability. This was not relevant for previous uses of
the function, because they'd already require noalias for other reasons
anyway.
2024-09-30 10:03:46 +02:00
Nikita Popov
8f21459777 [SimplifyCFG] Add additional store speculation tests (NFC) 2024-09-26 16:20:06 +02:00
Chengjun
e4688b98cd
[SimplifyCFG] Avoid increasing too many phi entries when removing empty blocks (#104887)
Now in the simplifycfg and jumpthreading passes, we will remove the
empty blocks (blocks only have phis and an unconditional branch).
However, in some cases, this will increase size of the IR and slow down
the compile of other passes dramatically. For example, we have the
following CFG:

1. BB1 has 100 predecessors, and unconditionally branches to BB2 (does
not have any other instructions).
2. BB2 has 100 phis.

Then in this case, if we remove BB1, for every phi in BB2, we need to
increase 99 entries (replace the incoming edge from BB1 with 100 edges
from its predecessors). Then in total, we will increase 9900 phi
entries, which can slow down the compile time for many other passes.

Therefore, in this change, we add a check to see whether removing the
empty blocks will increase lots of phi entries. Now, the threshold is
1000 (can be controlled by the command line option
`max-phi-entries-increase-after-removing-empty-block`), which means that
we will not remove an empty block if it will increase the total number
of phi entries by 1000. This threshold is conservative and for most of
the cases, we will not have such a large phi. So, this will only be
triggered in some unusual IRs.
2024-09-25 12:41:13 +02:00
Nikita Popov
6f194a6dea [SimplifyCFG] Avoid truncation in linear map overflow check
This is supposed to test multiplication of the linear multiplifier
with the largest value it can be multiplied with. However, if
we truncate TableSize-1 here, it might not actually be the largest
value. I think in practice this still works out, because in cases
where we'd truncate the value here we'd also fail the NonMonotonic
check. But to match the intent of the code, we should treat the
truncating case as overflowing.
2024-09-23 15:13:32 +02:00
Phoebe Wang
7773dcd163
[X86][NFC] Change test name and add a new test (#109638)
Address post commit comments in #108754.
2024-09-23 20:21:11 +08:00
Nikita Popov
8a6248b739
[SimplifyCFG] Don't separate a load/store from its gep during sinking (#102318)
If we can sink the a load/store, but not the gep producing its pointer
operand, don't sink the load/store either. This may prevent the gep from
being folded into an addressing mode, and may also negatively affect
further analysis.

Fixes https://github.com/llvm/llvm-project/issues/96838.
2024-09-23 09:32:24 +02:00
Nikita Popov
5a4c6f9799
[Loads] Check context instruction for context-sensitive derefability (#109277)
If a dereferenceability fact is provided through `!dereferenceable` (or
similar), it may only hold on the given control flow path. When we use
`isSafeToSpeculativelyExecute()` to check multiple instructions, we
might make use of `!dereferenceable` information that does not hold at
the speculation target. This doesn't happen when speculating
instructions one by one, because `!dereferenceable` will be dropped
while speculating.

Fix this by checking whether the instruction with `!dereferenceable`
dominates the context instruction. If this is not the case, it means we
are speculating, and cannot guarantee that it holds at the speculation
target.

Fixes https://github.com/llvm/llvm-project/issues/108854.
2024-09-23 09:13:09 +02:00
Nikita Popov
30cdf1e959
[SimplifyCFG] Pass context instruction to isSafeToSpeculativelyExecute() (#109132)
Pass speculation target and assumption cache to
isSafeToSpeculativelyExecute() calls.

This allows speculating based on dereferenceable/align assumptions, but
the primary motivation here is to avoid regressions from planned changes
to fix https://github.com/llvm/llvm-project/issues/108854.
2024-09-19 10:19:15 +02:00
Noah Goldstein
37932643ab [SimplifyCFG] Deduce paths unreachable if they cause div/rem UB
Same we way mark a path unreachable if it may cause a nullptr
dereference, div/rem by zero or signed div/rem of INT_MIN by -1 cause
immediate UB.

Closes #109008
2024-09-18 12:59:52 -05:00
Noah Goldstein
f5d62d7647 [SimplifyCFG] Add tests for deducing paths unreachable if they cause div/rem UB; NFC 2024-09-18 12:59:52 -05:00
Nikita Popov
13b4d1bfea [SimplifyCFG][LICM] Add additional speculation tests
These are related to https://github.com/llvm/llvm-project/issues/108854.
2024-09-18 14:48:58 +02:00
Noah Goldstein
419c53477e [SimplifyCFG] Mark div/rem as not-cheap to sink if we are replacing const denominator
Close #109007
2024-09-17 12:04:34 -05:00
Noah Goldstein
ae8d0200b0 [SimplifyCFG] Add test for sinking div/rem with const remainder; NFC 2024-09-17 12:04:34 -05:00
Andreas Jonson
a0d00c94c2
[SimplifyCFG] Swap range metadata to attribute for calls. (#108984)
Among the last usages of range metadata for call before being able to
deprecate and only have the range attribute for calls.
2024-09-17 18:25:53 +02:00
Csanád Hajdú
bc8a5d104c
[Patchpoint] Add immarg attributes to patchpoint arguments (#97276) 2024-09-17 14:00:24 +04:00
Phoebe Wang
af5a45b34b
[X86,SimplifyCFG] Use passthru to reduce select (#108754) 2024-09-16 20:20:36 +08:00
AdityaK
3c9022c965
Bail out jump threading on indirect branches (#103688)
The bug was introduced by
https://github.com/llvm/llvm-project/pull/68473

Fixes: #102351
2024-09-10 22:39:02 -07:00
Shengchen Kan
87c86aa6b9
[X86,SimplifyCFG] Support hoisting load/store with conditional faulting (Part I) (#96878)
This is simplifycfg part of
https://github.com/llvm/llvm-project/pull/95515

In this PR, we support hoisting load/store with conditional faulting in
`SimplifyCFGOpt::speculativelyExecuteBB` to eliminate conditional
branches.
This is for cases like
```
void test (int a, int *b) {
  if (a)
   *b = a;
}
``` 

In the following patches, we will support the hoist in
`SimplifyCFGOpt::hoistCommonCodeFromSuccessors`.
That is for cases like
```
void test (int a, int *c, int *d) {
  if (a)
   *c = a;
  else 
   *d = a;
}
```
2024-08-29 10:42:44 +08:00
Nikita Popov
84497c6f4f
[SimplifyCFG] Remove limitation on sinking of load/store of alloca (#104788)
This is a followup to https://github.com/llvm/llvm-project/pull/104579
to remove the limitation on sinking loads/stores of allocas entirely,
even if this would introduce a phi node.

Nowadays, SROA supports speculating load/store over select/phi.
Additionally, SimplifyCFG with sinking only runs at the end of the
function simplification pipeline, after SROA. I checked that the two
tests modified here still successfully SROA after the SimplifyCFG
transform.

We should, however, keep the limitation on lifetime intrinsics. SROA
does not have speculation support for these, and I've also found that
the way these are handled in the backend is very problematic
(https://github.com/llvm/llvm-project/issues/104776), so I think we
should leave them alone.
2024-08-26 10:14:43 +02:00
Nikita Popov
4d85285ff6
[SimplifyCFG] Fold switch over ucmp/scmp to icmp and br (#105636)
If we switch over ucmp/scmp and have two switch cases going to the same
destination, we can convert into icmp+br.

Fixes https://github.com/llvm/llvm-project/issues/105632.
2024-08-22 16:57:09 +02:00
Nikita Popov
716f7e2d18 [SimplifyCFG] Add tests for switch over cmp intrinsic (NFC) 2024-08-22 11:52:02 +02:00
Nikita Popov
b3fa45b642
[SimplifyCFG] Add support for hoisting commutative instructions (#104805)
This extends SimplifyCFG hoisting to also hoist instructions with
commuted operands, for example a+b on one side and b+a on the other
side.

This should address the issue mentioned in:
https://github.com/llvm/llvm-project/pull/91185#issuecomment-2097447927
2024-08-20 12:48:06 +02:00
Nikita Popov
b64e7e07e5 [SimplifyCFG] Add tests for hoisting of commutative instructions (NFC) 2024-08-19 17:13:21 +02:00
Nikita Popov
83879f4f53
[SimplifyCFG] Don't block sinking for allocas if no phi created (#104579)
SimplifyCFG sinking currently does not sink loads/stores of allocas,
because historically SROA was unable to handle the resulting IR. Since
then, SROA both learned to speculate loads/stores over selects and phis,
*and* SimplifyCFG sinking has been deferred to the end of the function
simplification pipeline, which means that SROA happens before it.

As such, I believe that this workaround should no longer be necessary.
Given how sensitive SimplifyCFG sinking seems to be, this patch takes a
very conservative step towards removing this, by allowing sinking if we
don't actually need to form a phi over the pointer argument.

This fixes https://github.com/llvm/llvm-project/issues/104567, where
sinking a store to an escaped alloca allows converting a switch into
arithmetic.
2024-08-19 09:55:30 +02:00
Nikita Popov
65390f9d6f [SimplifyCFG] Add test for #104567 (NFC) 2024-08-16 12:37:18 +02:00
Nikita Popov
1139dee910 [SimplifyCFG] Add more sinking tests (NFC) 2024-08-08 15:13:59 +02:00