Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change avoids this issue
by conservatively avoiding threading through headers entirely.
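For illustration, here is a minimal hypothetical loop (not reduced from the issue) where the header's branch could be threaded:
```llvm
define void @sketch(i1 %c) {
entry:
  br label %header

header:                                       ; loop header
  %p = phi i1 [ %c, %entry ], [ true, %body ]
  br i1 %p, label %body, label %exit

body:
  ; %p is known true on this edge, so jump threading could send the
  ; back edge straight to %body, leaving %header no longer dominating
  ; the loop; with -keep-loops we now skip threading through %header
  br label %header

exit:
  ret void
}
```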
Fixes: https://github.com/llvm/llvm-project/issues/151144
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
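A sketch of the resulting IR shape (illustrative, not taken from the patch):
```llvm
define void @sketch() {
  %buf = alloca [16 x i8]
  ; previously the first operand restated the alloca's size:
  ;   call void @llvm.lifetime.start.p0(i64 16, ptr %buf)
  ; now the size is implied by the alloca operand itself:
  call void @llvm.lifetime.start.p0(ptr %buf)
  call void @llvm.lifetime.end.p0(ptr %buf)
  ret void
}

declare void @llvm.lifetime.start.p0(ptr)
declare void @llvm.lifetime.end.p0(ptr)
```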
Extend jump-threading to allow local defs that are live outside of the
threaded block. Allow threading to destinations where the local defs are
not live.
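A hypothetical reduction of the case this enables (all names invented for illustration):
```llvm
define i32 @sketch(i1 %c, i32 %x) {
entry:
  br i1 %c, label %bb, label %other

other:
  br label %bb

bb:                        ; block being threaded
  %p = phi i1 [ true, %entry ], [ false, %other ]
  %v = add i32 %x, 1       ; local def, live only in %use
  br i1 %p, label %use, label %nouse

use:
  ret i32 %v

nouse:
  ret i32 0
}
```
From %other the phi is known false, and %v is not live in %nouse, so that edge can now be threaded even though %v is live outside %bb.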
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Currently we always produce a cost of 1 for all CostKinds that are not
RecipThroughput, which can underestimate the cost if the type has a
higher legalization cost (like larger vectors). This relaxes it to cover
all cost kinds.
Fixes #141753.
This patch introduces a new check, that tries to decide if the
conjunction of all the values uniquely identify the accepted values by
the switch.
We should be able to allow the `simplifySwitchOfPowersOfTwo` transform
to take place, as on recent X86 targets the weighted latency-size
appears to be 2. This favours computing trailing zeros and indexing
into a smaller value table over generating a jump table with an
indirect branch, which overall should be more efficient.
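Conceptually, the transform turns a switch over power-of-two case values into a switch over their exponents (a schematic sketch; the real transform also guards that the operand is a power of two):
```llvm
define void @before(i32 %x) {
entry:
  switch i32 %x, label %exit [
    i32 1, label %a
    i32 8, label %b
  ]
a:
  br label %exit
b:
  br label %exit
exit:
  ret void
}

define void @after(i32 %x) {
entry:
  %tz = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  switch i32 %tz, label %exit [
    i32 0, label %a
    i32 3, label %b
  ]
a:
  br label %exit
b:
  br label %exit
exit:
  ret void
}

declare i32 @llvm.cttz.i32(i32, i1)
```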
Just for consistency, to avoid confusing conditions.
`reverse` helps to avoid test updates, as nothing changes for a
successor count <= 2.
For #142461
An existing transformation replaces invoke instructions with a call to
the invoked function and a branch to the destination; when this happens,
we propagate the invoke's source location to the call but not to the
branch. This patch updates this behaviour to propagate to the branch as
well.
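Sketched in IR (!10 stands in for some source location; schematic fragments, not a full module):
```llvm
; before: only the invoke carries the location
invoke void @g() to label %cont unwind label %lpad, !dbg !10

; after this patch, both replacement instructions inherit it
call void @g(), !dbg !10
br label %cont, !dbg !10
```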
Found using https://github.com/llvm/llvm-project/pull/107279.
Preserve branch weight metadata when merging instructions if one of the
instructions is missing metadata. This is similar in behaviour to what
we do today for other types of metadata such as mmra, memprof and
callsite metadata.
Also add a legality check when merging prof metadata based on
instruction type. Without this check, GVN PRE optimizations result in
prof metadata on phi nodes, which breaks the module verifier.
Build failure caught by
https://lab.llvm.org/buildbot/#/builders/113/builds/6621
```
!9185 = !{!"branch_weights", i32 3912, i32 802}
Wrong number of operands
!9185 = !{!"branch_weights", i32 3912, i32 802}
fatal error: error in backend: Broken module found, compilation aborted!
```
Reverts #134200 with additional changes.
Preserve branch weight metadata when merging instructions if one of the
instructions is missing metadata. This is similar in behaviour to what
we do today for other types of metadata such as mmra, memprof and
callsite metadata.
This is to fix a bug when a target only supports conditional faulting
loads; see the test case hoist_store_without_cstore.
Split `-simplifycfg-hoist-loads-stores-with-cond-faulting` into
`-simplifycfg-hoist-loads-with-cond-faulting` and
`-simplifycfg-hoist-stores-with-cond-faulting` to control conditional
faulting load and store respectively.
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, the non-intrinsic format is the default and has been on for
more than a year, so there's no need for this flag to exist.
(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
Closes #115683.
An overflow arithmetic instruction plus an extractvalue are usually
generated when a division is being replaced, but the zero check may
still be there. In that case, hoist these two instructions out of the
basic block and let later optimizations take care of the unnecessary
zero check.
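A sketch of the shape this targets (hypothetical IR): hoisting %mul and %ov into %entry leaves %bb empty, after which the zero check can be folded.
```llvm
define i1 @sketch(i32 %x, i32 %y) {
entry:
  %is0 = icmp eq i32 %y, 0
  br i1 %is0, label %end, label %bb

bb:
  ; these two instructions are hoisted into %entry
  %mul = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %ov = extractvalue { i32, i1 } %mul, 1
  br label %end

end:
  %r = phi i1 [ false, %entry ], [ %ov, %bb ]
  ret i1 %r
}

declare { i32, i1 } @llvm.umul.with.overflow.i32(i32, i32)
```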
The llvm.fake.use intrinsic is used to prevent certain values from being
optimized out for the benefit of debug info; it is not, however, a debug
or pseudo instruction itself and must not be treated as one, since its
purpose is to act like a normal instruction. In the original commit that
added it, however, the IR intrinsic was treated as one in
`getPrevNonDebugInstruction` (but _not_ in `getNextNonDebugInstruction`,
or in the MIR equivalents). This patch correctly treats it as a
non-debug instruction.
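For reference, the intrinsic looks like an ordinary call (sketch):
```llvm
define void @sketch(i32 %a, i32 %b) {
  %v = add i32 %a, %b
  ; keeps %v live for the debugger's benefit, but is a normal
  ; instruction that getPrevNonDebugInstruction must not skip over
  call void (...) @llvm.fake.use(i32 %v)
  ret void
}

declare void @llvm.fake.use(...)
```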
Update LangRef and code using `Dereferenceable` in assume bundles to
only use the information if it is safe at the point of use.
`Dereferenceable` in an assume bundle is only guaranteed at the point of
the assumption, but may not be guaranteed at later points, because the
pointer may have been freed.
Update code using `Dereferenceable` to only use it if the pointer cannot
be freed. This can further be refined to check if the pointer could be
freed between assume and use.
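For example (a sketch; @may_free is a hypothetical function that might free its argument):
```llvm
declare void @llvm.assume(i1)
declare void @may_free(ptr)

define void @sketch(ptr %p) {
  ; dereferenceability of %p is guaranteed only at this point...
  call void @llvm.assume(i1 true) [ "dereferenceable"(ptr %p, i64 16) ]
  call void @may_free(ptr %p)
  ; ...so a later load of %p must not be speculated based on the bundle
  ret void
}
```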
This follows up on https://github.com/llvm/llvm-project/pull/123196.
With that change, it should be safe to expose dereferenceable
assumptions more widely as in
https://github.com/llvm/llvm-project/pull/121789
PR: https://github.com/llvm/llvm-project/pull/126117
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and one place where PGO
metadata (!prof) was being dropped.
All were due to instances of combineMetadata() being invoked. That
function drops all metadata not in the list provided by the client, and
also drops any not in its switch statement.
Memprof metadata needed a case in the combineMetadata switch statement.
For now we simply keep the metadata of the instruction being kept, which
doesn't retain all the profile information when two calls with
memprof metadata are being combined, but at least retains some.
For the memprof metadata being dropped during call CSE, add memprof and
callsite metadata to the list of known ids in combineMetadataForCSE.
Neither memprof nor regular prof metadata were in the list of known ids
for the callsite in MemCpyOptimizer, which was added to combine AA
metadata after optimization of byval arguments fed by memcpy
instructions, and similar types of optimizations of memcpy uses.
There is one other callsite of combineMetadata, but it is only invoked
on load instructions, which do not carry these types of metadata.
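Schematically (metadata node bodies elided; @alloc is a stand-in allocation function):
```llvm
; two equivalent calls that GVN may CSE
%p1 = call ptr @alloc(i64 8), !memprof !0, !callsite !1
; ...
%p2 = call ptr @alloc(i64 8), !memprof !2, !callsite !3
; when %p2 is replaced by %p1, the kept call's !memprof/!callsite are
; now preserved instead of being dropped by combineMetadataForCSE
```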
As discussed in #94468, this causes unreachable switch lookup table
entries to be poison instead of filling them with a value from one of
the reachable cases.
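For example, a table whose third slot corresponds to an unreachable case now conceptually looks like this (the `@switch.table` name and values are illustrative):
```llvm
@switch.table = private unnamed_addr constant [4 x i32] [i32 10, i32 20, i32 poison, i32 40]
```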
---------
Co-authored-by: DianQK <dianqk@dianqk.net>
This PR is motivated by a mismatch we discovered between compilation
results with vs. without `-g3`. We noticed this when compiling SPEC2017
testcases. The specific instance we saw is fixed in this PR by modifying
a guard (see below), but it is likely similar instances exist elsewhere
in the codebase.
The specific case fixed in this PR manifests itself in the `SimplifyCFG`
pass doing different things depending on whether DebugInfo is generated
or not. At the end of this comment, there is reduced example code that
shows the behavior in question.
The differing behavior has two root causes:
1. Commit https://github.com/llvm/llvm-project/commit/c07e19b adds loop
metadata including debug locations to loops that otherwise would not
have loop metadata
2. Commit https://github.com/llvm/llvm-project/commit/ac28efa6c100 adds
a guard to a simplification action in `SimplifyCFG` that prevents it
from simplifying away loop metadata
So, the change in 2. does not consider that, when compiling with debug
symbols, loops that otherwise would not have metadata needing
preservation now carry debug locations in their loop metadata. Thus,
with `-g3`, `SimplifyCFG` behaves differently than without it.
The larger issue is that while debug info is not supposed to influence
the final compilation result, commits like 1. blur the line between what
is and is not debug info, and not all optimization passes account for
this.
This PR does not address that and rather just modifies this particular
guard in order to restore equivalent behavior between debug and
non-debug builds in this one instance.
---
Here is a reduced version of a file from `526.blender_r` that showcases
the behavior in question:
```C
struct LinkNode;
typedef struct LinkNode {
  struct LinkNode *next;
  void *link;
} LinkNode;

void do_projectpaint_thread_ph_v_state() {
  int *ps = do_projectpaint_thread_ph_v_state;
  LinkNode *node;
  while (do_projectpaint_thread_ph_v_state)
    for (node = ps; node; node = node->next)
      ;
}
```
Compiling this with and without DebugInfo, and then disassembling the
results, leads to different outcomes (tested on SystemZ and X86). The
reason for this is that the `SimplifyCFG` pass does different things in
either case.
Allow a duplicate basic block with multiple predecessors in the
jump table to be simplified, by considering that the same basic
block may appear in multiple switch cases.
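For example (sketch):
```llvm
define void @sketch(i32 %x) {
entry:
  switch i32 %x, label %default [
    i32 0, label %shared   ; %shared appears under two case values
    i32 1, label %shared   ; and may have other predecessors as well
    i32 2, label %other
  ]
shared:
  br label %default
other:
  br label %default
default:
  ret void
}
```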
This PR removes tests with `br i1 undef` under
`llvm/test/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`.
After this PR, I'll continue to fix tests under `llvm/test/CodeGen`,
which has more UB tests than `llvm/test/Transforms`.
This should act like range.
Previously ConstantRangeList assumed a 64-bit range. Now query from the
actual entries. This also means that the empty range has no bitwidth, so
move asserts to avoid checking the bitwidth of empty ranges.
This is a follow-up to #96878 to support hoisting loads/stores from BBs
that have the same predecessor, if the loads/stores are the only
instructions and the branch is unpredictable, e.g.:
```
void test(int a, int *c, int *d) {
  if (a)
    *c = a;
  else
    *d = a;
}
```
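Conceptually, the hoisted result uses single-element masked (conditionally faulting) accesses; a rough sketch of the store case for the C example above, not the exact output:
```llvm
%cond = icmp ne i32 %a, 0
%inv = xor i1 %cond, true
%mask = bitcast i1 %cond to <1 x i1>
%mask.inv = bitcast i1 %inv to <1 x i1>
%va = bitcast i32 %a to <1 x i32>
call void @llvm.masked.store.v1i32.p0(<1 x i32> %va, ptr %c, i32 4, <1 x i1> %mask)
call void @llvm.masked.store.v1i32.p0(<1 x i32> %va, ptr %d, i32 4, <1 x i1> %mask.inv)
```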
Currently when we merge invokes as part of SimplifyCFG we apply a merge
of the invoke DILocations to the merged invoke. We also insert an
unconditional branch to the merged invoke at the positions previously
occupied by the original invokes; as this branch is part of the
substitution for the invoke it has replaced, we should propagate the
original invoke DebugLoc to it.
I noticed that the following two C functions emit different IR:
```
int switch_duplicate_arms(int switch_val, int v, int w) {
  switch (switch_val) {
  default:
    break;
  case 0:
    w = v;
    break;
  case 1:
    w = v;
    break;
  }
  return w;
}

int if_duplicate_arms(int switch_val, int v, int w) {
  if (switch_val == 0)
    w = v;
  else if (switch_val == 1)
    w = v;
  return w;
}
```
We generate IR that looks like this:
```
define i32 @switch_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
  switch i32 %1, label %7 [
    i32 0, label %5
    i32 1, label %6
  ]

5:
  br label %7

6:
  br label %7

7:
  %8 = phi i32 [ %3, %4 ], [ %2, %6 ], [ %2, %5 ]
  ret i32 %8
}

define i32 @if_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
  %5 = icmp ult i32 %1, 2
  %6 = select i1 %5, i32 %2, i32 %3
  ret i32 %6
}
```
For `switch_duplicate_arms`, taking case 0 or case 1 is the same, since
%5 and %6 branch to the same location and the incoming values for %8
from those blocks are identical. We can remove one of the duplicate
switch targets and update the switch to use the single remaining
target.
On RISC-V, prior to this patch, we generate the following code:
```
switch_duplicate_arms:
        li      a4, 1
        beq     a1, a4, .LBB0_2
        mv      a0, a3
        bnez    a1, .LBB0_3
.LBB0_2:
        mv      a0, a2
.LBB0_3:
        ret

if_duplicate_arms:
        li      a4, 2
        mv      a0, a2
        bltu    a1, a4, .LBB1_2
        mv      a0, a3
.LBB1_2:
        ret
```
After this patch, the O3 code is optimized to the icmp + select pair,
which gives us the same code gen as `if_duplicate_arms`, as desired.
This results in one less branch instruction in the final assembly.
This may help with both code size and further switch simplification. I
found that this patch causes no significant impact to spec2006/int/ref
and spec2017/intrate/ref.
---------
Co-authored-by: Min Hsu <min@myhsu.dev>