This can come from loops like
```
for (int i = 0; i < X; ++i) {
if (i < N)
__builtin_trap();
if (i < M)
__builtin_trap();
x[i] = y[i];
}
```
Reviewers: nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/181264
If the predicate has samesign set, we could either perform the checks
with the unsigned predicate and return and unsigned invariant predicate,
or we could perform them with the signed predicate and return a signed
invariant predicate. The current implementation can end up mixing both,
using a signed predicate for one check and an unsigned one for the
other.
Avoid this by dropping the samesign flag.
Fixes https://github.com/llvm/llvm-project/issues/180870.
getTruncateExpr may not always return a SCEVAddRecExpr when truncating
loop bounds. Add a check to verify the result type before casting, and
bail out of the transformation if the cast would be invalid.
This prevents potential crashes from invalid casts when dealing with
complex loop bounds.
Co-authored by Michael Rowan
Resolves
[https://github.com/llvm/llvm-project/issues/153090](https://github.com/llvm/llvm-project/issues/153090)
getTruncateExpr may not always return a SCEVAddRecExpr when truncating
loop bounds. Add a check to verify the result type before casting, and
bail out of the transformation if the cast would be invalid.
This prevents potential crashes from invalid casts when dealing with
complex loop bounds.
Co-authored by Michael Rowan
Resolves [#153090](https://github.com/llvm/llvm-project/issues/153090)
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
The syntax in the common cases looks like this:
`denormal_fpenv(preservesign,preservesign)`
`denormal_fpenv(float: preservesign,preservesign)`
`denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).
This has a behavior change when using the command flag debug
options to set the denormal mode. The behavior of the flag
ignored functions with an explicit attribute set, per
the default and f32 version. Now that these are one attribute,
the flag logic can't distinguish which of the two components
were explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.
This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.
I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
Reland #171963, #172639 and #173444, they are reverted in
86b9f90b9574b3a7d15d28a91f6316459dcfa046 because of introducing
non-determinism in compiles.
The non-determinism has been fixed in
9b8addffa70cee5b2acc5454712d9cf78ce45710.
This causes non-determinism in compiles.
From nikic: "FYI the non-determinism is also visible on
llvm-opt-benchmark. Maybe repeatedly running test cases from
299446d99f
could reproduce the issue..."
Also revert dependent 796fafeff92fe5d2d20594859e92607116e30a16 and
e135447bda617125688b71d33480d131d1076a72.
Add a test case for commit f54c6b4306a3f92c08aeb8a9fa222b88985cb9ef, which was previously failing after
refactor in b27af83120b32a4b8312ddf1e6317271122769e4.
Skip constant folding the loop predicates if the loop contains control
convergence tokens referenced outside the loop.
Fixes https://github.com/llvm/llvm-project/issues/164496.
Verified
[loop_peeling.test](https://github.com/llvm/offload-test-suite/pull/473)
passes with the fix.
Similar control convergence issues are found on other passes.
https://github.com/llvm/llvm-project/issues/165642
HLSL used for tests:
```hlsl
RWStructuredBuffer<uint> Out : register(u0);
[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
for (uint i = 0; i < 8; i++) {
if (i == TID.x) {
Out[TID.x] = WaveActiveMax(TID.x);
break;
}
}
}
```
With nested loop:
```hlsl
RWStructuredBuffer<uint> Out : register(u0);
[numthreads(8,8,1)]
void main(uint3 TID : SV_GroupThreadID) {
for (uint i = 0; i < 8; i++) {
for (uint j = 0; j < 8; j++) {
if (i == TID.x && j == TID.y) {
uint index = TID.x * 8 + TID.y;
Out[index] = WaveActiveMax(index);
break;
}
}
}
}
```
When transforming floating-point induction variables into integer ones,
make sure we stay within the bounds of fp values that can be represented
as integers without gaps, i.e., 2^24 and 2^53 for IEEE-754 single and
double precision respectively (both on negative and positive side).
Fixes: https://github.com/llvm/llvm-project/issues/166496.
At the moment, the effectivness of guards that contain divisibility
information (A % B == 0 ) depends on the order of the conditions.
This patch makes using divisibility information independent of the
order, by collecting and applying the divisibility information
separately.
We first collect all conditions in a vector, then collect the
divisibility information from all guards.
When processing other guards, we apply divisibility info collected
earlier.
After all guards have been processed, we add the divisibility info,
rewriting the existing rewrite. This ensures we apply the divisibility
info to the largest rewrite expression.
This helps to improve results in a few cases, one in
https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2921 and another one
in a different large C/C++ based IR corpus.
PR: https://github.com/llvm/llvm-project/pull/163021
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.
This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM also adds functionality to sink unused load that's not
clobbered by the loop body.
This is important to optimize patterns that frequently appear with
bounds checks:
```
for (int i = 0; i < N; ++i) {
bar[i] = foo[i] + 123;
}
```
which gets roughly turned into
```
for (int i = 0; i < N; ++i) {
if (i >= size of foo)
ubsan.trap();
if (i >= size of bar)
ubsan.trap();
bar[i] = foo[i] + 123;
}
```
Motivating example:
https://github.com/google/boringssl/blob/main/crypto/fipsmodule/hmac/hmac.cc.inc#L138
I hand-verified the assembly and confirmed that this optimization
removes the check in the loop.
This also allowed the loop to be vectorized.
Alive2: https://alive2.llvm.org/ce/z/3qMdLF
I did a `stage2-check-all` for both normal and
`-DBOOTSTRAP_CMAKE_C[XX]_FLAGS="-fsanitize=array-bounds
-fsanitize-trap=all"`.
I also ran some Google-internal tests with `fsanitize=array-bounds`.
Everything passes.
We would reenter the loop with %i.04 being 0, so the usub would
overflow to -1. This was the only test in this file that had
an overflow in the loop, the other ones did not.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.
The actual code supports ZExts with arbitrary number of arguments, hence
the getAddExpr in the return.
This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.
Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push to ZExt to add operands.
https://alive2.llvm.org/ce/z/q7d303
PR: https://github.com/llvm/llvm-project/pull/151227
Update SimplifyICmpOperands to only try subtracting 1 from RHS first, if
RHS is an op we can fold the subtract directly into. Otherwise try
adding to LHS first, as we can preserve NUW flags.
This improves results in a few cases, including the modified test case
from berkeley-abc and new code to be added in
https://github.com/llvm/llvm-project/pull/128061.
Note that there are more cases where the results can be improved by
better ordering here which I'll try to investigate as follow-up.
PR: https://github.com/llvm/llvm-project/pull/144404
Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.
Reviewed By: shiltian, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/137921
While sinking instructions (that are loop invariant) from preheader to
the exit block, we are skipping instructions due to decrementing
instruction iterator twice.
When IndVarSimplify simplifies a rem of the induction variable to a cmp
and select, only the select currently receives the rem's source
location; this patch propagates it to the cmp as well.
Found using https://github.com/llvm/llvm-project/pull/107279.
`WidenIV::widenWithVariantUse` assumes that exactly one of the binop
operands is the IV to be widened. This miscompilation happens when it
tries to sign-extend the "NonIV" operand while the IV is zero-extended.
Closes https://github.com/llvm/llvm-project/issues/135182.
The code already guards against values coming from a previous iteration
using properlyDominates(). However, addrecs are considered to properly
dominate the loop they are defined in.
Handle this special case separately, by checking for expressions that
have computable loop evolution (this should cover cases like a zext of
an addrec as well).
I considered changing the definition of properlyDominates() instead, but
decided against it. The current definition is useful in other context,
e.g. when deciding whether an expression is safe to expand in a given
block.
Fixes https://github.com/llvm/llvm-project/issues/126012.