InstCombine already has some rules for `icmp ptrtoint, ptrtoint` to drop
the casts and compare the source values. This change adds the same for
the reverse case with `inttoptr`.
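For example, roughly (an illustrative sketch; assumes the integer width matches the pointer width):
```
define i1 @src(i64 %a, i64 %b) {
  %pa = inttoptr i64 %a to ptr
  %pb = inttoptr i64 %b to ptr
  %c = icmp eq ptr %pa, %pb
  ret i1 %c
}
; -> the casts can be dropped and the source integers compared directly:
;  %c = icmp eq i64 %a, %b
```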
This extends the optimisation implemented in #107769 by relaxing the
conditions under which it applies. Now, the value produced by `ucmp`/`scmp`
doesn't need to be one-use, but only one-user, meaning it can be present
in a single phi node more than once.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
There is already a comment on the member and documentation in the
InstCombine contributor guide, but also rename it to add an
additional speed bump.
This replaces some uses of isSafeToSpeculativelyExecute() with
isSafeToSpeculativelyExecuteWithVariableReplaced(), in cases where we
are guarding against operand changes rather than plain speculation.
I believe that this is NFC with the current implementation of the
function (as it only does something different for loads), but this
makes us more defensive against future generalizations.
The motivation of this patch is to fold more generalized patterns like
`icmp ult (shl nuw 16, X), 64 -> icmp ult X, 2`.
Alive2: https://alive2.llvm.org/ce/z/gyqjQH
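For example, on `i8` (illustrative):
```
define i1 @src(i8 %x) {
  %shl = shl nuw i8 16, %x
  %cmp = icmp ult i8 %shl, 64
  ret i1 %cmp
}
; Since the shift cannot wrap, 16 << %x is below 64 exactly when %x < 2:
;  %cmp = icmp ult i8 %x, 2
```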
The current visitPHINode function in InstCombine can remove dead
phi cycles (all phis have one use, which is another phi). However, it
cannot deal with the case where the phis form a web (all phis have one or
more uses, and all the uses are phis). This change extends the algorithm
so that it can also handle such dead phi webs.
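A minimal sketch of such a web (illustrative):
```
define void @dead_phi_web(i1 %c1, i1 %c2) {
entry:
  br label %loop
loop:
  ; every use of %a, %b and %d is another phi in the web, and %a has
  ; multiple phi users, so this is a web rather than a simple cycle
  %a = phi i32 [ 0, %entry ], [ %b, %left ], [ %d, %right ]
  %b = phi i32 [ 1, %entry ], [ %a, %left ], [ %a, %right ]
  %d = phi i32 [ 2, %entry ], [ %a, %left ], [ %b, %right ]
  br i1 %c1, label %left, label %exit
left:
  br i1 %c2, label %loop, label %right
right:
  br label %loop
exit:
  ret void
}
```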
In some cases, if an undef value is the product of another instcombine
simplification, a bitcast of undef is simplified to a zeroinitializer
vector instead of undef.
alive2: ~~https://alive2.llvm.org/ce/z/Ag3Ki7~~ https://alive2.llvm.org/ce/z/ywP5t2
related: #76438
This patch adds the following folds: `floor(x) <= x --> true` and `x
<= ceil(x) --> true`. We leverage the properties of these math functions
and require that the input is known not to be NaN.
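A sketch of the first fold in IR (illustrative; here the `nnan` flag on the call stands in for the no-NaN requirement, but the exact conditions may differ):
```
declare double @llvm.floor.f64(double)

define i1 @src(double %x) {
  %f = call nnan double @llvm.floor.f64(double %x)
  %c = fcmp ole double %f, %x
  ret i1 %c          ; folds to: ret i1 true
}
```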
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
When we have a `phi` instruction with more than one of its incoming
values being a call to `ucmp` or `scmp`, which is then compared with an
integer constant, we can move the comparison through the `phi` into the
incoming basic blocks because we know that a comparison of `ucmp`/`scmp`
with a constant will be simplified by the next iteration of InstCombine.
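A sketch of the pattern (illustrative):
```
define i1 @src(i1 %c, i32 %a, i32 %b, i32 %x, i32 %y) {
entry:
  br i1 %c, label %bb1, label %bb2
bb1:
  %c1 = call i8 @llvm.ucmp.i8.i32(i32 %a, i32 %b)
  br label %merge
bb2:
  %c2 = call i8 @llvm.scmp.i8.i32(i32 %x, i32 %y)
  br label %merge
merge:
  ; the compare below can be moved into bb1/bb2, where it simplifies:
  ;   icmp slt (ucmp a, b), 0  ->  icmp ult a, b
  ;   icmp slt (scmp x, y), 0  ->  icmp slt x, y
  %phi = phi i8 [ %c1, %bb1 ], [ %c2, %bb2 ]
  %cmp = icmp slt i8 %phi, 0
  ret i1 %cmp
}
declare i8 @llvm.ucmp.i8.i32(i32, i32)
declare i8 @llvm.scmp.i8.i32(i32, i32)
```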
There's a high chance that other similar patterns can be identified, in
which case they can be easily handled by the same code by moving the
check for "simplifiable" instructions into a lambda.
These transforms all rewrite a variant of (gep (gep p, x), y)
into (gep p, (x + y)). We can preserve both inbounds and nuw
during such transforms (https://alive2.llvm.org/ce/z/Stu4cN), but
not nusw, which would require proving that the new add is nsw.
For the constant offset case, I've conservatively retained the
logic that checks for negative intermediate offsets, though I'm
not sure it's still reachable nowadays.
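An illustrative before/after (the add is shown without flags, since giving the result nusw would require the add to be nsw):
```
define ptr @src(ptr %p, i64 %x, i64 %y) {
  %g = getelementptr inbounds nuw i8, ptr %p, i64 %x
  %r = getelementptr inbounds nuw i8, ptr %g, i64 %y
  ret ptr %r
}
; -> inbounds and nuw can be preserved on the combined GEP:
;  %o = add i64 %x, %y
;  %r = getelementptr inbounds nuw i8, ptr %p, i64 %o
```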
This was modifying the GEP in place, with code to adjust the
inbounds flag. This was correct at the time, but now fails to
account for other GEP flags like nuw, leading to miscompilations.
Remove the special case, and always create a new GEP instruction.
Logic for preserving nuw in the cases where it is valid will be
added in a followup patch.
The foldOpIntoPhi() transform requires all operands to be
phi-translatable. This can be the case either because they are phi nodes
in the same block, or because the operand dominates the block.
Currently, most callers of foldOpIntoPhi() satisfy this pre-condition by
requiring a constant operand, which trivially dominates everything. Only
selects had handling for variable operands.
Move this logic into foldOpIntoPhi(), so things are handled correctly if
other callers are generalized. Also make the implementation a bit more
general by querying the dominator tree.
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.
Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).
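For example (illustrative), the phi below sits inside a loop but does not merge values across the backedge; the reachability-based check still blocked the transform for it:
```
define i32 @f(i32 %n, i1 %cond) {
entry:
  br label %header
header:
  %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
  br i1 %cond, label %then, label %else
then:
  br label %latch
else:
  br label %latch
latch:
  ; %v merges values produced within a single iteration; no backedge
  ; is involved, yet all blocks in the loop reach each other
  %v = phi i32 [ 1, %then ], [ 2, %else ]
  %i.next = add i32 %i, 1
  %c = icmp slt i32 %i.next, %n
  br i1 %c, label %header, label %exit
exit:
  ret i32 %v
}
```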
Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).
For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.
This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
This patch replaces all dominated uses of a condition with true/false to
improve context-sensitive optimizations. It eliminates a bunch of
branches in llvm-opt-benchmark.
As a side effect, it may introduce new phi nodes in some corner cases.
See the following case:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %bb2
if.else:
br label %bb2
bb2:
%res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
ret i1 %res
}
```
It will be simplified into:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br label %bb2
if.else:
br label %bb2
bb2:
%res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
ret i1 %res
}
```
I am planning to fix this in the late pipeline/CGP, since this problem
already exists before this patch.
This patch moves the stepvector intrinsic out of the experimental
namespace.
This intrinsic has existed in LLVM for several years now and is widely used.
Added folds:
- `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)`
- `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)`
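For instance, the first fold in IR (illustrative; `@use` is a stand-in for an arbitrary extra user):
```
define i32 @src(i32 %x, i32 %y, i32 %z) {
  %s1 = sub i32 %x, %y
  %s2 = sub i32 %z, %x
  call void @use(i32 %s1)   ; extra use keeps %s1 multi-use
  %r = add i32 %s1, %s2     ; -> sub i32 %z, %y
  ret i32 %r
}
declare void @use(i32)
```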
These folds are typically handled by the `Reassociate` pass, but it fails
if the inner `sub`/`add` values are multi-use. Less importantly, Reassociate
doesn't propagate flags correctly.
This patch adds the folds explicitly to InstCombine.
Proofs: https://alive2.llvm.org/ce/z/p6JyRP
Closes #105866
getMaskedTypeForICmpPair() tries to model non-and operands as x & -1.
However, this can end up confusing the matching logic, by picking the -1
operand as the "common" operand, resulting in a successful, but useless,
match. This is what causes commutation failures for some of the
optimizations driven by this function.
Fix this by treating a match against -1 as a non-match.
This patch updates the select operand when the condition has the nuw or
nsw property. Considering the semantics of the nuw and nsw flags, if there is
no poison value in this expression, this code assumes that X can only be
0, 1 or -1.
Closes #96765
alive2: https://alive2.llvm.org/ce/z/3n3n2Q
This patch extends existing functionality to include these two
additional folds, which are nearly identical to the ones already
implemented.
Proofs: https://alive2.llvm.org/ce/z/Xy7s4j
When folding an icmp into a select, treat an icmp of a constant with a
one-use ucmp/scmp intrinsic as a simplification. These comparisons will
reduce down to an icmp.
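For example (illustrative):
```
declare i8 @llvm.ucmp.i8.i32(i32, i32)

define i1 @src(i32 %a, i32 %b) {
  %u = call i8 @llvm.ucmp.i8.i32(i32 %a, i32 %b)
  ; ucmp returns -1, 0 or 1, so this reduces to: icmp ule i32 %a, %b
  %c = icmp slt i8 %u, 1
  ret i1 %c
}
```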
This addresses a regression seen in Rust and also in llvm-opt-benchmark.
This change also covers the fold of `zext(A > B) - zext(A < B)` since it
is already being canonicalized into the aforementioned pattern.
Proof: https://alive2.llvm.org/ce/z/AgnfMn
Previously, (zext (icmp ne (and X, (1 << ShAmt)), 0)) has only been
folded if the bit widths of X and the result were equal. Use a trunc or
zext instruction to also support other bit widths.
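For instance, with a narrower result type (an illustrative sketch; the exact output may differ):
```
define i8 @src(i32 %x) {
  %and = and i32 %x, 8          ; 1 << 3
  %cmp = icmp ne i32 %and, 0
  %res = zext i1 %cmp to i8
  ret i8 %res
}
; -> can now be folded using a trunc:
;  %lshr = lshr i32 %x, 3
;  %tr   = trunc i32 %lshr to i8
;  %res  = and i8 %tr, 1
```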
This is a follow-up to commit 533190acdb9d2ed774f96a998b5c03be3df4f857,
which introduced a regression: (zext (icmp ne (and (lshr X ShAmt) 1) 0))
is no longer folded to (zext/trunc (and (lshr X ShAmt) 1)) since
the commit introduced the fold of (icmp ne (and (lshr X ShAmt) 1) 0) to
(icmp ne (and X (1 << ShAmt)) 0). The change introduced by this commit
restores this fold.
Alive proof: https://alive2.llvm.org/ce/z/MFkNXs
Relates to issue #86813 and pull request #101838.
The idea behind this canonicalization is that it allows us to handle
fewer patterns, because we know that some will be canonicalized away. This
is indeed very useful, for example to know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
This patch adds the aforementioned fold to InstCombine. This pattern is
produced after naive implementations of 3-way comparison in high-level
languages are transformed into LLVM IR and then optimized.
Proofs: https://alive2.llvm.org/ce/z/w4QLq_
Given an unsigned integer comparison of `add nsw X, C1` with some
constant `C2`, we can fold it into a signed comparison of `X` and `C2 -
C1` under the following conditions:
* There's an `nsw` flag on the addition
* `C2` is non-negative
* `X + C1` is non-negative
* `C2 - C1` is non-negative
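A concrete instance with `C1 = 5` and `C2 = 10` (illustrative; how the non-negativity conditions are established will vary):
```
define i1 @src(i32 %x) {
  %add = add nsw i32 %x, 5    ; assume %x + 5 is known non-negative
  %cmp = icmp ult i32 %add, 10
  ret i1 %cmp                 ; -> icmp slt i32 %x, 5
}
```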
When looking at a PHI operand for combining, only look at instructions and
arguments. The loop later iterates over Arg's users, which is not
useful if Arg is a constant -- its users are not meaningful and might
be in different functions, which causes problems for the dominates()
query.
Pull Request: https://github.com/llvm/llvm-project/pull/103302
transformConstExprCastCall() implements a number of highly dubious
transforms attempting to make a call function type line up with the
function type of the called function. Historically, the main value this
had was to avoid function type mismatches due to pointer type
differences, which is no longer relevant with opaque pointers.
This patch is a step towards reducing the scope of the transform, by
applying it only to definitions, not declarations. For declarations, the
declared signature might not match the actual function signature, e.g.
`void @fn()` is sometimes used as a placeholder for functions with
unknown signature. The implementation already bailed out in some cases
for declarations, but I think it would be safer to disable the transform
entirely.
For the test cases, I've updated some of them to use definitions
instead, so that the test coverage is preserved.