This PR fixes https://github.com/llvm/llvm-project/issues/55013 : the
max intrinsics is not generated for this simple loop case :
https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being
folded into a select, thus not generating the max intrinsics.
For the story :
Since LLVM 14, SCCP pass got smarter by folding sext into zext for
positive ranges : https://reviews.llvm.org/D81756. After this change,
InstCombine was sometimes unable to fold ICMP correctly as both of the
arguments pointed to mismatched zext/sext. To fix this, @rotateright
implemented this fix : https://reviews.llvm.org/D124419 that tries to
resolve the mismatch by knowing if the argument of a zext is positive
(in which case, it is like a sext) by using ValueTracking, however
ValueTracking is not smart enough to infer that the value is positive in
some cases. Recently, @nikic implemented #67982 which keeps the
information that a zext is non-negative. This PR simply uses this
information to do the folding accordingly.
TLDR : This PR uses the recent nneg tag on zext to fold the icmp
accordingly in instcombine.
This PR also contains test cases for sext/zext folding with InstCombine
as well as a x86 regression tests for the max/min case.
Looking through inttoptr / ptrtoint intermixed with GEPs is very
questionable from a provenance perspective. We also don't seem to
have any test coverage that shows this is useful (apart from one
test I added to guard against a crash).
This fold requires a fold against a constant, which will always be
on the RHS. If the swapped fold actually did trigger, it would
result in a miscompile, because it did not work with the swapped
predicate when swapping operands.
Use the constant folding API instead. In the second case using
IR builder should also work, but the way the instructions are
created an inserted there is very unusual, so I've left it alone.
This patch canonicalizes `icmp eq/ne (A ^ Cst), B` to `icmp eq/ne (A ^ B), Cst` since the latter form exposes more optimizations.
Proof: https://alive2.llvm.org/ce/z/9DbhGcFixes#65968.
This patch further improves the simplification of pattern `icmp eq/ne
min|max(X, Y), Z` as discussed in
[D156238](https://reviews.llvm.org/D156238).
When `X < Z`:
`min(X, Y) == Z -> false`
`min(X, Y) != Z -> true`
`max(X, Y) == Z -> Y == Z`
`max(Y, Z) != Z -> Y != Z`
When `X > Z`:
`max(X, Y) == Z -> false`
`max(X, Y) != Z -> true`
`min(X, Y) == Z -> Y == Z`
`min(Y, Z) != Z -> Y != Z`
Alive2: https://alive2.llvm.org/ce/z/evkmaq
This patch further improves the simplification of pattern `icmp eq/ne
min|max(X, Y), Z` as discussed in
[D156238](https://reviews.llvm.org/D156238).
When `X < Z`:
`min(X, Y) == Z -> false`
`min(X, Y) != Z -> true`
`max(X, Y) == Z -> Y == Z`
`max(Y, Z) != Z -> Y != Z`
When `X > Z`:
`max(X, Y) == Z -> false`
`max(X, Y) != Z -> true`
`min(X, Y) == Z -> Y == Z`
`min(Y, Z) != Z -> Y != Z`
Alive2: https://alive2.llvm.org/ce/z/evkmaq
This patch generalizes the fold of `icmp pred min/max(X, Y), Z` to address the issue https://github.com/llvm/llvm-project/issues/62898.
For example, we can fold `smin(X, Y) < Z` into `X < Z` when `Y > Z` is implied by constant folds/invariants/dom conditions.
Alive2 (with `--disable-undef-input` due to the limitation of --smt-to=10000): https://alive2.llvm.org/ce/z/rB7qLc
You can run the standalone translation validation tool `alive-tv` locally to verify these transformations.
```
alive-tv transforms.ll --smt-to=600000 --exit-on-error
```
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D156238
The icmp is being folded in phi only if they belong in the same BB.
This patch extends the same beyond the BB.
Have seen scenarios where this seems to be beneficial.
Differential Revision: https://reviews.llvm.org/D157740
Since we no longer support typed LLVM IR pointer types, the code can
be simplified into for example using PointerType::get directly instead
of using Type::getInt8PtrTy and Type::getInt32PtrTy etc.
Differential Revision: https://reviews.llvm.org/D156733
Replace these with IRBuilder uses, as we don't (from a type
perspective) care about Constant results.
Switch the predicate to m_ImmConstant() instead of isa<Constant>
to guarantee that these do get folded away and our assumptions
about simplifications hold true.
D152673 Incorrectly didn't account for operand position in the `icmp`,
i.e it treated `icmp uge x, y` the same as `icmp uge y, x` which is
incorrect:
https://reviews.llvm.org/rG142f7448e770f25b774b058a7eab1f107c4daad9
The fix takes operand position into account. The new tests
exhaustively cover all operand positions for `ule`, `uge`, `ult`,
`ugt` (the set of predicates) and all transform verify with the new
commit.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D156058
in `ctpop(X) eq/ne 1` or `ctpop(X) ugt/ule 1`, if there is any
known-bit in `X`, instead of going through `ctpop`, we can just test
if there are any other known bits in `X`. If there are, `X` is not a
power of 2. If there aren't, `X` is a power of 2.
https://alive2.llvm.org/ce/z/eLMJgU
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152677
This comes up when adding two `bool` types in C/C++
```
bool foo(bool a, bool b) {
return a + b;
}
...
->
define i1 @foo(i1 %a, i1 %b) {
%conv = zext i1 %a to i32
%conv3.neg = sext i1 %b to i32
%tobool4 = icmp ne i32 %conv, %conv3.neg
ret i1 %tobool4
}
```
Proof: https://alive2.llvm.org/ce/z/HffWAN
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154574
This patch is a continuation of D154206. It introduces a fold for the
operation "uadd_sat(X, C) pred C2" where "C" and "C2" are constants. The
fold is:
uadd_sat(X, C) pred C2
=> (X >= ~C) || ((X + C) pred C2) -> when (UINT_MAX pred C2) is true
=> (X < ~C) && ((X + C) pred C2) -> when (UINT_MAX pred C2) is false
This patch also generalizes the fold to work with any saturating
intrinsic as long as the saturating value is known.
Proofs: https://alive2.llvm.org/ce/z/wWeirP
Differential Revision: https://reviews.llvm.org/D154565
This patch introduces a fold for the operation "usub_sat(X, C) pred C2"
where "C" and "C2" are constants. The fold is:
usub_sat(X, C) pred C2
=> (X < C) || ((X - C) pred C2) -> when (0 pred C2) is true
=> (X >= C) && ((X - C) pred C2) -> when (0 pred C2) is false
These expressions can generally be folded into a simpler expression. As
they can sometimes emit more than one instruction, they are limited to
cases where the "usub_sat" has only one user.
Fixes#58342.
Proofs: https://alive2.llvm.org/ce/z/ws_N2J
Differential Revision: https://reviews.llvm.org/D154206
When D152728 hoisted the code to a helper function, it moved the call
to the helper outside of `foldICmpEquality`, so an equality check is
needed in the helper.
Reviewed By: nikic, fhahn
Differential Revision: https://reviews.llvm.org/D153041
InstCombine tries to swap compare operands to match sub instructions
in order to expose "CSE opportunities". However, it doesn't really
make sense to perform this transform in the middle-end, as we cannot
actually CSE the instructions there.
The backend already performs this fold in
18f5446a45/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (L4236)
on the SDAG level, however this only works within a single basic block.
To handle cross-BB cases, we do need to handle this in the IR layer.
This patch moves the fold from InstCombine to CGP in the backend,
while keeping the same (somewhat dubious) heuristic.
Differential Revision: https://reviews.llvm.org/D152541