This patch generalizes the fold of `icmp pred min/max(X, Y), Z` to address the issue https://github.com/llvm/llvm-project/issues/62898.
For example, we can fold `smin(X, Y) < Z` into `X < Z` when `Y > Z` is implied by constant folds/invariants/dom conditions.
Alive2 (with `--disable-undef-input` due to the limitation of --smt-to=10000): https://alive2.llvm.org/ce/z/rB7qLc
You can run the standalone translation validation tool `alive-tv` locally to verify these transformations.
```
alive-tv transforms.ll --smt-to=600000 --exit-on-error
```
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D156238
The icmp is being folded in phi only if they belong in the same BB.
This patch extends the same beyond the BB.
Have seen scenarios where this seems to be beneficial.
Differential Revision: https://reviews.llvm.org/D157740
Since we no longer support typed LLVM IR pointer types, the code can
be simplified into for example using PointerType::get directly instead
of using Type::getInt8PtrTy and Type::getInt32PtrTy etc.
Differential Revision: https://reviews.llvm.org/D156733
Replace these with IRBuilder uses, as we don't (from a type
perspective) care about Constant results.
Switch the predicate to m_ImmConstant() instead of isa<Constant>
to guarantee that these do get folded away and our assumptions
about simplifications hold true.
D152673 Incorrectly didn't account for operand position in the `icmp`,
i.e it treated `icmp uge x, y` the same as `icmp uge y, x` which is
incorrect:
https://reviews.llvm.org/rG142f7448e770f25b774b058a7eab1f107c4daad9
The fix takes operand position into account. The new tests
exhaustively cover all operand positions for `ule`, `uge`, `ult`,
`ugt` (the set of predicates) and all transform verify with the new
commit.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D156058
in `ctpop(X) eq/ne 1` or `ctpop(X) ugt/ule 1`, if there is any
known-bit in `X`, instead of going through `ctpop`, we can just test
if there are any other known bits in `X`. If there are, `X` is not a
power of 2. If there aren't, `X` is a power of 2.
https://alive2.llvm.org/ce/z/eLMJgU
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152677
This comes up when adding two `bool` types in C/C++
```
bool foo(bool a, bool b) {
return a + b;
}
...
->
define i1 @foo(i1 %a, i1 %b) {
%conv = zext i1 %a to i32
%conv3.neg = sext i1 %b to i32
%tobool4 = icmp ne i32 %conv, %conv3.neg
ret i1 %tobool4
}
```
Proof: https://alive2.llvm.org/ce/z/HffWAN
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154574
This patch is a continuation of D154206. It introduces a fold for the
operation "uadd_sat(X, C) pred C2" where "C" and "C2" are constants. The
fold is:
uadd_sat(X, C) pred C2
=> (X >= ~C) || ((X + C) pred C2) -> when (UINT_MAX pred C2) is true
=> (X < ~C) && ((X + C) pred C2) -> when (UINT_MAX pred C2) is false
This patch also generalizes the fold to work with any saturating
intrinsic as long as the saturating value is known.
Proofs: https://alive2.llvm.org/ce/z/wWeirP
Differential Revision: https://reviews.llvm.org/D154565
This patch introduces a fold for the operation "usub_sat(X, C) pred C2"
where "C" and "C2" are constants. The fold is:
usub_sat(X, C) pred C2
=> (X < C) || ((X - C) pred C2) -> when (0 pred C2) is true
=> (X >= C) && ((X - C) pred C2) -> when (0 pred C2) is false
These expressions can generally be folded into a simpler expression. As
they can sometimes emit more than one instruction, they are limited to
cases where the "usub_sat" has only one user.
Fixes#58342.
Proofs: https://alive2.llvm.org/ce/z/ws_N2J
Differential Revision: https://reviews.llvm.org/D154206
When D152728 hoisted the code to a helper function, it moved the call
to the helper outside of `foldICmpEquality`, so an equality check is
needed in the helper.
Reviewed By: nikic, fhahn
Differential Revision: https://reviews.llvm.org/D153041
InstCombine tries to swap compare operands to match sub instructions
in order to expose "CSE opportunities". However, it doesn't really
make sense to perform this transform in the middle-end, as we cannot
actually CSE the instructions there.
The backend already performs this fold in
18f5446a45/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (L4236)
on the SDAG level, however this only works within a single basic block.
To handle cross-BB cases, we do need to handle this in the IR layer.
This patch moves the fold from InstCombine to CGP in the backend,
while keeping the same (somewhat dubious) heuristic.
Differential Revision: https://reviews.llvm.org/D152541
The ORE argument threaded through ValueTracking is used only in a
single, untested place. It is also essentially never passed: The
only places that do so have been added very recently as part of the
KnownFPClass migration, which is vanishingly unlikely to hit this
code path. Remove this effectively dead argument.
Differential Revision: https://reviews.llvm.org/D151562
We need to add the replaced instruction itself to the worklist as
well. We want to remove the old instructions, but can't easily do
so directly, as the icmp is also one of the users and we need to
retain it until the fold has finished.
In case of a comparison with two select instructions having the same
condition, check whether one of the resulting branches can be simplified.
If so, just compare the other branch and select the appropriate result.
For example:
%tmp1 = select i1 %cmp, i32 %y, i32 %x
%tmp2 = select i1 %cmp, i32 %z, i32 %x
%cmp2 = icmp slt i32 %tmp2, %tmp1
The icmp will result false for the false value of selects and the result
will depend upon the comparison of true values of selects if %cmp is
true. Thus, transform this into:
%cmp = icmp slt i32 %y, %z
%sel = select i1 %cond, i1 %cmp, i1 false
Differential Revision: https://reviews.llvm.org/D150360
foldAllocaCmp() needs to fold all comparisons of an alloca at the
same time, to ensure that there is a consistent view of the alloca
address. Currently, it folds "all" comparisons by limiting to the
case where there is only one. This patch switches the algorithm to
instead actually collect and fold all comparisons.
Something we need to be careful about here is that there may be
comparisons where both sides of the icmp are based on the alloca.
Such comparisons are comparing offsets of the alloca, and as such
can be ignored here, but shouldn't be folded to false.
Differential Revision: https://reviews.llvm.org/D144492
There is no getNullValue in ConstantFP. Due to inheritance, we're calling
Constant::getNullValue which handles any type including FP.
Since we already know we want an FP constant we can use ConstantFP::getZero
which might be faster and is a more readable name for an FP zero.
Many uses of getIntPtrType() were using that type to calculate the
neened type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.
Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPtrSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.
This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.
In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.
This commit does not impact
- C library-related rewriting (memcpy()), which are operating under
the assumption that intptr_t == size_t. While all the mechanisms for
breaking this assumption now exist, doing so is outside the scope of
this commit.
- Code generation and below. Note that the use of getIntPtrType() in
CodeGenPrepare will be changed in a future commit.
- Usage of getIntPtrType() in any backend
Depends on D143435
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D143437
We can fold equality comparisons of non-inbounds geps to offset
comparison (https://alive2.llvm.org/ce/z/x2Zp8b). The inbounds
requirement is only necessary for relational comparisons.
We currently already canonicalize icmp eq (%x & Pow2), Pow2 to
icmp ne (%x & Pow2), 0. This patch generalizes the fold based on
known bits.
In particular, this allows us to handle comparisons against
!range !{i64 0, i64 2} loads, which addresses an optimization
regression in Rust caused by 8df376db7282b955e7990cb8887ee9dcd3565040.
Differential Revision: https://reviews.llvm.org/D146149