We can turn:
```
%add = add i8 %arg, C1
%and = and i8 %add, C2
%cmp = icmp eq i1 %and, C3
```
into:
```
%and = and i8 %arg, C2
%cmp = icmp eq i1 %and, (C3 - C1) & C2
```
This is only worth doing if the sequence is the sole user of the addition
operation.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
Extend decomposeBitTestICmp() to handle cases where the resulting
comparison is of the form `icmp (X & Mask) pred C` with non-zero
`C`. Add a flag to allow code to opt-in to this behavior and use it in
the "log op of icmp" fold infrastructure.
This addresses regressions from #97289.
Proofs: https://alive2.llvm.org/ce/z/hUhdbU
decomposeBitTestICmp() currently returns the result via two out
parameters plus an in-place modification of Pred. This changes it to
return an optional struct instead.
The motivation here is twofold. First, I'd like to extend this code to
handle cases where the comparison is against a value other than zero,
which would mean yet another out parameter. Second, while doing that I
was badly bitten by the in-place modification, so I'd like to get rid of
it.
InstCombine already has some rules for `icmp ptrtoint, ptrtoint` to drop
the casts and compare the source values. This change adds the same for
the reverse case with `inttoptr`.
The motivation of this patch is to fold more generalized patterns like
`icmp ult (shl nuw 16, X), 64 -> icmp ult X, 2`.
Alive2: https://alive2.llvm.org/ce/z/gyqjQH
alive2: ~~https://alive2.llvm.org/ce/z/Ag3Ki7~~https://alive2.llvm.org/ce/z/ywP5t2
related: #76438
This patch adds the following foldings: `floor(x) <= x --> true` and `x
<= ceil(x) --> true`. We leverage the properties of these math functions
and ensure there is no floating point input of `nan`.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
When folding an icmp into a select, treat an icmp of a constant with a
one-use ucmp/scmp intrinsic as a simplification. These comparisons will
reduce down to an icmp.
This addresses a regression seen in Rust and also in llvm-opt-benchmark.
Given an unsigned integer comparison of `add nsw X, C1` with some
constant `C2` we can fold it into a signed comparison of `X` and `C2 -
C1` under the following conditions:
* There's a `nsw` flag on the addition
* `C2` is non-negative
* `X + C1` is non-negative
* `C2 - C1` is non-negative
Implement a new transformation that fold the bit-testing expression
(icmp ne (and (lshr V B) 1) 0) to (icmp ne (and V (shl 1 B)) 0) for
constant V. This rule already existed for non-constant V and constants
other than 1; this restriction to non-constant V has been added in
commit c3b2111d975a39d19f0c5d635e2961a4449c5a71 to fix an infinite loop.
Avoid the infinite loop by allowing constant V only if the shift
instruction is an lshr and the constant is 1. Also fold the negated
variant of the LHS.
This transformation necessitates an adaption of existing tests in
`icmp-and-shift.ll` and `load-cmp.ll`. One test in `icmp-and-shift.ll`,
which previously was a negative test, now gets folded. Rename it to
indicate that it is a positive test.
Alive proof: https://alive2.llvm.org/ce/z/vcJJTx
Relates to issue #86813.
Proof (Please run alive-tv with larger smt-to):
https://alive2.llvm.org/ce/z/-aqixk
FMF propagation: https://alive2.llvm.org/ce/z/zyKK_p
```
sqrt(X) < 0.0 --> false
sqrt(X) u>= 0.0 --> true
sqrt(X) u< 0.0 --> X u< 0.0
sqrt(X) u<= 0.0 --> X u<= 0.0
sqrt(X) > 0.0 --> X > 0.0
sqrt(X) >= 0.0 --> X >= 0.0
sqrt(X) == 0.0 --> X == 0.0
sqrt(X) u!= 0.0 --> X u!= 0.0
sqrt(X) <= 0.0 --> X == 0.0
sqrt(X) u> 0.0 --> X u!= 0.0
sqrt(X) u== 0.0 --> X u<= 0.0
sqrt(X) != 0.0 --> X > 0.0
!isnan(sqrt(X)) --> X >= 0.0
isnan(sqrt(X)) --> X u< 0.0
```
In most cases, `sqrt` cannot be eliminated since it has multiple uses.
But this patch will break data dependencies and allow optimizer to sink
expensive `sqrt` calls into successor blocks.
This will enable calling SimplifyDemandedBits() with a SimplifyQuery
that has CondContext set in the future.
Additionally this also marginally strengthens the analysis by
retaining the original context instruction for one-use chains.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
This patch handles various cases where an operation of the kind `icmp
(ucmp/scmp x, y), constant` folds to `icmp x, y`. Another patch with
cases where this operation folds to a constant (i.e. dumb cases like
`icmp eq (cmp x, y), 4` should be published in a couple of days.
I wasn't sure what negative tests should be added here, if any are
necessary at all. I'd love to hear your suggestions.
Proofs (ucmp): https://alive2.llvm.org/ce/z/qQ7ihz
Proofs (scmp): https://alive2.llvm.org/ce/z/cipKEn
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Two ways to relaxed:
1) Only require one of the `and` ops to be single-use if both `X`
and `Y` are constant.
2) Special case, if `X ^ Y` is a negative power of 2, then
`Z ^ NegP2 ==/!= 0` will fold to `Z u</u>= -NegP2` which
creates no additional instructions so fold irrelivant of
use counts.
Closes#94867
The fold will replace 2 uses of `Y` we should also do fold if `Y` has
2 uses (not only oneuse).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D159062
We currently do:
`(icmp eq/ne (and X, Y), Y)` -> `(icmp eq/ne (and ~X, Y), 0)`
if `X` is constant. We can make this more general and do it if `X` is
freely invertable (i.e say `X = ~Z`).
As well, we can also do:
`(icmp eq/ne (and X, Y), Y)` -> `(icmp eq/ne (or X, ~Y), -1)`
If `Y` is freely invertible.
Proofs: https://alive2.llvm.org/ce/z/yeWH3E
Differential Revision: https://reviews.llvm.org/D159059Closes#84688
This patch extends the transform `(icmp pred iM (shl iM %v, N), C) ->
(icmp pred i(M-N) (trunc %v iM to i(M-N)), (trunc (C>>N))` to handle
icmps with the flipped strictness of predicate.
See the following case:
```
icmp ult i64 (shl X, 32), 8589934593 ->
icmp ule i64 (shl X, 32), 8589934592 ->
icmp ule i32 (trunc X, i32), 2 ->
icmp ult i32 (trunc X, i32), 3
```
Fixes the regression introduced by
https://github.com/llvm/llvm-project/pull/86111#issuecomment-2098203152.
Alive2 proofs: https://alive2.llvm.org/ce/z/-sp5n3
`nuw` cannot be propagated as we always use `ashr` here. I don't see the
value of fixing this (see the test `test_icmp_shl_nuw`).
This is valid as long as the sign of the wrap flag doesn't differ from
the sign of the `pred`.
Proofs: https://alive2.llvm.org/ce/z/35NsrR
NB: The online Alive2 hasn't been updated with `trunc nuw/nsw`
support, so the proofs must be reproduced locally.
Closes#87935
We have flexibility in what constant to use when combining an `ashr
exact` with a slt or ult of a constant, and it's not possible to revisit
this decision later in the compilation pipeline after the `ashr exact`
is removed. Keeping a constant close to power-of-2 (pow2val + 1) should
be no worse than neutral, and in some cases may allow better codegen
later on for targets that can more cheaply generate power of 2 (which
may be selectable if converting back to setle/setge) or near power of 2
constants.
Alive2 proofs:
<https://alive2.llvm.org/ce/z/2BmPnq> and
<https://alive2.llvm.org/ce/z/DtuhnR>
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where the either the fold is
guaranteed to succeed, or the code doesn't use the resulting compare if
we fail to fold.
There is no real motivation for this change other than to highlight a
case where the new `Checked` matcher API can handle non-splat-vecs
without increasing code complexity.
Closes#85676
Convert the existing foldICmpTruncWithTruncOrExt() fold to work with
trunc nowrap flags instead of computeKnownBits(). This also allows us to
generalize the fold to work with signed comparisons.
Interestingly, apart from the obvious combinations like signed
predicates with trunc nsw, some non-obvious ones are also legal. For
example for unsigned predicates we can do the transform for two trunc
nsw as well (rather than only trunc nuw).
Proofs: https://alive2.llvm.org/ce/z/ndewwK
This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
from the experimental namespace.
All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.