With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
We had a separate fold that handled just the trivial case where we're
replacing exactly the argument of the select. Handle this in select
value equivalence by relaxing the infinite loop protection to allow a
replacement of a non-constant with a constant.
This also fixes https://github.com/llvm/llvm-project/issues/113301, as
the separate fold did not handle undef values correctly.
See the following case:
```
define i1 @test_logical_and_icmp_samesign(i8 %x) {
%cmp1 = icmp ne i8 %x, 9
%cmp2 = icmp samesign ult i8 %x, 11
%and = select i1 %cmp1, i1 %cmp2, i1 false
ret i1 %and
}
```
Currently we cannot convert this logical and into a bitwise and due to
the `samesign` flag. But if `%cmp2` evaluates to `poison`, we can infer
that `%cmp1` is either `poison` or `true` (`samesign` violation
indicates that X is negative). Therefore, `%and` still evaluates to
`poison`.
This patch converts a logical and into a bitwise and iff TV is poison
implies that Cond is either poison or true. Likewise, we convert a
logical or into a bitwise or iff FV is poison implies that Cond is
either poison or false.
Note:
1. This logic is implemented in InstCombine. Not sure whether it is
profitable to move it into ValueTracking and call `impliesPoison(TV/FV,
Sel)` instead.
2. We only handle the case that `ValAssumedPoison` is `icmp samesign
pred X, C1` and `V` is `icmp pred X, C2`. There are no suitable variants
for `isImpliedCondition` to pass the fact that X is [non-]negative.
Alive2: https://alive2.llvm.org/ce/z/eorFfa
Motivation: fix [a major
regression](https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1724#discussion_r1849663863)
to unblock https://github.com/llvm/llvm-project/pull/112742.
Since cd16b07 (IR: introduce CmpInst::isEquivalence), there is now an
isEquivalence routine in CmpInst that we can use to determine
equivalence in foldSelectValueEquivalence. Implement this, extending it
to include floating-point equivalences as well.
Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
Add a helper for shared folds between logical and bitwise and/or
and move the and/or of icmp and fcmp folds in there. This makes
it easier to extend to more folds.
A possible extension would be to base the current and/or of icmp
reassociation logic on this helper, so that it for example also
applies to fcmp.
InstCombinerImpl::foldSelectInstWithICmp has some inlined code for
select-icmp-xor simplification, but this simplification is already done
by other code, via another path:
(X & Y) == 0 ? X : X ^ Y ->
((X & Y) == 0 ? 0 : Y) ^ X ->
(X & Y) ^ X ->
X & ~Y
Cover the cases that it claims to simplify, and demonstrate that
stripping it doesn't cause test changes.
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The ordering of the
operands in both the fcmp and select instructions is important for the
folding to occur.
maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult}
The second pattern is supposed to make the order of the operands in the
select instruction irrelevant. However, the pattern matching code uses
the CmpInst::getInversePredicate method to invert the comparison
predicate. This method doesn't take into account the fast-math flags,
which can lead missing the folding opportunity.
The patch extends the pattern matching code to handle unordered fcmp
instructions. This allows the folding to occur even when the select
instruction has the operands in the inverse order.
New maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt}
The same changes are applied to the minnum intrinsic.
foldSelectEquivalence currently doesn't support GVN-like replacements on
vector types. Put in the checks for potentially lane-crossing
operations, and lift the limitation.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This corresponds to the canonicalized form of some logic that was
seen in Swift-generated code for comparing optional pointers:
`(X==Z || Y==Z) ? (X==Z && Y==Z) : X==Y --> X==Y`
where `Z` was the constant `0`.
https://alive2.llvm.org/ce/z/J_3aa9
decomposeBitTestICmp() currently returns the result via two out
parameters plus an in-place modification of Pred. This changes it to
return an optional struct instead.
The motivation here is twofold. First, I'd like to extend this code to
handle cases where the comparison is against a value other than zero,
which would mean yet another out parameter. Second, while doing that I
was badly bitten by the in-place modification, so I'd like to get rid of
it.
The foldOpIntoPhi() transforms requires all operands to be
phi-translatable. This can be the case either because they are phi nodes
in the same block, or because the operand dominates the block.
Currently, most callers of foldOpIntoPhi() satisfy this pre-condition by
requiring a constant operand, which trivially dominates everything. Only
selects had handling for variable operands.
Move this logic into foldOpIntoPhi(), so things are handled correctly if
other callers are generalized. Also make the implementation a bit more
general by querying the dominator tree.
This patch updates the select operand when the cond has the nuw or nsw
property. Considering the semantics of the nuw and nsw flag, if there is
no poison value in this expression, this code assumes that X can only be
0, 1 or -1.
close: #96765
alive2: https://alive2.llvm.org/ce/z/3n3n2Q
This patch expands already existing funcionality to include these two
additional folds, which are nearly identical to the ones already
implemented.
Proofs: https://alive2.llvm.org/ce/z/Xy7s4j
This patch adds the aforementioned fold to InstCombine. This pattern is
produced after naive implementations of 3-way comparison in high-level
languages are transformed into LLVM IR and then optimized.
Proofs: https://alive2.llvm.org/ce/z/w4QLq_
LLVM is not evaluating X u > C, a, b the same way it evaluates X <= C, b, a.
To fix this, let's move the folds to after the canonicalization of -1 to TrueVal.
Let's allow splat vectors with poison elements to be recognized too!
Finally, for completion, handle the one case that isn't caught by the above checks because it is canonicalized to eq:
X == -1 ? -1 : X + 1 -> uadd.sat(X, 1)
Alive2 Proof:
https://alive2.llvm.org/ce/z/WEcgYH
It is already handled in a different method, especially as a - UMIN(a,
b) cannot be handled by a select statement, unless it means something
like: "(c < b) ? b - ((b > c) ? c : b) : 0;" but LLVM handles that case
as well.
Consider the following case:
```
%cmp = icmp eq ptr %p, null
%load = load i32, ptr %p, align 4
%sel = select i1 %cmp, i32 %load, i32 0
```
`foldSelectValueEquivalence` converts `load i32, ptr %p, align 4` into
`load i32, ptr null, align 4`, which causes immediate UB. `%load` is
speculatable, but it doesn't hold after operand substitution.
This patch introduces a new helper
`isSafeToSpeculativelyExecuteWithVariableReplaced`. It ignores operand
info in these instructions since their operands will be replaced later.
Fixes#99436.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
This is not safe if the simplification ends up looking through
lane-crossing operations. For now, we don't have a good way to
limit this in computeKnownBits(), so just disable vector handling
entirely.
Fixes https://github.com/llvm/llvm-project/issues/97475.
Disable conversion of a `(select (icmp))` when it would result in more
instructions `(xor (lshr (and)))`. This transformation produces more
instructions and can interfere with other more profitable folds for
`select`. For example before this change the following folding would
occur:
```llvm
%1 = icmp slt i32 %X, 0
%2 = select i1 %1, i64 0, i64 8
```
to
```llvm
%1 = lshr i32 %X, 28
%2 = and i32 %1, 8
%3 = xor i32 %2, 8
%4 = zext nneg i32 %3 to i64
```
Simplify the arms of a select based on the KnownBits implied by its condition.
For now this only handles the case where the select arm folds to a constant,
but this can be generalized to handle other patterns by using
SimplifyDemandedBits instead (in that case we would also have to limit to
non-undef conditions).
This is implemented by adding a new member to SimplifyQuery that can be used
to inject an additional condition. The affected values are pre-computed and
we don't call computeKnownBits() if the select arms don't contain affected
values. This reduces the cost in some pathological cases.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.