`InstCombine` uses `Worklist` to manage change history. `setOperand`,
which was previously used to change the `Select` Instruction, does not,
so it is `run` twice, which causes an `LLVM ERROR`.
This problem is resolved by changing `setOperand` to `replaceOperand` as
the change history will be registered in the Worklist.
Fixes#77553.
This patch does the following folds:
```
(select A && B, T, F) -> (select A, (select B, T, F), F)
(select A || B, T, F) -> (select A, T, (select B, T, F))
```
if `(select B, T, F)` can be folded into a value or a canonicalized SPF.
Alive2: https://alive2.llvm.org/ce/z/4Bdrbu
The original motivation of this patch is to simplify the following
pattern:
```
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%cmp9.i = icmp ugt i64 %add.i, 1152921504606846975
%or.cond.i = or i1 %cmp7.i, %cmp9.i
%cond.i = select i1 %or.cond.i, i64 1152921504606846975, i64 %add.i
->
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%max = call i64 @llvm.umax.i64(i64 %add.i, 1152921504606846975)
%cond.i = select i1 %cmp7.i, i64 1152921504606846975, i64 %max
```
The later form has a better codegen for some backends. It is also more
analysis-friendly than the original one.
Godbolt: https://godbolt.org/z/eK6eb5jf1
Alive2: https://alive2.llvm.org/ce/z/VHlxL2
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7c71d3996a72b9b024622f23bf556539b961c88c&to=638ce8666fadaca1ab2639a3c2bc52a4a8508f40&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.02%|-0.00%|+0.02%|-0.03%|-0.00%|-0.05%|-0.00%|
It is an alternative to #76203 and #76363 because we can simplify
`select (icmp eq/ne a, b), a, b` into `b` or `a`.
Fixes#75784.
Fixes#76043.
Thank @XChy for providing additional tests.
Co-authored-by: XChy <xxs_chy@outlook.com>
In line with updated shufflevector semantics, this represents the
poison elements rather than undef elements now. This commit is a
pure rename, without any logic changes.
This is nearly an NFC, the only change is potentially to order that
values are created/names.
Otherwise it is a slight speed boost/simplification to avoid having to
go through the `getFreelyInverted` recursive logic twice to simplify
the extra `not` op.
The bug is that `IsAndVariant` is used to assume which arm in the
select the output `SelInner` should be placed but match the inner
select condition with `m_c_LogicalOp`. With fully simplified ops, this
works fine, but its possible if the select condition is not
simplified, for it match both `LogicalAnd` and `LogicalOr` i.e `select
true, true, false`.
In PR71330 for example, the issue occurs in the following IR:
```
define i32 @bad() {
%..i.i = select i1 false, i32 0, i32 3
%brmerge = select i1 true, i1 true, i1 false
%not.cmp.i.i.not = xor i1 true, true
%.mux = zext i1 %not.cmp.i.i.not to i32
%retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i
ret i32 %retval.0.i.i
}
```
When simplifying:
```
%retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i
```
We end up matching `%brmerge` as `LogicalAnd` for `IsAndVariant`, but
the inner select (`%..i.i`) condition which is `false` with
`LogicalOr`.
Closes#71489
This reverts commit b5743d4798b250506965e07ebab806a3c2d767cc.
This causes some minor compile-time impact. Revert for now, better
to do the change more gradually.
Instead of unsetting flags on the instruction, attempting the
fold, and the resetting the flags if it failed, add support to
simplifyWithOpReplaced() to ignore poison-generating flags/metadata
and collect all instructions where they may need to be dropped.
This allows us to perform the fold a) with poison-generating
metadata, which was previously not handled and b) poison-generating
flags/metadata that are not on the root instruction.
Proof for the ctpop case: https://alive2.llvm.org/ce/z/3H3HFs
Fixes https://github.com/llvm/llvm-project/issues/62450.
Extend folding for `2^n` euclidean division remainder operations
on signed integers by handling the specific instance in which one
`select` arm has already been replaced by 1.
Reported-By: HypheX
Fixes: https://github.com/llvm/llvm-project/issues/66417.
As per my proposal for how to eliminate debug intrinsics [0], for various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. These call-sites where I've changed the
spelling are those that necessary to build a stage2clang to produce an
identical binary in the coming no-debug-intrinsics mode.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
Fold select (fcmp oeq x, 0), (fmul x, y), x => x
This cleans up a pattern left behind by denormal range checks under
denormals are zero.
The pattern starts out as something like:
x = x < smallest_normal ? x * K : x;
The comparison folds to an == 0 when the denormal mode treats input
denormals as zero. This makes library denormal checks free after
linked into DAZ enabled code.
alive2 is mostly happy with this, but there are some issues. First,
there are many reported failures in some of the negative tests that
happen to trigger some preexisting canonicalize introducing
combine. Second, alive2 is incorrectly asserting that denormals must
be flushed with the DAZ modes. It's allowed to drop a canonicalize.
https://reviews.llvm.org/D157030
Was missing a nullptr check before derefencing. Fixed + test case
included in the patch.
Re-Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148414
This just expands on the existing logic that worked for `Or` and
applies it to any binop where `0` is the identity value on the RHS
i.e: `add`, `or`, `xor`, `shl`, etc...
Proofs For Some: https://alive2.llvm.org/ce/z/XZo6JD
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148414
This is essentially NFC as the cases `decomposeBitTestICmp` covers
that weren't already covered explicitly, will be canonicalized into
the cases explicitly covered. As well the unsigned cases don't apply
as the `Mask` is not a power of 2.
That being said, using a well established helper is less bug prone and
if some canonicalization changes, will prevent regressions here.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148744
AFAICT, the trunc is not needed for correctness/performance and just
blocks what should be handlable cases.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148413
There was just alot of boolean logic to propegate conditions that seem
clearer in conditions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148412
Simplify a pattern that may show up when computing
the remainder of euclidean division. Particularly,
when the divisor is a power of two and never negative,
the signed remainder can be folded with a bitwise and.
Fixes 64305.
Proofs: https://alive2.llvm.org/ce/z/9_KG6c
Differential Revision: https://reviews.llvm.org/D156811
The select arm that takes the ctlz result can also instead be a
constant with the bit width (as this is what the ctlz evaluates to
for a==0).
This avoids a regression when strengthening the
simplifyWithOpReplaced() fold.
Proof: https://alive2.llvm.org/ce/z/DMRL5A
The select base, (gep base, offset) to gep base, select (0, offset)
fold used to drop inbounds, because the gep base, 0 this introduces
might not be inbounds. After the semantics change in D154051, such
a GEP is always considered inbounds, in which allows us to preserve
the flag here.
As the PhaseOrdering test demonstrates, this can result in major
optimization improvements in some cases.
Differential Revision: https://reviews.llvm.org/D154055
This fold is buggy if the constant adjustment overflows.
Additionally, since we now canonicalize to min/max intrinsics,
the constants picked here don't actually matter, as long as SPF
still recognizes the pattern.
Fixes https://github.com/llvm/llvm-project/issues/62088.
This reverts commit 754f3ae65518331b7175d7a9b4a124523ebe6eac.
Unfortunately the change can cause regressions due to dropping flags
from instructions (like nuw,nsw,inbounds), prevent further optimizations
depending on those flags.
A simple example is the IR below, where `inbounds` is dropped with the
patch and the phase-ordering test added in 7c91d82ab912fae8b.
define i1 @test(ptr %base, i64 noundef %len, ptr %p2) {
bb:
%gep = getelementptr inbounds i32, ptr %base, i64 %len
%c.1 = icmp uge ptr %p2, %base
%c.2 = icmp ult ptr %p2, %gep
%select = select i1 %c.1, i1 %c.2, i1 false
ret i1 %select
}
For more discussion, see D149404.
This patch add a new API `impliesPoisonIgnoreFlagsOrMetadatas` which is
the same as `impliesPoison` but ignoring poison generating flags or
metadatas in the process of implying poison and recording these ignored
instructions.
In InstCombineSelect, replacing `impliesPoison` with
`impliesPoisonIgnoreFlagsOrMetadatas` to allow more patterns like
`select i1 %a, i1 %b, i1 false` to be optimized to and/or instructions
by droping the poison generating flags or metadatas.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149404
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.
Differential Revision: https://reviews.llvm.org/D149256
This reverts commit 27c4e233104ba765cd986b3f8b0dcd3a6c3a9f89.
I think I made a mistake with the use in RemoveConditionFromAssume(),
because the instruction being changed is not the current one, but
the next assume. Revert the change for now.
Same as with other replacement methods, it's generally necessary
to report a change on the instruction itself, e.g. by returning
it from the visit method (or possibly explicitly adding it to the
worklist).
Return Instruction * from replaceUse() to encourage the usual
"return replaceXYZ" pattern.
In the degenerate case where the select is fed by an unsimplified
icmp with two constant operands, don't try to replace one constant
with another. Wait for the icmp to be simplified first instead.
Fixes https://github.com/llvm/llvm-project/issues/61361.
This overlaps partially with the codegen patch D144789. This needs no-wrap
for correctness, and I'm not sure if there's an unsigned equivalent:
https://alive2.llvm.org/ce/z/ErmQ-9https://alive2.llvm.org/ce/z/mr-c_A
This is obviously an improvement in IR, and it looks like a codegen win
for all targets and data types that I sampled.
The 'nabs' case is left as a potential follow-up (and seems less likely
to occur in real code).
Differential Revision: https://reviews.llvm.org/D145073