We previously did this iff the inner `(shl/lshr -1, x)` was
one-use. No instructions are added even if the inner `(shl/lshr -1,
x)` is multi-use and this canonicalization both makes the resulting
instruction easier to analyze and shrinks its dependency chain.
Closes#81576
If we weren't shifting out any non-zero bits or changing the sign before the transform, we
shouldn't be after.
Alive2: https://alive2.llvm.org/ce/z/mB-rWz
This patch passes `SimplifyQuery` to `computeKnownBits` directly in
`InstSimplify` and `InstCombine`.
As the `DomConditionCache` in #73662 is only used in `InstCombine`, it
is inconvenient to introduce a new argument `DC` to `computeKnownBits`.
This patch adds exact flags for sext/zext idiom `shr (shl X, Y), Y`.
Alive2: https://alive2.llvm.org/ce/z/xYFpfB
We can generalize it to handle pattern `shr (shl X, Y), Z` with `Y u>=
Z` (e.g., non-splat vectors). But I don't think it's worth the effort.
This missed optimization is discovered with the help of
https://github.com/AliveToolkit/alive2/pull/962.
Use the constant folding API instead. In the second case using
IR builder should also work, but the way the instructions are
created an inserted there is very unusual, so I've left it alone.
As per my proposal for how to eliminate debug intrinsics [0], for various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. These call-sites where I've changed the
spelling are those that necessary to build a stage2clang to produce an
identical binary in the coming no-debug-intrinsics mode.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
1 << (cttz X) --> -X & X
https://alive2.llvm.org/ce/z/qv3E9e
This creates an extra use of the input value, so that's generally
not preferred, but there are advantages to this direction:
1. 'negate' and 'and' allow for better analysis than 'cttz'.
2. This is more likely to induce follow-on transforms (in the
example from issue #60801, we'll get the decrement pattern).
3. The more basic ALU ops are more likely to result in better
codegen across a variety of targets.
This won't solve the motivating bugs (see issue #60799) because
we do not recognize the redundant icmp+sel, and the x86 backend
may not have the pattern-matching to produce the optimal BMI
instructions.
Differential Revision: https://reviews.llvm.org/D144329
We can match and capture in one statement. Also, make the
code more closely resemble the description comment by using
the constant name of an operand value.
Tries to perform
(lshr (add (zext X), (zext Y)), K)
-> (icmp ult (add X, Y), X)
where
- The add's operands are zexts from a K-bits integer to a bigger type.
- The add is only used by the shr, or by iK (or narrower) truncates.
- The lshr type has more than 2 bits (other types are boolean math).
- K > 1
This seems to be a pattern that just comes from OpenCL front-ends, so adding DAG/GISel combines doesn't seem to be worth the complexity.
Original patch D107552 by @abinavpp - adapted to use (a + b < a) instead of uaddo following discussion on the review.
See this issue https://github.com/RadeonOpenCompute/ROCm/issues/488
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D138814
Follow-up to:
6c39a3aae1dc
That converted a pattern with ashr directly to icmp+zext, and
this updates the pattern that we used to convert to.
This canonicalizes to icmp for better analysis in the minimum case
and shortens patterns where the source type is not the same as dest type:
https://alive2.llvm.org/ce/z/tpXJ64https://alive2.llvm.org/ce/z/dQ405O
This requires an adjustment to an icmp transform to avoid infinite looping.
This is a disguised sign-bit test with offset:
(X / +DivC) >> (Width - 1) --> ext (X <= -DivC)
(X / -DivC) >> (Width - 1) --> ext (X >= +DivC)
https://alive2.llvm.org/ce/z/cO8JO4
We don't match/test poison in the sdiv constant because
that would be immediate undefined behavior.
If we are right shifting a multiply by a negated power of 2 where
the power of 2 is the same as the shift amount, we can replace with
a negate followed by an And.
New tests have not been committed yet but the patch shows the diffs.
Let me know if you want any changes or additional tests.
Differential Revision: https://reviews.llvm.org/D130103
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
(C2 >> X) >> C1 --> (C2 >> C1) >> X
The shift-left form of this transform has existed since:
16f18ed7b555bce5163
...but it applies to matching shift right opcodes too:
https://alive2.llvm.org/ce/z/c5eQms
The restriction goes back to:
16f18ed7b555bce51
...but the fold only replaces a shift with a shift, so that's not necessary.
Generalizing to other opcodes is planned as a follow-up.
This reverts commit 3794cc0e996481e10307b67c8436aa44e0d65d22.
This change is suspected of causing bots to hang at stage 2
compiles, so reverting to confirm and investigate.
The existing transform was wrong in 3 ways:
1. It created an extra instruction when the source and dest types don't match.
2. It did not account for an extra use of the icmp, so could create 2 extra insts.
3. It favored bit hacks over icmp (icmp generally has better analysis).
This fixes#54692 (modeled by the PhaseOrdering tests).
This is a minimal step to fix the bug, but we should likely invert
the sibling transform for the "is negative" pattern too.
The backend should be able to invert this back to a shift if that
leads to better codegen.
The first attempt at this missed a check to make sure the offset
constant was in range and caused many bot failures.
That was missed in the Alive2 proof because on overshift creates
poison rather than the assert from APInt. Here's an alternate
attempt at a proof using count-trailing-zeros:
https://alive2.llvm.org/ce/z/pnXQYR
Original commit message:
This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01e6
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99
This reverts commit 5819f4a422865fc9a8ea4dc772769e14010ff6a7.
This caused bots to fail with a crash/assert during the fold,
so some constraint was missed.
This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01e6
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99Fixes#54890
This is not expected to have a functional difference as discussed in the
post-commit comments for 8a9c70fc01e6. All of the motivating tests for
the older fold still optimize as expected because other code can infer
the 'nuw'.
With 'nuw' we can convert the increment of the shift amount
into a pre-shift (constant fold) of the shifted constant:
https://alive2.llvm.org/ce/z/FkTyR2
Fixes issue #41976
The test is already simplified, and I'm not sure how
to write a test to exercise the new clause. But it
protects the 2-bit pattern from miscompiling as noted
in D123453.
https://alive2.llvm.org/ce/z/QPyVfv
(If we managed to fall into the mul transform, it
would wrongly create a zero on this pattern.)