Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as the many callers that previously passed
0 explicitly have been updated to omit the argument altogether.
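A simplified sketch of the before/after signature shape (include paths and the exact parameter set are illustrative; the real ValueTracking overloads carry more context arguments):

```cpp
#include "llvm/Analysis/SimplifyQuery.h" // include paths illustrative
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Before: Depth sits mid-signature, so callers must spell out 0.
KnownBits computeKnownBits(const Value *V, unsigned Depth,
                           const SimplifyQuery &Q);

// After: Depth is uniformly last and defaulted, so most call-sites
// shrink to computeKnownBits(V, Q) and removing the parameter later
// becomes mechanical.
KnownBits computeKnownBits(const Value *V, const SimplifyQuery &Q,
                           unsigned Depth = 0);
```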
Infer the 'exact' flag on an 'lshr' or 'ashr' instruction when the shift
amount is computed via a 'cttz' intrinsic on the same operand.
Proof: https://alive2.llvm.org/ce/z/CQR2PG
Fixes #131444.
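A quick way to see why `exact` is safe here, as a brute-force C++ sanity sketch standing in for the IR pattern:

```cpp
#include <cassert>
#include <cstdint>

// cttz(x) counts x's trailing zero bits, so x >> cttz(x) discards
// only zeros -- exactly the precondition the 'exact' flag asserts.
int main() {
  for (uint32_t x = 1; x <= 0xFFFF; ++x) {
    unsigned tz = __builtin_ctz(x);       // shift amount from cttz
    assert((x & ((1u << tz) - 1u)) == 0); // bits the lshr would drop
  }
}
```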
#121386 introduced cttz intrinsics, which caused a regression where
vscale/vscale divisions could no longer be constant folded.
This fold was suggested as a fix in #126411.
https://alive2.llvm.org/ce/z/gWbtPw
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; however, to ensure some type safety, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
moveBefore, but not before adding concise documentation of what
considerations are needed (very few).
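An illustrative call-site update (the helper name is hypothetical):

```cpp
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Hypothetical helper: hoist ToMove to just before InsertPt. The
// iterator flavour lets BasicBlock::iterator carry its debug-info bit
// into the insertion position; the pointer flavour cannot.
void moveDirectlyBefore(Instruction *ToMove, Instruction *InsertPt) {
  // Before: ToMove->moveBefore(InsertPt);
  ToMove->moveBefore(InsertPt->getIterator());
}
```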
The following pattern: `(C2 << X) << C1` will usually be transformed
into `(C2 << C1) << X`, essentially swapping `X` and `C1`.
However, this should only be done when `C1` is an immediate constant,
otherwise this can lead to both constants being swapped forever.
This fixes #118798.
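A brute-force check of the constant case in C++ (uint8_t stands in for i8; constants chosen arbitrarily):

```cpp
#include <cassert>
#include <cstdint>

// (C2 << X) << C1 == (C2 << C1) << X: both compute C2 * 2^(X+C1)
// modulo 2^8, e.g. with C2 = 2 and C1 = 3.
int main() {
  const unsigned C2 = 2, C1 = 3;
  for (unsigned x = 0; x < 8; ++x) {
    uint8_t lhs = (uint8_t)((uint8_t)(C2 << x) << C1);
    uint8_t rhs = (uint8_t)((uint8_t)(C2 << C1) << x);
    assert(lhs == rhs);
  }
}
```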
Closes #94737
alive2: https://alive2.llvm.org/ce/z/WF_7mX
In this patch, we combine `(X + Y) / 2` into `(X & Y)` only when both X
and Y are less than or equal to 1.
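All four cases, spelled out as a small C++ check (unsigned stands in for the IR type):

```cpp
#include <cassert>

// With X and Y each 0 or 1, (X + Y) / 2 is 1 only when both are 1,
// which is exactly X & Y.
int main() {
  for (unsigned x = 0; x <= 1; ++x)
    for (unsigned y = 0; y <= 1; ++y)
      assert((x + y) / 2 == (x & y));
}
```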
This is the same fold as ((X << nuw Z) sub nuw Y) >>u exact Z --> X sub
nuw (Y >>u exact Z), which we already have in the codebase, but with the
sub operands swapped.
Alive2 Proof:
https://alive2.llvm.org/ce/z/pT-RxG
A proof that it also works in reverse, suggesting a wider generalization of this pattern: https://alive2.llvm.org/ce/z/2cRcdx
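A brute-force sketch of the swapped form over i8, skipping inputs where the nuw/exact preconditions would not hold (the IR source would be poison there anyway):

```cpp
#include <cassert>

// (Y sub nuw (X << nuw Z)) >>u exact Z --> (Y >>u exact Z) sub nuw X
int main() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned z = 0; z < 8; ++z) {
        unsigned shl = (x << z) & 0xFF;
        if ((shl >> z) != x) continue;        // shl would not be nuw
        if (y < shl) continue;                // sub would not be nuw
        unsigned sub = y - shl;
        if (sub & ((1u << z) - 1u)) continue; // lshr would not be exact
        assert((sub >> z) == (y >> z) - x);   // matches the swapped RHS
      }
}
```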
These are already handled by canEvaluateShifted/getShiftedValue (one-use
only), and also in reassociateShiftAmtsOfTwoSameDirectionShifts (also
multi-use), so let's at least get rid of the *third* implementation...
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
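A sketch of the resulting behaviour (the setup and helper name are illustrative; `getSplatValue(bool AllowPoison)` is the renamed API):

```cpp
#include "llvm/IR/Constants.h"
using namespace llvm;

// <C, poison, C, C> is still treated as a splat of C, while
// <C, undef, C, C> no longer is.
Constant *splatOrNull(Constant *C, Type *EltTy, bool HoleIsPoison) {
  Constant *Hole = HoleIsPoison
                       ? static_cast<Constant *>(PoisonValue::get(EltTy))
                       : static_cast<Constant *>(UndefValue::get(EltTy));
  Constant *Vec = ConstantVector::get({C, Hole, C, C});
  return Vec->getSplatValue(/*AllowPoison=*/true); // C or nullptr
}
```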
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because it works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
We previously did this only if the inner `(shl/lshr -1, x)` was
one-use. However, no instructions are added even if the inner
`(shl/lshr -1, x)` is multi-use, and this canonicalization both makes
the resulting instruction easier to analyze and shrinks its dependency
chain.
Closes #81576
If we weren't shifting out any non-zero bits or changing the sign before
the transform, we shouldn't be doing so after it.
Alive2: https://alive2.llvm.org/ce/z/mB-rWz
This patch passes `SimplifyQuery` to `computeKnownBits` directly in
`InstSimplify` and `InstCombine`.
As the `DomConditionCache` in #73662 is only used in `InstCombine`, it
is inconvenient to introduce a new argument `DC` to `computeKnownBits`.
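A sketch of a call-site after the change (include paths and wrapper name illustrative; a later cleanup moved Depth to the last position):

```cpp
#include "llvm/Analysis/SimplifyQuery.h" // include paths illustrative
#include "llvm/Analysis/ValueTracking.h"
using namespace llvm;

// The whole query object travels together, so InstCombine-only state
// like the DomConditionCache rides along inside Q instead of becoming
// yet another parameter next to AC, CxtI and DT.
KnownBits knownBitsOf(const Value *V, const SimplifyQuery &Q,
                      unsigned Depth) {
  return computeKnownBits(V, Depth, Q); // argument order as of this change
}
```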
This patch adds exact flags for the sext/zext idiom `shr (shl X, Y), Y`.
Alive2: https://alive2.llvm.org/ce/z/xYFpfB
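Why the flag is always valid here, as a small exhaustive C++ check over i8:

```cpp
#include <cassert>
#include <cstdint>

// shl X, Y zero-fills the low Y bits, so a following lshr/ashr by Y
// can never shift out a set bit -- the definition of 'exact'.
int main() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 8; ++y) {
      uint8_t shl = (uint8_t)(x << y);
      assert((shl & ((1u << y) - 1u)) == 0);
    }
}
```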
We can generalize it to handle pattern `shr (shl X, Y), Z` with `Y u>=
Z` (e.g., non-splat vectors). But I don't think it's worth the effort.
This missed optimization is discovered with the help of
https://github.com/AliveToolkit/alive2/pull/962.
Use the constant folding API instead. In the second case using
IR builder should also work, but the way the instructions are
created and inserted there is very unusual, so I've left it alone.
As per my proposal for how to eliminate debug intrinsics [0], for various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. These call-sites where I've changed the
spelling are those necessary to build a stage2 clang to produce an
identical binary in the coming no-debug-intrinsics mode.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
1 << (cttz X) --> -X & X
https://alive2.llvm.org/ce/z/qv3E9e
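Both sides isolate the lowest set bit; a quick C++ check of the equivalence:

```cpp
#include <cassert>
#include <cstdint>

// 1 << cttz(x) indexes the lowest set bit; -x & x keeps it via
// two's-complement negation. Checked over 16-bit values (x != 0).
int main() {
  for (uint32_t x = 1; x <= 0xFFFF; ++x)
    assert((1u << __builtin_ctz(x)) == ((0u - x) & x));
}
```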
This creates an extra use of the input value, so that's generally
not preferred, but there are advantages to this direction:
1. 'negate' and 'and' allow for better analysis than 'cttz'.
2. This is more likely to induce follow-on transforms (in the
example from issue #60801, we'll get the decrement pattern).
3. The more basic ALU ops are more likely to result in better
codegen across a variety of targets.
This won't solve the motivating bugs (see issue #60799) because
we do not recognize the redundant icmp+sel, and the x86 backend
may not have the pattern-matching to produce the optimal BMI
instructions.
Differential Revision: https://reviews.llvm.org/D144329
We can match and capture in one statement. Also, make the
code more closely resemble the description comment by using
the constant name of an operand value.
Tries to perform
(lshr (add (zext X), (zext Y)), K)
-> (icmp ult (add X, Y), X)
where
- The add's operands are zexts from a K-bit integer to a bigger type.
- The add is only used by the shr, or by iK (or narrower) truncates.
- The lshr type has more than 2 bits (other types are boolean math).
- K > 1
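For K = 8, for instance, a brute-force check that the high half of the widened add is exactly the narrow add's carry:

```cpp
#include <cassert>
#include <cstdint>

// (lshr (add (zext X), (zext Y)), 8) == zext (icmp ult (add X, Y), X)
int main() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      unsigned src = ((x + y) & 0xFFFF) >> 8; // widened add, lshr form
      uint8_t narrow = (uint8_t)(x + y);      // i8 add
      unsigned tgt = narrow < x ? 1 : 0;      // icmp ult form
      assert(src == tgt);
    }
}
```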
This seems to be a pattern that just comes from OpenCL front-ends, so adding DAG/GISel combines doesn't seem to be worth the complexity.
Original patch D107552 by @abinavpp - adapted to use (a + b < a) instead of uaddo following discussion on the review.
See this issue https://github.com/RadeonOpenCompute/ROCm/issues/488
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D138814
Follow-up to:
6c39a3aae1dc
That converted a pattern with ashr directly to icmp+zext, and
this updates the pattern that we used to convert to.
This canonicalizes to icmp for better analysis in the minimum case
and shortens patterns where the source type is not the same as dest type:
https://alive2.llvm.org/ce/z/tpXJ64
https://alive2.llvm.org/ce/z/dQ405O
This requires an adjustment to an icmp transform to avoid infinite looping.