When InstTy is a type like IntrinsicInst, which can have a variable
number of arguments, we can encounter a case where Operation has fewer
than two arguments, causing the getOperand() calls to fail.
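A minimal sketch of the kind of guard involved, with hypothetical names (illustrative only, not the upstream change):
``` cpp
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Hypothetical helper: bail out before the getOperand() calls when the
// instruction cannot supply two arguments, as can happen for variadic
// intrinsic calls.
static bool matchTwoOperands(Instruction *Operation, Value *&Op0,
                             Value *&Op1) {
  if (Operation->getNumOperands() < 2)
    return false;
  Op0 = Operation->getOperand(0);
  Op1 = Operation->getOperand(1);
  return true;
}
```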
Fixes: https://github.com/llvm/llvm-project/issues/152725.
This puts the onus on the caller to ensure the result type is big enough.
In the unlikely event a cropped result is required, explicitly truncate
a safe value.
Previously only fixed vector splats were handled. This adds support for
scalable vectors too by allowing ConstantExpr splats.
We need to add the extra V->getType()->isVectorTy() check because a
ConstantExpr might be a scalar-to-vector bitcast.
By allowing ConstantExprs this also allows fixed vector ConstantExprs to
be folded, which causes the diffs in
llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll
and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can
remove them from this PR if reviewers would prefer.
Fixes #132922
Partially fixes #130110 (keep the issue open).
If the incoming value is the PHI itself, we can skip it. If we can
guarantee that all other incoming values are neither undef nor poison,
then we can also guarantee that the PHI's value isn't either. If we
cannot guarantee that, there is no point in calculating it.
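A sketch of that rule, assuming a helper shaped like LLVM's isGuaranteedNotToBeUndefOrPoison (illustrative, not the exact upstream code):
``` cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative: a PHI is guaranteed not to be undef/poison if every
// incoming value other than the PHI itself is; a self-reference adds
// no independent value and can be skipped.
static bool phiGuaranteedNotUndefOrPoison(const PHINode *PN) {
  for (const Value *Incoming : PN->incoming_values()) {
    if (Incoming == PN)
      continue; // the PHI feeding itself contributes nothing new
    if (!isGuaranteedNotToBeUndefOrPoison(Incoming))
      return false;
  }
  return true;
}
```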
Update LangRef and code using `Dereferenceable` in assume bundles to
only use the information if it is safe at the point of use.
`Dereferenceable` in an assume bundle is only guaranteed at the point of
the assumption, but may not be guaranteed at later points, because the
pointer may have been freed.
Update code using `Dereferenceable` to only use it if the pointer cannot
be freed. This can further be refined to check if the pointer could be
freed between assume and use.
This follows up on https://github.com/llvm/llvm-project/pull/123196.
With that change, it should be safe to expose dereferenceable
assumptions more widely as in
https://github.com/llvm/llvm-project/pull/121789
PR: https://github.com/llvm/llvm-project/pull/126117
- **[ValueTracking] Add test for issue 124275**
- **[ValueTracking] Fix bug of using wrong condition for deducing
KnownBits**
Fixes https://github.com/llvm/llvm-project/issues/124275
Bug was introduced by https://github.com/llvm/llvm-project/pull/114689
Now that computeKnownBits supports breaking out of recursive Phi
nodes, `IncValue` can be an operand of a different Phi than `P`. This
breaks the assumptions we previously made when using a possible
condition at `CxtI` to constrain `IncValue`.
Create an abstraction over isImplied{True,False}ByMatchingCmp to
faithfully communicate the result of both functions, cleaning up code in
callsites. While at it, fix a bug in the implied-false version of the
function, which was inadvertently dropping samesign information.
There is a narrow special-case in isImpliedCondICmps that can benefit
from being taught about samesign. Since it costs us nothing to implement
it, teach it about samesign, for completeness. This patch marks the
completion of the effort to teach ValueTracking about samesign.
Move isImplied{True,False}ByMatchingCmp from CmpInst to ICmpInst, so
that it can operate on CmpPredicate instead of CmpInst::Predicate, and
teach it about samesign. There are two callers of this function, and we
choose to migrate the one in ValueTracking, namely
isImpliedCondMatchingOperands to CmpPredicate, hence teaching it about
samesign, with visible test impact.
isImpliedCondICmps() and its callers in ValueTracking can greatly
benefit from being taught about samesign. As a first step, teach one
caller, namely isImpliedCondOperands(). Very minimal changes are
required for this, as CmpPredicate::getMatching() does most of the work.
A signed min-max clamp is a sequence of smin and smax intrinsics that
constrains a signed value to a range: Lo <= value <= Hi, where Lo and
Hi are the clamp bounds.
The patch improves the calculation of KnownBits for a value subjected to
the signed clamping.
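A standalone illustration of the fact the patch exploits (toy example with assumed bounds, not the upstream code): after clamping to [Lo, Hi], every leading bit on which Lo and Hi agree is known in the result.
``` cpp
#include <cassert>
#include <cstdint>

int main() {
  const int8_t Lo = 32, Hi = 47; // 0b00100000 .. 0b00101111
  // Lo and Hi agree on their top four bits (0010), so the clamped
  // value must carry those bits as well: KnownBits learns 0010????.
  for (int V = -128; V <= 127; ++V) {
    int8_t C = (int8_t)V;
    if (C < Lo) C = Lo; // smax(C, Lo)
    if (C > Hi) C = Hi; // smin(C, Hi)
    assert(((uint8_t)C & 0xF0) == 0x20);
  }
  return 0;
}
```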
A urem recurrence has the property that the result can never exceed the
start value. A udiv recurrence has the property that the result can
never exceed either the start value or the numerator, whichever is
greater. Implement a simplification based on these properties.
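A standalone demonstration of both bounds (illustrative values, not the upstream code):
``` cpp
#include <cassert>
#include <cstdint>

int main() {
  // urem recurrence: x <- x % d can never exceed the start value.
  const uint32_t Start = 1000;
  const uint32_t Divisors[] = {7, 13, 255, 3};
  uint32_t X = Start;
  for (uint32_t D : Divisors) {
    X %= D;
    assert(X <= Start); // the bound the simplification relies on
  }

  // udiv recurrence with the phi as divisor: y <- n / y stays bounded
  // by max(start value, numerator).
  const uint32_t N = 360, YStart = 25;
  uint32_t Y = YStart;
  for (int I = 0; I < 4 && Y != 0; ++I) {
    Y = N / Y;
    assert(Y <= (N > YStart ? N : YStart));
  }
  return 0;
}
```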
As defined in LangRef, branching on `undef` is undefined behavior.
This PR aims to remove undefined behavior from tests, since UB tests
break Alive2 and may be the root cause of breaking future optimizations.
Here's an Alive2 proof for one of the examples:
https://alive2.llvm.org/ce/z/TncxhP
Changes are:
1) Make signed-overflow detection optimal
2) For signed-overflow, try to rule out direction even if we can't
totally rule out overflow.
3) Intersect the add/sub computed assuming no overflow with the possible
overflow clamping values, as opposed to intersecting the add/sub
computed without that assumption.
KnownBits::srem does not correctly set the leading zero bits, omitting
the fact that LHS may be known-negative or known-non-negative. Fix this.
Alive2 proof: https://alive2.llvm.org/ce/z/Ugh-Dq
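A standalone exhaustive check of the missing fact at i8 (illustration only): the remainder never grows in magnitude and keeps the dividend's sign, so a known-non-negative LHS keeps its leading zeros and a known-negative LHS keeps its leading ones.
``` cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

int main() {
  for (int L = -128; L <= 127; ++L)
    for (int R = -128; R <= 127; ++R) {
      if (R == 0 || (L == -128 && R == -1))
        continue; // these cases are UB for srem in LLVM IR
      int Rem = L % R; // C++ % truncates like LLVM's srem
      assert(std::abs(Rem) <= std::abs(L));
      assert(Rem == 0 || (Rem < 0) == (L < 0));
    }
  return 0;
}
```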
There is an underlying bug in KnownBits, and we should theoretically be
able to determine the high-bits of an srem as shown in the test, just
like urem. In preparation to fix this bug, add pre-commit tests testing
high-bits of srem and urem.
The shift-recurrence-knownbits.ll test file only covers shift
instructions while testing recurrence patterns with knownbits. Add tests
for add, sub, mul, and, and or as well, and rename the file
recurrence-knownbits.ll.
The idea behind this canonicalization is that it allows us to handle fewer
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
Add simple support for looking through a zext when doing
ComputeNumSignBits for shl. This is valid for the case when
all extended bits are shifted out, because then the number of sign
bits can be found by analysing the zext operand.
The solution here is simple as it only handles a single zext (not
passing remaining left shift amount during recursion). It could be
possible to generalize this in the future by for example passing an
'OffsetFromMSB' parameter to ComputeNumSignBitsImpl, telling it to
calculate number of sign bits starting at some offset from the most
significant bit.
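A standalone check of the soundness argument at i8/i32 (illustration, not the upstream code): `shl (zext i8 %x to i32), 24` shifts out all 24 extended bits and leaves `%x` in the top byte, so the i32 result has at least as many sign bits as `%x` has at i8.
``` cpp
#include <cassert>
#include <cstdint>

// Count leading bits equal to the sign bit, ComputeNumSignBits-style.
static int numSignBits(uint32_t V, int Width) {
  unsigned Sign = (V >> (Width - 1)) & 1;
  int N = 0;
  for (int I = Width - 1; I >= 0 && ((V >> I) & 1) == Sign; --I)
    ++N;
  return N;
}

int main() {
  for (int X = -128; X <= 127; ++X) {
    uint32_t Z = (uint8_t)X; // zext i8 %x to i32
    uint32_t S = Z << 24;    // shl i32 %z, 24: extended bits all gone
    assert(numSignBits(S, 32) >= numSignBits((uint8_t)X, 8));
  }
  return 0;
}
```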
`llvm.vector.reverse` preserves each of the elements, and thus the bits
common to all of them.
Alive2 doesn't support the intrinsic yet, but the logic seems pretty
self-evident.
Closes #99013
Add KnownBits computations to ValueTracking and X86 DAG lowering.
These instructions add/subtract adjacent vector elements in their operands. Example: phadd [X1, X2] [Y1, Y2] = [X1 + X2, Y1 + Y2]. This means that, in this example, we can compute the KnownBits of the operation by computing the KnownBits of [X1, X2] + [X1, X2] and [Y1, Y2] + [Y1, Y2] and intersecting the results. This approach also generalizes to all x86 vector types.
There are also the operations phadd.sw and phsub.sw, which perform saturating addition/subtraction. Use sadd_sat and ssub_sat to compute the KnownBits of these operations.
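A toy model of that computation (a sketch, not LLVM's KnownBits class): the known bits of a vector are the bits common to all of its elements, and adding a vector's known bits to themselves conservatively approximates each adjacent-pair sum.
``` cpp
#include <cassert>
#include <cstdint>

// Toy known-bits: masks of bits known to be 0 and known to be 1.
struct KB { uint8_t Zero, One; };

// What is known about every element of a vector: intersect the facts.
static KB intersect(KB A, KB B) {
  return {uint8_t(A.Zero & B.Zero), uint8_t(A.One & B.One)};
}

// Conservative known bits of A + B: low bits stay known for as long as
// both operands are fully known, because the carry is determined too.
static KB addKB(KB A, KB B) {
  unsigned Known = (A.Zero | A.One) & (B.Zero | B.One);
  unsigned T = 0;
  while (T < 8 && ((Known >> T) & 1))
    ++T;
  uint8_t Mask = uint8_t((1u << T) - 1);
  uint8_t Low = uint8_t((A.One + B.One) & Mask);
  return {uint8_t(~Low & Mask), Low};
}

int main() {
  // Both elements of an operand vector end in 0100; high nibble unknown.
  KB Elt = {0x0B, 0x04};        // Zero = 00001011, One = 00000100
  KB Vec = intersect(Elt, Elt); // facts common to X1 and X2
  KB Res = addKB(Vec, Vec);     // approximates the pair sum X1 + X2
  for (unsigned H1 = 0; H1 < 16; ++H1)
    for (unsigned H2 = 0; H2 < 16; ++H2) {
      uint8_t Sum = uint8_t(((H1 << 4) | 4) + ((H2 << 4) | 4));
      assert((Sum & Res.Zero) == 0 && (Sum & Res.One) == Res.One);
    }
  return 0;
}
```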
Also adjust the existing test case pr53247.ll because it can be transformed to a constant using the new KnownBits computation.
Fixes #82516.
When calling SimplifyDemandedBits (as opposed to
SimplifyDemandedInstructionBits), and there are multiple uses,
always use SimplifyMultipleUseDemandedBits and drop the special
case for root values.
This fixes the ephemeral value detection, as seen by the restored
assumes in tests. It may result in more or less simplification,
depending on whether we get more out of having demanded bits or
the ability to perform non-multi-use transforms. The change in
the phi-known-bits.ll test is because the icmp operand now gets
simplified based on demanded bits, which then prevents a different
known bits simplification later.
This also makes the code safe against future changes like
https://github.com/llvm/llvm-project/pull/97289, which add more
context that would have to be discarded for the multi-use case.
We currently do:
`(icmp eq/ne (and X, Y), Y)` -> `(icmp eq/ne (and ~X, Y), 0)`
if `X` is constant. We can make this more general and do it if `X` is
freely invertible (i.e., say `X = ~Z`).
As well, we can also do:
`(icmp eq/ne (and X, Y), Y)` -> `(icmp eq/ne (or X, ~Y), -1)`
If `Y` is freely invertible.
Proofs: https://alive2.llvm.org/ce/z/yeWH3E
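The transforms rest on the following bitwise identities; a quick exhaustive check at i8 (illustration only; the profitability condition, that `~X` or `~Y` is free, is not modeled):
``` cpp
#include <cassert>
#include <cstdint>

int main() {
  for (unsigned XI = 0; XI < 256; ++XI)
    for (unsigned YI = 0; YI < 256; ++YI) {
      uint8_t X = uint8_t(XI), Y = uint8_t(YI);
      bool Base = uint8_t(X & Y) == Y;
      // (icmp eq (and X, Y), Y)  <=>  (icmp eq (and ~X, Y), 0)
      assert(Base == (uint8_t(~X & Y) == 0));
      // (icmp eq (and X, Y), Y)  <=>  (icmp eq (or X, ~Y), -1)
      assert(Base == (uint8_t(X | ~Y) == 0xFF));
    }
  return 0;
}
```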
Differential Revision: https://reviews.llvm.org/D159059
Closes #84688
There is a missed optimization in
``` llvm
define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
  %1 = add i8 %x, -1
  %2 = tail call i8 @llvm.ctlz.i8(i8 %1, i1 true)
  %3 = lshr i8 -1, %2
  %4 = add i8 %3, 1
  %5 = icmp ugt i8 %x, 1
  %p = select i1 %5, i8 %4, i8 1
  %r = urem i8 %y, %p
  ret i8 %r
}
```
which is extracted from the Rust code
``` rust
fn func(x: usize, y: usize) -> usize {
    let z = x.next_power_of_two();
    y % z
}
```
Here `%p` (a.k.a. `z`) is semantically a power of two, so `y urem p` can
be optimized to `y & (p - 1)`. (Alive2 proof:
https://alive2.llvm.org/ce/z/H3zooY)
---
It could be generalized to recognizing `LShr(UINT_MAX, Y) + 1` as a
power-of-two, which is what this PR does.
Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc
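A quick standalone check of both facts at i32 (illustration, not the upstream matcher); note that `Y == 0` wraps the addition to zero, so strictly the value is a power of two or zero:
``` cpp
#include <cassert>
#include <cstdint>

static bool isPow2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

int main() {
  for (unsigned Y = 0; Y < 32; ++Y) {
    uint32_t P = (0xFFFFFFFFu >> Y) + 1; // LShr(UINT_MAX, Y) + 1
    assert(P == 0 || isPow2(P));         // power of two, or wraps to 0
    if (P != 0) {
      uint32_t N = 0xDEADBEEFu;
      assert(N % P == (N & (P - 1)));    // urem folds to a bitmask
    }
  }
  return 0;
}
```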
Canonicalize getelementptr instructions for scalable vector types into
ptradd representation with an explicit llvm.vscale call. This
representation has better support in BasicAA, which can reason about
llvm.vscale, but not plain scalable GEPs.
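A sketch of the shape of the canonical form, with a hypothetical helper (assumes IRBuilder's CreateIntrinsic and CreatePtrAdd; not the upstream code):
``` cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Illustrative: rewrite `gep <vscale x 4 x i32>, ptr %p, i64 %i` as an
// i8 ptradd whose offset is an explicit multiple of llvm.vscale. For
// <vscale x 4 x i32>, MinBytes would be 16 (the minimum vector size).
static Value *emitCanonicalScalableGEP(IRBuilder<> &B, Value *Ptr,
                                       Value *Idx, uint64_t MinBytes) {
  Value *VScale =
      B.CreateIntrinsic(Intrinsic::vscale, {B.getInt64Ty()}, {});
  Value *VecBytes = B.CreateMul(VScale, B.getInt64(MinBytes));
  Value *Offset = B.CreateMul(VecBytes, Idx);
  return B.CreatePtrAdd(Ptr, Offset);
}
```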