This patch relands https://github.com/llvm/llvm-project/pull/86409.
I mistakenly thought that `Known.makeNegative()` clears the sign bit of
`Known.Zero`. This patch fixes the assertion failure by explicitly
clearing the sign bit.
There is a missed optimization in
``` llvm
define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
%2 = add i8 %x, -1
%3 = tail call i8 @llvm.ctlz.i8(i8 %2, i1 true)
%4 = lshr i8 -1, %3
%5 = add i8 %4, 1
%6 = icmp ugt i8 %x, 1
%p = select i1 %6, i8 %5, i8 1
%r = urem i8 %y, %p
ret i8 %r
}
```
which is extracted from the Rust code
``` rust
fn func(x: usize, y: usize) -> usize {
let z = x.next_power_of_two();
y % z
}
```
Here `%p` (a.k.a `z`) is semantically a power-of-two, so `y urem p` can
be optimized to `y & (p - 1)`. (Alive2 proof:
https://alive2.llvm.org/ce/z/H3zooY)
---
It could be generalized to recognizing `LShr(UINT_MAX, Y) + 1` as a
power-of-two, which is what this PR does.
Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc
Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N), where N and
exactly one of s1 and s2 are constant.
For example:
```c
int res = strcmp(s, "ab");
```
is converted to
```c
int res = (int)s[0] - (int)'a';
if (res != 0)
goto END;
res = (int)s[1] - (int)'b';
if (res != 0)
goto END;
res = (int)s[2] - (int)'\0';
END:
```
Ported from a similar gcc feature [Inline strcmp with small constant
strings](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809).
If the value is not boolean and we are checking for `Undef` or
`UndefOrPoison`, we can avoid the potentially expensive IDom walk.
This should improve compile time for isGuaranteedNotToBeUndefOrPoison
and isGuaranteedNotToBeUndef.
Swapping the operands of a select is not valid if one hand is more
poisonous that the other, because the negation zero contains poison
elements.
Fix this by adding an extra parameter to isKnownNegation() to forbid
poison elements.
I've implemented this using manual checks to avoid needing four variants
for the NeedsNSW/AllowPoison combinations. Maybe there is a better way
to do this...
Fixes https://github.com/llvm/llvm-project/issues/89669.
This improves handling of `threadlocal.address` intrinsic in analyses:
The thread-id cannot change within a function with the exception of
suspend points of pre-split coroutines. This changes
`llvm::getUnderlyingObject` to look through `threadlocal.address` in
these cases.
`GlobalsAAResult::AnalyzeUsesOfPointer` checks whether an address can be
traced to simple loads/stores or escapes to other places. Starting the
analysis from a thread-local `GlobalValue` the `threadlocal.address`
intrinsic is safe to skip here.
This improves issue #87437
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
Rename has/dropPoisonGeneratingFlagsOrMetadata to
has/dropPoisonGeneratingAnnotations and make it also handle
nonnull, align and range return attributes on calls, similar
to the existing handling for !nonnull, !align and !range metadata.
Prior to #85863, the required parameters of llvm::isKnownNonZero were
Value and DataLayout. After, they are Value, Depth, and SimplifyQuery,
where SimplifyQuery is implicitly constructible from DataLayout. The
change to move Depth before SimplifyQuery needed callers to be updated
unnecessarily, and as commented in #85863, we actually want Depth to be
after SimplifyQuery anyway so that it can be defaulted and the caller
does not need to specify it.
As the undef can be replaced with a zero value, this is not legal
in the general case. We can only allow poison values. This matches
what the other ValueTracking helpers like computeKnownBits() do.
`(icmp uge/ugt (and X, Y), C)` implies both `(icmp uge/ugt X, C)` and
`(icmp uge/ugt Y, C)`. We can use this to deduce leading ones in `X`.
`(icmp ule/ult (or X, Y), C)` implies both `(icmp ule/ult X, C)` and
`(icmp ule/ult Y, C)`. We can use this to deduce leading zeros in `X`.
Closes#86059
Handles cases like `X ^ Y == X` / `X disjoint| Y == X`.
Both of these cases have identical logic to the existing `add` case,
so just converting the `add` code to a more general helper.
Proofs: https://alive2.llvm.org/ce/z/Htm7peCloses#87706
Instead of relying on known-bits for strictly positive, use the
`isKnownPositive` API. This will use `isKnownNonZero` which is more
accurate.
Closes#88170
Adds support for: `{s,u}{add,sub,mul}.with.overflow`
The logic is identical to the the non-overflow binops, we where just
missing the cases.
Closes#87701
There is one notable "regression". This patch replaces the bespoke `or
disjoint` logic we a direct match. This means we fail some
simplification during `instsimplify`.
All the cases we fail in `instsimplify` we do handle in `instcombine`
as we add `disjoint` flags.
Other than that, just some basic cases.
See proofs: https://alive2.llvm.org/ce/z/_-g7C8Closes#86083
In `(icmp eq (and x,y), C)` all 1s in `C` must also be set in both
`x`/`y`.
In `(icmp eq (or x,y), C)` all 0s in `C` must also be set in both
`x`/`y`.
Closes#87143
All the isKnownNonZero() handling for instructions should be inside
this function. This makes the structure more similar to
computeKnownBitsFromOperator() as well.
This may not be entirely NFC due to different depth handling.
If we had a comparison to a literal nan with a false predicate,
we were incorrectly treating it as an unordered compare. This was
correct for fcmp true, but not fcmp false. I noticed this in the
review for e44d3b3e503fa12fdaead2936b28844aa36237c1 but misdiagnosed
the reason. Also change the test for the fcmp true case to be more
useful, but it wasn't wrong previously.
In 2fe81edef6f
[NFC][RemoveDIs] Insert instruction using iterators in Transforms/
we changed
if (*req_idx != *i)
return FindInsertedValue(I->getAggregateOperand(), idx_range,
- InsertBefore);
+ *InsertBefore);
}
but there is no guarantee that is InsertBefore is non-empty at that
point,
which we e.g can see in the added testcase.
Instead just pass on the optional InsertBefore in the recursive call to
FindInsertedValue, as we do at several other places already.