For value-accumulating recurrences of the kind:
```
%umax.acc = phi i8 [ %umax, %backedge ], [ %a, %entry ]
%umax = call i8 @llvm.umax.i8(i8 %umax.acc, i8 %b)
```
The binary intrinsic may be simplified into a single intrinsic call taking
the init value and the other operand, provided the latter is loop-invariant:
```
%umax = call i8 @llvm.umax.i8(i8 %a, i8 %b)
```
Proofs: https://alive2.llvm.org/ce/z/ea2cVC.
Fixes: https://github.com/llvm/llvm-project/issues/145875.
With the advent of intrinsic-less debug-info, we no longer need to
scatter calls to getPrevNonDebugInstruction around the codebase. Remove
most of them -- there are one or two that have the "SkipPseudoOp" flag
turned on, but they don't seem to be in positions where skipping
anything would be reasonable.
Try to optimize a call to the result of a ptrauth intrinsic, potentially
into the ptrauth call bundle:
call(ptrauth.resign(p)), ["ptrauth"()] -> call p, ["ptrauth"()]
call(ptrauth.sign(p)), ["ptrauth"()] -> call p
as long as the key/discriminator are the same in sign and auth-bundle,
and we don't change the key in the bundle (to a potentially-invalid
key).
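A minimal IR sketch of the resign case (names, keys, and discriminators
are illustrative; `%signed0` is assumed to be signed with key 0 and
discriminator `%d0`):
```
declare i64 @llvm.ptrauth.resign(i64, i32, i64, i32, i64)

define void @caller(i64 %signed0, i64 %d0, i64 %d1) {
  ; resign from (key 0, disc %d0) to (key 0, disc %d1), then call
  ; through a "ptrauth" bundle matching the new schema
  %signed1 = call i64 @llvm.ptrauth.resign(i64 %signed0, i32 0, i64 %d0, i32 0, i64 %d1)
  %f = inttoptr i64 %signed1 to ptr
  call void %f() [ "ptrauth"(i32 0, i64 %d1) ]
  ; -> the resign folds into the bundle, which now authenticates the
  ;    original value with the old (key 0, disc %d0) schema:
  ;      %g = inttoptr i64 %signed0 to ptr
  ;      call void %g() [ "ptrauth"(i32 0, i64 %d0) ]
  ret void
}
```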
Generating a plain call to a raw unauthenticated pointer is generally
undesirable, but if we ended up seeing a naked ptrauth.sign in the first
place, we already have suspicious code. Unauthenticated calls are also
easier to spot than naked signs, so let the indirect call shine.
Note that there is an arguably unsafe extension to this, where we don't
bother checking that the key in bundle and intrinsic are the same (and
also allow folding away an auth into a bundle).
This can end up generating calls with a bundle that has an invalid key
(which an informed frontend wouldn't have otherwise done), which can be
problematic. The C that generates that is straightforward but arguably
unreasonable. That wouldn't be an issue if we were to bite the bullet
and make these fully AArch64-specific, allowing key knowledge to be
embedded here.
Try to optimize a call to a ptrauth constant into its ptrauth bundle:
call(ptrauth(f)), ["ptrauth"()] -> call f
as long as the key/discriminator are the same in constant and bundle.
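A minimal sketch, with a hypothetical callee `@f` and made-up
key/discriminator values:
```
declare void @f()

define void @caller() {
  ; the ptrauth constant and the bundle agree on (key 0, disc 42)
  call void ptrauth (ptr @f, i32 0, i64 42)() [ "ptrauth"(i32 0, i64 42) ]
  ; -> call void @f()
  ret void
}
```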
This is the intrinsic version of #146349, and handles fabs as well as
other intrinsics.
It's largely a copy of InstCombinerImpl::foldShuffledIntrinsicOperands
but a bit simpler since we don't need to find a common mask.
Creating a separate function seems to be cleaner than trying to shoehorn
it into the existing one.
This follows on from
https://github.com/llvm/llvm-project/pull/144933#issuecomment-2992372627,
and allows us to remove the reverse (fneg (reverse x)) combine.
A separate patch will handle the case for fabs. I haven't checked if we
perform this canonicalization for either unops or binops for vp.reverse.
This canonicalizes fneg/fabs (shuffle X, poison, mask) -> shuffle
(fneg/fabs X), poison, mask.
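A minimal before/after sketch (the reverse mask is chosen purely for
illustration):
```
define <4 x float> @src(<4 x float> %x) {
  %s = shufflevector <4 x float> %x, <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r = fneg <4 x float> %s
  ret <4 x float> %r
}

; is canonicalized to:
define <4 x float> @tgt(<4 x float> %x) {
  %n = fneg <4 x float> %x
  %r = shufflevector <4 x float> %n, <4 x float> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x float> %r
}
```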
This undoes part of b331a7ebc1e02f9939d1a4a1509e7eb6cdda3d38 and
a8f13dbdeb31be37ee15b5febb7cc2137bbece67, but keeps the binary shuffle
case i.e. shuffle fneg, fneg, mask.
By pulling out the shuffle we bring this in line with the same
canonicalisation we perform on binary ops and intrinsics, even though
the original commit acknowledged it was going in the opposite direction.
However nowadays VectorCombine is more powerful and can do more
optimisations when the shuffle is pulled out, so I think we should
revisit this. In particular we get more shuffles folded and can perform
scalarization.
This simply copies the structure of the vector.reverse patterns from
just above, and reimplements them for the vp.reverse intrinsics when the
mask is all ones and the EVLs exactly match.
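For illustration, a sketch of one plausible instance -- eliminating a
vp.reverse pair when both calls use an all-ones mask and the same EVL:
```
declare <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, i32)

define <vscale x 4 x i32> @rev_rev(<vscale x 4 x i32> %x, i32 %evl) {
  %a = call <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32> %x, <vscale x 4 x i1> splat (i1 true), i32 %evl)
  %b = call <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i1> splat (i1 true), i32 %evl)
  ; -> %b simplifies back to %x
  ret <vscale x 4 x i32> %b
}
```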
It's unfortunate that we have three different ways to represent a reverse
(shuffle, vector.reverse, and vp.reverse), but I don't see an obvious way
to remove any of them because the semantics are slightly different.
This significantly improves vectorization in TSVC_2's s112 and s1112
loops when using EVL tail folding.
We canonicalize reverse to after a binop in foldVectorBinop, and
simplify reverse pairs in InstSimplify, so these elimination transforms
are redundant.
This addresses a TODO in foldShuffledIntrinsicOperands to use
isTriviallyVectorizable instead of a hardcoded list of intrinsics, which
in turn allows more intrinsics to be scalarized by VectorCombine.
From what I can tell every intrinsic here should be speculatable, so an
assertion was added.
Because this enables intrinsics like abs which have a scalar operand, we
need to also check isVectorIntrinsicWithScalarOpAtArg.
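For illustration, a sketch with `llvm.abs`, whose second operand is a
scalar flag and must therefore be left out of the unshuffling:
```
declare <4 x i32> @llvm.abs.v4i32(<4 x i32>, i1 immarg)

define <4 x i32> @src(<4 x i32> %x) {
  %s = shufflevector <4 x i32> %x, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %a = call <4 x i32> @llvm.abs.v4i32(<4 x i32> %s, i1 false)
  ret <4 x i32> %a
}

; -> the shuffle is pulled out past the intrinsic; the scalar i1
;    operand is carried over unchanged:
define <4 x i32> @tgt(<4 x i32> %x) {
  %a = call <4 x i32> @llvm.abs.v4i32(<4 x i32> %x, i1 false)
  %s = shufflevector <4 x i32> %a, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %s
}
```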
We currently combine (AES (EOR (A, B)), 0) into (AES A, B) for Neon
intrinsics when the zero operand appears in the RHS of the AES
instruction.
This patch extends the combine to support the SVE AES intrinsics and
the case where the zero operand appears in the LHS of the AES
instruction.
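A sketch of the Neon case (the SVE intrinsics fold analogously), with the
zero operand on the RHS; the mirrored LHS form is now handled too:
```
declare <16 x i8> @llvm.aarch64.crypto.aese(<16 x i8>, <16 x i8>)

define <16 x i8> @src(<16 x i8> %a, <16 x i8> %b) {
  %x = xor <16 x i8> %a, %b
  %r = call <16 x i8> @llvm.aarch64.crypto.aese(<16 x i8> %x, <16 x i8> zeroinitializer)
  ret <16 x i8> %r
}

; -> since AESE xors its two inputs before the round operations:
;    %r = call <16 x i8> @llvm.aarch64.crypto.aese(<16 x i8> %a, <16 x i8> %b)
```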
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique or writing an easily-invalidable KnownBits
analysis, make the Depth argument in the ValueTracking APIs uniformly the
last argument with a default value. This will aid in removing the
argument when the time comes, as many callers that previously passed 0
explicitly have now been updated to omit the argument altogether.
We currently pull shuffles through binops and intrinsics, which is an
important canonical form for VectorCombine to be able to scalarize
vector sequences. But while binops can be folded with a constant
operand, intrinsics currently require all operands to be shufflevectors.
This extends intrinsic folding to be in line with regular binops by
reusing the constant "unshuffling" logic.
As far as I can tell, the list of currently folded intrinsics doesn't
require any special UB handling.
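For illustration, a hedged sketch with `llvm.smax` (constants worked out
by hand; the mask <1,0,3,2> happens to be its own inverse):
```
declare <4 x i32> @llvm.smax.v4i32(<4 x i32>, <4 x i32>)

define <4 x i32> @src(<4 x i32> %x) {
  %s = shufflevector <4 x i32> %x, <4 x i32> poison, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
  %r = call <4 x i32> @llvm.smax.v4i32(<4 x i32> %s, <4 x i32> <i32 1, i32 2, i32 3, i32 4>)
  ret <4 x i32> %r
}

; -> the constant is "unshuffled" and the shuffle sinks below the call:
define <4 x i32> @tgt(<4 x i32> %x) {
  %m = call <4 x i32> @llvm.smax.v4i32(<4 x i32> %x, <4 x i32> <i32 2, i32 1, i32 4, i32 3>)
  %r = shufflevector <4 x i32> %m, <4 x i32> poison, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
  ret <4 x i32> %r
}
```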
This change in combination with #138095 and #137823 fixes the following
C:
```c
void max(int *x, int *y, int n) {
  for (int i = 0; i < n; i++)
    x[i] += *y > 42 ? *y : 42;
}
```
Previously this used the splatted vector form on RISC-V with
`-O3 -march=rva23u64`:
```asm
vmv.s.x v8, a4
li a4, 42
vmax.vx v10, v8, a4
vrgather.vi v8, v10, 0
.LBB0_9: # %vector.body
# =>This Inner Loop Header: Depth=1
vl2re32.v v10, (a5)
vadd.vv v10, v10, v8
vs2r.v v10, (a5)
```
whereas it now generates:
```asm
li a6, 42
max a6, a4, a6
.LBB0_9: # %vector.body
# =>This Inner Loop Header: Depth=1
vl2re32.v v8, (a5)
vadd.vx v8, v8, a6
vs2r.v v8, (a5)
```
Migrate their usage to the `AnyMem*Inst` family, and add an isAtomic()
query on the base class for that hierarchy. This matches the idioms we
use for e.g. isAtomic on load, store, etc. instructions, the existing
isVolatile idioms on mem* routines, and allows us to more easily share
code between atomic and non-atomic variants.
As with #138568, the goal here is to simplify the class hierarchy and
make it easier to reason about. I'm moving from easiest to hardest, and
will stop at some point when I hit "good enough". Longer term, I'd sorta
like to merge or reverse the naming on the plain Mem*Inst and the
AnyMem*Inst, but that's a much larger and more risky change. Not sure
I'm going to actually do that.
Use an existing helper function. Remove the use of a local Changed
variable which doesn't seem to interact with surrounding transforms in
any meaningful way. (Both memcpy and memmove are MemTransfer
instructions, so switching from one to the other doesn't change
results.)
Posted for review mostly for a sanity check that I'm not missing
something with the logic around the Changed flag.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
This PR introduces the following transformations:
- If C0 is not 0:
  `umax(nuw_shl(x, C0), x + 1) -> x == 0 ? 1 : nuw_shl(x, C0)`
- If C0 is not 0 or 1:
  `umax(nuw_mul(x, C0), x + 1) -> x == 0 ? 1 : nuw_mul(x, C0)`
Fixes #122388.
Alive2 proof: https://alive2.llvm.org/ce/z/rkp_8U
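A minimal IR sketch of the shl case with C0 = 3 (names illustrative):
```
declare i8 @llvm.umax.i8(i8, i8)

define i8 @src(i8 %x) {
  %shl = shl nuw i8 %x, 3
  %add = add i8 %x, 1
  %r = call i8 @llvm.umax.i8(i8 %shl, i8 %add)
  ret i8 %r
}

; -> for %x != 0 the nuw flag guarantees %shl >= %x + 1, and for
;    %x == 0 the umax evaluates to 1:
define i8 @tgt(i8 %x) {
  %shl = shl nuw i8 %x, 3
  %is0 = icmp eq i8 %x, 0
  %r = select i1 %is0, i8 1, i8 %shl
  ret i8 %r
}
```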
- **[InstCombine] Add tests for folding `(ct{t,l}z Pow2)`; NFC**
- **[InstCombine] Fold `(ct{t,l}z Pow2)` -> `Log2(Pow2)`**
We do this so we can find `Log2(Pow2)` for "free" with `takeLog2`:
https://alive2.llvm.org/ce/z/CL77fo
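For illustration, with a power of two of the form `1 << %x`:
```
declare i8 @llvm.cttz.i8(i8, i1 immarg)

define i8 @src(i8 %x) {
  %p = shl nuw i8 1, %x              ; %p is a power of two
  %r = call i8 @llvm.cttz.i8(i8 %p, i1 true)
  ret i8 %r
}

; -> cttz of a power of two is its log2, which takeLog2 recovers as %x:
define i8 @tgt(i8 %x) {
  ret i8 %x
}
```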
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated with it, that will
cause an error.
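A rough sketch of the failure mode (values are hypothetical; the `range`
syntax follows the LangRef parameter attribute):
```
; before: the i32 value operand carries a range attribute
%r = call ptr @__memset_chk(ptr %p, i32 range(i32 0, 100) %c, i64 %n, i64 -1)

; optimizeMemSetChk lowers this to a memset on a truncated i8 value;
; carrying the i32-typed range attribute over to the new i8 argument
; would be invalid IR, so such attributes must be dropped or adjusted:
%t = trunc i32 %c to i8
call void @llvm.memset.p0.i64(ptr %p, i8 %t, i64 %n, i1 false)
```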
Fixes #112633
Factor out and unify common code from InstSimplify and InstCombine that
partially guard against cross-lane vector operations into
llvm::isNotCrossLaneOperation in ValueTracking.
Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/68H4ka
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation for
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In the future, the ArrayRef(std::nullopt_t)
constructor could be deprecated or removed.