The goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify
```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```
However, this simplification is only valid if `V` has a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.
This patch makes the matching behaviour of `m_ConstantInt` parallel that
of `m_APInt`, and puts it to use in some obvious places.
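For illustration, a minimal sketch of the new behaviour (the values
involved are hypothetical):
```cpp
// With this patch, a match against m_ConstantInt(Int) succeeds for both a
// scalar constant (i64 5) and an integer splat (<2 x i64> splat of 5),
// mirroring m_APInt.
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // Int == 5 in both the scalar and the splat case.
}
```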
For loads that operate on aggregate types, instcombine unpacks the loads
but does not preserve the invariant.load metadata. This patch fixes
that: it looks for the metadata on the parent load and attaches it to
the unpacked loads.
```
%struct.double2 = type { double, double }
%struct.double1 = type { double }
define %struct.double2 @func1(ptr %a) {
  %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1
  ret %struct.double2 %1
}
!1 = !{}
```
Reproducer: https://godbolt.org/z/hcY8MMvYh
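With this patch, the unpacked loads keep the metadata, roughly as
follows (exact value names, offsets, and alignments are illustrative):
```
  %1 = load double, ptr %a, align 16, !invariant.load !1
  %2 = getelementptr inbounds i8, ptr %a, i64 8
  %3 = load double, ptr %2, align 8, !invariant.load !1
  %4 = insertvalue %struct.double2 poison, double %1, 0
  %5 = insertvalue %struct.double2 %4, double %3, 1
  ret %struct.double2 %5
```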
In visitShuffleVectorInst there's an if block that's meant to turn
shufflevector followed by bitcast into extractelement where possible.
It assumes that there will never be bitcasts performed on vectors of ptr
as such operations are almost always illegal, and ptrtoint instructions
should be used instead.
There is however an edge case where a bitcast instruction can be
performed on a vector of type `<1 x ptr>` to turn it into type `ptr`.
In this edge case, the code initializes the variable `VecBitWidth` to 0.
Then, when iterating over users that are bitcasts, an attempt is made to
create a vector of size 0, which triggers an assert.
This commit changes the initialization of `VecBitWidth` to use the
DataLayout to find the size of the vector, instead of the
getPrimitiveSizeInBits method, which returns 0 for ptr and vectors of
ptr.
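A minimal sketch of the edge case (a hypothetical reproducer, not the
exact test from the patch):
```
define ptr @f(<1 x ptr> %v) {
  %s = shufflevector <1 x ptr> %v, <1 x ptr> poison, <1 x i32> zeroinitializer
  %b = bitcast <1 x ptr> %s to ptr
  ret ptr %b
}
```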
For value-accumulating recurrences of the kind:
```
%umax.acc = phi i8 [ %umax, %backedge ], [ %a, %entry ]
%umax = call i8 @llvm.umax.i8(i8 %umax.acc, i8 %b)
```
The binary intrinsic may be simplified into a call on the init value
and the other operand, if the latter is loop-invariant:
```
%umax = call i8 @llvm.umax.i8(i8 %a, i8 %b)
```
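For context, a hedged sketch of a full loop exhibiting the pattern (the
control flow here is illustrative):
```
declare i8 @llvm.umax.i8(i8, i8)

define i8 @f(i8 %a, i8 %b, i1 %cond) {
entry:
  br label %backedge
backedge:
  %umax.acc = phi i8 [ %umax, %backedge ], [ %a, %entry ]
  %umax = call i8 @llvm.umax.i8(i8 %umax.acc, i8 %b)
  br i1 %cond, label %backedge, label %exit
exit:
  ret i8 %umax
}
```
Since `%b` is loop-invariant and umax(umax(%a, %b), %b) == umax(%a, %b),
the accumulated value stabilises after the first iteration, which is why
the single non-recurrent call computes the same final value.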
Proofs: https://alive2.llvm.org/ce/z/ea2cVC.
Fixes: https://github.com/llvm/llvm-project/issues/145875.
My understanding is that gep [n x i8] and gep i8 can be treated
equivalently - the array type conveys no extra information and could be
removed. This goes through foldCmpLoadFromIndexedGlobal and tries to
make it work for non-array gep types, so long as the index type still
matches the array being loaded.
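A hedged sketch of the kind of input this now covers (the global and
constants are illustrative):
```
@g = constant [4 x i32] [i32 1, i32 2, i32 4, i32 8]

define i1 @cmp(i64 %i) {
  ; gep i32 rather than gep [4 x i32]; the i32 index type still matches
  ; the elements of the array being loaded.
  %p = getelementptr i32, ptr @g, i64 %i
  %v = load i32, ptr %p
  %c = icmp eq i32 %v, 4
  ret i1 %c
}
```
For in-range indices this should fold to a compare on `%i` itself
(here, `%i == 2`).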
Extracting any element from a subvector starting at index 0 is
equivalent to extracting from the original vector, i.e.
extract_elt(vector_extract(x, 0), y) -> extract_elt(x, y)
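Sketched in IR, with illustrative types:
```
%sub = call <4 x float> @llvm.vector.extract.v4f32.v8f32(<8 x float> %x, i64 0)
%e = extractelement <4 x float> %sub, i64 %y
; ->
%e = extractelement <8 x float> %x, i64 %y
```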
Split GEPs that have more than one variable index into two. This is in
preparation for the ptradd migration, which will not support multi-index
GEPs.
This also enables the split-off part to be CSEd and LICMed.
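A minimal sketch of the split, with illustrative types:
```
; Before: one GEP with two variable indices.
%p = getelementptr [16 x i32], ptr %base, i64 %i, i64 %j
; After: each GEP has at most one variable index, and the split-off
; first GEP can be CSEd and LICMed independently.
%p0 = getelementptr [16 x i32], ptr %base, i64 %i
%p = getelementptr i32, ptr %p0, i64 %j
```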
shrinkSplatShuffle in InstCombine would only move truncs up through
shuffles if those shuffles' inputs had exactly the same type as their
output. This PR weakens that constraint to require only that the
scalar types of the input and output match.
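For example, a sketch of a case that is now handled even though the
shuffle's input and output vector types differ (the scalar type, i64,
still matches):
```
%s = shufflevector <2 x i64> %v, <2 x i64> poison, <4 x i32> zeroinitializer
%t = trunc <4 x i64> %s to <4 x i32>
; -> the trunc is moved up through the splat shuffle:
%v.t = trunc <2 x i64> %v to <2 x i32>
%t = shufflevector <2 x i32> %v.t, <2 x i32> poison, <4 x i32> zeroinitializer
```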
There is no need to first remove the instructions before, and then
the ones after, in two different worklist iterations. We don't need
to worry about change reporting here, as the functions do that
themselves.
This avoids the issue in #150338, but not really in a principled
way. It's possible that we will have to allow poison arguments
to lifetime.start/lifetime.end again if this turns out to be a
recurring problem.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
When using PatternMatch, there is a common problem where we want to both
match something against a pattern, but also capture the
value/instruction for various reasons (e.g. to access flags).
Currently, the two ways to do that are to either capture using
m_Value/m_Instruction and do a separate match on the result, or to use
the somewhat awkward `m_CombineAnd(m_XYZ, m_Value(V))` pattern.
This PR adds a variant of `m_Value`/`m_Instruction` which does both a
capture and a match: `m_Value(V, m_XYZ)` is basically equivalent to
`m_CombineAnd(m_XYZ, m_Value(V))`.
I've ported two InstCombine files to this pattern as a sample.
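A short sketch of the new form (the surrounding code is hypothetical):
```cpp
// Match an add and capture the matched instruction in a single step,
// e.g. to inspect its wrap flags afterwards (V is the value being
// visited).
Value *X, *Y;
Instruction *Add;
if (match(V, m_Instruction(Add, m_Add(m_Value(X), m_Value(Y))))) {
  if (Add->hasNoSignedWrap()) {
    // ...
  }
}
```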
When expanding a GEP chain, if there is a chain of one-use GEPs followed
by a multi-use GEP, rewrite the multi-use GEP to include the one-use
GEPs' offsets.
This means the offsets from the one-use GEPs can be reused by the offset
expansion without additional cost (from computing them again with a
different reassociation).
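My hedged reading of the rewrite, as a sketch (names and shapes are
illustrative):
```
; %mu is multi-use and follows the one-use GEP %g.
%g = getelementptr i8, ptr %p, i64 %a   ; one use
%mu = getelementptr i8, ptr %g, i64 %b  ; several uses
; After the rewrite, %mu starts from %p and includes %g's offset, so the
; offset expansion can reuse the combined offset directly.
%off = add i64 %a, %b
%mu = getelementptr i8, ptr %p, i64 %off
```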
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or two more commits.
SROA and a few other facilities use generic lambdas and some overloaded
functions to deal with both intrinsics and debug-records at the same time.
As part of stripping out intrinsic support, delete a swathe of this code
from things in the Utils directory.
This is a large diff, but is mostly about removing functions that were
duplicated during the migration to debug records. I've taken a few
opportunities to replace comments about "intrinsics" with "records",
and replace generic lambdas with plain lambdas (I believe this makes
it more readable).
All of this is chipping away at intrinsic-specific code until we get to
removing parts of findDbgUsers, which is the final boss -- we can't
remove that until almost everything else is gone.
To push a freeze through an instruction, only one operand may produce
poison. However, this currently fails for identical operands, which are
treated as separate operands. This patch fixes that by treating them as
a single operand.
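For example, a sketch:
```
%a = add i32 %x, %x
%f = freeze i32 %a
; Both operands are the same value, so a single frozen operand suffices:
%x.fr = freeze i32 %x
%f = add i32 %x.fr, %x.fr
```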
With the advent of intrinsic-less debug-info, we no longer need to
scatter calls to getPrevNonDebugInstruction around the codebase. Remove
most of them -- there are one or two that have the "SkipPseudoOp" flag
turned on; however, they don't seem to be in positions where skipping
anything would be reasonable.
The testcase I added previously failed because a SelectInst with invalid
operands was created (one side `addrspace(4)`, the other
`addrspace(5)`).
PointerReplacer needs to dig deeper if the true and/or false
instructions of the select are not available.
Fixes SWDEV-542957
Try to optimize a call to the result of a ptrauth intrinsic, potentially
into the ptrauth call bundle:
call(ptrauth.resign(p)), ["ptrauth"()] -> call p, ["ptrauth"()]
call(ptrauth.sign(p)), ["ptrauth"()] -> call p
as long as the key/discriminator are the same in sign and auth-bundle,
and we don't change the key in the bundle (to a potentially invalid
key).
Generating a plain call to a raw unauthenticated pointer is generally
undesirable, but if we ended up seeing a naked ptrauth.sign in the first
place, we already have suspicious code. Unauthenticated calls are also
easier to spot than naked signs, so let the indirect call shine.
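A hedged IR sketch of the resign case (keys and discriminators are
illustrative; the ptrauth intrinsics operate on i64 values):
```
%p.i = ptrtoint ptr %p to i64
%r.i = call i64 @llvm.ptrauth.resign(i64 %p.i, i32 0, i64 %d0, i32 0, i64 %d1)
%r = inttoptr i64 %r.i to ptr
call void %r() [ "ptrauth"(i32 0, i64 %d1) ]
; -> the resign is folded into the bundle:
call void %p() [ "ptrauth"(i32 0, i64 %d0) ]
```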
Note that there is an arguably unsafe extension to this, where we don't
bother checking that the key in the bundle and the intrinsic are the
same (and also allow folding an auth away into a bundle).
This can end up generating calls with a bundle that has an invalid key
(which an informed frontend wouldn't have otherwise done), which can be
problematic. The C that generates that is straightforward but arguably
unreasonable. That wouldn't be an issue if we were to bite the bullet
and make these fully AArch64-specific, allowing key knowledge to be
embedded here.
Try to optimize a call to a ptrauth constant, into its ptrauth bundle:
call(ptrauth(f)), ["ptrauth"()] -> call f
as long as the key/discriminator are the same in constant and bundle.
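Sketched in IR (key and discriminator values are illustrative):
```
call void ptrauth (ptr @f, i32 0, i64 42)() [ "ptrauth"(i32 0, i64 42) ]
; ->
call void @f()
```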
Add a new fold to instcombine to move SExt/ZExt across identity
shuffles, applying the cast after the shuffle. This sinks extends and
can enable more general additional folding of both shuffles (and
related instructions) and extends. If backends prefer to split them up
and do the casts first, the extends can be hoisted again, for example
in VectorCombine.
A larger example is included in the load_i32_zext_to_v4i32 test. The wider
extend is easier to compute an accurate cost for and targets (like
AArch64) can lower a single wider extend more efficiently than multiple
separate extends.
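A sketch of the length-changing identity case (types are illustrative):
```
%e = zext <2 x i8> %v to <2 x i32>
%s = shufflevector <2 x i32> %e, <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
; -> shuffle first, then a single wider extend:
%v.s = shufflevector <2 x i8> %v, <2 x i8> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
%s = zext <4 x i8> %v.s to <4 x i32>
```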
This is a generalization of a VectorCombine version
(https://github.com/llvm/llvm-project/pull/141109) as suggested by
@preames.
PR: https://github.com/llvm/llvm-project/pull/146901
The motivation of this pattern is to check whether the product of a
variable and a constant would be mathematically (i.e., as integer
numbers instead of bit vectors) greater than a given constant bound. The
pattern appears to occur when compiling several Rust projects (it seems
to originate from the `smallvec` crate, but I have not checked this
further).
Unless `c1` is `0`, we can transform this pattern into `x > c2/c1` with
all operations working on unsigned integers. Due to undefined behavior
when an element of a non-splat vector is `0`, the transform is only
implemented for scalars and splat vectors.
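The matched pattern is not spelled out above; my hedged reading of it,
sketched with illustrative constants (`c1 = 24`, `c2 = 2^63 - 1`):
```
%m = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 24)
%mul = extractvalue { i64, i1 } %m, 0
%ov = extractvalue { i64, i1 } %m, 1
%cmp = icmp ugt i64 %mul, 9223372036854775807
%r = or i1 %ov, %cmp
; -> x > c2/c1, with the udiv folded at compile time:
%r = icmp ugt i64 %x, 384307168202282325
```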
Alive proof: https://alive2.llvm.org/ce/z/LawTkm
Closes #142674.
If C1 is 1 and we're working with a power-of-two divisor, this will end
up replacing the `and` for the remainder with a multiply and a longer
dependency chain.
Fixes https://github.com/llvm/llvm-project/issues/147176.
The change adds a new instcombine pattern, and associated test, for
patterns like this:
```
%3 = shufflevector <2 x float> %1, <2 x float> poison, <4 x i32> zeroinitializer
%4 = extractelement <4 x float> %3, i64 %idx
```
The shufflevector has a splat (broadcast) mask, so the extracted value
must be the first element of %1; we therefore transform this to
```
%2 = extractelement <2 x float> %1, i64 0
```