This PR adds a small, targeted InstCombine fold for the pattern:
```
%idx = srem i64 %x, 2^k
%p = getelementptr inbounds nuw i8, ptr %base, i64 %idx
```
When the GEP is inbounds + nuw, and the divisor is a non-zero
power-of-two constant, the signed remainder cannot produce a negative
offset without violating the inbounds/nuw constraints. In that case we
can canonicalize the index to a non-negative form and expose the common
power-of-two rewrite:
- Rewrite the GEP index from `srem %x, 2^k` to `urem %x, 2^k`
- Create a new GEP with the new index and replace the original GEP
- The `urem %x, 2^k` then folds further to `and %x, (2^k-1)`
resulting in the following pattern:
```
%idx = and i64 %x, (2^k-1)
%p = getelementptr inbounds nuw i8, ptr %base, i64 %idx
```
Fixes #180097.
Generalized Alive2 proof: https://alive2.llvm.org/ce/z/8EBxug
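The arithmetic behind the fold can be checked exhaustively at a small bit width. The following is a minimal Python sketch, not the InstCombine code: the 8-bit width and the helper names (`to_signed`, `srem`, `urem`, `check_fold`) are assumptions made for illustration. It skips negative `srem` results, which the description above rules out for an inbounds + nuw GEP, and confirms that every remaining result equals both the `urem` and the `and` form.

```python
BITS = 8  # assumed small width so the check can be exhaustive


def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement integer."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v >> (BITS - 1) else v


def srem(x, d):
    """Signed remainder; the result takes the sign of the dividend."""
    sx = to_signed(x)
    r = abs(sx) % d
    return -r if sx < 0 else r


def urem(x, d):
    """Unsigned remainder on the low BITS bits of x."""
    return (x & ((1 << BITS) - 1)) % d


def check_fold(k):
    """True if srem == urem == and whenever srem by 2^k is non-negative."""
    d = 1 << k
    for x in range(1 << BITS):
        s = srem(x, d)
        if s < 0:
            continue  # a negative index is excluded by inbounds + nuw
        if not (s == urem(x, d) == (x & (d - 1))):
            return False
    return True
```

For example, `check_fold(2)` covers the divisor 4 for all 256 bit patterns of `x`.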
In #172961 we are trying to remove llvm.experimental.vp.reverse now that
llvm.vector.splice.right supports variable offsets.
A VP reverse reverses the first EVL elements of the vector, e.g.
01234567 -> 210xxxxx when EVL=3, where x=poison.
This can now be represented by splice.right(reverse(V), poison, EVL):
01234567
-> 76543210 (reverse)
-> 210xxxxx (splice.right)
This PR implements the vp.reverse combines that pull through binops, but
generalized to vector.splice. Specifically, this implements the
following combines:
Op(splice(V1, poison, offset), splice(V2, poison, offset)) -> splice(Op(V1, V2), poison, offset)
Op(splice(V1, poison, offset), RHSSplat) -> splice(Op(V1, RHSSplat), poison, offset)
Op(LHSSplat, splice(V2, poison, offset)) -> splice(Op(LHSSplat, V2), poison, offset)
We can then remove the vp.reverse intrinsic and its related combines
soon after, once we migrate the loop vectorizer over.
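To make the lane movement concrete, here is a small Python model of the combines listed above. It is a sketch under assumptions: `splice_right` is modeled only from the `01234567 -> 210xxxxx` example in this description (the last `offset` lanes of the first operand followed by leading lanes of the second), `None` stands in for poison, and all function names are hypothetical.

```python
import operator

POISON = None  # stand-in for a poison lane


def reverse(v):
    return v[::-1]


def splice_right(a, b, offset):
    """Last `offset` lanes of a, followed by the leading lanes of b."""
    n = len(a)
    return (a + b)[n - offset : 2 * n - offset]


def lanewise(op, lhs, rhs):
    """Apply a binop per lane; poison in either input lane gives poison."""
    return [
        POISON if x is POISON or y is POISON else op(x, y)
        for x, y in zip(lhs, rhs)
    ]


def vp_reverse(v, evl):
    """Reverse of the first evl lanes, expressed as splice.right(reverse(V), poison, EVL)."""
    return splice_right(reverse(v), [POISON] * len(v), evl)


def splice_poison(v, offset):
    return splice_right(v, [POISON] * len(v), offset)
```

With `v1 = [0, 1, 2, 3, 4, 5, 6, 7]`, `vp_reverse(v1, 3)` gives `[2, 1, 0]` followed by five poison lanes. In this model, applying `lanewise(operator.add, ...)` to two spliced vectors yields the same lanes as splicing the result of the add, which is the first combine above; the splat cases behave the same way because every lane of a splat is identical.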
The new select created in `InstCombinerImpl::foldBinOpSelectBinOp` reuses
the same condition in the same BB as the original, so the profile info can
be trivially copied over.
Closes #172176.
Previously, `FoldOpIntoSelect` wouldn't fold multi-use selects unless
`MultiUse` was explicitly true. This prevented useful folding when the
select is used multiple times in the same intrinsic call. Similar to
what is done in `foldOpIntoPhi`, we'll now check that all of the uses
come from a single user, rather than checking that there is only one
use.
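The difference between the two checks can be sketched in a few lines of Python. This is only an illustration: a value's uses are modeled as a list with one entry per use naming the using instruction, and the function and instruction names are hypothetical, not LLVM API.

```python
def has_one_use(uses):
    """Old-style check: exactly one use."""
    return len(uses) == 1


def has_one_user(uses):
    """New-style check: at least one use, and all uses come from the same user."""
    return len(uses) > 0 and len(set(uses)) == 1


# A select feeding two operands of the same intrinsic call.
same_call = ["call.umax", "call.umax"]
# A select used by two different instructions.
two_users = ["call.umax", "add"]
```

Here `same_call` fails the one-use check but passes the one-user check, while `two_users` fails both.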
Follow the structure of SimplifyDemandedBits. This doesn't handle anything
in the multiple-use case for now, and continues to just call
computeKnownFPClass.
This changes LangRef to specify that pointer icmp only compares the
address bits of the pointers. That is, `icmp pred %a, %b` is equivalent
to `icmp pred ptrtoaddr(%a), ptrtoaddr(%b)`.
Similarly, it specifies that the `nonnull` attribute requires that the
address bits are non-zero.
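As an illustration of these semantics, the sketch below models a pointer as a pair of non-address (metadata) bits and address bits. The `Ptr` type, the 64-bit address width, and the bit layout used by `ptrtoint` are assumptions for the example only; real targets differ.

```python
import operator
from collections import namedtuple

# Hypothetical pointer model: non-address bits plus address bits.
Ptr = namedtuple("Ptr", ["meta", "addr"])

ADDR_BITS = 64  # assumed address width


def ptrtoaddr(p):
    """Only the address bits."""
    return p.addr


def ptrtoint(p):
    """All bits, with metadata placed above the address (assumed layout)."""
    return (p.meta << ADDR_BITS) | p.addr


def icmp(pred, a, b):
    """Pointer comparison on address bits only."""
    return pred(ptrtoaddr(a), ptrtoaddr(b))


def is_nonnull(p):
    """nonnull requires non-zero address bits, regardless of metadata."""
    return ptrtoaddr(p) != 0
```

Under this model, `Ptr(1, 0x1000)` and `Ptr(2, 0x1000)` compare equal with `icmp(operator.eq, ...)` even though their `ptrtoint` values differ, and `Ptr(7, 0)` is not `nonnull`.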
There are a couple of motivations for this:
* For inequality comparisons, this is really the only sensible
semantics. Relational comparison of address and metadata bits as a
single integer is generally meaningless (unless the metadata bits are
equal).
* This matches (as far as I understand) the behavior of existing CHERI
implementations.
* LLVM can only reason about the address bits. These semantics allow
pointers with non-address bits to receive essentially the same
comparison optimization support as ordinary pointers.
In terms of implementation, this PR adjusts:
* The AMDGPULowerBufferFatPointers pass.
* An InstCombine fold that may replace pointers with different
non-address bits.
* The fold that replaces pointers based on dominating pointer equality.
It does not adjust:
* ISel, because we don't have in-tree targets where we can show a
difference.
* Various icmp+ptrtoint transforms, because we'll have to change this
code for ptrtoaddr optimization support anyway, and these changes are
tightly related.
Related discussion starting from:
https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/60?u=nikic
Currently, when sinking instructions, InstCombine drops assumes that
would otherwise prevent sinking. Removing dereferenceable assumptions
early can inhibit vectorization of early-exit loops in practice.
Special-case dereferenceable assumptions so that they block sinking. This
can be combined with a separate change to drop dereferenceable
assumptions after vectorization: https://clang.godbolt.org/z/jGqcx3sbs
PR: https://github.com/llvm/llvm-project/pull/166945
This patch enables `FoldOpIntoSelect` and `foldOpIntoPhi` for the cases
when Op's second parameter is a non-constant.
It doesn't seem to bring significant improvements, but the compile-time
impact is negligible.
This extends the `ptradd x, ptrtoint(y) - ptrtoint(x)` to `y`
InstCombine fold to support ptrtoaddr. In the case where x and y have
the same underlying object, this is handled by InstSimplify already. If
the underlying object may differ, the replacement can only be performed
if provenance does not matter.
For pointers with non-address bits we need to be careful here, because
the pattern will return a pointer with the non-address bits of x and the
address bits of y. As such, uses in ptrtoaddr are safe to replace, but
uses in ptrtoint are not. Whether uses in icmp are safe to replace
depends on the outcome of the pending discussion on icmp semantics (I'll
adjust this in https://github.com/llvm/llvm-project/pull/163936 if/when
that lands).
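The concern about non-address bits can be seen in a toy model. The sketch below is an assumption-laden illustration, not LLVM's representation: a pointer is a `(meta, addr)` pair, `ptradd` only changes the address, and `ptrtoint` concatenates both parts with an assumed 64-bit address width.

```python
from collections import namedtuple

# Hypothetical pointer model: non-address bits plus address bits.
Ptr = namedtuple("Ptr", ["meta", "addr"])

ADDR_BITS = 64  # assumed address width
ADDR_MASK = (1 << ADDR_BITS) - 1


def ptrtoaddr(p):
    return p.addr


def ptrtoint(p):
    return (p.meta << ADDR_BITS) | p.addr


def ptradd(p, off):
    """Offset the address; the non-address bits of p are kept."""
    return Ptr(p.meta, (p.addr + off) & ADDR_MASK)


x = Ptr(meta=1, addr=0x1000)
y = Ptr(meta=2, addr=0x2040)

# ptradd x, ptrtoaddr(y) - ptrtoaddr(x)
result = ptradd(x, ptrtoaddr(y) - ptrtoaddr(x))
```

In this model `result` has the address of `y` but the non-address bits of `x`, so `ptrtoaddr(result) == ptrtoaddr(y)` holds while `ptrtoint(result) == ptrtoint(y)` does not, matching the distinction between safe and unsafe uses drawn above.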
This patch improves constant folding through `llvm.vector.insert`. It
does not change anything for fixed-length vectors (which can already be
folded to ConstantVectors for these cases), but folds scalable vectors
that otherwise would not be folded.
These folds preserve the destination vector (which could be undef or
poison), giving targets more freedom in lowering the operations.
Previously, cross-lane operations were disallowed here, but they are
only problematic if the `select` condition is a vector, as the input of
the operation is not simply one of the arms of the phi/select.
Converting a vector float op into a vector int op may be non-profitable,
especially for targets where the float op for a given type is legal, but
the integer op is not.
We could of course also try to address this via a reverse transform in
the backend, but I don't think it's worth the bother, given that vectors
were never the intended use case for this transform in the first place.
Fixes https://github.com/llvm/llvm-project/issues/162749.
Make the choice clearer from the API name. Otherwise it would be very easy to just "not bother" with `MDFrom`, especially since it is optional and follows the optional `Name`, but then we'd have a harder time detecting that the metadata was effectively dropped.
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.
While doing this, I noticed some other variables in the global namespace
and moved them as well.
Split GEPs that have more than one non-zero offset into two GEPs. This
is in preparation for the ptradd migration, which can only represent
GEPs with a single offset.
This also enables CSE and LICM of the common base.
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
optionally preserving cast flags.
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
Fold:
%gep1 = ptradd %p, C1
%gep2 = ptradd %gep1, %x
%res = ptradd %gep2, C2
To:
%gep = ptradd %p, %x
%res = ptradd %gep, C1+C2
An alternative to this would be to generally canonicalize constant
offset GEPs to the right. I found the results of doing that somewhat
mixed, so I'm going for this more obviously beneficial change for now.
Proof for flag preservation on reassociation:
https://alive2.llvm.org/ce/z/gmpAMg
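Ignoring flags, the reassociation is plain modular arithmetic on the address. The Python sketch below assumes a 64-bit address space and models `ptradd` as wrapping addition; it says nothing about `inbounds`/`nuw` preservation, which is what the Alive2 proof covers. The function names are made up for the example.

```python
MASK = (1 << 64) - 1  # assumed 64-bit address space


def ptradd(p, off):
    """Wrapping address arithmetic; flags are not modeled."""
    return (p + off) & MASK


def before(p, x, c1, c2):
    gep1 = ptradd(p, c1)
    gep2 = ptradd(gep1, x)
    return ptradd(gep2, c2)


def after(p, x, c1, c2):
    gep = ptradd(p, x)
    return ptradd(gep, (c1 + c2) & MASK)
```

Both forms compute the same address for any inputs, including negative constants, but the second needs only one constant-offset step.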
GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`.
Canonicalize these to `gep %T, ptr %p, i64 %idx`.
This enables transforms that only support one GEP index to work and
improves CSE.
Various transforms were recently hardened to make sure they still work
without the leading index.
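The two forms address the same byte because the leading zero index contributes nothing to the offset. A minimal sketch of the offset computation, assuming a simple model where each index is multiplied by the size of the type it steps over (helper names and sizes are illustrative):

```python
def gep_offset(strides, indices):
    """Byte offset of a GEP: sum of index * stride over all indices."""
    return sum(s * i for s, i in zip(strides, indices))


def array_gep_offset(n, elem_size, idx):
    """gep [N x T], ptr %p, i64 0, i64 %idx"""
    # The first index steps over whole [N x T] arrays, the second over T.
    return gep_offset([n * elem_size, elem_size], [0, idx])


def canonical_gep_offset(elem_size, idx):
    """gep T, ptr %p, i64 %idx"""
    return gep_offset([elem_size], [idx])
```

For instance, with `N = 10` and a 4-byte element, both functions return `idx * 4` for every `idx`.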
InstCombine tries to convert `freeze(inst(op))` to `inst(freeze(op))`.
Currently, this is limited to the case where a single operand needs to
be frozen, and all other operands are guaranteed non-poison.
This patch allows the transform even if multiple operands need to be
frozen. The existing limitation makes sure that we do not increase the
total number of freezes, but it also means that we may fail to
eliminate freezes (via poison flag dropping) and may prevent
optimizations (as analysis generally can't look past freeze). Overall, I
believe that aggressively pushing freezes upwards is more beneficial
than harmful.
This is the middle-end version of #145939 in DAGCombine (which is
currently reverted for SDAG-specific reasons).
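Why pushing freeze into several operands is still a valid refinement can be illustrated with a small model. The sketch below assumes 3-bit integers, a sentinel for poison, and an `and` instruction without poison-generating flags; it compares the set of values each form may produce. The names are hypothetical.

```python
BITS = 3  # assumed small width so all values can be enumerated
ALL = set(range(1 << BITS))
POISON = "poison"  # sentinel standing in for a poison value


def freeze_values(v):
    """Values freeze(v) may produce: anything for poison, else v itself."""
    return set(ALL) if v is POISON else {v}


def and_op(a, b):
    """Bitwise and; poison in either operand makes the result poison."""
    return POISON if a is POISON or b is POISON else a & b


def freeze_after(a, b):
    """freeze(and(a, b))"""
    return freeze_values(and_op(a, b))


def freeze_before(a, b):
    """and(freeze(a), freeze(b))"""
    return {x & y for x in freeze_values(a) for y in freeze_values(b)}
```

For every combination of operands, `freeze_before(a, b)` is a subset of `freeze_after(a, b)`, so the transformed form never produces a value the original could not. It can also be more precise: `freeze_before(POISON, 1)` is `{0, 1}`, whereas `freeze_after(POISON, 1)` allows all eight values, which hints at why analyses benefit from the freeze being moved upwards.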
Split GEPs that have more than one variable index into two. This is in
preparation for the ptradd migration, which will not support multi-index
GEPs.
This also enables the split off part to be CSEd and LICMed.
When expanding a GEP chain, if there is a chain of one-use GEPs followed
by a multi-use GEP, rewrite the multi-use GEP to include the one-use
GEPs' offsets.
This means the offsets from the one-use GEPs can be reused by the offset
expansion without additional cost (from computing them again with a
different reassociation).
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or two more commits.
SROA and a few other facilities use generic-lambdas and some overloaded
functions to deal with both intrinsics and debug-records at the same time.
As part of stripping out intrinsic support, delete a swathe of this code
from things in the Utils directory.
This is a large diff, but is mostly about removing functions that were
duplicated during the migration to debug records. I've taken a few
opportunities to replace comments about "intrinsics" with "records",
and replace generic lambdas with plain lambdas (I believe this makes
it more readable).
All of this is chipping away at intrinsic-specific code until we get to
removing parts of findDbgUsers, which is the final boss -- we can't
remove that until almost everything else is gone.
To push a freeze through an instruction, only one operand may produce
poison. However, this currently fails for identical operands which are
treated as separate. This patch fixes this by treating them as a single
operand.
With the advent of intrinsic-less debug-info, we no longer need to
scatter calls to getPrevNonDebugInstruction around the codebase. Remove
most of them -- there are one or two that have the "SkipPseudoOp" flag
turned on, however they don't seem to be in positions where skipping
anything would be reasonable.
If we're expanding offsets for a chain of GEPs in RewriteGEPs mode, we
should also rewrite GEPs that have one use themselves, but are kept
alive by a multi-use GEP later in the chain.
For the sake of simplicity, I've changed this to just skip the one-use
condition entirely (which will perform an unnecessary rewrite of a no
longer used GEP, but shouldn't otherwise matter).