The previous position of llvm.protected.field.ptr lowering for loads
and stores was problematic as it not only inhibited optimizations such
as DSE (as stores to a llvm.protected.field.ptr were not considered to
must-alias stores to the non-protected.field pointer) but also required
changes to other optimization passes to avoid transformations that would
reduce PFP coverage.
Address this by moving the load/store part of the lowering to
InstCombine, where it will run earlier than the PFP-breaking and
AA-relying transformations. The deactivation symbol, null comparison
and EmuPAC parts of the lowering remain in PreISelLowering.
Now that the transformation inhibitions are no longer needed, remove them
(i.e. partially revert #151649, and revert #182976).
This change resulted in a 2.4% reduction in Fleetbench .text size and
the following improvements to PFP performance overhead for BM_PROTO_Arena
on various microarchitectures:
before after
Apple M2 Ultra 3.5% 3.3%
Google Axion C4A 3.3% 2.9%
Google Axion N4A 2.7% 2.2%
Reviewers: fmayer, nikic, vitalybuka
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/186548
Fixes#186334
Similar to #176556 , add the missing result type check in
`foldReversedIntrinsicOperands()`. This prevents `CreateVectorReverse()`
from being applied to struct-returning intrinsics.
The optimization introduced in #183329 incorrectly assumed that any
extraction from a scalable active lane mask used a scalable index. When
the result of a `llvm.vector.extract` is a fixed-width vector, the index
should not be multiplied by vscale.
This PR adds a check to ensure the index is only scaled by VScaleMin
when the return type of the extraction is a scalable vector, not
fixed-width.
Fixes#185271
The current LangRef wording says that the sign of a zero argument or
result is "insignificant", which is not really clear on what this means.
Alive2 models this as non-deterministically flipping the zero sign bits
for both inputs and outputs.
This PR proposes to specify this flag as non-deterministically flipping
inputs only. A consequence of this is that fabs is guaranteed to have an
unset sign bit even if the input is zero (that is,
https://alive2.llvm.org/ce/z/irCftQ is no longer a valid transform), and
that the copysign result sign only depends on the second operand (that
is, https://alive2.llvm.org/ce/z/VnHdfh is no longer a valid transform).
These rules are still more liberal than we'd really like them to be, but
at least avoid some of the issues with nsz.
This is based on the discussion in:
https://discourse.llvm.org/t/rfc-clarify-the-behavior-of-fp-operations-on-bit-strings-with-nsz-flag/85981
When extracting a subvector from the result of a get_active_lane_mask, return
a constant zero vector if it can be proven that all lanes will be inactive.
For example, the result of the extract below will be a subvector where
every lane is inactive if X & Y are const, and `Y * VScale >= X`:
vector.extract(get.active.lane.mask(Start, X), Y)
Fold: minmax(sub X, Z , sub Y, Z) -> sub minmax(X, Y), Z
When both sub instructions have no-wrap flags and share the same RHS
operand, we can fold:
smin (sub nsw X, Z), (sub nsw Y, Z) -> sub nsw (smin X, Y), Z
smax (sub nsw X, Z), (sub nsw Y, Z) -> sub nsw (smax X, Y), Z
umin (sub nuw X, Z), (sub nuw Y, Z) -> sub nuw (umin X, Y), Z
umax (sub nuw X, Z), (sub nuw Y, Z) -> sub nuw (umax X, Y), Z
This is valid because subtraction by a common value preserves relative
ordering when no signed/unsigned overflow occurs.
Proof: https://alive2.llvm.org/ce/z/n9gwj2
Closes https://github.com/llvm/llvm-project/issues/167059
```llvm
define i1 @src(i1 %0) {
%2 = insertelement <8 x i1> poison, i1 %0, i32 0
%3 = shufflevector <8 x i1> %2, <8 x i1> poison, <8 x i32> zeroinitializer
%4 = tail call i1 @llvm.vector.reduce.add.v8i1(<8 x i1> %3)
ret i1 %4
}
define i1 @tgt(i1 %0) {
ret i1 0
}
```
alive2: https://alive2.llvm.org/ce/z/vejxot
`vector_reduce_add(<n x i1>)` to `Trunc(ctpop(bitcast <n x i1> to in))`
interferes with the `vector_reduce_add(<splat>)` to `mul`, so I
exchanged their order.
Relevant PR: #161020
If only of the operands is one-use, the total number of fpexts stays the
same, but the min/max is performed on a narrowed type. Additionally, the
fpext may fold with a following fptrunc.
Select instructions created from the expansion of an umax intrinsic do
not have profile data even though the function may have profile data.
This is because PGO instrumentation does not support intrinsics.
Assisted-by: gemini
Truncating at 32 bits is now avoided by removing a cast to `unsigned`.
This would also break at 64 bits (with the pointer size > 64 bit), but I
don't think LLVM supports such a
thing.
This reverts commit bc7315749d6d16d0f162f816b3ec0ef7169615f2.
Currently we do not emit lifetimes by default when compiling with
memtag-stack - which means we don't catch use-after-scope (when
compiling without optimization).
This patch fixes that by mirroring ASan, HWASan and MSan, and always
emitting lifetime markers. The patch is based on the changes made in
aeca569.
rdar://163713381
Resolves#169701.
This PR extends the existing InstCombine operation which folds `tbl1`
intrinsics to `shufflevector` if the mask operand is constant. Before
this change, it only handled 64-bit `tbl1` intrinsics with no
out-of-bounds indices. I've extended it to support both 64-bit and
128-bit vectors, and it now handles the full range of `tbl1`-`tbl4` and
`tbx1`-`tbx4`, as long as at most two of the input operands are actually
indexed into.
For the purposes of `tbl`, we need a dummy vector of zeroes if there are
any out-of-bounds indices, and for the purposes of `tbx`, we use the
"fallback" operand. Both of those take up an operand for the purposes of
`shufflevector`.
This works a lot like https://github.com/llvm/llvm-project/pull/169110,
with some added complexity because we need to handle multiple operands.
I raised a couple questions in that PR that still need to be answered:
- Is it correct to check `IsA<UndefValue>` for each mask index, and set
the output mask index to -1 if so? This is later folded to a poison
value, and I'm not sure about the subtle differences between poison and
undef and when you can substitute one for the other. As I mentioned in
#169110, the existing x86 pass (`simplifyX86vpermilvar`) already behaves
this way when it comes to undef.
- How can I write an Alive2 proof for this? It's very hard to find good
documentation or tutorials about Alive2.
As with #169110, most of the regression test cases were generated using
Claude. Everything else was written by me.
Add an ImplicitTrunc argument to ConstantInt::get(), which allows
controlling whether implicit truncation of the value is permitted.
This argument currently defaults to true, but will be switched to false
in the future to guard against signed/unsigned confusion, similar to
what has already happened for APInt.
The argument gives an opt-out for cases where the truncation is
intended. The patch contains one illustrative example where this
happens.
Back when `TargetTransformInfo::instCombineIntrinsic` was added in
https://reviews.llvm.org/D81728, several transforms common to both ARM
and AArch64 were kept in the non-target-specific `InstCombineCalls.cpp`
so they could be shared between the two targets.
I want to extend the transform of the `tbl` intrinsics into static
`shufflevector`s in a similar manner to
https://github.com/llvm/llvm-project/pull/169110 (right now it only
works with a 64-bit `tbl1`, but `shufflevector` should allow it to work
with up to 2 operands, and it can definitely work with 128-bit vectors).
I think separating out the transform into a TTI hook is a prerequisite.
~~I'm not happy about creating an entirely new module for this and
having to wire it up through CMake and everything, but I'm not sure
about the alternatives. If any maintainers can think of a cleaner way of
doing this, I'm very open to it.~~
I've moved the transforms into
`Transforms/Utils/ARMCommonInstCombineIntrinsic.cpp`, which is a lot
simpler.
On RISC-V, some loops that the loop vectorizer vectorizes pre-LTO may
turn out to have the exact trip count exposed after LTO, see #164762.
If the trip count is small enough we can fold away the
@llvm.experimental.get.vector.length intrinsic based on this corollary
from the LangRef:
> If %cnt is less than or equal to %max_lanes, the return value is equal
to %cnt.
This on its own doesn't remove the @llvm.experimental.get.vector.length
in #164762 since we also need to teach computeKnownBits about
@llvm.experimental.get.vector.length and the sub recurrence, but this PR
is a starting point.
I've added this in InstCombine rather than InstSimplify since we may
need to insert a truncation (@llvm.experimental.get.vector.length can
take an i64 %cnt argument, the result is always i32).
Note that there was something similar done in VPlan in #167647 for when
the loop vectorizer knows the trip count.
Deactivation symbol operands are supported in the code generator by
building on the previously added support for IRELATIVE relocations.
Reviewers: ojhunt, fmayer, ahmedbougacha, nikic, efriedma-quic
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/133537
A new InstCombine transform uses this attribute to rewrite calls to a
modular version of the implementation along with llvm.reloc.none
relocations against aspects of the implementation needed by the call.
This change only adds support for the 'float' aspect, but it also builds
the structure needed for others.
See issue #146159
Previously, cross-lane operations were disallowed here, but they are
only problematic if the `select` condition is a vector, as the input of
the operation is not simply one of the arms of the phi/select.
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
Fixes#160066
Whenever we have a vector with all the same elemnts, created with
`insertelement` and `shufflevector` and we sum the vector, we have a
multiplication.
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved(optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.