75 Commits

Author SHA1 Message Date
Min-Yih Hsu
0e9b6d6c8a
[IA][RISCV] Detecting gap mask from a mask assembled by interleaveN intrinsics (#153510)
If the mask of a (fixed-vector) deinterleaved load is assembled by
`vector.interleaveN` intrinsic, any intrinsic arguments that are
all-zeros are regarded as gaps.
2025-08-15 09:22:47 -07:00
Min-Yih Hsu
c202d2f515
[IA][RISCV] Recognizing gap masks assembled from bitwise AND (#153324)
For a deinterleaved masked.load / vp.load, if it's mask, `%c`, is
synthesized by the following snippet:
```
%m = shufflevector %s, poison, <0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3>
%g = <1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0>
%c = and %m, %g
```
Then we can know that `%g` is the gap mask and `%s` is the mask for each
field / component. This patch teaches InterleaveAccess pass to recognize
such patterns
2025-08-14 11:17:50 -07:00
Min-Yih Hsu
ca05058b49
[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612)
Turn the following deinterleaved load patterns
```
%l = masked.load(%ptr, /*mask=*/110110110110, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
%f2 = shufflevector %l, [2, 5, 8, 11]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1111)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
%f2 = poison
```
The mask `110110110110` is regarded as 'gap mask' since it effectively skips the entire third field / component.

Similarly,  turning the following snippet
```
%l = masked.load(%ptr, /*mask=*/110000110000, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1010)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
```

Right now this patch only tries to detect gap mask from a constant mask supplied to a masked.load/vp.load.
2025-08-12 14:08:18 -07:00
Philip Reames
f65b329d70 [IA] Fix a bug introduced by a recent refactoring
I had dropped the check for which intrinsics were supported.  This is
a quick fix to get tree back into an unbroken state, a cleaner change
may follow.
2025-07-26 11:46:15 -07:00
Philip Reames
7c37722f19
[IA] Recognize repeated masks which come from shuffle vectors (#150285)
This extends the fixed vector lowering to support the case where the
mask is formed via shufflevector idiom.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-07-24 21:11:38 -07:00
Philip Reames
992118cb4d
[IA] Add masked.load/store support for shuffle (de)interleave load/store (#150241)
This completes the basic support for masked.laod and masked.store in
InterleaveAccess. The backend already added via the intrinsic lowering
path and the common code structure (in RISCV at least).

Note that this isn't enough to enable in LV yet. We still need support
for recognizing an interleaved mask via a shufflevector in getMask.
2025-07-23 20:26:36 -07:00
Philip Reames
dbd9eae95a
[IA] Support vp.store in lowerinterleavedStore (#149605)
Follow up to 28417e64, and the whole line of work started with 4b81dc7.

This change merges the handling for VPStore - currently in
lowerInterleavedVPStore - into the existing dedicated routine used in
the shuffle lowering path. This removes the last use of the dedicated
lowerInterleavedVPStore and thus we can remove it.

This contains two changes which are functional.

First, like in 28417e64, merging support for vp.store exposes the
strided store optimization for code using vp.store.

Second, it seems the strided store case had a significant missed
optimization. We were performing the strided store at the full unit
strided store type width (i.e. LMUL) rather than reducing it to match
the input width. This became obvious when I tried to use the mask
created by the helper routine as it caused a type incompatibility.

Normally, I'd try not to include an optimization in an API rework, but
structuring the code to both be correct for vp.store and not optimize
the existing case turned out be more involved than seemed worthwhile. I
could pull this part out as a pre-change, but its a bit awkward on it's
own as it turns out to be somewhat of a half step on the possible
optimization; the full optimization is complex with the old code
structure.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-07-22 15:50:17 -07:00
Philip Reames
2e2a8992f9
[IA] Remove resriction on constant masks for shuffle lowering (#150098)
The point of this change is simply to show that the constant check was
not required for correctness. The mixed intrinsic and shuffle tests are
added purely to exercise the code. An upcoming change will add support
for shuffle matching in getMask to support non-constant fixed vector
cases.
2025-07-22 15:21:26 -07:00
Philip Reames
bf86abee3e
[RISCV][IA] Support masked.store of deinterleaveN intrinsic (#149893)
This is the masked.store side to the masked.load support added in
881b3fd.

With this change, we support masked.load and masked.store via the
intrinsic lowering path used primarily with scalable vectors. An
upcoming change will extend the fixed vector (i.a. shuffle vector) paths
in the same manner.
2025-07-21 14:04:03 -07:00
Philip Reames
033df384cd [IA] Naming and style cleanup [nfc]
1) Rename argument II to something slightly more descriptive since we have
more than one IntrinsicInst flowing through.
2) Perform a checked dyn_cast early to eliminate two casts later in each
routine.
2025-07-21 12:46:41 -07:00
Philip Reames
881b3fdfad
[RISCV][IA] Support masked.load for deinterleaveN matching (#149556)
This builds on the whole series of recent API reworks to implement
support for deinterleaveN of masked.load. The goal is to be able to
enable masked interleave groups in the vectorizer once all the codegen
and costing pieces are in place.

I considered including the shuffle path support in this review as well
(since the RISCV target specific stuff should be common), but decided to
separate it into it's own review just to focus attention on one thing at
a time.
2025-07-21 11:07:41 -07:00
Philip Reames
28417e6459
[IA] Support vp.load in lowerInterleavedLoad [nfc-ish] (#149174)
This continues in the direction started by commit 4b81dc7. We
essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicate lowerInterleavedVPLoad and thus we
can remove it.

This isn't quite NFC as the main callback has support for the strided
load optimization whereas the VPLoad specific version didn't. So this
adds the ability to form a strided load for a vp.load deinterleave with
one shuffle used.
2025-07-17 17:29:28 -07:00
Philip Reames
b9adc4a59c
[IA] Use a single callback for lowerInterleaveIntrinsic [nfc] (#148978) (#149168)
This continues in the direction started by commit 4b81dc7. We
essentially merges the handling for VPStore - currently in
lowerInterleavedVPStore which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
2025-07-16 18:09:27 -07:00
Min-Yih Hsu
6824bcfdb4
[IA] Relax the requirement of having ExtractValue users on deinterleave intrinsic (#148716)
There are cases where InstCombine / InstSimplify might sink extractvalue
instructions that use a deinterleave intrinsic into successor blocks,
which prevents InterleavedAccess from kicking in because the current
pattern requires deinterleave intrinsic to be used by extractvalue.
However, this requirement is bit too strict while we could have just
replaced the users of deinterleave intrinsic with whatever generated by
the target TLI hooks.
2025-07-16 13:46:02 -07:00
Philip Reames
4b81dc75f4
[IA] Use a single callback for lowerDeinterleaveIntrinsic [nfc] (#148978)
This essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.

My plan is that if we like this factoring is that I'll do the same for
the intrinsic store paths, and then remove the excess generality from
the shuffle paths since we don't need to support both modes in the
shared VPLoad/Store callbacks. We can probably even fold the VP versions
into the non-VP shuffle variants in the analogous way.
2025-07-15 18:08:57 -07:00
Min-Yih Hsu
ae810dde5d
[IA][NFC] Factoring out helper functions that extract (de)interleaving factors (#148689)
Factoring out and combining `isInterleaveIntrinsic`,
`isDeinterleaveIntrinsic`, and `getIntrinsicFactor` into
`getInterleaveIntrinsicFactor` and `getDeinterleaveIntrinsicFactor`
inside VectorUtils.

NFC.
2025-07-14 12:36:42 -07:00
Philip Reames
7bf439d260 [IA] Partially revert interface change from 4a66ba
As noted in post commit review, the API change here was not required.
I'd apparently confused myself when teasing apart patches from my
development branch.
2025-07-09 12:02:52 -07:00
Philip Reames
4a66ba2a4d
[IA] Support deinterleave intrinsics w/ fewer than N extracts (#147572)
For the fixed vector cases, we already support this, but the
deinterleave intrinsic cases (primary used by scalable vectors) didn't.
Supporting it requires plumbing through the Factor separately from the
extracts, as there can now be fewer extracts than the Factor. Note that
the fixed vector path handles this slightly differently - it uses the
shuffle and indices scheme to achieve the same thing.
2025-07-09 09:41:07 -07:00
Craig Topper
5776915d0f
[InterleavedAccessPass] Add skipFunction check for opt-bisect-limit (#147629) 2025-07-08 21:46:34 -07:00
Luke Lau
8e4fb4bead
[IA] Remove recursive [de]interleaving support (#143875)
Now that the loop vectorizer emits just a single
llvm.vector.[de]interleaveN intrinsic after #141865, we can remove the
need to recognise recursively [de]interleaved intrinsics.

No in-tree target currently has instructions to emit an interleaved
access with a factor > 8, and I'm not aware of any other passes that
will emit recursive interleave patterns, so this code is effectively
dead.

Some tests have been converted from the recursive form to a single
intrinsic, and some others were deleted that are no longer needed, e.g.
to do with the recursive tree.

This closes off the work started in #139893.
2025-06-25 12:29:45 +01:00
Luke Lau
6d88343662
[IA] Add support for [de]interleave{4,6,8} (#141512)
This teaches the interleaved access pass to the lower the intrinsics for
factors 4,6 and 8 added in #139893 to target intrinsics.

Because factors 4 and 8 could either have been recursively
[de]interleaved or have just been a single intrinsic, we need to check
that it's the former it before reshuffling around the values via
interleaveLeafValues.

After this patch, we can teach the loop vectorizer to emit a single
interleave intrinsic for factors 2 through to 8, and then we can remove
the recursive interleaving matching in interleaved access pass.
2025-05-28 11:44:41 +01:00
Luke Lau
09c3d1432b
[IA] Add support for [de]interleave{3,5,7} (#139373)
This adds support for lowering deinterleave and interleave intrinsics
for factors 3 5 and 7 into target specific memory intrinsics.

Notably this doesn't add support for handling higher factors constructed
from interleaving interleave intrinsics, e.g. factor 6 from interleave3
+ interleave2.

I initially tried this but it became very complex very quickly. For
example, because there's now multiple factors involved
interleaveLeafValues is no longer symmetric between interleaving and
deinterleaving. There's then also two ways of representing a factor 6
deinterleave: It can both be done as either 1 deinterleave3 and 3
deinterleave2s OR 1 deinterleave2 and 3 deinterleave3s.

I'm not sure the complexity of supporting arbitrary factors is warranted
given how we only need to support a small number of factors currently:
SVE only needs factors 2,3,4 whilst RVV only needs 2,3,4,5,6,7,8.

My preference would be to just add a interleave6 and deinterleave6
intrinsic to avoid all this ambiguity, but I'll defer this discussion to
a later patch.
2025-05-22 04:32:47 +01:00
Min-Yih Hsu
63fcce6611
[IA][RISCV] Add support for vp.load/vp.store with shufflevector (#135445)
Teach InterleavedAccessPass to recognize vp.load + shufflevector and
shufflevector + vp.store. Though this patch only adds RISC-V support to
actually lower this pattern. The vp.load/vp.store in this pattern
require constant mask.
2025-05-07 15:51:19 -07:00
Luke Lau
f218cd28d4 [IA] Remove unused argument. NFC 2025-04-24 19:08:07 +08:00
Kazu Hirata
599005686a
[llvm] Use *Set::insert_range (NFC) (#132325)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 22:24:06 -07:00
Min-Yih Hsu
005b23bb3b
[IA][RISCV] Support VP loads/stores in InterleavedAccessPass (#120490)
Teach InterleavedAccessPass to recognize the following patterns:
  - vp.store an interleaved scalable vector
  - Deinterleaving a scalable vector loaded from vp.load

Upon recognizing these patterns, IA will collect the interleaved /
deinterleaved operands and delegate them over to their respective
newly-added TLI hooks.

For RISC-V, these patterns are lowered into segmented loads/stores

Right now we only recognized power-of-two (de)interleave cases, in which
(de)interleave4/8 are synthesized from a tree of (de)interleave2.

---------

Co-authored-by: Nikolay Panchenko <nicholas.panchenko@gmail.com>
2025-02-04 11:07:34 -08:00
Min-Yih Hsu
bc74a1edbe
[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863)
Previously, AArch64 used pattern matching to support
llvm.vector.(de)interleave of 2 and 4; RISC-V only supported
(de)interleave of 2.

This patch consolidates the logics in these two targets by factoring out
the common factor calculations into the InterleaveAccess Pass.
2025-01-23 15:27:51 -08:00
Hassnaa Hamdi
0209739597
[InterleavedAccessPass]: Ensure that dead nodes get erased only once (#122643)
Use SmallSetVector instead of SmallVector to avoid duplication,
so that dead nodes get erased/deleted only once.
2025-01-14 09:34:27 +00:00
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00
Craig Topper
829c47f4e0 [InterleavedAccess] Use SmallVectorImpl references. NFC
Instead of repeating SmallVector size in multiple places.
2024-08-28 09:37:59 -07:00
Hassnaa Hamdi
3176f255c9
[IA][AArch64]: Construct (de)interleave4 out of (de)interleave2 (#89276)
- [AArch64]: TargetLowering is updated to spot load/store (de)interleave4 like sequences using PatternMatch,
   and emit equivalent sve.ld4 and sve.st4 intrinsics.
2024-08-12 17:23:00 +01:00
Maciej Gabka
bfc0317153
Move several vector intrinsics out of experimental namespace (#88748)
This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice

from the experimental namespace.

All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.
2024-04-29 10:16:45 +01:00
David Green
18bb175428 [AArch64] Add costs for LD3/LD4 shuffles.
Similar to #87934, this adds costs to the shuffles in a canonical LD3/LD4
pattern, which are represented in LLVM as deinterleaving-shuffle(load). This
likely has less effect at the moment than the ST3/ST4 costs as instcombine will
perform certain transforms without considering the cost.
2024-04-21 13:53:22 +01:00
Jeremy Morse
b9d83eff25
[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736)
These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.

In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>
2024-03-19 16:36:29 +00:00
paperchalice
cd6e462d01
[CodeGen] Port InterleavedAccess to new pass manager (#74904) 2023-12-10 19:15:51 +08:00
Skwoogey
a700a520f8
[InterleavedAccessPass] Avoid optimizing load instructions if it has dead binop users (#71339)
If a load instruction qualifies to be optimized by InterleavedAccess
Pass, but also has a dead binop instruction, this will lead to a crash.

Binop instruction will not be deleted, because normally it would be
deleted through its' users, but it has none. Later on deleting a load
instruction will fail because it still has uses.
2023-11-07 08:08:49 +00:00
Graham Hunter
e49d04e760 [AArch64][CodeGen] Lower (de)interleave2 intrinsics to ld2/st2
The InterleavedAccess pass currently matches (de)interleaving
shufflevector instructions with loads or stores, and calls into
target lowering to generate ldN or stN instructions.

Since we can't use shufflevector for scalable vectors (besides a
splat with zeroinitializer), we have interleave2 and deinterleave2
intrinsics. This patch extends InterleavedAccess to recognize those
intrinsics and if possible replace them with ld2/st2.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D146218
2023-06-26 14:39:04 +01:00
Akshay Khadse
aab0ca3e79 Fix uninitialized scalar members in CodeGen
This change fixes some static code analysis warnings.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148811
2023-04-21 12:22:34 +08:00
David Green
7b6fae42f7 [InterleaveAccess] Check that binop shuffles have an undef second operand
It is expected that shuffles that we hoist through binops only have a single
vector operand, the other being undef/poison. The checks for
isDeInterleaveMaskOfFactor check that all the elements come from inside the
first vector, but with non-canonical shuffles the second operand could still
have a value. Add a quick check to make sure it is UndefValue as expected, to
make sure we don't run into problems with BinOpShuffles not using BinOps.

Fixes #61749

Differential Revision: https://reviews.llvm.org/D147306
2023-03-31 15:38:27 +01:00
Luke Lau
a9d9616c0d [RISCV][NFC] Share interleave mask checking logic
This adds two new methods to ShuffleVectorInst, isInterleave and
isInterleaveMask, so that the logic to check if a shuffle mask is an
interleave can be shared across the TTI, codegen and the interleaved
access pass.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145971
2023-03-14 11:02:52 +00:00
Kazu Hirata
9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
David Green
28b41237e6 [InterleaveAccessPass] Handle multi-use binop shuffles
D89489 added some logic to the interleaved access pass to attempt to
undo the folding of shuffles into binops, that instcombine performs. If
early-cse is run too, the binops may be commoned into a single operation
with multiple shuffle uses. It is still profitable reverse the transform
though, so long as all the uses are shuffles.

Differential Revision: https://reviews.llvm.org/D129419
2022-07-10 17:24:37 +01:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
David Green
c49611f909 Mark CFG as preserved in TypePromotion and InterleaveAccess passes
Neither of these passes modify the CFG, allowing us to preserve DomTree
and LoopInfo across them by using setPreservesCFG.

Differential Revision: https://reviews.llvm.org/D110161
2021-09-22 18:58:00 +01:00
David Green
fda8b4714e [InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass
The Interleave Access pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247 we were missing copying IR flags to the
new instruction though, which should just be kept the same as the
original instruction.

Differential Revision: https://reviews.llvm.org/D104255
2021-06-17 09:53:33 +01:00
Kazu Hirata
0da15ea581 [llvm] Use append_range (NFC) 2021-01-27 23:25:41 -08:00
Juneyoung Lee
9f2d9364b0 [CodeGen] Update transformations to use poison for shufflevector/insertelem's initial vector elem
This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element.

The change in CodeGenPrepare.cpp is fine because the mask of shufflevector should be always zero.
It doesn't touch the second element (which is poison).

The change in InterleavedAccessPass.cpp is also fine becauses the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than
the size of the first vector operand.
This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad.
It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check the mask is the desirable form.
isDeInterleaveMask has the check that a+km is smaller than the vector size.
To check my understanding, I added an assertion & added a test to show that this optimization doesn't fire in such case.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D94056
2021-01-10 18:03:51 +09:00