If the mask of a (fixed-vector) deinterleaved load is assembled by a
`vector.interleaveN` intrinsic, any intrinsic arguments that are
all-zeros are treated as gaps.
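As a rough, hedged sketch (the factor, types, and value names here are
invented for illustration), a factor-3 mask whose last field is a gap could
be built like this:
```
; the all-zeros third argument marks field 2 as a gap
%mask = call <12 x i1> @llvm.vector.interleave3.v12i1(<4 x i1> %m, <4 x i1> %m, <4 x i1> zeroinitializer)
```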
For a deinterleaved masked.load / vp.load, if its mask, `%c`, is
synthesized by the following snippet:
```
%m = shufflevector %s, poison, <0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3>
%g = <1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0>
%c = and %m, %g
```
Then we know that `%g` is the gap mask and `%s` is the mask for each
field / component. This patch teaches the InterleavedAccess pass to
recognize such patterns.
This extends the fixed vector lowering to support the case where the
mask is formed via a shufflevector idiom.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
This completes the basic support for masked.load and masked.store in
InterleavedAccess. The backend support was already added via the
intrinsic lowering path and the common code structure (in RISCV at least).
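For reference, a minimal sketch of the fixed-vector shape involved (the
width, element type, factor-2 layout, and names like %p / %imask are
illustrative only):
```
%v = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr %p, i32 4, <8 x i1> %imask, <8 x i32> poison)
%f0 = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%f1 = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
```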
Note that this isn't enough to enable this in LV yet. We still need support
for recognizing an interleaved mask via a shufflevector in getMask.
Follow-up to 28417e64, and the whole line of work started with 4b81dc7.
This change merges the handling for VPStore - currently in
lowerInterleavedVPStore - into the existing dedicated routine used in
the shuffle lowering path. This removes the last use of
lowerInterleavedVPStore, so it can be deleted.
This contains two functional changes.
First, like in 28417e64, merging support for vp.store exposes the
strided store optimization for code using vp.store.
Second, it seems the strided store case had a significant missed
optimization. We were performing the strided store at the full
unit-strided store type width (i.e. LMUL) rather than reducing it to match
the input width. This became obvious when I tried to use the mask
created by the helper routine, as it caused a type incompatibility.
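To make that concrete, here is a hedged, hand-written sketch (the stride,
mask names, and types are illustrative, not the exact IR the pass emits):
only one field of a factor-2 group carries data, so the store can be
performed at the narrow input width as a strided store:
```
; before: the store is as wide as the full interleaved type; %m8 masks off the gap lanes
%wide = shufflevector <4 x i32> %x, <4 x i32> poison, <8 x i32> <i32 0, i32 poison, i32 1, i32 poison, i32 2, i32 poison, i32 3, i32 poison>
call void @llvm.vp.store.v8i32.p0(<8 x i32> %wide, ptr %p, <8 x i1> %m8, i32 8)

; after: a strided store sized to the <4 x i32> input
call void @llvm.experimental.vp.strided.store.v4i32.p0.i64(<4 x i32> %x, ptr %p, i64 8, <4 x i1> %m4, i32 4)
```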
Normally, I'd try not to include an optimization in an API rework, but
structuring the code to both be correct for vp.store and not optimize
the existing case turned out to be more involved than seemed worthwhile. I
could pull this part out as a pre-change, but it's a bit awkward on its
own as it turns out to be somewhat of a half step towards the possible
optimization; the full optimization is complex with the old code
structure.
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
The point of this change is simply to show that the constant check was
not required for correctness. The mixed intrinsic and shuffle tests are
added purely to exercise the code. An upcoming change will add support
for shuffle matching in getMask to support non-constant fixed vector
cases.
This is the masked.store side to the masked.load support added in
881b3fd.
With this change, we support masked.load and masked.store via the
intrinsic lowering path used primarily with scalable vectors. An
upcoming change will extend the fixed vector (i.e. shufflevector) paths
in the same manner.
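A minimal sketch of the store side of the intrinsic path (factor 2; the
element types and names %a, %b, %p, %mask are made up for illustration):
```
%v = call <vscale x 8 x i32> @llvm.vector.interleave2.nxv8i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
call void @llvm.masked.store.nxv8i32.p0(<vscale x 8 x i32> %v, ptr %p, i32 4, <vscale x 8 x i1> %mask)
```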
1) Rename argument II to something slightly more descriptive since we have
more than one IntrinsicInst flowing through.
2) Perform a checked dyn_cast early to eliminate two casts later in each
routine.
This builds on the whole series of recent API reworks to implement
support for deinterleaveN of masked.load. The goal is to be able to
enable masked interleave groups in the vectorizer once all the codegen
and costing pieces are in place.
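Roughly, the pattern being matched looks like the following sketch (factor 2
and the types/names are just for illustration):
```
%wide = call <vscale x 8 x i32> @llvm.masked.load.nxv8i32.p0(ptr %p, i32 4, <vscale x 8 x i1> %mask, <vscale x 8 x i32> poison)
%dei = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %wide)
```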
I considered including the shuffle path support in this review as well
(since the RISCV target specific stuff should be common), but decided to
separate it into its own review just to focus attention on one thing at
a time.
This continues in the direction started by commit 4b81dc7. It
essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicated lowerInterleavedVPLoad, so it can
be deleted.
This isn't quite NFC, as the main callback has support for the strided
load optimization whereas the VPLoad specific version didn't. So this
adds the ability to form a strided load for a vp.load deinterleave where
only one shuffle is used.
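As a hedged sketch of what that enables (the pointer adjustment, stride,
and types below are illustrative, not the exact IR emitted):
```
; before: factor-2 vp.load where only the odd field is used
%wide = call <8 x i32> @llvm.vp.load.v8i32.p0(ptr %p, <8 x i1> %m8, i32 8)
%odd = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>

; after: load just that field with a stride of one full group
%p1 = getelementptr i32, ptr %p, i64 1
%odd.strided = call <4 x i32> @llvm.experimental.vp.strided.load.v4i32.p0.i64(ptr %p1, i64 8, <4 x i1> %m4, i32 4)
```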
This continues in the direction started by commit 4b81dc7. It
essentially merges the handling for VPStore - currently in
lowerInterleavedVPStore, which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
There are cases where InstCombine / InstSimplify might sink extractvalue
instructions that use a deinterleave intrinsic into successor blocks,
which prevents InterleavedAccess from kicking in because the current
pattern requires the deinterleave intrinsic to be used by extractvalue
instructions in the same block. However, this requirement is a bit too
strict, since we could simply replace the users of the deinterleave
intrinsic with whatever is generated by the target TLI hooks.
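For example (a hand-written sketch; the block and value names are made up),
the sunk form looks like:
```
entry:
  %wide = load <vscale x 8 x i32>, ptr %p
  %dei = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %wide)
  br label %use

use:                                   ; the extractvalues were sunk here
  %f0 = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } %dei, 0
  %f1 = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } %dei, 1
```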
This essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad, which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
My plan is that, if we like this factoring, I'll do the same for
the intrinsic store paths, and then remove the excess generality from
the shuffle paths since we don't need to support both modes in the
shared VPLoad/Store callbacks. We can probably even fold the VP versions
into the non-VP shuffle variants in the analogous way.
Factoring out and combining `isInterleaveIntrinsic`,
`isDeinterleaveIntrinsic`, and `getIntrinsicFactor` into
`getInterleaveIntrinsicFactor` and `getDeinterleaveIntrinsicFactor`
inside VectorUtils.
NFC.
As noted in post commit review, the API change here was not required.
I'd apparently confused myself when teasing apart patches from my
development branch.
For the fixed vector cases, we already support this, but the
deinterleave intrinsic cases (primarily used by scalable vectors) didn't.
Supporting it requires plumbing through the Factor separately from the
extracts, as there can now be fewer extracts than the Factor. Note that
the fixed vector path handles this slightly differently - it uses the
shuffle and indices scheme to achieve the same thing.
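A hedged sketch of the scalable case with gaps (assuming the single
factor-4 intrinsic; the types are illustrative): the factor is 4, but only
two of the fields are ever extracted:
```
%dei = call { <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave4.nxv16i32(<vscale x 16 x i32> %wide)
; fields 1 and 3 are never extracted, i.e. they are gaps
%f0 = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32> } %dei, 0
%f2 = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32> } %dei, 2
```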
Now that the loop vectorizer emits just a single
llvm.vector.[de]interleaveN intrinsic after #141865, we can remove the
need to recognise recursively [de]interleaved intrinsics.
No in-tree target currently has instructions to emit an interleaved
access with a factor > 8, and I'm not aware of any other passes that
will emit recursive interleave patterns, so this code is effectively
dead.
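For reference, a sketch of the two forms (types chosen only for
illustration): the single intrinsic the vectorizer now emits versus the
recursive tree that no longer needs matching:
```
; single-intrinsic form
%d = call { <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave4.nxv16i32(<vscale x 16 x i32> %wide)

; recursive form (now effectively dead)
%half = call { <vscale x 8 x i32>, <vscale x 8 x i32> } @llvm.vector.deinterleave2.nxv16i32(<vscale x 16 x i32> %wide)
%h0 = extractvalue { <vscale x 8 x i32>, <vscale x 8 x i32> } %half, 0
%d0 = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %h0)
; ...and likewise for the other half
```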
Some tests have been converted from the recursive form to a single
intrinsic, and some others were deleted that are no longer needed, e.g.
to do with the recursive tree.
This closes off the work started in #139893.
This teaches the interleaved access pass to lower the intrinsics for
factors 4, 6 and 8 added in #139893 to target intrinsics.
Because factors 4 and 8 could either have been recursively
[de]interleaved or have just been a single intrinsic, we need to check
that it's the former before reshuffling the values via
interleaveLeafValues.
After this patch, we can teach the loop vectorizer to emit a single
interleave intrinsic for factors 2 through to 8, and then we can remove
the recursive interleaving matching in interleaved access pass.
This adds support for lowering deinterleave and interleave intrinsics
for factors 3, 5 and 7 into target specific memory intrinsics.
Notably this doesn't add support for handling higher factors constructed
from interleaving interleave intrinsics, e.g. factor 6 from interleave3
+ interleave2.
I initially tried this but it became very complex very quickly. For
example, because there are now multiple factors involved,
interleaveLeafValues is no longer symmetric between interleaving and
deinterleaving. There are then also two ways of representing a factor 6
deinterleave: it can be done either as 1 deinterleave3 and 3
deinterleave2s, or as 1 deinterleave2 and 2 deinterleave3s.
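A small fixed-vector sketch of the two equivalent decompositions (the
widths are chosen only to make the arithmetic visible):
```
; option A: 1 deinterleave3, then deinterleave2 each result (3 deinterleave2s total)
%t = call { <6 x i32>, <6 x i32>, <6 x i32> } @llvm.vector.deinterleave3.v18i32(<18 x i32> %wide)
%t0 = extractvalue { <6 x i32>, <6 x i32>, <6 x i32> } %t, 0
%a = call { <3 x i32>, <3 x i32> } @llvm.vector.deinterleave2.v6i32(<6 x i32> %t0)
; ...repeated for the other two results

; option B: 1 deinterleave2, then deinterleave3 each half (2 deinterleave3s total)
%u = call { <9 x i32>, <9 x i32> } @llvm.vector.deinterleave2.v18i32(<18 x i32> %wide)
%u0 = extractvalue { <9 x i32>, <9 x i32> } %u, 0
%b = call { <3 x i32>, <3 x i32>, <3 x i32> } @llvm.vector.deinterleave3.v9i32(<9 x i32> %u0)
; ...repeated for the other half
```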
I'm not sure the complexity of supporting arbitrary factors is warranted
given how we only need to support a small number of factors currently:
SVE only needs factors 2,3,4 whilst RVV only needs 2,3,4,5,6,7,8.
My preference would be to just add an interleave6 and deinterleave6
intrinsic to avoid all this ambiguity, but I'll defer this discussion to
a later patch.
Teach InterleavedAccessPass to recognize vp.load + shufflevector and
shufflevector + vp.store. Note that this patch only adds RISC-V support
to actually lower this pattern. The vp.load/vp.store in this pattern
require a constant mask.
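The load side of the shape being recognized, as a rough sketch (constant
all-ones mask, factor 2, fixed EVL; all illustrative):
```
%wide = call <8 x i32> @llvm.vp.load.v8i32.p0(ptr %p, <8 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i32 8)
%even = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%odd = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
```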
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
Teach InterleavedAccessPass to recognize the following patterns:
- vp.store an interleaved scalable vector
- Deinterleaving a scalable vector loaded from vp.load
Upon recognizing these patterns, IA will collect the interleaved /
deinterleaved operands and delegate them over to their respective
newly-added TLI hooks.
For RISC-V, these patterns are lowered into segmented loads/stores.
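For instance, the store side of the pattern looks roughly like this sketch
(factor 2; the types and names %a, %b, %p, %m, %evl are illustrative):
```
%v = call <vscale x 4 x i32> @llvm.vector.interleave2.nxv4i32(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> %v, ptr %p, <vscale x 4 x i1> %m, i32 %evl)
```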
Right now we only recognize power-of-two (de)interleave cases, in which
(de)interleave4/8 are synthesized from a tree of (de)interleave2 operations.
---------
Co-authored-by: Nikolay Panchenko <nicholas.panchenko@gmail.com>
Previously, AArch64 used pattern matching to support
llvm.vector.(de)interleave of 2 and 4; RISC-V only supported
(de)interleave of 2.
This patch consolidates the logic in these two targets by factoring out
the common factor calculations into the InterleavedAccess pass.
- [AArch64]: TargetLowering is updated to spot load/store (de)interleave4-like sequences using PatternMatch,
and emit the equivalent sve.ld4 and sve.st4 intrinsics.
This patch moves the following intrinsics out of the experimental
namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
All these intrinsics have existed in LLVM for more than a year and are
widely used, so they should no longer be considered experimental.
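In IR terms this is just a rename, e.g. (sketch, with illustrative types):
```
; before
%v = call <vscale x 4 x i32> @llvm.experimental.vector.interleave2.nxv4i32(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
; after
%v = call <vscale x 4 x i32> @llvm.vector.interleave2.nxv4i32(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
```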
Similar to #87934, this adds costs to the shuffles in a canonical LD3/LD4
pattern, which are represented in LLVM as deinterleaving-shuffle(load). This
likely has less effect at the moment than the ST3/ST4 costs as instcombine will
perform certain transforms without considering the cost.
These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.
In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.
---------
Merged by: Stephen Tozer <stephen.tozer@sony.com>
If a load instruction qualifies to be optimized by the InterleavedAccess
pass but also has a dead binop instruction, this will lead to a crash.
The binop instruction will not be deleted, because normally it would be
deleted through its users, but it has none. Later on, deleting the load
instruction will fail because it still has uses.
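A minimal sketch of the failure shape (the names and widths are invented
for illustration):
```
%v = load <8 x i32>, ptr %p
%dead = add <8 x i32> %v, %v   ; no users, so it is never erased via its users
%f0 = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%f1 = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
; after lowering the shuffles, %dead still uses %v, so deleting the load fails
```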
The InterleavedAccess pass currently matches (de)interleaving
shufflevector instructions with loads or stores, and calls into
target lowering to generate ldN or stN instructions.
Since we can't use shufflevector for scalable vectors (besides a
splat with zeroinitializer), we have interleave2 and deinterleave2
intrinsics. This patch extends InterleavedAccess to recognize those
intrinsics and if possible replace them with ld2/st2.
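As a sketch of the load side (using the current, non-experimental intrinsic
spelling and illustrative types):
```
%wide = load <vscale x 8 x i32>, ptr %p
%dei = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %wide)
; replaced with a target ld2-style instruction when possible
```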
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D146218
It is expected that shuffles that we hoist through binops only have a single
vector operand, the other being undef/poison. The checks for
isDeInterleaveMaskOfFactor check that all the elements come from inside the
first vector, but with non-canonical shuffles the second operand could still
have a value. Add a quick check to make sure it is UndefValue as expected, to
make sure we don't run into problems with BinOpShuffles not using BinOps.
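To illustrate (a hand-written sketch), the canonical and non-canonical
forms differ only in the second shuffle operand:
```
; canonical: second operand is poison, the mask reads only the first vector
%a = shufflevector <8 x i32> %x, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
; non-canonical: the mask still reads only %x, but the second operand is a real value
%b = shufflevector <8 x i32> %x, <8 x i32> %y, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
```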
Fixes #61749
Differential Revision: https://reviews.llvm.org/D147306
This adds two new methods to ShuffleVectorInst, isInterleave and
isInterleaveMask, so that the logic to check if a shuffle mask is an
interleave can be shared across the TTI, codegen and the interleaved
access pass.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145971
D89489 added some logic to the interleaved access pass to attempt to
undo the folding of shuffles into binops that instcombine performs. If
early-cse is run too, the binops may be commoned into a single operation
with multiple shuffle uses. It is still profitable to reverse the
transform though, so long as all the uses are shuffles.
Differential Revision: https://reviews.llvm.org/D129419
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
Neither of these passes modify the CFG, allowing us to preserve DomTree
and LoopInfo across them by using setPreservesCFG.
Differential Revision: https://reviews.llvm.org/D110161
The InterleavedAccess pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247, we were failing to copy IR flags to the
new instruction, though; they should be kept the same as on the
original instruction.
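For example (a sketch with made-up values; `fast` stands in for whatever
flags the original binop carried):
```
; before
%l0 = load <8 x float>, ptr %p0
%l1 = load <8 x float>, ptr %p1
%sum = fadd fast <8 x float> %l0, %l1
%s = shufflevector <8 x float> %sum, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>

; after: the shuffle moves to the loads, and the new fadd must keep the fast flags
%s0 = shufflevector <8 x float> %l0, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%s1 = shufflevector <8 x float> %l1, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%r = fadd fast <4 x float> %s0, %s1
```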
Differential Revision: https://reviews.llvm.org/D104255
This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelement's initial vector element.
The change in CodeGenPrepare.cpp is fine because the mask of the shufflevector should always be zero.
It doesn't touch the second element (which is poison).
The change in InterleavedAccessPass.cpp is also fine because the mask is of the form <a, a+m, a+2m, ..., a+km> where a+km is smaller than
the size of the first vector operand.
This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad.
It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check that the mask has the desired form.
isDeInterleaveMask has the check that a+km is smaller than the vector size.
To check my understanding, I added an assertion and added a test to show that this optimization doesn't fire in such a case.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D94056