llvm-project

Author	SHA1	Message	Date
Luke Lau	fe4f6c1a58	[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882 ) When vectorizing with predication some loops that were previously vectorized without zvfhmin/zvfbfmin will no longer be vectorized because the masked load/store or gather/scatter cost returns illegal. This is due to a discrepancy where for these costs we check isLegalElementTypeForRVV but for regular memory accesses we don't. But for bf16 and f16 vectors we don't actually need the extension support for loads and stores, so this adds a new function which takes this into account. For regular memory accesses we should probably also e.g. return an invalid cost for i64 elements on zve32x, but it doesn't look like we have tests for this yet. We also should probably not be vectorizing these bf16/f16 loops to begin with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this is due to the scalar costs being too cheap. I've added tests for this in a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.	2025-07-28 22:59:49 +08:00
Min-Yih Hsu	6824bcfdb4	[IA] Relax the requirement of having ExtractValue users on deinterleave intrinsic (#148716 ) There are cases where InstCombine / InstSimplify might sink extractvalue instructions that use a deinterleave intrinsic into successor blocks, which prevents InterleavedAccess from kicking in because the current pattern requires the deinterleave intrinsic to be used by extractvalue. However, this requirement is a bit too strict, since we could have just replaced the users of the deinterleave intrinsic with whatever is generated by the target TLI hooks.	2025-07-16 13:46:02 -07:00
Philip Reames	e5bc7e7df3	[RISCV][IA] Always generate masked versions of segment LD/ST [nfc-ish] (#148905 ) Goal is to be able to eventually merge some of these code paths. Having the mask operand should get dropped cleanly via pattern match.	2025-07-15 13:02:24 -07:00
Luke Lau	8e4fb4bead	[IA] Remove recursive [de]interleaving support (#143875 ) Now that the loop vectorizer emits just a single llvm.vector.[de]interleaveN intrinsic after #141865, we can remove the need to recognise recursively [de]interleaved intrinsics. No in-tree target currently has instructions to emit an interleaved access with a factor > 8, and I'm not aware of any other passes that will emit recursive interleave patterns, so this code is effectively dead. Some tests have been converted from the recursive form to a single intrinsic, and some others were deleted that are no longer needed, e.g. to do with the recursive tree. This closes off the work started in #139893.	2025-06-25 12:29:45 +01:00
Luke Lau	6d88343662	[IA] Add support for [de]interleave{4,6,8} (#141512 ) This teaches the interleaved access pass to lower the intrinsics for factors 4, 6 and 8 added in #139893 to target intrinsics. Because factors 4 and 8 could either have been recursively [de]interleaved or have just been a single intrinsic, we need to check that it's the former before reshuffling around the values via interleaveLeafValues. After this patch, we can teach the loop vectorizer to emit a single interleave intrinsic for factors 2 through to 8, and then we can remove the recursive interleaving matching in the interleaved access pass.	2025-05-28 11:44:41 +01:00
Harald van Dijk	86d1d4eacb	[RISC-V] Allow intrinsics to be used with any pointer type. (#139634 ) RISC-V does not use address spaces and leaves them available for user code to make use of. Intrinsics, however, required pointer types to use the default address space, complicating handling during lowering to handle non-default address spaces. When the intrinsics are overloaded, this is handled without extra effort. This commit does not yet update Clang builtin functions to also permit pointers to non-default address spaces.	2025-05-23 09:40:27 +01:00
Luke Lau	09c3d1432b	[IA] Add support for [de]interleave{3,5,7} (#139373 ) This adds support for lowering deinterleave and interleave intrinsics for factors 3, 5 and 7 into target specific memory intrinsics. Notably this doesn't add support for handling higher factors constructed from interleaving interleave intrinsics, e.g. factor 6 from interleave3 + interleave2. I initially tried this but it became very complex very quickly. For example, because there are now multiple factors involved, interleaveLeafValues is no longer symmetric between interleaving and deinterleaving. There are then also two ways of representing a factor 6 deinterleave: it can be done as either 1 deinterleave3 and 3 deinterleave2s OR 1 deinterleave2 and 2 deinterleave3s. I'm not sure the complexity of supporting arbitrary factors is warranted given how we only need to support a small number of factors currently: SVE only needs factors 2,3,4 whilst RVV only needs 2,3,4,5,6,7,8. My preference would be to just add an interleave6 and deinterleave6 intrinsic to avoid all this ambiguity, but I'll defer this discussion to a later patch.	2025-05-22 04:32:47 +01:00
Luke Lau	55c48ee6f1	[RISCV] Ignore interleaved accesses with non-default address spaces (#139698 ) This fixes a crash introduced in https://github.com/llvm/llvm-project/pull/137045#issuecomment-2872208568 where we don't have overloaded pointer types for segmented load/store intrinsics. This should be temporary until #139634 lands and overloads the pointer type for these	2025-05-13 14:40:16 +01:00
Min-Yih Hsu	808a5f15d7	[RISCV] Remove `riscv.segN.load/store` in favor of their mask variants (#137045 ) RISCVVectorPeepholePass would replace instructions with an all-ones mask with their unmasked variant, so there isn't really a point in keeping separate versions of the intrinsics. Note that `riscv.segN.load/store.mask` does not take pointer type (i.e. address space) as part of its overloading type signature, because RISC-V doesn't really use address spaces other than the default one.	2025-05-08 09:27:26 -07:00
Pedro Lobo	34a3c2302b	[AArch64][SVE] Change placeholder from `undef` to `poison` (#130519 ) Default to a `poison` vector when calling `@llvm.vector.insert`.	2025-03-11 21:51:28 +00:00
Min-Yih Hsu	bc74a1edbe	[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863 ) Previously, AArch64 used pattern matching to support llvm.vector.(de)interleave of 2 and 4; RISC-V only supported (de)interleave of 2. This patch consolidates the logic in these two targets by factoring out the common factor calculations into the InterleavedAccess pass.	2025-01-23 15:27:51 -08:00
Hassnaa Hamdi	0209739597	[InterleavedAccessPass]: Ensure that dead nodes get erased only once (#122643 ) Use SmallSetVector instead of SmallVector to avoid duplication, so that dead nodes get erased/deleted only once.	2025-01-14 09:34:27 +00:00
Paul Walker	56c091ea71	[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (#116856 ) This brings the printing of scalable vector constant splats in line with their fixed length counterparts.	2024-11-21 11:21:12 +00:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Brandon Wu	22f98740b6	[llvm][RISCV] Support RISCV vector tuple CodeGen and Calling Convention (#97995 ) This patch handles target lowering and calling convention. For target lowering, the vector tuple type represented as multiple scalable vectors is now changed to a single `MVT`, and each `MVT` has a corresponding register class. The loads/stores of vector tuples are handled the same way but need additional vector insert/extract instructions to get the sub-register group. The inline assembly constraint for the vector tuple type can directly be modeled as "vr", which is identical to normal vector registers. For calling convention, it no longer needs an alternative algorithm to handle register allocation, which makes the code easier to maintain and read. Stacked on https://github.com/llvm/llvm-project/pull/97994	2024-08-31 19:28:36 +08:00
Hassnaa Hamdi	3176f255c9	[IA][AArch64]: Construct (de)interleave4 out of (de)interleave2 (#89276 ) - [AArch64]: TargetLowering is updated to spot load/store (de)interleave4 like sequences using PatternMatch, and emit equivalent sve.ld4 and sve.st4 intrinsics.	2024-08-12 17:23:00 +01:00
Sander de Smalen	1015f51dd9	[AArch64] NFC: Rename -force-streaming-compatible-sve to -force-streaming-compatible (#92774 ) The behaviour of the flag should be equivalent to __arm_streaming_compatible. At the moment, the name suggests that '-force-streaming-compatible-sve' on its own (i.e. without specifying `+sve`) enables the compiler to use the streaming-compatible subset of SVE instructions, but the semantics merely are that the function can be called with either PSTATE.SM=0 or PSTATE.SM=1.	2024-05-22 07:58:54 +01:00
Harald van Dijk	8fd838a8c4	[RISC-V] Limit vscale interleaving to addrspace 0. (#91573 ) The vlseg and vsseg intrinsic functions are not overloaded on pointer type, so cannot handle non-default address spaces. This fixes an error we see after #90583.	2024-05-09 19:15:42 +01:00
Maciej Gabka	bfc0317153	Move several vector intrinsics out of experimental namespace (#88748 ) This patch is moving out following intrinsics: * vector.interleave2/deinterleave2 * vector.reverse * vector.splice from the experimental namespace. All these intrinsics exist in LLVM for more than a year now, and are widely used, so should not be considered as experimental.	2024-04-29 10:16:45 +01:00
Allen	8aa7e378de	[InterleavedAccessPass] Get round the unsupported large scalarize vectors (#88643 ) When building with the option -msve-vector-bits=512, the return value of Subtarget->getMinSVEVectorSizeInBits() is 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses, so it creates an invalid llvm.aarch64.sve.ld2.sret.nxv4f64, which needs to be split. Unfortunately, the related custom splitting is not supported yet. Fix https://github.com/llvm/llvm-project/issues/88247	2024-04-16 09:00:03 +08:00
Jim Lin	9e1ad3cff6	[RISCV] Remove blank lines at the end of testcases. NFC.	2024-01-02 13:13:04 +08:00
paperchalice	cd6e462d01	[CodeGen] Port `InterleavedAccess` to new pass manager (#74904 )	2023-12-10 19:15:51 +08:00
Skwoogey	a700a520f8	[InterleavedAccessPass] Avoid optimizing load instructions if it has dead binop users (#71339 ) If a load instruction qualifies to be optimized by the InterleavedAccess pass, but also has a dead binop user, this will lead to a crash. The binop instruction will not be deleted, because normally it would be deleted through its users, but it has none. Later on, deleting the load instruction will fail because it still has uses.	2023-11-07 08:08:49 +00:00
Visoiu Mistrih Francis	cc9ba5600e	[test] -march -> -mtriple (#67741 ) Similar to 806761a	2023-09-29 10:43:23 -07:00
Graham Hunter	e49d04e760	[AArch64][CodeGen] Lower (de)interleave2 intrinsics to ld2/st2 The InterleavedAccess pass currently matches (de)interleaving shufflevector instructions with loads or stores, and calls into target lowering to generate ldN or stN instructions. Since we can't use shufflevector for scalable vectors (besides a splat with zeroinitializer), we have interleave2 and deinterleave2 intrinsics. This patch extends InterleavedAccess to recognize those intrinsics and if possible replace them with ld2/st2. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D146218	2023-06-26 14:39:04 +01:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Luke Lau	ec26c9cdc0	[RISCV] Lower fixed length interleaved accesses via vssegN/vlsegN This enables the interleaved access pass on O1 and above, and causes interleaving/deinterleaving shuffles of fixed length vectors with stores/loads to be lowered into vssegN/vlsegN. We need to be careful and make sure that we only lower vsseg/vlseg whenever we know the fixed vector type will fit within the minimum vlen, and that the interleaving factor is supported for the given LMUL. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D145085	2023-04-02 16:47:44 +01:00
Luke Lau	80f3be9603	Revert "[RISCV] Lower fixed length interleaved accesses via vssegN/vlsegN" This reverts commit b95913e8c3a3521b85d689a358e620d89a4e83de.	2023-04-02 15:56:24 +01:00
Luke Lau	b95913e8c3	[RISCV] Lower fixed length interleaved accesses via vssegN/vlsegN This enables the interleaved access pass on O1 and above, and causes interleaving/deinterleaving shuffles of fixed length vectors with stores/loads to be lowered into vssegN/vlsegN. We need to be careful and make sure that we only lower vsseg/vlseg whenever we know the fixed vector type will fit within the minimum vlen, and that the interleaving factor is supported for the given LMUL. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D145085	2023-04-02 15:20:21 +01:00
David Green	7b6fae42f7	[InterleaveAccess] Check that binop shuffles have an undef second operand It is expected that shuffles that we hoist through binops only have a single vector operand, the other being undef/poison. The checks for isDeInterleaveMaskOfFactor check that all the elements come from inside the first vector, but with non-canonical shuffles the second operand could still have a value. Add a quick check to make sure it is UndefValue as expected, to make sure we don't run into problems with BinOpShuffles not using BinOps. Fixes #61749 Differential Revision: https://reviews.llvm.org/D147306	2023-03-31 15:38:27 +01:00
Nikita Popov	37188448cf	[InterleavedAccess] Convert tests to opaque pointers (NFC)	2022-12-27 11:00:50 +01:00
Bradley Smith	a83aa33d1b	[IR] Move vector.insert/vector.extract out of experimental namespace These intrinsics are now fundamental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace. Differential Revision: https://reviews.llvm.org/D127976	2022-06-27 10:48:45 +00:00
Mindong Chen	495e258fd7	[AArch64][SVE] Add FP types to the supported SVE structure load/stores vector type list This adds FP type support to the SVE Container type list as a supplement to D112303. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113333	2021-11-08 22:29:08 +08:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
David Green	fda8b4714e	[InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass The Interleave Access pass will convert shuffle(binop(load, load)) to binop(shuffle(load), shuffle(load)), in order to create more interleaving load patterns (VLD2/3/4) that might have been messed up by instcombine. As shown in D104247 we were missing copying IR flags to the new instruction though, which should just be kept the same as the original instruction. Differential Revision: https://reviews.llvm.org/D104255	2021-06-17 09:53:33 +01:00
Juneyoung Lee	9f2d9364b0	[CodeGen] Update transformations to use poison for shufflevector/insertelem's initial vector elem This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element. The change in CodeGenPrepare.cpp is fine because the mask of shufflevector should be always zero. It doesn't touch the second element (which is poison). The change in InterleavedAccessPass.cpp is also fine because the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than the size of the first vector operand. This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad. It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check the mask is of the desired form. isDeInterleaveMask has the check that a+km is smaller than the vector size. To check my understanding, I added an assertion & added a test to show that this optimization doesn't fire in such a case. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D94056	2021-01-10 18:03:51 +09:00
Florian Hahn	ed936aad78	[InterleavedAccess] Return correct 'modified' status. Both tryReplaceExtracts and replaceBinOpShuffles may modify the IR, even if no interleaved loads are generated, but currently the pass pretends no changes were made. This patch updates the pass to return true if either of the functions made any changes. In case of tryReplaceExtracts, changes are made if there are any Extracts and true is returned. `replaceBinOpShuffles` always makes changes if BinOpShuffles is not empty. It also always returned true, so I went ahead and change it to just `replaceBinOpShuffles`. Fixes PR48208. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D93997	2021-01-04 15:49:47 +00:00
Juneyoung Lee	49c2d703d3	[X86] Make deinterleave8bitStride3 use unary CreateShuffleVector This patch makes X86InterleavedAccessGroup::deinterleave8bitStride3 use the unary CreateShuffleVector. This is a continuation of D93923. There were a few missing replacements. IIUC, this patch does not cause change in the generated programs' semantics because the function inserts shufflevectors that only choose elements from the first vector. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93993	2021-01-04 02:10:51 +09:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Juneyoung Lee	ae6e89327b	Precommit tests that have poison as shufflevector's placeholder This commit copies existing tests at llvm/Transforms containing 'shufflevector X, undef' and replaces them with 'shufflevector X, poison'. The new copied tests have -inseltpoison.ll suffix at its file name (as db7a2f347f132b3920415013d62d1adfb18d8d58 did) See https://reviews.llvm.org/D93793 Test files listed using grep -R -E "^[^;]shufflevector <.> ., <.> undef" | cut -d":" -f1 | uniq Test files copied & updated using file_org=llvm/test/Transforms/$1 if [[ "$file_org" = -inseltpoison.ll ]]; then file=$file_org else file=${file_org%.ll}-inseltpoison.ll if [ ! -f $file ]; then cp $file_org $file fi fi sed -i -E 's/^([^;])shufflevector <(.)> (.), <(.)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q if [ "$?" == 1 ]; then echo "$file : should be manually updated" # The test is manually updated exit 1 fi python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file	2020-12-29 17:09:31 +09:00
Simon Pilgrim	4f175dce80	[InterleavedAccess] Remove unused check-prefixes Just use default CHECK	2020-11-09 13:12:40 +00:00
David Green	a4b6b1e1c8	[InterleaveAccess] Recognise Interleave loads through binary operations Instcombine will currently sink identical shuffles through vector binary operations. This is probably generally useful, but can break up the code pattern we use to represent an interleaving load group. This patch reverses that in the InterleaveAccessPass to re-recognise the pattern of shuffles sunk past binary operations and folds them back if an interleave group can be created. Differential Revision: https://reviews.llvm.org/D89489	2020-10-29 09:13:23 +00:00
David Green	ecd4f3fccb	[AArch64] Additional Interleaving Access test. NFC	2020-10-28 08:00:05 +00:00
Guillaume Chatelet	5b84ee4f61	[Alignment] Fix misaligned interleaved loads Summary: Tentatively fixing https://bugs.llvm.org/show_bug.cgi?id=45957 Reviewers: craig.topper, nlopes Subscribers: hiraditya, llvm-commits, RKSimon, jdoerfert, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D80276	2020-05-27 12:12:22 +00:00
Guillaume Chatelet	6e1eff7858	[NFC] Updating tests Summary: Updating IR now that alignment is explicitly set. This is a prerequisite to D80276. Reviewers: efriedma Subscribers: llvm-commits, craig.topper Tags: #llvm Differential Revision: https://reviews.llvm.org/D80549	2020-05-27 12:02:46 +00:00
Eli Friedman	11aa3707e3	StoreInst should store Align, not MaybeAlign This is D77454, except for stores. All the infrastructure work was done for loads, so the remaining changes necessary are relatively small. Differential Revision: https://reviews.llvm.org/D79968	2020-05-15 12:26:58 -07:00
Eli Friedman	4532a50899	Infer alignment of unmarked loads in IR/bitcode parsing. For IR generated by a compiler, this is really simple: you just take the datalayout from the beginning of the file, and apply it to all the IR later in the file. For optimization testcases that don't care about the datalayout, this is also really simple: we just use the default datalayout. The complexity here comes from the fact that some LLVM tools allow overriding the datalayout: some tools have an explicit flag for this, some tools will infer a datalayout based on the code generation target. Supporting this properly required plumbing through a bunch of new machinery: we want to allow overriding the datalayout after the datalayout is parsed from the file, but before we use any information from it. Therefore, IR/bitcode parsing now has a callback to allow tools to compute the datalayout at the appropriate time. Not sure if I covered all the LLVM tools that want to use the callback. (clang? lli? Misc IR manipulation tools like llvm-link?). But this is at least enough for all the LLVM regression tests, and IR without a datalayout is not something frontends should generate. This change had some sort of weird effects for certain CodeGen regression tests: if the datalayout is overridden with a datalayout with a different program or stack address space, we now parse IR based on the overridden datalayout, instead of the one written in the file (or the default one, if none is specified). This broke a few AVR tests, and one AMDGPU test. Outside the CodeGen tests I mentioned, the test changes are all just fixing CHECK lines and moving around datalayout lines in weird places. Differential Revision: https://reviews.llvm.org/D78403	2020-05-14 13:03:50 -07:00
David Green	3a6eb5f160	[ARM] Disable VLD4 under MVE Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109	2019-12-08 10:37:29 +00:00
David Green	882f23caea	[ARM] MVE interleaving load and stores. Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore. The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases. We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though. Differential Revision: https://reviews.llvm.org/D69392	2019-11-19 18:37:30 +00:00

81 Commits