llvm-project

Author	SHA1	Message	Date
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	5ea6a3fc6d	[VPlan] Compute scalable VF in preheader for induction increment. (#74762) UF * VF is loop invariant and can be computed directly in the preheader. This prepares the code for #74761 and reduces the test changes.	2023-12-08 12:18:31 +00:00
Luke Lau	8d16c6809a	[RISCV] Increase default vectorizer LMUL to 2 After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot. Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that a) doesn't affect register pressure and b) is suitable for the microarchitecture, we would need to teach its heuristics about RISC-V register grouping specifics. Instead this just does the easy, pragmatic thing by changing the default to a safe value that doesn't affect register pressure significantly[1], but should increase throughput and unlock more interleaving. [1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`: LMUL=1: 2573 spills; LMUL=2: 2583 spills; LMUL=4: 2819 spills; LMUL=8: 3256 spills. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D143723	2023-03-23 10:33:50 +00:00
Luke Lau	15f9cf164c	[LV][RISCV] Don't interleave scalable vector loops It's less clear with scalable vectors than fixed length vectors that interleaving exposes more ILP, as scalable vectors can be thought of as a sort of hardware form of interleaving, especially with larger LMULs. This also addresses the unexpected additional unrolling that occurs when using larger LMULs in the loop vectorizer. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144485	2023-02-22 10:15:11 +00:00
Sander de Smalen	5a115452c4	Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. Fixed issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong. Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this. This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.	2023-02-09 09:42:29 +00:00
Sander de Smalen	1f01cdda68	Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices." This patch causes a regression, so reverting it while I investigate the issue. This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.	2023-02-08 15:46:52 +00:00
Sander de Smalen	e6eb84a191	[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices. This is specifically relevant for loops that vectorize using a scalable VF, where the code results in: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %gep = getelementptr ..., i32 %vf.part1 Which InstCombine then changes into: %vscale = call i32 llvm.vscale.i32() %vf.part1 = mul i32 %vscale, 4 %vf.part1.zext = sext i32 %vf.part1 to i64 %gep = getelementptr ..., i64 %vf.part1.zext D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in. It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away. I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D143267	2023-02-07 11:47:51 +00:00
Nikita Popov	5b40015063	[LoopVectorize] Convert some tests to opaque pointers (NFC) For these tests update_test_checks.py had to be rerun.	2022-12-14 15:27:31 +01:00
Roman Lebedev	be51fa4580	[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax	2022-12-05 22:17:30 +03:00
jacquesguan	45bae1be90	[RISCV][test] Add inloop reduction vectorize test. NFC	2022-08-04 15:06:44 +08:00

10 Commits