llvm-project

Author	SHA1	Message	Date
Florian Hahn	50b9ca4dda	[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510 ) After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510	2025-09-18 19:25:05 +01:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
Ramkumar Ramachandra	46fcece2a8	[VPlan] Extend CSE to eliminate GEPs (#156699 ) The motivation for this patch is to close the gap between the VPlan-based CSE and the legacy CSE, to make it easier to remove the legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22 test failures, and after this patch, there are only 12 failures, and all of them seem to have a single root cause: VPlanTransforms::createInterleaveGroups() and VPInterleaveGroup::execute(). The improvements from this patch are of course welcome. While developing the patch, a miscompile was found when GEP source-element-types differ, and this has been fixed. Co-authored-by: Florian Hahn <flo@fhahn.com> Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-16 10:14:32 +00:00
Luke Lau	4bb250d6a3	[VPlan] Always consider register pressure on RISC-V (#156951 ) Stacked on #156923 In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because whilst the largest element type is i8, we generate a bunch of pointer vectors for gathers and scatters. This means the VF chosen is quite high e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x i64> m8 registers for the pointers. This was briefly fixed by #132190 where we computed register pressure in VPlan and used it to prune VFs that were likely to spill. The legacy cost model wasn't able to do this pruning because it didn't have visibility into the pointer vectors that were needed for the gathers/scatters. However VF pruning was restricted again to just the case when max bandwidth was enabled in #141736 to avoid an AArch64 regression, and restricted again in #149056 to only prune VFs that had max bandwidth enabled. On RISC-V we take advantage of register grouping for performance and choose a default of LMUL 2, which means there are 16 registers to work with – half the number as SVE, so we encounter higher register pressure more frequently. As such, we likely want to always consider pruning VFs with high register pressure and not just the VFs from max bandwidth. This adds a TTI hook to opt into this behaviour for RISC-V which fixes the motivating godbolt example above. When last checked this significantly reduces the number of spills on SPEC CPU 2017, up to 80% on 538.imagick_r.	2025-09-12 06:21:54 +00:00
Elvis Wang	3e898bc40f	[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387 ) This patch fixes the assertion triggered when `isUniform` (from the legacy model) and `isSingleScalar` (from the VPlan-based model) disagree. A simplified test that triggers the assertion: ``` loop: loadA = load %a => %a is loop invariant. loadB = load %loadA ... ``` The legacy cost model cannot determine that the address of `%loadB` is uniform, but in the VPlan-based cost model the addresses of both `%loadA` and `%loadB` are single scalars. Full test that caused the crash: https://llvm.godbolt.org/z/zEG8YKjqh. --------- Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-11 07:49:54 +08:00
Florian Hahn	9b1b93766d	Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9. Recommit with a fix for MSan failure ( https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a set to track deleted values. Using the InsertedInstructions set is not sufficient, as it uses asserting value handles as keys, which may dereference the value at construction. Original message: Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-09 09:47:41 +01:00
Florian Hahn	eeb43806eb	Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f. Triggers MSan errors in some configurations, e.g. https://lab.llvm.org/buildbot/#/builders/169/builds/14799	2025-09-08 14:52:28 +01:00
Florian Hahn	528b13df57	[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 ) Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-08 10:53:20 +01:00
Florian Hahn	b50ad945dd	[InstSimplify] Simplify extractvalue (umul_with_overflow(x, 1)). (#157307 ) Look through extractvalue to simplify umul_with_overflow where one of the operands is 1. This removes some redundant instructions when expanding SCEVs, which in turn makes the runtime check cost estimate more accurate, reducing the minimum iterations for which vectorization is profitable. PR: https://github.com/llvm/llvm-project/pull/157307	2025-09-07 18:32:40 +01:00
Luke Lau	3f9e0736ac	[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304 ) Following up from #150368, this moves folding common edge masks into simplifyBlends. One test in uniform-blend.ll ended up regressing but after looking at it closely, it came from a weird (x && !x) edge mask. So I've just included a simplification in this PR to fold that to false.	2025-09-05 01:29:22 +00:00
Ramkumar Ramachandra	e4c0b3e111	[VPlan] Simplify x && false -> false, x | 0 -> x (#156345 ) The OR x, 0 -> x simplification has been introduced to avoid regressions.	2025-09-04 10:29:59 +01:00
Mel Chen	2f5500e4cf	[LV] Improve the test coverage for strided access. nfc (#155981 ) Add tests for strided access with UF > 1, and introduce a new test case @constant_stride_reinterpret.	2025-09-03 10:19:36 +00:00
Luke Lau	c33ccfa52b	[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383 ) This PR reassociates logical ands in order to enable more simplifications. The driving motivation for this is that with tail folding all blocks inside the loop body will end up using the header mask. However this can end up nestled deep within a chain of logical ands from other edges. Typically the header mask will be a leaf nested in the LHS, e.g. (headermask & y) & z. So pulling it out allows it to be simplified further, e.g. allows it to be optimised away to VP intrinsics with EVL tail folding.	2025-09-03 01:09:19 +00:00
Ramkumar Ramachandra	d8fd511480	[VPlan] Introduce CSE pass (#151872 ) Introduce a simple common-subexpression-elimination pass at the VPlan-level, running late during the execution of the VPlan. The long-term vision is to get rid of the legacy non-VPlan-based cse routine in LV, but this patch doesn't yet fully subsume it.	2025-09-02 12:23:29 +01:00
Elvis Wang	8fdae0c7da	[Reland] "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. #149955 " (#156386 ) This patch implements `getAddressComputationCost()` in the RISCV TTI, which makes gather/scatter with address calculation more expensive than the strided cost. Note that the only user of `getAddressComputationCost()` with a vector type is `VPWidenMemoryRecipe::computeCost()`, so this patch changes some LV tests. I've checked the test changes in LV and they seem to fall into two groups. * gather/scatter with a uniform vector ptr, which seems optimizable to masked.load. * accesses that can be optimized to strided load/store. ---- After #155739 landed, the assertion (cost misaligned) is fixed. I've tested llvm-test-suite w/ rva23u64 and rva23u64_zvl1024b locally and no assertion occurred.	2025-09-02 11:43:27 +08:00
Elvis Wang	7997a79be6	[LV] Align legacy cost model to vplan-based model for gather/scatter w/ uniform addr. (#155739 ) This patch checks whether the address is uniform in the legacy cost model, to align it with the VPlan-based cost model after #150371. This patch fixes the llvm-test-suite assertion (https://lab.llvm.org/buildbot/#/builders/210/builds/1935) caused by the cost models being misaligned after #149955 on RISC-V. I've tested this patch (on top of #149955) on the llvm-test-suite locally with the options that crashed, `rva23u64` and `rva23u64_zvl1024b`, and it builds successfully. Since this fix changes LV, I think it is better to create a separate PR for it.	2025-09-02 09:11:45 +08:00
Mel Chen	13357e8a12	[LV][EVL] Support interleaved access with tail folding by EVL (#152070 ) The InterleavedAccess pass already supports transforming vector-predicated (vp) load/store intrinsics. With this patch, we start enabling interleaved access under tail folding by EVL. This patch introduces a new base class, VPInterleaveBase, and a concrete class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and the new VPInterleaveEVLRecipe inherit from and implement VPInterleaveBase. Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL operand to emit vp.load/vp.store intrinsics. Currently, tail folding by EVL is only supported for scalable vectorization. Therefore, VPInterleaveEVLRecipe will only emit interleave/deinterleave intrinsics. Reverse accesses are not yet implemented, as masked reverse interleaved access under tail folding is not yet supported. Fixed #123201	2025-09-01 21:20:06 +08:00
Luke Lau	eb7f6a5f8a	[VPlan] Simplify (x && y) || (x && z) -> x && (y || z) (#156308 ) Split off from #155383, since it turns out this has a diff on its own.	2025-09-01 21:12:23 +08:00
Luke Lau	c9faedd760	[VPlan] Fold common edges away in convertPhisToBlends (#150368 ) If a phi is widened with tail folding, all of its predecessors will have a mask of the form %x = logical-and %active-lane-mask, %foo %y = logical-and %active-lane-mask, %bar %z = logical-and %active-lane-mask, %baz ... We can remove the common %active-lane-mask from all of these edge masks, which in turn allows us to simplify a lot of VPBlendRecipes. In particular, it allows the header mask to be removed in selects with EVL tail folding, improving RISC-V codegen on SPEC CPU 2017 for 525.x264_r, and supersedes #147243. This also allows us to remove VPBlendRecipe and directly emit VPInstruction::Select in another patch.	2025-09-01 07:03:33 +00:00
Elvis Wang	69db050839	Revert "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI." (#155535 ) Reverts llvm/llvm-project#149955	2025-08-27 01:54:19 +00:00
Elvis Wang	dfd3833674	[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. (#149955 ) This patch implements `getAddressComputationCost()` in the RISCV TTI, which makes gather/scatter with address calculation more expensive than the strided cost. Note that the only user of `getAddressComputationCost()` with a vector type is `VPWidenMemoryRecipe::computeCost()`, so this patch changes some LV tests. I've checked the test changes in LV and they seem to fall into two groups. * gather/scatter with a uniform vector ptr, which seems optimizable to masked.load. * accesses that can be optimized to strided load/store.	2025-08-27 08:40:40 +08:00
Florian Hahn	5faed1ad84	[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643 ) This patch adds a new VPlan-based addMinimumIterationCheck, which replaces the ILV version for the non-epilogue case. The VPlan-based version constructs a SCEV expression to compute the minimum iterations and uses that to check if the check is known true or false. Otherwise it creates a VPExpandSCEV recipe and emits a compare-and-branch. When using epilogue vectorization, we still need to create the minimum trip-count-check during the legacy skeleton creation. The patch moves the definitions out of ILV. PR: https://github.com/llvm/llvm-project/pull/153643	2025-08-26 15:52:31 +01:00
Luke Lau	f3520c538d	[VPlan] Replace EVL branch condition with (branch-on-count AVLNext, 0) (#152167 ) This changes the branch condition to use the AVL's backedge value instead of the EVL-based IV. This allows us to emit bnez on RISC-V and removes a use of the trip count, which should reduce register pressure. To match phis with VPlanPatternMatch I've had to relax the assert that the number of operands must exactly match the pattern for the Phi opcode, and I've copied over m_ZExtOrSelf from the LLVM IR PatternMatch.h. Fixes #151459	2025-08-26 11:19:19 +00:00
Luke Lau	a12d012c87	[VPlan][RISC-V] Add test case for #154103 This has now been fixed by #152707	2025-08-26 17:31:29 +08:00
Luke Lau	42cf9c60d7	[RISCV] Mark Sub/AddChainWithSubs as legal reduction types (#154753 ) We used to vectorize these scalably but after #147026 they were split out from RecurKind::Add into their own RecurKinds, and we didn't mark them as supported in isLegalToVectorizeReduction. This caused the loop vectorizer to drop the scalable VPlan because it thinks the reductions will be scalarized. This fixes it by just marking them as supported. Fixes #154554	2025-08-21 21:43:48 +08:00
Elvis Wang	d611a9ca15	[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. (#154482 ) `VPEVLBasedIVPHIRecipe` is lowered to a VPInstruction scalar phi and generates a scalar phi, so it only occupies a scalar register, just like other phi recipes. This patch changes the register usage for `VPEVLBasedIVPHIRecipe` from vector to scalar, which is closer to the generated vector IR. https://godbolt.org/z/6Mzd6W6ha shows that there are no register spills when choosing `<vscale x 16>`. Note that this test is basically copied from AArch64.	2025-08-21 07:39:01 +08:00
Florian Hahn	351d398a37	[VPlan] Run final VPlan simplifications before codegen. Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.	2025-08-16 18:54:27 +01:00
Florian Hahn	db98ac43ec	[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495 ) Directly emit shl instead of a multiply if VF * Step is a power-of-2. The main motivation here is to prepare the code and test for directly generating and expanding a SCEV expression of the minimum iteration count. SCEVExpander will directly emit shl for multiplies with powers-of-2. InstCombine also performs this combine, so end-to-end this should effectively be NFC. PR: https://github.com/llvm/llvm-project/pull/153495	2025-08-14 19:27:51 +01:00
Mel Chen	b9138bde35	[LV][EVL] More lit tests for interleaved access. nfc (#152959 ) Add test cases for reverse interleaved access and interleaved access with gap.	2025-08-13 15:43:39 +08:00
Florian Hahn	8cdab07aaa	Reapply "[VPlan] Remove trivial dead VPPhi cycles." This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9. Recommit with a fix for the verifier error caused for EVL recipes. Extra test coverage added in 6f939da60e.	2025-08-12 22:09:30 +01:00
Florian Hahn	6f939da60e	[LV] Add additional test for backedge elimination with EVL.	2025-08-12 21:58:19 +01:00
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Luke Lau	723de7f231	[LV][RISCV] Try fixing Windows buildbot failure in force-vect-msg.ll. NFC The clang-x64-windows-msvc buildbot is failing after 707447159341f7b5678dee4f47731af50524b9ae due to this test failing: https://lab.llvm.org/buildbot/#/builders/63/builds/8528 This is a stab in the dark, but my first thought is that it may be due to the handling of floats with MSVC or something. So this removes the floating point part of the check. I don't have access to a Windows machine handy to debug this just yet, so pushing this to see if it can quickly return the buildbot to green.	2025-08-09 21:09:36 +08:00
Florian Hahn	82d633e9ff	[VPlan] Materialize vector trip count using VPInstructions. (#151925 ) Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: https://github.com/llvm/llvm-project/pull/151925	2025-08-08 11:44:32 +01:00
Luke Lau	7074471593	[RISCV] Enable tail folding by default (#151681 ) We have been tracking the performance of EVL tail folding in the loop vectorizer on RISC-V for a while now, and after much hard work from various contributors we think it should be generally profitable to enable by default now. With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU 2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30% geomean codesize reduction on SPEC and TSVC, with no significant regressions detected. Now that we are early into the LLVM 22.x development cycle it seems like a good time to enable it to catch any issues. There are still more EVL related items of work being tracked in #123069, which should continue to improve performance.	2025-08-08 14:26:23 +08:00
Luke Lau	0720af8c24	[LV][RISCV] Precommit RUN line changes from #151681 . NFC In preparation for enabling EVL tail folding by default.	2025-08-08 12:40:27 +08:00
Luke Lau	a04142f11f	[LV][RISCV] Add check lines for scalable interleave costs. NFC Previously we could only scalably vectorize interleave groups with factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now support all factors (available on RISC-V). So this adds the remaining check lines for the scalable VFs.	2025-08-07 12:28:12 +08:00
Luke Lau	44af26ea2e	[LV] Fix EVL test after merge. NFC Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and df8da2ff8370fda479b5c118704af4f50e0d3536	2025-08-07 11:12:43 +08:00
Luke Lau	df8da2ff83	[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110 ) Now that VPWidenPointerInductionRecipes are modelled in VPlan in #148274, we can support them in EVL tail folding. We need to replace their VFxUF operand with EVL as the increment is not guaranteed to always be VF on the penultimate iteration, and UF is always 1 with EVL tail folding. We also need to move the creation of the backedge value to the latch so that EVL dominates it. With this we will no longer fail to convert a VPlan to EVL tail folding, so adjust tryAddExplicitVectorLength to account for this. This brings us to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail folding vs no tail folding. The test in only-compute-cost-for-vplan-vfs.ll previously relied on widened pointer inductions with EVL tail folding to end up in a scenario with no vector VPlans, so this also replaces it with an unvectorizable fixed-order recurrence test from first-order-recurrence-multiply-recurrences.ll that also gets discarded.	2025-08-07 10:54:24 +08:00
Florian Hahn	25d1285eec	[VPlan] Replace single-entry VPPhis with their incoming values. Replace trivial, single-entry VPPhis with their incoming values,	2025-08-06 20:03:31 +01:00
Luke Lau	94a6cd464e	[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274 ) This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it's since been simplified by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it in line with VPWidenIntOrFpInductionRecipe.	2025-08-05 16:54:02 +08:00
Mel Chen	76d98cfcc4	[RISCV][TTI] Enable masked interleave access (#151665 ) Now that support for masked loads/stores of interleave groups has landed, we can enable the loop vectorizer to generate masked interleave access where applicable. This improves vectorization in several ways: * Internal predication support: This enables interleave group vectorization for loops with internal control flow predication, provided all members of the group share the same predicate. Gaps in interleave groups are still not efficiently handled by masking, so masking for gaps remains disabled for now. * Tail folding: This allows tail folding of loops with interleave groups by using masking. Without this, vectorized loops with interleaves would fall back to using separate gather/scatter accesses, which can be significantly less efficient. "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)" was reverted by 5294793bdcf6ca142f7a0df897638bd4e85ed1a7 due to triggering an assertion. The issue has been addressed in the patch "[LV] Fix gap mask requirement for interleaved access (#151105)". On the other hand, this patch also enables fixed-length masked interleave access (#150624), since support for fixed-length has also landed in 992118cb4deab139ae384bb85f03225a9a21b008. --------- Co-authored-by: Philip Reames <preames@rivosinc.com>	2025-08-05 16:08:13 +08:00
Mel Chen	86916ff0f0	[LV] Fix gap mask requirement for interleaved access (#151105 ) When interleaved stores contain gaps, a mask is required to skip the gaps, regardless of whether scalar epilogues are allowed. This patch corrects the condition under which a gap mask is needed, ensuring consistency between the legacy and VPlan-based cost models and avoiding assertion failures. Related #149981	2025-08-01 14:24:30 +08:00
Luke Lau	9c90f848fd	[LV][RISCV] Regenerate select-cmp-reduction.ll with UTC. NFC Simplify the opt invocation, and remove -force-vector-width and the fixed-length RUN line so we're testing what the loop vectorizer emits in the default configuration.	2025-08-01 12:38:18 +08:00
Luke Lau	b926f5d923	[LV][RISCV] Regenerate reduction tests with UTC. NFC Previously these tests used optimization remarks to check the chosen VF, but I don't think that's needed if we can see from the types used in the vectorized body. We can just use UTC to generate the body instead. Whilst we're here, also simplify the opt invocation to remove unneeded options and rename from scalable-reductions.ll -> reductions.ll, seeing as it's not specifically testing for scalable VFs anymore.	2025-08-01 12:32:14 +08:00
Luke Lau	8dfc07cad2	[RISCV][LV] Merge check lines in bf16/f16 tests. NFC Now that we fall back to scalar epilogues in #150908 the non-zvfhmin/zvfbfmin lines should be the same.	2025-08-01 11:26:16 +08:00
Luke Lau	7250b66240	[VPlan] Create AVL as a phi from TC -> 0 with EVL tail folding (#151481 ) This implements the first half of #151459, by changing the AVL so it's no longer computed as `trip-count - EVL-based IV`, but instead a separate scalar phi that is decremented by EVL each iteration. This shortens the dependency chain for computing the AVL and should eventually allow us to convert the branch condition to `branch-count avl-next, 0`. `simplifyBranchConditionForVFAndUF` had to be updated to prevent a regression because this introduces a VPPhi in the header block.	2025-08-01 11:00:05 +08:00
Samuel Tebbs	fcc419b05f	[LV][NFCI] Swap reduction recipe operand order https://github.com/llvm/llvm-project/pull/147026 will enable sub reductions, which require that the phi value is the first operand since they aren't commutative. This re-orders the operands when executing reductions, which actually matches other existing code in VPReductionRecipe::execute.	2025-07-31 14:35:10 +01:00
Shih-Po Hung	cc8c941e17	[VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222 ) Loop regions require fixed-length steps and rounded-up trip counts, but after dissolution creates explicit control flow, EVL loops can leverage variable-length stepping with original trip counts. This patch adds a post-dissolution transform pass to convert EVL loops from fixed-length to variable-length stepping.	2025-07-30 16:50:57 +08:00
Luke Lau	b663e563cc	[VPlan] Fix header masks in EVL tail folding (#150202 ) With EVL tail folding, the EVL may not always be VF on the second-to-last iteration. Recipes that have been converted to VP intrinsics via optimizeMaskToEVL account for this, but recipes that are left behind will still use the old header mask which may end up having a different vector length. This is effectively the same as #95368, and fixes this by converting header masks from icmp ule wide-canonical-iv, backedge-trip-count -> icmp ult step-vector, evl. Without it, recipes that fall through optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and #149981. We really need to split off optimizeMaskToEVL into VPlanTransforms::optimize and move transformRecipestoEVLRecipes into tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for correctness and what is needed to optimize away the mask computations. We should be able to still generate a correct albeit suboptimal VPlan without running optimizeMaskToEVL. I've added a TODO for this, which I think we can do after #148274 Fixes #150197	2025-07-30 11:31:04 +08:00
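
Several of the [VPlan] entries above fold edge masks using Boolean identities: x && !x -> false (#156304), x && false -> false and x | 0 -> x (#156345), (x & y) & z -> x & (y & z) (#155383), and (x && y) || (x && z) -> x && (y || z) (#156308). As a rough sanity check, the sketch below verifies these identities exhaustively over plain Python booleans. It is an illustration only: it treats each mask as a single lane, ignores poison and how the logical and/or operations are actually lowered, and the names `holds_for_all` and `IDENTITIES` are hypothetical rather than anything in LLVM.

```python
from itertools import product


def holds_for_all(lhs, rhs, arity):
    """Return True if lhs and rhs agree on every boolean assignment."""
    return all(
        lhs(*bits) == rhs(*bits)
        for bits in product([False, True], repeat=arity)
    )


# Each entry: (description, original form, simplified form, number of inputs).
# Masks are modelled as single booleans; poison semantics are NOT modelled.
IDENTITIES = [
    ("x && !x -> false",
     lambda x: x and not x,
     lambda x: False, 1),
    ("x && false -> false",
     lambda x: x and False,
     lambda x: False, 1),
    ("x | 0 -> x",
     lambda x: x or False,
     lambda x: x, 1),
    ("(x & y) & z -> x & (y & z)",
     lambda x, y, z: (x and y) and z,
     lambda x, y, z: x and (y and z), 3),
    ("(x && y) || (x && z) -> x && (y || z)",
     lambda x, y, z: (x and y) or (x and z),
     lambda x, y, z: x and (y or z), 3),
]

# Check every identity over its full truth table.
results = {
    name: holds_for_all(lhs, rhs, arity)
    for name, lhs, rhs, arity in IDENTITIES
}

for name, ok in results.items():
    print(f"{name}: {'holds' if ok else 'FAILS'}")
```

A truth-table check like this only shows that the rewrites preserve the value on well-defined inputs; whether a given fold is safe in VPlan also depends on details the commits themselves address, such as which operand may be poison.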

464 Commits