llvm-project

Author	SHA1	Message	Date
Florian Hahn	86813aa786	[VPlan] Add dedicated user for resume phi with epilogue vectorization. Epilogue vectorization currently relies on the resume phi for the canonical induction being always available, which is why VPPhi are considered to have side-effects, to prevent their removal. This patch adds a new ResumeForEpilogue opcode to mark the resume phi as used for epilogue vectorization. This allows treating VPPhis in general as not having side-effects, enabling removal of unused VPPhis.	2025-08-10 21:21:16 +01:00
Luke Lau	723de7f231	[LV][RISCV] Try fixing Windows buildbot failure in force-vect-msg.ll. NFC The clang-x64-windows-msvc buildbot is failing after 707447159341f7b5678dee4f47731af50524b9ae due to this test failing: https://lab.llvm.org/buildbot/#/builders/63/builds/8528 This is a stab in the dark, but my first thought is that it may be due to the handling of floats with MSVC or something. So this removes the floating point part of the check. I don't have access to a Windows machine handy to debug this just yet, so pushing this to see if it can quickly return the buildbot to green.	2025-08-09 21:09:36 +08:00
Florian Hahn	82d633e9ff	[VPlan] Materialize vector trip count using VPInstructions. (#151925 ) Materialize the vector trip count computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. It also simplifies vector-trip count computations for scalable vectors, as we can re-use the UF x VF computation. PR: https://github.com/llvm/llvm-project/pull/151925	2025-08-08 11:44:32 +01:00
Luke Lau	7074471593	[RISCV] Enable tail folding by default (#151681 ) We have been tracking the performance of EVL tail folding in the loop vectorizer on RISC-V for a while now, and after much hard work from various contributors we think it should be generally profitable to enable by default now. With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU 2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30% geomean codesize reduction on SPEC and TSVC, with no significant regressions detected. Now that we are early into the LLVM 22.x development cycle it seems like a good time to enable it to catch any issues. There are still more EVL related items of work being tracked in #123069, which should continue to improve performance.	2025-08-08 14:26:23 +08:00
Luke Lau	0720af8c24	[LV][RISCV] Precommit RUN line changes from #151681 . NFC In preparation for enabling EVL tail folding by default.	2025-08-08 12:40:27 +08:00
Luke Lau	a04142f11f	[LV][RISCV] Add check lines for scalable interleave costs. NFC Previously we could only scalably vectorize interleave groups with factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now support all factors (available on RISC-V). So this adds the remaining check lines for the scalable VFs.	2025-08-07 12:28:12 +08:00
Luke Lau	44af26ea2e	[LV] Fix EVL test after merge. NFC Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and df8da2ff8370fda479b5c118704af4f50e0d3536	2025-08-07 11:12:43 +08:00
Luke Lau	df8da2ff83	[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110 ) Now that VPWidenPointerInductionRecipes are modelled in VPlan in #148274, we can support them in EVL tail folding. We need to replace their VFxUF operand with EVL as the increment is not guaranteed to always be VF on the penultimate iteration, and UF is always 1 with EVL tail folding. We also need to move the creation of the backedge value to the latch so that EVL dominates it. With this we will no longer fail to convert a VPlan to EVL tail folding, so adjust tryAddExplicitVectorLength to account for this. This brings us to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail folding vs no tail folding. The test in only-compute-cost-for-vplan-vfs.ll previously relied on widened pointer inductions with EVL tail folding to end up in a scenario with no vector VPlans, so this also replaces it with an unvectorizable fixed-order recurrence test from first-order-recurrence-multiply-recurrences.ll that also gets discarded.	2025-08-07 10:54:24 +08:00
Florian Hahn	25d1285eec	[VPlan] Replace single-entry VPPhis with their incoming values. Replace trivial, single-entry VPPhis with their incoming values,	2025-08-06 20:03:31 +01:00
Luke Lau	94a6cd464e	[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274 ) This is the VPWidenPointerInductionRecipe equivalent of #118638, with the motivation of allowing us to use the EVL as the induction step. There is a new VPInstruction added, WidePtrAdd to allow adding the step vector to the induction phi, since VPInstruction::PtrAdd only handles scalars or multiple scalar lanes. Originally this transformation was copied from the original recipe's execute code, but it's since been simplifed by teaching `unrollWidenInductionByUF` to unroll the recipe, which brings it inline with VPWidenIntOrFpInductionRecipe.	2025-08-05 16:54:02 +08:00
Mel Chen	76d98cfcc4	[RISCV][TTI] Enable masked interleave access (#151665 ) Now that support for masked loads/stores of interleave groups has landed, we can enable the loop vectorizer to generate masked interleave access where applicable. This improves vectorization in several ways: * Internal predication support: This enables interleave group vectorization for loops with internal control flow predication, provided all members of the group share the same predicate. Gaps in interleave groups are still not efficiently handled by masking, so masking for gaps remains disabled for now. * Tail folding: This allows tail folding of loops with interleave groups by using masking. Without this, vectorized loops with interleaves would fall back to using separate gather/scatter accesses, which can be significantly less efficient. "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)" was reverted by 5294793bdcf6ca142f7a0df897638bd4e85ed1a7 due to triggering an assertion. The issue has been addressed in the patch "[LV] Fix gap mask requirement for interleaved access (#151105)". On the other hand, this patch also enable fixed-length masked interleave access (#150624) since support for fixed-length has also been landed 992118cb4deab139ae384bb85f03225a9a21b008. --------- Co-authored-by: Philip Reames <preames@rivosinc.com>	2025-08-05 16:08:13 +08:00
Mel Chen	86916ff0f0	[LV] Fix gap mask requirement for interleaved access (#151105 ) When interleaved stores contain gaps, a mask is required to skip the gaps, regardless of whether scalar epilogues are allowed. This patch corrects the condition under which a gap mask is needed, ensuring consistency between the legacy and VPlan-based cost models and avoiding assertion failures. Related #149981	2025-08-01 14:24:30 +08:00
Luke Lau	9c90f848fd	[LV][RISCV] Regenerate select-cmp-reduction.ll with UTC. NFC Simplify the opt invocation, and remove -force-vector-width and the fixed-length RUN line so we're testing what the loop vectorizer emits in the default configuration.	2025-08-01 12:38:18 +08:00
Luke Lau	b926f5d923	[LV][RISCV] Regenerate reduction tests with UTC. NFC Previously these tests used optimization remarks to check the chosen VF, but I don't think thats needed if we can see from the types used in the vectorized body. We can just use UTC to generate the body instead. Whilst we're here, also simplify the opt invocation to remove unneeded options and rename from scalable-reductions.ll -> reductions.ll, seeing as it's not specifically testing for scalable VFs anymore.	2025-08-01 12:32:14 +08:00
Luke Lau	8dfc07cad2	[RISCV][LV] Merge check lines in bf16/f16 tests. NFC Now that we fall back to scalar epilogues in #150908 the non-zvfhmin/zvfbfmin lines should be the same.	2025-08-01 11:26:16 +08:00
Luke Lau	7250b66240	[VPlan] Create AVL as a phi from TC -> 0 with EVL tail folding (#151481 ) This implements the first half of #151459, by changing the AVL so it's no longer computed as `trip-count - EVL-based IV`, but instead a separate scalar phi that is decremented by EVL each iteration. This shortens the dependency chain for computing the AVL and should eventually allow us to convert the branch condition to `branch-count avl-next, 0`. `simplifyBranchConditionForVFAndUF` had to be updated to prevent a regression because this introduces a VPPhi in the header block.	2025-08-01 11:00:05 +08:00
Samuel Tebbs	fcc419b05f	[LV][NFCI] Swap reduction recipe operand order https://github.com/llvm/llvm-project/pull/147026 will enable sub reductions, which require that the phi value is the first operand since they aren't commutative. This re-orders the operands when executing reductions, which actually matches other existing code in VPReductionRecipe::execute.	2025-07-31 14:35:10 +01:00
Shih-Po Hung	cc8c941e17	[VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222 ) Loop regions require fixed-length steps and rounded-up trip counts, but after dissolution creates explicit control flow, EVL loops can leverage variable-length stepping with original trip counts. This patch adds a post-dissolution transform pass to convert EVL loops from fixed-length to variable-length stepping .	2025-07-30 16:50:57 +08:00
Luke Lau	b663e563cc	[VPlan] Fix header masks in EVL tail folding (#150202 ) With EVL tail folding, the EVL may not always be VF on the second-to-last iteration. Recipes that have been converted to VP intrinsics via optimizeMaskToEVL account for this, but recipes that are left behind will still use the old header mask which may end up having a different vector length. This is effectively the same as #95368, and fixes this by converting header masks from icmp ule wide-canonical-iv, backedge-trip-count -> icmp ult step-vector, evl. Without it, recipes that fall through optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and #149981. We really need to split off optimizeMaskToEVL into VPlanTransforms::optimize and move transformRecipestoEVLRecipes into tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for correctness and what is needed to optimize away the mask computations. We should be able to still generate a correct albeit suboptimal VPlan without running optimizeMaskToEVL. I've added a TODO for this, which I think we can do after #148274 Fixes #150197	2025-07-30 11:31:04 +08:00
Florian Hahn	c93d166c58	[VPlan] Simplify (MUL %x, 0) -> 0. Simplify trivial multiplies. https://alive2.llvm.org/ce/z/DabRkA	2025-07-28 21:50:57 +01:00
Luke Lau	5f2092dae3	[RISCV][LV] Update f16/bf16 loop vectorizer tests. NFC This fixes a failing test after the changes in #150908 affected the result in #150882.	2025-07-28 23:19:03 +08:00
Luke Lau	fe4f6c1a58	[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882 ) When vectorizing with predication some loops that were previously vectorized without zvfhmin/zvfbfmin will no longer be vectorized because the masked load/store or gather/scatter cost returns illegal. This is due to a discrepancy where for these costs we check isLegalElementTypeForRVV but for regular memory accesses we don't. But for bf16 and f16 vectors we don't actually need the extension support for loads and stores, so this adds a new function which takes this into account. For regular memory accesses we should probably also e.g. return an invalid cost for i64 elements on zve32x, but it doesn't look like we have tests for this yet. We also should probably not be vectorizing these bf16/f16 loops to begin with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this is due to the scalar costs being too cheap. I've added tests for this in a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.	2025-07-28 22:59:49 +08:00
Luke Lau	92d09245d6	[VPlan] Fall back to scalar epilogue if possible when EVL isn't legal (#150908 ) When enabling predicated vectorization by default on RISC-V, there's a bunch of performance regressions on llvm-test-suite's LoopInterleaving microbenchmarks: https://lnt.lukelau.me/db_default/v4/nts/788?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=791&baseline=730&submit=Update Most of these regressions stem from the interleave_count pragma, which causes EVL tail folding interleaving to be unsupported (since we don't support unrolling with EVL) Currently if DataWithEVL isn't legal we fall back to DataWithoutLaneMask as the tail folding style, but this is very slow on RISC-V. The order of performance roughly is something like: DataWithEVL > None (scalar-epilogue) > Data[WithoutLaneMask] So this patch tries to prevent the regressions by falling back to a scalar epilogue where possible, i.e. the existing vectorization we have today. Not we may still need to fall back to DataWithoutLaneMask, e.g. if the trip count is low etc or it's forced by -prefer-predicate-over-epilogue=predicate-dont-vectorize.	2025-07-28 20:10:36 +08:00
Luke Lau	d4f9c11e06	[RISCV][LV] Use predicate-else-scalar-epilogue flag in tail folding tests. NFC Align the tests closer with what we eventually intend to enable by default on RISC-V by using -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue, instead of dropping vectorization entirely with predicate-dont-vectorize. Also adjust the non-EVL run lines so that they use -prefer-predicate-over-epilogue=scalar-epilogue instead of -force-tail-folding-style=none, so we're only using testing one type of flag instead of a combination of two.	2025-07-28 17:04:00 +08:00
Luke Lau	ddb12c10a9	[RISCV][LV] Remove redundant -force-tail-folding-style from tests. NFC This isn't needed after we set the tail folding style to data-with-evl via TTI in #148686. Also rename the tests to reflect the fact they're no longer forcing the tail folding style.	2025-07-28 16:11:01 +08:00
Florian Hahn	89ae085859	[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735 ) VPVectorPointer for part 0 is just the pointer operand. Simplify it after unrolling. This removes a large number of redundant GEPs with index 0. PR: https://github.com/llvm/llvm-project/pull/149735	2025-07-27 13:53:26 +01:00
Florian Hahn	fa3ec0c17c	[VPlan] Materialize constant vector trip counts before final opts. (#142309 ) Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: https://github.com/llvm/llvm-project/pull/142309	2025-07-26 17:16:36 +01:00
Alex Bradbury	5294793bdc	Revert "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981 )" This reverts commit ee3a7714b7a69ac9aae4b79f4c67adc38bc6876b. Causes an assertion for the zvl1024b RISC-V build configuration. See comment with reproducer at <https://github.com/llvm/llvm-project/pull/149981#issuecomment-3118482801>	2025-07-25 16:14:10 +01:00
Mel Chen	ee3a7714b7	[RISCV][TTI] Enable masked interleave access for scalable vector (#149981 ) Now that support for masked loads/stores of interleave groups has landed, we can enable the loop vectorizer to generate masked interleave access where applicable. This improves vectorization in several ways: * Internal predication support: This enables interleave group vectorization for loops with internal control flow predication, provided all members of the group share the same predicate. Gaps in interleave groups are still not efficiently handled by masking, so masking for gaps remains disabled for now. * Tail folding: This allows tail folding of loops with interleave groups by using masking. Without this, vectorized loops with interleaves would fall back to using separate gather/scatter accesses, which can be significantly less efficient. * Scalable vector support: Currently, only scalable vector types are supported for masked interleave lowering. Fixed-length vector support will be enabled in the future. As interleave access is not yet supported with tail folding by EVL, that functionality is temporarily disabled. We are going to create another patch to support it. Co-authored-by: Philip Reames <preames@rivosinc.com> --------- Co-authored-by: Philip Reames <preames@rivosinc.com>	2025-07-25 17:53:08 +08:00
Luke Lau	9563e7a940	[VPlan] Mark VPInstruction::ExplicitVectorLength as single scalar. NFC (#150221 ) This allows it to be broadcasted without an explicit VPInstruction::Broadcast in #150202	2025-07-23 22:38:21 +08:00
Luke Lau	20c52e4231	Reapply "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686 )" This reverts commit 25e97fc420f8ecc43fbabadfe9767b4163e6ee36. The original commit was reverted due to a crash in llvm-test-suite. The crash stemmed from a multiply reduction, which isn't supported for scalable VFs on RISC-V. But for EVL tail folding we only support scalable VFs, so when -force-tail-folding-style=data-with-evl is specified we check to see if there's a scalable VF, and fall back to data-without-lane-mask if there isn't. This is done in setTailFoldingStyles, but previously we were only checking if the forced tail folding style was legal, not the style returned by TTI. This version fixes this by checking the actual computed tail folding style and not just the forced one, and adds a test for the crash in llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll	2025-07-22 23:52:02 +08:00
Luke Lau	25e97fc420	Revert "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686 )" This reverts commit 38318dd05615a2f38abdeeae99e7423165308902. The clang-riscv-gauntlet buildbot is breaking with this commit: https://lab.llvm.org/buildbot/#/builders/210/builds/371	2025-07-22 22:54:26 +08:00
Luke Lau	6e723d2de8	[VPlan] Remove loop region in simplifyBranchConditionForVFAndUF with EVL PHI (#150016 ) Previously we fell back to just simplifying the branch cond to true since one of the phis was a VPEVLBasedIVPHIRecipe. However this should be fine to replace with its start value.	2025-07-22 22:30:34 +08:00
Luke Lau	38318dd056	[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686 ) In preparation to eventually make EVL tail folding the default, this patch sets DataWithEVL as the preferred tail folding style for RISC-V, but doesn't enable tail folding by default. And although tail folding isn't enabled by default, the loop vectorizer will actually tail fold loops with a small trip count, so this will cause some EVL vectorized loops to be generated in the default configuration. The EVL tail folding work is still not complete, e.g. we still need to handle interleave groups etc., see #123069, but a lot of these missing features also apply to the data (masked) tail folding strategy, which is the default anyway. The actual overall performance picture is much better, on TSVC EVL tail folding is faster than data on every benchmark on the spacemit-x60[^1]: https://lnt.lukelau.me/db_default/v4/nts/755?compare_to=756 And on SPEC CPU 2017 we see a geomean improvement[^2]: https://lnt.lukelau.me/db_default/v4/nts/751?compare_to=753 This is likely due to masked instructions generally being less performant on the spacemit-x60, up to twice as slow: https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html [^1]: These benchmarks don't exactly give the same performance numbers as this patch, but it's a good indicator that EVL tail folding is generally faster than masked tail folding. [^2]: The large code size increase in 505.mcf_r is due to a function being inlined now	2025-07-22 21:02:59 +08:00
Luke Lau	cb8b0cd2cf	[LV] Precommit test changes for #148686 . NFC Namely explicitly adding -force-tail-folding-style=data to existing RUN lines so that we don't lose them when we switch to data-with-evl by default.	2025-07-22 16:16:43 +08:00
Mel Chen	d2a7f4e528	[NFC][LV] Refine the lit test case riscv-vector-reverse.ll (#149020 ) This patch includes the following changes: 1. Merge riscv-vector-reverse-output.ll into riscv-vector-reverse.ll, and only check the generated LLVM IR. 2. Add vplan-riscv-vector-reverse.ll to preserve the original debug output checks from riscv-vector-reverse.ll.	2025-07-22 14:56:14 +08:00
Mel Chen	6f240d5a7d	[LV][EVL] Remove interleave count from the test case for EVL tail-folding. nfc (#149834 ) Remove the interleave count since we have not supported it when EVL tail-folding.	2025-07-22 08:59:53 +08:00
Florian Hahn	afe8150780	[VPlan] Simplify exituser handling by generating all extracts first(NFCI) Simplify the handling of exit users by generating all extracts first (safe option), and have FOR handling optimize the extracts, similar to already done for reductions and inductions. NFC modulo first-order recurrence extract order in middle block.	2025-07-16 08:14:12 +01:00
Luke Lau	df387661c4	[RISCV] Remove -riscv-v-vector-bits-min from LoopVectorize tests. NFC (#148565 ) If I understand correctly there was a point where we used to need this before it was implied by Zvl*b. Now that it is though and we use -mattr=+v in pretty much every test we can remove it. In unroll-in-loop-vectorizer.ll we can force a VF of 1 instead by using -force-vector-width=1, and in scalable-basics.ll the two RUN lines were the same so I merged them.	2025-07-14 21:59:35 +08:00
Florian Hahn	64686c59c3	[VPlan] Connect (MemRuntime\|SCEV)Check blocks as VPlan transform (NFC). (#143879 ) Connect SCEV and memory runtime check block directly in VPlan as VPIRBasicBlocks, removing ILV::emitSCEVChecks and ILV::emitMemRuntimeChecks. The new logic is currently split across LoopVectorizationPlanner::addRuntimeChecks which collects a list of {Condition, CheckBlock} pairs and performs some checks and emits remarks if needed. The list of checks is then added to VPlan in VPlanTransforms::connectCheckBlocks. PR: https://github.com/llvm/llvm-project/pull/143879	2025-07-09 14:03:25 +02:00
Luke Lau	61a0653cc6	[VPlan] Fix first-order splices without header mask not using EVL (#146672 ) This fixes a buildbot failure with EVL tail folding after #144666: https://lab.llvm.org/buildbot/#/builders/132/builds/1653 For a first-order recurrence to be correct with EVL tail folding we need to convert splices to vp splices with the EVL operand. Originally we did this by looking for users of the header mask and its users, and converting it in createEVLRecipe. However after #144666 a FOR splice might not actually use the header mask if it's based off e.g. an induction variable, and so we wouldn't pick it up in createEVLRecipe. This fixes this by converting FOR splices separately in a loop over all recipes in the plan, regardless of whether or not it uses the header mask. I think there was some conflation in createEVLRecipe between what was an optimisation and what was needed for correctness. Most of the transforms in it just exist to optimize the mask away and we should still emit correct code without them. So I've renamed it to make the separation clearer.	2025-07-03 16:55:00 +01:00
Luke Lau	ec25a0568c	[VPlan] Don't convert VPWidenSelectRecipes to vp.select in EVL transform (#146695 ) createEVLRecipe tries to optimise recipes that use the header mask by replacing them with their VP equivalents and setting the EVL, allowing the mask to be removed. However we currently also convert widened selects to vp.select even though they don't necessarily use the header mask. Unlike vp.merge a vp.select only makes the "unused" lanes past EVL poison, so it's not needed for correctness. In the same vein as #127180, this patch removes the transform for VPWidenSelectRecipes and keeps them as plain select instructions to allow for more optimisations. RISCVVLOptimizer will still be able to optimise away any VL toggles and we end up with better code generation across llvm-test-suite and SPEC CPU 2017.	2025-07-03 11:50:25 +01:00
Mel Chen	bc8dad1c7e	[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864 ) A reverse interleave access is essentially composed of multiple load/store operations with same negative stride, and their addresses are based on the last lane address of member 0 in the interleaved group. Currently, we already have VPVectorEndPointerRecipe for computing the last lane address of consecutive reverse memory accesses. This patch extends VPVectorEndPointerRecipe to support constant stride and extracts the reverse interleave group address adjustment from VPInterleaveRecipe::execute, replacing it with a VPVectorEndPointerRecipe. The final goal is to support interleaved accesses with EVL tail folding. Given that VPInterleaveRecipe is large and tightly coupled — combining both load and store, and embedding operations like reverse pointer adjustion (GEP), widen load/store, deinterleave/interleave, and reversal — breaking it down into smaller, dedicated recipes may allow VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware form more effectively. One foreseeable challenge is that VPlanTransforms::convertToConcreteRecipes currently runs after tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will likely need to happen earlier in the pipeline to be effective.	2025-07-02 18:16:02 +08:00
Luke Lau	4a2fa0847f	[VPlan] Support VPWidenIntOrFpInductionRecipes with EVL tail folding (#144666 ) Following on from #118638, this handles widened induction variables with EVL tail folding by setting the VF operand to be EVL, calculated in the vector body. We need to do this for correctness since with EVL tail folding the number of elements processed in the penultimate iteration may not be VF, but the runtime EVL, and we need take this into account when updating the backedge value. - Because the VF may now not be a live-in we need to move the insertion point to just after the VFs definition - We also need to avoid truncating it when it's the same size as the step type, previously this wasn't a problem for live-ins. - Also because the VF may be smaller than the IV type, since the EVL is always i32, we may need to zext it. On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and 42.8% more loops vectorized on SPEC CPU 2017	2025-07-01 12:29:24 +01:00
Luke Lau	f01a7936be	[VPlan] Replace all uses of VF when EVL tail folding. NFCI (#146339 ) With EVL tail folding, any use of the VF live in should be replaced by the EVL. Otherwise, it should likely be directly emitted as a constant via VPTransformState::VF. This strengthens the EVL transformation by replacing all uses of VF with EVL and asserting that the only users are VPVectorEndPointerRecipe and VPScalarIVStepsRecipe, the latter of which is new. This should be NFC because even though we didn't previously replace the EVL of VPScalarIVStepsRecipe, it's only used when unrolling which we don't allow with EVL tail folding yet.	2025-06-30 13:47:38 +01:00
Florian Hahn	ecff028a96	[LV] Update test after 4ac4726d00. Update missed test checks after #144644.	2025-06-25 09:54:54 +01:00
Florian Hahn	830b2c842e	[LV] Replace redundant ExtractLastElement of reduction result (NFC). Replace redundant ExtractLastElement VPInstructions early. This is NFC, as the VPInstruction computing the final result is vector-to-scalar, producing a single scalar already. This enables follow-up changes to model more aspects of reductions directly in VPlan.	2025-06-24 21:48:58 +01:00
Florian Hahn	2f5d965bb5	[VPlan] Use EMIT-SCALAR when printing casts. Split off EMIT-SCALAR printing changes from already approved https://github.com/llvm/llvm-project/pull/140623. Currently all casts are single scalars, this brings printing in line with printing for other VPInstructions.	2025-06-21 10:23:53 +01:00
Philip Reames	c103bbc836	[LV] Consider whether vscale is a known power of two for iteration check (#144963 ) Going mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in tree targets have vscale as a power of two, and we have an existing TTI hook for that.	2025-06-20 11:37:27 -07:00
Ramkumar Ramachandra	c8c4bd1ebc	[LV] Stengthen loop-invariance checks in isPredicatedInst (#140744 ) Check loop-invariance against SCEV as well.	2025-06-20 14:01:48 +01:00

1 2 3 4 5 ...

433 Commits