llvm-project

Author	SHA1	Message	Date
Florian Hahn	c93d166c58	[VPlan] Simplify (MUL %x, 0) -> 0. Simplify trivial multiplies. https://alive2.llvm.org/ce/z/DabRkA	2025-07-28 21:50:57 +01:00
Florian Hahn	ccc96e6484	[LV] Add tests where vector trip count is known equal to VFxUF. Add additional tests to cover the case where the trip count isn't equal to VFxUF, but the vector trip count is.	2025-07-28 21:11:51 +01:00
Luke Lau	5f2092dae3	[RISCV][LV] Update f16/bf16 loop vectorizer tests. NFC This fixes a failing test after the changes in #150908 affected the result in #150882.	2025-07-28 23:19:03 +08:00
Luke Lau	fe4f6c1a58	[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882 ) When vectorizing with predication some loops that were previously vectorized without zvfhmin/zvfbfmin will no longer be vectorized because the masked load/store or gather/scatter cost returns illegal. This is due to a discrepancy where for these costs we check isLegalElementTypeForRVV but for regular memory accesses we don't. But for bf16 and f16 vectors we don't actually need the extension support for loads and stores, so this adds a new function which takes this into account. For regular memory accesses we should probably also e.g. return an invalid cost for i64 elements on zve32x, but it doesn't look like we have tests for this yet. We also should probably not be vectorizing these bf16/f16 loops to begin with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this is due to the scalar costs being too cheap. I've added tests for this in a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.	2025-07-28 22:59:49 +08:00
Luke Lau	92d09245d6	[VPlan] Fall back to scalar epilogue if possible when EVL isn't legal (#150908) When enabling predicated vectorization by default on RISC-V, there's a bunch of performance regressions on llvm-test-suite's LoopInterleaving microbenchmarks: https://lnt.lukelau.me/db_default/v4/nts/788?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=791&baseline=730&submit=Update Most of these regressions stem from the interleave_count pragma, which causes EVL tail folding interleaving to be unsupported (since we don't support unrolling with EVL). Currently if DataWithEVL isn't legal we fall back to DataWithoutLaneMask as the tail folding style, but this is very slow on RISC-V. The order of performance roughly is something like: DataWithEVL > None (scalar-epilogue) > Data[WithoutLaneMask] So this patch tries to prevent the regressions by falling back to a scalar epilogue where possible, i.e. the existing vectorization we have today. Note we may still need to fall back to DataWithoutLaneMask, e.g. if the trip count is low etc. or it's forced by -prefer-predicate-over-epilogue=predicate-dont-vectorize.	2025-07-28 20:10:36 +08:00
Florian Hahn	2f2df751d4	[LV] Use SCEV::getElementCount in selectEpilogueVectorizationFactor. (#150018) Follow-up to https://github.com/llvm/llvm-project/pull/149789 to use getElementCount to compute the remaining iterations in selectEpilogueVectorizationFactor. PR: https://github.com/llvm/llvm-project/pull/150018	2025-07-28 12:12:27 +01:00
Luke Lau	d4f9c11e06	[RISCV][LV] Use predicate-else-scalar-epilogue flag in tail folding tests. NFC Align the tests closer with what we eventually intend to enable by default on RISC-V by using -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue, instead of dropping vectorization entirely with predicate-dont-vectorize. Also adjust the non-EVL run lines so that they use -prefer-predicate-over-epilogue=scalar-epilogue instead of -force-tail-folding-style=none, so we're only testing one type of flag instead of a combination of two.	2025-07-28 17:04:00 +08:00
Luke Lau	ddb12c10a9	[RISCV][LV] Remove redundant -force-tail-folding-style from tests. NFC This isn't needed after we set the tail folding style to data-with-evl via TTI in #148686. Also rename the tests to reflect the fact they're no longer forcing the tail folding style.	2025-07-28 16:11:01 +08:00
Florian Hahn	89ae085859	[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735 ) VPVectorPointer for part 0 is just the pointer operand. Simplify it after unrolling. This removes a large number of redundant GEPs with index 0. PR: https://github.com/llvm/llvm-project/pull/149735	2025-07-27 13:53:26 +01:00
Florian Hahn	39b825e669	[LV] Add test for miscompile with conditional store. Add test case for https://github.com/llvm/llvm-project/issues/149347.	2025-07-27 13:43:43 +01:00
Florian Hahn	80c43b6c07	[VPlan] Add ExtractLane VPInst to extract across multiple parts. (#148817) This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane. The patch updates early-exit codegen to use it instead of ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops. The patch removes the restrictions added in 6f43754e9 (#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count. PR: https://github.com/llvm/llvm-project/pull/148817	2025-07-27 08:08:25 +01:00
Florian Hahn	82e4b83328	[VPlan] Use terminator debug loc for latch BranchOnCond. Update VPlan to consistently use the latch branch debug location for the latch branch in the vector loop, if there is one.	2025-07-26 21:45:25 +01:00
Florian Hahn	fa3ec0c17c	[VPlan] Materialize constant vector trip counts before final opts. (#142309 ) Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: https://github.com/llvm/llvm-project/pull/142309	2025-07-26 17:16:36 +01:00
Florian Hahn	662bede01e	[LV] Handle known-false mem runtime checks in GeneratedRTChecks. Handle mem checks known to be false in getMemRuntimeChecks the same way as SCEV checks known to be false in getSCEVChecks. This ensures such redundant check blocks are not added in the first place.	2025-07-26 15:39:21 +01:00
Florian Hahn	9e7782db73	[LV,LAA] Add tests where RT checks are known false after expansion.	2025-07-26 14:17:35 +01:00
Florian Hahn	e5f5813042	[LV] Update some tests to have variable trip counts. (NFC) Update tests for which checking both the scalar resume and exit values is interesting, because they have first-order recurrences to have variable trip-counts, to avoid the branch in the middle.block being folded away by https://github.com/llvm/llvm-project/pull/142309. For similar reasons, also update check-prof-info.ll	2025-07-26 09:59:06 +01:00
Florian Hahn	9a201531ed	[LV] Bail out early if runtime checks are known to fail. There are a number of cases for which SCEV may not be able to prove a predicate will always be true/false, which may be simplified to a constant during expansion (see discussion in https://github.com/llvm/llvm-project/pull/131538). Bail out early if runtime checks are known to always fail, as the vector loop generated later will never execute.	2025-07-26 09:26:15 +01:00
Florian Hahn	445006d3a9	[LV] Add test for re-using existing phi for SCEV Add. Add another test case for https://github.com/llvm/llvm-project/pull/147824, where the difference between an existing phi and the target SCEV is an add of a constant.	2025-07-25 21:08:39 +01:00
Alex Bradbury	5294793bdc	Revert "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981 )" This reverts commit ee3a7714b7a69ac9aae4b79f4c67adc38bc6876b. Causes an assertion for the zvl1024b RISC-V build configuration. See comment with reproducer at <https://github.com/llvm/llvm-project/pull/149981#issuecomment-3118482801>	2025-07-25 16:14:10 +01:00
Florian Hahn	e21ee41be4	[SCEV] Try to re-use pointer LCSSA phis when expanding SCEVs. (#147824 ) Generalize the code added in https://github.com/llvm/llvm-project/pull/147214 to also support re-using pointer LCSSA phis when expanding SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with https://github.com/llvm/llvm-project/pull/142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown. Compile-time impact neutral: https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u PR: https://github.com/llvm/llvm-project/pull/147824	2025-07-25 15:29:40 +01:00
Mel Chen	ee3a7714b7	[RISCV][TTI] Enable masked interleave access for scalable vector (#149981) Now that support for masked loads/stores of interleave groups has landed, we can enable the loop vectorizer to generate masked interleave access where applicable. This improves vectorization in several ways: * Internal predication support: This enables interleave group vectorization for loops with internal control flow predication, provided all members of the group share the same predicate. Gaps in interleave groups are still not efficiently handled by masking, so masking for gaps remains disabled for now. * Tail folding: This allows tail folding of loops with interleave groups by using masking. Without this, vectorized loops with interleaves would fall back to using separate gather/scatter accesses, which can be significantly less efficient. * Scalable vector support: Currently, only scalable vector types are supported for masked interleave lowering. Fixed-length vector support will be enabled in the future. As interleave access is not yet supported with tail folding by EVL, that functionality is temporarily disabled. We are going to create another patch to support it. Co-authored-by: Philip Reames <preames@rivosinc.com>	2025-07-25 17:53:08 +08:00
Florian Hahn	6d004d2e5b	[LV] Add additional SCEV expansion tests for #147824 . Add additional test coverage for https://github.com/llvm/llvm-project/pull/147824.	2025-07-25 10:23:56 +01:00
Luke Lau	feb77c0fea	[VPlan] Handle VPWidenSelectRecipe in tryToFoldLiveIns (#150357 ) This helps simplify VPBlendRecipes that are expanded to selects in another patch.	2025-07-25 09:46:19 +08:00
Luke Lau	9563e7a940	[VPlan] Mark VPInstruction::ExplicitVectorLength as single scalar. NFC (#150221 ) This allows it to be broadcasted without an explicit VPInstruction::Broadcast in #150202	2025-07-23 22:38:21 +08:00
Florian Hahn	77b1b956da	[LV] Also clamp MaxVF by trip count when maximizing vector bandwidth. (#149794 ) Also clamp the max VF when maximizing vector bandwidth by the maximum trip count. Otherwise we may end up choosing a VF for which the vector loop never executes. PR: https://github.com/llvm/llvm-project/pull/149794	2025-07-23 10:19:56 +01:00
Luke Lau	20c52e4231	Reapply "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686 )" This reverts commit 25e97fc420f8ecc43fbabadfe9767b4163e6ee36. The original commit was reverted due to a crash in llvm-test-suite. The crash stemmed from a multiply reduction, which isn't supported for scalable VFs on RISC-V. But for EVL tail folding we only support scalable VFs, so when -force-tail-folding-style=data-with-evl is specified we check to see if there's a scalable VF, and fall back to data-without-lane-mask if there isn't. This is done in setTailFoldingStyles, but previously we were only checking if the forced tail folding style was legal, not the style returned by TTI. This version fixes this by checking the actual computed tail folding style and not just the forced one, and adds a test for the crash in llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll	2025-07-22 23:52:02 +08:00
Luke Lau	25e97fc420	Revert "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686 )" This reverts commit 38318dd05615a2f38abdeeae99e7423165308902. The clang-riscv-gauntlet buildbot is breaking with this commit: https://lab.llvm.org/buildbot/#/builders/210/builds/371	2025-07-22 22:54:26 +08:00
Luke Lau	6e723d2de8	[VPlan] Remove loop region in simplifyBranchConditionForVFAndUF with EVL PHI (#150016 ) Previously we fell back to just simplifying the branch cond to true since one of the phis was a VPEVLBasedIVPHIRecipe. However this should be fine to replace with its start value.	2025-07-22 22:30:34 +08:00
Luke Lau	38318dd056	[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686 ) In preparation to eventually make EVL tail folding the default, this patch sets DataWithEVL as the preferred tail folding style for RISC-V, but doesn't enable tail folding by default. And although tail folding isn't enabled by default, the loop vectorizer will actually tail fold loops with a small trip count, so this will cause some EVL vectorized loops to be generated in the default configuration. The EVL tail folding work is still not complete, e.g. we still need to handle interleave groups etc., see #123069, but a lot of these missing features also apply to the data (masked) tail folding strategy, which is the default anyway. The actual overall performance picture is much better, on TSVC EVL tail folding is faster than data on every benchmark on the spacemit-x60[^1]: https://lnt.lukelau.me/db_default/v4/nts/755?compare_to=756 And on SPEC CPU 2017 we see a geomean improvement[^2]: https://lnt.lukelau.me/db_default/v4/nts/751?compare_to=753 This is likely due to masked instructions generally being less performant on the spacemit-x60, up to twice as slow: https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html [^1]: These benchmarks don't exactly give the same performance numbers as this patch, but it's a good indicator that EVL tail folding is generally faster than masked tail folding. [^2]: The large code size increase in 505.mcf_r is due to a function being inlined now	2025-07-22 21:02:59 +08:00
Florian Hahn	37f0f10a85	[LV] Don't vectorize epilogue with scalable VF if no iterations remain. (#149789 ) Currently we may try to vectorize the epilogue with a scalable VF, even if there are no remaining iterations after the main vector loop with a fixed VF. Update selectEpilogueVectorizationFactor to always compute the number of remaining iterations and exit early if no epilogue iterations remain. Fixes https://github.com/llvm/llvm-project/issues/149726 PR: https://github.com/llvm/llvm-project/pull/149789	2025-07-22 13:13:31 +01:00
Luke Lau	cb8b0cd2cf	[LV] Precommit test changes for #148686 . NFC Namely explicitly adding -force-tail-folding-style=data to existing RUN lines so that we don't lose them when we switch to data-with-evl by default.	2025-07-22 16:16:43 +08:00
Mel Chen	d2a7f4e528	[NFC][LV] Refine the lit test case riscv-vector-reverse.ll (#149020 ) This patch includes the following changes: 1. Merge riscv-vector-reverse-output.ll into riscv-vector-reverse.ll, and only check the generated LLVM IR. 2. Add vplan-riscv-vector-reverse.ll to preserve the original debug output checks from riscv-vector-reverse.ll.	2025-07-22 14:56:14 +08:00
Mel Chen	6f240d5a7d	[LV][EVL] Remove interleave count from the test case for EVL tail-folding. nfc (#149834) Remove the interleave count since it is not yet supported with EVL tail-folding.	2025-07-22 08:59:53 +08:00
Florian Hahn	3fd53db858	[VPlan] Remove unneeded VPVectorPointer after narrowing to replicate. The replicate recipes created when narrowing interleave groups don't need a VPVectorPointer, they can simply use the existing pointer.	2025-07-19 20:18:04 +01:00
Florian Hahn	004c67ea25	[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239) Update LV to vectorize maxnum/minnum reductions without fast-math flags, by adding an extra check in the loop if any inputs to maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t. signaling NaNs. Signed-zeros are already handled consistently by maxnum/minnum. If any input is NaN, exit the vector loop, compute the reduction result up to the vector iteration that contained NaN inputs and resume in the scalar loop. New recurrence kinds are added for reductions using maxnum/minnum without fast-math flags. PR: https://github.com/llvm/llvm-project/pull/148239	2025-07-18 21:58:19 +01:00
Nicholas Guy	b5e3fffd20	[LoopVectorizer][NFC] Require asserts on maxbandwidth-regpressure.ll (#149484 ) Fix for buildbot failure: https://lab.llvm.org/buildbot/#/builders/11/builds/19837	2025-07-18 10:21:21 +01:00
Nicholas Guy	20fc297ce3	[LoopVectorizer] Only check register pressure for VFs that have been enabled via maxBandwidth (#149056 ) Currently if MaxBandwidth is enabled, the register pressure is checked for each VF. This changes that to only perform said check if the VF would not have otherwise been considered by the LoopVectorizer if maxBandwidth was not enabled. Theoretically this allows for higher VFs to be considered than would otherwise be deemed "safe" (from a regpressure perspective), but more concretely this reduces the amount of work done at compile-time when maxBandwidth is enabled.	2025-07-18 09:21:20 +01:00
Florian Hahn	46357438ba	[SCEV] Try to re-use existing LCSSA phis when expanding SCEVAddRecExpr. (#147214) If an AddRec is expanded outside a loop with a single exit block, check if any of the (lcssa) phi nodes in the exit block match the AddRec. If that's the case, simply use the existing lcssa phi. This can reduce the number of instructions created for SCEV expansions, mainly for runtime checks generated by the loop vectorizer. Compile-time impact should be mostly neutral: https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u PR: https://github.com/llvm/llvm-project/pull/147214	2025-07-17 15:47:54 +01:00
Florian Hahn	afe8150780	[VPlan] Simplify exituser handling by generating all extracts first(NFCI) Simplify the handling of exit users by generating all extracts first (safe option), and have FOR handling optimize the extracts, similar to already done for reductions and inductions. NFC modulo first-order recurrence extract order in middle block.	2025-07-16 08:14:12 +01:00
Florian Hahn	cfdd5ca2ed	[LV] Add tests for fmin reductions without fast-math flags. Some of those reductions can be vectorized with extra checks. Extra tests for https://github.com/llvm/llvm-project/pull/148239 and follow-ups.	2025-07-15 13:34:12 +01:00
David Sherwood	c363a3f9c8	[LV] Ensure getScaledReductions only matches extends inside the loop (#148264) In getScaledReductions for the case where we try to match a partial reduction of the form: %phi = phi i32 ... ... %add = add i32 %phi, %zext where %zext = zext i8 %some_val to i32 we should ensure that %zext is actually inside the loop. Fixes https://github.com/llvm/llvm-project/issues/148260	2025-07-15 09:54:58 +01:00
Luke Lau	c8d0e24745	[VPlan] Preserve trunc nuw/nsw in VPRecipeWithIRFlags (#144700 ) This preserves the nuw/nsw flags on widened truncs by checking for TruncInst in the VPIRFlags constructor The motivation for this is to be able to fold away some redundant truncs feeding into uitofps (or potentially narrow the inductions feeding them)	2025-07-15 15:34:14 +08:00
Florian Hahn	5a4586f468	Reapply "[LAA] Remove loop-invariant check added in 234cc40adc61." This reverts commit d43a80936d437d217d5a6dbbaa5fb131c27e7085. With the correctness issue blocking the recommit finally fixed (5d01697ec6cb), again unconditionally check if accesses are completely before or after each other.	2025-07-14 21:21:22 +01:00
Luke Lau	df387661c4	[RISCV] Remove -riscv-v-vector-bits-min from LoopVectorize tests. NFC (#148565 ) If I understand correctly there was a point where we used to need this before it was implied by Zvl*b. Now that it is though and we use -mattr=+v in pretty much every test we can remove it. In unroll-in-loop-vectorizer.ll we can force a VF of 1 instead by using -force-vector-width=1, and in scalable-basics.ll the two RUN lines were the same so I merged them.	2025-07-14 21:59:35 +08:00
Florian Hahn	cad62df49a	[Loads] Support dereferenceable assumption with variable size. (#128436 ) Update isDereferenceableAndAlignedPointer to make use of dereferenceable assumptions with variable sizes via SCEV. To do so, factor out the logic to check via an assumption to a helper, and use SE to check if the access size is less than the dereferenceable size. PR: https://github.com/llvm/llvm-project/pull/128436	2025-07-14 08:17:33 +01:00
Florian Hahn	f4c7cc26b6	[LV] Use more precise isPredicatedInst in legacy CCH (NFC). Legal::isMaskRequired may be overly conservative and also return true when no mask is actually required. Use isPredicatedInst from the cost model instead, which fixes a cost-model divergence between legacy and VPlan cost model where the legacy cost model incorrectly assumed some loads were predicated. Fixes https://github.com/llvm/llvm-project/issues/148431.	2025-07-13 19:55:34 +01:00
Florian Hahn	cc65da0fb1	[LV] Update fmax tests to include ogt/olt/ole/ugt predicates. Adjust and update tests as per feedback in https://github.com/llvm/llvm-project/pull/146711.	2025-07-13 12:16:54 +01:00
Anna Thomas	fe403584c4	[LV] Add a statistic for early exit vectorization Add statistic LoopsEarlyExitVectorized PR: https://github.com/llvm/llvm-project/pull/145730	2025-07-11 09:10:26 -04:00
David Sherwood	74e3dfe389	[LV] Disable forcing interleaving for uncountable early exit loops (#147993 ) Interleaving does not currently work properly when vectorising loops with uncountable early exits. Interleaving is already disabled for normal vectorisation and for the pragma/hint - this patch also disables it when using -force-vector-interleave.	2025-07-11 09:46:21 +01:00
Florian Hahn	c452de1715	Reapply "[VPlan] Allow derived IVs and scalar-steps in narrowing interleave." This reverts commit f5ed863176dd286462cd5558723dfe445967fedf. Recommit patch now that the crash exposed by the change has been fixed.	2025-07-10 20:48:19 +01:00
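
The formula quoted in fa3ec0c17c, (TC / (VF * UF)) * (VF * UF), and the case tested in ccc96e6484 (trip count not equal to VFxUF, but vector trip count equal to it) can be illustrated with plain integer arithmetic. This is only a sketch outside LLVM: the function names are hypothetical and are not LLVM APIs, and it assumes a fixed (non-scalable) VF, no tail folding, and no required scalar epilogue.

```python
def vector_trip_count(tc: int, vf: int, uf: int) -> int:
    """Round the original trip count down to a multiple of VF * UF.

    Hypothetical helper for illustration only; assumes a fixed VF,
    no tail folding, and no required scalar epilogue.
    """
    step = vf * uf
    return (tc // step) * step


def scalar_remainder(tc: int, vf: int, uf: int) -> int:
    """Iterations left over for the scalar loop after the vector loop."""
    return tc - vector_trip_count(tc, vf, uf)


if __name__ == "__main__":
    # TC = 10 is not equal to VF * UF = 8, but the vector trip count is.
    print(vector_trip_count(10, 4, 2), scalar_remainder(10, 4, 2))
    # TC smaller than VF * UF: the vector loop would never execute.
    print(vector_trip_count(7, 4, 2), scalar_remainder(7, 4, 2))
```

When the trip count is a compile-time constant, the rounded value is a constant too, which is what makes it possible to materialize it ahead of execution as the commit describes.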


3239 Commits