llvm-project

Author	SHA1	Message	Date
Florian Hahn	62c50fd795	[VPlan] Retrieve canonical IV directly in preparePlanForEpilogue (NFCI). Move code handling canonical IV out of the loop, simplifying the loop body. Preparation for follow-up changes	2025-10-01 19:32:54 +01:00
Florian Hahn	f61be43525	Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053 )" This reverts commit b4be7ecaf06bfcb4aa8d47c4fda1eed9bbe4ae77. See https://github.com/llvm/llvm-project/issues/161404 for a crash exposed by the change. Revert while I investigate.	2025-09-30 22:13:06 +01:00
Ramkumar Ramachandra	280abaf9da	[VPlan] Handle scalar-VF in transforms (NFC) (#161365 )	2025-09-30 19:35:12 +01:00
Ramkumar Ramachandra	2f7252a841	[LV] Preserve GEP nusw when widening memory (#160885 )	2025-09-30 10:42:45 +00:00
Florian Hahn	45ce88758d	[LV] Don't preserve LCSSA in SCEVExpander for runtime checks. (#159556 ) LV does not preserve LCSSA, it constructs it just before processing a loop to vectorize. Runtime check expressions are invariant to that loop, so expanding them should not break LCSSA form for the loop we are about to vectorize. This fixes a crash when discarding instructions generated when expanding runtime checks, if the expansion introduces LCSSA phis for values from other loops which are not in LCSSA form: we would introduce new LCSSA phis and update all outside users, some of which are not created by the expander and cannot be cleaned up. Fixes https://github.com/llvm/llvm-project/issues/158259. PR: https://github.com/llvm/llvm-project/pull/159556	2025-09-30 10:03:55 +00:00
Florian Hahn	b4be7ecaf0	[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053 ) Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model: 1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as follow-up. 2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper. Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: https://github.com/llvm/llvm-project/pull/160053	2025-09-29 08:08:09 +00:00
Ramkumar Ramachandra	0fc6213aee	[LV] Clarify nature of legacy CSE (NFC) (#160855 ) In order to avoid conflating the legacy CSE with the VPlan-based one, rename the legacy CSE and insert a FIXME to clarify the nature of the legacy CSE.	2025-09-28 10:02:22 +01:00
Florian Hahn	78af056137	[VPlan] Run CSE closer to VPlan::execute. (#160572 ) Additional CSE opportunities are exposed after converting to concrete recipes/dissolving regions and materializing various expressions. Run CSE later, to capitalize on some of the late opportunities. PR: https://github.com/llvm/llvm-project/pull/160572	2025-09-26 09:38:58 +00:00
Florian Hahn	2016af5652	[VPlan] Create epilogue minimum iteration check in VPlan. (#157545 ) Move creation of the minimum iteration check for the epilogue vector loop to VPlan. This is a first step towards breaking up and moving skeleton creation for epilogue vectorization to VPlan. It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum iteration check is created directly in VPlan, connecting the check blocks from the main vector loop is done as post-processing. Next steps are to move connecting and updating the branches from the check blocks to VPlan, as well as updating the incoming values for phis. Test changes are improvements due to folding of live-ins. PR: https://github.com/llvm/llvm-project/pull/157545	2025-09-25 07:13:38 +00:00
Florian Hahn	ac3f148f60	[LV] Set extend kinds together with ExtOpTypes (NFC). Set extend kinds together with ExtOpTypes. This will make it easier to adjust the extend kind handling.	2025-09-24 21:36:11 +01:00
Florian Hahn	a7b4dd42bd	[LV] Don't create partial reductions if factor doesn't match accumulator (#158603 ) Check if the scale-factor of the accumulator is the same as the requested ScaleFactor in tryToCreatePartialReductions. This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions, e.g. because we do not form a partial reduction for the loop exit instruction. Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor. PR: https://github.com/llvm/llvm-project/pull/158603	2025-09-24 12:21:03 +01:00
Ramkumar Ramachandra	66fd42008a	[LV] Don't ignore invariant stores when costing (#158682 ) Invariant stores of reductions are removed early in the VPlan construction, and there is no reason to ignore them while costing.	2025-09-24 11:33:24 +01:00
Florian Hahn	88aab08ae5	[LV] Check for hoisted safe-div selects in planContainsAdditionalSimp. In some cases, safe-divisor selects can be hoisted out of the vector loop. Catching all cases in the legacy cost model isn't possible, in particular checking if all conditions guarding a division are loop invariant. Instead, check in planContainsAdditionalSimplifications if there are any hoisted safe-divisor selects. If so, don't compare to the more inaccurate legacy cost model. Fixes https://github.com/llvm/llvm-project/issues/160354. Fixes https://github.com/llvm/llvm-project/issues/160356.	2025-09-23 21:54:02 +01:00
Florian Hahn	49605a4727	[LV] Set correct costs for interleave group members. This ensures each scalarized member has an accurate cost, matching the cost it would have if it would not have been considered for an interleave group.	2025-09-21 18:07:22 +01:00
Florian Hahn	addfdb5273	[LV] Skip select cost for invariant divisors in legacy cost model. For UDiv/SDiv with invariant divisors, the created selects will be hoisted out. Don't compute their cost for each iteration, to match the more accurate VPlan-based cost modeling. Fixes https://github.com/llvm/llvm-project/issues/159402.	2025-09-21 15:08:50 +01:00
Florian Hahn	7dd9b3d814	[LV] Also handle non-uniform scalarized loads when processing AddrDefs. Loads of addresses are scalarized and have their costs computed w/o scalarization overhead. Consistently apply this logic also to non-uniform loads that are already scalarized, to ensure their costs are consistent with other scalarized loads that are used as addresses.	2025-09-21 09:36:58 +01:00
Florian Hahn	81c0c7337d	[LV] Pass operand info to getMemoryOpCost in getMemInstScalarizationCost. Pass operand info to getMemoryOpCost in getMemInstScalarizationCost. This matches the behavior in VPReplicateRecipe::computeCost.	2025-09-19 21:03:38 +01:00
Florian Hahn	0c028bbf33	[LV] Always add uniform pointers to uniforms list. Always add pointers proved to be uniform via legal/SCEV to worklist. This extends the existing logic to handle a few more pointers known to be uniform.	2025-09-18 22:56:19 +01:00
Florian Hahn	50b9ca4dda	[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510 ) After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510	2025-09-18 19:25:05 +01:00
Ramkumar Ramachandra	f68f3b9a7e	[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550 )	2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra	0384f6c9db	[VPlanPatternMatch] Introduce match functor (NFC) (#159521 ) Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to introduce the VPlanPatternMatch version of the match functor to shorten some idioms. Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-18 10:36:12 +01:00
Hassnaa Hamdi	e8aa0b688a	[LV]: Ensure fairness when selecting epilogue VF. (#155547 ) Consider IC when deciding if an epilogue is profitable for scalable vectors, same as for fixed-width vectors.	2025-09-17 14:48:10 +01:00
Ramkumar Ramachandra	148a83543b	[LV] Introduce m_One and improve (0\|1)-match (NFC) (#157419 )	2025-09-15 10:34:06 +00:00
Luke Lau	4bb250d6a3	[VPlan] Always consider register pressure on RISC-V (#156951 ) Stacked on #156923 In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because whilst the largest element type is i8, we generate a bunch of pointer vectors for gathers and scatters. This means the VF chosen is quite high e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x i64> m8 registers for the pointers. This was briefly fixed by #132190 where we computed register pressure in VPlan and used it to prune VFs that were likely to spill. The legacy cost model wasn't able to do this pruning because it didn't have visibility into the pointer vectors that were needed for the gathers/scatters. However VF pruning was restricted again to just the case when max bandwidth was enabled in #141736 to avoid an AArch64 regression, and restricted again in #149056 to only prune VFs that had max bandwidth enabled. On RISC-V we take advantage of register grouping for performance and choose a default of LMUL 2, which means there are 16 registers to work with – half the number as SVE, so we encounter higher register pressure more frequently. As such, we likely want to always consider pruning VFs with high register pressure and not just the VFs from max bandwidth. This adds a TTI hook to opt into this behaviour for RISC-V which fixes the motivating godbolt example above. When last checked this significantly reduces the number of spills on SPEC CPU 2017, up to 80% on 538.imagick_r.	2025-09-12 06:21:54 +00:00
Elvis Wang	3e898bc40f	[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387 ) This patch fixes the assertion triggered when `isUniform` (from the legacy model) and `isSingleScalar` (from the VPlan-based model) mismatch. The simplified test that causes the assertion: ``` loop: loadA = load %a => %a is loop invariant. loadB = load %loadA ... ``` The legacy cost model cannot determine that the address of `%loadB` is uniform, but in the VPlan-based cost model the addresses of both `%loadA` and `%loadB` are single scalars. Full test that caused the crash: https://llvm.godbolt.org/z/zEG8YKjqh. --------- Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-11 07:49:54 +08:00
Graham Hunter	3c810b76b9	[LV] Add initial legality checks for early exit loops with side effects (#145663 ) This adds initial support to LoopVectorizationLegality to analyze loops with side effects (particularly stores to memory) and an uncountable exit. This patch alone doesn't enable any new transformations, but does give clearer reasons for rejecting vectorization for such a loop. The intent is for a loop like the following to pass the specific checks, and only be rejected at the end until the transformation code is committed: ``` // Assume a is marked restrict // Assume b is known to be large enough to access up to b[N-1] for (int i = 0; i < N; ++i) { a[i]++; if (b[i] > threshold) break; } ```	2025-09-10 13:54:52 +01:00
Ramkumar Ramachandra	5544afd253	[LoopUtils] Simplify expanded RT-checks (#157518 ) Follow up on 528b13d ([SCEVExp] Add helper to clean up dead instructions after expansion.) to hoist the SCEVExpander::eraseDeadInstructions call from LoopVectorize into the LoopUtils APIs add[Diff]RuntimeChecks, so that other callers (LoopDistribute and LoopVersioning) can benefit from the patch.	2025-09-09 11:38:54 +00:00
Florian Hahn	9b1b93766d	Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9. Recommit with a fix for MSan failure ( https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a set to track deleted values. Using the InsertedInstructions set is not sufficient, as it uses asserting value handles as keys, which may dereference the value at construction. Original message: Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-09 09:47:41 +01:00
Florian Hahn	eeb43806eb	Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f. Triggers MSan errors in some configurations, e.g. https://lab.llvm.org/buildbot/#/builders/169/builds/14799	2025-09-08 14:52:28 +01:00
Florian Hahn	528b13df57	[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 ) Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-08 10:53:20 +01:00
Florian Hahn	ee29611427	[LV] Remove ILV::LoopVectorPreHeader (NFC). Remove LoopVectorPreheader member variable from ILV as it is only used by epilogue skeleton creation.	2025-09-07 13:48:00 +01:00
Florian Hahn	724a63ba8b	[LV] Use more accurate getSCEV/MemChecks in GeneratedRTCheck::hasChecks. Update hasChecks to use getSCEV/MemRuntimeChecks(), which automatically handles checking for known-false checks. This improves a few cases where we previously did not add metadata to disable runtime unrolling, due to runtime checks, even though no runtime checks are needed.	2025-09-06 19:21:11 +01:00
Florian Hahn	e0f00bd645	[LV] Don't consider second op as invariant in getDivRemSpeculationCost. The second operand when using a safe divisor will always be a select in the loop, so won't be invariant; don't treat it as such. This fixes a divergence with legacy and VPlan based cost model. Fixes https://github.com/llvm/llvm-project/issues/156066.	2025-09-06 14:06:04 +01:00
Luke Lau	4e5e65e55d	[VPlan] Only compute reg pressure if considered. NFCI (#156923 ) In #149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth is permitted for any VFs (via `MaxPermissibleVFWithoutMaxBW`). This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth.	2025-09-05 00:23:47 +00:00
Shih-Po Hung	9876b06bc7	[LV] Add initial legality checks for loops with unbound loads. (#152422 ) This patch splits out the legality checks from PR #151300, following the landing of PR #128593. It is a step toward supporting vectorization of early-exit loops that contain potentially faulting loads. In this commit, an early-exit loop is considered legal for vectorization if it satisfies the following criteria: 1. it is a read-only loop. 2. all potentially faulting loads are unit-stride, which is the only type currently supported by vp.load.ff.	2025-09-05 08:20:16 +08:00
Florian Hahn	8796dfdcba	[VPlan] Consolidate logic to update loop metadata and profile info. This patch consolidates updating loop metadata and profile info for both the remainder and vector loops in a single place. This is NFC, modulo consistently applying vectorization specific metadata also in the experimental VPlan-native path. Split off from https://github.com/llvm/llvm-project/pull/154510.	2025-09-04 21:50:40 +01:00
Hassnaa Hamdi	35b22764e2	[LV][AArch64] Prefer epilogue with fixed-width over scalable VF. (#155546 ) In case of equal costs, prefer an epilogue with fixed-width over scalable VF. That is helpful in cases like post-LTO vectorization, where an epilogue with fixed-width VF can be removed when we eventually know that the trip count is less than the epilogue iterations.	2025-09-04 19:31:30 +01:00
Ramkumar Ramachandra	d8fd511480	[VPlan] Introduce CSE pass (#151872 ) Introduce a simple common-subexpression-elimination pass at the VPlan-level, running late during the execution of the VPlan. The long-term vision is to get rid of the legacy non-VPlan-based cse routine in LV, but this patch doesn't yet fully subsume it.	2025-09-02 12:23:29 +01:00
David Sherwood	e867b85118	[LV] Always emit branch weights for vector epilogue (#155437 ) We currently only emit the branch weights for the epilogue iteration count check if there was already branch weight data for the scalar loop. However, the code makes no use of the existing branch weight when estimating the likelihood of taking a particular branch and so we can just always add the branch weights regardless. These hints should hopefully improve code generation.	2025-09-02 10:15:21 +01:00
Elvis Wang	7997a79be6	[LV] Align legacy cost model to vplan-based model for gather/scatter w/ uniform addr. (#155739 ) This patch checks whether the addr is uniform in the legacy cost model, to align it with the VPlan-based cost model after #150371. It fixes an llvm-test-suite assertion (https://lab.llvm.org/buildbot/#/builders/210/builds/1935) caused by the cost models being misaligned after #149955 on RISC-V. I've tested this patch (on top of #149955) on the llvm-test-suite locally with the options that crashed, `rva23u64` and `rva23u64_zvl1024b`, and it builds successfully. Since this fix will change LV, I think it would be better to create a PR to fix this.	2025-09-02 09:11:45 +08:00
Florian Hahn	bf4486eb29	[LV] Move fixing reduction resumes for epilogue out of executePlan.(NFC) Move fixing up reduction resume values out of the general ::executePlan and perform it together with updating induction resume values. This also allows moving additional bypass block handling to EpilogueVectorizerEpilogueLoop.	2025-09-01 19:39:00 +01:00
Sam Tebbs	37127f74f4	[LV] Bundle sub reductions into VPExpressionRecipe (#147255 ) This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. -> https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-09-01 17:25:01 +01:00
Florian Hahn	507ff082c2	[VPlan] Move runtime check blocks to correct position during exec (NFC). Move adjusting the position of completely disconnected IR blocks to VPIRBasicBlock::execute.	2025-09-01 16:15:02 +01:00
Mel Chen	13357e8a12	[LV][EVL] Support interleaved access with tail folding by EVL (#152070 ) The InterleavedAccess pass already supports transforming vector-predicated (vp) load/store intrinsics. With this patch, we start enabling interleaved access under tail folding by EVL. This patch introduces a new base class, VPInterleaveBase, and a concrete class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and the new VPInterleaveEVLRecipe inherit from and implement VPInterleaveBase. Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL operand to emit vp.load/vp.store intrinsics. Currently, tail folding by EVL is only supported for scalable vectorization. Therefore, VPInterleaveEVLRecipe will only emit interleave/deinterleave intrinsics. Reverse accesses are not yet implemented, as masked reverse interleaved access under tail folding is not yet supported. Fixed #123201	2025-09-01 21:20:06 +08:00
Kerry McLaughlin	f0e9bba024	[LoopVectorize] Generate wide active lane masks (#147535 ) This patch adds a new flag (-enable-wide-lane-mask) which allows LoopVectorize to generate wider-than-VF active lane masks when it is safe to do so (i.e. the mask is used for data and control flow). The transform in extractFromWideActiveLaneMask creates vector extracts from the first active lane mask in the header & loop body, modifying the active lane mask phi operands to use the extracts. An additional operand is passed to the ActiveLaneMask instruction, the value of which is used as a multiplier of VF when generating the mask. By default this is 1, and is updated to UF by extractFromWideActiveLaneMask. The motivation for this change is to improve interleaved loops when SVE2.1 is available, where we can make use of the whilelo instruction which returns a predicate pair. This is based on a PR that was created by @momchil-velikov (#81140) and contains tests which were added there.	2025-09-01 13:53:30 +01:00
Ramkumar Ramachandra	4cf770275f	[VPlan] Introduce replaceSymbolicStrides (NFC) (#155842 ) Introduce VPlanTransforms::replaceSymbolicStrides factoring some code from LoopVectorize.	2025-09-01 09:03:46 +00:00
Florian Hahn	a53a5ed65d	[VPlan] Add VPBlockBase::hasPredecessors (NFC). Split off from https://github.com/llvm/llvm-project/pull/154510/, add helper to check if a block has any predecessors.	2025-09-01 09:44:49 +01:00
Ramkumar Ramachandra	c8d7a73cf1	[LV] Improve code around operands-iterator (NFC) (#156016 )	2025-09-01 09:17:55 +01:00
Florian Hahn	b1d2c627b1	[VPlan] Unconditionally run attachRuntimeChecks (NFCI). Instead of conditionally executing attachRuntimeChecks, directly check if the blocks to be added are still disconnected.	2025-08-31 17:28:27 +01:00
Florian Hahn	0aac22758a	[LV] Correctly cost chains of replicating calls in legacy CM. Check for scalarized calls in needsExtract to fix a divergence between legacy and VPlan-based cost model. The legacy cost model was missing a check for scalarized calls in needsExtract, which meant it incorrectly assumed the result of a scalarized call needs extracting. Exposed by https://github.com/llvm/llvm-project/pull/154617. Fixes https://github.com/llvm/llvm-project/issues/156091.	2025-08-31 15:13:47 +01:00
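The early-exit loop quoted in commit 3c810b76b9 can be written out as a scalar reference, which may help when reading the legality discussion. This is only a sketch: the function name `incrementUntilThreshold` and the use of `std::vector` are illustrative assumptions, not code from LoopVectorizationLegality. It shows the property any vectorized form would have to preserve: the store `a[i]++` also runs in the iteration that takes the exit, and in no later iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference for the loop in the commit message (hypothetical helper).
// `a` and `b` are assumed not to alias (the "restrict" assumption), and both
// are assumed to hold at least `n` elements.
// Returns the number of iterations whose store was executed.
std::size_t incrementUntilThreshold(std::vector<int> &a,
                                    const std::vector<int> &b,
                                    std::size_t n, int threshold) {
  assert(a.size() >= n && b.size() >= n);
  std::size_t executed = 0;
  for (std::size_t i = 0; i < n; ++i) {
    a[i]++;     // side effect: also happens in the exiting iteration
    ++executed;
    if (b[i] > threshold)
      break;    // uncountable exit: the trip count depends on the data in b
  }
  return executed;
}
```

With `a = {0, 0, 0, 0}`, `b = {1, 5, 1, 1}` and `threshold = 3`, the loop exits in its second iteration, so only `a[0]` and `a[1]` are incremented.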


2731 Commits
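Commit f0e9bba024 describes generating one active lane mask that is UF times wider than VF and then extracting VF-sized parts from it. The following is a minimal scalar model of that idea, assuming the usual active-lane-mask rule that lane j is active when base + j is below the trip count. The helper names `activeLaneMask` and `extractParts` are hypothetical and do not correspond to functions in LoopVectorize or VPlan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane j of the mask is active iff base + j < tripCount (assumed semantics,
// modelled on an active lane mask; overflow of base + j is not considered).
std::vector<bool> activeLaneMask(std::uint64_t base, std::uint64_t tripCount,
                                 std::size_t lanes) {
  std::vector<bool> mask(lanes);
  for (std::size_t j = 0; j < lanes; ++j)
    mask[j] = base + j < tripCount;
  return mask;
}

// Split a wide mask of vf * uf lanes into uf consecutive parts of vf lanes,
// one per unrolled copy of the loop body.
std::vector<std::vector<bool>> extractParts(const std::vector<bool> &wide,
                                            std::size_t vf, std::size_t uf) {
  assert(wide.size() == vf * uf);
  std::vector<std::vector<bool>> parts;
  for (std::size_t p = 0; p < uf; ++p) {
    auto first = wide.begin() + static_cast<std::ptrdiff_t>(p * vf);
    auto last = first + static_cast<std::ptrdiff_t>(vf);
    parts.emplace_back(first, last);
  }
  return parts;
}
```

In this model, part p of the wide mask equals the VF-wide mask computed at base + p * VF, which is why a single wide mask can stand in for UF separate ones.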