llvm-project

Author	SHA1	Message	Date
Elvis Wang	3e898bc40f	[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387 ) This patch fixes the assertion triggered when `isUniform` (from the legacy model) and `isSingleScalar` (from the VPlan-based model) mismatch. The simplified test that causes the assertion: ``` loop: loadA = load %a => %a is loop invariant. loadB = load %loadA ... ``` The legacy cost model cannot determine that the address of `%loadB` is uniform, but in the VPlan-based cost model the addresses of both `%loadA` and `%loadB` are single scalars. Full test that caused the crash: https://llvm.godbolt.org/z/zEG8YKjqh. --------- Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-11 07:49:54 +08:00
Graham Hunter	3c810b76b9	[LV] Add initial legality checks for early exit loops with side effects (#145663 ) This adds initial support to LoopVectorizationLegality to analyze loops with side effects (particularly stores to memory) and an uncountable exit. This patch alone doesn't enable any new transformations, but does give clearer reasons for rejecting vectorization for such a loop. The intent is for a loop like the following to pass the specific checks, and only be rejected at the end until the transformation code is committed: ``` /* Assume a is marked restrict */ /* Assume b is known to be large enough to access up to b[N-1] */ for (int i = 0; i < N; ++i) { a[i]++; if (b[i] > threshold) break; } ```	2025-09-10 13:54:52 +01:00
Ramkumar Ramachandra	5544afd253	[LoopUtils] Simplify expanded RT-checks (#157518 ) Follow up on 528b13d ([SCEVExp] Add helper to clean up dead instructions after expansion.) to hoist the SCEVExpander::eraseDeadInstructions call from LoopVectorize into the LoopUtils APIs add[Diff]RuntimeChecks, so that other callers (LoopDistribute and LoopVersioning) can benefit from the patch.	2025-09-09 11:38:54 +00:00
Florian Hahn	9b1b93766d	Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9. Recommit with a fix for MSan failure ( https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a set to track deleted values. Using the InsertedInstructions set is not sufficient, as it uses asserting value handles as keys, which may dereference the value at construction. Original message: Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-09 09:47:41 +01:00
Florian Hahn	eeb43806eb	Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 )" This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f. Triggers MSan errors in some configurations, e.g. https://lab.llvm.org/buildbot/#/builders/169/builds/14799	2025-09-08 14:52:28 +01:00
Florian Hahn	528b13df57	[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308 ) Add new helper to erase dead instructions inserted during SCEV expansion but not being used due to InstSimplifyFolder simplifications. Together with https://github.com/llvm/llvm-project/pull/157307 this also allows removing some specialized folds, e.g. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205 PR: https://github.com/llvm/llvm-project/pull/157308	2025-09-08 10:53:20 +01:00
Florian Hahn	ee29611427	[LV] Remove ILV::LoopVectorPreHeader (NFC). Remove LoopVectorPreheader member variable from ILV as it is only used by epilogue skeleton creation.	2025-09-07 13:48:00 +01:00
Florian Hahn	724a63ba8b	[LV] Use more accurate getSCEV/MemChecks in GeneratedRTCheck::hasChecks. Update hasChecks to use getSCEV/MemRuntimeChecks(), which automatically handles checking for known-false checks. This improves a few cases where we previously did not add metadata to disable runtime unrolling, due to runtime checks, even though no runtime checks are needed.	2025-09-06 19:21:11 +01:00
Florian Hahn	e0f00bd645	[LV] Don't consider second op as invariant in getDivRemSpeculationCost. The second operand when using a safe divisor will always be a select in the loop, so won't be invariant; don't treat it as such. This fixes a divergence with legacy and VPlan based cost model. Fixes https://github.com/llvm/llvm-project/issues/156066.	2025-09-06 14:06:04 +01:00
Luke Lau	4e5e65e55d	[VPlan] Only compute reg pressure if considered. NFCI (#156923 ) In #149056 VF pruning was changed so that it only pruned VFs that stemmed from MaxBandwidth being enabled. However we always compute register pressure regardless of whether or not max bandwidth is permitted for any VFs (via `MaxPermissibleVFWithoutMaxBW`). This skips the computation if not needed and renames the method for clarity. The diff in reg-usage.ll is due to the scalable VPlan not actually having any maxbandwidth VFs, so I've changed it to check the fixed-length VF instead, which is affected by maxbandwidth.	2025-09-05 00:23:47 +00:00
Shih-Po Hung	9876b06bc7	[LV] Add initial legality checks for loops with unbound loads. (#152422 ) This patch splits out the legality checks from PR #151300, following the landing of PR #128593. It is a step toward supporting vectorization of early-exit loops that contain potentially faulting loads. In this commit, an early-exit loop is considered legal for vectorization if it satisfies the following criteria: 1. it is a read-only loop. 2. all potentially faulting loads are unit-stride, which is the only type currently supported by vp.load.ff.	2025-09-05 08:20:16 +08:00
Florian Hahn	8796dfdcba	[VPlan] Consolidate logic to update loop metadata and profile info. This patch consolidates updating loop metadata and profile info for both the remainder and vector loops in a single place. This is NFC, modulo consistently applying vectorization specific metadata also in the experimental VPlan-native path. Split off from https://github.com/llvm/llvm-project/pull/154510.	2025-09-04 21:50:40 +01:00
Hassnaa Hamdi	35b22764e2	[LV][AArch64] Prefer epilogue with fixed-width over scalable VF. (#155546 ) In case of equal costs, prefer an epilogue with a fixed-width VF over one with a scalable VF. This helps in cases like post-LTO vectorization, where an epilogue with a fixed-width VF can be removed once we eventually know that the trip count is less than the epilogue iterations.	2025-09-04 19:31:30 +01:00
Ramkumar Ramachandra	d8fd511480	[VPlan] Introduce CSE pass (#151872 ) Introduce a simple common-subexpression-elimination pass at the VPlan-level, running late during the execution of the VPlan. The long-term vision is to get rid of the legacy non-VPlan-based cse routine in LV, but this patch doesn't yet fully subsume it.	2025-09-02 12:23:29 +01:00
David Sherwood	e867b85118	[LV] Always emit branch weights for vector epilogue (#155437 ) We currently only emit the branch weights for the epilogue iteration count check if there was already branch weight data for the scalar loop. However, the code makes no use of the existing branch weight when estimating the likelihood of taking a particular branch and so we can just always add the branch weights regardless. These hints should hopefully improve code generation.	2025-09-02 10:15:21 +01:00
Elvis Wang	7997a79be6	[LV] Align legacy cost model to vplan-based model for gather/scatter w/ uniform addr. (#155739 ) This patch checks whether the addr is uniform in the legacy cost model, to align it with the VPlan-based cost model after #150371. This patch fixes an llvm-test-suite assertion (https://lab.llvm.org/buildbot/#/builders/210/builds/1935) caused by the cost models being misaligned after #149955 under RISCV. I've tested this patch (on top of #149955) on the llvm-test-suite locally with the options that crashed, `rva23u64` and `rva23u64_zvl1024b`, and it builds successfully. Since this fix will change LV, I think it would be better to create a PR to fix this.	2025-09-02 09:11:45 +08:00
Florian Hahn	bf4486eb29	[LV] Move fixing reduction resumes for epilogue out of executePlan.(NFC) Move fixing up reduction resume values out of the general ::executePlan and perform it together with updating induction resume values. This also allows moving additional bypass block handling to EpilogueVectorizerEpilogueLoop.	2025-09-01 19:39:00 +01:00
Sam Tebbs	37127f74f4	[LV] Bundle sub reductions into VPExpressionRecipe (#147255 ) This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. -> https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-09-01 17:25:01 +01:00
Florian Hahn	507ff082c2	[VPlan] Move runtime check blocks to correct position during exec (NFC). Move adjusting the position of completely disconnected IR blocks to VPIRBasicBlock::execute.	2025-09-01 16:15:02 +01:00
Mel Chen	13357e8a12	[LV][EVL] Support interleaved access with tail folding by EVL (#152070 ) The InterleavedAccess pass already supports transforming vector-predicated (vp) load/store intrinsics. With this patch, we start enabling interleaved access under tail folding by EVL. This patch introduces a new base class, VPInterleaveBase, and a concrete class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and the new VPInterleaveEVLRecipe inherit from and implement VPInterleaveBase. Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL operand to emit vp.load/vp.store intrinsics. Currently, tail folding by EVL is only supported for scalable vectorization. Therefore, VPInterleaveEVLRecipe will only emit interleave/deinterleave intrinsics. Reverse accesses are not yet implemented, as masked reverse interleaved access under tail folding is not yet supported. Fixed #123201	2025-09-01 21:20:06 +08:00
Kerry McLaughlin	f0e9bba024	[LoopVectorize] Generate wide active lane masks (#147535 ) This patch adds a new flag (-enable-wide-lane-mask) which allows LoopVectorize to generate wider-than-VF active lane masks when it is safe to do so (i.e. the mask is used for data and control flow). The transform in extractFromWideActiveLaneMask creates vector extracts from the first active lane mask in the header & loop body, modifying the active lane mask phi operands to use the extracts. An additional operand is passed to the ActiveLaneMask instruction, the value of which is used as a multiplier of VF when generating the mask. By default this is 1, and is updated to UF by extractFromWideActiveLaneMask. The motivation for this change is to improve interleaved loops when SVE2.1 is available, where we can make use of the whilelo instruction which returns a predicate pair. This is based on a PR that was created by @momchil-velikov (#81140) and contains tests which were added there.	2025-09-01 13:53:30 +01:00
Ramkumar Ramachandra	4cf770275f	[VPlan] Introduce replaceSymbolicStrides (NFC) (#155842 ) Introduce VPlanTransforms::replaceSymbolicStrides factoring some code from LoopVectorize.	2025-09-01 09:03:46 +00:00
Florian Hahn	a53a5ed65d	[VPlan] Add VPBlockBase::hasPredecessors (NFC). Split off from https://github.com/llvm/llvm-project/pull/154510/, add helper to check if a block has any predecessors.	2025-09-01 09:44:49 +01:00
Ramkumar Ramachandra	c8d7a73cf1	[LV] Improve code around operands-iterator (NFC) (#156016 )	2025-09-01 09:17:55 +01:00
Florian Hahn	b1d2c627b1	[VPlan] Unconditionally run attachRuntimeChecks (NFCI). Instead of conditionally executing attachRuntimeChecks, directly check if the blocks to be added are still disconnected.	2025-08-31 17:28:27 +01:00
Florian Hahn	0aac22758a	[LV] Correctly cost chains of replicating calls in legacy CM. Check for scalarized calls in needsExtract to fix a divergence between legacy and VPlan-based cost model. The legacy cost model was missing a check for scalarized calls in needsExtract, which meant it incorrectly assumed the result of a scalarized call needs extracting. Exposed by https://github.com/llvm/llvm-project/pull/154617. Fixes https://github.com/llvm/llvm-project/issues/156091.	2025-08-31 15:13:47 +01:00
Florian Hahn	c6b340e560	[LV] Emit remarks for vectorization decision before execute (NFCI). Move the emission of remarks for the vectorization decision before executing the plan, in preparation for https://github.com/llvm/llvm-project/pull/154510.	2025-08-31 14:41:40 +01:00
Florian Hahn	f07dc6f119	[LV] Remove special handling for interleaving only. (NFC) Remove the special code for handling interleaving only, as it will be handled naturally by the generic code handling arbitrary IC & VF.	2025-08-29 19:40:43 +01:00
Ramkumar Ramachandra	cdbef270e6	[LV] Use DenseMap::keys to improve code (NFC) (#156014 )	2025-08-29 19:35:20 +01:00
Florian Hahn	47737cdeda	[LV] Move introduceCheckBlockInVPlan to EpilogueVectorizerMainLoop (NFC) Move it to the sub-class that is actually using it.	2025-08-28 19:18:02 +01:00
Florian Hahn	ffd0f5fd21	[LV] Remove unneeded ILV::LoopScalarPreHeader (NFC). Follow-up suggested in https://github.com/llvm/llvm-project/pull/153643. Remove some more global state by directly returning the scalar preheader from createScalarPreheader.	2025-08-26 21:44:48 +01:00
Florian Hahn	5088795994	[LV] Remove unused ILV::VectorTripCount (NFC). The field is no longer used, remove it.	2025-08-26 19:01:51 +01:00
Florian Hahn	5faed1ad84	[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643 ) This patch adds a new VPlan-based addMinimumIterationCheck, which replaces the ILV version for the non-epilogue case. The VPlan-based version constructs a SCEV expression to compute the minimum iterations and uses it to check whether the check is known true or false. Otherwise it creates a VPExpandSCEV recipe and emits a compare-and-branch. When using epilogue vectorization, we still need to create the minimum trip-count-check during the legacy skeleton creation. The patch moves the definitions out of ILV. PR: https://github.com/llvm/llvm-project/pull/153643	2025-08-26 15:52:31 +01:00
Kerry McLaughlin	884c03e71b	[LV] Return Invalid from getLegacyCost when instruction cost forced. (#154543 ) LoopVectorizationCostModel::expectedCost will only override the cost returned by getInstructionCost when valid. This patch ensures we do the same in VPCostContext::getLegacyCost, avoiding the "VPlan cost model and legacy cost model disagreed" assert in the included test.	2025-08-26 10:26:57 +01:00
David Sherwood	d606eae2ce	[LV] Stop using the legacy cost model for udiv + friends (#152707 ) In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either: 1. Scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or 2. We've inserted a select for the second operand to ensure we don't fault through divide-by-zero. For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which aren't accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions.	2025-08-26 10:17:23 +01:00
Florian Hahn	c950a72974	[VPlan] Support scalar VF for ExtractLane and FirstActiveLane. Extend ExtractLane and FirstActiveLane to support scalar VFs. This allows correct handling when interleaving with VF = 1. Alive2 proofs: - Fixed codegen with this patch: https://alive2.llvm.org/ce/z/8Y5_Vc (verifies as correct) - Original codegen: https://alive2.llvm.org/ce/z/twdg3X (doesn't verify) Fixes https://github.com/llvm/llvm-project/issues/154967.	2025-08-25 21:45:21 +01:00
Ramkumar Ramachandra	66be00d635	[VPlan] Introduce m_Cmp; match more compares (#154771 ) Extend [Specific]Cmp_match to handle floating-point compares, and introduce m_Cmp that matches both integer and floating-point compares. Use it in simplifyRecipe to match and simplify the general case of compares. The change has necessitated a bugfix in VPReplicateRecipe::execute.	2025-08-24 13:27:06 +01:00
Florian Hahn	954097dd61	[VPlan] Use SCEV to check subtract in getOptimizableIVOf. Simplify checks for IV subtractions in getOptimizableIVOf by using SCEV. This slightly generalizes the patterns we can handle.	2025-08-23 22:00:01 +01:00
Florian Hahn	9f87cd68a4	[VPlan] Add m_ExtractLastElement matcher. (NFC)	2025-08-23 21:21:03 +01:00
Florian Hahn	30c26dcc47	[VPlan] Create extracts for live-outs early (NFC). Create extracts for live-outs during skeleton construction.	2025-08-23 13:28:15 +01:00
Florian Hahn	300d2c6d20	[VPlan] Move SCEV expansion to VPlan transform. (NFCI). Move the logic to expand SCEVs directly to a late VPlan transform that expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an abstract recipe without execute, which clarifies how the recipe is handled, i.e. it is not executed like regular recipes. It also helps to simplify construction, as now scalar evolution isn't required to be passed to the recipe.	2025-08-21 22:03:26 +01:00
Ramkumar Ramachandra	a96b78cf41	[SCEVPatternMatch] Add signed cst match; use in LV (NFC) (#154568 ) Add a m_scev_SpecificSInt for matching a sign-extended value, and use it to improve some code in LoopVectorize.	2025-08-21 15:46:53 +00:00
Florian Hahn	4e6c88be7c	[TTI] Remove Args argument from getOperandsScalarizationOverhead (NFC). (#154126 ) Remove the ArrayRef<const Value *> Args operand from getOperandsScalarizationOverhead and require that the callers de-duplicate arguments and filter constant operands. Removing the Value based Args argument enables callers where no Value * operands are available to use the function in a follow-up: computing the scalarization cost directly for a VPlan recipe. It also allows more accurate cost-estimates in the future: for example, when vectorizing a loop, we could also skip operands that are live-ins, as those also do not require scalarization. PR: https://github.com/llvm/llvm-project/pull/154126	2025-08-20 21:09:08 +01:00
David Sherwood	e172110d12	[LV] Don't calculate scalar costs for scalable VFs in setVectorizedCallDecision (#152713 ) In setVectorizedCallDecision we attempt to calculate the scalar costs for vectorisation calls, even for scalable VFs where we already know the answer is Invalid. We can avoid doing unnecessary work by skipping this completely for scalable vectors.	2025-08-20 15:00:31 +01:00
Florian Hahn	dc23869f98	[LV] Handle vector trip count being zero in preparePlanForEpiVectorLoop. After a485e0e, we may not set the vector trip count in preparePlanForEpilogueVectorLoop if it is zero. We should not choose a VF * UF that makes the main vector loop dead (i.e. vector trip count is zero), but there are some cases where this can happen currently. In those cases, set EPI.VectorTripCount to zero.	2025-08-20 11:54:22 +01:00
David Sherwood	13d8ba7dea	[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086 ) There are a couple of places in the loop vectoriser where we want to calculate the cost of extracting the last lane in a vector. However, we wrongly assume that asking for the cost of extracting lane (VF.getKnownMinValue() - 1) is an accurate representation of the cost of extracting the last lane. For SVE at least, this is non-trivial as it requires the use of whilelo and lastb instructions. To solve this problem I have added a new getReverseVectorInstrCost interface where the index is used in reverse from the end of the vector. Suppose a vector has a given ElementCount EC, the extracted/inserted lane would be EC - 1 - Index. For scalable vectors this index is unknown at compile time. I've added an AArch64 hook that better represents the cost, and also a RISCV hook that maintains compatibility with the behaviour prior to this PR. I've also taken the liberty of adding support in vplan for calculating the cost of VPInstruction::ExtractLastElement.	2025-08-19 09:31:37 +01:00
Mel Chen	1dac302ce7	[LV] Explicitly disallow interleaved access requiring gap mask for scalable VFs. nfc (#154122 ) Currently, VPInterleaveRecipe::execute does not support generating LLVM IR for interleaved accesses that require a gap mask for scalable VFs. It would be better to detect and prevent such groups from being vectorized as interleaved accesses in LoopVectorizationCostModel::interleavedAccessCanBeWidened, rather than relying on the TTI function getInterleavedMemoryOpCost to return an invalid cost.	2025-08-19 08:42:39 +08:00
Florian Hahn	7e9989390d	[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487 ) Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to serve their users requiring a vector, instead of doing so when unrolling by VF. Now we only need to implicitly build vectors in VPTransformState::get for VPInstructions. Once they are also unrolled by VF we can remove the code-path altogether. PR: https://github.com/llvm/llvm-project/pull/151487	2025-08-18 20:49:42 +01:00
Kazu Hirata	07eb7b7692	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068 ) This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.	2025-08-18 07:01:29 -07:00
Florian Hahn	351d398a37	[VPlan] Run final VPlan simplifications before codegen. Dissolving the hierarchical VPlan CFG and converting abstract to concrete recipes can expose additional simplification opportunities. Do a final run of simplifyRecipes before executing the VPlan.	2025-08-16 18:54:27 +01:00
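
Note on 13d8ba7dea ([LV][TTI] Calculate cost of extracting last index in a scalable vector): the reverse indexing described in that message, where the accessed lane is EC - 1 - Index, can be illustrated with a small standalone sketch. This is not LLVM code. `reverseIndexToLane` and `scalableElementCount` are hypothetical helpers introduced only for illustration, and modelling a scalable element count as the known minimum multiplied by a runtime `vscale` is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): maps an index counted from the end
// of a vector to the concrete lane, following EC - 1 - Index.
std::uint64_t reverseIndexToLane(std::uint64_t numElements,
                                 std::uint64_t reverseIndex) {
  assert(reverseIndex < numElements && "reverse index out of range");
  return numElements - 1 - reverseIndex;
}

// Hypothetical helper: element count of a scalable vector, modelled as the
// known minimum element count multiplied by a vscale known only at runtime.
std::uint64_t scalableElementCount(std::uint64_t knownMinElements,
                                   std::uint64_t vscale) {
  return knownMinElements * vscale;
}
```

With a known minimum of 4 elements and a runtime vscale of 2, reverse index 0 maps to lane 7, whereas a lane computed as the known minimum minus one would be 3. That gap is why the last lane of a scalable vector cannot be named by a compile-time index.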


2707 Commits