llvm-project

Author	SHA1	Message	Date
Florian Hahn	7509cad693	[VPlan] Support masked VPInsts, use for predication (NFC) (#142285 ) Add support for mask operands to most VPInstructions, using getNumOperandsForOpcode. This allows VPlan predication to predicate VPInstructions directly. The mask will then be dropped or handled when creating wide recipes. Depends on https://github.com/llvm/llvm-project/pull/142284. Depends on https://github.com/llvm/llvm-project/pull/168784. PR: https://github.com/llvm/llvm-project/pull/142285	2026-02-08 18:23:36 +00:00
Florian Hahn	b0d95f0c7b	[VPlan] Handle Mul/UDiv in getSCEVExprForVPValue (NFCI). Support Mul/UDiv and AND-variant (https://alive2.llvm.org/ce/z/rBJVdg) in getSCEVExprForVPValue. This is used in code paths when computing SCEV expressions in the VPlan-based cost model, which should produce costs matching the legacy cost model.	2026-02-01 21:41:30 +00:00
Florian Hahn	90b3712d8a	Reapply "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)" This reverts commit d1e477b00b49c63ff4dd513eeb14a5b18bc055d7. Recommit with extra checks making sure extends are VPWidenCastRecipes, rejecting VPReplicateRecipes. Original message: As a first step, move the existing partial reduction detection logic to VPlan, trying to preserve the existing code structure & behavior as closely as possible. With this, partial reductions are detected and created together in a single step. This allows forming partial reductions and bundling them up if profitable together in a follow-up. PR: https://github.com/llvm/llvm-project/pull/167851	2026-02-01 16:27:27 +00:00
Martin Storsjö	d1e477b00b	Revert "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851 )" This reverts commit f4e8cc1a2229dca76d21c8d37439c4c194b06b86. This change wasn't NFC; it causes failed asserts when building ffmpeg for i686 windows, see https://github.com/llvm/llvm-project/pull/167851 for details.	2026-02-01 14:35:02 +02:00
Florian Hahn	f4e8cc1a22	[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851 ) As a first step, move the existing partial reduction detection logic to VPlan, trying to preserve the existing code structure & behavior as closely as possible. With this, partial reductions are detected and created together in a single step. This allows forming partial reductions and bundling them up if profitable together in a follow-up. PR: https://github.com/llvm/llvm-project/pull/167851	2026-01-31 19:44:46 +00:00
Jakub Kuderski	55fbb71db1	[llvm] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178502) Pre-committing this before landing the new check in https://github.com/llvm/llvm-project/pull/177892	2026-01-28 15:44:04 -05:00
Florian Hahn	14a209f852	[VPlan] Replace ComputeFindIVRes with ComputeRdxRes + cmp + sel (NFC) (#176672) Replace ComputeFindIVResult with ComputeReductionResult + explicit compare + select, to model finding the first/last induction more explicitly and simply; this boils down to a min/max reduction + compare and select of the sentinel value. PR: https://github.com/llvm/llvm-project/pull/176672	2026-01-22 19:28:47 +00:00
Florian Hahn	3beb520ce1	[VPlan] Support VPWidenPointerInduction in getSCEVExprForVPValue (NFCI) Support VPWidenPointerInductionRecipe in getSCEVExprForVPValue. This is used in code paths when computing SCEV expressions in the VPlan-based cost model, which should produce costs matching the legacy cost model.	2026-01-21 22:40:04 +00:00
Florian Hahn	83b13e6de9	[VPLan] Update formatting in getSCEVExprForVPValue (NFC). Reformat TypeSwitch in getSCEVExprForVPValue, to reduce diff in follow-up changes.	2026-01-21 22:24:06 +00:00
Florian Hahn	6cc18a8e43	[VPlan] Support more GEP-like recipes in getSCEVExprForVPValue (NFCI) Support VPWidenGEPRecipe, VPInstructions and VPReplicateRecipe with GEP-like opcodes in getSCEVExprForVPValue via a new matcher binding source element type and operands. This is used in code paths when computing SCEV expressions in the VPlan-based cost model, which should produce costs matching the legacy cost model.	2026-01-18 22:20:25 +00:00
Florian Hahn	d0c87356d1	[VPlan] Handle constant step for VPScalarIVSteps in getSCEVExpr (NFC). Update getSCEVExprForVPValue to handle VPScalarIVSteps with any constant step. getSCEVExprForVPValue computes the SCEV for lane 0, so we can simply return the IV operand, truncated/extended as needed. This should be NFC and is tested via the VPlan-based cost-model, which should compute costs matching the legacy cost model.	2026-01-14 22:29:50 +00:00
Florian Hahn	2f7e218017	[VPlan] Add missing sext(sub) SCEV fold to getSCEVExprForVPValue. SCEV has a manual fold when doing SCEV construction from IR, that is not integrated in the regular SCEV construction functions. Mirror the behavior in getSCEVExprForVPValue, to match results when constructing SCEVs from IR. Fixes https://github.com/llvm/llvm-project/issues/174622.	2026-01-11 20:51:13 +00:00
Florian Hahn	31b93d6e38	[VPlan] Add specialized VPValue subclasses for different types (NFC) (#172758 ) This patch adds VPValue sub-classes for the different cases we currently have: * VPIRValue: A live-in VPValue that wraps an underlying IR value * VPSymbolicValue: A symbolic VPValue not tied to an underlying value, e.g. the vector trip count or VF VPValues * VPRecipeValue: A VPValue defined by a VPDef/VPRecipeBase. This has multiple benefits: * clearer constructors for each kind of VPValue * limited scope: for example allows moving VPDef member to VPRecipeValue, reducing size of other VPValues. * stricter type checking for member variables (e.g. using VPLiveIn in the Value -> live-in map in VPlan, or using VPSymbolicValue for symbolic member VPValues) There probably are additional opportunities for cleanups as follow-ups. PR: https://github.com/llvm/llvm-project/pull/172758	2026-01-07 20:29:05 +00:00
Florian Hahn	16830b2164	[VPlan] Remove VPWidenSelectRecipe, use VPWidenRecipe instead (NFCI). (#174234 ) All extra state has been removed from VPWidenSelectRecipe at this point. There's no benefit of having a separate recipe and Select can easily be handled by the existing VPWidenRecipe. PR: https://github.com/llvm/llvm-project/pull/174234	2026-01-05 22:33:37 +00:00
Florian Hahn	3f5ee8aa76	[VPlan] Handle VPInstruction::Not in getSCEVExprForVPValue (NFC). https://alive2.llvm.org/ce/z/jpLaJX	2026-01-03 22:32:52 +00:00
Florian Hahn	524b1788c4	[VPlan] Add BranchOnTwoConds, use for early exit plans. (#172750) This PR introduces a new BranchOnTwoConds VPInstruction, that takes 2 boolean operands and must be placed in a block with 3 successors. If condition I is true, branches to successor I, otherwise falls through to check the next condition. If both conditions are false, branch to the third successor. This new branch recipe is used for early-exit loops, to simplify the representation in VPlan initially, by avoiding the need to split the middle block early on, in a way that preserves the single-exit block property of regions. All exits still go through the latch block, but they can go to more than 2 successors. This idea was part of one of the original proposals for how to model early exits in VPlan, but at that point in time, there was no good way to handle this during code-gen, and we went with the early split-middle block approach initially. Now that we dissolve regions before ::execute, the new recipe can be lowered nicely after regions have been removed, to a set of VPBBs and BranchOnCond recipes. The initial lowering preserves the original structure with the split middle blocks. Follow-ups will improve the lowering to avoid this splitting, providing performance gains. PR: https://github.com/llvm/llvm-project/pull/172750	2025-12-29 19:39:38 +00:00
Florian Hahn	7de080482c	[VPlan] Handle min/max intrinsics in getSCEVExprForVPValue (NFCI) Use m_Intrinsic to handle min/max intrinsics in getSCEVExprForVPValue. This also extends Argument_match and IntrinsicID_match to VPInstruction for completeness, and unifies the handling to avoid looking up functions from the underlying IR instruction. Tested via the VPlan-based cost-model, but same costs should be computed. As part of the extension, fix a bug in Argument_match that had an incorrect offset for the operands of VPReplicateRecipe; the function is the last argument.	2025-12-28 22:28:16 +00:00
Florian Hahn	60e5b86052	[VPlan] Support extends and truncs in getSCEVExprForVPValue. (NFCI) Handle extends and truncates in getSCEVExprForVPValue. This enables computing SCEVs in more cases in the VPlan-based cost-model, but should compute the matching costs in all cases.	2025-12-26 21:38:14 +00:00
Florian Hahn	15bf7079b0	[VPlan] Support truncated IVs in getSCEVExprForVPValue. (NFCI) Handle truncated inductions in getSCEVExprForVPValue. This means we are able to compute SCEV expressions for more inductions used in the VPlan-based cost model, which should produce costs matching the legacy cost model.	2025-12-25 22:03:29 +00:00
Florian Hahn	ee1bac863a	[VPlan] Support binary add/sub in getSCEVExprForVPValue. (NFCI) Handle binary add/sub in getSCEVExprForVPValue. This means we are able to compute more replicate recipe costs in the VPlan cost model. It should produce the same costs.	2025-12-24 23:00:16 +00:00
Florian Hahn	c43ccefc9f	[VPlan] Use PSE to construct SCEVs in getSCEVExprForVPValue (NFCI). getSCEVExprForVPValue is used to create SCEVs for expressions from the original loop, which may be predicated. Use PSE to construct predicated SCEVs if possible. This matches the legacy LV code behavior. Currently should be NFC, but will enable migrating more SCEV/cost-based computations to VPlan. The patch requires exposing a new getPredicatedSCEV helper to PredicatedScalarEvolution which just takes a SCEV, to avoid needing to go through IR values, which isn't an option for getSCEVExprForVPValue.	2025-12-21 22:39:49 +00:00
Florian Hahn	1f78f6a2d6	[LV] Check Addr in getAddressAccessSCEV in terms of SCEV expressions. (#171204) getAddressAccessSCEV previously had some restrictive checks that limited pointer SCEV expressions passed to TTI to GEPs with operands that must either be invariant or marked as inductions. As a consequence, the check rejected things like `GEP %base, (%iv + 1)`, while the SCEV for the GEP should be as easily analyzable as for `GEP %base, %v`, with the only difference being the start of the AddRec adjusted by 1. This patch changes the code to use a SCEV-based check, limiting the address SCEV to be loop invariant, an affine AddRec (i.e. induction), or an add expression of such operands or a sign-extended AddRec. This catches all existing cases getAddressAccessSCEV caught, plus additional ones like the cases mentioned above. This means we pass address SCEVs in more cases, giving the backends a better chance to make informed decisions. It also unifies the decision when to use an address SCEV between the legacy and VPlan-based cost model. The gather-cost.ll tests illustrate the impact. Previously they were considered not profitable to vectorize because we failed to determine that %gep.src_data = getelementptr inbounds [1536 x float], ptr @src_data, i64 0, i64 %mul has a relatively small constant stride. There may be some rough edges in the cost models, where not passing pointer SCEVs hid some incorrect modeling, but those issues should be fixed in the target cost models if they surface. PR: https://github.com/llvm/llvm-project/pull/171204	2025-12-19 22:05:27 +00:00
Florian Hahn	53cf22f3a1	[VPlan] Simplify live-ins early using SCEV. (#155304 ) Use SCEV to simplify all live-ins during VPlan0 construction. This enables us to remove special SCEV queries when constructing VPWidenRecipes and improves results in some cases. This leads to simplifications in a number of cases in real-world applications (~250 files changed across LLVM, SPEC, ffmpeg) PR: https://github.com/llvm/llvm-project/pull/155304	2025-12-14 20:15:05 +00:00
Florian Hahn	c465a56e9d	[VPlan] Handle canonical IVs in ::isSingleScalar. (NFCI) The canonical IV is always a single scalar. They are already treated as uniform-across-UF-and-VF. This should currently be NFC.	2025-11-30 21:51:03 +00:00
Sam Tebbs	071d1fb8be	[LV] Use VPReductionRecipe for partial reductions (#147513 ) Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two together and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. -> https://github.com/llvm/llvm-project/pull/147513 Replaces https://github.com/llvm/llvm-project/pull/146073 .	2025-11-26 16:18:22 +00:00
Florian Hahn	a51e2ef0fe	[VPlan] Treat VPVector(End)PointerRecipe as single-scalar, if ops are. (#169249 ) VPVector(End)PointerRecipes are single-scalar if all their operands are. This should be effectively NFC currently, but it should re-enable cost checking for some more VPWidenMemoryRecipe after https://github.com/llvm/llvm-project/pull/157387 as discovered by John Brawn.	2025-11-25 14:46:30 +00:00
Florian Hahn	a2231af5dd	[VPlan] Share PreservesUniformity logic between isSingleScalar and isUniformAcrossVFsAndUFs Extract the PreservesUniformity logic from isSingleScalar into a shared static helper function. Update isUniformAcrossVFsAndUFs to use this logic for VPWidenRecipe and VPInstruction, so that any opcode that preserves uniformity is considered uniform-across-vf-and-uf if its operands are. This unifies the uniformity checking logic and makes it easier to extend in the future. This should effectively be NFC currently.	2025-11-22 22:11:01 +00:00
Ramkumar Ramachandra	b98f6a54f6	[VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028 ) This allows us to strip an unnecessary TypeSwitch.	2025-11-21 14:23:17 +00:00
Florian Hahn	7c34848ae1	[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247) This patch implements a transform to hoist single-scalar replicated loads with invariant addresses out of the vector loop to the preheader when scoped noalias metadata proves they cannot alias with any stores in the loop. This enables hoisting of loads we can prove do not alias any stores in the loop due to memory runtime checks added during vectorization. PR: https://github.com/llvm/llvm-project/pull/166247	2025-11-18 09:35:48 +00:00
Luke Lau	4d4a60cde0	[VPlan] Fix LastActiveLane assertion on scalar VF (#167897 ) For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask: <x1> vector loop: { vector.body: EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next> vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0> EMIT vp<%6> = icmp ule vp<%5>, vp<%3> EMIT vp<%index.next> = add nuw vp<%4>, vp<%1> EMIT branch-on-count vp<%index.next>, vp<%2> No successors } Successor(s): middle.block middle.block: EMIT vp<%8> = last-active-lane vp<%6> EMIT vp<%9> = extract-lane vp<%8>, vp<%5> Successor(s): ir-bb<exit> The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe. Fixes #167813	2025-11-17 11:03:38 +00:00
Florian Hahn	820daa5c1e	[VPlan] Support VPWidenIntOrFpInduction in getSCEVExprForVPValue. (NFCI) Construct SCEVs for VPWidenIntOrFpInductionRecipe analogous to VPCanonicalInductionPHIRecipe: create an AddRec with start + step from the recipe. Currently the only impact should be computing more costs of replicating stores directly in VPlan.	2025-11-15 13:35:11 +00:00
Ramkumar Ramachandra	eab44600fb	[VPlan] Rename onlyFirst(Lane\|Part)Used (NFC) (#166562 ) Rename onlyFirst(Lane\|Part)Used to usesFirst(Lane\|Part)Only, in line with usesScalars, for clarity.	2025-11-06 10:07:58 +00:00
Ramkumar Ramachandra	912cc5f098	[VPlan] Improve getOrCreateVPValueForSCEVExpr (NFC) (#165699 ) Use early exit in getOrCreateVPValueForSCEVExpr.	2025-11-03 06:44:30 +00:00
Florian Hahn	683b00bb50	[VPlan] Limit VPScalarIVSteps to step == 1 in getSCEVExprForVPValue. For now, just support VPScalarIVSteps with step == 1 in getSCEVExprForVPValue. This fixes a crash when the step would be != 1.	2025-10-31 02:22:56 +00:00
Florian Hahn	b2d12d6f2b	[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276) Update getSCEVExprForVPValue to handle more complex expressions, to use it in VPReplicateRecipe::computeCost. In particular, it supports constructing SCEV expressions for GetElementPtr VPReplicateRecipes, with operands that are VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we hit a sub-expression we don't support yet, we return SCEVCouldNotCompute. Note that the SCEV expression is valid for VF = 1: we only support constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec starting at 0 and stepping by 1. The returned SCEV expressions could be converted to a VF-specific one, by rewriting the AddRecs to ones with the appropriate step. Note that the logic for constructing SCEVs for GetElementPtr was directly ported from ScalarEvolution.cpp. Another thing to note is that we construct SCEV expressions purely by looking at the operation of the recipe and its translated operands, w/o accessing the underlying IR (the exception being getting the source element type for GEPs). PR: https://github.com/llvm/llvm-project/pull/161276	2025-10-30 15:46:19 -07:00
Florian Hahn	d020b2da54	[VPlan] Move isSingleScalar implementation to VPlanUtils.cpp (NFC) Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to enable code sharing.	2025-10-25 21:56:03 +01:00
Florian Hahn	8c29bce1e9	[VPlan] Remove SCEVToExpansion mapping (NFC). (#164490 ) VPlan::SCEVToExpansion isn't needed any longer, as SCEV expansion de-duplication is handled locally in expandSCEVs. PR: https://github.com/llvm/llvm-project/pull/164490	2025-10-24 21:38:23 +00:00
Sam Tebbs	6b19a546aa	[LV] Bundle partial reductions inside VPExpressionRecipe (#147302 ) This PR bundles partial reductions inside the VPExpressionRecipe class. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. -> https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. https://github.com/llvm/llvm-project/pull/147513	2025-10-23 11:18:55 +00:00
Ramkumar Ramachandra	2ec01e430a	[VPlan] Move two VPBlockUtils members (NFC) (#162507 )	2025-10-21 16:40:13 +01:00
Ramkumar Ramachandra	9bfaf12c07	[VPlan] Handle more replicates in isUniformAcrossVFsAndUFs (#162342 ) A single-scalar replicate without side-effects, and with uniform operands, is uniform. Special-case assumes and stores.	2025-10-20 10:26:23 +00:00
Florian Hahn	86b89a6dcc	[VPlan] Mark VPlan argument in isHeaderMask as const (NFC). isHeaderMask should not modify the VPlan; mark as const to allow easy re-use in the VPlanVerifier.	2025-10-15 19:46:28 +01:00
Florian Hahn	861519327a	[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020 ) The canonical IV is tied to region blocks; move getCanonicalIV there and update all users. PR: https://github.com/llvm/llvm-project/pull/163020	2025-10-15 12:48:35 +01:00
Ramkumar Ramachandra	107940f3be	[VPlan] Improve binary matchers in two places (NFC) (#162268 )	2025-10-07 14:56:43 +01:00
Florian Hahn	2284ce0596	[VPlan] Move using VPlanPatternMatch to top in VPlanUtils.cpp (NFC). Only VPlan pattern matching is used in the file, move the using statement to the top level.	2025-09-28 10:29:44 +01:00
Florian Hahn	a7b4dd42bd	[LV] Don't create partial reductions if factor doesn't match accumulator (#158603) Check if the scale-factor of the accumulator is the same as the requested ScaleFactor in tryToCreatePartialReductions. This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions, e.g. because we do not form a partial reduction for the loop exit instruction. Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor. PR: https://github.com/llvm/llvm-project/pull/158603	2025-09-24 12:21:03 +01:00
Graham Hunter	6b99a7bbed	[LV] Provide utility routine to find uncounted exit recipes (#152530 ) Splitting out just the recipe finding code from #148626 into a utility function (along with the extra pattern matchers). Hopefully this makes reviewing a bit easier. Added a gtest, since this isn't actually used anywhere yet.	2025-09-18 15:45:23 +00:00
Ramkumar Ramachandra	f68f3b9a7e	[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550 )	2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra	148a83543b	[LV] Introduce m_One and improve (0\|1)-match (NFC) (#157419 )	2025-09-15 10:34:06 +00:00
Kerry McLaughlin	f0e9bba024	[LoopVectorize] Generate wide active lane masks (#147535 ) This patch adds a new flag (-enable-wide-lane-mask) which allows LoopVectorize to generate wider-than-VF active lane masks when it is safe to do so (i.e. the mask is used for data and control flow). The transform in extractFromWideActiveLaneMask creates vector extracts from the first active lane mask in the header & loop body, modifying the active lane mask phi operands to use the extracts. An additional operand is passed to the ActiveLaneMask instruction, the value of which is used as a multiplier of VF when generating the mask. By default this is 1, and is updated to UF by extractFromWideActiveLaneMask. The motivation for this change is to improve interleaved loops when SVE2.1 is available, where we can make use of the whilelo instruction which returns a predicate pair. This is based on a PR that was created by @momchil-velikov (#81140) and contains tests which were added there.	2025-09-01 13:53:30 +01:00
Florian Hahn	300d2c6d20	[VPlan] Move SCEV expansion to VPlan transform. (NFCI). Move the logic to expand SCEVs directly to a late VPlan transform that expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an abstract recipe without execute, which clarifies how the recipe is handled, i.e. it is not executed like regular recipes. It also helps to simplify construction, as now scalar evolution isn't required to be passed to the recipe.	2025-08-21 22:03:26 +01:00

62 Commits