llvm-project

Author	SHA1	Message	Date
Florian Hahn	7c34848ae1	[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247 ) This patch implements a transform to hoist single-scalar replicated loads with invariant addresses out of the vector loop to the preheader when scoped noalias metadata proves they cannot alias with any stores in the loop. This enables hoisting of loads we can prove do not alias any stores in the loop due to memory runtime checks added during vectorization. PR: https://github.com/llvm/llvm-project/pull/166247	2025-11-18 09:35:48 +00:00
Luke Lau	4d4a60cde0	[VPlan] Fix LastActiveLane assertion on scalar VF (#167897 ) For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask: <x1> vector loop: { vector.body: EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next> vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0> EMIT vp<%6> = icmp ule vp<%5>, vp<%3> EMIT vp<%index.next> = add nuw vp<%4>, vp<%1> EMIT branch-on-count vp<%index.next>, vp<%2> No successors } Successor(s): middle.block middle.block: EMIT vp<%8> = last-active-lane vp<%6> EMIT vp<%9> = extract-lane vp<%8>, vp<%5> Successor(s): ir-bb<exit> The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe. Fixes #167813	2025-11-17 11:03:38 +00:00
Florian Hahn	820daa5c1e	[VPlan] Support VPWidenIntOrFpInduction in getSCEVExprForVPValue. (NFCI) Construct SCEVs for VPWidenIntOrFpInductionRecipe analogous to VPCanonicalInductionPHIRecipe: create an AddRec with start + step from the recipe. Currently the only impact should be computing more costs of replicating stores directly in VPlan.	2025-11-15 13:35:11 +00:00
Ramkumar Ramachandra	eab44600fb	[VPlan] Rename onlyFirst(Lane\|Part)Used (NFC) (#166562 ) Rename onlyFirst(Lane\|Part)Used to usesFirst(Lane\|Part)Only, in line with usesScalars, for clarity.	2025-11-06 10:07:58 +00:00
Ramkumar Ramachandra	912cc5f098	[VPlan] Improve getOrCreateVPValueForSCEVExpr (NFC) (#165699 ) Use early exit in getOrCreateVPValueForSCEVExpr.	2025-11-03 06:44:30 +00:00
Florian Hahn	683b00bb50	[VPlan] Limit VPScalarIVSteps to step == 1 in getSCEVExprForVPValue. For now, just support VPScalarIVSteps with step == 1 in getSCEVExprForVPValue. This fixes a crash when the step would be != 1.	2025-10-31 02:22:56 +00:00
Florian Hahn	b2d12d6f2b	[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276 ) Update getSCEVExprForVPValue to handle more complex expressions, to use it in VPReplicateRecipe::computeCost. In particular, it supports constructing SCEV expressions for GetElementPtr VPReplicateRecipes, with operands that are VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we hit a sub-expression we don't support yet, we return SCEVCouldNotCompute. Note that the SCEV expression is valid for VF = 1: we only support constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec starting at 0 and stepping by 1. The returned SCEV expressions could be converted to a VF specific one, by rewriting the AddRecs to ones with the appropriate step. Note that the logic for constructing SCEVs for GetElementPtr was directly ported from ScalarEvolution.cpp. Another thing to note is that we construct SCEV expressions purely by looking at the operation of the recipe and its translated operands, w/o accessing the underlying IR (the exception being getting the source element type for GEPs). PR: https://github.com/llvm/llvm-project/pull/161276	2025-10-30 15:46:19 -07:00
Florian Hahn	d020b2da54	[VPlan] Move isSingleScalar implementation to VPlanUtils.cpp (NFC) Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to enable code sharing.	2025-10-25 21:56:03 +01:00
Florian Hahn	8c29bce1e9	[VPlan] Remove SCEVToExpansion mapping (NFC). (#164490 ) VPlan::SCEVToExpansion isn't needed any longer, as SCEV expansion de-duplication is handled locally in expandSCEVs. PR: https://github.com/llvm/llvm-project/pull/164490	2025-10-24 21:38:23 +00:00
Sam Tebbs	6b19a546aa	[LV] Bundle partial reductions inside VPExpressionRecipe (#147302 ) This PR bundles partial reductions inside the VPExpressionRecipe class. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/156976 4. https://github.com/llvm/llvm-project/pull/160154 5. -> https://github.com/llvm/llvm-project/pull/147302 6. https://github.com/llvm/llvm-project/pull/162503 7. https://github.com/llvm/llvm-project/pull/147513	2025-10-23 11:18:55 +00:00
Ramkumar Ramachandra	2ec01e430a	[VPlan] Move two VPBlockUtils members (NFC) (#162507 )	2025-10-21 16:40:13 +01:00
Ramkumar Ramachandra	9bfaf12c07	[VPlan] Handle more replicates in isUniformAcrossVFsAndUFs (#162342 ) A single-scalar replicate without side-effects, and with uniform operands, is uniform. Special-case assumes and stores.	2025-10-20 10:26:23 +00:00
Florian Hahn	86b89a6dcc	[VPlan] Mark VPlan argument in isHeaderMask as const (NFC). isHeaderMask should not modify the VPlan; mark as const to allow easy re-use in the VPlanVerifier.	2025-10-15 19:46:28 +01:00
Florian Hahn	861519327a	[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020 ) The canonical IV is tied to region blocks; move getCanonicalIV there and update all users. PR: https://github.com/llvm/llvm-project/pull/163020	2025-10-15 12:48:35 +01:00
Ramkumar Ramachandra	107940f3be	[VPlan] Improve binary matchers in two places (NFC) (#162268 )	2025-10-07 14:56:43 +01:00
Florian Hahn	2284ce0596	[VPlan] Move using VPlanPatternMatch to top in VPlanUtils.cpp (NFC). Only VPlan pattern matching is used in the file, move the using statement to the top level.	2025-09-28 10:29:44 +01:00
Florian Hahn	a7b4dd42bd	[LV] Don't create partial reductions if factor doesn't match accumulator (#158603 ) Check if the scale-factor of the accumulator is the same as the requested ScaleFactor in tryToCreatePartialReductions. This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions, e.g. because we do not form a partial reduction for the loop exit instruction. Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor. PR: https://github.com/llvm/llvm-project/pull/158603	2025-09-24 12:21:03 +01:00
Graham Hunter	6b99a7bbed	[LV] Provide utility routine to find uncounted exit recipes (#152530 ) Splitting out just the recipe finding code from #148626 into a utility function (along with the extra pattern matchers). Hopefully this makes reviewing a bit easier. Added a gtest, since this isn't actually used anywhere yet.	2025-09-18 15:45:23 +00:00
Ramkumar Ramachandra	f68f3b9a7e	[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550 )	2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra	148a83543b	[LV] Introduce m_One and improve (0\|1)-match (NFC) (#157419 )	2025-09-15 10:34:06 +00:00
Kerry McLaughlin	f0e9bba024	[LoopVectorize] Generate wide active lane masks (#147535 ) This patch adds a new flag (-enable-wide-lane-mask) which allows LoopVectorize to generate wider-than-VF active lane masks when it is safe to do so (i.e. the mask is used for data and control flow). The transform in extractFromWideActiveLaneMask creates vector extracts from the first active lane mask in the header & loop body, modifying the active lane mask phi operands to use the extracts. An additional operand is passed to the ActiveLaneMask instruction, the value of which is used as a multiplier of VF when generating the mask. By default this is 1, and is updated to UF by extractFromWideActiveLaneMask. The motivation for this change is to improve interleaved loops when SVE2.1 is available, where we can make use of the whilelo instruction which returns a predicate pair. This is based on a PR that was created by @momchil-velikov (#81140) and contains tests which were added there.	2025-09-01 13:53:30 +01:00
Florian Hahn	300d2c6d20	[VPlan] Move SCEV expansion to VPlan transform. (NFCI). Move the logic to expand SCEVs directly to a late VPlan transform that expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an abstract recipe without execute, which clarifies how the recipe is handled, i.e. it is not executed like regular recipes. It also helps to simplify construction, as now scalar evolution isn't required to be passed to the recipe.	2025-08-21 22:03:26 +01:00
Ramkumar Ramachandra	f34326dac8	[VPlan] Introduce vputils::onlyScalarValuesUsed (NFC) (#153577 )	2025-08-15 15:55:59 +00:00
Florian Hahn	08f50e9665	[VPlan] Use vector tripcount if computable when simplifying conds. (#151034 ) Update isConditionTrueViaVFAndUF to use the vector trip count if computable. This is the case when it has been materialized to a constant. Otherwise fall back to the trip count. PR: https://github.com/llvm/llvm-project/pull/151034	2025-08-02 16:31:31 +01:00
Florian Hahn	dcef154b5c	[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506 ) Building on top of https://github.com/llvm/llvm-project/pull/114305, replace VPRegionBlocks with explicit CFG before executing. This brings the final VPlan closer to the IR that is generated and helps to simplify codegen. It will also enable further simplifications of phi handling during execution and transformations that do not have to preserve the canonical IV required by loop regions. This for example could include replacing the canonical IV with an EVL based phi while completely removing the original canonical IV. PR: https://github.com/llvm/llvm-project/pull/117506	2025-05-24 19:17:16 +01:00
Florian Hahn	04fde85057	[VPlan] Rename isUniform(AfterVectorization) to isSingleScalar (NFC). (#140134 ) Update the naming in VPReplicateRecipe and vputils to the more accurate isSingleScalar, as the functions check for cases where only a single scalar is needed, either because it produces the same value for all lanes or has only their first lane used. Discussed in https://github.com/llvm/llvm-project/pull/139150. PR: https://github.com/llvm/llvm-project/pull/140134	2025-05-16 16:38:39 +01:00
Graham Hunter	5b9246517f	[LV] Fix ScalarIVSteps vplan pattern matcher, remove m_CanonicalIV() (#138298 ) 783a846 changed VPScalarIVStepsRecipe to take 3 arguments (adding VF explicitly) instead of 2, but didn't change the corresponding pattern matcher. This matcher was only used in vputils::isHeaderMask, and no test ever reached that function with a ScalarIVSteps recipe for the value being matched -- it was always a WideCanonicalIV. So the matcher bailed out immediately before checking arguments and asserting that the number of arguments in the recipe was the same provided by the matcher. Since the constructors for ScalarIVSteps take 3 values, we should be safe to update the matcher and guard it with a dedicated gtest. m_CanonicalIV() on the other hand is removed; as a phi recipe it may not have a consistent number of arguments to match, only requiring one (the start value) when being constructed with the assumption that a second incoming value is added for the backedge later. In order to keep the matcher we would need to add multiple matchers with different numbers of arguments for it depending on what phase of vplan construction we were in, and ensure that we never reorder matcher usage vs. vplan transformation. Since the main IR PatternMatch.h doesn't contain any matchers for PHI nodes, I think we can just remove it and match via m_Specific() using the VPValue we get from Plan.getCanonicalIV().	2025-05-14 15:01:03 +01:00
Florian Hahn	6a9e8fc50c	[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) (#129706 ) There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts. This leads to duplication from defining multiple full recipes. This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is to have opcodes needing to specify a result type to use this general recipe. The current patch replaces VPScalarCastRecipe with VPInstructionWithType, a similar patch for VPWidenCastRecipe will follow soon. There are a few proposed opcodes that should also benefit, without the need of workarounds: * https://github.com/llvm/llvm-project/pull/129508 * https://github.com/llvm/llvm-project/pull/119284 PR: https://github.com/llvm/llvm-project/pull/129706	2025-04-10 22:30:40 +01:00
Florian Hahn	f13d58303f	[VPlan] Pass some functions directly to all_of (NFC). Remove some unneeded lambdas.	2025-03-13 18:50:11 +00:00
Florian Hahn	e258bca950	[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235 ) Update getOrCreateVPValueForSCEVExpr to only skip expansion of SCEVUnknown if the underlying value isn't an instruction. Instructions may be defined in a loop and using them without expansion may break LCSSA form. SCEVExpander will take care of preserving LCSSA if needed. We could also try to pass LoopInfo, but there are some users of the function where it won't be available and main benefit from skipping expansion is slightly more concise VPlans. Note that SCEVExpander is now used to expand SCEVUnknown with floats. Adjust the check in expandCodeFor to only check the types and casts if the type of the value is different to the requested type. Otherwise we crash when trying to expand a float and requesting a float type. Fixes https://github.com/llvm/llvm-project/issues/121518. PR: https://github.com/llvm/llvm-project/pull/125235	2025-02-11 13:03:12 +01:00
Florian Hahn	6c8f41d336	[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292 ) As a first step to move towards modeling the full skeleton in VPlan, start by wrapping IR blocks created during legacy skeleton creation in VPIRBasicBlocks and hook them into the VPlan. This means the skeleton CFG is represented in VPlan, just before execute. This allows moving parts of skeleton creation into recipes in the VPBBs gradually. Note that this allows retiring some manual DT updates, as this will be handled automatically during VPlan execution. PR: https://github.com/llvm/llvm-project/pull/114292	2024-12-12 15:58:16 +00:00
Florian Hahn	8ec406757c	[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842 ) This patch implements explicit unrolling by UF as VPlan transform. In follow up patches this will allow simplifying VPTransformState (no need to store unrolled parts) as well as recipe execution (no need to generate code for multiple parts in each recipe). It also allows for more general optimizations (e.g. avoid generating code for recipes that are uniform-across parts). It also unifies the logic dealing with unrolled parts in a single place, rather than spreading it out across multiple places (e.g. VPlan post processing for header-phi recipes previously.) In the initial implementation, a number of recipes still take the unrolled part as additional, optional argument, if their execution depends on the unrolled part. The computation for start/step values for scalable inductions changed slightly. Previously the step would be computed as scalar and then splatted, now vscale gets splatted and multiplied by the step in a vector mul. This has been split off https://github.com/llvm/llvm-project/pull/94339 which also includes changes to simplify VPTransformState and recipes' ::execute. The current version mostly leaves existing ::execute untouched and instead sets VPTransformState::UF to 1. A follow-up patch will clean up all references to VPTransformState::UF. Another follow-up patch will simplify VPTransformState to only store a single vector value per VPValue. PR: https://github.com/llvm/llvm-project/pull/95842	2024-09-21 19:47:37 +01:00
Florian Hahn	0d736e296c	[VPlan] Add getSCEVExprForVPValue util, use to get trip count SCEV (NFC) (#94464 ) Add a new getSCEVExprForVPValue utility which can be used to get a SCEV expression for a VPValue. The initial implementation only returns SCEVs for live-in IR values (by constructing a SCEV based on the live-in IR value) and VPExpandSCEVRecipe. This is enough to serve its first use, getting a SCEV for a VPlan's trip count, but will be extended in the future. It also removes createTripCountSCEV, as the new helper can be used to retrieve the SCEV from the VPlan. PR: https://github.com/llvm/llvm-project/pull/94464	2024-09-18 14:41:56 +01:00
Ramkumar Ramachandra	71ede8d831	VPlan: factor out VPlanUtils into its own file (NFC) (#105857 )	2024-08-28 13:54:41 +01:00

34 Commits