34 Commits

Author SHA1 Message Date
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoists single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hosting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Luke Lau
4d4a60cde0
[VPlan] Fix LastActiveLane assertion on scalar VF (#167897)
For a scalar only VPlan with tail folding, if it has a phi live out then
legalizeAndOptimizeInductions will scalarize the widened canonical IV
feeding into the header mask:

    <x1> vector loop: {
      vector.body:
        EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
        vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
        EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
        EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
        EMIT branch-on-count vp<%index.next>, vp<%2>
      No successors
    }
    Successor(s): middle.block

    middle.block:
      EMIT vp<%8> = last-active-lane vp<%6>
      EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
    Successor(s): ir-bb<exit>

The verifier complains about this but this should still generate the
correct last active lane, so this fixes the assert by handling this case
in isHeaderMask. There is a similar pattern already there for
ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes #167813
2025-11-17 11:03:38 +00:00
Florian Hahn
820daa5c1e [VPlan] Support VPWidenIntOrFpInduction in getSCEVExprForVPValue. (NFCI)
Construct SCEVs for VPWidenIntOrFpInductionRecipe analogous to
VPCanonicalInductionPHIRecipe: create an AddRec with start + step from
the recipe.

Currently the only impact should be computing more costs of replicating
stores directly in VPlan.
2025-11-15 13:35:11 +00:00
Ramkumar Ramachandra
eab44600fb
[VPlan] Rename onlyFirst(Lane|Part)Used (NFC) (#166562)
Rename onlyFirst(Lane|Part)Used to usesFirst(Lane|Part)Only, in line
with usesScalars, for clarity.
2025-11-06 10:07:58 +00:00
Ramkumar Ramachandra
912cc5f098
[VPlan] Improve getOrCreateVPValueForSCEVExpr (NFC) (#165699)
Use early exit in getOrCreateVPValueForSCEVExpr.
2025-11-03 06:44:30 +00:00
Florian Hahn
683b00bb50
[VPlan] Limit VPScalarIVSteps to step == 1 in getSCEVExprForVPValue.
For now, just support VPScalarIVSteps with step == 1 in
getSCEVExprForVPValue. This fixes a crash when the step would be != 1.
2025-10-31 02:22:56 +00:00
Florian Hahn
b2d12d6f2b
[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276)
Update getSCEVExprForVPValue to handle more complex expressions, to use
it in VPReplicateRecipe::comptueCost.

In particular, it supports construction SCEV expressions for
GetElementPtr VPReplicateRecipes, with operands that are
VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we
hit a sub-expression we don't support yet, we return
SCEVCouldNotCompute.

Note that the SCEV expression is valid VF = 1: we only support
construction AddRecs for VPCanonicalIVRecipe, which is an AddRec
starting at 0 and stepping by 1. The returned SCEV expressions could be
converted to a VF specific one, by rewriting the AddRecs to ones with
the appropriate step.

Note that the logic for constructing SCEVs for GetElementPtr was
directly ported from ScalarEvolution.cpp.

Another thing to note is that we construct SCEV expression purely by
looking at the operation of the recipe and its translated operands, w/o
accessing the underlying IR (the exception being getting the source
element type for GEPs).

PR: https://github.com/llvm/llvm-project/pull/161276
2025-10-30 15:46:19 -07:00
Florian Hahn
d020b2da54
[VPlan] Move isSingleScalar implementation to VPlanUtils.cpp (NFC)
Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to
enable code sharing.
2025-10-25 21:56:03 +01:00
Florian Hahn
8c29bce1e9
[VPlan] Remove SCEVToExpansion mapping (NFC). (#164490)
VPlan::SCEVToExpansion isn't needed any longer, as SCEV expansion
de-duplication is handled locally in expandSCEVs.

PR: https://github.com/llvm/llvm-project/pull/164490
2025-10-24 21:38:23 +00:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Ramkumar Ramachandra
2ec01e430a
[VPlan] Move two VPBlockUtils members (NFC) (#162507) 2025-10-21 16:40:13 +01:00
Ramkumar Ramachandra
9bfaf12c07
[VPlan] Handle more replicates in isUniformAcrossVFsAndUFs (#162342)
A single-scalar replicate without side-effects, and with uniform
operands, is uniform. Special-case assumes and stores.
2025-10-20 10:26:23 +00:00
Florian Hahn
86b89a6dcc
[VPlan] Mark VPlan argument in isHeaderMask as const (NFC).
isHeaderMask should not modify the VPlan; mark as const to allow easy
re-use in the VPlanVerifier.
2025-10-15 19:46:28 +01:00
Florian Hahn
861519327a
[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-15 12:48:35 +01:00
Ramkumar Ramachandra
107940f3be
[VPlan] Improve binary matchers in two places (NFC) (#162268) 2025-10-07 14:56:43 +01:00
Florian Hahn
2284ce0596
[VPlan] Move using VPlanPatternMatch to top in VPlanUtils.cpp (NFC).
Only VPlan pattern matching is used in the file, move the using
statement to the top level.
2025-09-28 10:29:44 +01:00
Florian Hahn
a7b4dd42bd
[LV] Don't create partial reductions if factor doesn't match accumulator (#158603)
Check if the scale-factor of the accumulator is the same as the request
ScaleFactor in tryToCreatePartialReductions.

This prevents creating partial reductions if not all instructions in the
reduction chain form partial reductions. e.g. because we do not form a
partial reduction for the loop exit instruction.

Currently code-gen works fine, because the scale factor of
VPPartialReduction is not used during ::execute, but it means we compute
incorrect cost/register pressure, because the partial reduction won't
reduce to the specified scaling factor.

PR: https://github.com/llvm/llvm-project/pull/158603
2025-09-24 12:21:03 +01:00
Graham Hunter
6b99a7bbed
[LV] Provide utility routine to find uncounted exit recipes (#152530)
Splitting out just the recipe finding code from #148626 into a utility
function (along with the extra pattern matchers). Hopefully this makes
reviewing a bit easier.

Added a gtest, since this isn't actually used anywhere yet.
2025-09-18 15:45:23 +00:00
Ramkumar Ramachandra
f68f3b9a7e
[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550) 2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419) 2025-09-15 10:34:06 +00:00
Kerry McLaughlin
f0e9bba024
[LoopVectorize] Generate wide active lane masks (#147535)
This patch adds a new flag (-enable-wide-lane-mask) which allows
LoopVectorize to generate wider-than-VF active lane masks when it
is safe to do so (i.e. the mask is used for data and control flow).

The transform in extractFromWideActiveLaneMask creates vector
extracts from the first active lane mask in the header & loop body,
modifying the active lane mask phi operands to use the extracts.

An additional operand is passed to the ActiveLaneMask instruction,
the value of which is used as a multiplier of VF when generating the
mask.
By default this is 1, and is updated to UF by
extractFromWideActiveLaneMask.

The motivation for this change is to improve interleaved loops when
SVE2.1 is available, where we can make use of the whilelo instruction
which returns a predicate pair.

This is based on a PR that was created by @momchil-velikov (#81140)
and contains tests which were added there.
2025-09-01 13:53:30 +01:00
Florian Hahn
300d2c6d20
[VPlan] Move SCEV expansion to VPlan transform. (NFCI).
Move the logic to expand SCEVs directly to a late VPlan transform that
expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an
abstract recipe without execute, which clarifies how the recipe is
handled, i.e. it is not executed like regular recipes.

It also helps to simplify construction, as now scalar evolution isn't
required to be passed to the recipe.
2025-08-21 22:03:26 +01:00
Ramkumar Ramachandra
f34326dac8
[VPlan] Introduce vputils::onlyScalarValuesUsed (NFC) (#153577) 2025-08-15 15:55:59 +00:00
Florian Hahn
08f50e9665
[VPlan] Use vector tripcount if computable when simplifying conds. (#151034)
Update isConditionTrueViaVFAndUF to use the vector trip count if
computable. This is the case when it has been materialized to a
constant. Otherwise fall back to the trip count.

PR: https://github.com/llvm/llvm-project/pull/151034
2025-08-02 16:31:31 +01:00
Florian Hahn
dcef154b5c
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506)
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.

This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.

It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the 
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.

PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-24 19:17:16 +01:00
Florian Hahn
04fde85057
[VPlan] Rename isUniform(AfterVectorization) to isSingleScalar (NFC). (#140134)
Update the naming in VPReplicateRecipe and vputils to the more accurate
isSingleScalar, as the functions check for cases where only a single
scalar is needed, either because it produces the same value for all
lanes or has only their first lane used.

Discussed in https://github.com/llvm/llvm-project/pull/139150.

PR: https://github.com/llvm/llvm-project/pull/140134
2025-05-16 16:38:39 +01:00
Graham Hunter
5b9246517f
[LV] Fix ScalarIVSteps vplan pattern matcher, remove m_CanonicalIV() (#138298)
783a846 changed VPScalarIVStepsRecipe to take 3 arguments (adding
VF explicitly) instead of 2, but didn't change the corresponding
pattern matcher.

This matcher was only used in vputils::isHeaderMask, and no test
ever reached that function with a ScalarIVSteps recipe for the
value being matched -- it was always a WideCanonicalIV. So the
matcher bailed out immediately before checking arguments and
asserting that the number of arguments in the recipe was the
same provided by the matcher.

Since the constructors for ScalarIVSteps take 3 values, we should
be safe to update the matcher and guard it with a dedicated gtest.

m_CanonicalIV() on the other hand is removed; as a phi recipe it
may not have a consistent number of arguments to match, only
requiring one (the start value) when being constructed with the
assumption that a second incoming value is added for the backedge
later. In order to keep the matcher we would need to add multiple
matchers with different numbers of arguments for it depending on
what phase of vplan construction we were in, and ensure that we
never reorder matcher usage vs. vplan transformation. Since the
main IR PatternMatch.h doesn't contain any matchers for PHI nodes,
I think we can just remove it and match via m_Specific() using the
VPValue we get from Plan.getCanonicalIV().
2025-05-14 15:01:03 +01:00
Florian Hahn
6a9e8fc50c
[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) (#129706)
There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284

PR: https://github.com/llvm/llvm-project/pull/129706
2025-04-10 22:30:40 +01:00
Florian Hahn
f13d58303f
[VPlan] Pass some functions directly to all_of (NFC).
Remove some unneeded lambdas.
2025-03-13 18:50:11 +00:00
Florian Hahn
e258bca950
[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235)
Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.

We could also try to pass LoopInfo, but there are some users of the
function where it won't be available and main benefit from skipping
expansion is slightly more concise VPlans.

Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.

Fixes https://github.com/llvm/llvm-project/issues/121518.

PR: https://github.com/llvm/llvm-project/pull/125235
2025-02-11 13:03:12 +01:00
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Florian Hahn
0d736e296c
[VPlan] Add getSCEVExprForVPValue util, use to get trip count SCEV (NFC) (#94464)
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.

It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.

PR: https://github.com/llvm/llvm-project/pull/94464
2024-09-18 14:41:56 +01:00
Ramkumar Ramachandra
71ede8d831
VPlan: factor out VPlanUtils into its own file (NFC) (#105857) 2024-08-28 13:54:41 +01:00