62 Commits

Author SHA1 Message Date
Florian Hahn
7509cad693
[VPlan] Support masked VPInsts, use for predication (NFC) (#142285)
Add support for mask operands to most VPInstructions, using
getNumOperandsForOpcode.

This allows VPlan predication to predicate VPInstructions directly. The
mask will then be dropped or handled when creating wide recipes.

Depends on https://github.com/llvm/llvm-project/pull/142284.
Depends on https://github.com/llvm/llvm-project/pull/168784.

PR: https://github.com/llvm/llvm-project/pull/142285
2026-02-08 18:23:36 +00:00
Florian Hahn
b0d95f0c7b
[VPlan] Handle Mul/UDiv in getSCEVExprForVPValue (NFCI).
Support Mul/UDiv and AND-variant (https://alive2.llvm.org/ce/z/rBJVdg)
in getSCEVExprForVPValue.

This is used in code paths when computing SCEV expressions in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
2026-02-01 21:41:30 +00:00
Florian Hahn
90b3712d8a
Reapply "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)"
This reverts commit d1e477b00b49c63ff4dd513eeb14a5b18bc055d7.

Recommit with extra checks making sure extends are VPWidenCastRecipes,
rejecting VPReplicateRecipes.

Original message:
As a first step, move the existing partial reduction detection logic to
VPlan, trying to preserve the existing code structure & behavior as
closely as possible.

With this, partial reductions are detected and created together in a
single step.

This allows forming partial reductions and, if profitable, bundling
them together in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/167851
2026-02-01 16:27:27 +00:00
Martin Storsjö
d1e477b00b
Revert "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)"
This reverts commit f4e8cc1a2229dca76d21c8d37439c4c194b06b86.

This change wasn't NFC; it causes failed asserts when building
ffmpeg for i686 windows, see
https://github.com/llvm/llvm-project/pull/167851 for details.
2026-02-01 14:35:02 +02:00
Florian Hahn
f4e8cc1a22
[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)
As a first step, move the existing partial reduction detection logic to
VPlan, trying to preserve the existing code structure & behavior as
closely as possible.

With this, partial reductions are detected and created together in a
single step.

This allows forming partial reductions and, if profitable, bundling
them together in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/167851
2026-01-31 19:44:46 +00:00
Jakub Kuderski
55fbb71db1
[llvm] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178502)
Pre-committing this before landing the new check in
https://github.com/llvm/llvm-project/pull/177892
2026-01-28 15:44:04 -05:00
Florian Hahn
14a209f852
[VPlan] Replace ComputeFindIVRes with ComputeRdxRes + cmp + sel (NFC) (#176672)
Replace ComputeFindIVResult with ComputeReductionResult + explicit
compare + select, to model finding the first/last induction more
explicitly and simply; this boils down to a min/max reduction +
compare and select of the sentinel value.
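
A scalar sketch of that decomposition follows. It is a hypothetical model, not the VPlan code: findLastIV, the use of INT64_MIN as the sentinel, and the plain index as the IV are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical scalar model of "find the last IV value for which Cond holds",
// split into a max-reduction followed by an explicit compare + select.
int64_t findLastIV(const std::vector<bool> &Cond, int64_t Start) {
  // Sentinel chosen so no real IV value can equal it (assumes the IV never
  // takes the minimum int64_t value).
  const int64_t Sentinel = std::numeric_limits<int64_t>::min();

  // Reduction part: iterations where Cond is false contribute the sentinel,
  // others contribute the IV; a plain signed max combines them.
  int64_t Rdx = Sentinel;
  for (std::size_t I = 0; I < Cond.size(); ++I) {
    int64_t IV = static_cast<int64_t>(I);
    Rdx = std::max(Rdx, Cond[I] ? IV : Sentinel);
  }

  // Compare + select: if the result is still the sentinel, nothing matched
  // and the start value is used instead.
  bool Found = Rdx != Sentinel;
  return Found ? Rdx : Start;
}
```

Finding the first induction would use a min-reduction with a maximal sentinel instead.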

PR: https://github.com/llvm/llvm-project/pull/176672
2026-01-22 19:28:47 +00:00
Florian Hahn
3beb520ce1
[VPlan] Support VPWidenPointerInduction in getSCEVExprForVPValue (NFCI)
Support VPWidenPointerInductionRecipe in getSCEVExprForVPValue.

This is used in code paths when computing SCEV expressions in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
2026-01-21 22:40:04 +00:00
Florian Hahn
83b13e6de9
[VPlan] Update formatting in getSCEVExprForVPValue (NFC).
Reformat TypeSwitch in getSCEVExprForVPValue, to reduce diff in
follow-up changes.
2026-01-21 22:24:06 +00:00
Florian Hahn
6cc18a8e43
[VPlan] Support more GEP-like recipes in getSCEVExprForVPValue (NFCI)
Support VPWidenGEPRecipe, VPInstructions and VPReplicateRecipe with
GEP-like opcodes in getSCEVExprForVPValue via a new matcher binding
source element type and operands.

This is used in code paths when computing SCEV expressions in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
2026-01-18 22:20:25 +00:00
Florian Hahn
d0c87356d1
[VPlan] Handle constant step for VPScalarIVSteps in getSCEVExpr (NFC).
Update getSCEVExprForVPValue to handle VPScalarIVSteps with any constant
step. getSCEVExprForVPValue computes the SCEV for lane 0, so we can
simply return the IV operand, truncated/extended as needed.
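
A small model shows why lane 0 is independent of the step. It assumes lane L of unrolled part P receives BaseIV + (P * VF + L) * Step. That formula and the helper scalarIVStep are our simplification for illustration, not the recipe's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the scalar values VPScalarIVSteps produces: for
// unrolled part Part and lane Lane, BaseIV + (Part * VF + Lane) * Step.
int64_t scalarIVStep(int64_t BaseIV, int64_t Step, unsigned VF, unsigned Part,
                     unsigned Lane) {
  int64_t Index = static_cast<int64_t>(Part) * VF + Lane;
  return BaseIV + Index * Step;
}
```

For part 0, lane 0 the index is 0, so the result is BaseIV for any constant Step.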

This should be NFC and is tested via the VPlan-based cost-model, which
should compute costs matching the legacy cost model.
2026-01-14 22:29:50 +00:00
Florian Hahn
2f7e218017
[VPlan] Add missing sext(sub) SCEV fold to getSCEVExprForVPValue.
SCEV has a manual fold when doing SCEV construction from IR, that is not
integrated in the regular SCEV construction functions. Mirror the
behavior in getSCEVExprForVPValue, to match results when constructing
SCEVs from IR.
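
The fold in question pushes a sign extension through a no-signed-wrap subtraction: sext(A - B) becomes sext(A) - sext(B) when the sub is nsw. The sketch below checks that identity on i8 values widened to i32. The helper names are hypothetical and the int8_t/int32_t types stand in for IR integer types.

```cpp
#include <cassert>
#include <cstdint>

// sext(A - B): subtract in 8 bits (wrapping, like an i8 sub), then extend.
int32_t sextOfSub(int8_t A, int8_t B) {
  int8_t Narrow = static_cast<int8_t>(A - B);
  return static_cast<int32_t>(Narrow);
}

// sext(A) - sext(B): extend first, then subtract in 32 bits.
int32_t subOfSext(int8_t A, int8_t B) {
  return static_cast<int32_t>(A) - static_cast<int32_t>(B);
}

// True if A - B fits in i8, i.e. the narrow sub could carry the nsw flag.
bool subIsNSW(int8_t A, int8_t B) {
  int Wide = int(A) - int(B);
  return Wide >= INT8_MIN && Wide <= INT8_MAX;
}
```

Without nsw the identity fails: for A = -128, B = 1 the narrow sub wraps to 127, while the widened form gives -129.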

Fixes https://github.com/llvm/llvm-project/issues/174622.
2026-01-11 20:51:13 +00:00
Florian Hahn
31b93d6e38
[VPlan] Add specialized VPValue subclasses for different types (NFC) (#172758)
This patch adds VPValue sub-classes for the different cases we currently
have:
 * VPIRValue: A live-in VPValue that wraps an underlying IR value.
 * VPSymbolicValue: A symbolic VPValue not tied to an underlying value,
   e.g. the vector trip count or VF VPValues.
 * VPRecipeValue: A VPValue defined by a VPDef/VPRecipeBase.

This has multiple benefits:
 * clearer constructors for each kind of VPValue
 * limited scope: for example, it allows moving the VPDef member to
   VPRecipeValue, reducing the size of other VPValues.
 * stricter type checking for member variables (e.g. using VPLiveIn in
   the Value -> live-in map in VPlan, or using VPSymbolicValue for symbolic
   member VPValues)

There probably are additional opportunities for cleanups as follow-ups.

PR: https://github.com/llvm/llvm-project/pull/172758
2026-01-07 20:29:05 +00:00
Florian Hahn
16830b2164
[VPlan] Remove VPWidenSelectRecipe, use VPWidenRecipe instead (NFCI). (#174234)
All extra state has been removed from VPWidenSelectRecipe at this point.
There's no benefit to having a separate recipe and Select can easily be
handled by the existing VPWidenRecipe.

PR: https://github.com/llvm/llvm-project/pull/174234
2026-01-05 22:33:37 +00:00
Florian Hahn
3f5ee8aa76
[VPlan] Handle VPInstruction::Not in getSCEVExprForVPValue (NFC).
https://alive2.llvm.org/ce/z/jpLaJX
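
SCEV has no bitwise-not node; ScalarEvolution's getNotSCEV helper models ~X as -1 - X. The sketch checks that identity on 32-bit values. The function names are hypothetical, and unsigned arithmetic is used so the subtraction wraps like i32.

```cpp
#include <cassert>
#include <cstdint>

// `not X` as a VPInstruction computes it: xor with an all-ones mask.
uint32_t notViaXor(uint32_t X) { return X ^ UINT32_MAX; }

// The arithmetic form SCEV can represent: (-1) - X, where all-ones is -1 in
// two's complement and unsigned arithmetic wraps modulo 2^32.
uint32_t notViaSub(uint32_t X) { return UINT32_MAX - X; }
```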
2026-01-03 22:32:52 +00:00
Florian Hahn
524b1788c4
[VPlan] Add BranchOnTwoConds, use for early exit plans. (#172750)
This PR introduces a new BranchOnTwoConds VPInstruction, that takes 2
boolean operands and must be placed in a block with 3 successors.

If condition I is true, branches to successor I, otherwise falls through
to check the next condition. If both conditions are false, branch to the
third successor.
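
The selection rule just described can be modelled as a plain function returning the index of the successor taken. This is a hypothetical sketch of the semantics only; it is not how the recipe is represented or lowered.

```cpp
#include <cassert>

// Hypothetical model of BranchOnTwoConds' control flow for a block with
// exactly three successors; returns the index of the successor taken.
unsigned branchOnTwoConds(bool Cond0, bool Cond1) {
  if (Cond0)
    return 0; // First condition true: successor 0.
  if (Cond1)
    return 1; // Otherwise check the second condition: successor 1.
  return 2;   // Both false: the third successor.
}
```

The two nested checks also mirror the lowering to a chain of two-way BranchOnCond blocks.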

This new branch recipe is used for early-exit loops to initially simplify
their representation in VPlan, by avoiding the need to split the
middle block early on, in a way that preserves the single-exit block
property of regions. All exits still go through the latch block, but
they can go to more than 2 successors.

This idea was part of one of the original proposals for how to model
early exits in VPlan, but at that point in time, there was no good way
to handle this during code-gen, and we went with the early split-middle
block approach initially.

Now that we dissolve regions before ::execute, the new recipe can be
lowered nicely after regions have been removed, to a set of VPBBs and
BranchOnCond recipes. The initial lowering preserves the original
structure with the split middle blocks. Follow-ups will improve the
lowering to avoid this splitting, providing performance gains.

PR: https://github.com/llvm/llvm-project/pull/172750
2025-12-29 19:39:38 +00:00
Florian Hahn
7de080482c
[VPlan] Handle min/max intrinsics in getSCEVExprForVPValue (NFCI)
Use m_Intrinsic to handle min/max intrinsics in getSCEVExprForVPValue.
This also extends Argument_match and IntrinsicID_match to VPInstruction
for completeness, and unifies the handling to avoid looking up functions
from the underlying IR instruction.

Tested via the VPlan-based cost model; the same costs should be
computed.

As part of the extension, fix a bug in Argument_match that had an
incorrect offset for the operands of VPReplicateRecipe; the function is
the last argument.
2025-12-28 22:28:16 +00:00
Florian Hahn
60e5b86052
[VPlan] Support extends and truncs in getSCEVExprForVPValue. (NFCI)
Handle extends and truncates in getSCEVExprForVPValue. This enables
computing SCEVs in more cases in the VPlan-based cost-model, but should
compute the matching costs in all cases.
2025-12-26 21:38:14 +00:00
Florian Hahn
15bf7079b0
[VPlan] Support truncated IVs in getSCEVExprForVPValue. (NFCI)
Handle truncated inductions in getSCEVExprForVPValue. This means we are
able to compute SCEV expressions for more inductions used in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
2025-12-25 22:03:29 +00:00
Florian Hahn
ee1bac863a
[VPlan] Support binary add/sub in getSCEVExprForVPValue. (NFCI)
Handle binary add/sub in getSCEVExprForVPValue. This means we are able
to compute more replicate recipe costs in the VPlan cost model. It
should produce the same costs.
2025-12-24 23:00:16 +00:00
Florian Hahn
c43ccefc9f
[VPlan] Use PSE to construct SCEVs in getSCEVExprForVPValue (NFCI).
getSCEVExprForVPValue is used to create SCEVs for expressions from the
original loop, which may be predicated. Use PSE to construct predicated
SCEVs if possible. This matches the legacy LV code behavior.

Currently should be NFC, but will enable migrating more SCEV/cost-based
computations to VPlan.

The patch requires exposing a new getPredicatedSCEV helper to
PredicatedScalarEvolution which just takes a SCEV, to avoid needing to
go through IR values, which isn't an option for getSCEVExprForVPValue.
2025-12-21 22:39:49 +00:00
Florian Hahn
1f78f6a2d6
[LV] Check Addr in getAddressAccessSCEV in terms of SCEV expressions. (#171204)
getAddressAccessSCEV previously had some restrictive checks that limited
pointer SCEV expressions passed to TTI to GEPs with operands that must
either be invariant or marked as inductions.

As a consequence, the check rejected things like `GEP %base, (%iv + 1)`,
while the SCEV for the GEP should be as easily analyzable as for `GEP
%base, %v`, the only difference being that the AddRec start is
adjusted by 1.

This patch changes the code to use a SCEV-based check, limiting the
address SCEV to be loop invariant, an affine AddRec (i.e. an induction),
or an add expression of such operands or a sign-extended AddRec.
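
A sketch of our reading of this rule follows, over a hypothetical toy expression tree rather than LLVM's SCEV classes. Whether sign-extended AddRecs are also accepted as operands of an add is an assumption of this sketch and may not match the actual check.

```cpp
#include <cassert>
#include <vector>

// Toy SCEV-like expression kinds (hypothetical, not LLVM's SCEV classes).
enum class Kind { Invariant, AffineAddRec, NonAffineAddRec, Add, SExt, Unknown };

struct Expr {
  Kind K;
  std::vector<const Expr *> Ops;
};

// Accept loop-invariant values, affine AddRecs (inductions), sign-extended
// affine AddRecs, and adds whose operands are themselves acceptable.
bool isSimpleAddress(const Expr &E) {
  switch (E.K) {
  case Kind::Invariant:
  case Kind::AffineAddRec:
    return true;
  case Kind::SExt:
    return E.Ops.size() == 1 && E.Ops[0]->K == Kind::AffineAddRec;
  case Kind::Add:
    for (const Expr *Op : E.Ops)
      if (!isSimpleAddress(*Op))
        return false;
    return !E.Ops.empty();
  default:
    return false;
  }
}
```

Because the check looks at the SCEV rather than at the GEP operands, `GEP %base, (%iv + 1)` passes as long as its SCEV decomposes into these forms.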

This catches all existing cases getAddressAccessSCEV caught, plus
additional ones like the cases mentioned above.

This means we pass address SCEVs in more cases, giving the backends a
better chance to make informed decisions. It also unifies the decision
of when to use an address SCEV between the legacy and VPlan-based cost
model.

The gather-cost.ll tests illustrate the impact. Previously, they were
considered not profitable to vectorize because we failed to determine
that
 %gep.src_data = getelementptr inbounds [1536 x float], ptr @src_data,
                                                        i64 0, i64 %mul
has a relatively small constant stride.

There may be some rough edges in the cost models, where not passing
pointer SCEVs hid some incorrect modeling, but those issues should be
fixed in the target cost models if they surface.

PR: https://github.com/llvm/llvm-project/pull/171204
2025-12-19 22:05:27 +00:00
Florian Hahn
53cf22f3a1
[VPlan] Simplify live-ins early using SCEV. (#155304)
Use SCEV to simplify all live-ins during VPlan0 construction. This
enables us to remove special SCEV queries when constructing
VPWidenRecipes and improves results in some cases.

This leads to simplifications in a number of cases in real-world
applications (~250 files changed across LLVM, SPEC, ffmpeg).

PR: https://github.com/llvm/llvm-project/pull/155304
2025-12-14 20:15:05 +00:00
Florian Hahn
c465a56e9d
[VPlan] Handle canonical IVs in ::isSingleScalar. (NFCI)
The canonical IV is always a single scalar. It is already treated as
uniform-across-UF-and-VF.

This should currently be NFC.
2025-11-30 21:51:03 +00:00
Sam Tebbs
071d1fb8be
[LV] Use VPReductionRecipe for partial reductions (#147513)
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two together and gives VPReductionRecipe a VFScaleFactor so
that it can choose to generate the partial reduction intrinsic at
execute time.
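
In scalar terms, a partial reduction folds VF input lanes into an accumulator with only VF / ScaleFactor lanes. As we read the LangRef for the partial-reduce intrinsics, the mapping of input lanes to accumulator lanes is unspecified; this hypothetical sketch simply picks lane I -> accumulator lane I % AccLanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a partial add-reduction: Input has VF lanes,
// Acc has VF / ScaleFactor lanes. The lane mapping is an arbitrary choice.
std::vector<int32_t> partialReduceAdd(std::vector<int32_t> Acc,
                                      const std::vector<int32_t> &Input) {
  assert(!Acc.empty() && Input.size() % Acc.size() == 0 &&
         "scale factor must divide VF");
  for (std::size_t I = 0; I < Input.size(); ++I)
    Acc[I % Acc.size()] += Input[I];
  return Acc;
}
```

Only the final full reduction of the accumulator is well defined; the per-lane values depend on the chosen mapping.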

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. -> https://github.com/llvm/llvm-project/pull/147513

Replaces https://github.com/llvm/llvm-project/pull/146073 .
2025-11-26 16:18:22 +00:00
Florian Hahn
a51e2ef0fe
[VPlan] Treat VPVector(End)PointerRecipe as single-scalar, if ops are. (#169249)
VPVector(End)PointerRecipes are single-scalar if all their operands are.
This should be effectively NFC currently, but it should re-enable cost
checking for some more VPWidenMemoryRecipes after
https://github.com/llvm/llvm-project/pull/157387 as discovered by
John Brawn.
2025-11-25 14:46:30 +00:00
Florian Hahn
a2231af5dd
[VPlan] Share PreservesUniformity logic between isSingleScalar and isUniformAcrossVFsAndUFs
Extract the PreservesUniformity logic from isSingleScalar into a shared
static helper function. Update isUniformAcrossVFsAndUFs to use this
logic for VPWidenRecipe and VPInstruction, so that any opcode that
preserves uniformity is considered uniform-across-vf-and-uf if its
operands are.
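
The shared rule (an opcode that preserves uniformity, applied to uniform operands, yields a uniform result) can be sketched recursively. The node type and opcode set below are hypothetical; real recipes carry far more state and the actual opcode list differs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcodes: WidenIV varies per lane, Load is treated
// conservatively as not preserving uniformity.
enum class Op { LiveIn, WidenIV, Add, Mul, Select, Load };

struct Node {
  Op Opcode;
  std::vector<const Node *> Operands;
};

// Side-effect-free opcodes whose result depends only on their operands.
static bool preservesUniformity(Op O) {
  switch (O) {
  case Op::Add:
  case Op::Mul:
  case Op::Select:
    return true;
  default:
    return false;
  }
}

// Live-ins are uniform; otherwise require a uniformity-preserving opcode
// and uniform operands.
bool isUniform(const Node &N) {
  if (N.Opcode == Op::LiveIn)
    return true;
  if (!preservesUniformity(N.Opcode))
    return false;
  for (const Node *Operand : N.Operands)
    if (!isUniform(*Operand))
      return false;
  return true;
}
```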

This unifies the uniformity checking logic and makes it easier to extend
in the future.

This should effectively be NFC currently.
2025-11-22 22:11:01 +00:00
Ramkumar Ramachandra
b98f6a54f6
[VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028)
This allows us to strip an unnecessary TypeSwitch.
2025-11-21 14:23:17 +00:00
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoist single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hoisting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.
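
A simplified sketch of the metadata check follows. It ignores alias domains and models only one direction of scoped-noalias reasoning: a load whose !alias.scope scopes are all listed in a store's !noalias list cannot alias that store. ScopeSet and both helpers are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of scoped-noalias reasoning.
using ScopeSet = std::set<std::string>;

bool provesNoAlias(const ScopeSet &LoadScopes, const ScopeSet &StoreNoAlias) {
  if (LoadScopes.empty())
    return false; // No scope metadata on the load: nothing can be proven.
  // std::includes requires sorted ranges, which std::set guarantees.
  return std::includes(StoreNoAlias.begin(), StoreNoAlias.end(),
                       LoadScopes.begin(), LoadScopes.end());
}

// A load with an invariant address may be hoisted only if the metadata rules
// out aliasing with every store in the loop.
bool canHoistInvariantLoad(const ScopeSet &LoadScopes,
                           const std::vector<ScopeSet> &StoresNoAlias) {
  return std::all_of(StoresNoAlias.begin(), StoresNoAlias.end(),
                     [&](const ScopeSet &S) {
                       return provesNoAlias(LoadScopes, S);
                     });
}
```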

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Luke Lau
4d4a60cde0
[VPlan] Fix LastActiveLane assertion on scalar VF (#167897)
For a scalar-only VPlan with tail folding, if it has a phi live out then
legalizeAndOptimizeInductions will scalarize the widened canonical IV
feeding into the header mask:

    <x1> vector loop: {
      vector.body:
        EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
        vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
        EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
        EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
        EMIT branch-on-count vp<%index.next>, vp<%2>
      No successors
    }
    Successor(s): middle.block

    middle.block:
      EMIT vp<%8> = last-active-lane vp<%6>
      EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
    Successor(s): ir-bb<exit>

The verifier complains about this but this should still generate the
correct last active lane, so this fixes the assert by handling this case
in isHeaderMask. There is a similar pattern already there for
ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes #167813
2025-11-17 11:03:38 +00:00
Florian Hahn
820daa5c1e
[VPlan] Support VPWidenIntOrFpInduction in getSCEVExprForVPValue. (NFCI)
Construct SCEVs for VPWidenIntOrFpInductionRecipe analogous to
VPCanonicalInductionPHIRecipe: create an AddRec with start + step from
the recipe.

Currently the only impact should be computing more costs of replicating
stores directly in VPlan.
2025-11-15 13:35:11 +00:00
Ramkumar Ramachandra
eab44600fb
[VPlan] Rename onlyFirst(Lane|Part)Used (NFC) (#166562)
Rename onlyFirst(Lane|Part)Used to usesFirst(Lane|Part)Only, in line
with usesScalars, for clarity.
2025-11-06 10:07:58 +00:00
Ramkumar Ramachandra
912cc5f098
[VPlan] Improve getOrCreateVPValueForSCEVExpr (NFC) (#165699)
Use early exit in getOrCreateVPValueForSCEVExpr.
2025-11-03 06:44:30 +00:00
Florian Hahn
683b00bb50
[VPlan] Limit VPScalarIVSteps to step == 1 in getSCEVExprForVPValue.
For now, just support VPScalarIVSteps with step == 1 in
getSCEVExprForVPValue. This fixes a crash when the step would be != 1.
2025-10-31 02:22:56 +00:00
Florian Hahn
b2d12d6f2b
[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276)
Update getSCEVExprForVPValue to handle more complex expressions, to use
it in VPReplicateRecipe::computeCost.

In particular, it supports constructing SCEV expressions for
GetElementPtr VPReplicateRecipes, with operands that are
VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we
hit a sub-expression we don't support yet, we return
SCEVCouldNotCompute.

Note that the SCEV expression is valid for VF = 1: we only support
constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec
starting at 0 and stepping by 1. The returned SCEV expressions could be
converted to VF-specific ones by rewriting the AddRecs to ones with
the appropriate step.
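
One plausible reading of that rewrite uses a toy AddRec {Start,+,Step} that evaluates to Start + I * Step on scalar iteration I. Vector iteration K covers scalar iterations K*VF through K*VF+VF-1, so lane 0 follows {Start,+,Step*VF}. The struct and laneZeroForVF are hypothetical, not LLVM's SCEVAddRecExpr.

```cpp
#include <cassert>
#include <cstdint>

// Toy affine AddRec {Start,+,Step}: on scalar iteration I it evaluates to
// Start + I * Step.
struct AffineAddRec {
  int64_t Start;
  int64_t Step;
  int64_t valueAt(int64_t I) const { return Start + I * Step; }
};

// Hypothetical VF-specific rewrite for lane 0: scale the step by VF so that
// evaluating at vector iteration K gives the value of scalar iteration K*VF.
AffineAddRec laneZeroForVF(const AffineAddRec &R, int64_t VF) {
  return {R.Start, R.Step * VF};
}
```

The canonical IV {0,+,1} becomes {0,+,VF}, matching how the vector loop index advances by VF per iteration.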

Note that the logic for constructing SCEVs for GetElementPtr was
directly ported from ScalarEvolution.cpp.

Another thing to note is that we construct SCEV expressions purely by
looking at the operation of the recipe and its translated operands, without
accessing the underlying IR (the exception being getting the source
element type for GEPs).

PR: https://github.com/llvm/llvm-project/pull/161276
2025-10-30 15:46:19 -07:00
Florian Hahn
d020b2da54
[VPlan] Move isSingleScalar implementation to VPlanUtils.cpp (NFC)
Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to
enable code sharing.
2025-10-25 21:56:03 +01:00
Florian Hahn
8c29bce1e9
[VPlan] Remove SCEVToExpansion mapping (NFC). (#164490)
VPlan::SCEVToExpansion isn't needed any longer, as SCEV expansion
de-duplication is handled locally in expandSCEVs.

PR: https://github.com/llvm/llvm-project/pull/164490
2025-10-24 21:38:23 +00:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Ramkumar Ramachandra
2ec01e430a
[VPlan] Move two VPBlockUtils members (NFC) (#162507)
2025-10-21 16:40:13 +01:00
Ramkumar Ramachandra
9bfaf12c07
[VPlan] Handle more replicates in isUniformAcrossVFsAndUFs (#162342)
A single-scalar replicate without side-effects, and with uniform
operands, is uniform. Special-case assumes and stores.
2025-10-20 10:26:23 +00:00
Florian Hahn
86b89a6dcc
[VPlan] Mark VPlan argument in isHeaderMask as const (NFC).
isHeaderMask should not modify the VPlan; mark as const to allow easy
re-use in the VPlanVerifier.
2025-10-15 19:46:28 +01:00
Florian Hahn
861519327a
[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-15 12:48:35 +01:00
Ramkumar Ramachandra
107940f3be
[VPlan] Improve binary matchers in two places (NFC) (#162268)
2025-10-07 14:56:43 +01:00
Florian Hahn
2284ce0596
[VPlan] Move using VPlanPatternMatch to top in VPlanUtils.cpp (NFC).
Only VPlan pattern matching is used in the file, move the using
statement to the top level.
2025-09-28 10:29:44 +01:00
Florian Hahn
a7b4dd42bd
[LV] Don't create partial reductions if factor doesn't match accumulator (#158603)
Check if the scale-factor of the accumulator is the same as the requested
ScaleFactor in tryToCreatePartialReductions.

This prevents creating partial reductions if not all instructions in the
reduction chain form partial reductions, e.g. because we do not form a
partial reduction for the loop exit instruction.

Currently code-gen works fine, because the scale factor of
VPPartialReduction is not used during ::execute, but it means we compute
incorrect cost/register pressure, because the partial reduction won't
reduce to the specified scaling factor.

PR: https://github.com/llvm/llvm-project/pull/158603
2025-09-24 12:21:03 +01:00
Graham Hunter
6b99a7bbed
[LV] Provide utility routine to find uncounted exit recipes (#152530)
Splitting out just the recipe finding code from #148626 into a utility
function (along with the extra pattern matchers). Hopefully this makes
reviewing a bit easier.

Added a gtest, since this isn't actually used anywhere yet.
2025-09-18 15:45:23 +00:00
Ramkumar Ramachandra
f68f3b9a7e
[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550)
2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419)
2025-09-15 10:34:06 +00:00
Kerry McLaughlin
f0e9bba024
[LoopVectorize] Generate wide active lane masks (#147535)
This patch adds a new flag (-enable-wide-lane-mask) which allows
LoopVectorize to generate wider-than-VF active lane masks when it
is safe to do so (i.e. the mask is used for data and control flow).

The transform in extractFromWideActiveLaneMask creates vector
extracts from the first active lane mask in the header & loop body,
modifying the active lane mask phi operands to use the extracts.

An additional operand is passed to the ActiveLaneMask instruction,
the value of which is used as a multiplier of VF when generating the
mask. By default this is 1, and is updated to UF by
extractFromWideActiveLaneMask.
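
The idea can be sketched with scalar masks. activeLaneMask follows the llvm.get.active.lane.mask rule (lane I is active iff Base + I < TripCount). extractPart is a hypothetical stand-in for the vector extracts, slicing one VF * UF-wide mask into UF parts instead of computing UF separate masks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane I is active iff Base + I < TripCount (unsigned comparison).
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t TripCount,
                                 unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = Base + I < TripCount;
  return Mask;
}

// Hypothetical model of the extract: take the VF-wide slice for one
// unrolled part out of a single mask of VF * UF lanes.
std::vector<bool> extractPart(const std::vector<bool> &Wide, unsigned VF,
                              unsigned Part) {
  assert((Part + 1) * VF <= Wide.size() && "part out of range");
  return std::vector<bool>(Wide.begin() + Part * VF,
                           Wide.begin() + (Part + 1) * VF);
}
```

Each extracted part equals the mask that would have been computed for that part directly.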

The motivation for this change is to improve interleaved loops when
SVE2.1 is available, where we can make use of the whilelo instruction
which returns a predicate pair.

This is based on a PR that was created by @momchil-velikov (#81140)
and contains tests which were added there.
2025-09-01 13:53:30 +01:00
Florian Hahn
300d2c6d20
[VPlan] Move SCEV expansion to VPlan transform. (NFCI).
Move the logic to expand SCEVs directly to a late VPlan transform that
expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an
abstract recipe without execute, which clarifies how the recipe is
handled, i.e. it is not executed like regular recipes.

It also helps to simplify construction, as now scalar evolution isn't
required to be passed to the recipe.
2025-08-21 22:03:26 +01:00