426 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
d107c29fef
[VPlan] Strip unused CanonicalIVTy arg (NFC) (#153418) 2025-08-13 15:53:56 +01:00
Luke Lau
9217b6ab2e
[VPlan] Enforce that there is only ever one header mask. NFC (#152489)
We almost only ever have one header mask, except with the data tail
folding style, i.e. with VPInstruction::ActiveLaneMask.

All we need to do is to make sure to erase the old header icmp based
header mask when replacing it.
2025-08-13 02:39:04 +00:00
Florian Hahn
8cdab07aaa
Reapply "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.

Recommit with a fix for the verifier error caused for EVL recipes.

Extra test coverage added in 6f939da60e.
2025-08-12 22:09:30 +01:00
Florian Hahn
424258947e
[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector
skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused
computations.

PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-12 14:13:13 +01:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Florian Hahn
1c7c8e3ad3
Revert "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2.

This seems to be breaking some RISCV bots, reverting for now
https://lab.llvm.org/buildbot/#/builders/210/builds/1266
2025-08-11 22:05:30 +01:00
Florian Hahn
1f17bb133f
[VPlan] Remove trivial dead VPPhi cycles.
Update removeDeadRecipes to remove trivial dead VPPhi cycles.

Should effectively be NFC end-to-end.
2025-08-11 21:29:49 +01:00
Luke Lau
aea82a780a
[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)
A lot of time getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.

However VPTypeAnalysis has a constructor that takes the VPlan directly
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.

This lets us remove a constructor on VPTypeAnalysis.

Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
2025-08-11 18:12:05 +08:00
Ramkumar Ramachandra
95c525b1db
[VPlan] Preserve nusw on VectorEndPointer (#151558)
In createInterleaveGroups, get the nusw in addition to inbounds from the
existing GEP, and set them on the VPVectorEndPointerRecipe.
2025-08-11 10:38:25 +01:00
Mel Chen
6db3776f9b
[LV][EVL] Simplify EVL recipe transformation by using a single EVL mask. nfc (#152479)
The EVL mask is always defined as `icmp ult (step-vector, EVL)`, so we
only need to generate it once per plan in the header. Then, we replace
all uses of the header mask with the EVL mask, and recursively optimize
the users of EVL mask into EVL recipes. This way, the transformation to
EVL recipes can be done with just a single loop.
2025-08-11 11:09:01 +08:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
Ramkumar Ramachandra
092388171f
[VPlan] Introduce m_[Specific]ICmp matcher (#151540) 2025-08-06 20:35:35 +01:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values,
2025-08-06 20:03:31 +01:00
Florian Hahn
e80e7e717e
[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847)
The initial VPlan closely reflects the original scalar loop, so unsing
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.

PR: https://github.com/llvm/llvm-project/pull/150847
2025-08-06 14:43:03 +01:00
Florian Hahn
777c320e6c
[VPlan] Address comments missed in #142309.
Address additional comments from
https://github.com/llvm/llvm-project/pull/142309.
2025-08-06 11:52:08 +01:00
Luke Lau
94a6cd464e
[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.

There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.

Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
2025-08-05 16:54:02 +08:00
Mel Chen
8761b6cf8f
[VPlan] Use VPTypeAnalysis to get the step type of widen pointer induction (#147925)
This patch uses VPTypeAnalysis to determine its type since the induction
step is not always a live-in value in the VPlan and may be defined by a
recipe.
2025-08-05 09:13:44 +08:00
Florian Hahn
0433e1e15f
[VPlan] Add VPlan::getTrue/getFalse convenience helpers (NFC).
Makes it slightly more convenient to create true/false constants.
2025-08-04 21:04:55 +01:00
Florian Hahn
559d1dff89
[VPlan] Materialize BackedgeTakenCount using VPInstructions.
Explicitly compute the backedge-taken count using VPInstruction. This is
needed to model the full skeleton in VPlan.

NFC modulo some instruction re-ordering.
2025-08-03 12:21:28 +01:00
Florian Hahn
08f50e9665
[VPlan] Use vector tripcount if computable when simplifying conds. (#151034)
Update isConditionTrueViaVFAndUF to use the vector trip count if
computable. This is the case when it has been materialized to a
constant. Otherwise fall back to the trip count.

PR: https://github.com/llvm/llvm-project/pull/151034
2025-08-02 16:31:31 +01:00
Ramkumar Ramachandra
af0be76a35
[VPlan] Replace reverse RPOT with PO traversal (NFC) (#151757) 2025-08-02 08:46:27 +01:00
Mel Chen
86916ff0f0
[LV] Fix gap mask requirement for interleaved access (#151105)
When interleaved stores contain gaps, a mask is required to skip the
gaps, regardless of whether scalar epilogues are allowed.
This patch corrects the condition under which a gap mask is needed,
ensuring consistency between the legacy and VPlan-based cost models and
avoiding assertion failures.

Related #149981
2025-08-01 14:24:30 +08:00
Ramkumar Ramachandra
d07f48e4da
[VPlan] Use m_BinaryOr matcher for clarity (NFC) (#151541) 2025-08-01 06:56:27 +01:00
Luke Lau
7250b66240
[VPlan] Create AVL as a phi from TC -> 0 with EVL tail folding (#151481)
This implements the first half of #151459, by changing the AVL so it's
no longer computed as `trip-count - EVL-based IV`, but instead a
separate scalar phi that is decremented by EVL each iteration.

This shortens the dependency chain for computing the AVL and should
eventually allow us to convert the branch condition to `branch-count
avl-next, 0`.

`simplifyBranchConditionForVFAndUF` had to be updated to prevent a
regression because this introduces a VPPhi in the header block.
2025-08-01 11:00:05 +08:00
Ramkumar Ramachandra
b7d00b827e
[VPlan] Uniformly use VPlanPatternMatch in transforms (NFC) (#151488) 2025-07-31 12:01:40 +01:00
Mel Chen
6752415ce8
[VectorUtils] Simplify the code by new function InterleaveGroup::isFull. nfc (#151112) 2025-07-31 16:02:53 +08:00
Shih-Po Hung
cc8c941e17
[VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222)
Loop regions require fixed-length steps and rounded-up trip counts, but
after dissolution creates explicit control flow, EVL loops can leverage
variable-length stepping with original trip counts.

This patch adds a post-dissolution transform pass to convert EVL loops
from fixed-length to variable-length stepping .
2025-07-30 16:50:57 +08:00
Luke Lau
b663e563cc
[VPlan] Fix header masks in EVL tail folding (#150202)
With EVL tail folding, the EVL may not always be VF on the
second-to-last iteration.

Recipes that have been converted to VP intrinsics via optimizeMaskToEVL
account for this, but recipes that are left behind will still use the
old header mask which may end up having a different vector length.

This is effectively the same as #95368, and fixes this by converting
header masks from icmp ule wide-canonical-iv, backedge-trip-count ->
icmp ult step-vector, evl. Without it, recipes that fall through
optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and
#149981.

We really need to split off optimizeMaskToEVL into
VPlanTransforms::optimize and move transformRecipestoEVLRecipes into
tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for
correctness and what is needed to optimize away the mask computations.
We should be able to still generate a correct albeit suboptimal VPlan
without running optimizeMaskToEVL. I've added a TODO for this, which I
think we can do after #148274

Fixes #150197
2025-07-30 11:31:04 +08:00
Florian Hahn
c93d166c58
[VPlan] Simplify (MUL %x, 0) -> 0.
Simplify trivial multiplies.
https://alive2.llvm.org/ce/z/DabRkA
2025-07-28 21:50:57 +01:00
Florian Hahn
f8b1c7333f
[VPlan] Add getContext helper to VPlan (NFC). 2025-07-27 18:53:53 +01:00
Florian Hahn
89ae085859
[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735)
VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.

PR: https://github.com/llvm/llvm-project/pull/149735
2025-07-27 13:53:26 +01:00
Florian Hahn
d1f2a661f4
[VPlan] Pass debug location explicitly to VPBlendRecipe (NFC).
This enables creating VPBlendRecipes without underlying PHINode.
2025-07-27 09:12:26 +01:00
Florian Hahn
80c43b6c07
[VPlan] Add ExtractLane VPInst to extract across multiple parts. (#148817)
This patch adds a new ExtractLane VPInstruction which extracts across
multiple parts using a wide index, to be used in combination with
FirstActiveLane.

The patch updates early-exit codegen to use it instead ExtractElement,
which is only per-part. With this change, interleaving should work
correctly with early-exit loops.

The patch removes the restrictions added in 6f43754e9 (#145877), but
does not yet automatically select interleave counts > 1 for early-exit
loops.

I'll share a patch as follow-up. The cost of extracting a lane adds
non-trivial overhead in the exit block, so that should be considered
when picking the interleave count.

PR: https://github.com/llvm/llvm-project/pull/148817
2025-07-27 08:08:25 +01:00
Florian Hahn
fa3ec0c17c
[VPlan] Materialize constant vector trip counts before final opts. (#142309)
Materialize constant vector trip counts before ::execute, if the trip
count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now
this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle
block.

For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.

PR: https://github.com/llvm/llvm-project/pull/142309
2025-07-26 17:16:36 +01:00
Luke Lau
feb77c0fea
[VPlan] Handle VPWidenSelectRecipe in tryToFoldLiveIns (#150357)
This helps simplify VPBlendRecipes that are expanded to selects in
another patch.
2025-07-25 09:46:19 +08:00
Luke Lau
114d74e391
[VPlan] Expand VPBlendRecipes to select instructions. NFC (#133993)
When looking at some EVL tail folded code in SPEC CPU 2017 I noticed we
sometimes have both VPBlendRecipes and select VPInstructions in the same
plan:

    EMIT vp<%active.lane.mask> = active lane mask vp<%5>, vp<%3>
    EMIT vp<%7> = icmp ...
    EMIT vp<%8> = logical-and vp<%active.lane.mask>, vp<%7>
    BLEND ir<%8> = ir<%n.015> ir<%foo>/vp<%8>
    EMIT vp<%9> = select vp<%active.lane.mask>, ir<%8>, ir<%n.015>

Since a blend will ultimately generate a chain of selects, we could fold
the blend into the select:

    EMIT vp<%active.lane.mask> = active lane mask vp<%5>, vp<%3>
    EMIT vp<%7> = icmp ...
    EMIT vp<%8> = logical-and vp<%active.lane.mask>, vp<%7>
    EMIT ir<%8> = select vp<%8>, ir<%foo>, ir<%n.015>

So as a first step, this patch expands blends to a series of select
instructions, which may allow them to be simplified further with other
select instructions.
2025-07-23 20:09:33 +08:00
Luke Lau
6e723d2de8
[VPlan] Remove loop region in simplifyBranchConditionForVFAndUF with EVL PHI (#150016)
Previously we fell back to just simplifying the branch cond to true
since one of the phis was a VPEVLBasedIVPHIRecipe. However this should
be fine to replace with its start value.
2025-07-22 22:30:34 +08:00
Florian Hahn
3fd53db858
[VPlan] Remove unneeded VPVectorPointer after narrowing to replicate.
The replicate recipes created when narrowing interleave groups don't
need a VPVectorPointer, they can simply use the existing pointer.
2025-07-19 20:18:04 +01:00
Florian Hahn
89193640f4
[VPlan] Remove unused argument from canNarrowLoad (NFC).
The WideMember argument is unused, remove it.
2025-07-11 21:10:58 +01:00
Florian Hahn
c452de1715
Reapply "[VPlan] Allow derived IVs and scalar-steps in narrowing interleave."
This reverts commit f5ed863176dd286462cd5558723dfe445967fedf.

Recommit patch now that the crash exposed by the change has been fixed.
2025-07-10 20:48:19 +01:00
Florian Hahn
253f8b6873
[VPlan] Support single-scalar VPReplicateRecipes when narrowing IGs.
When narrowing interleave groups, we can treat single scalar
VPReplicateRecipes as already narrowed.
2025-07-09 21:30:44 +01:00
Luke Lau
2e5776130b
[VPlan] Simplify select !c, x, y -> select c, y, x (#147268)
This is split off from #133993

On its own this simplification isn't that useful, but it allows us to
make the equivalent VPBlendRecipe optimisation more generic by operating
on VPInstructions.

In order to actually test this without #133993, I've had to also extend
the m_Not pattern matcher to also catch VPWidenRecipes, since I couldn't
really think of a straightforward way to create a VPInstruction::Select
with a negated condition.
2025-07-08 15:56:04 +08:00
Florian Hahn
6a9a16da7a
[VPlan] Replace RdxDesc with RecurKind in VPReductionPHIRecipe (NFC). (#142322)
Replace RdxDesc with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construction a whole RecurrenceDescriptor object.

Depends on
https://github.com/llvm/llvm-project/pull/141860
https://github.com/llvm/llvm-project/pull/141932
https://github.com/llvm/llvm-project/pull/142290
https://github.com/llvm/llvm-project/pull/142291

PR: https://github.com/llvm/llvm-project/pull/142322
2025-07-06 21:40:42 +01:00
Florian Hahn
1f3f9874b0
[VPlan] Fix crash when narrowing interleave-groups with reuse.
If a wide load is used multiple times in an expression, it will be
narrowed the first time. Re-use the already narrowed op in that case to
fix crash.
2025-07-04 21:32:24 +01:00
Florian Hahn
6efa3dfb7b
[VPlan] Handle interleave groups with trivially narrow operands.
If all operands to an interleave group are already trivially narrow,
narrow the interleave group itself as well.
2025-07-03 21:02:10 +01:00
Luke Lau
61a0653cc6
[VPlan] Fix first-order splices without header mask not using EVL (#146672)
This fixes a buildbot failure with EVL tail folding after #144666:
https://lab.llvm.org/buildbot/#/builders/132/builds/1653

For a first-order recurrence to be correct with EVL tail folding we need
to convert splices to vp splices with the EVL operand.
Originally we did this by looking for users of the header mask and its
users, and converting it in createEVLRecipe.

However after #144666 a FOR splice might not actually use the header
mask if it's based off e.g. an induction variable, and so we wouldn't
pick it up in createEVLRecipe.

This fixes this by converting FOR splices separately in a loop over all
recipes in the plan, regardless of whether or not it uses the header
mask.

I think there was some conflation in createEVLRecipe between what was an
optimisation and what was needed for correctness. Most of the transforms
in it just exist to optimize the mask away and we should still emit
correct code without them. So I've renamed it to make the separation
clearer.
2025-07-03 16:55:00 +01:00
Luke Lau
ec25a0568c
[VPlan] Don't convert VPWidenSelectRecipes to vp.select in EVL transform (#146695)
createEVLRecipe tries to optimise recipes that use the header mask by
replacing them with their VP equivalents and setting the EVL, allowing
the mask to be removed.

However we currently also convert widened selects to vp.select even
though they don't necessarily use the header mask.

Unlike vp.merge a vp.select only makes the "unused" lanes past EVL
poison, so it's not needed for correctness.

In the same vein as #127180, this patch removes the transform for
VPWidenSelectRecipes and keeps them as plain select instructions to
allow for more optimisations.

RISCVVLOptimizer will still be able to optimise away any VL toggles and
we end up with better code generation across llvm-test-suite and SPEC
CPU 2017.
2025-07-03 11:50:25 +01:00
Mel Chen
bc8dad1c7e
[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864)
A reverse interleave access is essentially composed of multiple
load/store operations with same negative stride, and their addresses are
based on the last lane address of member 0 in the interleaved group.

Currently, we already have VPVectorEndPointerRecipe for computing the
last lane address of consecutive reverse memory accesses. This patch
extends VPVectorEndPointerRecipe to support constant stride and extracts
the reverse interleave group address adjustment from
VPInterleaveRecipe::execute, replacing it with a
VPVectorEndPointerRecipe.

The final goal is to support interleaved accesses with EVL tail folding.
Given that VPInterleaveRecipe is large and tightly coupled — combining
both load and store, and embedding operations like reverse pointer
adjustion (GEP), widen load/store, deinterleave/interleave, and reversal
— breaking it down into smaller, dedicated recipes may allow
VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware
form more effectively.

One foreseeable challenge is that
VPlanTransforms::convertToConcreteRecipes currently runs after
tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will
likely need to happen earlier in the pipeline to be effective.
2025-07-02 18:16:02 +08:00
Florian Hahn
6b3d2b629c
[VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. (#144281)
This patch adds a new recipe to combine multiple recipes into an
'expression' recipe, which should be considered as single entity for
cost-modeling and transforms. The recipe needs to be 'decomposed', i.e.
replaced by its individual recipes before execute.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to
include more types of bundled patterns, like e.g. extends folded into
loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to
concrete recipes, together with removing the need to record various
information. The current version of the patch still retains the original
printing matching VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe, but this specialized print could be
replaced with printing the bundled recipes directly.

PR: https://github.com/llvm/llvm-project/pull/144281
2025-07-01 20:44:50 +01:00