680 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
b4743b2641
[VPlan] Introduce VPlan::get(Zero|AllOnes) (NFC) (#184085) 2026-03-03 09:47:05 +00:00
Luke Lau
bcc272b322
[LV] Remove DataAndControlFlowWithoutRuntimeCheck. NFC (#183762)
After #144963 and #183292 we never emit the runtime check, so
DataAndControlFlowWithoutRuntimeCheck is equivalent to
DataAndControlFlow.

With that we only need to store one tail folding style instead of two,
because we don't need to distinguish whether or not the IV update
overflows (to a non-zero value)
2026-03-02 21:14:04 +08:00
Jan Patrick Lehr
60fec80bdc
Revert "[VPlan] Remove unused VPExpandSCEVRecipe before expansion" (#184108)
Reverts llvm/llvm-project#181329

Breaks: https://lab.llvm.org/buildbot/#/builders/123/builds/36163
Local revert fixes the issue seen in the buildbot.
2026-03-02 12:45:48 +00:00
Mel Chen
c62c00c524
[VPlan] Remove unused VPExpandSCEVRecipe before expansion (#181329)
VPExpandSCEVRecipe may become unused after VPlan optimizations. This
patch removes VPExpandSCEVRecipes with no users before expansion in
expandSCEVs, avoiding generating dead code during VPlan execution.
2026-03-02 09:04:59 +00:00
Florian Hahn
320220e48b
[VPlan] Support arbitrary predicated early exits. (#182396)
This removes the restriction requiring a single predicated early exit.
Using MaskedCond, we only combine early-exit conditions with block
masks from non-exiting control flow.

This means we have to ensure that we check the early exit conditions in
program order, to make sure we take the first exit in program order that
exits at the first lane for the combined exit condition.

To do so, sort the exits by their reverse post-order numbers.

Depends on https://github.com/llvm/llvm-project/pull/182395

PR: https://github.com/llvm/llvm-project/pull/182396
2026-03-01 16:07:05 +00:00
Florian Hahn
72525fb4ee
[VPlan] Materialize UF after unrolling (NFCI).
Move materialization of the symbolic UF directly to unrollByUF. At this
point, unrolling materializes the decision and it is natural to also
materialize the symbolic UF here.
2026-02-28 12:44:15 +00:00
Luke Lau
6f9c68d320
[VPlan] Don't adjust trip count for DataAndControlFlowWithoutRuntimeCheck (#183729)
Previously, the canonical IV increment may have overflowed to a non-zero
value due to vscale being a non power-of-two. So we used to emit a
runtime check for this.

If you didn't want the runtime check,
DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the
trip count so it wouldn't overflow.

However #144963 stopped the check from ever being emitted because vscale
is always a power-of-two on AArch64 and RISC-V, so it never overflowed
to a non-zero value. And in #183292 the code to emit the check was
removed. But we never restored the trip count back to normal when the
target's vscale was a power-of-two.

Now that vscale is always a power-of-two, this PR avoids adjusting it. A
follow up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
2026-02-28 04:01:58 +00:00
Florian Hahn
d7e037c838
Revert "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit 9c53215d213189d1f62e8f6ee7ba73a089ac2269.

Appears to cause crashes with ordered reductions, revert while I
investigate
2026-02-27 21:29:41 +00:00
Florian Hahn
9c53215d21
[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)
Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-02-27 16:49:54 +00:00
Luke Lau
c5c0fe663c
[VPlan] Remove non-power-of-2 scalable VF comment. NFC (#183719)
No longer holds after #183080
2026-02-27 10:45:17 +00:00
Sander de Smalen
a1f83ba1b6
[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860)
The reason for doing this in `transformToPartialReduction` is so that we
can create the VPExpressions directly when transforming reductions into
partial reductions (to be done in a follow-up PR).

I also intent to see if we can merge the in-loop reductions with partial
reductions, so that there will be no need for the separate
`convertToAbstractRecipes` VPlan Transform pass.
2026-02-27 08:38:13 +00:00
Ramkumar Ramachandra
bd5f9384d8
[VPlan] Extend interleave-group-narrowing to WidenCast (#183204)
WidenCast is very similar to Widen recipes.

Fixes #128062.
2026-02-26 14:50:25 +00:00
Florian Hahn
c5d6feb315
[VPlan] Limit interleave group narrowing to consecutive wide loads.
Tighten check in canNarrowLoad to require consecutive wide loads; we
cannot properly narrow gathers at the moment.

Fixe https://github.com/llvm/llvm-project/issues/183345.
2026-02-26 12:52:31 +00:00
Florian Hahn
32b8b9ba1e
[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)
Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead the vector trip count.

This allows removing the induction increment operand only used when
tail-folding.

PR: https://github.com/llvm/llvm-project/pull/182507
2026-02-26 11:48:04 +00:00
Benjamin Maxwell
3c566a698a
[LV] Fix miscompile with conditional scalar assignment + tail folding (#182492)
Previously, we could miscompile when vectorizing conditional scalar
assignments with forced tail folding, as the backedge select could be
based on the header mask, not the assignment conditional.

This resulted in a number of failures in the LLVM test suite when
building with `-O3 -march=armv8-a+sve -mllvm
-prefer-predicate-over-epilogue=predicate-dont-vectorize`.

The patch reworks `handleFindLastReductions()` to correctly handle tail
folding.
2026-02-26 09:00:16 +00:00
Florian Hahn
bf4705c05b
[VPlan] Supported conditionally executed single early exits. (#182395)
Add support for a single early exit that is executed conditionally. To
make sure the mask from any non-exiting control flow is combined with
the early exit condition.

To do so, introduce a MaskedCond VPInstruction, which is inserted as
user of the early-exit condition, at the point of the early-exit branch.
The VPInstruction will get masked automatically if needed by the
predicator, ensuring that we properly account for it when checking
whether the early exit has been taken.

Note that this does not allow for instructions that require predication
after the early exit. This requires additional work in progress:
https://github.com/llvm/llvm-project/pull/172454

As an alternative to MaskedCond, we could also predicate before handling
early exiting blocks: https://github.com/llvm/llvm-project/pull/181830

PR: https://github.com/llvm/llvm-project/pull/182395
2026-02-25 14:28:04 +00:00
Florian Hahn
804572136e
[VPlan] Allow recursive narrowing in interleave group narrowing. (#167310)
This allows canNarrowOps to recursively check if operands can be
narrowed, enabling narrowing of longer chains of operations that
feed interleave groups.

Depends on https://github.com/llvm/llvm-project/pull/167309.

PR: https://github.com/llvm/llvm-project/pull/167310
2026-02-24 21:30:00 +00:00
Benjamin Maxwell
3b5a05d0b2
Revert "[VPlan] Strengthen materializeFactors with assert (NFC) (#181665)" (#183014)
This PR did not solve the TODO as intended. Reverting so the TODO is not
lost.

This reverts commit aab9412a69a07787e9ec98b25709d709b7b537a6.
2026-02-24 18:03:01 +00:00
Ramkumar Ramachandra
e147b3a05e
[VPlan] Fix alias logic in canHoistOrSinkWithNoAliasCheck (#179504)
The correct way to check if two memory locations may alias is outlined
in ScopedNoAliasAAResult::alias: extract this into a helper, to fix the
current logic.
2026-02-24 16:14:44 +00:00
Florian Hahn
72c0a074db
[VPlan] Move out canNarrowOps (NFC). (#167309)
Move definition of canNarrowOps out to static function, to make it
easier to extend + generalize

PR: https://github.com/llvm/llvm-project/pull/167309
2026-02-24 14:20:47 +00:00
Florian Hahn
6b352aa8ea
Revert "[VPlan] Add simple driver option to run some individual transforms. (#178522)"
This reverts commit 3df1c6f88bfbbd76d9256c55358bb75e02e33779.

Causes build-failures without assertions
https://lab.llvm.org/buildbot/#/builders/159/builds/41683
2026-02-23 22:55:42 +00:00
Florian Hahn
3df1c6f88b
[VPlan] Add simple driver option to run some individual transforms. (#178522)
Add an alternative to test VPlan in more isolation via a new
`vplan-test-transform` option, which builds VPlan0 for each loop in the
input IR and then can invoke a set of transforms on it.

In order to allow different recipe types to be created, a new
widen-from-metadata transform is added, which transforms VPInstructions
to different recipes, based on custom !vplan.widen metadata. Currently
this supports creating widen & replicate recipes, but can easily be
extended in the future.

Currently the handling is intentionally bare-bones, to be extended
gradually as needed.

PR: https://github.com/llvm/llvm-project/pull/178522
2026-02-23 22:49:00 +00:00
Luke Lau
ff88b83fed
[VPlan] Handle extracts for middle blocks also used by early exiting blocks. NFC (#181789)
Currently createExtractsForLiveOuts only handles creating extracts when
the middle block has one predecessor, but if an early exit exits to the
same block as the latch then it might have multiple predecessors.

This handles the latter case to avoid the need to handle it in
VPlanTransforms::handleUncountableEarlyExits. Addresses the comment in
https://github.com/llvm/llvm-project/pull/174864#discussion_r2794153217
2026-02-23 04:03:49 +00:00
Florian Hahn
4ce4987381
[VPlan] Optimize FindLast of FindIV w/o sentinel. (#172569)
For FindLast reduction selecting an IV, we can avoid the horizontal
AnyOf in the vector loop, by introducing an independent  boolean
reduction to track if the condition was ever true in the loop. If it was
never true in the loop, we select the start value, otherwise the select
the min/max of the FindIV reduction, as required by the predicate.

The main advantage of this approach is that we have 2 independent
reductions, that do not require a horizontal AnyOf reduction in the
loop.

Currently this requires a non-wrapping IV, but this can be relaxed in
the future by selecting a canonical IV, which is then transformed to the
specific derived IV for the reduction after the loop.

Depends on https://github.com/llvm/llvm-project/pull/177870.

PR: https://github.com/llvm/llvm-project/pull/172569
2026-02-20 21:48:35 +00:00
Sander de Smalen
46bfd69343
[LV] NFCI: Add RecurKind to VPPartialReductionChain (#181705)
This avoids having to pass around the RecurKind or re-figure it out from
the VPReductionPHI node.

This is useful in a follow-up PR, where we need to distinguish between a
`Sub` and `AddWithSub` recurrence, which can't be deduced from the
`ReductionBinOp` field.
2026-02-19 13:35:10 +00:00
Luke Lau
6a5375fbce
[VPlan] Plumb recurrence FMFs through VPReductionPHIRecipe via VPIRFlags. NFC (#181694)
In order to be able to create selects for reduction phis through tail
folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an
instance of VPIRFlags and plumb the FMFs from the original RdxDesc.

This allows us to remove more uses of the RecurrenceDescriptor in
addReductionResultComputation, which should help untie it from
LoopVectorizationLegality.
2026-02-19 11:23:47 +00:00
Benjamin Maxwell
867272d52a
[LV] Pass symbolic VF to CalculateTripCountMinusVF and CanonicalIVIncrementForPart (NFC) (#180542)
This makes it easier to update the runtime VF per VPlan.
2026-02-18 08:58:47 +00:00
Shih-Po Hung
97fa3e5936
[NFC][VPlan] Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe (#177114)
This is groundwork for #151300, which aims to support first-faulting
loads in non-tail-folded early-exit loops.
Per #175900, we need a variable-length stepping transform that can
shared between EVL and non-EVL loops.
The idea is to have an EVL-independent counter and transform for
tracking the cumulative number of processed elements.

This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and
transform (canonicalizeEVLLoops) to be EVL-independent:
- Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to
  reflect its general purpose of tracking processed element count.
- Rename canonicalizeEVLLoops to convertToVariableLengthStep.

This is NFC.
2026-02-18 07:04:58 +00:00
Ramkumar Ramachandra
aab9412a69
[VPlan] Strengthen materializeFactors with assert (NFC) (#181665)
This fixes a TODO.
2026-02-17 16:18:22 +00:00
Luke Lau
09a0615686
[VPlan] Simplify worklist in reassociateHeaderMask. NFC (#181595)
Addresses review comments from
https://github.com/llvm/llvm-project/pull/180898#pullrequestreview-3791945590.
We don't need to recursively collect direct users of the header mask, we
can do that as a separate step so that the main worklist loop only
handles potentially reassociatable candidates.

Also add back mention of tail folding to comment and a TODO.
2026-02-17 14:36:25 +00:00
Florian Hahn
cdaeecabf7
[VPlan] Only remove backedge if IV is still incremented by VFxUF.
After 6f253e87dd, VFxUF may have been replaced by UF, in which case the
simplification is no longer correct. Tighten check to make sure the
increment is still what we expect.

Fixes a miscompile in the added test case.
2026-02-17 11:40:32 +00:00
Ramkumar Ramachandra
2b7c1f9d82
[VPlan] Directly unroll VectorEndPointerRecipe (#172372)
Directly unroll VectorEndPointerRecipe following 0636225b ([VPlan]
Directly unroll VectorPointerRecipe, #168886). It allows us to leverage
existing VPlan simplifications to optimize.

Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Florian Hahn <flo@fhahn.com>
2026-02-16 09:59:55 +00:00
Ramkumar Ramachandra
edfe43cc9e
[VPlan] Factor common VPDT-sort in sink-replicate (NFC) (#179214) 2026-02-16 07:24:55 +00:00
Florian Hahn
6f253e87dd
Reapply "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

The underlying issue causing the revert has been fixed independently
as 301fa24671256734df6b7ee65f23ad885400108e.

Original message:
Move narrowInterleaveGroups to to general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2026-02-15 20:10:10 +00:00
Florian Hahn
f3a816598d
[VPlan] Add VPSymbolicValue for UF. (NFC)
Add a symbolic unroll factor (UF) to VPlan similar to VF & VFxUF that
gets replaced with the concrete UF during plan execution, similar to how VF
is used for the vectorization factor. This is a preparatory change that
allows transforms to use the symbolic UF before the concrete UF is
determined.

Note that the old getUF that returns the concrete UF after unrolling has
been renamed to getConcreteUF.

Split off from the re-commit of 8d29d093096
(https://github.com/llvm/llvm-project/pull/149706) as suggested.
2026-02-15 15:24:35 +00:00
Florian Hahn
f26e8595c3
[VPlan] Use VPlan::getConstantInt in a few more cases (NFC).
VPlan::getConstantInt() allows for slightly more compact creation of
VPIRValues wrapping ConstantInts.
2026-02-15 14:45:33 +00:00
Brian Cain
02429c4633
[LV] Fix strict weak ordering violation in handleUncountableEarlyExits sort (#181462)
The sort comparator used VPDT.dominates() which returns true for
dominates(A, A), violating the irreflexivity requirement of strict weak
ordering. With _GLIBCXX_DEBUG enabled (LLVM_ENABLE_EXPENSIVE_CHECKS=ON),
std::sort validates this property and aborts:

Error: comparison doesn't meet irreflexive requirements, assert(!(a <
a)).

Use properlyDominates() instead, which correctly returns false for equal
inputs while preserving the intended dominance-based ordering.

This fixes a crash introduced by ede1a9626b89 ("[LV] Vectorize early
exit loops with multiple exits.").
2026-02-14 23:05:39 -06:00
Florian Hahn
ede1a9626b
[LV] Vectorize early exit loops with multiple exits. (#174864)
Building on top of the recent changes to introduce BranchOnTwoConds,
this patch adds support for vectorizing loops with multiple early exits,
all dominating a countable latch. The early exits must form a
dominance chain, so we can simply check which early exit has been taken
in dominance order.

Currently LoopVectorizationLegality ensures that all exits other than
the latch must be uncountable. handleUncountableEarlyExits now collects
those uncountable exits and processes each exit.

In the vector region, we compute if any exit has been taken, by taking
the OR of all early exit conditions (EarlyExitConds) and checking if
there's
any active lane.

If the early exit is taken, we exit the loop and compute which early
exit
has been taken. The first taken early exit is the one where its exit
condition is true in the first active lane of EarlyExitConds.

We create a chain of dispatch blocks outside the loop to check this for
the early exit blocks ordered by dominance.

Depends on https://github.com/llvm/llvm-project/pull/174016.

PR: https://github.com/llvm/llvm-project/pull/174864
2026-02-13 16:44:23 +00:00
Ramkumar Ramachandra
ec0b22ff47
[VPlan] Reuse introduces-broadcast logic in narrowToSingleScalars (#174444)
narrowToSingleScalarRecipes' operands check is a bit too restrictive by
permitting a single user. Factor out and reuse the existing
introduces-broadcast logic to improve results.
2026-02-13 15:56:57 +00:00
Florian Hahn
a55fbab0cf
[VPlan] Run initial recipe simplification on VPlan0. (#176828)
In some cases, LV gets simplifyable IR as input. Directly apply
simplifications on the initial VPlan0 to avoid vectorization in cases
where the loop body can be folded away.

Using the end-to-end pipeline, this is relatively rare, but when
reducing test cases, the reduction often ends up with cases with trivial
folds. Rejecting those will result in more robust & realistic test
cases.

As follow-up, I also plan to add initial dead recipe removal.

Depends on https://github.com/llvm/llvm-project/pull/176795.

PR: https://github.com/llvm/llvm-project/pull/176828
2026-02-13 12:01:22 +00:00
Florian Hahn
ef85b0c454
[VPlan] Check scalar VF in removeRedundantCanonicalIVs.
When the plan has only a scalar VF, we never generate vectors for IVs,
so we can always perform the replacement.
2026-02-12 23:02:51 +00:00
Luke Lau
3482a9c6cb
[VPlan] Explicitly reassociate header mask in logical and (#180898)
We reassociate ((x && y) && z) -> (x && (y && z)) if x has more than
use, in order to allow simplifying the header mask further. However this
is somewhat unreliable as there are times when it doesn't have more than
one use, e.g. see the case we run into in
https://github.com/llvm/llvm-project/pull/173265/changes#r2769759907.

This moves it into a separate transformation that always reassociates
the header mask regardless of the number of uses, which prevents some
fragile test changes in #173265.

We need to run it before both calls to simplifyRecipes in optimize. I
considered putting it in simplifyRecipes itself but simplifyRecipes is
also called after unrolling and when the loop region is dissolved which
causes vputils::findHeaderMask to assert.

There isn't really any benefit to reassociating masks that aren't the
header mask so the existing simplification was removed.
2026-02-12 14:56:15 +00:00
Ramkumar Ramachandra
2223b931c5
[VPlan] Introduce m_c_Logical(And|Or) (#180048) 2026-02-12 13:14:08 +00:00
Florian Hahn
61521a94af
[VPlan] Ensure countable region in narrowInterleaveGroups.
This tightens the legality checks. Currently should not have any impact,
but is needed to avoid mis-compiles in follow-up changes.
2026-02-10 21:24:54 +00:00
Florian Hahn
a1fc5b4a48
[VPlan] Reject partial reductions with invalid costs in getScaledReds. (#180438)
Check if costs for partial reductions are valid up-front in
getScaledReductions instead when transforming each link in the chain in
transformToPartialReduction. This ensures that we either transform all
entries in the chain together, or none via the existing invalidation
logic.

This fixes a crash when a link in the chain would have invalid cost, as
in the added test cases.

Fixes https://github.com/llvm/llvm-project/issues/180340.

PR: https://github.com/llvm/llvm-project/pull/180438
2026-02-10 21:16:21 +00:00
Sander de Smalen
3157758190
[LV] Handle partial sub-reductions with sub in middle block. (#178919)
Sub-reductions can be implemented in two ways:
(1) negate the operand in the vector loop (the default way).
(2) subtract the reduced value from the init value in the middle block.

Note that both ways keep the reduction itself as an 'add' reduction,
which is necessary because only llvm.vector.partial.reduce.add exists.

The ISD nodes for partial reductions don't support folding the
sub/negation into its operands because the following is not a valid
transformation:
```
     sub(0, mul(ext(a), ext(b)))
  -> mul(ext(a), ext(sub(0, b)))
```
It can therefore be better to choose option (2) such that the partial
reduction is always positive (starting at '0') and to do a final
subtract in the middle block.

For AArch64 there are no dot-product instructions that can
do a `partial.reduce.sub(acc, mul(ext(a), ext(b)))` operation.
I'm not sure if such instructions exist for other targets.
(If so then we may want to make this decision a target option)

This PR also increases the AArch64 cost of a partial sub-reduction
when this exists in an 'add-sub' reduction chain.

Fixes https://github.com/llvm/llvm-project/issues/178703
2026-02-10 11:00:32 +00:00
Mel Chen
7e5d9189d2
[VPlan] Simplify true && x -> x (#179426) 2026-02-10 08:49:03 +00:00
Florian Hahn
d1ec04dfd4
[VPlan] Simplify single-entry VPWidenPHIRecipe.
Include VPWidenPHIRecipe in phi simplification if there's a single
incoming value.
2026-02-09 22:10:13 +00:00
Luke Lau
8cd86ff284
[VPlan] Propagate FastMathFlags from phis to blends (#180226)
If a phi has fast math flags, we can propagate it to the widened select.
To do this, this patch makes VPPhi and VPBlendRecipe subclasses of
VPRecipeWithIRFlags, and propagates it through PlainCFGBuilder and
VPPredicator.

Alive2 proofs for some of the FMFs (it looks like it can't reason about
the full "fast" set yet)
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T

The actual motivation for this to eventually be able to move the special
casing for tail folding in
LoopVectorizationPlanner::addReductionResultComputation into the CFG in
#176143, which requires passing through FMFs.
2026-02-09 19:38:58 +08:00
Florian Hahn
7509cad693
[VPlan] Support masked VPInsts, use for predication (NFC) (#142285)
Add support for mask operands to most VPInstructions, using
getNumOperandsForOpcode.

This allows VPlan predication to predicate VPInstructions directly. The
mask will then be dropped or handled when creating wide recipes.

Depends on https://github.com/llvm/llvm-project/pull/142284.
Depends on https://github.com/llvm/llvm-project/pull/168784.

PR: https://github.com/llvm/llvm-project/pull/142285
2026-02-08 18:23:36 +00:00