701 Commits

Author SHA1 Message Date
Luke Lau
bf46a95f2c
[VPlan] Use target's index type for {First,Last}ActiveLane instead of i64 (#186361)
Fixes #186005

On RV32 with zve32x, i.e. no legal 64 bit types either scalar or vector,
@llvm.cttz.elts.i64 cannot be lowered and so returns an illegal cost for
scalable VFs. However VPInstruction::FirstActiveLane and
VPInstruction::LastActiveLane always use a hardcoded i64 type.

This causes a legacy/VPlan cost model mismatch in the live-out.ll test,
and in early-exit-live-out.ll prevents the scalable VF from being
chosen.

This PR teaches the two VPInstructions to use the target's index type,
i.e. the width of a pointer in the default address space, so it will
generate a 32-bit cttz.elts on RV32. This should be large enough to hold
the maximum number of elements in a vector: a vector any bigger than that
could not be addressed in memory.
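As a rough illustration only (not LLVM code; the helper names and the all-inactive behavior are assumptions), the two lane queries can be modeled in scalar C++ with the result type left as a template parameter that stands in for the target's index type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a cttz.elts-style "first active lane" query.
// IndexT stands in for the target's index type (e.g. std::uint32_t on RV32,
// std::uint64_t on a 64-bit target). If no lane is active, the number of
// lanes is returned.
template <typename IndexT>
IndexT firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return static_cast<IndexT>(I);
  return static_cast<IndexT>(Mask.size());
}

// Hypothetical model of "last active lane": the index of the highest active
// lane. This sketch simply requires at least one active lane.
template <typename IndexT>
IndexT lastActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = Mask.size(); I-- > 0;)
    if (Mask[I])
      return static_cast<IndexT>(I);
  assert(false && "expected at least one active lane");
  return static_cast<IndexT>(Mask.size());
}
```

With `IndexT = std::uint32_t` the result can represent lane indices and counts up to 2^32 - 1, which is the bound the message relies on for RV32.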

I considered using the canonical IV type but I don't think that will
work since the canonical IV can be i64 on RV32, and it causes
regressions due to extra zexting on 64-bit targets with a 32-bit IV.
2026-03-18 15:01:21 +00:00
Elvis Wang
3eb8b788b7
Revert "[LV] Replace remaining LogicalAnd to vp.merge in EVL optimization." (#187170)
Reverts llvm/llvm-project#184068

This hit the cost model assertion in rva23 stage2 build bot.
https://lab.llvm.org/buildbot/#/builders/213/builds/2497
2026-03-18 09:21:40 +08:00
Elvis Wang
52089f895e
[LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068)
This patch replaces the remaining LogicalAnd with vp.merge in the second
pass, so as not to break the `m_RemoveMask` pattern in optimizeMaskToEVL.

This can help to remove header mask for FindLast reduction (CSA) loops.

PR: https://github.com/llvm/llvm-project/pull/184068
2026-03-18 08:39:27 +08:00
Ramkumar Ramachandra
56d7920c09
[VPlan] Factor collectGroupedReplicateMemOps (NFC) (#186820)
Factor out a collectGroupedReplicateMemOps from
collectComplementaryPredicatedMemOps, so it can be re-used in other
places.
2026-03-17 09:15:46 +00:00
Elvis Wang
51b3b9b039
[LV] Optimize x && (x && y) -> x && y (#185806)
This patch simplifies `x && (x && y)` and `x && (y && x)` to `x && y` by removing the redundant logical-and.
This helps to simplify mask calculation in the FindLast reduction and
exposes more opportunities to convert to EVL.
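The rewrite itself is a small peephole. A toy version over a hypothetical expression type (not VPlan's pattern-matching API; all names here are illustrative) might look like this:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature IR (not VPlan): a leaf variable or a logical-and.
struct Expr;
using ExprRef = std::shared_ptr<Expr>;
struct Expr {
  std::string Name; // set for leaves
  ExprRef LHS, RHS; // both set for logical-and nodes
  bool isAnd() const { return LHS && RHS; }
};

ExprRef leaf(const std::string &Name) {
  return std::make_shared<Expr>(Expr{Name, nullptr, nullptr});
}

ExprRef logicalAnd(ExprRef A, ExprRef B) {
  return std::make_shared<Expr>(Expr{"", std::move(A), std::move(B)});
}

// Rewrites x && (x && y) and x && (y && x) to x && y, comparing operands by
// pointer identity. x is kept as the first operand so that, with select-style
// logical-and, a poison y is still masked wherever x is false.
ExprRef simplifyAnd(const ExprRef &E) {
  if (!E->isAnd() || !E->RHS->isAnd())
    return E;
  const ExprRef &X = E->LHS;
  const ExprRef &Inner = E->RHS;
  if (Inner->LHS == X) // x && (x && y): the inner and already is x && y
    return Inner;
  if (Inner->RHS == X) // x && (y && x): rebuild as x && y
    return logicalAnd(X, Inner->LHS);
  return E;
}
```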

PR link: https://github.com/llvm/llvm-project/pull/185806
2026-03-17 13:03:04 +08:00
Ramkumar Ramachandra
92e44b247f
Reland [VPlan] Extend interleave-group-narrowing to WidenCast (#186454)
The patch was initially landed as bd5f9384, but then reverted due to an
underlying issue in narrowInterleaveGroups, described in #185860. The
issue has since been fixed. The reland is simply a conflict-resolved
version of the original patch, which includes an additional test update.

WidenCast is very similar to Widen recipes.

Fixes #128062.
2026-03-16 12:21:48 +00:00
Ramkumar Ramachandra
616bf5abd1
[VPlan] Introduce VPlan::getDataLayout (NFC) (#186418)
2026-03-13 16:17:04 +00:00
Florian Hahn
cbb8e08192
[VPlan] Don't narrow wide loads for scalable VFs when narrowing IGs. (#186181)
For scalable VFs, the narrowed plan processes vscale iterations at once,
so a shared wide load cannot be narrowed to a uniform scalar; bail out,
as there is currently no way to create a narrowed load that loads
vscale elements.

Fixes https://github.com/llvm/llvm-project/issues/185860.

PR: https://github.com/llvm/llvm-project/pull/186181
2026-03-13 16:04:42 +00:00
Florian Hahn
579aca8755
[VPlan] Prevent uses of materialized VPSymbolicValues. (NFC) (#182318)
After VPSymbolicValues (like VF and VFxUF) are materialized via
replaceAllUsesWith, they should not be accessed again. This patch:

1. Tracks materialization state in VPSymbolicValue.

2. Asserts if the materialized VPValue is used again. Currently it
   adds asserts to various member functions, preventing calling them
   on materialized symbolic values.

Note that this still allows some uses (e.g. comparing VPSymbolicValue
references or pointers), but this should be relatively harmless given
that it is impossible to (re-)add any users. If we want to further
tighten the checks, we could add asserts to the accessors or override
operator&, but that would require more changes and would not add much
extra protection, I think.
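The idea of tracking materialization and asserting on later use can be sketched as follows. This is a hypothetical, heavily simplified model, not LLVM's VPSymbolicValue; the class and member names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class SymbolicValue;

// Hypothetical user holding one operand slot (not a VPlan class).
struct User {
  SymbolicValue *Operand = nullptr; // symbolic operand, if any
  long Concrete = 0;                // value after materialization
  bool IsConcrete = false;
};

// Hypothetical symbolic value, standing in for something like VF or VFxUF.
class SymbolicValue {
  std::vector<User *> Users;
  bool Materialized = false;

public:
  void addUser(User &U) {
    assert(!Materialized && "materialized value must not gain new users");
    U.Operand = this;
    Users.push_back(&U);
  }

  std::size_t getNumUsers() const {
    assert(!Materialized && "querying a materialized value");
    return Users.size();
  }

  // Deliberately unchecked, similar to how comparisons stay allowed.
  bool isMaterialized() const { return Materialized; }

  // Replace every use with a concrete value and record the materialization;
  // afterwards, addUser, getNumUsers and replaceAllUsesWith all assert.
  void replaceAllUsesWith(long C) {
    assert(!Materialized && "value already materialized");
    for (User *U : Users) {
      U->Operand = nullptr;
      U->Concrete = C;
      U->IsConcrete = true;
    }
    Users.clear();
    Materialized = true;
  }
};
```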

Depends on https://github.com/llvm/llvm-project/pull/182146 to fix a
current access violation.

PR: https://github.com/llvm/llvm-project/pull/182318
2026-03-13 14:39:46 +00:00
Ramkumar Ramachandra
540ea54ad7
Revert "[VPlan] Extend interleave-group-narrowing to WidenCast" (#186072)
This reverts commit bd5f9384 (#183204) to buy us time to investigate an
AArch64 SVE-fixed-length buildbot miscompile.

Ref: https://lab.llvm.org/buildbot/#/builders/143/builds/14601
2026-03-12 11:37:09 +00:00
Benjamin Maxwell
430e2b7b79
[LV] Simplify the chain traversal in getScaledReductions() (NFCI) (#184830)
I found the logic of this function quite hard to reason about. This
patch attempts to rectify this by separating the matching of an extended
reduction operand from the traversal of the reduction chain.

- `matchExtendedReductionOperand()` contains all the logic to match an
  extended operand.
- `getScaledReductions()` validates each operation in the chain,
  starting backwards from the exit value, walking up through the operand
  that is not extended.
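The split described in the two bullets can be pictured with a toy chain walker. Everything here (`Node`, `matchExtendedOperand`, `countScaledLinks`) is hypothetical and far simpler than the real `matchExtendedReductionOperand()` and `getScaledReductions()`; it only shows the shape of the traversal: match the extended operand of each link, then continue backwards through the other operand until the reduction phi is reached.

```cpp
#include <cassert>

// Hypothetical reduction-chain node: a phi, an add link, an extend, or
// anything else.
struct Node {
  enum Kind { Phi, Add, Ext, Other };
  Kind K;
  const Node *Ops[2] = {nullptr, nullptr};
};

// Stand-in for matching an extended reduction operand.
bool matchExtendedOperand(const Node *N) { return N && N->K == Node::Ext; }

// Walk backwards from the exit value to the phi, always continuing through
// the operand that is not extended. Returns the number of links, or -1 if
// some link does not match.
int countScaledLinks(const Node *Exit, const Node *PhiNode) {
  int Links = 0;
  const Node *Cur = Exit;
  while (Cur != PhiNode) {
    if (!Cur || Cur->K != Node::Add)
      return -1;
    const Node *Next;
    if (matchExtendedOperand(Cur->Ops[0]))
      Next = Cur->Ops[1];
    else if (matchExtendedOperand(Cur->Ops[1]))
      Next = Cur->Ops[0];
    else
      return -1;
    ++Links;
    Cur = Next;
  }
  return Links;
}
```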
2026-03-11 06:39:20 +00:00
Florian Hahn
c79a058a6a
[VPlan] Materialize VectorTripCount in narrowInterleaveGroups. (#182146)
When narrowInterleaveGroups transforms a plan, VF and VFxUF are
materialized (replaced with concrete values). This patch also
materializes the VectorTripCount in the same transform.

This ensures that VectorTripCount is properly computed when the narrow
interleave transform is applied, instead of using the original VF and UF
to compute the vector trip count. The previous behavior generated
correct code, but executed fewer iterations in the vector loop.
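The message does not spell out the formula. Assuming the usual non-tail-folded case, where the vector trip count is the trip count rounded down to a multiple of the number of scalar iterations handled per vector iteration, a short sketch shows why a stale step computed from the original VF * UF leaves the vector loop with fewer iterations. The step values in the note are illustrative assumptions, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Trip count rounded down to a multiple of Step, where Step is the number of
// original scalar iterations handled per vector-loop iteration. Assumes no
// tail folding and no required scalar epilogue.
std::uint64_t vectorTripCount(std::uint64_t TC, std::uint64_t Step) {
  assert(Step != 0 && "step must be non-zero");
  return TC - TC % Step;
}

// Number of vector-loop iterations actually executed for a given step.
std::uint64_t vectorLoopIterations(std::uint64_t VTC, std::uint64_t Step) {
  assert(Step != 0 && "step must be non-zero");
  return VTC / Step;
}
```

For TC = 10, a stale step of 4 gives `vectorTripCount(10, 4) == 8`, so a narrowed loop stepping by 1 runs only 8 times and leaves 2 iterations to the scalar loop; recomputing with the narrowed step gives `vectorTripCount(10, 1) == 10`.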

The change also enables stricter verification as a follow-up, to prevent
accesses of UF, VF, VFxUF etc. after materialization.

Note that in some cases we now miss branch folding, but that should be
addressed separately, https://github.com/llvm/llvm-project/pull/181252

Fixes one of the violations that access VectorTripCount after UF and VF
have been materialized.

PR: https://github.com/llvm/llvm-project/pull/182146
2026-03-10 12:33:30 +00:00
Sander de Smalen
0da00c325b
[LV] Support float and pointer FindLast reductions (#184101)
This duplicates #182313 with some very small modifications on top, as
@dheaton-arm is unable to finish the PR and I'm unable to push to his
branch.

Expands support for the `FindLast` Recurrence Kind to floating-point and
pointer types, thereby enabling conditional scalar assignment (CSA) for
these types.
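Conditional scalar assignment is the scalar pattern that a `FindLast` reduction vectorizes. The generic reference version below is a sketch (`findLast` is a hypothetical name, not an LLVM API) showing the pattern for floats and pointers, the two types this commit adds:

```cpp
#include <cassert>
#include <cstddef>

// Reference semantics of conditional scalar assignment: return the last
// element satisfying Cond, or Init if no element does.
template <typename T, typename Pred>
T findLast(const T *A, std::size_t N, T Init, Pred Cond) {
  T Last = Init;
  for (std::size_t I = 0; I < N; ++I)
    if (Cond(A[I]))
      Last = A[I]; // the conditional assignment inside the loop
  return Last;
}
```

Because the result is just the selected element, no arithmetic on `T` is involved, which is why the same shape works for floating-point and pointer types.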

Originally authored by @dheaton-arm

---------

Co-authored-by: Damian Heaton <Damian.Heaton@arm.com>
2026-03-09 10:27:06 +00:00
Aiden Grossman
e30f9c1946
Revert "Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)""
This reverts commit 6aa115bba55054b0dc81ebfc049e8c7a29e614b2.

This is causing crashes. See #185345 for details.
2026-03-09 04:24:01 +00:00
Florian Hahn
2207296d3f
[VPlan] Fold constant trunc after EVL simplification.
This fixes a crash for the new test after
6aa115bba55054b0dc81ebfc049e8c7a29e614b2.
2026-03-08 19:31:20 +00:00
Florian Hahn
6aa115bba5
Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit d7e037c8383e66e5c07897f144f6d8ef47258682.

Recommit with a small fix to properly handle ordered reductions when
connecting the epilogue.

Original message:

Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-03-08 11:13:40 +00:00
Florian Hahn
2ce5f91425
[VPlan] Optimize resume values of IVs together with other exit values. (#174239)
Remove updateScalarResumePhis and create extracts for live-outs early in
addInitialSkeleton. Instead of extracting them from the header phi
recipes for the resume values (which is incorrect), extract the last
lane of the backedge value.

Then update optimizeInductionExitUsers to optimize both the scalar
resume values for IVs and IV exit values together.

This removes the need to pass state between transforms and addresses a
TODO.

PR: https://github.com/llvm/llvm-project/pull/174239
2026-03-06 17:05:53 +00:00
Florian Hahn
d316fb0797
[VPlan] Replicate VPScalarIVStepsRecipe by VF outside replicate regions. (#170053)
Extend replicateByVF to also handle VPScalarIVStepsRecipe. To do so, the
patch adds a new lane operand to VPScalarIVStepsRecipe, which is only
added when replicating. This enables removing a number of lane 0
computations. The lane operand will also be used to explicitly replicate
replicate regions in a follow-up.
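Conceptually, each replicated copy computes one lane's scalar IV value from the base IV, the step, the unroll part and the new lane operand. A minimal model, assuming the usual `Base + (Part * VF + Lane) * Step` formula (the helper names are hypothetical, not VPlan APIs):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar IV value for one lane of one unrolled part.
std::int64_t scalarIVStep(std::int64_t Base, std::int64_t Step, unsigned VF,
                          unsigned Part, unsigned Lane) {
  return Base + (static_cast<std::int64_t>(Part) * VF + Lane) * Step;
}

// Replicating by VF: one scalar value per lane, each computed with its own
// explicit lane operand.
std::vector<std::int64_t> replicateByVF(std::int64_t Base, std::int64_t Step,
                                        unsigned VF, unsigned Part) {
  std::vector<std::int64_t> Lanes;
  for (unsigned L = 0; L < VF; ++L)
    Lanes.push_back(scalarIVStep(Base, Step, VF, Part, L));
  return Lanes;
}
```

For example, with base 10, step 3, VF 4 and part 1, the lanes cover original iterations 4 to 7 and produce 22, 25, 28 and 31.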

Depends on https://github.com/llvm/llvm-project/pull/169796
Depends on https://github.com/llvm/llvm-project/pull/170906

PR: https://github.com/llvm/llvm-project/pull/170053
2026-03-05 12:42:20 +00:00
Ramkumar Ramachandra
ca0d100e79
[VPlan] Use VPlan::getZero to improve code (NFC) (#184591)
2026-03-04 21:21:35 +00:00
Florian Hahn
c370f5af6c
[VPlan] Preserve IsSingleScalar for hoisted predicated load. (#184453)
The predicated loads may be single scalar (e.g. for VF = 1). We should
preserve IsSingleScalar when hoisting them. As all loads access the same
address, IsSingleScalar must match across all loads in the group.

This fixes an assertion when interleaving-only with hoisted loads.

Fixes https://github.com/llvm/llvm-project/issues/184372

PR: https://github.com/llvm/llvm-project/pull/184453
2026-03-04 14:32:00 +00:00
Florian Hahn
bbde3e3b59
[VPlan] Preserve IsSingleScalar for sunken predicated stores. (#184329)
The predicated stores may be single scalar (e.g. for VF = 1). We should
preserve IsSingleScalar. As all stores access the same address,
IsSingleScalar must match across all stores in the group.

This fixes an assertion when interleaving-only with sunken stores.

Fixes https://github.com/llvm/llvm-project/issues/184317

PR: https://github.com/llvm/llvm-project/pull/184329
2026-03-03 14:08:00 +00:00
Ramkumar Ramachandra
b4743b2641
[VPlan] Introduce VPlan::get(Zero|AllOnes) (NFC) (#184085)
2026-03-03 09:47:05 +00:00
Luke Lau
bcc272b322
[LV] Remove DataAndControlFlowWithoutRuntimeCheck. NFC (#183762)
After #144963 and #183292 we never emit the runtime check, so
DataAndControlFlowWithoutRuntimeCheck is equivalent to
DataAndControlFlow.

With that we only need to store one tail folding style instead of two,
because we don't need to distinguish whether or not the IV update
overflows (to a non-zero value).
2026-03-02 21:14:04 +08:00
Jan Patrick Lehr
60fec80bdc
Revert "[VPlan] Remove unused VPExpandSCEVRecipe before expansion" (#184108)
Reverts llvm/llvm-project#181329

Breaks: https://lab.llvm.org/buildbot/#/builders/123/builds/36163
Local revert fixes the issue seen in the buildbot.
2026-03-02 12:45:48 +00:00
Mel Chen
c62c00c524
[VPlan] Remove unused VPExpandSCEVRecipe before expansion (#181329)
VPExpandSCEVRecipe may become unused after VPlan optimizations. This
patch removes VPExpandSCEVRecipes with no users before expansion in
expandSCEVs, avoiding generating dead code during VPlan execution.
2026-03-02 09:04:59 +00:00
Florian Hahn
320220e48b
[VPlan] Support arbitrary predicated early exits. (#182396)
This removes the restriction requiring a single predicated early exit.
Using MaskedCond, we only combine early-exit conditions with block
masks from non-exiting control flow.

This means we have to ensure that we check the early exit conditions in
program order, to make sure we take the first exit in program order that
exits at the first lane for the combined exit condition.

To do so, sort the exits by their reverse post-order numbers.
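A scalar model of the intended semantics (hypothetical types and names, not VPlan code): each exit's condition is combined with the mask of the block containing it, as MaskedCond does, and exits are visited in reverse post-order, so that at the first lane where any exit fires, the earliest exit in program order wins:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical early exit: its reverse post-order number, the lanes where its
// condition holds, and the lanes that actually reach the exiting block.
struct EarlyExit {
  unsigned RPONumber;
  std::vector<bool> Cond;
  std::vector<bool> BlockMask;
};

// Returns {RPO number of the taken exit, lane}, or nullopt if no exit fires.
std::optional<std::pair<unsigned, std::size_t>>
findTakenExit(std::vector<EarlyExit> Exits, std::size_t NumLanes) {
  std::sort(Exits.begin(), Exits.end(),
            [](const EarlyExit &A, const EarlyExit &B) {
              return A.RPONumber < B.RPONumber;
            });
  for (std::size_t L = 0; L < NumLanes; ++L)  // lanes in iteration order
    for (const EarlyExit &E : Exits)          // exits in program order
      if (E.BlockMask[L] && E.Cond[L])        // masked exit condition
        return std::make_pair(E.RPONumber, L);
  return std::nullopt;
}
```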

Depends on https://github.com/llvm/llvm-project/pull/182395

PR: https://github.com/llvm/llvm-project/pull/182396
2026-03-01 16:07:05 +00:00
Florian Hahn
72525fb4ee
[VPlan] Materialize UF after unrolling (NFCI).
Move materialization of the symbolic UF directly to unrollByUF. At this
point, unrolling materializes the decision and it is natural to also
materialize the symbolic UF here.
2026-02-28 12:44:15 +00:00
Luke Lau
6f9c68d320
[VPlan] Don't adjust trip count for DataAndControlFlowWithoutRuntimeCheck (#183729)
Previously, the canonical IV increment may have overflowed to a non-zero
value due to vscale not being a power of two. So we used to emit a
runtime check for this.

If you didn't want the runtime check,
DataAndControlFlowWithoutRuntimeCheck skipped it and instead tweaked the
trip count so it wouldn't overflow.

However #144963 stopped the check from ever being emitted because vscale
is always a power-of-two on AArch64 and RISC-V, so it never overflowed
to a non-zero value. And in #183292 the code to emit the check was
removed. But we never restored the trip count back to normal when the
target's vscale was a power-of-two.

Now that vscale is always a power-of-two, this PR avoids adjusting the
trip count. A follow-up NFC can then remove DataAndControlFlowWithoutRuntimeCheck.
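To see why a power-of-two step matters, a toy 8-bit counter can stand in for the canonical IV (the width and the helper name are illustrative assumptions). A power-of-two step divides 2^8, so the counter wraps to exactly zero; a step such as 6 (for example vscale = 3 times a factor of 2) wraps to a non-zero value:

```cpp
#include <cassert>
#include <cstdint>

// Models an 8-bit canonical IV that starts at 0 and is bumped by Step until
// it wraps around. Returns the value it lands on right after the first wrap.
std::uint8_t valueAfterWrap(std::uint8_t Step) {
  assert(Step != 0 && "step must be non-zero");
  std::uint8_t IV = 0;
  while (true) {
    std::uint8_t Next = static_cast<std::uint8_t>(IV + Step);
    if (Next < IV) // unsigned wrap-around happened
      return Next;
    IV = Next;
  }
}
```

`valueAfterWrap(4)` and `valueAfterWrap(16)` return 0, while `valueAfterWrap(6)` returns 2.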
2026-02-28 04:01:58 +00:00
Florian Hahn
d7e037c838
Revert "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit 9c53215d213189d1f62e8f6ee7ba73a089ac2269.

Appears to cause crashes with ordered reductions, revert while I
investigate
2026-02-27 21:29:41 +00:00
Florian Hahn
9c53215d21
[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)
Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-02-27 16:49:54 +00:00
Luke Lau
c5c0fe663c
[VPlan] Remove non-power-of-2 scalable VF comment. NFC (#183719)
No longer holds after #183080
2026-02-27 10:45:17 +00:00
Sander de Smalen
a1f83ba1b6
[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860)
The reason for doing this in `transformToPartialReduction` is so that we
can create the VPExpressions directly when transforming reductions into
partial reductions (to be done in a follow-up PR).

I also intend to see whether we can merge the in-loop reductions with partial
reductions, so that there will be no need for the separate
`convertToAbstractRecipes` VPlan Transform pass.
2026-02-27 08:38:13 +00:00
Ramkumar Ramachandra
bd5f9384d8
[VPlan] Extend interleave-group-narrowing to WidenCast (#183204)
WidenCast is very similar to Widen recipes.

Fixes #128062.
2026-02-26 14:50:25 +00:00
Florian Hahn
c5d6feb315
[VPlan] Limit interleave group narrowing to consecutive wide loads.
Tighten check in canNarrowLoad to require consecutive wide loads; we
cannot properly narrow gathers at the moment.

Fixes https://github.com/llvm/llvm-project/issues/183345.
2026-02-26 12:52:31 +00:00
Florian Hahn
32b8b9ba1e
[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)
Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead of the vector trip count.

This allows removing the induction increment operand only used when
tail-folding.

PR: https://github.com/llvm/llvm-project/pull/182507
2026-02-26 11:48:04 +00:00
Benjamin Maxwell
3c566a698a
[LV] Fix miscompile with conditional scalar assignment + tail folding (#182492)
Previously, we could miscompile when vectorizing conditional scalar
assignments with forced tail folding, as the backedge select could be
based on the header mask, not the assignment condition.

This resulted in a number of failures in the LLVM test suite when
building with `-O3 -march=armv8-a+sve -mllvm
-prefer-predicate-over-epilogue=predicate-dont-vectorize`.

The patch reworks `handleFindLastReductions()` to correctly handle tail
folding.
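A deliberately simplified lane-level model (plain C++, not the actual VPlan recipes; `FindLastState`, `vectorStep` and `finalValue` are hypothetical) shows how selecting on the header mask alone can lose an earlier assignment once a partially masked tail iteration runs:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Data vector from the last iteration that assigned, plus the lanes of it
// that were really assigned.
struct FindLastState {
  std::vector<int> Data;
  std::vector<bool> Mask;
};

// One tail-folded vector iteration. With UseHeaderMaskOnly == true the
// "keep old or take new" decision looks at the header mask instead of the
// assignment condition, which mirrors the miscompile pattern.
void vectorStep(FindLastState &S, const std::vector<int> &A,
                const std::vector<bool> &HeaderMask,
                const std::vector<bool> &Cond, bool UseHeaderMaskOnly) {
  std::vector<bool> Active(A.size());
  bool Any = false;
  for (std::size_t L = 0; L < A.size(); ++L) {
    Active[L] = HeaderMask[L] && Cond[L];
    Any = Any || (UseHeaderMaskOnly ? HeaderMask[L] : Active[L]);
  }
  if (Any) {
    S.Data = A;
    S.Mask = Active;
  }
}

// After the loop: last assigned lane of the kept data vector, else Start.
int finalValue(const FindLastState &S, int Start) {
  for (std::size_t L = S.Mask.size(); L-- > 0;)
    if (S.Mask[L])
      return S.Data[L];
  return Start;
}
```

If the first iteration assigns lane 1 and the masked tail iteration assigns nothing, the correct variant keeps the earlier value, while the header-mask variant overwrites it and returns the start value.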
2026-02-26 09:00:16 +00:00
Florian Hahn
bf4705c05b
[VPlan] Supported conditionally executed single early exits. (#182395)
Add support for a single early exit that is executed conditionally,
making sure the mask from any non-exiting control flow is combined with
the early exit condition.

To do so, introduce a MaskedCond VPInstruction, which is inserted as
user of the early-exit condition, at the point of the early-exit branch.
The VPInstruction will get masked automatically if needed by the
predicator, ensuring that we properly account for it when checking
whether the early exit has been taken.

Note that this does not allow for instructions that require predication
after the early exit. This requires additional work in progress:
https://github.com/llvm/llvm-project/pull/172454

As an alternative to MaskedCond, we could also predicate before handling
early exiting blocks: https://github.com/llvm/llvm-project/pull/181830

PR: https://github.com/llvm/llvm-project/pull/182395
2026-02-25 14:28:04 +00:00
Florian Hahn
804572136e
[VPlan] Allow recursive narrowing in interleave group narrowing. (#167310)
This allows canNarrowOps to recursively check if operands can be
narrowed, enabling narrowing of longer chains of operations that
feed interleave groups.
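The recursive idea can be sketched with a hypothetical operation tree (`Op` and `canNarrowChain` are illustrative names, not LLVM's `canNarrowOps`). Leaves must be consecutive wide loads, echoing the later restriction to consecutive wide loads in this log, and inner nodes are narrowable only if all of their operands are. A real implementation would also need to check that operands line up across the members of the interleave group; this sketch ignores that:

```cpp
#include <cassert>
#include <vector>

// Hypothetical operation feeding an interleave group (not a VPlan recipe).
struct Op {
  enum Kind { WideLoad, Binary, Cast, Other };
  Kind K;
  bool Consecutive;                 // only meaningful for WideLoad
  std::vector<const Op *> Operands; // inputs for Binary and Cast
};

// Recursive check: leaves must be consecutive wide loads, and Binary/Cast
// nodes are narrowable only if every operand is. Anything else bails out.
bool canNarrowChain(const Op &O) {
  switch (O.K) {
  case Op::WideLoad:
    return O.Consecutive;
  case Op::Binary:
  case Op::Cast:
    if (O.Operands.empty())
      return false;
    for (const Op *Operand : O.Operands)
      if (!canNarrowChain(*Operand))
        return false;
    return true;
  case Op::Other:
    return false;
  }
  return false;
}
```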

Depends on https://github.com/llvm/llvm-project/pull/167309.

PR: https://github.com/llvm/llvm-project/pull/167310
2026-02-24 21:30:00 +00:00
Benjamin Maxwell
3b5a05d0b2
Revert "[VPlan] Strengthen materializeFactors with assert (NFC) (#181665)" (#183014)
This PR did not solve the TODO as intended. Reverting so the TODO is not
lost.

This reverts commit aab9412a69a07787e9ec98b25709d709b7b537a6.
2026-02-24 18:03:01 +00:00
Ramkumar Ramachandra
e147b3a05e
[VPlan] Fix alias logic in canHoistOrSinkWithNoAliasCheck (#179504)
The correct way to check if two memory locations may alias is outlined
in ScopedNoAliasAAResult::alias: extract this into a helper, to fix the
current logic.
2026-02-24 16:14:44 +00:00
Florian Hahn
72c0a074db
[VPlan] Move out canNarrowOps (NFC). (#167309)
Move the definition of canNarrowOps out to a static function, to make it
easier to extend and generalize.

PR: https://github.com/llvm/llvm-project/pull/167309
2026-02-24 14:20:47 +00:00
Florian Hahn
6b352aa8ea
Revert "[VPlan] Add simple driver option to run some individual transforms. (#178522)"
This reverts commit 3df1c6f88bfbbd76d9256c55358bb75e02e33779.

Causes build-failures without assertions
https://lab.llvm.org/buildbot/#/builders/159/builds/41683
2026-02-23 22:55:42 +00:00
Florian Hahn
3df1c6f88b
[VPlan] Add simple driver option to run some individual transforms. (#178522)
Add an alternative to test VPlan in more isolation via a new
`vplan-test-transform` option, which builds VPlan0 for each loop in the
input IR and then can invoke a set of transforms on it.

In order to allow different recipe types to be created, a new
widen-from-metadata transform is added, which transforms VPInstructions
to different recipes, based on custom !vplan.widen metadata. Currently
this supports creating widen & replicate recipes, but can easily be
extended in the future.

Currently the handling is intentionally bare-bones, to be extended
gradually as needed.

PR: https://github.com/llvm/llvm-project/pull/178522
2026-02-23 22:49:00 +00:00
Luke Lau
ff88b83fed
[VPlan] Handle extracts for middle blocks also used by early exiting blocks. NFC (#181789)
Currently createExtractsForLiveOuts only handles creating extracts when
the middle block has one predecessor, but if an early exit exits to the
same block as the latch then it might have multiple predecessors.

This handles the latter case to avoid the need to handle it in
VPlanTransforms::handleUncountableEarlyExits. Addresses the comment in
https://github.com/llvm/llvm-project/pull/174864#discussion_r2794153217
2026-02-23 04:03:49 +00:00
Florian Hahn
4ce4987381
[VPlan] Optimize FindLast of FindIV w/o sentinel. (#172569)
For a FindLast reduction selecting an IV, we can avoid the horizontal
AnyOf in the vector loop by introducing an independent boolean
reduction to track whether the condition was ever true in the loop. If it
was never true, we select the start value; otherwise we select
the min/max of the FindIV reduction, as required by the predicate.

The main advantage of this approach is that we have two independent
reductions that do not require a horizontal AnyOf reduction in the
loop.

Currently this requires a non-wrapping IV, but this can be relaxed in
the future by selecting a canonical IV, which is then transformed to the
specific derived IV for the reduction after the loop.
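A scalar model of the two independent reductions, compared against the plain scalar loop, may help. The helper names are hypothetical, and the sketch assumes an increasing, non-wrapping IV starting at 0, so max is the right combiner. Lanes are initialized to the first IV value rather than to an out-of-range sentinel; that is safe because the boolean reduction decides whether the IV result is used at all:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: the last index I with Cond[I], or Start if none.
std::int64_t findLastIVScalar(const std::vector<bool> &Cond,
                              std::int64_t Start) {
  std::int64_t R = Start;
  for (std::size_t I = 0; I < Cond.size(); ++I)
    if (Cond[I])
      R = static_cast<std::int64_t>(I);
  return R;
}

// VF-lane model: a lane-wise max of the IV where the condition holds, plus a
// lane-wise "was it ever true" flag. Neither needs a horizontal reduction
// inside the loop; lanes are combined once after the loop.
std::int64_t findLastIVTwoReductions(const std::vector<bool> &Cond,
                                     std::int64_t Start, std::size_t VF) {
  assert(VF != 0 && "VF must be non-zero");
  std::vector<std::int64_t> MaxIV(VF, std::int64_t{0});
  std::vector<bool> Any(VF, false);
  for (std::size_t I = 0; I < Cond.size(); ++I) {
    std::size_t Lane = I % VF;
    if (Cond[I]) {
      MaxIV[Lane] = std::max(MaxIV[Lane], static_cast<std::int64_t>(I));
      Any[Lane] = true;
    }
  }
  bool EverTrue = std::any_of(Any.begin(), Any.end(), [](bool B) { return B; });
  if (!EverTrue)
    return Start;
  return *std::max_element(MaxIV.begin(), MaxIV.end());
}
```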

Depends on https://github.com/llvm/llvm-project/pull/177870.

PR: https://github.com/llvm/llvm-project/pull/172569
2026-02-20 21:48:35 +00:00
Sander de Smalen
46bfd69343
[LV] NFCI: Add RecurKind to VPPartialReductionChain (#181705)
This avoids having to pass around the RecurKind or re-figure it out from
the VPReductionPHI node.

This is useful in a follow-up PR, where we need to distinguish between a
`Sub` and `AddWithSub` recurrence, which can't be deduced from the
`ReductionBinOp` field.
2026-02-19 13:35:10 +00:00
Luke Lau
6a5375fbce
[VPlan] Plumb recurrence FMFs through VPReductionPHIRecipe via VPIRFlags. NFC (#181694)
In order to be able to create selects for reduction phis through tail
folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an
instance of VPIRFlags and plumb the FMFs from the original RdxDesc.

This allows us to remove more uses of the RecurrenceDescriptor in
addReductionResultComputation, which should help untie it from
LoopVectorizationLegality.
2026-02-19 11:23:47 +00:00
Benjamin Maxwell
867272d52a
[LV] Pass symbolic VF to CalculateTripCountMinusVF and CanonicalIVIncrementForPart (NFC) (#180542)
This makes it easier to update the runtime VF per VPlan.
2026-02-18 08:58:47 +00:00
Shih-Po Hung
97fa3e5936
[NFC][VPlan] Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe (#177114)
This is groundwork for #151300, which aims to support first-faulting
loads in non-tail-folded early-exit loops.
Per #175900, we need a variable-length stepping transform that can be
shared between EVL and non-EVL loops.
The idea is to have an EVL-independent counter and transform for
tracking the cumulative number of processed elements.

This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and
transform (canonicalizeEVLLoops) to be EVL-independent:
- Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe to
  reflect its general purpose of tracking processed element count.
- Rename canonicalizeEVLLoops to convertToVariableLengthStep.

This is NFC.
2026-02-18 07:04:58 +00:00
Ramkumar Ramachandra
aab9412a69
[VPlan] Strengthen materializeFactors with assert (NFC) (#181665)
This fixes a TODO.
2026-02-17 16:18:22 +00:00