728 Commits

Author SHA1 Message Date
Florian Hahn
c109dd1e9a
[VPlan] Refactor FindLastSelect matching to use m_Specific(PhiR) (NFC). (#190547)
Match the select operands directly against PhiR using m_Specific,
binding only the non-phi IV expression. This replaces the generic
TrueVal/FalseVal matching followed by an assert and conditional
extraction.
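For illustration only, here is the shape of the new match, sketched outside LLVM with hypothetical `Value` and `Select` stand-ins. These are not VPlan types, and the real code uses the VPlan pattern matchers. Each select arm is compared against the phi directly, and only the other arm is bound as the IV expression.

```cpp
#include <cassert>

// Hypothetical stand-in for a VPValue; identity is by address.
struct Value {};

// Hypothetical stand-in for a select recipe: Cond ? TrueVal : FalseVal.
struct Select {
  const Value *Cond;
  const Value *TrueVal;
  const Value *FalseVal;
};

// Mimics matching select(Cond, specific(Phi), IV) or select(Cond, IV, specific(Phi)).
// Returns the non-phi (IV) operand, or nullptr if exactly one arm is not the phi.
const Value *matchFindLastSelect(const Select &Sel, const Value *Phi) {
  if (Sel.TrueVal == Phi && Sel.FalseVal != Phi)
    return Sel.FalseVal;
  if (Sel.FalseVal == Phi && Sel.TrueVal != Phi)
    return Sel.TrueVal;
  return nullptr;
}
```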

Split off from approved
https://github.com/llvm/llvm-project/pull/183911/ as suggested.
2026-04-05 20:07:34 +00:00
Florian Hahn
36e495dd90
[VPlan] Use APSInt in CheckSentinel directly (NFC). (#190534)
Simplify the sentinel checking logic by using APSInt and checking for
both a signed and unsigned sentinel in a single call.

This removes the IsSigned argument.

Split off from approved
https://github.com/llvm/llvm-project/pull/183911/ as suggested.
2026-04-05 16:43:59 +00:00
Florian Hahn
a2c16bb59f
[VPlan] Rename CondSelect to FindLastSelect (NFC). (#190536)
Use the more descriptive name FindLastSelect for the conditional select
that picks between the reduction phi and the IV value.

Split off from approved
https://github.com/llvm/llvm-project/pull/183911/ as suggested.
2026-04-05 16:39:34 +00:00
Sander de Smalen
730a07f225
[LV] Only create partial reductions when profitable. (#181706)
We want the LV cost-model to make the best possible decision of VF and
whether or not to use partial reductions. At the moment, when the LV can
use partial reductions for a given VF range, it assumes those are always
preferred. After transforming the plan to use partial reductions, it
then chooses the most profitable VF. A different VF might have been more
profitable had it not used partial reductions.

This PR changes that to first decide whether partial reductions are
more profitable for a given chain. If they are not, the transform is
not applied.

This causes some regressions for AArch64 which are addressed in a
follow-up PR to keep this one simple.
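For context, a partial reduction accumulates a wide vector of extended inputs into an accumulator with fewer lanes, and only the final horizontal sum is meaningful. The scalar C++ sketch below models a dot product of `int8_t` values accumulated into four `int32_t` lanes. The 16-to-4 shape and the `I % 4` lane mapping are arbitrary illustrative choices, not what any particular target emits. The sketch says nothing about the cost-model decision itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: sum of A[I] * B[I], each i8 operand sign-extended to i32.
int32_t dotScalar(const std::vector<int8_t> &A, const std::vector<int8_t> &B) {
  int32_t Sum = 0;
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += int32_t(A[I]) * int32_t(B[I]);
  return Sum;
}

// Partial-reduction style: each "vector iteration" folds 16 products into a
// 4-lane accumulator (lane mapping I % 4 chosen arbitrarily for the sketch).
int32_t dotPartial(const std::vector<int8_t> &A, const std::vector<int8_t> &B) {
  std::array<int32_t, 4> Acc{}; // narrower accumulator: 4 lanes instead of 16
  std::size_t I = 0;
  for (; I + 16 <= A.size(); I += 16)
    for (std::size_t L = 0; L < 16; ++L)
      Acc[L % 4] += int32_t(A[I + L]) * int32_t(B[I + L]);
  int32_t Sum = Acc[0] + Acc[1] + Acc[2] + Acc[3]; // final horizontal reduction
  for (; I < A.size(); ++I)                        // scalar remainder
    Sum += int32_t(A[I]) * int32_t(B[I]);
  return Sum;
}
```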
2026-04-03 17:42:51 +01:00
Ramkumar Ramachandra
e09d1e3ff1
[VPlan] Use not_equal_to to improve code (NFC) (#190262)
2026-04-03 07:32:34 +01:00
Ramkumar Ramachandra
bb2a63a673
[VPlan] Use m_Isa to improve code (NFC) (#190149)
2026-04-02 15:53:05 +01:00
Ramkumar Ramachandra
82e8494070
[VPlan] Avoid unnecessary BTC SymbolicValue creation (NFC) (#189929)
Don't unnecessarily create a backedge-taken-count SymbolicValue. This
allows us to simplify some code.
2026-04-01 16:25:48 +00:00
Henry Jiang
5d624b5b93
[VPlan] Stop outerloop vectorization from vectorizing nonvector intrinsics (#185347)
In outer-loop VPlan, avoid emitting vector intrinsic calls for intrinsics
without a vector form. In VPRecipeBuilder, detect missing vector intrinsic
mapping and emit scalar handling instead of a vector call.

Also fix an assertion failure when `llvm.pseudoprobe` is treated as a
`WIDEN-INTRINSIC` in VPlan's native path.

Reproducer: https://godbolt.org/z/GsPYobvYs
2026-03-31 16:01:39 -07:00
Florian Hahn
ff4e229f8c
Revert "[VPlan] Extract reverse mask from reverse accesses" (#189637)
Reverts llvm/llvm-project#155579

The added assertion triggers on some buildbots:
clang:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840:
virtual InstructionCost
llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &)
const: Assertion `!IsReverse() && "Inconsecutive memory access should
not have reverse order"' failed.
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang
-DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3
-std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o
-MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o
CMakeFiles/timberwolfmc.dir/finalpin.c.o -c
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
2026-03-31 15:53:01 +01:00
Ramkumar Ramachandra
c592eba498
[VPlan] Use RPOT in CSE, fixing potential crash (#187548)
CSE can crash due to outdated hash values unless replacements in
successor phis are forbidden in blocks not dominated by the def. The
crash occurs when a block contains CSE'able phis whose CSE'able incoming
values come from a non-dominating block, and the block with the phis is
visited before that non-dominating block. It is unfortunately impossible
to write a test case showing the crash at present, but crashes do occur
when attempting to CSE DerivedIV recipes. The root cause is visiting a
non-dominated use before its def, which a reverse post-order traversal
fixes.
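Here is a minimal sketch of reverse post-order on a CFG given as a hypothetical adjacency list of block indices (not VPlan blocks). In an acyclic graph, RPO visits each block after all of its predecessors. A block containing phis is therefore reached only after the blocks that supply its incoming values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>; // G[B] = successor blocks of B

// Depth-first walk that appends B only after everything reachable from it.
static void postOrder(const Graph &G, int B, std::vector<bool> &Seen,
                      std::vector<int> &Out) {
  Seen[B] = true;
  for (int S : G[B])
    if (!Seen[S])
      postOrder(G, S, Seen, Out);
  Out.push_back(B);
}

// Reverse post-order starting at Entry: a topological order for acyclic CFGs.
std::vector<int> reversePostOrder(const Graph &G, int Entry) {
  std::vector<bool> Seen(G.size(), false);
  std::vector<int> Order;
  postOrder(G, Entry, Seen, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For the graph `{{1, 3}, {2}, {}, {2}}`, block 2 takes incoming values from blocks 1 and 3, and it comes last in the order `0, 3, 1, 2`. A plain index-order walk would instead visit block 2 before block 3.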

Fixes #187499.

Co-authored-by: Luke Lau <luke@igalia.com>
2026-03-31 10:40:03 +01:00
Mel Chen
f76f41f702
[VPlan] Extract reverse mask from reverse accesses (#155579)
Following #146525, separate the reverse mask from reverse access
recipes.
At the same time, remove the unused member variable `Reverse` from
`VPWidenMemoryRecipe`.
This will help to reduce redundant reverse mask computations by
VPlan-based common subexpression elimination.
2026-03-31 08:51:15 +00:00
Ramkumar Ramachandra
8a4f21048f
[VPlan] Generalize noalias-licm-check to replicate regions (NFC) (#187017)
In order to use the cannotHoistOrSinkWithNoAlias check in use-sites
after replicate regions are created, generalize it to work with
replicate regions.
2026-03-30 09:17:39 +01:00
Florian Hahn
b5d43f7794
[VPlan] Use transferSuccessors in mergeBlocksIntoPredecessors (NFC). (#189275)
transferSuccessors is more compact and is guaranteed to preserve the
predecessor/successor order properly in all cases. This is not an issue
today, but will become one when it is used in more places, including
#186252.

Split off from approved
https://github.com/llvm/llvm-project/pull/186252.

PR: https://github.com/llvm/llvm-project/pull/189275
2026-03-29 20:20:23 +01:00
Florian Hahn
c467d38090
[LV] Fix offset handling for epilogue resume values. (NFCI) (#189259)
Instead of replacing all uses of the canonical IV with an add of the
resume value and then relying on the fold to simplify, directly create
offset versions of both the canonical IV and its increment.

The original offset computation was incorrect, but did not result in
mis-compiles due to the corresponding fold.

Split off from approved
https://github.com/llvm/llvm-project/pull/156262.
2026-03-29 17:04:50 +00:00
Ramkumar Ramachandra
840e9a4ddd
[VPlan] Fix wrap-flags on WidenInduction unroll (#187710)
Due to a somewhat recent change, IntOrFpInduction recipes have
associated VPIRFlags. The VPlanUnroll logic for WidenInduction recipes
predates this change and computes incomplete wrap-flags. Update it to
use the flags on IntOrFpInduction recipes directly; PointerInduction
recipes have no associated flags, and no flags should be used for them.
2026-03-27 13:26:04 +00:00
Florian Hahn
90c1c588f8
[VPlan] Don't set WrapFlags for truncated IVs. (#188966)
The wrap flags from the IV bin-op are not guaranteed to apply to
truncated inductions, which are evaluated in narrower types.

Instead of dropping them late (in expandVPWidenIntOrFpInduction), do not
add them at the outset, to prevent invalid transforms based on incorrect
flags in the future.
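Here is a concrete instance of why the flags do not transfer, using fixed-width C++ unsigned integers. The start value 250 and step 10 are arbitrary. The i64 increment 250 + 10 = 260 does not wrap, so `nuw` holds there. The same increment truncated to i8 wraps to 4, so `nuw` on the truncated IV would be wrong.

```cpp
#include <cassert>
#include <cstdint>

// True if Start + Step wraps as an unsigned N-bit add, i.e. `nuw` would be violated.
// For unsigned addition, wraparound happened exactly when the result is below Start.
template <typename UIntT> bool addOverflowsUnsigned(UIntT Start, UIntT Step) {
  UIntT Sum = static_cast<UIntT>(Start + Step); // reduced modulo 2^N
  return Sum < Start;
}
```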

PR: https://github.com/llvm/llvm-project/pull/188966
2026-03-27 12:39:03 +00:00
Florian Hahn
f8fe67c998
[VPlan] Expose cloneFrom and mergeBlocksIntoPredecessors. (NFC) (#188818)
Move cloneFrom from a file-static function in VPlan.cpp to a public
static method VPBlockUtils::cloneFrom, and move
mergeBlocksIntoPredecessors from a file-static function in
VPlanTransforms.cpp to a public static method
VPlanTransforms::mergeBlocksIntoPredecessors.

This is in preparation for dissolving replicate regions which needs both
utilities.

Split off from approved
https://github.com/llvm/llvm-project/pull/170212.

PR: https://github.com/llvm/llvm-project/pull/188818
2026-03-26 19:07:33 +00:00
Ramkumar Ramachandra
76a9692254
[VPlan] Sink single-scalar replicates in licm (#187047)
Refine the replicate bail-out in licm to permit single-scalar
replicates.
2026-03-26 14:42:57 +00:00
Florian Hahn
40304d8fef
Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)" (#188589)
This reverts commit e30f9c19464bcf1bf1e9f69b63884fb78ad2d05d.

Re-land, now that the reported crash causing the revert has been fixed
as part of 77fb84889 (#187504).

Original message:

Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-03-26 10:14:10 +00:00
Luke Lau
065a39b9f7
[VPlan] Tighten SafeAVL matching in convertEVLExitCond. NFC (#179164)
Follow-up from
https://github.com/llvm/llvm-project/pull/178181#discussion_r2743630145
2026-03-25 18:11:51 +08:00
Benjamin Maxwell
249b086545
[LV] Fix crash when extends are not widened in partial reduction matching (#187782)
Fixes https://github.com/llvm/llvm-project/pull/185821#issuecomment-4098933551
2026-03-23 10:30:19 +00:00
Florian Hahn
c079372099
[VPlan] Add m_VPPhi pattern matcher and use in removeDeadRecipes (NFC).
Add m_VPPhi to match VPPhi instructions with exactly 2 operands.

Split off from https://github.com/llvm/llvm-project/pull/156262.
2026-03-22 19:49:47 +00:00
Ramkumar Ramachandra
1dfd268f10
[VPlan] Simplify mul x, -1 -> sub 0, x (#187551)
Simplify exactly as InstCombine does. A follow-up would include
simplifying add x, (sub 0, y) -> sub x, y.

Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD
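The Alive2 link above is the actual proof. The sketch below is a quick, independent sanity check of the underlying identity. It compares `x * -1` and `0 - x` for every 8-bit value under two's-complement wraparound. It does not consider `nsw`/`nuw` flags.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks that x * -1 == 0 - x for all 8-bit x, with wraparound
// (the semantics of flag-free LLVM `mul`/`sub` on i8).
bool mulByMinusOneIsNeg8() {
  for (int V = 0; V < 256; ++V) {
    uint8_t X = static_cast<uint8_t>(V);
    uint8_t Mul = static_cast<uint8_t>(X * 0xFFu); // -1 is 0xFF in i8
    uint8_t Sub = static_cast<uint8_t>(0u - X);
    if (Mul != Sub)
      return false;
  }
  return true;
}
```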
2026-03-20 12:07:51 +00:00
Benjamin Maxwell
4b17135d14
[LV] Simplify matchExtendedReductionOperand() (NFCI) (#185821)
This updates `matchExtendedReductionOperand` so the simple case of
`UpdateR(PrevValue, ext(...))` is matched first as an early exit. The
binop matching is then flattened to remove the extra layer of the
`MatchExtends` lambda.
2026-03-20 09:29:28 +00:00
Graham Hunter
b227fab5a6
[NFC][LV] Introduce enums for uncountable exit detail and style (#184808)
Recursively splitting out some work from #183318; this covers
the enums for early exit loop type (none, readonly, readwrite)
and the style used (just readonly and
masked-handle-ee-in-scalar-tail for now) and refactoring for
basic use of those enums.
2026-03-19 14:17:25 +00:00
Elvis Wang
53f8f3b017
Reland [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068) (#187199)
This patch replaces the remaining LogicalAnds with vp.merge in the second
pass so as not to break the `m_RemoveMask` pattern in optimizeMaskToEVL.

Also skip the cost-model comparison when the plan contains `vp_merge`,
which the legacy model does not cost.

This can help remove the header mask for FindLast reduction (CSA) loops.

Original PR: https://github.com/llvm/llvm-project/pull/184068
Original buildbot failure:
https://lab.llvm.org/buildbot/#/builders/213/builds/2497
2026-03-19 07:56:42 +08:00
Florian Hahn
fce100e26e
[VPlan] Fix masked_cond expansion.
masked_cond is used to combine early-exit conditions with masks from
predication. The early-exit condition should only be evaluated if the
mask is true. Emit the mask first to avoid incorrect poison propagation.
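The ordering matters because of how poison flows through a logical and. The sketch below models a possibly-poison `i1` as `std::optional<bool>`, an illustrative encoding rather than an LLVM type, and models logical and as `select A, B, false`. With the mask as the select condition, a poison early-exit condition on a masked-off lane is harmless. With the operands swapped, it poisons the result.

```cpp
#include <cassert>
#include <optional>

// Illustrative encoding: std::nullopt stands for a poison i1.
using MaybePoison = std::optional<bool>;

// LLVM `select C, T, F`: a poison condition yields poison; otherwise only the
// chosen operand matters, so poison in the other arm does not propagate.
MaybePoison selectI1(MaybePoison C, MaybePoison T, MaybePoison F) {
  if (!C)
    return std::nullopt;
  return *C ? T : F;
}

// Logical and as `select A, B, false`: B only matters when A is true.
MaybePoison logicalAnd(MaybePoison A, MaybePoison B) {
  return selectI1(A, B, MaybePoison(false));
}
```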

Fixes https://github.com/llvm/llvm-project/issues/187061.
2026-03-18 20:26:04 +00:00
Luke Lau
bf46a95f2c
[VPlan] Use target's index type for {First,Last}ActiveLane instead of i64 (#186361)
Fixes #186005

On RV32 with zve32x, i.e. with no legal 64-bit types, either scalar or vector,
@llvm.cttz.elts.i64 cannot be lowered and so returns an illegal cost for
scalable VFs. However VPInstruction::FirstActiveLane and
VPInstruction::LastActiveLane always use a hardcoded i64 type.

This causes a legacy/VPlan cost model mismatch in the live-out.ll test,
and in early-exit-live-out.ll prevents the scalable VF from being
chosen.

This PR teaches the two VPInstructions to use the target's index type,
i.e. the width of a pointer in the default address space, so they
generate a 32-bit cttz.elts on RV32. This should be large enough to hold
the maximum number of elements in a vector, since any larger vector
would not be addressable in memory.
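Here is a scalar model of what FirstActiveLane computes, returning the lane index in a 32-bit type as it would on RV32 after this change. The sketch returns the lane count when no lane is active. That convention is an assumption made for the sketch, modelled on count-trailing-zero-elements semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the first active mask lane, computed in a 32-bit index type.
// Assumption for this sketch: an all-false mask yields the lane count.
uint32_t firstActiveLane(const std::vector<bool> &Mask) {
  uint32_t Lane = 0;
  for (bool Active : Mask) {
    if (Active)
      return Lane;
    ++Lane;
  }
  return Lane;
}
```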

I considered using the canonical IV type but I don't think that will
work since the canonical IV can be i64 on RV32, and it causes
regressions due to extra zexting on 64-bit targets with a 32-bit IV.
2026-03-18 15:01:21 +00:00
Elvis Wang
3eb8b788b7
Revert "[LV] Replace remaining LogicalAnd to vp.merge in EVL optimization." (#187170)
Reverts llvm/llvm-project#184068

This hit a cost-model assertion on the rva23 stage2 buildbot.
https://lab.llvm.org/buildbot/#/builders/213/builds/2497
2026-03-18 09:21:40 +08:00
Elvis Wang
52089f895e
[LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068)
This patch replaces the remaining LogicalAnds with vp.merge in the second
pass so as not to break the `m_RemoveMask` pattern in optimizeMaskToEVL.

This can help remove the header mask for FindLast reduction (CSA) loops.

PR: https://github.com/llvm/llvm-project/pull/184068
2026-03-18 08:39:27 +08:00
Ramkumar Ramachandra
56d7920c09
[VPlan] Factor collectGroupedReplicateMemOps (NFC) (#186820)
Factor out a collectGroupedReplicateMemOps from
collectComplementaryPredicatedMemOps, so it can be re-used in other
places.
2026-03-17 09:15:46 +00:00
Elvis Wang
51b3b9b039
[LV] Optimize x && (x && y) -> x && y (#185806)
This patch simplifies `x && (x && y)` and `x && (y && x)` to `x && y`,
removing the redundant logical-and. This helps simplify the mask
calculation in FindLast reductions and exposes more opportunities to
convert to EVL.
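Here is a brute-force check of the two folds on plain booleans. It ignores poison, which matters for LLVM's select-based logical and.

```cpp
#include <cassert>
#include <initializer_list>

// Logical and on plain (non-poison) booleans.
bool land(bool A, bool B) { return A && B; }

// Checks x && (x && y) == x && y and x && (y && x) == x && y for all inputs.
bool redundantAndFoldHolds() {
  for (bool X : {false, true})
    for (bool Y : {false, true})
      if (land(X, land(X, Y)) != land(X, Y) ||
          land(X, land(Y, X)) != land(X, Y))
        return false;
  return true;
}
```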

PR link: https://github.com/llvm/llvm-project/pull/185806
2026-03-17 13:03:04 +08:00
Ramkumar Ramachandra
92e44b247f
Reland [VPlan] Extend interleave-group-narrowing to WidenCast (#186454)
The patch was initially landed as bd5f9384, but then reverted due to an
underlying issue in narrowInterleaveGroups, described in #185860. The
issue has since been fixed. The reland is simply a conflict-resolved
version of the original patch, which includes an additional test update.

WidenCast recipes are very similar to Widen recipes.

Fixes #128062.
2026-03-16 12:21:48 +00:00
Ramkumar Ramachandra
616bf5abd1
[VPlan] Introduce VPlan::getDataLayout (NFC) (#186418)
2026-03-13 16:17:04 +00:00
Florian Hahn
cbb8e08192
[VPlan] Don't narrow wide loads for scalable VFs when narrowing IGs. (#186181)
For scalable VFs, the narrowed plan processes vscale iterations at once,
so a shared wide load cannot be narrowed to a uniform scalar; bail out,
as there currently is no way to create a narrowed load that loads
vscale elements.

Fixes https://github.com/llvm/llvm-project/issues/185860.

PR: https://github.com/llvm/llvm-project/pull/186181
2026-03-13 16:04:42 +00:00
Florian Hahn
579aca8755
[VPlan] Prevent uses of materialized VPSymbolicValues. (NFC) (#182318)
After VPSymbolicValues (like VF and VFxUF) are materialized via
replaceAllUsesWith, they should not be accessed again. This patch:

1. Tracks materialization state in VPSymbolicValue.

2. Asserts if the materialized VPValue is used again. Currently it
   adds asserts to various member functions, preventing calling them
   on materialized symbolic values.

Note that this still allows some uses (e.g. comparing VPSymbolicValue
references or pointers), but this should be relatively harmless given
that it is impossible to (re-)add any users. If we want to further
tighten the checks, we could add asserts to the accessors or override
operator&, but that would require more changes and, I think, would not
add much extra protection.
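The following is a hypothetical sketch of the tracking idea, not the actual VPSymbolicValue class. A flag is set on materialization, and accessors that would create a new use assert on it. `getName` is an illustrative accessor only.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a symbolic value such as VF or VFxUF.
class SymbolicValue {
  std::string Name;
  bool Materialized = false;

public:
  explicit SymbolicValue(std::string N) : Name(std::move(N)) {}

  // Stands in for replaceAllUsesWith(ConcreteValue); afterwards no new uses are allowed.
  void materialize() {
    assert(!Materialized && "already materialized");
    Materialized = true;
  }

  bool isMaterialized() const { return Materialized; }

  // Illustrative accessor that implies a use, so it checks the state first.
  const std::string &getName() const {
    assert(!Materialized && "use of materialized symbolic value");
    return Name;
  }
};
```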

Depends on https://github.com/llvm/llvm-project/pull/182146 to fix a
current access violation.

PR: https://github.com/llvm/llvm-project/pull/182318
2026-03-13 14:39:46 +00:00
Ramkumar Ramachandra
540ea54ad7
Revert "[VPlan] Extend interleave-group-narrowing to WidenCast" (#186072)
This reverts commit bd5f9384 (#183204) to buy us time to investigate an
AArch64 SVE-fixed-length buildbot miscompile.

Ref: https://lab.llvm.org/buildbot/#/builders/143/builds/14601
2026-03-12 11:37:09 +00:00
Benjamin Maxwell
430e2b7b79
[LV] Simplify the chain traversal in getScaledReductions() (NFCI) (#184830)
I found the logic of this function quite hard to reason about. This
patch attempts to rectify this by separating the matching of an extended
reduction operand from the traversal of the reduction chain.

- `matchExtendedReductionOperand()` contains all the logic to match an
  extended operand.
- `getScaledReductions()` validates each operation in the chain,
  starting backwards from the exit value, walking up through the operand
  that is not extended.
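Here is a hypothetical sketch of the traversal described above, using a toy expression type rather than VPlan recipes. Starting from the exit value, each add must have exactly one extended operand, and the walk continues through the other operand until it reaches the reduction phi. The real functions handle more than this model does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical expression node for the sketch (not a VPlan recipe).
struct Node {
  enum Kind { Phi, Val, Ext, Add } K;
  const Node *Op0 = nullptr; // Add: first operand; Ext: the extended value
  const Node *Op1 = nullptr; // Add: second operand
};

// Walks backwards from Exit to the reduction phi, following the operand that is
// not extended. Returns the adds visited (exit value first), or an empty vector
// if some link does not have exactly one extended operand or no phi is reached.
std::vector<const Node *> collectChain(const Node *Exit) {
  std::vector<const Node *> Chain;
  const Node *Cur = Exit;
  while (Cur && Cur->K == Node::Add) {
    bool Ext0 = Cur->Op0 && Cur->Op0->K == Node::Ext;
    bool Ext1 = Cur->Op1 && Cur->Op1->K == Node::Ext;
    if (Ext0 == Ext1) // neither or both operands extended
      return {};
    Chain.push_back(Cur);
    Cur = Ext0 ? Cur->Op1 : Cur->Op0;
  }
  if (!Cur || Cur->K != Node::Phi)
    return {};
  return Chain;
}
```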
2026-03-11 06:39:20 +00:00
Florian Hahn
c79a058a6a
[VPlan] Materialize VectorTripCount in narrowInterleaveGroups. (#182146)
When narrowInterleaveGroups transforms a plan, VF and VFxUF are
materialized (replaced with concrete values). This patch also
materializes the VectorTripCount in the same transform.

This ensures that VectorTripCount is properly computed when the narrow
interleave transform is applied, instead of using the original VF and
UF to compute the vector trip count. The previous behavior generated
correct code, but executed fewer iterations in the vector loop.
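The vector trip count is the trip count rounded down to a multiple of the per-iteration step. This sketch assumes that narrowing reduces the step from VF * UF to UF for a fixed VF. Under that assumption, recomputing the vector trip count with the new step lets the vector loop cover more iterations. For example, TC = 10, VF = 4, UF = 1 gives 8 with the old step and 10 with the new one.

```cpp
#include <cassert>
#include <cstdint>

// Largest multiple of Step not exceeding TC: the iterations covered by a vector
// loop advancing Step per iteration (the remainder runs in the scalar tail).
uint64_t vectorTripCount(uint64_t TC, uint64_t Step) {
  assert(Step != 0 && "step must be non-zero");
  return TC - TC % Step;
}
```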

The change also enables stricter verification to prevent accesses of UF,
VF, VFxUF, etc. after materialization as a follow-up.

Note that in some cases we now miss branch folding, but that should be
addressed separately: https://github.com/llvm/llvm-project/pull/181252

Fixes one of the violations that access VectorTripCount after UF and VF
have been materialized.

PR: https://github.com/llvm/llvm-project/pull/182146
2026-03-10 12:33:30 +00:00
Sander de Smalen
0da00c325b
[LV] Support float and pointer FindLast reductions (#184101)
This duplicates #182313 with some very small modifications on top, as
@dheaton-arm is unable to finish the PR and I'm unable to push to his
branch.

Expands support for the `FindLast` Recurrence Kind to floating-point
and pointer types, thereby enabling conditional scalar assignment (CSA)
for these types.
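The source pattern these reductions target can be written as a plain C++ loop, templated over the element type so it covers `float` and pointer values alike. This illustrates the loop shape only, not the vectorized form. `findLast` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Conditional scalar assignment: keep the value from the last iteration whose
// condition holds, or Init if it never holds.
template <typename T>
T findLast(const T *Vals, const bool *Conds, std::size_t N, T Init) {
  T Last = Init;
  for (std::size_t I = 0; I < N; ++I)
    if (Conds[I])
      Last = Vals[I];
  return Last;
}
```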

Originally authored by @dheaton-arm

---------

Co-authored-by: Damian Heaton <Damian.Heaton@arm.com>
2026-03-09 10:27:06 +00:00
Aiden Grossman
e30f9c1946
Revert "Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)""
This reverts commit 6aa115bba55054b0dc81ebfc049e8c7a29e614b2.

This is causing crashes. See #185345 for details.
2026-03-09 04:24:01 +00:00
Florian Hahn
2207296d3f
[VPlan] Fold constant trunc after EVL simplification.
This fixes a crash for the new test after
6aa115bba55054b0dc81ebfc049e8c7a29e614b2.
2026-03-08 19:31:20 +00:00
Florian Hahn
6aa115bba5
Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit d7e037c8383e66e5c07897f144f6d8ef47258682.

Recommit with a small fix to properly handle ordered reductions when
connecting the epilogue.

Original message:

Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-03-08 11:13:40 +00:00
Florian Hahn
2ce5f91425
[VPlan] Optimize resume values of IVs together with other exit values. (#174239)
Remove updateScalarResumePhis and create extracts for live-outs early in
addInitialSkeleton. Instead of extracting from the header phi
recipes for the resume values (which is incorrect), extract the last
lane of the backedge value.

Then update optimizeInductionExitUsers to optimize both the scalar
resume values for IVs and IV exit values together.

This removes the need to pass state between transforms and addresses a
TODO.

PR: https://github.com/llvm/llvm-project/pull/174239
2026-03-06 17:05:53 +00:00
Florian Hahn
d316fb0797
[VPlan] Replicate VPScalarIVStepsRecipe by VF outside replicate regions. (#170053)
Extend replicateByVF to also handle VPScalarIVStepsRecipe. To do so, the
patch adds a new lane operand to VPScalarIVStepsRecipe, which is only
added when replicating. This enables removing a number of lane 0
computations. The lane operand will also be used to explicitly replicate
replicate regions in a follow-up.
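Here is a sketch of the per-lane values involved. It assumes the usual scalar-steps formula, in which lane `L` of unrolled part `P` gets `BaseIV + (P * VF + L) * Step`. The function names are hypothetical. With an explicit lane operand, each replicated recipe can compute just its own lane's value instead of starting from lane 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar induction value for lane Lane of unrolled part Part (assumed formula).
int64_t scalarIVStep(int64_t BaseIV, int64_t Step, unsigned VF, unsigned Part,
                     unsigned Lane) {
  return BaseIV + int64_t(Part * VF + Lane) * Step;
}

// All VF lanes of one part, one value per replicated copy.
std::vector<int64_t> replicateByVF(int64_t BaseIV, int64_t Step, unsigned VF,
                                   unsigned Part) {
  std::vector<int64_t> Lanes;
  for (unsigned L = 0; L < VF; ++L)
    Lanes.push_back(scalarIVStep(BaseIV, Step, VF, Part, L));
  return Lanes;
}
```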

Depends on https://github.com/llvm/llvm-project/pull/169796
Depends on https://github.com/llvm/llvm-project/pull/170906

PR: https://github.com/llvm/llvm-project/pull/170053
2026-03-05 12:42:20 +00:00
Ramkumar Ramachandra
ca0d100e79
[VPlan] Use VPlan::getZero to improve code (NFC) (#184591)
2026-03-04 21:21:35 +00:00
Florian Hahn
c370f5af6c
[VPlan] Preserve IsSingleScalar for hoisted predicated load. (#184453)
The predicated loads may be single scalar (e.g. for VF = 1). We should
preserve IsSingleScalar when hoisting them. As all loads access the same
address, IsSingleScalar must match across all loads in the group.

This fixes an assertion when interleaving-only with hoisted loads.

Fixes https://github.com/llvm/llvm-project/issues/184372

PR: https://github.com/llvm/llvm-project/pull/184453
2026-03-04 14:32:00 +00:00
Florian Hahn
bbde3e3b59
[VPlan] Preserve IsSingleScalar for sunken predicated stores. (#184329)
The predicated stores may be single scalar (e.g. for VF = 1). We should
preserve IsSingleScalar. As all stores access the same address,
IsSingleScalar must match across all stores in the group.

This fixes an assertion when interleaving-only with sunken stores.

Fixes https://github.com/llvm/llvm-project/issues/184317

PR: https://github.com/llvm/llvm-project/pull/184329
2026-03-03 14:08:00 +00:00
Ramkumar Ramachandra
b4743b2641
[VPlan] Introduce VPlan::get(Zero|AllOnes) (NFC) (#184085)
2026-03-03 09:47:05 +00:00
Luke Lau
bcc272b322
[LV] Remove DataAndControlFlowWithoutRuntimeCheck. NFC (#183762)
After #144963 and #183292 we never emit the runtime check, so
DataAndControlFlowWithoutRuntimeCheck is equivalent to
DataAndControlFlow.

With that we only need to store one tail folding style instead of two,
because we don't need to distinguish whether or not the IV update
overflows (to a non-zero value).
2026-03-02 21:14:04 +08:00