2942 Commits

Author SHA1 Message Date
Hassnaa Hamdi
6bf8279dc2
[LV][NFC] correct comment for isScalarEpilogueAllowed() (#190254)
The comment had the opposite meaning of what the function actually does.
2026-04-05 12:55:36 +01:00
Ramkumar Ramachandra
d835dd2b43
[LV] Strip createStepForVF (NFC) (#185668)
The mul -> shl simplification is already done in VPlan.
2026-04-02 10:04:37 +01:00
Florian Hahn
0b61cd39e4
[LV] Add epilogue minimum iteration check in VPlan as well. (#189372)
Update LV to also use the VPlan-based addMinimumIterationCheck for the
iteration count check for the epilogue.

As the VPlan-based addMinimumIterationCheck uses VPExpandSCEV, those
need to be placed in the entry block for now, moving vscale * VF * IC to
the entry for scalable vectors.

The new logic also fails to simplify some checks involving PtrToInt,
because they were only simplified when going through generated IR, then
folding some PtrToInt in IR, then constructing SCEVs again. But those
should be cleaned up by later combines, and there is not really much we
can do other than trying to go through IR.

PR: https://github.com/llvm/llvm-project/pull/189372
2026-04-01 15:47:41 +01:00
Ramkumar Ramachandra
3068132e32
[LV] Use bind_front in tryToOptimizeInductionTruncate (NFC) (#189763) 2026-04-01 08:19:49 +01:00
Florian Hahn
ff4e229f8c
Revert "[VPlan] Extract reverse mask from reverse accesses" (#189637)
Reverts llvm/llvm-project#155579

Assertion added triggers on some buildbots
clang:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840:
virtual InstructionCost
llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &)
const: Assertion `!IsReverse() && "Inconsecutive memory access should
not have reverse order"' failed.
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang
-DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3
-std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o
-MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o
CMakeFiles/timberwolfmc.dir/finalpin.c.o -c
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
2026-03-31 15:53:01 +01:00
Mel Chen
f76f41f702
[VPlan] Extract reverse mask from reverse accesses (#155579)
Following #146525, separate the reverse mask from reverse access
recipes.
At the same time, remove the unused member variable `Reverse` from
`VPWidenMemoryRecipe`.
This will help to reduce redundant reverse mask computations by
VPlan-based common subexpression elimination.
2026-03-31 08:51:15 +00:00
Hassnaa Hamdi
1d48c3bc01
[NFC][LV] Separate control-flow masking from tail-folding masking. (#169509)
- Differentiate between operations that need masking because they are in
a conditionally-executed block, and operations that need masking because
the loop is tail-folded (predicated).

- This is needed for future work when we need to support a predicated
vector epilogue in combination with an unpredicated vector body.
- This is first patch in a series.
- See #181401 for the follow-on work.
2026-03-30 17:09:34 +01:00
Florian Hahn
c467d38090
[LV] Fix offset handling for epilogue resume values. (NFCI) (#189259)
Instead of replacing all uses of the canonical IV with an add of the
resume value and then relying on the fold to simplify, directly create
offset versions of both the canonical IV and its increment.

The original offset computation were incorrect, but not resulted in
mis-compiles due to the corresponding fold.

Split off from approved
https://github.com/llvm/llvm-project/pull/156262.
2026-03-29 17:04:50 +00:00
Florian Hahn
3b58a740d3
[VPlan] Move flags to separate variable (NFC).
Addresses post-commit feedback missed in
https://github.com/llvm/llvm-project/pull/188966.
2026-03-27 12:44:11 +00:00
Florian Hahn
90c1c588f8
[VPlan] Don't set WrapFlags for truncated IVs. (#188966)
The wrap flags from the IV bin-op are not guaranteed to apply to
truncated inductions, which are evaluated in narrower types.

Instead of dropping them late (in expandVPWidenIntOrFpInduction), do not
add them at the outset, the prevent invalid transforms based on
incorrect flags in the future.

PR: https://github.com/llvm/llvm-project/pull/188966
2026-03-27 12:39:03 +00:00
Florian Hahn
5aae014ed5
[LV] Refine tripcount estimate using minimum iteration count rt check. (#188135)
When not folding the tail the minimum iteration count check ensures that
the vector loop is not executed if computing the trip count wraps around
to zero, as the trip count must be at least VF when vectorizing without
tail-folding.

Add and use a new tryToRefineConstantMaxTripCount helper. This ensures
we do not create dead main loops when vectorizing the epilogue, as we
choose smaller main VFs.

PR: https://github.com/llvm/llvm-project/pull/188135
2026-03-26 20:48:53 +00:00
Florian Hahn
40304d8fef
Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)" (#188589)
This reverts commit e30f9c19464bcf1bf1e9f69b63884fb78ad2d05d.

Re-land, now that the reported crash causing the revert has been fixed
as part of 77fb84889 (#187504).

Original message:

Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-03-26 10:14:10 +00:00
Florian Hahn
77fb848894
Reapply "[LV] Simplify and unify resume value handling for epilogue vec." (#187504)
This reverts commit cdaf29f84dd0abbd1f961982799059c92d76625b.

This version skips removeBranchOnConst when vectorizing the epilogue, as
it may trigger folds that remove the resume phi used as resume value
from the epilogue.

This fixes https://github.com/llvm/llvm-project/issues/187323.

Original message:
This patch tries to drastically simplify resume value handling for the
scalar loop when vectorizing the epilogue.

It uses a simpler, uniform approach for updating all resume values in
the scalar loop:

1. Create ResumeForEpilogue recipes for all scalar resume phis in the
main loop (the epilogue plan will have exactly the same scalar resume
phis, in exactly the same order)
2. Update ::execute for ResumeForEpilogue to set the underlying value
when executing. This is not super clean, but allows easy lookup of the
generated IR value when we update the resume phis in the epilogue. Once
we connect the 2 plans together explicitly, this can be removed.
3. Use the list of ResumeForEpilogue VPInstructions from the main loop
to update the resume/bypass values from the epilogue.

This simplifies the code quite a bit, makes it more robust (should fix
https://github.com/llvm/llvm-project/issues/179407) and also fixes a
mis-compile in the existing tests (see change in

llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll,
where previously we would incorrectly resume using the start value when
the epilogue iteration check failed)

In some cases, we get simpler code, due to additional CSE, in some cases
the induction end value computations get moved from the epilogue
iteration check to the vector preheader. We could try to sink the
instructions as cleanup, but it is probably not worth the trouble.

Fixes https://github.com/llvm/llvm-project/issues/179407.

PR for recommit https://github.com/llvm/llvm-project/pull/188134
2026-03-23 22:09:40 +00:00
Alan Zhao
c624851037
[LoopVectorize] Fix an integer narrowing conversion in getPredBlockCostDivisor(...) (#187605)
`LoopVectorizationCostModel::getPredBlockCostDivisor(...)` may return
large `uint64_t` values that get coerced to an `unsigned` by
`VPCostContext::getPredBlockCostDivisor(...)`, which can cause division
by zero.

Fixes #187584
2026-03-23 17:22:05 +00:00
Florian Hahn
19b0c68ee0
[VPlan] Skip epilogue vectorization if dead after narrowing IGs. (#187016)
When narrowing interleave groups, the main vector loop processes IC
iterations instead of VF * IC. Update selectEpilogueVectorizationFactor
to use the effective VF, checking if the canonical IV controlling the
loop now steps by UF instead of VFxUF.

This avoids epilogue vectorization with dead epilogue vector loops and
also prevents crashes in cases where we can prove both the epilogue and
scalar loop are dead.

Fixes https://github.com/llvm/llvm-project/issues/186846

PR: https://github.com/llvm/llvm-project/pull/187016
2026-03-20 12:33:16 +00:00
Sander de Smalen
a971089cb8
[LV] Explain why a less profitable VF was chosen (NFCI) (#187469)
I was very puzzled the other day when it showed that VF 8 had a cost of
X and VF 16 had a cost of X/2, yet it still choose VF 8. This PR adds
some extra debug output to explain why this happens.
2026-03-20 07:21:17 +00:00
Florian Hahn
fd3cf1c160
[LV] Move dereferenceability check from Legal to VPlan (NFC) (#185323)
Instead of checking dereferenceability early during
LoopVectorizationLegality, defer the check to VPlan construction via
areAllLoadsDereferenceable.

This in preparation for supporting early exit vectorization of
non-dereferencable loads, e.g. via speculative loads
(https://discourse.llvm.org/t/rfc-provide-intrinsics-for-speculative-loads/89692)
or first-faulting loads. Detection in VPlan allows easily replacing
potentially non-deref loads with other loads as needed.

PR: https://github.com/llvm/llvm-project/pull/185323
2026-03-19 19:21:45 +00:00
Florian Hahn
cdaf29f84d
Revert "[LV] Simplify and unify resume value handling for epilogue vec." (#187504)
Reverts llvm/llvm-project#185969

This is suspected to cause a miscompile in 549.fotonik3d_r from SPEC 2017 FP
2026-03-19 14:38:37 +00:00
Graham Hunter
b227fab5a6
[NFC][LV] Introduce enums for uncountable exit detail and style (#184808)
Recursively splitting out some work from #183318; this covers
the enums for early exit loop type (none, readonly, readwrite)
and the style used (just readonly and
masked-handle-ee-in-scalar-tail for now) and refactoring for
basic use of those enums.
2026-03-19 14:17:25 +00:00
Florian Hahn
78a8f00977
Revert "[VPlan] Create header phis once regions have been created (NFC)."
This reverts commit 91b928f919364b29e241821fc639b9ef56dab1a5.

This complicates some analysis that need the happen on the scalar VPlan,
before regions have been created, e.g.
https://github.com/llvm/llvm-project/pull/185323/.
2026-03-19 12:53:12 +00:00
Elvis Wang
53f8f3b017
Reland [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068) (#187199)
This patch replace the remaining LogicalAnd to vp.merge in the second
pass to not break the `m_RemoveMask` pattern in the optimizeMaskToEVL.

Also skip cost model comparison when the plan contains `vp_merge` which
won't be calculated by the legacy model.

This can help to remove header mask for FindLast reduction (CSA) loops.

Original PR: https://github.com/llvm/llvm-project/pull/184068
Original built-bot failure:
https://lab.llvm.org/buildbot/#/builders/213/builds/2497
2026-03-19 07:56:42 +08:00
John Brawn
a083e19efe
[VPlan] Add the cost of spills when considering register pressure (#179646)
Currently when considering register pressure is enabled, we reject any
VF that has higher pressure than the number of registers. However this
can result in failing to vectorize in cases where it's beneficial, as
the cost of the extra spills is less than the benefit we get from
vectorizing.

Deal with this by instead calculating the cost of spills and adding that
to the rest of the cost, so we can detect this kind of situation and
still vectorize while avoiding vectorizing in cases where the extra cost
makes it not with it.
2026-03-18 15:30:39 +00:00
Alexis Engelke
080bc25728
[IR][NFCI] Remove *WithoutDebug (#187240)
The function instructionsWithoutDebug serves two uses: skipping debug
intrinsics and skipping pseudo instructions. Nonetheless, these
functions are expensive due to out-of-line filtering using
std::function. Ideally, the filter should be inlined, but that would
require including IntrinsicInst.h in BasicBlock.h.

We no longer use debug intrinsics, so the first use (parameter false) is
no longer needed. The second use is sometimes needed, but the
distinction between PseudoProbe instructions can be made at the call
sites more easily in many cases.

Therefore, remove instructionsWithoutDebug/sizeWithoutDebug.

c-t-t stage2-O3 -0.21%.
2026-03-18 15:08:41 +00:00
Florian Hahn
91b928f919
[VPlan] Create header phis once regions have been created (NFC).
Since 1b29ac1d1857ea42273fc7862ea019a74a55195d, regions are constructed
as part of the scalar transforms; moving header phi creation after
region creation slightly simplifies the code.
2026-03-17 08:02:56 +00:00
Florian Hahn
013f2542a2
[LV] Simplify and unify resume value handling for epilogue vec. (#185969)
This patch tries to drastically simplify resume value handling for the
scalar loop when vectorizing the epilogue.

It uses a simpler, uniform approach for updating all resume values in
the scalar loop:

1. Create ResumeForEpilogue recipes for all scalar resume phis in the
main loop (the epilogue plan will have exactly the same scalar resume
phis, in exactly the same order)
2. Update ::execute for ResumeForEpilogue to set the underlying value
when executing. This is not super clean, but allows easy lookup of the
generated IR value when we update the resume phis in the epilogue. Once
we connect the 2 plans together explicitly, this can be removed.
3. Use the list of ResumeForEpilogue VPInstructions from the main loop
to update the resume/bypass values from the epilogue.

This simplifies the code quite a bit, makes it more robust (should fix
https://github.com/llvm/llvm-project/issues/179407) and also fixes a
mis-compile in the existing tests (see change in
llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll,
where previously we would incorrectly resume using the start value when
the epilogue iteration check failed)

In some cases, we get simpler code, due to additional CSE, in some cases
the induction end value computations get moved from the epilogue
iteration check to the vector preheader. We could try to sink the
instructions as cleanup, but it is probably not worth the trouble.

Fixes https://github.com/llvm/llvm-project/issues/179407.
2026-03-16 21:21:59 +00:00
Florian Hahn
9de31c4a3e
[VPlan] Create zero resume value for CanIV directly (NFC).
The start value of the canonical IV is always 0. Assert and generate
zero VPValue manually in preparation for
https://github.com/llvm/llvm-project/pull/156262. Split off as
suggested.
2026-03-14 21:13:20 +00:00
Alexis Engelke
efcd3b6108
[IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596)
Refactor remaining parts of Transforms apart from Scalar and Utils.
2026-03-14 17:49:00 +00:00
Florian Hahn
1b29ac1d18
[LV] Move predication, early exit & region handling to VPlan0 (NFCI) (#185305)
Move handleEarlyExits, predication and region creation to operate
directly on VPlan0. This means they only have to run once, reducing
compile time a bit; the relative order remains unchanged.

Introducing the regions at this point in particular unlocks performing
more transforms once, on the initial VPlan, instead of running them for
each VF.

Whether a scalar epilogue is required is still determined by legacy cost
model, so we need to still account for that in the VF specific VPlan
logic.

PR: https://github.com/llvm/llvm-project/pull/185305
2026-03-14 17:14:08 +00:00
Florian Hahn
475cc4fe0b
[VPlan] Account for any-of costs in legacy cost model
Some VPlan transforms, like vectorizing fmin without fast-math,
introduce AnyOfs, which have costs assigned in the VPlan-based cost
model, but not the legacy cost model. Account for their cost like done
for other similar VPInstrctions, like EVL.

Fixes https://github.com/llvm/llvm-project/issues/185867.
2026-03-12 21:51:23 +00:00
Alexis Engelke
4fd826d1f9
[IR] Split Br into UncondBr and CondBr (#184027)
BranchInst currently represents both unconditional and conditional
branches. However, these are quite different operations that are often
handled separately. Therefore, split them into separate opcodes and
classes to allow distinguishing these operations in the type system.
Additionally, this also slightly improves compile-time performance.
2026-03-11 12:31:10 +00:00
Florian Hahn
c79a058a6a
[VPlan] Materialize VectorTripCount in narrowInterleaveGroups. (#182146)
When narrowInterleaveGroups transforms a plan, VF and VFxUF are
materialized (replaced with concrete values). This patch also
materializes the VectorTripCount in the same transform.

This ensures that VectorTripCount is properly computed when the narrow
interleave transform is applied, instead of using the original VF
+ UF to compute the vector trip count. The previous behavior generated
correct code, but executed fewer iterations in the vector loop.

The change also enables stricter verification prevent accesses of UF,
VF, VFxUF etc after materialization as follow-up.

Note that in some cases we no miss branch folding, but that should be
addressed separately, https://github.com/llvm/llvm-project/pull/181252

Fixes one of the violations accessing a VectorTripCount after UF and VF
being materialized

PR: https://github.com/llvm/llvm-project/pull/182146
2026-03-10 12:33:30 +00:00
Aiden Grossman
e30f9c1946 Revert "Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)""
This reverts commit 6aa115bba55054b0dc81ebfc049e8c7a29e614b2.

This is causing crashes. See #185345 for details.
2026-03-09 04:24:01 +00:00
Florian Hahn
6aa115bba5
Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit d7e037c8383e66e5c07897f144f6d8ef47258682.

Recommit with a small fix to properly handle ordered reductions when
connecting the epilogue.

Original message:

Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-03-08 11:13:40 +00:00
Florian Hahn
2ce5f91425
[VPlan] Optimize resume values of IVs together with other exit values. (#174239)
Remove updateScalarResumePhis and create extracts for live-outs early in
addInitialSkeleton. Instead of extracting the from the header phi
recipes for the resume values (which is incorrect), extract the last
lane of the backedege value.

Then update optimizeInductionExitUsers to optimize both the scalar
resume values for IVs and IV exit values together.

This removes the need to pass state between transforms and addresses a
TODO.

PR: https://github.com/llvm/llvm-project/pull/174239
2026-03-06 17:05:53 +00:00
Benjamin Maxwell
03c34bb59e
[LV] Support interleaving with FindLast reductions (#184099)
This extends the existing support to work with arbitrary interleave
factors. The main change here is reworking the ExtractLastActive
VPInstruction to take a variable amount of arguments and handling it in
unrollRecipeByUF and VPInstruction::generate.

The select condition for all mask/data values in a find-last recurrence
is the true if the mask for any part is true. Because of this the masks
for inactive parts will be updated to all-false when the parts with
active lanes are updated. This ensures the mask/data for last active
element always corresponds to the greatest part with an active lane.

This means finding the last element in the middle block simply requires
chaining the `extract.last.active` to forward the result from the last
active part through any inactive parts ahead of it.
2026-03-06 15:30:58 +00:00
Luke Lau
825129378e
[VPlan] Move tail folding out of VPlanPredicator. NFC (#176143)
Currently the logic for introducing a header mask and predicating the
vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform
so that VPlanPredicator.cpp doesn't need to worry about tail folding,
which seemed to be a temporary measure according to a comment in
VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the
vector loop region between the phis in the header and the branch + iv
increment in the latch:

Before:

```
+-------------------------------------------+
|%iv = ...                                  |
|...                                        |
|%iv.next = add %iv, vfxuf                  |
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+
```

After:
```
+-------------------------------------------+
|%iv = ...                                  |
|%wide.iv = widen-canonical-iv ...          |
|%header-mask = icmp ule %wide.iv, BTC      |---+
|branch-on-cond %header-mask                |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|...                                        |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|%iv.next = add %iv, vfxuf                  |<--+
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+
```

Phis are then inserted in the latch for any value in the loop body that
have outside uses, with poison as their incoming value from the header
edge.

The motivation for this is to allow us to share the same "predicate all
successor blocks" type of predication we do for tail folding, but for
early-exit loops in #172454. This may also allow us to directly emit an
EVL based header mask, instead of having to match + transform the
existing header mask in addExplicitVectorLength.

This also allows us to eventually handle recurrences in the same
transform, avoiding the need to special case tail folding in
addReductionResultComputation.
2026-03-05 08:17:37 +00:00
Benjamin Maxwell
c6bb6a7e42
[LV] Add -force-target-supports-masked-memory-ops option (#184325)
This can be used to make target agnostic tail-folding tests much less
verbose, as masked loads/stores can be used rather than scalar
predication.
2026-03-04 13:36:29 +00:00
Luke Lau
bcc272b322
[LV] Remove DataAndControlFlowWithoutRuntimeCheck. NFC (#183762)
After #144963 and #183292 we never emit the runtime check, so
DataAndControlFlowWithoutRuntimeCheck is equivalent to
DataAndControlFlow.

With that we only need to store one tail folding style instead of two,
because we don't need to distinguish whether or not the IV update
overflows (to a non-zero value)
2026-03-02 21:14:04 +08:00
Tomer Shafir
265c1f4833
[LV] Add debug print for TTI.MaxInterleaveFactor (NFC) (#183309)
As its not currently visible in the debug output.

---------

Co-authored-by: Sander de Smalen <sander.desmalen@arm.com>
2026-03-02 10:21:58 +02:00
Florian Hahn
3cf53f684d
[LV] Handle sunk reverse VPInstruction in planContainsAdditionalSimps.
Licm can now sink reverse VPInstructions outside the loop region; they
won't be considered when computing costs. Account for that in
planContainsAdditionalSimplifications.

Fixes https://github.com/llvm/llvm-project/issues/183592.
2026-03-01 18:44:46 +00:00
Benjamin Maxwell
74c0ee7e72
[TTI] Remove TargetLibraryInfo from IntrinsicCostAttributes (NFC) (#183764)
This is a remnant from when `sincos` costs used the vector mappings from
`TargetLibraryInfo::getVectorMappingInfo`.
2026-03-01 10:16:16 +00:00
Florian Hahn
d7e037c838
Revert "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit 9c53215d213189d1f62e8f6ee7ba73a089ac2269.

Appears to cause crashes with ordered reductions, revert while I
investigate
2026-02-27 21:29:41 +00:00
Florian Hahn
9c53215d21
[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)
Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-02-27 16:49:54 +00:00
Luke Lau
d43213fe80
Revert "[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)" (#183698)
This reverts commit b0b3e3e1c7f6387eabc2ef9ff1fea311e63a4299.

After thinking about this for a bit, I don't think this is correct.
vscale being a power-of-2 only guarantees the canonical IV increment
overflows to zero, but not overflows in general.
2026-02-27 09:13:33 +00:00
Luke Lau
b0b3e3e1c7
[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)
After #183080 vscale can no longer be a non-power of 2, which means the
canonical IV can't overflow with tail folding w/ scalable vectors
anymore. Therefore we don't need to drop the NUW flag.

IVUpdateMayOverflow is left to be removed in a separate PR since it
removes further runtime checks.
2026-02-27 07:19:49 +00:00
Florian Hahn
32b8b9ba1e
[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)
Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead the vector trip count.

This allows removing the induction increment operand only used when
tail-folding.

PR: https://github.com/llvm/llvm-project/pull/182507
2026-02-26 11:48:04 +00:00
Benjamin Maxwell
3c566a698a
[LV] Fix miscompile with conditional scalar assignment + tail folding (#182492)
Previously, we could miscompile when vectorizing conditional scalar
assignments with forced tail folding, as the backedge select could be
based on the header mask, not the assignment conditional.

This resulted in a number of failures in the LLVM test suite when
building with `-O3 -march=armv8-a+sve -mllvm
-prefer-predicate-over-epilogue=predicate-dont-vectorize`.

The patch reworks `handleFindLastReductions()` to correctly handle tail
folding.
2026-02-26 09:00:16 +00:00
Luke Lau
1e560c181a
[LV] Remove CheckNeededWithTailFolding from addMinimumIterationCheck. NFC (#183066)
The IV can no longer overflow with tail folding after #183080.
2026-02-25 18:08:16 +00:00
Luke Lau
a8f5f4a9fc
[VPlan] Assert vplan-verify-each result and fix LastActiveLane verification (#182254)
Currently if -vplan-verify-each is enabled and a pass fails the
verifier, it will output the failure to stderr but will still finish
with a zero exit code.

This adds an assert that the verification fails so that e.g. lit will
pick up verifier failures in the in-tree tests with an EXPENSIVE_CHECKS
build.

Currently the LastActiveLane verification fails in several tests, so
this also includes a fix to handle more prefix masks. All of the prefix
masks that the verifier encounters are of the form `icmp ult/ule
monotonically-increasing-sequence, uniform`, which always generate a
prefix mask.

Tested that llvm-test-suite + SPEC CPU 2017 now pass with
-vplan-verify-each enabled for RISC-V.
2026-02-26 01:49:39 +08:00
Paul Walker
ab360b1e7e
[LLVM][TTI] Remove the isVScaleKnownToBeAPowerOfTwo hook. (#183292)
After https://github.com/llvm/llvm-project/pull/183080 this is no longer
a configurable property.

NOTE: No test changes expected beyond
llvm/test/Transforms/LoopVectorize/scalable-predication.ll which has
been removed because it only existed to verfiy the now unsupported
functionality.
2026-02-25 14:09:52 +00:00