2925 Commits

Author SHA1 Message Date
Florian Hahn
cdaf29f84d
Revert "[LV] Simplify and unify resume value handling for epilogue vec." (#187504)
Reverts llvm/llvm-project#185969

This is suspected to cause a miscompile in 549.fotonik3d_r from SPEC 2017 FP
2026-03-19 14:38:37 +00:00
Graham Hunter
b227fab5a6
[NFC][LV] Introduce enums for uncountable exit detail and style (#184808)
Recursively splitting out some work from #183318; this covers
the enums for early exit loop type (none, readonly, readwrite)
and the style used (just readonly and
masked-handle-ee-in-scalar-tail for now) and refactoring for
basic use of those enums.
2026-03-19 14:17:25 +00:00
Florian Hahn
78a8f00977
Revert "[VPlan] Create header phis once regions have been created (NFC)."
This reverts commit 91b928f919364b29e241821fc639b9ef56dab1a5.

This complicates some analyses that need to happen on the scalar VPlan,
before regions have been created, e.g.
https://github.com/llvm/llvm-project/pull/185323/.
2026-03-19 12:53:12 +00:00
Elvis Wang
53f8f3b017
Reland [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068) (#187199)
This patch replaces the remaining LogicalAnd with vp.merge in the second
pass to not break the `m_RemoveMask` pattern in optimizeMaskToEVL.

Also skip cost model comparison when the plan contains `vp_merge` which
won't be calculated by the legacy model.

This can help to remove header mask for FindLast reduction (CSA) loops.

Original PR: https://github.com/llvm/llvm-project/pull/184068
Original buildbot failure:
https://lab.llvm.org/buildbot/#/builders/213/builds/2497
2026-03-19 07:56:42 +08:00
John Brawn
a083e19efe
[VPlan] Add the cost of spills when considering register pressure (#179646)
Currently when considering register pressure is enabled, we reject any
VF that has higher pressure than the number of registers. However this
can result in failing to vectorize in cases where it's beneficial, as
the cost of the extra spills is less than the benefit we get from
vectorizing.

Deal with this by instead calculating the cost of spills and adding that
to the rest of the cost, so we can detect this kind of situation and
still vectorize while avoiding vectorizing in cases where the extra cost
makes it not worth it.
2026-03-18 15:30:39 +00:00
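A rough sketch of the idea in a083e19efe, not the actual cost model: instead of rejecting a VF whose register pressure exceeds the register count, charge a penalty per excess register and compare the totals. The function names, the `spill_cost_per_reg` parameter and the linear penalty are all assumptions made for illustration.

```python
def cost_with_spills(base_cost, pressure, num_regs, spill_cost_per_reg):
    """Cost of one vector iteration plus a penalty for each register over budget."""
    excess = max(0, pressure - num_regs)
    return base_cost + excess * spill_cost_per_reg


def pick_vf(candidates, num_regs, spill_cost_per_reg):
    """candidates maps VF -> (cost per vector iteration, register pressure).

    Returns the VF with the lowest cost per scalar iteration; ties keep the
    first candidate seen.
    """
    best_vf, best = None, None
    for vf, (base_cost, pressure) in candidates.items():
        total = cost_with_spills(base_cost, pressure, num_regs, spill_cost_per_reg)
        per_lane = total / vf
        if best is None or per_lane < best:
            best_vf, best = vf, per_lane
    return best_vf
```

With cheap spills a VF that slightly exceeds the register budget can still win; with expensive spills the scalar loop is kept.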
Alexis Engelke
080bc25728
[IR][NFCI] Remove *WithoutDebug (#187240)
The function instructionsWithoutDebug serves two uses: skipping debug
intrinsics and skipping pseudo instructions. Nonetheless, these
functions are expensive due to out-of-line filtering using
std::function. Ideally, the filter should be inlined, but that would
require including IntrinsicInst.h in BasicBlock.h.

We no longer use debug intrinsics, so the first use (parameter false) is
no longer needed. The second use is sometimes needed, but the
distinction between PseudoProbe instructions can be made at the call
sites more easily in many cases.

Therefore, remove instructionsWithoutDebug/sizeWithoutDebug.

c-t-t stage2-O3 -0.21%.
2026-03-18 15:08:41 +00:00
Florian Hahn
91b928f919
[VPlan] Create header phis once regions have been created (NFC).
Since 1b29ac1d1857ea42273fc7862ea019a74a55195d, regions are constructed
as part of the scalar transforms; moving header phi creation after
region creation slightly simplifies the code.
2026-03-17 08:02:56 +00:00
Florian Hahn
013f2542a2
[LV] Simplify and unify resume value handling for epilogue vec. (#185969)
This patch tries to drastically simplify resume value handling for the
scalar loop when vectorizing the epilogue.

It uses a simpler, uniform approach for updating all resume values in
the scalar loop:

1. Create ResumeForEpilogue recipes for all scalar resume phis in the
main loop (the epilogue plan will have exactly the same scalar resume
phis, in exactly the same order)
2. Update ::execute for ResumeForEpilogue to set the underlying value
when executing. This is not super clean, but allows easy lookup of the
generated IR value when we update the resume phis in the epilogue. Once
we connect the 2 plans together explicitly, this can be removed.
3. Use the list of ResumeForEpilogue VPInstructions from the main loop
to update the resume/bypass values from the epilogue.

This simplifies the code quite a bit, makes it more robust (should fix
https://github.com/llvm/llvm-project/issues/179407) and also fixes a
mis-compile in the existing tests (see change in
llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll,
where previously we would incorrectly resume using the start value when
the epilogue iteration check failed)

In some cases, we get simpler code, due to additional CSE, in some cases
the induction end value computations get moved from the epilogue
iteration check to the vector preheader. We could try to sink the
instructions as cleanup, but it is probably not worth the trouble.

Fixes https://github.com/llvm/llvm-project/issues/179407.
2026-03-16 21:21:59 +00:00
Florian Hahn
9de31c4a3e
[VPlan] Create zero resume value for CanIV directly (NFC).
The start value of the canonical IV is always 0. Assert and generate
zero VPValue manually in preparation for
https://github.com/llvm/llvm-project/pull/156262. Split off as
suggested.
2026-03-14 21:13:20 +00:00
Alexis Engelke
efcd3b6108
[IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596)
Refactor remaining parts of Transforms apart from Scalar and Utils.
2026-03-14 17:49:00 +00:00
Florian Hahn
1b29ac1d18
[LV] Move predication, early exit & region handling to VPlan0 (NFCI) (#185305)
Move handleEarlyExits, predication and region creation to operate
directly on VPlan0. This means they only have to run once, reducing
compile time a bit; the relative order remains unchanged.

Introducing the regions at this point in particular unlocks performing
more transforms once, on the initial VPlan, instead of running them for
each VF.

Whether a scalar epilogue is required is still determined by legacy cost
model, so we need to still account for that in the VF specific VPlan
logic.

PR: https://github.com/llvm/llvm-project/pull/185305
2026-03-14 17:14:08 +00:00
Florian Hahn
475cc4fe0b
[VPlan] Account for any-of costs in legacy cost model
Some VPlan transforms, like vectorizing fmin without fast-math,
introduce AnyOfs, which have costs assigned in the VPlan-based cost
model, but not the legacy cost model. Account for their cost like done
for other similar VPInstructions, like EVL.

Fixes https://github.com/llvm/llvm-project/issues/185867.
2026-03-12 21:51:23 +00:00
Alexis Engelke
4fd826d1f9
[IR] Split Br into UncondBr and CondBr (#184027)
BranchInst currently represents both unconditional and conditional
branches. However, these are quite different operations that are often
handled separately. Therefore, split them into separate opcodes and
classes to allow distinguishing these operations in the type system.
Additionally, this also slightly improves compile-time performance.
2026-03-11 12:31:10 +00:00
Florian Hahn
c79a058a6a
[VPlan] Materialize VectorTripCount in narrowInterleaveGroups. (#182146)
When narrowInterleaveGroups transforms a plan, VF and VFxUF are
materialized (replaced with concrete values). This patch also
materializes the VectorTripCount in the same transform.

This ensures that VectorTripCount is properly computed when the narrow
interleave transform is applied, instead of using the original VF
+ UF to compute the vector trip count. The previous behavior generated
correct code, but executed fewer iterations in the vector loop.

The change also enables stricter verification to prevent accesses of UF,
VF, VFxUF etc. after materialization as a follow-up.

Note that in some cases we now miss branch folding, but that should be
addressed separately, https://github.com/llvm/llvm-project/pull/181252

Fixes one of the violations accessing a VectorTripCount after UF and VF
have been materialized.

PR: https://github.com/llvm/llvm-project/pull/182146
2026-03-10 12:33:30 +00:00
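For context on c79a058a6a, a minimal sketch of how a vector trip count depends on the step. It assumes the common "round down to a multiple of VF x UF" definition (no tail folding, no required scalar epilogue); the function name and the example numbers are illustrative only.

```python
def vector_trip_count(trip_count, vf, uf):
    """Largest multiple of VF * UF that does not exceed the scalar trip count."""
    step = vf * uf
    return trip_count - trip_count % step
```

Under these assumptions, `vector_trip_count(10, 4, 1)` covers 8 iterations while `vector_trip_count(10, 2, 1)` covers all 10, which is the kind of difference the commit describes when the count is computed from the original rather than the materialized values.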
Aiden Grossman
e30f9c1946
Revert "Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)""
This reverts commit 6aa115bba55054b0dc81ebfc049e8c7a29e614b2.

This is causing crashes. See #185345 for details.
2026-03-09 04:24:01 +00:00
Florian Hahn
6aa115bba5
Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit d7e037c8383e66e5c07897f144f6d8ef47258682.

Recommit with a small fix to properly handle ordered reductions when
connecting the epilogue.

Original message:

Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-03-08 11:13:40 +00:00
Florian Hahn
2ce5f91425
[VPlan] Optimize resume values of IVs together with other exit values. (#174239)
Remove updateScalarResumePhis and create extracts for live-outs early in
addInitialSkeleton. Instead of extracting from the header phi
recipes for the resume values (which is incorrect), extract the last
lane of the backedge value.

Then update optimizeInductionExitUsers to optimize both the scalar
resume values for IVs and IV exit values together.

This removes the need to pass state between transforms and addresses a
TODO.

PR: https://github.com/llvm/llvm-project/pull/174239
2026-03-06 17:05:53 +00:00
Benjamin Maxwell
03c34bb59e
[LV] Support interleaving with FindLast reductions (#184099)
This extends the existing support to work with arbitrary interleave
factors. The main change here is reworking the ExtractLastActive
VPInstruction to take a variable amount of arguments and handling it in
unrollRecipeByUF and VPInstruction::generate.

The select condition for all mask/data values in a find-last recurrence
is true if the mask for any part is true. Because of this, the masks
for inactive parts will be updated to all-false when the parts with
active lanes are updated. This ensures the mask/data for last active
element always corresponds to the greatest part with an active lane.

This means finding the last element in the middle block simply requires
chaining the `extract.last.active` to forward the result from the last
active part through any inactive parts ahead of it.
2026-03-06 15:30:58 +00:00
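The chaining described in 03c34bb59e can be modelled roughly as below. This is a simplified sketch, not the `ExtractLastActive` VPInstruction itself: each unrolled part is a hypothetical `(mask, data)` pair of Python lists, and the helper names are made up.

```python
def extract_last_active(mask, data, fallback):
    """Return data at the highest lane whose mask bit is set, else fallback."""
    for lane in reversed(range(len(mask))):
        if mask[lane]:
            return data[lane]
    return fallback


def find_last_across_parts(parts, start):
    """parts is a list of (mask, data) pairs, ordered from part 0 upwards."""
    result = start
    for mask, data in parts:
        # An all-false part simply forwards the result of the earlier parts.
        result = extract_last_active(mask, data, result)
    return result
```

Because inactive parts forward their input, the value from the greatest part with an active lane survives to the end of the chain.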
Luke Lau
825129378e
[VPlan] Move tail folding out of VPlanPredicator. NFC (#176143)
Currently the logic for introducing a header mask and predicating the
vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform
so that VPlanPredicator.cpp doesn't need to worry about tail folding,
which seemed to be a temporary measure according to a comment in
VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the
vector loop region between the phis in the header and the branch + iv
increment in the latch:

Before:

```
+-------------------------------------------+
|%iv = ...                                  |
|...                                        |
|%iv.next = add %iv, vfxuf                  |
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+
```

After:
```
+-------------------------------------------+
|%iv = ...                                  |
|%wide.iv = widen-canonical-iv ...          |
|%header-mask = icmp ule %wide.iv, BTC      |---+
|branch-on-cond %header-mask                |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|...                                        |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|%iv.next = add %iv, vfxuf                  |<--+
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+
```

Phis are then inserted in the latch for any value in the loop body that
has outside uses, with poison as their incoming value from the header
edge.

The motivation for this is to allow us to share the same "predicate all
successor blocks" type of predication we do for tail folding, but for
early-exit loops in #172454. This may also allow us to directly emit an
EVL based header mask, instead of having to match + transform the
existing header mask in addExplicitVectorLength.

This also allows us to eventually handle recurrences in the same
transform, avoiding the need to special case tail folding in
addReductionResultComputation.
2026-03-05 08:17:37 +00:00
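To make the `%header-mask = icmp ule %wide.iv, BTC` line in 825129378e concrete, here is a small model of the header mask for one vector iteration. It assumes BTC is the backedge-taken count (trip count minus one) and that the wide IV is the scalar IV plus the lane number; the function name is hypothetical.

```python
def header_mask(iv, vf, trip_count):
    """Lanes of one vector iteration that map to real scalar iterations."""
    btc = trip_count - 1  # backedge-taken count; assumes trip_count >= 1
    wide_iv = [iv + lane for lane in range(vf)]
    # Unsigned "less than or equal" per lane; all values here are non-negative.
    return [i <= btc for i in wide_iv]
```

For a loop of 6 iterations at VF 4, the first vector iteration is fully active and the second has only its first two lanes enabled.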
Benjamin Maxwell
c6bb6a7e42
[LV] Add -force-target-supports-masked-memory-ops option (#184325)
This can be used to make target agnostic tail-folding tests much less
verbose, as masked loads/stores can be used rather than scalar
predication.
2026-03-04 13:36:29 +00:00
Luke Lau
bcc272b322
[LV] Remove DataAndControlFlowWithoutRuntimeCheck. NFC (#183762)
After #144963 and #183292 we never emit the runtime check, so
DataAndControlFlowWithoutRuntimeCheck is equivalent to
DataAndControlFlow.

With that we only need to store one tail folding style instead of two,
because we don't need to distinguish whether or not the IV update
overflows (to a non-zero value)
2026-03-02 21:14:04 +08:00
Tomer Shafir
265c1f4833
[LV] Add debug print for TTI.MaxInterleaveFactor (NFC) (#183309)
As it's not currently visible in the debug output.

---------

Co-authored-by: Sander de Smalen <sander.desmalen@arm.com>
2026-03-02 10:21:58 +02:00
Florian Hahn
3cf53f684d
[LV] Handle sunk reverse VPInstruction in planContainsAdditionalSimps.
Licm can now sink reverse VPInstructions outside the loop region; they
won't be considered when computing costs. Account for that in
planContainsAdditionalSimplifications.

Fixes https://github.com/llvm/llvm-project/issues/183592.
2026-03-01 18:44:46 +00:00
Benjamin Maxwell
74c0ee7e72
[TTI] Remove TargetLibraryInfo from IntrinsicCostAttributes (NFC) (#183764)
This is a remnant from when `sincos` costs used the vector mappings from
`TargetLibraryInfo::getVectorMappingInfo`.
2026-03-01 10:16:16 +00:00
Florian Hahn
d7e037c838
Revert "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)"
This reverts commit 9c53215d213189d1f62e8f6ee7ba73a089ac2269.

Appears to cause crashes with ordered reductions, revert while I
investigate
2026-02-27 21:29:41 +00:00
Florian Hahn
9c53215d21
[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)
Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-02-27 16:49:54 +00:00
Luke Lau
d43213fe80
Revert "[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)" (#183698)
This reverts commit b0b3e3e1c7f6387eabc2ef9ff1fea311e63a4299.

After thinking about this for a bit, I don't think this is correct.
vscale being a power-of-2 only guarantees the canonical IV increment
overflows to zero, but not overflows in general.
2026-02-27 09:13:33 +00:00
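The reasoning behind the revert in d43213fe80 can be illustrated with a toy model that uses a deliberately small IV width. The function and its parameters are hypothetical; it only shows that a power-of-2 step makes the increment wrap to exactly zero, which is still a wrap and so does not justify a no-unsigned-wrap flag.

```python
def iv_values_until_wrap(step, bits):
    """Canonical IV values starting at 0, up to and including the first wrapped value.

    Assumes step > 0.
    """
    modulus = 1 << bits
    values = [0]
    while True:
        nxt = values[-1] + step
        if nxt >= modulus:
            values.append(nxt % modulus)  # the increment wrapped
            return values
        values.append(nxt)
```

With a 4-bit IV, a step of 4 wraps from 12 back to 0, whereas a step of 3 wraps from 15 to 2.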
Luke Lau
b0b3e3e1c7
[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)
After #183080 vscale can no longer be a non-power of 2, which means the
canonical IV can't overflow with tail folding w/ scalable vectors
anymore. Therefore we don't need to drop the NUW flag.

IVUpdateMayOverflow is left to be removed in a separate PR since it
removes further runtime checks.
2026-02-27 07:19:49 +00:00
Florian Hahn
32b8b9ba1e
[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)
Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead of the vector trip count.

This allows removing the induction increment operand only used when
tail-folding.

PR: https://github.com/llvm/llvm-project/pull/182507
2026-02-26 11:48:04 +00:00
Benjamin Maxwell
3c566a698a
[LV] Fix miscompile with conditional scalar assignment + tail folding (#182492)
Previously, we could miscompile when vectorizing conditional scalar
assignments with forced tail folding, as the backedge select could be
based on the header mask, not the assignment conditional.

This resulted in a number of failures in the LLVM test suite when
building with `-O3 -march=armv8-a+sve -mllvm
-prefer-predicate-over-epilogue=predicate-dont-vectorize`.

The patch reworks `handleFindLastReductions()` to correctly handle tail
folding.
2026-02-26 09:00:16 +00:00
Luke Lau
1e560c181a
[LV] Remove CheckNeededWithTailFolding from addMinimumIterationCheck. NFC (#183066)
The IV can no longer overflow with tail folding after #183080.
2026-02-25 18:08:16 +00:00
Luke Lau
a8f5f4a9fc
[VPlan] Assert vplan-verify-each result and fix LastActiveLane verification (#182254)
Currently if -vplan-verify-each is enabled and a pass fails the
verifier, it will output the failure to stderr but will still finish
with a zero exit code.

This adds an assert on the verification result so that e.g. lit will
pick up verifier failures in the in-tree tests with an EXPENSIVE_CHECKS
build.

Currently the LastActiveLane verification fails in several tests, so
this also includes a fix to handle more prefix masks. All of the prefix
masks that the verifier encounters are of the form `icmp ult/ule
monotonically-increasing-sequence, uniform`, which always generate a
prefix mask.

Tested that llvm-test-suite + SPEC CPU 2017 now pass with
-vplan-verify-each enabled for RISC-V.
2026-02-26 01:49:39 +08:00
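A small model of the prefix-mask claim in a8f5f4a9fc: comparing a monotonically increasing sequence against a uniform bound with `ult`/`ule` yields some number of true lanes followed only by false lanes. Both helpers are illustrative and are not the verifier's code.

```python
def is_prefix_mask(mask):
    """True if the mask is some number of True lanes followed only by False lanes."""
    seen_false = False
    for bit in mask:
        if bit and seen_false:
            return False
        if not bit:
            seen_false = True
    return True


def compare_mask(sequence, bound, inclusive=False):
    """Model of `icmp ult/ule sequence, bound` with the bound broadcast to every lane."""
    return [(x <= bound) if inclusive else (x < bound) for x in sequence]
```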
Paul Walker
ab360b1e7e
[LLVM][TTI] Remove the isVScaleKnownToBeAPowerOfTwo hook. (#183292)
After https://github.com/llvm/llvm-project/pull/183080 this is no longer
a configurable property.

NOTE: No test changes expected beyond
llvm/test/Transforms/LoopVectorize/scalable-predication.ll which has
been removed because it only existed to verify the now unsupported
functionality.
2026-02-25 14:09:52 +00:00
Luke Lau
0e9880cc04
[VPlan] Remove verifyLate from VPlanVerifier. NFC (#182799)
We can instead just check if the VPlan has been unrolled.
2026-02-23 23:06:37 +08:00
Jonas Paulsson
d3081aafc4
[SystemZ, LoopVectorizer] Enable vectorization of epilogue loops. (#172925)
This enables vectorization of epilogue loops produced by LoopVectorizer on
SystemZ.

LoopVectorizationCostModel::isEpilogueVectorizationProfitable() and
TTI.preferEpilogueVectorization() have been refactored slightly so that
targets can override preferEpilogueVectorization(ElementCount Iters) and
directly control this, whereas before this depended on
TTI.getMaxInterleaveFactor() as well.

The Iters passed to preferEpilogueVectorization() reflects the total number
of scalar iterations performed in the vectorized loop (including interleaving).

The default implementation of preferEpilogueVectorization() now subsumes
the old check against getMaxInterleaveFactor(). This patch should be NFC for
other targets.
2026-02-22 10:59:09 -06:00
Florian Hahn
4ce4987381
[VPlan] Optimize FindLast of FindIV w/o sentinel. (#172569)
For FindLast reduction selecting an IV, we can avoid the horizontal
AnyOf in the vector loop, by introducing an independent boolean
reduction to track if the condition was ever true in the loop. If it was
never true in the loop, we select the start value, otherwise we select
the min/max of the FindIV reduction, as required by the predicate.

The main advantage of this approach is that we have 2 independent
reductions, that do not require a horizontal AnyOf reduction in the
loop.

Currently this requires a non-wrapping IV, but this can be relaxed in
the future by selecting a canonical IV, which is then transformed to the
specific derived IV for the reduction after the loop.

Depends on https://github.com/llvm/llvm-project/pull/177870.

PR: https://github.com/llvm/llvm-project/pull/172569
2026-02-20 21:48:35 +00:00
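A scalar sketch of the two-reduction scheme from 4ce4987381, assuming an increasing, non-wrapping IV that starts at 0 and a "find last" (max) flavour of the reduction. The function name and the use of a Python predicate are illustrative.

```python
def find_last_iv(values, predicate, start):
    """Index of the last element satisfying predicate, or start if none does."""
    any_true = False   # boolean reduction: did the condition ever hold?
    max_iv = 0         # max reduction over the IVs where the condition held
    for iv, value in enumerate(values):
        if predicate(value):
            any_true = True
            max_iv = max(max_iv, iv)
    # The two reductions are only combined after the loop.
    return max_iv if any_true else start
```

The separate boolean is what distinguishes "matched at index 0" from "never matched" without reserving a sentinel value.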
Luke Lau
6a5375fbce
[VPlan] Plumb recurrence FMFs through VPReductionPHIRecipe via VPIRFlags. NFC (#181694)
In order to be able to create selects for reduction phis through tail
folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an
instance of VPIRFlags and plumb the FMFs from the original RdxDesc.

This allows us to remove more uses of the RecurrenceDescriptor in
addReductionResultComputation, which should help untie it from
LoopVectorizationLegality.
2026-02-19 11:23:47 +00:00
Florian Hahn
4042975b63
[LV] Support argmin/argmax with strict predicates. (#170223)
Extend handleMultiUseReductions to support strict predicates (>, <),
matching the first index instead of the last for non-strict predicates.

Builds on top of https://github.com/llvm/llvm-project/pull/141431.

FindLast reductions with strict predicates are adjusted to compute the
correct result as follows:

1. Find the first canonical indices corresponding to partial min/max
   values, using loop reductions.
2. Find which of the partial min/max values are equal to the overall
    min/max value.
3. Select among the canonical indices those corresponding to the overall
    min/max value.
4. Find the first canonical index of overall min/max and scale it back to
    the original IV using VPDerivedIVRecipe.
5. If the overall min/max equals the starting min/max, the condition in
    the loop was always false, due to being strict; return the original start 
    value in that case.
2026-02-19 10:52:27 +00:00
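The five steps in 4042975b63 can be approximated by the following lane-wise model of a strict argmin. It is a simplification: it tracks original indices directly rather than canonical indices scaled back through VPDerivedIVRecipe, and the function name and `vf` parameter are hypothetical.

```python
def argmin_strict(values, start_min, start_idx, vf=4):
    """First index whose value is strictly below every earlier value and start_min."""
    lane_min = [start_min] * vf
    lane_idx = [0] * vf  # only meaningful for lanes that found a smaller value
    for base in range(0, len(values), vf):
        for lane in range(vf):
            i = base + lane
            # Strict compare: a lane keeps the first index of its partial minimum.
            if i < len(values) and values[i] < lane_min[lane]:
                lane_min[lane] = values[i]
                lane_idx[lane] = i
    overall = min(lane_min)
    if overall == start_min:
        # The strict condition never fired in any lane; keep the start values.
        return start_min, start_idx
    # Among lanes holding the overall minimum, pick the smallest (first) index.
    candidates = [lane_idx[l] for l in range(vf) if lane_min[l] == overall]
    return overall, min(candidates)
```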
Sander de Smalen
114e20805f
[LV] Fix sub-reduction PHI in vectorized epilogue (#182072)
When the vectorized epilogue loop uses partial reductions, the PHI node
in the loop must start at 0 (because for partial sub-reductions the
sub is done in the middle block) and the compute-reduction-result must
subtract from the partial result (as calculated in the middle block of
the main vector loop), instead of subtracting from the original init
value.

This fixes the issue as reported on #178919 by @aeubanks.
2026-02-19 10:15:32 +00:00
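A scalar sketch of the arithmetic behind 114e20805f, assuming a reduction of the form `init - sum(values)` split between a main vector loop and a vectorized epilogue. The function name and the `main_count` split point are illustrative.

```python
def sub_reduction_with_epilogue(init, values, main_count):
    """Compute init - sum(values), split across a main vector loop and an epilogue."""
    # Main vector loop: accumulate from 0, subtract in its middle block.
    main_result = init - sum(values[:main_count])
    # Epilogue: the PHI again starts at 0 ...
    epilogue_acc = sum(values[main_count:])
    # ... and its sum is subtracted from the main loop's partial result,
    # not from the original init value.
    return main_result - epilogue_acc
```

Subtracting the epilogue sum from `init` instead would drop the contribution of the main loop.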
Florian Hahn
cb87d346b4
[VPlan] Retrieve vector trip count from BranchOnCount (NFC).
In preparePlanForMainVectorLoop, the vector trip count may already be
materialized, e.g. when narrowing interleave groups. VPSymbolicValues
should not be accessed after materializing. Instead retrieve the trip
count directly from the branch-on-count. This is NFC at the moment, but
is needed to tighten VPSymbolicValue access verification.
2026-02-18 21:08:33 +00:00
Shih-Po Hung
97fa3e5936
[NFC][VPlan] Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe (#177114)
This is groundwork for #151300, which aims to support first-faulting
loads in non-tail-folded early-exit loops.
Per #175900, we need a variable-length stepping transform that can be
shared between EVL and non-EVL loops.
The idea is to have an EVL-independent counter and transform for
tracking the cumulative number of processed elements.

This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and
transform (canonicalizeEVLLoops) to be EVL-independent:
- Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe to
  reflect its general purpose of tracking processed element count.
- Rename canonicalizeEVLLoops to convertToVariableLengthStep.

This is NFC.
2026-02-18 07:04:58 +00:00
Andrei Elovikov
15057eb8ce
[VPlan] Add VPlan-dump-based test for predication (#180794)
2026-02-17 19:14:29 +08:00
Florian Hahn
6f253e87dd
Reapply "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

The underlying issue causing the revert has been fixed independently
as 301fa24671256734df6b7ee65f23ad885400108e.

Original message:
Move narrowInterleaveGroups to the general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows taking the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2026-02-15 20:10:10 +00:00
Florian Hahn
f3a816598d
[VPlan] Add VPSymbolicValue for UF. (NFC)
Add a symbolic unroll factor (UF) to VPlan similar to VF & VFxUF that
gets replaced with the concrete UF during plan execution, similar to how VF
is used for the vectorization factor. This is a preparatory change that
allows transforms to use the symbolic UF before the concrete UF is
determined.

Note that the old getUF that returns the concrete UF after unrolling has
been renamed to getConcreteUF.

Split off from the re-commit of 8d29d093096
(https://github.com/llvm/llvm-project/pull/149706) as suggested.
2026-02-15 15:24:35 +00:00
Florian Hahn
b3dcf485d2
[VPlan] Compute NumPredStores for VPReplicateRecipe costs in VPlan.
Compute the number of predicated stores directly in VPlan instead of
using CM.useEmulatedMaskMemRefHack(), which will only account for the
number of predicated stores for the last VF the legacy cost model
considered.

Fixes https://github.com/llvm/llvm-project/issues/181183
2026-02-13 21:16:53 +00:00
Florian Hahn
ede1a9626b
[LV] Vectorize early exit loops with multiple exits. (#174864)
Building on top of the recent changes to introduce BranchOnTwoConds,
this patch adds support for vectorizing loops with multiple early exits,
all dominating a countable latch. The early exits must form a
dominance chain, so we can simply check which early exit has been taken
in dominance order.

Currently LoopVectorizationLegality ensures that all exits other than
the latch must be uncountable. handleUncountableEarlyExits now collects
those uncountable exits and processes each exit.

In the vector region, we compute if any exit has been taken, by taking
the OR of all early exit conditions (EarlyExitConds) and checking if
there's any active lane.

If the early exit is taken, we exit the loop and compute which early
exit has been taken. The first taken early exit is the one where its
exit condition is true in the first active lane of EarlyExitConds.

We create a chain of dispatch blocks outside the loop to check this for
the early exit blocks ordered by dominance.

Depends on https://github.com/llvm/llvm-project/pull/174016.

PR: https://github.com/llvm/llvm-project/pull/174864
2026-02-13 16:44:23 +00:00
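The exit selection described in ede1a9626b can be modelled as below. This is a sketch over plain Python lists, not the generated VPlan: `exit_conds[e][lane]` stands for the per-lane condition of early exit `e`, with exits assumed to be listed in dominance order.

```python
def first_taken_exit(exit_conds):
    """Return the index of the early exit taken, or None if no lane exits."""
    lanes = len(exit_conds[0])
    # OR of all early exit conditions, per lane.
    combined = [any(cond[lane] for cond in exit_conds) for lane in range(lanes)]
    if not any(combined):
        return None  # no active lane: stay in the vector loop
    first_lane = combined.index(True)
    # Dispatch chain: test the exits in dominance order at the first active lane.
    for exit_index, cond in enumerate(exit_conds):
        if cond[first_lane]:
            return exit_index
    return None  # not reached: some condition is true at first_lane
```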
Florian Hahn
a55fbab0cf
[VPlan] Run initial recipe simplification on VPlan0. (#176828)
In some cases, LV gets simplifiable IR as input. Directly apply
simplifications on the initial VPlan0 to avoid vectorization in cases
where the loop body can be folded away.

Using the end-to-end pipeline, this is relatively rare, but when
reducing test cases, the reduction often ends up with cases with trivial
folds. Rejecting those will result in more robust & realistic test
cases.

As follow-up, I also plan to add initial dead recipe removal.

Depends on https://github.com/llvm/llvm-project/pull/176795.

PR: https://github.com/llvm/llvm-project/pull/176828
2026-02-13 12:01:22 +00:00
Florian Hahn
d3afa171ee
[LV] Don't scalarize loads that need predication in legacy CM.
The legacy cost model tries to scalarize loads that are used as
pointers. Skip if the load would need predicating when scalarized,
because that would incur very high costs, see useEmulatedMaskMemRefHack.

Fixes https://github.com/llvm/llvm-project/issues/180780.
2026-02-11 20:52:08 +00:00
Jonas Paulsson
d80a729572
[LoopVectorizer] Rename variable (NFC). (#180585)
Since TargetTransformInfo::enableAggressiveInterleaving(bool
HasReductions) takes the HasReductions argument, the LoopVectorizer
should save its returned value in a variable called AggressivelyInterleave
instead of AggressivelyInterleaveReductions.
2026-02-10 10:11:43 -06:00
Andrei Elovikov
f96c1ccc1e
[VPlan] Add -vplan-print-after= option (#178700)
UpdateTestChecks support is updated in subsequent
https://github.com/llvm/llvm-project/pull/178736.
2026-02-10 16:07:25 +00:00