4008 Commits

Author SHA1 Message Date
Florian Hahn
0403639667
[VPlan] Skip successors outside any loop when updating LoopInfo. (#190553)
Successors outside of any loop do not contribute to the innermost loop;
skip them to avoid incorrect results caused by
getSmallestCommonLoop(nullptr, X) returning nullptr.
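A minimal sketch of why skipping matters. It uses a hypothetical `Loop` struct and hypothetical helpers (`smallestCommonLoop`, `innermostLoopOfSuccessors`), not the real LoopInfo API, and assumes only what is stated above: the common-loop query returns nullptr when either argument is nullptr. Folding a successor that is outside any loop into the query would collapse the result to nullptr; skipping it keeps the enclosing loop.

```cpp
#include <vector>

// Hypothetical stand-in for a loop: only the link to its parent loop.
struct Loop {
  Loop *Parent = nullptr;

  // True if L is this loop or is nested inside it.
  bool contains(const Loop *L) const {
    for (; L; L = L->Parent)
      if (L == this)
        return true;
    return false;
  }
};

// Smallest loop containing both A and B; nullptr if either is nullptr
// (a block outside any loop) or if they share no enclosing loop.
Loop *smallestCommonLoop(Loop *A, Loop *B) {
  if (!A || !B)
    return nullptr;
  for (Loop *L = A; L; L = L->Parent)
    if (L->contains(B))
      return L;
  return nullptr;
}

// Innermost loop shared by the successors' loops, ignoring successors
// that are outside any loop (represented by nullptr).
Loop *innermostLoopOfSuccessors(const std::vector<Loop *> &SuccLoops) {
  Loop *Result = nullptr;
  bool Seeded = false;
  for (Loop *SL : SuccLoops) {
    if (!SL)
      continue; // outside any loop: would otherwise force nullptr
    Result = Seeded ? smallestCommonLoop(Result, SL) : SL;
    Seeded = true;
  }
  return Result;
}
```

With successors in `Inner` and outside any loop, the result is `Inner` rather than nullptr.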
2026-04-06 12:58:41 +01:00
Florian Hahn
58208a0cc1
[LV] Additional epilogue tests for find-iv and with uses of IV. (NFC) (#190548)
Add test coverage for loops that are not yet supported: loops with sinkable
find-iv expressions (github.com/llvm/llvm-project/pull/183911) and loops
with uses of the IV.

PR: https://github.com/llvm/llvm-project/pull/190548
2026-04-05 20:42:11 +00:00
Elvis Wang
47cd798670
Revert "[LV] Enable scalable FindLast on RISCV." (#190463)
Reverts llvm/llvm-project#184931 since it crashes the llvm-test-suite.
https://lab.llvm.org/buildbot/#/builders/210/builds/9807
2026-04-04 23:03:11 +08:00
Elvis Wang
a955b3caba
[LV] Enable scalable FindLast on RISCV. (#184931)
This patch enables FindLast reduction vectorization with scalable vectors
on RISCV.
2026-04-04 18:58:58 +08:00
Florian Hahn
093c6391b2
[LV] Add additional tests with IV live-outs. (NFC) (#190395)
Add additional tests with IV live-out users, for which epilogue
vectorization is not enabled yet.

Also modernize check lines.
2026-04-04 10:20:04 +01:00
Sander de Smalen
730a07f225
[LV] Only create partial reductions when profitable. (#181706)
We want the LV cost model to make the best possible decision about the VF
and whether or not to use partial reductions. At the moment, when the LV can
use partial reductions for a given VF range, it assumes those are always
preferred. After transforming the plan to use partial reductions, it
then chooses the most profitable VF. A different VF could have been more
profitable had partial reductions not been used.

This PR changes that: it first decides whether partial reductions are
more profitable for a given chain, and if they are not, it skips the
transform.

This causes some regressions for AArch64 which are addressed in a
follow-up PR to keep this one simple.
2026-04-03 17:42:51 +01:00
Ramkumar Ramachandra
82e8494070
[VPlan] Avoid unnecessary BTC SymbolicValue creation (NFC) (#189929)
Don't unnecessarily create a backedge-taken-count SymbolicValue. This
allows us to simplify some code.
2026-04-01 16:25:48 +00:00
Florian Hahn
0b61cd39e4
[LV] Add epilogue minimum iteration check in VPlan as well. (#189372)
Update LV to also use the VPlan-based addMinimumIterationCheck for the
iteration count check for the epilogue.

As the VPlan-based addMinimumIterationCheck uses VPExpandSCEV, and those
recipes need to be placed in the entry block for now, vscale * VF * IC is
moved to the entry block for scalable vectors.

The new logic also fails to simplify some checks involving PtrToInt,
because those were previously only simplified by going through the
generated IR: folding some PtrToInt in IR, then constructing SCEVs again.
They should be cleaned up by later combines, and there is not much we can
do short of going through IR.

PR: https://github.com/llvm/llvm-project/pull/189372
2026-04-01 15:47:41 +01:00
Henry Jiang
5d624b5b93
[VPlan] Stop outerloop vectorization from vectorizing nonvector intrinsics (#185347)
In outer-loop VPlan, avoid emitting vector intrinsic calls for intrinsics
without a vector form. In VPRecipeBuilder, detect missing vector intrinsic
mapping and emit scalar handling instead of a vector call.

Also fix an assertion failure when `llvm.pseudoprobe` is treated as a
`WIDEN-INTRINSIC` in VPlan's native path.

Reproducer: https://godbolt.org/z/GsPYobvYs
2026-03-31 16:01:39 -07:00
Florian Hahn
ff4e229f8c
Revert "[VPlan] Extract reverse mask from reverse accesses" (#189637)
Reverts llvm/llvm-project#155579

The added assertion triggers on some buildbots:
clang:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840:
virtual InstructionCost
llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &)
const: Assertion `!IsReverse() && "Inconsecutive memory access should
not have reverse order"' failed.
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang
-DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3
-std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o
-MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o
CMakeFiles/timberwolfmc.dir/finalpin.c.o -c
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
2026-03-31 15:53:01 +01:00
Ramkumar Ramachandra
d43d9022c1
[LV] Regen vplan-predicate-switch test with UTC (NFC) (#189581) 2026-03-31 10:48:01 +00:00
David Sherwood
f59d6f734e
[LV][NFC] Remove more function attributes from tests (#188185)
Following on from PR #188091, I've also removed the following function
attributes from tests:

 nounwind uwtable ssp

as they made no difference to the tests.
2026-03-31 09:56:17 +01:00
Mel Chen
f76f41f702
[VPlan] Extract reverse mask from reverse accesses (#155579)
Following #146525, separate the reverse mask from reverse access
recipes.
At the same time, remove the unused member variable `Reverse` from
`VPWidenMemoryRecipe`.
This will help to reduce redundant reverse mask computations by
VPlan-based common subexpression elimination.
2026-03-31 08:51:15 +00:00
Florian Hahn
713c70d7ef
[VPlan] Handle regions with live-outs and scalar VF when replicating. (#186252)
Extend initial unrolling of replicate regions
(https://github.com/llvm/llvm-project/pull/170212) to support live-outs,
if the VF is scalar.

This allows adding the logic needed to explicitly unroll, and replacing
VPPredPhiInsts with regular scalar VPPhis, without yet having to worry
about packing values into vector phis. That will be done in a follow-up
change, after which all replicate regions will be fully dissolved.

PR: https://github.com/llvm/llvm-project/pull/186252
2026-03-30 13:23:23 +01:00
Sander de Smalen
a4e6f495c3
[AArch64] More accurately model cost of partial reductions (#181707)
With #181706 using the cost-model to decide whether using partial
reductions is profitable, we need to more accurately represent the cost
of certain partial reduction operations:

* Reflect the fact that *MLALB/T instructions can be used for 16-bit ->
32-bit partial reductions (or *MLAL/MLAL2 for NEON).
* Calculate the cost of expanding the partial reduction in ISel for
reductions that don't have an explicit instruction, rather than
returning an arbitrary number. For sub-reductions, we scale the cost to
make them slightly cheaper, so that they're still candidates for forming
cdot operations.
2026-03-30 08:49:41 +01:00
Florian Hahn
617ec39fd0
[VPlan] Add printing test for dissolving replicate regions. (#189192)
Add VPlan printing test for
 https://github.com/llvm/llvm-project/pull/186252
 https://github.com/llvm/llvm-project/pull/189022
2026-03-28 21:01:42 +00:00
Ramkumar Ramachandra
840e9a4ddd
[VPlan] Fix wrap-flags on WidenInduction unroll (#187710)
Due to a fairly recent change, IntOrFpInduction recipes have
associated VPIRFlags. The VPlanUnroll logic for WidenInduction recipes
predates this change and computes incomplete wrap flags. Update it to
simply use the flags on IntOrFpInduction recipes; PointerInduction
recipes have no associated flags, and indeed no flags should be used for
them.
2026-03-27 13:26:04 +00:00
Florian Hahn
90c1c588f8
[VPlan] Don't set WrapFlags for truncated IVs. (#188966)
The wrap flags from the IV bin-op are not guaranteed to apply to
truncated inductions, which are evaluated in narrower types.

Instead of dropping them late (in expandVPWidenIntOrFpInduction), do not
add them at the outset, to prevent future transforms from relying on
incorrect flags.

PR: https://github.com/llvm/llvm-project/pull/188966
2026-03-27 12:39:03 +00:00
Florian Hahn
99aa33d5b3
Reapply "[VPlan] Explicitly unroll replicate-regions without live-outs by VF." (#188947)
This reverts commit 4562a953db9d9813a873b78144cee1df39c7e0c0.

The recommit adjusts processLaneForReplicateRegion to first remap all
operands, then update the new operands. This fixes a VPlan verification
failure when running LV tests with expensive checks.

Original message:

This patch adds a new replicateReplicateRegionsByVF transform to unroll
replicate-regions by VF, dissolving them. The transform creates VF
copies of the replicate-region's content, connects them and converts
recipes to single-scalar variants for the corresponding lanes.

The initial version skips regions with live-outs (VPPredInstPHIRecipe),
which will be added in follow-up patches.

Depends on https://github.com/llvm/llvm-project/pull/170053

PR: https://github.com/llvm/llvm-project/pull/170212
2026-03-27 12:19:58 +00:00
Ramkumar Ramachandra
9d5684bb00
[LV] Regen iv_outside_user test with UTC (NFC) (#188934)
This merges the different CHECK prefixes into a common one.
2026-03-27 12:07:49 +00:00
Florian Hahn
849ba979bd
[VPlan] Add test showing incorrect flags on truncated inductions (NFC). 2026-03-27 11:00:26 +00:00
Florian Hahn
4562a953db
Revert "[VPlan] Explicitly unroll replicate-regions without live-outs by VF." (#188868)
Reverts llvm/llvm-project#170212

It appears to cause a failure with expensive checks:
https://lab.llvm.org/buildbot/#/builders/187/builds/18306
2026-03-26 23:20:49 +00:00
Florian Hahn
cb1661b046
[VPlan] Explicitly unroll replicate-regions without live-outs by VF. (#170212)
This patch adds a new replicateReplicateRegionsByVF transform to
unroll replicate-regions by VF, dissolving them. The transform creates
VF copies of the replicate-region's content, connects them and converts
recipes to single-scalar variants for the corresponding lanes.

The initial version skips regions with live-outs (VPPredInstPHIRecipe),
which will be added in follow-up patches.

Depends on https://github.com/llvm/llvm-project/pull/170053

PR: https://github.com/llvm/llvm-project/pull/170212
2026-03-26 21:35:29 +00:00
Florian Hahn
5aae014ed5
[LV] Refine tripcount estimate using minimum iteration count rt check. (#188135)
When not folding the tail, the minimum iteration count check ensures that
the vector loop is not executed if computing the trip count wraps around
to zero, as the trip count must be at least VF when vectorizing without
tail-folding.

Add and use a new tryToRefineConstantMaxTripCount helper. This ensures
we do not create dead main loops when vectorizing the epilogue, as we
choose smaller main VFs.
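A small C++ sketch of the wraparound case, assuming a hypothetical 8-bit loop where the trip count is computed as BTC + 1 in the same type; the helper names are illustrative and are not LLVM APIs, and the sketch does not model tryToRefineConstantMaxTripCount itself. Because the check only enters the vector loop when TC >= VF * UF, a trip count that wrapped to 0 always takes the scalar path.

```cpp
#include <cstdint>

// Hypothetical 8-bit loop: the trip count is BTC + 1 computed in the
// same 8-bit type, so BTC == 255 wraps the trip count around to 0.
std::uint8_t tripCountFromBTC(std::uint8_t BTC) {
  return static_cast<std::uint8_t>(BTC + 1);
}

// Minimum-iteration check without tail folding: the vector loop is
// entered only if TC >= VF * UF; otherwise the scalar loop runs.
bool entersVectorLoop(std::uint8_t BTC, unsigned VF, unsigned UF) {
  unsigned TC = tripCountFromBTC(BTC);
  return TC >= VF * UF;
}
```

So whenever the vector loop runs, the trip count has not wrapped and is at least VF * UF.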

PR: https://github.com/llvm/llvm-project/pull/188135
2026-03-26 20:48:53 +00:00
Ramkumar Ramachandra
76a9692254
[VPlan] Sink single-scalar replicates in licm (#187047)
Refine the replicate bail-out in licm to permit single-scalar
replicates.
2026-03-26 14:42:57 +00:00
Florian Hahn
40304d8fef
Reapply "[VPlan] Remove manual region removal when simplifying for VF and UF. (#181252)" (#188589)
This reverts commit e30f9c19464bcf1bf1e9f69b63884fb78ad2d05d.

Re-land, now that the reported crash causing the revert has been fixed
as part of 77fb84889 (#187504).

Original message:

Replace manual region dissolution code in
simplifyBranchConditionForVFAndUF with using general
removeBranchOnConst. simplifyBranchConditionForVFAndUF now just creates
a (BranchOnCond true) or updates BranchOnTwoConds.

The loop then gets automatically removed by running removeBranchOnConst.

This removes a bunch of special logic to handle header phi replacements
and CFG updates. With the new code, there's no restriction on what kind
of header phi recipes the loop contains.

Note that VPEVLBasedIVRecipe needs to be marked as readnone. This is
technically unrelated, but I could not find an independent test that
would be impacted.

The code to deal with epilogue resume values now needs updating, because
we may simplify a reduction directly to the start value.

PR: https://github.com/llvm/llvm-project/pull/181252
2026-03-26 10:14:10 +00:00
Florian Hahn
6420dd833e
[LV] Add missing REQUIRES: asserts to new test from #188126.
Test checks debug output, and requires asserts.
2026-03-25 15:45:22 +00:00
John Brawn
5f49ce5eaf
[ARM] Consider register pressure when vectorizing with MVE (#188053)
MVE only has 8 vector registers, so the vectorizer can easily end up
using more than that, causing enough spilling that the result is worse
than not vectorizing. Enable shouldConsiderVectorizationRegPressure for
targets with MVE so the vectorizer doesn't vectorize in those cases.
2026-03-25 10:46:49 +00:00
Rohit Aggarwal
d21e1a3798
[LIBM][AMDLIBM] - New vector calls for cdfnorm and round scalar calls (#187232)
Add the following new amdlibm vector calls:

cdfnorm
amd_vrd2_cdfnorm
amd_vrd4_cdfnorm
amd_vrd8_cdfnorm

round
amd_vrs16_roundf
amd_vrs8_roundf
amd_vrs4_roundf
amd_vrd8_round
amd_vrd4_round
amd_vrd2_round

Link to aocl repo -
[aocl-libm-ose](https://github.com/amd/aocl-libm-ose)
2026-03-25 10:03:00 +00:00
Florian Hahn
86c1510418
[VPlan] Remove isVector guard in getCostForRecipeWithOpcode. (#188126)
The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.

The added test shows that we now compute the same cost as the legacy
cost model.

Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with https://github.com/llvm/llvm-project/pull/187056.

PR: https://github.com/llvm/llvm-project/pull/188126
2026-03-25 09:59:13 +00:00
David Sherwood
85e1c641eb
[LV][NFC] Remove some unused attributes from tests (#188091)
The local_unnamed_addr and dso_local attributes add no value to any of
the tests and simply increase file size, so I've removed all instances.
2026-03-24 06:52:31 +00:00
Florian Hahn
77fb848894
Reapply "[LV] Simplify and unify resume value handling for epilogue vec." (#187504)
This reverts commit cdaf29f84dd0abbd1f961982799059c92d76625b.

This version skips removeBranchOnConst when vectorizing the epilogue, as
it may trigger folds that remove the resume phi used as resume value
from the epilogue.

This fixes https://github.com/llvm/llvm-project/issues/187323.

Original message:
This patch tries to drastically simplify resume value handling for the
scalar loop when vectorizing the epilogue.

It uses a simpler, uniform approach for updating all resume values in
the scalar loop:

1. Create ResumeForEpilogue recipes for all scalar resume phis in the
main loop (the epilogue plan will have exactly the same scalar resume
phis, in exactly the same order).
2. Update ::execute for ResumeForEpilogue to set the underlying value
when executing. This is not super clean, but allows easy lookup of the
generated IR value when we update the resume phis in the epilogue. Once
we connect the 2 plans together explicitly, this can be removed.
3. Use the list of ResumeForEpilogue VPInstructions from the main loop
to update the resume/bypass values from the epilogue.

This simplifies the code quite a bit, makes it more robust (should fix
https://github.com/llvm/llvm-project/issues/179407) and also fixes a
mis-compile in the existing tests (see the change in
llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-epilogue-vec.ll,
where previously we would incorrectly resume using the start value when
the epilogue iteration check failed).

In some cases, we get simpler code, due to additional CSE, in some cases
the induction end value computations get moved from the epilogue
iteration check to the vector preheader. We could try to sink the
instructions as cleanup, but it is probably not worth the trouble.

Fixes https://github.com/llvm/llvm-project/issues/179407.

PR for recommit https://github.com/llvm/llvm-project/pull/188134
2026-03-23 22:09:40 +00:00
Florian Hahn
2f1e0d14f4
[LV] Add additional epilogue vector tests.
Add additional epilogue vectorization tests for
 * https://github.com/llvm/llvm-project/issues/187323
 * https://github.com/llvm/llvm-project/issues/185345
2026-03-23 20:44:00 +00:00
Andrei Elovikov
a9ae2fd79e
[NFC][LV] Fix what seems to be a typo in the test (#187769)
The test was added in
4e9894498e.

Alternative fixes would be:
* Remove the unused GEP, although it is not clear why we would want to
overwrite the stored `i64` with a `ptr` store.
* Keep this patch, but perform both GEPs with an `i64` element type to
reduce the diff. It is not clear whether the scalarization caused by that
type mismatch is intentional or relevant to the original change.
2026-03-23 17:28:32 +00:00
Alan Zhao
c624851037
[LoopVectorize] Fix an integer narrowing conversion in getPredBlockCostDivisor(...) (#187605)
`LoopVectorizationCostModel::getPredBlockCostDivisor(...)` may return
large `uint64_t` values that get coerced to an `unsigned` by
`VPCostContext::getPredBlockCostDivisor(...)`, which can cause division
by zero.

Fixes #187584
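A minimal illustration of the failure mode, assuming a 32-bit `unsigned` (true on common targets); the helper names are hypothetical and do not correspond to the LLVM functions named above. A divisor of 2^32 is non-zero as `uint64_t` but truncates to 0 as `unsigned`, so dividing by the narrowed value would divide by zero.

```cpp
#include <cstdint>

// Hypothetical wide divisor, e.g. a scaling factor computed in 64 bits.
// Shift must be below 64.
std::uint64_t wideDivisor(unsigned Shift) {
  return std::uint64_t(1) << Shift;
}

// Narrowing to unsigned keeps only the low bits (32 on common targets),
// so any multiple of 2^32 becomes 0.
unsigned narrowedDivisor(unsigned Shift) {
  return static_cast<unsigned>(wideDivisor(Shift));
}
```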
2026-03-23 17:22:05 +00:00
Benjamin Maxwell
249b086545
[LV] Fix crash when extends are not widened in partial reduction matching (#187782)
Fixes https://github.com/llvm/llvm-project/pull/185821#issuecomment-4098933551
2026-03-23 10:30:19 +00:00
Sander de Smalen
6feced2a7c
Fix select-best-vf-tripcount.ll buildbot failure
This test failed on the llvm-clang-win-x-aarch64 buildbot.

It seems the rounding is different, leading to different output.
Instead of:
  Cost for VF 4: 9 (Estimated cost per lane: 2.2)

the test output on the Windows buildbot is:
  Cost for VF 4: 9 (Estimated cost per lane: 2.3)
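The printed value is 9 / 4 = 2.25, which is exactly representable in binary floating point, so rounding it to one decimal digit is an exact tie, and the printed digit depends on the formatter's tie-breaking rule. A small sketch with hypothetical helpers, using `std::snprintf` with `%.1f` as a stand-in for whatever formatter the debug output actually uses:

```cpp
#include <cstdio>
#include <string>

// Estimated cost per lane, as in "Cost for VF 4: 9".
double costPerLane(double Cost, unsigned VF) { return Cost / VF; }

// Format with one decimal digit. For an exact tie such as 2.25, the
// result ("2.2" or "2.3") depends on how the formatter breaks ties.
std::string formatOneDecimal(double V) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%.1f", V);
  return std::string(Buf);
}
```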
2026-03-20 14:16:58 +00:00
Florian Hahn
19b0c68ee0
[VPlan] Skip epilogue vectorization if dead after narrowing IGs. (#187016)
When narrowing interleave groups, the main vector loop processes IC
iterations instead of VF * IC. Update selectEpilogueVectorizationFactor
to use the effective VF, checking if the canonical IV controlling the
loop now steps by UF instead of VFxUF.

This avoids epilogue vectorization with dead epilogue vector loops and
also prevents crashes in cases where we can prove both the epilogue and
scalar loop are dead.

Fixes https://github.com/llvm/llvm-project/issues/186846

PR: https://github.com/llvm/llvm-project/pull/187016
2026-03-20 12:33:16 +00:00
Ramkumar Ramachandra
1dfd268f10
[VPlan] Simplify mul x, -1 -> sub 0, x (#187551)
Simplify exactly as InstCombine does. A follow-up would include
simplifying add x, (sub 0, y) -> sub x, y.

Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD
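A quick sanity check of the value equivalence in C++, using unsigned types to model LLVM's wrapping (modulo 2^N) integer arithmetic. This is an illustrative sketch only: it ignores nsw/nuw flags and poison, which it does not model; the linked Alive2 proof is the authoritative check.

```cpp
#include <cstdint>

// Unsigned arithmetic in C++ wraps modulo 2^N, like LLVM integer ops
// without nsw/nuw flags.
std::uint32_t mulByMinusOne(std::uint32_t X) {
  return X * static_cast<std::uint32_t>(-1); // mul x, -1
}

std::uint32_t subFromZero(std::uint32_t X) {
  return 0u - X; // sub 0, x
}

// Exhaustively compare both forms at 8 bits.
bool equivalentAt8Bits() {
  for (unsigned I = 0; I < 256; ++I) {
    auto X = static_cast<std::uint8_t>(I);
    auto Mul = static_cast<std::uint8_t>(X * 0xFFu);
    auto Sub = static_cast<std::uint8_t>(0u - X);
    if (Mul != Sub)
      return false;
  }
  return true;
}
```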
2026-03-20 12:07:51 +00:00
Ramkumar Ramachandra
b6accfa0b4
[LV] Regen induction-ptrcasts test with UTC (NFC) (#187678) 2026-03-20 11:58:19 +00:00
Benjamin Maxwell
4b17135d14
[LV] Simplify matchExtendedReductionOperand() (NFCI) (#185821)
This updates `matchExtendedReductionOperand` so the simple case of
`UpdateR(PrevValue, ext(...))` is matched first as an early exit. The
binop matching is then flattened to remove the extra layer of the
`MatchExtends` lambda.
2026-03-20 09:29:28 +00:00
Sander de Smalen
a971089cb8
[LV] Explain why a less profitable VF was chosen (NFCI) (#187469)
I was very puzzled the other day when the debug output showed that VF 8
had a cost of X and VF 16 had a cost of X/2, yet it still chose VF 8.
This PR adds some extra debug output to explain why this happens.
2026-03-20 07:21:17 +00:00
Florian Hahn
fd3cf1c160
[LV] Move dereferenceability check from Legal to VPlan (NFC) (#185323)
Instead of checking dereferenceability early during
LoopVectorizationLegality, defer the check to VPlan construction via
areAllLoadsDereferenceable.

This is in preparation for supporting early exit vectorization of
non-dereferenceable loads, e.g. via speculative loads
(https://discourse.llvm.org/t/rfc-provide-intrinsics-for-speculative-loads/89692)
or first-faulting loads. Detection in VPlan allows easily replacing
potentially non-dereferenceable loads with other loads as needed.

PR: https://github.com/llvm/llvm-project/pull/185323
2026-03-19 19:21:45 +00:00
Florian Hahn
cdaf29f84d
Revert "[LV] Simplify and unify resume value handling for epilogue vec." (#187504)
Reverts llvm/llvm-project#185969

This is suspected to cause a miscompile in 549.fotonik3d_r from SPEC 2017 FP
2026-03-19 14:38:37 +00:00
John Brawn
e8556ff6b6
[NFC] Remove fractional part of costs in maxbandwidth-regpressure.ll (#187498)
This test is failing on the llvm-clang-x-aarch64 buildbot due to what
looks like a difference in rounding behaviour when printing estimated
cost per lane. Solve this by removing the fractional part, which is what
we've done in the past when this has happened (e.g. commit aeb88f677).
2026-03-19 13:50:56 +00:00
John Brawn
191c84b822
[VPlan] Permit derived IV in isHeaderMask (#187360)
When matching scalar steps of the canonical IV, also match a derived IV
of the canonical IV if the derivation is essentially a no-op. Fixes a
failure in the mve-reg-pressure-spills.ll test when expensive checks are
enabled.
2026-03-19 12:05:07 +00:00
Koakuma
6aeeae676a
[SPARC][Tests] Add lit.local.cfg to SPARC LoopVectorize tests (#187489) 2026-03-19 18:59:15 +07:00
Koakuma
23af867e6d
[SPARC] Add TTI implementation for getting register numbers and widths (#180660)
Correctly inform transform passes about our registers; this prevents the
issue with the `find-last` test where the loop vectorizer pass
mistakenly thinks that the backend has vector capabilities and generates
vector types, which causes the backend to crash.

See also: https://github.com/sparclinux/issues/issues/69
2026-03-19 18:37:46 +07:00
Elvis Wang
53f8f3b017
Reland [LV] Replace remaining LogicalAnd to vp.merge in EVL optimization. (#184068) (#187199)
This patch replaces the remaining LogicalAnd with vp.merge in the second
pass, so as not to break the `m_RemoveMask` pattern in optimizeMaskToEVL.

Also skip the cost model comparison when the plan contains `vp_merge`,
which the legacy model does not cost.

This can help to remove the header mask for FindLast reduction (CSA)
loops.

Original PR: https://github.com/llvm/llvm-project/pull/184068
Original buildbot failure:
https://lab.llvm.org/buildbot/#/builders/213/builds/2497
2026-03-19 07:56:42 +08:00
Florian Hahn
fce100e26e
[VPlan] Fix masked_cond expansion.
masked_cond is used to combine early-exit conditions with masks from
predication. The early-exit condition should only be evaluated if the
mask is true. Emit the mask first, to avoid incorrect poison propagation.

Fixes https://github.com/llvm/llvm-project/issues/187061.
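In LLVM IR, `and i1 %a, %b` is poison if either operand is poison, whereas `select i1 %a, i1 %b, i1 false` yields false whenever %a is false, regardless of %b. The sketch below models i1 values as `std::optional<bool>` (with `std::nullopt` standing for poison) to show why the mask should be the select condition. The names are hypothetical, and this models the IR semantics, not the VPlan implementation.

```cpp
#include <optional>

// Hypothetical model of an i1 value: std::nullopt stands for poison.
using I1 = std::optional<bool>;

// `select C, T, F`: poison if C is poison, otherwise the chosen arm
// (a poison value in the arm that is not chosen does not propagate).
I1 selectI1(I1 C, I1 T, I1 F) {
  if (!C)
    return std::nullopt;
  return *C ? T : F;
}

// `and A, B`: poison if either operand is poison.
I1 andI1(I1 A, I1 B) {
  if (!A || !B)
    return std::nullopt;
  return *A && *B;
}

// Mask-first logical and: `select Mask, Cond, false`.
I1 maskedCond(I1 Mask, I1 Cond) { return selectI1(Mask, Cond, I1(false)); }
```

With a false mask and a poison condition, the mask-first form yields false, while a plain `and` or a condition-first select yields poison.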
2026-03-18 20:26:04 +00:00