7433 Commits

Author SHA1 Message Date
vporpo
8d442bc5b5
[SandboxVec][LoadStoreVec] Add support for constants (#189769)
Up until now the pass would only vectorize load-store pairs. This patch
implements vectorization of constant-store pairs.
2026-04-06 11:25:20 -07:00
Florian Hahn
0403639667
[VPlan] Skip successors outside any loop when updating LoopInfo. (#190553)
Successors outside of any loop do not contribute to the innermost loop,
skip them to avoid incorrect results due to
getSmallestCommonLoop(nullptr, X) returning nullptr.
2026-04-06 12:58:41 +01:00
Florian Hahn
64a0bd1227
[LV] Return best VPlan together with VF from computeBestVF (NFC). (#190385)
computeBestVF iterates over all VPlans and picks the VF of the most
profitable VPlan. This VPlan is later needed for execution and
additional checks. Instead of retrieving it multiple times later, just
directly return it from computeBestVF.

This removes some redundant lookups.

PR: https://github.com/llvm/llvm-project/pull/190385
2026-04-06 11:01:18 +01:00
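The idea in the computeBestVF change above (return the winning plan together with its VF instead of looking it up again later) can be illustrated with a small language-neutral sketch. This is not LLVM code: `Plan`, `cost_per_vf`, and the "lowest cost wins" rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Plan:
    # Hypothetical stand-in for a VPlan: a name plus an assumed cost per VF.
    name: str
    cost_per_vf: Dict[int, float]


def compute_best_vf(plans: List[Plan]) -> Tuple[int, Plan]:
    """Return the cheapest VF and the plan it belongs to in one pass."""
    best = None  # (cost, vf, plan)
    for plan in plans:
        for vf, cost in plan.cost_per_vf.items():
            if best is None or cost < best[0]:
                best = (cost, vf, plan)
    if best is None:
        raise ValueError("no candidate plans")
    return best[1], best[2]


plans = [Plan("plan-vf2-4", {2: 3.0, 4: 2.5}), Plan("plan-vf8", {8: 1.75})]
best_vf, best_plan = compute_best_vf(plans)
print(best_vf, best_plan.name)  # 8 plan-vf8
```

Because the caller receives both values, it does not need a second search to find the plan that covers the chosen VF.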
Florian Hahn
f7cdebb478
[VPlan] Mark unary ops as not having side-effects (NFC). (#190554)
Mark unary ops (only FNeg currently) as neither reading nor writing
memory, similar to binary and cast ops.

Should currently be NFC end-to-end.
2026-04-06 09:05:38 +01:00
Florian Hahn
c109dd1e9a
[VPlan] Refactor FindLastSelect matching to use m_Specific(PhiR) (NFC). (#190547)
Match the select operands directly against PhiR using m_Specific,
binding only the non-phi IV expression. This replaces the generic
TrueVal/FalseVal matching followed by an assert and conditional
extraction.

Split off from approved
https://github.com/llvm/llvm-project/pull/183911/ as suggested.
2026-04-05 20:07:34 +00:00
Florian Hahn
36e495dd90
[VPlan] Use APSInt in CheckSentinel directly (NFC). (#190534)
Simplify the sentinel checking logic by using APSInt and checking for
both a signed and unsigned sentinel in a single call.

Removes the IsSigned argument.

Split off from approved
https://github.com/llvm/llvm-project/pull/183911/ as suggested.
2026-04-05 16:43:59 +00:00
Florian Hahn
a2c16bb59f
[VPlan] Rename CondSelect to FindLastSelect (NFC). (#190536)

Use the more descriptive name FindLastSelect for the conditional select
that picks between the reduction phi and the IV value.

Split off from approved
https://github.com/llvm/llvm-project/pull/183911/ as suggested.
2026-04-05 16:39:34 +00:00
Hassnaa Hamdi
c5a904946a
[LV][NFC] remove dead code in canFoldTailByMasking() (#190263)
Remove unused ReductionLiveOuts variable in `canFoldTailByMasking()`.
The set was being populated with reduction loop exit instructions but
was never actually used anywhere in the function.
2026-04-05 12:59:32 +01:00
Hassnaa Hamdi
6bf8279dc2
[LV][NFC] correct comment for isScalarEpilogueAllowed() (#190254)
The comment had the opposite meaning of what the function actually does.
2026-04-05 12:55:36 +01:00
vporpo
94545a7c63
[SandboxVec][Legality][NFC] Outline differentBlock() and areUnique() (#190024)
And reuse them in LoadStoreVec.
2026-04-03 12:14:55 -07:00
Sander de Smalen
730a07f225
[LV] Only create partial reductions when profitable. (#181706)
We want the LV cost-model to make the best possible decision of VF and
whether or not to use partial reductions. At the moment, when the LV can
use partial reductions for a given VF range, it assumes those are always
preferred. After transforming the plan to use partial reductions, it
then chooses the most profitable VF. It is possible for a different VF
to have been more profitable, if it wouldn't have chosen to use partial
reductions.

This PR changes that, to first decide whether partial reductions are
more profitable for a given chain. If not, then it won't do the
transform.

This causes some regressions for AArch64 which are addressed in a
follow-up PR to keep this one simple.
2026-04-03 17:42:51 +01:00
Florian Hahn
c963092b0c
[VPlan] Mark VPCanonicalIVPHI as not reading memory (NFCI). (#190338)
The canonical IV does not access any memory. Mark accordingly. This
should be NFC end-to-end.

PR: https://github.com/llvm/llvm-project/pull/190338
2026-04-03 13:12:20 +00:00
Ramkumar Ramachandra
e09d1e3ff1
[VPlan] Use not_equal_to to improve code (NFC) (#190262)
2026-04-03 07:32:34 +01:00
Ramkumar Ramachandra
d0e265f20d
[VPlan] Cleanup and generalize VPIRMetadata CastInfo (NFC) (#190162)
Similar to b0230f59 ([VPlan] Cleanup and generalize VPPhiAccessors
CastInfo, #190027).
2026-04-02 19:00:23 +01:00
Graham Hunter
deaef1c1b7
[LV] Adjust exit recipe detection to run on early vplan (#183318)
Splitting out some work from #178454; this covers the enums for
early exit loop type (none, readonly, readwrite) and the style
used (readonly with multiple exit blocks, or masking with the
last iteration done in scalar code), along with changing the early
exit recipe detection to suit moving the transform for handling
early exit readwrite loops earlier in the vplan pipeline.
2026-04-02 17:25:35 +01:00
Ramkumar Ramachandra
bb2a63a673
[VPlan] Use m_Isa to improve code (NFC) (#190149)
2026-04-02 15:53:05 +01:00
Alexey Bataev
c2f97c5917
[SLP] Do not skip tiny trees with gathered loads to vectorize
The isTreeTinyAndNotFullyVectorizable check for 2-node trees
(insertelement root + gather child) was too aggressive: it rejected
trees even when LoadEntriesToVectorize was non-empty, preventing
gathered loads from being vectorized into masked loads/strided loads, etc.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/190181
2026-04-02 09:47:01 -04:00
Alexey Bataev
dc2d25f80b
Revert "[SLP] Do not skip tiny trees with gathered loads to vectorize"
This reverts commit 94ec7ffa46d351b86fbbe3a445ceef37f331c4a2 to fix
reported issue https://github.com/llvm/llvm-project/pull/190040#issuecomment-4177827078

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/190176
2026-04-02 09:26:31 -04:00
Ramkumar Ramachandra
b0230f5996
[VPlan] Cleanup and generalize VPPhiAccessors CastInfo (NFC) (#190027)
2026-04-02 13:47:44 +01:00
Alexey Bataev
94ec7ffa46
[SLP] Do not skip tiny trees with gathered loads to vectorize
The isTreeTinyAndNotFullyVectorizable check for 2-node trees
(insertelement root + gather child) was too aggressive: it rejected
trees even when LoadEntriesToVectorize was non-empty, preventing
gathered loads from being vectorized into masked loads/strided loads, etc.

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/190040
2026-04-02 06:47:53 -04:00
Ramkumar Ramachandra
d835dd2b43
[LV] Strip createStepForVF (NFC) (#185668)
The mul -> shl simplification is already done in VPlan.
2026-04-02 10:04:37 +01:00
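The mul -> shl simplification mentioned in the createStepForVF commit above relies on a generic identity: multiplying by a power-of-two constant equals shifting left by its log2. The sketch below only demonstrates that identity. The helper name is hypothetical and it says nothing about how VPlan actually implements the fold.

```python
def mul_by_pow2_as_shl(x: int, c: int) -> int:
    """Compute x * c using a left shift; c must be a positive power of two."""
    if c <= 0 or (c & (c - 1)) != 0:
        raise ValueError("c must be a positive power of two")
    shift = c.bit_length() - 1  # log2(c) for a power of two
    return x << shift


print(mul_by_pow2_as_shl(5, 8))  # 40
```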
Alexey Bataev
c6669c4993
[SLP] Guard FMulAdd conversion to require single-use/non-reordered FMul operands
The FMulAdd (CombinedVectorize) transformation in transformNodes() marks
an FMul child entry with zero cost, assuming it is fully absorbed into
the fmuladd intrinsic. However, when any FMul scalar has multiple uses
(e.g., also stored separately), the FMul must survive as a separate
node.

Reviewers: hiraditya, RKSimon, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/189692
2026-04-01 17:14:52 -04:00
Ramkumar Ramachandra
934438ad86
[VPlanPatternMatch] Unify and clarify m_Isa (NFC) (#189941)
2026-04-01 20:48:46 +01:00
Ramkumar Ramachandra
44979bedf0
[VPlan] Strip dead code in isUniformAcrossVFsAndUFs (NFC) (#189687)
Checking a VPInstruction for scalar-cast is equivalent to checking
opcode against Instruction::isCast via preservesUniformity.
2026-04-01 17:38:41 +01:00
Ramkumar Ramachandra
82e8494070
[VPlan] Avoid unnecessary BTC SymbolicValue creation (NFC) (#189929)
Don't unnecessarily create a backedge-taken-count SymbolicValue. This
allows us to simplify some code.
2026-04-01 16:25:48 +00:00
Alexey Bataev
1e06cd634e
[SLP][NFC] Fix uninitialized ReductionRoot in getTreeCost
ReductionRoot was initialized to nullptr instead of the RdxRoot
parameter. This caused two ScaleCost calls (for MinBWs cast cost and
ReductionBitWidth resize cost) to pass nullptr as the user instruction,
and suppressed the "Reduction Cost" line in debug output. In practice
the scale factor is the same because the tree root's main op and the
reduction root share the same basic block, so this is NFC.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/189994
2026-04-01 12:22:02 -04:00
Florian Hahn
0b61cd39e4
[LV] Add epilogue minimum iteration check in VPlan as well. (#189372)
Update LV to also use the VPlan-based addMinimumIterationCheck for the
iteration count check for the epilogue.

As the VPlan-based addMinimumIterationCheck uses VPExpandSCEV, those
need to be placed in the entry block for now, moving vscale * VF * IC to
the entry for scalable vectors.

The new logic also fails to simplify some checks involving PtrToInt,
because they were only simplified when going through generated IR, then
folding some PtrToInt in IR, then constructing SCEVs again. But those
should be cleaned up by later combines, and there is not really much we
can do other than trying to go through IR.

PR: https://github.com/llvm/llvm-project/pull/189372
2026-04-01 15:47:41 +01:00
David Green
fd40c60665
[VectorCombine] Fix transitive Uses in foldShuffleToIdentity (#188989)
The Uses in foldShuffleToIdentity is intended to detect where an operand
is used to distinguish between splats, identities and concats of the
same value. When looking through multiple unsimplified shuffles the same
Use could be both a splat and an identity though. This patch changes the
Use to a Value and an original Use, so that even if we are looking
through multiple vectors we recognise the splat vs identity vs concat of
each use correctly.

Fixes #180338
2026-04-01 14:53:04 +01:00
Ramkumar Ramachandra
3068132e32
[LV] Use bind_front in tryToOptimizeInductionTruncate (NFC) (#189763)
2026-04-01 08:19:49 +01:00
Alexey Bataev
c20e233020
[SLP] Replace TrackedToOrig DenseMap with parallel SmallVector in reduction
Replace the DenseMap<Value*, Value*> TrackedToOrig with a SmallVector<Value*>
indexed in parallel with Candidates. This avoids hash-table overhead for the
tracked-value-to-original-value mapping in horizontal reduction processing.

Fixes #189686
2026-03-31 16:22:57 -07:00
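The TrackedToOrig change above swaps a hash map for a vector indexed in parallel with Candidates. A minimal sketch of that pattern follows, using plain Python containers in place of DenseMap and SmallVector. The value names and helper functions are made up for illustration.

```python
# Candidates and their original values, kept in the same order.
candidates = ["%t0", "%t1", "%t2"]

# Before: a separate tracked-value -> original-value hash map.
tracked_to_orig = {"%t0": "%a", "%t1": "%b", "%t2": "%c"}

# After: a second list where index i holds the original of candidates[i].
orig_by_index = ["%a", "%b", "%c"]


def originals_via_map(cands, mapping):
    # One hash lookup per candidate.
    return [mapping[c] for c in cands]


def originals_via_parallel(cands, originals):
    # Plain indexing; requires both sequences to stay in lockstep.
    if len(cands) != len(originals):
        raise ValueError("parallel sequences must have equal length")
    return [originals[i] for i in range(len(cands))]


print(originals_via_parallel(candidates, orig_by_index))  # ['%a', '%b', '%c']
```

The trade-off is that every insertion or removal in one sequence must be mirrored in the other, which is why the sketch checks the lengths.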
Henry Jiang
5d624b5b93
[VPlan] Stop outerloop vectorization from vectorizing nonvector intrinsics (#185347)
In outer-loop VPlan, avoid emitting vector intrinsic calls for intrinsics
without a vector form. In VPRecipeBuilder, detect missing vector intrinsic
mapping and emit scalar handling instead of a vector call.

Also fix assertion when `llvm.pseudoprobe` in VPlan's native path is being
treated as a `WIDEN-INTRINSIC`.

Reproducer: https://godbolt.org/z/GsPYobvYs
2026-03-31 16:01:39 -07:00
vporpo
d8e9e0af1c
[SandboxVec][LoadStoreVec] Initial pass implementation (#188308)
This patch implements a new simple region pass that can vectorize
store-load chains.
2026-03-31 15:15:43 -07:00
Florian Hahn
ff4e229f8c
Revert "[VPlan] Extract reverse mask from reverse accesses" (#189637)
Reverts llvm/llvm-project#155579

Assertion added triggers on some buildbots
clang:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840:
virtual InstructionCost
llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &)
const: Assertion `!IsReverse() && "Inconsecutive memory access should
not have reverse order"' failed.
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang
-DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3
-std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/CMakeFiles/timberwolfmc.dir/finalpin.c.o
-MF CMakeFiles/timberwolfmc.dir/finalpin.c.o.d -o
CMakeFiles/timberwolfmc.dir/finalpin.c.o -c
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/test-suite/MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/finalpin.c
2026-03-31 15:53:01 +01:00
Ramkumar Ramachandra
c592eba498
[VPlan] Use RPOT in CSE, fixing potential crash (#187548)
A CSE crash is observed arising from outdated hash values unless we
forbid replacements in successor phis in blocks that are not dominated
by the def: the crash is observed when there is a block with CSE'able
phis with CSE'able incoming values, with incoming values coming from a
non-dominating block, under the condition that the block with the phis
is visited before the non-dominating block. It is unfortunately
impossible to write a test case showing a crash at present, but crashes
do occur when attempting to CSE DerivedIV recipes. The root cause of the
crash is visiting a non-dominated use before a def, and hence would be
fixed by a reverse post-order traversal.

Fixes #187499.

Co-authored-by: Luke Lau <luke@igalia.com>
2026-03-31 10:40:03 +01:00
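The CSE fix above hinges on a property of reverse post-order: ignoring back edges, every block is visited after all of its predecessors, so a def is seen before its uses. The sketch below computes an RPO over a toy diamond-shaped CFG. The block names and the dictionary encoding of the CFG are assumptions for illustration, not the VPlan data structures.

```python
from typing import Dict, List


def reverse_post_order(succs: Dict[str, List[str]], entry: str) -> List[str]:
    """Return the blocks reachable from entry in reverse post-order."""
    visited = set()
    post: List[str] = []

    def dfs(block: str) -> None:
        visited.add(block)
        for succ in succs.get(block, []):
            if succ not in visited:
                dfs(succ)
        post.append(block)  # appended only after all successors are done

    dfs(entry)
    return post[::-1]


# Diamond: entry branches to then/else, both of which fall into merge.
cfg = {"entry": ["then", "else"], "then": ["merge"], "else": ["merge"], "merge": []}
order = reverse_post_order(cfg, "entry")
print(order)  # ['entry', 'else', 'then', 'merge']
```

Here `merge`, which would hold the phis, comes after both of its predecessors regardless of which branch the depth-first search explores first.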
Mel Chen
f76f41f702
[VPlan] Extract reverse mask from reverse accesses (#155579)
Following #146525, separate the reverse mask from reverse access
recipes.
At the same time, remove the unused member variable `Reverse` from
`VPWidenMemoryRecipe`.
This will help to reduce redundant reverse mask computations by
VPlan-based common subexpression elimination.
2026-03-31 08:51:15 +00:00
Vasileios Porpodas
47e3f42bc7
Reapply "[SandboxVec][VecUtils] Lane Enumerator (#188355)"
This reverts commit c93049ef504f942af0f884ce8a5efc21df21d131.
2026-03-31 00:19:25 +00:00
Demetrius Kanios
96bd7b6e15
[CodeGen] Add additional params to TargetLoweringBase::getTruncStoreAction (#187422)
The truncating store analogue of #181104.

Adds `Alignment` and `AddrSpace` parameters to
`TargetLoweringBase::getTruncStoreAction` and dependents, and introduces
a `getCustomTruncStoreAction` hook for targets to customize legalization
behavior using this new information.

This change is fully backwards compatible from the target's point of
view, with `setTruncStoreAction` having identical functionality. The
change is purely additive.
2026-03-30 16:52:45 -07:00
Vasileios Porpodas
c93049ef50
Revert "[SandboxVec][VecUtils] Lane Enumerator (#188355)"
This reverts commit 02402beefec61c5947c9d3bec60626a4afd860a8.
2026-03-30 22:12:02 +00:00
vporpo
02402beefe
[SandboxVec][VecUtils] Lane Enumerator (#188355)
This patch introduces an iterator that helps us iterate over lane-value
pairs in a range. For example, given a container `(i32 %v0, <2 x i32>
%v1, i32 %v2)` we get:
```
Lane Value
  0   %v0
  1   %v1
  3   %v2
```

We use this iterator to replace the lane counting logic in
BottomUpVec.cpp.
2026-03-30 15:08:16 -07:00
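The lane-value table in the Lane Enumerator commit above can be reproduced with a short sketch: each value starts at the current lane, and the lane counter then advances by the number of lanes that value occupies. The `Value` class and `enumerate_lanes` are hypothetical names, not the SandboxVec API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Value:
    name: str
    num_lanes: int = 1  # 1 for a scalar, N for an <N x T> vector


def enumerate_lanes(values: Iterable[Value]) -> Iterator[Tuple[int, Value]]:
    """Yield (first lane, value) pairs, advancing by each value's lane count."""
    lane = 0
    for value in values:
        yield lane, value
        lane += value.num_lanes


# (i32 %v0, <2 x i32> %v1, i32 %v2)
bundle = [Value("%v0"), Value("%v1", num_lanes=2), Value("%v2")]
pairs = [(lane, value.name) for lane, value in enumerate_lanes(bundle)]
print(pairs)  # [(0, '%v0'), (1, '%v1'), (3, '%v2')]
```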
Alexey Bataev
26e0d15eaa
[SLP] Prefer to trim equal-cost alternate-shuffle subtrees
If the trimming candidate subtree is rooted at an alternate-shuffle node
with binary ops, and this subtree has the same cost as the buildvector
node, it is better to stick with the buildvector node to avoid runtime
perf regressions from shuffle/extra-operation overhead that the cost
model may underestimate. Skip trimming if the subtree contains
ExtractElement nodes, since those operate on already-materialized
vectors, which may reduce vector-to-scalar code movement and give better
perf.

Reviewers: hiraditya, bababuck, fhahn, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/188272
2026-03-30 16:03:18 -04:00
Hassnaa Hamdi
1d48c3bc01
[NFC][LV] Separate control-flow masking from tail-folding masking. (#169509)
- Differentiate between operations that need masking because they are in
a conditionally-executed block, and operations that need masking because
the loop is tail-folded (predicated).

- This is needed for future work when we need to support a predicated
vector epilogue in combination with an unpredicated vector body.
- This is the first patch in a series.
- See #181401 for the follow-on work.
2026-03-30 17:09:34 +01:00
Florian Hahn
713c70d7ef
[VPlan] Handle regions with live-outs and scalar VF when replicating. (#186252)
Extend initial unrolling of replicate regions
(https://github.com/llvm/llvm-project/pull/170212) to support live-outs,
if the VF is scalar.

This allows adding the logic needed to explicitly unroll, and replacing
VPPredPhiInsts with regular scalar VPPhi, without yet having to worry
about packing values into vector phis. This will be done in a follow-up
change, which means all replicate regions will be fully dissolved.

PR: https://github.com/llvm/llvm-project/pull/186252
2026-03-30 13:23:23 +01:00
Florian Hahn
53e7f9ada4
[VPlan] Generalize header-phi detection in VPPhi::execute. (NFC) (#189352)
Generalize the header-phi detection in VPPhi::execute to use VPDT.

This is currently NFC, but is needed to use VPPhi also for dissolving
replicate regions (https://github.com/llvm/llvm-project/pull/186252).

Split off from approved https://github.com/llvm/llvm-project/pull/186252
as suggested.

PR: https://github.com/llvm/llvm-project/pull/189352
2026-03-30 12:21:22 +01:00
Alexey Bataev
c7908d3320
[SLP][NFC]Use passing-by-ref in the range based loop to prevent warnings/errors
2026-03-30 03:47:00 -07:00
Ramkumar Ramachandra
8a4f21048f
[VPlan] Generalize noalias-licm-check to replicate regions (NFC) (#187017)
In order to use the cannotHoistOrSinkWithNoAlias check in use-sites
after replicate regions are created, generalize it to work with
replicate regions.
2026-03-30 09:17:39 +01:00
Florian Hahn
b5d43f7794
[VPlan] Use transferSuccessors in mergeBlocksIntoPredecessors (NFC). (#189275)
transferSuccessors is more compact and is guaranteed to preserve the
predecessor/successor order properly in all cases. This is not an issue
today, but will be when it is used in more places, including #186252.

Split off from approved
https://github.com/llvm/llvm-project/pull/186252.

PR: https://github.com/llvm/llvm-project/pull/189275
2026-03-29 20:20:23 +01:00
Florian Hahn
c467d38090
[LV] Fix offset handling for epilogue resume values. (NFCI) (#189259)
Instead of replacing all uses of the canonical IV with an add of the
resume value and then relying on the fold to simplify, directly create
offset versions of both the canonical IV and its increment.

The original offset computations were incorrect, but did not result in
mis-compiles thanks to the corresponding fold.

Split off from approved
https://github.com/llvm/llvm-project/pull/156262.
2026-03-29 17:04:50 +00:00
Alexey Bataev
4450891580
[SLP] Check if potential bitcast/bswap candidate is a root of reduction
Need to check if the potential bitcast/bswap-like construct is a root of
the reduction, otherwise it cannot represent a bitcast/bswap construct.

Fixes #189184
2026-03-28 13:58:22 -07:00
Ryan Buchner
a125d9b5ef
[SLP][NFC] Reapply "Refactor to prepare for constant stride stores" (#188689)
Refactor to precede #185964.

Much of this is a refactor to address these issues. Instead of iterating over one chain at a time, attempting all VFs for that given chain, we now iterate over VFs, trying each chain for the current VF.

Includes a fix for a use-after-free bug.
2026-03-27 10:11:49 -07:00
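The loop reordering described in the SLP refactor above (from "per chain, try every VF" to "per VF, try every chain") amounts to a loop interchange. The sketch below shows only the change in visiting order. The chain names and VF values are invented, and the real pass does far more work per attempt.

```python
from typing import List, Tuple


def chain_major(chains: List[str], vfs: List[int]) -> List[Tuple[str, int]]:
    # Old order: finish all VFs for one chain before moving to the next chain.
    return [(chain, vf) for chain in chains for vf in vfs]


def vf_major(chains: List[str], vfs: List[int]) -> List[Tuple[str, int]]:
    # New order: for the current VF, try every chain, then move to the next VF.
    return [(chain, vf) for vf in vfs for chain in chains]


chains = ["chainA", "chainB"]
vfs = [8, 4]
print(chain_major(chains, vfs))
# [('chainA', 8), ('chainA', 4), ('chainB', 8), ('chainB', 4)]
print(vf_major(chains, vfs))
# [('chainA', 8), ('chainB', 8), ('chainA', 4), ('chainB', 4)]
```

Both orders attempt the same (chain, VF) pairs. Only the sequence differs, which matters once an attempt at one VF can affect what remains to be tried.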
Ramkumar Ramachandra
840e9a4ddd
[VPlan] Fix wrap-flags on WidenInduction unroll (#187710)
Due to a somewhat recent change, IntOrFpInduction recipes have
associated VPIRFlags. The VPlanUnroll logic for WidenInduction recipes
predates this change, and computes incomplete wrap-flags: update it to
simply use the flags on IntOrFpInduction recipes; PointerInduction
recipes have no associated flags, and indeed, no flags should be used.
2026-03-27 13:26:04 +00:00