582 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
85fafd5db0
[SCEVExp] Get DL from SE, strip constructor arg (NFC) (#171823) 2025-12-11 14:26:47 +00:00
Florian Hahn
c61a481a23
[VPlan] Use SCEV to prove non-aliasing for stores at different offsets. (#170347)
Extend the logic add in https://github.com/llvm/llvm-project/pull/168771
to also allow sinking stores past stores in the same noalias set by
checking if we can prove no-alias via the distance between accesses,
checked via SCEV.

PR: https://github.com/llvm/llvm-project/pull/170347
2025-12-09 16:19:13 +00:00
Florian Hahn
0768068ff0
[VPlan] Remove ExtractLastLane for plans with scalar VFs. (#171145)
ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to
remove them. This also requires adjusting the code in VPlanUnroll.cpp to
split off handling of ExtractLastLane/ExtractPenultimateElement for
scalar VFs, which now needs to match ExtractLastPart.

PR: https://github.com/llvm/llvm-project/pull/171145
2025-12-09 11:59:40 +00:00
Ramkumar Ramachandra
c5b90103da
[VPlan] Use nuw when computing {VF,VScale}xUF (#170710)
These quantities should never unsigned-wrap. This matches the behavior
if only VFxUF is used (and not VF): when computing both VF and VFxUF,
nuw should hold for each step separately.
2025-12-08 15:46:02 +00:00
Florian Hahn
3fc7419236
[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124)
Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.

The patch should be NFC modulo printing changes.

PR: https://github.com/llvm/llvm-project/pull/164124
2025-12-07 15:15:43 +00:00
Florian Hahn
f02dc4d198
[VPlan] Don't try to hoist multi-defs for first-order recurrences.
Currently the hoisting implementation expects single-defs. Bail out on
multi-defs (VPInterleaveRecipe), to fix an assertion.

Fixes https://github.com/llvm/llvm-project/issues/170666
2025-12-04 21:09:16 +00:00
Ramkumar Ramachandra
ef58670f03
Revert [VPlan] Consolidate logic for narrowToSingleScalars (#170720)
This reverts commit 7b3ec51, as a crash was reported:
https://llvm.godbolt.org/z/dK6ff5zvr -- this will give us time to
investigate a re-land.
2025-12-04 19:14:51 +00:00
Stephen Tozer
fda85a1423
[DebugInfo][LoopVectorizer][NFC] Use unknown annotations for more instructions (#170522)
Some recent patches have added more non-annotated empty locations to the
loop vectorizer, resulting in errors reported on the DebugLoc coverage
tracking buildbot:
https://lab.llvm.org/staging/#/builders/222/builds/1938

This patch adds "unknown" annotations in place of the empty locations,
allowing the buildbot to ignore them for now.
2025-12-04 09:32:01 +00:00
Ramkumar Ramachandra
7b3ec5191a
[VPlan] Consolidate logic for narrowToSingleScalars (NFCI) (#167360)
The logic for narrowing to single scalar recipes is in two different
places: narrowToSingleScalarRecipes and legalizeAndOptimizeInductions.
Consolidate them.
2025-12-03 20:25:52 +00:00
Florian Hahn
4b6ad11876
[VPlan] Sink predicated stores with complementary masks. (#168771)
Extend the logic to hoist predicated loads
(https://github.com/llvm/llvm-project/pull/168373) to sink predicated
stores with complementary masks in a similar fashion.

The patch refactors some of the existing logic for legality checks to be
shared between hosting and sinking, and adds a new sinking transform on
top.

With respect to the legality checks, for sinking stores the code also
checks if there are any aliasing stores that may alias, not only loads.

PR: https://github.com/llvm/llvm-project/pull/168771
2025-12-02 11:43:37 +00:00
Florian Hahn
25ab47bd40
[VPlan] Use wide IV if scalar lanes > 0 are used with scalable vectors. (#169796)
For scalable vectors, VPScsalarIVStepsRecipe cannot create all scalar
step values. At the moment, it creates a vector, in addition to to the
first lane. The only supported case for this is when only the last lane
is used. A recipe should not set both scalar and vector values.

Instead, we can simply use a vector induction. It would also be possible
to preserve the current vector code-gen, by creating VPInstructions
based on the first lane of VPScalarIVStepsRecipe, but using a vector
induction seems simpler.

PR: https://github.com/llvm/llvm-project/pull/169796
2025-12-01 17:33:36 +00:00
David Sherwood
17677ad7eb
[LV] Don't create WidePtrAdd recipes for scalar VFs (#169344)
While attempting to remove the use of undef from more loop vectoriser
tests I discovered a bug where this assert was firing:

```
llvm::Constant* llvm::Constant::getSplatValue(bool) const: Assertion `this->getType()->isVectorTy() && "Only valid for vectors!"' failed.
...
 #8 0x0000aaaab9e2fba4 llvm::Constant::getSplatValue
 #9 0x0000aaaab9dfb844 llvm::ConstantFoldBinaryInstruction
```

This seems to be happening because we are incorrectly generating
WidePtrAdd recipes for scalar VFs. The PR fixes this by checking whether
a plan has a scalar VF only in legalizeAndOptimizeInductions.

This PR also removes the use of undef from the test `both` in
Transforms/LoopVectorize/iv_outside_user.ll, which is what started
triggering the assert.

Fixes #169334
2025-12-01 08:12:41 +00:00
Florian Hahn
b76089c7f3
[VPlan] Skip uses-scalars restriction if one of ops needs broadcast. (#168246)
Update the logic in narrowToSingleScalar to allow narrowing even if not
all users use scalars, if at least one of the operands already needs
broadcasting.

In that case, there won't be any additional broadcasts introduced. This
should allow removing the special handling for stores, which can
introduce additional broadcasts currently.

Fixes https://github.com/llvm/llvm-project/issues/169668.

PR: https://github.com/llvm/llvm-project/pull/168246
2025-11-28 10:26:27 +00:00
Florian Hahn
8459508227
[VPlan] Handle scalar VPWidenPointerInd in convertToConcreteRecipes. (#169338)
In some case, VPWidenPointerInductions become only used by scalars after
legalizeAndOptimizationInducftions was already run, for example due to
some VPlan optimizations.

Move the code to scalarize VPWidenPointerInductions to a helper and use
it if needed.

This fixes a crash after #148274 in the added test case.

Fixes https://github.com/llvm/llvm-project/issues/169780
2025-11-27 21:52:15 +00:00
Luke Lau
1c7ec06b16
[VPlan] Optimize LastActiveLane to EVL - 1 (#169766)
With EVL tail folding, the LastActiveLane can be computed with EVL - 1.
This removes the need for a header mask and vfirst.m for loops with live
outs on RISC-V:

     # %bb.5:                                # %for.cond.cleanup7
    -       vsetvli zero, zero, e32, m2, ta, ma
    -       vmv.v.x v8, s1
    -       vmsleu.vv       v10, v8, v22
    -       vfirst.m        a0, v10
    -       srli    a1, a0, 63
    -       czero.nez       a0, a0, a1
    -       czero.eqz       a1, s8, a1
    -       or      a0, a0, a1
    -       addi    a0, a0, -1
    -       vsetvli zero, zero, e64, m4, ta, ma
    -       vslidedown.vx   v8, v12, a0
    +       addi    s1, s1, -1
    +       vslidedown.vx   v8, v12, s1
2025-11-27 17:03:08 +08:00
Florian Hahn
f8eca64a28
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 20:03:55 +00:00
Florian Hahn
d58ebe339c
Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)""
This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43.

Missed some test updates.
2025-11-26 19:41:39 +00:00
Florian Hahn
72e51d389f
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 19:31:25 +00:00
Sam Tebbs
071d1fb8be
[LV] Use VPReductionRecipe for partial reductions (#147513)
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two together and gives VPReductionRecipe a VFScaleFactor so
that it can choose to generate the partial reduction intrinsic at
execute time.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. -> https://github.com/llvm/llvm-project/pull/147513

Replaces https://github.com/llvm/llvm-project/pull/146073 .
2025-11-26 16:18:22 +00:00
Florian Hahn
4cc8cc81e3
[VPlan] Hoist predicated loads with complementary masks. (#168373)
This patch adds a new VPlan transformation to hoist predicated loads, if
we can prove they execute unconditionally, i.e. there are 2 predicated
loads to the same address with complementary masks. Then we are
guaranteed to execute one of them on each iteration, allowing us to
remove the mask.

The transform groups masked replicating loads by their address SCEV,
then checks if there are 2 loads with complementary mask. If that is the
case, we check if there are any writes that may alias the load address
in the blocks between the first and last load with the same address.
The transforms operates after linearizing the CFG, but before
introducing replicate regions, which means this is just checking a chain
of consecutive blocks.

Currently this only uses noalias metadata to check for no-alias (using
the helpers added in https://github.com/llvm/llvm-project/pull/166247).

Then we create an unpredicated VPReplicateRecipe at the position of the
first load, then replace all users of the grouped loads with it.

Small Alive2 proof for hoisting with complementary masks:
https://alive2.llvm.org/ce/z/kUx742

PR: https://github.com/llvm/llvm-project/pull/168373
2025-11-26 13:55:14 +00:00
Ramkumar Ramachandra
c25e0d3e29
[VPlan] Simplify x + 0 -> x (#169394) 2025-11-25 05:58:41 +00:00
Ramkumar Ramachandra
37f7b3128d
Reland [VPlan] Handle WidenGEP in narrowToSingleScalars (#167880)
Changes: Fix a missed update to WidenGEP::usesFirstLaneOnly, and include
reduced-case test that was previously hitting the new assert: the
underlying reason was that VPWidenGEP::usesScalars was too weak, and the
single-scalar WidenGEP was not narrowed by narrowToSingleScalarRecipes.

This allows us to strip a special case in VPWidenGEP::execute.
2025-11-24 18:11:58 +00:00
Ramkumar Ramachandra
1abb055c57
[IVDesc] Make getCastInsts return an ArrayRef (NFC) (#169021)
To make it clear that the return value is immutable.
2025-11-24 08:57:55 +00:00
hstk30-hw
13a39eaa0b
[Sema] Fix Wunused-but-set-variable warning(NFC) (#169220)
Fix warning: 
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1455:23: warning:
variable 'Store' set but not used [-Wunused-but-set-variable]
2025-11-24 12:49:07 +08:00
Florian Hahn
21378fb75a
[VPlan] Merge fcmp uno feeding AnyOf. (#166823)
Fold
  any-of (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
  any-of (fcmp uno %A, %B), ...

This pattern is generated to check if any vector lane is NaN, and
combining multiple compares is beneficial on architectures that have
dedicated instructions.

Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Combine suggested as part of
https://github.com/llvm/llvm-project/pull/161735

PR: https://github.com/llvm/llvm-project/pull/166823
2025-11-23 15:52:19 +00:00
Florian Hahn
080ca902c6
[VPlan] Create resume phis in scalar preheader early. (NFC) (#166099)
Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction.

PR: https://github.com/llvm/llvm-project/pull/166099
2025-11-22 20:45:41 +00:00
Ramkumar Ramachandra
299ea95747
[VPlan] Drop poison-generating flags on induction trunc (#168922)
After truncating an integer-induction, neither nuw nor nsw hold.

Fixes #168902.

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-11-21 08:14:46 +00:00
Florian Hahn
2befda2225
[VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450)
Update VPlan to populate VPIRFlags during VPInstruction construction and
use it when creating widened recipes, instead of constructing VPIRFlags
from the underlying IR instruction each time. The VPRecipeWithIRFlags
constructor taking an underlying instruction and setting the flags based
on it has been removed.

This centralizes initial VPIRFlags creation and ensures flags are
consistently available throughout VPlan transformations and makes sure
we don't accidentally re-add flags from the underlying instruction that
already got dropped during transformations.

Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did
the same for VPIRMetadata.

Should be NFC w.r.t. to the generated IR.

PR: https://github.com/llvm/llvm-project/pull/168450
2025-11-18 15:15:14 +00:00
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoists single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hosting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Florian Hahn
3cba379e3d
[VPlan] Populate and use VPIRMetadata from VPInstructions (NFC) (#167253)
Update VPlan to populate VPIRMetadata during VPInstruction construction
and use it when creating widened recipes, instead of constructing
VPIRMetadata from the underlying IR instruction each time.

This centralizes VPIRMetadata in VPInstructions and ensures metadata is
consistently available throughout VPlan transformations.

PR: https://github.com/llvm/llvm-project/pull/167253
2025-11-17 21:28:49 +00:00
Florian Hahn
321b9d190b
[VPlan] Replace VPIRMetadata::addMetadata with setMetadata. (NFC)
Replace addMetadata with setMetadata, which sets metadata, updating
existing entries or adding a new entry otherwise.

This isn't strictly needed at the moment, but will be needed for
follow-up patches.
2025-11-17 20:55:18 +00:00
Ramkumar Ramachandra
ef023cae38
Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)
Changes: The previous patch had to be reverted to a mismatching-OpType
assert in cse. The reduced-test has now been added corresponding to a
RVV pointer-induction, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.

While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 13:44:25 +00:00
Ramkumar Ramachandra
54fdf67bdb
[VPlan] Mark getPredicatedMask static (NFC) (#168067) 2025-11-17 08:22:00 +00:00
Ramkumar Ramachandra
daa30ae263
[VPlan] Improve code in RemoveMask_match (NFC) (#168065) 2025-11-17 08:21:34 +00:00
Ramkumar Ramachandra
85db92884c
[VPlan] Strip outdated comment in optimizeForVFAndUF (NFC) (#168068) 2025-11-15 10:09:49 +00:00
Alex Bradbury
f2336d4c7e
Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080)
Reverts llvm/llvm-project#163538

This is causing build failures on the two-stage RVV buildbots. e.g.
https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a
reproducer and more information at
https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822

This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra
355e0f94af
[VPlan] Expand WidenInt inductions with nuw/nsw (#163538)
While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-14 12:10:55 +00:00
Florian Hahn
a6edeedbfa Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISCV
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-13 22:34:55 +00:00
Luke Lau
c0f7d51e8a
[VPlan] Simplify ExplicitVectorLength(%AVL) -> %AVL when AVL <= VF (#167647)
[`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399)
has the property that if the AVL (%cnt) is less than or equal to VF
(%max_lanes) then the return value is just AVL.

This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds
`ExplicitVectorLength` to
`VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once
dead.
2025-11-13 13:17:01 +00:00
Ramkumar Ramachandra
9ba738af2c
[VPlan] Fix assert in store-user in narrowToSingleScalars (#167686)
Follow up on c2d4c7c18b96 ([VPlan] Permit more users in
narrowToSingleScalars) to fix an assert related to WidenStore users of
the recipe being narrowed in narrowToSingleScalars.
2025-11-12 17:53:24 +00:00
Florian Hahn
62d1a080e6
[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-12 15:11:00 +00:00
Luke Lau
02c68b3ef7
[VPlan] Plumb scalable register size through narrowInterleaveGroups (#167505)
On RISC-V narrowInterleaveGroups doesn't kick in because the wrong
VectorRegWidth is passed to isConsecutiveInterleaveGroup.

narrowInterleaveGroups is always passed the RGK_FixedWidthVector
register size, but on RISC-V the RGK_ScalableVector size is twice as
large because we want to use LMUL 2. This causes the `GroupSize ==
VectorRegWidth` check to fail.

This fixes it by using the scalable register size whenever the VF is
scalable and plumbing it through as a potentially scalable TypeSize.

Note that this only makes a difference when tail folding is disabled, as
narrowInterleaveGroups can't handle EVL based IVs yet.
2025-11-12 11:14:53 +00:00
Florian Hahn
b9f0dadc10
[VPlan] Merge fcmp uno feeding Or. (#167251)
Fold
 or (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
 or (fcmp uno %A, %B), ...

This pattern is generated to check if any vector lane is NaN, and
combining multiple compares is beneficial on architectures that have
dedicated instructions.

Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Combine suggested as part of #161735

PR: https://github.com/llvm/llvm-project/pull/167251
2025-11-12 10:15:59 +00:00
Mel Chen
68a4af6acc
[LV][EVL] Replace VPInstruction::Select with vp.merge for predicated div/rem (#154072)
Since div/rem operations don’t support a mask operand, the lanes of the
divisor that are masked out are currently replaced with 1 using
VPInstruction::Select before the predicated div/rem operation.
This patch replaces
```
  VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)
```
with
```
  vp.merge(conditional_mask, LHS, RHS, EVL)
```
so that the header mask can be replaced by EVL in this usage scenario
when tail folding with EVL.
2025-11-12 08:03:57 +00:00
Florian Hahn
519cf3c2b8
[VPlan] Remove unneeded getDefiningRecipe with isa/cast/dyn_cast. (NFC)
Classof for most recipes directly supports VPValue, so there is no need
to call getDefiningRecipe when using isa/cast/dyn_cast.
2025-11-11 22:07:48 +00:00
Florian Hahn
c41ef17653
[VPlan] Add getSingleUser helper (NFC).
Add helper to make it easier to retrieve the single user of a VPUser.
2025-11-11 21:05:44 +00:00
Ramkumar Ramachandra
c8c328406c
Revert "[VPlan] Handle WidenGEP in narrowToSingleScalars" (#167509)
This reverts commit fdd52f5fe130fb8b98f4aed3d15aa0789cce6b40, as it
causes buildbot failures. This will give us time to investigate the
failure.

https://lab.llvm.org/buildbot/#/builders/210/builds/5160
2025-11-11 14:29:28 +00:00
Kerry McLaughlin
de3de3f143
[LV] Consider interleaving when -enable-wide-lane-mask=true (#163387)
Currently the only way to enable the use of wide active lane masks is to pass
-enable-wide-lane-mask and force both interleaving & tail-folding with additional
flags. This patch changes selectInterleaveCount to consider interleaving if wide
lane masks were requested, although the feature remains off by default.
2025-11-11 11:46:59 +00:00
Ramkumar Ramachandra
fdd52f5fe1
[VPlan] Handle WidenGEP in narrowToSingleScalars (#166740)
This allows us to strip a special case in VPWidenGEP::execute.
2025-11-11 10:33:55 +00:00
Florian Hahn
8b1cc2d5f5 [VPlan] Update canNarrowLoad to check WidenMember0's op first (NFCI).
This hardens the code to check based on WideMember0's operands. This
ensures each call will go through the same check. Should be NFC
currently but needed when generalizing in follow-up patches.
2025-11-10 22:18:34 +00:00