2808 Commits

Author SHA1 Message Date
Aiden Grossman
f29d06029f Revert "[LV] Mark checks as never succeeding for high cost cutoff."
This reverts commit 8a115b6934a90441d77ea54af73e7aaaa1394b38.

This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326

/home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'
2025-12-09 21:32:09 +00:00
Florian Hahn
8a115b6934
[LV] Mark checks as never succeeding for high cost cutoff.
When GeneratedRTChecks::create bails out due to exceeding the cost
threshold, no runtime checks are generated and we must not proceed
assuming checks have been generated.

Mark the checks as never succeeding, to make sure we don't try to
vectorize assuming the runtime checks hold. This fixes a case where we
previously incorrectly vectorized assuming runtime checks had been
generated when forcing vectorization via metadate.

Fixes the mis-compile mentioned in
https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588
2025-12-09 20:37:21 +00:00
Pengcheng Wang
1ef0a56b55
[LV][NFC] Use foldTailWithEVL() (#171282) 2025-12-09 16:58:53 +08:00
Luke Lau
0fbb45e7d6 [LV] Return getPredBlockCostDivisor in uint64_t
When the probability of a block is extremely low, HeaderFreq / BBFreq
may be larger than 32 bits. Previously this got truncated to uint32_t
which could cause division by zero exceptions on x86. Widen the return
type to uint64_t which should fit the entire range of BlockFrequency
values.

It's also worth noting that a frequency can never be zero according to
BlockFrequency.h, so we shouldn't need to worry about divide by zero in
getPredBlockCostDivisor itself.
2025-12-09 15:43:13 +08:00
Florian Hahn
65dd29b335
[LV] Compare induction start values via SCEV in assertion (NFCI).
Instead of comparing plain VPValue in the assertion checking the start
values, directly compare the SCEV's. This future-proofs the code in
preparation of performing more simplifications/canonicalizations for
live-ins.
2025-12-08 21:31:53 +00:00
Luke Lau
e8219e5ce8
[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690)
In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we
unprofitably loop vectorize on RISC-V.

The loop looks something like:

```c
  for (int i = 0; i < n; i++) {
    if (x0[i] == a)
      if (x1[i] == b)
        if (x2[i] == c)
          // do stuff...
  }
```

Because it's so deeply nested the actual inner level of the loop rarely
gets executed. However we still deem it profitable to vectorize, which
due to the if-conversion means we now always execute the body.

This stems from the fact that `getPredBlockCostDivisor` currently
assumes that blocks have 50% chance of being executed as a heuristic.

We can fix this by using BlockFrequencyInfo, which gives a more accurate
estimate of the innermost block being executed 12.5% of the time. We can
then calculate the probability as `HeaderFrequency / BlockFrequency`.

Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V.

Whilst there's a lot of changes in the in-tree tests, this doesn't
affect llvm-test-suite or SPEC CPU 2017 that much:

- On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized
on llvm-test-suite/SPEC CPU 2017.
- On x86-64 -flto -O3 **with PGO** there's 0.9%/0% less geomean loops
vectorized on llvm-test-suite/SPEC CPU 2017.

Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO:
https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au
2025-12-08 14:28:26 +00:00
Florian Hahn
3fc7419236
[VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124)
Replace ExtractLastElement and ExtractLastLanePerPart with more generic
and specific ExtractLastLane and ExtractLastPart, which model distinct
parts of extracting across parts and lanes. ExtractLastElement ==
ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart ==
ExtractLastLane, the latter clarifying the name of the opcode. A new
m_ExtractLastElement matcher is provided for convenience.

The patch should be NFC modulo printing changes.

PR: https://github.com/llvm/llvm-project/pull/164124
2025-12-07 15:15:43 +00:00
Luke Lau
37ea097943
[VPlan] Remove VPWidenRecipe constructor with no underlying instruction. NFCI (#166521)
My understanding is that a VPWidenRecipe should be used for recipes with
an exact underlying scalar instruction, and VPInstruction should be used
elsewhere e.g. for instructions generated as a part of the vectorization
process.

The only user of the VPWidenRecipe constructor that doesn't take an
underlying instruction is in adjustRecipesForReductions, but we can just
use VPInstruction there.
2025-12-04 19:19:23 +08:00
Florian Hahn
f0e1254bce
[LV] Use forced cost once for whole interleave group in legacy costmodel (#168270)
The VPlan-based cost model assigns the forced cost once for a whole
VPInterleaveRecipe. Update the legacy cost model to match this behavior.
This fixes a cost-model divergence, and assigns the cost in a way that
matches the generated code more accurately.

PR: https://github.com/llvm/llvm-project/pull/168270
2025-12-02 21:39:54 +00:00
Florian Hahn
4b6ad11876
[VPlan] Sink predicated stores with complementary masks. (#168771)
Extend the logic to hoist predicated loads
(https://github.com/llvm/llvm-project/pull/168373) to sink predicated
stores with complementary masks in a similar fashion.

The patch refactors some of the existing logic for legality checks to be
shared between hosting and sinking, and adds a new sinking transform on
top.

With respect to the legality checks, for sinking stores the code also
checks if there are any aliasing stores that may alias, not only loads.

PR: https://github.com/llvm/llvm-project/pull/168771
2025-12-02 11:43:37 +00:00
Florian Hahn
24b87b8d48
[VPlan] Skip cost verification for loops with EVL gather/scatter.
The VPlan-based cost model use vp_gather/vp_scatter for gather/scatter
costs, which is different to the legacy cost model and cannot be matched
there. Don't verify the costs match for plans containing gather/scatters
with EVL.

Fixes https://github.com/llvm/llvm-project/issues/169948.
2025-11-29 22:00:30 +00:00
Florian Hahn
99addbf73d
[LV] Vectorize selecting last IV of min/max element. (#141431)
Add support for vectorizing loops that select the index of the minimum
or maximum element. The patch implements vectorizing those patterns by
combining Min/Max and FindFirstIV reductions.

It extends matching Min/Max reductions to allow in-loop users that are
FindLastIV reductions. It records a flag indicating that the Min/Max
reduction is used by another reduction. The extra user is then check as
part of the new `handleMultiUseReductions` VPlan transformation.

It processes any reduction that has other reduction users. The reduction
using the min/max reduction currently must be a FindLastIV reduction,
which needs adjusting to compute the correct result:
 1. We need to find the last IV for which the condition based on the
     min/max reduction is true,
 2. Compare the partial min/max reduction result to its final value and,
 3. Select the lanes of the partial FindLastIV reductions which
     correspond to the lanes matching the min/max reduction result.

Depends on https://github.com/llvm/llvm-project/pull/140451

PR: https://github.com/llvm/llvm-project/pull/141431
2025-11-28 22:26:19 +00:00
Shih-Po Hung
b9bdec3021
[TTI][Vectorize] Migrate masked/gather-scatter/strided/expand-compress costing (NFCI) (#165532)
In #160470, there is a discussion about the possibility to explored a
general approach for handling memory intrinsics.

API changes:
- Remove getMaskedMemoryOpCost, getGatherScatterOpCost,
getExpandCompressMemoryOpCost, getStridedMemoryOpCost from
Analysis/TargetTransformInfo.
- Add getMemIntrinsicInstrCost.

In BasicTTIImpl, map intrinsic IDs to existing target implementation
until the legacy TTI hooks are retired.
- masked_load/store → getMaskedMemoryOpCost
- masked_/vp_gather/scatter → getGatherScatterOpCost
- masked_expandload/compressstore → getExpandCompressMemoryOpCost
- experimental_vp_strided_{load,store} → getStridedMemoryOpCost
TODO: add support for vp_load_ff.

No functional change intended; costs continue to route to the same
target-specific hooks.
2025-11-28 05:14:37 +00:00
Florian Hahn
f8eca64a28
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 20:03:55 +00:00
Florian Hahn
d58ebe339c
Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)""
This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43.

Missed some test updates.
2025-11-26 19:41:39 +00:00
Florian Hahn
72e51d389f
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 19:31:25 +00:00
Sam Tebbs
071d1fb8be
[LV] Use VPReductionRecipe for partial reductions (#147513)
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two together and gives VPReductionRecipe a VFScaleFactor so
that it can choose to generate the partial reduction intrinsic at
execute time.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. -> https://github.com/llvm/llvm-project/pull/147513

Replaces https://github.com/llvm/llvm-project/pull/146073 .
2025-11-26 16:18:22 +00:00
Florian Hahn
4cc8cc81e3
[VPlan] Hoist predicated loads with complementary masks. (#168373)
This patch adds a new VPlan transformation to hoist predicated loads, if
we can prove they execute unconditionally, i.e. there are 2 predicated
loads to the same address with complementary masks. Then we are
guaranteed to execute one of them on each iteration, allowing us to
remove the mask.

The transform groups masked replicating loads by their address SCEV,
then checks if there are 2 loads with complementary mask. If that is the
case, we check if there are any writes that may alias the load address
in the blocks between the first and last load with the same address.
The transforms operates after linearizing the CFG, but before
introducing replicate regions, which means this is just checking a chain
of consecutive blocks.

Currently this only uses noalias metadata to check for no-alias (using
the helpers added in https://github.com/llvm/llvm-project/pull/166247).

Then we create an unpredicated VPReplicateRecipe at the position of the
first load, then replace all users of the grouped loads with it.

Small Alive2 proof for hoisting with complementary masks:
https://alive2.llvm.org/ce/z/kUx742

PR: https://github.com/llvm/llvm-project/pull/168373
2025-11-26 13:55:14 +00:00
Florian Hahn
48eb697441
[LV] Count cost of middle block if TC <= VF. (#168949)
If the expected trip count is less than the VF, the vector loop will
only execute a single iteration. When that's the case, the cost of the
middle block has the same impact as the cost of the vector loop. Include
it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra
work in the middle block makes it unprofitable.

Note that isOutsideLoopWorkProfitable already scales the cost of blocks
outside the vector region, but the patch restricts accounting for the
middle block to cases where VF <= ExpectedTC, to initially catch some
worst cases and avoid regressions.

This initial version should specifically avoid unprofitable tail-folding
for loops with low trip counts after re-applying
https://github.com/llvm/llvm-project/pull/149042.

PR: https://github.com/llvm/llvm-project/pull/168949
2025-11-24 19:23:04 +00:00
Ramkumar Ramachandra
1abb055c57
[IVDesc] Make getCastInsts return an ArrayRef (NFC) (#169021)
To make it clear that the return value is immutable.
2025-11-24 08:57:55 +00:00
Florian Hahn
080ca902c6
[VPlan] Create resume phis in scalar preheader early. (NFC) (#166099)
Create phi recipes for scalar resume value up front in addInitialSkeleton during initial construction. This will allow moving the remaining code dealing with resume values to VPlan transforms/construction.

PR: https://github.com/llvm/llvm-project/pull/166099
2025-11-22 20:45:41 +00:00
Florian Hahn
7acfbc23a7
[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289)
Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and
replace it with `onlyScalarValuesUsed`. This removes the need to carry
state from the legacy cost model through VPlan, and the VPlan-based
analysis gives more accurate results, avoiding a number of extracts.

PR: https://github.com/llvm/llvm-project/pull/168289
2025-11-20 18:58:25 +00:00
Florian Hahn
67e35bbebb
[LV] Check full partial reduction chains in order. (#168036)
https://github.com/llvm/llvm-project/pull/162822 added another
validation step to check if entries in a partial reduction chain have
the same scale factor. But the validation was still dependent on the
order of entries in PartialReductionChains, and would fail to reject
some cases (e.g. if the first first link matched the scale of the second
link, but the second link is invalidated later).

To fix that, group chains by their starting phi nodes, then perform the
validation for each chain, and if it fails, invalidate the whole chain
for the phi.

Fixes https://github.com/llvm/llvm-project/issues/167243.
Fixes https://github.com/llvm/llvm-project/issues/167867.

PR: https://github.com/llvm/llvm-project/pull/168036
2025-11-20 15:54:57 +00:00
Sam Tebbs
3396b4654b
[LV] Allow partial reductions with an extended bin op (#165536)
A pattern of the form reduce.add(ext(mul)) is valid for a partial
reduction as long as the mul and its operands fulfill the requirements
of a normal partial reduction. The mul's extend operands will be
optimised to the wider extend, and we already have oneUse checks in
place to make sure the mul and operands can be modified safely.

1. -> https://github.com/llvm/llvm-project/pull/165536
2. https://github.com/llvm/llvm-project/pull/165543
2025-11-20 10:22:11 +00:00
Florian Hahn
040d9c94be
[VPlan] Collect FMFs for in-loop reduction chain in VPlan. (NFC)
Replace retrieving FMFs for in-loop reduction via underlying instruction
+ legal by collecting the flags during reduction chain traversal in
VPlan.
2025-11-19 22:11:21 +00:00
Luke Lau
5da0445420
[LV] Consolidate shouldOptimizeForSize and remove unused BFI/PSI. NFC (#168697)
#158690 plans on passing BFI as a lazy lambda to avoid computing
BlockFrequencyInfo when not needed.

In preparation for that, this PR removes BFI and PSI from some
constructors that aren't used. It also consolidates the two calls to
llvm::shouldOptimizeForSize so that the result is computed once and
passed where needed.

This also renames OptForSize in LoopVectorizationLegality to clarify
that it's to prevent runtime SCEV checks, see
https://reviews.llvm.org/D68082
2025-11-19 21:29:26 +08:00
Hassnaa Hamdi
f7f41350b4
[LV]: Skip Epilogue scalable VF greater than RemainingIterations. (#156724)
Consider skipping epilogue scalable VF when they are greater than
RemainingIterations same as fixed VF.
And skip scalable RemainingIterations from that comparison because
SCEV ATM can't evaluate non-canonical vscale-based expressions.
2025-11-19 05:11:17 +00:00
Shih-Po Hung
961940e1a7
[TTI] Use MemIntrinsicCostAttributes for getMaskedMemoryOpCost (#168029)
- Split from #165532. This is a step toward a unified interface for
masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
-                                       AddressSpace, CostKind);

+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```
Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the
same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost
queries to the same attributes-based entry point.
2025-11-19 09:51:12 +08:00
Florian Hahn
2befda2225
[VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450)
Update VPlan to populate VPIRFlags during VPInstruction construction and
use it when creating widened recipes, instead of constructing VPIRFlags
from the underlying IR instruction each time. The VPRecipeWithIRFlags
constructor taking an underlying instruction and setting the flags based
on it has been removed.

This centralizes initial VPIRFlags creation and ensures flags are
consistently available throughout VPlan transformations and makes sure
we don't accidentally re-add flags from the underlying instruction that
already got dropped during transformations.

Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did
the same for VPIRMetadata.

Should be NFC w.r.t. to the generated IR.

PR: https://github.com/llvm/llvm-project/pull/168450
2025-11-18 15:15:14 +00:00
Florian Hahn
3cba379e3d
[VPlan] Populate and use VPIRMetadata from VPInstructions (NFC) (#167253)
Update VPlan to populate VPIRMetadata during VPInstruction construction
and use it when creating widened recipes, instead of constructing
VPIRMetadata from the underlying IR instruction each time.

This centralizes VPIRMetadata in VPInstructions and ensures metadata is
consistently available throughout VPlan transformations.

PR: https://github.com/llvm/llvm-project/pull/167253
2025-11-17 21:28:49 +00:00
Ramkumar Ramachandra
ef023cae38
Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)
Changes: The previous patch had to be reverted to a mismatching-OpType
assert in cse. The reduced-test has now been added corresponding to a
RVV pointer-induction, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.

While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 13:44:25 +00:00
Florian Hahn
e009de26b6 [LV] Use VPlan pattern matching in adjustRecipesForReductions (NFC)
Replace the assert checking if CurrentLinkI is a CmpInst with a pattern
matching check in the if condition. This uses VPlan-level pattern matching
instead of inspecting the underlying instruction type.
2025-11-15 21:45:40 +00:00
Alex Bradbury
f2336d4c7e
Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080)
Reverts llvm/llvm-project#163538

This is causing build failures on the two-stage RVV buildbots. e.g.
https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a
reproducer and more information at
https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822

This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra
355e0f94af
[VPlan] Expand WidenInt inductions with nuw/nsw (#163538)
While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-14 12:10:55 +00:00
Mel Chen
3277f6caef
[LV] Explicitly disable in-loop reductions for AnyOf and FindIV. nfc (#163541)
Currently, in-loop reductions for AnyOf and FindIV are not supported.
They were implicitly blocked. This happened because
RecurrenceDescriptor::getReductionOpChain could not detect their
recurrence chain. The reason is that RecurrenceDescriptor::getOpcode was
set to Instruction::Or, but the recurrence chains of AnyOf and FindIV do
not actually contain an Instruction::Or.

This patch explicitly disables in-loop reductions for AnyOf and FindIV
instead of relying on getReductionOpChain to implicitly prevent them.
2025-11-14 09:14:07 +00:00
Luke Lau
851f8f7984
[VPlan] Disable partial reductions again with EVL tail folding (#167863)
VPPartialReductionRecipe doesn't yet support an EVL variant, and we
guard against this by not calling convertToAbstractRecipes when we're
tail folding with EVL.

However recently some things got shuffled around which means we may
detect some scaled reductions in collectScaledReductions and store them
in ScaledReductionMap, where outside of convertToAbstractRecipes we may
look them up and start e.g. adding a scale factor to an otherwise
regular VPReductionPHI.

This fixes it by skipping collectScaledReductions, and fixes #167861
2025-11-14 06:30:12 +00:00
Florian Hahn
a6edeedbfa Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISCV
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-13 22:34:55 +00:00
Ryan Buchner
a04c6b5512
[LV] Update LoopVectorizationPlanner::emitInvalidCostRemarks to handle reduction plans (#165913)
The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case.

Fixes #165359.
2025-11-13 06:12:40 -10:00
Florian Hahn
b6bcfdea40 [VPlan] Get opcode & type from recipe in adjustRecipesForReduction (NFC)
Replace direct access to underlying IR instructions with VPlan-level
equivalents, i.e. VPTypeAnalysis and pattern matching on the recipe.

Removes a few uses of accessing underlying IR.
2025-11-12 22:37:15 +00:00
Florian Hahn
53a65ba6b9 [VPlan] Don't look up recipe for IV step via RecipeBuilder. (NFC)
Directly update induction increments with step value created for wide
inductions in createWidenInductionRecipes, which does not require
looking up via RecipeBuilder.
2025-11-12 22:08:56 +00:00
Florian Hahn
62d1a080e6
[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-12 15:11:00 +00:00
Luke Lau
02c68b3ef7
[VPlan] Plumb scalable register size through narrowInterleaveGroups (#167505)
On RISC-V narrowInterleaveGroups doesn't kick in because the wrong
VectorRegWidth is passed to isConsecutiveInterleaveGroup.

narrowInterleaveGroups is always passed the RGK_FixedWidthVector
register size, but on RISC-V the RGK_ScalableVector size is twice as
large because we want to use LMUL 2. This causes the `GroupSize ==
VectorRegWidth` check to fail.

This fixes it by using the scalable register size whenever the VF is
scalable and plumbing it through as a potentially scalable TypeSize.

Note that this only makes a difference when tail folding is disabled, as
narrowInterleaveGroups can't handle EVL based IVs yet.
2025-11-12 11:14:53 +00:00
Florian Hahn
519cf3c2b8
[VPlan] Remove unneeded getDefiningRecipe with isa/cast/dyn_cast. (NFC)
Classof for most recipes directly supports VPValue, so there is no need
to call getDefiningRecipe when using isa/cast/dyn_cast.
2025-11-11 22:07:48 +00:00
Florian Hahn
c41ef17653
[VPlan] Add getSingleUser helper (NFC).
Add helper to make it easier to retrieve the single user of a VPUser.
2025-11-11 21:05:44 +00:00
Kerry McLaughlin
de3de3f143
[LV] Consider interleaving when -enable-wide-lane-mask=true (#163387)
Currently the only way to enable the use of wide active lane masks is to pass
-enable-wide-lane-mask and force both interleaving & tail-folding with additional
flags. This patch changes selectInterleaveCount to consider interleaving if wide
lane masks were requested, although the feature remains off by default.
2025-11-11 11:46:59 +00:00
Sander de Smalen
517d725463
[LV] Move condition to VPPartialReductionRecipe::execute (#166136)
This means that VPExpressions will now be constructed for
VPPartialReductionRecipe's when the loop has tail-folding predication.

Note that control-flow (if/else) predication is not yet handled
for partial reductions, because of the way partial reductions
are recognised and built up.
2025-11-11 09:42:54 +00:00
Luke Lau
bfd4155f23
[VPlan] Don't apply predication discount to non-originally-predicated blocks (#160449)
Split off from #158690. Currently if an instruction needs predicated due
to tail folding, it will also have a predicated discount applied to it
in multiple places.
This is likely inaccurate because we can expect a tail folded
instruction to be executed on every iteration bar the last.

This fixes it by checking if the instruction/block was originally
predicated, and in doing so prevents vectorization with tail folding
where we would have had to scalarize the memory op anyway.

On llvm-test-suite this causes 4 loops in total to no longer be
vectorized with -O3 on arm64-apple-darwin, and there's no observable
performance impact.
2025-11-10 12:10:40 +00:00
Florian Hahn
d406c15fc8 [VPlan] Use VPInstructionWithType for casts in VPlan0. (NFC)
Use VPInstructionWithType for casts in VPlan0, to enable additional
analysis/transforms on VPlan0, and more accurate modeling in VPlan0.
2025-11-09 21:35:50 +00:00
Florian Hahn
8a6d5f68e4
[VPlan] Update more VPRecipeBuilder members to take VPInst directly (NFC)
Update VPRecipeBuilder methos to take VPInstruction* directly instead of
ArrayRef<> for operands and Instruction * separately.

This allows avoid accessing the underlying instruction in some cases, by
using information directly from VPInstruction, like getOpcode(),
getDebugLoc(), and getOperand().

It also allows directly transferring other information directly from
VPInstruction in the future.
2025-11-07 21:06:38 +00:00
Florian Hahn
3ad5765e23
[LV] Check all users of partial reductions in chain have same scale. (#162822)
Check that all partial reductions in a chain are only used by other
partial reductions with the same scale factor. Otherwise we end up
creating users of scaled reductions where the types of the other
operands don't match.

A similar issue was addressed in
https://github.com/llvm/llvm-project/pull/158603, but misses the chained
cases.

Fixes https://github.com/llvm/llvm-project/issues/162530.

PR: https://github.com/llvm/llvm-project/pull/162822
2025-11-06 21:45:57 +00:00