2678 Commits

Author SHA1 Message Date
Florian Hahn
47737cdeda
[LV] Move introduceCheckBlockInVPlan to EpilogueVectorizerMainLoop (NFC)
Move it to the sub-class that is actually using it.
2025-08-28 19:18:02 +01:00
Florian Hahn
ffd0f5fd21
[LV] Remove unneeded ILV::LoopScalarPreHeader (NFC).
Follow-up suggested in https://github.com/llvm/llvm-project/pull/153643.
Remove some more global state by directly returning the scalar
preheader from createScalarPreheader.
2025-08-26 21:44:48 +01:00
Florian Hahn
5088795994
[LV] Remove unused ILV::VectorTripCount (NFC).
The field is no longer used, remove it.
2025-08-26 19:01:51 +01:00
Florian Hahn
5faed1ad84
[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaced the ILV version for the non-epilogue case.

The VPlan-based version constructs a SCEV expression to compute the
minimum iterations, use that to check if the check is known true or
false. Otherwise it creates a VPExpandSCEV recipe and emits a
compare-and-branch.

When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.

PR: https://github.com/llvm/llvm-project/pull/153643
2025-08-26 15:52:31 +01:00
Kerry McLaughlin
884c03e71b
[LV] Return Invalid from getLegacyCost when instruction cost forced. (#154543)
LoopVectorizationCostModel::expectedCost will only override the cost
returned by getInstructionCost when valid. This patch ensures we do
the same in VPCostContext::getLegacyCost, avoiding the "VPlan cost
model and legacy cost model disagreed" assert in the included test.
2025-08-26 10:26:57 +01:00
David Sherwood
d606eae2ce
[LV] Stop using the legacy cost model for udiv + friends (#152707)
In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and
srem we fall back on the legacy cost unnecessarily. At this point we
know that the vplan must be functionally correct, i.e. if the
divide/remainder is not safe to speculatively execute then we must have
either:

1. Scalarised the operation, in which case we wouldn't be using a
VPWidenRecipe, or
2. We've inserted a select for the second operand to ensure we don't
fault through divide-by-zero.

For 2) it's necessary to add the select operation to
VPInstruction::computeCost so that we mirror the cost of the legacy cost
model. The only problem with this is that we also generate selects in
vplan for predicated loops with reductions, which *aren't* accounted for
in the legacy cost model. In order to prevent asserts firing I've also
added the selects to precomputeCosts to ensure the legacy costs match
the vplan costs for reductions.
2025-08-26 10:17:23 +01:00
Florian Hahn
c950a72974
[VPlan] Support scalar VF for ExtractLane and FirstActiveLane.
Extend ExtractLane and FirstActiveLane to support scalable VFs. This
allows correct handling when interleaving with VF = 1.

Alive2 proofs:
 - Fixed codegen with this patch: https://alive2.llvm.org/ce/z/8Y5_Vc
   (verifies as correct)
 - Original codegen: https://alive2.llvm.org/ce/z/twdg3X (doesn't
   verify)

Fixes https://github.com/llvm/llvm-project/issues/154967.
2025-08-25 21:45:21 +01:00
Ramkumar Ramachandra
66be00d635
[VPlan] Introduce m_Cmp; match more compares (#154771)
Extend [Specific]Cmp_match to handle floating-point compares, and
introduce m_Cmp that matches both integer and floating-point compares.
Use it in simplifyRecipe to match and simplify the general case of
compares. The change has necessitated a bugfix in
VPReplicateRecipe::execute.
2025-08-24 13:27:06 +01:00
Florian Hahn
954097dd61
[VPlan] Use SCEV to check subtract in getOptimizableIVOf.
Simplify checks for IV subtractions in getOptimizableIVOf by using SCEV.
This slightly generalizes the patterns we can handle.
2025-08-23 22:00:01 +01:00
Florian Hahn
9f87cd68a4
[VPlan] Add m_ExtractLastElement matcher. (NFC) 2025-08-23 21:21:03 +01:00
Florian Hahn
30c26dcc47
[VPlan] Create extracts for live-outs early (NFC).
Create extracts for live-outs during skeleton construction.
2025-08-23 13:28:15 +01:00
Florian Hahn
300d2c6d20
[VPlan] Move SCEV expansion to VPlan transform. (NFCI).
Move the logic to expand SCEVs directly to a late VPlan transform that
expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an
abstract recipe without execute, which clarifies how the recipe is
handled, i.e. it is not executed like regular recipes.

It also helps to simplify construction, as now scalar evolution isn't
required to be passed to the recipe.
2025-08-21 22:03:26 +01:00
Ramkumar Ramachandra
a96b78cf41
[SCEVPatternMatch] Add signed cst match; use in LV (NFC) (#154568)
Add a m_scev_SpecificSInt for matching a sign-extended value, and use it
to improve some code in LoopVectorize.
2025-08-21 15:46:53 +00:00
Florian Hahn
4e6c88be7c
[TTI] Remove Args argument from getOperandsScalarizationOverhead (NFC). (#154126)
Remove the ArrayRef<const Value*> Args operand from
getOperandsScalarizationOverhead and require that the callers
de-duplicate arguments and filter constant operands.

Removing the Value * based Args argument enables callers where no Value
* operands are available to use the function in a follow-up: computing
the scalarization cost directly for a VPlan recipe.

It also allows more accurate cost-estimates in the future: for example,
when vectorizing a loop, we could also skip operands that are live-ins,
as those also do not require scalarization.

PR: https://github.com/llvm/llvm-project/pull/154126
2025-08-20 21:09:08 +01:00
David Sherwood
e172110d12
[LV] Don't calculate scalar costs for scalable VFs in setVectorizedCallDecision (#152713)
In setVectorizedCallDecision we attempt to calculate the scalar costs
for vectorisation calls, even for scalable VFs where we already know the
answer is Invalid. We can avoid doing unnecessary work by skipping this
completely for scalable vectors.
2025-08-20 15:00:31 +01:00
Florian Hahn
dc23869f98
[LV] Handle vector trip count being zero in preparePlanForEpiVectorLoop.
After a485e0e, we may not set the vector trip count in
preparePlanForEpilogueVectorLoop if it is zero. We should not choose a
VF * UF that makes the main vector loop dead (i.e. vector trip count is
zero), but there are some cases where this can happen currently.

In those cases, set EPI.VectorTripCount to zero.
2025-08-20 11:54:22 +01:00
David Sherwood
13d8ba7dea
[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086)
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.

To solve this problem I have added a new
getReverseVectorInstrCost interface where the index is used
in reverse from the end of the vector. Suppose a vector has
a given ElementCount EC, the extracted/inserted lane would be
EC - 1 - Index. For scalable vectors this index is unknown at
compile time. I've added a AArch64 hook that better represents
the cost, and also a RISCV hook that maintains compatibility
with the behaviour prior to this PR.

I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
2025-08-19 09:31:37 +01:00
Mel Chen
1dac302ce7
[LV] Explicitly disallow interleaved access requiring gap mask for scalable VFs. nfc (#154122)
Currently, VPInterleaveRecipe::execute does not support generating LLVM
IR for interleaved accesses that require a gap mask for scalable VFs.
It would be better to detect and prevent such groups from being
vectorized as interleaved accesses in
LoopVectorizationCostModel::interleavedAccessCanBeWidened, rather than
relying on the TTI function getInterleavedMemoryOpCost to return an
invalid cost.
2025-08-19 08:42:39 +08:00
Florian Hahn
7e9989390d
[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487)
Materialze Build(Struct)Vectors explicitly for VPRecplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.

Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path alltogether.

PR: https://github.com/llvm/llvm-project/pull/151487
2025-08-18 20:49:42 +01:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
Florian Hahn
351d398a37
[VPlan] Run final VPlan simplifications before codegen.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.

Do a final run of simplifyRecipes before executing the VPlan.
2025-08-16 18:54:27 +01:00
Florian Hahn
2ed727f3f6
[VPlan] Move SCEV invalidation to ::executePlan. (NFCI)
Move SCEV invalidation from legacy ILV code-path directly to ::executePlan.
2025-08-15 20:32:41 +01:00
Florian Hahn
db98ac43ec
[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495)
Directly emit shl instead of a multiply if VF * Step is a power-of-2. The
main motivation here is to prepare the code and test for directly
generating and expanding a SCEV expression of the minimum iteration
count. SCEVExpander will directly emit shl for multiplies with
powers-of-2.

InstCombine will also performs this combine, so end-to-end this should
effectively by NFC.

PR: https://github.com/llvm/llvm-project/pull/153495
2025-08-14 19:27:51 +01:00
Florian Hahn
ff0ce74be8
[VPlan] Replace scalar preheader with VPIRBB at single place (NFC).
Replace the scalar preheader VPBB with an VPIRBB wrapping the IR basic
block created by createVectorizedLoopSkeleton.
2025-08-14 19:11:34 +01:00
Florian Hahn
177f27d220
[VPlan] Add incoming_[blocks,values] iterators to VPPhiAccessors (NFC) (#138472)
Add 3 new iterator ranges to VPPhiAccessors

* incoming_values(): returns a range over the incoming
  values of a phi 
* incoming_blocks(): returns a range over the incoming 
  blocks of a phi
* incoming_values_and_blocks: returns a range over pairs of
   incoming values and blocks.

Depends on https://github.com/llvm/llvm-project/pull/124838.

PR: https://github.com/llvm/llvm-project/pull/138472
2025-08-14 16:47:04 +01:00
Elvis Wang
01fac67e2a
[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342)
This patch add cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also remove all the default value in `getAddressComputationCost()`.
2025-08-14 16:01:44 +08:00
Florian Hahn
9400490a3c
[LV] Remove unused ILV state (NFC).
Remove unused member variables from InnerLoopVectorizer.
2025-08-13 21:28:50 +01:00
Ramkumar Ramachandra
d107c29fef
[VPlan] Strip unused CanonicalIVTy arg (NFC) (#153418) 2025-08-13 15:53:56 +01:00
Florian Hahn
48bfaa4c06
[VPlan] Replace VPBB for vector.ph during skeleton creation (NFC)
Shift replacement of regular VPBB for vector.ph with the VPIRBB wrapping
the created IR block directly to skeleton creation, to be consistent
with how the scalar preheader is handled.
2025-08-13 08:30:18 +01:00
Florian Hahn
8cdab07aaa
Reapply "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.

Recommit with a fix for the verifier error caused for EVL recipes.

Extra test coverage added in 6f939da60e.
2025-08-12 22:09:30 +01:00
Florian Hahn
424258947e
[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector
skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused
computations.

PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-12 14:13:13 +01:00
David Sherwood
8140779a9a
[LV] Improve accuracy of branch weights in epilogue iteration check block (#152980)
When one of the vector loops (main or epilogue) is scalable and the
other isn't, we can use the estimated value of vscale to improve the
accuracy.
2025-08-12 10:37:47 +01:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Florian Hahn
1c7c8e3ad3
Revert "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2.

This seems to be breaking some RISCV bots, reverting for now
https://lab.llvm.org/buildbot/#/builders/210/builds/1266
2025-08-11 22:05:30 +01:00
Florian Hahn
1f17bb133f
[VPlan] Remove trivial dead VPPhi cycles.
Update removeDeadRecipes to remove trivial dead VPPhi cycles.

Should effectively be NFC end-to-end.
2025-08-11 21:29:49 +01:00
Luke Lau
aea82a780a
[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)
A lot of time getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.

However VPTypeAnalysis has a constructor that takes the VPlan directly
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.

This lets us remove a constructor on VPTypeAnalysis.

Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
2025-08-11 18:12:05 +08:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually checks the contents of the type passed, only to see
if it's a vector or not, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
David Sherwood
aba0ce10c7
[LV] Add new line to interleaving disabled message (#152722) 2025-08-11 09:53:20 +01:00
David Sherwood
9181a7e294
[LV] Fix branch weights in epilogue min iteration check block (#152534)
I've changed how we construct the EpilogueVectorizerEpilogueLoop and
EpilogueVectorizerMainLoop classes so that we construct the parent class
with an additional boolean parameter indicating whether we're
vectorising the main or epilogue loop. The
InnerLoopAndEpilogueVectorizer class uses this new argument in
combination with the EpilogueLoopVectorizationInfo struct to set the
right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop
to access the correct values of VF and UF for the main loop, which are
required when setting branch weights in the minimum iteration check
block.
2025-08-11 09:52:54 +01:00
Florian Hahn
86813aa786
[VPlan] Add dedicated user for resume phi with epilogue vectorization.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.

This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
2025-08-10 21:21:16 +01:00
Florian Hahn
06fd0f9d65
[VPlan] Move initial skeleton construction earlier (NFC). (#150848)
Split up the not clearly named prepareForVectorization transform into
buildVPlan0, which adds the vector preheader, middle and scalar
preheader blocks, as well as the canonical induction recipes and sets
the trip count. The new transform is run directly after building the
plain CFG VPlan initially.

The remaining code handling early exits and adding the branch in the
middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still
runs at the original position.

With the code movement, we only have to add the skeleton once to the
initial VPlan, and cloning will take care of the rest. It will also
enable moving other construction steps to work directly on VPlan0, like
adding resume phis.

PR: https://github.com/llvm/llvm-project/pull/150848
2025-08-09 20:54:42 +01:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Florian Hahn
a485e0eae0
[VPlan] Retrieve vector TC for epilogue from resume phi (NFC).
Instead of relying on getOrCreateVectorTripCount to initialize
EPI.VectorTripCount, delay initialization after we retrieved the resume
phi and get the trip count from there. This makes the code independent
of legacy vector trip count creation.
2025-08-07 07:52:35 +01:00
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
Florian Hahn
e80e7e717e
[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847)
The initial VPlan closely reflects the original scalar loop, so unsing
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.

PR: https://github.com/llvm/llvm-project/pull/150847
2025-08-06 14:43:03 +01:00
Florian Hahn
777c320e6c
[VPlan] Address comments missed in #142309.
Address additional comments from
https://github.com/llvm/llvm-project/pull/142309.
2025-08-06 11:52:08 +01:00
Florian Hahn
d478502a42
[VPlan] Ensure that IV resume phi for epilogue is always first. (NFCI)
Update handling of canonical IV resume phi for the epilogue loop to make
sure the resume phi for the canonical IV is always the first phi in the
scalar preheader.

This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop.

For now, we keep an assert to make sure we use the same resume phi as
before. This will be removed in the future.
2025-08-05 21:06:41 +01:00
Florian Hahn
c9dd14d1d4
[VPlan] Compute interleave count for VPlan. (#149702)
Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some
information directly from VPlan. Register pressure was already computed
for a VPlan, and with this patch we now also check for reductions
directly on VPlan, as well as checking how many load and store
operations remain in the loop.

This should be mostly NFC, but we may compute slightly different
interleave counts, except for some edge cases, e.g. where dead loads
have been removed. This shouldn't happen in practice, and the patch
doesn't cause changes across a large test corpus on AArch64.

Computing the interleave count based on VPlan allows for making better
decisions in presence of VPlan optimizations, for example when
operations on interleave groups are narrowed.

Note that there are a few test changes for tests that were still
checking the legacy cost-model output when it was computed in
selectInterleaveCount.

PR: https://github.com/llvm/llvm-project/pull/149702
2025-08-05 09:42:55 +01:00
Florian Hahn
215e6beae0
[LV] Use MapVector for ScalarCostsTy for deterministic iter order (NFC)
We iterate over the scalar costs of instruction when printing costs, and
currently the iteration order is not deterministic. Currently no tests
check the output with multiple instructions in the map, but those will
come soon.
2025-08-04 19:31:07 +01:00
Florian Hahn
559d1dff89
[VPlan] Materialize BackedgeTakenCount using VPInstructions.
Explicitly compute the backedge-taken count using VPInstruction. This is
needed to model the full skeleton in VPlan.

NFC modulo some instruction re-ordering.
2025-08-03 12:21:28 +01:00