460 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
e4c0b3e111
[VPlan] Simplify x && false -> false, x | 0 -> x (#156345)
The OR x, 0 -> x simplification has been introduced to avoid
regressions.
2025-09-04 10:29:59 +01:00
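A minimal sketch of the two folds named in e4c0b3e111, written against a toy tuple-based expression tree rather than VPlan recipes. The node shape, the `simplify` helper, and the `"false"` / `"0"` constant names are assumptions made for illustration only; they are not part of LLVM.

```python
# Toy expressions: a leaf is a string, an operation is (opcode, lhs, rhs).
# "false" and "0" are hypothetical names for the constant operands.
FALSE = "false"
ZERO = "0"

def simplify(expr):
    """Fold x && false -> false and x | 0 -> x, bottom-up."""
    if isinstance(expr, str):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)
    if op == "logical_and" and rhs == FALSE:
        return FALSE
    if op == "or" and rhs == ZERO:
        return lhs
    return (op, lhs, rhs)
```

For example, `simplify(("or", "x", "0"))` returns `"x"`, while an expression that matches neither pattern is returned unchanged.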
Luke Lau
c33ccfa52b
[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383)
This PR reassociates logical ands in order to enable more
simplifications.

The driving motivation for this is that with tail folding all blocks
inside the loop body will end up using the header mask. However this can
end up nestled deep within a chain of logical ands from other edges.

Typically the header mask will be a leaf nested in the LHS, e.g.
(headermask & y) & z. So pulling it out allows it to be simplified
further, e.g. allows it to be optimised away to VP intrinsics with EVL
tail folding.
2025-09-03 01:09:19 +00:00
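The reassociation described in c33ccfa52b can be sketched on a toy expression tree. The tuple encoding and the `reassociate` helper are hypothetical and only illustrate how a header mask nested in the LHS ends up as the outermost left operand; the real transform operates on VPlan recipes.

```python
# Toy expressions: a leaf is a string, an operation is (opcode, lhs, rhs).
def reassociate(expr):
    """Rewrite (x & y) & z into x & (y & z), recursively."""
    if isinstance(expr, str):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = reassociate(lhs), reassociate(rhs)
    if op == "and" and isinstance(lhs, tuple) and lhs[0] == "and":
        _, x, y = lhs
        # Pull the innermost left leaf outwards and retry on the new shape.
        return reassociate(("and", x, ("and", y, rhs)))
    return (op, lhs, rhs)
```

With this sketch, `(headermask & y) & z` becomes `headermask & (y & z)`, so a later pattern that looks for the header mask as a direct operand can match it.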
Ramkumar Ramachandra
d8fd511480
[VPlan] Introduce CSE pass (#151872)
Introduce a simple common-subexpression-elimination pass at the
VPlan-level, running late during the execution of the VPlan. The
long-term vision is to get rid of the legacy non-VPlan-based cse routine
in LV, but this patch doesn't yet fully subsume it.
2025-09-02 12:23:29 +01:00
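To illustrate the general idea behind a common-subexpression-elimination pass such as the one introduced in d8fd511480, here is a toy version over a straight-line list of operations. It assumes every operation is free of side effects and ignores dominance and control flow; the `cse` function and the `(name, opcode, operands)` encoding are hypothetical and do not reflect the VPlan implementation.

```python
def cse(recipes):
    """recipes: list of (name, opcode, operands) in definition order.

    Returns the surviving recipes and a map from each eliminated name
    to the earlier, equivalent definition that replaces it.
    """
    seen = {}      # (opcode, operands) -> name of the first definition
    replaced = {}  # eliminated name -> surviving name
    kept = []
    for name, opcode, operands in recipes:
        # Rewrite operands first so that duplicates of duplicates are found.
        operands = tuple(replaced.get(op, op) for op in operands)
        key = (opcode, operands)
        if key in seen:
            replaced[name] = seen[key]
        else:
            seen[key] = name
            kept.append((name, opcode, operands))
    return kept, replaced
```

Given two identical `add` operations followed by two `mul` operations that each use one of them, the sketch keeps one `add` and one `mul`.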
Sam Tebbs
37127f74f4
[LV] Bundle sub reductions into VPExpressionRecipe (#147255)
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. -> https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-09-01 17:25:01 +01:00
Mel Chen
13357e8a12
[LV][EVL] Support interleaved access with tail folding by EVL (#152070)
The InterleavedAccess pass already supports transforming
vector-predicated (vp) load/store intrinsics. With this patch, we start
enabling interleaved access under tail folding by EVL.

This patch introduces a new base class, VPInterleaveBase, and a concrete
class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and
the new VPInterleaveEVLRecipe inherit from and implement
VPInterleaveBase.

Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL
operand to emit vp.load/vp.store intrinsics.

Currently, tail folding by EVL is only supported for scalable
vectorization. Therefore, VPInterleaveEVLRecipe will only emit
interleave/deinterleave intrinsics. Reverse accesses are not yet
implemented, as masked reverse interleaved access under tail folding is
not yet supported.

Fixed #123201
2025-09-01 21:20:06 +08:00
Luke Lau
eb7f6a5f8a
[VPlan] Simplify (x && y) || (x && z) -> x && (y || z) (#156308)
Split off from #155383, since it turns out this has a diff on its own.
2025-09-01 21:12:23 +08:00
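The fold in eb7f6a5f8a is the distributive law for booleans. A brute-force truth-table check, assuming plain two-valued booleans (poison semantics are not modelled here), can be sketched as follows; the function names are illustrative.

```python
from itertools import product

def before(x, y, z):
    # (x && y) || (x && z)
    return (x and y) or (x and z)

def after(x, y, z):
    # x && (y || z)
    return x and (y or z)

# Collect every input for which the two forms disagree.
mismatches = [v for v in product([False, True], repeat=3)
              if before(*v) != after(*v)]
```

`mismatches` is empty, so the two forms agree on all eight inputs while the rewritten form uses one fewer operation.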
Kerry McLaughlin
f0e9bba024
[LoopVectorize] Generate wide active lane masks (#147535)
This patch adds a new flag (-enable-wide-lane-mask) which allows
LoopVectorize to generate wider-than-VF active lane masks when it
is safe to do so (i.e. the mask is used for data and control flow).

The transform in extractFromWideActiveLaneMask creates vector
extracts from the first active lane mask in the header & loop body,
modifying the active lane mask phi operands to use the extracts.

An additional operand is passed to the ActiveLaneMask instruction,
the value of which is used as a multiplier of VF when generating the
mask. By default this is 1, and is updated to UF by
extractFromWideActiveLaneMask.

The motivation for this change is to improve interleaved loops when
SVE2.1 is available, where we can make use of the whilelo instruction
which returns a predicate pair.

This is based on a PR that was created by @momchil-velikov (#81140)
and contains tests which were added there.
2025-09-01 13:53:30 +01:00
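As a rough model of the idea in f0e9bba024, the sketch below treats an active lane mask as "lane i is active if base + i < trip count" and shows that one mask of VF x UF lanes can be split into UF parts of VF lanes each. The helper names are hypothetical, and integer overflow is ignored.

```python
def active_lane_mask(base, trip_count, lanes):
    """Lane i is active while base + i is still below the trip count."""
    return [base + i < trip_count for i in range(lanes)]

def wide_mask_parts(base, trip_count, vf, uf):
    """Build one mask of vf * uf lanes and extract uf sub-masks of vf lanes."""
    wide = active_lane_mask(base, trip_count, vf * uf)
    return [wide[part * vf:(part + 1) * vf] for part in range(uf)]
```

Under these assumptions, each extracted part equals the mask that would have been computed separately for that unrolled part, for example `wide_mask_parts(0, 6, 4, 2)` matches `active_lane_mask(0, 6, 4)` and `active_lane_mask(4, 6, 4)`.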
Ramkumar Ramachandra
4cf770275f
[VPlan] Introduce replaceSymbolicStrides (NFC) (#155842)
Introduce VPlanTransforms::replaceSymbolicStrides factoring some code
from LoopVectorize.
2025-09-01 09:03:46 +00:00
Ramkumar Ramachandra
0a193cb687
[VPlan] Use IsaPred to improve code (NFC) (#156037)
2025-09-01 09:16:35 +01:00
Florian Hahn
465b17c450
[VPlan] Support scalable VFs in narrowInterleaveGroups. (#154842)
Update narrowInterleaveGroups to support scalable VFs. After the
transform, the vector loop will process a single iteration of the
original vector loop for fixed-width vectors and vscale iterations for
scalable vectors.
2025-08-31 20:45:07 +01:00
Florian Hahn
13aff91e7c
Revert "[VPlan] Support plans with vector pointers in narrowInterleaveGroups."
This reverts commit 806a797c52d8018639f5cdcce5eb375b17c87f5e as it
introduced a miscompile.
2025-08-31 19:37:24 +01:00
Florian Hahn
806a797c52
[VPlan] Support plans with vector pointers in narrowInterleaveGroups.
After narrowing interleave groups and related memory operations, all
vector pointers should be removed. Remove the check.

In preparation for https://github.com/llvm/llvm-project/pull/149706.
2025-08-29 20:55:40 +01:00
Florian Hahn
5ebd59806b
[VPlan] Fold BinaryAnd x, 0 -> 0 in simplifyRecipe.
This also fixes a cost-model divergence in the newly added
tests in constant-fold.ll
2025-08-27 22:35:08 +01:00
Florian Hahn
5faed1ad84
[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaces the ILV version for the non-epilogue case.

The VPlan-based version constructs a SCEV expression to compute the
minimum iterations and uses it to determine whether the check is known
to be true or false. Otherwise it creates a VPExpandSCEV recipe and
emits a compare-and-branch.

When using epilogue vectorization, we still need to create the minimum
trip-count check during the legacy skeleton creation. The patch moves
the definitions out of ILV.

PR: https://github.com/llvm/llvm-project/pull/153643
2025-08-26 15:52:31 +01:00
Ramkumar Ramachandra
1e0e0e0a56
[VPlan] Improve style around container-inserts (NFC) (#155174)
2025-08-26 14:12:59 +01:00
Luke Lau
f3520c538d
[VPlan] Replace EVL branch condition with (branch-on-count AVLNext, 0) (#152167)
This changes the branch condition to use the AVL's backedge value
instead of the EVL-based IV.

This allows us to emit bnez on RISC-V and removes a use of the trip
count, which should reduce register pressure.

To match phis with VPlanPatternMatch I've had to relax the assert that
the number of operands must exactly match the pattern for the Phi
opcode, and I've copied over m_ZExtOrSelf from the LLVM IR
PatternMatch.h.

Fixes #151459
2025-08-26 11:19:19 +00:00
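The exit condition from f3520c538d can be pictured with a scalar simulation of an EVL tail-folded loop. This is a sketch that assumes EVL is `min(AVL, VF)`; the real EVL computation is target-dependent, and `evl_loop` is a hypothetical helper.

```python
def evl_loop(trip_count, vf):
    """Return the EVL used on each iteration of a simulated EVL loop."""
    avl = trip_count          # remaining ("application") vector length
    evls = []
    while True:
        evl = min(avl, vf)    # assumed stand-in for the target's EVL
        evls.append(evl)
        avl_next = avl - evl  # backedge value of the AVL
        if avl_next == 0:     # models (branch-on-count AVLNext, 0)
            break
        avl = avl_next
    return evls
```

Here the loop exits by comparing the next AVL against zero, with no reference to the trip count inside the loop body; `evl_loop(10, 4)` processes 4, 4, and then 2 elements.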
Florian Hahn
c950a72974
[VPlan] Support scalar VF for ExtractLane and FirstActiveLane.
Extend ExtractLane and FirstActiveLane to support scalar VFs. This
allows correct handling when interleaving with VF = 1.

Alive2 proofs:
 - Fixed codegen with this patch: https://alive2.llvm.org/ce/z/8Y5_Vc
   (verifies as correct)
 - Original codegen: https://alive2.llvm.org/ce/z/twdg3X (doesn't
   verify)

Fixes https://github.com/llvm/llvm-project/issues/154967.
2025-08-25 21:45:21 +01:00
Ramkumar Ramachandra
66be00d635
[VPlan] Introduce m_Cmp; match more compares (#154771)
Extend [Specific]Cmp_match to handle floating-point compares, and
introduce m_Cmp that matches both integer and floating-point compares.
Use it in simplifyRecipe to match and simplify the general case of
compares. The change has necessitated a bugfix in
VPReplicateRecipe::execute.
2025-08-24 13:27:06 +01:00
Florian Hahn
954097dd61
[VPlan] Use SCEV to check subtract in getOptimizableIVOf.
Simplify checks for IV subtractions in getOptimizableIVOf by using SCEV.
This slightly generalizes the patterns we can handle.
2025-08-23 22:00:01 +01:00
Florian Hahn
9f87cd68a4
[VPlan] Add m_ExtractLastElement matcher. (NFC)
2025-08-23 21:21:03 +01:00
Luke Lau
c97c6869b6
[VPlan] Allow folding not (cmp eq) -> icmp ne with other select users (#154497)
Currently we only allow folding not (cmp eq) -> icmp ne if the not is
the only user of the compare. However, a common scenario is that some
select might also use the compare. We can still fold the not if we also
swizzle the arms of the selects.

This helps avoid regressions in #150368
2025-08-22 07:59:14 +08:00
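The fold in c97c6869b6 relies on a select giving the same result when its condition is inverted and its arms are swapped. A small check of that equivalence, using plain Python values in place of VPlan recipes (the function names are illustrative):

```python
def original(a, b, t, f):
    cmp_eq = a == b
    negated = not cmp_eq             # not (cmp eq a, b)
    selected = t if cmp_eq else f    # select cmp_eq, t, f
    return negated, selected

def folded(a, b, t, f):
    cmp_ne = a != b                  # the not is folded into the compare
    selected = f if cmp_ne else t    # select arms swapped
    return cmp_ne, selected
```

Both functions return the same pair for any inputs, which is why the compare may keep select users as long as those selects are rewritten too.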
Florian Hahn
300d2c6d20
[VPlan] Move SCEV expansion to VPlan transform. (NFCI).
Move the logic to expand SCEVs directly to a late VPlan transform that
expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an
abstract recipe without execute, which clarifies how the recipe is
handled, i.e. it is not executed like regular recipes.

It also helps to simplify construction, as now scalar evolution isn't
required to be passed to the recipe.
2025-08-21 22:03:26 +01:00
Florian Hahn
e41aaf5a64
[VPlan] Use VPIRMetadata for VPInterleaveRecipe. (#153084)
Use VPIRMetadata for VPInterleaveRecipe to preserve noalias metadata
added by versioning.

This still uses InterleaveGroup's logic to preserve existing metadata
from IR. This can be migrated separately.

Fixes https://github.com/llvm/llvm-project/issues/153006.

PR: https://github.com/llvm/llvm-project/pull/153084
2025-08-21 18:58:10 +01:00
Florian Hahn
21cca5ea9d
[VPlan] Rely on VPlan opts to simplify multiply by 1 (NFCI).
2025-08-21 18:43:47 +01:00
Luke Lau
5ef28e0a88
[VPlan] Add m_c_Add to VPlanPatternMatch. NFC (#154730)
Same thing as #154705, and useful for simplifying the matching in
#152167
2025-08-21 11:26:08 +00:00
Luke Lau
955c475ae6
[VPlan] Add m_Sub to VPlanPatternMatch. NFC (#154705)
To mirror PatternMatch.h, and we'll also be able to use it in #152167
2025-08-21 09:33:46 +00:00
Shih-Po Hung
cf0e86118d
[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539)
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI recipes in the loop header. With more IV-step optimizations,
the canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction, which the pass did not handle, causing
regressions (missed simplifications).

This patch replaces canonical VPWidenIntOrFpInduction with a StepVector
in the vector preheader since the vector loop region only executes once.
2025-08-21 07:34:54 +08:00
Luke Lau
cabf6433c6
[VPlan] EVL transform VPVectorEndPointerRecipe alongside load/store recipes. NFC (#152542)
This is the first step in untangling the variable step transform and
header mask optimizations as described in #152541.

Currently we replace all VF users globally in the plan, including
VPVectorEndPointerRecipe. However this leaves reversed loads and stores
in an incorrect state until they are adjusted in optimizeMaskToEVL.

This moves the VPVectorEndPointerRecipe transform so that it is updated
in lockstep with the actual load/store recipe.

One thought that crossed my mind was that VPInterleaveRecipe could also
use VPVectorEndPointerRecipe, in which case we would have also been
computing the wrong address because we don't transform it to an EVL
recipe which accounts for the reversed address.
2025-08-19 08:16:48 +00:00
Luke Lau
144736b07e
[VPlan] Don't fold live ins with both scalar and vector operands (#154067)
If we end up with an extract_element VPInstruction where both operands
are live-ins, we will try to fold the live-ins even though the first
operand is a vector whilst the live-in is scalar.

This fixes it by just returning the vector live-in instead of calling
the folder, and removes the handling for insertelement where we aren't
able to do the fold. From some quick testing we previously never hit
this fold anyway, and were probably just missing test coverage.

Fixes #154045
2025-08-19 04:10:53 +00:00
Florian Hahn
7e9989390d
[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487)
Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.

Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path altogether.

PR: https://github.com/llvm/llvm-project/pull/151487
2025-08-18 20:49:42 +01:00
Ramkumar Ramachandra
97f554249c
[VPlan] Preserve nusw in createInBoundsPtrAdd (#151549)
Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as
well as inbounds at the callsite.
2025-08-18 17:48:42 +01:00
Mel Chen
145e8aadca
[LV][EVL] Add dead EVL mask into ToErase for consistency. nfc (#153761)
2025-08-18 14:11:50 +08:00
Ramkumar Ramachandra
f34326dac8
[VPlan] Introduce vputils::onlyScalarValuesUsed (NFC) (#153577)
2025-08-15 15:55:59 +00:00
Ramkumar Ramachandra
86482dffba
[VPlan] Use m_Broadcast to improve a match (NFC) (#153607)
2025-08-14 18:10:58 +01:00
Ramkumar Ramachandra
d107c29fef
[VPlan] Strip unused CanonicalIVTy arg (NFC) (#153418)
2025-08-13 15:53:56 +01:00
Luke Lau
9217b6ab2e
[VPlan] Enforce that there is only ever one header mask. NFC (#152489)
We almost always have only one header mask, except with the data tail
folding style, i.e. with VPInstruction::ActiveLaneMask.

All we need to do is to make sure to erase the old icmp-based header
mask when replacing it.
2025-08-13 02:39:04 +00:00
Florian Hahn
8cdab07aaa
Reapply "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.

Recommit with a fix for the verifier error caused for EVL recipes.

Extra test coverage added in 6f939da60e.
2025-08-12 22:09:30 +01:00
Florian Hahn
424258947e
[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector
skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused
computations.

PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-12 14:13:13 +01:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
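The approach described in 0bfa1718af, forming an add reduction over a negated input, can be checked with a scalar model. The chunked loop below stands in for a vector loop with an in-loop reduction; both function names and the chunking scheme are assumptions for illustration.

```python
def sub_reduction_scalar(start, values):
    """Reference: acc -= v for every element."""
    acc = start
    for v in values:
        acc -= v
    return acc

def sub_reduction_as_add(start, values, vf):
    """Process vf elements at a time; add the reduced, negated chunk."""
    acc = start
    for i in range(0, len(values), vf):
        chunk = values[i:i + vf]
        acc += sum(-v for v in chunk)  # add reduction of the negated input
    return acc
```

For integer inputs the two functions agree, for example both give 85 for a start value of 100 and the values 1 through 5.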
Florian Hahn
1c7c8e3ad3
Revert "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2.

This seems to be breaking some RISCV bots, reverting for now
https://lab.llvm.org/buildbot/#/builders/210/builds/1266
2025-08-11 22:05:30 +01:00
Florian Hahn
1f17bb133f
[VPlan] Remove trivial dead VPPhi cycles.
Update removeDeadRecipes to remove trivial dead VPPhi cycles.

Should effectively be NFC end-to-end.
2025-08-11 21:29:49 +01:00
Luke Lau
aea82a780a
[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)
A lot of the time getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.

However VPTypeAnalysis has a constructor that takes the VPlan directly
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.

This lets us remove a constructor on VPTypeAnalysis.

Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
2025-08-11 18:12:05 +08:00
Ramkumar Ramachandra
95c525b1db
[VPlan] Preserve nusw on VectorEndPointer (#151558)
In createInterleaveGroups, get the nusw in addition to inbounds from the
existing GEP, and set them on the VPVectorEndPointerRecipe.
2025-08-11 10:38:25 +01:00
Mel Chen
6db3776f9b
[LV][EVL] Simplify EVL recipe transformation by using a single EVL mask. nfc (#152479)
The EVL mask is always defined as `icmp ult (step-vector, EVL)`, so we
only need to generate it once per plan in the header. Then, we replace
all uses of the header mask with the EVL mask, and recursively optimize
the users of EVL mask into EVL recipes. This way, the transformation to
EVL recipes can be done with just a single loop.
2025-08-11 11:09:01 +08:00
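The mask definition quoted in 6db3776f9b, `icmp ult (step-vector, EVL)`, can be modelled with lists of booleans. The comparison against a header mask below assumes EVL equals `min(remaining iterations, VF)`, which is a simplification; the helper names are hypothetical.

```python
def evl_mask(evl, vf):
    """icmp ult (step-vector, EVL): lane i is active if i < EVL."""
    step_vector = list(range(vf))
    return [lane < evl for lane in step_vector]

def header_mask(iv, trip_count, vf):
    """Lane i is active if iteration iv + i is inside the trip count."""
    return [iv + lane < trip_count for lane in range(vf)]
```

Under that assumption the two masks select the same lanes on every iteration, which is the property that lets users of the header mask be rewritten in terms of EVL.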
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
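As a worked example of the arithmetic behind a vector trip count, the sketch below rounds the scalar trip count down to a multiple of VF x UF, or up when the tail is folded into the vector loop. It is an assumption-laden model: adjustments for a required scalar epilogue and for overflow are left out, and `vector_trip_count` is a hypothetical helper.

```python
def vector_trip_count(trip_count, vf, uf, fold_tail=False):
    """Number of scalar iterations covered by the vector loop."""
    step = vf * uf                # the VF x UF value reused by the computation
    if fold_tail:
        trip_count += step - 1    # round up so the tail is covered
    return trip_count - trip_count % step
```

With a trip count of 10 and VF = 4, UF = 1, the vector loop covers 8 iterations without tail folding and 12 (masked) iterations with it.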
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
Ramkumar Ramachandra
092388171f
[VPlan] Introduce m_[Specific]ICmp matcher (#151540)
2025-08-06 20:35:35 +01:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values.
2025-08-06 20:03:31 +01:00
Florian Hahn
e80e7e717e
[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847)
The initial VPlan closely reflects the original scalar loop, so using
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.

PR: https://github.com/llvm/llvm-project/pull/150847
2025-08-06 14:43:03 +01:00
Florian Hahn
777c320e6c
[VPlan] Address comments missed in #142309.
Address additional comments from
https://github.com/llvm/llvm-project/pull/142309.
2025-08-06 11:52:08 +01:00