481 Commits

Author SHA1 Message Date
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Ramkumar Ramachandra
280abaf9da
[VPlan] Handle scalar-VF in transforms (NFC) (#161365) 2025-09-30 19:35:12 +01:00
Sam Tebbs
88658dbbc5
[LV] Add ExtNegatedMulAccReduction expression type (#160154)
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. https://github.com/llvm/llvm-project/pull/156976
2. -> https://github.com/llvm/llvm-project/pull/160154
3. https://github.com/llvm/llvm-project/pull/147302
2025-09-30 10:10:37 +01:00
Florian Hahn
71be13a6f0
[VPlan] Rewrite VPExpandSCEVExprs in replaceSymbolicStrides.
Extend replaceSymbolicStrides to also replace SCEVUnknowns in
VPExpandSCEVExprs using the information from StridesMaps.

This results in simpler SCEV expansions in some cases.
2025-09-28 21:55:31 +01:00
Florian Hahn
70a26da639
[VPlan] Set correct flags when creating and cloning VPWidenCastRecipe.
Make sure that we set the correct wrap flags when creating new
VPWidenCastRecipes for truncs and preserve the flags from the recipe
directly when cloning, to make sure they are not dropped.

Fixes https://github.com/llvm/llvm-project/issues/160396
2025-09-25 09:00:47 +01:00
Ramkumar Ramachandra
019913e4fa
[VPlan] Add WidenGEP::getSourceElementType (NFC) (#159029) 2025-09-22 10:02:08 +01:00
Ramkumar Ramachandra
b716d35388
[VPlanPatternMatch] Introduce m_ConstantInt (#159558) 2025-09-21 13:27:46 +01:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Ramkumar Ramachandra
0384f6c9db
[VPlanPatternMatch] Introduce match functor (NFC) (#159521)
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to
introduce the VPlanPatternMatch version of the match functor to shorten
some idioms.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-18 10:36:12 +01:00
Ramkumar Ramachandra
46fcece2a8
[VPlan] Extend CSE to eliminate GEPs (#156699)
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.

While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-16 10:14:32 +00:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419) 2025-09-15 10:34:06 +00:00
Florian Hahn
ef7e03a2d1
[VPlan] Limit ExtractLastElem fold to recipes guaranteed single-scalar.
vputils::isSingleScalar(A) may return true to recipes that produce only
a single scalar value, but they could still end up as vector
instruction, because the recipe could not be converted to a
single-scalar VPInstruction/VPReplicateRecipe.

For now, only apply the fold for recipes guaranteed to produce a single
value, i.e. single-scalar VPInstructions and VPReplicateRecipes.

Fixes https://github.com/llvm/llvm-project/issues/158319.
2025-09-13 18:15:38 +01:00
Florian Hahn
b8eaceb39b
[VPlan] Explicitly replicate VPInstructions by VF. (#155102)
Extend replicateByVF added in #142433 (aa240293190) to also explicitly
unroll replicating VPInstructions.

Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.

PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-12 17:06:26 +01:00
Florian Hahn
1efa997317
[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars.
Move handling of stores to single-scalar/uniform address from
replicateByVF to narrowToSingleScalar.
2025-09-10 21:58:29 +01:00
Florian Hahn
055e4ff35a
[VPlan] Don't narrow op multiple times in narrowInterleaveGroups.
Track which ops already have been narrowed, to avoid narrowing the same
operation multiple times. Repeated narrowing will lead to incorrect
results, because we could first narrow from an interleave group -> wide
load, and then narrow the wide load > single-scalar load.

Fixes thttps://github.com/llvm/llvm-project/issues/156190.
2025-09-10 19:22:42 +01:00
Florian Hahn
c3e76b2770
[VPlan] Keep common flags during CSE. (#157664)
During CSE, we don't have to drop all poison-generating flags on
mis-match, we can keep the ones common on both recipes.

PR: https://github.com/llvm/llvm-project/pull/157664
2025-09-10 10:20:48 +00:00
Stephen Tozer
d4f7995488
[VPlan] Use Unknown instead of empty location in VPlanTransforms (#157702)
The default values for DebugLocs in LoopVectorizer/VPlan were recently
updated from empty DebugLocs to DebugLoc::getUnknown, as part of the
DebugLoc Coverage Tracking work. However, there are some cases where we
also pass an explicit empty DebugLoc, in many cases as a filler
argument. This patch updates all of these to `getUnknown` for now, until
either valid locations or a suitable categorization can be assigned to
each instead.

This change is NFC outside of DebugLoc coverage tracking builds.
2025-09-10 10:33:58 +01:00
Mel Chen
4d9a7fa9ba
[VPlan] Remove dead recipes before simplifying blends (#157622)
In simplifyBlends, when normalizing a blend recipe, the first mask that
is used only by the blend and is not all-false is chosen, and its
corresponding incoming value becomes the initial value, with the others
blended into it. At the same time, the mask that is chosen can be
eliminated. However, a multi-user mask might be used by a dead recipe,
which prevents this optimization. This patch moves removeDeadRecipes
before simplifyBlends to eliminate dead recipes, allowing simplifyBlends
to remove more dead masks.
2025-09-10 08:03:18 +00:00
Florian Hahn
c4b17bf9ed
[VPlan] Slightly extend ExtractLastElement fold to single-scalars.
Update ExtractLastElement fold to support single scalar recipes, if all
their users only use scalars.
2025-09-09 22:08:08 +01:00
Florian Hahn
132bacde22
[VPlan] Also allow extracts as users when converting to single scalars.
Extracts technically do not use scalars, but vectors, but if the operand
is a single scalar we do not need a vector and they should not block
forming single scalars.
2025-09-08 22:11:39 +01:00
Luke Lau
3f9e0736ac
[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304)
Following up from #150368, this moves folding common edge masks into
simplifyBlends.

One test in uniform-blend.ll ended up regressing but after looking at it
closely, it came from a weird (x && !x) edge mask. So I've just included
a simplifcation in this PR to fold that to false.
2025-09-05 01:29:22 +00:00
Ramkumar Ramachandra
e4c0b3e111
[VPlan] Simplify x && false -> false, x | 0 -> x (#156345)
The OR x, 0 -> x simplification has been introduced to avoid
regressions.
2025-09-04 10:29:59 +01:00
Luke Lau
c33ccfa52b
[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383)
This PR reassociates logical ands in order to enable more
simplifications.

The driving motivation for this is that with tail folding all blocks
inside the loop body will end up using the header mask. However this can
end up nestled deep within a chain of logical ands from other edges.

Typically the header mask will be a leaf nested in the LHS, e.g.
(headermask & y) & z. So pulling it out allows it to be simplified
further, e.g. allows it to be optimised away to VP intrinsics with EVL
tail folding.
2025-09-03 01:09:19 +00:00
Ramkumar Ramachandra
d8fd511480
[VPlan] Introduce CSE pass (#151872)
Introduce a simple common-subexpression-elimination pass at the
VPlan-level, running late during the execution of the VPlan. The
long-term vision is to get rid of the legacy non-VPlan-based cse routine
in LV, but this patch doesn't yet fully subsume it.
2025-09-02 12:23:29 +01:00
Sam Tebbs
37127f74f4
[LV] Bundle sub reductions into VPExpressionRecipe (#147255)
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. -> https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-09-01 17:25:01 +01:00
Mel Chen
13357e8a12
[LV][EVL] Support interleaved access with tail folding by EVL (#152070)
The InterleavedAccess pass already supports transforming
vector-predicated (vp) load/store intrinsics. With this patch, we start
enabling interleaved access under tail folding by EVL.

This patch introduces a new base class, VPInterleaveBase, and a concrete
class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and
the new VPInterleaveEVLRecipe inherit from and implement
VPInterleaveBase.

Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL
operand to emit vp.load/vp.store intrinsics.

Currently, tail folding by EVL is only supported for scalable
vectorization. Therefore, VPInterleaveEVLRecipe will only emit
interleave/deinterleave intrinsics. Reverse accesses are not yet
implemented, as masked reverse interleaved access under tail folding is
not yet supported.

Fixed #123201
2025-09-01 21:20:06 +08:00
Luke Lau
eb7f6a5f8a
[VPlan] Simplify (x && y) || (x && z) -> x && (y || z) (#156308)
Split off from #155383, since it turns out this has a diff on its own.
2025-09-01 21:12:23 +08:00
Kerry McLaughlin
f0e9bba024
[LoopVectorize] Generate wide active lane masks (#147535)
This patch adds a new flag (-enable-wide-lane-mask) which allows
LoopVectorize to generate wider-than-VF active lane masks when it
is safe to do so (i.e. the mask is used for data and control flow).

The transform in extractFromWideActiveLaneMask creates vector
extracts from the first active lane mask in the header & loop body,
modifying the active lane mask phi operands to use the extracts.

An additional operand is passed to the ActiveLaneMask instruction,
the value of which is used as a multiplier of VF when generating the
mask.
By default this is 1, and is updated to UF by
extractFromWideActiveLaneMask.

The motivation for this change is to improve interleaved loops when
SVE2.1 is available, where we can make use of the whilelo instruction
which returns a predicate pair.

This is based on a PR that was created by @momchil-velikov (#81140)
and contains tests which were added there.
2025-09-01 13:53:30 +01:00
Ramkumar Ramachandra
4cf770275f
[VPlan] Introduce replaceSymbolicStrides (NFC) (#155842)
Introduce VPlanTransforms::replaceSymbolicStrides factoring some code
from LoopVectorize.
2025-09-01 09:03:46 +00:00
Ramkumar Ramachandra
0a193cb687
[VPlan] Use IsaPred to improve code (NFC) (#156037) 2025-09-01 09:16:35 +01:00
Florian Hahn
465b17c450
[VPlan] Support scalable VFs in narrowInterleaveGroups. (#154842)
Update narrowInterleaveGroups to support scalable VFs. After the
transform, the vector loop will process a single iteration of the
original vector loop for fixed-width vectors and vscale iterations for
scalable vectors.
2025-08-31 20:45:07 +01:00
Florian Hahn
13aff91e7c
Revert "[VPlan] Support plans with vector pointers in narrowInterleaveGroups."
This reverts commit 806a797c52d8018639f5cdcce5eb375b17c87f5e as it
introduced a miscompile.
2025-08-31 19:37:24 +01:00
Florian Hahn
806a797c52
[VPlan] Support plans with vector pointers in narrowInterleaveGroups.
After narrowing interleave groups and related memory operations, all
vector pointers should be removed. Remove the check.

In preparation for https://github.com/llvm/llvm-project/pull/149706.
2025-08-29 20:55:40 +01:00
Florian Hahn
5ebd59806b
[VPlan] Fold BinaryAnd x, 0 -> 0 in simplifyRecipe.
This also fixes a cost-model divergence in the newly added
tests in constant-fold.ll
2025-08-27 22:35:08 +01:00
Florian Hahn
5faed1ad84
[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaced the ILV version for the non-epilogue case.

The VPlan-based version constructs a SCEV expression to compute the
minimum iterations, use that to check if the check is known true or
false. Otherwise it creates a VPExpandSCEV recipe and emits a
compare-and-branch.

When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.

PR: https://github.com/llvm/llvm-project/pull/153643
2025-08-26 15:52:31 +01:00
Ramkumar Ramachandra
1e0e0e0a56
[VPlan] Improve style around container-inserts (NFC) (#155174) 2025-08-26 14:12:59 +01:00
Luke Lau
f3520c538d
[VPlan] Replace EVL branch condition with (branch-on-count AVLNext, 0) (#152167)
This changes the branch condition to use the AVL's backedge value
instead of the EVL-based IV.

This allows us to emit bnez on RISC-V and removes a use of the trip
count, which should reduce register pressure.

To match phis with VPlanPatternMatch I've had to relax the assert that
the number of operands must exactly match the pattern for the Phi
opcode, and I've copied over m_ZExtOrSelf from the LLVM IR
PatternMatch.h.

Fixes #151459
2025-08-26 11:19:19 +00:00
Florian Hahn
c950a72974
[VPlan] Support scalar VF for ExtractLane and FirstActiveLane.
Extend ExtractLane and FirstActiveLane to support scalable VFs. This
allows correct handling when interleaving with VF = 1.

Alive2 proofs:
 - Fixed codegen with this patch: https://alive2.llvm.org/ce/z/8Y5_Vc
   (verifies as correct)
 - Original codegen: https://alive2.llvm.org/ce/z/twdg3X (doesn't
   verify)

Fixes https://github.com/llvm/llvm-project/issues/154967.
2025-08-25 21:45:21 +01:00
Ramkumar Ramachandra
66be00d635
[VPlan] Introduce m_Cmp; match more compares (#154771)
Extend [Specific]Cmp_match to handle floating-point compares, and
introduce m_Cmp that matches both integer and floating-point compares.
Use it in simplifyRecipe to match and simplify the general case of
compares. The change has necessitated a bugfix in
VPReplicateRecipe::execute.
2025-08-24 13:27:06 +01:00
Florian Hahn
954097dd61
[VPlan] Use SCEV to check subtract in getOptimizableIVOf.
Simplify checks for IV subtractions in getOptimizableIVOf by using SCEV.
This slightly generalizes the patterns we can handle.
2025-08-23 22:00:01 +01:00
Florian Hahn
9f87cd68a4
[VPlan] Add m_ExtractLastElement matcher. (NFC) 2025-08-23 21:21:03 +01:00
Luke Lau
c97c6869b6
[VPlan] Allow folding not (cmp eq) -> icmp ne with other select users (#154497)
Currently we only allow folding not (cmp eq) -> icmp ne if the not is
the only user of the compare.
However a common scenario is that some select might also use the
compare. We can still fold the not if we also swizzle the arms of the
selects.

This helps avoid regressions in #150368
2025-08-22 07:59:14 +08:00
Florian Hahn
300d2c6d20
[VPlan] Move SCEV expansion to VPlan transform. (NFCI).
Move the logic to expand SCEVs directly to a late VPlan transform that
expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an
abstract recipe without execute, which clarifies how the recipe is
handled, i.e. it is not executed like regular recipes.

It also helps to simplify construction, as now scalar evolution isn't
required to be passed to the recipe.
2025-08-21 22:03:26 +01:00
Florian Hahn
e41aaf5a64
[VPlan] Use VPIRMetadata for VPInterleaveRecipe. (#153084)
Use VPIRMetadata for VPInterleaveRecipe to preserve noalias metadata
added by versioning.

This still uses InterleaveGroup's logic to preserve existing metadata
from IR. This can be migrated separately.

Fixes https://github.com/llvm/llvm-project/issues/153006.

PR: https://github.com/llvm/llvm-project/pull/153084
2025-08-21 18:58:10 +01:00
Florian Hahn
21cca5ea9d
[VPlan] Rely on VPlan opts to simplify multiply by 1 (NFCI). 2025-08-21 18:43:47 +01:00
Luke Lau
5ef28e0a88
[VPlan] Add m_c_Add to VPlanPatternMatch. NFC (#154730)
Same thing as #154705, and useful for simplifying the matching in
#152167
2025-08-21 11:26:08 +00:00
Luke Lau
955c475ae6
[VPlan] Add m_Sub to VPlanPatternMatch. NFC (#154705)
To mirror PatternMatch.h, and we'll also be able to use it in #152167
2025-08-21 09:33:46 +00:00
Shih-Po Hung
cf0e86118d
[VPlan] Handle canonical VPWidenIntOrFpInduction in branch-condition simplification (#153539)
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI
recipes in the loop header. With more IV-step optimizations,
the canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction,
which the pass did not handle, causing regressions (missed
simplifications).

This patch replaces canonical VPWidenIntOrFpInduction with a StepVector
in the vector preheader
since the vector loop region only executes once.
2025-08-21 07:34:54 +08:00
Luke Lau
cabf6433c6
[VPlan] EVL transform VPVectorEndPointerRecipe alongisde load/store recipes. NFC (#152542)
This is the first step in untangling the variable step transform and
header mask optimizations as described in #152541.

Currently we replace all VF users globally in the plan, including
VPVectorEndPointerRecipe. However this leaves reversed loads and stores
in an incorrect state until they are adjusted in optimizeMaskToEVL.

This moves the VPVectorEndPointerRecipe transform so that it is updated
in lockstep with the actual load/store recipe.

One thought that crossed my mind was that VPInterleaveRecipe could also
use VPVectorEndPointerRecipe, in which case we would have also been
computing the wrong address because we don't transform it to an EVL
recipe which accounts for the reversed address.
2025-08-19 08:16:48 +00:00
Luke Lau
144736b07e
[VPlan] Don't fold live ins with both scalar and vector operands (#154067)
If we end up with a extract_element VPInstruction where both operands
are live-ins, we will try to fold the live-ins even though the first
operand is a vector whilst the live-in is scalar.

This fixes it by just returning the vector live-in instead of calling
the folder, and removes the handling for insertelement where we aren't
able to do the fold. From some quick testing we previously never hit
this fold anyway, and were probably just missing test coverage.

Fixes #154045
2025-08-19 04:10:53 +00:00