370 Commits

Author SHA1 Message Date
Paul Walker
6955a7d134 [NFC][LLVM][Instrumentation][LoopVectorize] Regenerate test checks. 2025-06-05 11:38:30 +00:00
Luke Lau
5458ea5122 [LV] Regenerate UTC variable names in RISCV/interleaved-accesses.ll. NFC 2025-06-04 01:07:32 +01:00
Florian Hahn
11713e86b0
[LV] Move VPlan-based calculateRegisterUsage to VPlanAnalysis (NFC). (#135673)
Move VPlan-based calculateRegisterUsage from LoopVectorize
to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps
to reduce the size of LoopVectorize.

PR: https://github.com/llvm/llvm-project/pull/135673
2025-06-02 17:40:50 +01:00
Florian Hahn
9ea4924720
[VPlan] Use EMIT-SCALAR for single-scalar VPPhis (NFC).
Follow-up to https://github.com/llvm/llvm-project/pull/141428, to also
use EMIT-SCALAR for VPPhis that are single scalars.
2025-05-29 11:20:07 +01:00
Florian Hahn
5b85e4b08d
[VPlan] Use EMIT-SCALAR when printing single-scalar VPInstructions. (#141428)
By using SINGLE-SCALAR when printing, it is clear in the debug output
that those VPInstructions only produce a single scalar.

Split off in preparation for
https://github.com/llvm/llvm-project/pull/140623.

PR: https://github.com/llvm/llvm-project/pull/141428
2025-05-29 09:29:06 +01:00
Ramkumar Ramachandra
5f39be5917
[VPlan] Use InstSimplifyFolder instead of TargetFolder (#141222)
For more powerful folding with operands that are not necessarily
all-constant, use InstSimplifyFolder instead of TargetFolder in
tryToConstantFold, and rename the function tryToFoldLiveIns.
2025-05-28 11:00:14 +02:00
Florian Hahn
d56deea1e4
[VPlan] Connect Entry to scalar preheader during initial construction. (#140132)
Update initial construction to connect the Plan's entry to the scalar
preheader during initial construction. This moves a small part of the
 skeleton creation out of ILV and will also enable replacing
 VPInstruction::ResumePhi with regular VPPhi recipes.

Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar ph (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures we
incoming values for resume phis match the incoming blocks.

PR: https://github.com/llvm/llvm-project/pull/140132
2025-05-27 16:07:56 +01:00
Luke Lau
841c8d48a6 [LV] Add tests for more interleave group factors on AArch64 and RISC-V. NFC
The plan is to eventually add support for scalably vectorizing these for
non-power-of-2 factors, see https://github.com/llvm/llvm-project/pull/139893

Simultaneously, we need to add a test to make sure we don't generate
@llvm.vector.[de]interleave3 for AArch64 if we can't lower it (yet)
2025-05-26 18:21:27 +01:00
Philip Reames
041d189f01
[RISCV][TTI] Adjust costing in getPartialReductionCost for zvqdotq (#141430)
Two changes:

1) Handle fixed vector cases now that 77a3f8 has landed. 
2) Fix a mistake in the original costing - the VF passed in is the
   input VF, not the output VF.  Given that we should be costing the
   accumulator type with VF/4.

Note that (2) does not cause any visible test differences as the
vectorizer (outside of maximize-bandwidth mode) does not consider wide
enough VF for the costing difference to matter.
2025-05-26 08:23:56 -07:00
Florian Hahn
dcef154b5c
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506)
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.

This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.

It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the 
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.

PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-24 19:17:16 +01:00
Philip Reames
a21fb74c0c
[RISCV][TTI] Implement getPartialReductionCost for the vqdotq cases (#140974)
Doing so tells the loop vectorizer that the partial.reduce intrinsic is
profitable to use over the plain extend/multiply/reduce.add sequence.
2025-05-23 07:15:06 -07:00
Philip Reames
c21416d1f9 [RISCV][TTI] Add test coverage for getPartialReductionCost [nfc]
Adding testing in advance of a change to cost the zvqdotq instructions
such that we emit them from LV.
2025-05-21 15:12:23 -07:00
Ramkumar Ramachandra
cf1f116f78
[VPlan] Introduce constant folder in simplifyRecipe (#125365)
Introduce a VPlan-level constant folder in simplifyRecipe that tries to
fold a recipe to a constant using TargetFolder.
2025-05-20 14:16:01 +01:00
Sam Tebbs
70501ed2f0
[LoopVectorizer] Prune VFs based on plan register pressure (#132190)
This PR moves the register usage checking to after the plans are
created, so that any recipes that optimise register usage (such as
partial reductions) can be properly costed and not have their VF pruned
unnecessarily.

Depends on https://github.com/llvm/llvm-project/pull/137746
2025-05-19 13:27:17 +01:00
Min-Yih Hsu
0ab67ec191
[LV][EVL] Introduce the EVLIndVarSimplify Pass for EVL-vectorized loops (#131005)
When we enable EVL-based loop vectorization w/ predicated tail-folding,
each vectorized loop has effectively two induction variables: one
calculates the step using (VF x vscale) and the other one increases the
IV by values returned from experiment.get.vector.length. The former,
also known as canonical IV, is more favorable for analyses as it's
"countable" in the sense of SCEV; the latter (EVL-based IV), however, is
more favorable to codegen, at least for those that support scalable
vectors like AArch64 SVE and RISC-V.

The idea is that we use canonical IV all the way until the end of all
vectorizers, where we replace it with EVL-based IV using EVLIVSimplify
introduced here. Such that we can have the best from both worlds.

This Pass is enabled by default in RISC-V. However, since we haven't
really vectorize loops with predicate tail-folding by default, this Pass
is no-op at this moment.
2025-05-14 13:49:50 -07:00
Florian Hahn
7a9fd62278
[VPlan] Use VPlan operand order for VPBlendRecipes. (#139475)
Don't use the order of incoming values of IR phis when creating 
VPBlendRecipes. Instead, simply use the incoming operands and
blocks from the VPWidenPHIRecipe.

Note that this changes the order of the incoming operands/masks for some
blends.

PR: https://github.com/llvm/llvm-project/pull/139475
2025-05-14 14:56:35 +01:00
Florian Hahn
5fa64d65e9
[VPlan] Use printPhiOperands for VPPhi.
Split off from  https://github.com/llvm/llvm-project/pull/139151 to land
printing improvements separately.

Updates printing of VPPhi operands to be consistent with
VPWidenPHIRecipe.
2025-05-10 12:49:29 +01:00
Luke Lau
1484f82cbc
[VPlan] Add VPInstruction::StepVector and use it in VPWidenIntOrFpInductionRecipe (#129508)
Split off from #118638, this adds VPInstruction::StepVector, which
generates integer step vectors (0,1,2,...,VF). This is a step towards
eventually modelling all the separate parts of
VPWidenIntOrFpInductionRecipe in VPlan.

This is then used by VPWidenIntOrFpInductionRecipe, where we materialize
it just before unrolling so the operands stay in a fixed position.

The need for a separate operand in VPWidenIntOrFpInductionRecipe, as
well as the need to update it in
optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638
when everything is expanded in convertToConcreteRecipes.
2025-05-08 18:47:44 +08:00
Min-Yih Hsu
e0537c0768
[LV][EVL] Attach a new metadata on EVL vectorized loops (#131000)
This patch attaches a new metadata, `llvm.loop.isvectorized.withevl`, on
loops vectorized with explicit vector length. This will help other
optimizations down in the pipeline that focus on EVL-vectorized loop

This approach is much safer than, said IR pattern matching to figure out
if a loop is EVL-vectorized or not.
2025-05-06 10:06:37 -07:00
Florian Hahn
043b04acff
Reapply "[VPlan] Fold NOT into predicate of wide compares." (#130347)
This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9.

The recommit contains an adjustment to planContainsAdditionalSimplifications,
which considers changes to the original predicate for compares.

Original commit message:

Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-04-28 20:01:37 +01:00
YunQiang Su
e9a34e4236
[RISCV] Support vectorizing FMINIMUMNUM and FMAXIMUMNUM (#135727)
RISC-V V extension support vfmax and vfmin, which follow IEEE754-2019.
We can use them directly.
2025-04-27 19:10:02 +08:00
Florian Hahn
df21288247
[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030)
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.

PR: https://github.com/llvm/llvm-project/pull/137030
2025-04-25 16:27:29 +01:00
Florian Hahn
5739a22fbb
[VPlan] Also duplicated scalar-steps when it enables sinking scalars. (#136021)
Extend sinking logic to duplicate scalar steps recipe if it enables
sinking, that is if all users in a destination block require all lanes.

This should be the last step before removing legacy sinkScalarOperands.

PR: https://github.com/llvm/llvm-project/pull/136021
2025-04-21 18:36:43 +01:00
Luke Lau
41675fa5b8
[VPlan] Simplify vp.merge true, (or x, y), x -> vp.merge y, true, x (#135017)
With EVL tail folding an AnyOf reduction will emit an i1 vp.merge like

vp.merge true, (or phi, cond), phi, evl

We can remove the or and optimise this to

vp.merge cond, true, phi, evl

Which makes it slightly easier to pattern match in #134898.

This also adds a pattern matcher for calls to help match this.

Blended AnyOf reductions will use an and instead of an or, which we may
also be able to simplify in a later patch.
2025-04-17 16:31:14 +02:00
Mel Chen
ffd5b14894
[LV] Add test cases for reverse accesses involving irregular types. nfc (#135139)
Add a test with irregular type to ensure the vector load/store
instructions are not generated.
2025-04-14 14:17:39 +08:00
Florian Hahn
995fd47944
[LAA] Make sure MaxVF for Store-Load forward safe dep distances is pow2.
MaxVF computed in couldPreventStoreLoadFowrard may not be a power of 2,
as CommonStride may not be a power-of-2.

This can cause crashes after 78777a20. Use bit_floor to make sure it is
a suitable power-of-2.

Fixes https://github.com/llvm/llvm-project/issues/134696.
2025-04-12 20:05:37 +01:00
Florian Hahn
6a9e8fc50c
[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) (#129706)
There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284

PR: https://github.com/llvm/llvm-project/pull/129706
2025-04-10 22:30:40 +01:00
Florian Hahn
6f92339d9e
[LV] Compute register usage for interleaving on VPlan. (#126437)
Add a version of calculateRegisterUsage that works estimates register
usage for a VPlan. This mostly just ports the existing code, with some
updates to figure out what recipes will generate vectors vs scalars.

There are number of changes in the computed register usages, but they
should be more accurate w.r.t. to the generated vector code.

There are the following changes:

 * Scalar usage increases in most cases by 1, as we always create a
   scalar canonical IV, which is alive across the loop and is not
   considered by the legacy implementation

 * Output is ordered by insertion, now scalar registers are added first
   due the canonical IV phi.

 * Using the VPlan, we now also more precisely know if an induction will
   be vectorized or scalarized.

Depends on https://github.com/llvm/llvm-project/pull/126415

PR: https://github.com/llvm/llvm-project/pull/126437
2025-04-08 20:52:50 +01:00
Florian Hahn
5fbd0658a0
[VPlan] Add initial CFG simplification, removing BranchOnCond true. (#106748)
Add an initial CFG simplification transform, which removes the dead
edges for blocks terminated with BranchOnCond true.

At the moment, this removes the edge between middle block and scalar
preheader when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/106748
2025-04-04 15:44:26 +01:00
Luke Lau
79435de8a5
[ConstantFold] Support scalable constant splats in ConstantFoldCastInstruction (#133207)
Previously only fixed vector splats were handled. This adds supports for
scalable vectors too by allowing ConstantExpr splats.

We need to add the extra V->getType()->isVectorTy() check because a
ConstantExpr might be a scalar to vector bitcast.

By allowing ConstantExprs this also allow fixed vector ConstantExprs to
be folded, which causes the diffs in
llvm/test/Analysis/ValueTracking/known-bits-from-operator-constexpr.ll
and llvm/test/Transforms/InstSimplify/ConstProp/cast-vector.ll. I can
remove them from this PR if reviewers would prefer.

Fixes #132922
2025-04-03 16:24:56 +01:00
YunQiang Su
e25187bc3e
LLVM/Test: Add vectorizing testcases for fminimumnum and fminimumnum (#133843)
Vectorizing of fminimumnum and fminimumnum have not support yet. Let's
add the testcase for it now, and we will update the testcase when we
support it.
2025-04-02 08:46:02 +08:00
Luke Lau
6afe5e5d1a
[LV][EVL] Peek through combination tail-folded + predicated masks (#133430)
If a recipe was predicated and tail folded at the same time, it will
have a mask like

    EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc
    EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask>

When converting to an EVL recipe, if the mask isn't exactly just the
header-mask we copy the whole logical-and.
We can remove this redundant logical-and (because it's now covered by
EVL) and just use vp<%pred-mask> instead.

This lets us remove the widened canonical IV in more places.
2025-03-31 21:28:39 +01:00
Florian Hahn
783a846507
[VPlan] Add VF as operand to VPScalarIVStepsRecipe.
Similarly to other recipes, update VPScalarIVStepsRecipe to also take
the runtime VF as argument. This removes some unnecessary runtime VF
computations for scalable vectors. It will also allow dropping the
UF == 1 restriction for narrowing interleave groups required in
577631f0a528.
2025-03-28 21:48:59 +00:00
Pengcheng Wang
f5f4da6db6
[RISCV] Don't vectorize for loops with small trip count (#132176)
Inspired by https://reviews.llvm.org/D130755.

I don't know the logic behind the value 5, it is copied from AArch64.

For some tests, I have to change the trip count so that we don't
break what they are testing.
2025-03-28 15:51:29 +08:00
Florian Hahn
2c7d40b2f0
[VPlan] Generalize SCALAR-STEPS removal to any unroll factor.
Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled handle any unrolled
VPlan, which means there's a single UF, but it will be > 1 if unrolling
took place.
2025-03-26 21:03:50 +00:00
Florian Hahn
dfca6c0d3b
[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655)
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.

PR: https://github.com/llvm/llvm-project/pull/123655
2025-03-25 12:57:24 +00:00
Florian Hahn
c482b8faea
[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC).
Instead of executing the whole entry VPIRBB twice, first only execute
the VPExpandSCEVRecipes and replace their uses with the expanded
VPValue, which will be a live-in. This allows removing special logic in
VPExpandSCEVRecipe to support executing twice and allows moving the
ExpandedSCEVs map out of VPTransformState.

It will also allow adding other recipes to the entry VPBB in the future.
2025-03-23 09:06:01 +00:00
David Sherwood
4e69258bf3
[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565)
At the moment if we decide to enable tail-folding we do not include
the cost of generating the mask per VF. This can mean we make some
poor choices of VF, which is definitely true for SVE-enabled AArch64
targets where mask generation for fixed-width vectors is more
expensive than for scalable vectors.

I've added a VPInstruction::computeCost function to return the costs
of the ActiveLaneMask and ExplicitVectorLength operations.
Unfortunately, in order to prevent asserts firing I've also had to
duplicate the same code in the legacy cost model to make sure the
chosen VFs match up. I've wrapped this up in a ifndef NDEBUG for
now. The alternative would be to disable the assert completely when
tail-folding, which I imagine is just as bad.

New tests added:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
  Transforms/LoopVectorize/RISCV/tail-folding-cost.ll
2025-03-21 09:24:56 +00:00
Florian Hahn
11b8699572
[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415)
calculateRegisterUsage adds end points for each user of an instruction
to Ends and ignores instructions not added to it, i.e. instructions with
no users.

This means things like stores aren't included, which in turn means
values that are only used in stores are also not included for
consideration. This means we underestimate the register usage in cases
where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in
Ends) if they have side-effects.

PR: https://github.com/llvm/llvm-project/pull/126415
2025-03-19 15:13:43 +00:00
Florian Hahn
870f753f1f
[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC).
Also include VPlan's BTC in the set of VPValues to materialize
broadcasts for, if it is used.
2025-03-18 22:35:18 +00:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
David Sherwood
f6b1b91a3d
[LV][NFC] Regenerate CHECK lines in some tests (#131799)
Regenerates CHECK lines in tests that are affected by
PR #130565 to aid reviews.
2025-03-18 14:38:01 +00:00
Mel Chen
489d1e764e
[LV][NFC] Pre-commit test for supporting strided accesses. (#130563)
Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to
verify all generated IR, not just debug output.
Pre-commit for #128718.
2025-03-18 16:08:42 +08:00
Florian Hahn
6a8d5f22ff
[VPlan] Don't access canonical IV in VPWidenPointerInduction::execute.
This updates VPWidenPointerInductionRecipe::execute to not use the
canonical IV to determine the insert point. Instead, it relies on the
current recipe position. In cases where this is not sufficient, set the
insert point to the first non-phi instruction, to ensure phis are
created together.
2025-03-15 21:32:48 +00:00
Florian Hahn
56b05a0d6b
[VPlan] Use VFxUF in VPWidenPointerInductionRecipe.
Use VFxUF VPValue instead of computing VF * UF explicitly.
2025-03-15 18:18:53 +00:00
Luke Lau
26324bc1bf
[VPlan] Move FOR splice cost into VPInstruction::FirstOrderRecurrenceSplice (#129645)
After #124093 we now support fixed-order recurrences with EVL tail
folding by replacing VPInstruction::FirstOrderRecurrenceSplice with a VP
splice intrinsic.

However the costing for the splice is currently done in
VPFirstOrderRecurrencePHIRecipe, so when we add the VP splice intrinsic
we end up costing it twice.

This fixes it by splitting out the cost for the splice into
FirstOrderRecurrenceSplice so that it's not duplicated when we replace
it.

We still have to keep the VF=1 checks in VPFirstOrderRecurrencePHIRecipe
since the splice might end up dead and discarded, e.g. in the test
@pr97452_scalable_vf1_for.
2025-03-14 15:33:32 +08:00
Florian Hahn
02575f887b
[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767)
Now that all phi nodes manage their incoming blocks through the
VPlan-predecessors, there should be no need for having a dedicate
recipe, it should be sufficient to allow PHI opcodes in VPInstruction.

Follow-ups will also migrate VPWidenPHIRecipe and possibly others,
building on top of https://github.com/llvm/llvm-project/pull/129388.

PR: https://github.com/llvm/llvm-project/pull/129767
2025-03-13 18:35:07 +00:00
Mel Chen
5d5e706691 [VPlan] Restrict hoisting of broadcast operations using VPDominatorTree (#117138)
This patch restricts broadcast operations from being hoisted to the vector
preheader unless the basic block that defines the broadcasted value properly
dominates the vector preheader.

This prevents potential use-before-definition issues when the broadcasted
value is defined within the plan. VPDominatorTree is used to confirm this
restriction while still allowing safe hoisting for broadcasted values defined
outside the plan.

Issue https://github.com/llvm/llvm-project/issues/117139
2025-03-13 07:16:04 -07:00
Mel Chen
ffe202ca00 Revert "[LV] Limits the splat operations be hoisted must not be defined by a recipe. (#117138)"
This reverts commit 1ff10fa82fff83bb2f0a5c1ffde6203b52bc9619.
2025-03-13 07:16:04 -07:00
Florian Hahn
62994c3291
[VPlan] Also introduce explicit broadcasts for values from entry VPBB.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.

This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.

Fixes https://github.com/llvm/llvm-project/issues/128838.
2025-03-12 22:03:19 +00:00