433 Commits

Author SHA1 Message Date
Florian Hahn
86813aa786
[VPlan] Add dedicated user for resume phi with epilogue vectorization.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.

This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
2025-08-10 21:21:16 +01:00
Luke Lau
723de7f231 [LV][RISCV] Try fixing Windows buildbot failure in force-vect-msg.ll. NFC
The clang-x64-windows-msvc buildbot is failing after
707447159341f7b5678dee4f47731af50524b9ae due to this test failing:
https://lab.llvm.org/buildbot/#/builders/63/builds/8528

This is a stab in the dark, but my first thought is that it may be due
to the handling of floats with MSVC or something. So this removes the
floating point part of the check. I don't have access to a Windows
machine handy to debug this just yet, so pushing this to see if it can
quickly return the buildbot to green.
2025-08-09 21:09:36 +08:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Luke Lau
7074471593
[RISCV] Enable tail folding by default (#151681)
We have been tracking the performance of EVL tail folding in the loop
vectorizer on RISC-V for a while now, and after much hard work from
various contributors we think it should be generally profitable to
enable by default now.

With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU
2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30%
geomean codesize reduction on SPEC and TSVC, with no significant
regressions detected.

Now that we are early into the LLVM 22.x development cycle it seems like
a good time to enable it to catch any issues. There are still more EVL
related items of work being tracked in #123069, which should continue to
improve performance.
2025-08-08 14:26:23 +08:00
Luke Lau
0720af8c24 [LV][RISCV] Precommit RUN line changes from #151681. NFC
In preparation for enabling EVL tail folding by default.
2025-08-08 12:40:27 +08:00
Luke Lau
a04142f11f [LV][RISCV] Add check lines for scalable interleave costs. NFC
Previously we could only scalably vectorize interleave groups with
factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now
support all factors (available on RISC-V). So this adds the remaining
check lines for the scalable VFs.
2025-08-07 12:28:12 +08:00
Luke Lau
44af26ea2e [LV] Fix EVL test after merge. NFC
Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and
df8da2ff8370fda479b5c118704af4f50e0d3536
2025-08-07 11:12:43 +08:00
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values,
2025-08-06 20:03:31 +01:00
Luke Lau
94a6cd464e
[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.

There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.

Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
2025-08-05 16:54:02 +08:00
Mel Chen
76d98cfcc4
[RISCV][TTI] Enable masked interleave access (#151665)
Now that support for masked loads/stores of interleave groups has
landed, we can enable the loop vectorizer to generate masked interleave
access where applicable.

This improves vectorization in several ways:
* Internal predication support: This enables interleave group
vectorization for loops with internal control flow predication, provided
all members of the group share the same predicate. Gaps in interleave
groups are still not efficiently handled by masking, so masking for gaps
remains disabled for now.
* Tail folding: This allows tail folding of loops with interleave groups
by using masking. Without this, vectorized loops with interleaves would
fall back to using separate gather/scatter accesses, which can be
significantly less efficient.

"[RISCV][TTI] Enable masked interleave access for scalable vector
(#149981)" was reverted by 5294793bdcf6ca142f7a0df897638bd4e85ed1a7 due
to triggering an assertion. The issue has been addressed in the patch
"[LV] Fix gap mask requirement for interleaved access (#151105)". On the
other hand, this patch also enable fixed-length masked interleave access
(#150624) since support for fixed-length has also been landed
992118cb4deab139ae384bb85f03225a9a21b008.

---------

Co-authored-by: Philip Reames <preames@rivosinc.com>
2025-08-05 16:08:13 +08:00
Mel Chen
86916ff0f0
[LV] Fix gap mask requirement for interleaved access (#151105)
When interleaved stores contain gaps, a mask is required to skip the
gaps, regardless of whether scalar epilogues are allowed.
This patch corrects the condition under which a gap mask is needed,
ensuring consistency between the legacy and VPlan-based cost models and
avoiding assertion failures.

Related #149981
2025-08-01 14:24:30 +08:00
Luke Lau
9c90f848fd [LV][RISCV] Regenerate select-cmp-reduction.ll with UTC. NFC
Simplify the opt invocation, and remove -force-vector-width and the
fixed-length RUN line so we're testing what the loop vectorizer emits in
the default configuration.
2025-08-01 12:38:18 +08:00
Luke Lau
b926f5d923 [LV][RISCV] Regenerate reduction tests with UTC. NFC
Previously these tests used optimization remarks to check the chosen VF,
but I don't think thats needed if we can see from the types used in the
vectorized body. We can just use UTC to generate the body instead.

Whilst we're here, also simplify the opt invocation to remove unneeded
options and rename from scalable-reductions.ll -> reductions.ll, seeing
as it's not specifically testing for scalable VFs anymore.
2025-08-01 12:32:14 +08:00
Luke Lau
8dfc07cad2 [RISCV][LV] Merge check lines in bf16/f16 tests. NFC
Now that we fall back to scalar epilogues in #150908 the
non-zvfhmin/zvfbfmin lines should be the same.
2025-08-01 11:26:16 +08:00
Luke Lau
7250b66240
[VPlan] Create AVL as a phi from TC -> 0 with EVL tail folding (#151481)
This implements the first half of #151459, by changing the AVL so it's
no longer computed as `trip-count - EVL-based IV`, but instead a
separate scalar phi that is decremented by EVL each iteration.

This shortens the dependency chain for computing the AVL and should
eventually allow us to convert the branch condition to `branch-count
avl-next, 0`.

`simplifyBranchConditionForVFAndUF` had to be updated to prevent a
regression because this introduces a VPPhi in the header block.
2025-08-01 11:00:05 +08:00
Samuel Tebbs
fcc419b05f [LV][NFCI] Swap reduction recipe operand order
https://github.com/llvm/llvm-project/pull/147026 will enable sub
reductions, which require that the phi value is the first operand since
they aren't commutative. This re-orders the operands when executing
reductions, which actually matches other existing code in
VPReductionRecipe::execute.
2025-07-31 14:35:10 +01:00
Shih-Po Hung
cc8c941e17
[VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222)
Loop regions require fixed-length steps and rounded-up trip counts, but
after dissolution creates explicit control flow, EVL loops can leverage
variable-length stepping with original trip counts.

This patch adds a post-dissolution transform pass to convert EVL loops
from fixed-length to variable-length stepping .
2025-07-30 16:50:57 +08:00
Luke Lau
b663e563cc
[VPlan] Fix header masks in EVL tail folding (#150202)
With EVL tail folding, the EVL may not always be VF on the
second-to-last iteration.

Recipes that have been converted to VP intrinsics via optimizeMaskToEVL
account for this, but recipes that are left behind will still use the
old header mask which may end up having a different vector length.

This is effectively the same as #95368, and fixes this by converting
header masks from icmp ule wide-canonical-iv, backedge-trip-count ->
icmp ult step-vector, evl. Without it, recipes that fall through
optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and
#149981.

We really need to split off optimizeMaskToEVL into
VPlanTransforms::optimize and move transformRecipestoEVLRecipes into
tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for
correctness and what is needed to optimize away the mask computations.
We should be able to still generate a correct albeit suboptimal VPlan
without running optimizeMaskToEVL. I've added a TODO for this, which I
think we can do after #148274

Fixes #150197
2025-07-30 11:31:04 +08:00
Florian Hahn
c93d166c58
[VPlan] Simplify (MUL %x, 0) -> 0.
Simplify trivial multiplies.
https://alive2.llvm.org/ce/z/DabRkA
2025-07-28 21:50:57 +01:00
Luke Lau
5f2092dae3 [RISCV][LV] Update f16/bf16 loop vectorizer tests. NFC
This fixes a failing test after the changes in #150908 affected the
result in #150882.
2025-07-28 23:19:03 +08:00
Luke Lau
fe4f6c1a58
[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)
When vectorizing with predication some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized because
the masked load/store or gather/scatter cost returns illegal.

This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.

But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.

For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.

We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
2025-07-28 22:59:49 +08:00
Luke Lau
92d09245d6
[VPlan] Fall back to scalar epilogue if possible when EVL isn't legal (#150908)
When enabling predicated vectorization by default on RISC-V, there's a
bunch of performance regressions on llvm-test-suite's LoopInterleaving
microbenchmarks:
https://lnt.lukelau.me/db_default/v4/nts/788?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=791&baseline=730&submit=Update

Most of these regressions stem from the interleave_count pragma, which
causes EVL tail folding interleaving to be unsupported (since we don't
support unrolling with EVL)

Currently if DataWithEVL isn't legal we fall back to DataWithoutLaneMask
as the tail folding style, but this is very slow on RISC-V.

The order of performance roughly is something like:

DataWithEVL > None (scalar-epilogue) > Data[WithoutLaneMask]

So this patch tries to prevent the regressions by falling back to a
scalar epilogue where possible, i.e. the existing vectorization we have
today. Not we may still need to fall back to DataWithoutLaneMask, e.g.
if the trip count is low etc or it's forced by
-prefer-predicate-over-epilogue=predicate-dont-vectorize.
2025-07-28 20:10:36 +08:00
Luke Lau
d4f9c11e06 [RISCV][LV] Use predicate-else-scalar-epilogue flag in tail folding tests. NFC
Align the tests closer with what we eventually intend to enable by
default on RISC-V by using
-prefer-predicate-over-epilogue=predicate-else-scalar-epilogue, instead
of dropping vectorization entirely with predicate-dont-vectorize.

Also adjust the non-EVL run lines so that they use
-prefer-predicate-over-epilogue=scalar-epilogue instead of
-force-tail-folding-style=none, so we're only using testing one type of
flag instead of a combination of two.
2025-07-28 17:04:00 +08:00
Luke Lau
ddb12c10a9 [RISCV][LV] Remove redundant -force-tail-folding-style from tests. NFC
This isn't needed after we set the tail folding style to data-with-evl
via TTI in #148686.  Also rename the tests to reflect the fact they're
no longer forcing the tail folding style.
2025-07-28 16:11:01 +08:00
Florian Hahn
89ae085859
[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735)
VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.

PR: https://github.com/llvm/llvm-project/pull/149735
2025-07-27 13:53:26 +01:00
Florian Hahn
fa3ec0c17c
[VPlan] Materialize constant vector trip counts before final opts. (#142309)
Materialize constant vector trip counts before ::execute, if the trip
count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now
this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle
block.

For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.

PR: https://github.com/llvm/llvm-project/pull/142309
2025-07-26 17:16:36 +01:00
Alex Bradbury
5294793bdc Revert "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)"
This reverts commit ee3a7714b7a69ac9aae4b79f4c67adc38bc6876b.

Causes an assertion for the zvl1024b RISC-V build configuration. See
comment with reproducer at
<https://github.com/llvm/llvm-project/pull/149981#issuecomment-3118482801>
2025-07-25 16:14:10 +01:00
Mel Chen
ee3a7714b7
[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)
Now that support for masked loads/stores of interleave groups has
landed, we can enable the loop vectorizer to generate masked interleave
access where applicable.

This improves vectorization in several ways:
* Internal predication support: This enables interleave group
vectorization for loops with internal control flow predication, provided
all members of the group share the same predicate. Gaps in interleave
groups are still not efficiently handled by masking, so masking for gaps
remains disabled for now.
* Tail folding: This allows tail folding of loops with interleave groups
by using masking. Without this, vectorized loops with interleaves would
fall back to using separate gather/scatter accesses, which can be
significantly less efficient.
* Scalable vector support: Currently, only scalable vector types are
supported for masked interleave lowering. Fixed-length vector support
will be enabled in the future.

As interleave access is not yet supported with tail folding by EVL, that
functionality is temporarily disabled. We are going to create another
patch to support it.

Co-authored-by: Philip Reames <preames@rivosinc.com>

---------

Co-authored-by: Philip Reames <preames@rivosinc.com>
2025-07-25 17:53:08 +08:00
Luke Lau
9563e7a940
[VPlan] Mark VPInstruction::ExplicitVectorLength as single scalar. NFC (#150221)
This allows it to be broadcasted without an explicit
VPInstruction::Broadcast in #150202
2025-07-23 22:38:21 +08:00
Luke Lau
20c52e4231 Reapply "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686)"
This reverts commit 25e97fc420f8ecc43fbabadfe9767b4163e6ee36.

The original commit was reverted due to a crash in llvm-test-suite. The
crash stemmed from a multiply reduction, which isn't supported for
scalable VFs on RISC-V. But for EVL tail folding we only support
scalable VFs, so when -force-tail-folding-style=data-with-evl is
specified we check to see if there's a scalable VF, and fall back to
data-without-lane-mask if there isn't.

This is done in setTailFoldingStyles, but previously we were only
checking if the forced tail folding style was legal, not the style
returned by TTI.

This version fixes this by checking the actual computed tail folding
style and not just the forced one, and adds a test for the crash in
llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll
2025-07-22 23:52:02 +08:00
Luke Lau
25e97fc420 Revert "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686)"
This reverts commit 38318dd05615a2f38abdeeae99e7423165308902.

The clang-riscv-gauntlet buildbot is breaking with this commit:
https://lab.llvm.org/buildbot/#/builders/210/builds/371
2025-07-22 22:54:26 +08:00
Luke Lau
6e723d2de8
[VPlan] Remove loop region in simplifyBranchConditionForVFAndUF with EVL PHI (#150016)
Previously we fell back to just simplifying the branch cond to true
since one of the phis was a VPEVLBasedIVPHIRecipe. However this should
be fine to replace with its start value.
2025-07-22 22:30:34 +08:00
Luke Lau
38318dd056
[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686)
In preparation to eventually make EVL tail folding the default, this
patch sets DataWithEVL as the preferred tail folding style for RISC-V,
but doesn't enable tail folding by default.

And although tail folding isn't enabled by default, the loop vectorizer
will actually tail fold loops with a small trip count, so this will
cause some EVL vectorized loops to be generated in the default
configuration.

The EVL tail folding work is still not complete, e.g. we still need to
handle interleave groups etc., see #123069, but a lot of these missing
features also apply to the data (masked) tail folding strategy, which is
the default anyway.

The actual overall performance picture is much better, on TSVC EVL tail
folding is faster than data on every benchmark on the spacemit-x60[^1]:
https://lnt.lukelau.me/db_default/v4/nts/755?compare_to=756
And on SPEC CPU 2017 we see a geomean improvement[^2]:
https://lnt.lukelau.me/db_default/v4/nts/751?compare_to=753

This is likely due to masked instructions generally being less
performant on the spacemit-x60, up to twice as slow:
https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html

[^1]: These benchmarks don't exactly give the same performance numbers
as this patch, but it's a good indicator that EVL tail folding is
generally faster than masked tail folding.
[^2]: The large code size increase in 505.mcf_r is due to a function
being inlined now
2025-07-22 21:02:59 +08:00
Luke Lau
cb8b0cd2cf [LV] Precommit test changes for #148686. NFC
Namely explicitly adding -force-tail-folding-style=data to existing RUN
lines so that we don't lose them when we switch to data-with-evl by
default.
2025-07-22 16:16:43 +08:00
Mel Chen
d2a7f4e528
[NFC][LV] Refine the lit test case riscv-vector-reverse.ll (#149020)
This patch includes the following changes:
1. Merge riscv-vector-reverse-output.ll into riscv-vector-reverse.ll,
and only check the generated LLVM IR.
2. Add vplan-riscv-vector-reverse.ll to preserve the original debug
output checks from riscv-vector-reverse.ll.
2025-07-22 14:56:14 +08:00
Mel Chen
6f240d5a7d
[LV][EVL] Remove interleave count from the test case for EVL tail-folding. nfc (#149834)
Remove the interleave count since we have not supported it when EVL
tail-folding.
2025-07-22 08:59:53 +08:00
Florian Hahn
afe8150780
[VPlan] Simplify exituser handling by generating all extracts first(NFCI)
Simplify the handling of exit users by generating all extracts first
(safe option), and have FOR handling optimize the extracts, similar to
already done for reductions and inductions.

NFC modulo first-order recurrence extract order in middle block.
2025-07-16 08:14:12 +01:00
Luke Lau
df387661c4
[RISCV] Remove -riscv-v-vector-bits-min from LoopVectorize tests. NFC (#148565)
If I understand correctly there was a point where we used to need this
before it was implied by Zvl*b.

Now that it is though and we use -mattr=+v in pretty much every test we
can remove it.

In unroll-in-loop-vectorizer.ll we can force a VF of 1 instead by using
-force-vector-width=1, and in scalable-basics.ll the two RUN lines were
the same so I merged them.
2025-07-14 21:59:35 +08:00
Florian Hahn
64686c59c3
[VPlan] Connect (MemRuntime|SCEV)Check blocks as VPlan transform (NFC). (#143879)
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and
ILV::emitMemRuntimeChecks.

The new logic is currently split across
LoopVectorizationPlanner::addRuntimeChecks which collects a list of
{Condition, CheckBlock} pairs and performs some checks and emits remarks
if needed. The list of checks is then added to VPlan in
VPlanTransforms::connectCheckBlocks.

PR: https://github.com/llvm/llvm-project/pull/143879
2025-07-09 14:03:25 +02:00
Luke Lau
61a0653cc6
[VPlan] Fix first-order splices without header mask not using EVL (#146672)
This fixes a buildbot failure with EVL tail folding after #144666:
https://lab.llvm.org/buildbot/#/builders/132/builds/1653

For a first-order recurrence to be correct with EVL tail folding we need
to convert splices to vp splices with the EVL operand.
Originally we did this by looking for users of the header mask and its
users, and converting it in createEVLRecipe.

However after #144666 a FOR splice might not actually use the header
mask if it's based off e.g. an induction variable, and so we wouldn't
pick it up in createEVLRecipe.

This fixes this by converting FOR splices separately in a loop over all
recipes in the plan, regardless of whether or not it uses the header
mask.

I think there was some conflation in createEVLRecipe between what was an
optimisation and what was needed for correctness. Most of the transforms
in it just exist to optimize the mask away and we should still emit
correct code without them. So I've renamed it to make the separation
clearer.
2025-07-03 16:55:00 +01:00
Luke Lau
ec25a0568c
[VPlan] Don't convert VPWidenSelectRecipes to vp.select in EVL transform (#146695)
createEVLRecipe tries to optimise recipes that use the header mask by
replacing them with their VP equivalents and setting the EVL, allowing
the mask to be removed.

However we currently also convert widened selects to vp.select even
though they don't necessarily use the header mask.

Unlike vp.merge a vp.select only makes the "unused" lanes past EVL
poison, so it's not needed for correctness.

In the same vein as #127180, this patch removes the transform for
VPWidenSelectRecipes and keeps them as plain select instructions to
allow for more optimisations.

RISCVVLOptimizer will still be able to optimise away any VL toggles and
we end up with better code generation across llvm-test-suite and SPEC
CPU 2017.
2025-07-03 11:50:25 +01:00
Mel Chen
bc8dad1c7e
[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864)
A reverse interleave access is essentially composed of multiple
load/store operations with same negative stride, and their addresses are
based on the last lane address of member 0 in the interleaved group.

Currently, we already have VPVectorEndPointerRecipe for computing the
last lane address of consecutive reverse memory accesses. This patch
extends VPVectorEndPointerRecipe to support constant stride and extracts
the reverse interleave group address adjustment from
VPInterleaveRecipe::execute, replacing it with a
VPVectorEndPointerRecipe.

The final goal is to support interleaved accesses with EVL tail folding.
Given that VPInterleaveRecipe is large and tightly coupled — combining
both load and store, and embedding operations like reverse pointer
adjustion (GEP), widen load/store, deinterleave/interleave, and reversal
— breaking it down into smaller, dedicated recipes may allow
VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware
form more effectively.

One foreseeable challenge is that
VPlanTransforms::convertToConcreteRecipes currently runs after
tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will
likely need to happen earlier in the pipeline to be effective.
2025-07-02 18:16:02 +08:00
Luke Lau
4a2fa0847f
[VPlan] Support VPWidenIntOrFpInductionRecipes with EVL tail folding (#144666)
Following on from #118638, this handles widened induction variables with
EVL tail folding by setting the VF operand to be EVL, calculated in the
vector body.

We need to do this for correctness since with EVL tail folding the
number of elements processed in the penultimate iteration may not be VF,
but the runtime EVL, and we need take this into account when updating
the backedge value.

- Because the VF may now not be a live-in we need to move the insertion
point to just after the VFs definition
- We also need to avoid truncating it when it's the same size as the
step type, previously this wasn't a problem for live-ins.
- Also because the VF may be smaller than the IV type, since the EVL is
always i32, we may need to zext it.

On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and
42.8% more loops vectorized on SPEC CPU 2017
2025-07-01 12:29:24 +01:00
Luke Lau
f01a7936be
[VPlan] Replace all uses of VF when EVL tail folding. NFCI (#146339)
With EVL tail folding, any use of the VF live in should be replaced by
the EVL. Otherwise, it should likely be directly emitted as a constant
via VPTransformState::VF.

This strengthens the EVL transformation by replacing all uses of VF with
EVL and asserting that the only users are VPVectorEndPointerRecipe and
VPScalarIVStepsRecipe, the latter of which is new.

This should be NFC because even though we didn't previously replace the
EVL of VPScalarIVStepsRecipe, it's only used when unrolling which we
don't allow with EVL tail folding yet.
2025-06-30 13:47:38 +01:00
Florian Hahn
ecff028a96
[LV] Update test after 4ac4726d00.
Update missed test checks after #144644.
2025-06-25 09:54:54 +01:00
Florian Hahn
830b2c842e
[LV] Replace redundant ExtractLastElement of reduction result (NFC).
Replace redundant ExtractLastElement VPInstructions early. This is NFC,
as the VPInstruction computing the final result is vector-to-scalar,
producing a single scalar already. This enables follow-up changes to
model more aspects of reductions directly in VPlan.
2025-06-24 21:48:58 +01:00
Florian Hahn
2f5d965bb5
[VPlan] Use EMIT-SCALAR when printing casts.
Split off EMIT-SCALAR printing changes from already approved
https://github.com/llvm/llvm-project/pull/140623.

Currently all casts are single scalars, this brings printing in line
with printing for other VPInstructions.
2025-06-21 10:23:53 +01:00
Philip Reames
c103bbc836
[LV] Consider whether vscale is a known power of two for iteration check (#144963)
Going mostly by the comment here - but it says "vscale is not
necessarily a power-of-2". Both in tree targets have vscale as a power
of two, and we have an existing TTI hook for that.
2025-06-20 11:37:27 -07:00
Ramkumar Ramachandra
c8c4bd1ebc
[LV] Stengthen loop-invariance checks in isPredicatedInst (#140744)
Check loop-invariance against SCEV as well.
2025-06-20 14:01:48 +01:00