464 Commits

Author SHA1 Message Date
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Ramkumar Ramachandra
46fcece2a8
[VPlan] Extend CSE to eliminate GEPs (#156699)
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.

While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-16 10:14:32 +00:00
Luke Lau
4bb250d6a3
[VPlan] Always consider register pressure on RISC-V (#156951)
Stacked on #156923 

In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because
whilst the largest element type is i8, we generate a bunch of pointer
vectors for gathers and scatters. This means the VF chosen is quite high
e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x
i64> m8 registers for the pointers.

This was briefly fixed by #132190 where we computed register pressure in
VPlan and used it to prune VFs that were likely to spill. The legacy
cost model wasn't able to do this pruning because it didn't have
visibility into the pointer vectors that were needed for the
gathers/scatters.

However VF pruning was restricted again to just the case when max
bandwidth was enabled in #141736 to avoid an AArch64 regression, and
restricted again in #149056 to only prune VFs that had max bandwidth
enabled.

On RISC-V we take advantage of register grouping for performance and
choose a default of LMUL 2, which means there are 16 registers to work
with – half the number as SVE, so we encounter higher register pressure
more frequently.

As such, we likely want to always consider pruning VFs with high
register pressure and not just the VFs from max bandwidth.

This adds a TTI hook to opt into this behaviour for RISC-V which fixes
the motivating godbolt example above. When last checked this
significantly reduces the number of spills on SPEC CPU 2017, up to
80% on 538.imagick_r.
2025-09-12 06:21:54 +00:00
Elvis Wang
3e898bc40f
[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387)
This patch fix the assertion when the `isUniform` (from legacy model)
and `isSingleScalar`(from Vplan-based model) mismatch.

The simplify test that cause assertion
```
loop:
  loadA = load %a  => %a is loop invariant.
  loadB = load %LoadA
  ...
```
In the legacy cost model, it cannot analysis that addr of `%loadB` is
uniform but in the Vplan-based cost model both addr in `%loadA` and
`loadB` is single scalar.

Full test caused crash: https://llvm.godbolt.org/z/zEG8YKjqh.

---------

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-11 07:49:54 +08:00
Florian Hahn
9b1b93766d
Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9.

Recommit with with a fix for MSan failure (
https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a
set to track deleted values. Using the InsertedInstructions set is not
sufficient, as it use asserting value handles as keys, which may
dereference the value at construction.

Original message:

Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-09 09:47:41 +01:00
Florian Hahn
eeb43806eb
Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f.

Triggers MSan errors in some configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/169/builds/14799
2025-09-08 14:52:28 +01:00
Florian Hahn
528b13df57
[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)
Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-08 10:53:20 +01:00
Florian Hahn
b50ad945dd
[InstSimplify] Simplify extractvalue (umul_with_overflow(x, 1)). (#157307)
Look through extractvalue to simplify umul_with_overflow where one of
the operands is 1.

This removes some redundant instructions when expanding SCEVs, which in
turn makes the runtime check cost estimate more accurate, reducing the
minimum iterations for which vectorization is profitable.

PR: https://github.com/llvm/llvm-project/pull/157307
2025-09-07 18:32:40 +01:00
Luke Lau
3f9e0736ac
[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304)
Following up from #150368, this moves folding common edge masks into
simplifyBlends.

One test in uniform-blend.ll ended up regressing but after looking at it
closely, it came from a weird (x && !x) edge mask. So I've just included
a simplifcation in this PR to fold that to false.
2025-09-05 01:29:22 +00:00
Ramkumar Ramachandra
e4c0b3e111
[VPlan] Simplify x && false -> false, x | 0 -> x (#156345)
The OR x, 0 -> x simplification has been introduced to avoid
regressions.
2025-09-04 10:29:59 +01:00
Mel Chen
2f5500e4cf
[LV] Improve the test coverage for strided access. nfc (#155981)
Add tests for strided access with UF > 1, and introduce a new test case
@constant_stride_reinterpret.
2025-09-03 10:19:36 +00:00
Luke Lau
c33ccfa52b
[VPlan] Reassociate (x & y) & z -> x & (y & z) (#155383)
This PR reassociates logical ands in order to enable more
simplifications.

The driving motivation for this is that with tail folding all blocks
inside the loop body will end up using the header mask. However this can
end up nestled deep within a chain of logical ands from other edges.

Typically the header mask will be a leaf nested in the LHS, e.g.
(headermask & y) & z. So pulling it out allows it to be simplified
further, e.g. allows it to be optimised away to VP intrinsics with EVL
tail folding.
2025-09-03 01:09:19 +00:00
Ramkumar Ramachandra
d8fd511480
[VPlan] Introduce CSE pass (#151872)
Introduce a simple common-subexpression-elimination pass at the
VPlan-level, running late during the execution of the VPlan. The
long-term vision is to get rid of the legacy non-VPlan-based cse routine
in LV, but this patch doesn't yet fully subsume it.
2025-09-02 12:23:29 +01:00
Elvis Wang
8fdae0c7da
[Reland] "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. #149955" (#156386)
This patch implements the `getAddressComputationCost()` in RISCV TTI
which
make the gather/scatter with address calculation more expansive that
stride cost.

Note that the only user of `getAddressComputationCost()` with vector
type is in `VPWidenMemoryRecipe::computeCost()`. So this patch make some
LV tests changes.

I've checked the tests changes in LV and seems those changes can be
divided into two groups.
 * gather/scatter with uniform vector ptr, seems can be optimized to
 masked.load.
 * can optimize to stride load/store.

----
After #155739 landed, the assertion (cost mis-aligned) is fixed.
I've tested llvm-test-suite w/ rva23u64 and rva23u64_zvl1024b locally
and no assertion occurred.
2025-09-02 11:43:27 +08:00
Elvis Wang
7997a79be6
[LV] Align legacy cost model to vplan-based model for gather/scatter w/ uniform addr. (#155739)
This patch check if the addr is uniform in legacy cost model to align
vplan-based cost model after #150371.

This patch fixes llvm-test-suite assertion
(https://lab.llvm.org/buildbot/#/builders/210/builds/1935) due to cost
model misaligned after #149955 under RISCV.

I've tested this patch (on top of #149955) on the llvm-test-suite
locally with crashed options `rva23u64`, `rva23u64_zvl1024b` and build
successfully.

Since this fix will change LV, I think would be better to create a PR to
fix this.
2025-09-02 09:11:45 +08:00
Mel Chen
13357e8a12
[LV][EVL] Support interleaved access with tail folding by EVL (#152070)
The InterleavedAccess pass already supports transforming
vector-predicated (vp) load/store intrinsics. With this patch, we start
enabling interleaved access under tail folding by EVL.

This patch introduces a new base class, VPInterleaveBase, and a concrete
class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and
the new VPInterleaveEVLRecipe inherit from and implement
VPInterleaveBase.

Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL
operand to emit vp.load/vp.store intrinsics.

Currently, tail folding by EVL is only supported for scalable
vectorization. Therefore, VPInterleaveEVLRecipe will only emit
interleave/deinterleave intrinsics. Reverse accesses are not yet
implemented, as masked reverse interleaved access under tail folding is
not yet supported.

Fixed #123201
2025-09-01 21:20:06 +08:00
Luke Lau
eb7f6a5f8a
[VPlan] Simplify (x && y) || (x && z) -> x && (y || z) (#156308)
Split off from #155383, since it turns out this has a diff on its own.
2025-09-01 21:12:23 +08:00
Luke Lau
c9faedd760
[VPlan] Fold common edges away in convertPhisToBlends (#150368)
If a phi is widened with tail folding, all of its predecessors will have
a mask of the form

    %x = logical-and %active-lane-mask, %foo
    %y = logical-and %active-lane-mask, %bar
    %z = logical-and %active-lane-mask, %baz
    ...

We can remove the common %active-lane-mask from all of these edge masks,
which in turn allows us to simplify a lot of VPBlendRecipes.

In particular, it allows the header mask to be removed in selects with
EVL tail folding, improving RISC-V codegen on SPEC CPU 2017 for
525.x264_r, and supersedes #147243.

This also allows us to remove VPBlendRecipe and directly emit
VPInstruction::Select in another patch.
2025-09-01 07:03:33 +00:00
Elvis Wang
69db050839
Revert "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI." (#155535)
Reverts llvm/llvm-project#149955
2025-08-27 01:54:19 +00:00
Elvis Wang
dfd3833674
[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. (#149955)
This patch implements the `getAddressComputationCost()` in RISCV TTI
which
make the gather/scatter with address calculation more expansive that
stride cost.

Note that the only user of `getAddressComputationCost()` with vector
type is in `VPWidenMemoryRecipe::computeCost()`. So this patch make some
LV tests changes.

I've checked the tests changes in LV and seems those changes can be
divided into two groups.
 * gather/scatter with uniform vector ptr, seems can be optimized to
 masked.load.
 * can optimize to stride load/store.
2025-08-27 08:40:40 +08:00
Florian Hahn
5faed1ad84
[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaced the ILV version for the non-epilogue case.

The VPlan-based version constructs a SCEV expression to compute the
minimum iterations, use that to check if the check is known true or
false. Otherwise it creates a VPExpandSCEV recipe and emits a
compare-and-branch.

When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.

PR: https://github.com/llvm/llvm-project/pull/153643
2025-08-26 15:52:31 +01:00
Luke Lau
f3520c538d
[VPlan] Replace EVL branch condition with (branch-on-count AVLNext, 0) (#152167)
This changes the branch condition to use the AVL's backedge value
instead of the EVL-based IV.

This allows us to emit bnez on RISC-V and removes a use of the trip
count, which should reduce register pressure.

To match phis with VPlanPatternMatch I've had to relax the assert that
the number of operands must exactly match the pattern for the Phi
opcode, and I've copied over m_ZExtOrSelf from the LLVM IR
PatternMatch.h.

Fixes #151459
2025-08-26 11:19:19 +00:00
Luke Lau
a12d012c87 [VPlan][RISC-V] Add test case for #154103
This has now been fixed by #152707
2025-08-26 17:31:29 +08:00
Luke Lau
42cf9c60d7
[RISCV] Mark Sub/AddChainWithSubs as legal reduction types (#154753)
We used to vectorize these scalably but after #147026 they were split
out from RecurKind::Add into their own RecurKinds, and we didn't mark
them as supported in isLegalToVectorizeReduction.

This caused the loop vectorizer to drop the scalable VPlan because it
thinks the reductions will be scalarized.

This fixes it by just marking them as supported.

Fixes #154554
2025-08-21 21:43:48 +08:00
Elvis Wang
d611a9ca15
[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. (#154482)
`VPEVLBasedIVPHIRecipe` will lower to VPInstruction scalar phi and
generate scalar phi. This recipe will only occupy a scalar register just
like other phi recipes.

This patch fix the register usage for `VPEVLBasedIVPHIRecipe` from
vector
to scalar which is close to generated vector IR.

https://godbolt.org/z/6Mzd6W6ha shows that no register spills when
choosing `<vscale x 16>`.

Note that this test is basically copied from AArch64.
2025-08-21 07:39:01 +08:00
Florian Hahn
351d398a37
[VPlan] Run final VPlan simplifications before codegen.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.

Do a final run of simplifyRecipes before executing the VPlan.
2025-08-16 18:54:27 +01:00
Florian Hahn
db98ac43ec
[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495)
Directly emit shl instead of a multiply if VF * Step is a power-of-2. The
main motivation here is to prepare the code and test for directly
generating and expanding a SCEV expression of the minimum iteration
count. SCEVExpander will directly emit shl for multiplies with
powers-of-2.

InstCombine will also performs this combine, so end-to-end this should
effectively by NFC.

PR: https://github.com/llvm/llvm-project/pull/153495
2025-08-14 19:27:51 +01:00
Mel Chen
b9138bde35
[LV][EVL] More lit tests for interleaved access. nfc (#152959)
Add test cases for reverse interleaved access and interleaved access
with gap.
2025-08-13 15:43:39 +08:00
Florian Hahn
8cdab07aaa
Reapply "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.

Recommit with a fix for the verifier error caused for EVL recipes.

Extra test coverage added in 6f939da60e.
2025-08-12 22:09:30 +01:00
Florian Hahn
6f939da60e
[LV] Add additional test for backedge elimination with EVL. 2025-08-12 21:58:19 +01:00
Florian Hahn
86813aa786
[VPlan] Add dedicated user for resume phi with epilogue vectorization.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.

This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
2025-08-10 21:21:16 +01:00
Luke Lau
723de7f231 [LV][RISCV] Try fixing Windows buildbot failure in force-vect-msg.ll. NFC
The clang-x64-windows-msvc buildbot is failing after
707447159341f7b5678dee4f47731af50524b9ae due to this test failing:
https://lab.llvm.org/buildbot/#/builders/63/builds/8528

This is a stab in the dark, but my first thought is that it may be due
to the handling of floats with MSVC or something. So this removes the
floating point part of the check. I don't have access to a Windows
machine handy to debug this just yet, so pushing this to see if it can
quickly return the buildbot to green.
2025-08-09 21:09:36 +08:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Luke Lau
7074471593
[RISCV] Enable tail folding by default (#151681)
We have been tracking the performance of EVL tail folding in the loop
vectorizer on RISC-V for a while now, and after much hard work from
various contributors we think it should be generally profitable to
enable by default now.

With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU
2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30%
geomean codesize reduction on SPEC and TSVC, with no significant
regressions detected.

Now that we are early into the LLVM 22.x development cycle it seems like
a good time to enable it to catch any issues. There are still more EVL
related items of work being tracked in #123069, which should continue to
improve performance.
2025-08-08 14:26:23 +08:00
Luke Lau
0720af8c24 [LV][RISCV] Precommit RUN line changes from #151681. NFC
In preparation for enabling EVL tail folding by default.
2025-08-08 12:40:27 +08:00
Luke Lau
a04142f11f [LV][RISCV] Add check lines for scalable interleave costs. NFC
Previously we could only scalably vectorize interleave groups with
factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now
support all factors (available on RISC-V). So this adds the remaining
check lines for the scalable VFs.
2025-08-07 12:28:12 +08:00
Luke Lau
44af26ea2e [LV] Fix EVL test after merge. NFC
Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and
df8da2ff8370fda479b5c118704af4f50e0d3536
2025-08-07 11:12:43 +08:00
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values,
2025-08-06 20:03:31 +01:00
Luke Lau
94a6cd464e
[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.

There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.

Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
2025-08-05 16:54:02 +08:00
Mel Chen
76d98cfcc4
[RISCV][TTI] Enable masked interleave access (#151665)
Now that support for masked loads/stores of interleave groups has
landed, we can enable the loop vectorizer to generate masked interleave
access where applicable.

This improves vectorization in several ways:
* Internal predication support: This enables interleave group
vectorization for loops with internal control flow predication, provided
all members of the group share the same predicate. Gaps in interleave
groups are still not efficiently handled by masking, so masking for gaps
remains disabled for now.
* Tail folding: This allows tail folding of loops with interleave groups
by using masking. Without this, vectorized loops with interleaves would
fall back to using separate gather/scatter accesses, which can be
significantly less efficient.

"[RISCV][TTI] Enable masked interleave access for scalable vector
(#149981)" was reverted by 5294793bdcf6ca142f7a0df897638bd4e85ed1a7 due
to triggering an assertion. The issue has been addressed in the patch
"[LV] Fix gap mask requirement for interleaved access (#151105)". On the
other hand, this patch also enable fixed-length masked interleave access
(#150624) since support for fixed-length has also been landed
992118cb4deab139ae384bb85f03225a9a21b008.

---------

Co-authored-by: Philip Reames <preames@rivosinc.com>
2025-08-05 16:08:13 +08:00
Mel Chen
86916ff0f0
[LV] Fix gap mask requirement for interleaved access (#151105)
When interleaved stores contain gaps, a mask is required to skip the
gaps, regardless of whether scalar epilogues are allowed.
This patch corrects the condition under which a gap mask is needed,
ensuring consistency between the legacy and VPlan-based cost models and
avoiding assertion failures.

Related #149981
2025-08-01 14:24:30 +08:00
Luke Lau
9c90f848fd [LV][RISCV] Regenerate select-cmp-reduction.ll with UTC. NFC
Simplify the opt invocation, and remove -force-vector-width and the
fixed-length RUN line so we're testing what the loop vectorizer emits in
the default configuration.
2025-08-01 12:38:18 +08:00
Luke Lau
b926f5d923 [LV][RISCV] Regenerate reduction tests with UTC. NFC
Previously these tests used optimization remarks to check the chosen VF,
but I don't think thats needed if we can see from the types used in the
vectorized body. We can just use UTC to generate the body instead.

Whilst we're here, also simplify the opt invocation to remove unneeded
options and rename from scalable-reductions.ll -> reductions.ll, seeing
as it's not specifically testing for scalable VFs anymore.
2025-08-01 12:32:14 +08:00
Luke Lau
8dfc07cad2 [RISCV][LV] Merge check lines in bf16/f16 tests. NFC
Now that we fall back to scalar epilogues in #150908 the
non-zvfhmin/zvfbfmin lines should be the same.
2025-08-01 11:26:16 +08:00
Luke Lau
7250b66240
[VPlan] Create AVL as a phi from TC -> 0 with EVL tail folding (#151481)
This implements the first half of #151459, by changing the AVL so it's
no longer computed as `trip-count - EVL-based IV`, but instead a
separate scalar phi that is decremented by EVL each iteration.

This shortens the dependency chain for computing the AVL and should
eventually allow us to convert the branch condition to `branch-count
avl-next, 0`.

`simplifyBranchConditionForVFAndUF` had to be updated to prevent a
regression because this introduces a VPPhi in the header block.
2025-08-01 11:00:05 +08:00
Samuel Tebbs
fcc419b05f [LV][NFCI] Swap reduction recipe operand order
https://github.com/llvm/llvm-project/pull/147026 will enable sub
reductions, which require that the phi value is the first operand since
they aren't commutative. This re-orders the operands when executing
reductions, which actually matches other existing code in
VPReductionRecipe::execute.
2025-07-31 14:35:10 +01:00
Shih-Po Hung
cc8c941e17
[VPlan] Convert EVL loops to variable-length stepping after dissolution (#147222)
Loop regions require fixed-length steps and rounded-up trip counts, but
after dissolution creates explicit control flow, EVL loops can leverage
variable-length stepping with original trip counts.

This patch adds a post-dissolution transform pass to convert EVL loops
from fixed-length to variable-length stepping .
2025-07-30 16:50:57 +08:00
Luke Lau
b663e563cc
[VPlan] Fix header masks in EVL tail folding (#150202)
With EVL tail folding, the EVL may not always be VF on the
second-to-last iteration.

Recipes that have been converted to VP intrinsics via optimizeMaskToEVL
account for this, but recipes that are left behind will still use the
old header mask which may end up having a different vector length.

This is effectively the same as #95368, and fixes this by converting
header masks from icmp ule wide-canonical-iv, backedge-trip-count ->
icmp ult step-vector, evl. Without it, recipes that fall through
optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and
#149981.

We really need to split off optimizeMaskToEVL into
VPlanTransforms::optimize and move transformRecipestoEVLRecipes into
tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for
correctness and what is needed to optimize away the mask computations.
We should be able to still generate a correct albeit suboptimal VPlan
without running optimizeMaskToEVL. I've added a TODO for this, which I
think we can do after #148274

Fixes #150197
2025-07-30 11:31:04 +08:00