3307 Commits

Author SHA1 Message Date
Florian Hahn
10a6fd70d6
[LV] Regenerate checks for test (NFC).
Auto-generate check lines for scalable-loop-unpredicated-body-scalar-tail.ll,
while also updating the input to be more compact and avoid unnecessary
checks to keep auto-generated checks compact without loss of
generality.
2025-08-13 12:20:50 +01:00
Mel Chen
b9138bde35
[LV][EVL] More lit tests for interleaved access. nfc (#152959)
Add test cases for reverse interleaved access and interleaved access
with gap.
2025-08-13 15:43:39 +08:00
Florian Hahn
8cdab07aaa
Reapply "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.

Recommit with a fix for the verifier error caused for EVL recipes.

Extra test coverage added in 6f939da60e.
2025-08-12 22:09:30 +01:00
Florian Hahn
6f939da60e
[LV] Add additional test for backedge elimination with EVL. 2025-08-12 21:58:19 +01:00
Florian Hahn
424258947e
[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector
skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused
computations.

PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-12 14:13:13 +01:00
David Sherwood
8140779a9a
[LV] Improve accuracy of branch weights in epilogue iteration check block (#152980)
When one of the vector loops (main or epilogue) is scalable and the
other isn't, we can use the estimated value of vscale to improve the
accuracy.
2025-08-12 10:37:47 +01:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Florian Hahn
3cad3de6ea
[LV] Add more tests for handling IR metadata for interleave groups.
Includes a test case for
https://github.com/llvm/llvm-project/issues/153006
2025-08-11 22:09:07 +01:00
Florian Hahn
1c7c8e3ad3
Revert "[VPlan] Remove trivial dead VPPhi cycles."
This reverts commit 1f17bb133f4f49942a1e0245291811ca3c99a7d2.

This seems to be breaking some RISCV bots, reverting for now
https://lab.llvm.org/buildbot/#/builders/210/builds/1266
2025-08-11 22:05:30 +01:00
Florian Hahn
1f17bb133f
[VPlan] Remove trivial dead VPPhi cycles.
Update removeDeadRecipes to remove trivial dead VPPhi cycles.

Should effectively be NFC end-to-end.
2025-08-11 21:29:49 +01:00
Ramkumar Ramachandra
95c525b1db
[VPlan] Preserve nusw on VectorEndPointer (#151558)
In createInterleaveGroups, get the nusw in addition to inbounds from the
existing GEP, and set them on the VPVectorEndPointerRecipe.
2025-08-11 10:38:25 +01:00
David Sherwood
9181a7e294
[LV] Fix branch weights in epilogue min iteration check block (#152534)
I've changed how we construct the EpilogueVectorizerEpilogueLoop and
EpilogueVectorizerMainLoop classes so that we construct the parent class
with an additional boolean parameter indicating whether we're
vectorising the main or epilogue loop. The
InnerLoopAndEpilogueVectorizer class uses this new argument in
combination with the EpilogueLoopVectorizationInfo struct to set the
right UF and VF values. This then allows EpilogueVectorizerEpilogueLoop
to access the correct values of VF and UF for the main loop, which are
required when setting branch weights in the minimum iteration check
block.
2025-08-11 09:52:54 +01:00
Elvis Wang
37fe7a9933
[LV] Generate scalar xor for VPInstruction::Not if possible. (#152628)
`VPInstruction::Not` which will generate xor instruction is widely used
for the exit condition. This patch make `VPInstruction::Not` generate
scalar `xor` if possible.

This can help reducing the (splat true) in the `xor` and make `xor` be
scalar.
2025-08-11 16:35:21 +08:00
Florian Hahn
86813aa786
[VPlan] Add dedicated user for resume phi with epilogue vectorization.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.

This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
2025-08-10 21:21:16 +01:00
Florian Hahn
d9199a85e1
[LV] Add missing check lines for tests.
Add stray missing check lines for 2 tests.
2025-08-09 21:33:36 +01:00
Luke Lau
723de7f231 [LV][RISCV] Try fixing Windows buildbot failure in force-vect-msg.ll. NFC
The clang-x64-windows-msvc buildbot is failing after
707447159341f7b5678dee4f47731af50524b9ae due to this test failing:
https://lab.llvm.org/buildbot/#/builders/63/builds/8528

This is a stab in the dark, but my first thought is that it may be due
to the handling of floats with MSVC or something. So this removes the
floating point part of the check. I don't have access to a Windows
machine handy to debug this just yet, so pushing this to see if it can
quickly return the buildbot to green.
2025-08-09 21:09:36 +08:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Graham Hunter
de72cca671
[CostModel] Provide a default model for histogram intrinsics (#149348)
Since we scalarize these intrinsics when the target does not support
them, we should model that for costing purposes.
2025-08-08 11:00:00 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Luke Lau
7074471593
[RISCV] Enable tail folding by default (#151681)
We have been tracking the performance of EVL tail folding in the loop
vectorizer on RISC-V for a while now, and after much hard work from
various contributors we think it should be generally profitable to
enable by default now.

With tail folding there is a 21% improvement on 525.x264_r on SPEC CPU
2017 on the BPI-F3 (-march=rva22u64_v -O3 -flto), as well as a 30%
geomean codesize reduction on SPEC and TSVC, with no significant
regressions detected.

Now that we are early into the LLVM 22.x development cycle it seems like
a good time to enable it to catch any issues. There are still more EVL
related items of work being tracked in #123069, which should continue to
improve performance.
2025-08-08 14:26:23 +08:00
Luke Lau
0720af8c24 [LV][RISCV] Precommit RUN line changes from #151681. NFC
In preparation for enabling EVL tail folding by default.
2025-08-08 12:40:27 +08:00
Ramkumar Ramachandra
edeee824f0
Reland [VectorUtils] Trivially vectorize ldexp, [l]lround (#152476)
Changes: The original patch, landed as 1336675, was reverted due to a
bug in LoopVectorize resulting in a crash. The bug has now been fixed by
95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid
costs), and this reland is identical to the original patch.
2025-08-07 12:07:29 +01:00
Florian Hahn
47944d071f
[LV] Auto-generate checks for sve-low-trip-count.ll.
Auto-generate checks for
https://github.com/llvm/llvm-project/pull/151925. Also update some
naming to make more consistent with other tests.
2025-08-07 10:50:20 +01:00
Florian Hahn
95c32bf2d4
[VPlan] Return invalid cost if any skeleton block has invalid costs. (#151940)
We need to reject plans that contain recipes with invalid costs. LICM
can move recipes with invalid costs out of the loop region, which then
get missed by the main cost computation.

Extend the logic to check recipes for invalid cost currently only
covering the middle block to include all skeleton blocks.

Fixes https://github.com/llvm/llvm-project/issues/144358 
Fixes https://github.com/llvm/llvm-project/issues/151664

PR: https://github.com/llvm/llvm-project/pull/151940
2025-08-07 10:45:27 +01:00
Ties Stuij
b9e133d5b6
[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A320 (#152156)
With this new A320 in-order core, we follow adding the
FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520
(#132246), which reaps the same code generation benefits of preferring
fixed over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with, we get:
```
...
  	ldp	    q0, q1, [x12, #-16]
	ldp	    q2, q3, [x11, #-16]
	subs	x13, x13, #8
	fadd	v0.4s, v2.4s, v0.4s
	fadd	v1.4s, v3.4s, v1.4s
	add	    x11, x11, #32
	add	    x12, x12, #32
	stp	    q0, q1, [x10, #-16]
	add	    x10, x10, #32

...
```
2025-08-07 09:48:09 +01:00
Luke Lau
a04142f11f [LV][RISCV] Add check lines for scalable interleave costs. NFC
Previously we could only scalably vectorize interleave groups with
factor 2, but after 7ef77eb9984d1fb537a409cf4be89560fbb681fe we now
support all factors (available on RISC-V). So this adds the remaining
check lines for the scalable VFs.
2025-08-07 12:28:12 +08:00
Luke Lau
44af26ea2e [LV] Fix EVL test after merge. NFC
Test was modified in both 25d1285eecbab731eaf418c8aab44e4eb5f9e538 and
df8da2ff8370fda479b5c118704af4f50e0d3536
2025-08-07 11:12:43 +08:00
Luke Lau
df8da2ff83
[VPlan] Support VPWidenPointerInductionRecipes with EVL tail folding (#152110)
Now that VPWidenPointerInductionRecipes are modelled in VPlan in
#148274, we can support them in EVL tail folding.

We need to replace their VFxUF operand with EVL as the increment is not
guaranteed to always be VF on the penultimate iteration, and UF is
always 1 with EVL tail folding.

We also need to move the creation of the backedge value to the latch so
that EVL dominates it.

With this we will no longer fail to convert a VPlan to EVL tail folding,
so adjust tryAddExplicitVectorLength to account for this. This brings us
to 99.4% of all vector loops vectorized on SPEC CPU 2017 with tail
folding vs no tail folding.

The test in only-compute-cost-for-vplan-vfs.ll previously relied on
widened pointer inductions with EVL tail folding to end up in a scenario
with no vector VPlans, so this also replaces it with an unvectorizable
fixed-order recurrence test from
first-order-recurrence-multiply-recurrences.ll that also gets discarded.
2025-08-07 10:54:24 +08:00
Anna Thomas
59231115b0 [Loads] Precommit tests for #149551. NFC
Add these tests that currently require predicated loads due to variable
start SCEV.
2025-08-06 15:43:51 -04:00
Florian Hahn
25d1285eec
[VPlan] Replace single-entry VPPhis with their incoming values.
Replace trivial, single-entry VPPhis with their incoming values,
2025-08-06 20:03:31 +01:00
Florian Hahn
e80e7e717e
[VPlan] Use scalar VPPhi instead of VPWidenPHIRecipe in createPlainCFG. (#150847)
The initial VPlan closely reflects the original scalar loop, so unsing
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.

PR: https://github.com/llvm/llvm-project/pull/150847
2025-08-06 14:43:03 +01:00
Florian Hahn
d478502a42
[VPlan] Ensure that IV resume phi for epilogue is always first. (NFCI)
Update handling of canonical IV resume phi for the epilogue loop to make
sure the resume phi for the canonical IV is always the first phi in the
scalar preheader.

This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop.

For now, we keep an assert to make sure we use the same resume phi as
before. This will be removed in the future.
2025-08-05 21:06:41 +01:00
Florian Hahn
e3ededa0f1
[LV] Add tests with canonical widen IV, reductions in different order.
Add missing test coverage for re-using the resume value from the main
vector loop for the canonical IV in the epilogue.
2025-08-05 19:19:13 +01:00
Ramkumar Ramachandra
f03345a07a
[LV] Improve a test; get rid of runtime checks (#152182) 2025-08-05 18:48:10 +01:00
Ramkumar Ramachandra
5dfc2d4535
[LV] Regen some tests with UTC (#152128) 2025-08-05 18:01:02 +01:00
Luke Lau
94a6cd464e
[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.

There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.

Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
2025-08-05 16:54:02 +08:00
Florian Hahn
c9dd14d1d4
[VPlan] Compute interleave count for VPlan. (#149702)
Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some
information directly from VPlan. Register pressure was already computed
for a VPlan, and with this patch we now also check for reductions
directly on VPlan, as well as checking how many load and store
operations remain in the loop.

This should be mostly NFC, but we may compute slightly different
interleave counts, except for some edge cases, e.g. where dead loads
have been removed. This shouldn't happen in practice, and the patch
doesn't cause changes across a large test corpus on AArch64.

Computing the interleave count based on VPlan allows for making better
decisions in presence of VPlan optimizations, for example when
operations on interleave groups are narrowed.

Note that there are a few test changes for tests that were still
checking the legacy cost-model output when it was computed in
selectInterleaveCount.

PR: https://github.com/llvm/llvm-project/pull/149702
2025-08-05 09:42:55 +01:00
Mel Chen
76d98cfcc4
[RISCV][TTI] Enable masked interleave access (#151665)
Now that support for masked loads/stores of interleave groups has
landed, we can enable the loop vectorizer to generate masked interleave
access where applicable.

This improves vectorization in several ways:
* Internal predication support: This enables interleave group
vectorization for loops with internal control flow predication, provided
all members of the group share the same predicate. Gaps in interleave
groups are still not efficiently handled by masking, so masking for gaps
remains disabled for now.
* Tail folding: This allows tail folding of loops with interleave groups
by using masking. Without this, vectorized loops with interleaves would
fall back to using separate gather/scatter accesses, which can be
significantly less efficient.

"[RISCV][TTI] Enable masked interleave access for scalable vector
(#149981)" was reverted by 5294793bdcf6ca142f7a0df897638bd4e85ed1a7 due
to triggering an assertion. The issue has been addressed in the patch
"[LV] Fix gap mask requirement for interleaved access (#151105)". On the
other hand, this patch also enable fixed-length masked interleave access
(#150624) since support for fixed-length has also been landed
992118cb4deab139ae384bb85f03225a9a21b008.

---------

Co-authored-by: Philip Reames <preames@rivosinc.com>
2025-08-05 16:08:13 +08:00
Mel Chen
8761b6cf8f
[VPlan] Use VPTypeAnalysis to get the step type of widen pointer induction (#147925)
This patch uses VPTypeAnalysis to determine its type since the induction
step is not always a live-in value in the VPlan and may be defined by a
recipe.
2025-08-05 09:13:44 +08:00
Ramkumar Ramachandra
9d151897ba
[LV] Improve a test, regen with UTC (#151947) 2025-08-04 18:41:47 +01:00
Florian Hahn
66a8341f6d
[VPlan] Skip disconnected exit blocks in hasEarlyExit. (#151718)
Currently hasEarlyExit returns true, if there are multiple exit blocks.
ExitBlocks contains the wrapped original IR exit blocks. Without
checking the predecessors we incorrectly return true for loops with
multiple countable exits, that have been vectorized by requiring a
scalar epilogue. In that case, the exit blocks will get disconnected.

Fix this by filtering out disconnected exit blocks.

Currently this should only impact the 'early-exit vectorized' statistic.

PR: https://github.com/llvm/llvm-project/pull/151718
2025-08-04 11:31:00 +01:00
Ramkumar Ramachandra
8a55d46ebc
[LV] Add missing CHECK lines for a test from UTC (#151871) 2025-08-04 09:42:59 +01:00
Florian Hahn
559d1dff89
[VPlan] Materialize BackedgeTakenCount using VPInstructions.
Explicitly compute the backedge-taken count using VPInstruction. This is
needed to model the full skeleton in VPlan.

NFC modulo some instruction re-ordering.
2025-08-03 12:21:28 +01:00
Florian Hahn
39c30665e9
[VPlan] Update type of cloned instruction in scalarizeInstruction.
The operands of the replicate recipe may have been narrowed, resulting
in a narrower result type. Update the type of the cloned instruction to
the correct type.

Fixes https://github.com/llvm/llvm-project/issues/151392.
2025-08-02 19:49:59 +01:00
Florian Hahn
08f50e9665
[VPlan] Use vector tripcount if computable when simplifying conds. (#151034)
Update isConditionTrueViaVFAndUF to use the vector trip count if
computable. This is the case when it has been materialized to a
constant. Otherwise fall back to the trip count.

PR: https://github.com/llvm/llvm-project/pull/151034
2025-08-02 16:31:31 +01:00
Florian Hahn
eee9755881
[LV] Refine check to find epilogue IV resume value.
Make sure to check that the vector trip count is containedin the list of
incoming values to serve as tie-breaker with phis with all-zero incoming
values.

Fixes https://github.com/llvm/llvm-project/issues/151686.
2025-08-01 20:54:39 +01:00
Ramkumar Ramachandra
78c57c9b02
[LV] Fix missing REQUIRES: asserts in test (#151737)
e7200c7 ([LV] Pre-commit test for #151664) forgot to require asserts in
the test, and stripped a CHECK line in error. Fix this.
2025-08-01 18:59:20 +01:00
Ramkumar Ramachandra
e7200c734d
[LV] Pre-commit test for #151664 (#151671)
Hoisted vector instructions are costed incorrectly.
2025-08-01 17:09:11 +01:00
Florian Hahn
d204fdcc23
[LV] Add countable multi-exit test to vec.stats.ll
Currently the multi-exit loops with all countable exits are considered
vectorized with early exit, while that terminology is used for loops we
performed early-exit vectorization, instead of requring a scalar
epilogue.
2025-08-01 16:39:28 +01:00
Florian Hahn
2ae996cbbe
[LAA] Support assumptions in evaluatePtrAddRecAtMaxBTCWillNotWrap (#147047)
This patch extends the logic added in
https://github.com/llvm/llvm-project/pull/128061 to support
dereferenceability information from assumptions as well.

Unfortunately both assumption cache and the dominator tree need to be
threaded through multiple layers to make them available where needed.

PR: https://github.com/llvm/llvm-project/pull/147047
2025-08-01 14:18:07 +01:00