3239 Commits

Author SHA1 Message Date
Florian Hahn
c93d166c58
[VPlan] Simplify (MUL %x, 0) -> 0.
Simplify trivial multiplies.
https://alive2.llvm.org/ce/z/DabRkA
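
A minimal sketch of the fold: when the right-hand operand of a multiply is the constant 0, the product is 0 whatever %x is. It uses a toy value type, and `Value` and `simplifyMul` are hypothetical names, not VPlan's API. Only the `(MUL %x, 0)` form from the subject line is matched. Whether VPlan also handles the commuted form is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical toy IR value: either a known integer constant or an opaque
// value such as a function argument %x.
struct Value {
  std::optional<int64_t> Const; // engaged only for known constants
};

// Sketch of the peephole (MUL %x, 0) -> 0. Returns the simplified value, or
// std::nullopt if the fold does not apply.
std::optional<Value> simplifyMul(const Value & /*X*/, const Value &Y) {
  if (Y.Const && *Y.Const == 0)
    return Value{0}; // x * 0 == 0 for any integer x
  return std::nullopt;
}
```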
2025-07-28 21:50:57 +01:00
Florian Hahn
ccc96e6484
[LV] Add tests where vector trip count is known equal to VFxUF.
Add additional tests to cover the case where the trip count isn't equal
to VFxUF, but the vector trip count is.
2025-07-28 21:11:51 +01:00
Luke Lau
5f2092dae3
[RISCV][LV] Update f16/bf16 loop vectorizer tests. NFC
This fixes a failing test after the changes in #150908 affected the
result in #150882.
2025-07-28 23:19:03 +08:00
Luke Lau
fe4f6c1a58
[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)
When vectorizing with predication, some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized, because
the masked load/store or gather/scatter cost query returns an invalid cost.

This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.

But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.

For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.

We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
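
A sketch of the distinction drawn above, using a simplified feature model. `VecFeatures`, `isLegalElementForArith` and `isLegalElementForMemory` are hypothetical and are not the RISC-V TTI functions. The model treats arithmetic on f16/bf16 elements as requiring Zvfhmin/Zvfbfmin. A load or store only moves 16-bit elements, so it treats f16/bf16 like i16. The ELEN check stands in for the i64-on-Zve32x case the message mentions as possible future work.

```cpp
#include <cassert>

enum class ElemKind { I8, I16, I32, I64, F16, BF16 };

// Hypothetical subset of vector features relevant here.
struct VecFeatures {
  unsigned ELEN;    // max element width in bits (32 for Zve32x, 64 for V)
  bool HasZvfhmin;  // f16 vector conversion support
  bool HasZvfbfmin; // bf16 vector conversion support
};

unsigned bitWidth(ElemKind K) {
  switch (K) {
  case ElemKind::I8:
    return 8;
  case ElemKind::I16:
  case ElemKind::F16:
  case ElemKind::BF16:
    return 16;
  case ElemKind::I32:
    return 32;
  case ElemKind::I64:
    return 64;
  }
  return 0;
}

// Legality when the elements are operated on (arithmetic, conversions).
bool isLegalElementForArith(ElemKind K, const VecFeatures &F) {
  if (bitWidth(K) > F.ELEN)
    return false;
  if (K == ElemKind::F16)
    return F.HasZvfhmin;
  if (K == ElemKind::BF16)
    return F.HasZvfbfmin;
  return true;
}

// Legality for loads/stores: f16/bf16 only need 16-bit elements, so the
// extensions are not required; other types defer to the arithmetic check.
bool isLegalElementForMemory(ElemKind K, const VecFeatures &F) {
  if (K == ElemKind::F16 || K == ElemKind::BF16)
    return bitWidth(K) <= F.ELEN;
  return isLegalElementForArith(K, F);
}
```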
2025-07-28 22:59:49 +08:00
Luke Lau
92d09245d6
[VPlan] Fall back to scalar epilogue if possible when EVL isn't legal (#150908)
When enabling predicated vectorization by default on RISC-V, there's a
bunch of performance regressions on llvm-test-suite's LoopInterleaving
microbenchmarks:
https://lnt.lukelau.me/db_default/v4/nts/788?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=791&baseline=730&submit=Update

Most of these regressions stem from the interleave_count pragma, which
makes EVL tail folding unsupported, since we don't support unrolling
(interleaving) with EVL.

Currently if DataWithEVL isn't legal we fall back to DataWithoutLaneMask
as the tail folding style, but this is very slow on RISC-V.

The order of performance roughly is something like:

DataWithEVL > None (scalar-epilogue) > Data[WithoutLaneMask]

So this patch tries to prevent the regressions by falling back to a
scalar epilogue where possible, i.e. the existing vectorization we have
today. Note that we may still need to fall back to DataWithoutLaneMask,
e.g. if the trip count is low, or if it's forced by
-prefer-predicate-over-epilogue=predicate-dont-vectorize.
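
The fallback order above can be sketched as a small decision function. The enum mirrors the tail folding styles named in the message. `FoldingQuery` and `chooseStyle` are hypothetical simplifications, not the actual LoopVectorizationCostModel logic.

```cpp
#include <cassert>

enum class TailFoldingStyle { None, Data, DataWithoutLaneMask, DataWithEVL };

// Hypothetical inputs to the decision.
struct FoldingQuery {
  bool EVLLegal;              // e.g. false when an interleave count > 1 is requested
  bool ScalarEpilogueAllowed; // e.g. false for low trip counts or when forced by
                              // -prefer-predicate-over-epilogue=predicate-dont-vectorize
};

// Prefer EVL; otherwise prefer a scalar epilogue (no tail folding) over
// DataWithoutLaneMask, which is the slowest option on RISC-V.
TailFoldingStyle chooseStyle(const FoldingQuery &Q) {
  if (Q.EVLLegal)
    return TailFoldingStyle::DataWithEVL;
  if (Q.ScalarEpilogueAllowed)
    return TailFoldingStyle::None;
  return TailFoldingStyle::DataWithoutLaneMask;
}
```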
2025-07-28 20:10:36 +08:00
Florian Hahn
2f2df751d4
[LV] Use SCEV::getElementCount in selectEpilogueVectorizationFactor. (#150018)
Follow-up to https://github.com/llvm/llvm-project/pull/149789 to use
getElementCount to compute the remaining iterations in
selectEpilogueVectorizationFactor.

PR: https://github.com/llvm/llvm-project/pull/150018
2025-07-28 12:12:27 +01:00
Luke Lau
d4f9c11e06
[RISCV][LV] Use predicate-else-scalar-epilogue flag in tail folding tests. NFC
Align the tests closer with what we eventually intend to enable by
default on RISC-V by using
-prefer-predicate-over-epilogue=predicate-else-scalar-epilogue, instead
of dropping vectorization entirely with predicate-dont-vectorize.

Also adjust the non-EVL run lines so that they use
-prefer-predicate-over-epilogue=scalar-epilogue instead of
-force-tail-folding-style=none, so we're only testing one type of
flag instead of a combination of two.
2025-07-28 17:04:00 +08:00
Luke Lau
ddb12c10a9
[RISCV][LV] Remove redundant -force-tail-folding-style from tests. NFC
This isn't needed after we set the tail folding style to data-with-evl
via TTI in #148686.  Also rename the tests to reflect the fact they're
no longer forcing the tail folding style.
2025-07-28 16:11:01 +08:00
Florian Hahn
89ae085859
[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735)
VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.

PR: https://github.com/llvm/llvm-project/pull/149735
2025-07-27 13:53:26 +01:00
Florian Hahn
39b825e669
[LV] Add test for miscompile with conditional store.
Add test case for https://github.com/llvm/llvm-project/issues/149347.
2025-07-27 13:43:43 +01:00
Florian Hahn
80c43b6c07
[VPlan] Add ExtractLane VPInst to extract across multiple parts. (#148817)
This patch adds a new ExtractLane VPInstruction which extracts across
multiple parts using a wide index, to be used in combination with
FirstActiveLane.

The patch updates early-exit codegen to use it instead of ExtractElement,
which is only per-part. With this change, interleaving should work
correctly with early-exit loops.

The patch removes the restrictions added in 6f43754e9 (#145877), but
does not yet automatically select interleave counts > 1 for early-exit
loops.

I'll share a patch as follow-up. The cost of extracting a lane adds
non-trivial overhead in the exit block, so that should be considered
when picking the interleave count.

PR: https://github.com/llvm/llvm-project/pull/148817
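
A sketch of the index arithmetic behind extracting across parts. It assumes the UF unrolled parts each hold VF lanes, laid out in order. A wide lane index, such as one produced by FirstActiveLane, then selects part Idx / VF and lane Idx % VF. `extractLane` and `firstActiveLane` are hypothetical helpers on plain arrays, not VPlan recipes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parts[P][L] is lane L of unrolled part P; every part has VF lanes.
int extractLane(const std::vector<std::vector<int>> &Parts, size_t WideIdx) {
  assert(!Parts.empty());
  size_t VF = Parts[0].size();
  assert(VF > 0 && WideIdx < VF * Parts.size());
  return Parts[WideIdx / VF][WideIdx % VF];
}

// Wide index of the first active lane across all parts, scanning parts in
// order. Returns VF * UF if no lane is active.
size_t firstActiveLane(const std::vector<std::vector<bool>> &Masks) {
  size_t Idx = 0;
  for (const auto &M : Masks)
    for (bool B : M) {
      if (B)
        return Idx;
      ++Idx;
    }
  return Idx;
}
```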
2025-07-27 08:08:25 +01:00
Florian Hahn
82e4b83328
[VPlan] Use terminator debug loc for latch BranchOnCond.
Update VPlan to consistently use the latch branch debug location for the
latch branch in the vector loop, if there is one.
2025-07-26 21:45:25 +01:00
Florian Hahn
fa3ec0c17c
[VPlan] Materialize constant vector trip counts before final opts. (#142309)
Materialize constant vector trip counts before ::execute, if the vector
trip count can be computed as (TC / (VF * UF)) * (VF * UF), where TC is
the original trip count. For now this excludes cases where the tail is
folded or a scalar epilogue is required.

This enables removing a number of redundant branches from the middle
block.

For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.

PR: https://github.com/llvm/llvm-project/pull/142309
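
The arithmetic as a small sketch, with hypothetical helper names. Given a constant trip count TC, the vector trip count is TC rounded down to a multiple of VF * UF. When it equals TC, no iterations remain for the scalar loop, so the middle block's branch becomes redundant.

```cpp
#include <cassert>
#include <cstdint>

// Vector trip count for a constant trip count, without tail folding.
uint64_t vectorTripCount(uint64_t TC, uint64_t VF, uint64_t UF) {
  uint64_t Step = VF * UF;
  assert(Step > 0);
  return (TC / Step) * Step;
}

// True when the vector loop covers every iteration, so the middle block
// does not need to branch to the scalar remainder.
bool middleBlockBranchIsTrivial(uint64_t TC, uint64_t VF, uint64_t UF) {
  return vectorTripCount(TC, VF, UF) == TC;
}
```

With TC = 17, VF = 4 and UF = 4, the vector trip count is 16 = VF x UF even though TC is not, which is the situation the tests in ccc96e6484 cover.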
2025-07-26 17:16:36 +01:00
Florian Hahn
662bede01e
[LV] Handle known-false mem runtime checks in GeneratedRTChecks.
Handle mem checks known to be false in getMemRuntimeChecks the same way
as SCEV checks known to be false in getSCEVChecks. This ensures such
redundant check blocks are not added in the first place.
2025-07-26 15:39:21 +01:00
Florian Hahn
9e7782db73
[LV,LAA] Add tests where RT checks are known false after expansion.
2025-07-26 14:17:35 +01:00
Florian Hahn
e5f5813042
[LV] Update some tests to have variable trip counts. (NFC)
Update tests that have first-order recurrences, for which checking both
the scalar resume and exit values is interesting, to have variable trip
counts. This avoids the branch in the middle.block being folded away by
https://github.com/llvm/llvm-project/pull/142309.

For similar reasons, also update check-prof-info.ll.
2025-07-26 09:59:06 +01:00
Florian Hahn
9a201531ed
[LV] Bail out early if runtime checks are known to fail.
There are a number of cases for which SCEV may not be able to prove a
predicate will always be true/false, which may be simplified to a
constant during expansion (see discussion in
https://github.com/llvm/llvm-project/pull/131538).

Bail out early if runtime checks are known to always fail, as the
vector loop generated later will never execute.
2025-07-26 09:26:15 +01:00
Florian Hahn
445006d3a9
[LV] Add test for re-using existing phi for SCEV Add.
Add another test case for
https://github.com/llvm/llvm-project/pull/147824, where the difference
between an existing phi and the target SCEV is an add of a constant.
2025-07-25 21:08:39 +01:00
Alex Bradbury
5294793bdc
Revert "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)"
This reverts commit ee3a7714b7a69ac9aae4b79f4c67adc38bc6876b.

Causes an assertion for the zvl1024b RISC-V build configuration. See
comment with reproducer at
<https://github.com/llvm/llvm-project/pull/149981#issuecomment-3118482801>
2025-07-25 16:14:10 +01:00
Florian Hahn
e21ee41be4
[SCEV] Try to re-use pointer LCSSA phis when expanding SCEVs. (#147824)
Generalize the code added in
https://github.com/llvm/llvm-project/pull/147214 to also support
re-using pointer LCSSA phis when expanding SCEVs with AddRecs.

A common source of integer AddRecs with pointer bases is the runtime
checks emitted by LV based on the distance between 2 pointer AddRecs.

This improves codegen in some cases when vectorizing and prevents
regressions with https://github.com/llvm/llvm-project/pull/142309, which
turns some phis into single-entry ones, which SCEV will look through
now (and expand the whole AddRec), whereas before it would have to treat
the LCSSA phi as SCEVUnknown.

Compile-time impact is neutral:
https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/147824
2025-07-25 15:29:40 +01:00
Mel Chen
ee3a7714b7
[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)
Now that support for masked loads/stores of interleave groups has
landed, we can enable the loop vectorizer to generate masked interleave
access where applicable.

This improves vectorization in several ways:
* Internal predication support: This enables interleave group
vectorization for loops with internal control flow predication, provided
all members of the group share the same predicate. Gaps in interleave
groups are still not efficiently handled by masking, so masking for gaps
remains disabled for now.
* Tail folding: This allows tail folding of loops with interleave groups
by using masking. Without this, vectorized loops with interleaves would
fall back to using separate gather/scatter accesses, which can be
significantly less efficient.
* Scalable vector support: Currently, only scalable vector types are
supported for masked interleave lowering. Fixed-length vector support
will be enabled in the future.

As interleave access is not yet supported with tail folding by EVL, that
functionality is temporarily disabled. We are going to create another
patch to support it.
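
When all members of an interleave group share one predicate, the mask for the wide interleaved access can be formed by repeating each lane's predicate once per group member. Below is a sketch of that replication for an interleave factor `Factor`. `replicateMask` is a hypothetical helper on plain booleans, not the LLVM implementation. Gaps would additionally need their lanes masked off, which the commit leaves disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the per-iteration mask (one bool per vector lane) and the interleave
// factor, build the mask for the wide interleaved load/store.
// Element I * Factor + M belongs to iteration I, group member M.
std::vector<bool> replicateMask(const std::vector<bool> &LaneMask,
                                size_t Factor) {
  std::vector<bool> Wide;
  Wide.reserve(LaneMask.size() * Factor);
  for (bool B : LaneMask)
    for (size_t M = 0; M < Factor; ++M)
      Wide.push_back(B);
  return Wide;
}
```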

Co-authored-by: Philip Reames <preames@rivosinc.com>
2025-07-25 17:53:08 +08:00
Florian Hahn
6d004d2e5b
[LV] Add additional SCEV expansion tests for #147824.
Add additional test coverage for
https://github.com/llvm/llvm-project/pull/147824.
2025-07-25 10:23:56 +01:00
Luke Lau
feb77c0fea
[VPlan] Handle VPWidenSelectRecipe in tryToFoldLiveIns (#150357)
This helps simplify VPBlendRecipes that are expanded to selects in
another patch.
2025-07-25 09:46:19 +08:00
Luke Lau
9563e7a940
[VPlan] Mark VPInstruction::ExplicitVectorLength as single scalar. NFC (#150221)
This allows it to be broadcast without an explicit
VPInstruction::Broadcast in #150202.
2025-07-23 22:38:21 +08:00
Florian Hahn
77b1b956da
[LV] Also clamp MaxVF by trip count when maximizing vector bandwidth. (#149794)
Also clamp the max VF when maximizing vector bandwidth by the maximum
trip count. Otherwise we may end up choosing a VF for which the vector
loop never executes.

PR: https://github.com/llvm/llvm-project/pull/149794
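
A sketch of the clamping, assuming power-of-two VFs and a known maximum trip count. The largest useful VF is then the largest power of two not exceeding that maximum, because a wider VF would produce a vector loop that never runs. `clampMaxVF` and `largestPowerOf2AtMost` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that is <= N (requires N >= 1).
uint64_t largestPowerOf2AtMost(uint64_t N) {
  assert(N >= 1);
  uint64_t P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// Clamp a candidate maximum VF by the maximum trip count.
// MaxTripCount == 0 means "unknown" and leaves the VF unchanged.
uint64_t clampMaxVF(uint64_t MaxVF, uint64_t MaxTripCount) {
  if (MaxTripCount == 0 || MaxTripCount >= MaxVF)
    return MaxVF;
  return largestPowerOf2AtMost(MaxTripCount);
}
```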
2025-07-23 10:19:56 +01:00
Luke Lau
20c52e4231
Reapply "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686)"
This reverts commit 25e97fc420f8ecc43fbabadfe9767b4163e6ee36.

The original commit was reverted due to a crash in llvm-test-suite. The
crash stemmed from a multiply reduction, which isn't supported for
scalable VFs on RISC-V. But for EVL tail folding we only support
scalable VFs, so when -force-tail-folding-style=data-with-evl is
specified we check to see if there's a scalable VF, and fall back to
data-without-lane-mask if there isn't.

This is done in setTailFoldingStyles, but previously we were only
checking if the forced tail folding style was legal, not the style
returned by TTI.

This version fixes this by checking the actual computed tail folding
style and not just the forced one, and adds a test for the crash in
llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll
2025-07-22 23:52:02 +08:00
Luke Lau
25e97fc420
Revert "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686)"
This reverts commit 38318dd05615a2f38abdeeae99e7423165308902.

The clang-riscv-gauntlet buildbot is breaking with this commit:
https://lab.llvm.org/buildbot/#/builders/210/builds/371
2025-07-22 22:54:26 +08:00
Luke Lau
6e723d2de8
[VPlan] Remove loop region in simplifyBranchConditionForVFAndUF with EVL PHI (#150016)
Previously we fell back to just simplifying the branch condition to true,
since one of the phis was a VPEVLBasedIVPHIRecipe. However, that phi can
also be replaced with its start value.
2025-07-22 22:30:34 +08:00
Luke Lau
38318dd056
[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686)
In preparation for eventually making EVL tail folding the default, this
patch sets DataWithEVL as the preferred tail folding style for RISC-V,
but doesn't enable tail folding by default.

Although tail folding isn't enabled by default, the loop vectorizer
will still tail fold loops with a small trip count, so this will
cause some EVL vectorized loops to be generated in the default
configuration.

The EVL tail folding work is still not complete, e.g. we still need to
handle interleave groups etc., see #123069, but a lot of these missing
features also apply to the data (masked) tail folding strategy, which is
the default anyway.

The overall performance picture is much better: on TSVC, EVL tail
folding is faster than data on every benchmark on the spacemit-x60[^1]:
https://lnt.lukelau.me/db_default/v4/nts/755?compare_to=756
And on SPEC CPU 2017 we see a geomean improvement[^2]:
https://lnt.lukelau.me/db_default/v4/nts/751?compare_to=753

This is likely due to masked instructions generally being less
performant on the spacemit-x60, up to twice as slow:
https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html
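
To illustrate why EVL tail folding needs no per-lane mask, here is a conceptual scalar model. It is plain C++, not RVV intrinsics or LLVM code. Each vector iteration processes EVL = min(remaining, VF) elements, so the last iteration simply handles fewer elements. The min() rule is an assumption of this sketch. Real hardware may choose other valid lengths, as vsetvli can. `addWithEVL` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Adds B to A element-wise, processing up to VF elements per "vector"
// iteration with an explicit vector length instead of a lane mask.
// Returns the number of vector iterations executed.
size_t addWithEVL(std::vector<int> &A, const std::vector<int> &B, size_t VF) {
  assert(VF > 0 && A.size() == B.size());
  size_t Iterations = 0;
  for (size_t I = 0; I < A.size();) {
    size_t EVL = std::min(A.size() - I, VF); // elements this iteration
    for (size_t L = 0; L < EVL; ++L)         // one "vector" op over EVL lanes
      A[I + L] += B[I + L];
    I += EVL;
    ++Iterations;
  }
  return Iterations;
}
```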

[^1]: These benchmarks don't exactly give the same performance numbers
as this patch, but it's a good indicator that EVL tail folding is
generally faster than masked tail folding.
[^2]: The large code size increase in 505.mcf_r is due to a function
being inlined now.
2025-07-22 21:02:59 +08:00
Florian Hahn
37f0f10a85
[LV] Don't vectorize epilogue with scalable VF if no iterations remain. (#149789)
Currently we may try to vectorize the epilogue with a scalable VF, even
if there are no remaining iterations after the main vector loop with a
fixed VF.

Update selectEpilogueVectorizationFactor to always compute the number of
remaining iterations and exit early if no epilogue iterations remain.

Fixes https://github.com/llvm/llvm-project/issues/149726

PR: https://github.com/llvm/llvm-project/pull/149789
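
A sketch of the early exit, assuming a known trip count and a fixed main-loop VF and UF. The epilogue only has TC mod (VF * UF) iterations left. When that is zero, no epilogue VF should be chosen, whether fixed or scalable. `EpilogueVF`, `selectEpilogueVF` and the first-fitting-candidate policy are hypothetical. Scalable candidates are compared by their known minimum lane count only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Candidate epilogue VF: KnownMin lanes, multiplied by vscale if Scalable.
struct EpilogueVF {
  uint64_t KnownMin;
  bool Scalable;
};

// Picks the first candidate (in order of preference) whose minimum lane count
// fits in the remaining iterations. Returns nullptr if none fits, in
// particular when no iterations remain after the main vector loop.
const EpilogueVF *selectEpilogueVF(const std::vector<EpilogueVF> &Candidates,
                                   uint64_t TC, uint64_t MainVF, uint64_t UF) {
  assert(MainVF * UF > 0);
  uint64_t Remaining = TC % (MainVF * UF);
  if (Remaining == 0)
    return nullptr; // nothing left for an epilogue to do
  for (const EpilogueVF &C : Candidates)
    if (C.KnownMin <= Remaining)
      return &C;
  return nullptr;
}
```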
2025-07-22 13:13:31 +01:00
Luke Lau
cb8b0cd2cf
[LV] Precommit test changes for #148686. NFC
Namely, explicitly add -force-tail-folding-style=data to existing RUN
lines so that we don't lose them when we switch to data-with-evl by
default.
2025-07-22 16:16:43 +08:00
Mel Chen
d2a7f4e528
[NFC][LV] Refine the lit test case riscv-vector-reverse.ll (#149020)
This patch includes the following changes:
1. Merge riscv-vector-reverse-output.ll into riscv-vector-reverse.ll,
and only check the generated LLVM IR.
2. Add vplan-riscv-vector-reverse.ll to preserve the original debug
output checks from riscv-vector-reverse.ll.
2025-07-22 14:56:14 +08:00
Mel Chen
6f240d5a7d
[LV][EVL] Remove interleave count from the test case for EVL tail-folding. nfc (#149834)
Remove the interleave count, since interleaving is not yet supported
with EVL tail-folding.
2025-07-22 08:59:53 +08:00
Florian Hahn
3fd53db858
[VPlan] Remove unneeded VPVectorPointer after narrowing to replicate.
The replicate recipes created when narrowing interleave groups don't
need a VPVectorPointer, they can simply use the existing pointer.
2025-07-19 20:18:04 +01:00
Florian Hahn
004c67ea25
[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop for whether any inputs to
maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t. signaling
NaNs. Signed zeros are already handled consistently by maxnum/minnum.

If any input is NaN,
 * exit the vector loop,
 * compute the reduction result up to the vector iteration that contained
   NaN inputs, and
 * resume in the scalar loop.

New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.

PR: https://github.com/llvm/llvm-project/pull/148239
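
A scalar model of the strategy, assuming VF-element chunks and quiet NaNs. Each vector iteration first checks its inputs for NaN. If it finds one, the vector loop exits with a result covering only the completed chunks, and the scalar loop resumes at the start of that chunk. `std::fmax` stands in for maxnum, since both return the other operand when exactly one operand is a quiet NaN. `maxnumReduce` is a hypothetical sketch, not the generated IR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Max-reduction of A starting from Init, modelled as a vector loop over
// VF-element chunks followed by a scalar loop. ResumeIdx receives the index
// at which the scalar loop started.
double maxnumReduce(const std::vector<double> &A, double Init, size_t VF,
                    size_t &ResumeIdx) {
  assert(VF > 0);
  double Acc = Init;
  size_t I = 0;
  // "Vector" loop: full chunks only; bail out before a chunk with a NaN.
  for (; I + VF <= A.size(); I += VF) {
    bool HasNaN = false;
    for (size_t L = 0; L < VF; ++L)
      if (std::isnan(A[I + L]))
        HasNaN = true;
    if (HasNaN)
      break; // Acc holds the result of the previous chunks
    for (size_t L = 0; L < VF; ++L)
      Acc = std::fmax(Acc, A[I + L]);
  }
  ResumeIdx = I;
  // Scalar loop: handles the NaN-containing chunk onwards and the tail.
  for (; I < A.size(); ++I)
    Acc = std::fmax(Acc, A[I]);
  return Acc;
}
```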
2025-07-18 21:58:19 +01:00
Nicholas Guy
b5e3fffd20
[LoopVectorizer][NFC] Require asserts on maxbandwidth-regpressure.ll (#149484)
Fix for buildbot failure:
https://lab.llvm.org/buildbot/#/builders/11/builds/19837
2025-07-18 10:21:21 +01:00
Nicholas Guy
20fc297ce3
[LoopVectorizer] Only check register pressure for VFs that have been enabled via maxBandwidth (#149056)
Currently if MaxBandwidth is enabled, the register pressure is checked
for each VF. This changes that to only perform said check if the VF
would not have otherwise been considered by the LoopVectorizer if
maxBandwidth was not enabled.

Theoretically this allows for higher VFs to be considered than would
otherwise be deemed "safe" (from a regpressure perspective), but more
concretely this reduces the amount of work done at compile-time when
maxBandwidth is enabled.
2025-07-18 09:21:20 +01:00
Florian Hahn
46357438ba
[SCEV] Try to re-use existing LCSSA phis when expanding SCEVAddRecExpr. (#147214)
If an AddRec is expanded outside a loop with a single exit block, check
if any of the (lcssa) phi nodes in the exit block match the AddRec. If
that's the case, simply use the existing lcssa phi.

This can reduce the number of instructions created for SCEV expansions,
mainly for runtime checks generated by the loop vectorizer.

Compile-time impact should be mostly neutral:

https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/147214
2025-07-17 15:47:54 +01:00
Florian Hahn
afe8150780
[VPlan] Simplify exit user handling by generating all extracts first (NFCI)
Simplify the handling of exit users by generating all extracts first
(the safe option), and have FOR handling optimize the extracts, similar to
what is already done for reductions and inductions.

NFC modulo first-order recurrence extract order in middle block.
2025-07-16 08:14:12 +01:00
Florian Hahn
cfdd5ca2ed
[LV] Add tests for fmin reductions without fast-math flags.
Some of those reductions can be vectorized with extra checks.

Extra tests for https://github.com/llvm/llvm-project/pull/148239 and
follow-ups.
2025-07-15 13:34:12 +01:00
David Sherwood
c363a3f9c8
[LV] Ensure getScaledReductions only matches extends inside the loop (#148264)
In getScaledReductions for the case where we try to match a partial
reduction of the form:

%phi = phi i32 ...
...
%add = add i32 %phi, %zext

where

%zext = zext i8 %some_val to i32

we should ensure that %zext is actually inside the loop.

Fixes https://github.com/llvm/llvm-project/issues/148260
2025-07-15 09:54:58 +01:00
Luke Lau
c8d0e24745
[VPlan] Preserve trunc nuw/nsw in VPRecipeWithIRFlags (#144700)
This preserves the nuw/nsw flags on widened truncs by checking for
TruncInst in the VPIRFlags constructor.

The motivation for this is to be able to fold away some redundant truncs
feeding into uitofps (or potentially narrow the inductions feeding them).
2025-07-15 15:34:14 +08:00
Florian Hahn
5a4586f468
Reapply "[LAA] Remove loop-invariant check added in 234cc40adc61."
This reverts commit d43a80936d437d217d5a6dbbaa5fb131c27e7085.

With the correctness issue blocking the recommit finally fixed
(5d01697ec6cb), again unconditionally check if accesses are completely
before or after each other.
2025-07-14 21:21:22 +01:00
Luke Lau
df387661c4
[RISCV] Remove -riscv-v-vector-bits-min from LoopVectorize tests. NFC (#148565)
If I understand correctly, there was a point where we needed this
because it wasn't yet implied by Zvl*b.

Now that it is, though, and we use -mattr=+v in pretty much every test,
we can remove it.

In unroll-in-loop-vectorizer.ll we can force a VF of 1 instead by using
-force-vector-width=1, and in scalable-basics.ll the two RUN lines were
the same so I merged them.
2025-07-14 21:59:35 +08:00
Florian Hahn
cad62df49a
[Loads] Support dereferenceable assumption with variable size. (#128436)
Update isDereferenceableAndAlignedPointer to make use of dereferenceable
assumptions with variable sizes via SCEV.

To do so, factor out the logic to check via an assumption to a helper,
and use SE to check if the access size is less than the dereferenceable
size.

PR: https://github.com/llvm/llvm-project/pull/128436
2025-07-14 08:17:33 +01:00
Florian Hahn
f4c7cc26b6
[LV] Use more precise isPredicatedInst in legacy CCH (NFC).
Legal::isMaskRequired may be overly conservative and also return true
when no mask is actually required.

Use isPredicatedInst from the cost model instead, which fixes a
cost-model divergence between legacy and VPlan cost model where the
legacy cost model incorrectly assumed some loads were predicated.

Fixes https://github.com/llvm/llvm-project/issues/148431.
2025-07-13 19:55:34 +01:00
Florian Hahn
cc65da0fb1
[LV] Update fmax tests to include ogt/olt/ole/ugt predicates.
Adjust and update tests as per feedback in
https://github.com/llvm/llvm-project/pull/146711.
2025-07-13 12:16:54 +01:00
Anna Thomas
fe403584c4
[LV] Add a statistic for early exit vectorization
Add statistic LoopsEarlyExitVectorized.

PR: https://github.com/llvm/llvm-project/pull/145730
2025-07-11 09:10:26 -04:00
David Sherwood
74e3dfe389
[LV] Disable forcing interleaving for uncountable early exit loops (#147993)
Interleaving does not currently work properly when vectorising loops
with uncountable early exits. Interleaving is already disabled for
normal vectorisation and for the pragma/hint; this patch also disables
it when using -force-vector-interleave.
2025-07-11 09:46:21 +01:00
Florian Hahn
c452de1715
Reapply "[VPlan] Allow derived IVs and scalar-steps in narrowing interleave."
This reverts commit f5ed863176dd286462cd5558723dfe445967fedf.

Recommit patch now that the crash exposed by the change has been fixed.
2025-07-10 20:48:19 +01:00