1044 Commits

Author SHA1 Message Date
Hassnaa Hamdi
f7f41350b4
[LV]: Skip Epilogue scalable VF greater than RemainingIterations. (#156724)
Consider skipping epilogue scalable VF when they are greater than
RemainingIterations same as fixed VF.
And skip scalable RemainingIterations from that comparison because
SCEV ATM can't evaluate non-canonical vscale-based expressions.
2025-11-19 05:11:17 +00:00
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoists single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hosting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Ramkumar Ramachandra
ef023cae38
Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)
Changes: The previous patch had to be reverted to a mismatching-OpType
assert in cse. The reduced-test has now been added corresponding to a
RVV pointer-induction, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.

While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 13:44:25 +00:00
Florian Hahn
ca26cf8611 [LV] Use variables in CHECK lines for unnamed VPValues in test.
Update test to capture unnamed VPValues in variables, making it easier
to update with future VPlan changes.
2025-11-15 12:10:03 +00:00
Florian Hahn
77fd6bef38 [LV] Also cover -force-target-instruction-cost=1 in tests.
Extend test to cover different -force-target-instruction-cost settings.
2025-11-14 21:15:14 +00:00
Alex Bradbury
f2336d4c7e
Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080)
Reverts llvm/llvm-project#163538

This is causing build failures on the two-stage RVV buildbots. e.g.
https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a
reproducer and more information at
https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822

This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra
355e0f94af
[VPlan] Expand WidenInt inductions with nuw/nsw (#163538)
While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-14 12:10:55 +00:00
Florian Hahn
79cd1b7a25 [LV] Drop verbose check-prefix from partial-reduce-incomplete-chains.ll.
There's only a single RUN line in the test, use the more compact default CHECK.
2025-11-13 22:18:01 +00:00
Matt Arsenault
d4c8cfeac0
AArch64: Regenerate baseline checks in loop vectorize test (#167926) 2025-11-13 19:11:32 +00:00
Ryan Buchner
a04c6b5512
[LV] Update LoopVectorizationPlanner::emitInvalidCostRemarks to handle reduction plans (#165913)
The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case.

Fixes #165359.
2025-11-13 06:12:40 -10:00
Florian Hahn
b9f0dadc10
[VPlan] Merge fcmp uno feeding Or. (#167251)
Fold
 or (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
 or (fcmp uno %A, %B), ...

This pattern is generated to check if any vector lane is NaN, and
combining multiple compares is beneficial on architectures that have
dedicated instructions.

Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Combine suggested as part of #161735

PR: https://github.com/llvm/llvm-project/pull/167251
2025-11-12 10:15:59 +00:00
Kerry McLaughlin
de3de3f143
[LV] Consider interleaving when -enable-wide-lane-mask=true (#163387)
Currently the only way to enable the use of wide active lane masks is to pass
-enable-wide-lane-mask and force both interleaving & tail-folding with additional
flags. This patch changes selectInterleaveCount to consider interleaving if wide
lane masks were requested, although the feature remains off by default.
2025-11-11 11:46:59 +00:00
Sander de Smalen
517d725463
[LV] Move condition to VPPartialReductionRecipe::execute (#166136)
This means that VPExpressions will now be constructed for
VPPartialReductionRecipe's when the loop has tail-folding predication.

Note that control-flow (if/else) predication is not yet handled
for partial reductions, because of the way partial reductions
are recognised and built up.
2025-11-11 09:42:54 +00:00
Ramkumar Ramachandra
c2d4c7c18b
[VPlan] Permit more users in narrowToSingleScalars (#166559)
narrowToSingleScalarRecipes can permit users that are WidenStore, or a
VPInstruction that has a suitable opcode. This is a generalization and
extension of the existing code.
2025-11-10 17:03:14 +00:00
Luke Lau
bfd4155f23
[VPlan] Don't apply predication discount to non-originally-predicated blocks (#160449)
Split off from #158690. Currently if an instruction needs predicated due
to tail folding, it will also have a predicated discount applied to it
in multiple places.
This is likely inaccurate because we can expect a tail folded
instruction to be executed on every iteration bar the last.

This fixes it by checking if the instruction/block was originally
predicated, and in doing so prevents vectorization with tail folding
where we would have had to scalarize the memory op anyway.

On llvm-test-suite this causes 4 loops in total to no longer be
vectorized with -O3 on arm64-apple-darwin, and there's no observable
performance impact.
2025-11-10 12:10:40 +00:00
Ramkumar Ramachandra
2d1d5fe78e
[VPlan] Simplify branch-cond with getVectorTripCount (#155604)
Call getVectorTripCount first, and call getTripCount failing that, in
simplifyBranchConditionForVFAndUF, to simplify missed cases. While at
it, strip the dead check for a zero TC.
2025-11-10 10:43:37 +00:00
Florian Hahn
3b219cf42a [LV] Add register pressure test for #164124.
Add extra test for https://github.com/llvm/llvm-project/pull/164124
2025-11-08 21:59:38 +00:00
Florian Hahn
3ee2f07e17
[VPlan] Support multiple F(Max|Min)Num reductions. (#161735)
Generalize handleMaxMinNumReductions to handle any number of
F(Max|Min)Num reductions by collecting a vector of reductions to
convert.

We then add NaN checks for all of them, followed by adjusting the branch
controlling the vector loop region, and updating the resume phis.

Addresses a TODO from https://github.com/llvm/llvm-project/pull/148239

PR: https://github.com/llvm/llvm-project/pull/161735
2025-11-07 13:59:06 +00:00
Florian Hahn
3ad5765e23
[LV] Check all users of partial reductions in chain have same scale. (#162822)
Check that all partial reductions in a chain are only used by other
partial reductions with the same scale factor. Otherwise we end up
creating users of scaled reductions where the types of the other
operands don't match.

A similar issue was addressed in
https://github.com/llvm/llvm-project/pull/158603, but misses the chained
cases.

Fixes https://github.com/llvm/llvm-project/issues/162530.

PR: https://github.com/llvm/llvm-project/pull/162822
2025-11-06 21:45:57 +00:00
Martin Storsjö
8b3a124ad8 Revert "[InterleavedAccess] Construct interleaved access store with shuffles"
This reverts commit 78d649199b47370b72848c1ca8d9bd3323b050ac.

That commit caused failed asserts, see
https://github.com/llvm/llvm-project/pull/164000 for details.
2025-11-06 11:09:26 +02:00
Shikhar Jain
9100c9212d
[AArch64] Enable masked load/store for Streaming-SVE with -march=armv8-a+sme (#163133)
For subtarget aarch64, isLegalMaskedLoadStore() should not return false
for Streaming-SVE. Thus now on usage of -march=armv8-a+sme & for
workloads that contains loops with control flow where predication is
data dependent on any array/vectors, masked load/stores along with
necessary scalable vectorization constructs would be emitted.

Fixes: #162797
2025-11-06 07:15:27 +00:00
Florian Hahn
efe8573127
[LV] Add extra tests for narrowing interleave groups with op chains.
Add additional tests to cover chains of ops feeding interleave groups,
some of which could be narrowed.
2025-11-05 23:11:41 +00:00
Florian Hahn
b0b4616790
[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506)
Generalize VPWidenSelectRecipe codegen to consider single-scalar
conditions instead of just loop-invariant ones.

If the condition is a single-scalar, we can simply use a scalar
condition.

PR: https://github.com/llvm/llvm-project/pull/165506
2025-11-05 22:11:29 +00:00
Florian Hahn
54190970cf
[LV] Add tests for narrowing interleave groups with casts.
Add additional tests for narrowing interleave groups with casts.
2025-11-05 20:57:52 +00:00
Ramkrishnan
78d649199b
[InterleavedAccess] Construct interleaved access store with shuffles
Cost of interleaved store of 8 factor and 16 factor are cheaper in AArch64 with additional interleave instructions.
2025-11-05 15:06:30 -05:00
David Green
6ad25c5912
[AArch64] Improve the cost model for extending mull (#125651)
We already have cost model code for detecting extending mull multiplies
for the form `mul(ext, ext)`. Since it was added the codegen for mull
has been improved, this attempts to catch the cost model up.

The main idea is to incorporate extends of larger sizes. A vector `v8i32
mul(zext(v8i8), zext(v8i8))` will be code-generated as `zext (v8i16
mul(zext(v8i8), zext(v8i8))`, or umull+ushll+ushll2.

So the total cost should be 3ish if each instruction costs 1. Where
exactly we attribute the costs is dependable, this patch opts to sets
the cost of the extend to 0 (or the cost of the extend not included in
the mull) and the mul gets the cost of the mull+extra extends.

isWideningInstruction is split into two functions for the two types of
operands it supports. isSingleExtWideningInstruction now handles addw
instructions that extend the second operand, isBinExtWideningInstruction
is for instructions like addl that extend both operands.
2025-11-04 07:50:51 +00:00
Sander de Smalen
f17c95ba54 [LV] Simplify vplan-printing.ll test (NFC)
This simplifies the test by moving some of the complicated options
to loop attributes, so that it's easier to extend the test file
with new cases.

The options `-enable-epilogue-vectorization` and
`-epilogue-vectorization-force-VF=2` were not strictly necessary
for the test.
2025-11-03 08:34:23 +00:00
Florian Hahn
b7e922a3da
[VPlan] Convert BuildVector with all-equal values to Broadcast. (#165826)
Fold BuildVector where all operands are equal to Broadcast of the first
operand. This will subsequently make it easier to remove additional
buildvectors/broadcasts, e.g. via
https://github.com/llvm/llvm-project/pull/165506.

PR: https://github.com/llvm/llvm-project/pull/165826
2025-11-01 17:28:42 -07:00
Florian Hahn
683b00bb50
[VPlan] Limit VPScalarIVSteps to step == 1 in getSCEVExprForVPValue.
For now, just support VPScalarIVSteps with step == 1 in
getSCEVExprForVPValue. This fixes a crash when the step would be != 1.
2025-10-31 02:22:56 +00:00
Hassnaa Hamdi
be29f0dd86
[LV]: Improve accuracy of calculating remaining iterations of MainLoopVF (#156723)
Transform TC and VF to same numerical space when they are different.
2025-10-26 14:45:44 +00:00
Florian Hahn
301fa24671 [VPlan] Limit narrowInterleaveGroups to single block regions for now.
Currently only regions with a single block are supported by the legality
checks.
2025-10-23 23:55:59 +01:00
Florian Hahn
4ec5852c1d [LV] Add tests for narrowing interleave groups with multiple blocks.
Add additional test coverage for narrowInterleaveGroups with loops with
multiple blocks.
2025-10-23 22:54:03 +01:00
paperchalice
249883d0c5
[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786)
Post cleanup for #164534.
2025-10-23 20:31:31 +08:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Florian Hahn
bfc322dd72
Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.

Revert while I investigate.
2025-10-22 21:27:11 +01:00
Kerry McLaughlin
45c0b29171
[LV] Ignore user-specified interleave count when unsafe. (#153009)
When an VF is specified via a loop hint, it will be clamped to a safe
VF or ignored if it is found to be unsafe. This is not the case for
user-specified interleave counts, which can lead to loops such as
the following with a memory dependence being vectorised with
interleaving:

```
#pragma clang loop interleave_count(4)
for (int i = 4; i < LEN; i++)
    b[i] = b[i - 4] + a[i];
```

According to [1], loop hints are ignored if they are not safe to apply.

This patch adds a check to prevent vectorisation with interleaving if
isSafeForAnyVectorWidth() returns false. This is already checked in
selectInterleaveCount().

[1]
https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
2025-10-22 15:21:27 +01:00
Florian Hahn
aca53f4375
[VPlan] Skip masked interleave groups in narrowInterleaveGroups.
8d29d09309 exposed a crash due to incorrectly trying to handle masked
interleave recipes. For now, the current code does not support masked
interleave recipes. Bail out for them.
2025-10-22 14:10:01 +01:00
Florian Hahn
8d29d09309
[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)
Move narrowInterleaveGroups to to general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2025-10-21 11:37:42 +01:00
David Sherwood
822c291aac
[LV][NFC] Remove undef from phi incoming values (#163762)
Split off from PR #163525, this standalone patch replaces
 use of undef as incoming PHI values with zero, in order
 to reduce the likelihood of contributors hitting the
 `undef deprecator` warning in github.
2025-10-21 10:49:27 +01:00
Sushant Gokhale
005ec78b71
[AArch64][CostModel] Add constraints on which partial reductions are (#163728)
natively supported on Neon and SVE

PR #158641 refined and refactored the cost model for partial reductions.
While doing so, it missed out on certain constraints. Specifically,
cases like i32 -> i64 partial reduce are not natively supported. This
patch adds back the condition/constraint that was present before PR
#158641
2025-10-20 17:36:44 -07:00
Ramkumar Ramachandra
9bfaf12c07
[VPlan] Handle more replicates in isUniformAcrossVFsAndUFs (#162342)
A single-scalar replicate without side-effects, and with uniform
operands, is uniform. Special-case assumes and stores.
2025-10-20 10:26:23 +00:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Florian Hahn
445415709e
[LV] Move test for incomplete partial reduction chains to separate file.
Move test to new file, to prepare for adding similar tests in
https://github.com/llvm/llvm-project/pull/162822.
2025-10-19 22:23:53 +01:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalars values from vectors.

Test changes are movements of the extracts: they are no generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Nikita Popov
8fa4a1029c [LoopVectorize] Regenerate test checks (NFC) 2025-10-16 18:21:42 +02:00
Florian Hahn
7f54fccc0e
[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056)
When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.

Fixes https://github.com/llvm/llvm-project/issues/162498.

PR: https://github.com/llvm/llvm-project/pull/163056
2025-10-15 13:46:09 +01:00
David Sherwood
4f2c867756
[LV][NFC] Fix "cpu" attribute in some partial-reduce*.ll tests (#163518) 2025-10-15 09:26:04 +01:00
Sushant Gokhale
778d3c8ccc
[NFC] Partial reduce test to demonstrate regression post commit #cc9c64d (#162681)
We have seen performance regression for several instances of the Numba
benchmark, with some ranging around 70%, on Neoverse-v2 post #158641.
The mentioned case is short reproducer of the same. See
https://godbolt.org/z/j9Mj5WM7c for the IR differences.. A future patch
will address this.
2025-10-14 23:51:36 -07:00
Florian Hahn
ae7b15f2e2
[VPlan] Return invalid for scalable VF in VPReplicateRecipe::computeCost
Replication is currently not supported for scalable VFs. Make sure
VPReplicateRecipe::computeCost returns an invalid cost early, for
scalable VFs if the recipe is not a single-scalar.

Note that this moves the existing invalid-costs.ll out of the AArch64
subdirectory, as it does not use a target triple.

Fixes https://github.com/llvm/llvm-project/issues/160792.
2025-10-11 19:28:02 +01:00
Ramkumar Ramachandra
7296734394
[VPlan] Mark ActiveLaneMask as not having mem effects (#162330)
VPInstruction::ActiveLaneMask does not read or write memory. This allows
us to clean up some dead recipes.
2025-10-08 09:19:24 +01:00