3558 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
1de55c9693
[VPlan] Avoid sinking allocas in sinkScalarOperands (#166135)
Use cannotHoistOrSinkRecipe to forbid sinking allocas.
2025-11-05 13:06:24 +00:00
David Sherwood
7b3fe5fd42
[LV][NFC] Remove undef values in some test cases (#164401)
Split off from PR #163525, this standalone patch replaces simple cases
where undef is used as a value for arithmetic or getelementptr
instructions. This will reduce the likelihood of contributors hitting
the `undef deprecator` warning in github.
2025-11-05 09:18:02 +00:00
Florian Hahn
af9a4263a1
[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445)
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.

I think the issue is isolated to code that evaluates the resulting
AddRec at BTC. Just using it to compute the distance between accesses
should still be fine: if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straightforward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads).

Fixes https://github.com/llvm/llvm-project/issues/160912.

PR: https://github.com/llvm/llvm-project/pull/161445
2025-11-04 17:08:12 +00:00
David Green
6ad25c5912
[AArch64] Improve the cost model for extending mull (#125651)
We already have cost model code for detecting extending mull multiplies
of the form `mul(ext, ext)`. The codegen for mull has improved since
that code was added; this patch attempts to catch the cost model up.

The main idea is to incorporate extends of larger sizes. A vector `v8i32
mul(zext(v8i8), zext(v8i8))` will be code-generated as `zext(v8i16
mul(zext(v8i8), zext(v8i8)))`, or umull+ushll+ushll2.

So the total cost should be 3ish if each instruction costs 1. Where
exactly we attribute the costs is debatable; this patch opts to set
the cost of the extend to 0 (or the cost of the extend not included in
the mull) and the mul gets the cost of the mull+extra extends.
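
The narrowed form is only equivalent because the product of two zero-extended 8-bit values always fits in 16 bits. A minimal C++ check of that arithmetic, independent of LLVM (the function names are illustrative assumptions, not part of the patch):

```cpp
#include <cstdint>

// Wide form: extend both 8-bit operands to 32 bits, then multiply.
inline uint32_t mulWide(uint8_t a, uint8_t b) {
  return static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
}

// Narrow form: multiply into a 16-bit result (one umull lane in this model),
// then zero-extend the product to 32 bits.
inline uint32_t mulNarrowThenExtend(uint8_t a, uint8_t b) {
  const uint16_t product16 = static_cast<uint16_t>(
      static_cast<uint32_t>(a) * static_cast<uint32_t>(b));
  return static_cast<uint32_t>(product16);
}

// Exhaustively compares both forms over all pairs of 8-bit inputs.
inline bool formsAgreeForAllInputs() {
  for (unsigned a = 0; a <= 0xFF; ++a)
    for (unsigned b = 0; b <= 0xFF; ++b)
      if (mulWide(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
          mulNarrowThenExtend(static_cast<uint8_t>(a), static_cast<uint8_t>(b)))
        return false;
  return true;
}
```

The largest product, 255 * 255, is below 2^16, so no bits are lost by the 16-bit intermediate.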

isWideningInstruction is split into two functions for the two types of
operands it supports. isSingleExtWideningInstruction now handles addw
instructions that extend the second operand; isBinExtWideningInstruction
is for instructions like addl that extend both operands.
2025-11-04 07:50:51 +00:00
Luke Lau
e17bc1ec51
[VPlan] Explicitly predicate some replicate region sinking tests. NFC (#164934)
To remove some test diffs in #160449
2025-11-03 12:19:50 +00:00
Sander de Smalen
f17c95ba54
[LV] Simplify vplan-printing.ll test (NFC)
This simplifies the test by moving some of the complicated options
to loop attributes, so that it's easier to extend the test file
with new cases.

The options `-enable-epilogue-vectorization` and
`-epilogue-vectorization-force-VF=2` were not strictly necessary
for the test.
2025-11-03 08:34:23 +00:00
Florian Hahn
b0fc9650f7
[LV] Add tests with hoist-able invariant loads.
2025-11-02 16:10:45 +00:00
Florian Hahn
b7e922a3da
[VPlan] Convert BuildVector with all-equal values to Broadcast. (#165826)
Fold BuildVector where all operands are equal to Broadcast of the first
operand. This will subsequently make it easier to remove additional
buildvectors/broadcasts, e.g. via
https://github.com/llvm/llvm-project/pull/165506.
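
As a rough illustration of the fold condition only, here is a sketch over plain values rather than VPlan recipes. The helper name and its use of `std::optional` are assumptions made for this example and do not reflect the actual VPlan code:

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for the fold: if every operand of a "build vector"
// is the same value, return that value as the broadcast source; otherwise
// report that no fold applies.
template <typename T>
std::optional<T> foldToBroadcastSource(const std::vector<T>& operands) {
  if (operands.empty())
    return std::nullopt;
  for (const T& op : operands)
    if (!(op == operands.front()))
      return std::nullopt;  // at least one operand differs: keep BuildVector
  return operands.front();  // all equal: broadcast the first operand
}
```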

PR: https://github.com/llvm/llvm-project/pull/165826
2025-11-01 17:28:42 -07:00
Florian Hahn
90bbffec02
[VPlan] Add VPlan printing tests for recipes with metadata.
2025-11-01 22:05:51 +00:00
Sam Tebbs
31a0ebb840
[NFCI] Address post-merge review of #162503 (#165582)
2025-10-31 10:23:03 +00:00
Florian Hahn
317b42ef5c
[VPlan] Remove original recipe after narrowing to single-scalar.
Directly remove RepOrWidenR after replacing all uses. Removing the dead
user early unlocks additional opportunities for further narrowing.
2025-10-31 04:38:16 +00:00
Florian Hahn
683b00bb50
[VPlan] Limit VPScalarIVSteps to step == 1 in getSCEVExprForVPValue.
For now, just support VPScalarIVSteps with step == 1 in
getSCEVExprForVPValue. This fixes a crash when the step would be != 1.
2025-10-31 02:22:56 +00:00
Vigneshwar Jayakumar
469702c5d5
[LICM] Sink unused l-invariant loads in preheader. (#157559)
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM and also adds functionality to sink unused loads that are not
clobbered by the loop body.
2025-10-30 09:23:04 -05:00
Florian Hahn
98d3a25f74
[VPlan] Don't preserve LCSSA in expandSCEVs. (#165505)
This follows similar reasoning as 45ce88758d24
(https://github.com/llvm/llvm-project/pull/159556):

LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.

LV creates SCEV and memory runtime checks early on and then disconnects
the blocks temporarily. The patch fixes a mis-compile, where previously
LCSSA construction during SCEV expand may replace uses in currently
unreachable SCEV/memory check blocks.

Fixes https://github.com/llvm/llvm-project/issues/162512

PR: https://github.com/llvm/llvm-project/pull/165505
2025-10-29 18:25:46 +00:00
Sam Tebbs
22f860a55d
[LV] Bundle (partial) reductions with a mul of a constant (#162503)
A reduction (including partial reductions) with a multiply of a constant
value can be bundled by first converting it from `reduce.add(mul(ext,
const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to
extend the constant.

This PR adds such bundling by first truncating the constant to the
source type of the other extend, then extending it to the destination
type of the extend. The first truncate is necessary so that the types of
each extend's operand are then the same, and the call to
canConstantBeExtended proves that the extend following a truncate is
safe to do. The truncate is removed by optimisations.
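
The safety condition can be pictured as a round-trip test: the constant must survive being truncated to the narrow source type and extended back. The sketch below is a hypothetical helper written for illustration; it is not LLVM's canConstantBeExtended and works on plain 64-bit integers rather than APInt:

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM's canConstantBeExtended): returns true if
// `value` is unchanged by truncating it to `narrowBits` bits and then
// sign-extending (isSigned) or zero-extending (!isSigned) it back.
inline bool roundTripsThroughNarrowType(int64_t value, unsigned narrowBits,
                                        bool isSigned) {
  if (narrowBits == 0)
    return false;  // no representable values
  if (narrowBits >= 64)
    return true;   // nothing is truncated
  if (isSigned) {
    // Representable range of a signed narrowBits-wide integer.
    const int64_t lo = -(int64_t{1} << (narrowBits - 1));
    const int64_t hi = (int64_t{1} << (narrowBits - 1)) - 1;
    return value >= lo && value <= hi;
  }
  // Representable range of an unsigned narrowBits-wide integer.
  const uint64_t hi = (uint64_t{1} << narrowBits) - 1;
  return value >= 0 && static_cast<uint64_t>(value) <= hi;
}
```

For example, with an 8-bit source type the constant 200 round-trips under zero-extension but not under sign-extension, so under this model only the zero-extended form would be eligible.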

This is a stacked PR, 1a and 1b can be merged in any order:
1a. https://github.com/llvm/llvm-project/pull/147302
1b. https://github.com/llvm/llvm-project/pull/163175
2. -> https://github.com/llvm/llvm-project/pull/162503
2025-10-28 16:59:53 +00:00
Ramkumar Ramachandra
a2d873fb87
[VPlan] Introduce cannotHoistOrSinkRecipe, fix miscompile (#162674)
Factor out common code to determine legality of hoisting and sinking.
The patch has the side-effect of fixing an underlying bug, where a
load/store pair is reordered.
2025-10-28 09:36:17 +00:00
Florian Hahn
0e28c9bc9d
[LAA] Skip undef/poison strides in collectStridedAccess.
The map returned by collectStridedAccess is used to replace strides with
their versioned values. This does not work for Undef/Poison, which don't
have use-lists. Don't try to version them, as versioning won't be useful in
practice.

Fixes https://github.com/llvm/llvm-project/issues/162922.
2025-10-27 05:01:17 +00:00
Florian Hahn
57ba58d558
[LV] Modernize version-mem-access.ll tests.
Auto-generate CHECK lines and simplify tests a bit.
2025-10-27 03:37:59 +00:00
Hassnaa Hamdi
be29f0dd86
[LV]: Improve accuracy of calculating remaining iterations of MainLoopVF (#156723)
Transform TC and VF to same numerical space when they are different.
2025-10-26 14:45:44 +00:00
Ramkumar Ramachandra
2c6c2689c5
[VPlan] Extend tryToFoldLiveIns to fold binary intrinsics (#161703)
InstSimplifyFolder can fold binary intrinsics, so take the opportunity
to unify code with getOpcodeOrIntrinsicID, and handle the case. The
additional handling of WidenGEP is non-functional, as the GEP is
simplified before it is widened, as the included test shows.
2025-10-24 10:21:39 +00:00
Florian Hahn
301fa24671
[VPlan] Limit narrowInterleaveGroups to single block regions for now.
Currently only regions with a single block are supported by the legality
checks.
2025-10-23 23:55:59 +01:00
Florian Hahn
4ec5852c1d
[LV] Add tests for narrowing interleave groups with multiple blocks.
Add additional test coverage for narrowInterleaveGroups with loops with
multiple blocks.
2025-10-23 22:54:03 +01:00
paperchalice
249883d0c5
[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786)
Post cleanup for #164534.
2025-10-23 20:31:31 +08:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Florian Hahn
bfc322dd72
Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.

Revert while I investigate.
2025-10-22 21:27:11 +01:00
Kerry McLaughlin
45c0b29171
[LV] Ignore user-specified interleave count when unsafe. (#153009)
When a VF is specified via a loop hint, it will be clamped to a safe
VF or ignored if it is found to be unsafe. This is not the case for
user-specified interleave counts, which can lead to loops such as
the following with a memory dependence being vectorised with
interleaving:

```
#pragma clang loop interleave_count(4)
for (int i = 4; i < LEN; i++)
    b[i] = b[i - 4] + a[i];
```

According to [1], loop hints are ignored if they are not safe to apply.

This patch adds a check to prevent vectorisation with interleaving if
isSafeForAnyVectorWidth() returns false. This is already checked in
selectInterleaveCount().
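
To see why the dependence matters, the following C++ sketch models the loop above without any real vectorisation. It assumes a simplified execution model in which all loads of a chunk complete before any store of that chunk; `runScalar` and `runChunked` are illustrative names, and `a` is assumed to be at least as long as `b`:

```cpp
#include <cstddef>
#include <vector>

// Reference semantics: one element at a time, as written in the source loop.
inline std::vector<int> runScalar(std::vector<int> b, const std::vector<int>& a) {
  for (std::size_t i = 4; i < b.size(); ++i)
    b[i] = b[i - 4] + a[i];
  return b;
}

// Simplified model of a vectorised and interleaved loop: every load for a
// chunk of `chunk` elements happens before any store in that chunk.
inline std::vector<int> runChunked(std::vector<int> b, const std::vector<int>& a,
                                   std::size_t chunk) {
  std::size_t i = 4;
  for (; i + chunk <= b.size(); i += chunk) {
    std::vector<int> loaded(chunk);
    for (std::size_t k = 0; k < chunk; ++k)
      loaded[k] = b[i + k - 4];          // all loads first
    for (std::size_t k = 0; k < chunk; ++k)
      b[i + k] = loaded[k] + a[i + k];   // then all stores
  }
  for (; i < b.size(); ++i)              // scalar remainder
    b[i] = b[i - 4] + a[i];
  return b;
}
```

In this model a chunk of 4 elements matches the scalar result because the dependence distance is 4, while a chunk of 8 reads `b[i]` through `b[i + 3]` before they have been updated and produces different values.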

[1]
https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
2025-10-22 15:21:27 +01:00
Florian Hahn
aca53f4375
[VPlan] Skip masked interleave groups in narrowInterleaveGroups.
8d29d09309 exposed a crash due to incorrectly trying to handle masked
interleave recipes. For now, the current code does not support masked
interleave recipes. Bail out for them.
2025-10-22 14:10:01 +01:00
Sam Parker
20340accf2
[NFC][WebAssembly] FP conversion interleave tests (#164576)
2025-10-22 11:43:44 +01:00
Florian Hahn
8d29d09309
[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)
Move narrowInterleaveGroups to the general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows taking the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2025-10-21 11:37:42 +01:00
David Sherwood
822c291aac
[LV][NFC] Remove undef from phi incoming values (#163762)
Split off from PR #163525, this standalone patch replaces
use of undef as incoming PHI values with zero, in order
to reduce the likelihood of contributors hitting the
`undef deprecator` warning in github.
2025-10-21 10:49:27 +01:00
Sushant Gokhale
005ec78b71
[AArch64][CostModel] Add constraints on which partial reductions are (#163728)
natively supported on Neon and SVE

PR #158641 refined and refactored the cost model for partial reductions.
While doing so, it missed out on certain constraints. Specifically,
cases like i32 -> i64 partial reduce are not natively supported. This
patch adds back the condition/constraint that was present before PR
#158641.
2025-10-20 17:36:44 -07:00
Florian Hahn
35b9f20449
[LV] Check for TruncInsts in canTruncateToMinimalBitwidth.
TruncInst must truncate at most to their destination. Return false if
MinBWs contains a destination size > the trunc result type size.

Fixes https://github.com/llvm/llvm-project/issues/162688.
2025-10-20 22:31:16 +01:00
Florian Hahn
b4dbb1cdc4
[VPlan] Be more careful with CSE in replicate regions. (#162110)
Recipes in replicate regions implicitly depend on the region's
predicate. Limit CSE to recipes in the same block, when either recipe is
in a replicate region.

This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE
on recipes inside a replicate region, we may end up with 2
VPPredInstPHIRecipes sharing the same operand. This is incompatible with
current VPPredInstPHIRecipe codegen, which re-sets the current value of
its operand in VPTransformState. This can cause crashes in the added
test cases.

Note that this patch only modifies ::isEqual to check for replicating
regions and not getHash, as CSE across replicating regions should be
uncommon.

Fixes https://github.com/llvm/llvm-project/issues/157314. 
Fixes https://github.com/llvm/llvm-project/issues/161974.

PR: https://github.com/llvm/llvm-project/pull/162110
2025-10-20 10:53:47 +00:00
Luke Lau
9fe1f29541
[VPlan] Set flags when constructing zexts using VPWidenCastRecipe (#164198)
createWidenCast doesn't set the flag type, so when we simplify trunc
(zext nneg x) -> zext x we would hit an assertion in CSE that the flag
types don't match with other VPWidenCastRecipes that weren't simplified.

This fixes it the same way trunc flags are handled too.

As an aside I think it should be correct to preserve the nneg flag in
this case since the input operand is still non-negative after the
transform. But that's left to another PR.

Fixes https://github.com/llvm/llvm-project/issues/164171
2025-10-20 10:39:16 +00:00
Ramkumar Ramachandra
9bfaf12c07
[VPlan] Handle more replicates in isUniformAcrossVFsAndUFs (#162342)
A single-scalar replicate without side-effects, and with uniform
operands, is uniform. Special-case assumes and stores.
2025-10-20 10:26:23 +00:00
Florian Hahn
9317975a7a
[VPlan] Match legacy behavior w.r.t. using pointer phis as scalar addrs.
When the legacy cost model scalarizes loads that are used as addresses
for other loads and stores, it looks to phi nodes, if they are direct
address operands of loads/stores. Match this behavior in
isUsedByLoadStoreAddress, to fix a divergence between legacy and
VPlan-based cost model.
2025-10-20 11:09:25 +01:00
Florian Hahn
eb17a8d599
[SCEV] Preserve divisor info when adding guard info for ICMP_NE via Sub. (#163250)
Follow-up to https://github.com/llvm/llvm-project/pull/160500 to
preserve divisibility info when creating the UMax.

PR: https://github.com/llvm/llvm-project/pull/163250
2025-10-20 10:20:41 +01:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Florian Hahn
445415709e
[LV] Move test for incomplete partial reduction chains to separate file.
Move test to new file, to prepare for adding similar tests in
https://github.com/llvm/llvm-project/pull/162822.
2025-10-19 22:23:53 +01:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalar values from vectors.

Test changes are movements of the extracts: they are now generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Florian Hahn
12ec050b9b
[LV] Remove some unnecessary uses of poison from tests.
2025-10-17 21:20:44 +01:00
Nikita Popov
8fa4a1029c
[LoopVectorize] Regenerate test checks (NFC)
2025-10-16 18:21:42 +02:00
Ramkumar Ramachandra
b71515cc76
[VPlan] Extend licm to hoist assumes (#162636)
Assumes are safe to hoist if they're guaranteed to execute, since they
don't alias, and don't throw. This mirrors what the IR-LICM does.
2025-10-16 13:59:32 +00:00
Ramkumar Ramachandra
34fdd7472b
[LV] Add coverage for operand-bundles (#163417)
2025-10-16 12:22:03 +00:00
David Sherwood
c48aa54656
[LV][NFC] Remove undef from function return values (#163578)
Split off from PR #163525, this standalone patch replaces `ret * undef`
returns with `ret void` in order to reduce the likelihood of
contributors hitting the `undef deprecator` warning in github.
2025-10-16 09:49:38 +01:00
Florian Hahn
7f54fccc0e
[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056)
When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.
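
The difference between the two opcodes can be sketched with a toy model in which an unrolled vector value is a list of parts, each holding VF lanes. The type alias and function bodies below are assumptions for illustration and are not the VPlan implementation; every part is assumed to be non-empty:

```cpp
#include <vector>

// Toy model: an unrolled vector value is UF parts, each holding VF lanes.
using Parts = std::vector<std::vector<int>>;

// Last lane of the final part only, i.e. the last element across all parts.
inline int extractLastElement(const Parts& parts) {
  return parts.back().back();
}

// Last lane of every part, producing one scalar per part.
inline std::vector<int> extractLastLanePerPart(const Parts& parts) {
  std::vector<int> result;
  for (const auto& part : parts)
    result.push_back(part.back());
  return result;
}
```

With two parts `{0, 1, 2, 3}` and `{4, 5, 6, 7}`, the first function yields only 7, whereas the second yields 3 and 7. A store whose address is uniform only within a part needs the per-part values.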

Fixes https://github.com/llvm/llvm-project/issues/162498.

PR: https://github.com/llvm/llvm-project/pull/163056
2025-10-15 13:46:09 +01:00
David Sherwood
4f2c867756
[LV][NFC] Fix "cpu" attribute in some partial-reduce*.ll tests (#163518)
2025-10-15 09:26:04 +01:00
Sushant Gokhale
778d3c8ccc
[NFC] Partial reduce test to demonstrate regression post commit #cc9c64d (#162681)
We have seen performance regressions for several instances of the Numba
benchmark, with some ranging around 70%, on Neoverse-v2 post #158641.
The mentioned case is a short reproducer of the same. See
https://godbolt.org/z/j9Mj5WM7c for the IR differences. A future patch
will address this.
2025-10-14 23:51:36 -07:00
Florian Hahn
0fefa56b03
[LV] Add additional min/max reduction tests.
Add test coverage for min/max reductions with various combinations of
users (in and outside loops, used by stores) and predicated variants.

This adds missing test coverage for min/max reductions.
2025-10-14 22:26:59 +01:00
Ramkumar Ramachandra
4ec78f56c2
[LV] Increase coverage of uniformity-rewriter (#161219)
Add a test with a non-uniform load of an argument (SCEVUnknown), showing
that SCEVUnknown cannot always be considered uniform.
2025-10-13 10:15:34 +00:00