3431 Commits

Author SHA1 Message Date
Florian Hahn
b8eaceb39b
[VPlan] Explicitly replicate VPInstructions by VF. (#155102)
Extend replicateByVF added in #142433 (aa240293190) to also explicitly
unroll replicating VPInstructions.

Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.

PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-12 17:06:26 +01:00
Graham Hunter
54fc5367f6 [LV] Fix crash in uncountable exit with side effects checking
Fixes an ICE reported on PR #145663, as an assert was found to be
reachable with a specific combination of unreachable blocks.
2025-09-12 10:41:05 +00:00
Luke Lau
4bb250d6a3
[VPlan] Always consider register pressure on RISC-V (#156951)
Stacked on #156923 

In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because
whilst the largest element type is i8, we generate a bunch of pointer
vectors for gathers and scatters. This means the VF chosen is quite high
e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x
i64> m8 registers for the pointers.

This was briefly fixed by #132190 where we computed register pressure in
VPlan and used it to prune VFs that were likely to spill. The legacy
cost model wasn't able to do this pruning because it didn't have
visibility into the pointer vectors that were needed for the
gathers/scatters.

However VF pruning was restricted again to just the case when max
bandwidth was enabled in #141736 to avoid an AArch64 regression, and
restricted again in #149056 to only prune VFs that had max bandwidth
enabled.

On RISC-V we take advantage of register grouping for performance and
choose a default of LMUL 2, which means there are 16 registers to work
with, half as many as SVE, so we encounter higher register pressure
more frequently.

As such, we likely want to always consider pruning VFs with high
register pressure and not just the VFs from max bandwidth.

This adds a TTI hook to opt into this behaviour for RISC-V which fixes
the motivating godbolt example above. When last checked this
significantly reduces the number of spills on SPEC CPU 2017, up to
80% on 538.imagick_r.
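
For a rough sense of the register arithmetic described above, the sketch below computes the LMUL a scalable vector type occupies and how many register groups exist at a given LMUL. It is an illustration only, not code from the patch: the helper names are hypothetical, and it assumes 32 architectural vector registers and 64 bits per `vscale` block.

```cpp
#include <cstdint>

// Assumptions (not taken from the patch): RVV has 32 vector registers, and a
// scalable type <vscale x N x iB> occupies N * B / 64 registers (its LMUL).
constexpr unsigned NumVectorRegs = 32;
constexpr unsigned BitsPerVscaleBlock = 64;

// LMUL needed to hold <vscale x MinElts x iEltBits> (hypothetical helper).
// Only meaningful for types that fill at least one whole register.
constexpr unsigned lmulFor(unsigned MinElts, unsigned EltBits) {
  return MinElts * EltBits / BitsPerVscaleBlock;
}

// Number of register groups available when every value uses the given LMUL.
constexpr unsigned groupsAt(unsigned LMUL) { return NumVectorRegs / LMUL; }

// Number of m8 groups a type is split into, since LMUL 8 is the largest group.
constexpr unsigned m8PartsFor(unsigned MinElts, unsigned EltBits) {
  return (lmulFor(MinElts, EltBits) + 7) / 8;
}
```

Under these assumptions the `<vscale x 16 x i8>` data vector fits in one m2 group, while each `<vscale x 16 x i64>` pointer vector needs two m8 groups out of only four available, which is where the spilling would come from.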
2025-09-12 06:21:54 +00:00
Joel E. Denny
0e3c5566c0
[PGO] Add llvm.loop.estimated_trip_count metadata (#152775)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
2025-09-11 15:55:18 -04:00
Graham Hunter
e285602fda [LV] Enforce addrec in current loop for uncountable exit load address check
Addresses post-commit review raised for #145663
2025-09-11 11:18:22 +00:00
Elvis Wang
3e898bc40f
[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387)
This patch fixes the assertion that fires when `isUniform` (from the legacy
model) and `isSingleScalar` (from the VPlan-based model) disagree.

A simplified test that triggers the assertion:
```
loop:
  loadA = load %a  => %a is loop invariant.
  loadB = load %loadA
  ...
```
The legacy cost model cannot determine that the address of `%loadB` is
uniform, but in the VPlan-based cost model the addresses of both `%loadA`
and `%loadB` are single scalars.

Full test that caused the crash: https://llvm.godbolt.org/z/zEG8YKjqh.

---------

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-11 07:49:54 +08:00
Florian Hahn
1efa997317
[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars.
Move handling of stores to single-scalar/uniform address from
replicateByVF to narrowToSingleScalar.
2025-09-10 21:58:29 +01:00
Florian Hahn
055e4ff35a
[VPlan] Don't narrow op multiple times in narrowInterleaveGroups.
Track which ops already have been narrowed, to avoid narrowing the same
operation multiple times. Repeated narrowing will lead to incorrect
results, because we could first narrow from an interleave group -> wide
load, and then narrow the wide load -> single-scalar load.

Fixes https://github.com/llvm/llvm-project/issues/156190.
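
The failure mode described above can be sketched with a toy model. This is not the VPlan code: `Stage`, `Op`, `narrowOnce` and `narrowAll` are hypothetical stand-ins, and the only point is that an op shared by two store groups must be narrowed once, not twice.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a recipe: each narrowing step moves one stage down.
enum class Stage { InterleaveGroup, WideLoad, SingleScalarLoad };

struct Op {
  Stage S = Stage::InterleaveGroup;
};

// One narrowing step: interleave group -> wide load -> single-scalar load.
void narrowOnce(Op &O) {
  if (O.S == Stage::InterleaveGroup)
    O.S = Stage::WideLoad;
  else if (O.S == Stage::WideLoad)
    O.S = Stage::SingleScalarLoad;
}

// Narrow every use; when Track is true, ops already seen are skipped.
void narrowAll(const std::vector<Op *> &Uses, bool Track) {
  std::unordered_set<Op *> Narrowed;
  for (Op *O : Uses) {
    if (Track && !Narrowed.insert(O).second)
      continue; // Already narrowed through another store group.
    narrowOnce(*O);
  }
}

// Final stage of an op that is reached from two store groups.
Stage stageOfSharedOp(bool Track) {
  Op Shared;
  narrowAll({&Shared, &Shared}, Track);
  return Shared.S;
}
```

With tracking, the shared op stops at the wide load; without it, the second visit narrows it again to a single-scalar load, which is the incorrect result the message describes.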
2025-09-10 19:22:42 +01:00
Florian Hahn
7b828738c6
[LV] Add tests with multiple store groups re-using widened ops.
Test coverage for https://github.com/llvm/llvm-project/issues/156190.
2025-09-10 17:10:46 +01:00
Nikita Popov
a301e1a895
[InstCombine] Split GEPs with multiple non-zero offsets (#151333)
Split GEPs that have more than one non-zero offset into two GEPs. This
is in preparation for the ptradd migration, which can only represent GEPs
with a single offset.

This also enables CSE and LICM of the common base.
2025-09-10 16:51:58 +02:00
Graham Hunter
3c810b76b9
[LV] Add initial legality checks for early exit loops with side effects (#145663)
This adds initial support to LoopVectorizationLegality to analyze loops
with side effects (particularly stores to memory) and an uncountable
exit. This patch alone doesn't enable any new transformations, but
does give clearer reasons for rejecting vectorization for such a loop.

The intent is for a loop like the following to pass the specific checks,
and only be rejected at the end until the transformation code is
committed:

```
// Assume a is marked restrict
// Assume b is known to be large enough to access up to b[N-1]
for (int i = 0; i < N; ++i) {
  a[i]++;
  if (b[i] > threshold)
    break;
}
```
2025-09-10 13:54:52 +01:00
Hassnaa Hamdi
5739142345
[LV][AArch64][NFC]: Change TC in a test case. (#157512)
- The test in sve-epilog-vscale-fixed.ll checks that a fixed-width epilogue
VF is preferred over a scalable one when costs are equal. This NFC patch
changes the TC in the test case to be unknown, so that future LV changes
do not fold the epilogue.
2025-09-10 12:41:49 +01:00
Florian Hahn
c3e76b2770
[VPlan] Keep common flags during CSE. (#157664)
During CSE, we don't have to drop all poison-generating flags on
mis-match, we can keep the ones common on both recipes.

PR: https://github.com/llvm/llvm-project/pull/157664
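
As a minimal sketch of the idea (not the VPlan implementation), keeping only the common flags amounts to intersecting the two flag sets. The flag encoding and the name `commonFlags` below are hypothetical.

```cpp
#include <cstdint>

// Hypothetical flag bits; the real recipes carry their own representation.
enum : uint8_t { NUW = 1 << 0, NSW = 1 << 1, Exact = 1 << 2 };

// When two otherwise identical recipes are merged, the surviving recipe may
// only keep the poison-generating flags that both of them had.
constexpr uint8_t commonFlags(uint8_t A, uint8_t B) { return A & B; }
```

For example, merging an add marked `nuw nsw` with one marked only `nuw` would leave `nuw` on the result instead of dropping both flags.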
2025-09-10 10:20:48 +00:00
Mel Chen
4d9a7fa9ba
[VPlan] Remove dead recipes before simplifying blends (#157622)
In simplifyBlends, when normalizing a blend recipe, the first mask that
is used only by the blend and is not all-false is chosen, and its
corresponding incoming value becomes the initial value, with the others
blended into it. At the same time, the mask that is chosen can be
eliminated. However, a multi-user mask might be used by a dead recipe,
which prevents this optimization. This patch moves removeDeadRecipes
before simplifyBlends to eliminate dead recipes, allowing simplifyBlends
to remove more dead masks.
2025-09-10 08:03:18 +00:00
Florian Hahn
c4b17bf9ed
[VPlan] Slightly extend ExtractLastElement fold to single-scalars.
Update ExtractLastElement fold to support single scalar recipes, if all
their users only use scalars.
2025-09-09 22:08:08 +01:00
Nikita Popov
dbdac9f3ab [LoopVectorize] Generate test checks (NFC) 2025-09-09 17:43:17 +02:00
Florian Hahn
6bcb172bd6
[LV] Add test for preserving common GEP flags.
Add additional test coverage for preserving poison generating flags.
Modernize the existing flags tests with auto-generated check lines.
2025-09-09 13:54:53 +01:00
David Green
204917ea97
[LoopVectorizer][AArch64] Add a -sve-vscale-for-tuning override option. (#156916)
It can be useful for debugging and tuning to be able to alter the
VScaleForTuning. This adds a quick option to the aarch64 subtarget for
altering it.
2025-09-09 10:46:12 +01:00
Florian Hahn
9b1b93766d
Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9.

Recommit with a fix for MSan failure (
https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a
set to track deleted values. Using the InsertedInstructions set is not
sufficient, as it uses asserting value handles as keys, which may
dereference the value at construction.

Original message:

Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-09 09:47:41 +01:00
Florian Hahn
132bacde22
[VPlan] Also allow extracts as users when converting to single scalars.
Extracts technically do not use scalars, but vectors, but if the operand
is a single scalar we do not need a vector and they should not block
forming single scalars.
2025-09-08 22:11:39 +01:00
Florian Hahn
eeb43806eb
Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f.

Triggers MSan errors in some configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/169/builds/14799
2025-09-08 14:52:28 +01:00
Florian Hahn
408a2e7cee
[LV] Remove instcombine,simplifycfg and dce from some tests.
Remove instcombine, simplifycfg and dce from some tests, as they make it
a bit more difficult to see the codegen coming out of LV and most
simplifications are already done on the VPlan-level.

Also modernizes some check lines.
2025-09-08 12:01:37 +01:00
Florian Hahn
528b13df57
[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)
Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-08 10:53:20 +01:00
Luke Lau
fe6e178401
[VPlan] Don't build recipes for unconditional switches (#157323)
In #157322 we crash because we try to infer a type for a VPReplicate
switch recipe.

My understanding was that these switches should be removed by
VPlanPredicator, but this switch survived through it because it was
unconditional, i.e. had no cases other than the default case.

This fixes #157322 by not emitting any recipes for unconditional
switches to begin with, similar to how we treat unconditional branches.
2025-09-08 09:01:43 +00:00
Florian Hahn
b50ad945dd
[InstSimplify] Simplify extractvalue (umul_with_overflow(x, 1)). (#157307)
Look through extractvalue to simplify umul_with_overflow where one of
the operands is 1.

This removes some redundant instructions when expanding SCEVs, which in
turn makes the runtime check cost estimate more accurate, reducing the
minimum iterations for which vectorization is profitable.

PR: https://github.com/llvm/llvm-project/pull/157307
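
To see why the fold is sound, here is a small model of a 32-bit unsigned multiply with overflow. It is a sketch for illustration: `umulWithOverflow` is a hypothetical helper, not an LLVM API.

```cpp
#include <cstdint>
#include <utility>

// Model of an unsigned 32-bit multiply with overflow:
// returns {low 32 bits of the product, overflow bit}.
std::pair<uint32_t, bool> umulWithOverflow(uint32_t X, uint32_t Y) {
  const uint64_t Wide = static_cast<uint64_t>(X) * Y;
  return {static_cast<uint32_t>(Wide), Wide > UINT32_MAX};
}
```

Multiplying by 1 always yields `{X, false}`, so extracting the value gives `X` and extracting the overflow bit gives `false`, whatever `X` is.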
2025-09-07 18:32:40 +01:00
Florian Hahn
2654690006
[LV] Add additional test with SCEV predicate.
The SCEV predicate in the existing tests for optimizing for size is
known to be false. Add additional test with a predicate that cannot be
proven true/false.

Also generate checks with latest version of script.
2025-09-07 16:14:52 +01:00
Florian Hahn
afa0e70cc6
[LV] Remove instcombine,simplifycfg and dce from some tests.
Remove instcombine, simplifycfg and dce from some tests, as they make it
a bit more difficult to see the codegen coming out of LV and most
simplifications are already done on the VPlan-level.

Also modernizes some check lines.
2025-09-07 10:28:25 +01:00
Florian Hahn
59d72b57b0
[LV] Modernize and regenerate checks for some tests. 2025-09-06 20:52:29 +01:00
Florian Hahn
724a63ba8b
[LV] Use more accurate getSCEV/MemChecks in GeneratedRTCheck::hasChecks.
Update hasChecks to use getSCEV/MemRuntimeChecks(), which automatically
handles checking for known-false checks.

This improves a few cases where we previously did not add metadata to
disable runtime unrolling, due to runtime checks, even though no runtime
checks are needed.
2025-09-06 19:21:11 +01:00
Florian Hahn
cd8c3e5053
[LV] Add test showing missing metadata to disable runtime unrolling. 2025-09-06 16:42:02 +01:00
Florian Hahn
e0f00bd645
[LV] Don't consider second op as invariant in getDivRemSpeculationCost.
The second operand when using a safe divisor will always be a select in
the loop, so won't be invariant; don't treat it as such.

This fixes a divergence between the legacy and VPlan-based cost models.

Fixes https://github.com/llvm/llvm-project/issues/156066.
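
A rough scalar model of the safe-divisor idea shows why the second operand is not invariant. This sketch assumes masked-off lanes divide by 1; `safeDivisor` is a hypothetical helper, not the vectorizer's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-lane safe divisor: active lanes use Den, masked-off lanes use 1 so the
// division cannot trap. This models a select computed inside the loop.
std::vector<uint32_t> safeDivisor(uint32_t Den, const std::vector<bool> &Mask) {
  std::vector<uint32_t> Safe(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Safe[I] = Mask[I] ? Den : 1u;
  return Safe;
}
```

Even when `Den` itself is loop-invariant, the result depends on the per-iteration mask, so the divisor that the division actually sees varies.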
2025-09-06 14:06:04 +01:00
Florian Hahn
b9b0ea5f62
[LV] Pass DT to isGuaranteedNotToBePoison in canVectorizeWithIfCvt.
Pass DT to slightly improve analysis results. Note that the context
instruction is already passed.
2025-09-05 20:42:56 +01:00
Florian Hahn
f8972c8280
[SCEVExp] Fix early exit in ComputeEndCheck. (#156910)
ComputeEndCheck incorrectly returned false for unsigned predicates
starting at zero and a positive step.

The AddRec could still wrap if Step * trunc ExitCount wraps or trunc
ExitCount strips leading 1s.

Fixes https://github.com/llvm/llvm-project/issues/156849.

PR: https://github.com/llvm/llvm-project/pull/156910
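
The two wrapping conditions named above can be modelled on an 8-bit AddRec that starts at zero. This is a simplified sketch, not the logic of ComputeEndCheck: `addRecMayWrapI8` is a hypothetical helper and the 8-bit width is chosen only to keep the numbers small.

```cpp
#include <cstdint>

// Would the 8-bit AddRec {0,+,Step} possibly wrap when the 64-bit ExitCount
// is truncated to 8 bits and multiplied by Step?
bool addRecMayWrapI8(uint8_t Step, uint64_t ExitCount) {
  const uint8_t TruncEC = static_cast<uint8_t>(ExitCount);
  // Truncation dropped set high bits, so the real count is larger.
  const bool TruncLosesBits = TruncEC != ExitCount;
  // Step * trunc(ExitCount) does not fit in 8 bits.
  const bool MulWraps = unsigned(Step) * unsigned(TruncEC) > 255u;
  return TruncLosesBits || MulWraps;
}
```

For instance, with a step of 1 and an exit count of 256, the truncated count is 0, so the computed end equals the start even though the 8-bit induction has wrapped.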
2025-09-05 15:13:11 +00:00
Phoebe Wang
94b164c218
[X86][AVX10] Remove EVEX512 and AVX10-256 implementations (#157034)
The 256-bit maximum vector register size control was removed from the AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343

We have warned about these options in LLVM 21 through #132542. This patch
removes the underlying implementations in LLVM 22.
2025-09-05 14:08:59 +00:00
Florian Hahn
74ec38fad0
[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730)
Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9

PR: https://github.com/llvm/llvm-project/pull/156730
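
Alongside the Alive2 proof, the identity can be checked exhaustively on 8-bit values. The sketch reads the title as `C * (A /u C)`, which is an assumption about the intended parenthesization; the function names are hypothetical.

```cpp
#include <cstdint>

// C * (A /u C), computed in 8-bit unsigned arithmetic.
uint8_t mulOfUDiv(uint8_t C, uint8_t A) {
  return static_cast<uint8_t>(C * (A / C));
}

// True if C * (A /u C) == A for every 8-bit A that is a multiple of C,
// for every power-of-two C from 1 to 128.
bool foldHoldsForAllI8() {
  for (unsigned Shift = 0; Shift < 8; ++Shift) {
    const uint8_t C = static_cast<uint8_t>(1u << Shift);
    for (unsigned A = 0; A < 256; ++A) {
      if (A % C != 0)
        continue; // The fold only applies when A is a multiple of C.
      if (mulOfUDiv(C, static_cast<uint8_t>(A)) != A)
        return false;
    }
  }
  return true;
}
```

When `A` is not a multiple of `C` the division rounds down and the product no longer equals `A`, for example `C = 4`, `A = 13` gives 12.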
2025-09-05 08:45:13 +00:00
Luke Lau
3f9e0736ac
[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304)
Following up from #150368, this moves folding common edge masks into
simplifyBlends.

One test in uniform-blend.ll ended up regressing but after looking at it
closely, it came from a weird (x && !x) edge mask. So I've just included
a simplification in this PR to fold that to false.
2025-09-05 01:29:22 +00:00
Luke Lau
4e5e65e55d
[VPlan] Only compute reg pressure if considered. NFCI (#156923)
In #149056 VF pruning was changed so that it only pruned VFs that
stemmed from MaxBandwidth being enabled.

However we always compute register pressure regardless of whether or not
max bandwidth is permitted for any VFs (via
`MaxPermissibleVFWithoutMaxBW`).

This skips the computation if not needed and renames the method for
clarity.

The diff in reg-usage.ll is due to the scalable VPlan not actually
having any maxbandwidth VFs, so I've changed it to check the
fixed-length VF instead, which is affected by maxbandwidth.
2025-09-05 00:23:47 +00:00
Shih-Po Hung
9876b06bc7
[LV] Add initial legality checks for loops with unbound loads. (#152422)
This patch splits out the legality checks from PR #151300, following the
landing of PR #128593.

It is a step toward supporting vectorization of early-exit loops that
contain potentially faulting loads.
In this commit, an early-exit loop is considered legal for vectorization
if it satisfies the following criteria:

1. it is a read-only loop.
2. all potentially faulting loads are unit-stride, which is the only
type currently supported by vp.load.ff.
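
The two criteria above can be sketched as a predicate over a simplified loop summary. The types and the function name are hypothetical and do not mirror LoopVectorizationLegality; the stride is counted in elements.

```cpp
#include <vector>

// Hypothetical summary of one load in the loop.
struct LoadInfo {
  bool MayFault; // Load is not known to be dereferenceable.
  long Stride;   // Access stride in elements; 1 means unit-stride.
};

// Hypothetical summary of an early-exit loop.
struct LoopSummary {
  bool WritesMemory;
  std::vector<LoadInfo> Loads;
};

// Criterion 1: the loop is read-only.
// Criterion 2: every potentially faulting load is unit-stride.
bool passesUnboundLoadChecks(const LoopSummary &L) {
  if (L.WritesMemory)
    return false;
  for (const LoadInfo &Ld : L.Loads)
    if (Ld.MayFault && Ld.Stride != 1)
      return false;
  return true;
}
```

Loads that cannot fault are not constrained by the second criterion in this sketch, so a strided but provably safe load would still pass.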
2025-09-05 08:20:16 +08:00
Florian Hahn
8796dfdcba
[VPlan] Consolidate logic to update loop metadata and profile info.
This patch consolidates updating loop metadata and profile info for both
the remainder and vector loops in a single place. This is NFC, modulo
consistently applying vectorization specific metadata also in the
experimental VPlan-native path.

Split off from https://github.com/llvm/llvm-project/pull/154510.
2025-09-04 21:50:40 +01:00
Hassnaa Hamdi
35b22764e2
[LV][AArch64] Prefer epilogue with fixed-width over scalable VF. (#155546)
In case of equal costs, prefer an epilogue with fixed-width over scalable VF.
That is helpful in cases like post-LTO vectorization where epilogue with
fixed-width VF can be removed when we eventually know that the trip count
is less than the epilogue iterations.
2025-09-04 19:31:30 +01:00
Florian Hahn
ec581e460a
[LV] Don't run instcombine for interleaved-accesses test.
Drop instcombine from the run-line to make test independent and make it
easier to follow the generated code for SCEV predicate checks.
2025-09-04 16:08:52 +01:00
Florian Hahn
a614807130
[LV] Add more tests for interleave groups requiring predicates.
Adds tests for https://github.com/llvm/llvm-project/issues/156849.

Also tidies up the existing related test a bit.
2025-09-04 15:45:15 +01:00
Florian Hahn
b400fd1151
[LAA] Support assumptions with non-constant deref sizes. (#156758)
Update evaluatePtrAddrecAtMaxBTCWillNotWrap to support non-constant
sizes in dereferenceable assumptions.

Apply loop-guards in a few places needed to reason about expressions
involving trip counts of the form (BTC - 1).

PR: https://github.com/llvm/llvm-project/pull/156758
2025-09-04 11:32:33 +01:00
Ramkumar Ramachandra
c14052e20b
[VPlan] Let Not preserve uniformity in isSingleScalar (#156676)
LogicalAnd and WidePtrAdd should also preserve uniformity, but we don't
have test coverage to enable adding them.
2025-09-04 11:27:14 +01:00
Ramkumar Ramachandra
e4c0b3e111
[VPlan] Simplify x && false -> false, x | 0 -> x (#156345)
The OR x, 0 -> x simplification has been introduced to avoid
regressions.
2025-09-04 10:29:59 +01:00
Florian Hahn
f1e91bff42
[LV] Regenerate more checks for missing branch weights. 2025-09-03 22:18:04 +01:00
Florian Hahn
ce5a1158b8
[LV] Regenerate checks for missing branch weights. 2025-09-03 21:37:52 +01:00
Florian Hahn
2729284db1
[LV] Add early-exit tests with deref assumptions and scaled sizes.
Add tests where the size of dereferenceable assumption is multiplied by
a constant.
2025-09-03 20:30:46 +01:00
Florian Hahn
a434a7a4f1
Reapply "[LAA,Loads] Use loop guards and max BTC if needed when checking deref. (#155672)"
This reverts commit f0df1e3dd4ec064821f673ced7d83e5a2cf6afa1.

Recommit with extra check for SCEVCouldNotCompute. Test has been added in
b16930204b.

Original message:
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.

The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.

Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).

PR: https://github.com/llvm/llvm-project/pull/155672
2025-09-03 12:45:28 +01:00
Mel Chen
2f5500e4cf
[LV] Improve the test coverage for strided access. nfc (#155981)
Add tests for strided access with UF > 1, and introduce a new test case
@constant_stride_reinterpret.
2025-09-03 10:19:36 +00:00