3811 Commits

Author SHA1 Message Date
Florian Hahn
3192fe2c7b
[VPlan] Fall back to legacy cost model if PtrSCEV is nullptr.
There are some cases when PtrSCEV can be nullptr. Fall back to legacy
cost model, to not call isLoopInvariant with nullptr.

Fixes a crash after 0c4f8094939d2.
2026-02-08 11:55:12 +00:00
Kewen Meng
703c2762d3
Revert "[LV] Support conditional scalar assignments of masked operations" (#180275)
Reverts llvm/llvm-project#178862 

revert to unblock bot:
https://lab.llvm.org/buildbot/#/builders/206/builds/13225
2026-02-06 13:24:40 -08:00
Florian Hahn
bd40d1de9c
Reapply "[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible. (#180257)"
This reverts commit cb905605b2e95f88296afe136b21a7d2476cb058.

Recommit the patch with a small change to check the destination
type matches the address type, to avoid a crash on mismatch.

Original message:

This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.

This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.

PR: https://github.com/llvm/llvm-project/pull/178727
2026-02-06 21:14:41 +00:00
Florian Hahn
cb905605b2
Revert "[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible." (#180257)
Reverts llvm/llvm-project#178727

triggers asserts in on some build bots
2026-02-06 18:26:37 +00:00
Florian Hahn
c32cde4182
[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible. (#178727)
This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.

This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.

PR: https://github.com/llvm/llvm-project/pull/178727
2026-02-06 17:38:24 +00:00
Ramkumar Ramachandra
901d175d18
[VPlan] Simplify x & AllOnes -> x (#180049) 2026-02-06 13:42:58 +00:00
Florian Hahn
fdce0ea708
[VPlan] Add ExitingIVValue VPInstruction. (#175651)
Add a new VPInstruction opcode to compute the exiting value of an
induction variable after vectorization. This replaces the pattern of
extracting the last lane from the last part of the induction backedge
value when applicable.

This allows us to always use the pre-computed IV end value. It will also
allow unifying end value creation for both induction resume and exit
values.

PR: https://github.com/llvm/llvm-project/pull/175651
2026-02-06 12:27:31 +00:00
Benjamin Maxwell
4f90eb6427
[LV] Support conditional scalar assignments of masked operations (#178862)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
2026-02-06 11:43:06 +00:00
Luke Lau
33a2c3ee9c
[VPlan] Ignore poison incoming values when creating blend (#180005)
We have an optimization in VPPredicator when creating blends where if
all the incoming values are the same, we just return that value.

This extends it to handle cases like "phi [%x, %x, poison, %x]" by
ignoring poison values.

This is split off from #176143 to prevent regressions when maintaining
SSA by adding PHIs with a poison incoming value.
2026-02-06 19:09:43 +08:00
Ramkumar Ramachandra
30986dc3ff
[LV] Regen a VPlan-printing test with UTC (#179948)
Post 49288b65 ([UTC] Add initial VPlan support, #178534), we can
generate VPlan-printing tests with UTC. Do it for one test, with the
caveat that two Final VPlan prints are no longer checked.
2026-02-05 22:52:42 +00:00
Florian Hahn
e524ee5a7d
[VPlan] Auto-generate some VPlan check lines.
Use new UTC support to auto-generate some check lines to make them
easier to update in the future.
2026-02-05 21:52:37 +00:00
David Green
8f484ff2a0
[AArch64] Add FeatureUseFixedOverScalableIfEqualCost to Neoverse-V3 and Neoverse-V3ae (#179903)
This was missing from neoverse-v3 and neoverse-v3ae, but should be
present like neoverse-v2.
2026-02-05 17:55:49 +00:00
Alexander Kornienko
7165353506
Revert "[LoopVectorize] Support vectorization of overflow intrinsics" (#179819)
Reverts llvm/llvm-project#174835, which causes clang crashes.

See
https://github.com/llvm/llvm-project/pull/174835#issuecomment-3844233831
and https://github.com/llvm/llvm-project/issues/179671 for details.
2026-02-05 15:41:49 +01:00
Florian Hahn
05a2b146fb
[LV] Optimize FindLast recurrences to FindIV (NFCI). (#177870)
This patch restructures Find(First|Last)IV handling. Instead of
differentiating between FindLast, FindFirstIV and FindLastIV up front,
this patch simplifies the logic in IVDescriptor to just identify the
FindLast pattern up-front.

It then adds a new VPlan transformation to optimize FindLast reductions
to FindIV reductions if there is a suitable sentinel value.
Find(Last|First)IV recurrence kinds to a single FindIV kind.

This is simpler and more accurate, given selecting the first/last
induction of the final IV reduction is directly controlled by the
corresponding recurrence kind of the ComputeReductionResult.

The new structure also allows further optimizations, like vectorizing
FindLastIV with another boolean reduction that tracks if the condition
in the loop was ever true, if there is no suitable sentinel value.

PR: https://github.com/llvm/llvm-project/pull/177870
2026-02-05 13:57:20 +00:00
nora
3bbf748a63
[VPlan] Create edge mask for single-destination switch (#179107)
When converting phis to blends, the `VPPredicator` expects to have edge
masks to the phi node if the phi node has different incoming blocks.
This was not the case if the predecessor of the phi was a switch where a
conditional destination was the same as the default destination.

This was because when creating edge masks in `createSwitchEdgeMasks`,
edge masks are set in a loop through the *non-default* destinations. But
when there are no non-default destinations (but at least one condition,
otherwise an earlier condition would trigger and just forward the source
mask), this loop is never executed, so the masks are never set.

To resolve this, we explicitly forward the source mask for these cases
as well, which is correct because it is an unconditional branch, just a
very convoluted one.

fixes #179074
2026-02-05 15:50:57 +08:00
Florian Hahn
d97ce9bc04 [LV] Use DomTree DFS numbers to sort early exit blocks.
properlyDominates does not provide a strict weak ordering. Use DFS in
numbers instead, to avoid ordering violations.
2026-02-04 21:52:27 +00:00
Florian Hahn
49288b6523
[UTC] Add initial VPlan support. (#178534)
Add support for extracting a VPlan from LV debug output and generalizing
matching for unnamed VPValues.

Once we have support for -vplan-print-after=xxxx we can strip the logic
to extract a VPlan manually. We cannot use regex, as we need to match
from start opening bracket to the correct closing bracket.

PR: PR: https://github.com/llvm/llvm-project/pull/178534
2026-02-04 20:29:22 +00:00
Florian Hahn
792f7b089a
[VPlan] Refine exit select check in transformtoPartialReduction.
Make sure we find the actual select for the exit users and only use it
for the final link in the chain. This fixes a miscompile after
90b3712d8a20efa2cbaadc177da576e485dce038.
2026-02-03 21:07:02 +00:00
Mel Chen
8c6658aca6
[VPlan] Sink recipes from the vector loop region in licm. (#168031)
When a recipe can be safely sunk and all of its users are outside the
vector loop region in the same dedicated exit block, the recipe does not
need to be executed on every iteration.
This patch extends the VPlan-based LICM (Loop Invariant Code Motion) to
also sink such recipes from the vector loop region into the exit block.
This reduces redundant computation and improves cost model accuracy.

TODO: Support nested loop sinking
TODO: Support sinking `VPReplicateRecipe` (requires `replicateByVF`
fixes)
TODO: Support recipes with multiple defined values (e.g., interleaved
loads)
TODO: Clone recipes without users to all exit blocks
TODO: Support PHI node users by checking incoming value blocks
TODO: Support sinking when users are in multiple blocks
TODO: Clone recipes when users are on multiple exit paths

Co-authored-by: Luke Lau <luke@igalia.com>

---------

Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
2026-02-03 07:57:15 +00:00
Florian Hahn
beb0e7e150
[VPlan] Fold (x | !x) -> true. (#177887)
PR: https://github.com/llvm/llvm-project/pull/177887
2026-02-01 20:12:21 +00:00
Florian Hahn
90b3712d8a
Reapply "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)"
This reverts commit d1e477b00b49c63ff4dd513eeb14a5b18bc055d7.

Recommit with a extra checks making sure extends are VPWidenCastRecipes,
rejecting VPReplicateRecipes.

Original message:
As a first step, move the existing partial reduction detection logic to
VPlan, trying to preserve the existing code structure & behavior as
closely as possible.

With this, partial reductions are detected and created together in a
single step.

This allows forming partial reductions and bundling them up if
profitable together in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/167851
2026-02-01 16:27:27 +00:00
Martin Storsjö
d1e477b00b Revert "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)"
This reverts commit f4e8cc1a2229dca76d21c8d37439c4c194b06b86.

This change wasn't NFC; it causes failed asserts when building
ffmpeg for i686 windows, see
https://github.com/llvm/llvm-project/pull/167851 for details.
2026-02-01 14:35:02 +02:00
Florian Hahn
f4e8cc1a22
[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)
As a first step, move the existing partial reduction detection logic to
VPlan, trying to preserve the existing code structure & behavior as
closely as possible.

With this, partial reductions are detected and created together in a
single step.

This allows forming partial reductions and bundling them up if
profitable together in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/167851
2026-01-31 19:44:46 +00:00
Florian Hahn
a0b99e32d3
[LV] Add additional partial reduction test coverage for #167851.
Add test cases for which earlier versions of
https://github.com/llvm/llvm-project/pull/167851 was not NFC.

Test chained_sext_adds is moved to a new file.
2026-01-30 20:31:32 +00:00
Andrei Elovikov
d8621d665d
Reapply "[VPlan] Add hidden -vplan-print-after-all option" (#178547)
Re-commit of https://github.com/llvm/llvm-project/pull/175839 after
fixing build without `LLVM_ENABLE_DUMP`.

This consists of the following changes:

* Merge several overloads of `VPlanTransforms::runPass` into a single
function to avoid code duplication.

* Add helper macro `RUN_VPLAN_PASS` to capture the transformation name
  and pass it to the helper above for printing.

* Add new `-vplan-print-after-all` option (somewhat similar to existing
  `-vplan-verify-each`).

* Add two empty passes `printAfterInitialConstruction`/`printFinalVPlan`
so that initial/final VPlans would be supported in `-vplan-print-after-all`

This follows the original future plans in
https://github.com/llvm/llvm-project/pull/123640.
2026-01-30 19:55:09 +00:00
Florian Hahn
abfd56293c
[VPlan] Mark VPActiveLaneMaskPHIRecipe as readnone. (#177886)
VPWidenActiveLaneMaskPHIRecipe does not have side-effects and also does
not access memory. Mark accordingly. This allows hoisting of some
invariant loads out of loops and also removing unused phi recipes in the
future.

In
llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll,
the hoisting makes vectorization profitable.

PR: https://github.com/llvm/llvm-project/pull/177886
2026-01-30 16:12:30 +00:00
Sander de Smalen
a726b1907a NFC: Cleanup AArch64/partial-reduce-chained.ll
This had some loop attributes that were unused.
Also cleaned up the flags a little bit.
2026-01-30 14:59:38 +00:00
Sander de Smalen
b4c7518a0f
[LV] Add support for extended fadd reductions (#178447)
This makes use of the llvm.vector.partial.reduce.fadd intrinsics added
in #163975 to handle the following with FDOT:
```
float32_t fdot(float16_t *src, int N) {
  float32_t sum = 0.0f;
  for (int i=0; i<N; ++i)
    sum += src[i];
  return sum;
}
```
2026-01-30 08:27:57 +00:00
Florian Hahn
eabcdb572b
Revert "[VPlan] Add hidden -vplan-print-after-all option (#175839)" (#178544)
This reverts commit 97e1df149de213b760aae4060ee9e25dc9908125.

It looks like the commit caused some build bot failures. Revert back to green
so the failures can be investigated.

https://lab.llvm.org/buildbot/#/builders/159/builds/39803
https://lab.llvm.org/buildbot/#/builders/2/builds/43204
2026-01-28 23:49:24 +00:00
Andrei Elovikov
97e1df149d
[VPlan] Add hidden -vplan-print-after-all option (#175839)
This consists of the following changes: 
        
* Merge several overloads of `VPlanTransforms::runPass` into a single
function
  to avoid code duplication.
  
* Add helper macro `RUN_VPLAN_PASS` to capture the transformation name
  and pass it to the helper above for printing.

* Add new `-vplan-print-after-all` option (somewhat similar to existing
  `-vplan-verify-each`).
  
* Add two empty passes `printAfterInitialConstruction`/`printFinalVPlan`
so that initial/final
   VPlans would be supported in `-vplan-print-after-all`

This follows the original future plans in
https://github.com/llvm/llvm-project/pull/123640.
2026-01-28 22:25:54 +00:00
Jay Foad
c75d371f57
[LLVM] Fix typo "LABLE" in test checks (#178451) 2026-01-28 17:31:05 +00:00
Florian Hahn
e36cd26618
[VPlan] Remove non-reductions after simplifications. (#176795)
In some cases, we identify patterns as reductions, even though they can
be simplified to a non-reduction.

Mark VPReductionPHIRecipe as not reading from memory & not having
side-effects, to clean them up.

We also need to remove ComputeReductionResult VPInstructions with
live-in arguments. This means there is actually no reduction, and we
need to fold it to the live in. Otherwise we would incorrectly reduce
the live-in.

PR: https://github.com/llvm/llvm-project/pull/176795
2026-01-28 15:51:08 +00:00
Damian Heaton
762ba885f9
[LV] Add support for llvm.vector.partial.reduce.fadd (#163975)
Allows the Loop Vectorizer to generate `llvm.vector.partial.reduce.fadd`
intrinsics when sequences which match its requirements are found.
2026-01-28 15:05:34 +00:00
Mel Chen
2f92d44043
[LV] Pre-commit test for sinking the recipe into vector early exit block. nfc (#177954)
Pre-commit for #168031
2026-01-28 04:17:12 +00:00
Jim Lin
0ed8e7230f
[VPlan] Create SCEV before any VPIRInstructions to check for overflow (#177911)
This PR tried to fix the assertion fail at VPlanTransforms.cpp:4862
since SCEV was created after VPIRInstructions.

The tripcount in scalable-predication.ll was changed from constant value
256 to non-constant value %n to avoid VPIRInstructions optimized out,
which cannot trigger the assertion fail.

The orders in ir-bb<entry> from:

ir-bb<entry>:
  EMIT vp<%2> = EXPAND SCEV (1 umax %n)
  EMIT vp<%3> = sub ir<-1>, vp<%2>
  EMIT vp<%4> = EXPAND SCEV (4 * vscale)<nuw>
  EMIT vp<%5> = icmp ult vp<%3>, vp<%4>
  EMIT branch-on-cond vp<%5>
Successor(s): scalar.ph, vector.ph

to:

ir-bb<entry>:
  EMIT vp<%2> = EXPAND SCEV (1 umax %n)
  EMIT vp<%3> = EXPAND SCEV (4 * vscale)<nuw>
  EMIT vp<%4> = sub ir<-1>, vp<%2>
  EMIT vp<%5> = icmp ult vp<%4>, vp<%3>
  EMIT branch-on-cond vp<%5>
Successor(s): scalar.ph, vector.ph
2026-01-28 03:16:50 +00:00
Tomer Shafir
4239e858fe
[AArch64] Align nontemporal store/load little-endian checks (#177468)
This patch aims to align all nontemporal store/load handling to
systematically enforce a little-endian target. This has been the
effective support LLVM had for NT store/load lowering (there has been no
effective support for big-endian, even with the inconsistencies).

The change in `llvm/lib/Target/AArch64/AArch64InstrInfo.td` is
effectively a NFC, because the only lowering of LDNP, in
`llvm/lib/Target/AArch64/AArch64ISelLowering.cpp`, have already checked
for `isLittleEndian`. The change in
`llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h` affects its
single caller
`llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp`. The
previous logic has been wrong, enabling vectorization of effectively
illegal nontemporal store/load instructions on big-endian.
2026-01-27 21:37:21 +02:00
Ryan Buchner
2753e1dedf
[RISCV] Set the reciprocal throughtput cost for division to TTI::TCC_Expensive (#177516)
Fixes #176208. Scaled back version of #176515 that only affects the RISCV backend.

Only modifies the cost for cases when DIV is a legal operation.

Updates the cost for both Scalar and Vector types.

Used `TTI::TCC_Expensive` as suggested by
https://github.com/llvm/llvm-project/issues/176208#issuecomment-3760902537.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2026-01-27 11:01:19 -08:00
Florian Hahn
537c648fc0
[LV] Precommit extra argmin/argmax tests for #170223.
Precommit extra tests for
https://github.com/llvm/llvm-project/pull/170223
2026-01-26 21:32:50 +00:00
Florian Hahn
7b445ddcd2
[LV] Add additional tests for early-exit loops loads not known deref.
Add additional test coverage for loops with loads that are not known to
be dereferenceable.
2026-01-25 23:09:58 +00:00
Mircea Trofin
b39568d782
[LV] capture branch weights for constant trip counts (#175096)
When a vectorized loop has constant trip, it's important to update the
profile information accordingly. Hotness analysis will only look at
profile info.

For example, in the `tripcount.ll` test, without producing the profile
info, in the `const_trip_over_profile` function, the BFI of the
`vector.body` would be 32 (this is the expected value when synthetic
branch weights are used, in loops). The real value is 250. The
`for.body`value was _very_ incorrect before, too (and detrimentally so,
as it would have appeared as "very hot" when it wasn't):

The table below was obtained by printing BFI in the RUN: command, i.e.
`build/bin/opt < llvm/test/Transforms/LoopVectorize/tripcount.ll
-passes="loop-vectorize,print<block-freq>"
-loop-vectorize-with-block-frequency -S -o /dev/null`. Showing only the
`float` value, i.e. the BFI relative to the function entry BB.

```

Printing analysis results of BFI for function 'const_trip_over_profile':  
block-frequency-info: const_trip_over_profile

```

| Block | Before | After |
| ----- | ------ | ----- |
| `entry` | float = 1.0 | float = 1.0 |
| `vector.ph` | float = 1.0 | float = 1.0 |
| `vector.body` | float = **32.0** | float = **250.0** |
| `middle.block` | float = 1.0 | float = 1.0 |
| `scalar.ph` | float = 1.0 | float = 1.0 |
| `for.body` | float = **2147483647.8** | float = **1.0** |
| `for.end` | float = 1.0 | float = 1.0 |
2026-01-23 14:31:05 -08:00
Mel Chen
149c76538e
[LV] Separate runtime check cost from total overhead in profitability check (#176754)
In isOutsideLoopWorkProfitable function, there are two places where only
the runtime check cost (RtC) should be used, but incorrectly included
the costs of middle blocks and early-exit blocks.
1. VectorizeMemoryCheckThreshold comparison for interleaving-only
2. Minimum trip count that bounds runtime check overhead, i.e. MinTC2
calculation
This results in an overly conservative minimum profitable trip count.

This patch separates the runtime check cost from the total overhead
cost, and uses only RtC for VectorizeMemoryCheckThreshold comparison and
the MinTC2 calculation.
2026-01-23 07:29:56 +00:00
Florian Hahn
7ea1fa591a
[LV] Skip FindLast reductions in collectInLoopReductions.
FindLast in-loop reductions are not supported, similarly to FindLastIV
reductions. Skip them in collectInLoopReductions, to avoid a crash for
loops with FindLast reductions and in-loop reductions preferred.
2026-01-22 21:49:52 +00:00
Florian Hahn
14a209f852
[VPlan] Replace ComputeFindIVRes with ComputeRdxRes + cmp + sel (NFC) (#176672)
Replace ComputeFindIVResult with ComputeReductionResult + explicit
compare + select, to more explicitly and simpler model computing finding
the first/last induction, which boils down to a min/max reduction +
compare and select of the sentinel value.

PR: https://github.com/llvm/llvm-project/pull/176672
2026-01-22 19:28:47 +00:00
Florian Hahn
f41767db9d
[LV] Add replicating load/store cost tests for Apple CPUs.
Add dedicated tests to check replicating load/store costs on Apple CPUs.
2026-01-21 16:37:12 +00:00
Florian Hahn
d3f2f1366d
[LV] Consider UserIC when limiting VF. (#174573)
If a UserIC is provided, the vector loop will process VF * UserIC. Pass
it through UserIC to computeFeasibleMaxVF and use it to limit the max VF
to factors where VF * UserIC <= MaxTripCount. This avoids creating dead
vector loops with user provided interleave counts.

PR: https://github.com/llvm/llvm-project/pull/174573
2026-01-20 14:19:11 +00:00
David Sherwood
abc924356e
[LV][NFC] Update low trip count tail-folding tests (#176898)
Whilst reviewing PR #176754 I realised there seemed to be some odd cost
model issues for the tests in file

LoopVectorize/AArch64/fold-tail-low-trip-count.ll

where we seemed to be vectorising loops that aren't worth it. It turns
out the tests were not targeting AArch64 despite being in the AArch64
directory. I fixed the RUN line for the file and also added a new file
for RISCV so we get more test coverage.
2026-01-20 12:15:50 +00:00
Florian Hahn
69fbab2b72
[VPlan] Fall back to legacy cost if operands may be force-scalarized.
If any of the operands of a VPReplicateRecipe have been
force-scalarized, then the legacy cost model skips the scalarization
overhead, but we cannot match this in the VPlan cost model.

Bail out for now in those very rare cases.

Fixes https://github.com/llvm/llvm-project/issues/176720.
2026-01-19 21:25:54 +00:00
Florian Hahn
e34fefdb35
[LV] Add extra tests with sink-able recipes.
Add extra test coverage for
https://github.com/llvm/llvm-project/pull/168031.
2026-01-19 18:33:55 +00:00
Florian Hahn
c6f3bc888b
[LV] Add single-iteration epilogue test with de-generate reduction.
While at it, also modernize the check lines to reduce diff of future
changes.
2026-01-19 17:43:40 +00:00
Florian Hahn
5e5d6389f6
[LV] Allow loops with multiple early exits in legality checks. (#176403)
This patch removes the single uncountable exit constraint, allowing
loops with multiple early exits, if the exits form a dominance chain and
all other constraints hold for all uncountable early exits.

While legality now accepts such loops, vectorization is not yet
supported. VPlan support will be added in a follow up:
https://github.com/llvm/llvm-project/pull/174864

PR: https://github.com/llvm/llvm-project/pull/176403
2026-01-19 12:32:04 +00:00