7134 Commits

Author SHA1 Message Date
Alexey Bataev
78490acb32 [SLP]Support for zext i1 %x modeling as select %x, 1, 0
Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are
other select instructions, which can be combined into a bundle.

Fixes #178403

Recommit after revert in 993e1f66afcfe9da03bd813e669eada341b11d2f

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/180635
2026-02-10 12:54:12 -08:00
Luke Lau
f81889da29
[VPlan] Fix convertToPhisToBlends folding non poison blend to poison (#180686)
This fixes a miscompile in #180005 where we didn't check that the first
incoming value isn't poison.

We should use the first non-poison incoming value if it exists, or just
poison if all the incoming values are poison.
2026-02-10 16:15:57 +00:00
Jonas Paulsson
d80a729572
[LoopVectorizer] Rename variable (NFC). (#180585)
Since TargetTransformInfo::enableAggressiveInterleaving(bool
HasReductions) takes the HasReductions argument, the LoopVectorizer
should save its returned value in a variable called AggressivelyInterleave
instead of AggressivelyInterleaveReductions.
2026-02-10 10:11:43 -06:00
Andrei Elovikov
f96c1ccc1e
[VPlan] Add -vplan-print-after= option (#178700)
UpdateTestChecks support is updated in subsequent
https://github.com/llvm/llvm-project/pull/178736.
2026-02-10 16:07:25 +00:00
Alexey Bataev
993e1f66af Revert "[SLP]Support for zext i1 %x modeling as select %x, 1, 0"
This reverts commit 70aebae2a13114f4e3d5e2460c052d8f3de295be to fix
buildbots https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flab.llvm.org%2Fbuildbot%2F%23%2Fbuilders%2F85%2Fbuilds%2F18614&data=05%7C02%7C%7Ce5641da3fe984280a6e908de68b3658c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C639063316889757116%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=65hUwLDdZkXq3zUEt3cVuqJNwXN7Alw4JKDggDbjeVk%3D&reserved=0
2026-02-10 06:49:53 -08:00
Alexey Bataev
70aebae2a1
[SLP]Support for zext i1 %x modeling as select %x, 1, 0
Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are
other select instructions, which can be combined into a bundle.

Fixes #178403

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/180635
2026-02-10 08:59:44 -05:00
Sander de Smalen
3157758190
[LV] Handle partial sub-reductions with sub in middle block. (#178919)
Sub-reductions can be implemented in two ways:
(1) negate the operand in the vector loop (the default way).
(2) subtract the reduced value from the init value in the middle block.

Note that both ways keep the reduction itself as an 'add' reduction,
which is necessary because only llvm.vector.partial.reduce.add exists.

The ISD nodes for partial reductions don't support folding the
sub/negation into its operands because the following is not a valid
transformation:
```
     sub(0, mul(ext(a), ext(b)))
  -> mul(ext(a), ext(sub(0, b)))
```
It can therefore be better to choose option (2) such that the partial
reduction is always positive (starting at '0') and to do a final
subtract in the middle block.

For AArch64 there are no dot-product instructions that can
do a `partial.reduce.sub(acc, mul(ext(a), ext(b)))` operation.
I'm not sure if such instructions exist for other targets.
(If so then we may want to make this decision a target option)

This PR also increases the AArch64 cost of a partial sub-reduction
when this exists in an 'add-sub' reduction chain.

Fixes https://github.com/llvm/llvm-project/issues/178703
2026-02-10 11:00:32 +00:00
Benjamin Maxwell
f22a178b13
Reland "[LV] Support conditional scalar assignments of masked operations" (#180708)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.

---

Reverts llvm/llvm-project#180275 (original PR: #178862)

Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was
added in #180290, which should resolve the backend crashes on x86.
2026-02-10 09:57:48 +00:00
Mel Chen
7e5d9189d2
[VPlan] Simplify true && x -> x (#179426) 2026-02-10 08:49:03 +00:00
Florian Hahn
d1ec04dfd4
[VPlan] Simplify single-entry VPWidenPHIRecipe.
Include VPWidenPHIRecipe in phi simplification if there's a single
incoming value.
2026-02-09 22:10:13 +00:00
Vishruth Thimmaiah
84f4b1e52d Reland "[LoopVectorize] Support vectorization of overflow intrinsics" (#180526)
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`,
`ssub`, `umul` and `smul` as trivially vectorizable.

Fixes #174617

---

This patch is a reland of #174835.

Reverts #179819
2026-02-09 15:32:04 +00:00
David Sherwood
44031ae79f
[LV] Fix issue in VPFirstOrderRecurrencePHIRecipe::usesFirstLaneOnly (#179977)
In some cases we decide to vectorise loops with first-order recurrences
using VF=1, IC>1. We then attempt to unroll a vplan in replicateByVF,
however when trying to erase the list of values from the parent we
trigger the following assert:

```
virtual llvm::VPRecipeValue::~VPRecipeValue(): Assertion `Users.empty()
  && "trying to delete a VPRecipeValue with remaining users"' failed.
```

The problem seems to stem from this code:

```
  DefR->replaceUsesWithIf(LaneDefs[0], [DefR](VPUser &U, unsigned) {
    return U.usesFirstLaneOnly(DefR);
  });
```

since usesFirstLaneOnly returns false and we fail to replace uses of
DefR with LaneDefs[0]. Upon inspection the only VPUser objects that
return false are VPInstruction::FirstOrderRecurrenceSplice and
VPFirstOrderRecurrencePHIRecipe. Since the values are all scalar it's
simply not possible for us to be using anything other than the first
lane. I've fixed this by bailing out of replicateByVF early for plans with
only a scalar VF.

Fixes https://github.com/llvm/llvm-project/issues/179671
2026-02-09 13:42:26 +00:00
Florian Hahn
7defb0a4a3
[VPlan] Skip applying InstsToScalarize with forced instr costs. (#168269)
ForceTargetInstructionCost in the legacy cost model overrides any costs
from InstsToScalarize. Match the behavior in the VPlan-based cost model.
This fixes a crash with -force-target-instr-cost for the added test
case.

PR: https://github.com/llvm/llvm-project/pull/168269
2026-02-09 13:20:44 +00:00
Luke Lau
8cd86ff284
[VPlan] Propagate FastMathFlags from phis to blends (#180226)
If a phi has fast math flags, we can propagate it to the widened select.
To do this, this patch makes VPPhi and VPBlendRecipe subclasses of
VPRecipeWithIRFlags, and propagates it through PlainCFGBuilder and
VPPredicator.

Alive2 proofs for some of the FMFs (it looks like it can't reason about
the full "fast" set yet)
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T

The actual motivation for this to eventually be able to move the special
casing for tail folding in
LoopVectorizationPlanner::addReductionResultComputation into the CFG in
#176143, which requires passing through FMFs.
2026-02-09 19:38:58 +08:00
Florian Hahn
6324ee32c1
[VPlan] Use PredBB's terminator as insert point for VPIRPhi extracts.
Use PredBB's terminator as insert point in VPIRPhi::execute to make sure
the extracts are placed after any possibly sunk instructions.

Fixes https://github.com/llvm/llvm-project/issues/180363.
2026-02-08 20:36:36 +00:00
Florian Hahn
7509cad693
[VPlan] Support masked VPInsts, use for predication (NFC) (#142285)
Add support for mask operands to most VPInstructions, using
getNumOperandsForOpcode.

This allows VPlan predication to predicate VPInstructions directly. The
mask will then be dropped or handled when creating wide recipes.

Depends on https://github.com/llvm/llvm-project/pull/142284.
Depends on https://github.com/llvm/llvm-project/pull/168784.

PR: https://github.com/llvm/llvm-project/pull/142285
2026-02-08 18:23:36 +00:00
Florian Hahn
3c5b05427d
[VPlan] Pass underlying instr to getMemoryOpCost in ::computeCost.
Pass underlying instruction to getMemoryOpCost in
VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true.
Some targets use the underlying instruction to improve costs,
and this is needed to match the legacy cost model.

Fixes https://github.com/llvm/llvm-project/issues/177780.
Fixes https://github.com/llvm/llvm-project/issues/177772.
2026-02-08 16:15:39 +00:00
Florian Hahn
3192fe2c7b
[VPlan] Fall back to legacy cost model if PtrSCEV is nullptr.
There are some cases when PtrSCEV can be nullptr. Fall back to legacy
cost model, to not call isLoopInvariant with nullptr.

Fixes a crash after 0c4f8094939d2.
2026-02-08 11:55:12 +00:00
Florian Hahn
0c4f809493
[VPlan] Compute predicated load/store costs in VPlan. (NFC) (#179129)
Update VPReplicateReicpe::computeCost to compute predicated load/store
costs directly, unless the pointer is uniform. In that case, the legacy
cost model uses a different logic, which will be migrated separately.

PR: https://github.com/llvm/llvm-project/pull/179129
2026-02-07 20:02:54 +00:00
Kewen Meng
703c2762d3
Revert "[LV] Support conditional scalar assignments of masked operations" (#180275)
Reverts llvm/llvm-project#178862 

revert to unblock bot:
https://lab.llvm.org/buildbot/#/builders/206/builds/13225
2026-02-06 13:24:40 -08:00
Luke Lau
d1d9413e7b [VPlan] Don't use std::not_fn
It looks like some Apple based toolchains can't compile this:
https://github.com/llvm/llvm-project/pull/180005#issuecomment-3861614477

There aren't any other users of std::not_fn within LLVM so just use a
lambda for now.
2026-02-07 01:30:55 +08:00
Rahul Joshi
017ca73917
[NFC][LLVM] Remove pass initialization from pass constructors (#180158) 2026-02-06 09:06:32 -08:00
Ramkumar Ramachandra
22a16623e1
[VPlan] Fix comments in simplifyRecipe around BinaryOr (NFC) (#180050) 2026-02-06 13:43:30 +00:00
Ramkumar Ramachandra
901d175d18
[VPlan] Simplify x & AllOnes -> x (#180049) 2026-02-06 13:42:58 +00:00
Florian Hahn
fdce0ea708
[VPlan] Add ExitingIVValue VPInstruction. (#175651)
Add a new VPInstruction opcode to compute the exiting value of an
induction variable after vectorization. This replaces the pattern of
extracting the last lane from the last part of the induction backedge
value when applicable.

This allows us to always use the pre-computed IV end value. It will also
allow unifying end value creation for both induction resume and exit
values.

PR: https://github.com/llvm/llvm-project/pull/175651
2026-02-06 12:27:31 +00:00
Florian Hahn
689c99557f
[VectorCombine] Skip dead shufflevector in GetIndexRangeInShuffles to fix crash. (#179217)
Update GetIndexRangeInShuffles to skip unused shuffles. This matches the
behavior in the loop below and without it, we end up with an index
mis-match, causing a crash for the added test case.

PR: https://github.com/llvm/llvm-project/pull/179217
2026-02-06 12:05:47 +00:00
Benjamin Maxwell
4f90eb6427
[LV] Support conditional scalar assignments of masked operations (#178862)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
2026-02-06 11:43:06 +00:00
Julian Pokrovsky
3f4d94fd4c
[VectorCombine] foldShuffleOfBinops - support multiple uses of shuffled binops (#179429)
Resolves #173035
2026-02-06 11:10:20 +00:00
Luke Lau
33a2c3ee9c
[VPlan] Ignore poison incoming values when creating blend (#180005)
We have an optimization in VPPredicator when creating blends where if
all the incoming values are the same, we just return that value.

This extends it to handle cases like "phi [%x, %x, poison, %x]" by
ignoring poison values.

This is split off from #176143 to prevent regressions when maintaining
SSA by adding PHIs with a poison incoming value.
2026-02-06 19:09:43 +08:00
Alexey Bataev
fe754dff6d [SLP]Remove LoadCombine workaround after handling of the copyables
LoadCombine pattern handling was added as a workaround for the cases,
where the SLP vectorizer could not vectorize the code effectively. With
the copyables support, it can handle it directly.

Also, patch adds support for scalar loads[ + bswap] pattern for byte
sized loads (+ reverse bytes for bswap)

Recommit after revert in 6377c86d718232fe60c548dfd7ab439f7ff84df7

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/174205
2026-02-05 11:16:08 -08:00
Alexey Bataev
6377c86d71 Revert "[SLP]Remove LoadCombine workaround after handling of the copyables"
This reverts commit 8dbb9f66e8b14a8a06f1873a2c1b7dce366ed2d6 to fix
buildbot issues https://lab.llvm.org/buildbot/#/builders/224/builds/2795
2026-02-05 09:57:00 -08:00
Alexey Bataev
8dbb9f66e8
[SLP]Remove LoadCombine workaround after handling of the copyables
LoadCombine pattern handling was added as a workaround for the cases,
where the SLP vectorizer could not vectorize the code effectively. With
the copyables support, it can handle it directly.

Also, patch adds support for scalar loads[ + bswap] pattern for byte
sized loads (+ reverse bytes for bswap)

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/174205
2026-02-05 10:42:08 -05:00
Alexander Kornienko
7165353506
Revert "[LoopVectorize] Support vectorization of overflow intrinsics" (#179819)
Reverts llvm/llvm-project#174835, which causes clang crashes.

See
https://github.com/llvm/llvm-project/pull/174835#issuecomment-3844233831
and https://github.com/llvm/llvm-project/issues/179671 for details.
2026-02-05 15:41:49 +01:00
Florian Hahn
05a2b146fb
[LV] Optimize FindLast recurrences to FindIV (NFCI). (#177870)
This patch restructures Find(First|Last)IV handling. Instead of
differentiating between FindLast, FindFirstIV and FindLastIV up front,
this patch simplifies the logic in IVDescriptor to just identify the
FindLast pattern up-front.

It then adds a new VPlan transformation to optimize FindLast reductions
to FindIV reductions if there is a suitable sentinel value.
Find(Last|First)IV recurrence kinds to a single FindIV kind.

This is simpler and more accurate, given selecting the first/last
induction of the final IV reduction is directly controlled by the
corresponding recurrence kind of the ComputeReductionResult.

The new structure also allows further optimizations, like vectorizing
FindLastIV with another boolean reduction that tracks if the condition
in the loop was ever true, if there is no suitable sentinel value.

PR: https://github.com/llvm/llvm-project/pull/177870
2026-02-05 13:57:20 +00:00
Valeriy Savchenko
92c26bb1a5
[VectorCombine] Fix crash in foldEquivalentReductionCmp on i1 vector (#179917) 2026-02-05 12:43:48 +00:00
nora
3bbf748a63
[VPlan] Create edge mask for single-destination switch (#179107)
When converting phis to blends, the `VPPredicator` expects to have edge
masks to the phi node if the phi node has different incoming blocks.
This was not the case if the predecessor of the phi was a switch where a
conditional destination was the same as the default destination.

This was because when creating edge masks in `createSwitchEdgeMasks`,
edge masks are set in a loop through the *non-default* destinations. But
when there are no non-default destinations (but at least one condition,
otherwise an earlier condition would trigger and just forward the source
mask), this loop is never executed, so the masks are never set.

To resolve this, we explicitly forward the source mask for these cases
as well, which is correct because it is an unconditional branch, just a
very convoluted one.

fixes #179074
2026-02-05 15:50:57 +08:00
Florian Hahn
6e98a93357
[LV] Make sure DFS numbers are valid before use.
This fixes random crashes with expensive checks after
d97ce9bc04394de6464b0278fe2d250bcc8f3a1b, as accessing DFS numbers is
only safe after calling updateDFSNumbers.
2026-02-04 22:41:36 +00:00
Florian Hahn
d97ce9bc04 [LV] Use DomTree DFS numbers to sort early exit blocks.
properlyDominates does not provide a strict weak ordering. Use DFS in
numbers instead, to avoid ordering violations.
2026-02-04 21:52:27 +00:00
Florian Hahn
792f7b089a
[VPlan] Refine exit select check in transformtoPartialReduction.
Make sure we find the actual select for the exit users and only use it
for the final link in the chain. This fixes a miscompile after
90b3712d8a20efa2cbaadc177da576e485dce038.
2026-02-03 21:07:02 +00:00
Andrei Elovikov
d510c4c3f3
[VPlan] Generalize VPAllSuccessorsIterator to support predecessors (#178724)
To be used in Mel's https://github.com/llvm/llvm-project/pull/173265.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
2026-02-03 21:03:09 +00:00
Alexey Bataev
46a38488a4 [SLP]Disable modeling disjoint reduction or as bitcast for big endian
Big endian targets cannot be modeled as bitcast, need to support it as
a reversion/bswap instead, just disabling it for now.
2026-02-03 06:16:25 -08:00
Florian Hahn
8240cf337a
[VPlan] Always set flags for overflowing ops etc via VPIRFlags. (#179138)
Enforce that all VPInstructions set the correct OpType of the VPIRFlags.
Flag mis-matches (e.g. VPInstruction Add without `OverflowingBinOp`
being set) can cause crashes (e.g. in CSE) or potentially mis-compiles.

Add a few helpers in VPBuilder to create common instructions with
correct flags.

PR: https://github.com/llvm/llvm-project/pull/179138
2026-02-03 12:33:23 +00:00
Mel Chen
8c6658aca6
[VPlan] Sink recipes from the vector loop region in licm. (#168031)
When a recipe can be safely sunk and all of its users are outside the
vector loop region in the same dedicated exit block, the recipe does not
need to be executed on every iteration.
This patch extends the VPlan-based LICM (Loop Invariant Code Motion) to
also sink such recipes from the vector loop region into the exit block.
This reduces redundant computation and improves cost model accuracy.

TODO: Support nested loop sinking
TODO: Support sinking `VPReplicateRecipe` (requires `replicateByVF`
fixes)
TODO: Support recipes with multiple defined values (e.g., interleaved
loads)
TODO: Clone recipes without users to all exit blocks
TODO: Support PHI node users by checking incoming value blocks
TODO: Support sinking when users are in multiple blocks
TODO: Clone recipes when users are on multiple exit paths

Co-authored-by: Luke Lau <luke@igalia.com>

---------

Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
2026-02-03 07:57:15 +00:00
Ramkumar Ramachandra
350b138313
[VPlan] Improve code around ArrayRef construction (NFC) (#179191) 2026-02-02 23:01:12 +00:00
Ryan Buchner
e5b99502d7
[SLP] Avoid adding duplicate VFs into vectorizeStores()::CandidateVFs (#179296)
Small compile time improvement:
```
stage1-O3: (-0.01%)
stage1-ReleaseThinLTO (-0.00%)
stage1-ReleaseLTO-g (-0.01%)
stage1-O0-g (-0.00%)
stage1-aarch64-O3 (+0.01%)
stage1-aarch64-O0-g (-0.02%)
stage2-O3 (-0.00%)
stage2-O0-g (-0.03%)
stage2-clang (+0.00%)
```

Also changes/removes a few comments for clarity.
2026-02-02 11:03:27 -08:00
Hans Wennborg
2ee37cc4cf Revert "[VectorCombine] Trim low end of loads used in shufflevector rebroadcasts. (#149093)"
It appears to create loads from underaligned addresses. See comment on
the PR.

> Following on from #128938, trim the low end of loads where only some of
> the incoming lanes are used for rebroadcasts in shufflevector
> instructions.

This reverts commit 6c8d9d0c4da51c7f9e7671902be3ad9b65d56c84 and the
follow-up commits 07a6a23f6c5387fc1e7df174b5921f6004db64e0 and
313a2008538abb61ab13f8cc9f9a712f7faff3a5.
2026-02-02 18:50:02 +01:00
Luke Lau
bb14eabaca
[VPlan] Split out EVL exit cond transform from canonicalizeEVLLoops. NFC (#178181)
This is split out from #177114.

In order to make canonicalizeEVLLoops a generic "convert to variable
stepping" transform, move the code that changes the exit condition to a
separate transform since not all variable stepping loops will want to
transform the exit condition. Run it before canonicalizeEVLLoops before
VPEVLBasedIVPHIRecipe is expanded.

Also relax the assertion for VPInstruction::ExplicitVectorLength to just
bail instead, since eventually VPEVLBasedIVPHIRecipe will be used by
other loops that aren't EVL tail folded.
2026-02-02 04:45:43 +00:00
Florian Hahn
b0d95f0c7b
[VPlan] Handle Mul/UDiv in getSCEVExprForVPValue (NFCI).
Support Mul/UDiv and AND-variant (https://alive2.llvm.org/ce/z/rBJVdg)
in getSCEVExprForVPValue.

This is used in code paths when computing SCEV expressions in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
2026-02-01 21:41:30 +00:00
Florian Hahn
beb0e7e150
[VPlan] Fold (x | !x) -> true. (#177887)
PR: https://github.com/llvm/llvm-project/pull/177887
2026-02-01 20:12:21 +00:00
Florian Hahn
90b3712d8a
Reapply "[VPlan] Detect and create partial reductions in VPlan. (NFCI) (#167851)"
This reverts commit d1e477b00b49c63ff4dd513eeb14a5b18bc055d7.

Recommit with a extra checks making sure extends are VPWidenCastRecipes,
rejecting VPReplicateRecipes.

Original message:
As a first step, move the existing partial reduction detection logic to
VPlan, trying to preserve the existing code structure & behavior as
closely as possible.

With this, partial reductions are detected and created together in a
single step.

This allows forming partial reductions and bundling them up if
profitable together in a follow-up.

PR: https://github.com/llvm/llvm-project/pull/167851
2026-02-01 16:27:27 +00:00