602 Commits

Author SHA1 Message Date
Benjamin Maxwell
f22a178b13
Reland "[LV] Support conditional scalar assignments of masked operations" (#180708)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.

---

Reverts llvm/llvm-project#180275 (original PR: #178862)

Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was
added in #180290, which should resolve the backend crashes on x86.
2026-02-10 09:57:48 +00:00
Vishruth Thimmaiah
84f4b1e52d Reland "[LoopVectorize] Support vectorization of overflow intrinsics" (#180526)
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`,
`ssub`, `umul` and `smul` as trivially vectorizable.

Fixes #174617

---

This patch is a reland of #174835.

Reverts #179819
2026-02-09 15:32:04 +00:00
David Sherwood
44031ae79f
[LV] Fix issue in VPFirstOrderRecurrencePHIRecipe::usesFirstLaneOnly (#179977)
In some cases we decide to vectorise loops with first-order recurrences
using VF=1, IC>1. We then attempt to unroll a vplan in replicateByVF,
however when trying to erase the list of values from the parent we
trigger the following assert:

```
virtual llvm::VPRecipeValue::~VPRecipeValue(): Assertion `Users.empty()
  && "trying to delete a VPRecipeValue with remaining users"' failed.
```

The problem seems to stem from this code:

```
  DefR->replaceUsesWithIf(LaneDefs[0], [DefR](VPUser &U, unsigned) {
    return U.usesFirstLaneOnly(DefR);
  });
```

since usesFirstLaneOnly returns false and we fail to replace uses of
DefR with LaneDefs[0]. Upon inspection the only VPUser objects that
return false are VPInstruction::FirstOrderRecurrenceSplice and
VPFirstOrderRecurrencePHIRecipe. Since the values are all scalar it's
simply not possible for us to be using anything other than the first
lane. I've fixed this by bailing out of replicateByVF early for plans with
only a scalar VF.

Fixes https://github.com/llvm/llvm-project/issues/179671
2026-02-09 13:42:26 +00:00
Luke Lau
8cd86ff284
[VPlan] Propagate FastMathFlags from phis to blends (#180226)
If a phi has fast math flags, we can propagate it to the widened select.
To do this, this patch makes VPPhi and VPBlendRecipe subclasses of
VPRecipeWithIRFlags, and propagates it through PlainCFGBuilder and
VPPredicator.

Alive2 proofs for some of the FMFs (it looks like it can't reason about
the full "fast" set yet)
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T

The actual motivation for this to eventually be able to move the special
casing for tail folding in
LoopVectorizationPlanner::addReductionResultComputation into the CFG in
#176143, which requires passing through FMFs.
2026-02-09 19:38:58 +08:00
Florian Hahn
6324ee32c1
[VPlan] Use PredBB's terminator as insert point for VPIRPhi extracts.
Use PredBB's terminator as insert point in VPIRPhi::execute to make sure
the extracts are placed after any possibly sunk instructions.

Fixes https://github.com/llvm/llvm-project/issues/180363.
2026-02-08 20:36:36 +00:00
Florian Hahn
7509cad693
[VPlan] Support masked VPInsts, use for predication (NFC) (#142285)
Add support for mask operands to most VPInstructions, using
getNumOperandsForOpcode.

This allows VPlan predication to predicate VPInstructions directly. The
mask will then be dropped or handled when creating wide recipes.

Depends on https://github.com/llvm/llvm-project/pull/142284.
Depends on https://github.com/llvm/llvm-project/pull/168784.

PR: https://github.com/llvm/llvm-project/pull/142285
2026-02-08 18:23:36 +00:00
Florian Hahn
3c5b05427d
[VPlan] Pass underlying instr to getMemoryOpCost in ::computeCost.
Pass underlying instruction to getMemoryOpCost in
VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true.
Some targets use the underlying instruction to improve costs,
and this is needed to match the legacy cost model.

Fixes https://github.com/llvm/llvm-project/issues/177780.
Fixes https://github.com/llvm/llvm-project/issues/177772.
2026-02-08 16:15:39 +00:00
Florian Hahn
3192fe2c7b
[VPlan] Fall back to legacy cost model if PtrSCEV is nullptr.
There are some cases when PtrSCEV can be nullptr. Fall back to legacy
cost model, to not call isLoopInvariant with nullptr.

Fixes a crash after 0c4f8094939d2.
2026-02-08 11:55:12 +00:00
Florian Hahn
0c4f809493
[VPlan] Compute predicated load/store costs in VPlan. (NFC) (#179129)
Update VPReplicateReicpe::computeCost to compute predicated load/store
costs directly, unless the pointer is uniform. In that case, the legacy
cost model uses a different logic, which will be migrated separately.

PR: https://github.com/llvm/llvm-project/pull/179129
2026-02-07 20:02:54 +00:00
Kewen Meng
703c2762d3
Revert "[LV] Support conditional scalar assignments of masked operations" (#180275)
Reverts llvm/llvm-project#178862 

revert to unblock bot:
https://lab.llvm.org/buildbot/#/builders/206/builds/13225
2026-02-06 13:24:40 -08:00
Florian Hahn
fdce0ea708
[VPlan] Add ExitingIVValue VPInstruction. (#175651)
Add a new VPInstruction opcode to compute the exiting value of an
induction variable after vectorization. This replaces the pattern of
extracting the last lane from the last part of the induction backedge
value when applicable.

This allows us to always use the pre-computed IV end value. It will also
allow unifying end value creation for both induction resume and exit
values.

PR: https://github.com/llvm/llvm-project/pull/175651
2026-02-06 12:27:31 +00:00
Benjamin Maxwell
4f90eb6427
[LV] Support conditional scalar assignments of masked operations (#178862)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
2026-02-06 11:43:06 +00:00
Alexander Kornienko
7165353506
Revert "[LoopVectorize] Support vectorization of overflow intrinsics" (#179819)
Reverts llvm/llvm-project#174835, which causes clang crashes.

See
https://github.com/llvm/llvm-project/pull/174835#issuecomment-3844233831
and https://github.com/llvm/llvm-project/issues/179671 for details.
2026-02-05 15:41:49 +01:00
Florian Hahn
8240cf337a
[VPlan] Always set flags for overflowing ops etc via VPIRFlags. (#179138)
Enforce that all VPInstructions set the correct OpType of the VPIRFlags.
Flag mis-matches (e.g. VPInstruction Add without `OverflowingBinOp`
being set) can cause crashes (e.g. in CSE) or potentially mis-compiles.

Add a few helpers in VPBuilder to create common instructions with
correct flags.

PR: https://github.com/llvm/llvm-project/pull/179138
2026-02-03 12:33:23 +00:00
Ramkumar Ramachandra
a19cbc4b77
[VPlan] Rename VectorEndPointer's IndexedTy to SourceElementTy (NFC) (#178856)
For consistency with IR terminology.
2026-02-01 11:30:26 +00:00
Florian Hahn
abfd56293c
[VPlan] Mark VPActiveLaneMaskPHIRecipe as readnone. (#177886)
VPWidenActiveLaneMaskPHIRecipe does not have side-effects and also does
not access memory. Mark accordingly. This allows hoisting of some
invariant loads out of loops and also removing unused phi recipes in the
future.

In
llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll,
the hoisting makes vectorization profitable.

PR: https://github.com/llvm/llvm-project/pull/177886
2026-01-30 16:12:30 +00:00
Sander de Smalen
b4c7518a0f
[LV] Add support for extended fadd reductions (#178447)
This makes use of the llvm.vector.partial.reduce.fadd intrinsics added
in #163975 to handle the following with FDOT:
```
float32_t fdot(float16_t *src, int N) {
  float32_t sum = 0.0f;
  for (int i=0; i<N; ++i)
    sum += src[i];
  return sum;
}
```
2026-01-30 08:27:57 +00:00
Florian Hahn
e36cd26618
[VPlan] Remove non-reductions after simplifications. (#176795)
In some cases, we identify patterns as reductions, even though they can
be simplified to a non-reduction.

Mark VPReductionPHIRecipe as not reading from memory & not having
side-effects, to clean them up.

We also need to remove ComputeReductionResult VPInstructions with
live-in arguments. This means there is actually no reduction, and we
need to fold it to the live in. Otherwise we would incorrectly reduce
the live-in.

PR: https://github.com/llvm/llvm-project/pull/176795
2026-01-28 15:51:08 +00:00
Damian Heaton
762ba885f9
[LV] Add support for llvm.vector.partial.reduce.fadd (#163975)
Allows the Loop Vectorizer to generate `llvm.vector.partial.reduce.fadd`
intrinsics when sequences which match its requirements are found.
2026-01-28 15:05:34 +00:00
Florian Hahn
b794baf8e7
[TTI] Add VectorInstrContext for context-aware insert/extract costs. (#175982)
This commit introduces the VectorInstrContext (VIC) infrastructure to
improve cost estimates for insert/extracts based on the context
instruction in which the insert/extract is used.

This is similar to CastContextHint, and allows providing context on how
the insert/extract is going to be used before creating IR. This is
useful in the LoopVectorizer, where costs need to estimated before
creating IR.

The new hint currently only replaces an existing check in AArch64,
but new uses will be introduced in follow-ups, including
https://github.com/llvm/llvm-project/pull/177201.

PR: https://github.com/llvm/llvm-project/pull/175982
2026-01-27 16:30:29 +00:00
Florian Hahn
a871b707b7
Reapply "[VPlan] Move VDef subclass ID to VPRecipeBase (NFC). (#174282)"
Move SubclassID to VPRecipeBase, and store VPRecipeBase directly in
VPRecipeValue, instead of VPDef. This allows for some additional
simplifications and VPDef now just holds various helpers to deal with
removing and adding VPValues.

This reverts commit 16395da0ff577750571b99fe28281ce6fb6a3ae8.

PR: https://github.com/llvm/llvm-project/pull/174282
2026-01-24 13:22:48 +00:00
Florian Hahn
16395da0ff
Revert "[VPlan] Fold VPDef into VPRecipeBase (NFC). (#174282)"
This reverts commit f3ae334f4b7a8cf4fe0eb6ee7b2f2ef0879f522d.

Committed with out-of-date message, revert to reland with updated
message.
2026-01-24 13:16:45 +00:00
Florian Hahn
f3ae334f4b
[VPlan] Fold VPDef into VPRecipeBase (NFC). (#174282)
A separate VDef is not needed any longer, fold i into VPRecipeBase to
simplify code and class hierarchy.

Depends on https://github.com/llvm/llvm-project/pull/172758.

PR: https://github.com/llvm/llvm-project/pull/174282
2026-01-24 13:16:12 +00:00
Florian Hahn
14a209f852
[VPlan] Replace ComputeFindIVRes with ComputeRdxRes + cmp + sel (NFC) (#176672)
Replace ComputeFindIVResult with ComputeReductionResult + explicit
compare + select, to more explicitly and simpler model computing finding
the first/last induction, which boils down to a min/max reduction +
compare and select of the sentinel value.

PR: https://github.com/llvm/llvm-project/pull/176672
2026-01-22 19:28:47 +00:00
Luke Lau
2036bc524b
[IR] Update IRBuilder::createVectorSplice to allow variable offsets (#177178)
Following on from #174693, this updates IRBuilder to allow variable
offsets, and splits the createVectorSplice function into two functions
for left and right splices.

We could preserve the existing createVectorSplice API but given there's
only one LLVM-internal user of it in the loop vectorizer, and the notion
of a negative offset doesn't exist in the intrinsics anymore, I've
removed it. Happy to add it back if reviewers prefer though.

I've also added unit tests since createVectorSpliceLeft has no coverage
otherwise.
2026-01-22 04:21:57 +00:00
Florian Hahn
dd363d0629
[VPlan] Replace UnrollPart for VPScalarIVSteps with start index op (NFC) (#170906)
Replace the unroll part operand for VPScalarIVStepsRecipe with the start
index. This simplifies https://github.com/llvm/llvm-project/pull/170053
and is also a first step to break down the recipe into its components.

PR: https://github.com/llvm/llvm-project/pull/170906
2026-01-21 22:13:13 +00:00
Florian Hahn
69fbab2b72
[VPlan] Fall back to legacy cost if operands may be force-scalarized.
If any of the operands of a VPReplicateRecipe have been
force-scalarized, then the legacy cost model skips the scalarization
overhead, but we cannot match this in the VPlan cost model.

Bail out for now in those very rare cases.

Fixes https://github.com/llvm/llvm-project/issues/176720.
2026-01-19 21:25:54 +00:00
Ramkumar Ramachandra
302565b39e
[VPlan] Move VPDerivedIVRecipe::execute to VPlanRecipes (NFC) (#176577) 2026-01-19 13:06:37 +00:00
Florian Hahn
ae1bd068db
[VPlan] Replace PhiR operand of ComputeAnyOfResult with VPIRFlags. (#175657)
Replace the Phi recipe operand of ComputeAnyOfVResult with VPIRFlags,
building on top of https://github.com/llvm/llvm-project/pull/174026.

PR: https://github.com/llvm/llvm-project/pull/175657
2026-01-18 20:29:38 +00:00
Florian Hahn
3457e7efc3
[VPlan] Match inverted logical AND/OR for select costs.
VPlan transforms may invert logical AND/OR selects, which can impact
costs on targets the select is not cheap but the boolean AND/OR is.
Also match the inverted logical AND/OR to improve accuracy of the
cost estimation and fixes the underlying issue for the cost
divergence between legacy and VPlan-based cost model that caused
the revert of 01d34eb38fa058 in ed004cf42bf57c.
2026-01-18 16:15:42 +00:00
Florian Hahn
459990dcf7
[VPlan] Replace PhiR operand of ComputeFindIVResult with VPIRFlags. #174026 (#175461)
Replace the Phi recipe operand of ComputeFindIVResult with VPIRFlags,
building on top of https://github.com/llvm/llvm-project/pull/174026.

PR: https://github.com/llvm/llvm-project/pull/175461
2026-01-17 16:23:33 +00:00
Florian Hahn
d528686f43
[VPlan] Add VPConstantInt for VPIRValues wrapping ConstantInts (NFC) (#175458)
Follow-up to https://github.com/llvm/llvm-project/pull/174282: Introduce
a new VPConstantInt overlay for VPIRValue, to make it easier to check
and access constant int IR values.

PR: https://github.com/llvm/llvm-project/pull/175458
2026-01-16 11:27:07 +00:00
Vishruth Thimmaiah
04baf1105f
[LoopVectorize] Support vectorization of overflow intrinsics (#174835)
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`,
`ssub`, `umul` and `smul` as trivially vectorizable.

Fixes #174617

---------

Signed-off-by: vishruth-thimmaiah <vishruththimmaiah@gmail.com>
2026-01-16 10:09:42 +00:00
Elvis Wang
aa11629192
[LV] Prevent extract-lane generate unused IRs with single vector operand. (#172798)
When `extract-lane` only contains single vector operand. We can simplify
it to `extractelement`.

This patch makes `extract-lane` generate simple `extractelement` when it
only contains single vector operand to prevent unused IR generated.

This patch is mostly NFC, the unused IR should be removed in following
IR passes.
2026-01-16 13:59:51 +08:00
Florian Hahn
f61aab79ce
[VPlan] Handle min/max recur kinds in ::printFlags.
Following up to d5c11b9a24c84f, also handle min/max recurrence kinds in
::printFlags, so the proper kind is imprinted instead of icmp.

NFC modulo debug printing changes
2026-01-14 18:16:11 +00:00
Graham Hunter
2abd6d6d7a
[LV] Vectorize conditional scalar assignments (#158088)
Based on Michael Maitland's previous work:
https://github.com/llvm/llvm-project/pull/121222

This PR uses the existing recurrences code instead of introducing a
new pass just for CSA autovec. I've also made recipes that are more
generic.
2026-01-14 14:59:18 +00:00
Florian Hahn
d5c11b9a24
[VPlan] Replace PhiR operand of ComputeRdxResult with VPIRFlags. (#174026)
Remove the artificial PhiR operand of ComputeReductionResult, which was
only used to look up recurrence kind, in-loop and ordered properties.

Instead, encode them as VPIRFlags as suggested by @ayalz in
https://github.com/llvm/llvm-project/pull/170223.

This addresses a TODO to make codegen for ComputeReductionResult
independent of looking up information from other recipes.

This is NFC w.r.t. codegen, the printing has been improved to include
the reduction type, and whether it is in-loop/ordered.

PR: https://github.com/llvm/llvm-project/pull/174026
2026-01-14 07:45:44 +00:00
Florian Hahn
f9a8096067
[VPlan] Merge Select with previous cases in ::computeCost (NFC).
Merge cases calling the same helper, as suggested post-commit in
https://github.com/llvm/llvm-project/pull/174234
2026-01-12 22:19:28 +00:00
Elvis Wang
cd2caf6580
[LV] Simplify extract-lane with scalar operand to the scalar value itself. (#174534)
This patch simplifies extract-lane(%lane_num, %X) to %X when %X is a
scalar value. Extracting from a scalar is redundant since there is only
one value to extract.
2026-01-12 10:03:44 +08:00
Aiden Grossman
7450a75b93
[VPlan] Allow truncation for lanes in VPScalarIVStepsRecipe (#175268)
VPScalarIVStepsRecipe relies on APInt truncation in order to vectorize
blocks with a width greater than the maximum value the types of some of
their (changing) operands are able to hold (e.g., an i1 input with a
vector width of 4). Simply reenable implicit truncation in
ConstantInt::get() to cover this case.

Remove the helper function given it is only called in one place to
prevent accidentally using it elsewhere where we probably do not want
implicit truncation turned on.

This fixes another case that we saw after
acb78bde6fb613a9af2a604bc69fa744a8cee850 did not fix that issue, which
had the same stack trace. We still want to keep lane constants as
unsigned.

Somewhat similar to 6d1e7d4982fabc9e245897056a5425496df6a7a3.

This test case comes from a tensorflow/XLA compilation from a test case
in https://github.com/google-research/spherical-cnn.
2026-01-10 10:34:46 -05:00
Aiden Grossman
acb78bde6f
[VPlan] Use unsigned integers for lane start indices (#175231)
a83c89495ba6fe0134dcaa02372c320cc7ff0dbf caused assertion failures here
as if we have a single bit induction variable and two lanes (0 and 1),
then the second lane index (1) will be out of bounds of what a signed
1-bit integer can hold. Lane indices are always >0 according to
VPlanHelpers.h:125, and the lane representation in this code is also
unsigned.

The test case come from tensorflow/XLA.
2026-01-09 14:28:28 -08:00
Florian Hahn
31b93d6e38
[VPlan] Add specialized VPValue subclasses for different types (NFC) (#172758)
This patch adds VPValue sub-classes for the different cases we currently
have:
 * VPIRValue: A live-in VPValue that wraps an underlying IR value
* VPSymbolicValue: A symbolic VPValue not tied to an underlying value,
e.g. the vector trip count or VF VPValues
 * VPRecipeValue: A VPValue defined by a VPDef/VPRecipeBase.

This has multiple benefits:
 * clearer constructors for each kind of VPValue
* limited scope: for example allows moving VPDef member to VPRecipeValue,
reducing size of other VPValues.
* stricter type checking for member variables (e.g. using VPLiveIn in
the Value -> live-in map in VPlan, or using VPSymbolicValue for symbolic
member VPValues)

There probably are additional opportunities for cleanups as follow-ups.

PR: https://github.com/llvm/llvm-project/pull/172758
2026-01-07 20:29:05 +00:00
Ramkumar Ramachandra
bdc7681d63
[VPlan] Restore all-operands-inv WidenGEP logic (#174416)
Restore the all-operands-invariant handling in WidenGEP::execute prior
to 37f7b31 (Reland [VPlan] Handle WidenGEP in narrowToSingleScalars), as
crashes have been reported.

Fixes #173761.
2026-01-06 13:05:37 +00:00
Nikita Popov
707d18c8e7
[VPlan] Use getSigned() for index in VectorEndPointer recipe (#174426)
The stride can be negative here, so we should use getSigned().

This avoids an assertion failure with
https://github.com/llvm/llvm-project/pull/171456. It also avoids a
miscompile if the index is >64-bit, but I don't think that can happen in
practice.
2026-01-06 09:10:59 +01:00
Florian Hahn
16830b2164
[VPlan] Remove VPWidenSelectRecipe, use VPWidenRecipe instead (NFCI). (#174234)
All extra state has been removed from VPWidenSelectRecipe at this point.
There's no benefit of having a separate recipe and Select can easily be
handled by the existing VPWidenRecipe.

PR: https://github.com/llvm/llvm-project/pull/174234
2026-01-05 22:33:37 +00:00
Florian Hahn
188507e542
[VPlan] Inline createFindLastIVReduction into its only caller. (NFC)
createFindLastIVReduction is only used for generating code for
ComputeFindIVResult. Inline the code there, in preparation for
https://github.com/llvm/llvm-project/pull/172569.
2026-01-04 13:31:47 +00:00
Florian Hahn
990883a690
[VPlan] Handle Alloca in VPReplicateRecipe::computeCost. (NFCI)
Handle Alloca in the VPlan-based cost mode. This also updates the cost
in the legacy cost model to clarify that we always compute the scalar
cost.
2026-01-03 17:40:51 +00:00
Florian Hahn
b4d833135a
[VPlan] Handle non-free bitcasts in getCostForRecipeWithOpcode.
Update bitcast cost handling to match the legacy cost model.
2026-01-02 18:04:13 +00:00
Florian Hahn
5ee82dffc6
[VPlan] Handle addrspacecast/ptrtoaddr in VPlan-based cost model.
Also handle missing PtrToAddrs and AddrSpaceCast in
getCostForRecipeWithOpcode.

This makes sure all cast opcodes are handled, fixing a crash on loops
replicating addrspacecast and ptrtoaddrs.
2026-01-01 10:35:35 +00:00
Florian Hahn
6f6fca136c
[VPlan] Re-use common cast cost logic for VPReplicateRecipe (NFCI).
Move the logic to compute cast costs to getCostForRecipeWithOpcode and
use for VPReplicateRecipe.

This should match the costs computed by the legacy cost model for scalar
casts.
2025-12-30 22:34:53 +00:00