3852 Commits

Author SHA1 Message Date
Florian Hahn
c4e8adf7bb
[VPlan] Skip gather/scatters in useEmulatedMaskMemRefHack.
The legacy cost model skips gather/scatters when determining the
predicated stores in useEmulatedMaskMemRefHack. Match the behavior in
the VPlan-based implementation. This fixes a cost divergence in the
attached test.
2026-02-16 21:01:16 +00:00
Ramkumar Ramachandra
2b7c1f9d82
[VPlan] Directly unroll VectorEndPointerRecipe (#172372)
Directly unroll VectorEndPointerRecipe following 0636225b ([VPlan]
Directly unroll VectorPointerRecipe, #168886). It allows us to leverage
existing VPlan simplifications to optimize.

Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Florian Hahn <flo@fhahn.com>
2026-02-16 09:59:55 +00:00
Florian Hahn
6f253e87dd
Reapply "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

The underlying issue causing the revert has been fixed independently
as 301fa24671256734df6b7ee65f23ad885400108e.

Original message:
Move narrowInterleaveGroups to to general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2026-02-15 20:10:10 +00:00
Florian Hahn
caf2a4846b
[VPlan] Use -vplan-print-after in more VPlan tests
use vplan-print-after instead of -debug/-debug-only in more tests,
auto-generate some checks.
2026-02-14 20:38:48 +00:00
Florian Hahn
cd38f8486b
[LV] Add argmin test for epilogue vectorization w/o wide canonical IV.
Add additional epilogue vectorization test coverage for
https://github.com/llvm/llvm-project/pull/170223.

Also regenerate check lines for related tests.
2026-02-14 19:08:10 +00:00
Florian Hahn
b3dcf485d2
[VPlan] Compute NumPredStores for VPReplicateRecipe costs in VPlan.
Compute the number of predicated stores directly in VPlan instead of
using CM.useEmulatedMaskMemRefHack(), which will only account for the
number of predicated stores for the last VF the legacy cost model
considered.

Fixes https://github.com/llvm/llvm-project/issues/181183
2026-02-13 21:16:53 +00:00
Florian Hahn
ede1a9626b
[LV] Vectorize early exit loops with multiple exits. (#174864)
Building on top of the recent changes to introduce BranchOnTwoConds,
this patch adds support for vectorizing loops with multiple early exits,
all dominating a countable latch. The early exits must form a
dominance chain, so we can simply check which early exit has been taken
in dominance order.

Currently LoopVectorizationLegality ensures that all exits other than
the latch must be uncountable. handleUncountableEarlyExits now collects
those uncountable exits and processes each exit.

In the vector region, we compute if any exit has been taken, by taking
the OR of all early exit conditions (EarlyExitConds) and checking if
there's
any active lane.

If the early exit is taken, we exit the loop and compute which early
exit
has been taken. The first taken early exit is the one where its exit
condition is true in the first active lane of EarlyExitConds.

We create a chain of dispatch blocks outside the loop to check this for
the early exit blocks ordered by dominance.

Depends on https://github.com/llvm/llvm-project/pull/174016.

PR: https://github.com/llvm/llvm-project/pull/174864
2026-02-13 16:44:23 +00:00
Ramkumar Ramachandra
ec0b22ff47
[VPlan] Reuse introduces-broadcast logic in narrowToSingleScalars (#174444)
narrowToSingleScalarRecipes' operands check is a bit too restrictive by
permitting a single user. Factor out and reuse the existing
introduces-broadcast logic to improve results.
2026-02-13 15:56:57 +00:00
Florian Hahn
a55fbab0cf
[VPlan] Run initial recipe simplification on VPlan0. (#176828)
In some cases, LV gets simplifyable IR as input. Directly apply
simplifications on the initial VPlan0 to avoid vectorization in cases
where the loop body can be folded away.

Using the end-to-end pipeline, this is relatively rare, but when
reducing test cases, the reduction often ends up with cases with trivial
folds. Rejecting those will result in more robust & realistic test
cases.

As follow-up, I also plan to add initial dead recipe removal.

Depends on https://github.com/llvm/llvm-project/pull/176795.

PR: https://github.com/llvm/llvm-project/pull/176828
2026-02-13 12:01:22 +00:00
Florian Hahn
ef85b0c454
[VPlan] Check scalar VF in removeRedundantCanonicalIVs.
When the plan has only a scalar VF, we never generate vectors for IVs,
so we can always perform the replacement.
2026-02-12 23:02:51 +00:00
Florian Hahn
2b2582cd3b
[VPlan] Update isUniformAcrossVFsAndUFs to account for sinking.
Recipes can be sunk now. In those cases, the sunk recipes are outside
the loop region, but may not be uniform across VF and UF.

Update the code to only exit early if the recipe is defined before the
region. Without DT available, the easiest way to check is just if it is
in the entry/preheader block.

Fixes https://github.com/llvm/llvm-project/issues/181002.
Fixes https://github.com/llvm/llvm-project/issues/180781.
2026-02-12 22:17:50 +00:00
Andrei Elovikov
8e335d5336
[UTC][VPlan] Use -vplan-print-after for VPlan-dump-based tests (#178736)
Switch tests from using `-debug[-only=LoopVectorize]` to
`-vplan-print-after` as that provides better control at what step in the
pipeline we want to check the VPlan (I'm using `optimize$` for now to
preserve previous state).

Then, update `-vplan-print-after*` to print what function the loop
belongs to. That enables us to simplify VPlan UTC support as the output
of the updated tests contains the VPlan dump only - no special
filtering/extraction is necessary anymore.
2026-02-12 20:14:07 +00:00
Kunqiu Chen
85e07bad93
[InstructionSimplify] Extend simplifyICmpWithZero to handle equivalent zero RHS (#179055)
Add a new helper function `matchEquivZeroRHS()` that recognizes
comparisons with constants that are equivalent to comparisons with zero,
and transforms the predicate accordingly.

This handles the following transformations:
- icmp sgt X, -1 --> icmp sge X, 0
- icmp sle X, -1 --> icmp slt X, 0
- icmp [us]ge X, 1 --> icmp [us]gt X, 0
- icmp [us]lt X, 1 --> icmp [us]le X, 0

This enables more optimization opportunities in `simplifyICmpWithZero`,
such as folding icmp sgt X, -1 when X is known to be non-negative.

---

- IR Impact: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3414
2026-02-13 00:06:32 +08:00
Luke Lau
3482a9c6cb
[VPlan] Explicitly reassociate header mask in logical and (#180898)
We reassociate ((x && y) && z) -> (x && (y && z)) if x has more than
use, in order to allow simplifying the header mask further. However this
is somewhat unreliable as there are times when it doesn't have more than
one use, e.g. see the case we run into in
https://github.com/llvm/llvm-project/pull/173265/changes#r2769759907.

This moves it into a separate transformation that always reassociates
the header mask regardless of the number of uses, which prevents some
fragile test changes in #173265.

We need to run it before both calls to simplifyRecipes in optimize. I
considered putting it in simplifyRecipes itself but simplifyRecipes is
also called after unrolling and when the loop region is dissolved which
causes vputils::findHeaderMask to assert.

There isn't really any benefit to reassociating masks that aren't the
header mask so the existing simplification was removed.
2026-02-12 14:56:15 +00:00
Florian Hahn
8e1d5ec534
[LV] Add LoopVectorize/VPlan subdirectory for VPlan printing tests. (#180611)
Add a new VPlan subdirectory as common place for tests checking VPlan
printing. It contains a lit.local.cfg that only runs the tests when
assertions are enabled.

This removes the need to add explicit REQUIRES: asserts to VPlan tests.

PR: https://github.com/llvm/llvm-project/pull/180611
2026-02-12 14:06:24 +00:00
Ramkumar Ramachandra
2223b931c5
[VPlan] Introduce m_c_Logical(And|Or) (#180048) 2026-02-12 13:14:08 +00:00
Florian Hahn
d3afa171ee
[LV] Don't scalarize loads that need predication in legacy CM.
The legacy cost model tries to scalarize loads that are used as
pointers. Skip if the load would need predicating when scalarized,
because that would incur very high costs, see useEmulatedMaskMemRefHack.

Fixes https://github.com/llvm/llvm-project/issues/180780.
2026-02-11 20:52:08 +00:00
Florian Hahn
2dcf858ba0
[LAA] Use SCEVPtrToAddr in tryToCreateDiffChecks. (#178861)
The checks created by LAA only compute a pointer difference and do not
need to capture provenance. Use SCEVPtrToAddr instead of SCEVPtrToInt
for computations.

To avoid regressions while parts of SCEV are migrated to use PtrToAddr
this adds logic to rewrite all PtrToInt to PtrToAddr if possible in the
created expressions. This is needed to avoid regressions.

Similarly, if in the original IR we have a PtrToInt, SCEVExpander tries
to re-use it if possible when expanding PtrToAddr.

Depends on https://github.com/llvm/llvm-project/pull/178727.

Fixes https://github.com/llvm/llvm-project/issues/156978.

PR: https://github.com/llvm/llvm-project/pull/178861
2026-02-11 11:51:51 +00:00
Florian Hahn
a1fc5b4a48
[VPlan] Reject partial reductions with invalid costs in getScaledReds. (#180438)
Check if costs for partial reductions are valid up-front in
getScaledReductions instead when transforming each link in the chain in
transformToPartialReduction. This ensures that we either transform all
entries in the chain together, or none via the existing invalidation
logic.

This fixes a crash when a link in the chain would have invalid cost, as
in the added test cases.

Fixes https://github.com/llvm/llvm-project/issues/180340.

PR: https://github.com/llvm/llvm-project/pull/180438
2026-02-10 21:16:21 +00:00
Florian Hahn
5131186506
[VPlan] Use UTC to auto-generate more VPlan checks.
Update more VPlan tests to use auto-generated check lines via new UTC
support.
2026-02-10 20:56:05 +00:00
Luke Lau
f81889da29
[VPlan] Fix convertToPhisToBlends folding non poison blend to poison (#180686)
This fixes a miscompile in #180005 where we didn't check that the first
incoming value isn't poison.

We should use the first non-poison incoming value if it exists, or just
poison if all the incoming values are poison.
2026-02-10 16:15:57 +00:00
Andrei Elovikov
f96c1ccc1e
[VPlan] Add -vplan-print-after= option (#178700)
UpdateTestChecks support is updated in subsequent
https://github.com/llvm/llvm-project/pull/178736.
2026-02-10 16:07:25 +00:00
Nikita Popov
067f1c95a4 [LoopVectorizer] Generate test checks (NFC) 2026-02-10 17:01:49 +01:00
Sander de Smalen
3157758190
[LV] Handle partial sub-reductions with sub in middle block. (#178919)
Sub-reductions can be implemented in two ways:
(1) negate the operand in the vector loop (the default way).
(2) subtract the reduced value from the init value in the middle block.

Note that both ways keep the reduction itself as an 'add' reduction,
which is necessary because only llvm.vector.partial.reduce.add exists.

The ISD nodes for partial reductions don't support folding the
sub/negation into its operands because the following is not a valid
transformation:
```
     sub(0, mul(ext(a), ext(b)))
  -> mul(ext(a), ext(sub(0, b)))
```
It can therefore be better to choose option (2) such that the partial
reduction is always positive (starting at '0') and to do a final
subtract in the middle block.

For AArch64 there are no dot-product instructions that can
do a `partial.reduce.sub(acc, mul(ext(a), ext(b)))` operation.
I'm not sure if such instructions exist for other targets.
(If so then we may want to make this decision a target option)

This PR also increases the AArch64 cost of a partial sub-reduction
when this exists in an 'add-sub' reduction chain.

Fixes https://github.com/llvm/llvm-project/issues/178703
2026-02-10 11:00:32 +00:00
Benjamin Maxwell
f22a178b13
Reland "[LV] Support conditional scalar assignments of masked operations" (#180708)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.

---

Reverts llvm/llvm-project#180275 (original PR: #178862)

Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was
added in #180290, which should resolve the backend crashes on x86.
2026-02-10 09:57:48 +00:00
Matt Arsenault
302ff8fd00
InstCombine: Use SimplifyDemandedFPClass on fmul (#177490)
Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply of 0. The
main change is the introduction of nnan/ninf. I do not think anywhere
was systematically trying to introduce fast math flags before, though
a few odd transforms would set them.

Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.

I was wondering if this should go into InstCombineAggressive, but that
apparently does not make use of InstCombineInternal's worklist.
2026-02-10 09:49:31 +00:00
Mel Chen
7e5d9189d2
[VPlan] Simplify true && x -> x (#179426) 2026-02-10 08:49:03 +00:00
Florian Hahn
68d4175cc1
[LV] Add FindLast tests where IV-based expression could be sunk. (NFC)
Add set of FindLast tests where the selected expression is based on an
IV and could be sunk.
2026-02-09 23:31:39 +00:00
Florian Hahn
06cffa5ee3
[VPlan] Auto-generate CHECKs in some VPlan printing tests.
Use new UTC support to re-generate check lines.
2026-02-09 23:22:18 +00:00
Florian Hahn
2b9a1aee5a
[LV] Add additional tests for reductions with intermediate stores. (NFC)
Adds missing test coverage for reductions with intermediate stores,
including partial reductions with intermediate stores, as well as
chained min/max reductions with intermediate stores.
2026-02-09 23:14:26 +00:00
Florian Hahn
d1ec04dfd4
[VPlan] Simplify single-entry VPWidenPHIRecipe.
Include VPWidenPHIRecipe in phi simplification if there's a single
incoming value.
2026-02-09 22:10:13 +00:00
Vishruth Thimmaiah
84f4b1e52d Reland "[LoopVectorize] Support vectorization of overflow intrinsics" (#180526)
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`,
`ssub`, `umul` and `smul` as trivially vectorizable.

Fixes #174617

---

This patch is a reland of #174835.

Reverts #179819
2026-02-09 15:32:04 +00:00
hanbeom
77ccd853d0
[IVDesc] Check loop-preheader for loop-legality when pass-remarks enabled (#166310)
When `-pass-remarks=loop-vectorize` is specified, the subsequent logic
is executed to display detailed debug messages even if no PreHeader
exists in the loop.

Therefore, an assert occurs when the `getLoopPreHeader()` function is
called. This commit resolves that issue.

Fixed: #165377
2026-02-10 00:02:13 +09:00
Luke Lau
ed19bbfe01 Revert "[VPlan] Add missing REQUIRES: asserts to VPlan output test"
This reverts commit 2805c8aaa61a94ef22ac76c8dac56f7dfe970651.

This added the REQUIRES line to the wrong test, 041ce9f added it to the
correct one.
2026-02-09 22:00:42 +08:00
David Sherwood
041ce9fe0f
[LV][NFC] Add "REQUIRES: assert" to new test file (#180522)
Fixes a minor test regression introduced by
https://github.com/llvm/llvm-project/pull/180226 in file
llvm/test/Transforms/LoopVectorize/phi-with-fastflags-vplan.ll
2026-02-09 13:55:35 +00:00
David Sherwood
44031ae79f
[LV] Fix issue in VPFirstOrderRecurrencePHIRecipe::usesFirstLaneOnly (#179977)
In some cases we decide to vectorise loops with first-order recurrences
using VF=1, IC>1. We then attempt to unroll a vplan in replicateByVF,
however when trying to erase the list of values from the parent we
trigger the following assert:

```
virtual llvm::VPRecipeValue::~VPRecipeValue(): Assertion `Users.empty()
  && "trying to delete a VPRecipeValue with remaining users"' failed.
```

The problem seems to stem from this code:

```
  DefR->replaceUsesWithIf(LaneDefs[0], [DefR](VPUser &U, unsigned) {
    return U.usesFirstLaneOnly(DefR);
  });
```

since usesFirstLaneOnly returns false and we fail to replace uses of
DefR with LaneDefs[0]. Upon inspection the only VPUser objects that
return false are VPInstruction::FirstOrderRecurrenceSplice and
VPFirstOrderRecurrencePHIRecipe. Since the values are all scalar it's
simply not possible for us to be using anything other than the first
lane. I've fixed this by bailing out of replicateByVF early for plans with
only a scalar VF.

Fixes https://github.com/llvm/llvm-project/issues/179671
2026-02-09 13:42:26 +00:00
Florian Hahn
7defb0a4a3
[VPlan] Skip applying InstsToScalarize with forced instr costs. (#168269)
ForceTargetInstructionCost in the legacy cost model overrides any costs
from InstsToScalarize. Match the behavior in the VPlan-based cost model.
This fixes a crash with -force-target-instr-cost for the added test
case.

PR: https://github.com/llvm/llvm-project/pull/168269
2026-02-09 13:20:44 +00:00
Luke Lau
2805c8aaa6 [VPlan] Add missing REQUIRES: asserts to VPlan output test
Should fix https://lab.llvm.org/buildbot/#/builders/11/builds/33293
2026-02-09 20:40:10 +08:00
Luke Lau
8cd86ff284
[VPlan] Propagate FastMathFlags from phis to blends (#180226)
If a phi has fast math flags, we can propagate it to the widened select.
To do this, this patch makes VPPhi and VPBlendRecipe subclasses of
VPRecipeWithIRFlags, and propagates it through PlainCFGBuilder and
VPPredicator.

Alive2 proofs for some of the FMFs (it looks like it can't reason about
the full "fast" set yet)
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T

The actual motivation for this to eventually be able to move the special
casing for tail folding in
LoopVectorizationPlanner::addReductionResultComputation into the CFG in
#176143, which requires passing through FMFs.
2026-02-09 19:38:58 +08:00
Florian Hahn
6324ee32c1
[VPlan] Use PredBB's terminator as insert point for VPIRPhi extracts.
Use PredBB's terminator as insert point in VPIRPhi::execute to make sure
the extracts are placed after any possibly sunk instructions.

Fixes https://github.com/llvm/llvm-project/issues/180363.
2026-02-08 20:36:36 +00:00
Florian Hahn
3c5b05427d
[VPlan] Pass underlying instr to getMemoryOpCost in ::computeCost.
Pass underlying instruction to getMemoryOpCost in
VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true.
Some targets use the underlying instruction to improve costs,
and this is needed to match the legacy cost model.

Fixes https://github.com/llvm/llvm-project/issues/177780.
Fixes https://github.com/llvm/llvm-project/issues/177772.
2026-02-08 16:15:39 +00:00
Florian Hahn
3192fe2c7b
[VPlan] Fall back to legacy cost model if PtrSCEV is nullptr.
There are some cases when PtrSCEV can be nullptr. Fall back to legacy
cost model, to not call isLoopInvariant with nullptr.

Fixes a crash after 0c4f8094939d2.
2026-02-08 11:55:12 +00:00
Kewen Meng
703c2762d3
Revert "[LV] Support conditional scalar assignments of masked operations" (#180275)
Reverts llvm/llvm-project#178862 

revert to unblock bot:
https://lab.llvm.org/buildbot/#/builders/206/builds/13225
2026-02-06 13:24:40 -08:00
Florian Hahn
bd40d1de9c
Reapply "[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible. (#180257)"
This reverts commit cb905605b2e95f88296afe136b21a7d2476cb058.

Recommit the patch with a small change to check the destination
type matches the address type, to avoid a crash on mismatch.

Original message:

This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.

This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.

PR: https://github.com/llvm/llvm-project/pull/178727
2026-02-06 21:14:41 +00:00
Florian Hahn
cb905605b2
Revert "[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible." (#180257)
Reverts llvm/llvm-project#178727

triggers asserts in on some build bots
2026-02-06 18:26:37 +00:00
Florian Hahn
c32cde4182
[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible. (#178727)
This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.

This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.

PR: https://github.com/llvm/llvm-project/pull/178727
2026-02-06 17:38:24 +00:00
Ramkumar Ramachandra
901d175d18
[VPlan] Simplify x & AllOnes -> x (#180049) 2026-02-06 13:42:58 +00:00
Florian Hahn
fdce0ea708
[VPlan] Add ExitingIVValue VPInstruction. (#175651)
Add a new VPInstruction opcode to compute the exiting value of an
induction variable after vectorization. This replaces the pattern of
extracting the last lane from the last part of the induction backedge
value when applicable.

This allows us to always use the pre-computed IV end value. It will also
allow unifying end value creation for both induction resume and exit
values.

PR: https://github.com/llvm/llvm-project/pull/175651
2026-02-06 12:27:31 +00:00
Benjamin Maxwell
4f90eb6427
[LV] Support conditional scalar assignments of masked operations (#178862)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).

For example, the following loop can now be vectorized:

```
int simple_csa_int_load(
  int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
2026-02-06 11:43:06 +00:00
Luke Lau
33a2c3ee9c
[VPlan] Ignore poison incoming values when creating blend (#180005)
We have an optimization in VPPredicator when creating blends where if
all the incoming values are the same, we just return that value.

This extends it to handle cases like "phi [%x, %x, poison, %x]" by
ignoring poison values.

This is split off from #176143 to prevent regressions when maintaining
SSA by adding PHIs with a poison incoming value.
2026-02-06 19:09:43 +08:00