1041 Commits

Author SHA1 Message Date
Mel Chen
f196b1d66f
[VPlan] Extract reverse operation for reverse accesses (#146525)
This patch introduces VPInstruction::Reverse and extracts the reverse
operations of loaded/stored values from reverse memory accesses. This
extraction facilitates future support for permutation elimination within
VPlan.
2025-12-18 14:57:48 +00:00
Florian Hahn
bab0dc4d48
Reapply "[LV] Mark checks as never succeeding for high cost cutoff."
Reapply 8a115b6934a90441 with an update to tests handling remarks.

The patch now directly emits a clear remark when we bail out
due to the memory check threshold.

Original message:
When GeneratedRTChecks::create bails out due to exceeding the cost
threshold, no runtime checks are generated and we must not proceed
assuming checks have been generated.

Mark the checks as never succeeding, to make sure we don't try to
vectorize assuming the runtime checks hold. This fixes a case where we
previously incorrectly vectorized assuming runtime checks had been
generated when forcing vectorization via metadate.

Fixes the mis-compile mentioned in
https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588
2025-12-17 20:21:49 +00:00
Luke Lau
67d0e21a62
Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)" (#172261)
This reapplies #171846 with a test case and fix for a legacy cost-model
mismatch assertion.

In the previous version of the patch, we only considered the plan to
contain simplifications when it had a VPBlendRecipe and VF.isScalar()
was true.

However for some VPlans we may have a blend with only the first lane
used:

    BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c>
    CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi>
    vp<%5> = vector-pointer ir<%gep>

And in the legacy cost model we cost a blend as a phi if it's uniform:

// If we know that this instruction will remain uniform, check the cost
of
    // the scalar version.
    if (isUniformAfterVectorization(I, VF))
      VF = ElementCount::getFixed(1);

So this replaces the VF.isScalar() check with
vputils::onlyFirstLaneUsed, which matches how the VPlan cost model
mirrored the legacy model beforehand.

A VPInstruction::Select will also emit a scalar select for a vector VF
if only the first lane is used, so this also updates
VPBlendRecipe::computeCost to reflect that too.
2025-12-16 06:30:54 +00:00
Ramkumar Ramachandra
0636225b93
[VPlan] Directly unroll VectorPointerRecipe (#168886)
In an effort to get rid of VPUnrollPartAccessor and directly unroll
recipes, start by directly unrolling VectorPointerRecipe, allowing for
VPlan-based simplifications and simplification of the corresponding
execute.
2025-12-15 10:54:06 +00:00
Florian Hahn
a99a982440
[LV] Add test coverage for remark for unprofitable RT checks.
Add test coverage for remark when runtime checks are not profitable with
threshold provided.

Also make sure that X86 remark tests actually passes an X86 triple,
which is needed for the threshold remark.

Also clean up the tests a bit.
2025-12-13 22:44:09 +00:00
Luke Lau
4ea8157773 Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)"
This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c.

It's triggering legacy cost model assertions reported in
https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019
2025-12-13 20:05:34 +08:00
Luke Lau
fd5f53aa9b
[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846)
A VPBlendRecipe always emits selects, even when the VF is scalar.

However the legacy cost model always costs all scalar non-header phis as
a phi, and the VPlan cost model has to account for this.

This can cause the cost to be a little off, for example not including
the cost of the select in @smax_call_uniform leading to unprofitable
vectorization.

This removes this from the VPlan cost model and handles checks for the
case in planContainsAdditionalSimplifications instead.

I considered trying to make the legacy cost model more accurate but I'm
not sure if it's possible. We need information as to whether or not the
scalar VF we are costing is the original loop in which case it's
actually a phi, or if it's a VPBlendRecipe that emits a select,
potentially from a VF=1, UF>=1 VPlan.
2025-12-12 00:25:58 +08:00
Florian Hahn
de53b1a4ef
[LV] Simplify IR for gather-cost.ll, auto-generate checks. (NFC)
Simplify tests and auto-generate check in preparation for further
updates.
2025-12-08 19:19:51 +00:00
Florian Hahn
1054a6e9de
[SCEV] Handle non-constant start values in AddRec UDiv canonicalization. (#170474)
Follow-up to https://github.com/llvm/llvm-project/pull/169576 to enable
UDiv canonicalization if the start of the AddRec is not constant.

The fold is not restricted to constant start values, as long as we are
able to compute a constant remainder. The fold is only applied if the
subtraction of the remainder can be folded into to start expression, but
that is just to avoid creating more complex AddRecs.

For reference, the proof from #169576 is
https://alive2.llvm.org/ce/z/iu2tav

PR: https://github.com/llvm/llvm-project/pull/170474
2025-12-03 21:13:11 +00:00
Florian Hahn
5d876093b7
[SCEV] Allow udiv canonicalization of potentially-wrapping AddRecs (#169576)
Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle
AddRecs that may wrap, when X < N <= C and both N,C are powers of 2. The
alignment and power-of-2 properties ensure division results remain
equivalent for all offsets [(X - X%N), X).

Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav

Fixes https://github.com/llvm/llvm-project/issues/168709

PR: https://github.com/llvm/llvm-project/pull/169576
2025-12-02 14:09:53 +00:00
Florian Hahn
b76089c7f3
[VPlan] Skip uses-scalars restriction if one of ops needs broadcast. (#168246)
Update the logic in narrowToSingleScalar to allow narrowing even if not
all users use scalars, if at least one of the operands already needs
broadcasting.

In that case, there won't be any additional broadcasts introduced. This
should allow removing the special handling for stores, which can
introduce additional broadcasts currently.

Fixes https://github.com/llvm/llvm-project/issues/169668.

PR: https://github.com/llvm/llvm-project/pull/168246
2025-11-28 10:26:27 +00:00
Florian Hahn
f8eca64a28
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 20:03:55 +00:00
Florian Hahn
d58ebe339c
Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)""
This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43.

Missed some test updates.
2025-11-26 19:41:39 +00:00
Florian Hahn
72e51d389f
Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6.

The following fixes have landed, addressing issues causing the original
revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949

Original message:
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 19:31:25 +00:00
Ramkumar Ramachandra
2d4a8dadba
[VPlan] Use DL index type consistently for GEPs (#169396)
In preparation to strip VPUnrollPartAccessor and unroll recipes
directly, strip unnecessary complication in getGEPIndexTy, as the unroll
part will no longer be available in follow-ups (see #168886 for
instance). The patch also helps by doing a mass test update up-front.
Narrowing the GEP index type conditionally does not yield any benefit,
and the change is non-functional in terms of emitted assembly. While at
it, avoid hard-coding address-space 0, and use the pointer operand's
address space to get the GEP index type.
2025-11-26 12:25:55 +00:00
David Sherwood
c0a7b15d01
[LV][NFC] Remove remaining uses of undef in tests (#169357)
Split off from PR #163525, this standalone patch replaces almost all the
remaining cases where undef is used as value in loop vectoriser tests.
This will reduce the likelihood of contributors hitting the `undef
deprecator` warning in github.
NOTE: The remaining use of undef in iv_outside_user.ll will be fixed in
a separate PR.

I've removed the test stride_undef from version-mem-access.ll, since
there is already a stride_poison test.
2025-11-26 11:24:10 +00:00
Ramkumar Ramachandra
cb63e99e58
[VPlan] Include flags in VectorPointerRecipe::printRecipe (#169466)
The change is non-functional with respect to emitted IR.
2025-11-25 10:26:51 +00:00
Ramkumar Ramachandra
c25e0d3e29
[VPlan] Simplify x + 0 -> x (#169394) 2025-11-25 05:58:41 +00:00
Florian Hahn
48eb697441
[LV] Count cost of middle block if TC <= VF. (#168949)
If the expected trip count is less than the VF, the vector loop will
only execute a single iteration. When that's the case, the cost of the
middle block has the same impact as the cost of the vector loop. Include
it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra
work in the middle block makes it unprofitable.

Note that isOutsideLoopWorkProfitable already scales the cost of blocks
outside the vector region, but the patch restricts accounting for the
middle block to cases where VF <= ExpectedTC, to initially catch some
worst cases and avoid regressions.

This initial version should specifically avoid unprofitable tail-folding
for loops with low trip counts after re-applying
https://github.com/llvm/llvm-project/pull/149042.

PR: https://github.com/llvm/llvm-project/pull/168949
2025-11-24 19:23:04 +00:00
Ramkumar Ramachandra
299ea95747
[VPlan] Drop poison-generating flags on induction trunc (#168922)
After truncating an integer-induction, neither nuw nor nsw hold.

Fixes #168902.

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-11-21 08:14:46 +00:00
Florian Hahn
a3f6c4308a [LV] Add test a low-trip count test without folding the tail.
Add a low trip count test that is currently vectorized but unprofitable,
for https://github.com/llvm/llvm-project/issues/167858.
2025-11-20 21:24:33 +00:00
Florian Hahn
7acfbc23a7
[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289)
Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and
replace it with `onlyScalarValuesUsed`. This removes the need to carry
state from the legacy cost model through VPlan, and the VPlan-based
analysis gives more accurate results, avoiding a number of extracts.

PR: https://github.com/llvm/llvm-project/pull/168289
2025-11-20 18:58:25 +00:00
Florian Hahn
827ff2c1ce
[LV] Add tests for loops with low trip counts requiring tail-folding.
Add extra tests for over-eager tail-folding for tiny trip-count loops.

Reduced from https://github.com/llvm/llvm-project/issues/167858.
2025-11-20 18:42:12 +00:00
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoists single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hosting of loads we can prove do not alias any stores in
the loop due to memory runtime checks added during vectorization.

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Ramkumar Ramachandra
ef023cae38
Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)
Changes: The previous patch had to be reverted to a mismatching-OpType
assert in cse. The reduced-test has now been added corresponding to a
RVV pointer-induction, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.

While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 13:44:25 +00:00
Alex Bradbury
f2336d4c7e
Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080)
Reverts llvm/llvm-project#163538

This is causing build failures on the two-stage RVV buildbots. e.g.
https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a
reproducer and more information at
https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822

This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
2025-11-14 16:11:48 +00:00
Ramkumar Ramachandra
355e0f94af
[VPlan] Expand WidenInt inductions with nuw/nsw (#163538)
While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-14 12:10:55 +00:00
Florian Hahn
a6edeedbfa Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISCV
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-13 22:34:55 +00:00
Ramkumar Ramachandra
9ba738af2c
[VPlan] Fix assert in store-user in narrowToSingleScalars (#167686)
Follow up on c2d4c7c18b96 ([VPlan] Permit more users in
narrowToSingleScalars) to fix an assert related to WidenStore users of
the recipe being narrowed in narrowToSingleScalars.
2025-11-12 17:53:24 +00:00
Florian Hahn
62d1a080e6
[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop the
extract the value of the last actice lane.

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-12 15:11:00 +00:00
Luke Lau
bfd4155f23
[VPlan] Don't apply predication discount to non-originally-predicated blocks (#160449)
Split off from #158690. Currently if an instruction needs predicated due
to tail folding, it will also have a predicated discount applied to it
in multiple places.
This is likely inaccurate because we can expect a tail folded
instruction to be executed on every iteration bar the last.

This fixes it by checking if the instruction/block was originally
predicated, and in doing so prevents vectorization with tail folding
where we would have had to scalarize the memory op anyway.

On llvm-test-suite this causes 4 loops in total to no longer be
vectorized with -O3 on arm64-apple-darwin, and there's no observable
performance impact.
2025-11-10 12:10:40 +00:00
Ramkumar Ramachandra
2d1d5fe78e
[VPlan] Simplify branch-cond with getVectorTripCount (#155604)
Call getVectorTripCount first, and call getTripCount failing that, in
simplifyBranchConditionForVFAndUF, to simplify missed cases. While at
it, strip the dead check for a zero TC.
2025-11-10 10:43:37 +00:00
Florian Hahn
b0b4616790
[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506)
Generalize VPWidenSelectRecipe codegen to consider single-scalar
conditions instead of just loop-invariant ones.

If the condition is a single-scalar, we can simply use a scalar
condition.

PR: https://github.com/llvm/llvm-project/pull/165506
2025-11-05 22:11:29 +00:00
David Sherwood
7b3fe5fd42
[LV][NFC] Remove undef values in some test cases (#164401)
Split off from PR #163525, this standalone patch replaces simple cases
where undef is used as a value for arithmetic or getelementptr
instructions. This will reduce the likelihood of contributors hitting
the `undef deprecator` warning in github.
2025-11-05 09:18:02 +00:00
Florian Hahn
af9a4263a1
[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445)
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.

I think the issue is isolated to code that evaluates the resulting
AddRec at BTC, just using it to compute the distance between accesses
should still be fine; if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straight-forward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads.

Fixes https://github.com/llvm/llvm-project/issues/160912.

PR: https://github.com/llvm/llvm-project/pull/161445
2025-11-04 17:08:12 +00:00
Florian Hahn
b7e922a3da
[VPlan] Convert BuildVector with all-equal values to Broadcast. (#165826)
Fold BuildVector where all operands are equal to Broadcast of the first
operand. This will subsequently make it easier to remove additional
buildvectors/broadcasts, e.g. via
https://github.com/llvm/llvm-project/pull/165506.

PR: https://github.com/llvm/llvm-project/pull/165826
2025-11-01 17:28:42 -07:00
Florian Hahn
317b42ef5c
[VPlan] Remove original recipe after narrowing to single-scalar.
Directly remove RepOrWidenR after replacing all uses. Removing the dead
user early unlocks additional opportunities for further narrowing.
2025-10-31 04:38:16 +00:00
Florian Hahn
98d3a25f74
[VPlan] Don't preserve LCSSA in expandSCEVs. (#165505)
This follows similar reasoning as 45ce88758d24
(https://github.com/llvm/llvm-project/pull/159556):

LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.

LV creates SCEV and memory runtime checks early on and then disconnects
the blocks temporarily. The patch fixes a mis-compile, where previously
LCSSA construction during SCEV expand may replace uses in currently
unreachable SCEV/memory check blocks.

Fixes https://github.com/llvm/llvm-project/issues/162512

PR: https://github.com/llvm/llvm-project/pull/165505
2025-10-29 18:25:46 +00:00
paperchalice
249883d0c5
[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786)
Post cleanup for #164534.
2025-10-23 20:31:31 +08:00
Florian Hahn
bfc322dd72
Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.

Revert while I investigate.
2025-10-22 21:27:11 +01:00
Florian Hahn
8d29d09309
[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)
Move narrowInterleaveGroups to to general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2025-10-21 11:37:42 +01:00
David Sherwood
822c291aac
[LV][NFC] Remove undef from phi incoming values (#163762)
Split off from PR #163525, this standalone patch replaces
 use of undef as incoming PHI values with zero, in order
 to reduce the likelihood of contributors hitting the
 `undef deprecator` warning in github.
2025-10-21 10:49:27 +01:00
Florian Hahn
9317975a7a
[VPlan] Match legacy behavior w.r.t. using pointer phis as scalar addrs.
When the legacy cost model scalarizes loads that are used as addresses
for other loads and stores, it looks to phi nodes, if they are direct
address operands of loads/stores. Match this behavior in
isUsedByLoadStoreAddress, to fix a divergence between legacy and
VPlan-based cost model.
2025-10-20 11:09:25 +01:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalars values from vectors.

Test changes are movements of the extracts: they are no generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Nikita Popov
8fa4a1029c [LoopVectorize] Regenerate test checks (NFC) 2025-10-16 18:21:42 +02:00
Ramkumar Ramachandra
b71515cc76
[VPlan] Extend licm to hoist assumes (#162636)
Assumes are safe to hoist if they're guaranteed to execute, since they
don't alias, and don't throw. This mirrors what the IR-LICM does.
2025-10-16 13:59:32 +00:00
David Sherwood
c48aa54656
[LV][NFC] Remove undef from function return values (#163578)
Split off from PR #163525, this standalone patch replaces `ret * undef`
returns with `ret void` in order to reduce the likelihood of
contributors hitting the `undef deprecator` warning in github.
2025-10-16 09:49:38 +01:00
Florian Hahn
ba69e33e13
[LV] Consistently apply address def scalarization across loop.
Consistently scalarize loads used as part of address computations across
all uses in the loop. This aligns the VPlan and legacy cost model and
fixes a divergence crash. It doesn't matter if the load and address
users are in different blocks, as long as they are in the same loop, the
scalar value can be used. This removes a number of insert/extracts.
2025-10-09 22:04:15 +01:00
Florian Hahn
98ce434870
[VPlan] Skip VPBlendRecipe in isUsedByLoadStoreAddress.
VPBlendRecipes are introduced as part of if-conversion, potentially adding
a def-use chain from a load used in a compare to another load/store. In
the scalar IR, there is no connection via def-use chains, so the legacy
cost model won't consider the load used by memory operation.

Skipping blends brings the VPlan-based cost-computation in line with the
legacy cost model after https://github.com/llvm/llvm-project/pull/162157.
2025-10-08 18:43:23 +01:00