3609 Commits

Author SHA1 Message Date
Florian Hahn
e148d2d422
[LV] Simplify existing load/store sink/hoisting tests, extend coverage.
Clean up some of the existing predicated load/store sink/hoisting tests
and add additional test coverage for more complex cases.
2025-11-19 20:10:14 +00:00
Florian Hahn
0730913529
[VPlan] Print debug info for all recipes. (#168454)
Use the recently refactored VPRecipeBase::print to print debug location
for all recipes.

PR: https://github.com/llvm/llvm-project/pull/168454
2025-11-19 10:10:08 +00:00
Hassnaa Hamdi
f7f41350b4
[LV]: Skip Epilogue scalable VF greater than RemainingIterations. (#156724)
Consider skipping scalable epilogue VFs when they are greater than
RemainingIterations, as is already done for fixed VFs.
Scalable RemainingIterations are excluded from that comparison, because
SCEV currently can't evaluate non-canonical vscale-based expressions.
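A minimal C++ sketch of the skip decision, assuming a hypothetical `ElementCountLike` stand-in for `ElementCount` and an assumed `VScaleForTuning` estimate used to compare a scalable VF against a fixed count. It is illustrative only; the real logic lives in LoopVectorize and may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ElementCount: KnownMin lanes, multiplied by the
// runtime vscale when Scalable is true.
struct ElementCountLike {
  uint64_t KnownMin;
  bool Scalable;
};

// Returns true if an epilogue candidate VF should be skipped because it
// exceeds the remaining iterations. VScaleForTuning is an assumed vscale
// estimate, used only to compare a scalable VF against a fixed count.
bool shouldSkipEpilogueVF(ElementCountLike VF, ElementCountLike Remaining,
                          uint64_t VScaleForTuning) {
  // A scalable remaining-iteration count is not compared (mirroring SCEV's
  // limitation with non-canonical vscale expressions), so keep the candidate.
  if (Remaining.Scalable)
    return false;
  uint64_t EffectiveVF =
      VF.Scalable ? VF.KnownMin * VScaleForTuning : VF.KnownMin;
  return EffectiveVF > Remaining.KnownMin;
}
```

For example, with an assumed vscale of 2, a scalable VF of `vscale x 4` is treated as 8 lanes and skipped when only 6 iterations remain.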
2025-11-19 05:11:17 +00:00
Florian Hahn
1e3ea03293
[VPlan] VPIRFlags kind for FCmp with predicate + fast-math flags (NFCI).
FCmp instructions have both a predicate and fast-math flags. Introduce a
new FCmp kind that combines both, to model this correctly in the current
system.

This should be NFC modulo VPlan printing which now includes the correct
fast-math flags.
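A simplified model of the idea, assuming hypothetical names (`IRFlagsModel`, `FlagsKind`, a reduced `Predicate` enum and two fast-math bits) that are not VPlan's actual API: a single FCmp kind lets one flags object answer both "has a predicate?" and "has fast-math flags?".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag kinds; FCmp carries both a predicate and fast-math flags.
enum class FlagsKind { Other, CmpPredicate, FastMath, FCmp };

// Reduced predicate set, for illustration only.
enum class Predicate : uint8_t { OEQ, OLT, UNO };

struct FastMathFlags {
  bool NoNaNs = false;
  bool NoInfs = false;
};

struct IRFlagsModel {
  FlagsKind Kind = FlagsKind::Other;
  Predicate Pred = Predicate::OEQ; // meaningful for CmpPredicate and FCmp
  FastMathFlags FMF;               // meaningful for FastMath and FCmp

  static IRFlagsModel forFCmp(Predicate P, FastMathFlags F) {
    IRFlagsModel M;
    M.Kind = FlagsKind::FCmp;
    M.Pred = P;
    M.FMF = F;
    return M;
  }
  bool hasPredicate() const {
    return Kind == FlagsKind::CmpPredicate || Kind == FlagsKind::FCmp;
  }
  bool hasFastMathFlags() const {
    return Kind == FlagsKind::FastMath || Kind == FlagsKind::FCmp;
  }
};
```

With a separate kind, printing code can query both properties of the same object, which is what lets the fast-math flags of an FCmp show up in VPlan output.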
2025-11-18 22:09:53 +00:00
Ramkumar Ramachandra
507f236f5e
[VPlan] Fix OpType-mismatch in getFlagsFromIndDesc (#168560)
Follow-up to an OpType-mismatch crash in CSE reported after ef023cae388d
(Reland [VPlan] Expand WidenInt inductions with nuw/nsw): set the
OpType correctly when returning from getFlagsFromIndDesc.
2025-11-18 20:41:57 +00:00
Florian Hahn
2befda2225
[VPlan] Populate and use VPIRFlags from initial VPInstruction. (#168450)
Update VPlan to populate VPIRFlags during VPInstruction construction and
use it when creating widened recipes, instead of constructing VPIRFlags
from the underlying IR instruction each time. The VPRecipeWithIRFlags
constructor taking an underlying instruction and setting the flags based
on it has been removed.

This centralizes initial VPIRFlags creation and ensures flags are
consistently available throughout VPlan transformations and makes sure
we don't accidentally re-add flags from the underlying instruction that
already got dropped during transformations.

Follow-up to https://github.com/llvm/llvm-project/pull/167253, which did
the same for VPIRMetadata.

Should be NFC w.r.t. the generated IR.

PR: https://github.com/llvm/llvm-project/pull/168450
2025-11-18 15:15:14 +00:00
Florian Hahn
7c34848ae1
[VPlan] Hoist loads with invariant addresses using noalias metadata. (#166247)
This patch implements a transform to hoist single-scalar replicated
loads with invariant addresses out of the vector loop to the preheader
when scoped noalias metadata proves they cannot alias with any stores in
the loop.

This enables hoisting of loads that we can prove do not alias any stores in
the loop thanks to the memory runtime checks added during vectorization.
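A minimal sketch of the legality check, assuming a single alias domain and hypothetical `Access`/`canHoistLoad` helpers. It follows the LangRef rule for scoped noalias metadata: two accesses are known not to alias if all of one access's `!alias.scope` scopes are listed in the other's `!noalias` list.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Memory access with scoped-noalias metadata; a single domain is assumed.
struct Access {
  std::set<std::string> AliasScopes; // !alias.scope
  std::set<std::string> NoAlias;     // !noalias
};

// True if every scope in A's !alias.scope list appears in B's !noalias list.
bool provablyNoAlias(const Access &A, const Access &B) {
  if (A.AliasScopes.empty())
    return false;
  for (const std::string &S : A.AliasScopes)
    if (!B.NoAlias.count(S))
      return false;
  return true;
}

// A single-scalar load with an invariant address may be hoisted to the
// preheader if every store in the loop is proven not to alias it.
bool canHoistLoad(const Access &Load, bool AddressIsInvariant,
                  const std::vector<Access> &StoresInLoop) {
  if (!AddressIsInvariant)
    return false;
  for (const Access &Store : StoresInLoop)
    if (!provablyNoAlias(Load, Store) && !provablyNoAlias(Store, Load))
      return false;
  return true;
}
```

The rule is checked in both directions because either access's metadata may carry the proof.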

PR: https://github.com/llvm/llvm-project/pull/166247
2025-11-18 09:35:48 +00:00
Peter Collingbourne
b3c54914ef
InstCombine: Stop transforming EQ/NE of SHR to 0 to ULT/UGT if >1 use
This is a small code size optimization that lets us avoid both shifting
and comparing to a constant if we need the shifted value anyway. On most
architectures the zero comparison is cheaper than a constant comparison
(or free if the shift sets flags).

Although this change appears to remove the optimization entirely, we
continue to do this transform if there is one use because of the code
below the removed code that transforms the shift into an and, followed
by the PR10267 case in InstCombinerImpl::foldICmpAndConstConst that
transforms the and into a ult/ugt. Added a test case to verify this
explicitly.
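The identity behind the single-use transform can be checked exhaustively for 8-bit values. The sketch below is illustrative C++ rather than InstCombine code: it confirms that `(X >> C) == 0` is equivalent to `X <u (1 << C)`, and that the `!= 0` form is equivalent to `X >u (1 << C) - 1`. When the shifted value has other uses, the shift is computed anyway, so keeping the comparison against zero avoids materializing a second constant.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, for all 8-bit X and shift amounts C in [0, 8):
//   (X >> C) == 0  <=>  X <u (1 << C)
//   (X >> C) != 0  <=>  X >u (1 << C) - 1
bool shrEqZeroMatchesUlt() {
  for (unsigned C = 0; C < 8; ++C) {
    for (unsigned V = 0; V < 256; ++V) {
      uint8_t X = static_cast<uint8_t>(V);
      unsigned Limit = 1u << C;
      bool EqZero = (X >> C) == 0;
      if (EqZero != (X < Limit))
        return false;
      if (!EqZero != (X > Limit - 1))
        return false;
    }
  }
  return true;
}
```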

Per [1], this reduces clang .text size by 0.09% and dynamic instruction count
by 0.01%.

[1] https://llvm-compile-time-tracker.com/compare.php?from=1f38d49ebe96417e368a567efa4d650b8a9ac30f&to=0873787a12b8f2eab019d8211ace4bccc1807343&stat=size-text

Reviewers: nikic, dtcxzyw

Reviewed By: dtcxzyw

Pull Request: https://github.com/llvm/llvm-project/pull/168007
2025-11-17 19:39:20 -08:00
Florian Hahn
bafb3f6788
[LV] Add test with existing noalias metadata and runtime checks.
Add test where we have loads with existing noalias metadata and noalias
metadata gets added by loop versioning.
2025-11-17 19:01:26 +00:00
Ramkumar Ramachandra
ef023cae38
Reland [VPlan] Expand WidenInt inductions with nuw/nsw (#168354)
Changes: The previous patch had to be reverted due to a mismatching-OpType
assert in CSE. A reduced test corresponding to an RVV pointer induction
has now been added, and the pointer-induction case has been updated
to use createOverflowingBinaryOp.

While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-17 13:44:25 +00:00
Luke Lau
4d4a60cde0
[VPlan] Fix LastActiveLane assertion on scalar VF (#167897)
For a scalar-only VPlan with tail folding, if it has a phi live-out, then
legalizeAndOptimizeInductions will scalarize the widened canonical IV
feeding into the header mask:

    <x1> vector loop: {
      vector.body:
        EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
        vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
        EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
        EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
        EMIT branch-on-count vp<%index.next>, vp<%2>
      No successors
    }
    Successor(s): middle.block

    middle.block:
      EMIT vp<%8> = last-active-lane vp<%6>
      EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
    Successor(s): ir-bb<exit>

The verifier complains about this, but this should still generate the
correct last active lane, so this patch fixes the assert by handling this case
in isHeaderMask. There is a similar pattern already there for
ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes #167813
2025-11-17 11:03:38 +00:00
Florian Hahn
a464e3856e
[LV] Check debug location for more recipes in vplan-printing.ll.
Extend test to check printing of debug locations to cover a range of
wide and replicating recipes. Currently those do not print the debug
metadata.
2025-11-16 12:15:06 +00:00
Florian Hahn
59d2e93590
[LV] Add test to check different interleave counts for fmaxnum.
This adds additional test coverage for folding FCMP uno
(https://github.com/llvm/llvm-project/pull/166823)
2025-11-15 15:51:28 +00:00
Florian Hahn
eb98b65e82
[ValueTracking] Check across single predecessors in willNotFreeBetween. (#167965)
Extend willNotFreeBetween to perform simple checking across blocks, to
support the case where CtxI is not in the same block as the assume, but
the assume's parent block is the single predecessor of CtxI's block.

This enables using __builtin_assume_dereferenceable to vectorize
std::find_if and co in practice.
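A simplified model of the extended check, assuming a toy CFG in which each instruction is only tagged as may-free or not. `Block`, `InstPos`, and this `willNotFreeBetween` signature are hypothetical and are not the ValueTracking API. The sketch handles the same-block case and the new single-predecessor case, and gives up otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy block: one may-free flag per instruction, plus its unique predecessor.
struct Block {
  std::vector<bool> MayFree;
  const Block *SinglePred = nullptr;
};

// Instruction position: its block and index within that block.
struct InstPos {
  const Block *BB;
  std::size_t Idx;
};

// True if any instruction in [Begin, End) of BB may free memory.
static bool anyMayFree(const Block &BB, std::size_t Begin, std::size_t End) {
  for (std::size_t I = Begin; I < End; ++I)
    if (BB.MayFree[I])
      return true;
  return false;
}

// True if nothing strictly between Assume and CtxI may free memory.
bool willNotFreeBetween(InstPos Assume, InstPos CtxI) {
  if (Assume.BB == CtxI.BB)
    return Assume.Idx <= CtxI.Idx &&
           !anyMayFree(*CtxI.BB, Assume.Idx + 1, CtxI.Idx);
  // Cross-block case: the assume's block must be CtxI's single predecessor.
  if (CtxI.BB->SinglePred != Assume.BB)
    return false;
  // Check the rest of the assume's block, then the prefix of CtxI's block.
  return !anyMayFree(*Assume.BB, Assume.Idx + 1, Assume.BB->MayFree.size()) &&
         !anyMayFree(*CtxI.BB, 0, CtxI.Idx);
}
```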

End-to-end reproducer: https://godbolt.org/z/6jbsd4EjT

PR: https://github.com/llvm/llvm-project/pull/167965
2025-11-15 12:11:52 +00:00
Florian Hahn
ca26cf8611
[LV] Use variables in CHECK lines for unnamed VPValues in test.
Update test to capture unnamed VPValues in variables, making it easier
to update with future VPlan changes.
2025-11-15 12:10:03 +00:00
Florian Hahn
77fd6bef38
[LV] Also cover -force-target-instruction-cost=1 in tests.
Extend test to cover different -force-target-instruction-cost settings.
2025-11-14 21:15:14 +00:00
Alex Bradbury
f2336d4c7e
Revert "[VPlan] Expand WidenInt inductions with nuw/nsw" (#168080)
Reverts llvm/llvm-project#163538

This is causing build failures on the two-stage RVV buildbots. e.g.
https://lab.llvm.org/buildbot/#/builders/214/builds/1363. I've shared a
reproducer and more information at
https://github.com/llvm/llvm-project/pull/163538#issuecomment-3533482822

This reverts commit 355e0f94af5adabe90ac57110ce1b47596afd4cd.
2025-11-14 16:11:48 +00:00
Alexander Belyaev
7ee0e0f956
Revert "[LICM] Sink unused l-invariant loads in preheader. #157559"
This reverts commit 469702c5d5cc4fa18c3a962afb971950a084f373.

https://github.com/llvm/llvm-project/issues/168048
2025-11-14 14:51:33 +01:00
Ramkumar Ramachandra
355e0f94af
[VPlan] Expand WidenInt inductions with nuw/nsw (#163538)
While at it, record VPIRFlags in VPWidenInductionRecipe.
2025-11-14 12:10:55 +00:00
Luke Lau
851f8f7984
[VPlan] Disable partial reductions again with EVL tail folding (#167863)
VPPartialReductionRecipe doesn't yet support an EVL variant, and we
guard against this by not calling convertToAbstractRecipes when we're
tail folding with EVL.

However, after some recent restructuring, we may detect some scaled
reductions in collectScaledReductions and store them in
ScaledReductionMap; outside of convertToAbstractRecipes we may then
look them up and start, e.g., adding a scale factor to an otherwise
regular VPReductionPHI.

This fixes it by also skipping collectScaledReductions when tail folding
with EVL. Fixes #167861.
2025-11-14 06:30:12 +00:00
Florian Hahn
a6edeedbfa
Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"
This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISC-V:
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
2025-11-13 22:34:55 +00:00
Florian Hahn
79cd1b7a25
[LV] Drop verbose check-prefix from partial-reduce-incomplete-chains.ll.
There's only a single RUN line in the test, so use the more compact default CHECK.
2025-11-13 22:18:01 +00:00
Florian Hahn
6429549a72
[LV] Add early-exit tests, where deref assumes are not in preheader.
Test case for vectorizing std::find_if with
__builtin_assume_dereferenceable. Currently not vectorized.

https://godbolt.org/z/6jbsd4EjT
2025-11-13 20:54:05 +00:00
Matt Arsenault
d4c8cfeac0
AArch64: Regenerate baseline checks in loop vectorize test (#167926)
2025-11-13 19:11:32 +00:00
Ryan Buchner
a04c6b5512
[LV] Update LoopVectorizationPlanner::emitInvalidCostRemarks to handle reduction plans (#165913)
The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case.

Fixes #165359.
2025-11-13 06:12:40 -10:00
Luke Lau
c0f7d51e8a
[VPlan] Simplify ExplicitVectorLength(%AVL) -> %AVL when AVL <= VF (#167647)
[`llvm.experimental.get.vector.length`](https://llvm.org/docs/LangRef.html#id2399)
has the property that if the AVL (%cnt) is less than or equal to VF
(%max_lanes) then the return value is just AVL.

This patch uses SCEV to simplify this in optimizeForVFAndUF, and adds
`ExplicitVectorLength` to
`VPInstruction::opcodeMayReadOrWriteFromMemory` so it gets removed once
dead.
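A minimal sketch of the fold, using `std::min` as one plausible model of `get.vector.length` for a fixed lane count; the helper names are hypothetical. The fold only relies on the property stated above. For scalable VFs, the comparison would need a conservative bound on the lane count, which this sketch does not model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Plausible model of get.vector.length(AVL, VF) for a fixed VF. The only
// property relied on is the documented one: if AVL <= VF, the result is AVL.
uint64_t getVectorLength(uint64_t AVL, uint64_t VF) {
  return std::min(AVL, VF);
}

// Fold check: ExplicitVectorLength(AVL) can be replaced by AVL when an upper
// bound on AVL (e.g. one derived via SCEV) does not exceed VF.
bool canFoldEVLToAVL(uint64_t AVLUpperBound, uint64_t VF) {
  return AVLUpperBound <= VF;
}
```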
2025-11-13 13:17:01 +00:00
Ramkumar Ramachandra
9ba738af2c
[VPlan] Fix assert in store-user in narrowToSingleScalars (#167686)
Follow up on c2d4c7c18b96 ([VPlan] Permit more users in
narrowToSingleScalars) to fix an assert related to WidenStore users of
the recipe being narrowed in narrowToSingleScalars.
2025-11-12 17:53:24 +00:00
Florian Hahn
62d1a080e6
[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)
Building on top of https://github.com/llvm/llvm-project/pull/148817,
introduce a new abstract LastActiveLane opcode that gets lowered to
Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1).

When folding the tail, update all extracts for uses outside the loop to
extract the value of the last active lane.
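A C++ sketch of the lowering, modelling masks as `std::vector<bool>`. It assumes `FirstActiveLane` returns the lane count for an all-false mask; with that, `Sub(FirstActiveLane(Not(Mask)), 1)` yields the last active lane for prefix masks such as tail-folding header masks, which is the case this lowering targets. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first true lane, or the number of lanes if none is set
// (assumed behaviour for an all-false mask).
std::size_t firstActiveLane(const std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I])
      return I;
  return Mask.size();
}

// LastActiveLane = FirstActiveLane(Not(Mask)) - 1. Only valid for prefix
// masks (active lanes first) with at least one active lane.
std::size_t lastActiveLane(const std::vector<bool> &Mask) {
  std::vector<bool> NotMask(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    NotMask[I] = !Mask[I];
  return firstActiveLane(NotMask) - 1;
}
```

For a full mask of 4 lanes, `Not(Mask)` has no active lane, so the result is `4 - 1 = 3`.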

See also https://github.com/llvm/llvm-project/issues/148603

PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-12 15:11:00 +00:00
Luke Lau
02c68b3ef7
[VPlan] Plumb scalable register size through narrowInterleaveGroups (#167505)
On RISC-V, narrowInterleaveGroups doesn't kick in because the wrong
VectorRegWidth is passed to isConsecutiveInterleaveGroup.

narrowInterleaveGroups is always passed the RGK_FixedWidthVector
register size, but on RISC-V the RGK_ScalableVector size is twice as
large because we want to use LMUL 2. This causes the `GroupSize ==
VectorRegWidth` check to fail.

This fixes it by using the scalable register size whenever the VF is
scalable and plumbing it through as a potentially scalable TypeSize.

Note that this only makes a difference when tail folding is disabled, as
narrowInterleaveGroups can't handle EVL based IVs yet.
2025-11-12 11:14:53 +00:00
Florian Hahn
b9f0dadc10
[VPlan] Merge fcmp uno feeding Or. (#167251)
Fold
 or (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
 or (fcmp uno %A, %B), ...

This pattern is generated to check if any vector lane is NaN, and
combining multiple compares is beneficial on architectures that have
dedicated instructions.
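A scalar C++ sketch of the fold's semantics: `fcmp uno %A, %A` is true exactly when `%A` is NaN, and `fcmp uno %A, %B` is true when either operand is NaN, so OR-ing the two self-compares equals one combined compare. The vector form applies the same reasoning lane by lane; `std::isunordered` stands in for `fcmp uno` here.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// fcmp uno %A, %A: true exactly when A is NaN.
bool unoSelf(double A) { return std::isunordered(A, A); }

// Before: or (fcmp uno %A, %A), (fcmp uno %B, %B)
bool anyNaNBefore(double A, double B) { return unoSelf(A) || unoSelf(B); }

// After: fcmp uno %A, %B (true if either operand is NaN)
bool anyNaNAfter(double A, double B) { return std::isunordered(A, B); }
```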

Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Combine suggested as part of #161735

PR: https://github.com/llvm/llvm-project/pull/167251
2025-11-12 10:15:59 +00:00
Mel Chen
68a4af6acc
[LV][EVL] Replace VPInstruction::Select with vp.merge for predicated div/rem (#154072)
Since div/rem operations don’t support a mask operand, the lanes of the
divisor that are masked out are currently replaced with 1 using
VPInstruction::Select before the predicated div/rem operation.
This patch replaces
```
  VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)
```
with
```
  vp.merge(conditional_mask, LHS, RHS, EVL)
```
so that the header mask can be replaced by EVL in this usage scenario
when tail folding with EVL.
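A lane-wise C++ model of why `vp.merge` can stand in for the select: lanes below EVL take the real divisor where the condition holds, and every other lane, including the tail at or above EVL, gets the safe divisor 1. The helpers `vpMerge` and `predicatedDiv` are hypothetical, and the element type is fixed to `int` for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of vp.merge(Cond, OnTrue, OnFalse, EVL): lanes below EVL pick OnTrue
// where Cond is set and OnFalse otherwise; lanes at or above EVL take OnFalse.
std::vector<int> vpMerge(const std::vector<bool> &Cond,
                         const std::vector<int> &OnTrue,
                         const std::vector<int> &OnFalse, std::size_t EVL) {
  std::vector<int> R(OnFalse);
  for (std::size_t I = 0; I < EVL && I < R.size(); ++I)
    if (Cond[I])
      R[I] = OnTrue[I];
  return R;
}

// Masked-out and tail lanes get divisor 1, so the full-width divide cannot
// trap; only the first EVL lanes where Cond holds carry meaningful results.
std::vector<int> predicatedDiv(const std::vector<bool> &Cond,
                               const std::vector<int> &Num,
                               const std::vector<int> &Den, std::size_t EVL) {
  std::vector<int> Ones(Den.size(), 1);
  std::vector<int> SafeDen = vpMerge(Cond, Den, Ones, EVL);
  std::vector<int> Q(Num.size());
  for (std::size_t I = 0; I < Num.size(); ++I)
    Q[I] = Num[I] / SafeDen[I];
  return Q;
}
```

Because the EVL operand already excludes the tail lanes, the condition passed to `vp.merge` no longer needs to include the header mask.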
2025-11-12 08:03:57 +00:00
Florian Hahn
b612b10c9c
[VPlan] Add tests for hoisting predicated loads.
Adds test coverage with loops where the same loads get executed under
complementary predicates and can be hoisted, together with a set of
negative test cases.
2025-11-11 22:57:24 +00:00
Ramkumar Ramachandra
c8c328406c
Revert "[VPlan] Handle WidenGEP in narrowToSingleScalars" (#167509)
This reverts commit fdd52f5fe130fb8b98f4aed3d15aa0789cce6b40, as it
causes buildbot failures. This will give us time to investigate the
failure.

https://lab.llvm.org/buildbot/#/builders/210/builds/5160
2025-11-11 14:29:28 +00:00
Kerry McLaughlin
de3de3f143
[LV] Consider interleaving when -enable-wide-lane-mask=true (#163387)
Currently the only way to enable the use of wide active lane masks is to pass
-enable-wide-lane-mask and force both interleaving & tail-folding with additional
flags. This patch changes selectInterleaveCount to consider interleaving if wide
lane masks were requested, although the feature remains off by default.
2025-11-11 11:46:59 +00:00
Ramkumar Ramachandra
fdd52f5fe1
[VPlan] Handle WidenGEP in narrowToSingleScalars (#166740)
This allows us to strip a special case in VPWidenGEP::execute.
2025-11-11 10:33:55 +00:00
Sander de Smalen
517d725463
[LV] Move condition to VPPartialReductionRecipe::execute (#166136)
This means that VPExpressions will now be constructed for
VPPartialReductionRecipes when the loop has tail-folding predication.

Note that control-flow (if/else) predication is not yet handled
for partial reductions, because of the way partial reductions
are recognised and built up.
2025-11-11 09:42:54 +00:00
Ramkumar Ramachandra
c2d4c7c18b
[VPlan] Permit more users in narrowToSingleScalars (#166559)
narrowToSingleScalarRecipes can permit users that are WidenStore, or a
VPInstruction that has a suitable opcode. This is a generalization and
extension of the existing code.
2025-11-10 17:03:14 +00:00
Vladislav Dzhidzhoev
e2a2c03eef
[DebugInfo] Add Verifier check for incorrectly-scoped retainedNodes (#166855)
These checks ensure that retained nodes of a DISubprogram belong to the
subprogram.

Tests with incorrect IR have been fixed: variables of one subprogram should not be present in the retained nodes of other subprograms.

Also, the interface for accessing DISubprogram's retained nodes is slightly
refactored. `DISubprogram::visitRetainedNodes` and
`DISubprogram::forEachRetainedNode` are added to avoid repeating checks
like
```
if (const auto *LV = dyn_cast<DILocalVariable>(N))
  ...
else if (const auto *L = dyn_cast<DILabel>(N))
  ...
else if (const auto *IE = dyn_cast<DIImportedEntity>(N))
  ...
```
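A sketch of the visitor pattern that replaces the repeated `dyn_cast` chain, written with `std::variant` and `std::visit` (C++17) instead of LLVM's casting utilities. The node types and this `visitRetainedNodes` signature are hypothetical stand-ins; the actual `DISubprogram` helpers may take different parameters.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the three retained-node kinds.
struct LocalVariable { std::string Name; };
struct Label { std::string Name; };
struct ImportedEntity { std::string Name; };

using RetainedNode = std::variant<LocalVariable, Label, ImportedEntity>;

// Combines several lambdas into one overloaded callable.
template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

// Calls the callback matching each node's kind, so callers no longer repeat
// the cast-and-dispatch chain.
template <class OnVar, class OnLabel, class OnImported>
void visitRetainedNodes(const std::vector<RetainedNode> &Nodes, OnVar V,
                        OnLabel L, OnImported I) {
  for (const RetainedNode &N : Nodes)
    std::visit(Overloaded{V, L, I}, N);
}
```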
2025-11-10 13:13:49 +01:00
Luke Lau
bfd4155f23
[VPlan] Don't apply predication discount to non-originally-predicated blocks (#160449)
Split off from #158690. Currently, if an instruction needs to be predicated due
to tail folding, it will also have a predication discount applied to it
in multiple places.
This is likely inaccurate, because we can expect a tail-folded
instruction to be executed on every iteration but the last.

This fixes it by checking if the instruction/block was originally
predicated, and in doing so prevents vectorization with tail folding
where we would have had to scalarize the memory op anyway.

On llvm-test-suite this causes 4 loops in total to no longer be
vectorized with -O3 on arm64-apple-darwin, and there's no observable
performance impact.
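A minimal sketch of the revised costing rule, assuming a fixed divisor of 2 for predicated blocks (LoopVectorize has used a reciprocal block probability of 2; treat the exact value here as an assumption). `predicatedCost` is a hypothetical helper, not the vectorizer's cost API.

```cpp
#include <cassert>

// Assumed divisor: a predicated block is modelled as running on about half
// of the iterations.
constexpr unsigned PredBlockCostDivisor = 2;

// Only blocks that were predicated in the original loop get the discount.
// Blocks predicated solely by tail folding run on every vector iteration but
// the last, so they keep their full cost.
unsigned predicatedCost(unsigned Cost, bool OriginallyPredicated) {
  return OriginallyPredicated ? Cost / PredBlockCostDivisor : Cost;
}
```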
2025-11-10 12:10:40 +00:00
Ramkumar Ramachandra
2d1d5fe78e
[VPlan] Simplify branch-cond with getVectorTripCount (#155604)
Call getVectorTripCount first, and call getTripCount failing that, in
simplifyBranchConditionForVFAndUF, to simplify missed cases. While at
it, strip the dead check for a zero TC.
2025-11-10 10:43:37 +00:00
Sam Parker
d10a85167a
[WebAssembly] Implement more of getCastInstrCost (#164612)
Fill out more information for sign and zero extend and add some truncate
information; however, the primary change is to int/fp conversions. In
particular, fp to (narrow) int appears to be relatively expensive.
2025-11-10 08:07:16 +00:00
Florian Hahn
3b219cf42a
[LV] Add register pressure test for #164124.
Add extra test for https://github.com/llvm/llvm-project/pull/164124
2025-11-08 21:59:38 +00:00
Florian Hahn
3ee2f07e17
[VPlan] Support multiple F(Max|Min)Num reductions. (#161735)
Generalize handleMaxMinNumReductions to handle any number of
F(Max|Min)Num reductions by collecting a vector of reductions to
convert.

We then add NaN checks for all of them, followed by adjusting the branch
controlling the vector loop region, and updating the resume phis.

Addresses a TODO from https://github.com/llvm/llvm-project/pull/148239

PR: https://github.com/llvm/llvm-project/pull/161735
2025-11-07 13:59:06 +00:00
Luke Lau
bac427a0f6
[VPlan] Remove no-longer-needed EVL VPlan debug output tests. NFC (#166158)
These VPlan debug output tests were added in
https://github.com/llvm/llvm-project/pull/108351 and
https://github.com/llvm/llvm-project/pull/110412, back when we used to
convert regular widening recipes to VP intrinsics during EVL tail
folding.

Nowadays we don't convert these recipes so there's nothing really to be
gained from testing them. This removes the VPlan tests since an upcoming
patch slightly perturbs these VPlans and removing them seems easier than
manually going through and updating them all.

I've kept the LLVM IR/UTC counterparts in
`tail-folding-{cast,call}-intrinsics.ll`, since even though they also
aren't really testing anything useful at least they're easy to update.
2025-11-07 16:34:10 +08:00
Florian Hahn
3ad5765e23
[LV] Check all users of partial reductions in chain have same scale. (#162822)
Check that all partial reductions in a chain are only used by other
partial reductions with the same scale factor. Otherwise we end up
creating users of scaled reductions where the types of the other
operands don't match.

A similar issue was addressed in
https://github.com/llvm/llvm-project/pull/158603, but it missed the chained
cases.

Fixes https://github.com/llvm/llvm-project/issues/162530.

PR: https://github.com/llvm/llvm-project/pull/162822
2025-11-06 21:45:57 +00:00
Martin Storsjö
8b3a124ad8
Revert "[InterleavedAccess] Construct interleaved access store with shuffles"
This reverts commit 78d649199b47370b72848c1ca8d9bd3323b050ac.

That commit caused failed asserts, see
https://github.com/llvm/llvm-project/pull/164000 for details.
2025-11-06 11:09:26 +02:00
Shikhar Jain
9100c9212d
[AArch64] Enable masked load/store for Streaming-SVE with -march=armv8-a+sme (#163133)
On AArch64, isLegalMaskedLoadStore() should not return false for
Streaming-SVE. With this change, when compiling with -march=armv8-a+sme,
workloads containing loops whose predication depends on array/vector
data now get masked loads/stores, along with the necessary scalable
vectorization constructs.

Fixes: #162797
2025-11-06 07:15:27 +00:00
Florian Hahn
efe8573127
[LV] Add extra tests for narrowing interleave groups with op chains.
Add additional tests to cover chains of ops feeding interleave groups,
some of which could be narrowed.
2025-11-05 23:11:41 +00:00
Florian Hahn
b0b4616790
[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506)
Generalize VPWidenSelectRecipe codegen to consider single-scalar
conditions instead of just loop-invariant ones.

If the condition is a single-scalar, we can simply use a scalar
condition.
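A small C++ illustration of why a single-scalar condition is enough: when every lane would see the same condition, a per-lane select produces exactly the same result as picking one whole operand. The helpers below are hypothetical and use `std::array<int, N>` in place of IR vectors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// General widened select: choose per lane from a vector condition.
template <std::size_t N>
std::array<int, N> selectPerLane(const std::array<bool, N> &Cond,
                                 const std::array<int, N> &T,
                                 const std::array<int, N> &F) {
  std::array<int, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = Cond[I] ? T[I] : F[I];
  return R;
}

// Single-scalar condition: one value decides for all lanes, so the whole
// operand is selected without broadcasting the condition.
template <std::size_t N>
std::array<int, N> selectScalarCond(bool Cond, const std::array<int, N> &T,
                                    const std::array<int, N> &F) {
  return Cond ? T : F;
}
```

A loop-invariant condition is one instance of a single-scalar condition, which is why the generalization subsumes the earlier special case.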

PR: https://github.com/llvm/llvm-project/pull/165506
2025-11-05 22:11:29 +00:00
Florian Hahn
54190970cf
[LV] Add tests for narrowing interleave groups with casts.
Add additional tests for narrowing interleave groups with casts.
2025-11-05 20:57:52 +00:00