429 Commits

Author SHA1 Message Date
Luke Lau
9dd1c66e8f
[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes (#118638)
The motivation of this PR is to make #115274 easier to implement, and
should allow us to add EVL support by just passing EVL to the VF
operand.

The current difficulty with widening IVs with EVL is that
VPWidenIntOrFpInductionRecipe generates its own backedge value. Since
it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which
means we can't use the EVL since it's defined in the loop body.

The gist in this PR is to take the approach in #114305 and expand
VPWidenIntOrFpInductionRecipe into several recipes for the initial
value, phi and backedge value just before execution. I.e. this example:

```
  vector.ph:
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      WIDEN-INDUCTION %i = phi %start, %step, %vf
      ...
      EMIT branch-on-count ...
    No successors
  }
```

gets expanded to:

``` 
vector.ph:
  ...
  vp<%induction.start> = ...
  vp<%induction.increment> = ...

Successor(s): vector loop

<x1> vector loop: {
  vector.body:
    ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next>
    ...
    vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment>
    EMIT branch-on-count ...
  No successors
}
```

This allows us to a value defined in the loop in the backedge value, and
also means we can just reuse the existing backedge fixups in
VPlan::execute without having to specially handle it ourselves.

After this #115274 should just become a matter of setting the VF operand
to EVL (and building the increment step in the loop body, not the
preheader).
2025-06-17 18:24:07 +01:00
Florian Hahn
790df93298
[VPlan] Mark VPFirstOrderRecurrencePHI as not reading/writing memory.
First-order recurrence phis don't have side-effects and don't read or
write memory. Mark them as such.
2025-06-15 22:00:47 +01:00
Florian Hahn
577199f922
Reapply "[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035)"
This reverts commit 0604dc199c019b23746f4a54885ba0c75569cdae.

The recommitted version addresses post-commit comments and adjusts the
place the branch weights are added. It now runs before VPlans are optimized
for VF and UF, which may remove the vector loop region, causing a crash
trying to get the middle block after that. Test case added in
72f99b75afc12bb.

Original message:
Manage branch weights for the BranchOnCond in the middle block in VPlan.
This requires updating VPInstruction to inherit from VPIRMetadata, which
in general makes sense as there are a number of opcodes that could take
metadata.

There are other branches (part of the skeleton) that also need branch
weights adding.

PR: https://github.com/llvm/llvm-project/pull/143035
2025-06-14 17:20:46 +01:00
Florian Hahn
732ebf803b
[VPlan] Address post-commit comments for f68848015f62.
Assign sentinel value to named variable to clarify naming and update
comments.

Addresses post-commit comments from
https://github.com/llvm/llvm-project/pull/142291.
2025-06-14 10:44:20 +01:00
Florian Hahn
f68848015f
[VPlan] Manage Sentinel value for FindLastIV in VPlan. (#142291)
Similar to modeling the start value as operand, also model the sentinel
value as operand explicitly. This makes all require information for
code-gen available directly in VPlan.

PR: https://github.com/llvm/llvm-project/pull/142291
2025-06-13 19:17:01 +01:00
David Sherwood
541e5118ce
[LV] Use getFixedValue instead of getKnownMinValue when appropriate (#143526)
There are many places in VPlan and LoopVectorize where we use
getKnownMinValue to discover the number of elements in a vector. Where
we expect the vector to have a fixed length, I have used the stronger
getFixedValue call. I believe this is clearer and adds extra protection
in the form of an assert in getFixedValue that the vector is not
scalable.

While looking at VPFirstOrderRecurrencePHIRecipe::computeCost I also
took the liberty of simplifying the code.

In theory I believe this patch should be NFC, but I'm reluctant to add
that to the title in case we're just missing tests for some of the VPlan
changes. I built and ran the LLVM test suite when targeting neoverse-v1
and it seemed ok.
2025-06-13 11:43:50 +01:00
Philip Reames
8ee9646b06
[LV] Simplify creation of vp.load/vp.store/vp.reduce intrinsics (#143804)
The use of VectorBuilder here was simply obscuring what was actually
going on. For vp.load and vp.store, the resulting code is significantly
more idiomatic. For the vp.reduce cases, we remove several layers of
indirection, including passing parameters via implicit state on the
builder. In both cases, the code is significantly easier to follow.
2025-06-12 13:46:06 -07:00
Hans Wennborg
0604dc199c Revert "[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035)"
This caused assertion failures:

  llvm/lib/Transforms/Vectorize/VPlan.h:4021:
  llvm::VPBasicBlock* llvm::VPlan::getMiddleBlock():
  Assertion `LoopRegion && "cannot call the function after vector loop region has been removed"' failed.

See comment on the PR.

> Manage branch weights for the BranchOnCond in the middle block in VPlan.
> This requires updating VPInstruction to inherit from VPIRMetadata, which
> in general makes sense as there are a number of opcodes that could take
> metadata.
>
> There are other branches (part of the skeleton) that also need branch
> weights adding.
>
> PR: https://github.com/llvm/llvm-project/pull/143035

This reverts commit db8d34db26e9ea92c08d6e813eca9cce40c48478.
2025-06-12 13:52:05 +02:00
Luke Lau
7ef77eb998
[LV] Support scalable interleave groups for factors 3,5,6 and 7 (#141865)
Currently the loop vectorizer can only vectorize interleave groups for
power-of-2 factors at scalable VFs by recursively interleaving
[de]interleave2 intrinsics.

However after https://github.com/llvm/llvm-project/pull/124825 and
#139893, we now have [de]interleave intrinsics for all factors up to 8,
which is enough to support all types of segmented loads and stores on
RISC-V.

Now that the interleaved access pass has been taught to lower these in
#139373 and #141512, this patch teaches the loop vectorizer to emit
these intrinsics for factors up to 8, which enables scalable
vectorization for non-power-of-2 factors.

As far as I'm aware, no in-tree target will vectorize a scalable
interelave group above factor 8 because the maximum interleave factor is
capped at 4 on AArch64 and 8 on RISC-V, and the
`-max-interleave-group-factor` CLI option defaults to 8, so the
recursive [de]interleaving code has been removed for now.

Factors of 3 with scalable VFs are also turned off in AArch64 since
there's no lowering for [de]interleave3 just yet either.
2025-06-12 11:09:09 +01:00
Florian Hahn
db8d34db26
[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035)
Manage branch weights for the BranchOnCond in the middle block in VPlan.
This requires updating VPInstruction to inherit from VPIRMetadata, which
in general makes sense as there are a number of opcodes that could take
metadata.

There are other branches (part of the skeleton) that also need branch
weights adding.

PR: https://github.com/llvm/llvm-project/pull/143035
2025-06-12 10:04:08 +01:00
Florian Hahn
6108d50aed
[VPlan] Add ReductionStartVector VPInstruction. (#142290)
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.

PR: https://github.com/llvm/llvm-project/pull/142290
2025-06-09 20:59:12 +01:00
Florian Hahn
414590bae8
[VPlan] Infer result type for ComptueReductionResult in ::execute (NFC).
Remove explicit use of underlying instruction to get type.
2025-06-08 21:44:28 +01:00
Florian Hahn
24bd4e59b9
[VPlan] Use regular phi printing for resume phis.
As discussed in https://github.com/llvm/llvm-project/pull/140405, remove
custom printing for resume-phis and update tests.
2025-06-07 21:49:54 +01:00
Florian Hahn
5520ab3d50
[VPlan] Add ComputeAnyOfResult VPInstruction (NFC) (#141932)
Add a dedicated opcode for any-of reduction, similar to
https://github.com/llvm/llvm-project/pull/132689 and
https://github.com/llvm/llvm-project/pull/132690.

The patch also explictly adds the start value to not require
RecurrenceDescriptor during execute. It also allows freezing the start
value to make it poison-safe.

PR: https://github.com/llvm/llvm-project/pull/141932
2025-06-03 14:33:53 +01:00
Florian Hahn
0f00a96fed
[VPlan] Simplify branch on False in VPlan transform (NFC). (#140409)
Simplify branch on false, starting with the branch from the middle block
to the scalar preheader. Initially this helps simplifying the initial
VPlan construction.

Depends on https://github.com/llvm/llvm-project/pull/140405.

PR: https://github.com/llvm/llvm-project/pull/140409
2025-05-31 20:32:45 +01:00
Ramkumar Ramachandra
f057a593a7
[VPlan] Improve code in VPWidenCallRecipe (NFC) (#141926)
Use operands() instead of {op_begin(), op_end()}. Also rename
arg_operands to args to match CallBase.
2025-05-31 15:41:20 +02:00
Florian Hahn
e4ef651695
[VPlan] Simplify VPReductionPHIRecipe::execute (NFC).
Simplify VPReductionPHIRecipe::execute by handling the simple cases
first, by directly using State.get() to the appropriate start value.
2025-05-30 15:56:44 +01:00
Florian Hahn
10bd4cd9cd
[VPlan] Remove ResumePhi opcode, use regular PHI instead (NFC). (#140405)
Use regular VPPhi instead of a separate opcode for resume phis. This
removes an unneeded specialized opcode and unifies the code
(verification, printing, updating when CFG is changed).

Depends on https://github.com/llvm/llvm-project/pull/140132.

PR: https://github.com/llvm/llvm-project/pull/140405
2025-05-30 12:50:08 +01:00
Florian Hahn
9ea4924720
[VPlan] Use EMIT-SCALAR for single-scalar VPPhis (NFC).
Follow-up to https://github.com/llvm/llvm-project/pull/141428, to also
use EMIT-SCALAR for VPPhis that are single scalars.
2025-05-29 11:20:07 +01:00
Florian Hahn
5b85e4b08d
[VPlan] Use EMIT-SCALAR when printing single-scalar VPInstructions. (#141428)
By using SINGLE-SCALAR when printing, it is clear in the debug output
that those VPInstructions only produce a single scalar.

Split off in preparation for
https://github.com/llvm/llvm-project/pull/140623.

PR: https://github.com/llvm/llvm-project/pull/141428
2025-05-29 09:29:06 +01:00
Elvis Wang
332fe08f1d
[VPlan] Implement VPlan-based cost model for VPReduction, VPExtendedReduction and VPMulAccumulateReduction. (#113903)
This patch implement the VPlan-based cost model for VPReduction,
VPExtendedReduction and VPMulAccumulateReduction.

With this patch, we can calculate the reduction cost by the VPlan-based
cost model so remove the reduction costs in `precomputeCost()`.

Ref: Original instruction based implementation:
https://reviews.llvm.org/D93476
2025-05-29 11:15:16 +08:00
Florian Hahn
249301c779
[LoopUtils] Pass sentinel value directly to createFindLastIVRed (NFC).
Now that there is only a single FindLastIV recurrence kind, simply pass
the sentinel value instead of the full recurrence descriptor to tighten
the interface.
2025-05-28 22:00:11 +01:00
Florian Hahn
0d7b34bfc1
[LoopUtils] Pass start value directly to createAnyOfReduction (NFC).
Now that there is only a single AnyOf recurrence kind, simply pass the
start value instead of the full recurrence descriptor, to tighten the
interface.
2025-05-28 21:28:02 +01:00
Florian Hahn
440a8adb86
[VPlan] Use VPIRFlags to manage FMFs for ComputeReductionResult (NFC).
Manage fast-math flags using VPIRFlags from VPInstruciton, in inline
with other VPInstructions. With this change, we now print the correctly
flags for ComputeReductionResult, other than that NFC.
2025-05-28 20:54:58 +01:00
Ramkumar Ramachandra
a8edb6a548
[VPlan] Improve cast code in VPlanRecipes (NFC) (#141240) 2025-05-27 22:31:46 +02:00
Simon Pilgrim
63eb00483f VPlanRecipes.cpp - fix "not all control paths return a value" MSVC warning 2025-05-25 15:16:01 +01:00
Florian Hahn
c0506a11f4
[VPlan] Separate out logic to manage IR flags to VPIRFlags (NFC). (#140621)
This patch moves the logic to manage IR flags to a separate VPIRFlags
class. For now, VPRecipeWithIRFlags is the only class that inherits
VPIRFlags. The new class allows for simpler passing of flags when
constructing recipes, simplifying the constructors for various recipes
(VPInstruction in particular, which now just has 2 constructors, one
taking an extra VPIRFlags argument.

This mirrors the approach taken for VPIRMetadata and makes it easier to
extend in the future. The patch also adds a unified flagsValidForOpcode
to check if the flags in a VPIRFlags match the provided opcode.

PR: https://github.com/llvm/llvm-project/pull/140621
2025-05-25 11:13:11 +01:00
Florian Hahn
dcef154b5c
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506)
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.

This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.

It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the 
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.

PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-24 19:17:16 +01:00
Mel Chen
1b711b27d2
[VPlan] Clean up the function VPInstruction::generate for ComputeReductionResult, nfc (#140245)
When reducing unrolled parts, explicitly check for min/max reductions
using the function RecurrenceDescriptor::isMinMaxRecurrenceKind. Only if
the reduction is not min/max reduction, call
RecurrenceDescriptor::getOpcode() to handle other cases via CreateBinOp.

Based on https://github.com/llvm/llvm-project/pull/140242
Related to https://github.com/llvm/llvm-project/pull/118393
2025-05-19 17:31:23 +08:00
Mel Chen
f594cd0936
[IVDescriptor][LV] Return Instruction::Or for IAnyOf/FAnyOf in getOpcode(), nfc (#140242) 2025-05-19 16:17:04 +08:00
Florian Hahn
04fde85057
[VPlan] Rename isUniform(AfterVectorization) to isSingleScalar (NFC). (#140134)
Update the naming in VPReplicateRecipe and vputils to the more accurate
isSingleScalar, as the functions check for cases where only a single
scalar is needed, either because it produces the same value for all
lanes or has only their first lane used.

Discussed in https://github.com/llvm/llvm-project/pull/139150.

PR: https://github.com/llvm/llvm-project/pull/140134
2025-05-16 16:38:39 +01:00
Elvis Wang
664c937b43
[VPlan] Implement VPExtendedReduction, VPMulAccumulateReductionRecipe and corresponding vplan transformations. (#137746)
This patch introduce two new recipes.

* VPExtendedReductionRecipe
  - cast + reduction.

* VPMulAccumulateReductionRecipe
  - (cast) + mul + reduction.

This patch also implements the transformation that match following
patterns via vplan and converts to abstract recipes for better cost
estimation.

* VPExtendedReduction
  - reduce(cast(...))

* VPMulAccumulateReductionRecipe
  - reduce.add(mul(...))
  - reduce.add(mul(ext(...), ext(...))
  - reduce.add(ext(mul(ext(...), ext(...))))

The converted abstract recipes will be lower to the concrete recipes
(widen-cast + widen-mul + reduction) just before recipe execution.

Note that this patch still relies on legacy cost model the calculate the
cost for these patters.
Will enable vplan-based cost decision in #113903.

Split from #113903.
2025-05-16 10:25:38 +08:00
Florian Hahn
045fdda39d
[VPlan] Replace TTI::getOperandInfo with Ctx.getOperandInfo (NFC).
Update to use VPlan-based implementation of getOperandInfo, removing
uses of underlying IR references.
2025-05-12 22:44:54 +01:00
Florian Hahn
fb017a52e7
[VPlan] Use load/store opcode for VPWiden(Load|Store)EVLRecipe (NFC).
Removes unnecessary uses of Ingredient.
2025-05-12 21:30:18 +01:00
Florian Hahn
9a9a78eacb
[VPlan] Handle most bin-ops in VPReplicateRecipe::computeCost. (NFC)
Directly compute costs for binary ops and GEPs in
VPReplicateRecipe::computeCost. This simply ports the legacy cost
computation for uniform/replicating binary ops to the VPlan cost model.
2025-05-11 13:51:14 +01:00
Florian Hahn
5fa64d65e9
[VPlan] Use printPhiOperands for VPPhi.
Split off from  https://github.com/llvm/llvm-project/pull/139151 to land
printing improvements separately.

Updates printing of VPPhi operands to be consistent with
VPWidenPHIRecipe.
2025-05-10 12:49:29 +01:00
Florian Hahn
f2e62cfca5
[VPlan] Add VPPhi subclass for VPInstruction with PHI opcodes.(NFC) (#139151)
Similarly to VPInstructionWithType and VPIRPhi, add VPPhi as a subclass
for VPInstruction. This allows implementing the VPPhiAccessors trait,
making available helpers for generic printing of incoming values /
blocks and accessors for incoming blocks and values.

It will also allow properly verifying def-uses for values used by
VPInstructions with PHI opcodes via
https://github.com/llvm/llvm-project/pull/124838.

PR: https://github.com/llvm/llvm-project/pull/139151
2025-05-10 11:08:00 +01:00
Florian Hahn
e854c381c6
[VPlan] Manage noalias/alias_scope metadata in VPlan. (#136450)
Use VPIRMetadata added in
https://github.com/llvm/llvm-project/pull/135272
to also manage no-alias metadata added by versioning.

Note that this means we have to build the no-alias metadata up-front
once. If it is not used, it will be discarded automatically.

This also fixes a case where incorrect metadata was added to wide
loads/stores that got converted from an interleave group.

Compile-time impact is neutral:

https://llvm-compile-time-tracker.com/compare.php?from=38bf1af41c5425a552a53feb13c71d82873f1c18&to=2fd7844cfdf5ec0f1c2ce0b9b3ae0763245b6922&stat=instructions:u
2025-05-09 11:19:12 +01:00
Florian Hahn
d06d43a9e8
[VPlan] Add printPhiOperands to VPPhiAccessors, use for wide phis.
(NFC modulo debug output changes)

Add generic helper to print phi operands (incoming values) together with
their incoming blocks.

As more and more transforms are added, keeping the incoming blocks of
phis becomes more important. Print incoming blocks via VPPhiAcessors, to
make debugging easier.
2025-05-08 20:56:48 +01:00
Luke Lau
1484f82cbc
[VPlan] Add VPInstruction::StepVector and use it in VPWidenIntOrFpInductionRecipe (#129508)
Split off from #118638, this adds VPInstruction::StepVector, which
generates integer step vectors (0,1,2,...,VF). This is a step towards
eventually modelling all the separate parts of
VPWidenIntOrFpInductionRecipe in VPlan.

This is then used by VPWidenIntOrFpInductionRecipe, where we materialize
it just before unrolling so the operands stay in a fixed position.

The need for a separate operand in VPWidenIntOrFpInductionRecipe, as
well as the need to update it in
optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638
when everything is expanded in convertToConcreteRecipes.
2025-05-08 18:47:44 +08:00
Maryam Moghadas
a750893fea
[VPlan][LV] Fix invalid truncation in VPScalarIVStepsRecipe (#137832)
Replace CreateTrunc with CreateSExtOrTrunc in VPScalarIVStepsRecipe to
safely handle type conversion. This prevents assertion failures from
invalid truncation when StartIdx0 has a smaller integer type than
IntStepTy. The assertion was introduced by commit 783a846.
Fixes https://github.com/llvm/llvm-project/issues/137185
2025-05-06 12:48:21 -04:00
Florian Hahn
75532b21b1
[VPlan] Replace getPreheaderBBFor with getCFGPredecessor. (NFC)
Replace existing uses of getPreheaderBBFor with the newly added more
general getCFGPredecessor.
2025-05-05 21:47:19 +01:00
Florian Hahn
b807a2bc13
[VPlan] Move VPReplicateRecipe::execute to VPlanRecipes.cpp (NFC).
Consolidate ::execute implementation in VPlanRecipes.cpp, in line with
other ::execute implementations.
2025-05-04 22:02:01 +01:00
Florian Hahn
6e20519717
[VPlan] Add VPPhiAccessors to provide interface for phi recipes (NFC) (#129388)
Add a VPPhiAccessors class to provide interfaces to access incoming
values and blocks.

The first user is VPWidenPhiRecipe, with the other phi-like recipes
following soon.

This will also be used to verify def-use chains where users are phi-like
recipes, simplifying https://github.com/llvm/llvm-project/pull/124838.

PR: https://github.com/llvm/llvm-project/pull/129388
2025-05-04 13:47:42 +01:00
Kazu Hirata
6ab7cb7899
[Transforms] Remove unused local variables (NFC) (#138442) 2025-05-04 00:35:22 -07:00
Samuel Tebbs
fa769655e7 [LV] NFC: Make VPPartialReductionRecipe a VPReductionRecipe 2025-04-30 19:44:40 +01:00
Luke Lau
2cd829fc2c
[VectorUtils][VPlan] Consolidate VPWidenIntrinsicRecipe::onlyFirstLaneUsed and isVectorIntrinsicWithScalarOpAtArg (#137497)
We can reuse isVectorIntrinsicWithScalarOpAtArg in VectorUtils to
determine if only the first lane will be used for a
VPWidenIntrinsicRecipe, provided that we also move the VP EVL operand
check into it.

This was needed by a local patch I was working on that created a
VPWidenIntrinsicRecipe with a VP intrinsic, and prevents the need to
update the scalar arguments in two places.
2025-05-01 01:25:41 +08:00
Luke Lau
c5d780bb72
[VPlan] Remove no longer needed VP intrinsic handling in VPWidenIntrinsicRecipe::computeCost. NFCI (#137573)
Whenever calls were transformed to VP intrinsics with EVL tail folding
in #110412, this workaround was added in computeCost to avoid an
assertion when checking ICA.getArgs().

However it turned out that the actual arguments were never used and this
assertion was removed in #115983 afterwards, so it's now fine to leave
the arguments empty and use the type based cost instead.

The type based cost and value based cost are the same for these VP
intrinsics.

This was tested by adding back in the transformation code in #110412 and
checking that no assertions were still hit.
2025-04-29 21:55:20 +08:00
Florian Hahn
df21288247
[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030)
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.

PR: https://github.com/llvm/llvm-project/pull/137030
2025-04-25 16:27:29 +01:00
Florian Hahn
5d136f90a9
[VPlan] Manage instruction metadata in VPlan. (#135272)
Add a new helper to manage IR metadata that can be progated to generated
instructions for recipes.

This helps to remove a number of remaining uses of getUnderlyingInstr
during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/135272
2025-04-24 11:57:19 +01:00