436 Commits

Author SHA1 Message Date
Florian Hahn
94b2617590
[VPlan] Remove VPIRPhis in exit blocks when deleting scalar loop BBs.
DeleteDeadBlocks will remove single-entry phis. Remove them from the exit
VPIRBBs in VPlan as well, otherwise we would retain references to deleted
IR instructions.

Fixes MSan failures after 8907b6d39
https://lab.llvm.org/buildbot/#/builders/164/builds/14013
2025-10-01 22:01:23 +01:00
Florian Hahn
8907b6d393
[VPlan] Remove original loop blocks if dead. (#155497)
Build on top of https://github.com/llvm/llvm-project/pull/154510 to
completely remove the blocks of dead scalar loops.

Depends on https://github.com/llvm/llvm-project/pull/154510. 

PR: https://github.com/llvm/llvm-project/pull/155497
2025-10-01 16:53:59 +00:00
Florian Hahn
f61be43525
Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)"
This reverts commit b4be7ecaf06bfcb4aa8d47c4fda1eed9bbe4ae77.

See https://github.com/llvm/llvm-project/issues/161404 for a crash
exposed by the change. Revert while I investigate.
2025-09-30 22:13:06 +01:00
Florian Hahn
b4be7ecaf0
[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as follow-up.
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't check for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.

PR: https://github.com/llvm/llvm-project/pull/160053
2025-09-29 08:08:09 +00:00
Florian Hahn
41f3438362
[VPlan] Remove dead code for scalar VFs in VPRegionBlock::cost (NFC).
The VPlan cost model is not used to compute costs of scalar VFs
currently, as conversion to replicate regions makes accurately computing
the original scalar cost difficult.

Remove left over, dead code.
2025-09-28 17:30:57 +01:00
Shih-Po Hung
0d22f8344a
[LV][EVL] Remove metadata on EVL vectorized loops (#155760)
This patch removes the metadata emission for EVL-vectorized loops,
since there is no current in-tree consumer:
1) after VPlan performs canonical IV replacement #147222 and
2) RISCV dropped EVLIndVarSimplifyPass #151483, which was the only user
of this metadata.
2025-09-23 07:39:33 +08:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branches from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Florian Hahn
30e9cbacab
[VPlan] Move logic to compute scalarization overhead to cost helper(NFC)
Extract the logic to compute the scalarization overhead to a helper for
easy re-use in the future.
2025-09-13 20:41:44 +01:00
Florian Hahn
b8eaceb39b
[VPlan] Explicitly replicate VPInstructions by VF. (#155102)
Extend replicateByVF added in #142433 (aa240293190) to also explicitly
unroll replicating VPInstructions.

Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.

PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-12 17:06:26 +01:00
Florian Hahn
8796dfdcba
[VPlan] Consolidate logic to update loop metadata and profile info.
This patch consolidates updating loop metadata and profile info for both
the remainder and vector loops in a single place. This is NFC, modulo
consistently applying vectorization specific metadata also in the
experimental VPlan-native path.

Split off from https://github.com/llvm/llvm-project/pull/154510.
2025-09-04 21:50:40 +01:00
Florian Hahn
507ff082c2
[VPlan] Move runtime check blocks to correct position during exec (NFC).
Move adjusting the position of completely disconnected IR blocks to
VPIRBasicBlock::execute.
2025-09-01 16:15:02 +01:00
Florian Hahn
a53a5ed65d
[VPlan] Add VPBlockBase::hasPredecessors (NFC).
Split off from https://github.com/llvm/llvm-project/pull/154510/, add
helper to check if a block has any predecessors.
2025-09-01 09:44:49 +01:00
Ramkumar Ramachandra
1e0e0e0a56
[VPlan] Improve style around container-inserts (NFC) (#155174)
2025-08-26 14:12:59 +01:00
Florian Hahn
7e9989390d
[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487)
Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.

Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path altogether.

PR: https://github.com/llvm/llvm-project/pull/151487
2025-08-18 20:49:42 +01:00
Florian Hahn
5892a2beec
[VPlan] Remove dead code from GetBroadCastInstr (NFCI).
All relevant places should already explicitly materialize broadcasts.
Remove dead code from VPTransformState::get
2025-08-17 21:51:14 +01:00
Florian Hahn
424258947e
[VPlan] Materialize VF and VFxUF using VPInstructions. (#152879)
Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector
skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused
computations.

PR: https://github.com/llvm/llvm-project/pull/152879
2025-08-12 14:13:13 +01:00
Luke Lau
aea82a780a
[VPlan] Remove some getCanonicalIV() uses. NFC (#152969)
A lot of the time getCanonicalIV() is used to get the canonical IV type,
e.g. to instantiate a VPTypeAnalysis or to get the LLVMContext.

However VPTypeAnalysis has a constructor that takes the VPlan directly
and there's a method on VPlan to get the LLVMContext directly, so use
those instead where possible.

This lets us remove a constructor on VPTypeAnalysis.

Also remove an unused LLVMContext argument in UnrollState whilst we're
here.
2025-08-11 18:12:05 +08:00
Florian Hahn
82d633e9ff
[VPlan] Materialize vector trip count using VPInstructions. (#151925)
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.

PR: https://github.com/llvm/llvm-project/pull/151925
2025-08-08 11:44:32 +01:00
Florian Hahn
95c32bf2d4
[VPlan] Return invalid cost if any skeleton block has invalid costs. (#151940)
We need to reject plans that contain recipes with invalid costs. LICM
can move recipes with invalid costs out of the loop region, which then
get missed by the main cost computation.

Extend the logic to check recipes for invalid cost currently only
covering the middle block to include all skeleton blocks.
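
As a minimal sketch of the idea, assuming a toy cost type and a plain list of blocks (none of these names are VPlan APIs), a plan is rejected as soon as any recipe in any checked block has an invalid cost:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a recipe cost; not the real InstructionCost class.
struct ToyCost {
  unsigned Value;
  bool Valid;
};

// Hypothetical helper: each inner vector holds the recipe costs of one block
// (loop region or skeleton block). The total is invalid if any recipe in any
// block is invalid, so recipes hoisted out of the loop are not missed.
ToyCost totalPlanCost(const std::vector<std::vector<ToyCost>> &Blocks) {
  unsigned Sum = 0;
  for (const std::vector<ToyCost> &Block : Blocks) {
    for (const ToyCost &C : Block) {
      if (!C.Valid)
        return ToyCost{0, false}; // one invalid recipe invalidates the plan
      Sum += C.Value;
    }
  }
  return ToyCost{Sum, true};
}
```

In this model, checking only some of the blocks would let an invalid recipe in an unchecked block slip through, which is the failure mode described above.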

Fixes https://github.com/llvm/llvm-project/issues/144358 
Fixes https://github.com/llvm/llvm-project/issues/151664

PR: https://github.com/llvm/llvm-project/pull/151940
2025-08-07 10:45:27 +01:00
Luke Lau
94a6cd464e
[VPlan] Expand VPWidenPointerInductionRecipe into separate recipes (#148274)
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.

There is a new VPInstruction added, WidePtrAdd, to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.

Originally this transformation was copied from the original recipe's
execute code, but it's since been simplified by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it in line
with VPWidenIntOrFpInductionRecipe.
2025-08-05 16:54:02 +08:00
Florian Hahn
559d1dff89
[VPlan] Materialize BackedgeTakenCount using VPInstructions.
Explicitly compute the backedge-taken count using VPInstruction. This is
needed to model the full skeleton in VPlan.

NFC modulo some instruction re-ordering.
2025-08-03 12:21:28 +01:00
Florian Hahn
fa3ec0c17c
[VPlan] Materialize constant vector trip counts before final opts. (#142309)
Materialize constant vector trip counts before ::execute, if the trip
count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now
this excludes when the tail is folded or scalar epilogues are required.
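
The formula above can be sketched as follows. The helper name is hypothetical and not part of LLVM; it only shows the arithmetic, assuming the tail is not folded and no scalar epilogue is required:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rounds the original trip count TC down to a multiple
// of VF * UF, i.e. (TC / (VF * UF)) * (VF * UF).
constexpr uint64_t constantVectorTripCount(uint64_t TC, uint64_t VF,
                                           uint64_t UF) {
  return (TC / (VF * UF)) * (VF * UF); // integer division drops the remainder
}
```

For example, with TC = 100, VF = 4 and UF = 2 this yields 96, leaving 4 iterations for the scalar remainder loop.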

This enables removing a number of redundant branches from the middle
block.

For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.

PR: https://github.com/llvm/llvm-project/pull/142309
2025-07-26 17:16:36 +01:00
Florian Hahn
64686c59c3
[VPlan] Connect (MemRuntime|SCEV)Check blocks as VPlan transform (NFC). (#143879)
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and
ILV::emitMemRuntimeChecks.

The new logic is currently split across
LoopVectorizationPlanner::addRuntimeChecks which collects a list of
{Condition, CheckBlock} pairs and performs some checks and emits remarks
if needed. The list of checks is then added to VPlan in
VPlanTransforms::connectCheckBlocks.

PR: https://github.com/llvm/llvm-project/pull/143879
2025-07-09 14:03:25 +02:00
Igor Kirillov
aeec2c6e48
[VPlan] Speed up VPSlotTracker by using ModuleSlotTracker (#139881)
Currently, when VPSlotTracker is initialized with a VPlan, its
assignName method calls printAsOperand on each underlying instruction.
Each such call recomputes slot numbers for the entire function, leading
to O(N × M) complexity, where M is the number of instructions in the
loop and N is the number of instructions in the function.

This results in slow debug output for large loops. For example, printing
costs of all instructions becomes O(M² × N), which is especially painful
when enabling verbose dumps.

This patch improves debugging performance by caching slot numbers using
ModuleSlotTracker. It avoids redundant recomputation and makes debug
output significantly faster.
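
The difference can be illustrated with a toy model. This is not the LLVM implementation: the types below are hypothetical, and a slot number is simply an instruction's position in its function. The `Visited` counters record how much work each approach does:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy function: a list of instruction ids.
struct ToyFunction {
  std::vector<int> Insts;
};

// Uncached lookup: rescans the function on every query, so M queries against
// N instructions cost O(N * M).
std::size_t slotUncached(const ToyFunction &F, int Inst, std::size_t &Visited) {
  for (std::size_t I = 0; I < F.Insts.size(); ++I) {
    ++Visited;
    if (F.Insts[I] == Inst)
      return I;
  }
  return F.Insts.size(); // not found
}

// Cached lookup: numbers the function once, then answers queries from a map.
class ToySlotCache {
  std::unordered_map<int, std::size_t> Slots;
  std::size_t NumInsts;

public:
  ToySlotCache(const ToyFunction &F, std::size_t &Visited)
      : NumInsts(F.Insts.size()) {
    for (std::size_t I = 0; I < NumInsts; ++I) {
      ++Visited;
      Slots.emplace(F.Insts[I], I);
    }
  }

  std::size_t slot(int Inst) const {
    auto It = Slots.find(Inst);
    return It == Slots.end() ? NumInsts : It->second;
  }
};
```

In this model the cached version pays for one pass over the function up front, while the uncached version pays for a scan on each query.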
2025-06-26 22:40:48 +01:00
Florian Hahn
aa24029319
[VPlan] Unroll VPReplicateRecipe by VF. (#142433)
Explicitly unroll VPReplicateRecipes outside replicate regions by VF,
replacing them by VF single-scalar recipes. Extracts for operands are
added as needed and the scalar results are combined to a vector using a
new BuildVector VPInstruction.
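
Conceptually, the transformation resembles the toy function below. It is a sketch on plain integers, not VPlan code, and `replicateByVF` here is a hypothetical free function: each lane's operands are extracted, the scalar operation runs once per lane, and the results are packed into a vector again:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy model: A and B are the vector operands, with one element per lane.
std::vector<int> replicateByVF(const std::vector<int> &A,
                               const std::vector<int> &B,
                               const std::function<int(int, int)> &ScalarOp) {
  assert(A.size() == B.size() && "operands must have the same number of lanes");
  std::vector<int> Result; // plays the role of the BuildVector result
  Result.reserve(A.size());
  for (std::size_t Lane = 0; Lane < A.size(); ++Lane) {
    int X = A[Lane]; // extract of the first operand for this lane
    int Y = B[Lane]; // extract of the second operand for this lane
    Result.push_back(ScalarOp(X, Y)); // one single-scalar operation per lane
  }
  return Result;
}
```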

It also adds a few folds to simplify unnecessary extracts/BuildVectors.

It also adds a BuildStructVector opcode for handling of calls that have
struct return types.

VPReplicateRecipes in replicate regions will be unrolled as a follow-up,
turning non-single-scalar VPReplicateRecipes into 'abstract', i.e.
not executable.

PR: https://github.com/llvm/llvm-project/pull/142433
2025-06-26 11:19:09 +01:00
LiqinWeng
4ac4726d00
[VPlan] Format some print forms.NFC (#144644)
2025-06-25 16:14:50 +08:00
Florian Hahn
9f7a155394
[VPlan] Update packScalarIntoVector to take and return wide value (NFC)
Make the function more flexible in preparation for new users.
2025-06-21 18:03:14 +01:00
Arthur Eubanks
dfe4d44d8d
Revert "[VPlan] Remove unnecessary DomTreeUpdater flush (NFC)." (#144758)
This reverts commit 2e337349f436d75af112c081df5ec683871cbcc8.

Causes breakages internally, will post reproducer later.
2025-06-18 11:00:13 -07:00
Luke Lau
9dd1c66e8f
[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes (#118638)
The motivation of this PR is to make #115274 easier to implement, and
should allow us to add EVL support by just passing EVL to the VF
operand.

The current difficulty with widening IVs with EVL is that
VPWidenIntOrFpInductionRecipe generates its own backedge value. Since
it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which
means we can't use the EVL since it's defined in the loop body.

The gist in this PR is to take the approach in #114305 and expand
VPWidenIntOrFpInductionRecipe into several recipes for the initial
value, phi and backedge value just before execution. I.e. this example:

```
  vector.ph:
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      WIDEN-INDUCTION %i = phi %start, %step, %vf
      ...
      EMIT branch-on-count ...
    No successors
  }
```

gets expanded to:

```
vector.ph:
  ...
  vp<%induction.start> = ...
  vp<%induction.increment> = ...

Successor(s): vector loop

<x1> vector loop: {
  vector.body:
    ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next>
    ...
    vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment>
    EMIT branch-on-count ...
  No successors
}
```

This allows us to use a value defined in the loop in the backedge value, and
also means we can just reuse the existing backedge fixups in
VPlan::execute without having to specially handle it ourselves.

After this #115274 should just become a matter of setting the VF operand
to EVL (and building the increment step in the loop body, not the
preheader).
2025-06-17 18:24:07 +01:00
Florian Hahn
790df93298
[VPlan] Mark VPFirstOrderRecurrencePHI as not reading/writing memory.
First-order recurrence phis don't have side-effects and don't read or
write memory. Mark them as such.
2025-06-15 22:00:47 +01:00
David Sherwood
541e5118ce
[LV] Use getFixedValue instead of getKnownMinValue when appropriate (#143526)
There are many places in VPlan and LoopVectorize where we use
getKnownMinValue to discover the number of elements in a vector. Where
we expect the vector to have a fixed length, I have used the stronger
getFixedValue call. I believe this is clearer and adds extra protection
in the form of an assert in getFixedValue that the vector is not
scalable.
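
The distinction can be sketched with a toy type. This is a stand-in for illustration, not the real element-count class: both accessors return the same number for a fixed-length vector, but only the stricter one rejects scalable vectors:

```cpp
#include <cassert>

// Toy stand-in for an element count.
struct ToyElementCount {
  unsigned MinVal; // known minimum number of elements
  bool Scalable;   // true if the runtime count is a multiple of MinVal

  // Valid for fixed and scalable vectors; for scalable vectors this is only
  // a lower bound on the runtime element count.
  unsigned getKnownMinValue() const { return MinVal; }

  // Only meaningful for fixed-length vectors; asserts otherwise.
  unsigned getFixedValue() const {
    assert(!Scalable && "fixed element count requested for a scalable vector");
    return MinVal;
  }
};
```

Calling `getFixedValue()` on a scalable count trips the assert in a debug build, which is the extra protection the message refers to.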

While looking at VPFirstOrderRecurrencePHIRecipe::computeCost I also
took the liberty of simplifying the code.

In theory I believe this patch should be NFC, but I'm reluctant to add
that to the title in case we're just missing tests for some of the VPlan
changes. I built and ran the LLVM test suite when targeting neoverse-v1
and it seemed ok.
2025-06-13 11:43:50 +01:00
Stephen Tozer
a08a831515
[DLCov][NFC] Propagate annotated DebugLocs through transformations (#138047)
Part of the coverage-tracking feature, following #107279.

In order for DebugLoc coverage testing to work, we firstly have to set
annotations for intentionally-empty DebugLocs, and secondly we have to
ensure that we do not drop these annotations as we propagate DebugLocs
throughout compilation. As the annotations exist as part of the DebugLoc
class, and not the underlying DILocation, they will not survive a
DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies
a number of places in the compiler to propagate DebugLocs directly
rather than via the underlying DILocation. This has no effect on the
output of normal builds; it only ensures that during coverage builds, we
do not incorrectly drop annotations and therefore create false
positives.

The bulk of these changes are in replacing
DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in
changing the IRBuilder to store a DebugLoc directly rather than storing
DILocations in its general Metadata array. We also use a new function,
`DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair
(valid location > annotated > empty), preferring the current DebugLoc on
a tie - this encapsulates the existing behaviour at a few sites where we
_may_ assign a DebugLoc to an existing instruction, while extending the
logic to handle annotation DebugLocs at the same time.
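
The selection rule can be sketched with a toy type. The names are hypothetical and this is not the real DebugLoc class; it only encodes the ordering valid location > annotated > empty, keeping the current location on a tie:

```cpp
#include <cassert>

// Ranking used by the toy model; a higher value is a "better" location.
enum class ToyLocKind { Empty = 0, Annotated = 1, Valid = 2 };

struct ToyDebugLoc {
  ToyLocKind Kind;
  int Line; // only meaningful when Kind == ToyLocKind::Valid

  // Keep the current location unless Other ranks strictly higher.
  ToyDebugLoc orElse(const ToyDebugLoc &Other) const {
    return Other.Kind > Kind ? Other : *this;
  }
};
```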
2025-06-12 14:06:27 +01:00
Florian Hahn
2e337349f4
[VPlan] Remove unnecessary DomTreeUpdater flush (NFC).
The current version does not need the explicit flush at this point.
2025-06-05 08:17:42 +01:00
Florian Hahn
2eab83f618
[VPlan] Remove CanonicalIV when dissolving loop regions (NFC). (#142372)
Directly replace the canonical IV when we dissolve the containing
region. That ensures that it won't get removed before the region gets
removed, which would result in an invalid region.

This removes the current ordering constraint between
convertToConcreteRecipes and dissolving regions.

PR: https://github.com/llvm/llvm-project/pull/142372
2025-06-03 10:05:28 +01:00
Florian Hahn
dcef154b5c
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506)
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.

This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.

It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the 
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.

PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-24 19:17:16 +01:00
Florian Hahn
672e9263cb
Reapply "[VPlan] Support cloning initial VPlan (NFC)."
This reverts commit 204252e2df80876702616518a5154dccacf3ebac.

Recommit with a fix for the leak in a unit test.
2025-05-23 21:22:31 +01:00
Florian Hahn
204252e2df
Revert "[VPlan] Support cloning initial VPlan (NFC)."
This reverts commit 5fa985e751c8f890fff31e190473aeeb6f7a9fc5.

Revert as this seems to introduce a call to a pure virtual function on a
few configs, e.g.
    https://lab.llvm.org/buildbot/#/builders/169/builds/11535
2025-05-18 22:03:00 +01:00
Florian Hahn
5fa985e751
[VPlan] Support cloning initial VPlan (NFC).
Support cloning VPlans as they are created by the initial buildVPlan,
i.e. scalar header not yet connected and no trip-count set. This is not
used yet but will be in follow-up changes.

Also add a unit test for cloning & printing.
2025-05-18 19:37:17 +01:00
Florian Hahn
ba93685ea2
[VPlan] Also use original parent loop for exit VPBBs.
When vectorizing a loop with early exits that is nested within another
one, one of the loop exits may be outside both loops, so adding
it to the parent loop is incorrect. Also use the original parent loop
for exit blocks.
2025-05-16 21:12:39 +01:00
Florian Hahn
04fde85057
[VPlan] Rename isUniform(AfterVectorization) to isSingleScalar (NFC). (#140134)
Update the naming in VPReplicateRecipe and vputils to the more accurate
isSingleScalar, as the functions check for cases where only a single
scalar is needed, either because it produces the same value for all
lanes or has only their first lane used.

Discussed in https://github.com/llvm/llvm-project/pull/139150.

PR: https://github.com/llvm/llvm-project/pull/140134
2025-05-16 16:38:39 +01:00
Florian Hahn
e854c381c6
[VPlan] Manage noalias/alias_scope metadata in VPlan. (#136450)
Use VPIRMetadata added in
https://github.com/llvm/llvm-project/pull/135272
to also manage no-alias metadata added by versioning.

Note that this means we have to build the no-alias metadata up-front
once. If it is not used, it will be discarded automatically.

This also fixes a case where incorrect metadata was added to wide
loads/stores that got converted from an interleave group.

Compile-time impact is neutral:

https://llvm-compile-time-tracker.com/compare.php?from=38bf1af41c5425a552a53feb13c71d82873f1c18&to=2fd7844cfdf5ec0f1c2ce0b9b3ae0763245b6922&stat=instructions:u
2025-05-09 11:19:12 +01:00
Florian Hahn
75532b21b1
[VPlan] Replace getPreheaderBBFor with getCFGPredecessor. (NFC)
Replace existing uses of getPreheaderBBFor with the newly added more
general getCFGPredecessor.
2025-05-05 21:47:19 +01:00
Florian Hahn
6e20519717
[VPlan] Add VPPhiAccessors to provide interface for phi recipes (NFC) (#129388)
Add a VPPhiAccessors class to provide interfaces to access incoming
values and blocks.

The first user is VPWidenPhiRecipe, with the other phi-like recipes
following soon.

This will also be used to verify def-use chains where users are phi-like
recipes, simplifying https://github.com/llvm/llvm-project/pull/124838.

PR: https://github.com/llvm/llvm-project/pull/129388
2025-05-04 13:47:42 +01:00
Kazu Hirata
6ab7cb7899
[Transforms] Remove unused local variables (NFC) (#138442)
2025-05-04 00:35:22 -07:00
Florian Hahn
daf32369dd
[VPlan] Move scalarizeInstruction out of ILV (NFC).
15bb1db4a9830 removed the last dependency on ILV, move the code out of
ILV in preparation of consolidating in VPlanRecipes.cpp.
2025-05-03 20:52:03 +01:00
Florian Hahn
5d136f90a9
[VPlan] Manage instruction metadata in VPlan. (#135272)
Add a new helper to manage IR metadata that can be propagated to generated
instructions for recipes.

This helps to remove a number of remaining uses of getUnderlyingInstr
during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/135272
2025-04-24 11:57:19 +01:00
Florian Hahn
cab75384af
[VPlan] Only generate exit blocks for unique exit blocks.
Make sure we don't generate unnecessary blocks.
2025-04-18 19:04:22 +01:00
Sam Tebbs
b658a2e74a
[LV] Reduce register usage for scaled reductions (#133090)
This PR accounts for scaled reductions in `calculateRegisterUsage` to
reflect the fact that the number of lanes in their output is smaller
than the VF.

Depends on https://github.com/llvm/llvm-project/pull/126437
2025-04-11 14:31:08 +01:00
Florian Hahn
ad9f15ab53
[VPlan] Introduce and use VPValue::replaceUsesOfWith (NFC).
Adds an API matching LLVM's IR Value, which simplifies some code a
bit.
2025-04-07 22:07:52 +01:00
Florian Hahn
7aedebac8c
[VPlan] Populate ExitBlocks when cloning VPlan (NFC).
Update VPlan::duplicate to add cloned exit blocks to ExitBlocks.

Currently there are no uses of the exit blocks after cloning so this is
NFC at the moment.
2025-04-07 21:17:42 +01:00