2618 Commits

Author SHA1 Message Date
Florian Hahn
662bede01e
[LV] Handle known-false mem runtime checks in GeneratedRTChecks.
Handle mem checks known to be false in getMemRuntimeChecks the same way
as SCEV checks known to be false in getSCEVChecks. This ensures such
redundant check blocks are not added in the first place.
2025-07-26 15:39:21 +01:00
Florian Hahn
9a201531ed
[LV] Bail out early if runtime checks are known to fail.
There are a number of cases for which SCEV may not be able to prove a
predicate will always be true/false, which may be simplified to a
constant during expansion (see discussion in
https://github.com/llvm/llvm-project/pull/131538).

Bail out early if runtime checks are known to always fail, as the
vector loop generated later will never execute.
2025-07-26 09:26:15 +01:00
Alex Bradbury
5294793bdc Revert "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)"
This reverts commit ee3a7714b7a69ac9aae4b79f4c67adc38bc6876b.

Causes an assertion for the zvl1024b RISC-V build configuration. See
comment with reproducer at
<https://github.com/llvm/llvm-project/pull/149981#issuecomment-3118482801>
2025-07-25 16:14:10 +01:00
Mel Chen
ee3a7714b7
[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)
Now that support for masked loads/stores of interleave groups has
landed, we can enable the loop vectorizer to generate masked interleave
access where applicable.

This improves vectorization in several ways:
* Internal predication support: This enables interleave group
vectorization for loops with internal control flow predication, provided
all members of the group share the same predicate. Gaps in interleave
groups are still not efficiently handled by masking, so masking for gaps
remains disabled for now.
* Tail folding: This allows tail folding of loops with interleave groups
by using masking. Without this, vectorized loops with interleaves would
fall back to using separate gather/scatter accesses, which can be
significantly less efficient.
* Scalable vector support: Currently, only scalable vector types are
supported for masked interleave lowering. Fixed-length vector support
will be enabled in the future.

As interleave access is not yet supported with tail folding by EVL, that
functionality is temporarily disabled. We are going to create another
patch to support it.

Co-authored-by: Philip Reames <preames@rivosinc.com>

2025-07-25 17:53:08 +08:00
Florian Hahn
77b1b956da
[LV] Also clamp MaxVF by trip count when maximizing vector bandwidth. (#149794)
Also clamp the max VF when maximizing vector bandwidth by the maximum
trip count. Otherwise we may end up choosing a VF for which the vector
loop never executes.

PR: https://github.com/llvm/llvm-project/pull/149794
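
A minimal sketch of the clamping idea, not the LoopVectorize code. The
names clampVFSketch and bitFloor are hypothetical, a trip count of 0 is
assumed to mean "unknown", and tail folding is not modelled:

```cpp
#include <cstdint>

// Largest power of two that is <= X. X must be non-zero.
inline uint32_t bitFloor(uint32_t X) {
  uint32_t P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical helper: clamp a candidate maximum VF so that a vector
// iteration can execute at least once. MaxTripCount == 0 means the
// maximum trip count is unknown, in which case nothing is clamped.
inline uint32_t clampVFSketch(uint32_t MaxVF, uint32_t MaxTripCount) {
  if (MaxTripCount == 0 || MaxTripCount >= MaxVF)
    return MaxVF;
  // Fewer iterations than lanes: fall back to the largest power-of-two
  // VF that still fits into the trip count.
  return bitFloor(MaxTripCount);
}
```

With a maximum trip count of 5, a candidate VF of 16 would be clamped
to 4 in this sketch, so the vector loop body can still run.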
2025-07-23 10:19:56 +01:00
Luke Lau
20c52e4231 Reapply "[RISCV][LoopVectorize] Use DataWithEVL as the preferred tail folding style (#148686)"
This reverts commit 25e97fc420f8ecc43fbabadfe9767b4163e6ee36.

The original commit was reverted due to a crash in llvm-test-suite. The
crash stemmed from a multiply reduction, which isn't supported for
scalable VFs on RISC-V. But for EVL tail folding we only support
scalable VFs, so when -force-tail-folding-style=data-with-evl is
specified we check to see if there's a scalable VF, and fall back to
data-without-lane-mask if there isn't.

This is done in setTailFoldingStyles, but previously we were only
checking if the forced tail folding style was legal, not the style
returned by TTI.

This version fixes this by checking the actual computed tail folding
style and not just the forced one, and adds a test for the crash in
llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll
2025-07-22 23:52:02 +08:00
Florian Hahn
37f0f10a85
[LV] Don't vectorize epilogue with scalable VF if no iterations remain. (#149789)
Currently we may try to vectorize the epilogue with a scalable VF, even
if there are no remaining iterations after the main vector loop with a
fixed VF.

Update selectEpilogueVectorizationFactor to always compute the number of
remaining iterations and exit early if no epilogue iterations remain.

Fixes https://github.com/llvm/llvm-project/issues/149726

PR: https://github.com/llvm/llvm-project/pull/149789
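
A rough sketch of the early exit, assuming a compile-time known trip
count and no required scalar epilogue. pickEpilogueVF is a hypothetical
name, and the extra "candidate does not fit" check is an assumption of
this sketch:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: decide whether an epilogue VF is worth using.
// Returns std::nullopt when the epilogue should not be vectorized.
inline std::optional<uint32_t> pickEpilogueVF(uint64_t TripCount,
                                              uint32_t MainVF,
                                              uint32_t MainUF,
                                              uint32_t CandidateVF) {
  // Iterations left over after the main vector loop has finished.
  uint64_t Remaining = TripCount % (uint64_t(MainVF) * MainUF);
  if (Remaining == 0)
    return std::nullopt; // Nothing left for any epilogue loop.
  if (CandidateVF > Remaining)
    return std::nullopt; // Assumption: candidate would never execute.
  return CandidateVF;
}
```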
2025-07-22 13:13:31 +01:00
Florian Hahn
3813567e08
[VPlan] Clarify transform name to handleMaxMinNumReductions. (NFC)
Clarify name as suggested in https://github.com/llvm/llvm-project/pull/149736,
as only FMaxNum and FMinNum are handled.
2025-07-21 07:14:46 +01:00
Florian Hahn
004c67ea25
[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t. signaling NaNs. Signed zeros
are already handled consistently by maxnum/minnum.

If any input is NaN,
 * exit the vector loop,
 * compute the reduction result up to the vector iteration that contained
   NaN inputs and
 * resume in the scalar loop


New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.

PR: https://github.com/llvm/llvm-project/pull/148239
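
The control flow above can be sketched in plain C++. This is only an
emulation under stated assumptions: std::fmax stands in for maxnum, a
fixed VF of 4 is used, and scalarMaxNum/vectorizedMaxNum are
hypothetical names, not functions from the patch:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // Assumed vectorization factor.

// Scalar reference loop, starting at index Begin with accumulator Start.
inline float scalarMaxNum(const std::vector<float> &X, float Start,
                          std::size_t Begin = 0) {
  float Acc = Start;
  for (std::size_t I = Begin; I < X.size(); ++I)
    Acc = std::fmax(Acc, X[I]);
  return Acc;
}

// "Vector" loop with the extra NaN check described above.
inline float vectorizedMaxNum(const std::vector<float> &X, float Start) {
  float Lanes[VF];
  for (float &L : Lanes)
    L = Start;

  std::size_t I = 0;
  for (; I + VF <= X.size(); I += VF) {
    // Extra check: does this vector iteration contain a NaN input?
    bool AnyNaN = false;
    for (std::size_t L = 0; L < VF; ++L)
      AnyNaN |= std::isnan(X[I + L]);
    if (AnyNaN)
      break; // Exit the vector loop before consuming this iteration.
    for (std::size_t L = 0; L < VF; ++L)
      Lanes[L] = std::fmax(Lanes[L], X[I + L]);
  }

  // Reduce the lanes to the result of the iterations completed so far.
  float Acc = Lanes[0];
  for (std::size_t L = 1; L < VF; ++L)
    Acc = std::fmax(Acc, Lanes[L]);

  // Resume in the scalar loop, which also handles the remainder.
  return scalarMaxNum(X, Acc, I);
}
```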
2025-07-18 21:58:19 +01:00
Nicholas Guy
534b9cdddd
[LoopVectorizer][NFC] Update comment regarding VF register pressure. (#149478) 2025-07-18 09:55:00 +01:00
Nicholas Guy
20fc297ce3
[LoopVectorizer] Only check register pressure for VFs that have been enabled via maxBandwidth (#149056)
Currently if MaxBandwidth is enabled, the register pressure is checked
for each VF. This changes that to only perform said check if the VF
would not have otherwise been considered by the LoopVectorizer if
maxBandwidth was not enabled.

Theoretically this allows for higher VFs to be considered than would
otherwise be deemed "safe" (from a regpressure perspective), but more
concretely this reduces the amount of work done at compile-time when
maxBandwidth is enabled.
2025-07-18 09:21:20 +01:00
Florian Hahn
afe8150780
[VPlan] Simplify exituser handling by generating all extracts first (NFCI)
Simplify the handling of exit users by generating all extracts first
(safe option), and have FOR handling optimize the extracts, similar to
already done for reductions and inductions.

NFC modulo first-order recurrence extract order in middle block.
2025-07-16 08:14:12 +01:00
David Sherwood
c363a3f9c8
[LV] Ensure getScaledReductions only matches extends inside the loop (#148264)
In getScaledReductions for the case where we try to match a partial
reduction of the form:

%phi = phi i32 ...
...
%add = add i32 %phi, %zext

where

%zext = zext i8 %some_val to i32

we should ensure that %zext is actually inside the loop.

Fixes https://github.com/llvm/llvm-project/issues/148260
2025-07-15 09:54:58 +01:00
Florian Hahn
f4c7cc26b6
[LV] Use more precise isPredicatedInst in legacy CCH (NFC).
Legal::isMaskRequired may be overly conservative and also return true
when no mask is actually required.

Use isPredicatedInst from the cost model instead, which fixes a
cost-model divergence between legacy and VPlan cost model where the
legacy cost model incorrectly assumed some loads were predicated.

Fixes https://github.com/llvm/llvm-project/issues/148431.
2025-07-13 19:55:34 +01:00
Anna Thomas
fe403584c4 [LV] Add a statistic for early exit vectorization
Add statistic LoopsEarlyExitVectorized

PR: https://github.com/llvm/llvm-project/pull/145730
2025-07-11 09:10:26 -04:00
David Sherwood
74e3dfe389
[LV] Disable forcing interleaving for uncountable early exit loops (#147993)
Interleaving does not currently work properly when vectorising loops
with uncountable early exits. Interleaving is already disabled for
normal vectorisation and for the pragma/hint - this patch also disables
it when using -force-vector-interleave.
2025-07-11 09:46:21 +01:00
Florian Hahn
64686c59c3
[VPlan] Connect (MemRuntime|SCEV)Check blocks as VPlan transform (NFC). (#143879)
Connect SCEV and memory runtime check block directly in VPlan as
VPIRBasicBlocks, removing ILV::emitSCEVChecks and
ILV::emitMemRuntimeChecks.

The new logic is currently split across
LoopVectorizationPlanner::addRuntimeChecks which collects a list of
{Condition, CheckBlock} pairs and performs some checks and emits remarks
if needed. The list of checks is then added to VPlan in
VPlanTransforms::connectCheckBlocks.

PR: https://github.com/llvm/llvm-project/pull/143879
2025-07-09 14:03:25 +02:00
Ramkumar Ramachandra
f1451e9f07
[LV] Improve code using drop_{begin,end} (NFC) (#147504) 2025-07-08 13:42:10 +01:00
Florian Hahn
6a9a16da7a
[VPlan] Replace RdxDesc with RecurKind in VPReductionPHIRecipe (NFC). (#142322)
Replace RdxDesc with RecurKind in VPReductionPHIRecipe, as
all VPlan analyses and codegen only require the recurrence kind. This
enables creating new VPReductionPHIRecipe directly in LV, without
needing to construct a whole RecurrenceDescriptor object.

Depends on
https://github.com/llvm/llvm-project/pull/141860
https://github.com/llvm/llvm-project/pull/141932
https://github.com/llvm/llvm-project/pull/142290
https://github.com/llvm/llvm-project/pull/142291

PR: https://github.com/llvm/llvm-project/pull/142322
2025-07-06 21:40:42 +01:00
Florian Hahn
c35fbb5460
[VPlan] Use VPReductionPHIRecipe::isOrdered instead of CM (NFC).
Directly retrieve isOrdered from the reduction phi recipe instead of
going through the legacy cost model.

Split off as suggested in
https://github.com/llvm/llvm-project/pull/142322.
2025-07-06 20:49:34 +01:00
Florian Hahn
c5fff132d0
[LV] Add LVL::getRecurrenceDescriptor (NFC).
Split off adding a helper to retrieve the RecurrenceDescriptor, as
suggested in https://github.com/llvm/llvm-project/pull/142322.
2025-07-06 20:31:38 +01:00
Florian Hahn
cf06047231
[LV] Remove AddedAnyChecks, check directly instead (NFC).
As suggested in https://github.com/llvm/llvm-project/pull/143879, remove
AddedAnyChecks member and directly check if there are any relevant
runtime check blocks.
2025-07-06 20:19:46 +01:00
David Sherwood
f575b18fdc
[LV] Add support for partial reductions without a binary op (#133922)
Consider IR such as this:

for.body:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
  %accum = phi i32 [ 0, %entry ], [ %add, %for.body ]
  %gep.a = getelementptr i8, ptr %a, i64 %iv
  %load.a = load i8, ptr %gep.a, align 1
  %ext.a = zext i8 %load.a to i32
  %add = add i32 %ext.a, %accum
  %iv.next = add i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, 1025
  br i1 %exitcond.not, label %for.exit, label %for.body

Conceptually we can vectorise this using partial reductions too,
although the current loop vectoriser implementation requires the
accumulation of a multiply. For AArch64 this is easily done with
a udot or sdot with an identity operand, i.e. a vector of (i16 1).

In order to do this I had to teach getScaledReductions that the
accumulated value may come from a unary op, hence there is only
one extension to consider. Similarly, I updated the vplan and
AArch64 TTI cost model to understand the possible unary op.
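
As a rough illustration of the identity-operand trick, the following
emulates a 4-way unsigned dot product in scalar C++. udot4 and
sumViaDot are hypothetical names, and the all-ones multiplier is
modelled with 8-bit elements as an assumption of this sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates one 4-way unsigned dot-product step:
//   Acc += A[0]*B[0] + A[1]*B[1] + A[2]*B[2] + A[3]*B[3]
inline uint32_t udot4(uint32_t Acc, const uint8_t *A, const uint8_t *B) {
  for (int K = 0; K < 4; ++K)
    Acc += uint32_t(A[K]) * uint32_t(B[K]);
  return Acc;
}

// Sums zero-extended bytes by multiplying with an all-ones operand,
// so the dot product degenerates into a plain widening add.
inline uint32_t sumViaDot(const std::vector<uint8_t> &Data) {
  static const uint8_t Ones[4] = {1, 1, 1, 1};
  uint32_t Acc = 0;
  std::size_t I = 0;
  for (; I + 4 <= Data.size(); I += 4)
    Acc = udot4(Acc, &Data[I], Ones);
  for (; I < Data.size(); ++I) // Scalar remainder.
    Acc += Data[I];
  return Acc;
}
```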

---------

Co-authored-by: Matt Devereau <matthew.devereau@arm.com>
2025-07-02 13:05:51 +01:00
Mel Chen
bc8dad1c7e
[VPlan] Emit VPVectorEndPointerRecipe for reverse interleave pointer adjustment (#144864)
A reverse interleave access is essentially composed of multiple
load/store operations with same negative stride, and their addresses are
based on the last lane address of member 0 in the interleaved group.

Currently, we already have VPVectorEndPointerRecipe for computing the
last lane address of consecutive reverse memory accesses. This patch
extends VPVectorEndPointerRecipe to support constant stride and extracts
the reverse interleave group address adjustment from
VPInterleaveRecipe::execute, replacing it with a
VPVectorEndPointerRecipe.
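
The address adjustment can be illustrated with plain indices. This is a
sketch only: loadReverseMember is a hypothetical helper, indices stand
in for pointers, and the interleave factor plays the role of the
constant stride:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: load one member of a reverse interleave group.
// Member0Index is the index accessed by member 0 in the first scalar
// iteration; later iterations move down by Factor elements each.
inline std::vector<int> loadReverseMember(const std::vector<int> &Mem,
                                          int64_t Member0Index, int64_t VF,
                                          int64_t Factor, int64_t Member) {
  // Last-lane address of member 0: where the wide access begins.
  int64_t Start = Member0Index - (VF - 1) * Factor;
  std::vector<int> Result(static_cast<std::size_t>(VF));
  for (int64_t Lane = 0; Lane < VF; ++Lane) {
    // De-interleave: pick this member's element out of each tuple.
    int V = Mem[static_cast<std::size_t>(Start + Lane * Factor + Member)];
    // Reverse: memory order is the opposite of iteration order.
    Result[static_cast<std::size_t>(VF - 1 - Lane)] = V;
  }
  return Result;
}
```

For a factor of 2, a VF of 4 and member 0 starting at index 12, the
wide access in this sketch begins at index 6 and covers 8 elements.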

The final goal is to support interleaved accesses with EVL tail folding.
Given that VPInterleaveRecipe is large and tightly coupled — combining
both load and store, and embedding operations like reverse pointer
adjustment (GEP), widen load/store, deinterleave/interleave, and reversal
— breaking it down into smaller, dedicated recipes may allow
VPlanTransforms::tryAddExplicitVectorLength to lower them into EVL-aware
form more effectively.

One foreseeable challenge is that
VPlanTransforms::convertToConcreteRecipes currently runs after
tryAddExplicitVectorLength, so decomposing VPInterleaveRecipe will
likely need to happen earlier in the pipeline to be effective.
2025-07-02 18:16:02 +08:00
David Sherwood
9b13dfdfbc
[LV] Use vscale for tuning to improve branch weight estimates (#144733)
In addBranchWeightToMiddleTerminator we attempt to add branch weights to
the middle block terminator. We pessimistically assume vscale=1, whereas
we can improve the estimate by using the value of vscale used for
tuning.
2025-07-01 13:23:38 +01:00
Florian Hahn
59a7185dd9
[VPlan] Truncate/Extend ComputeReductionResult at construction (NFC). (#141860)
Instead of looking up the narrower reduction type via getRecurrenceType,
we can generate the needed extend directly at construction and re-use the
truncated value from the loop.

PR: https://github.com/llvm/llvm-project/pull/141860
2025-06-30 22:39:17 +01:00
Florian Hahn
20fbbd7675
[LV] Add support for cmp reductions with decreasing IVs. (#140451)
Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing a SMin
reduction. It uses signed max as the sentinel value.

PR: https://github.com/llvm/llvm-project/pull/140451
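
A small sketch of the idea, assuming the induction never takes the
sentinel value itself. findFirstScalar and findFirstSMin are
hypothetical names used only for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar form: select(icmp, iv, rdx) with a decreasing induction.
inline int64_t findFirstScalar(const std::vector<int> &A, int T,
                               int64_t Start) {
  int64_t Rdx = Start;
  for (int64_t I = int64_t(A.size()) - 1; I >= 0; --I)
    Rdx = (A[std::size_t(I)] > T) ? I : Rdx;
  return Rdx;
}

// Reduction form: signed-min over the selected IV values, using signed
// max as the sentinel for "condition never held".
inline int64_t findFirstSMin(const std::vector<int> &A, int T,
                             int64_t Start) {
  const int64_t Sentinel = std::numeric_limits<int64_t>::max();
  int64_t Min = Sentinel;
  // The order of evaluation no longer matters, which is what makes the
  // reduction vectorizable.
  for (std::size_t I = 0; I < A.size(); ++I)
    Min = std::min(Min, A[I] > T ? int64_t(I) : Sentinel);
  // Map the sentinel back to the original start value.
  return Min == Sentinel ? Start : Min;
}
```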
2025-06-29 11:17:03 +01:00
Ramkumar Ramachandra
613804cca9
[LV] Improve code using [[maybe_unused]] (NFC) (#137138) 2025-06-27 10:58:17 +01:00
David Sherwood
bf2b14acf3
[LV] Enable auto-vectorisation of loops with uncountable exits (#133099)
Until now the feature to enable vectorisation of some early exit
loops with uncountable exits was controlled by a flag that was off by
default. Now that we have efficient code generation for
vectorising such loops (see PR #130766) and we still have some
time before the next LLVM release, it seems like a good point
to enable the feature by default. If any issues arise post-commit
it can be easily reverted.

Using this patch I built and ran the LLVM test suite successfully,
which on neoverse-v1 led to the vectorisation of 114 additional
early exit loops. I also built and ran SPEC2017 successfully for
both neoverse-v1 and neoverse-v2.
2025-06-27 10:39:33 +01:00
David Sherwood
6f43754e9c
[LV] Disable interleaving via hints for uncountable early exit loops (#145877)
Currently, if the user enables interleaving during vectorisation of
uncountable early exit loops via the interleave_count pragma and the
enable-early-exit-vectorization option, the loop will be miscompiled.
There is ongoing work to fix this, but for now it seems safer to ignore
the hint until it is supported.

---------

Co-authored-by: Paul Walker <paul.walker@arm.com>
2025-06-27 09:09:55 +01:00
Florian Hahn
786ccb2c0e
[LV] Directly check if memory or SCEV check blocks are used (NFCI).
Slightly simplify the logic to retrieve check blocks in
GeneratedRTChecks, to prepare for additional refactoring.
2025-06-27 07:24:32 +01:00
Florian Hahn
aa24029319
[VPlan] Unroll VPReplicateRecipe by VF. (#142433)
Explicitly unroll VPReplicateRecipes outside replicate regions by VF,
replacing them by VF single-scalar recipes. Extracts for operands are
added as needed and the scalar results are combined to a vector using a
new BuildVector VPInstruction.

It also adds a few folds to simplify unnecessary extracts/BuildVectors.

It also adds a BuildStructVector opcode for handling of calls that have
struct return types.

VPReplicateRecipes in replicate regions will be unrolled as a follow-up,
turning non-single-scalar VPReplicateRecipes into 'abstract', i.e.
not executable, recipes.

PR: https://github.com/llvm/llvm-project/pull/142433
2025-06-26 11:19:09 +01:00
Florian Hahn
830b2c842e
[LV] Replace redundant ExtractLastElement of reduction result (NFC).
Replace redundant ExtractLastElement VPInstructions early. This is NFC,
as the VPInstruction computing the final result is vector-to-scalar,
producing a single scalar already. This enables follow-up changes to
model more aspects of reductions directly in VPlan.
2025-06-24 21:48:58 +01:00
David Green
77941eba7f
[CostModel] Add a DstTy to getShuffleCost (#141634)
A shuffle will take two input vectors and a mask, to produce a new
vector of size <MaskElts x SrcEltTy>. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations notably in the SLP
vectorizer but also in the generic cost routines where assumptions about
how vectors will be legalized are built into the generic cost routines -
for example whether they will widen or promote, with the cost modelling
assuming they will widen but the default lowering to promote for integer
vectors.

This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise it tries to be fairly non-functional, keeping the
SrcTy used as the primary type used in shuffle cost routines, only using
DstTy where it was in the past (for InsertSubVector for example).

Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base to build more
shuffles that alter size.
2025-06-21 12:29:29 +01:00
Philip Reames
c103bbc836
[LV] Consider whether vscale is a known power of two for iteration check (#144963)
Going mostly by the comment here - but it says "vscale is not
necessarily a power-of-2". Both in-tree targets have vscale as a power
of two, and we have an existing TTI hook for that.
2025-06-20 11:37:27 -07:00
Luke Lau
a2b8a93ff9
[VPlan] Pass NumUnrolledElems as operand to VPWidenPointerInductionRecipe. NFC (#119859)
Similarly to VPWidenIntOrFpInductionRecipe, if we want to support it in
EVL tail folding we need to increment the induction by EVL steps instead
of VF*UF steps, but currently this is hard-wired in
VPWidenPointerInductionRecipe.

This adds an operand for the number of elements unrolled and plumbs it
through, so that we can swap it out in
VPlanTransforms::tryAddExplicitVectorLength further down the line.
2025-06-20 15:46:52 +01:00
Ramkumar Ramachandra
c8c4bd1ebc
[LV] Strengthen loop-invariance checks in isPredicatedInst (#140744)
Check loop-invariance against SCEV as well.
2025-06-20 14:01:48 +01:00
Philip Reames
d8e6d74c69
[LV] Consider EVL legality for TTI tail folding preference (#144790) 2025-06-19 16:15:23 -07:00
Philip Reames
b96370131d
[TTI] Plumb CostKind through getPartialReductionCost (#144953)
Purely for the sake of being idiomatic with other TTI costing routines,
no direct motivation beyond that.
2025-06-19 15:29:56 -07:00
David Sherwood
af51c9d9df
[LV][NFC] Add branch weight test showing incorrect behaviour (#144682)
This patch adds a test that shows incorrect branch weights being set in
function

EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck
2025-06-19 10:49:40 +01:00
Florian Hahn
071a6feabd
[TTI] Remove PPC hasActiveVectorLength impl, simplify interface (NFC). (#142310)
PPCTTIImpl defines hasActiveVectorLength and also getVPMemoryOpCost, but
they appear unused (i.e. no changes to tests).

Remove them, as they complicate the interface for hasActiveVectorLength.
This simplifies the only use in LV as now no placeholder values need to
be passed.

PR: https://github.com/llvm/llvm-project/pull/142310
2025-06-18 19:02:17 +01:00
Paul Walker
d3441f7348
[LV] Change getSmallBestKnownTC to return an ElementCount (NFC) (#141793)
This is prep work for enabling better UF calculations when using vscale
based VFs to vectorise loops with vscale based tripcounts.

NOTE: NFC because all uses remain fixed-length until a following PR
changes LoopVectorize's version of getSmallConstantTripCount().
2025-06-18 11:45:20 +01:00
Mel Chen
ba40a7bc2e
[LoopVectorize] Vectorize fixed-order recurrence with vscale x 1. (#142772)
When the fixed-order recurrence phi is live-out from the loop, the
vectorizer uses VPInstruction::ExtractPenultimateElement to extract the
penultimate element from the recurrence vector. However, this is not
feasible when the VF is vscale x 1, since vscale could be 1, making the
vector contain only one element.

This patch changes the behavior for vscale x 1 by extracting the last
element from the vector produced by splicing the recurrence phi and the
previous value. This ensures we can still determine the correct live-out
value of the recurrence phi.
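
The difference between the two extractions can be sketched with
std::vector standing in for vector registers. spliceRight1 and
recurrenceLiveOut are hypothetical names, and both inputs are assumed
to be non-empty and of equal length:

```cpp
#include <vector>

// Emulates a splice that shifts in the last element of Phi in front of
// Cur and drops Cur's last element: [Phi.back(), Cur[0], ..., Cur[n-2]].
inline std::vector<int> spliceRight1(const std::vector<int> &Phi,
                                     const std::vector<int> &Cur) {
  std::vector<int> R;
  R.reserve(Cur.size());
  R.push_back(Phi.back());
  R.insert(R.end(), Cur.begin(), Cur.end() - 1);
  return R;
}

// Live-out of the recurrence phi: the last element of the spliced
// vector. For n >= 2 this equals the penultimate element of Cur; for
// n == 1 it is the last element of Phi, which a penultimate-element
// extract on Cur could not produce.
inline int recurrenceLiveOut(const std::vector<int> &Phi,
                             const std::vector<int> &Cur) {
  return spliceRight1(Phi, Cur).back();
}
```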
2025-06-18 16:03:20 +08:00
Luke Lau
9dd1c66e8f
[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes (#118638)
The motivation of this PR is to make #115274 easier to implement, and
should allow us to add EVL support by just passing EVL to the VF
operand.

The current difficulty with widening IVs with EVL is that
VPWidenIntOrFpInductionRecipe generates its own backedge value. Since
it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which
means we can't use the EVL since it's defined in the loop body.

The gist in this PR is to take the approach in #114305 and expand
VPWidenIntOrFpInductionRecipe into several recipes for the initial
value, phi and backedge value just before execution. I.e. this example:

```
  vector.ph:
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      WIDEN-INDUCTION %i = phi %start, %step, %vf
      ...
      EMIT branch-on-count ...
    No successors
  }
```

gets expanded to:

```
vector.ph:
  ...
  vp<%induction.start> = ...
  vp<%induction.increment> = ...

Successor(s): vector loop

<x1> vector loop: {
  vector.body:
    ir<%i> = WIDEN-PHI vp<%induction.start>, vp<%vec.ind.next>
    ...
    vp<%vec.ind.next> = add ir<%i>, vp<%induction.increment>
    EMIT branch-on-count ...
  No successors
}
```

This allows us to use a value defined in the loop in the backedge value, and
also means we can just reuse the existing backedge fixups in
VPlan::execute without having to specially handle it ourselves.

After this #115274 should just become a matter of setting the VF operand
to EVL (and building the increment step in the loop body, not the
preheader).
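
The expanded form corresponds roughly to the following scalar
emulation, where the start vector and the increment are built once
(the "preheader") and the loop only performs the add. widenInduction
is a hypothetical name and a fixed VF is assumed:

```cpp
#include <cstdint>
#include <vector>

// Returns the value of the widened induction in each vector iteration.
inline std::vector<std::vector<int64_t>>
widenInduction(int64_t Start, int64_t Step, unsigned VF,
               unsigned NumVectorIters) {
  // "Preheader": induction.start = Start + <0, 1, ..., VF-1> * Step and
  // induction.increment = splat(Step * VF).
  std::vector<int64_t> Ind(VF);
  for (unsigned L = 0; L < VF; ++L)
    Ind[L] = Start + int64_t(L) * Step;
  std::vector<int64_t> Inc(VF, Step * int64_t(VF));

  std::vector<std::vector<int64_t>> Values;
  for (unsigned Iter = 0; Iter < NumVectorIters; ++Iter) {
    Values.push_back(Ind); // The WIDEN-PHI value for this iteration.
    for (unsigned L = 0; L < VF; ++L)
      Ind[L] += Inc[L];    // vec.ind.next = add phi, increment.
  }
  return Values;
}
```

In an EVL variant, the increment would instead be computed inside the
loop from the per-iteration EVL, which this sketch does not model.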
2025-06-17 18:24:07 +01:00
Florian Hahn
d3bc834ece
[LV] Update check to find epilogue resume value to check all incoming.
This fixes a crash where all incoming values for the epilogue resume
value are zero, because there are no remaining iterations to execute for
the epilogue loop.
2025-06-16 21:10:12 +01:00
David Sherwood
a75e0627f9
[LV] Use vscale for tuning when updating profile information (#143690)
In fixVectorizedLoop we call setProfileInfoAfterUnrolling to update the
profile information after vectorising, however for scalable VFs we
pessimistically assume vscale=1. We can improve upon this by using the
value of vscale used for tuning, i.e. when targeting neoverse-v1 the
expected value is 2.
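
The effect on the estimate can be sketched as simple arithmetic.
estimatedVectorIterations is a hypothetical helper and does not model
what setProfileInfoAfterUnrolling does internally:

```cpp
#include <cstdint>

// Hypothetical helper: estimated number of vector loop iterations for a
// given scalar trip count. For scalable VFs the known-minimum element
// count is scaled by the vscale value assumed for tuning.
inline uint64_t estimatedVectorIterations(uint64_t ScalarTripCount,
                                          unsigned KnownMinVF,
                                          bool IsScalable,
                                          unsigned VScaleForTuning,
                                          unsigned UF) {
  uint64_t ElementsPerIter = uint64_t(KnownMinVF) * UF;
  if (IsScalable)
    ElementsPerIter *= VScaleForTuning;
  return ScalarTripCount / ElementsPerIter;
}
```

For 1024 scalar iterations and VF = vscale x 4, assuming vscale=1 gives
256 vector iterations in this sketch, while a tuning value of 2 gives
128.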
2025-06-16 10:02:38 +01:00
Sam Tebbs
3dd61c1876
[LV] Fix MVE regression from #132190 (#141736)
Register pressure was only considered if the vector bandwidth was being
maximised (chosen either by the target or user options), but #132190
inadvertently caused high pressure VFs to be pruned even when max
bandwidth wasn't enabled. This PR returns to the previous behaviour.
2025-06-16 09:58:03 +01:00
Ramkumar Ramachandra
fbade95ebf
[LV] Strip unnecessary make_{pair,optional} (NFC) (#141924) 2025-06-16 08:55:46 +01:00
Florian Hahn
df54a2d935
[VPlan] Only skip induction phis in planContainsAdditionalSimps (NFC).
Skip induction phis when checking for simplifications, as they may not
be lowered directly to a corresponding PHI recipe. Reductions
and first-order recurrences will get lowered to phi recipes, unless they
are removed. Considering them for simplifications allows removing them
if there are no remaining users.

NFC as currently reduction and recurrence phis are not
simplified/removed if dead.
2025-06-15 19:31:30 +01:00
Florian Hahn
577199f922
Reapply "[VPlan] Set branch weight metadata on middle term in VPlan (NFC) (#143035)"
This reverts commit 0604dc199c019b23746f4a54885ba0c75569cdae.

The recommitted version addresses post-commit comments and adjusts the
place the branch weights are added. It now runs before VPlans are optimized
for VF and UF, which may remove the vector loop region, causing a crash
trying to get the middle block after that. Test case added in
72f99b75afc12bb.

Original message:
Manage branch weights for the BranchOnCond in the middle block in VPlan.
This requires updating VPInstruction to inherit from VPIRMetadata, which
in general makes sense as there are a number of opcodes that could take
metadata.

There are other branches (part of the skeleton) that also need branch
weights adding.

PR: https://github.com/llvm/llvm-project/pull/143035
2025-06-14 17:20:46 +01:00