492 Commits

Author SHA1 Message Date
Florian Hahn
1c727baf69
[VPlan] Mark BranchOnCount and BranchOnCond as having side effects (NFC)
BranchOnCount and BranchOnCond do not read memory, but cannot be moved.

Mark them as having side-effects, but not reading/writing memory, which
more accurately models the above. This allows removing some special
checking for branches both in the current code and future patches.
2025-11-02 21:14:37 +00:00
Florian Hahn
f773efcffb
[VPlan] Add VPIRMetadata parameter to VPInstruction constructor. (NFC)
Update VPInstruction constructor to accept VPIRMetadata between the
Flags and DebugLoc parameters. This allows metadata to be passed during
construction rather than assigned afterward.
2025-11-01 21:57:52 +00:00
Florian Hahn
a943132761
[VPlan] Add VPRegionBlock::getCanonicalIVType (NFC). (#164127)
Split off from https://github.com/llvm/llvm-project/pull/156262.

Similar to VPRegionBlock::getCanonicalIV, add helper to get the type of
the canonical IV, in preparation for removing VPCanonicalIVPHIRecipe.

PR: https://github.com/llvm/llvm-project/pull/164127
2025-10-31 20:05:02 -07:00
Florian Hahn
b2d12d6f2b
[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276)
Update getSCEVExprForVPValue to handle more complex expressions, to use
it in VPReplicateRecipe::computeCost.

In particular, it supports constructing SCEV expressions for
GetElementPtr VPReplicateRecipes, with operands that are
VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we
hit a sub-expression we don't support yet, we return
SCEVCouldNotCompute.

Note that the SCEV expression is valid for VF = 1: we only support
constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec
starting at 0 and stepping by 1. The returned SCEV expressions could be
converted to a VF specific one, by rewriting the AddRecs to ones with
the appropriate step.

Note that the logic for constructing SCEVs for GetElementPtr was
directly ported from ScalarEvolution.cpp.

Another thing to note is that we construct SCEV expression purely by
looking at the operation of the recipe and its translated operands, w/o
accessing the underlying IR (the exception being getting the source
element type for GEPs).

PR: https://github.com/llvm/llvm-project/pull/161276
2025-10-30 15:46:19 -07:00
Ramkumar Ramachandra
a2d873fb87
[VPlan] Introduce cannotHoistOrSinkRecipe, fix miscompile (#162674)
Factor out common code to determine legality of hoisting and sinking.
The patch has the side-effect of fixing an underlying bug, where a
load/store pair is reordered.
2025-10-28 09:36:17 +00:00
Mel Chen
6bf948999f
[VPlan] Store memory alignment in VPWidenMemoryRecipe. nfc (#165255)
Add a member Alignment to VPWidenMemoryRecipe to store memory alignment
directly in the recipe. Update constructors, clone(), and relevant
methods to use this stored alignment instead of querying the IR
instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be
constructed without relying on the original IR instruction in the
future.
2025-10-28 15:29:35 +08:00
Florian Hahn
523c796df7
[VPlan] Use VPlan type inference to get address space for recipes. (NFC)
Instead of accessing the address space from the IR reference, retrieve
it via type inference.
2025-10-28 04:51:24 +00:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Florian Hahn
9317975a7a
[VPlan] Match legacy behavior w.r.t. using pointer phis as scalar addrs.
When the legacy cost model scalarizes loads that are used as addresses
for other loads and stores, it looks to phi nodes, if they are direct
address operands of loads/stores. Match this behavior in
isUsedByLoadStoreAddress, to fix a divergence between legacy and
VPlan-based cost model.
2025-10-20 11:09:25 +01:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalar values from vectors.

Test changes are movements of the extracts: they are now generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Florian Hahn
8769119027
[VPlan] Add VPRecipeBase::getRegion helper (NFC).
Multiple places retrieve the region for a recipe. Add a helper to make
the code more compact and clearer.
2025-10-18 21:25:34 +01:00
Ramkumar Ramachandra
0a4702407b
[VPlan] Improve code around canConstantBeExtended (NFC) (#161652)
Follow up on 7c4f188 ([LV] Support multiplies by constants when forming
scaled reductions), introducing m_APInt, and improving code around
canConstantBeExtended: we change canConstantBeExtended to take an APInt.
2025-10-16 13:03:13 +01:00
Florian Hahn
7f54fccc0e
[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056)
When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.

Fixes https://github.com/llvm/llvm-project/issues/162498.

PR: https://github.com/llvm/llvm-project/pull/163056
2025-10-15 13:46:09 +01:00
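As a rough illustration of the distinction drawn in 7f54fccc0e, the sketch
below models an unrolled vector value as a list of parts, each holding VF
lanes. The Parts alias and both function names are hypothetical stand-ins
and not VPlan APIs; only the two opcode names come from the commit message.

```cpp
#include <cassert>
#include <vector>

// Toy model (assumption): an unrolled value is UF parts, each with VF lanes.
using Parts = std::vector<std::vector<int>>;

// Models ExtractLastElement: the last lane of the last part only.
int extractLastElement(const Parts &P) {
  assert(!P.empty() && !P.back().empty() && "need at least one lane");
  return P.back().back();
}

// Models ExtractLastLanePerPart: the last lane of every part.
std::vector<int> extractLastLanePerPart(const Parts &P) {
  std::vector<int> Result;
  Result.reserve(P.size());
  for (const std::vector<int> &Part : P) {
    assert(!Part.empty() && "each part needs at least one lane");
    Result.push_back(Part.back());
  }
  return Result;
}
```

With UF = 2 and VF = 4, a value that is uniform only within each part, such
as {{7, 7, 7, 7}, {9, 9, 9, 9}}, gives 9 from extractLastElement but {7, 9}
from extractLastLanePerPart, so the per-part variant keeps both values.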
Florian Hahn
861519327a
[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-15 12:48:35 +01:00
Kazu Hirata
6f13b94e61
[llvm] Use [[fallthrough]] instead of LLVM_FALLTHROUGH (NFC) (#163086)
[[fallthrough]] is now part of C++17, so we don't need to use
LLVM_FALLTHROUGH.
2025-10-12 20:49:19 -07:00
Florian Hahn
ae7b15f2e2
[VPlan] Return invalid for scalable VF in VPReplicateRecipe::computeCost
Replication is currently not supported for scalable VFs. Make sure
VPReplicateRecipe::computeCost returns an invalid cost early, for
scalable VFs if the recipe is not a single-scalar.

Note that this moves the existing invalid-costs.ll out of the AArch64
subdirectory, as it does not use a target triple.

Fixes https://github.com/llvm/llvm-project/issues/160792.
2025-10-11 19:28:02 +01:00
Florian Hahn
98ce434870
[VPlan] Skip VPBlendRecipe in isUsedByLoadStoreAddress.
VPBlendRecipes are introduced as part of if-conversion, potentially adding
a def-use chain from a load used in a compare to another load/store. In
the scalar IR, there is no connection via def-use chains, so the legacy
cost model won't consider the load used by a memory operation.

Skipping blends brings the VPlan-based cost-computation in line with the
legacy cost model after https://github.com/llvm/llvm-project/pull/162157.
2025-10-08 18:43:23 +01:00
Ramkumar Ramachandra
7296734394
[VPlan] Mark ActiveLaneMask as not having mem effects (#162330)
VPInstruction::ActiveLaneMask does not read or write memory. This allows
us to clean up some dead recipes.
2025-10-08 09:19:24 +01:00
Florian Hahn
9c0e09e0c1
[VPlan] Process ExpressionRecipes in reverse order in constructor.
Currently there's a crash when trying to construct VPExpressionRecipes
for a mul (ext, ext), if the multiply has outside users; the mul will be
cloned to serve its external users, but the extends won't get cloned and
will stay connected to users outside the loop (the cloned multiply).

To fix this, process recipes in reverse order. This ensures that we
visit bundled users before their operands, properly ensuring that the
extends for the external user are cloned as well.
2025-10-06 22:24:02 +01:00
Florian Hahn
74af5784a5
Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#162157)
This reverts commits f80c0baf058dbdc5 and 94eade61a02ae5.

Recommit with a small fix for targets using prefersVectorizedAddressing.

Original message:
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as a follow-up.
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't check for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.

PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-06 22:16:08 +01:00
Alexey Bataev
f80c0baf05
Revert "Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)"
This reverts commit 8f2466bc72a5ab163621cb1bf4bf53a27f1cefe7 to fix
crashes reported in commits
2025-10-05 08:38:17 -07:00
Alexey Bataev
94eade61a0
Revert "[VPlan] Match legacy CM in ::computeCost if load is used by load/store."
This reverts commit 1d65d9ce06fef890389e61990d9c748162334e55 to fix
crashes, reported in the commits
2025-10-05 08:37:52 -07:00
Florian Hahn
1d65d9ce06
[VPlan] Match legacy CM in ::computeCost if load is used by load/store.
If a load is scalarized because it is used by a load/store address, the
legacy cost model does not pass ScalarEvolution to getAddressComputationCost.

Match the behavior in VPReplicateRecipe::computeCost.
2025-10-03 22:01:46 +01:00
Florian Hahn
8f2466bc72
Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)
This reverts commit f61be4352592639a0903e67a9b5d3ec664ad4d23.

Recommit with a small fix handling scalarization overhead consistently with
legacy cost model if a load is used directly as operand of another
memory operation, which fixes
https://github.com/llvm/llvm-project/issues/161404.

Original message:
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as a follow-up.
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't check for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.

PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-02 22:00:22 +01:00
Florian Hahn
7c4f188f27
[LV] Support multiplies by constants when forming scaled reductions. (#161092)
We can create partial reductions for multiplies with constants, if the
constant is small enough to be extended from source to destination type
w/o changing the value.

This only handles constant on the right side of a multiply, relying on
other passes to canonicalize the input.

Alive2 Proofs: https://alive2.llvm.org/ce/z/iWRMr6

PR: https://github.com/llvm/llvm-project/pull/161092
2025-10-02 10:53:17 +00:00
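The "small enough to be extended w/o changing the value" condition in
7c4f188f27 can be pictured with plain 64-bit integers. The sketch below is
only an approximation under that assumption: fitsAfterZExt and fitsAfterSExt
are hypothetical names, and this is not the canConstantBeExtended
implementation, which per the later follow-up operates on an APInt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if C is unchanged by truncating to SrcBits and
// zero-extending back, i.e. C fits in SrcBits unsigned bits.
bool fitsAfterZExt(uint64_t C, unsigned SrcBits) {
  assert(SrcBits >= 1 && SrcBits <= 64 && "unsupported bit width");
  if (SrcBits == 64)
    return true;
  return (C >> SrcBits) == 0;
}

// Hypothetical helper: true if C is unchanged by truncating to SrcBits and
// sign-extending back, i.e. C lies in [-2^(SrcBits-1), 2^(SrcBits-1) - 1].
bool fitsAfterSExt(int64_t C, unsigned SrcBits) {
  assert(SrcBits >= 1 && SrcBits <= 64 && "unsupported bit width");
  if (SrcBits == 64)
    return true;
  const int64_t Min = -(int64_t(1) << (SrcBits - 1));
  const int64_t Max = (int64_t(1) << (SrcBits - 1)) - 1;
  return C >= Min && C <= Max;
}
```

For an i8 source type, 255 passes the zero-extension check while 256 does
not, and the sign-extension check accepts exactly the range -128 to 127.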
Sam Tebbs
664b227089
[LV] Keep duplicate recipes in VPExpressionRecipe (#156976)
The VPExpressionRecipe class uses a set to store its bundled recipes. If
repeated recipes are bundled then the duplicates will be lost, causing
the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, for example, if the
extends are the same then the 3rd element of the set will be the
reduction, rather than the expected mul, causing a cast error. With this
change, the recipes are at the expected index in the set.

Fixes #156464
2025-10-01 16:01:54 +01:00
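The index shift described in 664b227089 can be reproduced with a toy model
in which recipes are plain strings. Everything here is an assumption made
for illustration: bundleAsSet mimics an insertion-ordered set that drops
repeats, bundleAsList keeps them, and neither reflects the actual container
used by VPExpressionRecipe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical insertion-ordered set: only the first occurrence is kept.
std::vector<std::string> bundleAsSet(const std::vector<std::string> &Recipes) {
  std::vector<std::string> Out;
  for (const std::string &R : Recipes)
    if (std::find(Out.begin(), Out.end(), R) == Out.end())
      Out.push_back(R);
  return Out;
}

// Keeping duplicates: every recipe stays at the index it was bundled at.
std::vector<std::string> bundleAsList(const std::vector<std::string> &Recipes) {
  return Recipes;
}

// Returns the recipe at Idx, asserting that the index is in range.
const std::string &recipeAt(const std::vector<std::string> &Bundle,
                            std::size_t Idx) {
  assert(Idx < Bundle.size() && "index out of range");
  return Bundle[Idx];
}
```

Bundling {"ext", "ext", "mul", "reduce.add"} with identical extends leaves
"reduce.add" as the third element of the set-like bundle, whereas the
list-like bundle still has "mul" there, matching the failure described in
the commit message.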
Florian Hahn
f61be43525
Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)"
This reverts commit b4be7ecaf06bfcb4aa8d47c4fda1eed9bbe4ae77.

See https://github.com/llvm/llvm-project/issues/161404 for a crash
exposed by the change. Revert while I investigate.
2025-09-30 22:13:06 +01:00
Sam Tebbs
88658dbbc5
[LV] Add ExtNegatedMulAccReduction expression type (#160154)
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. https://github.com/llvm/llvm-project/pull/156976
2. -> https://github.com/llvm/llvm-project/pull/160154
3. https://github.com/llvm/llvm-project/pull/147302
2025-09-30 10:10:37 +01:00
Florian Hahn
b4be7ecaf0
[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as a follow-up.
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't check for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.

PR: https://github.com/llvm/llvm-project/pull/160053
2025-09-29 08:08:09 +00:00
Florian Hahn
8460dbb450
[VPlan] Mark VPInstruction::Broadcast as not reading/writing memory.
This enables additional DCE/CSE opportunities and ensures that we don't
end up with multiple redundant users of a VPInstruction using EVL. It
fixes a verifier error in the added test_3_inductions test.
2025-09-27 20:48:42 +01:00
Luke Lau
7275c178bd
[VPlan] Fix packed replication of struct types (#160274)
I ran into this crash when #158690 caused a loop with a struct call to
be vectorized.

If we have a replicate recipe in a branch-on-mask predicated region
that's used by a widened recipe in another block then it will be packed
together with the other lanes via a VPPredInstPHIRecipe.

If we're replicating a call with a struct return type then we currently
crash. The code that handles structs in packScalarIntoVectorizedValue
seemed to be untested at least on test/Transforms/LoopVectorize.

There are two places that need to be fixed. The poison value that the
scalar is packed into needs to use toVectorizedTy to correctly handle
structs (not to be confused with toVectorTy!).

The other is that VPPredInstPHIRecipe expects its operand to be an
InsertElementInst when stringing together the different lanes. For
structs this will be an InsertValueInst, and the value for the previous
lane will be at the back of a chain of InsertValueInsts.
2025-09-26 02:22:15 +00:00
Florian Hahn
70a26da639
[VPlan] Set correct flags when creating and cloning VPWidenCastRecipe.
Make sure that we set the correct wrap flags when creating new
VPWidenCastRecipes for truncs and preserve the flags from the recipe
directly when cloning, to make sure they are not dropped.

Fixes https://github.com/llvm/llvm-project/issues/160396
2025-09-25 09:00:47 +01:00
Ramkumar Ramachandra
66c35ebf3c
[VPlan] Avoid branching around State.get (NFC) (#159042)
2025-09-22 10:31:16 +01:00
Ramkumar Ramachandra
019913e4fa
[VPlan] Add WidenGEP::getSourceElementType (NFC) (#159029)
2025-09-22 10:02:08 +01:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Florian Hahn
1858532c48
[VPlan] Handle predicated UDiv in VPReplicateRecipe::computeCost.
Account for predicated UDiv,SDiv,URem,SRem in
VPReplicateRecipe::computeCost: compute costs of extra phis and apply
getPredBlockCostDivisor.

Fixes https://github.com/llvm/llvm-project/issues/158660
2025-09-15 21:46:50 +01:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419)
2025-09-15 10:34:06 +00:00
Florian Hahn
fb60d0337c
[VPlan] Return non-option cost from getCostForRecipeWithOpcode (NFC).
getCostForRecipeWithOpcode must only be called with supported opcodes.
Directly return the cost, and add llvm_unreachable to catch unhandled
cases.
2025-09-14 22:24:57 +01:00
Florian Hahn
91d4c0dfdf
Reapply "[VPlan] Compute cost of scalar (U|S)Div, (U|S)Rem in computeCost (NFCI)."
This reverts commit 9490d58fa92bb338db96af331194c9ba26eb0201.

Recommits de7e3a58952 with a fix for an unhandled case, causing crashes
in some configs.
2025-09-14 13:15:07 +01:00
Aiden Grossman
9490d58fa9
Revert "[VPlan] Compute cost of scalar (U|S)Div, (U|S)Rem in computeCost (NFCI)."
This reverts commit de7e3a589525179f3b02b84b194aac6cf581425c.

This broke quite a few upstream buildbots and premerge. Reverting for now to
get things back to green.

https://lab.llvm.org/buildbot/#/builders/137/builds/25467
2025-09-13 22:32:48 +00:00
Florian Hahn
de7e3a5895
[VPlan] Compute cost of scalar (U|S)Div, (U|S)Rem in computeCost (NFCI).
Directly compute the cost of UDiv, SDiv, URem, SRem in VPlan.
2025-09-13 22:09:06 +01:00
Florian Hahn
30e9cbacab
[VPlan] Move logic to compute scalarization overhead to cost helper(NFC)
Extract the logic to compute the scalarization overhead to a helper for
easy re-use in the future.
2025-09-13 20:41:44 +01:00
Florian Hahn
b8eaceb39b
[VPlan] Explicitly replicate VPInstructions by VF. (#155102)
Extend replicateByVF added in #142433 (aa240293190) to also explicitly
unroll replicating VPInstructions.

Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.

PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-12 17:06:26 +01:00
Florian Hahn
c3e76b2770
[VPlan] Keep common flags during CSE. (#157664)
During CSE, we don't have to drop all poison-generating flags on
mis-match, we can keep the ones common on both recipes.

PR: https://github.com/llvm/llvm-project/pull/157664
2025-09-10 10:20:48 +00:00
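The idea behind c3e76b2770 amounts to intersecting two flag sets instead of
clearing them. The sketch below assumes a made-up bitmask encoding; the
PoisonFlag enum and both function names are hypothetical and do not mirror
how VPlan stores its flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for illustration; not LLVM's actual encoding.
enum PoisonFlag : uint8_t { NoUnsignedWrap = 1, NoSignedWrap = 2, Exact = 4 };

// Conservative variant: any mismatch drops every flag.
uint8_t dropAllOnMismatch(uint8_t A, uint8_t B) {
  assert((A | B) <= 7 && "unknown flag bit");
  return A == B ? A : 0;
}

// Variant described in the commit: keep the flags common to both recipes.
uint8_t keepCommonFlags(uint8_t A, uint8_t B) {
  assert((A | B) <= 7 && "unknown flag bit");
  return A & B;
}
```

If one recipe carries both wrap flags and the other only NoSignedWrap, the
conservative variant returns no flags, while the intersection still keeps
NoSignedWrap on the surviving recipe.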
David Sherwood
ba4ce60f1a
[LV] Add scalar load/stores to VPReplicateRecipe::computeCost (#153218)
Avoid calling getLegacyCost for single scalar loads and stores where the
cost is trivial to calculate.
2025-09-05 11:52:07 +01:00
Sam Tebbs
37127f74f4
[LV] Bundle sub reductions into VPExpressionRecipe (#147255)
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. -> https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-09-01 17:25:01 +01:00
Mel Chen
13357e8a12
[LV][EVL] Support interleaved access with tail folding by EVL (#152070)
The InterleavedAccess pass already supports transforming
vector-predicated (vp) load/store intrinsics. With this patch, we start
enabling interleaved access under tail folding by EVL.

This patch introduces a new base class, VPInterleaveBase, and a concrete
class, VPInterleaveEVLRecipe. Both the existing VPInterleaveRecipe and
the new VPInterleaveEVLRecipe inherit from and implement
VPInterleaveBase.

Compared to VPInterleaveRecipe, VPInterleaveEVLRecipe adds an EVL
operand to emit vp.load/vp.store intrinsics.

Currently, tail folding by EVL is only supported for scalable
vectorization. Therefore, VPInterleaveEVLRecipe will only emit
interleave/deinterleave intrinsics. Reverse accesses are not yet
implemented, as masked reverse interleaved access under tail folding is
not yet supported.

Fixed #123201
2025-09-01 21:20:06 +08:00
Kerry McLaughlin
f0e9bba024
[LoopVectorize] Generate wide active lane masks (#147535)
This patch adds a new flag (-enable-wide-lane-mask) which allows
LoopVectorize to generate wider-than-VF active lane masks when it
is safe to do so (i.e. the mask is used for data and control flow).

The transform in extractFromWideActiveLaneMask creates vector
extracts from the first active lane mask in the header & loop body,
modifying the active lane mask phi operands to use the extracts.

An additional operand is passed to the ActiveLaneMask instruction,
the value of which is used as a multiplier of VF when generating the
mask.
By default this is 1, and is updated to UF by
extractFromWideActiveLaneMask.

The motivation for this change is to improve interleaved loops when
SVE2.1 is available, where we can make use of the whilelo instruction
which returns a predicate pair.

This is based on a PR that was created by @momchil-velikov (#81140)
and contains tests which were added there.
2025-09-01 13:53:30 +01:00
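A scalar model may help picture the VF multiplier described in f0e9bba024.
The sketch assumes a lane i is active when Base + i is below the trip
count; activeLaneMask and extractPart are hypothetical helpers written for
this illustration and are not the functions added by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy mask: lane i is active iff Base + i < TripCount. The mask has
// VF * Multiplier lanes; Multiplier = 1 gives the ordinary VF-wide mask.
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t TripCount,
                                 unsigned VF, unsigned Multiplier = 1) {
  std::vector<bool> Mask(static_cast<std::size_t>(VF) * Multiplier);
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Mask[I] = Base + I < TripCount;
  return Mask;
}

// Extracts the VF-wide slice for one unrolled part from a wide mask.
std::vector<bool> extractPart(const std::vector<bool> &Wide, unsigned VF,
                              unsigned Part) {
  const std::size_t Begin = static_cast<std::size_t>(Part) * VF;
  assert(Begin + VF <= Wide.size() && "part out of range");
  return std::vector<bool>(Wide.begin() + Begin, Wide.begin() + Begin + VF);
}
```

With VF = 4, UF = 2 and a trip count of 6, one wide mask of 8 lanes is
generated, and the slice for part 1 equals the narrow mask computed from
base 4, so a single wide mask can feed both parts.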
Florian Hahn
df098796ec
[VPlan] Compute cost of intrinsics directly for VPReplicateRecipe (NFCI). (#154617)
Handle intrinsic calls in VPReplicateRecipe::computeCost. There are some
pseudo intrinsics for which the computed cost is known to be zero,
so we handle those up front.

Depends on https://github.com/llvm/llvm-project/pull/154291.

PR: https://github.com/llvm/llvm-project/pull/154617
2025-08-27 21:40:47 +01:00
Florian Hahn
5e32f728ec
[VPlan] Move logic to compute cost for intrinsic to helper (NFC).
Refactor to prepare for https://github.com/llvm/llvm-project/pull/154617.
2025-08-27 19:26:34 +01:00