2758 Commits

Author SHA1 Message Date
Mel Chen
d1874047f5
[VPlan] Retrieve alignment from Load/StoreInst in constructors. nfc (#165722)
This patch removes the explicit Alignment parameter from
VPWidenLoadRecipe and VPWidenStoreRecipe constructors. Instead, these
recipes now directly retrieve the alignment from their
LoadInst/StoreInst.
2025-11-06 09:02:04 +00:00
Florian Hahn
6e83937f39
[VPlan] Add getConstantInt helpers for constant int creation (NFC).
Add getConstantInt helper methods to VPlan to simplify the common
pattern of creating constant integer live-ins.

Suggested as follow-up in
https://github.com/llvm/llvm-project/pull/164127.
2025-11-01 04:13:01 +00:00
Florian Hahn
b2d12d6f2b
[VPlan] Extend getSCEVForVPV, use to compute VPReplicateRecipe cost. (#161276)
Update getSCEVExprForVPValue to handle more complex expressions, to use
it in VPReplicateRecipe::comptueCost.

In particular, it supports construction SCEV expressions for
GetElementPtr VPReplicateRecipes, with operands that are
VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we
hit a sub-expression we don't support yet, we return
SCEVCouldNotCompute.

Note that the SCEV expression is valid VF = 1: we only support
construction AddRecs for VPCanonicalIVRecipe, which is an AddRec
starting at 0 and stepping by 1. The returned SCEV expressions could be
converted to a VF specific one, by rewriting the AddRecs to ones with
the appropriate step.

Note that the logic for constructing SCEVs for GetElementPtr was
directly ported from ScalarEvolution.cpp.

Another thing to note is that we construct SCEV expression purely by
looking at the operation of the recipe and its translated operands, w/o
accessing the underlying IR (the exception being getting the source
element type for GEPs).

PR: https://github.com/llvm/llvm-project/pull/161276
2025-10-30 15:46:19 -07:00
Ramkumar Ramachandra
01fbbda62c
[LV] Strengthen assert: VPlan0 doesn't have WidenPHIs (NFC) (#165715)
VPWidenCanonicalIV and VPBlend recipes are created by VPPredicator, and
VPCanonicalIVPHI and VPInstruction recipes are created by
VPlanConstruction. WidenPHIs are never created.
2025-10-30 18:32:33 +00:00
Florian Hahn
4c46ae3948
[LV] Only skip scalarization overhead for members used as address.
Refine logic to scalarize interleave group member: only skip
scalarization overhead for member being used as addresses. For others,
use the regular scalar memory op cost.

This currently doesn't trigger in practice as far as I could find, but
fixes a potential divergence between VPlan- and legacy cost models.

It fixes a concrete divergence with a follow-up patch,
https://github.com/llvm/llvm-project/pull/161276.
2025-10-30 05:04:34 +00:00
Mel Chen
6bf948999f
[VPlan] Store memory alignment in VPWidenMemoryRecipe. nfc (#165255)
Add an member Alignment to VPWidenMemoryRecipe to store memory alignment
directly in the recipe. Update constructors, clone(), and relevant
methods to use this stored alignment instead of querying the IR
instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be
constructed without relying on the original IR instruction in the
future.
2025-10-28 15:29:35 +08:00
Hassnaa Hamdi
be29f0dd86
[LV]: Improve accuracy of calculating remaining iterations of MainLoopVF (#156723)
Transform TC and VF to same numerical space when they are different.
2025-10-26 14:45:44 +00:00
Florian Hahn
bfc322dd72
Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.

Revert while I investigate.
2025-10-22 21:27:11 +01:00
Kerry McLaughlin
45c0b29171
[LV] Ignore user-specified interleave count when unsafe. (#153009)
When an VF is specified via a loop hint, it will be clamped to a safe
VF or ignored if it is found to be unsafe. This is not the case for
user-specified interleave counts, which can lead to loops such as
the following with a memory dependence being vectorised with
interleaving:

```
#pragma clang loop interleave_count(4)
for (int i = 4; i < LEN; i++)
    b[i] = b[i - 4] + a[i];
```

According to [1], loop hints are ignored if they are not safe to apply.

This patch adds a check to prevent vectorisation with interleaving if
isSafeForAnyVectorWidth() returns false. This is already checked in
selectInterleaveCount().

[1]
https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
2025-10-22 15:21:27 +01:00
Florian Hahn
8d29d09309
[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)
Move narrowInterleaveGroups to to general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2025-10-21 11:37:42 +01:00
Florian Hahn
35b9f20449
[LV] Check for TruncInsts in canTruncateToMinimalBitwidth.
TruncInst must truncate at most to their destination. Return false if
MinBWs contains a destination size > the trunc result type size.

Fixes https://github.com/llvm/llvm-project/issues/162688.
2025-10-20 22:31:16 +01:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalars values from vectors.

Test changes are movements of the extracts: they are no generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Ramkumar Ramachandra
0a4702407b
[VPlan] Improve code around canConstantBeExtended (NFC) (#161652)
Follow up on 7c4f188 ([LV] Support multiplies by constants when forming
scaled reductions), introducing m_APInt, and improving code around
canConstantBeExtended: we change canConstantBeExtended to take an APInt.
2025-10-16 13:03:13 +01:00
Florian Hahn
861519327a
[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-15 12:48:35 +01:00
Florian Hahn
4bf5ab4f9d
[VPlan] Set flags when constructing truncs using VPWidenCastRecipe.
VPWidenCastRecipes with Trunc opcodes where missing the correct OpType
for IR flags. Update createWidenCast to set the correct flags for
truncs, and use it consistenly.

Fixes https://github.com/llvm/llvm-project/issues/162374.
2025-10-12 14:01:12 +01:00
Florian Hahn
4b8cac2bcc
[VPlan] Don't reset canonical IV start value. (#161589)
Instead of re-setting the start value of the canonical IV when
vectorizing the epilogue we can emit an Add VPInstruction to provide
canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by
loop regions. It ensures that the canonical IV always starts at 0.

PR: https://github.com/llvm/llvm-project/pull/161589
2025-10-11 22:19:05 +01:00
Florian Hahn
ba69e33e13
[LV] Consistently apply address def scalarization across loop.
Consistently scalarize loads used as part of address computations across
all uses in the loop. This aligns the VPlan and legacy cost model and
fixes a divergence crash. It doesn't matter if the load and address
users are in different blocks, as long as they are in the same loop, the
scalar value can be used. This removes a number of insert/extracts.
2025-10-09 22:04:15 +01:00
Florian Hahn
4d45718b47
[IVDescriptors] Add isFPMinMaxNumRecurrenceKind helper (NFC).
Add helper to check for FMinNum and FMaxNum recurrence kinds, as
suggested in https://github.com/llvm/llvm-project/pull/161735.
2025-10-08 11:40:46 +01:00
Ramkumar Ramachandra
772071bb75
[LV] Improve code using VPIRPhi::getIRPhi (NFC) (#162270) 2025-10-08 09:19:48 +01:00
Florian Hahn
74af5784a5
Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#162157)
This reverts commit f80c0baf058dbdc5 and 94eade61a02ae5.

Recommit a small fix for targets using prefersVectorizedAddressing.

Original message:
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as follow-up
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't each for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.

PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-06 22:16:08 +01:00
Ramkumar Ramachandra
a663119455
[LV] Fix verifier failures due to 93073af (#162097)
Follow up on 93073af ([LV] Move 3 functions into VPlanTransforms (NFC))
to not call runPass on the moved functions, as that results in verifier
failures.

Ref: https://lab.llvm.org/buildbot/#/builders/187/builds/12178
2025-10-06 16:36:50 +01:00
Ramkumar Ramachandra
93073af121
[LV] Move 3 functions into VPlanTransforms (NFC) (#158644)
Two of them are actually transforms, and the third is a dependent
static.
2025-10-06 11:13:01 +01:00
Alexey Bataev
f80c0baf05 Revert "Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)"
This reverts commit 8f2466bc72a5ab163621cb1bf4bf53a27f1cefe7 to fix
crashes reported in commits
2025-10-05 08:38:17 -07:00
Ramkumar Ramachandra
ff4aec5d3c
[VPlan] Deref VPlanPtr when passing to transform (NFC) (#161369)
For uniformity with other transforms.
2025-10-03 09:47:35 +01:00
Florian Hahn
8f2466bc72
Reapply "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)" (#161724)
This reverts commit f61be4352592639a0903e67a9b5d3ec664ad4d23.

Recommit a small fix handling scalarization overhead consistently with
legacy cost model if a load is used directly as operand of another
memory operation, which fixes
https://github.com/llvm/llvm-project/issues/161404.

Original message:
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as follow-up
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't each for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.

PR: https://github.com/llvm/llvm-project/pull/160053
2025-10-02 22:00:22 +01:00
Florian Hahn
7c4f188f27
[LV] Support multiplies by constants when forming scaled reductions. (#161092)
We can create partial reductions for multiplies with constants, if the
constant is small enough to be extended from source to destination type
w/o changing the value.

This only handles constant on the right side of a multiply, relying on
other passes to canonicalize the input.

Alive2 Proofs: https://alive2.llvm.org/ce/z/iWRMr6

PR: https://github.com/llvm/llvm-project/pull/161092
2025-10-02 10:53:17 +00:00
Florian Hahn
1a850279c5
[LV] Re-compute cost of scalarized load users.
If there are direct memory op users of the newly scalarized load,
their cost may have changed because there's no scalarization
overhead for the operand. Update it.

This ensures assigning consistent costs to scalarized memory
instructions that themselves have scalarized memory instructions as
operands.
2025-10-01 22:30:18 +01:00
Florian Hahn
62c50fd795
[VPlan] Retrieve canonical IV directly in preparePlanForEpilogue (NFCI).
Move code handling canonical IV out of the loop, simplifying the loop
body. Preparation for follow-up changes
2025-10-01 19:32:54 +01:00
Florian Hahn
f61be43525
Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)"
This reverts commit b4be7ecaf06bfcb4aa8d47c4fda1eed9bbe4ae77.

See https://github.com/llvm/llvm-project/issues/161404 for a crash
exposed by the change. Revert while I investigate.
2025-09-30 22:13:06 +01:00
Ramkumar Ramachandra
280abaf9da
[VPlan] Handle scalar-VF in transforms (NFC) (#161365) 2025-09-30 19:35:12 +01:00
Ramkumar Ramachandra
2f7252a841
[LV] Preserve GEP nusw when widening memory (#160885) 2025-09-30 10:42:45 +00:00
Florian Hahn
45ce88758d
[LV] Don't preserve LCSSA in SCEVExpander for runtime checks. (#159556)
LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.

This fixes a crash when discarding instructions generated when expanding
runtime checks, if the expansion introduces LCSSA phis for values from
other loops which are not in LCSSA form: we would introduce new LCSSA
phis and update all outside users, some of which are not created by the
expander and cannot be cleaned up.

Fixes https://github.com/llvm/llvm-project/issues/158259.

PR: https://github.com/llvm/llvm-project/pull/159556
2025-09-30 10:03:55 +00:00
Florian Hahn
b4be7ecaf0
[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as follow-up
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't each for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and compute costs
for those operations directly where needed.

PR: https://github.com/llvm/llvm-project/pull/160053
2025-09-29 08:08:09 +00:00
Ramkumar Ramachandra
0fc6213aee
[LV] Clarify nature of legacy CSE (NFC) (#160855)
In order to avoid conflating the legacy CSE with the VPlan-based one,
rename the legacy CSE and insert a FIXME to clarify the nature of the
legacy CSE.
2025-09-28 10:02:22 +01:00
Florian Hahn
78af056137
[VPlan] Run CSE closer to VPlan::execute. (#160572)
Additional CSE opportunities are exposed after converting to concrete
recipes/dissolving regions and materializing various expressions. Run
CSE later, to capitalize on some of the late opportunities.

PR: https://github.com/llvm/llvm-project/pull/160572
2025-09-26 09:38:58 +00:00
Florian Hahn
2016af5652
[VPlan] Create epilogue minimum iteration check in VPlan. (#157545)
Move creation of the minimum iteration check for the epilogue vector
loop to VPlan. This is a first step towards breaking up and moving
skeleton creation for epilogue vectorization to VPlan.

It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum
iteration check is created directly in VPlan, connecting the check
blocks from the main vector loop is done as post-processing. Next steps
are to move connecting and updating the branches from the check blocks
to VPlan, as well as updating the incoming values for phis.

Test changes are improvements due to folding of live-ins.

PR: https://github.com/llvm/llvm-project/pull/157545
2025-09-25 07:13:38 +00:00
Florian Hahn
ac3f148f60
[LV] Set extend kinds together with ExtOpTypes (NFC).
Set extend kinds together with ExtOpTypes. This will make it easier to
adjust the extend kind handling.
2025-09-24 21:36:11 +01:00
Florian Hahn
a7b4dd42bd
[LV] Don't create partial reductions if factor doesn't match accumulator (#158603)
Check if the scale-factor of the accumulator is the same as the request
ScaleFactor in tryToCreatePartialReductions.

This prevents creating partial reductions if not all instructions in the
reduction chain form partial reductions. e.g. because we do not form a
partial reduction for the loop exit instruction.

Currently code-gen works fine, because the scale factor of
VPPartialReduction is not used during ::execute, but it means we compute
incorrect cost/register pressure, because the partial reduction won't
reduce to the specified scaling factor.

PR: https://github.com/llvm/llvm-project/pull/158603
2025-09-24 12:21:03 +01:00
Ramkumar Ramachandra
66fd42008a
[LV] Don't ignore invariant stores when costing (#158682)
Invariant stores of reductions are removed early in the VPlan
construction, and there is no reason to ignore them while costing.
2025-09-24 11:33:24 +01:00
Florian Hahn
88aab08ae5
[LV] Check for hoisted safe-div selects in planContainsAdditionalSimp.
In some cases, safe-divisor selects can be hoisted out of the vector
loop. Catching all cases in the legacy cost model isn't possible, in
particular checking if all conditions guarding a division are loop
invariant.

Instead, check in planContainsAdditionalSimplifications if there are any
hoisted safe-divisor selects. If so, don't compare to the more
inaccurate legacy cost model.

Fixes https://github.com/llvm/llvm-project/issues/160354.
Fixes https://github.com/llvm/llvm-project/issues/160356.
2025-09-23 21:54:02 +01:00
Florian Hahn
49605a4727
[LV] Set correct costs for interleave group members.
This ensures each scalarized member has an accurate cost, matching the
cost it would have if it would not have been considered for an
interleave group.
2025-09-21 18:07:22 +01:00
Florian Hahn
addfdb5273
[LV] Skip select cost for invariant divisors in legacy cost model.
For UDiv/SDiv with invariant divisors, the created selects will be
hoisted out. Don't compute their cost for each iteration, to match the
more accurate VPlan-based cost modeling.

Fixes https://github.com/llvm/llvm-project/issues/159402.
2025-09-21 15:08:50 +01:00
Florian Hahn
7dd9b3d814
[LV] Also handle non-uniform scalarized loads when processing AddrDefs.
Loads of addresses are scalarized and have their costs computed w/o
scalarization overhead. Consistently apply this logic also to
non-uniform loads that are already scalarized, to ensure their costs are
consistent with other scalarized lodas that are used as addresses.
2025-09-21 09:36:58 +01:00
Florian Hahn
81c0c7337d
[LV] Pass operand info to getMemoryOpCost in getMemInstScalarizationCost.
Pass operand info to getMemoryOpCost in getMemInstScalarizationCost.
This matches the behavior in VPReplicateRecipe::computeCost.
2025-09-19 21:03:38 +01:00
Florian Hahn
0c028bbf33
[LV] Always add uniform pointers to uniforms list.
Always add pointers proved to be uniform via legal/SCEV to worklist.
This extends the existing logic to handle a few more pointers known to
be uniform.
2025-09-18 22:56:19 +01:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Ramkumar Ramachandra
f68f3b9a7e
[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550) 2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra
0384f6c9db
[VPlanPatternMatch] Introduce match functor (NFC) (#159521)
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to
introduce the VPlanPatternMatch version of the match functor to shorten
some idioms.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-18 10:36:12 +01:00
Hassnaa Hamdi
e8aa0b688a
[LV]: Ensure fairness when selecting epilogue VF. (#155547)
Consider IC when deciding if epilogue profitable for scalable vectors,
same as fixed-width vectors.
2025-09-17 14:48:10 +01:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419) 2025-09-15 10:34:06 +00:00