535 Commits

Author SHA1 Message Date
Kerry McLaughlin
de3de3f143
[LV] Consider interleaving when -enable-wide-lane-mask=true (#163387)
Currently the only way to enable the use of wide active lane masks is to pass
-enable-wide-lane-mask and force both interleaving & tail-folding with additional
flags. This patch changes selectInterleaveCount to consider interleaving if wide
lane masks were requested, although the feature remains off by default.
2025-11-11 11:46:59 +00:00
Ramkumar Ramachandra
fdd52f5fe1
[VPlan] Handle WidenGEP in narrowToSingleScalars (#166740)
This allows us to strip a special case in VPWidenGEP::execute.
2025-11-11 10:33:55 +00:00
Florian Hahn
8b1cc2d5f5
[VPlan] Update canNarrowLoad to check WidenMember0's op first (NFCI).
This hardens the code to check based on WideMember0's operands. This
ensures each call will go through the same check. Should be NFC
currently but needed when generalizing in follow-up patches.
2025-11-10 22:18:34 +00:00
Ramkumar Ramachandra
c2d4c7c18b
[VPlan] Permit more users in narrowToSingleScalars (#166559)
narrowToSingleScalarRecipes can permit users that are WidenStore, or a
VPInstruction that has a suitable opcode. This is a generalization and
extension of the existing code.
2025-11-10 17:03:14 +00:00
Ramkumar Ramachandra
2d1d5fe78e
[VPlan] Simplify branch-cond with getVectorTripCount (#155604)
Call getVectorTripCount first, and call getTripCount failing that, in
simplifyBranchConditionForVFAndUF, to simplify missed cases. While at
it, strip the dead check for a zero TC.
2025-11-10 10:43:37 +00:00
Florian Hahn
17ad8480f8
[VPlan] Convert redundant isSingleScalar check into assert (NFC).
Follow-up to post-commit suggestion in
https://github.com/llvm/llvm-project/pull/165506.

C must be a single-scalar, so turn the check into an assert.
2025-11-07 20:04:25 +00:00
Ramkumar Ramachandra
eab44600fb
[VPlan] Rename onlyFirst(Lane|Part)Used (NFC) (#166562)
Rename onlyFirst(Lane|Part)Used to usesFirst(Lane|Part)Only, in line
with usesScalars, for clarity.
2025-11-06 10:07:58 +00:00
Mel Chen
d1874047f5
[VPlan] Retrieve alignment from Load/StoreInst in constructors. nfc (#165722)
This patch removes the explicit Alignment parameter from
VPWidenLoadRecipe and VPWidenStoreRecipe constructors. Instead, these
recipes now directly retrieve the alignment from their
LoadInst/StoreInst.
2025-11-06 09:02:04 +00:00
Florian Hahn
9fc8ddd2c8
[VPlan] Move code narrowing ops feeding an interleave group to helper (NFCI)
Move and combine the code to narrow ops feeding interleave groups to a
single unified static helper. NFC, as legalization logic has not
changed.
2025-11-05 22:54:52 +00:00
Florian Hahn
b0b4616790
[VPlan] Handle single-scalar conds in VPWidenSelectRecipe. (#165506)
Generalize VPWidenSelectRecipe codegen to consider single-scalar
conditions instead of just loop-invariant ones.

If the condition is a single-scalar, we can simply use a scalar
condition.

PR: https://github.com/llvm/llvm-project/pull/165506
2025-11-05 22:11:29 +00:00
Ramkumar Ramachandra
1de55c9693
[VPlan] Avoid sinking allocas in sinkScalarOperands (#166135)
Use cannotHoistOrSinkRecipe to forbid sinking allocas.
2025-11-05 13:06:24 +00:00
Ramkumar Ramachandra
0a95a86634
[VPlan] Fix first-lane comment in sinkScalarOperands (NFC) (#166347)
To follow up on a post-commit review.
2025-11-04 12:02:58 +00:00
Ramkumar Ramachandra
0cae0af520
[VPlan] Shorten insert-idiom in sinkScalarOperands (NFC) (#166343)
To follow up on a post-commit review.
2025-11-04 10:04:57 +00:00
Mel Chen
40a042e49c
[VPlanTransform] Specialize simplifyRecipe for VPSingleDefRecipe pointer. nfc (#165568)
The function simplifyRecipe now takes a VPSingleDefRecipe pointer since
it only simplifies single-def recipes for now.
2025-11-03 09:00:54 +00:00
Luke Lau
97d4e96cc5
[VPlan] Perform optimizeMaskToEVL in terms of pattern matching (#155394)
Currently in optimizeMaskToEVL we convert every widened load, store or
reduction to a VP predicated recipe with EVL, regardless of whether or
not it uses the header mask.

So currently we have to be careful when working on other parts of VPlan to
make sure that the EVL transform doesn't break or transform something
incorrectly, because it's not a semantics preserving transform.
Forgetting to do so has caused miscompiles before, like the case that
was fixed in #113667

This PR rewrites it to work in terms of pattern matching, so it now only
converts a recipe to a VP predicated recipe if it is exactly masked with
the header mask.

After this the transform should be a true optimisation and not change
any semantics, so it shouldn't miscompile things if other parts of VPlan
change.

This fixes #152541, and allows us to move addExplicitVectorLength into
tryToBuildVPlanWithVPRecipes in #153144

It also splits out the load/store transforms into separate patterns for
reversed and non-reversed, which should make #146525 easier to implement
and reason about.
2025-11-03 16:53:18 +08:00
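The "only convert when masked exactly by the header mask" rule from the 97d4e96cc5 entry above can be pictured with a small Python toy model. This is only a sketch: `Recipe`, `HEADER_MASK` and `convert_to_evl` are hypothetical names invented for illustration and do not correspond to LLVM's C++ classes or pattern matchers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the loop's header mask value.
HEADER_MASK = "header-mask"


@dataclass(frozen=True)
class Recipe:
    kind: str                 # e.g. "load" or "store"
    mask: Optional[str]       # name of the mask operand, or None if unmasked
    uses_evl: bool = False    # True once rewritten to an EVL-predicated form


def convert_to_evl(recipe: Recipe) -> Recipe:
    """Rewrite a recipe to its EVL form only if its mask is the header mask."""
    if recipe.mask == HEADER_MASK:
        # The header mask is replaced by the explicit vector length.
        return Recipe(recipe.kind, mask=None, uses_evl=True)
    # Any other mask (or no mask) is left untouched in this model.
    return recipe
```

In this model a recipe masked by anything other than the header mask passes through unchanged, which is what makes the rewrite a pure optimisation rather than a semantics-changing step.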
Ramkumar Ramachandra
03eb3cdaaa
[VPlan] Rewrite sinkScalarOperands (NFC) (#151696)
Rewrite sinkScalarOperands in VPlanTransforms for clarity, in
preparation for follow-up work to extend it to handle more recipes.
2025-11-03 06:43:42 +00:00
Florian Hahn
1c727baf69
[VPlan] Mark BranchOnCount and BranchOnCond as having side effects (NFC)
BranchOnCount and BranchOnCond do not read memory, but cannot be moved.

Mark them as having side-effects, but not reading/writing memory, which
more accurately models the above. This allows removing some special
checking for branches both in the current code and future patches.
2025-11-02 21:14:37 +00:00
Florian Hahn
b7e922a3da
[VPlan] Convert BuildVector with all-equal values to Broadcast. (#165826)
Fold BuildVector where all operands are equal to Broadcast of the first
operand. This will subsequently make it easier to remove additional
buildvectors/broadcasts, e.g. via
https://github.com/llvm/llvm-project/pull/165506.

PR: https://github.com/llvm/llvm-project/pull/165826
2025-11-01 17:28:42 -07:00
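The fold described in the b7e922a3da entry above can be sketched in a few lines of Python. The `BuildVector` and `Broadcast` classes here are hypothetical stand-ins that borrow the names from the commit message; they are not the VPlan data structures.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class BuildVector:
    operands: Tuple[str, ...]   # one scalar operand per lane


@dataclass(frozen=True)
class Broadcast:
    operand: str                # the single scalar splatted to every lane


def fold_build_vector(node: BuildVector) -> Union[BuildVector, Broadcast]:
    """Replace a BuildVector whose operands are all equal by a Broadcast."""
    ops = node.operands
    if ops and all(op == ops[0] for op in ops):
        return Broadcast(ops[0])
    return node
```

For example, `fold_build_vector(BuildVector(("x", "x", "x", "x")))` yields `Broadcast("x")`, while a node with differing operands is returned as-is.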
Florian Hahn
6e83937f39
[VPlan] Add getConstantInt helpers for constant int creation (NFC).
Add getConstantInt helper methods to VPlan to simplify the common
pattern of creating constant integer live-ins.

Suggested as follow-up in
https://github.com/llvm/llvm-project/pull/164127.
2025-11-01 04:13:01 +00:00
Florian Hahn
a943132761
[VPlan] Add VPRegionBlock::getCanonicalIVType (NFC). (#164127)
Split off from https://github.com/llvm/llvm-project/pull/156262.

Similar to VPRegionBlock::getCanonicalIV, add helper to get the type of
the canonical IV, in preparation for removing VPCanonicalIVPHIRecipe.

PR: https://github.com/llvm/llvm-project/pull/164127
2025-10-31 20:05:02 -07:00
Florian Hahn
317b42ef5c
[VPlan] Remove original recipe after narrowing to single-scalar.
Directly remove RepOrWidenR after replacing all uses. Removing the dead
user early unlocks additional opportunities for further narrowing.
2025-10-31 04:38:16 +00:00
Florian Hahn
98d3a25f74
[VPlan] Don't preserve LCSSA in expandSCEVs. (#165505)
This follows similar reasoning as 45ce88758d24
(https://github.com/llvm/llvm-project/pull/159556):

LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.

LV creates SCEV and memory runtime checks early on and then disconnects
the blocks temporarily. The patch fixes a mis-compile, where previously
LCSSA construction during SCEV expand may replace uses in currently
unreachable SCEV/memory check blocks.

Fixes https://github.com/llvm/llvm-project/issues/162512

PR: https://github.com/llvm/llvm-project/pull/165505
2025-10-29 18:25:46 +00:00
Sam Tebbs
22f860a55d
[LV] Bundle (partial) reductions with a mul of a constant (#162503)
A reduction (including partial reductions) with a multiply of a constant
value can be bundled by first converting it from `reduce.add(mul(ext,
const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to
extend the constant.

This PR adds such bundling by first truncating the constant to the
source type of the other extend, then extending it to the destination
type of the extend. The first truncate is necessary so that the types of
each extend's operand are then the same, and the call to
canConstantBeExtended proves that the extend following a truncate is
safe to do. The truncate is removed by optimisations.

This is a stacked PR, 1a and 1b can be merged in any order:
1a. https://github.com/llvm/llvm-project/pull/147302
1b. https://github.com/llvm/llvm-project/pull/163175
2. -> https://github.com/llvm/llvm-project/pull/162503
2025-10-28 16:59:53 +00:00
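The truncate-then-extend step from the 22f860a55d entry above can be illustrated with plain Python integers. This is a minimal sketch under stated assumptions: the name `can_constant_be_extended` is borrowed from the commit message, but the signature and logic below are invented for illustration and are not LLVM's implementation.

```python
def truncate(value: int, bits: int) -> int:
    """Keep only the low `bits` bits of value, as an unsigned bit pattern."""
    return value & ((1 << bits) - 1)


def extend(pattern: int, src_bits: int, signed: bool) -> int:
    """Sign- or zero-extend an unsigned `src_bits`-wide bit pattern."""
    sign_bit_set = (pattern >> (src_bits - 1)) & 1
    if signed and sign_bit_set:
        return pattern - (1 << src_bits)
    return pattern


def can_constant_be_extended(const: int, src_bits: int, signed: bool) -> bool:
    """Hypothetical check: truncating to the source width and extending
    again must reproduce the original constant."""
    return extend(truncate(const, src_bits), src_bits, signed) == const
```

Under this model, 200 survives a round trip through 8 bits with zero-extension but not with sign-extension, so only the former case would allow rewriting `mul(ext, const)` as `mul(ext, ext(const))`.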
Ramkumar Ramachandra
a2d873fb87
[VPlan] Introduce cannotHoistOrSinkRecipe, fix miscompile (#162674)
Factor out common code to determine legality of hoisting and sinking.
The patch has the side-effect of fixing an underlying bug, where a
load/store pair is reordered.
2025-10-28 09:36:17 +00:00
Mel Chen
6bf948999f
[VPlan] Store memory alignment in VPWidenMemoryRecipe. nfc (#165255)
Add a member Alignment to VPWidenMemoryRecipe to store memory alignment
directly in the recipe. Update constructors, clone(), and relevant
methods to use this stored alignment instead of querying the IR
instruction. This allows VPWidenLoadRecipe/VPWidenStoreRecipe to be
constructed without relying on the original IR instruction in the
future.
2025-10-28 15:29:35 +08:00
Ramkumar Ramachandra
2c6c2689c5
[VPlan] Extend tryToFoldLiveIns to fold binary intrinsics (#161703)
InstSimplifyFolder can fold binary intrinsics, so take the opportunity
to unify code with getOpcodeOrIntrinsicID, and handle the case. The
additional handling of WidenGEP is non-functional, as the GEP is
simplified before it is widened, as the included test shows.
2025-10-24 10:21:39 +00:00
Florian Hahn
301fa24671
[VPlan] Limit narrowInterleaveGroups to single block regions for now.
Currently only regions with a single block are supported by the legality
checks.
2025-10-23 23:55:59 +01:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Florian Hahn
bfc322dd72
Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.

Revert while I investigate.
2025-10-22 21:27:11 +01:00
Florian Hahn
aca53f4375
[VPlan] Skip masked interleave groups in narrowInterleaveGroups.
8d29d09309 exposed a crash due to incorrectly trying to handle masked
interleave recipes. For now, the current code does not support masked
interleave recipes. Bail out for them.
2025-10-22 14:10:01 +01:00
Florian Hahn
82b59345fe
[VPlan] Clarify naming for helpers to create loop&replicate regions (NFC)
Split off to clarify naming, as suggested in
https://github.com/llvm/llvm-project/pull/156262.
2025-10-21 20:41:54 +01:00
Florian Hahn
8d29d09309
[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)
Move narrowInterleaveGroups to the general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows taking the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2025-10-21 11:37:42 +01:00
Ramkumar Ramachandra
3fbae10faa
[VPlan] Improve code using m_APInt (NFC) (#161683)
2025-10-21 10:27:03 +01:00
Ramkumar Ramachandra
cc850b830c
[VPlan] Use VPlan::getRegion to shorten code (NFC) (#164287)
2025-10-21 10:25:07 +01:00
Florian Hahn
b4dbb1cdc4
[VPlan] Be more careful with CSE in replicate regions. (#162110)
Recipes in replicate regions implicitly depend on the region's
predicate. Limit CSE to recipes in the same block, when either recipe is
in a replicate region.

This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE
on recipes inside a replicate region, we may end up with 2
VPPredInstPHIRecipes sharing the same operand. This is incompatible with
current VPPredInstPHIRecipe codegen, which re-sets the current value of
its operand in VPTransformState. This can cause crashes in the added
test cases.

Note that this patch only modifies ::isEqual to check for replicating
regions and not getHash, as CSE across replicating regions should be
uncommon.

Fixes https://github.com/llvm/llvm-project/issues/157314.
Fixes https://github.com/llvm/llvm-project/issues/161974.

PR: https://github.com/llvm/llvm-project/pull/162110
2025-10-20 10:53:47 +00:00
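The equality restriction from the b4dbb1cdc4 entry above can be modelled in Python as follows. This is a sketch only: the `Recipe` fields and the `is_equal_for_cse` helper are hypothetical and merely mirror the rule stated in the commit message, namely that recipes inside a replicate region may only be merged with recipes in the same block.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Recipe:
    opcode: str
    operands: Tuple[str, ...]
    block: str                  # name of the containing basic block
    in_replicate_region: bool   # True if the block sits in a replicate region


def is_equal_for_cse(a: Recipe, b: Recipe) -> bool:
    """Two recipes are CSE candidates if they compute the same thing and,
    when either lives in a replicate region, share the same block."""
    if (a.opcode, a.operands) != (b.opcode, b.operands):
        return False
    if a.in_replicate_region or b.in_replicate_region:
        return a.block == b.block
    return True
```

Note that only the equality check is restricted here; a hash function over `(opcode, operands)` would stay unchanged, matching the commit's remark that `getHash` is not modified.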
Ramkumar Ramachandra
086666de83
[VPlan] Improve code using drop_begin, append_range (NFC) (#163934)
2025-10-20 09:07:18 +01:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalar values from vectors.

Test changes are movements of the extracts: they are now generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Florian Hahn
8769119027
[VPlan] Add VPRecipeBase::getRegion helper (NFC).
Multiple places retrieve the region for a recipe. Add a helper to make
the code more compact and clearer.
2025-10-18 21:25:34 +01:00
Ramkumar Ramachandra
b71515cc76
[VPlan] Extend licm to hoist assumes (#162636)
Assumes are safe to hoist if they're guaranteed to execute, since they
don't alias, and don't throw. This mirrors what the IR-LICM does.
2025-10-16 13:59:32 +00:00
Ramkumar Ramachandra
8f04f074c9
[VPlan] Clarify legality check in licm (NFC) (#162486)
Recipes in licm are safe to hoist if the legality check passes, and the
recipe is guaranteed to execute; the single successor of the vector
preheader is the vector loop region. Clarify this in the code structure
and comments.
2025-10-16 12:36:39 +01:00
Florian Hahn
4f23767852
[VPlan] Add m_FirstActiveLane matcher (NFC).
Add m_FirstActiveLane, to slightly simplify pattern matching in
preparation for https://github.com/llvm/llvm-project/pull/149042.
2025-10-15 18:55:26 +01:00
Florian Hahn
7f54fccc0e
[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056)
When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.

Fixes https://github.com/llvm/llvm-project/issues/162498.

PR: https://github.com/llvm/llvm-project/pull/163056
2025-10-15 13:46:09 +01:00
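The difference between the two extracts in the 7f54fccc0e entry above can be shown with a toy Python model in which an unrolled vector value is a list of parts and each part is a list of lanes. The function names below echo the opcode names from the commit message but are hypothetical helpers, not LLVM APIs.

```python
from typing import List


def extract_last_element(parts: List[List[int]]) -> int:
    """Last lane of the last part: a single value for all parts."""
    return parts[-1][-1]


def extract_last_lane_per_part(parts: List[List[int]]) -> List[int]:
    """Last lane of every part: one value per unrolled part."""
    return [part[-1] for part in parts]


# Two parts (UF = 2) of four lanes each (VF = 4). The value is uniform
# within each part but differs between parts.
parts = [[10, 10, 10, 10], [20, 20, 20, 20]]
```

With this input, `extract_last_element(parts)` gives 20 for both parts, whereas `extract_last_lane_per_part(parts)` gives `[10, 20]`, preserving the per-part value that a store with a uniform-per-part address needs.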
Florian Hahn
861519327a
[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-15 12:48:35 +01:00
Florian Hahn
9bb0eedb59
[VPlan] Assign custom opcodes to recipes not mapping to IR opcodes. (#162267)
We can perform CSE on recipes that do not directly map to Instruction
opcodes. One example is VPVectorPointerRecipe. This is currently handled
by supporting them in ::canHandle, but that means that we return
std::nullopt from getOpcodeOrIntrinsicID() for it. This only works
because the only case where we return std::nullopt and perform CSE
is VPVectorPointerRecipe. But that does not work if we support more such
recipes, like VPPredInstPHIRecipe
(https://github.com/llvm/llvm-project/pull/162110).

To fix this, return a custom opcode from getOpcodeOrIntrinsicID for
recipes like VPVectorPointerRecipe, using the VPDefID after all regular
instruction opcodes.

PR: https://github.com/llvm/llvm-project/pull/162267
2025-10-13 11:16:14 +01:00
Ramkumar Ramachandra
946238e748
[VPlan] Strip VPDT's default constructor (NFC) (#162692)
2025-10-13 10:16:05 +00:00
Ramkumar Ramachandra
869c76dda3
[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721)
2025-10-13 08:50:09 +01:00
Florian Hahn
4bf5ab4f9d
[VPlan] Set flags when constructing truncs using VPWidenCastRecipe.
VPWidenCastRecipes with Trunc opcodes were missing the correct OpType
for IR flags. Update createWidenCast to set the correct flags for
truncs, and use it consistently.

Fixes https://github.com/llvm/llvm-project/issues/162374.
2025-10-12 14:01:12 +01:00
Florian Hahn
4b8cac2bcc
[VPlan] Don't reset canonical IV start value. (#161589)
Instead of re-setting the start value of the canonical IV when
vectorizing the epilogue we can emit an Add VPInstruction to provide
canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by
loop regions. It ensures that the canonical IV always starts at 0.

PR: https://github.com/llvm/llvm-project/pull/161589
2025-10-11 22:19:05 +01:00
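The idea in the 4b8cac2bcc entry above, keeping the canonical IV starting at 0 and adding the resume value explicitly, can be sketched with a small Python loop. The function and parameter names are hypothetical and the loop is a simplified scalar model of an epilogue vector loop, not the actual VPlan transformation.

```python
from typing import List


def epilogue_indices(resume_value: int, trip_count: int, vf: int) -> List[int]:
    """Return the first scalar index covered by each epilogue vector
    iteration, with the canonical IV always starting at zero."""
    canonical_iv = 0
    starts = []
    while resume_value + canonical_iv + vf <= trip_count:
        # Explicit add: canonical IV adjusted by the main loop's resume value.
        starts.append(resume_value + canonical_iv)
        canonical_iv += vf
    return starts
```

For a main loop that stopped at index 16, a trip count of 28 and an epilogue VF of 4, the model visits indices 16, 20 and 24 while the canonical IV itself counts 0, 4, 8.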
Ramkumar Ramachandra
107940f3be
[VPlan] Improve binary matchers in two places (NFC) (#162268)
2025-10-07 14:56:43 +01:00
Ramkumar Ramachandra
f7f49ee40e
[VPlan] Improve code around WidenPHI's constructor (NFC) (#162277)
2025-10-07 14:56:20 +01:00