510 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
2c6c2689c5
[VPlan] Extend tryToFoldLiveIns to fold binary intrinsics (#161703)
InstSimplifyFolder can fold binary intrinsics, so take the opportunity
to unify code with getOpcodeOrIntrinsicID, and handle the case. The
additional handling of WidenGEP is non-functional, as the GEP is
simplified before it is widened, as the included test shows.
2025-10-24 10:21:39 +00:00
Florian Hahn
301fa24671 [VPlan] Limit narrowInterleaveGroups to single block regions for now.
Currently only regions with a single block are supported by the legality
checks.
2025-10-23 23:55:59 +01:00
Sam Tebbs
6b19a546aa
[LV] Bundle partial reductions inside VPExpressionRecipe (#147302)
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. -> https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. https://github.com/llvm/llvm-project/pull/147513
2025-10-23 11:18:55 +00:00
Florian Hahn
bfc322dd72
Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.

Revert while I investigate.
2025-10-22 21:27:11 +01:00
Florian Hahn
aca53f4375
[VPlan] Skip masked interleave groups in narrowInterleaveGroups.
8d29d09309 exposed a crash due to incorrectly trying to handle masked
interleave recipes. For now, the current code does not support masked
interleave recipes. Bail out for them.
2025-10-22 14:10:01 +01:00
Florian Hahn
82b59345fe
[VPlan] Clarify naming for helpers to create loop&replicate regions (NFC)
Split off to clarify naming, as suggested in
https://github.com/llvm/llvm-project/pull/156262.
2025-10-21 20:41:54 +01:00
Florian Hahn
8d29d09309
[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)
Move narrowInterleaveGroups to to general VPlan optimization stage.

To do so, narrowInterleaveGroups now has to find a suitable VF where all
interleave groups are consecutive and saturate the full vector width.

If such a VF is found, the original VPlan is split into 2:
 a) a new clone which contains all VFs of Plan, except VFToOptimize, and
 b) the original Plan with VFToOptimize as single VF.

The original Plan is then optimized. If a new copy for the other VFs has
been created, it is returned and the caller has to add it to the list of
candidate plans.

Together with https://github.com/llvm/llvm-project/pull/149702, this
allows to take the narrowed interleave groups into account when
computing costs to choose the best VF and interleave count.

One example where we currently miss interleaving/unrolling when
narrowing interleave groups is https://godbolt.org/z/Yz77zbacz

PR: https://github.com/llvm/llvm-project/pull/149706
2025-10-21 11:37:42 +01:00
Ramkumar Ramachandra
3fbae10faa
[VPlan] Improve code using m_APInt (NFC) (#161683) 2025-10-21 10:27:03 +01:00
Ramkumar Ramachandra
cc850b830c
[VPlan] Use VPlan::getRegion to shorten code (NFC) (#164287) 2025-10-21 10:25:07 +01:00
Florian Hahn
b4dbb1cdc4
[VPlan] Be more careful with CSE in replicate regions. (#162110)
Recipes in replicate regions implicitly depend on the region's
predicate. Limit CSE to recipes in the same block, when either recipe is
in a replicate region.

This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE
on recipes inside a replicate region, we may end up with 2
VPPredInstPHIRecipes sharing the same operand. This is incompatible with
current VPPredInstPHIRecipe codegen, which re-sets the current value of
its operand in VPTransformState. This can cause crashes in the added
test cases.

Note that this patch only modifies ::isEqual to check for replicating
regions and not getHash, as CSE across replicating regions should be
uncommon.

Fixes https://github.com/llvm/llvm-project/issues/157314. 
Fixes https://github.com/llvm/llvm-project/issues/161974.

PR: https://github.com/llvm/llvm-project/pull/162110
2025-10-20 10:53:47 +00:00
Ramkumar Ramachandra
086666de83
[VPlan] Improve code using drop_begin, append_range (NFC) (#163934) 2025-10-20 09:07:18 +01:00
Florian Hahn
b9ce7656e9
[VPlan] Add VPInstruction to unpack vector values to scalars. (#155670)
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalars values from vectors.

Test changes are movements of the extracts: they are no generated
together and also directly after the producer.

Depends on https://github.com/llvm/llvm-project/pull/155102 (included in
PR)

PR: https://github.com/llvm/llvm-project/pull/155670
2025-10-19 18:49:05 +00:00
Florian Hahn
8769119027
[VPlan] Add VPRecipeBase::getRegion helper (NFC).
Multiple places retrieve the region for a recipe. Add a helper to make
the code more compact and clearer.
2025-10-18 21:25:34 +01:00
Ramkumar Ramachandra
b71515cc76
[VPlan] Extend licm to hoist assumes (#162636)
Assumes are safe to hoist if they're guaranteed to execute, since they
don't alias, and don't throw. This mirrors what the IR-LICM does.
2025-10-16 13:59:32 +00:00
Ramkumar Ramachandra
8f04f074c9
[VPlan] Clarify legality check in licm (NFC) (#162486)
Recipes in licm are safe to hoist if the legality check passes, and the
recipe is guaranteed to execute; the single successor of the vector
preheader is the vector loop region. Clarify this in the code structure
and comments.
2025-10-16 12:36:39 +01:00
Florian Hahn
4f23767852
[VPlan] Add m_FirstActiveLane matcher (NFC).
Add m_FirstActiveLane, to slightly simplify pattern matching in
preparation for https://github.com/llvm/llvm-project/pull/149042.
2025-10-15 18:55:26 +01:00
Florian Hahn
7f54fccc0e
[VPlan] Add ExtractLastLanePerPart, use in narrowToSingleScalar. (#163056)
When narrowing stores of a single-scalar, we currently use
ExtractLastElement, which extracts the last element across all parts.
This is not correct if the store's address is not uniform across all
parts. If it is only uniform-per-part, the last lane per part must be
extracted. Add a new ExtractLastLanePerPart opcode to handle this
correctly. Most transforms apply to both ExtractLastElement and
ExtractLastLanePerPart, with the only difference being their treatment
during unrolling.

Fixes https://github.com/llvm/llvm-project/issues/162498.

PR: https://github.com/llvm/llvm-project/pull/163056
2025-10-15 13:46:09 +01:00
Florian Hahn
861519327a
[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.

PR: https://github.com/llvm/llvm-project/pull/163020
2025-10-15 12:48:35 +01:00
Florian Hahn
9bb0eedb59
[VPlan] Assign custom opcodes to recipes not mapping to IR opcodes. (#162267)
We can perform CSE on recipes that do not directly map to Instruction
opcodes. One example is VPVectorPointerRecipe. Currently this is handled
by supporting them in ::canHandle, but currently that means that we
return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently
only works, because the only case we return std::nullopt and perform CSE
is VPVectorPointerRecipe. But that does not work if we support more such
recipes, like VPPredInstPHIRecipe
(https://github.com/llvm/llvm-project/pull/162110).

To fix this, return a custom opcode from getOpcodeOrIntrinsicID for
recipes like VPVectorPointerRecipe, using the VPDefID after all regular
instruction opcodes.

PR: https://github.com/llvm/llvm-project/pull/162267
2025-10-13 11:16:14 +01:00
Ramkumar Ramachandra
946238e748
[VPlan] Strip VPDT's default constructor (NFC) (#162692) 2025-10-13 10:16:05 +00:00
Ramkumar Ramachandra
869c76dda3
[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721) 2025-10-13 08:50:09 +01:00
Florian Hahn
4bf5ab4f9d
[VPlan] Set flags when constructing truncs using VPWidenCastRecipe.
VPWidenCastRecipes with Trunc opcodes where missing the correct OpType
for IR flags. Update createWidenCast to set the correct flags for
truncs, and use it consistenly.

Fixes https://github.com/llvm/llvm-project/issues/162374.
2025-10-12 14:01:12 +01:00
Florian Hahn
4b8cac2bcc
[VPlan] Don't reset canonical IV start value. (#161589)
Instead of re-setting the start value of the canonical IV when
vectorizing the epilogue we can emit an Add VPInstruction to provide
canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by
loop regions. It ensures that the canonical IV always starts at 0.

PR: https://github.com/llvm/llvm-project/pull/161589
2025-10-11 22:19:05 +01:00
Ramkumar Ramachandra
107940f3be
[VPlan] Improve binary matchers in two places (NFC) (#162268) 2025-10-07 14:56:43 +01:00
Ramkumar Ramachandra
f7f49ee40e
[VPlan] Improve code around WidenPHI's constructor (NFC) (#162277) 2025-10-07 14:56:20 +01:00
Ramkumar Ramachandra
93073af121
[LV] Move 3 functions into VPlanTransforms (NFC) (#158644)
Two of them are actually transforms, and the third is a dependent
static.
2025-10-06 11:13:01 +01:00
Ramkumar Ramachandra
ff4aec5d3c
[VPlan] Deref VPlanPtr when passing to transform (NFC) (#161369)
For uniformity with other transforms.
2025-10-03 09:47:35 +01:00
Ramkumar Ramachandra
1f225676f4
[VPlan] Improve code using VPlan::getFalse (NFC) (#161681) 2025-10-02 19:21:27 +01:00
Ramkumar Ramachandra
5843ffb149
[VPlan] Improve code using m_One (NFC) (#161686) 2025-10-02 18:14:43 +01:00
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Ramkumar Ramachandra
280abaf9da
[VPlan] Handle scalar-VF in transforms (NFC) (#161365) 2025-09-30 19:35:12 +01:00
Sam Tebbs
88658dbbc5
[LV] Add ExtNegatedMulAccReduction expression type (#160154)
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. https://github.com/llvm/llvm-project/pull/156976
2. -> https://github.com/llvm/llvm-project/pull/160154
3. https://github.com/llvm/llvm-project/pull/147302
2025-09-30 10:10:37 +01:00
Florian Hahn
71be13a6f0
[VPlan] Rewrite VPExpandSCEVExprs in replaceSymbolicStrides.
Extend replaceSymbolicStrides to also replace SCEVUnknowns in
VPExpandSCEVExprs using the information from StridesMaps.

This results in simpler SCEV expansions in some cases.
2025-09-28 21:55:31 +01:00
Florian Hahn
70a26da639
[VPlan] Set correct flags when creating and cloning VPWidenCastRecipe.
Make sure that we set the correct wrap flags when creating new
VPWidenCastRecipes for truncs and preserve the flags from the recipe
directly when cloning, to make sure they are not dropped.

Fixes https://github.com/llvm/llvm-project/issues/160396
2025-09-25 09:00:47 +01:00
Ramkumar Ramachandra
019913e4fa
[VPlan] Add WidenGEP::getSourceElementType (NFC) (#159029) 2025-09-22 10:02:08 +01:00
Ramkumar Ramachandra
b716d35388
[VPlanPatternMatch] Introduce m_ConstantInt (#159558) 2025-09-21 13:27:46 +01:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Ramkumar Ramachandra
0384f6c9db
[VPlanPatternMatch] Introduce match functor (NFC) (#159521)
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to
introduce the VPlanPatternMatch version of the match functor to shorten
some idioms.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-18 10:36:12 +01:00
Ramkumar Ramachandra
46fcece2a8
[VPlan] Extend CSE to eliminate GEPs (#156699)
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.

While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-16 10:14:32 +00:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419) 2025-09-15 10:34:06 +00:00
Florian Hahn
ef7e03a2d1
[VPlan] Limit ExtractLastElem fold to recipes guaranteed single-scalar.
vputils::isSingleScalar(A) may return true to recipes that produce only
a single scalar value, but they could still end up as vector
instruction, because the recipe could not be converted to a
single-scalar VPInstruction/VPReplicateRecipe.

For now, only apply the fold for recipes guaranteed to produce a single
value, i.e. single-scalar VPInstructions and VPReplicateRecipes.

Fixes https://github.com/llvm/llvm-project/issues/158319.
2025-09-13 18:15:38 +01:00
Florian Hahn
b8eaceb39b
[VPlan] Explicitly replicate VPInstructions by VF. (#155102)
Extend replicateByVF added in #142433 (aa240293190) to also explicitly
unroll replicating VPInstructions.

Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.

PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-12 17:06:26 +01:00
Florian Hahn
1efa997317
[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars.
Move handling of stores to single-scalar/uniform address from
replicateByVF to narrowToSingleScalar.
2025-09-10 21:58:29 +01:00
Florian Hahn
055e4ff35a
[VPlan] Don't narrow op multiple times in narrowInterleaveGroups.
Track which ops already have been narrowed, to avoid narrowing the same
operation multiple times. Repeated narrowing will lead to incorrect
results, because we could first narrow from an interleave group -> wide
load, and then narrow the wide load > single-scalar load.

Fixes thttps://github.com/llvm/llvm-project/issues/156190.
2025-09-10 19:22:42 +01:00
Florian Hahn
c3e76b2770
[VPlan] Keep common flags during CSE. (#157664)
During CSE, we don't have to drop all poison-generating flags on
mis-match, we can keep the ones common on both recipes.

PR: https://github.com/llvm/llvm-project/pull/157664
2025-09-10 10:20:48 +00:00
Stephen Tozer
d4f7995488
[VPlan] Use Unknown instead of empty location in VPlanTransforms (#157702)
The default values for DebugLocs in LoopVectorizer/VPlan were recently
updated from empty DebugLocs to DebugLoc::getUnknown, as part of the
DebugLoc Coverage Tracking work. However, there are some cases where we
also pass an explicit empty DebugLoc, in many cases as a filler
argument. This patch updates all of these to `getUnknown` for now, until
either valid locations or a suitable categorization can be assigned to
each instead.

This change is NFC outside of DebugLoc coverage tracking builds.
2025-09-10 10:33:58 +01:00
Mel Chen
4d9a7fa9ba
[VPlan] Remove dead recipes before simplifying blends (#157622)
In simplifyBlends, when normalizing a blend recipe, the first mask that
is used only by the blend and is not all-false is chosen, and its
corresponding incoming value becomes the initial value, with the others
blended into it. At the same time, the mask that is chosen can be
eliminated. However, a multi-user mask might be used by a dead recipe,
which prevents this optimization. This patch moves removeDeadRecipes
before simplifyBlends to eliminate dead recipes, allowing simplifyBlends
to remove more dead masks.
2025-09-10 08:03:18 +00:00
Florian Hahn
c4b17bf9ed
[VPlan] Slightly extend ExtractLastElement fold to single-scalars.
Update ExtractLastElement fold to support single scalar recipes, if all
their users only use scalars.
2025-09-09 22:08:08 +01:00
Florian Hahn
132bacde22
[VPlan] Also allow extracts as users when converting to single scalars.
Extracts technically do not use scalars, but vectors, but if the operand
is a single scalar we do not need a vector and they should not block
forming single scalars.
2025-09-08 22:11:39 +01:00
Luke Lau
3f9e0736ac
[VPlan] Move findCommonEdgeMask optimization to simplifyBlends (#156304)
Following up from #150368, this moves folding common edge masks into
simplifyBlends.

One test in uniform-blend.ll ended up regressing but after looking at it
closely, it came from a weird (x && !x) edge mask. So I've just included
a simplifcation in this PR to fold that to false.
2025-09-05 01:29:22 +00:00