6554 Commits

Author SHA1 Message Date
Florian Hahn
0c028bbf33
[LV] Always add uniform pointers to uniforms list.
Always add pointers proved to be uniform via legal/SCEV to worklist.
This extends the existing logic to handle a few more pointers known to
be uniform.
2025-09-18 22:56:19 +01:00
Alexey Bataev
8c41859a21 [SLP]Clear the operands deps of non-schedulable nodes, if previously all operands were copyable
If all operands of the non-schedulable nodes were previously only
copyables, need to clear the dependencies of the original schedule data
for such copyable operands and recalculate them to correctly handle
  number of dependecies.

Fixes #159406
2025-09-18 12:11:33 -07:00
Ramkumar Ramachandra
f1ba44f50a
[VPlan] Strip dead code in cst live-in match (NFC) (#159589)
A live-in constant can never be of vector type.
2025-09-18 19:28:42 +01:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Graham Hunter
6b99a7bbed
[LV] Provide utility routine to find uncounted exit recipes (#152530)
Splitting out just the recipe finding code from #148626 into a utility
function (along with the extra pattern matchers). Hopefully this makes
reviewing a bit easier.

Added a gtest, since this isn't actually used anywhere yet.
2025-09-18 15:45:23 +00:00
Ramkumar Ramachandra
f68f3b9a7e
[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550) 2025-09-18 12:40:31 +01:00
Ramkumar Ramachandra
0384f6c9db
[VPlanPatternMatch] Introduce match functor (NFC) (#159521)
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to
introduce the VPlanPatternMatch version of the match functor to shorten
some idioms.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-18 10:36:12 +01:00
Ramkumar Ramachandra
7fb3a91418
[PatternMatch] Introduce match functor (NFC) (#159386)
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-17 21:04:33 +01:00
Piotr Fusik
2ce04d0a41
[SLP][NFC] Refactor a long if into an early return (#156410) 2025-09-17 18:31:46 +02:00
Hassnaa Hamdi
e8aa0b688a
[LV]: Ensure fairness when selecting epilogue VF. (#155547)
Consider IC when deciding if epilogue profitable for scalable vectors,
same as fixed-width vectors.
2025-09-17 14:48:10 +01:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Mikhail Gudim
66a8f47066
[SLPVectorizer][NFC] Save stride in a map. (#157706)
In order to avoid recalculating stride of strided load twice save it in
a map.
2025-09-16 09:02:09 -04:00
Ramkumar Ramachandra
46fcece2a8
[VPlan] Extend CSE to eliminate GEPs (#156699)
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.

While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.

Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-16 10:14:32 +00:00
Florian Hahn
1858532c48
[VPlan] Handle predicated UDiv in VPReplicateRecipe::computeCost.
Account for predicated UDiv,SDiv,URem,SRem in
VPReplicateRecipe::computeCost: compute costs of extra phis and apply
getPredBlockCostDivisor.

Fixes https://github.com/llvm/llvm-project/issues/158660
2025-09-15 21:46:50 +01:00
Florian Hahn
4949cb4a5e
[VPlan] Track VPValues instead of VPRecipes in calculateRegisterUsage. (#155301)
Update calculateRegisterUsageForPlan to track live-ness of VPValues
instead of recipes. This gives slightly more accurate results for
recipes that define multiple values (i.e. VPInterleaveRecipe).

When tracking the live-ness of recipes, all VPValues defined by an
VPInterleaveRecipe are considered alive until the last use of any of
them. When tracking the live-ness of individual VPValues, we can
accurately track the individual values until their last use.

Note the changes in large-loop-rdx.ll and pr47437.ll. This patch
restores the original behavior before introducing VPlan-based liveness
tracking.

PR: https://github.com/llvm/llvm-project/pull/155301
2025-09-15 20:55:11 +01:00
Alexey Bataev
f2301be0e8 [SLP]Add a check if the user itself is commutable
If the commutable instruction can be represented as a non-commutable
vector instruction (like add 0, %v can be represented as a part of sub
nodes with operation sub %v, 0), its operands might still be reordered
and this should be accounted when checking for copyables in operands

Fixes #158293
2025-09-15 12:50:03 -07:00
Ramkumar Ramachandra
d012642be1
[VPlan] Match more GEP-like in m_GetElementPtr (#158019)
The m_GetElementPtr matcher is incorrect and incomplete. Fix it to match
all possible GEPs to avoid misleading users. It currently just has one
use, and the change is non-functional for that use.
2025-09-15 20:06:37 +01:00
Ramkumar Ramachandra
148a83543b
[LV] Introduce m_One and improve (0|1)-match (NFC) (#157419) 2025-09-15 10:34:06 +00:00
Uyiosa Iyekekpolor
994a6a39e1
[VectorCombine] Fix scalarizeExtExtract for big-endian (#157962)
The scalarizeExtExtract transform assumed little-endian lane ordering,
causing miscompiles on big-endian targets such as AIX/PowerPC under -O3
-flto.

This patch updates the shift calculation to handle endianness correctly
for big-endian targets. No functional change
for little-endian targets.

Fixes #158197.

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-09-15 10:08:16 +00:00
Florian Hahn
fb60d0337c
[VPlan] Return non-option cost from getCostForRecipeWithOpcode (NFC).
getCostForRecipeWithOpcode must only be called with supported opcodes.
Directly return the cost, and add llvm_unreachable to catch unhandled
cases.
2025-09-14 22:24:57 +01:00
Mikhail Gudim
ee3a4f4c94
[SLPVectorizer] Test -1 stride loads. (#158358)
Add a test to generate -1 stride load and flags to force this behaviour.
2025-09-14 15:29:28 -04:00
Florian Hahn
91d4c0dfdf
Reapply "[VPlan] Compute cost of scalar (U|S)Div, (U|S)Rem in computeCost (NFCI)."
This reverts commit 9490d58fa92bb338db96af331194c9ba26eb0201.

Recommits de7e3a58952 with a fix for an unhandled case, causing crashes
in some configs.
2025-09-14 13:15:07 +01:00
Aiden Grossman
9490d58fa9 Revert "[VPlan] Compute cost of scalar (U|S)Div, (U|S)Rem in computeCost (NFCI)."
This reverts commit de7e3a589525179f3b02b84b194aac6cf581425c.

This broke quite a few upstream buildbots and premerge. Reverting for now to
get things back to green.

https://lab.llvm.org/buildbot/#/builders/137/builds/25467
2025-09-13 22:32:48 +00:00
Florian Hahn
de7e3a5895
[VPlan] Compute cost of scalar (U|S)Div, (U|S)Rem in computeCost (NFCI).
Directly compute the cost of UDiv, SDiv, URem, SRem in VPlan.
2025-09-13 22:09:06 +01:00
Florian Hahn
30e9cbacab
[VPlan] Move logic to compute scalarization overhead to cost helper(NFC)
Extract the logic to compute the scalarization overhead to a helper for
easy re-use in the future.
2025-09-13 20:41:44 +01:00
Florian Hahn
ef7e03a2d1
[VPlan] Limit ExtractLastElem fold to recipes guaranteed single-scalar.
vputils::isSingleScalar(A) may return true to recipes that produce only
a single scalar value, but they could still end up as vector
instruction, because the recipe could not be converted to a
single-scalar VPInstruction/VPReplicateRecipe.

For now, only apply the fold for recipes guaranteed to produce a single
value, i.e. single-scalar VPInstructions and VPReplicateRecipes.

Fixes https://github.com/llvm/llvm-project/issues/158319.
2025-09-13 18:15:38 +01:00
Florian Hahn
b8eaceb39b
[VPlan] Explicitly replicate VPInstructions by VF. (#155102)
Extend replicateByVF added in #142433 (aa240293190) to also explicitly
unroll replicating VPInstructions.

Now the only remaining case where we replicate for all lanes is
VPReplicateRecipes in replicate regions.

PR: https://github.com/llvm/llvm-project/pull/155102
2025-09-12 17:06:26 +01:00
Graham Hunter
54fc5367f6 [LV] Fix crash in uncountable exit with side effects checking
Fixes an ICE reported on PR #145663, as an assert was found to be
reachable with a specific combination of unreachable blocks.
2025-09-12 10:41:05 +00:00
Luke Lau
4bb250d6a3
[VPlan] Always consider register pressure on RISC-V (#156951)
Stacked on #156923 

In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because
whilst the largest element type is i8, we generate a bunch of pointer
vectors for gathers and scatters. This means the VF chosen is quite high
e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x
i64> m8 registers for the pointers.

This was briefly fixed by #132190 where we computed register pressure in
VPlan and used it to prune VFs that were likely to spill. The legacy
cost model wasn't able to do this pruning because it didn't have
visibility into the pointer vectors that were needed for the
gathers/scatters.

However VF pruning was restricted again to just the case when max
bandwidth was enabled in #141736 to avoid an AArch64 regression, and
restricted again in #149056 to only prune VFs that had max bandwidth
enabled.

On RISC-V we take advantage of register grouping for performance and
choose a default of LMUL 2, which means there are 16 registers to work
with – half the number as SVE, so we encounter higher register pressure
more frequently.

As such, we likely want to always consider pruning VFs with high
register pressure and not just the VFs from max bandwidth.

This adds a TTI hook to opt into this behaviour for RISC-V which fixes
the motivating godbolt example above. When last checked this
significantly reduces the number of spills on SPEC CPU 2017, up to
80% on 538.imagick_r.
2025-09-12 06:21:54 +00:00
Graham Hunter
e285602fda [LV] Enforce addrec in current loop for uncountable exit load address check
Addresses post-commit review raised for #145663
2025-09-11 11:18:22 +00:00
Hongyu Chen
c62ea6598e
[VectorCombine] Add Ext and Trunc support in foldBitOpOfCastConstant (#157822)
Follow-up of https://github.com/llvm/llvm-project/pull/155216.
This patch doesn't preserve the flags. I will implement it in the
follow-up patch.
2025-09-11 17:08:47 +08:00
Garth Lei
8a8a810506
[SLP][NFC] Remove unused local variable in lambda (#156835) 2025-09-11 02:05:55 +00:00
Elvis Wang
3e898bc40f
[LV] Fix cost misaligned when gather/scatter w/ addr is uniform. (#157387)
This patch fix the assertion when the `isUniform` (from legacy model)
and `isSingleScalar`(from Vplan-based model) mismatch.

The simplify test that cause assertion
```
loop:
  loadA = load %a  => %a is loop invariant.
  loadB = load %LoadA
  ...
```
In the legacy cost model, it cannot analysis that addr of `%loadB` is
uniform but in the Vplan-based cost model both addr in `%loadA` and
`loadB` is single scalar.

Full test caused crash: https://llvm.godbolt.org/z/zEG8YKjqh.

---------

Co-authored-by: Luke Lau <luke@igalia.com>
2025-09-11 07:49:54 +08:00
Florian Hahn
1efa997317
[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars.
Move handling of stores to single-scalar/uniform address from
replicateByVF to narrowToSingleScalar.
2025-09-10 21:58:29 +01:00
Florian Hahn
055e4ff35a
[VPlan] Don't narrow op multiple times in narrowInterleaveGroups.
Track which ops already have been narrowed, to avoid narrowing the same
operation multiple times. Repeated narrowing will lead to incorrect
results, because we could first narrow from an interleave group -> wide
load, and then narrow the wide load > single-scalar load.

Fixes thttps://github.com/llvm/llvm-project/issues/156190.
2025-09-10 19:22:42 +01:00
Alexey Bataev
0dddfab54c [SLP]Recalculate deps if the original instruction scheduled after being copyable
If the original instruction is going to be scheduled after same
instruction being scheduled as copyable, need to recalculate
dependencies. Otherwise, the dependencies maybe calculated incorrectly.
2025-09-10 10:18:45 -07:00
Graham Hunter
3c810b76b9
[LV] Add initial legality checks for early exit loops with side effects (#145663)
This adds initial support to LoopVectorizationLegality to analyze loops
with side effects (particularly stores to memory) and an uncountable
exit. This patch alone doesn't enable any new transformations, but
does give clearer reasons for rejecting vectorization for such a loop.

The intent is for a loop like the following to pass the specific checks,
and only be rejected at the end until the transformation code is
committed:

```
// Assume a is marked restrict
// Assume b is known to be large enough to access up to b[N-1]
for (int i = 0; i < N; ++) {
  a[i]++;
  if (b[i] > threshold)
    break;
}
```
2025-09-10 13:54:52 +01:00
Florian Hahn
c3e76b2770
[VPlan] Keep common flags during CSE. (#157664)
During CSE, we don't have to drop all poison-generating flags on
mis-match, we can keep the ones common on both recipes.

PR: https://github.com/llvm/llvm-project/pull/157664
2025-09-10 10:20:48 +00:00
Stephen Tozer
d4f7995488
[VPlan] Use Unknown instead of empty location in VPlanTransforms (#157702)
The default values for DebugLocs in LoopVectorizer/VPlan were recently
updated from empty DebugLocs to DebugLoc::getUnknown, as part of the
DebugLoc Coverage Tracking work. However, there are some cases where we
also pass an explicit empty DebugLoc, in many cases as a filler
argument. This patch updates all of these to `getUnknown` for now, until
either valid locations or a suitable categorization can be assigned to
each instead.

This change is NFC outside of DebugLoc coverage tracking builds.
2025-09-10 10:33:58 +01:00
Mel Chen
4d9a7fa9ba
[VPlan] Remove dead recipes before simplifying blends (#157622)
In simplifyBlends, when normalizing a blend recipe, the first mask that
is used only by the blend and is not all-false is chosen, and its
corresponding incoming value becomes the initial value, with the others
blended into it. At the same time, the mask that is chosen can be
eliminated. However, a multi-user mask might be used by a dead recipe,
which prevents this optimization. This patch moves removeDeadRecipes
before simplifyBlends to eliminate dead recipes, allowing simplifyBlends
to remove more dead masks.
2025-09-10 08:03:18 +00:00
Florian Hahn
c4b17bf9ed
[VPlan] Slightly extend ExtractLastElement fold to single-scalars.
Update ExtractLastElement fold to support single scalar recipes, if all
their users only use scalars.
2025-09-09 22:08:08 +01:00
Ramkumar Ramachandra
5544afd253
[LoopUtils] Simplify expanded RT-checks (#157518)
Follow up on 528b13d ([SCEVExp] Add helper to clean up dead instructions
after expansion.) to hoist the SCEVExapnder::eraseDeadInstructions call
from LoopVectorize into the LoopUtils APIs add[Diff]RuntimeChecks, so
that other callers (LoopDistribute and LoopVersioning) can benefit from
the patch.
2025-09-09 11:38:54 +00:00
Florian Hahn
9b1b93766d
Reapply "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9.

Recommit with with a fix for MSan failure (
https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a
set to track deleted values. Using the InsertedInstructions set is not
sufficient, as it use asserting value handles as keys, which may
dereference the value at construction.

Original message:

Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-09 09:47:41 +01:00
Florian Hahn
132bacde22
[VPlan] Also allow extracts as users when converting to single scalars.
Extracts technically do not use scalars, but vectors, but if the operand
is a single scalar we do not need a vector and they should not block
forming single scalars.
2025-09-08 22:11:39 +01:00
Alexey Bataev
d0ea176cce [SLP]Do not consider SExt/ZExt profitable for demotion, if the user is a bitcast to float
If the user node of the SExt/ZExt node is a bitcast to a float point
type, the node itself should not be considered legal to demote, since
still the casting is required to match the size of the float point type.

Fixes #157277
2025-09-08 07:59:01 -07:00
Florian Hahn
eeb43806eb
Revert "[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)"
This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f.

Triggers MSan errors in some configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/169/builds/14799
2025-09-08 14:52:28 +01:00
Hongyu Chen
75b0c89e62
[InstCombine][VectorCombine][NFC] Unify uses of lossless inverse cast (#156597)
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved(optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
2025-09-08 13:30:06 +00:00
Simon Pilgrim
ad3a0ae9e1
[VectorCombine] foldSelectShuffle - early-out cases where the max vector register width isn't large enough (#157430)
Technically this could happen with vector units that can't handle all legal scalar widths - but its good enough to use a generic crash test without a suitable target

Fixes #157335
2025-09-08 12:04:23 +00:00
Florian Hahn
528b13df57
[SCEVExp] Add helper to clean up dead instructions after expansion. (#157308)
Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.

Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205

PR: https://github.com/llvm/llvm-project/pull/157308
2025-09-08 10:53:20 +01:00
Luke Lau
fe6e178401
[VPlan] Don't build recipes for unconditional switches (#157323)
In #157322 we crash because we try to infer a type for a VPReplicate
switch recipe.

My understanding was that these switches should be removed by
VPlanPredicator, but this switch survived through it because it was
unconditional, i.e. had no cases other than the default case.

This fixes #157322 by not emitting any recipes for unconditional
switches to begin with, similar to how we treat unconditional branches.
2025-09-08 09:01:43 +00:00