5754 Commits

Author SHA1 Message Date
Florian Hahn
b24694371c
[VPlan] Add instantiation of VPUnrollPartAccessor<2> to fix link errors.
Speculative fix for missing definitions in some configs, including
 https://lab.llvm.org/buildbot/#/builders/159/builds/18760
2025-03-25 13:12:34 +00:00
Florian Hahn
dfca6c0d3b
[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655)
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.

PR: https://github.com/llvm/llvm-project/pull/123655
2025-03-25 12:57:24 +00:00
Han-Kuan Chen
6e66cfeeae
[SLP] Make getSameOpcode support interchangeable instructions. (#132887)
We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2025-03-25 19:46:15 +08:00
Alexey Bataev
8122bb9dbe [SLP]Fix a check for non-schedulable instructions
Need to fix a check for non-schedulable instructions in
getLastInstructionInBundle function, because this check may not work
correctly during the codegen. Instead, need to check that actually these
instructions were never scheduled, since the scheduling analysis always
performed before the codegen and is stable.

Fixes #132841
2025-03-25 04:35:33 -07:00
Han-Kuan Chen
2682a9433b
[SLP][REVEC] Add ExtractSubvector cost for ExternalUses. (#132761)
For llvm/test/Transforms/SLPVectorizer/revec-shufflevector.ll,
ScalarCost and ExtraCost is 1, so the original scalar will be kept.
2025-03-25 18:58:54 +08:00
Ramkumar Ramachandra
e8d882a95b
[LV] Audit and fix nits in cl::opts (NFC) (#130601)
Non-static cl::opts should be under the llvm namespace.
2025-03-25 10:19:45 +00:00
Martin Storsjö
b33bec9b21 Revert "[SLP] Make getSameOpcode support interchangeable instructions. (#127450)"
This reverts commit 71a0cfd93263552ddc0bfd2ea7b0abe9a578f87e.

This commit triggers failed asserts when compiling ffmpeg. The
issue is reproducible with a small standalone reproducer like this:

    void make_filters_from_proto(int *filter[][2], int bands) {
      int c, q, n;
      for (;; q++) {
        n = 0;
        for (; n < 7; n++) {
          int theta = (q * (n - 6) + (n >> 1) - 3) % bands;
          if (theta)
            c = theta;
          filter[q][n][0] = c;
        }
      }
    }

$ clang -target x86_64-linux-gnu -c repro.c -O3
clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:989: llvm::SmallVector<llvm
::Value*> {anonymous}::BinOpSameOpcodeHelper::InterchangeableInfo::getOperand(ll
vm::Instruction*) const: Assertion `FromCIValue.isZero() && "Cannot convert the
instruction."' failed.

The same issue also reproduces for a large number of other target
triples, aarch64-linux-gnu and others.
2025-03-25 10:22:44 +02:00
Martin Storsjö
dd059338a2 Revert "[Vectorize] Fix a warning"
This reverts commit 4c68061254c896214b7ad5ab807ac4ba11517812.

Reverting as part of a revert of a preceding commit.
2025-03-25 10:21:05 +02:00
Kazu Hirata
4c68061254 [Vectorize] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:855:52: error:
  unused variable 'SupportedOp' [-Werror,-Wunused-const-variable]
2025-03-24 17:38:47 -07:00
Han-Kuan Chen
71a0cfd932
[SLP] Make getSameOpcode support interchangeable instructions. (#127450)
We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2025-03-25 08:24:46 +08:00
Florian Hahn
5b38fb59df
[VPlan] Remove remaining references to VPScalarPHIRecipe (NFC).
VPScalarPHIRecipe has been replaced by VPInstructions with PHI opcodes.
Strip remaining dead references to VPScalarPHIRecipe.
2025-03-24 19:37:00 +00:00
Alexey Bataev
ad9909dd73
[SLP]Fix perfect diamond match with extractelements in scalars
Need to drop all previous estimations/vectorizations, when found
a perfect diamond match. This improves cost estimation and improves code
emission.
Also, need to adjust getScalarizationOverhead cost for non-poison input
vector. Currently, it does not allow to estimate it correctly, so
instead use conservative element-by-element insertelement cost for each
unique scalar.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/132466
2025-03-24 09:29:18 -04:00
Luke Lau
6a8606e99e
[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300)
VPReductionRecipes take a RecurrenceDescriptor, but only use the
RecurKind and FastMathFlags in it when executing. This patch makes the
recipe more lightweight by stripping it to only take the latter two.

The motiviation for this is to simplify an upcoming patch to support
in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to
create an Or reduction, and by using RecurKind we can create an
arbitrary reduction without needing a full RecurrenceDescriptor.
2025-03-24 19:18:54 +08:00
Florian Hahn
06fd10f1da
[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604)
ExtractElements are no-ops for scalar VPlans. Don't introduce them in 
handleUncountableEarlyExit if the plan has only a scalar VF.

This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79.

PR: https://github.com/llvm/llvm-project/pull/131604
2025-03-23 22:00:02 +00:00
Martin Storsjö
ff3e2ba9eb Revert "[VPlan] Add transformation to narrow interleave groups. (#106441)"
This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19.

This commit caused miscompilations in ffmpeg, see
https://github.com/llvm/llvm-project/pull/106441 for details.
2025-03-23 23:27:39 +02:00
Florian Hahn
206b42c98e
[LV] Use VPBuilder to create ComputeReductionResult. (NFC)
Update code to use VPBuilder, to simplify follow-up changes.
2025-03-23 16:10:07 +00:00
Florian Hahn
c482b8faea
[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC).
Instead of executing the whole entry VPIRBB twice, first only execute
the VPExpandSCEVRecipes and replace their uses with the expanded
VPValue, which will be a live-in. This allows removing special logic in
VPExpandSCEVRecipe to support executing twice and allows moving the
ExpandedSCEVs map out of VPTransformState.

It will also allow adding other recipes to the entry VPBB in the future.
2025-03-23 09:06:01 +00:00
Kazu Hirata
fae34938f6
[llvm] Use *Set::insert_range (NFC) (#132591)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range with
iterator ranges.  For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-22 22:14:45 -07:00
Florian Hahn
dfa665f19c
[VPlan] Add transformation to narrow interleave groups. (#106441)
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-22 21:40:17 +00:00
Florian Hahn
0d3ba087f7
[LV] Move IV bypass value creation out of ILV (NFC)
createInductionAdditionalBypassValues is only used for epilogue
vectorization now. Move it out of ILV, which means we do not have to
thread through ExpandedSCEVs and also don't have to track the bypass
values in ILV. Instead, directly create them if needed after executing
the epilogue plan. This moves more the epilogue specific logic out of
the generic executePlan.
2025-03-22 20:36:45 +00:00
Florian Hahn
34631744af
[VPlan] Get DataLayout from SE in VPExpandSCEVRecipe::execute (NFC)
This doesn't rely on State.CFG.
2025-03-22 15:49:57 +00:00
Alexey Bataev
3b0ec61156
[SLP][NFC] Redesign schedule bundle, separate from schedule data, NFC
That's the initial patch, intended to support revectorization of the
previously vectorized scalars. If the scalar is marked for the
vectorization, it becomes a part of the schedule bundle, used to check
dependencies and then schedule tree entry scalars into a single batch of
instructions. Unfortunately, currently this info is part of the
ScheduleData struct and it does not allow making scalars part of many
bundles. The patch separates schedule bundles from the ScheduleData,
introduces explicit class ScheduleBundle for bundles, allowing later to
extend it to support revectorization of the previously vectorized
scalars.

Reviewers: hiraditya, RKSimon

Reviewed By: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/131625
2025-03-21 13:36:57 -04:00
David Sherwood
4e69258bf3
[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565)
At the moment if we decide to enable tail-folding we do not include
the cost of generating the mask per VF. This can mean we make some
poor choices of VF, which is definitely true for SVE-enabled AArch64
targets where mask generation for fixed-width vectors is more
expensive than for scalable vectors.

I've added a VPInstruction::computeCost function to return the costs
of the ActiveLaneMask and ExplicitVectorLength operations.
Unfortunately, in order to prevent asserts firing I've also had to
duplicate the same code in the legacy cost model to make sure the
chosen VFs match up. I've wrapped this up in a ifndef NDEBUG for
now. The alternative would be to disable the assert completely when
tail-folding, which I imagine is just as bad.

New tests added:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
  Transforms/LoopVectorize/RISCV/tail-folding-cost.ll
2025-03-21 09:24:56 +00:00
Kazu Hirata
3520dc5e7a [Vectorize] Fix a build
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:2263:19: error:
  expected ';' after return statement
2025-03-20 12:52:27 -07:00
Florian Hahn
c73ad7ba20
[VPlan] Add transformation to narrow interleave groups.
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms. For now
it only transforms load interleave groups feeding store groups.

Depends on #106431.

This lands the main parts of the approved
https://github.com/llvm/llvm-project/pull/106441 as suggested to break
things up a bit more.
2025-03-20 19:41:37 +00:00
Kazu Hirata
2df51fd9c4
[Transforms] Avoid repeated hash lookups (NFC) (#132146) 2025-03-20 09:11:56 -07:00
Han-Kuan Chen
73558dc329
[SLP][REVEC] Fix getStoreMinimumVF only accept scalar types. (#132181)
Fix "Element type of a VectorType must " "be an integer, floating point,
or " "pointer type.".
2025-03-20 21:04:30 +08:00
Han-Kuan Chen
a5d4b50f93
[SLP] NFC. Change the inner loop and outer loop of appendOperandsOfVL. (#132152) 2025-03-20 20:32:20 +08:00
Luke Lau
01f04252b6
[LV] Get FMFs from VectorBuilder in createSimpleReduction. NFC (#132017)
The other createSimpleReduction takes the FMFs from the IRBuilder, so
this aligns the VectorBuilder variant to do the same and reduce the
possibility of there being a mismatch in flags.
2025-03-20 16:38:56 +08:00
Nikita Popov
0738f70615
[Intrinsics] Add Intrinsic::getFnAttributes() (NFC) (#132029)
Most places that call Intrinsic::getAttributes() are only interested in
the function attributes, so add a separate function for that.

The motivation for this is that I'd like to add the ability to specify
range attributes on intrinsics, which requires knowing the function
type. This avoids needing to know the type for most attribute queries.
2025-03-20 09:20:39 +01:00
Han-Kuan Chen
c3e16337a4
[SLP][REVEC] Ignore UserTreeIndex if it is empty. (#131993)
Previously, the all_of check did not consider the case where the
TreeEntry is empty (i.e., when it is the first entry).
2025-03-20 11:31:49 +08:00
Kazu Hirata
0dcc201ac4
[Transforms] Use *Set::insert_range (NFC) (#132056)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-19 15:35:01 -07:00
Florian Hahn
2e13ec561c
[VPlan] Bail out on non-intrinsic calls in VPlanNativePath.
Update initial VPlan-construction in VPlanNativePath in line with the
inner loop path, in that it bails out when encountering constructs it
cannot handle, like non-intrinsic calls.

Fixes https://github.com/llvm/llvm-project/issues/131071.
2025-03-19 21:35:15 +00:00
Florian Hahn
11b8699572
[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415)
calculateRegisterUsage adds end points for each user of an instruction
to Ends and ignores instructions not added to it, i.e. instructions with
no users.

This means things like stores aren't included, which in turn means
values that are only used in stores are also not included for
consideration. This means we underestimate the register usage in cases
where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in
Ends) if they have side-effects.

PR: https://github.com/llvm/llvm-project/pull/126415
2025-03-19 15:13:43 +00:00
Luke Lau
f536f71580
[LV] Split RecurrenceDescriptor into RecurKind + FastMathFlags in LoopUtils. NFC (#132014)
Split off from #131300, this splits up RecurrenceDescriptor arguments so
that arbitrary recurrence kinds may be used down the line.
2025-03-19 22:56:57 +08:00
Kazu Hirata
9705010b5e
[Vectorize] Avoid repeated hash lookups (NFC) (#131962) 2025-03-19 07:14:54 -07:00
Longsheng Mou
f3f7f08eca
[SLP] Fix Wsign-compare warning (NFC) (#131948)
llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4805:57:
warning: comparison of integer expressions of different signedness:
‘int’ and ‘std::size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
    [](const auto &P) { return P.value() % 2 != P.index() % 2; }))
                               ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
2025-03-19 17:01:42 +08:00
Florian Hahn
870f753f1f
[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC).
Also include VPlan's BTC in the set of VPValues to materialize
broadcasts for, if it is used.
2025-03-18 22:35:18 +00:00
Florian Hahn
8a91f6bcda
[VPlan] Use CurrentParentLoop instead of looking up via CFG (NFC).
There is no need to look up the current parent loop via LoopInfo and the
vector preheader; we can simply use CurrentParentLoop.
2025-03-18 22:11:47 +00:00
Florian Hahn
d51bc83511
[VPlan] Only skip live-ins with constants in materializeBroadccast (NFC)
Currently this should be NFC, but will be needed in future patches.
2025-03-18 20:23:16 +00:00
Alexey Bataev
45090b3059 [SLP]Check the whole def-use chain in the tree to find proper dominance, if the last instruction is the same
If the insertion point (last instruction) of the user nodes is the same,
need to check the whole def-use chain in the tree to find proper
dominance to prevent a compiler crash.

Fixes #131818
2025-03-18 10:01:13 -07:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
Elvis Wang
ed19620b8c
[VPlan] Make VPReductionRecipe a VPRecipeWithIRFlags. NFC (#130881)
This patch change the parent of the VPReductionRecipe from
VPSingleDefRecipe to VPRecipeWithIRFlags and also print/get/drop/control
flags by the VPRecipeWithIRFlags. This will remove the dependency of the
underlying instruction.

This patch also add a new function `setFastMathFlags()` to the
VPRecipeWithIRFlags because the entire reduction chain may contains
multiple instructions. And the underlying instruction may not contains
the corresponding flags for this reduction.

Split from #113903.
2025-03-18 10:08:23 +08:00
Florian Hahn
166937b49d
[LV] Cleanup after expanding SCEV predicate to constant.
In some cases, SCEV isn't able to prove that no wrap checks are needed,
while constant folding in SCEVExpander can. In those cases, we may leave
around IR for computing the trip count, which is unused at this point
but may be re-used later, triggering an assertion when trying to clean
up SCEVExp after vectorization.

Directly run the cleaner after expanding to a constant predicate to
prevent any generated code from being re-used.

Fixes https://github.com/llvm/llvm-project/issues/131281.
2025-03-17 21:26:51 +00:00
Jeffrey Byrnes
4336e5edbc
[SLP] Sort PHIs by ExtractElements when relevant (#131229)
Considering the PHIs in order of element extracted can lead to better shuffles.
2025-03-17 14:19:46 -07:00
Alexey Bataev
ead9d6a56d [SLP]Check VectorizableTree is not empty before accessing elements
Need to check VectorizableTree is not empty before accessing elements.

Fixes #131635
2025-03-17 11:04:38 -07:00
Luke Lau
67f1c033b8
[VPlan] Remove createReduction. NFCI (#131336)
This is split off from #131300.

A VPReductionRecipe will never have a AnyOf or FindLastIV recurrence, so
when it calls createReduction it always calls createSimpleReduction.

If we replace the call then it leaves createReduction with one user in
VPInstruction::ComputeReductionResult, which we can inline and then
remove.
2025-03-18 00:18:15 +08:00
Luke Lau
eef5ea0c42
[VPlan] Account for dead FOR splice simplification in cost model (#131486)
Fixes #131359

After #129645, a first-order recurrence will no longer have it's splice
costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and
is dead.

The legacy cost model didn't account for this, so this accounts for it
in planContainsAdditionalSimplifications to avoid the "VPlan cost model
and legacy cost model disagreed" assertion.
2025-03-18 00:00:54 +08:00
Florian Hahn
17b4be8f63
[VPlan] Move setting name and adding VFs after recipe creation.(NFC)
Recipe creation is the only place where the VF range is restricted. Move
setting the VFs just after initial recipe creation.
2025-03-17 12:04:09 +00:00
Florian Hahn
4e9894498e
[VPlan] Truncate VFxUF if needed in VPWidenPointerInduction::execute.
Create truncate if needed after 56b05a0d6. Note that this preserves the
original behavior pre 56b05a0d6. If truncate would strip any set bits,
then the explicit computation in the narrower type would wrap.
2025-03-16 11:37:58 +00:00