3948 Commits

Author SHA1 Message Date
Alexey Bataev
8d933ea5ac [SLP][NFC]Use SmallDensetSet for lookup instead of ArrayRef, NFC. 2023-09-06 13:17:30 -07:00
Florian Hahn
785e7063b9
[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI).
VPWidenRecipe only needs the opcode to widen, all other information
(flags, debug loc and operands) is already modeled directly via the
recipe.

This removes the remaining uses of the underlying instruction from
VPWidenRecipe::execute.
2023-09-06 16:27:09 +01:00
Alexey Bataev
09b8bbd6e0 [SLP][NFC]Reorder indeces instead of real values, NFC.
May save some memory/compile time.
2023-09-05 08:48:52 -07:00
Florian Hahn
165e24aa2a
[VPlan] Move DebugLoc to VPRecipeBase (NFCI).
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and also consolidate
various DL fields in sub classes.

Each recipe can have debug location and it shouldn't rely on reference
to the underlying LLVM IR instructions to retain it. See various recipes
that had separate DL fields already.
2023-09-05 15:45:16 +01:00
Florian Hahn
168e23c741
[VPlan] Remove reference to Instr when setting debug loc. (NFCI)
This allows untangling references to underlying IR for various recipes.
2023-09-05 10:59:13 +01:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Florian Hahn
3fa1b254b7
[VPlan] Print blend recipe as operand directly, instead of IR PHI.
Update VPBlendRecipe::print() to print the result directly, instead of
relying on the stored Phi pointer. This brings the recipe in line with
how other recipes are printed.
2023-09-04 12:35:58 +01:00
Florian Hahn
19d286bca0
[VPlan] Assert that inst isnt' a debug or pseudo inst (NFCI).
Debug and pseudo instructions aren't modeled in VPlan. Turn a check into
an assertion. This will help removing the direct use of Inst here in the
future.
2023-09-03 21:31:31 +01:00
Nuno Lopes
5a3fd5f3f5 [LoopVectorizer] Fix PR #65212: vectorization of reduction loop wasn't respecting original store alignment 2023-09-03 16:35:05 +01:00
Florian Hahn
fd66195777
[VPlan] Manage compare predicates in VPRecipeWithIRFlags.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.

This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predixcate wasn't
printed properly for VPReplicateRecipes.

Discussed/split off from D150398.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158992
2023-09-02 21:45:24 +01:00
Kazu Hirata
6da470d7f8 [llvm] Use range-based for loops (NFC) 2023-09-02 09:32:45 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Igor Kirillov
ac65fb8699 [LoopVectorize] Fix incorrect order of invariant stores when there are multiple reductions.
When a loop has multiple reductions, each with an intermediate invariant
store, the order in which those reductions are processed is not considered.
This can result in the invariant stores outside the loop not preserving the
original order.
This patch sorts VPReductionPHIRecipes by the order in which they have
stores in the original loop before running
`InnerLoopVectorizer::fixReduction` function, and it helps to maintain
the correct order of stores.

Fixes https://github.com/llvm/llvm-project/issues/64047

Differential Revision: https://reviews.llvm.org/D157631
2023-08-31 16:21:44 +00:00
Philip Reames
aada8f2e54 [slp] Tweak debug costing output to include VL
This makes it much easier to understand which vector length is being considered when the same set of nodes are evaluated at multiple vector lengths.
2023-08-30 09:13:19 -07:00
Florian Hahn
e544d9cc36
[VPlan] Remove unused VPBuilder::insert member (NFC). 2023-08-30 16:35:55 +01:00
Florian Hahn
cd9563ae17
[VPlan] Remove unused VPInstruction::clone member (NFC). 2023-08-30 15:53:39 +01:00
Florian Hahn
96e83d3705
[LV] Use IRBuilder to create and optimize middle-block compare.
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158332
2023-08-29 11:42:18 +01:00
Florian Hahn
32cb8f519e
[VPlan] Generalize variable names for ICmpULE operands (NFC)
ICmp codegen for VPInstructionD will be extended for other predicates,
and the operands could be any values (not just IV and TC as implied by
the names). Suggested cleanup from 150398.
2023-08-28 15:47:04 +01:00
David Sherwood
c02184f286 [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop
Suppose we have a nested loop like this:

  void foo(int32_t *dst, int32_t *src, int m, int n) {
    for (int i = 0; i < m; i++) {
      for (int j = 0; j < n; j++) {
        dst[(i * n) + j] += src[(i * n) + j];
      }
    }
  }

We currently generate runtime memory checks as a precondition for
entering the vectorised version of the inner loop. However, if the
runtime-determined trip count for the inner loop is quite small
then the cost of these checks becomes quite expensive. This patch
attempts to mitigate these costs by adding a new option to
expand the memory ranges being checked to include the outer loop
as well. This leads to runtime checks that can then be hoisted
above the outer loop. For example, rather than looking for a
conflict between the memory ranges:

1. &dst[(i * n)] -> &dst[(i * n) + n]
2. &src[(i * n)] -> &src[(i * n) + n]

we can instead look at the expanded ranges:

1. &dst[0] -> &dst[((m - 1) * n) + n]
2. &src[0] -> &src[((m - 1) * n) + n]

which are outer-loop-invariant. As with many optimisations there
is a trade-off here, because there is a danger that using the
expanded ranges we may never enter the vectorised inner loop,
whereas with the smaller ranges we might enter at least once.

I have added a HoistRuntimeChecks option that is turned off by
default, but can be enabled for workloads where we know this is
guaranteed to be of real benefit. In future, we can also use
PGO to determine if this is worthwhile by using the inner loop
trip count information.

When enabling this option for SPEC2017 on neoverse-v1 with the
flags "-Ofast -mcpu=native -flto" I see an overall geomean
improvement of ~0.5%:

SPEC2017 results (+ is an improvement, - is a regression):
520.omnetpp: +2%
525.x264: +2%
557.xz: +1.2%
...
GEOMEAN: +0.5%

I didn't investigate all the differences to see if they are
genuine or noise, but I know the x264 improvement is real because
it has some hot nested loops with low trip counts where I can
see this hoisting is beneficial.

Tests have been added here:

  Transforms/LoopVectorize/runtime-checks-hoist.ll

Differential Revision: https://reviews.llvm.org/D152366
2023-08-24 12:14:02 +00:00
Alexey Bataev
66c623bfc6 [SLP][NFC]Use TreeEntry::getOprand instead of trying to rebuild it in getOperandInfo(), NFC. 2023-08-23 13:37:36 -07:00
Florian Hahn
26bb2da28b
[VPlan] Proactively create mask for tail-folding up-front (NFCI).
Split off mask creation for tail folding and proactively create the mask for
the header block.

This simplifies createBlockInMask.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157037
2023-08-23 21:36:16 +01:00
Ben Shi
ad35d916cd [VectorCombine] Enable transform 'foldSingleElementStore' for scalable vector types
The transform 'foldSingleElementStore' can be applied to scalable
vector types if the index is less than the minimum number of elements.

Reviewed By: dmgreen, nikic

Differential Revision: https://reviews.llvm.org/D157676
2023-08-23 17:12:36 +08:00
Florian Hahn
34d25924c4
[VPlan] Mark some VPInstruction opcodes as not having side effects.
Mark some VPInstruction opcodes as not having side effects, preparation
for D157037.
2023-08-22 20:05:57 +01:00
Alexey Bataev
a9e6295548 [SLP][NFC]Use all_of/any_of instead of loops, NFC. 2023-08-22 08:21:36 -07:00
Alexey Bataev
b51195dece [SLP]Fix PR63854: Add proper sorting of pointers for masked stores.
If the masked gathers can be reordered, it may produce strided access
pattern and the reordering does not affect common reodering, better to
try to reorder masked gathers for better performance.

Differential Revision: https://reviews.llvm.org/D157009
2023-08-22 06:14:01 -07:00
Kolya Panchenko
acbe886880 [LV] Vectorization remark for outerloop
Reviewed By: fhahn, ABataev

Differential Revision: https://reviews.llvm.org/D150696
2023-08-21 13:05:06 -04:00
Florian Hahn
57a6f6579c
[LV] Simplify condition for induction recipe insertion (NFCI).
Split off independent suggestion from D157037. This simplifies the
condition to decide if a recipe needs to be inserted to the header phi
section or simply appended.

The assertion has been updated to allow cases where the first non-phi
recipe is the end of the block, in which case inserting before this
point is equivalent to appending.
2023-08-21 15:58:07 +01:00
Florian Hahn
c34d049706
[LV] Re-use existing NewInsertionPoint variable for insertion (NFCI).
Split off independent suggestion from D157037.
2023-08-21 15:21:29 +01:00
Florian Hahn
56f5738d85
[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC).
All dependencies on code from LoopVectorize.cpp have been
removed/refactored. Move the ::execute implementations to other recipe
definitions in VPlanRecipes.cpp
2023-08-20 21:00:05 +01:00
Craig Topper
46eded75cd [LoopVectorize] Replace dyn_cast with isa to suppress an unused variable warning. NFC 2023-08-19 14:41:00 -07:00
Florian Hahn
622b611f23
[VPlan] Inline buildScalarSteps in single user (NFC).
Other users have been refactored, remove the uneeded function.
2023-08-19 17:02:31 +01:00
Florian Hahn
ada2a455fc
[VPlan] Use VPBasicBlock to get incoming block for exit phi fixup (NFC)
Retrieve block via VPlan infrastructure as suggested as independent
cleanup in D150398.
2023-08-17 18:17:45 +01:00
Florian Hahn
9ee4a740e3
[LV] Remove unused MiddleVPBB argument from addUsersInExitBlock (NFC).
The argument is no longer used, remove it.
2023-08-17 10:36:12 +01:00
Mel Chen
463e7cb892 [LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc
This commit refactors the implementation of VPReductionRecipe to use
reference instead of pointer for member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should not be a nullptr, using a reference
will provide clearer semantics.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158058
2023-08-16 19:37:49 -07:00
Alexey Bataev
ca2eabdb52 [SLP][NFC]Improve code to meet coding standards, NFC. 2023-08-15 11:08:25 -07:00
Alexey Bataev
63c7815faf [SLP]Fix comparator for PHI nodes comparison.
Fixed comparator for PHI nodes sorting to meet the criteria for strict
weak ordering.
2023-08-14 14:05:39 -07:00
Florian Hahn
00bc500830
[VPlan] Store FPBinOp directly in VPDerivedIVRecipe (NFCI).
Address post-commit simplification suggestion for 8a56179bcd8c:
Store operator only for floating point inductions (i.e. the binary op is
a FPMathOperator).
2023-08-14 21:45:19 +01:00
Alexey Bataev
4f0bd8f7ac [SLP]Fix strict weak ordering for Cmp instruction comparator.
Sorting algorithms require strict weak ordering for comparators, final
fix for cmp instructions comparator.
2023-08-14 09:37:46 -07:00
Florian Hahn
aacaf3d580
[VPlan] Simplify VPDerivedIV truncation handling (NFCI).
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
2023-08-14 17:33:10 +01:00
Florian Hahn
d32e68ae53
[docs] Graduate VectorizationPlan.rst from proposal.
VPlan has become an integral part of the inner loop vectorizer pipeline
that has been actively developed over the previous years. Let's move
VectorizationPlan.rst from the proposal stage to bring the docs in line
and to avoid confusion when reading the docs.

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D157593
2023-08-10 17:15:43 +01:00
Alexey Bataev
2216507171 [SLP]Fix PR64568: Crash during horizontal reduction.
If the reduced values is constant-foldable and was folded to a constant
during previous transformations, need to excluded it from the list of
the reduced values-instructions as non-matchable.
2023-08-10 07:33:16 -07:00
Alexey Bataev
42b3925d42 [SLP][NFC]Fix formatting/warnings in tryToReduce(), NFC. 2023-08-10 06:42:50 -07:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Florian Hahn
8a56179bcd
[VPlan] Store induction kind & binop directly in VPDerviedIVRecipe(NFC)
Limit the information stored in VPDerivedIVRecipe to the ingredients
really needed.
2023-08-10 10:57:32 +01:00
Florian Hahn
e6d5dcf84c
[LV] Pass kind and induction binop to emitTransformedIndex (NFC).
Explicitly pass InductionKind and InductionBinOp to
emitTransformedIndex. Only those values are needed from the induction
descriptor. This makes explicit what is needed for the function and
allows future use cases where the a full induction descriptor object is
not available.
2023-08-10 10:35:42 +01:00
Valery N Dmitriev
f522be63bc [SLP][NFC] Make buildShuffleEntryMask routine a TreeEntry method.
The routine uses data stored at TreeEntry node for building a mask
so it is natural to make it a method for the type. That will simplify
its interface and reduces data transfer.
The method is added as buildAltOpShuffleMask.

Differential Revision: https://reviews.llvm.org/D157545
2023-08-09 13:43:03 -07:00
Alexey Bataev
c619222ea4 [SLP]Use common logic for cost estimation of the alternate vector nodes.
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for the alternate nodes with reuses, but also for the nodes
without reused scalars. It allows better to estimate the cost of the
node and emit better code.

Differential Revision: https://reviews.llvm.org/D157413
2023-08-09 11:50:39 -07:00
Florian Hahn
b223229e2c
[VPlan] Re-use existing step again after 34accad1feae.
This fixes a failing RISCV test case that was missed originally.
2023-08-08 21:42:56 +01:00
Florian Hahn
34accad1fe
[VPlan] Move logic to create VPScalarIVStepsRecipe to helper (NFC).
This allows for easier re-use in follow-on patches.
2023-08-08 21:25:06 +01:00
Florian Hahn
698ae66092
[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC).
Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for
VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157144
2023-08-08 20:13:11 +01:00