3993 Commits

Author SHA1 Message Date
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.
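
A minimal sketch of why the extra parameter matters, assuming a simplified mask model (a vector of ints, negative meaning "undefined"). The helper name `isIdentityMaskSketch` is hypothetical and is not the LLVM API; it only illustrates that a mask such as {0, 1} cannot be classified from Mask.size() alone when the source vector has 4 elements.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not the LLVM API: returns true if Mask selects
// elements 0..NumSrcElts-1 of the source vector in order.
// A negative mask element stands for "undefined" and matches any lane.
bool isIdentityMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts > 0 && "source vector must have at least one element");
  // A mask shorter or longer than the source changes the vector length,
  // so it is not an identity even if its elements are 0, 1, 2, ...
  if (static_cast<int>(Mask.size()) != NumSrcElts)
    return false;
  for (int I = 0; I < NumSrcElts; ++I)
    if (Mask[I] >= 0 && Mask[I] != I)
      return false;
  return true;
}
```

With a 4-element source, {0, 1, 2, 3} is an identity under this model, while {0, 1} is better described as extracting a subvector.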

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Alexey Bataev
019aee8327 [SLP]Improve costs in computeExtractCost() to avoid crash after D158449.
Need to consider the length of the original vector for extractelements,
not the number of matched scalars. This fixes 2 issues: 1) improves
cost estimation; 2) fixes crashes after D158449.
2023-09-29 07:48:02 -07:00
Hans Wennborg
06f3b0ed43 Revert "[SLP]Improve costs in computeExtractCost() to avoid crash after D158449."
This caused asserts:

  Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).",
  file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096

see comment on 59a67ea35d

> Need to consider the length of the original vector for extractelements,
> not the number of matched scalars. This fixes 2 issues: 1) improves
> cost estimation; 2) fixes crashes after D158449.

This reverts commit 59a67ea35d608480257fc64ec3e5106ef50de740.
2023-09-29 10:42:19 +02:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
Alexey Bataev
59a67ea35d [SLP]Improve costs in computeExtractCost() to avoid crash after D158449.
Need to consider the length of the original vector for extractelements,
not the number of matched scalars. This fixes 2 issues: 1) improves
cost estimation; 2) fixes crashes after D158449.
2023-09-28 09:36:08 -07:00
Nikita Popov
3b82397965 [VectorCombine] Check for non-byte-sized element type
We should check whether the element type is non-byte-sized, not
the vector type. For types like <32 x i1> the whole type is
byte-sized, but the individual elements (that we scalarize to)
are not.
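
The distinction can be sketched as follows, assuming a hypothetical `VecTypeSketch` struct that stands in for a fixed vector type (it is not an LLVM type). For <32 x i1> the whole-type check passes while the per-element check fails, which is the case the message describes.

```cpp
#include <cassert>

// Hypothetical description of a fixed vector type:
// NumElts elements of ElemBits bits each.
struct VecTypeSketch {
  unsigned NumElts;
  unsigned ElemBits;
};

// True if the total size of the vector is a whole number of bytes.
bool isWholeTypeByteSized(VecTypeSketch T) {
  assert(T.ElemBits > 0 && "element type must have a size");
  return (T.NumElts * T.ElemBits) % 8 == 0;
}

// True if each individual element is a whole number of bytes.
// This is the property that matters when scalarizing to single elements.
bool isElementByteSized(VecTypeSketch T) {
  assert(T.ElemBits > 0 && "element type must have a size");
  return T.ElemBits % 8 == 0;
}
```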

Fixes https://github.com/llvm/llvm-project/issues/67060.
2023-09-28 14:18:30 +02:00
Mikael Holmen
9cecee97a0 [VPlan] Silence gcc Wparentheses warning [NFC]
Without the fix gcc warns about
../lib/Transforms/Vectorize/VPlanTransforms.cpp:968:42: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
  968 |          UseActiveLaneMaskForControlFlow &&
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  969 |              "DataAndControlFlowWithoutRuntimeCheck implies "
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  970 |              "UseActiveLaneMaskForControlFlow");
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
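
For context, a sketch of the pattern gcc is warning about, using hypothetical function names. In `A || B && "message"` the `&&` binds tighter than `||`, so the string attaches only to `B`. Because a string literal always converts to true, both explicit groupings below give the same result; the parentheses only make the intent visible and silence the warning. Which grouping the actual fix chose is not stated here.

```cpp
#include <cassert>

// Grouping that matches how the unparenthesized expression is parsed.
bool withInnerParens(bool A, bool B) {
  return A || (B && "message attached to B only");
}

// Grouping that attaches the message to the whole condition.
bool withOuterParens(bool A, bool B) {
  return (A || B) && "message attached to the whole condition";
}
```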
2023-09-28 12:04:26 +02:00
Alexey Bataev
9eeb0293e2 [SLP]Cleanup MultiNodeScalars when tree deleted.
Need to clear MultiNodeScalars map to avoid compiler crash when tree is
deleted.
2023-09-27 07:48:53 -07:00
Alexey Bataev
ea7f43ec14 [SLP]Do not gather node, if the instruction, that does not require
scheduling, is previously vectorized.

If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
2023-09-26 11:57:35 -07:00
Ben Shi
ea0ee55c02
[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445)
Enable the transform if a non constant index is guaranteed to be safe
via a UREM/AND.
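
A small sketch of why a UREM or AND makes a variable index safe, assuming a hypothetical 8-element vector modeled as `std::array`. The function names are illustrative only. The AND form relies on the element count being a power of two.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t NumElts = 8; // power of two, so the AND form applies

// Index clamped with an unsigned remainder: the result is always < NumElts.
int loadElementUrem(const std::array<int, NumElts> &Vec, std::size_t Idx) {
  std::size_t Safe = Idx % NumElts;
  assert(Safe < NumElts);
  return Vec[Safe];
}

// Index clamped with a bit mask: the result is always <= NumElts - 1.
int loadElementAnd(const std::array<int, NumElts> &Vec, std::size_t Idx) {
  std::size_t Safe = Idx & (NumElts - 1);
  assert(Safe < NumElts);
  return Vec[Safe];
}
```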
2023-09-26 09:41:53 +08:00
alexfh
5d86176f48
Revert "[SLP]Do not gather node, if the instruction, that does not require" (#67386)
This reverts commit 77053421228edd12a3ba73d4eebd970fcdd3b2c0, which
introduces a clang crash (test case: https://gcc.godbolt.org/z/zn5n4KWPY).
2023-09-26 02:45:11 +02:00
Florian Hahn
97687b7aea
[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduces active-lane-mask as a later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.

This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating the introduction of EVL
as a similar optimization.
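
As an illustration of why the two forms are interchangeable, here is a scalar model of both masks. It assumes the trip count equals backedge-taken-count + 1 and that IV + lane does not overflow; the function names are hypothetical, and the second one is modeled on the per-lane "base + i <u n" semantics of an active lane mask.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Canonical form: lane I is active iff (IV + I) <=u BTC.
std::vector<bool> headerMaskULE(uint64_t IV, uint64_t BTC, unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = (IV + I) <= BTC;
  return Mask;
}

// Active-lane-mask form: lane I is active iff (Base + I) <u TripCount.
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t TripCount,
                                 unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = (Base + I) < TripCount;
  return Mask;
}
```

For a loop with 10 iterations (backedge-taken-count 9) and VF 4, the last vector iteration starts at IV 8 and both functions enable only the first two lanes.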

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158779
2023-09-25 13:34:45 +01:00
Florian Hahn
1a9e45080f
[VPBuilder] Add setInsertPoint version taking a recipe directly (NFC).
This helps to slightly simplify code when a recipe can be obtained
easily. Suggested in D158779.
2023-09-25 12:17:53 +01:00
Florian Hahn
541e88dbc2
[VPlan] Simplify HCFG construction of region blocks (NFC).
Update the logic to update the successors and predecessors of region
blocks directly. This adds special handling for header and latch blocks
in place, and removes the separate loop to fix up the region blocks.

Helps to simplify D158333.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159136
2023-09-24 21:53:35 +01:00
Kazu Hirata
e7497570d8 [Vectorize] Use range-based for loops (NFC) 2023-09-22 17:43:06 -07:00
Youngsuk Kim
e5026f0179 [llvm] Remove uses of Type::getPointerTo() (NFC)
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:

* Drop the call entirely if the sole purpose of it is to support a no-op
  bitcast (remove the no-op bitcast as well).

* Replace with `PointerType::get()`/`PointerType::getUnqual()`

This is a NFC cleanup effort.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D155232
2023-09-22 19:44:38 -04:00
Florian Hahn
d9f83169d1
[VPlan] Ensure start value of phis is the first op at construction (NFC)
Header phi recipes have the start value (incoming from outside the loop)
as first operand. This wasn't the case for VPWidenPHIRecipes. Instead
the start value was picked during execute() by doing extra work.

To be in line with other recipes, ensure the operand order is as
expected during construction.
2023-09-22 21:24:15 +01:00
Alexey Bataev
7ff83ed6cd [SLP]Do not try to reorder possible strided nodes.
Reordering of possible strided nodes in bottom-to-top order requires
top-to-bottom reordering of the operands of such nodes, which is not
supported. Need to disable reordering of strided operands to avoid
compiler crashes.
2023-09-22 07:55:43 -07:00
David Spickett
8f548610a6 Revert "[SLP]Use source vector type as the original vector type instead of"
This reverts commit 9a99944df068b29b905cd8ba9a2132cc6382b6fb.

Due to test suite failures on all our SVE buildbots e.g.:
https://lab.llvm.org/buildbot/#/builders/184/builds/7375

clang: ../llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3565:
InstructionCost llvm::AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind,
VectorType *, ArrayRef<int>, TTI::TargetCostKind, int, VectorType *,
ArrayRef<const Value *>): Assertion `Mask.size() == TpNumElts && "Expected Mask and Tp size to match!"' failed.
2023-09-22 07:52:16 +00:00
Alexey Bataev
9a99944df0 [SLP]Use source vector type as the original vector type instead of
artificial for better cost estimation.

Need to use the original source vector type, not the one artificially
constructed, based on the number of vectorized scalars. It affects the
cost significantly.
2023-09-21 11:34:02 -07:00
Alexey Bataev
3dc28e6c6a [SLP]Fix a crash because of wrong deps between vectorized nodes.
Need to change the order of node vectorization to avoid too early
insertion of the first node.
2023-09-21 10:19:11 -07:00
Alexey Bataev
12fda304cc [SLP][NFC]Unify add() member function in CostEstimator, NFC.
Make add() function smart enough to understand that the shuffle of
a single entry is requested, if it sees that the second node is the same
as the first.
2023-09-21 07:59:37 -07:00
Alexey Bataev
c601928cb9 [SLP][NFC]Improve compile time by storing all nodes for the given
scalar.

No need to scan the whole graph when trying to find a matching node for
a scalar vectorized in several nodes; it is better to store the
corresponding nodes alongside it and scan just this small list.
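
The idea can be sketched with a side table from scalar to node ids. Everything here is hypothetical (scalars are modeled as strings, nodes as integer ids) and is not the SLP vectorizer's data structure; it only shows the lookup replacing a full graph scan.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index: for every scalar, the ids of all nodes containing it.
struct NodeIndexSketch {
  std::unordered_map<std::string, std::vector<int>> NodesForScalar;

  // Record that node NodeId vectorizes the given scalars.
  void addNode(int NodeId, const std::vector<std::string> &Scalars) {
    assert(NodeId >= 0 && "node ids are non-negative in this sketch");
    for (const std::string &S : Scalars)
      NodesForScalar[S].push_back(NodeId);
  }

  // Return the (usually short) list of nodes for a scalar, or an empty list.
  const std::vector<int> &lookup(const std::string &Scalar) const {
    static const std::vector<int> Empty;
    auto It = NodesForScalar.find(Scalar);
    return It == NodesForScalar.end() ? Empty : It->second;
  }
};
```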
2023-09-21 07:24:31 -07:00
Florian Hahn
f23246a0bb
[LV] Directly add fast-math flags to select recipe (NFC).
Now that VPInstruction can manage fast math flags via
VPRecipeWithIRFlags, use them directly to model the fast-math flags of
the select created for the final reduction value instead of adding them
late.
2023-09-21 11:05:55 +01:00
Florian Hahn
1a9358c090
[LV] Relax over-strict assertion for reduction exit value selects.
After f108c6c, (mul x, 1) is simplified to x, which can cause the select
for the final reduction value when tail-folding to use the reduction
value for both options. Relax the assertion to make sure this case is
allowed.

Note that the reduction is now redundant itself and could be further
simplified.

Fixes #66895.
2023-09-21 10:12:29 +01:00
Michael Maitland
e0aaa1956d
[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats (#65706)
of the scalar operation

VP Intrinsics whose vector operands are both splat values may be
simplified into the scalar version of the operation and the result is
splatted.

This issue is the intrinsic dual of #65072.
2023-09-20 18:27:51 -04:00
Alexey Bataev
7705342122 [SLP]Do not gather node, if the instruction, that does not require
scheduling, is previously vectorized.

If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
2023-09-20 12:52:37 -07:00
Alexey Bataev
ebed4692f8 [SLP]Fix a crash when trying to find operand with re-vectorized main
instruction.

Need to check if the operand scalars are vectorized in a different
vector node, if the main instruction is already vectorized in another
vector node.
2023-09-20 09:54:15 -07:00
Alexey Bataev
7db87a66b0 [SLP]Fix PR66795: Check correct deps for vectorized inst with multiple
vectorized node uses.

If the instruction is vectorized in many different vector nodes, it may
break the dependency analysis for gathered nodes with matched scalars.
Need to properly check the dependency between such gather nodes to avoid
cycle dependency.
2023-09-19 12:11:33 -07:00
Florian Hahn
f108c6cdc1
[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform.
Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A.
Among other things, this enables additional simplifications after
applying versioned strides, as follow up to D147783.
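
A minimal sketch of the fold, assuming a hypothetical operand model (`OperandSketch`) rather than real VPlan recipes. It handles only the literal form named in the title, where the second operand is the constant 1.

```cpp
#include <cassert>
#include <optional>

// Hypothetical operand model: either a constant or a reference to a value id.
struct OperandSketch {
  bool IsConstant;
  long Value; // the constant if IsConstant, otherwise a value id
};

// If (MUL A, B) has B == 1, return A as the replacement; otherwise nothing.
std::optional<OperandSketch> foldMulByOne(OperandSketch A, OperandSketch B) {
  if (B.IsConstant && B.Value == 1)
    return A;
  return std::nullopt;
}
```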

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159200
2023-09-18 21:45:34 +01:00
Ben Shi
87143ff9f2 [VectorCombine] Fix a spot in commit 068357d9b09cd635b1c2f126d119ce9afecb28f7
My previous commit leads to a crash in "Builders/sanitizer-x86_64-linux-fast"
(see https://lab.llvm.org/buildbot/#/builders/5/builds/36746). This patch
fixes it.
2023-09-18 15:01:47 +08:00
Ben Shi
068357d9b0
[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443)
The transform 'scalarizeLoadExtract' can be applied to scalable
vector types if the index is less than the minimum number of elements.

The check whether the index is less than the minimum number of elements
is located at lines 1175~1180. 'scalarizeLoadExtract' calls
'canScalarizeAccess' and checks the returned result to decide whether this
transform is safe.

At the beginning of the function 'canScalarizeAccess', the index is
checked:
1. If it is less than the number of elements of a fixed vector type.
2. If it is less than the minimum number of elements of a scalable vector type.

Otherwise 'canScalarizeAccess' will return unsafe and this transform
will be prevented.
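
The two checks collapse into one comparison if a fixed vector's "minimum" element count is taken to be its exact count. The sketch below assumes a constant index and a hypothetical `VecShapeSketch` type; it is not the actual 'canScalarizeAccess' code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vector shape. For a fixed vector MinNumElts is the exact
// element count; for a scalable vector the runtime count is
// vscale * MinNumElts with vscale >= 1.
struct VecShapeSketch {
  unsigned MinNumElts;
  bool Scalable;
};

// A constant index is known safe only if it is below the minimum element
// count, since that many elements exist for every possible vscale.
bool isConstantIndexSafe(VecShapeSketch Ty, uint64_t Idx) {
  assert(Ty.MinNumElts > 0 && "vector must have at least one element");
  return Idx < Ty.MinNumElts;
}
```

Under this model, index 4 into <vscale x 4 x i32> is rejected even though the element exists whenever vscale is greater than 1.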
2023-09-18 10:49:18 +08:00
Florian Hahn
1d1cba44ea
[VPlan] Remove stray indent when printing scalar steps recipe.
VPScalarIVStepsRecipe will now be printed as
      vp<%6> = SCALAR-STEPS vp<%3>, ir<1>
instead of
      vp<%6>      = SCALAR-STEPS vp<%3>, ir<1>
2023-09-17 10:15:52 +01:00
Alexey Bataev
434aa2fe56 [SLP]Improve canReuseExtracts for reordering analysis.
Improve the analysis in canReuseExtracts for the reordering to better
reorder extracts for ExtractSubvector pattern.
2023-09-15 12:09:45 -07:00
Alexey Bataev
b9ad72ba05 [SLP]Fix PR66176: SLP incorrectly reorders select operands.
On the very first iteration for the reductions, when trying to build
reduction for boolean logic operations, no need to compare LHS/RHS with
the Reduction(VectorizedTree), need to compare with actual parameters of
the reduction operations.
2023-09-15 03:57:36 -07:00
Alexey Bataev
c15c1e5dd5 [SLP]Do not account non-instructions for external use.
If the non-instruction gets vectorized, no need to account for its extract
cost; it won't be removed and replaced by an extractelement instruction.
2023-09-14 12:40:33 -07:00
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Alexey Bataev
9a90457a76 [SLP][NFC]Use ArrayRef for operands directly instead of entry/operand number, NFC. 2023-09-11 11:16:13 -07:00
Jeremy Morse
6942c64e81 [NFC][RemoveDIs] Prefer iterator-insertion over instructions
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.

At this stage, this is just changing the spelling of a few operations,
which will eventually become significant once the debug-info bearing
iterator is used.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152537
2023-09-11 11:48:45 +01:00
Alexey Bataev
5bab59de44 [SLP]Try to vectorize scalars, being vectorized already, but does not need to be scheduled.
If the scalar does not need to be scheduled and it was vectorized
already in one of the vector nodes, we still can try to vectorize it in
another node. There is just no need to account for its cost in the total
scalar cost, as it will be handled in the main vectorized node.

Differential Revision: https://reviews.llvm.org/D159205
2023-09-08 13:34:12 -07:00
Florian Hahn
08de6508ab
[LV] Return debug loc directly from getDebugLocFromInstrOrOps (NFCI)
The return value of the function is only used to get the debug location.
Directly return the debug location, as this avoids an extra null
check in the caller.
2023-09-08 16:29:09 +01:00
Florian Hahn
3e2d564c3d
[VPlan] Use VPRecipeWithFlags for VPScalarIVStepsRecipe (NFC).
This directly models the flags as part of the recipe, which allows
dropping them using the VPlan infrastructure when required.

It also allows removing the full reference to InductionDescriptor and
limit it to only the opcode.
2023-09-08 15:46:12 +01:00
Alexey Bataev
30edf1c449
[SLP]Do not early exit if the number of unique elements is non-power-of-2. (#65476)
We still can try to vectorize the bundle of the instructions, even if
the repeated number of instructions is non-power-of-2. In this case need
to adjust the cost (calculate the cost only for unique scalar
instructions) and cost of the extracts. Also, when scheduling the bundle
need to schedule only unique scalars to avoid compiler crash because of
the multiple dependencies. Can be safely applied only if all scalars'
users are also vectorized and do not require memory accesses (this one
is a temporary requirement, can be relaxed later).
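
One way to picture "unique scalars" is deduplication with reuse indices. The helper names below are hypothetical and scalars are modeled as strings; the sketch shows a 4-wide bundle whose unique element count is 3, which is not a power of 2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns the unique scalars in first-seen order and, for every lane of the
// original bundle, the index of its scalar in the unique list.
std::pair<std::vector<std::string>, std::vector<int>>
uniqueWithReuseIndices(const std::vector<std::string> &Bundle) {
  std::vector<std::string> Unique;
  std::vector<int> Reuse;
  std::unordered_map<std::string, int> Seen;
  for (const std::string &S : Bundle) {
    auto [It, Inserted] = Seen.try_emplace(S, static_cast<int>(Unique.size()));
    if (Inserted)
      Unique.push_back(S);
    Reuse.push_back(It->second);
  }
  return {Unique, Reuse};
}

// True if N is a non-zero power of two.
bool isPowerOf2(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }
```

In this model only the unique list would be costed and scheduled, while the reuse indices describe how the original lanes are rebuilt.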

---------

Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
2023-09-08 10:00:46 -04:00
Alexey Bataev
8d933ea5ac [SLP][NFC]Use SmallDenseSet for lookup instead of ArrayRef, NFC. 2023-09-06 13:17:30 -07:00
Florian Hahn
785e7063b9
[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI).
VPWidenRecipe only needs the opcode to widen, all other information
(flags, debug loc and operands) is already modeled directly via the
recipe.

This removes the remaining uses of the underlying instruction from
VPWidenRecipe::execute.
2023-09-06 16:27:09 +01:00
Alexey Bataev
09b8bbd6e0 [SLP][NFC]Reorder indices instead of real values, NFC.
May save some memory/compile time.
2023-09-05 08:48:52 -07:00
Florian Hahn
165e24aa2a
[VPlan] Move DebugLoc to VPRecipeBase (NFCI).
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and also consolidate
various DL fields in sub classes.

Each recipe can have a debug location and it shouldn't rely on a reference
to the underlying LLVM IR instructions to retain it. See various recipes
that had separate DL fields already.
2023-09-05 15:45:16 +01:00
Florian Hahn
168e23c741
[VPlan] Remove reference to Instr when setting debug loc. (NFCI)
This allows untangling references to underlying IR for various recipes.
2023-09-05 10:59:13 +01:00