4020 Commits

Author SHA1 Message Date
Valery Dmitriev
c80b503496
[SLP] Improve gather tree nodes matching when users are PHIs. (#69392) 2023-10-18 09:05:11 -07:00
Valery Dmitriev
9aa571f080
[SLP][NFC] Try to cleanup and better document some isGatherShuffledEntry code. (#69384)
Outline some often used common code to dedicated variables in order
to make code compact. Rename variables to more accurately reflect
their purpose. Apply const qualifier where appropriate.
Fix the existing comments and add a bit more explanation to them.
2023-10-17 14:59:36 -07:00
Florian Hahn
fd31112634
[VPlan] Insert Trunc/Exts for reductions directly in VPlan.
Update the code to create Trunc/Ext recipes directly in
adjustRecipesForReductions instead of fixing it up later in
fixReductions.

This explicitly models the required conversions and also makes sure they
are generated at the right place (instead of after the exit condition),
hence the changes in a few tests.
2023-10-17 19:17:40 +01:00
Alexey Bataev
66775f8ccd [SLP]Fix PR69196: Instruction does not dominate all uses
During emission of the postponed gathers, need to insert them before
user instruction to avoid use before definition crash.
2023-10-17 10:43:59 -07:00
Alexey Bataev
119b0f3895 Revert "[SLP]Fix PR69196: Instruction does not dominate all uses"
This reverts commit 8e2b2c4181506efc5b9321c203dd107bbd63392b to fix
a crash reported in https://lab.llvm.org/buildbot/#/builders/230/builds/19993.
2023-10-16 13:29:17 -07:00
Alexey Bataev
8e2b2c4181 [SLP]Fix PR69196: Instruction does not dominate all uses
During emission of the postponed gathers, need to insert them before
user instruction to avoid use before definition crash.
2023-10-16 12:57:18 -07:00
Yingwei Zheng
4718b4011f
[LV] Invalidate disposition of SCEV values after loop vectorization (#69230)
This PR fixes the assertion failure of `SE.verify()` after loop vectorization.
2023-10-17 03:49:39 +08:00
Florian Hahn
f7a8a78cb7
[VPlan] Also print operands of canonical IV (NFC).
Also print the operands of VPCanonicalIVPHIRecipe. That was missed
earlier.
2023-10-16 20:28:23 +01:00
Nikita Popov
d4300154b6 Revert "[ValueTracking] Remove by-ref computeKnownBits() overloads (NFC)"
This reverts commit b5743d4798b250506965e07ebab806a3c2d767cc.

This causes some minor compile-time impact. Revert for now, better
to do the change more gradually.
2023-10-16 14:04:09 +02:00
Nikita Popov
b5743d4798 [ValueTracking] Remove by-ref computeKnownBits() overloads (NFC)
Remove the old overloads that accept KnownBits by reference, in
favor of those that return it by value.
2023-10-16 13:00:31 +02:00
Florian Hahn
b1115f8cce
[LV] Use LatchVPBB directly instead of going through region (NFC).
Split off from D158333.
2023-10-13 20:08:31 +01:00
Fangrui Song
2d854dd3e7 Move global namespace cl::opt inside llvm:: or internalize them 2023-10-10 19:58:03 -07:00
Alexey Bataev
c2ae16f6a7 [VectorCombine]Fix a crash during long vector analysis.
If analysis of a single vector is requested, the original type must be
used to avoid a crash.
2023-10-09 14:22:37 -07:00
Rin
df8e0d057d
[AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697)
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.

We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
2023-10-09 16:26:19 +01:00
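The idea of bounding the maximum VF by a trip-count upper bound can be sketched as below. This is a simplified illustration and not the LoopVectorize code: the names `floorPowerOf2` and `clampMaxVF` are hypothetical, and the rule of clamping to the largest power of two not exceeding the bound is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstdint>

// Largest power of two that is <= X (returns 0 for X == 0).
inline uint32_t floorPowerOf2(uint32_t X) {
  if (X == 0)
    return 0;
  uint32_t P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical helper: UpperBoundTC == 0 means "no bound known".
// When the exact trip count is a known constant, the upper bound equals it,
// so a single code path covers both cases.
inline uint32_t clampMaxVF(uint32_t MaxVF, uint32_t UpperBoundTC) {
  if (UpperBoundTC == 0)
    return MaxVF;
  return std::min(MaxVF, floorPowerOf2(UpperBoundTC));
}
```

With these assumptions, a loop known to run at most 5 iterations would not be offered a VF wider than 4, even if the target could support 16.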
Simon Pilgrim
bea3967271 [VectorCombine] Rename foldBitcastShuf -> foldBitcastShuffle. NFC.
Consistently use the term "Shuffle" in all vector combiner folds.
2023-10-09 11:28:50 +01:00
Graham Hunter
3273ea40e5
[LV] Cache call vectorization decisions (#66521)
LoopVectorize currently queries VFDatabase repeatedly for each CI,
and each query to VFDatabase rescans all vector variants.

This patch instead makes a decision for each call once per VF based
on the cost of scalarization vs. function call to a vector variant
of the function vs. a vector intrinsic, then caches the decision
along with relevant info for use in planning and plan execution.
2023-10-09 11:23:19 +01:00
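The caching scheme described in this commit message might look roughly like the following sketch. It is not the LoopVectorize implementation: `CallDecisionCache`, `CallCosts`, the integer call ID, and the cost callback are all hypothetical names chosen for illustration. The point is only that the three-way cost comparison runs once per (call, VF) pair and later lookups reuse the stored result.

```cpp
#include <cstdint>
#include <map>
#include <utility>

enum class CallKind { Scalarize, VectorVariant, Intrinsic };

struct CallDecision {
  CallKind Kind;
  uint64_t Cost;
};

// Hypothetical cost triple for one call at one VF.
struct CallCosts {
  uint64_t Scalarize;
  uint64_t VectorVariant;
  uint64_t Intrinsic;
};

class CallDecisionCache {
  std::map<std::pair<int, unsigned>, CallDecision> Cache;
  unsigned NumComputed = 0;

public:
  // ComputeCosts(CallId, VF) is only invoked on a cache miss.
  template <typename CostFn>
  const CallDecision &get(int CallId, unsigned VF, CostFn ComputeCosts) {
    auto Key = std::make_pair(CallId, VF);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;

    ++NumComputed;
    CallCosts C = ComputeCosts(CallId, VF);
    // Pick the cheapest of the three options; ties keep the earlier one.
    CallDecision D{CallKind::Scalarize, C.Scalarize};
    if (C.VectorVariant < D.Cost)
      D = {CallKind::VectorVariant, C.VectorVariant};
    if (C.Intrinsic < D.Cost)
      D = {CallKind::Intrinsic, C.Intrinsic};
    return Cache.emplace(Key, D).first->second;
  }

  unsigned numComputed() const { return NumComputed; }
};
```

Keying on both the call and the VF matters because the cheapest option can differ between vectorization factors for the same call site.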
Florian Hahn
dae91f5dbc
[VPlan] Avoid VPTransformState::reset in fixReduction (NFCI).
There's no need to repeatedly query and reset the state for
LoopExitInstDef. This removes one of the last uses of
VPTransformState::reset, by using a vector to store and update the
results. No other code should try to retrieve the result from State
outside the fixReductionCall.
2023-10-07 23:24:24 +01:00
Simon Pilgrim
94795a37e8 [VectorCombine] foldBitcastShuf - add support for length changing shuffles
Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold.

It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask.

First stage towards addressing Issue #67803
2023-10-06 11:59:51 +01:00
Simon Pilgrim
d3e66a88c2 [VectorCombine] foldBitcastShuf - compute scale factors using shuffle type element size instead of element count. NFCI.
First step towards supporting length changing shuffles
2023-10-05 18:58:36 +01:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
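Why the mask length alone is not enough can be shown with a small standalone sketch. This is not the LLVM implementation: `isIdentityMaskSketch` is a hypothetical function over `std::vector<int>`, with -1 standing in for an undefined mask element.

```cpp
#include <vector>

// A mask is an identity only if it selects every source element in order.
// Without NumSrcElts, the mask {0, 1} would look like an identity even when
// the source has 4 elements, where it is really a subvector extract.
inline bool isIdentityMaskSketch(const std::vector<int> &Mask,
                                 int NumSrcElts) {
  if (static_cast<int>(Mask.size()) != NumSrcElts)
    return false;
  for (int I = 0; I < NumSrcElts; ++I)
    if (Mask[I] != -1 && Mask[I] != I) // -1 marks an undefined lane
      return false;
  return true;
}
```

Passing the source element count explicitly lets a cost model tell a length-changing shuffle apart from a no-op one.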
Rin
d3e4702c0f
[AArch64] [LoopVectorize] Use either fixed-width or scalable VF when tail-folding (#67543)
Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.
2023-10-05 10:24:30 +01:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Florian Hahn
07e715953b
[VPlan] Check users of LoopExitInstDef in VPlan directly. (NFCI)
Instead of walking the IR def use chains of the generated code, adjust
the generated VPInstruction if needed and check its users in VPlan.
2023-10-03 20:42:15 +01:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
Alexey Bataev
d0d608383e [SLP][NFC]Fix assert message, NFC. 2023-10-02 13:38:54 -07:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Alexey Bataev
019aee8327 [SLP]Improve costs in computeExtractCost() to avoid crash after D158449.
Need to consider the length of the original vector for extractelements,
not the length, matched number of the scalars. It fixes 2 issues: 1)
improves cost estimation; 2) Fixes crashes after D158449.
2023-09-29 07:48:02 -07:00
Hans Wennborg
06f3b0ed43 Revert "[SLP]Improve costs in computeExtractCost() to avoid crash after D158449."
This caused asserts:

  Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).",
  file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096

see comment on 59a67ea35d

> Need to consider the length of the original vector for extractelements,
> not the length, matched number of the scalars. It fixes 2 issues: 1)
> improves cost estimation; 2) Fixes crashes after D158449.

This reverts commit 59a67ea35d608480257fc64ec3e5106ef50de740.
2023-09-29 10:42:19 +02:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
Alexey Bataev
59a67ea35d [SLP]Improve costs in computeExtractCost() to avoid crash after D158449.
Need to consider the length of the original vector for extractelements,
not the length, matched number of the scalars. It fixes 2 issues: 1)
improves cost estimation; 2) Fixes crashes after D158449.
2023-09-28 09:36:08 -07:00
Nikita Popov
3b82397965 [VectorCombine] Check for non-byte-sized element type
We should check whether the element type is non-byte-sized, not
the vector type. For types like <32 x i1> the whole type is
byte-sized, but the individual elements (that we scalarize to)
are not.

Fixes https://github.com/llvm/llvm-project/issues/67060.
2023-09-28 14:18:30 +02:00
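The distinction between the two checks comes down to simple arithmetic, sketched here in standalone form. `FixedVecType` and both helper functions are hypothetical stand-ins for illustration and are not LLVM types or APIs.

```cpp
#include <cstdint>

// Hypothetical stand-in for a fixed-length vector type such as <32 x i1>.
struct FixedVecType {
  uint64_t NumElts;
  uint64_t EltBits;
};

constexpr bool isByteSized(uint64_t Bits) { return Bits % 8 == 0; }

// Checks the whole vector: <32 x i1> is 32 bits, so this returns true.
constexpr bool vectorIsByteSized(FixedVecType T) {
  return isByteSized(T.NumElts * T.EltBits);
}

// Checks one element: i1 is 1 bit, so this returns false for <32 x i1>.
constexpr bool elementIsByteSized(FixedVecType T) {
  return isByteSized(T.EltBits);
}
```

Since scalarization operates on individual elements, the element-level check is the one that matters for `<32 x i1>`.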
Mikael Holmen
9cecee97a0 [VPlan] Silence gcc Wparentheses warning [NFC]
Without the fix gcc warns about
../lib/Transforms/Vectorize/VPlanTransforms.cpp:968:42: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
  968 |          UseActiveLaneMaskForControlFlow &&
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  969 |              "DataAndControlFlowWithoutRuntimeCheck implies "
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  970 |              "UseActiveLaneMaskForControlFlow");
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-09-28 12:04:26 +02:00
Alexey Bataev
9eeb0293e2 [SLP]Cleanup MultiNodeScalars when tree deleted.
Need to clear MultiNodeScalars map to avoid compiler crash when tree is
deleted.
2023-09-27 07:48:53 -07:00
Alexey Bataev
ea7f43ec14 [SLP]Do not gather node, if the instruction, that does not require
scheduling, is previously vectorized.

If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
2023-09-26 11:57:35 -07:00
Ben Shi
ea0ee55c02
[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445)
Enable the transform if a non constant index is guaranteed to be safe
via a UREM/AND.
2023-09-26 09:41:53 +08:00
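The reasoning behind accepting a UREM or AND as a bounds guarantee can be sketched as follows. These helpers are hypothetical and only capture the arithmetic; they are not the checks VectorCombine performs. The assumption is that the index is unsigned and either `Idx % Divisor` or `Idx & Mask` with a constant right-hand side.

```cpp
#include <cstdint>

// Idx % Divisor is always in [0, Divisor), so it is a valid index into a
// vector of NumElts elements whenever 0 < Divisor <= NumElts.
constexpr bool uremIndexIsInBounds(uint64_t Divisor, uint64_t NumElts) {
  return Divisor != 0 && Divisor <= NumElts;
}

// Idx & Mask can never exceed Mask, so it is a valid index whenever
// Mask < NumElts.
constexpr bool andIndexIsInBounds(uint64_t Mask, uint64_t NumElts) {
  return Mask < NumElts;
}
```

For a 4-element vector, `Idx % 4` and `Idx & 3` are both provably in range, while `Idx % 8` and `Idx & 7` are not.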
alexfh
5d86176f48
Revert "[SLP]Do not gather node, if the instruction, that does not require" (#67386)
This reverts commit 77053421228edd12a3ba73d4eebd970fcdd3b2c0, which
introduces a clang crash (test case: https://gcc.godbolt.org/z/zn5n4KWPY).
2023-09-26 02:45:11 +02:00
Florian Hahn
97687b7aea
[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding, and introduces active-lane-mask as a later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.

This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating the introduction of EVL
as a similar optimization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158779
2023-09-25 13:34:45 +01:00
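The relationship between the canonical compare and an active-lane-mask can be illustrated with a scalar model. This is a sketch and not VPlan code: both functions are hypothetical, lanes are modeled as entries of a `std::vector<bool>`, and the trip count is assumed to equal the backedge-taken count plus one without overflow.

```cpp
#include <cstdint>
#include <vector>

// Canonical form: lane L is active iff (IV + L) <= backedge-taken count.
inline std::vector<bool> headerMaskULE(uint64_t IV, uint64_t BTC,
                                       unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = (IV + Lane) <= BTC;
  return Mask;
}

// Active-lane-mask form: lane L is active iff (IV + L) < trip count.
inline std::vector<bool> activeLaneMask(uint64_t IV, uint64_t TC,
                                        unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = (IV + Lane) < TC;
  return Mask;
}
```

For a loop with 10 iterations (backedge-taken count 9) and VF 4, the final vector iteration starts at IV 8, and both forms enable only the first two lanes. Because the two masks agree under the stated assumption, replacing one with the other can be treated as an optimization step.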
Florian Hahn
1a9e45080f
[VPBuilder] Add setInsertPoint version taking a recipe directly (NFC).
This helps to slightly simplify code when a recipe can be obtained
easily. Suggested in D158779.
2023-09-25 12:17:53 +01:00
Florian Hahn
541e88dbc2
[VPlan] Simplify HCFG construction of region blocks (NFC).
Update the logic to update the successors and predecessors of region
blocks directly. This adds special handling for header and latch blocks
in place, and removes the separate loop to fix up the region blocks.

Helps to simplify D158333.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159136
2023-09-24 21:53:35 +01:00
Kazu Hirata
e7497570d8 [Vectorize] Use range-based for loops (NFC) 2023-09-22 17:43:06 -07:00
Youngsuk Kim
e5026f0179 [llvm] Remove uses of Type::getPointerTo() (NFC)
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:

* Drop the call entirely if the sole purpose of it is to support a no-op
  bitcast (remove the no-op bitcast as well).

* Replace with `PointerType::get()`/`PointerType::getUnqual()`

This is a NFC cleanup effort.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D155232
2023-09-22 19:44:38 -04:00
Florian Hahn
d9f83169d1
[VPlan] Ensure start value of phis is the first op at construction (NFC)
Header phi recipes have the start value (incoming from outside the loop)
as first operand. This wasn't the case for VPWidenPHIRecipes. Instead
the start value was picked during execute() by doing extra work.

To be in line with other recipes, ensure the operand order is as
expected during construction.
2023-09-22 21:24:15 +01:00
Alexey Bataev
7ff83ed6cd [SLP]Do not try to reorder possible strided nodes.
Reordering of possible strided nodes in bottom-to-top order requires
top-to-bottom reordering of the operands of such nodes, which is not
supported. Need to disable reordering of strided operands to avoid
compiler crashes.
2023-09-22 07:55:43 -07:00
David Spickett
8f548610a6 Revert "[SLP]Use source vector type as the original vector type instead of"
This reverts commit 9a99944df068b29b905cd8ba9a2132cc6382b6fb.

Due to test suite failures on all our SVE buildbots e.g.:
https://lab.llvm.org/buildbot/#/builders/184/builds/7375

clang: ../llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3565:
InstructionCost llvm::AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind,
VectorType *, ArrayRef<int>, TTI::TargetCostKind, int, VectorType *,
ArrayRef<const Value *>): Assertion `Mask.size() == TpNumElts && "Expected Mask and Tp size to match!"' failed.
2023-09-22 07:52:16 +00:00
Alexey Bataev
9a99944df0 [SLP]Use source vector type as the original vector type instead of
artificial for better cost estimation.

Need to use the original source vector type, not the one artificially
constructed based on the number of vectorized scalars. It affects the
cost significantly.
2023-09-21 11:34:02 -07:00
Alexey Bataev
3dc28e6c6a [SLp]Fix a crash because of wrong deps between vectorized nodes.
Need to change the order of the nodes vectorization to avoid too early
insertion of the first node.
2023-09-21 10:19:11 -07:00