1865 Commits

Author SHA1 Message Date
Alexey Bataev
aeedab77b5 [SLP]Correctly decide if the non-power-of-2 number of stores can be vectorized.
Need to consider the maximum type size in the graph before attempting
to vectorize a non-power-of-2 number of elements, which may be
less than MinVF.
2024-08-29 12:40:31 -07:00
Philip Reames
4bc7c74240
[SLP] Extract isIdentityOrder to common routine [probably NFC] (#106582)
This isn't quite just code motion as the four different versions we had
of this routine differed in whether they ignored the "size" marker used
to represent undef. I doubt this matters in practice, but it is a
functional change.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-08-29 11:00:31 -07:00
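The commit notes that the four copies of this routine differed in whether they ignored the "size" marker used for undef. The following is a minimal stand-alone sketch of that difference, not the LLVM code: `isIdentityOrderSketch` and its `IgnoreSizeMarker` flag are hypothetical names, and an entry equal to `Order.size()` is assumed to be the undef marker.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone sketch, not the LLVM implementation.
// An order is an identity if each position I holds the value I. An entry
// equal to Order.size() is the "size" marker that stands for an undef lane;
// IgnoreSizeMarker selects whether such entries are accepted anywhere.
bool isIdentityOrderSketch(const std::vector<unsigned> &Order,
                           bool IgnoreSizeMarker) {
  const std::size_t Sz = Order.size();
  for (std::size_t I = 0; I < Sz; ++I) {
    if (IgnoreSizeMarker && Order[I] == Sz)
      continue; // undef lane: compatible with any position
    if (Order[I] != I)
      return false;
  }
  return true;
}
```

With the flag set, `{0, 4, 2, 3}` counts as an identity (lane 1 is undef); without it, the same order is rejected, which is the kind of small functional difference the commit describes.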
Philip Reames
b5a1b45fe3 [SLP] Early return in getReorderingData [nfc] 2024-08-29 08:58:27 -07:00
Alexey Bataev
50515db57f [SLP][NFC]Format canVectorizeLoads after previous NFC patches. 2024-08-29 04:31:13 -07:00
Alexey Bataev
fdf72c992b [SLP]Fix a crash when requesting the cost for buildvector cmp node types.
Need to use the original cmp type i1 when estimating the cost for the
buildvector node, not its operand types, to prevent a compiler crash
during TTI cost estimation.
2024-08-29 03:53:28 -07:00
tcwzxx
121fb2c2cc
[SLP] Fix the Vec lane overridden by the shuffle mask (#106341)
Currently, SLP uses shuffle for the external user of `InsertElementInst`
and iterates through the `InsertElementInst` chain to fill the mask with
constant indices. However, it may override the original Vec lane. Using
the original Vec lane is sufficient.
2024-08-29 11:18:26 +08:00
Alexey Bataev
ec360d6523 [SLP][NFC]Add getValueType function and use instead of complex scalar type analysis 2024-08-28 13:02:59 -07:00
Philip Reames
ee764a2603
[SLP] Remove -slp-optimize-identity-hor-reduction-ops option (#106238)
This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.
2024-08-27 13:21:57 -07:00
Philip Reames
6a74b0ee59 [SLP] Use early-return in canVectorizeLoads [nfc] 2024-08-27 12:30:15 -07:00
Philip Reames
ed03070eb3
[SLP] Support vectorizing 2^N-1 reductions (#106266)
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with a 2^N-1 sized vector.

Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
   width. If the cost model does not select that VL, return to power of
   two boundaries when halving the search VL. The latter is mostly for
   simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
   non-power of two vectors. This is required to support <3 x Ty> cases.

One thing which isn't directly related to this change, but I want to
note for clarity, is that the non-power-of-two vectorization appears to
be sensitive to the operand order of the reduction. I haven't yet fully
figured out why, but I suspect this is non-power-of-two specific.
2024-08-27 12:27:03 -07:00
Alexey Bataev
2dbc6d4d4b [SLP][NFC]Assert total number of scalar uses not less than number of scalar uses, NFC. 2024-08-27 09:57:08 -07:00
Philip Reames
d0a6434e86 [SLP] Reduce scope of variable using if clause [NFC]
This particular variable name is shadowed by another lower in the
function, so reducing its scope to its single use removes the
shadowing and makes the code much less error prone.
2024-08-27 09:14:30 -07:00
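The commit does not show the code, so the following is only a generic C++17 illustration of reducing a variable's scope with an if clause; `firstPresent` is a hypothetical function unrelated to the SLP sources.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical example of the pattern, not the SLP code itself. The C++17
// if-statement initializer limits the first `It` to the if statement, so the
// later, unrelated `It` neither clashes with it nor shadows it.
int firstPresent(const std::map<std::string, int> &M, const std::string &A,
                 const std::string &B) {
  if (auto It = M.find(A); It != M.end())
    return It->second;
  // If the first `It` were declared at function scope, this declaration would
  // conflict with it (or, in a nested scope, shadow it).
  auto It = M.find(B);
  return It != M.end() ? It->second : -1;
}
```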
Alexey Bataev
9b408961eb [SLP][NFC]Use has_single_bit instead of isPowerOf2 functions, NFC. 2024-08-27 08:21:19 -07:00
Alexey Bataev
9b4a8f44ed [SLP][NFC]Improve auto types, NFC. 2024-08-27 06:11:08 -07:00
Han-Kuan Chen
3d1c63ee2c
[SLP][REVEC] Expand getelementptr into vector form. (#103704) 2024-08-27 16:11:52 +08:00
Alexey Bataev
e1d2251290 [SLP]Fix minbitwidth analysis for gather nodes with icmp users.
If the node is not in the MinBWs container and the user node is an icmp
node, the compiler should not check the type size of the user instruction:
it is always 1 and is useless for actual bitwidth analysis.

Fixes https://github.com/llvm/llvm-project/issues/105988
2024-08-26 11:40:44 -07:00
Alexey Bataev
b9d3da8c8d [SLP]Fix PR105904: the root node might be a gather node without user for reductions.
Before checking the user components of the gather/buildvector nodes,
we need to check whether the node has users at all. Root nodes might not
have users if they are nodes for the reduction.

Fixes https://github.com/llvm/llvm-project/issues/105904
2024-08-26 07:09:05 -07:00
Alexey Bataev
dab19dac94 [SLP]Fix a crash for the strided nodes with reversed order and externally used pointer.
If the strided node is reversed, need to check the last instruction,
not the first one in the list of scalars, when checking if the root
pointer must be extracted.
2024-08-23 07:35:48 -07:00
Alexey Bataev
f3d2609af3 [SLP]Improve/fix subvectors in gather/buildvector nodes handling
SLP vectorizer has an estimation for gather/buildvector nodes, which
contain some scalar loads. SLP vectorizer performs a fairly similar (but
large in SLOCs) estimation, which is not always correct. Instead, this
patch implements clustering analysis and actual node allocation with the
full analysis for the vectorized clustered scalars (not only loads, but
also some other instructions) with the correct cost estimation and
vector insert instructions. Improves overall vectorization quality and
simplifies analysis/estimations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/104144
2024-08-23 06:45:22 -07:00
Vitaly Buka
96b3166602
Revert "[SLP]Improve/fix subvectors in gather/buildvector nodes handling" (#105780)
with "[Vectorize] Fix warnings"

It introduced compiler crashes, see #104144.

This reverts commit 69332bb8995aef60d830406de12cb79a50390261 and
351f4a5593f1ef507708ec5eeca165b20add3340.
2024-08-22 22:21:20 -07:00
Vitaly Buka
351f4a5593
Reland "[Vectorize] Fix warnings"" (#105772)
The revert was wrong; the bot is still broken:
https://lab.llvm.org/buildbot/#/builders/51/builds/2838

Reverts llvm/llvm-project#105771
2024-08-22 21:14:12 -07:00
Vitaly Buka
151945151c
Revert "[Vectorize] Fix warnings" (#105771)
Triggers assert in compiler
https://lab.llvm.org/buildbot/#/builders/51/builds/2836

```
Instructions.cpp:1700: llvm::ShuffleVectorInst::ShuffleVectorInst(Value *, Value *, ArrayRef<int>, const Twine &, InsertPosition): Assertion `isValidOperands(V1, V2, Mask) && "Invalid shuffle vector instruction operands!"' failed.
```

This reverts commit a625435d3ef4c7bbfceb44498b9b5a2cbbed838b.
2024-08-22 20:03:08 -07:00
Kazu Hirata
a625435d3e [Vectorize] Fix warnings
This patch fixes warnings of the form:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:9300:23: error: loop
  variable '[E, Idx]' creates a copy from type 'const value_type' (aka
  'const std::pair<const llvm::slpvectorizer::BoUpSLP::TreeEntry *,
  unsigned int>') [-Werror,-Wrange-loop-construct]
2024-08-22 08:52:01 -07:00
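The warning quoted above fires when a range-based for loop binds each element by value through a structured binding. The actual loop at SLPVectorizer.cpp:9300 is not shown here, so this is a generic sketch: the container is assumed to be a vector of `std::pair<const TreeEntry *, unsigned>` (matching the quoted `value_type`), and `TreeEntry` and `sumIndices` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::slpvectorizer::BoUpSLP::TreeEntry.
struct TreeEntry {
  int Id;
};

using EntryIdxList = std::vector<std::pair<const TreeEntry *, unsigned>>;

// Sums the unsigned indices stored next to each entry.
unsigned sumIndices(const EntryIdxList &List) {
  unsigned Sum = 0;
  // Writing `for (const auto [E, Idx] : List)` copies each pair into a hidden
  // variable, which is what the -Wrange-loop-construct diagnostic above flags.
  // Binding by reference (`const auto &`) avoids the copy.
  for (const auto &[E, Idx] : List) {
    (void)E; // unused in this sketch
    Sum += Idx;
  }
  return Sum;
}
```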
Alexey Bataev
69332bb899
[SLP]Improve/fix subvectors in gather/buildvector nodes handling
SLP vectorizer has an estimation for gather/buildvector nodes, which
contain some scalar loads. SLP vectorizer performs a fairly similar (but
large in SLOCs) estimation, which is not always correct. Instead, this
patch implements clustering analysis and actual node allocation with the
full analysis for the vectorized clustered scalars (not only loads, but
also some other instructions) with the correct cost estimation and
vector insert instructions. Improves overall vectorization quality and
simplifies analysis/estimations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/104144
2024-08-22 11:24:08 -04:00
Alexey Bataev
9402bb0908
[SLP]Do not count extractelement costs in unreachable/landing pad blocks.
If the external user of the scalar to be extracted is in an
unreachable/landing pad block, we can skip counting its cost.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/105667
2024-08-22 11:03:34 -04:00
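The rule in this commit amounts to filtering external uses before summing extract costs. The following is a simplified, hypothetical model: `ExternalUse`, its flags, and `externalExtractCost` are invented names, and in LLVM the two conditions would be derived from the user's basic block rather than stored as booleans.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; none of these names come from LLVM.
struct ExternalUse {
  bool UserInUnreachableBlock; // user sits in an unreachable block
  bool UserInLandingPad;       // user sits in a landing pad block
  int ExtractCost;             // estimated cost of the extractelement
};

// Sums extractelement costs, skipping uses whose user is in an
// unreachable or landing pad block, as the commit describes.
int externalExtractCost(const std::vector<ExternalUse> &Uses) {
  int Cost = 0;
  for (const ExternalUse &U : Uses) {
    if (U.UserInUnreachableBlock || U.UserInLandingPad)
      continue;
    Cost += U.ExtractCost;
  }
  return Cost;
}
```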
Alexey Bataev
b765fdd997
[SLP]Try to keep scalars, used in phi nodes, if phi nodes from same block are vectorized.
Before vectorizing the PHI nodes, the compiler sorts them by the opcodes
of their operands. If a scalar is replaced by an extractelement during
vectorization, it breaks this sorting and prevents some further
vectorization attempts. The patch tries to improve this by doing extra
analysis of the scalars and keeping them if it finds that the scalar is
used in another (external) PHI node in the same block.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/103923
2024-08-21 15:23:47 -04:00
Alexey Bataev
e31252bf54 [SLP]Fix PR105120: fix the order of phi nodes vectorization.
The operands of the phi nodes should be vectorized in the same order in
which they were created; otherwise the compiler may crash when trying
to correctly build dependencies for nodes with non-schedulable
instructions for gather/buildvector nodes.

Fixes https://github.com/llvm/llvm-project/issues/105120
2024-08-21 12:22:01 -07:00
tcwzxx
816068e462
[NFC][SLP] Remove useless code of the schedule (#104697)
Currently, the SLP schedule has two containers of `ScheduleData`:
`ExtraScheduleDataMap` and `ScheduleDataMap`. However, the
`ScheduleData` in `ExtraScheduleDataMap` is only used to indicate
whether the instruction is processed or not and does not participate in
the schedule, which is useless. `ScheduleDataMap` is sufficient for this
purpose. The `OpValue` member is used only in `ExtraScheduleDataMap`,
which is also useless.
2024-08-19 20:16:51 +08:00
Alexey Bataev
4a0bbbcbcf [SLP]Fix PR104637: do not create new nodes for fully overlapped non-schedulable nodes
If the scalars do not require scheduling and were already vectorized,
but in a different order, the compiler still tries to create a new node.
This may crash the compiler for the gathered operands. Instead, such
nodes need to be treated as a full overlap, and the vectorized node is
just reshuffled.

Fixes https://github.com/llvm/llvm-project/issues/104637
2024-08-16 13:49:44 -07:00
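"Just reshuffle vectorized node" means reusing an existing vector with a permutation mask instead of building a new node. The following is a hypothetical sketch of computing such a mask: scalars are modeled as `char`s and `reuseMask` is an invented name, not part of the SLP vectorizer.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical sketch: scalars are modeled as chars. If every scalar that a
// new node would need is already a lane of an existing vectorized node (with
// the same number of lanes), return the shuffle mask that reorders the
// existing vector instead of creating a new node; otherwise return nullopt.
std::optional<std::vector<int>> reuseMask(const std::vector<char> &Existing,
                                          const std::vector<char> &Requested) {
  if (Existing.size() != Requested.size())
    return std::nullopt;
  std::vector<int> Mask;
  Mask.reserve(Requested.size());
  for (char S : Requested) {
    auto It = std::find(Existing.begin(), Existing.end(), S);
    if (It == Existing.end())
      return std::nullopt; // not a full overlap
    Mask.push_back(static_cast<int>(It - Existing.begin()));
  }
  return Mask;
}
```

For an existing node `{a, b, c, d}` and a requested order `{c, a, d, b}`, the sketch returns the mask `{2, 0, 3, 1}`.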
Han-Kuan Chen
81f8abdca4
[SLP][REVEC] Fix CreateInsertElement does not use the correct result if MinBWs applied. (#104558) 2024-08-16 21:09:48 +08:00
Alexey Bataev
b6bb208662 Revert "[SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC"
This reverts commit 2d52eb6a434fe47e67086f5ec1c3789bf6e7a604 to fix
compile time regression found in https://llvm-compile-time-tracker.com/compare.php?from=fcefe957ddfdc5a2fe9463757b597635e3436e01&to=2d52eb6a434fe47e67086f5ec1c3789bf6e7a604&stat=instructions%3Au.
2024-08-15 09:19:01 -07:00
Alexey Bataev
2d52eb6a43 [SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC 2024-08-15 08:12:20 -07:00
Alexey Bataev
56140a8258 [SLP]Fix PR104422: Wrong value truncation
The minbitwidth restrictions can be skipped only for immediate reduced
values; for other nodes we still need to check whether external users
allow bitwidth reduction.

Fixes https://github.com/llvm/llvm-project/issues/104422
2024-08-15 08:00:08 -07:00
Nikita Popov
aaab4fcf65 Revert "[SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC"
This reverts commit e1b15504a831e63af6fb9a6e83faaa10ef425ae6.

This causes compile-time regressions, see:
http://llvm-compile-time-tracker.com/compare.php?from=e687a9f2dd389a54a10456e57693f93df0c64c02&to=e1b15504a831e63af6fb9a6e83faaa10ef425ae6&stat=instructions:u

Probably some of the new SmallVector sizes are sub-optimal.
2024-08-15 15:50:48 +02:00
Alexey Bataev
e1b15504a8 [SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC 2024-08-14 12:28:45 -07:00
Alexey Bataev
20b2c9f10f [SLP][NFC]Use GatheredScalars vector instead of the original E->Scalars, NFC
GatheredScalars is a full copy of E->Scalars in these places and can
be safely used for now. Unifies the code across the function.
2024-08-14 08:29:38 -07:00
Alexey Bataev
d9b9ae6ba9 [SLP][NFC]Use transform nodes before building external uses, NFC.
In preparation for upcoming patches, this just moves the call to
the proper place, which is NFC for now.
2024-08-14 08:19:05 -07:00
Han-Kuan Chen
246f345152
[SLP][REVEC] Make CastInst support vector instructions. (#103216) 2024-08-13 23:52:32 +08:00
Han-Kuan Chen
6aad4918e8
[SLP][REVEC] Make MinBWs support vector instructions. (#103049)
If ScalarTy is FixedVectorType, it should remain as FixedVectorType.
2024-08-13 21:35:28 +08:00
Han-Kuan Chen
2256d00a14
[SLP][REVEC] Use VL.front()->getType() as ScalarTy. (#102437)
VL.front()->getType() may be FixedVectorType when revec is enabled.

Fix "Expected item in MinBWs.".
2024-08-13 19:53:45 +08:00
Han-Kuan Chen
875b551de7
[SLP][REVEC] Make computeMinimumValueSizes and collectValuesToDemote support vector instructions. (#103005) 2024-08-13 19:35:25 +08:00
Vitaly Buka
5ce47a5813
Reland "[Support] Assert that DomTree nodes share parent" (#102782)
A dominance query of a block that is in a different function is
ill-defined, so assert that getNode() is only called for blocks that are
in the same function.

There are three cases where this behavior did occur. LoopFuse didn't
explicitly do this, but didn't invalidate the SCEV block dispositions,
leaving dangling pointers to freed basic blocks behind, causing
use-after-free. We do, however, want to be able to dereference basic
blocks inside the dominator tree, so that we can refer to them by a
number stored inside the basic block.

Reverts #102780
Reland #101198
Fixes #102784

Co-authored-by: Alexis Engelke <engelke@in.tum.de>
2024-08-13 11:56:02 +02:00
Han-Kuan Chen
b4b0c02306
[SLP][REVEC] Make tryToReduce and related functions support vector instructions. (#102327) 2024-08-13 11:44:23 +08:00
Han-Kuan Chen
70cf58e6c1
[SLP][REVEC] Make SLP vectorize shufflevector. (#102489)
Add getShufflevectorNumGroups to vectorize shufflevector.

The current getShufflevectorNumGroups can only vectorize limited patterns
(e.g., the masks of shufflevector use the elements of the source in
order).

In addition, ReuseShuffleIndices and ReorderIndices are not supported.
2024-08-13 11:19:29 +08:00
Alexey Bataev
ecbbe5b431
[SLP]Fix mask building for alternate node cost estimation (#102966)
Need to use the same functionality in the cost model as in codegen to
correctly build the shuffle mask and estimate the cost.
2024-08-12 17:26:56 -04:00
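The point of this fix is that the cost model and codegen should build the alternate-node mask the same way. The following is a minimal sketch of a single helper both could share. It assumes the usual blend shape: the main and alternate operations are emitted as two full vectors, and lane I is taken from the first vector (index I) or the second (index I + VF). `Op` and `altOpMask` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

enum class Op { Add, Sub }; // hypothetical two-opcode example

// For an alternate node such as {add, sub, add, sub}, both the main and the
// alternate operation are emitted as whole vectors; lane I is then taken from
// the first vector (mask index I) or the second vector (mask index I + VF).
std::vector<int> altOpMask(const std::vector<Op> &Lanes, Op MainOp) {
  const int VF = static_cast<int>(Lanes.size());
  std::vector<int> Mask(VF);
  for (int I = 0; I < VF; ++I)
    Mask[I] = Lanes[I] == MainOp ? I : I + VF;
  return Mask;
}
```

If both the cost estimate and the emitted `shufflevector` call the same helper, they cannot disagree on the mask, which is the consistency the commit is after.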
Alexey Bataev
b10ecfa914
[SLP]Represent externally used values as original scalars, if profitable.
Currently, the SLP vectorizer tries to keep only GEPs as scalars if they
are vectorized but used externally. The same approach can be used for all
scalar values. This patch tries to keep the original scalars if all their
operands remain scalar or externally used, the cost of the original scalar
is lower than the cost of the extractelement instruction, or if the number
of externally used scalars in the same entry is a power of 2. The last
criterion allows better revectorization for multiply used scalars.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/100904
2024-08-12 10:15:02 -04:00
Alexey Bataev
34514ce09a [SLP][NFC]Use local getShuffleCost function across the code, NFC. 2024-08-12 06:49:53 -07:00
Alexey Bataev
2a05971de2 [SLP]Add index of the node to the short name output.
Improves debugging experience, does nothing with the functionality.
2024-08-08 08:57:14 -07:00
Han-Kuan Chen
7a4fc7491c
[SLP][REVEC] Fix insertelement has multiple uses. (#102329) 2024-08-08 23:23:10 +08:00
Alexey Bataev
7e7a439705
[SLP][NFC]Introduce CombinedVectorize nodes, NFC. (#99309)
This adds a combined vectorized node. It simplifies handling of
combined nodes, like select/cmp, which can be reduced to min/max,
mul/add transformed to fma, etc. Improves cost model handling and may
end up with better codegen in the future (direct emission of the intrinsics).
2024-08-08 08:05:33 -04:00