1893 Commits

Author SHA1 Message Date
Philip Reames
63e8a1b16f
[SLP] Enable reordering for non-power-of-two vectors (#106638)
This change tries to enable vector reordering during vectorization for
non-power-of-two vectors. Specifically, my goal is to be able to
vectorize reductions whose operands appear in other than identity order
(e.g. a[1] + a[0] + a[2]). In our standard pass pipeline, Reassociation
effectively canonicalizes towards this form. So for reduction
vectorization to be widely applicable, we need this feature.

This change enables the use of a non-empty ReorderIndices structure -
which is effectively required for out of order loads or gathers - while
leaving the ReuseShuffleIndices mechanism unused and disabled. If I've
understood the code structure, the former is used when describing
implicit shuffles required by the vectorization strategy (i.e. loading
elements 0,1,3,2 in the order 0,1,2,3 and then shuffling later), while
the latter is used when trying to optimize explode/buildvectors (called
gathers in this code).

I audited all the code enabled by this change, but can't claim to
deeply understand most of it. I added a couple of bailouts in places
which appeared to be difficult to audit and optional optimizations. I've
tried to do so in the least risky way I can, but am not completely
confident in this change. Careful review appreciated.
2024-09-05 07:52:27 -07:00
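The "load in memory order, shuffle later" idea from the message above can be illustrated with a small standalone sketch. It does not use LLVM's data structures: `applyShuffleMask` and `reduceAdd` are hypothetical helpers, and the mask direction (lane I takes `Loaded[Mask[I]]`) is an assumption that may not match how ReorderIndices is interpreted inside SLPVectorizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an LLVM API: lane I of the result holds
// Loaded[Mask[I]], which is how a single-source shuffle mask is read.
std::vector<int> applyShuffleMask(const std::vector<int> &Loaded,
                                  const std::vector<std::size_t> &Mask) {
  std::vector<int> Result;
  Result.reserve(Mask.size());
  for (std::size_t Src : Mask) {
    assert(Src < Loaded.size() && "mask index out of range");
    Result.push_back(Loaded[Src]);
  }
  return Result;
}

// Hypothetical helper: integer add-reduction over all lanes. Integer
// addition is associative and commutative, so lane order does not matter.
int reduceAdd(const std::vector<int> &Lanes) {
  int Sum = 0;
  for (int V : Lanes)
    Sum += V;
  return Sum;
}
```

Under these assumptions, the scalars wanted in the order 0,1,3,2 are loaded contiguously as 0,1,2,3 and then permuted with the mask {0,1,3,2}; for an integer add-reduction such as a[1] + a[0] + a[2], the permutation does not change the sum.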
Alexey Bataev
75dc9af1a2 [SLP][NFC]Remove some dead code + reorder calls to avoid extra checks 2024-09-04 07:24:35 -07:00
Alexey Bataev
d65ff3e936 [SLP]Fix PR107198: add a check for empty complex type
Need to check if the complex type is empty before digging into it to
find a vectorizable type.

Fixes https://github.com/llvm/llvm-project/issues/107198
2024-09-04 05:13:43 -07:00
Alexey Bataev
af1e59aea2 [SLP]Fix PR107037: correctly track origonal/modified after vectorizations reduced values
Need to correctly track reduced values with multiple uses in the same
reduction emission attempt. Otherwise, the number of reuses might be
calculated incorrectly, which may cause a compiler crash.

Fixes https://github.com/llvm/llvm-project/issues/107037
2024-09-04 04:54:32 -07:00
Philip Reames
3e8840ba71 Remove "Target" from createXReduction naming [nfc]
Despite the stale comments, none of these actually use TTI, and they're
solely generating standard LLVM IR.
2024-09-03 17:03:55 -07:00
Alexey Bataev
dce73e115e Revert "[SLP]Fix PR107037: correctly track origonal/modified after vectorizations reduced values"
This reverts commit 98bb354a0add4aeb614430f48a23f87992166239 to fix
buildbots https://lab.llvm.org/buildbot/#/builders/155/builds/2056 and https://lab.llvm.org/buildbot/#/builders/11/builds/4407
2024-09-03 16:14:17 -07:00
Alexey Bataev
98bb354a0a [SLP]Fix PR107037: correctly track origonal/modified after vectorizations reduced values
Need to correctly track reduced values with multiple uses in the same
reduction emission attempt. Otherwise, the number of reuses might be
calculated incorrectly, which may cause a compiler crash.

Fixes https://github.com/llvm/llvm-project/issues/107037
2024-09-03 15:49:19 -07:00
Kazu Hirata
53d3d1ab9a
[SLPVectorizer] Avoid two successive hash lookups on the same key (#107143)
This patch replaces the find-try_emplace sequence with just one call
to try_emplace, thereby avoiding two successive hash lookups on the
same key.  I am not using the "inserted" boolean from try_emplace to
preserve the original behavior (that is, before PR 107123) that checks
to see if the value is nullptr or not.
2024-09-03 14:51:00 -07:00
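The single-lookup pattern described above can be sketched with the standard library. This is an illustration only: it uses `std::unordered_map` as a stand-in for `DenseMap`, and `Entry`, `EntryMap`, and `getOrCreate` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical payload type used only for this illustration.
struct Entry {
  int Id;
};

using EntryMap = std::unordered_map<std::string, std::unique_ptr<Entry>>;

// Hypothetical helper: returns the entry for Key, creating it on demand.
Entry *getOrCreate(EntryMap &Map, const std::string &Key, int &NextId) {
  // One hash lookup: try_emplace either finds the existing slot or
  // inserts a slot holding a null pointer.
  auto Result = Map.try_emplace(Key, nullptr);
  std::unique_ptr<Entry> &Slot = Result.first->second;
  // Test the stored pointer instead of Result.second, so a key that was
  // already present with a null value is still populated.
  if (!Slot)
    Slot = std::make_unique<Entry>(Entry{NextId++});
  return Slot.get();
}
```

Checking the stored pointer rather than the "inserted" flag mirrors the behavior the message says it wants to preserve: a pre-existing null entry is treated the same as a missing one.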
Kazu Hirata
126940bde3
[SLPVectorizer] Use DenseMap::{find,try_emplace} (NFC) (#107123)
I'm planning to deprecate and eventually remove
DenseMap::FindAndConstruct in favor of operator[].
2024-09-03 11:25:35 -07:00
Alexey Bataev
571c8c2c88 Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands."
This reverts commit a3ea90ffbbe47d9a1b3eab03324f09d7b8e0dcb3 after the
post commit review. The number of parts is calculated incorrectly.
2024-09-03 11:02:07 -07:00
Alexey Bataev
884d7c137a Revert "[SLP]Check for the whole vector vectorization in unique scalars analysis"
This reverts commit b74e09cb20e6218320013b54c9ba2f5c069d44b9 after
post-commit review. The number of parts is calculated incorrectly.
2024-09-03 11:02:07 -07:00
Jie Fu
20fa37bbfa [Vectorize] Fix -Wunused-variable in SLPVectorizer.cpp (NFC)
/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10310:26:
error: unused variable 'isExtractSubvectorMask' [-Werror,-Wunused-variable]
                    bool isExtractSubvectorMask =
                         ^
1 error generated.
2024-09-03 21:46:42 +08:00
Han-Kuan Chen
ce8ec31298
[SLP][REVEC] Support more mask pattern usage in shufflevector. (#106212) 2024-09-03 21:30:40 +08:00
Alexey Bataev
b74e09cb20 [SLP]Check for the whole vector vectorization in unique scalars analysis
Need to check that a whole number of registers is attempted to be
vectorized before actually trying to build the node, to avoid a compiler
crash.
2024-09-03 06:19:21 -07:00
Alexey Bataev
f381cd0699 [SLP]Fix PR107036: Check if the type of the user is sizable before requesting its size.
Only some instructions should be considered as potentially reducing the
size of the operands types, not all instructions should be considered.

Fixes https://github.com/llvm/llvm-project/issues/107036
2024-09-03 05:29:59 -07:00
Alexey Bataev
6e68fa921b [SLP]Fix PR106909: add a check for unsafe FP operations.
NEON has non-IEEE compliant denormal flushing and the compiler should
check if it is safe to vectorize instructions for NEON in non-fast math
mode.

Fixes https://github.com/llvm/llvm-project/issues/106909
2024-09-01 07:10:09 -07:00
tcwzxx
24a043a6ff
[SLP] Fix crash of shuffle poison (#106857)
When the shuffle masks are `PoisonMaskElem`, there is no need to check
the cost of `SK_ExtractSubvector`. It is free. Otherwise, it will cause
the compiler to crash.

Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) &&
"SK_ExtractSubvector index out of range"' failed.
2024-09-01 20:24:09 +08:00
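A minimal sketch of the early-out described above, written without LLVM headers. `PoisonMaskElem` is redefined locally with the value -1 that LLVM uses for it; `isAllPoisonMask` and `extractSubvectorCost` are hypothetical helpers, and `QueryCost` is an assumed stand-in for whatever target cost query would otherwise run.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Local stand-in for LLVM's sentinel of the same name (value -1).
constexpr int PoisonMaskElem = -1;

// Hypothetical helper: true when every mask element is poison.
bool isAllPoisonMask(const std::vector<int> &Mask) {
  return std::all_of(Mask.begin(), Mask.end(),
                     [](int M) { return M == PoisonMaskElem; });
}

// Hypothetical wrapper: a mask made only of poison elements selects
// nothing, so it is treated as free and the cost query is skipped.
template <typename CostFn>
int extractSubvectorCost(const std::vector<int> &Mask, CostFn QueryCost) {
  if (isAllPoisonMask(Mask))
    return 0;
  return QueryCost(Mask);
}
```

Skipping the query is the point of the sketch: in the reported crash, the assertion fired inside the cost query itself, so the check has to happen before the query is made.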
Alexey Bataev
a3ea90ffbb [SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/106449
2024-08-31 08:14:49 -07:00
Martin Storsjö
9e86d4f2ed Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands."
This reverts commit 6ab07d71174982e5cb95420ee4df01347333c342.

This commit caused failed asserts, see
https://github.com/llvm/llvm-project/pull/106449.
2024-08-31 14:53:08 +03:00
Alexey Bataev
6ab07d7117
[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/106449
2024-08-30 14:50:34 -04:00
Alexey Bataev
8a267b7211 [SLP][NFC]Remove unused variable 2024-08-30 11:44:29 -07:00
Alexey Bataev
079746d2c0
[SLP]Better cost estimation for masked gather or "clustered" loads.
After landing support for actual vectorization of the "clustered" loads,
we need to better estimate the cost between the masked gather and
clustered loads. This includes estimation of the address calculation and
better estimation of the gathered loads. Also, this estimation now relies
on the SLPCostThreshold option, allowing users to modify the behavior of
the compiler.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/105858
2024-08-30 14:27:51 -04:00
Alexey Bataev
6023d17e6b [SLP][NFC]Add a function description, NFC. 2024-08-30 10:35:10 -07:00
Alexey Bataev
a4aa6bc8fc [SLP]Fix PR106667: carefully look for operand nodes.
If the operand node has the same scalars as one of the vectorized nodes,
the compiler could miss this and incorrectly request minbitwidth data
for the wrong node. It may lead to a compiler crash, because the
vectorized node might have different minbw result.

Fixes https://github.com/llvm/llvm-project/issues/106667
2024-08-30 10:19:27 -07:00
Simon Pilgrim
b719c92551 [SLP] findBestRootPair - fix incorrect argument name comment. NFC. 2024-08-30 14:45:48 +01:00
Simon Pilgrim
96ad495289 [SLP] vectorizeChainsInBlock - remove superfluous continue at the end of for loop. NFC. 2024-08-30 14:45:48 +01:00
Alexey Bataev
87a988e881 [SLP]Fix PR106655: Use FinalShuffle for alternate cast nodes.
Need to use FinalShuffle function for all vectorized results to
correctly produce vectorized value.

Fixes https://github.com/llvm/llvm-project/issues/106655
2024-08-30 05:18:21 -07:00
Alexey Bataev
cc943a67d1 [SLP]Fix PR106626: try several attempts for lookup values, if not found.
If the value is used in Scalar several times, the first attempt to find
its position in the node (if ReuseShuffleIndices and ReorderIndices are
not empty) may fail. In this case, we need to find another copy of the
same value and try again.
Fixes https://github.com/llvm/llvm-project/issues/106626
2024-08-29 15:07:20 -07:00
Alexey Bataev
aeedab77b5 [SLP]Correctly decide if the non-power-of-2 number of stores can be vectorized.
Need to consider the maximum type size in the graph before attempting
the vectorization of a non-power-of-2 number of elements, which may be
less than MinVF.
2024-08-29 12:40:31 -07:00
Philip Reames
4bc7c74240
[SLP] Extract isIdentityOrder to common routine [probably NFC] (#106582)
This isn't quite just code motion as the four different versions we had
of this routine differed in whether they ignored the "size" marker used
to represent undef. I doubt this matters in practice, but it is a
functional change.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-08-29 11:00:31 -07:00
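The behavioral difference the message mentions can be shown with a standalone sketch. The signature below is hypothetical and is not the routine from the patch: it assumes an order is a list of lane indices, that an entry equal to the list's size marks an undefined lane, and it exposes the "ignore the marker" choice as an explicit flag purely for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: an order is "identity" when lane I maps to I.
// An entry equal to Order.size() is assumed to be the undef marker; the
// IgnoreUndef flag selects whether such entries are accepted.
bool isIdentityOrder(const std::vector<unsigned> &Order, bool IgnoreUndef) {
  const unsigned Size = static_cast<unsigned>(Order.size());
  for (unsigned I = 0; I < Size; ++I) {
    if (Order[I] == I)
      continue;
    if (IgnoreUndef && Order[I] == Size)
      continue;
    return false;
  }
  return true;
}
```

With these assumptions, {0, 1, 4, 3} counts as identity only when the marker is ignored, which is the kind of divergence between the earlier copies that makes the merge a functional change.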
Philip Reames
b5a1b45fe3 [SLP] Early return in getReorderingData [nfc] 2024-08-29 08:58:27 -07:00
Alexey Bataev
50515db57f [SLP][NFC]Format canVectorizeLoads after previous NFC patches. 2024-08-29 04:31:13 -07:00
Alexey Bataev
fdf72c992b [SLP]Fix a crash when requesting the cost for buildvector cmp nodes types.
Need to use original cmp type i1 when estimating the cost for the
buildvector node, not its operand types to prevent compiler crash upon
TTI cost estimation.
2024-08-29 03:53:28 -07:00
tcwzxx
121fb2c2cc
[SLP] Fix the Vec lane overridden by the shuffle mask (#106341)
Currently, SLP uses shuffle for the external user of `InsertElementInst`
and iterates through the `InsertElementInst` chain to fill the mask with
constant indices. However, it may override the original Vec lane. Using
the original Vec lane is sufficient.
2024-08-29 11:18:26 +08:00
Alexey Bataev
ec360d6523 [SLP][NFC]Add getValueType function and use instead of complex scalar type analysis 2024-08-28 13:02:59 -07:00
Philip Reames
ee764a2603
[SLP] Remove -slp-optimize-identity-hor-reduction-ops option (#106238)
This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.
2024-08-27 13:21:57 -07:00
Philip Reames
6a74b0ee59 [SLP] Use early-return in canVectorizeLoads [nfc] 2024-08-27 12:30:15 -07:00
Philip Reames
ed03070eb3
[SLP] Support vectorizing 2^N-1 reductions (#106266)
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vector.

Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
   width. If the cost model does not select that VL, return to power of two
   boundaries when halving the search VL. The latter is mostly for
   simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
   non-power of two vectors. This is required to support <3 x Ty> cases.

One thing which isn't directly related to this change, but I want to
note for clarity is that the non-power-of-two vectorization appears to
be sensitive to operand order of reduction. I haven't yet fully figured
out why, but I suspect this is non-power-of-two specific.
2024-08-27 12:27:03 -07:00
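The width search in the two points above can be sketched as a standalone function. This is a guess at the shape of the logic, not the code from the patch: `candidateWidths` and its helpers are hypothetical, and it assumes the 2^N-1 starting width is used only when the number of reduced values is itself of the form 2^N-1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: true for 1, 2, 4, 8, ...
bool isPowerOf2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

// Hypothetical helper: largest power of two that is <= X (0 for X == 0).
unsigned bitFloor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical sketch: lists the reduction widths that would be tried,
// widest first, if the cost model rejected each one in turn.
std::vector<unsigned> candidateWidths(unsigned NumVals, bool AllowNonPow2) {
  // Minimum width drops from 4 to 3 when non-power-of-two is allowed.
  const unsigned MinWidth = AllowNonPow2 ? 3 : 4;
  std::vector<unsigned> Widths;
  if (NumVals < MinWidth)
    return Widths;
  // Assumption: start at 2^N-1 only when NumVals has that form.
  unsigned Width = (AllowNonPow2 && isPowerOf2(NumVals + 1))
                       ? NumVals
                       : bitFloor(NumVals);
  while (Width >= MinWidth) {
    Widths.push_back(Width);
    // After a 2^N-1 width, fall back to power-of-two boundaries.
    Width = isPowerOf2(Width) ? Width / 2 : bitFloor(Width);
  }
  return Widths;
}
```

For seven reduced values with the option enabled, this sketch tries 7 and then 4; with the option disabled it tries only 4, and three values are considered only when the option is enabled.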
Alexey Bataev
2dbc6d4d4b [SLP][NFC]Assert total number of scalar uses not less than number of scalar uses, NFC. 2024-08-27 09:57:08 -07:00
Philip Reames
d0a6434e86 [SLP] Reduce scope of variable using if clause [NFC]
This particular variable name is shadowed by another lower in the
function, so reducing its scope to its single use removes the
shadowing and makes the code much less error prone.
2024-08-27 09:14:30 -07:00
Alexey Bataev
9b408961eb [SLP][NFC]Use has_single_bit instead of isPowerOf2 functions, NFC. 2024-08-27 08:21:19 -07:00
Alexey Bataev
9b4a8f44ed [SLP][NFC]Improve auto types, NFC. 2024-08-27 06:11:08 -07:00
Han-Kuan Chen
3d1c63ee2c
[SLP][REVEC] Expand getelementptr into vector form. (#103704) 2024-08-27 16:11:52 +08:00
Alexey Bataev
e1d2251290 [SLP]Fix minbitwidth analysis for gather nodes with icmp users.
If the node is not in MinBWs container and the user node is icmp node,
the compiler should not check the type size of the user instruction, it
is always 1 and is not good for actual bitwidth analysis.

Fixes https://github.com/llvm/llvm-project/issues/105988
2024-08-26 11:40:44 -07:00
Alexey Bataev
b9d3da8c8d [SLP]Fix PR105904: the root node might be a gather node without user for reductions.
Before checking the user components of the gather/buildvector nodes,
need to check if the node has users at all. Root nodes might not have
users, if it is a node for the reduction.

Fixes https://github.com/llvm/llvm-project/issues/105904
2024-08-26 07:09:05 -07:00
Alexey Bataev
dab19dac94 [SLP]Fix a crash for the strided nodes with reversed order and externally used pointer.
If the strided node is reversed, need to check for the last instruction,
not the first one in the list of scalars, when checking if the root
pointer must be extracted.
2024-08-23 07:35:48 -07:00
Alexey Bataev
f3d2609af3 [SLP]Improve/fix subvectors in gather/buildvector nodes handling
SLP vectorizer has an estimation for gather/buildvector nodes, which
contain some scalar loads. SLP vectorizer performs a pretty similar (but
large in SLOCs) estimation, which is not always correct. Instead, this
patch implements clustering analysis and actual node allocation with the
full analysis for the vectorized clustered scalars (not only loads, but
also some other instructions) with the correct cost estimation and
vector insert instructions. Improves overall vectorization quality and
simplifies analysis/estimations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/104144
2024-08-23 06:45:22 -07:00
Vitaly Buka
96b3166602
Revert "[SLP]Improve/fix subvectors in gather/buildvector nodes handling" (#105780)
with "[Vectorize] Fix warnings"

It introduced compiler crashes, see #104144.

This reverts commit 69332bb8995aef60d830406de12cb79a50390261 and
351f4a5593f1ef507708ec5eeca165b20add3340.
2024-08-22 22:21:20 -07:00
Vitaly Buka
351f4a5593
Reland "[Vectorize] Fix warnings"" (#105772)
Revert was wrong, 

The bot is still broken
https://lab.llvm.org/buildbot/#/builders/51/builds/2838

Reverts llvm/llvm-project#105771
2024-08-22 21:14:12 -07:00
Vitaly Buka
151945151c
Revert "[Vectorize] Fix warnings" (#105771)
Triggers assert in compiler
https://lab.llvm.org/buildbot/#/builders/51/builds/2836

```
Instructions.cpp:1700: llvm::ShuffleVectorInst::ShuffleVectorInst(Value *, Value *, ArrayRef<int>, const Twine &, InsertPosition): Assertion `isValidOperands(V1, V2, Mask) && "Invalid shuffle vector instruction operands!"' failed.
```

This reverts commit a625435d3ef4c7bbfceb44498b9b5a2cbbed838b.
2024-08-22 20:03:08 -07:00