1878 Commits

Author SHA1 Message Date
Han-Kuan Chen
d7c44eff42 [SLP][REVEC] Update test. NFC. 2024-09-03 06:58:15 -07:00
Han-Kuan Chen
ce8ec31298
[SLP][REVEC] Support more mask pattern usage in shufflevector. (#106212) 2024-09-03 21:30:40 +08:00
Alexey Bataev
b74e09cb20 [SLP]Check for the whole vector vectorization in unique scalars analysis
Need to check that a whole number of registers is being vectorized
before actually trying to build the node, to avoid a compiler
crash.
2024-09-03 06:19:21 -07:00
Alexey Bataev
f381cd0699 [SLP]Fix PR107036: Check if the type of the user is sizable before requesting its size.
Only some instructions should be considered as potentially reducing the
size of the operands types, not all instructions should be considered.

Fixes https://github.com/llvm/llvm-project/issues/107036
2024-09-03 05:29:59 -07:00
Simon Pilgrim
6c8746b6e3
[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844)
Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents
2024-09-03 10:05:56 +01:00
Yingwei Zheng
a156b5a47d
[SLP] Add vectorization support for [u|s]cmp (#106747)
This patch adds vectorization support for [u|s]cmp intrinsic calls.
2024-09-02 17:06:07 +08:00
Alexey Bataev
6e68fa921b [SLP]Fix PR106909: add a check for unsafe FP operations.
NEON has non-IEEE compliant denormal flushing and the compiler should
check if it is safe to vectorize instructions for NEON in non-fast math
mode.

Fixes https://github.com/llvm/llvm-project/issues/106909
2024-09-01 07:10:09 -07:00
Alexey Bataev
803ab28090 [SLP][NFC]Add a test with unsafe fp vectorization. 2024-09-01 07:05:36 -07:00
tcwzxx
24a043a6ff
[SLP] Fix crash of shuffle poison (#106857)
When the shuffle masks are `PoisonMaskElem`, there is no need to check
the cost of `SK_ExtractSubvector`. It is free. Otherwise, it will cause
the compiler to crash.

Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) &&
"SK_ExtractSubvector index out of range"' failed.
2024-09-01 20:24:09 +08:00
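A minimal sketch of the guard this fix describes, written in Python rather than the vectorizer's C++. `PoisonMaskElem` is -1 in LLVM; everything else here is an assumption made for illustration: `extract_subvector_cost` is a hypothetical stand-in for the target cost query (with an arbitrary flat cost of 1), and `shuffle_cost` is not a function in the SLP vectorizer.

```python
POISON_MASK_ELEM = -1  # mirrors LLVM's PoisonMaskElem sentinel


def extract_subvector_cost(idx, elts_per_vector, num_elts):
    """Hypothetical stand-in for a target cost query.

    It enforces the same range condition as the assertion quoted above.
    """
    # alignTo(num_elts, elts_per_vector): round up to a multiple.
    aligned = -(-num_elts // elts_per_vector) * elts_per_vector
    assert idx + elts_per_vector <= aligned, "SK_ExtractSubvector index out of range"
    return 1  # assumed flat cost, for illustration only


def shuffle_cost(mask, idx, elts_per_vector, num_elts):
    """Return 0 for an all-poison mask, otherwise ask the cost model."""
    if all(m == POISON_MASK_ELEM for m in mask):
        return 0  # nothing is actually extracted, so the shuffle is free
    return extract_subvector_cost(idx, elts_per_vector, num_elts)
```

With an all-poison mask the cost query is never reached, so an index that would trip the range assertion no longer matters.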
Alexey Bataev
a3ea90ffbb [SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/106449
2024-08-31 08:14:49 -07:00
Martin Storsjö
9e86d4f2ed Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands."
This reverts commit 6ab07d71174982e5cb95420ee4df01347333c342.

This commit caused failed asserts, see
https://github.com/llvm/llvm-project/pull/106449.
2024-08-31 14:53:08 +03:00
Alexey Bataev
6ab07d7117
[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/106449
2024-08-30 14:50:34 -04:00
Alexey Bataev
079746d2c0
[SLP]Better cost estimation for masked gather or "clustered" loads.
After landing support for actual vectorization of the "clustered" loads,
we need to better estimate the cost between the masked gather and clustered loads.
This includes estimation of the address calculation and better
estimation of the gathered loads. Also, this estimation now relies on
the SLPCostThreshold option, allowing users to modify the behavior of the compiler.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/105858
2024-08-30 14:27:51 -04:00
Alexey Bataev
a4aa6bc8fc [SLP]Fix PR106667: carefully look for operand nodes.
If the operand node has the same scalars as one of the vectorized nodes,
the compiler could miss this and incorrectly request minbitwidth data
for the wrong node. It may lead to a compiler crash, because the
vectorized node might have a different minbw result.

Fixes https://github.com/llvm/llvm-project/issues/106667
2024-08-30 10:19:27 -07:00
Philip Reames
4b553f4916 Regen a bunch of vectorizer tests to avoid naming churn in upcoming review 2024-08-30 10:13:02 -07:00
Simon Pilgrim
d58d105cda
[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)
Show fallback cases in amdlibm tests where it doesn't have that specific op
2024-08-30 16:49:23 +01:00
Simon Pilgrim
ceb613a8be [RISCV] Add full test coverage for acos/asin/atan and cosh/sinh/tanh intrinsics to support #106584 2024-08-30 14:01:15 +01:00
Alexey Bataev
87a988e881 [SLP]Fix PR106655: Use FinalShuffle for alternate cast nodes.
Need to use FinalShuffle function for all vectorized results to
correctly produce vectorized value.

Fixes https://github.com/llvm/llvm-project/issues/106655
2024-08-30 05:18:21 -07:00
Simon Pilgrim
c4b5cb0f31 [AArch64] Add accelerate test coverage for acos/asin/atan and cosh/sinh/tanh intrinsics to support #106584 2024-08-30 10:58:31 +01:00
Alexey Bataev
cc943a67d1 [SLP]Fix PR106626: try several attempts for lookup values, if not found.
If the value is used in Scalar several times, the first attempt to find
its position in the node (if ReuseShuffleIndices and ReorderIndices not
empty) may fail. In this case need to find another copy of the same
value and try again.
Fixes https://github.com/llvm/llvm-project/issues/106626
2024-08-29 15:07:20 -07:00
Alexey Bataev
aeedab77b5 [SLP]Correctly decide if the non-power-of-2 number of stores can be vectorized.
Need to consider the maximum type size in the graph before attempting
to vectorize a non-power-of-2 number of elements, which may be
less than MinVF.
2024-08-29 12:40:31 -07:00
Philip Reames
22ba351108 [RISCV][SLP] Test for <3 x Ty> reductions which require reordering
These tests show a vectorizable reduction where the order of the
reduction has been adjusted so that profitable vectorization requires
a reordering of the computation. We currently have no reordering
in SLP for non-power-of-two vectors, so this doesn't work.

Note that due to reassociation performed in the standard pipeline,
this is actually the canonical form for a reduction reaching SLP.
2024-08-29 11:42:09 -07:00
Elvina Yakubova
9167667b5c
[SLP] Fix REQUIRES line for failing tests (#106531) 2024-08-29 12:38:08 +01:00
Alexey Bataev
fdf72c992b [SLP]Fix a crash when requesting the cost for buildvector cmp nodes types.
Need to use original cmp type i1 when estimating the cost for the
buildvector node, not its operand types to prevent compiler crash upon
TTI cost estimation.
2024-08-29 03:53:28 -07:00
Elvina Yakubova
ddbc8f331a
[SLP] Move some of X86 tests to common directory (#106401)
Some of the tests from X86 directory can be generalized for AArch64 to
improve its coverage.
2024-08-29 11:28:33 +01:00
tcwzxx
121fb2c2cc
[SLP] Fix the Vec lane overridden by the shuffle mask (#106341)
Currently, SLP uses shuffle for the external user of `InsertElementInst`
and iterates through the `InsertElementInst` chain to fill the mask with
constant indices. However, it may override the original Vec lane. Using
the original Vec lane is sufficient.
2024-08-29 11:18:26 +08:00
Alexey Bataev
be7014e95a [SLP][NFC]Add a test with non-power-of-2 (but still whole vector) operands. 2024-08-28 10:08:20 -07:00
Philip Reames
ee764a2603
[SLP] Remove -slp-optimize-identity-hor-reduction-ops option (#106238)
This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.
2024-08-27 13:21:57 -07:00
Philip Reames
ed03070eb3
[SLP] Support vectorizing 2^N-1 reductions (#106266)
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vector.

Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
   width. If the cost model does not select that VL, return to power of two
   boundaries when halving the search VL. The latter is mostly for simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
   non-power-of-two vectors. This is required to support <3 x Ty> cases.

One thing which isn't directly related to this change, but I want to
note for clarity, is that the non-power-of-two vectorization appears to
be sensitive to the operand order of the reduction. I haven't yet fully figured
out why, but I suspect this is non-power-of-two specific.
2024-08-27 12:27:03 -07:00
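The width search described in this commit can be sketched roughly as follows. This is an illustrative Python model, not the vectorizer's code: `candidate_widths`, `bit_floor`, and `bit_ceil` are names chosen here, and the rule for picking the starting width (use the value count itself only when it has the form 2^N-1, otherwise round down to a power of two) is an assumption. The cost model is left out entirely; the function only lists the widths that would be tried in order.

```python
def bit_floor(n):
    """Largest power of two that is <= n (n >= 1)."""
    return 1 << (n.bit_length() - 1)


def bit_ceil(n):
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def candidate_widths(num_reduced_vals, min_width=3):
    """List the reduction widths tried, widest first."""
    # Assumption: start at 2^N-1 only when the value count has that form.
    if num_reduced_vals & (num_reduced_vals + 1) == 0:
        width = num_reduced_vals
    else:
        width = bit_floor(num_reduced_vals)
    widths = []
    while width >= min_width:
        widths.append(width)
        # Halving returns to a power-of-two boundary: 7 -> 4, 8 -> 4.
        width = bit_ceil(width) // 2
    return widths
```

Under these assumptions, seven reduced values give the widths 7 then 4, and three reduced values give the single width 3, which is only reachable because the minimum width is 3 rather than 4.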
Philip Reames
acb33a0c9b [RISCV][SLP] Add test coverage for 2^N-1 vector sizes w/FP types
Our cost modeling for FP and integer differs in enough cases that
having both is useful for exercising different logic in SLP.
2024-08-27 10:56:32 -07:00
Philip Reames
4dda564c72 [RISCV][SLP] Add test coverage for 2^N-1 vector sizes
Mostly copied from the AArch64 coverage for same, but also added
a couple tests for reductions which aren't currently supported.
2024-08-27 10:24:46 -07:00
Han-Kuan Chen
3d1c63ee2c
[SLP][REVEC] Expand getelementptr into vector form. (#103704) 2024-08-27 16:11:52 +08:00
Alexey Bataev
2a50dac9fb [RISCV][TTI]Fix the cost estimation for long select shuffle.
The code was broken completely. Need to iterate over the whole mask and
process the submasks correctly, check if they form full identity and
adjust indices correctly.

Fixes https://github.com/llvm/llvm-project/issues/106126
2024-08-26 17:27:52 -07:00
Alexey Bataev
e1d2251290 [SLP]Fix minbitwidth analysis for gather nodes with icmp users.
If the node is not in MinBWs container and the user node is icmp node,
the compiler should not check the type size of the user instruction, it
is always 1 and is not good for actual bitwidth analysis.

Fixes https://github.com/llvm/llvm-project/issues/105988
2024-08-26 11:40:44 -07:00
Alexey Bataev
625e929d43 [SLP][NFC]Add a test with incorrect reduced gather node with extra use in cmp node, NFC. 2024-08-26 09:40:36 -07:00
Alexey Bataev
b9d3da8c8d [SLP]Fix PR105904: the root node might be a gather node without user for reductions.
Before checking the user components of the gather/buildvector nodes,
need to check if the node has users at all. Root nodes might not have
users, if it is a node for the reduction.

Fixes https://github.com/llvm/llvm-project/issues/105904
2024-08-26 07:09:05 -07:00
Alexey Bataev
dab19dac94 [SLP]Fix a crash for the strided nodes with reversed order and externally used pointer.
If the strided node is reversed, need to check for the last instruction,
not the first one in the list of scalars, when checking if the root
pointer must be extracted.
2024-08-23 07:35:48 -07:00
Alexey Bataev
f3d2609af3 [SLP]Improve/fix subvectors in gather/buildvector nodes handling
SLP vectorizer has an estimation for gather/buildvector nodes, which
contain some scalar loads. SLP vectorizer performs a pretty similar (but
large in SLOCs) estimation, which is not always correct. Instead, this
patch implements clustering analysis and actual node allocation with the
full analysis for the vectorized clustered scalars (not only loads, but
also some other instructions) with the correct cost estimation and
vector insert instructions. Improves overall vectorization quality and
simplifies analysis/estimations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/104144
2024-08-23 06:45:22 -07:00
Vitaly Buka
96b3166602
Revert "[SLP]Improve/fix subvectors in gather/buildvector nodes handling" (#105780)
with "[Vectorize] Fix warnings"

It introduced compiler crashes, see #104144.

This reverts commit 69332bb8995aef60d830406de12cb79a50390261 and
351f4a5593f1ef507708ec5eeca165b20add3340.
2024-08-22 22:21:20 -07:00
Alexey Bataev
69332bb899
[SLP]Improve/fix subvectors in gather/buildvector nodes handling
SLP vectorizer has an estimation for gather/buildvector nodes, which
contain some scalar loads. SLP vectorizer performs a pretty similar (but
large in SLOCs) estimation, which is not always correct. Instead, this
patch implements clustering analysis and actual node allocation with the
full analysis for the vectorized clustered scalars (not only loads, but
also some other instructions) with the correct cost estimation and
vector insert instructions. Improves overall vectorization quality and
simplifies analysis/estimations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/104144
2024-08-22 11:24:08 -04:00
Alexey Bataev
9402bb0908
[SLP]Do not count extractelement costs in unreachable/landing pad blocks.
If the external user of the scalar to be extracted is in an
unreachable/landing pad block, we can skip counting the extractelement cost.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/105667
2024-08-22 11:03:34 -04:00
Simon Pilgrim
f673882323
[X86] Allow speculative BSR/BSF instructions on targets with CMOV (#102885)
Currently targets without LZCNT/TZCNT won't speculate with BSR/BSF instructions in case they have a zero value input, meaning we always insert a test+branch for the zero-input case.

This patch proposes we allow speculation if the target has CMOV, and perform a branchless select instead to handle the zero input case. This will predominately help x86-64 targets where we haven't set any particular cpu target. We already always perform BSR/BSF instructions if we were lowering a CTLZ/CTTZ_ZERO_UNDEF instruction.
2024-08-22 11:11:00 +01:00
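A small Python model of the idea in this commit, offered as a sketch rather than a description of the generated code. BSF gives no usable result for a zero input, so a count-trailing-zeros that must handle zero either branches around the instruction or, as proposed here, runs it unconditionally and selects the answer afterwards. The function names and the 32-bit default width are assumptions for illustration.

```python
def bsf(x):
    """Index of the lowest set bit; models BSF only for nonzero input."""
    if x == 0:
        raise ValueError("BSF result is undefined for a zero input")
    return (x & -x).bit_length() - 1


def cttz_branchless(x, width=32):
    """Model of cttz as BSF followed by a select on the zero case."""
    x &= (1 << width) - 1
    # On hardware both sides are computed and CMOV picks one; here the
    # substitute input 1 keeps the model from touching the undefined case.
    raw = bsf(x if x != 0 else 1)
    return width if x == 0 else raw
```

The final conditional expression plays the role of the CMOV: the scan result is always produced and then replaced by the bit width when the input was zero.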
Alexey Bataev
b765fdd997
[SLP]Try to keep scalars, used in phi nodes, if phi nodes from same block are vectorized.
Before doing the vectorization of the PHI nodes, the compiler sorts them
by the opcodes of the operands. If the scalar is replaced during the
vectorization by extractelement, it breaks this sorting and prevents some
further vectorization attempts. Patch tries to improve this by doing
extra analysis of the scalars and tries to keep them, if it is found that
this scalar is used in other (external) PHI node in the same block.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/103923
2024-08-21 15:23:47 -04:00
Alexey Bataev
e31252bf54 [SLP]Fix PR105120: fix the order of phi nodes vectorization.
The operands of the phi nodes should be vectorized in the same order, in
which they were created, otherwise the compiler may crash when trying
to correctly build dependency for nodes with non-schedulable
instructions for gather/buildvector nodes.

Fixes https://github.com/llvm/llvm-project/issues/105120
2024-08-21 12:22:01 -07:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle fewer
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
Alexey Bataev
4a0bbbcbcf [SLP]Fix PR104637: do not create new nodes for fully overlapped non-schedulable nodes
If the scalars do not require scheduling and were already vectorized,
but in a different order, the compiler still tries to create the new node.
It may cause a compiler crash for the gathered operands. Instead, need
to consider such nodes as full overlap and just reshuffle the vectorized
node.

Fixes https://github.com/llvm/llvm-project/issues/104637
2024-08-16 13:49:44 -07:00
Han-Kuan Chen
81f8abdca4
[SLP][REVEC] Fix CreateInsertElement does not use the correct result if MinBWs applied. (#104558) 2024-08-16 21:09:48 +08:00
Alexey Bataev
56140a8258 [SLP]Fix PR104422: Wrong value truncation
The minbitwidth restrictions can be skipped only for immediate reduced
values, for other nodes still need to check if external users allow
bitwidth reduction.

Fixes https://github.com/llvm/llvm-project/issues/104422
2024-08-15 08:00:08 -07:00
Alexey Bataev
65ac12d3c9 [SLP][NFC]Add a test with incorrect minbitwidth analysis for reduced operands 2024-08-15 07:26:44 -07:00
Han-Kuan Chen
246f345152
[SLP][REVEC] Make CastInst support vector instructions. (#103216) 2024-08-13 23:52:32 +08:00