1327 Commits

Author SHA1 Message Date
Alexey Bataev
641939baa9 [SLP]Remove CreateShuffle lambda and reuse ShuffleBuilder functions.
After merging the main part of the gather/buildvector code, the CreateShuffle
lambda can be removed and the ShuffleBuilder add functions can be used instead.
Also, part of the code from CreateShuffle was migrated to the
BaseShuffleAnalysis::createShuffle function for better code emission.

Differential Revision: https://reviews.llvm.org/D145988
2023-03-14 10:15:41 -07:00
Alexey Bataev
874c49f554 [SLP]Fix PR61395: need to adjust vector factor after emitting shuffle
operation for combined entries.

The vector factor after combining the shuffle entries is defined by
the size of the mask, not by the vector factors of the original
entries. So, it needs to be adjusted to emit correct code.
2023-03-14 06:27:08 -07:00
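
A rough illustration of the point made in the commit above, written as a plain Python sketch rather than the SLP vectorizer code. The helper `combine_entries` and the list-based "vectors" are hypothetical and only emulate a two-source shuffle:

```python
def combine_entries(v1, v2, mask):
    """Emulate a two-source shuffle: indices below len(v1) select from v1,
    the remaining indices select from v2, and -1 marks an undefined lane."""
    both = list(v1) + list(v2)
    return [None if i < 0 else both[i] for i in mask]


v1 = ["a0", "a1", "a2", "a3"]  # original entry with vector factor 4
v2 = ["b0", "b1", "b2", "b3"]  # original entry with vector factor 4
mask = [0, 5]                  # combined mask with only 2 lanes

combined = combine_entries(v1, v2, mask)
vf = len(combined)             # follows the mask size, not len(v1) or len(v2)
```

Any later step that keeps using the factor of the original entries (4 here) would describe a vector that no longer exists, which is why the factor has to be re-derived from the mask.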
Sjoerd Meijer
775451b66a [AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation
This slightly increases the costs of InsertElement instructions that are part
of a vector splat sequence, i.e. a load, InsertElement and a shuffle (load +
dup). The resulting LD1R is a high latency instruction, and this slight
increase in costs avoids SLP vectorisation for a couple of cases where this
isn't profitable.

Fixes: https://github.com/llvm/llvm-project/issues/61047

Differential Revision: https://reviews.llvm.org/D145578
2023-03-13 14:52:09 +00:00
Alexey Bataev
93a9be0cea [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes.
Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
But the compiler may do the same for other gather/buildvector nodes too, just need to check the
dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.

Part of D110978

Differential Revision: https://reviews.llvm.org/D144958
2023-03-10 13:19:43 -08:00
Alexey Bataev
395c11f7b8 [SLP][NFC]Add a test with phi nodes in one tree node with different
order of incoming basic blocks, NFC.
2023-03-10 12:19:08 -08:00
Alexey Bataev
d84e971f48 [SLP][NFC]Add a test with multilevel dependency between buildvector
nodes, NFC.
2023-03-10 10:33:03 -08:00
Alexey Bataev
151d3b607e [SLP][NFC]Update/simplify test to avoid dead code elimination. 2023-03-10 08:12:53 -08:00
Hans Wennborg
3b3a4c270b Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes."
This caused verifier errors:

  Instruction does not dominate all uses!
    %8 = insertelement <2 x i64> %7, i64 %pgocount1330, i64 1
    %15 = shufflevector <2 x i64> %8, <2 x i64> poison, <2 x i32> <i32 1, i32 1>
  in function ?NearestInclusiveAncestorAssignedToSlot@SlotScopedTraversal@blink@@SAPAVElement@2@ABV32@@Z

(or register allocator crash when the verifier was disabled).

See comment on the code review.

> Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
> But the compiler may do the same for other gather/buildvector nodes too, just need to check the
> dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.
>
> Part of D110978
>
> Differential Revision: https://reviews.llvm.org/D144958

This reverts commit a611b3f3059e4c3b9e7b914091c3edaef099fd5d.
It also reverts 7a4061ae372b3262703ffeea3b64db89187db611 which depended on the above.
2023-03-10 14:40:12 +01:00
Ben Shi
013235a200 [RISCV][NFC] Add tests for SLP vectorization of math functions
RISCV has "vfabs.v" and "vfsqrt.v" so math functions abs and sqrt
can be SLP vectorized. But others (exp/log/sin/asin/sinh/asinh/...)
cannot.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D145562
2023-03-10 07:34:21 +08:00
Alexey Bataev
7a4061ae37 [SLP][NFC]Update/simplify test to avoid dead code elimination. 2023-03-08 13:49:25 -08:00
Alexey Bataev
a611b3f305 [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes.
Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
But the compiler may do the same for other gather/buildvector nodes too, just need to check the
dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.

Part of D110978

Differential Revision: https://reviews.llvm.org/D144958
2023-03-07 12:45:40 -08:00
Alexey Bataev
c411965820 [SLP]Fix PR61224: Compiler hits infinite loop.
IRBuilder in many cases is able to fold constant code automatically,
but in some cases (for some intrinsics) it cannot. The calculation needs
to be performed manually if a constant is provided in these corner cases,
to avoid an infinite loop.
2023-03-06 13:46:41 -08:00
Alexey Bataev
6b9be26207 [SLP][NFC]Update the test to avoid dead code elimination, NFC. 2023-03-03 06:10:15 -08:00
Alexey Bataev
931bba2bc3 [SLP][NFC]Add a test with reused scalars in 3 tree nodes with different
VF, NFC.
2023-03-02 10:50:03 -08:00
Alexey Bataev
4e4ad3ab0e [SLP][NFC]Update the test to simplify and avoid dead instruction
removal, NFC.
2023-03-01 06:35:56 -08:00
Alexey Bataev
1d6b5b66bb [SLP]Fix PR61050: Assertion `I->use_empty() && "trying to erase instruction with users."
When gathering the counter for the reused scalars, need to use reduced
value, not the original reduced value. Same values counter is gathered
for reduced values, not original ones.
2023-02-28 07:51:34 -08:00
Vasileios Porpodas
a700fb3d9b [SLP] Fixes crash in BoUpSLP::isGatherShuffledEntry()
Crash caused by: 708eb1b96d9a36f9c0182b7d53c492059778fa35

Differential Revision: https://reviews.llvm.org/D144895
2023-02-27 12:29:25 -08:00
Alexey Bataev
007177bdde [SLP]Fix PR61018: Assertion `Mask[I] == UndefMaskElem && "Multiple uses
of scalars."' failed.

Need to check for the reused indices when checking if 2 insertelement
instructions are from the same buildvector. If the indices are reused,
it is better not to match the buildvectors and to consider them different;
otherwise the order of the insertelement operations needs to be tracked.
2023-02-27 10:09:48 -08:00
Alexey Bataev
5f53e85f8a [SLP]Fix a crash when trying to find reduced ops for the reduced value.
Need to use original reduced value, not the one the compiler gets after
reduction, it may be replaced by the extractelement instruction already.
2023-02-27 07:32:36 -08:00
Alexey Bataev
f1c8b72c13 [SLP]Improve handling gathers/buildvectors with undefs.
If there is just one non-undef scalar in the buildvector/gather node, we try
to make it the very first element, which is profitable in most
cases. Do a preliminary estimation of whether this is more profitable during
graph rotation, and do the same for all elements, including extractelements.

Differential Revision: https://reviews.llvm.org/D144689
2023-02-24 13:17:40 -08:00
Jonas Paulsson
1387a13e1d [SLP] Check with target before vectorizing GEP Indices.
The target hook prefersVectorizedAddressing() already exists to check with
target if address computations should be vectorized, so it seems like this
should be used in SLPVectorizer as well.

Reviewed By: ABataev, RKSimon

Differential Revision: https://reviews.llvm.org/D144128
2023-02-23 15:31:34 +01:00
Alexey Bataev
cbcdd747e8 [SLP]Do not swap not counted extractelements.
No need to swap extractelements which were not excluded from the list
during cost analysis. It leads to incorrect cost calculation and makes
vector code look more profitable than it actually is.
2023-02-21 13:16:51 -08:00
Alexey Bataev
677ea15e35 [NFC][SLP]Add a test for optimistic vectorization, NFC. 2023-02-21 11:02:32 -08:00
Alexey Bataev
5f928a223e [SLP]Properly define incoming block for user PHI nodes.
MainOp of the PHI vectorizable entries contains the proper order of
incoming blocks, not the last instruction in the block.
2023-02-21 08:01:24 -08:00
Simon Pilgrim
2ca266dc1a [SLP][X86] minimum-sizes.ll - add AVX512 test coverage
As noticed on D144128, we need better AVX512 coverage for GEP vectorization
2023-02-20 23:31:56 +00:00
Simon Pilgrim
d9bceeedbf [SLP][X86] load-merge.ll - add AVX512 test coverage
As noticed on D144128, we need better AVX512 coverage for GEP vectorization
2023-02-20 23:21:33 +00:00
Ricardo Jesus
287267c23a [AArch64] Add SLP test for abs (NFC)
Differential Revision: https://reviews.llvm.org/D144376
2023-02-20 14:50:06 +00:00
Alexey Bataev
708eb1b96d [SLP]Add shuffling of extractelements to avoid extra costs/data movement.
If the scalar must be extracted and then used in the gather node,
instead we can emit shuffle instruction to avoid those extra
extractelements and vector-to-scalar and back data movement.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141940
2023-02-20 06:14:42 -08:00
Florian Hahn
f61c9b7569 [SLP] Fix infinite loop in isUndefVector.
This fixes an infinite loop if isa<T>(II->getOperand(1)) is true.
Update Base at the top of the loop, before the continue.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D144292
2023-02-19 21:42:24 +00:00
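
The bug pattern described in the commit above (a `continue` that skips the loop's own advance step) can be sketched in Python. This is an illustrative analogue, not the actual isUndefVector code: the tuple-based chain and the name `count_defined_inserts` are hypothetical, and `value is None` merely stands in for the `isa<T>(...)` check.

```python
def count_defined_inserts(chain):
    """Walk an insertelement-like chain where each node is (base, value).

    The cursor is advanced at the top of the loop body, so the early
    `continue` below can never leave it pointing at the same node.
    """
    defined = 0
    node = chain
    while node is not None:
        base, value = node
        node = base            # advance first, before any `continue`
        if value is None:      # stand-in for the skipped (undef-like) case
            continue
        defined += 1
    return defined


# Innermost insert writes "x", the middle one writes an undef-like value,
# and the outermost writes "y".
chain = (((None, "x"), None), "y")
result = count_defined_inserts(chain)
```

If the `node = base` line were placed after the `continue`, the middle node would be revisited forever, which matches the failure mode the commit message describes.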
Alexey Bataev
e03d254bbd [SLP]Do not reduce repeated values, use scalar red ops instead.
Metric: size..text

                                                                                 size..text
                                                                                 results     results0    diff
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test      445.00      461.00      3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                               428477.00   428445.00   -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test                                 618849.00   618785.00   -0.0%

For all tests some extra code was optimized, GCC-C-execute has some more
inlining after

Differential Revision: https://reviews.llvm.org/D132261
2023-02-17 07:19:35 -08:00
Alexey Bataev
9bdcf8778a [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, and looks through all tree
nodes without skipping the cases where the very first set of nodes is
empty. Also, it tries to reshuffle the nodes if it is surely profitable,
i.e. at least 2 scalars are used for a single-node permutation and at
least 3 scalars are used for the permutation of 2 nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141512
2023-01-19 13:46:25 -08:00
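
The profitability thresholds stated in the commit above can be written down as a small predicate. This is a minimal Python sketch of the rule as worded in the message, not the vectorizer's implementation; the function name and parameters are hypothetical, and treating more than two source nodes as unsupported is an assumption.

```python
def worth_reshuffling(num_source_nodes, num_matched_scalars):
    """Return True when reusing already vectorized nodes is considered
    clearly profitable, following the thresholds in the commit message."""
    if num_source_nodes == 1:
        return num_matched_scalars >= 2   # single-node permutation
    if num_source_nodes == 2:
        return num_matched_scalars >= 3   # permutation of two nodes
    return False                          # assumption: other counts unsupported
```

For example, `worth_reshuffling(2, 2)` is False: matching only two scalars across two nodes does not meet the threshold, so the node would be built as a regular gather.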
Valery N Dmitriev
d1fbe2ba6d [SLP] Remove unused check label from test - NFC 2023-01-13 16:00:43 -08:00
Valery N Dmitriev
fd7273359a [SLP] Do not ignore ordering for root node when it has in-tree uses.
When rooted with PHIs, a vectorization tree may have another node with PHIs
which have roots as their operands. We cannot ignore ordering information
for root in such a case.

Differential Revision: https://reviews.llvm.org/D141309
2023-01-10 10:12:51 -08:00
Alexey Bataev
7439e1b2de [SLP]Fix incorrect reordering of clustered scalars.
The new mask represents the order, not the mask itself. It first needs
to be treated as the order and converted to a mask, and only after that
used to reorder the gathered scalars to build the correct clustered order.

Differential Revision: https://reviews.llvm.org/D141161
2023-01-06 16:04:09 -08:00
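
The order/mask distinction in the commit above is easy to get wrong, so here is a small Python sketch of it. It assumes the convention that an order is turned into a mask by inverting the permutation (`mask[order[i]] = i`); the helper names are hypothetical and the lists only stand in for gathered scalars.

```python
def order_to_mask(order):
    """Invert a permutation: the element listed at position i of the
    order ends up recorded as mask[order[i]] = i."""
    mask = [-1] * len(order)
    for i, idx in enumerate(order):
        mask[idx] = i
    return mask


def apply_mask(scalars, mask):
    """Shuffle-style selection: lane i of the result is scalars[mask[i]]."""
    return [scalars[m] for m in mask]


scalars = ["a", "b", "c"]
order = [2, 0, 1]

mask = order_to_mask(order)
via_mask = apply_mask(scalars, mask)     # order converted to a mask first
via_order = apply_mask(scalars, order)   # order misused directly as a mask
```

The two results differ for any permutation that is not its own inverse, so skipping the conversion silently produces a different arrangement of the scalars.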
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into a poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Valery N Dmitriev
6d677c0b3d [SLP] Unify GEP cost modeling for load, store and GEP nodes.
Make a separate routine for GEP cost calculation and make
the approach uniform across load, store and GEP tree nodes.
An additional issue fixed: GEP cost savings were applied twice
for ScatterVectorize nodes (aka gather load), making them look
unrealistically profitable for vectorization.

Differential Revision: https://reviews.llvm.org/D140789
2023-01-05 10:11:36 -08:00
Nikita Popov
b061159e79 [SLPVectorizer] Convert test to opaque pointers (NFC) 2023-01-05 12:32:44 +01:00
Alexey Bataev
a1b18946f9 [SLP]Fix incorrect shuffle results because of missing shuffle mask
analysis.

Missed the analysis of the shuffle mask when trying to analyze the
operands of the shuffle instruction during peeking through shuffle
instructions.
2023-01-04 13:10:40 -08:00
Alexey Bataev
352b660c1b [SLP][NFC]Add a pass. 2023-01-04 10:30:48 -08:00
Alexey Bataev
53a858f7fc [SLP][NFC]Add a test for incorrect skipping of shuffle instruction at
peek-through-shuffles, NFC.
2023-01-04 10:17:03 -08:00
Nikita Popov
51ba34708d [SLPVectorizer] Convert test to opaque pointers (NFC) 2023-01-04 16:39:51 +01:00
Nikita Popov
8383da1583 [SLPVectorizer] Name instructions in test (NFC) 2023-01-04 16:35:45 +01:00
Nikita Popov
a34ae06c20 [SLPVectorizer] Convert some tests to opaque pointers (NFC) 2023-01-04 16:34:39 +01:00
Dinar Temirbulatov
55c600819f [SLP][AArch64] Incorrectly estimated intrinsic as a function call.
We incorrectly treat an intrinsic as a function call, which prevents us from
taking the opportunity to vectorize. On AArch64 Cortex-A53 we assume that
llvm.fmuladd.f64 is a function call, which is wrong.

Differential Revision: https://reviews.llvm.org/D140392
2023-01-03 19:45:24 +00:00
Alexey Bataev
26fec4e845 [SLP]Fix crash on casting non-instruction extractelement.
Need to check if the extractelement operation is an instruction before
trying to move it around the buildblocks to avoid a crash on cast.
2023-01-03 09:45:57 -08:00
Dinar Temirbulatov
3c205efe8b [SLP][AArch64] Add fmuladd test coverage 2023-01-03 11:28:18 +00:00
Valery N Dmitriev
6bb4b2d002 [NFC] Test case intended to cover SLP cost for chain with masked gather loads.
SLP produces two gather loads (one feeds another).
For the first set of scalar loads GEP indices are all constant.
The result of the second load is then fed into reduction (as a seed).

Differential Revision: https://reviews.llvm.org/D140785
2022-12-30 12:27:34 -08:00
Alexey Bataev
5dccea5a68 [SLP]Do not emit many extractelements, reuse the single one emitted.
We do not need to emit many extractelements for each particular use; we
can reuse a single one and just need to adjust it so that it dominates
all uses.

Differential Revision: https://reviews.llvm.org/D140580
2022-12-30 06:38:06 -08:00
Alexey Bataev
ac01ae71f0 [SLP]Use ShuffleInstructionBuilder for vector shrinking.
We can use ShuffleInstructionBuilder now for shrinking shuffle emission.
This allows removing an extra shuffle from the emitted code and reusing
the original vector.

Part of D110978

Differential Revision: https://reviews.llvm.org/D140499
2022-12-28 06:09:04 -08:00
Alexey Bataev
a9b052e2ef [SLP]Fix PR59693: Do not crash trying to set insert point for buildvector
of extractvalues.

No need to get the last instruction only for vectorized extractvalues,
for gathered (buildvector sequence) still need to get the insertion
point.
2022-12-27 06:01:38 -08:00