1351 Commits

Author SHA1 Message Date
Alexey Bataev
f82eb7e066 [SLP]Introduce gather cost estimation function.
Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of the gather/buildvector cost estimation for buildvector
nodes. It will allow to use general codegen infrastructure for better
cost estimation + it improves the cost estimation for the
gathers/buildvectors.

Improved part of D110978.

Differential Revision: https://reviews.llvm.org/D148174
2023-04-13 10:16:00 -07:00
Simon Pilgrim
b3480d5ede [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
By default these will expand back to cmp/sel, but some targets (X86) has optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).

Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
2023-04-13 17:00:39 +01:00
Simon Pilgrim
9e30b87afb [TTI] getMinMaxReductionCost - add FastMathFlag argument
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).

This will be necessary to correct recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).

Differential Revision: https://reviews.llvm.org/D148149
2023-04-13 10:42:42 +01:00
Alexey Bataev
b28f407df9 [SLP]Improve reduction cost model for scalars.
Instead of abstract cost of the scalar reduction ops, try to use the
cost of actual reduction operation instructions, where possible. Also,
remove the estimation of the vectorized GEPs pointers for reduced loads,
since it is already handled in the tree.

Differential Revision: https://reviews.llvm.org/D148036
2023-04-12 11:32:51 -07:00
Alexey Bataev
d00158cd28 [SLP][NFC]Introduce ShuffleCostEstimator and adjustExtracts member function.
Added ShuffleCostEstimator class and the first adjustExtracts member,
which is just a copy of previous AdjustExtractCost lambda.

Differential Revision: https://reviews.llvm.org/D147787
2023-04-11 12:47:07 -07:00
Alexey Bataev
85327f307b [SLP][NFC]Make clusterSortPtrAccesses static. 2023-04-07 13:24:24 -07:00
Alexey Bataev
6ff177d928 [SLP][NFC]Improve SLP time by precomputing value<->gather nodes
dependencies.

Improved compiled time by the precomputing the mapping between gathered
scalars and their gather/buildvector nodes for later use in
isGatherShuffledEntry to avoid recomputing this map each time this
function is called.
2023-04-07 12:12:02 -07:00
Alexey Bataev
52dd72a37a [SLP][NFC]Make adjustExtracts/needToDelay members of ShuffleInstructionBuilder.
Make adjustExtracts/needToDelay lambdas members of ShuffleInstructionBuilder to allow to overload them later for cost model.

Differential Revision: https://reviews.llvm.org/D147730
2023-04-06 16:27:19 -07:00
Alexey Bataev
e58a49300e [SLP][NFC]Evaluate FMF for reductions before the loop, no need to
reevaluate it.
2023-04-06 11:57:20 -07:00
Alexey Bataev
50af6ab0ab [SLP]Fix emission of the masks in shuffles for undefs.
If the value is used in the expression, need to adjust the mask before
applying the mask. Plus, need to fix the analysis of the phi nodes for
reused scalars.
2023-04-06 10:16:58 -07:00
Alexey Bataev
cf62adbbd8 [SLP]Fix delete of the extractelement with users.
Made the condition for the erasing of the gathered extractelements
stricter, remove it only if it has single vectorized use, otherwise
leave it for instcombiner/instsimplify analysis.
2023-04-06 09:15:30 -07:00
Alexey Bataev
40105a9933 [SLP]Find reused scalars in buildvector sequences, if any.
Patch generalizes analysis of scalars. The main part is outlined into
lambda, which can be used to find reused inserted scalars and emit
shuffle for them instead of multiple insertelement instructions, if the
permutation is found alreadyi. I.e. some scalars are transformed by the
permutation of previously vectorized nodes, and some are inserted
directly.

Reworked part of D110978

Differential Revision: https://reviews.llvm.org/D146564
2023-04-05 09:37:05 -07:00
Alexey Bataev
c1660006b2 [SLP]Reorder counters for same values, if the root node is reordered.
The counters for the repeated scalars are ordered in the natural order,
but the original scalars might be reordered during SLP graph reordering
and this order can be dropped. Need to use the scalars after the
reordering, not the original ones, to emit correct code for same value
counters.
2023-04-03 07:52:49 -07:00
Alexey Bataev
c1bcf5dd0a [SLP]Fix PR61835: Assertion `I->use_empty() && "trying to erase
instruction with users."' failed.

If the externally used scalar is part of the tree and is replaced by
extractelement instruction, need to add generated extractelement
instruction to the list of the ExternallyUsedValues to avoid deletion
during vectorization.
2023-03-31 14:21:19 -07:00
Alexey Bataev
9255124a07 [SLP]Fix a crash when trying to shuffle multiple nodes.
Need to transform mask after applying shuffle using the mask itself as
a base to correctly mark with identity those indices, actually used in
previous shuffle. Allows to fix a crash, if different sized vectors are
shuffled.
2023-03-30 09:32:11 -07:00
Florian Hahn
417fe52e6f
Revert "[SLP] Check with target before vectorizing GEP Indices."
This reverts commit 1387a13e1d0bac94457626ef3e7427c84caf6e65.

This introduced performance regressions on AArch64, when the cost of a
vector GEP + extracts is offset by the benefits of vectorizing the rest
of the tree.

The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll
illustrates the issue. It was extracted from code that regressed a SPEC
benchmark by 15%.
2023-03-28 08:06:53 +01:00
Alexey Bataev
59ff9d3777 [SLP]Fix PR61554: use of missing vectorized value in buildvector nodes.
If the buildvector node matches the vector node, it reuse the vector
value from this vector node, but its VectorizedValue field is not
updated. Need to update this field to avoid misses during the analysis
of the reused gather/buildvector nodes.
2023-03-20 12:05:26 -07:00
Alexey Bataev
0ad87ffdcc [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars.
Currently compiler does not support mixing of shuffled nodes
+ gather/buildvector of the remaining scalar values. It may reduce total
  number of instructions and improve performance of the
  gather/buildvector sequences.

Part of D110978

Differential Revision: https://reviews.llvm.org/D146167
2023-03-17 11:18:36 -07:00
Valery N Dmitriev
f9b438b519 [SLP] Outline GEP chain cost modeling into new TTI interface - NFCI.
Cost modeling for GEPs should actually be target dependent but is currently
done inside SLP target-independent way.
Sinking it into TTI enables target dependent implementation.
This patch adds new TTI interface and implementation of the basic functionality
trying to retain existing cost modeling.

Differential Revision: https://reviews.llvm.org/D144770
2023-03-14 14:01:34 -07:00
Alexey Bataev
641939baa9 [SLP]Remove CreateShuffle lambda and reuse ShuffleBuilder functions.
After merging main part of the gather/buildvector code, CreateShuffle
lambda can removed and ShuffleBuilder add functions can be used instead.
Also, part of the code from CreateShuffle migrated to createShuffle of
the BaseShuffleAnalysis::createShuffle function for better code emission.

Differential Revision: https://reviews.llvm.org/D145988
2023-03-14 10:15:41 -07:00
Alexey Bataev
874c49f554 [SLP]Fix PR61395: need to adjust vector factor after emitting shuffle
operation for combined entries.

The vector factor after combining of the shuffle entries is defined by
the size of the mask, not by the vector factors  of the original
entries. So, need to adjust it to emit correct code.
2023-03-14 06:27:08 -07:00
Jakub Kuderski
b9db89fbcf [ADT][NFCI] Do not use non-const lvalue-refs with enumerate in llvm/
Replace references to `enumerate` results with either const lvalue
rerences or structured bindings. I did not use structured bindings
everywhere as it wasn't clear to me it would improve readability.

This is in preparation to the switch to `zip` semantics which won't
support non-const lvalue reference to elements:
https://reviews.llvm.org/D144503.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D145987
2023-03-13 20:59:06 -04:00
Alexey Bataev
f3a68ac10c [SLP][NFC]Initial merge of gather/buildvector code in the createBuildVector function.
Required for future changes with combining shuffled nodes and
buildvector sequences to improve cost/emission of the gather nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D145732
2023-03-13 06:11:05 -07:00
Arthur Eubanks
7c3c981442 [Passes] Remove some legacy passes
DFAJumpThreading
JumpThreading
LibCallsShrink
LoopVectorize
SLPVectorizer
DeadStoreElimination
AggressiveDCE
CorrelatedValuePropagation
IndVarSimplify

These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.
2023-03-10 17:17:00 -08:00
Alexey Bataev
93a9be0cea [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes.
Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
But the compiler may do the same for other gather/buildvector nodes too, just need to check the
dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.

Part of D110978

Differential Revision: https://reviews.llvm.org/D144958
2023-03-10 13:19:43 -08:00
Hans Wennborg
3b3a4c270b Revert "[SLP]Initial support for reshuffling of non-starting buildvector/gather nodes."
This caused verifier errors:

  Instruction does not dominate all uses!
    %8 = insertelement <2 x i64> %7, i64 %pgocount1330, i64 1
    %15 = shufflevector <2 x i64> %8, <2 x i64> poison, <2 x i32> <i32 1, i32 1>
  in function ?NearestInclusiveAncestorAssignedToSlot@SlotScopedTraversal@blink@@SAPAVElement@2@ABV32@@Z

(or register allocator crash when the verifier was disabled).

See comment on the code review.

> Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
> But the compiler may do the same for other gather/buildvector nodes too, just need to check the
> dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.
>
> Part of D110978
>
> Differential Revision: https://reviews.llvm.org/D144958

This reverts commit a611b3f3059e4c3b9e7b914091c3edaef099fd5d.
It also reverts 7a4061ae372b3262703ffeea3b64db89187db611 which depended on the above.
2023-03-10 14:40:12 +01:00
Alexey Bataev
a611b3f305 [SLP]Initial support for reshuffling of non-starting buildvector/gather nodes.
Previously only the very first gather/buildvector node might be probed for reshuffling of other nodes.
But the compiler may do the same for other gather/buildvector nodes too, just need to check the
dependency and postpone the emission of the dependent nodes, if the origin nodes were not emitted yet.

Part of D110978

Differential Revision: https://reviews.llvm.org/D144958
2023-03-07 12:45:40 -08:00
Alexey Bataev
c411965820 [SLP]Fix PR61224: Compiler hits infinite loop.
IRBuilder in many cases is able to fold constant code automatically,
but in some cases (for some intrinsics) it cannot do it. Need to perform
manual calculation, if constant provided in these corner cases, to avoid
infinite loop.
2023-03-06 13:46:41 -08:00
Valery N Dmitriev
ec7154fe70 [SLP] Add banner argument to SLP costs debug printer method - NFC.
Removed unnecessary warning workaround.

Differential Revision: https://reviews.llvm.org/D144992
2023-02-28 11:22:49 -08:00
Alexey Bataev
1d6b5b66bb [SLP]Fix PR61050: Assertion `I->use_empty() && "trying to erase instruction with users."
When gathering the counter for the reused scalars, need to use reduced
value, not the original reduced value. Same values counter is gathered
for reduced values, not original ones.
2023-02-28 07:51:34 -08:00
Vasileios Porpodas
a700fb3d9b [SLP] Fixes crash in BoUpSLP::isGatherShuffledEntry()
Crash caused by: 708eb1b96d9a36f9c0182b7d53c492059778fa35

Differential Revision: https://reviews.llvm.org/D144895
2023-02-27 12:29:25 -08:00
Alexey Bataev
007177bdde [SLP]Fix PR61018: Assertion `Mask[I] == UndefMaskElem && "Multiple uses
of scalars."' failed.

Need to check for the reused indices when checking if 2 insertelement
instruction are from the same buildvector. If the inidices are reused,
better not to match buildvectors and consider them as differenet,
otherwise need to track the order of insertelement operations.
2023-02-27 10:09:48 -08:00
Alexey Bataev
5f53e85f8a [SLP]Fix a crash when trying to find reduced ops for the reduced value.
Need to use original reduced value, not the one the compiler gets after
reduction, it may be replaced by the extractelement instruction already.
2023-02-27 07:32:36 -08:00
Alexey Bataev
f1c8b72c13 [SLP]Improve handling gathers/buildvectors with undefs.
If have just one non-undef scalar in the buildvector/gather node, we try
to put it to be the very first element, which is profitable in most
cases. Do the preliminary estimation, if this more profitable during
graph rotation and do same for all elements, including extractelements.

Differential Revision: https://reviews.llvm.org/D144689
2023-02-24 13:17:40 -08:00
Alexey Bataev
6e30dffe71 [SLP][NFC]Format and improve function, returning std::optional<struct>,
NFC.
2023-02-24 11:06:31 -08:00
Jonas Paulsson
1387a13e1d [SLP] Check with target before vectorizing GEP Indices.
The target hook prefersVectorizedAddressing() already exists to check with
target if address computations should be vectorized, so it seems like this
should be used in SLPVectorizer as well.

Reviewed By: ABataev, RKSimon

Differential Revision: https://reviews.llvm.org/D144128
2023-02-23 15:31:34 +01:00
Alexey Bataev
cbcdd747e8 [SLP]Do not swap not counted extractelements.
No need to swap extractelements, which were not excluded from the list
during cost analysis. It leads to incorrect cost calculation and make
vector code more profitable than it is actually is.
2023-02-21 13:16:51 -08:00
Alexey Bataev
5f928a223e [SLP]Properly define incoming block for user PHI nodes.
MainOp of the PHI vectorizable entries contains the proper order of
incoming blocks, not the last instruction in the block.
2023-02-21 08:01:24 -08:00
Alexey Bataev
708eb1b96d [SLP]Add shuffling of extractelements to avoid extra costs/data movement.
If the scalar must be extracted and then used in the gather node,
instead we can emit shuffle instruction to avoid those extra
extractelements and vector-to-scalar and back data movement.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141940
2023-02-20 06:14:42 -08:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Florian Hahn
f61c9b7569
[SLP] Fix infinite loop in isUndefVector.
This fixes an infinite loop if isa<T>(II->getOperand(1)) is true.
Update Base at the top of the loop, before the continue.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D144292
2023-02-19 21:42:24 +00:00
Alexey Bataev
e03d254bbd [SLP]Do not reduce repeated values, use scalar red ops instead.
Metric: size..text

                                                     size..text                 results     results0    diff
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test      445.00      461.00  3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                               428477.00   428445.00 -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test                                 618849.00   618785.00 -0.0%

For all tests some extra code was optimized, GCC-C-execute has some more
inlining after

Differential Revision: https://reviews.llvm.org/D132261
2023-02-17 07:19:35 -08:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
Simon Pilgrim
552e27c521 [SLP] Use allConstant helper. NFCI. 2023-02-05 19:21:48 +00:00
Kazu Hirata
526966d07d Use llvm::bit_ceil (NFC)
Note that:

  std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);

is equivalent to:

  std::bit_ceil(X)

even for input 0.
2023-01-28 16:13:09 -08:00
Kazu Hirata
f20b5071f3 [llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC) 2023-01-28 09:06:31 -08:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Alexey Bataev
9bdcf8778a [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141512
2023-01-19 13:46:25 -08:00
Joe Loser
a288d7f937 [llvm][ADT] Replace uses of makeMutableArrayRef with deduction guides
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.

Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.

Differential Revision: https://reviews.llvm.org/D141814
2023-01-16 14:49:37 -07:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00