1319 Commits

Author SHA1 Message Date
Alexey Bataev
5f53e85f8a [SLP]Fix a crash when trying to find reduced ops for the reduced value.
Need to use original reduced value, not the one the compiler gets after
reduction, it may be replaced by the extractelement instruction already.
2023-02-27 07:32:36 -08:00
Alexey Bataev
f1c8b72c13 [SLP]Improve handling gathers/buildvectors with undefs.
If have just one non-undef scalar in the buildvector/gather node, we try
to put it to be the very first element, which is profitable in most
cases. Do the preliminary estimation, if this more profitable during
graph rotation and do same for all elements, including extractelements.

Differential Revision: https://reviews.llvm.org/D144689
2023-02-24 13:17:40 -08:00
Alexey Bataev
6e30dffe71 [SLP][NFC]Format and improve function, returning std::optional<struct>,
NFC.
2023-02-24 11:06:31 -08:00
Jonas Paulsson
1387a13e1d [SLP] Check with target before vectorizing GEP Indices.
The target hook prefersVectorizedAddressing() already exists to check with
target if address computations should be vectorized, so it seems like this
should be used in SLPVectorizer as well.

Reviewed By: ABataev, RKSimon

Differential Revision: https://reviews.llvm.org/D144128
2023-02-23 15:31:34 +01:00
Alexey Bataev
cbcdd747e8 [SLP]Do not swap not counted extractelements.
No need to swap extractelements, which were not excluded from the list
during cost analysis. It leads to incorrect cost calculation and make
vector code more profitable than it is actually is.
2023-02-21 13:16:51 -08:00
Alexey Bataev
5f928a223e [SLP]Properly define incoming block for user PHI nodes.
MainOp of the PHI vectorizable entries contains the proper order of
incoming blocks, not the last instruction in the block.
2023-02-21 08:01:24 -08:00
Alexey Bataev
708eb1b96d [SLP]Add shuffling of extractelements to avoid extra costs/data movement.
If the scalar must be extracted and then used in the gather node,
instead we can emit shuffle instruction to avoid those extra
extractelements and vector-to-scalar and back data movement.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141940
2023-02-20 06:14:42 -08:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Florian Hahn
f61c9b7569
[SLP] Fix infinite loop in isUndefVector.
This fixes an infinite loop if isa<T>(II->getOperand(1)) is true.
Update Base at the top of the loop, before the continue.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D144292
2023-02-19 21:42:24 +00:00
Alexey Bataev
e03d254bbd [SLP]Do not reduce repeated values, use scalar red ops instead.
Metric: size..text

                                                     size..text                 results     results0    diff
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test      445.00      461.00  3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                               428477.00   428445.00 -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test                                 618849.00   618785.00 -0.0%

For all tests some extra code was optimized, GCC-C-execute has some more
inlining after

Differential Revision: https://reviews.llvm.org/D132261
2023-02-17 07:19:35 -08:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
Simon Pilgrim
552e27c521 [SLP] Use allConstant helper. NFCI. 2023-02-05 19:21:48 +00:00
Kazu Hirata
526966d07d Use llvm::bit_ceil (NFC)
Note that:

  std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);

is equivalent to:

  std::bit_ceil(X)

even for input 0.
2023-01-28 16:13:09 -08:00
Kazu Hirata
f20b5071f3 [llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC) 2023-01-28 09:06:31 -08:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Alexey Bataev
9bdcf8778a [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141512
2023-01-19 13:46:25 -08:00
Joe Loser
a288d7f937 [llvm][ADT] Replace uses of makeMutableArrayRef with deduction guides
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.

Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.

Differential Revision: https://reviews.llvm.org/D141814
2023-01-16 14:49:37 -07:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
Valery N Dmitriev
fd7273359a [SLP] Do not ignore ordering for root node when it has in-tree uses.
When rooted with PHIs, a vectorization tree may have another node with PHIs
which have roots as their operands. We cannot ignore ordering information
for root in such a case.

Differential Revision: https://reviews.llvm.org/D141309
2023-01-10 10:12:51 -08:00
Alexey Bataev
755282ec1e [SLP][NFC]Move getExtractIndex function for future changes, NFC. 2023-01-09 09:53:01 -08:00
Alexey Bataev
996ad44b97 [SLP][NFC]Fix compile build by declaring ArrayRef, NFC.
Fix compiler build reported in https://lab.llvm.org/buildbot#builders/243/builds/218
2023-01-06 17:01:48 -08:00
Alexey Bataev
cc17e93178 [SLP][NFC]Remove unused variables, NFC. 2023-01-06 16:55:54 -08:00
Alexey Bataev
7439e1b2de [SLP]Fix incorrect reordering of clustered scalars.
The new mask represents the order, not the mask itself. At first, need
to treat as the order, convert to mask and only after that reorder
gathered scalars to build correct clustered order.

Differential Revision: https://reviews.llvm.org/D141161
2023-01-06 16:04:09 -08:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Valery N Dmitriev
6d677c0b3d [SLP] Unify GEP cost modeling for load, store and GEP nodes.
Make a separate routine for GEPs cost calculation and make
the approach uniform across load, store and GEP tree nodes.
Additional issue fixed is GEP cost savings were applied twice
for ScatterVectorize nodes (aka gather load) making them look
unrealistically profitable for vectorization.

Differential Revision: https://reviews.llvm.org/D140789
2023-01-05 10:11:36 -08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Alexey Bataev
a1b18946f9 [SLP]Fix incorrect shuffle results because of missing shuffle mask
analysis.

Missed the analysis of the shuffle mask when trying to analyze the
operands of the shuffle instruction during peeking through shuffle
instructions.
2023-01-04 13:10:40 -08:00
Dinar Temirbulatov
55c600819f [SLP][AArch64] Incorrectly estimated intrinsic as a function call.
We incorrectly assume intrinsic as a function call and it prevents us from
the opportunity to vectorize. On Aarch64 Cortex-A53 we think that
llvm.fmuladd.f64 is a function call which is wrong.

Differential Revision: https://reviews.llvm.org/D140392
2023-01-03 19:45:24 +00:00
Alexey Bataev
26fec4e845 [SLP]Fix crash on casting non-instruction extractelement.
Need to check if the extractelement operation is an extraction before
trying to move it around the buildblocks to avoid crash on cast.
2023-01-03 09:45:57 -08:00
Alexey Bataev
5dccea5a68 [SLP]Do not emit many extractelements, reuse the single one emitted.
We do not need to emit many extractelements for each particular use, we
can reuse the only one, just need to adjust it to make it dominate on
all uses.

Differential Revision: https://reviews.llvm.org/D140580
2022-12-30 06:38:06 -08:00
Valery N Dmitriev
ad956ed568 [SLP] Fix debug print for cost in tryToVectorizeList - NFC.
Actual VF was confused with local variable named "VF".
2022-12-29 11:30:10 -08:00
Valery N Dmitriev
8eb3698b94 [SLP] A couple of minor improvements for slp graph view - NFC.
Show ScatterVectorize nodes in frames of blue color
and print vectorize tree indices.
2022-12-29 11:02:36 -08:00
Alexey Bataev
ac01ae71f0 [SLP]Use ShuffleInstructionBuilder for vector shrinking.
We can use ShuffleInstructionBuilder now for shrinking shuffle emission.
It allows to remove extra shuffle from the emitted code and reuse
original vector.

Part of D110978

Differential Revision: https://reviews.llvm.org/D140499
2022-12-28 06:09:04 -08:00
Alexey Bataev
a9b052e2ef [SLP]Fix PR59693: Do not crash trying to set insert point for buildvector
of extractvalues.

No need to get the last instruction only for vectorized extractvalues,
for gathered(buildvector sequence) still need to get the insertion
  point.
2022-12-27 06:01:38 -08:00
Alexey Bataev
2e972ea056 [SLP]Integrate looking through shuffles logic into ShuffleInstructionBuilder.
Added BaseShuffleAnalysis as a base class for ShuffleInstructionBuilder
and integrated shuffle logic from shuffles for externally used scalars
into this class. This class is used as the main container that
implements smart shuffle instruction builder logic.
ShuffleInstructionBuilder uses this logic.
ShuffleInstructionBuilder is also used in building of the shuffle for
the externally used scalars instead of lambdas, which are now part of BaseShuffleAnalysis class.

Differential Revision: https://reviews.llvm.org/D140100
2022-12-21 06:12:53 -08:00
Kazu Hirata
c08fad8193 [llvm] Remove redundant initialization of std::optional (NFC) 2022-12-20 15:53:38 -08:00
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
Kazu Hirata
6eb0b0a045 Don't include Optional.h
These files no longer use llvm::Optional.
2022-12-14 21:16:22 -08:00
Fangrui Song
d4b6fcb32e [Analysis] llvm::Optional => std::optional 2022-12-14 07:32:24 +00:00
Alexey Bataev
ecac8192db [SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC.
The patch redesigns ShuffleInstructionBuilder so it could later be used
for reshuffling of the buildvector sequences and vectorized parts of
  externally used scalars. Also will allow to generalize cost model for
  the gathers/buildvectors.

Part of D110978.

Differential Revision: https://reviews.llvm.org/D139718
2022-12-13 09:37:18 -08:00
Fangrui Song
1ec11d2d48 [Transforms/Vectorize] llvm::Optional => std::optional 2022-12-12 08:56:35 +00:00
Alexey Bataev
f4c6d7b813 [SLP][NFC]prepare isUndefVector function to be used for differently
sized vectors as shuffle masks, NFC.

Use use-mask instead of actual mask to speed up the process and make it
possible to use for the cases where the mask is used for vector
resizing.
2022-12-09 04:14:16 -08:00
Kazu Hirata
1f421b6d7e [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-06 22:45:17 -08:00
Kazu Hirata
9f252e5567 [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:31:17 -08:00
Kazu Hirata
3c09ed006a [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:12:44 -08:00
Kazu Hirata
343de6856e [Transforms] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:37 -08:00
Alexey Bataev
0cc15050a4 [SLP]Fix PR59230: Use actual vector factor when sorting entries.
When we sort entries for attempting to reorder scalars, need to use
actual vectorization factor, not the number of scalars. Otherwise the
compiler crashes, if the scalars has to be reordered.

Differential Revision: https://reviews.llvm.org/D138819
2022-11-29 06:46:06 -08:00
Qiongsi Wu
f946c70130 [SLPVectorizer] Do Not Move Loads/Stores Beyond Stacksave/Stackrestore Boundaries
If left unchecked, the SLPVecrtorizer can move loads/stores below a stackrestore. The move can cause issues if the loads/stores have pointer operands from `alloca`s that are reset by the stackrestores. This patch adds the dependency check.

The check is conservative, in that it does not check if the pointer operands of the loads/stores are actually from `alloca`s that may be reset. We did not observe any SPECCPU2017 performance degradation so this simple fix seems sufficient.

The test could have been added to `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll`, but that test has not been updated to use opaque pointers. I am not inclined to add tests that still use typed pointers, or to refactor `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll` to use opaque pointers in this patch. If desired, I will open a different patch to refactor and consolidate the tests.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D138585
2022-11-28 10:00:29 -05:00
Kazu Hirata
5fc8f6c37c [Vectorize] Use std::optional in SLPVectorizer.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 18:03:49 -08:00
Alexey Bataev
ac93b61165 [SLP]Fix PR59098: check if the vector type is scalarized for
extractelements.

If the resulting type is going to be scalarized, no need to adjust the
cost of removed extractelement and insert/extract subvector costs.
Otherwise, the compiler can crash because of the wrong type sizes.
2022-11-21 10:26:01 -08:00