883 Commits

Author SHA1 Message Date
Alexey Bataev
a010d4230e [SLP]Allow reordering of insertelements.
After we added support for non-ordered insertelements, we can allow
their reordering.

Differential Revision: https://reviews.llvm.org/D104057
2021-06-11 08:47:41 -07:00
Alexey Bataev
74af4bb1f4 [SLP]Remove unnecessary UndefValue in CreateShuffle.
No need to use UndefValue in CreateShuffle call.

Differential Revision: https://reviews.llvm.org/D104113
2021-06-11 08:08:30 -07:00
Alexey Bataev
a893b44187 [SLP]Disable scheduling of insertelements.
There is no need to schedule insertelement instructions. The compiler
did not schedule them before it started support their vectorization and
it should not do it after. We pre-schedule them manually when finding
a build vector sequence.
Disabling scheduling of insertelement instructions improves compile
time and vectorization of the very large basic blocks by saving
scheduling budget for other instructions.

Differential Revision: https://reviews.llvm.org/D104026
2021-06-10 10:25:26 -07:00
Alexey Bataev
a0086add2e [SLP]Improve gathering of scalar elements.
1. Better sorting of scalars to be gathered. Trying to insert
   constants/arguments/instructions-out-of-loop at first and only then
   the instructions which are inside the loop. It improves hoisting of
   invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103458
2021-06-09 05:23:21 -07:00
Alexey Bataev
8c48d77cdf [SLP]Improve cost estimation/emission of externally used extractelements.
No need to recalculate the cost of extractelements, just no need to
compensate the cost of all extractelements, need to check before if this
is actually going to be removed at the vectorization. Also, no need to
 generate new extractelement instruction, we may just regenerate the
 original one. It may improve the final vectorization.

Differential Revision: https://reviews.llvm.org/D102933
2021-06-03 10:26:59 -07:00
Alexey Bataev
89f3bc7698 [SLP]Allow to reorder nodes with >2 scalar values.
tryToVectorizeList function allows to reorder only 2 scalars. Patch
allows to reorder >2 scalars. Also, to avoid possible regressions, it
allows extra vectorization of the remaining parts of the scalars
elements if possible.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103247
2021-06-03 10:01:36 -07:00
Harald van Dijk
5d2b3de284
[SLP] Avoid std::stable_sort(properlyDominates()).
As noticed by NAKAMURA Takumi back in 2017, we cannot use
properlyDominates for std::stable_sort as properlyDominates only
partially orders blocks. That is, for blocks A, B, C, D, where A
dominates B and C dominates D, we have A == C, B == C, but A < B. This
is not a valid comparison function for std::stable_sort and causes
different results between libstdc++ and libc++. This change uses DFS
numbering to give deterministic results for all reachable blocks.
Unreachable blocks are ignored already, so do not need special
consideration.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103441
2021-06-03 17:51:52 +01:00
Harald van Dijk
f126e8ec28
[SLPVectorizer] Ignore unreachable blocks
As the existing test unreachable.ll shows, we should be doing more
work to avoid entering unreachable blocks: we should not stop
vectorization just because a PHI incoming value from an unreachable
block cannot be vectorized. We know that particular value will never
be used so we can just replace it with poison.
2021-06-01 20:21:04 +01:00
Alexey Bataev
36911971a5 [SLP]Better detection of perfect/shuffles matches for gather nodes.
Implemented better scheme for perfect/shuffled matches of the gather
nodes which allows to fix the performance regressions introduced by
earlier patches. Starting detecting matches for broadcast nodes and
extractelement gathering.

Differential Revision: https://reviews.llvm.org/D102920
2021-06-01 07:08:07 -07:00
Alexey Bataev
27d3528acf [SLP]Fix vectorization of insertelements with multiple uses.
SLP vectorizer should not consider in sertelements with multiple uses as
a part of high level build vector, it must be considered as
a terminating insertelement in the vector build, otherwise it may
produce incorrect code.

Differential Revision: https://reviews.llvm.org/D103164
2021-05-26 09:42:18 -07:00
Anton Afanasyev
b2cd895011 [SLP] Fix "gathering" of insertelement instructions
For rare exceptional case vector tree node (insertelements for now only)
is marked as `NeedToGather`, this case is processed by patch. Follow-up
of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135.

Differential Revision: https://reviews.llvm.org/D102675
2021-05-25 01:35:43 +03:00
Alexey Bataev
8dab25954b [SLP]Improve handling of compensate external uses cost.
External insertelement users can be represented as a result of shuffle
of the vectorized element and noconsecutive insertlements too. Added
support for handling non-consecutive insertelements.

Differential Revision: https://reviews.llvm.org/D101555
2021-05-21 07:45:31 -07:00
Alexey Bataev
182162b616 [SLP]Try to vectorize tiny trees with shuffled gathers of extractelements.
If we gather extract elements and they actually are just shuffles, it
might be profitable to vectorize them even if the tree is tiny.

Differential Revision: https://reviews.llvm.org/D101460
2021-05-20 08:36:16 -07:00
Arthur Eubanks
6b9524a05b [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Anton Afanasyev
207cdd7ed9 [SLP] Fix spill cost computation for insertelement tree node
This is follow up for D98714, bugfixing.
2021-05-14 13:14:41 +03:00
Anton Afanasyev
ab2c499d3a [SLP] Add insertelement instructions to vectorizable tree
Add new type of tree node for `InsertElementInst` chain forming vector.
These instructions could be either removed, or replaced by shuffles during
vectorization and we can add this node to cost model, so naturally estimating
their cost, getting rid of `CompensateCost` tricks and reducing further work
for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this
patch is the first step towards revectorization of partially vectorization
(to fix PR42022 completely). After adding inserts to tree the next step is
to add vector instructions there (for instance, to merge `store <2 x float>`
and `store <2 x float>` to `store <4 x float>`).

Fixes PR40522 and PR35732.

Differential Revision: https://reviews.llvm.org/D98714
2021-05-13 07:41:45 +03:00
Sanjay Patel
49950cb1f6 [SLP] restrict matching of load combine candidates
The test example from https://llvm.org/PR50256 (and reduced here)
shows that we can match a load combine candidate even when there
are no "or" instructions. We can avoid that by confirming that we
do see an "or". This doesn't apply when matching an or-reduction
because that match begins from the operands of the reduction.

Differential Revision: https://reviews.llvm.org/D102074
2021-05-11 08:46:40 -04:00
Alexey Bataev
30463bc3f1 [SLP]Do not count perfect diamond matches for gathers several times.
Need to remove the old code for avoiding double counting of the gather
nodes with perfect diamond matches within the tree after we started
detecting perfect/shuffled matching in the previous patch D100495. We
may skip the cost for such nodes completely.

Differential Revision: https://reviews.llvm.org/D102023
2021-05-10 07:08:07 -07:00
Simon Pilgrim
338c1b701f [SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI. 2021-05-06 16:20:19 +01:00
Simon Pilgrim
2dab059021 [SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI. 2021-05-06 16:20:19 +01:00
Simon Pilgrim
1b47489fd0 [SLP] Use empty() instead of size() == 0. NFCI. 2021-05-06 16:20:18 +01:00
Alexey Bataev
369cd2ae52 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit fd18547e0721983dcb273670d16341921f831e50. Need to
add a check for the size of the vectorization tree to avoid some extra
vectorization.
2021-05-04 04:53:22 -07:00
Alexey Bataev
fd18547e07 [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 08:06:20 -07:00
Alexey Bataev
2e4cc9a725 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit b5f64768cfeecca16c7c9c53cbd97ac7289c43aa to fix
a compiler crash revealed by buildbots.
2021-05-03 07:20:00 -07:00
Alexey Bataev
b5f64768cf [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 06:45:42 -07:00
Alexey Bataev
a3fd82c289 [SLP]Fix the crash on cost calculation if non-compatible vectors shuffled.
If the extracts from the non-power-2 vectors are recognized as shuffles,
need some extra checks to not crash cost calculations if trying to gext
the ecost for subvector extracts. In this case need to check carefully
that we do not exit out of bounds of the original vector, otherwise the
TTI's cost model will crash on assert.

Differential Revision: https://reviews.llvm.org/D101477
2021-04-30 09:34:20 -07:00
Alexey Bataev
12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev
6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 92399322217917e67c0d72a55ec51ddc82251cf6 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Alexey Bataev
9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Alexey Bataev
8af4723c58 [SLP]Try to vectorize tiny trees with shuffled gathers.
If the first tree element is vectorize and the second is gather, it
still might be profitable to vectorize it if the gather node contains
less scalars to vectorize than the original tree node. It might be
profitable to use shuffles.

Differential Revision: https://reviews.llvm.org/D101397
2021-04-28 06:35:31 -07:00
Alexey Bataev
24590d8d67 [SLP]Improved isGatherShuffledEntry, NFC.
Reworked isGatherShuffledEntry function, simplified and moved
common code to the lambda (it shall go away when non-power-2 patch will
be landed).
2021-04-27 05:59:46 -07:00
Alexey Bataev
18c61fc498 [SLP]Skip undefs trying to find perfect/shuffled tree entries matching.
We can skip check for undefs trying to find perfect/shuffled tree
entries matching, they can be ignored completely improving the final
cost/vectorization results.

Differential Revision: https://reviews.llvm.org/D101061
2021-04-22 08:59:07 -07:00
Alexey Bataev
d4f5f23bbb [SLP]Replace more TTI with TTIRef, NFC.
To pacify MSVC buildbots.
2021-04-22 07:53:20 -07:00
Alexey Bataev
da2cdfd421 [SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC
buildbots, NFC.
2021-04-22 07:49:48 -07:00
Alexey Bataev
e99b98cb1b [SLP]Improve cost model for the vectorized extractelements.
1. No need to call `areAllUsersVectorized` as later the cost is
   calculated only if the instruction has one use and gets vectorized.
2. Need to calculate the cost of the dead extractelement more precisely,
   taking the vector type of the vector operand, not the resulting
   vector type.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D99980
2021-04-22 07:40:17 -07:00
Alexey Bataev
af870e11ae [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 09:08:46 -07:00
Alexey Bataev
b82344a019 Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit daf6e18c55c2ac56bbf0f9de233fb2a1150ee331 to fix the
compiler crash.
2021-04-20 08:29:32 -07:00
Alexey Bataev
daf6e18c55 [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 07:46:49 -07:00
Alexey Bataev
cf00cb8bed Revert "[SLP] Add detection of shuffled/perfect matching of tree entries."
This reverts commit b232771acad6225574a2eaf9f860a0fed7ef0804 to fix
buildbots.
2021-04-20 07:16:11 -07:00
Alexey Bataev
b232771aca [SLP] Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100495
2021-04-20 06:55:55 -07:00
Alexey Bataev
8030481065 Revert "[SLP]Add detection of shuffled/perfect matching of tree entries."
This reverts commit d6fde913790db898e72e27b51defbc7442f3418a to fix
compiler crashes.
2021-04-19 14:10:04 -07:00
Alexey Bataev
d6fde91379 [SLP]Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Differential Revision: https://reviews.llvm.org/D100495
2021-04-19 13:29:30 -07:00
Simon Pilgrim
b49c41afba [SLP] createOp - fix null dereference warning. NFCI.
Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.
2021-04-14 15:24:41 +01:00
dfukalov
d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Alexey Bataev
ab124bbe2a [SLP]Fix PR49898: Infinite loop in SLP vectorizer.
We should not re-try attempt of finding of the consecutive store chain
if it was tried before.

Differential Revision: https://reviews.llvm.org/D100131
2021-04-08 14:18:06 -07:00
Alexey Bataev
a78e86e6be [SLP]Avoid multiple attempts to vectorize CmpInsts.
No need to lookup through and/or try to vectorize operands of the
CmpInst instructions during attempts to find/vectorize min/max
reductions. Compiler implements postanalysis of the CmpInsts so we can
skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save
compile time.

Differential Revision: https://reviews.llvm.org/D99950
2021-04-07 06:15:42 -07:00
Alexey Bataev
00a84f9a7f [SLP]Improve vectorization of the CmpInst instructions.
During vectorization better to postpone the vectorization of the CmpInst
instructions till the end of the basic block. Otherwise we may vectorize
it too early and may miss some vectorization patterns, like reductions.

Reworked part of D57059

Differential Revision: https://reviews.llvm.org/D99796
2021-04-05 06:22:51 -07:00
Fangrui Song
8e5f3d04f2 [SLPVectorizer] Fix divide-by-zero after D99719
Will add a test case later.
2021-04-02 11:13:51 -07:00
Alexey Bataev
5fcb07a070 [SLP]Fix a bug in min/max reduction, number of condition uses.
The ultimate reduction node may have multiple uses, but if the ultimate
reduction is min/max reduction and based on SelectInstruction, the
condition of this select instruction must have only single use.

Differential Revision: https://reviews.llvm.org/D99753
2021-04-02 07:09:44 -07:00
Florian Hahn
0f3230390b
[SLP] Better estimate cost of no-op extracts on target vectors.
The motivation for this patch is to better estimate the cost of
extracelement instructions in cases were they are going to be free,
because the source vector can be used directly.

A simple example is

    %v1.lane.0 = extractelement <2 x double> %v.1, i32 0
    %v1.lane.1 = extractelement <2 x double> %v.1, i32 1

    %a.lane.0 = fmul double %v1.lane.0, %x
    %a.lane.1 = fmul double %v1.lane.1, %y

Currently we only consider the extracts free, if there are no other
users.

In this particular case, on AArch64 which can fit <2 x double> in a
vector register, the extracts should be free, independently of other
users, because the source vector of the extracts will be in a vector
register directly, so it should be free to use the vector directly.

The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on
certain AArch64 CPUs.

It looks like this does not impact any code in
SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto.

This originally regressed after D80773, so if there's a better
alternative to explore, I'd be more than happy to do that.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99719
2021-04-02 10:40:12 +01:00