1017 Commits

Author SHA1 Message Date
Alexey Bataev
802ceb8343 [SLP]Excluded external uses from the reordering estimation.
Compiler adds the estimation for the external uses during operands
reordering analysis, which makes it tend to prefer duplicates in the
lanes rather than diamond/shuffled match in the graph. It changes the sizes of
the vector operands and may prevent some vectorization. We don't need
this kind of estimation for the analysis phase, because we just need to
choose the most compatible instruction and it does not matter if it has
external user or used in the non-matching lane. Instead, we count the number
of unique instruction in the lane and see if the reassociation changes
the number of unique scalars to be power of 2 or not. If we have power
of 2 unique scalars in the lane, it is considered more profitable rather
than having non-power-of-2 number of unique scalars.

Metric: SLP.NumVectorInstructions

                          test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test   70.00   86.00   22.9%
                             test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test  346.00  353.00    2.0%
                            test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test  346.00  353.00    2.0%
                         test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  235.00  239.00    1.7%
                  test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  235.00  239.00    1.7%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00    1.3%
                                 test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00    1.2%
                         test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00    1.1%
                          test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00    1.1%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00    0.9%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00    0.3%
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00    0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00    0.2%
                              test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00    0.1%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00    0.1%

test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  783.00  782.00   -0.1%
                      test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   69.00   68.00   -1.4%
        test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test  207.00  192.00   -7.2%
         test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test  207.00  192.00   -7.2%
 test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   89.00   80.00  -10.1%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   89.00   80.00  -10.1%
       test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  260.00  215.00  -17.3%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  256.00  211.00  -17.6%

MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty the same.
SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by
one <4 x> load.
External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and
not inlined anymore.
External/SPEC/CINT2017rate/541.leela_r - same
xternal/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in
multi-block tree, the result is pretty the same.
External/SPEC/CINT2017speed/631.deepsjeng_s - same.
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same
as before.
MultiSource/Benchmarks/MiBench/consumer-jpeg - same.

Differential Revision: https://reviews.llvm.org/D116688
2022-02-03 06:50:06 -08:00
Alexey Bataev
ad2a0ccf8f [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-03 06:24:10 -08:00
Alexey Bataev
8a1dfbc4d8 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 842a2360a84692f2e4c37cc3e652640e6627d004 to fix the
bugs reported by users in https://reviews.llvm.org/D115955#3291538.
2022-02-02 12:06:36 -08:00
Alexey Bataev
842a2360a8 [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-02 10:32:52 -08:00
Benjamin Kramer
0c3d22a592 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 83620bd2ad867f706c699d0f2b8be10e43d9f3d7.

It's causing miscompilations, see review comments at
https://reviews.llvm.org/D115955
2022-02-02 13:08:51 +01:00
Alexey Bataev
83620bd2ad [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-01 09:54:20 -08:00
Benjamin Kramer
5281f0dab2 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit afaaecc88c6e5989de8a6a0266610860ef99d9d6.

Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276
2022-02-01 11:40:43 +01:00
Alexey Bataev
afaaecc88c [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-01-31 11:11:25 -08:00
Philip Reames
6888081e32 [SLP] Use moveBefore to simplify code [NFC] 2022-01-28 12:44:07 -08:00
Philip Reames
746e435ff7 Revert "[SLP] Add a clarifying assert in block scheduling [NFC]"
This reverts commit db49a78900f5e4b59714565876b5dbb5e2dfe840.  The reasoning in the patch applied to a downstream branch, and I got myself confused when trying to split apart pieces.  Thankfully, the assert was simply weaker than the actual invariant currently upstream which is that ReadyInsts is not empty.
2022-01-28 12:10:31 -08:00
Philip Reames
db49a78900 [SLP] Add a clarifying assert in block scheduling [NFC]
The fact we could have a block with a valid scheduling window, but nothing to schedule was surprising to me.  After digging through the code, this can only happen if we don't find anything to directly vectorize.  However, the reduction handling code relies on this mode, so we can't simply consider such trees unvectorizeable.  The assert conveys both that this situation can happen, but also that it can *only* happen for an immediate gather.

Context: We built the bundle before deciding that vectorization of a bundle is possible.  A side effect of bundle construction is manipulating the scheduling window, so a bundle which isn't vectorizable can cause the creation or expansion of a scheduling window.
2022-01-28 11:08:59 -08:00
Alexey Bataev
cec8b614f3 [SLP]Do not reorder top nodes if they do not require reordering.
No need to reorder the top nodes, if they are not stores or
insertelement instructions and each node should be analized only
once, when the bottom-to-top analysis is performed.
We still endup with extractelements for the top node scalars and
the final shuffle just adds an extra cost and currently
crashes the compiler for PHI nodes.

Differential Revision: https://reviews.llvm.org/D116760
2022-01-28 09:16:18 -08:00
eopXD
6be77561f8 [SLP][NFC] Add debug logs for entry.
Tell the users they are specifying something without vector register.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D117980
2022-01-24 09:05:21 -08:00
Kazu Hirata
f63a9cd99d [Vectorize] Remove unused variables (NFC) 2022-01-23 20:32:54 -08:00
Philip Reames
c0906f6b21 [SLP] Remove stray semicolon to make bots happy
Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict c++98 flags which disallow ; at global scope.
2022-01-20 14:09:28 -08:00
Philip Reames
5a670f1378 [SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC] 2022-01-20 13:58:20 -08:00
Philip Reames
60f6191879 [SLP] Extract formBundle helper for readability [NFC] 2022-01-20 13:08:37 -08:00
Philip Reames
118babe67a [SLP] Use for loops for walking bundle elements 2022-01-20 12:44:33 -08:00
Philip Reames
860038e0d7 [SLP] Rename a couple lambdas to be more clearly separate from method names 2022-01-20 12:13:30 -08:00
Philip Reames
c104fca36b {SLP] Delete dead code in favor of proper assert [NFC] 2022-01-20 08:54:12 -08:00
Philip Reames
c43ebae838 [SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC] 2022-01-20 08:46:44 -08:00
Philip Reames
3c422cbe6b [SLP] Add an asser to make a non-obvious precondition clear [NFC] 2022-01-20 08:24:10 -08:00
Kazu Hirata
435a5a3652 [llvm] Fix bugprone argument comments (NFC)
Identified with bugprone-argument-comment.
2022-01-08 11:56:38 -08:00
Alexey Bataev
d130df544d [SLP]Improve reordering for the nodes beeing used in alternate vectorization.
No need to include the order of the scalars beeing used as part of the
alternate vectorization into account when trying to reorder the whole
graph. Such elements better to reorder in the following phase because
the subtree still ends up in shuffle.

Part of D116688, fixes the regression in D116690.

Differential Revision: https://reviews.llvm.org/D116740
2022-01-06 11:18:57 -08:00
Alexey Bataev
7cb19fe493 [SLP]Initialize the lane with the given value instead of default 0.
There is a bug in the reordering analysis stage. If the element with the
given hash is not added to the map but has the same number of APOs and
instructions with same parent, but different instruction opcode, it will
be initalized with default values and then the counter is increased by
1. But the lane is not updated and default to 0 instead of the actual
   `Lane` value. It leads to the fact that the analysis is useless in
   many cases and default to lane 0 instead of actual lane with the
   minimum amount of APO operands.

Differential Revision: https://reviews.llvm.org/D116690
2022-01-06 10:57:11 -08:00
Alexey Bataev
700997aef8 [SLP][NFC]Fix comment, NFC. 2022-01-06 06:38:29 -08:00
Alexey Bataev
dd83befe33 [SLP][NFC]Improved isAltShuffle by comparing instructions instead of
opcodes, NFC.

NFC part of D115955.
2022-01-05 12:30:13 -08:00
Alexey Bataev
e0efedd2c3 [SLP][NFC]Fix non-determinism in reordering, NFC.
Need to clear CurrentOrder order mask if it is determined that
extractelements form identity order and need to use a vector-like
construct when iterating over ordered entries in the reorderTopToBottom
function.
2021-12-30 13:10:25 -08:00
Alexey Bataev
ab9078f3d3 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 07:21:20 -08:00
Alexey Bataev
4459a11f4d Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy."
This reverts commit fcaf290d0278bb83387e1a1d972c55e08b8c40e3 to fix test
mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531
2021-12-20 07:21:18 -08:00
Alexey Bataev
fcaf290d02 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 05:15:01 -08:00
Alexey Bataev
71fe59212c [SLP][NFC]Adjust type in debug output loop.
The ReuseShuffleIndices indeces are integer, not unsigned, need to fix
the type in the debug print loop.
2021-12-17 12:43:01 -08:00
Alexey Bataev
46ad66b817 [SLP][NFC]Use 'llvm::copy' instead of element-by-elemen copying. 2021-12-17 12:07:59 -08:00
Alexey Bataev
65fc992579 [SLP]Early exit out of the reordering if shuffled/perfect diamond match found.
Need to early exit out of the reordering process if the perfect/shuffled match is found in the operands. Such pattern will result in not profitable reordering because of (false positive) external use of scalars.

Differential Revision: https://reviews.llvm.org/D115811
2021-12-16 11:09:49 -08:00
Alexey Bataev
6f2e087631 [SLP]Do not represent splats as node with the reused scalars.
No need to represent splats as a node with the reused scalars, it may
increase the cost (currently pass just ignores extra shuffle cost and it
is still not correct).

Differential Revision: https://reviews.llvm.org/D115800
2021-12-15 06:33:11 -08:00
Alexey Bataev
bd05376986 [SLP]Improve multinode analysis.
Changes the preliminary multinode analysis:
1. Introduced scores for reversed loads/extractelements.
2. Improved shallow score calculation.
3. Lowered the cost of external uses (no need to consider it several times, just ones).
4. The initial lane for analysis is the one with the minimal possible
   reorderings.

These changes in general shall reduce compile time and improve the
reordering in many cases.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101109
2021-12-14 06:01:52 -08:00
Alexey Bataev
e5b191a433 [SLP]Improve/fix reodering for gather nodes with extractelements/undefs.
If the gather node is a mix of undefvalues and exractelement
instructions, need to take the ordering for such nodes into account too.
It allows to reorder some (sub)trees and remove some extra shuffles,
improving overall vectorization.
Also, outlined common functionality into a separate function.

Differential Revision: https://reviews.llvm.org/D115358
2021-12-13 10:59:38 -08:00
Nikita Popov
432c41ebe9 [SLP] Avoid getPointerElementType() call
Use the load result type instead of the element type of the load
pointer operand.
2021-12-13 15:46:13 +01:00
Alexey Bataev
19c5cf4167 [SLP]Fix comparator for cmp instruction vectorization.
The comparator for the sort functions should provide strict weak
ordering relation between parameters. Current solution causes compiler
crash with some standard c++ library implementations, because it does
not meet this criteria. Tried to fix it + it improves the iverall
vectorization result.

Differential Revision: https://reviews.llvm.org/D115268
2021-12-09 10:57:57 -08:00
Philip Reames
b24db85c0b [recurrence] Delete dead flag/fmf handling [NFC]
The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments.  The sole exception is a caller of a method which has the argument, but no implementation.

I don't know what the intent was here, but it certaintly doesn't actually do anything today.
2021-12-09 10:43:53 -08:00
Alexey Bataev
a101a9b64b [SLP]Fix compiler crash when calculating extract cost for undefs.
Need to add an extra check for potential undef values in
computeExtractCost function to avoid compiler crash on casting to
instructon.

Differential Revision: https://reviews.llvm.org/D115162
2021-12-06 10:46:13 -08:00
Alexey Bataev
ba74bb3a22 [SLP]Fix reused extracts cost.
If the extractelement instruction is used multiple times in the
different tree entries (either vectorized, or gathered), need to
compensate the scalar cost of such instructions. They are completely
removed if all users are part of the tree but we need to compensate the
cost only once for each instruction.

Differential Revision: https://reviews.llvm.org/D114958
2021-12-02 10:52:00 -08:00
Alexey Bataev
8ceccbd321 [SLP]Outline and fix code for finding common insertelement vectors.
Need to outline the code for finding common vectors in insertelement
instructions into a separate function for future patches. It also
improves the process by adding some extra checks for early exit and
fixes a bug where it always finds the match because of erroneous compare
of the same values.

Differential Revision: https://reviews.llvm.org/D114909
2021-12-02 09:18:25 -08:00
Alexey Bataev
92fbd76af5 [SLP]Improve registering and merging of compatible shuffles.
If several shuffle instructions are emitted, some of them might
same/compatible (less defined) with the previously emitted ones. Such
shuffles can be removed safely, improving the total cost of the
vectorized code.

Differential Revision: https://reviews.llvm.org/D114087
2021-12-02 08:48:29 -08:00
Alexey Bataev
afc9e7517a [SLP]Improve cost model for the shuffled extracts.
Improved the calculation of the shuffled extracts, where possible. Need
to calculate the cost for the extracted scalars if some users are not
insertelements + improved the total estimation of the shuffled scalars
used in insertelements build vectors.

Differential Revision: https://reviews.llvm.org/D113782
2021-12-01 08:10:57 -08:00
Alexey Bataev
cc30fbf242 [SLP]Introduce isUndefVector function to check for undef vectors.
Undefined vector might be not only the UndefValue, but also it can be
a constant vector with undef ot poison elements, need to check for this
kind of undef too.

Differential Revision: https://reviews.llvm.org/D114873
2021-12-01 07:46:10 -08:00
Alexey Bataev
ddce6e0561 [SLP]Improve vectorization of cmp instructions sequences.
Final attempt to vectorize bundles of comptatible cmp instructions after
all other instructions processing.

Metric: SLP.NumVectorInstructions

Program                                                                             results results0 diff
        test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    1.00    5.00  400.0%
                              test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test    8.00   11.00   37.5%
                    test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test   20.00   26.00   30.0%
                test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1344.00 1648.00   22.6%
               test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1344.00 1648.00   22.6%
                              test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test  102.00  124.00   21.6%
                test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test  118.00  133.00   12.7%
          test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3233.00 3554.00    9.9%
           test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3233.00 3554.00    9.9%
                        test-suite :: MultiSource/Benchmarks/Olden/power/power.test   64.00   70.00    9.4%
           test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 7879.00 8604.00    9.2%
           test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test   50.00   54.00    8.0%
                        test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   27.00   29.00    7.4%
             test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 8345.00 8955.00    7.3%
     test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  694.00  738.00    6.3%
                        test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  361.00  382.00    5.8%
                      test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  409.00  430.00    5.1%
     test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  140.00  147.00    5.0%
      test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  140.00  147.00    5.0%
             test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4013.00 4206.00    4.8%
                       test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  966.00 1011.00    4.7%
                           test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   65.00   68.00    4.6%
                            test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4219.00 4381.00    3.8%
                    test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1911.00 1973.00    3.2%
      test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   62.00   64.00    3.2%
     test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   62.00   64.00    3.2%
                 test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  852.00  877.00    2.9%
                  test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  852.00  877.00    2.9%
                       test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1624.00 1668.00    2.7%
                         test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test   39.00   40.00    2.6%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test  613.00  624.00    1.8%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  378.00  383.00    1.3%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  293.00  295.00    0.7%
            test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  297.00  299.00    0.7%
      test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5522.00 5534.00    0.2%
     test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5522.00 5534.00    0.2%

Differential Revision: https://reviews.llvm.org/D114799
2021-12-01 07:26:29 -08:00
Alexey Bataev
dce6c434ea [SLP]Improve isFixedVectorShuffle and its use.
Extended support for undefined source vector/extract indices/non-fixed
vector types, also no need to check for the parent of the extractelement
instructions with the constant indicies.

Differential Revision: https://reviews.llvm.org/D114121
2021-11-30 10:10:20 -08:00
Alexey Bataev
fc57cfad3c [SLP][NFC]Move static function to make it visible in member function,
NFC.
2021-11-30 09:38:46 -08:00
Alexey Bataev
fc0aacf324 [SLP]Improve analysis/emission of vector operands for alternate nodes.
Compiler has an analysis for perfect diamond matching but it does not
support nodes with main/alternate opcodes. The problem is that the
scalars themselves are different and might not match directly with other
nodes, but operands and main/alternate opcodes might match and compiler
might reuse some previously emitted vector instructions. Need to include
this analysis in the cost model and actual vector instructions emission
process.

Differential Revision: https://reviews.llvm.org/D114101
2021-11-26 06:38:02 -08:00