1534 Commits

Author SHA1 Message Date
Alexey Bataev
e05def081e
[SLP]Do not vectorize code in EH and non-returning blocks
The code in EH and non-returning blocks can be skipped by the
vectorizer, since it does not add to the perfromance, just consumes
compile/link time.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112221
2024-10-31 13:50:02 -04:00
Alexey Bataev
19a34dded7
[SLP]Do not account external uses in EH block and in non-returning blocks
No need to account the cost of the external uses in EH and non-returning
basic blocks.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112045
2024-10-31 13:23:43 -04:00
Alexey Bataev
e7080fd735 [SLP]Extra check if the intruction matked for removal, must be replaced in reduction ops
If the instruction is vectorized and it is a part of the reduced values
gather/buildvector node, it should replaced in reduced operation
instructions before removal properly, to avoid compiler crash.

Fixes #114371
2024-10-31 09:59:35 -07:00
Matthias Braun
255e441613
X86: Do not return invalid cost for fp16 conversion (#114128)
Returning invalid instruction costs when converting from/to fp16 in
`X86TTIImpl::getCastInstrCost` when there is no hardware support
available was triggering asserts. This changes the code to return a
large (arbitrary) number to model the fact that libcalls are used to
implement the conversion.

This also simplifies the code by only reporting costs for the scalar
fp16 conversion; vectorized costs being left to the fallback assuming
scalarization.

This is a follow-up to assertion issues reported for the changes in
#113195
2024-10-29 17:16:17 -07:00
Matthias Braun
054c23d78f
X86: Improve cost model of fp16 conversion (#113195)
Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer
transforms the patterns:

- Override `X86TTIImpl::getStoreMinimumVF` to report a minimum VF of 4 (SSE
  register can hold 4xfloat converted/stored to 4xf16) this is necessary as
  fp16 stores are neither modeled as trunc-stores nor can we mark direct Xxfp16
  stores as legal as we generally expand fp16 operations).
- Add missing cost entries to `X86TTIImpl::getCastInstrCost`
  conversion from/to fp16. Note that conversion from f64 to f16 is not
  supported by an X86 instruction.
2024-10-25 16:22:24 -07:00
Alexey Bataev
d2e7ee77d3 [SLP]Do not check for clustered loads only
Since SLP support "clusterization" of the non-load instructions, the
restriction for reduced values for loads only should be removed to avoid
compiler crash.

Fixes #113516
2024-10-24 08:16:42 -07:00
Alexey Bataev
cb5046da26 [SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles
Need to consider undefs correctly, when trying to replace them with
potentially poisonous values in shuffles. Such elements should not be
silently replaced by poison values, instead complex analysis should be
implemented to see if it is safe to do it.

Fixes #113425
2024-10-24 07:47:23 -07:00
Alexey Bataev
a3508e0246 [SLP]Small buidlvector only graph should contains scalars from same block
If the graph is small and has single buildvector node, all scalars
instructions must be from the same basic block to prevent compiler
crash.

Fixes #113451
2024-10-23 10:46:38 -07:00
Alexey Bataev
4b1b51ac52 [SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requires number of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-21 12:25:39 -07:00
Alexey Bataev
9e03920cbf [SLP]Ignore root gather node, when searching for reuses
Root gather/buildvector node should be ignored when SLP vectorizer tries
to find matching gather nodes, vectorized earlier. This node is
definitely the last one in the pipeline and it does not have users. It
may cause the compiler crash

Fixes #113143
2024-10-21 09:16:16 -07:00
David Green
17ac10c28f Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions"
This reverts commit 7f2e937469a8cec3fe977bf41ad2dfb9b4ce648a as it causes
regressions in the tests it modifies, and undoes what was added in #100653
(which itself was a fix for a previous regression).
2024-10-21 13:37:44 +01:00
Alexey Bataev
7f2e937469 [SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requires number of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-18 12:50:11 -07:00
Alexey Bataev
685bec722f Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions"
This reverts commit 8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b to
investigate and fix compile time regressions reported by https://llvm-compile-time-tracker.com/compare.php?from=ec78f0da0e9b1b8e2b2323e434ea742e272dd913&to=8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b&stat=instructions:u
2024-10-15 12:59:44 -07:00
Alexey Bataev
8287fa8e59
[SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requiresnumber of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-15 12:10:48 -04:00
Alexey Bataev
ab902ee54a [SLP][NFC]Replace more unreachable terminators by rets, NFC 2024-10-14 07:50:07 -07:00
Alexey Bataev
91a0fecf19 [SLP][NFC]Replace unreachable instructions by rets, NFC. 2024-10-14 07:00:56 -07:00
Alexey Bataev
f020bf1526
[SLP]Initial support for non-power-of-2 (but whole reg) vectorization for stores
Allows non-power-of-2 vectorization for stores, but still requires, that
vectorized number of elements forms full vector registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/111194
2024-10-09 15:22:44 -04:00
Alexey Bataev
9f3c55954e
[SLP]Fix loads sorting for loads from diffrent basic blocks
Patch fixes lookup for loads from different basic blocks. Originally,
the code checked is the main key (combined with parent basic block) was
created, but did not include the key into LoadsMap. When the code looked for
the load pointer in LoadsMap, it skipped check for parent basic block
and could mix loads from different basic blocks (but the same underlying
pointer). Currently, it does lead to any issues, since later the code
compares parent basic blocks and sorts loads properly. But it increases
compile time and affects compile time.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/111521
2024-10-08 16:44:16 -04:00
Alexey Bataev
a65a5feb1a
[SLP]Improve masked loads vectorization, attempting gathered loads
If the vector of loads can be vectorized as masked gather and there are
several other masked gather nodes, compiler can try to attempt to check,
if it possible to gather such nodes into big consecutive/strided loads
  node, which provide better performance.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/110151
2024-10-08 16:43:10 -04:00
Alexey Bataev
c0dfef878e [SLP][NFC]Add a test with potential non-power-of2 (but whole reg) vectorized stores 2024-10-04 11:22:55 -07:00
Elvina Yakubova
15ee17c3ce
[SLP] Move more X86 tests to common directory (#111134)
Some of the tests from the X86 directory can be generalized to improve
coverage for other architectures (cont.)
2024-10-04 13:18:56 +01:00
Alexey Bataev
133c1224de [SLP]Fix a crash on accessing element with index -1 for reused mask with PoisonMaskElem
Need to check if the index from the ReuseShuffleIndices mask is not
equal to PoisonMaskElem before trying to access the element by index.
2024-10-03 08:24:05 -07:00
Alexey Bataev
c1b911c579 [SLP]Do correct signedness analysis for clustered nodes
Should get the signedness info from the original scalar instructions, if
possible, to correctly generate sext/zext instructions. Also, the
clustered node must be assigned a gather node user info to correctly
estimate its bitwidth/sign.
2024-10-02 12:56:49 -07:00
Alexey Bataev
848cb21ddc [SLP][NFC]Add a test with the incorrect signedness info for subvector 2024-10-02 12:06:08 -07:00
Alexey Bataev
4197e732a5 [SLP]Add debug counter support
Fixes #110725

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/110734
2024-10-02 11:14:34 -07:00
Alexey Bataev
ec7266617f Revert "[SLP]Add debug counter support"
This reverts commit 67dd9d23474bd570d5befaddad0be8a5559b815b to fix https://lab.llvm.org/buildbot/#/builders/11/builds/6012
2024-10-02 10:33:27 -07:00
Alexey Bataev
67dd9d2347 [SLP]Add debug counter support
Fixes #110725

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/110734
2024-10-02 10:00:48 -07:00
Alexey Bataev
4dede756f2 [SLP]Transform nodes before building externally used values
transformNodes function may create new vector nodes, so the reduced
values might be vectorized later. Need to build the list of the
externally used values after the transformNodes() function call to avoid
compiler crash.

Fixe #110787
2024-10-02 06:01:25 -07:00
Haowei Wu
ccbda38b70 Revert "[SLP][NFC]Make a test target specific to avoid failures"
This reverts commit 998033b3501e96035e4da7027e0a9dfc81c721ad.
This breaks llvm test on debug-counter.ll
2024-10-01 16:14:45 -07:00
Alexey Bataev
998033b350 [SLP][NFC]Make a test target specific to avoid failures 2024-10-01 14:08:37 -07:00
Alexey Bataev
b16e694948
[SLP]Try to keep operand of external casts as scalars, if profitable
If the cost of original scalar instruction + cast is better than the
extractelement from the vector cast instruction, better to keep original
scalar instructions, where possible

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/110537
2024-10-01 13:35:42 -04:00
Alexey Bataev
0dab02258a [SLP][NFC]Add a test with external cast and extracted operand, NFC 2024-10-01 09:23:02 -07:00
Alexey Bataev
f49344e19d [SLP]Check if number of elements forms a full register
Need to check if number of elements form a full register before trying
per-register permutations to avoid compiler crash
2024-09-27 12:54:56 -07:00
Alexey Bataev
af6354634d [SLP]Look for vector user when estimating the cost
Need to find the first vector node user, not the very first user node at
all. The very first user might be a gather, vectorized as clustered,
which may cause compiler crash.

Fixes https://github.com/llvm/llvm-project/issues/110193
2024-09-27 04:14:28 -07:00
Alexey Bataev
100fd0cd5a [SLP]Fix a crash when trying to identify one source order
Need to check that order index is not out-of-boundaries when trying to
detect that the reuse mask is one-source-mask with clusters to fix
compiler crash
2024-09-26 04:47:48 -07:00
Alexey Bataev
3469db82b5
[SLP]Add subvector vectorization for non-load nodes
Previously SLP vectorize supported clustered vectorization for loads
only. This patch adds support for "clustered" vectorization for other
instructions.
If the buildvector node contains "clusters", which can be vectorized
separately and then inserted into the resulting buildvector result, it
is better to do, since it may reduce the cost of the vector graph and
produce better vector code.
The patch does some analysis, if it is profitable to try to do this kind
of extra vectorization. It checks the scalar instructions and its
operands and tries to vectorize them only if they result in a better
graph.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/108430
2024-09-25 10:23:41 -04:00
Alexey Bataev
d7dd31e417 [SLP]Better analysis of the repeated instructions during operands reordering
When doing the repeated instructions analysis, better to make the
reordering non-profitable, if the number of unique instructions is not
power-of-2. In this case better to keep power-of-2 elements as this
allows better vectorization.

Fixes https://github.com/llvm/llvm-project/issues/109725
2024-09-24 14:03:10 -07:00
Alexey Bataev
538dbb97a2 [SLP][NFC]Add a test with the operand reordering and bad reused values decision 2024-09-24 13:54:43 -07:00
Elvina Yakubova
7773243d99
[SLP] Move more X86 tests to common directory (#109821)
Some of the tests from the X86 directory can be generalized to improve
coverage for other architectures
2024-09-24 17:10:47 +01:00
Elvina Yakubova
706e71076e
[SLP] Move some of X86 tests to common directory (#107587)
Some of the tests from the X86 directory can be generalized to improve
coverage for other architectures
2024-09-23 20:53:52 +01:00
Alexey Bataev
3db0f8c895 [SLP]Update TrackedToOrig mappings after reduction vectorization
Need to update mappings in TrackedToOrig to correctly provide mapping
between updated reduced value after vectorization and its original
value, otherwise the compiler might miss this update and it may cause
compiler crash later, when it tries to find the original instruction
mapping for the updated value.

Fixes https://github.com/llvm/llvm-project/issues/109376
2024-09-23 10:52:03 -07:00
Alexey Bataev
1833d418a0 [SLP]Vectorize gathered loads
Final gather/buildvector nodes may have scalar loads, which are not
vectorized (since they are part of the gather nodes) but may form full
vector loads, being combined. This patch walks over all gather nodes,
"gathering" and sorting gathered scalar loads and then tries to build
vector loads, which later are reshuffled between the gather nodes.
It allows later to add support for segmented loads (kind of AOS to SOA
load kind for RISC-V RVV) and may help with the removal of the alternat
e opcodes support.
Currently, alternate nodes may depend on each other because of the
consecutive loads between their operands. Because of that we cannot
simply remove alternate vectorization. But this approach may help to
remove most of the stuff for it, since we'll be able to vectorize loads
in between lanes.

Metric: size..text, AVX512

Program                                                                                                                                                size..text
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   238381.00   250669.00  5.2%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test    25753.00    26329.00  2.2%
                                                                  test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test     3028.00     3092.00  2.1%
                                                                                     test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test     4243.00     4275.00  0.8%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   649765.00   653877.00  0.6%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   649765.00   653877.00  0.6%
                                                                                       test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test     4199.00     4222.00  0.5%
                                                             test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test    12933.00    12997.00  0.5%
                                                                                                 test-suite :: SingleSource/Benchmarks/Misc/flops.test     8282.00     8314.00  0.4%
                                                            test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test    10065.00    10097.00  0.3%
                                                                                         test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test     5160.00     5176.00  0.3%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12472220.00 12509612.00  0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test     6908.00     6924.00  0.2%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   202830.00   203278.00  0.2%
                                                                                       test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test     9133.00     9149.00  0.2%
                                                                                           test-suite :: MultiSource/Benchmarks/Olden/power/power.test     6792.00     6803.00  0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1395585.00  1397473.00  0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1395585.00  1397473.00  0.1%
                                                                        test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    97662.00    97758.00  0.1%
                                                                                        test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   595179.00   595739.00  0.1%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    70603.00    70667.00  0.1%
                                                                            test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test    19877.00    19893.00  0.1%
                                                                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test    90231.00    90279.00  0.1%
                                                                                         test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    33738.00    33754.00  0.0%
                                                                                     test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test    13262.00    13268.00  0.0%
                                                                                        test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1139964.00  1140460.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   849507.00   849875.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1158379.00  1158859.00  0.0%
                                                                                   test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test    38724.00    38740.00  0.0%
                                                                                              test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test    15180.00    15186.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test    15484.00    15490.00  0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test   167391.00   167455.00  0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test   137448.00   137496.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2030254.00  2030766.00  0.0%
                                                                              test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test   302870.00   302934.00  0.0%
                                                                                    test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test   303126.00   303190.00  0.0%
                                                                                            test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test   241107.00   241155.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test   162974.00   163006.00  0.0%
                                                                                                 test-suite :: MultiSource/Applications/siod/siod.test   167168.00   167200.00  0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1048796.00  1048988.00  0.0%
                                                                               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   201623.00   201655.00  0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   501734.00   501798.00  0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test   580888.00   580952.00  0.0%
                                                                                           test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test   168319.00   168335.00  0.0%
                                                                        test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test   226022.00   226038.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test   118011.00   118015.00  0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test   550589.00   550605.00  0.0%
                                                                                             test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3072477.00  3072541.00  0.0%
                                                                                 test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2385563.00  2385579.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   389171.00   389155.00 -0.0%
                                                                                                   test-suite :: MultiSource/Applications/lua/lua.test   234764.00   234748.00 -0.0%
                                                                                        test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   227694.00   227678.00 -0.0%
                                                                    test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test   119819.00   119807.00 -0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test   117995.00   117983.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test   123610.00   123594.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    81414.00    81398.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782040.00   781880.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9597420.00  9595292.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9597420.00  9595292.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   911832.00   911608.00 -0.0%
                                                                                             test-suite :: MultiSource/Applications/oggenc/oggenc.test   192507.00   192459.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test   122843.00   122811.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   122292.00   122260.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   777363.00   777155.00 -0.0%
                                                                            test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test   123265.00   123205.00 -0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   315534.00   315358.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test   128163.00   128083.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test     6562.00     6555.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test    23428.00    23396.00 -0.1%
                                                                             test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    22749.00    22717.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39549.00    39485.00 -0.2%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39546.00    39482.00 -0.2%
                                                                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test    57214.00    57118.00 -0.2%
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413668.00   412804.00 -0.2%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1044047.00  1041487.00 -0.2%
                                                                                            test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    12414.00    12382.00 -0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test    31161.00    30969.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   224726.00   223254.00 -0.7%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    93512.00    92824.00 -0.7%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281151.00   278463.00 -1.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test     2820.00     2788.00 -1.1%
                                                                                            test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   156819.00   154739.00 -1.3%
                                                                 test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test    11560.00    11160.00 -3.5%
                                                                                          test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test     6734.00     6382.00 -5.2%
                                                                                                                                                       results     results0    diff

ASCI_Purple/SMG2000 - extra vector code
VPlanNativePath/outer-loop-vect - extra vectorization, better vector
code
AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code
Rodinia/hotspot - small variations
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - extra vector code, better vectorization
BenchmarkGame/n-body - better vector code.
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations
Misc/flops - extra vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations
Misc-C++/Large - better vector code
CFP2017rate/526.blender_r - extra vector code
Prolangs-C++/city - extra vector code
MiBench/consumer-lame - extra vector code
CoyoteBench/fftbench - extra vector code
Olden/power - better vector code
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - extra vector code
CINT2017rate/531.deepsjeng_r - extra vector code
CFP2006/447.dealII - small variations
DOE-ProxyApps-C/miniAMR - small variations
Prolangs-C/unix-smail - small variations
DOE-ProxyApps-C++/PENNANT - small variations
CINT2006/473.astar - small variations
CFP2006/453.povray - small variations
JM/lencod - extra vector code
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/CoMD - small variations
CFP2006/470.lbm - extra vector code
CFP2017speed/619.lbm_s
CFP2017rate/519.lbm_r - extra vector code
CINT2006/456.hmmer - extra code vectorized
TSVC/ControlFlow-dbl - extra vector code
CFP2017rate/510.parest_r - better vector code
LCALS/SubsetALambdaLoops - extra code vectorized
LCALS/SubsetARawLoops - extra code vectorized
CFP2006/444.namd - extra code vectorized
CFP2006/482.sphinx3 - better vector code
Applications/siod - better vector code
Benchmarks/7zip - better vector code
DOE-ProxyApps-C++/CLAMR - extra code vectorized
Applications/sqlite3 - extra code vectorized
Applications/ClamAV - smaller vector code
MallocBench/gs - small variations
MicroBenchmarks/ImageProcessing - small variations
TSVC/StatementReordering-flt - extra code vectorized
CINT2006/471.omnetpp - small variations
CINT2006/403.gcc - extra code vectorized
CINT2006/483.xalancbmk - extra code vectorized
JM/ldecod - small variations
Applications/lua - extra code vectorized
mafft/pairlocalalign - small variations
TSVC/NodeSplitting-flt - extra code vectorized
TSVC/Recurrences-flt - extra code vectorized
TSVC/InductionVariable-flt - extra code vectorized
FreeBench/pifft - small variations
CINT2006/464.h264ref - extra code vectorized
CINT2017speed/602.gcc_s
CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined
CINT2006/445.gobmk - small variations
Applications/oggenc - small variations
TSVC/LoopRestructuring-flt - extra code vectorized
TSVC/CrossingThresholds-flt - extra code vectorized
CFP2017rate/508.namd_r - small variations
TSVC/ControlFlow-flt - extra code vectorized
mediabench/g721 - small variations
Prolangs-C/compiler - small variations
FreeBench/fourinarow - better vector code
MiBench/telecomm-gsm - small variation in vector code
mediabench/gsm - same
Prolangs-C/bison - small variations
Adobe-C++/loop_unroll - extra code vectorized
Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vetor
code
McCat/18-imp - variations in vector code
Prolangs-C/gnugo - variations in vector code
MallocBench/espresso - extra code vectorized
DOE-ProxyApps-C++/miniFE - small variations in vector code
Prolangs-C/TimberWolfMC - extra code vectorized, small changes in
previously vectorized code.
Olden/tsp - small changes in vector code
CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores
MiBench/security-blowfish - extra code vectorized
McCat/08-main - better vector code.

Metric: size..text, RISCV, sifive-p670

Program                                                                                                                                                size..text
                                                                                                                                                       results    results0   diff
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   63580.00   64020.00  0.7%
                                                                   test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   21388.00   21406.00  0.1%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296992.00  297088.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  968112.00  968208.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test   45160.00   45164.00  0.0%
                                                                         test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2635902.00 2635854.00 -0.0%
                                                                        test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2635902.00 2635854.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7568730.00 7568578.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7568730.00 7568578.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   49764.00   49762.00 -0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  449132.00  449108.00 -0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test  695932.00  695892.00 -0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  508820.00  508788.00 -0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  508820.00  508788.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9594152.00 9593336.00 -0.0%
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  166522.00  166490.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  722252.00  722092.00 -0.0%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   27554.00   27546.00 -0.0%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test   10900.00   10896.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test   46754.00   46732.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  631570.00  631226.00 -0.1%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  850698.00  850218.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   24816.00   24800.00 -0.1%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   24814.00   24798.00 -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1599946.00 1598394.00 -0.1%
                                                                                                   test-suite :: MultiSource/Applications/hbd/hbd.test   27236.00   27204.00 -0.1%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  293848.00  293480.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test   20160.00   20048.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  182088.00  181040.00 -0.6%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4788.00    4748.00 -0.8%

DOE-ProxyApps-C++/miniFE - extra vector code
MiBench/automotive-susan - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/511.povray_r - slightly better vector code
TSVC/StatementReordering-dbl - small variations
CINT2017rate/523.xalancbmk_r
CINT2017speed/623.xalancbmk_s - extra vector code
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
TSVC/CrossingThresholds-flt - small variations
Applications/sqlite3 - extra vector code
JM/lencod - extra vector code, small variations
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - small variations
CFP2017rate/526.blender_r - extra vector code, small variations
DOE-ProxyApps-C/miniGMG - small variations
Vectorizer/VPlanNativePath/outer-loop-vect - small variations
TSVC/CrossingThresholds-dbl - small variations
Benchmarks/tramp3d-v4 - small variations
Benchmarks/7zip - extra vector code
MiBench/telecomm-gsm - small variations
mediabench/gsm/toast - small variations
CFP2017rate/510.parest_r - extra vector code
Applications/hbd - extra vector code
JM/ldecod - better vector code
Prolangs-C/compiler - extra vector code
MallocBench/espresso - extra vector code
mediabench/g721/g721encode - extra vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/107461
2024-09-21 15:41:06 -07:00
Alexey Bataev
e588fd994f Revert "[SLP]Vectorize gathered loads"
This reverts commit dc2deb53131b9d4c5e881229190bdda1ca3ea47f to fix the
issue reported in https://lab.llvm.org/buildbot/#/builders/25/builds/2668
2024-09-21 15:25:36 -07:00
Alexey Bataev
dc2deb5313 [SLP]Vectorize gathered loads
Final gather/buildvector nodes may have scalar loads, which are not
vectorized (since they are part of the gather nodes) but may form full
vector loads, being combined. This patch walks over all gather nodes,
"gathering" and sorting gathered scalar loads and then tries to build
vector loads, which later are reshuffled between the gather nodes.
It allows later to add support for segmented loads (kind of AOS to SOA
load kind for RISC-V RVV) and may help with the removal of the alternat
e opcodes support.
Currently, alternate nodes may depend on each other because of the
consecutive loads between their operands. Because of that we cannot
simply remove alternate vectorization. But this approach may help to
remove most of the stuff for it, since we'll be able to vectorize loads
in between lanes.

Metric: size..text, AVX512

Program                                                                                                                                                size..text
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   238381.00   250669.00  5.2%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test    25753.00    26329.00  2.2%
                                                                  test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test     3028.00     3092.00  2.1%
                                                                                     test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test     4243.00     4275.00  0.8%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   649765.00   653877.00  0.6%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   649765.00   653877.00  0.6%
                                                                                       test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test     4199.00     4222.00  0.5%
                                                             test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test    12933.00    12997.00  0.5%
                                                                                                 test-suite :: SingleSource/Benchmarks/Misc/flops.test     8282.00     8314.00  0.4%
                                                            test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test    10065.00    10097.00  0.3%
                                                                                         test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test     5160.00     5176.00  0.3%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12472220.00 12509612.00  0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test     6908.00     6924.00  0.2%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   202830.00   203278.00  0.2%
                                                                                       test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test     9133.00     9149.00  0.2%
                                                                                           test-suite :: MultiSource/Benchmarks/Olden/power/power.test     6792.00     6803.00  0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1395585.00  1397473.00  0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1395585.00  1397473.00  0.1%
                                                                        test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    97662.00    97758.00  0.1%
                                                                                        test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   595179.00   595739.00  0.1%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    70603.00    70667.00  0.1%
                                                                            test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test    19877.00    19893.00  0.1%
                                                                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test    90231.00    90279.00  0.1%
                                                                                         test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    33738.00    33754.00  0.0%
                                                                                     test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test    13262.00    13268.00  0.0%
                                                                                        test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1139964.00  1140460.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   849507.00   849875.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1158379.00  1158859.00  0.0%
                                                                                   test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test    38724.00    38740.00  0.0%
                                                                                              test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test    15180.00    15186.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test    15484.00    15490.00  0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test   167391.00   167455.00  0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test   137448.00   137496.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2030254.00  2030766.00  0.0%
                                                                              test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test   302870.00   302934.00  0.0%
                                                                                    test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test   303126.00   303190.00  0.0%
                                                                                            test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test   241107.00   241155.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test   162974.00   163006.00  0.0%
                                                                                                 test-suite :: MultiSource/Applications/siod/siod.test   167168.00   167200.00  0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1048796.00  1048988.00  0.0%
                                                                               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   201623.00   201655.00  0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   501734.00   501798.00  0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test   580888.00   580952.00  0.0%
                                                                                           test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test   168319.00   168335.00  0.0%
                                                                        test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test   226022.00   226038.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test   118011.00   118015.00  0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test   550589.00   550605.00  0.0%
                                                                                             test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3072477.00  3072541.00  0.0%
                                                                                 test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2385563.00  2385579.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   389171.00   389155.00 -0.0%
                                                                                                   test-suite :: MultiSource/Applications/lua/lua.test   234764.00   234748.00 -0.0%
                                                                                        test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   227694.00   227678.00 -0.0%
                                                                    test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test   119819.00   119807.00 -0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test   117995.00   117983.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test   123610.00   123594.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    81414.00    81398.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782040.00   781880.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9597420.00  9595292.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9597420.00  9595292.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   911832.00   911608.00 -0.0%
                                                                                             test-suite :: MultiSource/Applications/oggenc/oggenc.test   192507.00   192459.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test   122843.00   122811.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   122292.00   122260.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   777363.00   777155.00 -0.0%
                                                                            test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test   123265.00   123205.00 -0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   315534.00   315358.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test   128163.00   128083.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test     6562.00     6555.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test    23428.00    23396.00 -0.1%
                                                                             test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    22749.00    22717.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39549.00    39485.00 -0.2%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39546.00    39482.00 -0.2%
                                                                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test    57214.00    57118.00 -0.2%
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413668.00   412804.00 -0.2%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1044047.00  1041487.00 -0.2%
                                                                                            test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    12414.00    12382.00 -0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test    31161.00    30969.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   224726.00   223254.00 -0.7%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    93512.00    92824.00 -0.7%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281151.00   278463.00 -1.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test     2820.00     2788.00 -1.1%
                                                                                            test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   156819.00   154739.00 -1.3%
                                                                 test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test    11560.00    11160.00 -3.5%
                                                                                          test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test     6734.00     6382.00 -5.2%
                                                                                                                                                       results     results0    diff

ASCI_Purple/SMG2000 - extra vector code
VPlanNativePath/outer-loop-vect - extra vectorization, better vector
code
AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code
Rodinia/hotspot - small variations
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - extra vector code, better vectorization
BenchmarkGame/n-body - better vector code.
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations
Misc/flops - extra vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations
Misc-C++/Large - better vector code
CFP2017rate/526.blender_r - extra vector code
Prolangs-C++/city - extra vector code
MiBench/consumer-lame - extra vector code
CoyoteBench/fftbench - extra vector code
Olden/power - better vector code
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - extra vector code
CINT2017rate/531.deepsjeng_r - extra vector code
CFP2006/447.dealII - small variations
DOE-ProxyApps-C/miniAMR - small variations
Prolangs-C/unix-smail - small variations
DOE-ProxyApps-C++/PENNANT - small variations
CINT2006/473.astar - small variations
CFP2006/453.povray - small variations
JM/lencod - extra vector code
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/CoMD - small variations
CFP2006/470.lbm - extra vector code
CFP2017speed/619.lbm_s
CFP2017rate/519.lbm_r - extra vector code
CINT2006/456.hmmer - extra code vectorized
TSVC/ControlFlow-dbl - extra vector code
CFP2017rate/510.parest_r - better vector code
LCALS/SubsetALambdaLoops - extra code vectorized
LCALS/SubsetARawLoops - extra code vectorized
CFP2006/444.namd - extra code vectorized
CFP2006/482.sphinx3 - better vector code
Applications/siod - better vector code
Benchmarks/7zip - better vector code
DOE-ProxyApps-C++/CLAMR - extra code vectorized
Applications/sqlite3 - extra code vectorized
Applications/ClamAV - smaller vector code
MallocBench/gs - small variations
MicroBenchmarks/ImageProcessing - small variations
TSVC/StatementReordering-flt - extra code vectorized
CINT2006/471.omnetpp - small variations
CINT2006/403.gcc - extra code vectorized
CINT2006/483.xalancbmk - extra code vectorized
JM/ldecod - small variations
Applications/lua - extra code vectorized
mafft/pairlocalalign - small variations
TSVC/NodeSplitting-flt - extra code vectorized
TSVC/Recurrences-flt - extra code vectorized
TSVC/InductionVariable-flt - extra code vectorized
FreeBench/pifft - small variations
CINT2006/464.h264ref - extra code vectorized
CINT2017speed/602.gcc_s
CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined
CINT2006/445.gobmk - small variations
Applications/oggenc - small variations
TSVC/LoopRestructuring-flt - extra code vectorized
TSVC/CrossingThresholds-flt - extra code vectorized
CFP2017rate/508.namd_r - small variations
TSVC/ControlFlow-flt - extra code vectorized
mediabench/g721 - small variations
Prolangs-C/compiler - small variations
FreeBench/fourinarow - better vector code
MiBench/telecomm-gsm - small variation in vector code
mediabench/gsm - same
Prolangs-C/bison - small variations
Adobe-C++/loop_unroll - extra code vectorized
Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vetor
code
McCat/18-imp - variations in vector code
Prolangs-C/gnugo - variations in vector code
MallocBench/espresso - extra code vectorized
DOE-ProxyApps-C++/miniFE - small variations in vector code
Prolangs-C/TimberWolfMC - extra code vectorized, small changes in
previously vectorized code.
Olden/tsp - small changes in vector code
CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores
MiBench/security-blowfish - extra code vectorized
McCat/08-main - better vector code.

Metric: size..text, RISCV, sifive-p670

Program                                                                                                                                                size..text
                                                                                                                                                       results    results0   diff
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   63580.00   64020.00  0.7%
                                                                   test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   21388.00   21406.00  0.1%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296992.00  297088.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  968112.00  968208.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test   45160.00   45164.00  0.0%
                                                                         test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2635902.00 2635854.00 -0.0%
                                                                        test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2635902.00 2635854.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7568730.00 7568578.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7568730.00 7568578.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   49764.00   49762.00 -0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  449132.00  449108.00 -0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test  695932.00  695892.00 -0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  508820.00  508788.00 -0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  508820.00  508788.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9594152.00 9593336.00 -0.0%
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  166522.00  166490.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  722252.00  722092.00 -0.0%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   27554.00   27546.00 -0.0%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test   10900.00   10896.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test   46754.00   46732.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  631570.00  631226.00 -0.1%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  850698.00  850218.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   24816.00   24800.00 -0.1%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   24814.00   24798.00 -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1599946.00 1598394.00 -0.1%
                                                                                                   test-suite :: MultiSource/Applications/hbd/hbd.test   27236.00   27204.00 -0.1%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  293848.00  293480.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test   20160.00   20048.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  182088.00  181040.00 -0.6%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4788.00    4748.00 -0.8%

DOE-ProxyApps-C++/miniFE - extra vector code
MiBench/automotive-susan - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/511.povray_r - slightly better vector code
TSVC/StatementReordering-dbl - small variations
CINT2017rate/523.xalancbmk_r
CINT2017speed/623.xalancbmk_s - extra vector code
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
TSVC/CrossingThresholds-flt - small variations
Applications/sqlite3 - extra vector code
JM/lencod - extra vector code, small variations
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - small variations
CFP2017rate/526.blender_r - extra vector code, small variations
DOE-ProxyApps-C/miniGMG - small variations
Vectorizer/VPlanNativePath/outer-loop-vect - small variations
TSVC/CrossingThresholds-dbl - small variations
Benchmarks/tramp3d-v4 - small variations
Benchmarks/7zip - extra vector code
MiBench/telecomm-gsm - small variations
mediabench/gsm/toast - small variations
CFP2017rate/510.parest_r - extra vector code
Applications/hbd - extra vector code
JM/ldecod - better vector code
Prolangs-C/compiler - extra vector code
MallocBench/espresso - extra vector code
mediabench/g721/g721encode - extra vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/107461
2024-09-21 14:08:23 -07:00
Nikita Popov
848cec11f5 Revert "[SLP]Vectorize gathered loads"
This reverts commit de1f5b96adcea52bf7c9670c46123fe1197050d2.

This has a very large compile-time impact in some cases, in
particular lencod. See:
http://llvm-compile-time-tracker.com/compare.php?from=b1339abb713063363e7804124b8fb3d84143a003&to=de1f5b96adcea52bf7c9670c46123fe1197050d2&stat=instructions:u
2024-09-17 16:45:25 +02:00
Alexey Bataev
de1f5b96ad
[SLP]Vectorize gathered loads
Final gather/buildvector nodes may have scalar loads, which are not
vectorized (since they are part of the gather nodes) but may form full
vector loads, being combined. This patch walks over all gather nodes,
"gathering" and sorting gathered scalar loads and then tries to build
vector loads, which later are reshuffled between the gather nodes.
It allows later to add support for segmented loads (kind of AOS to SOA
load kind for RISC-V RVV) and may help with the removal of the alternat
e opcodes support.
Currently, alternate nodes may depend on each other because of the
consecutive loads between their operands. Because of that we cannot
simply remove alternate vectorization. But this approach may help to
remove most of the stuff for it, since we'll be able to vectorize loads
in between lanes.

Metric: size..text, AVX512

Program                                                                                                                                                size..text
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   238381.00   250669.00  5.2%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test    25753.00    26329.00  2.2%
                                                                  test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test     3028.00     3092.00  2.1%
                                                                                     test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test     4243.00     4275.00  0.8%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   649765.00   653877.00  0.6%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   649765.00   653877.00  0.6%
                                                                                       test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test     4199.00     4222.00  0.5%
                                                             test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test    12933.00    12997.00  0.5%
                                                                                                 test-suite :: SingleSource/Benchmarks/Misc/flops.test     8282.00     8314.00  0.4%
                                                            test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test    10065.00    10097.00  0.3%
                                                                                         test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test     5160.00     5176.00  0.3%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12472220.00 12509612.00  0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test     6908.00     6924.00  0.2%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   202830.00   203278.00  0.2%
                                                                                       test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test     9133.00     9149.00  0.2%
                                                                                           test-suite :: MultiSource/Benchmarks/Olden/power/power.test     6792.00     6803.00  0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1395585.00  1397473.00  0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1395585.00  1397473.00  0.1%
                                                                        test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    97662.00    97758.00  0.1%
                                                                                        test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   595179.00   595739.00  0.1%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    70603.00    70667.00  0.1%
                                                                            test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test    19877.00    19893.00  0.1%
                                                                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test    90231.00    90279.00  0.1%
                                                                                         test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    33738.00    33754.00  0.0%
                                                                                     test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test    13262.00    13268.00  0.0%
                                                                                        test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1139964.00  1140460.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   849507.00   849875.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1158379.00  1158859.00  0.0%
                                                                                   test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test    38724.00    38740.00  0.0%
                                                                                              test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test    15180.00    15186.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test    15484.00    15490.00  0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test   167391.00   167455.00  0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test   137448.00   137496.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2030254.00  2030766.00  0.0%
                                                                              test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test   302870.00   302934.00  0.0%
                                                                                    test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test   303126.00   303190.00  0.0%
                                                                                            test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test   241107.00   241155.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test   162974.00   163006.00  0.0%
                                                                                                 test-suite :: MultiSource/Applications/siod/siod.test   167168.00   167200.00  0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1048796.00  1048988.00  0.0%
                                                                               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   201623.00   201655.00  0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   501734.00   501798.00  0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test   580888.00   580952.00  0.0%
                                                                                           test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test   168319.00   168335.00  0.0%
                                                                        test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test   226022.00   226038.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test   118011.00   118015.00  0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test   550589.00   550605.00  0.0%
                                                                                             test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3072477.00  3072541.00  0.0%
                                                                                 test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2385563.00  2385579.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   389171.00   389155.00 -0.0%
                                                                                                   test-suite :: MultiSource/Applications/lua/lua.test   234764.00   234748.00 -0.0%
                                                                                        test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   227694.00   227678.00 -0.0%
                                                                    test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test   119819.00   119807.00 -0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test   117995.00   117983.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test   123610.00   123594.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    81414.00    81398.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782040.00   781880.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9597420.00  9595292.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9597420.00  9595292.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   911832.00   911608.00 -0.0%
                                                                                             test-suite :: MultiSource/Applications/oggenc/oggenc.test   192507.00   192459.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test   122843.00   122811.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   122292.00   122260.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   777363.00   777155.00 -0.0%
                                                                            test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test   123265.00   123205.00 -0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   315534.00   315358.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test   128163.00   128083.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test     6562.00     6555.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test    23428.00    23396.00 -0.1%
                                                                             test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    22749.00    22717.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39549.00    39485.00 -0.2%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39546.00    39482.00 -0.2%
                                                                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test    57214.00    57118.00 -0.2%
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413668.00   412804.00 -0.2%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1044047.00  1041487.00 -0.2%
                                                                                            test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    12414.00    12382.00 -0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test    31161.00    30969.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   224726.00   223254.00 -0.7%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    93512.00    92824.00 -0.7%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281151.00   278463.00 -1.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test     2820.00     2788.00 -1.1%
                                                                                            test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   156819.00   154739.00 -1.3%
                                                                 test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test    11560.00    11160.00 -3.5%
                                                                                          test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test     6734.00     6382.00 -5.2%
                                                                                                                                                       results     results0    diff

ASCI_Purple/SMG2000 - extra vector code
VPlanNativePath/outer-loop-vect - extra vectorization, better vector
code
AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code
Rodinia/hotspot - small variations
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - extra vector code, better vectorization
BenchmarkGame/n-body - better vector code.
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations
Misc/flops - extra vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations
Misc-C++/Large - better vector code
CFP2017rate/526.blender_r - extra vector code
Prolangs-C++/city - extra vector code
MiBench/consumer-lame - extra vector code
CoyoteBench/fftbench - extra vector code
Olden/power - better vector code
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - extra vector code
CINT2017rate/531.deepsjeng_r - extra vector code
CFP2006/447.dealII - small variations
DOE-ProxyApps-C/miniAMR - small variations
Prolangs-C/unix-smail - small variations
DOE-ProxyApps-C++/PENNANT - small variations
CINT2006/473.astar - small variations
CFP2006/453.povray - small variations
JM/lencod - extra vector code
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/CoMD - small variations
CFP2006/470.lbm - extra vector code
CFP2017speed/619.lbm_s
CFP2017rate/519.lbm_r - extra vector code
CINT2006/456.hmmer - extra code vectorized
TSVC/ControlFlow-dbl - extra vector code
CFP2017rate/510.parest_r - better vector code
LCALS/SubsetALambdaLoops - extra code vectorized
LCALS/SubsetARawLoops - extra code vectorized
CFP2006/444.namd - extra code vectorized
CFP2006/482.sphinx3 - better vector code
Applications/siod - better vector code
Benchmarks/7zip - better vector code
DOE-ProxyApps-C++/CLAMR - extra code vectorized
Applications/sqlite3 - extra code vectorized
Applications/ClamAV - smaller vector code
MallocBench/gs - small variations
MicroBenchmarks/ImageProcessing - small variations
TSVC/StatementReordering-flt - extra code vectorized
CINT2006/471.omnetpp - small variations
CINT2006/403.gcc - extra code vectorized
CINT2006/483.xalancbmk - extra code vectorized
JM/ldecod - small variations
Applications/lua - extra code vectorized
mafft/pairlocalalign - small variations
TSVC/NodeSplitting-flt - extra code vectorized
TSVC/Recurrences-flt - extra code vectorized
TSVC/InductionVariable-flt - extra code vectorized
FreeBench/pifft - small variations
CINT2006/464.h264ref - extra code vectorized
CINT2017speed/602.gcc_s
CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined
CINT2006/445.gobmk - small variations
Applications/oggenc - small variations
TSVC/LoopRestructuring-flt - extra code vectorized
TSVC/CrossingThresholds-flt - extra code vectorized
CFP2017rate/508.namd_r - small variations
TSVC/ControlFlow-flt - extra code vectorized
mediabench/g721 - small variations
Prolangs-C/compiler - small variations
FreeBench/fourinarow - better vector code
MiBench/telecomm-gsm - small variation in vector code
mediabench/gsm - same
Prolangs-C/bison - small variations
Adobe-C++/loop_unroll - extra code vectorized
Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vetor
code
McCat/18-imp - variations in vector code
Prolangs-C/gnugo - variations in vector code
MallocBench/espresso - extra code vectorized
DOE-ProxyApps-C++/miniFE - small variations in vector code
Prolangs-C/TimberWolfMC - extra code vectorized, small changes in
previously vectorized code.
Olden/tsp - small changes in vector code
CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores
MiBench/security-blowfish - extra code vectorized
McCat/08-main - better vector code.

Metric: size..text, RISCV, sifive-p670

Program                                                                                                                                                size..text
                                                                                                                                                       results    results0   diff
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   63580.00   64020.00  0.7%
                                                                   test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   21388.00   21406.00  0.1%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296992.00  297088.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  968112.00  968208.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test   45160.00   45164.00  0.0%
                                                                         test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2635902.00 2635854.00 -0.0%
                                                                        test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2635902.00 2635854.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7568730.00 7568578.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7568730.00 7568578.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   49764.00   49762.00 -0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  449132.00  449108.00 -0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test  695932.00  695892.00 -0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  508820.00  508788.00 -0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  508820.00  508788.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9594152.00 9593336.00 -0.0%
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  166522.00  166490.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  722252.00  722092.00 -0.0%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   27554.00   27546.00 -0.0%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test   10900.00   10896.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test   46754.00   46732.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  631570.00  631226.00 -0.1%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  850698.00  850218.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   24816.00   24800.00 -0.1%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   24814.00   24798.00 -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1599946.00 1598394.00 -0.1%
                                                                                                   test-suite :: MultiSource/Applications/hbd/hbd.test   27236.00   27204.00 -0.1%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  293848.00  293480.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test   20160.00   20048.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  182088.00  181040.00 -0.6%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4788.00    4748.00 -0.8%

DOE-ProxyApps-C++/miniFE - extra vector code
MiBench/automotive-susan - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/511.povray_r - slightly better vector code
TSVC/StatementReordering-dbl - small variations
CINT2017rate/523.xalancbmk_r
CINT2017speed/623.xalancbmk_s - extra vector code
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
TSVC/CrossingThresholds-flt - small variations
Applications/sqlite3 - extra vector code
JM/lencod - extra vector code, small variations
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - small variations
CFP2017rate/526.blender_r - extra vector code, small variations
DOE-ProxyApps-C/miniGMG - small variations
Vectorizer/VPlanNativePath/outer-loop-vect - small variations
TSVC/CrossingThresholds-dbl - small variations
Benchmarks/tramp3d-v4 - small variations
Benchmarks/7zip - extra vector code
MiBench/telecomm-gsm - small variations
mediabench/gsm/toast - small variations
CFP2017rate/510.parest_r - extra vector code
Applications/hbd - extra vector code
JM/ldecod - better vector code
Prolangs-C/compiler - extra vector code
MallocBench/espresso - extra vector code
mediabench/g721/g721encode - extra vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/107461
2024-09-17 06:57:47 -04:00
Alexey Bataev
18ef467d73 [SLP]Fix PR108709: postpone buildvector clustered nodes, if required
The "clustered" nodes for buildvector nodes must be postponed in
accordance with the global flag, otherwise it may cause crash because of
the dependency between phi nodes.
2024-09-16 09:53:46 -07:00
Alexey Bataev
f564a48f0e [SLP]Fix PR108700: correctly identify id of the operand node
If the operand node for truncs is not created during construction, but
one of the previous ones is reused instead, need to correctly identify
its index, to correctly emit the code.

Fixes https://github.com/llvm/llvm-project/issues/108700
2024-09-16 09:44:47 -07:00
Alexey Bataev
1e3536ef31 [SLP]Fix PR108620: Need to check, if the reduced value was transformed
Before trying to include the scalar into the list of
ExternallyUsedValues, need to check, if it was transformed in previous
iteration and use the transformed value, not the original one, to avoid
compiler crash when building external uses.

Fixes https://github.com/llvm/llvm-project/issues/108620
2024-09-13 15:43:06 -07:00
Alexey Bataev
c13bf6d4a8 [SLP]Return proper value for phi vectorized node
Should not return the original phi vector instruction, need to return
actual vectorized value as a result.
2024-09-13 11:30:29 -07:00