1477 Commits

Author SHA1 Message Date
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Alexey Bataev
019aee8327 [SLP]Improve costs in computeExtractCost() to avoid crash after D158449.
Need to consider the length of the original vector for extractelements,
not the length, matched number of the scalars. It fixes 2 issues: 1)
improves cost estimation; 2) Fixes crashes after D158449.
2023-09-29 07:48:02 -07:00
Hans Wennborg
06f3b0ed43 Revert "[SLP]Improve costs in computeExtractCost() to avoid crash after D158449."
This caused asserts:

  Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).",
  file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096

see comment on 59a67ea35d

> Need to consider the length of the original vector for extractelements,
> not the length, matched number of the scalars. It fixes 2 issues: 1)
> improves cost estimation; 2) Fixes crashes after D158449.

This reverts commit 59a67ea35d608480257fc64ec3e5106ef50de740.
2023-09-29 10:42:19 +02:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
Alexey Bataev
59a67ea35d [SLP]Improve costs in computeExtractCost() to avoid crash after D158449.
Need to consider the length of the original vector for extractelements,
not the length, matched number of the scalars. It fixes 2 issues: 1)
improves cost estimation; 2) Fixes crashes after D158449.
2023-09-28 09:36:08 -07:00
Alexey Bataev
9eeb0293e2 [SLP]Cleanup MultiNodeScalars when tree deleted.
Need to clear MultiNodeScalars map to avoid compiler crash when tree is
deleted.
2023-09-27 07:48:53 -07:00
Alexey Bataev
ea7f43ec14 [SLP]Do not gather node, if the instruction, that does not require
scheduling, is previously vectorized.

If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
2023-09-26 11:57:35 -07:00
alexfh
5d86176f48
Revert "[SLP]Do not gather node, if the instruction, that does not require" (#67386)
This reverts commit 77053421228edd12a3ba73d4eebd970fcdd3b2c0, which
introduces a
clang crash (test case: https://gcc.godbolt.org/z/zn5n4KWPY).
2023-09-26 02:45:11 +02:00
Kazu Hirata
e7497570d8 [Vectorize] Use range-based for loops (NFC) 2023-09-22 17:43:06 -07:00
Alexey Bataev
7ff83ed6cd [SLP]Do not try to reorder possible strided nodes.
Reordering of possible strided nodes in bottom-to-top order requires
top-to-bottom reordering of the operands of such nodes, which is not
supported. Need to disable reordering of strided operands to avoid
compiler crashes.
2023-09-22 07:55:43 -07:00
David Spickett
8f548610a6 Revert "[SLP]Use source vector type as the original vector type instead of"
This reverts commit 9a99944df068b29b905cd8ba9a2132cc6382b6fb.

Due to test suite failures on all our SVE buildbots e.g.:
https://lab.llvm.org/buildbot/#/builders/184/builds/7375

clang: ../llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3565:
InstructionCost llvm::AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind,
VectorType *, ArrayRef<int>, TTI::TargetCostKind, int, VectorType *,
ArrayRef<const Value *>): Assertion `Mask.size() == TpNumElts && "Expected Mask and Tp size to match!"' failed.
2023-09-22 07:52:16 +00:00
Alexey Bataev
9a99944df0 [SLP]Use source vector type as the original vector type instead of
artificial for better cost estimation.

Need to use original source vector type, not the one artificially
constructed, based on the number of vectorized scalars. It affect the
cost significantly.
2023-09-21 11:34:02 -07:00
Alexey Bataev
3dc28e6c6a [SLp]Fix a crash because of wrong deps between vectorized nodes.
Need to change the order of the nodes vectorization to avoid too early
insertion of the first node.
2023-09-21 10:19:11 -07:00
Alexey Bataev
12fda304cc [SLP][NFC]Unify add() member function in CostEstimator, NFC.
Make add() function smart enough to understand that the shuffle of
a single entry is requested, if it sees that the second node is the same
as the first.
2023-09-21 07:59:37 -07:00
Alexey Bataev
c601928cb9 [SLP][NFC]Improve compile time by storing all nodes for the given
scalar.

No need to scan the whole graph when trying to find matching node for
the scalar, vectorized in several nodes, better to store corresponding
nodes along and scan just this small list.
2023-09-21 07:24:31 -07:00
Alexey Bataev
7705342122 [SLP]Do not gather node, if the instruction, that does not require
scheduling, is previously vectorized.

If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
2023-09-20 12:52:37 -07:00
Alexey Bataev
ebed4692f8 [SLP]Fix a crash when trying to find operand with re-vectorized main
instruction.

Need to check if the operand scalars are vectorized in the a different
vector node, if the main instruction is already gets vectorized in other
vector node.
2023-09-20 09:54:15 -07:00
Alexey Bataev
7db87a66b0 [SLP]Fix PR66795: Check correct deps for vectorized inst with multiple
vectorized node uses.

If the instruction is vectorized in many different vector nodes, it may
break the dependency analysis for gathered nodes with matched scalars.
Need to properly check the dependency between such gather nodes to avoid
cycle dependency.
2023-09-19 12:11:33 -07:00
Alexey Bataev
434aa2fe56 [SLP]Improve canreuseExtracts for reordering analysis.
Improve the analysis in canReuseExtracts for the reodering to better
reorder extracts for ExtractSubvector pattern.
2023-09-15 12:09:45 -07:00
Alexey Bataev
b9ad72ba05 [SLP]Fix PR66176: SLP incorrectly reorders select operands.
On the very first iteration for the reductions, when trying to build
reduction for boolean logic operations, no need to compare LHS/RHS with
the Reduction(VectorizedTree), need to compare with actual parameters of
the reduction operations.
2023-09-15 03:57:36 -07:00
Alexey Bataev
c15c1e5dd5 [SLP]Do not account non-instructions for external use.
If the non-instruction gets vectorized, no need to account its extract
cost, it won't be removed and replaced by extractelement instruction.
2023-09-14 12:40:33 -07:00
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Alexey Bataev
9a90457a76 [SLP][NFC]Use ArrayReffor operands directly instead of entry/operand number, NFC. 2023-09-11 11:16:13 -07:00
Jeremy Morse
6942c64e81 [NFC][RemoveDIs] Prefer iterator-insertion over instructions
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.

At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152537
2023-09-11 11:48:45 +01:00
Alexey Bataev
5bab59de44 [SLP]Try to vectorize scalars, being vectorized already, but does not need to be scheduled.
If the scalar does not need to be scheduled and it was vectorized
already in one of the vector nodes, we still can try to vectorize it in
another node. Just does not need account its cost in the scalar total
cost, as it will be handled in the main vectorized node.

Differential Revision: https://reviews.llvm.org/D159205
2023-09-08 13:34:12 -07:00
Alexey Bataev
30edf1c449
[SLP]Do not early exit if the number of unique elements is non-power-of-2. (#65476)
We still can try to vectorize the bundle of the instructions, even if
the
repeated number of instruction is non-power-of-2. In this case need to
adjust the cost (calculate the cost only for unique scalar instructions)
and cost of the extracts. Also, when scheduling the bundle need to
schedule only unique scalars to avoid compiler crash because of the
multiple dependencies. Can be safely applied only if all scalars's users
are also vectorized and do not require memory accesses (this one is
a temporarily requirement, can be relaxed later).

---------

Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
2023-09-08 10:00:46 -04:00
Alexey Bataev
8d933ea5ac [SLP][NFC]Use SmallDensetSet for lookup instead of ArrayRef, NFC. 2023-09-06 13:17:30 -07:00
Alexey Bataev
09b8bbd6e0 [SLP][NFC]Reorder indeces instead of real values, NFC.
May save some memory/compile time.
2023-09-05 08:48:52 -07:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Kazu Hirata
6da470d7f8 [llvm] Use range-based for loops (NFC) 2023-09-02 09:32:45 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Philip Reames
aada8f2e54 [slp] Tweak debug costing output to include VL
This makes it much easier to understand which vector length is being considered when the same set of nodes are evaluated at multiple vector lengths.
2023-08-30 09:13:19 -07:00
Alexey Bataev
66c623bfc6 [SLP][NFC]Use TreeEntry::getOprand instead of trying to rebuild it in getOperandInfo(), NFC. 2023-08-23 13:37:36 -07:00
Alexey Bataev
a9e6295548 [SLP][NFC]Use all_of/any_of instead of loops, NFC. 2023-08-22 08:21:36 -07:00
Alexey Bataev
b51195dece [SLP]Fix PR63854: Add proper sorting of pointers for masked stores.
If the masked gathers can be reordered, it may produce strided access
pattern and the reordering does not affect common reodering, better to
try to reorder masked gathers for better performance.

Differential Revision: https://reviews.llvm.org/D157009
2023-08-22 06:14:01 -07:00
Alexey Bataev
ca2eabdb52 [SLP][NFC]Improve code to meet coding standards, NFC. 2023-08-15 11:08:25 -07:00
Alexey Bataev
63c7815faf [SLP]Fix comparator for PHI nodes comparison.
Fixed comparator for PHI nodes sorting to meet the criteria for strict
weak ordering.
2023-08-14 14:05:39 -07:00
Alexey Bataev
4f0bd8f7ac [SLP]Fix strict weak ordering for Cmp instruction comparator.
Sorting algorithms require strict weak ordering for comparators, final
fix for cmp instructions comparator.
2023-08-14 09:37:46 -07:00
Alexey Bataev
2216507171 [SLP]Fix PR64568: Crash during horizontal reduction.
If the reduced values is constant-foldable and was folded to a constant
during previous transformations, need to excluded it from the list of
the reduced values-instructions as non-matchable.
2023-08-10 07:33:16 -07:00
Alexey Bataev
42b3925d42 [SLP][NFC]Fix formatting/warnings in tryToReduce(), NFC. 2023-08-10 06:42:50 -07:00
Valery N Dmitriev
f522be63bc [SLP][NFC] Make buildShuffleEntryMask routine a TreeEntry method.
The routine uses data stored at TreeEntry node for building a mask
so it is natural to make it a method for the type. That will simplify
its interface and reduces data transfer.
The method is added as buildAltOpShuffleMask.

Differential Revision: https://reviews.llvm.org/D157545
2023-08-09 13:43:03 -07:00
Alexey Bataev
c619222ea4 [SLP]Use common logic for cost estimation of the alternate vector nodes.
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for the alternate nodes with reuses, but also for the nodes
without reused scalars. It allows better to estimate the cost of the
node and emit better code.

Differential Revision: https://reviews.llvm.org/D157413
2023-08-09 11:50:39 -07:00
Alexey Bataev
d0e3a571e7 [SLP]Fix PR64519: Unexpected reordering of gathers.
The issue is actually related to ScatterVectorize nodes. If such node
gets reordered during bottom-to-top reordering, it may have associated
non-empty ReorderIndices. In this case, such nodes need to be handled
the same way as regular Vectorize nodes, not NeedToGather nodes. In this
case we need to reorder ReorderIndices array rather than scalars.
2023-08-08 08:07:25 -07:00
Alexey Bataev
e894c3d1a9 [SLP]Improve stores vectorization.
Use O(nlogn) instead of O(N2) (N <= 32) sorting approach and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15       120.11     2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67       207.42     1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43       235.01     1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49       207.25     0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46       306.23    -1.4%

Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69   1475.94    6.7%

Other changes are too small, cannot rely on them.

size..text
Program                                                                                                           size..text
                                                                                                                  results     results0    diff
                                               test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test      392.00     1439.00 267.1%
                                                     test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   394258.00   394818.00   0.1%
                                                     test-suite :: MultiSource/Applications/JM/lencod/lencod.test   846355.00   847075.00   0.1%
                                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782816.00   783360.00   0.1%
                                               test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779667.00   779923.00   0.0%
                                                   test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   224398.00   224446.00   0.0%
                                                        test-suite :: MultiSource/Applications/oggenc/oggenc.test   185019.00   185035.00   0.0%
                                         test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00   0.0%
                                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1051772.00  1051804.00   0.0%
                                                          test-suite :: MultiSource/Applications/SPASS/SPASS.test   529586.00   529602.00   0.0%
                                            test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1084684.00  1084716.00   0.0%
                                                  test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1014245.00  1014261.00   0.0%

                                          test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   223494.00   223478.00  -0.0%
                                             test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   660843.00   660795.00  -0.0%
                                              test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   660843.00   660795.00  -0.0%
                                                      test-suite :: MultiSource/Applications/ClamAV/clamscan.test   568824.00   568760.00  -0.0%

espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.

Differential Revision: https://reviews.llvm.org/D155246
2023-08-07 09:17:56 -07:00
Alexey Bataev
48bcaeb997 Revert "[SLP]Improve stores vectorization."
This reverts commit 58066edbb05d66e6a7512a675da778475da3bdfb reported in https://lab.llvm.org/buildbot/#/builders/252/builds/3389
2023-08-04 07:37:26 -07:00
Alexey Bataev
58066edbb0 [SLP]Improve stores vectorization.
Use O(nlogn) instead of O(N2) (N <= 32) sorting approach and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15       120.11     2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67       207.42     1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43       235.01     1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49       207.25     0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46       306.23    -1.4%

Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69   1475.94    6.7%

Other changes are too small, cannot rely on them.

size..text
Program                                                                                                           size..text
                                                                                                                  results     results0    diff
                                               test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test      392.00     1439.00 267.1%
                                                     test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   394258.00   394818.00   0.1%
                                                     test-suite :: MultiSource/Applications/JM/lencod/lencod.test   846355.00   847075.00   0.1%
                                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782816.00   783360.00   0.1%
                                               test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779667.00   779923.00   0.0%
                                                   test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   224398.00   224446.00   0.0%
                                                        test-suite :: MultiSource/Applications/oggenc/oggenc.test   185019.00   185035.00   0.0%
                                         test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00   0.0%
                                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1051772.00  1051804.00   0.0%
                                                          test-suite :: MultiSource/Applications/SPASS/SPASS.test   529586.00   529602.00   0.0%
                                            test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1084684.00  1084716.00   0.0%
                                                  test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1014245.00  1014261.00   0.0%

                                          test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   223494.00   223478.00  -0.0%
                                             test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   660843.00   660795.00  -0.0%
                                              test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   660843.00   660795.00  -0.0%
                                                      test-suite :: MultiSource/Applications/ClamAV/clamscan.test   568824.00   568760.00  -0.0%

espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.

Differential Revision: https://reviews.llvm.org/D155246
2023-08-04 06:47:16 -07:00
Alexey Bataev
2f6ca38d51 Revert "[SLP]Improve stores vectorization."
This reverts commit 58b0d7c34ddd9d2117009a8cd7bd5e34a8276082 to fix
crashes reported in https://lab.llvm.org/buildbot/#/builders/85/builds/18117.
2023-08-04 05:25:31 -07:00
Alexey Bataev
58b0d7c34d [SLP]Improve stores vectorization.
Use O(nlogn) instead of O(N2) (N <= 32) sorting approach and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15       120.11     2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67       207.42     1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43       235.01     1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49       207.25     0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46       306.23    -1.4%

Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69   1475.94    6.7%

Other changes are too small, cannot rely on them.

size..text
Program                                                                                                           size..text
                                                                                                                  results     results0    diff
                                               test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test      392.00     1439.00 267.1%
                                                     test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   394258.00   394818.00   0.1%
                                                     test-suite :: MultiSource/Applications/JM/lencod/lencod.test   846355.00   847075.00   0.1%
                                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782816.00   783360.00   0.1%
                                               test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779667.00   779923.00   0.0%
                                                   test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   224398.00   224446.00   0.0%
                                                        test-suite :: MultiSource/Applications/oggenc/oggenc.test   185019.00   185035.00   0.0%
                                         test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00   0.0%
                                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1051772.00  1051804.00   0.0%
                                                          test-suite :: MultiSource/Applications/SPASS/SPASS.test   529586.00   529602.00   0.0%
                                            test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1084684.00  1084716.00   0.0%
                                                  test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1014245.00  1014261.00   0.0%

                                          test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   223494.00   223478.00  -0.0%
                                             test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   660843.00   660795.00  -0.0%
                                              test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   660843.00   660795.00  -0.0%
                                                      test-suite :: MultiSource/Applications/ClamAV/clamscan.test   568824.00   568760.00  -0.0%

espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.

Differential Revision: https://reviews.llvm.org/D155246
2023-08-03 16:14:21 -07:00