2025 Commits

Author SHA1 Message Date
Kazu Hirata
2f8238f849
[llvm] Migrate away from PointerUnion::{is,get} (NFC) (#119679)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2024-12-12 07:54:48 -08:00
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00
Han-Kuan Chen
51a0c1bf25
[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#118949)
To reduce repeated code, TreeEntry::setOperandsInOrder will be replaced
by VLOperands.
Arg_size will be provided to make sure other operands will not be
reorderd when VL[0] is IntrinsicInst (because APO is a boolean value).
In addition, BoUpSLP::reorderInputsAccordingToOpcode will also be
removed since it is simple.
2024-12-11 10:09:23 +08:00
Alexey Bataev
a42aa8f265 [SLP]Fix adjusting of the mask for the fully matched nodes.
When checking for the poison elements in the matches node, need to
consider the register number, when clearing the corresponding mask
element.

Fixes #119393
2024-12-10 09:47:16 -08:00
Han-Kuan Chen
da421f55a7
[SLP] NFC. Make InstructionsState more constant. (#118609)
Add getMainOp and getAltOp.
Use `InstructionsState &` instead of `const InstructionsState &`.
Use `!S.isAltShuffle()` instead of `S.MainOp == S.AltOp`.
2024-12-10 23:28:14 +08:00
Han-Kuan Chen
b97c447dac
[SLP] NFC. Add assert for shouldBroadcast and canBeVectorized. (#119327) 2024-12-10 19:25:19 +08:00
Alexey Bataev
376dad72ab [SLP]Move resulting vector before inert point, if the late generated buildvector fully matched
If the perfect diamond match was detected for the postponed buildvectors
and the vector for the previous node comes after the current node, need
to move the vector register before the current inserting point to
prevent compiler crash.

Fixes #119002
2024-12-06 13:54:48 -08:00
Nikita Popov
6307e4b31e Revert "[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#113880)"
This reverts commit 94fbe7e3ae7c0ce4e9a7d801e7700457a36f731d.

Causes a crash when linking mafft in ReleaseLTO-g config.
2024-12-06 14:27:03 +01:00
Anutosh Bhat
89e919fb0d
Fix warnings while compiling SLPVectorizer.cpp (#118051)
Towards #118048

I was building llvm (clang and lld) for webassembly and came across
these warnings. Not sure if they are seen in our builds too.
```
/Users/anutosh491/work/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6924:67: warning: comparison of integers of different signs: 'typename iterator_traits<user_iterator_impl<User>>::difference_type' (aka 'long') and 'unsigned int' [-Wsign-compare]
 6924 |               if (std::distance(LI->user_begin(), LI->user_end()) !=
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
 6925 |                       LI->getNumUses())
      |                       ~~~~~~~~~~~~~~~~
[ 79%] Building CXX object lib/Transforms/Instrumentation/CMakeFiles/LLVMInstrumentation.dir/PGOInstrumentation.cpp.o
/Users/anutosh491/work/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:9754:43: warning: comparison of integers of different signs: 'typename iterator_traits<Value *const *>::difference_type' (aka 'long') and 'unsigned int' [-Wsign-compare]
 9754 |               count(Slice, Slice.front()) ==
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
 9755 |                   (isa<UndefValue>(Slice.front()) ? VF - 1 : 1)) {
 ```


This PR tries to address those warnings.
2024-12-06 08:11:39 -05:00
Han-Kuan Chen
94fbe7e3ae
[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#113880)
To reduce repeated code, TreeEntry::setOperandsInOrder will be replaced
by VLOperands.
Arg_size will be provided to make sure other operands will not be
reorderd when VL[0] is IntrinsicInst (because APO is a boolean value).
In addition, BoUpSLP::reorderInputsAccordingToOpcode will also be
removed since it is simple.
2024-12-06 12:03:23 +08:00
Han-Kuan Chen
f71ea4bc1b
[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260) 2024-12-03 09:04:04 +08:00
Jonas Paulsson
0ad6be1927
[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491)
As vector element loads are free on SystemZ, this patch improves the cost
computation in getGatherCost() to reflect this.

getScalarizationOverhead() gets an optional parameter which can hold the actual
Values so that they in turn can be passed (by BasicTTIImpl) to
getVectorInstrCost().

SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and
typically return a 0 cost for it, with some exceptions.
2024-11-29 21:19:45 +01:00
Alexey Bataev
f4974e0931 [SLP] Add a check for poison value in AShrChecker
Need to check if the value in AShrChecker is a poison before casting it
to instruction to avoid compiler crash

Fixes #118030
2024-11-29 06:51:19 -08:00
Han-Kuan Chen
ead3a2f598
[SLP][REVEC] getScalarizationOverhead should not be used when ScalarTy is FixedVectorType. (#117536) 2024-11-26 22:05:54 +08:00
Alexey Bataev
76f0ff8210 [SLP]Add an extra check to avoid infinite vectorization attempts
Added extra check for the cost of the buildvector if the -slp-threshold
option is used. Prevents infinite vectorization attempts.
2024-11-25 14:27:44 -08:00
Alexey Bataev
f953b5eb72 [SLP]Relax assertion about subvectors mask size
SubVectorsMask might be less than CommonMask, if the vectors with larger
number of elements are permuted or reused elements are used. Need to
consider this when estimation/building the vector to avoid compiler
crash

Fixes #117518
2024-11-25 08:31:42 -08:00
Alexey Bataev
57bbdbd7ae [SLP]Relax assertion in mask combine for non-power-of-2 number of elements
The nodes may contain non-power-of-2 number of elements. Need to relax
the assertion to avoid possible compiler crash

Fixes #117517
2024-11-25 07:58:19 -08:00
Alexey Bataev
7523086a05
[SLP]Use getExtendedReduction cost and fix reduction cost calculations
Patch uses getExtendedReduction for reductions of ext-based nodes + adds
cost estimation for ctpop-kind reductions into basic implementation and
RISCV-V specific vcpop cost estimation.

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/117350
2024-11-22 16:12:53 -05:00
Alexey Bataev
b8703369da
[SLP] Match poison as instruction with the same opcode
Patch allows to vector scalar instruction + poison values as if poisons
are instructions with the same opcode. It allows better vectorization of
the repeated values, reduces number of insertelement instructions and
serves as a base ground for copyable elements vectorization

AVX512, -O3 + LTO

JM/ldecod - better vector code
Applications/oggenc - better vectorization
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - better vector code
CFP2017rate/526.blender_r - better vector code
CFP2006/447.dealII - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/510.parest_r - better vectorization
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
Benchmarks/tramp3d-v4 - small variations
CFP2006/453.povray - extra vector code
JM/lencod - better vector code
CFP2017rate/511.povray_r - extra vector code
MemFunctions/MemFunctions - extra vector code
LoopVectorization/LoopVectorizationBenchmarks - extra vector code
XRay/FDRMode - extra vector code
XRay/ReturnReference - extra vector code
LCALS/SubsetCLambdaLoops - extra vector code
LCALS/SubsetCRawLoops - extra vector code
LCALS/SubsetARawLoops - extra vector code
LCALS/SubsetALambdaLoops - extra vector code
DOE-ProxyApps-C++/miniFE - extra vector code
LoopVectorization/LoopInterleavingBenchmarks - extra vector code
LCALS/SubsetBLambdaLoops - extra vector code
MicroBenchmarks/harris - extra vector code
ImageProcessing/Dither - extra vector code
MicroBenchmarks/SLPVectorization - extra vector code
ImageProcessing/Blur - extra vector code
ImageProcessing/Dilate - extra vector code
Builtins/Int128 - extra vector code
ImageProcessing/Interpolation - extra vector code
ImageProcessing/BilateralFiltering - extra vector code
ImageProcessing/AnisotropicDiffusion - extra vector code
MicroBenchmarks/LoopInterchange - extra code vectorized
LCALS/SubsetBRawLoops - extra code vectorized
CINT2006/464.h264ref - extra vectorization with wider vectors
CFP2017rate/508.namd_r - small variations, extra phis vectorized
CFP2006/444.namd - 2 2 x phi replaced by 4 x phi
DOE-ProxyApps-C/SimpleMOC - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - the function better vectorized and inlined
Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code
FreeBench/fourinarow - better vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115946
2024-11-22 16:10:17 -05:00
Han-Kuan Chen
39913ae095
[SLP][REVEC] Make reorderTopToBottom support ShuffleVectorInst. (#117310)
We don't want reorderTopToBottom to reorder ShuffleVectorInst (because
ShuffleVectorInst currently supports only a limited set of patterns).
Either we make ShuffleVectorInst support more patterns, or we let
ReorderIndices reorder the result of the vectorization of
ShuffleVectorInst. We choose the latter solution.
2024-11-23 01:20:57 +08:00
Alexey Bataev
14bdcefbd8 [SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))
Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector
extensions + reduction add, but later instcombiner transforms it into
ext(ctcpop(bitcast <n x i1> to int n)). Patch adds direct support for
this in SLP vectorizer, which enables better cost estimation.

AVX512, -O3+LTO

CINT2006/445.gobmk - extra vector code
Prolangs-C/bison - extra vector code
Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24
x reduction

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/116875
2024-11-22 06:50:25 -08:00
Han-Kuan Chen
68aa6ac58c
[SLP] NFC. Remove redundant computation in getReorderingData. (#117295) 2024-11-22 18:54:41 +08:00
Han-Kuan Chen
55e9afab6e
[SLP] NFC. Remove the useless check for alternate instruction. (#117293)
Only BinaryOperator and CastInst support alternate instruction. It
always returns false for TreeEntry::isAltShuffle if an instruction is
ExtractElementInst, ExtractValueInst, LoadInst, StoreInst or
InsertElementInst.
2024-11-22 18:53:40 +08:00
Han-Kuan Chen
6b22e39f26
[SLP] NFC. Remove the useless check for alternate instruction. (#117116)
Only BinaryOperator and CastInst support alternate instruction. It
always returns false for TreeEntry::isAltShuffle if an instruction is
ExtractElementInst, ExtractValueInst, LoadInst, StoreInst or
InsertElementInst.
2024-11-22 10:39:41 +08:00
Alexey Bataev
68ce528def [SLP]Fix vector factor calculation for adjusted mask
Need to choose max vector factor as max(Mask.size(), prev-val-size).

Fixes build erros in https://lab.llvm.org/buildbot/#/builders/95/builds/6504
2024-11-21 14:30:20 -08:00
Alexey Bataev
07507cb591 [SLP]Fix shuffling of entries of the different sizes
Need to choose the size of vector factor for mask based on the entries
vector factors, not mask size, to generate correct code.

Fixes #117170
2024-11-21 13:08:27 -08:00
Alexey Bataev
b62557aaeb Revert "[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))"
This reverts commit 0298c5921d3b9fbeb5fefc2555321ea82ade6090 to fix
a buildbot crash reported by https://lab.llvm.org/buildbot/#/builders/113/builds/4079.
2024-11-21 12:52:55 -08:00
Finn Plummer
8663b8777e
[NFC][VectorUtils][TargetTransformInfo] Add isVectorIntrinsicWithOverloadTypeAtArg api (#114849)
This changes allows target intrinsics to specify and overwrite overloaded types.

- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI

This change will let us add scalarization for `asdouble`:  #114847
2024-11-21 11:04:25 -08:00
Alexey Bataev
0298c5921d
[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))
Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector
extensions + reduction add, but later instcombiner transforms it into
ext(ctcpop(bitcast <n x i1> to int n)). Patch adds direct support for
this in SLP vectorizer, which enables better cost estimation.

AVX512, -O3+LTO

CINT2006/445.gobmk - extra vector code
Prolangs-C/bison - extra vector code
Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24
x reduction

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/116875
2024-11-21 13:21:00 -05:00
Han-Kuan Chen
75b8f98ef6
[SLP] NFC. Change the comment to match the code execution. (#116022)
Make code execute like the comment will modify many tests and affect the
performance. As a result, we change the comment instead of the code.
2024-11-21 12:42:20 +08:00
Han-Kuan Chen
a62c5497c9
[SLP][REVEC] The vectorized result for ShuffleVector may not be ShuffleVectorInst. (#116940) 2024-11-20 23:59:23 +08:00
Alexey Bataev
b17f607703 [SLP][NFC]Remove unnecessary std::optional around Factor value 2024-11-20 05:54:15 -08:00
Alexey Bataev
79682c4d57 [SLP]Check if the buildvector root is not a part of the graph before deletion
If the buildvector root has no uses, it might be still needed as a part
of the graph, so need to check that it is not a part of the graph before
deletion.

Fixes #116852
2024-11-19 11:31:40 -08:00
Alexey Bataev
ad9c0b369e [SLP]Check if the gathered loads form full vector before attempting build it
Need to check that the number of gathered loads in the slice forms the
build vector to avoid compiler crash.

Fixes #116691
2024-11-18 14:09:31 -08:00
Alexey Bataev
f6e1d64458
[SLP]Enable interleaved stores support
Enables interaleaved stores, results in better estimation for segmented
stores for RISC-V

Reviewers: preames, topperc, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115354
2024-11-15 11:01:57 -05:00
Alexey Bataev
af3295bd3d
[SLP]Enable splat ordering for loads
Enables splat support for loads with lanes> 2 or number of operands> 2.

Allows better detect splats of loads and reduces number of shuffles in
some cases.

X86, AVX512, -O3+LTO

Metric: size..text
                                                                          results     results0    diff
               test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   154867.00   156723.00  1.2%
 test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12467735.00 12468023.00  0.0%

Better vectorization quality

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115173
2024-11-15 10:29:43 -05:00
Sushant Gokhale
9991ea28fc
[CostModel][AArch64] Make extractelement, with fmul user, free whenev… (#111479)
…er possible

In case of Neon, if there exists extractelement from lane != 0 such that
  1. extractelement does not necessitate a move from vector_reg -> GPR
  2. extractelement result feeds into fmul
3. Other operand of fmul is a scalar or extractelement from lane 0 or
lane equivalent to 0
then the extractelement can be merged with fmul in the backend and it
incurs no cost.

  e.g. 
  ```
define double @foo(<2 x double> %a) { 
    %1 = extractelement <2 x double> %a, i32 0 
    %2 = extractelement <2 x double> %a, i32 1
    %res = fmul double %1, %2    
    ret double %res
  }
```
  `%2` and `%res` can be merged in the backend to generate:
  `fmul    d0, d0, v0.d[1]`

The change was tested with SPEC FP(C/C++) on Neoverse-v2. 
**Compile time impact**: None
**Performance impact**: Observing 1.3-1.7% uplift on lbm benchmark with -flto depending upon the config.
2024-11-13 11:10:49 +05:30
Han-Kuan Chen
5a5502b9e1
[SLP] NFC. Use Value instead of template. (#115440) 2024-11-13 11:58:19 +08:00
Alexey Bataev
058ac837bc [SLP]Use generic createShuffle for buildvector
Use generic createShuffle function, which know how to adjust the vectors
correctly, to avoid compiler crash when trying to build a buildvector as
a shuffle

Fixes #115732
2024-11-11 10:49:39 -08:00
Han-Kuan Chen
3cdd86bb47
[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#115417) 2024-11-10 13:53:15 +08:00
Alexey Bataev
26a9f3f590 [SLP][NFC]Cleanup getSameOpcode, return InstructionsState::invalid() for non-valid inputs
Just a cleanup and related changes
2024-11-08 14:00:32 -08:00
Kazu Hirata
bc7e5c2016
[SLP] Avoid repeated hash lookups (NFC) (#115428) 2024-11-08 07:35:06 -08:00
Alexey Bataev
77bec78878 [SLP]Do not look for last instruction in schedule block for buildvectors
If looking for the insertion point for the node and the node is
a buildvector node, the compiler should not use scheduling info for such
nodes, they may contain only partial info, which is not fully correct
and may cause compiler crash.

Fixes #114082
2024-11-08 06:55:29 -08:00
Alexey Bataev
62db1c8a07 [SLP]Better decision making on whether to try stores packs for vectorization
Since the stores are sorted by distance, comparing the indices in the
original array and early exit, if the index is less than the index of
the last store, not always the best strategy. Better to remove such
stores explicitly to try better to check for the vectorization
opportunity.

Fixes #115008
2024-11-07 14:23:15 -08:00
Alexey Bataev
b7a8f5f4c9 [SLP][NFC]Exit early from attempt-to-reorder, if it is useless
Adds early exits, which just save compile time. It can exit earl, if the
total number of scalars is 2, or all scalars are constant, or the opcode
is the same and not alternate. In this case reordering will not happen
and compiler can exit early to save compile time
2024-11-07 11:07:49 -08:00
Kazu Hirata
22b4b1ab10 Revert "[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)"
This reverts commit f58757b8dc167809b69ec00f9b5ab59281df0902.

Failing buildbots:
https://lab.llvm.org/buildbot/#/builders/174/builds/8058
https://lab.llvm.org/buildbot/#/builders/127/builds/1357
2024-11-07 10:43:11 -08:00
Han-Kuan Chen
f58757b8dc
[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946) 2024-11-08 00:52:59 +08:00
Han-Kuan Chen
c6091cdbed
[SLP][REVEC] Make shufflevector can be vectorized with ReorderIndices and ReuseShuffleIndices. (#114965) 2024-11-07 11:04:34 +08:00
Alexey Bataev
9f3b6adb15 [SLP][NFC]Exit early if the graph is empty, NFC
No need to check anything if the graph is empty, just exit early.
2024-11-06 08:33:14 -08:00
Alexey Bataev
76422385c3
[SLP]Support reordered buildvector nodes for better clustering
Patch adds reordering of the buildvector nodes for better clustering of
the compatible operations and future vectorization. Includes basic cost
estimation and if the transformation is not profitable - reverts it.

AVX512, -O3+LTO
Metric: size..text

Program                                                                          size..text
                                                                                       results     results0    diff
                        test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test    74565.00    75701.00  1.5%
                test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test    75773.00    76397.00  0.8%
               test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test    75773.00    76397.00  0.8%
               test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2014462.00  2024494.00  0.5%
                         test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   395219.00   396979.00  0.4%
                         test-suite :: MultiSource/Applications/JM/lencod/lencod.test   857795.00   859667.00  0.2%
                    test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   800472.00   802440.00  0.2%
                       test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   590699.00   591403.00  0.1%
        test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   203006.00   203102.00  0.0%
            test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test    42408.00    42424.00  0.0%
            test-suite ::  External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12451575.00  12451927.00  0.0%
            test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1396480.00  1396448.00 -0.0%
             test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1396480.00  1396448.00 -0.0%
                        test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1047708.00  1047580.00 -0.0%
        test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111344.00   111328.00 -0.0%
                test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1087660.00  1087500.00 -0.0%
       test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   280664.00   280616.00 -0.0%
                          test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   502646.00   502006.00 -0.1%
                      test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1033135.00  1031567.00 -0.2%
        test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2070917.00  2065845.00 -0.2%
       test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2070917.00  2065845.00 -0.2%
                        test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    33893.00    33797.00 -0.3%
          test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39677.00    39549.00 -0.3%
                 test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39674.00    39546.00 -0.3%
test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test    11560.00    11512.00 -0.4%
                 test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   653867.00   649275.00 -0.7%
                  test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   653867.00   649275.00 -0.7%

CINT2006/401.bzip2 - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - function
_ZN9FastBoard25get_pattern3_augment_specEiib not inlined anymore, better
vectorization
CFP2017rate/510.parest_r - better vectorization
JM/ldecod - better vectorization
JM/lencod - same
CINT2006/464.h264ref - extra code vectorized
CFP2006/447.dealII - extra vector code
MiBench/consumer-lame - vectorized 2 loops previously scalar
DOE-ProxyApps-C/miniGMG - small changes
Benchmarks/7zip - extra code vectorized, better vectorization
CFP2017rate/526.blender_r - extra vectorization
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorization
MiBench/consumer-jpeg - extra vectorization
CINT2006/400.perlbench - extra vectorization
Prolangs-C/TimberWolfMC - small variations
Applications/sqlite3 - extra function vectorized and inlined
Benchmarks/tramp3d-v4 - extra code vectorized
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - extra code vectorized, function digcpy gets
vectorized and inlined
CINT2006/473.astar - extra code vectorized
MiBench/telecomm-gsm - extra code vectorized, better vector code
mediabench/gsm - same
MiBench/security-blowfish - extra code vectorized
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - sub4x4_dct function vectorized and gets
inlined

RISCV-V, SiFive-p670, O3+LTO

CFP2017rate/510.parest_r - extra vectorization
CFP2017rate/526.blender_r - extra vectorization
MiBench/consumer-lame - extra vectorized code

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/114284
2024-11-06 10:51:15 -05:00