1654 Commits

Author SHA1 Message Date
Alexey Bataev
9c0a0659d4 [SLP]Fix a crash for non-profitable non-schedulable single buildvector node tree, if the threshold allows its vectorization. 2024-03-22 07:44:23 -07:00
Alexey Bataev
3942bd2fb5 [SLP]Fix a crash if the argument of call was affected by minbitwidt
analysis.

Need to support proper type conversion for function arguments to avoid
compiler crash.
2024-03-21 17:06:48 -07:00
Alexey Bataev
8d7a6e2fd8 [SLP]Fix a crash for gather node with instructions from different bbs,
if cost threshold is very low.
2024-03-21 08:03:06 -07:00
Alexey Bataev
34f0a8aaba [SLP]Fix comparison in bitwidth check.
Projected bitwidth should be less than the original, not greater.
2024-03-21 04:24:34 -07:00
Alexey Bataev
04f7cd7f45 [SLP][NFC]Make findBestRootPair() member function constant. 2024-03-20 08:33:47 -07:00
Alexey Bataev
6c1d4454ad
[SLP]Improve minbitwidth analysis for shifts.
Adds improved bitwidth analysis for shl/ashr/lshr instructions. The
analysis is based on similar version in InstCombiner.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84356
2024-03-20 09:07:26 -04:00
Alexey Bataev
81d9ed605b [SLP]Do extra analysis int minbitwidth if some checks return false.
The instruction itself can be considered good for minbitwidth casting,
even if one of the operand checks returns false.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84363
2024-03-20 05:48:55 -07:00
Alexey Bataev
3a90cb4c18 Revert "[SLP]Do extra analysis int minbitwidth if some checks return false."
This reverts commit da118c93b40f74f6770cf8550903721555d3c97b to fix
crashes reported in https://github.com/llvm/llvm-project/pull/84363.
2024-03-20 05:00:05 -07:00
Nikita Popov
27df1b23e0
[SLPVectorizer] Use TargetFolder (#85800)
Use IRBuilder with TargetFolder in SLPVectorizer to avoid the custom
constant folding code.

This fixes the remaining part of
https://github.com/llvm/llvm-project/issues/61240.
2024-03-20 09:18:45 +01:00
Alexey Bataev
da118c93b4 [SLP]Do extra analysis int minbitwidth if some checks return false.
The instruction itself can be considered good for minbitwidth casting,
even if one of the operand checks returns false.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84363
2024-03-19 12:24:18 -07:00
Alexey Bataev
31eaf86a1e [SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Original Pull Request: https://github.com/llvm/llvm-project/pull/84334

The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch, just has some compile time
improvements + fixes for xxhash unittest, discovered earlier in the
previous version of the patch.

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/84536
2024-03-19 08:19:45 -07:00
Nikita Popov
94c6ce1de9 [SLPVectorizer] Use IRBuilderBase where possible (NFC)
Instead of hardcoding a specific IRBuilder type, use the base
class.
2024-03-19 14:27:48 +01:00
Alexey Bataev
9a42bdc0ae [SLP][NFC]Fix signedness to avoid comparison warning. 2024-03-15 09:56:40 -07:00
Philip Reames
0674ed753a
[SLP] Compute a shuffle mask for getGatherCost (#85330)
This is the second of a series of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-03-15 08:32:46 -07:00
Philip Reames
45e41f9686
[SLP] Compute a shuffle mask for SK_InsertSubvector (#85408)
This is the third of a series of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.

After this change, there is one SK_InsertSubvector case left. I excluded
it from this patch just because I thought it worthy of individual
attention and review.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-03-15 08:32:18 -07:00
Philip Reames
f337525ee8
[SLP] Compute a shuffle mask for SK_Broadcast shuffle (#85327)
This is the first of a couple of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-03-15 07:41:26 -07:00
Alexey Bataev
3789870758 Revert "[SLP]Improve minbitwidth analysis."
This reverts commit 7f2167868d8c1cedd3915883412b9c787a2f01db to fix
issues reported in https://github.com/llvm/llvm-project/pull/84536.
2024-03-15 03:59:48 -07:00
Alexey Bataev
dbbe2fe2a2 Revert "[SLP]Do extra analysis int minbitwidth if some checks return false."
This reverts commit e4b772444c8176abe30d364e4a946ee6c8ae8de4 to fixx the
issues reported in https://github.com/llvm/llvm-project/pull/84536.
2024-03-15 03:58:34 -07:00
Alexey Bataev
7567f5ba78 Revert "[SLP]Do extra analysis int minbitwidth if some checks return false."
This reverts commit ea429e19f56005bf89e717c14efdf49ec055b183 to fix
issues reported in https://github.com/llvm/llvm-project/pull/84536#issuecomment-1999295445.
2024-03-15 03:52:58 -07:00
Artem Tyurin
141145232f
[IRBuilder] Fold binary intrinsics (#80743)
Fixes https://github.com/llvm/llvm-project/issues/61240.
2024-03-15 09:58:25 +01:00
Alexey Bataev
e4b772444c [SLP]Do extra analysis int minbitwidth if some checks return false.
The instruction itself can be considered good for minbitwidth casting,
even if one of the operand checks returns false.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84363
2024-03-14 16:41:04 -07:00
Alexey Bataev
ea41ac1132 [SLP][NFC]Fix a warning for comparison of integers of different signs. 2024-03-14 16:06:08 -07:00
Alexey Bataev
5b303a98a8 Revert "[SLP]Do extra analysis int minbitwidth if some checks return false."
This reverts commit ea429e19f56005bf89e717c14efdf49ec055b183 to fix
issues revealed in
https://lab.llvm.org/buildbot/#/builders/186/builds/15299 and https://lab.llvm.org/buildbot/#/builders/238/builds/8426.
2024-03-14 15:48:26 -07:00
Alexey Bataev
ea429e19f5
[SLP]Do extra analysis int minbitwidth if some checks return false.
The instruction itself can be considered good for minbitwidth casting,
even if one of the operand checks returns false.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84363
2024-03-14 16:16:18 -04:00
Paschalis Mpeis
f795d1a8b1
[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem (#82488)
getArithmeticInstrCost is used by both LoopVectorizer and SLPVectorizer
to compute the cost of frem, which becomes a call cost on AArch64 when
TLI has a vector library function.

Add tests that do SLP vectorization for code that contains 2x double and
4x float frem instructions.
2024-03-14 17:20:29 +00:00
Alexey Bataev
7f2167868d [SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Original Pull Request: https://github.com/llvm/llvm-project/pull/84334

The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch, just has some compile time
improvements + fixes for xxhash unittest, discovered earlier in the
previous version of the patch.

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/84536
2024-03-14 06:23:14 -07:00
Alexey Bataev
4dd186afd5 [SLP]Fix PR85082: PHI node has multiple entries.
Need to record casted extractelement for the externally used scalar, not
original extract instruction.
2024-03-13 13:59:58 -07:00
Alexey Bataev
b966b224b3 Revert "[SLP]Fix PR85082: PHI node has multiple entries."
This reverts commit 8237520eb42b37d7ed353d64a865d3ba5ac24ec6 to fix
a crash in https://lab.llvm.org/buildbot/#/builders/198/builds/8891.
2024-03-13 13:38:58 -07:00
Alexey Bataev
8237520eb4 [SLP]Fix PR85082: PHI node has multiple entries.
Need to record casted extractelement for the externally used scalar, not
original extract instruction.
2024-03-13 12:55:24 -07:00
Alexey Bataev
b77c079987 Revert "[SLP]Fix PR85082: PHI node has multiple entries."
This reverts commit 59ff907fc14aa2d02e57b4af4140949d4f8caca1 to fix
crash revealed in https://lab.llvm.org/buildbot/#/builders/198/builds/8881
2024-03-13 12:10:40 -07:00
Alexey Bataev
59ff907fc1 [SLP]Fix PR85082: PHI node has multiple entries.
Need to record casted extractelement for the externally used scalar, not
original extract instruction.
2024-03-13 08:14:10 -07:00
David Blaikie
9ac0315898 Add comment to assert from a843f26 2024-03-12 18:28:30 +00:00
David Blaikie
a843f26a77
[NFC] SLVectorizer comparator refactoring that preserves behavior (#84966)
Spinning off from #79321 / 35f4592 - looked like the comparator could be
simplified & made more clear/less risk of leaving hidden bugs.
2024-03-12 11:25:12 -07:00
Justin Lebar
fab2bb8bfd
Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678)
For some reason this was missing from STLExtras.
2024-03-10 20:00:13 -07:00
Martin Storsjö
5b5c21d772 Revert "[SLP]Improve minbitwidth analysis."
This reverts commit 2bd369b48dbf0bc3128becb7ef8f8a1b82514b87.

That commit triggered failed assertions:
$ cat repro.c
short *a;
int b;
void h() {
  short *c = a;
  b = 0;
  for (; b < 4; b++) {
    unsigned d = a[b] + a[b + 4 * 2], e = a[b] - a[b + 4 * 2],
             f = (a[b + 4] >> 1) - a[b + 4 * 3],
             g = a[b + 4] + (a[b + 4 * 3] >> 1);
    c[b] = g;
    c[b + 4] = e + f;
    c[b + 4 * 2] = e - f;
    c[b + 4 * 3] = d - g;
  }
}
$ clang -target aarch64-linux-gnu -c -O2 repro.c
clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:12503: llvm::Value* llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool): Assertion `(MinBWs.contains(getOperandEntry(E, 0)) || MinBWs.contains(getOperandEntry(E, 1))) && "Expected item in MinBWs."' failed.
2024-03-09 13:53:13 +02:00
Alexey Bataev
2bd369b48d
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Original Pull Request: https://github.com/llvm/llvm-project/pull/84334

The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch, just has some compile time
improvements + fixes for xxhash unittest, discovered earlier in the
previous version of the patch.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/84536
2024-03-08 13:57:02 -05:00
Alexey Bataev
11185715a2 Revert "[SLP]Improve minbitwidth analysis."
This reverts commit 4ce52e2d576937fe930294cae883a0daa17eeced to fix
issues detected by https://lab.llvm.org/buildbot/#/builders/74/builds/26470/steps/12/logs/stdio.
2024-03-07 12:44:53 -08:00
Alexey Bataev
4ce52e2d57
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Reviewers: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84334
2024-03-07 10:36:41 -05:00
Alexey Bataev
aae152f1be Revert "[SLP]Improve minbitwidth analysis."
This reverts commit a730ed7c1a4a35f5219df720ffb0ba6122d64fe4 to fix
compile time issue.
2024-03-05 12:13:45 -08:00
Alexey Bataev
c8b3edcddd Revert "[SLP][NFC]Use TargetTransformInfo:: instead of TTI:: in BoUpSLP to avoid"
This reverts commit 083d8aa03aca55b88098a91e41e41a8e321a5721.
2024-03-05 12:13:44 -08:00
Alexey Bataev
083d8aa03a [SLP][NFC]Use TargetTransformInfo:: instead of TTI:: in BoUpSLP to avoid
some compilers confusion.
2024-03-05 10:24:06 -08:00
Alexey Bataev
a730ed7c1a
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/78976
2024-03-05 12:20:28 -05:00
Alexey Bataev
1c2b79add6
[SLP]Add runtime stride support for strided loads.
Added support for runtime strides.

Reviewers: preames, RKSimon

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/81517
2024-03-05 09:38:25 -05:00
Alexey Bataev
89827863a3 [SLP][NFC]Make canVectorizeLoads member of BoUpSLP class, NFC. 2024-03-04 07:10:27 -08:00
Florian Hahn
617398e5e2
[SLP] Collect candidate VFs in vector in vectorizeStores (NFC). (#82793)
This is in preparation for
https://github.com/llvm/llvm-project/pull/77790 and makes it easy to add
other, non-power-of-2 VFs for processing.

PR: https://github.com/llvm/llvm-project/pull/82793
2024-03-01 20:13:49 +00:00
Florian Hahn
6fe60bd89f
[SLP] Exit early if MaxVF < MinVF (NFCI). (#83283)
Exit early if MaxVF < MinVF. In that case, the loop body below will
never get entered. Note that this adjusts the condition from MaxVF <=
MinVF. If MaxVF == MinVF, vectorization may still be feasible (and the
loop below gets entered).

PR: https://github.com/llvm/llvm-project/pull/83283
2024-03-01 19:43:06 +00:00
Alexey Bataev
2ab6d1e18e [SLP][NFC]Move some check to the outer if to simplify inner checks. 2024-03-01 10:58:41 -08:00
Alexey Bataev
3a30d8e9e5
[SLP]Check if masked gather can be emitted as a serie of loads/insert subvector.
Masked gather is very expensive operation and sometimes better to
represent it as a serie of consecutive/strided loads + insertsubvectors
sequences. Patch adds some basic estimation and if loads+insertsubvector
is cheaper, decides to represent it in this way rather than masked
gather.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/83481
2024-03-01 13:49:29 -05:00
Alexey Bataev
df0fd3a80e
[SLP]Try to vectorize small graph with extractelements, used in buildvector.
If the graph incudes only single "gather" node with only
extractelements/undefs, which used only in insertelement-based
buildvector sequences, it still might be profitable to vectorize it.
Need to rely on the cost model, not throw this graph away immediately.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/83581
2024-03-01 12:48:45 -05:00
Benjamin Kramer
3fc277f665
[SLPVectorizer] Make the insert/extractvector PHICompare a strict-weak ordering (#83571)
This was tripping off STL implementations that check for it (like libc++
with debug checking). The goal of this sort is to cluster operations on
the same values so preserve that property but sort everything else based
on the existing numbering.
2024-03-01 15:37:54 +01:00