1610 Commits

Author SHA1 Message Date
Alexey Bataev
97fb91ee66 [SLP][NFC]Add a test with non-profitable alternate vectorized
instructions.
2024-03-12 12:56:35 -07:00
Martin Storsjö
5b5c21d772 Revert "[SLP]Improve minbitwidth analysis."
This reverts commit 2bd369b48dbf0bc3128becb7ef8f8a1b82514b87.

That commit triggered failed assertions:
$ cat repro.c
short *a;
int b;
void h() {
  short *c = a;
  b = 0;
  for (; b < 4; b++) {
    unsigned d = a[b] + a[b + 4 * 2], e = a[b] - a[b + 4 * 2],
             f = (a[b + 4] >> 1) - a[b + 4 * 3],
             g = a[b + 4] + (a[b + 4 * 3] >> 1);
    c[b] = g;
    c[b + 4] = e + f;
    c[b + 4 * 2] = e - f;
    c[b + 4 * 3] = d - g;
  }
}
$ clang -target aarch64-linux-gnu -c -O2 repro.c
clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:12503: llvm::Value* llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool): Assertion `(MinBWs.contains(getOperandEntry(E, 0)) || MinBWs.contains(getOperandEntry(E, 1))) && "Expected item in MinBWs."' failed.
2024-03-09 13:53:13 +02:00
Alexey Bataev
2bd369b48d
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Original Pull Request: https://github.com/llvm/llvm-project/pull/84334

The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch, just has some compile time
improvements + fixes for xxhash unittest, discovered earlier in the
previous version of the patch.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/84536
2024-03-08 13:57:02 -05:00
Alexey Bataev
11185715a2 Revert "[SLP]Improve minbitwidth analysis."
This reverts commit 4ce52e2d576937fe930294cae883a0daa17eeced to fix
issues detected by https://lab.llvm.org/buildbot/#/builders/74/builds/26470/steps/12/logs/stdio.
2024-03-07 12:44:53 -08:00
Alexey Bataev
904a6aedca [SLP][NFC]Add lshr version of the test with casting, NFC. 2024-03-07 08:31:58 -08:00
Alexey Bataev
4ce52e2d57
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Reviewers: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84334
2024-03-07 10:36:41 -05:00
Alexey Bataev
aae152f1be Revert "[SLP]Improve minbitwidth analysis."
This reverts commit a730ed7c1a4a35f5219df720ffb0ba6122d64fe4 to fix
compile time issue.
2024-03-05 12:13:45 -08:00
Alexey Bataev
a730ed7c1a
[SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/78976
2024-03-05 12:20:28 -05:00
Alexey Bataev
169824ba40 [SLP][NFC]SPlit test/Transforms/SLPVectorizer/AArch64/getelementptr.ll,
NFC.
2024-03-05 08:18:51 -08:00
Alexey Bataev
1c2b79add6
[SLP]Add runtime stride support for strided loads.
Added support for runtime strides.

Reviewers: preames, RKSimon

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/81517
2024-03-05 09:38:25 -05:00
Alexey Bataev
3a30d8e9e5
[SLP]Check if masked gather can be emitted as a serie of loads/insert subvector.
Masked gather is very expensive operation and sometimes better to
represent it as a serie of consecutive/strided loads + insertsubvectors
sequences. Patch adds some basic estimation and if loads+insertsubvector
is cheaper, decides to represent it in this way rather than masked
gather.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/83481
2024-03-01 13:49:29 -05:00
Alexey Bataev
df0fd3a80e
[SLP]Try to vectorize small graph with extractelements, used in buildvector.
If the graph incudes only single "gather" node with only
extractelements/undefs, which used only in insertelement-based
buildvector sequences, it still might be profitable to vectorize it.
Need to rely on the cost model, not throw this graph away immediately.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/83581
2024-03-01 12:48:45 -05:00
Alexey Bataev
f28c4b4bac
[SLP]Fix/improve potential masked gather loads analysis.
When do the analysis for the (potential) masked gather node, we check
that not greater than half of  the pointer operands are loop invariants
or potentially vectorizable.
Need to check actually, that we have a loop at first
and do better check for the potentially vectorizable
pointers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/83472
2024-03-01 07:38:18 -05:00
Alexey Bataev
2d98d763a8
[SLP]Fix the cost model for extracts combined with later shuffle.
If the buildvector node contains extract, which later should be combined
with some other nodes by shuffling, need to estimate the cost of this
shuffle before building the mask after shuffle.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/83442
2024-03-01 07:12:07 -05:00
Alexey Bataev
45d82f33af [SLP]Fix miscompilation, cause by incorrect final node reordering.
Need to use the regular reordering from the correct node for the final
store/insertelement node to avoid miscommilation.
2024-02-29 11:21:06 -08:00
Alexey Bataev
3e0425c76c [SLP][NFC]Add a test showing incorrect transformation, NFC. 2024-02-29 09:24:31 -08:00
Alexey Bataev
32994cc0d6
[SLP]Improve findReusedOrderedScalars and graph rotation.
Patch syncs the code in findReusedOrderedScalars with cost
estimation/codegen. It tries to use similar logic to better determine
best order.
Before, it just tried to find previously vectorized node without
checking if it is possible to use the vectorized value in the shuffle.
Now it relies on the more generalized version. If it determines, that
a single vector must be reordered (using same mechanism, as codegen and
cost estimation), it generates better order.

The comparison between new/ref ordering:

Metric: SLP.NumVectorInstructions

Program                                                                                                                                                SLP.NumVectorInstructions
                                                                                                                                                       results                   results0 diff
                                                                                               test-suite :: MultiSource/Benchmarks/nbench/nbench.test   139.00                    140.00   0.7%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   344.00                    346.00   0.6%
                                                                                        test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1293.00                   1292.00  -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5176.00                   5170.00  -0.1%
                                                                                        test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5173.00                   5167.00  -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11692.00                  11660.00  -0.3%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  1621.00                   1615.00  -0.4%
                                                                                             test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test   795.00                    792.00  -0.4%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26499.00                  26338.00  -0.6%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  7343.00                   7281.00  -0.8%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  1104.00                   1094.00  -0.9%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2216.00                   2180.00  -1.6%
                                                                                            test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   787.00                    637.00 -19.1%

Less 0% is better.
Most of the benchmarks see more vectorized code. The first ones just
have shuffles removed.

The ordering analysis still may require some improvements (e.g. for
alternate nodes), but this one should be produce better results.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/77529
2024-02-22 14:32:15 -05:00
Philip Reames
0f1f82a92e [SLP] Regen a couple of tests to reduce future diff 2024-02-16 10:51:42 -08:00
Philip Reames
dfae162773 [SLP] Update test naming to avoid FileCheck warnings
This only address the tmp name collision.  We still get warnings due to
conflicting ASM.  This is due to the different target attributes on the
function.
2024-02-16 08:33:50 -08:00
Rohit Aggarwal
36adfec155
Adding support of AMDLIBM vector library (#78560)
Hi,

AMD has it's own implementation of vector calls. This patch include the
changes to enable the use of AMD's math library using -fveclib=AMDLIBM.
Please refer https://github.com/amd/aocl-libm-ose 

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
2024-02-15 12:13:07 +05:30
Florian Hahn
192c23b0c7
[SLP] Add X86 version of non-power-of-2 vectorization tests.
Extra X86 tests for https://github.com/llvm/llvm-project/pull/77790.
2024-02-13 15:56:40 +00:00
Alexey Bataev
b04dd5d187 [SLP]FIx PR81403: compiler crah because wrongly resized vector value.
The mask for the reshuffling/resizing might be calculated incorrectly,
fixed.
2024-02-12 10:27:25 -08:00
Alexey Bataev
833a1cadeb [SLP]Add support for strided loads.
Added basic support for strided loads support in SLP vectorizer.
Supports constant strides only. If the strided load must be
reversed, applies -stride to avoid extra reverse shuffle.

Reviewers: preames, lukel97

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/80310
2024-02-12 09:43:54 -08:00
Alexey Bataev
6a3a5cad2e Revert "[SLP]Add support for strided loads."
This reverts commit 0940f9083e68bda78bcbb323c2968a4294092e21 to fix
issues reported in https://github.com/llvm/llvm-project/pull/80310.
2024-02-12 08:47:28 -08:00
Alexey Bataev
0940f9083e
[SLP]Add support for strided loads.
Added basic support for strided loads support in SLP vectorizer.
Supports constant strides only. If the strided load must be
reversed, applies -stride to avoid extra reverse shuffle.

Reviewers: preames, lukel97

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/80310
2024-02-12 07:41:42 -05:00
Alexey Bataev
df856e4977
[SLP]Add GEP cost estimation for gathered loads.
When doing estimation for vectorization of gathered loads, need to
estimate the cost of the pointer (vectorization), as it is done for the
actual vectorized loads. Otherwise may be too optimistic about the cost
of the gathered loads.

Reviewers: preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/80867
2024-02-07 07:30:41 -05:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
Alexey Bataev
7b9bf80ab5 [SLP][NFC]Add tests with strided loads, NFC. 2024-02-01 09:09:02 -08:00
Alexey Bataev
285bc69846 [SLP]Fix PR80027: Fix costs processing for minbitwidth types.
Need to switch the types, the destination is first in getCastInstrCost
function.
2024-01-30 10:32:55 -08:00
Alexey Bataev
8d89dd4a58 [SLP]Fix PR79743: Check that all users are demoted before trying to
demote the tree entry.

Need to check if all user nodes are marked for demotion before demoting
the node. Otherwise, some data info might be lost after vectorization.
2024-01-29 10:51:20 -08:00
Alexey Bataev
c07d3343dc [SLP][NFC]Add a test for PR79743 with incorrect node demotion, NFC. 2024-01-29 10:26:12 -08:00
Alexey Bataev
6fe21bc1da [SLP]Fix PR79229: Do not erase extractelement, if it used in
multiregister node.

If the node can be span between several registers and same
extractelement instruction is used in several parts, it may be required
to keep such extractelement instruction to avoid compiler crash.
2024-01-25 06:20:53 -08:00
Alexey Bataev
48bbd76587 [SLP]Fix PR79229: Check that extractelement is used only in a single node
before erasing.

Before trying to erase the extractelement instruction, not enough to
check for single use, need to check that it is not used in several nodes
because of the preliminary nodes reordering.
2024-01-24 11:22:22 -08:00
Alexey Bataev
ca654acc16 [SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict
weak ordering.

Compared NumUses to meet the reaquirements of the strict weak ordering.
2024-01-24 09:36:25 -08:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Alexey Bataev
bb3e0d7fc3 [SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth.
No need in trying to analyze small graphs with gather node only to avoid
crash.
2024-01-23 12:44:49 -08:00
Alexey Bataev
f9da4c6ead [SLP][NFC]Add a test with extending the types for vectorized
stores/insertelement instructions, NFC.
2024-01-18 16:42:45 -08:00
Alexey Bataev
093206bb7e [SLP]Fix PR78298: Assertion `GEP->getNumIndices() == 1 &&
!isa<Constant>(GEPIdx)' failed.

The non-constant index might be folded to constant during earlier stages
of vectorization. Need to consider this option and filter out out GEP
with the constant indices from the candidates list.
2024-01-16 09:17:35 -08:00
Alexey Bataev
d79fdb2749 [SLP]Fix PR78236: correctly track external values, replaced several
times during reduction vectorization.

If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.
2024-01-16 06:52:43 -08:00
Alexey Bataev
6fdc2ce8c5 [SLP]Fix PR77916: transform the whole mask, not only the elements for
the second vector.

Need to transform all elements in the long mask, if we decided to
produce shorter version, some elements may still have incorrect inifices
after transformation for the first vector in the permutation.
2024-01-12 07:07:43 -08:00
Alexey Bataev
39b2104b4a [SLP]Fix a crash for reduced values with minbitwidth, which are reused.
If the reduced values are additionally affected by minbitwidth analysis,
need to cast them to a proper type before doing any math, if they are
reused.
2024-01-12 04:49:48 -08:00
Florian Hahn
3b3da7c7fb
[SLP] Add a set of tests with non-power-of-2 operations. 2024-01-11 16:47:38 +00:00
Alexey Bataev
18473eb108 [SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)
After changes, that does not require support from InstCombine, we can
drop some extra requirements for values-to-be-demoted. No need to check
for external uses for roots/other instructions, just check that the
  no non-vectorized insertelement instruction, which may require
  widening.

Review: https://github.com/llvm/llvm-project/pull/72679
2024-01-11 06:59:57 -08:00
Alexey Bataev
dc717b1992 [SLP][NFC]Add a test for final vector with minbitwidth, NFC. 2024-01-11 06:53:42 -08:00
Martin Storsjö
1de3f46938 Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)"
This reverts commit 408dce82016463dcb5026b2ddfc62174970a88e9.

This triggered failed asserts with code like this:

    char a[];
    short *b;
    int c, d, e, f;
    void g() {
      char *h;
      for (;;) {
        for (; f; ++f) {
          h[f] = b[0] * a[e] + b[c] * a[1] >> 7;
          ++b;
        }
        h += d;
      }
    }

Compiled like this:

    $ clang -target x86_64-linux-gnu -c repro.c -O2
    clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
2024-01-11 12:15:35 +02:00
Alexey Bataev
408dce8201
[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)
After changes, that does not require support from InstCombine, we can
drop some extra requirements for values-to-be-demoted. No need to check
for external uses for roots/other instructions, just check that the
  no non-vectorized insertelement instruction, which may require
  widening.
2024-01-10 14:06:29 -05:00
Alexey Bataev
04f77a1320 [SLP][NFC]Replace constant by some meaningfull values to make test more
relevant, NFC.
2024-01-10 10:34:32 -08:00
Alexey Bataev
73ce13d79b
[SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (#74749)
SLP vectorizer passes the type of the subvector and the mask, which size
determines the size of the resulting vector. TTI should support this
pattern to improve cost estimation of the insert_subvector shuffle
pattern.
2024-01-10 10:39:34 -05:00
Alexey Bataev
036e48e2f5 [SLP]Fix PR76850: do the analysis of the submask.
Need to limit the transformation of the VecMask by the corresponding part of the mask of SliceSize size to avoid compiler crash during further cost analysis.
2024-01-08 07:51:02 -08:00
Alexey Bataev
79e62315be [SLP]Use revectorized value for extracts from buildvector, beeing
vectorized.

When trying to reuse the extractelement instruction, emitted for the
insertelement instruction, need to check, if the this insertelement
instruction was vectorized. In this case, need to use vectorized value,
not the original insertelement.
2024-01-04 06:45:26 -08:00