1712 Commits

Author SHA1 Message Date
Mikhail Goncharov
054b1b3b5a Revert "[SLP]Fix final analysis for unsigned nodes."
This reverts commit 74e07ab523122d6a8347b25770062ab331b6bb84.

It might be that Mask.getBitWidth() == Mask.countl_zero() (32 in my
case) and zero bitwidth2 causes the crash.
2024-04-19 11:32:56 +02:00
Alexey Bataev
74e07ab523 [SLP]Fix final analysis for unsigned nodes.
Need to check that at least single bit is cleared for unsigned nodes
before reducing their size. Otherwise they might be treated as signed in
signed nodes.
2024-04-18 10:05:54 -07:00
Alexey Bataev
9462abdff1 [SLP]Fix PR89187: fixx assertion check.
Need to use proper index variable to fix a crash.
2024-04-18 04:22:25 -07:00
Nikita Popov
888836930b Revert "[SLP]Attempt to vectorize long stores, if short one failed."
This reverts commit 6f7160eedb2db02f37d4ffd52fff7b0cf88b3fdc.

This still causes large compile-time regressions in some cases.
2024-04-18 10:15:45 +09:00
Alexey Bataev
6f7160eedb [SLP]Attempt to vectorize long stores, if short one failed.
We can try to vectorize long store sequences, if short ones were
unsuccessful because of the non-profitable vectorization. It should not
increase compile time significantly (stores are sorted already,
complexity is n x log n), but vectorize extra code.

Metric: size..text

Program                                                                         size..text
                                                                                results     results0    diff
         test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1088012.00  1088236.00  0.0%
                  test-suite :: SingleSource/UnitTests/matrix-types-spec.test   480396.00   480476.00  0.0%
          test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   664613.00   664661.00  0.0%
         test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   664613.00   664661.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2041105.00  2040961.00 -0.0%
                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test   836563.00   836387.00 -0.0%
                 test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1035100.00  1032140.00 -0.3%

In all benchmarks extra code gets vectorized

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/88563
2024-04-17 10:24:35 -07:00
Nikita Popov
efd60556f7 Revert "[SLP]Attempt to vectorize long stores, if short one failed."
This reverts commit 7d4e8c1f3bbfe976f4871c9cf953f76d771b0eda.

Contrary to the commit description, this does cause large
compile-time regressions (up to 10% on individual files).
2024-04-17 09:25:05 +09:00
Alexey Bataev
7d4e8c1f3b
[SLP]Attempt to vectorize long stores, if short one failed.
We can try to vectorize long store sequences, if short ones were
unsuccessful because of the non-profitable vectorization. It should not
increase compile time significantly (stores are sorted already,
complexity is n x log n), but vectorize extra code.

Metric: size..text

Program                                                                         size..text
                                                                                results     results0    diff
         test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1088012.00  1088236.00  0.0%
                  test-suite :: SingleSource/UnitTests/matrix-types-spec.test   480396.00   480476.00  0.0%
          test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   664613.00   664661.00  0.0%
         test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   664613.00   664661.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2041105.00  2040961.00 -0.0%
                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test   836563.00   836387.00 -0.0%
                 test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1035100.00  1032140.00 -0.3%

In all benchmarks extra code gets vectorized

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/88563
2024-04-16 14:55:41 -04:00
Alexey Bataev
c7657cf7d1
[SLP]Keep externally used GEPs as GEPs, if possible instead of extractelement.
If the vectorized GEP instruction can be still kept as a scalar GEP,
better to keep it as scalar instead of extractelement. In many cases it
is more profitable.

Metric: size..text

Program                                                                          size..text
                                                                                 results     results0    diff
                        test-suite :: SingleSource/Benchmarks/Misc/oourafft.test    18911.00    19695.00  4.1%
                   test-suite :: SingleSource/Benchmarks/Misc-C++-EH/spirit.test    59987.00    60707.00  1.2%
       test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1392209.00  1392753.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1392209.00  1392753.00  0.0%
           test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1087996.00  1088236.00  0.0%
                         test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   309310.00   309342.00  0.0%
             test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   664661.00   664693.00  0.0%
            test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   664661.00   664693.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12354636.00 12354908.00  0.0%
                  test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1152748.00  1152716.00 -0.0%
                       test-suite :: MultiSource/Applications/oggenc/oggenc.test   191787.00   191771.00 -0.0%
                     test-suite :: SingleSource/UnitTests/matrix-types-spec.test   480796.00   480476.00 -0.1%

Misc/oourafft - Extra code gets vectorized
Misc-C++-EH/spirit - same
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - same, extra code gets vectorized
CINT2006/400.perlbench - some extra 4 x ptr stores vectorized
Bullet/bullet - extra 4 x ptr store vectorized
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - same
CFP2017rate/526.blender_r - extra 8 x float stores (several), some extra
4 x ptr stores
CFP2006/453.povray - 2 x double loads/stores replaced by 4 x double
loads/stores
Applications/oggenc - extra code is vectorized
UnitTests/matrix-types-spec - extra code gets vectorized

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/88877
2024-04-16 14:54:06 -04:00
Alexey Bataev
26ebe16d78 [SLP]Fix PR88834: check if unsigned arg can be trunced, being used in smax/smin intrinsics.
Need to check that unsigned argument can be safely used in smax/smin
intrinsics by checking if at least single sign bit is cleared, otherwise
its value may be treated as negative instead of positive.
2024-04-16 06:42:15 -07:00
Florian Hahn
b73476c784
[SLP] Make sure MinVF is a power-of-2 by using PowerOf2Ceil.
This should ensure we explore the same VFs as before 6d66db3890a18e39.

Fixes https://github.com/llvm/llvm-project/issues/88640.
2024-04-16 13:29:35 +01:00
Alexey Bataev
b7b183371b [SLP][NFC]Improve perf of BoUpSLP::buildExternalUses function, NFC.
Perform required operations, only when the result is required + check if
value was not inserted already into the list of the externally used
scalars and early exit, if it was.
2024-04-15 11:18:24 -07:00
Valery Dmitriev
9abb1ffc5c
[SLP][NFC] Add option to bypass early profitability check. (#88594)
The option intended primarily for LIT tests to suppress heuristic based
profitability check and proceed vectorization of a seemingly
unprofitable alternate operation pattern. This allows the vectorizer to
execute path that was the original intent of a test.
2024-04-15 09:09:39 -07:00
Florian Hahn
6704faf6f8
[SLP] Use StoreTy to compute min VF.
This ensures that MinVF is a power-of-2, even if ValueTy's width is
not a power-of-2.

This should fix a number of buildbot failures with X86 bootstrapping.
2024-04-13 11:12:33 +01:00
Florian Hahn
6d66db3890
[SLP] Initial vectorization of non-power-of-2 ops. (#77790)
This patch enables vectorization for non-power-of-2 VFs. Initially only
VFs where adding 1 makes the VF a power-of-2, i.e. we can still make
relatively effective use of the vectors.

It relies on the existing target cost-models to return accurate costs
for
non-power-of-2 vectors. I checked mostly AArch64 and X86 and
there the costs seem reasonable for the costs I checked, although
I expect there will be a need to refine both the cost-models and
lowering
to make most effective use of non-power-of-2 SLP vectorization.

Note that re-ordering and shuffling is not implemented for nodes
requiring padding yet to keep the initial implementation simpler.

The feature is guarded by a new flag, off by defaul for now.

PR: https://github.com/llvm/llvm-project/pull/77790
2024-04-13 08:14:40 +01:00
Alexey Bataev
8a4b7de91d [SLP][NFC]Make TTIRef capture argument instead of outer declaration. 2024-04-12 13:44:26 -07:00
Alexey Bataev
a5eaec83b3 [SLP]Fix variable redefinition error 2024-04-11 12:01:38 -07:00
Alexey Bataev
6b85fb1ef8
[SLP]Consider (f)sub, being operand of llvm.(f)abs/icmp eq/ne 0, commutative.
If (f)sub is only operand of llvm.(f)abs or icmp eq/ne 0 (int only), we can consider it as commutative operation, just need to drop wrapping flags for ineteger
operation.

https://alive2.llvm.org/ce/z/GxvxjB for correctness of abs with dropped
flags.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/86196
2024-04-11 14:56:39 -04:00
Alexey Bataev
f7cc224044 Revert "[libclc] Refactor build system to allow in-tree builds (#87622)"
This reverts commit 9029e6ebdfd98a58b5c5646fd534c6c849148cda, which was
committed by mistake with the wrong message and fails  https://lab.llvm.org/buildbot/#/builders/221/builds/21958.
2024-04-11 11:08:47 -07:00
Fraser Cormack
9029e6ebdf [libclc] Refactor build system to allow in-tree builds (#87622)
The previous build system was adding custom "OpenCL" and "LLVM IR"
languages in CMake to build the builtin libraries. This was making it
harder to build in-tree because the tool binaries needed to be present
at configure time.

This commit refactors the build system to use custom commands to build
the bytecode files one by one, and link them all together into the final
bytecode library. It also enables in-tree builds by aliasing the
clang/llvm-link/etc. tool targets to internal targets, which are
imported from the LLVM installation directory when building out of tree.

Diffing (with llvm-diff) all of the final bytecode libraries in an
out-of-tree configuration against those built using the current tip
system shows no changes. Note that there are textual changes to metadata
IDs which confuse regular diff, and that llvm-diff 14 and below may show
false-positives.

This commit also removes a file listed in one of the SOURCEs which
didn't exist and which was preventing the use of
ENABLE_RUNTIME_SUBNORMAL when configuring CMake.
2024-04-11 10:56:45 -07:00
Alexey Bataev
f6749d8dcb [SLP]Consider unsigned nodes, feeding into sitofp, being converted using
uitofp.

Need to use uitofp for unsigned nodes, which are part of minbitwidth
analysis, to correctly handle signedness info.
2024-04-11 10:39:54 -07:00
Alexey Bataev
2b00a73f62
[SLP]Buildvector for alternate instructions with non-profitable gather operands.
If the operands of the potentially alternate node are going to produce
buildvector sequences, which result in more instructions, than the
original code, then suhinstructions should be vectorized as alternate
node, better to end up with the buildvector node.

Left column - experimental, Right - reference.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413680.00   416272.00  0.6%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00  0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   664901.00   664949.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   664901.00   664949.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1171371.00  1171355.00 -0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1036396.00  1036284.00 -0.0%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111280.00   111248.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1392113.00  1391361.00 -0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1392113.00  1391361.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281676.00   281452.00 -0.1%
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test     3025.00     3019.00 -0.2%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6351.00     6335.00 -0.3%

Metric: SLP.NumVectorInstructions

Program                                                                                                                                                SLP.NumVectorInstructions
                                                                                                                                                       results                   results0 diff
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test    15.00                     16.00   6.7%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1703.00                   1707.00   0.2%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1703.00                   1707.00   0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00                  26239.00  -0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00                  11754.00  -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   824.00                    822.00  -0.2%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5668.00                   5654.00  -0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5668.00                   5654.00  -0.2%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   792.00                    790.00  -0.3%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   792.00                    790.00  -0.3%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1389.00                   1384.00  -0.4%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   596.00                    590.00  -1.0%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6.00                      5.00 -16.7%

Metric: exec_time

Program                                                                                                                                                exec_time
                                                                                                                                                       results   results0  diff
                                                                               test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test     99.14    100.00    0.9%

Other changes are not significant (less than 0.1% percent with exectime
less 5 secs).

SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns
remain scalar, smaller code.
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small
changes, some extra stores gets vectorized.
External/SPEC/CINT2017speed/625.x264_s/625.x264_s
External/SPEC/CINT2017rate/525.x264_r/525.x264_r
x264 has one change in a loop body, in function ssim_end4, some code
remain scalar, resulting in less code size.
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code
gets vectorized, looks like some other patterns were matched.
MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were
vectorized (looks like the graphs become profitable)
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small
changes in vectorized code (some small part remain scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s
Many changes cause by the fact that the code of one function becomes
smaller (onvertLCHabToRGB) and this functions gets inlined after that.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small
changes here and there, some extra code is vectorized, some remain
scalar (2 x vectors)
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars
+ 2 insertelems instead of insert, broadcast, alt code (3 instructions,
  total 5 insts)
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph
becomes profitable and gets vectorized.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r
External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s
Some small graph becomes profitable and gets vectorized.
MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.

Reviewers: RKSimon, dtcxzyw

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84978
2024-04-10 14:33:56 -04:00
Alexey Bataev
6ca5a410d2 [SLP]Fix PR87358: broken module, Instruction does not dominate all uses.
If the first node is a gather node with extractelement instructions,
still need to put the vector value after all instructions, not after the
very first one.
2024-04-10 08:24:15 -07:00
Alexey Bataev
938a73422e [SLP][NFC]Walk over entries, not single values.
Better to walk over SLP nodes rather than single values. Matching
a value to a node is not a 1-to-1 relation, one value may be part of
several nodes and compiler may get wrong node, when trying to map it.
Currently there are no such issues detected, but they may appear in
future.
2024-04-10 06:03:26 -07:00
Alexey Bataev
910d2de357 [SLP]Fix PR88103: consider the sign of the compare for non-negative operands.
Need to improve detection of number of bits, required for the operand,
before doing a reduction. If the instruction is incoming operand of the
signed compare, need to consider adding an extra bit for signedness.
2024-04-09 10:47:47 -07:00
Alexey Bataev
e8e67957fa [SLP]Fix PR88123: use vectorized operands consistently.
Need to use vectorized operands, not the vecop of the extractelement
instructions, to avoid false detection of the extra vector operand in
the extractelements shuffling.
2024-04-09 08:42:57 -07:00
Alexey Bataev
01d9528ef9
[SLP]Improve final minbitwidth analysis attempt.
Added part for demanded bits analysis in the IsPotentiallyTruncated to
improve minbitwidth analysis final attempts.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    43069.00    42973.00 -0.2%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    43066.00    42970.00 -0.2%

Extra trunc instructions are emitted to operate with <32 x i8> instead
of <32 x i16>, will be removed in the next patches.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/87786
2024-04-08 15:54:30 -04:00
Alexey Bataev
78c50bbd45 [SLP][NFC]Remove unused variable, NFC. 2024-04-08 09:16:44 -07:00
Alexey Bataev
4a1c53f9fa [SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics.
https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/86135
2024-04-08 08:32:35 -07:00
Alexey Bataev
a612524197
[SLP]Fix the cost of the reduction result to the final type.
Need to fix the way the cost is calculated, otherwise wrong cast opcode
can be selected and lead to the over-optimistic vector cost. Plus, need
to take into account reduction type size.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/87528
2024-04-07 09:51:47 -04:00
Martin Storsjö
bd9486b4ec Revert "[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics."
This reverts commit 66b528078e4852412769375e35d2a672bf36a0ec.

This commit caused miscompilations, breaking tests in the libyuv
testsuite - see
https://github.com/llvm/llvm-project/pull/86135#issuecomment-2041049709
for more details.
2024-04-06 23:53:26 +03:00
Alexey Bataev
66b528078e
[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics.
https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/86135
2024-04-05 14:29:26 -04:00
David Green
31fd6b8eec [SLP] Protect against scalable vector users.
We started seeing a crash after 8a0bfe490592de3df28d82c5dd69956e43c20f1d that
the user could be scalable, meaning the typesize is scalable and an implicit
convertion to uint64_t could be performed. Protect against that by making sure
the users type is not scalable.
2024-04-05 11:30:14 +01:00
Alexey Bataev
8a0bfe4905 [SLP]Fix PR87630: wrong result for externally used vector value.
Need to check that the externally used value can be represented with the
BitWidth before applying it, otherwise need to keep wider type.
2024-04-04 12:03:28 -07:00
Simon Pilgrim
d54d476300 [SLP] Fix Wunused-variable warning. NFC. 2024-04-04 12:26:34 +01:00
Alexey Bataev
42cbceb0f0 [SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions.
Compiler can improve analysis for operands of UIToFP/SIToFP instructions
and operands of ICmp instruction.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/85966
2024-04-03 14:18:45 -07:00
Alexey Bataev
fa2bbea14d Revert "[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions."
This reverts commit 899855d2b11856a44e530fffe854d76be69b9008 to fix the
issue reported in https://lab.llvm.org/buildbot/#/builders/165/builds/51659.
2024-04-03 13:10:16 -07:00
Alexey Bataev
899855d2b1
[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions.
Compiler can improve analysis for operands of UIToFP/SIToFP instructions
and operands of ICmp instruction.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/85966
2024-04-03 15:58:58 -04:00
Alexey Bataev
d57884011e
[SLP]Add support for commutative intrinsics.
Implemented long-standing TODO to support commutative intrinsics.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/86316
2024-04-03 14:28:36 -04:00
Alexey Bataev
cd29126b63
[SLP]Fix PR87133: crash because of different altopcodes for cmps after reordering.
If the node has cmp instruction with 3 or more different but swappable
predicates, need to keep same kind of main/alternate opcodes to avoid
incorrect detection of opcodes after reordering. Reordering changes the
order and we may erroneously consider swappable opcodes as
non-compatible/alternate, which may lead to a later compiler crash.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/87267
2024-04-03 13:47:50 -04:00
Alexey Bataev
07a566793b [SLP]Fix PR87477: fix alternate node cast cost/codegen.
Have to compare actual type size to pick up proper cast operation
opcode.
2024-04-03 10:00:03 -07:00
Alexey Bataev
250b467f7c [SLP][NFC]Simplify common analysis of instructions in BoUpSLP::collectValuesToDemote by outlining common code, NFC. 2024-04-03 06:45:42 -07:00
Han-Kuan Chen
bf1df25048
[SLP] Use isValidElementType instead of (#87469)
FixedVectorType::isValidElementType for consistency.
2024-04-03 17:57:46 +08:00
Alexey Bataev
d595080b48 [SLP]Fix PR87384: check for fixed vector type before using.
If we have mixed extractelement instructions, fixed and scalable ones,
need to check that compiler tries to estimate the cost for fixed vector
extractelement, not the scalable one, to avoid compiler crash.
2024-04-02 11:38:26 -07:00
Alexey Bataev
9cb7dffa88 [SLP]Fix PR80027: handle case when ext is not reduced but its operand is.
Need to handle the case, where the resize operation itself is not
reduced but its operand is. In this case need to take an extra analysis
for the operand, not the instruction itself.
2024-04-02 09:32:25 -07:00
Alexey Bataev
6b7b18a1a7 [SLP]Fix PR87329: crash on alternate cast vectorization.
Need to fix the analysis for the alternate instructions, based on int
extension operations. If the alternate extension node is resized, but
not the operand, need to resize the node and do not shuffle final
result, we end up only with trunc instruction.
2024-04-02 08:19:29 -07:00
Alexey Bataev
cb9cf331fa [SLP][NFC]Do not lookup in MinBWs, reuse previously used iterator. 2024-04-02 05:53:34 -07:00
Alexey Bataev
41afef9066 [SLP]Fix PR87011: Missing sign extension of demoted type before zero extension
Need to drop skipping of the first zext/sext nodes, it leads to
incorrect and less profitable code.
2024-04-01 06:07:18 -07:00
Jakub Kuderski
2b0ab05c4a
[SLP][NFC] Simplify type checks with isa predicates (#87182)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
2024-03-31 14:55:11 -04:00
Alexey Bataev
01e02e0b6a [SLP]Fix PR87011: Do not assume that initial ext/trunc nodes can be
represented by bitwidth without analysis.

Need to check that initial ext/trunc nodes can be safely represented
using calculated bitwidth before applying it.
2024-03-28 18:02:26 -07:00
Alexey Bataev
70cf2a09ce [SLP][NFC]Simplify function/constructors by removing unnecessary params. 2024-03-28 13:34:59 -07:00