1686 Commits

Author SHA1 Message Date
Alexey Bataev
78c50bbd45 [SLP][NFC]Remove unused variable, NFC. 2024-04-08 09:16:44 -07:00
Alexey Bataev
4a1c53f9fa [SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics.
https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/86135
2024-04-08 08:32:35 -07:00
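Illustrative sketch (not taken from the patch): one way to see why narrowing smin/smax needs a range condition. It assumes both operands must be representable in the narrow signed type. `fitsSigned` and `smaxViaI8` are hypothetical names, and the narrowing casts rely on two's-complement wraparound (guaranteed since C++20).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: true if V is representable as a signed integer of
// the given width. Assumes 1 <= Bits <= 31.
bool fitsSigned(int32_t V, unsigned Bits) {
  const int32_t Lo = -(int32_t{1} << (Bits - 1));
  const int32_t Hi = (int32_t{1} << (Bits - 1)) - 1;
  return V >= Lo && V <= Hi;
}

// smax evaluated on operands truncated to i8, then sign-extended to i32.
int32_t smaxViaI8(int32_t A, int32_t B) {
  const int8_t NA = static_cast<int8_t>(A); // trunc i32 -> i8
  const int8_t NB = static_cast<int8_t>(B); // trunc i32 -> i8
  return static_cast<int32_t>(std::max(NA, NB)); // sext i8 -> i32
}
```

For (-5, 7) the i8 result matches the i32 one. For (300, 7) the truncated operand no longer represents 300, so the narrow result differs from the wide one.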
Alexey Bataev
a612524197
[SLP]Fix the cost of the reduction result to the final type.
Need to fix the way the cost is calculated, otherwise wrong cast opcode
can be selected and lead to the over-optimistic vector cost. Plus, need
to take into account reduction type size.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/87528
2024-04-07 09:51:47 -04:00
Martin Storsjö
bd9486b4ec Revert "[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics."
This reverts commit 66b528078e4852412769375e35d2a672bf36a0ec.

This commit caused miscompilations, breaking tests in the libyuv
testsuite - see
https://github.com/llvm/llvm-project/pull/86135#issuecomment-2041049709
for more details.
2024-04-06 23:53:26 +03:00
Alexey Bataev
66b528078e
[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics.
https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/86135
2024-04-05 14:29:26 -04:00
David Green
31fd6b8eec [SLP] Protect against scalable vector users.
We started seeing a crash after 8a0bfe490592de3df28d82c5dd69956e43c20f1d because
the user could be scalable, meaning the type size is scalable and an implicit
conversion to uint64_t could be performed. Protect against that by making sure
the user's type is not scalable.
2024-04-05 11:30:14 +01:00
Alexey Bataev
8a0bfe4905 [SLP]Fix PR87630: wrong result for externally used vector value.
Need to check that the externally used value can be represented with the
BitWidth before applying it, otherwise need to keep wider type.
2024-04-04 12:03:28 -07:00
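Illustrative sketch (an assumption about the kind of check meant, not the patch itself): a value "can be represented with the BitWidth" if truncating it and extending it back with the chosen signedness reproduces the original. `survivesRoundTrip` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical check: truncate V to Bits, extend it back, and compare.
// Assumes 1 <= Bits <= 63.
bool survivesRoundTrip(int64_t V, unsigned Bits, bool IsSigned) {
  const uint64_t Mask = (uint64_t{1} << Bits) - 1;
  const uint64_t Low = static_cast<uint64_t>(V) & Mask; // trunc
  const bool SignBitSet = ((Low >> (Bits - 1)) & 1) != 0;
  // sext fills the upper bits with the sign bit, zext fills them with zeros.
  const uint64_t Back = (IsSigned && SignBitSet) ? (Low | ~Mask) : Low;
  return static_cast<int64_t>(Back) == V;
}
```

With 8 bits, 200 survives a zero-extending round trip but not a sign-extending one, and -1 behaves the other way around. When the check fails, the wider type has to be kept.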
Simon Pilgrim
d54d476300 [SLP] Fix Wunused-variable warning. NFC. 2024-04-04 12:26:34 +01:00
Alexey Bataev
42cbceb0f0 [SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions.
Compiler can improve analysis for operands of UIToFP/SIToFP instructions
and operands of ICmp instruction.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/85966
2024-04-03 14:18:45 -07:00
Alexey Bataev
fa2bbea14d Revert "[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions."
This reverts commit 899855d2b11856a44e530fffe854d76be69b9008 to fix the
issue reported in https://lab.llvm.org/buildbot/#/builders/165/builds/51659.
2024-04-03 13:10:16 -07:00
Alexey Bataev
899855d2b1
[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions.
Compiler can improve analysis for operands of UIToFP/SIToFP instructions
and operands of ICmp instruction.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/85966
2024-04-03 15:58:58 -04:00
Alexey Bataev
d57884011e
[SLP]Add support for commutative intrinsics.
Implemented long-standing TODO to support commutative intrinsics.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/86316
2024-04-03 14:28:36 -04:00
Alexey Bataev
cd29126b63
[SLP]Fix PR87133: crash because of different altopcodes for cmps after reordering.
If the node has cmp instruction with 3 or more different but swappable
predicates, need to keep same kind of main/alternate opcodes to avoid
incorrect detection of opcodes after reordering. Reordering changes the
order and we may erroneously consider swappable opcodes as
non-compatible/alternate, which may lead to a later compiler crash.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/87267
2024-04-03 13:47:50 -04:00
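Illustrative sketch of what "swappable predicates" means here: `a < b` and `b > a` are the same comparison with the operands exchanged, so reordering operands must not turn one into a different opcode kind. The `Pred` enum and both functions are hypothetical stand-ins (LLVM's own helper for this is `CmpInst::getSwappedPredicate`).

```cpp
#include <cstdint>

// Hypothetical, reduced set of signed comparison predicates.
enum class Pred { SLT, SGT, SLE, SGE };

// Predicate that gives the same result once the operands are exchanged.
Pred swapped(Pred P) {
  switch (P) {
  case Pred::SLT: return Pred::SGT;
  case Pred::SGT: return Pred::SLT;
  case Pred::SLE: return Pred::SGE;
  case Pred::SGE: return Pred::SLE;
  }
  return P; // not reached for valid enumerators
}

// Evaluate predicate P on (A, B).
bool eval(Pred P, int32_t A, int32_t B) {
  switch (P) {
  case Pred::SLT: return A < B;
  case Pred::SGT: return A > B;
  case Pred::SLE: return A <= B;
  case Pred::SGE: return A >= B;
  }
  return false; // not reached for valid enumerators
}
```

For every predicate P, `eval(P, A, B)` equals `eval(swapped(P), B, A)`, which is why such lanes can share one main opcode.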
Alexey Bataev
07a566793b [SLP]Fix PR87477: fix alternate node cast cost/codegen.
Have to compare actual type size to pick up proper cast operation
opcode.
2024-04-03 10:00:03 -07:00
Alexey Bataev
250b467f7c [SLP][NFC]Simplify common analysis of instructions in BoUpSLP::collectValuesToDemote by outlining common code, NFC. 2024-04-03 06:45:42 -07:00
Han-Kuan Chen
bf1df25048
[SLP] Use isValidElementType instead of (#87469)
FixedVectorType::isValidElementType for consistency.
2024-04-03 17:57:46 +08:00
Alexey Bataev
d595080b48 [SLP]Fix PR87384: check for fixed vector type before using.
If we have mixed extractelement instructions, fixed and scalable ones,
need to check that compiler tries to estimate the cost for fixed vector
extractelement, not the scalable one, to avoid compiler crash.
2024-04-02 11:38:26 -07:00
Alexey Bataev
9cb7dffa88 [SLP]Fix PR80027: handle case when ext is not reduced but its operand is.
Need to handle the case where the resize operation itself is not
reduced but its operand is. In this case an extra analysis is needed
for the operand, not the instruction itself.
2024-04-02 09:32:25 -07:00
Alexey Bataev
6b7b18a1a7 [SLP]Fix PR87329: crash on alternate cast vectorization.
Need to fix the analysis for the alternate instructions, based on int
extension operations. If the alternate extension node is resized, but
not the operand, need to resize the node and not shuffle the final
result; we end up with only a trunc instruction.
2024-04-02 08:19:29 -07:00
Alexey Bataev
cb9cf331fa [SLP][NFC]Do not lookup in MinBWs, reuse previously used iterator. 2024-04-02 05:53:34 -07:00
Alexey Bataev
41afef9066 [SLP]Fix PR87011: Missing sign extension of demoted type before zero extension
Need to drop skipping of the first zext/sext nodes, it leads to
incorrect and less profitable code.
2024-04-01 06:07:18 -07:00
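Illustrative sketch of why a sign extension of the demoted value cannot simply be skipped before a later zero extension. The function names are hypothetical; the casts model `zext i8 -> i32` versus `sext i8 -> i16` followed by `zext i16 -> i32`.

```cpp
#include <cstdint>

// zext i8 -> i32 applied directly to the demoted value.
uint32_t zextDirect(int8_t V) {
  return static_cast<uint32_t>(static_cast<uint8_t>(V));
}

// sext i8 -> i16 first, then zext i16 -> i32.
uint32_t sextToI16ThenZext(int8_t V) {
  const int16_t Mid = static_cast<int16_t>(V); // sign extension
  return static_cast<uint32_t>(static_cast<uint16_t>(Mid)); // zero extension
}
```

For non-negative inputs both paths agree. For -1 the direct path yields 0xFF while the path with the intermediate sign extension yields 0xFFFF, so dropping the sext changes the result.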
Jakub Kuderski
2b0ab05c4a
[SLP][NFC] Simplify type checks with isa predicates (#87182)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
2024-03-31 14:55:11 -04:00
Alexey Bataev
01e02e0b6a [SLP]Fix PR87011: Do not assume that initial ext/trunc nodes can be
represented by bitwidth without analysis.

Need to check that initial ext/trunc nodes can be safely represented
using calculated bitwidth before applying it.
2024-03-28 18:02:26 -07:00
Alexey Bataev
70cf2a09ce [SLP][NFC]Simplify function/constructors by removing unnecessary params. 2024-03-28 13:34:59 -07:00
Alexey Bataev
d7975c9d93
[SLP]Add better minbitwidth analysis for udiv/urem instructions.
Adds improved bitwidth analysis for udiv/urem instructions. The
analysis is based on similar version in InstCombiner.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/85928
2024-03-28 10:35:15 -04:00
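Illustrative sketch (an assumption about the underlying condition, not code from the patch): udiv/urem can be evaluated in a narrower unsigned type when both operands already fit in it, because the high bits are known to be zero. All names are hypothetical.

```cpp
#include <cstdint>

// True if V has no bits set at or above position Bits.
bool fitsUnsigned(uint32_t V, unsigned Bits) {
  return Bits >= 32 || (V >> Bits) == 0;
}

// udiv evaluated on operands truncated to i16. The truncated divisor must
// be nonzero.
uint32_t udivViaI16(uint32_t A, uint32_t B) {
  const uint16_t NA = static_cast<uint16_t>(A);
  const uint16_t NB = static_cast<uint16_t>(B);
  return static_cast<uint32_t>(NA / NB);
}

// urem evaluated on operands truncated to i16, same precondition.
uint32_t uremViaI16(uint32_t A, uint32_t B) {
  const uint16_t NA = static_cast<uint16_t>(A);
  const uint16_t NB = static_cast<uint16_t>(B);
  return static_cast<uint32_t>(NA % NB);
}
```

With operands 1000 and 7 the narrow and wide results agree. With 0x10005 and 5 the dividend does not fit in 16 bits and the narrow quotient is wrong.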
Alexey Bataev
b7a4ace72e [SLP][NFC]Improve compile time by size analysis limit and reduction size
limit.

Used RecursionMaxDepth to limit number of lookups in BoUpSLP::getVectorElementSize and limited reduction width for bool reduced values.
2024-03-27 14:46:04 -07:00
Alexey Bataev
d94dc5f0d6 [SLP]Fix PR86763: do not truncate reductions to the demanded bits size.
Need to adjust ReductionBitWidth after minbitwidth analysis, if the
demanded bits analysis shows that its size is less than the size of the
vectorized value. It prevents an incorrect sign/zero extension
transformation afterwards.
2024-03-27 14:34:59 -07:00
Alexey Bataev
b43ec8e62b [SLP]Fix PR86798: handle phi nodes being trunced, but not its operands.
If the phi node is trunced, but not its operand(s), need to handle this
situation in the assertion, code already does the right transformation.
2024-03-27 07:21:45 -07:00
Alexey Bataev
342f7d0d35 [SLP]Fix PR86620: check final minbitwidth for truncs/exts before
accepting it.

If the minbitwidth is deduced from the demanded elements, need to check
the final bitwidth for the trunc/ext instruction instead of blindly
accepting the used one.
2024-03-26 11:27:17 -07:00
Alexey Bataev
26dd12871c [SLP]Do not propagate nuw/nsw flags for alt nodes, affected by
minbitwidth analysis.

Need to drop nuw/nsw flags, if the alternate node is resized after the
minbitwidth analysis, to avoid producing poison values in corner cases.
2024-03-26 10:24:09 -07:00
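Illustrative sketch of why nsw/nuw flags that held at the original width may not hold after resizing. `addHasNoSignedWrap` is a hypothetical name; it checks whether a signed add stays in range at a given width.

```cpp
#include <cstdint>

// True if A + B is representable as a signed integer of the given width,
// i.e. an `add nsw` at that width would not produce poison.
// Assumes 1 <= Bits <= 32 and inputs that fit in that many bits.
bool addHasNoSignedWrap(int64_t A, int64_t B, unsigned Bits) {
  const int64_t Lo = -(int64_t{1} << (Bits - 1));
  const int64_t Hi = (int64_t{1} << (Bits - 1)) - 1;
  const int64_t Sum = A + B;
  return Sum >= Lo && Sum <= Hi;
}
```

100 + 100 does not wrap at 32 bits but does at 8 bits, so an `nsw` flag carried over to the narrower add would wrongly mark the result as poison.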
Alexey Bataev
54ca1e2c04 [SLP]Fix PR80027: include initial trunc nodes to the demoted values.
Need to include initial sext/zext/trunc nodes to the list of the demoted
root values to correctly calculate the cost and handle the
vectorization.
2024-03-26 06:40:57 -07:00
Patrick O'Neill
4652ec0e29
[SLP] Delete vectorized users when tree contains an invalid cost (#86344) 2024-03-22 17:52:27 -04:00
Alexey Bataev
9c0a0659d4 [SLP]Fix a crash for non-profitable non-schedulable single buildvector node tree, if the threshold allows its vectorization. 2024-03-22 07:44:23 -07:00
Alexey Bataev
3942bd2fb5 [SLP]Fix a crash if the argument of call was affected by minbitwidth
analysis.

Need to support proper type conversion for function arguments to avoid
compiler crash.
2024-03-21 17:06:48 -07:00
Alexey Bataev
8d7a6e2fd8 [SLP]Fix a crash for gather node with instructions from different bbs,
if cost threshold is very low.
2024-03-21 08:03:06 -07:00
Alexey Bataev
34f0a8aaba [SLP]Fix comparison in bitwidth check.
Projected bitwidth should be less than the original, not greater.
2024-03-21 04:24:34 -07:00
Alexey Bataev
04f7cd7f45 [SLP][NFC]Make findBestRootPair() member function constant. 2024-03-20 08:33:47 -07:00
Alexey Bataev
6c1d4454ad
[SLP]Improve minbitwidth analysis for shifts.
Adds improved bitwidth analysis for shl/ashr/lshr instructions. The
analysis is based on similar version in InstCombiner.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84356
2024-03-20 09:07:26 -04:00
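Illustrative sketch (an assumption about the conditions involved, not code from the patch): a left shift by less than the narrow width only needs the low bits of its operand, while a logical right shift pulls high bits down and so needs them to be zero. Function names are hypothetical.

```cpp
#include <cstdint>

// shl evaluated on the operand truncated to i8. Assumes Amt < 8.
uint32_t shlViaI8(uint32_t V, unsigned Amt) {
  return static_cast<uint8_t>(static_cast<uint8_t>(V) << Amt);
}

// lshr evaluated on the operand truncated to i8. Assumes Amt < 8.
uint32_t lshrViaI8(uint32_t V, unsigned Amt) {
  return static_cast<uint8_t>(static_cast<uint8_t>(V) >> Amt);
}
```

`shlViaI8(0x1234, 3)` equals the low 8 bits of `0x1234 << 3`, but `lshrViaI8(0x1234, 3)` does not equal the low 8 bits of `0x1234 >> 3`, because bits above position 7 were shifted into the result. For an operand that fits in 8 bits, such as 0x34, both agree.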
Alexey Bataev
81d9ed605b [SLP]Do extra analysis int minbitwidth if some checks return false.
The instruction itself can be considered good for minbitwidth casting,
even if one of the operand checks returns false.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84363
2024-03-20 05:48:55 -07:00
Alexey Bataev
3a90cb4c18 Revert "[SLP]Do extra analysis int minbitwidth if some checks return false."
This reverts commit da118c93b40f74f6770cf8550903721555d3c97b to fix
crashes reported in https://github.com/llvm/llvm-project/pull/84363.
2024-03-20 05:00:05 -07:00
Nikita Popov
27df1b23e0
[SLPVectorizer] Use TargetFolder (#85800)
Use IRBuilder with TargetFolder in SLPVectorizer to avoid the custom
constant folding code.

This fixes the remaining part of
https://github.com/llvm/llvm-project/issues/61240.
2024-03-20 09:18:45 +01:00
Alexey Bataev
da118c93b4 [SLP]Do extra analysis int minbitwidth if some checks return false.
The instruction itself can be considered good for minbitwidth casting,
even if one of the operand checks returns false.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84363
2024-03-19 12:24:18 -07:00
Alexey Bataev
31eaf86a1e [SLP]Improve minbitwidth analysis.
This improves the overall minbitwidth analysis in SLP. It allows analyzing
trees with store/insertelement root nodes. Also, instead of
using a single minbitwidth, detected at the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for that subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                               results     results0   diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test     92549.00     92609.00   0.1%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test                663381.00    663493.00   0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test                 663381.00    663493.00   0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                             307182.00    307214.00   0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test          1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test           1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test             2040257.00   2040273.00   0.0%

test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test          12396098.00  12395858.00  -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test                       909944.00    909768.00  -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Original Pull Request: https://github.com/llvm/llvm-project/pull/84334

The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch, just has some compile time
improvements + fixes for xxhash unittest, discovered earlier in the
previous version of the patch.

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/84536
2024-03-19 08:19:45 -07:00
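Illustrative sketch of the per-subtree idea in the "[SLP]Improve minbitwidth analysis." entry above: instead of one width for the whole graph, each group of values gets the smallest width that holds all of its members. This models the idea on concrete constants only; the names and the rounding to a power of two with a minimum of 8 are assumptions, not the pass's actual logic.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Smallest N such that V lies in [-2^(N-1), 2^(N-1) - 1].
unsigned signedBitsNeeded(int32_t V) {
  const int64_t W = V;
  unsigned Bits = 1;
  while (W < -(int64_t{1} << (Bits - 1)) || W > (int64_t{1} << (Bits - 1)) - 1)
    ++Bits;
  return Bits;
}

// Width chosen for one group: the widest requirement among its members,
// rounded up to a power of two and at least 8 (assumed policy).
unsigned groupBitWidth(const std::vector<int32_t> &Vals) {
  unsigned Bits = 1;
  for (int32_t V : Vals)
    Bits = std::max(Bits, signedBitsNeeded(V));
  unsigned Width = 8;
  while (Width < Bits)
    Width *= 2;
  return Width;
}
```

A group such as {1, -3, 100} can use 8 bits while a second group {1, 300} needs 16. A single graph-wide width would force both to 16.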
Nikita Popov
94c6ce1de9 [SLPVectorizer] Use IRBuilderBase where possible (NFC)
Instead of hardcoding a specific IRBuilder type, use the base
class.
2024-03-19 14:27:48 +01:00
Alexey Bataev
9a42bdc0ae [SLP][NFC]Fix signedness to avoid comparison warning. 2024-03-15 09:56:40 -07:00
Philip Reames
0674ed753a
[SLP] Compute a shuffle mask for getGatherCost (#85330)
This is the second of a series of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-03-15 08:32:46 -07:00
Philip Reames
45e41f9686
[SLP] Compute a shuffle mask for SK_InsertSubvector (#85408)
This is the third of a series of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.

After this change, there is one SK_InsertSubvector case left. I excluded
it from this patch just because I thought it worthy of individual
attention and review.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-03-15 08:32:18 -07:00
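Illustrative sketch of what an explicit mask for a subvector insertion can look like, using the two-source shufflevector convention where indices at or above the vector length select from the second operand. The helper names are hypothetical, and the exact mask the patch builds may differ.

```cpp
#include <numeric>
#include <vector>

// Identity mask over VF lanes, except lanes [Index, Index + SubVF), which
// take lanes 0..SubVF-1 of the second source (encoded as VF + lane).
std::vector<int> insertSubvectorMask(unsigned VF, unsigned Index,
                                     unsigned SubVF) {
  std::vector<int> Mask(VF);
  std::iota(Mask.begin(), Mask.end(), 0);
  std::iota(Mask.begin() + Index, Mask.begin() + Index + SubVF,
            static_cast<int>(VF));
  return Mask;
}

// Two-source shuffle: mask values below A.size() index A, the rest index B.
std::vector<int> shuffleTwoSources(const std::vector<int> &A,
                                   const std::vector<int> &B,
                                   const std::vector<int> &Mask) {
  const int N = static_cast<int>(A.size());
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int Idx : Mask)
    Out.push_back(Idx < N ? A[Idx] : B[Idx - N]);
  return Out;
}
```

For VF = 8, Index = 4 and SubVF = 2 the mask is {0, 1, 2, 3, 8, 9, 6, 7}: lanes 4 and 5 come from the (widened) subvector, everything else from the original vector.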
Philip Reames
f337525ee8
[SLP] Compute a shuffle mask for SK_Broadcast shuffle (#85327)
This is the first of a couple of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-03-15 07:41:26 -07:00
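Illustrative sketch of an explicit broadcast mask: every output lane selects lane 0 of the source. `broadcastMask` and `applyMask` are hypothetical helpers used only to show the mask's effect.

```cpp
#include <vector>

// Mask with VF entries, all selecting source lane 0.
std::vector<int> broadcastMask(unsigned VF) { return std::vector<int>(VF, 0); }

// Single-source shuffle: output lane I takes Src[Mask[I]].
std::vector<int> applyMask(const std::vector<int> &Src,
                           const std::vector<int> &Mask) {
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int Idx : Mask)
    Out.push_back(Src[Idx]);
  return Out;
}
```

Applying `broadcastMask(4)` to {7, 8, 9, 10} gives {7, 7, 7, 7}.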
Alexey Bataev
3789870758 Revert "[SLP]Improve minbitwidth analysis."
This reverts commit 7f2167868d8c1cedd3915883412b9c787a2f01db to fix
issues reported in https://github.com/llvm/llvm-project/pull/84536.
2024-03-15 03:59:48 -07:00
Alexey Bataev
dbbe2fe2a2 Revert "[SLP]Do extra analysis int minbitwidth if some checks return false."
This reverts commit e4b772444c8176abe30d364e4a946ee6c8ae8de4 to fix the
issues reported in https://github.com/llvm/llvm-project/pull/84536.
2024-03-15 03:58:34 -07:00