Fix the way the cost is calculated; otherwise the wrong cast opcode can
be selected, leading to an over-optimistic vector cost. Also take the
reduction type size into account.
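As a rough illustration of why the widths matter for the opcode choice, the sketch below picks a cast kind from the source and destination bit widths. The enum `CastKind` and the helper `selectCastKind` are hypothetical names used only for this example; they are not the SLP vectorizer's code.

```cpp
#include <cassert>

// Hypothetical cast kinds; stand-ins for the IR cast opcodes.
enum class CastKind { None, Trunc, SExt, ZExt };

// Pick the cast needed to go from SrcBits to DstBits.
// IsSigned tells whether a widening cast must preserve the sign.
constexpr CastKind selectCastKind(unsigned SrcBits, unsigned DstBits,
                                  bool IsSigned) {
  return DstBits < SrcBits   ? CastKind::Trunc
         : DstBits > SrcBits ? (IsSigned ? CastKind::SExt : CastKind::ZExt)
                             : CastKind::None;
}

// Narrowing always maps to a truncation, regardless of signedness.
static_assert(selectCastKind(32, 8, true) == CastKind::Trunc,
              "narrowing must select trunc");
```

If the wrong pair of widths is compared, an extension can be costed as a truncation (or as no cast at all), which is the kind of mismatch that makes the vector cost look cheaper than it is.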
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/87528
We started seeing a crash after 8a0bfe490592de3df28d82c5dd69956e43c20f1d:
the user could be scalable, meaning its type size is scalable and an
implicit conversion to uint64_t could be performed. Protect against that
by making sure the user's type is not scalable.
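A minimal sketch of the guard, assuming a simplified size type. `SizeInBits` and `getFixedBits` are hypothetical stand-ins written for this example, not LLVM classes or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a type size: for a scalable type the real size is
// MinValue multiplied by a runtime constant, so no single integer exists.
struct SizeInBits {
  uint64_t MinValue;
  bool Scalable;
};

// Writes the exact bit count to Out and returns true only for fixed sizes.
// Scalable sizes are rejected instead of being silently converted.
inline bool getFixedBits(SizeInBits S, uint64_t &Out) {
  if (S.Scalable)
    return false;
  Out = S.MinValue;
  return true;
}
```

A caller checks the return value first and skips the analysis for scalable users rather than reading a plain integer out of the size.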
The compiler can improve the analysis for operands of UIToFP/SIToFP
instructions and for operands of ICmp instructions.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/85966
If a node has cmp instructions with 3 or more different but swappable
predicates, the same kind of main/alternate opcodes must be kept to
avoid incorrect detection of opcodes after reordering. Reordering
changes the order, and swappable opcodes may be erroneously considered
non-compatible/alternate, which may lead to a later compiler crash.
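To show what "swappable" means here, the sketch below treats two predicates as the same kind if they match directly or after exchanging the operands. The enum `Pred` and the helpers `swapped`, `sameOrSwapped`, and `eval` are hypothetical and cover only four signed predicates.

```cpp
#include <cassert>

// Hypothetical subset of integer compare predicates.
enum class Pred { SLT, SGT, SLE, SGE };

// Predicate that yields the same result once the operands are exchanged.
constexpr Pred swapped(Pred P) {
  return P == Pred::SLT   ? Pred::SGT
         : P == Pred::SGT ? Pred::SLT
         : P == Pred::SLE ? Pred::SGE
                          : Pred::SLE;
}

// Two compares are the same kind if they match directly or after a swap.
constexpr bool sameOrSwapped(Pred A, Pred B) {
  return A == B || swapped(A) == B;
}

// Evaluates a predicate on concrete values.
constexpr bool eval(Pred P, int L, int R) {
  return P == Pred::SLT   ? L < R
         : P == Pred::SGT ? L > R
         : P == Pred::SLE ? L <= R
                          : L >= R;
}
```

With this classification `a < b` and `b > a` land in one group, while `a < b` and `a <= b` stay main and alternate no matter how the operands are ordered.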
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/87267
If there are mixed extractelement instructions, both fixed and scalable,
check that the compiler estimates the cost for the fixed vector
extractelement, not the scalable one, to avoid a compiler crash.
Handle the case where the resize operation itself is not reduced but its
operand is. In this case an extra analysis is needed for the operand,
not for the instruction itself.
Fix the analysis for alternate instructions based on int extension
operations. If the alternate extension node is resized, but not the
operand, resize the node and do not shuffle the final result; we end up
with only a trunc instruction.
Check that the initial ext/trunc nodes can be safely represented using
the calculated bitwidth before applying it.
Adds improved bitwidth analysis for udiv/urem instructions. The
analysis is based on a similar version in InstCombiner.
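The underlying arithmetic fact can be checked directly: if both operands fit in the narrow type, the narrow unsigned division and remainder equal the wide ones. A minimal sketch, assuming 32-bit values, a 16-bit target and a non-zero divisor; `fitsIn16`, `udivMaybeNarrow` and `uremMaybeNarrow` are hypothetical helpers, not the InstCombiner or SLP code.

```cpp
#include <cassert>
#include <cstdint>

// True if the upper 16 bits of V are known to be zero.
inline bool fitsIn16(uint32_t V) { return (V >> 16) == 0; }

// Computes A / B in 16 bits when both operands fit, else in 32 bits.
// B must be non-zero.
inline uint32_t udivMaybeNarrow(uint32_t A, uint32_t B) {
  if (fitsIn16(A) && fitsIn16(B))
    return static_cast<uint16_t>(static_cast<uint16_t>(A) /
                                 static_cast<uint16_t>(B));
  return A / B;
}

// Same idea for the remainder. B must be non-zero.
inline uint32_t uremMaybeNarrow(uint32_t A, uint32_t B) {
  if (fitsIn16(A) && fitsIn16(B))
    return static_cast<uint16_t>(static_cast<uint16_t>(A) %
                                 static_cast<uint16_t>(B));
  return A % B;
}
```

Unlike add or mul, the low bits of a division depend on the high bits of the operands, so the check on both operands is what makes the narrowing safe.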
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/85928
Adjust ReductionBitWidth after the minbitwidth analysis if the demanded
bits analysis shows that its size is less than the size of the
vectorized value. This prevents an incorrect sign/zero extension
transformation afterwards.
If the minbitwidth is deduced from the demanded elements, check the
final bitwidth for the trunc/ext instruction instead of blindly
accepting the used one.
Drop the nuw/nsw flags if the alternate node is resized after the
minbitwidth analysis, to avoid producing poison values in corner cases.
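A flag that holds at the original width may no longer hold once the operation is narrowed. The sketch below checks whether an unsigned add wraps at a given width; `unsignedAddWraps` is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstdint>

// True if A + B wraps when computed as an unsigned add that is Bits wide.
// Bits must be in the range 1..32; operands are first reduced to that width.
inline bool unsignedAddWraps(uint32_t A, uint32_t B, unsigned Bits) {
  const uint64_t Mask = (uint64_t{1} << Bits) - 1;
  const uint64_t Sum = (A & Mask) + (B & Mask);
  return Sum > Mask;
}
```

For example, 200 + 100 does not wrap in 32 bits, so a nuw flag is valid there, but the same add wraps in 8 bits. Keeping nuw on the narrowed instruction would turn the result into poison.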
Adds improved bitwidth analysis for shl/ashr/lshr instructions. The
analysis is based on a similar version in InstCombiner.
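The two shift directions behave differently under narrowing, which a small check makes visible. A sketch assuming 32-bit values, an 8-bit target and shift amounts below 8; `shlNarrowMatches` and `lshrNarrowMatches` are hypothetical helpers, not the InstCombiner or SLP code.

```cpp
#include <cassert>
#include <cstdint>

// Compares the low 8 bits of a 32-bit shl with the same shl done in 8 bits.
// S must be less than 8.
inline bool shlNarrowMatches(uint32_t X, unsigned S) {
  const uint8_t Wide = static_cast<uint8_t>(X << S);
  const uint8_t Narrow = static_cast<uint8_t>(static_cast<uint8_t>(X) << S);
  return Wide == Narrow;
}

// Same comparison for lshr. High bits of X are shifted into the low byte,
// so the results agree only if the relevant upper bits of X are zero.
// S must be less than 8.
inline bool lshrNarrowMatches(uint32_t X, unsigned S) {
  const uint8_t Wide = static_cast<uint8_t>(X >> S);
  const uint8_t Narrow = static_cast<uint8_t>(static_cast<uint8_t>(X) >> S);
  return Wide == Narrow;
}
```

A left shift only moves bits upward, so the low byte never depends on the dropped bits. A right shift needs extra knowledge about the upper bits of the operand before it can be narrowed.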
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/84356
The instruction itself can be considered good for minbitwidth casting,
even if one of the operand checks returns false.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/84363
This improves the overall minbitwidth analysis in SLP. It allows
analyzing trees with store/insertelement root nodes. Also, instead of
using a single minbitwidth detected at the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and uses it for that subtree.
This results in better code and less vector register pressure.
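The benefit of a per-subtree width over a single global one can be shown with a toy model. The sketch works on concrete unsigned values rather than on demanded bits of IR, and `minUnsignedBits` and `minBitsForGroup` are hypothetical helpers, so it only illustrates the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest number of bits that holds V as an unsigned value (at least 1).
inline unsigned minUnsignedBits(uint32_t V) {
  unsigned Bits = 1;
  while (Bits < 32 && (V >> Bits) != 0)
    ++Bits;
  return Bits;
}

// Smallest width that holds every value of one group (one "subtree").
inline unsigned minBitsForGroup(const std::vector<uint32_t> &Vals) {
  unsigned Bits = 1;
  for (uint32_t V : Vals)
    Bits = std::max(Bits, minUnsignedBits(V));
  return Bits;
}
```

With groups {1, 2, 3} and {300, 5}, a single width for the whole graph would be 9 bits, while the per-group widths are 2 and 9. The first group can then use a narrower element type.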
Metric: size..text
Program                                                                           size..text
                                                                                      results     results0   diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test     92549.00     92609.00   0.1%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test                663381.00    663493.00   0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test                 663381.00    663493.00   0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                             307182.00    307214.00   0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test          1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test           1394420.00   1394484.00   0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test             2040257.00   2040273.00   0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test          12396098.00  12395858.00  -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test                       909944.00    909768.00  -0.0%
SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using
<16 x i16> instead of <16 x i32>, and zext/trunc are removed. In other
places the last vector zext/sext is removed and replaced by an
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>.
Spec2017/imagick - removed an extra zext from 2 packs of the operations.
Spec2017/parest - removed an extra zext, replaced by extractelement +
scalar zext.
Spec2017/blender - a whole bunch of vector zext/sext replaced by
extractelement + scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.
Original Pull Request: https://github.com/llvm/llvm-project/pull/84334
The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch; it just adds some compile time
improvements and fixes for the xxhash unittest, discovered earlier in
the previous version of the patch.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/84536
This is the second of a series of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
This is the third of a series of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.
After this change, there is one SK_InsertSubvector case left. I excluded
it from this patch just because I thought it worthy of individual
attention and review.
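For reference, an explicit mask for an insert-subvector style shuffle could be built as in the sketch below. It uses the two-source convention where lanes 0..NumElts-1 name the first vector and lanes from NumElts upward name the second. `buildInsertSubvectorMask` is a hypothetical helper and is not part of these patches.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Builds a two-source shuffle mask that models inserting a subvector of
// SubElts lanes at lane Index of a vector with NumElts lanes.
// Requires Index + SubElts <= NumElts.
inline std::vector<int> buildInsertSubvectorMask(unsigned NumElts,
                                                 unsigned SubElts,
                                                 unsigned Index) {
  std::vector<int> Mask(NumElts);
  // Start from the identity mask: every lane comes from the first source.
  std::iota(Mask.begin(), Mask.end(), 0);
  // Lanes covered by the subvector come from the second source instead.
  std::iota(Mask.begin() + Index, Mask.begin() + Index + SubElts,
            static_cast<int>(NumElts));
  return Mask;
}
```

For NumElts = 8, SubElts = 2 and Index = 4 this gives the mask 0, 1, 2, 3, 8, 9, 6, 7.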
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
This is the first of a couple of small patches to compute shuffle masks
for the couple of cases where we call getShuffleCost without one. My
goal is to add an invariant that all calls to getShuffleCost for fixed
length vectors have a mask.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>