llvm-project

Author	SHA1	Message	Date
Alexey Bataev	2b00a73f62	[SLP]Buildvector for alternate instructions with non-profitable gather operands. If the operands of the potentially alternate node are going to produce buildvector sequences that result in more instructions than the original code, then rather than vectorizing such instructions as an alternate node, it is better to end up with the buildvector node. Left column - experimental, right - reference. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 413680.00 416272.00 0.6% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664901.00 664949.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664901.00 664949.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1171371.00 1171355.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1036396.00 1036284.00 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111280.00 111248.00 -0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1392113.00 1391361.00 -0.1% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1392113.00 1391361.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 281676.00 281452.00 -0.1% test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 3025.00 3019.00 -0.2% test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6351.00 6335.00 -0.3% Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 15.00 16.00 6.7% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1703.00 1707.00 0.2% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1703.00 1707.00 0.2% test-suite ::
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00 26239.00 -0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00 11754.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 824.00 822.00 -0.2% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 5668.00 5654.00 -0.2% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 5668.00 5654.00 -0.2% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 792.00 790.00 -0.3% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 792.00 790.00 -0.3% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1389.00 1384.00 -0.4% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 596.00 590.00 -1.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6.00 5.00 -16.7% Metric: exec_time Program exec_time results results0 diff test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 99.14 100.00 0.9% Other changes are not significant (less than 0.1%, or with exec_time less than 5 secs). SingleSource/Benchmarks/Adobe-C++/loop_unroll - some small patterns remain scalar, smaller code. External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small changes, some extra stores get vectorized. External/SPEC/CINT2017speed/625.x264_s/625.x264_s External/SPEC/CINT2017rate/525.x264_r/525.x264_r x264 has one change in a loop body, in function ssim_end4: some code remains scalar, resulting in smaller code size. External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code gets vectorized, looks like some other patterns were matched. MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were vectorized (looks like the graphs became profitable). MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small changes in vectorized code (some small part remains scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s Many changes are caused by the fact that the code of one function (onvertLCHabToRGB) becomes smaller and this function gets inlined after that. MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small changes here and there, some extra code is vectorized, some remains scalar (2 x vectors). MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars + 2 insertelems instead of insert, broadcast, alt code (3 instructions, total 5 insts). MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - a small graph becomes profitable and gets vectorized. External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s Some small graphs become profitable and get vectorized. MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code. Reviewers: RKSimon, dtcxzyw Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84978	2024-04-10 14:33:56 -04:00
Alexey Bataev	6ca5a410d2	[SLP]Fix PR87358: broken module, Instruction does not dominate all uses. If the first node is a gather node with extractelement instructions, still need to put the vector value after all instructions, not after the very first one.	2024-04-10 08:24:15 -07:00
Alexey Bataev	910d2de357	[SLP]Fix PR88103: consider the sign of the compare for non-negative operands. Need to improve detection of the number of bits required for the operand before doing a reduction. If the instruction is an incoming operand of a signed compare, need to consider adding an extra bit for signedness.	2024-04-09 10:47:47 -07:00
Alexey Bataev	4dfc55f7e7	[SLP][NFC]Add a test for PR88103, where zext is incoming to signed comparison.	2024-04-09 10:39:25 -07:00
Alexey Bataev	e8e67957fa	[SLP]Fix PR88123: use vectorized operands consistently. Need to use vectorized operands, not the vecop of the extractelement instructions, to avoid false detection of the extra vector operand in the extractelements shuffling.	2024-04-09 08:42:57 -07:00
Alexey Bataev	01d9528ef9	[SLP]Improve final minbitwidth analysis attempt. Added a demanded bits analysis step to IsPotentiallyTruncated to improve the final minbitwidth analysis attempts. Metric: size..text Program size..text results results0 diff test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 43069.00 42973.00 -0.2% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 43066.00 42970.00 -0.2% Extra trunc instructions are emitted to operate with <32 x i8> instead of <32 x i16>; they will be removed in the next patches. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87786	2024-04-08 15:54:30 -04:00
Alexey Bataev	4a1c53f9fa	[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics. https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/86135	2024-04-08 08:32:35 -07:00
Alexey Bataev	a612524197	[SLP]Fix the cost of the reduction result to the final type. Need to fix the way the cost is calculated; otherwise the wrong cast opcode can be selected, leading to an over-optimistic vector cost. Also need to take the reduction type size into account. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87528	2024-04-07 09:51:47 -04:00
Martin Storsjö	bd9486b4ec	Revert "[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics." This reverts commit 66b528078e4852412769375e35d2a672bf36a0ec. This commit caused miscompilations, breaking tests in the libyuv testsuite - see https://github.com/llvm/llvm-project/pull/86135#issuecomment-2041049709 for more details.	2024-04-06 23:53:26 +03:00
Alexey Bataev	66b528078e	[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics. https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/86135	2024-04-05 14:29:26 -04:00
Alexey Bataev	8a0bfe4905	[SLP]Fix PR87630: wrong result for externally used vector value. Need to check that the externally used value can be represented with the BitWidth before applying it, otherwise need to keep wider type.	2024-04-04 12:03:28 -07:00
Alexey Bataev	5ae143da45	[SLP]Add a test with the incorrect casting for external user, NFC.	2024-04-04 11:25:52 -07:00
Alexey Bataev	42cbceb0f0	[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions. Compiler can improve analysis for operands of UIToFP/SIToFP instructions and operands of ICmp instruction. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85966	2024-04-03 14:18:45 -07:00
Alexey Bataev	fa2bbea14d	Revert "[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions." This reverts commit 899855d2b11856a44e530fffe854d76be69b9008 to fix the issue reported in https://lab.llvm.org/buildbot/#/builders/165/builds/51659.	2024-04-03 13:10:16 -07:00
Alexey Bataev	899855d2b1	[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions. Compiler can improve analysis for operands of UIToFP/SIToFP instructions and operands of ICmp instruction. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85966	2024-04-03 15:58:58 -04:00
Alexey Bataev	d57884011e	[SLP]Add support for commutative intrinsics. Implemented long-standing TODO to support commutative intrinsics. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/86316	2024-04-03 14:28:36 -04:00
Alexey Bataev	cd29126b63	[SLP]Fix PR87133: crash because of different altopcodes for cmps after reordering. If the node has cmp instruction with 3 or more different but swappable predicates, need to keep same kind of main/alternate opcodes to avoid incorrect detection of opcodes after reordering. Reordering changes the order and we may erroneously consider swappable opcodes as non-compatible/alternate, which may lead to a later compiler crash. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87267	2024-04-03 13:47:50 -04:00
Alexey Bataev	6b7b18a1a7	[SLP]Fix PR87329: crash on alternate cast vectorization. Need to fix the analysis for the alternate instructions, based on int extension operations. If the alternate extension node is resized, but not the operand, need to resize the node and not shuffle the final result, since we end up with only a trunc instruction.	2024-04-02 08:19:29 -07:00
Alexey Bataev	41afef9066	[SLP]Fix PR87011: Missing sign extension of demoted type before zero extension. Need to drop the skipping of the first zext/sext nodes; it leads to incorrect and less profitable code.	2024-04-01 06:07:18 -07:00
Alexey Bataev	d7975c9d93	[SLP]Add better minbitwidth analysis for udiv/urem instructions. Adds improved bitwidth analysis for udiv/urem instructions. The analysis is based on a similar version in InstCombiner. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85928	2024-03-28 10:35:15 -04:00
Alexey Bataev	b43ec8e62b	[SLP]Fix PR86798: handle phi nodes being trunced, but not its operands. If the phi node is trunced, but not its operand(s), need to handle this situation in the assertion; the code already does the right transformation.	2024-03-27 07:21:45 -07:00
Alexey Bataev	342f7d0d35	[SLP]Fix PR86620: check final minbitwidth for truncs/exts before accepting it. If the minbitwidth is deduced from the demanded elements, need to check the final bitwidth for trunc/ext instruction, not blindly accept the used one.	2024-03-26 11:27:17 -07:00
Alexey Bataev	26dd12871c	[SLP]Do not propagate nuw/nsw flags for alt nodes, affected by minbitwidth analysis. Need to drop nuw/nsw flags, if the alternate node is resized after the minbitwidth analysis, to avoid producing poison values in corner cases.	2024-03-26 10:24:09 -07:00
Alexey Bataev	3942bd2fb5	[SLP]Fix a crash if the argument of call was affected by minbitwidth analysis. Need to support proper type conversion for function arguments to avoid compiler crash.	2024-03-21 17:06:48 -07:00
Alexey Bataev	8d7a6e2fd8	[SLP]Fix a crash for gather node with instructions from different bbs, if cost threshold is very low.	2024-03-21 08:03:06 -07:00
Alexey Bataev	aa4cbaba1d	[SLP][NFC]Add a test with @llvm.abs nodes, which can be analyzed for better bitwidth.	2024-03-21 06:16:18 -07:00
Alexey Bataev	55c82f149c	[SLP][NFC]Add a test with arguments of functions, reduced by minbitwidth analysis.	2024-03-20 08:08:01 -07:00
Alexey Bataev	6c1d4454ad	[SLP]Improve minbitwidth analysis for shifts. Adds improved bitwidth analysis for shl/ashr/lshr instructions. The analysis is based on a similar version in InstCombiner. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84356	2024-03-20 09:07:26 -04:00
Alexey Bataev	81d9ed605b	[SLP]Do extra analysis in minbitwidth if some checks return false. The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84363	2024-03-20 05:48:55 -07:00
Alexey Bataev	31eaf86a1e	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-19 08:19:45 -07:00
Alexey Bataev	44c579f5b5	[SLP][NFC]Add a test with minbitwidth operand, but not a user.	2024-03-18 10:20:58 -07:00
Alexey Bataev	39a96bc7b2	[SLP][NFC]Add a test for minbitwidth analysis of icmp, being transformed to trunc.	2024-03-15 09:52:33 -07:00
Alexey Bataev	3789870758	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 7f2167868d8c1cedd3915883412b9c787a2f01db to fix issues reported in https://github.com/llvm/llvm-project/pull/84536.	2024-03-15 03:59:48 -07:00
Alexey Bataev	ad0de4e6e6	[SLP][NFC]Add extra test for udiv/urem instructions.	2024-03-14 17:09:24 -07:00
Philip Reames	33960c9025	Regen some tests to reflect naming changes. Cutting down on diff in an upcoming change.	2024-03-14 13:06:30 -07:00
Alexey Bataev	7f2167868d	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-14 06:23:14 -07:00
Alexey Bataev	4dd186afd5	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 13:59:58 -07:00
Alexey Bataev	b966b224b3	Revert "[SLP]Fix PR85082: PHI node has multiple entries." This reverts commit 8237520eb42b37d7ed353d64a865d3ba5ac24ec6 to fix a crash in https://lab.llvm.org/buildbot/#/builders/198/builds/8891.	2024-03-13 13:38:58 -07:00
Alexey Bataev	8237520eb4	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 12:55:24 -07:00
Alexey Bataev	b77c079987	Revert "[SLP]Fix PR85082: PHI node has multiple entries." This reverts commit 59ff907fc14aa2d02e57b4af4140949d4f8caca1 to fix crash revealed in https://lab.llvm.org/buildbot/#/builders/198/builds/8881	2024-03-13 12:10:40 -07:00
Alexey Bataev	59ff907fc1	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 08:14:10 -07:00
Martin Storsjö	5b5c21d772	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 2bd369b48dbf0bc3128becb7ef8f8a1b82514b87. That commit triggered failed assertions: $ cat repro.c short *a; int b; void h() { short *c = a; b = 0; for (; b < 4; b++) { unsigned d = a[b] + a[b + 4 * 2], e = a[b] - a[b + 4 * 2], f = (a[b + 4] >> 1) - a[b + 4 * 3], g = a[b + 4] + (a[b + 4 * 3] >> 1); c[b] = g; c[b + 4] = e + f; c[b + 4 * 2] = e - f; c[b + 4 * 3] = d - g; } } $ clang -target aarch64-linux-gnu -c -O2 repro.c clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:12503: llvm::Value* llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool): Assertion `(MinBWs.contains(getOperandEntry(E, 0)) || MinBWs.contains(getOperandEntry(E, 1))) && "Expected item in MinBWs."' failed.	2024-03-09 13:53:13 +02:00
Alexey Bataev	2bd369b48d	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-08 13:57:02 -05:00
Alexey Bataev	11185715a2	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 4ce52e2d576937fe930294cae883a0daa17eeced to fix issues detected by https://lab.llvm.org/buildbot/#/builders/74/builds/26470/steps/12/logs/stdio.	2024-03-07 12:44:53 -08:00
Alexey Bataev	904a6aedca	[SLP][NFC]Add lshr version of the test with casting, NFC.	2024-03-07 08:31:58 -08:00
Alexey Bataev	4ce52e2d57	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Reviewers: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84334	2024-03-07 10:36:41 -05:00
Alexey Bataev	aae152f1be	Revert "[SLP]Improve minbitwidth analysis." This reverts commit a730ed7c1a4a35f5219df720ffb0ba6122d64fe4 to fix compile time issue.	2024-03-05 12:13:45 -08:00
Alexey Bataev	a730ed7c1a	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/78976	2024-03-05 12:20:28 -05:00
Alexey Bataev	3a30d8e9e5	[SLP]Check if masked gather can be emitted as a series of loads/insert subvector. Masked gather is a very expensive operation, and it is sometimes better to represent it as a series of consecutive/strided loads + insertsubvector sequences. The patch adds some basic estimation and, if loads+insertsubvector is cheaper, decides to represent it this way rather than as a masked gather. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83481	2024-03-01 13:49:29 -05:00
Alexey Bataev	df0fd3a80e	[SLP]Try to vectorize small graph with extractelements, used in buildvector. If the graph includes only a single "gather" node with only extractelements/undefs, which are used only in insertelement-based buildvector sequences, it still might be profitable to vectorize it. Need to rely on the cost model, not throw this graph away immediately. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83581	2024-03-01 12:48:45 -05:00
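The signedness rule described in 910d2de357 (PR88103) can be illustrated with a small sketch. It assumes the operand is known to be non-negative and that its maximum value is known exactly. `active_bits` and `min_bitwidth` are hypothetical helpers for illustration only, not SLPVectorizer functions, and the sketch does not round the width up to a power of two the way a real vectorizer may.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to hold max_value as an
   unsigned number (at least 1, so that zero still gets a width). */
static unsigned active_bits(uint64_t max_value) {
  unsigned bits = 0;
  while (max_value != 0) {
    ++bits;
    max_value >>= 1;
  }
  return bits == 0 ? 1 : bits;
}

/* Hypothetical helper: minimum width for a known non-negative operand.
   A signed compare reads the top bit of the narrowed value as a sign bit,
   so one extra bit is reserved to keep that bit clear. */
static unsigned min_bitwidth(uint64_t max_value, bool feeds_signed_cmp) {
  unsigned bits = active_bits(max_value);
  return feeds_signed_cmp ? bits + 1 : bits;
}
```

For example, a value produced by `zext i8 %x to i32` can be as large as 255. `min_bitwidth(255, false)` gives 8, but `min_bitwidth(255, true)` gives 9, because 255 truncated to 8 bits has its top bit set and would compare as -1 in a signed `icmp`.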

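The udiv/urem narrowing from d7975c9d93 says the analysis is based on a similar version in InstCombiner. As far as we understand, the core condition is that both operands have zero high bits above the target width: the quotient never exceeds the dividend and the remainder is smaller than the divisor, so both results fit as well. The sketch below checks this on concrete values rather than on known bits. `fits_in_bits`, `udiv_urem_can_narrow`, and `udiv_urem_i16` are hypothetical names, and the divisor is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v has no set bits at or above `bits`. */
static bool fits_in_bits(uint32_t v, unsigned bits) {
  return bits >= 32 || (v >> bits) == 0;
}

/* Hypothetical check: if both operands fit in `bits` unsigned bits, the
   quotient (<= dividend) and the remainder (< divisor) fit too, so the
   operation can run at the narrow width and be zero-extended back. */
static bool udiv_urem_can_narrow(uint32_t dividend, uint32_t divisor,
                                 unsigned bits) {
  return fits_in_bits(dividend, bits) && fits_in_bits(divisor, bits);
}

/* Evaluate a / b and a % b at 16 bits, then widen the results to 32 bits.
   Assumes b != 0 and that udiv_urem_can_narrow(a, b, 16) holds. */
static void udiv_urem_i16(uint32_t a, uint32_t b, uint32_t *q, uint32_t *r) {
  uint16_t na = (uint16_t)a;
  uint16_t nb = (uint16_t)b;
  *q = (uint32_t)(uint16_t)(na / nb);
  *r = (uint32_t)(uint16_t)(na % nb);
}
```

When `udiv_urem_can_narrow(a, b, 16)` holds, `udiv_urem_i16` produces the same quotient and remainder as the 32-bit `a / b` and `a % b`; when either operand has bits set above bit 15, truncating it would change the result, so the check fails.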

1362 Commits