llvm-project

Author	SHA1	Message	Date
Alexey Bataev	31eaf86a1e	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-19 08:19:45 -07:00
Nikita Popov	94c6ce1de9	[SLPVectorizer] Use IRBuilderBase where possible (NFC) Instead of hardcoding a specific IRBuilder type, use the base class.	2024-03-19 14:27:48 +01:00
Alexey Bataev	9a42bdc0ae	[SLP][NFC]Fix signedness to avoid comparison warning.	2024-03-15 09:56:40 -07:00
Philip Reames	0674ed753a	[SLP] Compute a shuffle mask for getGatherCost (#85330 ) This is the second of a series of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-03-15 08:32:46 -07:00
Philip Reames	45e41f9686	[SLP] Compute a shuffle mask for SK_InsertSubvector (#85408 ) This is the third of a series of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask. After this change, there is one SK_InsertSubvector case left. I excluded it from this patch just because I thought it worthy of individual attention and review. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-03-15 08:32:18 -07:00
Philip Reames	f337525ee8	[SLP] Compute a shuffle mask for SK_Broadcast shuffle (#85327 ) This is the first of a couple of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-03-15 07:41:26 -07:00
Alexey Bataev	3789870758	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 7f2167868d8c1cedd3915883412b9c787a2f01db to fix issues reported in https://github.com/llvm/llvm-project/pull/84536.	2024-03-15 03:59:48 -07:00
Alexey Bataev	dbbe2fe2a2	Revert "[SLP]Do extra analysis int minbitwidth if some checks return false." This reverts commit e4b772444c8176abe30d364e4a946ee6c8ae8de4 to fixx the issues reported in https://github.com/llvm/llvm-project/pull/84536.	2024-03-15 03:58:34 -07:00
Alexey Bataev	7567f5ba78	Revert "[SLP]Do extra analysis int minbitwidth if some checks return false." This reverts commit ea429e19f56005bf89e717c14efdf49ec055b183 to fix issues reported in https://github.com/llvm/llvm-project/pull/84536#issuecomment-1999295445.	2024-03-15 03:52:58 -07:00
Artem Tyurin	141145232f	[IRBuilder] Fold binary intrinsics (#80743 ) Fixes https://github.com/llvm/llvm-project/issues/61240.	2024-03-15 09:58:25 +01:00
Alexey Bataev	e4b772444c	[SLP]Do extra analysis int minbitwidth if some checks return false. The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84363	2024-03-14 16:41:04 -07:00
Alexey Bataev	ea41ac1132	[SLP][NFC]Fix a warning for comparison of integers of different signs.	2024-03-14 16:06:08 -07:00
Alexey Bataev	5b303a98a8	Revert "[SLP]Do extra analysis int minbitwidth if some checks return false." This reverts commit ea429e19f56005bf89e717c14efdf49ec055b183 to fix issues revealed in https://lab.llvm.org/buildbot/#/builders/186/builds/15299 and https://lab.llvm.org/buildbot/#/builders/238/builds/8426.	2024-03-14 15:48:26 -07:00
Alexey Bataev	ea429e19f5	[SLP]Do extra analysis int minbitwidth if some checks return false. The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84363	2024-03-14 16:16:18 -04:00
Paschalis Mpeis	f795d1a8b1	[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem (#82488 ) getArithmeticInstrCost is used by both LoopVectorizer and SLPVectorizer to compute the cost of frem, which becomes a call cost on AArch64 when TLI has a vector library function. Add tests that do SLP vectorization for code that contains 2x double and 4x float frem instructions.	2024-03-14 17:20:29 +00:00
Alexey Bataev	7f2167868d	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-14 06:23:14 -07:00
Alexey Bataev	4dd186afd5	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 13:59:58 -07:00
Alexey Bataev	b966b224b3	Revert "[SLP]Fix PR85082: PHI node has multiple entries." This reverts commit 8237520eb42b37d7ed353d64a865d3ba5ac24ec6 to fix a crash in https://lab.llvm.org/buildbot/#/builders/198/builds/8891.	2024-03-13 13:38:58 -07:00
Alexey Bataev	8237520eb4	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 12:55:24 -07:00
Alexey Bataev	b77c079987	Revert "[SLP]Fix PR85082: PHI node has multiple entries." This reverts commit 59ff907fc14aa2d02e57b4af4140949d4f8caca1 to fix crash revealed in https://lab.llvm.org/buildbot/#/builders/198/builds/8881	2024-03-13 12:10:40 -07:00
Alexey Bataev	59ff907fc1	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 08:14:10 -07:00
David Blaikie	9ac0315898	Add comment to assert from a843f26	2024-03-12 18:28:30 +00:00
David Blaikie	a843f26a77	[NFC] SLVectorizer comparator refactoring that preserves behavior (#84966 ) Spinning off from #79321 / 35f4592 - looked like the comparator could be simplified & made more clear/less risk of leaving hidden bugs.	2024-03-12 11:25:12 -07:00
Justin Lebar	fab2bb8bfd	Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678 ) For some reason this was missing from STLExtras.	2024-03-10 20:00:13 -07:00
Martin Storsjö	5b5c21d772	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 2bd369b48dbf0bc3128becb7ef8f8a1b82514b87. That commit triggered failed assertions: $ cat repro.c short *a; int b; void h() { short *c = a; b = 0; for (; b < 4; b++) { unsigned d = a[b] + a[b + 4 * 2], e = a[b] - a[b + 4 * 2], f = (a[b + 4] >> 1) - a[b + 4 * 3], g = a[b + 4] + (a[b + 4 * 3] >> 1); c[b] = g; c[b + 4] = e + f; c[b + 4 * 2] = e - f; c[b + 4 * 3] = d - g; } } $ clang -target aarch64-linux-gnu -c -O2 repro.c clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:12503: llvm::Value* llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool): Assertion `(MinBWs.contains(getOperandEntry(E, 0)) || MinBWs.contains(getOperandEntry(E, 1))) && "Expected item in MinBWs."' failed.	2024-03-09 13:53:13 +02:00
Alexey Bataev	2bd369b48d	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-08 13:57:02 -05:00
Alexey Bataev	11185715a2	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 4ce52e2d576937fe930294cae883a0daa17eeced to fix issues detected by https://lab.llvm.org/buildbot/#/builders/74/builds/26470/steps/12/logs/stdio.	2024-03-07 12:44:53 -08:00
Alexey Bataev	4ce52e2d57	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Reviewers: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84334	2024-03-07 10:36:41 -05:00
Alexey Bataev	aae152f1be	Revert "[SLP]Improve minbitwidth analysis." This reverts commit a730ed7c1a4a35f5219df720ffb0ba6122d64fe4 to fix compile time issue.	2024-03-05 12:13:45 -08:00
Alexey Bataev	c8b3edcddd	Revert "[SLP][NFC]Use TargetTransformInfo:: instead of TTI:: in BoUpSLP to avoid" This reverts commit 083d8aa03aca55b88098a91e41e41a8e321a5721.	2024-03-05 12:13:44 -08:00
Alexey Bataev	083d8aa03a	[SLP][NFC]Use TargetTransformInfo:: instead of TTI:: in BoUpSLP to avoid some compilers confusion.	2024-03-05 10:24:06 -08:00
Alexey Bataev	a730ed7c1a	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0% SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8> Spec2017/imagick - Removed extra zext from 2 packs of the operations. 
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/78976	2024-03-05 12:20:28 -05:00
Alexey Bataev	1c2b79add6	[SLP]Add runtime stride support for strided loads. Added support for runtime strides. Reviewers: preames, RKSimon Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/81517	2024-03-05 09:38:25 -05:00
Alexey Bataev	89827863a3	[SLP][NFC]Make canVectorizeLoads member of BoUpSLP class, NFC.	2024-03-04 07:10:27 -08:00
Florian Hahn	617398e5e2	[SLP] Collect candidate VFs in vector in vectorizeStores (NFC). (#82793 ) This is in preparation for https://github.com/llvm/llvm-project/pull/77790 and makes it easy to add other, non-power-of-2 VFs for processing. PR: https://github.com/llvm/llvm-project/pull/82793	2024-03-01 20:13:49 +00:00
Florian Hahn	6fe60bd89f	[SLP] Exit early if MaxVF < MinVF (NFCI). (#83283 ) Exit early if MaxVF < MinVF. In that case, the loop body below will never get entered. Note that this adjusts the condition from MaxVF <= MinVF. If MaxVF == MinVF, vectorization may still be feasible (and the loop below gets entered). PR: https://github.com/llvm/llvm-project/pull/83283	2024-03-01 19:43:06 +00:00
Alexey Bataev	2ab6d1e18e	[SLP][NFC]Move some check to the outer if to simplify inner checks.	2024-03-01 10:58:41 -08:00
Alexey Bataev	3a30d8e9e5	[SLP]Check if masked gather can be emitted as a serie of loads/insert subvector. Masked gather is very expensive operation and sometimes better to represent it as a serie of consecutive/strided loads + insertsubvectors sequences. Patch adds some basic estimation and if loads+insertsubvector is cheaper, decides to represent it in this way rather than masked gather. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83481	2024-03-01 13:49:29 -05:00
Alexey Bataev	df0fd3a80e	[SLP]Try to vectorize small graph with extractelements, used in buildvector. If the graph incudes only single "gather" node with only extractelements/undefs, which used only in insertelement-based buildvector sequences, it still might be profitable to vectorize it. Need to rely on the cost model, not throw this graph away immediately. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83581	2024-03-01 12:48:45 -05:00
Benjamin Kramer	3fc277f665	[SLPVectorizer] Make the insert/extractvector PHICompare a strict-weak ordering (#83571 ) This was tripping off STL implementations that check for it (like libc++ with debug checking). The goal of this sort is to cluster operations on the same values so preserve that property but sort everything else based on the existing numbering.	2024-03-01 15:37:54 +01:00
Alexey Bataev	5bafb8d952	[SLP][NFC]Add/use single UsesLimit constant, NFC.	2024-03-01 06:37:08 -08:00
Florian Hahn	6ecd26132b	[SLP] Use ScopeExit to update Operands/PrevDist on all paths. (NFC) (#83490 ) Use ScopeExit to make sure Operands/PrevDist are updated on all paths in the loop. This makes it easier to ensure they are updated correctly if new early continues are added. Split off from https://github.com/llvm/llvm-project/pull/83283 PR: https://github.com/llvm/llvm-project/pull/83490	2024-03-01 14:30:01 +00:00
Alexey Bataev	f28c4b4bac	[SLP]Fix/improve potential masked gather loads analysis. When do the analysis for the (potential) masked gather node, we check that not greater than half of the pointer operands are loop invariants or potentially vectorizable. Need to check actually, that we have a loop at first and do better check for the potentially vectorizable pointers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83472	2024-03-01 07:38:18 -05:00
Alexey Bataev	2d98d763a8	[SLP]Fix the cost model for extracts combined with later shuffle. If the buildvector node contains extract, which later should be combined with some other nodes by shuffling, need to estimate the cost of this shuffle before building the mask after shuffle. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/83442	2024-03-01 07:12:07 -05:00
Alexey Bataev	45d82f33af	[SLP]Fix miscompilation, cause by incorrect final node reordering. Need to use the regular reordering from the correct node for the final store/insertelement node to avoid miscommilation.	2024-02-29 11:21:06 -08:00
Alexey Bataev	c89d51112d	[SLP]Use It->second.first for BWSz, NFC.	2024-02-28 06:38:41 -08:00
Alexey Bataev	32994cc0d6	[SLP]Improve findReusedOrderedScalars and graph rotation. Patch syncs the code in findReusedOrderedScalars with cost estimation/codegen. It tries to use similar logic to better determine best order. Before, it just tried to find previously vectorized node without checking if it is possible to use the vectorized value in the shuffle. Now it relies on the more generalized version. If it determines, that a single vector must be reordered (using same mechanism, as codegen and cost estimation), it generates better order. The comparison between new/ref ordering: Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: MultiSource/Benchmarks/nbench/nbench.test 139.00 140.00 0.7% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 344.00 346.00 0.6% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1293.00 1292.00 -0.1% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 5176.00 5170.00 -0.1% test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 5173.00 5167.00 -0.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11692.00 11660.00 -0.3% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 1621.00 1615.00 -0.4% test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test 795.00 792.00 -0.4% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26499.00 26338.00 -0.6% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 7343.00 7281.00 -0.8% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1104.00 1094.00 -0.9% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2216.00 2180.00 -1.6% test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 787.00 637.00 -19.1% Less 0% is better. Most of the benchmarks see more vectorized code. The first ones just have shuffles removed. The ordering analysis still may require some improvements (e.g. 
for alternate nodes), but this one should be produce better results. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/77529	2024-02-22 14:32:15 -05:00
Alexey Bataev	35f45926eb	[SLP][NFC]Add asserts for undef handling in PHIComparator, NFC.	2024-02-19 12:57:56 -08:00
Alexey Bataev	b04dd5d187	[SLP]FIx PR81403: compiler crah because wrongly resized vector value. The mask for the reshuffling/resizing might be calculated incorrectly, fixed.	2024-02-12 10:27:25 -08:00
Alexey Bataev	833a1cadeb	[SLP]Add support for strided loads. Added basic support for strided loads support in SLP vectorizer. Supports constant strides only. If the strided load must be reversed, applies -stride to avoid extra reverse shuffle. Reviewers: preames, lukel97 Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/80310	2024-02-12 09:43:54 -08:00


1644 Commits
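The "[SLP]Improve minbitwidth analysis" entries (31eaf86a1e and its earlier landings) describe picking the best bit width for each trunc/ext subtree instead of one width for the whole graph, so that, for example, idct4x4dc in x264 is vectorized with <16 x i16> rather than <16 x i32>. The C++ sketch below illustrates only that core idea and rests on a stated assumption: an unsigned upper bound is already known for every value in the subtree, intermediate results included. It is not the SLPVectorizer implementation, which derives such facts from the IR itself; activeBits, roundUpLaneWidth and minBitWidthForBundle are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of significant bits in an unsigned value
// (0 needs 0 bits, 255 needs 8, 256 needs 9).
unsigned activeBits(uint64_t v) {
  unsigned bits = 0;
  while (v != 0) {
    ++bits;
    v >>= 1;
  }
  return bits;
}

// Hypothetical helper: round a bit count up to a power-of-two lane width,
// never narrower than 8 bits (i8).
unsigned roundUpLaneWidth(unsigned bits) {
  unsigned width = 8;
  while (width < bits)
    width *= 2;
  return width;
}

// ASSUMPTION: upperBounds holds a known unsigned upper bound for every value
// in one subtree (leaves and intermediate results). Returns the narrowest
// lane width that can hold all of them.
unsigned minBitWidthForBundle(const std::vector<uint64_t>& upperBounds) {
  unsigned maxBits = 1;
  for (uint64_t bound : upperBounds)
    maxBits = std::max(maxBits, activeBits(bound));
  return roundUpLaneWidth(maxBits);
}
```

With bounds {255, 3, 0, 17} every lane fits in i8, while a single bound of 256 forces i16. Running the helper once per subtree rather than once per graph corresponds to the per-subtree choice described in the commit message.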
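Entry 833a1cadeb adds strided loads with constant strides and notes that a load which would otherwise need reversing uses -stride instead of an extra reverse shuffle. A minimal C++ model of that idea follows, assuming lane offsets are already known as element indices; StridedAccess, detectConstantStride and stridedLoad are hypothetical and not part of the LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of a strided access, in units of elements.
struct StridedAccess {
  int64_t start;   // element offset loaded into lane 0
  int64_t stride;  // distance between consecutive lanes; negative walks backwards
};

// Returns the access pattern if the lane offsets form an arithmetic
// progression with a nonzero constant step, std::nullopt otherwise.
std::optional<StridedAccess> detectConstantStride(const std::vector<int64_t>& offsets) {
  if (offsets.size() < 2)
    return std::nullopt;
  const int64_t stride = offsets[1] - offsets[0];
  if (stride == 0)
    return std::nullopt;
  for (std::size_t i = 2; i < offsets.size(); ++i)
    if (offsets[i] - offsets[i - 1] != stride)
      return std::nullopt;
  return StridedAccess{offsets[0], stride};
}

// Scalar model of a strided load: lane i reads mem[start + i * stride].
std::vector<int> stridedLoad(const std::vector<int>& mem, StridedAccess a, std::size_t lanes) {
  std::vector<int> result;
  result.reserve(lanes);
  for (std::size_t i = 0; i < lanes; ++i) {
    const int64_t idx = a.start + static_cast<int64_t>(i) * a.stride;
    assert(idx >= 0 && static_cast<std::size_t>(idx) < mem.size());
    result.push_back(mem[static_cast<std::size_t>(idx)]);
  }
  return result;
}
```

For lanes that read offsets 30, 20, 10, 0, the detected pattern starts at 30 with stride -10, so a single strided load produces the lanes in the required order and no reverse shuffle is needed.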
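Entry 3fc277f665 makes the insert/extractvector PHICompare comparator a strict weak ordering, because STL implementations with debug checking (libc++, for example) verify that property. The sketch below is not that comparator; it shows, on a hypothetical Candidate type, one simple way to keep values clustered while ordering everything else by an existing number, plus a brute-force checker for the axioms that std::sort requires.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a PHI candidate: the value it operates on
// (used for clustering) and its pre-existing position in the numbering.
struct Candidate {
  int valueId;
  int number;
};

// Lexicographic order on (valueId, number): candidates on the same value end
// up adjacent, and ties are broken by the existing numbering.
bool clusterThenNumber(const Candidate& a, const Candidate& b) {
  return std::tie(a.valueId, a.number) < std::tie(b.valueId, b.number);
}

// Brute-force check of the strict-weak-ordering axioms on a sample.
// (Asymmetry follows from irreflexivity plus transitivity.)
template <typename T, typename Cmp>
bool isStrictWeakOrderingOn(const std::vector<T>& xs, Cmp cmp) {
  auto equiv = [&](const T& x, const T& y) { return !cmp(x, y) && !cmp(y, x); };
  for (const T& a : xs) {
    if (cmp(a, a))
      return false;  // irreflexivity violated
    for (const T& b : xs)
      for (const T& c : xs) {
        if (cmp(a, b) && cmp(b, c) && !cmp(a, c))
          return false;  // transitivity violated
        if (equiv(a, b) && equiv(b, c) && !equiv(a, c))
          return false;  // equivalence is not transitive
      }
  }
  return true;
}
```

A comparator such as `a.valueId == b.valueId || a.number < b.number` also clusters equal values, but it returns true for cmp(a, a), so isStrictWeakOrderingOn rejects it; passing such a comparator to std::sort is undefined behavior.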
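Entries f337525ee8, 45e41f9686 and 0674ed753a compute explicit shuffle masks for getShuffleCost calls that previously had none. The sketch below shows, under LLVM's two-operand shufflevector indexing convention, what such masks look like for a broadcast and for inserting a subvector (taken from the second operand) at a given lane. The helper names are hypothetical, and this is not the code from those patches.

```cpp
#include <cassert>
#include <vector>

// Mask convention (as in LLVM's shufflevector): for two N-lane operands,
// index i < N picks lane i of the first operand, and N <= i < 2N picks
// lane i - N of the second operand. Poison (-1) lanes are not used here.

// Broadcast lane 0 of the first operand into every result lane.
std::vector<int> broadcastMask(unsigned numElts) {
  return std::vector<int>(numElts, 0);
}

// Insert the first subElts lanes of the second operand at lane `index`
// of the first operand; all other lanes keep the first operand's values.
std::vector<int> insertSubvectorMask(unsigned numElts, unsigned subElts, unsigned index) {
  assert(index + subElts <= numElts && "subvector must fit");
  std::vector<int> mask(numElts);
  for (unsigned i = 0; i < numElts; ++i)
    mask[i] = (i >= index && i < index + subElts)
                  ? static_cast<int>(numElts + (i - index))
                  : static_cast<int>(i);
  return mask;
}

// Scalar model of applying a two-operand shuffle mask.
std::vector<int> applyShuffle(const std::vector<int>& v1, const std::vector<int>& v2,
                              const std::vector<int>& mask) {
  assert(v1.size() == v2.size());
  const int n = static_cast<int>(v1.size());
  std::vector<int> out;
  out.reserve(mask.size());
  for (int m : mask)
    out.push_back(m < n ? v1[m] : v2[m - n]);
  return out;
}
```

For example, inserting two lanes at index 4 of an 8-lane vector gives the mask {0, 1, 2, 3, 8, 9, 6, 7}: lanes 4 and 5 come from the second operand and the rest stay in place.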
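Entries 617398e5e2 and 6fe60bd89f collect the candidate vectorization factors in vectorizeStores into a vector and exit early when MaxVF < MinVF, while still allowing MaxVF == MinVF. The hypothetical C++ sketch below models that control flow, assuming power-of-two VFs produced by halving from MaxVF; the actual loop in SLPVectorizer.cpp may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: collect candidate vectorization factors from MaxVF
// down to MinVF by halving. Assumes both are powers of two and MinVF >= 1.
std::vector<unsigned> collectCandidateVFs(unsigned minVF, unsigned maxVF) {
  assert(minVF >= 1 && "MinVF must be positive");
  std::vector<unsigned> candidates;
  // Early exit: with MaxVF < MinVF the halving loop below would never run,
  // so returning here changes nothing but makes the intent explicit.
  // MaxVF == MinVF still yields one candidate.
  if (maxVF < minVF)
    return candidates;
  for (unsigned vf = maxVF; vf >= minVF; vf /= 2)
    candidates.push_back(vf);
  return candidates;
}
```

(2, 16) yields 16, 8, 4, 2; (4, 4) yields the single candidate 4; (8, 4) yields nothing. Keeping the candidates in a container is what would let non-power-of-2 factors be appended later, as the first commit message anticipates.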
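Entry 6ecd26132b uses ScopeExit so that Operands and PrevDist are updated on every path through a loop body, including early continues. LLVM provides llvm::make_scope_exit in llvm/ADT/ScopeExit.h for this; the standalone C++17 sketch below reimplements the pattern so it compiles without LLVM, and countUpdates is a hypothetical loop, not the SLPVectorizer code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Minimal RAII guard: runs the stored callable when the guard goes out of
// scope, whichever path leaves the scope. A sketch, not LLVM's implementation.
template <typename Fn>
class ScopeExit {
public:
  explicit ScopeExit(Fn fn) : fn_(std::move(fn)) {}
  ~ScopeExit() { fn_(); }
  ScopeExit(const ScopeExit&) = delete;
  ScopeExit& operator=(const ScopeExit&) = delete;

private:
  Fn fn_;
};

// Hypothetical loop: prevDist must be updated on every iteration, including
// iterations that leave early via 'continue'.
int countUpdates(const std::vector<int>& dists, int& prevDistOut) {
  int updates = 0;
  int prevDist = 0;
  for (int dist : dists) {
    ScopeExit onExit([&] {
      prevDist = dist;  // runs when the iteration ends, on every path
      ++updates;
    });
    if (dist < 0)
      continue;  // early path: onExit still fires
    // ... the main per-iteration work would go here ...
  }
  prevDistOut = prevDist;
  return updates;
}
```

Without the guard, the `continue` for negative distances would skip the update; with it, all four iterations of {1, -2, 3, -4} run the update and the final value is -4.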