llvm-project

Author	SHA1	Message	Date
Alexey Bataev	cb9cf331fa	[SLP][NFC]Do not lookup in MinBWs, reuse previously used iterator.	2024-04-02 05:53:34 -07:00
Alexey Bataev	41afef9066	[SLP]Fix PR87011: Missing sign extension of demoted type before zero extension. Need to drop skipping of the first zext/sext nodes, as it leads to incorrect and less profitable code.	2024-04-01 06:07:18 -07:00
Jakub Kuderski	2b0ab05c4a	[SLP][NFC] Simplify type checks with isa predicates (#87182). For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.	2024-03-31 14:55:11 -04:00
Alexey Bataev	01e02e0b6a	[SLP]Fix PR87011: Do not assume that initial ext/trunc nodes can be represented by bitwidth without analysis. Need to check that initial ext/trunc nodes can be safely represented using calculated bitwidth before applying it.	2024-03-28 18:02:26 -07:00
Alexey Bataev	70cf2a09ce	[SLP][NFC]Simplify function/constructors by removing unnecessary params.	2024-03-28 13:34:59 -07:00
Alexey Bataev	d7975c9d93	[SLP]Add better minbitwidth analysis for udiv/urem instructions. Adds improved bitwidth analysis for udiv/urem instructions. The analysis is based on similar version in InstCombiner. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85928	2024-03-28 10:35:15 -04:00
Alexey Bataev	b7a4ace72e	[SLP][NFC]Improve compile time by size analysis limit and reduction size limit. Used RecursionMaxDepth to limit number of lookups in BoUpSLP::getVectorElementSize and limited reduction width for bool reduced values.	2024-03-27 14:46:04 -07:00
Alexey Bataev	d94dc5f0d6	[SLP]Fix PR86763: do not truncate reductions to the demanded bits size. Need to adjust ReductionBitWidth after minbitwidth analysis, if the demanded bits analysis shows that its size is less than the size of the vectorized value. It prevents incorrect sign-zero extension transformation after.	2024-03-27 14:34:59 -07:00
Alexey Bataev	b43ec8e62b	[SLP]Fix PR86798: handle phi nodes being trunced, but not its operands. If the phi node is trunced, but not its operand(s), need to handle this situation in the assertion, code already does the right transformation.	2024-03-27 07:21:45 -07:00
Alexey Bataev	342f7d0d35	[SLP]Fix PR86620: check final minbitwidth for truncs/exts before accepting it. If the minbitwidth is deduced from the demanded elements, need to check the final bitwidth for trunc/ext instruction, not blindly accepting the used one.	2024-03-26 11:27:17 -07:00
Alexey Bataev	26dd12871c	[SLP]Do not propagate nuw/nsw flags for alt nodes, affected by minbitwidth analysis. Need to drop nuw/nsw flags, if the alternate node is resized after the minbitwidth analysis, to avoid producing poison values in corner cases.	2024-03-26 10:24:09 -07:00
Alexey Bataev	54ca1e2c04	[SLP]Fix PR80027: include initial trunc nodes to the demoted values. Need to include initial sext/zext/trunc nodes to the list of the demoted root values to correctly calculate the cost and handle the vectorization.	2024-03-26 06:40:57 -07:00
Patrick O'Neill	4652ec0e29	[SLP] Delete vectorized users when tree contains an invalid cost (#86344)	2024-03-22 17:52:27 -04:00
Alexey Bataev	9c0a0659d4	[SLP]Fix a crash for non-profitable non-schedulable single buildvector node tree, if the threshold allows its vectorization.	2024-03-22 07:44:23 -07:00
Alexey Bataev	3942bd2fb5	[SLP]Fix a crash if the argument of call was affected by minbitwidth analysis. Need to support proper type conversion for function arguments to avoid compiler crash.	2024-03-21 17:06:48 -07:00
Alexey Bataev	8d7a6e2fd8	[SLP]Fix a crash for gather node with instructions from different bbs, if cost threshold is very low.	2024-03-21 08:03:06 -07:00
Alexey Bataev	34f0a8aaba	[SLP]Fix comparison in bitwidth check. Projected bitwidth should be less than the original, not greater.	2024-03-21 04:24:34 -07:00
Alexey Bataev	04f7cd7f45	[SLP][NFC]Make findBestRootPair() member function constant.	2024-03-20 08:33:47 -07:00
Alexey Bataev	6c1d4454ad	[SLP]Improve minbitwidth analysis for shifts. Adds improved bitwidth analysis for shl/ashr/lshr instructions. The analysis is based on similar version in InstCombiner. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84356	2024-03-20 09:07:26 -04:00
Alexey Bataev	81d9ed605b	[SLP]Do extra analysis int minbitwidth if some checks return false. The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84363	2024-03-20 05:48:55 -07:00
Alexey Bataev	3a90cb4c18	Revert "[SLP]Do extra analysis int minbitwidth if some checks return false." This reverts commit da118c93b40f74f6770cf8550903721555d3c97b to fix crashes reported in https://github.com/llvm/llvm-project/pull/84363.	2024-03-20 05:00:05 -07:00
Nikita Popov	27df1b23e0	[SLPVectorizer] Use TargetFolder (#85800). Use IRBuilder with TargetFolder in SLPVectorizer to avoid the custom constant folding code. This fixes the remaining part of https://github.com/llvm/llvm-project/issues/61240.	2024-03-20 09:18:45 +01:00
Alexey Bataev	da118c93b4	[SLP]Do extra analysis int minbitwidth if some checks return false. The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84363	2024-03-19 12:24:18 -07:00
Alexey Bataev	31eaf86a1e	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text (columns: program, results, results0, diff): test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1%; test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0%; test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0%; test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0%; test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0%; test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0%; test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%; test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0%. SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8>. Spec2017/imagick - Removed extra zext from 2 packs of the operations. Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext. Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-19 08:19:45 -07:00
Nikita Popov	94c6ce1de9	[SLPVectorizer] Use IRBuilderBase where possible (NFC) Instead of hardcoding a specific IRBuilder type, use the base class.	2024-03-19 14:27:48 +01:00
Alexey Bataev	9a42bdc0ae	[SLP][NFC]Fix signedness to avoid comparison warning.	2024-03-15 09:56:40 -07:00
Philip Reames	0674ed753a	[SLP] Compute a shuffle mask for getGatherCost (#85330). This is the second of a series of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask. Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-03-15 08:32:46 -07:00
Philip Reames	45e41f9686	[SLP] Compute a shuffle mask for SK_InsertSubvector (#85408). This is the third of a series of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask. After this change, there is one SK_InsertSubvector case left. I excluded it from this patch just because I thought it worthy of individual attention and review. Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-03-15 08:32:18 -07:00
Philip Reames	f337525ee8	[SLP] Compute a shuffle mask for SK_Broadcast shuffle (#85327). This is the first of a couple of small patches to compute shuffle masks for the couple of cases where we call getShuffleCost without one. My goal is to add an invariant that all calls to getShuffleCost for fixed length vectors have a mask. Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-03-15 07:41:26 -07:00
Alexey Bataev	3789870758	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 7f2167868d8c1cedd3915883412b9c787a2f01db to fix issues reported in https://github.com/llvm/llvm-project/pull/84536.	2024-03-15 03:59:48 -07:00
Alexey Bataev	dbbe2fe2a2	Revert "[SLP]Do extra analysis int minbitwidth if some checks return false." This reverts commit e4b772444c8176abe30d364e4a946ee6c8ae8de4 to fix the issues reported in https://github.com/llvm/llvm-project/pull/84536.	2024-03-15 03:58:34 -07:00
Alexey Bataev	7567f5ba78	Revert "[SLP]Do extra analysis int minbitwidth if some checks return false." This reverts commit ea429e19f56005bf89e717c14efdf49ec055b183 to fix issues reported in https://github.com/llvm/llvm-project/pull/84536#issuecomment-1999295445.	2024-03-15 03:52:58 -07:00
Artem Tyurin	141145232f	[IRBuilder] Fold binary intrinsics (#80743). Fixes https://github.com/llvm/llvm-project/issues/61240.	2024-03-15 09:58:25 +01:00
Alexey Bataev	e4b772444c	[SLP]Do extra analysis int minbitwidth if some checks return false. The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84363	2024-03-14 16:41:04 -07:00
Alexey Bataev	ea41ac1132	[SLP][NFC]Fix a warning for comparison of integers of different signs.	2024-03-14 16:06:08 -07:00
Alexey Bataev	5b303a98a8	Revert "[SLP]Do extra analysis int minbitwidth if some checks return false." This reverts commit ea429e19f56005bf89e717c14efdf49ec055b183 to fix issues revealed in https://lab.llvm.org/buildbot/#/builders/186/builds/15299 and https://lab.llvm.org/buildbot/#/builders/238/builds/8426.	2024-03-14 15:48:26 -07:00
Alexey Bataev	ea429e19f5	[SLP]Do extra analysis int minbitwidth if some checks return false. The instruction itself can be considered good for minbitwidth casting, even if one of the operand checks returns false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84363	2024-03-14 16:16:18 -04:00
Paschalis Mpeis	f795d1a8b1	[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem (#82488). getArithmeticInstrCost is used by both LoopVectorizer and SLPVectorizer to compute the cost of frem, which becomes a call cost on AArch64 when TLI has a vector library function. Add tests that do SLP vectorization for code that contains 2x double and 4x float frem instructions.	2024-03-14 17:20:29 +00:00
Alexey Bataev	7f2167868d	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text (columns: program, results, results0, diff): test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1%; test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0%; test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0%; test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0%; test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0%; test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0%; test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%; test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0%. SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8>. Spec2017/imagick - Removed extra zext from 2 packs of the operations. Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext. Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-14 06:23:14 -07:00
Alexey Bataev	4dd186afd5	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 13:59:58 -07:00
Alexey Bataev	b966b224b3	Revert "[SLP]Fix PR85082: PHI node has multiple entries." This reverts commit 8237520eb42b37d7ed353d64a865d3ba5ac24ec6 to fix a crash in https://lab.llvm.org/buildbot/#/builders/198/builds/8891.	2024-03-13 13:38:58 -07:00
Alexey Bataev	8237520eb4	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 12:55:24 -07:00
Alexey Bataev	b77c079987	Revert "[SLP]Fix PR85082: PHI node has multiple entries." This reverts commit 59ff907fc14aa2d02e57b4af4140949d4f8caca1 to fix crash revealed in https://lab.llvm.org/buildbot/#/builders/198/builds/8881	2024-03-13 12:10:40 -07:00
Alexey Bataev	59ff907fc1	[SLP]Fix PR85082: PHI node has multiple entries. Need to record casted extractelement for the externally used scalar, not original extract instruction.	2024-03-13 08:14:10 -07:00
David Blaikie	9ac0315898	Add comment to assert from a843f26	2024-03-12 18:28:30 +00:00
David Blaikie	a843f26a77	[NFC] SLPVectorizer comparator refactoring that preserves behavior (#84966). Spinning off from #79321 / 35f4592 - looked like the comparator could be simplified & made more clear/less risk of leaving hidden bugs.	2024-03-12 11:25:12 -07:00
Justin Lebar	fab2bb8bfd	Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678) For some reason this was missing from STLExtras.	2024-03-10 20:00:13 -07:00
Martin Storsjö	5b5c21d772	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 2bd369b48dbf0bc3128becb7ef8f8a1b82514b87. That commit triggered failed assertions: $ cat repro.c short *a; int b; void h() { short *c = a; b = 0; for (; b < 4; b++) { unsigned d = a[b] + a[b + 4 * 2], e = a[b] - a[b + 4 * 2], f = (a[b + 4] >> 1) - a[b + 4 * 3], g = a[b + 4] + (a[b + 4 * 3] >> 1); c[b] = g; c[b + 4] = e + f; c[b + 4 * 2] = e - f; c[b + 4 * 3] = d - g; } } $ clang -target aarch64-linux-gnu -c -O2 repro.c clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:12503: llvm::Value* llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::slpvectorizer::BoUpSLP::TreeEntry*, bool): Assertion `(MinBWs.contains(getOperandEntry(E, 0)) || MinBWs.contains(getOperandEntry(E, 1))) && "Expected item in MinBWs."' failed.	2024-03-09 13:53:13 +02:00
Alexey Bataev	2bd369b48d	[SLP]Improve minbitwidth analysis. This improves overall analysis for minbitwidth in SLP. It allows to analyze the trees with store/insertelement root nodes. Also, instead of using single minbitwidth, detected from the very first analysis stage, it tries to detect the best one for each trunc/ext subtree in the graph and use it for the subtree. Results in better code and less vector register pressure. Metric: size..text (columns: program, results, results0, diff): test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1%; test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0%; test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0%; test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0%; test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0%; test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0%; test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0%; test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%; test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0%. SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar instructions remain scalar (good). Spec2017/x264 - the whole function idct4x4dc is vectorized using <16 x i16> instead of <16 x i32>, also zext/trunc are removed. In other places last vector zext/sext removed and replaced by extractelement + scalar zext/sext pair. MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by reduce or <4 x i8>. Spec2017/imagick - Removed extra zext from 2 packs of the operations. Spec2017/parest - Removed extra zext, replaced by extractelement+scalar zext. Spec2017/blender - the whole bunch of vector zext/sext replaced by extractelement+scalar zext/sext, some extra code vectorized in smaller types. Spec2006/gobmk - fixed cost estimation, some small code remains scalar. Original Pull Request: https://github.com/llvm/llvm-project/pull/84334 The patch has the same functionality (no test changes, no changes in benchmarks) as the original patch, just has some compile time improvements + fixes for xxhash unittest, discovered earlier in the previous version of the patch. Reviewers: Pull Request: https://github.com/llvm/llvm-project/pull/84536	2024-03-08 13:57:02 -05:00
Alexey Bataev	11185715a2	Revert "[SLP]Improve minbitwidth analysis." This reverts commit 4ce52e2d576937fe930294cae883a0daa17eeced to fix issues detected by https://lab.llvm.org/buildbot/#/builders/74/builds/26470/steps/12/logs/stdio.	2024-03-07 12:44:53 -08:00


1667 Commits