llvm-project

Author	SHA1	Message	Date
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Nikita Popov	6a06155c53	[VectorCombine] Discard ScalarizationResults if transform aborted Fixes https://github.com/llvm/llvm-project/issues/69820.	2023-10-31 11:24:30 +01:00
Nikita Popov	3b82397965	[VectorCombine] Check for non-byte-sized element type We should check whether the element type is non-byte-sized, not the vector type. For types like <32 x i1> the whole type is byte-sized, but the individual elements (that we scalarize to) are not. Fixes https://github.com/llvm/llvm-project/issues/67060.	2023-09-28 14:18:30 +02:00
Nikita Popov	95606a58c1	[VectorCombine] Add tests for #67060 (NFC)	2023-09-28 14:18:29 +02:00
Ben Shi	ea0ee55c02	[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445) Enable the transform if a non constant index is guaranteed to be safe via a UREM/AND.	2023-09-26 09:41:53 +08:00
Ben Shi	068357d9b0	[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443) The transform 'scalarizeLoadExtract' can be applied to scalable vector types if the index is less than the minimum number of elements. The check whether the index is less than the minimum number of elements is located at lines 1175~1180. 'scalarizeLoadExtract' will call 'canScalarizeAccess' and use the returned result to decide whether this transform is safe. At the beginning of the function 'canScalarizeAccess', the index will be checked: 1. Whether it is less than the number of elements of a fixed vector type. 2. Whether it is less than the minimum number of elements of a scalable vector type. Otherwise 'canScalarizeAccess' will return unsafe and this transform will be prevented.	2023-09-18 10:49:18 +08:00
Ben Shi	f3fbea2cac	[VectorCombine][test] Supplement tests of the load-extractelement sequence (#65442) The newly added tests are all about scalable vector types.	2023-09-13 18:56:44 +08:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). It improves shuffle instructions estimation and improves vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 CPUs. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost on integer, especially for small vectors. The end result will be lower cost for float and long-integer types, some higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Fangrui Song	d39b4ce3ce	[test] Replace aarch64-*-eabi with aarch64 Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid it elsewhere as well. Just use the common "aarch64" without other triple components.	2023-06-27 20:02:52 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Nikita Popov	c00ffbe02b	[VectorCombine] Convert tests to opaque pointers (NFC)	2022-12-23 10:04:26 +01:00
Matt Devereau	ee4d6c8bf0	[VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for scalable vectors which caused poor scalable Binop codegen. Differential Revision: https://reviews.llvm.org/D138545	2022-11-23 13:17:21 +00:00
Matt Devereau	a8c24d57b8	[InstCombine] Remove redundant splats in InstCombineVectorOps Splatting the first vector element of the result of a BinOp, where any of the BinOp's operands are the result of a first vector element splat can be simplified to splatting the first vector element of the result of the BinOp Differential Revision: https://reviews.llvm.org/D135876	2022-11-07 15:39:05 +00:00
Peter Waller	e1790c8c29	Revert "[InstCombine] Remove redundant splats in InstCombineVectorOps" This reverts commit 957eed0b1af2cb88edafe1ff2643a38165c67a40.	2022-11-03 07:56:03 +00:00
Matt Devereau	957eed0b1a	[InstCombine] Remove redundant splats in InstCombineVectorOps Splatting the first vector element of the result of a BinOp, where any of the BinOp's operands are the result of a first vector element splat can be simplified to splatting the first vector element of the result of the BinOp Differential Revision: https://reviews.llvm.org/D135876	2022-11-02 11:57:05 +00:00
Matt Devereau	be0d427a14	[VectorCombine] Add insertelement-shufflevector VectorCombine tests This is a precommit which adds some tests to show the functionality of an upcoming VectorCombine optimization	2022-10-13 14:10:06 +00:00
David Green	4b7913c357	[VectorCombine] Only consider shuffle uses with the same type. The backend getShuffleCosts do not currently handle shuffles that change size very well. Limit the shuffles we collect to the same type to make sure they do not cause issues as reported in D128732.	2022-07-16 13:23:39 +01:00
Sander de Smalen	519d7876cb	[VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector. This addresses https://github.com/llvm/llvm-project/issues/56377 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D129136	2022-07-07 08:37:04 +00:00
Sander de Smalen	f45a3f7f9b	[VectorCombine] NFC: rename test extract-cmp-binop.ll to extract-scalable.ll	2022-07-07 08:37:04 +00:00
David Green	5493f8fc59	[VectorCombine] Improve shuffle select shuffle-of-shuffles This is an extension to the code added in D123911 which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like: %x = shuffle %i1, %i2 %y = shuffle %i1, %i2 %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as in: %x = shuffle %i1, %i2 %y = shuffle %x The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help getting lower cost input shuffles, pushing complex shuffles further down the tree. This is a recommit with some additional checks for supported forms and out-of-bounds mask elements, with some extra tests. Differential Revision: https://reviews.llvm.org/D128732	2022-07-05 17:16:18 +01:00
Nikita Popov	b69c75d53f	Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles" This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d. Clang crashes while linking bullet from llvm-test-suite in ReleaseLTO-g cmake configuration.	2022-07-05 09:31:20 +02:00
David Green	19a1e20b8a	[VectorCombine] Improve shuffle select shuffle-of-shuffles This is an extension to the code added in D123911 which added vector combine folding of shuffle-select patterns, attempting to reduce the total amount of shuffling required in patterns like: %x = shuffle %i1, %i2 %y = shuffle %i1, %i2 %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask This patch extends the handling of shuffles that are dependent on one another, which can arise from the SLP vectorizer, as in: %x = shuffle %i1, %i2 %y = shuffle %x The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of input shuffles, which can help getting lower cost input shuffles, pushing complex shuffles further down the tree. Differential Revision: https://reviews.llvm.org/D128732	2022-07-04 13:38:43 +01:00
Nikita Popov	03aceab08b	[ValueTracking] Enable -branch-on-poison-as-ub by default Now that SimpleLoopUnswitch and other transforms no longer introduce branch on poison, enable the -branch-on-poison-as-ub option by default. The practical impact of this is mostly better flag preservation in SCEV, and some freeze instructions no longer being necessary. Differential Revision: https://reviews.llvm.org/D125299	2022-06-01 10:46:06 +02:00
David Green	4c6a070a2c	[AArch64] Teach perfect shuffles tables about D-lane movs Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle tables, lowering the costs a little more. This is a rough improvement in general, especially if you ignore mov v0.16b, v2.16b type moves that are often artefacts of the calling convention. The D register movs are encoded as (0x4 | LaneIdx), and to generate a D register move we are required to bitcast into a higher type, but it is otherwise very similar to the S-lane movs already supported. Differential Revision: https://reviews.llvm.org/D125477	2022-05-17 18:16:45 +01:00
David Green	6f9e1ea0ef	[VectorCombine] Attempt to fold select shuffles from reductions Given a commutative reduction leading from a shuffle, the order of the lanes on the shuffle is not important for the result. This means we can reorder the shuffle to something simpler, which we attempt by shuffling the first vector lanes first. This was D123494. The new shuffle may not be profitable though, and if it is not we can try the folding of select shuffles from D123911. This, with some adjustment as the output lane ordering is now unimportant, can allow the final shuffle to simplify given the inputs to the patterns from D123911. Whereas each transformation on its own is not profitable, the combination is. We can only support a single shuffle when called from reductions, but we are able to sort the ReconstructMask, potentially allowing it to simplify to an identity or concat mask. Differential Revision: https://reviews.llvm.org/D125086	2022-05-08 10:32:41 +01:00
David Green	100cb9a2ba	[VectorCombine] Fold shuffle select pattern This patch adds a combine to attempt to reduce the costs of certain select-shuffle patterns. The form of code it attempts to detect is: %x = shuffle ... %y = shuffle ... %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask A classic select-mask will pick items from each lane of a or b. These do not always have a great lowering on many architectures. This patch attempts to pack a and b into the lower elements, creating a differently ordered shuffle for reconstructing the original which may be better than the select mask. This can be better for performance, especially if fewer elements of a and b need to be computed and the input shuffles are cheaper. Because select-masks are just one form of shuffle, we generalize to any mask. So long as the backend has a decent costmodel for the shuffles, this can generally improve things when they come up. For more basic cost models the folds do not appear to be profitable, not getting past the cost checks. Differential Revision: https://reviews.llvm.org/D123911	2022-05-06 08:13:18 +01:00
David Green	7aadfc5099	[VectorCombine] Add tests for shuffle binops patterns. NFC	2022-05-04 15:07:47 +01:00
David Green	ded8187e35	[VectorCombine] Try to reduce shuffle cost for commutative reduction operands Given a shuffle feeding a commutative reduction, the lane ordering of the shuffle will not alter the result. This is also true if there are a number of operations between the reduction and the shuffle, providing they only operate lane-wise. This patch searches for cases like that in Vector Combine, allowing us to check the cost of the shuffle vs an in-order identity shuffle and replace the order if possible. This only handles a single shuffle at the moment to keep things simple, and is able to ignore splats that produce results where every result is the same. This is a more powerful version of a combine that already happens in instcombine, capable of optimizing more cases by looking through more instructions and being able to cost the shuffle. Differential Revision: https://reviews.llvm.org/D123494	2022-04-28 19:46:12 +01:00
David Green	f9f2276399	[VecCombine] Add tests for removing shuffles from reductions. NFC	2022-04-28 15:06:24 +01:00
Bjorn Pettersson	5e4dbd7a2f	[NewPM][test] Use -passes syntax in VectorCombine lit tests The legacy PM is deprecated, so use the new PM syntax in lit tests running the vector-combine pass.	2021-10-20 15:16:17 +02:00
Florian Hahn	e2f6290e06	[VectorCombine] Discard ScalarizationResult state in early exit. ScalarizationResult's destructor makes sure ToFreeze is not ignored if set. Currently, scalarizeLoadExtract has an early exit if the index is not safe directly. But when it is SafeWithFreeze, we need to discard the state first, otherwise we hit the assert in the destructor. Fixes PR51992.	2021-09-28 12:52:16 +01:00
Florian Hahn	300870a95c	[VectorCombine] Switch to using a worklist. This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines. Suggested in D100302. The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization. Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions. Compile-time impact looks neutral: NewPM-O3: +0.02% NewPM-ReleaseThinLTO: -0.00% NewPM-ReleaseLTO-g: -0.02% http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D110171	2021-09-22 09:54:58 +01:00
Florian Hahn	5131037ea9	[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange. isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree. The use in VectorCombine is updated to pass through the DT to enable additional scalarization. Note that similar APIs like computeKnownBits already accept optional dominator tree arguments. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110175	2021-09-21 16:54:47 +01:00
Florian Hahn	ea27dd7497	[VectorCombine] Add tests which require DT to use info from assumes.	2021-09-21 13:07:06 +01:00
Florian Hahn	96ca03493a	[VectorCombine] Limit scalarization to non-poison indices for now. As Eli mentioned post-commit in D103378, the result of the freeze may still be out-of-range according to Alive2. So for now, just limit the transform to indices that are non-poison.	2021-06-14 16:40:14 +01:00
Roman Lebedev	20542b47d6	[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper This results in slightly more optimistic alignments in some cases	2021-06-11 12:47:10 +03:00
Kerry McLaughlin	5db52751a5	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
Florian Hahn	d4c070d801	[VectorCombine] Freeze index unless it is known to be non-poison. If the index itself is already poison, the poison propagates through instructions clamping the index to a valid range. This still causes introducing a load of poison, as flagged by Alive2 and pointed out at 575e2aff5574. This patch updates the code to freeze the index, unless it is proven to not be poison. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D103378	2021-06-01 10:40:57 +01:00
Florian Hahn	f000c4cfb6	[VectorCombine] Add tests with multiple noundef indices for scalarization.	2021-06-01 10:17:50 +01:00
Florian Hahn	829978744d	[VectorCombine] Add tests with noundef index for load scalarization.	2021-05-30 12:15:41 +01:00
Florian Hahn	007f268c35	[VectorCombine] Check indices for all extracts we scalarize. We need to make sure that the indices of all extracts we scalarize are valid.	2021-05-28 18:35:29 +01:00
Florian Hahn	f01df9805c	[VectorCombine] Add variants of multi-extract tests with assumes.	2021-05-28 18:35:24 +01:00
Florian Hahn	a92376d297	[VectorCombine] Add test that combines load & store scalarization.	2021-05-25 14:28:37 +01:00
Florian Hahn	575e2aff55	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAccess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Florian Hahn	d251d6f812	[VectorCombine] Fix load extract scalarization tests with assumes. The input IR for @load_extract_idx_var_i64_known_valid_by_assume and @load_extract_idx_var_i64_not_known_valid_by_assume_after_load has been swapped. This patch fixes the test so that @load_extract_idx_var_i64_known_valid_by_assume has the assume before the load and the other test has it after.	2021-05-24 13:14:13 +01:00
Florian Hahn	4e8c28b6fb	Recommit "[VectorCombine] Scalarize vector load/extract." This reverts commit 94d54155e2f38b56171811757044a3e6f643c14b. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.	2021-05-24 11:35:07 +01:00
Florian Hahn	94d54155e2	Revert "[VectorCombine] Scalarize vector load/extract." This reverts commit 86497785d540e59eaca24bed4219ddec183cbc9b. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio	2021-05-24 10:11:00 +01:00
Florian Hahn	86497785d5	[VectorCombine] Scalarize vector load/extract. This patch adds a new combine that tries to scalarize chains of `extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is profitable when extracting only a few elements out of a large vector. At the moment, `store (extractelement (load %ptr), %idx), %ptr` operations on large vectors result in huge code in the backend. This can easily be triggered by using the matrix extension, e.g. https://clang.godbolt.org/z/qsccPdPf4 This should complement D98240. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D100273	2021-05-24 09:29:08 +01:00
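
Several of the scalarizeLoadExtract commits above hinge on one legality question: is the extract index known to be smaller than the (minimum) number of vector elements? The following is a minimal, self-contained sketch of that reasoning, not the code in VectorCombine. The names `IndexExpr`, `IndexSafety` and `classifyIndex` are hypothetical, and the sketch leaves out the freeze/poison handling and the computeConstantRange fallback that the log also mentions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch (the real pass also tracks a
// "safe with freeze" state, which is not modeled here).
enum class IndexSafety { Safe, Unsafe };

// Hypothetical, simplified description of how an index value is produced.
struct IndexExpr {
  enum Kind { Constant, URem, And, Unknown } kind;
  std::uint64_t operand; // constant value, urem divisor, or and-mask
};

// minNumElts is the element count of a fixed vector, or the minimum
// element count of a scalable vector.
constexpr IndexSafety classifyIndex(const IndexExpr &idx,
                                    std::uint64_t minNumElts) {
  switch (idx.kind) {
  case IndexExpr::Constant:
    // A constant index is valid only if it is in [0, minNumElts).
    return idx.operand < minNumElts ? IndexSafety::Safe : IndexSafety::Unsafe;
  case IndexExpr::URem:
    // x urem d lies in [0, d) for a non-zero divisor d.
    return (idx.operand != 0 && idx.operand <= minNumElts)
               ? IndexSafety::Safe
               : IndexSafety::Unsafe;
  case IndexExpr::And:
    // x & mask lies in [0, mask].
    return idx.operand < minNumElts ? IndexSafety::Safe : IndexSafety::Unsafe;
  case IndexExpr::Unknown:
    return IndexSafety::Unsafe;
  }
  return IndexSafety::Unsafe;
}

// Compile-time checks for a vector with (at least) 4 elements.
static_assert(classifyIndex(IndexExpr{IndexExpr::Constant, 3}, 4) ==
                  IndexSafety::Safe, "constant 3 is in range");
static_assert(classifyIndex(IndexExpr{IndexExpr::And, 4}, 4) ==
                  IndexSafety::Unsafe, "x & 4 can equal 4");
```

For a scalable type such as `<vscale x 4 x i32>`, only the minimum element count (4) is known at compile time, so the sketch takes that as the bound: any index proven below it is in range for every value of vscale.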


56 Commits