SLP vectorizes scalar types into vector types. In the future, we will try to
make SLP vectorize vector types into wider vector types. We add getWidenedType
as a helper function. For example, SLP will turn the following code
%v0 = load i32, ptr %in0, align 4
%v1 = load i32, ptr %in1, align 4
%v2 = load i32, ptr %in2, align 4
%v3 = load i32, ptr %in3, align 4
into a single load <4 x i32>. The ScalarTy is i32 and the VF is 4. In the
future, SLP will turn the following code
SLP will make the following code
%v0 = load <4 x i32>, ptr %in0, align 4
%v1 = load <4 x i32>, ptr %in1, align 4
%v2 = load <4 x i32>, ptr %in2, align 4
%v3 = load <4 x i32>, ptr %in3, align 4
into a single load <16 x i32>. The ScalarTy is <4 x i32> and the VF is 4.
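A minimal sketch of what such a helper could look like (illustrative only;
the actual helper in SLPVectorizer.cpp may differ in signature and details):

```cpp
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Returns the vector type holding VF "lanes" of ScalarTy, where ScalarTy may
// itself already be a fixed vector type (the revectorization case).
static FixedVectorType *getWidenedType(Type *ScalarTy, unsigned VF) {
  if (auto *VecTy = dyn_cast<FixedVectorType>(ScalarTy))
    // e.g. ScalarTy = <4 x i32>, VF = 4  ->  <16 x i32>
    return FixedVectorType::get(VecTy->getElementType(),
                                VecTy->getNumElements() * VF);
  // e.g. ScalarTy = i32, VF = 4  ->  <4 x i32>
  return FixedVectorType::get(ScalarTy, VF);
}
```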
reference:
https://discourse.llvm.org/t/rfc-make-slp-vectorizer-revectorize-vector-instructions/79436
The patch tries to keep the original order of the instructions in
reductions. Previously, the first two instructions were swapped, producing
reverse order.
This is the first step towards supporting ordered reductions.
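For context, a plain C++ illustration (not part of the patch) of why the
instruction order matters for ordered reductions: floating-point addition is
not associative, so an in-order reduction must combine elements exactly in
the original order.

```cpp
#include <cstdio>

int main() {
  float A[4] = {1e8f, 1.0f, -1e8f, 1.0f};
  // In-order reduction: ((A[0] + A[1]) + A[2]) + A[3]
  float Ordered = ((A[0] + A[1]) + A[2]) + A[3];
  // A reassociated order gives a different result.
  float Reassoc = (A[0] + A[2]) + (A[1] + A[3]);
  std::printf("%f vs %f\n", Ordered, Reassoc); // 1.000000 vs 2.000000
  return 0;
}
```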
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/98025
The "instruction" reordering mode should be selected only if there are
compatible instructions in other operands, which can be reordered.
Otherwise, better to select splat reordering mode.
Metric: size..text
Program                                                                    results       results0      diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  12383340.00   12383324.00   -0.0%
Some 4x operations get replaced by 8x.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/97485
If an instruction is marked for deletion, it is better to drop all its
operands and mark them for deletion too (if allowed). This exposes more
vectorizable patterns and generates fewer useless extractelement
instructions.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/97409
Allows better codegen by freely resizing small-VF vector operands and then
applying regular shuffling to operands of the same size; it also simplifies
the code.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/97414
Currently the SLP vectorizer first tries to find reduction nodes and then
vectorizes buildvector sequences. Instead, it should first try to vectorize
wide buildvector sequences, then reductions, and only then the smaller
buildvector sequences.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/96943
I'm not super familiar with this code, but it seems that we were just
missing a check.
The original code that triggered this did not have uselistorders, but
llvm-reduce created them and reproduces the same issue in a much more
compact way.
Fixes https://github.com/llvm/llvm-project/issues/95016
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
This works towards the TODO comment to remove `raw_string_ostream::str()`.
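A small sketch of the preferred pattern (illustrative only, assuming the
current unbuffered `raw_string_ostream`):

```cpp
#include "llvm/Support/raw_ostream.h"
#include <string>

std::string formatValue(int V) {
  std::string Buffer;
  llvm::raw_string_ostream OS(Buffer);
  OS << "value = " << V;
  // Preferred: return the buffer the stream writes into directly,
  // instead of going through OS.str().
  return Buffer;
}
```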
The previous patch did not pass the list of extract indices by reference,
so the updates were applied to a copy and effectively ignored. Pass the
indices by reference and fix the per-register analysis.
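A generic illustration of the bug pattern (hypothetical names, not the
actual SLP code):

```cpp
#include "llvm/ADT/SmallVector.h"
using namespace llvm;

// Buggy: Indices is a by-value copy, so the caller never sees the entries
// collected by the callee.
static void collectExtractIndices(SmallVector<int> Indices) {
  Indices.push_back(0);
}

// Fixed: pass by reference so the per-register analysis sees the indices.
static void collectExtractIndicesFixed(SmallVectorImpl<int> &Indices) {
  Indices.push_back(0);
}
```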
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/96808
If the base node is signed but some values are unsigned, the whole node
should still be considered signed. Also, extra bitwidth analysis should be
performed when estimating the minimal bitwidth.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
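A minimal sketch of a resulting call site (assuming the iterator-only
`SetInsertPoint` overload described above; names are illustrative):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

Value *emitAddBefore(Instruction *InsertPt, Value *LHS, Value *RHS) {
  IRBuilder<> Builder(InsertPt->getContext());
  // Previously: Builder.SetInsertPoint(InsertPt->getParent(),
  //                                    InsertPt->getIterator());
  // Now the iterator alone is enough; its parent block is recovered from it.
  Builder.SetInsertPoint(InsertPt->getIterator());
  return Builder.CreateAdd(LHS, RHS);
}
```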
- Prevent null dereference: if the Mask given to
`ShuffleInstructionBuilder::adjustExtracts()` is empty or all-poison,
then `VecBase` will be `nullptr` and the call to
`castToScalarTyElem(VecBase)` will dereference it. Add an assert
to guard against this.
- Prevent use of an uninitialized scalar: in the unlikely event that
`CandidateVFs` is empty, `AnyProfitableGraph` would be uninitialized in the
`if` condition following the loop. (This seems like a false positive, but I
submitted this change anyway since initializing bools costs nothing and is
generally good practice.)
One of the previous patches introduced initial support for a non-power-of-2
number of elements, but some parts of the SLP vectorizer were still not
adjusted to handle the costs correctly. This patch fixes it by improving the
analysis of the non-power-of-2 number of elements and by fixing the cost of
extractelement instructions.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/93213
When trying to find a vector value in a shuffle of extractelements and one
of the vector values is undef, we need to generate a real mask value for
such a vector and use either an undef vector or the incoming second vector,
if it is non-poisonous.
We need to look through the SExt/ZExt scalars to be gathered when trying to
reduce their width after minbitwidth analysis, to prevent repeated attempts
to revectorize such gathered instructions.
We still need to do a full analysis of the signedness of the values rather
than relying on the instruction opcode when the opcode is SExt; it may
still produce an unsigned result.
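An illustrative example (plain C++, not the SLP analysis itself) of why the
SExt opcode alone is not enough:

```cpp
#include <cstdint>

int main() {
  int8_t Small = 100;             // sign bit is clear
  int32_t SExt = (int32_t)Small;  // sign extension
  uint32_t ZExt = (uint8_t)Small; // zero extension
  // For values whose sign bit is known to be clear, sext and zext agree,
  // so an SExt does not prove the value must be treated as signed.
  return SExt == (int32_t)ZExt ? 0 : 1; // returns 0
}
```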
We need to check that the signed operand has an extra sign bit to be sure
we do not lose signedness when trying to minimize the bitwidth for
smin/smax intrinsics.
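A worked example (plain C++ illustration): a value that fits in 8 bits but
has no spare sign bit breaks a naively narrowed smax:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
  int32_t A = 200, B = 100;      // both fit in 8 bits unsigned
  int32_t Wide = std::max(A, B); // smax at i32: 200
  // Narrowing to i8 reinterprets 200 as -56, so the narrowed smax picks 100.
  int8_t Narrow = std::max<int8_t>((int8_t)A, (int8_t)B);
  std::printf("%d vs %d\n", Wide, (int)Narrow); // 200 vs 100
  return 0;
}
```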
In some cases a masked gather is less profitable than an insert-subvector
of consecutive/strided loads. SLP has this kind of analysis, but it needs
to be improved by adding the cost of the GEPs to the analysis.
Also, the GEP cost estimation for masked gather is fixed.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/90737
After minbitwidth analysis, an `and <v>, (power_of_2 - 1 const)` can be
transformed into just an `and <v>, (all_ones const)`, which can be ignored
in cost estimation and in codegen. The x264 benchmark has this pattern.
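A small illustration (plain C++) of why the mask becomes a no-op after
narrowing:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 0xDEADBEEF;
  uint32_t Masked = X & 0xFFu;   // and with (power_of_2 - 1), i.e. 255
  uint8_t Narrowed = (uint8_t)X; // the value after narrowing to 8 bits
  // On the narrowed type the mask covers every remaining bit, so the 'and'
  // is an all-ones mask and can be dropped.
  assert((uint8_t)Masked == Narrowed);
  return 0;
}
```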
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/90739
Adds a transformation of reverse + consecutive vector store into a strided
store with stride -1, if it is profitable.
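For example (illustrative C++ pattern, not the patch itself), the following
stores consecutive elements in reverse order; as a vector operation it is a
store of the reversed value, i.e. a strided store with stride -1 on targets
with strided memory accesses:

```cpp
// Hypothetical scalar pattern the transformation targets.
void reverseStore(int *Dst, const int *Src) {
  Dst[3] = Src[0];
  Dst[2] = Src[1];
  Dst[1] = Src[2];
  Dst[0] = Src[3];
}
```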
Reviewers: RKSimon, preames
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/90464