llvm-project

Author	SHA1	Message	Date
Alexey Bataev	a6ef0864e9	Revert "[SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value." This reverts commit f70f1228035c9610de38e0e376afdacb647c4ad9 to fix the crash reported by https://lab.llvm.org/buildbot/#/builders/133/builds/2456.	2024-07-30 15:11:35 -07:00
David Green	89b67a6400	[SLP] Cluster SortedBases before sorting. (#101144 ) In order to enforce a strict-weak ordering, this patch clusters the bases that are being sorted by the root - the first value in a gep chain. The sorting is then performed in each cluster.	2024-07-30 22:12:20 +01:00
Alexey Bataev	f70f122803	[SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value. When trying to reuse extractelement instruction, need to check that it is inserted into proper position. Its original vector operand should come before new vector value, otherwise new extractelement instruction must be generated. Fixes https://github.com/llvm/llvm-project/issues/101213	2024-07-30 14:04:50 -07:00
Alexey Bataev	197f4a9051	[SLP]Remove ExtraArgs from reductions. No need to handle extra arguments during the reductions anymore, the compiler now can handle all reduced values and reduction operands correctly, even if they are from different basic blocks. Simplifies analysis, reduces compiler size, improves overall vectorization.
Metric: size..text
test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test 16668.00 17148.00 2.9%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 2389675.00 2418683.00 1.2%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 253517.00 253645.00 0.1%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309678.00 309806.00 0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 389203.00 389363.00 0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111120.00 111152.00 0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1039103.00 1039215.00 0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1155883.00 1155963.00 0.0%
test-suite :: MicroBenchmarks/LoopVectorization/LoopInterleavingBenchmarks.test 276646.00 276662.00 0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 848691.00 848739.00 0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1138604.00 1138636.00 0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 910201.00 910217.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12385484.00 12385628.00 0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 9667580.00 9667676.00 0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 9667580.00 9667676.00 0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2856182.00 2856198.00 0.0%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2856182.00 2856198.00 0.0%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 773224.00 773192.00 -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035148.00 1035084.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 98126.00 98094.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 97966.00 97934.00 -0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 167391.00 167215.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test 56685.00 56605.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test 56685.00 56605.00 -0.1%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-20050826-2.test 1302.00 1294.00 -0.6%
Misc-C++/stepanov_container - better code due to cost fixes.
483.xalancbmk - better code due to cost fixes.
ASCI_Purple/SMG2000 - better code due to cost fixes.
Benchmarks/Bullet - better vector code because of the cost.
JM/ldecod - extra code remains scalar, extra reduction vectorized
consumer-jpeg - extra code remains scalar because of the cost.
tramp3d-v4 - better vectorization because of cost fixes.
511.povray_r - better vectorization because of cost fixes.
LoopInterleavingBenchmarks - extra reductions are vectorized
JM/lencod - small changes in vector code because of extract cost fixes.
453.povray - small changes in vector code because of extract cost fixes.
445.gobmk - extra small reduction vectorized
526.blender_r - extra reduced scalars, better small reduction, small changes in the vectorization because of the fixes for extracts cost
602.gcc_s 502.gcc_r - small changes in reductions vectorization because of the fixes in the extract cost.
631.deepsjeng_s 623.xalancbmk_s - small changes in reductions vectorization because of the fixes in the extract cost.
MallocBench/gs - extra code remains scalar because of extracts cost
alacconvert-encode - extra code remains scalar because of extracts cost
alacconvert-decode - extra code remains scalar because of extracts cost
GCC-C-execute-20050826-2 - extra reduction gets vectorized
Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99923	2024-07-29 13:23:56 -04:00
David Green	f2d2ae3f5a	[SLP] Order clustered load base pointers by ascending offsets (#100653 ) This attempts to fix a regression from #98025, where the new order of reduction nodes causes later passes to not be able to produce as nice shuffles. The issue boils down to picking an order of [0 1 3 2] for loaded v4i8 values, which meant later parts could not find a simpler ordering for the shuffles given the legal nodes available in AArch64. If instead we make sure they are ordered [0 1 2 3] then everything can fall into place. In order to produce a better order that is more likely to work in more cases, this patch takes the existing clustered loads and sorts the base pointers if there is an order between them, i.e., if `V2 == gep (V1, X)` then V1 is sorted before V2.	2024-07-27 11:18:56 +01:00
Alexey Bataev	1e1c8d1615	[SLP]Add external uses cost for the gathered loads. If the load is a part of the gather node and also a part of the vectorized subvector, need to add the estimation for the non-vectorized external uses. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99889	2024-07-26 11:09:44 -04:00
Han-Kuan Chen	5fc9502f19	[SLP] NFC. ShuffleInstructionBuilder::add V1->getType() is always a FixedVectorType. (#99842 ) castToScalarTyElem has a cast<VectorType>(V->getType()).	2024-07-24 01:40:24 +08:00
Alexey Bataev	3cb82f49dc	[SLP]Fix PR99899: Use canonical type instead of original vector of ptr. Use adjusted canonical integer type instead of the original ptr type to fix the crash in the TTI. Fixes https://github.com/llvm/llvm-project/issues/99899	2024-07-22 13:05:12 -07:00
Alexey Bataev	f6e01b9ece	[SLP]Do not trunc bv nodes, if the user is vectorized an requires wider type. If at least a single user of the gathered trunc'ed instruction is vectorized and requires a wider type than the trunc node, such gathers/buildvectors should not be optimized for better bitwidth.	2024-07-19 07:28:04 -07:00
Yangyu Chen	007aa6d1b2	[SLP] Increase UsesLimit to 64 (#99467 ) Since commit 82b800ecb35fb46881aa52000fa40b1b99aa654e addressed the issue #99327, we see some performance regression (13%) on some verilator generated C++ code. This is because the UsesLimit is set to 8, which is too small for the verilator generated code. I have analyzed the need for the UsesLimit from [1] and found that the UsesLimit should be at least 64 to cover most of these cases. Thus, this patch increases the UsesLimit to 64. Link: https://github.com/llvm/llvm-project/issues/99327#issuecomment-2236052879 [1] Signed-off-by: Yangyu Chen <cyy@cyyself.name>	2024-07-19 20:32:28 +08:00
Han-Kuan Chen	39bb244a16	[SLP][REVEC] Make Instruction::Call support vector instructions. (#99317 )	2024-07-18 20:49:53 +08:00
Han-Kuan Chen	b634e057dd	[SLP][REVEC] Fix false assumption of the source for castToScalarTyElem. (#99424 ) The argument V may come from adjustExtracts, which is the vector operand of ExtractElementInst. In addition, it does not exist in getTreeEntry. The vector operand of ExtractElementInst may have a type of <1 x Ty>, ensuring that the number of elements in ScalarTy and VecTy are equal. reference: https://github.com/llvm/llvm-project/issues/99411	2024-07-18 19:54:46 +08:00
Alexey Bataev	82b800ecb3	[SLP][NFC]Limit number of the external uses analysis, NFC. BoUpSLP::buildExternalUses runs through all the users of the vectorized scalars, which may require a significant amount of time if there are too many users. Limit the analysis: if there are too many users, all of them are replaced at once rather than individually.	2024-07-17 14:12:22 -07:00
Alexey Bataev	c5c1bd164f	[SLP]Improve minbitwidth analysis for trun'ed gather nodes. If the gather node is trunc'ed, better to trunc scalars and then gather them rather than gather and then trunc. Trunc for scalars is free in most cases. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99072	2024-07-17 07:41:00 -07:00
Alexey Bataev	05b067b5f9	Revert "[SLP]Improve minbitwidth analysis for trun'ed gather nodes." This reverts commit d3d2f9a4208eedbd2f372c34725ab61c3f4d3aed to fix buildbot https://lab.llvm.org/buildbot/#/builders/92/builds/1880.	2024-07-17 07:31:27 -07:00
Alexey Bataev	d3d2f9a420	[SLP]Improve minbitwidth analysis for trun'ed gather nodes. If the gather node is trunc'ed, better to trunc scalars and then gather them rather than gather and then trunc. Trunc for scalars is free in most cases. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99072	2024-07-17 07:29:02 -07:00
Alexey Bataev	b05ccaf451	Revert "[SLP]Improve minbitwidth analysis for trun'ed gather nodes." This reverts commit 6425f2d66740b84fc3027b649cd4baf660c384e8 to fix the buildbot issues reported in https://lab.llvm.org/buildbot/#/builders/95/builds/1404.	2024-07-17 05:51:54 -07:00
Han-Kuan Chen	1813ffd6b2	[SLP][REVEC] Make SLP support revectorization (-slp-revec) and add simple test. (#98269 ) This PR will make SLP support revectorization. Add an option -slp-revec to control the functionality. reference: https://discourse.llvm.org/t/rfc-make-slp-vectorizer-revectorize-vector-instructions/79436	2024-07-17 20:14:12 +08:00
Alexey Bataev	6425f2d667	[SLP]Improve minbitwidth analysis for trun'ed gather nodes. If the gather node is trunc'ed, better to trunc scalars and then gather them rather than gather and then trunc. Trunc for scalars is free in most cases. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99072	2024-07-17 07:17:25 -04:00
Alexey Bataev	15915c06d5	[SLP]Do not vectorize small (<=2) buildvector/buildvalue sequences with MaxVF==true. If MaxVFOnly for buildvector/buildvalue vectorization is set to true and the total number of elements to vectorize is <= 2, better to try to vectorize reductions at first, which may produce larger tree (reductions have a limit of at least 4 elements to vectorize). Smaller buildvector/buildvalue sequence will be attempted to vectorize later, with MaxVFOnly set to false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98957	2024-07-16 12:45:58 -04:00
Alexey Bataev	8ff233f4f1	[SLP]Correctly detect minnum/maxnum patterns for select/cmp operations on floats. The patch enables detection of minnum/maxnum patterns for floating-point instructions represented as select/cmp. It also enables better cost estimation for integer min/max patterns, since the compiler starts to estimate the scalars separately. Reviewers: nikic, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98570	2024-07-16 09:42:08 -07:00
Alexey Bataev	c3540d0b6b	Revert "[SLP]Correctly detect minnum/maxnum patterns for select/cmp operations on floats." This reverts commit c7aac38c29f564bc48f7cfb71d3b3b8b482c873b to fix crashes revealed by the buildbot in https://lab.llvm.org/buildbot/#/builders/168/builds/1104.	2024-07-16 05:59:59 -07:00
Alexey Bataev	c7aac38c29	[SLP]Correctly detect minnum/maxnum patterns for select/cmp operations on floats. The patch enables detection of minnum/maxnum patterns for floating-point instructions represented as select/cmp. It also enables better cost estimation for integer min/max patterns, since the compiler starts to estimate the scalars separately. Reviewers: nikic, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98570	2024-07-16 08:14:27 -04:00
Alexey Bataev	beccecaacd	[SLP]Fix PR98838: do no replace condition of select-based logical op by poison. If the reduction operation is a select-based logical op, the condition should not be replaced by poison; it is better to replace it with a non-poisoning constant to prevent poison propagation in the vector code. Fixes https://github.com/llvm/llvm-project/issues/98838	2024-07-15 07:27:54 -07:00
Alexey Bataev	9e261c5bee	[SLP]Do not salvage debug info from instructions, marked for deletion already. If the instruction was processed already for the deletion, no need to process it second time, it may cause compiler crash.	2024-07-12 08:08:50 -07:00
Alexey Bataev	01a9888694	[SLP][NFC]Add isGather() function and use it instead direct comparison, NFC.	2024-07-11 11:56:32 -07:00
Alexey Bataev	3742c2a83c	[SLP]Use stored signedness after minbitwidth analysis. Need to use the stored signedness info for the root node instead of recalculating it after the vectorization, which may lead to a compiler crash.	2024-07-10 03:58:00 -07:00
Han-Kuan Chen	ac299ed2c7	[SLP] Provide an universal interface for FixedVectorType::get. NFC. (#96845 ) SLP vectorizes scalar types into vector types. In the future, we will try to make SLP vectorize vector types into wider vector types. We add getWidenedType as a helper function. For example, SLP will make the following code
    %v0 = load i32, ptr %in0, align 4
    %v1 = load i32, ptr %in1, align 4
    %v2 = load i32, ptr %in2, align 4
    %v3 = load i32, ptr %in3, align 4
into a load <4 x i32>. The ScalarTy is i32 and VF is 4. In the future, SLP will make the following code
    %v0 = load <4 x i32>, ptr %in0, align 4
    %v1 = load <4 x i32>, ptr %in1, align 4
    %v2 = load <4 x i32>, ptr %in2, align 4
    %v3 = load <4 x i32>, ptr %in3, align 4
into a load <16 x i32>. The ScalarTy is <4 x i32> and VF is 4. reference: https://discourse.llvm.org/t/rfc-make-slp-vectorizer-revectorize-vector-instructions/79436	2024-07-10 11:50:35 +08:00
Alexey Bataev	af21bc1917	[SLP]Fix a crash on attempt to revectorize vectorized phi. If the PHI node is vectorized during vectorization of its operands, no need to try to vectorize its operands once again.	2024-07-09 14:11:08 -07:00
Alexey Bataev	822a818786	[SLP][NFC]Add comments for the code, NFC.	2024-07-09 10:06:34 -07:00
Alexey Bataev	a988821123	[SLP]Keep the original order in the reductions. The patch tries to keep the original order of the instructions in the reductions. Previously, the first two instructions were swapped, giving reverse order. This is the first step toward supporting ordered reductions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98025	2024-07-09 12:26:42 -04:00
Alexey Bataev	2cba218ca5	[SLP]Fix PR98133: Inserting PHI after debug-records! The phi-node-to-be-deleted still should be inserted as the first instruction in the block to avoid random compiler crashes. Fixes https://github.com/llvm/llvm-project/issues/98133	2024-07-09 05:44:45 -07:00
Alexey Bataev	f5ee07a1b5	[SLP]Improve instruction reordering mode detection. The "instruction" reordering mode should be selected only if there are compatible instructions in other operands, which can be reordered. Otherwise, better to select splat reordering mode. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12383340.00 12383324.00 -0.0% Some 4x operations get replaced by 8x. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97485	2024-07-08 16:01:55 -04:00
Alexey Bataev	385118644c	[SLP]Remove operands upon marking instruction for deletion. If the instruction is marked for deletion, better to drop all its operands and mark them for deletion too (if allowed). This enables more vectorizable patterns and generates fewer useless extractelement instructions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97409	2024-07-08 07:56:48 -07:00
Alexey Bataev	4c47b41771	[SLP]Allow matching and shuffling of extractelement vector operands with different VF. Allows better codegen with the free resizing of small VF vector operands and then regular shuffling of the operands of the same size and simplifies the code. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97414	2024-07-08 09:27:08 -04:00
tcwzxx	c2fe75f99c	Make the logic for checking scatter vectorized nodes of GEP clearer (#97826 ) There is no functional change. Authored-by: zhizhixu <zhizhixu@tencent.com>	2024-07-08 06:08:04 -04:00
Kazu Hirata	75bc20ff89	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914 )	2024-07-07 08:23:41 +09:00
Jon Roelofs	d3a76b03d8	[llvm][SLPVectorizer] Fix a bad cast assertion (#97621 ) Fixes: rdar://128092379	2024-07-03 16:25:32 -07:00
Alexey Bataev	873c3f7e78	Revert "[SLP]Remove operands upon marking instruction for deletion." This reverts commit bbd52dd44ceee80e3b6ba6a9b2bd8ee9a9713833 to fix a crash revealed in https://lab.llvm.org/buildbot/#/builders/4/builds/505	2024-07-03 13:05:17 -07:00
Alexey Bataev	bbd52dd44c	[SLP]Remove operands upon marking instruction for deletion. If the instruction is marked for deletion, better to drop all its operands and mark them for deletion too (if allowed). This enables more vectorizable patterns and generates fewer useless extractelement instructions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/97409	2024-07-03 15:11:18 -04:00
Alexey Bataev	4eecf3c650	[SLP]Reorder buildvector/reduction vectorization and fuse the loops. Currently SLP vectorizer tries at first to find reduction nodes, and then vectorize buildvector sequences. Need to try to vectorize wide buildvector sequences at first and only then try to vectorize reductions, and then smaller buildvector sequences. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/96943	2024-07-03 14:36:30 -04:00
Gabriel Baraldi	380beaec86	Fix potential crash in SLPVectorizer caused by missing check (#95937 ) I'm not super familiar with this code, but it seems that we were just missing a check. The original code that triggered this did not have uselistorders but llvm-reduce created them and it reproduces the same issue in a way more compact way. Fixes https://github.com/llvm/llvm-project/issues/95016	2024-07-02 08:15:51 -04:00
Youngsuk Kim	2051736f7b	[llvm][Transforms] Avoid 'raw_string_ostream::str' (NFC) Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.	2024-06-30 09:03:29 -05:00
Alexey Bataev	d70963a762	[SLP]Fix the cost of the adjusted extracts in per-register analysis. Previous patch did not pass the list of the extract indices by reference, so the compiler just ignored them. Pass indices by reference and fix the per-register analysis. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/96808	2024-06-28 14:33:08 -07:00
Alexey Bataev	a9c12e481b	Revert "[SLP]Fix the cost of the adjusted extracts in per-register analysis." This reverts commit 784152056ea40a800a8fd9f4157a428dfb7a6de8 to fix buildbots issues reported in https://lab.llvm.org/buildbot/#/builders/4/builds/315 and https://lab.llvm.org/buildbot/#/builders/35/builds/481	2024-06-28 13:41:51 -07:00
Alexey Bataev	784152056e	[SLP]Fix the cost of the adjusted extracts in per-register analysis. Previous patch did not pass the list of the extract indices by reference, so the compiler just ignored them. Pass indices by reference and fix the per-register analysis. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/96808	2024-06-28 15:49:47 -04:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919 ) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Alexey Bataev	6f582b7ed3	[SLP][NFC]Remove extra check for VU.	2024-06-26 05:39:37 -07:00
Alexey Bataev	0280f97b36	[SLP]Fix PR95925: extract vectorized index of the potential buildvector sequence. If the vectorized scalar is not the insert value in the buildvector sequence but the index, it should be always extracted.	2024-06-25 14:07:51 -07:00
Alexey Bataev	228c2e1473	[SLP]Fix incorrect promotion of nodes before shuffling. If the base node is signed, but some values are unsigned, still the whole node should be considered signed. Also, an extra bitwidth analysis should be performed, when estimating the minimal bitwidth.	2024-06-25 13:39:28 -07:00

1801 Commits