llvm-project

Author	SHA1	Message	Date
Alexey Bataev	20b2c9f10f	[SLP][NFC]Use GatheredScalars vector instead of the original E->Scalars, NFC. GatheredScalars is a full copy of the E->Scalars in these places and can be safely used for now. Unifies the code across the function.	2024-08-14 08:29:38 -07:00
Alexey Bataev	d9b9ae6ba9	[SLP][NFC]Use transform nodes before building external uses, NFC. In preparation for upcoming patches, just moving the call to the proper place, which is NFC for now.	2024-08-14 08:19:05 -07:00
Han-Kuan Chen	246f345152	[SLP][REVEC] Make CastInst support vector instructions. (#103216 )	2024-08-13 23:52:32 +08:00
Han-Kuan Chen	6aad4918e8	[SLP][REVEC] Make MinBWs support vector instructions. (#103049 ) If ScalarTy is FixedVectorType, it should remain as FixedVectorType.	2024-08-13 21:35:28 +08:00
Han-Kuan Chen	2256d00a14	[SLP][REVEC] Use VL.front()->getType() as ScalarTy. (#102437 ) VL.front()->getType() may be FixedVectorType when revec is enabled. Fix "Expected item in MinBWs.".	2024-08-13 19:53:45 +08:00
Han-Kuan Chen	875b551de7	[SLP][REVEC] Make computeMinimumValueSizes and collectValuesToDemote support vector instructions. (#103005 )	2024-08-13 19:35:25 +08:00
Vitaly Buka	5ce47a5813	Reland "[Support] Assert that DomTree nodes share parent" (#102782 ) A dominance query of a block that is in a different function is ill-defined, so assert that getNode() is only called for blocks that are in the same function. There are three cases, where this behavior did occur. LoopFuse didn't explicitly do this, but didn't invalidate the SCEV block dispositions, leaving dangling pointers to free'ed basic blocks behind, causing use-after-free. We do, however, want to be able to dereference basic blocks inside the dominator tree, so that we can refer to them by a number stored inside the basic block. Reverts #102780 Reland #101198 Fixes #102784 Co-authored-by: Alexis Engelke <engelke@in.tum.de>	2024-08-13 11:56:02 +02:00
Han-Kuan Chen	b4b0c02306	[SLP][REVEC] Make tryToReduce and related functions support vector instructions. (#102327 )	2024-08-13 11:44:23 +08:00
Han-Kuan Chen	70cf58e6c1	[SLP][REVEC] Make SLP vectorize shufflevector. (#102489 ) Add getShufflevectorNumGroups to vectorize shufflevector. Current getShufflevectorNumGroups can only vectorize limited pattern (e.g., the masks of shufflevector use the elements of the source in order). In addition, ReuseShuffleIndices and ReorderIndices are not supported.	2024-08-13 11:19:29 +08:00
Alexey Bataev	ecbbe5b431	[SLP]Fix mask building for alternate node cost estimation (#102966 ) Need to use the same functionality in the cost model as for the codegen, to correctly build the shuffle mask and estimate the cost.	2024-08-12 17:26:56 -04:00
Alexey Bataev	b10ecfa914	[SLP]Represent externally used values as original scalars, if profitable. Currently SLP vectorizer tries to keep only GEPs as scalar, if they are vectorized but used externally. Same approach can be used for all scalar values. This patch tries to keep original scalars if all its operands remain scalar or externally used, the cost of the original scalar is lower than the cost of the extractelement instruction, or if the number of externally used scalars in the same entry is power of 2. Last criterion allows better revectorization for multiply used scalars. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/100904	2024-08-12 10:15:02 -04:00
Alexey Bataev	34514ce09a	[SLP][NFC]Use local getShuffleCost function across the code, NFC.	2024-08-12 06:49:53 -07:00
Alexey Bataev	2a05971de2	[SLP]Add index of the node to the short name output. Improves debugging experience, does nothing with the functionality.	2024-08-08 08:57:14 -07:00
Han-Kuan Chen	7a4fc7491c	[SLP][REVEC] Fix insertelement has multiple uses. (#102329 )	2024-08-08 23:23:10 +08:00
Alexey Bataev	7e7a439705	[SLP][NFC]Introduce CombinedVectorize nodes, NFC. (#99309 ) This adds combined vectorized node. It simplifies handling of the combined nodes, like select/cmp, which can be reduced to min/max, mul/add transformed to fma, etc. Improves cost mode handling and may end up with better codegen in future (direct emission of the intrinsics).	2024-08-08 08:05:33 -04:00
Han-Kuan Chen	60ac34701e	[SLP][REVEC] Make getAltInstrMask and getGatherCost vectorize vector instructions. (#99461 )	2024-08-08 10:39:01 +08:00
John McIver	bb82c79d3b	[SLP] Enable optimization of freeze instructions (#102217 ) Allow SLP optimization to progress in the presence of freeze instructions. Prior to this commit, freeze instructions blocked SLP optimization. The following URL shows correctness of the addsub_freeze test: https://alive2.llvm.org/ce/z/qm38oh	2024-08-07 15:01:37 -04:00
Han-Kuan Chen	97743b8be8	[SLP][REVEC] Make ShuffleCostEstimator and ShuffleInstructionBuilder support vector instructions. (#99499 ) 1. When REVEC is enabled, we need to expand vector types into scalar types. 2. When REVEC is enabled, CreateInsertVector (and CreateExtractVector) is used because the scalar type may be a FixedVectorType. 3. Since the mask indices which are used by processBuildVector expect the source is scalar type, we need to transform the mask indices into a form which can be used when REVEC is enabled. The transform is only called when the mask is really used.	2024-08-07 23:47:57 +08:00
Alexey Bataev	441f94f4bd	[SLP]Fix PR102279: check the tracked values for extractelements, not the original values. If the reduced value was replaced by the extractelement instruction during vectorization and we attempt to check if this is so, need to check the tracked value, not the original (deleted) instruction. Otherwise, the compiler may crash. Fixes https://github.com/llvm/llvm-project/issues/102279	2024-08-07 04:21:24 -07:00
tcwzxx	b64ec3c9fa	[SLP] The order of store chains needs to consider the size of the values. (#101810 ) When store chains have the same value type ID and pointer type ID, they may mix different sizes of values, such as i8 and i64. This can lead to missed vectorization opportunities.	2024-08-07 11:01:53 +08:00
Alexey Bataev	af80d3a248	[SLP]Better sorting of phi instructions by comparing type sizes (#102188 ) Currently SLP vectorizer compares phi instructions by the type id of the compared instructions, which may fail in case of different integer types with different sizes. Patch adds comparison by type sizes to fix this.	2024-08-06 16:09:11 -04:00
Alexey Bataev	2601d6f189	[SLP]Fix PR102187: do not insert extractelement before landingpad instruction. Landingpad instruction must be the very first instruction after the phi nodes, so need to insert extractelement/shuffles after this instruction. Fixes https://github.com/llvm/llvm-project/issues/102187	2024-08-06 12:33:13 -07:00
Alexey Bataev	3c3ea7e751	[SLP]Better sorting of cmp instructions by comparing type sizes. Currently SLP vectorizer compares cmp instructions by the type id of the compared operands, which may fail in case of different integer types, for example, which have the same type id but different sizes. Patch adds comparison by type sizes to fix this. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/102132	2024-08-06 11:03:36 -04:00
Alexey Bataev	daf4a06e5c	[SLP]Try detect strided loads, if any pointer op require extraction. If any pointer operand of the non-consecutive loads is an instruction with a user which is not part of the current graph and, thus, requires emission of the extractelement instruction, better to try to detect if the load sequence can be represented as a strided load, so that extractelement instructions for pointers are not required. Reviewers: preames, RKSimon, topperc Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/101668	2024-08-06 09:20:50 -04:00
Alexey Bataev	799fd3d87b	[SLP]Support vectorization of small strided loads only graph. If the graph includes only strided loads node, the compiler should still try to vectorize it. Reviewers: RKSimon, preames, topperc Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/101659	2024-08-05 12:51:10 -04:00
Kazu Hirata	b7146aed5b	[Transforms] Construct SmallVector with ArrayRef (NFC) (#101851 )	2024-08-03 15:33:08 -07:00
Florian Hahn	edf46f365c	[SCEV] Use const SCEV * explicitly in more places. Use const SCEV * explicitly in more places to prepare for https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.	2024-08-03 20:10:01 +01:00
Han-Kuan Chen	b5a7d3b6c2	[SLP][REVEC] Make Instruction::Select support vector instructions. (#100507 )	2024-07-31 23:03:50 +08:00
Alexey Bataev	6b1d13761a	[SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value. When trying to reuse extractelement instruction, need to check that it is inserted into proper position. Its original vector operand should come before new vector value, otherwise new extractelement instruction must be generated. Fixes https://github.com/llvm/llvm-project/issues/101213	2024-07-30 16:02:46 -07:00
Alexey Bataev	a6ef0864e9	Revert "[SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value." This reverts commit f70f1228035c9610de38e0e376afdacb647c4ad9 to fix the crash reported by https://lab.llvm.org/buildbot/#/builders/133/builds/2456.	2024-07-30 15:11:35 -07:00
David Green	89b67a6400	[SLP] Cluster SortedBases before sorting. (#101144 ) In order to enforce a strict-weak ordering, this patch clusters the bases that are being sorted by the root - the first value in a gep chain. The sorting is then performed in each cluster.	2024-07-30 22:12:20 +01:00
Alexey Bataev	f70f122803	[SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value. When trying to reuse extractelement instruction, need to check that it is inserted into proper position. Its original vector operand should come before new vector value, otherwise new extractelement instruction must be generated. Fixes https://github.com/llvm/llvm-project/issues/101213	2024-07-30 14:04:50 -07:00
Alexey Bataev	197f4a9051	[SLP]Remove ExtraArgs from reductions. No need to handle extra arguments during the reductions anymore, the compiler now can handle all reduced values and reduction operands correctly, even if they are from different basic blocks. Simplifies analysis, reduces compiler size, improves overall vectorization.
Metric: size..text
test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test 16668.00 17148.00 2.9%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 2389675.00 2418683.00 1.2%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 253517.00 253645.00 0.1%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309678.00 309806.00 0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 389203.00 389363.00 0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111120.00 111152.00 0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1039103.00 1039215.00 0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1155883.00 1155963.00 0.0%
test-suite :: MicroBenchmarks/LoopVectorization/LoopInterleavingBenchmarks.test 276646.00 276662.00 0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 848691.00 848739.00 0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1138604.00 1138636.00 0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 910201.00 910217.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12385484.00 12385628.00 0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 9667580.00 9667676.00 0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 9667580.00 9667676.00 0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2856182.00 2856198.00 0.0%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2856182.00 2856198.00 0.0%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 773224.00 773192.00 -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035148.00 1035084.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 98126.00 98094.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 97966.00 97934.00 -0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 167391.00 167215.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test 56685.00 56605.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test 56685.00 56605.00 -0.1%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-20050826-2.test 1302.00 1294.00 -0.6%
Misc-C++/stepanov_container - better code due to cost fixes.
483.xalancbmk - better code due to cost fixes.
ASCI_Purple/SMG2000 - better code due to cost fixes.
Benchmarks/Bullet - better vector code because of the cost.
JM/ldecod - extra code remain scalar, extra reduction vectorized
consumer-jpeg - extra code remain scalar because of the cost.
tramp3d-v4 - better vectorization because of cost fixes.
511.povray_r - better vectorization because of cost fixes.
LoopInterleavingBenchmarks - extra reductions are vectorized
JM/lencod - small changes in vector code because of extract cost fixes.
453.povray - small changes in vector code because of extract cost fixes.
445.gobmk - extra small reduction vectorized
526.blender_r - extra reduced scalars, better small reduction, small changes in the vectorization because of the fixes for extracts cost
602.gcc_s 502.gcc_r - small changes in reductions vectorization because of the fixes in the extract cost.
631.deepsjeng_s 623.xalancbmk_s - small changes in reductions vectorization because of the fixes in the extract cost.
MallocBench/gs - extra code remain scalar because of extracts cost
alacconvert-encode - extra code remain scalar because of extracts cost
alacconvert-decode - extra code remain scalar because of extracts cost
GCC-C-execute-20050826-2 - extra reduction gets vectorized
Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99923	2024-07-29 13:23:56 -04:00
David Green	f2d2ae3f5a	[SLP] Order clustered load base pointers by ascending offsets (#100653 ) This attempts to fix a regression from #98025, where the new order of reduction nodes causes later passes to not be able to produce as nice shuffles. The issue boils down to picking an order of [0 1 3 2] for loaded v4i8 values, which meant later parts could not find a simpler ordering for the shuffles given the legal nodes available in AArch64. If instead we make sure they are ordered [0 1 2 3] then everything can fall into place. In order to produce a better order that is more likely to work in more cases, this patch takes the existing clustered loads and sort the base pointers if there is an order between them. i.e if `V2 == gep (V1, X)` then V1 is sorted before V2.	2024-07-27 11:18:56 +01:00
Alexey Bataev	1e1c8d1615	[SLP]Add external uses cost for the gathered loads. If the load is a part of the gather node and also a part of the vectorized subvector, need to add the estimation for the non-vectorized external uses. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99889	2024-07-26 11:09:44 -04:00
Han-Kuan Chen	5fc9502f19	[SLP] NFC. ShuffleInstructionBuilder::add V1->getType() is always a FixedVectorType. (#99842 ) castToScalarTyElem has a cast<VectorType>(V->getType()).	2024-07-24 01:40:24 +08:00
Alexey Bataev	3cb82f49dc	[SLP]Fix PR99899: Use canonical type instead of original vector of ptr. Use adjusted canonical integer type instead of the original ptr type to fix the crash in the TTI. Fixes https://github.com/llvm/llvm-project/issues/99899	2024-07-22 13:05:12 -07:00
Alexey Bataev	f6e01b9ece	[SLP]Do not trunc bv nodes, if the user is vectorized and requires wider type. If at least a single user of the gathered trunc'ed instruction is vectorized and requires a wider type than the trunc node, such gathers/buildvectors should not be optimized for better bitwidth.	2024-07-19 07:28:04 -07:00
Yangyu Chen	007aa6d1b2	[SLP] Increase UsesLimit to 64 (#99467 ) Since commit 82b800ecb35fb46881aa52000fa40b1b99aa654e addressed the issue #99327 , we see some performance regression (13%) on some verilator generated C++ code. This is because the UsesLimit is set to 8, which is too small for the verilator generated code. I have analyzed the need for the UsesLimit from [1] and found that the UsesLimit should be at least 64 to cover most of these cases. Thus, this patch increases the UsesLimit to 64. Link: https://github.com/llvm/llvm-project/issues/99327#issuecomment-2236052879 [1] Signed-off-by: Yangyu Chen <cyy@cyyself.name>	2024-07-19 20:32:28 +08:00
Han-Kuan Chen	39bb244a16	[SLP][REVEC] Make Instruction::Call support vector instructions. (#99317 )	2024-07-18 20:49:53 +08:00
Han-Kuan Chen	b634e057dd	[SLP][REVEC] Fix false assumption of the source for castToScalarTyElem. (#99424 ) The argument V may come from adjustExtracts, which is the vector operand of ExtractElementInst. In addition, it does not exist in getTreeEntry. The vector operand of ExtractElementInst may have a type of <1 x Ty>, ensuring that the number of elements in ScalarTy and VecTy are equal. reference: https://github.com/llvm/llvm-project/issues/99411	2024-07-18 19:54:46 +08:00
Alexey Bataev	82b800ecb3	[SLP][NFC]Limit number of the external uses analysis, NFC. BoUpSLP::buildExternalUses runs through all the users of the vectorized scalars, which may require a significant amount of time if there are too many users. Limit the analysis: if there are too many users, all of them are replaced at once rather than individually.	2024-07-17 14:12:22 -07:00
Alexey Bataev	c5c1bd164f	[SLP]Improve minbitwidth analysis for trun'ed gather nodes. If the gather node is trunc'ed, better to trunc scalars and then gather them rather than gather and then trunc. Trunc for scalars is free in most cases. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99072	2024-07-17 07:41:00 -07:00
Alexey Bataev	05b067b5f9	Revert "[SLP]Improve minbitwidth analysis for trun'ed gather nodes." This reverts commit d3d2f9a4208eedbd2f372c34725ab61c3f4d3aed to fix buildbot https://lab.llvm.org/buildbot/#/builders/92/builds/1880.	2024-07-17 07:31:27 -07:00
Alexey Bataev	d3d2f9a420	[SLP]Improve minbitwidth analysis for trun'ed gather nodes. If the gather node is trunc'ed, better to trunc scalars and then gather them rather than gather and then trunc. Trunc for scalars is free in most cases. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99072	2024-07-17 07:29:02 -07:00
Alexey Bataev	b05ccaf451	Revert "[SLP]Improve minbitwidth analysis for trun'ed gather nodes." This reverts commit 6425f2d66740b84fc3027b649cd4baf660c384e8 to fix the buildbot issues reported in https://lab.llvm.org/buildbot/#/builders/95/builds/1404.	2024-07-17 05:51:54 -07:00
Han-Kuan Chen	1813ffd6b2	[SLP][REVEC] Make SLP support revectorization (-slp-revec) and add simple test. (#98269 ) This PR will make SLP support revectorization. Add an option -slp-revec to control the functionality. reference: https://discourse.llvm.org/t/rfc-make-slp-vectorizer-revectorize-vector-instructions/79436	2024-07-17 20:14:12 +08:00
Alexey Bataev	6425f2d667	[SLP]Improve minbitwidth analysis for trun'ed gather nodes. If the gather node is trunc'ed, better to trunc scalars and then gather them rather than gather and then trunc. Trunc for scalars is free in most cases. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/99072	2024-07-17 07:17:25 -04:00
Alexey Bataev	15915c06d5	[SLP]Do not vectorize small (<=2) buildvector/buildvalue sequences with MaxVF==true. If MaxVFOnly for buildvector/buildvalue vectorization is set to true and the total number of elements to vectorize is <= 2, better to try to vectorize reductions at first, which may produce larger tree (reductions have a limit of at least 4 elements to vectorize). Smaller buildvector/buildvalue sequence will be attempted to vectorize later, with MaxVFOnly set to false. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98957	2024-07-16 12:45:58 -04:00
Alexey Bataev	8ff233f4f1	[SLP]Correctly detect minnum/maxnum patterns for select/cmp operations on floats. The patch enables detection of minnum/maxnum patterns for floating-point instructions represented as select/cmp. Also enables better cost estimation for integer min/max patterns, since the compiler starts to estimate the scalars separately. Reviewers: nikic, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/98570	2024-07-16 09:42:08 -07:00

1830 Commits