Before vectorizing the PHI nodes, the compiler sorts them by the opcodes of
their operands. If a scalar is replaced by an extractelement during
vectorization, this sorting is broken, which prevents some further
vectorization attempts. The patch tries to improve this by doing extra
analysis of the scalars and keeping them if it is found that such a scalar is
used in another (external) PHI node in the same block.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/103923
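Below is a minimal sketch, with a hypothetical helper name, of the extra
analysis described above; the in-tree patch is more involved, but the core
check is whether a vectorized scalar still feeds a PHI node in its own block:
```cpp
#include "llvm/IR/Instructions.h"

// Sketch only (not the actual patch code): a vectorized scalar is worth
// keeping if another PHI node in the same basic block still uses it, instead
// of rewriting all of its uses to an extractelement.
static bool hasExternalPHIUserInSameBlock(const llvm::Instruction *Scalar) {
  const llvm::BasicBlock *BB = Scalar->getParent();
  for (const llvm::User *U : Scalar->users())
    if (auto *Phi = llvm::dyn_cast<llvm::PHINode>(U))
      if (Phi->getParent() == BB)
        return true; // keep the scalar to preserve the PHI operand sorting
  return false;
}
```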
The operands of the phi nodes should be vectorized in the same order in which
they were created; otherwise the compiler may crash when trying to correctly
build dependencies for gather/buildvector nodes with non-schedulable
instructions.
Fixes https://github.com/llvm/llvm-project/issues/105120
Currently, the SLP scheduler has two containers of `ScheduleData`:
`ExtraScheduleDataMap` and `ScheduleDataMap`. However, the `ScheduleData` in
`ExtraScheduleDataMap` is used only to indicate whether an instruction has
been processed and does not participate in scheduling, so it is redundant;
`ScheduleDataMap` is sufficient for this purpose. The `OpValue` member is used
only by `ExtraScheduleDataMap`, so it is redundant as well.
If the scalars do not require scheduling and were already vectorized, but in
a different order, the compiler still tries to create a new node. This may
cause a compiler crash for the gathered operands. Instead, such nodes need to
be treated as a full overlap, and the already vectorized node simply
reshuffled.
Fixes https://github.com/llvm/llvm-project/issues/104637
The minbitwidth restrictions can be skipped only for immediate reduced
values; for other nodes we still need to check whether external users allow
bitwidth reduction.
Fixes https://github.com/llvm/llvm-project/issues/104422
A dominance query of a block that is in a different function is ill-defined,
so assert that getNode() is only called for blocks that are in the same
function. There are three cases where this behavior did occur. LoopFuse did
not do this explicitly, but it failed to invalidate the SCEV block
dispositions, leaving dangling pointers to freed basic blocks behind and
causing a use-after-free. We do, however, want to be able to dereference basic
blocks inside the dominator tree, so that we can refer to them by a number
stored inside the basic block.
Reverts #102780
Reland #101198
Fixes #102784
Co-authored-by: Alexis Engelke <engelke@in.tum.de>
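The following is an illustrative sketch of the kind of check being added; the
wrapper name is an assumption and not the in-tree code:
```cpp
#include "llvm/IR/Dominators.h"
#include <cassert>

// A dominance query for a block from another function is ill-defined, so the
// lookup asserts that the block belongs to the function the tree was built for.
const llvm::DomTreeNode *getNodeChecked(const llvm::DominatorTree &DT,
                                        const llvm::BasicBlock *BB) {
  assert(BB->getParent() == DT.getRoot()->getParent() &&
         "cannot query the dominator tree of a different function");
  return DT.getNode(BB);
}
```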
Add getShufflevectorNumGroups to vectorize shufflevector.
Currently, getShufflevectorNumGroups can only vectorize a limited pattern
(e.g., the masks of the shufflevector use the elements of the source in
order). In addition, ReuseShuffleIndices and ReorderIndices are not supported.
Currently the SLP vectorizer tries to keep only GEPs as scalars if they are
vectorized but used externally. The same approach can be used for all scalar
values. This patch tries to keep the original scalars if all of their operands
remain scalar or externally used, if the cost of the original scalar is lower
than the cost of the extractelement instruction, or if the number of
externally used scalars in the same entry is a power of 2. The last criterion
allows better revectorization for multiply-used scalars.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/100904
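A rough sketch of the three criteria listed above, with invented parameter
names; the real cost queries live inside the SLP vectorizer:
```cpp
#include "llvm/Support/InstructionCost.h"
#include "llvm/Support/MathExtras.h"

// Keep the original scalar instead of rewriting its uses to extractelement
// when any of the three criteria from the commit message holds.
bool shouldKeepOriginalScalar(bool AllOperandsScalarOrExternallyUsed,
                              llvm::InstructionCost ScalarCost,
                              llvm::InstructionCost ExtractCost,
                              unsigned NumExternallyUsedInEntry) {
  return AllOperandsScalarOrExternallyUsed ||  // operands stay scalar anyway
         ScalarCost < ExtractCost ||           // scalar cheaper than extract
         llvm::isPowerOf2_32(NumExternallyUsedInEntry); // eases revectorization
}
```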
This adds a combined vectorized node. It simplifies handling of combined
nodes, such as select/cmp reduced to min/max, mul/add transformed to fma, etc.
It improves cost model handling and may lead to better codegen in the future
(direct emission of the intrinsics).
Allow SLP optimization to progress in the presence of freeze instructions.
Prior to this commit, freeze instructions blocked SLP optimization.
The following URL shows correctness of the addsub_freeze test:
https://alive2.llvm.org/ce/z/qm38oh
1. When REVEC is enabled, we need to expand vector types into scalar types.
2. When REVEC is enabled, CreateInsertVector (and CreateExtractVector) is used
because the scalar type may be a FixedVectorType (see the sketch after this
list).
3. Since the mask indices used by processBuildVector expect the source to be a
scalar type, we need to transform the mask indices into a form that can be
used when REVEC is enabled. The transform is only applied when the mask is
actually used.
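A sketch of point 2, assuming the IRBuilder overload that takes the index as a
Value*; the helper and its parameters are hypothetical. When the "scalar" is
itself a FixedVectorType, the insertion has to be a subvector insert rather
than an insertelement:
```cpp
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"

llvm::Value *insertLane(llvm::IRBuilderBase &Builder, llvm::Value *WideVec,
                        llvm::Value *Scalar, unsigned Lane) {
  if (auto *VecTy = llvm::dyn_cast<llvm::FixedVectorType>(Scalar->getType())) {
    // REVEC case: the "scalar" is a small vector, so insert it as a subvector.
    unsigned NumElts = VecTy->getNumElements();
    return Builder.CreateInsertVector(WideVec->getType(), WideVec, Scalar,
                                      Builder.getInt64(Lane * NumElts));
  }
  // Ordinary SLP case: insert a single element at the given lane.
  return Builder.CreateInsertElement(WideVec, Scalar, Lane);
}
```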
If the reduced value was replaced by an extractelement instruction during
vectorization, then when checking whether this is the case we need to check
the tracked value, not the original (deleted) instruction. Otherwise, the
compiler may crash.
Fixes https://github.com/llvm/llvm-project/issues/102279
When store chains have the same value type ID and pointer type ID, they may
mix values of different sizes, such as i8 and i64. This can lead to missed
vectorization opportunities.
Currently the SLP vectorizer compares phi instructions by the type ID of the
compared instructions, which may fail in the case of different integer types
with different sizes. The patch adds a comparison by type size to fix this.
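A hedged sketch of the idea (the helper name is made up; the analogous cmp fix
further below uses the same approach): i8 and i64 share the same type ID
(IntegerTyID), so the comparison also has to take the type size into account:
```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"

// Two types are only treated as equivalent if both the type ID and the
// bit width match, e.g. i8 and i64 agree on the ID but not on the size.
bool sameTypeIdAndSize(const llvm::DataLayout &DL, llvm::Type *A,
                       llvm::Type *B) {
  return A->getTypeID() == B->getTypeID() &&
         DL.getTypeSizeInBits(A) == DL.getTypeSizeInBits(B);
}
```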
A landingpad instruction must be the very first instruction after the phi
nodes, so the extractelement/shuffle instructions need to be inserted after
it.
Fixes https://github.com/llvm/llvm-project/issues/102187
Currently the SLP vectorizer compares cmp instructions by the type ID of the
compared operands, which may fail in the case of different integer types that
have the same type ID but different sizes. The patch adds a comparison by type
size to fix this.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/102132
If any pointer operand of the non-consecutive loads is an instruction with a
user that is not part of the current graph, and thus requires emission of an
extractelement instruction, it is better to try to detect whether the load
sequence can be represented as a strided load, in which case the
extractelement instructions for the pointers are not required.
Reviewers: preames, RKSimon, topperc
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/101668
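A purely illustrative source pattern, not taken from the patch, of a
non-consecutive load sequence that can be modeled as a single strided load
instead of emitting extractelements for the pointers:
```cpp
// Four loads with a constant stride of two elements. On targets that support
// strided loads, the whole sequence can be treated as one strided load rather
// than four independent loads plus pointer extracts.
int sumEveryOther(const int *A) {
  return A[0] + A[2] + A[4] + A[6];
}
```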
If the graph includes only a strided-loads node, the compiler should still
try to vectorize it.
Reviewers: RKSimon, preames, topperc
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/101659
When trying to reuse an extractelement instruction, we need to check that it
is inserted at the proper position. Its original vector operand should come
before the new vector value; otherwise a new extractelement instruction must
be generated.
Fixes https://github.com/llvm/llvm-project/issues/101213
In order to enforce a strict weak ordering, this patch clusters the bases
that are being sorted by their root, i.e. the first value in a GEP chain. The
sorting is then performed within each cluster.
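A loose sketch of the clustering idea; the helper name and container choice
are assumptions, not the patch's code:
```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Operator.h"

// Bucket the base pointers by the root of their GEP chain so that sorting
// only ever compares pointers within one cluster, which keeps the comparator
// a strict weak ordering.
llvm::MapVector<llvm::Value *, llvm::SmallVector<llvm::Value *, 4>>
clusterByGepRoot(llvm::ArrayRef<llvm::Value *> Bases) {
  llvm::MapVector<llvm::Value *, llvm::SmallVector<llvm::Value *, 4>> Clusters;
  for (llvm::Value *Base : Bases) {
    llvm::Value *Root = Base;
    while (auto *GEP = llvm::dyn_cast<llvm::GEPOperator>(Root))
      Root = GEP->getPointerOperand(); // walk back to the first value in chain
    Clusters[Root].push_back(Base);
  }
  return Clusters;
}
```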
When trying to reuse an extractelement instruction, we need to check that it
is inserted at the proper position. Its original vector operand should come
before the new vector value; otherwise a new extractelement instruction must
be generated.
Fixes https://github.com/llvm/llvm-project/issues/101213
There is no need to handle extra arguments during reductions anymore; the
compiler can now handle all reduced values and reduction operands correctly,
even if they are from different basic blocks.
This simplifies the analysis, reduces compiler size, and improves overall
vectorization.
Metric: size..text
test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test 16668.00 17148.00 2.9%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 2389675.00 2418683.00 1.2%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 253517.00 253645.00 0.1%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309678.00 309806.00 0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 389203.00 389363.00 0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111120.00 111152.00 0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1039103.00 1039215.00 0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1155883.00 1155963.00 0.0%
test-suite :: MicroBenchmarks/LoopVectorization/LoopInterleavingBenchmarks.test 276646.00 276662.00 0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 848691.00 848739.00 0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1138604.00 1138636.00 0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 910201.00 910217.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12385484.00 12385628.00 0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 9667580.00 9667676.00 0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 9667580.00 9667676.00 0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2856182.00 2856198.00 0.0%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2856182.00 2856198.00 0.0%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 773224.00 773192.00 -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035148.00 1035084.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 98126.00 98094.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 97966.00 97934.00 -0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 167391.00 167215.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test 56685.00 56605.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test 56685.00 56605.00 -0.1%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-20050826-2.test 1302.00 1294.00 -0.6%
Misc-C++/stepanov_container - better code due to cost fixes.
483.xalancbmk - better code due to cost fixes.
ASCI_Purple/SMG2000 - better code due to cost fixes.
Benchmarks/Bullet - better vector code because of the cost.
JM/ldecod - extra code remains scalar, extra reduction vectorized
consumer-jpeg - extra code remains scalar because of the cost.
tramp3d-v4 - better vectorization because of cost fixes.
511.povray_r - better vectorization because of cost fixes.
LoopInterleavingBenchmarks - extra reductions are vectorized
JM/lencod - small changes in vector code because of extract cost fixes.
453.povray - small changes in vector code because of extract cost fixes.
445.gobmk - extra small reduction vectorized
526.blender_r - extra reduced scalars, better small reduction, small
changes in the vectorization because of the fixes for extracts cost
602.gcc_s
502.gcc_r - small changes in reductions vectorization because of the
fixes in the extract cost.
631.deepsjeng_s
623.xalancbmk_s - small changes in reductions vectorization because of
the fixes in the extract cost.
MallocBench/gs - extra code remains scalar because of extracts cost
alacconvert-encode - extra code remains scalar because of extracts cost
alacconvert-decode - extra code remains scalar because of extracts cost
GCC-C-execute-20050826-2 - extra reduction gets vectorized
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/99923
This attempts to fix a regression from #98025, where the new order of
reduction nodes prevents later passes from producing shuffles that are as
nice. The issue boils down to picking an order of [0 1 3 2] for loaded v4i8
values, which meant later parts could not find a simpler ordering for the
shuffles given the legal nodes available in AArch64. If instead we make sure
they are ordered [0 1 2 3], then everything can fall into place.
In order to produce a better order that is more likely to work in more cases,
this patch takes the existing clustered loads and sorts the base pointers if
there is an order between them, i.e. if `V2 == gep (V1, X)` then V1 is sorted
before V2.
If the load is part of a gather node and also part of the vectorized
subvector, we need to add the cost estimation for the non-vectorized external
uses.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/99889
If at least one user of the gathered truncated instruction is vectorized and
requires a wider type than the trunc node, such gathers/buildvectors should
not be optimized to a smaller bitwidth.
Since commit 82b800ecb35fb46881aa52000fa40b1b99aa654e addressed issue #99327,
we see a performance regression (13%) on some Verilator-generated C++ code.
This is because the UsesLimit is set to 8, which is too small for the
Verilator-generated code. I have analyzed the need for the UsesLimit from [1]
and found that the UsesLimit should be at least 64 to cover most of these
cases. Thus, this patch increases the UsesLimit to 64.
Link:
[1] https://github.com/llvm/llvm-project/issues/99327#issuecomment-2236052879
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
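As a hedged illustration of how such a uses limit typically acts as a
compile-time guard (the surrounding code is not the actual SLP implementation;
only the constant value mirrors this patch):
```cpp
#include "llvm/IR/Value.h"

static constexpr unsigned UsesLimit = 64; // raised from 8 by this patch

// Once a value has at least UsesLimit users, the more expensive per-use
// analysis is skipped to bound compile time.
bool hasTooManyUsesToAnalyze(const llvm::Value *V) {
  return V->hasNUsesOrMore(UsesLimit);
}
```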