llvm-project

Author	SHA1	Message	Date
Anton Afanasyev	954ea0f044	[SLP] Simplify indices processing for insertelements Get rid of non-constant and undef indices of insertelements at `buildTree()` stage. Fix bugs. Differential Revision: https://reviews.llvm.org/D119623	2022-02-14 14:50:44 +03:00
Anton Afanasyev	cd685f5736	[NFC][SLP] Set default parameter for Offset equal to zero	2022-02-11 17:22:33 +03:00
Alexey Bataev	370ea1a199	[SLP][NFC]Fix comment, NFC.	2022-02-09 07:14:14 -08:00
Djordje Todorovic	afd54e1ed1	[SLPVectorizer] Fix "unused variable" build warning	2022-02-07 10:38:19 +01:00
Benjamin Kramer	ce9417348e	[SLP] Skip a DenseSet<unsigned> -> bit vector conversion. NFCI.	2022-02-06 00:57:47 +01:00
Philip Reames	0cc6165d05	[SLP] Strengthen internal asserts about scheduled node state [NFC] All members of a scheduled bundle must have valid dependencies, with no unscheduled ones, and only the lead element gets marked scheduled.	2022-02-04 12:22:52 -08:00
Philip Reames	f3f8e3da9f	[SLP] Remove ScheduleData::UnscheduledDepsInBundle field [NFC-ish] We can simply compute the value of this field on demand. Doing so clarifies the behavior when one of the instructions within a bundle doesn't have valid dependencies. I vaguely think this could change behavior slightly, but none of the test cases are affected, and my attempts to write one by hand have failed. This also slightly reduces memory usage, but that's a secondary value at best.	2022-02-04 10:12:09 -08:00
Philip Reames	bb9964ba43	[SLP] Have only ready items in ready list [NFC] This adds the assertion that all items in the ready list are in-fact scheduleable entities ready to be scheduled. This involves changing the ReadyInsts structure to be a set, and fixing a couple places where we left nodes on the list when they were no longer ready.	2022-02-03 19:49:24 -08:00
Philip Reames	2cbc92fb11	[SLP] Strengthen internal invariant assertions slightly This builds on the invariant checks introduced in 1519629, and adds a couple more that seem to hold without additional work.	2022-02-03 14:56:39 -08:00
Philip Reames	1519629a20	[SLP] Add basic self consistency asserts into scheduling The idea here is to have a verify routine we can call during scheduling to ensure broken invariants are reported. The intent is to help in debugging scheduling bugs. At the moment, only the most basic properties are checked, as adding several others I thought held reported failures.	2022-02-03 13:27:35 -08:00
Philip Reames	6d0c007bc1	[SLP] Fix a typo in comment	2022-02-03 09:11:47 -08:00
Alexey Bataev	802ceb8343	[SLP]Excluded external uses from the reordering estimation. Compiler adds the estimation for the external uses during operands reordering analysis, which makes it tend to prefer duplicates in the lanes rather than diamond/shuffled match in the graph. It changes the sizes of the vector operands and may prevent some vectorization. We don't need this kind of estimation for the analysis phase, because we just need to choose the most compatible instruction and it does not matter if it has external user or used in the non-matching lane. Instead, we count the number of unique instruction in the lane and see if the reassociation changes the number of unique scalars to be power of 2 or not. If we have power of 2 unique scalars in the lane, it is considered more profitable rather than having non-power-of-2 number of unique scalars. Metric: SLP.NumVectorInstructions test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test 70.00 86.00 22.9% test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 346.00 353.00 2.0% test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 346.00 353.00 2.0% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 235.00 239.00 1.7% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 235.00 239.00 1.7% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00 1.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00 1.2% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00 0.9% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00 0.3% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00 0.3% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00 0.2% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00 0.1% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00 0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 783.00 782.00 -0.1% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 69.00 68.00 -1.4% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 89.00 80.00 -10.1% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 89.00 80.00 -10.1% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 260.00 215.00 -17.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 256.00 211.00 -17.6% MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty the same. SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by one <4 x> load. External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and not inlined anymore. External/SPEC/CINT2017rate/541.leela_r - same. External/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in multi-block tree, the result is pretty the same. External/SPEC/CINT2017speed/631.deepsjeng_s - same. MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same as before. MultiSource/Benchmarks/MiBench/consumer-jpeg - same. Differential Revision: https://reviews.llvm.org/D116688	2022-02-03 06:50:06 -08:00
Alexey Bataev	ad2a0ccf8f	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-03 06:24:10 -08:00
Alexey Bataev	8a1dfbc4d8	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 842a2360a84692f2e4c37cc3e652640e6627d004 to fix the bugs reported by users in https://reviews.llvm.org/D115955#3291538.	2022-02-02 12:06:36 -08:00
Alexey Bataev	842a2360a8	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-02 10:32:52 -08:00
Benjamin Kramer	0c3d22a592	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 83620bd2ad867f706c699d0f2b8be10e43d9f3d7. It's causing miscompilations, see review comments at https://reviews.llvm.org/D115955	2022-02-02 13:08:51 +01:00
Alexey Bataev	83620bd2ad	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-01 09:54:20 -08:00
Benjamin Kramer	5281f0dab2	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit afaaecc88c6e5989de8a6a0266610860ef99d9d6. Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276	2022-02-01 11:40:43 +01:00
Alexey Bataev	afaaecc88c	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-01-31 11:11:25 -08:00
Philip Reames	6888081e32	[SLP] Use moveBefore to simplify code [NFC]	2022-01-28 12:44:07 -08:00
Philip Reames	746e435ff7	Revert "[SLP] Add a clarifying assert in block scheduling [NFC]" This reverts commit db49a78900f5e4b59714565876b5dbb5e2dfe840. The reasoning in the patch applied to a downstream branch, and I got myself confused when trying to split apart pieces. Thankfully, the assert was simply weaker than the actual invariant currently upstream which is that ReadyInsts is not empty.	2022-01-28 12:10:31 -08:00
Philip Reames	db49a78900	[SLP] Add a clarifying assert in block scheduling [NFC] The fact we could have a block with a valid scheduling window, but nothing to schedule was surprising to me. After digging through the code, this can only happen if we don't find anything to directly vectorize. However, the reduction handling code relies on this mode, so we can't simply consider such trees unvectorizeable. The assert conveys both that this situation can happen, but also that it can only happen for an immediate gather. Context: We built the bundle before deciding that vectorization of a bundle is possible. A side effect of bundle construction is manipulating the scheduling window, so a bundle which isn't vectorizable can cause the creation or expansion of a scheduling window.	2022-01-28 11:08:59 -08:00
Alexey Bataev	cec8b614f3	[SLP]Do not reorder top nodes if they do not require reordering. No need to reorder the top nodes, if they are not stores or insertelement instructions and each node should be analyzed only once, when the bottom-to-top analysis is performed. We still end up with extractelements for the top node scalars and the final shuffle just adds an extra cost and currently crashes the compiler for PHI nodes. Differential Revision: https://reviews.llvm.org/D116760	2022-01-28 09:16:18 -08:00
eopXD	6be77561f8	[SLP][NFC] Add debug logs for entry. Tell the users they are specifying something without vector register. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D117980	2022-01-24 09:05:21 -08:00
Kazu Hirata	f63a9cd99d	[Vectorize] Remove unused variables (NFC)	2022-01-23 20:32:54 -08:00
Philip Reames	c0906f6b21	[SLP] Remove stray semicolon to make bots happy Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict c++98 flags which disallow ; at global scope.	2022-01-20 14:09:28 -08:00
Philip Reames	5a670f1378	[SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC]	2022-01-20 13:58:20 -08:00
Philip Reames	60f6191879	[SLP] Extract formBundle helper for readability [NFC]	2022-01-20 13:08:37 -08:00
Philip Reames	118babe67a	[SLP] Use for loops for walking bundle elements	2022-01-20 12:44:33 -08:00
Philip Reames	860038e0d7	[SLP] Rename a couple lambdas to be more clearly separate from method names	2022-01-20 12:13:30 -08:00
Philip Reames	c104fca36b	[SLP] Delete dead code in favor of proper assert [NFC]	2022-01-20 08:54:12 -08:00
Philip Reames	c43ebae838	[SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC]	2022-01-20 08:46:44 -08:00
Philip Reames	3c422cbe6b	[SLP] Add an assert to make a non-obvious precondition clear [NFC]	2022-01-20 08:24:10 -08:00
Kazu Hirata	435a5a3652	[llvm] Fix bugprone argument comments (NFC) Identified with bugprone-argument-comment.	2022-01-08 11:56:38 -08:00
Alexey Bataev	d130df544d	[SLP]Improve reordering for the nodes being used in alternate vectorization. No need to include the order of the scalars being used as part of the alternate vectorization into account when trying to reorder the whole graph. Such elements better to reorder in the following phase because the subtree still ends up in shuffle. Part of D116688, fixes the regression in D116690. Differential Revision: https://reviews.llvm.org/D116740	2022-01-06 11:18:57 -08:00
Alexey Bataev	7cb19fe493	[SLP]Initialize the lane with the given value instead of default 0. There is a bug in the reordering analysis stage. If the element with the given hash is not added to the map but has the same number of APOs and instructions with same parent, but different instruction opcode, it will be initialized with default values and then the counter is increased by 1. But the lane is not updated and default to 0 instead of the actual `Lane` value. It leads to the fact that the analysis is useless in many cases and default to lane 0 instead of actual lane with the minimum amount of APO operands. Differential Revision: https://reviews.llvm.org/D116690	2022-01-06 10:57:11 -08:00
Alexey Bataev	700997aef8	[SLP][NFC]Fix comment, NFC.	2022-01-06 06:38:29 -08:00
Alexey Bataev	dd83befe33	[SLP][NFC]Improved isAltShuffle by comparing instructions instead of opcodes, NFC. NFC part of D115955.	2022-01-05 12:30:13 -08:00
Alexey Bataev	e0efedd2c3	[SLP][NFC]Fix non-determinism in reordering, NFC. Need to clear CurrentOrder order mask if it is determined that extractelements form identity order and need to use a vector-like construct when iterating over ordered entries in the reorderTopToBottom function.	2021-12-30 13:10:25 -08:00
Alexey Bataev	ab9078f3d3	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 07:21:20 -08:00
Alexey Bataev	4459a11f4d	Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy." This reverts commit fcaf290d0278bb83387e1a1d972c55e08b8c40e3 to fix test mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531	2021-12-20 07:21:18 -08:00
Alexey Bataev	fcaf290d02	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 05:15:01 -08:00
Alexey Bataev	71fe59212c	[SLP][NFC]Adjust type in debug output loop. The ReuseShuffleIndices indices are integer, not unsigned, need to fix the type in the debug print loop.	2021-12-17 12:43:01 -08:00
Alexey Bataev	46ad66b817	[SLP][NFC]Use 'llvm::copy' instead of element-by-element copying.	2021-12-17 12:07:59 -08:00
Alexey Bataev	65fc992579	[SLP]Early exit out of the reordering if shuffled/perfect diamond match found. Need to early exit out of the reordering process if the perfect/shuffled match is found in the operands. Such pattern will result in not profitable reordering because of (false positive) external use of scalars. Differential Revision: https://reviews.llvm.org/D115811	2021-12-16 11:09:49 -08:00
Alexey Bataev	6f2e087631	[SLP]Do not represent splats as node with the reused scalars. No need to represent splats as a node with the reused scalars, it may increase the cost (currently pass just ignores extra shuffle cost and it is still not correct). Differential Revision: https://reviews.llvm.org/D115800	2021-12-15 06:33:11 -08:00
Alexey Bataev	bd05376986	[SLP]Improve multinode analysis. Changes the preliminary multinode analysis: 1. Introduced scores for reversed loads/extractelements. 2. Improved shallow score calculation. 3. Lowered the cost of external uses (no need to consider it several times, just once). 4. The initial lane for analysis is the one with the minimal possible reorderings. These changes in general shall reduce compile time and improve the reordering in many cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D101109	2021-12-14 06:01:52 -08:00
Alexey Bataev	e5b191a433	[SLP]Improve/fix reordering for gather nodes with extractelements/undefs. If the gather node is a mix of undef values and extractelement instructions, need to take the ordering for such nodes into account too. It allows to reorder some (sub)trees and remove some extra shuffles, improving overall vectorization. Also, outlined common functionality into a separate function. Differential Revision: https://reviews.llvm.org/D115358	2021-12-13 10:59:38 -08:00
Nikita Popov	432c41ebe9	[SLP] Avoid getPointerElementType() call Use the load result type instead of the element type of the load pointer operand.	2021-12-13 15:46:13 +01:00
Alexey Bataev	19c5cf4167	[SLP]Fix comparator for cmp instruction vectorization. The comparator for the sort functions should provide strict weak ordering relation between parameters. Current solution causes compiler crash with some standard c++ library implementations, because it does not meet this criterion. Tried to fix it + it improves the overall vectorization result. Differential Revision: https://reviews.llvm.org/D115268	2021-12-09 10:57:57 -08:00
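
The strict weak ordering requirement named in commit 19c5cf4167 can be illustrated with a minimal sketch. This is not the SLP vectorizer's comparator: `CmpKey`, `brokenLess`, `strictLess` and `sortedKeys` are hypothetical names invented for illustration, and the integer fields merely stand in for a compare predicate and its operands. A comparator passed to `std::sort` must return false when an element is compared with itself; one built on `<=` does not, and the behavior of the sort is then undefined, which may show up as a crash on some standard library implementations.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a compare instruction: a predicate id plus
// two operand ids. Not an LLVM type.
struct CmpKey {
  int Predicate;
  int LHS;
  int RHS;
};

// Broken comparator: `<=` makes brokenLess(A, A) true, so the relation is
// not irreflexive and therefore not a strict weak ordering. It is shown
// only for contrast and is never passed to std::sort here.
inline bool brokenLess(const CmpKey &A, const CmpKey &B) {
  return A.Predicate <= B.Predicate;
}

// Valid comparator: lexicographic `<` over all fields via std::tie is a
// strict weak ordering (irreflexive, asymmetric, transitive).
inline bool strictLess(const CmpKey &A, const CmpKey &B) {
  return std::tie(A.Predicate, A.LHS, A.RHS) <
         std::tie(B.Predicate, B.LHS, B.RHS);
}

// Sorts a copy of the keys with the valid comparator and returns it.
inline std::vector<CmpKey> sortedKeys(std::vector<CmpKey> Keys) {
  std::sort(Keys.begin(), Keys.end(), strictLess);
  assert(std::is_sorted(Keys.begin(), Keys.end(), strictLess));
  return Keys;
}
```

A quick way to spot the broken form is to evaluate the comparator on a single element against itself: `strictLess(A, A)` is false, while `brokenLess(A, A)` is true.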


1028 Commits