llvm-project

Author	SHA1	Message	Date
Alexey Bataev	802ceb8343	[SLP]Excluded external uses from the reordering estimation. Compiler adds the estimation for the external uses during operands reordering analysis, which makes it tend to prefer duplicates in the lanes rather than diamond/shuffled match in the graph. It changes the sizes of the vector operands and may prevent some vectorization. We don't need this kind of estimation for the analysis phase, because we just need to choose the most compatible instruction and it does not matter if it has external user or used in the non-matching lane. Instead, we count the number of unique instructions in the lane and see if the reassociation changes the number of unique scalars to be power of 2 or not. If we have power of 2 unique scalars in the lane, it is considered more profitable rather than having non-power-of-2 number of unique scalars. Metric: SLP.NumVectorInstructions test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test 70.00 86.00 22.9% test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 346.00 353.00 2.0% test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 346.00 353.00 2.0% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 235.00 239.00 1.7% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 235.00 239.00 1.7% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00 1.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00 1.2% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00 0.9% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00 0.3% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00 0.3% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00 0.2% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00 0.1% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00 0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 783.00 782.00 -0.1% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 69.00 68.00 -1.4% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 89.00 80.00 -10.1% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 89.00 80.00 -10.1% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 260.00 215.00 -17.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 256.00 211.00 -17.6% MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty much the same. SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by one <4 x> load. External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and not inlined anymore. External/SPEC/CINT2017rate/541.leela_r - same. External/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in multi-block tree, the result is pretty much the same. External/SPEC/CINT2017speed/631.deepsjeng_s - same. MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same as before. MultiSource/Benchmarks/MiBench/consumer-jpeg - same. Differential Revision: https://reviews.llvm.org/D116688	2022-02-03 06:50:06 -08:00
Alexey Bataev	ad2a0ccf8f	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with the same/swapped predicate but different (swapped) operand kinds, or cmp instructions with different predicates and compatible operand kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-03 06:24:10 -08:00
Alexey Bataev	8a1dfbc4d8	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 842a2360a84692f2e4c37cc3e652640e6627d004 to fix the bugs reported by users in https://reviews.llvm.org/D115955#3291538.	2022-02-02 12:06:36 -08:00
Alexey Bataev	842a2360a8	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with the same/swapped predicate but different (swapped) operand kinds, or cmp instructions with different predicates and compatible operand kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-02 10:32:52 -08:00
Benjamin Kramer	0c3d22a592	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 83620bd2ad867f706c699d0f2b8be10e43d9f3d7. It's causing miscompilations, see review comments at https://reviews.llvm.org/D115955	2022-02-02 13:08:51 +01:00
Alexey Bataev	83620bd2ad	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with the same/swapped predicate but different (swapped) operand kinds, or cmp instructions with different predicates and compatible operand kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-01 09:54:20 -08:00
Benjamin Kramer	5281f0dab2	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit afaaecc88c6e5989de8a6a0266610860ef99d9d6. Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276	2022-02-01 11:40:43 +01:00
Alexey Bataev	afaaecc88c	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with the same/swapped predicate but different (swapped) operand kinds, or cmp instructions with different predicates and compatible operand kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-01-31 11:11:25 -08:00
Philip Reames	6888081e32	[SLP] Use moveBefore to simplify code [NFC]	2022-01-28 12:44:07 -08:00
Philip Reames	746e435ff7	Revert "[SLP] Add a clarifying assert in block scheduling [NFC]" This reverts commit db49a78900f5e4b59714565876b5dbb5e2dfe840. The reasoning in the patch applied to a downstream branch, and I got myself confused when trying to split apart pieces. Thankfully, the assert was simply weaker than the actual invariant currently upstream which is that ReadyInsts is not empty.	2022-01-28 12:10:31 -08:00
Philip Reames	db49a78900	[SLP] Add a clarifying assert in block scheduling [NFC] The fact we could have a block with a valid scheduling window, but nothing to schedule was surprising to me. After digging through the code, this can only happen if we don't find anything to directly vectorize. However, the reduction handling code relies on this mode, so we can't simply consider such trees unvectorizeable. The assert conveys both that this situation can happen, but also that it can only happen for an immediate gather. Context: We built the bundle before deciding that vectorization of a bundle is possible. A side effect of bundle construction is manipulating the scheduling window, so a bundle which isn't vectorizable can cause the creation or expansion of a scheduling window.	2022-01-28 11:08:59 -08:00
Alexey Bataev	cec8b614f3	[SLP]Do not reorder top nodes if they do not require reordering. No need to reorder the top nodes, if they are not stores or insertelement instructions and each node should be analyzed only once, when the bottom-to-top analysis is performed. We still end up with extractelements for the top node scalars and the final shuffle just adds an extra cost and currently crashes the compiler for PHI nodes. Differential Revision: https://reviews.llvm.org/D116760	2022-01-28 09:16:18 -08:00
eopXD	6be77561f8	[SLP][NFC] Add debug logs for entry. Tell the users they are specifying something without vector register. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D117980	2022-01-24 09:05:21 -08:00
Kazu Hirata	f63a9cd99d	[Vectorize] Remove unused variables (NFC)	2022-01-23 20:32:54 -08:00
Philip Reames	c0906f6b21	[SLP] Remove stray semicolon to make bots happy Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict c++98 flags which disallow ; at global scope.	2022-01-20 14:09:28 -08:00
Philip Reames	5a670f1378	[SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC]	2022-01-20 13:58:20 -08:00
Philip Reames	60f6191879	[SLP] Extract formBundle helper for readability [NFC]	2022-01-20 13:08:37 -08:00
Philip Reames	118babe67a	[SLP] Use for loops for walking bundle elements	2022-01-20 12:44:33 -08:00
Philip Reames	860038e0d7	[SLP] Rename a couple lambdas to be more clearly separate from method names	2022-01-20 12:13:30 -08:00
Philip Reames	c104fca36b	[SLP] Delete dead code in favor of proper assert [NFC]	2022-01-20 08:54:12 -08:00
Philip Reames	c43ebae838	[SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC]	2022-01-20 08:46:44 -08:00
Philip Reames	3c422cbe6b	[SLP] Add an assert to make a non-obvious precondition clear [NFC]	2022-01-20 08:24:10 -08:00
Kazu Hirata	435a5a3652	[llvm] Fix bugprone argument comments (NFC) Identified with bugprone-argument-comment.	2022-01-08 11:56:38 -08:00
Alexey Bataev	d130df544d	[SLP]Improve reordering for the nodes being used in alternate vectorization. No need to take into account the order of the scalars being used as part of the alternate vectorization when trying to reorder the whole graph. Such elements are better reordered in the following phase because the subtree still ends up in shuffle. Part of D116688, fixes the regression in D116690. Differential Revision: https://reviews.llvm.org/D116740	2022-01-06 11:18:57 -08:00
Alexey Bataev	7cb19fe493	[SLP]Initialize the lane with the given value instead of default 0. There is a bug in the reordering analysis stage. If the element with the given hash is not added to the map but has the same number of APOs and instructions with same parent, but different instruction opcode, it will be initialized with default values and then the counter is increased by 1. But the lane is not updated and defaults to 0 instead of the actual `Lane` value. It leads to the fact that the analysis is useless in many cases and defaults to lane 0 instead of actual lane with the minimum amount of APO operands. Differential Revision: https://reviews.llvm.org/D116690	2022-01-06 10:57:11 -08:00
Alexey Bataev	700997aef8	[SLP][NFC]Fix comment, NFC.	2022-01-06 06:38:29 -08:00
Alexey Bataev	dd83befe33	[SLP][NFC]Improved isAltShuffle by comparing instructions instead of opcodes, NFC. NFC part of D115955.	2022-01-05 12:30:13 -08:00
Alexey Bataev	e0efedd2c3	[SLP][NFC]Fix non-determinism in reordering, NFC. Need to clear CurrentOrder order mask if it is determined that extractelements form identity order and need to use a vector-like construct when iterating over ordered entries in the reorderTopToBottom function.	2021-12-30 13:10:25 -08:00
Alexey Bataev	ab9078f3d3	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 07:21:20 -08:00
Alexey Bataev	4459a11f4d	Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy." This reverts commit fcaf290d0278bb83387e1a1d972c55e08b8c40e3 to fix test mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531	2021-12-20 07:21:18 -08:00
Alexey Bataev	fcaf290d02	[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy. Need to check for the number of the unique non-constant values since the unique values may include several constants. Differential Revision: https://reviews.llvm.org/D115939	2021-12-20 05:15:01 -08:00
Alexey Bataev	71fe59212c	[SLP][NFC]Adjust type in debug output loop. The ReuseShuffleIndices indices are integer, not unsigned, need to fix the type in the debug print loop.	2021-12-17 12:43:01 -08:00
Alexey Bataev	46ad66b817	[SLP][NFC]Use 'llvm::copy' instead of element-by-element copying.	2021-12-17 12:07:59 -08:00
Alexey Bataev	65fc992579	[SLP]Early exit out of the reordering if shuffled/perfect diamond match found. Need to early exit out of the reordering process if the perfect/shuffled match is found in the operands. Such pattern will result in not profitable reordering because of (false positive) external use of scalars. Differential Revision: https://reviews.llvm.org/D115811	2021-12-16 11:09:49 -08:00
Alexey Bataev	6f2e087631	[SLP]Do not represent splats as node with the reused scalars. No need to represent splats as a node with the reused scalars, it may increase the cost (currently pass just ignores extra shuffle cost and it is still not correct). Differential Revision: https://reviews.llvm.org/D115800	2021-12-15 06:33:11 -08:00
Alexey Bataev	bd05376986	[SLP]Improve multinode analysis. Changes the preliminary multinode analysis: 1. Introduced scores for reversed loads/extractelements. 2. Improved shallow score calculation. 3. Lowered the cost of external uses (no need to consider it several times, just once). 4. The initial lane for analysis is the one with the minimal possible reorderings. These changes in general shall reduce compile time and improve the reordering in many cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D101109	2021-12-14 06:01:52 -08:00
Alexey Bataev	e5b191a433	[SLP]Improve/fix reordering for gather nodes with extractelements/undefs. If the gather node is a mix of undef values and extractelement instructions, need to take the ordering for such nodes into account too. It allows reordering some (sub)trees and removing some extra shuffles, improving overall vectorization. Also, outlined common functionality into a separate function. Differential Revision: https://reviews.llvm.org/D115358	2021-12-13 10:59:38 -08:00
Nikita Popov	432c41ebe9	[SLP] Avoid getPointerElementType() call Use the load result type instead of the element type of the load pointer operand.	2021-12-13 15:46:13 +01:00
Alexey Bataev	19c5cf4167	[SLP]Fix comparator for cmp instruction vectorization. The comparator for the sort functions should provide strict weak ordering relation between parameters. Current solution causes compiler crash with some standard c++ library implementations, because it does not meet this criterion. Tried to fix it + it improves the overall vectorization result. Differential Revision: https://reviews.llvm.org/D115268	2021-12-09 10:57:57 -08:00
Philip Reames	b24db85c0b	[recurrence] Delete dead flag/fmf handling [NFC] The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments. The sole exception is a caller of a method which has the argument, but no implementation. I don't know what the intent was here, but it certainly doesn't actually do anything today.	2021-12-09 10:43:53 -08:00
Alexey Bataev	a101a9b64b	[SLP]Fix compiler crash when calculating extract cost for undefs. Need to add an extra check for potential undef values in computeExtractCost function to avoid compiler crash on casting to instruction. Differential Revision: https://reviews.llvm.org/D115162	2021-12-06 10:46:13 -08:00
Alexey Bataev	ba74bb3a22	[SLP]Fix reused extracts cost. If the extractelement instruction is used multiple times in the different tree entries (either vectorized, or gathered), need to compensate the scalar cost of such instructions. They are completely removed if all users are part of the tree but we need to compensate the cost only once for each instruction. Differential Revision: https://reviews.llvm.org/D114958	2021-12-02 10:52:00 -08:00
Alexey Bataev	8ceccbd321	[SLP]Outline and fix code for finding common insertelement vectors. Need to outline the code for finding common vectors in insertelement instructions into a separate function for future patches. It also improves the process by adding some extra checks for early exit and fixes a bug where it always finds the match because of erroneous compare of the same values. Differential Revision: https://reviews.llvm.org/D114909	2021-12-02 09:18:25 -08:00
Alexey Bataev	92fbd76af5	[SLP]Improve registering and merging of compatible shuffles. If several shuffle instructions are emitted, some of them might be same/compatible (less defined) with the previously emitted ones. Such shuffles can be removed safely, improving the total cost of the vectorized code. Differential Revision: https://reviews.llvm.org/D114087	2021-12-02 08:48:29 -08:00
Alexey Bataev	afc9e7517a	[SLP]Improve cost model for the shuffled extracts. Improved the calculation of the shuffled extracts, where possible. Need to calculate the cost for the extracted scalars if some users are not insertelements + improved the total estimation of the shuffled scalars used in insertelements build vectors. Differential Revision: https://reviews.llvm.org/D113782	2021-12-01 08:10:57 -08:00
Alexey Bataev	cc30fbf242	[SLP]Introduce isUndefVector function to check for undef vectors. Undefined vector might be not only the UndefValue, but also it can be a constant vector with undef or poison elements, need to check for this kind of undef too. Differential Revision: https://reviews.llvm.org/D114873	2021-12-01 07:46:10 -08:00
Alexey Bataev	ddce6e0561	[SLP]Improve vectorization of cmp instructions sequences. Final attempt to vectorize bundles of compatible cmp instructions after all other instructions processing. Metric: SLP.NumVectorInstructions Program results results0 diff test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test 1.00 5.00 400.0% test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test 8.00 11.00 37.5% test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test 20.00 26.00 30.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1344.00 1648.00 22.6% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1344.00 1648.00 22.6% test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test 102.00 124.00 21.6% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 118.00 133.00 12.7% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3233.00 3554.00 9.9% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3233.00 3554.00 9.9% test-suite :: MultiSource/Benchmarks/Olden/power/power.test 64.00 70.00 9.4% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 7879.00 8604.00 9.2% test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test 50.00 54.00 8.0% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 27.00 29.00 7.4% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 8345.00 8955.00 7.3% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 694.00 738.00 6.3% test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 361.00 382.00 5.8% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 409.00 430.00 5.1% test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 140.00 147.00 5.0% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 140.00 147.00 5.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4013.00 4206.00 4.8% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 966.00 1011.00 4.7% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 65.00 68.00 4.6% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4219.00 4381.00 3.8% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1911.00 1973.00 3.2% test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 62.00 64.00 3.2% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 62.00 64.00 3.2% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 852.00 877.00 2.9% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 852.00 877.00 2.9% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1624.00 1668.00 2.7% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 39.00 40.00 2.6% test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 613.00 624.00 1.8% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 378.00 383.00 1.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 293.00 295.00 0.7% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 297.00 299.00 0.7% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5522.00 5534.00 0.2% test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5522.00 5534.00 0.2% Differential Revision: https://reviews.llvm.org/D114799	2021-12-01 07:26:29 -08:00
Alexey Bataev	dce6c434ea	[SLP]Improve isFixedVectorShuffle and its use. Extended support for undefined source vector/extract indices/non-fixed vector types, also no need to check for the parent of the extractelement instructions with the constant indices. Differential Revision: https://reviews.llvm.org/D114121	2021-11-30 10:10:20 -08:00
Alexey Bataev	fc57cfad3c	[SLP][NFC]Move static function to make it visible in member function, NFC.	2021-11-30 09:38:46 -08:00
Alexey Bataev	fc0aacf324	[SLP]Improve analysis/emission of vector operands for alternate nodes. Compiler has an analysis for perfect diamond matching but it does not support nodes with main/alternate opcodes. The problem is that the scalars themselves are different and might not match directly with other nodes, but operands and main/alternate opcodes might match and compiler might reuse some previously emitted vector instructions. Need to include this analysis in the cost model and actual vector instructions emission process. Differential Revision: https://reviews.llvm.org/D114101	2021-11-26 06:38:02 -08:00
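
The message of commit 802ceb8343 describes counting the unique scalars in a lane and preferring an operand order that makes that count a power of 2. The following is only a rough sketch of that idea, not the SLPVectorizer code: scalars are modeled as plain integer ids, and the names `countUniqueScalars`, `isPowerOfTwo`, and `prefersCandidate` are hypothetical helpers invented for illustration.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: number of distinct scalar ids in one candidate
// operand vector (duplicates count once).
std::size_t countUniqueScalars(const std::vector<int> &Scalars) {
  return std::unordered_set<int>(Scalars.begin(), Scalars.end()).size();
}

// True when N is a power of two (1, 2, 4, 8, ...); zero is rejected.
bool isPowerOfTwo(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }

// Hypothetical preference check: a candidate ordering is preferred over the
// current one only if it turns a non-power-of-2 unique count into a
// power-of-2 unique count.
bool prefersCandidate(const std::vector<int> &Candidate,
                      const std::vector<int> &Current) {
  return isPowerOfTwo(countUniqueScalars(Candidate)) &&
         !isPowerOfTwo(countUniqueScalars(Current));
}
```

Under these assumptions, an ordering with scalars `{1, 2, 3, 4}` (four unique values) would be preferred over `{1, 2, 2, 3}` (three unique values), while the reverse comparison would not be.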


1017 Commits