llvm-project

Author	SHA1	Message	Date
Alexey Bataev	58a94b1d0a	[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type. Need to look through the SExt/ZExt scalars to be gathered, when trying to reduce their width after minbitwidth analysis to prevent permanent attempts to revectorize such gathered instructions.	2024-05-09 04:19:43 -07:00
Arthur Eubanks	2fb3774321	Revert "[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type." This reverts commit 2475efa91d8b4fa8f1a2d16052cb6d14be7d5dc6. Causes crashes, see comments on `2475efa91d`.	2024-05-08 23:01:47 +00:00
Alexey Bataev	2475efa91d	[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type. Need to look through the SExt/ZExt scalars to be gathered, when trying to reduce their width after minbitwidth analysis to prevent permanent attempts to revectorize such gathered instructions.	2024-05-08 07:25:19 -07:00
Alexey Bataev	f00f294130	[SLP]Fix PR91309: Do not consider SExt as always producing signed result. Still need to do the full analysis of the signedness of the values rather than rely on Instruction opcode, if the opcode is SExt. Still may produce unsigned result.	2024-05-07 08:57:52 -07:00
Alexey Bataev	5d9b549bb0	[SLP][NFC]Add a test showing incorrect signedness detection in sext nodes.	2024-05-07 06:46:30 -07:00
Alexey Bataev	c144157f3d	[SLP]Use last pointer instead of first for reversed strided stores. Need to use the last address of the vectorized stores for the strided stores, not the first one, to correctly store the data.	2024-05-06 10:16:28 -07:00
Alexey Bataev	a476032101	[SLP]Fix PR91025: correctly handle smin/smax of signed operands. Need to check that the signed operand has an extra sign bit to be sure that we do not skip signedness, when trying to minimize bitwidth for smin/smax intrinsics.	2024-05-06 08:10:20 -07:00
Alexey Bataev	d584df6c8f	[SLP][NFC]Add a test with incorrect smin analysis for minimal bitwidth, NFC.	2024-05-06 07:46:41 -07:00
Alexey Bataev	03972261a9	[SLP]Fix PR90892: do a correct sign analysis of the entries elements in gather shuffles. Need to do extra analysis of the scalar elements of the tree entry to be shuffled instead of the vectorized value to correctly deduce signedness info.	2024-05-03 14:01:25 -07:00
Alexey Bataev	9620d3ee3e	[SLP][NFC]Add a test with incorrect casting of shuffled gathered values, NFC.	2024-05-03 13:54:51 -07:00
Simon Pilgrim	8a0073ad46	[CostModel][X86] Treat lrint/llrint as fptosi calls (#90883) X86 can use the CVTP2SI instructions to lower lrint/llrint calls, which have the same costs as the CVTTP2SI (fptosi) instructions Followup to #90065	2024-05-03 18:06:50 +01:00
Simon Pilgrim	e4bb6634cd	[SLP][X86] Add test coverage for rint/lrint/llrint fp calls	2024-05-02 18:51:35 +01:00
Alexey Bataev	5e67c41a93	[SLP]Fix PR90780: insert cast instruction for PHI nodes after all phi nodes. Need to check if the vectorized value is a PHINode before insert casting instruction and insert it after all phis to generate the code correctly.	2024-05-02 06:30:14 -07:00
Alexey Bataev	fc382db239	[SLP]Improve comparison of shuffled loads/masked gathers by adding GEP cost. In some cases masked gather is less profitable than insert-subvector of consecutive/strided stores. SLP has this kind of analysis, but need to improve it by adding the cost of the GEP analysis. Also, the GEP cost estimation for masked gather is fixed. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/90737	2024-05-01 15:53:25 -04:00
Alexey Bataev	59ef94d7cf	[SLP]Do not include the cost of and -1, <v> and emit just <v> after MinBitWidth. After minbitwidth analysis, and <v>, (power_of_2 - 1 const) can be transformed into just an <v>, (all_ones const), which can be ignored at the cost estimation and at the codegen. x264 benchmark has this pattern. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/90739	2024-05-01 15:52:23 -04:00
Alexey Bataev	e83c6ddf46	[SLP][NFC]Add a test with the non profitable masked gather loads.	2024-05-01 07:41:35 -07:00
Alexey Bataev	576261ac8f	[SLP]Improve reordering for consts, splats and ops from same nodes + improved analysis. Improved detection of const/splat candidates, their matching and analysis of instructions from same nodes. Metric: size..text Program size..text results results0 diff results results0 diff test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 92952.00 93096.00 0.2% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 779832.00 780136.00 0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 839923.00 840179.00 0.0% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 392708.00 392740.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1171131.00 1171147.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1391089.00 1391073.00 -0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1391089.00 1391073.00 -0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12352780.00 12352636.00 -0.0% MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE - small reordering External/SPEC/CINT2006/464.h264ref/464.h264ref - small better code after reordering MultiSource/Applications/JM/lencod/lencod - smaller code with less shuffles MultiSource/Applications/JM/ldecod/ldecod - same External/SPEC/CFP2017rate/511.povray_r/511.povray_r - 2 extra loads vectorized, smaller code External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - better code, size increased because of more constant vectors. External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same External/SPEC/CFP2017rate/526.blender_r/526.blender_r - small change in the vectorized code, some code a bit better, some a bit worse. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87091	2024-05-01 07:34:06 -04:00
Alexey Bataev	67e726a2f7	[SLP]Transform stores + reverse to strided stores with stride -1, if profitable. Adds transformation of consecutive vector store + reverse to strided stores with stride -1, if it is profitable Reviewers: RKSimon, preames Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/90464	2024-05-01 07:32:33 -04:00
Alexey Bataev	ef78edafab	[SLP][NFC]Add a test with the optimizable and and final ext, NFC.	2024-04-29 07:46:21 -07:00
Alexey Bataev	37ae4ad0ee	[SLP]Support minbitwidth analisys for buildvector nodes. Metric: size..text Program size..text exp ref diff test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 42906.00 42986.00 0.2% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 42909.00 42989.00 0.2% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664581.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664581.00 664661.00 0.0% Less is better. Replaces `buildvector <p x in> + trunc <p x in> to <p x im>` sequences to `buildvector <p x im> of { trunc in to im }` scalars, which is free in most cases, results in better code. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88504	2024-04-29 09:57:37 -04:00
Alexey Bataev	040b5a1255	[SLP]Fix PR90211: vectorized node must match completely to be reused. If the gather node matches the vectorized node, it must also match with the scalars completely. Otherwise, need to revectorize the gather node to generate correct code.	2024-04-29 06:51:11 -07:00
Alexey Bataev	86b9a4f892	[SLP][NFC]Add a test with the skipped gather node, which is same, as vectorized node.	2024-04-29 06:43:13 -07:00
Alexey Bataev	217c099ead	[SLP][NFC]Add a test for strided stores support, NFC.	2024-04-29 05:52:38 -07:00
Alexey Bataev	79314c64d0	[SLP]Fix PR90224: check that users of gep are all vectorized. Before deleting extractelement instruction for vectorized GEP with external users, need to check that all users vectorized before deleting this extractelement.	2024-04-26 11:49:12 -07:00
Alexey Bataev	d74e42acd2	[SLP]Attempt to vectorize long stores, if short one failed. We can try to vectorize long store sequences, if short ones were unsuccessful because of the non-profitable vectorization. It should not increase compile time significantly (stores are sorted already, complexity is n x log n), but vectorize extra code. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1088012.00 1088236.00 0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480396.00 480476.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2041105.00 2040961.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 836563.00 836387.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035100.00 1032140.00 -0.3% In all benchmarks extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88563	2024-04-26 06:53:44 -07:00
Alexey Bataev	f758bb66e8	[SLP]Fix PR89988: do extra analysis of the icmp args to correctly handle signed/unsigned comparison. If operands of icmp has different signedness, need to consider extending unsigned operands to correctly handle comparison with the signed operands.	2024-04-25 16:10:24 -07:00
Alexey Bataev	bef6687f9b	[SLP][NFC]Add a test with the incorrect comparison after minbiwidth analysis.	2024-04-25 16:07:45 -07:00
Alexey Bataev	b4a0fd40f1	[SLP]Fix PR89635: do not try to vectorize single-gather alternate node. No need to try to vectorize single gather/buildvector with alternate opcode graph, it is not profitable. In other cases, need to use last instruction for inserting the vectorized code.	2024-04-23 06:45:43 -07:00
Alexey Bataev	0ab0c1d982	[SLP]Introduce transformNodes() and transform loads + reverse to strided loads. Introduced transformNodes() function to perform transformation of the nodes (cost-based, instruction count based, etc.). Implemented transformation of consecutive loads + reverse order to strided loads with stride -1, if profitable. Reviewers: RKSimon, preames, topperc Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88530	2024-04-22 12:31:57 -04:00
Alexey Bataev	6bd29d6639	[SLP]Fix PR89614: phis can be reordered, if reuses are not empty. Need to relax assertion and check ReuseShuffleIndices is not empty, if the root phi node has reorder indices.	2024-04-22 08:40:19 -07:00
Alexey Bataev	102a811094	[SLP]Fix a check for multi-users for icmp user. The compiler should not take into account the type of the cmp instruction, otherwise it may treat the size incorrectly and it may lead to incorrect codegen.	2024-04-22 08:23:15 -07:00
Alexey Bataev	19a625a0a7	[SLP][NFC]Add a test with incorrect size of the external user detection.	2024-04-22 08:16:43 -07:00
Alexey Bataev	ef1d19b0a5	[SLP]Fix PR89438: check for all tree entries for the resized value. Need to check all possible entries, before trying looking for the minbitwidth in the user node. Otherwise we may incorrectly get signedness info.	2024-04-22 06:38:38 -07:00
Alexey Bataev	cee7d994b9	[SLP]Fix PR89438: Check for same vectorized node in MinBWs, not user. Need to check if the buildvector node has perfect diamond match in the graph and the matched node is resized.	2024-04-19 12:52:19 -07:00
Alexey Bataev	4d7f3d9e0f	[SLP]Fix final analysis for unsigned nodes. Need to check that at least single bit is cleared for unsigned nodes before reducing their size. Otherwise they might be treated as signed in signed nodes.	2024-04-19 03:03:56 -07:00
Mikhail Goncharov	054b1b3b5a	Revert "[SLP]Fix final analysis for unsigned nodes." This reverts commit 74e07ab523122d6a8347b25770062ab331b6bb84. It might be that Mask.getBitWidth() == Mask.countl_zero() (32 in my case) and zero bitwidth2 causes the crash.	2024-04-19 11:32:56 +02:00
Alexey Bataev	74e07ab523	[SLP]Fix final analysis for unsigned nodes. Need to check that at least single bit is cleared for unsigned nodes before reducing their size. Otherwise they might be treated as signed in signed nodes.	2024-04-18 10:05:54 -07:00
Alexey Bataev	df7eb202ce	[SLP][NFC]Add a test with incorrect final analysis for unsigned nodes, being used in signed nodes.	2024-04-18 09:58:38 -07:00
Alexey Bataev	9462abdff1	[SLP]Fix PR89187: fixx assertion check. Need to use proper index variable to fix a crash.	2024-04-18 04:22:25 -07:00
Nikita Popov	888836930b	Revert "[SLP]Attempt to vectorize long stores, if short one failed." This reverts commit 6f7160eedb2db02f37d4ffd52fff7b0cf88b3fdc. This still causes large compile-time regressions in some cases.	2024-04-18 10:15:45 +09:00
Simon Pilgrim	c02ed29ec1	[CostModel][X86] Recognise vector rotation by uniform constant patterns Adds suitable costs for AVX512 targets (we still rely on default expansion for AVX2 and earlier)	2024-04-17 19:08:36 +01:00
Alexey Bataev	6f7160eedb	[SLP]Attempt to vectorize long stores, if short one failed. We can try to vectorize long store sequences, if short ones were unsuccessful because of the non-profitable vectorization. It should not increase compile time significantly (stores are sorted already, complexity is n x log n), but vectorize extra code. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1088012.00 1088236.00 0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480396.00 480476.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2041105.00 2040961.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 836563.00 836387.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035100.00 1032140.00 -0.3% In all benchmarks extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88563	2024-04-17 10:24:35 -07:00
Nikita Popov	efd60556f7	Revert "[SLP]Attempt to vectorize long stores, if short one failed." This reverts commit 7d4e8c1f3bbfe976f4871c9cf953f76d771b0eda. Contrary to the commit description, this does cause large compile-time regressions (up to 10% on individual files).	2024-04-17 09:25:05 +09:00
Alexey Bataev	7d4e8c1f3b	[SLP]Attempt to vectorize long stores, if short one failed. We can try to vectorize long store sequences, if short ones were unsuccessful because of the non-profitable vectorization. It should not increase compile time significantly (stores are sorted already, complexity is n x log n), but vectorize extra code. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1088012.00 1088236.00 0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480396.00 480476.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2041105.00 2040961.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 836563.00 836387.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035100.00 1032140.00 -0.3% In all benchmarks extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88563	2024-04-16 14:55:41 -04:00
Alexey Bataev	c7657cf7d1	[SLP]Keep externally used GEPs as GEPs, if possible instead of extractelement. If the vectorized GEP instruction can be still kept as a scalar GEP, better to keep it as scalar instead of extractelement. In many cases it is more profitable. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 18911.00 19695.00 4.1% test-suite :: SingleSource/Benchmarks/Misc-C++-EH/spirit.test 59987.00 60707.00 1.2% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1392209.00 1392753.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1392209.00 1392753.00 0.0% test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1087996.00 1088236.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309310.00 309342.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664661.00 664693.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664661.00 664693.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12354636.00 12354908.00 0.0% test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1152748.00 1152716.00 -0.0% test-suite :: MultiSource/Applications/oggenc/oggenc.test 191787.00 191771.00 -0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480796.00 480476.00 -0.1% Misc/oourafft - Extra code gets vectorized Misc-C++-EH/spirit - same CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - same, extra code gets vectorized CINT2006/400.perlbench - some extra 4 x ptr stores vectorized Bullet/bullet - extra 4 x ptr store vectorized CINT2017rate/525.x264_r CINT2017speed/625.x264_s - same CFP2017rate/526.blender_r - extra 8 x float stores (several), some extra 4 x ptr stores CFP2006/453.povray - 2 x double loads/stores replaced by 4 x double loads/stores Applications/oggenc - extra code is vectorized UnitTests/matrix-types-spec - extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88877	2024-04-16 14:54:06 -04:00
Alexey Bataev	26ebe16d78	[SLP]Fix PR88834: check if unsigned arg can be trunced, being used in smax/smin intrinsics. Need to check that unsigned argument can be safely used in smax/smin intrinsics by checking if at least single sign bit is cleared, otherwise its value may be treated as negative instead of positive.	2024-04-16 06:42:15 -07:00
Alexey Bataev	6ab5927238	[SLP][NFC]Add a test with the incorrect vectorization of smax with unsigned arg.	2024-04-16 06:35:13 -07:00
Florian Hahn	b73476c784	[SLP] Make sure MinVF is a power-of-2 by using PowerOf2Ceil. This should ensure we explore the same VFs as before 6d66db3890a18e39. Fixes https://github.com/llvm/llvm-project/issues/88640.	2024-04-16 13:29:35 +01:00
Valery Dmitriev	9abb1ffc5c	[SLP][NFC] Add option to bypass early profitability check. (#88594) The option intended primarily for LIT tests to suppress heuristic based profitability check and proceed vectorization of a seemingly unprofitable alternate operation pattern. This allows the vectorizer to execute path that was the original intent of a test.	2024-04-15 09:09:39 -07:00
Florian Hahn	6704faf6f8	[SLP] Use StoreTy to compute min VF. This ensures that MinVF is a power-of-2, even if ValueTy's width is not a power-of-2. This should fix a number of buildbot failures with X86 bootstrapping.	2024-04-13 11:12:33 +01:00
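
Several of the fixes above (26ebe16d78 for PR88834, a476032101 for PR91025, and the "final analysis for unsigned nodes" change) share one condition: after minbitwidth analysis, an unsigned (zero-extended) operand may only be narrowed for a signed operation such as smax/smin if the sign bit of the narrow type is clear for every value; otherwise a large positive value is reinterpreted as negative. The sketch below illustrates that condition on concrete integers. It is a toy model, not code from the SLP vectorizer, and the helper names `to_signed`, `unsigned_safe_for_signed_op` and `narrowed_smax` are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as a two's-complement integer."""
    if bits <= 0:
        # A zero width is meaningless; the revert 054b1b3b5a reports a crash
        # caused by a zero bit width in the real analysis.
        raise ValueError("bit width must be positive")
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def unsigned_safe_for_signed_op(values, bits: int) -> bool:
    """True if every non-negative value keeps its meaning after truncation
    to `bits` bits and a signed reinterpretation, i.e. the sign bit of the
    narrow type is clear for all of them."""
    if bits <= 0:
        raise ValueError("bit width must be positive")
    return all(0 <= v < (1 << (bits - 1)) for v in values)


def narrowed_smax(a: int, b: int, bits: int) -> int:
    """Signed max computed after truncating both operands to `bits` bits."""
    return max(to_signed(a, bits), to_signed(b, bits))


# Two values zero-extended from i8 and compared with smax in a wider type.
print(narrowed_smax(200, 100, 32))               # wide result: 200
print(unsigned_safe_for_signed_op([200, 100], 8))  # sign bit of i8 set for 200
print(narrowed_smax(200, 100, 8))                # 200 reads as -56, so 100 wins
print(narrowed_smax(200, 100, 16))               # i16 keeps the sign bit clear
```

The design choice shown is conservative: rather than reasoning about each operation, the narrowing is rejected unless at least one bit (the narrow sign bit) is known to be zero, which is the check the commit messages describe.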

1731 Commits
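
Commits 67e726a2f7 and c144157f3d replace "reverse + consecutive store" with a strided store of stride -1, and the follow-up fix notes that the strided store must start at the last address of the group, not the first. The sketch below checks that equivalence on a toy memory modeled as a Python list with element-granular addresses. It is an illustration under those assumptions, not LLVM code, and the helper names `consecutive_store` and `strided_store` are hypothetical.

```python
from typing import List


def _check(mem: List[int], addr: int) -> None:
    # Reject out-of-range addresses instead of letting Python wrap negative indices.
    if not 0 <= addr < len(mem):
        raise IndexError(f"address {addr} out of range")


def consecutive_store(mem: List[int], base: int, vec: List[int]) -> None:
    """Store vec[i] at mem[base + i]."""
    for i, x in enumerate(vec):
        _check(mem, base + i)
        mem[base + i] = x


def strided_store(mem: List[int], start: int, stride: int, vec: List[int]) -> None:
    """Store vec[i] at mem[start + i * stride]."""
    for i, x in enumerate(vec):
        addr = start + i * stride
        _check(mem, addr)
        mem[addr] = x


vec = [1, 2, 3, 4]
base = 4

# Reference: reverse the vector, then store it to consecutive addresses.
ref = [0] * 8
consecutive_store(ref, base, vec[::-1])

# Stride -1 starting from the LAST address of the group matches the reference.
good = [0] * 8
strided_store(good, base + len(vec) - 1, -1, vec)

# Stride -1 starting from the FIRST address writes below the group instead.
bad = [0] * 8
strided_store(bad, base, -1, vec)

print(ref, good, bad)
```

In this model `ref` and `good` both become `[0, 0, 0, 0, 4, 3, 2, 1]`, while `bad` writes to addresses 4, 3, 2, 1, which is the kind of mismatch the "use last pointer" fix addresses.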