llvm-project

Author	SHA1	Message	Date
Sushant Gokhale	9991ea28fc	[CostModel][AArch64] Make extractelement, with fmul user, free whenever possible (#111479)
In case of Neon, if there exists an extractelement from lane != 0 such that
1. the extractelement does not necessitate a move from vector_reg -> GPR,
2. the extractelement result feeds into an fmul, and
3. the other operand of the fmul is a scalar or an extractelement from lane 0 (or a lane equivalent to 0),
then the extractelement can be merged with the fmul in the backend and incurs no cost. For example:
```
define double @foo(<2 x double> %a) {
  %1 = extractelement <2 x double> %a, i32 0
  %2 = extractelement <2 x double> %a, i32 1
  %res = fmul double %1, %2
  ret double %res
}
```
`%2` and `%res` can be merged in the backend to generate: `fmul d0, d0, v0.d[1]`
The change was tested with SPEC FP (C/C++) on Neoverse-V2.
Compile time impact: None
Performance impact: 1.3-1.7% uplift observed on the lbm benchmark with -flto, depending on the config.	2024-11-13 11:10:49 +05:30
Han-Kuan Chen	5a5502b9e1	[SLP] NFC. Use Value instead of template. (#115440)	2024-11-13 11:58:19 +08:00
Alexey Bataev	058ac837bc	[SLP]Use generic createShuffle for buildvector Use the generic createShuffle function, which knows how to adjust the vectors correctly, to avoid a compiler crash when trying to build a buildvector as a shuffle. Fixes #115732	2024-11-11 10:49:39 -08:00
Han-Kuan Chen	3cdd86bb47	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#115417)	2024-11-10 13:53:15 +08:00
Alexey Bataev	26a9f3f590	[SLP][NFC]Cleanup getSameOpcode, return InstructionsState::invalid() for non-valid inputs Just a cleanup and related changes	2024-11-08 14:00:32 -08:00
Kazu Hirata	bc7e5c2016	[SLP] Avoid repeated hash lookups (NFC) (#115428)	2024-11-08 07:35:06 -08:00
Alexey Bataev	77bec78878	[SLP]Do not look for last instruction in schedule block for buildvectors When looking for the insertion point of a buildvector node, the compiler should not use scheduling info for such nodes: they may contain only partial info, which is not fully correct and may cause a compiler crash. Fixes #114082	2024-11-08 06:55:29 -08:00
Alexey Bataev	62db1c8a07	[SLP]Better decision making on whether to try stores packs for vectorization Since the stores are sorted by distance, comparing the indices in the original array and exiting early if the index is less than the index of the last store is not always the best strategy. It is better to remove such stores explicitly, to better check for vectorization opportunities. Fixes #115008	2024-11-07 14:23:15 -08:00
Alexey Bataev	b7a8f5f4c9	[SLP][NFC]Exit early from attempt-to-reorder, if it is useless Adds early exits, which just save compile time. It can exit early if the total number of scalars is 2, if all scalars are constant, or if the opcode is the same and not alternate. In these cases reordering will not happen, so the compiler can exit early to save compile time.	2024-11-07 11:07:49 -08:00
Kazu Hirata	22b4b1ab10	Revert "[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)" This reverts commit f58757b8dc167809b69ec00f9b5ab59281df0902. Failing buildbots: https://lab.llvm.org/buildbot/#/builders/174/builds/8058 https://lab.llvm.org/buildbot/#/builders/127/builds/1357	2024-11-07 10:43:11 -08:00
Han-Kuan Chen	f58757b8dc	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)	2024-11-08 00:52:59 +08:00
Han-Kuan Chen	c6091cdbed	[SLP][REVEC] Make shufflevector can be vectorized with ReorderIndices and ReuseShuffleIndices. (#114965)	2024-11-07 11:04:34 +08:00
Alexey Bataev	9f3b6adb15	[SLP][NFC]Exit early if the graph is empty, NFC No need to check anything if the graph is empty, just exit early.	2024-11-06 08:33:14 -08:00
Alexey Bataev	76422385c3	[SLP]Support reordered buildvector nodes for better clustering
Patch adds reordering of the buildvector nodes for better clustering of the compatible operations and future vectorization. Includes basic cost estimation and, if the transformation is not profitable, reverts it.

AVX512, -O3+LTO
Metric: size..text

Program  results  results0  diff
test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test  74565.00  75701.00  1.5%
test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test  75773.00  76397.00  0.8%
test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test  75773.00  76397.00  0.8%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2014462.00  2024494.00  0.5%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  395219.00  396979.00  0.4%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test  857795.00  859667.00  0.2%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  800472.00  802440.00  0.2%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  590699.00  591403.00  0.1%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  203006.00  203102.00  0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test  42408.00  42424.00  0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  12451575.00  12451927.00  0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1396480.00  1396448.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1396480.00  1396448.00  -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1047708.00  1047580.00  -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  111344.00  111328.00  -0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1087660.00  1087500.00  -0.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  280664.00  280616.00  -0.0%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  502646.00  502006.00  -0.1%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1033135.00  1031567.00  -0.2%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2070917.00  2065845.00  -0.2%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2070917.00  2065845.00  -0.2%
test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test  33893.00  33797.00  -0.3%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  39677.00  39549.00  -0.3%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  39674.00  39546.00  -0.3%
test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test  11560.00  11512.00  -0.4%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  653867.00  649275.00  -0.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  653867.00  649275.00  -0.7%

CINT2006/401.bzip2 - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - function _ZN9FastBoard25get_pattern3_augment_specEiib not inlined anymore, better vectorization
CFP2017rate/510.parest_r - better vectorization
JM/ldecod - better vectorization
JM/lencod - same
CINT2006/464.h264ref - extra code vectorized
CFP2006/447.dealII - extra vector code
MiBench/consumer-lame - vectorized 2 loops previously scalar
DOE-ProxyApps-C/miniGMG - small changes
Benchmarks/7zip - extra code vectorized, better vectorization
CFP2017rate/526.blender_r - extra vectorization
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorization
MiBench/consumer-jpeg - extra vectorization
CINT2006/400.perlbench - extra vectorization
Prolangs-C/TimberWolfMC - small variations
Applications/sqlite3 - extra function vectorized and inlined
Benchmarks/tramp3d-v4 - extra code vectorized
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - extra code vectorized, function digcpy gets vectorized and inlined
CINT2006/473.astar - extra code vectorized
MiBench/telecomm-gsm - extra code vectorized, better vector code
mediabench/gsm - same
MiBench/security-blowfish - extra code vectorized
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - sub4x4_dct function vectorized and gets inlined

RISC-V, SiFive-p670, O3+LTO
CFP2017rate/510.parest_r - extra vectorization
CFP2017rate/526.blender_r - extra vectorization
MiBench/consumer-lame - extra vectorized code

Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/114284	2024-11-06 10:51:15 -05:00
Alexey Bataev	0c18def2c1	[SLP]Allow interleaving check only if it is less than number of elements Need to check if the interleaving factor is less than the total number of elements in the loads slice, to handle it correctly and avoid a compiler crash. Fixes report https://github.com/llvm/llvm-project/pull/112361#issuecomment-2457227670	2024-11-05 07:06:15 -08:00
Alexey Bataev	899336735a	[SLP]Be more pessimistic about poisonous reductions Consider all possible reduction ops as being non-poisoning boolean logical operations, which require freeze to be fully correct. https://alive2.llvm.org/ce/z/TKWDMP Fixes #114738	2024-11-04 06:13:52 -08:00
Han-Kuan Chen	a795a18bba	[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551)	2024-11-02 03:03:52 +08:00
Han-Kuan Chen	e4aeeba84c	[SLP][REVEC] When ScalarTy is FixedVectorType, the insertion index should consider the number of elements of ScalarTy. (#114526)	2024-11-01 21:17:57 +08:00
Alexey Bataev	e05def081e	[SLP]Do not vectorize code in EH and non-returning blocks The code in EH and non-returning blocks can be skipped by the vectorizer, since it does not add to the performance and just consumes compile/link time. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112221	2024-10-31 13:50:02 -04:00
Alexey Bataev	19a34dded7	[SLP]Do not account external uses in EH block and in non-returning blocks No need to account for the cost of the external uses in EH and non-returning basic blocks. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112045	2024-10-31 13:23:43 -04:00
Alexey Bataev	e7080fd735	[SLP]Extra check if the instruction marked for removal must be replaced in reduction ops If the instruction is vectorized and it is a part of the reduced values gather/buildvector node, it should be properly replaced in the reduced operation instructions before removal, to avoid a compiler crash. Fixes #114371	2024-10-31 09:59:35 -07:00
Alexey Bataev	7152bf3bc8	[SLP]Do not create new vector node if scalars fully overlap with the existing one If the list of scalars is vectorized as part of the same vector node, there is no need to generate the vector node again; it will be handled as part of overlapping matching. Fixes #113810	2024-10-28 06:59:41 -07:00
Alexey Bataev	e914421d7f	[SLP]Do correct signedness analysis for externally used scalars If an externally used scalar is in the root node, it may have incorrect signedness info because of a conflict with the demanded bits analysis. Need to perform exact signedness analysis and compute it rather than rely on the precomputed value, which might be incorrect for alternate zext/sext nodes. Fixes #113520	2024-10-24 08:59:24 -07:00
Alexey Bataev	d2e7ee77d3	[SLP]Do not check for clustered loads only Since SLP supports "clusterization" of non-load instructions, the loads-only restriction for reduced values should be removed to avoid a compiler crash. Fixes #113516	2024-10-24 08:16:42 -07:00
Alexey Bataev	cb5046da26	[SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles Need to consider undefs correctly when trying to replace them with potentially poisonous values in shuffles. Such elements should not be silently replaced by poison values; instead, a more complex analysis should be implemented to see if it is safe to do it. Fixes #113425	2024-10-24 07:47:23 -07:00
Alexey Bataev	b65b2b4ab6	[SLP]Expand vector to the whole register size in extracts adjustment Need to expand the number of elements to the whole register to process the cost estimation correctly and avoid a compiler crash. Fixes #113462	2024-10-23 12:04:40 -07:00
Alexey Bataev	a3508e0246	[SLP]Small buildvector-only graph should contain scalars from the same block If the graph is small and has a single buildvector node, all scalar instructions must be from the same basic block to prevent a compiler crash. Fixes #113451	2024-10-23 10:46:38 -07:00
Alexey Bataev	4b1b51ac52	[SLP]Initial non-power-of-2 support (but still whole register) for reductions Enables initial non-power-of-2 support (but still requires number of elements, forming whole registers) for reductions. Enables extra vectorization for MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and CFP2017rate/526.blender_r (checked for SSE2) Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112361	2024-10-21 12:25:39 -07:00
Alexey Bataev	9e03920cbf	[SLP]Ignore root gather node, when searching for reuses The root gather/buildvector node should be ignored when the SLP vectorizer tries to find matching gather nodes vectorized earlier. This node is definitely the last one in the pipeline and does not have users. It may cause a compiler crash. Fixes #113143	2024-10-21 09:16:16 -07:00
David Green	17ac10c28f	Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions" This reverts commit 7f2e937469a8cec3fe977bf41ad2dfb9b4ce648a as it causes regressions in the tests it modifies, and undoes what was added in #100653 (which itself was a fix for a previous regression).	2024-10-21 13:37:44 +01:00
Alexey Bataev	709abacdc3	[SLP]Check that operand of abs does not overflow before making it part of minbitwidth transformation Need to check that the operand of the abs intrinsic can be safely truncated before making it part of the minbitwidth transformation. Fixes #112577	2024-10-18 13:56:19 -07:00
Alexey Bataev	e56e9dd8ad	[SLP]Fix minbitwidth emission and analysis for freeze instruction Need to add minbw emission and analysis for freeze instruction to fix incorrect signedness propagation. Fixes #112460	2024-10-18 13:36:37 -07:00
Alexey Bataev	7f2e937469	[SLP]Initial non-power-of-2 support (but still whole register) for reductions Enables initial non-power-of-2 support (but still requires number of elements, forming whole registers) for reductions. Enables extra vectorization for MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and CFP2017rate/526.blender_r (checked for SSE2) Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112361	2024-10-18 12:50:11 -07:00
Jim Lin	5e9166e02a	[SLP] Remove TTI parameter from vectorizeHorReduction and vectorizeRootInstruction. NFC. Since TTI is a member variable.	2024-10-17 09:35:22 +08:00
Alexey Bataev	685bec722f	Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions" This reverts commit 8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b to investigate and fix compile time regressions reported by https://llvm-compile-time-tracker.com/compare.php?from=ec78f0da0e9b1b8e2b2323e434ea742e272dd913&to=8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b&stat=instructions:u	2024-10-15 12:59:44 -07:00
Alexey Bataev	060d151476	[SLP][NFCI]Check early for deleted instructions Check as early as possible for the deleted instructions before trying to vectorize the code. May reduce number of attempts for the vectorization.	2024-10-15 10:51:03 -07:00
Alexey Bataev	8287fa8e59	[SLP]Initial non-power-of-2 support (but still whole register) for reductions Enables initial non-power-of-2 support (but still requires number of elements, forming whole registers) for reductions. Enables extra vectorization for MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and CFP2017rate/526.blender_r (checked for SSE2) Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112361	2024-10-15 12:10:48 -04:00
Alexey Bataev	f9bc00e4bb	[SLP]Initial support for interleaved loads Adds initial support for interleaved loads, which allows emission of segmented loads for RISCV RVV. Vectorizes extra code for RISCV CFP2006/447.dealII, CFP2006/453.povray, CFP2017rate/510.parest_r, CFP2017rate/511.povray_r, CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc, CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/112042	2024-10-14 09:12:33 -04:00
Alexey Bataev	3ed8acf2f0	[SLP][NFC]Simplify check for external user parent basic block, NFC.	2024-10-11 13:11:16 -07:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation for adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e., just lookup, no creation).	2024-10-11 05:26:03 -07:00
Alexey Bataev	4b5018d231	[SLP]Track repeated reduced value as it might be vectorized Need to track changes with the repeated reduced value, since it might be vectorized in the next attempt for reduction vectorization, to correctly generate the code and avoid compiler crash. Fixes #111887	2024-10-10 13:41:56 -07:00
Alexey Bataev	f020bf1526	[SLP]Initial support for non-power-of-2 (but whole reg) vectorization for stores Allows non-power-of-2 vectorization for stores, but still requires that the vectorized number of elements forms full vector registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/111194	2024-10-09 15:22:44 -04:00
Alexey Bataev	9f3c55954e	[SLP]Fix loads sorting for loads from different basic blocks Patch fixes lookup for loads from different basic blocks. Originally, the code checked if the main key (combined with the parent basic block) was created, but did not include the key in LoadsMap. When the code looked for the load pointer in LoadsMap, it skipped the check for the parent basic block and could mix loads from different basic blocks (but the same underlying pointer). Currently, it does not lead to any issues, since later the code compares parent basic blocks and sorts loads properly, but it increases compile time. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/111521	2024-10-08 16:44:16 -04:00
Alexey Bataev	a65a5feb1a	[SLP]Improve masked loads vectorization, attempting gathered loads If the vector of loads can be vectorized as a masked gather and there are several other masked gather nodes, the compiler can try to check whether it is possible to gather such nodes into a big consecutive/strided loads node, which provides better performance. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110151	2024-10-08 16:43:10 -04:00
Simon Pilgrim	d38addf099	Fix MSVC signed/unsigned mismatch warning	2024-10-08 17:36:35 +01:00
Alexey Bataev	45826513ef	[SLP][NFC]Fix clang-tidy suggestions, cleanup, NFC.	2024-10-08 08:31:23 -07:00
Alexey Bataev	7692d106b4	[SLP][NFC]Remove dead code + use nlogn lookups instead of n^2	2024-10-04 15:32:04 -07:00
Alexey Bataev	f74879cf0c	[SLP]Make PHICompare comparator follow strict weak ordering requirement Reviewers: efriedma-quic Reviewed By: efriedma-quic Pull Request: https://github.com/llvm/llvm-project/pull/110529	2024-10-04 14:23:48 -04:00
Alexey Bataev	d991e05452	[SLP]Fix compiler crash on vectorizing gathered loads with different types Need to check not only parents, but also types for compatible loads, when trying to build the vectorizable sequences. Fixes crash reported in https://github.com/llvm/llvm-project/pull/107461#issuecomment-2392980214	2024-10-04 08:36:57 -07:00
Han-Kuan Chen	f5815b9903	[SLP] NFC. Set NumOperands directly if VL[0] is IntrinsicInst. (#111103)	2024-10-04 19:38:33 +08:00

1989 Commits
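Commit 9991ea28fc states a three-part condition under which an AArch64 extractelement from a non-zero lane costs nothing, because the backend folds it into the fmul. The sketch below models that rule as a predicate over a toy IR, assuming Python as the illustration language. The classes `Scalar`, `ExtractElement`, `FMul` and the function `fmul_merge_applies` are hypothetical and are not LLVM's API; the `needs_gpr_move` flag and the `lane_equiv_zero` hook (standing in for "a lane equivalent to 0") are assumptions supplied by the caller rather than computed the way a real cost model would.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Scalar:
    """Hypothetical: a scalar FP value already held in an FP/SIMD register."""
    name: str


@dataclass(frozen=True)
class ExtractElement:
    """Hypothetical: extractelement <N x double> %vector, lane."""
    vector: str
    lane: int
    # Assumption: the caller knows whether lowering needs a vector-reg -> GPR move.
    needs_gpr_move: bool = False


Operand = Union[Scalar, ExtractElement]


@dataclass(frozen=True)
class FMul:
    lhs: Operand
    rhs: Operand


def is_lane_zero(lane: int) -> bool:
    # Default stand-in for "lane 0 or lane equivalent to 0"; which other lanes
    # count as equivalent is target-specific and not modeled here (assumption).
    return lane == 0


def fmul_merge_applies(ext: ExtractElement, user: object,
                       lane_equiv_zero: Callable[[int], bool] = is_lane_zero) -> bool:
    """Return True if `ext` meets the three conditions from the commit message."""
    # The rule only covers extracts from a lane other than 0.
    if lane_equiv_zero(ext.lane):
        return False
    # 1. The extract must not need a move from a vector register to a GPR.
    if ext.needs_gpr_move:
        return False
    # 2. The extract's result must feed into an fmul.
    if not isinstance(user, FMul):
        return False
    # Find the fmul's other operand; identity check, so `ext` must really be an operand.
    if user.lhs is ext:
        other = user.rhs
    elif user.rhs is ext:
        other = user.lhs
    else:
        return False
    # 3. The other operand is a scalar, or an extract from lane 0 (or an equivalent lane).
    if isinstance(other, Scalar):
        return True
    return isinstance(other, ExtractElement) and lane_equiv_zero(other.lane)


# The IR example from the commit: %1 = lane 0, %2 = lane 1, %res = fmul %1, %2.
e0 = ExtractElement("a", 0)
e1 = ExtractElement("a", 1)
res = FMul(e0, e1)
print(fmul_merge_applies(e1, res))  # True
```

Passing a custom `lane_equiv_zero` is how a caller could widen condition 3 without changing the predicate; the default deliberately accepts only lane 0.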
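Commit f74879cf0c makes the PHICompare comparator satisfy the strict weak ordering requirement that C++ sorting algorithms place on comparators. As a rough illustration only (this is not the PHICompare code), the brute-force Python checker below tests the four strict weak ordering properties on a finite sample: irreflexivity, asymmetry, transitivity, and transitivity of incomparability. The function name `strict_weak_ordering_violations` is hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def strict_weak_ordering_violations(items: Iterable[T],
                                    less: Callable[[T, T], bool]) -> List[str]:
    """Brute-force check of the strict weak ordering axioms on a finite sample."""
    xs = list(items)

    def incomparable(a: T, b: T) -> bool:
        # Neither a < b nor b < a: a and b are "equivalent" under `less`.
        return not less(a, b) and not less(b, a)

    problems: List[str] = []
    for a in xs:
        # Irreflexivity: less(a, a) must be false.
        if less(a, a):
            problems.append(f"irreflexivity: {a!r}")
    for a, b in product(xs, repeat=2):
        # Asymmetry: less(a, b) implies not less(b, a).
        if less(a, b) and less(b, a):
            problems.append(f"asymmetry: {a!r}, {b!r}")
    for a, b, c in product(xs, repeat=3):
        # Transitivity: less(a, b) and less(b, c) imply less(a, c).
        if less(a, b) and less(b, c) and not less(a, c):
            problems.append(f"transitivity: {a!r}, {b!r}, {c!r}")
        # Transitivity of incomparability: equivalence must be transitive too.
        if incomparable(a, b) and incomparable(b, c) and not incomparable(a, c):
            problems.append(f"incomparability: {a!r}, {b!r}, {c!r}")
    return problems


print(strict_weak_ordering_violations(range(4), lambda a, b: a < b))  # []
```

For example, `a <= b` fails irreflexivity, and `a + 1 < b` fails transitivity of incomparability: 0 and 1 are incomparable, as are 1 and 2, yet `0 + 1 < 2` holds. The cubic loop is only practical for small samples, which is the intent of a sanity check like this.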