llvm-project

Author	SHA1	Message	Date
Alexey Bataev	b65b2b4ab6	[SLP]Expand vector to the whole register size in extracts adjustment Need to expand the number of elements to the whole register to correctly process estimation and avoid compiler crash. Fixes #113462	2024-10-23 12:04:40 -07:00
Alexey Bataev	a3508e0246	[SLP]Small buildvector only graph should contain scalars from same block If the graph is small and has single buildvector node, all scalar instructions must be from the same basic block to prevent compiler crash. Fixes #113451	2024-10-23 10:46:38 -07:00
Alexey Bataev	4b1b51ac52	[SLP]Initial non-power-of-2 support (but still whole register) for reductions Enables initial non-power-of-2 support (but still requires number of elements, forming whole registers) for reductions. Enables extra vectorization for MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and CFP2017rate/526.blender_r (checked for SSE2) Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112361	2024-10-21 12:25:39 -07:00
Alexey Bataev	9e03920cbf	[SLP]Ignore root gather node, when searching for reuses Root gather/buildvector node should be ignored when SLP vectorizer tries to find matching gather nodes, vectorized earlier. This node is definitely the last one in the pipeline and it does not have users. It may cause the compiler crash Fixes #113143	2024-10-21 09:16:16 -07:00
David Green	17ac10c28f	Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions" This reverts commit 7f2e937469a8cec3fe977bf41ad2dfb9b4ce648a as it causes regressions in the tests it modifies, and undoes what was added in #100653 (which itself was a fix for a previous regression).	2024-10-21 13:37:44 +01:00
Alexey Bataev	709abacdc3	[SLP]Check that operand of abs does not overflow before making it part of minbitwidth transformation Need to check that the operand of the abs intrinsic can be safely truncated before making it part of the minbitwidth transformation. Fixes #112577	2024-10-18 13:56:19 -07:00
Alexey Bataev	e56e9dd8ad	[SLP]Fix minbitwidth emission and analysis for freeze instruction Need to add minbw emission and analysis for freeze instruction to fix incorrect signedness propagation. Fixes #112460	2024-10-18 13:36:37 -07:00
Alexey Bataev	7f2e937469	[SLP]Initial non-power-of-2 support (but still whole register) for reductions Enables initial non-power-of-2 support (but still requires number of elements, forming whole registers) for reductions. Enables extra vectorization for MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and CFP2017rate/526.blender_r (checked for SSE2) Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112361	2024-10-18 12:50:11 -07:00
Jim Lin	5e9166e02a	[SLP] Remove TTI parameter from vectorizeHorReduction and vectorizeRootInstruction. NFC. Since TTI is a member variable.	2024-10-17 09:35:22 +08:00
Alexey Bataev	685bec722f	Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions" This reverts commit 8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b to investigate and fix compile time regressions reported by https://llvm-compile-time-tracker.com/compare.php?from=ec78f0da0e9b1b8e2b2323e434ea742e272dd913&to=8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b&stat=instructions:u	2024-10-15 12:59:44 -07:00
Alexey Bataev	060d151476	[SLP][NFCI]Check early for deleted instructions Check as early as possible for the deleted instructions before trying to vectorize the code. May reduce number of attempts for the vectorization.	2024-10-15 10:51:03 -07:00
Alexey Bataev	8287fa8e59	[SLP]Initial non-power-of-2 support (but still whole register) for reductions Enables initial non-power-of-2 support (but still requires number of elements, forming whole registers) for reductions. Enables extra vectorization for MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and CFP2017rate/526.blender_r (checked for SSE2) Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/112361	2024-10-15 12:10:48 -04:00
Alexey Bataev	f9bc00e4bb	[SLP]Initial support for interleaved loads Adds initial support for interleaved loads, which allows emission of segmented loads for RISCV RVV. Vectorizes extra code for RISCV CFP2006/447.dealII, CFP2006/453.povray, CFP2017rate/510.parest_r, CFP2017rate/511.povray_r, CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc, CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/112042	2024-10-14 09:12:33 -04:00
Alexey Bataev	3ed8acf2f0	[SLP][NFC]Simplify check for external user parent basic block, NFC.	2024-10-11 13:11:16 -07:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Alexey Bataev	4b5018d231	[SLP]Track repeated reduced value as it might be vectorized Need to track changes with the repeated reduced value, since it might be vectorized in the next attempt for reduction vectorization, to correctly generate the code and avoid compiler crash. Fixes #111887	2024-10-10 13:41:56 -07:00
Alexey Bataev	f020bf1526	[SLP]Initial support for non-power-of-2 (but whole reg) vectorization for stores Allows non-power-of-2 vectorization for stores, but still requires, that vectorized number of elements forms full vector registers. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/111194	2024-10-09 15:22:44 -04:00
Alexey Bataev	9f3c55954e	[SLP]Fix loads sorting for loads from different basic blocks Patch fixes lookup for loads from different basic blocks. Originally, the code checked if the main key (combined with parent basic block) was created, but did not include the key into LoadsMap. When the code looked for the load pointer in LoadsMap, it skipped check for parent basic block and could mix loads from different basic blocks (but the same underlying pointer). Currently, it does not lead to any issues, since later the code compares parent basic blocks and sorts loads properly. But it increases compile time. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/111521	2024-10-08 16:44:16 -04:00
Alexey Bataev	a65a5feb1a	[SLP]Improve masked loads vectorization, attempting gathered loads If the vector of loads can be vectorized as masked gather and there are several other masked gather nodes, the compiler can check if it is possible to gather such nodes into a big consecutive/strided loads node, which provides better performance. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110151	2024-10-08 16:43:10 -04:00
Simon Pilgrim	d38addf099	Fix MSVC signed/unsigned mismatch warning	2024-10-08 17:36:35 +01:00
Alexey Bataev	45826513ef	[SLP][NFC]Fix clang-tidy suggestions, cleanup, NFC.	2024-10-08 08:31:23 -07:00
Alexey Bataev	7692d106b4	[SLP][NFC]Remove dead code + use nlogn lookups instead of n^2	2024-10-04 15:32:04 -07:00
Alexey Bataev	f74879cf0c	[SLP]Make PHICompare comparator follow weak strict ordering requirement Reviewers: efriedma-quic Reviewed By: efriedma-quic Pull Request: https://github.com/llvm/llvm-project/pull/110529	2024-10-04 14:23:48 -04:00
Alexey Bataev	d991e05452	[SLP]Fix compiler crash on vectorizing gathered loads with different types Need to check not only parents, but also types for compatible loads, when trying to build the vectorizable sequences. Fixes crash reported in https://github.com/llvm/llvm-project/pull/107461#issuecomment-2392980214	2024-10-04 08:36:57 -07:00
Han-Kuan Chen	f5815b9903	[SLP] NFC. Set NumOperands directly if VL[0] is IntrinsicInst. (#111103 )	2024-10-04 19:38:33 +08:00
Alexey Bataev	133c1224de	[SLP]Fix a crash on accessing element with index -1 for reused mask with PoisonMaskElem Need to check if the index from the ReuseShuffleIndices mask is not equal to PoisonMaskElem before trying to access the element by index.	2024-10-03 08:24:05 -07:00
Han-Kuan Chen	5901463ada	[SLP] NFC. BaseIndex is not used for getSameOpcode. (#110948 )	2024-10-03 19:58:44 +08:00
Alexey Bataev	c1b911c579	[SLP]Do correct signedness analysis for clustered nodes Should get the signedness info from the original scalar instructions, if possible, to correctly generate sext/zext instructions. Also, the clustered node must be assigned a gather node user info to correctly estimate its bitwidth/sign.	2024-10-02 12:56:49 -07:00
Alexey Bataev	4197e732a5	[SLP]Add debug counter support Fixes #110725 Reviewers: aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/110734	2024-10-02 11:14:34 -07:00
Alexey Bataev	ec7266617f	Revert "[SLP]Add debug counter support" This reverts commit 67dd9d23474bd570d5befaddad0be8a5559b815b to fix https://lab.llvm.org/buildbot/#/builders/11/builds/6012	2024-10-02 10:33:27 -07:00
Alexey Bataev	67dd9d2347	[SLP]Add debug counter support Fixes #110725 Reviewers: aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/110734	2024-10-02 10:00:48 -07:00
Alexey Bataev	4dede756f2	[SLP]Transform nodes before building externally used values transformNodes function may create new vector nodes, so the reduced values might be vectorized later. Need to build the list of the externally used values after the transformNodes() function call to avoid compiler crash. Fixes #110787	2024-10-02 06:01:25 -07:00
Haowei Wu	948326163c	Revert "[SLP]Add debug counter support" This reverts commit f3c408d1726f6a921212faf68085f68bf8533f0c. This breaks LLVM test on debug-counter.ll	2024-10-01 16:15:30 -07:00
Alexey Bataev	f3c408d172	[SLP]Add debug counter support Fixes #110725 Reviewers: aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/110734	2024-10-01 16:21:00 -04:00
Alexey Bataev	b16e694948	[SLP]Try to keep operand of external casts as scalars, if profitable If the cost of original scalar instruction + cast is better than the extractelement from the vector cast instruction, better to keep original scalar instructions, where possible Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/110537	2024-10-01 13:35:42 -04:00
Han-Kuan Chen	cc01112660	[SLP][REVEC] getTypeSizeInBits should apply to scalar type instead of FixedVectorType. (#110610 ) reference: https://github.com/llvm/llvm-project/issues/109835	2024-10-01 19:15:58 +08:00
Jeremy Morse	96f37ae453	[NFC] Use initial-stack-allocations for more data structures (#110544 ) This replaces some of the most frequent offenders of using a DenseMap that cause a malloc, where the typical element-count is small enough to fit in an initial stack allocation. Most of these are fairly obvious, one to highlight is the collectOffset method of GEP instructions: if there's a GEP, of course it's going to have at least one offset, but every time we've called collectOffset we end up calling malloc as well for the DenseMap in the MapVector.	2024-09-30 23:15:18 +01:00
Han-Kuan Chen	061762933b	[SLP][REVEC] Fix cost model for getBuildVectorCost with FixedVectorType ScalarTy. (#110073 ) BoUpSLP::gather always use CreateInsertVector for FixedVectorType ScalarTy.	2024-09-30 21:51:12 +08:00
Alexey Bataev	f49344e19d	[SLP]Check if number of elements forms a full register Need to check if number of elements form a full register before trying per-register permutations to avoid compiler crash	2024-09-27 12:54:56 -07:00
Alexey Bataev	af6354634d	[SLP]Look for vector user when estimating the cost Need to find the first vector node user, not the very first user node at all. The very first user might be a gather, vectorized as clustered, which may cause compiler crash. Fixes https://github.com/llvm/llvm-project/issues/110193	2024-09-27 04:14:28 -07:00
Alexey Bataev	be6aed90c7	[SLP]Use number of scalars as a vector length for minbw cast Need to use the number of scalars, not the vector factor of the node. Otherwise incorrect casting can be estimated, leading to a compiler crash.	2024-09-26 13:06:19 -07:00
Alexey Bataev	100fd0cd5a	[SLP]Fix a crash when trying to identify one source order Need to check that order index is not out-of-boundaries when trying to detect that the reuse mask is one-source-mask with clusters to fix compiler crash	2024-09-26 04:47:48 -07:00
Jeremy Morse	056a3f4673	[NFC] Reapply 3f37c517f, SmallDenseMap speedups This time with 100% more building unit tests. Original commit message follows. [NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417) If we use SmallDenseMaps instead of DenseMaps at these locations, we get a substantial speedup because there's less spurious malloc traffic. Discovered by instrumenting DenseMap with some accounting code, then selecting sites where we'll get the most bang for our buck.	2024-09-26 10:49:29 +01:00
Alexey Bataev	1bfca99909	[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/107273	2024-09-25 14:38:17 -07:00
Nikita Popov	29b92d0774	Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands." This reverts commit 6b109a34ccedd3c75a067e322da0386c156c241d. This causes a crash when linking lencod in ReleaseThinLTO configuration	2024-09-25 22:05:10 +02:00
Philip Reames	556ec4a726	[SLP] Pass operand info to getCmpSelInstrInfo (#109998 ) Depending on the constant, selects with constant arms can have highly varying cost. This adjusts SLP to use the new API introduced in d2885743. Fixes https://github.com/llvm/llvm-project/issues/109466.	2024-09-25 08:17:55 -07:00
Alexey Bataev	6b109a34cc	[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands. Patch adds basic support for non-power-of-2 number of elements in operands. The patch still requires that this number addresses whole registers. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/107273	2024-09-25 10:43:27 -04:00
Philip Reames	d288574363	[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824 ) This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704, and extends the costing API for compares and selects to provide information about the operands passed in an analogous manner. This allows us to model the cost of materializing the vector constant, as some select-of-constants are significantly more expensive than others when you account for the cost of materializing the constants involved. This is a stepping stone towards fixing https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch will be required to utilize the new API.	2024-09-25 07:25:57 -07:00
Alexey Bataev	3469db82b5	[SLP]Add subvector vectorization for non-load nodes Previously SLP vectorize supported clustered vectorization for loads only. This patch adds support for "clustered" vectorization for other instructions. If the buildvector node contains "clusters", which can be vectorized separately and then inserted into the resulting buildvector result, it is better to do, since it may reduce the cost of the vector graph and produce better vector code. The patch does some analysis, if it is profitable to try to do this kind of extra vectorization. It checks the scalar instructions and its operands and tries to vectorize them only if they result in a better graph. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/108430	2024-09-25 10:23:41 -04:00
Jeremy Morse	817e742ba5	Revert "[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417 )" This reverts commit 3f37c517fbc40531571f8b9f951a8610b4789cd6. Lo and behold, I missed a unit test	2024-09-25 14:31:30 +01:00

1964 Commits