llvm-project

Author	SHA1	Message	Date
Alexey Bataev	dd5ba694bd	[SLP]Recalculate deps for potential control-dependent schedule data After clearing the dependencies in copyable data, need to recalculate dependencies for the original ScheduleData, if it can be marked as control dependent. Fixes #153289	2025-08-13 08:18:26 -07:00
Sam Tebbs	0bfa1718af	[LV] Create in-loop sub reductions (#147026 ) This PR allows the loop vectorizer to handle in-loop sub reductions by forming a normal in-loop add reduction with a negated input. Stacked PRs: 1. -> https://github.com/llvm/llvm-project/pull/147026 2. https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-08-12 10:22:41 +01:00
Alexey Bataev	2d7b55a028	[SLP]Initial support for copyable elements Adds initial support for copyable elements, both schedulable and non-schedulable. Adds support only for add for now, other opcodes will be added in the future. Still some cases are not handled, e.g. stores do not include this, because currently do not check for copyable elements. Reviewers: hiraditya, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/147366	2025-08-11 09:41:19 -04:00
Alexey Bataev	67af2f6c5c	[SLP]Initial FMAD support (#149102 ) Added initial check for potential fmad conversion in reductions and operands vectorization. Added the check for instruction to fix #152683 Skipped the code for reduction to avoid regressions.	2025-08-11 05:53:55 -07:00
David Green	cfe190979e	Revert "[SLP]Initial FMAD support (#149102 )" This reverts commit 0fffb9f9ed81f4c2084b8fe040c88b60bb6c372a due to major performance regressions.	2025-08-10 15:16:01 +01:00
Alexey Bataev	0fffb9f9ed	[SLP]Initial FMAD support (#149102 ) Added initial check for potential fmad conversion in reductions and operands vectorization. Added the check for instruction to fix #152683	2025-08-08 10:30:23 -07:00
Alexey Bataev	0419b459be	Revert "[SLP]Initial FMAD support (#149102 )" This reverts commit 0bcf45ea3458ba79eb4257afcfd6af954292c9ce to fix the regressions, reported in https://github.com/llvm/llvm-project/issues/152683	2025-08-08 09:17:59 -07:00
Alexey Bataev	adae370805	[SLP][NFC]Cleanup undefs and the whole test, NFC	2025-08-07 13:41:22 -07:00
Alexey Bataev	0bcf45ea34	[SLP]Initial FMAD support (#149102 ) Added initial check for potential fmad conversion in reductions and operands vectorization.	2025-08-07 09:51:43 -04:00
Ramkumar Ramachandra	edeee824f0	Reland [VectorUtils] Trivially vectorize ldexp, [l]lround (#152476 ) Changes: The original patch, landed as 1336675, was reverted due to a bug in LoopVectorize resulting in a crash. The bug has now been fixed by 95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid costs), and this reland is identical to the original patch.	2025-08-07 12:07:29 +01:00
Mikhail Gudim	3404c0b013	Slp basic test (#152355 ) Add a basic test for SLPVectorizer to make sure that upcoming refactoring patches don't break anything. Also, record a test for a missed opportunity.	2025-08-06 14:54:50 -04:00
Alexey Bataev	e27831ff9b	[SLP] Fix a check for main/alternate interchanged instruction If the instruction is checked for matching the main instruction, need to check if the opcode of the main instruction is compatible with the operands of the instruction. If they are not, need to check the alternate instruction and its operands for compatibility and return alternate instruction as a match. Fixes #151699 Fixed check for non-supported binary operations.	2025-08-04 11:20:54 -07:00
Michael Halkenhäuser	70af09e3a1	Revert "[SLP] Fix a check for main/alternate interchanged instruction" (#151997 ) This reverts commit 3ee8d047109ea4bb479095f4b153c2120a8d726c. Revert reason: FAILED build for openmp-offload-amdgpu-runtime-2 https://lab.llvm.org/buildbot/#/builders/10/builds/10827	2025-08-04 12:57:20 -04:00
Alexey Bataev	3ee8d04710	[SLP] Fix a check for main/alternate interchanged instruction If the instruction is checked for matching the main instruction, need to check if the opcode of the main instruction is compatible with the operands of the instruction. If they are not, need to check the alternate instruction and its operands for compatibility and return alternate instruction as a match. Fixes #151699	2025-08-04 08:31:35 -07:00
Alexey Bataev	7cd1ce3aa0	[SLP]Check vector-like instruction for dominance in copyables Need to check if the vector-like instruction is dominated by main operation in the copyables to prevent broken def-use chain Fixes #151456	2025-08-04 06:14:19 -07:00
David Green	b30d5315b7	[AArch64] Add better fcmp costs for expanded predicates (#147940 ) Certain fcmp predicates need to be expanded into multiple operations and or'd together. This adds some more accurate cost modelling for them based on the predicate. Unsupported operations are given the cost of a libcall and the latency is set to 2 as that seemed to be fairly common between different CPUs.	2025-08-04 13:42:57 +01:00
Muhammad Omair Javaid	176d54aa33	Revert "[VectorUtils] Trivially vectorize ldexp, [l]lround (#145545 )" This reverts commit 13366759c3b9db9366659d870cc73c938422b020. This broke various LLVM testsuite buildbots for AArch64 SVE, but the problem got masked because relevant buildbots were already failing due to other breakage. It has broken llvm-test-suite test: gfortran-regression-compile-regression__vect__pr106253_f.test https://lab.llvm.org/buildbot/#/builders/4/builds/8164 https://lab.llvm.org/buildbot/#/builders/17/builds/9858 https://lab.llvm.org/buildbot/#/builders/41/builds/8067 https://lab.llvm.org/buildbot/#/builders/143/builds/9607	2025-08-01 01:24:52 +05:00
Ramkumar Ramachandra	13366759c3	[VectorUtils] Trivially vectorize ldexp, [l]lround (#145545 )	2025-07-29 19:23:09 +01:00
Simon Pilgrim	0fa0ce1f3a	[CostModel][X86] Update SK_Broadcast based on cost kinds (#150620 ) When these were converted to CostKindTblEntry the throughput was mainly copied to all cost kinds Regenerated with my check_cost_tables.py helper script	2025-07-26 13:52:47 +01:00
Alexey Bataev	ef98e248c7	[SLP]Initial support for copyable elements (non-schedulable only) Adds initial support for copyable elements. This patch only models adds and models copyable elements as add <element>, 0, i.e. uses identity constants for missing lanes. Only support for elements, which do not require scheduling, is added to reduce size of the patch. Fixed compile time regressions, reported crashes, updated release notes Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/140279	2025-07-25 10:55:07 -07:00
Martin Storsjö	936ee35dcc	Revert "[SLP]Initial support for copyable elements (non-schedulable only)" This reverts commit 898bba311f180ed54de33dc09e7071c279a4942a. This change caused hangs and crashes, see https://github.com/llvm/llvm-project/pull/140279#issuecomment-3115051063.	2025-07-25 01:22:20 +03:00
Martin Storsjö	bd170b78bb	Revert "[SLP] Check if the user node has state before trying getting main instruction/opcode" This reverts commit c9cea24fe68e24750b2d479144f839e1c2ec9d2b. This is being reverted as it is intermixed with another commit (898bba311f180ed54de33dc09e7071c279a4942a) that needs to be reverted.	2025-07-25 01:22:19 +03:00
Alexey Bataev	c9cea24fe6	[SLP] Check if the user node has state before trying getting main instruction/opcode Need to check if the parent node has state to prevent compiler crashes. Fixes #150479	2025-07-24 12:00:43 -07:00
Alexey Bataev	898bba311f	[SLP]Initial support for copyable elements (non-schedulable only) Adds initial support for copyable elements. This patch only models adds and models copyable elements as add <element>, 0, i.e. uses identity constants for missing lanes. Only support for elements, which do not require scheduling, is added to reduce size of the patch. Fixed compile time regressions, updated release notes Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/140279	2025-07-23 13:38:34 -07:00
Alexey Bataev	a415d68e48	Revert "[SLP]Initial support for copyable elements (non-schedulable only)" This reverts commit e202dba288edd47f1b370cc43aa8cd36a924e7c1 to try to resolve compile time issues, reported in https://llvm-compile-time-tracker.com/compare.php?from=36089e5d983fe9ae00f497c2d500f37227f82db1&to=e202dba288edd47f1b370cc43aa8cd36a924e7c1&stat=instructions%3Au&details=on	2025-07-22 07:39:32 -07:00
Alexey Bataev	e202dba288	[SLP]Initial support for copyable elements (non-schedulable only) Adds initial support for copyable elements. This patch only models adds and models copyable elements as add <element>, 0, i.e. uses identity constants for missing lanes. Only support for elements, which do not require scheduling, is added to reduce size of the patch. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/140279	2025-07-21 14:07:28 -04:00
Alexey Bataev	ff225b5d88	[SLP][NFC]Add a run line for the test, NFC	2025-07-18 10:14:18 -07:00
Nikita Popov	369f749dc4	[SLP] Remove lifetime.start on null pointer in test (NFC)	2025-07-18 12:47:49 +02:00
Alexey Bataev	60ae9c9c63	[SLP]Do not consider non-profitable loads slices If all slices are small and end up with strided or even vectorization states, better to not consider these candidates for the vectorization and try to vectorize the whole bunch as gathered loads. Reviewers: hiraditya, RKSimon, HanKuanChen Reviewed By: RKSimon, HanKuanChen Pull Request: https://github.com/llvm/llvm-project/pull/149209	2025-07-17 08:00:02 -04:00
Florian Hahn	02d3738be9	[AArch64,TTI] Remove RealUse check for vector insert/extract costs. (#146526 ) getVectorInstrCostHelper would return costs of zero for vector inserts/extracts that move data between GPR and vector registers, if there was no 'real' use, i.e. there was no corresponding existing instruction. This meant that passes like LoopVectorize and SLPVectorizer, which likely are the main users of the interface, would underestimate the cost of insert/extracts that move data between GPR and vector registers, which has non-trivial costs. The patch removes the special case and only returns costs of zero for lane 0 if there is no need to transfer between integer and vector registers. This impacts a number of SLP tests, and most of them look like general improvements. I think the change should make things more accurate for any AArch64 target, but if not it could also just be Apple CPU specific. I am seeing +2% end-to-end improvements on SLP-heavy workloads. PR: https://github.com/llvm/llvm-project/pull/146526	2025-07-15 15:19:27 +01:00
Gaëtan Bossu	adb6efeac9	[SLP] Fix cost estimation of external uses with wrong VF (#148185 ) It assumed that the VF remains constant throughout the tree. That's not always true. This meant that we could query the extraction cost for a lane that is out of bounds. While experimenting with re-vectorisation for AArch64, we ran into this issue. We cannot add a proper AArch64 test as more changes would need to be brought in. This commit is only fixing the computation of VF and adding an assert. Some tests were failing after adding the assert: - foo() in llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll - test() in llvm/test/Transforms/SLPVectorizer/X86/reduction-with-removed-extracts.ll - test_with_extract() in llvm/test/Transforms/SLPVectorizer/RISCV/segmented-loads.ll	2025-07-15 11:39:09 +01:00
Florian Hahn	eb4de577da	[SLP,AArch64] Update build-vector test to actually build vectors. Update test with all zero constant input values which get folded during IR construction to actually use different input values, which require materializing build vectors.	2025-07-14 13:47:44 +01:00
Alexey Bataev	a999a1b88c	[SLP]Remove emission of vector_insert/vector_extract intrinsics Replaced by the regular shuffles. Fixes #145512 Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/148007	2025-07-11 15:26:45 -04:00
Alexey Bataev	dd60663b9b	[SLP] Emit reduction instead of 2 extracts + scalar op, when vectorizing operands (#147583 ) Added emission of the 2-element reduction instead of 2 extracts + scalar op, when trying to vectorize operands of the instruction, if it is more profitable.	2025-07-10 12:50:52 -07:00
Alex Bradbury	18627e995c	Revert "[SLP] Emit reduction instead of 2 extracts + scalar op, when vectorizing operands (#147583 )" This reverts commit ac4a38e9bd573a173432b89cbef7cce7a48e7907. This breaks the RVV builders (MicroBenchmarks/ImageProcessing/Blur/blur.test and MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test from llvm-test-suite) and reportedly SPEC Accel2023 <https://github.com/llvm/llvm-project/pull/147583#issuecomment-3057183138>.	2025-07-10 14:55:22 +01:00
Simon Pilgrim	59a99c6f2c	[SLP] Drop unnecessary '' from around -passes=... arg lists to appease update_test_checks.py when run on DOS. NFC.	2025-07-10 12:18:41 +01:00
Alexey Bataev	ac4a38e9bd	[SLP] Emit reduction instead of 2 extracts + scalar op, when vectorizing operands (#147583 ) Added emission of the 2-element reduction instead of 2 extracts + scalar op, when trying to vectorize operands of the instruction, if it is more profitable.	2025-07-09 19:52:09 -04:00
Gaëtan Bossu	50facad7fc	[SLP][REVEC] Fix insertelement legality checks (#146921 ) The current code assumes that all the values in VL are valid instructions, while it is possible to get poison.	2025-07-09 10:28:50 +01:00
David Spickett	651c994feb	[llvm][test] Fix REQUIRES in extractelement-insertpoint.ll The target is called "x86" not "x86_64".	2025-07-03 13:14:42 +00:00
Hanyang (Eric) Xu	6e1e89ee38	[SLP] Avoid -passes=instcombine stages in SLP tests (#146257 ) Fixes #145511 Note that there are still two instances of --passes=slp-vectorizer,instcombine left unchanged because it seems that the tests are meant to run in conjunction with instcombine and removing instcombine would invalidate their original objective: [llvm/test/Transforms/SLPVectorizer/arith-div-undef.ll](https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/SLPVectorizer/arith-div-undef.ll) [llvm/test/Transforms/SLPVectorizer/slp-hr-with-reuse.ll](https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/SLPVectorizer/slp-hr-with-reuse.ll)	2025-07-02 06:14:41 -04:00
Luke Lau	d0c1ea928c	[InstCombine] Pull unary shuffles through fneg/fabs (#144933 ) This canonicalizes fneg/fabs (shuffle X, poison, mask) -> shuffle (fneg/fabs X), poison, mask This undoes part of b331a7ebc1e02f9939d1a4a1509e7eb6cdda3d38 and a8f13dbdeb31be37ee15b5febb7cc2137bbece67, but keeps the binary shuffle case i.e. shuffle fneg, fneg, mask. By pulling out the shuffle we bring it in line with the same canonicalisation we perform on binary ops and intrinsics, which the original commit acknowledges goes in the opposite direction. However nowadays VectorCombine is more powerful and can do more optimisations when the shuffle is pulled out, so I think we should revisit this. In particular we get more shuffles folded and can perform scalarization.	2025-06-30 10:40:12 +01:00
Gheorghe-Teodor Bercea	3df36a2b18	[AMDGPU] Enable vectorization of i8 values. (#134934 ) This patch adjusts the cost model to account for the ability of the AMDGPU optimizer to group together i8 values into i32 values. Co-authored-by: Erich Keane <ekeane@nvidia.com>	2025-06-26 19:15:31 -04:00
Simon Pilgrim	1a60c74c13	[CostModel][X86] SK_InsertSubvector inserted into the lowest subvector should be treated as SK_Select blend (#145892 ) X86 uses implicit widening and BLEND/MOV shuffles in these cases - otherwise we still treat it as a SK_PermuteTwoSrc	2025-06-26 16:00:51 +01:00
Simon Pilgrim	8202c94cec	[CostModel][X86] getMaskedMemoryOpCost - widening masks must compute the cost of the full width insert_subvector across multiple legal vectors (#145693 ) The memory value and mask value types might legalise differently - e.g. a v64i32 might split into 4 x v16i32 / 8 x v8i32 but the mask might legalize as 1 x v64i8 / 2 x v32i8 etc. If the legalised value type has been split, then we must ensure we compute the cost for the entire mask value type and let getShuffleCost handle any legalisation, not assume that only a single trailing split mask will require widening.	2025-06-25 16:30:35 +01:00
Simon Pilgrim	bf4afb08fe	[CostModel] improveShuffleKindFromMask - recognise a SK_PermuteSingleSrc incorrectly tagged as SK_PermuteTwoSrc (#145352 ) If a SK_PermuteTwoSrc shuffle kind's mask only references the first operand, then treat this as SK_PermuteSingleSrc Part of #145335	2025-06-23 20:20:47 +01:00
Matt Arsenault	54015f36c6	AMDGPU: Cost model for minimumnum/maximumnum (#141946 )	2025-06-18 08:19:06 +09:00
Matt Arsenault	c9b2816388	AMDGPU: Fix cost model for 16-bit operations on gfx8 (#141943 ) We should only divide the number of pieces to fit the packed instructions if we actually have pk instructions. This increases the cost of copysign, but is closer to the current codegen output. It could be much cheaper than it is now.	2025-06-18 08:07:03 +09:00
Alexey Bataev	0108a5908c	[SLP]Fix a crash on a subvector size calculation for non-power-of-2 vector Patch fixes cost estimation for the extractelements from non-power-of-2 vectors, defined as subvector extracts. In this case the subvector size might be not adjusted to a whole register size, need to get the minimum between whole vector size and the actual difference to prevent compiler crash. Fixes #143513	2025-06-17 08:58:07 -07:00
Jeffrey Byrnes	c9a87a50ae	[SLPVectorizer] Use accurate cost for external users of resize shuffles (#137419 ) When implementing the vectorization, we potentially need to add shuffles for external users. In such cases, we may be shuffling a smaller vector into a larger vector. When this happens `ResizeToVF` will just build a poison padded identity vector. Then, to build the final shuffle, we just use the `SK_InsertSubvector` mask. This is possibly clearer by looking at the included test in SLPVectorizer/AMDGPU/external-shuffle.ll. In the exit block we have a bunch of shuffles to glue the vectorized tree to match the `InsertElement` users. `TMP25` holds the result of resizing the v2i16 vectorized sequence to match the `InsertElement` size v16i16. Then `TMP26` is the final shuffle which replaces the `InsertElement` sequence. This is just an insertsubvector. However, when calculating the cost for these shuffles, we aren't modelling this correctly. `ResizeToVF` will indicate to `performExtractsShuffleAction` that we cannot use the original mask due to the resize shuffle. The consequence is that the cost calculation uses a different shuffle mask than what is ultimately used. Going back to the included test, we can consider again `TMP26`. Clearly we can see the shuffle uses a mask {0, 1, 2, 3, 16, 17, poison ..}. However, we will currently calculate the cost with a mask {0, 1, 2, 3, 20, 21, ...}; we have replaced 16 and 17 with 20 and 21 (Index + Vector Size). Queries like BasicTTImpl::improveShuffleKindFromMask will not recognize this as an `SK_InsertSubvector` mask, and targets which have reduced costs for `SK_InsertSubvector` will not accurately calculate the cost.	2025-06-17 08:14:05 -07:00
Han-Kuan Chen	414710c753	[SLP] Fix isCommutative to check uses of the original instruction instead of the converted instruction. (#143094 )	2025-06-17 22:03:14 +08:00
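
Several of the cost-model entries above (bf4afb08fe, c9a87a50ae) turn on how a shuffle mask is classified. The rule stated in bf4afb08fe, that a SK_PermuteTwoSrc mask which only references the first operand can be treated as SK_PermuteSingleSrc, can be sketched roughly as below. This is an illustrative Python model, not LLVM's implementation: the function names, the use of strings for shuffle kinds, and `None` as a stand-in for a poison lane are all assumptions made for the example. It relies only on the usual two-source convention that indices 0..N-1 select from the first operand and N..2N-1 from the second.

```python
from typing import Optional, Sequence

# Hypothetical model: None stands in for a poison/undefined mask lane.
Mask = Sequence[Optional[int]]


def references_only_first_source(mask: Mask, num_src_elts: int) -> bool:
    """Return True if every defined lane indexes into the first operand.

    In a two-source shuffle, indices 0..num_src_elts-1 select from the
    first operand and num_src_elts..2*num_src_elts-1 from the second.
    """
    return all(idx is None or 0 <= idx < num_src_elts for idx in mask)


def refine_shuffle_kind(kind: str, mask: Mask, num_src_elts: int) -> str:
    """Downgrade a two-source permute to a single-source permute when
    the mask never touches the second operand (illustrative only)."""
    if kind == "SK_PermuteTwoSrc" and references_only_first_source(
        mask, num_src_elts
    ):
        return "SK_PermuteSingleSrc"
    return kind
```

For instance, with four-element sources the mask `[3, 2, 1, 0]` would be refined to the single-source kind, while `[0, 5, 2, 7]` stays two-source because lanes 1 and 3 read from the second operand.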

2295 Commits