Added support for LShr instructions as a base for copyable elements.
Also added a simple analysis to select the best base instruction when
multiple candidates are available.
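A hypothetical sketch (names invented): the mul operands {%x0, %b} can
now use lshr as their base opcode, with %b modelled as an identity
shift:

  define i32 @lshr_base_sketch(i32 %a, i32 %b, i32 %c) {
    %x0 = lshr i32 %a, 2   ; operand lane 0: a real lshr, the base
    %y0 = mul i32 %x0, %c
    %y1 = mul i32 %b, %c   ; operand lane 1 is plain %b, now modellable
                           ; as "lshr i32 %b, 0" (an identity shift)
    %r  = add i32 %y0, %y1
    ret i32 %r
  }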
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/153393
After clearing the dependencies in the copyable data, we need to
recalculate the dependencies for the original ScheduleData, if it can be
marked as control dependent.
Fixes #153289
Adds initial support for copyable elements, both schedulable and
non-schedulable.
Adds support only for add for now; other opcodes will be added in the
future.
Some cases are still not handled, e.g. stores, because they currently do
not check for copyable elements.
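A small illustrative sketch (names invented): the xor operands below are
{%x0, %c}, and lane 1 is not an add; modelling it as "add i32 %c, 0"
lets both operand lanes share the add opcode:

  define void @copyable_sketch(ptr %p, i32 %a, i32 %b, i32 %c, i32 %d) {
    %x0 = add i32 %a, %b   ; operand lane 0: a real add
    %y0 = xor i32 %x0, %d
    %y1 = xor i32 %c, %d   ; operand lane 1 is plain %c, modelled as
                           ; "add i32 %c, 0" for vectorization
    store i32 %y0, ptr %p
    %q = getelementptr inbounds i32, ptr %p, i64 1
    store i32 %y1, ptr %q
    ret void
  }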
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/147366
Added an initial check for potential fmad conversion in reductions and
operands vectorization.
Added the check for the instruction to fix #152683.
Skipped the code for reductions to avoid regressions.
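A minimal sketch (flags and names are illustrative) of the kind of
fmul + fadd pair a backend may contract into a single fma, which is
what the check tries to detect:

  define float @fma_candidate(float %a, float %b, float %c) {
    ; An fadd fed by an fmul; with the contract flag a backend may fold
    ; the pair into one fma, so keeping it scalar can be cheaper than
    ; vectorizing the operands separately.
    %m = fmul contract float %a, %b
    %s = fadd contract float %m, %c
    ret float %s
  }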
When an instruction is checked for a match against the main instruction,
we need to check whether the opcode of the main instruction is
compatible with the operands of the instruction. If it is not, we need
to check the alternate instruction and its operands for compatibility
and return the alternate instruction as the match.
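For context, a minimal sketch of a main/alternate opcode bundle (the
usual add/sub alternation; names are illustrative), where lane 0 matches
the main opcode and lane 1 matches the alternate one:

  define <2 x i32> @alt_opcode_sketch(i32 %a0, i32 %b0, i32 %a1, i32 %b1) {
    %x0 = add i32 %a0, %b0   ; main opcode: add
    %x1 = sub i32 %a1, %b1   ; alternate opcode: sub
    %v0 = insertelement <2 x i32> poison, i32 %x0, i32 0
    %v1 = insertelement <2 x i32> %v0, i32 %x1, i32 1
    ret <2 x i32> %v1
  }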
Fixes #151699
Fixed the check for non-supported binary operations.
When an instruction is checked for a match against the main instruction,
we need to check whether the opcode of the main instruction is
compatible with the operands of the instruction. If it is not, we need
to check the alternate instruction and its operands for compatibility
and return the alternate instruction as the match.
Fixes #151699
Adds initial support for copyable elements. This patch only models adds
and models copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements which do not require scheduling is added, to
reduce the size of the patch.
Fixed compile-time regressions and reported crashes, updated release
notes.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/140279
This reverts commit c9cea24fe68e24750b2d479144f839e1c2ec9d2b.
This is being reverted as it is intermixed with another commit
(898bba311f180ed54de33dc09e7071c279a4942a) that needs to be reverted.
Adds initial support for copyable elements. This patch only models adds
and models copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements which do not require scheduling is added, to
reduce the size of the patch.
Fixed compile-time regressions, updated release notes.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/140279
Adds initial support for copyable elements. This patch only models adds
and models copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements which do not require scheduling is added, to
reduce the size of the patch.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/140279
Update LV to vectorize maxnum/minnum reductions without fast-math flags
by adding an extra check in the loop for whether any inputs to
maxnum/minnum are NaN, due to maxnum/minnum behavior w.r.t. signaling
NaNs. Signed zeros are already handled consistently by maxnum/minnum.
If any input is NaN:
 * exit the vector loop,
 * compute the reduction result up to the vector iteration that
   contained the NaN inputs, and
 * resume in the scalar loop.
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
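For illustration, a reduced form of the kind of loop this enables (no
fast-math flags on the maxnum; names are made up):

  declare float @llvm.maxnum.f32(float, float)

  define float @maxnum_red(ptr %a, i64 %n, float %start) {
  entry:
    br label %loop
  loop:
    %iv  = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
    %red = phi float [ %start, %entry ], [ %red.next, %loop ]
    %gep = getelementptr inbounds float, ptr %a, i64 %iv
    %x   = load float, ptr %gep, align 4
    %red.next = call float @llvm.maxnum.f32(float %red, float %x)
    %iv.next  = add nuw nsw i64 %iv, 1
    %ec  = icmp eq i64 %iv.next, %n
    br i1 %ec, label %exit, label %loop
  exit:
    ret float %red.next
  }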
PR: https://github.com/llvm/llvm-project/pull/148239
If all slices are small and end up with strided or even vectorization
states, it is better not to consider these candidates for vectorization
and instead to try to vectorize the whole bunch as gathered loads.
Reviewers: hiraditya, RKSimon, HanKuanChen
Reviewed By: RKSimon, HanKuanChen
Pull Request: https://github.com/llvm/llvm-project/pull/149209
The code assumed that the VF remains constant throughout the tree.
That's not always true. This meant that we could query the extraction
cost for a lane that is out of bounds.
While experimenting with re-vectorisation for AArch64, we ran into this
issue. We cannot add a proper AArch64 test as more changes would need to
be brought in.
This commit only fixes the computation of the VF and adds an assert.
Some tests were failing after adding the assert:
- foo() in llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll
- test() in
llvm/test/Transforms/SLPVectorizer/X86/reduction-with-removed-extracts.ll
- test_with_extract() in
llvm/test/Transforms/SLPVectorizer/RISCV/segmented-loads.ll
Added emission of a 2-element reduction instead of 2 extracts + a scalar
op when trying to vectorize the operands of an instruction, if it is
more profitable.
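Roughly speaking, instead of emitting (types are illustrative)

  %e0 = extractelement <2 x i32> %v, i32 0
  %e1 = extractelement <2 x i32> %v, i32 1
  %r  = add i32 %e0, %e1

the vectorizer can now emit a single 2-element reduction when the cost
model says it is cheaper:

  %r = call i32 @llvm.vector.reduce.add.v2i32(<2 x i32> %v)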
This reverts commit ac4a38e9bd573a173432b89cbef7cce7a48e7907.
This breaks the RVV builders
(MicroBenchmarks/ImageProcessing/Blur/blur.test and
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test from llvm-test-suite)
and reportedly SPEC Accel2023
<https://github.com/llvm/llvm-project/pull/147583#issuecomment-3057183138>.
Added emission of a 2-element reduction instead of 2 extracts + a scalar
op when trying to vectorize the operands of an instruction, if it is
more profitable.
Similar to FindLastIV, add FindFirstIVSMin to support select(icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing an
SMin reduction. It uses the signed maximum as the sentinel value.
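For illustration, a reduced loop of roughly this shape (names are
invented):

  define i32 @find_first_dec(ptr %a, i64 %n, i32 %key, i32 %idx.start) {
  entry:
    br label %loop
  loop:
    %iv   = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
    %idx  = phi i32 [ %idx.start, %entry ], [ %idx.next, %loop ]
    %res  = phi i32 [ -1, %entry ], [ %res.next, %loop ]
    %gep  = getelementptr inbounds i32, ptr %a, i64 %iv
    %x    = load i32, ptr %gep, align 4
    %c    = icmp eq i32 %x, %key
    %res.next = select i1 %c, i32 %idx, i32 %res
    %idx.next = add nsw i32 %idx, -1   ; decreasing induction
    %iv.next  = add nuw nsw i64 %iv, 1
    %ec   = icmp eq i64 %iv.next, %n
    br i1 %ec, label %exit, label %loop
  exit:
    ret i32 %res.next
  }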
PR: https://github.com/llvm/llvm-project/pull/140451
A shuffle takes two input vectors and a mask, producing a new vector of
size <MaskElts x SrcEltTy>. Historically it has been assumed that the
SrcTy and the DstTy are the same for getShuffleCost, with that being
relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elements and the
source element size, but the Mask is not always provided and the Tp is
not reliably always the SrcTy. This has led to situations, notably in
the SLP vectorizer but also in the generic cost routines, where
assumptions about how vectors will be legalized are built into the
generic cost routines - for example whether they will widen or promote,
with the cost modelling assuming they will widen while the default
lowering promotes integer vectors.
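For example, in IR the result type of a shuffle need not match its
source type, which is why the SrcTy alone is ambiguous:

  define <8 x i8> @concat(<4 x i8> %a, <4 x i8> %b) {
    ; The sources are <4 x i8>, but the 8-element mask makes the result
    ; <8 x i8>, so the shuffle's SrcTy and DstTy differ.
    %r = shufflevector <4 x i8> %a, <4 x i8> %b,
                       <8 x i32> <i32 0, i32 1, i32 2, i32 3,
                                  i32 4, i32 5, i32 6, i32 7>
    ret <8 x i8> %r
  }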
This patch attempts to start improving that - it originally tried to
alter more of the cost model, but that quickly became too many changes
at once, so this patch just plumbs a DstTy into getShuffleCost so that
DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise it tries to be fairly non-functional, keeping the
SrcTy as the primary type used in the shuffle cost routines and only
using the DstTy where it was used in the past (for InsertSubVector, for
example).
Some asserts have been added that help check for consistent values when
a Mask and a DstTy are provided to getShuffleCost. Some of them took a
while to get right, and some non-mask calls might still be incorrect.
Hopefully this will provide a useful base on which to build more
shuffles that alter size.
ArrayRef has a constructor that accepts std::nullopt. This constructor
dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
something of an abuse and not intuitive to newcomers, I would like to
move away from this constructor and eventually remove it.
This patch takes care of the llvm side of the migration.
The patch fixes cost estimation for extractelements from non-power-of-2
vectors, modelled as subvector extracts. In this case the subvector size
might not be adjusted to a whole register size, so we need to take the
minimum of the whole vector size and the actual difference to prevent a
compiler crash.
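For illustration (hypothetical, reduced sizes): extracting what would be
a register-wide subvector near the end of a non-power-of-2 vector runs
past its last element, so the subvector size has to be clamped to the
elements that actually remain:

  define <2 x i32> @tail_of_v6(<6 x i32> %v) {
    ; At offset 4 of a <6 x i32> only two elements remain, even if a
    ; whole register could hold four.
    %sub = shufflevector <6 x i32> %v, <6 x i32> poison,
                         <2 x i32> <i32 4, i32 5>
    ret <2 x i32> %sub
  }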
Fixes #143513
When implementing the vectorization, we potentially need to add shuffles
for external users. In such cases, we may be shuffling a smaller vector
into a larger vector. When this happens, `ResizeToVF` will just build a
poison-padded identity vector. Then, to build the final shuffle, we just
use the `SK_InsertSubvector` mask.
This is possibly clearer by looking at the included test in
SLPVectorizer/AMDGPU/external-shuffle.ll.
In the exit block we have a bunch of shuffles to glue the vectorized
tree to the `InsertElement` users. `TMP25` holds the result of resizing
the v2i16 vectorized sequence to match the `InsertElement` size v16i16.
Then `TMP26` is the final shuffle which replaces the `InsertElement`
sequence. This is just an insertsubvector.
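A reduced sketch of the same structure (smaller types than the test;
names invented):

  define <8 x i16> @resize_and_insert(<2 x i16> %small, <8 x i16> %wide) {
    ; ResizeToVF-style padding of the small vectorized result up to the
    ; wide width, using a poison-padded identity shuffle.
    %pad = shufflevector <2 x i16> %small, <2 x i16> poison,
                         <8 x i32> <i32 0, i32 1, i32 poison, i32 poison,
                                    i32 poison, i32 poison, i32 poison,
                                    i32 poison>
    ; The final shuffle: lanes 4 and 5 take elements 8 and 9, i.e. the
    ; first two elements of the padded vector - an insert-subvector mask.
    %res = shufflevector <8 x i16> %wide, <8 x i16> %pad,
                         <8 x i32> <i32 0, i32 1, i32 2, i32 3,
                                    i32 8, i32 9, i32 poison, i32 poison>
    ret <8 x i16> %res
  }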
However, when calculating the cost for these shuffles, we aren't
modelling this correctly. `ResizeToVF` will indicate to
`performExtractsShuffleAction` that we cannot use the original mask due
to the resize shuffle. The consequence is that the cost calculation uses
a different shuffle mask than what is ultimately used.
Going back to the included test, we can consider `TMP26` again. Clearly
we can see the shuffle uses the mask {0, 1, 2, 3, 16, 17, poison ..}.
However, we will currently calculate the cost with the mask {0, 1, 2, 3,
20, 21, ...}: 16 and 17 have been replaced with 20 and 21 (Index +
Vector Size). Queries like BasicTTIImpl::improveShuffleKindFromMask will
not recognize this as an `SK_InsertSubvector` mask, and targets which
have reduced costs for `SK_InsertSubvector` will not accurately
calculate the cost.
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions; in others I've tweaked comments to remain coherent, or
added a type to (what were) type-generic lambdas.
This isn't all the DbgInfoIntrinsic call sites, but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>