llvm-project

Author	SHA1	Message	Date
Florian Hahn	20fbbd7675	[LV] Add support for cmp reductions with decreasing IVs. (#140451) Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y) reductions where one of x or y is a decreasing induction, producing a SMin reduction. It uses signed max as sentinel value. PR: https://github.com/llvm/llvm-project/pull/140451	2025-06-29 11:17:03 +01:00
Ramkumar Ramachandra	bb8c42e859	[LV] Extend FindLastIV to unsigned case (#141752) Split the FindLastIV RecurKind into SMax and UMax variants, depending on the reduction op produced.	2025-06-23 15:27:49 +01:00
David Green	77941eba7f	[CostModel] Add a DstTy to getShuffleCost (#141634) A shuffle will take two input vectors and a mask, to produce a new vector of size <MaskElts x SrcEltTy>. Historically it has been assumed that the SrcTy and the DstTy are the same for getShuffleCost, with that being relaxed in recent years. If the Tp passed to getShuffleCost is the SrcTy, then the DstTy can be calculated from the Mask elts and the src elt size, but the Mask is not always provided and the Tp is not reliably always the SrcTy. This has led to situations notably in the SLP vectorizer but also in the generic cost routines where assumption about how vectors will be legalized are built into the generic cost routines - for example whether they will widen or promote, with the cost modelling assuming they will widen but the default lowering to promote for integer vectors. This patch attempts to start improving that - it originally tried to alter more of the cost model but that too quickly became too many changes at once, so this patch just plumbs in a DstTy to getShuffleCost so that DstTy and SrcTy can be reliably distinguished. The callers of getShuffleCost have been updated to try and include a DstTy that is more accurate. Otherwise it tries to be fairly non-functional, keeping the SrcTy used as the primary type used in shuffle cost routines, only using DstTy where it was in the past (for InsertSubVector for example). Some asserts have been added that help to check for consistent values when a Mask and a DstTy are provided to getShuffleCost. Some of them took a while to get right, and some non-mask calls might still be incorrect. Hopefully this will provide a useful base to build more shuffles that alter size.	2025-06-21 12:29:29 +01:00
Sander de Smalen	874773635d	[SLP] NFC: Simplify CandidateVFs initialization (#144882) Also adds a comment to clarify the meaning of MaxRegVF.	2025-06-20 10:00:55 +01:00
Kazu Hirata	64fe323647	[llvm] Migrate away from ArrayRef(std::nullopt) (NFC) (#144967) ArrayRef has a constructor that accepts std::nullopt. This constructor dates back to the days when we still had llvm::Optional. Since the use of std::nullopt outside the context of std::optional is kind of abuse and not intuitive to new comers, I would like to move away from the constructor and eventually remove it. This patch takes care of the llvm side of the migration.	2025-06-19 21:31:26 -07:00
Alexey Bataev	0108a5908c	[SLP]Fix a crash on an subvector size calculation for non-power-of-2 vector Patch fixes cost estimation for the extractelements from non-power-of-2 vectors, defined as subvector extracts. In this case the subvector size might be not adjusted to a whole register size, need to get the minimum between whole vector size and the actual difference to prevent compiler crash. Fixes #143513	2025-06-17 08:58:07 -07:00
Jeffrey Byrnes	c9a87a50ae	[SLPVectorizer] Use accurate cost for external users of resize shuffles (#137419) When implementing the vectorization, we potentially need to add shuffles for external users. In such cases, we may be shuffling a smaller vector into a larger vector. When this happens `ResizeToVF` will just build a poison padded identity vector. Then the to build the final shuffle, we just use the `SK_InsertSubvector` mask. This is possibly clearer by looking at the included test in SLPVectorizer/AMDGPU/external-shuffle.ll In the exit block we have a bunch of shuffles to glue the vectorized tree match the `InsertElement` users. `TMP25` holds the result of resizing the v2i16 vectorized sequence to match the `InsertElement` size v16i16. Then `TMP26` is the final shuffle which replaces the `InsertElement` sequence. This is just an insertsubvector. However, when calculating the cost for these shuffles, we aren't modelling this correctly. `ResizeToVF` will indicate to `performExtractsShuffleAction` that we cannot use the original mask due to the resize shuffle. The consequence is that the cost calculation uses a different shuffle mask than what is ultimately used. Going back to the included test, we can consider again `TMP26`. Clearly we can see the shuffle uses a mask {0, 1, 2, 3, 16, 17, poison ..}. However, we will currently calculate the cost with a mask {0, 1, 2, 3, 20, 21, ...} we have replaced 16 and 17 with 20 and 21 (Index + Vector Size). Queries like BasicTTImpl::improveShuffleKindFromMask will not recognize this as an `SK_InsertSubvector` mask, and targets which have reduced costs for `SK_InsertSubvector` will not accurately calculate the cost.	2025-06-17 08:14:05 -07:00
Jeremy Morse	9eb0020555	[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389) Seeing how we can't generate any debug intrinsics any more: delete a variety of codepaths where they're handled. For the most part these are plain deletions, in others I've tweaked comments to remain coherent, or added a type to (what was) type-generic-lambdas. This isn't all the DbgInfoIntrinsic call sites but it's most of the simple scenarios. Co-authored-by: Nikita Popov <github@npopov.com>	2025-06-17 15:55:14 +01:00
Han-Kuan Chen	414710c753	[SLP] Fix isCommutative to check uses of the original instruction instead of the converted instruction. (#143094)	2025-06-17 22:03:14 +08:00
Gaëtan Bossu	087d83e0c6	[SLP] vectorizeStores: Name things a bit more clearly (NFC) (#144511) I believe the new variable names better convey their purpose. However, I also believe that function is more complex than it needs to be, and this tiny patch should be seen as a first step towards (maybe) further refactoring. The previous names were very generic (Size, Sz, Cnt, StartIdx). This made it easy to get confused given that the vecotrizeStores() function is already complex enough. My hope would be to eventually have a function concise enough to clearly see what are the different strategies being attempted to vectorise a group of related store instructions.	2025-06-17 13:20:52 +01:00
Stephen Tozer	aa8a1fa6f5	[DLCov][NFC] Annotate intentionally-blank DebugLocs in existing code (#136192) Following the work in PR #107279, this patch applies the annotative DebugLocs, which indicate that a particular instruction is intentionally missing a location for a given reason, to existing sites in the compiler where their conditions apply. This is NFC in ordinary LLVM builds (each function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the instruction in coverage-tracking builds so that it will be ignored by Debugify, allowing only real errors to be reported. From a developer standpoint, it also communicates the intentionality and reason for a missing DebugLoc. Some notes for reviewers: - The difference between `I->dropLocation()` and `I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide to keep some debug info alive, while the latter will always be empty; in this patch, I always used the latter (even if the former could technically be correct), because the former could result in some (barely) different output, and I'd prefer to keep this patch purely NFC. - I've generally documented the uses of `DebugLoc::getUnknown()`, with the exception of the vectorizers - in summary, they are a huge cause of dropped source locations, and I don't have the time or the domain knowledge currently to solve that, so I've plastered it all over them as a form of "fixme".	2025-06-11 17:42:10 +01:00
Kazu Hirata	9ea3972cd1	[Vectorize] Strip away lambdas (NFC) (#143279) We don't need lambdas here.	2025-06-08 01:34:09 -07:00
Ramkumar Ramachandra	b40e4ceaa6	[ValueTracking] Make Depth last default arg (NFC) (#142384) Having a finite Depth (or recursion limit) for computeKnownBits is very limiting, but is currently a load-bearing necessity, as all KnownBits are recomputed on each call and there is no caching. As a prerequisite for an effort to remove the recursion limit altogether, either using a clever caching technique, or writing a easily-invalidable KnownBits analysis, make the Depth argument in APIs in ValueTracking uniformly the last argument with a default value. This would aid in removing the argument when the time comes, as many callers that currently pass 0 explicitly are now updated to omit the argument altogether.	2025-06-03 17:12:24 +01:00
Alexey Bataev	cb648ba970	[SLP]Check if the user node has instructions, used only outside Gather nodes with parents, which scalar instructions are used only outside, are generated before the whole tree vectorization. Need to teach isGatherShuffledSingleRegisterEntry to check that such nodes are emitted first and they cannot depend on other nodes, which are emitted later. Fixes #141628	2025-05-29 10:09:49 -07:00
Alexey Bataev	aa452b65fc	[SLP]Restore insertion points after gathers vectorization Restore insertion points after gathers vectorization to avoid a crash in a root node vectorization. Fixes #141265	2025-05-24 07:25:20 -07:00
Ramkumar Ramachandra	0240129218	[IVDesc] Unify RecurKinds [I|F]AnyOf (#118393) Co-authored-by: Mel Chen <mel.chen@sifive.com>	2025-05-23 11:57:30 +01:00
Ramkumar Ramachandra	b81170ecff	[IVDesc] Unify RecurKinds [I|F]FindLastIV (NFC) (#141082)	2025-05-22 22:48:01 +01:00
Alexey Bataev	2318491432	[SLP][NFC]Do the analysis first and then actual codegen, NFC	2025-05-20 08:12:53 -07:00
Alexey Bataev	a0058d1851	[SLP][NFC]Make TreeEntry a class and store "need-to-schedule" state TreeEntry should be a class, not a struct, since it has private members. Also, do no repeat Does-Not-Need-To-Schedule analysis during codegen, codegen may affect the result of the analysis in future patches. Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/140734	2025-05-20 10:33:59 -04:00
Alexey Bataev	3918ef3688	[SLP]Fix the analysis for masked compress loads Need to remove the check for Orders in interleaved loads analysis and estimate shuffle cost without the reordering to correctly handle the costs of masked compress loads. Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: HanKuanChen, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/140647	2025-05-20 07:31:16 -04:00
Alexey Bataev	30ebcf6280	[SLP][NFC]Store operand entries in the map Instead of looking through all the vectorizable tree to find the operand entry, better to store it in a separate map and perform quick lookup, basing on user tree entry and operand index. It allows to remove lots of duplicated code, simplify processing and fix potential future issues with the analysis, affected by the codegen. Also, improves compile time. Reviewers: HanKuanChen, RKSimon, hiraditya Reviewed By: hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/140549	2025-05-19 19:53:47 -04:00
Alexey Bataev	bb8e2a8937	[SLP]Relax assertion to avoid compiler crash Need to relax the assertion to fix a compiler crash in case if the reordered compress loads are more profitable than the ordered ones. Fixes #140334	2025-05-18 14:26:36 -07:00
Alexey Bataev	fb86b3d96b	[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers Need to set the insertion point for (non-schedulable) vector node after the last instruction in the node to avoid def-use breakage. But it also causes miscompilation with gather/buildvector operands of the phi nodes, used in the same phi only in the block. These nodes supposed to be inserted at the end of the block and after changing the insertion point for the non-schedulable vec block, it also may break def-use dependencies. Need to prevector such nodes, to emit them as early as possible, so the vectorized nodes are inserted before these nodes. Fixes #139728 Recommit after revert 60fb92179291e848eb7b04913bdc818d081db296 Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/139917	2025-05-18 12:59:36 -07:00
Alexey Bataev	60fb921792	Revert "[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers" This reverts commit d79d9b8fbfc7e8411aeaf2f5e1be9d4247594fee to fix a bug reported in https://github.com/llvm/llvm-project/pull/139917#issuecomment-2888216404	2025-05-17 11:06:37 -07:00
Alexey Bataev	d79d9b8fbf	[SLP]Change the insertion point for outside-block-used nodes and prevec phi operand gathers Need to set the insertion point for (non-schedulable) vector node after the last instruction in the node to avoid def-use breakage. But it also causes miscompilation with gather/buildvector operands of the phi nodes, used in the same phi only in the block. These nodes supposed to be inserted at the end of the block and after changing the insertion point for the non-schedulable vec block, it also may break def-use dependencies. Need to prevector such nodes, to emit them as early as possible, so the vectorized nodes are inserted before these nodes. Fixes #139728 Reviewers: hiraditya, HanKuanChen, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/139917	2025-05-16 12:52:27 -04:00
Ramkumar Ramachandra	c807395011	[LAA/SLP] Don't truncate APInt in getPointersDiff (#139941) Change getPointersDiff to return an std::optional<int64_t>, and fill this value with using APInt::trySExtValue. This simple change requires changes to other functions in LAA, and major changes in SLPVectorizer changing types from 32-bit to 64-bit. Fixes #139202.	2025-05-15 10:08:05 +01:00
Kazu Hirata	690a30f3fd	[llvm] Construct SmallVector with ArrayRef (NFC) (#139992)	2025-05-14 22:30:38 -07:00
Alexey Bataev	a05cf2927a	[SLP][NFC]Use WeakTrackVH instead of Instruction in EntryToLastInstruction Use WEakTrackVH to prevent instability in the vectorizer. Fixes #139729	2025-05-14 11:19:54 -07:00
Alexey Bataev	e1ea86e849	[SLP]Do not try to use interleaved loads, if reordering is required If the interleaved loads require reordering, better to avoid generate load + shuffle sequence, which in this case cannot be recognized as interleaved load. Also, it fixes the issue with the incorrect codegen. Fixes #138923	2025-05-12 14:12:51 -07:00
Han-Kuan Chen	53df6400af	[SLP] Fix incorrect operand order in interchangeable instruction. (#139225)	2025-05-12 20:03:45 +08:00
Alexey Bataev	c870b675db	[SLP][NFC]Extract values state/operands analysis into separate class Extract values state and operands analysis/building into a separate class. This class allows to localize instrutions state and operands building for future support of copyable elements vectorization. Recommit after revert 10f512074fb13ab5da9f49c25965508f51c8452a Recommit after revert 6a2a8ebe27c1941f5b952313239fc6d155f58e9d Reviewers: HanKuanChen, RKSimon Reviewed By: HanKuanChen Pull Request: https://github.com/llvm/llvm-project/pull/138724	2025-05-11 08:14:05 -07:00
Alex Bradbury	6a2a8ebe27	Revert "[SLP][NFC]Extract values state/operands analysis into separate class" This reverts commit 512a5d0b8aa82749995204f4852e93757192288a. It broke RISC-V vector code generation on some inputs (oggenc.c from llvm-test-suite), as found by our CI. Reduced test case and more information posted in #138274.	2025-05-10 16:02:47 +01:00
Alexey Bataev	512a5d0b8a	[SLP][NFC]Extract values state/operands analysis into separate class Extract values state and operands analysis/building into a separate class. This class allows to localize instrutions state and operands building for future support of copyable elements vectorization. Recommit after revert 10f512074fb13ab5da9f49c25965508f51c8452a Reviewers: HanKuanChen, RKSimon Reviewed By: HanKuanChen Pull Request: https://github.com/llvm/llvm-project/pull/138724	2025-05-09 07:37:37 -07:00
Alexey Bataev	10f512074f	Revert "[SLP][NFC]Extract values state/operands analysis into separate class" This reverts commit 3954e9d6235d4e90c3f786594e877ab83fab3bf1to fix a buildbot https://lab.llvm.org/buildbot/#/builders/46/builds/16518.	2025-05-09 06:52:55 -07:00
Alexey Bataev	3954e9d623	[SLP][NFC]Extract values state/operands analysis into separate class Extract values state and operands analysis/building into a separate class. This class allows to localize instrutions state and operands building for future support of copyable elements vectorization. Reviewers: HanKuanChen, RKSimon Reviewed By: HanKuanChen Pull Request: https://github.com/llvm/llvm-project/pull/138724	2025-05-09 09:38:49 -04:00
Gaëtan Bossu	19174126cf	[SLP] Simplify buildTree() legality checks (NFC) (#138833) This NFC aims to simplify the interfaces used in `buildTree()` to make it easier to understand where decisions for legality are made. In particular, there is now a single point of definition for legality decisions. This makes it clear where all those decisions are made. Previously, multiple variables with a large scope were passed by reference.	2025-05-08 08:34:53 +01:00
Alexey Bataev	3aecbbcbf6	[SLP]Do not match nodes if schedulability of parent nodes is different If one user node is non-schedulable and another one is schedulable, such nodes should be considered matched. The selection of the actual insert point in this case differs and the insert points may match, which may cause a compiler crash because of the broken def-use chain. Fixes #137797	2025-05-06 07:52:49 -07:00
Kazu Hirata	6ab7cb7899	[Transforms] Remove unused local variables (NFC) (#138442)	2025-05-04 00:35:22 -07:00
Craig Topper	123758b1f4	[IRBuilder] Add versions of createInsertVector/createExtractVector that take a uint64_t index. (#138324) Most callers want a constant index. Instead of making every caller create a ConstantInt, we can do it in IRBuilder. This is similar to createInsertElement/createExtractElement.	2025-05-02 16:10:18 -07:00
Kazu Hirata	4ec473e0e1	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#138236)	2025-05-02 08:53:53 -07:00
Alexey Bataev	9400270449	[SLP]Fix comparator for vector operands of extractelements in PHICompare Need to make comparator to follow strict-weak ordering to fix compiler crashes. Fixes #138178	2025-05-01 14:28:20 -07:00
Jonas Paulsson	f5c8c1eedb	[SLPVectorizer] Move X86 specific handling into X86TTIImpl. (#137830) `ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars" ` changed SLPVectorizer getScalarizationOverhead() to call TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some cases. This was due to X86 specific handlings in these (overridden) methods, and unfortunately the general preference of TTI.getScalarizationOverhead() was dropped. If VL is available it should always be preferred to use getScalarizationOverhead(), and this is indeed the case for SystemZ which has a special insertion instruction that can insert two GPR64s. Then ` 33af951 "[SLP]Synchronize cost of gather/buildvector nodes with codegen"` reworked SLPVectorizer getGatherCost() which together with ad9909d caused the SystemZ test vec-elt-insertion.ll to fail. This patch restores the SystemZ test and reverts the change in SLPVectorizer getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always called again. The ForPoisonSrc argument is now passed on to the TTI method so that X86 can handle this as required. Fixes: #135346	2025-04-30 17:11:27 +02:00
Gaëtan Bossu	c5c4f0d11c	[SLP] Simplify tryToFindDuplicates() (NFC) (#135766) This NFC aims to simplify the control-flow and interfaces used in tryToFindDuplicates(). The point is to make it easier to understand where decisions for scalar de-duplication are made. In particular: - Limit indentation - Rename some variables to better match their use case - Always give consistent outputs for VL and ReuseShuffleIndices. This makes it possible to use the same code for building gather TreeEntry everywhere. This also allows to remove the TryToFindDuplicates lambda.	2025-04-29 14:47:22 +01:00
Florian Hahn	d68b446933	[IR] Add matchers for remaining FP min/max intrinsics (NFC). (#137612) Add dedicated matchers for minimum,maximum,minimumnum and maximumnum intrinsics, similar for the existing matchers for maxnum and minnum. As suggested in https://github.com/llvm/llvm-project/pull/137335. PR: https://github.com/llvm/llvm-project/pull/137612	2025-04-29 12:20:00 +01:00
Alexey Bataev	73d90ec825	[SLP][NFC]Consider non-profitable trees with only phis, gathers, splits and small nodes with reuses Improves compile time for non-profitable cases. Fixes #135965	2025-04-28 03:56:08 -07:00
Florian Hahn	ec1016f7ef	[IVDescriptors] Support reductions with minimumnum/maximumnum. (#137335) Add a new reduction recurrence kind for reductions with minimumnum/maximumnum. Such reductions can be vectorized without nsz/nnans, same as reductions with maximum/minimum intrinsics. Note that a new reduction kind is needed to make sure partial reductions are also combined with minimumnum/maximumnum. Note that the final reduction to a scalar value is performed with vector.reduce.fmin/fmax. This should be fine, as the results of the partial reductions with maximumnum/minimumnum silences any sNaNs. In-loop and reductions in SLP are not supported yet, as there's no reduction version of maximumnum/minimumnum yet and fmax may be incorrect. PR: https://github.com/llvm/llvm-project/pull/137335	2025-04-28 11:16:36 +01:00
Kazu Hirata	5cfd81b0cc	[llvm] Use range constructors of *Set (NFC) (#137552)	2025-04-27 15:59:57 -07:00
Matt Arsenault	4ea2278e39	SLPVectorizer: Use use_empty instead of hasNUses(0) (#137336)	2025-04-25 17:27:01 +02:00
Alexey Bataev	a7a74b349d	[SLP]Improve reordering of the alternate nodes Better to preserve the original order of the alternate nodes to avoid inter-lane shuffling, select/insert subvector patterns provide better perf. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/136329	2025-04-24 14:33:10 -04:00
Alexey Bataev	f427890a1d	[SLP]Fix PHI comparator to make it follow weak strict ordering restriction Fixes #137164	2025-04-24 11:08:17 -07:00

2265 Commits