llvm-project

Author	SHA1	Message	Date
Alexey Bataev	a6e7c38ea6	[SLP]Do not vectorize select nodes with scalar and vector conditions If the select nodes contains selects with mixed scalar/vector conditions, such nodes should not be revectorized. Fixes #170836	2026-03-01 07:01:04 -08:00
Alexey Bataev	e317f42455	[SLP]Recalculate dependencies for the buildvector schedule node, if they have copyable node Need to recalculate the deps for all buildvector nodes with copyable deps to prevent a compiler crash during scheduling of instructions	2026-02-28 12:29:47 -08:00
Alexey Bataev	12e1075b64	[SLP]Fix operand reordering when estimating profitability of operands Need to swap operand for a single instruction, not for the same lane of the first and second instruction in the list	2026-02-27 16:16:22 -08:00
Akash Dutta	cf28f23f10	[SLP] Reject duplicate shift amounts in matchesShlZExt reorder path (#183627 ) In the reordered RHS path of matchesShlZExt, the code never checked that each shift amount (0, Stride, 2×Stride, …) appears at most once. When the same shift appeared in multiple lanes, it still filled Order, producing a non-permutation (e.g. Order = [0,0,0,1]). That led to bad shuffle masks and miscompilation (e.g. shuffles with poison). The patch adds an explicit duplicate check: before setting Order[Idx] = Pos, it ensures Pos has not been seen before, using a SmallBitVector SeenPositions(VF). If a position is seen twice, the function returns false and the optimization is not applied.	2026-02-27 13:00:58 -06:00
Alexey Bataev	c08079d8e7	[SLP]Add single-use check for the bitcasted reduction If the reduced value, to be bitcasted, is used multiple times, it will require emission of the extractelement instruction. Such nodes should not be bitcasted, should be vectorized as vector instructions. Fixes https://github.com/llvm/llvm-project/pull/181940#issuecomment-3950734168	2026-02-24 05:27:38 -08:00
Alexey Bataev	95a960daa0	[SLP]Do not convert inversed cmp nodes, if they reordered/reused If the cmp node with inversed compares must be reordered/shuffled with the reuses, disable transformation for such nodes for now, they require some special processing. Fixes https://github.com/llvm/llvm-project/pull/181580#issuecomment-3933026221	2026-02-20 06:04:51 -08:00
Alexey Bataev	29d4fea59b	[SLP]Handle mixed select-to-bitcasts and general reductions If the reduction tree represents mixed select-to-bitcasts and general reductions, need to handle them correctly to avoid a compiler crash Fixes https://github.com/llvm/llvm-project/pull/181940#issuecomment-3929220929	2026-02-19 13:38:34 -08:00
Alexey Bataev	38d804725f	[SLP]Do not mark for transforming to buildvector inversed compares Inversed compares must remain vector nodes, they should be converted to gathers to generate correct code. Fixes issue reported in https://github.com/llvm/llvm-project/pull/181580#issuecomment-3926951332	2026-02-19 09:37:50 -08:00
Alexey Bataev	c6425aa9ae	[SLP]Support reduced or selects of bitmask as cmp bitcast Converts reduced or(select %cmp, bitmask, 0) to zext(bitcast %vector_cmp to i<num_reduced_values>) to in Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/181940	2026-02-18 18:01:42 -05:00
Alexey Bataev	a7c25ba33c	[SLP][NFC]Fix reorered -> reordered	2026-02-18 13:54:39 -08:00
Ryan Buchner	a6e8de7407	[SLP][NFC] Fix `MainOp/AltOp` assertion to check the correct value (#182093 ) Previously both assertions were checking MainOp. Initial assertion added incorrectly in d41e517748e2d.	2026-02-18 13:08:10 -08:00
Alexey Bataev	a5aaa9dc63	[SLP]Convert compares from zexts, promoted to selects, to inversed op, if improves codegen Some of the zext i1 (cmp) + select sequences can be transformed by inverting compare predicates to remove extra shuffles, like zext 1 (cmp ne) + select (cmp eq), 0, 2 can be modeled as select <2 x > (cmp ne), <1, 2>, zeroinitializer Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/181580	2026-02-17 13:26:06 -05:00
Alexey Bataev	26f944bb50	[SLP]Fix an ArrayRef out-of-bounds access in slice If the revec is enabled, may have the number of parts (registers) for the combined node, not a single element node, so need to check for potential out-of-bounds access Fixes #181798	2026-02-17 10:00:13 -08:00
Alexey Bataev	ef52df4365	[SLP]Do not increase depth for type-changing nodes and NotProfitableForVectorization removal The patch changes the maximum tree size analysis. 1. Do not increase depth for type-changing nodes (like casts and compares), allowing deeper trees to be built. 2. Removes NotProfitableForVectorization workaround, not needed anymore after throttling enabled Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/180950	2026-02-17 09:52:23 -05:00
Alexey Bataev	2ff4ec172a	[SLP]Fix revec in split nodes Initially split nodes do not support vector entries in revec mode, patch fixes the issue by adding analysis for the scale factor Fixes #181546	2026-02-16 14:09:28 -08:00
Alexey Bataev	7ec7907b80	[SLP] Fix a very long loads offset, being stored in DenseMap Added a check for a very long offset to avoid a crash in the compiler Fixes #181682	2026-02-16 11:07:22 -08:00
Alexey Bataev	255b493673	[SLP]Do not overflow number of the reduced values Need to trunc the total number of the reduced values, in case if the number is too big Fixes #181520	2026-02-15 11:02:32 -08:00
Ryan Buchner	f2903793de	[SLP][NFC] Use static_assert to confirm SupportedOps is sorted (#181397 ) Can be checked at compile time.	2026-02-13 11:30:59 -08:00
Alexey Bataev	e93829e807	[SLP]Fix crash with deleted non-copyable node in scheduling copyables If the copyables are parts of the deleted nodes, need to check the actual tree to correctly handle the scheduling of copyables	2026-02-12 11:42:02 -08:00
Ryan Buchner	95ef1a5c31	[SLP] Use the correct identity when combining binary opcodes with AND/MUL (#180457 ) Fixes #180456 Fix bug in the following SLP lowering: ``` define void @sub_mul(ptr %p, ptr %s) { entry: %p1 = getelementptr i16, ptr %p, i64 1 %l0 = load i16, ptr %p %l1 = load i16, ptr %p1 %mul0 = sub i16 %l0, 0 %mul1 = mul i16 %l1, 5 %s1 = getelementptr i16, ptr %s, i64 1 store i16 %mul0, ptr %s store i16 %mul1, ptr %s1 ret void } ``` to ``` define void @sub_mul(ptr %p, ptr %s) { entry: %tmp0 = load <2 x i16>, ptr %p, align 2 %tmp1 = mul <2 x i16> %tmp0, <i16 0, i16 5> -> updates to <i16 1, i16 5> store <2 x i16> %tmp1, ptr %s, align 2 ret void } ```	2026-02-12 09:34:44 -08:00
Alexey Bataev	fc648683cd	[SLP]Add external uses estimations into tree throttling Added basic estimations for the external uses, when calculating the cost of the non-profitable trees. Excluding stores/insertelement, as they are very good candidates for the vectorization. Also, tuned buildvector/gather cost with minimum bitwidth analysis data. Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/178024	2026-02-11 16:14:34 -05:00
Alexey Bataev	601364f1ba	[SLP]Correctly process deleted gathered loads and short trees If the gathered loads nodes are marked for deletion, need to actually delete them from the tree. Also, if the remaining tree is too short (buildvector + gather node), need to skip such trees to avoid hanging. Fixes #180846	2026-02-11 10:27:01 -08:00
Alexey Bataev	54cdd903b8	[SLP]Skip operands comparing on non-matching (but compatible) instructions If the instructions are compatible but non-matching (zext-select pair as example), no need to perform operands analysis, just return that they are matching.	2026-02-11 04:55:29 -08:00
David Sherwood	6f0b8a7ebc	[SLP] Use the correct calling convention for vector math routines (#180759 ) When vectorising calls to math intrinsics such as llvm.pow we correctly detect and generate calls to the corresponding vector math variant. However, we don't pick up and use the calling convention for the vector math function. This matters for veclibs such as ArmPL where the aarch64_vector_pcs calling convention can improve codegen by reducing the number of registers that need saving across calls.	2026-02-11 10:52:58 +00:00
Alexey Bataev	78490acb32	[SLP]Support for zext i1 %x modeling as select %x, 1, 0 Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are other select instructions, which can be combined into a bundle. Fixes #178403 Recommit after revert in 993e1f66afcfe9da03bd813e669eada341b11d2f Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/180635	2026-02-10 12:54:12 -08:00
Alexey Bataev	993e1f66af	Revert "[SLP]Support for zext i1 %x modeling as select %x, 1, 0" This reverts commit 70aebae2a13114f4e3d5e2460c052d8f3de295be to fix buildbots https://lab.llvm.org/buildbot/#/builders/85/builds/18614	2026-02-10 06:49:53 -08:00
Alexey Bataev	70aebae2a1	[SLP]Support for zext i1 %x modeling as select %x, 1, 0 Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are other select instructions, which can be combined into a bundle. Fixes #178403 Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/180635	2026-02-10 08:59:44 -05:00
Alexey Bataev	fe754dff6d	[SLP]Remove LoadCombine workaround after handling of the copyables LoadCombine pattern handling was added as a workaround for the cases, where the SLP vectorizer could not vectorize the code effectively. With the copyables support, it can handle it directly. Also, patch adds support for scalar loads[ + bswap] pattern for byte sized loads (+ reverse bytes for bswap) Recommit after revert in 6377c86d718232fe60c548dfd7ab439f7ff84df7 Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/174205	2026-02-05 11:16:08 -08:00
Alexey Bataev	6377c86d71	Revert "[SLP]Remove LoadCombine workaround after handling of the copyables" This reverts commit 8dbb9f66e8b14a8a06f1873a2c1b7dce366ed2d6 to fix buildbot issues https://lab.llvm.org/buildbot/#/builders/224/builds/2795	2026-02-05 09:57:00 -08:00
Alexey Bataev	8dbb9f66e8	[SLP]Remove LoadCombine workaround after handling of the copyables LoadCombine pattern handling was added as a workaround for the cases, where the SLP vectorizer could not vectorize the code effectively. With the copyables support, it can handle it directly. Also, patch adds support for scalar loads[ + bswap] pattern for byte sized loads (+ reverse bytes for bswap) Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/174205	2026-02-05 10:42:08 -05:00
Florian Hahn	05a2b146fb	[LV] Optimize FindLast recurrences to FindIV (NFCI). (#177870 ) This patch restructures Find(First|Last)IV handling. Instead of differentiating between FindLast, FindFirstIV and FindLastIV up front, this patch simplifies the logic in IVDescriptor to just identify the FindLast pattern up-front. It then adds a new VPlan transformation to optimize FindLast reductions to FindIV reductions if there is a suitable sentinel value. It also folds the Find(Last|First)IV recurrence kinds into a single FindIV kind. This is simpler and more accurate, given selecting the first/last induction of the final IV reduction is directly controlled by the corresponding recurrence kind of the ComputeReductionResult. The new structure also allows further optimizations, like vectorizing FindLastIV with another boolean reduction that tracks if the condition in the loop was ever true, if there is no suitable sentinel value. PR: https://github.com/llvm/llvm-project/pull/177870	2026-02-05 13:57:20 +00:00
Alexey Bataev	46a38488a4	[SLP]Disable modeling disjoint reduction or as bitcast for big endian Big endian targets cannot be modeled as bitcast, need to support it as a reversion/bswap instead, just disabling it for now.	2026-02-03 06:16:25 -08:00
Ryan Buchner	e5b99502d7	[SLP] Avoid adding duplicate VFs into vectorizeStores()::CandidateVFs (#179296 ) Small compile time improvement: ``` stage1-O3: (-0.01%) stage1-ReleaseThinLTO (-0.00%) stage1-ReleaseLTO-g (-0.01%) stage1-O0-g (-0.00%) stage1-aarch64-O3 (+0.01%) stage1-aarch64-O0-g (-0.02%) stage2-O3 (-0.00%) stage2-O0-g (-0.03%) stage2-clang (+0.00%) ``` Also changes/removes a few comments for clarity.	2026-02-02 11:03:27 -08:00
Ryan Buchner	b936771eea	[SLP][NFC] Refactor vectorizeStores::RangeSizes (#177241 ) Currently `RangeSizes` is used to allow us to skip trying to vectorize clearly unprofitable trees by caching prior attempts `TreeSizes`. This PR refactors that logic to simplify and improve readability. This will make it easier to handle the strided stores. Switches RangeSizes to use `first` as the location to lookup values from, and `second` as the location to store values to. `first` gets updated by `second` at the appropriate times to match the behavior prior to this change.	2026-01-30 10:25:15 -08:00
Alexey Bataev	b73122d5b7	[SLP]Cast incoming value to a proper type for int nodes, bitcasted to fp Before casting the value to FP type, need to check if the type was reduced during minbitwidth analysis and need to restore the original source type to generate correct bitcast operation. Fixes #178884	2026-01-30 08:51:03 -08:00
Alexey Bataev	2ea77ed013	[SLP]Support for bswap pattern for bytes-based disjoint or reductions If the reduction forms reversed bitcast, we can represent it as a bitcast + bswap, if the source elements are byte sized Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/178513	2026-01-29 11:28:01 -05:00
Alexey Bataev	f6fe6fc86a	[SLP]Do not vectorize subtrees of the split node, marked as gathers. If the split node was marked as gather/buildvector nodes, the vectorizer should not vectorize its subtrees, which are marked as deleted.	2026-01-28 17:44:38 -08:00
Alexey Bataev	5413a22e79	[SLP] Reordered disjoint or reduction of shl(zext, (0, stride, 2* stride)) modelled as bitcast Added support for reorder reduction of shl(zext)-like construct. Such constructs are modelled currently as shuffle + bitcast. Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/178292	2026-01-28 11:15:54 -05:00
Alexey Bataev	68450ba210	[SLP]Support for tree throttling in SLP graphs with gathered loads Gathered loads forming DAG instead of trees in SLP vectorizer. When doing the throttling analysis for such graphs, need to consider partially matched gathered loads DAG nodes and consider extract and/or gather operations and their costs. The patch adds this analysis and allows cutting off the expensive sub-graphs with gathered loads. Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/177855 Recommit after revert in d733771113339608aff6002d1fa89aaf4a51c502, which was related to a crash in SelectionDAG	2026-01-28 07:19:09 -08:00
Soumik15630m	51845a53fd	[SLP] Fix crash on extractelement with out-of-bounds index.....Fixes … (#176918 ) …The cost modeling logic was attempting to set a bit in APInt for an out-of-bounds index, causing an assertion failure. This patch ignores OOB indices as they produce poison, which is already handled. Fixes #176780	2026-01-28 06:08:03 -05:00
Nico Weber	d733771113	Revert "[SLP]Support for tree throttling in SLP graphs with gathered loads" This reverts commit 0666a777ec8138f58ebc7fc41a2fb8097328308a. Makes clang assert, see repro at https://github.com/llvm/llvm-project/pull/177855#issuecomment-3808529832	2026-01-27 21:01:56 -05:00
serge-sans-paille	84cccfc828	[perf] Replace copy-assign by move-assign in llvm/lib/Transforms/* (#178178 )	2026-01-27 16:29:35 +00:00
Alexey Bataev	5786ca7bd0	[SLP]Model disjoint or reduction of shl(zext, (0, stride, 2* stride)) as bitcast Patch models the cost and lowering of disjoint or reduction of shl(zext, (0, stride, 2* stride)) as bitcast via modeling as combined ops. Reviewers: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/177041	2026-01-26 15:59:58 -05:00
Alexey Bataev	0666a777ec	[SLP]Support for tree throttling in SLP graphs with gathered loads Gathered loads forming DAG instead of trees in SLP vectorizer. When doing the throttling analysis for such graphs, need to consider partially matched gathered loads DAG nodes and consider extract and/or gather operations and their costs. The patch adds this analysis and allows cutting off the expensive sub-graphs with gathered loads. Reviewers: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/177855	2026-01-25 12:22:47 -05:00
Rahul Joshi	358db292cc	[NFC] Fix typo `instrinsic` -> `intrinsic` (#177627 )	2026-01-23 13:48:30 -08:00
Alexey Bataev	d64d3735ab	[SLP]Correctly handle vector nodes, coming from same incoming blocks in PHI nodes If multiple nodes are generated from same PHI node for the same block, still need to vectorize vector nodes, even if the value for the incoming block was already emitted. Fixes #177124	2026-01-21 12:59:21 -08:00
Ryan Buchner	9c2124e0f6	[NFC][SLP] Fix typo in assertion (#177079 )	2026-01-21 09:39:12 -08:00
Alexey Bataev	3dc5259bc8	[SLP]Do not build bundle for copyables, with parents used in PHI node If the copyables have parents, used in PHI nodes, this causes complex schedulable/non-schedulable dependencies, which require complex processing, but with small profitability. Cut such case early for now to prevent compiler crashes and compile time blow up. Fixes #176658	2026-01-18 13:37:51 -08:00
David Stone	74379c2d44	[llvm][clang] Remove `llvm::OwningArrayRef` (#169126 ) `OwningArrayRef` has several problems. The naming is strange: `ArrayRef` is specifically a non-owning view, so the name means "owning non-owning view". It has a const-correctness bug that is inherent to the interface. `OwningArrayRef<T>` publicly derives from `MutableArrayRef<T>`. This means that the following code compiles: ```c++ void const_incorrect(llvm::OwningArrayRef<int> const a) { a[0] = 5; } ``` It's surprising for a non-reference type to allow modification of its elements even when it's declared `const`. However, the problems from this inheritance (which ultimately stem from the same issue as the weird name) are even worse. The following function compiles without warning but corrupts memory when called: ```c++ void memory_corruption(llvm::OwningArrayRef<int> a) { a.consume_front(); } ``` This happens because `MutableArrayRef::consume_front` modifies the internal data pointer to advance the referenced array forward. That's not an issue for `MutableArrayRef` because it's just a view. It is an issue for `OwningArrayRef` because that pointer is passed as the argument to `delete[]`, so when it's modified by advancing it forward it ceases to be valid to `delete[]`. From there, undefined behavior occurs. It is less convenient than `llvm::SmallVector` for construction. By combining the `size` and the `capacity` together without going through `std::allocator` to get memory, it's not possible to fill in data with the correct value to begin with. Instead, the user must construct an `OwningArrayRef` of the appropriate size, then fill in the data. This has one of two consequences: 1. If `T` is a class type, we have to first default construct all of the elements when we construct `OwningArrayRef` and then in a second pass we can assign to those elements to give what we want. This wastes time and for some classes is not possible. 2. If `T` is a built-in type, the data starts out uninitialized. 
This easily forgotten step means we access uninitialized memory. Using `llvm::SmallVector`, by contrast, has well-known constructors that can fill in the data that we actually want on construction. `OwningArrayRef` has slightly different performance characteristics than `llvm::SmallVector`, but the difference is minimal. The first difference is a theoretical negative for `OwningArrayRef`: by implementing in terms of `new[]` and `delete[]`, the implementation has less room to optimize these calls. However, I say this is theoretical because for clang, at least, the extra freedom of optimization given to `std::allocator` is not yet taken advantage of (see https://github.com/llvm/llvm-project/issues/68365) The second difference is slightly in favor of `OwningArrayRef`: `sizeof(llvm::SmallVector<T>) == sizeof(void *) * 3` on pretty much any implementation, whereas `sizeof(OwningArrayRef) == sizeof(void *) * 2` which seems like a win. However, this is just a misdirection of the accounting costs: array-new sticks bookkeeping information in the allocated storage. There are some cases where this is beneficial to reduce stack usage, but that minor benefit doesn't seem worth the costs. If we actually need that optimization, we'd be better served by writing a `DynamicArray` type that implements a full vector-like feature set (except for operations that change the size of the container) while allocating through `std::allocator` to avoid the pitfalls outlined earlier.	2026-01-17 21:06:25 -07:00
Gabriel Baraldi	72a20b8e29	[SLPVectorizer] Check std::optional coming out of getPointersDiff (#175784 ) Fixes https://github.com/llvm/llvm-project/issues/175768 There are other unchecked uses std::optional in this pass but I couldn't figure out a test that triggers them	2026-01-15 09:07:13 -06:00
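
The duplicate-position check described in commit cf28f23f10 (matchesShlZExt reorder path) can be illustrated with a small standalone sketch. This is not the LLVM code: `buildOrder` is a hypothetical helper, `std::vector<bool>` stands in for `SmallBitVector`, and the rule that a lane's position is its shift amount divided by the stride is an assumption made for illustration. Only the names `Order`, `Pos`, `Idx`, `VF` and `SeenPositions` are taken from the commit message.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not the LLVM implementation): maps each lane's shift
// amount to a position and refuses to build Order unless it is a permutation.
std::optional<std::vector<unsigned>>
buildOrder(const std::vector<uint64_t> &Shifts, uint64_t Stride) {
  const unsigned VF = Shifts.size();
  if (Stride == 0)
    return std::nullopt;

  std::vector<unsigned> Order(VF, 0);
  // Assumption: std::vector<bool> plays the role of SmallBitVector here.
  std::vector<bool> SeenPositions(VF, false);

  for (unsigned Idx = 0; Idx < VF; ++Idx) {
    // Each shift must be a multiple of the stride (0, Stride, 2*Stride, ...).
    if (Shifts[Idx] % Stride != 0)
      return std::nullopt;
    const uint64_t Pos = Shifts[Idx] / Stride;
    // Reject out-of-range positions and positions that were already used;
    // without this, Order could end up as a non-permutation like [0,0,0,1].
    if (Pos >= VF || SeenPositions[Pos])
      return std::nullopt;
    SeenPositions[Pos] = true;
    Order[Idx] = static_cast<unsigned>(Pos);
  }
  return Order;
}
```

With a stride of 8, the shifts {0, 0, 0, 8} are rejected because position 0 repeats, while {8, 0, 24, 16} is accepted because every position appears exactly once.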


2474 Commits