llvm-project

Author	SHA1	Message	Date
David Green	fd40c60665	[VectorCombine] Fix transitive Uses in foldShuffleToIdentity (#188989 ) The Use tracking in foldShuffleToIdentity is intended to detect how an operand is used, distinguishing between splats, identities and concats of the same value. When looking through multiple unsimplified shuffles, however, the same Use could be both a splat and an identity. This patch changes the Use to a Value and an original Use, so that even when looking through multiple vectors we correctly recognise whether each use is a splat, an identity or a concat. Fixes #180338	2026-04-01 14:53:04 +01:00
wanglei	3e015b89e8	[NFC][test] Precommit test for pr188989 (#188667 ) Precommit test for #188989. This test case covers a scenario in the VectorCombine foldShuffleToIdentity function where different shuffle sequences sharing the same initial Use * caused an incorrect fold. The issue may depend on cost model differences, and this test case currently reproduces it only on LoongArch.	2026-03-30 10:39:42 +08:00
Nathiyaa Sengodan	9d20e75c2d	[VectorCombine] Fix crash in foldShuffleOfSelects for single-element shuffle result (#185713 ) In foldShuffleOfSelects, if the shuffle result has a single element, the resulting type may be scalar rather than a vector. The later code in foldShuffleOfSelects assumes the result is a vector and performs cast<FixedVectorType>, which triggers an assertion. Fixes #183625	2026-03-13 13:24:18 +00:00
V.Rybalov	19a31b27b0	[VectorCombine] Fixing bitcast processing in VectorCombine (#185075 ) Fixes bitcast instruction processing in the VectorCombine pass when it operates on arbitrary-precision integer types.	2026-03-09 12:47:56 -07:00
Valeriy Savchenko	49b77e3b45	[VectorCombine] Fold sign-bit check for multiple vectors (#182911 ) ## Alive2 proofs \| Reduction \| Shift \| Cmp \| Sources \| Proof \| \|-----------\|-------\|----------\|---------\|-------\| \| add \| lshr \| == 0 \| 2 \| [proof](https://alive2.llvm.org/ce/z/f44vco) \| \| add \| lshr \| == 8 \| 2 \| [proof](https://alive2.llvm.org/ce/z/Ks_nea) \| \| add \| ashr \| == 0 \| 2 \| [proof](https://alive2.llvm.org/ce/z/ZsXJ5k) \| \| add \| ashr \| == -8 \| 2 \| [proof](https://alive2.llvm.org/ce/z/HZfans) \| \| add \| lshr \| == 0 \| 3 \| [proof](https://alive2.llvm.org/ce/z/x-dEdz) \| \| add \| lshr \| == 12 \| 3 \| [proof](https://alive2.llvm.org/ce/z/sfNvhr) \| These proofs are not exhaustive, but they give some evidence that the fold works for addition. Apart from the use of multiple vectors, the proofs from the previous changes generally apply here as well, because we effectively match on reductions of size M x N.	2026-03-01 14:07:44 +00:00
Simon Pilgrim	92704064e5	[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575 ) Unless we're working with AVX512 mask predicate types, sign extending a vXi1 comparison result back to the width of the comparison source types is free. VectorCombine::foldShuffleOfCastops - pass the original CastInst in the getCastInstrCost calls to track the source comparison instruction. Fixes #165813	2026-02-27 07:55:39 +00:00
Simon Pilgrim	1ad3a030a8	[VectorCombine][X86] shuffle-of-casts.ll - add #165813 test coverage (#183569 )	2026-02-26 17:18:54 +00:00
David Green	f5c3a8e99d	[AArch64] Rename shuffle-of-intrinscis.ll to shuffle-of-intrinsics.ll. NFC	2026-02-26 17:09:16 +00:00
Simon Pilgrim	a17349f787	[VectorCombine][X86] shuffle-of-casts.ll - add x86-64-v2 and x86-64-v4 test coverage (#183562 ) Prep work for #165813	2026-02-26 16:43:09 +00:00
Valeriy Savchenko	966a4618b8	[VectorCombine] Support ashr sign-bit extraction (#181998 ) This change extends a sign-bit reduction fold introduced earlier. Previously, only LSHR instructions were supported for sign-bit extraction. Similar logic applies to ASHR, so the fold can be generalized. ## Alive2 proofs \| Reduction \| == 0 \| == -1 / -N \| slt 0 \| sgt -1 / -N \| \|-----------\|------\|------------\|-------\|-------------\| \| or \| [proof](https://alive2.llvm.org/ce/z/DaSMPt) \| [proof](https://alive2.llvm.org/ce/z/wzR48R) \| [proof](https://alive2.llvm.org/ce/z/rfyr_7) \| [proof](https://alive2.llvm.org/ce/z/MTFFe5) \| \| and \| [proof](https://alive2.llvm.org/ce/z/PmmpbX) \| [proof](https://alive2.llvm.org/ce/z/7_9hSn) \| [proof](https://alive2.llvm.org/ce/z/wudWY3) \| [proof](https://alive2.llvm.org/ce/z/QZ33KB) \| \| umax \| [proof](https://alive2.llvm.org/ce/z/gQGnDc) \| [proof](https://alive2.llvm.org/ce/z/dMsoQF) \| [proof](https://alive2.llvm.org/ce/z/QwFbae) \| [proof](https://alive2.llvm.org/ce/z/3dbmy6) \| \| umin \| [proof](https://alive2.llvm.org/ce/z/Z2pZUQ) \| [proof](https://alive2.llvm.org/ce/z/6FQgGR) \| [proof](https://alive2.llvm.org/ce/z/95-em6) \| [proof](https://alive2.llvm.org/ce/z/PW7c-m) \| \| add \| [proof](https://alive2.llvm.org/ce/z/FVhuhj) \| [proof](https://alive2.llvm.org/ce/z/h1B9jQ) \| [proof](https://alive2.llvm.org/ce/z/DmiYRr) \| [proof](https://alive2.llvm.org/ce/z/P4WDN5) \|	2026-02-23 13:01:39 +00:00
Simon Pilgrim	d8e679c286	[CostModel][X86] getShuffleCost - SK_Transpose v4f64/v4i64 matches UNPCK - don't generalise to SK_PermuteTwoSrc (#180514 ) Other SK_Transpose shuffles can be cheaper than SK_PermuteTwoSrc, but this is the easy one to handle. Fixes #161980	2026-02-09 14:09:49 +00:00
Simon Pilgrim	4bb16b130d	[VectorCombine][X86] Add test coverage for #161980 (#180508 )	2026-02-09 12:11:37 +00:00
Florian Hahn	689c99557f	[VectorCombine] Skip dead shufflevector in GetIndexRangeInShuffles to fix crash. (#179217 ) Update GetIndexRangeInShuffles to skip unused shuffles. This matches the behavior in the loop below and without it, we end up with an index mis-match, causing a crash for the added test case. PR: https://github.com/llvm/llvm-project/pull/179217	2026-02-06 12:05:47 +00:00
Julian Pokrovsky	3f4d94fd4c	[VectorCombine] foldShuffleOfBinops - support multiple uses of shuffled binops (#179429 ) Resolves #173035	2026-02-06 11:10:20 +00:00
Valeriy Savchenko	92c26bb1a5	[VectorCombine] Fix crash in foldEquivalentReductionCmp on i1 vector (#179917 )	2026-02-05 12:43:48 +00:00
Hans Wennborg	2ee37cc4cf	Revert "[VectorCombine] Trim low end of loads used in shufflevector rebroadcasts. (#149093 )" It appears to create loads from underaligned addresses. See comment on the PR. > Following on from #128938, trim the low end of loads where only some of > the incoming lanes are used for rebroadcasts in shufflevector > instructions. This reverts commit 6c8d9d0c4da51c7f9e7671902be3ad9b65d56c84 and the follow-up commits 07a6a23f6c5387fc1e7df174b5921f6004db64e0 and 313a2008538abb61ab13f8cc9f9a712f7faff3a5.	2026-02-02 18:50:02 +01:00
Deric C.	313a200853	[VectorCombine] Fix the PtrAdd offset in shrinkLoadForShuffles to account for element type size (#179001 ) This PR fixes an [issue I pointed out in regards to incorrect GEP indices](https://github.com/llvm/llvm-project/pull/149093#discussion_r2748266079) introduced by PR #149093. Changes: - Updated the pointer offset calculation in `VectorCombine::shrinkLoadForShuffles` so that the offset is now multiplied by the element size (`ElemSize`) when computing the new pointer for loads - Updated the GEP indices in `llvm/test/Transforms/VectorCombine/load-shufflevector.ll` for the correct byte offsets	2026-01-31 02:25:29 +00:00
NAKAMURA Takumi	9926b045b8	VectorCombine: Mark the test `+asserts` (fixup for #178072 )	2026-01-31 07:44:54 +09:00
puneeth_aditya_5656	07a6a23f6c	[VectorCombine] Fix crash with poison mask elements in shrinkLoadForShuffles (#178920 ) ## Summary Fixes an assertion failure when `shrinkLoadForShuffles` processes shuffle masks containing poison elements. The bug was introduced in #149093: when adjusting mask indices for load trimming, poison indices (-1) were modified to invalid values (e.g., -2), causing `isSingleSourceMaskImpl` to assert. The fix preserves poison indices without modification. Fixes #178917 ## Test plan - Added regression test `@shuffle_with_poison_mask`	2026-01-30 19:22:39 +00:00
Leon Clark	6c8d9d0c4d	[VectorCombine] Trim low end of loads used in shufflevector rebroadcasts. (#149093 ) Following on from #128938, trim the low end of loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2026-01-30 15:20:27 +00:00
calebwat	0694daa0ad	[VectorCombine] Fix typo in foldPermuteOfBinops cost calculation (#178072 ) Addresses an issue in #173153. That patch expanded the supported ops for folding binary ops through shuffles, but appears to have introduced a typo that could incorrectly increase the unmodified cost.	2026-01-30 14:03:49 +00:00
Mitch Briles	4ec35a0b0e	[VectorCombine] Fix crash when folding select of bitcast (#177183 ) Fixes #177144. Nits appreciated. The fold in question does the following transformation: Before ``` %bc = bitcast <4 x i32> %src to <16 x i8> %e0 = extractelement <16 x i8> %bc, i32 0 %s0 = select i1 %cond, i8 %e0, i8 0 %e1 = extractelement <16 x i8> %bc, i32 1 %s1 = select i1 %cond, i8 %e1, i8 0 ... ``` After ``` %sel = select i1 %cond, <4 x i32> %src, <4 x i32> zeroinitializer %bc = bitcast <4 x i32> %sel to <16 x i8> %e0 = extractelement <16 x i8> %bc, i32 0 %e1 = extractelement <16 x i8> %bc, i32 1 ... ``` If every select shares the condition and has 0 in the false branch, the bitcast can be replaced with a select between the original vector and `zeroinitializer`, followed by a bitcast. Then each `select(cond, extelt(...), 0)` can be replaced with `extelt(...)`. The crash happens when the condition is defined after the original bitcast, because the bitcast is replaced with the select + bitcast, and now the select references a condition not yet defined.	2026-01-28 15:13:50 +00:00
Valeriy Savchenko	28fc4f1d96	[VectorCombine] Call cost calculation with correct intrinsic IDs (#177996 ) #175194, #177159, and #173069 introduced code calling `TTI.getMinMaxReductionCost` with an unexpected `Intrinsic::ID`, causing RISC-V to fail with an `llvm_unreachable` panic. Functionally, this is a small fix that also ports tests for the aforementioned folds to RISCV.	2026-01-26 17:48:10 +00:00
Valeriy Savchenko	090a08d91b	[VectorCombine] Switch vector or<->umax/and<->umin in comparisons (#177159 ) Resolves #174500. In the transformation, we use either one of these equivalences directly or one of the trivial inferences from their combinations. `or<->umax` 1. `or(X) == 0 <=> umax(X) == 0` 2. `or(X) == 1 <=> umax(X) == 1` 3. `sign(or(X)) == sign(umax(X))` `and<->umin` 1. `and(X) == -1 <=> umin(X) == -1` 2. `and(X) == -2 <=> umin(X) == -2` 3. `sign(and(X)) == sign(umin(X))` \| Case \| Proof \| \|------\|-------\| \| a. `or(X) ==/!= 0 <-> umax(X) ==/!= 0` \| [proof](https://alive2.llvm.org/ce/z/t9kER4) \| \| b. `or(X) s< 0 <-> umax(X) s< 0` \| [proof](https://alive2.llvm.org/ce/z/q67EXU) \| \| c. `or(X) s> -1 <-> umax(X) s> -1` \| [proof](https://alive2.llvm.org/ce/z/vY-tUd) \| \| d. `or(X) s< 1 <-> umax(X) s< 1` \| [proof](https://alive2.llvm.org/ce/z/d5izg3) \| \| e. `or(X) ==/!= 1 <-> umax(X) ==/!= 1` \| [proof](https://alive2.llvm.org/ce/z/gSjvpk) \| \| f. `or(X) s< 2 <-> umax(X) s< 2` \| [proof](https://alive2.llvm.org/ce/z/sGUV6c) \| \| g. `and(X) ==/!= -1 <-> umin(X) ==/!= -1` \| [proof](https://alive2.llvm.org/ce/z/mSAs2p) \| \| h. `and(X) s< 0 <-> umin(X) s< 0` \| [proof](https://alive2.llvm.org/ce/z/xnZeDT) \| \| i. `and(X) s> -1 <-> umin(X) s> -1` \| [proof](https://alive2.llvm.org/ce/z/ea_tKG) \| \| j. `and(X) s> -2 <-> umin(X) s> -2` \| [proof](https://alive2.llvm.org/ce/z/ewhAab) \| \| k. `and(X) ==/!= -2 <-> umin(X) ==/!= -2` \| [proof](https://alive2.llvm.org/ce/z/nBBt62) \| \| l. `and(X) s> -3 <-> umin(X) s> -3` \| [proof](https://alive2.llvm.org/ce/z/F3dsfz) \|	2026-01-26 15:55:54 +00:00
Valeriy Savchenko	9d6f011333	[VectorCombine] Fold vector.reduce.OP(F(X)) == 0 -> OP(X) == 0 (#173069 ) This commit introduces a pattern to do the following fold: vector.reduce.OP f(X_i) == 0 -> vector.reduce.OP X_i == 0. In order to decide on this fold, we use the following properties: 1. OP X_i == 0 <=> \forall i \in [1, N] X_i == 0 1'. OP X_i == 0 <=> \exists j \in [1, N] X_j == 0 2. f(x) == 0 <=> x == 0 From 1 and 2 (or 1' and 2), we can infer that OP f(X_i) == 0 <=> OP X_i == 0. For some OPs and fs, we need domain constraints on X to ensure properties 1 (or 1') and 2. In this change we support the following operations f: 1. f(x) = shl nuw x, y for arbitrary y 2. f(x) = mul nuw x, c for defined c != 0 3. f(x) = zext x 4. f(x) = sext x 5. f(x) = neg x And the following reductions OP: a. OR X_i - has property 1 for every X b. UMAX X_i - has property 1 for every X c. UMIN X_i - has property 1' for every X d. SMAX X_i - has property 1 for X >= 0 e. SMIN X_i - has property 1' for X >= 0 f. ADD X_i - has property 1 for X >= 0 && ADD X_i doesn't sign wrap The matrix of Alive2 proofs for every pair of {f,OP}: \| OP\f \| zext \| sext \| neg \| mul \| shl \| \|------\|------\|------\|-----\|-----\|-----\| \| or \| [proof](https://alive2.llvm.org/ce/z/EqHAPd) \| [proof](https://alive2.llvm.org/ce/z/DS3eP2) \| [proof](https://alive2.llvm.org/ce/z/65A5x9) \| [proof](https://alive2.llvm.org/ce/z/TVPpUf) \| [proof](https://alive2.llvm.org/ce/z/kj--vH) \| \| umin \| [proof](https://alive2.llvm.org/ce/z/AK39LL) \| [proof](https://alive2.llvm.org/ce/z/xEPH2S) \| [proof](https://alive2.llvm.org/ce/z/N-ubNr) \| [proof](https://alive2.llvm.org/ce/z/dgUEH4) \| [proof](https://alive2.llvm.org/ce/z/2TUNDu) \| \| umax \| [proof](https://alive2.llvm.org/ce/z/Cy_DJS) \| [proof](https://alive2.llvm.org/ce/z/f42bGQ) \| [proof](https://alive2.llvm.org/ce/z/ReUx4M) \| [proof](https://alive2.llvm.org/ce/z/qSsvdG) \| [proof](https://alive2.llvm.org/ce/z/cE3Qgw) \| \| smin \| [proof](https://alive2.llvm.org/ce/z/j5TwTA) \| [proof](https://alive2.llvm.org/ce/z/DhNxPQ) \| — \| [proof](https://alive2.llvm.org/ce/z/m03AOt) \| [proof](https://alive2.llvm.org/ce/z/bp58Q3) \| \| smax \| [proof](https://alive2.llvm.org/ce/z/3zmbRn) \| [proof](https://alive2.llvm.org/ce/z/6FTfRJ) \| — \| [proof](https://alive2.llvm.org/ce/z/KDfKEW) \| [proof](https://alive2.llvm.org/ce/z/dajm7T) \| \| add \| [proof](https://alive2.llvm.org/ce/z/3kt7BB) \| [proof](https://alive2.llvm.org/ce/z/cyqzQH) \| — \| [proof](https://alive2.llvm.org/ce/z/n_oGjT) \| [proof](https://alive2.llvm.org/ce/z/67bkJm) \| Proofs for known bits: * Leading zeros - [v4i32](https://alive2.llvm.org/ce/z/w--S2D), [v16i8](https://alive2.llvm.org/ce/z/hEdVks) * Leading ones - [v4i16](https://alive2.llvm.org/ce/z/RyPdBS), [v16i8](https://alive2.llvm.org/ce/z/UTFFt9)	2026-01-25 16:47:38 +00:00
Kavin Gnanapandithan	4237e74e52	[VectorCombine] foldShuffleOfBinops - failure to track OperandValueInfo (#171934 ) Resolves #170500. Implemented a static mergeInfo helper that returns the common TTI::OperandValueInfo data, and added the common OperandValueInfo `Op0Info` and `Op1Info` to the NewCost calculation.	2026-01-23 18:04:06 +00:00
Valeriy Savchenko	48fb51b14c	[VectorCombine] Fold vector sign-bit checks (#175194 ) Fold patterns that extract sign bits, reduce them, and compare against boundary values into direct sign checks on the reduced vector. ``` icmp pred (reduce.{add,or,and,umax,umin}(lshr X, BitWidth-1)), C -> icmp slt/sgt (reduce.{or,umax,and,umin}(X)), 0/-1 ``` When the comparison is against 0 or MAX (1 for boolean reductions, NumElts for add), the pattern reduces to one of four quantified predicates: - ∀x: x < 0 (AllNeg) - ∀x: x ≥ 0 (AllNonNeg) - ∃x: x < 0 (AnyNeg) - ∃x: x ≥ 0 (AnyNonNeg) The transform eliminates the shift and selects between reduce.or/reduce.umax or reduce.and/reduce.umin based on cost modeling. ## The matrix of Alive2 proofs for every pair of {reduction, comparison}: \| Reduction \| == 0 \| != 0 \| == MAX \| != MAX \| \|-----------\|------\|------\|--------\|--------\| \| or \| [proof](https://alive2.llvm.org/ce/z/_BWxJW) \| [proof](https://alive2.llvm.org/ce/z/k3EiK6) \| [proof](https://alive2.llvm.org/ce/z/a8cAjp) \| [proof](https://alive2.llvm.org/ce/z/ci-HMt) \| \| umax \| [proof](https://alive2.llvm.org/ce/z/dWt28G) \| [proof](https://alive2.llvm.org/ce/z/_MqxXC) \| [proof](https://alive2.llvm.org/ce/z/KQebnF) \| [proof](https://alive2.llvm.org/ce/z/mixEgN) \| \| and \| [proof](https://alive2.llvm.org/ce/z/JgYrLj) \| [proof](https://alive2.llvm.org/ce/z/FZuPLy) \| [proof](https://alive2.llvm.org/ce/z/bYCa8V) \| [proof](https://alive2.llvm.org/ce/z/9fsLsN) \| \| umin \| [proof](https://alive2.llvm.org/ce/z/YnaSL-) \| [proof](https://alive2.llvm.org/ce/z/rGrgoM) \| [proof](https://alive2.llvm.org/ce/z/pb-ezQ) \| [proof](https://alive2.llvm.org/ce/z/JkoqEi) \| \| add \| [proof](https://alive2.llvm.org/ce/z/d5w5CF) \| [proof](https://alive2.llvm.org/ce/z/GUgQ2Z) \| [proof](https://alive2.llvm.org/ce/z/HnstY8) \| [proof](https://alive2.llvm.org/ce/z/j8z_3C) \| ### Other test cases \| Test \| Proof \| \|------\|-------\| \| or_slt_1 (slt 1 ≡ eq 0) \| [proof](https://alive2.llvm.org/ce/z/Wdb_uN) \| \| umax_sgt_0 (sgt 0 ≡ ne 0) \| [proof](https://alive2.llvm.org/ce/z/nw6NZc) \| \| and_slt_max (slt 1 ≡ ne 1) \| [proof](https://alive2.llvm.org/ce/z/ZDMSXZ) \| \| umin_sgt_max_minus_1 (sgt 0 ≡ eq 1) \| [proof](https://alive2.llvm.org/ce/z/Uynf8P) \| \| add_ult_max (ult 4 ≡ ne 4) \| [proof](https://alive2.llvm.org/ce/z/pyDgTg) \| \| add_ugt_max_minus_1 (ugt 3 ≡ eq 4) \| [proof](https://alive2.llvm.org/ce/z/mHVXJk) \| \| ashr_add_eq_0 (ashr instead of lshr) \| [proof](https://alive2.llvm.org/ce/z/oa9Kgo) \| ### or/umax and and/umin equivalence \| Check \| Equivalence \| Proof \| \|-----------------\|-------------\|-------\| \| AnyNeg \| or slt 0 ≡ umax slt 0 \| [proof](https://alive2.llvm.org/ce/z/Do2tNQ) \| \| AllNonNeg \| or sgt -1 ≡ umax sgt -1 \| [proof](https://alive2.llvm.org/ce/z/N4kZ8Z) \| \| AllNeg \| and slt 0 ≡ umin slt 0 \| [proof](https://alive2.llvm.org/ce/z/4mNpMk) \| \| AnyNonNeg \| and sgt -1 ≡ umin sgt -1 \| [proof](https://alive2.llvm.org/ce/z/2pVnyg) \|	2026-01-23 16:02:06 +00:00
Pankaj Dwivedi	8246257cac	Reapply "[VectorCombine] Fold scalar selects from bitcast into vector select" (#174762 ) Reapply https://github.com/llvm/llvm-project/pull/173990 with fixes for post-commit review comments. --------- Co-authored-by: padivedi <padivedi@amd.com> Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2026-01-13 16:25:09 +05:30
Julian Pokrovsky	38cb7ddca2	[VectorCombine] foldPermuteOfIntrinsic - support multiple uses of shuffled ops (#175299 ) Fixes #173039	2026-01-12 20:34:06 +00:00
Marcell Leleszi	fdc07534e7	[VectorCombine] foldShuffleOfSelects - support multiple uses of shuffled selects (#173166 ) This patch removes the single-use restriction of selects in foldShuffleOfSelects, allowing the fold to trigger for multi-use instructions as well if the cost model finds it cheaper. Fixes #173036	2025-12-23 13:10:12 +00:00
Dhruva Narayan K	1235409ed7	[VectorCombine] foldShuffleOfIntrinsics - support multiple uses of shuffled ops (#173183 ) Fixes #173037 Remove the `m_OneUse` restriction in `foldShuffleOfIntrinsics` and update the cost model to account for additional uses of the original intrinsics.	2025-12-22 19:00:53 +00:00
Miloš Poletanović	f60eec59fb	[VectorCombine] foldPermuteOfBinops - support multi-use binary ops and operands in shuffle folding (#173153 ) Fixes #173033 This patch extends VectorCombine to fold binary operations through shuffles in scenarios involving multiple uses of both the binary operator and its operands. Previously, the transformation was restricted to single-use cases to prevent instruction duplication. This change implements a cost-based evaluation that allows the fold even when: 1. The binary operator has multiple users (requiring duplication of the arithmetic instruction). 2. The operands of the binary operator (the shuffles) have multiple users (requiring the original shuffles to be preserved). The optimization is performed if the TTI cost of the new instruction sequence—including any duplicated arithmetic—is lower than the cost of the shuffle sequence it replaces. This is particularly beneficial on X86 targets for expensive cross-lane shuffles.	2025-12-22 18:12:35 +00:00
Simon Pilgrim	24d9550b27	[VectorCombine] foldShuffleOfBinops - if both operands are the same don't duplicate the total new cost (#172719 ) If we're shuffling/concatenating the same operands then ensure we don't duplicate the total cost, ensure we reuse the final shuffle and recognise that we reduce the total instruction count (so fold even when NewCost == OldCost, not just NewCost < OldCost).	2025-12-18 07:03:06 +00:00
Simon Pilgrim	d176c8d20f	[VectorCombine] foldShuffleOfBinops - add test showing failure to recognise that the new shuffle is repeated (so only a single cost) (#172708 ) Similar to #170867	2025-12-17 18:56:18 +00:00
Nicolai Hähnle	88bd56597c	VectorCombine: Improve the insert/extract fold in the narrowing case (#168820 ) Keeping the extracted element in a natural position in the narrowed vector has two beneficial effects: 1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which allows the insert/extract fold to trigger. 2. It makes the narrowing shuffles in a chain of extract/insert compatible, which allows foldLengthChangingShuffles to successfully recognize a chain that can be folded. There are minor X86 test changes that look reasonable to me. The IR change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2` at all.	2025-12-15 11:25:51 -08:00
Bala_Bhuvan_Varma	0b2fe07e6b	[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965 ) This PR resolves [#170867](https://github.com/llvm/llvm-project/issues/170867). The existing code recomputes the cost of creating a shuffle instruction even for repeated intrinsic operand pairs. This inflates NewCost, so the cost model decides not to fold. This PR addresses the issue: when calculating NewCost, we skip the cost of an operand pair that has already been considered, and when creating the transformed code, we reuse the shuffle instruction already created for a repeated operand pair.	2025-12-15 14:42:41 +00:00
Nicolai Hähnle	54ae1222ef	VectorCombine: Fold chains of shuffles fed by length-changing shuffles (#168819 ) Such chains can arise from folding insert/extract chains.	2025-12-12 13:53:03 -08:00
Nicolai Hähnle	2ab198fd15	AMDGPU: Precommit a test (#171208 )	2025-12-08 22:19:14 +00:00
Jerry Dang	23f09fd3e9	[VectorCombine] Fold permute of intrinsics into intrinsic of permutes: shuffle(intrinsic, poison/undef) -> intrinsic(shuffle) (#170052 ) [VectorCombine] Fold permute of intrinsics into intrinsic of permutes Add foldPermuteOfIntrinsic to transform: shuffle(intrinsic(args), poison) -> intrinsic(shuffle(args)) when the shuffle is a permute (operates on single vector) and the cost model determines the transformation is profitable. This optimization is particularly beneficial for subvector extractions where we can avoid computing unused elements. For example: %fma = call <8 x float> @llvm.fma.v8f32(<8 x float> %a, %b, %c) %result = shufflevector <8 x float> %fma, poison, <4 x i32> <0,1,2,3> transforms to: %a_low = shufflevector <8 x float> %a, poison, <4 x i32> <0,1,2,3> %b_low = shufflevector <8 x float> %b, poison, <4 x i32> <0,1,2,3> %c_low = shufflevector <8 x float> %c, poison, <4 x i32> <0,1,2,3> %result = call <4 x float> @llvm.fma.v4f32(%a_low, %b_low, %c_low) The transformation creates one shuffle per vector argument and calls the intrinsic with smaller vector types, reducing computation when only a subset of elements is needed. The existing foldShuffleOfIntrinsics handled the blend case (two intrinsic inputs), this adds support for the permute case (single intrinsic input). Fixes #170002	2025-12-05 15:54:53 +00:00
Simon Pilgrim	c2472be3fb	[VectorCombine][X86] foldShuffleOfIntrinsics - provide the arguments to a getShuffleCost call (#170465 ) Ensure the arguments are passed to the getShuffleCost calls to improve cost analysis; in particular, if they are constant, the shuffle costs will be recognised as free. Noticed while reviewing #170052	2025-12-03 18:40:48 +00:00
Simon Pilgrim	6822e3c91b	[VectorCombine][X86] Add tests showing failure to push a shuffle through a fma with multiple constants (#170458 ) Despite 2 of the 3 arguments of the fma intrinsic calls being constant (free shuffles), foldShuffleOfIntrinsics fails to fold the shuffle through.	2025-12-03 11:07:26 +00:00
Nicolai Hähnle	69589dd2c0	AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles (#168818 ) These shuffles can always be implemented using v_perm_b32, and so this rewrites the analysis from the perspective of "how many v_perm_b32s does it take to assemble each register of the result?" The test changes in Transforms/SLPVectorizer/reduction.ll are reasonable: VI (gfx8) has native f16 math, but not packed math.	2025-11-21 19:33:13 +00:00
Nicolai Hähnle	0b6a74ced0	VectorCombine/AMDGPU: Cleanup a test and add a new one (#168817 ) The existing, recently added test contains a whole lot of noise in the form of dead instructions. Also, prefer named values. The new test isolates a separate issue with concatenating i8 vectors.	2025-11-20 06:59:14 -08:00
Pawan Nirpal	124fa5ce5f	[AArch64] - Improve costing for Identity shuffles for SVE targets. (#165375 ) Identity masks can be treated as free when scalable vectorization is possible, making the check agnostic of the fixed/scalable vectorization policy. This allows more aggressive vector combines for identity shuffle masks.	2025-11-18 11:52:43 -08:00
Julian Nagele	8280070a73	[VectorCombine] Try to scalarize vector loads feeding bitcast instructions. (#164682 ) This change aims to convert vector loads to scalar loads if their results are only converted to scalars afterwards anyway. alive2 proof: https://alive2.llvm.org/ce/z/U_rvht	2025-11-12 15:35:03 +00:00
hanbeom	50ba89a22e	[VectorCombine] support mismatching extract/insert indices for foldInsExtFNeg (#126408 ) insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index -> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask Previously, the above transform was only possible if the extract and insert indices were the same; this patch makes it possible even when the two indices differ. Proof: https://alive2.llvm.org/ce/z/aDfdyG Fixes: https://github.com/llvm/llvm-project/issues/125675	2025-11-07 18:35:40 +00:00
Shakil Ahmed	47d71b69b4	[BasicTTI] Only split vectors with even element counts in getCastInstrCost (#166528 ) Fixes #166320	2025-11-06 12:45:42 +00:00
Julian Nagele	28a20b4af9	[VectorCombine] Avoid inserting freeze when scalarizing extend-extract if all extracts would lead to UB on poison. (#164683 ) This change aims to avoid inserting a freeze instruction between the load and bitcast when scalarizing extend-extract. This is particularly useful in combination with https://github.com/llvm/llvm-project/pull/164682, which can then potentially further scalarize, provided there is no freeze. alive2 proof: https://alive2.llvm.org/ce/z/W-GD88	2025-11-04 12:39:04 +00:00
choikwa	2b45efe920	[AMDGPU] NFC, move testcase, only test output of promote-alloca with vector-combine (#166289 )	2025-11-03 22:21:26 -05:00
Hongyu Chen	87bc0f7431	[VectorCombine] Preserve cast flags in foldBitOpOfCastConstant (#161237 ) Follow-up of #157822.	2025-09-30 16:38:03 +08:00
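The or<->umax and and<->umin equivalences listed in the #177159 entry (090a08d91b) can be sanity-checked by brute force. The sketch below is a hypothetical illustration, not code from VectorCombine: the helper names are invented, it only covers a two-lane i8 vector (reading -1 as 0xFF and -2 as 0xFE), and the linked Alive2 proofs remain the actual evidence for the fold.

```cpp
#include <cassert>  // for assert() in the checks that exercise these helpers
#include <cstdint>

// Hypothetical helpers (not LLVM code): exhaustively compare reductions of a
// two-lane i8 vector X = <A, B>, with every lane value treated as unsigned.

// True if the sign bit of an i8 value is set.
static bool signBit(uint8_t V) { return (V & 0x80u) != 0; }

// True if "reduce.or(X) == C" and "reduce.umax(X) == C" agree for every X.
bool orMatchesUmaxAt(uint8_t C) {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t Or = static_cast<uint8_t>(A | B);
      uint8_t UMax = static_cast<uint8_t>(A > B ? A : B);
      if ((Or == C) != (UMax == C))
        return false;
    }
  }
  return true;
}

// True if "reduce.and(X) == C" and "reduce.umin(X) == C" agree for every X.
bool andMatchesUminAt(uint8_t C) {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t And = static_cast<uint8_t>(A & B);
      uint8_t UMin = static_cast<uint8_t>(A < B ? A : B);
      if ((And == C) != (UMin == C))
        return false;
    }
  }
  return true;
}

// True if sign(or(X)) == sign(umax(X)) and sign(and(X)) == sign(umin(X))
// for every X.
bool signBitsAgree() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      uint8_t Or = static_cast<uint8_t>(A | B);
      uint8_t UMax = static_cast<uint8_t>(A > B ? A : B);
      uint8_t And = static_cast<uint8_t>(A & B);
      uint8_t UMin = static_cast<uint8_t>(A < B ? A : B);
      if (signBit(Or) != signBit(UMax) || signBit(And) != signBit(UMin))
        return false;
    }
  }
  return true;
}
```

With this sketch, orMatchesUmaxAt(0) and orMatchesUmaxAt(1) hold, while orMatchesUmaxAt(2) fails (X = <1, 2> gives umax 2 but or 3); likewise andMatchesUminAt holds for 0xFF and 0xFE but fails for 0xFD (X = <0xFD, 0xFE> gives umin 0xFD but and 0xFC). That is consistent with the commit listing only those constants as direct equivalences.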

414 Commits