llvm-project

Author	SHA1	Message	Date
Valeriy Savchenko	28fc4f1d96	[VectorCombine] Call cost calculation with correct intrinsic IDs (#177996 ) #175194, #177159, and #173069 introduced the code calling `TTI.getMinMaxReductionCost` with unexpected `Intrinsic::ID` causing RISC-V to fail with `llvm_unreachable` panic. Functionally, this is a small fix that also ports tests for the aforementioned folds to RISCV.	2026-01-26 17:48:10 +00:00
Valeriy Savchenko	090a08d91b	[VectorCombine] Switch vector or<->umax/and<->umin in comparisons (#177159 ) Resolves #174500. In the transformation, we either use one of these equivalences directly or one of the trivial inferences of their combinations. `or<->umax` 1. `or(X) == 0 <=> umax(X) == 0` 2. `or(X) == 1 <=> umax(X) == 1` 3. `sign(or(X)) == sign(umax(X))` `and<->umin` 1. `and(X) == -1 <=> umin(X) == -1` 2. `and(X) == -2 <=> umin(X) == -2` 3. `sign(and(X)) == sign(umin(X))` \| Case \| Proof \| \|------\|-------\| \| a. `or(X) ==/!= 0 <-> umax(X) ==/!= 0` \| [proof](https://alive2.llvm.org/ce/z/t9kER4) \| \| b. `or(X) s< 0 <-> umax(X) s< 0` \| [proof](https://alive2.llvm.org/ce/z/q67EXU) \| \| c. `or(X) s> -1 <-> umax(X) s> -1` \| [proof](https://alive2.llvm.org/ce/z/vY-tUd) \| \| d. `or(X) s< 1 <-> umax(X) s< 1` \| [proof](https://alive2.llvm.org/ce/z/d5izg3) \| \| e. `or(X) ==/!= 1 <-> umax(X) ==/!= 1` \| [proof](https://alive2.llvm.org/ce/z/gSjvpk) \| \| f. `or(X) s< 2 <-> umax(X) s< 2` \| [proof](https://alive2.llvm.org/ce/z/sGUV6c) \| \| g. `and(X) ==/!= -1 <-> umin(X) ==/!= -1` \| [proof](https://alive2.llvm.org/ce/z/mSAs2p) \| \| h. `and(X) s< 0 <-> umin(X) s< 0` \| [proof](https://alive2.llvm.org/ce/z/xnZeDT) \| \| i. `and(X) s> -1 <-> umin(X) s> -1` \| [proof](https://alive2.llvm.org/ce/z/ea_tKG) \| \| j. `and(X) s> -2 <-> umin(X) s> -2` \| [proof](https://alive2.llvm.org/ce/z/ewhAab) \| \| k. `and(X) ==/!= -2 <-> umin(X) ==/!= -2` \| [proof](https://alive2.llvm.org/ce/z/nBBt62) \| \| l. `and(X) s> -3 <-> umin(X) s> -3` \| [proof](https://alive2.llvm.org/ce/z/F3dsfz) \|	2026-01-26 15:55:54 +00:00
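The three `or<->umax` and three `and<->umin` equivalences listed in the commit above can be sanity-checked by brute force. The following is a minimal sketch, not part of the patch: it assumes 3-lane vectors of 8-bit values, and the function name `checkReductionEquivalences` is hypothetical. The linked Alive2 proofs remain the authoritative argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: exhaustively checks the or<->umax and and<->umin
// equivalences for every 3-lane vector of 8-bit values.
// Returns true if all six equivalences hold for every input.
bool checkReductionEquivalences() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C) {
        uint8_t Or = static_cast<uint8_t>(A | B | C);
        uint8_t And = static_cast<uint8_t>(A & B & C);
        uint8_t UMax = static_cast<uint8_t>(std::max({A, B, C}));
        uint8_t UMin = static_cast<uint8_t>(std::min({A, B, C}));

        // or(X) == 0 <=> umax(X) == 0, and or(X) == 1 <=> umax(X) == 1.
        if ((Or == 0) != (UMax == 0)) return false;
        if ((Or == 1) != (UMax == 1)) return false;
        // sign(or(X)) == sign(umax(X)).
        if ((Or >> 7) != (UMax >> 7)) return false;

        // and(X) == -1 <=> umin(X) == -1 (0xFF is -1 in 8 bits).
        if ((And == 0xFF) != (UMin == 0xFF)) return false;
        // and(X) == -2 <=> umin(X) == -2 (0xFE is -2 in 8 bits).
        if ((And == 0xFE) != (UMin == 0xFE)) return false;
        // sign(and(X)) == sign(umin(X)).
        if ((And >> 7) != (UMin >> 7)) return false;
      }
  return true;
}
```

A passing run only covers the chosen lane count and bit width; it is a quick regression check, not a proof.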
Valeriy Savchenko	9d6f011333	[VectorCombine] Fold vector.reduce.OP(F(X)) == 0 -> OP(X) == 0 (#173069 ) This commit introduces a pattern to do the following fold: vector.reduce.OP f(X_i) == 0 -> vector.reduce.OP X_i == 0 In order to decide on this fold, we use the following properties: 1. OP X_i == 0 <=> \forall i \in [1, N] X_i == 0 1'. OP X_i == 0 <=> \exists j \in [1, N] X_j == 0 2. f(x) == 0 <=> x == 0 From 1 and 2 (or 1' and 2), we can infer that OP f(X_i) == 0 <=> OP X_i == 0. For some of the OP's and f's, we need to have domain constraints on X to ensure properties 1 (or 1') and 2. In this change we support the following operations f: 1. f(x) = shl nuw x, y for arbitrary y 2. f(x) = mul nuw x, c for defined c != 0 3. f(x) = zext x 4. f(x) = sext x 5. f(x) = neg x And the following reductions OP: a. OR X_i - has property 1 for every X b. UMAX X_i - has property 1 for every X c. UMIN X_i - has property 1' for every X d. SMAX X_i - has property 1 for X >= 0 e. SMIN X_i - has property 1' for X >= 0 f. ADD X_i - has property 1 for X >= 0 && ADD X_i doesn't sign wrap The matrix of Alive2 proofs for every pair of {f,OP}: \| OP\f \| zext \| sext \| neg \| mul \| shl \| \|------\|------\|------\|-----\|-----\|-----\| \| or \| [proof](https://alive2.llvm.org/ce/z/EqHAPd) \| [proof](https://alive2.llvm.org/ce/z/DS3eP2) \| [proof](https://alive2.llvm.org/ce/z/65A5x9) \| [proof](https://alive2.llvm.org/ce/z/TVPpUf) \| [proof](https://alive2.llvm.org/ce/z/kj--vH) \| \| umin \| [proof](https://alive2.llvm.org/ce/z/AK39LL) \| [proof](https://alive2.llvm.org/ce/z/xEPH2S) \| [proof](https://alive2.llvm.org/ce/z/N-ubNr) \| [proof](https://alive2.llvm.org/ce/z/dgUEH4) \| [proof](https://alive2.llvm.org/ce/z/2TUNDu) \| \| umax \| [proof](https://alive2.llvm.org/ce/z/Cy_DJS) \| [proof](https://alive2.llvm.org/ce/z/f42bGQ) \| [proof](https://alive2.llvm.org/ce/z/ReUx4M) \| [proof](https://alive2.llvm.org/ce/z/qSsvdG) \| [proof](https://alive2.llvm.org/ce/z/cE3Qgw) \| \| smin \| [proof](https://alive2.llvm.org/ce/z/j5TwTA) \| [proof](https://alive2.llvm.org/ce/z/DhNxPQ) \| — \| [proof](https://alive2.llvm.org/ce/z/m03AOt) \| [proof](https://alive2.llvm.org/ce/z/bp58Q3) \| \| smax \| [proof](https://alive2.llvm.org/ce/z/3zmbRn) \| [proof](https://alive2.llvm.org/ce/z/6FTfRJ) \| — \| [proof](https://alive2.llvm.org/ce/z/KDfKEW) \| [proof](https://alive2.llvm.org/ce/z/dajm7T) \| \| add \| [proof](https://alive2.llvm.org/ce/z/3kt7BB) \| [proof](https://alive2.llvm.org/ce/z/cyqzQH) \| — \| [proof](https://alive2.llvm.org/ce/z/n_oGjT) \| [proof](https://alive2.llvm.org/ce/z/67bkJm) \| Proofs for known bits: * Leading zeros - [4vi32](https://alive2.llvm.org/ce/z/w--S2D), [16vi8](https://alive2.llvm.org/ce/z/hEdVks) * Leading ones - [4vi16](https://alive2.llvm.org/ce/z/RyPdBS), [v16i8](https://alive2.llvm.org/ce/z/UTFFt9)	2026-01-25 16:47:38 +00:00
Kavin Gnanapandithan	4237e74e52	[VectorCombine] foldShuffleOfBinops - failure to track OperandValueInfo (#171934 ) Resolves #170500. Implemented mergeInfo static helper to return common TTI::OperandValueInfo data. Added common OperandValueInfo `Op0Info` && `Op1Info` to NewCost calculation.	2026-01-23 18:04:06 +00:00
Valeriy Savchenko	48fb51b14c	[VectorCombine] Fold vector sign-bit checks (#175194 ) Fold patterns that extract sign bits, reduce them, and compare against boundary values into direct sign checks on the reduced vector. ``` icmp pred (reduce.{add,or,and,umax,umin}(lshr X, BitWidth-1)), C -> icmp slt/sgt (reduce.{or,umax,and,umin}(X)), 0/-1 ``` When the comparison is against 0 or MAX (1 for boolean reductions, NumElts for add), the pattern reduces to one of four quantified predicates: - ∀x: x < 0 (AllNeg) - ∀x: x ≥ 0 (AllNonNeg) - ∃x: x < 0 (AnyNeg) - ∃x: x ≥ 0 (AnyNonNeg) The transform eliminates the shift and selects between reduce.or/reduce.umax or reduce.and/reduce.umin based on cost modeling. ## The matrix of Alive2 proofs for every pair of {reduction, comparison}: \| Reduction \| == 0 \| != 0 \| == MAX \| != MAX \| \|-----------\|------\|------\|--------\|--------\| \| or \| [proof](https://alive2.llvm.org/ce/z/_BWxJW) \| [proof](https://alive2.llvm.org/ce/z/k3EiK6) \| [proof](https://alive2.llvm.org/ce/z/a8cAjp) \| [proof](https://alive2.llvm.org/ce/z/ci-HMt) \| \| umax \| [proof](https://alive2.llvm.org/ce/z/dWt28G) \| [proof](https://alive2.llvm.org/ce/z/_MqxXC) \| [proof](https://alive2.llvm.org/ce/z/KQebnF) \| [proof](https://alive2.llvm.org/ce/z/mixEgN) \| \| and \| [proof](https://alive2.llvm.org/ce/z/JgYrLj) \| [proof](https://alive2.llvm.org/ce/z/FZuPLy) \| [proof](https://alive2.llvm.org/ce/z/bYCa8V) \| [proof](https://alive2.llvm.org/ce/z/9fsLsN) \| \| umin \| [proof](https://alive2.llvm.org/ce/z/YnaSL-) \| [proof](https://alive2.llvm.org/ce/z/rGrgoM) \| [proof](https://alive2.llvm.org/ce/z/pb-ezQ) \| [proof](https://alive2.llvm.org/ce/z/JkoqEi) \| \| add \| [proof](https://alive2.llvm.org/ce/z/d5w5CF) \| [proof](https://alive2.llvm.org/ce/z/GUgQ2Z) \| [proof](https://alive2.llvm.org/ce/z/HnstY8) \| [proof](https://alive2.llvm.org/ce/z/j8z_3C) \| ### Other test cases \| Test \| Proof \| \|------\|-------\| \| or_slt_1 (slt 1 ≡ eq 0) \| [proof](https://alive2.llvm.org/ce/z/Wdb_uN) \| \| umax_sgt_0 (sgt 0 ≡ ne 0) \| [proof](https://alive2.llvm.org/ce/z/nw6NZc) \| \| and_slt_max (slt 1 ≡ ne 1) \| [proof](https://alive2.llvm.org/ce/z/ZDMSXZ) \| \| umin_sgt_max_minus_1 (sgt 0 ≡ eq 1) \| [proof](https://alive2.llvm.org/ce/z/Uynf8P) \| \| add_ult_max (ult 4 ≡ ne 4) \| [proof](https://alive2.llvm.org/ce/z/pyDgTg) \| \| add_ugt_max_minus_1 (ugt 3 ≡ eq 4) \| [proof](https://alive2.llvm.org/ce/z/mHVXJk) \| \| ashr_add_eq_0 (ashr instead of lshr) \| [proof](https://alive2.llvm.org/ce/z/oa9Kgo) \| ### or/umax and and/umin equivalence \| Check \| Equivalence \| Proof \| \|-----------------\|-------------\|-------\| \| AnyNeg \| or slt 0 ≡ umax slt 0 \| [proof](https://alive2.llvm.org/ce/z/Do2tNQ) \| \| AllNonNeg \| or sgt -1 ≡ umax sgt -1 \| [proof](https://alive2.llvm.org/ce/z/N4kZ8Z) \| \| AllNeg \| and slt 0 ≡ umin slt 0 \| [proof](https://alive2.llvm.org/ce/z/4mNpMk) \| \| AnyNonNeg \| and sgt -1 ≡ umin sgt -1 \| [proof](https://alive2.llvm.org/ce/z/2pVnyg) \|	2026-01-23 16:02:06 +00:00
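For the `add` reduction in the sign-bit fold above, the comparison against 0 or NumElts maps onto the AllNonNeg and AllNeg predicates. This can be illustrated with a small exhaustive check. It is only a sketch: 3 lanes of 8-bit values are assumed, and the names `isNeg` and `checkSignBitFolds` are hypothetical, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// True if the sign bit of an 8-bit lane is set, i.e. the lane is "slt 0".
constexpr bool isNeg(uint8_t V) { return (V & 0x80) != 0; }

// Hypothetical helper: compares the shift-and-add form with the direct sign
// check on the reduced vector, for every 3-lane vector of 8-bit values.
bool checkSignBitFolds() {
  constexpr unsigned NumElts = 3;
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C) {
        // reduce.add(lshr X, BitWidth-1) counts the negative lanes.
        unsigned Add = (A >> 7) + (B >> 7) + (C >> 7);
        uint8_t Or = static_cast<uint8_t>(A | B | C);
        uint8_t And = static_cast<uint8_t>(A & B & C);

        // add == 0 <=> AllNonNeg <=> reduce.or(X) sgt -1.
        if ((Add == 0) != !isNeg(Or)) return false;
        // add == NumElts <=> AllNeg <=> reduce.and(X) slt 0.
        if ((Add == NumElts) != isNeg(And)) return false;
      }
  return true;
}
```

The `!= 0` and `!= MAX` columns of the proof matrix are the negations of these two checks, so they correspond to AnyNeg and AnyNonNeg respectively.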
Ramkumar Ramachandra	d69335bac9	[LLVM] Clean up code using [not_]equal_to (NFC) (#175824 ) Use llvm::[not_]equal_to landed in d2a521750 ([ADT] Introduce bind_{front,back}, [not_]equal_to, #175056) across LLVM for cleaner code.	2026-01-13 21:19:39 +00:00
Pankaj Dwivedi	8246257cac	Reapply "[VectorCombine] Fold scalar selects from bitcast into vector select" (#174762 ) Reapply https://github.com/llvm/llvm-project/pull/173990 with fixes for post-commit review comments. --------- Co-authored-by: padivedi <padivedi@amd.com> Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2026-01-13 16:25:09 +05:30
Julian Pokrovsky	38cb7ddca2	[VectorCombine] foldPermuteOfIntrinsic - support multiple uses of shuffled ops (#175299 ) Fixes #173039	2026-01-12 20:34:06 +00:00
Pankaj Dwivedi	1ab7b6655d	Revert "[VectorCombine] Fold scalar selects from bitcast into vector select" (#174758 ) Reverts llvm/llvm-project#173990 Reverting to address post-commit review feedback. Will recommit with fixes.	2026-01-07 18:59:32 +05:30
Pankaj Dwivedi	72f18a05d6	[VectorCombine] Fold scalar selects from bitcast into vector select (#173990 )	2026-01-07 15:16:33 +05:30
Victor Chernyakin	c438773432	[LLVM][ADT] Migrate users of `make_scope_exit` to CTAD (#174030 ) This is a followup to #173131, which introduced the CTAD functionality.	2026-01-02 20:42:56 -08:00
Simon Pilgrim	3b85a631df	[VectorCombine] scalarizeExtExtract - create bitmasks with APInt::getLowBitsSet to avoid UB (#174202 ) If we're dealing with uint64 elements or larger, the existing `(1ull << SrcEltSizeInBits) - 1` mask can cause UB. Fixes #174046	2026-01-02 13:02:41 +00:00
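The undefined behaviour fixed above comes from shifting a 64-bit value by its full width. A minimal sketch of a safe low-bits mask follows. `lowBitsMask` is a hypothetical 64-bit stand-in written for illustration; the patch itself uses `APInt::getLowBitsSet`, which also handles wider elements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt::getLowBitsSet, limited to 64-bit masks.
// Returns a value with the low NumBits bits set.
constexpr uint64_t lowBitsMask(unsigned NumBits) {
  // Shifting a 64-bit value by 64 or more is undefined behaviour in C++,
  // so the full-width case is handled without a shift.
  return NumBits >= 64 ? ~uint64_t{0} : (uint64_t{1} << NumBits) - 1;
}

// The naive `(1ull << 64) - 1` would be UB; the guarded form is well defined.
static_assert(lowBitsMask(8) == 0xFFu, "8-bit elements");
static_assert(lowBitsMask(64) == ~uint64_t{0}, "64-bit elements");
```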
Marcell Leleszi	fdc07534e7	[VectorCombine] foldShuffleOfSelects - support multiple uses of shuffled selects (#173166 ) This patch removes the single-use restriction of selects in foldShuffleOfSelects, allowing the fold to trigger for multi-use instructions as well if the cost model finds it cheaper. Fixes #173036	2025-12-23 13:10:12 +00:00
Dhruva Narayan K	1235409ed7	[VectorCombine] foldShuffleOfIntrinsics - support multiple uses of shuffled ops (#173183 ) Fixes #173037 Remove the `m_OneUse` restriction in `foldShuffleOfIntrinsics` and update the cost model to account for additional uses of the original intrinsics.	2025-12-22 19:00:53 +00:00
Miloš Poletanović	f60eec59fb	[VectorCombine] foldPermuteOfBinops - support multi-use binary ops and operands in shuffle folding (#173153 ) Fixes #173033 This patch extends VectorCombine to fold binary operations through shuffles in scenarios involving multiple uses of both the binary operator and its operands. Previously, the transformation was restricted to single-use cases to prevent instruction duplication. This change implements a cost-based evaluation that allows the fold even when: 1. The binary operator has multiple users (requiring duplication of the arithmetic instruction). 2. The operands of the binary operator (the shuffles) have multiple users (requiring the original shuffles to be preserved). The optimization is performed if the TTI cost of the new instruction sequence—including any duplicated arithmetic—is lower than the cost of the shuffle sequence it replaces. This is particularly beneficial on X86 targets for expensive cross-lane shuffles.	2025-12-22 18:12:35 +00:00
Simon Pilgrim	24d9550b27	[VectorCombine] foldShuffleOfBinops - if both operands are the same don't duplicate the total new cost (#172719 ) If we're shuffling/concatenating the same operands then ensure we don't duplicate the total cost, ensure we reuse the final shuffle and recognise that we reduce the total instruction count (so fold even when NewCost == OldCost, not just NewCost < OldCost).	2025-12-18 07:03:06 +00:00
Nicolai Hähnle	88bd56597c	VectorCombine: Improve the insert/extract fold in the narrowing case (#168820 ) Keeping the extracted element in a natural position in the narrowed vector has two beneficial effects: 1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which allows the insert/extract fold to trigger. 2. It makes the narrowing shuffles in a chain of extract/insert compatible, which allows foldLengthChangingShuffles to successfully recognize a chain that can be folded. There are minor X86 test changes that look reasonable to me. The IR change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2` at all.	2025-12-15 11:25:51 -08:00
Bala_Bhuvan_Varma	0b2fe07e6b	[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965 ) This PR resolves [#170867](https://github.com/llvm/llvm-project/issues/170867). The existing code recomputes the cost of creating a shuffle instruction even for repeated intrinsic operand pairs. This inflates newCost, so the fold may be rejected. When calculating newCost, we now skip the cost calculation for an operand pair that was already considered. When creating the transformed code, we reuse the already created shuffle instruction for the repeated operand pair.	2025-12-15 14:42:41 +00:00
Nicolai Hähnle	54ae1222ef	VectorCombine: Fold chains of shuffles fed by length-changing shuffles (#168819 ) Such chains can arise from folding insert/extract chains.	2025-12-12 13:53:03 -08:00
Jerry Dang	23f09fd3e9	[VectorCombine] Fold permute of intrinsics into intrinsic of permutes: shuffle(intrinsic, poison/undef) -> intrinsic(shuffle) (#170052 ) [VectorCombine] Fold permute of intrinsics into intrinsic of permutes Add foldPermuteOfIntrinsic to transform: shuffle(intrinsic(args), poison) -> intrinsic(shuffle(args)) when the shuffle is a permute (operates on single vector) and the cost model determines the transformation is profitable. This optimization is particularly beneficial for subvector extractions where we can avoid computing unused elements. For example: %fma = call <8 x float> @llvm.fma.v8f32(<8 x float> %a, %b, %c) %result = shufflevector <8 x float> %fma, poison, <4 x i32> <0,1,2,3> transforms to: %a_low = shufflevector <8 x float> %a, poison, <4 x i32> <0,1,2,3> %b_low = shufflevector <8 x float> %b, poison, <4 x i32> <0,1,2,3> %c_low = shufflevector <8 x float> %c, poison, <4 x i32> <0,1,2,3> %result = call <4 x float> @llvm.fma.v4f32(%a_low, %b_low, %c_low) The transformation creates one shuffle per vector argument and calls the intrinsic with smaller vector types, reducing computation when only a subset of elements is needed. The existing foldShuffleOfIntrinsics handled the blend case (two intrinsic inputs), this adds support for the permute case (single intrinsic input). Fixes #170002	2025-12-05 15:54:53 +00:00
Simon Pilgrim	c2472be3fb	[VectorCombine][X86] foldShuffleOfIntrinsics - provide the arguments to a getShuffleCost call (#170465 ) Ensure the arguments are passed to the getShuffleCost calls to improve cost analysis; in particular, if these are constant the costs will be recognised as free. Noticed while reviewing #170052	2025-12-03 18:40:48 +00:00
Julian Nagele	8280070a73	[VectorCombine] Try to scalarize vector loads feeding bitcast instructions. (#164682 ) This change aims to convert vector loads to scalar loads if they are only converted to scalars afterwards anyway. alive2 proof: https://alive2.llvm.org/ce/z/U_rvht	2025-11-12 15:35:03 +00:00
hanbeom	50ba89a22e	[VectorCombine] support mismatching extract/insert indices for foldInsExtFNeg (#126408 ) insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index -> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask Previously, the above transform was only possible if the extract and insert indices were the same; this patch makes the transform possible even if the two indices differ. Proof: https://alive2.llvm.org/ce/z/aDfdyG Fixes: https://github.com/llvm/llvm-project/issues/125675	2025-11-07 18:35:40 +00:00
Julian Nagele	28a20b4af9	[VectorCombine] Avoid inserting freeze when scalarizing extend-extract if all extracts would lead to UB on poison. (#164683 ) This change aims to avoid inserting a freeze instruction between the load and bitcast when scalarizing extend-extract. This is particularly useful in combination with https://github.com/llvm/llvm-project/pull/164682, which can then potentially further scalarize, provided there is no freeze. alive2 proof: https://alive2.llvm.org/ce/z/W-GD88	2025-11-04 12:39:04 +00:00
Hongyu Chen	87bc0f7431	[VectorCombine] Preserve cast flags in foldBitOpOfCastConstant (#161237 ) Follow-up of #157822.	2025-09-30 16:38:03 +08:00
Chaitanya Koparkar	766c90f439	[VectorCombine] foldShuffleOfCastops - handle unary shuffles (#160009 ) Fixes #156853.	2025-09-29 14:21:59 +01:00
Leon Clark	8df643f663	[VectorCombine] Fix rotation in phi narrowing. (#160465 ) Fix bug in #140188 where incoming vectors are rotated in the wrong direction. Co-authored-by: Leon Clark <leoclark@amd.com>	2025-09-29 13:26:35 +01:00
Uyiosa Iyekekpolor	994a6a39e1	[VectorCombine] Fix scalarizeExtExtract for big-endian (#157962 ) The scalarizeExtExtract transform assumed little-endian lane ordering, causing miscompiles on big-endian targets such as AIX/PowerPC under -O3 -flto. This patch updates the shift calculation to handle endianness correctly for big-endian targets. No functional change for little-endian targets. Fixes #158197. --------- Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-09-15 10:08:16 +00:00
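The endianness issue above can be illustrated with the shift needed to pull one lane out of a vector that has been bitcast to a wide scalar. This is a sketch under stated assumptions rather than the patch's code: `laneShift` and `extractLane` are hypothetical names, elements are narrower than 64 bits, and lane 0 is taken to occupy the least significant bits on little-endian targets and the most significant bits on big-endian targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift amount that brings lane Idx of a bitcast
// <NumElts x iEltBits> vector down to bit 0 of the wide scalar.
constexpr unsigned laneShift(unsigned Idx, unsigned NumElts, unsigned EltBits,
                             bool IsBigEndian) {
  // On big-endian targets lane 0 sits in the most significant bits, so the
  // lane order is mirrored relative to little-endian.
  return (IsBigEndian ? NumElts - 1 - Idx : Idx) * EltBits;
}

// Hypothetical helper: extracts lane Idx by shift and mask (EltBits < 64).
constexpr uint64_t extractLane(uint64_t Scalar, unsigned Idx, unsigned NumElts,
                               unsigned EltBits, bool IsBigEndian) {
  return (Scalar >> laneShift(Idx, NumElts, EltBits, IsBigEndian)) &
         ((uint64_t{1} << EltBits) - 1);
}

// <4 x i8> <0x11, 0x22, 0x33, 0x44> viewed as an i32 on each kind of target.
static_assert(extractLane(0x44332211u, 1, 4, 8, false) == 0x22, "little-endian");
static_assert(extractLane(0x11223344u, 1, 4, 8, true) == 0x22, "big-endian");
```

Using the little-endian shift on the big-endian value would return lane 2 (`0x33`) instead of lane 1, which is the kind of miscompile the commit describes.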
Hongyu Chen	c62ea6598e	[VectorCombine] Add Ext and Trunc support in foldBitOpOfCastConstant (#157822 ) Follow-up of https://github.com/llvm/llvm-project/pull/155216. This patch doesn't preserve the flags. I will implement it in the follow-up patch.	2025-09-11 17:08:47 +08:00
Hongyu Chen	75b0c89e62	[InstCombine][VectorCombine][NFC] Unify uses of lossless inverse cast (#156597 ) This patch addresses https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663. This patch adds a helper function to put the inverse cast on constants, with cast flags preserved(optional). Follow-up patches will add trunc/ext handling on VectorCombine and flags preservation on InstCombine.	2025-09-08 13:30:06 +00:00
Simon Pilgrim	ad3a0ae9e1	[VectorCombine] foldSelectShuffle - early-out cases where the max vector register width isn't large enough (#157430 ) Technically this could happen with vector units that can't handle all legal scalar widths - but it's good enough to use a generic crash test without a suitable target. Fixes #157335	2025-09-08 12:04:23 +00:00
Hongyu Chen	3bdd39715a	[VectorCombine] Relax vector type constraint on bitop(bitcast, bitcast) (#157245 ) Inspired by https://github.com/llvm/llvm-project/issues/157131. This patch allows `bitop(bitcast, bitcast) -> bitcast(bitop)` for scalar integer types.	2025-09-08 06:58:09 +00:00
Hongyu Chen	db2fc84f93	[VectorCombine] Relax vector type constraint on bitop(bitcast, constant) (#157246 ) Fixes https://github.com/llvm/llvm-project/issues/157131. This patch allows bitop(bitcast, constant) -> bitcast(bitop) for scalar integer types.	2025-09-08 12:35:19 +08:00
XChy	cb80fa756c	[VectorCombine] Support pattern `bitop(bitcast(x), C) -> bitcast(bitop(x, InvC))` (#155216 ) Resolves #154797. This patch adds the fold `bitop(bitcast(x), C) -> bitop(bitcast(x), cast(InvC)) -> bitcast(bitop(x, InvC))`. The helper function `getLosslessInvCast` tries to calculate the constant `InvC`, satisfying `castop(InvC) == C`, and will try its best to keep the poison-generating flags of the cast operation.	2025-09-02 23:54:12 +08:00
Sam Tebbs	37127f74f4	[LV] Bundle sub reductions into VPExpressionRecipe (#147255 ) This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. -> https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-09-01 17:25:01 +01:00
Yingwei Zheng	abd2dc90c0	[VectorCombine] Avoid double deletion in `eraseInstruction` (#155621 ) Consider the following pattern: ``` C = op A B D = op C E = op D, C ``` As `E` is dead, we call `eraseInstruction(E)` and see if its operands become dead. `RecursivelyDeleteTriviallyDeadInstructions(D)` also erases `C`, which causes a UAF crash in the subsequent call `RecursivelyDeleteTriviallyDeadInstructions(C)`. This patch also adds deleted ops into the visit list to avoid double deletion. Closes https://github.com/llvm/llvm-project/issues/155543.	2025-08-27 22:28:02 +08:00
Yingwei Zheng	db6a8f1009	[VectorCombine] Avoid crash when the next node is deleted. (#155115 ) `RecursivelyDeleteTriviallyDeadInstructions` is introduced by https://github.com/llvm/llvm-project/pull/149047 to immediately drop dead instructions. However, it may invalidate the next iterator in `make_early_inc_range` in some edge cases, which leads to a crash. This patch manually maintains the next iterator and updates it when the next instruction is about to be deleted. Closes https://github.com/llvm/llvm-project/issues/155110.	2025-08-26 00:22:53 +08:00
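The iterator fix described above can be sketched on a plain `std::list`. This is an illustration of the general technique only: `eraseAndFixNext` and `visitAll` are hypothetical names, integers stand in for instructions, and the toy deletion rule (visiting value V deletes the element holding V + 1) merely simulates a visit that removes other nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

using InstList = std::list<int>;

// Hypothetical helper: erases Victim, first advancing the cached Next
// iterator if it points at the element about to be deleted.
void eraseAndFixNext(InstList &Insts, InstList::iterator Victim,
                     InstList::iterator &Next) {
  if (Victim == Next)
    ++Next;
  Insts.erase(Victim);
}

// Hypothetical driver: visits every surviving element and returns how many
// were visited. The next iterator is maintained manually instead of being
// cached by an early-increment range.
int visitAll(InstList &Insts) {
  int Visited = 0;
  for (auto It = Insts.begin(); It != Insts.end();) {
    auto Next = std::next(It);
    ++Visited;
    // Toy rule standing in for recursive dead-code deletion.
    auto Victim = std::find(Insts.begin(), Insts.end(), *It + 1);
    if (Victim != Insts.end())
      eraseAndFixNext(Insts, Victim, Next);
    It = Next;
  }
  return Visited;
}
```

For the list `{1, 2, 3, 4, 5}`, each visit deletes the immediate successor, which is exactly the case where a precomputed next iterator would dangle.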
Rajveer Singh Bharadwaj	4ce550614b	[Post-Commit] Add missing `break` https://github.com/llvm/llvm-project/pull/145232	2025-08-24 15:08:32 +05:30
Rajveer Singh Bharadwaj	93c96849c8	[VectorCombine] New folding pattern for extract/binop/shuffle chains (#145232 ) Resolves #144654 Part of #143088 This adds a new `foldShuffleChainsToReduce` for horizontal reduction of patterns like: ```llvm define i16 @test_reduce_v8i16(<8 x i16> %a0) local_unnamed_addr #0 { %1 = shufflevector <8 x i16> %a0, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison> %2 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %a0, <8 x i16> %1) %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <8 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison> %4 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %2, <8 x i16> %3) %5 = shufflevector <8 x i16> %4, <8 x i16> poison, <8 x i32> <i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison> %6 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %4, <8 x i16> %5) %7 = extractelement <8 x i16> %6, i64 0 ret i16 %7 } ``` ...which can be reduced to a llvm.vector.reduce.umin.v8i16(%a0) intrinsic call. Similar transformation for other ops when costs permit to do so.	2025-08-24 14:21:48 +05:30
Kazu Hirata	c1bc55ee06	[Vectorize] Remove an unnecessary cast (NFC) (#155135 ) getOpcode() already returns Instruction::CastOps.	2025-08-23 22:20:07 -07:00
Kyle Wang	064f02dac0	[VectorCombine] Preserve scoped alias metadata (#153714 ) Right now, if a load op is scalarized, the `!alias.scope` and `!noalias` metadata are dropped. This PR keeps them if they exist.	2025-08-18 18:16:32 +00:00
David Green	790bee99de	[VectorCombine] Remove dead node immediately in VectorCombine (#149047 ) The vector combiner will process all instructions as it first loops through the function, adding any newly added and deleted instructions to a worklist which is then processed when all nodes are done. This leaves extra uses in the graph as the initial processing is performed, leading to sub-optimal decisions being made for other combines. This changes it so that trivially dead instructions are removed immediately. The main change this requires is to make sure iterator invalidation does not occur.	2025-08-18 07:55:21 +01:00
Kazu Hirata	cbf5af9668	[llvm] Remove unused includes (NFC) (#154051 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-08-17 23:46:35 -07:00
XChy	3a4a60deff	[VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop (#153069 ) Fixes #153012 As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we may fold ```llvm define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 { entry: %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0 %159 = or disjoint <2 x i64> splat (i64 2), %158 store <2 x i64> %159, ptr %ptr2 ret void } ``` to ```llvm define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) { entry: %.scalar = or disjoint i64 2, %idx %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)> %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0 store <2 x i64> %1, ptr %ptr2, align 16 ret void } ``` And it would be folded back in `foldInsExtBinop`, resulting in an infinite loop. This patch forces scalarization iff InstSimplify can fold the constant expression.	2025-08-15 18:38:04 +00:00
zGoldthorpe	a8d25683ee	[PatternMatch] Allow `m_ConstantInt` to match integer splats (#153692 ) When matching integers, `m_ConstantInt` is a convenient alternative to `m_APInt` for matching unsigned 64-bit integers, allowing one to simplify ```cpp const APInt *IntC; if (match(V, m_APInt(IntC))) { if (IntC->ule(UINT64_MAX)) { uint64_t Int = IntC->getZExtValue(); // ... } } ``` to ```cpp uint64_t Int; if (match(V, m_ConstantInt(Int))) { // ... } ``` However, this simplification is only true if `V` is a scalar type. Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt` does not. This patch ensures that the matching behaviour of `m_ConstantInt` parallels that of `m_APInt`, and also incorporates it in some obvious places.	2025-08-15 10:43:54 -06:00
Elvis Wang	01fac67e2a	[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342 ) This patch adds a cost kind to `getAddressComputationCost()` for #149955. Note that this patch also removes all the default values in `getAddressComputationCost()`.	2025-08-14 16:01:44 +08:00
Leon Clark	e2bbd6d287	[VectorCombine][AMDGPU] Narrow Phi of Shuffles. (#140188 ) Attempt to narrow a phi of shufflevector instructions where the two incoming values have the same operands but different masks. Related to #128938. --------- Co-authored-by: Leon Clark <leoclark@amd.com>	2025-08-12 18:45:11 +01:00
Leon Clark	9115bef8ee	[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138 ) Reopen #128938. Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-12 14:08:37 +01:00
Luke Lau	acb86fb9e0	[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657 ) In some places we were passing the type of value being accessed, in other cases we were passing the type of the pointer for the access. The most "involved" user is LoopVectorizationCostModel::getMemInstScalarizationCost, which is the only call site that passes in the SCEV, and it passes along the pointer type. This changes call sites to consistently pass the pointer type, and renames the arguments to clarify this. No target actually checks the contents of the type passed, only to see if it's a vector or not, so this shouldn't have an effect.	2025-08-11 18:00:12 +08:00
David Green	6ca6d45b29	[VectorCombine] Use hasOneUser in shuffle-to-identity fold (#152675 ) We need to check that the node is part of the graph being converted, so will not contain external uses when transformed.	2025-08-11 07:45:15 +01:00


337 Commits