352 Commits

Author SHA1 Message Date
David Green
fd40c60665
[VectorCombine] Fix transitive Uses in foldShuffleToIdentity (#188989)
The Uses in foldShuffleToIdentity is intended to detect where an operand
is used to distinguish between splats, identities and concats of the
same value. When looking through multiple unsimplified shuffles the same
Use could be both a splat and an identity though. This patch changes the
Use to a Value and an original Use, so that even if we are looking
through multiple vectors we recognise the splat vs identity vs concat of
each use correctly.

Fixes #180338
2026-04-01 14:53:04 +01:00
Nathiyaa Sengodan
9d20e75c2d
[VectorCombine] Fix crash in foldShuffleOfSelects for single-element shuffle result (#185713)
In foldShuffleOfSelects, if the shuffle result has a single element, the
resulting type may be scalar rather than a vector. The later code in
foldShuffleOfSelects assumes the result is a vector and performs
cast<FixedVectorType>, which triggers an assertion.

Fixes #183625
2026-03-13 13:24:18 +00:00
V.Rybalov
19a31b27b0
[VectorCombine] Fixing bitcast processing in VectorCombine (#185075)
Fix bitcast instruction processing in the VectorCombine pass when it
operates on arbitrary-precision integer types.
2026-03-09 12:47:56 -07:00
Valeriy Savchenko
49b77e3b45
[VectorCombine] Fold sign-bit check for multiple vectors (#182911)
## Alive2 proofs

| Reduction | Shift | Cmp | Sources | Proof |
|-----------|-------|-----|---------|-------|
| add | lshr | == 0 | 2 | [proof](https://alive2.llvm.org/ce/z/f44vco) |
| add | lshr | == 8 | 2 | [proof](https://alive2.llvm.org/ce/z/Ks_nea) |
| add | ashr | == 0 | 2 | [proof](https://alive2.llvm.org/ce/z/ZsXJ5k) |
| add | ashr | == -8 | 2 | [proof](https://alive2.llvm.org/ce/z/HZfans) |
| add | lshr | == 0 | 3 | [proof](https://alive2.llvm.org/ce/z/x-dEdz) |
| add | lshr | == 12 | 3 | [proof](https://alive2.llvm.org/ce/z/sfNvhr) |

These proofs are not very exhaustive, but somewhat show that it works
for addition. Apart from the fact that we use multiple vectors, the
proofs from the previous changes generally apply here as well because we
effectively match on reductions of size M x N.
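
As an illustrative sanity check (not part of the patch), the Python sketch below brute-forces the multi-vector add case for small, arbitrarily chosen sizes. `BITS`, `N`, `M` and the helper names are assumptions made for this example, and the pattern is modelled as one sum of sign bits across all source vectors.

```python
from itertools import product

BITS = 4  # illustrative element width
N = 2     # illustrative number of elements per vector
M = 2     # number of source vectors


def sign_bit_sum(vectors):
    """Model add-reducing `lshr X, BITS-1` over every element of every vector."""
    return sum((x >> (BITS - 1)) & 1 for vec in vectors for x in vec)


def is_negative(x):
    """Signed view of an unsigned BITS-wide value."""
    return x >= 1 << (BITS - 1)


def check_all():
    """Exhaustively compare the reduced sum against the quantified predicates."""
    for flat in product(range(1 << BITS), repeat=M * N):
        vectors = [flat[i * N:(i + 1) * N] for i in range(M)]
        total = sign_bit_sum(vectors)
        # Sum is 0 exactly when no element is negative.
        assert (total == 0) == all(not is_negative(x) for x in flat)
        # Sum is M * N exactly when every element is negative.
        assert (total == M * N) == all(is_negative(x) for x in flat)
    return True
```

With two 4-element vectors the upper bound would be 8, which matches the `== 8` row in the table above.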
2026-03-01 14:07:44 +00:00
Simon Pilgrim
92704064e5
[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575)
Unless we're working with AVX512 mask predicate types, sign extending a
vXi1 comparison result back to the width of the comparison source types
is free.

VectorCombine::foldShuffleOfCastops - pass the original CastInst in the
getCastInstrCost calls to track the source comparison instruction.

Fixes #165813
2026-02-27 07:55:39 +00:00
Valeriy Savchenko
966a4618b8
[VectorCombine] Support ashr sign-bit extraction (#181998)
This change extends a sign-bit reduction fold introduced earlier. Prior
to it, we only supported LSHR instructions for sign-bit extraction.
Similar logic can be applied to ASHR and the fold can be generalized.

## Alive2 proofs

| Reduction | == 0 | == -1 / -N | slt 0 | sgt -1 / -N |
|-----------|------|------------|-------|-------------|
| or | [proof](https://alive2.llvm.org/ce/z/DaSMPt) | [proof](https://alive2.llvm.org/ce/z/wzR48R) | [proof](https://alive2.llvm.org/ce/z/rfyr_7) | [proof](https://alive2.llvm.org/ce/z/MTFFe5) |
| and | [proof](https://alive2.llvm.org/ce/z/PmmpbX) | [proof](https://alive2.llvm.org/ce/z/7_9hSn) | [proof](https://alive2.llvm.org/ce/z/wudWY3) | [proof](https://alive2.llvm.org/ce/z/QZ33KB) |
| umax | [proof](https://alive2.llvm.org/ce/z/gQGnDc) | [proof](https://alive2.llvm.org/ce/z/dMsoQF) | [proof](https://alive2.llvm.org/ce/z/QwFbae) | [proof](https://alive2.llvm.org/ce/z/3dbmy6) |
| umin | [proof](https://alive2.llvm.org/ce/z/Z2pZUQ) | [proof](https://alive2.llvm.org/ce/z/6FQgGR) | [proof](https://alive2.llvm.org/ce/z/95-em6) | [proof](https://alive2.llvm.org/ce/z/PW7c-m) |
| add | [proof](https://alive2.llvm.org/ce/z/FVhuhj) | [proof](https://alive2.llvm.org/ce/z/h1B9jQ) | [proof](https://alive2.llvm.org/ce/z/DmiYRr) | [proof](https://alive2.llvm.org/ce/z/P4WDN5) |
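
A small brute-force model of the ASHR variant, offered only as a sketch: `ashr X, BITS-1` yields 0 or all-ones per element, so the reductions compare against 0 or -1 (or -N for add, taken modulo the element width). `BITS`, `N` and the function names are assumptions for this example.

```python
from itertools import product

BITS = 4  # illustrative element width
N = 3     # illustrative vector length
MASK = (1 << BITS) - 1


def to_signed(x):
    """Interpret an unsigned BITS-wide value as two's complement."""
    return x - (1 << BITS) if x >= 1 << (BITS - 1) else x


def ashr_sign(x):
    """Model `ashr X, BITS-1`: all-ones for negative inputs, else zero."""
    return MASK if to_signed(x) < 0 else 0


def check_all():
    for xs in product(range(1 << BITS), repeat=N):
        signs = [ashr_sign(x) for x in xs]
        neg = [to_signed(x) < 0 for x in xs]
        or_red, and_red = 0, MASK
        for s in signs:
            or_red |= s
            and_red &= s
        add_red = sum(signs) & MASK  # wrapping add reduction
        assert (or_red == 0) == (not any(neg))        # or == 0
        assert (and_red == MASK) == all(neg)          # and == -1
        assert (add_red == 0) == (not any(neg))       # add == 0
        assert (add_red == (-N & MASK)) == all(neg)   # add == -N
    return True
```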
2026-02-23 13:01:39 +00:00
Florian Hahn
689c99557f
[VectorCombine] Skip dead shufflevector in GetIndexRangeInShuffles to fix crash. (#179217)
Update GetIndexRangeInShuffles to skip unused shuffles. This matches the
behavior in the loop below and without it, we end up with an index
mis-match, causing a crash for the added test case.

PR: https://github.com/llvm/llvm-project/pull/179217
2026-02-06 12:05:47 +00:00
Julian Pokrovsky
3f4d94fd4c
[VectorCombine] foldShuffleOfBinops - support multiple uses of shuffled binops (#179429)
Resolves #173035
2026-02-06 11:10:20 +00:00
Valeriy Savchenko
92c26bb1a5
[VectorCombine] Fix crash in foldEquivalentReductionCmp on i1 vector (#179917)
2026-02-05 12:43:48 +00:00
Hans Wennborg
2ee37cc4cf
Revert "[VectorCombine] Trim low end of loads used in shufflevector rebroadcasts. (#149093)"
It appears to create loads from underaligned addresses. See comment on
the PR.

> Following on from #128938, trim the low end of loads where only some of
> the incoming lanes are used for rebroadcasts in shufflevector
> instructions.

This reverts commit 6c8d9d0c4da51c7f9e7671902be3ad9b65d56c84 and the
follow-up commits 07a6a23f6c5387fc1e7df174b5921f6004db64e0 and
313a2008538abb61ab13f8cc9f9a712f7faff3a5.
2026-02-02 18:50:02 +01:00
Deric C.
313a200853
[VectorCombine] Fix the PtrAdd offset in shrinkLoadForShuffles to account for element type size (#179001)
This PR fixes an [issue I pointed out in regards to incorrect GEP
indices](https://github.com/llvm/llvm-project/pull/149093#discussion_r2748266079)
introduced by PR #149093.

Changes:
- Updated the pointer offset calculation in
`VectorCombine::shrinkLoadForShuffles` so that the offset is now
multiplied by the element size (`ElemSize`) when computing the new
pointer for loads
- Updated the GEP indices in
`llvm/test/Transforms/VectorCombine/load-shufflevector.ll` for the
correct byte offsets
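
To illustrate why the offset has to be scaled, here is a minimal Python sketch that treats the load as raw little-endian bytes. `byte_offset` is a hypothetical helper and the lane values are made up; this is not the code from the patch.

```python
import struct


def byte_offset(first_used_lane, elem_size_bytes):
    """Hypothetical helper: byte offset of the first lane that is kept."""
    return first_used_lane * elem_size_bytes


lanes = [10, 20, 30, 40]               # a <4 x i32> value as stored in memory
buf = struct.pack("<4I", *lanes)

# Trimming the two low lanes: the narrowed load must start 2 * 4 = 8 bytes in.
narrow = list(struct.unpack_from("<2I", buf, byte_offset(2, 4)))

# Using the lane index directly as a byte offset reads misaligned garbage.
wrong = list(struct.unpack_from("<2I", buf, 2))
```

Here `narrow` equals the two high lanes, while `wrong` does not.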
2026-01-31 02:25:29 +00:00
puneeth_aditya_5656
07a6a23f6c
[VectorCombine] Fix crash with poison mask elements in shrinkLoadForShuffles (#178920)
## Summary
Fixes assertion failure when `shrinkLoadForShuffles` processes shuffle
masks containing poison elements.

The bug was introduced in #149093: when adjusting mask indices for load
trimming, poison indices (-1) were modified to invalid values (e.g.,
-2), causing `isSingleSourceMaskImpl` to assert.

The fix preserves poison indices without modification.
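
A minimal sketch of the idea, assuming the mask is a plain list of lane indices with -1 for poison. `rebase_mask` and `naive_rebase_mask` are hypothetical names used only for illustration.

```python
POISON = -1  # shuffle masks use -1 for poison lanes


def naive_rebase_mask(mask, low_lanes_trimmed):
    """Buggy variant: shifts every index, turning poison into an invalid value."""
    return [m - low_lanes_trimmed for m in mask]


def rebase_mask(mask, low_lanes_trimmed):
    """Shift indices down after trimming low lanes, leaving poison untouched."""
    return [m if m == POISON else m - low_lanes_trimmed for m in mask]
```

For example, rebasing `[2, -1, 3, 2]` by one lane keeps the poison entry at -1, whereas the naive variant produces -2 in that position.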

Fixes #178917

## Test plan
- Added regression test `@shuffle_with_poison_mask`
2026-01-30 19:22:39 +00:00
Leon Clark
6c8d9d0c4d
[VectorCombine] Trim low end of loads used in shufflevector rebroadcasts. (#149093)
Following on from #128938, trim the low end of loads where only some of
the incoming lanes are used for rebroadcasts in shufflevector
instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2026-01-30 15:20:27 +00:00
calebwat
0694daa0ad
[VectorCombine] Fix typo in foldPermuteOfBinops cost calculation (#178072)
Addresses an issue in #173153. This patch expanded the supported ops for
folding binary ops through shuffles, but seemingly had a typo which
could inaccurately increase the unmodified cost.
2026-01-30 14:03:49 +00:00
Mitch Briles
4ec35a0b0e
[VectorCombine] Fix crash when folding select of bitcast (#177183)
Fixes #177144. Nits appreciated.

The fold in question does the following transformation:
Before
```
%bc = bitcast <4 x i32> %src to <16 x i8>
%e0 = extractelement <16 x i8> %bc, i32 0
%s0 = select i1 %cond, i8 %e0, i8 0
%e1 = extractelement <16 x i8> %bc, i32 1
%s1 = select i1 %cond, i8 %e1, i8 0
...
```

After
```
%sel = select i1 %cond, <4 x i32> %src, <4 x i32> zeroinitializer
%bc = bitcast <4 x i32> %sel to <16 x i8>
%e0 = extractelement <16 x i8> %bc, i32 0
%e1 = extractelement <16 x i8> %bc, i32 1
...
```
If every select shares the condition and has 0 in the false branch, the
bitcast can be replaced with a select between the original vector and
`zeroinitializer`, followed by a bitcast. Then each `select(cond,
extelt(...), 0)` can be replaced with `extelt(...)`.
The crash happens when the condition is defined after the original
bitcast, because the bitcast is replaced with the select + bitcast, and
now the select references a condition not yet defined.
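
The value-level equivalence behind the fold can be sketched in Python as follows. This only models the data flow for a `<4 x i32>` source and says nothing about instruction ordering, which is what the crash was about; the function names are assumptions.

```python
import struct


def bitcast_v4i32_to_v16i8(vec):
    """Reinterpret four 32-bit lanes as sixteen bytes (little-endian model)."""
    return list(struct.pack("<4I", *vec))


def before(cond, src):
    """extractelement from the bitcast, then a scalar select per lane."""
    bc = bitcast_v4i32_to_v16i8(src)
    return [bc[i] if cond else 0 for i in range(16)]


def after(cond, src):
    """One vector select on the source, then bitcast and extract."""
    sel = src if cond else [0, 0, 0, 0]
    bc = bitcast_v4i32_to_v16i8(sel)
    return [bc[i] for i in range(16)]
```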
2026-01-28 15:13:50 +00:00
Valeriy Savchenko
28fc4f1d96
[VectorCombine] Call cost calculation with correct intrinsic IDs (#177996)
#175194, #177159, and #173069 introduced the code calling
`TTI.getMinMaxReductionCost` with unexpected `Intrinsic::ID` causing
RISC-V to fail with `llvm_unreachable` panic.

Functionally, this is a small fix that also ports tests for the
aforementioned folds to RISCV.
2026-01-26 17:48:10 +00:00
Valeriy Savchenko
090a08d91b
[VectorCombine] Switch vector or<->umax/and<->umin in comparisons (#177159)
Resolves #174500

In the transformation, we either use one of these equivalences
directly or one of the trivial inferences of their combinations.

`or<->umax`

1. `or(X) == 0 <=> umax(X) == 0`
2. `or(X) == 1 <=> umax(X) == 1`
3. `sign(or(X)) == sign(umax(X))`

`and<->umin`

1. `and(X) == -1 <=> umin(X) == -1`
2. `and(X) == -2 <=> umin(X) == -2`
3. `sign(and(X)) == sign(umin(X))`

| Case | Proof |
|------|-------|
| a. `or(X) ==/!= 0 <-> umax(X) ==/!= 0` | [proof](https://alive2.llvm.org/ce/z/t9kER4) |
| b. `or(X) s< 0 <-> umax(X) s< 0` | [proof](https://alive2.llvm.org/ce/z/q67EXU) |
| c. `or(X) s> -1 <-> umax(X) s> -1` | [proof](https://alive2.llvm.org/ce/z/vY-tUd) |
| d. `or(X) s< 1 <-> umax(X) s< 1` | [proof](https://alive2.llvm.org/ce/z/d5izg3) |
| e. `or(X) ==/!= 1 <-> umax(X) ==/!= 1` | [proof](https://alive2.llvm.org/ce/z/gSjvpk) |
| f. `or(X) s< 2 <-> umax(X) s< 2` | [proof](https://alive2.llvm.org/ce/z/sGUV6c) |
| g. `and(X) ==/!= -1 <-> umin(X) ==/!= -1` | [proof](https://alive2.llvm.org/ce/z/mSAs2p) |
| h. `and(X) s< 0 <-> umin(X) s< 0` | [proof](https://alive2.llvm.org/ce/z/xnZeDT) |
| i. `and(X) s> -1 <-> umin(X) s> -1` | [proof](https://alive2.llvm.org/ce/z/ea_tKG) |
| j. `and(X) s> -2 <-> umin(X) s> -2` | [proof](https://alive2.llvm.org/ce/z/ewhAab) |
| k. `and(X) ==/!= -2 <-> umin(X) ==/!= -2` | [proof](https://alive2.llvm.org/ce/z/nBBt62) |
| l. `and(X) s> -3 <-> umin(X) s> -3` | [proof](https://alive2.llvm.org/ce/z/F3dsfz) |
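
The six base equivalences listed earlier can also be checked exhaustively for tiny vectors. The sketch below is only a sanity model under assumed sizes (`BITS`, `N`), not a replacement for the Alive2 proofs.

```python
from functools import reduce
from itertools import product
from operator import and_, or_

BITS = 3  # illustrative element width
N = 3     # illustrative vector length
MASK = (1 << BITS) - 1       # bit pattern of -1
SIGN = 1 << (BITS - 1)       # sign-bit position


def check_all():
    for xs in product(range(1 << BITS), repeat=N):
        or_x, and_x = reduce(or_, xs), reduce(and_, xs)
        umax_x, umin_x = max(xs), min(xs)
        for c in (0, 1):                      # or(X) == c <=> umax(X) == c
            assert (or_x == c) == (umax_x == c)
        for c in (MASK, MASK - 1):            # and(X) == -1/-2 <=> umin(X) == -1/-2
            assert (and_x == c) == (umin_x == c)
        assert (or_x & SIGN) == (umax_x & SIGN)    # sign(or) == sign(umax)
        assert (and_x & SIGN) == (umin_x & SIGN)   # sign(and) == sign(umin)
    return True
```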
2026-01-26 15:55:54 +00:00
Valeriy Savchenko
9d6f011333
[VectorCombine] Fold vector.reduce.OP(F(X)) == 0 -> OP(X) == 0 (#173069)
This commit introduces a pattern to do the following fold:

  vector.reduce.OP f(X_i) == 0 -> vector.reduce.OP X_i == 0

In order to decide on this fold, we use the following properties:

  1.  OP X_i == 0 <=> \forall i \in [1, N] X_i == 0
  1'. OP X_i == 0 <=> \exists j \in [1, N] X_j == 0
  2.  f(x) == 0 <=> x == 0

From 1 and 2 (or 1' and 2), we can infer that

  OP f(X_i) == 0 <=> OP X_i == 0.

For some of the OP's and f's, we need to have domain constraints on X to
ensure properties 1 (or 1') and 2.

In this change we support the following operations f:

  1. f(x) = shl nuw x, y for arbitrary y
  2. f(x) = mul nuw x, c for defined c != 0
  3. f(x) = zext x
  4. f(x) = sext x
  5. f(x) = neg x

And the following reductions OP:

  a. OR X_i   - has property 1  for every X
  b. UMAX X_i - has property 1  for every X
  c. UMIN X_i - has property 1' for every X
  d. SMAX X_i - has property 1  for X >= 0
  e. SMIN X_i - has property 1' for X >= 0
  f. ADD X_i  - has property 1  for X >= 0 && ADD X_i doesn't sign wrap
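
As a rough illustration of how properties 1 (or 1') and 2 combine, the sketch below brute-forces `f(x) = neg x` against the OR, UMAX and UMIN reductions, which need no domain constraints. The sizes and names are assumptions chosen for this example.

```python
from functools import reduce
from itertools import product
from operator import or_

BITS = 3  # illustrative element width
N = 3     # illustrative vector length
MASK = (1 << BITS) - 1


def neg(x):
    """Two's-complement negation: neg x == 0 exactly when x == 0."""
    return (-x) & MASK


def check_all():
    for xs in product(range(1 << BITS), repeat=N):
        fx = [neg(x) for x in xs]
        # OR and UMAX have property 1: zero iff every element is zero.
        assert (reduce(or_, fx) == 0) == (reduce(or_, xs) == 0)
        assert (max(fx) == 0) == (max(xs) == 0)
        # UMIN has property 1': zero iff some element is zero.
        assert (min(fx) == 0) == (min(xs) == 0)
    return True
```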

The matrix of Alive2 proofs for every pair of {f,OP}:
| OP\f | zext | sext | neg | mul | shl |
|------|------|------|-----|-----|-----|
| or | [proof](https://alive2.llvm.org/ce/z/EqHAPd) | [proof](https://alive2.llvm.org/ce/z/DS3eP2) | [proof](https://alive2.llvm.org/ce/z/65A5x9) | [proof](https://alive2.llvm.org/ce/z/TVPpUf) | [proof](https://alive2.llvm.org/ce/z/kj--vH) |
| umin | [proof](https://alive2.llvm.org/ce/z/AK39LL) | [proof](https://alive2.llvm.org/ce/z/xEPH2S) | [proof](https://alive2.llvm.org/ce/z/N-ubNr) | [proof](https://alive2.llvm.org/ce/z/dgUEH4) | [proof](https://alive2.llvm.org/ce/z/2TUNDu) |
| umax | [proof](https://alive2.llvm.org/ce/z/Cy_DJS) | [proof](https://alive2.llvm.org/ce/z/f42bGQ) | [proof](https://alive2.llvm.org/ce/z/ReUx4M) | [proof](https://alive2.llvm.org/ce/z/qSsvdG) | [proof](https://alive2.llvm.org/ce/z/cE3Qgw) |
| smin | [proof](https://alive2.llvm.org/ce/z/j5TwTA) | [proof](https://alive2.llvm.org/ce/z/DhNxPQ) | - | [proof](https://alive2.llvm.org/ce/z/m03AOt) | [proof](https://alive2.llvm.org/ce/z/bp58Q3) |
| smax | [proof](https://alive2.llvm.org/ce/z/3zmbRn) | [proof](https://alive2.llvm.org/ce/z/6FTfRJ) | - | [proof](https://alive2.llvm.org/ce/z/KDfKEW) | [proof](https://alive2.llvm.org/ce/z/dajm7T) |
| add | [proof](https://alive2.llvm.org/ce/z/3kt7BB) | [proof](https://alive2.llvm.org/ce/z/cyqzQH) | - | [proof](https://alive2.llvm.org/ce/z/n_oGjT) | [proof](https://alive2.llvm.org/ce/z/67bkJm) |

Proofs for known bits:
* Leading zeros - [v4i32](https://alive2.llvm.org/ce/z/w--S2D),
[v16i8](https://alive2.llvm.org/ce/z/hEdVks)
* Leading ones - [v4i16](https://alive2.llvm.org/ce/z/RyPdBS),
[v16i8](https://alive2.llvm.org/ce/z/UTFFt9)
2026-01-25 16:47:38 +00:00
Kavin Gnanapandithan
4237e74e52
[VectorCombine] foldShuffleOfBinops - failure to track OperandValueInfo (#171934)
Resolves #170500.

Implemented a static `mergeInfo` helper to return the common
TTI::OperandValueInfo data.

Added the common OperandValueInfo `Op0Info` and `Op1Info` to the NewCost
calculation.
2026-01-23 18:04:06 +00:00
Valeriy Savchenko
48fb51b14c
[VectorCombine] Fold vector sign-bit checks (#175194)
Fold patterns that extract sign bits, reduce them, and compare against
boundary values into direct sign checks on the reduced vector.

```
icmp pred (reduce.{add,or,and,umax,umin}(lshr X, BitWidth-1)), C
    ->  icmp slt/sgt (reduce.{or,umax,and,umin}(X)), 0/-1
```

When the comparison is against 0 or MAX (1 for boolean reductions,
NumElts for add), the pattern reduces to one of four quantified
predicates:
- ∀x: x < 0 (AllNeg)
- ∀x: x ≥ 0 (AllNonNeg)
- ∃x: x < 0 (AnyNeg)
- ∃x: x ≥ 0 (AnyNonNeg)

The transform eliminates the shift and selects between
reduce.or/reduce.umax or reduce.and/reduce.umin based on cost modeling.
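
A brute-force Python model of the four predicates for the LSHR form, given as a sketch under assumed sizes (`BITS`, `N`); the names are illustrative and the model ignores cost modeling entirely.

```python
from functools import reduce
from itertools import product
from operator import and_, or_

BITS = 3  # illustrative element width
N = 3     # illustrative vector length (the MAX for the add reduction)


def to_signed(x):
    """Interpret an unsigned BITS-wide value as two's complement."""
    return x - (1 << BITS) if x >= 1 << (BITS - 1) else x


def check_all():
    for xs in product(range(1 << BITS), repeat=N):
        bits = [x >> (BITS - 1) for x in xs]        # lshr X, BITS-1
        or_x = to_signed(reduce(or_, xs))
        and_x = to_signed(reduce(and_, xs))
        assert (reduce(or_, bits) == 0) == (or_x > -1)    # AllNonNeg
        assert (reduce(or_, bits) == 1) == (or_x < 0)     # AnyNeg
        assert (reduce(and_, bits) == 1) == (and_x < 0)   # AllNeg
        assert (reduce(and_, bits) == 0) == (and_x > -1)  # AnyNonNeg
        assert (sum(bits) == 0) == (or_x > -1)            # add == 0
        assert (sum(bits) == N) == (and_x < 0)            # add == MAX
    return True
```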

## The matrix of Alive2 proofs for every pair of {reduction, comparison}:

| Reduction | == 0 | != 0 | == MAX | != MAX |
|-----------|------|------|--------|--------|
| or | [proof](https://alive2.llvm.org/ce/z/_BWxJW) | [proof](https://alive2.llvm.org/ce/z/k3EiK6) | [proof](https://alive2.llvm.org/ce/z/a8cAjp) | [proof](https://alive2.llvm.org/ce/z/ci-HMt) |
| umax | [proof](https://alive2.llvm.org/ce/z/dWt28G) | [proof](https://alive2.llvm.org/ce/z/_MqxXC) | [proof](https://alive2.llvm.org/ce/z/KQebnF) | [proof](https://alive2.llvm.org/ce/z/mixEgN) |
| and | [proof](https://alive2.llvm.org/ce/z/JgYrLj) | [proof](https://alive2.llvm.org/ce/z/FZuPLy) | [proof](https://alive2.llvm.org/ce/z/bYCa8V) | [proof](https://alive2.llvm.org/ce/z/9fsLsN) |
| umin | [proof](https://alive2.llvm.org/ce/z/YnaSL-) | [proof](https://alive2.llvm.org/ce/z/rGrgoM) | [proof](https://alive2.llvm.org/ce/z/pb-ezQ) | [proof](https://alive2.llvm.org/ce/z/JkoqEi) |
| add | [proof](https://alive2.llvm.org/ce/z/d5w5CF) | [proof](https://alive2.llvm.org/ce/z/GUgQ2Z) | [proof](https://alive2.llvm.org/ce/z/HnstY8) | [proof](https://alive2.llvm.org/ce/z/j8z_3C) |

### Other test cases

| Test | Proof |
|------|-------|
| or_slt_1 (slt 1 ≡ eq 0) | [proof](https://alive2.llvm.org/ce/z/Wdb_uN) |
| umax_sgt_0 (sgt 0 ≡ ne 0) | [proof](https://alive2.llvm.org/ce/z/nw6NZc) |
| and_slt_max (slt 1 ≡ ne 1) | [proof](https://alive2.llvm.org/ce/z/ZDMSXZ) |
| umin_sgt_max_minus_1 (sgt 0 ≡ eq 1) | [proof](https://alive2.llvm.org/ce/z/Uynf8P) |
| add_ult_max (ult 4 ≡ ne 4) | [proof](https://alive2.llvm.org/ce/z/pyDgTg) |
| add_ugt_max_minus_1 (ugt 3 ≡ eq 4) | [proof](https://alive2.llvm.org/ce/z/mHVXJk) |
| ashr_add_eq_0 (ashr instead of lshr) | [proof](https://alive2.llvm.org/ce/z/oa9Kgo) |

### or/umax and and/umin equivalence

| Check | Equivalence | Proof |
|-------|-------------|-------|
| AnyNeg | or slt 0 ≡ umax slt 0 | [proof](https://alive2.llvm.org/ce/z/Do2tNQ) |
| AllNonNeg | or sgt -1 ≡ umax sgt -1 | [proof](https://alive2.llvm.org/ce/z/N4kZ8Z) |
| AllNeg | and slt 0 ≡ umin slt 0 | [proof](https://alive2.llvm.org/ce/z/4mNpMk) |
| AnyNonNeg | and sgt -1 ≡ umin sgt -1 | [proof](https://alive2.llvm.org/ce/z/2pVnyg) |
2026-01-23 16:02:06 +00:00
Ramkumar Ramachandra
d69335bac9
[LLVM] Clean up code using [not_]equal_to (NFC) (#175824)
Use llvm::[not_]equal_to landed in d2a521750 ([ADT] Introduce
bind_{front,back}, [not_]equal_to, #175056) across LLVM for cleaner
code.
2026-01-13 21:19:39 +00:00
Pankaj Dwivedi
8246257cac
Reapply "[VectorCombine] Fold scalar selects from bitcast into vector select" (#174762)
Reapply https://github.com/llvm/llvm-project/pull/173990 with fixes for
post-commit review comments.

---------

Co-authored-by: padivedi <padivedi@amd.com>
Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2026-01-13 16:25:09 +05:30
Julian Pokrovsky
38cb7ddca2
[VectorCombine] foldPermuteOfIntrinsic - support multiple uses of shuffled ops (#175299)
Fixes #173039
2026-01-12 20:34:06 +00:00
Pankaj Dwivedi
1ab7b6655d
Revert "[VectorCombine] Fold scalar selects from bitcast into vector select" (#174758)
Reverts llvm/llvm-project#173990
Reverting to address post-commit review feedback. Will recommit with
fixes.
2026-01-07 18:59:32 +05:30
Pankaj Dwivedi
72f18a05d6
[VectorCombine] Fold scalar selects from bitcast into vector select (#173990)
2026-01-07 15:16:33 +05:30
Victor Chernyakin
c438773432
[LLVM][ADT] Migrate users of make_scope_exit to CTAD (#174030)
This is a followup to #173131, which introduced the CTAD functionality.
2026-01-02 20:42:56 -08:00
Simon Pilgrim
3b85a631df
[VectorCombine] scalarizeExtExtract - create bitmasks with APInt::getLowBitsSet to avoid UB (#174202)
If we're dealing with uint64 elements or larger, the existing `(1ull <<
SrcEltSizeInBits) - 1` mask can cause UB.
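
To make the failure mode concrete, the Python sketch below models one possible outcome of the undefined shift: hardware that masks a 64-bit shift count to 6 bits. That behaviour is an assumption for illustration only, since C++ gives no guarantee at all. `low_bits_set` is a hypothetical stand-in written in the spirit of `APInt::getLowBitsSet`.

```python
U64 = 0xFFFF_FFFF_FFFF_FFFF


def naive_mask_wrapping_shift(bits):
    """Model `(1ull << bits) - 1` if the shift count wraps modulo 64.

    In C++ a shift by 64 on a 64-bit value is undefined; this wrapping
    behaviour is just one outcome a target might produce.
    """
    return ((1 << (bits % 64)) - 1) & U64


def low_bits_set(bits):
    """Well-defined mask with the low `bits` bits set, for any width."""
    return (1 << bits) - 1
```

Both helpers agree for widths below 64, but at 64 the naive model yields an empty mask instead of all ones.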

Fixes #174046
2026-01-02 13:02:41 +00:00
Marcell Leleszi
fdc07534e7
[VectorCombine] foldShuffleOfSelects - support multiple uses of shuffled selects (#173166)
This patch removes the single-use restriction of selects in
foldShuffleOfSelects, allowing the fold to trigger for multi-use
instructions as well if the cost model finds it cheaper.

Fixes #173036
2025-12-23 13:10:12 +00:00
Dhruva Narayan K
1235409ed7
[VectorCombine] foldShuffleOfIntrinsics - support multiple uses of shuffled ops (#173183)
Fixes #173037

Remove the `m_OneUse` restriction in `foldShuffleOfIntrinsics` and
update the cost model to account for additional uses of the original intrinsics.
2025-12-22 19:00:53 +00:00
Miloš Poletanović
f60eec59fb
[VectorCombine] foldPermuteOfBinops - support multi-use binary ops and operands in shuffle folding (#173153)
Fixes #173033 

This patch extends VectorCombine to fold binary operations through
shuffles in scenarios involving multiple uses of both the binary
operator and its operands.

Previously, the transformation was restricted to single-use cases to
prevent instruction duplication. This change implements a cost-based
evaluation that allows the fold even when:
1. The binary operator has multiple users (requiring duplication of the
arithmetic instruction).
2. The operands of the binary operator (the shuffles) have multiple
users (requiring the original shuffles to be preserved).

The optimization is performed if the TTI cost of the new instruction
sequence—including any duplicated arithmetic—is lower than the cost of
the shuffle sequence it replaces. This is particularly beneficial on X86
targets for expensive cross-lane shuffles.
2025-12-22 18:12:35 +00:00
Simon Pilgrim
24d9550b27
[VectorCombine] foldShuffleOfBinops - if both operands are the same don't duplicate the total new cost (#172719)
If we're shuffling/concatenating the same operands then ensure we don't
duplicate the total cost, ensure we reuse the final shuffle and
recognise that we reduce the total instruction count (so fold even when
NewCost == OldCost, not just NewCost < OldCost).
2025-12-18 07:03:06 +00:00
Nicolai Hähnle
88bd56597c
VectorCombine: Improve the insert/extract fold in the narrowing case (#168820)
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:

1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.

There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in
llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64--
-mattr=AVX2`
at all.
2025-12-15 11:25:51 -08:00
Bala_Bhuvan_Varma
0b2fe07e6b
[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965)
This pr resolves [#170867](https://github.com/llvm/llvm-project/issues/170867)

The existing code recomputes the cost of creating a shuffle instruction even for
repeated intrinsic operand pairs. This inflates NewCost, so the cost
comparison rejects the fold.

This change skips the cost calculation for an operand pair that has
already been counted when computing NewCost, and reuses the already
created shuffle instruction for a repeated operand pair when emitting
the transformed code.
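
The deduplication idea can be sketched in a few lines of Python. `new_shuffle_cost`, the pair representation and the cost callback are all hypothetical, and the real pass queries TTI rather than a lambda.

```python
def new_shuffle_cost(operand_pairs, shuffle_cost):
    """Sum the shuffle cost once per distinct operand pair.

    `operand_pairs` is a list of hashable (op0, op1) tuples and
    `shuffle_cost` is a caller-supplied cost function (both assumed here).
    """
    seen = set()
    total = 0
    for pair in operand_pairs:
        if pair in seen:
            continue  # this pair's shuffle is reused, so it costs nothing extra
        seen.add(pair)
        total += shuffle_cost(pair)
    return total
```

With pairs `[("a", "b"), ("a", "b"), ("c", "d")]` and a flat cost of 2 per shuffle, the total is 4 rather than 6.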
2025-12-15 14:42:41 +00:00
Nicolai Hähnle
54ae1222ef
VectorCombine: Fold chains of shuffles fed by length-changing shuffles (#168819)
Such chains can arise from folding insert/extract chains.
2025-12-12 13:53:03 -08:00
Jerry Dang
23f09fd3e9
[VectorCombine] Fold permute of intrinsics into intrinsic of permutes: shuffle(intrinsic, poison/undef) -> intrinsic(shuffle) (#170052)
[VectorCombine] Fold permute of intrinsics into intrinsic of permutes

Add foldPermuteOfIntrinsic to transform:
  shuffle(intrinsic(args), poison) -> intrinsic(shuffle(args))
when the shuffle is a permute (operates on single vector) and the cost
model determines the transformation is profitable.

This optimization is particularly beneficial for subvector extractions
where we can avoid computing unused elements.

For example:
  %fma = call <8 x float> @llvm.fma.v8f32(<8 x float> %a, %b, %c)
  %result = shufflevector <8 x float> %fma, poison, <4 x i32> <0,1,2,3>
transforms to:
  %a_low = shufflevector <8 x float> %a, poison, <4 x i32> <0,1,2,3>
  %b_low = shufflevector <8 x float> %b, poison, <4 x i32> <0,1,2,3>
  %c_low = shufflevector <8 x float> %c, poison, <4 x i32> <0,1,2,3>
  %result = call <4 x float> @llvm.fma.v4f32(%a_low, %b_low, %c_low)

The transformation creates one shuffle per vector argument and calls the
intrinsic with smaller vector types, reducing computation when only a
subset of elements is needed.

The existing foldShuffleOfIntrinsics handled the blend case (two
intrinsic inputs), this adds support for the permute case (single
intrinsic input).
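
For an element-wise intrinsic the fold is value-preserving because permuting the result equals permuting each argument first. A minimal Python model, using integer lanes and a hypothetical `fma` helper to keep the arithmetic exact:

```python
def shuffle(vec, mask):
    """Single-source permute: pick lanes of `vec` according to `mask`."""
    return [vec[i] for i in mask]


def fma(a, b, c):
    """Lane-wise multiply-add standing in for an element-wise intrinsic."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def before(a, b, c, mask):
    return shuffle(fma(a, b, c), mask)


def after(a, b, c, mask):
    return fma(shuffle(a, mask), shuffle(b, mask), shuffle(c, mask))
```

Extracting the low half with mask `[0, 1, 2, 3]` means `after` only computes four lanes instead of eight.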

Fixes #170002
2025-12-05 15:54:53 +00:00
Simon Pilgrim
c2472be3fb
[VectorCombine][X86] foldShuffleOfIntrinsics - provide the arguments to a getShuffleCost call (#170465)
Ensure the arguments are passed to the getShuffleCost calls to improve
cost analysis, in particular if these are constant the costs will be
recognised as free.

Noticed while reviewing #170052
2025-12-03 18:40:48 +00:00
Julian Nagele
8280070a73
[VectorCombine] Try to scalarize vector loads feeding bitcast instructions. (#164682)
This change aims to convert vector loads to scalar loads when the loaded
value is only bitcast to a scalar afterwards anyway.

alive2 proof: https://alive2.llvm.org/ce/z/U_rvht
2025-11-12 15:35:03 +00:00
hanbeom
50ba89a22e
[VectorCombine] support mismatching extract/insert indices for foldInsExtFNeg (#126408)
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index 
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask

Previously, the above transform was only possible if the extract and insert
indices were the same; this patch makes the transform possible
even if the two indices differ.
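
One way to picture the two masks is the Python model below. It assumes source and destination vectors of equal length and builds `SrcMask` and `Mask` in the obvious way; the masks the patch actually constructs may differ, so treat this as a sketch only.

```python
POISON = None  # stand-in for a poison lane


def shuffle(a, b, mask):
    """Two-source shufflevector: indices >= len(a) select from b, -1 is poison."""
    n = len(a)
    return [POISON if m < 0 else (a[m] if m < n else b[m - n]) for m in mask]


def before(dest, src, ext_idx, ins_idx):
    """insertelt DestVec, (fneg (extractelt SrcVec, ext_idx)), ins_idx"""
    out = list(dest)
    out[ins_idx] = -src[ext_idx]
    return out


def after(dest, src, ext_idx, ins_idx):
    """shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask"""
    n = len(dest)
    negated = [-x for x in src]
    src_mask = [-1] * n
    src_mask[ins_idx] = ext_idx          # move the extracted lane into place
    moved = shuffle(negated, [POISON] * n, src_mask)
    mask = list(range(n))
    mask[ins_idx] = n + ins_idx          # take that one lane from `moved`
    return shuffle(dest, moved, mask)
```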

Proof: https://alive2.llvm.org/ce/z/aDfdyG
Fixes: https://github.com/llvm/llvm-project/issues/125675
2025-11-07 18:35:40 +00:00
Julian Nagele
28a20b4af9
[VectorCombine] Avoid inserting freeze when scalarizing extend-extract if all extracts would lead to UB on poison. (#164683)
This change aims to avoid inserting a freeze instruction between the
load and bitcast when scalarizing extend-extract. This is particularly
useful in combination with
https://github.com/llvm/llvm-project/pull/164682, which can then
potentially further scalarize, provided there is no freeze.

alive2 proof: https://alive2.llvm.org/ce/z/W-GD88
2025-11-04 12:39:04 +00:00
Hongyu Chen
87bc0f7431
[VectorCombine] Preserve cast flags in foldBitOpOfCastConstant (#161237)
Follow-up of #157822.
2025-09-30 16:38:03 +08:00
Chaitanya Koparkar
766c90f439
[VectorCombine] foldShuffleOfCastops - handle unary shuffles (#160009)
Fixes #156853.
2025-09-29 14:21:59 +01:00
Leon Clark
8df643f663
[VectorCombine] Fix rotation in phi narrowing. (#160465)
Fix bug in #140188 where incoming vectors are rotated in the wrong
direction.

Co-authored-by: Leon Clark <leoclark@amd.com>
2025-09-29 13:26:35 +01:00
Uyiosa Iyekekpolor
994a6a39e1
[VectorCombine] Fix scalarizeExtExtract for big-endian (#157962)
The scalarizeExtExtract transform assumed little-endian lane ordering,
causing miscompiles on big-endian targets such as AIX/PowerPC under -O3
-flto.

This patch updates the shift calculation to handle endianness correctly
for big-endian targets. No functional change
for little-endian targets.
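
The endianness dependence of the shift can be illustrated with a short Python sketch. `extract_lane` is a hypothetical helper, and the model assumes the vector is reinterpreted as one scalar integer in the target's byte order.

```python
def extract_lane(scalar, lane, num_lanes, elt_bits, big_endian):
    """Recover one zero-extended lane from a vector bitcast to a scalar."""
    # On big-endian targets lane 0 sits in the most significant bits.
    position = (num_lanes - 1 - lane) if big_endian else lane
    return (scalar >> (position * elt_bits)) & ((1 << elt_bits) - 1)


lanes = [0x11, 0x22, 0x33, 0x44]                   # a <4 x i8> in memory order
as_le = int.from_bytes(bytes(lanes), "little")     # scalar on little-endian
as_be = int.from_bytes(bytes(lanes), "big")        # scalar on big-endian
```

Applying the little-endian shift to `as_be` would return the lanes in reverse order, which is the kind of miscompile described above.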

Fixes #158197.

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-09-15 10:08:16 +00:00
Hongyu Chen
c62ea6598e
[VectorCombine] Add Ext and Trunc support in foldBitOpOfCastConstant (#157822)
Follow-up of https://github.com/llvm/llvm-project/pull/155216.
This patch doesn't preserve the flags. I will implement it in the
follow-up patch.
2025-09-11 17:08:47 +08:00
Hongyu Chen
75b0c89e62
[InstCombine][VectorCombine][NFC] Unify uses of lossless inverse cast (#156597)
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
optionally preserving the cast flags.
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
2025-09-08 13:30:06 +00:00
Simon Pilgrim
ad3a0ae9e1
[VectorCombine] foldSelectShuffle - early-out cases where the max vector register width isn't large enough (#157430)
Technically this could happen with vector units that can't handle all legal scalar widths, but it's good enough to use a generic crash test without a suitable target.

Fixes #157335
2025-09-08 12:04:23 +00:00
Hongyu Chen
3bdd39715a
[VectorCombine] Relax vector type constraint on bitop(bitcast, bitcast) (#157245)
Inspired by https://github.com/llvm/llvm-project/issues/157131.
This patch allows `bitop(bitcast, bitcast) -> bitcast(bitop)` for scalar
integer types.
2025-09-08 06:58:09 +00:00
Hongyu Chen
db2fc84f93
[VectorCombine] Relax vector type constraint on bitop(bitcast, constant) (#157246)
Fixes https://github.com/llvm/llvm-project/issues/157131.
This patch allows bitop(bitcast, constant) -> bitcast(bitop) for scalar
integer types.
2025-09-08 12:35:19 +08:00
XChy
cb80fa756c
[VectorCombine] Support pattern bitop(bitcast(x), C) -> bitcast(bitop(x, InvC)) (#155216)
Resolves #154797.
This patch adds the fold `bitop(bitcast(x), C) -> bitop(bitcast(x),
cast(InvC)) -> bitcast(bitop(x, InvC))`.
The helper function `getLosslessInvCast` tries to calculate the constant
`InvC`, satisfying `castop(InvC) == C`, and will try its best to keep
the poison-generating flags of the cast operation.
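
For the bitcast case specifically, the fold can be modelled in Python with a `<4 x i8>` value bitcast to `i32` and an `and` as the bitop. The helper names are assumptions and the little-endian layout is chosen arbitrarily; the real `getLosslessInvCast` handles more cast kinds than this.

```python
import struct


def bitcast_v4i8_to_i32(vec):
    """Reinterpret four bytes as one 32-bit integer (little-endian model)."""
    return struct.unpack("<I", bytes(vec))[0]


def bitcast_i32_to_v4i8(scalar):
    """Inverse bitcast, used here to compute InvC from C."""
    return list(struct.pack("<I", scalar))


def before(x, c):
    """bitop(bitcast(x), C)"""
    return bitcast_v4i8_to_i32(x) & c


def after(x, c):
    """bitcast(bitop(x, InvC)) where bitcast(InvC) == C"""
    inv_c = bitcast_i32_to_v4i8(c)
    return bitcast_v4i8_to_i32([a & b for a, b in zip(x, inv_c)])
```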
2025-09-02 23:54:12 +08:00
Sam Tebbs
37127f74f4
[LV] Bundle sub reductions into VPExpressionRecipe (#147255)
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. -> https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-09-01 17:25:01 +01:00