324 Commits

Author SHA1 Message Date
Dhruva Narayan K
1235409ed7
[VectorCombine] foldShuffleOfIntrinsics - support multiple uses of shuffled ops (#173183)
Fixes #173037

Remove the `m_OneUse` restriction in `foldShuffleOfIntrinsics` and
update the cost model to account for additional uses of the original intrinsics.
2025-12-22 19:00:53 +00:00
Miloš Poletanović
f60eec59fb
[VectorCombine] foldPermuteOfBinops - support multi-use binary ops and operands in shuffle folding (#173153)
Fixes #173033 

This patch extends VectorCombine to fold binary operations through
shuffles in scenarios involving multiple uses of both the binary
operator and its operands.

Previously, the transformation was restricted to single-use cases to
prevent instruction duplication. This change implements a cost-based
evaluation that allows the fold even when:
1. The binary operator has multiple users (requiring duplication of the
arithmetic instruction).
2. The operands of the binary operator (the shuffles) have multiple
users (requiring the original shuffles to be preserved).

The optimization is performed if the TTI cost of the new instruction
sequence—including any duplicated arithmetic—is lower than the cost of
the shuffle sequence it replaces. This is particularly beneficial on X86
targets for expensive cross-lane shuffles.
2025-12-22 18:12:35 +00:00
Simon Pilgrim
24d9550b27
[VectorCombine] foldShuffleOfBinops - if both operands are the same don't duplicate the total new cost (#172719)
If we're shuffling/concatenating the same operands then ensure we don't
duplicate the total cost, ensure we reuse the final shuffle and
recognise that we reduce the total instruction count (so fold even when
NewCost == OldCost, not just NewCost < OldCost).
2025-12-18 07:03:06 +00:00
Nicolai Hähnle
88bd56597c
VectorCombine: Improve the insert/extract fold in the narrowing case (#168820)
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:

1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.

There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in
llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64--
-mattr=AVX2`
at all.
2025-12-15 11:25:51 -08:00
Bala_Bhuvan_Varma
0b2fe07e6b
[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965)
This pr resolves [#170867](https://github.com/llvm/llvm-project/issues/170867)

Existing code recomputes the cost for creating a shuffle instruction even for the
repeating Intrinsic operand pairs. This will result in higher newCost.
Hence the runtime will decide not to fold.

The change proposed in this pr will address this issue. When calculating
the newCost we are skipping the cost calculation of an operand pair if
it was already considered. And when creating the transformed code, we
are reusing the already created shuffle instruction for repeated operand
pair.
2025-12-15 14:42:41 +00:00
Nicolai Hähnle
54ae1222ef
VectorCombine: Fold chains of shuffles fed by length-changing shuffles (#168819)
Such chains can arise from folding insert/extract chains.
2025-12-12 13:53:03 -08:00
Jerry Dang
23f09fd3e9
[VectorCombine] Fold permute of intrinsics into intrinsic of permutes: shuffle(intrinsic, poison/undef) -> intrinsic(shuffle) (#170052)
[VectorCombine] Fold permute of intrinsics into intrinsic of permutes

Add foldPermuteOfIntrinsic to transform:
  shuffle(intrinsic(args), poison) -> intrinsic(shuffle(args))
when the shuffle is a permute (operates on single vector) and the cost
model determines the transformation is profitable.

This optimization is particularly beneficial for subvector extractions
where we can avoid computing unused elements.

For example:
  %fma = call <8 x float> @llvm.fma.v8f32(<8 x float> %a, %b, %c)
  %result = shufflevector <8 x float> %fma, poison, <4 x i32> <0,1,2,3>
transforms to:
  %a_low = shufflevector <8 x float> %a, poison, <4 x i32> <0,1,2,3>
  %b_low = shufflevector <8 x float> %b, poison, <4 x i32> <0,1,2,3>
  %c_low = shufflevector <8 x float> %c, poison, <4 x i32> <0,1,2,3>
  %result = call <4 x float> @llvm.fma.v4f32(%a_low, %b_low, %c_low)

The transformation creates one shuffle per vector argument and calls the
intrinsic with smaller vector types, reducing computation when only a
subset of elements is needed.

The existing foldShuffleOfIntrinsics handled the blend case (two
intrinsic inputs), this adds support for the permute case (single
intrinsic input).

Fixes #170002
2025-12-05 15:54:53 +00:00
Simon Pilgrim
c2472be3fb
[VectorCombine][X86] foldShuffleOfIntrinsics - provide the arguments to a getShuffleCost call (#170465)
Ensure the arguments are passed to the getShuffleCost calls to improve
cost analysis, in particular if these are constant the costs will be
recognised as free

Noticed while reviewing #170052
2025-12-03 18:40:48 +00:00
Julian Nagele
8280070a73
[VectorCombine] Try to scalarize vector loads feeding bitcast instructions. (#164682)
This change aims to convert vector loads to scalar loads, if they are
only converted to scalars after anyway.

alive2 proof: https://alive2.llvm.org/ce/z/U_rvht
2025-11-12 15:35:03 +00:00
hanbeom
50ba89a22e
[VectorCombine] support mismatching extract/insert indices for foldInsExtFNeg (#126408)
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index 
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask

In previous, the above transform was only possible if the Extract/Insert
Index was the same; this patch makes the above transform possible
even if the two indexes are different.

Proof: https://alive2.llvm.org/ce/z/aDfdyG
Fixes: https://github.com/llvm/llvm-project/issues/125675
2025-11-07 18:35:40 +00:00
Julian Nagele
28a20b4af9
[VectorCombine] Avoid inserting freeze when scalarizing extend-extract if all extracts would lead to UB on poison. (#164683)
This change aims to avoid inserting a freeze instruction between the
load and bitcast when scalarizing extend-extract. This is particularly
useful in combination with
https://github.com/llvm/llvm-project/pull/164682, which can then
potentially further scalarize, provided there is no freeze.

alive2 proof: https://alive2.llvm.org/ce/z/W-GD88
2025-11-04 12:39:04 +00:00
Hongyu Chen
87bc0f7431
[VectorCombine] Preserve cast flags in foldBitOpOfCastConstant (#161237)
Follow-up of #157822.
2025-09-30 16:38:03 +08:00
Chaitanya Koparkar
766c90f439
[VectorCombine] foldShuffleOfCastops - handle unary shuffles (#160009)
Fixes #156853.
2025-09-29 14:21:59 +01:00
Leon Clark
8df643f663
[VectorCombine] Fix rotation in phi narrowing. (#160465)
Fix bug in #140188 where incoming vectors are rotated in the wrong
direction.

Co-authored-by: Leon Clark <leoclark@amd.com>
2025-09-29 13:26:35 +01:00
Uyiosa Iyekekpolor
994a6a39e1
[VectorCombine] Fix scalarizeExtExtract for big-endian (#157962)
The scalarizeExtExtract transform assumed little-endian lane ordering,
causing miscompiles on big-endian targets such as AIX/PowerPC under -O3
-flto.

This patch updates the shift calculation to handle endianness correctly
for big-endian targets. No functional change
for little-endian targets.

Fixes #158197.

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-09-15 10:08:16 +00:00
Hongyu Chen
c62ea6598e
[VectorCombine] Add Ext and Trunc support in foldBitOpOfCastConstant (#157822)
Follow-up of https://github.com/llvm/llvm-project/pull/155216.
This patch doesn't preserve the flags. I will implement it in the
follow-up patch.
2025-09-11 17:08:47 +08:00
Hongyu Chen
75b0c89e62
[InstCombine][VectorCombine][NFC] Unify uses of lossless inverse cast (#156597)
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved(optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
2025-09-08 13:30:06 +00:00
Simon Pilgrim
ad3a0ae9e1
[VectorCombine] foldSelectShuffle - early-out cases where the max vector register width isn't large enough (#157430)
Technically this could happen with vector units that can't handle all legal scalar widths - but its good enough to use a generic crash test without a suitable target

Fixes #157335
2025-09-08 12:04:23 +00:00
Hongyu Chen
3bdd39715a
[VectorCombine] Relax vector type constraint on bitop(bitcast, bitcast) (#157245)
Inspired by https://github.com/llvm/llvm-project/issues/157131.
This patch allows `bitop(bitcast, bitcast) -> bitcast(bitop)` for scalar
integer types.
2025-09-08 06:58:09 +00:00
Hongyu Chen
db2fc84f93
[VectorCombine] Relax vector type constraint on bitop(bitcast, constant) (#157246)
Fixes https://github.com/llvm/llvm-project/issues/157131.
This patch allows bitop(bitcast, constant) -> bitcast(bitop) for scalar
integer types.
2025-09-08 12:35:19 +08:00
XChy
cb80fa756c
[VectorCombine] Support pattern bitop(bitcast(x), C) -> bitcast(bitop(x, InvC)) (#155216)
Resolves #154797.
This patch adds the fold `bitop(bitcast(x), C) -> bitop(bitcast(x),
cast(InvC)) -> bitcast(bitop(x, InvC))`.
The helper function `getLosslessInvCast` tries to calculate the constant
`InvC`, satisfying `castop(InvC) == C`, and will try its best to keep
the poison-generated flags of the cast operation.
2025-09-02 23:54:12 +08:00
Sam Tebbs
37127f74f4
[LV] Bundle sub reductions into VPExpressionRecipe (#147255)
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. -> https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-09-01 17:25:01 +01:00
Yingwei Zheng
abd2dc90c0
[VectorCombine] Avoid double deletion in eraseInstruction (#155621)
Consider the following pattern:
```
C = op A B
D = op C
E = op D, C
```
As `E` is dead, we call `eraseInstruction(E)` and see if its operands
become dead. `RecursivelyDeleteTriviallyDeadInstructions(D)` also erases
`C`, which causes a UAF crash in the subsequent call
`RecursivelyDeleteTriviallyDeadInstructions(C)`.

This patch also adds deleted ops into the visit list to avoid double
deletion.

Closes https://github.com/llvm/llvm-project/issues/155543.
2025-08-27 22:28:02 +08:00
Yingwei Zheng
db6a8f1009
[VectorCombine] Avoid crash when the next node is deleted. (#155115)
`RecursivelyDeleteTriviallyDeadInstructions` is introduced by
https://github.com/llvm/llvm-project/pull/149047 to immediately drop
dead instructions. However, it may invalidate the next iterator in
`make_early_inc_range` in some edge cases, which leads to a crash. This
patch manually maintains the next iterator and updates it when the next
instruction is about to be deleted.

Closes https://github.com/llvm/llvm-project/issues/155110.
2025-08-26 00:22:53 +08:00
Rajveer Singh Bharadwaj
4ce550614b
[Post-Commit] Add missing break
https://github.com/llvm/llvm-project/pull/145232
2025-08-24 15:08:32 +05:30
Rajveer Singh Bharadwaj
93c96849c8
[VectorCombine] New folding pattern for extract/binop/shuffle chains (#145232)
Resolves #144654
Part of #143088

This adds a new `foldShuffleChainsToReduce` for horizontal reduction of
patterns like:

```llvm
define i16 @test_reduce_v8i16(<8 x i16> %a0) local_unnamed_addr #0 {
  %1 = shufflevector <8 x i16> %a0, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison>
  %2 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %a0, <8 x i16> %1)
  %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <8 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %4 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %2, <8 x i16> %3)
  %5 = shufflevector <8 x i16> %4, <8 x i16> poison, <8 x i32> <i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %6 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %4, <8 x i16> %5)
  %7 = extractelement <8 x i16> %6, i64 0
  ret i16 %7
}
```
...which can be reduced to a llvm.vector.reduce.umin.v8i16(%a0)
intrinsic call.

Similar transformation for other ops when costs permit to do so.
2025-08-24 14:21:48 +05:30
Kazu Hirata
c1bc55ee06
[Vectorize] Remove an unnecessary cast (NFC) (#155135)
getOpcode() already returns Instruction::CastOps.
2025-08-23 22:20:07 -07:00
Kyle Wang
064f02dac0
[VectorCombine] Preserve scoped alias metadata (#153714)
Right now if a load op is scalarized, the `!alias.scope` and `!noalias`
metadata are dropped. This PR is to keep them if exist.
2025-08-18 18:16:32 +00:00
David Green
790bee99de
[VectorCombine] Remove dead node immediately in VectorCombine (#149047)
The vector combiner will process all instructions as it first loops
through the function, adding any newly added and deleted instructions to
a worklist which is then processed when all nodes are done. These leaves
extra uses in the graph as the initial processing is performed, leading
to sub-optimal decisions being made for other combines. This changes it
so that trivially dead instructions are removed immediately. The main
changes that this requires is to make sure iterator invalidation does not
occur.
2025-08-18 07:55:21 +01:00
Kazu Hirata
cbf5af9668
[llvm] Remove unused includes (NFC) (#154051)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-08-17 23:46:35 -07:00
XChy
3a4a60deff
[VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop (#153069)
Fixes #153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```
And it would be folded back in `foldInsExtBinop`, resulting in an
infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant
expression.
2025-08-15 18:38:04 +00:00
zGoldthorpe
a8d25683ee
[PatternMatch] Allow m_ConstantInt to match integer splats (#153692)
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify

```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```

However, this simplification is only true if `V` is a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.

This patch ensures that the matching behaviour of `m_ConstantInt`
parallels that of `m_APInt`, and also incorporates it in some obvious
places.
2025-08-15 10:43:54 -06:00
Elvis Wang
01fac67e2a
[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342)
This patch add cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also remove all the default value in `getAddressComputationCost()`.
2025-08-14 16:01:44 +08:00
Leon Clark
e2bbd6d287
[VectorCombine][AMDGPU] Narrow Phi of Shuffles. (#140188)
Attempt to narrow a phi of shufflevector instructions where the two
incoming values have the same operands but different masks.

Related to #128938.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
2025-08-12 18:45:11 +01:00
Leon Clark
9115bef8ee
[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138)
Reopen #128938.

Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-12 14:08:37 +01:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually checks the contents of the type passed, only to see
if it's a vector or not, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
David Green
6ca6d45b29
[VectorCombine] Use hasOneUser in shuffle-to-identity fold (#152675)
We need to check that the node is part of the graph being converted, so
will not contain external uses when transformed.
2025-08-11 07:45:15 +01:00
Simon Pilgrim
88c6448fa2
Revert "[VectorCombine] Shrink loads used in shufflevector rebroadcasts" (#151960)
Reverts llvm/llvm-project#128938 while a crash regression is investigated
2025-08-04 15:03:53 +01:00
Leon Clark
1feed444aa
[VectorCombine] Shrink loads used in shufflevector rebroadcasts (#128938)
Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 10:49:27 +01:00
Nathan Gauër
67273393b1
[VectorCombine][TTI] Prevent extract/ins rewrite to GEP (#150216)
Using GEP to index into a vector is not disallowed, but not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work-in-progress, but in the meantime,
we'd like to reduce the amount of failures.

Preventing this optimizations from rewritting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and disabling a
non-recommended transformation.

Related to #145002
2025-07-31 14:14:00 +02:00
Igor Kirillov
0c91e977c0
[VectorCombine] Refine cost model and decision logic in foldSelectShuffle (#146694)
After PR #136329, shuffle indices may differ, which can cause the
existing cost-based logic to miss optimisation opportunities for
binop/shuffle sequences.

This patch improves the cost model in foldSelectShuffle to more
accurately assess costs, recognising when certain duplicate shuffles do
not require actual instructions.

Additionally, in break-even cases, this change introduces a check for
whether the pattern ultimately feeds into a vector reduction, allowing
the transform to proceed when it is likely to be profitable overall.
2025-07-25 13:28:27 +01:00
Rahul Yadav
04e5e643f5
[VectorCombine] Generalize foldBitOpOfBitcasts to support more cast operations (#148350)
This patch generalizes the existing foldBitOpOfBitcasts optimization in the VectorCombine pass to handle additional cast operations beyond just bitcast.

  Fixes: [#146037](https://github.com/llvm/llvm-project/issues/146037)

  Summary

The optimization now supports folding bitwise operations (AND/OR/XOR)
with the following cast operations:
  - bitcast (original functionality)
  - trunc (truncate)
  - sext (sign extend)
  - zext (zero extend)

  The transformation pattern is:
  bitop(castop(x), castop(y)) -> castop(bitop(x, y))

This reduces the number of cast instructions from 2 to 1, improving
performance on targets where cast operations
are expensive or where performing bitwise operations on narrower types
is beneficial.
  
  Implementation Details

- Renamed foldBitOpOfBitcasts to foldBitOpOfCastops to reflect broader
functionality
  - Extended pattern matching to handle any CastInst operation
- Added validation for each cast type's constraints (e.g., trunc
requires source > dest)
  - Updated cost model to use the actual cast opcode
  - Preserves IR flags from original instructions
  - Handles multi-use scenarios appropriately

  Testing

- Added comprehensive tests in
test/Transforms/VectorCombine/bitop-of-castops.ll
  - Tests cover all supported cast types with all bitwise operations
  - Includes negative tests for unsupported patterns
  - All existing VectorCombine tests pass
2025-07-21 17:14:56 +01:00
Florian Hahn
1113224f94
[VectorCombine] Account for IRBuilder simplification in translateExt.
After https://github.com/llvm/llvm-project/pull/146350,
CreateExtractElement may return a folded value and not create an
ExtractElement instruction.

Replace cast with dyn_cast. Note that the function returns nullptr
already earlier if the extract may be constant folded.

Fixes https://github.com/llvm/llvm-project/issues/147218
2025-07-07 13:39:39 +02:00
Florian Hahn
4acdb8e14e
[VectorCombine] Scalarize extracts of ZExt if profitable. (#142976)
Add a new scalarization transform that tries to convert extracts of a
vector ZExt to a set of scalar shift and mask operations. This can be
profitable if the cost of extracting is the same or higher than the cost
of 2 scalar ops. This is the case on AArch64 for example.

For AArch64,this shows up in a number of workloads, including av1aom,
gmsh, minizinc and astc-encoder.

PR: https://github.com/llvm/llvm-project/pull/142976
2025-07-03 08:49:32 +01:00
Luke Lau
7931a8f102
[VectorCombine] Scalarize vector intrinsics with scalar arguments (#146530)
Some intrinsics like llvm.abs or llvm.powi have a scalar argument even
when the overloaded type is a vector.
This patch handles these in scalarizeOpOrCmp to allow scalarizing them.

In the test the leftover vector powi isn't folded away to poison, this
should be fixed in a separate patch.
2025-07-02 16:48:53 +01:00
Florian Hahn
829f2f2448
[VectorCombine] Mark function as changed if shuffle is created.
777d6b5de90b7e0 exposed a code path where a function is modified but not
marked accordingly. Make sure we return true from foldShuffleFromReductions
if only a shuffle has been inserted/replaced.

Should fix    https://lab.llvm.org/buildbot/#/builders/187/builds/7578.
2025-07-01 21:38:29 +01:00
Florian Hahn
777d6b5de9
[VectorCombine] Use InstSimplifyFolder to simplify instrs on creation. (#146350)
Update VectorCombine to use InstSimplifyFolder to simplify redundant
instructions on creation.

PR: https://github.com/llvm/llvm-project/pull/146350
2025-07-01 20:55:51 +01:00
Narayan
d05634d5cd
[VectorCombine] Fold bitwise operations of bitcasts into bitcast of bitwise operation (#137322)
Currently, LLVM fails to convert certain pblendvb intrinsics into select
instructions when the blend mask is derived from complex boolean logic
operations. This occurs even when the mask is ultimately based on
sign-extended comparison results, preventing further optimization
opportunities.

Fixes #66513

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-06-26 14:57:22 +01:00
Kazu Hirata
2a35414e98
[Transforms] Use range-based for loops (NFC) (#145252)
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-25 10:08:26 -07:00
Simon Pilgrim
26390f22b8
[VectorCombine] foldShuffleOfShuffles - fold shuffle(shuffle(x,y),poison) length changing masks (#144690)
The shuffle merging code assumes that the shuffle sources are all the
same type, which fails if we've changed length and don't have 2 inner
shuffles. We already handle length-changing shuffles if we do have 2
inner shuffles.

This patch creates a fake "all poison" shuffle mask and reuses the other
shuffle's sources, which can be safely used with the existing merge
code.

The alternative was a considerable refactor of the merge code to account
for different vector widths......

Fixes #144656
2025-06-22 13:30:45 +01:00