297 Commits

Author SHA1 Message Date
Kyle Wang
064f02dac0
[VectorCombine] Preserve scoped alias metadata (#153714)
Right now if a load op is scalarized, the `!alias.scope` and `!noalias`
metadata are dropped. This PR is to keep them if exist.
2025-08-18 18:16:32 +00:00
David Green
790bee99de
[VectorCombine] Remove dead node immediately in VectorCombine (#149047)
The vector combiner will process all instructions as it first loops
through the function, adding any newly added and deleted instructions to
a worklist which is then processed when all nodes are done. These leaves
extra uses in the graph as the initial processing is performed, leading
to sub-optimal decisions being made for other combines. This changes it
so that trivially dead instructions are removed immediately. The main
changes that this requires is to make sure iterator invalidation does not
occur.
2025-08-18 07:55:21 +01:00
Kazu Hirata
cbf5af9668
[llvm] Remove unused includes (NFC) (#154051)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-08-17 23:46:35 -07:00
XChy
3a4a60deff
[VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop (#153069)
Fixes #153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```
And it would be folded back in `foldInsExtBinop`, resulting in an
infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant
expression.
2025-08-15 18:38:04 +00:00
zGoldthorpe
a8d25683ee
[PatternMatch] Allow m_ConstantInt to match integer splats (#153692)
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify

```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```

However, this simplification is only true if `V` is a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.

This patch ensures that the matching behaviour of `m_ConstantInt`
parallels that of `m_APInt`, and also incorporates it in some obvious
places.
2025-08-15 10:43:54 -06:00
Elvis Wang
01fac67e2a
[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342)
This patch add cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also remove all the default value in `getAddressComputationCost()`.
2025-08-14 16:01:44 +08:00
Leon Clark
e2bbd6d287
[VectorCombine][AMDGPU] Narrow Phi of Shuffles. (#140188)
Attempt to narrow a phi of shufflevector instructions where the two
incoming values have the same operands but different masks.

Related to #128938.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
2025-08-12 18:45:11 +01:00
Leon Clark
9115bef8ee
[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138)
Reopen #128938.

Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-12 14:08:37 +01:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually checks the contents of the type passed, only to see
if it's a vector or not, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
David Green
6ca6d45b29
[VectorCombine] Use hasOneUser in shuffle-to-identity fold (#152675)
We need to check that the node is part of the graph being converted, so
will not contain external uses when transformed.
2025-08-11 07:45:15 +01:00
Simon Pilgrim
88c6448fa2
Revert "[VectorCombine] Shrink loads used in shufflevector rebroadcasts" (#151960)
Reverts llvm/llvm-project#128938 while a crash regression is investigated
2025-08-04 15:03:53 +01:00
Leon Clark
1feed444aa
[VectorCombine] Shrink loads used in shufflevector rebroadcasts (#128938)
Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 10:49:27 +01:00
Nathan Gauër
67273393b1
[VectorCombine][TTI] Prevent extract/ins rewrite to GEP (#150216)
Using GEP to index into a vector is not disallowed, but not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work-in-progress, but in the meantime,
we'd like to reduce the amount of failures.

Preventing this optimizations from rewritting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and disabling a
non-recommended transformation.

Related to #145002
2025-07-31 14:14:00 +02:00
Igor Kirillov
0c91e977c0
[VectorCombine] Refine cost model and decision logic in foldSelectShuffle (#146694)
After PR #136329, shuffle indices may differ, which can cause the
existing cost-based logic to miss optimisation opportunities for
binop/shuffle sequences.

This patch improves the cost model in foldSelectShuffle to more
accurately assess costs, recognising when certain duplicate shuffles do
not require actual instructions.

Additionally, in break-even cases, this change introduces a check for
whether the pattern ultimately feeds into a vector reduction, allowing
the transform to proceed when it is likely to be profitable overall.
2025-07-25 13:28:27 +01:00
Rahul Yadav
04e5e643f5
[VectorCombine] Generalize foldBitOpOfBitcasts to support more cast operations (#148350)
This patch generalizes the existing foldBitOpOfBitcasts optimization in the VectorCombine pass to handle additional cast operations beyond just bitcast.

  Fixes: [#146037](https://github.com/llvm/llvm-project/issues/146037)

  Summary

The optimization now supports folding bitwise operations (AND/OR/XOR)
with the following cast operations:
  - bitcast (original functionality)
  - trunc (truncate)
  - sext (sign extend)
  - zext (zero extend)

  The transformation pattern is:
  bitop(castop(x), castop(y)) -> castop(bitop(x, y))

This reduces the number of cast instructions from 2 to 1, improving
performance on targets where cast operations
are expensive or where performing bitwise operations on narrower types
is beneficial.
  
  Implementation Details

- Renamed foldBitOpOfBitcasts to foldBitOpOfCastops to reflect broader
functionality
  - Extended pattern matching to handle any CastInst operation
- Added validation for each cast type's constraints (e.g., trunc
requires source > dest)
  - Updated cost model to use the actual cast opcode
  - Preserves IR flags from original instructions
  - Handles multi-use scenarios appropriately

  Testing

- Added comprehensive tests in
test/Transforms/VectorCombine/bitop-of-castops.ll
  - Tests cover all supported cast types with all bitwise operations
  - Includes negative tests for unsupported patterns
  - All existing VectorCombine tests pass
2025-07-21 17:14:56 +01:00
Florian Hahn
1113224f94
[VectorCombine] Account for IRBuilder simplification in translateExt.
After https://github.com/llvm/llvm-project/pull/146350,
CreateExtractElement may return a folded value and not create an
ExtractElement instruction.

Replace cast with dyn_cast. Note that the function returns nullptr
already earlier if the extract may be constant folded.

Fixes https://github.com/llvm/llvm-project/issues/147218
2025-07-07 13:39:39 +02:00
Florian Hahn
4acdb8e14e
[VectorCombine] Scalarize extracts of ZExt if profitable. (#142976)
Add a new scalarization transform that tries to convert extracts of a
vector ZExt to a set of scalar shift and mask operations. This can be
profitable if the cost of extracting is the same or higher than the cost
of 2 scalar ops. This is the case on AArch64 for example.

For AArch64,this shows up in a number of workloads, including av1aom,
gmsh, minizinc and astc-encoder.

PR: https://github.com/llvm/llvm-project/pull/142976
2025-07-03 08:49:32 +01:00
Luke Lau
7931a8f102
[VectorCombine] Scalarize vector intrinsics with scalar arguments (#146530)
Some intrinsics like llvm.abs or llvm.powi have a scalar argument even
when the overloaded type is a vector.
This patch handles these in scalarizeOpOrCmp to allow scalarizing them.

In the test the leftover vector powi isn't folded away to poison, this
should be fixed in a separate patch.
2025-07-02 16:48:53 +01:00
Florian Hahn
829f2f2448
[VectorCombine] Mark function as changed if shuffle is created.
777d6b5de90b7e0 exposed a code path where a function is modified but not
marked accordingly. Make sure we return true from foldShuffleFromReductions
if only a shuffle has been inserted/replaced.

Should fix    https://lab.llvm.org/buildbot/#/builders/187/builds/7578.
2025-07-01 21:38:29 +01:00
Florian Hahn
777d6b5de9
[VectorCombine] Use InstSimplifyFolder to simplify instrs on creation. (#146350)
Update VectorCombine to use InstSimplifyFolder to simplify redundant
instructions on creation.

PR: https://github.com/llvm/llvm-project/pull/146350
2025-07-01 20:55:51 +01:00
Narayan
d05634d5cd
[VectorCombine] Fold bitwise operations of bitcasts into bitcast of bitwise operation (#137322)
Currently, LLVM fails to convert certain pblendvb intrinsics into select
instructions when the blend mask is derived from complex boolean logic
operations. This occurs even when the mask is ultimately based on
sign-extended comparison results, preventing further optimization
opportunities.

Fixes #66513

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-06-26 14:57:22 +01:00
Kazu Hirata
2a35414e98
[Transforms] Use range-based for loops (NFC) (#145252)
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-25 10:08:26 -07:00
Simon Pilgrim
26390f22b8
[VectorCombine] foldShuffleOfShuffles - fold shuffle(shuffle(x,y),poison) length changing masks (#144690)
The shuffle merging code assumes that the shuffle sources are all the
same type, which fails if we've changed length and don't have 2 inner
shuffles. We already handle length-changing shuffles if we do have 2
inner shuffles.

This patch creates a fake "all poison" shuffle mask and reuses the other
shuffle's sources, which can be safely used with the existing merge
code.

The alternative was a considerable refactor of the merge code to account
for different vector widths......

Fixes #144656
2025-06-22 13:30:45 +01:00
David Green
77941eba7f
[CostModel] Add a DstTy to getShuffleCost (#141634)
A shuffle will take two input vectors and a mask, to produce a new
vector of size <MaskElts x SrcEltTy>. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations notably in the SLP
vectorizer but also in the generic cost routines where assumption about
how vectors will be legalized are built into the generic cost routines -
for example whether they will widen or promote, with the cost modelling
assuming they will widen but the default lowering to promote for integer
vectors.

This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise it tries to be fairly non-functional, keeping the
SrcTy used as the primary type used in shuffle cost routines, only using
DstTy where it was in the past (for InsertSubVector for example).

Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base to build more
shuffles that alter size.
2025-06-21 12:29:29 +01:00
Luke Lau
2e7489c8c8 [VectorCombine] Fix build on gcc-7.5
Hopefully this fixes the build failure at
https://lab.llvm.org/buildbot/#/builders/116/builds/13423. gcc-14
seems to be able to deduce the type and compile this fine, but for
gcc-7 we need to avoid the Use/Value mismatch I guess.
2025-05-28 10:55:38 +01:00
Luke Lau
2b9ded64b0
[VectorCombine] Support nary operands and intrinsics in scalarizeOpOrCmp (#138406)
This adds support for unary operands, and unary + ternary intrinsics in
scalarizeOpOrCmp (FKA scalarizeBinOpOrCmp).

The motivation behind this is to scalarize more intrinsics in
VectorCombine rather than in DAGCombine, so we can sink splats across
basic blocks: see https://github.com/llvm/llvm-project/pull/137786

The main change required is to generalize the existing VecC0/VecC1 rules
across n-ary ops:

- An operand can either be a constant vector or an insert of a scalar
into a constant vector
- If it's an insert, the index needs to be static and in bounds
- If it's an insert, all indices need to be the same across all operands
- If all the operands are constant vectors, bail as it will get constant
folded anyway
2025-05-28 09:45:54 +01:00
Luke Lau
97f6076ded
[VectorCombine][X86] Use updated getVectorInstrCost hook (#137823)
This addresses a TODO where previously scalarizeBinopOrCmp
conservatively bailed if one of the operands was a load.

getVectorInstrCost was updated to take in values in
https://reviews.llvm.org/D140498 so we can pass in the scalar value to
be inserted, which should return an accurate cost for a gather.

To prevent regressions on x86 this tries to constant fold NewVecC up
front so we can pass it into TTI and get a more accurate cost.

We want to remove this restriction on RISC-V since this is always
profitable whether or not the scalar is a load.
2025-05-27 16:27:28 +01:00
Luke Lau
d827588c36
[VectorCombine] Scalarize binop-like intrinsics (#138095)
Currently VectorCombine can scalarize vector compares and binary ops.
This extends it to also scalarize binary-op like intrinsics like umax,
minnum etc.

The motivation behind this is to scalarize more intrinsics in
VectorCombine rather than in DAGCombine, so we can sink splats across
basic blocks: see #137786

This currently has very little effect on generated code because
InstCombine doesn't yet canonicalize binary intrinsics where one operand
is a constant into the form that VectorCombine expects, i.e. `binop
(shuffle insert) const --> shuffle (binop insert const)`. The plan is to
land this first and then in a subsequent patch teach InstCombine to do
the canonicalization to avoid regressions in the meantime.

This uses `isTriviallyVectorizable` to determine whether or not an
intrinsic is safe to scalarize. There's also `isTriviallyScalarizable`,
but this seems more geared towards the Scalarizer pass and includes
intrinsics with multiple return values.

It also only handles intrinsics with two operands with the same type as
the return type. In the future we would generalize this to handle
arbitrary numbers of operands, including unary operators too, e.g. fneg
or fma, as well as different operand types, e.g. powi or scmp
2025-05-21 09:24:11 +01:00
David Green
fff12fbdb9
[VectorCombine] Fix the type used in foldShuffleOfIntrinsics Cost. (#138419)
The shuffle needn't be twice the original number of vector elements, so
the intermediate type used between the shuffle and the intrinsic should
use the ShuffleDstTy number of elements.

I found this when looking at shuffle costs and do not have test where it
alters the output, but have added some cases where the shuffle output is
not twice the size of the input.
2025-05-09 08:55:36 +01:00
Kazu Hirata
aa613777af
[llvm] Remove redundant control flow (NFC) (#138304) 2025-05-02 10:34:25 -07:00
Simon Pilgrim
f572a5951a [VectorCombine] Ensure canScalarizeAccess handles cases where the index type can't represent all inbounds values
Fixes #132563
2025-04-24 14:17:55 +01:00
calebwat
f989db5745
[NFC] Use cast instead of dyn_cast for Src and Dst vec types in VecCombine folding (#134432)
SrcVecTy and DstVecTy are used without a null check, and originate from
a dyn_cast. This patch adjusts this to use a fixed cast, since it is not
checked for null before use otherwise, but is semantically guaranteed
from previous checks.
2025-04-10 15:21:37 +01:00
Florian Hahn
13799998c0
[EquivalenceClasses] Use DenseMap instead of std::set. (NFC) (#134264)
Replace the std::set with DenseMap, which removes the requirement for an
ordering predicate. This also requires to allocate the ECValue objects
separately. This patch uses a BumpPtrAllocator.

Follow-up to https://github.com/llvm/llvm-project/pull/134075.

Compile-time impact is mostly neutral or slightly positive:

https://llvm-compile-time-tracker.com/compare.php?from=ee4e8197fa67dd1ed6e9470e00708e7feeaacd97&to=242e6a8e42889eebfc0bb5d433a4de7dd9e224a7&stat=instructions:u
2025-04-05 12:24:39 +01:00
Nikita Popov
0738f70615
[Intrinsics] Add Intrinsic::getFnAttributes() (NFC) (#132029)
Most places that call Intrinsic::getAttributes() are only interested in
the function attributes, so add a separate function for that.

The motivation for this is that I'd like to add the ability to specify
range attributes on intrinsics, which requires knowing the function
type. This avoids needing to know the type for most attribute queries.
2025-03-20 09:20:39 +01:00
hanbeom
0ee8f69978
[VectorCombine] Fix invalid shuffle cost argument of foldShuffleOfSelects (#130281)
In the previous code (#128032), it specified the destination vector as the
getShuffleCost argument. Because the shuffle mask specifies the indices
of the two vectors specified as elements, the maximum value is twice the
size of the source vector. This causes a problem if the destination
vector is smaller than the source vector and specify an index in the
mask that exceeds the size of the destination vector.

Fix the problem by correcting the previous code, which was using wrong
argument in the Cost calculation.

Fixes #130250
2025-03-07 16:40:26 +00:00
hanbeom
5d1029b4a8
[VectorCombine] Handle shuffle of selects (#128032)
(shuffle(select(c1,t1,f1)), (select(c2,t2,f2)), m)
-> (select (shuffle c1,c2,m), (shuffle t1,t2,m), (shuffle f1,f2,m))

The behaviour of SelectInst on vectors is the same as for
`V'select[i] = Condition[i] ? V'True[i] : V'False[i]`.

If a ShuffleVector is performed on two selects, it will be like:
`V'[mask] = (V'select[i] = Condition[i] ? V'True[i] : V'False[i])`

That's why a ShuffleVector with two SelectInst is equivalent to
first ShuffleVector Condition/True/False and then SelectInst that
result.

This patch implements the transforming described above.

Proof: https://alive2.llvm.org/ce/z/97wfHp
Fixes #120775
2025-03-06 12:43:47 +00:00
Simon Pilgrim
5ddf40fa78
[VectorCombine] scalarizeLoadExtract - don't create scalar loads if any extract is waiting to be erased (#129375)
If any extract is waiting to be erased, then bail out as this will distort the cost calculation and possibly lead to infinite loops.

Fixes #129373
2025-03-01 16:54:22 +00:00
Simon Pilgrim
8f4d2e02be [VectorCombine] scalarizeLoadExtract - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2025-03-01 09:57:08 +00:00
Mikhail Gudim
f5d153ef26
[VectorCombine] Fold binary op of reductions. (#121567)
Replace binary of of two reductions with one reduction of the binary op
applied to vectors. For example:

```
%v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0)
%v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1)
%res = add i32 %v0_red, %v1_red
```
gets transformed to:

```
%1 = add <16 x i32> %v0, %v1
%res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1)
```
2025-02-22 06:11:33 -05:00
Simon Pilgrim
2fab6db728
[VectorCombine] foldSelectShuffle - remove extra adds of old shuffles to worklist (#127999)
We already push the old shuffles to the worklist as part of the replaceValue calls, so we shouldn't need to add them to the deferred list as well - my guess is this was to ensure that the instructions got erased first to help cleanup unused instructions, but eraseInstruction should handle this now.
2025-02-20 18:02:34 +00:00
Simon Pilgrim
eb2b453eb7 [VectorCombine] foldInsExtVectorToShuffle - ensure we call getShuffleCost with the input operand type, not the result
Typo in #121216

Fixes #126085
2025-02-06 17:41:24 +00:00
hanbeom
8c1dbac304
[VectorCombine] Allow shuffling between vectors the same type but different element sizes (#121216)
`foldInsExtVectorToShuffle` function combines the extract/insert of a vector into a vector through a shuffle. However, we only supported coupling between vectors of the same size.

This commit allows combining extract/insert for vectors of the same type but with different sizes by converting the length of the vectors.

Proof: https://alive2.llvm.org/ce/z/ELNLr7
Fixed https://github.com/llvm/llvm-project/issues/120772
2025-02-06 10:38:50 +00:00
Min-Yih Hsu
635ab515d5
[VectorCombine] Fold vector.interleave2 with two constant splats (#125144)
If we're interleaving 2 constant splats, for instance `<vscale x 8 x
i32> <splat of 666>` and `<vscale x 8 x i32> <splat of 777>`, we can
create a larger splat `<vscale x 8 x i64> <splat of ((777 << 32) |
666)>` first before casting it back into `<vscale x 16 x i32>`.
2025-02-03 19:05:49 -08:00
Simon Pilgrim
6f6d8084ad
[VectorCombine] Fold insert(binop(x,y),binop(a,b),idx) --> binop(insert(x,a,idx),insert(y,b,idx)) (#124909)
Add foldInsExtBinop fold to cleanup missed vectorization cases which can happen on targets with cheap insert/extract instructions which prevent foldExtractExtract (binop(extract(x),extract(y)) -> extract(binop(x,shuffle(y)))) from helping with the merge.
2025-01-31 09:41:58 +00:00
Simon Pilgrim
87750c9de4 [VectorCombine] foldPermuteOfBinops - match identity shuffles only if they match the destination type
Fixes regression identified after #122118
2025-01-14 16:09:50 +00:00
Simon Pilgrim
6a9e9878a2 [VectorCombine] foldPermuteOfBinops - ensure potential identity mask isn't length changing. 2025-01-14 12:17:21 +00:00
Ramkumar Ramachandra
e409204a89
VectorCombine: teach foldExtractedCmps about samesign (#122883)
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.
2025-01-14 12:04:14 +00:00
Simon Pilgrim
0bf1591d01
[VectorCombine] foldPermuteOfBinops - fold "shuffle (binop (shuffle, other)), undef" --> "binop (shuffle), (shuffle)". (#122118)
foldPermuteOfBinops currently requires both binop operands to be oneuse shuffles to fold the shuffles across the binop, but there will be cases where its still profitable to fold across the binop with only one foldable shuffle.
2025-01-14 10:43:22 +00:00
David Green
676c641718
[VectorCombine] Use getInstructionCost to cost Shuffle. (#122068)
This allows it to produce a more accurate cost for the shuffle, using
the more accurate calls to getShuffleCost in getInstructionCost. It
helps fix some of the regressions from vector combine a little while
ago, now that we have better subvector extract costs.
2025-01-08 20:48:40 +00:00
Simon Pilgrim
a5e129ccde [CostModel][X86] getVectorInstrCost - correctly cost v4f32 insertelement into index 0
This is just the MOVSS instruction (SSE41 INSERTPS is still necessary for index != 0)

This exposed an issue in VectorCombine::foldInsExtFNeg - we need to use the more general SK_PermuteTwoSrc shuffle kind to allow getShuffleCost to match other shuffle kinds (not just SK_Select).
2025-01-07 12:23:45 +00:00