268 Commits

Author SHA1 Message Date
Kazu Hirata
aa613777af
[llvm] Remove redundant control flow (NFC) (#138304) 2025-05-02 10:34:25 -07:00
Simon Pilgrim
f572a5951a [VectorCombine] Ensure canScalarizeAccess handles cases where the index type can't represent all inbounds values
Fixes #132563
2025-04-24 14:17:55 +01:00
calebwat
f989db5745
[NFC] Use cast instead of dyn_cast for Src and Dst vec types in VecCombine folding (#134432)
SrcVecTy and DstVecTy are used without a null check, and originate from
a dyn_cast. This patch adjusts this to use a fixed cast, since it is not
checked for null before use otherwise, but is semantically guaranteed
from previous checks.
2025-04-10 15:21:37 +01:00
Florian Hahn
13799998c0
[EquivalenceClasses] Use DenseMap instead of std::set. (NFC) (#134264)
Replace the std::set with DenseMap, which removes the requirement for an
ordering predicate. This also requires to allocate the ECValue objects
separately. This patch uses a BumpPtrAllocator.

Follow-up to https://github.com/llvm/llvm-project/pull/134075.

Compile-time impact is mostly neutral or slightly positive:

https://llvm-compile-time-tracker.com/compare.php?from=ee4e8197fa67dd1ed6e9470e00708e7feeaacd97&to=242e6a8e42889eebfc0bb5d433a4de7dd9e224a7&stat=instructions:u
2025-04-05 12:24:39 +01:00
Nikita Popov
0738f70615
[Intrinsics] Add Intrinsic::getFnAttributes() (NFC) (#132029)
Most places that call Intrinsic::getAttributes() are only interested in
the function attributes, so add a separate function for that.

The motivation for this is that I'd like to add the ability to specify
range attributes on intrinsics, which requires knowing the function
type. This avoids needing to know the type for most attribute queries.
2025-03-20 09:20:39 +01:00
hanbeom
0ee8f69978
[VectorCombine] Fix invalid shuffle cost argument of foldShuffleOfSelects (#130281)
In the previous code (#128032), it specified the destination vector as the
getShuffleCost argument. Because the shuffle mask specifies the indices
of the two vectors specified as elements, the maximum value is twice the
size of the source vector. This causes a problem if the destination
vector is smaller than the source vector and specify an index in the
mask that exceeds the size of the destination vector.

Fix the problem by correcting the previous code, which was using wrong
argument in the Cost calculation.

Fixes #130250
2025-03-07 16:40:26 +00:00
hanbeom
5d1029b4a8
[VectorCombine] Handle shuffle of selects (#128032)
(shuffle(select(c1,t1,f1)), (select(c2,t2,f2)), m)
-> (select (shuffle c1,c2,m), (shuffle t1,t2,m), (shuffle f1,f2,m))

The behaviour of SelectInst on vectors is the same as for
`V'select[i] = Condition[i] ? V'True[i] : V'False[i]`.

If a ShuffleVector is performed on two selects, it will be like:
`V'[mask] = (V'select[i] = Condition[i] ? V'True[i] : V'False[i])`

That's why a ShuffleVector with two SelectInst is equivalent to
first ShuffleVector Condition/True/False and then SelectInst that
result.

This patch implements the transforming described above.

Proof: https://alive2.llvm.org/ce/z/97wfHp
Fixes #120775
2025-03-06 12:43:47 +00:00
Simon Pilgrim
5ddf40fa78
[VectorCombine] scalarizeLoadExtract - don't create scalar loads if any extract is waiting to be erased (#129375)
If any extract is waiting to be erased, then bail out as this will distort the cost calculation and possibly lead to infinite loops.

Fixes #129373
2025-03-01 16:54:22 +00:00
Simon Pilgrim
8f4d2e02be [VectorCombine] scalarizeLoadExtract - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2025-03-01 09:57:08 +00:00
Mikhail Gudim
f5d153ef26
[VectorCombine] Fold binary op of reductions. (#121567)
Replace binary of of two reductions with one reduction of the binary op
applied to vectors. For example:

```
%v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0)
%v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1)
%res = add i32 %v0_red, %v1_red
```
gets transformed to:

```
%1 = add <16 x i32> %v0, %v1
%res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1)
```
2025-02-22 06:11:33 -05:00
Simon Pilgrim
2fab6db728
[VectorCombine] foldSelectShuffle - remove extra adds of old shuffles to worklist (#127999)
We already push the old shuffles to the worklist as part of the replaceValue calls, so we shouldn't need to add them to the deferred list as well - my guess is this was to ensure that the instructions got erased first to help cleanup unused instructions, but eraseInstruction should handle this now.
2025-02-20 18:02:34 +00:00
Simon Pilgrim
eb2b453eb7 [VectorCombine] foldInsExtVectorToShuffle - ensure we call getShuffleCost with the input operand type, not the result
Typo in #121216

Fixes #126085
2025-02-06 17:41:24 +00:00
hanbeom
8c1dbac304
[VectorCombine] Allow shuffling between vectors the same type but different element sizes (#121216)
`foldInsExtVectorToShuffle` function combines the extract/insert of a vector into a vector through a shuffle. However, we only supported coupling between vectors of the same size.

This commit allows combining extract/insert for vectors of the same type but with different sizes by converting the length of the vectors.

Proof: https://alive2.llvm.org/ce/z/ELNLr7
Fixed https://github.com/llvm/llvm-project/issues/120772
2025-02-06 10:38:50 +00:00
Min-Yih Hsu
635ab515d5
[VectorCombine] Fold vector.interleave2 with two constant splats (#125144)
If we're interleaving 2 constant splats, for instance `<vscale x 8 x
i32> <splat of 666>` and `<vscale x 8 x i32> <splat of 777>`, we can
create a larger splat `<vscale x 8 x i64> <splat of ((777 << 32) |
666)>` first before casting it back into `<vscale x 16 x i32>`.
2025-02-03 19:05:49 -08:00
Simon Pilgrim
6f6d8084ad
[VectorCombine] Fold insert(binop(x,y),binop(a,b),idx) --> binop(insert(x,a,idx),insert(y,b,idx)) (#124909)
Add foldInsExtBinop fold to cleanup missed vectorization cases which can happen on targets with cheap insert/extract instructions which prevent foldExtractExtract (binop(extract(x),extract(y)) -> extract(binop(x,shuffle(y)))) from helping with the merge.
2025-01-31 09:41:58 +00:00
Simon Pilgrim
87750c9de4 [VectorCombine] foldPermuteOfBinops - match identity shuffles only if they match the destination type
Fixes regression identified after #122118
2025-01-14 16:09:50 +00:00
Simon Pilgrim
6a9e9878a2 [VectorCombine] foldPermuteOfBinops - ensure potential identity mask isn't length changing. 2025-01-14 12:17:21 +00:00
Ramkumar Ramachandra
e409204a89
VectorCombine: teach foldExtractedCmps about samesign (#122883)
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.
2025-01-14 12:04:14 +00:00
Simon Pilgrim
0bf1591d01
[VectorCombine] foldPermuteOfBinops - fold "shuffle (binop (shuffle, other)), undef" --> "binop (shuffle), (shuffle)". (#122118)
foldPermuteOfBinops currently requires both binop operands to be oneuse shuffles to fold the shuffles across the binop, but there will be cases where its still profitable to fold across the binop with only one foldable shuffle.
2025-01-14 10:43:22 +00:00
David Green
676c641718
[VectorCombine] Use getInstructionCost to cost Shuffle. (#122068)
This allows it to produce a more accurate cost for the shuffle, using
the more accurate calls to getShuffleCost in getInstructionCost. It
helps fix some of the regressions from vector combine a little while
ago, now that we have better subvector extract costs.
2025-01-08 20:48:40 +00:00
Simon Pilgrim
a5e129ccde [CostModel][X86] getVectorInstrCost - correctly cost v4f32 insertelement into index 0
This is just the MOVSS instruction (SSE41 INSERTPS is still necessary for index != 0)

This exposed an issue in VectorCombine::foldInsExtFNeg - we need to use the more general SK_PermuteTwoSrc shuffle kind to allow getShuffleCost to match other shuffle kinds (not just SK_Select).
2025-01-07 12:23:45 +00:00
Simon Pilgrim
d993b11b86 [VectorCombine] Remove superfluous whitespace from debug log comment. NFC. 2025-01-06 15:37:15 +00:00
Simon Pilgrim
054e7c5971 [VectorCombine] foldInsExtVectorToShuffle - ignore shuffle costs for 'identity' insertion masks
<u,1,u,u> 'inplace' single src shuffles can be treated as free identity shuffles - ignore any shuffle cost (similar to what we already do in other folds like foldShuffleOfShuffles) - eventually getShuffleCost should just return TCC_Free in these cases but in a lot of the targets' shuffle cost logic this currently ends up treated as a generic SK_PermuteSingleSrc.

We still want to generate the shuffle as it will help further shuffle folds with the additional PoisonMaskElem 'undemanded' elements.
2025-01-05 13:02:31 +00:00
Simon Pilgrim
e3ec5a7286
[VectorCombine] foldShuffleOfBinops - fold shuffle(binop(shuffle(x),shuffle(z)),binop(shuffle(y),shuffle(w)) -> binop(shuffle(x,z),shuffle(y,w)) (#120984)
Some patterns (in particular horizontal style patterns) can end up with shuffles straddling both sides of a binop/cmp.

Where individually the folds aren't worth it, by merging the (oneuse) shuffles we can notably reduce the net instruction count and cost.

One of the final steps towards finally addressing #34072
2025-01-03 10:29:07 +00:00
Simon Pilgrim
035e64c0ec [VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands (REAPPLIED)
As we're reducing the use count of the operands its more likely that they will now fold, as they were previously being prevented by a m_OneUse check, or the cost of retaining the extra instruction had been too high.

This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free.

Reapplied with a fix for foldSingleElementStore/scalarizeLoadExtract which were replacing/removing memory operations - we need to ensure that the worklist is populated in the correct order so all users of the old memory operations are erased first, so there are no remaining users of the loads when its time to remove them as well.

Pulled out of #120984
2025-01-02 18:19:02 +00:00
Simon Pilgrim
f739aa4004 [VectorCombine] replaceValue - add "VC: Replacing" debug message to help the log show replacement for old/new. 2025-01-02 17:23:13 +00:00
Simon Pilgrim
b195bb87e1 [VectorCombine] scalarizeLoadExtract - consistently use LoadInst and ExtractElementInst specific operand getters. NFC
Noticed while investigating the hung builds reported after af83093933ca73bc82c33130f8bda9f1ae54aae2
2024-12-31 14:42:39 +00:00
Simon Pilgrim
d5a96eb125 Revert af83093933ca73bc82c33130f8bda9f1ae54aae2 "[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands"
Reports of hung builds, but I don't have time to investigate at the moment.
2024-12-30 21:20:56 +00:00
Simon Pilgrim
af83093933 [VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands
As we're reducing the use count of the operands its more likely that they will now fold, as they were previously being prevented by a m_OneUse check, or the cost of retaining the extra instruction had been too high.

This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free.

Pulled out of #120984
2024-12-30 17:52:42 +00:00
Simon Pilgrim
f2f02b21cd [VectorCombine] foldShuffleOfBinops - only accept exact matching cmp predicates
m_SpecificCmp allowed equivalent predicate+flags which don't necessarily work after being folded from "shuffle (cmpop), (cmpop)" into "cmpop (shuffle), (shuffle)"

Fixes #121110
2024-12-28 09:21:31 +00:00
Simon Pilgrim
e3f8c229f5 [VectorCombine] foldInsExtVectorToShuffle - inserting into a poison base vector can be modelled as a single src shuffle
We already canonicalized an undef base vector to the RHS to improve further folding, this extends this to improve the shuffle cost estimate of the single src shuffle
2024-12-23 15:49:17 +00:00
Simon Pilgrim
29c89d7265
[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, y, m1), (shuffle y, x, m2)" -> "shuffle x, y, m3" (#120959)
foldShuffleOfShuffles currently only folds unary shuffles to ensure we don't end up with a merged shuffle with more than 2 sources, but this prevented cases where both shuffles were sharing sources.

This patch generalizes the merge process to find up to 2 sources as it merges with the inner shuffles, it also moves the undef/poison handling stages into the merge loop as well.

Fixes #120764
2024-12-23 14:56:15 +00:00
Simon Pilgrim
bf873aa3ec [VectorCombine] foldShuffleToIdentity - add debug message for match
Helps with debugging to show to that the fold found the match.
2024-12-22 17:21:44 +00:00
Simon Pilgrim
f96337e04e [VectorCombine] foldConcatOfBoolMasks - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2024-12-22 16:21:02 +00:00
Simon Pilgrim
82b5bda42c [VectorCombine] Add "VC: Erasing" debug message to help the log show when dead WorkList instructions are erased. 2024-12-20 17:59:14 +00:00
Simon Pilgrim
e3157d3f0d [VectorCombine] foldBitcastShuffle - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2024-12-20 17:59:13 +00:00
David Green
70eac255b8
[VectorCombine] Add fp cast handling for shuffletoidentity (#120641)
This fixes some regressions from recent changes to vector combine in
#120216. It allows shuffleToIdentity to look through fp casts as other
casts, and makes sure mismatching vector types in splats and casts do
not block the transform, as only the lanes should matter.
2024-12-20 15:05:08 +00:00
Simon Pilgrim
b87a5fb9fd [VectorCombine] Add "VC: Visiting" debug message to help the log show the instruction folding order. 2024-12-20 14:57:58 +00:00
Simon Pilgrim
5f0db7c112 [VectorCombine] Add "VECTORCOMBINE on <FUNCTION_NAME>" title debug message to help finding vectorcombine stages in the debug log 2024-12-20 13:32:49 +00:00
Simon Pilgrim
c5434804ee [VectorCombine] foldInsExtVectorToShuffle - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2024-12-20 13:32:49 +00:00
hanbeom
ff93ca7d6c
[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different (#120461)
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask

Original combining left the combine between vectors of different lengths as a TODO. this commit do that. (see
#[baab4aa1ba])
2024-12-20 10:44:49 +00:00
Finn Plummer
45c01e8a33
[NFC][TargetTransformInfo][VectorUtils] Consolidate isVectorIntrinsic... api (#117635)
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specifiction of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
  consistently name them
- move `isTriviallyScalarizable` to VectorUtils
  
- update all uses of the api and provide the TTI parameter

Resolves #117030
2024-12-19 11:54:26 -08:00
Simon Pilgrim
fbc18b85d6
Revert "[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different" (#120422)
Reverts llvm/llvm-project#115209 - investigating a reported regression
2024-12-18 13:32:53 +00:00
hanbeom
b7a8d9584c
[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different (#115209)
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask

Original combining left the combine between vectors of different lengths as a TODO.
2024-12-18 07:47:42 +00:00
Simon Pilgrim
5287299f88
[VectorCombine] foldShuffleOfBinops - prefer same cost fold if it reduces instruction count (#120216)
We don't fold "shuffle (binop), (binop)" -> "binop (shuffle), (shuffle)" if the old/new costs are equal, but we can relax this if either new shuffle will constant fold as it will reduce instruction count.
2024-12-17 18:10:20 +00:00
Simon Pilgrim
8217c2eaef
[VectorCombine] foldShuffleOfBinops - extend to handle icmp/fcmp ops as well (#120075)
Extend binary instructions matching to match compare instructions + predicate as well.
2024-12-16 17:23:04 +00:00
Simon Pilgrim
916bae2d92 [VectorCombine] foldShuffleOfBinops - refactor to make it easier to match icmp/fcmp patterns
NFC refactor to make it easier to also use the fold for icmp/fcmp patterns in a future patch - match the Shuffle with general Instruction operands and avoid explicit use of the BinaryOperator matches as much as possible for the general costing / fold.
2024-12-15 12:49:24 +00:00
Simon Pilgrim
cc54a0ce56
[VectorCombine] vectorizeLoadInsert - only fold when inserting into a poison vector (#119906)
We have corresponding poison tests in the "-inseltpoison.ll" sibling test files.

Fixes #119900
2024-12-14 11:56:12 +00:00
Ramkumar Ramachandra
4a0d53a0b0
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.

This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
2024-12-13 14:18:33 +00:00
Simon Pilgrim
86779da52b
[VectorCombine] Fold "(or (zext (bitcast X)), (shl (zext (bitcast Y)), C))" -> "(bitcast (concat X, Y))" MOVMSK bool mask style patterns (#119695)
Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results, often this is due to the difficulties of working with vector of bools on C/C++. On x86 this typically involves the MOVMSK/KMOV instructions.

To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together.

This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model.

Reapplied patch from #119559 - fixed use after free issue.

Fixes #111431
2024-12-12 13:45:10 +00:00