This fixes some regressions from recent changes to vector combine in
#120216. It allows shuffleToIdentity to look through fp casts like other
casts, and makes sure mismatched vector types in splats and casts do
not block the transform, as only the lanes should matter.
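For illustration only (a hand-written sketch, not from the patch's tests; the function name is made up), this is the sort of lane-wise pattern that can be collapsed once fp casts are looked through:
```
define <4 x double> @concat_fpext(<4 x float> %x) {
  %lo = shufflevector <4 x float> %x, <4 x float> poison, <2 x i32> <i32 0, i32 1>
  %hi = shufflevector <4 x float> %x, <4 x float> poison, <2 x i32> <i32 2, i32 3>
  %elo = fpext <2 x float> %lo to <2 x double>
  %ehi = fpext <2 x float> %hi to <2 x double>
  ; every lane of %r is the fpext of the matching lane of %x, so the whole
  ; sequence is equivalent to a single vector fpext of %x
  %r = shufflevector <2 x double> %elo, <2 x double> %ehi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x double> %r
}
```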
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask
The original combine left handling vectors of different lengths as a TODO; this commit implements it (see
#[baab4aa1ba])
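A hand-written sketch of the new different-length case (the @src/@tgt names are illustrative, not from the tests):
```
; before: negate one lane of the narrow source and insert it into the wider destination
define <4 x float> @src(<4 x float> %dst, <2 x float> %src) {
  %e = extractelement <2 x float> %src, i64 1
  %n = fneg float %e
  %r = insertelement <4 x float> %dst, float %n, i64 3
  ret <4 x float> %r
}

; after: negate the whole source, widen it with a shuffle, then blend it into the destination
define <4 x float> @tgt(<4 x float> %dst, <2 x float> %src) {
  %nsrc = fneg <2 x float> %src
  %wide = shufflevector <2 x float> %nsrc, <2 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 poison, i32 1>
  %r = shufflevector <4 x float> %dst, <4 x float> %wide, <4 x i32> <i32 0, i32 1, i32 2, i32 7>
  ret <4 x float> %r
}
```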
- update `VectorUtils::isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specification of target-specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the api and provide the TTI parameter
Resolves #117030
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask
The original combine left handling vectors of different lengths as a TODO.
We don't fold "shuffle (binop), (binop)" -> "binop (shuffle), (shuffle)" if the old/new costs are equal, but we can relax this if either new shuffle will constant fold as it will reduce instruction count.
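A hand-written sketch of the case this relaxation targets (illustrative names); the shuffle of the two constant operands constant-folds away, so the new form has one fewer instruction:
```
define <4 x i32> @src(<4 x i32> %a, <4 x i32> %b) {
  %x = add <4 x i32> %a, <i32 1, i32 2, i32 3, i32 4>
  %y = add <4 x i32> %b, <i32 5, i32 6, i32 7, i32 8>
  %r = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ret <4 x i32> %r
}

define <4 x i32> @tgt(<4 x i32> %a, <4 x i32> %b) {
  %s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ; the shuffle of the two constant operands constant-folds to <1, 2, 5, 6>
  %r = add <4 x i32> %s, <i32 1, i32 2, i32 5, i32 6>
  ret <4 x i32> %r
}
```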
NFC refactor to make it easier to also use the fold for icmp/fcmp patterns in a future patch - match the Shuffle with general Instruction operands and avoid explicit use of the BinaryOperator matchers as much as possible for the general costing / fold.
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results; often this is due to the difficulty of working with vectors of bools in C/C++. On x86 this typically involves the MOVMSK/KMOV instructions.
To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together.
This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model.
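A hand-written sketch of the scalar concatenation pattern and its shuffle form (illustrative names; assumes the usual lane-0-to-lowest-bit layout when bitcasting i1 vectors):
```
define i8 @src(<4 x i32> %a, <4 x i32> %b) {
  %m0 = icmp eq <4 x i32> %a, zeroinitializer
  %m1 = icmp eq <4 x i32> %b, zeroinitializer
  ; concatenate the two masks as scalars: bitcast, zero-extend, shift, or
  %s0 = bitcast <4 x i1> %m0 to i4
  %s1 = bitcast <4 x i1> %m1 to i4
  %z0 = zext i4 %s0 to i8
  %z1 = zext i4 %s1 to i8
  %sh = shl i8 %z1, 4
  %r = or i8 %z0, %sh
  ret i8 %r
}

define i8 @tgt(<4 x i32> %a, <4 x i32> %b) {
  %m0 = icmp eq <4 x i32> %a, zeroinitializer
  %m1 = icmp eq <4 x i32> %b, zeroinitializer
  ; concatenate the masks with a vector shuffle and do a single bitcast
  %c = shufflevector <4 x i1> %m0, <4 x i1> %m1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %r = bitcast <8 x i1> %c to i8
  ret i8 %r
}
```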
Reapplied patch from #119559 - fixed use-after-free issue.
Fixes #111431
Noticed while investigating a crash in #119559 - we don't account for I being replaced and its Type being reallocated. So hoist the checks to the start of the loop.
Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results; often this is due to the difficulty of working with vectors of bools in C/C++. On x86 this typically involves the MOVMSK/KMOV instructions.
To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together.
This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model.
Fixes #111431
foldInsExtVectorToShuffle is likely to be inserting into an undef value, so make sure we've canonicalized this to the RHS in the folded shuffle to help further VectorCombine folds.
Minor tweak to help #34072
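A hand-written sketch of the canonicalized output (illustrative names): the real source stays as the LHS operand and the poison vector ends up on the RHS of the folded shuffle:
```
define <4 x float> @src(<4 x float> %v) {
  %e = extractelement <4 x float> %v, i64 1
  %r = insertelement <4 x float> poison, float %e, i64 2
  ret <4 x float> %r
}

define <4 x float> @tgt(<4 x float> %v) {
  ; %v stays as the LHS operand and poison is the RHS operand of the shuffle
  %r = shufflevector <4 x float> %v, <4 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 1, i32 poison>
  ret <4 x float> %r
}
```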
foldShuffleOfShuffles already handles "shuffle (shuffle x, undef), (shuffle y, undef)" patterns, this patch relaxes the requirement so it can handle cases where only a single operand is a shuffle (and the other can be any other value and will be kept in place).
Fixes #86068
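A hand-written sketch of the newly handled case where only one operand is itself a shuffle (illustrative names):
```
define <4 x i32> @src(<4 x i32> %x, <4 x i32> %y) {
  %s0 = shufflevector <4 x i32> %x, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ; only the first operand of this shuffle is itself a shuffle; %y is not
  %r = shufflevector <4 x i32> %s0, <4 x i32> %y, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ret <4 x i32> %r
}

define <4 x i32> @tgt(<4 x i32> %x, <4 x i32> %y) {
  ; the inner shuffle is merged into the outer one; %y is kept in place
  %r = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 3, i32 5, i32 1, i32 7>
  ret <4 x i32> %r
}
```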
Don't use TCK_RecipThroughput independently in every VectorCombine fold.
Some prep work to allow a potential future patch to use VectorCombine to optimise for code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).
There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).
insert (DstVec, (extract SrcVec, ExtIdx), InsIdx) --> shuffle (DstVec, SrcVec, Mask)
This commit combines an extract/insert pair on vectors into a shuffle of the two vectors.
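A hand-written sketch of the fold (illustrative names):
```
define <4 x float> @src(<4 x float> %dst, <4 x float> %src) {
  %e = extractelement <4 x float> %src, i64 2
  %r = insertelement <4 x float> %dst, float %e, i64 0
  ret <4 x float> %r
}

define <4 x float> @tgt(<4 x float> %dst, <4 x float> %src) {
  ; lane 0 comes from %src lane 2 (mask index 4 + 2), the rest from %dst
  %r = shufflevector <4 x float> %dst, <4 x float> %src, <4 x i32> <i32 6, i32 1, i32 2, i32 3>
  ret <4 x float> %r
}
```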
#114901 exposed that foldExtractedCmps didn't account for non-commutative binops, and was disabled by 05e838f428555bcc4507bd37912da60ea9110ef6
This patch re-enables support for non-commutative binops by ensuring that the LHS/RHS arg order of the binop is retained.
The fold needs to be adjusted to correctly track the LHS/RHS operands, which will take some refactoring; for now, just disable the fold in this case.
Fixes #114901
There are artificial one-use limitations on foldExtractedCmps. Adjust
the costs to account for multi-use, and strip the one-use matcher,
lifting the limitations.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Consider the following case:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%20 = zext <2 x i1> %19 to <2 x i32>
%21 = lshr <2 x i32> %20, %broadcast.splat20
ret <2 x i32> %21
}
```
After https://github.com/llvm/llvm-project/pull/104606, we shrink the
lshr into:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
%1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
%2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1>
%3 = lshr <2 x i1> %1, %2
%4 = zext <2 x i1> %3 to <2 x i32>
ret <2 x i32> %4
}
```
It is incorrect since `lshr i1 X, 1` returns `poison`.
This patch adds an additional check on the shamt operand. The lshr will get
shrunk iff we ensure that the shamt is less than the bitwidth of the smaller
type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always
evaluates to true for `lshr(zext(X), Y)`, this check will only apply to
bitwise logical instructions.
Alive2: https://alive2.llvm.org/ce/z/j_RmTa
Fixes https://github.com/llvm/llvm-project/issues/108698.
Check whether it is possible and profitable to transform
`binop(zext(value), other)` into `zext(binop(value, trunc(other)))`.
When a CPU architecture has an illegal scalar type iX but the vector type
<N x iX> is legal, scalar expressions may be extended to a legal type iY
before vectorisation. This extension can result in underutilization of the
vector lanes, as more lanes could be processed per instruction with the
narrower type.
Vectorisers may not always recognize opportunities for type shrinking, and
this patch aims to address that limitation.
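A hand-written sketch of the shrink for a bitwise binop, where it is unconditionally safe (illustrative names):
```
define <8 x i32> @src(<8 x i8> %x, <8 x i32> %y) {
  %zx = zext <8 x i8> %x to <8 x i32>
  %r = and <8 x i32> %zx, %y
  ret <8 x i32> %r
}

define <8 x i32> @tgt(<8 x i8> %x, <8 x i32> %y) {
  ; do the binop in the narrow type and extend the result instead
  %ty = trunc <8 x i32> %y to <8 x i8>
  %a = and <8 x i8> %x, %ty
  %r = zext <8 x i8> %a to <8 x i32>
  ret <8 x i32> %r
}
```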
This extends the existing foldTruncFromReductions transform to handle
sext and zext as well. This is only legal for the bitwise reductions
(and/or/xor) and not the arithmetic ones (add, mul). Use the same
costing decision to drive whether we do the transform.
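A hand-written sketch of the extended fold on a zext'd and-reduction (illustrative names):
```
declare i32 @llvm.vector.reduce.and.v8i32(<8 x i32>)
declare i8 @llvm.vector.reduce.and.v8i8(<8 x i8>)

define i32 @src(<8 x i8> %x) {
  %e = zext <8 x i8> %x to <8 x i32>
  %r = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> %e)
  ret i32 %r
}

define i32 @tgt(<8 x i8> %x) {
  ; reduce in the narrow type, then extend the scalar result
  %r8 = call i8 @llvm.vector.reduce.and.v8i8(<8 x i8> %x)
  %r = zext i8 %r8 to i32
  ret i32 %r
}
```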
Workaround until I can get #96884 fixed properly - when trying to find identity sequences, peek through any bitcasts to see if the values all came from the same source. We don't run CSE frequently enough to merge all the bitcasts that we end up with.