199 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
1f919aa778
VectorCombine: lift one-use limitation in foldExtractedCmps (#110902)
There are artificial one-use limitations on foldExtractedCmps. Adjust
the costs to account for multi-use, and strip the one-use matcher,
lifting the limitations.
2024-10-10 14:10:41 +01:00
David Green
c136d3237a
[VectorCombine] Do not try to operate on OperandBundles. (#111635)
This bails out if we see an intrinsic with an operand bundle on it, to
make sure we don't process the bundles incorrectly.

Fixes #110382.
2024-10-09 16:20:03 +01:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Yingwei Zheng
87663fdab9
[VectorCombine] Don't shrink lshr if the shamt is not less than bitwidth (#108705)
Consider the following case:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
  %19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
  %20 = zext <2 x i1> %19 to <2 x i32>
  %21 = lshr <2 x i32> %20, %broadcast.splat20
  ret <2 x i32> %21
}
```
After https://github.com/llvm/llvm-project/pull/104606, we shrink the
lshr into:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
  %1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
  %2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1>
  %3 = lshr <2 x i1> %1, %2
  %4 = zext <2 x i1> %3 to <2 x i32>
  ret <2 x i32> %4
}
```
It is incorrect since `lshr i1 X, 1` returns `poison`.
This patch adds additional check on the shamt operand. The lshr will get
shrunk iff we ensure that the shamt is less than bitwidth of the smaller
type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always
evaluates to true for `lshr(zext(X), Y)`, this check will only apply to
bitwise logical instructions.

Alive2: https://alive2.llvm.org/ce/z/j_RmTa
Fixes https://github.com/llvm/llvm-project/issues/108698.
2024-09-15 18:38:06 +08:00
Igor Kirillov
1b57cbcf25
[VectorCombine] Refactor Insertion Point setting in shrinkType (#108398) 2024-09-13 10:03:31 +01:00
Igor Kirillov
958a337132
[VectorCombine] Fix trunc generated between PHINodes (#108228) 2024-09-12 10:20:56 +01:00
Han-Kuan Chen
0ccc6092d2
[VectorCombine] Add foldShuffleOfIntrinsics. (#106502) 2024-09-10 21:10:09 +08:00
Igor Kirillov
bf694841f5
[VectorCombine] Add type shrinking and zext propagation for fixed-width vector types (#104606)
Check that `binop(zext(value)`, other) is possible and profitable to transform
into: `zext(binop(value, trunc(other)))`.
When CPU architecture has illegal scalar type iX, but vector type <N * iX> is
legal, scalar expressions before vectorisation may be extended to a legal
type iY. This extension could result in underutilization of vector lanes,
as more lanes could be used at one instruction with the lower type.
Vectorisers may not always recognize opportunities for type shrinking, and
this patch aims to address that limitation.
2024-09-10 10:09:03 +01:00
Kazu Hirata
e525f91640
[llvm] Use llvm::is_contained (NFC) (#101855) 2024-08-04 11:42:48 -07:00
Kazu Hirata
b7146aed5b
[Transforms] Construct SmallVector with ArrayRef (NFC) (#101851) 2024-08-03 15:33:08 -07:00
Philip Reames
ded35c0c3a
[vectorcombine] Pull sext/zext through reduce.or/and/xor (#99548)
This extends the existing foldTruncFromReductions transform to handle
sext and zext as well. This is only legal for the bitwise reductions
(and/or/xor) and not the arithmetic ones (add, mul). Use the same
costing decision to drive whether we do the transform.
2024-07-18 13:56:40 -07:00
Simon Pilgrim
da286c8bf6
[VectorCombine] foldShuffleToIdentity - peek through bitcasts to see if they come from the same value to form identity sequence (#98334)
Workaround until I can get #96884 fixed properly - when trying to find identity sequences, peek through any bitcasts to see if the values all came from the same source. We don't run CSE frequently enough to merge all the bitcasts that we end up with.
2024-07-15 21:36:23 +01:00
Simon Pilgrim
ef5b1ec0dd [VectorCombine] foldShuffleToIdentity - ensure casts have the same source type 2024-07-09 13:10:11 +01:00
Simon Pilgrim
7e054c33d4 [VectorCombine] foldShuffleOfCastops - don't restrict to oneuse but compare total costs instead
Some casts (especially bitcasts but others as well) are incredibly cheap (or free), so don't limit the shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) to oneuse cases, but instead compare the total before/after costs of possibly repeating some casts.
2024-07-08 14:57:51 +01:00
Simon Pilgrim
b546096d94
[VectorCombine] foldShuffleToIdentity - handle bitcasts with equal element counts (#97731)
Basic initial patch for #96884 that just handles case where we bitcast between float/integers of the same element width
2024-07-05 09:47:42 +01:00
David Green
76c8e1d857 [VectorCombine] Guard against the lane zero select predicate being scalar
All but the first lane was being checked, but this could leave the first lane
with a scalar select predicate. This just extends the check to make sure the
types are all the same
2024-06-28 17:27:16 +01:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
David Green
efa8463ab9
[VectorCombine] Add free concats to shuffleToIdentity. (#94954)
This is another relatively small adjustment to shuffleToIdentity, which
has had a few knock-one effects to need a few more changes. It attempts
to detect free concats, that will be legalized to multiple vector
operations. For example if the lanes are '[a[0], a[1], b[0], b[1]]' and
a and b are v2f64 under aarch64.

In order to do this:
- isFreeConcat detects whether the input has piece-wise identities from
multiple inputs that can become a concat.
- A tree of concat shuffles is created to concatenate the input values
into a single vector. This is a little different to most other inputs as
there are created from multiple values that are being combined together,
and we cannot rely on the Lane0 insert location always being valid.
- The insert location is changed to the original location instead of
updating per item, which ensure it is valid due to the order that we
visit and create items.
2024-06-25 07:55:08 +01:00
David Green
a1bdb01656 [VectorCombine] Change shuffleToIdentity to use Use. NFC
When looking up through shuffles, a Value can be multiple different leaf types
(for example an identity from one position, a splat from another). We currently
detect this by recalculating which type of leaf it is when generating, but as
more types of leafs are added (#94954) this doesn't scale very well.

This patch switches it to use Use, not Value, to more accurately detect which
type of leaf each Use should have.
2024-06-17 15:25:33 +01:00
Henry Jiang
b5b61cce96
[VectorCombine] Preserves the maximal legal FPMathFlags during foldShuffleToIdentity (#94295)
The `VectorCombine::foldShuffleToIdentity` does not preserve fast math
flags when folding the shuffle, leading to unexpected vectorized result
and missed optimizations with FMA instructions.

We can conservatively take the maximal legal set of fast math flags
whenever we fold shuffles to identity to enable further optimizations in
the backend.

---------

Co-authored-by: Henry Jiang <henry.jiang1@ibm.com>
2024-06-05 11:35:37 -04:00
David Green
93d8d74ae6
[VectorCombine] Remove requirement for Instructions in shuffleToIdentity (#93543)
This removes the check that both operands of the original shuffle are
instructions, which is a relic from a previous version that held more
variables as Instructions.
2024-05-29 09:36:53 +01:00
David Green
1c6746e2db
[VectorCombine] Add support for zext/sext/trunc to shuffleToIdentity (#92696)
This is one of the simple additions to shuffleToIdentity that help it
look through intermediate zext/sext instructions.
2024-05-29 08:56:41 +01:00
David Green
516a9f5183
[VectorCombine] Add Cmp and Select for shuffleToIdentity (#92794)
Other than some additional checks needed for compare predicates and
selects with scalar condition operands, these are relatively simple
additions to what already exists.
2024-05-28 13:10:19 +01:00
David Green
f53f2a8c92
[VectorCombine] Add constant splat handling for shuffleToIdentity (#92797)
This just adds splat constants, which can be treated like any other
splat which hopefully makes them very simple. It does not try to handle
more complex constant vectors yet, just the more common splats.
2024-05-28 10:55:57 +01:00
Ramkumar Ramachandra
e3fa7ee11d
VectorCombine: refactor foldShuffleToIdentity (NFC) (#92766)
Lift out the long lambdas into static functions, use C++ destructing
syntax, and fix other minor things to improve the readability of the
function.
2024-05-21 08:12:30 +01:00
David Green
c3677e4522 [VectorCombine] Don't transform single shuffles in shuffleToIdentity
This will help in later patches where the checks for operands being
instructions is removed, and might help not remove unnecessary poison lanes.
2024-05-18 23:37:55 +01:00
David Green
b7ed097f29
[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000)
This is probably the most involved addition, as it tries to make use of
isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a
number of different intrinsics that are all lane-wise. Additional tests have
been added for some of the different intrinsics from
isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
2024-05-12 20:31:11 +01:00
Ramkumar Ramachandra
57b9c15227
VectorCombine: fix logical error after m_Trunc match (#91201)
The matcher m_Trunc() matches an Operator with a given Opcode, which
could either be an Instruction or ConstExpr.
VectorCombine::foldTruncFromReductions() incorrectly assumes that the
pattern matched is always an Instruction, and attempts a cast. Fix this.

Fixes #88796.
2024-05-08 09:47:55 +01:00
David Green
d145f40963 [VectorCombine] shuffleToIdentity - guard against call instructions.
The shuffleToIdentity fold needs to be a bit more careful about the difference
between call instructions and intrinsics. The second can be handled, but the
first should result in bailing out. This patch also adds some extra intrinsic
tests from #91000.

Fixes #91078
2024-05-05 10:47:11 +01:00
David Green
a4d10266d2
[VectorCombine] Add foldShuffleToIdentity (#88693)
This patch adds a basic version of a combine that attempts to remove
shuffles that when combined simplify away to an identity shuffle. For
example:
%ab = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 3,
i32 2, i32 1, i32 0>
%at = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 7,
i32 6, i32 5, i32 4>
  %abt = fneg <4 x half> %at
  %abb = fneg <4 x half> %ab
%r = shufflevector <4 x half> %abt, <4 x half> %abb, <8 x i32> <i32 7,
i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
By looking through the shuffles and fneg, it can be simplified to:
  %r = fneg <8 x half> %a

The code tracks each lane starting from the original shuffle, keeping a
track of a vector of {src, idx}. As we propagate up through the
instructions we will either look through intermediate instructions
(binops and unops) or see a collections of lanes that all have the same
src and incrementing idx (an identity). We can also see a single value
with identical lanes, which we can treat like a splat.

Only the basic version is added here, handling identities, splats,
binops and unops. In follow-up patches other instructions can be added
such as constants, intrinsics, cmp/sel and zext/sext/trunc.
2024-05-03 19:14:38 +01:00
Simon Pilgrim
282b56f43d
[VectorCombine] foldShuffleOfBinops - add support for length changing shuffles (#88899)
Refactor to be closer to foldShuffleOfCastops - sibling patch to #88743 that can be used to address some of the issues identified in #88693
2024-04-24 10:18:49 +01:00
Simon Pilgrim
7f4f237cd8 [VectorCombine] foldShuffleOfShuffles - add missing arguments to getShuffleCost calls.
Ensure the getShuffleCost arguments/instruction args are populated - minor extension to #88743 to help improve shuffle costs for certain corner cases (e.g. shuffles of loads)
2024-04-23 11:53:08 +01:00
Simon Pilgrim
bddfbe748b
[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, undef), (shuffle y, undef)" -> "shuffle x, y" (#88743)
Another step towards cleaning up shuffles that have been split, often across bitcasts between SSE intrinsic.

Strip shuffles entirely if we fold to an identity shuffle.
2024-04-22 15:57:59 +01:00
Simon Pilgrim
4cc9c6d98d [VectorCombine] foldShuffleOfBinops - don't fold shuffle(divrem(x,y),divrem(z,w)) if mask contains poison
Fixes #89390
2024-04-22 09:00:38 +01:00
Harald van Dijk
60de56c743
[ValueTracking] Restore isKnownNonZero parameter order. (#88873)
Prior to #85863, the required parameters of llvm::isKnownNonZero were
Value and DataLayout. After, they are Value, Depth, and SimplifyQuery,
where SimplifyQuery is implicitly constructible from DataLayout. The
change to move Depth before SimplifyQuery needed callers to be updated
unnecessarily, and as commented in #85863, we actually want Depth to be
after SimplifyQuery anyway so that it can be defaulted and the caller
does not need to specify it.
2024-04-16 15:21:09 +01:00
Yingwei Zheng
e0a628715a
[ValueTracking] Convert isKnownNonZero to use SimplifyQuery (#85863)
This patch converts `isKnownNonZero` to use SimplifyQuery. Then we can
use the context information from `DomCondCache`.

Fixes https://github.com/llvm/llvm-project/issues/85823.
Alive2: https://alive2.llvm.org/ce/z/QUvHVj
2024-04-12 23:47:20 +08:00
Simon Pilgrim
ea3d0db130 [VectorCombine] foldShuffleOfCastops - ensure we can scale shuffle masks between bitcasted vector types
Don't just assert that the src/dst vector element counts are multiples of one another - in general IR this can actually happen.

Reported by @mikaelholmen
2024-04-12 13:53:02 +01:00
Simon Pilgrim
ff74236f34 [VectorCombine] foldShuffleOfCastops - ensure we add all new instructions onto the worklist
When creating cast(shuffle(x,y)) we were only adding the cast() to the worklist, not the new shuffle, preventing recursive combines.

foldShuffleOfBinops is also failing to do this, but I still need to add test coverage for this.
2024-04-11 15:47:09 +01:00
Simon Pilgrim
6fd2fdccf2 [VectorCombine] foldShuffleOfCastops - extend shuffle(bitcast(x),bitcast(y)) -> bitcast(shuffle(x,y)) support
Handle shuffle mask scaling handling for cases where the bitcast src/dst element counts are different
2024-04-11 14:02:56 +01:00
Simon Pilgrim
717d3f3974 [VectorCombine] foldShuffleOfCastops - add initial shuffle(bitcast(x),bitcast(y)) -> bitcast(shuffle(x,y)) support
Just handle cases where the bitcast src/dst element counts are the same (future patches will add shuffle mask scaling)
2024-04-11 11:43:11 +01:00
Simon Pilgrim
a403ad9336 [VectorCombine] foldBitcastShuffle - limit bitcast(shuffle(x,y)) -> shuffle(bitcast(x),bitcast(y))
Only fold bitcast(shuffle(x,y)) -> shuffle(bitcast(x),bitcast(y)) if we won't actually increase the number of bitcasts (i.e. x or y is already bitcasted from the correct type).
2024-04-11 11:43:11 +01:00
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
David Green
869797daca [VectorCombine] Add a debug message for foldShuffleOfCastop. NFC
This optimization, much like the existing foldShuffleOfBinops can cause a
lot of regressions. Add a quick debug message to make the costs are more
obvious.
2024-04-07 07:54:22 +01:00
Simon Pilgrim
212b2bbcd1
[VectorCombine][X86] foldShuffleOfCastops - fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) iff cost efficient (#87510)
Based off the existing foldShuffleOfBinops fold

Fixes #67803
2024-04-04 11:22:37 +01:00
Simon Pilgrim
1d06f41b72
[VectorCombine] foldBitcastShuffle - peek through any residual bitcasts before creating a new bitcast on top (#86119)
Encountered while working on #67803, wading through the chains of bitcasts that SSE intrinsics introduces - this patch helps prevents cases where the bitcast chains aren't cleared out and we can't perform further combines until after InstCombine/InstSimplify has run.
2024-04-02 10:58:45 +01:00
Simon Pilgrim
15eba9c12a [VectorCombine] Add DataLayout to VectorCombine class instead of repeated calls to getDataLayout(). NFC. 2024-03-21 13:36:23 +00:00
Simon Pilgrim
7812fcf3d7 [VectorCombine] foldBitcastShuf - add support for binary shuffles (REAPPLIED)
Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'".

Reapplied with a clang codegen test fix.

Further prep work for #67803
2024-03-20 15:06:19 +00:00
Simon Pilgrim
ada24ae5e6 Revert 2ac85d8d200a9e1e0ced501c2d2f04404c400bd9 "[VectorCombine] foldBitcastShuf - add support for binary shuffles"
Breaks some tests in other subprojects - will recommit with a fix later
2024-03-20 13:39:42 +00:00
Simon Pilgrim
2ac85d8d20 [VectorCombine] foldBitcastShuf - add support for binary shuffles
Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'".

Further prep work for #67803
2024-03-20 13:19:30 +00:00
Simon Pilgrim
fe2119a7b0 [VectorCombine] foldBitcastShuffle - include the cost of bitcasts in the comparison
This makes no real difference currently as we only fold unary shuffles, but I'm hoping to handle binary shuffles in a future patch.
2024-03-20 10:56:38 +00:00