175 Commits

Author SHA1 Message Date
Ramkumar Ramachandra
e3fa7ee11d
VectorCombine: refactor foldShuffleToIdentity (NFC) (#92766)
Lift out the long lambdas into static functions, use C++ destructuring
syntax, and fix other minor things to improve the readability of the
function.
2024-05-21 08:12:30 +01:00
David Green
c3677e4522 [VectorCombine] Don't transform single shuffles in shuffleToIdentity
This will help in later patches where the checks for operands being
instructions are removed, and might help not remove unnecessary poison lanes.
2024-05-18 23:37:55 +01:00
David Green
b7ed097f29
[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000)
This is probably the most involved addition, as it tries to make use of
isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a
number of different intrinsics that are all lane-wise. Additional tests have
been added for some of the different intrinsics from
isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
2024-05-12 20:31:11 +01:00
Ramkumar Ramachandra
57b9c15227
VectorCombine: fix logical error after m_Trunc match (#91201)
The matcher m_Trunc() matches an Operator with a given Opcode, which
could either be an Instruction or ConstExpr.
VectorCombine::foldTruncFromReductions() incorrectly assumes that the
pattern matched is always an Instruction, and attempts a cast. Fix this.

Fixes #88796.
2024-05-08 09:47:55 +01:00
David Green
d145f40963 [VectorCombine] shuffleToIdentity - guard against call instructions.
The shuffleToIdentity fold needs to be a bit more careful about the difference
between call instructions and intrinsics. The second can be handled, but the
first should result in bailing out. This patch also adds some extra intrinsic
tests from #91000.

Fixes #91078
2024-05-05 10:47:11 +01:00
David Green
a4d10266d2
[VectorCombine] Add foldShuffleToIdentity (#88693)
This patch adds a basic version of a combine that attempts to remove
shuffles that when combined simplify away to an identity shuffle. For
example:
  %ab = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %at = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
  %abt = fneg <4 x half> %at
  %abb = fneg <4 x half> %ab
  %r = shufflevector <4 x half> %abt, <4 x half> %abb, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
By looking through the shuffles and fneg, it can be simplified to:
  %r = fneg <8 x half> %a

The code tracks each lane starting from the original shuffle, keeping
track of a vector of {src, idx}. As we propagate up through the
instructions we will either look through intermediate instructions
(binops and unops) or see a collection of lanes that all have the same
src and incrementing idx (an identity). We can also see a single value
with identical lanes, which we can treat like a splat.

Only the basic version is added here, handling identities, splats,
binops and unops. In follow-up patches other instructions can be added
such as constants, intrinsics, cmp/sel and zext/sext/trunc.
2024-05-03 19:14:38 +01:00
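The lane tracking described in the foldShuffleToIdentity commit above can be illustrated with a small standalone sketch. It is not the LLVM implementation: `Lane`, `traceLanes`, `isIdentity` and `isSplat` are hypothetical names, and the sketch assumes both operands of the outer shuffle are single-operand shuffles of the same source (as in the fneg example, where the lane-wise fneg is simply looked through).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the {src, idx} pairs from the commit message:
// first = id of the source vector, second = lane index within that source.
using Lane = std::pair<int, int>;

// Trace every lane of an outer two-operand shuffle back to the original
// source. Op0Mask and Op1Mask are the masks of the inner shuffles that
// produced each operand; both are assumed to read from source 0 (%a).
std::vector<Lane> traceLanes(const std::vector<int> &OuterMask,
                             const std::vector<int> &Op0Mask,
                             const std::vector<int> &Op1Mask) {
  const int Width = static_cast<int>(Op0Mask.size());
  const int Total = Width + static_cast<int>(Op1Mask.size());
  std::vector<Lane> Lanes;
  for (int M : OuterMask) {
    assert(M >= 0 && M < Total && "poison lanes are not modelled here");
    const int Idx = M < Width ? Op0Mask[M] : Op1Mask[M - Width];
    Lanes.push_back({0, Idx});
  }
  return Lanes;
}

// Identity: all lanes share one source and lane I reads index I.
bool isIdentity(const std::vector<Lane> &Lanes) {
  assert(!Lanes.empty() && "expected at least one lane");
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    if (Lanes[I].first != Lanes[0].first ||
        Lanes[I].second != static_cast<int>(I))
      return false;
  return true;
}

// Splat: every lane reads the same {src, idx}.
bool isSplat(const std::vector<Lane> &Lanes) {
  assert(!Lanes.empty() && "expected at least one lane");
  for (const Lane &L : Lanes)
    if (L != Lanes[0])
      return false;
  return true;
}
```

For the example in the commit message, `traceLanes({7,6,5,4,3,2,1,0}, {7,6,5,4}, {3,2,1,0})` maps lane I of `%r` back to lane I of `%a`, which is why the shuffles can be dropped and only the fneg kept.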
Simon Pilgrim
282b56f43d
[VectorCombine] foldShuffleOfBinops - add support for length changing shuffles (#88899)
Refactor to be closer to foldShuffleOfCastops - sibling patch to #88743 that can be used to address some of the issues identified in #88693
2024-04-24 10:18:49 +01:00
Simon Pilgrim
7f4f237cd8 [VectorCombine] foldShuffleOfShuffles - add missing arguments to getShuffleCost calls.
Ensure the getShuffleCost arguments/instruction args are populated - minor extension to #88743 to help improve shuffle costs for certain corner cases (e.g. shuffles of loads)
2024-04-23 11:53:08 +01:00
Simon Pilgrim
bddfbe748b
[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, undef), (shuffle y, undef)" -> "shuffle x, y" (#88743)
Another step towards cleaning up shuffles that have been split, often across bitcasts between SSE intrinsics.

Strip shuffles entirely if we fold to an identity shuffle.
2024-04-22 15:57:59 +01:00
Simon Pilgrim
4cc9c6d98d [VectorCombine] foldShuffleOfBinops - don't fold shuffle(divrem(x,y),divrem(z,w)) if mask contains poison
Fixes #89390
2024-04-22 09:00:38 +01:00
Harald van Dijk
60de56c743
[ValueTracking] Restore isKnownNonZero parameter order. (#88873)
Prior to #85863, the required parameters of llvm::isKnownNonZero were
Value and DataLayout. After, they are Value, Depth, and SimplifyQuery,
where SimplifyQuery is implicitly constructible from DataLayout. The
change to move Depth before SimplifyQuery needed callers to be updated
unnecessarily, and as commented in #85863, we actually want Depth to be
after SimplifyQuery anyway so that it can be defaulted and the caller
does not need to specify it.
2024-04-16 15:21:09 +01:00
Yingwei Zheng
e0a628715a
[ValueTracking] Convert isKnownNonZero to use SimplifyQuery (#85863)
This patch converts `isKnownNonZero` to use SimplifyQuery. Then we can
use the context information from `DomCondCache`.

Fixes https://github.com/llvm/llvm-project/issues/85823.
Alive2: https://alive2.llvm.org/ce/z/QUvHVj
2024-04-12 23:47:20 +08:00
Simon Pilgrim
ea3d0db130 [VectorCombine] foldShuffleOfCastops - ensure we can scale shuffle masks between bitcasted vector types
Don't just assert that the src/dst vector element counts are multiples of one another - in general IR, counts that are not multiples can actually occur.

Reported by @mikaelholmen
2024-04-12 13:53:02 +01:00
Simon Pilgrim
ff74236f34 [VectorCombine] foldShuffleOfCastops - ensure we add all new instructions onto the worklist
When creating cast(shuffle(x,y)) we were only adding the cast() to the worklist, not the new shuffle, preventing recursive combines.

foldShuffleOfBinops is also failing to do this, but I still need to add test coverage for this.
2024-04-11 15:47:09 +01:00
Simon Pilgrim
6fd2fdccf2 [VectorCombine] foldShuffleOfCastops - extend shuffle(bitcast(x),bitcast(y)) -> bitcast(shuffle(x,y)) support
Handle shuffle mask scaling for cases where the bitcast src/dst element counts are different
2024-04-11 14:02:56 +01:00
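To make "shuffle mask scaling" in the commit above concrete, here is a minimal standalone sketch of rescaling a mask when a bitcast changes the element count. It is a simplified model, not the code used by VectorCombine: `narrowMask` and `tryWidenMask` are hypothetical helpers, negative mask values stand for poison lanes, and the widening rule (each group must be aligned and consecutive, or entirely poison) is an assumption chosen for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rewrite a mask over wide elements as a mask over elements that are
// Scale times narrower: wide lane M becomes narrow lanes M*Scale .. M*Scale+Scale-1.
std::vector<int> narrowMask(int Scale, const std::vector<int> &Mask) {
  assert(Scale > 0 && "scale must be positive");
  std::vector<int> Out;
  for (int M : Mask)
    for (int I = 0; I < Scale; ++I)
      Out.push_back(M < 0 ? M : M * Scale + I);
  return Out;
}

// Try the opposite direction. This only succeeds when each group of Scale
// narrow lanes reads one whole, aligned wide element (or is all poison).
bool tryWidenMask(int Scale, const std::vector<int> &Mask,
                  std::vector<int> &Out) {
  assert(Scale > 0 && "scale must be positive");
  Out.clear();
  const std::size_t Step = static_cast<std::size_t>(Scale);
  if (Mask.size() % Step != 0)
    return false;
  for (std::size_t G = 0; G < Mask.size(); G += Step) {
    const int First = Mask[G];
    if (First < 0) {
      // A poison group must be poison in every lane.
      for (std::size_t I = 1; I < Step; ++I)
        if (Mask[G + I] >= 0)
          return false;
      Out.push_back(-1);
      continue;
    }
    if (First % Scale != 0)
      return false;
    for (std::size_t I = 1; I < Step; ++I)
      if (Mask[G + I] != First + static_cast<int>(I))
        return false;
    Out.push_back(First / Scale);
  }
  return true;
}
```

Under these assumptions, a mask `<1, 0>` on `<2 x i64>` becomes `<2, 3, 0, 1>` on `<4 x i32>`, while a narrow mask such as `<1, 2, 0, 1>` cannot be widened because its first group straddles two wide elements.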
Simon Pilgrim
717d3f3974 [VectorCombine] foldShuffleOfCastops - add initial shuffle(bitcast(x),bitcast(y)) -> bitcast(shuffle(x,y)) support
Just handle cases where the bitcast src/dst element counts are the same (future patches will add shuffle mask scaling)
2024-04-11 11:43:11 +01:00
Simon Pilgrim
a403ad9336 [VectorCombine] foldBitcastShuffle - limit bitcast(shuffle(x,y)) -> shuffle(bitcast(x),bitcast(y))
Only fold bitcast(shuffle(x,y)) -> shuffle(bitcast(x),bitcast(y)) if we won't actually increase the number of bitcasts (i.e. x or y is already bitcasted from the correct type).
2024-04-11 11:43:11 +01:00
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. ST3 and ST4 are added; ST2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
David Green
869797daca [VectorCombine] Add a debug message for foldShuffleOfCastop. NFC
This optimization, much like the existing foldShuffleOfBinops, can cause a
lot of regressions. Add a quick debug message to make the costs more
obvious.
2024-04-07 07:54:22 +01:00
Simon Pilgrim
212b2bbcd1
[VectorCombine][X86] foldShuffleOfCastops - fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) iff cost efficient (#87510)
Based off the existing foldShuffleOfBinops fold

Fixes #67803
2024-04-04 11:22:37 +01:00
Simon Pilgrim
1d06f41b72
[VectorCombine] foldBitcastShuffle - peek through any residual bitcasts before creating a new bitcast on top (#86119)
Encountered while working on #67803, wading through the chains of bitcasts that SSE intrinsics introduce - this patch helps prevent cases where the bitcast chains aren't cleared out and we can't perform further combines until after InstCombine/InstSimplify has run.
2024-04-02 10:58:45 +01:00
Simon Pilgrim
15eba9c12a [VectorCombine] Add DataLayout to VectorCombine class instead of repeated calls to getDataLayout(). NFC. 2024-03-21 13:36:23 +00:00
Simon Pilgrim
7812fcf3d7 [VectorCombine] foldBitcastShuf - add support for binary shuffles (REAPPLIED)
Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'".

Reapplied with a clang codegen test fix.

Further prep work for #67803
2024-03-20 15:06:19 +00:00
Simon Pilgrim
ada24ae5e6 Revert 2ac85d8d200a9e1e0ced501c2d2f04404c400bd9 "[VectorCombine] foldBitcastShuf - add support for binary shuffles"
Breaks some tests in other subprojects - will recommit with a fix later
2024-03-20 13:39:42 +00:00
Simon Pilgrim
2ac85d8d20 [VectorCombine] foldBitcastShuf - add support for binary shuffles
Generalise fold to "bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'".

Further prep work for #67803
2024-03-20 13:19:30 +00:00
Simon Pilgrim
fe2119a7b0 [VectorCombine] foldBitcastShuffle - include the cost of bitcasts in the comparison
This makes no real difference currently as we only fold unary shuffles, but I'm hoping to handle binary shuffles in a future patch.
2024-03-20 10:56:38 +00:00
Philip Reames
0081ec11d8
[VectorCombine] Add a mask for SK_Broadcast shuffle costing (#85808)
This is part of a series of small patches to compute shuffle masks for
the couple of cases where we call getShuffleCost without one. My goal is
to add an invariant that all calls to getShuffleCost for fixed length
vectors have a mask.

Note that this code appears to be reachable with scalable vectors, and
thus we have to only pass a non-empty mask when the number of elements
is precisely known.
2024-03-19 08:57:09 -07:00
Simon Pilgrim
769c22f25b
[VectorCombine] Fold reduce(trunc(x)) -> trunc(reduce(x)) iff cost effective (#81852)
Vector truncations can be pretty expensive, especially on X86, whilst scalar truncations are often free.

If the cost of performing the add/mul/and/or/xor reduction is cheap enough on the pre-truncated type, then avoid the vector truncation entirely.

Fixes https://github.com/llvm/llvm-project/issues/81469
2024-02-19 11:32:23 +00:00
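The reduce(trunc(x)) -> trunc(reduce(x)) fold above relies on truncation commuting with wrapping add/mul (and with and/or/xor). The following standalone sketch checks that property for add and mul on plain integers; it models i32 -> i8 truncation with fixed-width unsigned types and has nothing to do with the cost model that decides whether the fold is applied. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// reduce.add(trunc(x)): truncate every lane to 8 bits, then sum modulo 2^8.
std::uint8_t reduceAddOfTrunc(const std::vector<std::uint32_t> &V) {
  std::uint8_t Acc = 0;
  for (std::uint32_t X : V)
    Acc = static_cast<std::uint8_t>(Acc + static_cast<std::uint8_t>(X));
  return Acc;
}

// trunc(reduce.add(x)): sum modulo 2^32, then truncate the scalar once.
std::uint8_t truncOfReduceAdd(const std::vector<std::uint32_t> &V) {
  std::uint32_t Acc = 0;
  for (std::uint32_t X : V)
    Acc += X;
  return static_cast<std::uint8_t>(Acc);
}

// The same pair for multiplication.
std::uint8_t reduceMulOfTrunc(const std::vector<std::uint32_t> &V) {
  std::uint8_t Acc = 1;
  for (std::uint32_t X : V)
    Acc = static_cast<std::uint8_t>(Acc * static_cast<std::uint8_t>(X));
  return Acc;
}

std::uint8_t truncOfReduceMul(const std::vector<std::uint32_t> &V) {
  std::uint32_t Acc = 1;
  for (std::uint32_t X : V)
    Acc *= X;
  return static_cast<std::uint8_t>(Acc);
}
```

Both orders give the same low 8 bits because 2^8 divides 2^32, so wrapping in the wider type never disturbs the bits that survive the truncation.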
Jeremy Morse
2425e2940e
[DebugInfo][RemoveDIs] Have getInsertionPtAfterDef return an iterator (#73149)
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPtAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.

This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.

Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
2023-11-30 12:19:57 +00:00
Youngsuk Kim
859338a695 [llvm] Replace uses of Type::getPointerTo (NFC)
Work towards removing method Type::getPointerTo.
Opaque ptr cleanup effort.
2023-11-29 10:22:31 -06:00
Nikita Popov
03f05a4e72 [IR] Don't include GenericDomTreeConstruction.h in header (NFC)
The whole point of the GenericDomTree.h vs
GenericDomTreeConstruction.h distinction is that the latter only
needs to be included in the source file and not the header.
2023-11-22 09:06:36 +01:00
Michael Maitland
acef83c142
[VectorCombine] Fix crash in scalarizeVPIntrinsic (#72039)
When getSplatOp returns nullptr, the intrinsic cannot be scalarized.
This patch fixes a crash from trying to scalarize the VPIntrinsic when
getSplatOp returns nullptr, and includes a test case for it.

This fixes https://github.com/llvm/llvm-project/issues/72034.
2023-11-11 19:54:15 -05:00
Nikita Popov
6a06155c53 [VectorCombine] Discard ScalarizationResults if transform aborted
Fixes https://github.com/llvm/llvm-project/issues/69820.
2023-10-31 11:24:30 +01:00
Nabeel Omer
8e31acf8ca
[VectorCombine] Add special handling for truncating shuffles (#70013)
When dealing with a truncating shuffle, we can end up in a situation
where the type passed to getShuffleCost is the type of the result of the
shuffle, and the mask references an element which is out of bounds of
the result vector.

If dealing with truncating shuffles, pass the type of the input vectors
to `getShuffleCost()` in order to avoid an out-of-bounds assertion.
2023-10-24 15:03:43 +01:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Luke Lau
c35939b22e
[VectorCombine] Use isSafeToSpeculativelyExecute to guard VP scalarization (#69494)
Previously we were just matching against a fixed list of VP intrinsics
that we knew couldn't be speculated, but we can reuse the logic in
isSafeToSpeculativelyExecuteWithOpcode. This also allows speculation in
more cases, e.g. when the divisor is known to be non-zero.

Unfortunately we can't reuse the exact same function call for VP
intrinsics with functional intrinsics instead of opcodes, because
isSafeToSpeculativelyExecute needs an instruction that already exists.
So this just copies the logic by peeking into the function attributes of
the intrinsic.
2023-10-19 12:45:21 -04:00
Alexey Bataev
c2ae16f6a7 [VectorCombine]Fix a crash during long vector analysis.
If the analysis of a single vector is requested, we need to use the
original type to avoid a crash.
2023-10-09 14:22:37 -07:00
Simon Pilgrim
bea3967271 [VectorCombine] Rename foldBitcastShuf -> foldBitcastShuffle. NFC.
Consistently use the term "Shuffle" in all vector combiner folds.
2023-10-09 11:28:50 +01:00
Simon Pilgrim
94795a37e8 [VectorCombine] foldBitcastShuf - add support for length changing shuffles
Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold.

It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask

First stage towards addressing Issue #67803
2023-10-06 11:59:51 +01:00
Simon Pilgrim
d3e66a88c2 [VectorCombine] foldBitcastShuf - compute scale factors using shuffle type element size instead of element count. NFCI.
First step towards supporting length changing shuffles
2023-10-05 18:58:36 +01:00
Nikita Popov
3b82397965 [VectorCombine] Check for non-byte-sized element type
We should check whether the element type is non-byte-sized, not
the vector type. For types like <32 x i1> the whole type is
byte-sized, but the individual elements (that we scalarize to)
are not.

Fixes https://github.com/llvm/llvm-project/issues/67060.
2023-09-28 14:18:30 +02:00
Ben Shi
ea0ee55c02
[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445)
Enable the transform if a non constant index is guaranteed to be safe
via a UREM/AND.
2023-09-26 09:41:53 +08:00
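The idea of a non-constant index being "guaranteed to be safe via a UREM/AND" can be sketched as a simple bound check. This is an illustration under stated assumptions, not the logic in VectorCombine: `IndexOp` and `isIndexProvablyInBounds` are hypothetical, the index is assumed to be computed as `idx & C` or `idx urem C` with a constant `C`, and `NumElts` is the element count (for scalable vectors, the known minimum element count).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of how the index was computed.
enum class IndexOp { And, URem };

// Returns true if (idx OP C) is below NumElts for every possible idx.
bool isIndexProvablyInBounds(IndexOp Op, std::uint64_t C,
                             std::uint64_t NumElts) {
  switch (Op) {
  case IndexOp::And:
    // idx & C can never exceed C, so C itself must be a valid index.
    return C < NumElts;
  case IndexOp::URem:
    // idx urem C lies in [0, C); C == 0 is excluded because the
    // remainder would be undefined.
    return C != 0 && C <= NumElts;
  }
  return false;
}
```

For a `<4 x i32>` vector, `idx & 3` and `idx urem 4` are both accepted, while `idx & 4` is rejected because it can evaluate to 4.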
Michael Maitland
e0aaa1956d
[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats of the scalar operation (#65706)

VP Intrinsics whose vector operands are both splat values may be
simplified into the scalar version of the operation and the result is
splatted.

This issue is the intrinsic dual of #65072.
2023-09-20 18:27:51 -04:00
Ben Shi
87143ff9f2 [VectorCombine] Fix a spot in commit 068357d9b09cd635b1c2f126d119ce9afecb28f7
My previous commit led to a crash in "Builders/sanitizer-x86_64-linux-fast"
(see https://lab.llvm.org/buildbot/#/builders/5/builds/36746). This patch
fixes it.
2023-09-18 15:01:47 +08:00
Ben Shi
068357d9b0
[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443)
The transform 'scalarizeLoadExtract' can be applied to scalable
vector types if the index is less than the minimum number of elements.

The check for whether the index is less than the minimum number of elements
is located at lines 1175~1180. 'scalarizeLoadExtract' will call
'canScalarizeAccess' and check the returned result to see if this transform is safe.

At the beginning of the function 'canScalarizeAccess', the index will be
checked
1. If it is less than the number of elements of a fixed vector type.
2. If it is less than the minimum number of elements of a scalable vector type.

Otherwise 'canScalarizeAccess' will return unsafe and this transform
will be prevented.
2023-09-18 10:49:18 +08:00
Ben Shi
ad35d916cd [VectorCombine] Enable transform 'foldSingleElementStore' for scalable vector types
The transform 'foldSingleElementStore' can be applied to scalable
vector types if the index is less than the minimum number of elements.

Reviewed By: dmgreen, nikic

Differential Revision: https://reviews.llvm.org/D157676
2023-08-23 17:12:36 +08:00
Nuno Lopes
d75fb17963 [VectorCombine] Use poison instead of undef as placeholder [NFC]
These vector lanes are never accessed. They are used for shifting a value into the right lane
and therefore only 1 value of the whole vector is actually used
2023-07-19 10:29:08 +01:00
ManuelJBrito
d22edb9794 [IR][NFC] Change UndefMaskElem to PoisonMaskElem
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.

Differential Revision: https://reviews.llvm.org/D149256
2023-04-27 18:01:54 +01:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer have support for the legacy PM.
2023-04-17 13:54:19 +02:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC) 2023-04-16 08:25:28 -07:00