148 Commits

Author SHA1 Message Date
Simon Pilgrim
769c22f25b
[VectorCombine] Fold reduce(trunc(x)) -> trunc(reduce(x)) iff cost effective (#81852)
Vector truncations can be pretty expensive, especially on X86, whilst scalar truncations are often free.

If the cost of performing the add/mul/and/or/xor reduction is cheap enough on the pre-truncated type, then avoid the vector truncation entirely.

Fixes https://github.com/llvm/llvm-project/issues/81469
2024-02-19 11:32:23 +00:00
Jeremy Morse
2425e2940e
[DebugInfo][RemoveDIs] Have getInsertionPtAfterDef return an iterator (#73149)
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPointAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.

This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.

Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
2023-11-30 12:19:57 +00:00
Youngsuk Kim
859338a695 [llvm] Replace uses of Type::getPointerTo (NFC)
Work towards removing method Type::getPointerTo.
Opaque ptr cleanup effort.
2023-11-29 10:22:31 -06:00
Nikita Popov
03f05a4e72 [IR] Don't include GenericDomTreeConstruction.h in header (NFC)
The whole point of the GenericDomTree.h vs
GenericDomTreeConstruction.h distinction is that the latter only
needs to be included in the source file and not the header.
2023-11-22 09:06:36 +01:00
Michael Maitland
acef83c142
[VectorCombine] Fix crash in scalarizeVPIntrinsic (#72039)
When getSplatOp returns nullptr, the intrinsic cannot be scalarized.
This patch includes a test case that fixes a crash from trying to
scalarize the VPIntrinsic when getSplatOp returns nullptr.

This fixes https://github.com/llvm/llvm-project/issues/72034.
2023-11-11 19:54:15 -05:00
Nikita Popov
6a06155c53 [VectorCombine] Discard ScalarizationResults if transform aborted
Fixes https://github.com/llvm/llvm-project/issues/69820.
2023-10-31 11:24:30 +01:00
Nabeel Omer
8e31acf8ca
[VectorCombine] Add special handling for truncating shuffles (#70013)
When dealing with a truncating shuffle, we can end up in a situation
where the type passed to getShuffleCost is the type of the result of the
shuffle, and the mask references an element which is out of bounds of
the result vector.

If dealing with truncating shuffles, pass the type of the input vectors
to `getShuffleCost()` in order to avoid an out-of-bounds assertion.
2023-10-24 15:03:43 +01:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Luke Lau
c35939b22e
[VectorCombine] Use isSafeToSpeculativelyExecute to guard VP scalarization (#69494)
Previously we were just matching against a fixed list of VP intrinsics
that we
knew couldn't be speculated, but we can reuse the logic in
isSafeToSpeculativelyExecuteWithOpcode. This also allows speculation in
more
cases, e.g. when the divisor is known to be non-zero.

Unfortunately we can't reuse the exact same function call for VP
intrinsics
with functional intrinsics instead of opcodes, because
isSafeToSpeculativelyExecute needs an instruction that already exists.
So this
just copies the logic by peeking into the function attributes of the
intrinsic.
2023-10-19 12:45:21 -04:00
Alexey Bataev
c2ae16f6a7 [VectorCombine]Fix a crash during long vector analysis.
If the analysis of the single vector requested, need to use original
type to avoid crash
2023-10-09 14:22:37 -07:00
Simon Pilgrim
bea3967271 [VectorCombine] Rename foldBitcastShuf -> foldBitcastShuffle. NFC.
Consistently use the term "Shuffle" in all vector combiner folds.
2023-10-09 11:28:50 +01:00
Simon Pilgrim
94795a37e8 [VectorCombine] foldBitcastShuf - add support for length changing shuffles
Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold.

It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask

First stage towards addressing Issue #67803
2023-10-06 11:59:51 +01:00
Simon Pilgrim
d3e66a88c2 [VectorCombine] foldBitcastShuf - compute scale factors using shuffle type element size instead of element count. NFCI.
First step towards supporting length changing shuffles
2023-10-05 18:58:36 +01:00
Nikita Popov
3b82397965 [VectorCombine] Check for non-byte-sized element type
We should check whether the element type is non-byte-sized, not
the vector type. For types like <32 x i1> the whole type is
byte-sized, but the individual elements (that we scalarize to)
are not.

Fixes https://github.com/llvm/llvm-project/issues/67060.
2023-09-28 14:18:30 +02:00
Ben Shi
ea0ee55c02
[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445)
Enable the transform if a non constant index is guaranteed to be safe
via a UREM/AND.
2023-09-26 09:41:53 +08:00
Michael Maitland
e0aaa1956d
[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats (#65706)
of the scalar operation

VP Intrinsics whose vector operands are both splat values may be
simplified into the scalar version of the operation and the result is
splatted.

This issue is the intrinsic dual of #65072.
2023-09-20 18:27:51 -04:00
Ben Shi
87143ff9f2 [VectorCombine] Fix a spot in commit 068357d9b09cd635b1c2f126d119ce9afecb28f7
My previous commit leads to a crash in "Builders/sanitizer-x86_64-linux-fast"
as https://lab.llvm.org/buildbot/#/builders/5/builds/36746. And this patch
fixes it.
2023-09-18 15:01:47 +08:00
Ben Shi
068357d9b0
[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443)
The transform 'scalarizeLoadExtract' can be applied to scalable
vector types if the index is less than the minimum number of elements.

The check whether the index is less than the minimum number of elements
locates at line 1175~1180. 'scalarizeLoadExtract' will call
'canScalarizeAccess' and check the returned result if this transform is safe.

At the beginning of the function 'canScalarizeAccess', the index will be
checked
1. If it is less than the number of elements of a fixed vector type.
2. If it is less than the minimum number of elements of a scalable vector type.

Otherwise 'canScalarizeAccess' will return unsafe and this transform
will be prevented.
2023-09-18 10:49:18 +08:00
Ben Shi
ad35d916cd [VectorCombine] Enable transform 'foldSingleElementStore' for scalable vector types
The transform 'foldSingleElementStore' can be applied to scalable
vector types if the index is less than the minimum number of elements.

Reviewed By: dmgreen, nikic

Differential Revision: https://reviews.llvm.org/D157676
2023-08-23 17:12:36 +08:00
Nuno Lopes
d75fb17963 [VectorCombine] Use poison insteaf of undef as placeholder [NFC]
These vector lanes are never accessed. They are used for shifting a value into the right lane
and therefore only 1 value of the whole vector is actually used
2023-07-19 10:29:08 +01:00
ManuelJBrito
d22edb9794 [IR][NFC] Change UndefMaskElem to PoisonMaskElem
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.

Differential Revision: https://reviews.llvm.org/D149256
2023-04-27 18:01:54 +01:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer has support for the legacy PM.
2023-04-17 13:54:19 +02:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC) 2023-04-16 08:25:28 -07:00
Sanjay Patel
af39acda88 [VectorCombine] fix insertion point of shuffles
As shown in issue #60649, the new shuffles were
being inserted before a phi, and that is invalid.

It seems like most test coverage for this fold
(foldSelectShuffle) lives in the AArch64 dir,
but this doesn't repro there for a base target.
2023-02-10 10:57:11 -05:00
Arthur Eubanks
15977742d3 Reland [LegacyPM] Remove some legacy passes
These are part of the optimization pipeline, of which the legacy pass manager version is deprecated.

Namely
* Internalize
* StripSymbols
* StripNonDebugSymbols
* StripDeadDebugInfo
* StripDeadPrototypes
* VectorCombine
* WarnMissedTransformations

Fixed previously failing ocaml tests (one of them seems to already be failing?)
2023-02-07 12:56:05 -08:00
Arthur Eubanks
1b254022b2 Revert "[LegacyPM] Remove some legacy passes"
This reverts commit a4b4f62beb0bf40123181e5f5bdf32ef54f87166.

Ocaml bindings tests failing.
2023-02-07 10:17:45 -08:00
Arthur Eubanks
a4b4f62beb [LegacyPM] Remove some legacy passes
These are part of the optimization pipeline, of which the legacy pass manager version is deprecated.

Namely
* Internalize
* StripSymbols
* StripNonDebugSymbols
* StripDeadDebugInfo
* StripDeadPrototypes
* VectorCombine
* WarnMissedTransformations
2023-02-07 09:57:48 -08:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Matt Devereau
ee4d6c8bf0 [VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors
This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for
scalable vectors which caused poor scalable Binop codegen.

Differential Revision: https://reviews.llvm.org/D138545
2022-11-23 13:17:21 +00:00
Benjamin Kramer
f116107f2d [VectorCombine] Don't touch instruction after foldSingleElementStore, it might be deleted
Use after free found by asan.
2022-11-22 21:12:42 +01:00
Sanjay Patel
ede6d608f4 [VectorCombine] switch on opcode to compile faster
This follows 87debdadaf18 to further eliminate wasting time
calling helper functions only to early return to the main
run loop.

Once again, this results in significant savings based on
experimental data:
https://llvm-compile-time-tracker.com/compare.php?from=01023bfcd33f922ed8c934ce563e54abe8bfe246&to=3dce4f70b73e48ccb045decb634c185e6b4c67d5&stat=instructions:u

This is NFCI other than making the pass faster. The total
cost of VectorCombine runs in an -O3 build appears to be
well under 0.1% of compile-time now, so there's not much
left to do AFAICT.

There's a TODO about making the code cleaner, but it
probably doesn't change timing much. I didn't include those
changes here because it requires updating much more code.
2022-11-22 10:23:32 -05:00
Sanjay Patel
163bb6d64e [Passes][VectorCombine] enable early run generally and try load folds
An early run of VectorCombine was added with D102496 specifically to
deal with unnecessary vector ops produced with the C matrix extension.
This patch is proposing to try those folds in general and add a pair
of load folds to the menu.

The load transform will partly solve (see PhaseOrdering diffs) a
longstanding vectorization perf bug by removing redundant loads via GVN:
issue #17113

The main reason for not enabling the extra pass generally in the initial
patch was compile-time cost. The cost of VectorCombine was significantly
(surprisingly) improved with:
87debdadaf18
https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u

...so the extra run is going to cost very little now - the total cost of
the 2 runs should be less than the 1 run before that micro-optimization:
https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u

It may be possible to reduce the cost slightly more with a few more
earlier-exits like that, but it's probably in the noise based on timing
experiments.

Differential Revision: https://reviews.llvm.org/D138353
2022-11-21 13:57:55 -05:00
Sanjay Patel
8f337f8ffe [VectorCombine] generalize pass param name for early combines; NFC
The option was added with https://reviews.llvm.org/D102496,
and currently the name is accurate, but I am hoping to add
a load transform that is not a scalarization. See issue #17113.
2022-11-21 13:57:55 -05:00
Sanjay Patel
87debdadaf [VectorCombine] check instruction type before dispatching to folds
This is no externally visible change intended, but appears to be a
noticeable (surprising) improvement in compile-time based on:
https://llvm-compile-time-tracker.com/compare.php?from=0f3e72e86c8c7c6bf0ec24bf1e2acd74b4123e7b&to=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&stat=instructions:u

The early returns in the individual fold functions are not good
enough to avoid the overhead of the many "fold*" calls, so this
speeds up the main instruction loop enough to make a difference.
2022-11-18 16:03:18 -05:00
Sanjay Patel
b57819e130 [VectorCombine] widen a load with subvector insert
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.

We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining loads on those patterns is not
solved though.

Differential Revision: https://reviews.llvm.org/D137341
2022-11-10 14:11:32 -05:00
Sanjay Patel
710e34e136 [VectorCombine] move load safety checks to helper function; NFC
These checks can be re-used with other potential transforms
such as a load of a subvector-insert.
2022-11-04 10:39:37 -04:00
Sanjay Patel
8d76fbb5f0 [VectorCombine] fix crashing on match of non-canonical fneg
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).

By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
2022-10-17 10:47:48 -04:00
Sanjay Patel
baab4aa1ba [VectorCombine] convert scalar fneg with insert/extract to vector fneg
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask

This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.

This shows up in the motivating example from #issue 58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.

Differential Revision: https://reviews.llvm.org/D135278
2022-10-10 14:59:56 -04:00
Matt Arsenault
2adae8e1b7 VectorCombine: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Kazu Hirata
56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
Fangrui Song
7d6017fd31 [TTI] Change new getVectorInstrCost overload to use const reference after D131114
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131197
2022-08-04 15:16:51 -07:00
Mingming Liu
bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
David Green
4b7913c357 [VectorCombine] Only consider shuffle uses with the same type.
The backend getShuffleCosts do not currently handle shuffles that change
size very well. Limit the shuffles we collect to the same type to make
sure they do not cause issues as reported in D128732.
2022-07-16 13:23:39 +01:00
Sander de Smalen
519d7876cb [VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector.
This addresses https://github.com/llvm/llvm-project/issues/56377

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D129136
2022-07-07 08:37:04 +00:00
David Green
5493f8fc59 [VectorCombine] Improve shuffle select shuffle-of-shuffles
This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

This is a recommit with some additional checks for supported forms and
out-of-bounds mask elements, with some extra tests.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-05 17:16:18 +01:00
Nikita Popov
b69c75d53f Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles"
This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d.

Clang crashes while linking bullet from llvm-test-suite in
ReleaseLTO-g cmake configuration.
2022-07-05 09:31:20 +02:00
David Green
19a1e20b8a [VectorCombine] Improve shuffle select shuffle-of-shuffles
This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-04 13:38:43 +01:00
Nikita Popov
bdba8278d9 [VectorCombine] Avoid ConstantExpr::get() (NFC)
Use IRBuilder APIs instead, which will still constant fold.
2022-06-29 17:17:52 +02:00
Kazu Hirata
c399b3a608 [Vectorize] Use llvm::is_contained (NFC) 2022-06-18 15:49:15 -07:00