1423 Commits

Author SHA1 Message Date
Alexey Bataev
85635c7f60 [SLP][NFC]Use ScalarTy consistently in getEntryCost, NFC. 2023-07-31 06:52:56 -07:00
Alexey Bataev
48bc5b0a29 [SLP][PR64099]Fix unsound undef to poison transformation when handling
insertelement instructions.

If the original vector has undef, not poison values, which are not
rewritten by later insertelement instructions, need to transform shuffle
with the undef vector, not a poison vector, and actual indices, not
PoisonMaskElem, otherwise the transformation may produce more poisons
output than the input.
2023-07-27 16:09:49 -07:00
Fangrui Song
e8e7a959c7 [SLP] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D154891 2023-07-24 09:47:50 -07:00
Alexey Bataev
44eca64224 [SLP]Check scalars before trying scheduling.
Need to check the scalars if they can be vectorized before trying to
schedule them. It may save compile time and improve vectorization on
large functions/basic blocks.

Differential Revision: https://reviews.llvm.org/D154891
2023-07-24 09:25:19 -07:00
David Berard
8fa02db8cf [llvm][SLP] Exit early if inputs to comparator are equal
**TL;DR:** This PR modifies a comparator. The comparator is used in a subsequent call to llvm::stable_sort. Sorting comparators should follow strict weak ordering - in particular, (x < x) should return false. This PR adds a fix to avoid an infinite loop when the inputs to the comparator are equal.

**Details**:

Sometimes when two equivalent tensors passed into the comparator, we encounter infinite looping (at aae2eaae2c/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (L4049))

Although it seems like this comparator will never be called with two equivalent pointers, some sanitizers, e.g. https://chromium.googlesource.com/chromiumos/third_party/gcc/+/refs/heads/stabilize-zako-5712.88.B/libstdc++-v3/include/bits/stl_algo.h#360, will add checks for (x < x). When this sanitizer is used with the current implementation, it triggers a comparator check for (x < x) which runs into the infinite loop

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D155874
2023-07-21 05:40:55 -07:00
Alexey Bataev
aae2eaae2c [SLP]Fix a crash when trying to cast scalable vector type to fixed.
Need to check for FixedVectorType, not a vector type, since later
compiler performs unconditional cast to FixedVectorType and gets the
number of elements in this type.
2023-07-19 11:53:49 -07:00
Alexey Bataev
4bbf37199c [SLP][NFC]Improve compile-time by using map {TreeEntry *, Instruction *}
in getLastInstructionInBundle(), NFC.

Instead of building EntryToLastInstruction before the vectorization,
build it automatically during the calls to getLastInstructionInBundle()
function.
2023-07-18 13:24:55 -07:00
Alexey Bataev
83ba148a8a [SLP]Include cost of the reshuffling for same nodes with resizing.
Need to account reshuffling, required for the reused elements in the
buildvector nodes, which are copies (perfect match) of other nodes, but
include reused elements.

Differential Revision: https://reviews.llvm.org/D149966
2023-07-18 06:05:15 -07:00
Arthur Eubanks
1cb3fbc713 Revert "[SLP][NFC]Improve compile-time by using map {TreeEntry *, Instruction *}"
This reverts commit 0d21b7cbdeb2f2eb5ef123a15099da0b651b24c0.

Causes broken IR, test case provided at
https://reviews.llvm.org/rG0d21b7cbdeb2f2eb5ef123a15099da0b651b24c0
2023-07-17 14:54:47 -07:00
Alexey Bataev
d8d4c99685 [SLP][NFC]Improve performance of isGatherShuffledEntry() function, NFC.
Transformed if checks to asserts and simplified some more code to
improve compile time.
2023-07-17 14:08:56 -07:00
Alexey Bataev
0d21b7cbde [SLP][NFC]Improve compile-time by using map {TreeEntry *, Instruction *}
in getLastInstructionInBundle(), NFC.

Instead of building EntryToLastInstruction before the vectorization,
build it automatically during the calls to getLastInstructionInBundle()
function.
2023-07-17 11:36:21 -07:00
Alexey Bataev
8ab962e411 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 07:19:49 -07:00
Alexey Bataev
bc8abb42bb Revert "[SLP]Relax assertion to check if the input scalars were extended to"
This reverts commit 6fdfc81287ecdc2a7f409d08538ec6ce2bd698da to fix the
check in the assert )need to use end, nod begin function).
2023-07-14 07:04:06 -07:00
Alexey Bataev
6fdfc81287 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 06:48:25 -07:00
Anna Thomas
1159266734 [SLP] Add support for fmaximum/fminimum reduction
This patch adds support for vectorized reduction of maximum/minimum
intrinsics which are under the appropriate reduction kind.

Differential Revision: https://reviews.llvm.org/D154463
2023-07-12 15:22:38 -04:00
David Green
12025cef3e [CostModel] Use min/max intrinsics for vecreduce.min/max costs
This changes the costmodelling of the vecreduce.min/max nodes to use the costs
of the relevant min/max intrinsics instead of expanding them to compare and
selects. The getMinMaxReductionCost have changed to take a Opcode for the
relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are
no longer needed.

A follow up patch will add some basic fminimum/fmaximum costmodelling.

Differential Revision: https://reviews.llvm.org/D153547
2023-07-04 15:02:30 +01:00
Valery N Dmitriev
03b118c7e4 [SLP] Fix crash on attempt to access on invalid iterator state.
The patch fixes corner case when no of scalar instructions
required scheduling for vectorized node.

Differential Revision: https://reviews.llvm.org/D154175
2023-06-30 11:40:25 -07:00
Nikita Popov
cc31d787c3 Revert "Reland [SLP] Provide an universal interface for FixedVectorType::get. NFC."
This reverts commit 19b1d3bd7eeecbeb1e45045960faf325c7bc5c64.

Both the commit and the review are missing a patch description.
2023-06-30 11:31:16 +02:00
Han-Kuan Chen
19b1d3bd7e Reland [SLP] Provide an universal interface for FixedVectorType::get. NFC.
Differential Revision: https://reviews.llvm.org/D154114
2023-06-29 23:15:52 -07:00
Arthur Eubanks
a374fb2b5e Revert "[SLP] Provide an universal interface for FixedVectorType::get. NFC."
This reverts commit fcd58ea50c218b61a58d6815b9d15bad7dbc75a3.

Causes crashes, see comments on D154114.
2023-06-29 21:49:05 -07:00
Han-Kuan Chen
fcd58ea50c [SLP] Provide an universal interface for FixedVectorType::get. NFC.
Differential Revision: https://reviews.llvm.org/D154114
2023-06-29 17:06:08 -07:00
Luke Lau
d0d864f6f4 [SLP] Explicitly pass AccessTy to getGEPCost
Building on D149889, this patch updates SLP to pass the vector type as
the AccessTy to getGEPCost.
This should have the effect of GEPs being costed for more often instead
of being treated as foldable into the address mode and thus free, as
some architectures, notably RISC-V, do not have offset+reg addressing
modes for vector memory accesses.

Note that in SLP, GEPs are costed in two places: getPointersChainCost
and GetGEPCostDiff.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D153570
2023-06-29 18:42:24 +01:00
Luke Lau
a68dcd09e8 [TTI] Use users of GEP to guess access type in getGEPCost
Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load i32, ptr %p                           ; getInstructionCost = 1
------------------------------------------------------------------------
lw t0, a0(42)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, noticeably during SLP where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.

Differential Revision: https://reviews.llvm.org/D149889
2023-06-29 13:44:37 +01:00
Alexey Bataev
5d2cc8e242 [SLP]Fix emission of buildvectors with full match.
If the buildvector node is a full match of another node, need to
correctly build the mask for the original vector value and build common
mask for the emitted node.
2023-06-28 13:47:08 -07:00
David Green
9c7aab362a [SLP] Use vector types for cmp alt instructions costs
Similar to the other code that costs main/alt instructions, the cmp should be
using the VecTy for the costs, not the ScalarTy.

One of the tests look like it gets worse just because it is not simplified to
0.

Differential Revision: https://reviews.llvm.org/D153507
2023-06-28 21:02:29 +01:00
Youngsuk Kim
243f0566dc [llvm] Replace uses of Type::getPointerTo (NFC)
Partial progress towards removing in-tree uses of `Type::getPointerTo`,
before we can deprecate the API.

If the API is used solely to support an unnecessary bitcast, get rid of
the bitcast as well.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153933
2023-06-28 09:21:34 -04:00
Alexey Bataev
a8f1a3e025 [SLP]Fix PR63141: compareCmp is not strict weak ordering.
Added some extra checks for comapreCMP function if IsCompatibility is
false to make it meat the strict weak ordering requirements to be
correctly used in sort functions.
2023-06-28 06:00:31 -07:00
Alexey Bataev
9d4fbcd5ff Revert "[SLP]Fix PR63141: compareCmp is not strict weak ordering."
This reverts commit f3ebd88064d7f1c36a8272b3e5f7d53501c3f53b to pacify
windows-based buildbots.
2023-06-28 04:37:27 -07:00
Alexey Bataev
f3ebd88064 [SLP]Fix PR63141: compareCmp is not strict weak ordering.
Added some extra checks for comapreCMP function if IsCompatibility is
false to make it meat the strict weak ordering requirements to be
correctly used in sort functions.
2023-06-27 14:31:55 -07:00
Anna Thomas
0ce2ce2a47 Revert "Fix mlir windows buildbot after ec146cb7c0b4"
This reverts commit 46c2e1fdb3065542bed96ba944ae4a58465d2b5e.

The fix was already landed in 51e917d4af.
2023-06-20 15:29:53 -04:00
Anna Thomas
b6979698ea Fix mlir windows buildbot after ec146cb7c0b4
ec146cb7c0b4 added two new ReducKinds for float maximum and minimum
(along with support in loop vectorizer).

Enumerate these kinds in SLPVectorizer to fix mlir windows build bot
errors:

C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(13883): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(13883): warning C4062: enumerator 'llvm::RecurKind::FMinimum' in switch of enum 'llvm::RecurKind' is not handled
2023-06-20 15:24:36 -04:00
Benjamin Kramer
51e917d4af [SLP] Silence -Wswitch warning after ec146cb7c0b4a162ee73463e6c7bb306b99e013b
{min,max}imum(X, X) is X so we can take the same path as min/max.
2023-06-20 19:51:08 +02:00
Mikael Holmen
ac9b9e3aad [SLPVectorizer] Don't include isAssumeLikeIntrinsics in ScheduleRegionSize
We don't want the existence of debug instructions affect codegen so we now
ignore debug instructions and other "isAssumeLikeIntrinsics in the
"extend schedule region" search loop in
BoUpSLP::BlockScheduling::extendSchedulingRegion.

Differential Revision: https://reviews.llvm.org/D152441
2023-06-14 13:00:15 +02:00
Ivan Kelarev
884bf2e1f1 [NFC][SLP] Fix a few minor formatting issues
Differential Revision: https://reviews.llvm.org/D152395
2023-06-12 14:22:46 -07:00
Alexey Bataev
95b631181a [SLP]Fix getSpillCost functions.
There are several issues in the current implementation. The instructions
are not properly ordered, if they are placed in different basic blocks,
need to reverse the order of blocks. Also, need to exclude
non-vectorizable nodes and check for CallBase, not CallInst, otherwise
invoke calls are not handled correctly.
2023-05-26 12:19:28 -07:00
Alexey Bataev
ae5ff3ca0c [SLP]Fix PR62665: compiler crash when trying to access non-existing mask
element.

Need to check at first if the SubMask element is PoisonMaskElem to avoid
compiler crash.
2023-05-22 13:43:25 -07:00
Luke Lau
c27a0b21c5 [SLP][RISCV] Account for offset folding in getPointersChainCost
For a GEP in a pointer chain, if:
1) a pointer chain is unit-strided
2) the base pointer wasn't folded and is sitting in a register somewhere
3) the distance between the GEP and the base pointer is small enough and
   can be folded into the addressing mode of the using load/store

Then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.

In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).

Also note that 2) is currently an assumption, and could be modelled more
accurately.

This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.

For now the getPointersChainCost hook is duplicated for RISC-V to prevent
disturbing other targets, but could be merged back in and shared with
other targets in a following patch.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D149654
2023-05-22 13:55:30 +01:00
Vasileios Porpodas
806dea46be [SLP] Cleanup: Remove tryToVectorizePair(), most probably NFC
`tryToVectorizePair()` adds a level of indirection over `tryToVectorizeList()`.
I am not really sure why it is needed, it looks redundant.

I replaced all calls to `tryToVectorizePair()` with calls to
`tryToVectorizeList()` and I am not seeing any failures.

Differential Revision: https://reviews.llvm.org/D151004
2023-05-19 20:25:20 -07:00
Vasileios Porpodas
338fc76200 [SLP][NFC] Cleanup: Remove KeyNodes set.
I don't see a good reason form having the `KeyNodes` set.
This patch removes the set.

Differential Revision: https://reviews.llvm.org/D150918
2023-05-19 10:30:02 -07:00
Alexey Bataev
9a7248f561 [SLP]Fix crash for scalarized vectors.
Need to remove insertion of the nodes to the InVector in case of
scalarized vectors too to avoid compiler crashes.
2023-05-17 06:32:22 -07:00
Luke Lau
d088b8af93 [SLP] Rename IsUniformStride to IsUnitStride. NFCI
IsUniformStride is only used when the stride is a unit-stride, i.e. in a
plain wide vector load. This tightens the condition and renames it to
isUnitStride. It removes the old unused getUniformStrided() variant, as
isUnitStride should now imply that the stride is known.

Reviewed By: vdmitrie, ABataev

Differential Revision: https://reviews.llvm.org/D150662
2023-05-17 13:21:33 +01:00
Alexey Bataev
6c7acc6409 [SLP][NFC]Add missing finalize params in the CostEstimator, NFC.
Prepare functions for generalization of codegen/cost estimation.

Differential Revision: https://reviews.llvm.org/D150121
2023-05-15 11:17:37 -07:00
Vasileios Porpodas
ddb2188afc [SLP][NFC] Cleanup: Separate vectorization of Inserts and CmpInsts.
This deprecates `vectorizeSimpleInstructions()` and replaces it with separate
functions that vectorize CmpInsts and Inserts.

Differential Revision: https://reviews.llvm.org/D149993
2023-05-15 10:12:34 -07:00
Vasileios Porpodas
dda2a5d457 [SLP][NFC] Rename a couple of variables and replace an if-else with an std::min
- Rename `LimitForRegisterSize` to `MaxVFOnly` to make the meaning of the limit less ambiguous
- Rename `OpsWidth` to `ActualVF`, which makes it clear that this is the VF we are using for vectorization.
- Replace the if-else code for the initialization of OpsWidth with an std::min.

Differential Revision: https://reviews.llvm.org/D150241
2023-05-10 09:37:58 -07:00
Alexey Bataev
2672c6e4dc [SLP][NFC]Add processBuildVector member function, NFC.
Introduce processBuildVector as a next step to generalize code for cost
estimation and code emission for gather/buildvector nodes.

Differential Revision: https://reviews.llvm.org/D149973
2023-05-05 11:00:53 -07:00
Vasileios Porpodas
7749f6e976 [SLP][NFC] Cleanup: Outline the code that vectorizes CmpInsts into a seaparate function.
Differential Revision: https://reviews.llvm.org/D149919
2023-05-05 09:56:41 -07:00
Alexey Bataev
ca3f4236e4 [SLP][NFC]Add/use gather and createFreeeze member functions in
ShuffleInstructionBuilder, NFC.
2023-05-05 09:12:54 -07:00
Alexey Bataev
d726f99d43 [SLP][NFC]Do not try to revectorize instructions with constant operands, NFC.
The pass should not try to revectorize instructions with constant
operands, which were not folded by the IRBuilder. It prevents the
non-terminating loop in the SLP vectorizer for non foldable constant
operations.
2023-05-04 13:52:42 -07:00
Alexey Bataev
c0e5e7db9a [SLP]Fix a crash trying finding insert point for GEP nodes with non-gep
insts.

If the vectorizable GEP node is built, which should not be scheduled,
and at least one node is a non-gep instruction, need to insert the
vectorized instructions before the last instruction in the list, not
before the first one, otherwise the instructions may be emitted in the
wrong order.
2023-05-04 09:43:37 -07:00
Vasileios Porpodas
ce46e1aa76 [NFC][SLP] Cleanup: Simplify traversal loop in SLPVectorizerPass::vectorizeHorReduction().
This includes a couple of changes:
1. Moves the code that changes the root node out of the `TryToReduce` lambda and out of the traversal loop.
2. Since that code moved, there isn't much left in `TryToReduce` so the code was inlined.
3. The phi node variable `P` was also being used as a flag that turns on/off the exploration of operands as new seeds. This patch uses a new variable `TryOperandsAsNewSeeds` for this.
4. Simplifies the code executed when vectorization fails.

The logic of the code should be identical to the original, but I may be missing something not caught by tests.

Differential Revision: https://reviews.llvm.org/D149627
2023-05-03 12:48:01 -07:00