3855 Commits

Author SHA1 Message Date
Alexey Bataev
8ab962e411 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 07:19:49 -07:00
Alexey Bataev
bc8abb42bb Revert "[SLP]Relax assertion to check if the input scalars were extended to"
This reverts commit 6fdfc81287ecdc2a7f409d08538ec6ce2bd698da to fix the
check in the assert )need to use end, nod begin function).
2023-07-14 07:04:06 -07:00
Alexey Bataev
6fdfc81287 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 06:48:25 -07:00
Anna Thomas
1159266734 [SLP] Add support for fmaximum/fminimum reduction
This patch adds support for vectorized reduction of maximum/minimum
intrinsics which are under the appropriate reduction kind.

Differential Revision: https://reviews.llvm.org/D154463
2023-07-12 15:22:38 -04:00
Nikita Popov
94abecca6b [IVDescriptors] Remove typed pointer support (NFC)
This also removes the element type from the descriptor, as it is
always i8. The meaning of the step is now the same between
integers and pointers.
2023-07-12 15:48:29 +02:00
Florian Hahn
9259f41e62
[VPlan] Clear reduction flags directly as VPlanTransform.
After D150027, all relevant recipes should model their IR flags
directly. Instead of removing the flags after codegen as part of
fixReductions, drop poison generating flags directly from the recipes.

Depends on D150027.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150028
2023-07-09 21:11:51 +01:00
Florian Hahn
14ec3f4b06
[LV] Skip VFs > # iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 21:43:51 +01:00
Florian Hahn
aee851fd0e
Revert "[LV] Skip VFs < iterations remaining for epilogue vectorization."
This reverts commit 7cc0be01a0068946ea3613dc2cb45c81b0f45860.

The title of the commit is incorrect, revert to fix the commit message.
2023-07-07 21:41:24 +01:00
Florian Hahn
7cc0be01a0
[LV] Skip VFs < iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is less than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 20:33:42 +01:00
Florian Hahn
a0fcf84a8c
[LV] Consider if scalar epilogue is required in getMaximizedVFForTarget.
When a scalar epilogue is required, at least one iteration of the scalar loop
has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF
that results in a dead vector loop.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154261
2023-07-06 13:35:35 +01:00
Florian Hahn
2265bb064b
[LV] Update generateInstruction to return produced value (NFC).
Update generateInstruction to return the produced value instead of
setting it for each opcode. This reduces the amount of duplicated code
and is a preparation for D153696.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154240
2023-07-05 19:53:59 +01:00
Florian Hahn
1746ac42ca
[LV] Forget SCEVs for exit phis after vectorization.
After vectorization, the exit blocks of the original loop will have additional
predecessors. Invalidate SCEVs for the exit phis in case SE looked through
single-entry phis.

Fixes https://github.com/llvm/llvm-project/issues/63368
Fixes https://github.com/llvm/llvm-project/issues/63669
2023-07-04 21:28:03 +01:00
David Green
12025cef3e [CostModel] Use min/max intrinsics for vecreduce.min/max costs
This changes the costmodelling of the vecreduce.min/max nodes to use the costs
of the relevant min/max intrinsics instead of expanding them to compare and
selects. The getMinMaxReductionCost have changed to take a Opcode for the
relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are
no longer needed.

A follow up patch will add some basic fminimum/fmaximum costmodelling.

Differential Revision: https://reviews.llvm.org/D153547
2023-07-04 15:02:30 +01:00
Florian Hahn
39385c521d
[LV] Move getBroadcastInstr to VPTransformState.::get (NFCI).
getBroadcastInstrs is only used in VPTransformState::get. Move it closer
to use to reduce unnecessary interaction with ILV object.
2023-07-04 11:24:11 +01:00
Evgeniy Brevnov
d7329653d0 [VPlan] Allow sinking of instructions with no defs
We started seeing new failure after D142886. Looks like it enabled new cases and we hit an assert:
assert(Current->getNumDefinedValues() == 1 &&
           "only recipes with a single defined value expected");

 When we do instruction sinking for the first order recurrence we hit an assert if instruction doesn't have single def. In case instruction doesn't produce any new def there is no new users and nothing to sink.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D151204
2023-07-04 16:53:06 +07:00
Florian Hahn
b4efc0f070
[LV] Break up condition in selectEpilogueVectorizationFactor loop (NFCI)
Restructure the loop as suggested in D154264 to increase readability and
make it easier to extend.
2023-07-03 22:39:40 +01:00
Florian Hahn
55e7f1f786
[LV] Pass bool to requiresScalarEpilogue (NFC).
requiresScalarEpilogue only checks if the selected VF is vectorizing
(and not scalar). Update it to just take a boolean, to make it clearer
what information is used and to allow callers without a VF (used in a
follow-up patch).
2023-06-30 22:08:27 +01:00
Valery N Dmitriev
03b118c7e4 [SLP] Fix crash on attempt to access on invalid iterator state.
The patch fixes corner case when no of scalar instructions
required scheduling for vectorized node.

Differential Revision: https://reviews.llvm.org/D154175
2023-06-30 11:40:25 -07:00
Nikita Popov
cc31d787c3 Revert "Reland [SLP] Provide an universal interface for FixedVectorType::get. NFC."
This reverts commit 19b1d3bd7eeecbeb1e45045960faf325c7bc5c64.

Both the commit and the review are missing a patch description.
2023-06-30 11:31:16 +02:00
Han-Kuan Chen
19b1d3bd7e Reland [SLP] Provide an universal interface for FixedVectorType::get. NFC.
Differential Revision: https://reviews.llvm.org/D154114
2023-06-29 23:15:52 -07:00
Arthur Eubanks
a374fb2b5e Revert "[SLP] Provide an universal interface for FixedVectorType::get. NFC."
This reverts commit fcd58ea50c218b61a58d6815b9d15bad7dbc75a3.

Causes crashes, see comments on D154114.
2023-06-29 21:49:05 -07:00
Han-Kuan Chen
fcd58ea50c [SLP] Provide an universal interface for FixedVectorType::get. NFC.
Differential Revision: https://reviews.llvm.org/D154114
2023-06-29 17:06:08 -07:00
Igor Kirillov
17bde328d6 [LV] Add mask support for vectorizing interleaved groups
This patch extends LoopVectorize to handle the vectorization of interleaved
memory accesses with scalable vectors when mask is required or/and predicated
tail folding is enabled.

Differential Revision: https://reviews.llvm.org/D152258
2023-06-29 17:50:56 +00:00
Luke Lau
d0d864f6f4 [SLP] Explicitly pass AccessTy to getGEPCost
Building on D149889, this patch updates SLP to pass the vector type as
the AccessTy to getGEPCost.
This should have the effect of GEPs being costed for more often instead
of being treated as foldable into the address mode and thus free, as
some architectures, notably RISC-V, do not have offset+reg addressing
modes for vector memory accesses.

Note that in SLP, GEPs are costed in two places: getPointersChainCost
and GetGEPCostDiff.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D153570
2023-06-29 18:42:24 +01:00
Luke Lau
a68dcd09e8 [TTI] Use users of GEP to guess access type in getGEPCost
Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load i32, ptr %p                           ; getInstructionCost = 1
------------------------------------------------------------------------
lw t0, a0(42)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, 42
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, noticeably during SLP where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.

Differential Revision: https://reviews.llvm.org/D149889
2023-06-29 13:44:37 +01:00
Alexey Bataev
5d2cc8e242 [SLP]Fix emission of buildvectors with full match.
If the buildvector node is a full match of another node, need to
correctly build the mask for the original vector value and build common
mask for the emitted node.
2023-06-28 13:47:08 -07:00
David Green
9c7aab362a [SLP] Use vector types for cmp alt instructions costs
Similar to the other code that costs main/alt instructions, the cmp should be
using the VecTy for the costs, not the ScalarTy.

One of the tests look like it gets worse just because it is not simplified to
0.

Differential Revision: https://reviews.llvm.org/D153507
2023-06-28 21:02:29 +01:00
Youngsuk Kim
243f0566dc [llvm] Replace uses of Type::getPointerTo (NFC)
Partial progress towards removing in-tree uses of `Type::getPointerTo`,
before we can deprecate the API.

If the API is used solely to support an unnecessary bitcast, get rid of
the bitcast as well.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153933
2023-06-28 09:21:34 -04:00
Alexey Bataev
a8f1a3e025 [SLP]Fix PR63141: compareCmp is not strict weak ordering.
Added some extra checks for comapreCMP function if IsCompatibility is
false to make it meat the strict weak ordering requirements to be
correctly used in sort functions.
2023-06-28 06:00:31 -07:00
Alexey Bataev
9d4fbcd5ff Revert "[SLP]Fix PR63141: compareCmp is not strict weak ordering."
This reverts commit f3ebd88064d7f1c36a8272b3e5f7d53501c3f53b to pacify
windows-based buildbots.
2023-06-28 04:37:27 -07:00
Alexey Bataev
f3ebd88064 [SLP]Fix PR63141: compareCmp is not strict weak ordering.
Added some extra checks for comapreCMP function if IsCompatibility is
false to make it meat the strict weak ordering requirements to be
correctly used in sort functions.
2023-06-27 14:31:55 -07:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Kazu Hirata
3f8ed16c67 [Transforms] Remove unused forward declaration PredicateScalarEvolution
The declaration was added without a corresponding class definition by:

  commit a84064bcda1a737658d33e96ca58516d01af70a6
  Author: Florian Hahn <flo@fhahn.com>
  Date:   Wed Dec 21 22:02:31 2022 +0000

It is most likely a misspelling of PredicatedScalarEvolution.
2023-06-22 23:45:52 -07:00
Anna Thomas
0ce2ce2a47 Revert "Fix mlir windows buildbot after ec146cb7c0b4"
This reverts commit 46c2e1fdb3065542bed96ba944ae4a58465d2b5e.

The fix was already landed in 51e917d4af.
2023-06-20 15:29:53 -04:00
Anna Thomas
b6979698ea Fix mlir windows buildbot after ec146cb7c0b4
ec146cb7c0b4 added two new ReducKinds for float maximum and minimum
(along with support in loop vectorizer).

Enumerate these kinds in SLPVectorizer to fix mlir windows build bot
errors:

C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(13883): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(13883): warning C4062: enumerator 'llvm::RecurKind::FMinimum' in switch of enum 'llvm::RecurKind' is not handled
2023-06-20 15:24:36 -04:00
Benjamin Kramer
51e917d4af [SLP] Silence -Wswitch warning after ec146cb7c0b4a162ee73463e6c7bb306b99e013b
{min,max}imum(X, X) is X so we can take the same path as min/max.
2023-06-20 19:51:08 +02:00
Florian Hahn
0a246a0c72
[LV] Use VPValues when creating GEP with all invariant indices.
Update VPWidenGEPRecipe::execute to use the VPValue operands of the
recipe when creating the GEP instruction.

Fixes #63340.
2023-06-16 16:14:01 +01:00
Florian Hahn
ea6ca9cb2b
[LV] Fix crash when stride isn't a constant.
In same cases, the stride may not be a constant. Just skip those cases
for now. This should only happen for cases where LV interleaves only, if
it is vectorized the stride needs to be versioned to a constant.
2023-06-14 16:53:34 +01:00
Mikael Holmen
ac9b9e3aad [SLPVectorizer] Don't include isAssumeLikeIntrinsics in ScheduleRegionSize
We don't want the existence of debug instructions affect codegen so we now
ignore debug instructions and other "isAssumeLikeIntrinsics in the
"extend schedule region" search loop in
BoUpSLP::BlockScheduling::extendSchedulingRegion.

Differential Revision: https://reviews.llvm.org/D152441
2023-06-14 13:00:15 +02:00
Florian Hahn
d209084720
[VPlan] Replace versioned stride with constant during VPlan opts.
After constructing the initial VPlan, replace VPValues for versioned
strides with their constant counterparts.

Differential Revision: https://reviews.llvm.org/D147783
2023-06-13 08:26:55 +01:00
Ivan Kelarev
884bf2e1f1 [NFC][SLP] Fix a few minor formatting issues
Differential Revision: https://reviews.llvm.org/D152395
2023-06-12 14:22:46 -07:00
Florian Hahn
df357a71dd
[VPlan] Use step from induction recipe directly. (NFC)
Directly use the step of the transformed induction instead of creating
a new step. This allows replacing all uses of strides in D147783.
2023-06-11 21:40:06 +01:00
Kazu Hirata
c963892a45 [llvm] Use DenseMapBase::lookup (NFC) 2023-06-10 09:02:25 -07:00
Bjorn Pettersson
d2223221e7 [LoadStoreVectorizer] Optimize check for IsAllocaAccess. NFC
Swap order for checking address space and the strip pointer cast
when analyzing if a load/store is accessing an alloca. This to
make sure we do the cheaper check first.

This is done as a follow up to D152386.
2023-06-09 21:43:42 +02:00
Bjorn Pettersson
263bc7f905 [LoadStoreVectorizer] Only upgrade align for alloca
In commit 2be0abb7fe72ed453 (D149893) the load store vectorized was
reimplemented. One thing that can happen with the new LSV is that
it can increase the align of alloca and global objects. However,
the code comments indicate that the intention only was to increase
alignment of alloca.
Now we will use stripPointerCasts to analyse if the load/store really
is accessing an alloca (same as getOrEnforceKnownAlignment is using).
And then we only try to change the align if we find an alloca
instruction. This way the code will match better with code comments,
and we won't change alignment of non-stack variables to use the
"StackAdjustedAlignment".

Differential Revision: https://reviews.llvm.org/D152386
2023-06-09 15:33:35 +02:00
Graham Hunter
95bfb1902d [LV][AArch64] Allow (limited) interleaving for scalable vectors
This patch uses the (de)interleaving intrinsics introduced in
D141924 to handle vectorization of interleaving groups with a
factor of 2 for scalable vectors.

Reviewed By: fhahn, reames

Differential Revision: https://reviews.llvm.org/D145163
2023-06-09 11:42:10 +01:00
Florian Hahn
8f781b96e2
Revert "[VPlan] Mark recurrence recipes as not having side-effects."
This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511.

At the moment, live-outs used *only* for the resume values in the scalar
loop are not modeled in VPlan yet. This means first-order recurrence
recipes could be removed, when a scalar epilogue is required and the
only use of a FOR is outside the loop.

Keep treating recurrence recipes as having side-effects for now, to
avoid them being removed.

Fixes #62954.
2023-06-06 11:35:26 +02:00
Nikita Popov
143ed21b26 Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)"
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7.

In preparation for reverting a dependent revision.
2023-06-05 16:45:38 +02:00
Florian Hahn
e19297471a
[LV] Check if value was already not uniform for previous VF.
If the value was already known to not be uniform for the previous
(smaller VF), it cannot be uniform for the larger VF.

This slightly reduces compile-time, once uniformity checks are becoming
a bit more expensive due to using SCEV rewriting (D148841).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D151658
2023-06-04 20:31:01 +01:00
Florian Hahn
e48b1e87a3
[LV] Split off invariance check from isUniform (NFCI).
After 572cfa3fde5433, isUniform now checks VF based uniformity instead of
just invariance as before.

As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.

This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccesAnalysis doesn't use the more
general isUniform.
2023-06-01 19:09:11 +01:00