4068 Commits

Author SHA1 Message Date
Graham Hunter
b070629c10
[LV] Increase max VF if vectorized function variants exist (#66639)
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
2023-11-13 10:27:10 +00:00
Florian Hahn
34c2dcd5ac
[VPlan] Move initial skeleton construction to createInitialVPlan. (NFC)
This patch moves creating the  middle VPBBs and an initial empty
vector loop region for the top-level loop to createInitialVPlan.

This consolidates code to create the initial VPlan skeleton and enables
adding other bits outside the main region during initial VPlan
construction. In particular, D150398 will add the exit check & branch to
the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158333
2023-11-12 13:00:44 +00:00
Michael Maitland
acef83c142
[VectorCombine] Fix crash in scalarizeVPIntrinsic (#72039)
When getSplatOp returns nullptr, the intrinsic cannot be scalarized.
This patch includes a test case that fixes a crash from trying to
scalarize the VPIntrinsic when getSplatOp returns nullptr.

This fixes https://github.com/llvm/llvm-project/issues/72034.
2023-11-11 19:54:15 -05:00
Florian Hahn
ed6f4994d8
[VPlan] Handle conditional ordered reductions with scalar VFs.
VPReductionRecipe::execute was not handling predicates for ordered
reduction with scalar VFs, which was causing a crash. Thsi patch adds
dedicated handling for scalar VFs when dealing with the condition.
The other operands are already handled in a similar fashion below.

Fixes #70988.
2023-11-11 12:55:40 +00:00
Tom Stellard
2400c54c37
[Vectorize] Remove Transforms/Vectorize.h (#71294)
The only thing in this file is a declaration for
createLoadStoreVectorizerPass(), and this function is already declared
in LoadStoreVectorizer.h.
2023-11-06 14:04:22 -08:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Florian Hahn
a002271972
[VPlan] Add VPValue::replaceUsesWithIf (NFCI).
Add replaceUsesWithIf helper and use it in a few places.
2023-11-06 16:08:22 +00:00
Alexey Bataev
ac254fc055 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-06 07:29:27 -08:00
Hans Wennborg
046c57e705 Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This causes asserts:

  llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082:
  Value *llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts(
    const TreeEntry *, MutableArrayRef<int>, unsigned int, bool &):
  Assertion `Part == 0 && "Expected firs part."' failed.

See comment on the code review.

> Currently tryToGatherExtractElements function analyzes the whole vector,
> regrdless number of actual registers, used in this vector. It may
> prevent some optimizations, because per-register analysis may allow to
> simplify the final code by reusing more already emitted vectors and
> better shuffles.
>
> Differential Revision: https://reviews.llvm.org/D148855

This reverts commit 9dfdbd788707edc8c39eb2bff16004aba1f3586b.
2023-11-06 13:56:42 +01:00
Alexey Bataev
9dfdbd7887 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-03 10:43:58 -07:00
Florian Hahn
fd82b5b287
[LV] Support recieps without underlying instr in collectPoisonGenRec.
Support recipes without underlying instruction in
collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null
the underlying value.

Fixes https://github.com/llvm/llvm-project/issues/70590.
2023-11-03 10:21:14 +00:00
Martin Storsjö
66152f4eed Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This reverts commit 3e6d7c6d983dd5896e3a03857584654eb1360fda.

That commit caused miscompilation of ffmpeg's libavcodec/vp9dsp_8bpp.o
on aarch64; the file still compiles correctly, but no longer produces
the right result - see https://reviews.llvm.org/D148855#4655968
for details.
2023-11-03 00:08:17 +02:00
Alexey Bataev
3026c13612 [SLP][NFC]Remove commented out code, NFC. 2023-11-02 11:06:56 -07:00
Alexey Bataev
495ed8d8c8 [SLP]Fix PR70507: freeze poisonous insts to avoid poison propagation.
If the reduction instruction is not bool logical op, but reduced within bool logical op reduction list, need to freeze to avoid poison propagation.
2023-11-02 10:37:38 -07:00
Alexey Bataev
3e6d7c6d98 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-01 10:42:35 -07:00
Alexey Bataev
6e8d957a22 Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This reverts commit 0a34aaedd8ec2dc2375076976c1327fdbfd7877f to fix
fails reported in https://lab.llvm.org/buildbot/#/builders/265/builds/40
2023-11-01 08:52:31 -07:00
Alexey Bataev
c28b7eb496 [SLP]Fix handling of -slp-vectorize-hor-store for values with many uses. 2023-11-01 08:41:54 -07:00
Alexey Bataev
0a34aaedd8 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-01 07:44:49 -07:00
Nikita Popov
79bd1d8a95 [SLPVectorizer] Avoid use of ConstantExpr::getIntegerCast() (NFC)
We are working on a ConstantInt here, so folding will always
succeed. This just avoids use of the ConstantExpr API.
2023-11-01 12:25:55 +01:00
Alexey Bataev
4c997e1536 [SLP]Fix PR70507: emit freeeze whenever required for bool logical ops in
the middle of reduction ops.

Need to emit freeze instruction not only in the case, where the root is
bool logical op, but also if we reduce several scalars, but unable to
say precisely, if the root is bool logical op.
2023-10-31 12:23:12 -07:00
Nikita Popov
6a06155c53 [VectorCombine] Discard ScalarizationResults if transform aborted
Fixes https://github.com/llvm/llvm-project/issues/69820.
2023-10-31 11:24:30 +01:00
Alexey Bataev
9da19e4340 [SLP]Fix PR70507: correctly handle bool logical ops in reductions.
If the very first reduction operation is not bool logical op, but some
others are, still need to emit the boo logic op for all the extra
reduction operations to avoid incorrect poison propagation.
2023-10-30 14:09:08 -07:00
Alexey Bataev
af15c46777 [SLP]Do not crash if number of vector registers does not feet the vector
type.

Need to check, if the number of vector registers, returned by TTI, is
not greater than total number of mask element and not zero, before
trying to perform any operations. TTI still may return non-valid number
of registers.
2023-10-30 07:30:52 -07:00
Igor Kirillov
70904226e1
[LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (#69588)
* Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known
to be predicate tail-folded, delegating to `areRuntimeChecksProfitable`
to decide on the profitability of vectorizing loops with runtime checks.
* Update the `areRuntimeChecksProfitable` function to consider the
`ScalarEpilogueLowering` setting when assessing vectorization of a loop.

With this patch, we can make more informed decisions for loops with low
trip counts, especially when leveraging Profile-Guided Optimization
(PGO) data.
2023-10-30 13:43:26 +00:00
Florian Hahn
b0b88643a1
[VPlan] Add initial anlysis to infer scalar type of VPValues. (#69013)
This patch adds initial type inferrence for VPValues. It infers the
scalar type of a VPValue, by bottom-up traversing through defining
recipes until root nodes with known types are reached (e.g. live-ins or
load recipes). The types are then propagated top down through
operations.

This is intended as building block for a VPlan-based cost model, which
will need access to type information for VPValues/recipes.

Initial testing is done by asserting the inferred type matches the type
of the result value generated for a widen and replicate recipes.
2023-10-27 14:38:28 +01:00
Florian Hahn
cff6652129
[VPlan] Handle VPValues without underlying values in getTypeForVPValue.
Fixes a crash after 0c8e5be6fa08.

Full type inference will be added in
https://github.com/llvm/llvm-project/pull/69013
2023-10-27 13:34:54 +01:00
Alexey Bataev
196d154ab7 [SLP]Improve isGatherShuffledEntry by trying per-register shuffle.
Currently when building gather/buildvector node, we try to build nodes
shuffles without taking into account separate vector registers. We can
improve final codegen and the whole vectorization process by including
this info into the analysis and the vector code emission, allows to emit
better vectorized code.

Differential Revision: https://reviews.llvm.org/D149742
2023-10-26 08:51:37 -07:00
Alexey Bataev
c65ec9d919 Revert "[SLP]Improve isGatherShuffledEntry by trying per-register shuffle."
This reverts commit 560bad013ebcb8d2c2c1722e35270b9a70ab40ce to fix
a bug reported in https://lab.llvm.org/buildbot/#/builders/5/builds/37763.
2023-10-26 08:36:50 -07:00
Alexey Bataev
560bad013e [SLP]Improve isGatherShuffledEntry by trying per-register shuffle.
Currently when building gather/buildvector node, we try to build nodes
shuffles without taking into account separate vector registers. We can
improve final codegen and the whole vectorization process by including
this info into the analysis and the vector code emission, allows to emit
better vectorized code.

Differential Revision: https://reviews.llvm.org/D149742
2023-10-26 05:57:03 -07:00
Kazu Hirata
f9306f6de3
[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector.  This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.

We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.

Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
2023-10-24 23:03:13 -07:00
Valery Dmitriev
3324776d9c
[SLP] Improve gather tree nodes matching when users are PHIs. (#70111)
This is re-commit of #69392 and also fixes issue #69670 which was
uncovered with the prior commit.
For delayed gather emission it may be incorrect to use stab instruction
as insertion point if it is a PHI operand. For that case insertion point
is adjusted to be at the end of block, ensuring that prior dependecy
vector code is emitted earlier.
2023-10-24 16:39:36 -07:00
Alexey Bataev
a3c68754b0 [SLP][NFC]Remove unused variables, NFC. 2023-10-24 14:36:33 -07:00
Alexey Bataev
d79051f894 [SLP]Fix PR70004: Do not change insert point for reduction gather nodes.
No need to change the insert point for reduction gather node, we can use
the ReductionRoot as insert point instead to avoid possible crashes.
2023-10-24 09:19:59 -07:00
Alexey Bataev
8d307f59ee [SLP]Fix PR69246: do not treat resizing maskas identity.
If the mask is resizing and the mask size is greater than than the
length of the vector, being reused from extractelement instructions, the
mask for undefs cannot be treated as identity, must be treated as
a broadcast.
2023-10-24 08:14:13 -07:00
Nabeel Omer
8e31acf8ca
[VectorCombine] Add special handling for truncating shuffles (#70013)
When dealing with a truncating shuffle, we can end up in a situation
where the type passed to getShuffleCost is the type of the result of the
shuffle, and the mask references an element which is out of bounds of
the result vector.

If dealing with truncating shuffles, pass the type of the input vectors
to `getShuffleCost()` in order to avoid an out-of-bounds assertion.
2023-10-24 15:03:43 +01:00
Alexey Bataev
254558ac53 [SLP]Fix PR69976: Check for multi-node uses during node building.
Need to check if there is already a node created for the multi-node
instruction before ending up with creating a new node for such
instructions.
2023-10-24 07:01:46 -07:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Kazu Hirata
3af0ff99b1 [llvm] Stop including llvm/ADT/DepthFirstIterator.h (NFC)
Identified with misc-include-cleaner.
2023-10-22 12:15:46 -07:00
Florian Hahn
4f56d47d05
[VPlan] Make ExpandedSCEVs argument const (NFC).
The argument is only used in const contexts. Simplifies a follow-up
diff.
2023-10-22 12:31:55 +01:00
Florian Hahn
0c8e5be6fa
[VPlan] Simplify redundant trunc (zext A) pairs to A.
Add simplification for redundant trunc(zext A) pairs. Generally apply a
transform from D149903.

Depends on D159200.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159202
2023-10-22 11:41:38 +01:00
Florian Hahn
ca01f2af78
[LV] Enforce order of reductions with intermediate stores in VPlan (NFC)
Reductions with intermediate stores currently need to be fixed in order
of their intermediate stores. Instead of doing this at fixup time after
code has been generated, sort the reductions in adjustRecipesForReductions.

This makes the order explicit in VPlan and will enable removing
fixReductions with modeling computing the final reduction result in
VPlan, followed by also modeling the intermediate stores explicitly.
2023-10-21 21:26:52 +01:00
Lou Knauer
852bac4439
[VPlan] Support scalable vectors in outer-loop vectorization
This patch enables scalable vectors in the VPlan-native path.
If a vectorization factor is specified via loop vectorization hints,
that factor is used. If no vectorization factor is specified, but the
target preferes scalable vectorization, a scalable vectorization factor
is selected.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D157484
2023-10-20 23:17:35 +01:00
Douglas Yung
734b016b66 Revert "[SLP] Improve gather tree nodes matching when users are PHIs. (#69392)"
This reverts commit c80b50349648dcf7fcbf4ae69c62b3d34bee0c70.

This change causes a fatal error in the backend and is filed as issue #69670.
2023-10-20 10:59:07 -07:00
Florian Hahn
2ec7bba77b
Recommit "[VPlan] Insert Trunc/Exts for reductions directly in VPlan."
This reverts commit e4ea0997486000b460c4875a00301b73b3c0d6a7.

The recommit fixes a reported crash by adding a missing check to make
sure the cast recipes are only introduced when vectorizing.

Test coverage added in 3cac608fbd0811b2f5c59c6e13148162ccd8543e.
Original commit message:
   Update the code to create Trunc/Ext recipes directly in
    adjustRecipesForReductions instead of fixing it up later in
    fixReductions.

    This explicitly models the required conversions and also makes sure they
    are generated at the right place (instead of after the exit condition),
    hence the changes in a few tests.
2023-10-20 14:30:04 +01:00
Luke Lau
c35939b22e
[VectorCombine] Use isSafeToSpeculativelyExecute to guard VP scalarization (#69494)
Previously we were just matching against a fixed list of VP intrinsics
that we
knew couldn't be speculated, but we can reuse the logic in
isSafeToSpeculativelyExecuteWithOpcode. This also allows speculation in
more
cases, e.g. when the divisor is known to be non-zero.

Unfortunately we can't reuse the exact same function call for VP
intrinsics
with functional intrinsics instead of opcodes, because
isSafeToSpeculativelyExecute needs an instruction that already exists.
So this
just copies the logic by peeking into the function attributes of the
intrinsic.
2023-10-19 12:45:21 -04:00
Fangrui Song
e4ea099748 Revert "[VPlan] Insert Trunc/Exts for reductions directly in VPlan."
This reverts commit fd311126349b8fe1684d62154a9fa5a7bbb0b713.

There are two different crash reports on fd31112634
2023-10-18 23:25:31 -07:00
Alexey Bataev
4a06332e45 [SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, rename
function, NFC.
2023-10-18 13:09:20 -07:00
Alexey Bataev
3ef271c3d6 [SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl& in param, NFC. 2023-10-18 09:47:07 -07:00
Valery Dmitriev
c80b503496
[SLP] Improve gather tree nodes matching when users are PHIs. (#69392) 2023-10-18 09:05:11 -07:00
Valery Dmitriev
9aa571f080
[SLP][NFC] Try to cleanup and better document some isGatherShuffledEntry code. (#69384)
Outline some often used common code to dedicated variables in order
to make code compact. Rename variables to more accurately reflect
their purpose. Apply const qualifier where appropriate.
Fix and add bit more explanation comment for the existing code.
2023-10-17 14:59:36 -07:00