1532 Commits

Author SHA1 Message Date
Alexey Bataev
52df67ba76 [SLP][NFC]Make collectValuesToDemote member of BoUpSLP to avoid using
Expr container, NFC.

Saves the memory and may improve compile time.
2023-11-17 13:45:28 -08:00
Arthur Eubanks
6a126e279d Revert "[SLP][NFC]Make collectValuesToDemote member of BoUpSLP to avoid using"
This reverts commit cfd0f41f4effb5d31654dcb28c1a577c152ee23b.

Causes crashes, see cfd0f41f4e.
2023-11-17 13:23:38 -08:00
Valery Dmitriev
94e86751e5
[NFC][SLP] Remove unnecessary DL argument (#72674) 2023-11-17 10:08:25 -08:00
Alexey Bataev
cfd0f41f4e [SLP][NFC]Make collectValuesToDemote member of BoUpSLP to avoid using
Expr container, NFC.

Saves the memory and may improve compile time.
2023-11-17 08:05:02 -08:00
Alexey Bataev
72b97630bc [SLP][NFC]Fix comparison of integers of different signs warning, NFC. 2023-11-16 17:18:28 -08:00
Alexey Bataev
cb678708e6 [SLP][NFC]Add TreeEntry-based add member functions and use them, where
possible, NFC.
2023-11-16 16:30:52 -08:00
Alexey Bataev
484a27e412 [SLP][NFC]Make needToDelay constant, NFC. 2023-11-16 16:11:43 -08:00
Alexey Bataev
009002a8cb [SLP][NFC]Unify matching for perfect diamond match between cost and codegen
models, NFC.
2023-11-16 08:11:52 -08:00
Alexey Bataev
206799fcf5 [SLP]Fix PR72524: "Out-of-bounds shuffle mask element" failed.
Need to check if we ran into subvector extract pattern before checking
for identity vector to avoid compiler crash.
2023-11-16 07:39:32 -08:00
Alexey Bataev
95703642e3 [SLP]Fix PR72202: wrong mask emission for the first found vector
operand.

Need to copy the submask not to the very first part of the common
extractelements vector mask, but to the proper one to avoid wrong code
emission.
2023-11-16 07:01:05 -08:00
Alexey Bataev
8ea8dd9a01 [SLP] Fix crash on trying to reshuffle a scalar that was vectorized.
If the buildvector node contains extractelement, which vector operand
depends on vector node, need to check if the node is ready and use
vectorized value instead of the original vector operation.
2023-11-15 11:01:45 -08:00
Alexey Bataev
d202b00826 [SLP][NFC] Make tryToGather[SingleRegister]ExtractElements routines BoUpSLP methods. 2023-11-15 09:47:24 -08:00
Alexey Bataev
b6f51787f6 [SLP]Fix signedness analysis for scalars in graph.
Cannot use the sign info for the roots for all scalars in the graph,
need to perform the analysis for each particular scalar (tree node).
2023-11-15 07:10:59 -08:00
Alexey Bataev
5adfad254e [SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI.
SLP includes analysis for the minimum bitwidth, the actual integer
operations can be emitted. It allows to reduce register pressure and
improve perf. Currently, it includes only cost model and the next
transformation relies on InstructionCombiner. Better to do it directly
in SLP, it allows to reduce compile time and fix cost model issues.
2023-11-14 11:12:52 -08:00
Alexey Bataev
f2f3050476 Revert "[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI."
This reverts commit f6ae50f710d02d8553d28192a1f048b2a9e1fc4d to fix
a crash revealed in the internal testing.
2023-11-14 09:45:54 -08:00
Alexey Bataev
f6ae50f710 [SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI.
SLP includes analysis for the minimum bitwidth, the actual integer
operations can be emitted. It allows to reduce register pressure and
improve perf. Currently, it includes only cost model and the next
transformation relies on InstructionCombiner. Better to do it directly
in SLP, it allows to reduce compile time and fix cost model issues.
2023-11-14 07:57:37 -08:00
Alexey Bataev
d4cec1ce73 [SLP][NFCI]Improve compile time by using SmallBitVector and filtering
trees with phis/buildvectors only.
2023-11-14 06:27:17 -08:00
Alexey Bataev
ac254fc055 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-06 07:29:27 -08:00
Hans Wennborg
046c57e705 Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This causes asserts:

  llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082:
  Value *llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts(
    const TreeEntry *, MutableArrayRef<int>, unsigned int, bool &):
  Assertion `Part == 0 && "Expected firs part."' failed.

See comment on the code review.

> Currently tryToGatherExtractElements function analyzes the whole vector,
> regrdless number of actual registers, used in this vector. It may
> prevent some optimizations, because per-register analysis may allow to
> simplify the final code by reusing more already emitted vectors and
> better shuffles.
>
> Differential Revision: https://reviews.llvm.org/D148855

This reverts commit 9dfdbd788707edc8c39eb2bff16004aba1f3586b.
2023-11-06 13:56:42 +01:00
Alexey Bataev
9dfdbd7887 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-03 10:43:58 -07:00
Martin Storsjö
66152f4eed Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This reverts commit 3e6d7c6d983dd5896e3a03857584654eb1360fda.

That commit caused miscompilation of ffmpeg's libavcodec/vp9dsp_8bpp.o
on aarch64; the file still compiles correctly, but no longer produces
the right result - see https://reviews.llvm.org/D148855#4655968
for details.
2023-11-03 00:08:17 +02:00
Alexey Bataev
3026c13612 [SLP][NFC]Remove commented out code, NFC. 2023-11-02 11:06:56 -07:00
Alexey Bataev
495ed8d8c8 [SLP]Fix PR70507: freeze poisonous insts to avoid poison propagation.
If the reduction instruction is not bool logical op, but reduced within bool logical op reduction list, need to freeze to avoid poison propagation.
2023-11-02 10:37:38 -07:00
Alexey Bataev
3e6d7c6d98 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-01 10:42:35 -07:00
Alexey Bataev
6e8d957a22 Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis."
This reverts commit 0a34aaedd8ec2dc2375076976c1327fdbfd7877f to fix
fails reported in https://lab.llvm.org/buildbot/#/builders/265/builds/40
2023-11-01 08:52:31 -07:00
Alexey Bataev
c28b7eb496 [SLP]Fix handling of -slp-vectorize-hor-store for values with many uses. 2023-11-01 08:41:54 -07:00
Alexey Bataev
0a34aaedd8 [SLP]Improve tryToGatherExtractElements by using per-register analysis.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.

Differential Revision: https://reviews.llvm.org/D148855
2023-11-01 07:44:49 -07:00
Nikita Popov
79bd1d8a95 [SLPVectorizer] Avoid use of ConstantExpr::getIntegerCast() (NFC)
We are working on a ConstantInt here, so folding will always
succeed. This just avoids use of the ConstantExpr API.
2023-11-01 12:25:55 +01:00
Alexey Bataev
4c997e1536 [SLP]Fix PR70507: emit freeeze whenever required for bool logical ops in
the middle of reduction ops.

Need to emit freeze instruction not only in the case, where the root is
bool logical op, but also if we reduce several scalars, but unable to
say precisely, if the root is bool logical op.
2023-10-31 12:23:12 -07:00
Alexey Bataev
9da19e4340 [SLP]Fix PR70507: correctly handle bool logical ops in reductions.
If the very first reduction operation is not bool logical op, but some
others are, still need to emit the boo logic op for all the extra
reduction operations to avoid incorrect poison propagation.
2023-10-30 14:09:08 -07:00
Alexey Bataev
af15c46777 [SLP]Do not crash if number of vector registers does not feet the vector
type.

Need to check, if the number of vector registers, returned by TTI, is
not greater than total number of mask element and not zero, before
trying to perform any operations. TTI still may return non-valid number
of registers.
2023-10-30 07:30:52 -07:00
Alexey Bataev
196d154ab7 [SLP]Improve isGatherShuffledEntry by trying per-register shuffle.
Currently when building gather/buildvector node, we try to build nodes
shuffles without taking into account separate vector registers. We can
improve final codegen and the whole vectorization process by including
this info into the analysis and the vector code emission, allows to emit
better vectorized code.

Differential Revision: https://reviews.llvm.org/D149742
2023-10-26 08:51:37 -07:00
Alexey Bataev
c65ec9d919 Revert "[SLP]Improve isGatherShuffledEntry by trying per-register shuffle."
This reverts commit 560bad013ebcb8d2c2c1722e35270b9a70ab40ce to fix
a bug reported in https://lab.llvm.org/buildbot/#/builders/5/builds/37763.
2023-10-26 08:36:50 -07:00
Alexey Bataev
560bad013e [SLP]Improve isGatherShuffledEntry by trying per-register shuffle.
Currently when building gather/buildvector node, we try to build nodes
shuffles without taking into account separate vector registers. We can
improve final codegen and the whole vectorization process by including
this info into the analysis and the vector code emission, allows to emit
better vectorized code.

Differential Revision: https://reviews.llvm.org/D149742
2023-10-26 05:57:03 -07:00
Kazu Hirata
f9306f6de3
[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector.  This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.

We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.

Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
2023-10-24 23:03:13 -07:00
Valery Dmitriev
3324776d9c
[SLP] Improve gather tree nodes matching when users are PHIs. (#70111)
This is re-commit of #69392 and also fixes issue #69670 which was
uncovered with the prior commit.
For delayed gather emission it may be incorrect to use stab instruction
as insertion point if it is a PHI operand. For that case insertion point
is adjusted to be at the end of block, ensuring that prior dependecy
vector code is emitted earlier.
2023-10-24 16:39:36 -07:00
Alexey Bataev
a3c68754b0 [SLP][NFC]Remove unused variables, NFC. 2023-10-24 14:36:33 -07:00
Alexey Bataev
d79051f894 [SLP]Fix PR70004: Do not change insert point for reduction gather nodes.
No need to change the insert point for reduction gather node, we can use
the ReductionRoot as insert point instead to avoid possible crashes.
2023-10-24 09:19:59 -07:00
Alexey Bataev
8d307f59ee [SLP]Fix PR69246: do not treat resizing maskas identity.
If the mask is resizing and the mask size is greater than than the
length of the vector, being reused from extractelement instructions, the
mask for undefs cannot be treated as identity, must be treated as
a broadcast.
2023-10-24 08:14:13 -07:00
Alexey Bataev
254558ac53 [SLP]Fix PR69976: Check for multi-node uses during node building.
Need to check if there is already a node created for the multi-node
instruction before ending up with creating a new node for such
instructions.
2023-10-24 07:01:46 -07:00
Douglas Yung
734b016b66 Revert "[SLP] Improve gather tree nodes matching when users are PHIs. (#69392)"
This reverts commit c80b50349648dcf7fcbf4ae69c62b3d34bee0c70.

This change causes a fatal error in the backend and is filed as issue #69670.
2023-10-20 10:59:07 -07:00
Alexey Bataev
4a06332e45 [SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, rename
function, NFC.
2023-10-18 13:09:20 -07:00
Alexey Bataev
3ef271c3d6 [SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl& in param, NFC. 2023-10-18 09:47:07 -07:00
Valery Dmitriev
c80b503496
[SLP] Improve gather tree nodes matching when users are PHIs. (#69392) 2023-10-18 09:05:11 -07:00
Valery Dmitriev
9aa571f080
[SLP][NFC] Try to cleanup and better document some isGatherShuffledEntry code. (#69384)
Outline some often used common code to dedicated variables in order
to make code compact. Rename variables to more accurately reflect
their purpose. Apply const qualifier where appropriate.
Fix and add bit more explanation comment for the existing code.
2023-10-17 14:59:36 -07:00
Alexey Bataev
66775f8ccd [SLP]Fix PR69196: Instruction does not dominate all uses
During emission of the postponed gathers, need to insert them before
user instruction to avoid use before definition crash.
2023-10-17 10:43:59 -07:00
Alexey Bataev
119b0f3895 Revert "[SLP]Fix PR69196: Instruction does not dominate all uses"
This reverts commit 8e2b2c4181506efc5b9321c203dd107bbd63392b to fix
a crash reported in https://lab.llvm.org/buildbot/#/builders/230/builds/19993.
2023-10-16 13:29:17 -07:00
Alexey Bataev
8e2b2c4181 [SLP]Fix PR69196: Instruction does not dominate all uses
During emission of the postponed gathers, need to insert them before
user instruction to avoid use before definition crash.
2023-10-16 12:57:18 -07:00
Fangrui Song
2d854dd3e7 Move global namespace cl::opt inside llvm:: or internalize them 2023-10-10 19:58:03 -07:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00