1964 Commits

Author SHA1 Message Date
Alexey Bataev
b65b2b4ab6 [SLP]Expand vector to the whole register size in extracts adjustment
Need to expand the number of elements to the whole register to process
the estimation correctly and avoid a compiler crash.

Fixes #113462
2024-10-23 12:04:40 -07:00
Alexey Bataev
a3508e0246 [SLP]Small buildvector only graph should contain scalars from same block
If the graph is small and has a single buildvector node, all scalar
instructions must be from the same basic block to prevent a compiler
crash.

Fixes #113451
2024-10-23 10:46:38 -07:00
Alexey Bataev
4b1b51ac52 [SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requires number of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-21 12:25:39 -07:00
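
The "non-power-of-2 but still whole register" condition from the commit above can be sketched roughly as follows. This is only an illustration: `formsWholeRegisters`, `isPowerOf2`, and the explicit register width parameter are hypothetical names chosen for this example and are not taken from the SLP vectorizer sources. The sketch also assumes that element and register widths are powers of two.

```cpp
#include <cassert>

// Hypothetical helper, not the LLVM implementation: returns true when
// NumElts elements of EltBits bits each exactly fill one or more vector
// registers of RegBits bits, whether or not NumElts is a power of two.
bool formsWholeRegisters(unsigned NumElts, unsigned EltBits, unsigned RegBits) {
  assert(EltBits > 0 && RegBits > 0 && "sizes must be non-zero");
  unsigned long long TotalBits = 1ULL * NumElts * EltBits;
  return TotalBits != 0 && TotalBits % RegBits == 0;
}

// Plain power-of-two test, shown for contrast with the check above.
bool isPowerOf2(unsigned N) { return N != 0 && (N & (N - 1)) == 0; }
```

With 128-bit registers (as on SSE2) and 32-bit elements, 12 elements are not a power of two but still fill three whole registers, while 6 elements leave half a register unused.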
Alexey Bataev
9e03920cbf [SLP]Ignore root gather node, when searching for reuses
Root gather/buildvector node should be ignored when SLP vectorizer tries
to find matching gather nodes, vectorized earlier. This node is
definitely the last one in the pipeline and it does not have users. It
may cause a compiler crash.

Fixes #113143
2024-10-21 09:16:16 -07:00
David Green
17ac10c28f Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions"
This reverts commit 7f2e937469a8cec3fe977bf41ad2dfb9b4ce648a as it causes
regressions in the tests it modifies, and undoes what was added in #100653
(which itself was a fix for a previous regression).
2024-10-21 13:37:44 +01:00
Alexey Bataev
709abacdc3 [SLP]Check that operand of abs does not overflow before making it part of minbitwidth transformation
Need to check that the operand of the abs intrinsic can be safely
truncated before making it part of the minbitwidth transformation.

Fixes #112577
2024-10-18 13:56:19 -07:00
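
A small sketch of why the operand of abs has to be checked before it is narrowed: computing abs on a truncated value can differ from truncating the full-width abs. The function names and the fixed 8-bit narrow width are assumptions for this example only; the sketch relies on two's complement wraparound when narrowing and does not model the actual minbitwidth analysis.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// True when V is representable as a signed integer of Bits bits.
bool fitsInSignedBits(int32_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 32 && "unsupported width");
  int64_t Min = -(int64_t(1) << (Bits - 1));
  int64_t Max = (int64_t(1) << (Bits - 1)) - 1;
  return V >= Min && V <= Max;
}

// abs computed at full width, then truncated to 8 bits.
uint8_t absThenTrunc(int32_t V) {
  return static_cast<uint8_t>(std::abs(V));
}

// Operand truncated to 8 bits first, abs computed on the narrow value.
uint8_t truncThenAbs(int32_t V) {
  int8_t Narrow = static_cast<int8_t>(V);
  int Wide = Narrow;
  return static_cast<uint8_t>(Wide < 0 ? -Wide : Wide);
}
```

For an operand such as -100, which fits in 8 signed bits, both orders agree. For -200 they do not, so narrowing the operand first would change the result.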
Alexey Bataev
e56e9dd8ad [SLP]Fix minbitwidth emission and analysis for freeze instruction
Need to add minbw emission and analysis for freeze instruction to fix
incorrect signedness propagation.

Fixes #112460
2024-10-18 13:36:37 -07:00
Alexey Bataev
7f2e937469 [SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requires number of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-18 12:50:11 -07:00
Jim Lin
5e9166e02a [SLP] Remove TTI parameter from vectorizeHorReduction and vectorizeRootInstruction. NFC.
Since TTI is a member variable.
2024-10-17 09:35:22 +08:00
Alexey Bataev
685bec722f Revert "[SLP]Initial non-power-of-2 support (but still whole register) for reductions"
This reverts commit 8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b to
investigate and fix compile time regressions reported by https://llvm-compile-time-tracker.com/compare.php?from=ec78f0da0e9b1b8e2b2323e434ea742e272dd913&to=8287fa8e596d8fc8655c8df3bc99e068ad9f7d4b&stat=instructions:u
2024-10-15 12:59:44 -07:00
Alexey Bataev
060d151476 [SLP][NFCI]Check early for deleted instructions
Check as early as possible for the deleted instructions before trying to
vectorize the code. May reduce number of attempts for the vectorization.
2024-10-15 10:51:03 -07:00
Alexey Bataev
8287fa8e59
[SLP]Initial non-power-of-2 support (but still whole register) for reductions
Enables initial non-power-of-2 support (but still requires number of
elements, forming whole registers) for reductions.
Enables extra vectorization for
MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and
CFP2017rate/526.blender_r (checked for SSE2)

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112361
2024-10-15 12:10:48 -04:00
Alexey Bataev
f9bc00e4bb
[SLP]Initial support for interleaved loads
Adds initial support for interleaved loads, which allows
emission of segmented loads for RISCV RVV.

Vectorizes extra code for RISCV
CFP2006/447.dealII, CFP2006/453.povray,
CFP2017rate/510.parest_r, CFP2017rate/511.povray_r,
CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc,
CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/112042
2024-10-14 09:12:33 -04:00
Alexey Bataev
3ed8acf2f0 [SLP][NFC]Simplify check for external user parent basic block, NFC. 2024-10-11 13:11:16 -07:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
2024-10-11 05:26:03 -07:00
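
The naming distinction behind this rename (a pure lookup versus a get-or-insert that may create the entry) can be illustrated with a standalone sketch. `SymbolTable` and its members are hypothetical and stand in for a module's declaration list; they are not LLVM APIs.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical symbol table, standing in for a module's declarations.
class SymbolTable {
  std::unordered_map<std::string, int> Ids;
  int NextId = 0;

public:
  // Lookup only: returns nullptr when the name is unknown, never inserts.
  const int *get(const std::string &Name) const {
    auto It = Ids.find(Name);
    return It == Ids.end() ? nullptr : &It->second;
  }

  // Returns the existing id or creates a new one; may mutate the table.
  int getOrInsert(const std::string &Name) {
    auto Res = Ids.insert({Name, NextId});
    if (Res.second)
      ++NextId;
    return Res.first->second;
  }

  std::size_t size() const { return Ids.size(); }
};
```

A caller that only wants to ask whether an entry exists should use the lookup form, since the get-or-insert form changes the table as a side effect.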
Alexey Bataev
4b5018d231 [SLP]Track repeated reduced value as it might be vectorized
Need to track changes with the repeated reduced value, since it might be
vectorized in the next attempt for reduction vectorization, to correctly
generate the code and avoid compiler crash.

Fixes #111887
2024-10-10 13:41:56 -07:00
Alexey Bataev
f020bf1526
[SLP]Initial support for non-power-of-2 (but whole reg) vectorization for stores
Allows non-power-of-2 vectorization for stores, but still requires that
the number of vectorized elements forms full vector registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/111194
2024-10-09 15:22:44 -04:00
Alexey Bataev
9f3c55954e
[SLP]Fix loads sorting for loads from different basic blocks
Patch fixes lookup for loads from different basic blocks. Originally,
the code checked if the main key (combined with parent basic block) was
created, but did not include the key into LoadsMap. When the code looked for
the load pointer in LoadsMap, it skipped the check for the parent basic block
and could mix loads from different basic blocks (but the same underlying
pointer). Currently, it does not lead to any issues, since later the code
compares parent basic blocks and sorts loads properly. But it increases
compile time.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/111521
2024-10-08 16:44:16 -04:00
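
The idea of keying the lookup on the parent basic block as well as the underlying pointer can be sketched with a standalone example. The `Load` struct, the integer block and pointer ids, and `groupLoads` are assumptions made for illustration; the real LoadsMap has a different shape.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a load: the block it lives in, the underlying
// pointer it reads from, and a name for display.
struct Load {
  int Block;
  int BasePtr;
  std::string Name;
};

// Groups loads by (block, base pointer). Keying on the pointer alone would
// merge loads from different blocks into one bucket, which then has to be
// split again later.
std::map<std::pair<int, int>, std::vector<std::string>>
groupLoads(const std::vector<Load> &Loads) {
  std::map<std::pair<int, int>, std::vector<std::string>> Groups;
  for (const Load &L : Loads)
    Groups[{L.Block, L.BasePtr}].push_back(L.Name);
  return Groups;
}
```

Here two loads from the same pointer but different blocks land in separate groups up front, so no later pass has to separate them.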
Alexey Bataev
a65a5feb1a
[SLP]Improve masked loads vectorization, attempting gathered loads
If the vector of loads can be vectorized as a masked gather and there are
several other masked gather nodes, the compiler can check whether it is
possible to gather such nodes into a big consecutive/strided loads
node, which provides better performance.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/110151
2024-10-08 16:43:10 -04:00
Simon Pilgrim
d38addf099 Fix MSVC signed/unsigned mismatch warning 2024-10-08 17:36:35 +01:00
Alexey Bataev
45826513ef [SLP][NFC]Fix clang-tidy suggestions, cleanup, NFC. 2024-10-08 08:31:23 -07:00
Alexey Bataev
7692d106b4 [SLP][NFC]Remove dead code + use nlogn lookups instead of n^2 2024-10-04 15:32:04 -07:00
Alexey Bataev
f74879cf0c
[SLP]Make PHICompare comparator follow strict weak ordering requirement
Reviewers: efriedma-quic

Reviewed By: efriedma-quic

Pull Request: https://github.com/llvm/llvm-project/pull/110529
2024-10-04 14:23:48 -04:00
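
For context, `std::sort` and related algorithms require a comparator that is a strict weak ordering; in particular it must return false when an element is compared with itself. The sketch below contrasts a valid comparator with an invalid one. `Node`, `nodeLess`, `nodeLessOrEqual`, and `sortNodes` are hypothetical names and do not reflect how PHICompare is written.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical element type for the example.
struct Node {
  int Kind;
  int Id;
};

// Valid: lexicographic "<" on (Kind, Id) is a strict weak ordering.
bool nodeLess(const Node &A, const Node &B) {
  return std::tie(A.Kind, A.Id) < std::tie(B.Kind, B.Id);
}

// Invalid on purpose: "<=" returns true for equal elements, so it is not
// irreflexive and must not be passed to std::sort.
bool nodeLessOrEqual(const Node &A, const Node &B) {
  return A.Kind <= B.Kind;
}

// Sorts a copy of the input using the valid comparator.
std::vector<Node> sortNodes(std::vector<Node> Nodes) {
  std::sort(Nodes.begin(), Nodes.end(), nodeLess);
  return Nodes;
}
```

Passing a comparator like `nodeLessOrEqual` to a standard sorting algorithm is undefined behavior, which is why such requirements matter even when the results look correct in practice.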
Alexey Bataev
d991e05452 [SLP]Fix compiler crash on vectorizing gathered loads with different types
Need to check not only parents, but also types for compatible loads,
when trying to build the vectorizable sequences.

Fixes crash reported in https://github.com/llvm/llvm-project/pull/107461#issuecomment-2392980214
2024-10-04 08:36:57 -07:00
Han-Kuan Chen
f5815b9903
[SLP] NFC. Set NumOperands directly if VL[0] is IntrinsicInst. (#111103) 2024-10-04 19:38:33 +08:00
Alexey Bataev
133c1224de [SLP]Fix a crash on accessing element with index -1 for reused mask with PoisonMaskElem
Need to check if the index from the ReuseShuffleIndices mask is not
equal to PoisonMaskElem before trying to access the element by index.
2024-10-03 08:24:05 -07:00
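
The guard described in the commit above amounts to testing a mask entry against the poison sentinel before using it as an index. A minimal standalone sketch, assuming the sentinel is -1 as the commit title suggests; `applyMask` and its `Fallback` parameter are hypothetical and only serve the example.

```cpp
#include <cstddef>
#include <vector>

// Assumption for this sketch: poison mask elements are encoded as -1,
// matching the "index -1" mentioned in the commit title.
constexpr int PoisonMaskElem = -1;

// Applies Mask to Values. Poison lanes receive Fallback instead of being
// used as an index, which would otherwise read Values[-1].
std::vector<int> applyMask(const std::vector<int> &Values,
                           const std::vector<int> &Mask, int Fallback) {
  std::vector<int> Result;
  Result.reserve(Mask.size());
  for (int Idx : Mask) {
    if (Idx == PoisonMaskElem) {
      Result.push_back(Fallback);
      continue;
    }
    Result.push_back(Values.at(static_cast<std::size_t>(Idx)));
  }
  return Result;
}
```

The check has to come before the element access; testing the fetched value afterwards would be too late, because the out-of-bounds read has already happened.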
Han-Kuan Chen
5901463ada
[SLP] NFC. BaseIndex is not used for getSameOpcode. (#110948) 2024-10-03 19:58:44 +08:00
Alexey Bataev
c1b911c579 [SLP]Do correct signedness analysis for clustered nodes
Should get the signedness info from the original scalar instructions, if
possible, to correctly generate sext/zext instructions. Also, the
clustered node must be assigned a gather node user info to correctly
estimate its bitwidth/sign.
2024-10-02 12:56:49 -07:00
Alexey Bataev
4197e732a5 [SLP]Add debug counter support
Fixes #110725

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/110734
2024-10-02 11:14:34 -07:00
Alexey Bataev
ec7266617f Revert "[SLP]Add debug counter support"
This reverts commit 67dd9d23474bd570d5befaddad0be8a5559b815b to fix https://lab.llvm.org/buildbot/#/builders/11/builds/6012
2024-10-02 10:33:27 -07:00
Alexey Bataev
67dd9d2347 [SLP]Add debug counter support
Fixes #110725

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/110734
2024-10-02 10:00:48 -07:00
Alexey Bataev
4dede756f2 [SLP]Transform nodes before building externally used values
transformNodes function may create new vector nodes, so the reduced
values might be vectorized later. Need to build the list of the
externally used values after the transformNodes() function call to avoid
compiler crash.

Fixes #110787
2024-10-02 06:01:25 -07:00
Haowei Wu
948326163c Revert "[SLP]Add debug counter support"
This reverts commit f3c408d1726f6a921212faf68085f68bf8533f0c.
This breaks LLVM test on debug-counter.ll
2024-10-01 16:15:30 -07:00
Alexey Bataev
f3c408d172
[SLP]Add debug counter support
Fixes #110725

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/110734
2024-10-01 16:21:00 -04:00
Alexey Bataev
b16e694948
[SLP]Try to keep operand of external casts as scalars, if profitable
If the cost of the original scalar instruction + cast is better than the
extractelement from the vector cast instruction, it is better to keep the
original scalar instructions, where possible.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/110537
2024-10-01 13:35:42 -04:00
Han-Kuan Chen
cc01112660
[SLP][REVEC] getTypeSizeInBits should apply to scalar type instead of FixedVectorType. (#110610)
reference: https://github.com/llvm/llvm-project/issues/109835
2024-10-01 19:15:58 +08:00
Jeremy Morse
96f37ae453
[NFC] Use initial-stack-allocations for more data structures (#110544)
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.

Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
2024-09-30 23:15:18 +01:00
Han-Kuan Chen
061762933b
[SLP][REVEC] Fix cost model for getBuildVectorCost with FixedVectorType ScalarTy. (#110073)
BoUpSLP::gather always uses CreateInsertVector for FixedVectorType
ScalarTy.
2024-09-30 21:51:12 +08:00
Alexey Bataev
f49344e19d [SLP]Check if number of elements forms a full register
Need to check if the number of elements forms a full register before trying
per-register permutations to avoid a compiler crash.
2024-09-27 12:54:56 -07:00
Alexey Bataev
af6354634d [SLP]Look for vector user when estimating the cost
Need to find the first vector node user, not simply the very first user
node. The very first user might be a gather, vectorized as clustered,
which may cause a compiler crash.

Fixes https://github.com/llvm/llvm-project/issues/110193
2024-09-27 04:14:28 -07:00
Alexey Bataev
be6aed90c7 [SLP]Use number of scalars as a vector length for minbw cast
Need to use the number of scalars, not the vector factor of the node.
Otherwise incorrect casting can be estimated, leading to a compiler
crash.
2024-09-26 13:06:19 -07:00
Alexey Bataev
100fd0cd5a [SLP]Fix a crash when trying to identify one source order
Need to check that the order index is not out of bounds when trying to
detect that the reuse mask is a one-source mask with clusters, to fix a
compiler crash.
2024-09-26 04:47:48 -07:00
Jeremy Morse
056a3f4673 [NFC] Reapply 3f37c517f, SmallDenseMap speedups
This time with 100% more building unit tests. Original commit message follows.

[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)

If we use SmallDenseMaps instead of DenseMaps at these locations,
we get a substantial speedup because there's less spurious malloc
traffic. Discovered by instrumenting DenseMap with some accounting
code, then selecting sites where we'll get the most bang for our buck.
2024-09-26 10:49:29 +01:00
Alexey Bataev
1bfca99909 [SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/107273
2024-09-25 14:38:17 -07:00
Nikita Popov
29b92d0774 Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands."
This reverts commit 6b109a34ccedd3c75a067e322da0386c156c241d.

This causes a crash when linking lencod in ReleaseThinLTO configuration
2024-09-25 22:05:10 +02:00
Philip Reames
556ec4a726
[SLP] Pass operand info to getCmpSelInstrInfo (#109998)
Depending on the constant, selects with constant arms can have highly
varying cost. This adjusts SLP to use the new API introduced in
d2885743.

Fixes https://github.com/llvm/llvm-project/issues/109466.
2024-09-25 08:17:55 -07:00
Alexey Bataev
6b109a34cc
[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/107273
2024-09-25 10:43:27 -04:00
Philip Reames
d288574363
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
2024-09-25 07:25:57 -07:00
Alexey Bataev
3469db82b5
[SLP]Add subvector vectorization for non-load nodes
Previously the SLP vectorizer supported clustered vectorization for loads
only. This patch adds support for "clustered" vectorization for other
instructions.
If the buildvector node contains "clusters", which can be vectorized
separately and then inserted into the resulting buildvector result, it
is better to do so, since it may reduce the cost of the vector graph and
produce better vector code.
The patch analyzes whether it is profitable to try this kind
of extra vectorization. It checks the scalar instructions and their
operands and tries to vectorize them only if they result in a better
graph.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/108430
2024-09-25 10:23:41 -04:00
Jeremy Morse
817e742ba5 Revert "[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)"
This reverts commit 3f37c517fbc40531571f8b9f951a8610b4789cd6.

Lo and behold, I missed a unit test
2024-09-25 14:31:30 +01:00