1840 Commits

Author SHA1 Message Date
Alexey Bataev
b765fdd997
[SLP]Try to keep scalars, used in phi nodes, if phi nodes from same block are vectorized.
Before doing the vectorization of the PHI nodes, the compiler sorts them
by the opcodes of the operands. If the scalar is replaced during the
vectorization by extractelement, it breaks this sorting and prevents some
further vectorization attempts. The patch improves this by doing extra
analysis of the scalars and keeping them if a scalar is found to be used
in another (external) PHI node in the same block.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/103923
2024-08-21 15:23:47 -04:00
Alexey Bataev
e31252bf54 [SLP]Fix PR105120: fix the order of phi nodes vectorization.
The operands of the phi nodes should be vectorized in the same order, in
which they were created, otherwise the compiler may crash when trying
to correctly build dependency for nodes with non-schedulable
instructions for gather/buildvector nodes.

Fixes https://github.com/llvm/llvm-project/issues/105120
2024-08-21 12:22:01 -07:00
tcwzxx
816068e462
[NFC][SLP] Remove useless code of the schedule (#104697)
Currently, the SLP schedule has two containers of `ScheduleData`:
`ExtraScheduleDataMap` and `ScheduleDataMap`. However, the
`ScheduleData` in `ExtraScheduleDataMap` is only used to indicate
whether the instruction is processed or not and does not participate in
the schedule, which is useless. `ScheduleDataMap` is sufficient for this
purpose. The `OpValue` member is used only in `ExtraScheduleDataMap`,
which is also useless.
2024-08-19 20:16:51 +08:00
Alexey Bataev
4a0bbbcbcf [SLP]Fix PR104637: do not create new nodes for fully overlapped non-schedulable nodes
If the scalars do not require scheduling and were already vectorized,
but in a different order, the compiler still tries to create a new node.
This may crash the compiler for the gathered operands. Instead, such
nodes need to be treated as a full overlap and the vectorized node
should just be reshuffled.

Fixes https://github.com/llvm/llvm-project/issues/104637
2024-08-16 13:49:44 -07:00
Han-Kuan Chen
81f8abdca4
[SLP][REVEC] Fix CreateInsertElement does not use the correct result if MinBWs applied. (#104558) 2024-08-16 21:09:48 +08:00
Alexey Bataev
b6bb208662 Revert "[SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC"
This reverts commit 2d52eb6a434fe47e67086f5ec1c3789bf6e7a604 to fix
compile time regression found in https://llvm-compile-time-tracker.com/compare.php?from=fcefe957ddfdc5a2fe9463757b597635e3436e01&to=2d52eb6a434fe47e67086f5ec1c3789bf6e7a604&stat=instructions%3Au.
2024-08-15 09:19:01 -07:00
Alexey Bataev
2d52eb6a43 [SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC 2024-08-15 08:12:20 -07:00
Alexey Bataev
56140a8258 [SLP]Fix PR104422: Wrong value truncation
The minbitwidth restrictions can be skipped only for immediate reduced
values, for other nodes still need to check if external users allow
bitwidth reduction.

Fixes https://github.com/llvm/llvm-project/issues/104422
2024-08-15 08:00:08 -07:00
Nikita Popov
aaab4fcf65 Revert "[SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC"
This reverts commit e1b15504a831e63af6fb9a6e83faaa10ef425ae6.

This causes compile-time regressions, see:
http://llvm-compile-time-tracker.com/compare.php?from=e687a9f2dd389a54a10456e57693f93df0c64c02&to=e1b15504a831e63af6fb9a6e83faaa10ef425ae6&stat=instructions:u

Probably some of the new SmallVector sizes are sub-optimal.
2024-08-15 15:50:48 +02:00
Alexey Bataev
e1b15504a8 [SLP][NFC]Remove unused using declarations, reduce mem usage in containers, NFC 2024-08-14 12:28:45 -07:00
Alexey Bataev
20b2c9f10f [SLP][NFC]Use GatheredScalars vector instead of the original E->Scalars, NFC
GatheredScalars is a full copy of E->Scalars in these places and can
be safely used for now. Unifies the code across the function.
2024-08-14 08:29:38 -07:00
Alexey Bataev
d9b9ae6ba9 [SLP][NFC]Use transform nodes before building external uses, NFC.
In preparation for upcoming patches, this just moves the call to the
proper place, which is NFC for now.
2024-08-14 08:19:05 -07:00
Han-Kuan Chen
246f345152
[SLP][REVEC] Make CastInst support vector instructions. (#103216) 2024-08-13 23:52:32 +08:00
Han-Kuan Chen
6aad4918e8
[SLP][REVEC] Make MinBWs support vector instructions. (#103049)
If ScalarTy is FixedVectorType, it should remain as FixedVectorType.
2024-08-13 21:35:28 +08:00
Han-Kuan Chen
2256d00a14
[SLP][REVEC] Use VL.front()->getType() as ScalarTy. (#102437)
VL.front()->getType() may be FixedVectorType when revec is enabled.

Fix "Expected item in MinBWs.".
2024-08-13 19:53:45 +08:00
Han-Kuan Chen
875b551de7
[SLP][REVEC] Make computeMinimumValueSizes and collectValuesToDemote support vector instructions. (#103005) 2024-08-13 19:35:25 +08:00
Vitaly Buka
5ce47a5813
Reland "[Support] Assert that DomTree nodes share parent" (#102782)
A dominance query of a block that is in a different function is
ill-defined, so assert that getNode() is only called for blocks that are
in the same function.

There are three cases, where this behavior did occur. LoopFuse didn't
explicitly do this, but didn't invalidate the SCEV block dispositions,
leaving dangling pointers to free'ed basic blocks behind, causing
use-after-free. We do, however, want to be able to dereference basic
blocks inside the dominator tree, so that we can refer to them by a
number stored inside the basic block.

Reverts #102780
Reland #101198
Fixes #102784

Co-authored-by: Alexis Engelke <engelke@in.tum.de>
2024-08-13 11:56:02 +02:00
Han-Kuan Chen
b4b0c02306
[SLP][REVEC] Make tryToReduce and related functions support vector instructions. (#102327) 2024-08-13 11:44:23 +08:00
Han-Kuan Chen
70cf58e6c1
[SLP][REVEC] Make SLP vectorize shufflevector. (#102489)
Add getShufflevectorNumGroups to vectorize shufflevector.

The current getShufflevectorNumGroups can only vectorize a limited set
of patterns (e.g., the masks of shufflevector use the elements of the
source in order).

In addition, ReuseShuffleIndices and ReorderIndices are not supported.
2024-08-13 11:19:29 +08:00
Alexey Bataev
ecbbe5b431
[SLP]Fix mask building for alternate node cost estimation (#102966)
Need to use the same functionality in the cost model as in the codegen
to correctly build the shuffle mask and estimate the cost.
2024-08-12 17:26:56 -04:00
Alexey Bataev
b10ecfa914
[SLP]Represent externally used values as original scalars, if profitable.
Currently SLP vectorizer tries to keep only GEPs as scalar, if they are
vectorized but used externally. Same approach can be used for all scalar
values. This patch tries to keep original scalars if all its operands
remain scalar or externally used, the cost of the original scalar is
lower than the cost of the extractelement instruction, or if the number
of externally used scalars in the same entry is power of 2. Last
criterion allows better revectorization for multiply used scalars.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/100904
2024-08-12 10:15:02 -04:00
Alexey Bataev
34514ce09a [SLP][NFC]Use local getShuffleCost function across the code, NFC. 2024-08-12 06:49:53 -07:00
Alexey Bataev
2a05971de2 [SLP]Add index of the node to the short name output.
Improves debugging experience, does nothing with the functionality.
2024-08-08 08:57:14 -07:00
Han-Kuan Chen
7a4fc7491c
[SLP][REVEC] Fix insertelement has multiple uses. (#102329) 2024-08-08 23:23:10 +08:00
Alexey Bataev
7e7a439705
[SLP][NFC]Introduce CombinedVectorize nodes, NFC. (#99309)
This adds combined vectorized node. It simplifies handling of the
combined nodes, like select/cmp, which can be reduced to min/max,
mul/add transformed to fma, etc. Improves cost model handling and may end
up with better codegen in future (direct emission of the intrinsics).
2024-08-08 08:05:33 -04:00
Han-Kuan Chen
60ac34701e
[SLP][REVEC] Make getAltInstrMask and getGatherCost vectorize vector instructions. (#99461) 2024-08-08 10:39:01 +08:00
John McIver
bb82c79d3b
[SLP] Enable optimization of freeze instructions (#102217)
Allow SLP optimization to progress in the presence of freeze
instructions. Prior to this commit, freeze instructions blocked SLP
optimization.
    
The following URL shows correctness of the addsub_freeze test:
https://alive2.llvm.org/ce/z/qm38oh
2024-08-07 15:01:37 -04:00
Han-Kuan Chen
97743b8be8
[SLP][REVEC] Make ShuffleCostEstimator and ShuffleInstructionBuilder support vector instructions. (#99499)
1. When REVEC is enabled, we need to expand vector types into scalar
types.
2. When REVEC is enabled, CreateInsertVector (and CreateExtractVector)
is used because the scalar type may be a FixedVectorType.
3. Since the mask indices which are used by processBuildVector expect
the source is scalar type, we need to transform the mask indices into a
form which can be used when REVEC is enabled. The transform is only
called when the mask is really used.
2024-08-07 23:47:57 +08:00
Alexey Bataev
441f94f4bd [SLP]Fix PR102279: check the tracked values for extractelements, not the original values
If the reduced value was replaced by an extractelement instruction
during vectorization and we attempt to check for this, we need to
check the tracked value, not the original (deleted) instruction.
Otherwise, the compiler may crash.

Fixes https://github.com/llvm/llvm-project/issues/102279
2024-08-07 04:21:24 -07:00
tcwzxx
b64ec3c9fa
[SLP] The order of store chains needs to consider the size of the values. (#101810)
When store chains have the same value type ID and pointer type ID, they
may mix different sizes of values, such as i8 and i64. This can lead to
missed vectorization opportunities.
2024-08-07 11:01:53 +08:00
Alexey Bataev
af80d3a248
[SLP]Better sorting of phi instructions by comparing type sizes (#102188)
Currently the SLP vectorizer compares phi instructions by the type id of
the compared instructions, which may fail for different integer types
with different sizes. The patch adds a comparison by type sizes to fix
this.
2024-08-06 16:09:11 -04:00
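The type-size tie-break described in the phi-sorting commit above can be illustrated with a minimal sketch. It does not use the LLVM API: `TypeKey`, `lessByTypeThenSize` and `sortCandidates` are hypothetical names introduced only for illustration, and the actual comparator in the SLP vectorizer takes more criteria into account.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR type: all integer types share one
// TypeID but differ in bit width (i8, i32, i64, ...).
struct TypeKey {
  unsigned TypeID;
  unsigned SizeInBits;
};

// Order by type id first, then by size, so that i8 and i64 values are
// not treated as equivalent by the sort.
inline bool lessByTypeThenSize(const TypeKey &A, const TypeKey &B) {
  if (A.TypeID != B.TypeID)
    return A.TypeID < B.TypeID;
  return A.SizeInBits < B.SizeInBits;
}

// Sort candidates so that values of the same type and size are adjacent.
inline std::vector<TypeKey> sortCandidates(std::vector<TypeKey> Keys) {
  std::stable_sort(Keys.begin(), Keys.end(), lessByTypeThenSize);
  assert(std::is_sorted(Keys.begin(), Keys.end(), lessByTypeThenSize));
  return Keys;
}
```

With a comparator on the type id alone, the i8 and i64 entries would compare as equivalent and could stay interleaved after sorting; the size tie-break groups them.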
Alexey Bataev
2601d6f189 [SLP]Fix PR102187: do not insert extractelement before landingpad instruction.
The landingpad instruction must be the very first instruction after the
phi nodes, so extractelement/shuffles need to be inserted after it.

Fixes https://github.com/llvm/llvm-project/issues/102187
2024-08-06 12:33:13 -07:00
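The placement rule from the landingpad fix above can be sketched on a simplified model of a basic block. This is not the LLVM API: `Kind` and `firstValidInsertionIndex` are hypothetical names, and a block is modelled as a plain list of instruction kinds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds; only the distinction matters here.
enum class Kind { Phi, LandingPad, Other };

// Return the index at which a new instruction may be inserted into the
// block: after all leading phi nodes and, if present, after the
// landingpad that must directly follow them.
inline std::size_t firstValidInsertionIndex(const std::vector<Kind> &Block) {
  std::size_t I = 0;
  while (I < Block.size() && Block[I] == Kind::Phi)
    ++I;
  if (I < Block.size() && Block[I] == Kind::LandingPad)
    ++I;
  assert(I <= Block.size() && "index must stay within the block");
  return I;
}
```

For a block of the form phi, phi, landingpad, other, the function returns index 3, so the new instruction lands after the landingpad rather than between the phi nodes and the landingpad.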
Alexey Bataev
3c3ea7e751
[SLP]Better sorting of cmp instructions by comparing type sizes.
Currently the SLP vectorizer compares cmp instructions by the type id of
the compared operands, which may fail for different integer types, for
example, which have the same type id but different sizes. The patch adds
a comparison by type sizes to fix this.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/102132
2024-08-06 11:03:36 -04:00
Alexey Bataev
daf4a06e5c
[SLP]Try detect strided loads, if any pointer op require extraction.
If any pointer operand of the non-consecutive loads is an instruction
with a user that is not part of the current graph and thus requires
emission of an extractelement instruction, it is better to try to
detect whether the load sequence can be represented as a strided load,
so that extractelement instructions for the pointers are not required.

Reviewers: preames, RKSimon, topperc

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/101668
2024-08-06 09:20:50 -04:00
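As a rough illustration of what "can be represented as a strided load" means in the commit above, the sketch below checks whether a set of load offsets forms an arithmetic progression. It is a simplified model with constant byte offsets from a common base; `findConstantStride` is a hypothetical helper, not a function of the SLP vectorizer, whose real analysis also covers cases this sketch does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Given the byte offsets of several loads from a common base, return the
// constant stride if the sorted offsets form an arithmetic progression
// with a non-zero step, and std::nullopt otherwise.
inline std::optional<std::int64_t>
findConstantStride(std::vector<std::int64_t> Offsets) {
  if (Offsets.size() < 2)
    return std::nullopt;
  std::sort(Offsets.begin(), Offsets.end());
  const std::int64_t Stride = Offsets[1] - Offsets[0];
  assert(Stride >= 0 && "offsets are sorted in ascending order");
  if (Stride == 0)
    return std::nullopt;
  for (std::size_t I = 2; I < Offsets.size(); ++I)
    if (Offsets[I] - Offsets[I - 1] != Stride)
      return std::nullopt;
  return Stride;
}
```

Offsets 0, 16, 32, 48 yield a stride of 16 regardless of the order in which they are passed, while 0, 16, 40 yield no stride. In the first case only one base pointer and the stride are needed, which is why the per-lane pointer extracts become unnecessary.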
Alexey Bataev
799fd3d87b
[SLP]Support vectorization of small strided loads only graph.
If the graph includes only strided loads node, the compiler should still
try to vectorize it.

Reviewers: RKSimon, preames, topperc

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/101659
2024-08-05 12:51:10 -04:00
Kazu Hirata
b7146aed5b
[Transforms] Construct SmallVector with ArrayRef (NFC) (#101851) 2024-08-03 15:33:08 -07:00
Florian Hahn
edf46f365c
[SCEV] Use const SCEV * explicitly in more places.
Use const SCEV * explicitly in more places to prepare for
https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
2024-08-03 20:10:01 +01:00
Han-Kuan Chen
b5a7d3b6c2
[SLP][REVEC] Make Instruction::Select support vector instructions. (#100507) 2024-07-31 23:03:50 +08:00
Alexey Bataev
6b1d13761a [SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value.
When trying to reuse extractelement instruction, need to check that it
is inserted into proper position. Its original vector operand should
come before new vector value, otherwise new extractelement instruction
must be generated.

Fixes https://github.com/llvm/llvm-project/issues/101213
2024-07-30 16:02:46 -07:00
Alexey Bataev
a6ef0864e9 Revert "[SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value."
This reverts commit f70f1228035c9610de38e0e376afdacb647c4ad9 to fix the
crash reported by https://lab.llvm.org/buildbot/#/builders/133/builds/2456.
2024-07-30 15:11:35 -07:00
David Green
89b67a6400
[SLP] Cluster SortedBases before sorting. (#101144)
In order to enforce a strict-weak ordering, this patch clusters the
bases that are being sorted by the root - the first value in a gep
chain. The sorting is then performed in each cluster.
2024-07-30 22:12:20 +01:00
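The cluster-then-sort idea from the commit above can be sketched as follows. The model is hypothetical: `Base`, `Root`, `Offset` and `clusterThenSort` are illustrative names, and sorting each cluster by offset is an assumption made for the example; the commit message only states that sorting happens within each cluster.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a load base pointer: Root identifies the first
// value in its GEP chain, Offset is the distance from that root.
struct Base {
  int Id;
  int Root;
  std::int64_t Offset;
};

// Group bases by root (clusters keep first-seen order), then sort each
// cluster on its own. Offsets are only compared between bases that share
// a root, so the comparator is a valid strict weak ordering.
inline std::vector<Base> clusterThenSort(const std::vector<Base> &Bases) {
  std::vector<std::pair<int, std::vector<Base>>> Clusters;
  for (const Base &B : Bases) {
    auto It = std::find_if(Clusters.begin(), Clusters.end(),
                           [&](const auto &C) { return C.first == B.Root; });
    if (It == Clusters.end()) {
      Clusters.emplace_back(B.Root, std::vector<Base>{});
      It = Clusters.end() - 1;
    }
    It->second.push_back(B);
  }
  std::vector<Base> Result;
  for (auto &C : Clusters) {
    std::stable_sort(
        C.second.begin(), C.second.end(),
        [](const Base &L, const Base &R) { return L.Offset < R.Offset; });
    Result.insert(Result.end(), C.second.begin(), C.second.end());
  }
  assert(Result.size() == Bases.size());
  return Result;
}
```

A single comparator that tries to order unrelated bases as well as related ones can easily violate transitivity, which `std::sort` and `std::stable_sort` require; restricting comparisons to one cluster at a time avoids that.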
Alexey Bataev
f70f122803 [SLP]Fix PR101213: Reuse extractelement, only if its vector operand comes before new vector value.
When trying to reuse extractelement instruction, need to check that it
is inserted into proper position. Its original vector operand should
come before new vector value, otherwise new extractelement instruction
must be generated.

Fixes https://github.com/llvm/llvm-project/issues/101213
2024-07-30 14:04:50 -07:00
Alexey Bataev
197f4a9051
[SLP]Remove ExtraArgs from reductions.
No need to handle extra arguments during the reductions anymore, the
compiler now can handle all reduced values and reduction operands
correctly, even if they are from different basic blocks.

Simplifies analysis, reduces compiler size, improves overall
vectorization.

Metric: size..text
test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test    16668.00    17148.00  2.9%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2389675.00  2418683.00  1.2%
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   253517.00   253645.00  0.1%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   309678.00   309806.00  0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   389203.00   389363.00  0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111120.00   111152.00  0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1039103.00  1039215.00  0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1155883.00  1155963.00  0.0%
test-suite :: MicroBenchmarks/LoopVectorization/LoopInterleavingBenchmarks.test   276646.00   276662.00  0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test   848691.00   848739.00  0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1138604.00  1138636.00  0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   910201.00   910217.00  0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12385484.00 12385628.00  0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9667580.00  9667676.00  0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9667580.00  9667676.00  0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test  2856182.00  2856198.00  0.0%
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  2856182.00  2856198.00  0.0%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   773224.00   773192.00 -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1035148.00  1035084.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    98126.00    98094.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test    97966.00    97934.00 -0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test   167391.00   167215.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test    56685.00    56605.00 -0.1%
test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test    56685.00    56605.00 -0.1%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-20050826-2.test     1302.00     1294.00 -0.6%

Misc-C++/stepanov_container - better code due to cost fixes.
483.xalancbmk - better code due to cost fixes.
ASCI_Purple/SMG2000 - better code due to cost fixes.
Benchmarks/Bullet - better vector code because of the cost.
JM/ldecod - extra code remain scalar, extra reduction vectorized
consumer-jpeg - extra code remain scalar because of the cost.
tramp3d-v4 - better vectorization because of cost fixes.
511.povray_r - better vectorization because of cost fixes.
LoopInterleavingBenchmarks - extra reductions are vectorized
JM/lencod - small changes in vector code because of extract cost fixes.
453.povray - small changes in vector code because of extract cost fixes.
445.gobmk - extra small reduction vectorized
526.blender_r - extra reduced scalars, better small reduction, small
changes in the vectorization because of the fixes for extracts cost
602.gcc_s
502.gcc_r - small changes in reductions vectorization because of the
fixes in the extract cost.
631.deepsjeng_s
623.xalancbmk_s - small changes in reductions vectorization because of
the fixes in the extract cost.
MallocBench/gs - extra code remain scalar because of extracts cost
alacconvert-encode - extra code remain scalar because of extracts cost
alacconvert-decode - extra code remain scalar because of extracts cost
GCC-C-execute-20050826-2 - extra reduction gets vectorized

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/99923
2024-07-29 13:23:56 -04:00
David Green
f2d2ae3f5a
[SLP] Order clustered load base pointers by ascending offsets (#100653)
This attempts to fix a regression from #98025, where the new order of
reduction nodes causes later passes to not be able to produce as nice
shuffles. The issue boils down to picking an order of [0 1 3 2] for
loaded v4i8 values, which meant later parts could not find a simpler
ordering for the shuffles given the legal nodes available in AArch64. If
instead we make sure they are ordered [0 1 2 3] then everything can fall
into place.

In order to produce a better order that is more likely to work in more
cases, this patch takes the existing clustered loads and sorts the base
pointers if there is an order between them, i.e., if `V2 == gep (V1, X)`
then V1 is sorted before V2.
2024-07-27 11:18:56 +01:00
Alexey Bataev
1e1c8d1615
[SLP]Add external uses cost for the gathered loads.
If the load is a part of the gather node and also a part of the
vectorized subvector, need to add the estimation for the non-vectorized
external uses.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/99889
2024-07-26 11:09:44 -04:00
Han-Kuan Chen
5fc9502f19
[SLP] NFC. ShuffleInstructionBuilder::add V1->getType() is always a FixedVectorType. (#99842)
castToScalarTyElem has a cast<VectorType>(V->getType()).
2024-07-24 01:40:24 +08:00
Alexey Bataev
3cb82f49dc [SLP]Fix PR99899: Use canonical type instead of original vector of ptr.
Use adjusted canonical integer type instead of the original ptr type to
fix the crash in the TTI.
Fixes https://github.com/llvm/llvm-project/issues/99899
2024-07-22 13:05:12 -07:00
Alexey Bataev
f6e01b9ece [SLP]Do not trunc bv nodes, if the user is vectorized and requires wider type.
If at least a single user of the gathered trunc'ed instruction is
vectorized and requires a wider type than the trunc node, such
gathers/buildvectors should not be optimized for better bitwidth.
2024-07-19 07:28:04 -07:00
Yangyu Chen
007aa6d1b2
[SLP] Increase UsesLimit to 64 (#99467)
Since commit 82b800ecb35fb46881aa52000fa40b1b99aa654e addressed the
issue #99327 , we see some performance regression (13%) on some
verilator generated C++ code. This is because the UsesLimit is set to 8,
which is too small for the verilator generated code. I have analyzed the
need for the UsesLimit from [1] and found that the UsesLimit should be
at least 64 to cover most of these cases. Thus, this patch increases the
UsesLimit to 64.

Link: https://github.com/llvm/llvm-project/issues/99327#issuecomment-2236052879 [1]

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
2024-07-19 20:32:28 +08:00
Han-Kuan Chen
39bb244a16
[SLP][REVEC] Make Instruction::Call support vector instructions. (#99317) 2024-07-18 20:49:53 +08:00