1197 Commits

Author SHA1 Message Date
Philip Reames
5cd427106d [TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc]
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling.  We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.

This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works.  Target TTI implementations still use the split flags.  I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
2022-08-22 09:48:15 -07:00
Simon Pilgrim
5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata
8b1b0d1d81 Revert "Use std::is_same_v instead of std::is_same (NFC)"
This reverts commit c5da37e42d388947a40654b7011f2a820ec51601.

This patch seems to break builds with some versions of MSVC.
2022-08-20 23:00:39 -07:00
Kazu Hirata
c5da37e42d Use std::is_same_v instead of std::is_same (NFC) 2022-08-20 22:36:26 -07:00
Kazu Hirata
258531b7ac Remove redundant initialization of Optional (NFC) 2022-08-20 21:18:28 -07:00
Philip Reames
b0a2c48e9f [tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc] 2022-08-19 16:22:22 -07:00
Alexey Bataev
c167028684 [SLP]Delay vectorization of postponable values for instructions with no users.
SLP vectorizer tries to find the reductions starting the operands of the
instructions with no-users/void returns/etc. But such operands can be
postponable instructions, like Cmp, InsertElement or InsertValue. Such
operands still must be postponed, vectorizer should not try to vectorize
them immediately.

Differential Revision: https://reviews.llvm.org/D131965
2022-08-19 08:39:16 -07:00
Alexey Bataev
0e7ed32c71 [SLP]Cost for a constant buildvector.
In many cases constant buildvector results in a vector load from a
constant/data pool. Need to consider this cost too.

Differential Revision: https://reviews.llvm.org/D126885
2022-08-19 08:02:42 -07:00
Alexey Bataev
d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Simon Pilgrim
594c5b1a42 [SLP] Update TODO comment about shuffle mask decoding
This is handled in ShuffleVectorInst/getShuffleCost - getInstructionThroughput is (slowly) being removed.
2022-08-17 11:41:46 +01:00
Alexey Bataev
65c7cecb13 [SLP]Fix PR51320: Try to vectorize single store operands.
Currently, we try to vectorize values, feeding into stores, only if
slp-vectorize-hor-store option is provided. We can safely enable
vectorization of the value operand of a single store in the basic block,
if the operand value is used only in store.
It should enable extra vectorization and should not increase compile
time significantly.
Fixes https://github.com/llvm/llvm-project/issues/51320

Differential Revision: https://reviews.llvm.org/D131894
2022-08-16 07:25:21 -07:00
Philip Reames
e792a353b5 [slp] adjust debug output to include final computed cost 2022-08-15 13:51:39 -07:00
Alexey Bataev
2819126d0c [SLP][NFC]Replace multiple isa calls with single one where possible,
NFC.
2022-08-15 11:56:58 -07:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Kazu Hirata
0e37ef0186 [Transforms] Fix comment typos (NFC) 2022-08-07 23:55:24 -07:00
Dawid Jurczak
1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
Fangrui Song
7d6017fd31 [TTI] Change new getVectorInstrCost overload to use const reference after D131114
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131197
2022-08-04 15:16:51 -07:00
Mingming Liu
bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
Kazu Hirata
acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
William Schmidt
bccc9aa81c Don't vectorize PHIs in catchswitch blocks
We currently assert in vectorizeTree(TreeEntry*) when processing a PHI
bundle in a block containing a catchswitch.  We attempt to set the
IRBuilder insertion point following the catchswitch, which is invalid.
This is done so that ShuffleBuilder.finalize() knows where to insert
a shuffle if one is needed.

To avoid this occurring, watch out for catchswitch blocks during
buildTree_rec() processing, and avoid adding PHIs in such blocks to
the vectorizable tree.  It is unlikely that constraining vectorization
over an exception path will cause a noticeable performance loss, so
this seems preferable to trying to anticipate when a shuffle will and
will not be required.
2022-07-19 06:10:17 -07:00
Kazu Hirata
7094ab4ee7 [llvm] Modernize bool literals (NFC)
Identified with modernize-use-bool-literals.
2022-07-17 18:08:51 -07:00
Kazu Hirata
611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Craig Topper
0266773464 [SLP] Add missing space to optimization remark.
Reviewed By: vporpo

Differential Revision: https://reviews.llvm.org/D129330
2022-07-07 23:29:11 -07:00
David Green
2de05afc19 [SLP] Peek into loads when hitting the RecursionMaxDepth
This patch slightly extends the limit on the RecursionMaxDepth inside
the SLP vectorizer. It does it only when it hits a load (or zext/sext of
a load), which allows it to peek through in the places where it will be
the most valuable, without ballooning out the O(..) by any 2^n factors.

Differential Revision: https://reviews.llvm.org/D122148
2022-07-04 14:22:50 +01:00
Alexey Bataev
4be3fc35aa [SLP][NFC]Cleanup up operands of the removed insertelements, NFC.
Replace all operands of the insertelement instruction, replaced by
shuffles, by poisons to avoid false-positive reports about incorrect function.
2022-06-30 17:51:43 -07:00
Alexey Bataev
bf4dcbd2df [SLP]Fix PR56251: Do not remove the reordering from the root node, being used as an operand.
If the root order itself does not require reordering, we can just
remove its reorder mask safely (e.g., if the root node is a vector of
phis). But if this node is used as an operand in the graph, we cannot
delete the reordering, need to keep it. Otherwise the graph nodes are
not synchronized with the operands. It may cause an extra gather
instruction(s) or a compiler crash.
Also, need to be very careful when selecting the gather nodes for
reordering since there might several gather nodes with the same scalars
and we can try to reorder just the same node many times instead of
different nodes.

Differential Revision: https://reviews.llvm.org/D128680
2022-06-28 13:42:05 -07:00
Guillaume Chatelet
3c126d5fe4 [Alignment] Replace commonAlignment with std::min
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.

Differential Revision: https://reviews.llvm.org/D128345
2022-06-28 07:15:02 +00:00
Kazu Hirata
a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Alexey Bataev
2faacf61a5 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-24 09:28:01 -07:00
Alexey Bataev
3b6edef15d [SLP]Fix a crash when reorder masked gather nodes with reused scalars.
If the masked gather nodes must be reordered, we can just reorder
scalars, just like for gather nodes. But if the node contains reused
scalars, it must be handled same way as a regular vectorizable node,
since need to reorder reused mask, not the scalars directly.

Differential Revision: https://reviews.llvm.org/D128360
2022-06-23 11:32:30 -07:00
Fangrui Song
1ffd2d99c2 Revert D115462 "[SLP]Improve shuffles cost estimation where possible."
This reverts commit cac60940b771a0685d058a5b471c84cea05fdc46.

Caused -Os -fsanitize=memory -march=haswell miscompile to pytorch/cpuinfo.
See my latest comment (may update) on D115462.
2022-06-22 23:16:31 -07:00
Fangrui Song
a411bc11d6 Revert "[SLP]Fix a crash when insert subvector is out of range."
This reverts commit f1ee2738b3d70fea803ac1f3401c2fc9f61e514a.

Revert due to the revert of a dependent commit `[SLP]Improve shuffles cost estimation where possible.`
2022-06-22 23:16:25 -07:00
Guillaume Chatelet
57ffff6db0 Revert "[NFC] Remove dead code"
This reverts commit 8ba2cbff70f2c49a8926451c59cc260d67b706cf.
2022-06-22 14:55:47 +00:00
Guillaume Chatelet
8ba2cbff70 [NFC] Remove dead code 2022-06-22 13:33:58 +00:00
Vasileios Porpodas
7a9ad25769 Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6d6268dcbf0f48e43f6f9fe46b3a28c29ba63c7d.

Review: https://reviews.llvm.org/D125712
2022-06-21 18:35:29 -07:00
Vasileios Porpodas
6d6268dcbf Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6f88acf410b48f3e6c1526df2dc32ed86f249685.
2022-06-21 17:07:21 -07:00
Vasileios Porpodas
6f88acf410 [SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles
like fadd,fsub because this may block them being matched into a single vector
instruction in x86.
We do this by checking if a TreeEntry is such a pattern and adding it to the
list of TreeEntries with orders that need to be considered.

Differential Revision: https://reviews.llvm.org/D125712
2022-06-21 16:44:48 -07:00
Alexey Bataev
d4ee43153d [SLP][NFC]Fix a warning in a comparison, NFC.
Fixed signedness warning.
2022-06-21 10:19:47 -07:00
Alexey Bataev
f1ee2738b3 [SLP]Fix a crash when insert subvector is out of range.
If the OffsetBeg + InsertVecSz is greater than VecSz, need to estimate
the cost as shuffle of 2 vector, not as insert of subvector. Otherwise,
the inserted subvector is out of range and compiler may crash.

Differential Revision: https://reviews.llvm.org/D128071
2022-06-21 07:16:35 -07:00
Kazu Hirata
7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata
e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Alexey Bataev
76782a65ee [SLP]Use original vector if need to shuffle truncated root.
If the root scalar is mapped to to the smallest bit width, the vector is
truncated and the types between original buildvector and extracted value
mismatched. For extract, we emit sext/zext instructions, for shuffles we
can reuse oringal vector instead of the truncated one.

Differential Revision: https://reviews.llvm.org/D127974
2022-06-16 10:41:18 -07:00
Alexey Bataev
7236d49fd5 [SLP]Extend vectorization for scatter vectorize nodes.
Currently scatter vectorize nodes can be emitted only for GEPs with
constant indices. But we can also emit such nodes for GEPs with the same
ptr and non-constant vectorizable/gathered indices, if profitable. Patch
adds support for such nodes and tries to improve handling of GEPs with
non-const indeces for such nodes.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                                              results                   results0 diff
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27550.00                  27507.00  -0.2%
                               test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5395.00                   5380.00  -0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5389.00                   5374.00  -0.3%
                    test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   961.00                    958.00  -0.3%
                   test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   961.00                    958.00  -0.3%
                               test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  5664.00                   5643.00  -0.4%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 13202.00                  13127.00  -0.6%
                                test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   212.00                    207.00  -2.4%
                                test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   890.00                    850.00  -4.5%
                            test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  1695.00                   1581.00  -6.7%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2338.00                   2140.00  -8.5%
                                  test-suite :: SingleSource/UnitTests/matrix-types-spec.test    63.00                     55.00 -12.7%
                             test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   468.00                    356.00 -23.9%
                                                                           Geomean difference                                     -0.3%

All numbers show increased number of generated vector instructions.

Diff:
SingleSource/Benchmarks/Adobe-C++/loop_unroll - better without LTO, but
need an extra analysis with LTO (with LTO compiler generates
masked_gather, while before regular loads were emitted because of extra
data, availbale at LTO time).
SingleSource/UnitTests/matrix-types-spec - more vector code.
MultiSource/Applications/JM/lencod/lencod - same.
External/SPEC/CINT2006/464.h264ref/464.h264ref - same.
MultiSource/Benchmarks/7zip/7zip-benchmark - same.
External/SPEC/CINT2006/445.gobmk/445.gobmk - no changes.
External/SPEC/CFP2017rate/510.parest_r/510.parest_r - more vector code.
External/SPEC/CFP2006/447.dealII/447.dealII - same
External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s - same
External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp - same
External/SPEC/CFP2017rate/511.povray_r/511.povray - same
External/SPEC/CFP2006/453.povray/453.povray - same
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - same
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - same
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same

Differential Revision: https://reviews.llvm.org/D127219
2022-06-16 06:05:48 -07:00
Alexey Bataev
c60c13f7eb [SLP] Improve reordering in presence of constant only nodes.
We can skip the analysis of the constant nodes, their order should not
affect the ordering of the trees/subtrees.

Differential Revision: https://reviews.llvm.org/D127775
2022-06-15 06:17:34 -07:00
Alexey Bataev
cac60940b7 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-03 08:06:22 -07:00
Fangrui Song
df0f30dc36 Revert "[SLP]Improve shuffles cost estimation where possible."
This reverts commit 9980c9971892378ea82475e000de8df210a58e69.

Caused assertion failures: https://reviews.llvm.org/D115462#3555350
2022-06-03 00:30:34 -07:00
Alexey Bataev
9980c99718 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-02 11:18:14 -07:00