1762 Commits

Author SHA1 Message Date
Alexey Bataev
3742c2a83c [SLP]Use stored signedness after minbitwidth analysis.
Need to used stored signedness info for the root node instead of
recalculating it after the vectorization, which may lead to a compiler
crash.
2024-07-10 03:58:00 -07:00
Alexey Bataev
af21bc1917 [SLP]Fix a crash on attempt to revectorize vectorized phi.
If the PHI node is vectorized during vectorization of its operands, no
need to try to vectorize its operands once again.
2024-07-09 14:11:08 -07:00
Alexey Bataev
a988821123
[SLP]Keep the original order in the reductions.
The patch tries to keep the original order of the instruction in the
reductions. Previously, two first instructions were switched, giving
reverse order.
The first step to support of the ordered reductions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/98025
2024-07-09 12:26:42 -04:00
Alexey Bataev
2cba218ca5 [SLP]Fix PR98133: Inserting PHI after debug-records!
The phi-node-to-be-deleted still should be inserted as the first
instruction in the block to avoid random compiler crashes.

Fixes https://github.com/llvm/llvm-project/issues/98133
2024-07-09 05:44:45 -07:00
Alexey Bataev
f5ee07a1b5
[SLP]Improve instruction reordering mode detection.
The "instruction" reordering mode should be selected only if there are
compatible instructions in other operands, which can be reordered.
Otherwise, better to select splat reordering mode.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff

test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12383340.00 12383324.00 -0.0%

Some 4x operations get replaced by 8x.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97485
2024-07-08 16:01:55 -04:00
Alexey Bataev
385118644c [SLP]Remove operands upon marking instruction for deletion.
If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have
more vectorizable patterns and generate less useless extractelement
instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97409
2024-07-08 07:56:48 -07:00
Alexey Bataev
4c47b41771
[SLP]Allow matching and shuffling of extractelement vector operands with different VF.
Allows better codegen with the free resizing of small VF vector operands
and then regular shuffling of the operands of the same size and
simplifies the code.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97414
2024-07-08 09:27:08 -04:00
Jon Roelofs
d3a76b03d8
[llvm][SLPVectorizer] Fix a bad cast assertion (#97621)
Fixes: rdar://128092379
2024-07-03 16:25:32 -07:00
Alexey Bataev
873c3f7e78 Revert "[SLP]Remove operands upon marking instruction for deletion."
This reverts commit bbd52dd44ceee80e3b6ba6a9b2bd8ee9a9713833 to fix
a crash revealed in https://lab.llvm.org/buildbot/#/builders/4/builds/505
2024-07-03 13:05:17 -07:00
Alexey Bataev
bbd52dd44c
[SLP]Remove operands upon marking instruction for deletion.
If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have
more vectorizable patterns and generate less useless extractelement
instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97409
2024-07-03 15:11:18 -04:00
Alexey Bataev
4eecf3c650
[SLP]Reorder buildvector/reduction vectorization and fuse the loops.
Currently SLP vectorizer tries at first to find reduction nodes, and
then vectorize buildvector sequences. Need to try to vectorize wide
buildvector sequences at first and only then try to vectorize
reductions, and then smaller buildvector sequences.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/96943
2024-07-03 14:36:30 -04:00
Alexey Bataev
96c18a2769 [SLP][NFC]Make instructions non-foldable, NFC 2024-07-03 11:12:37 -07:00
Alexey Bataev
139508eb87 [SLP][NFC]Add a test with inefficient reordering of operands, NFC. 2024-07-02 12:04:34 -07:00
Gabriel Baraldi
380beaec86
Fix potential crash in SLPVectorizer caused by missing check (#95937)
I'm not super familiar with this code, but it seems that we were just
missing a check.

The original code that triggered this did not have uselistorders but
llvm-reduce created them and it reproduces the same issue in a way more
compact way.

Fixes https://github.com/llvm/llvm-project/issues/95016
2024-07-02 08:15:51 -04:00
Alexey Bataev
d70963a762 [SLP]Fix the cost of the adjusted extracts in per-register analysis.
Previous patch did not pass the list of the extract indices by
reference, so the compiler just ignored them. Pass indices by reference
and fix the per-register analysis.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/96808
2024-06-28 14:33:08 -07:00
Alexey Bataev
a9c12e481b Revert "[SLP]Fix the cost of the adjusted extracts in per-register analysis."
This reverts commit 784152056ea40a800a8fd9f4157a428dfb7a6de8 to fix
buildbots issues reported in
https://lab.llvm.org/buildbot/#/builders/4/builds/315 and https://lab.llvm.org/buildbot/#/builders/35/builds/481
2024-06-28 13:41:51 -07:00
Alexey Bataev
784152056e
[SLP]Fix the cost of the adjusted extracts in per-register analysis.
Previous patch did not pass the list of the extract indices by
reference, so the compiler just ignored them. Pass indices by reference
and fix the per-register analysis.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/96808
2024-06-28 15:49:47 -04:00
Farzon Lotfi
918313d17d
[SLPVectorizer] Support SLPVectorizer cases of tan across all backends (#95517)
This PR is intended to address the limited SLPVectorizer support of tan
raised in the comments of this PR:
https://github.com/llvm/llvm-project/pull/94559.

Right now emitting the tan intrinsisic allows you to vectorize tan, but
emitting the libfunc does not. to address this the libcall needs to be
mapped to the intrinsic. and the libcall and function name need to be
marked approriately so they can be optimized or defined as a call
lowering.
2024-06-27 15:15:13 -04:00
Alexey Bataev
0280f97b36 [SLP]Fix PR95925: extract vectorized index of the potential buildvector sequence.
If the vectorized scalar is not the insert value in the buildvector
sequence but the index, it should be always extracted.
2024-06-25 14:07:51 -07:00
Alexey Bataev
228c2e1473 [SLP]Fix incorrect promotion of nodes before shuffling.
If the base node is signed, but some values are unsigned, still the
whole node should be considered signed. Also, an extra bitwidth analysis
should be performed, when estimating the minimal bitwidth.
2024-06-25 13:39:28 -07:00
Alexey Bataev
a55dc1d3ca [SLP][NFC]Add a test with the incorrect casting of the sext/zext alternate node, NFC. 2024-06-25 12:34:30 -07:00
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Nikita Popov
deab451e7a
[IR] Remove support for icmp and fcmp constant expressions (#93038)
Remove support for the icmp and fcmp constant expressions.

This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179

As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
2024-06-04 08:31:03 +02:00
Alexey Bataev
70a54bca6f
[SLP]Improve/fix extracts calculations for non-power-of-2 elements.
One of the previous patches introduced initial support for non-power-of-2
number of elements but some parts of the SLP vectorizer still were not
adjusted to handle the costs correctly. Patch fixes it by improving
analysis of the non-power-of-2 number of elements and fixes in the cost
of the extractelements instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/93213
2024-05-24 09:33:36 -04:00
Freddy Ye
4def1ce101
Reland "[X86] Remove knl/knm specific ISAs supports (#92883)" (#93136)
This reverts commit aa4069ea96e5eb62bc8c7895b9d920f129611b3a.
2024-05-24 13:46:34 +08:00
Freddy Ye
aa4069ea96
Revert "[X86] Remove knl/knm specific ISAs supports (#92883)" (#93123)
This reverts commit 282d2ab58f56c89510f810a43d4569824a90c538.
2024-05-23 10:25:23 +08:00
Freddy Ye
282d2ab58f
[X86] Remove knl/knm specific ISAs supports (#92883)
Cont. patch after https://github.com/llvm/llvm-project/pull/75580
2024-05-23 09:46:44 +08:00
Alexey Bataev
554c47c8e9 [SLP]Fix undef poison vector values shuffles with poisonous vectors.
If trying to find vector value in shuffling of the extractelements and
one of the vector values is undef value, need to generate real mask value
for such vector and either undef vector, or incoming second vector, if
  non-poisonous.
2024-05-22 10:41:57 -07:00
Alexey Bataev
30d484fa99 [SLP]Fix a crash when trying to convert masked gather nodes to strided.
Need to check if the loads node is masked gather. Only vectorized loads
can be converted to strided.
2024-05-22 08:08:56 -07:00
Jeffrey Byrnes
ea43a30899
[AMDGPU] Vectorize more 16 bit shuffles (#90648)
In the case of larger vectors, we should still prefer the vectorized
version (i.e. shufflevector vs extract/insert chains).

In arithmetic chains, vectorization results in chains of packed math
instructions (as opposed to unpack/repack & scalarized arithmetic):
https://godbolt.org/z/c5onaf6G5

In chains with PHIs, vectorization again removes the unnecessary pack /
repack code around BBs: https://godbolt.org/z/vz7zYzvhs
2024-05-21 09:21:36 -07:00
Nikita Popov
8e8d2595da
[ConstantFolding] Canonicalize constexpr GEPs to i8 (#89872)
This patch canonicalizes constant expression GEPs to use i8 source
element type, aka ptradd. This is the ConstantFolding equivalent of the
InstCombine canonicalization introduced in #68882.

I believe all our optimizations working on constant expression GEPs
(like GlobalOpt etc) have already been switched to work on offsets, so I
don't expect any significant fallout from this change.

This is part of:
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699
2024-05-20 11:47:30 +02:00
Alexey Bataev
58a94b1d0a [SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type.
Need to look through the SExt/ZExt scalars to be gathered, when trying
to reduce their width after minbitwidth analysis to prevent permanent
attempts to revectorize such gathered instructions.
2024-05-09 04:19:43 -07:00
Arthur Eubanks
2fb3774321 Revert "[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type."
This reverts commit 2475efa91d8b4fa8f1a2d16052cb6d14be7d5dc6.

Causes crashes, see comments on 2475efa91d.
2024-05-08 23:01:47 +00:00
Alexey Bataev
2475efa91d [SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type.
Need to look through the SExt/ZExt scalars to be gathered, when trying
to reduce their width after minbitwidth analysis to prevent permanent
attempts to revectorize such gathered instructions.
2024-05-08 07:25:19 -07:00
Alexey Bataev
f00f294130 [SLP]Fix PR91309: Do not consider SExt as always producing signed result.
Still need to do the full analysis of the signedness of the values
rather than rely on Instruction opcode, if the opcode is SExt. Still may
produce unsigned result.
2024-05-07 08:57:52 -07:00
Alexey Bataev
5d9b549bb0 [SLP][NFC]Add a test showing incorrect signedness detection in sext nodes. 2024-05-07 06:46:30 -07:00
Alexey Bataev
c144157f3d [SLP]Use last pointer instead of first for reversed strided stores.
Need to use the last address of the vectorized stores for the strided
stores, not the first one, to correctly store the data.
2024-05-06 10:16:28 -07:00
Alexey Bataev
a476032101 [SLP]Fix PR91025: correctly handle smin/smax of signed operands.
Need to check that the signed operand has an extra sign bit to be sure
that we do not skip signedness, when trying to minimize bitwidth for
smin/smax intrinsics.
2024-05-06 08:10:20 -07:00
Alexey Bataev
d584df6c8f [SLP][NFC]Add a test with incorrect smin analysis for minimal bitwidth, NFC. 2024-05-06 07:46:41 -07:00
Alexey Bataev
03972261a9 [SLP]Fix PR90892: do a correct sign analysis of the entries elements in gather shuffles.
Need to do extra analysis of the scalar elements of the tree entry to be
shuffled instead of the vectorized value to correctly deduce signedness
info.
2024-05-03 14:01:25 -07:00
Alexey Bataev
9620d3ee3e [SLP][NFC]Add a test with incorrect casting of shuffled gathered values, NFC. 2024-05-03 13:54:51 -07:00
Simon Pilgrim
8a0073ad46
[CostModel][X86] Treat lrint/llrint as fptosi calls (#90883)
X86 can use the CVTP2SI instructions to lower lrint/llrint calls, which have the same costs as the CVTTP2SI (fptosi) instructions

Followup to #90065
2024-05-03 18:06:50 +01:00
Simon Pilgrim
e4bb6634cd [SLP][X86] Add test coverage for rint/lrint/llrint fp calls 2024-05-02 18:51:35 +01:00
Alexey Bataev
5e67c41a93 [SLP]Fix PR90780: insert cast instruction for PHI nodes after all phi nodes.
Need to check if the vectorized value is a PHINode before insert casting
instruction and insert it after all phis to generate the code correctly.
2024-05-02 06:30:14 -07:00
Alexey Bataev
fc382db239
[SLP]Improve comparison of shuffled loads/masked gathers by adding GEP cost.
In some cases masked gather is less profitable than insert-subvector of
consecutive/strided stores. SLP has this kind of analysis, but need to
improve it by adding the cost of the GEP analysis.
Also, the GEP cost estimation for masked gather is fixed.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/90737
2024-05-01 15:53:25 -04:00
Alexey Bataev
59ef94d7cf
[SLP]Do not include the cost of and -1, <v> and emit just <v> after MinBitWidth.
After minbitwidth analysis, and <v>, (power_of_2 - 1 const) can be
transformed into just an <v>, (all_ones const), which can be ignored at
the cost estimation and at the codegen. x264 benchmark has this pattern.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/90739
2024-05-01 15:52:23 -04:00
Alexey Bataev
e83c6ddf46 [SLP][NFC]Add a test with the non profitable masked gather loads. 2024-05-01 07:41:35 -07:00
Alexey Bataev
576261ac8f
[SLP]Improve reordering for consts, splats and ops from same nodes + improved analysis.
Improved detection of const/splat candidates, their matching and analysis of instructions from same nodes.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                                                                                                       results     results0    diff
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    92952.00    93096.00  0.2%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   779832.00   780136.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   839923.00   840179.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   392708.00   392740.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1171131.00  1171147.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1391089.00  1391073.00 -0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1391089.00  1391073.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12352780.00 12352636.00 -0.0%

MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE - small
reordering
External/SPEC/CINT2006/464.h264ref/464.h264ref - small better code after
reordering
MultiSource/Applications/JM/lencod/lencod - smaller code with less
shuffles
MultiSource/Applications/JM/ldecod/ldecod - same
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - 2 extra loads
vectorized, smaller code
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - better code,
size increased because of more constant vectors.
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - small change in
the vectorized code, some code a bit better, some a bit worse.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/87091
2024-05-01 07:34:06 -04:00
Alexey Bataev
67e726a2f7
[SLP]Transform stores + reverse to strided stores with stride -1, if profitable.
Adds transformation of consecutive vector store + reverse to strided
stores with stride -1, if it is profitable

Reviewers: RKSimon, preames

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/90464
2024-05-01 07:32:33 -04:00
Alexey Bataev
ef78edafab [SLP][NFC]Add a test with the optimizable and and final ext, NFC. 2024-04-29 07:46:21 -07:00