45 Commits

Author SHA1 Message Date
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Alexey Bataev
44eca64224 [SLP]Check scalars before trying scheduling.
Need to check the scalars if they can be vectorized before trying to
schedule them. It may save compile time and improve vectorization on
large functions/basic blocks.

Differential Revision: https://reviews.llvm.org/D154891
2023-07-24 09:25:19 -07:00
Simon Pilgrim
ab6ec66642 [SLP][X86] Regenerate some test checks to reduce diff in D154891 2023-07-19 17:02:11 +01:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Nikita Popov
580210a0c9 [SLP] Convert some tests to opaque pointers (NFC) 2022-12-23 10:02:57 +01:00
Bjorn Pettersson
3be72f4029 [test][SLPVectorizer] Use -passes syntax in RUN lines. NFC 2022-10-13 10:44:38 +02:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Philip Reames
7d6e8f2a96 [slp] Delete dead scalar instructions feeding vectorized instructions
If we vectorize a e.g. store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well.

This is purely for test output quality and readability. It should have no effect in any sane pipeline.

Differential Revision: https://reviews.llvm.org/D122493
2022-03-28 20:10:13 -07:00
Philip Reames
48cc9287f5 Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)
The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling).  Most of these were fixed over the weekend and have had several days to bake.  The last was fixed this morning after being noticed in manual review of test changes yesterday.  See the review thread for links to each change.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

   Before this patch:

   704357 SLP                          - Number of calcDeps actions
   699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
   59 SLP                          - Number of ReScheduleOnFail actions
   10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

   After this patch:

   102895 SLP                          - Number of calcDeps actions
   161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
   55 SLP                          - Number of ReScheduleOnFail actions
   10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-25 10:39:23 -07:00
Philip Reames
deae979a2c Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""
This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.
2022-03-03 11:35:34 -08:00
Philip Reames
738042711b Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""
Root issue which triggered the revert was fixed in 689bab.  No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instr
uction potentially vectorized to the last. This window can include a very large number of unrelated instruct
ions which are not being considered for vectorization. This change switches the code to only schedule the su
b-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

    Before this patch:

    704357 SLP                          - Number of calcDeps actions
     699021 SLP                          - Number of schedule calls
       5598 SLP                          - Number of ReSchedule actions
         59 SLP                          - Number of ReScheduleOnFail actions
      10084 SLP                          - Number of schedule resets
       8523 SLP                          - Number of vector instructions generated

    After this patch:

    102895 SLP                          - Number of calcDeps actions
     161916 SLP                          - Number of schedule calls
       5637 SLP                          - Number of ReSchedule actions
         55 SLP                          - Number of ReScheduleOnFail actions
      10083 SLP                          - Number of schedule resets
       8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-02 10:47:20 -08:00
Arthur Eubanks
9c6250ee41 Revert "[SLP] Schedule only sub-graph of vectorizable instructions"
This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.
2022-03-01 17:31:16 -08:00
Philip Reames
0539a26d91 [SLP] Schedule only sub-graph of vectorizable instructions
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP                          - Number of calcDeps actions
 699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
     59 SLP                          - Number of ReScheduleOnFail actions
  10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

After this patch:

102895 SLP                          - Number of calcDeps actions
 161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
     55 SLP                          - Number of ReScheduleOnFail actions
  10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-02-22 10:15:55 -08:00
Alexey Bataev
352c46e707 [SLP]Improve vectorization of split loads.
Need to fix ther cost estimation for split loads, since we look at the
subregs already, no need to permute them, need just to estimate
subregister insert, if it is smaller than the real register. Also, using
split loads, it might be profitable already to vectorize smaller trees
with gathering of the loads.

Differential Revision: https://reviews.llvm.org/D107188
2021-11-12 06:13:22 -08:00
Alexey Bataev
3ea7877c8b [SLP]Unify vectorization of PHI and store nodes with improved tiny tree vectorization.
Vectorization of PHIs and stores very similar, it might be beneficial to
try to revectorize stores (like PHIs) if the total number of stores with
the same/alternate opcode is less than the vector size but number of
stores with the same type is larger than the vector size.

Differential Revision: https://reviews.llvm.org/D109831
2021-10-21 06:25:32 -07:00
Alexey Bataev
b6d10beb50 [SLP][NFC]Rename function in the test for better matching of the
transformation.
2021-09-22 05:51:18 -07:00
Alexey Bataev
446e11fa29 [SLP][NFC]Add a test for tiny tree with stores and with not
same/alternate instructions.
2021-09-15 08:07:01 -07:00
Alexey Bataev
95e5d401ae [SLP]Improve splats vectorization.
Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.

Differential Revision: https://reviews.llvm.org/D107104
2021-07-30 10:17:45 -07:00
Alexey Bataev
da3dbfcacf [SLP]Improve calculations of the cost for reused/reordered scalars.
Part of D105020. Also, fixed FIXMEs that need to use wider vector type
when trying to calculate the cost of reused scalars. This may cause
regressions unless D100486 is landed to improve the cost estimations
for long vectors shuffling.

Differential Revision: https://reviews.llvm.org/D106060
2021-07-16 13:40:15 -07:00
Alexey Bataev
a0086add2e [SLP]Improve gathering of scalar elements.
1. Better sorting of scalars to be gathered. Trying to insert
   constants/arguments/instructions-out-of-loop at first and only then
   the instructions which are inside the loop. It improves hoisting of
   invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103458
2021-06-09 05:23:21 -07:00
Alexey Bataev
369cd2ae52 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit fd18547e0721983dcb273670d16341921f831e50. Need to
add a check for the size of the vectorization tree to avoid some extra
vectorization.
2021-05-04 04:53:22 -07:00
Alexey Bataev
fd18547e07 [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 08:06:20 -07:00
Alexey Bataev
2e4cc9a725 Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit b5f64768cfeecca16c7c9c53cbd97ac7289c43aa to fix
a compiler crash revealed by buildbots.
2021-05-03 07:20:00 -07:00
Alexey Bataev
b5f64768cf [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 06:45:42 -07:00
Alexey Bataev
8af4723c58 [SLP]Try to vectorize tiny trees with shuffled gathers.
If the first tree element is vectorize and the second is gather, it
still might be profitable to vectorize it if the gather node contains
less scalars to vectorize than the original tree node. It might be
profitable to use shuffles.

Differential Revision: https://reviews.llvm.org/D101397
2021-04-28 06:35:31 -07:00
Alexey Bataev
1c0ab3411a [SLP]Add a test for possibly vectorized tiny tree, NFC. 2021-04-27 13:39:02 -07:00
Juneyoung Lee
4a8e6ed2f7 [SLP,LV] Use poison constant vector for shufflevector/initial insertelement
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94061
2021-01-06 11:22:50 +09:00
Arthur Eubanks
691c086d15 [NewPM][BasicAA] basicaa -> basic-aa in Transforms/SLPVectorizer
Following https://reviews.llvm.org/D82607.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82681
2020-06-26 14:58:41 -07:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Eric Christopher
2534592b9f Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"
As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged.  These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.

This reverts commit r353923.

llvm-svn: 354434
2019-02-20 04:42:07 +00:00
Anton Afanasyev
ca9aff9353 [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halfs of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

llvm-svn: 353923
2019-02-13 08:26:43 +00:00
Simon Pilgrim
f1e668830f [SLPVectorizer][X86] Regenerate some tests. NFCI
llvm-svn: 329196
2018-04-04 13:53:51 +00:00
Michael Zolotukhin
4d8ffa082c [SLP] Vectorize for all-constant entries.
Differential Revision: http://reviews.llvm.org/D10531

llvm-svn: 240144
2015-06-19 17:40:15 +00:00
David Blaikie
a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
79e6c74981 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Arnold Schwaighofer
9611d23d63 SLPVectorizer: Try vectorizing 'splat' stores
Vectorize sequential stores of a broadcasted value.
5% on eon.

radar://16124699

llvm-svn: 202067
2014-02-24 19:52:29 +00:00
Yi Jiang
8fd1a806d5 Apply slp vectorization on fully-vectorizable tree of height 2
llvm-svn: 191852
2013-10-02 20:20:39 +00:00
Yi Jiang
582ba6c808 Test case for r191314.
Some supplemental information for r191314: We would like to make sure SLP Vectorizer will not try to vectorize tiny trees even with a negative threshold so we set the cost to INT_MAX. 

llvm-svn: 191327
2013-09-24 19:33:53 +00:00