30 Commits

Author SHA1 Message Date
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Alexey Bataev
0ad87ffdcc [SLP]Introduce shuffle of the nodes + gather/vectorbuild of the remaining scalars.
Currently compiler does not support mixing of shuffled nodes
+ gather/buildvector of the remaining scalar values. It may reduce total
  number of instructions and improve performance of the
  gather/buildvector sequences.

Part of D110978

Differential Revision: https://reviews.llvm.org/D146167
2023-03-17 11:18:36 -07:00
Nikita Popov
580210a0c9 [SLP] Convert some tests to opaque pointers (NFC) 2022-12-23 10:02:57 +01:00
Roman Lebedev
6697140ba1
[NFC] Port all SLPVectorizer tests to -passes= syntax 2022-12-07 21:44:09 +03:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Alexey Bataev
2ac5ebedea [SLP]Do not emit extract elements for insertelements users, replace with shuffles directly.
SLP vectorizer emits extracts for externally used vectorized scalars and
estimates the cost for each such extract. But in many cases these
scalars are input for insertelement instructions, forming buildvector,
and instead of extractelement/insertelement pair we can emit/cost
estimate shuffle(s) cost and generate series of shuffles, which can be
further optimized.

Tested using test-suite (+SPEC2017), the tests passed, SLP was able to
generate/vectorize more instructions in many cases and it allowed to reduce
number of re-vectorization attempts (where we could try to vectorize
buildector insertelements again and again).

Differential Revision: https://reviews.llvm.org/D107966
2022-05-23 07:06:45 -07:00
Florian Hahn
aeb19817d6
Revert "[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly."
This reverts commit fc9c59c355cb255446e571b4515b5e41a76503c4.

The patch triggers an assertion when building SPEC on X86. Reduced
reproducer shared at D107966.

Also reverts follow-up commit 11a09af76d11ad5a9f1f95b561112af17ff81f80.
2022-05-21 21:00:01 +01:00
Alexey Bataev
fc9c59c355 [SLP]Do not emit extract elements for insertelements users, replace with shuffles directly.
SLP vectorizer emits extracts for externally used vectorized scalars and
estimates the cost for each such extract. But in many cases these
scalars are input for insertelement instructions, forming buildvector,
and instead of extractelement/insertelement pair we can emit/cost
estimate shuffle(s) cost and generate series of shuffles, which can be
further optimized.

Tested using test-suite (+SPEC2017), the tests passed, SLP was able to
generate/vectorize more instructions in many cases and it allowed to reduce
number of re-vectorization attempts (where we could try to vectorize
buildector insertelements again and again).

Differential Revision: https://reviews.llvm.org/D107966
2022-05-20 05:58:09 -07:00
Alexey Bataev
150ea76543 Revert "[SLP]Do not schedule instructions with constants/argument/phi operands and external users."
This reverts commit 1eeb2bfe727323332800e8d390f2f8c63c953779 to fix
a bug reported in https://reviews.llvm.org/D121121
2022-03-16 13:54:59 -07:00
Alexey Bataev
1eeb2bfe72 [SLP]Do not schedule instructions with constants/argument/phi operands and external users.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants, or
arguments, or phis, or instructions from others blocks, or their users
are phis or from the other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if operands does
not need to be scheduled) or at the end of the block (if users are
outside of the block).
It may save some compile time and scheduling resources.

Differential Revision: https://reviews.llvm.org/D121121
2022-03-16 06:05:43 -07:00
Alexey Bataev
74af4bb1f4 [SLP]Remove unnecessary UndefValue in CreateShuffle.
No need to use UndefValue in CreateShuffle call.

Differential Revision: https://reviews.llvm.org/D104113
2021-06-11 08:08:30 -07:00
Alexey Bataev
89f3bc7698 [SLP]Allow to reorder nodes with >2 scalar values.
tryToVectorizeList function allows to reorder only 2 scalars. Patch
allows to reorder >2 scalars. Also, to avoid possible regressions, it
allows extra vectorization of the remaining parts of the scalars
elements if possible.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D103247
2021-06-03 10:01:36 -07:00
Anton Afanasyev
ab2c499d3a [SLP] Add insertelement instructions to vectorizable tree
Add new type of tree node for `InsertElementInst` chain forming vector.
These instructions could be either removed, or replaced by shuffles during
vectorization and we can add this node to cost model, so naturally estimating
their cost, getting rid of `CompensateCost` tricks and reducing further work
for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this
patch is the first step towards revectorization of partially vectorization
(to fix PR42022 completely). After adding inserts to tree the next step is
to add vector instructions there (for instance, to merge `store <2 x float>`
and `store <2 x float>` to `store <4 x float>`).

Fixes PR40522 and PR35732.

Differential Revision: https://reviews.llvm.org/D98714
2021-05-13 07:41:45 +03:00
Juneyoung Lee
4a8e6ed2f7 [SLP,LV] Use poison constant vector for shufflevector/initial insertelement
This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef.
This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector.
The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ).

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94061
2021-01-06 11:22:50 +09:00
Juneyoung Lee
278aa65cc4 [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93793
2020-12-30 04:21:04 +09:00
Arthur Eubanks
691c086d15 [NewPM][BasicAA] basicaa -> basic-aa in Transforms/SLPVectorizer
Following https://reviews.llvm.org/D82607.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82681
2020-06-26 14:58:41 -07:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Simon Pilgrim
ff3abef395 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

llvm-svn: 356913
2019-03-25 15:53:55 +00:00
Eric Christopher
2534592b9f Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"
As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged.  These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.

This reverts commit r353923.

llvm-svn: 354434
2019-02-20 04:42:07 +00:00
Anton Afanasyev
ca9aff9353 [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halfs of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

llvm-svn: 353923
2019-02-13 08:26:43 +00:00
Alexey Bataev
ce2c8b3360 [SLP]Update test checks for the SPL vectorizer, NFC.
llvm-svn: 350967
2019-01-11 20:21:14 +00:00
David Blaikie
a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
79e6c74981 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Benjamin Kramer
c10563d14e Fix broken CHECK lines.
llvm-svn: 199016
2014-01-11 21:06:00 +00:00
Arnold Schwaighofer
58864d2d5f SLPVectorizer: Sort PHINodes based on their opcode
Before this patch we relied on the order of phi nodes when we looked for phi
nodes of the same type. This could prevent vectorization of cases where there
was a phi node of a second type in between phi nodes of some type.

This is important for vectorization of an internal graphics kernel. On the test
suite + external on x86_64 (and on a run on armv7s) it showed no impact on
either performance or compile time.

radar://15024459

llvm-svn: 192537
2013-10-12 18:56:27 +00:00
Yi Jiang
aeb5b46a85 In this patch we are trying to do two things:
1) If the width of vectorization list candidate is bigger than vector reg width, we will break it down to fit the vector reg.
2) We do not vectorize the width which is not power of two.

The performance result shows it will help some spec benchmarks. mesa improved 6.97% and ammp improved 1.54%. 

llvm-svn: 189830
2013-09-03 17:26:04 +00:00
Chandler Carruth
1c34afcb61 Teach the SLP vectorizer the correct way to check for consecutive access
using GEPs. Previously, it used a number of different heuristics for
analyzing the GEPs. Several of these were conservatively correct, but
failed to fall back to SCEV even when SCEV might have given a reasonable
answer. One was simply incorrect in how it was formulated.

There was good code already to recursively evaluate the constant offsets
in GEPs, look through pointer casts, etc. I gathered this into a form
code like the SLP code can use in a previous commit, which allows all of
this code to become quite simple.

There is some performance (compile time) concern here at first glance as
we're directly attempting to walk both pointers constant GEP chains.
However, a couple of thoughts:

1) The very common cases where there is a dynamic pointer, and a second
   pointer at a constant offset (usually a stride) from it, this code
   will actually not do any unnecessary work.

2) InstCombine and other passes work very hard to collapse constant
   GEPs, so it will be rare that we iterate here for a long time.

That said, if there remain performance problems here, there are some
obvious things that can improve the situation immensely. Doing
a vectorizer-pass-wide memoizer for each individual layer of pointer
values, their base values, and the constant offset is likely to be able
to completely remove redundant work and strictly limit the scaling of
the work to scrape these GEPs. Since this optimization was not done on
the prior version (which would still benefit from it), I've not done it
here. But if folks have benchmarks that slow down it should be straight
forward for them to add.

I've added a test case, but I'm not really confident of the amount of
testing done for different access patterns, strides, and pointer
manipulation.

llvm-svn: 189007
2013-08-22 12:45:17 +00:00
Nadav Rotem
060be733a5 SLP Vectorizer: Add support for trees with external users.
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.

llvm-svn: 185230
2013-06-28 22:07:09 +00:00
Nadav Rotem
0794acc1da SLPVectorizer: support slp-vectorization of PHINodes between basic blocks
llvm-svn: 184888
2013-06-25 23:04:09 +00:00