If a load instruction qualifies to be optimized by InterleavedAccess
Pass, but also has a dead binop instruction, this will lead to a crash.
Binop instruction will not be deleted, because normally it would be
deleted through its' users, but it has none. Later on deleting a load
instruction will fail because it still has uses.
The InterleavedAccess pass currently matches (de)interleaving
shufflevector instructions with loads or stores, and calls into
target lowering to generate ldN or stN instructions.
Since we can't use shufflevector for scalable vectors (besides a
splat with zeroinitializer), we have interleave2 and deinterleave2
intrinsics. This patch extends InterleavedAccess to recognize those
intrinsics and if possible replace them with ld2/st2.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D146218
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
This enables the interleaved access pass on O1 and above, and causes
interleaving/deinterleaving shuffles of fixed length vectors with
stores/loads to be lowered into vssegN/vlsegN.
We need to be careful and make sure that we only lower vsseg/vlseg
whenever we know the fixed vector type will fit within the minimum vlen,
and that the interleaving factor is supported for the given LMUL.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145085
This enables the interleaved access pass on O1 and above, and causes
interleaving/deinterleaving shuffles of fixed length vectors with
stores/loads to be lowered into vssegN/vlsegN.
We need to be careful and make sure that we only lower vsseg/vlseg
whenever we know the fixed vector type will fit within the minimum vlen,
and that the interleaving factor is supported for the given LMUL.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145085
It is expected that shuffles that we hoist through binops only have a single
vector operand, the other being undef/poison. The checks for
isDeInterleaveMaskOfFactor check that all the elements come from inside the
first vector, but with non-canonical shuffles the second operand could still
have a value. Add a quick check to make sure it is UndefValue as expected, to
make sure we don't run into problems with BinOpShuffles not using BinOps.
Fixes#61749
Differential Revision: https://reviews.llvm.org/D147306
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.
Differential Revision: https://reviews.llvm.org/D127976
This adds FP type support to the SVE Container type list as a supplement to D112303.
Reviewed By: peterwaller-arm, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D113333
This adds support for SVE structured loads/stores to the relevant target
hooks, such that we can support these instructions in the InterleavedAccess
pass.
Depends on D112078
Differential Revision: https://reviews.llvm.org/D112303
The Interleave Access pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247 we were missing copying IR flags to the
new instruction though, which should just be kept the same as the
original instruction.
Differential Revision: https://reviews.llvm.org/D104255
This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element.
The change in CodeGenPrepare.cpp is fine because the mask of shufflevector should be always zero.
It doesn't touch the second element (which is poison).
The change in InterleavedAccessPass.cpp is also fine becauses the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than
the size of the first vector operand.
This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad.
It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check the mask is the desirable form.
isDeInterleaveMask has the check that a+km is smaller than the vector size.
To check my understanding, I added an assertion & added a test to show that this optimization doesn't fire in such case.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D94056
Both tryReplaceExtracts and replaceBinOpShuffles may modify the IR, even
if no interleaved loads are generated, but currently the pass pretends
no changes were made.
This patch updates the pass to return true if either of the functions
made any changes. In case of tryReplaceExtracts, changes are made if
there are any Extracts and true is returned.
`replaceBinOpShuffles` always makes changes if BinOpShuffles is not empty.
It also always returned true, so I went ahead and change it to just
`replaceBinOpShuffles`.
Fixes PR48208.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D93997
This patch makes X86InterleavedAccessGroup::deinterleave8bitStride3 use the unary CreateShuffleVector.
This is a continuation of D93923. There were a few missing replacements.
IIUC, this patch does not cause change in the generated programs' semantics because the
function inserts shufflevectors that only choose elements from the first vector.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93993
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This commit copies existing tests at llvm/Transforms containing
'shufflevector X, undef' and replaces them with 'shufflevector X, poison'.
The new copied tests have *-inseltpoison.ll suffix at its file name
(as db7a2f347f132b3920415013d62d1adfb18d8d58 did)
See https://reviews.llvm.org/D93793
Test files listed using
grep -R -E "^[^;]*shufflevector <.*> .*, <.*> undef" | cut -d":" -f1 | uniq
Test files copied & updated using
file_org=llvm/test/Transforms/$1
if [[ "$file_org" = *-inseltpoison.ll ]]; then
file=$file_org
else
file=${file_org%.ll}-inseltpoison.ll
if [ ! -f $file ]; then
cp $file_org $file
fi
fi
sed -i -E 's/^([^;]*)shufflevector <(.*)> (.*), <(.*)> undef/\1shufflevector <\2> \3, <\4> poison/g' $file
head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q
if [ "$?" == 1 ]; then
echo "$file : should be manually updated"
# The test is manually updated
exit 1
fi
python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file
Instcombine will currently sink identical shuffles though vector binary
operations. This is probably generally useful, but can break up the code
pattern we use to represent an interleaving load group. This patch
reverses that in the InterleaveAccessPass to re-recognise the pattern of
shuffles sunk past binary operations and folds them back if an
interleave group can be created.
Differential Revision: https://reviews.llvm.org/D89489
Summary:
Updating IR now that alignment is explicitly set.
This is a prerequisite to D80276.
Reviewers: efriedma
Subscribers: llvm-commits, craig.topper
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80549
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.
The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.
Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.
This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.
Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.
Differential Revision: https://reviews.llvm.org/D78403
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.
Differential Revision: https://reviews.llvm.org/D71109
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.
The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.
We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.
Differential Revision: https://reviews.llvm.org/D69392
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
Even if the interleaving transform would otherwise be legal, we shouldn't
introduce an interleaved load that is wider than the original load: it might
have undefined behavior.
It might be possible to perform some sort of mask-narrowing transform in
some cases (using a narrower interleaved load, then extending the
results using shufflevectors). But I haven't tried to implement that,
at least for now.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 .
Differential Revision: https://reviews.llvm.org/D59954
llvm-svn: 357212
I continue to support different VF interleaved and in this pass for this patch,
I added the vf64 stride3 support for both load and store.
I also added support fot the stride4 store.
Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank
Differential Revision: https://reviews.llvm.org/D37687
Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) .
This patch is part two of two patches and it covers the store (interlevaed) side.
The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
Reviewers:
zvi
guyblank
dorit
Ayal
Differential Revision: https://reviews.llvm.org/D37117
Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2.
In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding
each 8 chars:
c0, c1, , c7
m0, m1, , m7
y0, y1, , y7
k0, k1, ., k7
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers
DavidKreitzer
Farhana
zvi
igorb
guyblank
RKSimon
Ayal
Differential Revision: https://reviews.llvm.org/D36058
Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).
The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
into
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal
llvm-svn: 312722
This patch expands the support of lowerInterleavedStore to 16x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:
c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Differential Revision: https://reviews.llvm.org/D35829
llvm-svn: 310252
This patch expands the support of lowerInterleavedStore to 32x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns". We include two mask creators. The first function creates shuffle's mask equivalent to unpacklo/unpackhi instructions. The other creator creates mask equivalent to a concat of two half vectors(high/low).
The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 32 chars:
c0, c1, , c31
m0, m1, , m31
y0, y1, , y31
k0, k1, ., k31
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer
Differential Revision: https://reviews.llvm.org/D34601
llvm-svn: 309086
r306334 fixed a bug in AArch64 dealing with wide interleaved accesses having
pointer types. The bug also exists in ARM, so this patch copies over the fix.
llvm-svn: 307409