This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
This PR replaces some instances of `undef` with `function argument
value` or `poison` or `concrete values` in several tests under
`llvm/test/Transforms/` directory. These changes align with modern LLVM
standards for better-defined behavior and test determinism. If this
small PR is okay and gets merged, I will work on the rest.
This is inspired by [this
project](https://discourse.llvm.org/t/gsoc-2024-remove-undefined-behavior-from-tests/77236/29),
work done on this by @leewei05
A dominance query of a block that is in a different function is
ill-defined, so assert that getNode() is only called for blocks that are
in the same function.
There are three cases, where this behavior did occur. LoopFuse didn't
explicitly do this, but didn't invalidate the SCEV block dispositions,
leaving dangling pointers to free'ed basic blocks behind, causing
use-after-free. We do, however, want to be able to dereference basic
blocks inside the dominator tree, so that we can refer to them by a
number stored inside the basic block.
Reverts #102780
Reland #101198Fixes#102784
Co-authored-by: Alexis Engelke <engelke@in.tum.de>
A dominance query of a block that is in a different function is
ill-defined, so assert that getNode() is only called for blocks that are
in the same function.
There are two cases, where this behavior did occur. LoopFuse didn't
explicitly do this, but didn't invalidate the SCEV block dispositions,
leaving dangling pointers to free'ed basic blocks behind, causing
use-after-free. We do, however, want to be able to dereference basic
blocks inside the dominator tree, so that we can refer to them by a
number stored inside the basic block.
AlignmentFromAssumptions uses SCEV to update the load/store alignment.
It tracks down the use-def chains for the pointer which it takes from
the assumption cache until it reaches the load or store instruction. It
mistakenly adds to the worklist the users of the load result
irrespective of the fact that the load result has no connection with the
original pointer, moreover, it is not a pointer at all in most cases.
Thus the def-use chain contains irrelevant load users. When it is a
store instruction the algorithm attempts to adjust its alignment to the
alignment of the original pointer. The problem appears when the load and
store memory operand pointers belong to different address spaces and
possibly have different sizes.
The 4bf015c035e4e5b63c7222dfb15ff274a5ed905c was an attempt to address a
similar problem. The truncation or zero extension was added to make
pointers the same size. That looks strange to me because the zero
extension of the pointer is not legal. The test in the
4bf015c035e4e5b63c7222dfb15ff274a5ed905c does not work any longer as for
the explicit address spaces conversion the addrspacecast is generated.
Summarize:
1. For the alloca to global address spaces conversion addrspacecasts are
used, so the code added by the 4bf015c035e4e5b63c7222dfb15ff274a5ed905c
is no longer functional.
2. The AlignmentFromAssumptions algorithm should not add the load users
to the worklist as they have nothing to do with the original pointer.
3. Instead we only track users that are: GetelementPtrIns, PHINodes.
Align operand bundles can contain non-power-of-two alignments,
but LLVM otherwise does not support them. Bail out in that case
instead of crashing.
Fixes https://github.com/llvm/llvm-project/issues/64687.
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.
This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.
Differential Revision: https://reviews.llvm.org/D149776
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.
The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.
The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them instead of <4 x i32> as resource arguments. ptr
addrspace(8). These pointers are 128-bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, [addr_max]] value from applying to them.
Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.
This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.
Depends on D143437
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D145441
This updates transform test cases for
ADCE
AddDiscriminators
AggressiveInstCombine
AlignmentFromAssumptions
ArgumentPromotion
BDCE
CalledValuePropagation
DCE
Reg2Mem
WholeProgramDevirt
to use the -passes syntax when specifying the pipeline.
Given that LLVM_ENABLE_NEW_PASS_MANAGER isn't set to off (which is
a deprecated feature) the updated test cases already used the new
pass manager, but they were using the legacy syntax when specifying
the passes to run. This patch can be seen as a step toward deprecating
that interface.
This patch also removes some redundant RUN lines. Here I am
referring to test cases that had multiple RUN lines verifying both
the legacy "-passname" syntax and the new "-passes=passname" syntax.
Since we switched the default pass manager to "new PM" both RUN lines
have verified the new PM version of the pass (more or less wasting
time running the same test twice), unless LLVM_ENABLE_NEW_PASS_MANAGER
is set to "off". It is assumed that it is enough to run these tests
with the new pass manager now.
Differential Revision: https://reviews.llvm.org/D108472
This is a bailout for pr51680. This pass appears to assume that the alignment operand to an align tag on an assume bundle is constant. This doesn't appear to be required anywhere, and clang happily generates non-constant alignments for cases such as this case taken from the bug report:
// clang -cc1 -triple powerpc64-- -S -O1 opal_pci-min.c
extern int a[];
long *b;
long c;
void *d(long, int *, int, long, long, long) __attribute__((__alloc_align__(6)));
void e() {
b = d(c, a, 0, 0, 5, c);
b[0] = 0;
}
This was exposed by a SCEV change which allowed a non-constant alignment to reach further into the pass' code. We could generalize the pass, but for now, let's fix the crash.
This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492).
Consider this code:
```
call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 -1
; aligment of %arrayidx?
```
The llvm.assume guarantees that `%a - 28` is 32-bytes aligned, meaning that `%a` is 32k + 28 for some k.
Therefore `a - 4` cannot be 32-bytes aligned but the existing code was calculating the pointer as 32-bytes aligned.
The reason why this happened is as follows.
`DiffSCEV` stores `%arrayidx - %a` which is -4.
`OffSCEV` stores the offset value of “align”, which is 28.
`DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = 32 was being used instead.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D98759
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
due to the performance bugs filed in https://bugs.llvm.org/show_bug.cgi?id=46753.
An SROA change soon may obviate some of these problems.
This reverts commit 8d09f20798ac180b1749276bff364682ce0196ab.
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb2474a4d8c4f598d94d35a9f31d9606cb.
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
Summary:
On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue.
In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64.
This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type.
Reviewers: hfinkel, jdoerfert
Reviewed By: jdoerfert
Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75471
The alignment is calculated incorrectly, thus sometimes it doesn't generate aligned mov instructions, as shown by the example below:
```
// b.cc
typedef long long index;
extern "C" index g_tid;
extern "C" index g_num;
void add3(float* __restrict__ a, float* __restrict__ b, float* __restrict__ c) {
index n = 64*1024;
index m = 16*1024;
index k = 4*1024;
index tid = g_tid;
index num = g_num;
__builtin_assume_aligned(a, 32);
__builtin_assume_aligned(b, 32);
__builtin_assume_aligned(c, 32);
for (index i0=tid*k; i0<m; i0+=num*k)
for (index i1=0; i1<n*m; i1+=m)
for (index i2=0; i2<k; i2++)
c[i1+i0+i2] = b[i0+i2] + a[i1+i0+i2];
}
```
Compile with `clang b.cc -Ofast -march=skylake -mavx2 -S`
```
vmovaps -224(%rdi,%rbx,4), %ymm0
vmovups -192(%rdi,%rbx,4), %ymm1 # should be movaps
vmovups -160(%rdi,%rbx,4), %ymm2 # should be movaps
vmovups -128(%rdi,%rbx,4), %ymm3 # should be movaps
vaddps -224(%rsi,%rbx,4), %ymm0, %ymm0
vaddps -192(%rsi,%rbx,4), %ymm1, %ymm1
vaddps -160(%rsi,%rbx,4), %ymm2, %ymm2
vaddps -128(%rsi,%rbx,4), %ymm3, %ymm3
vmovaps %ymm0, -224(%rdx,%rbx,4)
vmovups %ymm1, -192(%rdx,%rbx,4) # should be movaps
vmovups %ymm2, -160(%rdx,%rbx,4) # should be movaps
vmovups %ymm3, -128(%rdx,%rbx,4) # should be movaps
```
Differential Revision: https://reviews.llvm.org/D66575
Patch by Dun Liang
llvm-svn: 369723
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: hfinkel, bollu, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D43081
llvm-svn: 325816
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out of tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
Note, changes in future can now be made to codegen. I didn't change anything here, but this
change should enable better memcpy code sequences.
Reviewed by Hal Finkel.
llvm-svn: 253511
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.
This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.
* This doesn't modify gep operators, only instructions (operators will be
handled separately)
* Textual IR changes only. Bitcode (including upgrade) and changing the
in-memory representation will be in separate changes.
* geps of vectors are transformed as:
getelementptr <4 x float*> %x, ...
->getelementptr float, <4 x float*> %x, ...
Then, once the opaque pointer type is introduced, this will ultimately look
like:
getelementptr float, <4 x ptr> %x
with the unambiguous interpretation that it is a vector of pointers to float.
* address spaces remain on the pointer, not the type:
getelementptr float addrspace(1)* %x
->getelementptr float, float addrspace(1)* %x
Then, eventually:
getelementptr float, ptr addrspace(1) %x
Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.
update.py:
import fileinput
import sys
import re
ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
def conv(match, line):
if not match:
return line
line = match.groups()[0]
if len(match.groups()[5]) == 0:
line += match.groups()[2]
line += match.groups()[3]
line += ", "
line += match.groups()[1]
line += "\n"
return line
for line in sys.stdin:
if line.find("getelementptr ") == line.find("getelementptr inbounds"):
if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
line = conv(re.match(ibrep, line), line)
elif line.find("getelementptr ") != line.find("getelementptr ("):
line = conv(re.match(normrep, line), line)
sys.stdout.write(line)
apply.sh:
for name in "$@"
do
python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
rm -f "$name.tmp"
done
The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh
After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).
The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7636
llvm-svn: 230786
We used to crash processing any relevant @llvm.assume on a 32-bit target
(because we'd ask SE to subtract expressions of differing types). I've copied
our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get
some meaningful test coverage here.
llvm-svn: 217574
The routine that determines an alignment given some SCEV returns zero if the
answer is unknown. In a case where we could determine the increment of an
AddRec but not the starting alignment, we would compute the integer modulus by
zero (which is illegal and traps). Prevent this by returning early if either
the start or increment alignment is unknown (zero).
llvm-svn: 217544
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly for this
kind of 'misalignment' offset for which this kind of logic is necessary.
The primary motivation is to fixup alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could). Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).
llvm-svn: 217344