Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.
The first 64-bytes are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibilitry.
Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors an destructors, removing a class of failures related to
shared library destruction order, and allows targets other than OpenMP
to use the same support without needing to change the frontend.
This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything atuomatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:
```
struct S { ~S() { foo(); } };
void foo() {
static S s;
}
```
However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.
This changes the handling of ctors / dtors. This patch now outputs a
information message regarding the deprecation if the old format is used.
This will be completely removed in a later release.
Depends on: https://github.com/llvm/llvm-project/pull/71549
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the rutime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More uses cases will
follow, e.g., per launch memory pools.
Fixes: https://github.com/llvm/llvm-project/issues/70249
Summary:
A previous patch changed the logic to force external visibliity on
declare target variables. This is because they need to be exported in
the dynamic symbol table to be usable as the standard depicts. However,
the logic was always setting the visibility to `protected`, which would
override some symbols. For example, when calling `libc` functions for
CPU offloading. This patch changes the logic to only fire if the
variable has hidden visibliity to start with.
Summary:
There's some logic in the AMDGPU target that manually resets the
requested visibility of certain variables. This was triggering when we
set a constant variable in OpenMP. However, we shouldn't do this for
OpenMP when the variable has the `nohost` type. That implies that the
variable is not visible to the host and therefore does not need to be
visible, so we should respect the original value of it.
Currently, the precense of the OpenMP target declare metadata requires
that we always codegen a global declaration. This is undesirable in the
case that we could defer or omit this declaration as is common with
unused extern variables. This is important as it allows us, in the
runtime, to rely on static linking semantics to omit unused symbols so
they are not included when the user links it in.
This patch changes the check for always emitting these variables.
Because of this we also need to extend this logic to the generation of
the offloading entries. This has the result of derring the offload entry
generation to the canonical definitoin. So we are effectively assuming
whoever owns the storage for this variable will perform that operation.
This makes an exception for `link` attributes as those require their own
special handling.
Let me know if this is sound in the implementation, I do not have the
largest view of the standards here.
Fixes: https://github.com/llvm/llvm-project/issues/64133
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D156368
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
This patch changes the handling of OpenMP to add the device attributes
to the canonical definitions when we encounter a non-canonical
definition. Previously, the following code would not work because it
would find the non-canonical definition first which would then not be
used anywhere else.
```
int x;
extern int x;
```
This patch now adds the attribute to both of them. This allows us to
perform the following operation if, for example, there were an
implementation of `stderr` on the device.
```
#include <stdio.h>
// List of libc symbols supported on the device.
extern FILE *stderr;
```
Unfortunately I cannot think of an equivalent solution to HIP / CUDA
device declarations as those are done with simple attributes. Attributes
themselves cannot be used to affect a definition once its canonical
definition has already been seen. Some help on that front would be
appreciated.
Fixes https://github.com/llvm/llvm-project/issues/63355
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D153369
Re-apply of: 3d0e9edd8e53fb72e85084f4170513159212839a
Reverted in: 0cb65b0a585c8b3d4a8a2aefe994a8fc907934f8
A function parameter was using the wrong type 'llvm::TargetRegion' instead of
'const llvm:: TargetRegion&', which caused the error in the address sanitizer.
The correct type is now used.
This patch puts the individual target region information attributes into a
struct so that the nested mappings are not needed and passing the information
around is simplified.
Reviewed By: jdoerfert, mikerice
Differential Revision: https://reviews.llvm.org/D136601
This patch puts the individual target region information attributes into a
struct so that the nested mappings are not needed and passing the information
around is simplified.
Reviewed By: jdoerfert, mikerice
Differential Revision: https://reviews.llvm.org/D136601
This patch changes the kernels generated by OpenMP to have protected
visibility. This is unlikely to change anything functionally. However,
protected visibility better matches the behaviour of these GPU kernels.
We do not expect any pending shared library load to preempt these
kernels so we can specify a more restrictive visibility.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136198
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.
Differential Revision: https://reviews.llvm.org/D123115
The default construction of constructor functions by LLVM tends to make
them have internal linkage. When we call a ctor / dtor function in the
target region we are actually creating a kernel that is called at
registration. Because the ctor is a kernel we need to make sure it's
externally visible so we can actually call it. This prevented AMDGPU
from correctly using constructors while NVPTX could use them simply
because it ignored internal visibility.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D122504
Currently the device kernels all have weak linkage to prevent linkage
errors on multiple defintions. However, this prevents some optimizations
from adequately analyzing them because of the nature of weak linkage.
This patch replaces the weak linkage with weak_odr linkage so we can
statically assert that multiple declarations of the same kernel will
have the same definition.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122443
This patch changes the special-case handling of visibility when
compiling for an OpenMP target offloading device. This was orignally
added as a precaution against the bug encountered in PR41826 when
symbols in the device were being preempted by shared library symbols.
This should instead be done by making the visibility protected by default.
With protected visibility we are asserting that the symbols on the device
will never be preempted or preempt another symbol pending a shared library
load.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D117806
This patch changes the visiblity of variables declared within a declare
target directive. Variable declarations within a declare target
directive need to be externally visible to the plugin for initialization
or reading. Previously this would cause runtime errors where the named
global could not be found because it was not included in the symbol
table.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D117362
This patch changes the visiblity of variables declared within a declare
target directive. Variable declarations within a declare target
directive need to be externally visible to the plugin for initialization
or reading. Previously this would cause runtime errors where the named
global could not be found because it was not included in the symbol
table.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D117362
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This reverts commit aacfbb953eb705af2ecfeb95a6262818fa85dd92.
Revert "Fix lit test failures in CodeGenCoroutines"
This reverts commit 63fff0f5bffe20fa2c84a45a41161afa0043cb34.
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Resolve lit failures in clang after 8ca4b3e's land
Fix lit test failures in clang-ppc* and clang-x64-windows-msvc
Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc
Fix internal_clone(aarch64) inline assembly
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This reverts the following commits:
37ca7a795b277c20c02a218bf44052278c03344b
9aa6c72b92b6c89cc6d23b693257df9af7de2d15
705387c5074bcca36d626882462ebbc2bcc3bed4
8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515
80dba72a669b5416e97a42fd2c2a7bc5a6d3f44a
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
When an object is allocated in a non-default address space we do not
need to check for a constructor if it is not initialized and has a
trivial constructor (which we won't call then).
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100929
The original issue is caused by the fact that the variable is allocated
with incorrect type i1 instead of i8. This causes the bitcasting of the
declaration to i8 type and the bitcast expression does not match the
original variable.
To fix the problem, the UndefValue initializer and the original
variable should be emitted with type i8, not i1.
Differential Revision: https://reviews.llvm.org/D99297
arguments.
* Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments
* Gates 'nonnull' on -f(no-)delete-null-pointer-checks
* Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to
explicitly test the behavior of this change
* Refactors hundreds of over-constrained clang tests to permit these
attributes, where needed
* Updates Clang12 patch notes mentioning this change
Reviewed-by: rsmith, jdoerfert
Differential Revision: https://reviews.llvm.org/D17993
This third patch in the series removes version 5.0 string from
test cases making them check for default version. It also add test
cases for version 4.5.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D85214
Summary:
When -fopenmp option is specified then version 5.0 will be set as
default.
Reviewers: gregrodgers, jdoerfert, ABataev
Reviewed By: ABataev
Subscribers: pdhaliwal, yaxunl, guansong, sstefan1, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81098
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.
See also D80072.
Differential Revision: https://reviews.llvm.org/D80166
If current kind of the translation unit is TU_Prefix and it is not
complete, cannot decide what to do with virtual members/table at that
time, need to delay it to later stages.
Summary:
Currently, we ignore all locality attributes/info when building for
the device and thus all symblos are externally visible and can be
preemted at the runtime. It may lead to incorrect results. We need to
follow the same logic, compiler uses for static/pie builds. But in some
cases changing of dso locality may lead to problems with codegen, so
instead mark external symbols as hidden instead in the device code.
Reviewers: jdoerfert
Subscribers: guansong, caomhin, kkwli0, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D70549
is not provided.
We should not emit any target-dependent code if only -fopenmp flag is
used and device targets are not provided to prevent compiler crash.
llvm-svn: 372623
construct.
OpenMP 5.0 introduced new clause for declare target directive, device_type clause, which may accept values host, nohost, and any. Host means
that the function must be emitted only for the host, nohost - only for
the device, and any - for both, device and the host.
llvm-svn: 369775
Summary:
This patch fixes the case where variables in different compilation units or the same compilation unit are under the declare target link clause AND have the same name.
This also fixes the name clash error that occurs when unified memory is activated.
The changes in this patch include:
- Pointers to internal variables are given unique names.
- Externally visible variables are given the same name as before.
- All pointer variables (external or internal) are weakly linked.
Reviewers: ABataev, jdoerfert, caomhin
Reviewed By: ABataev
Subscribers: lebedev.ri, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64592
llvm-svn: 367613
Summary:
This patch adds support for the handling of the variables under the declare target to clause.
The variables in this case are handled like link variables are. A pointer is created on the host and then mapped to the device. The runtime will then copy the address of the host variable in the device pointer.
Reviewers: ABataev, AlexEichenberger, caomhin
Reviewed By: ABataev
Subscribers: guansong, jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D63108
llvm-svn: 363959
Currently, we ignore all dso locality attributes/info when building for
the device and thus all symblos are externally visible and can be
preemted at the runtime. It may lead to incorrect results. We need to
follow the same logic, compiler uses for static/pie builds.
llvm-svn: 361283
If the variable was declared and marked as declare target, a new offload
entry with size 0 is created. But if later a definition is created and
marked as declare target, this definition is not added to the entry set
and the definition remains not mapped to the target. Patch fixes this
problem allowing to redefine the size and linkage for
previously registered declaration.
llvm-svn: 355960
Fixed emission of the target regions found in the virtual functions.
Previously we may end up with the situation when those regions could be
skipped.
llvm-svn: 347793
Fixed lookup for the target regions in unused virtual functions + fixed
processing of the global variables not marked as declare target but
emitted during debug info emission.
llvm-svn: 346343
This reverts commit https://reviews.llvm.org/rL344150 which causes
MachineOutliner related failures on the ppc64le multistage buildbot.
llvm-svn: 344526
This is currently a clang extension and a resolution
of the defect report in the C++ Standard.
Differential Revision: https://reviews.llvm.org/D46441
llvm-svn: 344150
context.
If the explicit template instantiation definition defined outside of the
target context, its vtable should not be marked as used. This is true
for other situations where the compiler want to emit vtables
unconditionally.
llvm-svn: 341570