Summary:
This patch adds an RPC interface that lives directly in the OpenMP
device runtime. This allows OpenMP to implement custom opcodes.
Currently this is only providing the host call interface, which is the
raw version of reverse offloading. Previously this lived in `libc/` as
an extension which is not the correct place.
The interface here uses a weak symbol for the RPC client by the same
name that the `libc` interface uses. This means that it will defer to
the libc one if both are present so we don't need to set up multiple
instances.
The presense of this symbol is what controls whether or not we set up
the RPC server. Because this is an external symbol it normally won't be
optimized out, so there's a special pass in OpenMPOpt that deletes this
symbol if it is unused during linking. That means at `O0` the RPC server
will always be present now, but will be removed trivially if it's not
used at O1 and higher.
Summary:
This is going to be deprecated in
https://github.com/llvm/llvm-project/pull/112849. This patch ports it to
use the builtin instead. This isn't a compile constant, so it could
slightly negatively affect codegen. There really should be an IR pass to
turn it into a constant if the function has known attributes.
Using the builtin is correct when we just do it for knowing the size
like we do here. Obviously guarding w32/w64 code with this check would
be broken.
Summary:
We can include `stdint.h` just fine as long as we don't allow it to find
system headers, passing `-nostdlibinc` and `-nogpuinc` suppresses these
extra paths so we will just use the clang resource headers for
`stdint.h` and `stddef.h`.
We had three `utils::` namespaces, all with different "meaning" (host,
device, hsa_utils). We should, when we can, keep "include/Shared"
accessible from host and device, thus RefCountTy has been moved to a
separate header. `hsa_utils` was introduced to make `utils::` less
overloaded. And common functionality was de-duplicated, e.g.,
`utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type
punning now checks for the size of the result to make sure it matches
the source type.
No functional change was intended.
In #92581 the `LibomptargetUitls.cmake` helpers have been removed, but
only uses of `libomptarget_say` were migrated. Migrate the remaining few
warning and error messages so the `check-offload` target would not fail
due to missing `libomptarget_warning_say`.
While at it, update the `check-offload` unavailability message to say
`check-offload` instead of `check-libomptarget`.
Fixes#92581
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.
> This pull request is the first part of an ongoing effort to extends
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
Summary:
The 'omp_alloc' function should be callable from a target region. This
patch implemets it by simply calling `malloc` for every non-default
trait value allocator. All the special access modifiers are
unimplemented and return null. The null allocator returns null as the
spec states it should not be usable from the target.
Summary:
Currently there are several layers to handle `printf`. Since we now have
varargs and an implementation of `printf` this can be heavily
simplified.
1. The frontend renames `printf` into `omp_vprintf` and gives it an
argument buffer.
Removing 1. triggered some code in the AMDGPU backend menat for HIP /
OpenCL, so I hadded an exception to it.
2. Forward this to CUDA vprintf or ignore it.
We no longer need special handling for it since we have varargs. So now
we just forward this to CUDA vprintf if we have libc, otherwise just
leave `printf` as an external function and expect that `libc` will be
linked in.
Summary:
The C library is intended to provide `__assert_fail`, so in the cases
that we have both we should defer to that. This means that if you build
the C library for GPUs you'll get the RPC based asser, and if not you'll
get the trap based one.
Summary:
These functions provide special-case implementations internal to the
OpenMP device runtime. This can potentially conflict with the symbols
pulled in from the actual GPU `libc`. This patch makes these weak, so in
the case that the GPU libc functions exist they will be overridden. This
should not impact performance in the average case because the old
`-mlink-builtin-bitcode` version does internalization, deleting weak,
and the new LTO path will resolve to the strong reference and then
internalize it.
This pull request is the first part of an ongoing effort to extends PGO
instrumentation to GPU device code. This PR makes the following changes:
- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)
These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
This reverts commit 098c6dfa8157681699a71fce9e3d94515e66311f.
This reverts commit 8c718a3a91df4ab68dc3f1ca3887ea730c9aed84.
This reverts commit 4fb02de9d490d0773441aa30124bb4d1272230d3.
Summary:
This isn't `libomptarget` anymore, and these messages were always
unnecessary because no other project uses these prefixed messages. The
effect of this is that no longer will the logs have `LIBOMPTARGET --` in
front of everything. We have a message stating when we start building
the offload project so it'll still be trivial to find.
Summary:
No other project has these in the CMake itself, and they're wildly
inconsistent even within the project. These don't really add anything so
I think they should be removed.
In a nutshell, this moves our libomptarget code to populate the offload
subproject.
With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.
Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.
```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests
```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.
```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`
Fixes: https://github.com/llvm/llvm-project/issues/75124
---------
Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>