Summary:
These entires are generic for offloading with the new driver now. Having
the `omp` prefix was a historical artifact and is confusing when used
for CUDA. This patch just renames them for now, future patches will
rework the binary format to make it more common.
Summary:
This patch adds support for registering texture / surface variables from
CUDA / HIP. Additionally, we now properly track the `extern` and `const`
flags that are also used in these runtime functions.
This does not implement the `managed` variables yet as those seem to
require some extra handling I'm not familiar with. The issue is that the
current offload entry isn't large enough to carry size and alignment
information along with an extra global.
Summary:
This patch provides the initial support to allow handling the new
driver's offloading entries. Normally, the ELF target can emit varibles
at C-identifier named sections and the linker will provide a pointer to
the section. For COFF target, instead the linker merges sections
containing a `$` in alphabetical order. We thus can emit these variables
at sections and then emit two variables that are guaranteed to be sorted
before and after the others to traverse it. Previous patches
consolidated the handling of offloading entries so that this patch more
easily can handle mapping them to the appropriate section.
Ideally, the only remaining step to allow the new driver to run on
Windows targets is to accurately map the following `ld.lld` arguments to
their `llvm-link` equivalents. These are used inside the linker-wrapper,
so we should simply need to remap the arguments to the same
functionality if possible.
```
-o, -output
-l, --library
-L, --library-path
-v, --version
-rpath
-whole-archive, -no-whole-archive
```
I have not tested this at runtime as I do not have access to a windows
machine.
This patch was adapted from some initial efforts in
https://reviews.llvm.org/D137470.
The new driver primarily allows us to support RDC-mode compilations with
proper linking. This is not needed for non-RDC mode compilation, but we
still would like the new driver to be able to handle this mode so we can
transition away from the old driver in the future. This patch adds the
necessary code to support creating a fatbinary for CUDA code generation
as well as removing old assumptions and errors about RDC-mode with the
new driver.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D129655
This patch adds the small change required to output offloading entried
for HIP instead of CUDA. These should be placed in different sections so
because they need to be distinct to the offloading toolchain, otherwise
we'd have HIP trying to register CUDA kernels or vice-versa. This patch will
precede support for HIP in the linker wrapper.
Reviewed By: yaxunl, tra
Differential Revision: https://reviews.llvm.org/D128850
The changes made in D123460 generalized the code generation for OpenMP's
offloading entries. We can use the same scheme to register globals for
CUDA code. This patch adds the code generation to create these
offloading entries when compiling using the new offloading driver mode.
The offloading entries are simple structs that contain the information
necessary to register the global. The struct used is as follows:
```
Type struct __tgt_offload_entry {
void *addr; // Pointer to the offload entry info.
// (function or global)
char *name; // Name of the function or global.
size_t size; // Size of the entry info (0 if it a function).
int32_t flags;
int32_t reserved;
};
```
Currently CUDA handles RDC code generation by deferring the registration
of globals in the current TU to a callback function containing the
modules ID. Later all the module IDs will be used to register all of the
globals at once. Rather than mimic this, offloading entries allow us to
mimic the way OpenMP registers globals. That is, we create a simple
global struct for each device global to be registered. These are placed
at a special section `cuda_offloading_entires`. Because this section is
a valid C-identifier, the linker will profide a `__start` and `__stop`
pointer that we can use to iterate and register all globals at runtime.
the registration requires a flag variable to indicate which registration
function to use. I have assigned the flags somewhat arbitrarily, but
these use the following values.
Kernel: 0
Variable: 0
Managed: 1
Surface: 2
Texture: 3
Depends on D120272
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123471