This patch performs the same operation to copy over the `argv` array to
the `envp` array. This allows the GPU tests to use environment
variables.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146322
This patch enables integration tests running on the GPU. This uses the
RPC interface implemented in D145913 to compile the necessary
dependencies for the integration test object. We can then use this to
compile the objects for the GPU directly and execute them using the AMD
HSA loader combined with its RPC server. For example, the compiler is
performing the following actions to execute the integration tests.
```
$ clang++ --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nostdlib -flto -ffreestanding \
crt1.o io.o quick_exit.o test.o rpc_client.o args_test.o -o image
$ ./amdhsa_loader image 1 2 5
args_test.cpp:24: Expected 'my_streq(argv[3], "3")' to be true, but is false
```
This currently only works with a single threaded client implementation
running on AMDGPU. Further work will implement multiple clients for AMD
and the ability to run on NVPTX as well.
Depends on D145913
Reviewed By: sivachandra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D146256
This patch adds initial support for an RPC client / server architecture.
The GPU is unable to perform several system utilities on its own, so in
order to implement features like printing or memory allocation we need
to be able to communicate with the executing process. This is done via a
buffer of "sharable" memory. That is, a buffer with a unified pointer
that both the client and server can use to communicate.
The implementation here is based off of Jon Chesterfields minimal RPC
example in his work. We use an `inbox` and `outbox` to communicate
between if there is an RPC request and to signify when work is done.
We use a fixed-size buffer for the communication channel. This is fixed
size so that we can ensure that there is enough space for all
compute-units on the GPU to issue work to any of the ports. Right now
the implementation is single threaded so there is only a single buffer
that is not shared.
This implementation still has several features missing to be complete.
Such as multi-threaded support and asynchrnonous calls.
Depends on D145912
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D145913
Summary:
Currently AMDGPU only barely supports cross-TU ELF linking. Full linking
is usually done via LTO. This requires passing the architecture to the
link job. This is done automatically via `-flto` since D144505. Add this
to the link options.
This patch unifies the handling of generating the GPU build targets
between the `add_entrypoint_library` and the `add_object_library`
functions. The `_build_gpu_objects` function will create two targets.
One contains a single object file with several GPU binaries embedded in
it, a so-called fatbinary. The other is a direct compile of the
supported target to be used internally only. This patch pulls out some
of the properties logic so that we can handle both more easily. This
patch also required adding an ovverride `NO_GPU_BUILD` for cases when
we only want to build the source file as normal.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D144214
This patch introduces startup code for executing `main` on a device
compiled for the GPU. We will primarily use this to run standalone
integration tests on the GPU. The actual execution of this routine will
need to be provided by a `loader` utility to bootstrap execution on the
GPU.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D143212