llvm-project

Author	SHA1	Message	Date
Joseph Huber	d0ff5e4030	[libc] Update RPC interface for system utilities on the GPU This patch reworks the RPC interface to allow more generic memory operations using the shared buffer. This patch decomposes the entire RPC interface into opening a port and calling `send` or `recv` on it. The `send` function sends a single packet of the length of the buffer. The `recv` function is paired with the `send` call to then use the data. So, any arbitrary combination of sending packets is possible. The only restriction is that the client initiates the exchange with a `send` while the server consumes it with a `recv`. The operation of this is driven by two independent state machines that track the buffer ownership during loads / stores. We keep track of two so that we can transition between a send state and a recv state without an extra wait. State transitions are observed via bit toggling. This interface supports an efficient `send -> ack -> send -> ack -> send` interface and allows for the last send to be ignored without checking the ack. A following patch will add some more comprehensive testing to this interface. I informally made an RPC call that simply incremented an integer and it took roughly 10 microseconds to complete an RPC call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D148288	2023-04-19 20:02:31 -05:00
Joseph Huber	bc11bb3e26	[libc] Add the '--threads' and '--blocks' option to the GPU loaders We will want to test the GPU `libc` with multiple threads in the future. This patch adds the `--threads` and `--blocks` options to set the `x` dimension of the kernel. This uses CUDA terminology instead of OpenCL for familiarity. Depends on D148288 D148342 Reviewed By: jdoerfert, sivachandra, tra Differential Revision: https://reviews.llvm.org/D148485	2023-04-19 08:01:58 -05:00
Joseph Huber	dfc162ad3f	[libc] Free the GPU memory allocated in the device loaders Summary: This part was ignored and we just hoped that shutting down the runtime freed these correctly. But it's best to be specific and free the memory we've allocated.	2023-04-03 11:55:32 -05:00
Joseph Huber	2bef46d2ad	[libc] Add a loader utility for NVPTX architectures for testing This patch adds a loader utility targeting the CUDA driver API to launch NVPTX images called `nvptx_loader`. This takes a GPU image on the command line and launches the `_start` kernel with the appropriate arguments. The `_start` kernel is provided by the already implemented `nvptx/start.cpp`. So, an application with a `main` function can be compiled and run as follows. ``` clang++ --target=nvptx64-nvidia-cuda main.cpp crt1.o -march=sm_70 -o image ./nvptx_loader image args to kernel ``` This implementation is not tested and does not yet support RPC. This requires further development to work around NVIDIA specific limitations in atomics and linking. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D146681	2023-03-24 20:04:42 -05:00
Joseph Huber	6bd4d717d5	[libc] Add environment variables to GPU libc test for AMDGPU This patch applies the same copy operation used for the `argv` array to the `envp` array. This allows the GPU tests to use environment variables. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D146322	2023-03-20 13:16:58 -05:00
Joseph Huber	ae30ae23aa	[libc][NFC] Add some missing comments to the RPC implementation Summary: These comments were accidentally dropped from the committed version. Add them back in.	2023-03-20 09:30:12 -05:00
Joseph Huber	8e4f9b1fcb	[libc] Add initial support for an RPC mechanism for the GPU This patch adds initial support for an RPC client / server architecture. The GPU is unable to perform several system utilities on its own, so in order to implement features like printing or memory allocation we need to be able to communicate with the executing process. This is done via a buffer of "sharable" memory. That is, a buffer with a unified pointer that both the client and server can use to communicate. The implementation here is based off of Jon Chesterfield's minimal RPC example in his work. We use an `inbox` and `outbox` to signal when there is an RPC request and when the work is done. We use a fixed-size buffer for the communication channel. This is fixed size so that we can ensure that there is enough space for all compute-units on the GPU to issue work to any of the ports. Right now the implementation is single threaded so there is only a single buffer that is not shared. This implementation is still missing several features, such as multi-threaded support and asynchronous calls. Depends on D145912 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D145913	2023-03-17 12:55:31 -05:00
Joseph Huber	67d78e3c6f	[libc] Add a loader utility for AMDHSA architectures for testing This is the first attempt to get some testing support for GPUs in LLVM's libc. We want to be able to compile for and call generic code while on the device. This is difficult as most GPU applications also require the support of large runtimes that may contain their own bugs (e.g. CUDA / HIP / OpenMP / OpenCL / SYCL). The proposed solution is to provide a "loader" utility that allows us to execute a "main" function on the GPU. This patch implements a simple loader utility targeting the AMDHSA runtime called `amdhsa_loader` that takes a GPU program as its first argument. It will then attempt to load a predetermined `_start` kernel inside that image and launch execution. The `_start` symbol is provided by a `start` utility function that will be linked alongside the application. Thus, this should allow us to run arbitrary code on the user's GPU with the following steps for testing. ``` clang++ Start.cpp --target=amdgcn-amd-amdhsa -mcpu=<arch> -ffreestanding -nogpulib -nostdinc -nostdlib -c clang++ Main.cpp --target=amdgcn-amd-amdhsa -mcpu=<arch> -nogpulib -nostdinc -nostdlib -c clang++ Start.o Main.o --target=amdgcn-amd-amdhsa -o image amdhsa_loader image <args, ...> ``` We determine the `-mcpu` value using the `amdgpu-arch` utility provided either by `clang` or `rocm`. If `amdgpu-arch` isn't found or returns an error we shouldn't run the tests as the machine does not have a valid HSA compatible GPU. Alternatively we could make this utility in-source to avoid the external dependency. This patch provides a single test for this utility that simply checks to see if we can compile an application containing a simple `main` function and execute it. The proposed solution in the future is to create an alternate implementation of the LibcTest.cpp source that can be compiled and launched using this utility. This approach should allow us to use the same test sources as the other applications. 
This is primarily a prototype, suggestions for how to better integrate this with the existing LibC infrastructure would be greatly appreciated. The loader code should also be cleaned up somewhat. An implementation for NVPTX will need to be written as well. Reviewed By: sivachandra, JonChesterfield Differential Revision: https://reviews.llvm.org/D145913	2023-02-13 13:49:01 -06:00

8 Commits
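
The `inbox` / `outbox` handshake described in commit 8e4f9b1fcb can be illustrated with a small sketch. This is not the libc RPC code: the names (`Mailbox`, `client_send`, `server_poll`, `client_recv`, `rpc_increment`) are hypothetical, and the exact flag sequence is an assumption, since the commit message only says the two flags signal a pending request and finished work. The walk is sequential on one thread, and the "server" increments an integer, mirroring the informal test mentioned in d0ff5e4030.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of a single RPC channel; not the libc API.
struct Mailbox {
  std::atomic<uint32_t> client_to_server{0}; // client's outbox, server's inbox
  std::atomic<uint32_t> server_to_client{0}; // server's outbox, client's inbox
  uint64_t buffer = 0;                       // stand-in for the shared buffer
};

// Client: post a request. Only legal while the channel is idle (both flags 0).
bool client_send(Mailbox &m, uint64_t value) {
  if (m.client_to_server.load(std::memory_order_acquire) != 0 ||
      m.server_to_client.load(std::memory_order_acquire) != 0)
    return false;
  m.buffer = value;
  m.client_to_server.store(1, std::memory_order_release);
  return true;
}

// Server: either handle a pending request, or reset after the client's ack.
bool server_poll(Mailbox &m) {
  uint32_t in = m.client_to_server.load(std::memory_order_acquire);
  uint32_t out = m.server_to_client.load(std::memory_order_relaxed);
  if (in == 1 && out == 0) { // request pending: do the work, signal done
    m.buffer += 1;
    m.server_to_client.store(1, std::memory_order_release);
    return true;
  }
  if (in == 0 && out == 1) { // client consumed the result: return to idle
    m.server_to_client.store(0, std::memory_order_release);
    return true;
  }
  return false;
}

// Client: consume the result once the server has signalled completion.
bool client_recv(Mailbox &m, uint64_t &result) {
  if (m.client_to_server.load(std::memory_order_relaxed) != 1 ||
      m.server_to_client.load(std::memory_order_acquire) != 1)
    return false;
  result = m.buffer;
  m.client_to_server.store(0, std::memory_order_release);
  return true;
}

// Walk one full round trip in order; every step should be legal.
uint64_t rpc_increment(Mailbox &m, uint64_t value) {
  uint64_t result = 0;
  bool ok = client_send(m, value) && server_poll(m) &&
            client_recv(m, result) && server_poll(m);
  assert(ok && "each step should be legal in this sequential walk");
  (void)ok;
  return result;
}
```

Calling `rpc_increment` on a fresh `Mailbox` leaves both flags at zero afterwards, so the same channel can be reused for the next request. The later interface in d0ff5e4030 replaces this fixed round trip with paired `send` / `recv` calls and bit toggling, which this sketch does not model.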