Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable, and it only makes initialization more complicated.
Here we use the `asm` syntax to give the C++ variable a valid symbol name
so that we can just copy to it directly.
Another advantage is that users who want to piggy-back on the same RPC
interface need only declare theirs as extern with the same symbol name,
or make it weak to use it optionally when LIBC isn't available.
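A minimal sketch of the idea, assuming a GNU-compatible compiler (GCC or
Clang). The `Client` struct, the `internal` namespace, `init_client`, and
the symbol name `example_rpc_client` are all made up for illustration and
are not the names used in libc.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the RPC client; the real type lives in libc.
struct Client {
  void *buffer;
  std::uint32_t port_count;
};
static_assert(std::is_trivially_copyable<Client>::value,
              "the client must be copyable by value");

namespace internal {
// The asm label fixes the linker-visible symbol name to
// `example_rpc_client` (a made-up name) instead of the mangled form of
// `internal::client`.
Client client asm("example_rpc_client") = {nullptr, 0};
} // namespace internal

// Initialization is now a plain copy with no intermediate indirection.
void init_client(const Client &host_side) { internal::client = host_side; }

// Another translation unit could refer to the same object with, e.g.:
//   extern "C" Client example_rpc_client;
// or add __attribute__((weak)) to that declaration and check its address
// before use, so the code still links when the definition is absent.
```

Because the symbol has a predictable name, anything that can look up
symbols by name can write the client's contents directly.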
Summary:
The AMDGPU backend can handle wavefront sizes of 32 and 64, with the
native hardware preferring one or the other. The user can override the
hardware default with `-mwavefrontsize64` or `-mwavefrontsize32`, which
previously wasn't handled. We need the wavefront size to know how much
memory to allocate and how to index the RPC buffer. There isn't a good
way to query this with ROCm, so we use the LLVM offloading support to
read it from the image.
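To illustrate why the wavefront size matters, here is a rough sketch of
how it could feed into allocation and indexing. The per-lane size, the
"0 means not recorded" convention, and all function names are
assumptions for this example; the real packet layout is defined in
`rpc.h`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-lane payload size in bytes.
constexpr std::size_t kBytesPerLane = 64;

// The size recorded in the image wins over the hardware default, since
// the user may have compiled with -mwavefrontsize32 or -mwavefrontsize64.
// Here `from_image == 0` stands for "the image did not record a size".
std::uint32_t wavefront_size(std::uint32_t from_image,
                             std::uint32_t hardware_default) {
  return from_image != 0 ? from_image : hardware_default;
}

// One packet per port, one slot per lane in the wavefront.
std::size_t buffer_bytes(std::uint32_t port_count, std::uint32_t lanes) {
  return static_cast<std::size_t>(port_count) * lanes * kBytesPerLane;
}

// Byte offset of one lane's slot inside one port's packet.
std::size_t slot_offset(std::uint32_t port, std::uint32_t lane,
                        std::uint32_t lanes) {
  return (static_cast<std::size_t>(port) * lanes + lane) * kBytesPerLane;
}
```

With these assumptions, guessing 32 lanes for an image built for 64
would allocate half the required memory and place every port after the
first at the wrong offset.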
Summary:
This patch removes much of the `llvmlibc_rpc_server` interface. Almost
all of this code is deleted and replaced by including `rpc.h` directly.
We still maintain the file to let `libc` handle the opcodes, since those
depend on the `printf` implementation.
This will need to be cleaned up more, but I don't want to put too much
into a single patch.
Summary:
These functions were deprecated in ROCR 1.3, which was released quite
some time ago. The main functionality that was lost was modifying and
inspecting the code object independently of the executable, but we
already do all of that ourselves through our ELF API. The replacements
should fall within the ROCR versions required by the other functions we
use.
Summary:
Run the server on a separate thread when we launch the kernel. This is
required by CUDA when kernel launches block, which you can force with
`export CUDA_LAUNCH_BLOCKING=1`. I figured I might as well be consistent
and do it for the AMD implementation as well, even though I believe it's
not necessary.
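The pattern looks roughly like the sketch below, which uses plain
`std::thread` and no GPU API. `run_with_server`, `launch`, and
`handle_requests` are hypothetical names; `launch` stands in for a
kernel launch that does not return until the kernel finishes.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Runs `launch` on the calling thread while a second thread keeps calling
// `handle_requests` until the launch returns. Returns the number of times
// the server polled.
unsigned run_with_server(const std::function<void()> &launch,
                         const std::function<void()> &handle_requests) {
  std::atomic<bool> finished{false};
  unsigned polls = 0;

  std::thread server([&] {
    while (!finished.load(std::memory_order_acquire)) {
      handle_requests();
      ++polls;
    }
    // Poll once more to pick up work submitted right before completion.
    handle_requests();
    ++polls;
  });

  launch(); // May block for the lifetime of the kernel.
  finished.store(true, std::memory_order_release);
  server.join();
  return polls;
}
```

If the launch blocks and the kernel is waiting on the server, running
both on one thread would deadlock; the extra thread avoids that.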
Summary:
It's safer to use the maximum queue size, as this prevents the runtime
from oversubscribing with multiple producers. Additionally, we should set
the barrier bit to ensure that the queue entries block if multiple are
submitted (which shouldn't happen for this tool).
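As a sketch of what setting the barrier bit means, the snippet below
builds a 16-bit AQL packet header by hand. The constants are local copies
that I assume mirror the values in `hsa.h`; check them against the header
before relying on them. `make_dispatch_header` is a hypothetical helper.
The maximum queue size itself would come from querying the agent (for
HSA, `HSA_AGENT_INFO_QUEUE_MAX_SIZE`), which is not shown here.

```cpp
#include <cstdint>

// Local copies of constants assumed to mirror hsa.h.
constexpr unsigned kHeaderTypeShift = 0;    // HSA_PACKET_HEADER_TYPE
constexpr unsigned kHeaderBarrierShift = 8; // HSA_PACKET_HEADER_BARRIER
constexpr unsigned kAcquireScopeShift = 9;  // acquire fence scope field
constexpr unsigned kReleaseScopeShift = 11; // release fence scope field
constexpr std::uint16_t kKernelDispatch = 2; // kernel dispatch packet type
constexpr std::uint16_t kSystemScope = 2;    // system fence scope

// Builds a kernel dispatch header. With `barrier` set, the packet waits
// for all earlier packets in the queue to complete before it starts.
constexpr std::uint16_t make_dispatch_header(bool barrier) {
  return static_cast<std::uint16_t>(
      (kKernelDispatch << kHeaderTypeShift) |
      ((barrier ? 1u : 0u) << kHeaderBarrierShift) |
      (kSystemScope << kAcquireScopeShift) |
      (kSystemScope << kReleaseScopeShift));
}
```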
Summary:
This patch removes the ad-hoc parsing that I used previously and
replaces it with the LLVM CommandLine interface. This doesn't change any
functionality, but makes it easier to maintain.