Summary:
In order for this to work with CUDA we need to declare functions as
__host__ and __device__ while also making sure we only call the GPU
functions during the CUDA / HIP compile stage.
Summary:
Previous patches have made the `rpc.h` header independent of the `libc`
internals. This allows us to include it directly rather than providing
an indirect C API. This patch only does the work to move the header. A
future patch will pull out the `rpc_server` interface and simply replace
it with a single function that handles the opcodes.