This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
Without explicitly unregistering you will get
```
'cuMemHostRegister(ptr, sizeBytes, 0)' failed with 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'
```
in CUDA (for example) after repeated runs (e.g., during benchmarking the same kernel).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D147277
The push/pop context APIs are deprecated in HIP, and keeping the
default device set is handled in IHP using hipSetDevice().
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D134747
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D119883
Create a gpu memset op and corresponding CUDA and ROCm wrappers.
Reviewed By: herhut, lorenrose1013
Differential Revision: https://reviews.llvm.org/D107548
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.
I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D98447