The current only way to obtain pinned memory with libomptarget is to use a custom allocator llvm_omp_target_alloc_host.
This reflects well the CUDA implementation of libomptarget, but it does not correctly expose the AMDGPU runtime API,
where any system allocated page can be locked/unlocked through a call to hsa_amd_memory_lock/unlock.
This patch enables users to allocate memory through malloc (mmap, sbreak) and then pin the related memory pages
with a libomptarget special call. It is a base support in the amdgpu libomptarget plugin to enable users to prelock
their host memory pages so that the runtime doesn't need to lock them itself for asynchronous memory transfers.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D139208
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115279
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115279
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook as
I don't believe it is still in use, can bring it back near the rest of the env
handling in rtl.cpp if someone objects.
That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.
Behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in slightly different format, and the
removal of the max queue size variable.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D108784