Summary: Currently, the `__tgt_device_image` indirection just takes a reference to some pointers. This was all fine and good when its only use was for a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes: some code references the image even after it has been loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage their own memory, so this behavior no longer needs to be duplicated externally at the Offload API level. Images are now also actually freed when the user unloads them. Upside: less likely to crash and burn. Downside: more latency when loading an image.
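The lifetime change described in the summary can be illustrated with a minimal sketch. This is not the code from the patch: `BorrowedImage`, `OwnedImage`, and `copySurvivesSourceRelease` are hypothetical names invented for illustration, and the two-pointer layout is only an assumed stand-in for a descriptor that references memory it does not own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for a borrowed image descriptor: two pointers into
// memory owned by someone else (for example, an ELF constant section).
struct BorrowedImage {
  const void *Start;
  const void *End;
};

// Hypothetical owning image: copies the bytes on construction and releases
// them when the object is destroyed (for example, when the user unloads it).
class OwnedImage {
  std::vector<unsigned char> Bytes;

public:
  explicit OwnedImage(const BorrowedImage &Src) {
    auto *B = static_cast<const unsigned char *>(Src.Start);
    auto *E = static_cast<const unsigned char *>(Src.End);
    assert(B <= E && "image range must be well-formed");
    Bytes.assign(B, E); // the extra copy is the added load latency
  }

  const unsigned char *data() const { return Bytes.data(); }
  std::size_t size() const { return Bytes.size(); }
};

// Returns true if the copy still holds the original bytes after the source
// buffer has been overwritten and freed.
bool copySurvivesSourceRelease() {
  auto Source = std::make_unique<unsigned char[]>(4);
  std::memcpy(Source.get(), "\x7f" "ELF", 4);

  BorrowedImage Borrowed{Source.get(), Source.get() + 4};
  OwnedImage Owned(Borrowed);

  std::memset(Source.get(), 0, 4);
  Source.reset(); // the caller's buffer is gone

  return Owned.size() == 4 &&
         std::memcmp(Owned.data(), "\x7f" "ELF", 4) == 0;
}
```

With the borrowed form, any use after the caller releases its buffer would read freed memory. The owning form trades one copy per load for a lifetime that the runtime controls by itself.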
Liboffload
This directory contains the implementation of the work-in-progress new API for Offload. It builds on top of the existing plugin implementations but provides a single level of abstraction suitable for implementation of many offloading language runtimes, rather than just OpenMP.
Testing liboffload
The main test suite for liboffload can be run with the check-offload-unit
target, which runs the offload.unittests executable. The test suite will
automatically run on every available device, but can be restricted to a single
platform (CUDA, AMDGPU) with a command-line argument:
$ ./offload.unittests --platform=CUDA
Tracing of Offload API calls can be enabled by setting the OFFLOAD_TRACE
environment variable. This works with any program that uses liboffload.
$ OFFLOAD_TRACE=1 ./offload.unittests
---> olInit()-> OL_SUCCESS
# etc
The host plugin is not currently supported.
Modifying liboffload
The main header (OffloadAPI.h) and some implementation details are
autogenerated with tablegen. See the API definition README
for implementation details.