We had three `utils::` namespaces, all with different "meaning" (host,
device, hsa_utils). We should, when we can, keep "include/Shared"
accessible from host and device, thus RefCountTy has been moved to a
separate header. `hsa_utils` was introduced to make `utils::` less
overloaded. And common functionality was de-duplicated, e.g.,
`utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type
punning now checks for the size of the result to make sure it matches
the source type.
No functional change was intended.
When we use the device, e.g., with an API that interacts with it, we
need to ensure the image is loaded and the constructors are executed.
Two tests are included to verify we 1) load images and run constructors
when needed, and 2) we do so lazily only if the device is actually used.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
As a first step towards a GPU sanitizer we now can track allocations and
deallocations in order to report double frees, and other problems during
deallocation.
Summary:
Currently, we register images into a linear table according to the
logical OpenMP device identifier. We then initialize all of these images
as one block. This logic requires that images are compatible with *all*
devices instead of just the one that it can run on. This prevents us
from running on systems with heterogeneous devices (i.e. image 1 runs on
device 0 image 0 runs on device 1).
This patch reworks the logic by instead making the compatibility check a
per-device query. We then scan every device to see if it's compatible
and do it as they come.
Summary:
The `GenericDeviceTy::dataDelete` method doesn't verify the
`TargetAllocTy` of the of the device pointer. Because of this, it can
use the `MemoryManager` to free the ptr. However, the
`TARGET_ALLOC_HOST` and `TARGET_ALLOC_SHARED` types are not allocated
using the `MemoryManager` in the `GenericDeviceTy::dataAlloc` method.
Since the `MemoryManager` uses the `DeviceAllocatorTy::free` operation
without specifying the type of the ptr, some plugins may use incorrect
operations to free ptrs of certain types. In particular, this bug causes
the CUDA plugin to use the `cuMemFree` operation on ptrs of type
`TARGET_ALLOC_HOST`, resulting in an unchecked error, as shown in the
output snippet of the test
`offload/test/api/omp_host_pinned_memory_alloc.c`:
```
omptarget --> Notifying about an unmapping: HstPtr=0x00007c6114200000
omptarget --> Call to llvm_omp_target_free_host for device 0 and address 0x00007c6114200000
omptarget --> Call to omp_get_num_devices returning 1
omptarget --> Call to omp_get_initial_device returning 1
PluginInterface --> MemoryManagerTy:🆓 target memory 0x00007c6114200000.
PluginInterface --> Cannot find its node. Delete it on device directly.
TARGET CUDA RTL --> Failure to free memory: Error in cuMemFree[Host]: invalid argument
omptarget --> omp_target_free deallocated device ptr
```
This patch fixes this by adding the check of the device pointer type
before calling the appropriate operation for each type.
In a nutshell, this moves our libomptarget code to populate the offload
subproject.
With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.
Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.
```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests
```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.
```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`
Fixes: https://github.com/llvm/llvm-project/issues/75124
---------
Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>