The following patch introduces a new interop interface implementation
with the following characteristics:
* It supports the new 6.0 prefer_type specification
* It supports both explicit objects (from interop constructs) and
implicit objects (from variant calls).
* It implements a per-thread reuse mechanism for implicit objects to reduce
overhead.
* It provides a plugin interface that allows selecting the supported
interop types, and managing all the backend related interop operations
(init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on
OpenMP synchronizations.
* It cleans up some vendor/fr_id mismatches in the current query
routines.
* It supports an extension to define interop callbacks for library cleanup.
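
The per-thread reuse of implicit objects can be illustrated with a minimal sketch. Everything here (`InteropObj`, `acquireImplicit`, `releaseImplicit`) is a hypothetical name chosen for illustration and is not the runtime's actual data structure. It only shows the idea of a thread-local cache that hands back a free object instead of creating a new one.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an interop object; not the runtime's real type.
struct InteropObj {
  int DeviceId = -1;
  bool InUse = false;
};

// One cache per thread, so lookups need no locking.
static thread_local std::vector<std::unique_ptr<InteropObj>> ThreadCache;

// Return a free cached object for DeviceId, or create one if none is free.
InteropObj *acquireImplicit(int DeviceId) {
  for (auto &Obj : ThreadCache) {
    if (!Obj->InUse && Obj->DeviceId == DeviceId) {
      Obj->InUse = true;
      return Obj.get();
    }
  }
  ThreadCache.push_back(std::make_unique<InteropObj>());
  InteropObj *Fresh = ThreadCache.back().get();
  Fresh->DeviceId = DeviceId;
  Fresh->InUse = true;
  return Fresh;
}

// Mark the object as free so a later variant call on this thread can reuse it.
void releaseImplicit(InteropObj *Obj) {
  assert(Obj && Obj->InUse && "releasing an object that is not in use");
  Obj->InUse = false;
}
```

In this sketch an object released by one variant call is returned by the next `acquireImplicit` for the same device on the same thread, which avoids repeated creation and teardown.
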
Summary:
Currently, Auto Zero-Copy is enabled by checking every initialized
device to ensure that no dGPU is attached to an APU. However, an APU is
designed to comprise a homogeneous set of GPUs, so it should be
sufficient to check any single device when configuring Auto Zero-Copy.
This PR checks the first initialized device in the list.
The changes in this PR make the design and logic of enabling the
feature explicit in order to improve readability.
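
The decision described above can be sketched as follows. `DeviceInfo` and `shouldEnableAutoZeroCopy` are hypothetical names, and the real plugin may consult additional conditions. The sketch only shows that the first initialized device decides the outcome.

```cpp
#include <vector>

// Hypothetical per-device summary; the field names are assumptions.
struct DeviceInfo {
  bool Initialized;
  bool IsAPU;
};

// Decide based on the first initialized device only, relying on the
// assumption that all GPUs in the system are homogeneous.
bool shouldEnableAutoZeroCopy(const std::vector<DeviceInfo> &Devices) {
  for (const DeviceInfo &D : Devices)
    if (D.Initialized)
      return D.IsAPU;
  return false; // No initialized device: leave the feature off.
}
```
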
[Offload] Use new error code handling mechanism
This removes the old ErrorCode-less error method and requires
every user to provide a concrete error code. All calls have been
updated.
In addition, for consistency with error messages elsewhere in LLVM, all
messages now start with a lowercase letter.
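
As an illustration of the "every error carries a code" rule, here is a standalone sketch. It does not use the actual LLVM or Offload error API. `ErrorCode`, `OffloadError`, and `makeError` are hypothetical names, and the enumerators are invented for the example.

```cpp
#include <string>
#include <utility>

// Hypothetical error codes; the real enumerators may differ.
enum class ErrorCode { Success, InvalidArgument, OutOfResources, Unknown };

struct OffloadError {
  ErrorCode Code;
  std::string Message;
};

// The only way to build an error: the caller must name a concrete code.
OffloadError makeError(ErrorCode Code, std::string Message) {
  return {Code, std::move(Message)};
}

// The code-less form is deleted, so makeError("...") fails to compile.
OffloadError makeError(const char *Message) = delete;
```

A call site then reads `makeError(ErrorCode::InvalidArgument, "invalid device id")`, with the message starting in lower case.
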
Summary:
Right now we generally assume that we have one image per device. The
binary descriptor represents a single 'compilation'. This means that
each image is going to contain the same code built for different
architectures when used through the OpenMP interface. This is
problematic when we have cases where the same code will then be loaded
multiple times (like with sm_80, sm_89 or the generic GFX ISAs). This
patch is the quick and dirty solution: we just prevent this from
happening at all. This means we use the first one we find, which might
not be optimal, but it should be better than the alternative.
Note that this does not affect shared library loads as it is per binary
descriptor, not per device.
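
A minimal sketch of the "first compatible image wins" policy, applied to the images of a single binary descriptor for one device. `pickFirstCompatible` and the string-based architecture check are assumptions for illustration; the real compatibility test lives in the plugins.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical compatibility predicate for one device.
using IsCompatibleFn = std::function<bool(const std::string &ImageArch)>;

// Return the index of the first image the device accepts, or -1 if none.
// Later compatible images in the same descriptor are ignored, so the same
// code is never loaded twice on that device.
int pickFirstCompatible(const std::vector<std::string> &ImageArchs,
                        const IsCompatibleFn &IsCompatible) {
  for (std::size_t I = 0; I < ImageArchs.size(); ++I)
    if (IsCompatible(ImageArchs[I]))
      return static_cast<int>(I);
  return -1;
}
```

Because the selection runs per binary descriptor, a shared library with its own descriptor goes through the same selection independently.
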
If the user specifies that offload is disabled (e.g.,
OMP_TARGET_OFFLOAD=disabled), disable the library almost completely. This
reduces resources spent to a minimum and ensures all APIs behave as if
the only available device is the host device.
Currently, some of the APIs behave as if there were devices available for
offload even under OMP_TARGET_OFFLOAD=disabled.
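
The intended behavior can be sketched as below. `OffloadPolicy`, `parseOffloadPolicy`, and `reportedNumDevices` are hypothetical helpers and not the library's actual entry points. The sketch assumes the policy value is matched case-insensitively and that a disabled policy makes device queries report no non-host devices.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class OffloadPolicy { Default, Mandatory, Disabled };

// Map an OMP_TARGET_OFFLOAD value to a policy; unknown values fall back
// to Default in this sketch.
OffloadPolicy parseOffloadPolicy(std::string Value) {
  std::transform(Value.begin(), Value.end(), Value.begin(),
                 [](unsigned char C) { return std::tolower(C); });
  if (Value == "disabled")
    return OffloadPolicy::Disabled;
  if (Value == "mandatory")
    return OffloadPolicy::Mandatory;
  return OffloadPolicy::Default;
}

// With offload disabled, report zero non-host devices regardless of the
// hardware present, so callers only ever see the host device.
int reportedNumDevices(OffloadPolicy Policy, int PhysicalDevices) {
  return Policy == OffloadPolicy::Disabled ? 0 : PhysicalDevices;
}
```
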
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>