58 Commits

Author SHA1 Message Date
Joseph Huber
51e3c3d51b
[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)
Summary:
This exposes the 'isDeviceCompatible' routine for checking if a binary
*can* be loaded. This is useful if people don't want to consume errors
everywhere when figuring out which image to put to what device.

I don't know if this is a good name, I was thining like `olIsCompatible`
or whatever. Let me know what you think.

Long term I'd like to be able to do something similar to what OpenMP
does where we can conditionally only initialize devices if we need them.
That's going to be support needed if we want this to be more
generic.
2025-09-19 12:15:57 -05:00
Joseph Huber
e7101dac9c
[Offload] Copy loaded images into managed storage (#158748)
Summary:
Currently we have this `__tgt_device_image` indirection which just takes
a reference to some pointers. This was all find and good when the only
usage of this was from a section of GPU code that came from an ELF
constant section. However, we have expanded beyond that and now need to
worry about managing lifetimes. We have code that references the image
even after it was loaded internally. This patch changes the
implementation to instaed copy the memory buffer and manage it locally.

This PR reworks the JIT and other image handling to directly manage its
own memory. We now don't need to duplicate this behavior externally at
the Offload API level. Also we actually free these if the user unloads
them.

Upside, less likely to crash and burn. Downside, more latency when
loading an image.
2025-09-16 08:57:28 -05:00
Ross Brunton
ffb756dff2
[Offload] Add OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION] (#155823)
This is the total number of work items that the device supports (the
equivalent work group properties are for only a single work group).
2025-08-29 09:39:18 +01:00
Ross Brunton
9e5d8bd3d1
[Offload] Improve olDestroyQueue logic (#153041)
Previously, `olDestroyQueue` would not actually destroy the queue,
instead leaving it for the device to clean up when it was destroyed.
Now, the queue is either released immediately if it is complete or put
into a list of "pending" queues if it is not. Whenever we create a new
queue, we check this list to see if any are now completed. If there are
any we release their resources and use them instead of pulling from
the pool.

This prevents long running programs that create and drop many queues
without syncing them from leaking memory all over the place.
2025-08-29 09:39:00 +01:00
Ross Brunton
41fed2d048
[Offload] Add PRODUCT_NAME device info (#155632)
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-28 15:16:17 +01:00
Callum Fare
0b18d2da70
[Offload] Implement olMemFill (#154102)
Implement olMemFill to support filling device memory with arbitrary
length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22 14:31:16 +01:00
Ross Brunton
4c0c295775
[Offload] OL_EVENT_INFO_IS_COMPLETE (#153194)
A simple info query for events that returns whether the event is
complete or not.
2025-08-22 13:40:31 +01:00
Ross Brunton
273ca1f77b
[Offload] Fix OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE on AMD (#154521)
This wasn't handled with the normal info API, so needs special handling.
2025-08-21 09:37:58 +01:00
Ross Brunton
c8986d1ecb
[Offload] Guard olMemAlloc/Free with a mutex (#153786)
Both these functions update an `AllocInfoMap` structure in the context,
however they did not use any locks, causing random failures in threaded
code. Now they use a mutex.
2025-08-20 13:23:57 +01:00
Ross Brunton
2c11a83691
[Offload] Add olCalculateOptimalOccupancy (#142950)
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is
currently
only implemented on Cuda; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 15:16:47 +01:00
Rafal Bielski
9c9d9e4cb6
[Offload] Define additional device info properties (#152533)
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums using
`foreach` to reduce code duplication.

Use macros in unit test definitions to reduce code duplication.
2025-08-19 13:02:01 +01:00
Ross Brunton
30c7951136
[Offload] olLaunchHostFunction (#152482)
Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.
2025-08-15 09:39:48 +01:00
Ross Brunton
3e9f29cfee
[Offload] Store globals in the program's global list rather than the kernel list (#153441) 2025-08-13 17:18:25 +01:00
Ross Brunton
910d7e90bf
[Offload] Make olLaunchKernel test thread safe (#149497)
This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when ran on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).
2025-08-08 10:57:04 +01:00
Ross Brunton
197d1c1570
[Offload] OL_QUEUE_INFO_EMPTY (#152473)
Add a queue query that (if possible) reports whether the queue is empty
2025-08-08 10:20:45 +01:00
Ross Brunton
a44532544b
[Offload] Don't create events for empty queues (#152304)
Add a device function to check if a device queue is empty. If liboffload
tries to create an event for an empty queue, we create an "empty" event
that is already complete.

This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run
quickly for empty queues.
2025-08-07 10:16:33 +01:00
Ross Brunton
d03692a00e
[Offload] Rework MAX_WORK_GROUP_SIZE (#151926)
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work
groups the device can allocate, rather than the maximum per dimension.
`MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old
behaviour.
2025-08-04 15:21:24 +01:00
Ross Brunton
adb2421202
[Offload] Refactor device information queries to use new tagging (#147318)
Instead using strings to look up device information (which is brittle
and slow), use the new tags that the plugins specify when building the
nodes.
2025-07-25 14:51:51 +01:00
Ross Brunton
690c3ee5be
[Offload] Replace "EventOut" parameters with olCreateEvent (#150217)
Rather than having every "enqueue"-type function have an output pointer
specifically for an output event, just provide an `olCreateEvent`
entrypoint which pushes an event to the queue.

For example, replace:
```cpp
olMemcpy(Queue, ..., EventOut);
```
with
```cpp
olMemcpy(Queue, ...);
olCreateEvent(Queue, EventOut);
```
2025-07-24 14:31:06 +01:00
Ross Brunton
081b74caf5
[Offload] Add olWaitEvents (#150036)
This function causes a queue to wait until all the provided events have
completed before running any future scheduled work.
2025-07-23 14:12:16 +01:00
Ross Brunton
2726b7fb1c
[Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023)
This more closely matches the nomenclature used by CUDA, AMDGPU and
the plugin interface.
2025-07-23 10:52:13 +01:00
Ross Brunton
55b417a75f
[Offload] Cache symbols in program (#148209)
When creating a new symbol, check that it already exists. If it does,
return that pointer rather than building a new symbol structure.
2025-07-16 18:32:47 +01:00
Callum Fare
47c9609a86
[Offload] Check plugins aren't already deinitialized when tearing down (#148642)
This is a hotfix for #148615 - it fixes the issue for me locally.

I think a broader issue is that in the test environment we're calling
olShutDown from a global destructor in the test binaries. We should do
something more controlled, either calling olInit/olShutDown in every
test, or move those to a GTest global environment. I didn't do that
originally because it looked like it needed changes to LLVM's GTest
wrapper.
2025-07-14 16:17:10 +01:00
Ross Brunton
2fdeeefacf
[Offload] Add global variable address/size queries (#147972)
Add two new symbol info types for getting the bounds of a global
variable. As well as a number of tests for reading/writing to it.
2025-07-11 16:12:48 +01:00
Ross Brunton
84e15d08c2
[Offload] Add olGetSymbolInfo[Size] (#147962)
This mirrors the similar functions for other handles. The only
implemented info at the moment is the symbol's kind.
2025-07-11 15:29:53 +01:00
Ross Brunton
eee723f928
[Offload] Replace GetKernel with GetSymbol with global support (#148221)
`olGetKernel` has been replaced by `olGetSymbol` which accepts a
`Kind` parameter. As well as loading information about kernels, it
can now also load information about global variables.
2025-07-11 14:48:10 +01:00
Ross Brunton
466357ab51
[Offload] Change ol_kernel_handle_t -> ol_symbol_handle_t (#147943)
In the future, we want `ol_symbol_handle_t` to represent both kernels
and global variables The first step in this process is a rename and
promotion to a "typed handle".
2025-07-10 14:54:10 +01:00
Callum Fare
7c6edf4a05
[Offload] Implement olGetQueueInfo, olGetEventInfo (#142947)
Add info queries for queues and events.

`olGetQueueInfo` only supports getting the associated device. We were
already tracking this so we can implement this for free. We will likely
add other queries to it in the future (whether the queue is empty, what
flags it was created with, etc)

`olGetEventInfo` only supports getting the associated queue. This is
another thing we were already storing in the handle. We'll be able to
add other queries in future (the event type, status, etc)
2025-07-09 17:09:31 +01:00
Ross Brunton
7d52b0983e
[Offload] Add MAX_WORK_GROUP_SIZE device info query (#143718)
This adds a new device info query for the maximum workgroup/block size
for each dimension.
2025-07-02 16:33:54 +01:00
Ross Brunton
67e73ba605
[Offload] Refactor device/platform info queries (#146345)
This makes several small changes to how the platform and device info
queries are handled:
* ReturnHelper has been replaced with InfoWriter which is more explicit
  in how it is invoked.
* InfoWriter consumes `llvm::Expected` rather than values directly, and
  will early exit if it returns an error.
* As a result of the above, `GetInfoString` now correctly returns errors
  rather than empty strings.
* The host device now has its own dedicated "getInfo" function rather
  than being checked in multiple places.
2025-06-30 15:00:43 +01:00
Ross Brunton
003145d0c8
[Offload] Implement olShutDown (#144055)
`olShutDown` was not properly calling deinit on the platforms, resulting
in random segfaults on AMD devices.

As part of this, `olInit` and `olShutDown` now alloc and free the
offload context rather than it being static. This
allows `olShutDown` to be called within a destructor of a static object
(like the tests do) without having to worry about destructor ordering.
2025-06-30 12:14:00 +01:00
Ross Brunton
39f19f2f1f
[Offload] Store device info tree in device handle (#145913)
Rather than creating a new device info tree for each call to
`olGetDeviceInfo`, we instead do it on device initialisation. As well
as improving performance, this fixes a few lifetime issues with returned
strings.

This does unfortunately mean that device information is immutable,
but hopefully that shouldn't be a problem for any queries we want to
implement.

This also meant allowing offload initialization to fail, which it can
now do.
2025-06-27 15:10:43 +01:00
Ross Brunton
0870c8838b
[Offload] Add an unloadBinary interface to PluginInterface (#143873)
This allows removal of a specific Image from a Device, rather than
requiring all image data to outlive the device they were created for.

This is required for `ol_program_handle_t`s, which now specify the
lifetime of the buffer used to create the program.
2025-06-25 14:53:18 +01:00
Ross Brunton
4359e55838
[Offload] Properly report errors when jit compiling (#145498)
Previously, if a binary failed to load due to failures when jit
compiling, the function would return success with nullptr. Now it
returns a new plugin error, `COMPILE_FAILURE`.
2025-06-24 16:27:12 +01:00
Ross Brunton
f242360e15
[Offload] Add type information to device info nodes (#144535)
Rather than being "stringly typed", store values as a std::variant that
can hold various types. This means that liboffload doesn't have to do
any string parsing for integer/bool device info keys.
2025-06-20 09:05:05 -05:00
Ross Brunton
e0633d59b9
[Offload] Check for initialization (#144370)
All entry points (except olInit) now check that offload has been
initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
2025-06-20 09:04:50 -05:00
Ross Brunton
53336ad488
[Offload] Move (most) global state to an OffloadContext struct (#144494)
Rather than having a number of static local variables, we now use
a single `OffloadContext` struct to store global state. This is
initialised by `olInit`, but is never deleted (de-initialization of
Offload isn't yet implemented).

The error reporting mechanism has not been moved to the struct, since
that's going to cause issues with teardown (error messages must outlive
liboffload).
2025-06-19 16:02:03 -05:00
Ross Brunton
e6a3579653
[Offload] Replace device info queue with a tree (#144050)
Previously, device info was returned as a queue with each element having
a "Level" field indicating its nesting level. This replaces this queue
with a more traditional tree-like structure.

This should not result in a change to the output of
`llvm-offload-device-info`.
2025-06-13 09:22:47 -05:00
Ross Brunton
4f60321ca1
[Offload] Add ol_dimensions_t and convert ranges from size_t -> uint32_t (#143901)
This is a three element x, y, z size_t vector that can be used any place
where a 3D vector is required. This ensures that all vectors across
liboffload are the same and don't require any resizing/reordering
dances.
2025-06-12 09:59:59 -05:00
Callum Fare
835497a4dc
[Offload] Make olMemcpy src parameter const (#143161) 2025-06-06 10:25:00 -05:00
Ross Brunton
7efb79b705
[Offload] Fix Error checking (#141939)
All errors must be checked - this includes the local variable we were
using to increase the lifetime of `Res`. As we were not explicitly
checking it, it resulted in an `abort` in debug builds.
2025-05-29 08:17:08 -05:00
Joseph Huber
0ebe5557d9
[Offload] Add specifier for the host type (#141635)
Summary:
We use this sepcial type to indicate a host value, this will be refined
later but for now it's used as a stand-in device for transfers and
queues. It needs a special kind because it is not a device target as the
other ones so we need to differentiate it between a CPU and GPU type.

Fixes: https://github.com/llvm/llvm-project/issues/141436
2025-05-28 08:51:14 -05:00
Joseph Huber
a9b64bb318
[Offload] Fix segfault when looking for host device name (#141632)
Summary:
This is done using the generic device into pointe, but no such thing
exists for the host device, leading to a segfault. This patch fixes that
for now, but in the future we should probably be more careful in general
handling the possibility that the handle is null everywhere.

Fixes: https://github.com/llvm/llvm-project/issues/141434
2025-05-27 13:43:29 -05:00
Ross Brunton
7e9d708be0
[Offload] Use llvm::Error throughout liboffload internals (#140879)
This removes the `ol_impl_result_t` helper class, replacing it with
`llvm::Error`. In addition, some internal functions that returned
`ol_errc_t` now return `llvm::Error` (with a fancy message).
2025-05-27 13:42:56 -05:00
Ross Brunton
050892d2f8
[Offload] Use new error code handling mechanism and lower-case messages (#139275)
[Offload] Use new error code handling mechanism

This removes the old ErrorCode-less error method and requires
every user to provide a concrete error code. All calls have been
updated.

In addition, for consistency with error messages elsewhere in LLVM, all
messages have been made to start lower case.
2025-05-20 08:50:20 -05:00
Ross Brunton
f6ac5276ee
[Offload] Ensure all llvm::Errors are handled (#137339)
`llvm::Error`s containing errors must be explicitly handled or an assert
will be raised. With this change, `ol_impl_result_t` can accept and
consume an `llvm::Error` for errors raised by PluginInterface that have
multiple causes and other places now call `llvm::consumeError`.

Note that there is currently no facility for PluginInterface to
communicate exact error codes, but the constructor is designed in
such a way that it can be easily added later. This MR is to
convert a crash into an error code.

A new test was added, however due to the aforementioned issue with
error codes, it does not pass and instead is marked as a skip.
2025-05-02 07:37:19 -05:00
Callum Fare
7bc16a0f63
[Offload] Adding missing Offload unit tests for event entry points (#137315)
A couple of liboffload entry points were missed out from the tests, and
unsurprisingly a crash in one of them made it in. Add the tests and fix
the unchecked error in `olDestroyEvent`.
2025-04-30 09:06:00 -05:00
Callum Fare
6022a5214b
[Offload] Add check-offload-unit for liboffload unittests (#137312)
Adds a `check-offload-unit` target for running the liboffload unit test
suite. This unit test binary runs the tests for every available device.
This can optionally filtered to devices from a single platform, but the
check target runs on everything.

The target is not part of `check-offload` and does not get propagated to
the top level build. I'm not sure if either of these things are
desirable, but I'm happy to look into it if we want.

Also remove the `offload/unittests/Plugins` test as it's dead code and
doesn't build.
2025-04-29 11:21:59 -05:00
Callum Fare
800d949bb3
[Offload] Implement the remaining initial Offload API (#122106)
Implement the complete initial version of the Offload API, to the extent
that is usable for simple offloading programs. Tested with a basic SYCL
program.

As far as possible, these are simple wrappers over existing
functionality in the plugins.

* Allocating and freeing memory (host, device, shared).
* Creating a program 
* Creating a queue (wrapper over asynchronous stream resource)
* Enqueuing memcpy operations
* Enqueuing kernel executions
* Waiting on (optional) output events from the enqueue operations
* Waiting on a queue to finish

Objects created with the API have reference counting semantics to handle
their lifetime. They are created with an initial reference count of 1,
which can be incremented and decremented with retain and release
functions. They are freed when their reference count reaches 0. Platform
and device objects are not reference counted, as they are expected to
persist as long as the library is in use, and it's not meaningful for
users to create or destroy them.

Tests have been added to `offload.unittests`, including device code for
testing program and kernel related functionality.

The API should still be considered unstable and it's very likely we will
need to change the existing entry points.
2025-04-22 13:27:50 -05:00
Callum Fare
fd3907ccb5
Reland #118503: [Offload] Introduce offload-tblgen and initial new API implementation (#118614)
Reland #118503. Added a fix for builds with `-DBUILD_SHARED_LIBS=ON`
(see last commit). Otherwise the changes are identical.

---


### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests 
```


### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
2024-12-05 09:34:04 +01:00