63 Commits

Author SHA1 Message Date
Joseph Huber
6282a7b993 [Offload] Fix missing end to string in .td file 2026-02-17 15:32:17 -06:00
fineg74
1c6d774baa
[OFFLOAD] Extend olMemRegister API to handle cases when a memory block may have been mapped outside of liboffload. (#172226)
This PR adds extends liboffload olMemRegister API to handle a case when
a memory block may have been mapped before calling olMemRegister to
support some use cases in libomptarget
2026-02-17 20:53:00 +00:00
Joseph Huber
d62cd1b89d
[Offload] Add argument to 'olInit' for global configuration options (#181872)
Summary:
This PR adds a pointer argument to the initialization routine to be used
for global options. Right now this is used to allow the user to
constrain which backends they wish to use.

If a null argument is passed, the same behavior as before is observed.
This is epxected to be extensible by forcing the user to encode the size
of the struct. So, old executables will encode which fields they have
access to.

We use a macro helper to get this struct rather than a runtime call so
that the current state of the size is baked into the executable rather
than something looked up by the runtime. Otherwise it would just return
the size that the (potentially newer) runtime would see
2026-02-17 14:04:00 -06:00
Joseph Huber
1a86c146ae
[Offload] Add a function to register an RPC Server callback (#178774)
Summary:
We provide an RPC server to manage calls initiated by the device to run
on the host. This is very useful for the built-in handling we have,
however there are cases where we would want to extend this
functionality.

Cases like Fortran or MPI would be useful, but we cannot put references
to these in the core offloading runtime. This way, we can provide this
as a library interface that registers custom handlers for whatever code
people want.
2026-01-30 08:03:13 -06:00
fineg74
848d736e64
[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231)
Add liboffload asynchronous queue query API for libomptarget migration

This PR adds liboffload asynchronous queue query API that needed to make
libomptarget to use liboffload
2026-01-20 10:53:32 -08:00
fineg74
1232599032
[OFFLOAD] Add memory data locking API for libomptarget migration (#173138)
Add liboffload memory data locking API for libomptarget migration

This PR adds liboffload memory data locking API that needed to make
libomptarget to use liboffload
2026-01-12 13:07:57 -06:00
Alex Duran
ae739a240c
[OFFLOAD] Recognize level_zero backend in liboffload (#172818)
The code to recognize the level_zero plugin as a liboffload backend was
split from #158900. This PR adds the support back.

---------

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-18 15:31:36 +00:00
Kevin Sala Penades
1a86f0aae7
[Offload] Add device info for shared memory (#167817) 2025-11-13 11:00:12 -08:00
Robert Imschweiler
dc94f2cbad
[Offload] Add device UID (#164391)
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of
a device on a given system. (Not necessarily a UUID.) Since it is not
guaranteed that the (U)UIDs defined by the device vendor libraries, such
as HSA, do not overlap with those of other vendors, the device UIDs in
offload are always combined with the offload plugin name. In case the
vendor library does not specify any device UID for a given device, we
fall back to the offload-internal device ID.
The device UID can be retrieved using the `llvm-offload-device-info`
tool.
2025-11-04 20:15:47 +01:00
Joseph Huber
4a35c4d38a
[Offload] Lazily initialize platforms in the Offloading API (#163272)
Summary:
The Offloading library wraps around the underlying plugins. The problem
is that we currently initialize all plugins we find, even if they are
not needed for the program. This is very expensive for trivial uses, as
fully heterogenous usage is quite rare. In practice this means that you
will always pay a 200 ms penalty for having CUDA installed.

This patch changes the behavior to provide accessors into the plugins
and devices that allows them to be initialized lazily. We use a
once_flag, this should properly take a fast-path check while still
blocking on concurrent use.

Making full use of this will require a way to filter platforms more
specifically. I'm thinking of what this would look like as an API.
I'm thinking that we either have an extra iterate function that takes a
callback on the platform, or we just provide a helper to find all the
devices that can run a given image. Maybe both?

Fixes: https://github.com/llvm/llvm-project/issues/159636
2025-10-14 09:35:53 -05:00
Ross Brunton
ea0e5185e2
[Offload] Add olGetMemInfo with platform-less API (#159581) 2025-09-24 12:17:57 +01:00
Ross Brunton
fcebe6bdbb
[Offload] Re-allocate overlapping memory (#159567)
If olMemAlloc happens to allocate memory that was already allocated
elsewhere (possibly by another device on another platform), it is now
thrown away and a new allocation generated.

A new `AllocBases` vector is now available, which is an ordered list
of allocation start addresses.
2025-09-23 13:59:52 +01:00
Joseph Huber
51e3c3d51b
[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)
Summary:
This exposes the 'isDeviceCompatible' routine for checking if a binary
*can* be loaded. This is useful if people don't want to consume errors
everywhere when figuring out which image to put to what device.

I don't know if this is a good name, I was thining like `olIsCompatible`
or whatever. Let me know what you think.

Long term I'd like to be able to do something similar to what OpenMP
does where we can conditionally only initialize devices if we need them.
That's going to be support needed if we want this to be more
generic.
2025-09-19 12:15:57 -05:00
Ross Brunton
ffb756dff2
[Offload] Add OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION] (#155823)
This is the total number of work items that the device supports (the
equivalent work group properties are for only a single work group).
2025-08-29 09:39:18 +01:00
Ross Brunton
41fed2d048
[Offload] Add PRODUCT_NAME device info (#155632)
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-28 15:16:17 +01:00
Callum Fare
77c5a6506f
[Offload] Fix definition of olMemFill (#154947)
Fix regression introduced by #154102 - the way offload-tblgen handles
names has changed
2025-08-22 14:48:00 +01:00
Callum Fare
0b18d2da70
[Offload] Implement olMemFill (#154102)
Implement olMemFill to support filling device memory with arbitrary
length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22 14:31:16 +01:00
Ross Brunton
4c0c295775
[Offload] OL_EVENT_INFO_IS_COMPLETE (#153194)
A simple info query for events that returns whether the event is
complete or not.
2025-08-22 13:40:31 +01:00
Ross Brunton
17dbb92612
[Offload][NFC] Use tablegen names rather than name parameter for API (#154736) 2025-08-22 11:13:57 +01:00
Ross Brunton
2e74cc6c04
[Offload][NFC] Use a sensible order for APIGen (#154518)
The order entries in the tablegen API files are iterated is not the
order
they appear in the file. To avoid any issues with the order changing
in future, we now generate all definitions of a certain class before
class that can use them.

This is a NFC; the definitions don't actually change, just the order
they exist in in the OffloadAPI.h header.
2025-08-21 09:38:21 +01:00
Ross Brunton
2c11a83691
[Offload] Add olCalculateOptimalOccupancy (#142950)
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is
currently
only implemented on Cuda; AMDGPU and Host return unsupported.

---------

Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 15:16:47 +01:00
Rafal Bielski
9c9d9e4cb6
[Offload] Define additional device info properties (#152533)
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be
bit-shifted instead of incremented. Generate the per-type enums using
`foreach` to reduce code duplication.

Use macros in unit test definitions to reduce code duplication.
2025-08-19 13:02:01 +01:00
Ross Brunton
30c7951136
[Offload] olLaunchHostFunction (#152482)
Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.
2025-08-15 09:39:48 +01:00
Ross Brunton
197d1c1570
[Offload] OL_QUEUE_INFO_EMPTY (#152473)
Add a queue query that (if possible) reports whether the queue is empty
2025-08-08 10:20:45 +01:00
Ross Brunton
ca13c44bbc
[NFC][Offload] Clarify olDestroyQueue (#152132)
This has no code changes.
2025-08-06 15:34:31 +01:00
Ross Brunton
d03692a00e
[Offload] Rework MAX_WORK_GROUP_SIZE (#151926)
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work
groups the device can allocate, rather than the maximum per dimension.
`MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old
behaviour.
2025-08-04 15:21:24 +01:00
Ross Brunton
690c3ee5be
[Offload] Replace "EventOut" parameters with olCreateEvent (#150217)
Rather than having every "enqueue"-type function have an output pointer
specifically for an output event, just provide an `olCreateEvent`
entrypoint which pushes an event to the queue.

For example, replace:
```cpp
olMemcpy(Queue, ..., EventOut);
```
with
```cpp
olMemcpy(Queue, ...);
olCreateEvent(Queue, EventOut);
```
2025-07-24 14:31:06 +01:00
Ross Brunton
081b74caf5
[Offload] Add olWaitEvents (#150036)
This function causes a queue to wait until all the provided events have
completed before running any future scheduled work.
2025-07-23 14:12:16 +01:00
Ross Brunton
2726b7fb1c
[Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023)
This more closely matches the nomenclature used by CUDA, AMDGPU and
the plugin interface.
2025-07-23 10:52:13 +01:00
Ross Brunton
2fdeeefacf
[Offload] Add global variable address/size queries (#147972)
Add two new symbol info types for getting the bounds of a global
variable. As well as a number of tests for reading/writing to it.
2025-07-11 16:12:48 +01:00
Ross Brunton
84e15d08c2
[Offload] Add olGetSymbolInfo[Size] (#147962)
This mirrors the similar functions for other handles. The only
implemented info at the moment is the symbol's kind.
2025-07-11 15:29:53 +01:00
Ross Brunton
eee723f928
[Offload] Replace GetKernel with GetSymbol with global support (#148221)
`olGetKernel` has been replaced by `olGetSymbol` which accepts a
`Kind` parameter. As well as loading information about kernels, it
can now also load information about global variables.
2025-07-11 14:48:10 +01:00
Ross Brunton
466357ab51
[Offload] Change ol_kernel_handle_t -> ol_symbol_handle_t (#147943)
In the future, we want `ol_symbol_handle_t` to represent both kernels
and global variables The first step in this process is a rename and
promotion to a "typed handle".
2025-07-10 14:54:10 +01:00
Kenneth Benzie (Benie)
cea33304c0
[Offload] Add Offload API Sphinx documentation (#147323)
* Add spec generation to offload-tblgen tool
* This patch adds generation of Sphinx compatible reStructuedText
utilizing the C domain to document the Offload API directly from the
spec definition `.td` files.
* Add Sphinx HTML documentation target
* Introduces the `docs-offload-html` target when CMake is configured
with `LLVM_ENABLE_SPHINX=ON` and `SPHINX_OUTPUT_HTML=ON`. Utilized
`offload-tblgen -gen-spen` to generate Offload API specification docs.
2025-07-10 11:50:51 +01:00
Callum Fare
7c6edf4a05
[Offload] Implement olGetQueueInfo, olGetEventInfo (#142947)
Add info queries for queues and events.

`olGetQueueInfo` only supports getting the associated device. We were
already tracking this so we can implement this for free. We will likely
add other queries to it in the future (whether the queue is empty, what
flags it was created with, etc)

`olGetEventInfo` only supports getting the associated queue. This is
another thing we were already storing in the handle. We'll be able to
add other queries in future (the event type, status, etc)
2025-07-09 17:09:31 +01:00
Ross Brunton
8c06d0e547
[Offload] Generate OffloadInfo.inc (#147316)
This is a generated file which contains a macro for all Device Info
keys. This is visible to the plugin interface so that it can use the
definitions in a future patch.
2025-07-09 11:35:22 +01:00
Callum Fare
3c0571a749
[Offload] Add missing license header to Common.td (#146737)
All other tablegen files in this directory have the license header, but
`Common.td` is missing it
2025-07-02 17:17:30 +01:00
Ross Brunton
7d52b0983e
[Offload] Add MAX_WORK_GROUP_SIZE device info query (#143718)
This adds a new device info query for the maximum workgroup/block size
for each dimension.
2025-07-02 16:33:54 +01:00
Callum Fare
acb52a8a98
[Offload] Improve liboffload documentation (#142403)
- Update the main README to reflect the current project status
- Rework the main API generation documentation. General fixes/tidying,
but also spell out explicitly how to make API changes at the top of the
document since this is what most people will care about.

---------

Co-authored-by: Martin Grant <martingrant@outlook.com>
2025-07-02 13:52:27 +01:00
Ross Brunton
003145d0c8
[Offload] Implement olShutDown (#144055)
`olShutDown` was not properly calling deinit on the platforms, resulting
in random segfaults on AMD devices.

As part of this, `olInit` and `olShutDown` now alloc and free the
offload context rather than it being static. This
allows `olShutDown` to be called within a destructor of a static object
(like the tests do) without having to worry about destructor ordering.
2025-06-30 12:14:00 +01:00
Ross Brunton
0870c8838b
[Offload] Add an unloadBinary interface to PluginInterface (#143873)
This allows removal of a specific Image from a Device, rather than
requiring all image data to outlive the device they were created for.

This is required for `ol_program_handle_t`s, which now specify the
lifetime of the buffer used to create the program.
2025-06-25 14:53:18 +01:00
Ross Brunton
4359e55838
[Offload] Properly report errors when jit compiling (#145498)
Previously, if a binary failed to load due to failures when jit
compiling, the function would return success with nullptr. Now it
returns a new plugin error, `COMPILE_FAILURE`.
2025-06-24 16:27:12 +01:00
Ross Brunton
e0633d59b9
[Offload] Check for initialization (#144370)
All entry points (except olInit) now check that offload has been
initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
2025-06-20 09:04:50 -05:00
Ross Brunton
4f60321ca1
[Offload] Add ol_dimensions_t and convert ranges from size_t -> uint32_t (#143901)
This is a three element x, y, z size_t vector that can be used any place
where a 3D vector is required. This ensures that all vectors across
liboffload are the same and don't require any resizing/reordering
dances.
2025-06-12 09:59:59 -05:00
Callum Fare
835497a4dc
[Offload] Make olMemcpy src parameter const (#143161) 2025-06-06 10:25:00 -05:00
Ross Brunton
269c29ae67
[Offload] Allow setting null arguments in olLaunchKernel (#141958) 2025-06-06 07:05:11 -05:00
Callum Fare
f44df93a9c
[Offload] Explicitly create directories that contain tablegen output (#142817)
This isn't required when building with Ninja, but with the Makefile
generator these directories don't get implicitly created.
2025-06-04 13:46:19 -05:00
Callum Fare
817af2ddf2
[Offload] Fix missing dependencies in Offload API generation (#142776)
Thanks to @RossBrunton for spotting this.

We attempt to clang-format the generated Offload header files, but if
clang-format isn't available we just copy the generated files instead.
That fallback path was missing the correct dependencies.

Fixes #142756
2025-06-04 08:51:50 -05:00
Callum Fare
b78bc35d16
[Offload] Don't check in generated files (#141982)
Previously we decided to check in files that we generate with tablegen.
The justification at the time was that it helped reviewers unfamiliar
with `offload-tblgen` see the actual changes to the headers in PRs.
After trying it for a while, it's ended up causing some headaches and is
also not how tablegen is used elsewhere in LLVM.

This changes our use of tablegen to be more conventional. Where
possible, files are still clang-formatted, but this is no longer a hard
requirement. Because `OffloadErrcodes.inc` is shared with libomptarget
it now gets generated in a more appropriate place.
2025-06-03 10:39:04 -05:00
Joseph Huber
0ebe5557d9
[Offload] Add specifier for the host type (#141635)
Summary:
We use this sepcial type to indicate a host value, this will be refined
later but for now it's used as a stand-in device for transfers and
queues. It needs a special kind because it is not a device target as the
other ones so we need to differentiate it between a CPU and GPU type.

Fixes: https://github.com/llvm/llvm-project/issues/141436
2025-05-28 08:51:14 -05:00