351 Commits

Author SHA1 Message Date
Callum Fare
47c9609a86
[Offload] Check plugins aren't already deinitialized when tearing down (#148642)
This is a hotfix for #148615 - it fixes the issue for me locally.

I think a broader issue is that in the test environment we're calling
olShutDown from a global destructor in the test binaries. We should do
something more controlled, either calling olInit/olShutDown in every
test, or move those to a GTest global environment. I didn't do that
originally because it looked like it needed changes to LLVM's GTest
wrapper.
2025-07-14 16:17:10 +01:00
Kenneth Benzie (Benie)
508f9a0274
[Offload] Skip event tests on AMDGPU (#148632)
Add `OffloadDeviceTest::getPlatformBackend()` and use it to skip event
tests which currently fail on AMDGPU due to:

```
OL_ERRC_UNIMPLEMENTED: synchronize event not implemented
```
2025-07-14 09:19:53 -05:00
Ross Brunton
a71187e976
[Offload] Return error rather than dropping it (#148609)
2025-07-14 14:05:58 +01:00
Kenneth Benzie (Benie)
b520d21c02
[Offload] Add tagged type to enumerator docs (#147998)
When `EnumRec::isTyped()` is true, include the
`EnumValueRec::getTaggedType()` in the documentation.
2025-07-14 13:35:36 +01:00
Ross Brunton
2fdeeefacf
[Offload] Add global variable address/size queries (#147972)
Add two new symbol info types for getting the bounds of a global
variable, along with a number of tests for reading from and writing to it.
2025-07-11 16:12:48 +01:00
Ross Brunton
84e15d08c2
[Offload] Add olGetSymbolInfo[Size] (#147962)
This mirrors the similar functions for other handles. The only
implemented info at the moment is the symbol's kind.
2025-07-11 15:29:53 +01:00
Ross Brunton
eee723f928
[Offload] Replace GetKernel with GetSymbol with global support (#148221)
`olGetKernel` has been replaced by `olGetSymbol` which accepts a
`Kind` parameter. As well as loading information about kernels, it
can now also load information about global variables.
2025-07-11 14:48:10 +01:00
Ross Brunton
466357ab51
[Offload] Change ol_kernel_handle_t -> ol_symbol_handle_t (#147943)
In the future, we want `ol_symbol_handle_t` to represent both kernels
and global variables. The first step in this process is a rename and
promotion to a "typed handle".
2025-07-10 14:54:10 +01:00
Ross Brunton
abb878438a
[Offload] Allow querying the size of globals (#147698)
The `GlobalTy` helper has been extended to make both the Size and Ptr be
optional. Now `getGlobalMetadataFromDevice`/`Image` is able to write the
size of the global to the struct, instead of just verifying it.
2025-07-10 12:05:31 +01:00
Kenneth Benzie (Benie)
cea33304c0
[Offload] Add Offload API Sphinx documentation (#147323)
* Add spec generation to offload-tblgen tool
  * This patch adds generation of Sphinx compatible reStructuredText
    utilizing the C domain to document the Offload API directly from the
    spec definition `.td` files.
* Add Sphinx HTML documentation target
  * Introduces the `docs-offload-html` target when CMake is configured
    with `LLVM_ENABLE_SPHINX=ON` and `SPHINX_OUTPUT_HTML=ON`. Uses
    `offload-tblgen -gen-spen` to generate Offload API specification docs.
2025-07-10 11:50:51 +01:00
Callum Fare
7c6edf4a05
[Offload] Implement olGetQueueInfo, olGetEventInfo (#142947)
Add info queries for queues and events.

`olGetQueueInfo` only supports getting the associated device. We were
already tracking this, so we can implement it for free. We will likely
add other queries to it in the future (whether the queue is empty, what
flags it was created with, etc.).

`olGetEventInfo` only supports getting the associated queue. This is
another thing we were already storing in the handle. We'll be able to
add other queries in the future (the event type, status, etc.).
2025-07-09 17:09:31 +01:00
Ye Luo
9f6784cc1f
[libomptarget] fix test offloading/disable_default_device.c
Fixes the incorrect lit command line introduced in 536ba87726d8dea862d964678dbb761ca32e21fb
2025-07-09 09:52:00 -05:00
Ross Brunton
bed9fe77dc
[Offload] Tests for global memory and constructors (#147537)
Adds two "launch kernel" tests for lib offload, one testing that
global memory works and persists between different kernels, and one
verifying that `[[gnu::constructor]]` works correctly.

Since we now have tests that contain multiple kernels in the same
binary, the test framework has been updated a bit.
2025-07-09 14:26:50 +01:00
Ross Brunton
0740db9bc1
[Offload] Add _LAST variant for generated enumerations (#147314)
2025-07-09 13:55:25 +01:00
Michael Kruse
4be3e95284
[Flang-RT][Offload] Always use LLVM-built GTest (#143682)
The Offload and Flang-RT runtimes had the ability to compile GTest themselves.
But in bootstrapping builds, LLVM_LIBRARY_OUTPUT_INTDIR points to the
same location as the stage1 build. If both are building GTest, they
overwrite each other's `libllvm_gtest.a` and `libllvm_test_main.a`, which
causes #143134.

This PR removes the ability for the Offload/Flang-RT runtimes to build
their own GTest and instead relies on the stage1 build of GTest. This
was already the case with LLVM_INSTALL_GTEST=ON configurations. For
LLVM_INSTALL_GTEST=OFF configurations, we now also export gtest into the
buildtree configuration. Ultimately, this reduces combinatorial
explosion of configurations in which unittests could be built
(LLVM_INSTALL_GTEST=ON, GTest built by Offload, GTest built by Flang-RT,
GTest built by Offload and also used by Flang-RT).

GTest and therefore Offload/Runtime unittests will not be available if
the runtimes are configured against an LLVM install tree. Since llvm-lit
isn't available in the install tree either, it doesn't matter.

Note that compiler-rt and libc also use GTest in non-default
configurations. libc also depends on LLVM's GTest build (and would
error out if unavailable), but compiler-rt builds it completely
differently.

Fixes #143134
2025-07-09 12:53:33 +02:00
Ross Brunton
8c06d0e547
[Offload] Generate OffloadInfo.inc (#147316)
This is a generated file which contains a macro for all Device Info
keys. This is visible to the plugin interface so that it can use the
definitions in a future patch.
2025-07-09 11:35:22 +01:00
Ross Brunton
8e104d69fc
[Offload] Provide proper memory management for Images on host device (#146066)
The `unloadBinaryImpl` method on the host plugin is now implemented
properly (rather than just being a stub). When an image is unloaded,
it is deallocated and the library associated with it is closed.
2025-07-08 12:42:06 +01:00
Abhinav Gaba
ae4a81e849
[NFC][OpenMP] Add tests for mapping pointers and their dereferences. (#146934)
The output of the compile-and-run tests is incorrect. These will be used
for reference in future commits that resolve the issues.

Also updated the existing clang LIT test,
target_map_both_pointer_pointee_codegen.cpp, with more constructs and
fewer CHECKs (through more update_cc_test_checks filters).
2025-07-08 06:52:38 -04:00
Callum Fare
fdf6ab2a53
[Offload] Implement 'Vendor Name' device info for CUDA (#147334)
After #146345 the device info implementation requires a value for every
query, rather than silently returning an empty string. This broke the
test for `OL_DEVICE_INFO_VENDOR` on CUDA.

Add a value to the CUDA plugin. We can quite safely hard code this one.
2025-07-08 10:04:48 +01:00
Giorgi Gvalia
5110ac4113
[Offload] Allow CUDA Kernels to use arbitrarily large shared memory (#145963)
Previously, the user was not able to use more than 48 KB of shared
memory on NVIDIA GPUs. In order to do so, setting the function attribute
`CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES` is required, which was not
present in the code base. With this commit, we add the ability to set
this attribute, allowing the user to utilize the full power of their
GPU.

In order to not have to reset the function attribute for each launch of
the same kernel, we keep track of the maximum memory limit (as the
variable `MaxDynCGroupMemLimit`) and only set the attribute if our
desired amount exceeds the limit. By default, this limit is set to 48
KB.

Feedback is greatly appreciated, especially around setting the new
variable as mutable. I did this because the `launchImpl` method is const
and I am not able to modify my variable otherwise.

---------

Co-authored-by: Giorgi Gvalia <ggvalia@login33.chn.perlmutter.nersc.gov>
Co-authored-by: Giorgi Gvalia <ggvalia@login07.chn.perlmutter.nersc.gov>
2025-07-07 15:26:16 -04:00
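The caching scheme described in #145963 can be illustrated with a minimal sketch. Only the name `MaxDynCGroupMemLimit` and the 48 KB default come from the commit message; `SketchKernel`, `launch`, and the update counter are hypothetical, and the driver call is replaced by a counter so the example runs without CUDA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a CUDA kernel wrapper; not the plugin's real class.
class SketchKernel {
public:
  // Launching is logically const, yet it may need to raise the cached limit.
  // Returns true if the (simulated) function attribute had to be updated.
  bool launch(std::uint32_t DynSharedBytes) const {
    if (DynSharedBytes <= MaxDynCGroupMemLimit)
      return false; // Fits within the current limit: nothing to set.
    // A real plugin would call into the driver here (for example via
    // cuFuncSetAttribute); this sketch only counts how often that happens.
    ++AttributeUpdates;
    MaxDynCGroupMemLimit = DynSharedBytes;
    return true;
  }

  std::uint32_t limit() const { return MaxDynCGroupMemLimit; }
  unsigned attributeUpdates() const { return AttributeUpdates; }

private:
  // `mutable` lets the const launch path update the cached state.
  mutable std::uint32_t MaxDynCGroupMemLimit = 48u * 1024u; // 48 KB default
  mutable unsigned AttributeUpdates = 0;
};
```

Repeated launches that stay at or below the cached limit never touch the attribute, which is the saving the commit message describes. One trade-off of `mutable` state is that a const method is no longer automatically safe to call from several threads at once.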
Ross Brunton
8ae8d31832
[Offload] Add liboffload unit tests for shared/local memory (#147040)
2025-07-07 16:20:02 +01:00
Ross Brunton
6b19cdcefa
[Offload][amdgpu] Map INVALID_CODE_OBJECT to INVALID_BINARY (#147070)
2025-07-04 16:17:51 +01:00
Callum Fare
3c0571a749
[Offload] Add missing license header to Common.td (#146737)
All other tablegen files in this directory have the license header, but
`Common.td` is missing it.
2025-07-02 17:17:30 +01:00
Ross Brunton
7d52b0983e
[Offload] Add MAX_WORK_GROUP_SIZE device info query (#143718)
This adds a new device info query for the maximum workgroup/block size
for each dimension.
2025-07-02 16:33:54 +01:00
Ross Brunton
4f02965ae2
[Offload] Store kernel name in GenericKernelTy (#142799)
GenericKernelTy has a pointer to the name that was used to create it.
However, the name passed in as an argument may not outlive the kernel.
Instead, GenericKernelTy now contains a std::string, and copies the
name into there.
2025-07-02 14:11:05 +01:00
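The lifetime problem fixed by #142799 can be sketched as follows. `SketchKernel` and `makeKernelFromTemporary` are hypothetical names used for illustration only; the real `GenericKernelTy` has a much larger interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplification of a kernel object; not the real GenericKernelTy.
class SketchKernel {
public:
  // Copy the name so it stays valid after the caller's buffer is gone.
  explicit SketchKernel(const char *KernelName) : Name(KernelName) {}
  const std::string &getName() const { return Name; }

private:
  std::string Name; // Owning copy rather than a borrowed `const char *`.
};

// Builds a kernel from a temporary string that is destroyed on return.
// With a borrowed pointer this would dangle; with an owned copy it is safe.
inline SketchKernel makeKernelFromTemporary() {
  std::string Temp = "vector_add";
  return SketchKernel(Temp.c_str());
}
```

The cost is one string copy per kernel, in exchange for not having to reason about how long the caller keeps its buffer alive.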
Callum Fare
acb52a8a98
[Offload] Improve liboffload documentation (#142403)
- Update the main README to reflect the current project status
- Rework the main API generation documentation. General fixes/tidying,
but also spell out explicitly how to make API changes at the top of the
document since this is what most people will care about.

---------

Co-authored-by: Martin Grant <martingrant@outlook.com>
2025-07-02 13:52:27 +01:00
Kewen12
2b16af8df2
[Offload][cmake] Add GPU test job limit for AMDGPU buildbot cmake cache (#146611)
Added GPU test job limit to make it consistent with current config
https://github.com/llvm/llvm-zorg/blob/main/buildbot/osuosl/master/config/builders.py#L2027C31-L2027C77
2025-07-01 19:18:28 -05:00
Joseph Huber
3cff3d882b
[Offload] Add skeleton for offload conformance tests (#146391)
Summary:
This adds a basic outline for adding 'conformance' tests. These are
tests that are intended to check device code against a standard. In this
case, we will expect this to be filled with math conformance tests to
make sure their results are within the ULP requirements we demand.

Right now this just *assumes* the GPU libc is there, meaning you'll
likely need to do a manual `ninja` before doing `ninja -C
runtimes/runtimes-bins offload.conformance`.
2025-07-01 10:20:40 -05:00
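As a rough illustration of the kind of ULP check that #146391 anticipates, the sketch below measures the distance between two floats in representable steps. It assumes IEEE-754 32-bit `float` and finite inputs; all names are hypothetical, and real conformance suites usually measure error against a higher-precision reference rather than another float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Maps a float to an integer that is monotonic in the float's value,
// so adjacent floats map to adjacent integers.
inline std::int64_t orderedBits(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits)); // Well-defined way to read the bits.
  // Negative floats are sign-magnitude; mirror them onto the negative axis.
  return (Bits & 0x80000000u) ? -static_cast<std::int64_t>(Bits & 0x7fffffffu)
                              : static_cast<std::int64_t>(Bits);
}

// Number of representable floats between A and B (0 if they are equal).
inline std::int64_t ulpDistance(float A, float B) {
  return std::llabs(orderedBits(A) - orderedBits(B));
}

// Hypothetical conformance check: is Actual within MaxUlp steps of Expected?
inline bool withinUlp(float Actual, float Expected, std::int64_t MaxUlp) {
  return ulpDistance(Actual, Expected) <= MaxUlp;
}
```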
Callum Fare
1a253e213d
[NFC][Offload] Fix possible edge cases in offload-tblgen (#146511)
Fix a couple of unhandled edge cases in offload-tblgen that were found
by static analysis
* `LineStart` may wrap around to 0 when processing multi-line strings.
The value is not actually being used in that case, but still better to
explicitly handle it
* Possible unchecked nullptr when processing parameter flags
2025-07-01 14:09:49 +01:00
Ye Luo
536ba87726
[libomptarget] Add a test for OMP_TARGET_OFFLOAD=disabled (#146385)
closes https://github.com/llvm/llvm-project/issues/144786
2025-06-30 13:29:36 -05:00
Ross Brunton
67e73ba605
[Offload] Refactor device/platform info queries (#146345)
This makes several small changes to how the platform and device info
queries are handled:
* ReturnHelper has been replaced with InfoWriter which is more explicit
  in how it is invoked.
* InfoWriter consumes `llvm::Expected` rather than values directly, and
  will early exit if it returns an error.
* As a result of the above, `GetInfoString` now correctly returns errors
  rather than empty strings.
* The host device now has its own dedicated "getInfo" function rather
  than being checked in multiple places.
2025-06-30 15:00:43 +01:00
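The early-exit behaviour described in #146345 can be sketched without LLVM as follows. `SketchExpected`, `queryVendor`, and `writeInfoString` are hypothetical stand-ins; the real `InfoWriter` and `llvm::Expected` have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical stand-in for llvm::Expected<std::string>.
struct SketchExpected {
  std::optional<std::string> Value; // Present on success.
  std::string Error;                // Describes the failure otherwise.
};

// Hypothetical query that may fail, e.g. for an unsupported info key.
inline SketchExpected queryVendor(bool Supported) {
  if (!Supported)
    return {std::nullopt, "vendor name not available"};
  return {std::string("ExampleVendor"), ""};
}

// Writes the queried string into Buffer. Returns an empty string on success
// and the error text otherwise; the buffer is left untouched on failure.
inline std::string writeInfoString(const SketchExpected &Result, char *Buffer,
                                   std::size_t Size) {
  if (!Result.Value)
    return Result.Error; // Early exit: surface the error to the caller.
  if (Result.Value->size() + 1 > Size)
    return "buffer too small";
  std::memcpy(Buffer, Result.Value->c_str(), Result.Value->size() + 1);
  return {};
}
```

The point of the pattern is that a failed query becomes a visible error instead of an empty string that the caller cannot distinguish from a legitimately empty value.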
Ross Brunton
003145d0c8
[Offload] Implement olShutDown (#144055)
`olShutDown` was not properly calling deinit on the platforms, resulting
in random segfaults on AMD devices.

As part of this, `olInit` and `olShutDown` now alloc and free the
offload context rather than it being static. This
allows `olShutDown` to be called within a destructor of a static object
(like the tests do) without having to worry about destructor ordering.
2025-06-30 12:14:00 +01:00
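A minimal sketch of the idea in #144055: holding the context through a raw pointer means it is destroyed only when shutdown is called, never implicitly during static destruction. `SketchContext`, `sketchInit`, and `sketchShutDown` are hypothetical, and the return-value convention is an assumption made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical global state; the real offload context holds much more.
struct SketchContext {
  std::vector<int> Platforms;
};

// A raw pointer has no destructor, so static-destruction order cannot
// tear the context down behind the caller's back.
static SketchContext *Context = nullptr;

// Allocates the context. Returns false if it was already initialized.
bool sketchInit() {
  if (Context)
    return false;
  Context = new SketchContext{{0, 1}};
  return true;
}

// Frees the context. Returns false if there is nothing to shut down.
bool sketchShutDown() {
  if (!Context)
    return false;
  delete Context;
  Context = nullptr;
  return true;
}
```

Because shutdown resets the pointer, a second shutdown is a harmless no-op, and the library can be initialized again afterwards.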
Julian Brown
b62b58d1bb
[OpenMP] Fix crash with duplicate mapping on target directive (#146136)
OpenMP allows duplicate mappings, per OpenMP 6.0, 7.9.6 "map
Clause":

  "Two list items of the map clauses on the same construct must not share
  original storage unless one of the following is true: they are the same
  list item [or other omitted reasons]"

Duplicate mappings can arise as a result of user-defined mapper
processing (which I think is a separate bug, and is not addressed here),
but also in straightforward cases such as:

  #pragma omp target map(tofrom: s.mem[0:10]) map(tofrom: s.mem[0:10])

Both these cases cause crashes at runtime at present, due to an
unfortunate interaction between reference counting behaviour and shadow
pointer handling for blocks. This is what happens:

  1.  The member "s.mem" is copied to the target
  2.  A shadow pointer is created, modifying the pointer on the target
  3.  The member "s.mem" is copied to the target again
  4.  The previous shadow pointer metadata is still present, so the runtime
      doesn't modify the target pointer a second time.

The fix is to disable step 3 if we've already done step 2 for a given
block that has the "is new" flag set.
2025-06-29 22:41:24 +01:00
Ross Brunton
39f19f2f1f
[Offload] Store device info tree in device handle (#145913)
Rather than creating a new device info tree for each call to
`olGetDeviceInfo`, we instead do it on device initialisation. As well
as improving performance, this fixes a few lifetime issues with returned
strings.

This does unfortunately mean that device information is immutable,
but hopefully that shouldn't be a problem for any queries we want to
implement.

This also meant allowing offload initialization to fail, which it can
now do.
2025-06-27 15:10:43 +01:00
Ross Brunton
102cf1b999
[Offload] Make CUDA Driver Version a string (#146049)
AMD treats this value as a string, so for consistency require this in
NVIDIA as well. This shouldn't change the output of the
`llvm-offload-device-info` tool, but does fix an issue in liboffload
when it tries to query the version.
2025-06-27 15:07:04 +01:00
Joseph Huber
df5097dd94
[Offload] Add default for HSA agent type to silence warning (#145943)
Summary:
There's a new one called the AIE (AI Engine). We could handle this, but
since we don't use it currently I'm just making it future-proof. Adding
the AIE check would require checking the HSA version which isn't
worthwhile just yet.
2025-06-26 14:46:08 -05:00
Ross Brunton
3e337bc308
[Offload] Add a stub unloadBinaryImpl for host device (#145716)
2025-06-25 17:06:17 +01:00
Ross Brunton
0870c8838b
[Offload] Add an unloadBinary interface to PluginInterface (#143873)
This allows removal of a specific Image from a Device, rather than
requiring all image data to outlive the device they were created for.

This is required for `ol_program_handle_t`s, which now specify the
lifetime of the buffer used to create the program.
2025-06-25 14:53:18 +01:00
Ross Brunton
4359e55838
[Offload] Properly report errors when jit compiling (#145498)
Previously, if a binary failed to load due to failures when jit
compiling, the function would return success with nullptr. Now it
returns a new plugin error, `COMPILE_FAILURE`.
2025-06-24 16:27:12 +01:00
Ross Brunton
4785832144
[Offload] Fix cmake warning (#145488)
Cmake was unhappy that there was no space between arguments, now it
is.
2025-06-24 13:42:03 +01:00
Ross Brunton
02d2a1646a
[Offload] Fix entry_points.td test (#145292)
This was broken as part of #144494, and just needs an update to the
check lines.
2025-06-23 11:09:08 +01:00
Ross Brunton
613c38a992
[Offload] Fix type mismatch warning in test (#143700)
2025-06-23 10:14:12 +01:00
Joseph Huber
3f1de197b1
[Offload] Rework compiling device code for unit test suites (#144776)
Summary:
I'll probably want to use this as a more generic utility in the future.
This patch reworks it to make it a top level function. I also tried to
decouple this from the OpenMP utilities to make that easier in the
future. Instead, I just use `-march=native` functionality which is the
same thing. Needed a small hack to skip the linker stage for checking if
that works.

This should still create the same output as far as I'm aware.
2025-06-20 10:31:54 -05:00
Ross Brunton
f242360e15
[Offload] Add type information to device info nodes (#144535)
Rather than being "stringly typed", store values as a std::variant that
can hold various types. This means that liboffload doesn't have to do
any string parsing for integer/bool device info keys.
2025-06-20 09:05:05 -05:00
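The move away from "stringly typed" values in #144535 can be sketched with `std::variant`. The node layout, the set of alternatives, and `getInteger` are assumptions for illustration; the plugin's real info node differs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical node type holding a typed value instead of a string.
struct SketchInfoNode {
  std::string Key;
  std::variant<std::monostate, std::uint64_t, bool, std::string> Value;
};

// Reads an integer key directly; no string parsing is involved.
// Returns false if the node does not hold an integer.
inline bool getInteger(const SketchInfoNode &Node, std::uint64_t &Out) {
  if (const auto *V = std::get_if<std::uint64_t>(&Node.Value)) {
    Out = *V;
    return true;
  }
  return false;
}
```

A consumer asks for the type it expects and gets a clear failure on mismatch, rather than parsing text and guessing at the format.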
Ross Brunton
e0633d59b9
[Offload] Check for initialization (#144370)
All entry points (except olInit) now check that offload has been
initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
2025-06-20 09:04:50 -05:00
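The guard introduced by #144370 amounts to a check at the top of every entry point except the initializer. In the sketch below only the notion of an "uninitialized" error comes from the commit; the enum, the flag, and both functions are hypothetical.

```cpp
#include <cassert>

// Hypothetical error codes standing in for the library's real ones.
enum class SketchErr { Success, Uninitialized };

static bool Initialized = false;

// The initializer is the one entry point that does not check the flag.
SketchErr sketchInit() {
  Initialized = true;
  return SketchErr::Success;
}

// Every other entry point starts with the same guard.
SketchErr sketchGetDeviceCount(int &Count) {
  if (!Initialized)
    return SketchErr::Uninitialized;
  Count = 1; // Placeholder result for the sketch.
  return SketchErr::Success;
}
```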
Ross Brunton
53336ad488
[Offload] Move (most) global state to an OffloadContext struct (#144494)
Rather than having a number of static local variables, we now use
a single `OffloadContext` struct to store global state. This is
initialised by `olInit`, but is never deleted (de-initialization of
Offload isn't yet implemented).

The error reporting mechanism has not been moved to the struct, since
that's going to cause issues with teardown (error messages must outlive
liboffload).
2025-06-19 16:02:03 -05:00
Jan Patrick Lehr
dd65e6e060
[Offload][libc] Add cmake cache AMDGPU buildbot (#144500)
An upcoming libc4GPU buildbot will be using this CMake cache file for
its build configuration.
2025-06-17 20:51:40 +02:00
Ross Brunton
e6a3579653
[Offload] Replace device info queue with a tree (#144050)
Previously, device info was returned as a queue with each element having
a "Level" field indicating its nesting level. This replaces this queue
with a more traditional tree-like structure.

This should not result in a change to the output of
`llvm-offload-device-info`.
2025-06-13 09:22:47 -05:00
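To make the change in #144050 concrete, the sketch below rebuilds a tree from a flat, pre-order list in which each element carries a nesting level. The types and function names are hypothetical, and the code assumes each entry is at most one level deeper than the entry before it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Old shape (hypothetical): a flat entry tagged with its nesting level.
struct FlatEntry {
  std::string Key;
  std::size_t Level;
};

// New shape (hypothetical): each node owns its children.
struct TreeNode {
  std::string Key;
  std::vector<TreeNode> Children;
};

// Appends to Parent every consecutive entry at Level, recursing for
// deeper entries so they attach to the node added just before them.
inline void buildChildren(const std::vector<FlatEntry> &Flat, std::size_t &Pos,
                          std::size_t Level, TreeNode &Parent) {
  while (Pos < Flat.size() && Flat[Pos].Level == Level) {
    Parent.Children.push_back({Flat[Pos].Key, {}});
    ++Pos;
    buildChildren(Flat, Pos, Level + 1, Parent.Children.back());
  }
}

// Returns an unnamed root whose children are the level-0 entries.
inline TreeNode buildTree(const std::vector<FlatEntry> &Flat) {
  TreeNode Root{"", {}};
  std::size_t Pos = 0;
  buildChildren(Flat, Pos, 0, Root);
  return Root;
}
```

With an explicit tree, consumers can walk children directly instead of tracking level changes while scanning a queue.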
Ethan Luis McDonough
daee5eee85
[Offload][PGO] Fix new GPU PGO tests (#143645)
`pgo_atomic_teams.c` and `pgo_atomic_threads.c` currently are set to run
on NVPTX despite the changes for that target not being upstreamed yet.
This patch also replaces instances of `llvm-profdata` with `%profdata`
in those tests.
2025-06-12 11:14:21 -05:00
Ross Brunton
4f60321ca1
[Offload] Add ol_dimensions_t and convert ranges from size_t -> uint32_t (#143901)
This is a three-element x, y, z `uint32_t` vector that can be used any place
where a 3D vector is required. This ensures that all vectors across
liboffload are the same and don't require any resizing/reordering
dances.
2025-06-12 09:59:59 -05:00
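A shared three-component type like the one added in #143901 might look like the sketch below. The struct name, field names, and helper are assumptions; the `uint32_t` component type follows the commit title.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of a three-component dimension type.
struct SketchDimensions {
  std::uint32_t x, y, z;
};

// Total number of work items described by the dimensions. The product is
// computed in 64 bits, which is wide enough for typical launch sizes.
inline std::uint64_t totalItems(const SketchDimensions &D) {
  return std::uint64_t{D.x} * D.y * D.z;
}
```

Using one type everywhere means a grid size and a group size can be passed around without converting between differently shaped or differently sized vectors.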