398 Commits

Author SHA1 Message Date
Akash Banerjee
6aafe6582d Fix test added in 1fd1d634630754cc9b9c4b5526961d5856f64ff9 2025-08-18 13:29:23 +01:00
Abhinav Gaba
12769aa728
[Offload] Introduce ATTACH map-type support for pointer attachment. (#149036)
This patch introduces libomptarget support for the ATTACH map-type,
which can be used to implement OpenMP conditional compliant pointer
attachment, based on whether the pointer/pointee is newly mapped on a
given construct.

For example, for the following:

```c
  int *p;
  #pragma omp target enter data map(p[1:10])
```

The following maps can be emitted by clang:
```
  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH
```

Without this map-type, these two possible maps could be emitted by
clang:
```
  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, which are both incorrect.

In terms of implementation, maps with the ATTACH map-type are handled
after all other maps have been processed, as it requires knowledge of
which new allocations happened as part of the construct. As per OpenMP
5.0, an attachment should happen only when either the pointer or the
pointee was newly mapped while handling the construct.

Maps with ATTACH map-type-bit do not increase/decrease the ref-count.

With OpenMP 6.1, `attach(always/never)` can be used to force/prevent
attachment. For `attach(always)`, the compiler will insert the ALWAYS
map-type, which would let libomptarget bypass the check about one of the
pointer/pointee being new. With `attach(never)`, the ATTACH map will not
be emitted at all.

The size argument of the ATTACH map-type can specify values greater than
`sizeof(void*)` which can be used to support pointer attachment on
Fortran descriptors. Note that this also requires shadow-pointer
tracking to also support them. That has not been implemented in this
patch.

This was worked upon in coordination with Ravi Narayanaswamy, who has
since retired. Happy retirement, Ravi!

---------

Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-17 15:17:04 -07:00
Abhinav Gaba
79cf877627
[Offload] Introduce dataFence plugin interface. (#153793)
The purpose of this fence is to ensure that any `dataSubmit`s inserted
into a queue before a `dataFence` finish before finish before any
`dataSubmit`s
inserted after it begin.

This is a no-op for most queues, since they are in-order, and by design
any operations inserted into them occur in order.

But the interface is supposed to be functional for out-of-order queues.

The addition of the interface means that any operations that rely on
such ordering (like ATTACH map-type support in #149036) can invoke it,
without worrying about whether the underlying queue is in-order or
out-of-order.

Once a plugin supports out-of-order queues, the plugin can implement
this function, without requiring any change at the libomptarget level.

---------

Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-15 11:49:35 -07:00
Akash Banerjee
1fd1d63463 [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-15 15:41:41 +01:00
Ross Brunton
30c7951136
[Offload] olLaunchHostFunction (#152482)
Add an `olLaunchHostFunction` method that allows enqueueing host work
to the stream.
2025-08-15 09:39:48 +01:00
Abhinav Gaba
2912c9c249
[NFC][Offload] Add missing maps to OpenMP offloading tests. (#153103)
A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an
`int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be
mapped, which is incorrect.

This change fixes six such tests.
2025-08-14 12:22:28 -07:00
Ross Brunton
3e9f29cfee
[Offload] Store globals in the program's global list rather than the kernel list (#153441) 2025-08-13 17:18:25 +01:00
Callum Fare
aa6f591b63
[Offload] Implement hasPendingWork on CUDA (#152728)
Following on from #152304, implement the new query in the CUDA plugin
2025-08-13 16:35:23 +01:00
Akash Banerjee
1c7720ef78 Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)"
This reverts commit 4e6d510eb3ec5b5e5ea234756ea1f0b283feee4a.
2025-08-12 20:19:45 +01:00
Akash Banerjee
4e6d510eb3
[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-12 15:18:15 +01:00
Amit Tiwari
2074e1320f
[Clang][OpenMP] Non-contiguous strided update (#144635)
This patch handles the strided update in the `#pragma omp target update
from(data[a🅱️c])` directive where 'c' represents the strided access
leading to non-contiguous update in the `data` array when the offloaded
execution returns the control back to host from device using the `from`
clause.

Issue: Clang CodeGen where info is generated for the particular
`MapType` (to, from, etc), it was failing to detect the strided access.
Because of this, the `MapType` bits were incorrect when passed to
runtime. This led to incorrect execution (contiguous) in the
libomptarget runtime code.

Added a minimal testcase that verifies the working of the patch.
2025-08-12 19:32:15 +05:30
Akash Banerjee
0998da27e9 Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989)"
This reverts commit 5a5e8ba0c388d57aecb359ed67919cda429fc7b1.
2025-08-11 13:52:39 +01:00
Akash Banerjee
5a5e8ba0c3
[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989)
Add a new `AutomapToTargetData` pass. This gathers the declare target
enter variables which have the `AUTOMAP` modifier. And adds
`omp.declare_target_enter/exit` mapping directives for `fir.allocmem`
and `fir.freemem` oeprations on the `AUTOMAP` enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-11 13:18:38 +01:00
Kevin Sala Penades
5751e96f9a
[Offload][NFC] Re-enable clang-format for omptarget.h (#152937) 2025-08-10 15:46:37 -07:00
Kevin Sala Penades
7de50beb52
[Offload] Fix return error with a condition (#152876)
Adds a conditional to the error return so that it only returns if there was an error.
2025-08-10 12:03:09 -07:00
Ross Brunton
910d7e90bf
[Offload] Make olLaunchKernel test thread safe (#149497)
This sprinkles a few mutexes around the plugin interface so that the
olLaunchKernel CTS test now passes when ran on multiple threads.

Part of this also involved changing the interface for device synchronise
so that it can optionally not free the underlying queue (which
introduced a race condition in liboffload).
2025-08-08 10:57:04 +01:00
Ross Brunton
197d1c1570
[Offload] OL_QUEUE_INFO_EMPTY (#152473)
Add a queue query that (if possible) reports whether the queue is empty
2025-08-08 10:20:45 +01:00
Leandro Lacerda
27ed1f99e2
[Offload][Conformance] Add support for CUDA Math and HIP Math providers (#152362)
This patch extends the conformance testing infrastructure to support two
new providers of math function implementations for GPUs: CUDA Math
(`cuda-math`) and HIP Math (`hip-math`).
2025-08-07 18:32:34 -05:00
Ross Brunton
900d20d0dc
[NFC][Offload] Move conformance test warning outside of function (#152466)
`add_conformance_test` checks for libc and prints a warning if it is not
found. However, this warning ends up being printed once for each test,
spamming the cmake log. Moving it up to the folder cmake allows it to
be reported only once.
2025-08-07 08:14:26 -05:00
Ross Brunton
a44532544b
[Offload] Don't create events for empty queues (#152304)
Add a device function to check if a device queue is empty. If liboffload
tries to create an event for an empty queue, we create an "empty" event
that is already complete.

This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run
quickly for empty queues.
2025-08-07 10:16:33 +01:00
hidekisaito
83e5a99ff6
[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882)
Enables AMD data center class GPUs to use memory manager memory pooling
up to 3GB allocation by default, up from the "1 << 13" threshold that
all plugin-nextgen devices use.
2025-08-06 14:41:20 -07:00
Alex Duran
f092b820d1
[OFFLOAD] Fix typo in assert (#152316)
Fixes an issue introduced by PR https://github.com/llvm/llvm-project/pull/143491.
2025-08-06 17:01:47 +02:00
Alex Duran
66d1c37eb6
[OFFLOAD][OPENMP] 6.0 compatible interop interface (#143491)
The following patch introduces a new interop interface implementation
with the following characteristics:

* It supports the new 6.0 prefer_type specification
* It supports both explicit objects (from interop constructs) and
implicit objects (from variant calls).
* Implements a per-thread reuse mechanism for implicit objects to reduce
overheads.
* It provides a plugin interface that allows selecting the supported
interop types, and managing all the backend related interop operations
(init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on
OpenMP synchronizations.
* It cleanups some vendor/fr_id mismatchs from the current query
routines.
* It supports extension to define interop callbacks for library cleanup.
2025-08-06 16:34:39 +02:00
Ross Brunton
ca13c44bbc
[NFC][Offload] Clarify olDestroyQueue (#152132)
This has no code changes.
2025-08-06 15:34:31 +01:00
Joseph Huber
3bc1b15235 [OpenMP] Fix weak linkage on malloc declaration
Summary:
This being weak forces the external reference to be weak. Either we
define it weak or not by pulling it from `libc`. Doing it here causes it
to not be extracted properly.
2025-08-05 17:38:59 -05:00
Leandro Lacerda
cd0268063d
[Offload][Conformance] Add tests for single-precision math functions (#152013)
This patch adds a new set of conformance tests for single-precision math
functions provided by the LLVM libm for GPUs.

The functions included in this set were selected based on the following
criteria:
- An implementation exists in `libc/src/math/generic` (i.e., it is not
just a wrapper around a compiler built-in).
- The corresponding LLVM CPU libm implementation is correctly rounded.
- The function is listed in Table 65 of the OpenCL C Specification
v3.0.19.
2025-08-05 13:33:20 -05:00
Ross Brunton
d03692a00e
[Offload] Rework MAX_WORK_GROUP_SIZE (#151926)
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work
groups the device can allocate, rather than the maximum per dimension.
`MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old
behaviour.
2025-08-04 15:21:24 +01:00
Leandro Lacerda
f1eb869bae
[Offload][UnitTests] Build device code as C++ (#151714)
This commit refactors the `add_offload_test_device_code` CMake function
to compile device code using the C++ compiler (`CMAKE_CXX_COMPILER`)
instead of the C compiler.

This change enables the use of C++ features, such as templates, within
device-side test kernels. This will allow for more advanced and reusable
kernel wrappers, reducing boilerplate code in the conformance test
suite.

As part of this change:
- All `.c` files for device code in `unittests/` have been renamed to
`.cpp`.
- Kernel definitions are now wrapped in `extern "C"` to ensure C linkage
and prevent name mangling.

This change affects the `OffloadAPI` and `Conformance` test suites.

cc @callumfare @RossBrunton @jhuber6
2025-08-04 07:00:51 -05:00
Leandro Lacerda
acdc01b039
[Offload][UnitTests] Fix incorrect CUDA path variable in CMake helper (#151820)
This PR fixes a minor bug in the `add_offload_test_device_code` CMake
helper function in `offload/unittests/CMakeLists.txt`.

The function was discovering the local CUDA Toolkit path and storing it
in the `cuda_path` variable but was then using the incorrect `CUDA_ROOT`
variable in the `add_custom_command` call for the NVPTX target.

This change corrects the command to use the intended `cuda_path`
variable.
2025-08-02 09:09:30 -05:00
Joseph Huber
8934a6e13b
[OpenMP] Use the libc malloc for AMDGPU if available (#151241)
Summary:
This patch enables the OpenMP runtime to use the general-purpose
`malloc` interface in `libc` if the user built OpenMP with it enabled.
All this requires is keeping `malloc` as an external function so it will
be resolved later by the linker.
2025-08-01 20:41:06 -05:00
Joachim
a86ad73064
[offload] Add missing build dependency (#149326)
libc++ headers must be generated before compiling part of liboffload. 
The build error occurs if clang is configured to use libc++ by default. 
Fixes issue #149324
2025-07-30 09:44:46 +02:00
Leandro Lacerda
2abd58cb7e
[Offload] Add framework for math conformance tests (#149242)
This PR introduces the initial version of a C++ framework for the
conformance testing of GPU math library functions, building upon the
skeleton provided in #146391.

The main goal of this framework is to systematically measure the
accuracy of math functions in the GPU libc, verifying correctness or at
least conformance to standards like OpenCL via exhaustive or random
accuracy tests.
2025-07-29 11:08:27 -05:00
Aiden Grossman
2e3fd547de [Offload] Fix typo in shared_lib_fp_mapping.c
Made a typo in 963259ef6be4871e5252ff3ac9df737af5d2b4cb because I cannot
run tests and also did not review it. This should fix it...
2025-07-25 23:17:46 +00:00
Aiden Grossman
963259ef6b
[Offload] Remove uses of %T from lit tests (#150721)
This patch removes all the instances of %T from offload/ (only one test
contained this construction). %T has been deprecated for ~7 years and is
not reccomended as it does not use a unique directory per test. Switch
to using %t to ensure we use a unique dir per test and so that we can
eventually remove %T.

I did not actually test this. A couple feeble attempts at
building/running the offload tests just leaves me with a ton of test
failures. Given how small this is I'm reasonably sure it works though.
2025-07-25 16:16:22 -07:00
Ross Brunton
ae44418f28
[Offload] Erase entries from JIT cache when program is destroyed (#148847)
When `unloadBinary` is called, any entries in the JITEngine's cache
for that binary will be cleared. This fixes a nasty issue with
liboffload program handles. If two handles happen to have had the same
address (after one was free'd, for example), the cache would be hit and
return the wrong program.
2025-07-25 16:11:30 +01:00
agozillon
73272d6fc6
[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization (#150311)
Currently, we return early whenever we've already generated an
allocation for intermediate descriptor variables (required in certain
cases when we can't directly access the base address of a passes in
descriptor function argument due to HLFIR/FIR restrictions). This
unfortunately, skips over the presence check and load/store required to
set the intermediate descriptor allocations values/data. This is fine in
most cases, but if a function happens to have a series of branches with
seperate target regions capturing the same input argument, we'd emit the
present/load/store into the first branch with the first target inside of
it, the secondary (or any preceding) branches would not have the
present/load/store, this would lead to the subsequent mapped values in
that branch being empty and then leading to a memory access violation on
device.

The fix for the moment is to emit a present/load/store at the relevant
location of every target utilising the input argument, this likely will
also lead to fixing possible issues with the input argument being
manipulated inbetween target regions (primarily resizing, the data
should remain the same as we're just copying an address around, in
theory at least). There's possible optimizations/simplifications to emit
less load/stores such as by raising the load/store out of the branches
when we can, but I'm inclined to leave this sort of optimization to
lower level passes such as an LLVM pass (which very possibly already
covers it).
2025-07-25 16:15:54 +02:00
Ross Brunton
adb2421202
[Offload] Refactor device information queries to use new tagging (#147318)
Instead using strings to look up device information (which is brittle
and slow), use the new tags that the plugins specify when building the
nodes.
2025-07-25 14:51:51 +01:00
Callum Fare
78faf99c4f
[Offload] Fix olWaitEvents tests after change to events API (#150465)
Fix the olWaitEvents tests after #150217 broke them
2025-07-24 18:35:47 +01:00
Ross Brunton
690c3ee5be
[Offload] Replace "EventOut" parameters with olCreateEvent (#150217)
Rather than having every "enqueue"-type function have an output pointer
specifically for an output event, just provide an `olCreateEvent`
entrypoint which pushes an event to the queue.

For example, replace:
```cpp
olMemcpy(Queue, ..., EventOut);
```
with
```cpp
olMemcpy(Queue, ...);
olCreateEvent(Queue, EventOut);
```
2025-07-24 14:31:06 +01:00
hidekisaito
75e60e745b
[AMDGPU][Offload][LIT] Run unified_shared_memory tests on gfx950 (#150372)
Enables 9 more tests
2025-07-23 22:46:26 -07:00
Ross Brunton
081b74caf5
[Offload] Add olWaitEvents (#150036)
This function causes a queue to wait until all the provided events have
completed before running any future scheduled work.
2025-07-23 14:12:16 +01:00
Ross Brunton
2726b7fb1c
[Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023)
This more closely matches the nomenclature used by CUDA, AMDGPU and
the plugin interface.
2025-07-23 10:52:13 +01:00
Joseph Huber
b53be5f4b2
[LLVM] Update CUDA ELF flags for their new ABI (#149534)
Summary:
We rely on these flags to do things in the runtime and print the
contents of binaries correctly. CUDA updated their ABI encoding recently
and we didn't handle that. it's a new ABI entirely so we just select on
it when it shows up.

Fixes: https://github.com/llvm/llvm-project/issues/148703
2025-07-21 14:38:03 -05:00
Ross Brunton
e87d3904f6
[Offload] Verify SyncCycle for events in AMDGPU (#149524)
This check ensures that events after a synchronise (and thus after the
queue is reset) are always considered complete. A test has been added
as well.
2025-07-21 09:37:29 +01:00
Ross Brunton
311847be4c
[Offload] Allow "tagging" device info entries with offload keys (#147317)
When generating the device info tree, nodes can be marked with an
offload Device Info value. The nodes can also look up children based
on this value.
2025-07-18 14:27:34 +01:00
Ross Brunton
df9a864b04
[Offload] Implement event sync in amdgpu (#149300) 2025-07-18 09:56:17 +01:00
Ross Brunton
55b417a75f
[Offload] Cache symbols in program (#148209)
When creating a new symbol, check that it already exists. If it does,
return that pointer rather than building a new symbol structure.
2025-07-16 18:32:47 +01:00
Callum Fare
47c9609a86
[Offload] Check plugins aren't already deinitialized when tearing down (#148642)
This is a hotfix for #148615 - it fixes the issue for me locally.

I think a broader issue is that in the test environment we're calling
olShutDown from a global destructor in the test binaries. We should do
something more controlled, either calling olInit/olShutDown in every
test, or move those to a GTest global environment. I didn't do that
originally because it looked like it needed changes to LLVM's GTest
wrapper.
2025-07-14 16:17:10 +01:00
Kenneth Benzie (Benie)
508f9a0274
[Offload] Skip event tests on AMDGPU (#148632)
Add `OffloadDeviceTest::getPlatformBackend()` and use it to skip event
tests which currently fail on AMDGPU due to:

```
OL_ERRC_UNIMPLEMENTED: synchronize event not implemented
```
2025-07-14 09:19:53 -05:00
Ross Brunton
a71187e976
[Offload] Return error rather than dropping it (#148609) 2025-07-14 14:05:58 +01:00