664 Commits

Author SHA1 Message Date
Alex Duran
e40062c0bd
[OFFLOAD][L0] Add support to run ctor/dtor code (#187510)
This PR adds support in the Level Zero plugin to execute
constructors/destructors on the device code. As spirv-link has some
limitations, it mimics the CUDA plugin behavior where the RTL constructs
the device side tables before invoking the kernel that will execute
them.

The kernel and other necessary symbols to create the device tables are
created by the SPIRVCtorDtorLowering pass to be added in #187509
2026-03-25 08:43:44 +01:00
Alex Duran
227bab0a62
[OFFLOAD][L0] Improve cleanup on errors (#188251)
Additional cleanup improvements on error conditions (in addition to
those in #187597):

  * Fixed incomplete cleanup in L0Context::init()
  * Fixed build log leak in addModule()
  * Fixed context inconsistent state in findDevices()

Disclaimer: The base of this PR was generated by Claude and adjusted by
me afterwards.
2026-03-24 15:36:01 +01:00
Joseph Huber
376874a345 [Offload] Fix destroying signal that was never initialized
Summary:
We create the RPC doorbell signal lazily and destroy it at the plugin
level. This means that we can't rely on the normal 'per-device' handling
so this needs to be called unconditionally. We only create the signal if
a device is registered, but deinit is called unconditionally. Just check
the handle.
2026-03-24 09:29:27 -05:00
Joseph Huber
4961700c10
[libc] Support AMDGPU device interrupts for the RPC interface (#188067)
Summary:
One of the main disadvantages to using the RPC interface is that it
requires a server thread to spin on the mailboxes checking for work.
The vast majority of the time, there will be no work and work will come
in large bursts.

The HSA / KFD interface supports device-side interrupts and already has
handling for binding these events to an HSA signal. This means that we
can send interrupts from the GPU to wake a sleeping thread on the CPU.
The sleeping thread will be descheduled with a blocking HSA wait call
and woken up when its event ID is raised through the kernel driver's
interrupt.

This is very target-specific handling, but I believe it is valuable
enough to warrant it being in the protocol. It is completely optional,
as it is ignored if uninitialized. This should bring this support at
parity with the interface HIP expects.
2026-03-24 08:48:52 -05:00
Joseph Huber
07896d44a3
[OpenMP] Emit aggregate kernel prototypes and remove libffi dependency (#186261)
Summary:
This PR changes the handling of the emitted kernels when targeting a CPU
to be a pointer struct.

The old handling emitted a standard function prototype, this
necessitated a target specific ABI to call it because the signature
differed with the number of arguments. Instead, this PR emits a void
pointer to a naturally aligned struct, this is what APIs like `pthreads`
assert.

This allows us to remove all the complexity around launching host
kernels and just pass the argument list.
2026-03-20 13:08:23 -05:00
Robert Imschweiler
bc6a265e3b
[offload] Use flang-rt for test feature requirements (#187733) 2026-03-20 18:59:06 +01:00
Robert Imschweiler
c3e7b4556e
[offload] Define flang-rt as an available test feature (#187732)
Can now be used as `REQUIRES: flang-rt`, for example.
2026-03-20 17:47:51 +01:00
Bruce Changlong Xu
cbab7e65a7
[AMDGPU] Minor cleanups in offload plugin and AMDGPUEmitPrintf. NFC. (#187587)
Use empty() in assert, brace-init instead of std::make_pair in the
AMDGPU offload plugin, and fix a comment typo in AMDGPUEmitPrintf.
2026-03-19 18:16:47 -04:00
Joseph Huber
d18a784d41
[compiler-rt] Define GPU specific handling of profiling functions (#185763)
Summary:
The changes in https://www.github.com/llvm/llvm-project/pull/185552
allowed us to
start building the standard `libclang_rt.profile.a` for GPU targets.
This PR expands this by adding an optimized GPU routine for counter
increment and removing the special-case handling of these functions in
the OpenMP runtime.

Vast majority of these functions are boilerplate, but we should be able
to do more interesting things with this in the future, like value or
memory profiling.
2026-03-19 10:51:48 -05:00
estewart08
0e7262407c
[offload] - Remove standalone build in favor of 'runtimes' (#170693)
Summary:
Follow up on removal of OPENMP_STANDALONE_BUILD in openmp (#149878).
This
build method is redundant and can be accomplished via runtimes.

Removes support for:
`cmake -S <llvm-project>/offload ...`

 Switches over to:
`make -S <llvm-project>/runtimes -DLLVM_ENABLE_RUNTIMES=openmp;offload
...`

Libomptarget has a dependency on libomp.so and requires the omp cmake
target to exist at build time, which is why both runtimes are listed.

Updates cmake compiler logic in offload/CMakeLists.txt to mirror openmp
changes:
 [openmp] Allow testing OpenMP without a full clang build tree (#182470)

User will still need to have a separate invocation to build openmp
DeviceRTL via:
`-DLLVM_ENABLE_RUNTIMES=openmp`
`-DLLVM_DEFAULT_TARGET_TRIPLE=<amdgcn-amd-amdhsa|nvptx64-nvidia-cuda>`
2026-03-19 09:00:40 -05:00
Jan Patrick Lehr
7a2193cd19
[Offload] Add CMake alias for CI (#186099)
In the pre-merge CI we need a top-level visible target that can be used
to build offload, i.e., libomptarget and LLVMOffload.

The related PR to include offload into pre-merge CI is here:
https://github.com/llvm/llvm-project/pull/174955
2026-03-18 15:46:08 +01:00
fineg74
2890f9883c
[OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927)
This change improves handling of errors during synchronization in Level
Zero plugin by ensuring cleanup of queues and events in case of an
synchronization error. As a result multiple tests stopped hanging.

---------

Co-authored-by: Duran, Alex <alejandro.duran@intel.com>
2026-03-18 05:50:06 +01:00
Jan Patrick Lehr
964091a2db
[OpenMP][AMDGPU] Enable omptest build (#161649)
This enables building the omptest library across the AMD buildbots that
rely on this CMake cache.
2026-03-16 15:25:12 +00:00
Joseph Huber
154a128c65 Reapply "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309)
Should be working downstream now
This reverts commit 9b61ff210fdff752d5db55b128474e9990258488.
2026-03-13 15:48:37 -05:00
Piotr Balcer
1b9a4a0f72
[Offload][L0] clear completed events from a wait list (#186379)
Queue's WaitEvent collection wasn't being cleared after synchronization
and resetting of the events. This led to hangs on subsequent host
synchronizations if not preceeded by any other operation.
2026-03-13 13:56:27 +00:00
theRonShark
9b61ff210f
Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309)
Reverts llvm/llvm-project#185989
2026-03-13 05:20:40 +00:00
Kevin Sala Penades
ac71b185c2
[offload] Remove LIBOMPTARGET_SHARED_MEMORY_SIZE envar (#186231)
This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and
outputs a runtime warning if it is defined. Access to dynamic shared memory
should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or
the launch arguments in liboffload kernel launch.
2026-03-12 21:21:29 -07:00
Joseph Huber
4376fbd793
[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989)
Summary:
We use this `dyn_ptr` argument in Clang/OpenMP to handle the
`KernelLaunchEnvironment`. This is a per-kernel argument used to share
some information. Currenetly, it's prepended to the argument list and we
generate storage for it in the runtime.

This is bad for a few reasons:
1. It changes the ABI by shifting user arguments
2. It cannot be trivially be left uninitialized if unused
3. The runtime must allocate its own memory for it

This PR changes it to be appended instead. Additionally, space for this
is always emitted. This means the OMPIRBuilder itself will provide the
storage, we simply need to populate it in the runtime if it is used.
This means that if it's unused we don't always pay the cost and it's
easier for non-OpenMP users to ignore it.

Backward compatibility is maintained by auto-upgrading the kernel
arguments. In `libomptarget` we completely allocate a new buffer to
store this in the new format. The plugins still need to respect the old
ABI of the called device object, so we simply rotate it if it's the old
version.
2026-03-12 18:08:22 -05:00
Nick Sarnie
1beec14434
[offload][lit] XFAIL new tests failing on intelgpu (#185908)
New tests from https://github.com/llvm/llvm-project/pull/176708 and
https://github.com/llvm/llvm-project/pull/181987 fail on `intelgpu`, I
updated the [GH
issue](https://github.com/llvm/llvm-project/issues/182897).

Example fails
[here](https://lab.llvm.org/buildbot/#/builders/225/builds/3441).

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-03-12 14:37:56 +00:00
Jan Patrick Lehr
ee797883e8
[Offload] Escape \; in command string (#186120)
This adds a \ in front of the ; between the two cache files to stop the
run function to interpret it as a shell statement separator (or so).
2026-03-12 15:02:40 +01:00
Jan Patrick Lehr
bb72ec480f
[Offload] AMD Flang bot to use CMake cache file (#186070)
Converting the current bot config to use the CMake cache file that we
use in other bots (offload/cmake/caches/AMDGPUBot.cmake). This PR
removes all CMake settings that the cache file already sets and only
leaves those that were either not set explicitly or which differ.

Thus, first load the cache file and then adjust the settings to override
existing values.
2026-03-12 14:10:04 +01:00
Kevin Sala Penades
1f583c6dee
[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831)
Part 3 adding offload runtime support. See
https://github.com/llvm/llvm-project/pull/152651.

---------

Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
2026-03-12 01:13:06 -07:00
Łukasz Plewa
0e122bea82
[OFFLOAD] Enable Level Zero unittests (#185492) 2026-03-11 14:09:59 +00:00
Amit Tiwari
a15dcd4117
[Clang][OpenMP] Handled NonContig Descriptor DimCount (#181987)
### Issue: Dimension override missing
When variable count expressions were used with stride, the constant
subsection path computed size first. This marked `ArgSizes` with byte
size semantics. Variable expression logic later triggered, but reused
`ArgSizes` assuming "bytes" semantics

`OMPIRBuilder.cpp` didn't handle dimension count for
`OMP_MAP_NON_CONTIG` flag

**Result**: `ArgSizes` wasn't overwritten with dimension count, breaking
non-contiguous mapping.

**Fixes**:

`llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp` - Expression semantics for
non-contiguous.
 stride/count.
Generate 3D descriptor structures with runtime dimensions.
Fix dimension override to use dimension count instead of byte size.

Added testcases to cover stack arrays, heap pointers, struct members,
etc.
2026-03-11 19:39:55 +05:30
Alex Duran
789fea83bb
[offload][l0][nfc] remove duplicated entry (#185855)
Remove left over function by mistake from #185404
2026-03-11 11:55:30 +01:00
Alex Duran
3ff332ad0f
[Offload][L0] Add support for OffloadBinary format in L0 plugin (#185404)
- Accept OffloadBinaries as valid images by plugins that support them in
the PluginInterface.
- Add support in L0 plugin to extract SPIRV images and their associated
metadata from an OffloadBinary image.

Depends on:
- #185663

Follow-up PRs:
- #185413 (Changes SPIRV wrapper generation to use OffloadBinary)
- #185425 (Adjusts llvm-objdump)
- #184774 (Adjusts llvm-offload-binary)
2026-03-11 11:42:36 +01:00
Joseph Huber
fd069a46bf
[copmiler-rt] Initial support for building profile library on the GPU (#185552)
Summary:
As suggested in https://github.com/llvm/llvm-project/pull/177665, we
should build a GPU version of the compiler-rt profile library instead of
writing it in-line in the lowering. This PR does not define anything GPU
specific, it simply re-uses the baremetal handling. Later PRs will
prevent the GPU specific handling we would want to do to optimize
counter handling on the GPU.

Note that this will require using the cache file, or setting these
options
manually for existing users. Hopefully if people are using the cache
file
as they should it won't break anything.
2026-03-10 13:45:18 -05:00
Alex Duran
be021b8433
[OFFLOAD] Add interface to extend image validation (#185663)
As discussed in #185404 we might want to provide a way for plugins to
validate images not recognized by the common layer.

This PR adds such extension and uses it to validate pure SPIRV images by
the Level Zero plugin.
2026-03-10 18:41:23 +01:00
Amit Tiwari
14de1bb711
[Clang][OpenMP] Support expression semantics in target update fields with non-contiguous array sections (#176708)
### Issue: Variable stride not recognized as non-contiguous
`CGOpenMPRuntime.cpp` failed to detect `DeclRefExpr`, `MemberExpr`,
`ArraySubscriptExpr` as non-contiguous.

**Fixes**:

`clang/lib/CodeGen/CGOpenMPRuntime.cpp` - Variable stride detection +
dimension count logic
Detect variable stride expressions
(`DeclRefExpr/MemberExpr/ArraySubscriptExpr`) as non-contiguous

Added testcases to cover stack arrays, heap pointers, struct members,
etc., for expression semantics in non-contiguous update.
2026-03-10 19:26:17 +05:30
Joseph Huber
a9e457a82f
[Offload][AMDGPU] Fix RPC server on mixed w32 w64 workloads (#185496)
Summary:
This was a regression from the original LLVM-gpu-loader. We used to
handle `-mwavefrontsize64` correctly in the loader by over-allocating
memory and just leaving the upper 32-bits masked off. In order to handle
this in offload we need to scan loaded kernels to see how much memory we
need to allocate. This should be safe, the protocol is designed to
handle an arbitrary size and worst-case this just wastes space.
2026-03-09 17:13:59 -05:00
Kareem Ergawy
0bf9bb5c42
[Flang][OpenMP] Fix close map flag propagation for derived types in USM (#185330)
This fixes a bug in USM mode where the `close` map type modifer was
attached to some `map.info.op`'s corresponding to user-defined type
members while the parent type instance itself is not marked as `close`.

This fix ensures that if a parent record type map does not have the
'close' flag, it is cleared from its members as well, maintaining
consistency.

Gemini was used to create tests. AI generated test code was reviewed
line-by-line by me. Which were derived from a reproducer I was working
with to debug the issue.

Assisted-by: Gemini <gemini@google.com>
2026-03-09 15:55:53 +01:00
Łukasz Plewa
57614e8810
[OFFLOAD] Replace C-style casts with C++ style casts in obtainInfoImpl (#185023)
Replace C-style bool casts (bool)TmpInt with C++ functional casts
bool(TmpInt)
2026-03-06 10:28:38 -06:00
Jason Van Beusekom
2d4c8e0d0f
[OpenMP][clang] Indirect and Virtual function call mapping from host to device (#184412)
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
2026-03-03 13:20:24 -06:00
Jason Van Beusekom
f95662d159
Revert "[OpenMP][clang] Indirect and Virtual function call mapping from host to device" (#184378)
Reverts llvm/llvm-project#159857
2026-03-03 17:11:14 +00:00
Jason Van Beusekom
b23438661c
[OpenMP][clang] Indirect and Virtual function call mapping from host to device (#159857)
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
2026-03-03 02:52:34 +00:00
Abhinav Gaba
0ced81f7ea
[NFC][OpenMP] Remove redundant prints in target regions from tests added in #184260. (#184266)
Some buildbots don't like them, and the correctness of the values in the
`target` region is ensured via prints after the region.
2026-03-03 00:02:28 +00:00
Abhinav Gaba
1d1c83ad73
Reland "[OpenMP][Offload] Handle present/to/from when a different entry did alloc/delete." (#184260)
Some tests that were checking for prints inside/outside `target` regions
needed to be updated to work on systems where the ordering wasn't
deterministic.

Reverts llvm/llvm-project#184240
    
Original description from #165494:

-----

OpenMP allows cases like the following:

```c
  int *p1, *p2, x;
  p1 = p2 = &x;
  ...
  #pragma omp target_exit_data map(delete: p1[:]) from(p2[0])
```

Which means, when the runtime encounters the `from` entry, the ref-count
may
not be zero, but it will go down to zero at the end of the current
construct,
which should cause the "from" transfer to happen.

Similarly, a user may have:

```c
  struct S {
    int *p;
  };

  #pragma omp declare_mapper (id1: S s) map(s.p) map(present, alloc: s.p[0:10])
  #pragma omp declare_mapper (id2: S s) map(s.p, s.p[0:10])

  S s1;

 // present-check should fail here
 #pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id1), to: s)
 // "to" should be honored here
 #pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id2), to: s)
```

Where the allocation happens before the "to" entry is encountered by the
runtime. Or, an allocation happens before a "present" entry is
encountered.

To handle cases like this, we need to use the state information of
previously
seen new allocations, deletions, "from" entries, when honoring
`to`/`from`/`present` map entries.

-----
2026-03-02 15:32:30 -08:00
Abhinav Gaba
42a0fbc2c7
Revert "[OpenMP][Offload] Handle present/to/from when a different entry did alloc/delete." (#184240)
Reverts llvm/llvm-project#165494

Some buildbots are not happy about CHECKs enforcing strict ordering of
prints inside/after target regions.
2026-03-02 21:52:20 +00:00
Abhinav Gaba
1a7060a7b0
[OpenMP][Offload] Handle present/to/from when a different entry did alloc/delete. (#165494)
OpenMP allows cases like the following:

```c
  int *p1, *p2, x;
  p1 = p2 = &x;
  ...
  #pragma omp target_exit_data map(delete: p1[:]) from(p2[0])
```

Which means, when the runtime encounters the `from` entry, the ref-count
may not be zero, but it will go down to zero at the end of the current
construct, which should cause the "from" transfer to happen.

Similarly, a user may have:

```c
  struct S {
    int *p;
  };

  #pragma omp declare_mapper (id1: S s) map(s.p) map(present, alloc: s.p[0:10])
  #pragma omp declare_mapper (id2: S s) map(s.p, s.p[0:10])

  S s1;

 // present-check should fail here
 #pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id1), to: s)
 // "to" should be honored here
 #pragma omp target_enter_data map(alloc: s.p[0:10]) map(mapper(id2), to: s)
```

Where the allocation happens before the "to" entry is encountered by the
runtime. Or, an allocation happens before a "present" entry is
encountered.

To handle cases like this, we need to use the state information of
previously seen new allocations, deletions, "from" entries, when
honoring `to`/`from`/`present` map entries.
2026-03-02 13:13:51 -08:00
Krish Gupta
148b10be8a
[flang][OpenMP] Support custom mappers in target update to/from clauses (#169673)
Implement support for the OpenMP `mapper` modifier on `target update` `to` and
`from` clauses in Flang.

Semantic name resolution is extended to bind the mapper symbol for
`OmpClause::To` and `OmpClause::From` via a shared `ResolveMapperModifier`
helper. Lowering is extended in `ClauseProcessor` with a `getMapperIdentifier`
template helper to extract the mapper name for both `map` and `target update`
clauses and forward it to `omp.map_info`.

Fixes #168701.

Reviewed By: TIFitis, kparzysz
Assited By: Copilot( For review and articulations of messages)
2026-03-02 21:59:56 +05:30
Hansang Bae
8f268e63e4
[Offload] Remove unused data type (#183840) 2026-02-27 15:46:59 -06:00
Joseph Huber
c49460bae7
[flang-rt] Enable more runtime functions for the GPU target (#183649)
Summary:
This enables primarily `stop.cpp` and `descriptor.cpp`. Requires a
little bit of wrangling to get it to compile. Unlike the CUDA build,
this build uses an in-tree libc++ configured for the GPU. This is
configured without thread support, environment, or filesystem, and it is
not POSIX at all. So, no mutexes, pthreads, or get/setenv.

I tested stop, but i don't know if it's actually legal to exit from
OpenMP offloading.
2026-02-27 12:27:39 -06:00
Johannes Doerfert
9f4636210d
[Offload] Fix type mismatch by using uint64_t instead of size_t (#183375)
The variant uses uint64_t, so should the get.
2026-02-25 13:31:03 -08:00
Nick Sarnie
600919ac32
[Offload][clang-linker-wrapper][SPIRV] Tell spirv-link to not optimize out exported symbols (#182930)
`spirv-link` seems to internalize all symbols, which ends up causing the
OpenMP Device Environment global generated by the OMP FE to get
optimized out which causes `liboffload` to run in the wrong
parallelization mode which breaks at least one liboffload lit test.

Pass `--create-library` to tell it not to do that.

```
  --create-library
               Link the binaries into a library, keeping all exported symbols.
```

This fixes the test.

Closes: https://github.com/llvm/llvm-project/issues/182901

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-02-24 15:10:59 +00:00
Hansang Bae
a347e1298c
[Offload] Enable memory usage printing with alloc debug type (#182938) 2026-02-23 17:19:41 -06:00
Jan Patrick Lehr
92447ed273
[Offload] Fix copy-elision warning (#182848)
This fixes a warning about a prohibited copy-elision due to the move of
a temporary object.
2026-02-23 13:58:07 +00:00
Alex Duran
7ed0aa2652
[OFFLOAD][L0] Remove leftover global constructor (#182611) (#182665)
fixes #182611
2026-02-21 18:09:46 +01:00
Joseph Huber
70b5a1d050
[flang-rt] Add support for formatted I/O on the GPU (#182580)
Summary:
Expands on the previous support to enable formatted output, characters,
and checking basic iostat. We intentionally do not handle cases where
the descriptor is non-null as this is a non-trivial class that cannot
easily be shepherded across the wire.
2026-02-20 14:43:06 -06:00
Joseph Huber
5f5d27d7d3
[libc] Support array tags in the RPC dispatch helpers (#181395)
Summary:
This PR adds support for tagging a pointer as an array when marshaling
between the CPU and GPU.
2026-02-20 09:35:47 -06:00
Joseph Huber
21b3461440
[flang-rt] Implement basic support for I/O from OpenMP GPU Offloading (#181039)
Summary:
This PR provides the minimal support for Fortran I/O coming from a GPU
in OpenMP offloading. We use the same support the `libc` uses for its
printing through the RPC server. The helper functions `rpc::dispatch`
and `rpc::invoke` help make this mostly automatic.

Becaus Fortran I/O is not reentrant, the vast majority of complexity
comes from needing to stitch together calls from the GPU until they can
be executed all at once. This is needed not only because of the
limitations of recursive I/O, but without this the output would all be
interleaved because of the GPU's lock-step execution.

As such, the return values from the intermediate functions are
meaningless, all returning true. The final value is correct however. For
cookies we create a context pointer on the server to chain these
together.

Works on both my AMD and NVIDIA GPUs.
```fortran
program hello_gpu
  implicit none

  !$omp target teams num_teams(1)
  !$omp parallel num_threads(2)
    ! Print strings
    print *, "Hello from GPU"
  !$omp end parallel
  !$omp end target teams

end program hello_gpu
```
```console
> flang hello.f90 -O2 -fopenmp --offload-arch=gfx1030 
> ./a.out 
 Hello from GPU
 Hello from GPU
> flang hello.f90 -O2 -fopenmp --offload-arch=sm_89  
> ./a.out 
 Hello from GPU
 Hello from GPU
```
2026-02-20 07:56:59 -06:00