221 Commits

Author SHA1 Message Date
Joseph Huber
07896d44a3
[OpenMP] Emit aggregate kernel prototypes and remove libffi dependency (#186261)
Summary:
This PR changes the kernels emitted when targeting a CPU so that they
take a pointer to a struct.

The old handling emitted a standard function prototype. This
necessitated a target-specific ABI to call it because the signature
differed with the number of arguments. Instead, this PR emits a void
pointer to a naturally aligned struct, which is what APIs like `pthreads`
expect.

This allows us to remove all the complexity around launching host
kernels and just pass the argument list.
2026-03-20 13:08:23 -05:00
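A minimal sketch of the calling convention described in this commit, not the code Clang actually emits: every kernel takes a single `void *` that points at a naturally aligned struct, much like a `pthread_create` start routine. `KernelArgs`, `saxpy_kernel`, `launch_host_kernel`, and `run_saxpy` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical aggregate holding the kernel arguments. Natural alignment
// means each member sits at its ordinary struct offset.
struct KernelArgs {
  float A;
  const float *X;
  float *Y;
  int N;
};

// Every kernel shares one prototype, so the launcher needs no
// per-signature ABI handling to call it.
using KernelFn = void (*)(void *);

void saxpy_kernel(void *Ptr) {
  auto *Args = static_cast<KernelArgs *>(Ptr);
  for (int I = 0; I < Args->N; ++I)
    Args->Y[I] = Args->A * Args->X[I] + Args->Y[I];
}

// The launcher only forwards the argument buffer.
void launch_host_kernel(KernelFn Fn, void *ArgBuffer) { Fn(ArgBuffer); }

// Illustration-only helper that packs the arguments and runs the kernel.
std::vector<float> run_saxpy(float A, std::vector<float> X,
                             std::vector<float> Y) {
  KernelArgs Args{A, X.data(), Y.data(), static_cast<int>(X.size())};
  launch_host_kernel(saxpy_kernel, &Args);
  return Y;
}
```

Because the prototype no longer varies with the argument count, the launcher can call any kernel through the same function pointer type.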
Bruce Changlong Xu
cbab7e65a7
[AMDGPU] Minor cleanups in offload plugin and AMDGPUEmitPrintf. NFC. (#187587)
Use empty() in assert, brace-init instead of std::make_pair in the
AMDGPU offload plugin, and fix a comment typo in AMDGPUEmitPrintf.
2026-03-19 18:16:47 -04:00
fineg74
2890f9883c
[OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927)
This change improves handling of errors during synchronization in the Level
Zero plugin by ensuring cleanup of queues and events in case of a
synchronization error. As a result, multiple tests stopped hanging.

---------

Co-authored-by: Duran, Alex <alejandro.duran@intel.com>
2026-03-18 05:50:06 +01:00
Joseph Huber
154a128c65
Reapply "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309)
Should be working downstream now
This reverts commit 9b61ff210fdff752d5db55b128474e9990258488.
2026-03-13 15:48:37 -05:00
Piotr Balcer
1b9a4a0f72
[Offload][L0] clear completed events from a wait list (#186379)
Queue's WaitEvent collection wasn't being cleared after synchronization
and resetting of the events. This led to hangs on subsequent host
synchronizations if not preceded by any other operation.
2026-03-13 13:56:27 +00:00
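The failure mode described in this commit can be modelled with a toy queue. This is a sketch under assumptions, not the Level Zero plugin code: `Event`, `Queue`, `enqueue`, and `synchronize` are hypothetical stand-ins, and "waiting" is reduced to checking a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy event; a real plugin would wrap a Level Zero event handle.
struct Event {
  bool Signaled = false;
};

// Toy queue tracking the events a host synchronization must wait on.
class Queue {
  std::vector<Event *> WaitEvents;

public:
  void enqueue(Event *E) {
    E->Signaled = true; // Pretend the device work finishes immediately.
    WaitEvents.push_back(E);
  }

  // Waits on every pending event, resets it, and returns how many events
  // were waited on.
  std::size_t synchronize() {
    std::size_t Waited = 0;
    for (Event *E : WaitEvents) {
      assert(E->Signaled && "waiting on a reset event would hang");
      E->Signaled = false; // Reset the event for reuse.
      ++Waited;
    }
    // Without this clear, the next synchronize() would wait again on
    // events that were just reset and will never be signaled.
    WaitEvents.clear();
    return Waited;
  }
};
```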
theRonShark
9b61ff210f
Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309)
Reverts llvm/llvm-project#185989
2026-03-13 05:20:40 +00:00
Kevin Sala Penades
ac71b185c2
[offload] Remove LIBOMPTARGET_SHARED_MEMORY_SIZE envar (#186231)
This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and
outputs a runtime warning if it is defined. Access to dynamic shared memory
should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or
the launch arguments in liboffload kernel launch.
2026-03-12 21:21:29 -07:00
Joseph Huber
4376fbd793
[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989)
Summary:
We use this `dyn_ptr` argument in Clang/OpenMP to handle the
`KernelLaunchEnvironment`. This is a per-kernel argument used to share
some information. Currently, it's prepended to the argument list and we
generate storage for it in the runtime.

This is bad for a few reasons:
1. It changes the ABI by shifting user arguments
2. It cannot trivially be left uninitialized if unused
3. The runtime must allocate its own memory for it

This PR changes it to be appended instead. Additionally, space for this
is always emitted. This means the OMPIRBuilder itself will provide the
storage, we simply need to populate it in the runtime if it is used.
This means that if it's unused we don't always pay the cost and it's
easier for non-OpenMP users to ignore it.

Backward compatibility is maintained by auto-upgrading the kernel
arguments. In `libomptarget` we completely allocate a new buffer to
store this in the new format. The plugins still need to respect the old
ABI of the called device object, so we simply rotate it if it's the old
version.
2026-03-12 18:08:22 -05:00
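The rotation mentioned for the old ABI can be sketched with `std::rotate`. This is an illustration under assumptions, not the plugin code: `to_old_layout` is a hypothetical helper, and the argument list is modelled as a plain vector of pointers with the implicit argument stored last.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// New layout: user arguments first, implicit argument last.
// Old layout: implicit argument first, user arguments after it.
// Rotating right by one element converts the new layout to the old one.
std::vector<void *> to_old_layout(std::vector<void *> Args) {
  if (!Args.empty())
    std::rotate(Args.begin(), Args.end() - 1, Args.end());
  return Args;
}
```

The user arguments keep their relative order; only the implicit argument changes position.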
Kevin Sala Penades
1f583c6dee
[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831)
Part 3 adding offload runtime support. See
https://github.com/llvm/llvm-project/pull/152651.

---------

Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
2026-03-12 01:13:06 -07:00
Alex Duran
789fea83bb
[offload][l0][nfc] remove duplicated entry (#185855)
Remove a function left over by mistake from #185404
2026-03-11 11:55:30 +01:00
Alex Duran
3ff332ad0f
[Offload][L0] Add support for OffloadBinary format in L0 plugin (#185404)
- Accept OffloadBinaries as valid images by plugins that support them in
the PluginInterface.
- Add support in L0 plugin to extract SPIRV images and their associated
metadata from an OffloadBinary image.

Depends on:
- #185663

Follow-up PRs:
- #185413 (Changes SPIRV wrapper generation to use OffloadBinary)
- #185425 (Adjusts llvm-objdump)
- #184774 (Adjusts llvm-offload-binary)
2026-03-11 11:42:36 +01:00
Alex Duran
be021b8433
[OFFLOAD] Add interface to extend image validation (#185663)
As discussed in #185404 we might want to provide a way for plugins to
validate images not recognized by the common layer.

This PR adds such an extension and uses it to validate pure SPIRV images by
the Level Zero plugin.
2026-03-10 18:41:23 +01:00
Joseph Huber
a9e457a82f
[Offload][AMDGPU] Fix RPC server on mixed w32 w64 workloads (#185496)
Summary:
This was a regression from the original LLVM-gpu-loader. We used to
handle `-mwavefrontsize64` correctly in the loader by over-allocating
memory and just leaving the upper 32 bits masked off. In order to handle
this in offload we need to scan loaded kernels to see how much memory we
need to allocate. This should be safe: the protocol is designed to
handle an arbitrary size, and in the worst case this just wastes space.
2026-03-09 17:13:59 -05:00
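A rough sketch of the "scan loaded kernels" idea, assuming the buffer size scales with the widest wavefront. `KernelInfo`, `max_wavefront_size`, and `rpc_buffer_bytes` are hypothetical names, and the sizing formula is an assumption made for illustration rather than the runtime's actual computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-kernel metadata; only the wavefront size matters here.
struct KernelInfo {
  unsigned WavefrontSize; // 32 or 64 on AMDGPU
};

// Scan all loaded kernels and return the largest wavefront size seen.
unsigned max_wavefront_size(const std::vector<KernelInfo> &Kernels) {
  unsigned Max = 32; // Assumed minimum for this sketch.
  for (const KernelInfo &K : Kernels)
    Max = std::max(Max, K.WavefrontSize);
  return Max;
}

// Size the shared buffer for the widest kernel. Narrower kernels leave the
// upper lanes unused, which wastes space but stays correct.
std::size_t rpc_buffer_bytes(const std::vector<KernelInfo> &Kernels,
                             std::size_t NumPorts, std::size_t BytesPerLane) {
  return NumPorts * max_wavefront_size(Kernels) * BytesPerLane;
}
```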
Łukasz Plewa
57614e8810
[OFFLOAD] Replace C-style casts with C++ style casts in obtainInfoImpl (#185023)
Replace C-style bool casts (bool)TmpInt with C++ functional casts
bool(TmpInt)
2026-03-06 10:28:38 -06:00
Hansang Bae
8f268e63e4
[Offload] Remove unused data type (#183840)
2026-02-27 15:46:59 -06:00
Hansang Bae
a347e1298c
[Offload] Enable memory usage printing with alloc debug type (#182938)
2026-02-23 17:19:41 -06:00
Jan Patrick Lehr
92447ed273
[Offload] Fix copy-elision warning (#182848)
This fixes a warning about a prohibited copy-elision due to the move of
a temporary object.
2026-02-23 13:58:07 +00:00
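The class of warning described here typically comes from wrapping a temporary in `std::move` inside a `return` statement. The following is a generic illustration, not the code touched by the commit; `make_label` is a hypothetical function.

```cpp
#include <cassert>
#include <string>

std::string make_label(int Id) {
  // Pessimizing form (kept as a comment so the block compiles cleanly):
  //   return std::move(std::string("device-") + std::to_string(Id));
  // Returning the temporary directly lets the compiler elide the copy.
  return std::string("device-") + std::to_string(Id);
}
```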
Alex Duran
7ed0aa2652
[OFFLOAD][L0] Remove leftover global constructor (#182611) (#182665)
fixes #182611
2026-02-21 18:09:46 +01:00
Joseph Huber
21b3461440
[flang-rt] Implement basic support for I/O from OpenMP GPU Offloading (#181039)
Summary:
This PR provides the minimal support for Fortran I/O coming from a GPU
in OpenMP offloading. We use the same support the `libc` uses for its
printing through the RPC server. The helper functions `rpc::dispatch`
and `rpc::invoke` help make this mostly automatic.

Because Fortran I/O is not reentrant, the vast majority of complexity
comes from needing to stitch together calls from the GPU until they can
be executed all at once. This is needed not only because of the
limitations of recursive I/O, but also because without it the output would
all be interleaved due to the GPU's lock-step execution.

As such, the return values from the intermediate functions are
meaningless, all returning true. The final value is correct however. For
cookies we create a context pointer on the server to chain these
together.

Works on both my AMD and NVIDIA GPUs.
```fortran
program hello_gpu
  implicit none

  !$omp target teams num_teams(1)
  !$omp parallel num_threads(2)
    ! Print strings
    print *, "Hello from GPU"
  !$omp end parallel
  !$omp end target teams

end program hello_gpu
```
```console
> flang hello.f90 -O2 -fopenmp --offload-arch=gfx1030 
> ./a.out 
 Hello from GPU
 Hello from GPU
> flang hello.f90 -O2 -fopenmp --offload-arch=sm_89  
> ./a.out 
 Hello from GPU
 Hello from GPU
```
2026-02-20 07:56:59 -06:00
Jan Patrick Lehr
e1e0e86e60
[Offload] Always check/consume Error (#182008)
This fixes an issue introduced in
https://github.com/llvm/llvm-project/pull/172226 where an llvm::Error is
not checked in the "good" code path.
2026-02-18 13:46:21 +01:00
fineg74
1c6d774baa
[OFFLOAD] Extend olMemRegister API to handle cases when a memory block may have been mapped outside of liboffload. (#172226)
This PR extends the liboffload olMemRegister API to handle the case where
a memory block may have been mapped before calling olMemRegister, to
support some use cases in libomptarget.
2026-02-17 20:53:00 +00:00
Joseph Huber
d85576d368
[libc] Replace RPC 'close()' mechanism with RAII handler (#181690)
Summary:
Closing ports was previously done manually. This made the protocol more
error prone, as unclosed ports will leak and eventually the locks will
run out. I believe the original fear was that the RAII portion would
negatively impact code generation but I have not noticed anything
significant.
2026-02-16 15:14:30 -06:00
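The RAII idea can be sketched with a toy guard type. This is not the libc RPC interface: `PortPool`, `PortGuard`, and `use_ports` are hypothetical names, and a port is reduced to a counter so that a leak is easy to observe.

```cpp
#include <cassert>

// Toy stand-in for the RPC port pool; counts the ports currently open.
struct PortPool {
  int Open = 0;
  int Capacity = 2;
};

// Opens a port on construction and closes it on destruction, so a port
// cannot be leaked by forgetting a manual close call.
class PortGuard {
  PortPool *Pool;

public:
  explicit PortGuard(PortPool &P) : Pool(&P) {
    assert(Pool->Open < Pool->Capacity && "ran out of ports");
    ++Pool->Open;
  }
  ~PortGuard() { --Pool->Open; }
  PortGuard(const PortGuard &) = delete;
  PortGuard &operator=(const PortGuard &) = delete;
};

// Opens a port once per iteration; returns how many ports remain open.
int use_ports(PortPool &Pool, int Iterations) {
  for (int I = 0; I < Iterations; ++I) {
    PortGuard Guard(Pool);
    // ... send and receive on the port ...
  } // The port is closed here on every iteration, including early exits.
  return Pool.Open;
}
```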
fineg74
b58a31d3ce
[OFFLOAD] Add support for host offloading device (#177307)
The purpose of this PR is to add support for the host as an offloading
device in liboffload. Both OpenMP and SYCL support offloading to the host
as part of their normal workflow and therefore require this capability
from the liboffload library.
2026-02-13 10:27:52 +01:00
Hansang Bae
0deb1b6e05
[Offload] Try to load Level Zero loader with version suffix (#180042)
The default Level Zero loader `libze_loader.so` may not be available on
systems that don't have the Level Zero development package. In that case,
loaders with a major version suffix are searched for.
2026-02-11 15:13:26 -06:00
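A sketch of the fallback search, assuming the suffixed names take the form `libze_loader.so.<major>`. `loader_candidates` and `load_first_available` are hypothetical helpers, the range of major versions is an assumption, and `TryLoad` stands in for a `dlopen`-style call so the logic can be exercised without the real library.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Build the candidate list: the unsuffixed name first, then names with a
// major version suffix. The suffix range is an assumption of this sketch.
std::vector<std::string> loader_candidates(const std::string &Base,
                                           unsigned MaxMajor) {
  std::vector<std::string> Names{Base};
  for (unsigned Major = 1; Major <= MaxMajor; ++Major)
    Names.push_back(Base + "." + std::to_string(Major));
  return Names;
}

// Try each candidate in order and return the first name that loads, or an
// empty string if none does.
std::string
load_first_available(const std::vector<std::string> &Names,
                     const std::function<bool(const std::string &)> &TryLoad) {
  for (const std::string &Name : Names)
    if (TryLoad(Name))
      return Name;
  return {};
}
```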
Alex Duran
8b9fd4803c
[OFFLOAD] Support host plugin on Windows (#180401)
Changes to make host plugin compile on Windows:
* Change IO code to be portable
* Adjust Makefiles

Allow plugin to work partially when libffi support is not found
dynamically (compilation works fine even on Windows because of the
wrapper support).
2026-02-11 08:54:47 +01:00
Joseph Huber
2f00977fea
[Offload] Make the RPC callbacks private to each running server (#178901)
Summary:
The static object mixes callbacks from different plugins because ever
since we moved to the object library target these are actually shared.
Just make it a member of the base class and make it a pointer set just
to do some basic deduplication.
2026-02-06 08:28:57 -06:00
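A simplified model of the change: the callback set becomes a member of each server instead of a shared static, and storing function pointers in a set deduplicates repeated registrations. `RPCServer`, `RPCCallback`, and the handler names are hypothetical, and the callback signature is an assumption rather than the real one.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical callback type: returns true if it handled the opcode.
using RPCCallback = bool (*)(unsigned Opcode);

class RPCServer {
  // Per-instance set, so callbacks from different plugins do not mix.
  // Storing pointers in a set drops duplicate registrations.
  std::unordered_set<RPCCallback> Callbacks;

public:
  void registerCallback(RPCCallback CB) { Callbacks.insert(CB); }
  std::size_t numCallbacks() const { return Callbacks.size(); }

  // Offer the opcode to each registered callback until one handles it.
  bool dispatch(unsigned Opcode) const {
    for (RPCCallback CB : Callbacks)
      if (CB(Opcode))
        return true;
    return false;
  }
};

// Illustration-only handlers.
bool handle_fortran(unsigned Opcode) { return Opcode == 1; }
bool handle_mpi(unsigned Opcode) { return Opcode == 2; }
```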
Alex Duran
4096cb6017
[OFFLOAD] Fix TARGET_NAME in plugins common code (#180151)
Unlike other names, it is set between quotes, which prevents our debug
macros from properly matching it.
2026-02-06 14:12:04 +01:00
Joseph Huber
1a86c146ae
[Offload] Add a function to register an RPC Server callback (#178774)
Summary:
We provide an RPC server to manage calls initiated by the device to run
on the host. This is very useful for the built-in handling we have,
however there are cases where we would want to extend this
functionality.

Cases like Fortran or MPI would be useful, but we cannot put references
to these in the core offloading runtime. This way, we can provide this
as a library interface that registers custom handlers for whatever code
people want.
2026-01-30 08:03:13 -06:00
Hansang Bae
85d64d1201
[Offload] Cast to void * in the debug message (#177019)
There are a few places where data types based on character array or
string are printed in the debug message while they do not represent
strings. Such expressions should be cast to `void *` unless they
represent actual strings. The change also includes casting from integral
type to pointer type when appropriate.
2026-01-20 15:44:08 -06:00
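The reason for the cast can be shown with standard streams, which treat `const char *` as a NUL-terminated string but print `const void *` as an address. This is a generic illustration, not the offload debug macros; `format_as_string` and `format_as_pointer` are hypothetical helpers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Streams the pointer as-is: the bytes are read as a C string.
std::string format_as_string(const char *Data) {
  std::ostringstream OS;
  OS << Data;
  return OS.str();
}

// Casts to void * first: the address is printed, the bytes are not read.
std::string format_as_pointer(const char *Data) {
  std::ostringstream OS;
  OS << static_cast<const void *>(Data);
  return OS.str();
}
```

For a buffer that is not NUL-terminated, the first form can read past the end of the data, which is why the cast matters in debug output.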
fineg74
848d736e64
[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231)
Add liboffload asynchronous queue query API for libomptarget migration

This PR adds the liboffload asynchronous queue query API that is needed to
make libomptarget use liboffload.
2026-01-20 10:53:32 -08:00
Hansang Bae
edd857aad8
[Offload] Remove unnecessary maybe_unused attribute (#175855)
The attribute is not necessary in the new debug messaging.
2026-01-15 14:31:58 -06:00
Hansang Bae
90b6d33755
[Offload] Small debug message fix in Level Zero plugin (#175958)
Do not include trailing zeros in the device name.
2026-01-14 09:42:19 -06:00
Alex Duran
efad3563ea
[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175787)
2026-01-13 17:53:59 +01:00
Alex Duran
86e114a9b2
Revert "[OFFLOAD] Update CUDA and AMD plugins to new debug format" (#175786)
Reverts llvm/llvm-project#175757
2026-01-13 17:13:46 +01:00
Alex Duran
7c2f49373b
[OFFLOAD] Update CUDA and AMD plugins to new debug format (#175757)
This should be the last step before completely removing the DP macro.
2026-01-13 17:06:35 +01:00
Hansang Bae
13cd7003ad
[NFC][Offload] Rename a function (#175673)
Renamed a function as suggested in #175664.
2026-01-12 19:40:17 -06:00
Hansang Bae
496729fe7e
[Offload] Fix level_zero plugin build (#175664)
Build has been broken when OMPTARGET_DEBUG is undefined.
2026-01-12 16:53:23 -06:00
Hansang Bae
dae3b49cba
[Offload] Update debug message printing in the plugins (#175205)
* Prepare a set of debug types in llvm::offload::debug to be used in
plugin code
* Update debug messages in the plugins
2026-01-12 14:26:43 -06:00
fineg74
1232599032
[OFFLOAD] Add memory data locking API for libomptarget migration (#173138)
Add liboffload memory data locking API for libomptarget migration

This PR adds the liboffload memory data locking API that is needed to make
libomptarget use liboffload.
2026-01-12 13:07:57 -06:00
Alex Duran
dbd52bd558
[OFFLOAD][OpenMP] Remove old style REPORT support (#175607)
Fix the few remaining usages and remove the support for the old REPORT
macro.
2026-01-12 19:48:40 +01:00
Joseph Huber
c722ef4874
[OpenMP] Remove testing LTO variant on CPU targets (#175187)
Summary:
This is only really meaningful for the NVPTX target. Not all build
environments support host LTO and these are redundant tests, just clean
this up and make it run faster.
2026-01-09 10:13:44 -06:00
fineg74
583ce49a40
[OFFLOAD] Make L0 provide more information about device to be consistent with other plugins (#172946)
Update the device information provided by the Level Zero plugin to be
more consistent with other plugins.
2026-01-08 22:10:44 +00:00
Alex Duran
280e609d4e
[OFFLOAD][L0] Expose native ELF to upper layers (#172819)
This PR refactors how the device image is built so we can expose the
native ELF of the device to DeviceImageTy which solves several issues
regarding symbol look up (as DeviceImageTy expects an ELF). It also
simplifies the module linking code taking into account the latest
changes in the driver (which adds "-library-compilation" when necessary).

---------

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-18 18:03:12 +00:00
Alex Duran
5559918321
[OFFLOAD][L0] Improve symbol device lookup (#172820)
When looking for the device address of a symbol, we also need to check
whether it is a function symbol if it is not found as a global symbol on
the device.

---------

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-18 15:31:20 +00:00
Alex Duran
3ac0ff2f36
[OFFLOAD][L0] Fix usages of getDebugLevel in L0 plugin (#172815)
Support for getDebugLevel was removed as part of the new debug macros
(#165416). This PR updates such usages to use the new ODBG_* macros.

---------

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-18 15:30:59 +00:00
Alex Duran
f125c8db5c
[OFFLOAD] Add plugin with support for Intel oneAPI Level Zero (#158900)
Add a new nextgen plugin that supports GPU devices through the Intel oneAPI Level Zero library. The plugin is not enabled by default and needs to be added to LIBOMPTARGET_PLUGINS_TO_BUILD explicitly.

---------

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-18 08:53:03 +01:00
Hansang Bae
ecb94bcfe2
[Offload] Debug message update part 3 (#171684)
Update debug messages based on the new method from #170425. Updated the
following files.
- plugins-nextgen/common/include/MemoryManager.h
- plugins-nextgen/common/include/PluginInterface.h
- plugins-nextgen/common/src/GlobalHandler.cpp
- plugins-nextgen/common/src/PluginInterface.cpp
- plugins-nextgen/host/dynamic_ffi/ffi.cpp
2025-12-17 09:05:16 -06:00
Kevin Sala Penades
35315a84b4
[offload] Fix CUDA args size by subtracting tail padding (#172249)
This commit makes the cuLaunchKernel call pass the total argument size without tail padding.
2025-12-14 21:57:25 -08:00
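The difference between the padded and unpadded sizes can be worked through with a small layout calculation. This is a sketch, not the plugin code: `ArgInfo` and both size functions are hypothetical, and the layout rule assumed is the usual one where each member is aligned to its own alignment and the struct is rounded up to its largest member alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment of one kernel argument (hypothetical description).
struct ArgInfo {
  std::size_t Size;
  std::size_t Align;
};

// Round Value up to the next multiple of Align.
std::size_t align_up(std::size_t Value, std::size_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// End offset of the last argument: padding between arguments is kept,
// padding after the last one is not.
std::size_t args_size_without_tail_padding(const std::vector<ArgInfo> &Args) {
  std::size_t Offset = 0;
  for (const ArgInfo &A : Args)
    Offset = align_up(Offset, A.Align) + A.Size;
  return Offset;
}

// What sizeof would report for an equivalent struct: rounded up to the
// largest member alignment.
std::size_t args_size_with_tail_padding(const std::vector<ArgInfo> &Args) {
  std::size_t MaxAlign = 1;
  for (const ArgInfo &A : Args)
    MaxAlign = std::max(MaxAlign, A.Align);
  return align_up(args_size_without_tail_padding(Args), MaxAlign);
}
```

For an 8-byte pointer followed by a 4-byte integer, the padded size is 16 bytes while the arguments themselves end at byte 12.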
Alex Duran
66ddc9b3e7
[OFFLOAD] Add support for more fine grained debug messages control (#165416)
This PR introduces new debug macros that allow finer control over
which debug messages to output and introduces C++ stream style for debug
messages.

Changing existing messages (except a few that I changed for testing)
will come in subsequent PRs.

I also think that we should make debug enabling OpenMP agnostic but, for
now, I prioritized maintaining the current libomptarget behavior,
and we might need more changes further down the line as we decouple
libomptarget.
2025-11-20 18:39:56 +01:00
Joseph Huber
eea62159e8
[Offload] Make the RPC thread sleep briefly when idle (#168596)
Summary:
We start this thread if the RPC client symbol is detected in the loaded
binary. We should make this sleep if there's no work to avoid the thread
running at high priority when the (scarcely used) RPC call is actually
required. So, right now after 25 microseconds we will assume the server
is inactive and begin sleeping. This resets once we do find work.

AMD supports a more intelligent way to do this. HSA signals can wake a
sleeping thread from the kernel, and signals can be sent from the GPU
side. This would be nice to have and I'm planning on working with it in
the future to make this infrastructure more usable with existing AMD
workloads.
2025-11-19 15:56:25 -06:00
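The idle policy described in this commit can be sketched as a small state holder: after 25 microseconds without work the server thread should start sleeping, and finding work resets the timer. `IdlePolicy` is a hypothetical class, not the runtime's implementation, and time points are passed in explicitly so the logic does not depend on the wall clock.

```cpp
#include <cassert>
#include <chrono>

class IdlePolicy {
public:
  using Clock = std::chrono::steady_clock;

  // The 25 microsecond default mirrors the threshold named in the commit.
  explicit IdlePolicy(
      Clock::time_point Now,
      std::chrono::microseconds Limit = std::chrono::microseconds(25))
      : LastWork(Now), Threshold(Limit) {}

  // Called whenever the server loop finds work; resets the idle timer.
  void foundWork(Clock::time_point Now) { LastWork = Now; }

  // True once no work has been seen for at least the threshold.
  bool shouldSleep(Clock::time_point Now) const {
    return Now - LastWork >= Threshold;
  }

private:
  Clock::time_point LastWork;
  std::chrono::microseconds Threshold;
};
```

A server loop would call `shouldSleep(Clock::now())` on each pass and sleep briefly when it returns true.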