
[compiler-rt] Rework profile data handling for GPU targets (#187136)

Summary:
Currently, the GPU iterates through all of the present symbols and
copies them by prefix. This is inefficient because it requires many small,
high-latency data transfers rather than a few large ones. Additionally,
we force every profiling symbol to have protected visibility, which can
add hundreds of unnecessary entries to the symbol table.

This PR moves the interface toward start/stop section handling.
AMDGPU supports this natively as an ELF target, so it needs few
changes. Instead of overriding visibility, we use a single table that
holds the section bounds, which the host can obtain with one contiguous
load.

The table interface should also work for the in-progress HIP
implementation, which wraps the start/stop sections in standard void
pointers. Those pointers lie inside an already mapped region of memory,
so they should be accessible from the HIP API.

NVPTX is more difficult: although it is an ELF platform, it lacks this
support. I have hooked up the 'Other' handling to work around this, but
even then it's a bit of a stretch. I could remove NVPTX support from this
PR, but I wanted to demonstrate that the ABI can be shared. However,
NVPTX will only work if we force LTO and change the backend to emit
variables in the same

TL;DR, we now do this:
```c
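// Device: one table holding the start/stop bounds of each profile section.
struct { start1, stop1, start2, stop2, start3, stop3, version; } device;
// Host: one symbol lookup and one copy fetch the whole table.
struct host = DtoH(lookup("device"));
// One contiguous copy per section, sized by that section's bounds.
counters = DtoH(host.start1, host.stop1 - host.start1);
version = DtoH(host.version);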
```

2026-03-26 10:17:43 -05:00

[Offload] Escape \; in command string (#186120)

2026-03-12 15:02:40 +01:00

cmake

[OpenMP] Emit aggregate kernel prototypes and remove libffi dependency (#186261)

2026-03-20 13:08:23 -05:00

docs

[Offload] Add Offload API Sphinx documentation (#147323)

2025-07-10 11:50:51 +01:00

include

[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831)

2026-03-12 01:13:06 -07:00

liboffload

[Offload] Fix type mismatch by using uint64_t instead of size_t (#183375)

2026-02-25 13:31:03 -08:00

libomptarget

[OpenMP] Fix non-contiguous array omp target update (#156889)

2026-03-26 15:55:31 +01:00

plugins-nextgen

[compiler-rt] Rework profile data handling for GPU targets (#187136)

2026-03-26 10:17:43 -05:00

test

[OpenMP] Fix non-contiguous array omp target update (#156889)

2026-03-26 15:55:31 +01:00

tools

[Offload] Add argument to 'olInit' for global configuration options (#181872)

2026-02-17 14:04:00 -06:00

unittests

[OFFLOAD] Enable Level Zero unittests (#185492)

2026-03-11 14:09:59 +00:00

utils

…

CMakeLists.txt

[Offload] Enable multilib building for OpenMP/Offload (#188485)

2026-03-26 07:37:22 -05:00

Maintainers.md

[Offload] Add 'Maintainers.md' file for offload (#138177)

2025-05-01 14:06:33 -05:00

README.md

…

README.txt

…

README.md

The LLVM/Offload Subproject

The Offload subproject aims to provide tooling, runtimes, and APIs that allow users to execute code on accelerators or other "co-processors" that may or may not match the architecture of their "host". In the long run, all kinds of targets are within the scope of this effort, including but not limited to: CPUs, GPUs, FPGAs, AI/ML accelerators, distributed resources, etc.

For OpenMP offload users, the project is ready and fully usable. The final API design is still under development. More content will show up here and on our webpage soon. In the meantime, people are encouraged to participate in our meetings (see below) and to check out our development board as well as the discussions on Discourse.

Meetings

Every second Wednesday, 7:00-8:00 am PT, starting Jan 24, 2024. Alternates with the OpenMP in LLVM meeting. Links: invite.ics, Meeting Minutes and Agenda.