503 Commits

Author SHA1 Message Date
Joseph Huber
eea62159e8
[Offload] Make the RPC thread sleep briefly when idle (#168596)
Summary:
We start this thread if the RPC client symbol is detected in the loaded
binary. We should make this sleep if there's no work to avoid the thread
running at high priority when the (scarcely used) RPC call is actually
required. So, right now after 25 microseconds we will assume the server
is inactive and begin sleeping. This resets once we do find work.

AMD supports a more intelligent way to do this. HSA signals can wake a
sleeping thread from the kernel, and signals can be sent from the GPU
side. This would be nice to have and I'm planning on working with it in
the future to make this infrastructure more usable with existing AMD
workloads.
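
The idle policy described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the patch: `IdlePoller`, `runServer`, and the 250 microsecond sleep interval are hypothetical, and only the 25 microsecond idle threshold comes from the summary.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: tracks when work was last seen and decides whether
// the polling thread should sleep instead of spinning.
struct IdlePoller {
  using Clock = std::chrono::steady_clock;

  std::chrono::microseconds IdleThreshold{25};  // from the commit summary
  std::chrono::microseconds SleepInterval{250}; // assumed value
  Clock::time_point LastWork = Clock::now();

  // Returns true if the caller should sleep before polling again.
  // Finding work resets the idle timer.
  bool shouldSleep(bool FoundWork, Clock::time_point Now) {
    assert(Now >= LastWork && "time must not go backwards");
    if (FoundWork) {
      LastWork = Now;
      return false;
    }
    return Now - LastWork >= IdleThreshold;
  }
};

// Hypothetical server loop: Poll() returns true if it handled any work,
// Stop() returns true once the thread should exit.
template <typename PollFn, typename StopFn>
void runServer(PollFn Poll, StopFn Stop) {
  IdlePoller Idle;
  while (!Stop()) {
    bool FoundWork = Poll();
    if (Idle.shouldSleep(FoundWork, IdlePoller::Clock::now()))
      std::this_thread::sleep_for(Idle.SleepInterval);
  }
}
```

Keeping the decision in `shouldSleep` separate from the loop makes the policy easy to check without starting a thread.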
2025-11-19 15:56:25 -06:00
Michael Kruse
c32c1d0d21
[Runtimes] Default build must use its own output dirs (#168266)
Post-commit fix of #164794 reported at
https://github.com/llvm/llvm-project/pull/164794#issuecomment-3536253493

`LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` are used by
`AddLLVM.cmake` as output directories. Unless we are in a
bootstrapping-build, they must not point to directories found by
`find_package(LLVM)`, which may be read-only directories. MLIR for
instance sets these variables to its own build output
directory, so should the runtimes.
2025-11-19 13:51:14 +01:00
Robert Imschweiler
9a0fd22da1
Revert "[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid()" (#168547)
Reverts llvm/llvm-project#164392 due to fortran issues
2025-11-18 15:10:42 +00:00
Robert Imschweiler
65c4a534bd
[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid() (#164392)
Use the implementation in libomptarget. If libomptarget is not
available, always return the UID / device number of the host / the
initial device.
2025-11-18 15:22:49 +01:00
Akash Banerjee
8aa7d823b0
[OpenMP][Flang] Emit default declare mappers implicitly for derived types (#140562)
This patch adds support to emit default declare mappers for implicit
mapping of derived types when not supplied by user. This especially
helps tackle mapping of allocatables of derived types.
2025-11-14 15:59:48 +00:00
Kevin Sala Penades
1a86f0aae7
[Offload] Add device info for shared memory (#167817)
2025-11-13 11:00:12 -08:00
Łukasz Plewa
1bd035d80f
[offload] defer "---> olInit" trace message (#167893)
Tracing requires liboffload to be initialized, so calling
isTracingEnabled() before olInit always returns false. This caused the
first trace log to look like:
```
-> OL_SUCCESS
```
instead of:
```
---> olInit() -> OL_SUCCESS
```
This patch moves the pre-call trace print for olInit so it is emitted
only after initialization.

It would be possible to add extra logic to detect whether liboffload is
already initialized and only postpone the first pre-call print, but this
would add unnecessary complexity, especially since this is tablegen
code. The difference would matter only in the unlikely case of a crash
during a second olInit call.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-11-13 15:56:38 +00:00
Ethan Luis McDonough
38cade7cc6
[PGO][Offload] Fix missing names bug in GPU PGO (#166444)
After #163011 was merged, the tests in
[`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1)
broke because the offload plugins were no longer able to find
`__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm`
visible to the host on GPU targets and reverses the changes made in
f7e9968a5ba99521e6e51161f789f0cc1745193f.
2025-11-10 10:11:53 -06:00
Kevin Sala Penades
64ad5d976d
[Offload] Remove unused KernelArgsTy instantiation (#167197)
2025-11-08 20:54:32 -08:00
Joseph Huber
aaddd8d38a
[OpenMP] Fix tests relying on the heap size variable
Summary:
I made that an unimplemented error, but forgot that it was used for this
environment variable.
2025-11-06 13:00:26 -06:00
Joseph Huber
670c453aeb
[Offload] Remove handling for device memory pool (#163629)
Summary:
This was a lot of code that was only used for upstream LLVM builds of
AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so
just use that. Simplifies code, can be added back if we start providing
alternate forms but I don't think there's a single use-case that would
justify it yet.
2025-11-06 10:15:18 -06:00
Robert Imschweiler
dc94f2cbad
[Offload] Add device UID (#164391)
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of
a device on a given system. (Not necessarily a UUID.) Since it is not
guaranteed that the (U)UIDs defined by the device vendor libraries, such
as HSA, do not overlap with those of other vendors, the device UIDs in
offload are always combined with the offload plugin name. In case the
vendor library does not specify any device UID for a given device, we
fall back to the offload-internal device ID.
The device UID can be retrieved using the `llvm-offload-device-info`
tool.
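
A minimal sketch of the composition rule described above, under stated assumptions: `makeDeviceUid`, the `-` separator, and the use of an empty string to mean "no vendor UID" are all hypothetical and are not taken from the patch. It also assumes the fallback ID is combined with the plugin name in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not the liboffload implementation.
// An empty VendorUid means the vendor library reported no UID.
std::string makeDeviceUid(const std::string &PluginName,
                          const std::string &VendorUid,
                          std::uint32_t InternalDeviceId) {
  assert(!PluginName.empty() && "plugin name is needed for uniqueness");
  // Fall back to the offload-internal device ID when no vendor UID exists.
  const std::string Suffix =
      VendorUid.empty() ? std::to_string(InternalDeviceId) : VendorUid;
  // Prefixing with the plugin name keeps identical vendor UIDs from
  // different plugins distinct (the separator is an assumption).
  return PluginName + "-" + Suffix;
}
```

With this sketch, two plugins that both report the vendor UID `0` still produce different device UIDs.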
2025-11-04 20:15:47 +01:00
agozillon
09318c6bff
[MLIR][OpenMP] Fix and simplify bounds offset calculation for 1-D GEP offsets (#165486)
Currently this is being calculated incorrectly and will result in
incorrect index offsets in more complicated array slices. This PR tries
to address it by refactoring and changing the calculation to be more
correct.
2025-10-31 00:54:31 +01:00
Alex Duran
426d1fe548
[OFFLOAD] Remove weak from __kmpc_* calls and gather them in one header (#164613)
Follow-up from #162652

---------

Co-authored-by: Michael Klemm <michael.klemm@amd.com>
2025-10-24 15:42:20 +02:00
Nicole Aschenbrenner
16641ad8a2
[OpenMP] Adds omp_target_is_accessible routine (#138294)
Adds omp_target_is_accessible routine.
Refactors common code from omp_target_is_present to work for both
routines.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-22 17:35:16 +02:00
Kaloyan Ignatov
1f7ddb61b3
[NFC][Offload][OMPT] Improve readability of liboffload OMPT tests (#163181)
- ompt_target_data_op_t, ompt_scope_endpoint_t and ompt_target_t are now
printed as strings instead of just numbers to ease debugging
- some missing clang-format clauses have been added
2025-10-22 10:48:39 +02:00
Abhinav Gaba
829804724b
[NFC][OpenMP] Update a test that was failing on aarch64. (#164456)
The failure was reported here:

https://github.com/llvm/llvm-project/pull/164039#issuecomment-3425429556

The test was checking for the "bad" behavior so as to keep track of it, but there seem to be some issues with the pointer arithmetic specific to aarch64.

The update for now is to not check for the "bad" behavior fully.

We may need to debug further if similar issues are encountered eventually once the codegen has been fixed.
2025-10-21 21:15:52 -07:00
Ross Brunton
186182bb64
[Offload] Use amd_signal_async_handler for host function calls (#154131)
2025-10-21 13:08:30 +01:00
Abhinav Gaba
f37b4459f0
[NFC][OpenMP] Add small class-member use_device_ptr/addr unit tests. (#164039)
Two of the tests are currently asserting, and two are emitting
unexpected results.

The asserting tests will be fixed using the ATTACH-style codegen from
#153683.

The other two involve `use_device_addr` on byrefs, and need more
follow-up codegen changes, that have been noted in a FIXME comment.
2025-10-20 13:14:33 -07:00
Alex Duran
9ba54ca3ee
[OFFLOAD] Interop fixes for Windows (#162652)
On Windows, for a reason I don't fully understand, boolean bits get extra
padding (even when asking for packed structures) in the structures, which
messes up the offsets between the compiler and the runtime.

Also, "weak" works differently on Windows than Linux (i.e., the "local"
routine has preference) which causes it to crash as we don't really have
an alternate implementation of __kmpc_omp_wait_deps. Given this, it
doesn't make sense to mark it as "weak" for Linux either.
2025-10-17 11:07:31 +02:00
Jan Patrick Lehr
f7e9968a5b
[Offload] XFAIL pgo tests until resolved (#163722)
While people look into it, xfail the tests.
2025-10-16 11:43:55 +02:00
Joseph Huber
914fbe367e
[OpenMP] Disable a few more tests to get the bot green (#163614)
2025-10-15 14:14:15 -05:00
Jan Patrick Lehr
4b84e0f3f0
[OpenMP] Add test to print interop identifiers (#161434)
The test covers some of the identifier symbols in the interop runtime.

This test, for now, is to guard against complete breakage, which was the
result of the other `interop.c` test not being enabled on AMD and thus,
not caught by our buildbots.
2025-10-15 20:38:33 +02:00
Joseph Huber
227bc5786f
Revert "[Offload] Lazily initialize platforms in the Offloading API" (#163272)
Summary:
This causes issues with CUDA's teardown order when the init is separated
from the total init scope.
2025-10-14 12:46:55 -05:00
Joseph Huber
4a35c4d38a
[Offload] Lazily initialize platforms in the Offloading API (#163272)
Summary:
The Offloading library wraps around the underlying plugins. The problem
is that we currently initialize all plugins we find, even if they are
not needed for the program. This is very expensive for trivial uses, as
fully heterogeneous usage is quite rare. In practice this means that you
will always pay a 200 ms penalty for having CUDA installed.

This patch changes the behavior to provide accessors into the plugins
and devices that allows them to be initialized lazily. We use a
once_flag, this should properly take a fast-path check while still
blocking on concurrent use.
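
As a rough illustration of the `once_flag` pattern mentioned above (a sketch under assumptions, not the code from this patch), a lazily initialized wrapper could look like this. `LazyPlatform` and its members are hypothetical names; the expensive vendor initialization is replaced by a counter.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a plugin/platform wrapper.
class LazyPlatform {
  std::string Name;
  std::once_flag InitFlag;
  int InitCount = 0; // counts how often the expensive init actually ran

public:
  explicit LazyPlatform(std::string Name) : Name(std::move(Name)) {}

  // Initializes on first use. std::call_once runs the callable exactly once;
  // concurrent first callers block until it has finished, and later callers
  // only pay for a cheap check.
  const std::string &get() {
    std::call_once(InitFlag, [this] {
      // Placeholder for the expensive vendor runtime initialization.
      ++InitCount;
    });
    assert(InitCount == 1 && "initialization must have run exactly once");
    return Name;
  }

  int initCount() const { return InitCount; }
};
```

A program that never calls `get()` on a given platform never pays its initialization cost.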

Making full use of this will require a way to filter platforms more
specifically. I'm thinking of what this would look like as an API.
I'm thinking that we either have an extra iterate function that takes a
callback on the platform, or we just provide a helper to find all the
devices that can run a given image. Maybe both?

Fixes: https://github.com/llvm/llvm-project/issues/159636
2025-10-14 09:35:53 -05:00
Jan Patrick Lehr
6eef045365
[Offload] Silence warning via maybe unused (NFC) (#163076)
2025-10-12 17:28:46 +02:00
agozillon
9155b318f2
[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types (#154349)
This PR adds deferral of descriptor maps until they are necessary for
assumed dummy argument types. The intent is to avoid a problem where a
user can inadvertently map a temporary local descriptor to device
without their knowledge and proceed to never unmap it. This temporary
local descriptor remains lodged in OpenMP device memory and the next
time another variable or descriptor residing in the same stack address
is mapped we incur a runtime OpenMP map error as we try to remap the
same address.

This fix was discussed with the OpenMP committee and applies to OpenMP
5.2 and below, future versions of OpenMP can avoid this issue via the
attach semantics added to the specification.
2025-10-09 17:52:41 +02:00
Alex Duran
45757b9284
[OFFLOAD] Remove unused init_device_info plugin interface (#162650)
This was used for the old interop code. It's dead code after #143491
2025-10-09 08:38:24 -05:00
Joseph Huber
095877c12e
[Offload] Fix isValidBinary segfault on host platform
Summary:
Need to verify this actually has a device. We really need to rework this
to point to a real implementation, or streamline it to handle this
automatically.
2025-10-06 14:46:50 -05:00
Joseph Huber
8763812b4c
[Offload] Remove check on kernel argument sizes (#162121)
Summary:
This check is unnecessarily restrictive and currently incorrectly fires
for any size less than eight bytes. Just remove it, we do sanity checks
elsewhere and at some point need to trust the ABI.
2025-10-06 12:49:44 -05:00
Alex Duran
902fe02e87
[OFFLOAD] Restore interop functionality (#161429)
This implements two pieces to restore the interop functionality (that I
broke) when the 6.0 interfaces were added:

* A set of wrappers that support the old interfaces on top of the new
ones
* The same level of interop support for the CUDA and AMD plugins
2025-10-02 21:48:31 +02:00
Akash Banerjee
ed12dc5e30
[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160766)
This PR adds support for nested derived types and their mappers to the
MapInfoFinalization pass.

- Generalize MapInfoFinalization to add child maps for arbitrarily
nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build
full coordinate_of chains; append members with correct membersIndex.

This fixes #156461.
2025-10-02 16:15:16 +00:00
Joseph Huber
0fcce4fb7b
[OpenMP] Mark problematic tests as XFAIL / UNSUPPORTED (#161267)
Summary:
Several of these tests have been failing for literal years. Ideally we
make efforts to fix this, but keeping these broken has had serious
consequences on our testing infrastructure where failures are the norm
so almost all test failures are disregarded. I made a tracking issue for
the ones that have been disabled.

https://github.com/llvm/llvm-project/issues/161265
2025-09-29 15:17:55 -05:00
Joseph Huber
786358a3d7
[Offload] Fix incorrect size used in llvm-offload-device-info tool
Summary:
This was not using the size previously queried and would fail when the
implementation actually verified it.
2025-09-29 14:37:11 -05:00
Abhinav Gaba
7de73c4e9d
[OpenMP][Offload] Support PRIVATE | ATTACH maps for corresponding-pointer-initialization. (#160760)
`PRIVATE | ATTACH` maps can be used to represent firstprivate pointers
that should be initialized with the pointee's device address,
if its lookup succeeds, or retain the original host pointee's address
otherwise.

With this, for a test like the following:

  ```f90
  integer, pointer :: p(:)
  !$omp target map(p(1))
  ... print*, p(1)
  !$omp end target
  ```

The codegen can look like:
  ```llvm
   ; maps for p:
   ; &p(1),       &p(1), sizeof(p(1)),       TO|FROM              //(1)
   ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), ATTACH               //(2)
   ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), PRIVATE|ATTACH|PARAM //(3)
   call... @__omp_outlined...(ptr %ref_ptr_of_p)
  ```

* `(1)` maps the pointee `p(1)`.
* `(2)` attaches it to the (previously) mapped `ref_ptr(p)`, if present.
  It can be controlled via OpenMP 6.1's `attach(auto/always/never)`
  map-type modifiers.
* `(3)` privatizes and initializes the local `ref_ptr(p)`, which gets
  passed in as the kernel argument `%ref_ptr_of_p`. Can be skipped if p is
  not referenced directly within the region.

While similar mapping can be used for C/C++, it's more important/useful
for Fortran as we can avoid creating another argument for passing the
descriptor, and use that to initialize the private copy in the body of
the kernel.
2025-09-29 11:47:21 -07:00
Joseph Huber
44f392e999
[OpenMP] Fix 'libc' configuration when building OpenMP
Summary:
Forgot to port this option's old handling from offload. It's now way
easier since they're built in the same CMake project. Also delete the
leftover directory that's not used anymore, don't know how that was
still there.
2025-09-29 11:59:17 -05:00
Dominik Adamski
e4d94f4f7f
[OpenMP][Flang] Fix no-loop test (#161162)
Fortran no-loop test is supported only for GPU.
2025-09-29 16:01:52 +02:00
Piotr Balcer
23d08af3d4
[Offload][NFC] use unique ptrs for platforms (#160888)
Currently, devices store a raw pointer back to their owning Platform.
Platforms are stored directly inside of a vector. Modifying this vector
risks invalidating all the platform pointers stored in devices.

This patch allocates platforms individually, and changes devices to
store a reference to their platform instead of a pointer. This is safe,
because platforms are guaranteed to outlive the devices they contain.
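
The ownership change described above can be illustrated with a small sketch. `Platform`, `Device`, and `Registry` here are simplified, hypothetical stand-ins rather than the liboffload types. Because each platform lives in its own heap allocation, growing the vector moves only the `unique_ptr` handles, so references held by devices stay valid.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the real types.
struct Platform {
  std::string Name;
};

struct Device {
  Platform &Owner; // valid as long as the Platform outlives the Device
};

struct Registry {
  // Each platform is allocated individually, so vector reallocation never
  // moves the Platform objects themselves.
  std::vector<std::unique_ptr<Platform>> Platforms;

  Platform &addPlatform(std::string Name) {
    Platforms.push_back(std::make_unique<Platform>(Platform{std::move(Name)}));
    assert(Platforms.back() && "allocation produced a platform");
    return *Platforms.back();
  }
};
```

With a plain `std::vector<Platform>`, the same sequence of `addPlatform` calls could reallocate the storage and leave earlier references dangling.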
2025-09-29 07:10:26 -05:00
Kevin Sala Penades
01d761a776
[Offload] Use Error for allocating/deallocating in plugins (#160811)
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-26 13:50:00 -05:00
Dominik Adamski
83ef38a274
[Flang][OpenMP] Enable no-loop kernels (#155818)
Enable the generation of no-loop kernels for Fortran OpenMP code. target
teams distribute parallel do pragmas can be promoted to no-loop kernels
if the user adds the -fopenmp-assume-teams-oversubscription and
-fopenmp-assume-threads-oversubscription flags.

If the OpenMP kernel contains reduction or num_teams clauses, it is not
promoted to no-loop mode.

The global OpenMP device RTL oversubscription flags no longer force
no-loop code generation for Fortran.
2025-09-26 13:57:51 +02:00
Akash Banerjee
3e7e60ae5c
Revert "[Flang][OpenMP] Implicitly map nested allocatable components in derived types" (#160759)
Reverts llvm/llvm-project#160116
2025-09-25 19:53:58 +01:00
Akash Banerjee
b4f1e0e5b1
[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160116)
This PR adds support for nested derived types and their mappers to the
MapInfoFinalization pass.

- Generalize MapInfoFinalization to add child maps for arbitrarily
nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build
full coordinate_of chains; append members with correct membersIndex.
  
This fixes #156461.
2025-09-24 14:30:27 +01:00
Ross Brunton
ea0e5185e2
[Offload] Add olGetMemInfo with platform-less API (#159581)
2025-09-24 12:17:57 +01:00
Ross Brunton
e60a5733f0
[Offload] Print Image location rather than casting it (#160309)
This squishes a warning where the runtime tries to bind a StringRef to
a `%p`.
2025-09-24 10:57:55 +01:00
Alexey Sachkov
bb584644e9
[Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372)
2025-09-23 12:21:57 -05:00
Joseph Huber
204580aa8e
[Offload] Don't add the unsupported host plugin to the list (#159642)
Summary:
The host plugin is basically OpenMP specific and doesn't work very well.
Previously we were skipping over it in the list instead of just not
adding it at all.
2025-09-23 08:31:35 -05:00
Ross Brunton
fcebe6bdbb
[Offload] Re-allocate overlapping memory (#159567)
If olMemAlloc happens to allocate memory that was already allocated
elsewhere (possibly by another device on another platform), it is now
thrown away and a new allocation generated.

A new `AllocBases` vector is now available, which is an ordered list
of allocation start addresses.
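
One way an ordered list of start addresses can support overlap detection is sketched below. This is a sketch under assumptions, not the implementation from the patch: `AllocTracker` and `AllocRange` are hypothetical, and storing the size next to each base address is an assumption made to keep the example self-contained. A caller would retry the allocation whenever `insert` reports an overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical record: start address plus size of one allocation.
struct AllocRange {
  std::uintptr_t Base;
  std::size_t Size;
};

class AllocTracker {
  std::vector<AllocRange> Ranges; // sorted by Base, pairwise non-overlapping

  // First tracked range that starts strictly after Base.
  std::vector<AllocRange>::const_iterator firstAfter(std::uintptr_t Base) const {
    return std::upper_bound(
        Ranges.begin(), Ranges.end(), Base,
        [](std::uintptr_t B, const AllocRange &R) { return B < R.Base; });
  }

public:
  // True if [Base, Base + Size) intersects any tracked range. Only the two
  // neighbours in the sorted order need to be inspected.
  bool overlaps(std::uintptr_t Base, std::size_t Size) const {
    assert(Size > 0 && "zero-sized ranges are not handled in this sketch");
    auto Next = firstAfter(Base);
    if (Next != Ranges.end() && Next->Base < Base + Size)
      return true;
    if (Next != Ranges.begin()) {
      const AllocRange &Prev = *std::prev(Next);
      if (Base < Prev.Base + Prev.Size)
        return true;
    }
    return false;
  }

  // Records the range and returns true, or returns false on overlap.
  bool insert(std::uintptr_t Base, std::size_t Size) {
    if (overlaps(Base, Size))
      return false;
    Ranges.insert(firstAfter(Base), AllocRange{Base, Size});
    return true;
  }
};
```

Keeping the list sorted makes each check a binary search instead of a scan over every live allocation.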
2025-09-23 13:59:52 +01:00
Tobias Stadler
dfbd76bda0
[Remarks] Restructure bitstream remarks to be fully standalone (#156715)
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.

Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.
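
The finalization requirement described above can be illustrated with a toy serializer. This is a sketch under assumptions and does not reflect the LLVM remark classes or the bitstream format: `ToyRemarkSerializer` and its text output are hypothetical. It only shows why a string table written at the end forces an explicit finalize step while the output stream is still alive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy model: remarks refer to strings by index, and the string table is
// only written once all remarks have been emitted.
class ToyRemarkSerializer {
  std::ostream &OS;
  std::vector<std::string> StringTable;
  bool Finalized = false;

  // Returns the table index of S, adding it if it is not present yet.
  std::size_t intern(const std::string &S) {
    auto It = std::find(StringTable.begin(), StringTable.end(), S);
    if (It != StringTable.end())
      return static_cast<std::size_t>(It - StringTable.begin());
    StringTable.push_back(S);
    return StringTable.size() - 1;
  }

public:
  explicit ToyRemarkSerializer(std::ostream &OS) : OS(OS) {}

  // Finalizes automatically, which is only safe while OS is still alive.
  ~ToyRemarkSerializer() { finalize(); }

  void emit(const std::string &PassName, const std::string &RemarkName) {
    assert(!Finalized && "cannot emit after finalization");
    const std::size_t Pass = intern(PassName);
    const std::size_t Name = intern(RemarkName);
    OS << "remark " << Pass << ' ' << Name << '\n';
  }

  // Writes the string table at the end of the output; idempotent.
  void finalize() {
    if (Finalized)
      return;
    OS << "strtab";
    for (const std::string &S : StringTable)
      OS << ' ' << S;
    OS << '\n';
    Finalized = true;
  }
};
```

If the stream were destroyed before the serializer, the destructor would write to a dead stream, which is the situation an explicit finalize call avoids.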

This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.

Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-22 16:41:39 +01:00
Joseph Huber
23efc67e19
[Offload] Remove non-blocking allocation type (#159851)
Summary:
This was originally added in as a hack to work around CUDA's limitation
on allocation. The `libc` implementation now isn't even used for CUDA so
this code is never hit. Even in this case, this code never truly worked.

A true solution would be to use CUDA's virtual memory API instead to
allocate 2MiB slabs independently from the normal memory management
done in the stream.
2025-09-20 09:07:14 -05:00
Joseph Huber
580860e8b7
[OpenMP][NFC] Clean up a bunch of warnings and clang-tidy messages (#159831)
Summary:
I made the GPU flags accept more of the default LLVM warnings, which
triggered some new cases. Clean those up and fix some other ones while
I'm at it.
2025-09-19 14:09:33 -05:00