The purpose of this PR is to add support for the host as an offloading
device to liboffload. Both OpenMP and SYCL support offloading to the host as
part of their normal workflow and therefore require this capability from
the liboffload library.
Summary:
This change allows lambdas to be used in the RPC dispatching functions.
Just requires an extra function trait to convert a lambda with no
captures into a function pointer. Also rearranged where the `Port` lives
because it looks better now that we may use a lambda and it's more
consistent with the dispatch usage (putting the client at the start).
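To illustrate the kind of trait described above, here is a minimal sketch of converting a capture-less lambda into a function pointer. The names `function_traits`, `to_function_pointer`, `dispatch`, and `run_example` are hypothetical and are not taken from the RPC headers; the real trait may look different.

```cpp
#include <type_traits>

// Hypothetical trait: given the type of a lambda's call operator, name the
// plain function-pointer type with the same signature. Only the non-mutable,
// non-noexcept form is handled in this sketch.
template <typename T> struct function_traits;

template <typename C, typename R, typename... Args>
struct function_traits<R (C::*)(Args...) const> {
  using pointer_type = R (*)(Args...);
};

// Convert a capture-less lambda to a function pointer. A lambda with captures
// has no such conversion, so the static_cast fails to compile for it.
template <typename Fn> auto to_function_pointer(Fn fn) {
  using Ptr = typename function_traits<decltype(&Fn::operator())>::pointer_type;
  return static_cast<Ptr>(fn);
}

// Stand-in for a dispatch routine that only accepts function pointers.
inline int dispatch(int (*handler)(int), int arg) { return handler(arg); }

inline int run_example() {
  auto fp = to_function_pointer([](int x) { return x + 1; });
  static_assert(std::is_same<decltype(fp), int (*)(int)>::value,
                "the lambda was converted to a plain function pointer");
  return dispatch(fp, 41);
}
```

The conversion relies on the implicit conversion operator that every capture-less lambda provides; the trait only spells out the target type so the cast can be written generically.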
The default Level Zero loader `libze_loader.so` may not be available on
systems that don't have the Level Zero development package installed. In
that case, Level Zero loaders with a major version suffix are searched for.
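The fallback search can be sketched as follows. This is an illustration only: the candidate list, in particular the `.1` suffix, and the helper names `loaderCandidates` and `openFirstAvailable` are assumptions, and the opener is passed in as a callback so the sketch does not depend on a particular dynamic-loading API.

```cpp
#include <functional>
#include <string>
#include <vector>

// Callback that tries to open a library by name and returns a handle, or
// nullptr on failure. In a real plugin this would wrap something like dlopen.
using OpenFn = std::function<void *(const std::string &)>;

// Candidate names, tried in order. The unversioned name is the default; the
// suffixed name is an assumption made for this sketch.
inline std::vector<std::string> loaderCandidates() {
  return {"libze_loader.so", "libze_loader.so.1"};
}

// Returns the handle of the first candidate that can be opened, or nullptr if
// none can. If `loaded` is non-null it receives the name that succeeded.
inline void *openFirstAvailable(const std::vector<std::string> &names,
                                const OpenFn &open,
                                std::string *loaded = nullptr) {
  for (const std::string &name : names) {
    if (void *handle = open(name)) {
      if (loaded)
        *loaded = name;
      return handle;
    }
  }
  return nullptr;
}
```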
Summary:
The RPC interface is useful for forwarding functions. This PR adds
helper functions for doing a completely bare forwarding of a function
from the client to the server. This is intended to facilitate
heterogeneous libraries that implement host functions on the GPU (like
MPI or Fortran).
Changes to make host plugin compile on Windows:
* Change IO code to be portable
* Adjust Makefiles
Allow the plugin to work partially when libffi cannot be loaded
dynamically (compilation works fine even on Windows because of the
wrapper support).
Allow defining a set of Types that are not shown by default when
doing default debug logging (e.g., LIBOMPTARGET_DEBUG=All).
Users can enable output of those types of messages by explicitly adding
them to LIBOMPTARGET_DEBUG.
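The intended filtering could look roughly like the sketch below. The type names (`Verbose`, `Init`, `Mapping`), the comma-separated syntax, and the helper names are all assumptions made for illustration; the actual parsing of LIBOMPTARGET_DEBUG may differ.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical set of debug types that "All" does not enable.
inline const std::set<std::string> &hiddenByDefault() {
  static const std::set<std::string> hidden = {"Verbose"};
  return hidden;
}

// `setting` is assumed to be a comma-separated list of type names.
// A type is enabled if it is listed explicitly, or if "All" is listed and
// the type is not in the hidden-by-default set.
inline bool isTypeEnabled(const std::string &setting, const std::string &type) {
  bool sawAll = false;
  std::istringstream in(setting);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item == type)
      return true;
    if (item == "All")
      sawAll = true;
  }
  return sawAll && hiddenByDefault().count(type) == 0;
}
```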
Used to implement: #180545
---------
Co-authored-by: Michael Klemm <michael.klemm@amd.com>
I'll merge this at the same time as some llvm-zorg changes that start
building the DeviceRTL.
We only see one new test passing because everything still fails because
of the issue described in
https://github.com/llvm/llvm-project/pull/178980
Once a fix for that issue is merged we will see many new passes.
Summary:
The static object mixes callbacks from different plugins because ever
since we moved to the object library target these are actually shared.
Just make it a member of the base class and make it a pointer set just
to do some basic deduplication.
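A minimal sketch of the idea, with a hypothetical `PluginBase` class and callback type that are not the real plugin interface: each instance owns its own pointer set, so callbacks from different plugins no longer mix, and inserting the same pointer twice is a no-op.

```cpp
#include <cstddef>
#include <set>

// Hypothetical callback signature used only for this sketch.
using Callback = void (*)();

class PluginBase {
public:
  // Returns true if the callback was newly registered, false if this plugin
  // already had it (the set deduplicates by pointer value).
  bool registerCallback(Callback cb) { return callbacks.insert(cb).second; }

  std::size_t numCallbacks() const { return callbacks.size(); }

  void runCallbacks() const {
    for (Callback cb : callbacks)
      cb();
  }

private:
  // Per-instance storage: a static member here would be shared by every
  // plugin linked from the same object library.
  std::set<Callback> callbacks;
};
```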
Right now if we run `check-offload` for SPIR-V the DeviceRTL isn't used
because we pass `-nogpulib`.
Don't pass that, but also don't pass `--libomptarget-spirv-bc-path` yet
because the DeviceRTL is brand new so we don't want to error if it's not
present.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
OffloadBinary::create() now returns
`Expected<SmallVector<unique_ptr<OffloadBinary>>>`
instead of a single unique_ptr, to support multiple entries in the version 2
format.
Updated the DeviceImageTy constructor to extract the first binary from the
returned vector, with an empty check. In this context, only one image per
OffloadBinary is expected.
Summary:
Right now this will fail because the GPU architectures will attempt to
build all of `offload` with the GPU, which obviously won't work. In the
future we will probably have some utility library that we will route
through this, but for now just silently return. This is useful because
the documentation states to use this, but it doesn't work right now.
```
-DLLVM_ENABLE_RUNTIMES=offload;openmp
-DLLVM_RUNTIME_TARGETS=default;amdgcn-amd-amdhsa
```
This PR makes this work.
Summary:
We provide an RPC server to manage calls initiated by the device to run
on the host. This is very useful for the built-in handling we have,
however there are cases where we would want to extend this
functionality.
Cases like Fortran or MPI would be useful, but we cannot put references
to these in the core offloading runtime. This way, we can provide this
as a library interface that registers custom handlers for whatever code
people want.
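As a rough picture of what registering custom handlers means, the sketch below keeps a table from opcode to handler and falls through when no handler is registered. `HandlerRegistry`, its methods, and the opcode/argument types are hypothetical and do not reflect the actual RPC server interface.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

class HandlerRegistry {
public:
  using Handler = std::function<std::uint64_t(std::uint64_t)>;

  // Registers a handler for an opcode. Returns false if the opcode is taken.
  bool registerHandler(std::uint32_t opcode, Handler handler) {
    return handlers.emplace(opcode, std::move(handler)).second;
  }

  // Runs the handler for `opcode` if one exists and stores its result.
  // Returns false for unknown opcodes so the caller can try built-in handling.
  bool handle(std::uint32_t opcode, std::uint64_t arg,
              std::uint64_t &result) const {
    auto it = handlers.find(opcode);
    if (it == handlers.end())
      return false;
    result = it->second(arg);
    return true;
  }

private:
  std::map<std::uint32_t, Handler> handlers;
};
```

Keeping the table outside the core runtime is the point of the design: a Fortran or MPI support library can add its own entries without the runtime referencing those libraries.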
Recursive types can cause re-entrant mapper emission. The mapper
function is created by OpenMPIRBuilder before the callbacks run, so it
may already exist in the LLVM module even though it is not yet
registered in the ModuleTranslation mapping table. Reuse and register it
to break the recursion. Added offloading test.
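The reuse-and-register step can be modelled with plain containers. In this sketch `MapperTranslator`, `moduleFunctions`, and `mapperTable` are hypothetical stand-ins for the LLVM module and the ModuleTranslation mapping table; the actual lowering code is structured differently.

```cpp
#include <map>
#include <set>
#include <string>

struct MapperTranslator {
  // type -> member type that also needs a mapper (may be the type itself).
  std::map<std::string, std::string> memberTypes;
  // Functions that already exist in the module.
  std::set<std::string> moduleFunctions;
  // type -> mapper function registered in the translation table.
  std::map<std::string, std::string> mapperTable;
  int bodiesEmitted = 0;

  std::string getOrEmitMapper(const std::string &type) {
    auto registered = mapperTable.find(type);
    if (registered != mapperTable.end())
      return registered->second;

    const std::string name = "mapper." + type;
    // The function exists but is not registered yet: we are inside a
    // re-entrant emission. Reuse and register it instead of emitting again.
    if (moduleFunctions.count(name)) {
      mapperTable[type] = name;
      return name;
    }

    // The function is created first, then the body callback runs, and the
    // callback may ask for the mapper of a member type.
    moduleFunctions.insert(name);
    ++bodiesEmitted;
    auto member = memberTypes.find(type);
    if (member != memberTypes.end())
      getOrEmitMapper(member->second);

    mapperTable[type] = name;
    return name;
  }
};
```

Without the middle check, a type that refers to itself would recurse without end in this model.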
## Summary
- Fix uninitialized output parameter in `olQueryQueue_impl` when
`Queue->AsyncInfo->Queue` is null
- Set `IsQueueWorkCompleted` to `true` when no underlying queue exists
(no pending work)
- Resolves test failure on AMDGPU for
`olQueryQueueTest.SuccessEmptyAsyncQueueCheckResult`
Fixes #178462.
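The shape of the fix, sketched with stand-in types (the real `olQueryQueue_impl` and queue structures differ, and `Busy` is a hypothetical field): the output parameter is written on every successful path, including the one where no underlying queue exists.

```cpp
// Stand-in types for this sketch only.
struct AsyncInfo {
  void *Queue = nullptr; // underlying device queue, may be null
  bool Busy = false;     // hypothetical flag: work still pending
};

struct QueueImpl {
  AsyncInfo *Info = nullptr;
};

// Returns false only on invalid arguments. On success the output parameter
// is always initialized.
inline bool queryQueue(const QueueImpl &Q, bool *IsQueueWorkCompleted) {
  if (!IsQueueWorkCompleted)
    return false;
  if (!Q.Info || !Q.Info->Queue) {
    // No underlying queue means there is no pending work.
    *IsQueueWorkCompleted = true;
    return true;
  }
  *IsQueueWorkCompleted = !Q.Info->Busy;
  return true;
}
```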
## Test plan
- [x] Fixed
`OffloadAPI/queue.unittests/olQueryQueueTest/SuccessEmptyAsyncQueueCheckResult/AMDGPU_AMD_Radeon_RX_7700_XT_0`
test
- [ ] CI tests pass
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Fix OpenMP mapper lowering by attaching user-defined/default mappers
only to the base parent entry, not combined/segment entries. This
prevents mapper calls with partial sizes. Added relevant tests.
All error cases in this tool are usage errors (bad user input, missing
files, malformed JSON) rather than internal LLVM bugs, so
`reportFatalUsageError` is the appropriate replacement.
This reverts commit 5a457837dd988aa01c65820848381a5b99a74c0a.
Includes the test fix from
https://github.com/llvm/llvm-project/pull/177659.
The test had to be updated to exclude a scenario that was failing
with/without the change (involving mapping a struct with a byref member
with a mapper).
-----
**Original PR's description:**
This is a fix for https://github.com/llvm/llvm-project/issues/61636.
Ravi had this implemented downstream before he retired. This PR is a
cherry-pick of that.
The test is taken from @jdoerfert's WIP change in
527bf4b129.
The change partially undoes the changes done in
0caf736d7e1d16d1059553fc28dbac31f0b9f788, so @alexey-bataev might need
to take a look.
Make implicit default mapper generation respect defaultmap categories so
unrelated defaultmap clauses no longer suppress mappers for derived
types.
Added related tests.
This is a fix for https://github.com/llvm/llvm-project/issues/61636.
Ravi had this implemented downstream before he retired. This PR is a
cherry-pick of that.
The test is taken from @jdoerfert's WIP change in
527bf4b129.
The change partially undoes the changes done in
0caf736d7e1d16d1059553fc28dbac31f0b9f788, so @alexey-bataev might need
to take a look.
---------
Co-authored-by: Ravi Narayanaswamy <ravi.narayanaswamy@intel.com>
Co-authored-by: Johannes Doerfert <johannes@jdoerfert.de>
The compiler skips mapping of named constants (parameters) to OpenMP
target regions under the assumption that constants don't need to be
mapped. This assumption is not valid when a constant array is accessed
inside the region with a dynamic index. The problem can be seen with the
following code:
```
module fir_lowering_check
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp), parameter :: arrays(2) = (/ 0.0, 0.0 /)
contains
  subroutine test(hold)
    integer, intent(in) :: hold
    integer :: z
    real(dp) :: temp
    !$omp target teams distribute parallel do
    do z = 1, 2
      temp = arrays(hold)
    end do
    !$omp end target teams distribute parallel do
  end subroutine test
end module fir_lowering_check

program main
  use fir_lowering_check
  implicit none
  integer :: hold
  hold = 1
  call test(hold)
  print *, "Finished"
end program main
```
It fails with the following error:
`'hlfir.designate' op using value defined outside the region`
The fix is to allow mapping of constant arrays and map them as `to`.
Depends on #170578.
The fallback modifiers are currently part of OpenMP 6.1. Four of the eight
tests check for the current bad output, with FIXME comments.
Three of these "bad" tests will be fixed by the 4th PR in this stack, which
contains the `fb_nullify` codegen changes.
The fourth bad test will need a follow-up fix to the privatization of byref
`use_device_ptr` operands.
Dependent PR: #173931.
There are a few places where values whose types are based on a character
array or string are printed in debug messages even though they do not
represent strings. Such expressions should be cast to `void *` unless they
represent actual strings. The change also includes casting from integral
types to pointer types when appropriate.
Per the documentation, a call to the dataExchange API (which moves a memory
block between different devices) is permitted only if isDataExchangable()
returned true. While almost all platforms support memory transfer between
different devices, a transfer attempted between devices that belong to
different platforms present on the same machine can lead to unexpected
results. This PR adds a check for whether dataExchange can be called and,
if not, works around the limitation by routing the memory transfer through
the host.
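The check-then-fallback logic can be sketched as below. The `Device` struct, the `Path` enum, `copyBetween`, and the rule that devices are exchangeable only when they share a platform are assumptions made for illustration; the free function `isDataExchangable` here is a stand-in that only borrows the name of the real query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in device: a platform id plus a block of "device memory".
struct Device {
  int platform;
  std::vector<unsigned char> memory;
};

// Assumption for this sketch: direct exchange works only within a platform.
inline bool isDataExchangable(const Device &src, const Device &dst) {
  return src.platform == dst.platform;
}

enum class Path { Direct, ThroughHost };

// Copies `size` bytes from src to dst and reports which path was used.
inline Path copyBetween(const Device &src, Device &dst, std::size_t size) {
  assert(size <= src.memory.size() && size <= dst.memory.size());
  if (isDataExchangable(src, dst)) {
    // Direct device-to-device transfer.
    std::copy(src.memory.begin(), src.memory.begin() + size,
              dst.memory.begin());
    return Path::Direct;
  }
  // Workaround: stage the data in a host buffer, then submit it to dst.
  std::vector<unsigned char> host(src.memory.begin(),
                                  src.memory.begin() + size);
  std::copy(host.begin(), host.end(), dst.memory.begin());
  return Path::ThroughHost;
}
```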
Add liboffload asynchronous queue query API for libomptarget migration
This PR adds the liboffload asynchronous queue query API that is needed to
make libomptarget use liboffload.
PR #144635 enabled non-contiguous updates for both `update from` and
`update to` clauses, but tests for `update to` were missing. This PR
adds those missing tests to ensure coverage.
Convert the first AMDGPU buildbots to use the ScriptedBuilder introduced
in llvm-zorg. For the motivation, see
https://github.com/llvm/llvm-zorg/pull/648.
Since the production buildbot still needs to be restarted for
ScriptedBuilder to work, only convert the builders that are currently in
staging for now. These are:
* openmp-offload-amdgpu-runtime
* openmp-offload-amdgpu-clang-flang
Both of them happened to be OpenMPBuilder.getOpenMPCMakeBuildFactory-based
builders before this change. They also set an environment variable that
the previous ScriptedBuilder did not, so this change adds support for it.
The corresponding llvm-zorg change is
https://github.com/llvm/llvm-zorg/pull/697.
Eventually we might want to rework the INFO macro to work like the new
ODBG macro but in the meantime at least translate the Info type to the
correct Debug type instead of just using DP directly (which uses the
default type).
To reduce interference between threads, instead of writing the
components of a debug message directly to the underlying stream, write
them to a buffer and flush the buffer to the stream when it's complete.
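A minimal sketch of the buffering approach, using a hypothetical `BufferedMessage` class rather than the actual debug macros: the pieces of a message accumulate in a private buffer and reach the shared stream in a single write when the message object goes out of scope.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>

class BufferedMessage {
public:
  explicit BufferedMessage(std::ostream &out) : out(out) {}

  // The whole message is written at once, under a lock, when it is complete.
  ~BufferedMessage() {
    static std::mutex streamMutex;
    std::lock_guard<std::mutex> lock(streamMutex);
    out << buffer.str();
    out.flush();
  }

  // Components go to the private buffer, not to the shared stream.
  template <typename T> BufferedMessage &operator<<(const T &value) {
    buffer << value;
    return *this;
  }

private:
  std::ostream &out;
  std::ostringstream buffer;
};
```

With direct writes, two threads emitting `a << b << c` can interleave their components; with the buffer, each message lands in the stream as one unit.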
Depends on #174659.
This PR adds a new map-type bit to control the fallback behavior when
a pointer lookup fails.
For now, this is only meaningful with `RETURN_PARAM`, and can be used
for `need_device_ptr` (for which the default is to use `nullptr` as the
result when lookup fails), and OpenMP 6.1's `use_device_ptr(fb_nullify)`.
Eventually, this can be extended to work with assumed-size maps on
`target` constructs, to control what the argument should be set to when
lookup fails (the OpenMP spec does not have a way to control that yet).
Dependent PR: #170578.