43 Commits

Author SHA1 Message Date
Joseph Huber
38b3f45a81 [Offload] Fix offload-info interface
Summary:
The offload info tool doesn't initialize things properly, just check
this first instead.
2025-01-27 10:36:09 -06:00
Joseph Huber
f07505849c [Offload] Fix server thread from being shut down if unused 2025-01-27 08:29:41 -06:00
Joseph Huber
e7592d83e0 [Offload][NFC] Make sure the thread is not running already 2025-01-27 08:06:29 -06:00
Joseph Huber
134401deea
[Offload] Move RPC server handling to a dedicated thread (#112988)
Summary:
Handling the RPC server requires running through list of jobs that the
device has requested to be done. Currently this is handled by the thread
that does the waiting for the kernel to finish. However, this is not
sound on NVIDIA architectures and only works for async launches in the
OpenMP model that uses helper threads.

However, we also don't want to have this thread doing work
unnnecessarily. For this reason we track the execution of kernels and
cause the thread to sleep via a condition variable (usually backed by
some kind of futex or other intelligent sleeping mechanism) so that the
thread will be idle while no kernels are running.
2025-01-24 11:36:45 -06:00
Joseph Huber
6518b121f0
[Offload][NFC] Factor out and rename the __tgt_offload_entry struct (#123785)
Summary:
This patch is an NFC renaming to make using the offloading entry type
more portable between other targets. Right now this is just moving its
definition to LLVM so others can use it. Future work will rework the
struct layout.
2025-01-21 12:05:24 -06:00
Jinsong Ji
8d1d67ec4d
[Offload][PGO] Fix dump of array in ProfData (#122039)
Exposed by -Warray-bounds:

In file included from
../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252:

../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1:
error: array index 4 is past the end of the array (that has type 'const
std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned
short[4]')) [-Werror,-Warray-bounds]
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy,
NumValueSites[IPVK_Last+1], \
| ^ ~~~~~~~~~~~

../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15:
note: expanded from macro 'INSTR_PROF_DATA'
250 | outs() << ProfData.Name << " "; \
      |               ^        ~~~~

../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1:
note: array 'NumValueSites' declared here
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy,
NumValueSites[IPVK_Last+1], \
      | ^

../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3:
note: expanded from macro 'INSTR_PROF_DATA'
   62 |   std::remove_const<Type>::type Name;

Avoid accessing out-of-bound data, but skip printing array data for now.
As there is no simple way to do this without hardcoding the
NumValueSites field.

---------

Co-authored-by: Ethan Luis McDonough <ethanluismcdonough@gmail.com>
2025-01-14 15:46:27 -05:00
wanglei
bdf727065b
[Offload] Add support for loongarch64 to host plugin
This adds support for the loongarch64 architecture to the offload host
plugin.

Similar to #115773

To fix some test issues, I've had to add the LoongArch64 target to:

- CompilerInvocation::ParseLangArgs
- linkDevice in ClangLinuxWrapper.cpp
- OMPContext::OMPContext (to set the device_kind_cpu trait)

Reviewed By: jhuber6

Pull Request: https://github.com/llvm/llvm-project/pull/120173
2024-12-17 19:06:10 +08:00
Jinsong Ji
e85a9f5540
libc: Prefix RPC Status code to avoid conflict in windows build (#119991)
Somehow conflict with define in wingdi.h.

Fix build failures:

[ 52%] Building CXX object
projects/offload/plugins-nextgen/common/CMakeFiles/PluginCommon.dir/src/RPC.cpp.obj
In file included from
...llvm\offload\plugins-nextgen\common\src\RPC.cpp:16:
...\llvm\libc\shared\rpc.h(48,3): error: expected identifier
   48 |   ERROR = 0x1000,
      |   ^
c:\Program files (x86)\Windows
Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from
macro 'ERROR'
  118 | #define ERROR               0
      |                             ^
...\llvm\offload\plugins-nextgen\common\src\RPC.cpp(75,17): error:
expected unqualified-id
   75 |     return rpc::ERROR;
      |                 ^
c:\Program files (x86)\Windows
Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from
macro 'ERROR'
  118 | #define ERROR               0
      |                             ^
2 errors generated.
2024-12-15 09:35:44 -05:00
hidekisaito
f2bceb2311
[Offload][AMDGPU] accept generic target (#118919)
Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to
run on gfx11-generic ISA capable device.

Executable may contain one ELF that has specific target ISA and another
ELF that has compatible generic ISA.
Under that circumstance, this code should say both ELFs are compatible,
leaving the rest to PluginManager to handle.
Suggestions on how best to address that is welcome.
2024-12-09 19:11:38 -05:00
Shilei Tian
92376c3ff5
[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042) 2024-12-06 09:07:50 -05:00
Shilei Tian
eb49788bd9
[Offload][AMDGPU] Allow COV6 images (#118909) 2024-12-05 20:05:39 -05:00
Shilei Tian
68bcba6d7a Revert "[AMDGPU] Use COV6 by default (#118515)"
This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some
buildbots are not ready yet.
2024-12-03 20:17:06 -05:00
Shilei Tian
410cbe3cf2
[AMDGPU] Use COV6 by default (#118515) 2024-12-03 19:38:35 -05:00
Joseph Huber
a6ef0debb1 [libc][NFC] Rename RPC opcodes to better reflect their usage
Summary:
RPC_ is a generic prefix here, use LIBC_ to indicate that these are
opcodes used to implement the C library
2024-12-02 15:35:08 -06:00
Joseph Huber
91f5f974cb
[OpenMP] Unconditionally provide an RPC client interface for OpenMP (#117933)
Summary:
This patch adds an RPC interface that lives directly in the OpenMP
device runtime. This allows OpenMP to implement custom opcodes.
Currently this is only providing the host call interface, which is the
raw version of reverse offloading. Previously this lived in `libc/` as
an extension which is not the correct place.

The interface here uses a weak symbol for the RPC client by the same
name that the `libc` interface uses. This means that it will defer to
the libc one if both are present so we don't need to set up multiple
instances.

The presense of this symbol is what controls whether or not we set up
the RPC server. Because this is an external symbol it normally won't be
optimized out, so there's a special pass in OpenMPOpt that deletes this
symbol if it is unused during linking. That means at `O0` the RPC server
will always be present now, but will be removed trivially if it's not
used at O1 and higher.
2024-12-02 14:31:51 -06:00
Joseph Huber
1d810ece2b
[libc] Move libc server handlers to a shared header (#117908)
Summary:
We can simply include this header from the shared directory now and do
not need to have this level of indirection. Simply stash it with the
other libc opcode handlers.

If we were able to move the printf handlers to the shared directory then
this could just be a header as well, which would HEAVILY simplify the
mess associated with building the RPC server first in the projects
build, then copying it to the runtimes build.
2024-11-27 14:57:52 -06:00
Joseph Huber
89d8e70031
[libc] Export a pointer to the RPC client directly (#117913)
Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable and simply makes it more complicated. Here we use the
`asm` syntax to give the C++ variable a valid name so that we can just
copy to it directly.

Another advantage to this, is that if users want to piggy-back on the
same RPC interface they need only declare theirs as extern with the same
symbol name, or make it weak to optionally use it if LIBC isn't
avaialb.e
2024-11-27 14:57:38 -06:00
Joseph Huber
d7c20a6f0c [libc][NFC] Move RPC opcodes to the 'shared/' directory as well 2024-11-25 12:04:10 -06:00
Joseph Huber
b4d49fb52e
[libc] Remove RPC server API and use the header directly (#117075)
Summary:
This patch removes much of the `llvmlibc_rpc_server` interface. This
pretty much deletes all of this code and just replaces it with including
`rpc.h` directly. We still maintain the file to let `libc` handle the
opcodes, since those depend on the `printf` impelmentation.

This will need to be cleaned up more, but I don't want to put too much
into a single patch.
2024-11-25 07:13:28 -06:00
Ivan Radanov Ivanov
0a27e4eed4 [offload] Fix copy-paste defect in error message 2024-11-19 17:12:51 +09:00
Matin Raayai
bb3f5e1fed
Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234)
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.

cc @arsenm @aeubanks
2024-11-14 13:30:05 -08:00
aurel32
b6bd7477a9
[Offload] Add support for riscv64 to host plugin (#115773)
This adds support for the riscv64 architecture to the offload host
plugin. The check to define FFI_DEFAULT_ABI is intentionally not guarded
by __riscv_xlen as the value is the same for riscv32 and riscv64
(support for OpenMP on riscv32 is still under review).
2024-11-13 08:15:49 -06:00
Johannes Doerfert
08533a3ee8
[Offload][NFC] Reorganize utils:: and make Device/Host/Shared clearer (#100280)
We had three `utils::` namespaces, all with different "meaning" (host,
device, hsa_utils). We should, when we can, keep "include/Shared"
accessible from host and device, thus RefCountTy has been moved to a
separate header. `hsa_utils` was introduced to make `utils::` less
overloaded. And common functionality was de-duplicated, e.g.,
`utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type
punning now checks for the size of the result to make sure it matches
the source type.

No functional change was intended.
2024-09-05 13:36:26 -07:00
Ethan Luis McDonough
fde2d23ee2
[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.

> This pull request is the first part of an ongoing effort to extends
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-08-22 01:10:54 -05:00
Johannes Doerfert
80525dfcde
[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.

As we are about to redefine the Offload API, this wil help us in the
design process as a second offload language.

We do not support any CUDA APIs yet, however, we could:
  https://www.osti.gov/servlets/purl/1892137

For proper host execution we need to resurrect/rebase
  https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).

```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native

❯❯❯ llvm-objdump --offloading test123

test123:        file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
2024-08-12 17:44:58 -07:00
Johannes Doerfert
9a1013220b
[Offload] Allow to record kernel launch stack traces (#100472)
Similar to (de)allocation traces, we can record kernel launch stack
traces and display them in case of an error. However, the AMD GPU plugin
signal handler, which is invoked on memroy faults, cannot pinpoint the
offending kernel. Insteade print `<NUM>`, set via
`OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`, many traces. The
recoding/record uses a ring buffer of fixed size (for now 8).
For `trap` errors, we print the actual kernel name, and trace if
recorded.
2024-07-31 11:49:50 -07:00
Johannes Doerfert
c95abe94ae
[Offload] Implement double free (and other allocation error) reporting (#100261)
As a first step towards a GPU sanitizer we now can track allocations and
deallocations in order to report double frees, and other problems during
deallocation.
2024-07-30 10:10:57 -07:00
Ethan Luis McDonough
2c8b912f63
Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4
[PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extends PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
Tim Gymnich
597d2f7662
[OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts (#89239)
Sometimes it might be beneficial to spawn more thread blocks instead of
reusing existing for multiple loop iterations.

**Alternatives considered:**

Make `DefaultNumBlocks` settable via an environment variable.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-06-14 07:35:23 -07:00
Johannes Doerfert
54b5c76d3b
[Offload] Use flat array for cuLaunchKernel (#95116)
We already used a flat array of kernel launch parameters for the AMD GPU
launch but now we also use this scheme for the NVIDIA GPU launch. The
only remaining/required use of the indirection is the host plugin (due
ot ffi). This allows to us simplify the use for non-OpenMP kernel
launch.
2024-06-13 09:43:47 +03:00
Joseph Huber
435aa7663d
[Libomptarget] Rework device initialization and image registration (#93844)
Summary:
Currently, we register images into a linear table according to the
logical OpenMP device identifier. We then initialize all of these images
as one block. This logic requires that images are compatible with *all*
devices instead of just the one that it can run on. This prevents us
from running on systems with heterogeneous devices (i.e. image 1 runs on
device 0 image 0 runs on device 1).

This patch reworks the logic by instead making the compatibility check a
per-device query. We then scan every device to see if it's compatible
and do it as they come.
2024-06-06 08:10:56 -05:00
Joseph Huber
21f3a6091f
[Offload] Only initialize a plugin if it is needed (#92765)
Summary:
Initializing the plugins requires initializing the runtime like CUDA or
HSA. This has a considerable overhead on most platforms, so we should
only actually initialize a plugin if it is needed by any image that is
loaded.
2024-05-23 09:36:47 -05:00
Joseph Huber
f42f57b52d
[Libomptarget] Rework Record & Replay to be a plugin member (#88928) (#89097)
Summary:
Previously, the R&R support was global state initialized by a global
constructor. This is bad because it prevents us from adequately
constraining the lifetime of the library. Additionally, we want to
minimize the amount of global state floating around.

This patch moves the R&R support into a plugin member like everything
else. This means there will be multiple copies of the R&R implementation
floating around, but this was already the case given the fact that we
currently handle everything with dynamic libraries.
2024-05-16 14:58:46 -05:00
Joseph Huber
3abd3d6e59
[Libomptarget] Remove requires information from plugin (#80345)
Summary:
Currently this is only used for the zero-copy handling. However, this
can easily be moved into `libomptarget` so that we do not need to bother
setting the requires flags in the plugin. The advantage here is that we
no longer need to do this for every device redundently. Additionally,
these requires flags are specifically OpenMP related, so they should
live in `libomptarget`.
2024-05-16 11:13:50 -05:00
Joseph Huber
81d20d861e [Offload][NFC] Fix warning messages in runtime
Summary:
These are lots of random warnings due to inconsistent initialization or
signedness.
2024-05-15 15:30:38 -05:00
Joseph Huber
363258a3cc
[Offload] Remove old references to isCtor (#91766)
Summary:
These have long since been removed, support for ctors / dtors now
happens through special kernels the backend creates.
2024-05-14 06:00:34 -05:00
Joseph Huber
fa9e90f5d2 [Reland][Libomptarget] Statically link all plugin runtimes (#87009)
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_<name>` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971
https://github.com/llvm/llvm-project/pull/86875
https://github.com/llvm/llvm-project/pull/86868
2024-05-09 09:38:22 -05:00
Joseph Huber
e5e66073c3 Revert "[Libomptarget] Statically link all plugin runtimes (#87009)"
Caused failures on build-bots, reverting to investigate.

This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.
2024-05-09 07:05:23 -05:00
Joseph Huber
80f9e814ec
[Libomptarget] Statically link all plugin runtimes (#87009)
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_<name>` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971
https://github.com/llvm/llvm-project/pull/86875
https://github.com/llvm/llvm-project/pull/86868
2024-05-09 06:35:54 -05:00
Joseph Huber
e3938f4d71
[Offload] Detect native ELF machine from preprocessor (#91282)
Summary:
This gets the target's corresponding ELF value from the preprocessor.
We use this to detect if a given ELF is compatible with the CPU
offloading impolementation for OpenMP. Previously we used defitions from
CMake, but this is easier for people to understand as there may be new
users of this in the future.
2024-05-08 14:36:58 -05:00
Jhonatan Cléto
b438a817bd
[Offload] Fix dataDelete op for TARGET_ALLOC_HOST memory type (#91134)
Summary:
The `GenericDeviceTy::dataDelete` method doesn't verify the
`TargetAllocTy` of the of the device pointer. Because of this, it can
use the `MemoryManager` to free the ptr. However, the
`TARGET_ALLOC_HOST` and `TARGET_ALLOC_SHARED` types are not allocated
using the `MemoryManager` in the `GenericDeviceTy::dataAlloc` method.
Since the `MemoryManager` uses the `DeviceAllocatorTy::free` operation
without specifying the type of the ptr, some plugins may use incorrect
operations to free ptrs of certain types. In particular, this bug causes
the CUDA plugin to use the `cuMemFree` operation on ptrs of type
`TARGET_ALLOC_HOST`, resulting in an unchecked error, as shown in the
output snippet of the test
`offload/test/api/omp_host_pinned_memory_alloc.c`:

```
omptarget --> Notifying about an unmapping: HstPtr=0x00007c6114200000
omptarget --> Call to llvm_omp_target_free_host for device 0 and address 0x00007c6114200000
omptarget --> Call to omp_get_num_devices returning 1
omptarget --> Call to omp_get_initial_device returning 1
PluginInterface --> MemoryManagerTy:🆓 target memory 0x00007c6114200000.
PluginInterface --> Cannot find its node. Delete it on device directly.
TARGET CUDA RTL --> Failure to free memory: Error in cuMemFree[Host]: invalid argument
omptarget --> omp_target_free deallocated device ptr
```

This patch fixes this by adding the check of the device pointer type
before calling the appropriate operation for each type.
2024-05-07 22:21:32 -05:00
Johannes Doerfert
330d8983d2
[Offload] Move /openmp/libomptarget to /offload (#75125)
In a nutshell, this moves our libomptarget code to populate the offload
subproject.

With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.

Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.

```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests

```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.

```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`

Fixes: https://github.com/llvm/llvm-project/issues/75124

---------

Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>
2024-04-22 09:51:33 -07:00