71 Commits

Author SHA1 Message Date
Joseph Huber
237adfca4e
[OpenMP] Rework handling of global ctor/dtors in OpenMP (#71739)
Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors and destructors, removing a class of failures related to
shared library destruction order, and allowing targets other than OpenMP
to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything automatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
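
The list iteration described above can be sketched as follows. This is
only an illustration on the host, not the code the backend emits: the
names `InitEntry` and `runInitArray` are hypothetical, and the sketch
assumes the GNU convention that constructors with a lower priority value
run first (destructors would walk the list in the opposite order).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical entry type: a priority plus a function taking no arguments.
struct InitEntry {
  uint32_t Priority;
  void (*Fn)();
};

// Call every entry in ascending priority order. Entries with the same
// priority keep their original relative order (stable sort).
inline void runInitArray(std::vector<InitEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const InitEntry &A, const InitEntry &B) {
                     return A.Priority < B.Priority;
                   });
  for (const InitEntry &E : Entries) {
    assert(E.Fn && "null constructor entry");
    E.Fn();
  }
}
```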

One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:

```
void foo();
struct S { ~S() { foo(); } };
void foo() {
  static S s;
}
```

However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.

This changes the handling of ctors / dtors. This patch now outputs an
informational message regarding the deprecation if the old format is used.
This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549
2023-11-10 14:53:53 -06:00
Konstantinos Parasyris
b34d31d2e1
[OpenMP] Fix record-replay allocation order for kernel environment (#71863) 2023-11-09 12:51:22 -08:00
Johannes Doerfert
726ee40f52 [OpenMP] Move the recording code to account for KernelLaunchEnvironment
We need to record late to account for the kernel launch environment as
well as the potential changes in block and thread count.
2023-11-06 12:30:40 -08:00
Johannes Doerfert
3de645efe3 [OpenMP][NFC] Split the reduction buffer size into two components
Before we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have fewer teams than the
maximal length. It also allows us to move code from clang's codegen into
the runtime as we now know how large the reduction data is.
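
A rough sketch of how the two components could combine into a per-launch
allocation size. The function and parameter names are hypothetical and
the formula is an assumption drawn from the description above, not taken
from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// ReductionDataSize: bytes needed for all reduction variables of one team.
// MaxBufferLength:   maximal number of slots the buffer may have.
// NumTeams:          number of teams in this particular launch.
inline size_t teamsReductionBufferBytes(size_t ReductionDataSize,
                                        uint32_t MaxBufferLength,
                                        uint32_t NumTeams) {
  assert(MaxBufferLength > 0 && "buffer needs at least one slot");
  // Allocate one slot per launched team, capped at the maximal length.
  uint32_t Slots = std::min(NumTeams, MaxBufferLength);
  return ReductionDataSize * Slots;
}
```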
2023-11-06 11:50:41 -08:00
Johannes Doerfert
a273d17d4a [OpenMP][FIX] Do not add implicit argument to device Ctors and Dtors
Constructors and destructors on the device do not take any arguments,
also not the implicit dyn_ptr argument other kernels automatically take.
2023-11-01 11:18:11 -07:00
Johannes Doerfert
f9a89e6b9c
[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752)
We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.

Fixes: https://github.com/llvm/llvm-project/issues/70249
2023-11-01 11:11:48 -07:00
Johannes Doerfert
b8cbc5c02c
[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the runtime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More use cases will
follow, e.g., per launch memory pools.

Fixes: https://github.com/llvm/llvm-project/issues/70249
2023-10-31 19:38:43 -07:00
Konstantinos Parasyris
d6a3d6b96d
[openmp] Fixed Support for VA for record-replay. (#70396)
The commit was discussed in phabricator
(https://reviews.llvm.org/D157186).

Record replay currently fails on AMD as it conflicts with the heap
memory allocator introduced in #69806. The workaround is setting
`LIBOMPTARGET_HEAP_SIZE=0` during both record and replay run.
2023-10-29 12:27:19 -07:00
Johannes Doerfert
d346c82435
[OpenMP] Associate the KernelEnvironment with the GenericKernelTy (#70383)
By associating the kernel environment with the generic kernel we can
access middle-end information easily, including the launch bounds ranges
that are acceptable. By constraining the number of threads accordingly,
we now obey the user-provided bounds that were passed via attributes.
2023-10-29 11:35:34 -07:00
Johannes Doerfert
d3921e4670
[OpenMP] Basic BumpAllocator for (AMD)GPUs (#69806)
The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to
run more tests. The allocator implements `malloc`, both internally and
externally, while we continue to default to the NVIDIA `malloc` when we
target NVIDIA GPUs. Once we have smarter or customizable allocators we
should reconsider this choice; for now, this allocator is better than
none. It traps if it is out of memory, making it easy to debug. Heap
size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB.
It allows tracking allocation statistics via
`LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with
`-fopenmp-target-debug=8`). Two tests were added, and one was enabled.
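
For readers unfamiliar with the technique, a bump allocator can be
modeled on the host roughly like this. The class below is a hypothetical
sketch, not the device runtime code: it hands out memory by advancing an
offset into a fixed heap and aborts when the heap is exhausted, which
mirrors the "traps if it is out of memory" behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical host-side model of a bump allocator over a fixed heap.
// Alignment is applied to the offset, so Base is assumed to be aligned.
class BumpAllocator {
  char *Base;
  size_t Size;
  size_t Offset = 0;

public:
  BumpAllocator(char *Base, size_t Size) : Base(Base), Size(Size) {}

  // Round the current offset up to Alignment (a power of two), then bump.
  void *allocate(size_t Bytes, size_t Alignment = 16) {
    assert(Alignment && (Alignment & (Alignment - 1)) == 0);
    size_t Start = (Offset + Alignment - 1) & ~(Alignment - 1);
    if (Start > Size || Bytes > Size - Start)
      std::abort(); // Out of memory: trap instead of returning null.
    Offset = Start + Bytes;
    return Base + Start;
  }

  // In this sketch individual frees do nothing; memory is never reused.
  void deallocate(void *) {}

  size_t used() const { return Offset; }
};
```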

This is the next step towards fixing
 https://github.com/llvm/llvm-project/issues/66708
2023-10-21 14:49:30 -07:00
Johannes Doerfert
1cea309b7e [OpenMP][NFC] Move DebugKind to make it reusable from the host 2023-10-20 19:28:09 -07:00
Michael Halkenhäuser
53602e6193
[OpenMP][OMPT] Fix device identifier collision during callbacks (#65595)
Fixes: https://github.com/llvm/llvm-project/issues/65104
When a user assigns devices to target regions it may happen that
different identifiers will map onto the same id within different
plugins. This will lead to situations where callbacks will become much
harder to read, as ambiguous identifiers are reported.

We fix this by collecting the index-offset upon general RTL
initialization, which in turn allows us to calculate the unique,
user-observable device id.
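
One way to picture the index-offset scheme, under the assumption that
each plugin's offset is the number of devices reported by the plugins
before it. `computeOffsets` and `userDeviceId` are hypothetical helper
names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Given the number of devices each plugin reports, compute the index
// offset of every plugin (an exclusive prefix sum).
inline std::vector<int32_t>
computeOffsets(const std::vector<int32_t> &DevicesPerPlugin) {
  std::vector<int32_t> Offsets;
  int32_t Next = 0;
  for (int32_t N : DevicesPerPlugin) {
    assert(N >= 0 && "negative device count");
    Offsets.push_back(Next);
    Next += N;
  }
  return Offsets;
}

// The user-observable id is the plugin's offset plus the plugin-local id.
inline int32_t userDeviceId(int32_t PluginOffset, int32_t LocalId) {
  return PluginOffset + LocalId;
}
```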
2023-09-11 12:11:44 +02:00
Joseph Huber
460840c09d
[OpenMP] Support 'omp_get_num_procs' on the device (#65501)
Summary:
The `omp_get_num_procs()` function should return the amount of
parallelism available. On the GPU, this was not defined. We have elected
to define this function as the maximum amount of wavefronts / warps that
can be simultaneously resident on the device. For AMDGPU this is the
number of CUs multiplied by the waves per CU. For NVPTX this is the
maximum threads per SM divided by the warp size and multiplied by the
number of SMs.
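
The arithmetic can be written out as below. The function names are
hypothetical, the AMDGPU case is read as CUs times resident waves per
CU, and the values a caller passes in would come from device queries
that are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// NVPTX: warps resident per SM, times the number of SMs.
inline uint32_t numProcsNVPTX(uint32_t MaxThreadsPerSM, uint32_t WarpSize,
                              uint32_t NumSMs) {
  assert(WarpSize != 0 && "warp size must be non-zero");
  return (MaxThreadsPerSM / WarpSize) * NumSMs;
}

// AMDGPU: wavefronts resident per CU, times the number of CUs.
inline uint32_t numProcsAMDGPU(uint32_t NumCUs, uint32_t WavesPerCU) {
  return NumCUs * WavesPerCU;
}
```

With illustrative inputs of 2048 threads per SM, a warp size of 32 and
108 SMs, the NVPTX formula gives 64 * 108 = 6912.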
2023-09-06 13:45:05 -05:00
Doru Bercea
2102ed0b91 Fix for openmp tests honoring thread_limit.
Diff: https://reviews.llvm.org/D159001
2023-08-28 13:04:17 -04:00
Shilei Tian
fbcce33706 [OpenMP] Honor thread_limit value when choosing grid size
D152014 introduced an optimization that favors more smaller blocks over
fewer larger blocks, even if the user sets `thread_limit` explicitly. This patch changes
the behavior to honor the user's value.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D158802
2023-08-26 22:17:49 -04:00
Joseph Huber
aa78e94b0b [Libomptarget] Support mapping indirect host calls to device functions
The changes in D157738 allowed for us to emit stub globals on the device
in the offloading entry section. These globals contain the addresses of
device functions and allow us to map host functions to their
corresponding device equivalent. This patch provides the initial support
required to build a table on the device to lookup the associated value.
This is done by finding these entries and creating a global table on the
device that can be searched with a simple binary search.
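
A host-side sketch of such a lookup, assuming the table is a sequence of
(host pointer, device pointer) pairs sorted by host pointer. The type
alias and function name are hypothetical and do not reflect the layout
used by the runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical table entry: first = host address, second = device address.
using IndirectEntry = std::pair<const void *, const void *>;

// Binary search for HostPtr. Table must be sorted by host address.
// Returns the matching device address, or nullptr if there is none.
inline const void *lookupDevicePtr(const std::vector<IndirectEntry> &Table,
                                   const void *HostPtr) {
  assert(HostPtr && "lookup of a null host pointer");
  auto It = std::lower_bound(
      Table.begin(), Table.end(), HostPtr,
      [](const IndirectEntry &E, const void *Key) {
        return std::less<const void *>()(E.first, Key);
      });
  if (It != Table.end() && It->first == HostPtr)
    return It->second;
  return nullptr;
}
```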

This requires an allocation, which supposedly should be automatically
freed at plugin shutdown. This includes a basic test which looks up device
pointers via a host pointer using the added function. This will need to be built
upon to provide full support for these calls in the runtime.

To support reverse offloading it would also be useful to provide a reverse table
that allows us to get host functions from device stubs.

Depends on D157738

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D157918
2023-08-25 18:51:56 -05:00
Kevin Sala
b8e297d1af [OpenMP][libomptarget] Improve kernel initialization in plugins
This patch modifies the plugins so that the initialization of KernelTy objects
is done in the init method. Part of the initialization was done in the
constructKernelEntry method. Now this method is called constructKernel
and only allocates and constructs a KernelTy object.

This patch prepares the kernel class for the new implementation of device
reductions.

Differential Revision: https://reviews.llvm.org/D156917
2023-08-06 11:53:58 +02:00
Shilei Tian
14d57545b2 [NFC][OpenMP] Fix compile warnings introduced in recent patches 2023-08-05 19:38:45 -04:00
koparasy
73cb01dc8a [OpenMP] Support for OpenMP-Offload Record Replay
Enable record-replay for OpenMP offload kernels.  On recording the initialization
is performed on device initialization by reading env variables. (This is similar to
the way rr used to operate). The primary change takes place in the replay phase
with the replay tool explicitly initializing the record-replay functionality.

Differential Revision: https://reviews.llvm.org/D156174

Fix
2023-08-05 00:46:06 -07:00
Joseph Huber
c96cba3aea [Libomptarget] Fix compilation of libomptarget with old GCC
Summary:
Older gcc can't figure out the copy elision and needs an explicit move.
2023-08-03 10:49:35 -05:00
Kevin Sala
4f46a48aaf [OpenMP][libomptarget] Remove unused virtual functions in GenericKernelTy
The virtual functions getDefaultNumBlocks and getDefaultNumThreads from the kernels are
only forwarding the call to the generic device's ones. This patch removes those two
functions from the kernels (and their derived ones). Now calls are made to the device's
functions directly.

Differential Revision: https://reviews.llvm.org/D156905
2023-08-02 17:18:50 +02:00
Michael Halkenhaeuser
5b19f42b63 [OpenMP][AMDGPU] Single eager resource init + HSA queue utilization tracking
This patch lazily initializes queues/streams/events since their initialization
might come at a cost even if we do not use them.

To further benefit from this, AMDGPU/HSA queue management is moved into the
AMDGPUStreamManager of an AMDGPUDevice. Streams may now use different HSA queues
during their lifetime and identify busy queues.

When a Stream is requested from the resource manager, it will search for and
try to assign an idle queue. During the search for an idle queue the manager
may initialize more queues, up to the set maximum (default: 4).
When no idle queue could be found: resort to round robin selection.
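
The selection policy described above can be modeled like this. The
classes are hypothetical stand-ins (no HSA calls are involved) and only
illustrate the order of the three steps: reuse an idle queue, otherwise
initialize a new one up to the maximum, otherwise fall back to round
robin.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a hardware queue.
struct QueueModel {
  bool Busy = false;
};

class QueueManager {
  std::vector<QueueModel> Queues; // Queues initialized so far.
  size_t MaxQueues;
  size_t NextRoundRobin = 0;

public:
  explicit QueueManager(size_t MaxQueues) : MaxQueues(MaxQueues) {
    assert(MaxQueues > 0 && "need at least one queue");
  }

  // Returns the index of the selected queue and marks it busy.
  size_t acquire() {
    // 1. Prefer an already initialized idle queue.
    for (size_t I = 0; I < Queues.size(); ++I)
      if (!Queues[I].Busy) {
        Queues[I].Busy = true;
        return I;
      }
    // 2. Lazily initialize another queue if the limit allows it.
    if (Queues.size() < MaxQueues) {
      Queues.emplace_back();
      Queues.back().Busy = true;
      return Queues.size() - 1;
    }
    // 3. Everything is busy: share queues in round robin order.
    size_t I = NextRoundRobin;
    NextRoundRobin = (NextRoundRobin + 1) % Queues.size();
    return I;
  }

  void release(size_t I) { Queues[I].Busy = false; }
  size_t numInitialized() const { return Queues.size(); }
};
```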

With contributions from Johannes Doerfert <johannes@jdoerfert.de>

Depends on D156245

Reviewed By: kevinsala

Differential Revision: https://reviews.llvm.org/D154523
2023-08-02 08:22:26 -04:00
Shilei Tian
10068cd654 [OpenMP] Introduce kernel environment
This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depends on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-26 13:35:14 -04:00
Shilei Tian
6bd74fd65f Revert commits for kernel environment
This reverts commits for kernel environments as they cause issues in AMD BB.
2023-07-23 23:32:31 -04:00
Shilei Tian
c5c8040390 [OpenMP] Introduce kernel environment
This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depends on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-23 18:36:01 -04:00
Michael Halkenhaeuser
d82eace1c9 [OpenMP][OMPT] Add 'Initialized' flag
We observed some overhead and unnecessary debug output.
This can be alleviated by (re-)introduction of a boolean that indicates, if the
OMPT initialization has been performed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D155186
2023-07-21 08:19:03 -04:00
Joseph Huber
8a0763f19c [Libomptarget] Remove RPCHandleTy indirection
The 'RPCHandleTy' was intended to capture the intention that a specific
device owns its slot in the RPC server. However, this required creating
a temporary store to hold these pointers. This was causing really weird
spurious failures due to undefined behaviour in the order of library
teardown. For example, the x64 plugin would be torn down, set this to
some invalid memory, and then the CUDA plugin would crash. Rather than
spend the time to fully diagnose this problem I found it pertinent to
simply remove the failure mode.

This patch removes this indirection so now the usage of the RPC server
must always be done with the intended device. This just requires some
extra handling for the AMDGPU indirection where we need to store a
reference to the device.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154971
2023-07-11 10:54:40 -05:00
Michael Halkenhaeuser
142faf56f5 [OpenMP] [OMPT] [amdgpu] [5/8] Implemented device init/fini/load callbacks
Added support in the generic plugin to invoke registered callbacks.

Depends on D124070

Patch from John Mellor-Crummey <johnmc@rice.edu>
(With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>)

Differential Revision: https://reviews.llvm.org/D124652
2023-07-11 07:13:22 -04:00
Shao-Ce SUN
048423702d [OpenMP] Fix build warnings
```
llvm-project/openmp/libomptarget/src/private.h:260:9: warning: 'DEBUG_PREFIX' macro redefined [-Wmacro-redefined]
#define DEBUG_PREFIX GETNAME(TARGET_NAME)
        ^
llvm-project/openmp/libomptarget/include/ompt_device_callbacks.h:22:9: note: previous definition is here
#define DEBUG_PREFIX "OMPT"
        ^
1 warning generated.
```

```
llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:458:14: warning: moving a local object in a return statement prevents copy elision [-Wpessimizing-move]
      return std::move(Err);
             ^
llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:458:14: note: remove std::move call here
      return std::move(Err);
             ^~~~~~~~~~   ~
llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:552:12: warning: moving a local object in a return statement prevents copy elision [-Wpessimizing-move]
    return std::move(Err);
           ^
llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:552:12: note: remove std::move call here
    return std::move(Err);
           ^~~~~~~~~~   ~
2 warnings generated.
```

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D154787
2023-07-09 22:12:23 +08:00
Joseph Huber
691dc2d10d [Libomptarget] Begin implementing support for RPC services
This patch adds the initial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the `libc` project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
initialize its interface. We then run a basic server to check the RPC
buffer.

This patch does not fully implement the interface. In the future each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify the number of ports. This is currently
capped in the implementation but will be adjusted soon.

Right now running the server is handled by whatever thread ends up doing
the waiting. This is probably not a completely sound solution but I am
not overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchronous regions, and somewhat
fine with `nowait` regions, but I've observed some weird behavior when
one of those regions calls `exit`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D154312
2023-07-07 12:36:46 -05:00
Joseph Huber
6764301a6b [Libomptarget] Correctly implement getWTime on AMDGPU
AMDGPU provides a fixed frequency clock since some generations back.
However, the frequency is variable by card and must be looked up at
runtime. This patch adds a new device environment line for the clock
frequency so that we can use it in the same way as NVPTX. This is the
correct implementation and the version in ASO should be replaced.
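
In other words, wall time is the tick count divided by the frequency
looked up at runtime. A minimal sketch, with a hypothetical function
name and the frequency passed in as a plain parameter rather than read
from the device environment:

```cpp
#include <cassert>
#include <cstdint>

// Convert a fixed-frequency clock reading into seconds.
inline double ticksToSeconds(uint64_t Ticks, uint64_t FrequencyHz) {
  assert(FrequencyHz != 0 && "clock frequency must be known");
  return static_cast<double>(Ticks) / static_cast<double>(FrequencyHz);
}
```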

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D154456
2023-07-04 21:50:43 -05:00
Johannes Doerfert
6629a96a8c [OpenMP] Improve default block count selection for low block counts
If a combined loop has insufficient parallelism (= low trip count), we
might end up with too few teams/blocks. To counter that we can reduce
the number of threads per team we use. This patch implements a heuristic
and exposes a new environment variable to control the minimum number of threads
to be employed in this case.
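
To illustrate the trade-off, here is one possible heuristic of this
kind. It is not necessarily the one implemented by the patch: the
function name, the halving strategy and the `DesiredTeams` parameter are
all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Shrink the team size while the loop would yield fewer teams than
// desired, but never go below the configured minimum thread count.
inline uint32_t pickThreadsPerTeam(uint64_t TripCount,
                                   uint32_t DefaultThreads,
                                   uint32_t MinThreads,
                                   uint32_t DesiredTeams) {
  assert(MinThreads > 0 && DefaultThreads >= MinThreads);
  uint32_t Threads = DefaultThreads;
  // Teams needed to cover the loop with a given team size (rounded up).
  auto TeamsFor = [TripCount](uint32_t T) {
    return (TripCount + T - 1) / T;
  };
  while (TeamsFor(Threads) < DesiredTeams && Threads / 2 >= MinThreads)
    Threads /= 2;
  return Threads;
}
```

For a trip count of 4096, 256 default threads, a minimum of 32 and 64
desired teams, the sketch settles on 64 threads per team, which yields
exactly 64 teams.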

Issue reported by:
Felipe Cabarcas Jaramillo <cabarcas@udel.edu> (@fel-cab).

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D152014
2023-06-05 16:35:44 -07:00
Kevin Sala
843f496b71 [OpenMP][libomptarget] Improve device info printing in NextGen plugins
This patch improves the device info printing in the NextGen plugins. The device
info properties are composed of keys, values and units (if necessary). These
properties are pushed into a queue by each vendor-specific plugin, and later,
these properties are processed and printed by the common Plugin
Interface. The printing format is common across the different plugins.

Differential Revision: https://reviews.llvm.org/D148178
2023-05-09 15:34:15 +02:00
Shilei Tian
d4ecd1241c Revert "[OpenMP] Introduce kernel environment"
This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9.

It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
2023-04-22 20:56:35 -04:00
Shilei Tian
35cfadfbe2 [OpenMP] Introduce kernel environment
This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-04-22 20:46:38 -04:00
Kevin Sala
221350965a [OpenMP][libomptarget][NFC] Remove error data member from AsyncInfoWrapperTy
This patch removes the Err data member from the AsyncInfoWrapperTy class. Now the error
is stored externally, in the caller side, and it is explicitly passed to the
AsyncInfoWrapperTy::finalize() function as a reference.

Differential Revision: https://reviews.llvm.org/D148027
2023-04-18 18:52:01 +02:00
Johannes Doerfert
110cf873ad [OpenMP][NFC] Silence warning 2023-04-17 15:57:10 -07:00
Kevin Sala
8dad7f4953 [OpenMP][libomptarget] Do not rely on AsyncInfoWrapperTy's destructor 2023-04-04 17:51:28 +02:00
Kevin Sala
48cd8b54d1 [NFC][OpenMP][libomptarget] Remove unnecessary AsyncInfoWrapperTy parameter 2023-03-28 17:28:12 +02:00
Joseph Huber
edc0355006 [Libomptarget] Add missing explicit moves on llvm::Error
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error.
2023-03-20 11:49:59 -05:00
Joseph Huber
48d5ad93cd [OpenMP][NFC] Clean up Twines and other issues in plugins
Summary:
This patch is mostly NFC to fix some warnings currently present in OpenMP
offloading plugins. Specifically this mostly removes the use of Twine
variables in favor of LLVM's small string. Twine variables are prone to
use-after-free and this is a cleaner way to concatenate a string.
2023-03-01 15:03:21 -06:00
Joseph Huber
656378085e [Libomptarget] Fix block and thread limit environment variables not being respected
The next-gen plugins did not properly set the values from
`OMP_NUM_TEAMS` and `OMP_TEAMS_THREAD_LIMIT`. This is because these
maximum values are set by each plugin to its hardware maximum. This
happens *after* the previous initialization. Move it to the correct
place and then add a test.

Fixes https://github.com/llvm/llvm-project/issues/61082

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D145105
2023-03-01 14:12:46 -06:00
JP Lehr
b82ac74f7e [OpenMP][AMDGPU] More detail in AMDGPU kernel launch info
Makes the info that is printed for kernel launches configurable for
different plugins. Adds all machinery to print the detailed launch
info that the current AMD plugin provides and includes e.g. register
spill counts.

The files msgpack.cpp, msgpack.def, and msgpack.h are copied from the old plugin
and are untouched. The contents of UtilitiesHSA.cpp and .h are copied together from
various files from the old plugin. The code was originally written by
Jon Chesterfield. I updated the function and type names visible to the outside, i.e.
in headers, to respect the LLVM conventions.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D144521
2023-02-28 07:41:48 -05:00
Joseph Huber
9b8e4b4f96 [Libomptarget] Remove unused image argument from global handler function
Summary:
A previous patch got rid of the use of this image but forgot to remove
it from this function. Simply remove it as it is unused now.
2023-02-24 07:24:29 -06:00
Kevin Sala
6ca034644d [OpenMP][libomptarget] Notify the plugins regarding new mapping/unmappings
The NextGen plugins use the information regarding new mapping/unmappings to
lock/unlock the corresponding host buffer and speed up the host-device memory
transfers involving those buffers. The locking/unlocking is disabled by default
and can be enabled by the LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS envar. The
envar accepts boolean values (on/off) and a special option:
  - off:       Do not lock mapped host buffers (default).
  - on:        Lock mapped host buffers automatically, but do not report lock
               failures if the plugin fails to lock them.
  - mandatory: Lock mapped host buffers automatically and treat locking failures
               in the plugins as fatal errors. This option may be useful for
               debugging purposes.
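
The three settings map naturally onto an enum. The sketch below is an
assumption about how such a value could be parsed, with hypothetical
names (`LockMode`, `parseLockMode`, `lockModeFromEnv`); it recognizes
only the three spellings listed above and treats anything else,
including an unset variable, as "off".

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class LockMode { Off, On, Mandatory };

// Map the textual value to a mode. Unset or unrecognized values fall
// back to Off, which is an assumption of this sketch.
inline LockMode parseLockMode(const char *Value) {
  if (!Value)
    return LockMode::Off;
  std::string V(Value);
  if (V == "mandatory")
    return LockMode::Mandatory;
  if (V == "on")
    return LockMode::On;
  return LockMode::Off;
}

// Read the mode from the environment variable named in the message.
inline LockMode lockModeFromEnv() {
  return parseLockMode(std::getenv("LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS"));
}
```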

Differential Revision: https://reviews.llvm.org/D142514
2023-02-06 10:09:35 +01:00
Kevin Sala
2a539ee17d [OpenMP][libomptarget] Implement memory lock/unlock API in NextGen plugins
This patch implements the memory lock/unlock API, introduced in patch https://reviews.llvm.org/D139208,
in the NextGen plugins. Locked buffers feature reference counting and we allow certain overlapping. Given
an already locked buffer A, other buffers that are fully contained inside A can be locked again, even if
they are smaller than A. In this case, the reference count of locked buffer A will be incremented. However,
extending an existing locked buffer is not allowed. The original buffer is actually unlocked once all its
users have released the locked buffer and sub-buffers (i.e., the reference counter becomes zero).
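
The rules above (contained sub-buffers bump the reference count,
extending is rejected, the unlock happens when the count reaches zero)
can be modeled as follows. `LockedBufferTable` is a hypothetical
host-only model that tracks address ranges and performs no actual
pinning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

class LockedBufferTable {
  struct Entry {
    size_t Size;
    size_t RefCount;
  };
  std::map<uintptr_t, Entry> Entries; // Keyed by start address.

  // Find the locked entry whose range contains Addr, if any.
  std::map<uintptr_t, Entry>::iterator findContaining(uintptr_t Addr) {
    auto It = Entries.upper_bound(Addr);
    if (It == Entries.begin())
      return Entries.end();
    --It;
    return Addr < It->first + It->second.Size ? It : Entries.end();
  }

public:
  // Returns true if the range is now locked, false if the request would
  // extend or partially overlap an existing locked buffer.
  bool lock(uintptr_t Begin, size_t Size) {
    assert(Size > 0 && "cannot lock an empty range");
    auto It = findContaining(Begin);
    if (It != Entries.end()) {
      if (Begin + Size > It->first + It->second.Size)
        return false; // Would extend the existing buffer.
      ++It->second.RefCount; // Fully contained: share the lock.
      return true;
    }
    auto Next = Entries.lower_bound(Begin);
    if (Next != Entries.end() && Next->first < Begin + Size)
      return false; // Reaches into a later locked buffer.
    Entries.emplace(Begin, Entry{Size, 1});
    return true;
  }

  // Returns true only when the underlying buffer was actually unlocked,
  // i.e. the reference count dropped to zero.
  bool unlock(uintptr_t Addr) {
    auto It = findContaining(Addr);
    if (It == Entries.end())
      return false;
    if (--It->second.RefCount > 0)
      return false;
    Entries.erase(It);
    return true;
  }
};
```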

Differential Revision: https://reviews.llvm.org/D141227
2023-01-25 00:11:38 +01:00
Joseph Huber
b280e12a3d [Libomptarget][NFC] Address a few warnings in libomptarget
Summary:
Fix a few minor warnings that show up in `libomptarget`.
2023-01-23 08:56:03 -06:00
Johannes Doerfert
40f9bf082f [OpenMP] Introduce the ompx_dyn_cgroup_mem(<N>) clause
Dynamic memory allows users to allocate fast shared memory when a kernel
is launched. We support a single size for all kernels via the
`LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable but now we can
control it per kernel invocation, hence allow computed values.

Note: Only the nextgen plugins will allocate memory based on the clause,
      the old plugins will silently miscompile.

Differential Revision: https://reviews.llvm.org/D141233
2023-01-21 18:46:36 -08:00
Johannes Doerfert
16a385ba21 [OpenMP] Modernize the kernel launching interface and APIs
We already created a versioned `__tgt_kernel_arguments` struct but it
was only briefly used and its content was passed in isolation anyway.
This makes it hard to add more information in the future. With this
patch we fully embrace the struct as means to pass information from the
compiler to the plugin as part of a kernel launch.

The patch also extends and renames the struct, bumping the version
number to 2. Version 1 entries are auto-upgraded. This is in preparation
for "bare" kernel launches, per kernel dynamic shared memory, CUDA/HIP
lowering, etc.

The `__tgt_target_kernel_nowait` interface was deprecated as it was
unused. Once we actually implement support for something like that, we
can add an appropriate API.

Note: Only plugins with the `launch_kernel` interface are now supported.
      That means that a new clang won't be able to use an old runtime.
      An old clang can still use the new runtime since the libomptarget
      interface did not change.

Differential Revision: https://reviews.llvm.org/D141232
2023-01-21 11:16:21 -08:00
Giorgis Georgakoudis
0f4b4e8e4d [OpenMP] RecordReplay saves bitcode when JIT-ing
This patch enables storing bitcode images when JIT is enabled for the record-and-replay functionality (see https://reviews.llvm.org/D138931). Credits to @jdoerfert for refactoring the code.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D141986
2023-01-18 11:25:25 -08:00