18 Commits

Author SHA1 Message Date
Joseph Huber
ed68aac9f2
[Libomptarget] Move API implementations into GenericPluginTy (#86683)
Summary:
The plan is to remove the entire plugin interface and simply use the
`GenericPluginTy` inside of `libomptarget` by statically linking against
it. This means that inside of `libomptarget` we will simply do
`Plugin.data_alloc` without the dynamically loaded interface. To reduce
the amount of code required, this patch simply moves all of the RTL
implementation functions inside of the Generic device. Now the
`__tgt_rtl_` interface is simply a shallow wrapper that will soon go
away. There is some redundancy here, this will be improved later. For
now what is important is minimizing the changes to the API.
2024-03-27 14:10:54 -05:00
Joseph Huber
4dc3225248
[Libomptarget] Replace global PluginTy::get interface with references (#86595)
Summary:
We have a plugin singleton that implements the Plugin interface. This
then spawns separate device and kernels. Previously when these needed to
reach into the global singleton they would use the `PluginTy::get`
routine to get access to it. In the future we will move away from this
as the lifetime of the plugin will be handled by `libomptarget`
directly. This patch removes uses of this inside of the plugin
implementaion themselves by simply keeping a reference to the plugin
inside of the device.

The external `__tgt_rtl` functions still use the global method, but will
be removed later.
2024-03-26 07:13:59 -05:00
Joseph Huber
2cad43c1ba
[Libomptarget] Factor functions out of 'Plugin' interface (#86528)
Summary:
This patch factors common functions out of the `Plugin` interface prior
to its removal in a future patch. This simply temporarily renames it to
`PluginTy` so that we could re-use `Plugin::check` internally as this
needs to be defined statically per plugin now. We can refactor this
later.

The future patch will delete `PluginTy` and `PluginTy::get` entirely.
This simply tries to minimize a few changes to make it easier to land.
2024-03-25 15:24:39 -05:00
dhruvachak
b5d02bbd0d
[OpenMP] Increment kernel args version, used by runtime for detecting dyn_ptr. (#85363)
A kernel implicit parameter (dyn_ptr) was introduced some time back.
This patch increments the kernel args version for a compiler supporting
dyn_ptr. The version will be used by the runtime to determine whether
the implicit parameter is generated by the compiler. The versioning is
required to support use cases where code generated by an older compiler
is linked with a newer runtime.

If approved, this patch should be backported to release 18.
2024-03-19 16:40:22 -07:00
Joseph Huber
470040bd4d [Libomptarget][NFC] Remove warning on return value const 2024-03-15 18:50:33 -05:00
Joseph Huber
0ac4438560
[Libomptarget] Remove unused 'SupportsEmptyImages' API function (#80316)
Summary:
This function is always false in the current implementation and is not
even considered required. Just remove it and if someone needs it in the
future they can add it back in. This is done to simplify the interface
prior to other changes
2024-02-05 10:00:09 -06:00
Joseph Huber
621bafd5c1
[Libomptarget] Move target table handling out of the plugins (#77150)
Summary:
This patch removes the bulk of the handling of the
`__tgt_offload_entries` out of the plugins itself. The reason for this
is because the plugins themselves should not be handling this
implementation detail of the OpenMP runtime. Instead, we expose two new
plugin API functions to get the points to a device pointer for a global
as well as a kernel type.

This required introducing a new type to represent a binary image that
has been loaded on a device. We can then use this to load the addresses
as needed. The creation of the mapping table is then handled just in
`libomptarget` where we simply look up each address individually. This
should allow us to expose these operations more generically when we
provide a separate API.
2024-01-22 11:06:47 -06:00
carlobertolli
ae99966a27
[OpenMP] Enable automatic unified shared memory on MI300A. (#77512)
This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch also introduces an environment variable OMPX_APU_MAPS that,
if set, triggers automatic zero-copy also on non APU GPUs (e.g., on
discrete GPUs).
This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-22 10:30:22 -06:00
Joseph Huber
37c1a5e3f5
[Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828)
Summary:
The constructors and destructors look up a symbol in the ELF quickly to
determine if they need to be run on the GPU. This allows us to avoid the
very slow actions required to do the slower lookup using the vendor API.

One problem occurs with how we handle the lifetime of these images.
Right now there is no invariant to specify the lifetime of the
underlying binary image that is loaded. In the typical case, this comes
from the binary itself in the `.llvm.offloading` section, meaning that
the lifetime of the binary should match the executable itself. This
would work fine, if it weren't for the fact that the plugin is loaded
via `dlopen` and can have a teardown order out of sync with the main
executable.

This was likely what was occuring when this failed on some systems but
not others. A potential solution would be to simply copy images into
memory so the runtime does not rely on external references. Another
would be to manually zero these out after initialization as to prevent
this mistake from happening accidentally. The former has the benefit of
making some checks easier, and allowing for constant initialization be
done on the ELF itself (normally we can't do this because writing to a
constant section, e.g. .llvm.offloading is a segfault.). The downside
would be the extra time required to copy the image in bulk (Although we
are likely doing this in the vendor runtimes as well).

This patch went with a quick solution to simply set a boolean value at
initialization time if we need to call destructors.

Fixes: https://github.com/llvm/llvm-project/issues/77798
2024-01-11 15:00:53 -06:00
carlobertolli
ce4144406c
Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371)
Reverts llvm/llvm-project#75999

lit test is failing.
2024-01-08 14:38:29 -06:00
carlobertolli
22a73e7c46
[OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999)
…on (zero-copy) on MI300A.

This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-08 14:17:28 -06:00
Joseph Huber
64f0681e97
[Libomptarget] Rework image checking further (#76120)
Summary:
In the future, we may have more checks for different kinds of inputs,
e.g. SPIR-V. This patch simply reworks the handling to be more generic
and do the magic detection up-front. The checks inside the routines are
now asserts so we don't spend time checking this stuff over and over
again.

This patch also tweaked the bitcode check. I used a different function
to get the Lazy-IR module now, as it returns the raw expected value
rather than the SM diganostic.

No functionality change intended.
2023-12-29 15:14:39 -06:00
Joseph Huber
ac029e02a9
[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720)
Summary:
This patch reorganizes a lot of the code used to check for compatibility
with the current environment. The main bulk of this patch involves
moving from using a separate `__tgt_image_info` struct (which just
contains a string for the architecture) to instead simply checking this
information from the ELF directly. Checking information in the ELF is
very inexpensive as creating an ELF file is simply writing a base
pointer.

The main desire to do this was to reorganize everything into the ELF
image. We can then do the majority of these checks without first
initializing the plugin. A future patch will move the first ELF checks
to happen without initializing the plugin so we no longer need to
initialize and plugins that don't have needed images.

This patch also adds a lot more sanity checks for whether or not the ELF
is actually compatible. Such as if the images have a valid ABI, 64-bit
width, executable, etc.
2023-12-19 20:01:31 -06:00
Shilei Tian
3768039913
[OpenMP] Directly use user's grid and block size in kernel language mode (#70612)
In kernel language mode, use user's grid and blocks size directly. No
validity
check, which means if user's values are too large, the launch will fail,
similar
to what CUDA and HIP are doing right now.
2023-12-18 12:26:18 -05:00
Kazu Hirata
b8f89b84bc Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-16 15:02:17 -08:00
Johannes Doerfert
5fe741f08e
[OpenMP] Separate Requirements into a standalone header (#74126)
This is not completely NFC since we now check all 4 requirements and the
test is checking the good and the bad case for combining flags.
2023-12-01 14:47:00 -08:00
Johannes Doerfert
148dec9fa4
[OpenMP][NFC] Separate Envar (environment variable) handling (#73994) 2023-11-30 15:23:34 -08:00
Johannes Doerfert
db96a9c3b7
[OpenMP][NFC] Flatten plugin-nextgen/common folder sturcture (#73725)
For historic reasons we had it setup that there was
`  plugin-nextgen/common/PluginInterface/<sources + headers>`
which is not what we do anywhere else.
Now it looks like the rest:
```
  plugin-nextgen/common/include/<headers>
  plugin-nextgen/common/src/<sources>
```
As part of this, `dlwrap.h` was moved into common/include (as
`DLWrap.h`)
since it is exclusively used by the plugins.
2023-11-29 07:57:01 -08:00