…on (zero-copy) on MI300A.
This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.
This patch is still missing support for global variables, which will be
provided in a subsequent patch.
Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
#65273 "hidden_dynamic_lds_size" argument will be added in the reserved
section at offset 120 of the implicit argument layout
Add DynamicLdsSize to AMDGPUImplicitArgsTy struct at offset 120 and fill
the dynamic LDS size before kernel launch.
Summary:
This patch reorganizes a lot of the code used to check for compatibility
with the current environment. The main bulk of this patch involves
moving from using a separate `__tgt_image_info` struct (which just
contains a string for the architecture) to instead simply checking this
information from the ELF directly. Checking information in the ELF is
very inexpensive as creating an ELF file is simply writing a base
pointer.
The main desire to do this was to reorganize everything into the ELF
image. We can then do the majority of these checks without first
initializing the plugin. A future patch will move the first ELF checks
to happen without initializing the plugin so we no longer need to
initialize and plugins that don't have needed images.
This patch also adds a lot more sanity checks for whether or not the ELF
is actually compatible. Such as if the images have a valid ABI, 64-bit
width, executable, etc.
Headers used throughout the different runtimes are different from the
internal headers. This is a first step to bring structure in into the
include folder.
Update DeviceRTL and the AMDGPU plugin to support code
object version 5. Default is code object version 4.
CodeGen for __builtin_amdgpu_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via Xclang option to generate abi-agnostic code.
Generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the abi version, which is available during linking of
user's OpenMP code. Load of this constant gets eliminated
during linking.
AMDGPU plugin queries the ELF for code object version
and then prepares various implicitargs accordingly.
Differential Revision: https://reviews.llvm.org/D139730
Reviewed By: jhuber6, yaxunl
Summary:
The changes in https://reviews.llvm.org/D150022 changed the API for this
function that we query. Simply pass in the alignment from the associated
header to fix.
Clean up for the AMD-specific kernel launch info in the NextGen Plugins.
- Fixes a mistake introduced with the initial commit that added printing
of an AMD-only property.
- Removes another AMD-only property (not clear on upstream status)
- Adds some more comment to what info is printed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D145924
Makes the info that is printed for kernel launches configurable for
different plugins. Adds all machinery to print the detailed launch
info that the current AMD plugin provides and includes e.g. register
spill counts.
The files msgpack.cpp, msgpack.def, and msgpack.h are copied from the old plugin
and are untouched. The contents of UtilitiesHSA.cpp and .h are copied together from
various files from the old plugin. The code was originally written by
Jon Chesterfield. I updated the function and type names visible to the outside, i.e.
in headers, to respect the LLVM conventions.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D144521
This patch prepares the PluginInterface for the new AMDGPU NextGen plugin. The original and the
NextGen plugin will share some structures and functionalities. We use this header for defining
them and avoiding code duplication.
Differential Revision: https://reviews.llvm.org/D139792