In debug mode there is a wrapper (the kernel) around the function in
which we generate the kernel code. We worked around this before to get
the correct kernel name, but now we really distinguish both to attach
the launch bounds to the kernel, not the inner function.
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.
The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function has changed now to check if an outer
target-task needs to be materialized (i.e if `target` construct has
`nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to
do all the heavy lifting for creating and dispatching the task.
3) `emitTargetTask` - Bulk of the change is here. See the large comment
explaining what it does at the beginning of this function
Many tests in the `offload` project have requirements defined by which
targets are not supported rather than which platforms are supported.
This patch aims to streamline the requirement definitions by adding four
new feature tags: `host`, `gpu`, `amdgpu`, and `nvidiagpu`.
This pull request is the first part of an ongoing effort to extends PGO
instrumentation to GPU device code. This PR makes the following changes:
- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)
These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.
The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.
For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.
In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).
In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.
Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
Sometimes it might be beneficial to spawn more thread blocks instead of
reusing existing for multiple loop iterations.
**Alternatives considered:**
Make `DefaultNumBlocks` settable via an environment variable.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
This PR seeks to expand/replace the Constant -> Instruction conversion
that needs to occur inside of the OpenMP Target kernel generation to
allow kernel argument replacement of uses within the kernel (cannot
replace constant uses within constant expressions with non-constants).
It does so by making use of the new-ish utility
convertUsersOfConstantsToInstructions which is a much more expansive
version of what the smaller "version" of the function I wrote does,
effectively expanding uses of the input argument that are constant
expressions into instructions so that we can replace with the
appropriate kernel argument.
Also alters convertUsersOfConstantsToInstructions to optionally
restrict the replacement to a function and optionally leave
dead constants alone, the latter is necessary when lowering from
MLIR as we cannot be sure we can remove the constants at this
stage, even if rewritten to instructions the ModuleTranslation may
maintain links to the original constants and utilise them in further
lowering steps (as when we're lowering the kernel, the module is
still in the process of being lowered). This can result in unusual
ICEs later. These dead constants can be tidied up later (and
appear to be in subsequent lowering from checking with
emit-llvm).
The current test is not really correct because the mask is set to
0xffffffff
even if it is on an AMDGPU whose wavefront size is 64. Besides,
`__AMDGCN_WAVEFRONT_SIZE` is not set on host compilation so the
verification
happens to work.
This reverts commit 098c6dfa8157681699a71fce9e3d94515e66311f.
This reverts commit 8c718a3a91df4ab68dc3f1ca3887ea730c9aed84.
This reverts commit 4fb02de9d490d0773441aa30124bb4d1272230d3.
Named location attribute added to `tgt_offload_entry` shall be used by
runtime calls like `ompx_dump_mapping_tables` to print the information
of variables that are mapped to the device. `ompx_dump_mapping_tables`
was printing the wrong location information and this change fixes it.
A sample execution of example before the change:
```
omptarget device 0 info: OpenMP Host-Device pointer mappings after block at libomptarget:0:0:
omptarget device 0 info: Host Ptr Target Ptr Size (B) DynRefCount HoldRefCount Declaration
omptarget device 0 info: 0x0000000000206df0 0x00007f02cdc00000 20000000 1 0 <program-file-loc> at unknown:18:35
```
The change replaces unknown to the mapped symbol and location to the
declaration location.
This is a large series of runtime tests that help to add coverage for the specific cases intended to be supported by the PR stack
that extends derived type map support in Flang+OpenMP. Primarily this will add functionality coverage, there's cases where
things may work, but not optimally (or at least similarly to the status quo in Clang), addiitonal IR tests are added in the
relevant segments of the related PRs to test for breakages like that.
Pull Request: https://github.com/llvm/llvm-project/pull/82850
This is a large series of runtime tests that help to add coverage for
the specific cases intended to be supported by the PR stack
that extends derived type map support in Flang+OpenMP. Primarily this will add functionality coverage, there's cases where
things may work, but not optimally (or at least similarly to the status quo in Clang), additional IR tests are added in the
relevant segments of the related PRs to test for breakages like that.
In a nutshell, this moves our libomptarget code to populate the offload
subproject.
With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.
Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.
```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests
```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.
```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`
Fixes: https://github.com/llvm/llvm-project/issues/75124
---------
Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>