
<!--===- docs/OpenMP-declare-target.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM
   Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# OpenMP Declare Target

```{contents}
---
local:
---
```

## Introduction to Declare Target

In OpenMP, `declare target` is a directive that can be applied to a function or
variable (primarily global) to tell the compiler that it should be generated in
a particular device's environment. In essence, it specifies whether something
should be emitted for the host, the device, or both. An example of its usage
for both data and functions can be seen below.

```Fortran
module test_0
    integer :: sp = 0
    !$omp declare target link(sp)
end module test_0

program main
    use test_0
    !$omp target map(tofrom:sp)
        sp = 1
    !$omp end target
end program
```

In the above example, we create a variable in a separate module, mark it
as `declare target`, and then map it, embedding it into the device IR and
assigning to it.

```Fortran
function func_t_device() result(i)
    !$omp declare target to(func_t_device) device_type(nohost)
    INTEGER :: I
    I = 1
end function func_t_device

program main
    ! Declare the external function's result type so that it can be
    ! referenced in an expression inside the target region.
    INTEGER, EXTERNAL :: func_t_device
    INTEGER :: res
    !$omp target map(from:res)
        res = func_t_device()
    !$omp end target
end program
```

In the above example, we use `declare target` to state that a function is
required on the device and that we will not be utilising it on the host, so we
are in theory free to remove or ignore it there. In this case, a user could
also leave the `declare target` off the function, and it would be implicitly
marked `declare target any` (for both host and device), as it has been utilised
within a target region.

## Declare Target as represented in the OpenMP Dialect

In the OpenMP Dialect, `declare target` is not represented by a specific
`operation`. Instead, it is an OpenMP dialect specific `attribute` that can be
applied to any operation in any dialect, which helps to simplify its
utilisation. Rather than replacing or modifying existing global or function
`operations` in a dialect, it is attached to them as extra metadata that the
lowering can use in different ways as necessary.

The `attribute` is composed of multiple fields representing the clauses you
would find on the `declare target` directive, i.e. the device type (`nohost`,
`any`, `host`) and the capture clause (`link` or `to`). A small example of
`declare target` applied to a Fortran `real` can be found below:

```
fir.global internal @_QFEi {omp.declare_target =
#omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
    %0 = fir.undefined f32
    fir.has_value %0 : f32
}
```

This would look similar for function style `operations`.

The application and access of this attribute is aided by an OpenMP Dialect
MLIR Interface named `DeclareTargetInterface`, which can be utilised on
operations to access the appropriate interface functions, e.g.:
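
```C++
auto declareTargetGlobal =
    llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation());
// The cast yields a null interface if the operation does not implement it.
if (declareTargetGlobal && declareTargetGlobal.isDeclareTarget()) {
  // The operation carries the `declare target` attribute.
}
```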

## Declare Target Fortran OpenMP Lowering

The initial lowering of `declare target` to MLIR for both use-cases is done
inside the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp. However, some
direct calls to `declare target` related functions are made from Flang's
lowering bridge in flang/lib/Lower/Bridge.cpp.

The marking of operations with the declare target attribute happens in two
phases, the second one optional and contingent on the first failing. The
initial phase happens when the declare target directive and its clauses
are initially processed, with the primary data gathering for the directive and
clauses happening in a function called `getDeclareTargetInfo`. This is then
used to feed the `markDeclareTarget` function, which does the actual marking
utilising the `DeclareTargetInterface`. If it encounters a variable or function
that has been marked twice over multiple directives with two differing device
types (e.g. `host`, `nohost`), then it will swap the device type to `any`.
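
The device type merge described above can be modelled in a few lines. The
following is a minimal, self-contained sketch and not Flang's actual code: the
`DeviceType` enum and the `mergeDeviceType` helper are hypothetical names
introduced for illustration, and only the rule stated above (two differing
device types collapse to `any`) is modelled.

```C++
#include <cassert>
#include <optional>

// Hypothetical stand-in for the device type field of the attribute.
enum class DeviceType { Host, NoHost, Any };

// Returns the device type a symbol should carry after a new marking.
// `existing` is empty if the symbol has not been marked before.
DeviceType mergeDeviceType(std::optional<DeviceType> existing,
                           DeviceType incoming) {
  if (!existing)
    return incoming;
  // Two differing device types mean the symbol is needed on both sides.
  return *existing == incoming ? incoming : DeviceType::Any;
}

// Usage sketch: a `host` marking followed by a `nohost` one yields `any`.
inline void mergeDeviceTypeExample() {
  assert(mergeDeviceType(DeviceType::Host, DeviceType::NoHost) ==
         DeviceType::Any);
}
```

Keeping the merge in one small function makes the rule easy to test in
isolation. The real marking additionally has to carry the capture clause, which
this sketch leaves out.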

Whenever we invoke `genFIR` on an `OpenMPDeclarativeConstruct` from the
lowering bridge, we are also invoking another function called
`gatherOpenMPDeferredDeclareTargets`, which gathers information relevant to the
application of the `declare target` attribute. This information
includes the symbol that it should be applied to, device type clause,
and capture clause, and it is stored in a vector that is part of the lowering
bridge's instantiation of the `AbstractConverter`. It is only stored if we
encounter a function or variable symbol that does not have an operation
instantiated for it yet. This cannot happen as part of the
initial marking as we must store this data in the lowering bridge and we
only have access to the abstract version of the converter via the OpenMP
lowering.

The information produced by the first phase is used in the second phase,
which is a form of deferred processing of the `declare target` marked
operations that have delayed generation and cannot be processed in the
first phase. The main case where this currently occurs is when a Fortran
function interface has been marked. This is done via the function
`markOpenMPDeferredDeclareTargetFunctions`, which is called from the lowering
bridge at the end of the lowering process, allowing us to mark those where
possible. It iterates over the data previously gathered by
`gatherOpenMPDeferredDeclareTargets`, checking if any of the recorded symbols
have now had their corresponding operations instantiated and applying the
declare target attribute where possible utilising `markDeclareTarget`. However,
it must be noted that it is still possible for operations not to be generated
for certain symbols, in particular the case of function interfaces that are not
directly used or defined within the current module. This means we cannot emit
errors in the case of left-over unmarked symbols. These must (and should) be
caught by the initial semantic analysis.
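
The two-phase flow can be illustrated with a small standalone model. This is a
sketch under stated assumptions, not the code in Bridge.cpp or OpenMP.cpp:
`DeclareTargetInfo`, `ModelOp`, `ModelModule`, `markOrDefer` and `markDeferred`
are all hypothetical names, and a plain map from symbol name to operation
stands in for the module being lowered.

```C++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };
enum class CaptureClause { To, Link };

// Hypothetical record of what a directive requested for one symbol.
struct DeclareTargetInfo {
  std::string symbol;
  DeviceType deviceType;
  CaptureClause captureClause;
};

// Hypothetical stand-in for an instantiated operation and its attribute.
struct ModelOp {
  bool isDeclareTarget = false;
  DeviceType deviceType = DeviceType::Any;
  CaptureClause captureClause = CaptureClause::To;
};

// Symbol name -> operation; a missing key means "not instantiated yet".
using ModelModule = std::map<std::string, ModelOp>;

void applyAttribute(ModelOp &op, const DeclareTargetInfo &info) {
  op.isDeclareTarget = true;
  op.deviceType = info.deviceType;
  op.captureClause = info.captureClause;
}

// First phase: mark immediately if possible, otherwise record for later.
void markOrDefer(ModelModule &mod, std::vector<DeclareTargetInfo> &deferred,
                 const DeclareTargetInfo &info) {
  auto it = mod.find(info.symbol);
  if (it != mod.end())
    applyAttribute(it->second, info);
  else
    deferred.push_back(info);
}

// Second phase: revisit the recorded symbols at the end of lowering.
// Returns how many symbols still have no operation; this is not an error.
std::size_t markDeferred(ModelModule &mod,
                         const std::vector<DeclareTargetInfo> &deferred) {
  std::size_t leftOver = 0;
  for (const DeclareTargetInfo &info : deferred) {
    auto it = mod.find(info.symbol);
    if (it != mod.end())
      applyAttribute(it->second, info);
    else
      ++leftOver;
  }
  return leftOver;
}

// Usage sketch: `iface` has no operation when its directive is processed.
inline void deferredMarkingExample() {
  ModelModule mod;
  std::vector<DeclareTargetInfo> deferred;
  markOrDefer(mod, deferred, {"iface", DeviceType::NoHost, CaptureClause::To});
  mod["iface"]; // The operation is instantiated later during lowering.
  assert(markDeferred(mod, deferred) == 0 && mod["iface"].isDeclareTarget);
}
```

Returning a count instead of reporting a failure mirrors the point made above:
a symbol that never receives an operation is expected, so the second phase
stays silent about it.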

NOTE: `declare target` can be applied to implicit `SAVE` attributed variables.
However, by default Flang does not represent these as `GlobalOp`'s, which means
we cannot tag and lower them as `declare target` normally. Instead, similarly
to the way `threadprivate` handles these cases, we raise and initialize the
variable as an internal `GlobalOp` and apply the attribute. This occurs in the
flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`.

## Declare Target Transformation Passes for Flang

There are currently two passes within Flang that are related to the processing
of `declare target`:
* `MarkDeclareTarget` - This pass is in charge of marking functions captured
  in (called from) `target` regions or other `declare target` marked functions
  as `declare target`. It does so recursively, i.e. nested calls will also be
  implicitly marked. It currently tries to mark things as conservatively as
  possible, e.g. if captured in a `target` region it will apply `nohost`, unless
  it encounters a `host` `declare target`, in which case it will apply the `any`
  device type. Functions are handled similarly, except we utilise the parent's
  device type where possible.
* `FunctionFiltering` - This is executed after the `MarkDeclareTarget`
  pass, and its job is to conservatively remove host functions from
  the module where possible when compiling for the device. This helps make
  sure that most incompatible code for the host is not lowered for the
  device. Host functions with `target` regions in them need to be preserved
  (e.g. for lowering the `target` region(s) inside). Otherwise, it removes
  any function marked as a `declare target host` function, and any uses will be
  replaced with `undef`'s so that the remaining host code doesn't become broken.
  Host functions with `target` regions are marked with a `declare target host`
  attribute so they will be removed after outlining the target regions contained
  inside.
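
The recursive implicit marking performed by `MarkDeclareTarget` can be pictured
with a small call graph model. This is a simplified sketch that assumes each
function is identified by name and lists its callees. `ModelFunc`, `CallGraph`,
`markNested` and `markTargetRegionCallees` are hypothetical names, and the
code is not the pass's implementation.

```C++
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };

// Hypothetical stand-in for a function: its marking (if any) and its callees.
struct ModelFunc {
  std::optional<DeviceType> deviceType;
  std::vector<std::string> callees;
};

using CallGraph = std::map<std::string, ModelFunc>;

// Marks every function reachable from `callees` with the parent's device
// type. A function already marked with a different type is widened to `any`.
void markNested(CallGraph &graph, const std::vector<std::string> &callees,
                DeviceType parentType, std::set<std::string> &visited) {
  for (const std::string &name : callees) {
    auto it = graph.find(name);
    // Skip unknown symbols and functions that were already processed.
    if (it == graph.end() || !visited.insert(name).second)
      continue;
    ModelFunc &callee = it->second;
    if (!callee.deviceType)
      callee.deviceType = parentType;
    else if (*callee.deviceType != parentType)
      callee.deviceType = DeviceType::Any;
    markNested(graph, callee.callees, parentType, visited);
  }
}

// Calls captured by a `target` region start out as `nohost`.
void markTargetRegionCallees(CallGraph &graph,
                             const std::vector<std::string> &callees) {
  std::set<std::string> visited;
  markNested(graph, callees, DeviceType::NoHost, visited);
}

// Usage sketch: `helper` is called from a target region and calls `leaf`.
inline void markNestedExample() {
  CallGraph graph;
  graph["helper"].callees.push_back("leaf");
  graph["leaf"];
  markTargetRegionCallees(graph, std::vector<std::string>{"helper"});
  assert(graph["leaf"].deviceType == DeviceType::NoHost);
}
```

The `visited` set keeps the walk from looping on recursive or mutually
recursive functions. A function explicitly marked `host` that is reached from a
`target` region ends up as `any`, matching the behaviour described in the list
above.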

While this infrastructure could be generally applicable to more than just Flang,
it is only utilised in the Flang frontend, so it resides there rather than in
the OpenMP dialect codebase.

## Declare Target OpenMP Dialect To LLVM-IR Lowering

The OpenMP dialect lowering of `declare target` is done through the
`amendOperation` flow, as it is not an `operation` but rather an
`attribute`. This is triggered immediately after the corresponding
operation has been lowered to LLVM-IR. As it is applicable to
different types of operations, we must specialise this function for
each operation type that we may encounter. Currently, this is
`GlobalOp`'s and `FuncOp`'s.

`FuncOp` processing is fairly simple. When compiling for the device,
`host` marked functions are removed, including those that could not
be removed earlier due to having `target` directives within. This
leaves `any`, `nohost` or indeterminable functions in the
module to lower further. When compiling for the host, no filtering is
done because `nohost` functions must be available as a fallback
implementation.
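
The `FuncOp` decision described above reduces to a small predicate. The sketch
below is illustrative only: `keepLoweredFunction` and its parameters are
hypothetical names, and an empty `std::optional` is assumed to represent a
function whose device type could not be determined.

```C++
#include <cassert>
#include <optional>

enum class DeviceType { Host, NoHost, Any };

// Decides whether a lowered function stays in the module.
// `deviceType` is empty for functions without a `declare target` marking.
bool keepLoweredFunction(bool compilingForDevice,
                         std::optional<DeviceType> deviceType) {
  // The host keeps everything, including `nohost` functions.
  if (!compilingForDevice)
    return true;
  // The device drops only functions explicitly marked `host`.
  return !(deviceType && *deviceType == DeviceType::Host);
}

// Usage sketch: a `host` function is dropped from the device module only.
inline void keepLoweredFunctionExample() {
  assert(!keepLoweredFunction(true, DeviceType::Host));
  assert(keepLoweredFunction(false, DeviceType::Host));
}
```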

For `GlobalOp`'s, the processing is a little more complex. We
currently leverage the `registerTargetGlobalVariable` and
`getAddrOfDeclareTargetVar` `OMPIRBuilder` functions shared with Clang.
These two functions invoke each other depending on the clauses and options
provided to the `OMPIRBuilder` (in particular, unified shared memory). Their
main purposes are the generation of a new global device pointer with a
"ref_" prefix on the device and enqueuing metadata generation by the
`OMPIRBuilder` to be produced at module finalization time. This is done
for both host and device, and it links the newly generated device global
pointer and the host pointer together across the two modules.

Similarly to other metadata (e.g. for `TargetOp`) that is shared across
both host and device modules, processing of `GlobalOp`'s in the device
needs access to the previously generated host IR file, which is done
through another `attribute` applied to the `ModuleOp` by the compiler
frontend. The file is loaded in and consumed by the `OMPIRBuilder` to
populate its `OffloadInfoManager` data structures, keeping host and
device appropriately synchronised.

Another (and more important) point to remember is that, as we effectively
replace the original LLVM-IR generated for the `declare target` marked
`GlobalOp`, we have some corrections to make for `TargetOp`'s (or other region
operations that use them directly) which still refer to the original lowered
global operation. This is done via `handleDeclareTargetMapVar`, which is invoked
as the final function and alteration to the lowered `target` region. It is only
invoked for the device, as it is only required in the case where we have emitted
the "ref" pointer, and it effectively replaces all uses of the originally
lowered global symbol with our new global ref pointer's symbol. Currently, we do
not remove or delete the old symbol. The same symbol can be utilised across
multiple target regions, so if we removed it, we would risk breaking lowerings
of target regions that will be processed at a later time. To appropriately
delete these no longer necessary symbols, we would need a deferred removal
process at the end of the module, which is currently not in place. It may be
possible to store this information in the `OMPIRBuilder` and then perform this
cleanup process on finalization, but this is still open for discussion and
implementation.

## Current Support

For the moment, `declare target` should work for:
* Marking functions/subroutines and function/subroutine interfaces for
  generation on host, device or both.
* Implicit function/subroutine capture for calls emitted in a `target` region
  or explicitly marked `declare target` function/subroutine. Note: Calls made
  via arguments passed to other functions must still be themselves marked
  `declare target`, e.g. when passing a `C` function pointer and invoking it,
  the interface and the `C` function in the other module must both be marked
  `declare target`, with the same type of marking as indicated by the
  specification.
* Marking global variables with `declare target`'s `link` clause and mapping
  the data to the device data environment utilising `declare target`. This may
  not work for all types yet, but for scalars and arrays of scalars, it
  should.

Doesn't work for, or needs further testing for:
* Marking the following types with `declare target link` (needs further
  testing):
    * Descriptor based types, e.g. pointers/allocatables.
    * Derived types.
    * Members of derived types (use-case needs legality checking with OpenMP
      specification).
* Marking global variables with `declare target`'s `to` clause. A lot of the
  lowering should exist, but it needs further testing and likely some further
  changes to fully function.