OpenMP 5.1 allows emission of the `indirect` clause on declare target
functions, see https://www.openmp.org/spec-html/5.1/openmpsu70.html#x98-1080002.14.7.
The intended use of this is to permit calling device functions via their
associated host pointer. In order to do this the first step will be
building a map associating these variables. Doing this will require the
same offloading entry handling we use for other kernels and globals.
We intentionally emit a new global on the device side. Although it's
possible to look up the device function's address directly, this would
require changing the visibility and would prevent us from making static
functions indirect. Also, the CUDA toolchain will optimize out unused
functions and using a global prevents that. The downside is that the
runtime will need to read the global and copy its value, but there
shouldn't be any other costs.
Note that this patch just performs the codegen, currently this new
offloading entry type is unused and will be ignored by the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D157738
We already add AMD GPU annotations, the NVIDIA ones are just a little
more convoluted to add/update but otherwise the same.
We see again that the interplay of ompx_attribute and deduced value
needs to be improved, see the TODO.
Differential Revision: https://reviews.llvm.org/D158383
Migrate createForStaticInitFunction, createDispatchInitFunction, createDispatchNextFunction and createDispatchFiniFunction from Clang CodeGen to OMPIRBuilder.
Differential Revision: https://reviews.llvm.org/D157994
This patch adds code emission in emitTargetCall to call the OpenMP runtime to
launch an kernel, and to call the fallback host implementation if the launch
fails.
Reviewed By: TIFitis, kiranchandramohan, jdoerfert
Differential Revision: https://reviews.llvm.org/D155633
OpenMP runtime functions assume the pointers are aligned to sizeof(pointer),
but it is being aligned incorrectly. Fix with the proper alignment in the IR builder.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D157040
Added MLIR support for translating use_device_ptr and use_device_addr clauses for LLVMIR lowering.
- use_device_ptr: The mapped variables marked with use_device_ptr are accessed through a copy of the base pointer mappers. The mapper is copied onto a new temporary pointer variable.
- use_device_addr: The mapped variables marked with use_device_addr are accessed directly through the base pointer mappers.
- If mapping information is not provided explicitly then default map_type of alloc/release is assumed and the map_size is set to 0.
Depends on D152554
Reviewed By: kiranchandramohan, raghavendhra
Differential Revision: https://reviews.llvm.org/D146648
This patch fixes the issue that multiple definition of kernel environment global
variables can occur because of wrong linkage.
Fixes#64284.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D156955
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
This patch ensures that all outlined functions parameters are i64 or ptr when
compiling for a target device, which is what the OpenMP runtime expects. The
values are then cast to the correct type inside the kernel.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155628
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
This patch migrates the UseDevicePtr and UseDeviceAddr clause related code for handling privatisation from Clang codegen to the OMPIRBuilder
Depends on D150860
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D152554
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:
* Drop the call entirely if the sole purpose of it is to support
a no-op bitcast (remove the no-op bitcast as well).
* Replace with `PointerType::get()`/`PointerType::getUnqual()`.
Also, remove no-op function `EmitBitCastOfLValueToProperType()`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154392
This patch changes the emitTargetDataCalls function in clang to make use of the OpenMPIRBuilder::createTargetData function for Target Data directive code gen.
Reapplying commit 0d8d718171192301f2beb10bd08ce62e70281a5e after fixing libomptarget test failure related to debug info.
Depends on D146557
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D150860
This patch refactors the code generation that emits the offloading kernel
launch and moves the core portion to the OpenMPIRBuilder so that it can be used
from flang in the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D151035
This patch changes the emitTargetDataCalls function in clang to make use of the OpenMPIRBuilder::createTargetData function for Target Data directive code gen.
Depends on D146557
Differential Revision: https://reviews.llvm.org/D150860
Key changes:
- Refactor the createTargetData function to make use of the emitOffloadingArrays and emitOffloadingArraysArgument functions to generate code.
- Added a new emitIfClause helper function to allow handling if clauses in a similar fashion to Clang.
- Updated the MLIR side of code to account for changes to createTargetData.
Depends on D149872
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D146557
This patch migrates the emitOffloadingArrays and EmitNonContiguousDescriptor functions from Clang codegen to OpenMPIRBuilder.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149872
This prevents access of uninitialized ID components later, or continuing progress
further making errors harder to track down.
Hoepfully this will be the last in my long series of mistakes in recent
commits relating to this area of code!
This change tries to move registerTargetglobalVariable and
getAddrOfDeclareTargetVar out of Clang's CGOpenMPRuntime
and into the OMPIRBuilder for shared use with MLIR's OpenMPDialect
and Flang (or other languages that may want to utilise it).
This primarily does this by trying to hoist the Clang specific
types into arguments or callback functions in the form of
lambdas, replacing it with LLVM equivelants and
utilising shared OMPIRBuilder enumerators for
the clauses, rather than Clang's own variation.
Reviewers: jsjodin, jdoerfert
Differential Revision: https://reviews.llvm.org/D149162
This patch adds support in the `OpenMPIRBuilder` for generating working
device code for OpenMP target regions. It generates and handles the
result of a call to `__kmpc_target_init()` at the beginning of the
function resulting from outlining each target region, and it also
generates the matching `__kmpc_target_deinit()` call before returning.
It relies on the implementation of target region outlining for host
codegen to handle the production of the new function and the lowering of
its body based on the contents of the associated target region.
Depends on D147172
Differential Revision: https://reviews.llvm.org/D147940
This allows the generation of OpenMP offload metadata for the OpenMP
dialect when lowering to LLVM-IR and moves some of the shared logic
between the OpenMP Dialect and Clang into the IRBuilder.
Reviewers: jsjodin, jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D148370
We use this helper to make several internal global variables during
codegen. currently we do not specify any alignment which allows the
alignment to be set incorrectly after some changes in how alignment was
handled. This patch explicitly aligns these variables to the natural
alignment as specified by the data layout
Fixes https://github.com/llvm/llvm-project/issues/62668
Reviewed By: tianshilei1992, gchatelet
Differential Revision: https://reviews.llvm.org/D150461
The interop types use the number of dependencies in the function
interface. Every other function uses an `i32` to count the number of
dependencies except for the initialization function. This leads to
codegen issues when the rest of the compiler passes in an `i32` that
then creates an invalid call. Fix this to be consistent with the other
uses.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D150156
This patch adds lowering of TargetOps for the host. The lowering outlines the
target region function and uses the OpenMPIRBuilder support functions to emit
the function and call. Code generation for offloading will be done in later
patches.
Reviewed By: kiranchandramohan, jdoerfert, agozillon
Differential Revision: https://reviews.llvm.org/D147172
This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9.
It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
Work group size attribute was set in Clang specific class. That's why
we cannot reuse this code in Flang.
If we move setting of this attribute to OpenMPIRBuilder, then we can reuse this
code in Flang and Clang. Function createOffloadEntry from OpenMPIRBuilder is
already used by Clang (via OpenMPIRBuilder::createOffloadEntriesAndInfoMetadata
function).
Differential Revision: https://reviews.llvm.org/D148525
Reviewed By: jdoerfert
This patch adds the OffloadEntriesInfoManager to the OpenMPIRBuilder, and
allows the OffloadEntriesInfoManager to access the Configuration in the
OpenMPIRBuilder. With the shared Config there is no risk for inconsistencies,
and there is no longer the need for clang to have a separate
OffloadEntriesInfoManager.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D146549
This patch adds OpenMP IRBuilder support for the Target Data directives to allow lowering to LLVM IR.
The mlir::Translation is responsible for generating supporting code for processing the map_operands through the processMapOperand function, and also generate code for the r>
The OMPIRBuilder is responsible for generating the begin and end mapper function calls.
Limitations:
- use_device_ptr and use_device_addr clauses are NOT supported for Target Data operation.
- nowait clauses are NOT supported for Target Enter and Exit Data operations.
- Only LLVMPointerType is supported for map_operands.
Differential Revision: https://reviews.llvm.org/D142914
If an inlined kernel is called in a loop, the launch point alloca would
lead to increasing stack usage every time the kernel is invoked. This
could make the application run out of stack space and crash. This problem
is fixed by using the alloca insertion point while creating the alloca instruction.
Fixes https://github.com/llvm/llvm-project/issues/60602
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D145820
This patch adds several missing GlobalList modifier functions, like
removeGlobalVariable(), eraseGlobalVariable() and insertGlobalVariable().
There is no longer need to access the list directly so it also makes
getGlobalList() private.
Differential Revision: https://reviews.llvm.org/D144027
Currently default simd alignment is defined by Clang specific TargetInfo class.
This class cannot be reused for LLVM Flang. That's why default simd alignment
calculation has been moved to OMPIRBuilder which is common for Flang and Clang.
Previous attempt: https://reviews.llvm.org/D138496 was wrong because
the default alignment depended on the number of built LLVM targets.
If we wanted to calculate the default alignment for PPC and we hadn't specified
PPC LLVM target to build, then we would get 0 as the alignment because
OMPIRBuilder couldn't create PPCTargetMachine object and it returned 0 as
the default value.
If PPC LLVM target had been built earlier, then OMPIRBuilder could have created
PPCTargetMachine object and it would have returned 128.
Differential Revision: https://reviews.llvm.org/D141910
Reviewed By: jdoerfert