313 Commits

Author SHA1 Message Date
Sergio Afonso
d87964de78
[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533)
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
2024-10-25 11:30:16 +01:00
Kareem Ergawy
ad70f3e095
[flang][OpenMP] Support target enter|update|exit .. nowait (#113305)
Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
2024-10-23 10:48:54 +02:00
NimishMishra
645e6f1114
[llvm][OpenMP] Handle complex types in atomic read (#111377)
This patch adds functionality for atomically reading `llvm.struct`
types.

Fixes: https://github.com/llvm/llvm-project/issues/93441
2024-10-22 18:34:22 -07:00
Kareem Ergawy
d0d03805f8
[flang][OpenMP] Support target ... nowait (#111823)
Adds MLIR to LLVM lowering support for `target ... nowait`. This
leverages the already existings code-gen patterns for `task` by treating
`target ... nowait` as `task ... if(1)` and `target` (without `nowait`)
as `task ... if(0)`; similar to what clang does.
2024-10-15 14:39:16 +02:00
NimishMishra
aec87a2143
[llvm][mlir][flang][OpenMP] Emit __atomic_load and __atomic_compare_exchange libcalls for complex types in atomic update (#92364)
This patch adds functionality to emit relevant libcalls in case
atomicrmw instruction can not be emitted (for instance, in case of
complex types). The IRBuilder is modified to directly emit __atomic_load
and __atomic_compare_exchange libcalls. The added functions follow a
similar codegen path as Clang, so that LLVM Flang generates almost
similar IR as Clang.

Fixes https://github.com/llvm/llvm-project/issues/83760 and
https://github.com/llvm/llvm-project/issues/75138

Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
2024-10-02 23:32:36 -07:00
Youngsuk Kim
d071fdab44
[llvm][OMPIRBuilder] Avoid Type::getPointerTo() (NFC) (#110678)
`Type::getPointerTo()` is to be deprecated & removed soon.
2024-10-01 11:47:08 -04:00
Nikita Popov
ecb98f9fed [IRBuilder] Remove uses of CreateGlobalStringPtr() (NFC)
Since the migration to opaque pointers, CreateGlobalStringPtr()
is the same as CreateGlobalString(). Normalize to the latter.
2024-09-23 16:30:50 +02:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2024-09-06 16:19:20 +01:00
Akash Banerjee
2cf36f0293
[OpenMP]Update use_device_clause lowering (#101707)
This patch updates the use_device_ptr and use_device_addr clauses to use
the mapInfoOps for lowering. This allows all the types that are handle
by the map clauses such as derived types to also be supported by the
use_device_clauses.

This is patch 2/2 in a series of patches.
2024-09-04 12:36:03 +01:00
Abid Qadeer
9e08db796b
[OpenMPIRBuilder] Don't drop debug info for target region. (#80692)
When an outlined function is generated for omp target region, a
corresponding DISubprogram was not being generated. This resulted in all
the debug information for the target region being dropped.

This commit adds DISubprogram for the outlined function if there is one
available for the parent function. It also updates the current debug
location so that the right scope is used for the entries in the outlined
function.

There are places in the OpenMPIRBuilder which changes insertion point but
don't update the debug location accordingly. They cause issue when debug info
is enabled. I have fixed a few that I observed to cause issue. But there may be
more and a systematic cleanup may be required.

With this change in place, I can set source line breakpoint in target
region and run to them in debugger.
2024-09-04 10:16:14 +01:00
Kareem Ergawy
a195e2d461
[MLIR][OpenMP] Handle privatization for global values in MLIR->LLVM translation (#104407)
Potential fix for https://github.com/llvm/llvm-project/issues/102939 and
https://github.com/llvm/llvm-project/issues/102949.

The issues occurs because the CodeExtractor component only collect
inputs (to the parallel regions) that are defined in the same function
in which the parallel regions is present. Howerver, this is problematic
because if we are privatizing a global value (e.g. a `target` variable
which is emitted as a global), then we miss finding that input and we do
not privatize the variable.

This commit attempts to fix the issue by adding a flag to the
CodeExtractor so that we can collect global inputs.
2024-08-26 17:08:24 +02:00
Daniil Fukalov
0da2ba811a
[NFC] Cleanup in ADT and Analysis headers. (#104484)
Remove unused directly includes and forward declarations in ADT and
Analysis headers.
2024-08-17 13:11:18 +02:00
Shilei Tian
0551926fda
[Clang][OMPX] Add the code generation for multi-dim thread_limit clause (#102717) 2024-08-16 13:59:46 -04:00
Kazu Hirata
b57038a611
[OpenMP] Use range-based for loops (NFC) (#103511) 2024-08-14 20:03:45 -07:00
Nikita Popov
dc831e8422 [OMPIRBuilder] Use getAllOnesValue()
Split out from https://github.com/llvm/llvm-project/pull/80309.
2024-08-12 16:37:42 +02:00
Shilei Tian
ee8100ba02
[Clang][OMPX] Add the code generation for multi-dim num_teams (#101407)
This patch adds the code generation support for multi-dim `num_teams`
clause when it is used with `target teams ompx_bare` construct.
2024-08-09 10:33:41 -04:00
arsnyder16
f7b2c2e49f
[openmp][WebAssembly] Allow openmp to compile and run under emscripten toolchain (#95169)
* Separate wasi and emscripten as they have different constraints and
abilities
* Emscripten mimics Linux/POSIX by statically linking the musl runtime.
This allow nearly all KMP_OS_LINUX code paths to work correctly. There
are only a few places that need to be adjusted related to dynamic
linking (dl_open)
* Internally link openmp globals
* With CommonLinkage it is needed to emit them in an assembly file, now
they are defined and used within each compilation unit
* With ExternalLinkage they suffer from duplicate symbols during linking
for unnamed globals like reduction/critical
   * Interestingly this aligns with the TODO comment above this code
2024-08-07 13:00:37 -05:00
Sergio Afonso
84b1e59580
[MLIR][OpenMP][OMPIRBuilder] Add lowering support for omp.target_triples (#100156)
This patch modifies MLIR to LLVM IR lowering of the OpenMP dialect to take into
consideration the contents of the `omp.target_triples` module attribute while
generating code for `omp.target` operations.

It adds the `OpenMPIRBuilderConfig::TargetTriples` field and initializes it
using the `amendOperation` flow of the `OpenMPToLLVMIRTranslation` pass. Some
changes are introduced into the `OpenMPIRBuilder` to allow passing the
information about whether a target region is intended to be offloaded from
outside.

The result of this change is that offloading calls are only generated when the
`--offload-arch` or `-fopenmp-targets` options are given to the compiler.
Otherwise, only the host fallback code is generated. This fixes linker errors
currently triggered by `flang-new` if a source file containing a `target`
construct is compiled without any of the aforementioned options.

Several unit tests impacted by these changes, which are intended to check host
code generated for `omp.target` operations, are updated to contain the new
attribute. Without it, no calls to `__tgt_target_kernel` and associated control
flow operations are generated.

Fixes #100209.
2024-08-02 11:58:40 +01:00
Johannes Doerfert
f3bfc56327
[Offload][OpenMP] Prettify error messages by "demangling" the kernel name (#101400)
The kernel names for OpenMP are manually mangled and not ideal when we
report something to the user. We demangle them now, providing the
function and line number of the target region, together with the actual
kernel name.
2024-08-01 15:24:15 -07:00
Pranav Bhandarkar
5b4e5f8ac6
[OpenMPIRBuilder][Clang][NFC] - Combine emitOffloadingArrays and emitOffloadingArraysArgument in OpenMPIRBuilder (#97088)
This patch introduces a new interface in `OpenMPIRBuilder` that combines
the creation of the so-called offloading pointer arrays and their
subsequent preparation as arguments to the OpenMP runtime library. We
then use this in Clang. 

This is intended to be used in the near future
by other frontends such as Flang when lowering MLIR to LLVMIR.
2024-07-25 16:28:11 -05:00
Akash Banerjee
0613454012
[OpenMP] Fix OpenMPIRBuilder generating incorrect duplicate SrcLocInfo (#100364)
This should further fix some of the incorrect debug info being generated
related to #97458
2024-07-25 17:46:18 +01:00
Johannes Doerfert
3c8efd7928
[OpenMP] Ensure the actual kernel is annotated with launch bounds (#99927)
In debug mode there is a wrapper (the kernel) around the function in
which we generate the kernel code. We worked around this before to get
the correct kernel name, but now we really distinguish both to attach
the launch bounds to the kernel, not the inner function.
2024-07-23 09:02:47 -07:00
Pranav Bhandarkar
d7e185cca9
[OMPIRBuilder] - Handle dependencies in createTarget (#93977)
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.

The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function has changed now to check if an outer
target-task needs to be materialized (i.e if `target` construct has
`nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to
do all the heavy lifting for creating and dispatching the task.
3) `emitTargetTask` - Bulk of the change is here. See the large comment
explaining what it does at the beginning of this function
2024-07-22 10:56:45 -05:00
Gheorghe-Teodor Bercea
1a478a69bc
[OpenMP][offload] Fix dynamic schedule tracking (#97065)
This patch fixes the dynamic schedule tracking.
2024-07-01 10:23:11 -04:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Akash Banerjee
6b1c51bc05
[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343)
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions.

Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
2024-06-26 20:18:38 +01:00
Nikita Popov
b6a94b6bfb [OMPIRBuilder] Use SmallPtrSet::remove_if() (NFC) 2024-06-26 14:43:01 +02:00
agozillon
aec735cf47
[Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829)
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.

The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.

For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.

In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).

In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.

Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
2024-06-25 20:54:04 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Nikita Popov
f1075a34ab [FileSystem] Avoid <stack> include (NFC)
The standard pattern in LLVM is to directly use vectors for stacks,
without an additional std::stack wrapper to rename some methods.
2024-06-21 13:44:46 +02:00
Nikita Popov
36c6632eb4
[IR] Don't include PassInstrumentation.h in PassManager.h (NFC) (#96219)
Move PassInstrumentationAnalysis into PassInstrumentation.h and stop
including it in PassManager.h (effectively inverting the direction of
the dependency).

Most places using PassManager are not interested in PassInstrumentation,
and we no longer have any uses of it in PassManager.h itself (only in
PassManagerImpl.h).
2024-06-21 08:41:16 +02:00
Mats Petersson
e5f1639342
[Flang]Fix for changed code at the end of AllocaIP. (#92430)
Some of the OpenMP code can change the instruction pointed at by the
insertion point. This leads to an assert in the compiler about
BB->getParent() and IP->getParent() not matching.

The fix is to rebuild the insertionpoint from the block, rather than use
builder.restoreIP.

Also, move some of the alloca generation, rather than skipping back and
forth between insert points (and ensure all the allocas are done before
their users are created).

A simple test, mainly to ensure the minimal reproducer doesn't fail to
compile in the future is also added.
2024-06-18 21:10:41 +01:00
agozillon
0aeaa2d93d
[OMPIRBuilder][OpenMP][LLVM] Modify and use ReplaceConstant utility in convertTarget (#94541)
This PR seeks to expand/replace the Constant -> Instruction conversion
that needs to occur inside of the OpenMP Target kernel generation to
allow kernel argument replacement of uses within the kernel (cannot
replace constant uses within constant expressions with non-constants).
It does so by making use of the new-ish utility
convertUsersOfConstantsToInstructions which is a much more expansive
version of what the smaller "version" of the function I wrote does,
effectively expanding uses of the input argument that are constant
expressions into instructions so that we can replace with the
appropriate kernel argument.

Also alters convertUsersOfConstantsToInstructions to optionally
restrict the replacement to a function and optionally leave
dead constants alone, the latter is necessary when lowering from 
MLIR as we cannot be sure we can remove the constants at this 
stage, even if rewritten to instructions the ModuleTranslation may 
maintain links to the original constants and utilise them in further 
lowering steps (as when we're lowering the kernel, the module is 
still in the process of being lowered). This can result in unusual 
ICEs later. These dead constants can be tidied up later (and 
appear to be in subsequent lowering from checking with 
emit-llvm).
2024-06-13 15:57:15 +02:00
Sameer Sahasrabuddhe
e0ac087ff0
[LoopUnroll] Consider convergence control tokens when unrolling (#91715)
- There is no restriction on a loop with controlled convergent
operations when
  the relevant tokens are defined and used within the loop.

- When a token defined outside a loop is used inside (also called a loop
convergence heart), unrolling is allowed only in the absence of
remainder or
  runtime checks.

- When a token defined inside a loop is used outside, such a loop is
said to be
"extended". This loop can only be unrolled by also duplicating the
extended part
  lying outside the loop. Such unrolling is disabled for now.

- Clean up loop hearts: When unrolling a loop with a heart, duplicating
the
heart will introduce multiple static uses of a convergence control token
in a
cycle that does not contain its definition. This violates the static
rules for
tokens, and needs to be cleaned up into a single occurrence of the
intrinsic.

- Spell out the initializer for UnrollLoopOptions to improve
readability.


Original implementation [D85605] by Nicolai Haehnle
<nicolai.haehnle@amd.com>.
2024-06-06 13:13:46 +05:30
Kareem Ergawy
7db4e6c1ec
[OpenMP][LLVM] Update alloca IP after PrivCB in OMPIRBUIlder (#93920)
Fixes a crash uncovered by
[pr77666.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr77666.f90)
in the test suite when delayed privatization is enabled by default.

In particular, whenever `PrivCB` (the callback responsible for
generating privatizaiton logic for an OMP variable) generates a
multi-block privatization region, the insertion point diverges: the BB
component of the IP can become a different BB from the parent block of
the instruction iterator component of the IP. This PR updates the IP to
make sure that the BB is the parent block of the instruction iterator.
2024-06-05 05:13:47 +02:00
Tom Eccles
74a87548e5
[flang][MLIR][OpenMP] make reduction by-ref toggled per variable (#92244)
Fixes #88935

Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,

```
reduction(+:scalar,scalar2) reduction(+:array)
```

The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.

This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
2024-05-16 15:27:59 +01:00
Pranav Bhandarkar
13cd88108f
[mlir][OpenMP] - Honor dependencies in code-generation of the if clause in omp.task correctly (#90891)
This patch fixes the code generation of the if clause, specifically when the
condition evaluates to false and when the task directive has the depend
clause on it. When the if clause of a task construct evaluates to false,
then the task is an undeferred task. This undeferred task still has to
honor dependencies. Previously, the OpenMPIRbuilder didn't honor
dependencies. This patch fixes that.

Fixes https://github.com/llvm/llvm-project/issues/90869
2024-05-13 08:54:23 -05:00
Abid Qadeer
6419496549
[flang][OMPIRBuilder] Keep debug location in sync with insert point. (#89953)
A customer reported an issue which I have reduced to the test in the PR.
If built with debug info enabled, the build fails with the following
error in the verifier.

!dbg attachment points at wrong subprogram for function

The problem happened because some of the functions in OMPIRBuilder.cpp
updated the insertion point with the passed in location but did not
change the current debug location. This caused a stale debug location to
be attached to the instruction.

I have solved it by replacing restoreIP with updateToLocation which
updates both the insertion point and debug location. The
updateToLocation is used in many places already, so this PR brings
functions that I have changed in line with rest of the file.

Slight issue is that I am not checking the return type of
updateToLocation as there is no good value I could return in that case.
But if we have a condition where updateToLocation will return false,
these functions will fail in any case.

I have added a test that checks that build does not fail. I was not sure
what is the correct location for the test should be. Happy to move it to
more appropriate location.
2024-05-10 10:12:24 +01:00
Kareem Ergawy
922ab7089b
[MLIR][OpenMP] Extend omp.private materialization support: dealloc (#90841)
Extends current support for delayed privatization during translation to
LLVM IR. This adds support for materlizaing the `dealloc` region in
`omp.private` ops when this region contains clean-up/deallocation logic
that needs to be executed at the end of the parallel region.

This changes the `OMPIRBuilder` slightly to execute the finalization
callback **after** the privatization callback. This allows us to collect
information about privatized variables on the MLIR and LLVM sides so
that we can properly emit deallocation logic.
2024-05-03 08:59:01 +02:00
Joseph Huber
0287a5cc4e
[OpenMP] Remove 'minncta' attributes from NVPTX kernels (#88398)
Summary:
Currently we treat this attribute as a minimum number for the amount of
blocks scheduled on the kernel. However, the doucmentation states that
this applies to CTA's mapped onto a *single* SM. Currently we just set
it to the total number of blocks, which will almost always result in a
warning that the value is out of range and will be ignored. We don't
have a good way to automatically know how many CTAs can be put on a
single SM nor if we should do this, so we should probably leave this up
to users manually adding it.


https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm
2024-04-15 15:28:05 -05:00
Joseph Huber
2650375b3b
[OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (#87695)
Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known
2024-04-05 07:38:01 -05:00
Stephen Tozer
9a96fb4445 Reapply "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)"
Fixes a build error caused by an unupdated getAsInstruction callsite in clang.

This reverts commit ab851f7fe946e7eed700ef9d82082eb721860189.
2024-03-19 15:49:10 +00:00
Stephen Tozer
ab851f7fe9 Revert "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)"
Reverted due to buildbot failures:
https://lab.llvm.org/buildbot/#/builders/139/builds/61717/

This reverts commit 7ef433f62c199c414bffdcac1c8ee3159b29c5f5.
2024-03-19 14:41:27 +00:00
Jeremy Morse
7ef433f62c
[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)
Because the RemoveDIs work is putting a debug-info bit into
BasicBlock::iterator and iterators are needed for insertion, the
getAsInstruction method declaration would need to use a fully defined
instruction-iterator, which leads to a complicated
header-inclusion-order problem. Much simpler to instead just not insert,
and make it the callers problem to insert.

This is proportionate because there are only four call-sites to
getAsInstruction -- it would suck if we did this everywhere.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>
2024-03-19 14:30:41 +00:00
Akash Banerjee
e9da5f0083
[OpenMP] Fix target data region codegen being omitted for device pass (#85218)
This patch enables the BodyCodeGen callback to still trigger for the
TargetData nested region during the device pass. There maybe Target code
nested within the TargetData region for which this is required.

Also add tests for the same.
2024-03-19 13:04:23 +00:00
Tom Eccles
f46f5a01f4
[flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (#84304)
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.

This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).

Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.

Existing lowerings from mlir are unaffected (these will continue to use
the by-value argument passing.

Flang will continue to pass by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.

Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.

Commit series for by-ref OpenMP reductions 3/3

---------

Co-authored-by: Mats Petersson <mats.petersson@arm.com>
2024-03-13 14:51:09 +00:00
Leandro Lupori
64422cf826
[llvm][mlir][OMPIRBuilder] Translate omp.single's copyprivate (#80488)
Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: https://github.com/llvm/llvm-project/pull/73128
2024-02-28 13:33:42 -03:00
agozillon
dcf4ca558c
[OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818)
This patch seeks to add a mechanism to raise constant (not ConstantExpr
or runtime/dynamic) sized allocations into the entry block for select
functions that have been inserted into a list for processing. This
processing occurs during the finalize call, after OutlinedInfo regions
have completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for TargetOp generation in
the OpenMP MLIR dialect lowering to LLVM-IR.

This currently is required for Target kernels generated by
createOutlinedFunction to avoid subsequent optimization passes doing
some unintentional malformed optimizations for AMD kernels (unsure if it
occurs for other vendors). If the allocas are generated inside of the
kernel and are not in the entry block and are subsequently passed to a
function this can lead to required instructions being erased or
manipulated in a way that causes the kernel to run into a HSA access
error.

This fix is related to a series of problems found in:
https://github.com/llvm/llvm-project/issues/74603

This problem primarily presents itself for Flang's HLFIR AssignOp
currently, when utilised with a scalar temporary constant on the RHS and
a descriptor type on the LHS. It will generate a call to a runtime
function, wrap the RHS temporary in a newly allocated descriptor (an
llvm struct), and pass both the LHS and RHS descriptor into the runtime
function call. This will currently be
embedded into the middle of the target region in the user entry block,
which means the allocas are also embedded in the middle, which seems to
pose
issues when later passes are executed. This issue may present itself in
other HLFIR operations or unrelated operations that generate allocas as
a by product, but for the moment, this one test case is the only
scenario I've found this problem.

Perhaps this is not the appropriate fix, I am very open to other
suggestions, I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive change set. The other two, that come to mind (but I've not
fully looked into, the former I tried a little with blocks but it had a
few issues I'd need to think through):

- Having a proper alloca only block (or region) generated for TargetOps
that we could merge into the entry block that's generated by
convertTarget's createOutlinedFunction.
- Or diverging a little from Clang's current target generation and using
the CodeExtractor to generate the user code as an outlined function
region invoked from the kernel we make, with our kernel arguments passed
into it. Similar to the current parallel generation. I am not sure how
well this would intermingle with the existing parallel generation though
that's layered in.

Both of these methods seem like quite a divergence from the current
status quo, which I am not entirely sure is merited for the small test
this change aims to fix.
2024-02-23 22:59:41 +01:00