321 Commits

Author SHA1 Message Date
Longsheng Mou
f047b735e9
[mlir][NFC] Use getDefiningOp<OpTy>() instead of dyn_cast<OpTy>(getDefiningOp()) (#150428)
This PR uses `val.getDefiningOp<OpTy>()` to replace `dyn_cast<OpTy>(val.getDefiningOp())`, `dyn_cast_or_null<OpTy>(val.getDefiningOp())` and `dyn_cast_if_present<OpTy>(val.getDefiningOp())`.
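A toy model of why the templated accessor is convenient; this is a sketch, not MLIR's actual classes. `Operation`, `AddOp`, `Value` and the string-based type check are all stand-ins invented for illustration. A value such as a block argument has no defining op, so the plain accessor can return null, and LLVM's plain `dyn_cast` expects a non-null argument. The templated form folds the null check and the type check into one call.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; these are NOT the MLIR classes.
struct Operation {
  std::string name;
};

struct AddOp {
  static constexpr const char *opName = "toy.add";
};

struct Value {
  Operation *def = nullptr; // null models a value with no defining op

  // Plain accessor: may return null.
  Operation *getDefiningOp() const { return def; }

  // Templated accessor: null check and type check in one step.
  template <typename OpTy>
  Operation *getDefiningOp() const {
    return (def && def->name == OpTy::opName) ? def : nullptr;
  }
};
```

With this shape, callers can write `if (auto *op = v.getDefiningOp<AddOp>())` without first checking whether the value has a defining op at all.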
2025-07-25 10:35:51 +08:00
Kazu Hirata
1a0f482de8
[mlir] Remove unused includes (NFC) (#150476)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-24 11:23:53 -07:00
Longsheng Mou
3eb49c482c
[mlir][NFC] Use hasOneBlock instead of llvm::hasSingleElement(region) (#149809)
2025-07-24 10:11:21 +08:00
Tom Eccles
a1c61ac756
[mlir][OpenMP] Allow composite SIMD REDUCTION and IF (#147568)
Reduction support: https://github.com/llvm/llvm-project/pull/146671
IF support is fixed in this PR.

The problem for the IF clause in composite constructs was that wsloop
and simd both operate on the same CanonicalLoopInfo structure: with the
SIMD processed first, followed by the wsloop. Previously the IF clause
generated code like
```
if (cond) {
  while (...) {
    simd_loop_body;
  }
} else {
  while (...) {
    nonsimd_loop_body;
  }
}
```
The problem with this is that this invalidates the CanonicalLoopInfo
structure to be processed by the wsloop later. To avoid this, in this
patch I preserve the original loop, moving the IF clause inside of the
loop:
```
while (...) {
  if (cond) {
    simd_loop_body;
  } else {
    non_simd_loop_body;
  }
}
```
On simple examples I tried LLVM was able to hoist the if condition
outside of the loop at -O3.
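
The two loop shapes described above can be compared in a small C++ sketch. The function names and the doubling loop body are hypothetical and chosen only for illustration; the real change operates on LLVM IR via CanonicalLoopInfo, not on source-level loops. The point is only that both shapes compute the same result, while the second keeps a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical loop bodies; both compute the same thing here.
void simdLoopBody(std::vector<int> &v, std::size_t i) { v[i] *= 2; }
void nonSimdLoopBody(std::vector<int> &v, std::size_t i) { v[i] *= 2; }

// Old shape: the condition selects one of two separate loops.
std::vector<int> ifOutsideLoop(std::vector<int> v, bool cond) {
  if (cond) {
    for (std::size_t i = 0; i < v.size(); ++i)
      simdLoopBody(v, i);
  } else {
    for (std::size_t i = 0; i < v.size(); ++i)
      nonSimdLoopBody(v, i);
  }
  return v;
}

// New shape: a single loop with the condition tested inside the body.
std::vector<int> ifInsideLoop(std::vector<int> v, bool cond) {
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (cond)
      simdLoopBody(v, i);
    else
      nonSimdLoopBody(v, i);
  }
  return v;
}
```

Because `cond` is loop-invariant, an optimizer is free to hoist the test back out of the single loop, which matches the hoisting behaviour reported above.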

The disadvantage of this is that we cannot add the
llvm.loop.vectorize.enable attribute on either the SIMD or non-SIMD
loops because they both share a loop back edge. There's no way of
solving this without keeping the old design of having two different
loops: which cannot be represented using only one CanonicalLoopInfo
structure. I don't think the presence or absence of this attribute makes
much difference. In my testing it is the llvm.loop.parallel_access
metadata which makes the difference to vectorization. LLVM will
vectorize if legal whether or not this attribute is there in the TRUE
branch. In the FALSE branch this means the loop might be vectorized even
when the condition is false: but I think this is still standards
compliant: OpenMP 6.0 says that when the if clause is false that should
be treated like the SIMDLEN clause is one. The SIMDLEN clause is defined
as a "hint". For the same reason, SIMDLEN and SAFELEN clauses are
silently ignored when SIMD IF is used.

I think it is better to implement SIMD IF and ignore SIMDLEN and SAFELEN
and some vectorization encouragement metadata when combined with IF than
to ignore IF because IF could have correctness consequences whereas the
rest are optimization hints. For example, the user might use the IF
clause to disable SIMD programmatically when it is known not safe to
vectorize the loop. In this case it is not at all safe to add the
parallel access or SAFELEN metadata.
2025-07-15 10:30:02 +01:00
Michael Kruse
96bc07d492
[MLIR][OpenMP] Add canonical loop LLVM-IR lowering (#147069)
Support for translating the operations introduced in #144785 to LLVM-IR.

In order to keep the lowering simple,
`OpenMPIRBuilder::unrollLoopHeuristic` is applied when encountering the
`omp.unroll_heuristic` op. As a result, the operation that unrolling is
applied to (`omp.canonical_loop`) must have been emitted before even
though logically there is no such requirement.

Eventually, all transformations on a loop must be applied directly after
emitting `omp.canonical_loop`, i.e. future transformations must be
looked-up when encountering `omp.canonical_loop` itself. This is because
many OpenMPIRBuilder methods (e.g. `createParallel`) expect all the
region code to be emitted within a callback. In the case of
`createParallel`, the region code is getting outlined into a new
function. Therefore, making the operation order a formal requirement
would not make the implementation any easier.
2025-07-11 12:54:25 +02:00
agozillon
71783fea2c
[Flang][OpenMP][MLIR] Fix regression by #146653 by adding address space cast to getRefPtrIfDeclareTarget
The patch introduced changes to add address spaces to a wider array of MLIR/LLVM values, however,
it was missing an address space cast that exists in our downstream implementation that's required
for declare target to work correctly.
2025-07-08 12:31:27 -05:00
Kajetan Puchalski
9006bc8717
[OpenMP] Enable simd in non-reduction composite constructs (#146097)
Despite currently being ignored with a warning, simd as a leaf in
composite constructs behaves as expected when the construct does not
contain a reduction. Enable it for those non-reduction constructs.

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-07-08 14:27:33 +01:00
Tom Eccles
ea5ee2e743
[mlir][OpenMP] Don't allow firstprivate for simd (#146734)
This is not allowed by the OpenMP standard.
2025-07-04 12:15:07 +01:00
Kareem Ergawy
8c9e0c6c61
[flang][OpenMP] Allocate reduction init temps on the stack for GPUs (#146667)
Temps needed for the reduction init regions are now allocated on the heap
all the time. However, this is a performance killer for GPUs since malloc
calls are prohibitively expensive. Therefore, we should do these
allocations on the stack for GPU reductions.
2025-07-04 06:29:34 +02:00
Abid Qadeer
d56c06e6c9
[flang][debug] Generate DISubprogramAttr for omp::TargetOp. (#146532)
This is a combination of https://github.com/llvm/llvm-project/pull/138149
and https://github.com/llvm/llvm-project/pull/138039 which were opened
separately for ease of reviewing. The only other change is adjustments in 2
tests which have gone in since.

There are `DeclareOp`s present for the variables mapped into the target
region. That allows us to generate debug information for them. But the
`TargetOp` is still part of the parent function and those variables get the
parent function's `DISubprogram` as a scope.

In `OMPIRBuilder`, a new function is created for the `TargetOp`. We also
create a new `DISubprogram` for it. All the variables that were in the
target region now have to be updated to have the correct scope. This
after-the-fact updating of debug information becomes very difficult in
certain cases. Take the example of variable arrays. The type of those
arrays depends on the artificial `DILocalVariable`(s) which hold the
size(s) of the array. This new function will now require that we generate
the new variable and new types. A similar issue exists for character type
variables too.

To avoid this after-the-fact updating, this PR generates a
`DISubprogramAttr` for the `TargetOp` while generating the debug info in
`flang`. Then we don't need to generate a `DISubprogram` in
`OMPIRBuilder`. This change is made a bit more complicated by the
fact that in the new scheme, the debug location already points to the new
`DISubprogram` by the time it reaches `convertOmpTarget`. But we need
some code generation in the parent function so we have to carefully
manage the debug locations.

This fixes issue `#134991`.
2025-07-03 10:38:28 +01:00
Tom Eccles
16b75c819d
[mlir][OpenMP] implement SIMD reduction (#146671)
This replicates clang's implementation. Basically:
- A private copy of the reduction variable is created, initialized to
the reduction neutral value (using regions from the reduction
declaration op).
- The body of the loop is lowered as usual, with accesses to the
reduction variable mapped to the private copy.
- After the loop, we inline the reduction region from the declaration op
to combine the privatized variable into the original variable.
- As usual with the SIMD construct, attributes are added to encourage
vectorization of the loop and to assert that memory accesses in the loop
don't alias across iterations.
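
The steps above can be sketched in plain C++. This is an illustration of the privatize, accumulate, combine pattern only, not the MLIR-to-LLVM-IR lowering itself; `simdReduce`, `neutral` and `combine` are hypothetical names standing in for the init and combiner regions of the reduction declaration.

```cpp
#include <cassert>
#include <vector>

// Reduce `data` into `original` using a private accumulator.
// `neutral` plays the role of the init region, `combine` the combiner region.
template <typename T, typename Combine>
void simdReduce(const std::vector<T> &data, T &original, T neutral,
                Combine combine) {
  // 1. Private copy, initialized to the reduction's neutral value.
  T priv = neutral;

  // 2. Loop body: every access to the reduction variable uses the private copy.
  for (const T &x : data)
    priv = combine(priv, x);

  // 3. After the loop, fold the private copy into the original variable.
  original = combine(original, priv);
}
```

For a sum, `neutral` would be `0` and `combine` addition; for a product, `1` and multiplication.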

I have verified that simple scalar examples do vectorize at -O3 and the
tests I could find in the Fujitsu test suite produce correct results. I
tested on top of #146097 and this seemed to work for composite
constructs as well.

Fixes #144290
2025-07-02 16:49:34 +01:00
Abid Qadeer
232c2921e1
Reland [mlir][OpenMP] Use correct debug location with link clause. (#145889)
https://github.com/llvm/llvm-project/pull/145026 was reverted because it
failed a sanitizer test. That issue has been fixed in
https://github.com/llvm/llvm-project/pull/145883.
2025-06-26 19:32:30 +01:00
Abid Qadeer
a75279e4a5
Revert "[mlir][OpenMP] Use correct debug location with link clause." (#145768)
Reverts llvm/llvm-project#145026

Caused a CI failure on
https://lab.llvm.org/buildbot/#/builders/169/builds/12504.
2025-06-25 20:06:36 +01:00
Abid Qadeer
006037675c
[mlir][OpenMP] Use correct debug location with link clause. (#145026)
Please see the following program.

```
module test_0
    INTEGER :: sp = 1
!$omp declare target link(sp)
end module test_0

program main
use test_0
integer :: new_len

!$omp target map(tofrom:new_len) map(tofrom:sp)
    new_len = sp
!$omp end target

  print *, new_len
  print *, sp
end program
```

When compiled with
`flang -g -O0 -fopenmp --offload-arch=gfx1100`

will fail the compilation with the following error:

`dbg attachment points at wrong subprogram for function`

The reason is that with the `link` clause on `!$omp declare target`, an
extra load instruction is inserted. But the debug location was not
updated before insertion which caused an invalid location to be attached
to the instruction.
2025-06-25 13:49:40 +01:00
Kajetan Puchalski
d3ed84ed67
[Utils][mlir] Fix interaction between CodeExtractor and OpenMPIRBuilder (#145051)
CodeExtractor can currently erroneously insert an alloca into a
different function than it inserts its users into, in cases where code
is being extracted out of a function that has already been outlined. Add
an assertion that the two blocks being inserted into are actually in the
same function.

Add a check to findAllocaInsertPoint in OpenMP to LLVMIR translation to
prevent the aforementioned scenario from happening.

OpenMPIRBuilder relies on a callback mechanism to fix-up a module later
on during the finaliser step. In some cases this results in the module
being invalid prior to the finalise step running. Remove calls to
verifyModule wrapped in LLVM_DEBUG from CodeExtractor, as the presence
of those results in the compiler crashing with -mllvm -debug due to
premature module verification where it would not crash without -debug.

Call ompBuilder->finalize() at the end of mlir::translateModuleToLLVMIR, in
order to make sure the module has actually been finalized prior to
trying to verify it.

Resolves https://github.com/llvm/llvm-project/issues/138102.

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-06-25 13:34:35 +01:00
Krzysztof Parzyszek
8231dd71cb
[flang][OpenMP] Skip runtime mapping with no offload targets (#145594)
When no offload targets are specified flang will avoid offloading for
"target" constructs, but not "target data" constructs. This patch makes
the behavior consistent across all offload-related operations.

While ignoring "target" may produce semantically incorrect code, it may
still be a useful debugging tool.

--
This reinstates commits 6ba1955 and 349f8d6, reverted due to compilation
failures in the gfortran test suite. These build problems were caused by
an unrelated issue (https://github.com/llvm/llvm-project/issues/145558)
which is now fixed.

Ref: https://github.com/llvm/llvm-project/pull/144534
2025-06-25 07:10:08 -05:00
Krzysztof Parzyszek
3a71884ab1
[flang][OpenMP] Map device pointers on host device as well (#145562)
Given a TARGET DATA construct with USE_DEVICE_PTR(x) and IF(FALSE), the
compiler will crash if `x` was used in the body. The cause of the crash
is that the MLIR->LLVM codegen tries to look up the translated value of
x, but one had not been mapped.

Given an IF clause, the translation will generate an if-then-else
construct, with the "else" block corresponding to the false condition,
i.e. the host device playing the role of the target device. In that
block, still process the USE_DEVICE_ADDR/USE_DEVICE_PTR clauses, which
will cause the translation mappings to be created.

Fixes https://github.com/llvm/llvm-project/issues/145558
2025-06-24 15:38:23 -05:00
Tom Eccles
cc756716cf
[mlir][NFC] Move LLVM::ModuleTranslation::SaveStack to a shared header (#144897)
This is so that we can re-use the same code in Flang.
2025-06-24 17:45:10 +01:00
antoine moynault
5fa55b2dfc
Revert "[flang][OpenMP] Skip runtime mapping with no offload targets (#144534)" (#145478)
And also revert 6ba1955 "[flang][OpenMP] Fix ignore-target-data.f90 test"

As it causes several bot failures
https://github.com/llvm/llvm-project/pull/144534#issuecomment-2995303224
2025-06-24 10:51:26 +02:00
Krzysztof Parzyszek
349f8d67d4
[flang][OpenMP] Skip runtime mapping with no offload targets (#144534)
When no offload targets are specified flang will ignore "target"
constructs, but not "target data" constructs. This patch makes the
behavior consistent across all offload-related operations.

While ignoring "target" may produce semantically incorrect code, it may
still be a useful debugging tool.
2025-06-20 08:09:36 -05:00
Tom Eccles
aa01e8e9cf
[mlir][OpenMP] Fix broken insertion point for charbox with omp task (#143112)
Fixes #142365
2025-06-17 10:42:42 +01:00
NimishMishra
bf1fe6eb33
[mlir][OpenMP] Reintroduce TODO for translation of linear clause (#143531)
Reintroduce a TODO for linear clause translation unless corner issues
(like linear variables being entities other than `alloca`, and support
for linear variables of types other than integer) are solved.
2025-06-10 07:06:28 -07:00
Tom Eccles
b03081e9fb
[mlir][OpenMP] set correct insert point after creating a barrier (#142997)
Fixes #138436
2025-06-06 10:43:13 +01:00
Tom Eccles
8d06d4c132
[mlir][OpenMP] Add translation of private_barrier attr to LLVMIR (#140090)
Part of a series to fix
https://github.com/llvm/llvm-project/issues/136357
2025-05-22 15:24:20 +01:00
NimishMishra
0baacd1a58
[flang][OpenMP] Support MLIR lowering of linear clause for omp.wsloop (#139385)
This patch adds support for MLIR lowering of linear clause on omp.wsloop
(except for linear modifiers).
2025-05-19 23:33:06 -07:00
Sergio Afonso
0cd7e8aa91
[MLIR][OpenMP] Assert on map translation functions, NFC (#137199)
This patch adds assertions to map-related MLIR to LLVM IR translation
functions and utils to explicitly document whether they are intended for
host or device compilation only.

Over time, map-related handling has increased in complexity. This is
compounded by the fact that some handling is device-specific and some is
host-specific. By explicitly asserting on these functions on the
expected compilation pass, the flow should become slightly easier to
follow.
2025-05-15 12:29:06 +01:00
Tom Eccles
e40200901c
[mlir][OpenMP] cancel(lation point) taskgroup LLVMIR (#137841)
A cancel or cancellation point for taskgroup is always nested inside of
a task inside of the taskgroup. For the task which is cancelled, it is
that task which needs to be cleaned up: not the owning taskgroup.
Therefore the cancellation branch handler is done in the conversion of
the task not in conversion of taskgroup.

I added a firstprivate clause to the test for cancel taskgroup to
demonstrate that the block being branched to is the same block where
mandatory cleanup code is added. Cancellation point follows exactly the
same code path.
2025-05-08 11:15:58 +01:00
Tom Eccles
8338a3c92b
[mlir][OpenMP] Convert omp.cancellation_point to LLVMIR (#137205)
This is basically identical to cancel except without the if clause.

taskgroup will be implemented in a followup PR.
2025-05-08 11:09:13 +01:00
Tom Eccles
a385c47a59
[mlir][OpenMP] convert wsloop cancellation to LLVMIR (#137194)
Taskloop support will follow in a later patch.
2025-05-08 11:08:52 +01:00
Kaviya Rajendiran
857ac4c229
[MLIR][OpenMP] Lowering nontemporal clause to LLVM IR for SIMD directive (#118751)
This patch,
- Added a new attribute `nontemporal` to fir.load and fir.store operation in the FIR dialect.
- Added a pass `lower-nontemporal` which is called before FIRToLLVM conversion pass and adds the nontemporal attribute to loads and stores on the list items specified in the nontemporal clause of the SIMD directive.
- Set the `UnitAttr:$nontemporal` to llvm.load and llvm.store operations during FIR to LLVM dialect conversion, if the corresponding fir.load or fir.store operations have the nontemporal attribute.
- Attached the `nontemporal metadata` to load and store instructions that have the nontemporal attribute, during LLVM dialect to LLVM IR translation.
2025-04-30 11:13:20 +05:30
Pranav Bhandarkar
7dd8122d4e
[Flang][MLIR][OpenMP] - Add support for firstprivate when translating omp.target ops from MLIR to LLVMIR (#131213)
This patch adds support to translate `firstprivate` clauses on `omp.target` ops when translating from MLIR to LLVMIR.
Presently, this PR is restricted to supporting only included tasks, i.e. `#omp target nowait firstprivate(some_variable)` will likely not work correctly even if it produces object code.
2025-04-29 14:53:15 -05:00
Tom Eccles
7b70fc74d0
[mlir][OpenMP] Convert omp.cancel sections to LLVMIR (#137193)
This is quite ugly but it is the best I could think of. The old
FiniCBWrapper was way too brittle depending upon the exact block
structure inside of the section, and could be confused by any control
flow in the section (e.g. an if clause on cancel). The wording in the
comment and variable names didn't seem to match where it was actually
branching to as well.

Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections
branches to a block containing __kmpc_for_static_fini.

This was hard to achieve here because sometimes the FiniCBWrapper has to
run before the worksharing loop finalization has been created.

To get around this ordering issue I created a dummy branch to a dummy
block, which is then fixed later once all of the information is
available.
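
The deferred fix-up idea can be sketched with a toy emitter. This is not the OpenMPIRBuilder code; `SectionEmitter`, `kDummyBlock` and the integer block ids are hypothetical and exist only to show a branch being emitted against a placeholder and retargeted once the real destination is known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy IR: blocks are identified by index, and a branch just stores its target.
constexpr int kDummyBlock = -1;

struct Branch {
  int target;
};

class SectionEmitter {
public:
  // Emit a cancellation branch before the finalization block exists.
  std::size_t emitCancelBranch() {
    branches.push_back(Branch{kDummyBlock});
    pending.push_back(branches.size() - 1);
    return branches.size() - 1;
  }

  // Once the finalization block is known, retarget every pending branch.
  void resolveCancelBranches(int finiBlock) {
    assert(finiBlock != kDummyBlock && "finalization block must be real");
    for (std::size_t idx : pending)
      branches[idx].target = finiBlock;
    pending.clear();
  }

  int targetOf(std::size_t branch) const { return branches[branch].target; }
  bool hasPending() const { return !pending.empty(); }

private:
  std::vector<Branch> branches;
  std::vector<std::size_t> pending;
};
```

Keeping a list of pending branches means the emitter does not depend on the block structure inside the section, which is the brittleness the old approach suffered from.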
2025-04-29 17:19:40 +01:00
NimishMishra
b62afbccc8
[mlir][OpenMP] Add __atomic_store to AtomicInfo (#121055)
This PR adds functionality for `__atomic_store` libcall in AtomicInfo.
This allows for supporting complex types in `atomic write`.

Fixes https://github.com/llvm/llvm-project/issues/113479
Fixes https://github.com/llvm/llvm-project/issues/115652
2025-04-29 07:53:36 -07:00
Tom Eccles
2085119887
[mlir][OpenMP] Convert omp.cancel parallel to LLVMIR (#137192)
Support for other constructs will follow in subsequent PRs.
2025-04-28 10:33:55 +01:00
Dominik Adamski
adfc577895
[OpenMP][CodeExtractor]Add align metadata to load instructions (#131131)
Moving code to another function can lead to missed optimization
opportunities, because function passes operate on smaller chunks of
code, and they cannot figure out all details.

One example of missed optimization opportunities after code extraction
is information about pointer alignment. The instruction combine pass
adds information about pointer alignment to LLVM intrinsic memcpy calls
if it can deduce it from the code or if align metadata is added. If this
information is not present, then further optimization passes can
generate inefficient code.

If we add align metadata to extracted pointers, then the instruction
combine pass can add the align attribute to the LLVM intrinsic memcpy
call and unblock further optimization.

Scope of changes:
1. Analyze MLIR map operations. Add information about the alignment of
objects that are passed by reference to OpenMP GPU kernels.
2. Propagate alignment information to the helper functions outlined by
`CodeExtractor`.
2025-04-10 09:45:30 +02:00
Jan Leyonberg
1aed6ad906
[MLIR][OpenMP] Enable multiple variables for target teams reductions (#134903)
This patch enables multiple reductions to be used in a reduction clause
inside target regions for GPU offloading.

---------

Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-04-09 13:01:53 -04:00
NimishMishra
53fa92dcad
[mlir][llvm][OpenMP] Hoist __atomic_load alloca (#132888)
Current implementation of `__atomic_compare_exchange` uses an alloca for
`__atomic_load`, leading to issues like
https://github.com/llvm/llvm-project/issues/120724. This PR hoists this
alloca to `AllocaIP`.


Fixes: https://github.com/llvm/llvm-project/issues/120724
2025-04-09 03:01:44 -07:00
Jan Leyonberg
fbc8335311
[MLIR][OpenMP] Add codegen for teams reductions (#133310)
This patch adds the lowering of teams reductions from the omp dialect to
LLVM-IR. Some minor cleanup was done in clang to remove an unused
parameter.
2025-04-07 12:47:16 -04:00
Sergio Afonso
f59b5b8d59
[MLIR][OpenMP] Fix standalone distribute on the device (#133094)
This patch updates the handling of target regions to set trip counts and
kernel execution modes properly, based on clang's behavior. This fixes a
race condition on `target teams distribute` constructs with no `parallel
do` loop inside.

This is how kernels are classified, after changes introduced in this
patch:

```f90
! Exec mode: SPMD.
! Trip count: Set.
!$omp target teams distribute parallel do
do i=...
end do

! Exec mode: Generic-SPMD.
! Trip count: Set (outer loop).
!$omp target teams distribute
do i=...
  !$omp parallel do private(idx, y)
  do j=...
  end do
end do

! Exec mode: Generic-SPMD.
! Trip count: Set (outer loop).
!$omp target teams distribute
do i=...
  !$omp parallel
    ...
  !$omp end parallel
end do

! Exec mode: Generic.
! Trip count: Set.
!$omp target teams distribute
do i=...
end do

! Exec mode: SPMD.
! Trip count: Not set.
!$omp target parallel do
do i=...
end do

! Exec mode: Generic.
! Trip count: Not set.
!$omp target
  ...
!$omp end target
```
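
The classification in the listing above can be condensed into a small decision function. This is a sketch under simplifying assumptions: `TargetRegion`, its three flags and `classify` are hypothetical, and the real translation derives this information by inspecting the MLIR operations rather than from booleans.

```cpp
#include <cassert>

enum class ExecMode { Generic, GenericSPMD, SPMD };

// Hypothetical summary of a target region, for illustration only.
struct TargetRegion {
  bool teamsDistribute;    // `teams distribute` is part of the target construct
  bool combinedParallelDo; // `parallel do` is combined into the same construct
  bool nestedParallel;     // a `parallel` region sits inside the distribute loop
};

struct KernelInfo {
  ExecMode mode;
  bool tripCountSet;
};

KernelInfo classify(const TargetRegion &r) {
  KernelInfo info;
  // In the listing, the trip count is set exactly when a distribute loop exists.
  info.tripCountSet = r.teamsDistribute;
  if (r.combinedParallelDo)
    info.mode = ExecMode::SPMD;
  else if (r.teamsDistribute && r.nestedParallel)
    info.mode = ExecMode::GenericSPMD;
  else
    info.mode = ExecMode::Generic;
  return info;
}
```

Each of the six cases in the listing maps to one combination of the three flags, for example `{true, false, false}` for a standalone `target teams distribute`.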

For the split `target teams distribute + parallel do` case, clang
produces a Generic kernel which gets promoted to Generic-SPMD by the
openmp-opt pass. We can't currently replicate that behavior in flang
because our codegen for these constructs results in the introduction of
calls to the `kmpc_distribute_static_loop` family of functions, instead
of `kmpc_distribute_static_init`, which currently prevent promotion of
the kernel to Generic-SPMD.

For the time being, instead of relying on the openmp-opt pass, we look
at the MLIR representation to find the Generic-SPMD pattern and directly
tag the kernel as such during codegen. This is what we were already
doing, but incorrectly matching other kinds of kernels as such in the
process.
2025-04-03 15:41:00 +01:00
Kareem Ergawy
f23a6ef54c
[flang][OpenMP] Process omp.atomic.update while translating scopes for target device (#132165)
Fixes a bug introduced by
https://github.com/llvm/llvm-project/pull/130078.

For non-BlockArgOpenMPOpInterface ops, we also want to map their entry
block arguments to their operands, if any. For the current support in
the OpenMP dialect, the table below lists all ops that have arguments
(SSA operands and/or attributes) and not target-related. Of all these
ops, we need to only process `omp.atomic.update` since it is the only op
that has SSA operands & an attached region. Therefore, the region's
entry block arguments must be mapped to the op's operands in case they
are referenced inside the region.


| op | operands? | region(s)? | parent is func? | processed? |
|--------------|-------------|------------|------------------|-------------|
| atomic.read | yes | no | yes | no |
| atomic.write | yes | no | yes | no |
| atomic.update | yes | yes | yes | yes |
| critical | no | no | yes | no |
| declare_mapper | no | yes | no | no |
| declare_reduction | no | yes | no | no |
| flush | yes | no | yes | no |
| private | no | yes | yes | no |
| threadprivate | yes | no | yes | no |
| yield | yes | no | yes | no |
2025-03-20 16:21:09 -05:00
Sergio Afonso
b231f6f862
[MLIR][OpenMP] Improve omp.map.info verification (#132066)
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.

Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.

Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.
2025-03-20 15:48:45 +00:00
Kareem Ergawy
b7eb01b3a1
[NFC][OpenMP][MLIR] Refactor code related to collecting privatizer info into a shared util (#131582)
Moves code needed to collect info about delayed privatizers into a
shared util instead of repeating the same pattern across all relevant
constructs.
2025-03-19 12:06:33 +01:00
Kareem Ergawy
e737b846b4
[flang][OpenMP] Translate OpenMP scopes when compiling for target device (#130078)
If a `target` directive is nested in a host OpenMP directive (e.g.
parallel, task, or a worksharing loop), flang currently crashes if the
target directive-related MLIR ops (e.g. `omp.map.bounds` and
`omp.map.info`) depend on SSA values defined inside the parent host
OpenMP directives/ops.

This PR tries to solve this problem by treating these parent OpenMP ops
as "SSA scopes". Whenever we are translating for the device, instead of
completely translating host ops, we just translate their MLIR ops as pure
SSA values.
2025-03-19 08:26:19 +01:00
Kareem Ergawy
49b8d8472f
[OpenMP][MLIR] Support LLVM translation for distribute with delayed privatization (#131564)
Adds support for translating delayed privatization (`private` and
`firstprivate`) for `omp.distribute` ops.
2025-03-18 10:14:42 +01:00
Matthias Springer
6c867e27a7
[mlir] Use getSingleElement/hasSingleElement in various places (#131460)
This is a code cleanup. Update a few places in MLIR that should use
`hasSingleElement`/`getSingleElement`.

Note: `hasSingleElement` is faster than `.getSize() == 1` when it is
used with linked lists etc.
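
A sketch of why the check can be cheaper on list-like containers. The helper below is a re-implementation written for illustration and only assumed to mirror the idea behind LLVM's `hasSingleElement`; `countIsOne` is a hypothetical contrast case. The first inspects at most two positions, while the second walks the entire list to count it.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>

// Looks at no more than the first two positions, regardless of length.
template <typename Container>
bool hasSingleElement(const Container &c) {
  auto b = std::begin(c), e = std::end(c);
  return b != e && std::next(b) == e;
}

// Contrast: counting a forward-only list visits every node.
template <typename Container>
bool countIsOne(const Container &c) {
  return std::distance(std::begin(c), std::end(c)) == 1;
}
```

Both functions return the same answer; they differ only in how much of the container they traverse.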

Depends on #131508.
2025-03-17 07:43:18 +01:00
Sergio Afonso
72b8744aa5
[MLIR][OpenMP] Reduce overhead of target compilation (#130945)
This patch avoids calling `TargetOp::getInnermostCapturedOmpOp` multiple
times during initialization of default and runtime target attributes in
MLIR to LLVM IR translation of `omp.target` operations. This is a
potentially expensive operation, so this change should help keep compile
times lower.
2025-03-14 15:18:32 +00:00
Michael Klemm
28ffa7f6a4
[flang][OpenMP] Fix missing missing inode issue (#130798)
When outlining an offload region, Flang creates a unique name by
querying an inode ID. However, when the name of the actual source file
does not match the logical file in a `#line` preprocessor directive,
code-gen was failing as it could not determine the inode ID. This PR
checks for this condition and if the logical file name does not exist,
the inode is replaced with a hash value created from the source code
itself.
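
The fallback can be sketched as follows, assuming a POSIX system. `uniqueFileToken`, the `inode_`/`hash_` prefixes and the use of `std::hash` are all hypothetical choices for illustration; the source does not say which hash function or name format the compiler actually uses.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <sys/stat.h>

// Build a token for naming an outlined region (hypothetical helper).
// Prefer the inode of the logical file; if that file does not exist
// (e.g. it only appears in a `#line` directive), hash the source text.
std::string uniqueFileToken(const std::string &logicalFile,
                            const std::string &sourceText) {
  struct stat st;
  if (::stat(logicalFile.c_str(), &st) == 0)
    return "inode_" +
           std::to_string(static_cast<unsigned long long>(st.st_ino));

  // std::hash is a stand-in; its values are not stable across toolchains.
  return "hash_" + std::to_string(static_cast<unsigned long long>(
                       std::hash<std::string>{}(sourceText)));
}
```

Hashing the source text keeps the token deterministic for a given input within one toolchain, even when no file on disk matches the logical name.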
2025-03-13 15:50:37 +01:00
Sergio Afonso
6ff33edf4d
[MLIR][OpenMP] Minor improvements to BlockArgOpenMPOpInterface, NFC (#130789)
This patch introduces a use for the new `getBlockArgsPairs` to avoid
having to manually list each applicable clause.

Also, the `numClauseBlockArgs()` function is introduced, which
simplifies the implementation of the interface's verifier and enables
better memory handling within `getBlockArgsPairs`.
2025-03-13 14:48:19 +00:00
Krzysztof Parzyszek
d67947162f
[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)
The HAS_DEVICE_ADDR clause indicates that the object(s) listed exist at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.

When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.

Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).

---------

Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-03-10 08:11:01 -05:00
Tom Eccles
ca1833b91e
[mlir][OpenMP] cast address space of private variables (#130301)
Fixes #130159

The problem is that the alloca created for the private variable uses the
default alloca address space in that module, but the function the
pointer is being passed to expects a different address space, leading to
a type mismatch in the function argument.

I know nothing about how AMDGPU is supposed to work. I based this
solution on code from createDeviceArgumentAccessor(). Please could
somebody from AMD confirm this solution is appropriate.
2025-03-07 18:30:57 +00:00