382 Commits

Author SHA1 Message Date
Ferran Toda
f560e4cfb1
[MLIR][OpenMP] Add omp.fuse operation (#168898)
This patch is a follow-up to #161213 and adds the omp.fuse loop
transformation for the OpenMP dialect. It is used for lowering a `!$omp fuse`
in Flang.

Added lowering and end-to-end tests.
2026-02-17 15:34:27 +01:00
Abid Qadeer
deedc7bfe3
[Flang][OpenMP] Don't generate code for unreachable target regions. (#178937)
When a target region is placed inside a constant false condition (e.g.,
`if (.false.)`), the dead code gets eliminated on the host side,
removing the `omp.target` operation entirely. However, the device-side
compilation pipeline is unaware of this elimination and attempts to
generate kernel code. Since the host never created offload metadata for
the eliminated target, the device-side kernel function lacks the
"kernel" attribute, causing `OpenMPOpt` to fail with an assertion when
it expects all outlined kernels to have this attribute. The problem can
be seen with the following code:

```fortran
program cele
  implicit none
  real :: V
  integer :: i
  if (.false.) then
    !$omp target teams distribute parallel do
    do i = 1, 5
      V = V * 2
    end do
    !$omp end target teams distribute parallel do
  end if
end program
```

It currently fails with the following assertion:

```
Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291
```

This PR adds `DeleteUnreachableTargetsPass` that identifies `omp.target`
operations in unreachable code blocks and removes them.
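The idea behind such a pass can be sketched as a reachability walk over the CFG: every block not reachable from the entry block loses its `omp.target` ops. This is a minimal model under stated assumptions. Blocks are plain strings, `successors` and `ops_by_block` are hypothetical stand-ins for MLIR blocks and operations, and the real pass may rely on MLIR's own region and dominance utilities instead.

```python
from collections import deque


def reachable_blocks(entry, successors):
    """Return the set of blocks reachable from `entry` (breadth-first walk)."""
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in successors.get(block, []):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def delete_unreachable_targets(entry, successors, ops_by_block):
    """Drop 'omp.target' ops from every block unreachable from `entry`.

    Other ops are left alone; only target regions are removed, so the
    device pass never sees a kernel the host did not register.
    """
    live = reachable_blocks(entry, successors)
    return {
        block: [op for op in ops if block in live or op != "omp.target"]
        for block, ops in ops_by_block.items()
    }
```

In this model, the `if (.false.)` branch from the example corresponds to a block that has lost all of its predecessors once the constant condition is folded.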
2026-02-16 09:31:42 +00:00
Aiden Grossman
6e6f76026d
[MLIR][OpenMP] Fix unused variable warning
7c07cb6542a0c5e4340e09a9a247e3e5123c6567 introduced a variable created
in an if statement that is only used in an assertion. Per the coding
guidelines, mark it [[maybe_unused]].
2026-02-10 20:40:29 +00:00
Jack Styles
8949c6d86b
[MLIR][OpenMP] Add Taskloop Collapse Support (#175924)
Following work completed in #174386 and #174623, this patch adds support
for the collapse clause to Taskloop. Collapse allows the user to combine
multiple nested loops into a single loop. For this to work with Taskloop,
some changes are needed to how we process the loops and the tasks that
run them.

This patch brings Taskloop support in MLIR and Flang up to OpenMP 4.5.
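Collapsing generally works by iterating over the product of the trip counts and recovering each original loop index from the single logical iteration number. Once the loops are collapsed, Taskloop only needs to split one flat range. The sketch below assumes loops already normalized to zero-based, unit-step form. It illustrates the general technique and is not this patch's code. The helper names are hypothetical.

```python
def collapsed_trip_count(trip_counts):
    """Trip count of the single loop produced by collapsing the nest."""
    total = 1
    for tc in trip_counts:
        total *= tc
    return total


def delinearize(k, trip_counts):
    """Map collapsed iteration number k to per-loop indices (outermost first).

    Assumes every loop has been normalized to start at 0 with step 1.
    """
    indices = []
    for tc in reversed(trip_counts):  # innermost loop varies fastest
        indices.append(k % tc)
        k //= tc
    return list(reversed(indices))
```

For example, a 2 x 3 nest collapses to 6 iterations, and iteration 4 maps back to indices `[1, 1]`.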
2026-02-05 08:59:00 +00:00
Chi-Chun, Chen
36dadddd74
[Flang][mlir][OpenMP] Add affinity clause to omp.task and Flang lowering (#179003)
- Add MLIR OpenMP affinity clause
- Lower flang task affinity to mlir
- Emit TODO for iterator modifier and update negative test
2026-02-04 10:30:35 -06:00
Akash Banerjee
7c07cb6542
[MLIR][OpenMP] Fix recursive mapper emission. (#178453)
Recursive types can cause re-entrant mapper emission. The mapper
function is created by OpenMPIRBuilder before the callbacks run, so it
may already exist in the LLVM module even though it is not yet
registered in the ModuleTranslation mapping table. Reuse and register it
to break the recursion. Added offloading test.
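The fix follows a familiar pattern for emitting code for recursive types: if a function object already exists but is not yet registered, register it and return it instead of emitting it again. This toy model is hypothetical. `module` stands in for the LLVM module, `registered` stands in for the ModuleTranslation mapping table, and `fields` lists each type's member types. The real code works on `llvm::Function` objects created by OpenMPIRBuilder.

```python
class MapperEmitter:
    """Toy model of mapper emission for possibly recursive types."""

    def __init__(self, fields):
        self.fields = fields      # type name -> member type names
        self.module = {}          # functions that exist in the "LLVM module"
        self.registered = {}      # the translation-time mapping table

    def emit(self, ty):
        if ty in self.registered:
            return self.registered[ty]
        name = f"mapper.{ty}"
        if name in self.module:
            # Re-entrant call: the function was created before the member
            # callbacks ran but is not registered yet. Reuse and register it
            # to break the recursion.
            self.registered[ty] = name
            return name
        self.module[name] = []    # created up front, like OpenMPIRBuilder does
        for member in self.fields.get(ty, []):
            self.module[name].append(self.emit(member))
        self.registered[ty] = name
        return name
```

For a self-referential `node` type, emission terminates and produces exactly one mapper for `node`, which calls itself.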
2026-01-29 16:38:33 +00:00
Walter Lee
b1f845df32
[MLIR][OpenMP] Fix unused variable warning for #137201 (#178659)
Fixes 4cc80831ea5d39c186fc29692556b762ffb6478b.
2026-01-29 14:14:59 +00:00
Sergio Afonso
4cc80831ea
[MLIR][OpenMP] Simplify OpenMP device codegen (#137201)
After removing host operations from the device MLIR module, it is no
longer necessary to provide special codegen logic to prevent these
operations from causing compiler crashes or miscompilations.

This patch removes these now unnecessary code paths to simplify codegen
logic. Some MLIR tests are now replaced with Flang tests, since the
responsibility of dealing with host operations has been moved earlier in
the compilation flow.

MLIR tests holding target device modules are updated to no longer
include now unsupported host operations.
2026-01-29 12:44:40 +00:00
Jakub Kuderski
59e44799bd
[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487)
Pre-committing this before landing the new check in
https://github.com/llvm/llvm-project/pull/177892
2026-01-28 19:13:47 +00:00
Akash Banerjee
c856c3d045
[MLIR][OpenMP] Fix mapper being attached to partial maps. (#178247)
Fix OpenMP mapper lowering by attaching user-defined/default mappers
only to the base parent entry, not combined/segment entries. This
prevents mapper calls with partial sizes. Added relevant tests.
2026-01-28 18:35:03 +00:00
Chaitanya
55f0ed91ef
[OpenMP][MLIR] Add thread_limit with dims modifier support (#171825)
This PR adds support for the OpenMP 6.1 feature `thread_limit` with the `dims` modifier.
LLVM IR translation for `thread_limit` with the `dims` modifier is marked as NYI.
2026-01-27 18:16:48 +05:30
Chaitanya
08654adc62
[OpenMP][MLIR] Add num_threads clause with dims modifier support (#171767)
This PR adds support for the OpenMP 6.1 feature `num_threads` with the `dims` modifier.
LLVM IR translation for `num_threads` with the `dims` modifier is marked as NYI.
2026-01-27 15:30:55 +05:30
Chaitanya
3aaeace4e2
[OpenMP][MLIR] Add num_teams clause with dims modifier support (#169883)
This PR adds support for the OpenMP 6.1 feature `num_teams` with the `dims` modifier.
LLVM IR translation for `num_teams` with the `dims` modifier is marked as NYI.
2026-01-27 10:55:40 +05:30
Jason Van Beusekom
0bdbf01e4e
[OpenMP][Flang][MLIR] Skip trip count calculation when bounds are null (#176469)
Fixes a segfault when trip count values are null by skipping trip count
calculation when we cannot determine if it is safe to hoist out the
values.
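The guard can be pictured as a trip-count helper that gives up instead of dereferencing a missing bound. This is a sketch under the assumption that `None` models a null bound value. It uses the Fortran DO trip-count formula `max((ub - lb + step) / step, 0)`, and the helper name is hypothetical rather than the actual OpenMPIRBuilder code.

```python
def trip_count(lb, ub, step):
    """Trip count of an inclusive loop `do i = lb, ub, step`.

    Returns None when any bound is unknown (None), mirroring the fix of
    skipping the calculation rather than using a null value.
    """
    if lb is None or ub is None or step is None:
        return None
    if step == 0:
        raise ValueError("step must be non-zero")
    # Floor vs. truncating division only differ for negative quotients,
    # which max(..., 0) clamps to zero either way.
    return max((ub - lb + step) // step, 0)
```

When the helper returns `None`, the caller would leave the trip count unset and fall back to computing it on the device, rather than hoisting a possibly wrong value to the host.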

Of note, I originally tried to modify `extractOnlyOmpNestedDir` to return
the first OpenMPConstruct directive, skipping over any earlier
directives (i.e. stores), which did work for the generic test case below:

```fortran
program minimal_repro
  implicit none

  integer :: i, m
  integer :: res(10) = 0

!$omp target teams map(from:m,res) private(m)
  m = 5
!$omp distribute parallel do
  do i = 1, 10
    res(i) = 5 + i
  end do
!$omp end distribute parallel do
!$omp end target teams

end program minimal_repro
```

However, that led to incorrect output in the following test case, as the trip
count was hoisted out and calculated using m(1000000) instead of m(1):
```fortran
program minimal_repro
  implicit none

  integer :: i, x
  integer :: m(1) = 0
  integer :: res(10) = 0
  m(1) = 10
  x = 1000000
!$omp target teams map(res)
  x = 1
!$omp distribute parallel do
  do i = 1, m(x)
    res(i) = 5 + i
  end do
!$omp end distribute parallel do
!$omp end target teams

print *, "Test completed successfully m =", m, " res=", res

end program minimal_repro
```
This leads to a segfault, because the loop bounds are calculated using
m(1000000):
```mlir
    %c1000000_i32 = arith.constant 1000000 : i32
    hlfir.assign %c1000000_i32 to %10#0 : i32, !fir.ref<i32>
    %c1_i32 = arith.constant 1 : i32
    %12 = fir.load %10#0 : !fir.ref<i32>
    %13 = fir.convert %12 : (i32) -> i64
    %14 = hlfir.designate %5#0 (%13)  : (!fir.ref<!fir.array<1xi32>>, i64) -> !fir.ref<i32>
    %15 = fir.load %14 : !fir.ref<i32>
    ...
    omp.target host_eval(%c1_i32 -> %arg0, %15 -> %arg1, %c1_i32_1 -> %arg2 : i32, i32, i32) map_entries(%18 -> %arg3, %19 -> %arg4, %20 -> %arg5, %23 -> %arg6 : !fir.ref<!fir.array<10xi32>>, !fir.ref<i32>, !fir.ref<i32>, !fir.ref<!fir.array<1xi32>>) {
      ...
      omp.teams {
              ...
              omp.loop_nest (%arg8) : i32 = (%arg0) to (%arg1) inclusive step (%arg2) {
 
```

The WIP commit for this change is here:
beafeae396

Allowing hoisting in these cases would require some sort of smarter
hoisting analysis, but for now this PR just fixes the bug.

Fixes: #176030
2026-01-21 11:56:36 +00:00
Michael Klemm
9f19d1895d
[OpenMP] Fix truncation/extension bug when calling __kmpc_push_num_teams (#173067)
This PR fixes a bug when the lower and upper bound for the number of
teams was not an `int32`, but a different type. In this case, an
internal compiler would trigger due to a mismatching call to
`__kmpc_push_num_teams`.
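The fix amounts to casting each bound to the runtime's 32-bit integer type before the call: wider values are truncated and narrower values are sign- or zero-extended. Python integers have no fixed width, so this sketch models widths explicitly. The helper is hypothetical, and it assumes that the runtime entry point takes 32-bit integers, as the mismatch described above implies.

```python
def to_i32(value, bit_width, signed=True):
    """Cast an integer held in `bit_width` bits to a signed 32-bit value.

    Wider inputs are truncated to their low 32 bits; narrower inputs are
    sign-extended (signed=True) or zero-extended (signed=False).
    """
    raw = value & ((1 << bit_width) - 1)      # bit pattern in the source width
    if signed and raw >> (bit_width - 1):
        raw -= 1 << bit_width                 # reinterpret as negative
    low = raw & 0xFFFFFFFF                    # truncate or extend to 32 bits
    return low - (1 << 32) if low >> 31 else low
```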
2026-01-19 11:20:11 +01:00
Austin Jiang
e6cdfb75ac
Fix typos and spelling errors across codebase (#156270)
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2026-01-13 11:52:46 -05:00
Tom Eccles
804aa88317
[MLIR][OpenMP] Support cancel taskgroup inside of taskloop (#174815)
Implementation follows exactly what is done for omp.wsloop and omp.task.
See #137841.

The change to the operation verifier is to allow a taskgroup
cancellation point inside of a taskloop. This was already allowed for
omp.cancel.
2026-01-09 11:43:54 +00:00
Tom Eccles
ddb706bbb0
[mlir][OpenMP] Don't allocate task context structure if not needed (#174588)
Don't allocate a task context structure if none of the private variables
needed it. This was already skipped when there were no private variables
at all.
2026-01-09 10:49:06 +00:00
Jack Styles
b7c17ab957
[MLIR][OpenMP] Add Initial Taskloop Clause Support (#174623)
Following on from the work to implement MLIR -> LLVM IR Translation for
Taskloop, this adds support for the following clauses to be used
alongside taskloop:
- if
- grainsize
- num_tasks
- untied
- nogroup
- final
- mergeable
- priority

These clauses map directly onto the relevant OpenMP runtime functions,
so their values only needed to be collected from the relevant location
and passed through to the appropriate runtime function.

Remaining clauses retain their TODO message as they have not yet been
implemented.
2026-01-09 10:34:03 +00:00
Tom Eccles
cc1bb845da
[mlir][OpenMP] Fix sanitizer error in buildTaskLikeBodyGenCallback (#174983)
This is a fix for the asan bot after
https://github.com/llvm/llvm-project/pull/174386

Failing bot: https://lab.llvm.org/buildbot/#/builders/24/builds/16371

This commit undoes a simplification that I thought reduced copy-pasted
code. I will merge it like this now to unblock the bot, and then work
separately on a different way to share code between both callbacks.
2026-01-08 14:41:40 +00:00
Tom Eccles
1af1cc21c8
[mlir][OpenMP] Translation support for taskloop construct (#174386)
This PR replaces #166903

This implements translation for taskloop, along with DSA clauses. Other
clauses will follow immediately after this is merged.

This patch was collaborative work by myself, @kaviya2510, and
@Stylie777. I’ve left the commits unsquashed to make authorship clear.
My only changes to the other authors' commits are to rebase and run
clang-format.

The taskloop implementation in the runtime works roughly like this: if
the number of loop iterations to perform is more than some threshold,
the current task is duplicated and each resulting task gets half of the
loop range. This continues recursively until each task has a small
enough loop range to run itself in a single thread.
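The recursive splitting described above can be modelled as repeated halving of a half-open iteration range until every piece is at or below the threshold. This sketch covers only the range arithmetic. The real runtime uses its own heuristics, and on each duplication it also copies the (first)private data, which this model does not do. The function name is hypothetical.

```python
def split_taskloop(lb, ub, threshold):
    """Return the [lb, ub) ranges the final tasks would run after the
    runtime-style recursive halving, given a threshold of at least 1."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if ub - lb <= threshold:
        return [(lb, ub)]
    mid = lb + (ub - lb) // 2
    # One half goes to the duplicated task, the other stays with the original.
    return split_taskloop(lb, mid, threshold) + split_taskloop(mid, ub, threshold)
```

Because each task's bounds change on every split, the bounds must sit at a fixed, runtime-visible location in the outlined task structure, which is the first complexity listed below.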

This leads to two implementation complexities:
- The runtime needs to be able to update the loop bounds used when
executing the loop inside of the task. This has been implemented by
forcing them to always have a fixed location inside of the structure
produced when outlining the task.
- When a task is duplicated, all data stored for the task’s
(first)private variables needs to also be duplicated and appropriate
constructors run. This is handled by a task duplication function invoked
by the runtime.

With regard to testing, most existing tests in the gfortran and Fujitsu
test suites require the reduction clause (not part of OpenMP 4.5). I
wrote some tests of my own and was satisfied that it seems to be
working.

Co-authored-by: Kaviya Rajendiran <kaviyara2000@gmail.com>
Co-authored-by: Jack Styles <jack.styles@arm.com>

2026-01-08 11:08:13 +00:00
Chi-Chun, Chen
5fb43838af
[mlir][OpenMP] Lower device clause for target data/enter/exit/update (#174665)
Extend OpenMP device clause lowering for target data, target enter data,
target exit data, and target update to accept non-constant values.
Previously, only constant device IDs could be lowered to LLVM IR.

Add Flang tests to validate device clause handling and mark the feature
as supported in the OpenMPSupport documentation. New tests cover:
- target teams
- target teams distribute
- target teams distribute parallel do
- target teams distribute parallel do simd
- target data

Tests for target update and target enter/exit were
already present in Flang.
2026-01-07 11:19:14 -06:00
Tom Eccles
07d07be73d
[mlir][OpenMP] Fix infinite loop after #174105 (#174736)
2026-01-07 10:48:16 +00:00
Chi-Chun, Chen
3f5d91bfbc
[Flang][OpenMP] Implement device clause lowering for target directive (#173509)
Add lowering support for the OpenMP `device` clause on the `target`
directive in Flang.

The device expression is propagated through MLIR OpenMP and passed to
the host-side `__tgt_target_kernel` call.
2026-01-06 11:10:03 -06:00
Tom Eccles
188d13db20
[mlir][OpenMP] don't add compiler-generated barrier in single threaded code (#174105)
We add barriers to the firstprivate copy region when they are required
to avoid a race condition with the lastprivate clause.

The problem is that these barriers are added by the compiler rather than
implied by user code, so it is the compiler's responsibility to avoid
deadlock.

I came across a testcase whilst working on taskloop support that looks a
bit like this:
```
!$omp parallel
  !$omp single
    !$omp taskloop firstprivate(a) lastprivate(a)
      ...
  !$omp end single
!$omp end parallel
```

This is so that there are multiple threads for the generated tasks to be
distributed over, but we don't generate the tasks afresh in every
thread.

The problem comes when the taskloop requires a barrier to prevent the
data race between firstprivate and lastprivate. This barrier will then be
generated inside of SINGLE, so only one thread will encounter the
barrier, leading to a deadlock.

This patch works around the problem by detecting this situation
statically and then not generating the barrier. There are cases where we
cannot detect this statically (e.g. if the TASKLOOP is inside a function
call inside of SINGLE). The program will still deadlock in this case
after my patch. I'm unsure what the solution would be for that case. I
want to fix this simple case in LLVM 22 before engaging in a longer
discussion as to whether there is a better way to handle the more
general case.
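The static check can be pictured as a walk up the enclosing operations: if an `omp.single` is found before the nearest `omp.parallel`, only one thread of the team would reach a compiler-generated barrier, so the barrier is skipped. This is a hypothetical model. `ops` maps an op id to `(kind, parent_id)`, and stopping at the nearest `omp.parallel` is an assumption, not a detail taken from the patch. As noted above, such a walk cannot see through function calls.

```python
def barrier_is_safe(op, ops):
    """Return False if `op` sits inside an omp.single with no omp.parallel
    in between. `ops` maps op id -> (kind, parent_id); parent_id is None at
    the top of the function, where the walk has to stop."""
    _, current = ops[op]
    while current is not None:
        kind, parent_id = ops[current]
        if kind == "omp.single":
            return False   # only one thread of the team runs this region
        if kind == "omp.parallel":
            return True    # a (nested) team: all of its threads see the barrier
        current = parent_id
    return True
```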

I tested using wsloop because I want to land this (or not) independently
of taskloop. Note that for wsloop it would be up to the programmer to
remember to use the nowait clause, but nowait cannot be used to control
generation of this barrier because it refers to the barrier after the
construct, not the one after the firstprivate copy-in (before the
construct executes).
2026-01-06 10:22:41 +00:00
NimishMishra
11d9694b75
[flang][mlir] Add support for implicit linearization in omp.simd (#150386)
Up to OpenMP 4.5, the loop iteration variable in the associated
do-construct of simd is linear with a linear step equal to the
increment of the loop. This PR implements this functionality. For
versions newer than 4.5, such an implicit linear clause is not assumed
for the loop iteration variable.
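A linear variable with a step equal to the loop increment takes the value `lb + k * step` in logical iteration `k`. That lets each SIMD lane compute its own value directly instead of depending on the previous iteration. The helpers below are hypothetical and only illustrate this arithmetic. They are not the lowering itself.

```python
def linear_value(start, step, k):
    """Value of a linear list item in logical iteration k (0-based)."""
    return start + k * step


def simd_chunks(lb, step, trip_count, width):
    """Group iterations into SIMD chunks of `width` lanes; every lane derives
    its loop-variable value from (lb, step, k) with no loop-carried update."""
    values = [linear_value(lb, step, k) for k in range(trip_count)]
    return [values[i:i + width] for i in range(0, trip_count, width)]
```

For `do i = 1, 7, 2` processed two lanes at a time, this gives `[[1, 3], [5, 7]]`.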

Fixes https://github.com/llvm/llvm-project/issues/171006
2026-01-03 21:37:43 -08:00
Krish Gupta
c646d1bd7d
[MLIR][OpenMP] Fix type mismatch in linear clause for INTEGER(8) variables (#173982)
Fixes #173332 

The compiler was crashing when compiling OpenMP `parallel do simd` with
a `linear` clause on `INTEGER(8)` variables. The assertion failure
occurred during MLIR-to-LLVM translation:
`Cannot create binary operator with two operands of differing type!`

**Root Cause:**
The bug was in `LinearClauseProcessor::updateLinearVar()` where the step
value (i32) and induction variable were multiplied without normalizing
to the linear variable's type (i64), causing type mismatches in LLVM IR
generation.

**Solution:**
Updated the translation logic to cast both the induction variable and
step value to `linearVarTypes[index]` before performing arithmetic
operations. This ensures type consistency for both integer and
floating-point linear variables.
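The integer part of the fix can be modelled as follows: cast the induction variable and the step to the linear variable's width first, then multiply and add in that width. Python integers have no fixed width, so `wrap` emulates a two's-complement type of `bits` bits. This is a hypothetical sketch of the arithmetic only. The actual change inserts casts in LLVM IR, and the floating-point case is not modelled here.

```python
def wrap(value, bits):
    """Interpret `value` as a two's-complement integer of width `bits`."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def update_linear_var(start, iv, step, var_bits):
    """Compute start + iv * step entirely in the linear variable's width.

    Casting both operands to `var_bits` first mirrors the requirement that
    LLVM IR binary operators take operands of the same type.
    """
    iv_c = wrap(iv, var_bits)
    step_c = wrap(step, var_bits)
    return wrap(start + wrap(iv_c * step_c, var_bits), var_bits)
```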

**Testing:**
- Added integration test verifying successful compilation to LLVM IR
- Added lowering test for MLIR generation with various linear clause
forms
- Verified the exact reproducer from the issue now compiles without
errors
2026-01-02 11:52:33 +00:00
Akash Banerjee
b360a782ca
Reland "[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331)" (#170851)
Add support for OpenMP is_device_ptr clause for target directives.

[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367
This PR adds support for the OpenMP is_device_ptr clause in the
MLIR to LLVM IR translation for target regions. The is_device_ptr clause
allows device pointers (allocated via OpenMP runtime APIs) to be used
directly in target regions without implicit mapping.
2025-12-05 17:38:41 +00:00
NimishMishra
290b32a699
[llvm][mlir][OpenMP] Support translation for linear clause in omp.wsloop and omp.simd (#139386)
This patch adds support for LLVM translation of the linear clause on
omp.wsloop (except for linear modifiers).
2025-12-04 20:39:17 -08:00
theRonShark
be79a0d90f
Revert "[Flang][OpenMP] Add lowering support for is_device_ptr clause" (#170778)
Reverts llvm/llvm-project#169331
2025-12-04 19:38:16 -05:00
Akash Banerjee
a77c4948a5
[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331)
Add support for OpenMP is_device_ptr clause for target directives.

[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367
This PR adds support for the OpenMP is_device_ptr clause in the MLIR to
LLVM IR translation for target regions. The is_device_ptr clause allows
device pointers (allocated via OpenMP runtime APIs) to be used directly in
target regions without implicit mapping.
2025-12-04 15:57:24 +00:00
Mehdi Amini
4c09e45f1d
[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in OpenMPToLLVMIRTranslation.cpp (NFC)
2025-12-03 07:01:47 -08:00
Tom Eccles
8ec2112ec8
[OMPIRBuilder] re-land cancel barriers patch #164586 (#169931)
A barrier will pause execution until all threads reach it. If some
threads go to a different barrier, then we deadlock. This manifests as a
requirement that the finalization callback must only be run once. Fix by
ensuring we always go through the same finalization block whether the
thread is cancelled or not, and no matter which cancellation point
causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.
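The invariant behind the new layout is that every thread, cancelled or not, leaves through the same `fini` block, and that block holds the only barrier. In this sketch, Python's `threading.Barrier` stands in for `__kmpc_cancel_barrier`. The timeout is there so that a layout mistake raises `BrokenBarrierError` instead of hanging. This is a model of the control flow, not the generated IR.

```python
import threading


def run_parallel_region(num_threads, cancelled):
    """Simulate the fixed layout; `cancelled` holds ids of threads that take
    the %cancel branch. Returns the (tid, path) pairs that passed the barrier."""
    barrier = threading.Barrier(num_threads, timeout=5)
    log = []
    lock = threading.Lock()

    def body(tid):
        # %cancel and %continue both fall through to the shared fini block.
        path = "cancel" if tid in cancelled else "continue"
        # fini: destructors would run here, then the single exit barrier.
        barrier.wait()
        with lock:
            log.append((tid, path))

    threads = [threading.Thread(target=body, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(log)
```

If only the cancelled threads called `barrier.wait()`, as in the old layout, the barrier would never fill and those threads would time out.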

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.

---

This re-lands #164586 with a small fix for a failing buildbot running
address sanitizer on clang lit tests.

In the previous version of the patch I added an insertion point guard
"just to be safe" and never removed it. There isn't insertion point
guarding on the other route out of this function and we do not
preserve the insertion point around getFiniBB either so it is not
needed here.

The problem flagged by the sanitizers was because the saved insertion
point pointed to an instruction which was then removed inside the FiniCB
for some clang codegen functions. The instruction was freed when it was
removed. Then accessing it to restore the insertion point was a
use-after-free bug.
2025-12-01 10:07:19 +00:00
Tom Eccles
58fa7e4ccd
Revert "[OMPIRBuilder] always leave PARALLEL via the same barrier" (#169829)
Reverts llvm/llvm-project#164586

Reverting due to buildbot failure:
https://lab.llvm.org/buildbot/#/builders/169/builds/17519
2025-11-27 16:19:52 +00:00
Jack Styles
47ae3eaa29
[MLIR][OpenMP] Add MLIR Lowering Support for dist_schedule (#152736)
`dist_schedule` was previously supported in Flang/Clang but was not
implemented in MLIR; instead, a user would get a "not yet implemented"
error. This patch adds support for lowering the `dist_schedule` clause
to LLVM IR when used in an `omp.distribute` or `omp.wsloop` section.

Some rework was required to ensure that MLIR/LLVM emits the correct
schedule type for the clause, as it uses a different schedule type from
other OpenMP directives/clauses in the runtime library.

This patch also ensures that when using dist_schedule or a chunked
schedule clause, the correct LLVM loop parallel-accesses metadata is
added.
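For reference, `dist_schedule(static, chunk)` hands out chunks of `chunk` consecutive iterations to teams in round-robin order. The sketch below computes that assignment to illustrate the OpenMP-defined distribution. It does not reflect how the runtime implements it, and the function name is hypothetical.

```python
def dist_schedule_static(trip_count, num_teams, chunk):
    """Map each team to the iterations it runs under
    dist_schedule(static, chunk): chunks go to teams round-robin."""
    owner = {}
    for i in range(trip_count):
        owner.setdefault((i // chunk) % num_teams, []).append(i)
    return owner
```

With 10 iterations, 2 teams and a chunk size of 3, team 0 runs iterations 0-2 and 6-8, and team 1 runs iterations 3-5 and 9.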
2025-11-27 14:16:44 +00:00
Tom Eccles
0e5633fcd9
[OMPIRBuilder] always leave PARALLEL via the same barrier (#164586)
A barrier will pause execution until all threads reach it. If some
threads go to a different barrier, then we deadlock. This manifests as a
requirement that the finalization callback must only be run once. Fix by
ensuring we always go through the same finalization block whether the
thread is cancelled or not, and no matter which cancellation point
causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.
2025-11-27 14:13:25 +00:00
Kareem Ergawy
f481f5bef9
[OpenMP][flang] Add initial support for by-ref reductions on the GPU (#165714)
Adds initial support for GPU by-ref reductions. The main problem for
reduction by reference is that, prior to this PR, we were shuffling
(from remote lanes within the same warp or across different warps within
the block) pointers/references to the private reduction values rather
than the private reduction values themselves.

In particular, this diff adds support for reductions on scalar
allocatables where reductions happen on loops nested in `target`
regions. For example:

```fortran
  integer :: i
  real, allocatable :: scalar_alloc

  allocate(scalar_alloc)
  scalar_alloc = 0

  !$omp target map(tofrom: scalar_alloc)
  !$omp parallel do reduction(+: scalar_alloc)
  do i = 1, 1000000
    scalar_alloc = scalar_alloc + 1
  end do
  !$omp end target
```

This PR supports by-ref reductions on the intra- and inter-warp levels.
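An intra-warp reduction is typically a shuffle-down tree: at each step, lane `i` combines its value with the value held by lane `i + offset`. This sketch models that tree on a list of lane values. It shows why values, not references, must move between lanes: a pointer into another lane's private storage would generally not be valid in the receiving lane, so a by-ref reduction has to load the value before shuffling it. The function is hypothetical and assumes a power-of-two lane count.

```python
def shuffle_down_reduce(lane_values, combine):
    """Tree-reduce one value per lane the way a shuffle-down reduction does."""
    vals = list(lane_values)
    width = len(vals)
    assert width > 0 and width & (width - 1) == 0, "lane count must be a power of two"
    offset = width // 2
    while offset:
        # Lanes below `offset` read from lanes at or above it; those source
        # lanes are not written in this step, matching a simultaneous shuffle.
        for i in range(offset):
            vals[i] = combine(vals[i], vals[i + offset])
        offset //= 2
    return vals[0]
```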

There are still steps to be taken for full support of by-ref
reductions, for example:
* Inter-block value combination is still not supported.
Therefore, `target teams distribute parallel do` is still not supported.
* Support for dynamically-sized arrays still needs to be added.
* Support for more than one allocatable/array on the same `reduction`
clause still needs to be added.
2025-11-26 11:59:22 +01:00
Aiden Grossman
51dd3ec13c
[MLIR][OpenMP] Bail early in sortMapIndices if indices are the same (#169474)
If we are given the same index in the comparator callback, simply return
false. Otherwise we will end up adding invalid items to
occludedChildren, causing extra items to get removed that should not be,
resulting in failures that manifest in different forms (assertions, asan
failures, ubsan failures, etc.).
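The bug illustrates the irreflexivity requirement of a strict weak ordering: `less(a, a)` must be `False`. In the hypothetical model below, a comparator over member index paths records the longer of two prefix-related paths as an occluded child. Because a path is trivially a prefix of itself, omitting the early bail would mark a path as occluded by itself. The comparator's exact ordering is an assumption and is not taken from `sortMapIndices`.

```python
def make_index_less(occluded):
    """Build a less-than comparator over member index paths (tuples).

    Prefix-related paths sort parent first, and the child is recorded in
    `occluded`; unrelated paths compare lexicographically.
    """
    def less(a, b):
        if a == b:
            return False  # bail early: irreflexive, and no side effects
        n = min(len(a), len(b))
        if a[:n] == b[:n]:
            # One path is a prefix of the other: the parent occludes the child.
            occluded.add(b if len(a) < len(b) else a)
            return len(a) < len(b)
        return a[:n] < b[:n]
    return less
```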
2025-11-25 06:23:12 -05:00
Jan Leyonberg
3e86f05621
[OpenMP][flang] Lowering of OpenMP custom reductions to MLIR (#168417)
This patch adds support for lowering of custom reductions to MLIR. It
also enhances the capability of the pass to automatically mark functions
as "declare target" by traversing custom reduction initializers and
combiners.
2025-11-24 16:00:46 -05:00
agozillon
173600880b
[Flang][OpenMP][MLIR] Initial declare target to for variables implementation (#119589)
While the infrastructure for declare target to/enter and link for
variables exists in the MLIR dialect and at the Flang level, the
MLIR -> LLVM IR lowering is currently only in place for variables that
have the link clause applied.

This PR aims to extend that lowering to an initial implementation that
incorporates declare target to as well, which primarily requires changes
in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the
OpenMP dialect was required to extend the declare target enumerator to
include a default None field as well.

This also requires a minor change to the Flang lowering's
MapInfoFinalization.cpp pass to alter the map type for descriptors to
deal with cases where a variable is marked declare target to. Currently,
when a descriptor variable is mapped with declare target to, the
descriptor component can become attached and cannot be updated, which
results in issues when an unusual allocation range is specified
(effectively an off-by-X error). The current solution is to always map
the descriptor, as we always require an up-to-date version of this data.
However, this also requires an interlinked PR that adds a more intricate
type of mapping of structures/record types, which Clang currently
implements, to circumvent the overwriting of the pointer in the
descriptor.

3/3 required PRs to enable declare target to mapping, this PR should
pass all tests and provide an all green CI.

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-24 21:22:49 +01:00
agozillon
20929abb85
[MLIR][OpenMP] Introduce overlapped record type map support (#119588)
This PR introduces a new additional type of map lowering for record
types that Clang currently supports, in which a user can map a top-level
record type and then individual members with different mapping,
effectively creating a sort of "overlapping" mapping that we attempt to
cut around.

This is currently used predominantly in Fortran when mapping
descriptors and their data: we map the descriptor and its data with
separate map modifiers and "cut around" the pointer data, so that we do
not overwrite it unless the runtime deems it a necessary action based on
its reference counting mechanism. However, this mechanism will also
come into play when a user explicitly maps a record type (derived
type or structure) and then explicitly maps a member with a different
map type.
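"Cutting around" members can be pictured as interval subtraction. The parent record's byte range is split into the pieces not covered by any explicitly mapped member. Those pieces get the parent's map type, and each member keeps its own. Byte offsets and the function name here are hypothetical. The actual translation emits map entries rather than tuples.

```python
def cut_around(parent, members):
    """Split the half-open byte range `parent` = (start, end) into the pieces
    not covered by any member range; members may overlap or be unsorted."""
    start, end = parent
    pieces = []
    cursor = start
    for m_start, m_end in sorted(members):
        if m_start > cursor:
            pieces.append((cursor, min(m_start, end)))
        cursor = max(cursor, m_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces
```

For a 32-byte record whose bytes 8-15 are mapped separately, the parent's map type applies to `(0, 8)` and `(16, 32)`.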

These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp
file and phase, however, one Flang test that checks end-to-end IR
compilation (as far as we care for now at least) was altered.

2/3 required PRs to enable declare target to mapping; look at PR
3/3 to check for full green passes (this one will fail a number of tests
due to some dependencies).

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-24 21:20:29 +01:00
agozillon
09318c6bff
[MLIR][OpenMP] Fix and simplify bounds offset calculation for 1-D GEP offsets (#165486)
Currently this is being calculated incorrectly and will result in
incorrect index offsets in more complicated array slices. This PR tries
to address it by refactoring and changing the calculation to be more
correct.
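For orientation, this is the textbook way to turn a multi-dimensional slice start into a single 1-D element offset for a column-major (Fortran-order) array: each dimension's lower bound is weighted by the product of the extents of the dimensions before it. The exact formula used by the patch is not given above, so treat this as a reference sketch. It assumes zero-based lower bounds and element units rather than bytes.

```python
def linear_offset(lower_bounds, extents):
    """Element offset of a slice's first element in a column-major array,
    given zero-based per-dimension lower bounds and the array extents."""
    offset = 0
    stride = 1
    for lb, extent in zip(lower_bounds, extents):
        offset += lb * stride
        stride *= extent  # the next dimension advances by a whole column/plane
    return offset
```

For a 4 x 5 array, the slice starting at zero-based `(1, 2)` begins at element `1 + 2 * 4 = 9`.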
2025-10-31 00:54:31 +01:00
Pranav Bhandarkar
e2ad554991
[Flang][mlir] - Translation of delayed privatization for deferred target-tasks (#155348)
This PR adds support for translation of the private clause on deferred
target tasks - that is `omp.target` operations with the `nowait` clause.

An offloading call for a deferred target-task is not blocking: the
offloading (target-generating) host task continues its execution after
issuing the offloading call. Therefore, the key problem we need to solve
is to ensure that the data needed to initialize private variables in the
target task persists even after the host task has completed.
We do this in a new pass called `PrepareForOMPOffloadPrivatizationPass`.
For a privatized variable that needs its host counterpart for
initialization (such as the shape of the data from the descriptor when
an allocatable is privatized or the value of the data when an
allocatable is firstprivatized),
- The pass allocates memory on the heap.
- It then initializes this memory by using the `init` and `copy` (for
firstprivate) regions of the corresponding `omp::PrivateClauseOp`.
- Finally, the memory allocated on the heap is freed using the `dealloc`
region of the same `omp::PrivateClauseOp` instance. This step is not
straightforward, though, because we cannot simply free the memory that's
going to be used by another thread without any synchronization. So, for
deallocation, we create an `omp.task` after the `omp.target` and
synchronize the two with a dummy dependency (using the `depend` clause).
In this newly created `omp.task` we do the deallocation.
2025-10-22 12:18:56 -05:00
agozillon
f2b20d3410
[Flang][OpenMP][Dialect] Swap to using MLIR dialect enum to encode map flags (#164043)
This PR shifts from using the LLVM OpenMP enumerator bit flags to an
OpenMP dialect specific enumerator. This allows us to better represent
map types in the dialect that wouldn't be of interest to the LLVM
backend and runtime.

Primarily things like
ref_ptr/ref_ptee/ref_ptr_ptee/attach_none/attach_always/attach_auto, which
are of interest to the compiler for certain transformations (primarily
in the FIR transformation passes dealing with mapping), but which the
runtime has no need to know about. It also means that if another OpenMP
implementation comes along, it won't need to adopt the same bit-flag
system LLVM chose or do extra work to translate to it.
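With a dialect-level enum, translation to LLVM IR becomes an explicit mapping step. Flags that the runtime understands are converted to its bit values, and compiler-only flags are dropped. The sketch uses hypothetical dialect flag names. The runtime bit values are illustrative assumptions, not a statement of the runtime ABI.

```python
from enum import IntFlag, auto


class DialectMapFlags(IntFlag):
    """Hypothetical dialect-level map flags."""
    TO = auto()
    FROM = auto()
    ALWAYS = auto()
    REF_PTR = auto()       # compiler-only: no runtime counterpart
    ATTACH_AUTO = auto()   # compiler-only: no runtime counterpart


# Illustrative runtime bit values for the flags that do reach the runtime.
RUNTIME_BITS = {
    DialectMapFlags.TO: 0x01,
    DialectMapFlags.FROM: 0x02,
    DialectMapFlags.ALWAYS: 0x04,
}


def to_runtime_flags(flags):
    """Translate dialect flags to runtime bits, dropping compiler-only ones."""
    bits = 0
    for flag, value in RUNTIME_BITS.items():
        if flags & flag:
            bits |= value
    return bits
```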
2025-10-21 21:54:25 +02:00
Mehdi Amini
936e03867f
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in OpenMPToLLVMIRTranslation.cpp (NFC)
2025-10-17 05:58:15 -07:00
Jakub Kuderski
8bab6c4e8c
[mlir] Simplify unreachable type switch cases. NFC. (#162032)
Use `DefaultUnreachable` from
https://github.com/llvm/llvm-project/pull/161970.
2025-10-06 09:23:25 -04:00
Michael Kruse
419594230f
[mlir][omp] Add omp.tile operation (#160292)
Add the `omp.tile` loop transformation to the OpenMP dialect. It is used
for lowering a standalone `!$omp tile` in Flang.
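Tiling a 2-D nest (as with `!$omp tile sizes(ti, tj)`) produces outer "floor" loops over tile origins and inner loops within each tile, with edge tiles clipped to the loop bounds. This sketch only reproduces the resulting iteration order. How `omp.tile` represents the generated loops in the dialect is not modelled, and the function name is hypothetical.

```python
def tiled_order_2d(n, m, tile_i, tile_j):
    """Iteration order of `for i in range(n): for j in range(m)` after
    tiling by (tile_i, tile_j); partial tiles at the edges are clipped."""
    order = []
    for ii in range(0, n, tile_i):            # floor loop over tile rows
        for jj in range(0, m, tile_j):        # floor loop over tile columns
            for i in range(ii, min(ii + tile_i, n)):      # intra-tile loops
                for j in range(jj, min(jj + tile_j, m)):
                    order.append((i, j))
    return order
```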
2025-10-02 17:12:14 +00:00
Jan Svoboda
c580ad488e
[clang] Use the VFS to create the OpenMP region entry ID (#160918)
This PR uses the VFS to create the OpenMP target entry instead of going
straight to the real file system. This matches the behavior of other
input files of the compiler.
2025-09-26 12:25:37 -07:00
Dominik Adamski
83ef38a274
[Flang][OpenMP] Enable no-loop kernels (#155818)
Enable the generation of no-loop kernels for Fortran OpenMP code.
`target teams distribute parallel do` constructs can be promoted to
no-loop kernels if the user adds the `-fopenmp-assume-teams-oversubscription`
and `-fopenmp-assume-threads-oversubscription` flags.
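The difference between the two kernel shapes can be sketched by comparing which iterations each thread runs. A conventional kernel loops with a stride equal to the total thread count. A no-loop kernel lets each thread run at most its own iteration, which is only correct when there are at least as many threads as iterations. That is the assumption the oversubscription flags let the compiler make. Here `tid` is a hypothetical global thread id across all teams.

```python
def grid_stride_iterations(tid, num_threads, trip_count):
    """Iterations thread `tid` runs in a kernel that loops over the range,
    striding by the total number of threads."""
    return list(range(tid, trip_count, num_threads))


def no_loop_iteration(tid, trip_count):
    """Iteration thread `tid` runs in a no-loop kernel: at most one.
    Only equivalent to the loop form when num_threads >= trip_count."""
    return [tid] if tid < trip_count else []
```

With 8 threads and 6 iterations, both shapes do the same work. With 4 threads, thread 0 would skip iteration 4 in the no-loop form, which is why the promotion is opt-in.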

If the OpenMP kernel contains reduction or num_teams clauses, it is not
promoted to no-loop mode.

The global OpenMP device RTL oversubscription flags no longer force
no-loop code generation for Fortran.
2025-09-26 13:57:51 +02:00
Akash Banerjee
8afea0d0ea
[OpenMP][MLIR] Preserve to/from flags in mapper base entry for mappers (#159799)
With declare mapper, the parent base entry was emitted as `TARGET_PARAM`
only. The mapper received a map-type without `to/from`, causing
components to degrade to `alloc`-only (no copies), breaking allocatable
payload mapping. This PR preserves the map-type bits from the parent.

This fixes #156466.
2025-09-19 19:34:09 +01:00