394 Commits

Author SHA1 Message Date
Chi-Chun, Chen
7ff0dc4b9f
[mlir][OpenMP] Add iterator support to depend clause (#189090)
Extend the depend clause to support `!omp.iterated<Ty>` handles
alongside plain depend vars, so the IR can represent both forms.

Assisted with copilot

This is part of feature work for
https://github.com/llvm/llvm-project/issues/188061
2026-03-31 11:11:08 -05:00
Leandro Lupori
a30a8e9474
Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#183794)" (#188851)
Linear iteration variables were being treated as private. This fixes
one of the issues reported in #170784.

The regression reported in #188536 occurred because
LinearClauseProcessor was rewriting all basic blocks whose names
contained a given substring, including those that were not part of the
translated SIMD region.
This didn't cause problems before because linear variables were always
privatized, which doesn't happen with this change.
The issue is fixed by rewriting only the basic blocks that correspond to
the omp.simd operation.
2026-03-31 09:36:08 -03:00
Alexis Engelke
7581430722
[IR] Require well-formed IR for BasicBlock::getTerminator (#189416)
BasicBlock::getTerminator() is frequently called on valid IR, yet the
function has to check that the last instruction is in fact a terminator,
even in release builds. This check can only be optimized away when the
instruction is dereferenced.

Therefore, introduce the functions hasTerminator() and
getTerminatorOrNull() as replacement and require (assert) that
getTerminator() always returns a valid terminator. As a side effect,
this forces explicit expression of intent at call sites when unfinished
basic blocks should be supported.
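
The split between the three accessors can be sketched with a toy model. `ToyInst` and `ToyBlock` below are hypothetical stand-ins, not the LLVM classes, and the pointer return types are an assumption made for illustration:

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the LLVM classes.
struct ToyInst {
  bool isTerminator;
};

struct ToyBlock {
  std::vector<ToyInst> insts;

  // True if the block is non-empty and its last instruction is a terminator.
  bool hasTerminator() const {
    return !insts.empty() && insts.back().isTerminator;
  }

  // For call sites that must tolerate unfinished blocks.
  const ToyInst *getTerminatorOrNull() const {
    return hasTerminator() ? &insts.back() : nullptr;
  }

  // For call sites that assume well-formed IR: the check is assert-only,
  // so release builds do not pay for it.
  const ToyInst *getTerminator() const {
    assert(hasTerminator() && "block is not terminated");
    return &insts.back();
  }
};
```

A call site that may see a half-built block would use `getTerminatorOrNull()` and test the result, which makes that intent explicit.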
2026-03-30 18:57:37 +02:00
Alexis Engelke
4c745df8bc
[MLIR][LLVMIR][NFC] Drop uses of BranchInst (#187304) 2026-03-18 15:55:10 +00:00
Chi-Chun, Chen
2ad51ffbfa
[mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir (#182223)
Translate affinity entries to LLVMIR by passing affinity information to
createTask (__kmpc_omp_reg_task_with_affinity is created inside
PostOutlineCB).

3/3 in stack for implementing affinity clause with iterator modifier
1/3 #182218
2/3 #182222
3/3 #182223
2026-03-16 10:16:38 -05:00
Joseph Huber
154a128c65 Reapply "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309)
Should be working downstream now
This reverts commit 9b61ff210fdff752d5db55b128474e9990258488.
2026-03-13 15:48:37 -05:00
theRonShark
9b61ff210f
Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309)
Reverts llvm/llvm-project#185989
2026-03-13 05:20:40 +00:00
Joseph Huber
4376fbd793
[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989)
Summary:
We use this `dyn_ptr` argument in Clang/OpenMP to handle the
`KernelLaunchEnvironment`. This is a per-kernel argument used to share
some information. Currently, it's prepended to the argument list and we
generate storage for it in the runtime.

This is bad for a few reasons:
1. It changes the ABI by shifting user arguments
2. It cannot trivially be left uninitialized if unused
3. The runtime must allocate its own memory for it

This PR changes it to be appended instead. Additionally, space for this
is always emitted. This means the OMPIRBuilder itself will provide the
storage, we simply need to populate it in the runtime if it is used.
This means that if it's unused we don't always pay the cost and it's
easier for non-OpenMP users to ignore it.

Backward compatibility is maintained by auto-upgrading the kernel
arguments. In `libomptarget` we completely allocate a new buffer to
store this in the new format. The plugins still need to respect the old
ABI of the called device object, so we simply rotate it if it's the old
version.
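
As a rough illustration of the "rotate" step, assuming the arguments are held as a flat list of pointers (the helper name `toOldLayout` is hypothetical and this is not the libomptarget code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: given kernel arguments in the new layout
// [user0, user1, ..., implicit], produce the old layout
// [implicit, user0, user1, ...] expected by an older device image.
inline std::vector<void *> toOldLayout(std::vector<void *> args) {
  // The new layout always carries a slot for the implicit argument.
  assert(!args.empty() && "expected at least the implicit argument");
  // Rotate right by one: the last element becomes the first.
  std::rotate(args.begin(), args.end() - 1, args.end());
  return args;
}
```

User arguments keep their relative order; only the implicit argument changes position.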
2026-03-12 18:08:22 -05:00
Sergio Afonso
8f668cec47
[MLIR][OpenMP] Prevent teams reductions from deadlocking (#184625)
Currently, simple Fortran reductions like the example below cause a
deadlock at runtime:

```f90
integer :: i, x

!$omp teams distribute reduction(+:x)
do i=1, 10
  x = x + 1
end do
```

Preventing a redundant barrier from being added in that case addresses
this issue. Synchronization is already being handled by the
`__kmpc_reduce` and `__kmpc_end_reduce` runtime calls for the host, and
by the OMPIRBuilder-generated `_omp_reduction_inter_warp_copy_func`
function for GPUs.
2026-03-09 15:26:50 +00:00
Sunil Shrestha
af2b6ed6aa
[flang][openmp] Add support for ordered regions in SIMD directives (#… (#183379)
Add support for ordered regions within SIMD directives (!$omp simd
ordered and !$omp do simd ordered). This initial implementation matches
Clang's behavior.

In SIMD directives, loop induction variables have an implicit linear
clause with deferred store semantics (storing to .linear_result). To
properly support ordered regions, the LinearClauseProcessor rewrites
variable references to use .linear_result in:
- omp.ordered.region: Code inside ordered blocks
- omp_region.finalize: Code after ordered blocks

Note: The vectorizer cannot currently vectorize loops with ordered
regions. Future enhancement would require generating lane loops or
unrolling ordered regions across SIMD lanes while maintaining ordering
semantics.

This PR is a reland for https://github.com/llvm/llvm-project/pull/181012
and fixes the regression caused by a syntax change in the IR for the linear clause.
2026-02-25 15:23:18 -06:00
Aiden Grossman
efe0b2f993 Revert "[flang][openmp] Add support for ordered regions in SIMD directives (#181012)"
This reverts commit 31dacdc1f5d486da6ef6d8b2f7e3b6126d92c9ff.

See the PR for test failure details.
2026-02-25 19:31:47 +00:00
Sunil Shrestha
31dacdc1f5
[flang][openmp] Add support for ordered regions in SIMD directives (#181012)
Add support for ordered regions within SIMD directives (!$omp simd
ordered and !$omp do simd ordered). This initial implementation matches
Clang's behavior.

In SIMD directives, loop induction variables have an implicit linear
clause with deferred store semantics (storing to .linear_result). To
properly support ordered regions, the LinearClauseProcessor rewrites
variable references to use .linear_result in:
- omp.ordered.region: Code inside ordered blocks
- omp_region.finalize: Code after ordered blocks

Note: The vectorizer cannot currently vectorize loops with ordered
regions. Future enhancement would require generating lane loops or
unrolling ordered regions across SIMD lanes while maintaining ordering
semantics.
2026-02-25 11:49:07 -06:00
Ferran Toda
f560e4cfb1
[MLIR][OpenMP] Add omp.fuse operation (#168898)
This patch is a follow-up from #161213 and adds the omp.fuse loop
transformation for the OpenMP dialect. Used for lowering a `!$omp fuse`
in Flang.

Added Lowering and end2end tests.
2026-02-17 15:34:27 +01:00
Abid Qadeer
deedc7bfe3
[Flang][OpenMP] Don't generate code for unreachable target regions. (#178937)
When a target region is placed inside a constant false condition (e.g.,
`if (.false.)`), the dead code gets eliminated on the host side,
removing the `omp.target` operation entirely. However, the device-side
compilation pipeline is unaware of this elimination and attempts to
generate kernel code. Since the host never created offload metadata for
the eliminated target, the device-side kernel function lacks the
"kernel" attribute, causing `OpenMPOpt` to fail with an assertion when
it expects all outlined kernels to have this attribute. The problem can
be seen with the following code:

```fortran
program cele
  implicit none
  real :: V
  integer :: i
  if (.false.) then
    !$omp target teams distribute parallel do
    do i = 1, 5
      V = V * 2
    end do
    !$omp end target teams distribute parallel do
  end if
end program
```

It currently fails with the following assertion:

```
Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291
```

This PR adds `DeleteUnreachableTargetsPass` that identifies `omp.target`
operations in unreachable code blocks and removes them.
2026-02-16 09:31:42 +00:00
Aiden Grossman
6e6f76026d [MLIR][OpenMP] Fix unused variable warning
7c07cb6542a0c5e4340e09a9a247e3e5123c6567 introduced a variable created
in an if statement that is only used in an assertion. Per the coding
guidelines, mark it [[maybe_unused]].
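
A minimal sketch of the pattern, using made-up names (`findEntry`, `checkEntry`) rather than the code touched by this commit:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup used only to illustrate the pattern.
inline const int *findEntry(const std::map<std::string, int> &table,
                            const std::string &key) {
  auto it = table.find(key);
  return it == table.end() ? nullptr : &it->second;
}

inline bool checkEntry(const std::map<std::string, int> &table,
                       const std::string &key) {
  // `entry` is read only inside assert(). With NDEBUG the assert expands to
  // nothing, so without the attribute some compilers may report the
  // variable as unused.
  if ([[maybe_unused]] const int *entry = findEntry(table, key)) {
    assert(*entry >= 0 && "entries are expected to be non-negative");
    return true;
  }
  return false;
}
```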
2026-02-10 20:40:29 +00:00
Jack Styles
8949c6d86b
[MLIR][OpenMP] Add Taskloop Collapse Support (#175924)
Following work completed in #174386 and #174623, this patch adds support
for collapse to Taskloop. Collapse allows for the user to compress
multiple loop nests into a single loop, and for this to work with
Taskloop, there needs to be some changes to how we process the loops,
and the tasks that run them.

This patch brings Taskloop support in MLIR and Flang up to the OpenMP 4.5
level.
2026-02-05 08:59:00 +00:00
Chi-Chun, Chen
36dadddd74
[Flang][mlir][OpenMP] Add affinity clause to omp.task and Flang lowering (#179003)
- Add MLIR OpenMP affinity clause
- Lower flang task affinity to mlir
- Emit TODO for iterator modifier and update negative test
2026-02-04 10:30:35 -06:00
Akash Banerjee
7c07cb6542
[MLIR][OpenMP] Fix recursive mapper emission. (#178453)
Recursive types can cause re-entrant mapper emission. The mapper
function is created by OpenMPIRBuilder before the callbacks run, so it
may already exist in the LLVM module even though it is not yet
registered in the ModuleTranslation mapping table. Reuse and register it
to break the recursion. Added offloading test.
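
The reuse-and-register idea can be sketched with a toy model. `ToyTranslator`, its `module` and `translated` maps, and the `deps` table are all hypothetical stand-ins for the LLVM module, the ModuleTranslation mapping table, and the mappers referenced by a mapper body:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model, not the MLIR/LLVM API.
struct ToyTranslator {
  std::map<std::string, int> module;     // function name -> function id
  std::map<std::string, int> translated; // mapper name -> function id
  std::map<std::string, std::vector<std::string>> deps; // mapper -> mappers it uses
  int emitted = 0;                       // number of functions created

  int getOrEmit(const std::string &name) {
    if (auto it = translated.find(name); it != translated.end())
      return it->second;
    // Re-entrant case: the function already exists in the module because it
    // was created before its body was generated. Reuse and register it.
    if (auto it = module.find(name); it != module.end())
      return translated[name] = it->second;
    int id = emitted++;
    module[name] = id; // created before the body is generated
    for (const std::string &dep : deps[name])
      getOrEmit(dep); // body generation may recurse into the same mapper
    translated[name] = id;
    return id;
  }
};
```

Without the second lookup, a mapper whose type refers to itself would re-enter `getOrEmit` indefinitely in this model.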
2026-01-29 16:38:33 +00:00
Walter Lee
b1f845df32
[MLIR][OpenMP] Fix unused variable warning for #137201 (#178659)
Fixes 4cc80831ea5d39c186fc29692556b762ffb6478b.
2026-01-29 14:14:59 +00:00
Sergio Afonso
4cc80831ea
[MLIR][OpenMP] Simplify OpenMP device codegen (#137201)
After removing host operations from the device MLIR module, it is no
longer necessary to provide special codegen logic to prevent these
operations from causing compiler crashes or miscompilations.

This patch removes these now unnecessary code paths to simplify codegen
logic. Some MLIR tests are now replaced with Flang tests, since the
responsibility of dealing with host operations has been moved earlier in
the compilation flow.

MLIR tests holding target device modules are updated to no longer
include now unsupported host operations.
2026-01-29 12:44:40 +00:00
Jakub Kuderski
59e44799bd
[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487)
Pre-committing this before landing the new check in
https://github.com/llvm/llvm-project/pull/177892
2026-01-28 19:13:47 +00:00
Akash Banerjee
c856c3d045
[MLIR][OpenMP] Fix mapper being attached to partial maps. (#178247)
Fix OpenMP mapper lowering by attaching user-defined/default mappers
only to the base parent entry, not combined/segment entries. This
prevents mapper calls with partial sizes. Added relevant tests.
2026-01-28 18:35:03 +00:00
Chaitanya
55f0ed91ef
[OpenMP][MLIR] Add thread_limit with dims modifier support (#171825)
This PR adds support for the OpenMP 6.1 feature thread_limit with dims modifier.
LLVM IR translation for thread_limit with dims modifier is marked as NYI.
2026-01-27 18:16:48 +05:30
Chaitanya
08654adc62
[OpenMP][MLIR] Add num_threads clause with dims modifier support (#171767)
This PR adds support for the OpenMP 6.1 feature num_threads with dims modifier.
LLVM IR translation for num_threads with dims modifier is marked as NYI.
2026-01-27 15:30:55 +05:30
Chaitanya
3aaeace4e2
[OpenMP][MLIR] Add num_teams clause with dims modifier support (#169883)
This PR adds support for the OpenMP 6.1 feature `num_teams` with dims modifier.
LLVM IR translation for num_teams with dims modifier is marked as NYI.
2026-01-27 10:55:40 +05:30
Jason Van Beusekom
0bdbf01e4e
[OpenMP][Flang][MLIR] Skip trip count calculation when bounds are null (#176469)
Fixes a segfault when trip count values are null by skipping trip count
calculation when we cannot determine if it is safe to hoist out the
values.

Of note, I originally tried to modify `extractOnlyOmpNestedDir` to return
the first OpenMPConstruct directive, skipping over any earlier
directives (i.e. stores), which did work for the generic test case below:

```fortran
program minimal_repro
  implicit none

  integer :: i, m
  integer :: res(10) = 0

!$omp target teams map(from:m,res) private(m)
  m = 5
!$omp distribute parallel do
  do i = 1, 10
    res(i) = 5 + i
  end do
!$omp end distribute parallel do
!$omp end target teams

end program minimal_repro
```

But that led to incorrect output in this test case, as the trip count was
hoisted out and calculated using m(1000000) instead of m(1):
```fortran
program minimal_repro
  implicit none

  integer :: i, x
  integer :: m(1) = 0
  integer :: res(10) = 0
  m(1) = 10
  x = 1000000
!$omp target teams map(res)
  x = 1
!$omp distribute parallel do
  do i = 1, m(x)
    res(i) = 5 + i
  end do
!$omp end distribute parallel do
!$omp end target teams

print *, "Test completed successfully m =", m, " res=", res

end program minimal_repro
```
This led to a segfault, because the loop bounds were calculated with
m(1000000):
```mlir
    %c1000000_i32 = arith.constant 1000000 : i32
    hlfir.assign %c1000000_i32 to %10#0 : i32, !fir.ref<i32>
    %c1_i32 = arith.constant 1 : i32
    %12 = fir.load %10#0 : !fir.ref<i32>
    %13 = fir.convert %12 : (i32) -> i64
    %14 = hlfir.designate %5#0 (%13)  : (!fir.ref<!fir.array<1xi32>>, i64) -> !fir.ref<i32>
    %15 = fir.load %14 : !fir.ref<i32>
    ...
    omp.target host_eval(%c1_i32 -> %arg0, %15 -> %arg1, %c1_i32_1 -> %arg2 : i32, i32, i32) map_entries(%18 -> %arg3, %19 -> %arg4, %20 -> %arg5, %23 -> %arg6 : !fir.ref<!fir.array<10xi32>>, !fir.ref<i32>, !fir.ref<i32>, !fir.ref<!fir.array<1xi32>>) {
      ...
      omp.teams {
              ...
              omp.loop_nest (%arg8) : i32 = (%arg0) to (%arg1) inclusive step (%arg2) {
 
```

The wip commit for this change is here:
beafeae396

We would need to have some sort of intelligent hoisting for these cases,
to allow hoisting, but for now I just created this PR to fix the bug.

Fixes: #176030
2026-01-21 11:56:36 +00:00
Michael Klemm
9f19d1895d
[OpenMP] Fix truncation/extension bug when calling __kmpc_push_num_teams (#173067)
This PR fixes a bug that occurred when the lower and upper bounds for the
number of teams were not of type `int32`. In this case, an internal
compiler error would trigger due to a mismatching call to
`__kmpc_push_num_teams`.
2026-01-19 11:20:11 +01:00
Austin Jiang
e6cdfb75ac
Fix typos and spelling errors across codebase (#156270)
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2026-01-13 11:52:46 -05:00
Tom Eccles
804aa88317
[MLIR][OpenMP] Support cancel taskgroup inside of taskloop (#174815)
Implementation follows exactly what is done for omp.wsloop and omp.task.
See #137841.

The change to the operation verifier is to allow a taskgroup
cancellation point inside of a taskloop. This was already allowed for
omp.cancel.
2026-01-09 11:43:54 +00:00
Tom Eccles
ddb706bbb0
[mlir][OpenMP] Don't allocate task context structure if not needed (#174588)
Don't allocate a task context structure if none of the private variables
needed it. This was already skipped when there were no private variables
at all.
2026-01-09 10:49:06 +00:00
Jack Styles
b7c17ab957
[MLIR][OpenMP] Add Initial Taskloop Clause Support (#174623)
Following on from the work to implement MLIR -> LLVM IR Translation for
Taskloop, this adds support for the following clauses to be used
alongside taskloop:
- if
- grainsize
- num_tasks
- untied
- nogroup
- final
- mergeable
- priority

These clauses are ones which work directly through the relevant OpenMP
Runtime functions, so their information just needed collecting from the
relevant location and passing through to the appropriate runtime
function.

Remaining clauses retain their TODO message as they have not yet been
implemented.
2026-01-09 10:34:03 +00:00
Tom Eccles
cc1bb845da
[mlir][OpenMP] Fix sanitizer error in buildTaskLikeBodyGenCallback (#174983)
This is a fix for the asan bot after
https://github.com/llvm/llvm-project/pull/174386

Failing bot: https://lab.llvm.org/buildbot/#/builders/24/builds/16371

This commit undoes a simplification I thought reduced copied+pasted
code. I will merge it like this now to unblock the bot, and then work
separately on a different way to share code between both callbacks.
2026-01-08 14:41:40 +00:00
Tom Eccles
1af1cc21c8
[mlir][OpenMP] Translation support for taskloop construct (#174386)
This PR replaces #166903

This implements translation for taskloop, along with DSA clauses. Other
clauses will follow immediately after this is merged.

This patch was collaborative work by myself, @kaviya2510, and
@Stylie777. I’ve left the commits unsquashed to make authorship clear.
My only changes to the other authors’ commits are to rebase and run
clang-format.

The taskloop implementation in the runtime works roughly like this: if
the number of loop iterations to perform is more than some threshold,
the current task is duplicated and each of the resulting tasks gets half
of the loop range. This continues recursively until each task has a small
enough loop range to run itself in a single thread.

This leads to two implementation complexities:
- The runtime needs to be able to update the loop bounds used when
executing the loop inside of the task. This has been implemented by
forcing them to always have a fixed location inside of the structure
produced when outlining the task.
- When a task is duplicated, all data stored for the task’s
(first)private variables needs to also be duplicated and appropriate
constructors run. This is handled by a task duplication function invoked
by the runtime.
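
The recursive halving described above can be sketched as follows. This only mimics the strategy and is not the runtime's code; `splitTaskloop`, the half-open `Range` type, and the `threshold` parameter are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open iteration range [lower, upper).
using Range = std::pair<std::int64_t, std::int64_t>;

// Illustrative only: keep halving a range until each piece is no larger
// than `threshold`, collecting the resulting per-task ranges.
inline void splitTaskloop(Range r, std::int64_t threshold,
                          std::vector<Range> &tasks) {
  assert(threshold > 0 && r.first <= r.second);
  if (r.second - r.first <= threshold) {
    tasks.push_back(r); // small enough to run in a single task
    return;
  }
  std::int64_t mid = r.first + (r.second - r.first) / 2;
  splitTaskloop({r.first, mid}, threshold, tasks);  // the original task
  splitTaskloop({mid, r.second}, threshold, tasks); // its duplicate
}
```

In this model each split changes the bounds a task sees, which is why the bounds need a fixed, updatable location, and each split creates a second task, which is why private data must be duplicated.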

With regards to testing, most existing tests in the gfortran and fujitsu
test suites require the reduction clause (not part of OpenMP 4.5). I
wrote some tests of my own and was satisfied that it seems to be
working.

Co-authored-by: Kaviya Rajendiran <kaviyara2000@gmail.com>
Co-authored-by: Jack Styles <jack.styles@arm.com>

2026-01-08 11:08:13 +00:00
Chi-Chun, Chen
5fb43838af
[mlir][OpenMP] Lower device clause for target data/enter/exit/update (#174665)
Extend OpenMP device clause lowering for target data, target enter data,
target exit data, and target update to accept non-constant values.
Previously, only constant device IDs could be lowered to LLVM IR.

Add Flang tests to validate device clause handling and mark the feature
as supported in the OpenMPSupport documentation. New tests cover:
- target teams
- target teams distribute
- target teams distribute parallel do
- target teams distribute parallel do simd
- target data

Tests for target update and target enter/exit were
already present in Flang.
2026-01-07 11:19:14 -06:00
Tom Eccles
07d07be73d
[mlir][OpenMP] Fix infinite loop after #174105 (#174736) 2026-01-07 10:48:16 +00:00
Chi-Chun, Chen
3f5d91bfbc
[Flang][OpenMP] Implement device clause lowering for target directive (#173509)
Add lowering support for the OpenMP `device` clause on the `target`
directive in Flang.

The device expression is propagated through MLIR OpenMP and passed to
the host-side `__tgt_target_kernel` call.
2026-01-06 11:10:03 -06:00
Tom Eccles
188d13db20
[mlir][OpenMP] don't add compiler-generated barrier in single threaded code (#174105)
We add barriers to the firstprivate copy region when they are required
to avoid a race condition with the lastprivate clause.

The problem is that these barriers are added by the compiler, not implied
by user code, so it is the compiler's responsibility to avoid deadlock.

I came across a testcase whilst working on taskloop support that looks a
bit like this:
```
!$omp parallel
  !$omp single
    !$omp taskloop firstprivate(a) lastprivate(a)
      ...
  !$omp end single
!$omp end parallel
```

This is so that there are multiple threads for the generated tasks to be
distributed over, but we don't generate the tasks afresh in every
thread.

The problem comes when the taskloop requires a barrier to prevent the
datarace between firstprivate and lastprivate. This barrier will then be
generated inside of SINGLE and so only one thread will encounter the
barrier: leading to a deadlock.

This patch works around the problem by detecting this situation
statically and then not generating the barrier. There are cases where we
cannot detect this statically (e.g. if the TASKLOOP is inside a function
call inside of SINGLE). The program will still deadlock in this case
after my patch. I'm unsure what the solution would be for that case. I
want to fix this simple case in LLVM 22 before engaging in a longer
discussion as to whether there is a better way to handle the more
general case.

Testing using wsloop because I want to land this (or not) independently
of taskloop. Note that for wsloop it would be up to the programmer to
remember to use the nowait clause, but nowait cannot be used to control
generation of this barrier because it refers to the barrier after the
construct not after firstprivate copyin (before the construct
execution).
2026-01-06 10:22:41 +00:00
NimishMishra
11d9694b75
[flang][mlir] Add support for implicit linearization in omp.simd (#150386)
Up till OpenMP version 4.5, the loop iteration variable in the
associated do-construct of simd is linear with a linear step equal to
the increment of the loop. This PR implements this functionality. For
versions > 4.5, such an implicit linear clause is not assumed for the
loop iteration variable.

Fixes https://github.com/llvm/llvm-project/issues/171006
2026-01-03 21:37:43 -08:00
Krish Gupta
c646d1bd7d
[MLIR][OpenMP] Fix type mismatch in linear clause for INTEGER(8) variables (#173982)
Fixes #173332 

The compiler was crashing when compiling OpenMP `parallel do simd` with
a `linear` clause on `INTEGER(8)` variables. The assertion failure
occurred during MLIR-to-LLVM translation:
Cannot create binary operator with two operands of differing type!

**Root Cause:**
The bug was in `LinearClauseProcessor::updateLinearVar()` where the step
value (i32) and induction variable were multiplied without normalizing
to the linear variable's type (i64), causing type mismatches in LLVM IR
generation.

**Solution:**
Updated the translation logic to cast both the induction variable and
step value to `linearVarTypes[index]` before performing arithmetic
operations. This ensures type consistency for both integer and
floating-point linear variables.

**Testing:**
- Added integration test verifying successful compilation to LLVM IR
- Added lowering test for MLIR generation with various linear clause
forms
- Verified the exact reproducer from the issue now compiles without
errors
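
The normalization step can be illustrated with a toy typed-value model. `ToyValue`, `castTo`, and the `start + iv * step` update are assumptions for illustration and do not reproduce the actual `updateLinearVar()` code:

```cpp
#include <cassert>
#include <cstdint>

// Toy typed value, standing in for an IR value with an integer bit width.
struct ToyValue {
  unsigned bits;
  std::int64_t value;
};

// Like an IR builder, refuse to combine operands of different types.
inline ToyValue mul(ToyValue a, ToyValue b) {
  assert(a.bits == b.bits && "operands of differing type");
  return {a.bits, a.value * b.value};
}

inline ToyValue add(ToyValue a, ToyValue b) {
  assert(a.bits == b.bits && "operands of differing type");
  return {a.bits, a.value + b.value};
}

// Hypothetical cast helper: re-tags the value with the requested width.
inline ToyValue castTo(ToyValue v, unsigned bits) { return {bits, v.value}; }

// Normalize the induction variable and the step to the linear variable's
// type before doing any arithmetic.
inline ToyValue updateLinearVar(ToyValue start, ToyValue iv, ToyValue step) {
  ToyValue ivCast = castTo(iv, start.bits);
  ToyValue stepCast = castTo(step, start.bits);
  return add(start, mul(ivCast, stepCast));
}
```

In this model, adding a 32-bit product to the 64-bit linear variable without the casts would trip the same kind of assertion the issue reported.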
2026-01-02 11:52:33 +00:00
Akash Banerjee
b360a782ca
Reland "[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331)" (#170851)
Add support for OpenMP is_device_ptr clause for target directives.

[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367
This PR adds support for the OpenMP is_device_ptr clause in the MLIR to
LLVM IR translation for target regions. The is_device_ptr clause allows
device pointers (allocated via OpenMP runtime APIs) to be used directly
in target regions without implicit mapping.
2025-12-05 17:38:41 +00:00
NimishMishra
290b32a699
[llvm][mlir][OpenMP] Support translation for linear clause in omp.wsloop and omp.simd (#139386)
This patch adds support for LLVM translation of linear clause on
omp.wsloop (except for linear modifiers).
2025-12-04 20:39:17 -08:00
theRonShark
be79a0d90f
Revert "[Flang][OpenMP] Add lowering support for is_device_ptr clause" (#170778)
Reverts llvm/llvm-project#169331
2025-12-04 19:38:16 -05:00
Akash Banerjee
a77c4948a5
[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331)
Add support for OpenMP is_device_ptr clause for target directives.

[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367
This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.
2025-12-04 15:57:24 +00:00
Mehdi Amini
4c09e45f1d [MLIR] Apply clang-tidy fixes for llvm-qualified-auto in OpenMPToLLVMIRTranslation.cpp (NFC) 2025-12-03 07:01:47 -08:00
Tom Eccles
8ec2112ec8
[OMPIRBuilder] re-land cancel barriers patch #164586 (#169931)
A barrier will pause execution until all threads reach it. If some go to
a different barrier then we deadlock. This manifests in that the
finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread is cancelled
or not and no matter which cancellation point causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.

---

This re-lands #164586 with a small fix for a failing buildbot running
address sanitizer on clang lit tests.

In the previous version of the patch I added an insertion point guard
"just to be safe" and never removed it. There isn't insertion point
guarding on the other route out of this function and we do not
preserve the insertion point around getFiniBB either so it is not
needed here.

The problem flagged by the sanitizers was because the saved insertion
point pointed to an instruction which was then removed inside the FiniCB
for some clang codegen functions. The instruction was freed when it was
removed. Then accessing it to restore the insertion point was a use
after free bug.
2025-12-01 10:07:19 +00:00
Tom Eccles
58fa7e4ccd
Revert "[OMPIRBuilder] always leave PARALLEL via the same barrier" (#169829)
Reverts llvm/llvm-project#164586

Reverting due to buildbot failure:
https://lab.llvm.org/buildbot/#/builders/169/builds/17519
2025-11-27 16:19:52 +00:00
Jack Styles
47ae3eaa29
[MLIR][OpenMP] Add MLIR Lowering Support for dist_schedule (#152736)
`dist_schedule` was previously supported in Flang/Clang but was not
implemented in MLIR, instead a user would get a "not yet implemented"
error. This patch adds support for the `dist_schedule` clause to be
lowered to LLVM IR when used in an `omp.distribute` or `omp.wsloop`
section.

Some rework was required to ensure that MLIR/LLVM emits the correct
schedule type for the clause, as it uses a different schedule type from
other OpenMP directives/clauses in the runtime library.

This patch also ensures that when using dist_schedule or a chunked
schedule clause, the correct llvm loop parallel accesses details are
added.
2025-11-27 14:16:44 +00:00
Tom Eccles
0e5633fcd9
[OMPIRBuilder] always leave PARALLEL via the same barrier (#164586)
A barrier will pause execution until all threads reach it. If some go to
a different barrier then we deadlock. This manifests in that the
finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread is cancelled
or not and no matter which cancellation point causes the cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.
2025-11-27 14:13:25 +00:00
Kareem Ergawy
f481f5bef9
[OpenMP][flang] Add initial support for by-ref reductions on the GPU (#165714)
Adds initial support for GPU by-ref reductions. The main problem for
reduction by reference is that, prior to this PR, we were shuffling
(from remote lanes within the same warp or across different warps within
the block) pointers/references to the private reduction values rather
than the private reduction values themselves.

In particular, this diff adds support for reductions on scalar
allocatables where reductions happen on loops nested in `target`
regions. For example:

```fortran
  integer :: i
  real, allocatable :: scalar_alloc

  allocate(scalar_alloc)
  scalar_alloc = 0

  !$omp target map(tofrom: scalar_alloc)
  !$omp parallel do reduction(+: scalar_alloc)
  do i = 1, 1000000
    scalar_alloc = scalar_alloc + 1
  end do
  !$omp end target
```

This PR supports by-ref reductions on the intra- and inter-warp levels.

So far, there are still steps to be taken for full support of by-ref
reductions, for example:
* Inter-block value combination is still not supported. Therefore,
`target teams distribute parallel do` is still not supported.
* Support for dynamically-sized arrays still needs to be added.
* Support for more than one allocatable/array on the same `reduction`
clause still needs to be added.
2025-11-26 11:59:22 +01:00
Aiden Grossman
51dd3ec13c
[MLIR][OpenMP] Bail early in sortMapIndices if indices are the same (#169474)
If we are given the same index in the comparator callback, simply return
false. Otherwise we will end up adding invalid items to
occludedChildren, causing extra items to get removed that should not be,
resulting in failures that manifest in different forms (assertions, asan
failures, ubsan failures, etc.).
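
A minimal sketch of the comparator shape, assuming indices are ordered by some associated key. `sortIndicesByKey` and `keys` are hypothetical and the real `sortMapIndices` logic is more involved:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: sort indices by an associated key. An element is
// never "less than" itself, so the comparator returns false for identical
// indices before doing any other work.
inline void sortIndicesByKey(std::vector<std::size_t> &indices,
                             const std::vector<int> &keys) {
  std::sort(indices.begin(), indices.end(),
            [&](std::size_t a, std::size_t b) {
              if (a == b)
                return false; // same element: bail out early
              assert(a < keys.size() && b < keys.size());
              return keys[a] < keys[b];
            });
}
```

The early return matters most when the comparator has side effects, as the message above describes, since the sort may legitimately compare an element with itself.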
2025-11-25 06:23:12 -05:00