llvm-project

Author	SHA1	Message	Date
Chi-Chun, Chen	7ff0dc4b9f	[mlir][OpenMP] Add iterator support to depend clause (#189090 ) Extend the depend clause to support `!omp.iterated<Ty>` handles alongside plain depend vars, so the IR can represent both forms. Assisted with copilot This is part of feature work for https://github.com/llvm/llvm-project/issues/188061	2026-03-31 11:11:08 -05:00
Leandro Lupori	a30a8e9474	Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#183794 )" (#188851 ) Linear iteration variables were being treated as private. This fixes one of the issues reported in #170784. The regression reported in #188536 occurred because LinearClauseProcessor was rewriting all basic blocks whose names contained a given substring, including those that were not part of the translated SIMD region. This didn't cause problems before because linear variables were always privatized, which doesn't happen with this change. The issue is fixed by rewriting only the basic blocks that correspond to the omp.simd operation.	2026-03-31 09:36:08 -03:00
Alexis Engelke	7581430722	[IR] Require well-formed IR for BasicBlock::getTerminator (#189416 ) BasicBlock::getTerminator() is frequently called on valid IR, yet the function has to check that the last instruction is in fact a terminator, even in release builds. This check can only be optimized away when the instruction is dereferenced. Therefore, introduce the functions hasTerminator() and getTerminatorOrNull() as replacement and require (assert) that getTerminator() always returns a valid terminator. As a side effect, this forces explicit expression of intent at call sites when unfinished basic blocks should be supported.	2026-03-30 18:57:37 +02:00
Alexis Engelke	4c745df8bc	[MLIR][LLVMIR][NFC] Drop uses of BranchInst (#187304 )	2026-03-18 15:55:10 +00:00
Chi-Chun, Chen	2ad51ffbfa	[mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir (#182223 ) Translate affinity entries to LLVMIR by passing affinity information to createTask (__kmpc_omp_reg_task_with_affinity is created inside PostOutlineCB). 3/3 in stack for implementing affinity clause with iterator modifier 1/3 #182218 2/3 #182222 3/3 #182223	2026-03-16 10:16:38 -05:00
Joseph Huber	154a128c65	Reapply "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Should be working downstream now This reverts commit 9b61ff210fdff752d5db55b128474e9990258488.	2026-03-13 15:48:37 -05:00
theRonShark	9b61ff210f	Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Reverts llvm/llvm-project#185989	2026-03-13 05:20:40 +00:00
Joseph Huber	4376fbd793	[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989 ) Summary: We use this `dyn_ptr` argument in Clang/OpenMP to handle the `KernelLaunchEnvironment`. This is a per-kernel argument used to share some information. Currently, it's prepended to the argument list and we generate storage for it in the runtime. This is bad for a few reasons: 1. It changes the ABI by shifting user arguments 2. It cannot trivially be left uninitialized if unused 3. The runtime must allocate its own memory for it This PR changes it to be appended instead. Additionally, space for this is always emitted. This means the OMPIRBuilder itself will provide the storage, we simply need to populate it in the runtime if it is used. This means that if it's unused we don't always pay the cost and it's easier for non-OpenMP users to ignore it. Backward compatibility is maintained by auto-upgrading the kernel arguments. In `libomptarget` we completely allocate a new buffer to store this in the new format. The plugins still need to respect the old ABI of the called device object, so we simply rotate it if it's the old version.	2026-03-12 18:08:22 -05:00
Sergio Afonso	8f668cec47	[MLIR][OpenMP] Prevent teams reductions from deadlocking (#184625 ) Currently, simple Fortran reductions like the example below cause a deadlock at runtime: ```f90 integer :: i, x !$omp teams distribute reduction(+:x) do i=1, 10 x = x + 1 end do ``` Preventing a redundant barrier from being added in that case addresses this issue. Synchronization is already being handled by the `__kmpc_reduce` and `__kmpc_end_reduce` runtime calls for the host, and by the OMPIRBuilder-generated `_omp_reduction_inter_warp_copy_func` function for GPUs.	2026-03-09 15:26:50 +00:00
Sunil Shrestha	af2b6ed6aa	[flang][openmp] Add support for ordered regions in SIMD directives (#… (#183379 ) Add support for ordered regions within SIMD directives (!$omp simd ordered and !$omp do simd ordered). This initial implementation matches Clang's behavior. In SIMD directives, loop induction variables have an implicit linear clause with deferred store semantics (storing to .linear_result). To properly support ordered regions, the LinearClauseProcessor rewrites variable references to use .linear_result in: - omp.ordered.region: Code inside ordered blocks - omp_region.finalize: Code after ordered blocks Note: The vectorizer cannot currently vectorize loops with ordered regions. Future enhancement would require generating lane loops or unrolling ordered regions across SIMD lanes while maintaining ordering semantics. This PR is a reland for https://github.com/llvm/llvm-project/pull/181012 and fixes the regression caused by syntax change in IR for linear clause	2026-02-25 15:23:18 -06:00
Aiden Grossman	efe0b2f993	Revert "[flang][openmp] Add support for ordered regions in SIMD directives (#181012 )" This reverts commit 31dacdc1f5d486da6ef6d8b2f7e3b6126d92c9ff. See the PR for test failure details.	2026-02-25 19:31:47 +00:00
Sunil Shrestha	31dacdc1f5	[flang][openmp] Add support for ordered regions in SIMD directives (#181012 ) Add support for ordered regions within SIMD directives (!$omp simd ordered and !$omp do simd ordered). This initial implementation matches Clang's behavior. In SIMD directives, loop induction variables have an implicit linear clause with deferred store semantics (storing to .linear_result). To properly support ordered regions, the LinearClauseProcessor rewrites variable references to use .linear_result in: - omp.ordered.region: Code inside ordered blocks - omp_region.finalize: Code after ordered blocks Note: The vectorizer cannot currently vectorize loops with ordered regions. Future enhancement would require generating lane loops or unrolling ordered regions across SIMD lanes while maintaining ordering semantics.	2026-02-25 11:49:07 -06:00
Ferran Toda	f560e4cfb1	[MLIR][OpenMP] Add omp.fuse operation (#168898 ) This patch is a follow-up from #161213 and adds the omp.fuse loop transformation for the OpenMP dialect. Used for lowering a `!$omp fuse` in Flang. Added Lowering and end2end tests.	2026-02-17 15:34:27 +01:00
Abid Qadeer	deedc7bfe3	[Flang][OpenMP] Don't generate code for unreachable target regions. (#178937 ) When a target region is placed inside a constant false condition (e.g., `if (.false.)`), the dead code gets eliminated on the host side, removing the `omp.target` operation entirely. However, the device-side compilation pipeline is unaware of this elimination and attempts to generate kernel code. Since the host never created offload metadata for the eliminated target, the device-side kernel function lacks the "kernel" attribute, causing `OpenMPOpt` to fail with an assertion when it expects all outlined kernels to have this attribute. The problem can be seen with the following code: ```fortran program cele implicit none real :: V integer :: i if (.false.) then !$omp target teams distribute parallel do do i = 1, 5 V = V * 2 end do !$omp end target teams distribute parallel do end if end program ``` It currently fails with the following assertion: ``` Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed. llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291 ``` This PR adds `DeleteUnreachableTargetsPass` that identifies `omp.target` operations in unreachable code blocks and removes them.	2026-02-16 09:31:42 +00:00
Aiden Grossman	6e6f76026d	[MLIR][OpenMP] Fix unused variable warning 7c07cb6542a0c5e4340e09a9a247e3e5123c6567 introduced a variable created in an if statement that is only used in an assertion. Per the coding guidelines, mark it [[maybe_unused]].	2026-02-10 20:40:29 +00:00
Jack Styles	8949c6d86b	[MLIR][OpenMP] Add Taskloop Collapse Support (#175924 ) Following work completed in #174386 and #174623, this patch adds support for collapse to Taskloop. Collapse allows for the user to compress multiple loop nests into a single loop, and for this to work with Taskloop, there needs to be some changes to how we process the loops, and the tasks that run them. This patch brings Taskloop equivalent to OpenMP 4.5 support for MLIR and Flang.	2026-02-05 08:59:00 +00:00
Chi-Chun, Chen	36dadddd74	[Flang][mlir][OpenMP] Add affinity clause to omp.task and Flang lowering (#179003 ) - Add MLIR OpenMP affinity clause - Lower flang task affinity to mlir - Emit TODO for iterator modifier and update negative test	2026-02-04 10:30:35 -06:00
Akash Banerjee	7c07cb6542	[MLIR][OpenMP] Fix recursive mapper emission. (#178453 ) Recursive types can cause re-entrant mapper emission. The mapper function is created by OpenMPIRBuilder before the callbacks run, so it may already exist in the LLVM module even though it is not yet registered in the ModuleTranslation mapping table. Reuse and register it to break the recursion. Added offloading test.	2026-01-29 16:38:33 +00:00
Walter Lee	b1f845df32	[MLIR][OpenMP] Fix unused variable warning for #137201 (#178659 ) Fixes 4cc80831ea5d39c186fc29692556b762ffb6478b.	2026-01-29 14:14:59 +00:00
Sergio Afonso	4cc80831ea	[MLIR][OpenMP] Simplify OpenMP device codegen (#137201 ) After removing host operations from the device MLIR module, it is no longer necessary to provide special codegen logic to prevent these operations from causing compiler crashes or miscompilations. This patch removes these now unnecessary code paths to simplify codegen logic. Some MLIR tests are now replaced with Flang tests, since the responsibility of dealing with host operations has been moved earlier in the compilation flow. MLIR tests holding target device modules are updated to no longer include now unsupported host operations.	2026-01-29 12:44:40 +00:00
Jakub Kuderski	59e44799bd	[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487 ) Pre-committing this before landing the new check in https://github.com/llvm/llvm-project/pull/177892	2026-01-28 19:13:47 +00:00
Akash Banerjee	c856c3d045	[MLIR][OpenMP] Fix mapper being attached to partial maps. (#178247 ) Fix OpenMP mapper lowering by attaching user-defined/default mappers only to the base parent entry, not combined/segment entries. This prevents mapper calls with partial sizes. Added relevant tests.	2026-01-28 18:35:03 +00:00
Chaitanya	55f0ed91ef	[OpenMP][MLIR] Add thread_limit with dims modifier support (#171825 ) PR adds support for the OpenMP 6.1 feature thread_limit with dims modifier. LLVM IR translation for thread_limit with dims modifier is marked as NYI.	2026-01-27 18:16:48 +05:30
Chaitanya	08654adc62	[OpenMP][MLIR] Add num_threads clause with dims modifier support (#171767 ) PR adds support for the OpenMP 6.1 feature num_threads with dims modifier. LLVM IR translation for num_threads with dims modifier is marked as NYI.	2026-01-27 15:30:55 +05:30
Chaitanya	3aaeace4e2	[OpenMP][MLIR] Add num_teams clause with dims modifier support (#169883 ) PR adds support for the OpenMP 6.1 feature `num_teams` with dims modifier. LLVM IR translation for num_teams with dims modifier is marked as NYI.	2026-01-27 10:55:40 +05:30
Jason Van Beusekom	0bdbf01e4e	[OpenMP][Flang][MLIR] Skip trip count calculation when bounds are null (#176469 ) Fixes a segfault when trip count values are null by skipping trip count calculation when we cannot determine if it is safe to hoist out the values. Of note I originally tried to modify `extractOnlyOmpNestedDir` to return the first OpenMPConstruct directive, skipping over any earlier directives (ie stores), which did work for the below generic test case: ```fortran program minimal_repro implicit none integer :: i, m integer :: res(10) = 0 !$omp target teams map(from:m,res) private(m) m = 5 !$omp distribute parallel do do i = 1, 10 res(i) = 5 + i end do !$omp end distribute parallel do !$omp end target teams end program minimal_repro ``` But that led to incorrect output in this test case as the trip count was hoisted out and calculated by m(1000000) instead of m(1) ```fortran program minimal_repro implicit none integer :: i, x integer :: m(1) = 0 integer :: res(10) = 0 m(1) = 10 x = 1000000 !$omp target teams map(res) x = 1 !$omp distribute parallel do do i = 1, m(x) res(i) = 5 + i end do !$omp end distribute parallel do !$omp end target teams print *, "Test completed successfully m =", m, " res=", res end program minimal_repro ``` Leading to a segfault, due to the loop bounds being calculated with m(1000000) ```mlir %c1000000_i32 = arith.constant 1000000 : i32 hlfir.assign %c1000000_i32 to %10#0 : i32, !fir.ref<i32> %c1_i32 = arith.constant 1 : i32 %12 = fir.load %10#0 : !fir.ref<i32> %13 = fir.convert %12 : (i32) -> i64 %14 = hlfir.designate %5#0 (%13) : (!fir.ref<!fir.array<1xi32>>, i64) -> !fir.ref<i32> %15 = fir.load %14 : !fir.ref<i32> ... omp.target host_eval(%c1_i32 -> %arg0, %15 -> %arg1, %c1_i32_1 -> %arg2 : i32, i32, i32) map_entries(%18 -> %arg3, %19 -> %arg4, %20 -> %arg5, %23 -> %arg6 : !fir.ref<!fir.array<10xi32>>, !fir.ref<i32>, !fir.ref<i32>, !fir.ref<!fir.array<1xi32>>) { ... omp.teams { ... 
omp.loop_nest (%arg8) : i32 = (%arg0) to (%arg1) inclusive step (%arg2) { ``` The wip commit for this change is here: `beafeae396` We would need to have some sort of intelligent hoisting for these cases, to allow hoisting, but for now I just created this PR to fix the bug. Fixes: #176030	2026-01-21 11:56:36 +00:00
Michael Klemm	9f19d1895d	[OpenMP] Fix truncation/extension bug when calling __kmpc_push_num_teams (#173067 ) This PR fixes a bug when the lower and upper bound for the number of teams was not an `int32`, but a different type. In this case, an internal compiler error would be triggered due to a mismatching call to `__kmpc_push_num_teams`.	2026-01-19 11:20:11 +01:00
Austin Jiang	e6cdfb75ac	Fix typos and spelling errors across codebase (#156270 ) Corrected various spelling mistakes such as 'occurred', 'receiver', 'initialized', 'length', and others in comments, variable names, function names, and documentation throughout the project. These changes improve code readability and maintain consistency in naming and documentation. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>	2026-01-13 11:52:46 -05:00
Tom Eccles	804aa88317	[MLIR][OpenMP] Support cancel taskgroup inside of taskloop (#174815 ) Implementation follows exactly what is done for omp.wsloop and omp.task. See #137841. The change to the operation verifier is to allow a taskgroup cancellation point inside of a taskloop. This was already allowed for omp.cancel.	2026-01-09 11:43:54 +00:00
Tom Eccles	ddb706bbb0	[mlir][OpenMP] Don't allocate task context structure if not needed (#174588 ) Don't allocate a task context structure if none of the private variables needed it. This was already skipped when there were no private variables at all.	2026-01-09 10:49:06 +00:00
Jack Styles	b7c17ab957	[MLIR][OpenMP] Add Initial Taskloop Clause Support (#174623 ) Following on from the work to implement MLIR -> LLVM IR Translation for Taskloop, this adds support for the following clauses to be used alongside taskloop: - if - grainsize - num_tasks - untied - Nogroup - Final - Mergeable - Priority These clauses are ones which work directly through the relevant OpenMP Runtime functions, so their information just needed collecting from the relevant location and passing through to the appropriate runtime function. Remaining clauses retain their TODO message as they have not yet been implemented.	2026-01-09 10:34:03 +00:00
Tom Eccles	cc1bb845da	[mlir][OpenMP] Fix sanitizer error in buildTaskLikeBodyGenCallback (#174983 ) This is a fix for the asan bot after https://github.com/llvm/llvm-project/pull/174386 Failing bot: https://lab.llvm.org/buildbot/#/builders/24/builds/16371 This commit undoes a simplification I thought reduced copied+pasted code. I will merge it like this now to unblock the bot, and then work separately on a different way to share code between both callbacks.	2026-01-08 14:41:40 +00:00
Tom Eccles	1af1cc21c8	[mlir][OpenMP] Translation support for taskloop construct (#174386 ) This PR replaces #166903. This implements translation for taskloop, along with DSA clauses. Other clauses will follow immediately after this is merged. This patch was collaborative work by myself, @kaviya2510, and @Stylie777. I’ve left the commits unsquashed to make authorship clear. My only changes to other authors’ commits are to rebase and run clang-format. The taskloop implementation in the runtime works roughly like this: if the number of loop iterations to perform is more than some threshold, the current task is duplicated and both resulting tasks get half of the loop range. This continues recursively until each task has a small enough loop range to run itself in a single thread. This leads to two implementation complexities: - The runtime needs to be able to update the loop bounds used when executing the loop inside of the task. This has been implemented by forcing them to always have a fixed location inside of the structure produced when outlining the task. - When a task is duplicated, all data stored for the task’s (first)private variables needs to also be duplicated and appropriate constructors run. This is handled by a task duplication function invoked by the runtime. With regards to testing, most existing tests in the gfortran and Fujitsu test suites require the reduction clause (not part of OpenMP 4.5). I wrote some tests of my own and was satisfied that it seems to be working. Co-authored-by: Kaviya Rajendiran <kaviyara2000@gmail.com> Co-authored-by: Jack Styles <jack.styles@arm.com>	2026-01-08 11:08:13 +00:00
Chi-Chun, Chen	5fb43838af	[mlir][OpenMP] Lower device clause for target data/enter/exit/update (#174665 ) Extend OpenMP device clause lowering for target data, target enter data, target exit data, and target update to accept non-constant values. Previously, only constant device IDs could be lowered to LLVM IR. Add Flang tests to validate device clause handling and mark the feature as supported in the OpenMPSupport documentation. New tests cover: - target teams - target teams distribute - target teams distribute parallel do - target teams distribute parallel do simd - target data Tests for target update and target enter/exit were already present in Flang.	2026-01-07 11:19:14 -06:00
Tom Eccles	07d07be73d	[mlir][OpenMP] Fix infinite loop after #174105 (#174736 )	2026-01-07 10:48:16 +00:00
Chi-Chun, Chen	3f5d91bfbc	[Flang][OpenMP] Implement device clause lowering for target directive (#173509 ) Add lowering support for the OpenMP `device` clause on the `target` directive in Flang. The device expression is propagated through MLIR OpenMP and passed to the host-side `__tgt_target_kernel` call.	2026-01-06 11:10:03 -06:00
Tom Eccles	188d13db20	[mlir][OpenMP] don't add compiler-generated barrier in single threaded code (#174105 ) We add barriers to the firstprivate copy region when they are required to avoid a race condition with the lastprivate clause. The problem is that these barriers are added by the compiler not implied by user code so it is the compiler's problem to avoid deadlock. I came across a testcase whilst working on taskloop support that looks a bit like this ``` !$omp parallel !$omp single !$omp taskloop firstprivate(a) lastprivate(a) ... !$omp end single !$omp end parallel ``` This is so that there are multiple threads for the generated tasks to be distributed over, but we don't generate the tasks afresh in every thread. The problem comes when the taskloop requires a barrier to prevent the datarace between firstprivate and lastprivate. This barrier will then be generated inside of SINGLE and so only one thread will encounter the barrier: leading to a deadlock. This patch works around the problem by detecting this situation statically and then not generating the barrier. There are cases where we cannot detect this statically (e.g. if the TASKLOOP is inside a function call inside of SINGLE). The program will still deadlock in this case after my patch. I'm unsure what the solution would be for that case. I want to fix this simple case in LLVM 22 before engaging in a longer discussion as to whether there is a better way to handle the more general case. Testing using wsloop because I want to land this (or not) independently of taskloop. Note that for wsloop it would be up to the programmer to remember to use the nowait clause, but nowait cannot be used to control generation of this barrier because it refers to the barrier after the construct not after firstprivate copyin (before the construct execution).	2026-01-06 10:22:41 +00:00
NimishMishra	11d9694b75	[flang][mlir] Add support for implicit linearization in omp.simd (#150386 ) Up till OpenMP version 4.5, the loop iteration variable in the associated do-construct of simd is linear with a linear step equal to the increment of the loop. This PR implements this functionality. For versions > 4.5, such an implicit linear clause is not assumed for the loop iteration variable. Fixes https://github.com/llvm/llvm-project/issues/171006	2026-01-03 21:37:43 -08:00
Krish Gupta	c646d1bd7d	[MLIR][OpenMP] Fix type mismatch in linear clause for INTEGER(8) variables (#173982 ) Fixes #173332 The compiler was crashing when compiling OpenMP `parallel do simd` with a `linear` clause on `INTEGER(8)` variables. The assertion failure occurred during MLIR-to-LLVM translation: Cannot create binary operator with two operands of differing type! Root Cause: The bug was in `LinearClauseProcessor::updateLinearVar()` where the step value (i32) and induction variable were multiplied without normalizing to the linear variable's type (i64), causing type mismatches in LLVM IR generation. Solution: Updated the translation logic to cast both the induction variable and step value to `linearVarTypes[index]` before performing arithmetic operations. This ensures type consistency for both integer and floating-point linear variables. Testing: - Added integration test verifying successful compilation to LLVM IR - Added lowering test for MLIR generation with various linear clause forms - Verified the exact reproducer from the issue now compiles without errors	2026-01-02 11:52:33 +00:00
Akash Banerjee	b360a782ca	Reland "[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331 )" (#170851 ) Add support for OpenMP is_device_ptr clause for target directives. [MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367 This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.	2025-12-05 17:38:41 +00:00
NimishMishra	290b32a699	[llvm][mlir][OpenMP] Support translation for linear clause in omp.wsloop and omp.simd (#139386 ) This patch adds support for LLVM translation of linear clause on omp.wsloop (except for linear modifiers).	2025-12-04 20:39:17 -08:00
theRonShark	be79a0d90f	Revert "[Flang][OpenMP] Add lowering support for is_device_ptr clause" (#170778 ) Reverts llvm/llvm-project#169331	2025-12-04 19:38:16 -05:00
Akash Banerjee	a77c4948a5	[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331 ) Add support for OpenMP is_device_ptr clause for target directives. [MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367 This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.	2025-12-04 15:57:24 +00:00
Mehdi Amini	4c09e45f1d	[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in OpenMPToLLVMIRTranslation.cpp (NFC)	2025-12-03 07:01:47 -08:00
Tom Eccles	8ec2112ec8	[OMPIRBuilder] re-land cancel barriers patch #164586 (#169931 ) A barrier will pause execution until all threads reach it. If some go to a different barrier then we deadlock. This manifests in that the finalization callback must only be run once. Fix by ensuring we always go through the same finalization block whether the thread is cancelled or not and no matter which cancellation point causes the cancellation. The old callback only affected PARALLEL, so it has been moved into the code generating PARALLEL. For this reason, we don't need similar changes for other cancellable constructs. We need to create the barrier on the shared exit from the outlined function instead of only on the cancelled branch to make sure that threads exiting normally (without cancellation) meet the same barriers as those which were cancelled. For example, previously we might have generated code like ``` ... %ret = call i32 @__kmpc_cancel(...) %cond = icmp eq i32 %ret, 0 br i1 %cond, label %continue, label %cancel continue: // do the rest of the callback, eventually branching to %fini br label %fini cancel: // Populated by the callback: // unsafe: if any thread makes it to the end without being cancelled // it won't reach this barrier and then the program will deadlock %unused = call i32 @__kmpc_cancel_barrier(...) br label %fini fini: // run destructors etc ret ``` In the new version the barrier is moved into fini. I generate it after the destructors because the standard describes the barrier as occurring after the end of the parallel region. ``` ... %ret = call i32 @__kmpc_cancel(...) %cond = icmp eq i32 %ret, 0 br i1 %cond, label %continue, label %cancel continue: // do the rest of the callback, eventually branching to %fini br label %fini cancel: br label %fini fini: // run destructors etc // safe so long as every exit from the function happens via this block: %unused = call i32 @__kmpc_cancel_barrier(...) 
ret ``` To achieve this, the barrier is now generated alongside the finalization code instead of in the callback. This is the reason for the changes to the unit test. I'm unsure if I should keep the incorrect barrier generation callback only on the cancellation branch in clang with the OMPIRBuilder backend because that would match clang's ordinary codegen. Right now I have opted to remove it entirely because it is a deadlock waiting to happen. --- This re-lands #164586 with a small fix for a failing buildbot running address sanitizer on clang lit tests. In the previous version of the patch I added an insertion point guard "just to be safe" and never removed it. There isn't insertion point guarding on the other route out of this function and we do not preserve the insertion point around getFiniBB either so it is not needed here. The problem flagged by the sanitizers was because the saved insertion point pointed to an instruction which was then removed inside the FiniCB for some clang codegen functions. The instruction was freed when it was removed. Then accessing it to restore the insertion point was a use after free bug.	2025-12-01 10:07:19 +00:00
Tom Eccles	58fa7e4ccd	Revert "[OMPIRBuilder] always leave PARALLEL via the same barrier" (#169829 ) Reverts llvm/llvm-project#164586 Reverting due to buildbot failure: https://lab.llvm.org/buildbot/#/builders/169/builds/17519	2025-11-27 16:19:52 +00:00
Jack Styles	47ae3eaa29	[MLIR][OpenMP] Add MLIR Lowering Support for dist_schedule (#152736 ) `dist_schedule` was previously supported in Flang/Clang but was not implemented in MLIR; instead, a user would get a "not yet implemented" error. This patch adds support for the `dist_schedule` clause to be lowered to LLVM IR when used in an `omp.distribute` or `omp.wsloop` section. Some rework was required to ensure that MLIR/LLVM emits the correct Schedule Type for the clause, as it uses a different schedule type to other OpenMP directives/clauses in the runtime library. This patch also ensures that when using dist_schedule or a chunked schedule clause, the correct llvm loop parallel accesses details are added.	2025-11-27 14:16:44 +00:00
Tom Eccles	0e5633fcd9	[OMPIRBuilder] always leave PARALLEL via the same barrier (#164586 ) A barrier will pause execution until all threads reach it. If some go to a different barrier then we deadlock. This manifests in that the finalization callback must only be run once. Fix by ensuring we always go through the same finalization block whether the thread is cancelled or not and no matter which cancellation point causes the cancellation. The old callback only affected PARALLEL, so it has been moved into the code generating PARALLEL. For this reason, we don't need similar changes for other cancellable constructs. We need to create the barrier on the shared exit from the outlined function instead of only on the cancelled branch to make sure that threads exiting normally (without cancellation) meet the same barriers as those which were cancelled. For example, previously we might have generated code like ``` ... %ret = call i32 @__kmpc_cancel(...) %cond = icmp eq i32 %ret, 0 br i1 %cond, label %continue, label %cancel continue: // do the rest of the callback, eventually branching to %fini br label %fini cancel: // Populated by the callback: // unsafe: if any thread makes it to the end without being cancelled // it won't reach this barrier and then the program will deadlock %unused = call i32 @__kmpc_cancel_barrier(...) br label %fini fini: // run destructors etc ret ``` In the new version the barrier is moved into fini. I generate it after the destructors because the standard describes the barrier as occurring after the end of the parallel region. ``` ... %ret = call i32 @__kmpc_cancel(...) %cond = icmp eq i32 %ret, 0 br i1 %cond, label %continue, label %cancel continue: // do the rest of the callback, eventually branching to %fini br label %fini cancel: br label %fini fini: // run destructors etc // safe so long as every exit from the function happens via this block: %unused = call i32 @__kmpc_cancel_barrier(...) 
ret ``` To achieve this, the barrier is now generated alongside the finalization code instead of in the callback. This is the reason for the changes to the unit test. I'm unsure if I should keep the incorrect barrier generation callback only on the cancellation branch in clang with the OMPIRBuilder backend because that would match clang's ordinary codegen. Right now I have opted to remove it entirely because it is a deadlock waiting to happen.	2025-11-27 14:13:25 +00:00
Kareem Ergawy	f481f5bef9	[OpenMP][flang] Add initial support for by-ref reductions on the GPU (#165714 ) Adds initial support for GPU by-ref reductions. The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in `target` regions. For example: ```fortran integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end target ``` This PR supports by-ref reductions on the intra- and inter-warp levels. There are still steps to be taken for full support of by-ref reductions, for example: * Inter-block value combination is still not supported. Therefore, `target teams distribute parallel do` is still not supported. * Support for dynamically-sized arrays still needs to be added. * Support for more than one allocatable/array on the same `reduction` clause still needs to be added.	2025-11-26 11:59:22 +01:00
Aiden Grossman	51dd3ec13c	[MLIR][OpenMP] Bail early in sortMapIndices if indices are the same (#169474 ) If we are given the same index in the comparator callback, simply return false. Otherwise we will end up adding invalid items to occludedChildren, causing extra items to get removed that should not be, resulting in failures that manifest in different forms (assertions, asan failures, ubsan failures, etc.).	2025-11-25 06:23:12 -05:00
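
The taskloop translation commit (1af1cc21c8) summarizes how the runtime divides work. If a task's loop range exceeds a threshold, the task is duplicated and each copy takes half of the range. The split is repeated recursively until every range is small enough. A minimal sketch of that splitting step follows, under stated assumptions; it is not libomp's implementation. The function name `splitTaskloop`, the half-open `[lb, ub)` range convention and the fixed `threshold` parameter are hypothetical. The real runtime also duplicates (first)private data and rewrites the outlined task's loop bounds, and both steps are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of taskloop range splitting; not libomp code.
// A half-open iteration range [lb, ub).
using Range = std::pair<std::int64_t, std::int64_t>;

// If a range holds more than `threshold` iterations, "duplicate" the task so
// that each copy takes half of the range, and recurse. Ranges that are small
// enough are recorded as tasks that would each run in a single thread.
void splitTaskloop(std::int64_t lb, std::int64_t ub, std::int64_t threshold,
                   std::vector<Range> &tasks) {
  assert(threshold > 0 && lb <= ub);
  if (ub - lb <= threshold) {
    tasks.emplace_back(lb, ub);
    return;
  }
  std::int64_t mid = lb + (ub - lb) / 2;
  splitTaskloop(lb, mid, threshold, tasks);  // first copy: lower half
  splitTaskloop(mid, ub, threshold, tasks);  // second copy: upper half
}
```

For example, `splitTaskloop(0, 100, 16, tasks)` fills `tasks` with eight contiguous chunks of 12 or 13 iterations each, starting with `[0, 12)` and `[12, 25)`. The real threshold heuristic in the runtime may differ.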

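The fix in 51dd3ec13c rests on a general rule for sort comparators. A comparator passed to `std::sort` must be irreflexive, so `comp(x, x)` has to return `false`. The sketch below shows why an early `a == b` bail-out matters when the comparator also has side effects. It is a hypothetical model, not the code in `sortMapIndices`. The `sortIndices` name, the integer `keys`, and the rule that a later index with an equal key is "occluded" are illustrative stand-ins for the real map-member comparison and `occludedChildren` bookkeeping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of a comparator with side effects; not the actual
// sortMapIndices code. Returns the indices of `keys` sorted by (key, index)
// and records in `occluded` every index that has an equal-key predecessor.
std::vector<std::size_t> sortIndices(const std::vector<int> &keys,
                                     std::set<std::size_t> &occluded) {
  std::vector<std::size_t> idx(keys.size());
  for (std::size_t i = 0; i < idx.size(); ++i)
    idx[i] = i;
  std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
    // A strict weak ordering requires comp(x, x) == false. Bail out before
    // the side effect below can record an index as occluded by itself.
    if (a == b)
      return false;
    if (keys[a] == keys[b]) {
      occluded.insert(std::max(a, b));  // the later duplicate is occluded
      return a < b;
    }
    return keys[a] < keys[b];
  });
  return idx;
}
```

Some debug-mode standard libraries call `comp(x, x)` to check irreflexivity. Without the early return, such a self-comparison would fall into the equal-key branch and mark an index as occluded by itself. That is the kind of spurious entry the commit says caused extra items to be removed.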

394 Commits