22 Commits

Author SHA1 Message Date
Kareem Ergawy
e74e970036
[flang][OpenMP][DoConcurrent] Add collapse clause to generated omp.loop_nest op (#178138)
Adds the collapse clause to the generated loop nest, both on host and
device.
2026-01-27 11:58:57 +01:00
Kareem Ergawy
f481f5bef9
[OpenMP][flang] Add initial support for by-ref reductions on the GPU (#165714)
Adds initial support for GPU by-ref reductions. The main problem for
reduction by reference is that, prior to this PR, we were shuffling
(from remote lanes within the same warp or across different warps within
the block) pointers/references to the private reduction values rather
than the private reduction values themselves.

In particular, this diff adds support for reductions on scalar
allocatables where reductions happen on loops nested in `target`
regions. For example:

```fortran
  integer :: i
  real, allocatable :: scalar_alloc

  allocate(scalar_alloc)
  scalar_alloc = 0

  !$omp target map(tofrom: scalar_alloc)
  !$omp parallel do reduction(+: scalar_alloc)
  do i = 1, 1000000
    scalar_alloc = scalar_alloc + 1
  end do
  !$omp end target
```

This PR supports by-ref reductions on the intra- and inter-warp levels.

There are still steps to be taken for full support of by-ref
reductions, for example:
* Inter-block value combination is still not supported.
Therefore, `target teams distribute parallel do` is still not supported.
* Support for dynamically-sized arrays still needs to be added.
* Support for more than one allocatable/array on the same `reduction`
clause still needs to be added.
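
The difference between shuffling references and shuffling values can be illustrated with a small host-side model. This is only a sketch, not the code from this PR: it simulates a warp as a plain vector of per-lane values, and the name `warpReduceByValue` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a warp: element i is the private reduction value of
// lane i. On a real GPU each lane would hold its value in private storage.
//
// Tree reduction modelled on a "shuffle down": at each step, lane i reads the
// *value* held by lane i + offset and combines it with its own. Copying the
// value (rather than a pointer to it) is the point being illustrated.
float warpReduceByValue(std::vector<float> lanes) {
  const std::size_t width = lanes.size();
  // The halving scheme below assumes a power-of-two number of lanes.
  assert(width > 0 && (width & (width - 1)) == 0);
  for (std::size_t offset = width / 2; offset > 0; offset /= 2) {
    for (std::size_t lane = 0; lane < offset; ++lane) {
      const float remote = lanes[lane + offset]; // value copied from remote lane
      lanes[lane] += remote;
    }
  }
  return lanes[0]; // lane 0 ends up holding the combined value
}
```

Calling `warpReduceByValue` on 32 lanes that each hold `1.0f` combines them into a single value in lane 0.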
2025-11-26 11:59:22 +01:00
Kazu Hirata
ee0652b4da
[flang] Remove unused local variables (NFC) (#167105)
Identified with bugprone-unused-local-non-trivial-variable.
2025-11-08 07:40:59 -08:00
Jakub Kuderski
23ead47655
[flang][mlir] Migrate to free create functions. NFC. (#164657)
See
https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.

I plan to mark these as deprecated in
https://github.com/llvm/llvm-project/pull/164649.
2025-10-22 12:47:48 -04:00
agozillon
f2b20d3410
[Flang][OpenMP][Dialect] Swap to using MLIR dialect enum to encode map flags (#164043)
This PR shifts from using the LLVM OpenMP enumerator bit flags to an
OpenMP dialect specific enumerator. This allows us to better represent
map types that wouldn't be of interest to the LLVM backend and runtime
in the dialect.

This primarily concerns things like
ref_ptr/ref_ptee/ref_ptr_ptee/attach_none/attach_always/attach_auto, which
are of interest to the compiler for certain transformations (primarily
in the FIR transformation passes dealing with mapping) but which the
runtime has no need to know about. It also means that if another OpenMP
implementation comes along, it won't need to stick to the same bit flag
system LLVM chose or do legwork to work around it.
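
The separation between dialect-level flags and runtime flags can be sketched as follows. All names and bit positions here (`DialectMapFlag`, `kRuntimeTo`, `lowerToRuntimeFlags`, and so on) are hypothetical and chosen only for illustration; they are not the enumerators introduced by this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical dialect-level map flags. Bit positions are illustrative only.
enum class DialectMapFlag : std::uint32_t {
  None = 0,
  To = 1u << 0,
  From = 1u << 1,
  RefPtr = 1u << 8,       // compiler-only: no runtime equivalent
  AttachAlways = 1u << 9, // compiler-only: no runtime equivalent
};

constexpr std::uint32_t bits(DialectMapFlag flag) {
  return static_cast<std::uint32_t>(flag);
}

// Hypothetical runtime encoding, kept separate from the dialect encoding.
constexpr std::uint32_t kRuntimeTo = 0x1;
constexpr std::uint32_t kRuntimeFrom = 0x2;

// Translate dialect flags to runtime flags, dropping anything that only the
// compiler cares about.
std::uint32_t lowerToRuntimeFlags(std::uint32_t dialectFlags) {
  std::uint32_t runtimeFlags = 0;
  if (dialectFlags & bits(DialectMapFlag::To))
    runtimeFlags |= kRuntimeTo;
  if (dialectFlags & bits(DialectMapFlag::From))
    runtimeFlags |= kRuntimeFrom;
  return runtimeFlags;
}
```

Because the translation is explicit, the dialect encoding can grow compiler-only flags without affecting what the runtime receives.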
2025-10-21 21:54:25 +02:00
Kareem Ergawy
9b75446940
[flang][OpenMP] do concurrent: support reduce on device (#156610)
Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to omp `reduction` clauses. The changes
attach 2 `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wsloop` part.

- https://github.com/llvm/llvm-project/pull/155754
- https://github.com/llvm/llvm-project/pull/155987
- https://github.com/llvm/llvm-project/pull/155992
- https://github.com/llvm/llvm-project/pull/155993
- https://github.com/llvm/llvm-project/pull/157638
- https://github.com/llvm/llvm-project/pull/156610 ◀️
- https://github.com/llvm/llvm-project/pull/156837
2025-09-23 07:56:16 +02:00
Kareem Ergawy
9008c44c71
[flang][OpenMP] do concurrent: support local on device (#157638)
Extends support for mapping `do concurrent` on the device by adding
support for `local` specifiers. The changes in this PR map the local
variable to the `omp.target` op and use the mapped value as the
`private` clause operand in the nested `omp.parallel` op.

- https://github.com/llvm/llvm-project/pull/155754
- https://github.com/llvm/llvm-project/pull/155987
- https://github.com/llvm/llvm-project/pull/155992
- https://github.com/llvm/llvm-project/pull/155993
- https://github.com/llvm/llvm-project/pull/157638 ◀️
- https://github.com/llvm/llvm-project/pull/156610
- https://github.com/llvm/llvm-project/pull/156837
2025-09-23 07:27:21 +02:00
Kareem Ergawy
78853df2bf
[flang][OpenMP] Extend do concurrent mapping to device (#155987)
Upstreams further parts of the `do concurrent` to OpenMP conversion pass
from AMD's fork. This PR extends the pass by adding support for mapping
to the device.

PR stack:
- https://github.com/llvm/llvm-project/pull/155754
- https://github.com/llvm/llvm-project/pull/155987 ◀️
- https://github.com/llvm/llvm-project/pull/155992
- https://github.com/llvm/llvm-project/pull/155993
- https://github.com/llvm/llvm-project/pull/157638
- https://github.com/llvm/llvm-project/pull/156610
- https://github.com/llvm/llvm-project/pull/156837
2025-09-10 20:44:55 +02:00
Matthias Springer
2929a2978c
[mlir][Transforms] Add support for ConversionPatternRewriter::replaceAllUsesWith (#155244)
This commit generalizes `replaceUsesOfBlockArgument` to
`replaceAllUsesWith`. In rollback mode, the same restrictions keep
applying: a value cannot be replaced multiple times and a call to
`replaceAllUsesWith` will replace all current and future uses of the
`from` value.

`replaceAllUsesWith` is now fully supported and its behavior is
consistent with the remaining dialect conversion API. Before this
commit, `replaceAllUsesWith` was immediately reflected in the IR when
running in rollback mode. After this commit, `replaceAllUsesWith`
changes are materialized in a delayed fashion, at the end of the dialect
conversion. This is consistent with the `replaceUsesOfBlockArgument` and
`replaceOp` APIs.

`replaceAllUsesExcept` etc. are still not supported and will be
deactivated on the `ConversionPatternRewriter` (when running in rollback
mode) in a follow-up commit.

Note for LLVM integration: Replace `replaceUsesOfBlockArgument` with
`replaceAllUsesWith`. If you are seeing failures, you may have patterns
that use `replaceAllUsesWith` incorrectly (e.g., being called multiple
times on the same value) or bypass the rewriter API entirely. E.g., such
failures were mitigated in Flang by switching to the walk-patterns
driver (#156171).

You can temporarily reactivate the old behavior by calling
`RewriterBase::replaceAllUsesWith`. However, note that that behavior is
faulty in a dialect conversion. E.g., the base
`RewriterBase::replaceAllUsesWith` implementation does not see uses of
the `from` value that have not materialized yet and will, therefore, not
replace them.
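
The delayed materialization described above can be modelled with a toy rewriter. This is a deliberately simplified sketch and not MLIR's implementation: `ToyRewriter`, its integer value ids, and `finalize` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model: values are integer ids and `uses` is the list of operands that
// currently exist in the "IR".
struct ToyRewriter {
  std::map<int, int> mapping; // recorded replacements: from -> to
  std::vector<int> uses;

  void addUse(int value) { uses.push_back(value); }

  // Only records the replacement; the IR is left untouched for now.
  void replaceAllUsesWith(int from, int to) {
    // Mirrors the restriction that a value cannot be replaced multiple times.
    assert(mapping.count(from) == 0);
    mapping[from] = to;
  }

  // Applies all recorded replacements at the end, so uses created after the
  // call to replaceAllUsesWith are replaced as well.
  void finalize() {
    for (int &use : uses) {
      auto it = mapping.find(use);
      if (it != mapping.end())
        use = it->second;
    }
  }
};
```

In this model, a use added after `replaceAllUsesWith(1, 2)` still refers to value `1` until `finalize()` runs, at which point both earlier and later uses refer to value `2`.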
2025-09-06 11:17:55 +02:00
Kareem Ergawy
319705d0ab
[flang] do concurrent: fix reduction symbol resolution when mapping to OpenMP (#155355)
Fixes #155273

This PR introduces 2 changes:
1. The `do concurrent` to OpenMP pass is now a module pass rather than a
function pass.
2. Reduction ops are looked up in the parent module before being
created.

The benefit of using a module pass is that the same reduction operation
can be used across multiple functions if the reduction type matches.
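
The lookup-before-create idea can be sketched with a toy symbol table. `ToyModule`, `getOrCreateReduction`, and the symbol naming scheme are hypothetical and only model the behaviour described above; the real pass operates on MLIR modules and reduction declaration ops.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a module-level symbol table: symbol name -> declaration id.
struct ToyModule {
  std::map<std::string, int> reductionDecls;
  int nextId = 0;
};

// Look the reduction up in the module first; create it only when missing.
// Functions that reduce over the same type therefore share one declaration.
int getOrCreateReduction(ToyModule &module, const std::string &typeName) {
  const std::string symName = "add_reduction_" + typeName; // hypothetical naming
  auto it = module.reductionDecls.find(symName);
  if (it != module.reductionDecls.end())
    return it->second;
  const int id = module.nextId++;
  module.reductionDecls.emplace(symName, id);
  return id;
}
```

Two requests for the same type return the same id, so no duplicate symbol is ever created in the module.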
2025-08-27 17:06:16 +02:00
Maksim Levental
dcfc853c51
[mlir][NFC] update flang/lib create APIs (12/n) (#149914)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-24 19:05:40 -04:00
Kareem Ergawy
0e9b7b054c
[flang][OpenMP] Basic mapping of do concurrent ... reduce to OpenMP (#146033)
Now that we have changes introduced by #145837, mapping reductions from
`do concurrent` to OpenMP is almost trivial. This PR adds such mapping.

PR stack:
- https://github.com/llvm/llvm-project/pull/145837
- https://github.com/llvm/llvm-project/pull/146025
- https://github.com/llvm/llvm-project/pull/146028
- https://github.com/llvm/llvm-project/pull/146033 (this one)
2025-07-11 09:19:16 +02:00
Kareem Ergawy
a510e75949
[flang][fir] Small clean-up in fir_DoConcurrentLoopOp's definition (#146028)
Re-organizes the op definition a little bit and removes a method that
does not add much value to the API.

PR stack:
- https://github.com/llvm/llvm-project/pull/145837
- https://github.com/llvm/llvm-project/pull/146025
- https://github.com/llvm/llvm-project/pull/146028 (this one)
- https://github.com/llvm/llvm-project/pull/146033
2025-07-11 08:30:36 +02:00
Kareem Ergawy
2dd88c405d
[flang][OpenMP] Extend locality spec to OMP clauses (init and dealloc regions) (#142795)
Extends support for locality specifiers to OpenMP translation by adding
support for translating localizers that have `init` and `dealloc` regions.
2025-06-11 13:44:01 +02:00
Kareem Ergawy
e44a65ed98
[flang][OpenMP] Map basic local specifiers to private clauses (#142735)
Starts the effort to map `do concurrent` locality specifiers to OpenMP
clauses. This PR adds support for basic specifiers (no `init` or `copy`
regions yet).
2025-06-11 10:36:12 +02:00
Kareem Ergawy
5fe69fd95c
[flang][OpenMP] Update do concurrent mapping pass to use fir.do_concurrent op (#138489)
This PR updates the `do concurrent` to OpenMP mapping pass to use the
`fir.do_concurrent` ops that were recently added upstream
instead of handling nests of `fir.do_loop ... unordered` ops.

Parent PR: https://github.com/llvm/llvm-project/pull/137928.
2025-05-08 20:22:29 +02:00
Kazu Hirata
aa33c09561
[flang] Fix a warning
This patch fixes:

  flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp:184:18: error:
  unused variable 'loc' [-Werror,-Wunused-variable]
2025-04-02 10:14:50 -07:00
Kareem Ergawy
de6c9096ba
[flang][OpenMP] Handle "loop-local values" in do concurrent nests (#127635)
Extends `do concurrent` mapping to handle "loop-local values". A
loop-local value is one that is used exclusively inside the loop but
allocated outside of it. This usually corresponds to temporary values
that are used inside the loop body, for example for initializing other
variables. After collecting these values, the pass localizes them to the
loop nest by moving their allocations.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635 (this PR)
2025-04-02 15:43:19 +02:00
Kareem Ergawy
ef56b53712
[flang][OpenMP] Extend do concurrent mapping to multi-range loops (#127634)
Adds support for converting multi-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost `fir.do_loop` op in the
nest.
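
What collapsing a loop nest means for the iteration space can be sketched as follows. This only illustrates the general idea of a collapsed 2-deep nest; the helper names `delinearize` and `sumCollapsed` are hypothetical, and the row-major ordering is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Map a single collapsed iteration index k back to the (i, j) indices of a
// 2-deep nest whose inner loop has nj iterations.
std::pair<std::size_t, std::size_t> delinearize(std::size_t k, std::size_t nj) {
  assert(nj > 0);
  return {k / nj, k % nj};
}

// Visit every (i, j) pair of an ni x nj nest through one flat loop. A flat
// loop like this is what can be divided among threads as a single iteration
// space once the nest has been collapsed.
std::size_t sumCollapsed(std::size_t ni, std::size_t nj) {
  std::size_t sum = 0;
  for (std::size_t k = 0; k < ni * nj; ++k) {
    const std::pair<std::size_t, std::size_t> ij = delinearize(k, nj);
    sum += ij.first * 10 + ij.second; // stand-in for the loop body
  }
  return sum;
}
```

For a nest with 2 outer and 3 inner iterations, the flat loop runs 6 times and visits each index pair exactly once.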

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634 (this PR)
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 12:43:04 +02:00
Kareem Ergawy
3f8bfc9f7f
[flang][OpenMP] Map simple do concurrent loops to OpenMP host constructs (#127633)
Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping
pass. This PR adds support for converting simple loops to the equivalent
OpenMP constructs on the host: `omp parallel do`. To that end, we
have to collect more information about loop nests, for which we add new
utils in the `looputils` namespace.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633 (this PR)
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 11:26:58 +02:00
Kareem Ergawy
41d718b1cf
[flang][OpenMP] Upstream do concurrent loop-nest detection. (#127595)
Upstreams the next part of do concurrent to OpenMP mapping pass (from
AMD's ROCm implementation). See
https://github.com/llvm/llvm-project/pull/126026 for more context.

This PR adds loop nest detection logic. This enables us to discover
multi-range do concurrent loops and then map them as "collapsed" loop
nests to OpenMP.

This is a follow-up to
https://github.com/llvm/llvm-project/pull/126026; only the latest commit
is relevant.

This is a replacement for
https://github.com/llvm/llvm-project/pull/127478 using a
`/user/<username>/<branchname>` branch.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026
- https://github.com/llvm/llvm-project/pull/127595 (this PR)
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 10:12:52 +02:00
Kareem Ergawy
5d364481e3
[flang][OpenMP] Upstream first part of do concurrent mapping (#126026)
This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.

An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.

In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.

This looks like a huge PR but a lot of the added stuff is documentation.

It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved
performance speed-ups that match pure OpenMP; for GPU mapping, we are
still working on extending our support for implicit memory mapping and
locality specifiers.

PR stack:
- https://github.com/llvm/llvm-project/pull/126026 (this PR)
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
2025-04-02 09:24:38 +02:00