llvm-project

Author	SHA1	Message	Date
Kareem Ergawy	e74e970036	[flang][OpenMP][DoConcurrent] Add `collapse` clause to generated `omp.loop_nest` op (#178138 ) Adds the collapse clause to the generated loop nest both on host and device.	2026-01-27 11:58:57 +01:00
Kareem Ergawy	f481f5bef9	[OpenMP][flang] Add initial support for by-ref reductions on the GPU (#165714 ) Adds initial support for GPU by-ref reductions. The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in `target` regions. For example: ```fortran integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end target ``` This PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be taken for full support of by-ref reductions, for example: * Inter-block value combination is still not supported. Therefore, `target teams distribute parallel do` is still not supported. * Support for dynamically-sized arrays still needs to be added. * Support for more than one allocatable/array on the same `reduction` clause.	2025-11-26 11:59:22 +01:00
Kazu Hirata	ee0652b4da	[flang] Remove unused local variables (NFC) (#167105 ) Identified with bugprone-unused-local-non-trivial-variable.	2025-11-08 07:40:59 -08:00
Jakub Kuderski	23ead47655	[flang][mlir] Migrate to free create functions. NFC. (#164657 ) See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339. I plan to mark these as deprecated in https://github.com/llvm/llvm-project/pull/164649.	2025-10-22 12:47:48 -04:00
agozillon	f2b20d3410	[Flang][OpenMP][Dialect] Swap to using MLIR dialect enum to encode map flags (#164043 ) This PR shifts from using the LLVM OpenMP enumerator bit flags to an OpenMP dialect specific enumerator. This allows us to better represent map types that wouldn't be of interest to the LLVM backend and runtime in the dialect. Primarily things like ref_ptr/ref_ptee/ref_ptr_ptee/attach_none/attach_always/attach_auto which are of interest to the compiler for certain transformations (primarily in the FIR transformation passes dealing with mapping), but the runtime has no need to know about them. It also means if another OpenMP implementation comes along they won't need to stick to the same bit flag system LLVM chose/do leg work to address it.	2025-10-21 21:54:25 +02:00
Kareem Ergawy	9b75446940	[flang][OpenMP] `do concurrent`: support `reduce` on device (#156610 ) Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wloop` part. - https://github.com/llvm/llvm-project/pull/155754 - https://github.com/llvm/llvm-project/pull/155987 - https://github.com/llvm/llvm-project/pull/155992 - https://github.com/llvm/llvm-project/pull/155993 - https://github.com/llvm/llvm-project/pull/157638 - https://github.com/llvm/llvm-project/pull/156610 ◀️ - https://github.com/llvm/llvm-project/pull/156837	2025-09-23 07:56:16 +02:00
Kareem Ergawy	9008c44c71	[flang][OpenMP] `do concurrent`: support `local` on device (#157638 ) Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and use the mapped value as the `private` clause operand in the nested `omp.parallel` op. - https://github.com/llvm/llvm-project/pull/155754 - https://github.com/llvm/llvm-project/pull/155987 - https://github.com/llvm/llvm-project/pull/155992 - https://github.com/llvm/llvm-project/pull/155993 - https://github.com/llvm/llvm-project/pull/157638 ◀️ - https://github.com/llvm/llvm-project/pull/156610 - https://github.com/llvm/llvm-project/pull/156837	2025-09-23 07:27:21 +02:00
Kareem Ergawy	78853df2bf	[flang][OpenMP] Extend `do concurrent` mapping to device (#155987 ) Upstreams further parts of `do concurrent` to OpenMP conversion pass from AMD's fork. This PR extends the pass by adding support for mapping to the device. PR stack: - https://github.com/llvm/llvm-project/pull/155754 - https://github.com/llvm/llvm-project/pull/155987 ◀️ - https://github.com/llvm/llvm-project/pull/155992 - https://github.com/llvm/llvm-project/pull/155993 - https://github.com/llvm/llvm-project/pull/157638 - https://github.com/llvm/llvm-project/pull/156610 - https://github.com/llvm/llvm-project/pull/156837	2025-09-10 20:44:55 +02:00
Matthias Springer	2929a2978c	[mlir][Transforms] Add support for `ConversionPatternRewriter::replaceAllUsesWith` (#155244 ) This commit generalizes `replaceUsesOfBlockArgument` to `replaceAllUsesWith`. In rollback mode, the same restrictions keep applying: a value cannot be replaced multiple times and a call to `replaceAllUsesWith` will replace all current and future uses of the `from` value. `replaceAllUsesWith` is now fully supported and its behavior is consistent with the remaining dialect conversion API. Before this commit, `replaceAllUsesWith` was immediately reflected in the IR when running in rollback mode. After this commit, `replaceAllUsesWith` changes are materialized in a delayed fashion, at the end of the dialect conversion. This is consistent with the `replaceUsesOfBlockArgument` and `replaceOp` APIs. `replaceAllUsesExcept` etc. are still not supported and will be deactivated on the `ConversionPatternRewriter` (when running in rollback mode) in a follow-up commit. Note for LLVM integration: Replace `replaceUsesOfBlockArgument` with `replaceAllUsesWith`. If you are seeing failures, you may have patterns that use `replaceAllUsesWith` incorrectly (e.g., being called multiple times on the same value) or bypass the rewriter API entirely. E.g., such failures were mitigated in Flang by switching to the walk-patterns driver (#156171). You can temporarily reactivate the old behavior by calling `RewriterBase::replaceAllUsesWith`. However, note that that behavior is faulty in a dialect conversion. E.g., the base `RewriterBase::replaceAllUsesWith` implementation does not see uses of the `from` value that have not materialized yet and will, therefore, not replace them.	2025-09-06 11:17:55 +02:00
Kareem Ergawy	319705d0ab	[flang] `do concurrent`: fix reduction symbol resolution when mapping to OpenMP (#155355 ) Fixes #155273 This PR introduces 2 changes: 1. The `do concurrent` to OpenMP pass is now a module pass rather than a function pass. 2. Reduction ops are looked up in the parent module before being created. The benefit of using a module pass is that the same reduction operation can be used across multiple functions if the reduction type matches.	2025-08-27 17:06:16 +02:00
Maksim Levental	dcfc853c51	[mlir][NFC] update `flang/lib` create APIs (12/n) (#149914 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-24 19:05:40 -04:00
Kareem Ergawy	0e9b7b054c	[flang][OpenMP] Basic mapping of `do concurrent ... reduce` to OpenMP (#146033 ) Now that we have changes introduced by #145837, mapping reductions from `do concurrent` to OpenMP is almost trivial. This PR adds such mapping. PR stack: - https://github.com/llvm/llvm-project/pull/145837 - https://github.com/llvm/llvm-project/pull/146025 - https://github.com/llvm/llvm-project/pull/146028 - https://github.com/llvm/llvm-project/pull/146033 (this one)	2025-07-11 09:19:16 +02:00
Kareem Ergawy	a510e75949	[flang][fir] Small clean-up in `fir_DoConcurrentLoopOp`'s definition (#146028 ) Re-organizes the op definition a little bit and removes a method that does not add much value to the API. PR stack: - https://github.com/llvm/llvm-project/pull/145837 - https://github.com/llvm/llvm-project/pull/146025 - https://github.com/llvm/llvm-project/pull/146028 (this one) - https://github.com/llvm/llvm-project/pull/146033	2025-07-11 08:30:36 +02:00
Kareem Ergawy	2dd88c405d	[flang][OpenMP] Extend locality spec to OMP clauses (`init` and `dealloc` regions) (#142795 ) Extends support for locality specifiers to OpenMP translation by adding support for translating localizers that have `init` and `dealloc` regions.	2025-06-11 13:44:01 +02:00
Kareem Ergawy	e44a65ed98	[flang][OpenMP] Map basic `local` specifiers to `private` clauses (#142735 ) Starts the effort to map `do concurrent` locality specifiers to OpenMP clauses. This PR adds support for basic specifiers (no `init` or `copy` regions yet).	2025-06-11 10:36:12 +02:00
Kareem Ergawy	5fe69fd95c	[flang][OpenMP] Update `do concurrent` mapping pass to use `fir.do_concurrent` op (#138489 ) This PR updates the `do concurrent` to OpenMP mapping pass to use the newly added `fir.do_concurrent` ops that were recently added upstream instead of handling nests of `fir.do_loop ... unordered` ops. Parent PR: https://github.com/llvm/llvm-project/pull/137928.	2025-05-08 20:22:29 +02:00
Kazu Hirata	aa33c09561	[flang] Fix a warning This patch fixes: flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp:184:18: error: unused variable 'loc' [-Werror,-Wunused-variable]	2025-04-02 10:14:50 -07:00
Kareem Ergawy	de6c9096ba	[flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (#127635 ) Extends `do concurrent` mapping to handle "loop-local values". A loop-local value is one that is used exclusively inside the loop but allocated outside of it. This usually corresponds to temporary values that are used inside the loop body, for example to initialize other variables. After collecting these values, the pass localizes them to the loop nest by moving their allocations. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635 (this PR)	2025-04-02 15:43:19 +02:00
Kareem Ergawy	ef56b53712	[flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (#127634 ) Adds support for converting multi-range loops to OpenMP (on the host only for now). The changes here "prepare" a loop nest for collapsing by sinking iteration variables to the innermost `fir.do_loop` op in the nest. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 (this PR) - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 12:43:04 +02:00
Kareem Ergawy	3f8bfc9f7f	[flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (#127633 ) Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping pass. This PR adds support for converting simple loops to the equivalent OpenMP constructs on the host: `omp parallel do`. Towards that end, we have to collect more information about loop nests, for which we add new utils in the `looputils` namespace. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 (this PR) - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 11:26:58 +02:00
Kareem Ergawy	41d718b1cf	[flang][OpenMP] Upstream `do concurrent` loop-nest detection. (#127595 ) Upstreams the next part of do concurrent to OpenMP mapping pass (from AMD's ROCm implementation). See https://github.com/llvm/llvm-project/pull/126026 for more context. This PR adds loop nest detection logic. This enables us to discover multi-range do concurrent loops and then map them as "collapsed" loop nests to OpenMP. This is a follow up for https://github.com/llvm/llvm-project/pull/126026, only the latest commit is relevant. This is a replacement for https://github.com/llvm/llvm-project/pull/127478 using a `/user/<username>/<branchname>` branch. PR stack: - https://github.com/llvm/llvm-project/pull/126026 - https://github.com/llvm/llvm-project/pull/127595 (this PR) - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 10:12:52 +02:00
Kareem Ergawy	5d364481e3	[flang][OpenMP] Upstream first part of `do concurrent` mapping (#126026 ) This PR starts the effort to upstream AMD's internal implementation of `do concurrent` to OpenMP mapping. This replaces #77285 since we extended this WIP quite a bit on our fork over the past year. An important part of this PR is a document that describes the current status downstream, the upstreaming status, and next steps to make this pass much more useful. In addition to this document, this PR also contains the skeleton of the pass (no useful transformations are done yet) and some testing for the added command line options. This looks like a huge PR but a lot of the added stuff is documentation. It is also worth noting that the downstream pass has been validated on https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achieved performance speed-ups that match pure OpenMP; for GPU mapping, we are still working on extending our support for implicit memory mapping and locality specifiers.  PR stack: - https://github.com/llvm/llvm-project/pull/126026 (this PR) - https://github.com/llvm/llvm-project/pull/127595 - https://github.com/llvm/llvm-project/pull/127633 - https://github.com/llvm/llvm-project/pull/127634 - https://github.com/llvm/llvm-project/pull/127635	2025-04-02 09:24:38 +02:00

22 Commits
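
Commit 319705d0ab in the log above says that reduction ops are looked up in the parent module before being created, so one reduction operation can serve several functions when the reduction type matches. The following is a minimal, self-contained sketch of that lookup-before-create idea only. It is not flang or MLIR code: `ReductionDecl`, `Module`, `lookupOrCreate`, and the symbol name `add_reduction_f32` are hypothetical stand-ins invented for illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a reduction declaration; not a flang/MLIR type.
struct ReductionDecl {
  std::string name;
  std::string elementType;
};

// Hypothetical stand-in for a module-level symbol table.
class Module {
public:
  // Return the declaration registered under `name`; create it only when the
  // module does not already hold a declaration with that name.
  ReductionDecl &lookupOrCreate(const std::string &name,
                                const std::string &elementType) {
    auto it = decls.find(name);
    if (it != decls.end())
      return *it->second; // Reuse the existing declaration.
    auto inserted = decls.emplace(
        name, std::make_unique<ReductionDecl>(ReductionDecl{name, elementType}));
    return *inserted.first->second;
  }

  std::size_t size() const { return decls.size(); }

private:
  std::map<std::string, std::unique_ptr<ReductionDecl>> decls;
};

// Simulates converting two functions that both need a `+` reduction on f32:
// the second request finds the declaration created by the first.
inline bool sharesDeclAcrossFunctions() {
  Module module;
  ReductionDecl &fromFuncA = module.lookupOrCreate("add_reduction_f32", "f32");
  ReductionDecl &fromFuncB = module.lookupOrCreate("add_reduction_f32", "f32");
  return &fromFuncA == &fromFuncB && module.size() == 1;
}
```

Because the table lives at module scope in this sketch, a per-function pass could not share entries this way, which is consistent with the commit's stated reason for switching to a module pass.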