Flang currently doesn't build in debug mode on GCC 15 due to missing
dynamic libraries in some CMakeLists.txt files, and OpenMP doesn't link
in debug mode due to the atomic library pulling in libstdc++ despite an
incomplete attempt in the CMakeLists.txt to disable glibcxx assertions.
This PR fixes these issues and allows Flang and the OpenMP runtime to
build and link on GCC 15 in debug mode.
---------
Co-authored-by: ronlieb <ron.lieberman@amd.com>
When multiple assumed size variable are used in a kernel with dynamic
shared memory, each variable use the 0 offset. Update the pass to
account for that.
```
attributes(global) subroutine testany( a )
real(4), shared :: smasks(*)
real(8), shared :: dmasks(*)
end subroutine
```
`replaceAllUsesWith` is not safe to use in a dialect conversion and will
be deactivated soon (#154112). Fix commit fixes some API violations.
Also some general improvements.
I'm trying to remove the redirection in SmallSet.h:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
to make it clear that we are using SmallPtrSet. There are only
handful places that rely on this redirection.
This patch replaces SmallSet to SmallPtrSet where the element type is
a pointer.
Currently the privatization recipe of a scalar allocatable is as follow:
```
acc.private.recipe @privatization_ref_box_heap_i32 : !fir.ref<!fir.box<!fir.heap<i32>>> init {
^bb0(%arg0: !fir.ref<!fir.box<!fir.heap<i32>>>):
%0 = fir.alloca !fir.box<!fir.heap<i32>>
%1:2 = hlfir.declare %0 {uniq_name = "acc.private.init"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
acc.yield %1#0 : !fir.ref<!fir.box<!fir.heap<i32>>>
}
```
This change adds the allocation for the scalar.
`GetOmpObjectList` takes a clause, and returns the pointer to the
contained OmpObjectList, or nullptr if the clause does not contain one.
Some clauses with object list were not recognized: handle all clauses,
and move the implementation to flang/Parser/openmp-utils.cpp.
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.
- Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa`
target instead of using library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
mathematical identity
In relation to the approval and merge of the
https://github.com/llvm/llvm-project/pull/76088 specification about
multi-image features in Flang.
Here is a PR on adding support for `THIS_IMAGE` and `NUM_IMAGES` in
conformance with the PRIF specification.
For `THIS_IMAGE`, the lowering to the subroutine containing the coarray
argument is not present in this PR, and will be in a future one.
The seemingly was a regression that prevented the usage of named
constant (w/ PARAMETER attribute) in SHARED and FIRSTPRIVATE clauses.
This PR corrects that.
This reverts commit 5178aeff7b96e86b066f8407b9d9732ec660dd2e.
In addition:
* Scalar constant UNSIGNED BOUNDARY is explicitly casted
to the result type so that the generated hlfir.eoshift
operation is valid. The lowering produces signless constants
by default. It might be a bigger issue in lowering, so I just
want to "fix" it for EOSHIFT in this patch.
* Since we have to create unsigned integer constant during
HLFIR inlining, I added code in createIntegerConstant
to make it possible.
Array sections like this have not been using the knowledge that
the result is contiguous:
```
type t
integer :: f
end type
type(t) :: a(:)
a%f = 0
```
Peter Klausler is working on a change that will result in the
corresponding
hlfir.designate having a component and a non-box result.
This patch fixes the issues found in HLFIR-to-FIR conversion.
The current code will crash on the MAP clause with OpenMP version >= 6.0
when the clause does not explicitly list any modifiers. The proper fix
is to update the handling of assumed-size arrays for OpenMP 6.0+, but in
the short term keep the behavior from 5.2, just avoid the crash.
The "ARRAY=" argument to these intrinsics cannot be scalar, whether
"DIM=" is present or not. (Allowing the "ARRAY=" argument to be scalar
when "DIM=" is absent would be a conceivable extension returning an
empty result array, like SHAPE() does with extents, but it doesn't seem
useful in a programming language without compilation-time rank
polymorphism apart from assumed-rank dummy arguments, and those are
supported.)
Fixes https://github.com/llvm/llvm-project/issues/154044.
Because the per-dimension information in a descriptor holds an extent
and a lower bound, but not an upper bound, the calculation of the upper
bound sometimes requires that the extent and lower bound be extracted
from a descriptor and added together, minus 1. This shouldn't be
attempted when the NamedEntity of the descriptor is something that
shouldn't be duplicated and used twice; specifically, it shouldn't apply
to NamedEntities containing references to impure functions as parts of
subscript expressions.
Fixes https://github.com/llvm/llvm-project/issues/153031.
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp+target_free call in llvm.
Example:
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
The work in this PR is C-P/inspired from @ivanradanov commit from
coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
Fixes a regression uncovered by Fujitsu test 0686_0024.f90. In
particular, verifies that a pre-determined symbol is only privatized by
its defining evaluation (e.g. the loop for which the symbol was marked
as pre-determined).
In relation to the approval and merge of the
[PRIF](https://github.com/llvm/llvm-project/pull/76088) specification
about multi-image features in Flang, here is a first PR to add support
for the `-fcoarray` compilation flag and the initialization of the PRIF
environment.
Other PRs will follow for adding support of lowering to PRIF.
This patch generalizes the code for hlfir.cshift to be applicable
for hlfir.eoshift. The major difference is the selection
of the boundary value that might be statically/dynamically absent,
in which case the default scalar value has to be used.
The scalar value of the boundary is always computed before
the hlfir.elemental or the assignment loop.
Contrary to hlfir.cshift simplication, the SHIFT value is not
normalized,
because the original value (and its sign) participate in the EOSHIFT
index computation for addressing the input array and selecting
which elements of the results are assigned from the boundary operand.
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
The script copies `ReleaseNotesTemplate.txt` to corresponding
`ReleaseNotes.rst`/`.md` to clear release notes.
The suffix of `ReleaseNotesTemplate.txt` must be `.txt`. If it is
`.rst`/`.md`, it will be treated as a documentation source file when
building documentation.
Prior to this PR, the default behaviour of a conversion pattern which
receives operands of a 1:N is to abort the compilation. This has
historically been useful when the 1:N type conversion got merged into
the dialect conversion as it allowed us to easily find patterns that
should be capable of handling 1:N type conversions but didn't.
However, this behaviour has the disadvantage of being non-composable:
While the pattern in question cannot handle the 1:N type conversion,
another pattern part of the set might, but doesn't get the chance as
compilation is aborted.
This PR fixes this behaviour by failing to match and instead of
aborting, giving other patterns the chance to legalize an op. The
implementation uses a reusable function called `dispatchTo1To1` to allow
derived conversion patterns to also implement the behaviour.
Consider the following example:
```fortran
implicit none
integer :: i, j
do concurrent (i=1:10) local(j)
block
do j=1,20
end do
end block
end do
```
Without the fix introduced in this PR, the compiler would "re-localize"
the `j` variable inside the `fir.do_concurrent` loop:
```mlir
fir.do_concurrent {
%7 = fir.alloca i32 {bindc_name = "j"}
%8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
...
fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
%12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
...
%17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -> (index, i32) {
fir.store %arg3 to %8#0 : !fir.ref<i32>
...
}
}
}
```
This happened because we did a shallow look-up of `j` and since the loop
is nested inside a `block`, the look-up failed and we re-created a local
allocation for `j` inside the parent `fir.do_concurrent` loop. This
means that we ended up not using the actual localized symbol which is
passed as a region argument to the `fir.do_concurrent.loop` op.
In case of `j`, we do not need to do a shallow look-up. The shallow
look-up is only needed if a symbol is an OpenMP private one or an
iteration variable of a `do concurrent` loop. Neither of which applies
to `j`.
With the fix, `j` is properly resolved to the `local` region argument:
```mlir
fir.do_concurrent {
...
fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
...
%10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
...
%15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -> (index, i32) {
fir.store %arg3 to %10#0 : !fir.ref<i32>
...
}
}
}
```