OpenMP 6.0 introduced alternative spelling for some directives, with the
previous spellings still allowed.
Warn the user when a new spelling is encountered with OpenMP version set
to an older value.
The `-mframe-pointer` flag is not explicitly set in the original `flang`
invocation and so the value passed to `flang -fc1` can vary depending on
the host machine, so don't verify it in the output.
`-mframe-pointer` forwarding is already verified by
`flang/test/Driver/frame-pointer-forwarding.f90`.
Our local Flang build on PowerPC was broken as
```
llvm/flang/../mlir/include/mlir/IR/ValueRange.h:401:20: error: 'ArrayRef' is deprecated: Use {} or ArrayRef<T>() instead [-Werror,-Wdeprecated-declarations]
401 | : ValueRange(ArrayRef<Value>(std::forward<Arg>(arg))) {}
| ^
llvm/flang/lib/Optimizer/CodeGen/CodeGen.cpp:2243:53: note: in instantiation of function template specialization 'mlir::ValueRange::ValueRange<const std::nullopt_t &, void>' requested here
2243 | /*cstInteriorIndices=*/std::nullopt, fieldIndices,
| ^
llvm/include/llvm/ADT/ArrayRef.h:70:18: note: 'ArrayRef' has been explicitly marked deprecated here
70 | /*implicit*/ LLVM_DEPRECATED("Use {} or ArrayRef<T>() instead", "{}")
| ^
llvm/include/llvm/Support/Compiler.h:244:50: note: expanded from macro 'LLVM_DEPRECATED'
244 | #define LLVM_DEPRECATED(MSG, FIX) __attribute__((deprecated(MSG, FIX)))
| ^
1 error generated.
```
This patch is to fix it.
After the recent move to work queues, in certain cases when linking in
the fortran runtime built for offload on AMDGPU as required in certain
cases, we'll get missing symbols when linking. This PR tries to address
this issue by encompassing more of the library in
RT_OFFLOAD_API_GROUP_BEGIN, which has the affect of compiling these
functions for AMDGPU, resolving the missing symbols.
This PR should address the following issue:
https://github.com/llvm/llvm-project/issues/145888
Collect all spellings from all supported OpenMP versions before parsing.
Break up the list of spellings by the initial letter to speed up parsing
a little.
In the case of nested loops, `acc.loop` is meant to subsume all of the
loops that it applies to (when explicitly described as doing so in the
OpenACC specification). So when there is a `acc loop tile(...)` present
on nested Fortran DO loops, `acc.loop` should apply to the `n` loops
that `tile` applies to. This change lowers such nested Fortran loops
with tile clause into a collapsed `acc.loop` with `n` IVs, loop bounds,
and step, in a similar fashion to the current lowering for acc loops
with `collapse` clause.
The test flang/test/Semantics/io08.f90 was failing when UBSAN was
enabled:
```
/home/david.spickett/llvm-project/flang/include/flang/Common/format.h:224:26: runtime error: signed integer overflow: 10 * 987654321098765432 cannot be represented in type 'int64_t' (aka 'long')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/david.spickett/llvm-project/flang/include/flang/Common/format.h:224:26
```
This is because the code was effectively:
* Take the risk of UB happening
* Check whether it happened or not
Which UBSAN is obviously not going to like. Instead of checking after
the fact, use llvm's helpers that catch overflow without actually doing
it.
This patch adds an option to select the method for computing complex
number division. It uses `LoweringOptions` to determine whether to lower
complex division to a runtime function call or to MLIR's `complex.div`,
and `CodeGenOptions` to select the computation algorithm for
`complex.div`. The available option values and their corresponding
algorithms are as follows:
- `full`: Lower to a runtime function call. (Default behavior)
- `improved`: Lower to `complex.div` and expand to Smith's algorithm.
- `basic`: Lower to `complex.div` and expand to the algebraic algorithm.
See also the discussion in the following discourse post:
https://discourse.llvm.org/t/optimization-of-complex-number-division/83468
---------
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
If a `do concurrent` loop is offloaded then there should be no CUDA data
transfer in it. Update the semantic and lowering to take that into
account.
`AssignmentChecker` has to be put into a separate pass because the
checkers in `SemanticsVisitor` cannot have the same `Enter/Leave`
functions. The `DoForallChecker` already has `Eneter/Leave` functions
for the `DoConstruct`.
Host associated variables were not being handled properly.
For array references, get the fixed shape extents from the value
type instead, that works correctly in all cases.
As reported in #145917 and #147309, there are situation's where flang
may crash. This is because `nextIt` in
`RewriteOpenMPLoopConstruct` gets re-assigned when an iterator is erased
from the block. If this is missed, Flang may attempt to access a
location in memory that is not accessable and cause a compiler crash.
This adds protection where the crash can occur, and a test with a
reproducer that can trigger the crash.
Fixes#147309
If the size of the other Interval was 0, (that.size_ - 1) would wrap
below zero.
I've fixed this so that a zero size interval A is within interval B if
the start of A is within B. There's a few ways you could handle zero
sized intervals in theory but this one passes all tests so I assume it's
the intention.
This fixes the following tests when ubsan is enabled:
Flang :: Lower/OpenMP/PFT/sections-pft.f90
Flang :: Lower/OpenMP/derived-type-allocatable.f90
Flang :: Lower/OpenMP/privatization-proc-ptr.f90
Flang :: Lower/OpenMP/sections.f90
Flang :: Parser/OpenMP/sections.f90
Flang :: Semantics/OpenMP/clause-validity01.f90
Flang :: Semantics/OpenMP/if-clause.f90
Flang :: Semantics/OpenMP/parallel-sections01.f90
Flang :: Semantics/OpenMP/private-assoc.f90
The behaviour of strncmp is undefined if either string pointer is null
(https://en.cppreference.com/w/cpp/string/byte/strncmp.html).
I've copied the logic over from Compare to another CharBlock, which had
code to avoid UB in memcmp.
The test Preprocessing/kind-suffix.F90 was failing with UBSAN enabled,
and now passes.
Fixes `#146802`
#146582 started using the `Reserved` Frame Pointer kind for Arm64
Windows, but this revealed a bug in Flang where it copied the
`-mframe-pointer=reserved` flag from Clang, but didn't correctly handle
it in its own command line parser and subsequent compilation pipeline.
This change adds support for `-mframe-pointer=reserved` and adds a test
to make sure that functions are correctly marked when the flag is set.
These should have been looking for the "x86" target not "x64_64".
When run on AArch64 they failed because bbc tried to compile for
AArch64. Add a target option to fix that, as these tests are x86
specific.
Basic parsing and semantics support for Declare Variant was added in
#130578. However, this did not include variant of `Pre` and `Post`
within `OmpAttributeVisitor`. This meant that when a function in the
class tried to get the context using `GetContext`, Flang would crash as
the context was empty. To ensure this is possible, such as when
resolving names as part of the `uniform` clause in the `simd` directive,
the context is now pushed within `OmpAttributeVisitor` when parsing a
`DECLARE VARIANT` directive.
Fixes#145222
Summary:
This patch is mostly an NFC that renames the existing `-fopenmp-targets`
into `--offload-targets`. Doing this early to simplify a follow-up patch
that will hopefully allow this syntax to be used more generically over
the existing `--offload` syntax (which I think is mostly unmaintained
now.). Following in the well-trodden path of trying to pull language
specific offload options into generic ones, but right now this is still
just OpenMP specific.
We no longer utilise the deprecated FIR only flow, so we should be
testing for the current HLFIR flow that we support as opposed to the
older that we no longer maintain.
When merging the list of recursive reference under two components,
duplicates should be removed.
If the recursive reference to parents nodes (referred by depth of the
parents node) are [1, 2, 5] and [4, 5], the new list should be [1,2,4,5].
With std::merge the order was correct but 5 was duplicated. Use
std::set_union instead that removes duplicates.
With this patch Fujitsu tests 0394_0030.f90 [1] and 0390_0230.f90 [2]()
finally compile with -g in about 10s. Their compilation was hanging
before #146543, and they were now hitting an error:
"LLVM ERROR: SmallVector unable to grow" which is fixed by this patch.
[1]: 0d02267bb9/Fortran/0394/0394_0030.f90
[2]: 0d02267bb9/Fortran/0390/0390_0230.f90
Assignments of n-dimensional arrays, with trivial RHS, were
always being converted to n nested loops. For contiguous arrays,
it's possible to flatten them and use a single loop, that can
usually be better optimized by LLVM.
In a test program, using a 3-dimensional array and varying its
size, the resulting speedup was as follows (measured on Graviton4):
16K 1.09
64K 1.40
128K 1.90
256K 1.91
512K 1.00
For sizes above or equal to 512K no improvement was observed.
It looks like LLVM stops trying to perform aggressive loop
unrolling at a certain threshold and just uses nested loops
instead. Larger sizes won't fit on L1 and L2 caches too.
This was noticed while profiling 527.cam4_r. This optimization
makes aer_rad_props_sw slightly faster, but unfortunately it
practically doesn't change 527.cam4_r total execution time.
Many tests in Flang are looking for x86_64-registered-target, but this
never exists because the target is just called x86.
These two pass with this corrected but the others I need to look into
why they fail.
Fixes a Windows bot failure caused by #146667. Just run the test if an
AMD GPU target is registered. Hopefully, the bot now passes.
Test coverage is not reduced since `bbc` is still run on all platforms.
Temps needed for the reduction init regions are now allocate on the heap
all the time. However, this is performance killer for GPUs since malloc
calls are prohibitively expensive. Therefore, we should do these
allocations on the stack for GPU reductions.
Changed the type of the `func_name` attribute from SymbolNameAttr to
SymbolRefAttr. SymbolNameAttr is typically used when defining a symbol
(e.g., `sym_name`), while SymbolRefAttr is appropriate for referencing
existing operations. This change ensures that MLIR can correctly track
the link to the referenced `func.func` operation.
When using -fhermetic-module-files it's possible for a derived type to
have multiple distinct definition sites that are being compared for
being the same type, as in argument association. Accept them as being
the same type so long as they have the same names, the same module
names, and identical definitions.
…ion line
An obsolete flag ("insertASpace_") is being used to signal some cases in
the prescanner's implementation of continuation lines when a token
should be broken when it straddles a line break. It turns out that it's
sufficient to simply note these cases without ever actually inserting a
space, so don't do that (fixing the motivating bug). This leaves some
variables with obsolete names, so change them as well.
This patch handles the third of the three bugs reported in
https://github.com/llvm/llvm-project/issues/146362 .
If we disable `FLANG_BUILD_TOOLS`, not only building the tools but also
generating the targets for them is skipped now. On the other hand, llvm
separates them into `LLVM_BUILD_TOOLS` and `LLVM_INCLUDE_TOOLS`.
This patch introduces `FLANG_INCLUDE_TOOLS` for the distinction.
The current DITypeAttr caching for derived type debug metadata
generation strategy is not optimal. This turns out to be an issue for
compile times in apps with very very complex derived types like CP2K
See the added debug-cyclic-derived-type-caching-simple.f90 test for more
details about the duplication issue.
As a real world example justifying the new non trivial caching strategy,
in CP2K, emitting debug type info for the swarm_worker_type` in swarm_worker.F
caused 1,747,347 llvm debug metadata nodes to be emitted instead of 8023
after this patch (200x less) leading to noticeable compile time
improvements (I measured 0.12s spent in `AddDebugInfo` pass instead of
7.5s prior to this patch).
The main idea is that caching is now associating to the cached
DITypeAttr tree for a derived type a list of parent nodes being referred
to recursively via indices in this DITypeAttr.
When leaving the context of a parent node, all types that were cached
and linked to this parent node are cleared from the cache.
This allows more reusage in sub-trees while still fulfilling the MLIR
requirements that DITypeAttr types referring to a parent DITypeAttr via
integer id should only be used inside the DITypeAttr of the parent.
Most of the complexity comes from computing the "list of parent nodes"
by merging the ones from the components.
This is made is such a way that the extra cost for apps without
recursive derived type is minimal because the extra data structure
should not require extra dynamic allocations when they are no or little
recursion.
Example:
Take the following type graph (Fortran source for it in the added
debug-cyclic-derived-type-caching-complex.f90).
A is the tope level types, and has direct components of types B, C, and
E.
There are cycles in the type tree introduced by type B and D.
Types `C` and `E` are of interest here because they are in the middle of
those cycles and appear in several places in the type tree. There
occurrences is labeled in brackets in the order of visit by the
DebugTypeGenerator.
```
A -> B -> C [1] -> D -> E [1] -> F -> G -> B
| | | |
| | | | -> D
| | |
| | | -> H -> E [2] -> F -> G -> B
| | |
| | |-> D
| |
| | -> I -> E [3] -> F -> G -> B
| | |
| | |-> D
| | -> C [2]
|
| -> C [3] -> D
| -> E [4] -> F -> G -> B
|
| -> D
```
With this patch, E[2] and E[3] can share the same DITypeAttr as well as
C[1] and C[2] while they previously all got there own nodes.
To be safe with regards to cycles in MLIR, a DITypeAttr created for a
node N2 under a node N1 being recursively referred to and above the
recursive reference to N1 shall not be used above N1 in the DITypeAttr
tree. It can however be used in several places under N1.
Hence here:
-E[2] cannot reuse E[1] DITypeAttr because D appears above and under
E[1].
-E[3] can reuse E[2] DITypeAttr because they are both under B and above
D.
-E[4] cannot reuse E[3] DITypeAttr because it is above B.
This is achieved by this patch because when visiting A and reaching B,
the recursive reference to B is registered in the visit context. This
context is added D when going back-up in F. So when reaching back E[1]
with the information to build its DITypeAttr, its recursive references
are known and saved along the DITypeAttr in the cache.
When reaching back D, the cache for E is cleared because it is known it
depended on D. A new DITypeAttr is created after E[2], and this time it
only depends on B because the D under E[2] is not a recursive reference
(D is not above E[2]). Hence, when reaching E[3] it can be reused, and
the cache entry for E[2] is cleared when reaching B, which leads to a
new DITypeAttr to be created for E[4].
This is combination of https://github.com/llvm/llvm-project/pull/138149
and https://github.com/llvm/llvm-project/pull/138039 which were opened
separately for ease of reviewing. Only other change is adjustments in 2
tests which have gone in since.
There are `DeclareOp` present for the variables mapped into target
region. That allow us to generate debug information for them. But the
`TargetOp` is still part of parent function and those variables get the
parent function's `DISubprogram` as a scope.
In `OMPIRBuilder`, a new function is created for the `TargetOp`. We also
create a new `DISubprogram` for it. All the variables that were in the
target region now have to be updated to have the correct scope. This
after the fact updating of
debug information becomes very difficult in certain cases. Take the
example of variable arrays. The type of those arrays depend on the
artificial `DILocalVariable`(s) which hold the size(s) of the array.
This new function will now require that we generate the new variable and
and new types. Similar issue exist for character type variables too.
To avoid this after the fact updating, this PR generates a
`DISubprogramAttr` for the `TargetOp` while generating the debug info in
`flang`. Then we don't need to generate a `DISubprogram` in
`OMPIRBuilder`. This change is made a bit more complicated by the the
fact that in new scheme, the debug location already points to the new
`DISubprogram` by the time it reaches `convertOmpTarget`. But we need
some code generation in the parent function so we have to carefully
manage the debug locations.
This fixes issue `#134991`.
`-fveclib=libmvec` for AArch64 (NEON and SVE) in Clang was supported by
#143696. This patch does the same for Flang.
Vector functions defined in `libmvec` are used for the following Fortran
operator and functions currently.
- Power operator (`**`)
- Fortran intrinsic functions listed below for `real(kind=4)` and
`real(kind=8)` (including their coresponding specific intrinsic
functions)
- Fortran intrinsic functions which are expanded using functions listed
below (for example, `sin` for `complex(kind=8)`)
```
sin
tan
cos
asin
acos
atan (both atan(x) and atan(y, x))
atan2
cosh
tanh
asinh
acosh
atanh
erf
erfc
exp
log
log10
```
As with Clang/AArch64, glibc 2.40 or higher is required to use all these
functions.