Both clang and gfortran support the -fopenmp-simd flag, which enables
OpenMP support only for simd constructs, while disabling the rest of
OpenMP.
Implement the appropriate parse tree rewriting to remove non-SIMD OpenMP
constructs at the parsing stage.
Add a new SimdOnly flang OpenMP IR pass which rewrites generated OpenMP
FIR to handle untangling composite simd constructs, and clean up OpenMP
operations leftover after the parse tree rewriting stage.
With this approach, the two parts of the logic required to make the flag
work can be self-contained within the parse tree rewriter and the MLIR
pass, respectively. It does not need to be implemented within the core
lowering logic itself.
The flag is expected to have no effect if -fopenmp is passed explicitly,
and is only expected to remove OpenMP constructs, not things like OpenMP
library functions calls. This matches the behaviour of other compilers.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
When a REAL or COMPLEX literal appears without an explicit kind suffix
or a kind-determining exponent letter, and the conversion of that
literal from decimal to binary is inexact, emit a warning if that
constant is later implicitly widened to a more precise kind, since it
will have a different value than was probably intended.
Values that convert exactly from decimal to default real, e.g. 1.0 and
0.125, do not elicit this warning.
There are many contexts in which Fortran implicitly converts constants.
This patch covers name constant values, variable and component
initialization, constants in expressions, structure constructor
components, and array constructors.
For example, "real(8) :: tenth = 0.1" is a common Fortran bug that's
hard to find, and is one that often trips up even experienced Fortran
programmers. Unlike C and C++, the literal constant 0.1 is *not* double
precision by default, and it does not have the same value as 0.1d0 or
0.1_8 do when it is converted from decimal to real(4) and then to
real(8).
When the rhs of the data transfer is from a different type, allocate a
new temp on the host and first transfer the rhs to it. Then, use the
elemental op created to do the conversion.
There semantic analysis of the ATOMIC construct will require additional
rewriting (reassociation of certain expressions for user convenience),
and that will be driven by diagnoses made in the semantic checks.
While the rewriting of min/max is not required to be done in semantic
analysis, moving it there will make all rewriting for ATOMIC construct
be located in a single location.
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
The structure of evaluate::Expr is highly customized for the specific
operation or entity that it represents. The different cases are
expressed with different types, which makes the traversal and
modifications somewhat complicated. There exists a framework for
read-only traversal (traverse.h), but there is nothing that helps with
modifying evaluate::Expr.
It's rare that evaluate::Expr needs to be modified, but for the cases
where it needs to be, this code will make it easier.
---------
Co-authored-by: Tom Eccles <tom.eccles@arm.com>
Add a new `AutomapToTargetData` pass. This gathers the declare target
enter variables which have the `AUTOMAP` modifier. And adds
`omp.declare_target_enter/exit` mapping directives for `fir.allocmem`
and `fir.freemem` oeprations on the `AUTOMAP` enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
Add Automap modifier to the MLIR op definition for the DeclareTarget
directive's Enter clause. Also add lowering support in Flang.
Automap Ref: OpenMP 6.0 section 7.9.7.
The default linker can be changed by a CMake variable
CLANG_DEFAULT_LINKER. However, it also changes the default linker
invoked by clang. In fact, there already exists FLANG_DEFAULT_LINKER,
but it does not work. This patch fixes it.
Note that FLANG_DEFAULT_LINKER will have the same value as
CLANG_DEFAULT_LINKER unless it is defined explicitly. That means this
patch does not affect the current behavior.
Fixes#73153
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Reviewed in #152379
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
Until the compiler part is fully hooked up via
https://github.com/llvm/llvm-project/pull/151878, tested this using
`external`:
```
external secnds
real s1, s2
s1 = secnds(0.0)
print *, "Seconds from midnight:", s1
call sleep(2)
s2 = secnds(s1)
print *, "Seconds from s1", s2
print *, "Seconds from midnight:", secnds(0.0)
end
```
The descriptor for derived-type with CUDA components are allocated in
managed memory. The lowering was calling the standard runtime on
allocate statement where it should be a `cuf.allocate` operation.
When the LoweringBridge is created, it registers an MLIR Diagnostics
handler with the MLIRContext. However, it never deregisters it once
lowering is finished.
This fixes this particular scenario. It also makes it so that the
Diagnostics handler is optional.
The structure is
- OmpBeginDirective (aka OmpDirectiveSpecification)
- Block
- optional<OmpEndDirective> (aka optional<OmpDirectiveSpecification>)
The OmpBeginDirective and OmpEndDirective are effectively different
names for OmpDirectiveSpecification. They exist to allow the semantic
analyses to distinguish between the beginning and the ending of a block
construct without maintaining additional context.
The actual changes are in the parser: parse-tree.h and openmp-parser.cpp
in particular. The rest is simply changing the way the directive/clause
information is accessed (typically for the simpler).
All standalone and block constructs now use OmpDirectiveSpecification to
store the directive/clause information.
Make the OperationCode overloads take the derived operation instead of
the Operation base class instance. This makes them usable from visitors
of "Expr<T>.u".
Also, fix small bug: OperationCode(Constant<T>) shoud be "Constant".
The OpenMPSectionConstruct corresponds to the `!$omp section` directive,
but there is nothing in the AST node that stores the directive
information. Even though the only possibility (at the moment) is
"section" without any clauses, for improved generality it is helpful to
have that information anyway.
It was an alias for OmpDirectiveName, which could cause confusion in
parse-tree visitors: a visitor for OmpDirectiveNameModifier could be
executed for an OmpDirectiveName node, leading to unexpected results.
Fortran's intrinsic numeric and relational operators can be overridden
with explicit interfaces so long as one or more of the dummy arguments
have the DEVICE attribute. Semantics already allows this without
complaint, but fails to replace the operations with the defined specific
procedure calls when analyzing expressions.
An error report of the following code generating non-atomic code led us
to realize there are missing checks in the OpenACC atomics code. Add
some of those checks for atomic and sketch how the rest of the code
should proceed in checking the rest of the properties. The following
cases are all reported as errors.
```fortran
! Originally reported error!
!$acc atomic capture
a = b
c = b
!$acc end atomic capture
! Other ambiguous, but related errors!
!$acc atomic capture
x = i
i = x
!$acc end atomic capture
!$acc atomic capture
a = b
b = b
!$acc end atomic capture
!$acc atomic capture
a = b
a = c
!$acc end atomic capture
```
When OpenACC is enabled and Fortran loops are annotated with `acc loop`,
they are lowered to `acc.loop` operation. And rest of the contained
loops use the normal FIR lowering path.
Hovever, the OpenACC specification has special provisions related to
contained loops and their induction variable. In order to adhere to
this, we convert all valid contained loops to `acc.loop` in order to
store this information appropriately.
The provisions in the spec that motivated this change (line numbers are
from OpenACC 3.4):
- 1353 Loop variables in Fortran do statements within a compute
construct are predetermined to be private to the thread that executes
the loop.
- 3783 When do concurrent appears without a loop construct in a kernels
construct it is treated as if it is annotated with loop auto. If it
appears in a parallel construct or an accelerator routine then it is
treated as if it is annotated with loop independent.
By valid loops - we convert do loops and do concurrent loops which have
induction variable. Loops which are unstructured are not handled.
Reland https://github.com/llvm/llvm-project/pull/150805
Shared libs build was broken.
Add `${dialect_libs}` and `${conversion_libs}` to
`MLIRRegisterAllExtensions` because it depends on
`registerConvert***ToLLVMInterface` functions.
`InitAll***` functions are used by `opt`-style tools to init all MLIR
dialects/passes/extensions. Currently they are implemeted as inline
functions and include essentially the entire MLIR header tree. Each file
which includes this header (~10 currently) takes 10+ sec and multiple GB
of ram to compile (tested with clang-19), which limits amount of
parallel compiler jobs which can be run. Also, flang just includes this
file into one of its headers.
Move the actual registration code to the static library, so it's
compiled only once.
Discourse thread
https://discourse.llvm.org/t/rfc-moving-initall-implementation-into-static-library/87559
Flang and Flang-RT have two flavours of unittests:
1. GTest unittests, using lit's `lit.formats.GoogleTest` format ending
with `Tests${CMAKE_EXECUTABLE_SUFFIX}`
2. "non-GTest" or "OldUnit" unittests, a plain executable ending with
`.test${CMAKE_EXECUTABLE_SUFFIX}`
Both executables are emitted into the same unittests/ subdirectory. When
running ...
1. `tests/Unit/lit.cfg.py`, only considers executable ending with
`Tests` (or `Tests.exe` on Windows), hence skips the non-GTest tests.
2. `tests/NonGtestUnit/lit.cfg.py` considers all tests ending with
`.test` or `.exe`. On Windows, The GTest unitests also end with `.exe`.
In Flang-RT, `.exe` is considered an extension for non-GTest unitests
which causes tests such as Flang's `RuntimeTests.exe` to be executed for
both on Windows. This particular test includes a file write test, using
a hard-coded filename `ucsfile`. If the two instances are executed
concurrently, they might interfere with each other reading/writing
`ucsfile` which results in a flaky test.
This patch avoids the redundant execution by requiring the suffix
`.test.exe` on Windows. lit has to be modified because it uses
`os.path.splitext` the extract the extension, which would only recognize
the last component. It was changed from the orginal `endswith` in
c865abe747aa72192f02ebfdcabe730f2553e42f
for unknown reasons.
In Flang, `.exe` is not considered a suffix for non-GTest unittests and
hence they are not run at all. Fixing by also added `.test.exe` as valid
suffix, like with Flang-RT.
Unfortunately, the ` Evaluate/real.test.exe` test was failing on
Windows:
```
FAIL: flang-OldUnit :: Evaluate/real.test.exe (3592 of 3592)
******************** TEST 'flang-OldUnit :: Evaluate/real.test.exe' FAILED ********************
..\_src\flang\unittests\Evaluate\real.cpp:511: FAIL: FlagsToBits(prod.flags) == 0x18, not 0x10
0 0x800001 * 0xbf7ffffe
..\_src\flang\unittests\Evaluate\real.cpp:511: FAIL: FlagsToBits(prod.flags) == 0x18, not 0x10
0 0x800001 * 0x3f7ffffe
..\_src\flang\unittests\Evaluate\real.cpp:511: FAIL: FlagsToBits(prod.flags) == 0x18, not 0x10
0 0x80800001 * 0xbf7ffffe
..\_src\flang\unittests\Evaluate\real.cpp:511: FAIL: FlagsToBits(prod.flags) == 0x18, not 0x10
0 0x80800001 * 0x3f7ffffe
...
```
This is due to the `__x86_64__` macro not being set by Microsoft's
cl.exe and hence floating point status flags not being read out. The
equivalent macro for Microsofts compiler is `_M_X64` (or `_M_X64`).
Atomic Control Options are used to specify architectural characteristics
to help lowering of atomic operations. The options used are:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
Legacy option `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
More details can be found in
https://github.com/llvm/llvm-project/pull/102569. This PR implements the
frontend support for these options with OpenMP atomic in flang.
Backend changes are available in the draft PR:
https://github.com/llvm/llvm-project/pull/143769 which will be raised
after this merged.
Block-associated constructs have, as their body, either a strictly- or a
loosely-structured block. In the former case the end-directive is
optional.
The existing parser required the end-directive to be present in all
cases.
Note:
The definitions of these blocks in the OpenMP spec exclude cases where
the block contains more than one construct, and the first one is
BLOCK/ENDBLOCK. For example, the following is invalid:
```
!$omp target
block ! This cannot be a strictly-structured block, but
continue ! a loosely-structured block cannot start with
endblock ! BLOCK/ENDBLOCK
continue !
!$omp end target
```
Fortran::parser::omp::GetOmpDirectiveName(t) will get the
OmpDirectiveName object that corresponds to construct t. That object (an
AST node) contains the enum id and the source information of the
directive.
Replace uses of extractOmpDirective and getOpenMPDirectiveEnum with the
new function.
Fixes#149089 and #149700.
Before #145837, when processing a reduction symbol not yet supported by
OpenMP lowering, the reduction processor would simply skip filling in
the reduction symbols and variables. With #145837, this behvaior was
slightly changed because the reduction symbols are populated before
invoking the reduction processor (this is more convenient to shared the
code with `do concurrent`).
This PR restores the previous behavior.
OpenMP 6.0 has changed the modifiers on the MAP clause. Previous patch
has introduced parsing support for them. This patch introduces
processing of the new forms in semantic checks and in lowering. This
only applies to existing modifiers, which were updated in the 6.0 spec.
Any of the newly introduced modifiers (SELF and REF) are ignored.
When REAL types are constant folded, the underneath implementation uses
arrays of integers. Ensure that these arrays are properly aligned.
This matters when building flang with clang. In some cases, the
resulting code for flang compiler ended up using SSE2 aligned load
instructions for REAL(16) constant folding on x86_64, and these
instructions require that the values are loaded from the aligned
addresses.