Properly cast the selector to `i64` regardless of its integer type.
We used to generate llvm.trunc always.
We have to use `i64` as long as the case values may exceed INT_MAX.
Fixes#153050.
A long chain of fir.pack_arrays might require multiple iterations
of the greedy rewriter, if we use down-top traversal. The rewriter
may not converge in 10 (default) iterations. It is not an error,
but it was reported as such.
This patch changes the traversal to top-down and also disabled
the hard error, if the rewriter does not converge soon enough.
There semantic analysis of the ATOMIC construct will require additional
rewriting (reassociation of certain expressions for user convenience),
and that will be driven by diagnoses made in the semantic checks.
While the rewriting of min/max is not required to be done in semantic
analysis, moving it there will make all rewriting for ATOMIC construct
be located in a single location.
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
Fixes a bug when a block variable is marked as implicit private. In such
case, we can simply ignore privatizing that symbol within the context of
the currrent OpenMP construct since the "private" allocation for the
symbol will be emitted within the nested block anyway.
The structure of evaluate::Expr is highly customized for the specific
operation or entity that it represents. The different cases are
expressed with different types, which makes the traversal and
modifications somewhat complicated. There exists a framework for
read-only traversal (traverse.h), but there is nothing that helps with
modifying evaluate::Expr.
It's rare that evaluate::Expr needs to be modified, but for the cases
where it needs to be, this code will make it easier.
---------
Co-authored-by: Tom Eccles <tom.eccles@arm.com>
Add a new `AutomapToTargetData` pass. This gathers the declare target
enter variables which have the `AUTOMAP` modifier. And adds
`omp.declare_target_enter/exit` mapping directives for `fir.allocmem`
and `fir.freemem` oeprations on the `AUTOMAP` enabled variables.
Automap Ref: OpenMP 6.0 section 7.9.7.
Add Automap modifier to the MLIR op definition for the DeclareTarget
directive's Enter clause. Also add lowering support in Flang.
Automap Ref: OpenMP 6.0 section 7.9.7.
Fixes a bug when a block variable is marked as pre-determined private.
In such case, we can simply ignore privatizing that symbol within the
context of the currrent OpenMP construct since the "private" allocation
for the symbol will be emitted within the nested block anyway.
The default linker can be changed by a CMake variable
CLANG_DEFAULT_LINKER. However, it also changes the default linker
invoked by clang. In fact, there already exists FLANG_DEFAULT_LINKER,
but it does not work. This patch fixes it.
Note that FLANG_DEFAULT_LINKER will have the same value as
CLANG_DEFAULT_LINKER unless it is defined explicitly. That means this
patch does not affect the current behavior.
Fixes#73153
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Fixes#149563
When emitting unstructured `do concurrent` loops, reduction processing
should be skipped since we are not emitting `fir.do_concurrent` loop in
the first place.
The function ResolveOmpObject had a lot of highly-indented code in two
variant visitors. Extract the visitors into their own functions, and
reformat the code. Replace !(||) with !&&! in a couple of places to make
the formatting a bit nicer. Use llvm::enumerate instead of manually
maintaining iteration index.
Adding mlir to llvm support for atomic control options.
Atomic Control Options are used to specify architectural characteristics
to help lowering of atomic operations. The options used are:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
Legacy option `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
More details can be found in
https://github.com/llvm/llvm-project/pull/102569. This PR implements the
MLIR to LLVM lowering support of atomic control attributes specified
with OpenMP `atomicUpdateOp`.
Initial support can be found in PR:
https://github.com/llvm/llvm-project/pull/150860
Reviewed in #152379
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
When the rhs is a an array element, the assert was triggered but this is
still a valid transfer. Remove the assert. The operation has a verifier
to check its validity.
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
Until the compiler part is fully hooked up via
https://github.com/llvm/llvm-project/pull/151878, tested this using
`external`:
```
external secnds
real s1, s2
s1 = secnds(0.0)
print *, "Seconds from midnight:", s1
call sleep(2)
s2 = secnds(s1)
print *, "Seconds from s1", s2
print *, "Seconds from midnight:", secnds(0.0)
end
```
They were inserted in the current scope.
OpenMP spec (all versions):
The names of critical constructs are global entities of the program. If
a name conflicts with any other entity, the behavior of the program is
unspecified.
At the moment, there is no established way to emit LLVM IR before
optimization in LLVM. tco can partially does, but the optimization level
is fixed to 2. This patch adds -O flag to tco.
Note that this is not completely equivalent to the frontend option. tco
does not accept -O flag without numbers, and the default optimization
level is not 0 but 2.
Re-apply "[flang][cmake] Fix bcc dependencies (#125822)"
It was overwritten due to an automatic merge gone wrong for #124416.
`git cherry-pick f9af5c145f40480d46874b643ca2b1237e9fbb2a` applied in
97cf061e83a0d0cada38d8a2d75c8a840b01ba26 ignored the renaming of
FortranCommon into FortranSupport after #125822.
Original commit message:
The Fortran libraries are not part of MLIR, so they should use
target_link_libraries() rather than mlir_target_link_libraries().
This fixes an issue introduced in #20966.
This PR implements a semantic checker to ensure the legality of nested
OpenACC parallelism. The following are quotes from Spec 3.3. We need to
disallow loops from having parallelism at the same level as or at a
sub-level of child loops.
>**2.9.2 gang clause**
>[2064] <ins>When the parent compute construct is a parallel
construct</ins>, or on an orphaned loop construct, the gang clause
behaves as follows. (...) The associated dimension is the value of the
dim argument, if it appears, or is dimension one. The dim argument must
be a constant positive integer with value 1, 2, or 3.
>[2112] The region of a loop with a gang(dim:d) clause may not contain a
loop construct with a gang(dim:e) clause where e >= d unless it appears
within a nested compute region.
>[2074] <ins>When the parent compute construct is a kernels
construct</ins>, the gang clause behaves as follows. (...)
>[2148] The region of a loop with the gang clause may not contain
another loop with a gang clause unless within a nested compute region.
>**2.9.3 worker clause**
>[2122]/[2129] The region of a loop with the worker clause may not
contain a loop with the gang or worker clause unless within a nested
compute region.
>**2.9.4 vector clause**
>[2141]/[2148] The region of a loop with the vector clause may not
contain a loop with a gang, worker, or vector clause unless within a
nested compute region.
https://openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.3-final.pdf
Module files shouldn't ever produce parsing errors, and if they did in
the case of a badly-generated module file, the compiler will notice and
crash. So we can run the parser on module files with message deferral
enabled, and that saves time that would otherwise be spent generating
messages on failed parsing alternatives that are discarded anyway when
backtracking. It's not a big savings (single digit percentage on overall
compilation time for a big application with lots of modules), but worth
doing.
For more accurate compatibility with other compilers' extensions, accept
a bare exponent letter as valid real input to a formatted READ statement
only in a fixed-width input field. So it works with (G1.0) editing, but
not (G)/(D)/(E)/(F) or list-directed input.
Fixes https://github.com/llvm/llvm-project/issues/151465.
Currently, we indicate to the runtime that implicit scalar captures are
firstprivate (via map and
capture types), enough for the runtime trace to treat it as such, but we
do not CodeGen the IR
in such a way that we can take full advantage of this aspect of the
OpenMP specification.
This patch seeks to change that by applying the correct symbol flags
(firstprivate/implicit) to the
implicitly captured scalars within target regions, which then triggers
the delayed privitization code
generation for these symbols, bringing the code generation in-line with
the explicit firstpriviate
clause. Currently, similarly to the delayed privitization I have
sheltered this segment of code
behind the EnabledDelayedPrivitization flag, as without it, we'll
trigger an compiler error for
firstprivate not being supported any time we implicitly capture a scalar
and try to firstprivitize
it, in future when this flag is removed it can also be removed here. So,
for now, you need to
enable this via providing the compiler the flag on compilation of any
programs.
The descriptor for derived-type with CUDA components are allocated in
managed memory. The lowering was calling the standard runtime on
allocate statement where it should be a `cuf.allocate` operation.
While I was updating the creation of CUF operation in #152050, I noticed
some other places in flang that were not updated. This patch updates the
FIR operation creations in `flang/unittests`