Fixes#108136
In #108136 (the new testcase), flang was missing the length parameter
required for the variable length string when boxing the global variable.
The code that is initializing global variables for OpenMP did not
support types with length parameters.
Instead of duplicating this initialization logic in OpenMP, I decided to
use the exact same initialization as is used in the base language
because this will already be well tested and will be updated for any new
types. The difference for OpenMP is that the global variables will be
zero initialized instead of left undefined.
Previously `Fortran::lower::createGlobalInitialization` was used to
share a smaller amount of the logic with the base language lowering. I
think this bug has demonstrated that helper was too low level to be
helpful, and it was only used in OpenMP so I have made it static inside
of ConvertVariable.cpp.
This patch defines `fir::SafeTempArrayCopyAttrInterface` and the
corresponding
OpenACC/OpenMP related attributes in FIR dialect. The actual
implementations
are just placeholders right now, and array repacking becomes a no-op
if `-fopenacc/-fopenmp` is used for the compilation.
TARGET dummy arrays can be accessed indirectly, so it is unsafe
to repack them.
INTENT(OUT) dummy arrays that require finalization on entry
to their subroutine must be copied-in by `fir.pack_arrays`.
In addition, based on my testing results, I think it will be useful
to document that `LOC` and `IS_CONTIGUOUS` will have different values
for the repacked arrays. I still need to decide where to document
this, so just added a note in the design doc for the time being.
Implement handling of `NULL()` RHS, polymorphic pointers, as well as
lower bounds or bounds remapping in pointer assignment inside FORALL.
These cases eventually do not require updating hlfir.region_assign,
lowering can simply prepare the new descriptor for the LHS inside the
RHS region.
Looking more closely at the polymorphic cases, there is not need to call
the runtime, fir.rebox and fir.embox do handle the dynamic type setting
correctly.
After this patch, the last remaining TODO is the allocatable assignment
inside FORALL, which like some cases here, is more likely an accidental
feature given FORALL was deprecated in F2003 at the same time than
allocatable components where added.
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.
Note: this relands #114002 with the fix for the LLVM timeout regressions that have been seen. The fix is to use the added fir.copy to avoid aggregate load/store.
Co-authored-by: NimishMishra <42909663+NimishMishra@users.noreply.github.com>
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.
Dummy arguments from other entry statement that are not live in the current entry have no backing storage, user code referring to them is not allowed to be reached. The compiler was generating initialization/destruction code for them when INTENT(OUT), causing undefined behaviors.
Allocatable members of privatized derived types must be allocated,
with the same bounds as the original object, whenever that member
is also allocated in it, but Flang was not performing such
initialization.
The `Initialize` runtime function can't perform this task unless
its signature is changed to receive an additional parameter, the
original object, that is needed to find out which allocatable
members, with their bounds, must also be allocated in the clone.
As `Initialize` is used not only for privatization, sometimes this
other object won't even exist, so this new parameter would need
to be optional.
Because of this, it seemed better to add a new runtime function:
`InitializeClone`.
To avoid unnecessary calls, lowering inserts a call to it only for
privatized items that are derived types with allocatable members.
Fixes https://github.com/llvm/llvm-project/issues/114888
Fixes https://github.com/llvm/llvm-project/issues/114889
Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers in
runtime-private include directory will occur in #110298.
* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang's running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.
* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.
* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.
* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.
* Move allocator-kind magic numbers into common header. They are the
only declarations out of `allocator-registry.h` in the compiler as well.
This does not make Flang cross-compile ready yet, the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more assumptions that host platform is the same as the target platform.
fir.call side effects are hard to describe in a useful way using
`MemoryEffectOpInterface` because it is impossible to list which memory
location a user procedure read/write without doing a data flow analysis
of its body (even PURE procedures may read from any module variable,
Fortran SIMPLE procedure from F2023 will allow that, but they are far
from common at that point).
Fortran language specifications allow the compiler to deduce
that a procedure call cannot access a variable in many cases
This patch leverages this to extend `fir::AliasAnalysis::getModRef` to
deal with fir.call.
This will allow implementing "array = array_function()" optimization in
a future patch.
This patch adds a flag to mark hlfir.declare of host variables that are
captured in some internal procedure.
It enables implementing a simple fir.call handling in
fir::AliasAnalysis::getModRef leveraging Fortran language specifications
and without a data flow analysis.
This will allow implementing an optimization for "array =
array_function()" where array storage is passed directly into the hidden
result argument to "array_function" when it can be proven that
arraY_function does not reference "array".
Captured host variables are very tricky because they may be accessed
indirectly in any calls if the internal procedure address was captured
via some global procedure pointer. Without flagging them, there is no
way around doing a complex inter procedural data flow analysis:
- checking that the call is not made to an internal procedure is not
enough because of the possibility of indirect calls made to internal
procedures inside the callee.
- checking that the current func.func has no internal procedure is not
enough because this would be invalid with inlining when an procedure
with internal procedures is inlined inside a procedure without internal
procedure.
getElementType() was missing from Sequence and Vector types. Did a
replace of the obvious places getEleTy() was used for these two types
and updated to use this name instead.
Co-authored-by: Scott Manley <scmanley@nvidia.com>
Descriptor for module variable with cuda attribute must be set with the
correct allocator index. This patch updates the embox operation used in
the global to carry the allocator index.
CUDA unified variable where set to use the same allocator than managed
variable. This patch adds a specific allocator for the unified
variables. Currently it will call the managed allocator underneath but
we want to have the flexibility to change that in the future.
Local descriptor for cuda allocatable need to be handled on host and
device. One solution is to duplicate the descriptor (one on the host and
one on the device) and keep them in sync or have the descriptor in
managed/unified memory so we don't to take care of any sync.
The second solution is probably the one we will implement. In order to
have more flexibility on how descriptor representing cuda allocatable
are allocated, this patch updates the lowering to use the cuf operations
alloc and free to managed them.
With `createUnallocatedBox` utility change from #96106 , the TODO for assumed-rank in entry
can simply be lifted and test is added.
The key is that a unallocated assumed-rank descriptor is created with
rank zero in the entry where an assumed-rank dummy from some other entry
do not appear as a dummy (the symbol must still be mapped to some valid
value because the symbol could be used in code that would be unreachable
at runtime, but that the compiler must still generate).
The frontend computes the necessary alignment for COMMON blocks but this
information is never carried over to the code generation and can lead to
segfault for COMMON block that requires a non default alignment.
This patch add an optional attribute on fir.global and carries over the
information.
Lower allocatable and pointers specification parts. Nothing special is
required to allocate the descriptor given they are required to be dummy
arguments, however, care must be taken with INTENT(OUT) to use the
runtime to deallocate them (inlined fir.embox + store is not possible).
Enable lowering of assumed-ranks in specification parts under a debug
flag. I am using a debug flag because many cryptic TODOs/issues may be
hit until more support is added. The development should not take too
long, so I want to stay away from the noise of adding an actual
experimental flag to flang-new.
The number of operations dedicated to CUF grew and where all still in
FIR. In order to have a better organization, the CUF operations,
attributes and code is moved into their specific dialect and files. CUF
dialect is tightly coupled with HLFIR/FIR and their types.
The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them and the CUF dialect depends on the
FIR types. Without having the attributes into a separate library there
would be a dependency cycle.
The lowering produces fir.dummy_scope operation if the current
function has dummy arguments. Each hlfir.declare generated
for a dummy argument is then using the result of fir.dummy_scope
as its dummy_scope operand. This is only done for HLFIR.
I was not able to find a reliable way to identify dummy symbols
in `genDeclareSymbol`, so I added a set of registered dummy symbols
that is alive during the variables instantiation for the current
function. The set is initialized during the mapping of the dummy
argument symbols to their MLIR values. It is reset right after
all variables are instantiated - this is done to avoid generating
hlfir.declare operations with dummy_scope for the clones of
the dummy symbols (e.g. this happens with OpenMP privatization).
If this can be done in a cleaner way, please advise.
Lower locals allocation of cuda device, managed and unified variables to
fir.cuda_alloc. Add fir.cuda_free in the function context finalization.
@vzakhari For some reason the PR #90526 has been closed when I merged PR
#90525. Just reopening one.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc754c74a49fe84ed2640e269c25414087.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
Automatic deallocation of allocatable that are cuda device variable must
use the fir.cuda_deallocate operation. This patch update the automatic
deallocation code generation to use this operation when the variable is
a cuda variable.
This patch has also the side effect to correctly call
`attachDeclarePostDeallocAction` for OpenACC declare variable on
automatic deallocation as well. Update the code in
`attachDeclarePostDeallocAction` so we do not attach on fir.result but
on the correct last op.
Automatic deallocation of allocatable that are cuda device variable must
use the fir.cuda_deallocate operation. This patch update the automatic
deallocation code generation to use this operation when the variable is
a cuda variable.
Fortran mandates "CHARACTER(1), VALUE" be passed as a C "char" in calls
to BIND(C) procedures (F'2023 18.3.7 (4)). Lowering passed them by
memory instead. Update call interface lowering code to pass them by
register. Fix related test and update it to use HLFIR.