Local descriptors for CUDA allocatables need to be handled on both host and
device. One solution is to duplicate the descriptor (one copy on the host and
one on the device) and keep the copies in sync. Another is to place the descriptor in
managed/unified memory so that no synchronization is needed.
The second solution is probably the one we will implement. To
have more flexibility in how descriptors representing CUDA allocatables
are allocated, this patch updates the lowering to use the CUF alloc and
free operations to manage them.
With the `createUnallocatedBox` utility change from #96106, the TODO for assumed-rank
dummies in ENTRY statements can simply be lifted, and a test is added.
The key is that an unallocated assumed-rank descriptor with rank zero is created
in an entry where an assumed-rank dummy from some other entry
does not appear as a dummy. The symbol must still be mapped to some valid
value because it could be used in code that is unreachable
at runtime but that the compiler must still generate.
The frontend computes the necessary alignment for COMMON blocks, but this
information was never carried over to code generation, which can lead to
segfaults for COMMON blocks that require a non-default alignment.
This patch adds an optional attribute on fir.global to carry over this
information.
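The sketch below shows the kind of information involved. It assumes, for illustration only, that a COMMON block must be aligned to the largest alignment among its members. `Member`, `commonBlockAlignment`, and `isAligned` are hypothetical helpers, not flang APIs. The patch itself only carries the frontend-computed value over to fir.global.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one COMMON block member (not a flang type).
struct Member {
  std::size_t size;      // storage size in bytes
  std::size_t alignment; // required alignment in bytes (a power of two)
};

// Assumption for this sketch: the block needs the largest alignment
// required by any of its members.
std::size_t commonBlockAlignment(const std::vector<Member> &members) {
  std::size_t align = 1;
  for (const Member &m : members)
    align = std::max(align, m.alignment);
  return align;
}

// True if `address` is a multiple of `alignment`.
bool isAligned(std::uintptr_t address, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return address % alignment == 0;
}
```

Consider a block with an 8-byte member placed at an address that is only 4-byte aligned. That member is then misaligned, which is one way a missing alignment requirement can end in a crash on targets or instructions that require aligned accesses.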
Lower allocatables and pointers in specification parts. Nothing special is
required to allocate the descriptors since they must be dummy
arguments. However, care must be taken with INTENT(OUT) to use the
runtime to deallocate them (an inlined fir.embox + store is not possible).
Enable lowering of assumed-ranks in specification parts under a debug
flag. I am using a debug flag because many cryptic TODOs/issues may be
hit until more support is added. The development should not take too
long, so I want to stay away from the noise of adding an actual
experimental flag to flang-new.
The number of operations dedicated to CUF grew, and they were all still in
FIR. For better organization, the CUF operations,
attributes, and code are moved into their own dialect and files. The CUF
dialect is tightly coupled with HLFIR/FIR and their types.
The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them and the CUF dialect depends on the
FIR types. Without the attributes in a separate library, there
would be a dependency cycle.
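The cycle described above can be checked mechanically. The sketch below runs a three-color depth-first search over a library dependency graph. The node names ("FIR", "CUF", "CUFAttrs", "CUFDialect") are illustrative labels and are not necessarily the actual CMake target names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Library -> libraries it depends on.
using Graph = std::map<std::string, std::vector<std::string>>;

// State per node: 0 = unvisited, 1 = on the current DFS path, 2 = done.
static bool visit(const Graph &g, const std::string &node,
                  std::map<std::string, int> &state) {
  assert(state[node] == 0); // only unvisited nodes are entered
  state[node] = 1;
  auto it = g.find(node);
  if (it != g.end()) {
    for (const std::string &dep : it->second) {
      int s = state[dep];
      if (s == 1)
        return true; // back edge: dependency cycle
      if (s == 0 && visit(g, dep, state))
        return true;
    }
  }
  state[node] = 2;
  return false;
}

bool hasCycle(const Graph &g) {
  std::map<std::string, int> state;
  for (const auto &entry : g)
    if (state[entry.first] == 0 && visit(g, entry.first, state))
      return true;
  return false;
}
```

With the attributes and the dialect in one library, `FIR -> CUF -> FIR` is a back edge, so `hasCycle` reports a cycle. Once the attributes are split out (`FIR -> CUFAttrs`, `CUFDialect -> FIR, CUFAttrs`), the graph is acyclic.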
The lowering produces a fir.dummy_scope operation if the current
function has dummy arguments. Each hlfir.declare generated
for a dummy argument then uses the result of fir.dummy_scope
as its dummy_scope operand. This is only done for HLFIR.
I was not able to find a reliable way to identify dummy symbols
in `genDeclareSymbol`, so I added a set of registered dummy symbols
that is alive during variable instantiation for the current
function. The set is initialized during the mapping of the dummy
argument symbols to their MLIR values. It is reset right after
all variables are instantiated. This avoids generating
hlfir.declare operations with dummy_scope for the clones of
the dummy symbols (e.g. this happens with OpenMP privatization).
If this can be done in a cleaner way, please advise.
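The small class below is one way to picture the registered-dummy set. It is a sketch under assumptions. `Symbol`, `DummyRegistry`, and its methods are hypothetical stand-ins (flang works on `Fortran::semantics::Symbol` inside `genDeclareSymbol`). Symbols are keyed by identity rather than by name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a front-end symbol.
struct Symbol {
  std::string name;
};

// Sketch of the "registered dummy symbols" set.
class DummyRegistry {
public:
  // Called while mapping dummy argument symbols to their values.
  void registerDummy(const Symbol &sym) { dummies.insert(&sym); }

  // Queried when generating a declaration: only registered symbols
  // would get a dummy_scope operand.
  bool isRegisteredDummy(const Symbol &sym) const {
    return dummies.count(&sym) != 0;
  }

  // Called right after all variables of the function are instantiated.
  void reset() { dummies.clear(); }

private:
  std::unordered_set<const Symbol *> dummies; // keyed by symbol identity
};
```

The lifetime matters more than the container. `reset()` runs once all variables are instantiated, so declarations generated later in the function are never treated as dummies. This covers the OpenMP privatization clones mentioned above.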
Lower the allocation of local CUDA device, managed, and unified variables to
fir.cuda_alloc. Add fir.cuda_free to the function context finalization.
@vzakhari For some reason, PR #90526 was closed when I merged PR
#90525, so I am just reopening it.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc754c74a49fe84ed2640e269c25414087.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
Automatic deallocation of allocatables that are CUDA device variables must
use the fir.cuda_deallocate operation. This patch updates the automatic
deallocation code generation to use this operation when the variable is
a CUDA variable.
As a side effect, this patch also correctly calls
`attachDeclarePostDeallocAction` for OpenACC declare variables on
automatic deallocation. The code in
`attachDeclarePostDeallocAction` is updated so that the action is attached
to the correct last op rather than to fir.result.
Fortran mandates that "CHARACTER(1), VALUE" arguments be passed as a C "char" in calls
to BIND(C) procedures (F'2023 18.3.7 (4)). Lowering passed them in
memory instead. Update the call interface lowering code to pass them in a
register. Fix the related test and update it to use HLFIR.
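On the C side, such a dummy is simply a `char` received by value. The sketch below shows a C-callable definition that would match the Fortran interface in its comment. The name `is_a` is made up for illustration.

```cpp
#include <cassert>

// C-side definition for a Fortran interface such as:
//
//   interface
//     function is_a(c) bind(c, name="is_a")
//       use iso_c_binding, only: c_int
//       integer(c_int) :: is_a
//       character(1), value :: c
//     end function
//   end interface
//
// The CHARACTER(1), VALUE dummy arrives as a plain `char` passed by
// value, not as the address of a character stored in memory.
extern "C" int is_a(char c) { return c == 'A' ? 1 : 0; }
```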
The all-ones masks were not properly created for i128 types because
builder.createIntegerConstant ended up truncating -1 to a
positive value.
Add builder.createAllOnesInteger/createMinusOneInteger helpers and use
them where createIntegerConstant(..., -1) was used.
Add an assert in createIntegerConstant to catch negative numbers for
the i128 type.
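The message does not say exactly where the truncation happened. One classic way a 64-bit -1 becomes a positive 128-bit value is zero extension instead of sign extension. The sketch below contrasts the two using the GCC/Clang `unsigned __int128` extension. `widenZeroExtend`, `widenSignExtend`, and `allOnes` are hypothetical helpers, not the FIR builder API.

```cpp
#include <cassert>
#include <cstdint>

// Models an i128 value with a GCC/Clang extension type.
using u128 = unsigned __int128;

// Zero extension: the upper 64 bits become 0, so -1 turns into the
// positive value 2^64 - 1 (an assumed failure mode, for illustration).
u128 widenZeroExtend(std::int64_t v) {
  return static_cast<u128>(static_cast<std::uint64_t>(v));
}

// Sign extension keeps the value -1, i.e. all 128 bits set.
u128 widenSignExtend(std::int64_t v) {
  return static_cast<u128>(static_cast<__int128>(v));
}

// What an "all ones" helper should produce for `bits` in [1, 128].
u128 allOnes(unsigned bits) {
  assert(bits >= 1 && bits <= 128);
  return bits == 128 ? ~static_cast<u128>(0)
                     : (static_cast<u128>(1) << bits) - 1;
}
```

An explicit all-ones helper avoids depending on how a signed 64-bit argument gets widened. That is presumably why dedicated helpers are safer than `createIntegerConstant(..., -1)` for wide types.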
This PR addresses `TODO(loc, "procedure pointer component default initialization");`.
It handles default initialization of procedure pointer components in derived
types that are 32 bytes or larger (default initialization for smaller types was
already handled).
```
interface
subroutine sub()
end
end interface
type dt
real :: r1 = 5.0
procedure(real), pointer, nopass :: pp1 => null()
real, pointer :: rp1 => null()
procedure(), pointer, nopass :: pp2 => sub
end type
type(dt) :: dd1
end
```
Cray pointee symbols can be host associated from a module or host
procedure while the related Cray pointer is not explicitly associated.
This caused the "not yet implemented: lowering symbol to HLFIR" TODO to fire
when lowering a reference to the Cray pointee and fetching the Cray
pointer.
This patch:
- Ensures Cray pointers are always instantiated when instantiating a
Cray pointee.
- Fixes internal procedure lowering to deal with Cray pointee host
association like it does for pointers (the lowering strategy for Cray
pointees is to create a pointer that is updated with the Cray pointer
value before being fetched).
This should fix the bug reported in
https://github.com/llvm/llvm-project/issues/85420.
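The lowering strategy in the second bullet can be sketched in C++. `crayPtr` models the Cray pointer as an integer holding an address. `pointee()` models a reference to the pointee and re-derives an ordinary pointer from the current Cray pointer value before each access. The names are hypothetical, and this is not the code flang generates.

```cpp
#include <cassert>
#include <cstdint>

// Models the Cray pointer of a POINTER (crayPtr, pointee) declaration:
// an integer variable holding an address.
std::intptr_t crayPtr = 0;

// Models a reference to the pointee: an ordinary pointer is refreshed
// from the current Cray pointer value right before each access, so any
// reassignment of `crayPtr` is observed.
double &pointee() {
  auto *p = reinterpret_cast<double *>(crayPtr);
  assert(p != nullptr && "Cray pointer not set");
  return *p;
}
```

Every pointee access reads the Cray pointer, so the pointer must be available wherever the pointee is referenced. The first bullet ensures this.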
The current lowering did not handle sequence-associated arguments passed
by descriptor. This case is special because sequence association implies
that the actual and dummy arguments do not need to agree in rank and shape.
Usually, arguments that can be sequence associated are passed by raw
address, and the shape mismatch is transparent. But there are three
cases of explicit-shape and assumed-size arrays passed by descriptor:
- polymorphic arguments
- BIND(C) assumed-length arguments (F'2023 18.3.7 (5))
- length-parameterized derived types (TBD)
The callee expects a descriptor containing the dummy rank and
shape, but that is not what the caller passed. This patch fixes that by
evaluating the dummy shape on the caller side using the interface (which
has to be available when arguments are passed by descriptor).
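The sketch below shows the caller-side fix in miniature. It assumes a contiguous actual argument, since sequence association otherwise implies a copy. It also uses a toy `Descriptor` type that only holds a base address and extents. Neither the type nor the function comes from flang.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy array descriptor: base address plus extents (rank = extents.size()).
struct Descriptor {
  const double *base;
  std::vector<std::ptrdiff_t> extents;
};

std::ptrdiff_t elementCount(const Descriptor &d) {
  std::ptrdiff_t n = 1;
  for (std::ptrdiff_t e : d.extents)
    n *= e;
  return n;
}

// Under sequence association, the dummy sees the actual's storage
// sequence with the dummy's own rank and shape, so the caller builds the
// descriptor from the actual's base address and the extents taken from
// the interface.
Descriptor makeDummyDescriptor(const Descriptor &actual,
                               const std::vector<std::ptrdiff_t> &dummyExtents) {
  Descriptor dummy{actual.base, dummyExtents};
  // The dummy must not extend past the actual's storage sequence.
  assert(elementCount(dummy) <= elementCount(actual));
  return dummy;
}
```

Consider a 2x3 actual passed to a dummy declared as `x(6)`. It keeps its base address but gets a rank-1 descriptor with extent 6, which is what the callee reads.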
CUDA attributes are correctly propagated to the module file but were not
imported, so they did not appear on the hlfir.declare and
fir.global operations for module variables.
The newly introduced `CUDAAttribute` is meant for CUDA attributes
associated with variables. To avoid clashing with the future
attribute for functions/subroutines, rename `CUDAAttribute` to
`CUDADataAttribute`.
Lower CUDA attributes for simple dummy arguments. This is done in a
similar way to `TARGET`, `OPTIONAL`, and so on.
This patch also moves the `Fortran::common::CUDADataAttr` to
`fir::CUDAAttributeAttr` mapping to
`flang/include/flang/Optimizer/Support/Utils.h` so that it can be reused
where needed.
This is a first simple patch to introduce a new FIR attribute to carry
the CUDA variable attribute information to hlfir.declare and fir.declare
operations. It currently lowers this information for local variables.
The texture attribute is omitted since it is rejected by semantics and
will not make its way to MLIR.
This new attribute is added as an optional attribute to the hlfir.declare
and fir.declare operations.
Runtime globals are compiler-generated globals injected into user scopes.
They are never referred to directly in lowering code; we only need the
fir.global for them. Yet lowering was creating hlfir.declare operations for them in
module procedures. In modern Fortran applications, this blows up the generated
IR for nothing. Types with dozens of components, type-bound procedures,
and parents can require on the order of 10,000 runtime info globals to
describe them. With 100 module procedures, that is a few
million operations generated and processed in each pass for nothing.
Start implementing assumed-rank support as described in
https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md
This commit holds the minimal support for lowering calls to procedures
with assumed-rank arguments where the procedure is implemented
in C.
The case of passing assumed-size to assumed-rank is left TODO since it
requires a change in assumed-size lowering that is better done in
another patch.
Care is taken to set the lower bounds to zero when passing non-allocatable,
non-pointer arrays by descriptor to a BIND(C) procedure, as required by
18.5.3 point 3. This was not done before, although the requirement also
applies to non-assumed-rank descriptors. This change required special
attention with IGNORE_TKR(t) to avoid emitting invalid fir.rebox operations
(in this case, the actual argument type must be used as the output type).
Implementation of Fortran procedures with assumed-rank arguments is still
TODO.
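The lower-bound requirement for BIND(C) descriptors (18.5.3 point 3) can be pictured as a rebox that keeps the base address and extents and only resets the lower bounds. `Dim`, `ArrayDesc`, and the helpers are hypothetical. They form a toy rank-1 model, not the real C descriptor or fir.rebox.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy per-dimension info, loosely modeled on a C descriptor dimension.
struct Dim {
  std::ptrdiff_t lowerBound;
  std::ptrdiff_t extent;
};

struct ArrayDesc {
  int *base; // address of the first element
  std::vector<Dim> dims;
};

// Same base address and extents; only the lower bounds become zero.
ArrayDesc reboxWithZeroLowerBounds(const ArrayDesc &in) {
  ArrayDesc out = in;
  for (Dim &d : out.dims)
    d.lowerBound = 0;
  return out;
}

// Address of element `i` (in the descriptor's own bounds) of a
// contiguous rank-1 array.
int *elementAddress(const ArrayDesc &d, std::ptrdiff_t i) {
  assert(d.dims.size() == 1);
  assert(i >= d.dims[0].lowerBound &&
         i < d.dims[0].lowerBound + d.dims[0].extent);
  return d.base + (i - d.dims[0].lowerBound);
}
```

Element 5 of an array viewed with lower bound 5 is element 0 of the reboxed descriptor. Only the indexing origin changes, not the storage.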
Currently, lowering sets the last extent of assumed-size arrays to "undef",
which was OK as long as the value was not expected to be read.
But when interfacing with the runtime and when passing assumed-size to
assumed-rank, this last extent may be read and must be -1, as specified
for the BIND(C) case in 18.5.3 point 5.
Set this value to -1, and update all the lowering code that was looking
for an undef defining op to identify assumed-size arrays. It is much safer to
propagate and use semantic information here, and the previous check actually
did not work if the array was used in an internal procedure (where the
defining op is no longer visible).
@clementval and @agozillon, I left the assumed-size extent as zero in the
acc/omp bounds op, as it was. Please double-check that this is what you want
(I can imagine -1 may create trouble here, and 0 makes some sense as it
would lead to no data transfer).
This also allows removing special cases in UBOUND/LBOUND lowering.
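Lowering now identifies assumed-size arrays from semantic information, so the -1 value matters mostly to readers of the descriptor, such as the runtime or C code. Below is a sketch of that reader side. The `Extents` type and the helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical descriptor extents. Following the BIND(C) convention
// cited above (18.5.3 point 5), an extent of -1 in the last dimension
// marks an assumed-size array.
struct Extents {
  std::vector<std::ptrdiff_t> dims;
};

constexpr std::ptrdiff_t kAssumedSizeExtent = -1;

bool isAssumedSizeDescriptor(const Extents &e) {
  return !e.dims.empty() && e.dims.back() == kAssumedSizeExtent;
}

// Total element count, or std::nullopt for an assumed-size array, whose
// total size is unknown to the callee. With an "undef" last extent, a
// reader of the descriptor could not make this distinction reliably.
std::optional<std::ptrdiff_t> knownSize(const Extents &e) {
  if (isAssumedSizeDescriptor(e))
    return std::nullopt;
  std::ptrdiff_t n = 1;
  for (std::ptrdiff_t d : e.dims) {
    assert(d >= 0 && "only the last extent may be -1");
    n *= d;
  }
  return n;
}
```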
Also disable allocation of Cray pointees. This was never intended and
would now lead to crashes with the -1 value for assumed-size Cray
pointees.
Lower initialized BIND(C) module variables as regular module variables,
except that the fir.global symbol name is the binding label.
For uninitialized variables, add the common linkage so that C code may
define the variables. The standard does not provide a way to indicate
that a variable is defined in C, but there are use cases.
Beware that if the object file compiled from the module is added to a shared
library, the variable will become a regular global definition and may
override the C variable depending on the linking order.
Lowering was instantiating component symbols (all but the last) in initial
target designators as if they were whole objects, leading to collisions
and bugs.
Fixes https://github.com/llvm/llvm-project/issues/75728
VALUE derived-type arguments are passed by reference outside of BIND(C) interfaces.
This ABI is much simpler, and these arguments may have
the OPTIONAL attribute.
In the BIND(C) context, these arguments must follow the C ABI for
structs, which may lead to the data being passed in registers. OPTIONAL is
also forbidden for those arguments, so it is safe to directly use the
fir.type<T> type for the func.func argument.
Codegen is in charge of later applying the C passing ABI according to
the target (https://github.com/llvm/llvm-project/pull/74829).
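On the C side, such an argument is an ordinary struct passed by value. The sketch assumes a Fortran type along the lines of `type, bind(c) :: point` with two `real(c_double)` components. The names `point` and `norm1` are made up.

```cpp
#include <cassert>

// C-side view of a BIND(C) derived type such as:
//   type, bind(c) :: point
//     real(c_double) :: x, y
//   end type
struct point {
  double x;
  double y;
};

// Matches an interface whose dummy is `type(point), value :: p`.
// The struct is passed by value under the C ABI, which may place it in
// registers; the callee works on its own copy.
extern "C" double norm1(struct point p) {
  double ax = p.x < 0 ? -p.x : p.x;
  double ay = p.y < 0 ? -p.y : p.y;
  return ax + ay;
}
```

The callee receives a copy rather than an address, so there is no natural way to represent an absent argument. This fits the OPTIONAL restriction mentioned above.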
The function `genCommonBlockMember` is not specific to OpenMP, and it
could very well be a common utility. Move it to ConvertVariable.cpp
where it logically belongs.
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statement
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
could be external, module or internal procedures.
3. Lowering reference to procedure pointers so that it works end to end.
**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow-up commit to
this PR.
2. There is no special handling of host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.
**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for a `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR follows
the same design decision for procedure pointers, since `fir::StoreOp`
requires the same memory type.
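The design choice in note 2 can be sketched in C++. Every procedure pointer slot uses a single untyped function pointer type, loosely analogous to the untyped `fir::BoxProcType` from `getUntypedBoxProcType`. Any procedure can therefore be stored with the same memory type, and the call site casts back to the interface type. `UntypedProc`, `ProcPointer`, `associate`, and `callIntInt` are hypothetical. Host association for internal procedures, which this PR does not handle, is not modeled.

```cpp
#include <cassert>

// One untyped function pointer type for every procedure pointer slot.
using UntypedProc = void (*)();

// Hypothetical targets standing in for external or module procedures.
static int twice(int x) { return 2 * x; }
static int square(int x) { return x * x; }

struct ProcPointer {
  UntypedProc target = nullptr; // disassociated, like => null()
};

// Pointer assignment: erase the interface type before storing.
void associate(ProcPointer &p, int (*proc)(int)) {
  p.target = reinterpret_cast<UntypedProc>(proc);
}

// Reference through the pointer: restore the interface type, then call.
int callIntInt(const ProcPointer &p, int arg) {
  assert(p.target != nullptr);
  return reinterpret_cast<int (*)(int)>(p.target)(arg);
}
```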
Note: this commit is actually resubmitting the original commit from
PR #70461 that was reverted. See PR #73221.
This change is required for hlfir.declares of host-associated symbols in
OpenMP regions.
Added a FIXME to correctly use the symbol attributes for VOLATILE and
ASYNCHRONOUS.
Type extension is currently handled in FIR by inlining the parent's
components as the first members of the record type.
This is not correct from a memory layout point of view since the storage
size of the parent type may be bigger than the sum of the sizes of its
components (due to alignment requirements). To avoid making FIR types
target dependent and fix this issue, make the parent component a single
component with the parent type at the beginning of the record type.
This also simplifies addressing since the parent component is now a "normal"
component that can be designated with hlfir.designate.
StructureComponent lowering, however, is a bit more complex since the
symbols in the structure component may refer to subcomponents of parent
types.
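The layout problem can be reproduced with C structs, assuming a typical ABI where `double` needs more than 1-byte alignment. `Parent` then has trailing padding. Inlining its members lets the extension component `e` land inside what would be the parent's storage. The struct names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Parent type: its storage size includes trailing padding because of
// the alignment of `d`.
struct Parent {
  double d;
  char c;
};

// Old strategy: parent components inlined as the first members.
struct ChildInlined {
  double d;
  char c;
  char e; // extension component, placed in Parent's tail padding
};

// New strategy: a single parent component of the parent type.
struct ChildWithParent {
  Parent parent;
  char e; // extension component, placed after the whole parent
};
```

With the inlined layout, copying a whole `Parent` into the start of a child would overwrite `e` with padding bytes. This is what a parent component assignment does. The single parent component keeps the child's prefix layout-identical to a standalone parent.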
Notes:
1. The fix is only done in HLFIR for now. A similar fix should be done
in ConvertExpr.cpp to fix the path without HLFIR (I will likely still do
it in a new patch since it would be an annoying bug to investigate for
people testing flang without HLFIR).
2. The private component extra mangling is useless after this patch. I
will remove it after 1.
3. The "parent component" TODO in constant CTOR is free to implement for
HLFIR after this patch, but I would rather remove it and test it in a
different patch.
Non-POINTER/ALLOCATABLE INTENT(OUT) dummy arguments with allocatable
components were reset without being deallocated first when needed. Add a
call to the Destroy runtime to deallocate the components on entry.
Notes:
1. The same logic is not needed on the callee side of BIND(C) calls
because BIND(C) arguments cannot be derived types with allocatable
components (C1806).
2. When the argument is an INTENT(OUT) polymorphic dummy, the dynamic type of
the actual argument may contain allocatable components. This case is covered by
the call to Destroy that uses the dynamic type and was already inserted for
INTENT(OUT) polymorphic dummies.
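Below is a sketch of the entry sequence, using a hypothetical C++ model. A raw owning pointer stands in for the allocatable component, and `destroyComponents` stands in for the Destroy runtime call. Neither is flang's actual API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a derived type with one allocatable component
// (nullptr means "not allocated").
struct WithAllocatable {
  double *data = nullptr;
  std::size_t n = 0;
};

int liveAllocations = 0; // outstanding allocations, for illustration only

void allocateComponent(WithAllocatable &x, std::size_t n) {
  assert(x.data == nullptr && "component already allocated");
  x.data = new double[n];
  x.n = n;
  ++liveAllocations;
}

// Stand-in for the Destroy runtime call: deallocate allocated components.
void destroyComponents(WithAllocatable &x) {
  if (x.data != nullptr) {
    delete[] x.data;
    --liveAllocations;
  }
  x.data = nullptr;
  x.n = 0;
}

// Entry of a procedure with an INTENT(OUT) dummy of this type:
// deallocate the components first, then reset to the default value.
// Resetting alone, as before the fix, would leak the old `data`.
void onIntentOutEntry(WithAllocatable &dummy) {
  destroyComponents(dummy);
  dummy = WithAllocatable{};
}
```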
When calling a statement function with a character actual argument with
a constant length mismatching the dummy length, HLFIR lowering created
an hlfir.declare with the actual argument length for the dummy, causing
bugs when lowering the statement function expression.
Ensure character dummies are always cast to the dummy type when lowering
dummy declarations.
Fixes https://github.com/llvm/llvm-project/issues/67658
The bug was that when instantiating a character array result variable,
the code inserted a cast from the result buffer to the proper array type
only if it could see a fir.unboxchar op. But this does not work for results
on the caller side. There, the fir.emboxchar is visible, so
charHelper.genUnboxChar() just takes the operand from it instead of
generating a fir.unboxchar, and no cast is inserted.
The fix is simply to move the cast to the place where fir.boxchar<>
arguments are dealt with. The cast when creating fir.emboxchar is also
removed: it adds noise and causes constant-length result types to be
lowered to fir.char<?>.
Most of this patch deals with the lit test fallout of
this cast move and removal.
Follow-up of https://github.com/llvm/llvm-project/pull/67693
- Zero initialize uninitialized components of saved derived-type entities
with a default initial value.
- Zero initialize uninitialized storage of common blocks with a member
with an initial value.
- Zero initialize uninitialized saved equivalences.
This removes all the cases where fir.global ops are created with an initial
value that results in an undef in LLVM for part of the global. Such undefs lead
to surprising LLVM optimizations at -O2 for Fortran folks who expect
their saved variables to be zero initialized if there is no explicit or
default initial value.
This is not standard but is widely expected by existing code.
This was implemented by https://reviews.llvm.org/D149877 for simple
scalars, but MLIR lacked a generic way to deal with aggregate types
(arrays and derived types).
Support was recently added in
https://github.com/llvm/llvm-project/pull/65508. Leverage it to zero
initialize all types.
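For comparison only (these are C++ language rules, not flang code): C++ already gives static-storage objects the behavior these Fortran codes expect. Everything is zero-initialized first, and then default member initializers are applied. The names below are illustrative.

```cpp
#include <cassert>

// A type with one default-initialized member and members without any
// initializer.
struct Record {
  int a = 5;   // like a component with a default initial value
  int b;       // no default initial value
  double c[3]; // no default initial value
};

// Objects with static storage duration (the closest C++ analogue of
// saved variables) are zero-initialized before any other
// initialization, so `b` and `c` are 0 while `a` still gets 5.
Record savedRecord;
int savedCounter;
```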