Reviewed in #152379
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
The descriptor for derived-type with CUDA components are allocated in
managed memory. The lowering was calling the standard runtime on
allocate statement where it should be a `cuf.allocate` operation.
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch replaces std::nullopt with {}. There are a couple of
places where std::nullopt is replaced with TypeRange() to accommodate
perfect forwarding.
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
New tentative with some fix. The previous was reverted some time ago.
Reviewed in #138010
When lowering allocatables, the generated calls to runtime functions
were not using the runtime::createArguments utility which handles the
required conversions. createArguments is where I added the implicit
volatile casts to handle converting volatile variables to the
appropriate type based on their volatility in the callee. Because the
calls to allocatable runtime functions were not using this function,
their arguments were not casted to have the appropriate volatility.
Add a test to demonstrate that volatile and allocatable
class/box/reference types are appropriately casted before calling into
the runtime library.
Instead of using a recursive variadic template to perform the
conversions in createArguments, map over the arguments directly so that
createArguments can be called with an ArrayRef of arguments. Some cases
in Allocatable.cpp already had a vector of values at the point where
createArguments needed to be called - the new overload allows calling
with a vector of args or the variadic version with each argument spelled
out at the callsite.
This change resulted in the allocatable runtime calls having their
arguments converted left-to-right, which changed some of the test
results. I used CHECK-DAG to ignore the order.
Add some missing handling of volatile class entities, which I previously
missed because I had not yet enabled volatile class entities in Lower.
When `PINNED=` is used with variables that don't have the `PINNED`
attribute, the logical value must be set to false when host allocation
is performed.
Introduce cuf.sync_descriptor to be used to sync device global
descriptor after pointer association.
Also move CUFCommon so it can be used in FIRBuilder lib as well.
This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
ALLOCATE and DEALLOCATE statements can be inlined in device function.
This patch updates the condition that determined to inline these actions
in lowering.
This avoid runtime calls in device function code and can speed up the
execution.
Also move `isCudaDeviceContext` from `Bridge.cpp` so it can be used
elsewhere.
This fixes a bug in OpenMP privatisation. The privatised variables are
created as though they are host associated clones of the original
variables. These privatised variables do not contain the allocatable
attribute themselves and so we need to check if the ultimate symbol is
allocatable. Having or not having this flag influences whether lowering
determines that this is a whole allocatable assignment, which then
causes hlfir.assign not to get the realloc flag, which cases the
allocatable not to be allocated when it is assigned to (leading to a
segfault running the newly added test).
I also did the same for pointer variables because I would imagine they
could experience the same issue.
There is no fallout on tests outside of OpenMP, and the gfortran test
suite still passes, so I think this doesn't break host other kinds of
host associated symbols.
Lower allocatable and pointers specification parts. Nothing special is
required to allocate the descriptor given they are required to be dummy
arguments, however, care must be taken with INTENT(OUT) to use the
runtime to deallocate them (inlined fir.embox + store is not possible).
The number of operations dedicated to CUF grew and where all still in
FIR. In order to have a better organization, the CUF operations,
attributes and code is moved into their specific dialect and files. CUF
dialect is tightly coupled with HLFIR/FIR and their types.
The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them and the CUF dialect depends on the
FIR types. Without having the attributes into a separate library there
would be a dependency cycle.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc754c74a49fe84ed2640e269c25414087.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
Automatic deallocation of allocatable that are cuda device variable must
use the fir.cuda_deallocate operation. This patch update the automatic
deallocation code generation to use this operation when the variable is
a cuda variable.
This patch has also the side effect to correctly call
`attachDeclarePostDeallocAction` for OpenACC declare variable on
automatic deallocation as well. Update the code in
`attachDeclarePostDeallocAction` so we do not attach on fir.result but
on the correct last op.
Automatic deallocation of allocatable that are cuda device variable must
use the fir.cuda_deallocate operation. This patch update the automatic
deallocation code generation to use this operation when the variable is
a cuda variable.
Replace the runtime call to `AllocatableDeallocate` for CUDA device
variable to the newly added `fir.cuda_deallocate` operation.
This is similar with #88980
A third patch will handle the case of automatic dealloctaion of device
allocatable variables
Allocate statement for variable with CUDA attributes need to allocate
memory on the device and not the host. Add a proper TODO so we keep
track of work to be done for it.
Flang supports source allocation to allocatable or pointers with a non
deferred length that do not match the source length. This documented at:
9708d09003/flang/docs/Extensions.md (L312)
The current lowering code was bugged when such explicit length allocate
object appeared after a deferred length object in the source allocation
list:
Since "lenParams" had been computed when generating allocation of the
deferred length object, the call to genSetDeferredLengthParameters was
not a no-op on when lowering the explicit length allocation, and the
explicit length was overridden with the source length.
The output of the program added in test was:
```
ZZheZZ
ZZhelloZZ
ZZhelloZZ
```
Instead of:
```
ZZheZZ
ZZhelloZZ
ZZhello ZZ
```
Skip genSetDeferredLengthParameters when the allocate object has non
deferred length.
A DEALLOCATE statement on a pointer should always use
PointerDeallocate() in the runtime, even if there's no STAT= or
polymorphism or derived types, so that it can be checked to ensure that
it is indeed a whole allocation of a pointer.
The `acc.declate_action` attribute was sometime misplaced as reported in
#79770.
This patch updates the lowering code to place the
postAllocate/postDeallocate actions at the correct place.
The standard requires a compiler to diagnose an incorrect use of a
pointer in a DEALLOCATE statement. The pointer must be associated with
an entire object that was allocated as a pointer (not allocatable) by an
ALLOCATE statement.
Implement by appending a validation footer to pointer allocations. This
is an extra allocated word that encodes the base address of the
allocation. If it is not found after the data payload when the pointer
is deallocated, signal an error. There is a chance of a false positive
result, but that should be vanishingly unlikely.
This change requires all pointer allocations (not allocatables) to take
place in the runtime in PointerAllocate(), which might be slower in
cases that could otherwise be handled with a native memory allocation
operation. I believe that memory allocation of pointers is less common
than with allocatables, which are not affected. If this turns out to
become a performance problem, we can inline the creation and
initialization of the footer word.
Fixes https://github.com/llvm/llvm-project/issues/78391.
There are currently several places that automatically deallocate
allocatble if they are allocated:
- INTENT(OUT) allocatable are deallocated on entry in the callee
- INTENT(OUT) allocatable are also deallocated on the caller side of
BIND(C) function in case the implementation is in C.
- Results of function returning allocatable are deallocated after usage.
- OPENMP privatized allocatable are deallocated at the end of OPENMP
region.
Introduce genDeallocateIfAllocated that centralize all this code, except
for the function return that use genFreememIfAllocated since
finalization is done separately currently.
`fir:🏭:genFinalization` and
`fir:🏭:genInlinedDeallocation` are removed and replaced by
genFreemem since their name were misleading: finalization was not
called.
There is a fallout in the tests because previous generated code did not
check the allocated status when doing inline deallocation. This was OK
since free(null) is guaranteed to be a no-op, but this makes compiler
code more complex, is a bit surprising in the generated IR IMHO, and it
relied on knowing when genDeallocateBox inserts runtime calls or uses
inlined code.
This patches adds the acc.declare_action attrbites on
post allocate operation and pre/post deallocate operations.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D157915
Begin upstreaming of CUDA Fortran support in LLVM Flang.
This first patch implements parsing for CUDA Fortran syntax,
including:
- a new LanguageFeature enum value for CUDA Fortran
- driver change to enable that feature for *.cuf and *.CUF source files
- parse tree representation of CUDA Fortran syntax
- dumping and unparsing of the parse tree
- the actual parsers for CUDA Fortran syntax
- prescanning support for !@CUF and !$CUF
- basic sanity testing via unparsing and parse tree dumps
... along with any minimized changes elsewhere to make these
work, mostly no-op cases in common::visitors instances in
semantics and lowering to allow them to compile in the face
of new types in variant<> instances in the parse tree.
Because CUDA Fortran allows the kernel launch chevron syntax
("call foo<<<blocks, threads>>>()") only on CALL statements and
not on function references, the parse tree nodes for CallStmt,
FunctionReference, and their shared Call were rearranged a bit;
this caused a fair amount of one-line changes in many files.
More patches will follow that implement CUDA Fortran in the symbol
table and name resolution, and then semantic checking.
Differential Revision: https://reviews.llvm.org/D150159
Currently, local allocatables and contiguous/scalar pointers (and some
other conditions) are lowered to a set of independent variables in FIR
(one for the address, one for each bound and one for character length).
The intention was to help LLVM get rids of descriptors. But LLVM knows
how to do that anyway in those cases:
```
subroutine foo(x)
real, target :: x(100)
real, pointer, contiguous :: p(:)
p => x
call bar(p(50))
end subroutine
```
The output fir the option on or off is the same after llvm opt -O1,
there is no descriptor anymore, the indirection is removed.
```
define void @foo_(ptr %0) local_unnamed_addr {
%2 = getelementptr [100 x float], ptr %0, i64 0, i64 49
tail call void @bar_(ptr %2)
ret void
}
```
So the benefit of not using a descriptor in lowering is questionable,
and although it is abstracted as much as possible in the so called
MutableBoxValue class that represent allocatable/pointer in lowering
it is still causing bugs from time to time, and will also be a bit
problematic when emitting debug info for the pointer/allocatable.
In HLFIR lowering, the simplification to always use a descriptor in
lowering was already made. This patch allows decorrelating the impact
from this change from the bigger impact HLFIR will have so that it
is easier to get feedback if this causes performance issues.
The lowering tests relying on the previous behavior are only updated
to set back this option to true. The reason is that I think we should
preserve coverage of the code dealing with the "non descriptor" approach
in lowering until we actually get rid of it. The other reason is that
the test will have to be or are already covered by equivalent HLFIR
tests, which use descriptors.
Differential Revision: https://reviews.llvm.org/D148910
Update lowering of allocate statement to use the new
functions defined in D146290.
Depends on D146290
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D146291
We were failing tests where an ALLOCATE statement that allocated an
array had a non-character scalar MOLD argument.
I fixed this by merging the code for ALLOCATE statements with MOLD and
SOURCE arguments.
Differential Revision: https://reviews.llvm.org/D145418