…an device arrays (#185984)"
This reverts commit fb18d570b0466ca2a401aba11d6e58b206aebc1a.
This PR caused compilation failures with allocatable arrays, reverting
now for more investigation.
When CUDA Fortran device arrays are listed in an OpenMP private clause,
the compiler previously allocated private copies on the host heap using
fir.allocmem. This caused device-side operations to receive host
pointers instead of device pointers, leading to cudaErrorIllegalAddress
(700).
Fix by detecting symbols with a CUDA data attribute (device, managed,
unified, etc.) during privatization and using cuf.alloc / cuf.free
instead of fir.allocmem / fir.freemem, so the private copies reside in
device memory.
The LOC intrinsic allows a cray pointer to alias with ordinary variables
with no other attribute. See the new test for an example.
This is not enabled by default. The functionality can be used with
`-mmlir -funsafe-cray-pointers`.
First part of the un-revert of #169544. That will handle TBAA.
CHARACTER dummy arguments were treated as local variables in debug info.
This happened because our method to get the argument number was not
robust. It relied on `DeclareOp` having a direct reference to arguments
which was not the case for character arguments. This is fixed by storing
source-level argument positions in `DeclareOp`.
Fixes#112886
The purpose of this patch is to allow converting FIR array representation to
memref when possible without hitting memref verifier issue.
The issue was that FIR arrays may be assumed size, in which case the
last dimension will not be known at runtime. Flang uses -1 to encode
this to fulfill Fortran 2023 standard requirements in 18.5.3 point 5
about CFI_desc_t.
When arrays are converted to memeref, if this `-1` reaches memeref
operations, it triggers verifier errors (even if the conversion happened
in code that guards the code to be entered at runtime if the array is
assumed-size because folders/verifiers do not take into account
reachability).
This follows-up on discussions in #163505 merge requests
New implementation of `MayNeedCopy()` is used to consolidate
copy-in/copy-out checks.
`IsAssumedShape()` and `IsAssumedRank()` were simplified and are both
now in `Fortran::semantics` workspace.
`preparePresentUserCallActualArgument()` in lowering was modified to use
`MayNeedCopyInOut()`
Fixes https://github.com/llvm/llvm-project/issues/138471
Reviewed in #152379
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
The descriptor for derived-type with CUDA components are allocated in
managed memory. The lowering was calling the standard runtime on
allocate statement where it should be a `cuf.allocate` operation.
ArrayRef(std::nullopt_t) has been deprecated. This patch replaces
std::nullopt with {}.
A subsequence patch will address those places where we need to replace
std::nullopt with mlir::TypeRange{} or mlir::ValueRange{}.
Reland #145901 with a fix for shared library builds.
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.
This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.
This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.
Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.
I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.
This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.
This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.
Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.
I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch replaces std::nullopt with {}. There are a couple of
places where std::nullopt is replaced with TypeRange() to accommodate
perfect forwarding.
Support odd case where a static object is being declared both as a
common block and a BIND(C) module variable name in different modules,
and both modules are used in the same compilation unit.
This is not standard, but happens when using MPI and MPI_F08 in the same
compilation unit, and at least both gfortran and ifx support this.
See added test case for an illustration.
`cuf.alloc` and `cuf.free` are converted to `fir.alloca` or deleted when
in device context during the CUFOpConversion pass. Do not generate them
in lowering to avoid confusion.
Fixes#108136
In #108136 (the new testcase), flang was missing the length parameter
required for the variable length string when boxing the global variable.
The code that is initializing global variables for OpenMP did not
support types with length parameters.
Instead of duplicating this initialization logic in OpenMP, I decided to
use the exact same initialization as is used in the base language
because this will already be well tested and will be updated for any new
types. The difference for OpenMP is that the global variables will be
zero initialized instead of left undefined.
Previously `Fortran::lower::createGlobalInitialization` was used to
share a smaller amount of the logic with the base language lowering. I
think this bug has demonstrated that helper was too low level to be
helpful, and it was only used in OpenMP so I have made it static inside
of ConvertVariable.cpp.
This patch defines `fir::SafeTempArrayCopyAttrInterface` and the
corresponding
OpenACC/OpenMP related attributes in FIR dialect. The actual
implementations
are just placeholders right now, and array repacking becomes a no-op
if `-fopenacc/-fopenmp` is used for the compilation.
TARGET dummy arrays can be accessed indirectly, so it is unsafe
to repack them.
INTENT(OUT) dummy arrays that require finalization on entry
to their subroutine must be copied-in by `fir.pack_arrays`.
In addition, based on my testing results, I think it will be useful
to document that `LOC` and `IS_CONTIGUOUS` will have different values
for the repacked arrays. I still need to decide where to document
this, so just added a note in the design doc for the time being.
Implement handling of `NULL()` RHS, polymorphic pointers, as well as
lower bounds or bounds remapping in pointer assignment inside FORALL.
These cases eventually do not require updating hlfir.region_assign,
lowering can simply prepare the new descriptor for the LHS inside the
RHS region.
Looking more closely at the polymorphic cases, there is not need to call
the runtime, fir.rebox and fir.embox do handle the dynamic type setting
correctly.
After this patch, the last remaining TODO is the allocatable assignment
inside FORALL, which like some cases here, is more likely an accidental
feature given FORALL was deprecated in F2003 at the same time than
allocatable components where added.
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.
Note: this relands #114002 with the fix for the LLVM timeout regressions that have been seen. The fix is to use the added fir.copy to avoid aggregate load/store.
Co-authored-by: NimishMishra <42909663+NimishMishra@users.noreply.github.com>
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.
Dummy arguments from other entry statement that are not live in the current entry have no backing storage, user code referring to them is not allowed to be reached. The compiler was generating initialization/destruction code for them when INTENT(OUT), causing undefined behaviors.