187 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
eb0ddba26b
Reland "[flang][cuda] Set the allocator of derived type component after allocation" (#152418)
Reviewed in #152379
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
2025-08-06 21:49:55 -07:00
Valentin Clement (バレンタイン クレメン)
7d3134f6cc
Revert "[flang][cuda] Set the allocator of derived type component after allocation" (#152402)
Reverts llvm/llvm-project#152379

Buildbot failure
https://lab.llvm.org/buildbot/#/builders/207/builds/4905
2025-08-06 15:55:53 -07:00
Valentin Clement (バレンタイン クレメン)
d897355876
[flang][cuda] Set the allocator of derived type component after allocation (#152379)
- Move the allocator index set up after the allocate statement otherwise
the derived type descriptor is not allocated.
- Support array of derived-type with device component
2025-08-06 15:14:00 -07:00
Valentin Clement (バレンタイン クレメン)
9b195dc3ef
[flang][cuda] Generate cuf.allocate for descriptor with CUDA components (#152041)
The descriptor for derived-type with CUDA components are allocated in
managed memory. The lowering was calling the standard runtime on
allocate statement where it should be a `cuf.allocate` operation.
2025-08-04 16:51:11 -07:00
Valentin Clement (バレンタイン クレメン)
05b52ef909
[flang][cuda][NFC] Update to the new create APIs (#152050)
Some operation creations were updated in flang directory but not all.
Migrate the CUF ops to the new create APIs introduce in #147168
2025-08-04 16:09:24 -07:00
Maksim Levental
a3a007ad5f
[mlir][NFC] update flang/Lower create APIs (8/n) (#149912)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-21 19:54:29 -04:00
Valentin Clement (バレンタイン クレメン)
6c63316ee1
[flang][cuda] Support device component in a pointer or allocatable derived-type (#149418) 2025-07-18 04:17:15 -07:00
Kazu Hirata
2a7328daca
[flang] Migrate away from ArrayRef(std::nullopt_t) (#149337)
ArrayRef(std::nullopt_t) has been deprecated.  This patch replaces
std::nullopt with {}.

A subsequence patch will address those places where we need to replace
std::nullopt with mlir::TypeRange{} or mlir::ValueRange{}.
2025-07-17 15:23:55 -07:00
Valentin Clement (バレンタイン クレメン)
8349bbd0b9
[flang][cuda] Exit early when there is no device components (#149005)
- Exit early when there is no device components
- Make the retrieval of the record type more robust
2025-07-16 10:08:11 -07:00
Valentin Clement (バレンタイン クレメン)
b4e2272271
[flang][cuda] Move cuf.set_allocator_idx after derived-type init (#148936)
Derived type initialization overwrite the component descriptor. Place
the `cuf.set_allocator_idx` after the initialization is performed.
2025-07-15 13:52:00 -07:00
Kazu Hirata
769bd90f8b [flang] Fix a warning
This patch fixes:

  flang/lib/Lower/ConvertVariable.cpp:787:22: error: unused variable
  'fieldTy' [-Werror,-Wunused-variable]
2025-07-14 22:17:12 -07:00
Valentin Clement (バレンタイン クレメン)
90ef114a33
[flang][cuda] Add cuf.set_allocator_idx for device component (#148750) 2025-07-14 19:31:44 -07:00
Valentin Clement (バレンタイン クレメン)
659c8102f4
Reland [flang][cuda] Allocate derived-type with CUDA component in anaged memory (#147416) 2025-07-07 17:40:04 -07:00
Valentin Clement
a5350785db Revert "[flang][cuda] Allocate derived-type with CUDA componement in managed memory (#146797)"
This reverts commit 925588cd001a91d592b99e6e7c6bee9514f5a26e.
2025-07-02 17:51:45 -07:00
Valentin Clement (バレンタイン クレメン)
925588cd00
[flang][cuda] Allocate derived-type with CUDA componement in managed memory (#146797)
Similarly to descriptor for device data, put derived type holding device
descriptor in managed memory.
2025-07-02 16:02:08 -07:00
jeanPerier
faefe7cf7d
[flang] add option to generate runtime type info as external (#146071)
Reland #145901 with a fix for shared library builds.

So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.

This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.

This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.

Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.

I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
2025-06-30 09:58:00 +02:00
jeanPerier
37e2d10499
Revert "[flang] add option to generate runtime type info as external" (#146064)
Reverts llvm/llvm-project#145901

Broke shared library builds because of the usage of
`skipExternalRttiDefinition` in Lowering.
2025-06-27 14:05:59 +02:00
jeanPerier
e816817bbb
[flang] add option to generate runtime type info as external (#145901)
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.

This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.

This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.

Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.

I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
2025-06-27 13:00:29 +02:00
Kazu Hirata
938cdb30f1
[flang] Migrate away from std::nullopt (NFC) (#145928)
ArrayRef has a constructor that accepts std::nullopt.  This
constructor dates back to the days when we still had llvm::Optional.

Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.

This patch replaces std::nullopt with {}.  There are a couple of
places where std::nullopt is replaced with TypeRange() to accommodate
perfect forwarding.
2025-06-26 12:41:49 -07:00
jeanPerier
13daf65656
[flang] handle common block used as BIND(C) module variables (#145669)
Support odd case where a static object is being declared both as a
common block and a BIND(C) module variable name in different modules,
and both modules are used in the same compilation unit.

This is not standard, but happens when using MPI and MPI_F08 in the same
compilation unit, and at least both gfortran and ifx support this.

See added test case for an illustration.
2025-06-26 12:00:23 +02:00
Valentin Clement (バレンタイン クレメン)
d6d52a4abc
[flang][cuda] Do not generate cuf.alloc/cuf.free in device context (#141117)
`cuf.alloc` and `cuf.free` are converted to `fir.alloca` or deleted when
in device context during the CUFOpConversion pass. Do not generate them
in lowering to avoid confusion.
2025-05-22 13:30:34 -07:00
Tom Eccles
75e5643abf
[flang][OpenMP] share global variable initialization code (#138672)
Fixes #108136

In #108136 (the new testcase), flang was missing the length parameter
required for the variable length string when boxing the global variable.
The code that is initializing global variables for OpenMP did not
support types with length parameters.

Instead of duplicating this initialization logic in OpenMP, I decided to
use the exact same initialization as is used in the base language
because this will already be well tested and will be updated for any new
types. The difference for OpenMP is that the global variables will be
zero initialized instead of left undefined.

Previously `Fortran::lower::createGlobalInitialization` was used to
share a smaller amount of the logic with the base language lowering. I
think this bug has demonstrated that helper was too low level to be
helpful, and it was only used in OpenMP so I have made it static inside
of ConvertVariable.cpp.
2025-05-07 10:18:13 +01:00
Slava Zakharin
9aff19e7a3
[flang] Defined SafeTempArrayCopyAttrInterface for array repacking. (#134346)
This patch defines `fir::SafeTempArrayCopyAttrInterface` and the
corresponding
OpenACC/OpenMP related attributes in FIR dialect. The actual
implementations
are just placeholders right now, and array repacking becomes a no-op
if `-fopenacc/-fopenmp` is used for the compilation.
2025-04-10 18:41:54 -07:00
Slava Zakharin
3f6ae3f0a8
[flang] Added driver options for arrays repacking. (#134002)
Added options:
  * -f[no-]repack-arrays
  * -f[no-]stack-repack-arrays
  * -frepack-arrays-contiguity=whole/innermost
2025-04-03 10:43:28 -07:00
Slava Zakharin
2c91f10362
[flang] Fixed repacking for TARGET and INTENT(OUT) (#131972)
TARGET dummy arrays can be accessed indirectly, so it is unsafe
to repack them.
INTENT(OUT) dummy arrays that require finalization on entry
to their subroutine must be copied-in by `fir.pack_arrays`.

In addition, based on my testing results, I think it will be useful
to document that `LOC` and `IS_CONTIGUOUS` will have different values
for the repacked arrays. I still need to decide where to document
this, so just added a note in the design doc for the time being.
2025-03-19 17:12:32 -07:00
Slava Zakharin
fd0e20a64b
[flang] Generate fir.pack/unpack_array in Lowering. (#131704)
Basic generation of array repacking operations in Lowering.
2025-03-18 21:26:33 -07:00
Valentin Clement (バレンタイン クレメン)
4fde8c341f
[flang][cuda] Lower CUDA shared variable with cuf.shared_memory op (#131399)
Use `cuf.shared_memory` operation instead of `cuf.alloc` for CUDA shared
variable. These variables do not need free operations.
2025-03-16 17:44:56 -07:00
jeanPerier
3ff3b29dd6
[flang] lower remaining cases of pointer assignments inside forall (#130772)
Implement handling of `NULL()` RHS, polymorphic pointers, as well as
lower bounds or bounds remapping in pointer assignment inside FORALL.

These cases eventually do not require updating hlfir.region_assign,
lowering can simply prepare the new descriptor for the LHS inside the
RHS region.

Looking more closely at the polymorphic cases, there is not need to call
the runtime, fir.rebox and fir.embox do handle the dynamic type setting
correctly.

After this patch, the last remaining TODO is the allocatable assignment
inside FORALL, which like some cases here, is more likely an accidental
feature given FORALL was deprecated in F2003 at the same time than
allocatable components where added.
2025-03-14 10:51:46 +01:00
jeanPerier
356bf3fa2d
Reland " [flang] Rely on global initialization for simpler derived types" (#130290)
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.

Note: this relands #114002 with the fix for the LLVM timeout regressions that have been seen. The fix is to use the added fir.copy to avoid aggregate load/store.

Co-authored-by: NimishMishra <42909663+NimishMishra@users.noreply.github.com>
2025-03-11 15:19:43 +01:00
Leandro Lupori
29f5d5bea9
[flang][OpenMP] Fix privatization of procedure pointers (#130336)
Fixes #121720
2025-03-11 09:38:40 -03:00
Tom Eccles
d31a7dde48
Revert " [flang] Rely on global initialization for simpler derived types" (#130278)
Reverts llvm/llvm-project#114002

This causes a regression building cam4_r from spec2017
2025-03-07 13:59:29 +00:00
Valentin Clement (バレンタイン クレメン)
2130285564
[flang][cuda] Make sure allocator id is set for pointer allocate (#129950) 2025-03-05 17:29:09 -08:00
NimishMishra
0ae1f0a310
[flang] Rely on global initialization for simpler derived types (#114002)
Currently, all derived types are initialized through `_FortranAInitialize`, which is functionally correct, but bears poor runtime performance. This patch falls back on global initialization for "simpler" derived types to speed up the initialization.
2025-03-05 05:44:51 -08:00
Valentin Clement (バレンタイン クレメン)
f3000d7d27
[flang][cuda] Do not trigger automatic deallocation in main (#128789)
Similar to host flow, do not trigger automatic deallocation at then end
of the main program since anything could happen like a
cudaDevcieReset().
2025-02-25 17:25:04 -08:00
jeanPerier
22d9726593
[flang] do not finalize or initialize unused entry dummy (#125482)
Dummy arguments from other entry statement that are not live in the current entry have no backing storage, user code referring to them is not allowed to be reached. The compiler was generating initialization/destruction code for them when INTENT(OUT), causing undefined behaviors.
2025-02-03 18:09:01 +01:00
Kiran Chandramohan
ce32625966
Reland "[Flang][Driver] Add a flag to control zero initialization" (#123606)
Reverts llvm/llvm-project#123330
2025-01-21 07:57:44 +00:00
Kiran Chandramohan
8a229f595a
Revert "Revert "Revert "[Flang][Driver] Add a flag to control zero initializa…" (#123330)
Reverts llvm/llvm-project#123097

Reverting due to buildbot failure
https://lab.llvm.org/buildbot/#/builders/89/builds/14577.
2025-01-17 12:27:58 +00:00
Kiran Chandramohan
8c63648117
Revert "Revert "[Flang][Driver] Add a flag to control zero initializa… (#123097)
…tion of global v…" (#123067)"

This reverts commit 44ba43aa2b740878d83a9d6f1d52a333c0d48c22.

Adds the flag to bbc as well.
2025-01-17 12:14:20 +00:00
Kiran Chandramohan
44ba43aa2b
Revert "[Flang][Driver] Add a flag to control zero initialization of global v…" (#123067)
Reverts llvm/llvm-project#122144

Reverting due to CI failure
https://lab.llvm.org/buildbot/#/builders/89/builds/14422
2025-01-15 15:23:34 +00:00
Kiran Chandramohan
c593e3d0f7
[Flang][Driver] Add a flag to control zero initialization of global v… (#122144)
…ariables

Patch adds a flag to control zero initialization of global variables
without default initialization. The default is to zero initialize.
2025-01-15 15:06:57 +00:00
Leandro Lupori
1fcb6a9754
[flang][OpenMP] Initialize allocatable members of derived types (#120295)
Allocatable members of privatized derived types must be allocated,
with the same bounds as the original object, whenever that member
is also allocated in it, but Flang was not performing such
initialization.

The `Initialize` runtime function can't perform this task unless
its signature is changed to receive an additional parameter, the
original object, that is needed to find out which allocatable
members, with their bounds, must also be allocated in the clone.
As `Initialize` is used not only for privatization, sometimes this
other object won't even exist, so this new parameter would need
to be optional.
Because of this, it seemed better to add a new runtime function:
`InitializeClone`.
To avoid unnecessary calls, lowering inserts a call to it only for
privatized items that are derived types with allocatable members.

Fixes https://github.com/llvm/llvm-project/issues/114888
Fixes https://github.com/llvm/llvm-project/issues/114889
2024-12-19 17:26:50 -03:00
Michael Kruse
c91ba04328
[Flang][NFC] Split runtime headers in preparation for cross-compilation. (#112188)
Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers in
runtime-private include directory will occur in #110298.

* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang's running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.

* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.

* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.

* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.

* Move allocator-kind magic numbers into common header. They are the
only declarations out of `allocator-registry.h` in the compiler as well.
 
This does not make Flang cross-compile ready yet, the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more assumptions that host platform is the same as the target platform.
2024-12-06 15:29:00 +01:00
jeanPerier
cf602b95d1
[flang] handle fir.call in AliasAnalysis::getModRef (#117164)
fir.call side effects are hard to describe in a useful way using
`MemoryEffectOpInterface` because it is impossible to list which memory
location a user procedure read/write without doing a data flow analysis
of its body (even PURE procedures may read from any module variable,
Fortran SIMPLE procedure from F2023 will allow that, but they are far
from common at that point).

Fortran language specifications allow the compiler to deduce
that a procedure call cannot access a variable in many cases 
This patch leverages this to extend `fir::AliasAnalysis::getModRef` to
deal with fir.call.

This will allow implementing "array = array_function()" optimization in
a future patch.
2024-11-26 11:17:33 +01:00
jeanPerier
bb8bf858e8
[flang] add internal_assoc flag to mark variable captured in internal procedure (#117161)
This patch adds a flag to mark hlfir.declare of host variables that are
captured in some internal procedure.

It enables implementing a simple fir.call handling in
fir::AliasAnalysis::getModRef leveraging Fortran language specifications
and without a data flow analysis.

This will allow implementing an optimization for "array =
array_function()" where array storage is passed directly into the hidden
result argument to "array_function" when it can be proven that
arraY_function does not reference "array".

Captured host variables are very tricky because they may be accessed
indirectly in any calls if the internal procedure address was captured
via some global procedure pointer. Without flagging them, there is no
way around doing a complex inter procedural data flow analysis:
- checking that the call is not made to an internal procedure is not
enough because of the possibility of indirect calls made to internal
procedures inside the callee.
- checking that the current func.func has no internal procedure is not
enough because this would be invalid with inlining when an procedure
with internal procedures is inlined inside a procedure without internal
procedure.
2024-11-26 09:21:13 +01:00
Scott Manley
e6a4346b5a
[flang] add getElementType() to fir::SquenceType and fir::VectorType (#112770)
getElementType() was missing from Sequence and Vector types. Did a
replace of the obvious places getEleTy() was used for these two types
and updated to use this name instead.

Co-authored-by: Scott Manley <scmanley@nvidia.com>
2024-10-18 09:29:25 +02:00
jeanPerier
ccca3c6371
[flang] enable assumed-rank lowering by default (#110893)
Aside from a minor TODO about polymorphic RANK(*) (2b8e81ce91/flang/lib/Lower/Bridge.cpp (L3459)),
the implementation for assumed-rank is ready for everyone to use.
2024-10-04 09:04:56 +02:00
jeanPerier
c4204c0b29
[flang] replace fir.complex usages with mlir complex (#110850)
Core patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292.
After that, the last step is to remove fir.complex from FIR types.
2024-10-03 17:10:57 +02:00
Valentin Clement (バレンタイン クレメン)
0dcd68c28a
[flang][cuda] Set allocator index for module allocatable variable (#106777)
Descriptor for module variable with cuda attribute must be set with the
correct allocator index. This patch updates the embox operation used in
the global to carry the allocator index.
2024-08-30 14:55:09 -07:00
Valentin Clement (バレンタイン クレメン)
5bb379f6f0
[flang][cuda] Fix allocation of descriptor for cray pointer (#103474)
The cray pointee descriptor with device attribute was not allocated with
cuf.alloc so it leads to error on deallocation with cuf.free.
2024-08-13 16:52:00 -07:00
Valentin Clement (バレンタイン クレメン)
388b63243c
[flang][cuda] Defined allocator for unified data (#102189)
CUDA unified variable where set to use the same allocator than managed
variable. This patch adds a specific allocator for the unified
variables. Currently it will call the managed allocator underneath but
we want to have the flexibility to change that in the future.
2024-08-06 14:30:31 -07:00