Rework derived type initialization in the runtime to just initialize the
first element of any array, and then memcpy it to the others, rather
than exercising the per-component paths for each element.
Reword derived type destruction in the runtime to detect and exploit a
fast path for allocatable components whose types themselves don't need
nested destruction.
Small tweaks were made in hot paths exposed by profiling in descriptor
operations and derived type assignment.
After the recent move to work queues, in certain cases when linking in
the fortran runtime built for offload on AMDGPU as required in certain
cases, we'll get missing symbols when linking. This PR tries to address
this issue by encompassing more of the library in
RT_OFFLOAD_API_GROUP_BEGIN, which has the affect of compiling these
functions for AMDGPU, resolving the missing symbols.
This PR should address the following issue:
https://github.com/llvm/llvm-project/issues/145888
Followup to https://github.com/llvm/llvm-project/pull/143508
This required adding another alternative implementation of time
intrinsics to match what is available in older MacOS.
With this change, flang can be used to build programs for older versions
of MacOS.
Co-authored-by: David Truby <david.truby@arm.com>
The Offload and Flang-RT had the ability to compile GTest themselves.
But in bootstrapping builds, LLVM_LIBRARY_OUTPUT_INTDIR points to the
same location as the stage1 build. If both are building GTest, they
everwrite each others `libllvm_gtest.a` and `libllvm_test_main.a` which
causes #143134.
This PR removes the ability for the Offload/Flang-RT runtimes to build
their own GTest and instead relies on the stage1 build of GTest. This
was already the case with LLVM_INSTALL_GTEST=ON configurations. For
LLVM_INSTALL_GTEST=OFF configurations, we now also export gtest into the
buildtree configuration. Ultimately, this reduces combinatorial
explosion of configurations in which unittests could be built
(LLVM_INSTALL_GTEST=ON, GTest built by Offload, GTest built by Flang-RT,
GTest built by Offload and also used by Flang-RT).
GTest and therefore Offload/Runtime unittests will not be available if
the runtimes are configured against an LLVM install tree. Since llvm-lit
isn't available in the install tree either, it doesn't matter.
Note that compiler-rt and libc also use GTest in non-default
configrations. libc also depends on LLVM's GTest build (and would
error-out if unavailable), but compiler-rt builds it completely
different.
Fixes#143134
The recent change of `flang-rt` has code like `std::size_t
offset{offset_};`.
It broke the 32-bit `flang-rt` build because `Component::offset_` is of
type `uint64_t` but `size_t` varies.
Clang complains
```
error: non-constant-expression cannot be narrowed from type 'std::uint64_t' (aka 'unsigned long long') to 'std::size_t' (aka 'unsigned long') in initializer list [-Wc++11-narrowing]
143 | std::size_t offset{offset_};
| ^~~~~~~
```
This patch is to use the consistent `uint64_t` for offset.
Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually so) routines in the Fortran runtime with new
implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, initializeClone,
Finalize, and Destroy.
Default derived type I/O is also recursive, but already disabled. It can
be added to this new framework later if the overall approach succeeds.
Note that derived type FINAL subroutine calls, defined assignments, and
defined I/O procedures all perform callbacks into user code, which may
well reenter the runtime library. This kind of recursion is not handled
by this change, although it may be possible to do so in the future using
thread-local work queues.
(Relanding this patch after reverting initial attempt due to some test
failures that needed some time to analyze and fix.)
Fixes https://github.com/llvm/llvm-project/issues/142481.
The SOURCE= expression of an ALLOCATE statement, when present and not
scalar, must conform to the shape of the allocated objects. Check this
at runtime, and return a recoverable error, or crash, when appropriate.
Fixes https://github.com/llvm/llvm-project/issues/143900.
Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually so) routines in the Fortran runtime with new
implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, initializeClone,
Finalize, and Destroy.
Default derived type I/O is also recursive, but already disabled. It can
be added to this new framework later if the overall approach succeeds.
Note that derived type FINAL subroutine calls, defined assignments, and
defined I/O procedures all perform callbacks into user code, which may
well reenter the runtime library. This kind of recursion is not handled
by this change, although it may be possible to do so in the future using
thread-local work queues.
The effects of this restructuring on CPU performance are yet to be
measured.
In a test like:
```
integer, allocatable, device :: da(:)
allocate(a(200))
a = 2
da = a ! da is not allocated before data transfer is initiated. Allocate it with a
```
The reference compiler will allocate the data for the `da` descriptor so
the data transfer can be done properly.
When an assignment to a derived type allocatable requires
(re)allocation, its type may change to that of the right-hand side. The
code didn't update its derived type pointer, leading to the wrong type
being put into the descriptors created for elemental defined assignment
subroutine calls.
Fixes https://github.com/llvm/llvm-project/issues/141835.
FORMAT("J=",I3) is accepted by a few other Fortran compilers as a valid
format for input as well as for output. The character string edit
descriptor "J=" is interpreted as if it had been 2X on input, causing
two characters to be skipped over. The skipped characters don't have to
match the characters in the literal string. An optional warning is
emitted under control of the -pedantic option.
Not explicitly defining the default case for ShallowCopy* functions does
not meet the requirements for gcc to actually instantiate the templates,
leading to build errors that show up with gcc but not with clang.
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Using Descriptor.Element<>() when iterating through a rank-1 array is
currently inefficient, because the generic implementation suitable for
arrays of any rank makes the compiler unable to perform optimisations
that would make the rank-1 case considerably faster.
This is currently done inside ShallowCopy, as well as by CopyInAssign,
where the implementation of elemental copies (inside Assign) is
equivalent to ShallowCopyDiscontiguousToDiscontiguous.
To address that, add a DescriptorIterator abstraction specialised for
arrays of various ranks, and use that throughout ShallowCopy to iterate
over the arrays.
Furthermore, depending on the pointer type passed to memcpy, the
optimiser can remove the memcpy calls from ShallowCopy altogether which
can result in substantial performance improvements on its own.
Specialise ShallowCopy for various element pointer types to make these
optimisations possible.
Finally, replace the call to Assign inside CopyInAssign with a call to
newly optimised ShallowCopy.
For the thornado-mini application, this reduces the runtime by 27.7%.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
New tentative with some fix. The previous was reverted some time ago.
Reviewed in #138010
This patch fixes:
flang-rt/include/flang-rt/runtime/emit-encoded.h:67:27: error:
implicit conversion from 'const char16_t' to 'char32_t' may change
the meaning of the represented code unit
[-Werror,-Wcharacter-conversion]
flang-rt/lib/runtime/edit-input.cpp:1114:18: error: implicit
conversion from 'char32_t' to 'char16_t' may lose precision and
change the meaning of the represented code unit
[-Werror,-Wcharacter-conversion]
flang-rt/lib/runtime/edit-input.cpp:1133:18: error: implicit
conversion from 'char32_t' to 'char16_t' may lose precision and
change the meaning of the represented code unit
[-Werror,-Wcharacter-conversion]
flang-rt/lib/runtime/edit-input.cpp:1033:14: error: implicit
conversion from 'char32_t' to 'char16_t' may lose precision and
change the meaning of the represented code unit
[-Werror,-Wcharacter-conversion]
flang-rt/lib/runtime/edit-input.cpp:986:14: error: implicit
conversion from 'char32_t' to 'char16_t' may lose precision and
change the meaning of the represented code unit
[-Werror,-Wcharacter-conversion]
When an assignment to a polymorphic allocatable changes its type to an
intrinsic type, be sure to reset its descriptor's derived type pointer
to null.
Fixes https://github.com/llvm/llvm-project/issues/136522.
It's the end of an era. The IRC channel was previously where the
community gathered to discuss technical topics but is now a ghost town
where the primary activity is moderators (me) kickbanning the same
individual dozens of times a day for CoC violations and the secondary
activity is telling the occasional person to come to Discord for help.
The number of people engaging on IRC for the community's intended
purposes seems to be roughly one person a month.
So this removes all remaining mentions of IRC from our documentation so
that it no longer appears to be an "official" channel for communicating
with the community. It also removes IRC handles from the various
maintainers lists, since those would stand out as confusing
anachronisms.
The IRC channel topic already recommends people come to the Discord
server. There is no way to "shut down" an IRC channel such that it no
longer exists, so the channel will continue to exist on OFTC, but will
be unmoderated.
(This was previously discussed in https://discourse.llvm.org/c/llvm/5
but some mentions persisted.)
This reverts commit 27539c3f903be26c487703943d3c27d45d4542b2. Retry
with new buildbot configuration after master restart.
Original message:
Remove the FLANG_INCLUDE_RUNTIME option which was replaced by
LLVM_ENABLE_RUNTIMES=flang-rt.
The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the
non-runtimes build instructions for the Flang runtime so they do not
conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217.
In order to not maintain multiple build instructions for the same thing,
this PR completely removes the old build instructions (effectively
forcing FLANG_INCLUDE_RUNTIME=OFF).
As per discussion in
https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2
we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is
compiled in a bootstrapping (non-standalone) build. Because it is
possible to build Flang-RT separately, this behavior can be disabled
using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion an
implicitly adding runtimes/projects in #123964.
When an asynchronous allocation is made, we call `cudaMallocAsync` with
a stream. For deallocation, we need to call `cudaFreeAsync` with the
same stream. in order to achieve that, we need to track the allocation
and their respective stream.
This patch adds a simple sorted array of asynchronous allocations. A
binary search is performed to retrieve the allocation when deallocation
is needed.
A recent patch fixed Fujitsu test case 0561_0168 by emitting a leading
space for "bare" (no width 'w') I and G output editing of integer
values. This fix has broken another Fujitsu test case (0561_0168), since
the leading space should not be produced at the first column of the
output record. Adjust.
The original descriptor-only path for I/O checks for null data addresses
and crashes with a readable message, but there's no such check on the
new fast path for formatted integer input, and so a READ into (say) a
deallocated allocatable will crash with a segfault. Put a null data
address check on the new fast path.
The present implementation of the intrinsic function SAME_TYPE_AS()
yields false positive .TRUE. results for distinct derived types that
happen to have the same name.
Replace with an implementation that can now depend on derived type
information records being the same type if and only if they are at the
same location, or are PDT instantiations of the same uninstantiated
derived type. And ensure that the derived type information includes
references from instantiated PDTs to their original types. (The derived
type information format supports these references already, but they were
not being set, perhaps because the current faulty SAME_TYPE_AS
implementation didn't need them, and nothing else does.)
Fixes https://github.com/llvm/llvm-project/issues/135580.
In CUDA Fortran the stream is encoded in an INTEGER(cuda_stream_kind)
variable.
This information is carried over the GPU dialect through the
`cuf.stream_cast` and the token in the GPU ops.
When converting the `gpu.launch_func` to runtime call, the
`cuf.stream_cast` becomes a no-op and the reference to the stream is
passed to the runtime.
The runtime is adapted to take integer references instead of value for
stream.
Previously, `hostnm` extended intrinsic was implemented as proper
intrinsic. Since then we found out that some applications use `hostnm`
as external routine via `external hostnm`. This prevents `hostnm` from
being recognized as an intrinsic. This PR implements `hostnm` as
external routine.