llvm-project

Author	SHA1	Message	Date
Valentin Clement (バレンタインクレメン)	9e9fdd433a	[flang][cuda] Fix definition of CUFSetAllocatorIndex (#148778 )	2025-07-14 21:26:43 -07:00
Valentin Clement (バレンタインクレメン)	2c6771889a	[flang][cuda] Introduce cuf.set_allocator_idx operation (#148717 )	2025-07-14 17:23:18 -07:00
Peter Klausler	40ceaf1d99	[flang][runtime] Fix bad instance of std::optional in runtime (#148724 ) The runtime needs to use common::optional, not std::optional.	2025-07-14 14:12:49 -07:00
Peter Klausler	2e53a68c09	[flang][runtime] Speed up initialization & destruction (#148087 ) Rework derived type initialization in the runtime to just initialize the first element of any array, and then memcpy it to the others, rather than exercising the per-component paths for each element. Rework derived type destruction in the runtime to detect and exploit a fast path for allocatable components whose types themselves don't need nested destruction. Small tweaks were made to hot paths in descriptor operations and derived type assignment that profiling exposed.	2025-07-14 11:14:02 -07:00
Valentin Clement (バレンタインクレメン)	aec3016b64	[flang][cuda] Use minor version in flang_rt.cuda lib name (#148085 ) Add the minor version to the lib name to be able to distinguish between specific versions.	2025-07-11 15:49:34 -07:00
Valentin Clement (バレンタインクレメン)	f642b63412	[flang][cuda] Update condition in descriptor data transfer (#148306 ) When the two descriptors have the same number of elements and are contiguous, the transfer can be done via pointers.	2025-07-11 15:32:04 -07:00
agozillon	75f81ded8f	[Flang][FlangRT][Runtime] Add RT_OFFLOAD_API_GROUP_BEGIN to missing symbols on AMDGPU (#147612 ) After the recent move to work queues, linking in the Fortran runtime built for offload on AMDGPU (as certain configurations require) can fail with missing symbols. This PR tries to address this issue by encompassing more of the library in RT_OFFLOAD_API_GROUP_BEGIN, which has the effect of compiling these functions for AMDGPU, resolving the missing symbols. This PR should address the following issue: https://github.com/llvm/llvm-project/issues/145888	2025-07-10 13:19:58 +02:00
Tom Eccles	fe5d94d85d	[flang-rt] Match compiler-rt's default macos version (#147273 ) Followup to https://github.com/llvm/llvm-project/pull/143508 This required adding another alternative implementation of time intrinsics to match what is available in older macOS versions. With this change, flang can be used to build programs for older versions of macOS. Co-authored-by: David Truby <david.truby@arm.com>	2025-07-09 13:26:44 +01:00
Michael Kruse	4be3e95284	[Flang-RT][Offload] Always use LLVM-built GTest (#143682 ) The Offload and Flang-RT runtimes had the ability to compile GTest themselves. But in bootstrapping builds, LLVM_LIBRARY_OUTPUT_INTDIR points to the same location as the stage1 build. If both are building GTest, they overwrite each other's `libllvm_gtest.a` and `libllvm_test_main.a`, which causes #143134. This PR removes the ability for the Offload/Flang-RT runtimes to build their own GTest and instead relies on the stage1 build of GTest. This was already the case with LLVM_INSTALL_GTEST=ON configurations. For LLVM_INSTALL_GTEST=OFF configurations, we now also export gtest into the buildtree configuration. Ultimately, this reduces the combinatorial explosion of configurations in which unittests could be built (LLVM_INSTALL_GTEST=ON, GTest built by Offload, GTest built by Flang-RT, GTest built by Offload and also used by Flang-RT). GTest and therefore Offload/Runtime unittests will not be available if the runtimes are configured against an LLVM install tree. Since llvm-lit isn't available in the install tree either, it doesn't matter. Note that compiler-rt and libc also use GTest in non-default configurations. libc also depends on LLVM's GTest build (and would error out if unavailable), but compiler-rt builds it completely differently. Fixes #143134	2025-07-09 12:53:33 +02:00
Daniel Chen	b84696db74	Fix the type of offset that broke 32-bit flang-rt build to use `uint64_t` consistently (#147359 ) A recent change to `flang-rt` introduced code like `std::size_t offset{offset_};`. It broke the 32-bit `flang-rt` build because `Component::offset_` is of type `uint64_t`, while the width of `size_t` varies by target (it is 32 bits on 32-bit targets), so the braced initialization is a narrowing conversion. Clang complains ``` error: non-constant-expression cannot be narrowed from type 'std::uint64_t' (aka 'unsigned long long') to 'std::size_t' (aka 'unsigned long') in initializer list [-Wc++11-narrowing] 143 | std::size_t offset{offset_}; | ^~~~~~~ ``` This patch uses `uint64_t` consistently for the offset.	2025-07-08 10:01:43 -04:00
Peter Klausler	dccc0266f4	[flang][runtime] Allow INQUIRE(IOLENGTH=) in the presence of defined I/O (#144541 ) When I/O list items include instances of derived types for which defined I/O procedures exist, ignore them. Fixes https://github.com/llvm/llvm-project/issues/144363.	2025-06-30 10:20:39 -07:00
Peter Klausler	2bf3ccabfa	[flang] Restructure runtime to avoid recursion (relanding) (#143993 ) Recursion, both direct and indirect, prevents accurate stack size calculation at link time for GPU device code. Restructure these recursive (often mutually so) routines in the Fortran runtime with new implementations based on an iterative work queue with suspendable/resumable work tickets: Assign, Initialize, initializeClone, Finalize, and Destroy. Default derived type I/O is also recursive, but already disabled. It can be added to this new framework later if the overall approach succeeds. Note that derived type FINAL subroutine calls, defined assignments, and defined I/O procedures all perform callbacks into user code, which may well reenter the runtime library. This kind of recursion is not handled by this change, although it may be possible to do so in the future using thread-local work queues. (Relanding this patch after reverting the initial attempt due to test failures that needed some time to analyze and fix.) Fixes https://github.com/llvm/llvm-project/issues/142481.	2025-06-16 14:37:01 -07:00
Peter Klausler	65b06cd983	[flang][runtime] Check SOURCE= conformability on ALLOCATE (#144113 ) The SOURCE= expression of an ALLOCATE statement, when present and not scalar, must conform to the shape of the allocated objects. Check this at runtime, and return a recoverable error, or crash, when appropriate. Fixes https://github.com/llvm/llvm-project/issues/143900.	2025-06-16 14:36:35 -07:00
Valentin Clement (バレンタインクレメン)	9992668404	[flang][cuda] Add runtime check for passing device arrays (#144003 )	2025-06-12 20:47:58 -07:00
Peter Klausler	10f512f7bb	Revert runtime work queue patch, it breaks some tests that need investigation (#143713 ) Revert "[flang][runtime] Another try to fix build failure" This reverts commit 13869cac2b5051e453aa96ad71220d9d33404620. Revert "[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors (#143650)" This reverts commit d75e28477af0baa063a4d4cc7b3cf657cfadd758. Revert "[flang][runtime] Replace recursion with iterative work queue (#137727)" This reverts commit 163c67ad3d1bf7af6590930d8f18700d65ad4564.	2025-06-11 07:55:06 -07:00
Peter Klausler	b512077c37	[flang][runtime] Another try to fix build failure (#143702 ) Tweak accessibility to try to get code past whatever gcc is being used by the flang-runtime-cuda-gcc build bot.	2025-06-11 06:34:46 -07:00
Peter Klausler	d75e28477a	[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors (#143650 ) Adjust default parent class accessibility to attempt to work around what appears to be an older GCC's interpretation.	2025-06-10 20:36:52 -07:00
Peter Klausler	163c67ad3d	[flang][runtime] Replace recursion with iterative work queue (#137727 ) Recursion, both direct and indirect, prevents accurate stack size calculation at link time for GPU device code. Restructure these recursive (often mutually so) routines in the Fortran runtime with new implementations based on an iterative work queue with suspendable/resumable work tickets: Assign, Initialize, initializeClone, Finalize, and Destroy. Default derived type I/O is also recursive, but already disabled. It can be added to this new framework later if the overall approach succeeds. Note that derived type FINAL subroutine calls, defined assignments, and defined I/O procedures all perform callbacks into user code, which may well reenter the runtime library. This kind of recursion is not handled by this change, although it may be possible to do so in the future using thread-local work queues. The effects of this restructuring on CPU performance are yet to be measured.	2025-06-10 14:44:19 -07:00
Valentin Clement (バレンタインクレメン)	9c54512c3e	[flang][cuda] Allocate the dst descriptor in data transfer (#143437 ) In a test like: ``` integer, allocatable, device :: da(:) allocate(a(200)) a = 2 da = a ! da is not allocated before data transfer is initiated. Allocate it with a ``` The reference compiler will allocate the data for the `da` descriptor so the data transfer can be done properly.	2025-06-10 09:43:30 -07:00
Peter Klausler	7b9518ae27	[flang][runtime] Accommodate change of type in assignment to allocatable (#141988 ) When an assignment to a derived type allocatable requires (re)allocation, its type may change to that of the right-hand side. The code didn't update its derived type pointer, leading to the wrong type being put into the descriptors created for elemental defined assignment subroutine calls. Fixes https://github.com/llvm/llvm-project/issues/141835.	2025-06-04 09:22:01 -07:00
Peter Klausler	4c6b60a639	[flang] Extension: allow char string edit descriptors in input formats (#140624 ) FORMAT("J=",I3) is accepted by a few other Fortran compilers as a valid format for input as well as for output. The character string edit descriptor "J=" is interpreted as if it had been 2X on input, causing two characters to be skipped over. The skipped characters don't have to match the characters in the literal string. An optional warning is emitted under control of the -pedantic option.	2025-05-28 13:58:22 -07:00
Valentin Clement (バレンタインクレメン)	fc9ce037ef	[flang][rt] Enable Count and CountDim for device build (#141684 )	2025-05-28 09:55:49 -07:00
Kajetan Puchalski	09a70b1e10	[flang-rt] Explicitly define the default ShallowCopy* templates (#141619 ) Not explicitly defining the default case for ShallowCopy* functions does not meet the requirements for gcc to actually instantiate the templates, leading to build errors that show up with gcc but not with clang. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-05-27 16:38:48 +01:00
Kajetan Puchalski	0d464009fe	[flang-rt] Fix usage of kNoAsyncId in assign.cpp (#141077 ) Fix a leftover old variable name causing build bot errors. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-05-22 15:49:03 +01:00
Kajetan Puchalski	c2892b0bdf	[flang-rt] Optimise ShallowCopy and use it in CopyInAssign (#140569 ) Using Descriptor.Element<>() when iterating through a rank-1 array is currently inefficient, because the generic implementation suitable for arrays of any rank makes the compiler unable to perform optimisations that would make the rank-1 case considerably faster. This is currently done inside ShallowCopy, as well as by CopyInAssign, where the implementation of elemental copies (inside Assign) is equivalent to ShallowCopyDiscontiguousToDiscontiguous. To address that, add a DescriptorIterator abstraction specialised for arrays of various ranks, and use that throughout ShallowCopy to iterate over the arrays. Furthermore, depending on the pointer type passed to memcpy, the optimiser can remove the memcpy calls from ShallowCopy altogether which can result in substantial performance improvements on its own. Specialise ShallowCopy for various element pointer types to make these optimisations possible. Finally, replace the call to Assign inside CopyInAssign with a call to newly optimised ShallowCopy. For the thornado-mini application, this reduces the runtime by 27.7%. --------- Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>	2025-05-22 15:11:46 +01:00
Valentin Clement (バレンタインクレメン)	c17ae161fd	[flang][cuda] Use nullptr for comparison (#140767 ) Comparison without an explicit nullptr seems to produce false positives. Use an explicit nullptr.	2025-05-20 11:04:06 -07:00
Valentin Clement (バレンタインクレメン)	f5609aa1b0	[flang][cuda] Use a reference for asyncObject (#140614 ) Switch from `int64_t` to `int64_t*` to fit with the rest of the implementation. New attempt with some fixes. The previous attempt was reverted some time ago. Reviewed in #138010	2025-05-19 15:02:53 -07:00
Kazu Hirata	56aa935bec	[flang-rt] Fix warnings This patch fixes: flang-rt/include/flang-rt/runtime/emit-encoded.h:67:27: error: implicit conversion from 'const char16_t' to 'char32_t' may change the meaning of the represented code unit [-Werror,-Wcharacter-conversion] flang-rt/lib/runtime/edit-input.cpp:1114:18: error: implicit conversion from 'char32_t' to 'char16_t' may lose precision and change the meaning of the represented code unit [-Werror,-Wcharacter-conversion] flang-rt/lib/runtime/edit-input.cpp:1133:18: error: implicit conversion from 'char32_t' to 'char16_t' may lose precision and change the meaning of the represented code unit [-Werror,-Wcharacter-conversion] flang-rt/lib/runtime/edit-input.cpp:1033:14: error: implicit conversion from 'char32_t' to 'char16_t' may lose precision and change the meaning of the represented code unit [-Werror,-Wcharacter-conversion] flang-rt/lib/runtime/edit-input.cpp:986:14: error: implicit conversion from 'char32_t' to 'char16_t' may lose precision and change the meaning of the represented code unit [-Werror,-Wcharacter-conversion]	2025-05-15 17:46:00 -07:00
Peter Klausler	36ccfe29be	[flang] Clear obsolete type from reallocated allocatable (#139788 ) When an assignment to a polymorphic allocatable changes its type to an intrinsic type, be sure to reset its descriptor's derived type pointer to null. Fixes https://github.com/llvm/llvm-project/issues/136522.	2025-05-15 11:25:44 -07:00
Aaron Ballman	7548cec16f	[www][docs] Remove last mentions of IRC (#139076 ) It's the end of an era. The IRC channel was previously where the community gathered to discuss technical topics but is now a ghost town where the primary activity is moderators (me) kickbanning the same individual dozens of times a day for CoC violations and the secondary activity is telling the occasional person to come to Discord for help. The number of people engaging on IRC for the community's intended purposes seems to be roughly one person a month. So this removes all remaining mentions of IRC from our documentation so that it no longer appears to be an "official" channel for communicating with the community. It also removes IRC handles from the various maintainers lists, since those would stand out as confusing anachronisms. The IRC channel topic already recommends people come to the Discord server. There is no way to "shut down" an IRC channel such that it no longer exists, so the channel will continue to exist on OFTC, but will be unmoderated. (This was previously discussed in https://discourse.llvm.org/c/llvm/5 but some mentions persisted.)	2025-05-08 09:40:33 -04:00
Valentin Clement (バレンタインクレメン)	9b6b144438	Revert "[flang][cuda] Use a reference for asyncObject" (#138221 ) Reverts llvm/llvm-project#138186	2025-05-01 17:41:44 -07:00
Valentin Clement (バレンタインクレメン)	7f922f1400	[flang][cuda] Use a reference for asyncObject (#138186 ) Switch from `int64_t` to `int64_t*` to fit with the rest of the implementation. New attempt with some fixes. The previous attempt was reverted yesterday.	2025-05-01 17:04:12 -07:00
Valentin Clement (バレンタインクレメン)	01a18809ee	Revert "[flang][cuda] Use a reference for asyncObject (#138010 )" (#138082 ) This reverts commit 9b0eaf71e674a28ee55be3afa11b5f7d4da732c0.	2025-04-30 22:03:26 -07:00
Valentin Clement (バレンタインクレメン)	16f01b3777	[flang][cuda] Fix signatures after argument change (#138081 )	2025-04-30 21:40:12 -07:00
Valentin Clement (バレンタインクレメン)	ba3a46c1ea	[flang][cuda] Fix type of kNoAsyncObject (#138029 )	2025-04-30 14:59:02 -07:00
Valentin Clement (バレンタインクレメン)	9b0eaf71e6	[flang][cuda] Use a reference for asyncObject (#138010 ) Switch from `int64_t` to `int64_t*` to fit with the rest of the implementation.	2025-04-30 14:02:29 -07:00
Slava Zakharin	a8607063f3	[flang-rt] Simplify INDEX with len-1 SUBSTRING. (#137889 ) The len-1 case is noticeably slower than gfortran's straightforward implementation `075611b646/libgfortran/intrinsics/string_intrinsics_inc.c (L253)` This change speeds up a simple microkernel by 37% on icelake.	2025-04-30 08:25:06 -07:00
Michael Kruse	77581e2751	Reapply "[Flang] Remove FLANG_INCLUDE_RUNTIME (#124126 )" This reverts commit 27539c3f903be26c487703943d3c27d45d4542b2. Retry with new buildbot configuration after master restart. Original message: Remove the FLANG_INCLUDE_RUNTIME option which was replaced by LLVM_ENABLE_RUNTIMES=flang-rt. The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the non-runtimes build instructions for the Flang runtime so they do not conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217. In order to not maintain multiple build instructions for the same thing, this PR completely removes the old build instructions (effectively forcing FLANG_INCLUDE_RUNTIME=OFF). As per discussion in https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2 we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is compiled in a bootstrapping (non-standalone) build. Because it is possible to build Flang-RT separately, this behavior can be disabled using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion on implicitly adding runtimes/projects in #123964.	2025-04-30 12:32:49 +02:00
Valentin Clement (バレンタインクレメン)	565a075909	[flang][cuda][rt] Track asynchronous allocation stream for deallocation (#137073 ) When an asynchronous allocation is made, we call `cudaMallocAsync` with a stream. For deallocation, we need to call `cudaFreeAsync` with the same stream. In order to achieve that, we need to track the allocations and their respective streams. This patch adds a simple sorted array of asynchronous allocations. A binary search is performed to retrieve the allocation when deallocation is needed.	2025-04-24 10:01:47 -07:00
Joseph Huber	a5cdbef5f0	Revert "[LLVM] Replace use of `LLVM_RUNTIMES_TARGET` with `LLVM_DEFAULT_TARGET_TRIPLE` (#136208 )" This reverts commit 2e145f11c0bcfa2052416d96d682c75f33971a8c. Somehow causes some static assertions to fail?	2025-04-22 08:08:51 -05:00
Joseph Huber	2e145f11c0	[LLVM] Replace use of `LLVM_RUNTIMES_TARGET` with `LLVM_DEFAULT_TARGET_TRIPLE` (#136208 ) Summary: For purposes of determining the triple, it's more correct to use `LLVM_DEFAULT_TARGET_TRIPLE`.	2025-04-22 07:59:54 -05:00
Peter Klausler	03b3620538	[flang] Tweak integer output under width-free I/G editing (#136316 ) A recent patch fixed Fujitsu test case 0561_0168 by emitting a leading space for "bare" (no width 'w') I and G output editing of integer values. This fix has broken another Fujitsu test case (0561_0168), since the leading space should not be produced at the first column of the output record. Adjust.	2025-04-18 12:52:39 -07:00
Peter Klausler	32145a5e18	[flang][runtime] Better handling for integer input into null address (#135987 ) The original descriptor-only path for I/O checks for null data addresses and crashes with a readable message, but there's no such check on the new fast path for formatted integer input, and so a READ into (say) a deallocated allocatable will crash with a segfault. Put a null data address check on the new fast path.	2025-04-18 12:51:18 -07:00
Peter Klausler	21a406c92c	[flang] Improve runtime SAME_TYPE_AS() (#135670 ) The present implementation of the intrinsic function SAME_TYPE_AS() yields false positive .TRUE. results for distinct derived types that happen to have the same name. Replace with an implementation that can now depend on derived type information records being the same type if and only if they are at the same location, or are PDT instantiations of the same uninstantiated derived type. And ensure that the derived type information includes references from instantiated PDTs to their original types. (The derived type information format supports these references already, but they were not being set, perhaps because the current faulty SAME_TYPE_AS implementation didn't need them, and nothing else does.) Fixes https://github.com/llvm/llvm-project/issues/135580.	2025-04-18 12:48:33 -07:00
Valentin Clement (バレンタインクレメン)	d79bb93278	[flang][cuda] Carry over the stream information to kernel launch (#136217 ) In CUDA Fortran the stream is encoded in an INTEGER(cuda_stream_kind) variable. This information is carried over to the GPU dialect through the `cuf.stream_cast` and the token in the GPU ops. When converting the `gpu.launch_func` to runtime call, the `cuf.stream_cast` becomes a no-op and the reference to the stream is passed to the runtime. The runtime is adapted to take integer references instead of value for stream.	2025-04-18 10:44:18 -07:00
Slava Zakharin	273aecdb20	[flang-rt] Use runtime::memchr instead of std::memchr. (#135298 )	2025-04-18 08:45:52 -07:00
Eugene Epshteyn	3428cc94c8	[flang] Implement external routine usage of hostnm() (#134900 ) Previously, the `hostnm` extended intrinsic was implemented as a proper intrinsic. Since then we found out that some applications use `hostnm` as an external routine via `external hostnm`. This prevents `hostnm` from being recognized as an intrinsic. This PR implements `hostnm` as an external routine.	2025-04-15 19:04:59 -04:00
Peter Klausler	72144d119a	[flang][runtime] Fix recently broken big-endian formatted integer input (#135417 ) My recent change to speed up formatted integer input has a bug on big-endian targets that has shown up on ppc64 AIX build bots. Fix.	2025-04-11 12:52:23 -07:00
Slava Zakharin	f4203ca2b7	[flang-rt] Declare DeviceTrap static inline. (#135286 )	2025-04-10 17:38:04 -07:00
Valentin Clement (バレンタインクレメン)	1d8966e246	[flang][cuda] Use the provided stream in kernel launch (#135267 )	2025-04-10 17:15:23 -07:00
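
Commit 565a075909 above tracks asynchronous allocations in a sorted array and uses a binary search to find an allocation's stream when it is deallocated. The following is a minimal C++ sketch of that bookkeeping under stated assumptions, not the flang-rt implementation: `AsyncAllocationTracker`, `StreamHandle`, `Insert`, and `Remove` are hypothetical names, `std::vector` is used for brevity, and a real deallocation path would pass the recovered stream to `cudaFreeAsync`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a CUDA stream handle (cudaStream_t in real code).
using StreamHandle = std::int64_t;

// One asynchronous allocation and the stream it was made on.
struct AsyncAllocation {
  void *ptr;
  StreamHandle stream;
};

// Hypothetical tracker: entries stay sorted by address, so each lookup is a
// binary search (O(log n)); insertion and removal shift elements (O(n)).
class AsyncAllocationTracker {
public:
  // Record a new allocation at its sorted position.
  void Insert(void *ptr, StreamHandle stream) {
    allocs_.insert(LowerBound(ptr), AsyncAllocation{ptr, stream});
  }

  // Look up ptr; on success, report its stream and drop the entry.
  // A real deallocation path would then call cudaFreeAsync(ptr, stream).
  bool Remove(void *ptr, StreamHandle &stream) {
    auto it = LowerBound(ptr);
    if (it == allocs_.end() || it->ptr != ptr) {
      return false; // not an asynchronous allocation we know about
    }
    stream = it->stream;
    allocs_.erase(it);
    return true;
  }

  std::size_t size() const { return allocs_.size(); }

private:
  // First entry whose address is not less than ptr.
  std::vector<AsyncAllocation>::iterator LowerBound(void *ptr) {
    return std::lower_bound(allocs_.begin(), allocs_.end(), ptr,
        [](const AsyncAllocation &a, void *p) {
          // std::less gives a total order even for unrelated pointers.
          return std::less<void *>{}(a.ptr, p);
        });
  }

  std::vector<AsyncAllocation> allocs_;
};
```

Keeping the array sorted makes the deallocation-time lookup cheap at the cost of shifting elements on insert and erase; a hash map would be an alternative, but this sketch mirrors the sorted-array-plus-binary-search design the commit message describes.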


117 Commits
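
Commit a8607063f3 adds a faster path for INDEX when SUBSTRING has length 1. The message does not say how that path is written, so the sketch below is only one plausible approach under that assumption: it uses `std::memchr` for the forward search (commit 273aecdb20 switched the runtime from `std::memchr` to `runtime::memchr`, so actual runtime code would differ) and a plain backward loop for BACK=.TRUE., since `memrchr` is not standard. `IndexOfChar` is a hypothetical name; it follows Fortran INDEX semantics, returning a 1-based position or 0 when the character is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical len-1 fast path for Fortran INDEX(string, substring, back).
// Returns the 1-based position of ch in str[0..len), or 0 if it is absent.
std::size_t IndexOfChar(const char *str, std::size_t len, char ch, bool back) {
  if (!back) {
    // Forward search: delegate to memchr, which C libraries often optimize.
    const void *hit{std::memchr(str, static_cast<unsigned char>(ch), len)};
    return hit ? static_cast<std::size_t>(static_cast<const char *>(hit) - str) + 1
               : 0;
  }
  // BACK=.TRUE.: scan from the end with a simple loop.
  for (std::size_t i{len}; i > 0; --i) {
    if (str[i - 1] == ch) {
      return i;
    }
  }
  return 0;
}
```

For example, with `str` set to `"hello"` and `len` 5, searching for `'l'` gives 3 forward and 4 with `back` set.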