llvm-project

Author	SHA1	Message	Date
agozillon	30d2cb5a7e	[Flang][OpenMP][Runtime] Minor Flang runtime for OpenMP AMDGPU modifications (#152631 ) We have some modifications downstream to compile the flang runtime for amdgpu using clang OpenMP, some more hacky than others to work around (hopefully temporary) compiler issues. The additions here are the non-hacky alterations. Main changes: * Create freestanding versions of memcpy, strlen and memmove, and replace std:: references with these so that we can default to std:: when it's available, or our own Flang implementation when it's not. * Wrap more bits and pieces of the library in declare target wrappers (RT_* macros). * Fix some warnings that'll pose issues with -Werror on, in this case having the namespace in front of variables passed to templates. Another minor issue that'll likely still pop up, depending on the program you're linking with, is that abort will be undefined. It is perhaps possible to solve it with a freestanding implementation as with memcpy etc., but we end up with multiple definitions in this case. An alternative is to create an extern "C" version (which can be empty or forward on to the builtin). Co-author: Dan Palermo Dan.Palermo@amd.com	2025-08-29 23:04:48 +02:00
Peter Klausler	2e53a68c09	[flang][runtime] Speed up initialization & destruction (#148087 ) Rework derived type initialization in the runtime to just initialize the first element of any array, and then memcpy it to the others, rather than exercising the per-component paths for each element. Rework derived type destruction in the runtime to detect and exploit a fast path for allocatable components whose types themselves don't need nested destruction. Small tweaks were made in hot paths exposed by profiling in descriptor operations and derived type assignment.	2025-07-14 11:14:02 -07:00
Peter Klausler	2bf3ccabfa	[flang] Restructure runtime to avoid recursion (relanding) (#143993 ) Recursion, both direct and indirect, prevents accurate stack size calculation at link time for GPU device code. Restructure these recursive (often mutually so) routines in the Fortran runtime with new implementations based on an iterative work queue with suspendable/resumable work tickets: Assign, Initialize, initializeClone, Finalize, and Destroy. Default derived type I/O is also recursive, but already disabled. It can be added to this new framework later if the overall approach succeeds. Note that derived type FINAL subroutine calls, defined assignments, and defined I/O procedures all perform callbacks into user code, which may well reenter the runtime library. This kind of recursion is not handled by this change, although it may be possible to do so in the future using thread-local work queues. (Relanding this patch after reverting initial attempt due to some test failures that needed some time to analyze and fix.) Fixes https://github.com/llvm/llvm-project/issues/142481.	2025-06-16 14:37:01 -07:00
Peter Klausler	10f512f7bb	Revert runtime work queue patch, it breaks some tests that need investigation (#143713 ) Revert "[flang][runtime] Another try to fix build failure" This reverts commit 13869cac2b5051e453aa96ad71220d9d33404620. Revert "[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors (#143650)" This reverts commit d75e28477af0baa063a4d4cc7b3cf657cfadd758. Revert "[flang][runtime] Replace recursion with iterative work queue (#137727)" This reverts commit 163c67ad3d1bf7af6590930d8f18700d65ad4564.	2025-06-11 07:55:06 -07:00
Peter Klausler	163c67ad3d	[flang][runtime] Replace recursion with iterative work queue (#137727 ) Recursion, both direct and indirect, prevents accurate stack size calculation at link time for GPU device code. Restructure these recursive (often mutually so) routines in the Fortran runtime with new implementations based on an iterative work queue with suspendable/resumable work tickets: Assign, Initialize, initializeClone, Finalize, and Destroy. Default derived type I/O is also recursive, but already disabled. It can be added to this new framework later if the overall approach succeeds. Note that derived type FINAL subroutine calls, defined assignments, and defined I/O procedures all perform callbacks into user code, which may well reenter the runtime library. This kind of recursion is not handled by this change, although it may be possible to do so in the future using thread-local work queues. The effects of this restructuring on CPU performance are yet to be measured.	2025-06-10 14:44:19 -07:00
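
The work queue commits above replace recursion with an explicit queue of suspendable/resumable work tickets. The following is only a rough sketch of that general idea, not the Flang runtime's code: `Node`, `Ticket`, and `SumIterative` are hypothetical names invented for illustration, and a nested tree stands in for a derived type with nested components.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical nested structure standing in for a derived type whose
// components may themselves contain further components.
struct Node {
  int value;
  std::vector<Node> children;
};

// Hypothetical work ticket: it records how far processing of one node has
// progressed, so the work can be suspended and resumed later.
struct Ticket {
  const Node *node;
  std::size_t nextChild; // index of the next child still to be processed
  bool started;          // whether the node's own value was already handled
};

// Sums all values in the tree without any function calling itself.
// The explicit vector plays the role of the call stack.
int SumIterative(const Node &root) {
  std::vector<Ticket> queue;
  queue.push_back(Ticket{&root, 0, false});
  int total = 0;
  while (!queue.empty()) {
    // Work through an index, since push_back may reallocate the vector.
    std::size_t top = queue.size() - 1;
    if (!queue[top].started) {
      total += queue[top].node->value;
      queue[top].started = true;
    }
    const Node *node = queue[top].node;
    if (queue[top].nextChild < node->children.size()) {
      // Suspend this ticket and enqueue nested work; the ticket resumes
      // once the child's ticket has completed and been popped.
      const Node *child = &node->children[queue[top].nextChild++];
      queue.push_back(Ticket{child, 0, false});
    } else {
      queue.pop_back(); // this ticket is complete
    }
  }
  return total;
}
```

Because the loop never recurses, stack usage stays fixed regardless of nesting depth, which is the property the commit messages say is needed for link-time stack size calculation in GPU device code. As the messages note, callbacks into user code that reenter the runtime are a separate problem that a sketch like this does not address.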

5 Commits
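
Commit 2e53a68c09 describes initializing only the first element of an array and then copying it to the others with memcpy. A minimal sketch of that pattern follows, assuming a trivially copyable element type; `Point`, `InitElement`, and `InitArray` are hypothetical names and are not taken from the Flang runtime.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical trivially copyable element type (assumption for this sketch).
struct Point {
  int x;
  int y;
  double w;
};

// Stands in for the slower per-component initialization path.
void InitElement(Point &p) {
  p.x = 1;
  p.y = 2;
  p.w = 0.5;
}

// Runs the per-component path once, then replicates the result bytewise.
void InitArray(Point *elements, std::size_t count) {
  if (count == 0) {
    return;
  }
  InitElement(elements[0]);
  for (std::size_t i = 1; i < count; ++i) {
    std::memcpy(&elements[i], &elements[0], sizeof(Point));
  }
}
```

For example, calling `InitArray` on a four-element array leaves every element with the values that `InitElement` set on the first one. The bytewise copy is only valid when the element representation can be duplicated safely, which is why the sketch restricts itself to a trivially copyable type.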