297 Commits

Author SHA1 Message Date
Joseph Huber
b2d3a6574c
[libc] Rename rpc::Status to rpc::RPCStatus to reduce conflicts (#190239)
Summary:
`Status` is unfortunately heavily overloaded in practice. Things like
X11 define it as a macro. Best to just remove that possibility entirely.
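The C++ sketch below illustrates this kind of conflict. It reproduces the X11 macro and shows why a less common name such as `RPCStatus` survives it. The enumerators and helper functions are hypothetical and are not taken from LLVM libc.

```cpp
// Xlib.h contains exactly this definition. From here on, the preprocessor
// rewrites every `Status` token to `int`, whatever namespace it appears in.
#define Status int

namespace rpc {
// Named `Status`, this enum would expand to `enum int { ... }` and fail to
// compile in any translation unit that includes Xlib.h first. The
// enumerators are hypothetical, for illustration only.
enum RPCStatus {
  RPC_SUCCESS = 0,
  RPC_ERROR = 1,
};

inline bool succeeded(RPCStatus s) { return s == RPC_SUCCESS; }
} // namespace rpc

// X11-style code is unaffected: to it, `Status` is still just `int`.
inline Status x11_style_call() { return 1; }
```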
2026-04-02 14:55:57 -05:00
Michael Kruse
afb80bddf1
[Runtimes] Introduce variables containing resource dir paths (#177953)
Introduce common infrastructure for runtimes that determines compiler
resource path locations. The new variables are:

 * RUNTIMES_OUTPUT_RESOURCE_DIR
 * RUNTIMES_INSTALL_RESOURCE_PATH

They contain the location of the compiler resource path (typically
`lib/clang/<version>`) in the build tree and the install tree, respectively
(the latter relative to CMAKE_INSTALL_PREFIX).

Additionally, define

 * RUNTIMES_OUTPUT_RESOURCE_LIB_DIR
 * RUNTIMES_INSTALL_RESOURCE_LIB_PATH

for the location of clang/flang version-locked libraries (typically
`lib${LLVM_LIBDIR_SUFFIX}/<target-triple>`, although this also depends on `APPLE`
and `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR`). This code is moved from
flang-rt, which initially becomes its only user.

Refactored out of #171610 as requested
[here](https://github.com/llvm/llvm-project/pull/171610#discussion_r2687382481).

Extracted `get_runtimes_target_libdir_common` from compiler-rt as
requested
[here](https://github.com/llvm/llvm-project/pull/171610#discussion_r2689565634).
 
Added TODO comments to all runtimes as requested
[here](https://github.com/llvm/llvm-project/pull/171610#issuecomment-3789598635).
2026-04-02 10:32:14 +00:00
Zhen Wang
c794742bd7
[flang][cuda] Support non-allocatable module-level managed variables (#189753)
Add support for non-allocatable module-level CUDA managed variables
using pointer indirection through a companion global in
__nv_managed_data__. The CUDA runtime populates this pointer with the
unified memory address via __cudaRegisterManagedVar and
__cudaInitModule.

- Create a .managed.ptr companion global in the __nv_managed_data__
section and register it with _FortranACUFRegisterManagedVariable
- Call __cudaInitModule once after all variables are registered, only
when non-allocatable managed globals are present, to populate managed
pointers
- Annotate managed globals in gpu.module with nvvm.managed for PTX
.attribute(.managed) generation
- Suppress cuf.data_transfer for assignments to/from non-allocatable
module managed variables, since cudaMemcpy would target the shadow
address rather than the actual unified memory
- Preserve cuf.data_transfer for device_var = managed_var assignments
where explicit transfer is still required

Note: This PR depends on
[#189751](https://github.com/llvm/llvm-project/pull/189751) (MLIR:
nvvm.managed attribute).
2026-04-01 18:43:04 +00:00
Zhen Wang
3ed48bf648
Revert "[flang][cuda] Support non-allocatable module-level managed variables" (#189745)
Reverts llvm/llvm-project#188526
2026-03-31 20:53:50 +00:00
Zhen Wang
c4e6cf0abf
[flang][cuda] Support non-allocatable module-level managed variables (#188526)
Add support for non-allocatable module-level CUDA managed variables
using pointer indirection through a companion global in
__nv_managed_data__. The CUDA runtime populates this pointer with the
unified memory address via __cudaRegisterManagedVar and
__cudaInitModule.

1. Create a .managed.ptr companion global in the __nv_managed_data__
section and register it with _FortranACUFRegisterManagedVariable
(CUFAddConstructor.cpp)
2. Call __cudaInitModule after registration to populate the managed
pointer (registration.cpp)
3. Annotate managed globals in gpu.module with nvvm.managed for PTX
.attribute(.managed) generation (cuda-code-gen.mlir)
4. Suppress cuf.data_transfer for assignments to/from non-allocatable
module managed variables, since cudaMemcpy would target the shadow
address rather than the actual unified memory (tools.h)
5. Preserve cuf.data_transfer for device_var = managed_var assignments
where explicit transfer is still required
2026-03-31 16:27:08 +00:00
Joseph Huber
cc4727ae3b
[LLVM] Replace use of LLVM_RUNTIMES_TARGET with LLVM_DEFAULT_TARGET_TRIPLE (#188303)
Summary:
This PR primarily replaces uses of `LLVM_RUNTIMES_TARGET` with
`LLVM_DEFAULT_TARGET_TRIPLE`. The reason is that the default target
triple is the true cross-compiling architecture we are using, while
`LLVM_RUNTIMES_TARGET` can contain multilib strings like `+debug` or similar.

Additionally, add the proper path handling to the OpenMP / Offload
libraries.
2026-03-26 08:52:32 -05:00
Markus Mützel
e41e7b90f9
[flang-rt] Avoid duplicate definition of std::__libcpp_verbose_abort (#175551)
If a project depends on the Flang runtime and on libc++, linking fails
because `std::__libcpp_verbose_abort` is defined in both libraries.

Avoid that duplicate definition by defining `_LIBCPP_VERBOSE_ABORT`
before including any C++ headers and by renaming that symbol in the
Flang runtime to `flang_rt_verbose_abort`.

The function that is modified was originally introduced in D158957 to
solve an undefined symbol error when linking pure-Fortran projects with
the Flang runtime.
Providing a definition for that symbol in the Flang runtime might work
correctly for ELF or Mach-O if that symbol has weak linkage in libc++.
But at least for COFF, this now causes multiple-definition errors for
projects that are linking to the Flang runtime and to libc++.

The linker errors before this change for Windows/MinGW using
Clang+Flang+lld look like this:
```
ld.lld: error: duplicate symbol: std::__1::__libcpp_verbose_abort(char const*, ...)
>>> defined at libflang_rt.runtime.a(io-api-minimal.cpp.obj)
>>> defined at libc++.dll.a(libc++.dll)
```
2026-03-26 14:46:34 +01:00
Michael Kruse
4380ae6dbb
[Flang-RT] Support building no library (#187868)
Allow setting both FLANG_RT_ENABLE_SHARED and FLANG_RT_ENABLE_STATIC to
OFF at the same time.

This is extracted out of #171515 to make that PR a little smaller. By
itself it makes little sense: if neither the `.a` nor the `.so` is built,
nothing is built. But with #171515, the module files are still built,
which allows building the module files without the library. This is
mostly intended for GPGPU targets, where the library is not always
needed but the module files are.
2026-03-25 19:49:15 +01:00
Sairudra More
b16e012603
[flang-rt] Fix macOS build: define _DARWIN_C_SOURCE for mmap flags (#186142)
On Darwin, `sys/mman.h` hides `MAP_JIT` and `MAP_ANON(YMOUS)` when
`_POSIX_C_SOURCE` is defined unless `_DARWIN_C_SOURCE` is also defined.
`trampoline.cpp` uses those flags, so this change defines
`_DARWIN_C_SOURCE` before including `<sys/mman.h>` in this file.

Fixes build failure reported in #183108.

Co-authored-by: Sairudra More <moresair@pe31.hpc.amslabs.hpecorp.net>
2026-03-25 10:23:31 +05:30
Eugene Epshteyn
1a5d176359
[flang-rt] Fix test isolation, fixture usage, and other issues in Stop.cpp tests (#188155)
- Use TEST_F instead of TEST so CrashHandlerFixture::SetUp() is actually
called, registering the custom crash handler for death tests.
- Move putenv/executionEnvironment.Configure calls inside EXPECT_EXIT
blocks so they run in the forked child process, preventing the
NO_STOP_MESSAGE environment variable and configured global state from
leaking into subsequent tests.
- Replace const_cast<char *>("NO_STOP_MESSAGE=1") with a mutable static
char array, to avoid casting away constness of a string literal.
- Update CrashTest's expected pattern to match the output format of the
custom crash handler installed by CrashHandlerFixture, which was
previously never invoked due to the TEST vs TEST_F bug. (Note: there was
a buildbot failure related to this:
https://lab.llvm.org/buildbot/#/builders/130/builds/18413 )

Assisted-by: AI
2026-03-24 06:27:11 -04:00
Zhen Wang
4f32ea35f5
[flang][cuda] Fix const mismatch in CUFRegisterManagedVariable for __cudaRegisterManagedVar (#188142)
Change the varName parameter from `const char *` to `char *` in
CUFRegisterManagedVariable to match the CUDA runtime API signature of
__cudaRegisterManagedVar, which declares the corresponding parameter,
deviceAddress, as `char *`.
2026-03-23 22:24:04 +00:00
Zhen Wang
c0634cbb07
[flang][cuda] Add CUFRegisterManagedVariable runtime entry for __cudaRegisterManagedVar (#188124)
Add CUFRegisterManagedVariable runtime wrapper in flang-rt that calls
__cudaRegisterManagedVar.
This is preparation for supporting non-allocatable managed variables.
No functional change -- nothing calls this yet.
2026-03-23 14:01:24 -07:00
laoshd
6e5e1c97e0
[flang][flang-rt] Implement F202X leading-zero control edit descriptors LZ, LZS, and LZP for formatted output (F, E, D, and G editing) (#183500)
- LZ: processor-dependent (the default; flang prints the leading zero)
- LZS: suppress the optional leading zero before the decimal point
- LZP: print the optional leading zero before the decimal point

Changes span the source parser, compile-time format validator, runtime
format processing, and runtime output formatting. Includes a semantic test
(io18.f90) and documentation updates.
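The C++ sketch below shows the difference between LZS and LZP for F editing of a value with magnitude below one. It rounds with `snprintf` and ignores scale factors, E/D/G editing, and dropping the zero to fit a narrow field. `edit_fixed` and `LeadingZero` are hypothetical names and do not belong to flang-rt's API.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Leading-zero modes named after the F202X control edit descriptors.
enum class LeadingZero { LZ, LZS, LZP };

// Hypothetical helper: Fw.d output of `value` (d >= 1) under a leading-zero
// mode. LZ is processor-dependent; like flang, this sketch prints the zero.
std::string edit_fixed(double value, int w, int d, LeadingZero mode) {
  char buf[512];
  std::snprintf(buf, sizeof buf, "%.*f", d, value);
  std::string s = buf;
  // The optional zero is a lone '0' directly before the decimal point.
  std::size_t start = (s[0] == '-') ? 1 : 0;
  if (mode == LeadingZero::LZS && s.compare(start, 2, "0.") == 0) {
    s.erase(start, 1);
  }
  if (static_cast<int>(s.size()) > w) {
    return std::string(w, '*');  // value does not fit the field
  }
  return std::string(static_cast<std::size_t>(w) - s.size(), ' ') + s;
}
```

For example, `edit_fixed(0.5, 5, 2, LeadingZero::LZS)` yields `"  .50"`, while LZP yields `" 0.50"`.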
2026-03-23 11:50:48 -04:00
David Truby
6491750a87
[flang-rt] Fix file opening in APPEND mode on Windows (#186144) 2026-03-23 11:34:31 +00:00
laoshd
5a14e4f231
[flang] Implement SPLIT intrinsic subroutine with tests (#185584)
This implements part of the F2023 new feature US 03 (extracting tokens
from a string): the SPLIT intrinsic.

SPLIT (STRING, SET, POS [, BACK]) is specified in section 16.9.196 of the
Fortran 2023 standard.

This is part of Flang issue
[#178044](https://github.com/llvm/llvm-project/issues/178044). Note that
I work with @kwyatt-ext on this issue; he implemented the other part,
TOKENIZE.

A test will be added into
[llvm-test-suite](https://github.com/llvm/llvm-test-suite) later after
this PR is merged.
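The C++ sketch below models the SPLIT semantics with Fortran's 1-based positions. It follows this editor's reading of the standard and is not flang-rt's implementation. The function `split` is hypothetical and returns the new value of POS. On a forward scan, the token lies strictly between the incoming POS and the returned one.

```cpp
#include <cassert>
#include <string_view>

// Sketch of SPLIT (STRING, SET, POS [, BACK]). Positions are 1-based, as in
// Fortran. Returns the new value of POS.
int split(std::string_view string, std::string_view set, int pos,
          bool back = false) {
  const int len = static_cast<int>(string.size());
  auto is_delim = [&](int p) {  // p is a 1-based position
    return set.find(string[p - 1]) != std::string_view::npos;
  };
  if (!back) {
    assert(pos >= 0 && pos <= len);
    for (int p = pos + 1; p <= len; ++p)
      if (is_delim(p)) return p;
    return len + 1;  // no delimiter after POS
  }
  assert(pos >= 1 && pos <= len + 1);
  for (int p = pos - 1; p >= 1; --p)
    if (is_delim(p)) return p;
  return 0;  // no delimiter before POS
}
```

Starting from POS = 0 on `"one,two;three"` with SET = `",;"`, repeated forward calls return 4, 8, and 14, which delimit the tokens `one`, `two`, and `three`.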
2026-03-20 13:12:51 -04:00
kwyatt-ext
2a89e249a2
[flang] [flang-rt] Subscript overrun could occur in namelists during a READ command. (#176959)
NOTE: This is a new pull request, as the prior didn't have labels
properly applied.

If a bad subscript is provided in a namelisted record, the
HandleSubscripts() routine can keep reading subscripts past the rank of
the variable without bound. This patch ensures that a read will not go
beyond the rank of the expected variable.

The failure is then reported in the return status (IOSTAT) of the
READ.

A small test demonstrates the failure before and after the fix.
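The C++ sketch below shows the bound check. The parser refuses to read another subscript once it has read as many as the variable's rank. `parse_subscripts` is hypothetical, and the sketch handles only plain integer subscripts. It ignores triplets and integer overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch: parse "(i,j,...)" for a namelist item of the given
// rank, refusing to read more subscripts than the variable has dimensions.
// Returns false on malformed input; a runtime would map that to IOSTAT /= 0.
// Triplets (i:j:k) and integer overflow are not handled.
bool parse_subscripts(std::string_view text, int rank, std::vector<long> &out) {
  assert(rank >= 0);
  out.clear();
  std::size_t i = 0;
  if (i >= text.size() || text[i] != '(') return false;
  ++i;
  for (;;) {
    if (static_cast<int>(out.size()) == rank) return false;  // the bound check
    bool negative = false;
    if (i < text.size() && (text[i] == '+' || text[i] == '-')) {
      negative = (text[i] == '-');
      ++i;
    }
    if (i >= text.size() || !std::isdigit(static_cast<unsigned char>(text[i])))
      return false;
    long value = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
      value = value * 10 + (text[i++] - '0');
    out.push_back(negative ? -value : value);
    if (i >= text.size()) return false;  // missing ')'
    if (text[i] == ')') return true;
    if (text[i++] != ',') return false;
  }
}
```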

---------

Co-authored-by: Kevin Wyatt <kwyatt@hpe.com>
2026-03-18 16:26:15 -05:00
Kelvin Li
96299d8d4d
[flang] Disable trampoline test for PPC (NFC) (#187194) 2026-03-18 15:51:37 -04:00
Kelvin Li
77667d7c5b
[flang] Fix the CHECK: directive to ensure flagging RWE (NFC) (#187186)
Update the check to catch "RWE" in the header.
2026-03-18 10:13:33 -04:00
Sairudra More
111bafff9b
[flang] Add runtime trampoline pool for W^X compliance (#183108)
Flang currently lowers internal procedures passed as actual arguments
using LLVM's `llvm.init.trampoline` / `llvm.adjust.trampoline`
intrinsics, which require an executable stack. On modern Linux
toolchains and security-hardened kernels that enforce W^X (Write XOR
Execute), this causes link-time failures (`ld.lld: error: ... requires
an executable stack`) or runtime `SEGV` from NX violations.

This patch introduces a runtime trampoline pool that allocates
trampolines from a dedicated `mmap`'d region instead of the stack. The
pool toggles page permissions between writable (for patching) and
executable (for dispatch), so the stack stays non-executable throughout.
On macOS, `MAP_JIT` and `pthread_jit_write_protect_np` are used for the
same effect. An i-cache flush (`__builtin___clear_cache` on Linux,
`sys_icache_invalidate` on macOS) is performed after each write→exec
transition.

The feature is gated behind a new driver flag, `-fsafe-trampoline` (off
by default), which threads through the frontend into the
`BoxedProcedurePass`. When enabled, the pass emits calls to
`_FortranATrampolineInit`, `_FortranATrampolineAdjust`, and
`_FortranATrampolineFree` instead of the legacy intrinsics. The legacy
path is completely untouched when the flag is off.

The pool is a singleton with a fixed capacity (default 1024 slots,
overridable via `FLANG_TRAMPOLINE_POOL_SIZE`). Slot size varies by
target (32 bytes on x86-64/AArch64, 48 on PPC64, 64 fallback). Each slot
holds a small architecture-specific stub, currently x86-64 (17 bytes,
using `r10` as the nest/static-chain register) and AArch64 (24 bytes,
using `x15`). The implementation compiles on all architectures but will
crash at runtime with a clear diagnostic if trampoline emission is
actually attempted on an unsupported target. This avoids breaking the
flang-rt build on e.g. RISC-V or PPC64.

Freed slots are poisoned (the callee pointer is overwritten with a
sentinel) and recycled into a freelist, so the pool can sustain
long-running programs that repeatedly create and destroy closures.
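The C++ sketch below covers only the freelist-and-poison bookkeeping described above. It is a simplified, single-threaded model. It does not use `mmap`, toggle page permissions, emit stubs, lock, or build the double-checked singleton. `TrampolinePool`, `kPoison`, and the member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the slot bookkeeping only: no mmap, no permission
// toggling, no machine-code stubs, and no locking (the real pool guards this
// with a lock). Each slot records what a stub would load at call time.
template <std::size_t Capacity> class TrampolinePool {
  static_assert(Capacity > 0, "pool needs at least one slot");

public:
  static constexpr std::uintptr_t kPoison = 0xDEADBEEF;  // sentinel callee

  TrampolinePool() {
    // Thread every slot onto the freelist: 0 -> 1 -> ... -> Capacity-1 -> end.
    for (std::size_t i = 0; i < Capacity; ++i)
      next_free_[i] = (i + 1 < Capacity) ? static_cast<int>(i + 1) : -1;
  }

  // Returns a slot index, or -1 when all slots are in use.
  int allocate(std::uintptr_t callee, std::uintptr_t static_chain) {
    if (head_ < 0) return -1;
    int slot = head_;
    head_ = next_free_[slot];
    slots_[slot].callee = callee;
    slots_[slot].static_chain = static_chain;
    return slot;
  }

  // Poisons the slot and pushes it back onto the freelist for reuse.
  void release(int slot) {
    assert(slot >= 0 && static_cast<std::size_t>(slot) < Capacity);
    slots_[slot].callee = kPoison;  // a stale call now hits the sentinel
    next_free_[slot] = head_;
    head_ = slot;
  }

  std::uintptr_t callee(int slot) const { return slots_[slot].callee; }

private:
  struct Slot {
    std::uintptr_t callee = 0;
    std::uintptr_t static_chain = 0;
  };
  Slot slots_[Capacity];
  int next_free_[Capacity];
  int head_ = 0;
};
```

The freelist is LIFO. A just-released slot is therefore reused first, and it keeps the poison value until then.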

A few design choices worth calling out:

The runtime avoids all C++ runtime dependencies: no `std::mutex`, no
`operator new`, no function-local statics with hidden guard variables.
Locking is via flang-rt's own `Lock` / `CriticalSection`, memory is via
`AllocateMemoryOrCrash` / `FreeMemory`, and the singleton uses explicit
double-checked locking with a raw pointer. This was done so the
trampoline pool links cleanly in minimal / freestanding flang-rt
configurations.

`_FortranATrampolineFree` calls are inserted immediately before every
`func.return` in the enclosing host function. This strategy is
conservative but correct, because the trampoline handle cannot outlive
the host's stack frame: the closure captures the host's local variables
by reference.

The GNU_STACK note is verified via a dedicated integration test
(`safe-trampoline-gnustack.f90`) that compiles and links a Fortran
program using the runtime path, then inspects the ELF with
`llvm-readelf` to confirm the stack segment is `RW` (not `RWE`).

**Test coverage:**

- `flang/test/Driver/fsafe-trampoline.f90` — flag forwarding (on, off,
default)
- `flang/test/Fir/boxproc-safe-trampoline.fir` — FIR-level FileCheck for
emitted runtime calls
- `flang/test/Lower/safe-trampoline.f90` — end-to-end lowering
- `flang-rt/test/Driver/safe-trampoline-gnustack.f90` — GNU_STACK ELF
verification

Closes #182813

Co-authored-by: Sairudra More <moresair@pe31.hpc.amslabs.hpecorp.net>
2026-03-10 16:16:05 +05:30
Eugene Epshteyn
a8b726ab9d
[flang-rt] Need to pad the output of execute_command_line(..., CMDMSG) (#185509)
Previously the error message was copied, but not padded for cases where
the message was shorter than the passed CMDMSG string. Add the padding
and also change the test case to test padding on all platforms.
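The C++ sketch below shows the copy-then-pad step. A Fortran CHARACTER variable has no terminator, so a shorter message must be blank-filled to the variable's length, and a longer one is truncated. `copy_and_pad` is a hypothetical helper and is not the runtime's actual function.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: copy a C message into a fixed-length Fortran
// CHARACTER buffer. Fortran strings carry no terminator; a shorter value is
// blank-padded, and a longer one is truncated.
void copy_and_pad(char *dest, std::size_t dest_len, const char *msg) {
  std::size_t n = std::min(dest_len, std::strlen(msg));
  std::memcpy(dest, msg, n);
  std::memset(dest + n, ' ', dest_len - n);  // the padding step
}
```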
2026-03-09 18:34:08 -04:00
Valentin Clement (バレンタイン クレメン)
089d69de46
[flang][cuda][NFC] Add filename and line number in error reporting (#185516)
Some entry points carry the filename and line number for error
reporting. Use this information when reporting CUDA errors.
2026-03-09 21:06:29 +00:00
Joseph Huber
8e40387ce4
[flang-rt] Remove experimental OpenMP offloading support (#183653)
Summary:
This, as far as I am aware, has mostly been superseded by the runtimes
build that's built on top of libc. This build links 30% faster, supports
more functionality, and uses 95% less disk space, so it seems to be the
direction we want to go.

CUDA support remains, since removing it is not urgent.
2026-03-06 09:50:57 -06:00
Eugene Epshteyn
008dc8b5bf
[flang-rt] Fix EXECUTE_COMMAND_LINE() on Windows (#184875)
Detect cmd.exe special status code 9009 that indicates "command not
found" condition. Crash the process if "command not found" detected when
CMDSTAT was not specified.
2026-03-05 18:17:57 -05:00
John Otken
22bf237e74
[flang-rt] Handle NAMELIST logical comments without preceding space (#183202)
If a comment appears immediately after a logical value in a NAMELIST
file, the flang runtime returns IostatGenericError. No error occurs when
a space precedes the exclamation point. Add code to handle a comment
while parsing logical values.
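The C++ sketch below shows the idea. When scanning past the rest of a logical value such as `.TRUE.`, it treats `!` as a terminator, so `.true.!note` parses. `read_logical` is hypothetical, and the separator set is simplified to blank, comma, slash, and `!`.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical sketch of reading one namelist LOGICAL value. Returns the
// value and advances `pos` past it; std::nullopt means a syntax error.
std::optional<bool> read_logical(std::string_view in, std::size_t &pos) {
  if (pos < in.size() && in[pos] == '.') ++pos;  // optional leading period
  if (pos >= in.size()) return std::nullopt;
  char c = static_cast<char>(std::toupper(static_cast<unsigned char>(in[pos])));
  if (c != 'T' && c != 'F') return std::nullopt;
  bool value = (c == 'T');
  // Skip the remaining characters (e.g. "RUE." of ".TRUE.") up to a
  // separator. Stopping at '!' is what lets ".true.!note" parse.
  while (pos < in.size()) {
    char ch = in[pos];
    if (ch == ' ' || ch == ',' || ch == '/' || ch == '!') break;
    ++pos;
  }
  return value;
}
```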

Co-authored-by: John Otken john.otken@hpe.com
2026-03-05 11:14:33 +05:30
Eugene Epshteyn
82a9948ea4
[flang-rt] Fixes EXECUTE_COMMAND_LINE() status management and double buffering (#184285)
EXECUTE_COMMAND_LINE() without CMDSTAT initiated termination in runtime
if the command returned non-zero status code. For example,
EXECUTE_COMMAND_LINE('false') on Linux would cause "fatal Fortran
runtime error... : Command line execution failed with exit code: 1."
This is too strict: EXECUTE_COMMAND_LINE() successfully called 'false';
it's just that 'false' happened to return a non-zero status code. ifx and
gfortran don't initiate termination in such a case. Changed the
EXECUTE_COMMAND_LINE() implementation to behave in a similar fashion.

Also, during testing I discovered that when the output of a program that
uses EXECUTE_COMMAND_LINE(... WAIT=.false.) is piped to a file, the
resulting file has duplicated output lines. This was because fork()
also duplicates the parent's unflushed output buffers into the child.
Added a flush of all units and C stdio before calling fork().
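The POSIX C++ sketch below shows why the order matters and flushes C stdio before forking. `spawn_shell_command` is a hypothetical helper. Running the command through `/bin/sh -c` with `execl` is an assumption made for the example, and the sketch omits flushing the Fortran units.

```cpp
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch of the ordering fix, assuming POSIX. Without the fflush, bytes still
// sitting in the parent's stdio buffers are copied into the child and may be
// written twice, once by each process when its buffers are flushed.
// (flang-rt also flushes its own Fortran unit buffers; that is omitted here.)
pid_t spawn_shell_command(const char *cmd) {
  std::fflush(nullptr);  // flush every open C output stream first
  pid_t pid = fork();
  if (pid == 0) {
    execl("/bin/sh", "sh", "-c", cmd, static_cast<char *>(nullptr));
    _exit(127);  // exec failed; _exit avoids flushing inherited buffers
  }
  return pid;  // -1 on failure; the caller may waitpid() now or later
}
```

With `WAIT=.true.`, the caller would call `waitpid()` immediately. With `WAIT=.false.`, it can return and reap the child later.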
2026-03-04 22:03:21 -05:00
Joseph Huber
0cbba3ed5f
[flang-rt] Fix incorrect condition for removing backtrace (#184610) 2026-03-04 07:50:48 -06:00
Joseph Huber
dc44bcafe0
[flang-rt] Fix NVPTX builds erroneously using backtrace support (#184415)
Summary:
This is caused by the CMake hacks I had to do to work around NVIDIA's
proprietary binaries.
2026-03-03 14:38:53 -06:00
Peter Klausler
d723d14e4c
[flang][runtime] Emit "Infinity" rather than "Inf" when required (#183359)
The ISO Fortran standard requires that numeric output editing produce
the full word "Infinity", rather than my current "Inf", when the output
field is wide enough to hold it. Comply.
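The C++ sketch below implements the width rule as this editor reads the standard's IEEE-infinity output rules:

- `Infinity` needs 8 columns, or 9 with a sign.
- `Inf` needs 3 columns, or 4 with a sign.
- A width of zero always allows the long form.
- A field too narrow for either form is filled with asterisks.

`edit_infinity` is a hypothetical helper and is not the flang-rt function.

```cpp
#include <cstddef>
#include <string>

// Sketch of IEEE infinity output in a field of width w (w == 0 means the
// minimal width): prefer "Infinity", fall back to "Inf", then asterisks.
std::string edit_infinity(bool negative, bool plus_sign, int w) {
  std::string sign = negative ? "-" : (plus_sign ? "+" : "");
  std::string text = sign + "Infinity";
  if (w == 0) return text;
  if (static_cast<int>(text.size()) > w) {
    text = sign + "Inf";
    if (static_cast<int>(text.size()) > w) return std::string(w, '*');
  }
  // Right-justify within the field.
  return std::string(static_cast<std::size_t>(w) - text.size(), ' ') + text;
}
```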
2026-03-02 09:45:12 -08:00
kwyatt-ext
ca0e7d31d0
[flang] [flang-rt] Addition of the Fortran 2023 TOKENIZE intrinsic. (#181030)
This implements the TOKENIZE intrinsic per the Fortran 2023 Standard.

TOKENIZE is a more complicated addition to the flang intrinsics, as it
is the first subroutine that has multiple unique footprints. Intrinsic
functions have already addressed this challenge; however, subroutines and
functions are processed slightly differently, and the function code was
not a good 1:1 solution for the subroutines. To solve this, the function
code was used as a model to add error buffering to the intrinsic
processing, so that the most appropriate error message is selected for a
given subroutine footprint.

A simple FIR compile test was added to show the proper compilation of
each case. A thorough negative path test has also been added, ensuring
that all possible errors are reported as expected.

Testing prior to commit:

= check-flang ==========================================
```
Testing Time: 139.51s

Total Discovered Tests: 4153
  Unsupported      :   77 (1.85%)
  Passed           : 4065 (97.88%)
  Expectedly Failed:   11 (0.26%)


FLANG Container Test completed 2 minutes (160 s).

Total Time: 2 minutes (160 s)
Completed : Wed Feb 11 04:05:50 PM CST 2026
```

= check-flang-rt ==========================================
```
Testing Time: 1.55s

Total Discovered Tests: 258
  Passed: 258 (100.00%)


FLANG Container Test completed 0 minutes (55 s).

Total Time: 0 minutes (56 s)
Completed : Wed Feb 11 04:08:32 PM CST 2026
```

= llvm-test-suite ==========================================
```
Testing Time: 1886.64s

Total Discovered Tests: 6926
  Passed: 6926 (100.00%)


CCE SLES Container debug compile completed 31 minutes (1895 s).
CCE SLES Container debug install completed in 0 minutes (0 s).

Total Time: 31 minutes (1895 s)
Completed : Wed Feb 11 05:46:52 PM CST 2026
```

Additionally, (FYI) an executable test has been written and will be
added to the llvm-test-suite under a separate PR.

---------

Co-authored-by: Kevin Wyatt <kwyatt@hpe.com>
2026-02-27 18:43:18 +00:00
Joseph Huber
c49460bae7
[flang-rt] Enable more runtime functions for the GPU target (#183649)
Summary:
This enables primarily `stop.cpp` and `descriptor.cpp`. Requires a
little bit of wrangling to get it to compile. Unlike the CUDA build,
this build uses an in-tree libc++ configured for the GPU. This is
configured without thread support, environment, or filesystem, and it is
not POSIX at all. So, no mutexes, pthreads, or get/setenv.

I tested stop, but I don't know if it's actually legal to exit from
OpenMP offloading.
2026-02-27 12:27:39 -06:00
Valentin Clement (バレンタイン クレメン)
26b4c25b8b
[flang][cuda] Add support for cudaStreamDestroy (#183648)
Add specific lowering and an entry point for cudaStreamDestroy. Since we
keep an associated stream for some allocations, we need to reset it when
the stream is destroyed so that we don't use it anymore.
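The C++ sketch below models the reset described above. It assumes that the runtime keeps a map from allocations to streams. `StreamRegistry` and `Stream` are hypothetical stand-ins for the runtime's data structures and for `cudaStream_t`. A null stream means the default stream, as stream `0` does in CUDA.

```cpp
#include <iterator>
#include <unordered_map>

using Stream = void *;  // stand-in for cudaStream_t; nullptr = default stream

// Hypothetical registry of allocations that have an associated stream.
class StreamRegistry {
public:
  void associate(void *alloc, Stream s) { assoc_[alloc] = s; }

  Stream stream_for(void *alloc) const {
    auto it = assoc_.find(alloc);
    return it == assoc_.end() ? nullptr : it->second;
  }

  // On stream destruction, drop every association to it so later operations
  // on those allocations fall back to the default stream.
  void on_stream_destroy(Stream s) {
    for (auto it = assoc_.begin(); it != assoc_.end();)
      it = (it->second == s) ? assoc_.erase(it) : std::next(it);
  }

private:
  std::unordered_map<void *, Stream> assoc_;
};
```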
2026-02-27 00:24:29 +00:00
Slava Zakharin
a7723994a9
[flang-rt] Get rid of cyclic call chain in ChildIo. (#183369)
This is a follow up on #182635

It was suggested to place `static_assert(std::is_trivially_destructible_v<A>)`
in the `OwningPtr` class. This cannot be done, because there are
non-trivially destructible types used with `OwningPtr` (e.g. lots of types
that inherit from `IoErrorHandler`, which is not trivially destructible).

This patch brings back the destructor call into `OwningPtr::delete_ptr`
just to be on the safe side (though I do not think we had any memory
leaks even without the destructor call), and removes the cyclic
dependency for `~ChildIo()` caused by the `previous_` member.
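The C++ sketch below shows why `delete_ptr` needs the explicit destructor call. The object is constructed with placement new on raw runtime memory, so freeing the storage alone would skip its destructor. In this hypothetical `OwningPtr`, `std::malloc`/`std::free` stand in for flang-rt's own allocation routines. Its members are assumptions and do not come from flang-rt.

```cpp
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical sketch: std::malloc/std::free stand in for flang-rt's own
// allocation routines. The object is built with placement new, so its
// destructor must be run by hand before the storage is released.
template <typename A> class OwningPtr {
public:
  explicit OwningPtr(A *p = nullptr) : p_{p} {}
  OwningPtr(OwningPtr &&that) noexcept : p_{that.p_} { that.p_ = nullptr; }
  OwningPtr(const OwningPtr &) = delete;
  OwningPtr &operator=(const OwningPtr &) = delete;
  ~OwningPtr() { delete_ptr(); }

  template <typename... Args> static OwningPtr make(Args &&...args) {
    void *mem = std::malloc(sizeof(A));
    if (!mem) std::abort();  // crash on exhaustion instead of throwing
    return OwningPtr{new (mem) A(std::forward<Args>(args)...)};
  }

  void delete_ptr() {
    if (p_) {
      p_->~A();       // the destructor call that #182635 had disabled
      std::free(p_);  // then release the raw storage
      p_ = nullptr;
    }
  }

  A *get() const { return p_; }

private:
  A *p_;
};
```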
2026-02-26 08:55:18 -08:00
Peter Klausler
7acc02ff7a
[flang][runtime] Catch EOSHIFT ARRAY/BOUNDARY type mismatch (#183168)
The ARRAY= and optional BOUNDARY= arguments to EOSHIFT must have the
same dynamic type. Add a runtime check.
2026-02-26 07:08:56 -08:00
Peter Klausler
67eb750225
[flang][runtime] Catch asynchronous parent or child I/O (#183161)
ASYNCHRONOUS="YES" is not permitted for either a parent or child data
transfer statement in ISO Fortran (F'2023 12.6.4.8.3 p19). Not that it
matters much -- we don't support true asynchronous I/O anyway -- but
someday we might, and in the meantime it's nice to be able to pass tests
that check conformance.
2026-02-26 07:07:49 -08:00
Peter Klausler
a10e77b1a5
[flang][runtime] Conditionally fail empty ALLOCATE (#182918)
Add an environment variable (FORT_NO_EMPTY_ALLOCATION) that, when set to
1, changes the behavior of an ALLOCATE statement so that it will fail on
an empty allocation rather than its default behavior of allocating one
byte.
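The C++ sketch below models this policy. `allocate_bytes` is hypothetical and returns `nullptr` to signal a failed ALLOCATE. For simplicity it reads the environment variable on every call. The real runtime may read it once at startup.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch of the allocation policy. Returns nullptr to signal an
// ALLOCATE failure (a real runtime would set STAT= or terminate).
void *allocate_bytes(std::size_t bytes) {
  if (bytes == 0) {
    const char *env = std::getenv("FORT_NO_EMPTY_ALLOCATION");
    if (env && std::strcmp(env, "1") == 0) return nullptr;  // opt-in failure
    bytes = 1;  // default: allocate one byte
  }
  return std::malloc(bytes);
}
```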
2026-02-26 06:55:59 -08:00
Slava Zakharin
553ce3a115
[flang-rt] Temporarily disable destructor call in OwningPtr::delete_ptr. (#182635)
This is causing failures in CUF testing, because the device compiler
cannot identify the static stack size for kernels.
2026-02-20 19:49:54 -08:00
Valentin Clement (バレンタイン クレメン)
9f6ad7654b
[flang][cuda] Use default stream when calling cudaStreamSynchronize without arg (#182623) 2026-02-21 00:05:50 +00:00
Joseph Huber
70b5a1d050
[flang-rt] Add support for formatted I/O on the GPU (#182580)
Summary:
Expands on the previous support to enable formatted output, characters,
and checking basic iostat. We intentionally do not handle cases where
the descriptor is non-null as this is a non-trivial class that cannot
easily be shepherded across the wire.
2026-02-20 14:43:06 -06:00
Joseph Huber
f343cc0eff [flang-rt] Add missing libc dependency to object library 2026-02-20 10:45:38 -06:00
Tom Eccles
f3f0b621f7
[flang-rt] Add sysroot to test (#182508)
This fixes the test on macOS. Without this change the SDK sysroot is not
set, so the library path is incorrect and the 'System' library cannot
be found.

Test with https://github.com/llvm/llvm-project/pull/182501 so that the
sysroot variable is correctly set.

Assisted-by: Codex
2026-02-20 16:18:44 +00:00
Joseph Huber
21b3461440
[flang-rt] Implement basic support for I/O from OpenMP GPU Offloading (#181039)
Summary:
This PR provides the minimal support for Fortran I/O coming from a GPU
in OpenMP offloading. We use the same support the `libc` uses for its
printing through the RPC server. The helper functions `rpc::dispatch`
and `rpc::invoke` help make this mostly automatic.

Because Fortran I/O is not reentrant, the vast majority of the complexity
comes from needing to stitch together calls from the GPU until they can
be executed all at once. This is needed not only because of the
limitations of recursive I/O; without it, the output would also be
interleaved because of the GPU's lock-step execution.

As such, the return values from the intermediate functions are
meaningless (they all return true); only the final value is correct. For
cookies, we create a context pointer on the server to chain these
together.

Works on both my AMD and NVIDIA GPUs.
```fortran
program hello_gpu
  implicit none

  !$omp target teams num_teams(1)
  !$omp parallel num_threads(2)
    ! Print strings
    print *, "Hello from GPU"
  !$omp end parallel
  !$omp end target teams

end program hello_gpu
```
```console
> flang hello.f90 -O2 -fopenmp --offload-arch=gfx1030 
> ./a.out 
 Hello from GPU
 Hello from GPU
> flang hello.f90 -O2 -fopenmp --offload-arch=sm_89  
> ./a.out 
 Hello from GPU
 Hello from GPU
```
2026-02-20 07:56:59 -06:00
Michael Kruse
398bd9543b
[Flang-RT][unittests] Fix buffer over-read (#182176)
The unit test `Reductions.InfSums` defines a test array descriptor with
shape 2x3 (i.e. 6 elements), but only provides values for 2 elements.
This results in reads of likely uninitialized memory for the
additional 4 elements. In most cases the additional values get absorbed
by the infinity, but if one happens to be NaN or the negated infinity,
the result becomes NaN and the test fails.

Fix by reducing the shape of the test array to 2. Fixes the flakiness of
the test on the flang-x86_64-windows buildbot.
2026-02-20 09:31:39 +01:00
Valentin Clement (バレンタイン クレメン)
7772a45b1a
[flang][cuda] Add entry points for cudastreamsynchronize (#181932) 2026-02-18 15:54:54 -08:00
Valentin Clement (バレンタイン クレメン)
786b3b4741
[flang][cuda][NFC] Move set/get default stream to its own file (#181927) 2026-02-17 14:43:38 -08:00
Valentin Clement (バレンタイン クレメン)
7ce0c53291
[flang][cuda] Fix return value for CUFSetDefaultStream (#181884)
The interface returns an integer value, but the entry point and lowering
were missing it.
2026-02-17 13:48:29 -08:00
Valentin Clement (バレンタイン クレメン)
3c32747a7c
[flang][cuda] Lower set/get default stream (#181775) 2026-02-17 09:32:04 -08:00
Valentin Clement (バレンタイン クレメン)
1b2196bd82
[flang][cuda] Add entry point for set/get default stream (#181440) 2026-02-13 17:59:06 -08:00
Valentin Clement (バレンタイン クレメン)
c4170461d7
[flang][cuda] Lower set/get default stream for arrays (#181432) 2026-02-13 23:44:38 +00:00
Valentin Clement (バレンタイン クレメン)
19de5dd722
[flang][cuda] Add CUFSetAssociatedStream entry point (#181313) 2026-02-13 19:15:44 +00:00
Caroline Newcombe
d3a70f3b2c
[flang] Implement 'F_C_STRING' library function (Fortran 2023) (#174474)
Implement `F_C_STRING` to convert a Fortran string to a C
null-terminated string. Documented in F2023 Standard: 18.2.3.9
`F_C_STRING (STRING [, ASIS])`.
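The C++ sketch below models the F_C_STRING semantics as this editor understands them. If ASIS is absent or false, trailing blanks are trimmed first (as TRIM does). In both cases a NUL terminator is appended. `f_c_string` is a hypothetical helper and does not reflect flang's lowering.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of F_C_STRING (STRING [, ASIS]): TRIM(STRING) // C_NULL_CHAR by
// default, or STRING // C_NULL_CHAR when ASIS is true. The result's last
// character is '\0'.
std::string f_c_string(std::string_view s, bool asis = false) {
  if (!asis) {
    std::size_t end = s.find_last_not_of(' ');
    s = (end == std::string_view::npos) ? s.substr(0, 0) : s.substr(0, end + 1);
  }
  std::string out(s);
  out.push_back('\0');
  return out;
}
```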
2026-02-10 13:30:31 -05:00