245 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
26b4c25b8b
[flang][cuda] Add support for cudaStreamDestroy (#183648)
Add specific lowering and entry point for cudaStreamDestroy. Since we
keep associated stream for some allocation, we need to reset it when the
stream is destroy so we don't use it anymore.
2026-02-27 00:24:29 +00:00
Valentin Clement (バレンタイン クレメン)
af9ca0e5be
Revert "[flang][cuda] Add entry points for cudastreamsynchronize (#181932)" (#182657)
This is causing some testing issue. Reverting for now.
2026-02-21 06:50:03 +00:00
Valentin Clement (バレンタイン クレメン)
7772a45b1a
[flang][cuda] Add entry points for cudastreamsynchronize (#181932) 2026-02-18 15:54:54 -08:00
Valentin Clement (バレンタイン クレメン)
3c32747a7c
[flang][cuda] Lower set/get default stream (#181775) 2026-02-17 09:32:04 -08:00
Valentin Clement (バレンタイン クレメン)
c4170461d7
[flang][cuda] Lower set/get default stream for arrays (#181432) 2026-02-13 23:44:38 +00:00
Caroline Newcombe
4b109dc5ac
[flang] Implement C_F_STRPOINTER (Fortran 2023) (#176973)
Implement C_F_STRPOINTER to associate a Fortran character pointer with a
C string.

This intrinsic has two forms:

C_F_STRPOINTER(CSTRARRAY, FSTRPTR [,NCHARS]): Associates FSTRPTR with a
C string array
C_F_STRPOINTER(CSTRPTR, FSTRPTR, NCHARS): Associates FSTRPTR with a
C_PTR pointing to a character string
Implementation includes semantic validation, FIR lowering, and
associated tests.

F2023 Standard: 18.2.3.5

AI Usage Disclosure: AI tools (Claude Sonnet 4.5) were used to assist
with implementation of this feature and test code generation. I have
reviewed, modified, and tested all AI-generated code.
2026-02-12 12:42:13 -06:00
Caroline Newcombe
d3a70f3b2c
[flang] Implement 'F_C_STRING' library function (Fortran 2023) (#174474)
Implement `F_C_STRING` to convert a Fortran string to a C
null-terminated string. Documented in F2023 Standard: 18.2.3.9
`F_C_STRING (STRING [, ASIS])`.
2026-02-10 13:30:31 -05:00
Valentin Clement (バレンタイン クレメン)
3d3dd559a0
[flang][cuda] Add support for derived-type component with managed/unified attributes (#177409)
Derived-type components that have the `ALLOCATABLE` or `POINTER`
attribute as well as the CUDA `MANAGED` or `UNIFIED` attribute need to
have a specific allocator index set in the descriptor so the allocation
is done correctly. Without this, the allocation is done in host memory
and will trigger illegal read or write if the component is used on the
device. The correct allocator index was set some time ago for the
`DEVICE` attribute but the `MANAGED` and `UNIFIED` attribute need the
same mechanism.

Since the `Component::Genre` has quite some room I opted to add specific
genre for allocatable and pointer with both managed or unified
attribute.
@klausler Let me know if you would prefer another solution. I was
thinking about a separate field but I wanted to avoid wasting some
bytes.
2026-01-22 13:08:48 -08:00
Jameson Nash
44f59bae39
[flang] Fix ISO_C_BINDING type sizes for Windows (#172034)
Fix several ISO_C_BINDING type parameters for Windows compatibility:

- c_long/c_unsigned_long: Use 32-bit on Windows (LLP64 data model)
- c_long_double: Use 64-bit (kind=8) on Windows ARM64

https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/issues/9#issuecomment-2573385824
- c_unsigned_long_long: Explicitly use c_uint64_t instead of depending
on c_unsigned_long
- c_uintmax_t: Use 64-bit on Windows (consistent with MSVC/MinGW)

Fixes issue reported in
https://github.com/msys2/MINGW-packages/pull/16579

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:57:15 -05:00
Michael Kruse
d233e787f0 Revert "[Flang] Move builtin .mod generation into runtimes (Reapply #137828) (#169638)"
This reverts commit 7675fc79c802cf7f6a95660f6ee59bf6cb62102f.

Requested in PR:
https://github.com/llvm/llvm-project/pull/169638#issuecomment-3634227707
2025-12-09 22:39:41 +01:00
Michael Kruse
7675fc79c8
[Flang] Move builtin .mod generation into runtimes (Reapply #137828) (#169638)
Reapplication of #137828, changes:
* Workaround CMAKE_Fortran_PREPROCESS_SOURCE issue for CMake < 2.24: The
issue is that `try_compile` does not forward manually-defined compiler
flang variables to the test build environment; instead of just a
negative test result, it aborts the configuration step itself. To be
fair, manually defining these variables is deprecated since at least
CMake 3.6.
* Missing flang cmd line flags for CMake < 3.28 `-target=`, `-O2`, `-O3`
* It is now possible to set FLANG_RT_ENABLED_STATIC=OFF and
FLANG_RT_ENABLE_SHARED=OFF at the same and is the default for amdgpu and
nvptx targets. In this mode, only the .mod files are compiled --
necessary for module files in
lib/clang/22/finclude/flang/(nvptx64-nvidia-cuda|amdgpu-amd-amdhsa)/*.mod
to be available.
* For compiling omp_lib.mod for nvptx and amdgpu, the module build
functionality must be hoisted out if openmp's runtime/ directory which
is only included for host targets. This PR now requires #169909.
 

Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:

1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Constants such as [`c_long_double` also have different
values](d748c81218/flang-rt/lib/runtime/iso_c_binding.f90 (L77-L81)).
Most other modules have `#ifdef`-enclosed code as well. For instance
this caused offload targets nvptx64-nvidia-cuda/amdgpu-amd-amdhsa to use
the modules files compiled for the host which may contrain uses of the
types REAL(10) or REAL(16) not available for nvptx/amdgpu.

#146876
#128015
#129742
#158790

3. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.

4. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted (PR
#169525).


If using Flang for cross-compilation or target-offloading, flang-rt must
now be compiled for each target not only for the library, but also to
get the target-specific module files. For instance in a bootstrapping
runtime build, this can be done by adding:
`-DLLVM_RUNTIME_TARGETS=default;nvptx64-nvidia-cuda;amdgpu-amd-amdhsa`.


Some new dependencies come into play:
* openmp depends on flang-rt for building `lib_omp.mod` and
`lib_omp_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrpping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, the test needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`lib_omp.mod` and `lib_omp_kinds.mod` those are already marked with
`openmp_runtime`.

As intrinsic are now specific to the target, their location is moved
from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechnism to compute the location have been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so they can be used by both, openmp and
flang-rt. Potentially the mechnism could also be shared by other
libraries such as compiler-rt.

`finclude` was chosen because `gfortran` uses it as well and avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. Another subdirectory `flang` avoids accidental inclusion of
gfortran-modules which due to compression would result in
user-unfriendly errors. Now the driver adds `-fintrinsic-module-path`
for that location to the frontend call (Just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
2025-12-09 12:54:26 +01:00
Valery Dmitriev
4b2714f12f
[flang/flang-rt] Implement show_descriptor intrinsic, a non-standard extension. (#170389)
This is a reapply the original patch (#169137) with the flang-rt unit
test changes limiting it to linux platform only.
Additionally accommodated style changes from Peter Klausler (#170227)
    
show_descriptor intrinsic prints details of a descriptor (extended
Fortran pointer).
It accepts a descriptor for any type and rank, including scalars.
Requires use of flang_debug module.
    
Example:
```
    program test
      use flang_debug
      implicit none
      integer :: a(4) = (/ 1,3,5,7 /)
      call show_descriptor(a(1:3))
    end program test
```    
and its output:
```
    Descriptor @ 0x7ffe01ec6a98:
      base_addr 0x563b7035103c
      elem_len  4
      version   20240719
      rank      1
      type      9 "INTEGER(kind=4)"
      attribute 0
      extra     0
        addendum  0
        alloc_idx 0
```
2025-12-04 08:29:31 -08:00
Michael Kruse
5681c71a80 Revert "[flang] implement show_descriptor intrinsic, a non-standard extension (#169137)"
This reverts commit e7748e92cd5d71af2e1699328b7c575e9b9bf479.

It broke the Windows build

https://github.com/llvm/llvm-project/actions/runs/19842117405/job/56852610863
https://lab.llvm.org/buildbot/#/builders/166/builds/4535

After #170142 fixed another issue, this was also the remaining reason
for this buildbot to fail:

https://lab.llvm.org/buildbot/#/builders/207/builds/10423
2025-12-02 17:48:28 +01:00
Valery Dmitriev
e7748e92cd
[flang] implement show_descriptor intrinsic, a non-standard extension (#169137)
show_descriptor intrinsic prints details of a descriptor (extended
Fortran pointer).
It accepts a descriptor for any type and rank, including scalars.
Requires use of flang_debug module.

Example:
program test
  use flang_debug
  implicit none
  integer :: a(4) = (/ 1,3,5,7 /)
  call show_descriptor(a(1:3))
end program test

and its output:
Descriptor @ 0x7ffe01ec6a98:
  base_addr 0x563b7035103c
  elem_len  4
  version   20240719
  rank      1
  type      9 "INTEGER(kind=4)"
  attribute 0
  extra     0
    addendum  0
    alloc_idx 0
  dim[0] lower_bound 1
         extent      3
         sm          4
2025-12-01 13:53:13 -08:00
Jan Patrick Lehr
4bc654d649
Revert "[Flang] Move builtin .mod generation into runtimes" (#169489)
Reverts llvm/llvm-project#137828

Buildbot error in
https://lab.llvm.org/staging/#/builders/105/builds/37275
2025-11-25 13:54:27 +01:00
Michael Kruse
86fbaef99a
[Flang] Move builtin .mod generation into runtimes (#137828)
Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:

1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Most other modules have `#ifdef`-enclosed code as well.

2. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.

3. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted.


Some new dependencies come into play:
* openmp depends on flang-rt for building `lib_omp.mod` and
`lib_omp_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrpping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, the test needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`lib_omp.mod` and `lib_omp_kinds.mod` those are already marked with
`openmp_runtime`.

As intrinsic are now specific to the target, their location is moved
from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechnism to compute the location have been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so they can be used by both, openmp and
flang-rt. Potentially the mechnism could also be shared by other
libraries such as compiler-rt.

`finclude` was chosen because `gfortran` uses it as well and avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. Now the driver adds `-fintrinsic-module-path` for that
location to the frontend call (Just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
2025-11-25 10:33:58 +01:00
Valentin Clement (バレンタイン クレメン)
e23328b457
[flang][cuda] Add support for cluster_block_index in cooperative groups (#169427) 2025-11-24 16:41:03 -08:00
Valentin Clement (バレンタイン クレメン)
ab2a302f0e
[flang][cuda] Add support for cluster_dim_blocks in cooperative_groups (#169417) 2025-11-24 22:55:02 +00:00
Valentin Clement (バレンタイン クレメン)
ab5ae9a61f
[flang][cuda] Implement this_cluster for cooperative groups (#169414)
Implement `this_cluster` like `this_group` by lowering it directly like
an intrinsic function. Use the NVVM operation to get the rank and size
information and populate the derived type.
2025-11-24 14:28:16 -08:00
Valentin Clement (バレンタイン クレメン)
0ce03c2be4
[flang][cuda] Add interface and lowering for atomicadd_r4x2 and atomicadd_r4x4 (#166308) 2025-11-04 09:33:09 -08:00
Valentin Clement (バレンタイン クレメン)
95d6caa5d4
[flang][cuda] Add interfaces and lowering for atomicaddvector (#166275) 2025-11-03 20:40:18 -08:00
Valentin Clement (バレンタイン クレメン)
ac21fde464
[flang][cuda] Add support for f16 atomicadd (#166229) 2025-11-03 11:56:22 -10:00
Valentin Clement (バレンタイン クレメン)
5defeedd35
[flang][cuda] Add interfaces and lowering for tma_bulk_store (#165482)
As defined in
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations
2025-10-28 22:10:26 +00:00
Valentin Clement (バレンタイン クレメン)
5d89a474b0
[flang][cuda] Add interfaces and lowering for tma_bulk_load (#165474)
As defined in
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations
2025-10-28 14:54:42 -07:00
Valentin Clement (バレンタイン クレメン)
56c1d35bfd
[flang][cuda] Add interfaces and lowering for barrier_try_wait(_sleep) (#165316)
As described in the programming guide:
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations
2025-10-28 17:37:59 +00:00
Valentin Clement (バレンタイン クレメン)
07ed101f8d
[flang][cuda] Support logical(4) in syncthread_and|count|or functions (#164706) 2025-10-22 21:21:57 +00:00
Valentin Clement (バレンタイン クレメン)
32adfb5612
[flang][cuda] Add interface and lowering for tma_bulk_s2g (#163232)
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html#load-and-store-functions-using-cache-hints
2025-10-13 18:51:02 +00:00
Valentin Clement (バレンタイン クレメン)
47e9df8a74
[flang][cuda] Add interface and lowering for tma_bulk_g2s (#163034)
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations
2025-10-12 21:00:59 -07:00
Valentin Clement (バレンタイン クレメン)
78b363ce1d
[flang][cuda] Set value attribute to count for barrier procedures (#163031) 2025-10-11 20:07:52 -07:00
Valentin Clement (バレンタイン クレメン)
9f068436d3
[flang][cuda] Add interface and lowering for fence_proxy_async (#163014)
Part of TMA operation defined here:
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations
2025-10-11 19:49:56 +00:00
Valentin Clement (バレンタイン クレメン)
886e797469
[flang][cuda] Add interfaces and lowering for tma_bulk_[commit|wait]_group subroutine (#163012)
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations
2025-10-11 19:18:58 +00:00
Valentin Clement (バレンタイン クレメン)
1c95c7ae20
[flang][cuda] Add interfaces and lowering for barrier_arrive (#162949) 2025-10-11 09:54:38 -07:00
Valentin Clement (バレンタイン クレメン)
1a8057fd47
[flang][cuda] Add interface and lower barrier_init (#162929)
Add interface for `barrier_init` and lower it to the NVVM Op. 

https://github.com/clementval/llvm-project/pull/new/cuf_barrier_init
2025-10-10 22:48:27 +00:00
Valentin Clement (バレンタイン クレメン)
d3c09c45aa
[flang] Add special genre for allocatable and pointer device component (#157731)
Allocatable and pointer device components need a different allocator
index to be set in their descriptor when it is establish. This PR adds
two genre for the components `AllocatableDevice` and `PointerDevice` so
the correct allocator index can be set accordingly.
2025-09-09 13:12:20 -07:00
Valentin Clement (バレンタイン クレメン)
3720d8b52d
[flang][cuda] Update some bind name to fast version and add __sincosf (#153744)
Use the fast version in the bind name and reorder these fast math
functions. Add missing __sincosf interface.
2025-08-15 11:07:15 -07:00
Valentin Clement (バレンタイン クレメン)
115f816069
[flang][cuda] Add missing bind name for __int2double_rn (#153720) 2025-08-15 10:27:19 -07:00
Valentin Clement (バレンタイン クレメン)
0e4af726cb
[flang][cuda] Add interface for __fdividef (#153742) 2025-08-15 10:26:40 -07:00
Valentin Clement (バレンタイン クレメン)
0e8c964c21
[flang][cuda] Add interfaces for double_as_longlong and longlong_as_double (#153719) 2025-08-15 17:26:11 +00:00
Valentin Clement (バレンタイン クレメン)
fd3f052aeb
[flang][cuda] Add interfaces for int_as_float and float_as_int (#153716) 2025-08-15 10:00:53 -07:00
Valentin Clement (バレンタイン クレメン)
583499a8cf
[flang][cuda] Add missing bind name for __hiloint2double, __double2loint and __double2hiint (#153713) 2025-08-15 09:32:59 -07:00
Valentin Clement (バレンタイン クレメン)
3bc4d66082
[flang][cuda] Add interfaces for __int2float_rX (#153708) 2025-08-14 16:45:44 -07:00
Valentin Clement (バレンタイン クレメン)
ffe4870472
[flang][cuda] Add interfaces for __float2int_rX and __float2unit_rX (#153691) 2025-08-14 23:11:45 +00:00
Valentin Clement (バレンタイン クレメン)
602f308d4f
[flang][cuda] Add interface for __saturatef (#153705) 2025-08-14 15:55:17 -07:00
Valentin Clement (バレンタイン クレメン)
2775c79c4f
[flang][cuda] Add interfaces for __float2ll_rX (#153702) 2025-08-14 15:44:52 -07:00
Valentin Clement (バレンタイン クレメン)
ca9ddd54b7
[flang][cuda] Add interfaces for __ll2float_rX (#153694) 2025-08-14 15:35:02 -07:00
Valentin Clement (バレンタイン クレメン)
df15c0d716
[flang][cuda] Add interfaces for __dsqrt_rn and __dsqrt_rz (#153624) 2025-08-14 22:08:33 +00:00
Valentin Clement (バレンタイン クレメン)
b989c7c2e0
[flang][cuda] Add interfaces for __drcp_rX (#153681) 2025-08-14 21:44:47 +00:00
Valentin Clement (バレンタイン クレメン)
06590444f5
[flang][cuda] Add bind names for __double2ull_rX interfaces (#153678) 2025-08-14 21:10:20 +00:00
Valentin Clement (バレンタイン クレメン)
bad3df4764
[flang][cuda] Add bind names for __double2ll_rX interfaces (#153660) 2025-08-14 13:34:25 -07:00
Valentin Clement (バレンタイン クレメン)
20a829937c
[flang][cuda] Add interfaces for __expf and __exp10f (#153633) 2025-08-14 11:36:55 -07:00