14 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
4cb2a519db
Revert "Reland '[flang] Allow to pass an async id to allocate the descriptor (#118713)' and #118733" (#121029)
This still causes issues for the device runtime build.
2024-12-23 21:27:34 -08:00
Valentin Clement (バレンタイン クレメン)
5b74fb75d9
Reland '[flang] Allow to pass an async id to allocate the descriptor (#118713)' and #118733 (#120997)
The device runtime build has been fixed. Attempt to re-land these patches,
which were approved before.

https://github.com/llvm/llvm-project/pull/118713
https://github.com/llvm/llvm-project/pull/118733
2024-12-23 12:13:56 -08:00
Valentin Clement (バレンタイン クレメン)
16c2a1016e
Revert "[flang] Allow to pass an async id to allocate the descriptor (#118713)" (#119109)
This reverts commit 7d1c661381d36018fd105f4ad4c2d6dc45e7288b.

This commit breaks some device runtime builds. Need time to investigate.
2024-12-07 19:55:12 -08:00
Valentin Clement (バレンタイン クレメン)
83ccaad473
[flang][cuda] Use async id for device stream allocation (#118733)
When a stream is specified, use cudaMallocAsync with the specified stream.
2024-12-05 08:57:10 -08:00
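The dispatch described in #118733 can be sketched as follows. This is only an illustration of the decision logic, not the flang runtime code: `DeviceBackend`, `DeviceAllocate`, and the `kNoAsyncId` sentinel are hypothetical names, and the two allocation calls are injected as function pointers so the sketch compiles without the CUDA toolkit. In a CUDA build they would presumably wrap `cudaMalloc` and `cudaMallocAsync`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sentinel meaning "no stream was specified".
constexpr std::int64_t kNoAsyncId = -1;

// Hypothetical backend. In a CUDA build, mallocSync would wrap cudaMalloc
// and mallocAsync would wrap cudaMallocAsync on the given stream.
struct DeviceBackend {
  void *(*mallocSync)(std::size_t bytes);
  void *(*mallocAsync)(std::size_t bytes, std::int64_t stream);
};

// Use the stream-ordered allocation only when a stream id was provided;
// otherwise fall back to the plain synchronous allocation.
void *DeviceAllocate(const DeviceBackend &backend, std::size_t bytes,
                     std::int64_t asyncId) {
  assert(backend.mallocSync && backend.mallocAsync);
  if (asyncId == kNoAsyncId) {
    return backend.mallocSync(bytes);
  }
  return backend.mallocAsync(bytes, asyncId);
}
```

Keeping the fallback in one place means callers that never specify a stream see no change in behavior.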
Valentin Clement (バレンタイン クレメン)
7d1c661381
[flang] Allow to pass an async id to allocate the descriptor (#118713)
This patch prepares for supporting the stream-ordered memory
allocator in CUDA Fortran.

This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.

A follow-up patch will implement that asynchronous allocator for CUDA
Fortran.
2024-12-04 18:24:40 -08:00
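A minimal sketch of the plumbing described in #118713, in which an asynchronous id travels from the descriptor down to whichever allocator is registered. All names here (`Allocator`, `Descriptor`, `kNoAsyncId`, `HostAlloc`, `RecordingAlloc`) are hypothetical stand-ins and do not reproduce the flang runtime declarations; the point is only that the id is forwarded and each allocator decides whether to use it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sentinel meaning "no asynchronous id".
constexpr std::int64_t kNoAsyncId = -1;

// Hypothetical allocator interface: every allocator receives the async id.
struct Allocator {
  void *(*alloc)(std::size_t bytes, std::int64_t asyncId);
  void (*free)(void *ptr);
};

// A host allocator that accepts the id but ignores it.
void *HostAlloc(std::size_t bytes, std::int64_t /*asyncId*/) {
  return std::malloc(bytes);
}
void HostFree(void *ptr) { std::free(ptr); }

// An allocator that records the id it was given, standing in for one
// that would use it to pick a stream.
std::int64_t lastAsyncIdSeen = kNoAsyncId;
void *RecordingAlloc(std::size_t bytes, std::int64_t asyncId) {
  lastAsyncIdSeen = asyncId;
  return std::malloc(bytes);
}

// Hypothetical, heavily simplified descriptor.
struct Descriptor {
  void *base = nullptr;
  std::size_t bytes = 0;
  const Allocator *allocator = nullptr;

  // Forwards the async id unchanged; returns 0 on success, 1 on failure.
  int Allocate(std::int64_t asyncId = kNoAsyncId) {
    assert(allocator && "descriptor has no allocator");
    base = allocator->alloc(bytes, asyncId);
    return base ? 0 : 1;
  }

  void Deallocate() {
    assert(allocator && "descriptor has no allocator");
    allocator->free(base);
    base = nullptr;
  }
};
```

Defaulting the parameter to the sentinel keeps existing call sites valid, which is one plausible reason to thread the id through as an optional argument.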
Valentin Clement (バレンタイン クレメン)
cdf447baa5
[flang][cuda] Add function to allocate and deallocate device module variable (#109213)
This patch adds new runtime entry points that perform the simple
allocation/deallocation of module allocatable variables with CUDA
attributes.
When the allocation is initiated on the host, the descriptor on the
device is synchronized. Both descriptors point to the same data on the
device.

This is the first PR of a stack.
2024-09-18 20:22:06 -07:00
Valentin Clement
743e99dcf5
Reland "[flang][cuda] Use cuda runtime API #103488"
CUDA Fortran is meant to be an equivalent to the CUDA runtime API. Therefore,
it makes more sense to use the CUDA runtime API in the allocators for CUF.
2024-08-14 14:56:00 -07:00
Valentin Clement (バレンタイン クレメン)
f6e3dbc27d
Revert "[flang][cuda] Use cuda runtime API" (#104232)
Reverts llvm/llvm-project#103488
2024-08-14 13:44:49 -07:00
Valentin Clement (バレンタイン クレメン)
00ab8a6a4c
[flang][cuda] Use cuda runtime API (#103488)
CUDA Fortran is meant to be an equivalent to the CUDA runtime API. Therefore,
it makes more sense to use the CUDA runtime API in the allocators for CUF.

@bdudleback
2024-08-14 12:34:45 -07:00
Valentin Clement (バレンタイン クレメン)
4c1dbbe7aa
[flang][cuda] Make CUFRegisterAllocator callable from C/Fortran (#102543)
2024-08-08 17:09:53 -07:00
Valentin Clement (バレンタイン クレメン)
388b63243c
[flang][cuda] Defined allocator for unified data (#102189)
CUDA unified variables were set to use the same allocator as managed
variables. This patch adds a specific allocator for unified
variables. Currently it calls the managed allocator underneath, but
we want the flexibility to change that in the future.
2024-08-06 14:30:31 -07:00
Valentin Clement (バレンタイン クレメン)
10d7805c4f
[flang][cuda][NFC] Disambiguate namespace with cuf dialect (#102194)
Rename namespace `Fortran::runtime::cuf` to `Fortran::runtime::cuda` to
avoid ambiguity with the namespace `::cuf` that is defined in the CUF
dialect.
2024-08-06 14:04:45 -07:00
Valentin Clement (バレンタイン クレメン)
46425b8d0f
[flang][cuda] Fix allocator-registry header path (#101727)
File was moved in #101212
2024-08-02 11:19:36 -07:00
Valentin Clement (バレンタイン クレメン)
1417633943
[flang][cuda] Add CUF allocator (#101216)
Add allocators for CUDA Fortran allocation on the device. Three allocators
are added for pinned, device and managed/unified memory allocation.
`CUFRegisterAllocator()` is called to register the allocators in the
allocator registry added in #100690.

Since this requires CUDA, a CMake option `FLANG_CUF_RUNTIME` is added to
conditionally build these.
2024-08-02 10:02:34 -07:00
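The registration step described in #101216 can be pictured with a small sketch. The registry class, slot numbers, and `RegisterCufAllocators` below are hypothetical and are not the interface added in #100690; host `malloc`/`free` stand in for the pinned, device, and managed allocation routines so the example builds without CUDA.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical slot numbers for the three kinds of memory.
enum AllocatorSlot : int {
  kDefault = 0,
  kPinned = 1,
  kDevice = 2,
  kManaged = 3, // shared by managed and unified variables in this sketch
  kSlotCount = 4
};

struct Allocator {
  void *(*alloc)(std::size_t bytes);
  void (*free)(void *ptr);
};

// Hypothetical registry: a fixed table indexed by slot.
class AllocatorRegistry {
public:
  void Register(int slot, Allocator allocator) {
    assert(slot >= 0 && slot < kSlotCount);
    table_[slot] = allocator;
  }
  const Allocator &Get(int slot) const {
    assert(slot >= 0 && slot < kSlotCount);
    assert(table_[slot].alloc && "slot was never registered");
    return table_[slot];
  }

private:
  std::array<Allocator, kSlotCount> table_{}; // all entries start as null
};

// Host stand-ins; a CUDA build would call the matching CUDA routines.
void *StandInAlloc(std::size_t bytes) { return std::malloc(bytes); }
void StandInFree(void *ptr) { std::free(ptr); }

// Registers one allocator per memory kind, mirroring the three allocators
// named in the commit message.
void RegisterCufAllocators(AllocatorRegistry &registry) {
  registry.Register(kPinned, {StandInAlloc, StandInFree});
  registry.Register(kDevice, {StandInAlloc, StandInFree});
  registry.Register(kManaged, {StandInAlloc, StandInFree});
}
```

A caller would invoke `RegisterCufAllocators` once at startup and then look up the allocator by slot when allocating a variable with the corresponding attribute.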