61 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
2c6771889a
[flang][cuda] Introduce cuf.set_allocator_idx operation (#148717) 2025-07-14 17:23:18 -07:00
Valentin Clement (バレンタイン クレメン)
f5609aa1b0
[flang][cuda] Use a reference for asyncObject (#140614)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.

New attempt with some fixes. The previous attempt was reverted some time ago.

Reviewed in #138010
2025-05-19 15:02:53 -07:00
Sergio Afonso
7e8b3fea43
[Flang] Add missing dependent dialects to MLIR passes (#139260)
This patch updates several passes to include the DLTI dialect, since
their use of the `fir::support::getOrSetMLIRDataLayout()` utility
function could, in some cases, require this dialect to be loaded in
advance.

Also, the `CUFComputeSharedMemoryOffsetsAndSize` pass has been updated
with a dependency on the GPU dialect, as its call to
`cuf::getOrCreateGPUModule()` would result in the same kind of error if
no other operations or attributes from that dialect were present in the
input MLIR module.
2025-05-13 16:17:49 +01:00
Valentin Clement (バレンタイン クレメン)
9b6b144438
Revert "[flang][cuda] Use a reference for asyncObject" (#138221)
Reverts llvm/llvm-project#138186
2025-05-01 17:41:44 -07:00
Valentin Clement (バレンタイン クレメン)
7f922f1400
[flang][cuda] Use a reference for asyncObject (#138186)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.

New attempt with some fixes. The previous attempt was reverted yesterday.
2025-05-01 17:04:12 -07:00
Valentin Clement (バレンタイン クレメン)
01a18809ee
Revert "[flang][cuda] Use a reference for asyncObject (#138010)" (#138082)
This reverts commit 9b0eaf71e674a28ee55be3afa11b5f7d4da732c0.
2025-04-30 22:03:26 -07:00
Valentin Clement (バレンタイン クレメン)
9b0eaf71e6
[flang][cuda] Use a reference for asyncObject (#138010)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
2025-04-30 14:02:29 -07:00
Valentin Clement (バレンタイン クレメン)
d08e980065
[flang][cuda] Only convert launch from CUDA Fortran kernels (#136221)
Make sure `gpu.launch_func` has a CUDA proc attribute and update the
conversion pattern to only convert those with the attribute.
2025-04-21 10:51:48 -07:00
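The filtering described in #136221 can be pictured with a small, self-contained sketch. It is not the actual MLIR conversion pattern: `LaunchFunc`, `shouldConvert`, and `collectConvertible` are hypothetical names, and the CUDA proc attribute is modelled as an optional string.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a gpu.launch_func op; this is not the MLIR class.
struct LaunchFunc {
  std::string kernel;
  // Models the CUDA proc attribute. Empty when the launch did not
  // originate from a CUDA Fortran kernel.
  std::optional<std::string> cudaProcAttr;
};

// Models the match step of a conversion pattern: only launches that carry
// the attribute are selected for conversion.
inline bool shouldConvert(const LaunchFunc &op) {
  return op.cudaProcAttr.has_value();
}

// Returns the kernel names of the launches that would be converted.
inline std::vector<std::string>
collectConvertible(const std::vector<LaunchFunc> &ops) {
  std::vector<std::string> out;
  for (const LaunchFunc &op : ops)
    if (shouldConvert(op))
      out.push_back(op.kernel);
  return out;
}
```

In this model, a launch produced by another frontend (no attribute) is left untouched, which is the behaviour the commit message describes.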
Valentin Clement (バレンタイン クレメン)
91f9f0fa1e
[flang][cuda] Update cuf.kernel_launch stream and conversion (#136179)
Update `cuf.kernel_launch` to take the stream as a reference. Update the
conversion to insert the `cuf.stream_cast` op so the stream can be set
as dependency.
2025-04-17 12:55:08 -07:00
Valentin Clement (バレンタイン クレメン)
ca53463137
[flang][cuda] Propagate stream information to gpu.launch_func op (#135227)
Propagate the stream information from `cuf.kernel_launch` to `gpu.launch_func`.
2025-04-10 11:58:18 -07:00
Valentin Clement (バレンタイン クレメン)
5be9082fed
[flang][cuda] Carry over the dynamic shared memory size to gpu.launch_func (#132837) 2025-03-24 18:37:19 -07:00
Valentin Clement (バレンタイン クレメン)
478e516140
[flang][cuda] Sync double descriptor after c_f_pointer call (#130194)
After a global device pointer is set through `c_f_pointer`, we need to
sync the double descriptor so the version on the device is also up to
date.
2025-03-06 19:19:51 -08:00
Valentin Clement (バレンタイン クレメン)
93b2e47f12
[flang][cuda] Avoid assign element mismatch when doing data transfer from a constant (#128252)
Currently when we do a CUDA data transfer from a constant, we embox it
and delegate the assignment to the runtime. When the type of the
constant is not exactly the same as the destination descriptor, the
runtime will emit an assignment mismatch error.

Convert the constant when necessary so the assignment is fine.
2025-02-21 17:46:46 -08:00
Michael Kruse
b815a3942a
[Flang] Move non-common headers to FortranSupport (#124416)
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that

* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport

* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon

* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon

This allows for a cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `<optional>` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.

Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.
2025-02-06 15:29:10 +01:00
Valentin Clement (バレンタイン クレメン)
f1b075df2e
[flang][cuda] Pass the pinned variable in allocate calls (#125310) 2025-02-02 18:05:59 -08:00
agozillon
4186805060
[Flang][MLIR] Extend DataLayout utilities to have basic GPU Module support (#123149)
There are now areas where we may have either a ModuleOp or a
GPUModuleOp, and both of these modules can carry a DataLayout. Since
the DataLayout utilities may be needed in these areas, this patch
extends them to work with both.

Those with more knowledge of how they wish the GPUModuleOp's to interact
with their parent ModuleOp's DataLayout may have further alterations
they wish to make in the future, but for the moment, it'll simply
utilise the basic data layout construction which I believe combines
parent and child datalayouts from the ModuleOp and GPUModuleOp. If there
is no GPUModuleOp DataLayout it should default to the parent ModuleOp.

It's worth noting there is some weirdness if you have two module
operations defining builtin dialect DataLayout Entries, it appears the
combinatorial functionality for DataLayouts doesn't support the merging
of these.

This behaviour is useful for areas like:
https://github.com/llvm/llvm-project/pull/119585/files#diff-19fc4bcb38829d085e25d601d344bbd85bf7ef749ca359e348f4a7c750eae89dR1412
where we have a crossroads between the two different module operations.
2025-01-30 17:31:50 +01:00
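The fallback rule mentioned in #123149 (use the parent ModuleOp layout when the GPUModuleOp has none) can be sketched as follows. This is a simplified model under stated assumptions: `Module`, `GpuModule`, and `effectiveLayout` are hypothetical names, a data layout is reduced to a string, and the merging of parent and child layouts that the commit message mentions is deliberately not modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified modules: a data layout is just a string here.
struct Module {
  std::optional<std::string> dataLayout;
};

struct GpuModule {
  std::optional<std::string> dataLayout;
  const Module *parent; // enclosing module, may be null
};

// Prefer the GPU module's own layout; otherwise fall back to the parent's.
// Returns an empty optional when neither module defines a layout.
inline std::optional<std::string> effectiveLayout(const GpuModule &gpu) {
  if (gpu.dataLayout)
    return gpu.dataLayout;
  if (gpu.parent)
    return gpu.parent->dataLayout;
  return std::nullopt;
}
```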
Valentin Clement (バレンタイン クレメン)
382d3599c2
[flang][cuda] Propagate the data attribute on the converted calls (#124877) 2025-01-29 08:04:22 -08:00
Valentin Clement (バレンタイン クレメン)
ee054404df
[flang][cuda] Carry over the cuf.proc_attr attribute to gpu.launch_func (#124325) 2025-01-24 13:09:58 -08:00
Valentin Clement (バレンタイン クレメン)
67a8857989
[flang][cuda] Handle pointer allocation with double descriptors (#124183) 2025-01-23 15:54:22 -08:00
Valentin Clement (バレンタイン クレメン)
8c138bee6e
[flang][cuda] Handle pointer allocation with source (#124070) 2025-01-23 09:24:06 -08:00
Valentin Clement (バレンタイン クレメン)
6e498bc2cd
[flang][cuda] Handle simple device pointer allocation (#123996) 2025-01-22 15:59:32 -08:00
Valentin Clement (バレンタイン クレメン)
ce30ee53a8
[flang][cuda] Add gpu.launch to device context (#123105)
`gpu.launch` should also be considered device context.
2025-01-15 14:04:38 -08:00
Valentin Clement (バレンタイン クレメン)
a19919f4cd
[flang][cuda] Add cuf.device_address operation (#122975)
Introduce a new op to get the device address from a host symbol. This
simplifies the current conversion and also prepares for some
legalization work that needs to be done in cuf kernel and cuf kernel
launch, similar to
https://github.com/llvm/llvm-project/pull/122802
2025-01-14 17:38:04 -08:00
Valentin Clement (バレンタイン クレメン)
ba4dc5a0d6
[flang][cuda] Pass the device address for global descriptor (#122802) 2025-01-13 17:23:12 -08:00
Valentin Clement (バレンタイン クレメン)
7531672712
[flang][cuda][NFC] Remove unused variable (#121533)
Failed buildbot after https://github.com/llvm/llvm-project/pull/121524
2025-01-02 17:37:44 -08:00
Valentin Clement (バレンタイン クレメン)
6dcd2b035d
[flang][cuda] Convert cuf.sync_descriptor to runtime call (#121524)
Convert the op to a new entry point in the runtime
`CUFSyncGlobalDescriptor`
2025-01-02 17:02:59 -08:00
Valentin Clement (バレンタイン クレメン)
4b17a8b10e
[flang][cuda] Add operation to sync global descriptor (#121520)
Introduce cuf.sync_descriptor to be used to sync device global
descriptor after pointer association.

Also move CUFCommon so it can be used in FIRBuilder lib as well.
2025-01-02 17:02:45 -08:00
Valentin Clement (バレンタイン クレメン)
415cfaf339
[flang][cuda][NFC] Fix type in CUFFreeDescriptor (#120799) 2024-12-20 14:43:12 -08:00
Valentin Clement (バレンタイン クレメン)
e650ac1654
[flang][cuda][NFC] Fix typo in CUFAllocDescriptor (#120797)
Missing `r` in the function name.
2024-12-20 13:57:47 -08:00
Renaud Kauffmann
27e458c8cb
[flang][cuda] Distinguish constant fir.global from globals with a #cuf.cuda<constant> attribute (#118912)
1. In `CufOpConversion`, `isDeviceGlobal` was renamed to
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operations from registration. This
avoids calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591.
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582, a module variable with
the `#cuf.cuda<constant>` attribute is not a compile-time constant. Yet
the compile-time constant also needs to be copied into the GPU module.
The candidates for copying to the GPU module are:
- the globals needing registration regardless of their uses in device
code (they can be referred to in host code as well)
- the compile-time constants when used in device code
3. The registration of "constant" module device variables
(`#cuf.cuda<constant>`) can be restored in `CufAddConstructor`.
2024-12-05 18:36:48 -08:00
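The distinction drawn in #118912 between a compile-time constant `fir.global` and a global carrying a CUDA data attribute can be illustrated with a minimal sketch. The `Global` struct and the `DataAttr` enum are hypothetical, and treating any CUDA data attribute as "needs registration" is an assumption of this sketch, not something the commit message states.

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of CUDA data attributes.
enum class DataAttr { Device, Managed, Constant };

// Hypothetical stand-in for a fir.global op.
struct Global {
  bool isCompileTimeConstant;       // models a constant `fir.global`
  std::optional<DataAttr> dataAttr; // models `#cuf.cuda<...>`, if any
};

// A compile-time constant has no symbol in the runtime, so it is never
// registered. Otherwise (assumption) any CUDA data attribute qualifies.
inline bool isRegisteredGlobal(const Global &g) {
  if (g.isCompileTimeConstant)
    return false;
  return g.dataAttr.has_value();
}
```

Under this model, a module variable with `#cuf.cuda<constant>` is still registered, while a compile-time constant is skipped and no device address lookup is attempted for it.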
Valentin Clement (バレンタイン クレメン)
7efd6139f2
[flang][cuda] Get device address in fir.declare (#118591)
Add a pattern that updates the `fir.declare` memref when it comes from a
device global and is not a descriptor. In that case, we recover the device
address that needs to be used in ops like `fir.array_coor` and so on.
2024-12-04 13:36:58 -08:00
Valentin Clement (バレンタイン クレメン)
b5825963f0
[flang][cuda] Materialize box when needed (#117810)
Materialize the box when the src comes from an embox or rebox operation.
This was done in the case of transfer to a descriptor but not when
transferring from a descriptor.
2024-11-26 17:36:25 -08:00
Valentin Clement (バレンタイン クレメン)
eb5cda480d
[flang][cuda] cuf.allocate: Carry over stream to the runtime call (#117631)
- Update the runtime entry points to accept stream information
- Update the conversion of `cuf.allocate` to correctly pass the stream
information when present.

Note that the stream is not currently used in the runtime. This will be
done in a separate patch, as a design/solution needs to be worked out
together with the allocators.
2024-11-25 20:46:24 -08:00
Valentin Clement (バレンタイン クレメン)
5802367ddb
[flang][cuda] Add support for allocate with source (#117388)
Add support for allocate statement with CUDA device variable and a
source.
2024-11-22 16:55:26 -08:00
Valentin Clement (バレンタイン クレメン)
4d7df40c08
[flang][cuda] Materialize constant src in memory (#116851)
When the src of the data transfer is a constant, it needs to be
materialized in memory to be able to perform a data transfer.

```
subroutine sub1()
  real, device :: a(10)
  integer :: I

  do i = 5, 10
    a(i) = -4.0
  end do
end
```
2024-11-19 14:11:20 -08:00
Valentin Clement (バレンタイン クレメン)
de2e270ee6
[flang][cuda] Materialize box when src or dst are rebox (#116494) 2024-11-18 09:22:12 -08:00
Valentin Clement
42be165dde
Reland '[flang][cuda] Specialize entry point for scalar to desc data transfer' 2024-11-15 19:13:55 -08:00
Valentin Clement (バレンタイン クレメン)
70b9440c88
Revert "[flang][cuda] Specialize entry point for scalar to desc data transfer" (#116458)
Reverts llvm/llvm-project#116457
2024-11-15 17:44:48 -08:00
Valentin Clement (バレンタイン クレメン)
43cb424a54
[flang][cuda] Specialize entry point for scalar to desc data transfer (#116457)
The runtime Assign function is not meant to initialize an array from a
scalar. For that we need to use DoAssignFromSource. Update the data
transfer from scalar to descriptor to use a new entry point that uses
this function underneath.
2024-11-15 17:41:23 -08:00
Valentin Clement (バレンタイン クレメン)
b1fa9d154b
[flang][cuda] Correctly embox logical constant (#116445) 2024-11-15 15:29:41 -08:00
Valentin Clement (バレンタイン クレメン)
012fad975e
[flang][cuda] Materialize the box in memory when dst is emboxed (#116320)
Similar to #116289 but for the dst.
2024-11-15 14:31:36 -08:00
Valentin Clement (バレンタイン クレメン)
e8469f1577
[flang][cuda] Add support for character type in cuf.alloc and cuf.data_transfer (#116277)
Add support for character type in bytes computation
2024-11-15 14:31:21 -08:00
Valentin Clement (バレンタイン クレメン)
98daf22638
[flang][cuda] Materialize the box in memory when src is emboxed (#116289) 2024-11-14 18:33:14 -08:00
Valentin Clement (バレンタイン クレメン)
02018cf793
[flang][cuda][NFC] Use mlir::emitError to get location (#116267)
Use `mlir::emitError` so we can get location information on error.
2024-11-14 10:32:09 -08:00
Valentin Clement (バレンタイン クレメン)
d133a3ee9d
[flang][cuda] Add conversion after CUFGetDeviceAddress to avoid issue when emboxing (#116145) 2024-11-14 09:03:15 -08:00
Valentin Clement (バレンタイン クレメン)
ec066d30e2
[flang][cuda] cuf.alloc in device context should be converted to fir.alloc (#116110)
Update `inDeviceContext` to account for the gpu.func operation.
2024-11-13 14:57:42 -08:00
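The device-context check touched by #116110 can be sketched as a walk up the parent chain of an operation. This is a hypothetical model, not the flang implementation: `Op` is a made-up node type, and the set of enclosing ops is limited to the two named in this log (`gpu.func` here, and `gpu.launch` in #123105).

```cpp
#include <cassert>
#include <string>

// Hypothetical op node with a link to its enclosing op.
struct Op {
  std::string name;
  const Op *parent; // null for the top-level op
};

// Returns true when any enclosing op marks a device region.
inline bool inDeviceContext(const Op *op) {
  for (const Op *p = op->parent; p != nullptr; p = p->parent)
    if (p->name == "gpu.func" || p->name == "gpu.launch")
      return true;
  return false;
}
```

With such a predicate, a `cuf.alloc` nested in a `gpu.func` would be lowered to a plain `fir.alloc`, as the commit title states, while one in host code would keep going through the CUDA path.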
Valentin Clement (バレンタイン クレメン)
e457861647
[flang][cuda] Support shape shift in data transfer op. (#115929)
When an array is declared with a non-default lower bound, the declare op
`getShape` will return a `ShapeShiftOp`. This result is used in data
transfer operation to compute the number of bytes to transfer. Update
the op to support `ShapeShiftOp`.
2024-11-13 11:13:19 -08:00
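As a rough illustration of the byte computation in #115929, the sketch below multiplies the extents taken from a shape-shift style operand list. It assumes the operands are interleaved as (lower bound, extent) pairs, one pair per dimension; `bytesFromShapeShift` is a hypothetical helper working on plain integers, not on MLIR values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes the number of bytes to transfer for an array described by
// interleaved (lower bound, extent) pairs and a fixed element size.
inline std::int64_t
bytesFromShapeShift(const std::vector<std::int64_t> &pairs,
                    std::int64_t elemBytes) {
  assert(pairs.size() % 2 == 0 && "expected (lower bound, extent) pairs");
  std::int64_t numElements = 1;
  // Lower bounds (even indices) do not affect the size; only the extents
  // (odd indices) are multiplied together.
  for (std::size_t i = 1; i < pairs.size(); i += 2)
    numElements *= pairs[i];
  return numElements * elemBytes;
}
```

For example, `real :: a(0:9, 2:5)` with 4-byte elements corresponds to the pairs `{0, 10, 2, 4}` and yields 10 * 4 * 4 = 160 bytes.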
Valentin Clement (バレンタイン クレメン)
2583071fb4
[flang][cuda] Compute size of derived type arrays (#115914) 2024-11-12 21:23:58 -08:00
Valentin Clement (バレンタイン クレメン)
853d52b838
[flang][cuda] Support derived type in cuf.data_transfer conversion (#115557)
Support derived type in `cuf.data_transfer` conversion by computing
their size in bytes.
2024-11-12 10:05:53 -08:00
Valentin Clement (バレンタイン クレメン)
d4eb430c9e
[flang][cuda] Support derived type in cuf.alloc (#115550)
The number of bytes to allocate was not computed when using `cuf.alloc`
with a derived type. Update the conversion to compute the number of bytes
and emit an error when the type is not supported.
2024-11-08 14:32:00 -08:00