37 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
7531672712
[flang][cuda][NFC] Remove unused variable (#121533)
Failed buildbot after https://github.com/llvm/llvm-project/pull/121524
2025-01-02 17:37:44 -08:00
Valentin Clement (バレンタイン クレメン)
6dcd2b035d
[flang][cuda] Convert cuf.sync_descriptor to runtime call (#121524)
Convert the op to a call to a new entry point in the runtime,
`CUFSyncGlobalDescriptor`.
2025-01-02 17:02:59 -08:00
Valentin Clement (バレンタイン クレメン)
4b17a8b10e
[flang][cuda] Add operation to sync global descriptor (#121520)
Introduce cuf.sync_descriptor to be used to sync the device global
descriptor after pointer association.

Also move CUFCommon so it can be used in FIRBuilder lib as well.
2025-01-02 17:02:45 -08:00
Valentin Clement (バレンタイン クレメン)
415cfaf339
[flang][cuda][NFC] Fix type in CUFFreeDescriptor (#120799) 2024-12-20 14:43:12 -08:00
Valentin Clement (バレンタイン クレメン)
e650ac1654
[flang][cuda][NFC] Fix typo in CUFAllocDescriptor (#120797)
Missing `r` in the function name.
2024-12-20 13:57:47 -08:00
Renaud Kauffmann
27e458c8cb
[flang][cuda] Distinguish constant fir.global from globals with a #cuf.cuda<constant> attribute (#118912)
1. In `CufOpConversion`, `isDeviceGlobal` was renamed to
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operations from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591.
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582, a module variable with
the `#cuf.cuda<constant>` attribute is not a compile-time constant. Yet,
the compile-time constant also needs to be copied into the GPU module.
The candidates for copy to the GPU module are:
- the globals needing registration regardless of their uses in device
code (they can be referred to in host code as well)
- the compile-time constants when used in device code

3. The registration of "constant" module device variables
(`#cuf.cuda<constant>`) can be restored in `CufAddConstructor`.
2024-12-05 18:36:48 -08:00
Valentin Clement (バレンタイン クレメン)
7efd6139f2
[flang][cuda] Get device address in fir.declare (#118591)
Add a pattern that updates the fir.declare memref when it comes from a
device global and is not a descriptor. In that case, we recover the device
address that needs to be used in ops like `fir.array_coor` and so on.
2024-12-04 13:36:58 -08:00
Valentin Clement (バレンタイン クレメン)
b5825963f0
[flang][cuda] Materialize box when needed (#117810)
Materialize the box when the src comes from an embox or rebox operation.
This was done in the case of transfer to a descriptor but not when
transferring from a descriptor.
2024-11-26 17:36:25 -08:00
Valentin Clement (バレンタイン クレメン)
eb5cda480d
[flang][cuda] cuf.allocate: Carry over stream to the runtime call (#117631)
- Update the runtime entry points to accept stream information
- Update the conversion of `cuf.allocate` to correctly pass the stream
information when present.

Note that the stream is not currently used in the runtime. This will be
done in a separate patch as a design/solution needs to be done together
with the allocators.
2024-11-25 20:46:24 -08:00
Valentin Clement (バレンタイン クレメン)
5802367ddb
[flang][cuda] Add support for allocate with source (#117388)
Add support for allocate statement with CUDA device variable and a
source.
2024-11-22 16:55:26 -08:00
Valentin Clement (バレンタイン クレメン)
4d7df40c08
[flang][cuda] Materialize constant src in memory (#116851)
When the src of the data transfer is a constant, it needs to be
materialized in memory to be able to perform a data transfer.

```
subroutine sub1()
  real, device :: a(10)
  integer :: I

  do i = 5, 10
    a(i) = -4.0
  end do
end
```
2024-11-19 14:11:20 -08:00
Valentin Clement (バレンタイン クレメン)
de2e270ee6
[flang][cuda] Materialize box when src or dst are rebox (#116494) 2024-11-18 09:22:12 -08:00
Valentin Clement
42be165dde Reland '[flang][cuda] Specialize entry point for scalar to desc data transfer' 2024-11-15 19:13:55 -08:00
Valentin Clement (バレンタイン クレメン)
70b9440c88
Revert "[flang][cuda] Specialize entry point for scalar to desc data transfer" (#116458)
Reverts llvm/llvm-project#116457
2024-11-15 17:44:48 -08:00
Valentin Clement (バレンタイン クレメン)
43cb424a54
[flang][cuda] Specialize entry point for scalar to desc data transfer (#116457)
The runtime Assign function is not meant to initialize an array from a
scalar. For that we need to use DoAssignFromSource. Update the data
transfer from scalar to descriptor to use a new entry point that use
this function underneath.
2024-11-15 17:41:23 -08:00
Valentin Clement (バレンタイン クレメン)
b1fa9d154b
[flang][cuda] Correctly embox logical constant (#116445) 2024-11-15 15:29:41 -08:00
Valentin Clement (バレンタイン クレメン)
012fad975e
[flang][cuda] Materialize the box in memory when dst is emboxed (#116320)
Similar to #116289 but for the dst.
2024-11-15 14:31:36 -08:00
Valentin Clement (バレンタイン クレメン)
e8469f1577
[flang][cuda] Add support for character type in cuf.alloc and cuf.data_transfer (#116277)
Add support for character type in bytes computation
2024-11-15 14:31:21 -08:00
Valentin Clement (バレンタイン クレメン)
98daf22638
[flang][cuda] Materialize the box in memory when src is emboxed (#116289) 2024-11-14 18:33:14 -08:00
Valentin Clement (バレンタイン クレメン)
02018cf793
[flang][cuda][NFC] Use mlir::emitError to get location (#116267)
Use `mlir::emitError` so we can get location information on error.
2024-11-14 10:32:09 -08:00
Valentin Clement (バレンタイン クレメン)
d133a3ee9d
[flang][cuda] Add conversion after CUFGetDeviceAddress to avoid issue when emboxing (#116145) 2024-11-14 09:03:15 -08:00
Valentin Clement (バレンタイン クレメン)
ec066d30e2
[flang][cuda] cuf.alloc in device context should be converted to fir.alloc (#116110)
Update `inDeviceContext` to account for the gpu.func operation.
2024-11-13 14:57:42 -08:00
Valentin Clement (バレンタイン クレメン)
e457861647
[flang][cuda] Support shape shift in data transfer op. (#115929)
When an array is declared with a non-default lower bound, the declare op
`getShape` will return a `ShapeShiftOp`. This result is used in data
transfer operation to compute the number of bytes to transfer. Update
the op to support `ShapeShiftOp`.
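
As a rough illustration of why the byte count is unaffected by the lower bounds, here is a minimal C++ sketch. It is not the flang implementation: `DimShift` and `bytesToTransfer` are hypothetical names introduced only for this example, and the sketch assumes a shape-shift can be modeled as a list of (lower bound, extent) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one dimension of a shape-shift:
// a lower bound paired with an extent.
struct DimShift {
  std::int64_t lowerBound;
  std::int64_t extent;
};

// Number of bytes to transfer for an array described by
// (lowerBound, extent) pairs. Only the extents contribute to the size;
// the lower bounds shift the indexing but not the element count.
std::size_t bytesToTransfer(const std::vector<DimShift> &dims,
                            std::size_t elementSize) {
  std::size_t count = 1;
  for (const DimShift &d : dims) {
    assert(d.extent >= 0 && "extent must be non-negative");
    count *= static_cast<std::size_t>(d.extent);
  }
  return count * elementSize;
}
```

Under these assumptions, an array declared as `real :: a(5:10)` with 4-byte elements would be modeled as `bytesToTransfer({{5, 6}}, 4)`.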
2024-11-13 11:13:19 -08:00
Valentin Clement (バレンタイン クレメン)
2583071fb4
[flang][cuda] Compute size of derived type arrays (#115914) 2024-11-12 21:23:58 -08:00
Valentin Clement (バレンタイン クレメン)
853d52b838
[flang][cuda] Support derived type in cuf.data_transfer conversion (#115557)
Support derived type in `cuf.data_transfer` conversion by computing
their size in bytes.
2024-11-12 10:05:53 -08:00
Valentin Clement (バレンタイン クレメン)
d4eb430c9e
[flang][cuda] Support derived type in cuf.alloc (#115550)
Number of bytes to allocate was not computed when using `cuf.alloc` with
a derived type. Update the conversion to compute the number of bytes and
emit an error when type is not supported.
2024-11-08 14:32:00 -08:00
Valentin Clement (バレンタイン クレメン)
ef8d88ca1a
[flang][cuda] Support scalar to array data transfer (#115273)
Do it via descriptor assignment until we have a more efficient way.
2024-11-07 09:27:10 -08:00
Valentin Clement (バレンタイン クレメン)
db69d6939a
[flang][cuda] Support data transfer from descriptor to a pointer (#115023)
Data transfer from a variable with a descriptor to a pointer. We create
a descriptor for the pointer so we can use the flang runtime to perform
the transfer. The Assign function handles all corner cases. We add a new
entry point `CUFDataTransferDescDescNoRealloc` to avoid reallocation
since the variable on the LHS is not an allocatable.
2024-11-05 11:59:08 -08:00
Valentin Clement (バレンタイン クレメン)
652db7e4ff
[flang][cuda] Support data transfer from pointer to a descriptor (#114892)
When source is a pointer to an array or a scalar, embox it and use the
`CUFDataTransferDescDesc` or `CUFDataTransferGlobalDescDesc` entry
points. The runtime is already able to deal with all the corner cases
like non-contiguous arrays and so on, so we exploit this.

Memset might still be used for simple cases where we want to initialize
to 0, for example. This will come in a follow-up patch.
2024-11-05 08:56:19 -08:00
Valentin Clement (バレンタイン クレメン)
9d09c6fd9c
[flang][cuda] Update device descriptor on data transfer (#114838)
When the destination of the data transfer is a global we might need to
sync the descriptor after the data transfer is done. This is the case
when the data transfer is from host/device to device as reallocation
might have happened and the descriptor on the device needs to take the
new values written on the host.

A new entry point is added `CUFDataTransferGlobalDescDesc` with the sync
when needed.
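
The need for the sync can be pictured with a toy C++ model. This is only a sketch under stated assumptions: `ToyDescriptor`, `ToyGlobal`, `needsSync`, and `syncDescriptor` are hypothetical names that do not exist in the flang runtime, and a real descriptor carries far more than a base address and an element count.

```cpp
#include <cstddef>

// Hypothetical, heavily simplified descriptor: base address and element count.
struct ToyDescriptor {
  const void *base;
  std::size_t elements;
};

// Hypothetical model of a global that has a host-side descriptor and a
// device-side copy of that descriptor.
struct ToyGlobal {
  ToyDescriptor host;
  ToyDescriptor deviceCopy;
};

// The device copy is stale when the host descriptor was changed,
// for example by a reallocation during the data transfer.
bool needsSync(const ToyGlobal &g) {
  return g.host.base != g.deviceCopy.base ||
         g.host.elements != g.deviceCopy.elements;
}

// Bring the device copy up to date with the values written on the host.
void syncDescriptor(ToyGlobal &g) { g.deviceCopy = g.host; }
```

In this model, a transfer that reallocates the destination changes `host`, after which `needsSync` returns true until `syncDescriptor` is called.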
2024-11-04 13:22:06 -08:00
Valentin Clement (バレンタイン クレメン)
e4e9fea71e
[flang][cuda] Pass descriptor by reference for CUFMemsetDescriptor (#114338) 2024-10-31 09:02:59 -07:00
Renaud Kauffmann
bfe486fe76
Passing descriptors by reference to CUDA runtime calls (#114288)
Passing a descriptor as a `const Descriptor &` or a `const Descriptor *`
generates a FIR signature where the box is passed by value.
This is an issue, as it requires a load of the box to be passed. But
since, ultimately, all boxes are passed by reference, a temporary is
generated in LLVM and the reference to the temporary is passed.

The box addresses are registered with the CUDA runtime but the
temporaries are not, thus preventing the runtime from properly mapping a
host-side address to its device-side counterpart.

To address this issue, this PR changes the signatures of the transfer
functions to pass a descriptor as a `Descriptor *`, which will in turn
generate a FIR signature that takes a box reference as an argument.
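
The address-mapping problem can be sketched in plain C++ as an analogy. The names below (`ToyDescriptor`, `Registry`, `lookupByValue`, `lookupByPointer`) are hypothetical and are not part of the flang or CUDA runtimes; the by-value parameter merely stands in for the temporary that is generated when the box is passed by value.

```cpp
#include <unordered_map>

// Hypothetical, simplified descriptor.
struct ToyDescriptor {
  int rank;
};

// Hypothetical registry mapping a host-side address to its device-side
// counterpart.
using Registry = std::unordered_map<const void *, const void *>;

// By value: the callee only sees a copy, and the address of that copy was
// never registered, so the lookup fails.
const void *lookupByValue(const Registry &registry, ToyDescriptor desc) {
  auto it = registry.find(&desc);
  return it == registry.end() ? nullptr : it->second;
}

// By pointer: the callee sees the original address, which is the one that
// was registered, so the lookup succeeds.
const void *lookupByPointer(const Registry &registry,
                            const ToyDescriptor *desc) {
  auto it = registry.find(desc);
  return it == registry.end() ? nullptr : it->second;
}
```

With a registry that maps the address of a host descriptor to a device descriptor, `lookupByPointer` returns the device address while `lookupByValue` returns `nullptr`, which mirrors the failure described above.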
2024-10-30 13:24:47 -07:00
Valentin Clement (バレンタイン クレメン)
0d94c7b5ce
[flang][cuda][NFC] Make pattern names homogenous (#114156)
The dialect name is uppercase. Make all the pattern prefixes homogenous.
2024-10-29 20:39:17 -07:00
Valentin Clement (バレンタイン クレメン)
0fa2fb3ed0
[flang][cuda] Add conversion pattern for cuf.kernel_launch op (#114129) 2024-10-29 17:00:41 -07:00
Renaud Kauffmann
b9978f8c77
[flang][cuda] Adding variable registration in constructor (#113976)
1) Adding variable registration in constructor
2) Applying feedback from PR
https://github.com/llvm/llvm-project/pull/112989
2024-10-29 11:48:48 -07:00
Valentin Clement (バレンタイン クレメン)
4e40b71c51
[flang][cuda] Add specialized gpu.launch_func conversion (#113493) 2024-10-23 15:28:51 -07:00
Renaud Kauffmann
f1e59dcb45
Renaming Cuf passes to CUF (#113351)
For consistency with other dialects and other CUF passes and files, this
patch renames the passes CufOpConversion to CUFOpConversion and
CufImplicitDeviceGlobal to CUFDeviceGlobal.
It also renames the file.
2024-10-22 12:50:31 -07:00