darkbuck 61881c307c
[CUDA] Add device-side kernel launch support (#165519)
- CUDA's dynamic parallelism extension allows device-side kernel
launches, which share the identical syntax to host-side launches, e.g.,

    kernel<<<Dg, Db, Ns, S>>>(arguments);

but differ from the code generation. That device-side kernel launches is
eventually translated into the following sequence

    config = cudaGetParameterBuffer(alignment, size);
    // setup arguments by copying them into `config`.
    cudaLaunchDevice(func, config, Dg, Db, Ns, S);

- To support the device-side kernel launch, 'CUDAKernelCallExpr' is
reused but its config expr is set to a call to 'cudaLaunchDevice'.
During the code generation, 'CUDAKernelCallExpr' is expanded into the
sequence aforementioned.

- As the device-side kernel launch requires the source to be compiled as
relocatable device code and linked with '-lcudadevrt'. Linkers are
changed to pass relevant link options to 'nvlink'.
2025-12-01 17:45:10 +00:00
..