Otherwise, builds with `-DFLANG_CUF_RUNTIME` hits:
```
runtime/CUDA/descriptor.cpp:44:24: error: invalid use of incomplete type 'const class Fortran::runtime::Descriptor'
44 | std::size_t count{src->SizeInBytes()};
```
This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
This patch adds new runtime entry points that perform the simple
allocation/deallocation of module allocatable variable with cuda
attributes.
When the allocation is initiated on the host, the descriptor on the
device is synchronized. Both descriptors point to the same data on the
device.
This is the first PR of a stack.
This patch adds entry point in the runtime to be able to allocate
descriptors in managed memory. These entry points currently only call
`CUFAllocManaged` and `CUFFreeManaged` but could be more complicated in
the future.
`cuf.alloc` and `cuf.free` related to local descriptors are converted
into runtime calls.