CUDA Fortran is meant to be an equivalent to the runtime API. Therefore,
it makes more sense to use the cuda rt API in the allocators for CUF.
@bdudleback
CUDA unified variable where set to use the same allocator than managed
variable. This patch adds a specific allocator for the unified
variables. Currently it will call the managed allocator underneath but
we want to have the flexibility to change that in the future.
Add allocators for CUDA fortran allocation on the device. 3 allocators
are added for pinned, device and managed/unified memory allocation.
`CUFRegisterAllocator()` is called to register the allocators in the
allocator registry added in #100690.
Since this require CUDA, a cmake option `FLANG_CUF_RUNTIME` is added to
conditionally build these.