`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.
Accommodate operations with VALUE dummy arguments in the runtime support
for the REDUCE intrinsic function by splitting most entry points into
Reduce...Ref and Reduce...Value variants.
Further work will be needed in lowering to call the ...Value entry
points.
Supports the REDUCE() transformational intrinsic function of Fortran
(see F'2023 16.9.173) in a manner similar to the existing support for
SUM(), PRODUCT(), &c. There are APIs for total reductions to scalar
results, and APIs for partial reductions that reduce the rank of the
argument by one.
This implementation requires more functions than other reductions
because the various possible types of the user-supplied OPERATION=
function need to be elaborated.
Once the basic API in reduce.h has been approved, later patches will
implement lowering.
REDUCE() is primarily for completeness, not portability; only one other
Fortran compiler implements this F'2018 feature today, and only some
types work correctly with it.