In order to get the pointer to a structure member, `getelementptr`
typically requires two indices: one to indicate the structure itself,
and another to specify the member's position. We are missing the former
in `GPULaunchKernelConversion`, so generated code may cause stack
corruption. This PR corrects the indices of a structure used as a kernel
launch temp.
For the call to _FortranACUFLaunchKernel, we store the pointer to a
member of a temporary structure in a parameter array. However, when we
obtain an element pointer from the parameter array, its address is
calculated based on the type of the structure. This PR properly treats
the parameter array as an array of pointers.
Example:
```mlir
%30 = llvm.load %29 : !llvm.ptr -> i32
%31 = llvm.mlir.constant(1 : i32) : i32
%32 = llvm.alloca %31 x !llvm.struct<(i64, i64, i32, ptr)> : (i32) -> !llvm.ptr
%33 = llvm.mlir.constant(4 : i32) : i32
%34 = llvm.alloca %33 x !llvm.ptr : (i32) -> !llvm.ptr
%35 = llvm.mlir.constant(0 : i32) : i32
%36 = llvm.getelementptr %32[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %8, %36 : i64, !llvm.ptr
%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %36, %37 : !llvm.ptr, !llvm.ptr
...
llvm.call @_FortranACUFLaunchKernel(%47, %8, %8, %8, %2, %8, %8, %7, %34, %48) : (!llvm.ptr, i64, i64, i64, i64, i64, i64, i32, !llvm.ptr, !llvm.ptr) -> ()
```
In this example, `%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32)
-> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>` will be `%37 =
llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr`.
Kernel launch in CUF are converted to `gpu.launch_func`. When the kernel
has `cluster_dims` specified these get carried over to the
`gpu.launch_func` operation. This patch updates the special conversion
of `gpu.launch_func` when cluster dims are present to the newly added
entry point.