If an inlined kernel is called in a loop, the launch point alloca would
lead to increasing stack usage every time the kernel is invoked. This
could make the application run out of stack space and crash. This problem
is fixed by using the alloca insertion point while creating the alloca instruction.
Fixes https://github.com/llvm/llvm-project/issues/60602
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D145820
The indeces of the dependent loops are properly ordered, just start from
1, so need just subtract 1 to get correct loop index.
Differential Revision: https://reviews.llvm.org/D145514