The new LLVM stack save/restore intrinsic operations are more convenient
than function calls because they do not add function declarations to the
module and therefore do not block the parallelisation of passes.
Furthermore they could be much more easily marked with memory effects
than function calls if that ever proved useful.
This builds on top of #107879.
Resolves#108016
This PR aims to factor in the allocation address space provided by an
architectures data layout when generating the intrinsic instructions,
this allows them to be lowered later with the address spaces in tow.
This aligns the intrinsic creation with the LLVM IRBuilder's
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/IRBuilder.h#L1053
This is also necessary for the below example to compile for OpenMP AMD
GPU and not ICE the compiler in ISEL as AMD's stackrestore and stacksave
are expected to have the appropriate allocation address space for AMD
GPU.
program main
integer(4), allocatable :: test
allocate(test)
!$omp target map(tofrom:test)
do i = 1, 10
test = test + 50
end do
!$omp end target
deallocate(test)
end program
The PR also fixes the issue I opened a while ago which hits the same
error when compiling for AMDGPU:
https://github.com/llvm/llvm-project/issues/82368
Although, you have to have the appropriate GPU LIBC and Fortran offload
runtime (both compiled for AMDGPU) added to the linker for the command
or it will reach another ISEL error and ICE weirdly. But with the
pre-requisites it works fine with this PR.
Some passes in the flang pipeline are creating `fir.alloca` operation
like `hlfir.concat`. When these allocas are located in a loop, the stack
can quickly be used too much leading to segfaults.
This behavior can be seen in
https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90
This patch insert a call to LLVM stacksave/stackrestore in the body of
the loop to reclaim the alloca in its scope.
This PR is an alternative implementation to #95173