Summary:
We call the global constructors by function pointer. For whatever reason
the NVPTX architecture relies very specifically on the arguments to the
function pointer invocation matching what the function is implemented
as. This is problematic as most of these constructors are generated
with no arguments. This patch removes the extended arguments that GNU
and LLVM use for the constructors optionally so that it can support the
common case.
Summary:
The NVPTX backend is picky about the definitions of functions. Because
we call these functions with these arguments it can cause some problems
when it goes through the backend. This was observed in a different test
for `printf` that hasn't been landed yet. Also adjust the priority.
This patch adds the necessary hacks to support global constructors and
destructors. This is an incredibly hacky process caused by the primary
fact that Nvidia does not provide any binary tools and very little
linker support. We first had to emit references to these functions and
their priority in D149451. Then we dig them out of the module once it's
loaded to manually create the list that the linker should have made for
us. This patch also contains a few Nvidia specific hacks, but it passes
the test, albeit with a stack size warning from `ptxas` for the
callback. But this should be fine given the resource usage of a common
test.
This also adds a dependency on LLVM to the NVPTX loader, which hopefully doesn't
cause problems with our CUDA buildbot.
Depends on D149451
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D149527
This patch makes the necessary changes to support calling global
constructors and destructors on the GPU. The patch in D149340 allows the
`lld` linker to create the symbols pointing us to these globals. These
should be executed by a single thread, which is more difficult on the
GPU because all threads are active. I chose to use an atomic counter to
sync every thread on the GPU. This is very slow if you use more than a
few thousand threads, but for testing purposes it should be sufficient.
Depends on D149340 D149363
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149398