This (mostly) removes one of the largest remaining limitations of
`hipstdpar` based algorithm acceleration, by adding support for global
variable usage in offloaded algorithms. It is mean to compose with a run
time component that will live in the support library, and fires iff a
special variable is provided by the latter. In short, things work as
follows:
- We replace uses some global `G` with an indirect access via an
implicitly created anonymous global `F`, which is of pointer type and is
expected to hold the program-wide address of `G`;
- We append 'F', alongside 'G''s name, to an table structure;
- At run-time, the support library uses the table to look-up the
program-wide address of a contained symbol based on its name, and then
stores the address via the paired pointer.
This doesn't handle internal linkage symbols (`static foo` or `namespace
{ foo }`) if they are not unique i.e. if there's a name clash that is
solved by the linker, as the resolution would not be visible. Also,
initially we will only support "true" globals in RDC mode. Things would
be much simpler if we had direct access to the accelerator loader, but
since the expectation is to compose at the HIP RT level we have to jump
through additional hoops.
Following GitHub organizations were merged into the ROCm org:
* ROCm-Developer-Tools
* RadeonOpenCompute
* ROCmSoftwarePlatform
Ensure that all hyperlinks to the old organizations now point to the new
organization at https://github.com/ROCm.
The allocation interposition mode had a number of issues, which are
primarily addressed in the library component via
<https://github.com/ROCm/rocThrust/pull/543>. However, it is necessary
to interpose some additional symbols, which this patch does.
Furthermore, to implement this in a compatible way, we guard the new
implementation under a V1 macro, which is defined in addition to the
existing `__HIPSTDPAR_INTERPOSE_ALLOC__` one.
Currently if CUDA/HIP users use template class with virtual dtor
and std::string data member with C++20 and MSVC. When the template
class is explicitly instantiated, there is error about host
function called by host device function (used to be undefined
symbols in linking stage before member destructors were checked
by deferred diagnostics).
It was caused by clang inferring host/device attributes for
default dtors. Since all dtors of member and parent classes
have implicit host device attrs, clang infers the virtual dtor have
implicit host and device attrs. Since virtual dtor of
explicitly instantiated template class must be emitted,
this causes constexpr dtor of std::string emitted, which
calls a host function which was not emitted on device side.
This is a serious issue since it prevents users from
using std::string with C++20 on Windows.
When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative
since it is sure to be emitted. Since an implicit host device
function may call a host function, clang cannot assume it is
always available on device. This guarantees dtors that
may call host functions not to have implicit device attr,
therefore will not be emitted on device side.
Fixes: https://github.com/llvm/llvm-project/issues/108548
Fixes: SWDEV-517435
This addresses an odd ommision from the 19 release cycle, wherein we
upstreamed HIPSTDPAR support without adding the relevant documentation.
As an added bonus, we also remove a reference to `amdgcnspirv` not
mixing with concrete targets, as that limitation has been addressed.
So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these macros, to
replace them with a more resilient interface to access wavefront size
information where it is available.
Reapplies #112849 with a fix for the non-hermetic clang test that failed
on Mac after the revert in #115499.
For SWDEV-491529.
So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these macros, to
replace them with a more resilient interface to access wavefront size
information where it is available.
For SWDEV-491529.
This is mostly stealing from #75357, and updating it to reflect the
pivot towards AMDGCN flavoured SPIR-V and the slightly different set of
limitations. As we bring up more functionality it will be updated
accordingly. With thanks to @yxsamliu.
Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via `-mlink-builtin-bitcode` to inherit its attributes.
However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, which is actually
variable. We should not expose these macros unless it is known.
When doing combined compilation/link for HIP source
files, clang should link the HIP runtime library
automatically without --hip-link.
Reviewed by: Siu Chi Chan, Joseph Huber
Differential Revision: https://reviews.llvm.org/D156426
start with example usage, predefined macros, and path setting.
Reviewed by: Brian Sumner, Siu Chi Chan, Matt Arsenault, Ronan Keryell
Differential Revision: https://reviews.llvm.org/D154123