Performance testing shows no significant gains or losses on graphics
workloads, so this is mostly to make the behavior consistent across all
supported OSes instead of special-casing HSA.
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability, as it is beneficial for algorithms that
rely on more complex concurrent coordination and it is generally
performance neutral otherwise.
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
Rather than using opaque registers which can change between different
architectures and requires encoding the bitfield information in the backend,
which may change between versions.
This is the initial minimal implementation that enables the use of PAL Metadata
3.0.
The change itself should be NFC for non-PAL, although the way RSRC2 register is
handled has been changed slightly.
The test is fairly minimal, but checks that the metadata format looks as
expected and verifies a couple of special cases such as tgid_[xyz]_en handling
and PsInputAddr/Ena which also change to explicit fields.
Differential Revision: https://reviews.llvm.org/D147143