13 Commits

Author SHA1 Message Date
Gang Chen
ef68d1587d
[AMDGPU] upstream barrier count reporting part1 (#154409)
2025-08-19 16:42:31 -07:00
Jay Foad
1c49ce676c
[AMDGPU] Enable FWD_PROGRESS bit for GFX10+ on PAL (#139895)
Performance testing shows no significant gains or losses on graphics
workloads, so this is mostly to make the behavior consistent across all
supported OSes instead of special-casing HSA.
2025-07-21 17:29:06 +01:00
Alex Voicu
c1fabd681f
[llvm][AMDGPU] Enable FWD_PROGRESS bit for GFX10+ (#128367)
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability, as it is beneficial for algorithms that
rely on more complex concurrent coordination and it is generally
performance neutral otherwise.
2025-03-17 23:17:46 +00:00
Stanislav Mekhanoshin
6c9a9d9fe2
[AMDGPU] Set inst_pref_size to maximum (#126981)
On gfx11 and gfx12, set the initial instruction prefetch size to the
minimum of the kernel size and the maximum allowed value.

Fixes: SWDEV-513122
2025-03-03 10:40:31 -08:00
Stanislav Mekhanoshin
2479479285
[AMDGPU] Extend ComputePGMRSrc3 to gfx10+. NFCI. (#129289)
ComputePGMRSrc3 exists on gfx90a and gfx10+, but the current code
only expects gfx90a. This is NFCI since we do not fill it on
gfx10+ yet.
2025-03-03 08:22:15 -08:00
Stanislav Mekhanoshin
8529bd7b96
[AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (#127142)
2025-02-18 13:19:33 -08:00
Stanislav Mekhanoshin
bc4f05d8a8
[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (#127129)
It does not change the estimate because getInstSizeInBytes() already
returns 0 for meta instructions, but this adds a test and an early bail.
2025-02-18 02:08:28 -08:00
Stanislav Mekhanoshin
d19187f5fe
[AMDGPU] Move into SIProgramInfo and cache getFunctionCodeSize. NFCI. (#127111)
This moves the function as is; improvements to the estimate go into
a subsequent patch.
2025-02-17 18:22:48 -08:00
Janek van Oirschot
17eaa23f7e
[AMDGPU] MCExpr-ify AMDGPU HSAMetadata (#94788)
Enables MCExpr for HSAMetadata, particularly its msgpack format.
2024-06-26 16:39:08 +01:00
Janek van Oirschot
d86b68afd7
MCExpr-ify SIProgramInfo (#88257)
Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.
2024-05-09 13:02:32 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
David Stuttard
fc83f1de5d
[AMDGPU] Add backend support for new PAL ELF Metadata 3.0
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
This replaces opaque register values, which can change between
architectures and require the backend to encode bitfield information
that may itself change between versions.

This is the initial minimal implementation that enables the use of PAL Metadata
3.0.

The change itself should be NFC for non-PAL, although the way the RSRC2 register
is handled has changed slightly.

The test is fairly minimal, but it checks that the metadata format looks as
expected and verifies a couple of special cases, such as tgid_[xyz]_en handling
and PsInputAddr/Ena, which also change to explicit fields.

Differential Revision: https://reviews.llvm.org/D147143
2023-04-14 09:57:13 +01:00
Sebastian Neubauer
1124bf4ab7
[AMDGPU] Set rsrc1 flags for graphics shaders
Previously they were set only for compute kernels and compute shaders,
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00