llvm-project

Author	SHA1	Message	Date
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905) The previous name, 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
pvanhout	89e91e4c0c	[AMDGPU] Remove post-PromoteAlloca SROA run PromoteAlloca now uses SSAUpdater, so it doesn't need SROA to clean up after it anymore. Internal testing shows no noticeable performance impact. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D156398	2023-08-11 08:29:21 +02:00
Corbin Robeck	7a4968b5a3	[AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel metadata to identify kernels whose stack usage cannot be determined at compile time (mainly indirect function calls and recursion). This patch adds this information to the output of the kernel-resource-usage remarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156040 Author: Corbin Robeck <corbin.robeck@amd.com>	2023-07-25 12:20:13 -07:00
Austin Kerbow	864a2b25be	[AMDGPU] Reserve extra SGPR blocks with XNACK "any" TID Setting ASMPrinter was relying on feature bits to set up extra SGPRs in the kernel descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID setting, which could cause user SGPRs to be clobbered if the number of SGPRs reserved was near a granulated block boundary. When XNACK was enabled this worked correctly in the ASMParser, which meant some kernels were only failing without "-save-temps". Fixes: SWDEV-382764 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D145401	2023-03-17 20:26:23 -07:00
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
Nikita Popov	bdf2fbba9c	[AMDGPU] Convert some tests to opaque pointers (NFC)	2022-12-19 12:41:13 +01:00
Vang Thao	67357739c6	[AMDGPU] Add remarks to output some resource usage Add analysis remarks to output kernel name, register usage, occupancy, scratch usage, spills, and LDS information. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123878	2022-07-15 11:01:53 -07:00

7 Commits