llvm-project

Author	SHA1	Message	Date
Austin Kerbow	4bcbeaed63	[AMDGPU] Enable kernel arg preloading with gfx90a (#81180 ) Add a trap instruction to the beginning of the kernel prologue to handle cases where preloading is attempted on HW loaded with incompatible firmware.	2024-02-12 22:33:29 -08:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPUs, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things; device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
Carl Ritson	7d508eb5d3	Revert "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104 )" This reverts commit d6c7253d32e4bdff619c39708170f1c1fa01ff95. Change causing CTS failures due to incomplete metadata.	2024-02-07 17:09:56 +09:00
David Stuttard	d6c7253d32	[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104 ) PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend. The previous approach used opaque registers which can change between different architectures and required encoding the bitfield information in the backend, which may change between versions. This change is an extension of the previously added support - which only handled entry functions. This adds support for all functions. The change also includes some re-factoring to separate common code.	2024-02-06 15:34:36 +00:00
Pierre van Houtryve	500846d2f5	[AMDGPU] Introduce Code Object V6 (#76954 ) Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs changes are part of the follow-up patch #76955	2024-02-05 08:19:53 +01:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Jay Foad	42b9ea841e	[AMDGPU] Increase max scratch allocation for GFX12 (#77625 )	2024-01-17 10:25:28 +00:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Pierre van Houtryve	ecd2f56a80	[AMDGPU] Warn if 'amdgpu-waves-per-eu' target occupancy was not met (#74055 ) This should make it a bit harder to miss this type of issue. The warning only shows if amdgpu-waves-per-eu is used. See SWDEV-434482	2023-12-06 10:46:46 +01:00
Pierre van Houtryve	4428b01faa	Reland: [AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-11-07 12:23:03 +01:00
Jay Foad	521ac12a25	[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407 ) The special handling for blocks ending with a long branch has been unnecessary since D106445: "[amdgpu] Add 64-bit PC support when expanding unconditional branches."	2023-11-06 16:29:52 +00:00
pvanhout	868abf0961	Revert "[AMDGPU] Remove Code Object V3 (#67118 )" This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.	2023-10-18 12:55:36 +02:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Yashwant Singh	7ac532efc8	[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091 ) Use this flag to give more context to implicit def comments in assembly. Reviewed on phabricator: https://reviews.llvm.org/D153754	2023-09-29 11:15:01 +05:30
Austin Kerbow	0455596e1e	[AMDGPU] Add DAG ISel support for preloaded kernel arguments This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch. Preloading here begins from the start of the kernel arguments until the amount of arguments indicated by the CL flag amdgpu-kernarg-preload-count. Aggregates and arguments passed by-ref are not supported. Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D158579	2023-09-25 09:32:59 -07:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715 ) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00
Austin Kerbow	343be5132e	[AMDGPU] Add utilities to track number of user SGPRs. NFC. Factor out and unify some common code that calculates and tracks the number of user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159439	2023-09-12 08:52:30 -07:00
Reid Kleckner	f86c81b2a8	[AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc This required two substantial changes: 1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils and into CodeGen 2. Passing the string function name to AMDGPUPALMetadata instead of the MachineFunction Other changes are minor or updates to accommodate the first two. See issue #64166 for more information on the layering issue. Differential Revision: https://reviews.llvm.org/D156486	2023-07-27 15:19:24 -07:00
Corbin Robeck	7a4968b5a3	[AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel metadata to identify kernels which have compile-time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156040 Author: Corbin Robeck <corbin.robeck@amd.com>	2023-07-25 12:20:13 -07:00
Tom Stellard	4b36b2c23c	[Support] Use C++11 attribute syntax for visibility attributes The gnu extension __attribute syntax cannot be mixed with the C++11 alignas specifier, so in order to use visibility attributes on classes that also use alignas, we need to use the C++11 standard syntax. Also fix a few warnings introduced by this change. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D152043	2023-07-06 10:30:56 -07:00
Ivan Kosarev	ee165cdb1b	[AMDGPU] Eliminate SIMCCodeEmitter and de-virtualise encoding methods. Simplifies some future changes needed for <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154337	2023-07-05 10:13:33 +01:00
David Stuttard	90431ca2e0	Reland [AMDGPU] New PAL metadata updates to ps_extra_lds_size and float_mode New metadata format contains full calculation of field contents for ps_extra_lds_size (vs old format where the value in RSRC register is used by PAL to calculate the value required). Also stop updating float_mode and rely on front end settings for this field. Differential Revision: https://reviews.llvm.org/D152247	2023-06-09 12:34:00 +01:00
David Stuttard	b2817d22bb	Revert "[AMDGPU] New PAL metadata updates to ps_extra_lds_size and float_mode" This reverts commit 6d5a653dda628250b373ec89e0e11cdd27603c24.	2023-06-06 13:02:28 +01:00
David Stuttard	6d5a653dda	[AMDGPU] New PAL metadata updates to ps_extra_lds_size and float_mode New metadata format contains full calculation of field contents for ps_extra_lds_size (vs old format where the value in RSRC register is used by PAL to calculate the value required). Also stop updating float_mode and rely on front end settings for this field. Differential Revision: https://reviews.llvm.org/D152247	2023-06-06 12:18:21 +01:00
David Stuttard	4d54565436	[AMDGPU] Remove unnecessary assert Also remove the function attributes from the test. For PAL based shaders this isn't required. Differential Revision: https://reviews.llvm.org/D148625	2023-04-18 13:41:38 +01:00
David Stuttard	fc83f1de5d	[AMDGPU] Add backend support for new PAL ELF Metadata 3.0 PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend, rather than using opaque registers which can change between different architectures and require encoding the bitfield information in the backend, which may change between versions. This is the initial minimal implementation that enables the use of PAL Metadata 3.0. The change itself should be NFC for non-PAL, although the way RSRC2 register is handled has been changed slightly. The test is fairly minimal, but checks that the metadata format looks as expected and verifies a couple of special cases such as tgid_[xyz]_en handling and PsInputAddr/Ena which also change to explicit fields. Differential Revision: https://reviews.llvm.org/D147143	2023-04-14 09:57:13 +01:00
Austin Kerbow	864a2b25be	[AMDGPU] Reserve extra SGPR blocks with XNACK "any" TID Setting ASMPrinter was relying on feature bits to set up extra SGPRs in the kernel descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID setting which could cause user SGPRs to be clobbered if the number of SGPRs reserved was near a granulated block boundary. When XNACK was enabled this worked correctly in the ASMParser which meant some kernels were only failing without "-save-temps". Fixes: SWDEV-382764 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D145401	2023-03-17 20:26:23 -07:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level follow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need another approach (directive, for example) to check the code object version. That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Kazu Hirata	a5cd202e21	Use llvm::Log2_32 and llvm::Log2_64 instead of llvm::findLastSet (NFC) For a nonzero argument, llvm::findLastSet(x) is equivalent to llvm::Log2_32(x) or llvm::Log2_64(x). None of the calls to llvm::findLastSet in this patch relies on llvm::findLastSet's ability to return std::numeric_limits<T>::max() on input 0.	2023-01-25 21:34:09 -08:00
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
Matt Arsenault	8dfe60c356	AMDGPU: Set scratch_en if there is dynamic stack but no fixed stack	2023-01-04 20:51:18 -05:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Guillaume Chatelet	702126aec5	[NFC] Add helper method to ensure min alignment on MCSection Follow up on D138653. Differential Revision: https://reviews.llvm.org/D138686	2022-11-28 10:00:34 +00:00
Guillaume Chatelet	6c09ea3fdd	[Alignment][NFC] Use Align in MCStreamer::emitValueToAlignment Differential Revision: https://reviews.llvm.org/D138674	2022-11-24 16:09:44 +00:00
Guillaume Chatelet	e647b4f519	[reland][Alignment][NFC] Use the Align type in MCSection Differential Revision: https://reviews.llvm.org/D138653	2022-11-24 13:19:18 +00:00
Guillaume Chatelet	3467f9c7d6	Revert "D138653 [Alignment][NFC] Use the Align type in MCSection" This breaks the bolt project. This reverts commit 409f0dc4a420db1c6b259d5ae965a070c169d930.	2022-11-24 12:42:30 +00:00
Guillaume Chatelet	409f0dc4a4	[Alignment][NFC] Use the Align type in MCSection Differential Revision: https://reviews.llvm.org/D138653	2022-11-24 12:32:58 +00:00
Matt Arsenault	7d568cdc9d	AMDGPU: Register a null MC streamer for -emit-codegen-only For some reason null is a valid MC target, used from clang with -emit-codegen-only. Previously the target streamer was null, which was inconsistently null checked resulting in crashes if using amdhsa.	2022-10-28 16:39:09 -07:00
Abinav Puthan Purayil	3d9f011a9c	[AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later Unfortunately, we have a broken handling of this in the runtime of rocm 5.3. The runtime is expected to handle this correctly when v5 becomes the default. Differential Revision: https://reviews.llvm.org/D134714	2022-10-11 23:28:43 +05:30
raghavmedicherla	57f01fee1e	[AMDGPU/Metadata] Rename HSAMD::MetadataStreamer classes Renamed all HSAMD::MetadataStreamer classes to improve readability of the code. Differential Revision: https://reviews.llvm.org/D133156	2022-09-06 16:46:37 -04:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Abinav Puthan Purayil	d96361d714	[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map This change introduces the dynamic stack boolean field to code-object-v3 and above under the code properties of the kernel descriptor and under the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to the is_dynamic_callstack field of amd_kernel_code_t. Differential Revision: https://reviews.llvm.org/D128344	2022-07-18 10:07:13 +05:30
Vang Thao	67357739c6	[AMDGPU] Add remarks to output some resource usage Add analysis remarks to output kernel name, register usage, occupancy, scratch usage, spills, and LDS information. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123878	2022-07-15 11:01:53 -07:00
Matt Arsenault	0bdaef38c9	AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs The total user+system SGPR count needs to be padded out to 16 if fewer inputs are enabled.	2022-06-29 14:52:19 -04:00
Jay Foad	929a8ad2b6	[AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11 The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed in GFX11. It is now in units of 256 dwords instead of 128 dwords. COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of 128 dwords. Differential Revision: https://reviews.llvm.org/D128179	2022-06-21 14:48:12 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Fangrui Song	adf4142f76	[MC] De-capitalize SwitchSection. NFC Add SwitchSection to return switchSection. The API will be removed soon.	2022-06-10 22:50:55 -07:00

321 Commits