llvm-project

Author	SHA1	Message	Date
Austin Kerbow	2e5c298281	[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167) Add a prologue to the kernel entry to handle cases where code designed for kernarg preloading is executed on hardware equipped with incompatible firmware. If hardware has compatible firmware the 256 bytes at the start of the kernel entry will be skipped. This skipping is done automatically by hardware that supports the feature. A pass is added which is intended to be run at the very end of the pipeline to avoid any optimizations that would assume the prologue is a real predecessor block to the actual code start. In reality we have two possible entry points for the function. 1. The optimized path that supports kernarg preloading which begins at an offset of 256 bytes. 2. The backwards compatible entry point which starts at offset 0.	2025-01-10 11:39:02 -08:00
Shilei Tian	86734c8577	[NFC][AMDGPU] Remove redundant code in `AMDGPUAsmPrinter.cpp`	2024-11-20 15:08:26 -05:00
Matt Arsenault	5a556d55fb	AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309)	2024-11-18 10:48:56 -08:00
Shilei Tian	6548b6354d	Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)" This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.	2024-11-08 20:21:16 -05:00
Janek van Oirschot	7f60f1312a	[AMDGPU] Fix resource usage information for unnamed functions (#115320) Resource usage information would try to overwrite unnamed functions if there are multiple within the same compilation unit. This aims to either use the `MCSymbol` assigned to the unnamed function (i.e., `CurrentFnSym`), or, rematerialize the `MCSymbol` for the unnamed function.	2024-11-07 18:24:54 +00:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Thomas Symalla	b95d50e5d8	Add and call `AMDGPUMCResourceInfo::reset` method (#110818) When compiling multiple pipelines, the `MCRegisterInfo` instance in `AMDGPUAsmPrinter` gets re-used even after finalization, so it calls `finalize()` multiple times. Add a reset method and call it in `AMDGPUAsmPrinter::doFinalization`. Different approach would be to make it a `unique_ptr`. --------- Co-authored-by: Thomas Symalla <tsymalla@amd.com>	2024-10-02 14:17:01 +02:00
Janek van Oirschot	c897c13dde	[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913) Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for. Fixes https://github.com/llvm/llvm-project/issues/64863	2024-09-30 11:43:34 +01:00
Austin Kerbow	954ab83e6a	[AMDGPU] Include unused preload kernarg in KD total SGPR count (#104743) Unlike with implicitly preloaded data UserSGPRs, firmware is unable to handle cases where SGPRs for kernel arguments contain preloaded data but are not explicitly referenced in the kernel. We need to include these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation to not clobber SGPRs in adjacent waves.	2024-09-23 13:48:22 -07:00
Janek van Oirschot	bfce1aae76	[AMDGPU] MCExpr printing helper with KnownBits support (#95951) Walks over the MCExpr and uses KnownBits to deduce whether an expression is known and if so, prints said known value. Should support the most common MCExpr cases for AMDGPU metadata.	2024-08-15 13:43:13 +01:00
Joseph Huber	f39bd0a24e	[AMDGPU] Do not print `kernel-resource-usage` information on non-kernels (#99720) Summary: This pass is used to get helpful information about the kernel resources without needing to inspect the binary. However, it currently prints on every function. These values will always be zero, so it's just spam on the terminal, at best an indication that a function wasn't internalized / optimized out. This patch makes it only print for kernels to make it more useful in practice.	2024-07-22 07:07:51 -05:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)	2024-07-17 21:11:00 +01:00
Jay Foad	5e338f1f4a	[AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC.	2024-07-17 08:27:35 +01:00
Jay Foad	f10a78b7e4	[AMDGPU] clang-tidy: use std::make_unique. NFC.	2024-07-17 07:58:09 +01:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Janek van Oirschot	17eaa23f7e	[AMDGPU] MCExpr-ify AMDGPU HSAMetadata (#94788) Enables MCExpr for HSAMetadata, particularly, HSAMetadata's msgpack format.	2024-06-26 16:39:08 +01:00
Ivan Kosarev	13ed349c44	[AMDGPU][NFC] Rename AMDGPUVariadicMCExpr to AMDGPUMCExpr. (#96618) Some of our custom expressions are not variadic and there seems to be little benefit in mentioning the variadic nature of expression nodes in the name anyway.	2024-06-25 15:32:09 +01:00
Nicolai Hähnle	7e9b49f6b8	AMDGPU: Add plumbing for private segment size argument (#96445) The actual size of scratch/private is determined at dispatch time, so add more plumbing to request it. Will be used in subsequent change.	2024-06-25 16:20:51 +02:00
Janek van Oirschot	3d1705d00c	MCExpr-ify AMDGPU PALMetadata (#93236) Allows MCExprs as passed values to PALMetadata. Also adds related `DelayedMCExpr` classes which serve as a pseudo-fixup to resolve MCExprs as late as possible (i.e., right before emit through string or blob, where they should be resolvable).	2024-06-13 13:59:31 +01:00
Jie Fu	a1c29df572	[AMDGPU] Fix -Wunused-variable in AMDGPUAsmPrinter.cpp (NFC) /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:653:13: error: unused variable 'PGMRSrc3' [-Werror,-Wunused-variable] int64_t PGMRSrc3; ^ 1 error generated.	2024-06-10 20:35:09 +08:00
Janek van Oirschot	bc022b406d	[AMDGPU] Support SIProgramInfo MCExpr for comments and remarks (#94350) Eliminates assumption that MCExpr comments/remarks being emitted are always resolvable	2024-06-10 12:34:06 +01:00
Janek van Oirschot	a699ccbf0c	MCExpr-ify amd_kernel_code_t (#91587) Redefines the amd_kernel_code_t struct with MCExprs for members that would be derived from SIProgramInfo MCExpr members.	2024-05-22 13:45:45 +01:00
Fangrui Song	9500a5d02e	[MC] Make UseAssemblerInfoForParsing mostly true Commit 6c0665e22174d474050e85ca367424f6e02476be (https://reviews.llvm.org/D45164) enabled certain constant expression evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives, see llvm/test/MC/AsmParser/assembler-expressions.s). `getUseAssemblerInfoForParsing` was added to make `clang -c` handling inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`), where such expression folding (related to `AttemptToFoldSymbolOffsetDifference`) is unavailable. I believe this is overly conservative. We can make some parse-time expression folding work for `clang -c` even if `clang -S` would still report an error, a MCAsmStreamer issue (we cannot print `.if` directives) that should not restrict the functionality of MCObjectStreamer. ``` % cat b.cc asm(R"( .pushsection .text,"ax" .globl _start; _start: ret .if . -_start == 1 ret .endif .popsection )"); % gcc -S b.cc && gcc -c b.cc % clang -S -fno-integrated-as b.cc # succeeded % clang -c b.cc # succeeded with this patch % clang -S b.cc # still failed <inline asm>:4:5: error: expected absolute expression 4 | .if . -_start == 1 | ^ 1 error generated. ``` However, removing `getUseAssemblerInfoForParsing` would make MCDwarfFrameEmitter::Emit (for .eh_frame FDE) slow (~4% compile time regression for sqlite3.c amalgamation) due to expensive `AttemptToFoldSymbolOffsetDifference`. For now, make `UseAssemblerInfoForParsing` false in MCDwarfFrameEmitter::Emit. Close #62520 Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841 Pull Request: https://github.com/llvm/llvm-project/pull/91082	2024-05-19 23:35:15 -07:00
Nikita Popov	fa750f09be	Revert "[MC] Remove UseAssemblerInfoForParsing" This reverts commit 03c53c69a367008da689f0d2940e2197eb4a955c. This causes very large compile-time regressions in some cases, e.g. sqlite3 at O0 regresses by 5%.	2024-05-16 09:56:07 +09:00
Fangrui Song	03c53c69a3	[MC] Remove UseAssemblerInfoForParsing Commit 6c0665e22174d474050e85ca367424f6e02476be (https://reviews.llvm.org/D45164) enabled certain constant expression evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives, see llvm/test/MC/AsmParser/assembler-expressions.s). `getUseAssemblerInfoForParsing` was added to make `clang -c` handling inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`), where such expression folding (related to `AttemptToFoldSymbolOffsetDifference`) is unavailable. I believe this is overly conservative. We can make some parse-time expression folding work for `clang -c` even if `clang -S` would still report an error, a MCAsmStreamer issue (we cannot print `.if` directives) that should not restrict the functionality of MCObjectStreamer. ``` % cat b.cc asm(R"( .pushsection .text,"ax" .globl _start; _start: ret .if . -_start == 1 ret .endif .popsection )"); % gcc -S b.cc && gcc -c b.cc % clang -S -fno-integrated-as b.cc # succeeded % clang -c b.cc # succeeded with this patch % clang -S b.cc # still failed <inline asm>:4:5: error: expected absolute expression 4 | .if . -_start == 1 | ^ 1 error generated. ``` Close #62520 Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841 Pull Request: https://github.com/llvm/llvm-project/pull/91082	2024-05-15 09:18:39 -07:00
Jie Fu	c2a87d7e03	[AMDGPU] Remove unused lambda capture (NFC) /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:733:30: error: lambda capture 'Ctx' is not used [-Werror,-Wunused-lambda-capture] auto TryGetMCExprValue = [&Ctx](const MCExpr *Value, uint64_t &Res) -> bool { ~^~~ 1 error generated.	2024-05-09 20:15:41 +08:00
Janek van Oirschot	d86b68afd7	MCExpr-ify SIProgramInfo (#88257) Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.	2024-05-09 13:02:32 +01:00
Kazu Hirata	c18bcd0a57	[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 38 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-05 13:43:10 -07:00
Carl Ritson	44648ccb8b	[AMDGPU] Always emit lds_size in PAL ELF Metadata 3.0 (#87222) Emit lds_size for all shader types in PAL metadata.	2024-05-03 17:01:03 +09:00
Janek van Oirschot	1103a2a337	Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr. Relands #80855 with fixes	2024-03-27 11:59:56 +00:00
Janek van Oirschot	797336b127	Revert "[AMDGPU] MCExpr-ify MC layer kernel descriptor" (#86151) Reverts llvm/llvm-project#80855	2024-03-21 10:19:54 -07:00
Janek van Oirschot	857161c367	[AMDGPU] MCExpr-ify MC layer kernel descriptor (#80855) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.	2024-03-21 13:57:10 +00:00
Carl Ritson	c29b265eb9	Reapply "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)" This reverts commit 7d508eb5d38f4bbbab4230a666d9e742e271af61.	2024-03-14 10:56:43 +09:00
Diana Picus	0086cc95b3	[AMDGPU] Rename getNumVGPRBlocks. NFC (#84161) Rename getNumVGPRBlocks to getEncodedNumVGPRBlocks, to clarify that it's using the encoding granule. This is used to program the hardware. In practice, the hardware will use the alloc granule instead, so this patch also adds a new helper, getAllocatedNumVGPRBlocks, which can be useful when driving heuristics.	2024-03-07 12:46:42 +01:00
Austin Kerbow	4bcbeaed63	[AMDGPU] Enable kernel arg preloading with gfx90a (#81180) Add a trap instruction to the beginning of the kernel prologue to handle cases where preloading is attempted on HW loaded with incompatible firmware.	2024-02-12 22:33:29 -08:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things, device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
Carl Ritson	7d508eb5d3	Revert "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)" This reverts commit d6c7253d32e4bdff619c39708170f1c1fa01ff95. Change causing CTS failures due to incomplete metadata.	2024-02-07 17:09:56 +09:00
David Stuttard	d6c7253d32	[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104) PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend. The previous approach used opaque registers which can change between different architectures and required encoding the bitfield information in the backend, which may change between versions. This change is an extension of the previously added support, which only handled entry functions. This adds support for all functions. The change also includes some re-factoring to separate common code.	2024-02-06 15:34:36 +00:00
Pierre van Houtryve	500846d2f5	[AMDGPU] Introduce Code Object V6 (#76954) Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs changes are part of the follow-up patch #76955	2024-02-05 08:19:53 +01:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Jay Foad	42b9ea841e	[AMDGPU] Increase max scratch allocation for GFX12 (#77625)	2024-01-17 10:25:28 +00:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Pierre van Houtryve	ecd2f56a80	[AMDGPU] Warn if 'amdgpu-waves-per-eu' target occupancy was not met (#74055) This should make it a bit harder to miss this type of issue. The warning only shows if amdgpu-waves-per-eu is used. See SWDEV-434482	2023-12-06 10:46:46 +01:00
Pierre van Houtryve	4428b01faa	Reland: [AMDGPU] Remove Code Object V3 (#67118) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-11-07 12:23:03 +01:00
Jay Foad	521ac12a25	[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407) The special handling for blocks ending with a long branch has been unnecessary since D106445: "[amdgpu] Add 64-bit PC support when expanding unconditional branches."	2023-11-06 16:29:52 +00:00
pvanhout	868abf0961	Revert "[AMDGPU] Remove Code Object V3 (#67118)" This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.	2023-10-18 12:55:36 +02:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Yashwant Singh	7ac532efc8	[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091) Use this flag to give more context to implicit def comments in assembly. Reviewed on phabricator: https://reviews.llvm.org/D153754	2023-09-29 11:15:01 +05:30
Austin Kerbow	0455596e1e	[AMDGPU] Add DAG ISel support for preloaded kernel arguments This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch. Preloading here begins from the start of the kernel arguments until the amount of arguments indicated by the CL flag amdgpu-kernarg-preload-count. Aggregates and arguments passed by-ref are not supported. Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D158579	2023-09-25 09:32:59 -07:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00

355 Commits