llvm-project

Author	SHA1	Message	Date
Austin Kerbow	b1d42465fc	[AMDGPU] Fix hidden kernarg preload count inconsistency (#116759 ) It is possible that the number of hidden arguments that are selected to be preloaded in AMDGPULowerKernel arguments and isel can differ. This isn't an issue with explicit arguments since isel can lower the argument correctly either way, but with hidden arguments we may have alignment issues if we try to load these hidden arguments that were added to the kernel signature. The reason for the mismatch is that isel reserves an extra synthetic user SGPR for module LDS. Instead of teaching lowerFormalArguments how to handle these properly it makes more sense and is less expensive to fix the mismatch and assert if we ever run into this issue again. We should never be trying to lower these in the normal way. In a future change we probably want to revise how we track "synthetic" user SGPRs and unify the handling in GCNUserSGPRUsageInfo. Sometimes synthetic SGPRSs are considered user SGPRs and sometimes they are not. Until then this patch resolves the inconsistency, fixes the bug, and is otherwise a NFC.	2024-12-08 10:10:08 -08:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Shilei Tian	6548b6354d	Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403 )" This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.	2024-11-08 20:21:16 -05:00
Shilei Tian	ca33649abe	Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403 )" This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.	2024-11-08 16:36:35 -05:00
Shilei Tian	e215a1e27d	[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403 )	2024-11-08 13:05:35 -05:00
Jay Foad	8ee5e19c87	[AMDGPU] Fix @llvm.amdgcn.cs.chain with SGPR args not provably uniform (#114232 ) The correct behaviour is to insert a readfirstlane. SelectionDAG was already doing this in some cases, but not in the general case for chain calls. GlobalISel was already doing this for return values but not for arguments.	2024-10-30 16:12:37 +00:00
Matt Arsenault	428ae0f12e	AMDGPU: Do not tail call if an inreg argument requires waterfalling (#111002 ) If we have a divergent value passed to an outgoing inreg argument, the call needs to be executed in a waterfall loop and thus cannot be tail called. The waterfall handling of arbitrary calls is broken on the selectiondag path, so some of these cases still hit an error later. I also noticed the argument evaluation code in isEligibleForTailCallOptimization is not correctly accounting for implicit argument assignments. It also seems inreg codegen is generally broken; we are assigning arguments to the reserved private resource descriptor.	2024-10-04 00:04:02 +04:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Kazu Hirata	3b9f183958	[AMDGPU] Use llvm::any_of, llvm::all_of, and llvm::none_of (NFC) (#103007 )	2024-08-13 00:07:54 -07:00
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919 ) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Thomas Symalla	256343a0e9	Revert "Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 " (#86273 ) Reverts llvm/llvm-project#81394 This reverts commit 3ac243bc0d7922d083af2cf025247b5698556062. It is not handling RSrc registers s0-s3 correctly. This leads to a broken test, where it expects s0-s3 as function argument and uses it as RSrc register as well. We need to re-visit the patch, but apparently we only want to have s0-s3 as argument registers if we don't need them as RSrc registers.	2024-03-26 11:01:08 +01:00
SahilPatidar	3ac243bc0d	Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 (#81394 ) Resolve #78226	2024-03-21 16:52:08 +05:30
Sameer Sahasrabuddhe	ec34699f75	[GlobalISel] convergence control tokens and intrinsics (#67006 ) [GlobalISel] Implement convergence control tokens and intrinsics in GMIR In the IR translator, convert the LLVM token type to LLT::token(), which is an alias for the s0 type. These show up as implicit uses on convergent operations. Differential Revision: https://reviews.llvm.org/D158147	2024-03-18 10:34:11 +05:30
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Matt Arsenault	af4f1766ae	AMDGPU: Allocate special SGPRs before user SGPR arguments (#78234 )	2024-01-17 21:41:50 +07:00
Matt Arsenault	480cc413b7	AMDGPU/GlobalISel: Handle inreg arguments as SGPRs (#78123 ) This is the missing GISel part of 54470176afe20b16e6b026ab989591d1d19ad2b7	2024-01-16 15:13:31 +07:00
Diana	7f5d59b38d	[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186 ) The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call parameters are bundled up into 2 intrinsic arguments, one for those that should go in the SGPRs (the 3rd intrinsic argument), and one for those that should go in the VGPRs (the 4th intrinsic argument). Both will often be some kind of aggregate. Both instruction selection frameworks have some internal representation for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel, ISD::INTRINSIC_[VOID\|WITH_CHAIN] for DAGISel), but we can't use those because aggregates are dissolved very early on during ISel and we'd lose the inreg information. Therefore, this patch shortcircuits both the IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call from the very start. It tries to use the existing infrastructure as much as possible, by calling into the code for lowering tail calls. This has already gone through a few rounds of review in Phab: Differential Revision: https://reviews.llvm.org/D153761	2023-11-06 12:30:07 +01:00
Craig Topper	2f4328e697	[GISel] Make assignValueToReg take CCValAssign by const reference. (#70086 ) This was previously passed by value. It used to be passed by non-const reference, but it was changed to value in D110610. I'm not sure why.	2023-10-24 15:47:04 -07:00
Craig Topper	9f592cbc18	[GISel] Pass MPO and VA to assignValueToAddress by const reference. NFC (#69810 ) Previously they were passed by non-const reference. No in tree target modifies the values. This makes it possible to call assignValueToAddress from assignCustomValue without a const_cast. For example in this patch https://github.com/llvm/llvm-project/pull/69138.	2023-10-24 09:58:22 -07:00
Austin Kerbow	343be5132e	[AMDGPU] Add utilities to track number of user SGPRs. NFC. Factor out and unify some common code that calculates and tracks the number of user SGRPs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159439	2023-09-12 08:52:30 -07:00
Sameer Sahasrabuddhe	ef38e6d97f	[GlobalISel] introduce MIFlag::NoConvergent Some opcodes in MIR are defined to be convergent by the target by setting IsConvergent in the corresponding TD file. For example, in AMDGPU, the opcodes G_SI_CALL and G_INTRINSIC* are marked as convergent. But this is too conservative, since calls to functions that do not execute convergent operations should not be marked convergent. This information is available in LLVM IR. The new flag MIFlag::NoConvergent now allows the IR translator to mark an instruction as not performing any convergent operations. It is relevant only on occurrences of opcodes that are marked isConvergent in the target. Differential Revision: https://reviews.llvm.org/D157475	2023-08-20 21:14:46 +05:30
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Jon Chesterfield	74e928a081	[amdgpu][lds] Remove recalculation of LDS frame from backend Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend. Prior to this patch: The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol metadata so that the assembler can build lookup tables out of it. There is a fragile association between kernel functions and named structs which is used to recompute the frame layout in the backend, with fatal_errors catching inconsistencies in the second calculation. After this patch: The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same absolute_symbol metadata that the assembler uses to place objects within that frame size. Deleted the now dead allocation code from the backend. Left for a later cleanup: - enabling lowering for anonymous functions - removing the elide-module-lds attribute (test churn, it's not used by llc any more) - adjusting the dynamic alignment check to not use symbol names Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155190	2023-07-13 23:54:38 +01:00
Matt Arsenault	3d0350b762	AMDGPU: Add MF independent version of getImplicitParameterOffset	2023-06-07 08:26:31 -04:00
Sergei Barannikov	01a7967447	[CodeGen] Replace CCState's getNextStackOffset with getStackSize (NFC) The term "next stack offset" is misleading because the next argument is not necessarily allocated at this offset due to alignment constrains. It also does not make much sense when allocating arguments at negative offsets (introduced in a follow-up patch), because the returned offset would be past the end of the next argument. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D149566	2023-05-17 21:51:45 +03:00
Changpeng Fang	1ab8b9ae15	AMDGPU: Define sub-class of SGPR_64 for tail call return Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold the tail call return address. Because GFX and C calling conventions have different CSR, we need to define the sub-class separately. This work is an extension of D147096 with the consideration of GFX calling convention. Based on the calling conventions, different instructions will be selected with different sub-class of SGPR_64 as the input. Reviewers: arsenm, cdevadas and sebastian-ne Differential Revision: https://reviews.llvm.org/D148824	2023-04-27 10:45:11 -07:00
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level folow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code object version, That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Kazu Hirata	83d56fb17a	Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC) This patch drops the ZeroBehavior parameter from bit counting functions like countLeadingZeros. ZeroBehavior specifies the behavior when the input to count{Leading,Trailing}Zeros is zero and when the input to count{Leading,Trailing}Ones is all ones. ZeroBehavior was first introduced on May 24, 2013 in commit eb91eac9fb866ab1243366d2e238b9961895612d. While that patch did not state the intention, I would guess ZeroBehavior was for performance reasons. The x86 machines around that time required a conditional branch to implement countLeadingZero<uint32_t> that returns the 32 on zero: test edi, edi je .LBB0_2 bsr eax, edi xor eax, 31 .LBB1_2: mov eax, 32 That is, we can remove the conditional branch if we don't care about the behavior on zero. IIUC, Intel's Haswell architecture, launched on June 4, 2013, introduced several bit manipulation instructions, including lzcnt and tzcnt, which eliminated the need for the conditional branch. I think it's time to retire ZeroBehavior as its utility is very limited. If you care about compilation speed, you should build LLVM with an appropriate -march= to take advantage of lzcnt and tzcnt. Even if not, modern host compilers should be able to optimize away quite a few conditional branches because the input is often known to be nonzero from dominating conditional branches. Differential Revision: https://reviews.llvm.org/D141798	2023-01-18 19:58:44 -08:00
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
Matt Arsenault	b5edd522d1	AMDGPU/GlobalISel: Do not create readfirstlane with non-s32 type We should probably handle any 32-bit type here, but the intrinsic definition and selection pattern currently do not. Avoids a few lit tests failures when switched on by default.	2022-12-15 21:44:07 -05:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Jon Chesterfield	80ba432821	[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic address by allocating it at the same time as the module scope instance. This is relatively likely to be where the instance was allocated anyway (~NFC) but will allow later patches to calculate where a given field can be found, which means a function which is only reachable from a single kernel will be able to access a LDS variable with zero overhead. That will be particularly helpful for applications that instantiate a function template containing LDS variables once per kernel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127052	2022-09-28 14:55:16 +01:00
Kazu Hirata	0387da6f4f	Use value instead of getValue (NFC)	2022-07-19 21:18:26 -07:00
Kazu Hirata	41ae78ea3a	Use has_value instead of hasValue (NFC)	2022-07-19 20:15:44 -07:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
Guillaume Chatelet	d154d0ac06	[NFC] Simplify code	2022-06-20 15:15:52 +00:00
Jon Chesterfield	bc78c09952	[amdgpu] Elide module lds allocation in kernels with no callees Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block as before if the attribute is missing or has its default value of true. Patch uses the new attribute to detect the simplest possible instance of this, where a kernel makes no calls and thus cannot call any functions that use LDS. Tests updated to match, coverage was already good. Interesting cases is in lower-module-lds-offsets where annotating the kernel allows the backend to pick a different (in this case better) variable ordering than previously. A later patch will avoid moving kernel variables into module.lds when the kernel can have this attribute, allowing optimal ordering and locally unused variable elimination. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122091	2022-05-04 22:42:07 +01:00
Jon Chesterfield	bcbd4cf1f2	Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.	2022-03-20 00:58:26 +00:00
Jon Chesterfield	98e474c1b3	[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal	2022-03-19 16:42:17 +00:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Changpeng Fang	0f20a35b9e	AMDGPU: Set up User SGPRs for queue_ptr only when necessary Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend. Reviewers: sameerds, arsenm Fixes: SWDEV-307189 Differential Revision: https://reviews.llvm.org/D119762	2022-03-09 10:14:05 -08:00
Venkata Ramanaiah Nalamothu	04fff547e2	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652	2022-03-09 12:18:02 +05:30
Matt Arsenault	f482e86980	AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders I don't think this is actually defined for mesa, but this is what we were doing on the DAG path.	2022-01-27 10:20:52 -05:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit a97e20a3a8a58be751f023e610758310d5664562.	2022-01-24 09:26:52 -05:00

1 2 3 4

174 Commits