llvm-project

Author	SHA1	Message	Date
Christudasan Devadasan	7a72a93580	[AMDGPU] Preserve only the inactive lanes of scratch vgprs In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can potentially have its inactive lanes overwritten by the writelane instructions. When the function returns, it can cause unexpected behavior if the VGPR value is not preserved appropriately. The current scheme to preserve the inactive lanes of such scratch VGPRs is not correct. It preserves all lanes, which causes the outgoing values (if any) to be overwritten by the epilog restores and thereby corrupts the return value. To avoid this situation with scratch VGPRs, this patch ensures we preserve only their inactive lanes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134526	2022-12-17 11:51:43 +05:30
Christudasan Devadasan	20a940f1e2	[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores There is a lot of customization and eventually code duplication in the frame lowering that handles special SGPR spills like the one needed for the Frame Pointer. Incorporating any additional SGPR spill during PEI is currently difficult. This patch introduces a new spill builder to efficiently handle such spill requirements. Various spill methods are specially handled using a separate class. Reviewed By: sebastian-ne, scott.linder Differential Revision: https://reviews.llvm.org/D132436	2022-12-17 11:50:25 +05:30
Christudasan Devadasan	b25b4c0ab4	[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a common function used in both places to find the free VGPR lane. This patch eliminates that dependency to find the free VGPR by handling it separately for PEI. It is a prerequisite patch for a future work to allow SGPR spills to virtual VGPR lanes during SILowerSGPRSpills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124195	2022-12-17 11:49:41 +05:30
Christudasan Devadasan	af5e5c40ff	[AMDGPU] Add WWM reserved VGPRs to WWMSpills The custom VGPR spills inserted during frame lowering maintain a separate list for WWM reserved registers. Added them into WWMSpills that already tracks such reserved registers. It unifies the spill insertion. Reviewed By: nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D124193	2022-12-17 11:47:58 +05:30
Christudasan Devadasan	5692a7e84e	[AMDGPU] Callee must always spill writelane VGPRs Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if it was marked Caller-saved. Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D124192	2022-12-17 11:11:42 +05:30
Jeffrey Byrnes	4d2faf043b	[AMDGPU][SIFrameLowering] Mark VGPR used for AGPR spills as reserved Presently, there is an issue on MI100 (and probably other architectures) where the VGPR used for AGPR copies clobbers the VGPR used for AGPR spills. AFAICT this is because in processFunctionBeforeFrameIndicesReplaced we think the VGPR register for AGPR spill is unused. This patch aims to correct that. This is a WIP while I work out issues with producing a good test. For now, I'm curious if this is generally a good / bad idea. Differential Revision: https://reviews.llvm.org/D139673	2022-12-16 12:00:51 -08:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Krzysztof Parzyszek	c589730ad5	[YAML] Convert Optional to std::optional	2022-12-06 12:49:32 -08:00
Fangrui Song	4b1b9e22b3	Remove unused #include "llvm/ADT/Optional.h"	2022-12-05 04:21:08 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Nicolai Hähnle	b7f44f7cf9	AMDGPU: Remove ImagePSV and move images to addrspace 7 Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU: Remove BufferPseudoSourceValue") It is unclear what exactly the right address space for images should be. They seem morally closest to buffers, so that's what I went with. In practical terms, address space 7 is better than address space 0 because it can't alias with LDS. Differential Revision: https://reviews.llvm.org/D138949	2022-11-30 11:32:34 +01:00
Nicolai Hähnle	43b86bf992	AMDGPU: Remove BufferPseudoSourceValue The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is clearly not true. Instead, build MachineMemOperands without a pointer value but with an address space, so that address space-based alias analysis can still work. There is a lot of test churn because previously address space 4 (constant address space) was used as an address space for buffer intrinsics. This doesn't make much sense and seems to have been an accident -- see the change in AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind. Differential Revision: https://reviews.llvm.org/D138711	2022-11-29 22:15:11 +01:00
Matt Arsenault	ffdbbd112c	AMDGPU: Directly pass Function to mayUseAGPRs This was taking the MachineFunction, but only inspecting the underlying IR.	2022-11-02 10:48:51 -07:00
Stanislav Mekhanoshin	5a3fe9a039	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-28 13:13:40 -07:00
Vitaly Buka	20a80d60a8	Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI" Break msan bots. Details in D134666. This reverts commit 0ce96e06ee0226938e723bd0c8e16e3d2d51f203.	2022-09-26 22:22:09 -07:00
Stanislav Mekhanoshin	0ce96e06ee	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-26 13:20:24 -07:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use in lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsics already, so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated, and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel (in this implementation, based on metadata associated with that kernel), and which is then passed on to indirect call sites, is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
Guillaume Chatelet	cef65864af	[Alignment] Use Align for MaxKernArgAlign Differential Revision: https://reviews.llvm.org/D128118	2022-06-22 13:40:37 +00:00
Matt Arsenault	cc5a1b3dd9	llvm-reduce: Add cloning of target MachineFunctionInfo MIR support is totally unusable for AMDGPU without this, since the set of reserved registers is set from fields here. Add a clone method to MachineFunctionInfo. This is a subtle variant of the copy constructor that is required if there are any MIR constructs that use pointers. Specifically, at minimum fields that reference MachineBasicBlocks or the MachineFunction need to be adjusted to the values in the new function.	2022-06-07 10:14:48 -04:00
Matt Arsenault	cfe5168499	AMDGPU: Make PSV instances static members	2022-06-07 10:14:48 -04:00
Matt Arsenault	dd7e407d81	AMDGPU: Move SpilledReg from MFI to SIRegisterInfo This isn't the most natural place for it, but it avoids a circular include dependency in an out of tree patch.	2022-06-02 17:11:24 -04:00
hsmahesha	5bd87350a5	[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget. Based on the available register budget, reserve the highest available VGPR for AGPR copy before RA. After RA, shift it to the lowest unused VGPR if one exists. Fixes SWDEV-330006. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D123525	2022-04-21 07:57:26 +05:30
Matt Arsenault	987df725ac	AMDGPU: Serialize VGPRForAGPRCopy	2022-04-19 22:14:52 -04:00
Matt Arsenault	b5ec131267	AMDGPU: Fix allocating GDS globals to LDS offsets These don't seem to be very well used or tested, but try to make the behavior a bit more consistent with LDS globals. I'm not sure what the definition for amdgpu-gds-size is supposed to mean. For now I assumed it's allocating a static size at the beginning of the allocation, and any known globals are allocated after it.	2022-04-19 22:14:48 -04:00
Matt Arsenault	378bb8014d	AMDGPU: Serialize a few more MachineFunctionInfo fields in MIR	2022-04-19 22:12:59 -04:00
Matt Arsenault	f90f4884c8	AMDGPU: Serialize gds size in MIR	2022-04-19 22:12:59 -04:00
Matt Arsenault	5cd17f9d43	AMDGPU: Serialize WWM registers	2022-04-19 21:44:43 -04:00
Matt Arsenault	e0d585d75a	AMDGPU: Defer creation of WWM VGPR spill slots There's no reason to create these immediately. They can be created in the prolog/epilog code like CSR spills. There's probably a cleaner way to do this by utilizing the CSR spill code. This makes the frame index used transient state for PrologEpilogInserter, and thus makes serialization easier. Really this doesn't need to be saved here but there isn't really a better place for it.	2022-04-19 21:07:13 -04:00
hsmahesha	ea47373af4	[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy. This is an NFC patch in preparation to fix a bug related to always reserving VGPR32 for AGPR copy. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D123651	2022-04-14 12:51:33 +05:30
Changpeng Fang	8384ced974	[AMDGPU][NFC]: Remove unnecessary MFI functions Summary: hasHostcallPtr() and hasHeapPtr() are only used in metadata emit. However, we can use the corresponding function attributes directly instead of introducing the functions. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D122600	2022-03-28 12:13:33 -07:00
Changpeng Fang	ca62b1db9f	[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg Summary: Emit metadata for hidden_heap_v1 kernarg Reviewers: sameerds, b-sumner Fixes: SWDEV-307188 Differential Revision: https://reviews.llvm.org/D119027	2022-02-25 10:45:35 -08:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implicitarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Stanislav Mekhanoshin	aeaf85b9c2	[AMDGPU] Select VGPR versions of MFMA if possible We can select _vgprcd versions of MAI instructions and have no AGPRs with the whole budget left for VGPRs if: 1. This is a kernel; 2. It has no calls; 3. It runs at least 2 waves, thus having no more than 256 VGPRs; 4. There is no inline asm requesting AGPRs. Differential Revision: https://reviews.llvm.org/D117253	2022-02-08 10:19:41 -08:00
Matt Arsenault	d6fdbbcace	AMDGPU: Add second emergency slot for SGPR to vmem for large frames In a future change, we will sometimes use a VGPR offset for doing spills to memory, in which case we need 2 free VGPRs to do the SGPR spill. In most cases we could spill the VGPR along with the SGPR being spilled, but we don't have any free lanes for SGPR_1024 in wave32 so we could still potentially need a second scavenging slot.	2022-02-02 19:05:05 -05:00
Matt Arsenault	de1600a1d9	AMDGPU: Avoid enabling kernel workitem IDs with reqd_work_group_size	2022-01-18 13:52:04 -05:00
Austin Kerbow	8470bf2b08	[AMDGPU] Do not reserve any VGPR for SGPR spills After the split register allocation changes in eebe841a47cb it is no longer necessary to reserve a VGPR before RA. This can also create bugs when IPRA is enabled since we cannot predict that a called function may not reserve any register if it does not have any SGPR spills. If that happens those functions may override reserved registers that are normally callee saved. Added a test to show this. Fixes: SWDEV-309900 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115551	2022-01-11 22:14:59 -08:00
Brendon Cahoon	d45a247998	[AMDGPU] Don't remove VGPR to AGPR dead spills from frame info Removing dead frame indices for VGPR to AGPR spills is incorrect when the frame index is shared by multiple objects, which may occur due to stack slot coloring. The problem is that subsequent code that processes the other object will assert because the stack frame index is marked dead. Removing dead frame indices is needed prior to stack slot coloring, which is what happens with SGPR to VGPR spills. These spills are lowered prior to stack slot coloring, but the VGPR to AGPR spills are processed afterwards during the Prolog/Epilog Inserter pass. This patch marks the VGPR to AGPR spill slot as dead if the slot is not used by another object. Differential Revision: https://reviews.llvm.org/D115996	2021-12-23 11:09:19 -06:00
Matt Arsenault	06b90175e7	AMDGPU: Remove fixed function ABI option	2021-12-10 19:41:19 -05:00
Matt Arsenault	729bf9b26b	AMDGPU: Enable fixed function ABI by default Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway. Also have the option stop implying all inputs need to be passed. This will now rely on the amdgpu-no-* attributes to avoid passing unnecessary values.	2021-12-04 10:49:18 -05:00
Stanislav Mekhanoshin	e5340ed30c	[AMDGPU] Fix global isel for kernels using agprs on gfx90a With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this case. Differential Revision: https://reviews.llvm.org/D112644	2021-10-29 14:23:14 -07:00
Stanislav Mekhanoshin	ca0c92d6a1	[AMDGPU] Allow to use a whole register file on gfx90a for VGPRs In a kernel which does not have calls or AGPR usage we can allocate the whole vector register budget for VGPRs and have no AGPRs as long as VGPRs stay addressable (i.e. below 256). Differential Revision: https://reviews.llvm.org/D111764	2021-10-21 18:24:34 -07:00
hsmahesha	52cb3af08c	[AMDGPU] Remove dead frame indices after sgpr spill. All those frame indices which are dead after sgpr spill should be removed from the function frame. Otherwise, there is a side effect such as re-mapping of free frame index ids by the later pass(es) like "stack slot coloring", which in turn could mess up the bookkeeping of "frame index to VGPR lane". Reviewed By: cdevadas Differential Revision: https://reviews.llvm.org/D111150	2021-10-12 09:58:49 +05:30
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit inputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
Matt Arsenault	98d7aa435f	AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr We no longer use this intrinsic outside of the backend and no longer support using it outside of kernels.	2021-08-26 20:30:03 -04:00
Stanislav Mekhanoshin	827dd17e26	[AMDGPU] Invert partial vgpr to agpr spill lane order On targets requiring VGPR alignment we may end up spilling an unaligned register if we partially spilled an odd number of leading lanes. The remainder will start with an odd register. This problem is solved by inverting the order of lanes to be spilled so that we start from the end. Differential Revision: https://reviews.llvm.org/D108732	2021-08-26 09:39:03 -07:00
Matt Arsenault	5beb9a0e6a	AMDGPU: Respect compute ABI attributes with unknown OS Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL, so we still have the awkward unknown OS case to deal with. Previously if the HSA ABI intrinsics appeared, we would not add the ABI registers to the function. We would emit an error later, but we still need to produce some compile result. Start adding the registers to any compute function, regardless of the OS. This keeps the internal state more consistent, and will help avoid numerous test crashes in a future patch which starts assuming the ABI inputs are present on functions by default.	2021-08-13 20:44:46 -04:00
Sebastian Neubauer	4359b870b1	[AMDGPU] Init scratch only if necessary If no scratch or flat instructions are used, we do not need to initialize the flat scratch hardware register. Differential Revision: https://reviews.llvm.org/D105920	2021-07-14 10:45:22 +02:00
Matt Arsenault	eebe841a47	RegAlloc: Allow targets to split register allocation AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know ahead of time how many registers will be needed to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point. Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill. This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers. In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run. One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work. Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQP is not currently supported, so this also prevents using the unhandled allocator.	2021-07-13 18:49:29 -04:00
Stanislav Mekhanoshin	6fb02596a2	[AMDGPU] Add support for architected flat scratch Add support for the readonly flat Scratch register initialized by the SPI. Differential Revision: https://reviews.llvm.org/D102432	2021-05-14 10:53:48 -07:00

171 Commits