llvm-project

Author	SHA1	Message	Date
Matt Arsenault	fac34c01f6	AMDGPU: Bulk update some call tests to use opaque pointers	2022-11-29 18:10:38 -05:00
Ron Lieberman	ca856fff1c	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.	2022-11-29 15:21:09 -06:00
Ron Lieberman	d882ba7aea	enable code-object-version=5	2022-11-29 15:11:57 -06:00
Matt Arsenault	729bf9b26b	AMDGPU: Enable fixed function ABI by default Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway. Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.	2021-12-04 10:49:18 -05:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit imputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
madhur13490	5682ae2fc6	[AMDGPU] Set implicit arg attributes for indirect calls This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken. Once such attributes are set, rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect functions calls are not used. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99347	2021-04-13 13:15:13 +00:00
madhur13490	3c297a2564	Make fixed-abi default for AMD HSA OS fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96340	2021-02-19 15:05:25 +00:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Sebastian Neubauer	a343b9b032	Revert "[AMDGPU] Insert waitcnt after returning from call" This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6. According to michel.daenzer, > This completely broke the Mesa radeonsi driver on Navi 14. Xorg + > xterm come up with major corruption & psychedelic colours.	2020-09-23 17:16:39 +02:00
Sebastian Neubauer	ca907bfb57	[AMDGPU] Insert waitcnt after returning from call When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished. Calls need some time to be executed, so if the callee inserts the waitcnt, filling the instruction buffer and waiting for memory will be interleaved, hiding some latency. This comes at the cost of having a waitcnt inside functions that may not be needed as no memory operations are outstanding. For function calls, this is already implemented. The same principal applies to returns: If the caller inserts a waitcnt after the call, the callee does not have to wait and the return and memory operation can be run in parallel. This commit implements waiting in the caller after returning from a function call. Differential Revision: https://reviews.llvm.org/D87674	2020-09-23 12:17:59 +02:00
Jay Foad	4bdab2e86a	[AMDGPU] Fix offset for REL32_HI relocs The addend in a REL32 reloc needs to be adjusted to account for the offset from the PC value returned by the s_getpc instruction to the point where the reloc is applied. This was being done correctly for (GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a difference if the target symbol happens to get loaded almost exactly a multiple of 4G away from the relocated instructions. Differential Revision: https://reviews.llvm.org/D86938	2020-09-02 10:55:55 +01:00
Christudasan Devadasan	375cec4b6c	[AMDGPU] Introduce more scratch registers in the ABI. The AMDGPU target has a convention that defined all VGPRs (execept the initial 32 argument registers) as callee-saved. This convention is not efficient always, esp. when the callee requiring more registers, ended up emitting a large number of spills, even though its caller requires only a few. This patch revises the ABI by introducing more scratch registers that a callee can freely use. The 256 vgpr registers now become: 32 argument registers 112 scratch registers and 112 callee saved registers. The scratch registers and the CSRs are intermixed at regular intervals (a split boundary of 8) to obtain a better occupancy. Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr Reviewed By: arsenm, t-tye Differential Revision: https://reviews.llvm.org/D76356	2020-05-05 23:02:58 +05:30
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Matt Arsenault	71dfb7ec5c	AMDGPU: Make s34 the FP register Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372	2019-07-08 19:03:38 +00:00
Matt Arsenault	4f3472deb2	CodeGen: Set hasSideEffects = 0 on BUNDLE The BUNDLE itself should not have side effects, and this is a property of instructions inside the bundle. The hasProperty check already searches for any member instructions, which was pointless since it was overridden by this bit. Allows me to distinguish bundles that have side effects vs. do not in a future patch. Also fixes an unnecessary scheduling barrier in the bundle AMDGPU uses to get PC relative addresses. llvm-svn: 364984	2019-07-03 00:30:47 +00:00
Matt Arsenault	d88db6d7fc	AMDGPU: Always use s33 for global scratch wave offset Every called function could possibly need this to calculate the absolute address of stack objectst, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR. llvm-svn: 363990	2019-06-20 21:58:24 +00:00
Matt Arsenault	aa41e92e17	AMDGPU: Avoid most waitcnts before calls Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything. Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead. llvm-svn: 363465	2019-06-14 21:52:26 +00:00
Matt Arsenault	1509fde891	AMDGPU: Add baseline test for call waitcnt insertion llvm-svn: 363453	2019-06-14 21:01:23 +00:00

19 Commits