llvm-project

Author	SHA1	Message	Date
Jay Foad	7e5019e82b	[AMDGPU] Simplify WaitcntBrackets::getRegInterval with getPhysRegBaseClass (#74087) This means that getRegInterval no longer depends on the MCInstrDesc, so it could be simplified further to take just a MachineOperand or just a physical register. NFCI.	2023-12-18 14:16:02 +00:00
Jie Fu	f0b44ce28e	[AMDGPU] Fix -Wunused-variable in SIInsertWaitcnts.cpp (NFC) llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1322:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = ^ llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1334:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(Block, It, DL, TII->get(AMDGPU::S_WAITCNT_VSCNT)) ^ 2 errors generated.	2023-12-15 20:00:18 +08:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Pierre van Houtryve	f1ea77f7be	[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436) Split from #72830	2023-12-15 08:31:14 +01:00
Stanislav Mekhanoshin	c6ecbcb48b	[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245) BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the vmcnt. This is VMEM load and DS store, so it shall use vmcnt.	2023-12-13 10:49:36 -08:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Ivan Kosarev	d1e3d32088	[AMDGPU][NFCI] Decouple actual register encodings from HWEncoding values. (#69452) The HWEncoding values currently form a strange mix of actual register codes for some subtargets and types of operands and informational flags. This patch removes the dependency allowing arbitrary changes in the structure of HWEncoding values without breaking register encodings. Such changes, in turn, would make it possible to speed up and simplify getAVOperandEncoding() testing for AGPRs as well as other functions dealing with register codes downstream. They would also allow to maintain the same format of HWEncoding values across our downstream code bases, thus simplifying merging in mainline changes.	2023-10-25 13:24:50 +01:00
Ivan Kosarev	637dfc5f9a	[AMDGPU][True16] Support disassembling .h registers. Differential Revision: https://reviews.llvm.org/D156939	2023-09-27 12:02:50 +01:00
Juan Manuel MARTINEZ CAAMAÑO	356494c36e	[NFC][AMDGPU] Perform a single lookup in map in SIInsertWaitcnts::isPreheaderToFlush	2023-09-20 14:02:04 +02:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Luke Drummond	ce0d16f574	[NFC][AMDGPU] assert scoreboard index is in range `getRegInterval` can theoretically return AGPRs or SGPRs which aren't valid when updating the VgprMemTypes array. Make this clear with an assert.	2023-08-25 13:30:06 +01:00
Jay Foad	3091bdb86d	[AMDGPU] Do not release VGPRs at -O0 This was an oversight when the GFX11 early release VGPRs optimization was reimplemented in D153279. Sending the DEALLOC_VGPRS message is a performance optimization so there is no need to do it at -O0. In addition it makes some kinds of post mortem debugging hard or impossible, since VGPR values are no longer available to inspect at the s_endpgm instruction. Differential Revision: https://reviews.llvm.org/D157599	2023-08-10 14:58:06 +01:00
Jay Foad	e61ca23289	[AMDGPU] Add and use SIInstrFlags::GWS. NFC. This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099	2023-08-07 12:05:14 +01:00
Jay Foad	7fa7a08f21	[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) Differential Revision: https://reviews.llvm.org/D155681	2023-07-19 10:33:11 +01:00
Jay Foad	f2c164c815	[AMDGPU] Do not wait for vscnt on function entry and return SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns. Differential Revision: https://reviews.llvm.org/D153537	2023-07-04 12:22:38 +01:00
Jay Foad	98c4ab146b	[AMDGPU] Simplify BlockInfo in SIInsertWaitcnts. NFC.	2023-06-20 20:10:12 +01:00
Jay Foad	4b6d41cd1d	[AMDGPU] Do not release VGPRs if there may be pending scratch stores Differential Revision: https://reviews.llvm.org/D153295	2023-06-19 21:12:43 +01:00
Jay Foad	b3a08fa317	[AMDGPU] Remove unused macro CNT_MASK	2023-06-19 21:08:35 +01:00
Jay Foad	eb7491769a	[AMDGPU] Reimplement the GFX11 early release VGPRs optimization Implement this optimization in SIInsertWaitcnts, where we already have information about whether there might be outstanding VMEM store instructions. This has the following advantages: - Correctly handles atomics-with-return. - Correctly handles call instructions. - Should be faster because it does not require running a separate pass. Differential Revision: https://reviews.llvm.org/D153279	2023-06-19 17:12:54 +01:00
Austin Kerbow	e501ed84aa	[AMDGPU] Don't flush vmcnt for loops with use/def pairs Conditions for hoisting vmcnt with flat instructions should be similar to VMEM. If there are use/def pairs in a loop body we cannot guarantee that hoisting the waitcnt will be profitable. Better heuristics are needed to analyse whether gains from avoiding waitcnt in loop bodies outweigh waiting for loads in the preheader. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D151126	2023-06-02 22:55:12 -07:00
Ronak Chauhan	5f0b92e580	[AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149332	2023-05-05 21:12:10 +05:30
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Ruiling Song	9119d9bfce	AMDGPU/SIInsertWait: Skip dummy tied source For D16 memory load instructions, the hardware usually only writes to half of the 32bit register, but we define the destination register using 32bit register for the MachineIR instruction. Without the extra tied source register, LLVM framework will think the previous write to the other half of the register is dead. This is because by using 32bit register as the destination register, LLVM will think the instruction will always overwrite the whole 32bit register. By adding the extra tied source, LLVM will think we are reading the register, so previous write to the register will not be dead. This dummy tied source is introducing unnecessary read-after-write dependency. The change here is to bypass the tied source that can be skipped, thus avoiding an unnecessary s_waitcnt. Reviewed by: foad Differential Revision: https://reviews.llvm.org/D140537	2023-01-11 09:59:35 +08:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Stephen Thomas	ab2e27faa4	[AMDGPU] Small cleanup in insertWaitcntInBlock() Move some code that checks if an instruction is a waitcount into a separate function, mainly to aid readability in the logic where it is used. Differential Revision: https://reviews.llvm.org/D139522	2022-12-07 11:58:59 +00:00
Ruiling Song	0eaf6759ae	[AMDGPU][InsertWaits] No wait for WAW for global/scratch_load global/scratch_load will return in order they are issued. No need to insert a s_waitcnt for WAW hazard. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D138476	2022-11-23 09:57:50 +08:00
Jay Foad	bc0c89d3d8	[AMDGPU] More cleanup after D117544. NFC.	2022-11-22 21:15:22 +00:00
Jay Foad	6ed6e8e3b8	[AMDGPU] Remove RegStrictDom variable. NFC. D117544 removed the only substantive use of RegStrictDom. Now we can simplify by using StrictDom for everything.	2022-11-22 17:08:00 +00:00
Jay Foad	dce7c09bd4	[AMDGPU] Define and use new allZeroWaitcnt helper. NFC.	2022-11-22 17:01:36 +00:00
Stephen Thomas	69dd9910c7	[AMDGPU] Declutter applyPreexistingWaitcnt() Declutter applyPreexistingWaitcnt() a little by abstracting the code that updates the operands to S_WAITCNT and S_WAITCNT_VSCNT instructions into discrete functions. Differential Revision: https://reviews.llvm.org/D137624	2022-11-08 11:51:16 +00:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Jay Foad	86dc6a3c0f	[AMDGPU] Constify a couple of methods. NFC.	2022-11-02 11:04:54 +00:00
Stephen Thomas	c8a90316fa	[AMDGPU] Small cleanups in wait counter code A small number of cleanups and refactors intended to enhance readability in two passes that deal with s_waitcnt instructions. Differential Revision: https://reviews.llvm.org/D136677	2022-10-28 11:05:02 +01:00
Jay Foad	f0ca946bf9	[AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType This just commons up and simplifies some logic that was repeated in SIInsertWaitcnts::updateEventWaitcntAfter. NFCI. Differential Revision: https://reviews.llvm.org/D136253	2022-10-19 16:22:50 +01:00
Baptiste	b556726ccc	[AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary One of the conditions to flush the vmcnt counter in loop preheaders is: The loop contains a use of a vgpr that is defined out of the loop. The code currently checks if a waitcnt is needed by looking at the score of that vgpr in the score brackets. This is not enough and may cause the generation of an unnecessary vmcnt flush. This patch fixes that case. Differential Revision: https://reviews.llvm.org/D130313	2022-09-28 13:05:50 -04:00
Austin Kerbow	2c82a126d7	[AMDGPU] Omit unnecessary waitcnt before barriers It is not necessary to wait for all outstanding memory operations before barriers on hardware that can back off of the barrier in the event of an exception when traps are enabled. Add a new subtarget feature which tracks which HW has this ability. Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D130722	2022-07-29 11:12:36 -07:00
Baptiste Saleil	79e77a9f39	[AMDGPU] Flush the vmcnt counter in loop preheaders when necessary waitcnt vmcnt instructions are currently generated in loop bodies before using values loaded outside of the loop. In some cases, it is better to flush the vmcnt counter in a loop preheader before entering the loop body. This patch detects these cases and generates waitcnt instructions to flush the counter. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D115747	2022-06-23 10:53:21 -04:00
Joe Nash	2a68364745	[AMDGPU] gfx11 waitcnt support for VINTERP and LDSDIR instructions Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127781	2022-06-17 09:30:37 -04:00
Jay Foad	6c372daa84	[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions. Differential Revision: https://reviews.llvm.org/D127315	2022-06-10 08:15:23 +01:00
Joe Nash	d21b9b4946	[AMDGPU] gfx11 scalar alu instructions MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn. Contributors: Jay Foad <jay.foad@amd.com> Patch 7/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125319 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D125498	2022-05-17 13:35:41 -04:00
Stanislav Mekhanoshin	791ec1c68e	[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds Differential Revision: https://reviews.llvm.org/D124884	2022-05-17 10:32:13 -07:00
Stanislav Mekhanoshin	51e02409f0	[AMDGPU] Produce waitcounts for LDS DMA MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS written can be accessed. A load from LDS to VMEM does not need a wait. Differential Revision: https://reviews.llvm.org/D124626	2022-04-29 11:14:11 -07:00
Austin Kerbow	7f97ac94f7	Revert "[AMDGPU] Omit unnecessary waitcnt before barriers" This reverts commit 8d0c34fd4fb66ea0d19563154a59658e4b7f35d4.	2022-04-18 21:24:08 -07:00
Venkata Ramanaiah Nalamothu	04fff547e2	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvement of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652	2022-03-09 12:18:02 +05:30
Austin Kerbow	8d0c34fd4f	[AMDGPU] Omit unnecessary waitcnt before barriers It is not necessary to wait for all outstanding memory operations before barriers on hardware that can back off of the barrier in the event of an exception when traps are enabled. Add a new subtarget feature which tracks which HW has this ability. Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D120544	2022-03-07 08:23:53 -08:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Joe Nash	c87c61c52c	[AMDGPU] Fix AGPR offset for waitcnt An enum value stores the offset between AGPR ranges and VGPR ranges in the internal storage of SIInsertWaitcnts. It said 226 when it should say 256, causing some portion of the ranges to overlap. That in turn causes 'aliasing' between the registers, potentially inserting waitcnts that are not required. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D119749	2022-02-14 15:16:21 -05:00
Sebastian Neubauer	f1e36474b9	[AMDGPU][NFC] Fix debug prints Print the instructions instead of pointers.	2022-01-24 13:55:00 +01:00
Piotr Sobczak	8dfb417e67	[AMDGPU] Fix missing waitcnt issue Ignore out of order counters when merging brackets. The fact that there was a pending event in the old state does not guarantee that the waitcnt was generated, so we still need to conservatively re-process the block. The patch fixes a correctness issue where the block was not re-processed and the waitcnt not inserted in consequence. Differential Revision: https://reviews.llvm.org/D117544	2022-01-19 10:54:44 +01:00
Ron Lieberman	09b53296cf	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3. Failed amdgpu runtime buildbot # 3514	2021-12-22 11:39:28 -05:00
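
The AGPR offset fix in c87c61c52c (D119749) comes down to simple index arithmetic. The Python sketch below is a hypothetical model, not the SIInsertWaitcnts implementation: it assumes 256 VGPRs and 256 AGPRs, and that each register maps to one slot of a flat scoreboard, with AGPRs placed after a fixed offset. The names `aliased_pairs`, `vgpr_slot` and `agpr_slot` are illustrative only. With the old offset of 226, the slots for v226 to v255 are shared with a0 to a29, so an event recorded for one register also appears to affect the other and can cause an unneeded waitcnt; with 256 the two ranges are disjoint.

```python
# Hypothetical model, not LLVM code: a flat scoreboard of register slots.
# Assumption: 256 VGPRs (v0..v255) and 256 AGPRs (a0..a255).
NUM_VGPRS = 256
NUM_AGPRS = 256


def vgpr_slot(index: int) -> int:
    """Slot for VGPR v<index>: VGPRs occupy slots 0..NUM_VGPRS-1."""
    return index


def agpr_slot(index: int, agpr_offset: int) -> int:
    """Slot for AGPR a<index>: AGPRs start at agpr_offset."""
    return agpr_offset + index


def aliased_pairs(agpr_offset: int) -> list[tuple[int, int]]:
    """Return (vgpr, agpr) index pairs that map to the same slot."""
    slot_to_vgpr = {vgpr_slot(v): v for v in range(NUM_VGPRS)}
    pairs = []
    for a in range(NUM_AGPRS):
        slot = agpr_slot(a, agpr_offset)
        if slot in slot_to_vgpr:
            pairs.append((slot_to_vgpr[slot], a))
    return pairs


if __name__ == "__main__":
    for offset in (226, 256):
        print(f"offset {offset}: {len(aliased_pairs(offset))} aliased registers")
```

Running it prints `offset 226: 30 aliased registers` followed by `offset 256: 0 aliased registers`, matching the commit's description of part of the ranges overlapping.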


175 Commits