llvm-project

Author	SHA1	Message	Date
Austin Kerbow	0234d90d81	[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768) It was shown experimentally that this may have some benefit on newer HW.	2024-03-31 10:46:05 -07:00
Matt Arsenault	a6382de399	AMDGPU: Refactor mfma hazard handling [NFC] (#84276) Try to make this editable by using functions for the number of wait states as a function of the number of passes. I'm assuming the current hazard test coverage is comprehensive. This could probably use another round to further simplify it. Alternatively, I believe this could all be expressed in a constant table indexed by an instruction classify function and number of passes.	2024-03-07 14:39:59 +05:30
Matt Arsenault	0f3628a937	AMDGPU: Correct cycle counts for f64 mfma on gfx940 (#83782)	2024-03-06 09:36:01 +05:30
Ivan Kosarev	dfa1d9b027	[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions.	2024-02-23 17:34:55 +00:00
Matt Arsenault	659ce8f665	AMDGPU: Simplify else if to else in GCNHazardRecognizer Fixes #79736	2024-01-30 08:17:04 +05:30
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Petar Avramovic	149ed9d2c5	AMDGPU: update GFX11 wmma hazards (#76143) One V_NOP or unrelated VALU instruction in between is required for correctness when matrix A or B of current WMMA instruction overlaps with matrix D of previous WMMA instruction. Remaining cases of WMMA operand overlaps are handled by the hardware and do not require handling in hazard recognizer. Hardware may stall in cases where: - matrix C of current WMMA instruction overlaps with matrix D of previous WMMA instruction - VALU instruction reads matrix D of previous WMMA instruction - matrix A,B or C of WMMA instruction reads result of previous VALU instruction	2024-01-24 12:00:35 +01:00
Pierre van Houtryve	42b0884238	[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125) Fixes #78856	2024-01-23 13:10:58 +01:00
Jay Foad	97747467f1	[AMDGPU] Update hazard recognition for new GFX12 wait counters (#78722) In most cases the hazards no longer apply, so just assert that we are not on GFX12.	2024-01-19 15:30:41 +00:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Jay Foad	b120dae9bb	[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628) Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc operand in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set appropriately depending on whether a hazard is found or not, rather than inserting an S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated. Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2024-01-11 13:20:19 +00:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)	2023-12-15 10:14:38 +01:00
Michael Maitland	85e3875ad7	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit as previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 19:21:36 -07:00
Michael Maitland	71bfec762b	Revert "[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics" This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43. Build still failing.	2023-08-24 15:37:27 -07:00
Michael Maitland	5b854f2c23	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit as previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 15:25:42 -07:00
Stephen Thomas	2dfb4b56fe	[AMDGPU] Fix incorrect hazard mitigation GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard by inserting a wait on sa_sdst==0 if such a wait isn't already present. Unfortunately, the check for an existing wait incorrectly checks for one that doesn't actually care about sa_sdst itself, but requires that no other counters are waited for. Once the check is performed correctly, a lit test needs to be updated, since it is currently testing for the incorrect behaviour. Differential Revision: https://reviews.llvm.org/D154438	2023-07-04 14:42:51 +01:00
Stephen Thomas	8aedad0fa0	[AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands Add functions AMDGPU::DepCtr::encodeField() and AMDGPU::DepCtr::decodeField() for each of vm_vsrc, va_vdst and sa_sdst. These are now used in AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with S_WAITCNT_DEPCTR operands easier and more readable. Differential Revision: https://reviews.llvm.org/D154424	2023-07-04 11:02:12 +01:00
Sergei Barannikov	aa2d0fbc30	[MC] Add MCRegisterInfo::regunits for iteration over register units Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152098	2023-06-16 05:39:50 +03:00
Jay Foad	890c76a931	[AMDGPU] Fix odd implicit operand handling in clause breaking By inspection. Because of the strange behaviour of MI.uses(), this was adding implicit defs to the clause uses set, and then wrongly detecting a conflict between explicit defs and implicit defs. For example it would detect a conflict on this pair of instructions: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4088, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1, addrspace 5) $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4092, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1 + 4, addrspace 5) Differential Revision: https://reviews.llvm.org/D150947	2023-05-19 21:24:33 +01:00
Kazu Hirata	4241d890ae	[Target] Use range-based for loops (NFC)	2023-04-15 14:14:56 -07:00
Jay Foad	a07584d57d	[CodeGen] Make more use of MachineOperand::getOperandNo. NFC. Differential Revision: https://reviews.llvm.org/D143252	2023-02-07 11:50:57 +00:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a statically mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Jay Foad	49762162ea	[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands. To avoid a profusion of similar functions with undocumented differences, this patch removes all the isLiteralConstant* variants. Callers are responsible for handling the isReg case. Differential Revision: https://reviews.llvm.org/D125759	2022-11-17 16:45:48 +00:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Stephen Thomas	c8a90316fa	[AMDGPU] Small cleanups in wait counter code A small number of cleanups and refactors intended to enhance readability in two passes that deal with s_waitcnt instructions. Differential Revision: https://reviews.llvm.org/D136677	2022-10-28 11:05:02 +01:00
Jay Foad	9bb1e21f07	[AMDGPU] Clean up calls to MachineOperand::setIsDead and friends. NFC.	2022-10-28 10:44:08 +01:00
Matt Arsenault	575eed3dac	AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs If inline asm has a VGPR def, it must have come from a VGPR write somewhere inside the asm. This should be further extended to all read after write hazards.	2022-10-12 17:25:24 -07:00
Carl Ritson	a35013bec6	[AMDGPU][GFX11] Mitigate VALU mask write hazard VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D134151	2022-10-01 16:21:24 +09:00
Jay Foad	f19cc793d2	[AMDGPU] Disable fp atomic to s_denorm_mode hazard for GFX11 This hazard only exists on GFX10. Differential Revision: https://reviews.llvm.org/D134276	2022-09-20 17:40:49 +01:00
Stanislav Mekhanoshin	fb28bf3fb4	[AMDGPU] Fix liveness verifier error in hazard recognizer After D133067 we are inserting swaps to use a new physical register. I have noticed verifier errors about undefined physical register uses if we are tracking liveness post RA. We have no access to LIS at this point, so mark new register uses as undef to calm down the verifier. Liveness should not matter at this point anyway. Note the description of the RegState::Undef: "Value of the register doesn't matter." I.e. it does not say it is strictly undefined. In fact that is what we really need: this value does not matter. I also had to modify the test a bit since with tracking enabled it does not pass verification even before the recognizer. Differential Revision: https://reviews.llvm.org/D133459	2022-09-07 16:30:36 -07:00
Stanislav Mekhanoshin	95d497ff2a	[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it back. Unfortunately hazard recognizer works after wait count insertion, so we cannot simply reuse an arbitrary register, hence w/a also includes a full waitcount. This can be avoided if we run it from expandPostRAPseudo, but that is a complete misplacement. Differential Revision: https://reviews.llvm.org/D133067	2022-09-07 14:23:49 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Vang Thao	7fc52d7c8b	[AMDGPU] Fix DGEMM hazard for GFX90a For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert wait-states to avoid data hazard. However when there is a DGEMM instruction in-between them, SQ incorrectly disables the wait-states thus the data hazard needs to be handled with this workaround. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130677	2022-08-01 11:56:22 -07:00
Piotr Sobczak	f29a19b0b8	[AMDGPU] Extend cases for ReadM0MovRelInterpHazard Extend hazard recognizer of ReadM0MovRelInterpHazard with DS_READ_ADDTID and DS_WRITE_ADDTID, as they also require a manually inserted S_NOP after SALU writing m0. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130783	2022-08-01 17:59:33 +02:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Jay Foad	13107c2770	[AMDGPU] Add support for GFX11 LDSDIR hazards Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to allow concurrent issue of LDS direct with VALU execution. Also detect LDS direct versus VMEM source VGPR hazards and insert vm_vsrc=0 waits using s_waitcnt_depctr. Differential Revision: https://reviews.llvm.org/D127963	2022-06-20 21:58:12 +01:00
Jay Foad	9dff14be9e	[AMDGPU] Add support for GFX11 hazards Add support for partial stall over EXEC hazard and trans use hazard. Differential Revision: https://reviews.llvm.org/D127872	2022-06-16 08:15:21 +01:00
Austin Kerbow	bd9eed3aec	[AMDGPU] Add isMFMA helper function. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127124	2022-06-14 22:01:49 -07:00
Stanislav Mekhanoshin	5c974d086c	[AMDGPU] Fix hazard handling of v_cmpx to permlane - VOP3 and SDWA forms of V_CMPX were not handled - Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane. Differential Revision: https://reviews.llvm.org/D127344	2022-06-09 10:33:54 -07:00
Stanislav Mekhanoshin	63f21f4cc7	[AMDGPU] Handle LDS DMA and LDS_DIRECT hazards There shall be 1 wait state between M0 write and LDS DMA/LDS_DIRECT use. Differential Revision: https://reviews.llvm.org/D124550	2022-05-04 14:45:16 -07:00
Stanislav Mekhanoshin	d951d937a0	[AMDGPU] Increate hazard for store dwordx3/4 to 2 waitstates on gfx940 Fixes: SWDEV-327053 Differential Revision: https://reviews.llvm.org/D123687	2022-04-13 14:21:45 -07:00
Stanislav Mekhanoshin	f311f934e1	[AMDGPU] gfx940 VALU hazard recognizer Differntial Revision: https://reviews.llvm.org/D122339	2022-03-29 10:57:54 -07:00
Stanislav Mekhanoshin	64838ba365	[AMDGPU] Use GenericTable to classify DGEMM Since there is a table introduced for MAI instructions extend it to use for DGEMM classification. Differential Revision: https://reviews.llvm.org/D122337	2022-03-24 13:00:37 -07:00
Stanislav Mekhanoshin	cad9de71d7	[AMDGPU] gfx940 MAI hazard recognizer Differential Revision: https://reviews.llvm.org/D122263	2022-03-24 12:59:52 -07:00
Austin Kerbow	1e15adba62	[AMDGPU] Add s_nop WaitStates between neighboring mfma In some cases padding bubbles between sequential MFMA instructions may lead to increased inter-wave performance. Add option to request to pad some portion of these stall cycles with s_nops. Fixes: SWDEV-326925 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121437	2022-03-23 13:56:09 -07:00
Stanislav Mekhanoshin	e9a49c6483	[AMDGPU] gfx940 basic speed model This is incomplete and will handle more instructions as they are added. Differential Revision: https://reviews.llvm.org/D121966	2022-03-18 13:19:47 -07:00
Thomas Symalla	380ff31d83	[AMDGPU] Fix typo in comment [NFC] This replaces "V_MOB_B32" with "V_MOV_B32" in some comment.	2022-02-22 13:27:26 +01:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
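The GFX11 WMMA rule described in commit 149ed9d2c5 is concrete enough to sketch. The C++ below is a minimal illustration, not the code in GCNHazardRecognizer. It assumes each WMMA matrix operand occupies a contiguous VGPR range, and the names `RegRange`, `WmmaOperands` and `wmmaNopsNeeded` are hypothetical. It asks for one V_NOP only when matrix A or B of the current WMMA overlaps matrix D of the preceding WMMA and no unrelated VALU instruction sits between them. The other overlaps listed in that commit are left to the hardware.

```cpp
// Hypothetical model of the GFX11 WMMA overlap hazard described in 149ed9d2c5.
// Assumption: every matrix operand occupies a contiguous range of VGPRs.
struct RegRange {
  unsigned First; // first VGPR index
  unsigned Count; // number of consecutive VGPRs
};

// True if the half-open ranges [First, First + Count) share any register.
bool overlaps(RegRange X, RegRange Y) {
  return X.First < Y.First + Y.Count && Y.First < X.First + X.Count;
}

struct WmmaOperands {
  RegRange A, B, C, D; // sources A and B, accumulator C, destination D
};

// Number of V_NOPs to insert before Cur. ValusBetween counts the unrelated
// VALU instructions already issued between Prev and Cur.
unsigned wmmaNopsNeeded(const WmmaOperands &Prev, const WmmaOperands &Cur,
                        unsigned ValusBetween) {
  if (ValusBetween >= 1)
    return 0; // one unrelated VALU already separates the two WMMAs
  if (overlaps(Cur.A, Prev.D) || overlaps(Cur.B, Prev.D))
    return 1; // A or B reads the previous D: a single V_NOP is required
  return 0;   // e.g. C overlapping the previous D is handled in hardware
}
```

For example, if the previous WMMA writes D to v8..v15 and the next one reads A from v8..v11, the sketch requests one V_NOP. Reusing v8..v15 only as the accumulator C requests none, matching the commit's note that the hardware may stall instead.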


134 Commits
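Commits 1e15adba62 and 0234d90d81 describe padding part of the stall between neighboring MFMA instructions with s_nop. The arithmetic can be sketched as below. This is an illustration under stated assumptions, not the GCNHazardRecognizer implementation. It assumes the padding option is expressed as a percentage of the neighboring MFMA's latency and that wait states already elapsed since that MFMA count toward the target. It also assumes one s_nop covers at most 8 wait states. The names `mfmaPaddingWaitStates` and `sNopCount` are hypothetical.

```cpp
#include <algorithm>

// Hypothetical: wait states to pad so that roughly PaddingPercent of the
// neighboring MFMA's latency is covered, minus wait states already elapsed.
int mfmaPaddingWaitStates(int NeighborLatency, int PaddingPercent,
                          int WaitStatesSinceNeighbor) {
  int Target = NeighborLatency * PaddingPercent / 100;
  return std::max(0, Target - WaitStatesSinceNeighbor);
}

// Assumption: a single s_nop provides at most 8 wait states, so larger
// requests are split across several s_nop instructions (rounding up).
int sNopCount(int WaitStates) {
  const int MaxPerNop = 8;
  return (WaitStates + MaxPerNop - 1) / MaxPerNop;
}
```

With a neighbor latency of 16 wait states, a 50% padding ratio and 2 wait states already elapsed, the sketch pads 6 wait states, which fit in a single s_nop. A ratio of 0 disables padding entirely.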