llvm-project

Author	SHA1	Message	Date
Pravin Jagtap	5e007afa9d	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589) Presently, the compiler selectively adds a nop only when opsel != 0, i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop for the above cases irrespective of opsel values. Note: We might need to add a few others into the same table.	2024-12-11 18:38:10 +05:30
Matt Arsenault	39337ff2dc	AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844 ) gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>	2024-12-02 09:23:17 -05:00
Matt Arsenault	27a8afa3fc	AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (#117287 )	2024-11-25 09:33:04 -08:00
Matt Arsenault	c3fe5ad6be	AMDGPU: Handle vcmpx+permlane gfx950 hazard (#117286) Confusingly, this is a different hazard from the one on gfx10 with a subtarget feature.	2024-11-25 09:27:53 -08:00
Matt Arsenault	3db4f5b0da	AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285 ) The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL cases happened to have the same number of cycles in all cases. Now the XDL consumer case has an additional state for 2 pass sources.	2024-11-25 09:23:51 -08:00
Matt Arsenault	85601fd78f	AMDGPU: Handle v_mfma_f64_16x16x4_f64 write VGPR read srca/srcb hazard change for gfx950 (#117284 ) Increase in wait states from 11 to 19. The index for smfmac counts as like srcA/srcB.	2024-11-22 20:30:06 -08:00
Matt Arsenault	db08d78c3e	AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (#117283 ) Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17 wait states.	2024-11-22 20:26:58 -08:00
Matt Arsenault	8cb6c9907c	AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (#117263 ) These have an additional wait state compared to gfx940.	2024-11-22 20:23:46 -08:00
Matt Arsenault	b078b882b9	AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (#117262 ) Increase from 11 wait states to 19	2024-11-22 20:20:23 -08:00
Juan Manuel Martinez Caamaño	d617371375	[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859) Instead of allocating and initializing a new instance in `GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.	2024-10-02 18:17:27 +02:00
Carl Ritson	86627149f6	[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067) Any SGPR read by a VALU can potentially obscure SALU writes to the same register. Insert s_wait_alu instructions to mitigate the hazard on affected paths. Compute a global cache of SGPRs with any VALU reads and use this to avoid inserting mitigation for SGPRs never accessed by VALUs. To avoid excessive search when compile time is a priority, implement a secondary mode where all SALU writes are mitigated. Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-09-04 12:15:20 +09:00
Carl Ritson	987ffc31f8	[AMDGPU] Refactor code for GETPC bundle updates in hazards (NFCI) As suggested in review for PR #100067. Refactor code for S_GETPC_B64 bundle updates for use with multiple hazard mitigations.	2024-08-23 11:58:47 +09:00
Jeffrey Byrnes	7bcf4d63cf	[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op". This includes the case where the second op is SDWA with the same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op's dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as a tied-def of an implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] != 0 have a dest forwarding issue. Currently, we only add a check for CVT_SR_FP8_F32 with opsel[3] != 0; this PR adds support for opsel[2] != 0 as well.	2024-08-22 11:38:24 -07:00
Carl Ritson	939a6624ac	[AMDGPU] Implement workaround for GFX11.5 export priority (#99273 ) On GFX11.5 shaders having completed exports need to execute/wait at a lower priority than shaders still executing exports. Add code to maintain normal priority of 2 for shaders that export and drop to priority 0 after exports.	2024-07-23 17:03:21 +09:00
Jay Foad	aeafdc21d2	[AMDGPU] Use using instead of typedef. NFC.	2024-07-16 16:44:12 +01:00
Jay Foad	0606747c96	[AMDGPU] Remove some pointless fallthrough annotations	2024-05-01 16:04:35 +01:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Austin Kerbow	0234d90d81	[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768 ) It was shown experimentally that this may have some benefit on newer HW.	2024-03-31 10:46:05 -07:00
Matt Arsenault	a6382de399	AMDGPU: Refactor mfma hazard handling [NFC] (#84276 ) Try to make this editable by using functions for the number of wait states as a function of the number of passes. I'm assuming the current hazard test coverage is comprehensive. This could probably use another round to further simplify it. Alternatively, I believe this could all be expressed in a constant table indexed by an instruction classify function and number of passes.	2024-03-07 14:39:59 +05:30
Matt Arsenault	0f3628a937	AMDGPU: Correct cycle counts for f64 mfma on gfx940 (#83782 )	2024-03-06 09:36:01 +05:30
Ivan Kosarev	dfa1d9b027	[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772 ) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions.	2024-02-23 17:34:55 +00:00
Matt Arsenault	659ce8f665	AMDGPU: Simplify else if to else in GCNHazardRecognizer Fixes #79736	2024-01-30 08:17:04 +05:30
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Petar Avramovic	149ed9d2c5	AMDGPU: update GFX11 wmma hazards (#76143 ) One V_NOP or unrelated VALU instruction in between is required for correctness when matrix A or B of current WMMA instruction overlaps with matrix D of previous WMMA instruction. Remaining cases of WMMA operand overlaps are handled by the hardware and do not require handling in hazard recognizer. Hardware may stall in cases where: - matrix C of current WMMA instruction overlaps with matrix D of previous WMMA instruction - VALU instruction reads matrix D of previous WMMA instruction - matrix A,B or C of WMMA instruction reads result of previous VALU instruction	2024-01-24 12:00:35 +01:00
Pierre van Houtryve	42b0884238	[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125 ) Fixes #78856	2024-01-23 13:10:58 +01:00
Jay Foad	97747467f1	[AMDGPU] Update hazard recognition for new GFX12 wait counters (#78722 ) In most cases the hazards no longer apply, so just assert that we are not on GFX12.	2024-01-19 15:30:41 +00:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Jay Foad	b120dae9bb	[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628 ) Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc operand in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set appropriately depending on whether a hazard is found or not, rather than inserting an S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated. Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2024-01-11 13:20:19 +00:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Michael Maitland	85e3875ad7	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit was previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 19:21:36 -07:00
Michael Maitland	71bfec762b	Revert "[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics" This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43. Build still failing.	2023-08-24 15:37:27 -07:00
Michael Maitland	5b854f2c23	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit was previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 15:25:42 -07:00
Stephen Thomas	2dfb4b56fe	[AMDGPU] Fix incorrect hazard mitigation GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard by inserting a wait on sa_sdst==0 if such a wait isn't already present. Unfortunately, the check for an existing wait incorrectly checks for one that doesn't actually care about sa_sdst itself, but requires that no other counters are waited for. Once the check is performed correctly, a lit test needs to be updated, since it is currently testing for the incorrect behaviour. Differential Revision: https://reviews.llvm.org/D154438	2023-07-04 14:42:51 +01:00
Stephen Thomas	8aedad0fa0	[AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands Add functions AMDGPU::DepCtr::encodeField() and AMDGPU::DepCtr::decodeField() for each of vm_vsrc, va_vdst and sa_sdst. These are now used in AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with S_WAITCNT_DEPCTR operands easier and more readable. Differential Revision: https://reviews.llvm.org/D154424	2023-07-04 11:02:12 +01:00
Sergei Barannikov	aa2d0fbc30	[MC] Add MCRegisterInfo::regunits for iteration over register units Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152098	2023-06-16 05:39:50 +03:00
Jay Foad	890c76a931	[AMDGPU] Fix odd implicit operand handling in clause breaking By inspection. Because of the strange behaviour of MI.uses(), this was adding implicit defs to the clause uses set, and then wrongly detecting a conflict between explicit defs and implicit defs. For example it would detect a conflict on this pair of instructions: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4088, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1, addrspace 5) $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4092, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1 + 4, addrspace 5) Differential Revision: https://reviews.llvm.org/D150947	2023-05-19 21:24:33 +01:00
Kazu Hirata	4241d890ae	[Target] Use range-based for loops (NFC)	2023-04-15 14:14:56 -07:00
Jay Foad	a07584d57d	[CodeGen] Make more use of MachineOperand::getOperandNo. NFC. Differential Revision: https://reviews.llvm.org/D143252	2023-02-07 11:50:57 +00:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a statically mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Jay Foad	49762162ea	[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands. To avoid a profusion of similar functions with undocumented differences, this patch removes all the isLiteralConstant* variants. Callers are responsible for handling the isReg case. Differential Revision: https://reviews.llvm.org/D125759	2022-11-17 16:45:48 +00:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Stephen Thomas	c8a90316fa	[AMDGPU] Small cleanups in wait counter code A small number of cleanups and refactors intended to enhance readability in two passes that deal with s_waitcnt instructions. Differential Revision: https://reviews.llvm.org/D136677	2022-10-28 11:05:02 +01:00
Jay Foad	9bb1e21f07	[AMDGPU] Clean up calls to MachineOperand::setIsDead and friends. NFC.	2022-10-28 10:44:08 +01:00
Matt Arsenault	575eed3dac	AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs If inline asm has a VGPR def, it must have come from a VGPR write somewhere inside the asm. This should be further extended to all read after write hazards.	2022-10-12 17:25:24 -07:00
Carl Ritson	a35013bec6	[AMDGPU][GFX11] Mitigate VALU mask write hazard VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D134151	2022-10-01 16:21:24 +09:00
Jay Foad	f19cc793d2	[AMDGPU] Disable fp atomic to s_denorm_mode hazard for GFX11 This hazard only exists on GFX10. Differential Revision: https://reviews.llvm.org/D134276	2022-09-20 17:40:49 +01:00
Stanislav Mekhanoshin	fb28bf3fb4	[AMDGPU] Fix liveness verifier error in hazard recognizer After D133067 we are inserting swaps to use a new physical register. I have noticed verifier errors about undefined physical register uses if we are tracking liveness post RA. We have no access to LIS at this point, so mark new register uses as undef to calm down the verifier. Liveness should not matter at this point anyway. Note the description of the RegState::Undef: "Value of the register doesn't matter." I.e. it does not say it is strictly undefined. In fact that is what we really need: this value does not matter. I also had to modify the test a bit since with tracking enabled it does not pass verification even before the recognizer. Differential Revision: https://reviews.llvm.org/D133459	2022-09-07 16:30:36 -07:00
Stanislav Mekhanoshin	95d497ff2a	[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it back. Unfortunately hazard recognizer works after wait count insertion, so we cannot simply reuse an arbitrary register, hence w/a also includes a full waitcount. This can be avoided if we run it from expandPostRAPseudo, but that is a complete misplacement. Differential Revision: https://reviews.llvm.org/D133067	2022-09-07 14:23:49 -07:00

151 Commits