llvm-project

Author	SHA1	Message	Date
Rahul Joshi	bee9664970	[TableGen] Emit OpName as an enum class instead of a namespace (#125313 ) - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. - This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two. - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES. - Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers. - Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).	2025-02-12 08:19:30 -08:00
Vigneshwar Jayakumar	1188b1ff7b	AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132 ) There are additional wait states for XDL write VALU WAW hazard in gfx950 compared to gfx940.	2025-02-12 01:32:23 +07:00
Vigneshwar Jayakumar	a2263eba4d	AMDGPU: Handle gfx950 XDL-write-VGPR-VALU-Mem-Exp wait state change (#126727 )	2025-02-12 01:30:53 +07:00
Vigneshwar Jayakumar	c837f57286	AMDGPU: Handle gfx950 XDL-write-VGPR-Overlap-Src-AB wait state (#126732 ) gfx950 needs more additional waitstates from gfx940	2025-02-11 22:30:16 +07:00
Carl Ritson	a3a3e6997b	[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass (#118750 ) - Algorithm operates over whole IR to attempt to minimize waits. - Add support for VALU->VALU SGPR hazards via VA_SDST/VA_VCC.	2025-01-30 11:21:11 +09:00
Chinmay Deshpande	9ca1323de1	[AMDGPU] Fix crash due to missing check for FLAT instructions that don't use vector registers when computing VALU hazard (#123627 )	2025-01-21 05:50:58 -08:00
Brox Chen	8a0c2e7567	[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736 ) Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and fake16 flow. Since we are replacing `v_cndmask_b16` to `v_cndmask_b16_t16/fake16`, we have to at least update the fake16 codeGen to get codeGen test passing. For this case, we have to update the true16 and with fake16 together, otherwise some of the true16 tests will fail	2025-01-16 17:18:28 -05:00
Pravin Jagtap	5e007afa9d	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589 ) Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop for the above cases irrespective of opsel values. Note: We might need to add a few others into the same table.	2024-12-11 18:38:10 +05:30
Matt Arsenault	39337ff2dc	AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844 ) gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>	2024-12-02 09:23:17 -05:00
Matt Arsenault	27a8afa3fc	AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (#117287 )	2024-11-25 09:33:04 -08:00
Matt Arsenault	c3fe5ad6be	AMDGPU: Handle vcmpx+permalane gfx950 hazard (#117286 ) Confusingly, this is a different hazard to the one on gfx10 with a subtarget feature.	2024-11-25 09:27:53 -08:00
Matt Arsenault	3db4f5b0da	AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285 ) The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL cases happened to have the same number of cycles in all cases. Now the XDL consumer case has an additional state for 2 pass sources.	2024-11-25 09:23:51 -08:00
Matt Arsenault	85601fd78f	AMDGPU: Handle v_mfma_f64_16x16x4_f64 write VGPR read srca/srcb hazard change for gfx950 (#117284 ) Increase in wait states from 11 to 19. The index for smfmac counts as like srcA/srcB.	2024-11-22 20:30:06 -08:00
Matt Arsenault	db08d78c3e	AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (#117283 ) Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17 wait states.	2024-11-22 20:26:58 -08:00
Matt Arsenault	8cb6c9907c	AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (#117263 ) These have an additional wait state compared to gfx940.	2024-11-22 20:23:46 -08:00
Matt Arsenault	b078b882b9	AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (#117262 ) Increase from 11 wait states to 19	2024-11-22 20:20:23 -08:00
Juan Manuel Martinez Caamaño	d617371375	[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859 ) Instead of allocating and initializing a new instance in `GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.	2024-10-02 18:17:27 +02:00
Carl Ritson	86627149f6	[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067 ) Any SGPR read by a VALU can potentially obscure SALU writes to the same register. Insert s_wait_alu instructions to mitigate the hazard on affected paths. Compute a global cache of SGPRs with any VALU reads and use this to avoid inserting mitigation for SGPRs never accessed by VALUs. To avoid excessive search when compile time is priority implement secondary mode where all SALU writes are mitigated. Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-09-04 12:15:20 +09:00
Carl Ritson	987ffc31f8	[AMDGPU] Refactor code for GETPC bundle updates in hazards (NFCI) As suggested in review for PR #100067. Refactor code for S_GETPC_B64 bundle updates for use with multiple hazard mitigations.	2024-08-23 11:58:47 +09:00
Jeffrey Byrnes	7bcf4d63cf	[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276 ) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op" This includes the case where the second op is SDWA with same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue. Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 -- this PR adds support opsel[2] != 0 as well	2024-08-22 11:38:24 -07:00
Carl Ritson	939a6624ac	[AMDGPU] Implement workaround for GFX11.5 export priority (#99273 ) On GFX11.5 shaders having completed exports need to execute/wait at a lower priority than shaders still executing exports. Add code to maintain normal priority of 2 for shaders that export and drop to priority 0 after exports.	2024-07-23 17:03:21 +09:00
Jay Foad	aeafdc21d2	[AMDGPU] Use using instead of typedef. NFC.	2024-07-16 16:44:12 +01:00
Jay Foad	0606747c96	[AMDGPU] Remove some pointless fallthrough annotations	2024-05-01 16:04:35 +01:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Austin Kerbow	0234d90d81	[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768 ) It was shown experimentally that this may have some benefit on newer HW.	2024-03-31 10:46:05 -07:00
Matt Arsenault	a6382de399	AMDGPU: Refactor mfma hazard handling [NFC] (#84276 ) Try to make this editable by using functions for the number of wait states as a function of the number of passes. I'm assuming the current hazard test coverage is comprehensive. This could probably use another round to further simplify it. Alternatively, I believe this could all be expressed in a constant table indexed by an instruction classify function and number of passes.	2024-03-07 14:39:59 +05:30
Matt Arsenault	0f3628a937	AMDGPU: Correct cycle counts for f64 mfma on gfx940 (#83782 )	2024-03-06 09:36:01 +05:30
Ivan Kosarev	dfa1d9b027	[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772 ) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions.	2024-02-23 17:34:55 +00:00
Matt Arsenault	659ce8f665	AMDGPU: Simplify else if to else in GCNHazardRecognizer Fixes #79736	2024-01-30 08:17:04 +05:30
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Petar Avramovic	149ed9d2c5	AMDGPU: update GFX11 wmma hazards (#76143 ) One V_NOP or unrelated VALU instruction in between is required for correctness when matrix A or B of current WMMA instruction overlaps with matrix D of previous WMMA instruction. Remaining cases of WMMA operand overlaps are handled by the hardware and do not require handling in hazard recognizer. Hardware may stall in cases where: - matrix C of current WMMA instruction overlaps with matrix D of previous WMMA instruction - VALU instruction reads matrix D of previous WMMA instruction - matrix A,B or C of WMMA instruction reads result of previous VALU instruction	2024-01-24 12:00:35 +01:00
Pierre van Houtryve	42b0884238	[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125 ) Fixes #78856	2024-01-23 13:10:58 +01:00
Jay Foad	97747467f1	[AMDGPU] Update hazard recognition for new GFX12 wait counters (#78722 ) In most cases the hazards no longer apply, so just assert that we are not on GFX12.	2024-01-19 15:30:41 +00:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Jay Foad	b120dae9bb	[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628 ) Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc operand in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set appropriately depending on whether a hazard is found or not, rather than inserting an S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated. Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2024-01-11 13:20:19 +00:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Michael Maitland	85e3875ad7	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit was previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 19:21:36 -07:00
Michael Maitland	71bfec762b	Revert "[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics" This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43. Build still failing.	2023-08-24 15:37:27 -07:00
Michael Maitland	5b854f2c23	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit was previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 15:25:42 -07:00
Stephen Thomas	2dfb4b56fe	[AMDGPU] Fix incorrect hazard mitigation GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard by inserting a wait on sa_sdst==0 if such a wait isn't already present. Unfortunately, the check for an existing wait incorrectly checks for one that doesn't actually care about sa_sdst itself, but requires that no other counters are waited for. Once the check is performed correctly, a lit test needs to be updated, since it is currently testing for the incorrect behaviour. Differential Revision: https://reviews.llvm.org/D154438	2023-07-04 14:42:51 +01:00
Stephen Thomas	8aedad0fa0	[AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands Add functions AMDGPU::DepCtr::encodeField() and AMDGPU::DepCtr::decodeField() for each of vm_vsrc, va_vdst and sa_sdst. These are now used in AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with S_WAITCNT_DEPCTR operands easier and more readable. Differential Revision: https://reviews.llvm.org/D154424	2023-07-04 11:02:12 +01:00
Sergei Barannikov	aa2d0fbc30	[MC] Add MCRegisterInfo::regunits for iteration over register units Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152098	2023-06-16 05:39:50 +03:00
Jay Foad	890c76a931	[AMDGPU] Fix odd implicit operand handling in clause breaking By inspection. Because of the strange behaviour of MI.uses(), this was adding implicit defs to the clause uses set, and then wrongly detecting a conflict between explicit defs and implicit defs. For example it would detect a conflict on this pair of instructions: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4088, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1, addrspace 5) $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4092, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1 + 4, addrspace 5) Differential Revision: https://reviews.llvm.org/D150947	2023-05-19 21:24:33 +01:00
Kazu Hirata	4241d890ae	[Target] Use range-based for loops (NFC)	2023-04-15 14:14:56 -07:00
Jay Foad	a07584d57d	[CodeGen] Make more use of MachineOperand::getOperandNo. NFC. Differential Revision: https://reviews.llvm.org/D143252	2023-02-07 11:50:57 +00:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a statically mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Jay Foad	49762162ea	[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands. To avoid a profusion of similar functions with undocumented differences, this patch removes all the isLiteralConstant* variants. Callers are responsible for handling the isReg case. Differential Revision: https://reviews.llvm.org/D125759	2022-11-17 16:45:48 +00:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
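
Two entries in this log describe small API-shape changes: 7425077e31 adds `hasNamedOperand` as a readable wrapper around comparing `getNamedOperandIdx` against -1, and bee9664970 turns `OpName` into an enum class so operand names cannot be confused with plain operand indices. The sketch below illustrates both ideas on a toy table. `ToyInstrDesc`, its fields, the three operand names, and the function signatures are assumptions made for illustration only; they are not the generated AMDGPU code, where the lookup is keyed by opcode.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical operand names. An enum class keeps these distinct from
// ordinary integer operand indices (no implicit conversion either way).
enum class OpName : std::uint8_t { src0, src1, vdst, NUM_OPERAND_NAMES };

// Toy stand-in for a per-instruction table: for each operand name, the
// operand index within the instruction, or -1 if the operand is absent.
struct ToyInstrDesc {
  std::array<int, static_cast<std::size_t>(OpName::NUM_OPERAND_NAMES)> Idx;
};

// Returns the operand index for a name, or -1 when there is none.
constexpr int getNamedOperandIdx(const ToyInstrDesc &D, OpName N) {
  return D.Idx[static_cast<std::size_t>(N)];
}

// Readable wrapper: states the intent instead of comparing against -1.
constexpr bool hasNamedOperand(const ToyInstrDesc &D, OpName N) {
  return getNamedOperandIdx(D, N) != -1;
}
```

With a descriptor such as `ToyInstrDesc D{{0, -1, 2}}`, a caller would write `hasNamedOperand(D, OpName::src1)` rather than `getNamedOperandIdx(D, OpName::src1) != -1`, and passing a raw integer where an `OpName` is expected fails to compile.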


158 Commits