151 Commits

Author SHA1 Message Date
Pravin Jagtap
5e007afa9d
[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)
Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when
partially writing to high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need nop
for above cases irrespective of opsel values.

Note: We might need to add few others into the same table.
2024-12-11 18:38:10 +05:30
Matt Arsenault
39337ff2dc
AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)
gfx950 SP changes doc says:
No 4 clk forwarding on opcodes that convert from
F32/F16->F8 or F32/F16->F4. Must insert a NOP or
instruction writing some other destination VREG
after a conversion to F4/F8 since it writes either
low/high half or bytes.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
2024-12-02 09:23:17 -05:00
Matt Arsenault
27a8afa3fc
AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (#117287) 2024-11-25 09:33:04 -08:00
Matt Arsenault
c3fe5ad6be
AMDGPU: Handle vcmpx+permalane gfx950 hazard (#117286)
Confusingly, this is a different hazard to the one on gfx10
with a subtarget feature.
2024-11-25 09:27:53 -08:00
Matt Arsenault
3db4f5b0da
AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285)
The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case
was 1 wait state overly conservative. Previously, for gfx940,
the XDL/non-XDL cases happened to have the same number of cycles
in all cases. Now the XDL consumer case has an additional state for
2 pass sources.
2024-11-25 09:23:51 -08:00
Matt Arsenault
85601fd78f
AMDGPU: Handle v_mfma_f64_16x16x4_f64 write VGPR read srca/srcb hazard change for gfx950 (#117284)
Increase in wait states from 11 to 19. The index for smfmac counts as like srcA/srcB.
2024-11-22 20:30:06 -08:00
Matt Arsenault
db08d78c3e
AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (#117283)
Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17
wait states.
2024-11-22 20:26:58 -08:00
Matt Arsenault
8cb6c9907c
AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (#117263)
These have an additional wait state compared to gfx940.
2024-11-22 20:23:46 -08:00
Matt Arsenault
b078b882b9
AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (#117262)
Increase from 11 wait states to 19
2024-11-22 20:20:23 -08:00
Juan Manuel Martinez Caamaño
d617371375
[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859)
Instead of allocating an initializing a new instance in
`GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.
2024-10-02 18:17:27 +02:00
Carl Ritson
86627149f6
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.

Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.

To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.

Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-09-04 12:15:20 +09:00
Carl Ritson
987ffc31f8 [AMDGPU] Refactor code for GETPC bundle updates in hazards (NFCI)
As suggested in review for PR #100067.
Refactor code for S_GETPC_B64 bundle updates for use with multiple
hazard mitigations.
2024-08-23 11:58:47 +09:00
Jeffrey Byrnes
7bcf4d63cf
[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)
MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position" and "VALU op
consumes result of that op"

This includes the case where the second op is SDWA with same dest and
dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there
is an implicit read of the first op dst and the compiler needs to
resolve this hazard. Confirmed with HW team.

We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand,
so this PR checks for that.

MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and
CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue.
Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 --
this PR adds support opsel[2] != 0 as well
2024-08-22 11:38:24 -07:00
Carl Ritson
939a6624ac
[AMDGPU] Implement workaround for GFX11.5 export priority (#99273)
On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add code to maintain normal priority of 2 for shaders that export and
drop to priority 0 after exports.
2024-07-23 17:03:21 +09:00
Jay Foad
aeafdc21d2 [AMDGPU] Use using instead of typedef. NFC. 2024-07-16 16:44:12 +01:00
Jay Foad
0606747c96 [AMDGPU] Remove some pointless fallthrough annotations 2024-05-01 16:04:35 +01:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Austin Kerbow
0234d90d81
[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768)
It was shown experimentally that this may have some benefit on newer HW.
2024-03-31 10:46:05 -07:00
Matt Arsenault
a6382de399
AMDGPU: Refactor mfma hazard handling [NFC] (#84276)
Try to make this editable by using functions for the number of wait
states as a function of the number of passes. I'm assuming the current
hazard test coverage is comprehensive.

This could probably use another round to further simplify it.
Alternatively, I believe this could all be expressed in a constant table
indexed by an instruction classify function and number of passes.
2024-03-07 14:39:59 +05:30
Matt Arsenault
0f3628a937
AMDGPU: Correct cycle counts for f64 mfma on gfx940 (#83782) 2024-03-06 09:36:01 +05:30
Ivan Kosarev
dfa1d9b027
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined constants
and functions.
2024-02-23 17:34:55 +00:00
Matt Arsenault
659ce8f665 AMDGPU: Simplify else if to else in GCNHazardRecognizer
Fixes #79736
2024-01-30 08:17:04 +05:30
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Petar Avramovic
149ed9d2c5
AMDGPU: update GFX11 wmma hazards (#76143)
One V_NOP or unrelated VALU instruction in between is required for
correctness when matrix A or B of current WMMA instruction overlaps with
matrix D of previous WMMA instruction.
Remaining cases of WMMA operand overlaps are handled by the hardware and
do not require handling in hazard recognizer.

Hardware may stall in cases where:
- matrix C of current WMMA instruction overlaps with matrix D of
previous WMMA instruction
- VALU instruction reads matrix D of previous WMMA instruction
- matrix A,B or C of WMMA instruction reads result of previous VALU
instruction
2024-01-24 12:00:35 +01:00
Pierre van Houtryve
42b0884238
[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125)
Fixes #78856
2024-01-23 13:10:58 +01:00
Jay Foad
97747467f1
[AMDGPU] Update hazard recognition for new GFX12 wait counters (#78722)
In most cases the hazards no longer apply, so just assert that we are
not on GFX12.
2024-01-19 15:30:41 +00:00
Jay Foad
ba52f06f9d
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-18 10:47:45 +00:00
Jay Foad
b120dae9bb
[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628)
Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc
operand
in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set
appropriately
depending on whether a hazard is found or not, rather than inserting an
S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated.

Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>
2024-01-11 13:20:19 +00:00
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Michael Maitland
85e3875ad7 [TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics
D150312 added a TODO:

TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.

This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.

This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.

Differential Revision: https://reviews.llvm.org/D158568
2023-08-24 19:21:36 -07:00
Michael Maitland
71bfec762b Revert "[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics"
This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43.

Build still failing.
2023-08-24 15:37:27 -07:00
Michael Maitland
5b854f2c23 [TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics
D150312 added a TODO:

TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.

This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.

This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.

Differential Revision: https://reviews.llvm.org/D158568
2023-08-24 15:25:42 -07:00
Stephen Thomas
2dfb4b56fe [AMDGPU] Fix incorrect hazard mitigation
GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard
by inserting a wait on sa_sdst==0 if such a wait isn't already present.
Unfortunately, the check for an existing wait incorrectly checks for one
that doesn't actually care about sa_sdst itself, but requires that no
other counters are waited for.

Once the check is performed correctly, a lit test needs to be updated,
since it is currently testing for the incorrect behaviour.

Differential Revision: https://reviews.llvm.org/D154438
2023-07-04 14:42:51 +01:00
Stephen Thomas
8aedad0fa0 [AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*()
for each of vm_vsrc, va_vdst and sa_sdst. These are now used in
AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with
S_WAITCNT_DEPCTR operands easier and more readable.

Differential Revision: https://reviews.llvm.org/D154424
2023-07-04 11:02:12 +01:00
Sergei Barannikov
aa2d0fbc30 [MC] Add MCRegisterInfo::regunits for iteration over register units
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152098
2023-06-16 05:39:50 +03:00
Jay Foad
890c76a931 [AMDGPU] Fix odd implicit operand handling in clause breaking
By inspection. Because of the strange behaviour of MI.uses(), this was
adding implicit defs to the clause *uses* set, and then wrongly
detecting a conflict between explicit defs and implicit defs.

For example it would detect a conflict on this pair of instructions:

   $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4088, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1, addrspace 5)
   $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4092, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1 + 4, addrspace 5)

Differential Revision: https://reviews.llvm.org/D150947
2023-05-19 21:24:33 +01:00
Kazu Hirata
4241d890ae [Target] Use range-based for loops (NFC) 2023-04-15 14:14:56 -07:00
Jay Foad
a07584d57d [CodeGen] Make more use of MachineOperand::getOperandNo. NFC.
Differential Revision: https://reviews.llvm.org/D143252
2023-02-07 11:50:57 +00:00
Archibald Elliott
8e3d7cf5de [NFC][TargetParser] Remove llvm/Support/TargetParser.h 2023-02-07 11:08:21 +00:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Carl Ritson
5bc703f755 [AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass
Accelerate finding the base class for a physical register by
building a statically mapping table from physical registers
to base classes using TableGen.

Replace uses of SIRegisterInfo::getPhysRegClass with
TargetRegisterInfo::getPhysRegBaseClass in order to use
the computed table.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D139422
2022-12-20 16:22:14 +09:00
Jay Foad
49762162ea [AMDGPU] Remove isLiteralConstant and isLiteralConstantLike
isLiteralConstant and isLiteralConstantLike were similar to
!isInlineConstant with slight differences like handling isReg operands.

To avoid a profusion of similar functions with undocumented differences,
this patch removes all the isLiteralConstant* variants. Callers are responsible
for handling the isReg case.

Differential Revision: https://reviews.llvm.org/D125759
2022-11-17 16:45:48 +00:00
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Stephen Thomas
c8a90316fa [AMDGPU] Small cleanups in wait counter code
A small number of cleanups and refactors intended to enhance readability in
two passes that deal with s_waitcnt instructions.

Differential Revision: https://reviews.llvm.org/D136677
2022-10-28 11:05:02 +01:00
Jay Foad
9bb1e21f07 [AMDGPU] Clean up calls to MachineOperand::setIsDead and friends. NFC. 2022-10-28 10:44:08 +01:00
Matt Arsenault
575eed3dac AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs
If inline asm has a VGPR def, it must have come from a VGPR write
somewhere inside the asm. This should be further extended to all
read after write hazards.
2022-10-12 17:25:24 -07:00
Carl Ritson
a35013bec6 [AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the
same SGPR can cause incorrect execution of subsequent SALU reads
of the SGPR.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D134151
2022-10-01 16:21:24 +09:00
Jay Foad
f19cc793d2 [AMDGPU] Disable fp atomic to s_denorm_mode hazard for GFX11
This hazard only exists on GFX10.

Differential Revision: https://reviews.llvm.org/D134276
2022-09-20 17:40:49 +01:00
Stanislav Mekhanoshin
fb28bf3fb4 [AMDGPU] Fix liveness verifier error in hazard recognizer
After D133067 we are inserting swaps to use a new physical
register. I have noticed verifier errors about undefined
physical register uses if we are tracking liveness post RA.

We have no access to LIS at this point, so mark new register
uses as undef to calm down the verifier. Liveness should not
matter at this point anyway.

Note the description of the RegState::Undef: "Value of the
register doesn't matter." I.e. it does not say it is strictly
undefined. In fact that is what we really need: this value
does not matter.

I also had to modify the test a bit since with tracking enabled
it does not pass verification even before the recognizer.

Differential Revision: https://reviews.llvm.org/D133459
2022-09-07 16:30:36 -07:00
Stanislav Mekhanoshin
95d497ff2a [AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
In this case gfx90a uses v0 instead of the correct register. Swap
the value temporarily with a lower register and then swap it back.

Unfortunately hazard recognizer works after wait count insertion,
so we cannot simply reuse an arbitrary register, hence w/a also
includes a full waitcount. This can be avoided if we run it from
expandPostRAPseudo, but that is a complete misplacement.

Differential Revision: https://reviews.llvm.org/D133067
2022-09-07 14:23:49 -07:00