126 Commits

Author SHA1 Message Date
Matt Arsenault
5d650a62a3
AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596)
This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:44:47 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
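
The remark in d1cca3133a about converting the raw boolean to 1 instead of -1 can be illustrated with a small sketch. It assumes the -1 arises from a 1-bit value being read as a signed number, which the commit message does not state explicitly. Both function names are hypothetical and are not LLVM APIs.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def bool_to_enum_operand(raw: int) -> int:
    """Normalize a raw boolean (0, 1 or -1) to the enum-style value 0 or 1.

    Hypothetical helper: it only shows the normalization step, not where
    the backend actually performs it.
    """
    return 1 if raw != 0 else 0


# A set 1-bit field read as signed comes out as -1, not 1.
raw_fi = sign_extend(0b1, bits=1)
normalized_fi = bool_to_enum_operand(raw_fi)
```

Normalizing at one point keeps later comparisons against the enum value from silently failing on -1.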
Brox Chen
4cc278587f
[AMDGPU][True16][MC] VOPC profile fake16 pseudo update (#113175)
Update VOPC profile with VOP3 pseudo:

1. On GFX11+, v_cmp_class_f16 has src1 type f16 for literals, although
it is semantically interpreted as an integer. Update the VOPC class f16
profile operand types from f16, i16 to f16, f16. This patch updates the
fake16 format; the t16 format will be updated in a following patch.
2. 16-bit V_CMP_CLASS instructions (V_CMP_**_U/I/F16) are named with
`t16` but actually use 32-bit registers. Correct this by updating the
pseudo definitions with useRealTrue16/useFakeTrue16 predicates and
renaming these `t16` instructions to `fake16`.
3. Update instruction selection so that `t16`/`fake16` instructions are
selected in the true16/fake16 flow.
4. The MIR test files are affected by the renaming of these 16-bit
V_CMP instructions, but there is no functional change to the emitted code.
2024-11-22 12:12:13 -05:00
Matt Arsenault
01c9a14ccf
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
2024-11-21 08:51:58 -08:00
Matt Arsenault
201f4f6bcc
AMDGPU: Add v_mfma_ld_scale_b32 for gfx950 (#116722) 2024-11-20 10:52:38 -08:00
Matt Arsenault
b7d635ed30
AMDGPU: Copy correct predicates for SDWA reals (#116288)
There are a lot of messes in the special case
predicate handling. Currently broad let blocks
override specific predicates with more general
cases. For instructions with SDWA, the HasSDWA
predicate was overriding the SubtargetPredicate
for the instruction.

This fixes enough to properly disallow new instructions
that support SDWA on older targets.
2024-11-18 08:38:35 -08:00
Brox Chen
7b4c8b35d4
[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031)
Modify the VOP3 profile and pseudo, and add encoding info for VOP3 True16,
including DPP and DPP8, in true16 and fake16 formats.

This patch applies true16/fake16 changes and asm/dasm changes to
V_ADD_NC_U16
V_ADD_NC_I16
V_SUB_NC_U16
V_SUB_NC_I16
2024-10-16 10:27:44 -04:00
Scott Egerton
396f677514
[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769) 2024-09-24 10:58:00 +01:00
Jeffrey Byrnes
7bcf4d63cf
[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)
MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position" and "VALU op
consumes result of that op"

This includes the case where the second op is SDWA with same dest and
dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there
is an implicit read of the first op dst and the compiler needs to
resolve this hazard. Confirmed with HW team.

We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand,
so this PR checks for that.

MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and
CVT_SR_BF8_F32 with opsel[3:2] != 0 have a dest forwarding issue.
Currently, we only check for CVT_SR_FP8_F32 with opsel[3] != 0;
this PR adds a check for opsel[2] != 0 as well.
2024-08-22 11:38:24 -07:00
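
The opsel part of 7bcf4d63cf can be sketched as a bit test. This is a minimal illustration that assumes opsel is available as a 4-bit integer with opsel[0] in the least-significant bit. The function names are hypothetical and do not correspond to the hazard recognizer's actual code.

```python
OPSEL_BIT_2 = 1 << 2
OPSEL_BIT_3 = 1 << 3


def forwarding_issue_old(opsel: int) -> bool:
    """Previous behavior as described in the commit: only opsel[3] is checked."""
    return bool(opsel & OPSEL_BIT_3)


def forwarding_issue_new(opsel: int) -> bool:
    """Behavior after the commit: any set bit in opsel[3:2] counts."""
    return bool(opsel & (OPSEL_BIT_3 | OPSEL_BIT_2))


# opsel[2] set, opsel[3] clear: missed by the old check, caught by the new one.
missed_before = not forwarding_issue_old(0b0100) and forwarding_issue_new(0b0100)
```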
Acim Maravic
9398cc2ec5
[LLVM][AMDGPU] Copy isConvergent from Pseudo to Real instructions (#99658)
This patch copies the flag isConvergent from pseudo instructions to the
corresponding real instructions, so that isConvergent flag is also
defined for real instructions.

Flags are not required by the compiler, but for consistency it would be
nice to have them.

Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
2024-07-25 18:01:07 +02:00
Ivan Kosarev
47c3eca489
[AMDGPU][NFC] Make GFX*Gen records globally available. (#97291)
And use them to simplify SOP-related definitions.

Introduces GFX10Gen.
2024-07-01 16:09:56 +01:00
Scott Egerton
0a57a20aa5
[AMDGPU] NFC: Remove duplicate VOP_DPP_Pseudo TableGen definitions (#95370)
After recent changes, VOP_DPP_Pseudo now inherits from VOP_Pseudo.
This commit removes some of the duplicate definitions in
VOP_DPP_Pseudo that are exactly the same as definitions inherited from
VOP_Pseudo.
2024-06-14 15:52:28 +01:00
Scott Egerton
4a305d40a3
[AMDGPU] Exclude certain opcodes from being marked as single use (#91802)
The s_singleuse_vdst instruction is used to mark regions of instructions
that produce values that have only one use.
Certain instructions take more than one cycle to execute, resulting in
regions being incorrectly marked.
This patch excludes these multi-cycle instructions from being marked as
either producing single use values or consuming single use values
or both depending on the instruction.
2024-06-12 10:43:23 +01:00
Fabian Ritter
0821b7937c
[AMDGPU] Copy Defs and Uses from Pseudo to Real Instructions (#93004)
Currently, the tablegen files that generate the instruction definitions
in lib/Target/AMDGPU/AMDGPUGenInstrInfo.inc often only include implicit
operands for the architecture-independent pseudo instructions, but not
for the corresponding real instructions. The missing implicit operands
(most prominently: the EXEC mask) do not affect code generation, since
that operates on pseudo instructions, but they are problematic when
working with real instructions, e.g., as a decoding result from the MC
layer.

This patch copies the implicit Defs and Uses from pseudo instructions to
the corresponding real instructions, so that implicit operands are also
defined for real instructions.

Addresses issue #89830.
2024-05-31 08:40:54 +02:00
Joe Nash
fe0b7983a2
[AMDGPU] Create AMDGPUMnemonicAlias tablegen class (#89288)
AMDGPUMnemonicAlias is a MnemonicAlias that inherits from
GCNPredicateControl, so that we can set predicates on the alias the same
way as Instructions.
Use AssemblerPredicate instead of Requires on aliases

NFC.
2024-05-09 11:37:56 -04:00
Stanislav Mekhanoshin
a70ad96b3c
[AMDGPU] Fix condition in VOP3_Real_Base. NFCI. (#91373) 2024-05-07 13:45:58 -07:00
Stanislav Mekhanoshin
57216f7bd6
[AMDGPU] Support byte_sel modifier for v_cvt_f32_fp8 and v_cvt_f32_bf8 (#90887) 2024-05-02 12:03:51 -07:00
Stanislav Mekhanoshin
6e722bbe30
[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244) 2024-04-26 13:02:57 -07:00
Stanislav Mekhanoshin
ce1b6783d2
[AMDGPU] simplify VOP3_Real definitions. NFC. (#89656) 2024-04-22 14:51:11 -07:00
Joe Nash
e29228efae
[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418)
Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus

The w32 and w64 _e64_dpp assembler-only real instructions were unused
and erroneously constructed in a way that broke parsing of the new
instructions. They are removed.

This patch is a follow up to PR
https://github.com/llvm/llvm-project/pull/87382
2024-04-03 14:51:27 -04:00
Changpeng Fang
839a8fecb4
AMDGPU: Copy SubtargetPredicate from pseudo to real for dpp16 and dpp8 (#84517)
We usually expect to copy SubtargetPredicate (and OtherPredicates) from
pseudo to real. However, in dpp16 and dpp8, there are assignments like
SubtargetPredicate = HasDPP/HasDPP16/HasDPP8. These assignments override
predicates copied from pseudo, and thus the predicates used to define
pseudo get lost.

Losing predicates is a subtle issue that is usually hard to find. It may
result in instructions being generated on GPUs that do not support the
features required to generate them.
https://github.com/llvm/llvm-project/pull/84354 addressed one such
issue and inspired this work.

Fortunately, we found that the assignment of SubtargetPredicate usually
comes together with assignment of AssemblerPredicate, and with the same
value. For example:
  let AssemblerPredicate = HasDPP16;
  let SubtargetPredicate = HasDPP16;
One of them is redundant and can be removed.

In this work, we remove the redundant assignment of SubtargetPredicate,
and then copy it from pseudo for VOP*_DPP and VOP*_DPP8. With this
change, we can safely use SubtargetPredicate to define pseudo
instructions.
2024-03-08 10:30:01 -08:00
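
The predicate loss described in 839a8fecb4 can be modeled with plain dictionaries. This is only a sketch of the override behavior, not of TableGen itself. `HasSomeNewFeature` is a made-up predicate name, and `make_real` is a hypothetical stand-in for "copy from pseudo, then apply the surrounding let assignments".

```python
def make_real(pseudo: dict, let_assignments: dict) -> dict:
    """Copy predicate fields from the pseudo, then apply later assignments.

    Later assignments win, which mirrors how an explicit
    `let SubtargetPredicate = ...` hides the value copied from the pseudo.
    """
    real = dict(pseudo)
    real.update(let_assignments)
    return real


# Hypothetical pseudo that is restricted to some newer feature.
pseudo = {"SubtargetPredicate": "HasSomeNewFeature"}

# Before: both predicates are assigned for DPP16, so the pseudo's value is lost.
real_before = make_real(
    pseudo,
    {"AssemblerPredicate": "HasDPP16", "SubtargetPredicate": "HasDPP16"},
)

# After: only AssemblerPredicate is assigned, so the copied value survives.
real_after = make_real(pseudo, {"AssemblerPredicate": "HasDPP16"})
```

In this model the fix amounts to dropping the redundant assignment, which matches the commit's observation that the two assignments carried the same value.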
Stanislav Mekhanoshin
d7b73c8d01
[AMDGPU] Copy WaveSizePredicate into VOP3_Real. NFCI. (#83352) 2024-02-28 15:42:31 -08:00
Changpeng Fang
9de78c4e24
AMDGPU: Simplify FP8 conversion definitions. NFC. (#83043)
Reals should inherit predicates from the corresponding Pseudo.
2024-02-26 10:13:40 -08:00
Stanislav Mekhanoshin
3dfca24dda
[AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710)
The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match
SP3 and does not set op_sel_hi bits properly.
2024-02-23 03:50:00 -08:00
Jay Foad
3b7d43301e
[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491)
Now that there is no special checking for valid DPP encodings, these
instructions can use the same DecoderNamespace as other 64- or 96-bit
instructions.

Also clean up setting DecoderNamespace: in most cases it should be set
as a pair with AssemblerPredicate.
2024-02-22 11:18:18 +00:00
Changpeng Fang
d3fcf31031
AMDGPU: Use HasFP8ConversionInsts appropriately, NFC (#82433)
The corresponding fp8 conversion instructions are available for a
subtarget if and only if the subtarget "HasFP8ConversionInsts". We
should not assume that all future subtargets (gfx12+) have
FP8ConversionInsts.

In this patch, we use OtherPredicates to carry the HasFP8ConversionInsts
feature. This is because SubtargetPredicate is not copied from pseudos
to reals for DPP16 and DPP8. To avoid overriding OtherPredicates in a
few places, we use the newly introduced True16Predicate to hold
UseRealTrue16Insts instead.

This work replaces the inadvertently closed pull request:
https://github.com/llvm/llvm-project/pull/82024
2024-02-20 16:03:54 -08:00
Jay Foad
ddba6b271c
[AMDGPU] Stop using SDWA DecoderNamespaces. NFCI. (#82233)
64-bit SDWA encodings have to be checked first because their first 32
bits are a special case of the corresponding 32-bit non-SDWA encoding of
the same instruction. But all 64-bit encodings are checked first, so we
don't need special handling for SDWA.
2024-02-20 12:58:07 +00:00
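
The ordering argument in ddba6b271c can be sketched with a toy decoder. The tables, masks, and instruction names below are invented for illustration and are not real AMDGPU encodings. The only point is that trying the wider table first already resolves the overlap.

```python
# Toy tables of (mask, value, name). The 64-bit entry constrains only the
# first dword, and its pattern is a special case of the 32-bit entry.
TABLE_32 = [(0xFF00_0000, 0x7E00_0000, "V_FOO_e32")]
TABLE_64 = [(0xFF00_01FF, 0x7E00_00F9, "V_FOO_sdwa")]


def decode(dwords):
    """Return (name, dwords consumed), checking 64-bit encodings first."""
    if len(dwords) >= 2:
        insn64 = dwords[0] | (dwords[1] << 32)
        for mask, value, name in TABLE_64:
            if (insn64 & mask) == value:
                return name, 2
    for mask, value, name in TABLE_32:
        if (dwords[0] & mask) == value:
            return name, 1
    raise ValueError("no matching encoding")


# The first dword alone would also match V_FOO_e32, but the 64-bit table wins.
sdwa_result = decode([0x7E00_00F9, 0x0000_0000])
plain_result = decode([0x7E00_0001, 0x0000_0000])
```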
Ivan Kosarev
f122268c04
[AMDGPU][NFC] Extend PredicateControl to support True16 predicates. (#82245)
Using OtherPredicates for True16 predicates is often problematic due to
interference with other kinds of predicates, particularly when this
overrides predicates inherited from pseudo instructions.
2024-02-20 11:37:44 +00:00
Stanislav Mekhanoshin
f847c72be0
[AMDGPU] Use HasClamp instead of HasIntClamp in VOP3_Pseudo. NFC. (#82020)
There is no real reason to differentiate.
2024-02-17 00:48:37 -08:00
Konstantin Zhuravlyov
fcef407aa2
AMDGPU/NFC: Remove some bits from TSFlags (#81525)
- AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags
  - Moved to helper function in SIInstInfo
- AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags
  - This flag did not make sense / remnants of something else I think
2024-02-12 16:43:48 -05:00
Mirko Brkušanin
815e0485a4
[AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instructions (#81158) 2024-02-12 19:01:58 +01:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414)
Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
instructions that were supported on GFX940 (MI300):
- V_CVT_F32_FP8
- V_CVT_F32_BF8
- V_CVT_PK_F32_FP8
- V_CVT_PK_F32_BF8
- V_CVT_PK_FP8_F32
- V_CVT_PK_BF8_F32
- V_CVT_SR_FP8_F32
- V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Mariusz Sikora
28b7e498b6
AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892)
Encoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).

---------

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
2024-01-18 14:00:27 +01:00
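
The "v4fp8 or v4bf8 packed in i32" operand layout from 28b7e498b6 can be pictured as four 8-bit lanes in one 32-bit word. The sketch below only splits the word into raw bytes. It assumes lane 0 sits in the least-significant byte, which the commit message does not specify, and it does not decode the fp8/bf8 bit patterns. The function name is hypothetical.

```python
def unpack_v4x8(packed: int) -> list:
    """Split a 32-bit word into four raw 8-bit lanes, lane 0 first.

    Assumption: lane 0 is the least-significant byte.
    """
    packed &= 0xFFFF_FFFF
    return [(packed >> (8 * lane)) & 0xFF for lane in range(4)]


lanes = unpack_v4x8(0x0403_0201)
```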
Stanislav Mekhanoshin
8e9e4f8809
[AMDGPU] Remove VT helpers isFloatType, isPackedType, simplify isIntType (#77987) 2024-01-16 02:08:22 -08:00
Ivan Kosarev
60bb5c54f6
[AMDGPU] Fix predicates for various True16 instructions. (#77581)
Resolves AsmParser ambiguities, e.g., between
V_SUBREV_F16_t16_dpp8_gfx11 and V_SUBREV_F16_t16_dpp8_gfx12.

Part of <https://github.com/llvm/llvm-project/issues/69256>.
2024-01-10 12:58:18 +00:00
Mirko Brkušanin
569ef8ddd9
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204) 2023-12-15 10:41:05 +01:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Ivan Kosarev
fab28e0e14 Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.

Analysis of failures on buildbots with expensive checks enabled showed
that the problem was triggered by changes in another commit,
469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug
addressed in #67245.
2023-09-23 22:07:41 +01:00
Ivan Kosarev
6cb3866b1c Revert "[AMDGPU] Introduce real and keep fake True16 instructions."
This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to
failures on expensive checks.
2023-09-22 15:40:26 +01:00
Ivan Kosarev
0f864c7b8b [AMDGPU] Introduce real and keep fake True16 instructions.
The existing fake True16 instructions using 32-bit VGPRs are supposed to
co-exist with real ones until all the necessary True16 functionality is
implemented and relevant tests are updated.

Reviewed By: arsenm, Joe_Nash

Differential Revision: https://reviews.llvm.org/D156101
2023-09-22 10:57:56 +01:00
Stanislav Mekhanoshin
cfe9a134bb [AMDGPU] Rename 64BitDPP feature and fix the checks
The names '64BitDPP' and especially 'DPP64' were found to be misleading:
DPP64 is easily confused with DPP16 and DPP8, although these are
different concepts. DPP16 and DPP8 refer to lanes, whereas DPP64
refers to the operand size.

In fact the essential part here is that these instructions are
executed on the DP ALU, so rename the feature accordingly.

I have also found a bug in a check for these instructions, which is
fixed here and a common utility function is now used.

Differential Revision: https://reviews.llvm.org/D158465
2023-08-22 11:00:10 -07:00
Matt Arsenault
fb54afd1b7 AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers
This isn't always folded to fneg for a freestanding fsub depending on
the denormal mode. When matching source modifiers, we're implicitly
canonicalizing the input so we can fold it here.

Doesn't bother handling the VOP3P case since it's only relevant with
DAZ, which nobody really uses with f16.

For f64, tests show an existing bug where DAGCombiner tries to respect
the denormal mode for fsub -0, x, but not after it's lowered to fadd
-0, (fneg x). Either the fold is wrong or we shouldn't restrict the
fsub case based on the denormal mode.

https://reviews.llvm.org/D155652
2023-07-20 19:29:40 -04:00
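
The identity behind the fold in fb54afd1b7 can be checked bit for bit for the -0 case under ordinary IEEE semantics. This is a sketch only: Python floats never flush denormals, so the denormal-mode caveat from the commit message cannot be reproduced here, and NaN inputs are left out on purpose. The helper names are hypothetical.

```python
import struct


def bits(x: float) -> int:
    """Return the raw 64-bit pattern of a double, so -0.0 and 0.0 differ."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def fsub_neg_zero(x: float) -> float:
    """Model of `fsub -0.0, x`."""
    return -0.0 - x


def fneg(x: float) -> float:
    """Model of `fneg x`."""
    return -x


# Signed zeros, normal values, the smallest denormal, and infinity.
samples = [0.0, -0.0, 1.5, -2.25, 5e-324, float("inf")]
identical = all(bits(fsub_neg_zero(x)) == bits(fneg(x)) for x in samples)
```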
Jay Foad
23b0df72d2 [AMDGPU] Remove BoolToList class
Replace all:
  foreach _ = BoolToList<cond>.ret in
with:
  if cond then

Thanks to Philip Reames for D145711 which enabled this.
2023-03-13 09:22:52 +00:00
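
The replacement in 23b0df72d2 has a direct analogue in most languages: looping over a list that holds one element or none is a roundabout conditional. The Python sketch below assumes BoolToList produced a one-element list for true and an empty list for false, which is what the replacement implies. All names are hypothetical.

```python
def bool_to_list(cond: bool) -> list:
    """Analogue of the removed helper: one element if true, none if false."""
    return [True] if cond else []


def emit_old(cond: bool) -> list:
    """Old idiom: the loop body runs once or not at all."""
    defs = []
    for _ in bool_to_list(cond):
        defs.append("def")
    return defs


def emit_new(cond: bool) -> list:
    """New idiom: a plain conditional."""
    defs = []
    if cond:
        defs.append("def")
    return defs
```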
Janek van Oirschot
322966f8f8 [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp
class support and introduce GlobalISel implementation for AMDGPU

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
2022-11-28 16:00:36 -05:00
Joe Nash
b982ba2a6e [AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that
disables the use of VGPRs above 128. This patch removes the need for
that hack.

We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
operands of VOP1, VOP2, and VOPC instructions. This register class only has the
low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
VOP2, and VOPC instructions are correctly limited to use the first 128
VGPRs, while the other instructions can freely use all 256.

We introduce new pseudo-instructions used on GFX11 which have the suffix
t16 (True 16) to use the VGPR_32_Lo128 register class.

Reviewed By: foad, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D133723
2022-09-20 09:56:28 -04:00
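
The register-class split in b982ba2a6e can be sketched as a choice between two index ranges. This is a simplified model that treats a register class as a range of VGPR indices. The function name and the string encoding tags are hypothetical.

```python
VGPR_32 = range(256)        # all 256 VGPRs
VGPR_32_LO128 = range(128)  # only the low 128 VGPRs

# Encodings whose 16-bit operands are limited to the low 128 VGPRs.
LO128_ENCODINGS = {"VOP1", "VOP2", "VOPC"}


def allowed_vgprs(encoding: str, is_16bit_operand: bool) -> range:
    """Pick the register class for an operand (hypothetical helper)."""
    if is_16bit_operand and encoding in LO128_ENCODINGS:
        return VGPR_32_LO128
    return VGPR_32


v200_ok_in_vop2 = 200 in allowed_vgprs("VOP2", is_16bit_operand=True)
v200_ok_in_vop3 = 200 in allowed_vgprs("VOP3", is_16bit_operand=True)
```

Expressing the limit as a register class lets the register allocator enforce it directly, so no separate hack is needed.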
Petar Avramovic
8de1f04c77 [AMDGPU] gfx11 Fix VOP3 dot instructions
Fix src modifiers for operands with bf16 type.
op_sel[0:1] are ignored.

Differential Revision: https://reviews.llvm.org/D129084
2022-07-22 11:43:35 +02:00
Dmitry Preobrazhensky
2a6532d542 [AMDGPU][MC][GFX11] Correct disassembly of *_e64_dpp opcodes which support op_sel
These opcodes cannot be disassembled because op_sel operand is missing - it must be added manually.
See https://github.com/llvm/llvm-project/issues/56512 for detailed issue analysis.

Differential Revision: https://reviews.llvm.org/D129637
2022-07-15 13:11:59 +03:00
Jay Foad
4dbc2876cf [AMDGPU] GFX11 trivial NFC tweaks
A few miscellaneous comment, whitespace and indentation tweaks.
2022-07-05 17:20:17 +01:00
Joe Nash
0483c91eee [AMDGPU] gfx11 CodeGen for new DPP instructions
Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP
instructions in gfx11, namely VOP3-encoded instructions with DPP and VOPC
with DPP.

Depends on D128656

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128682
2022-07-05 10:17:59 -04:00