412 Commits

Author SHA1 Message Date
Joe Nash
e29228efae
[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418)
Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus

The w32 and w64 _e64_dpp assembler-only real instructions were unused,
and were erroneously constructed in a way that broke parsing of the new
instructions. They are removed.

This patch is a follow up to PR
https://github.com/llvm/llvm-project/pull/87382
2024-04-03 14:51:27 -04:00
Joe Nash
6a13bbf92f
[AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excluding VOPC. (#87382)

Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
2024-04-03 11:34:12 -04:00
Joe Nash
44278f2326 [AMDGPU][MC] Fix GFX12 check line typo and move test
NFC.
Fix CHECK lines that seem to have a copy paste error.
Move the test that was formerly in gfx12_dasm_vinterp.txt (see #85949).
2024-03-21 10:46:07 -04:00
Joe Nash
d1f182c895
[AMDGPU][MC][True16] Rename and combine VINTERP MC tests (#85949)
NFC.
gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the
assembler and disassembler tests to be sorted based on real16 or fake16
instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16
(fake16 in encoding, but not by name) upstream, so that is why the test
files have a -fake16 suffix.

One test input is changed, and that is the disassembler test for
unsupported bits in the instruction. It is now an input that is valid on
both GFX11 and GFX12. This was necessary because the size of the opcode
field changed.
2024-03-21 10:42:39 -04:00
Stanislav Mekhanoshin
0b0e52836d
[AMDGPU] Fix GFX11 sendmsg codes (#85299)
The code MSG_RTN_GET_TBA_TO_PC was missing, and the next code is off by
1 as a result.
2024-03-15 09:46:58 -07:00
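A minimal sketch of the failure mode described in the sendmsg fix above (#85299): when codes are numbered consecutively, leaving one name out shifts every later code down by one. Only MSG_RTN_GET_TBA_TO_PC comes from the commit message; the neighbouring names and the base value 100 are hypothetical placeholders, not the real GFX11 encodings.

```python
def number_codes(names, base):
    """Assign consecutive integer codes to names, starting at base."""
    return {name: base + i for i, name in enumerate(names)}


# Hypothetical table: MSG_BEFORE, MSG_AFTER and the base value are
# placeholders. Only MSG_RTN_GET_TBA_TO_PC is named in the commit.
COMPLETE = ["MSG_BEFORE", "MSG_RTN_GET_TBA_TO_PC", "MSG_AFTER"]
BUGGY = [n for n in COMPLETE if n != "MSG_RTN_GET_TBA_TO_PC"]

fixed = number_codes(COMPLETE, base=100)
buggy = number_codes(BUGGY, base=100)

# Every code after the missing entry is one lower in the buggy table.
shift = fixed["MSG_AFTER"] - buggy["MSG_AFTER"]
print(f"MSG_AFTER is off by {shift} in the buggy table")
```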
Jay Foad
36dece0013
[AMDGPU] Add missing GFX10 buffer format d16 hi instructions (#84809) 2024-03-12 08:20:08 +00:00
Jay Foad
212604698c
[AMDGPU] Add missing tests for GFX10 (t)buffer format d16 instructions (#84789) 2024-03-11 18:25:49 +00:00
Joe Nash
3b1512c477 [AMDGPU] Make gfx11 vop2 disassembler tests use strict-whitespace
NFC.
Adds -strict-whitespace to RUN lines and adjusts CHECK line space
padding accordingly.

See also (#84078)
2024-03-06 16:07:30 -05:00
Joe Nash
f448b8ec03
[AMDGPU] Make gfx11 vop1 disassembler tests use strict-whitespace (#84078)
NFC.
The whitespace needs to be consistently formatted in some manner. Might
as well use -strict-whitespace as the standard.
Adds -strict-whitespace to RUN lines and adjusts CHECK line space padding
accordingly.

Also test REAL16 and FAKE16 CHECK lines with wave64.
2024-03-06 10:11:10 -05:00
Ivan Kosarev
a888f5e4d7
[AMDGPU][NFC] Update tests to use -triple= instead of -arch=. (#84153) 2024-03-06 12:44:19 +00:00
Stanislav Mekhanoshin
3dfca24dda
[AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710)
The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match
SP3 and does not set op_sel_hi bits properly.
2024-02-23 03:50:00 -08:00
Stanislav Mekhanoshin
98db8d0cb7
[AMDGPU] Fix v_dot2_f16_f16/v_dot2_bf16_bf16 operands (#82423)
src0 and src1 are packed f16/bf16. We print literals like 0x40002000,
but we cannot parse them.
2024-02-20 16:34:40 -08:00
Shilei Tian
2ad43fa467
[AMDGPU] Fix operand types for V_DOT2_F32_BF16 (#82044) 2024-02-20 08:25:01 -05:00
Stanislav Mekhanoshin
13e64958a0
[AMDGPU] Fix decoder for BF16 inline constants (#82276)
Fix #82039.
2024-02-19 13:45:23 -08:00
Ivan Kosarev
0ec524b120
[AMDGPU][MC][True16] Support V_RCP/SQRT/RSQ/LOG/EXP_F16. (#81131)
Also add missing v_ceil/floor_f16 tests.

Includes https://github.com/llvm/llvm-project/pull/80892.
2024-02-19 15:50:48 +00:00
Shilei Tian
46734aa1e5
[AMDGPU] Use bf16 instead of i16 for bfloat (#80908)
Currently we generally use `i16` to represent `bf16` in those tablegen
files. This patch is trying to use `bf16` directly.

Fix #79369.
2024-02-16 15:58:30 -05:00
Jay Foad
cb8f910035
[AMDGPU] Do not test both wave sizes for DSDIR disassembly (#81719)
There is nothing in these instruction definitions that depends on wave
size so testing both seems like overkill. The corresponding assembler
tests do not do it.
2024-02-14 10:15:06 +00:00
Konstantin Zhuravlyov
cf55e61dd9
AMDGPU: Don't allow s_barrier on gfx12 (#81317)
- s_barrier is not present on gfx12
2024-02-12 11:32:46 -05:00
Ivan Kosarev
7d19dc50de
[AMDGPU][True16] Support VOP3 source DPP operands. (#80892) 2024-02-08 16:23:00 +00:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414)

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Ivan Kosarev
5a458767dd
[AMDGPU][True16] Support source DPP operands. (#79025) 2024-01-23 09:52:49 +00:00
Stanislav Mekhanoshin
1000cefc04
[AMDGPU] Remove s_set_inst_prefetch_distance support from GFX12 (#78786)
This instruction is not supported by GFX12.
2024-01-22 14:31:17 -08:00
Mariusz Sikora
2c78f3b860
[AMDGPU][GFX12] Add tests for flat_atomic_pk (#78683) 2024-01-19 12:08:17 +01:00
Piotr Sobczak
57f6a3f7ea
[AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.

* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
2024-01-18 15:14:42 +01:00
Mariusz Sikora
3e6589f21c
[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917)
- image_atomic_pk_add_f16
- image_atomic_pk_add_bf16
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16
- buffer_atomic_pk_add_bf16
2024-01-18 14:01:09 +01:00
Mariusz Sikora
28b7e498b6
AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892)
Encoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).

---------

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
2024-01-18 14:00:27 +01:00
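To illustrate the "v4fp8 or v4bf8 packed in i32" operand type mentioned in the dot4 commit above (#77892), here is a small sketch that packs four raw 8-bit patterns into one 32-bit word. The function names are hypothetical, and placing element 0 in the lowest byte is an assumption made for illustration; the commit message does not state the lane order. No fp8/bf8 value conversion is attempted.

```python
def pack_v4x8(elems):
    """Pack four raw 8-bit patterns into one 32-bit word.

    Assumption: element 0 occupies the least significant byte.
    """
    if len(elems) != 4 or any(not 0 <= e <= 0xFF for e in elems):
        raise ValueError("expected four values in the range 0..255")
    return int.from_bytes(bytes(elems), "little")


def unpack_v4x8(word):
    """Split a 32-bit word back into its four 8-bit patterns."""
    return list((word & 0xFFFFFFFF).to_bytes(4, "little"))


word = pack_v4x8([0x01, 0x02, 0x03, 0x04])
print(hex(word))
```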
Ivan Kosarev
2a869ced61
[AMDGPU][True16] Support V_FLOOR_F16. (#78446) 2024-01-18 08:43:47 +00:00
Mariusz Sikora
c99da46fc1
[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2024-01-17 19:23:42 +01:00
Jay Foad
e4c8c58517
[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (#77929) 2024-01-17 15:57:36 +00:00
Shilei Tian
d63c2e52e6
[AMDGPU][MC] Remove incorrect _e32 suffix from v_dot2c_f32_f16 and v_dot4c_i32_i8 (#77993)
The two VOP2 instructions cannot be encoded as VOP3.

Fix #54691.
2024-01-15 23:11:50 -05:00
Mirko Brkušanin
3867e6689e
[AMDGPU] Add new GFX12 image atomic float instructions (#76946) 2024-01-11 17:28:04 +01:00
Ivan Kosarev
084f1c2ee0
[AMDGPU][True16] Support V_CEIL_F16. (#73108)
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.

Source DPP and destination VOPDstOperand_t16 operands are still not
supported and will be addressed separately.
2024-01-10 08:46:19 +00:00
Jay Foad
b59b8d4182
[AMDGPU] Add GFX12 S_WAIT_* instructions (#77336)
GFX12 has separate wait instructions per counter e.g. S_WAIT_LOADCNT.
S_WAITCNT still exists but is deprecated and codegen should stop using
it. S_WAITCNT_* (e.g. S_WAITCNT_VSCNT) are removed.

This patch adds/removes MC layer support for these instructions.
2024-01-09 09:05:48 +00:00
Mirko Brkušanin
7ca4473dd9
[AMDGPU] Add new cache flushing instructions for GFX12 (#76944)
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
2024-01-08 14:06:58 +00:00
Jay Foad
59f3b7202d
[AMDGPU] Add GFX12 8- and 16-bit SMEM loads (#76966) 2024-01-05 08:19:50 +00:00
Nicolai Hähnle
49b492048a
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and led to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.

Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.

Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.

Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.

Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable the use of opsel on src2 as a future optimization.

A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.

Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2024-01-04 00:10:15 +01:00
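The idea in the packed 16-bit commit above (#76522), treating a packed pair as a single 32-bit value and then trying permutations of the halves, can be sketched as follows. This is an illustration under stated assumptions, not the SIFoldOperands logic: the helper names are hypothetical, only the integer inline range (assumed here to be -16..64) is checked, and float inline constants are ignored.

```python
def pack_v2x16(lo, hi):
    """Combine two 16-bit halves into one 32-bit value (lo in bits 0..15)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def is_int_inline(value32):
    """Assumed integer inline-constant range: -16..64 on the 32-bit value."""
    return -16 <= as_signed32(value32) <= 64


def swap_halves(value32):
    """One permutation of the halves: exchange low and high 16 bits."""
    return ((value32 & 0xFFFF) << 16) | ((value32 >> 16) & 0xFFFF)


# Non-splat (1, 0) is just the 32-bit value 1, which is in range.
non_splat = pack_v2x16(1, 0)
# (0, 1) is 0x00010000: out of range as-is, in range after swapping halves.
swapped_case = pack_v2x16(0, 1)
print(is_int_inline(non_splat), is_int_inline(swapped_case),
      is_int_inline(swap_halves(swapped_case)))
```

Checking the whole 32-bit word, rather than each half separately, is what makes a non-splat pair such as (1, 0) unambiguous in this sketch.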
Mirko Brkušanin
82e33d6203
[AMDGPU] Add VDSDIR instructions for GFX12 (#75197) 2024-01-03 16:32:00 +01:00
Jay Foad
cf025c767e
[AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (#76149) 2024-01-02 13:02:20 +00:00
Jay Foad
8fdfd34cd2
[AMDGPU] Remove GDS and GWS for GFX12 (#76148) 2023-12-21 15:27:08 +00:00
Mirko Brkušanin
569ef8ddd9
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204) 2023-12-15 10:41:05 +01:00
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Mirko Brkušanin
c1a6974d6b
[AMDGPU][MC] Add GFX12 SMEM encoding (#75215) 2023-12-15 09:00:54 +01:00
Jay Foad
3e6da3252f
[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499) 2023-12-14 21:11:39 +00:00
Mirko Brkušanin
47615ddc84
[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193) 2023-12-14 14:22:04 +01:00
Mirko Brkušanin
ac406b4817
[AMDGPU][MC] Add GFX12 VBUFFER encoding (#75195) 2023-12-14 12:58:18 +01:00
Mirko Brkušanin
16c27bcdde
[AMDGPU][MC] Add GFX12 VDS encoding (#75316) 2023-12-14 11:04:21 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00