llvm-project

Author	SHA1	Message	Date
Joe Nash	e29228efae	[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418) Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on GFX1150Plus. The w32 and w64 _e64_dpp assembler-only real instructions were unused, and erroneously constructed in a way that bugged parsing of the new instructions. They are removed. This patch is a follow-up to PR https://github.com/llvm/llvm-project/pull/87382	2024-04-03 14:51:27 -04:00
Joe Nash	6a13bbf92f	[AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excluding VOPC. (#87382) Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions should allow sgpr and imm operands. PR #67461 added support for this with int operands, but it was missing a piece for float. Changing VOPC e64_dpp will be in a different patch because there is a bug preventing that change.	2024-04-03 11:34:12 -04:00
Joe Nash	44278f2326	[AMDGPU][MC] Fix GFX12 check line typo and move test NFC. Fix CHECK lines that seem to have a copy paste error. Move the test that was formerly in gfx12_dasm_vinterp.txt (see #85949).	2024-03-21 10:46:07 -04:00
Joe Nash	d1f182c895	[AMDGPU][MC][True16] Rename and combine VINTERP MC tests (#85949 ) NFC. gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the assembler and disassembler tests to be sorted based on real16 or fake16 instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16 (fake16 in encoding, but not by name) upstream, so that is why the test files have a -fake16 suffix. One test input is changed, and that is the disassembler test for unsupported bits in the instruction. It is now an input that is valid on both GFX11 and GFX12. This was necessary because the size of the opcode field changed.	2024-03-21 10:42:39 -04:00
Stanislav Mekhanoshin	0b0e52836d	[AMDGPU] Fix GFX11 sendmsg codes (#85299 ) The code MSG_RTN_GET_TBA_TO_PC was missing, and the next code is off by 1 as a result.	2024-03-15 09:46:58 -07:00
Jay Foad	36dece0013	[AMDGPU] Add missing GFX10 buffer format d16 hi instructions (#84809 )	2024-03-12 08:20:08 +00:00
Jay Foad	212604698c	[AMDGPU] Add missing tests for GFX10 (t)buffer format d16 instructions (#84789 )	2024-03-11 18:25:49 +00:00
Joe Nash	3b1512c477	[AMDGPU] Make gfx11 vop2 disassembler tests use strict-whitespace NFC. Adds -strict-whitespace to RUN lines and adjusts CHECK line space padding accordingly. See also (#84078)	2024-03-06 16:07:30 -05:00
Joe Nash	f448b8ec03	[AMDGPU] Make gfx11 vop1 disassembler tests use strict-whitespace (#84078) NFC. The whitespace needs to be consistently formatted in some manner. Might as well use -strict-whitespace as the standard. Adds -strict-whitespace to RUN lines and adjusts CHECK line space padding accordingly. Also test REAL16 and FAKE16 CHECK lines with wave64.	2024-03-06 10:11:10 -05:00
Ivan Kosarev	a888f5e4d7	[AMDGPU][NFC] Update tests to use -triple= instead of -arch=. (#84153 )	2024-03-06 12:44:19 +00:00
Stanislav Mekhanoshin	3dfca24dda	[AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710 ) The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match SP3 and does not set op_sel_hi bits properly.	2024-02-23 03:50:00 -08:00
Stanislav Mekhanoshin	98db8d0cb7	[AMDGPU] Fix v_dot2_f16_f16/v_dot2_bf16_bf16 operands (#82423 ) src0 and src1 are packed f16/bf16, we are printing literals like 0x40002000, but we cannot parse it.	2024-02-20 16:34:40 -08:00
Shilei Tian	2ad43fa467	[AMDGPU] Fix operand types for `V_DOT2_F32_BF16` (#82044 )	2024-02-20 08:25:01 -05:00
Stanislav Mekhanoshin	13e64958a0	[AMDGPU] Fix decoder for BF16 inline constants (#82276 ) Fix #82039.	2024-02-19 13:45:23 -08:00
Ivan Kosarev	0ec524b120	[AMDGPU][MC][True16] Support V_RCP/SQRT/RSQ/LOG/EXP_F16. (#81131) Also add missing v_ceil/floor_f16 tests. Includes https://github.com/llvm/llvm-project/pull/80892.	2024-02-19 15:50:48 +00:00
Shilei Tian	46734aa1e5	[AMDGPU] Use `bf16` instead of `i16` for bfloat (#80908 ) Currently we generally use `i16` to represent `bf16` in those tablegen files. This patch is trying to use `bf16` directly. Fix #79369.	2024-02-16 15:58:30 -05:00
Jay Foad	cb8f910035	[AMDGPU] Do not test both wave sizes for DSDIR disassembly (#81719 ) There is nothing in these instruction definitions that depends on wave size so testing both seems like overkill. The corresponding assembler tests do not do it.	2024-02-14 10:15:06 +00:00
Konstantin Zhuravlyov	cf55e61dd9	AMDGPU: Don't allow s_barrier on gfx12 (#81317 ) - s_barrier is not present on gfx12	2024-02-12 11:32:46 -05:00
Ivan Kosarev	7d19dc50de	[AMDGPU][True16] Support VOP3 source DPP operands. (#80892 )	2024-02-08 16:23:00 +00:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414) Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Ivan Kosarev	5a458767dd	[AMDGPU][True16] Support source DPP operands. (#79025 )	2024-01-23 09:52:49 +00:00
Stanislav Mekhanoshin	1000cefc04	[AMDGPU] Remove s_set_inst_prefetch_distance support from GFX12 (#78786 ) This instruction is not supported by GFX12.	2024-01-22 14:31:17 -08:00
Mariusz Sikora	2c78f3b860	[AMDGPU][GFX12] Add tests for flat_atomic_pk (#78683 )	2024-01-19 12:08:17 +01:00
Piotr Sobczak	57f6a3f7ea	[AMDGPU] Add global_load_tr for GFX12 (#77772 ) Support new amdgcn_global_load_tr instructions for load with transpose. * MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128 * Intrinsic int_amdgcn_global_load_tr * Clang builtins amdgcn_global_load_tr*	2024-01-18 15:14:42 +01:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	28b7e498b6	AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892) Encoding is VOP3P. Tagged as deep/machine learning instructions. i32 type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32 fneg(neg_lo[2]) and f32 fabs(neg_hi[2]). --------- Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>	2024-01-18 14:00:27 +01:00
Ivan Kosarev	2a869ced61	[AMDGPU][True16] Support V_FLOOR_F16. (#78446 )	2024-01-18 08:43:47 +00:00
Mariusz Sikora	c99da46fc1	[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2024-01-17 19:23:42 +01:00
Jay Foad	e4c8c58517	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (#77929 )	2024-01-17 15:57:36 +00:00
Shilei Tian	d63c2e52e6	[AMDGPU][MC] Remove incorrect `_e32` suffix from `v_dot2c_f32_f16` and `v_dot4c_i32_i8` (#77993 ) The two VOP2 instructions cannot be encoded as VOP3. Fix #54691.	2024-01-15 23:11:50 -05:00
Mirko Brkušanin	3867e6689e	[AMDGPU] Add new GFX12 image atomic float instructions (#76946 )	2024-01-11 17:28:04 +01:00
Ivan Kosarev	084f1c2ee0	[AMDGPU][True16] Support V_CEIL_F16. (#73108) As not all fake instructions have their real counterparts implemented yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow both fake and real True16 instructions in assembler and disassembler tests in the -mattr=+real-true16 mode during the transition period. Source DPP and destination VOPDstOperand_t16 operands are still not supported and will be addressed separately.	2024-01-10 08:46:19 +00:00
Jay Foad	b59b8d4182	[AMDGPU] Add GFX12 S_WAIT_* instructions (#77336 ) GFX12 has separate wait instructions per counter e.g. S_WAIT_LOADCNT. S_WAITCNT still exists but is deprecated and codegen should stop using it. S_WAITCNT_* (e.g. S_WAITCNT_VSCNT) are removed. This patch adds/removes MC layer support for these instructions.	2024-01-09 09:05:48 +00:00
Mirko Brkušanin	7ca4473dd9	[AMDGPU] Add new cache flushing instructions for GFX12 (#76944 ) Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>	2024-01-08 14:06:58 +00:00
Jay Foad	59f3b7202d	[AMDGPU] Add GFX12 8- and 16-bit SMEM loads (#76966)	2024-01-05 08:19:50 +00:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Mirko Brkušanin	82e33d6203	[AMDGPU] Add VDSDIR instructions for GFX12 (#75197 )	2024-01-03 16:32:00 +01:00
Jay Foad	cf025c767e	[AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (#76149 )	2024-01-02 13:02:20 +00:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Mirko Brkušanin	c1a6974d6b	[AMDGPU][MC] Add GFX12 SMEM encoding (#75215 )	2023-12-15 09:00:54 +01:00
Jay Foad	3e6da3252f	[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499 )	2023-12-14 21:11:39 +00:00
Mirko Brkušanin	47615ddc84	[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193 )	2023-12-14 14:22:04 +01:00
Mirko Brkušanin	ac406b4817	[AMDGPU][MC] Add GFX12 VBUFFER encoding (#75195 )	2023-12-14 12:58:18 +01:00
Mirko Brkušanin	16c27bcdde	[AMDGPU][MC] Add GFX12 VDS encoding (#75316 )	2023-12-14 11:04:21 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853 ) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00

412 Commits