87 Commits

Author SHA1 Message Date
zGoldthorpe
d7074b63ed
[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216)
This patch exposes builtins for atomic `add`, `max`, and `min` operations that
operate over buffer resource pointers.
2025-08-05 11:04:15 -06:00
Stanislav Mekhanoshin
0988510ad4
[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773) 2025-08-01 20:12:35 -07:00
Piotr Sobczak
ae6a9ce000
[AMDGPU] Update tests (#151688)
Fix two minor issues:
- Add double quote
- Remove unused prefix
2025-08-01 14:34:56 +02:00
Stanislav Mekhanoshin
62187a60e6
[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385) 2025-07-30 14:02:03 -07:00
Stanislav Mekhanoshin
9deb7f6062
[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466) 2025-07-24 12:13:59 -07:00
Shilei Tian
7e105fbdbe
[AMDGPU] Add support for v_tanh_f32 on gfx1250 (#149360)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 15:42:35 -04:00
Shilei Tian
d7ec80c897
[AMDGPU] Add support for v_tanh_bf16 on gfx1250 (#147425)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14 16:30:18 -04:00
Shilei Tian
d258457d42
[AMDGPU] Add support for v_cvt_f32_fp8 on gfx1250 (#147579)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-08 16:21:24 -04:00
Changpeng Fang
4729242878
AMDGPU: Add MC layer support for load transpose instructions for gfx1250 (#146024)
Co-authored with @jayfoad
2025-06-26 22:30:31 -07:00
Stanislav Mekhanoshin
69974658f0
[AMDGPU] Initial support for gfx1250 target. (#144965)
This is just a stub for now.
2025-06-19 22:52:51 -07:00
Stanislav Mekhanoshin
189702326a
[AMDGPU] Fix gfx1201 check line in the amdgpu-features.cl. NFC. (#138743) 2025-05-06 17:05:44 -07:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Matt Arsenault
5615657209
AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:37:05 -05:00
Matt Arsenault
62dc8f3069
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 23:33:07 -05:00
Matt Arsenault
0f4fcca546
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp|bf]6 for gfx950 (#117745)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:26:07 -05:00
Matt Arsenault
2b9e947d43
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.

OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d]
where, c & d i.e. OPSEL[3 : 2] selects which dst_byte  to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:20:09 -05:00
Matt Arsenault
815069c701
AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739)
OPSEL[1:0] collectively decide which byte to read
from src input.

Builtin takes additional imm argument which
represents index (with valid values:[0:3]) of src
byte read. Out of bounds checks will added in next
patch.

OPSEL ASM Syntax: opsel:[x,y,z]
where,
    opsel[x] = Inst{11} = src0_modifier{2}
    opsel[y] = Inst{12} = src1_modifier{2}
    opsel[z] = Inst{14} = src0_modifier{3}

Note: Inst{13} i.e. OPSEL[2] is ignored in
asm syntax and opsel[z] is meaningless
for v_cvt_scalef32_f32_{fp|bf}8

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 14:54:10 -05:00
Matt Arsenault
7fc71f7909
AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:54:50 -08:00
Matt Arsenault
716364ebd6
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:51:01 -08:00
Matt Arsenault
aa7eb5723c
AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597)
v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16.
All three instructions were part of Dot9 instructions in the compiler.

This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16)
into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16).

All necessary changes to gfx11 and gfx12 are updated to reflect this change.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:47:48 -08:00
Matt Arsenault
5d650a62a3
AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596)
This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:44:47 -08:00
Matt Arsenault
22503a9df1
AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25 19:27:01 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Matt Arsenault
ca1b35a6c8
AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)
Rand num instruction for stochastic rounding.
2024-11-18 10:54:54 -08:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Shilei Tian
1ca0055f45
[AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Fangrui Song
c5de4dd1ea [test] %clang_cc1 -emit-llvm: remove redundant -S
And replace -emit-llvm -o - with -emit-llvm-only
2024-05-04 17:00:29 -07:00
Changpeng Fang
96813de52d
AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248)
FeatureDot11Insts (dot11-insts) for:
  v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8,
  v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8
2024-03-06 14:37:03 -08:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Jay Foad
ed12388082
[AMDGPU] Do not emit V_DOT2C_F32_F16_e32 on GFX12 (#78709)
That instruction is not supported on GFX12.
Added a testcase which previously crashed without this change.

Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>
2024-01-19 14:36:27 +00:00
Mariusz Sikora
3e6589f21c
[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917)
- image_atomic_pk_add_f16
- image_atomic_pk_add_bf16
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16
- buffer_atomic_pk_add_bf16
2024-01-18 14:01:09 +01:00
Mariusz Sikora
264fd9e13e
[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439) 2024-01-18 08:46:53 +01:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Anshil Gandhi
a955a31896 [AMDGPU] Replace target feature for global fadd32
Change target feature of __builtin_amdgcn_global_atomic_fadd_f32
to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a,
gfx940 and gfx1100 as they all support the return variant of
`global_atomic_add_f32`.

Fixes https://github.com/llvm/llvm-project/issues/61331.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D146840
2023-03-28 15:58:30 -06:00
Mariusz Sikora
ea064ee2a3 [AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions
Introducing Subtarget Features for instructions:
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16

Differential Revision: https://reviews.llvm.org/D146701
2023-03-24 13:10:40 +01:00
Stanislav Mekhanoshin
df0488369d [AMDGPU] Split dot7 feature
Differential Revision: https://reviews.llvm.org/D142507
2023-01-26 10:34:36 -08:00
Stanislav Mekhanoshin
870b92977e [AMDGPU] Split dot8 feature
Differential Revision: https://reviews.llvm.org/D142407
2023-01-24 11:16:07 -08:00
Stanislav Mekhanoshin
4ab2246d48 [AMDGPU] Remove dot1 and dot6 features from clang for gfx11
These are unsupported.

Differential Revision: https://reviews.llvm.org/D142493
2023-01-24 10:52:42 -08:00
Matt Arsenault
81849497b4 clang/AMDGPU: Remove flat-address-space from feature map
This was only used for checking if is_shared/is_private were legal,
which we're not bothering to do anymore.

This is apparently visible to more than the target attribute (which
seems to silently ignore unrecognized features), so this has the
potential to break something (i.e. see the OpenMP test change)
2023-01-05 16:35:04 -05:00
Matt Arsenault
f4bcd7f598 AMDGPU/clang: Add builtins for llvm.amdgcn.ballot
Use explicit _w32/_w64 suffixes for the wave size to be consistent
with the existing other wave dependent intrinsics. Also start
diagnosing trying to use both wave32 and wave64.

I would have preferred to avoid the +wavefrontsize64 spam on targets
where that's the only option, but avoiding this seems to be more work
than I expected.
2022-12-29 17:58:55 -05:00
Stanislav Mekhanoshin
9fa5a6b7e8 [AMDGPU] Support for gfx940 fp8 conversions
Differential Revision: https://reviews.llvm.org/D129902
2022-07-18 11:48:43 -07:00
Joe Nash
8bdfc73f63 [AMDGPU][clang] Definition of gfx11 subtarget
Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537
2022-04-29 13:55:56 -04:00
Aakanksha
840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin
2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Aakanksha Patil
3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00