77 Commits

Author SHA1 Message Date
Mirko Brkušanin
4280f0d241
[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516) 2026-02-10 12:14:49 +01:00
Mirko Brkušanin
45b037cf7a
[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191) 2026-02-09 13:56:43 +01:00
Mirko Brkušanin
20b5849e17
[AMDGPU] Define new target gfx1170 (#180185) 2026-02-06 14:38:50 +01:00
Mariusz Sikora
6de6f7b46b
[AMDGPU] Define gfx1310 target with ELF number 0x50 (#177355)
For now this is identical to gfx1250.

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-01-22 17:08:38 +01:00
Shilei Tian
39bd4562ba
[Clang][AMDGPU] Handle wavefrontsize32 and wavefrontsize64 features more robustly (#176599)
We should not allow `-wavefrontsize32` and `-wavefrontsize64` to be
specified at the same time. We should also not allow `-wavefrontsize32`
on a target that only supports `wavefrontsize32`, and vice versa.
2026-01-19 18:16:29 -05:00
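The two rules stated in #176599 can be sketched as a small standalone check. This is only an illustration under stated assumptions, not the code from the patch: `checkWaveSizeFeatures`, the `WaveSizeSupport` enum, the diagnostic strings, and the "later entry wins" handling are all hypothetical names and choices made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of the wavefront sizes a target can run.
enum class WaveSizeSupport { Only32, Only64, Both };

// Returns an empty string when the list is acceptable, else a diagnostic.
// Features are spelled "+name" (enable) or "-name" (disable). A later entry
// overrides an earlier one for the same feature (an assumption of this sketch).
std::string checkWaveSizeFeatures(const std::vector<std::string> &Features,
                                  WaveSizeSupport Support) {
  bool Off32 = false, Off64 = false;
  for (const std::string &F : Features) {
    if (F == "-wavefrontsize32") Off32 = true;
    else if (F == "+wavefrontsize32") Off32 = false;
    else if (F == "-wavefrontsize64") Off64 = true;
    else if (F == "+wavefrontsize64") Off64 = false;
  }
  // Rule 1: both sizes must not be disabled at the same time.
  if (Off32 && Off64)
    return "cannot disable both wavefrontsize32 and wavefrontsize64";
  // Rule 2: the only supported size must not be disabled.
  if (Off32 && Support == WaveSizeSupport::Only32)
    return "target only supports wavefrontsize32";
  if (Off64 && Support == WaveSizeSupport::Only64)
    return "target only supports wavefrontsize64";
  return "";
}
```

The sketch covers only the disabling cases named in the commit message; requests that enable a size are outside its scope.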
Shoreshen
26624d51d1
[AMDGPU]Add specific instruction feature for multicast load (#175503) 2026-01-13 09:10:09 +08:00
Mirko Brkušanin
5759a3a779
[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501) 2025-12-10 09:45:13 +01:00
Shoreshen
52a58a4193
[AMDGPU] Adding instruction specific features (#167809) 2025-11-19 11:06:00 +08:00
Amit Kumar Pandey
36d477850f
[ASan] Skip explicit check of 'xnack' feature for gfx1250 && gfx1251. (#166754)
Xnack processing is essential and performed at the frontend to enable
ASan instrumentation for AMDGPU device code. Certain AMDGPU subtargets
like gfx1250 && gfx1251 don't have to enable 'xnack+' explicitly in
'--offload-arch=' for device ASan instrumentation.
2025-11-06 21:42:42 +05:30
Stanislav Mekhanoshin
9b5bc98743
[AMDGPU] Add intrinsics for v_[pk]_add_{min|max}_* instructions (#164731) 2025-10-22 17:46:33 -07:00
Shilei Tian
9e8dda1034
[NFC] Change spelling of cluster feature to "clusters" (#162103) 2025-10-06 15:55:39 +00:00
Shilei Tian
bea0225c30
[AMDGPU] Make cluster a target feature (#162040)
This replaces the original arch check.
2025-10-06 05:05:53 +00:00
Stanislav Mekhanoshin
e556dc0b23
[AMDGPU] Add gfx1251 subtarget (#159430) 2025-09-17 13:02:02 -07:00
Stanislav Mekhanoshin
a3762fb240
[AMDGPU] Add missing bf16-pk-insts feature to gfx1250 (#159167) 2025-09-16 13:58:40 -07:00
Stanislav Mekhanoshin
9cca295dcc
[AMDGPU] More radical feature initialization refactoring (#155222)
Factoring in flang, just have a single fillAMDGPUFeatureMap
function doing it all as an external interface and returning
an error.
2025-08-27 01:21:14 -07:00
Stanislav Mekhanoshin
8c6b7af50e
[AMDGPU] Refactor insertWaveSizeFeature (#154850)
If a wavefrontsize32 or wavefrontsize64 is the only possible value
insert it into feature list by default and use that value as an
indication that another wavefront size is not legal.
2025-08-27 00:30:15 -07:00
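The idea described in #154850 can be sketched as follows: insert the only legal wave size by default, then treat "both sizes enabled" as the signal that an illegal size was requested. This is a minimal sketch, not the actual `insertWaveSizeFeature` implementation. `FeatureMap`, `withDefaultWaveSize`, `hasIllegalWaveSize`, and the two `Supports*` flags are hypothetical, and `std::map` stands in for whatever map type the real code uses.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical feature map: feature name -> enabled.
using FeatureMap = std::map<std::string, bool>;

static bool isEnabled(const FeatureMap &Features, const std::string &Name) {
  auto It = Features.find(Name);
  return It != Features.end() && It->second;
}

// If the target supports exactly one wave size, enable it by default.
// Targets that support both sizes are left untouched.
FeatureMap withDefaultWaveSize(FeatureMap Features, bool Supports32,
                               bool Supports64) {
  if (Supports32 && !Supports64)
    Features["wavefrontsize32"] = true;
  else if (Supports64 && !Supports32)
    Features["wavefrontsize64"] = true;
  return Features;
}

// After defaults are inserted, two enabled sizes mean the user asked for a
// size the target cannot run.
bool hasIllegalWaveSize(const FeatureMap &Features) {
  return isEnabled(Features, "wavefrontsize32") &&
         isEnabled(Features, "wavefrontsize64");
}
```

For example, requesting `wavefrontsize64` on a wave32-only target leaves both entries enabled after `withDefaultWaveSize`, so `hasIllegalWaveSize` reports the conflict.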
zGoldthorpe
d7074b63ed
[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216)
This patch exposes builtins for atomic `add`, `max`, and `min` operations that
operate over buffer resource pointers.
2025-08-05 11:04:15 -06:00
Alex Voicu
06458fff87
[AMDGCNSPIRV][NFC] Add missing target features to AMDGCNSPIRV (#152057)
`gfx1250` bring-up omitted updating the `amdgcnspirv` feature list; this
fixes that oversight.
2025-08-05 15:29:48 +01:00
Stanislav Mekhanoshin
0988510ad4
[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773) 2025-08-01 20:12:35 -07:00
Stanislav Mekhanoshin
62187a60e6
[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385) 2025-07-30 14:02:03 -07:00
Stanislav Mekhanoshin
9deb7f6062
[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466) 2025-07-24 12:13:59 -07:00
Shilei Tian
7e105fbdbe
[AMDGPU] Add support for v_tanh_f32 on gfx1250 (#149360)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 15:42:35 -04:00
Shilei Tian
d7ec80c897
[AMDGPU] Add support for v_tanh_bf16 on gfx1250 (#147425)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14 16:30:18 -04:00
Shilei Tian
d258457d42
[AMDGPU] Add support for v_cvt_f32_fp8 on gfx1250 (#147579)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-08 16:21:24 -04:00
Changpeng Fang
4729242878
AMDGPU: Add MC layer support for load transpose instructions for gfx1250 (#146024)
Co-authored with @jayfoad
2025-06-26 22:30:31 -07:00
Stanislav Mekhanoshin
69974658f0
[AMDGPU] Initial support for gfx1250 target. (#144965)
This is just a stub for now.
2025-06-19 22:52:51 -07:00
Karlo Basioli
3de01d07c3
Fix bazel build after #144594, mark variable as potentially unused (#144910) 2025-06-19 16:16:03 +01:00
zhijian lin
bf79d4819e
[Reland] [PowerPC] frontend get target feature from backend with cpu name (#144594)
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.

The reland patch addressed the comment
https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120
2025-06-19 09:22:16 -04:00
Reid Kleckner
cbf27bf711
Revert " [PowerPC] frontend get target feature from backend with cpu name (#137670)"
This reverts commit 9208b343e962b9f1140ee345c0050a3920bdcbf2.

TargetParser shouldn't re-run the PPC subtarget tablegen target, it
should define its own `-gen-ppc-target-def` rule like all the other
targets do in llvm/include/llvm/TargetParser/CMakeLists.txt .

One user reported that there are incorrect CMake dependencies after this
change, so I will roll this back in the meantime.
2025-06-12 19:56:41 +00:00
zhijian lin
9208b343e9
[PowerPC] frontend get target feature from backend with cpu name (#137670)
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
2025-06-12 13:38:13 -04:00
Juan Manuel Martinez Caamaño
d6c1ef576f
[AMDGPU] vmem-to-lds-load-insts incoherence between TargetParser and AMDGPU.td (#135376)
The vmem-to-lds-load-insts feature is only available on gfx9/10, but
target-parser was also enabling it for gfx6,7,8.
2025-04-11 16:31:04 +02:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
Fabian Ritter
8615f9aaff
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.

Documentation changes will follow.

For SWDEV-512631
2025-02-19 10:20:48 +01:00
Alex Voicu
b08b56381c
[NFC][AMDGPU] Clean-up feature parsing for AMDGCNSPIRV. (#123519)
When we did the initial AMDGCNSPIRV commits we left the initialisation
of the feature map in a relatively disorderly state. This change
corrects that oversight:

- We make sure that AMDGCNSPIRV actually advertises the union of all
AMDGCN features, as some were not included;
- We keep feature initialisation in sorted order to make it easy to pick
an insertion point when features are added in the future.
2025-01-20 02:30:29 +00:00
Matt Arsenault
5615657209
AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:37:05 -05:00
Matt Arsenault
62dc8f3069
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 23:33:07 -05:00
Matt Arsenault
0f4fcca546
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp|bf]6 for gfx950 (#117745)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:26:07 -05:00
Matt Arsenault
2b9e947d43
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.

OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d]
where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:20:09 -05:00
Matt Arsenault
815069c701
AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739)
OPSEL[1:0] collectively decide which byte to read
from src input.

Builtin takes additional imm argument which
represents index (with valid values:[0:3]) of src
byte read. Out-of-bounds checks will be added in the next
patch.

OPSEL ASM Syntax: opsel:[x,y,z]
where,
    opsel[x] = Inst{11} = src0_modifier{2}
    opsel[y] = Inst{12} = src1_modifier{2}
    opsel[z] = Inst{14} = src0_modifier{3}

Note: Inst{13} i.e. OPSEL[2] is ignored in
asm syntax and opsel[z] is meaningless
for v_cvt_scalef32_f32_{fp|bf}8

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 14:54:10 -05:00
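The bit mapping listed in #117739 can be sketched as a decoder over a raw instruction word. The bit positions (Inst{11}, Inst{12}, Inst{14}) come from the commit message. Everything else is an assumption for illustration: the `OpSel` struct and the `decodeOpSel` and `srcByteIndex` helpers are hypothetical, and treating opsel[y] as the high bit of the byte index is a guess that the message does not confirm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the three asm-visible op_sel bits: opsel:[x,y,z].
struct OpSel {
  bool X; // Inst{11} = src0_modifier{2}
  bool Y; // Inst{12} = src1_modifier{2}
  bool Z; // Inst{14} = src0_modifier{3}
};

// Extracts opsel[x,y,z] from an encoded instruction word.
// Inst{13} (OPSEL[2]) is deliberately skipped: it is ignored in asm syntax.
OpSel decodeOpSel(uint64_t Inst) {
  OpSel S;
  S.X = ((Inst >> 11) & 1) != 0;
  S.Y = ((Inst >> 12) & 1) != 0;
  S.Z = ((Inst >> 14) & 1) != 0;
  return S;
}

// OPSEL[1:0] together select the source byte (0..3).
// Assumption: X is bit 0 and Y is bit 1 of that index.
unsigned srcByteIndex(const OpSel &S) {
  return (S.Y ? 2u : 0u) | (S.X ? 1u : 0u);
}
```

With both Inst{11} and Inst{12} set, `srcByteIndex` yields 3 under this assumed bit order, which matches the upper end of the valid range [0:3] given for the builtin's imm argument.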
Matt Arsenault
7fc71f7909
AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:54:50 -08:00
Matt Arsenault
716364ebd6
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of v_dot2c_f32_bf16 opcode is the same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:51:01 -08:00
Matt Arsenault
aa7eb5723c
AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597)
v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16.
All three instructions were part of Dot9 instructions in the compiler.

This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16)
into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16).

All necessary changes to gfx11 and gfx12 are updated to reflect this change.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:47:48 -08:00
Matt Arsenault
5d650a62a3
AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596)
This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:44:47 -08:00
Matt Arsenault
22503a9df1
AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25 19:27:01 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Matt Arsenault
ca1b35a6c8
AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)
Random number instruction for stochastic rounding.
2024-11-18 10:54:54 -08:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Stanislav Mekhanoshin
f363e30f15
[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633) 2024-07-09 14:25:58 -07:00