llvm-project

Author	SHA1	Message	Date
zGoldthorpe	d7074b63ed	[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216 ) This patch exposes builtins for atomic `add`, `max`, and `min` operations that operate over buffer resource pointers.	2025-08-05 11:04:15 -06:00
Stanislav Mekhanoshin	0988510ad4	[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773 )	2025-08-01 20:12:35 -07:00
Piotr Sobczak	ae6a9ce000	[AMDGPU] Update tests (#151688 ) Fix two minor issues: - Add double quote - Remove unused prefix	2025-08-01 14:34:56 +02:00
Stanislav Mekhanoshin	62187a60e6	[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385 )	2025-07-30 14:02:03 -07:00
Stanislav Mekhanoshin	9deb7f6062	[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466 )	2025-07-24 12:13:59 -07:00
Shilei Tian	7e105fbdbe	[AMDGPU] Add support for `v_tanh_f32` on gfx1250 (#149360 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 15:42:35 -04:00
Shilei Tian	d7ec80c897	[AMDGPU] Add support for `v_tanh_bf16` on gfx1250 (#147425 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-14 16:30:18 -04:00
Shilei Tian	d258457d42	[AMDGPU] Add support for `v_cvt_f32_fp8` on gfx1250 (#147579 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-08 16:21:24 -04:00
Changpeng Fang	4729242878	AMDGPU: Add MC layer support for load transpose instructions for gfx1250 (#146024 ) Co-authored with @jayfoad	2025-06-26 22:30:31 -07:00
Stanislav Mekhanoshin	69974658f0	[AMDGPU] Initial support for gfx1250 target. (#144965 ) This is just a stub for now.	2025-06-19 22:52:51 -07:00
Stanislav Mekhanoshin	189702326a	[AMDGPU] Fix gfx1201 check line in the amdgpu-features.cl. NFC. (#138743 )	2025-05-06 17:05:44 -07:00
Fabian Ritter	029c8e783d	[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all occurrences of gfx940/gfx941 from clang that can be removed without changes in the llvm directory. The target-invalid-cpu-note/amdgcn.c test is not included here since it tests a list of targets that is defined in llvm/lib/TargetParser/TargetParser.cpp. For SWDEV-512631	2025-02-19 10:11:48 +01:00
Matt Arsenault	5615657209	AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16\|f16}_f32 instructions (#117824 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:37:05 -05:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16\|32} of gfx950. (#117823 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Matt Arsenault	0f4fcca546	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp\|bf]6 for gfx950 (#117745 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:26:07 -05:00
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743 ) OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read. OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d] where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16\|f32]_[bf8\|fp8] (#117739 ) OPSEL[1:0] collectively decide which byte to read from src input. Builtin takes additional imm argument which represents index (with valid values:[0:3]) of src byte read. Out of bounds checks will added in next patch. OPSEL ASM Syntax: opsel:[x,y,z] where, opsel[x] = Inst{11} = src0_modifier{2} opsel[y] = Inst{12} = src1_modifier{2} opsel[z] = Inst{14} = src0_modifier{3} Note: Inst{13} i.e. OPSEL[2] is ignored in asm syntax and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp\|bf}8 Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
Matt Arsenault	7fc71f7909	AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:54:50 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	aa7eb5723c	AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597 ) v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:47:48 -08:00
Matt Arsenault	5d650a62a3	AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596 ) This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:44:47 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf\|f}6_{bf\|fp}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	ca1b35a6c8	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310 ) Rand num instruction for stochastic rounding.	2024-11-18 10:54:54 -08:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Shilei Tian	de0fd64bed	[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190 ) This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.	2024-11-12 23:11:05 -05:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Fangrui Song	c5de4dd1ea	[test] %clang_cc1 -emit-llvm: remove redundant -S And replace -emit-llvm -o - with -emit-llvm-only	2024-05-04 17:00:29 -07:00
Changpeng Fang	96813de52d	AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248 ) FeatureDot11Insts (dot11-insts) for: v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8, v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8	2024-03-06 14:37:03 -08:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414 ) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Jay Foad	ed12388082	[AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (#78709 ) That instruction is not supported on GFX12. Added a testcase which previously crashed without this change. Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>	2024-01-19 14:36:27 +00:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	264fd9e13e	[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439 )	2024-01-18 08:46:53 +01:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Konstantin Zhuravlyov	9d05727972	AMDGPU: Add basic gfx942 target Differential Revision: https://reviews.llvm.org/D149983	2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov	1fc70210a6	AMDGPU: Add basic gfx941 target Differential Revision: https://reviews.llvm.org/D149982	2023-05-10 11:51:06 -04:00
Anshil Gandhi	a955a31896	[AMDGPU] Replace target feature for global fadd32 Change target feature of __builtin_amdgcn_global_atomic_fadd_f32 to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a, gfx940 and gfx1100 as they all support the return variant of `global_atomic_add_f32`. Fixes https://github.com/llvm/llvm-project/issues/61331. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D146840	2023-03-28 15:58:30 -06:00
Mariusz Sikora	ea064ee2a3	[AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions Introducing Subtarget Features for instructions: - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 Differential Revision: https://reviews.llvm.org/D146701	2023-03-24 13:10:40 +01:00
Stanislav Mekhanoshin	df0488369d	[AMDGPU] Split dot7 feature Differential Revision: https://reviews.llvm.org/D142507	2023-01-26 10:34:36 -08:00
Stanislav Mekhanoshin	870b92977e	[AMDGPU] Split dot8 feature Differential Revision: https://reviews.llvm.org/D142407	2023-01-24 11:16:07 -08:00
Stanislav Mekhanoshin	4ab2246d48	[AMDGPU] Remove dot1 and dot6 features from clang for gfx11 These are unsupported. Differential Revision: https://reviews.llvm.org/D142493	2023-01-24 10:52:42 -08:00
Matt Arsenault	81849497b4	clang/AMDGPU: Remove flat-address-space from feature map This was only used for checking if is_shared/is_private were legal, which we're not bothering to do anymore. This is apparently visible to more than the target attribute (which seems to silently ignore unrecognized features), so this has the potential to break something (i.e. see the OpenMP test change)	2023-01-05 16:35:04 -05:00
Matt Arsenault	f4bcd7f598	AMDGPU/clang: Add builtins for llvm.amdgcn.ballot Use explicit _w32/_w64 suffixes for the wave size to be consistent with the existing other wave dependent intrinsics. Also start diagnosing trying to use both wave32 and wave64. I would have preferred to avoid the +wavefrontsize64 spam on targets where that's the only option, but avoiding this seems to be more work than I expected.	2022-12-29 17:58:55 -05:00
Stanislav Mekhanoshin	9fa5a6b7e8	[AMDGPU] Support for gfx940 fp8 conversions Differential Revision: https://reviews.llvm.org/D129902	2022-07-18 11:48:43 -07:00
Joe Nash	8bdfc73f63	[AMDGPU][clang] Definition of gfx11 subtarget Contributors: Jay Foad <jay.foad@amd.com> Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> Patch 2/N for upstreaming of AMDGPU gfx11 architecture Depends on D124536 Reviewed By: foad, kzhuravl, #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D124537	2022-04-29 13:55:56 -04:00
Aakanksha	840695814a	[AMDGPU] Add gfx1036 target Differential Revision: https://reviews.llvm.org/D120846	2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00

1 2

87 Commits