llvm-project

Author	SHA1	Message	Date
Mirko Brkušanin	4280f0d241	[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516 )	2026-02-10 12:14:49 +01:00
Mirko Brkušanin	45b037cf7a	[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191 )	2026-02-09 13:56:43 +01:00
Mirko Brkušanin	20b5849e17	[AMDGPU] Define new target gfx1170 (#180185 )	2026-02-06 14:38:50 +01:00
Mariusz Sikora	6de6f7b46b	[AMDGPU] Define gfx1310 target with ELF number 0x50 (#177355 ) For now this is identical to gfx1250. --------- Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-01-22 17:08:38 +01:00
Shilei Tian	39bd4562ba	[Clang][AMDGPU] Handle `wavefrontsize32` and `wavefrontsize64` features more robustly (#176599 ) We should not allow `-wavefrontsize32` and `-wavefrontsize64` to be specified at the same time. We should also not allow `-wavefrontsize32` on a target that only supports `wavefrontsize32`, and the vice versa.	2026-01-19 18:16:29 -05:00
Shoreshen	26624d51d1	[AMDGPU]Add specific instruction feature for multicast load (#175503 )	2026-01-13 09:10:09 +08:00
Mirko Brkušanin	5759a3a779	[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501 )	2025-12-10 09:45:13 +01:00
Shoreshen	52a58a4193	[AMDGPU] Adding instruction specific features (#167809 )	2025-11-19 11:06:00 +08:00
Amit Kumar Pandey	36d477850f	[ASan] Skip explicit check of 'xnack' feature for gfx1250 && gfx1251. (#166754 ) Xnack processing is essential and performed at the frontend to enable ASan instrumentation for AMDGPU device code. Certain AMDGPU subtargets like gfx1250 && gfx1251 don't have to enable 'xnack+' explictly in '--offload-arch=' for device ASan instrumentation.	2025-11-06 21:42:42 +05:30
Stanislav Mekhanoshin	9b5bc98743	[AMDGPU] Add intrinsics for v_[pk]_add_{min\|max}_* instructions (#164731 )	2025-10-22 17:46:33 -07:00
Shilei Tian	9e8dda1034	[NFC] Change spelling of cluster feature to "clusters" (#162103 )	2025-10-06 15:55:39 +00:00
Shilei Tian	bea0225c30	[AMDGPU] Make cluster a target feature (#162040 ) This replaces the original arch check.	2025-10-06 05:05:53 +00:00
Stanislav Mekhanoshin	e556dc0b23	[AMDGPU] Add gfx1251 subtarget (#159430 )	2025-09-17 13:02:02 -07:00
Stanislav Mekhanoshin	a3762fb240	[AMDGPU] Add missing bf16-pk-insts feature to gfx1250 (#159167 )	2025-09-16 13:58:40 -07:00
Stanislav Mekhanoshin	9cca295dcc	[AMDGPU] More radical feature initialization refactoring (#155222 ) Factoring in flang, just have a single fillAMDGPUFeatureMap function doing it all as an external interface and returing an error.	2025-08-27 01:21:14 -07:00
Stanislav Mekhanoshin	8c6b7af50e	[AMDGPU] Refactor insertWaveSizeFeature (#154850 ) If a wavefrontsize32 or wavefrontsize64 is the only possible value insert it into feature list by default and use that value as an indication that another wavefront size is not legal.	2025-08-27 00:30:15 -07:00
zGoldthorpe	d7074b63ed	[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216 ) This patch exposes builtins for atomic `add`, `max`, and `min` operations that operate over buffer resource pointers.	2025-08-05 11:04:15 -06:00
Alex Voicu	06458fff87	[AMDGCNSPIRV][NFC] Add missing target features to AMDGCNSPIRV (#152057 ) `gfx1250` bring-up omitted updating the `amdgcnspirv` feature list, this fixes that oversight.	2025-08-05 15:29:48 +01:00
Stanislav Mekhanoshin	0988510ad4	[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773 )	2025-08-01 20:12:35 -07:00
Stanislav Mekhanoshin	62187a60e6	[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385 )	2025-07-30 14:02:03 -07:00
Stanislav Mekhanoshin	9deb7f6062	[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466 )	2025-07-24 12:13:59 -07:00
Shilei Tian	7e105fbdbe	[AMDGPU] Add support for `v_tanh_f32` on gfx1250 (#149360 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 15:42:35 -04:00
Shilei Tian	d7ec80c897	[AMDGPU] Add support for `v_tanh_bf16` on gfx1250 (#147425 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-14 16:30:18 -04:00
Shilei Tian	d258457d42	[AMDGPU] Add support for `v_cvt_f32_fp8` on gfx1250 (#147579 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-08 16:21:24 -04:00
Changpeng Fang	4729242878	AMDGPU: Add MC layer support for load transpose instructions for gfx1250 (#146024 ) Co-authored with @jayfoad	2025-06-26 22:30:31 -07:00
Stanislav Mekhanoshin	69974658f0	[AMDGPU] Initial support for gfx1250 target. (#144965 ) This is just a stub for now.	2025-06-19 22:52:51 -07:00
Karlo Basioli	3de01d07c3	Fix bazel build after #144594 , mark variable as potentially unused (#144910 )	2025-06-19 16:16:03 +01:00
zhijian lin	bf79d4819e	[Reland] [PowerPC] frontend get target feature from backend with cpu name (#144594 ) 1. The PR proceeds with a backend target hook to allow front-ends to determine what target features are available in a compilation based on the CPU name. 2. Fix a backend target feature bug that supports HTM for Power8/9/10/11. However, HTM is only supported on Power8/9 according to the ISA. 3. All target features that are hardcoded in PPC.cpp can be retrieved from the backend target feature. I have double-checked that the hardcoded logic for inferring target features from the CPU in the frontend(PPC.cpp) is the same as in PPC.td. The reland patch addressed the comment https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120	2025-06-19 09:22:16 -04:00
Reid Kleckner	cbf27bf711	Revert " [PowerPC] frontend get target feature from backend with cpu name (#137670 )" This reverts commit 9208b343e962b9f1140ee345c0050a3920bdcbf2. TargetParser shouldn't re-run the PPC subtarget tablegen target, it should define its own `-gen-ppc-target-def` rule like all the other targets do in llvm/include/llvm/TargetParser/CMakeLists.txt . One user reported that there are incorrect CMake dependencies after this change, so I will roll this back in the meantime.	2025-06-12 19:56:41 +00:00
zhijian lin	9208b343e9	[PowerPC] frontend get target feature from backend with cpu name (#137670 ) 1. The PR proceeds with a backend target hook to allow front-ends to determine what target features are available in a compilation based on the CPU name. 2. Fix a backend target feature bug that supports HTM for Power8/9/10/11. However, HTM is only supported on Power8/9 according to the ISA. 3. All target features that are hardcoded in PPC.cpp can be retrieved from the backend target feature. I have double-checked that the hardcoded logic for inferring target features from the CPU in the frontend(PPC.cpp) is the same as in PPC.td.	2025-06-12 13:38:13 -04:00
Juan Manuel Martinez Caamaño	d6c1ef576f	[AMDGPU] vmem-to-lds-load-insts incoherence between TargetParser and AMDGPU.td (#135376 ) The vmem-to-lds-loads-insts feature is only available on gfx9/10. While target-parser was also enabling it for gfx6,7,8.	2025-04-11 16:31:04 +02:00
Juan Manuel Martinez Caamaño	beae0e9f1a	[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055 ) This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.	2025-04-02 20:00:09 +02:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Alex Voicu	b08b56381c	[NFC][AMDGPU] Clean-up feature parsing for AMDGCNSPIRV. (#123519 ) When we did the initial AMDGCNSPIRV commits we left the initialisation of the feature map in a relatively disorderly state. This change corrects that oversight: - We make sure that AMDGCNSPIRV actually advertises the union of all AMDGCN features, as some were not included; - We keep feature initialisation in sorted order to make it easy to pick an insertion point when features are added in the future.	2025-01-20 02:30:29 +00:00
Matt Arsenault	5615657209	AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16\|f16}_f32 instructions (#117824 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:37:05 -05:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16\|32} of gfx950. (#117823 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Matt Arsenault	0f4fcca546	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp\|bf]6 for gfx950 (#117745 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:26:07 -05:00
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743 ) OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read. OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d] where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16\|f32]_[bf8\|fp8] (#117739 ) OPSEL[1:0] collectively decide which byte to read from src input. Builtin takes additional imm argument which represents index (with valid values:[0:3]) of src byte read. Out of bounds checks will added in next patch. OPSEL ASM Syntax: opsel:[x,y,z] where, opsel[x] = Inst{11} = src0_modifier{2} opsel[y] = Inst{12} = src1_modifier{2} opsel[z] = Inst{14} = src0_modifier{3} Note: Inst{13} i.e. OPSEL[2] is ignored in asm syntax and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp\|bf}8 Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
Matt Arsenault	7fc71f7909	AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:54:50 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	aa7eb5723c	AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597 ) v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:47:48 -08:00
Matt Arsenault	5d650a62a3	AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596 ) This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:44:47 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf\|f}6_{bf\|fp}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	ca1b35a6c8	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310 ) Rand num instruction for stochastic rounding.	2024-11-18 10:54:54 -08:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Shilei Tian	de0fd64bed	[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190 ) This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.	2024-11-12 23:11:05 -05:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Stanislav Mekhanoshin	f363e30f15	[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633 )	2024-07-09 14:25:58 -07:00

1 2

77 Commits