llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	62187a60e6	[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385 )	2025-07-30 14:02:03 -07:00
Stanislav Mekhanoshin	9deb7f6062	[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466 )	2025-07-24 12:13:59 -07:00
Shilei Tian	7e105fbdbe	[AMDGPU] Add support for `v_tanh_f32` on gfx1250 (#149360 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 15:42:35 -04:00
Shilei Tian	d7ec80c897	[AMDGPU] Add support for `v_tanh_bf16` on gfx1250 (#147425 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-14 16:30:18 -04:00
Shilei Tian	d258457d42	[AMDGPU] Add support for `v_cvt_f32_fp8` on gfx1250 (#147579 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-08 16:21:24 -04:00
Changpeng Fang	4729242878	AMDGPU: Add MC layer support for load transpose instructions for gfx1250 (#146024 ) Co-authored with @jayfoad	2025-06-26 22:30:31 -07:00
Stanislav Mekhanoshin	69974658f0	[AMDGPU] Initial support for gfx1250 target. (#144965 ) This is just a stub for now.	2025-06-19 22:52:51 -07:00
Karlo Basioli	3de01d07c3	Fix bazel build after #144594 , mark variable as potentially unused (#144910 )	2025-06-19 16:16:03 +01:00
zhijian lin	bf79d4819e	[Reland] [PowerPC] frontend get target feature from backend with cpu name (#144594 ) 1. The PR proceeds with a backend target hook to allow front-ends to determine what target features are available in a compilation based on the CPU name. 2. Fix a backend target feature bug that supports HTM for Power8/9/10/11. However, HTM is only supported on Power8/9 according to the ISA. 3. All target features that are hardcoded in PPC.cpp can be retrieved from the backend target feature. I have double-checked that the hardcoded logic for inferring target features from the CPU in the frontend(PPC.cpp) is the same as in PPC.td. The reland patch addressed the comment https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120	2025-06-19 09:22:16 -04:00
Reid Kleckner	cbf27bf711	Revert " [PowerPC] frontend get target feature from backend with cpu name (#137670 )" This reverts commit 9208b343e962b9f1140ee345c0050a3920bdcbf2. TargetParser shouldn't re-run the PPC subtarget tablegen target, it should define its own `-gen-ppc-target-def` rule like all the other targets do in llvm/include/llvm/TargetParser/CMakeLists.txt . One user reported that there are incorrect CMake dependencies after this change, so I will roll this back in the meantime.	2025-06-12 19:56:41 +00:00
zhijian lin	9208b343e9	[PowerPC] frontend get target feature from backend with cpu name (#137670 ) 1. The PR proceeds with a backend target hook to allow front-ends to determine what target features are available in a compilation based on the CPU name. 2. Fix a backend target feature bug that supports HTM for Power8/9/10/11. However, HTM is only supported on Power8/9 according to the ISA. 3. All target features that are hardcoded in PPC.cpp can be retrieved from the backend target feature. I have double-checked that the hardcoded logic for inferring target features from the CPU in the frontend(PPC.cpp) is the same as in PPC.td.	2025-06-12 13:38:13 -04:00
Juan Manuel Martinez Caamaño	d6c1ef576f	[AMDGPU] vmem-to-lds-load-insts incoherence between TargetParser and AMDGPU.td (#135376 ) The vmem-to-lds-loads-insts feature is only available on gfx9/10. While target-parser was also enabling it for gfx6,7,8.	2025-04-11 16:31:04 +02:00
Juan Manuel Martinez Caamaño	beae0e9f1a	[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055 ) This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.	2025-04-02 20:00:09 +02:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Alex Voicu	b08b56381c	[NFC][AMDGPU] Clean-up feature parsing for AMDGCNSPIRV. (#123519 ) When we did the initial AMDGCNSPIRV commits we left the initialisation of the feature map in a relatively disorderly state. This change corrects that oversight: - We make sure that AMDGCNSPIRV actually advertises the union of all AMDGCN features, as some were not included; - We keep feature initialisation in sorted order to make it easy to pick an insertion point when features are added in the future.	2025-01-20 02:30:29 +00:00
Matt Arsenault	5615657209	AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16\|f16}_f32 instructions (#117824 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:37:05 -05:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16\|32} of gfx950. (#117823 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Matt Arsenault	0f4fcca546	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp\|bf]6 for gfx950 (#117745 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:26:07 -05:00
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743 ) OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read. OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d] where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16\|f32]_[bf8\|fp8] (#117739 ) OPSEL[1:0] collectively decide which byte to read from src input. Builtin takes additional imm argument which represents index (with valid values:[0:3]) of src byte read. Out of bounds checks will added in next patch. OPSEL ASM Syntax: opsel:[x,y,z] where, opsel[x] = Inst{11} = src0_modifier{2} opsel[y] = Inst{12} = src1_modifier{2} opsel[z] = Inst{14} = src0_modifier{3} Note: Inst{13} i.e. OPSEL[2] is ignored in asm syntax and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp\|bf}8 Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
Matt Arsenault	7fc71f7909	AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:54:50 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	aa7eb5723c	AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597 ) v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:47:48 -08:00
Matt Arsenault	5d650a62a3	AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596 ) This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:44:47 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf\|f}6_{bf\|fp}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	ca1b35a6c8	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310 ) Rand num instruction for stochastic rounding.	2024-11-18 10:54:54 -08:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Shilei Tian	de0fd64bed	[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190 ) This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.	2024-11-12 23:11:05 -05:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Stanislav Mekhanoshin	f363e30f15	[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633 )	2024-07-09 14:25:58 -07:00
Alex Voicu	88e2bb4092	[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796 ) This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features: - AMDGCN inline ASM is allowed/supported, under the assumption that the [SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc) extension is enabled/used - AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is enabled when using the downstream translator - the featureset matches the union of AMDGCN targets' features - the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the [SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc) extension is enabled/used, case in which the extant SPIRV datalayout string would lead to pointers to function pointing to the private address space, which would be wrong. Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).	2024-06-07 11:50:23 +01:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Konstantin Zhuravlyov	775f1cd34d	AMDGPU: Add gfx12-generic target (#93875 )	2024-05-31 12:46:44 -04:00
Changpeng Fang	96813de52d	AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248 ) FeatureDot11Insts (dot11-insts) for: v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8, v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8	2024-03-06 14:37:03 -08:00
Pierre van Houtryve	43c7eb5d7b	[AMDGPU] Replace '.' with '-' in generic target names (#81718 ) The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one.	2024-02-14 15:19:04 +01:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414 ) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Jay Foad	e21b0b083e	[AMDGPU] Remove gws feature from GFX12 (#78711 ) This was already done for LLVM. This patch just updates the Clang builtin handling to match.	2024-01-19 15:45:53 +00:00
Jay Foad	ed12388082	[AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (#78709 ) That instruction is not supported on GFX12. Added a testcase which previously crashed without this change. Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>	2024-01-19 14:36:27 +00:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	264fd9e13e	[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439 )	2024-01-18 08:46:53 +01:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Jay Foad	9b374a800d	[AMDGPU] Add some clang-format off/on markers This keeps clang-format happy on future patches.	2023-11-23 09:50:55 +00:00
Jay Foad	e0d93d5aaa	[AMDGPU] Reindent some tables This keeps clang-format happy on future patches.	2023-11-23 09:49:03 +00:00
Ivan Kosarev	096eba148d	[TargetParser][AMDGPU] Fix getArchEntry(). (#69222 ) It's supposed to return null when an unknown target id is passed.	2023-10-17 14:54:29 +01:00
Yaxun (Sam) Liu	b8a9c50f22	[AMDGPU] Add target feature gws to clang Reviewed by: Matt Arsenault Differential Revision: https://reviews.llvm.org/D158367	2023-08-25 11:50:47 -04:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Yaxun (Sam) Liu	c0f0d50653	[HIP] emit macro `__HIP_NO_IMAGE_SUPPORT` HIP texture/image support is optional as some devices do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT is defined for device not supporting images (`d0448aa4c4/docs/reference/kernel_language.md (L426)` ) Currently the macro is defined by HIP header based on predefined macros for GPU, e.g __gfx*__ , which is error prone. This patch let clang emit the predefined macro. Reviewed by: Matt Arsenault, Artem Belevich Differential Revision: https://reviews.llvm.org/D151349	2023-06-14 22:53:41 -04:00
Kazu Hirata	143e131aa7	Do not unnecessarily include StringSwitch.h	2023-06-11 13:19:22 -07:00

1 2

58 Commits