llvm-project

Author	SHA1	Message	Date
Florian Hahn	7954a0514b	[Clang] Enable -fpointer-tbaa by default. (#117244 ) Support for more precise TBAA metadata has been added a while ago (behind the -fpointer-tbaa flag). The more precise TBAA metadata allows treating accesses of different pointer types as no-alias. This helps to remove more redundant loads and stores in a number of workloads. Some highlights on the impact across llvm-test-suite's MultiSource, SPEC2006 & SPEC2017 include: * +2% more NoAlias results for memory accesses * +3% more stores removed by DSE, * +4% more loops vectorized. This closes a relatively big gap to GCC, which has been supporting disambiguating based on pointer types for a long time. (https://clang.godbolt.org/z/K7Wbhrz4q) Pointer-TBAA support for pointers to builtin types has been added in https://github.com/llvm/llvm-project/pull/76612. Support for user-defined types has been added in https://github.com/llvm/llvm-project/pull/110569. There are 2 recent PRs with bug fixes for special cases uncovered during testing: * https://github.com/llvm/llvm-project/pull/116991 * https://github.com/llvm/llvm-project/pull/116596 PR: https://github.com/llvm/llvm-project/pull/117244	2024-12-04 20:55:18 +00:00
Matt Arsenault	e0f52538c9	AMDGPU: Change bitop3 intrinsic operand to i32 (#118647 )	2024-12-04 15:44:04 -05:00
Shilei Tian	68bcba6d7a	Revert "[AMDGPU] Use COV6 by default (#118515 )" This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.	2024-12-03 20:17:06 -05:00
Shilei Tian	410cbe3cf2	[AMDGPU] Use COV6 by default (#118515 )	2024-12-03 19:38:35 -05:00
Matt Arsenault	a796f597cd	AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-12-02 14:40:36 -05:00
Matt Arsenault	a2c3e0c4cb	AMDGPU/clang: Add global_load_lds size check support for gfx950 (#117825 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:41:09 -05:00
Matt Arsenault	5615657209	AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16\|f16}_f32 instructions (#117824 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:37:05 -05:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16\|32} of gfx950. (#117823 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Matt Arsenault	265e209ceb	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_{bf8\|fp8}_{f16\|bf16\|f32} (#117821 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:24:01 -05:00
Matt Arsenault	301c8e6047	AMDGPU: Add support for v_cvt_scalef32_sr instructions (#117820 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:20:16 -05:00
Matt Arsenault	76715787f4	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_pk_fp4 instructions (#117798 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 19:59:14 -05:00
Matt Arsenault	c8ee1ee057	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk_fp4_{f\|bf}16 for gfx950 (#117794 ) These instructions have non-standard use of OPSEL bits to select dest write byte. The src2_modifiers operand is used without having its corresponding src2 operand by introducing dummy src2. OPSEL ASM Syntax: opsel:[a,b,c,d] a & b are meaningless, c & d together decide which byte to write in dst reg. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:38:23 -05:00
Matt Arsenault	065dc93d96	AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{bf\|f}16_{bf\|fp}8 for gfx950 (#117793 ) OPSEL[0] selects src_word to read. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:35:18 -05:00
Matt Arsenault	991dcbc468	AMDGPU: Builtin & codegen support for v_cvt_scalef32_pk32_{bf\|f}16_{bf\|fp}6 for gfx950 (#117747 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:30:04 -05:00
Matt Arsenault	0f4fcca546	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp\|bf]6 for gfx950 (#117745 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:26:07 -05:00
Matt Arsenault	eeb76880f3	AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{f\|bf}16_fp4 for gfx950 (#117744 ) OPSEL ASM Syntax for v_cvt_scalef32_pk_{f\|bf}16_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read. Note: Conventional Inst{13} i.e. OPSEL[2] is ignored in asm syntax. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:23:15 -05:00
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743 ) OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read. OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d] where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	4527894143	Builtins & Codegen support for v_cvt_scalef32_pk_{fp\|bf}8_{f\|bf}16 for gfx950 (#117742 ) OPSEL[3] determines low/high 16 bits of word to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:16:08 -05:00
Matt Arsenault	62584f32eb	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_f32_{fp8\|bf8} for gfx950 (#117741 ) OPSEL[0] determines low/high 16 bits of src0 to read. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:12:18 -05:00
Matt Arsenault	803bd812b1	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_{fp8\|bf8}_f32 for gfx950 (#117740 ) OPSEL[3] determines low/high 16 bits of word to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:57:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16\|f32]_[bf8\|fp8] (#117739 ) OPSEL[1:0] collectively decide which byte to read from src input. Builtin takes additional imm argument which represents index (with valid values:[0:3]) of src byte read. Out-of-bounds checks will be added in the next patch. OPSEL ASM Syntax: opsel:[x,y,z] where, opsel[x] = Inst{11} = src0_modifier{2} opsel[y] = Inst{12} = src1_modifier{2} opsel[z] = Inst{14} = src0_modifier{3} Note: Inst{13} i.e. OPSEL[2] is ignored in asm syntax and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp\|bf}8 Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
Matt Arsenault	7fc71f7909	AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:54:50 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	aa7eb5723c	AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597 ) v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:47:48 -08:00
Matt Arsenault	5d650a62a3	AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596 ) This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:44:47 -08:00
Matt Arsenault	a87d484a97	AMDGPU: Support v_cvt_scalef32_2xpk16_{bf\|fp}6_f32 for gfx950. (#117595 ) Scale packed 16-component single-precision float vectors from two source inputs using the exponent provided by the third single-precision float input, then convert the values to a packed 32-component FP6 float value. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:41:12 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf\|fp}6_{bf\|f}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
Matt Arsenault	e97fb2207e	AMDGPU: Add support for load transpose instructions for gfx950 (#117378 ) This patch adds support for intrinsics in clang, as well as assembly instructions in the backend. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 09:39:04 -08:00
Alex Voicu	48ec59c234	[llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (#114481 ) Fold `llvm.amdgcn.wavefrontsize` early, during InstCombine, so that its concrete value is used throughout subsequent optimisation passes.	2024-11-25 10:29:50 +00:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	7d544c64e3	AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (#117259 )	2024-11-22 12:11:06 -08:00
Matt Arsenault	90dc644d73	AMDGPU: Add v_smfmac_f32_32x32x64_fp8_bf8 for gfx950 (#117258 )	2024-11-22 12:08:15 -08:00
Matt Arsenault	8d3435f8a1	AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (#117257 )	2024-11-22 12:02:18 -08:00
Matt Arsenault	8a5c24149d	AMDGPU: Add v_smfmac_f32_32x32x64_bf8_bf8 for gfx950 (#117256 )	2024-11-22 11:59:06 -08:00
Matt Arsenault	836d2dcf60	AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (#117235 )	2024-11-21 17:06:06 -08:00
Matt Arsenault	33124910c9	AMDGPU: Add v_smfmac_f32_16x16x128_fp8_bf8 for gfx950 (#117234 )	2024-11-21 17:03:03 -08:00
Matt Arsenault	3678f8a8aa	AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (#117233 )	2024-11-21 17:00:08 -08:00
Matt Arsenault	7baadb2a4e	AMDGPU: Add v_smfmac_f32_16x16x128_bf8_bf8 for gfx950 (#117232 )	2024-11-21 16:57:01 -08:00
Matt Arsenault	3e6f3508ad	AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (#117214 )	2024-11-21 15:01:03 -08:00
Matt Arsenault	8c53036146	AMDGPU: Add v_smfmac_i32_16x16x128_i8 for gfx950 (#117213 )	2024-11-21 14:58:11 -08:00
Matt Arsenault	42dd114a46	AMDGPU: Add v_smfmac_f32_32x32x32_bf16 for gfx950 (#117212 )	2024-11-21 14:52:11 -08:00
Matt Arsenault	95ddc1a63b	AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (#117211 )	2024-11-21 14:46:43 -08:00
Matt Arsenault	e50eaa2cf1	AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (#117205 )	2024-11-21 14:43:33 -08:00
Matt Arsenault	2ab178820b	AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (#117202 )	2024-11-21 14:40:30 -08:00
Matt Arsenault	1c47d67abc	AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (#117053 )	2024-11-21 14:28:05 -08:00
Matt Arsenault	f4ed79b160	AMDGPU: Add v_mfma_i32_32x32x32_i8 for gfx950 (#117052 )	2024-11-21 09:08:15 -08:00
Matt Arsenault	0a6e8741dd	AMDGPU: Shrink used number of registers for mfma scale based on format (#117047 ) Currently the builtins assume you are using an 8-bit format that requires an 8 element vector. We can shrink the number of registers if the format requires 4 or 6.	2024-11-21 09:08:05 -08:00
Matt Arsenault	76b24640e5	AMDGPU: Add v_mfma_i32_16x16x64_i8 for gfx950 (#116728 )	2024-11-21 09:02:12 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Haopeng Liu	4d6e69143d	Add the initializes attribute inference (#117104 ) reland https://github.com/llvm/llvm-project/pull/97373 after fixing clang tests. Confirmed with "ninja check-llvm" and "ninja check-clang"	2024-11-20 19:15:23 -08:00
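
The -fpointer-tbaa entry (7954a0514b) in the log above says that accesses through different pointer types can now be treated as no-alias. The C sketch below is only an illustration of the kind of code that could benefit; the function name `sum_and_update` and its parameters are hypothetical and do not come from the commit. It reads an `int *` through `pp`, stores a `float *` through `qq`, then reads through `pp` again. With pointer-type TBAA the compiler may assume the store does not modify `*pp` and reuse the first load; whether it actually does so depends on the optimization level and compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not taken from the commit.
 * *pp has type int*, *qq has type float*. Under pointer-type TBAA the
 * store to *qq may be assumed not to alias *pp, so the second load of
 * *pp can be folded into the first one. */
int sum_and_update(int **pp, float **qq, float *new_q) {
    assert(pp != NULL && qq != NULL && *pp != NULL);

    int a = **pp;   /* load *pp (an int*), then the int it points to */
    *qq = new_q;    /* store a float* through a float** */
    int b = **pp;   /* candidate for reuse of the earlier loads */

    return a + b;
}
```

The function is well defined when `pp` and `qq` point to distinct objects of their declared types, which is the situation the optimization relies on.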


837 Commits
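
Commit 815069c701 in the log above describes how OPSEL[1:0] select the source byte for v_cvt_scalef32_[f16|f32]_[bf8|fp8], with opsel[x] = Inst{11} and opsel[y] = Inst{12}, and a builtin immediate index in the range 0 to 3. The C sketch below shows one way that mapping could be expressed. It assumes that bit 0 of the byte index corresponds to opsel[x] and bit 1 to opsel[y], which the commit message does not state explicitly; the helper name `opsel_bits_for_src_byte` is hypothetical and is not an LLVM API.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions quoted from the commit message. */
enum { OPSEL_X_INST_BIT = 11, OPSEL_Y_INST_BIT = 12 };

/* Hypothetical helper: returns the instruction-word bits that would
 * encode the given source byte index.
 * ASSUMPTION: index bit 0 -> opsel[x], index bit 1 -> opsel[y]. */
uint32_t opsel_bits_for_src_byte(unsigned byte_index) {
    assert(byte_index <= 3u); /* valid values are 0..3 per the commit */

    uint32_t x = byte_index & 1u;
    uint32_t y = (byte_index >> 1) & 1u;

    return (x << OPSEL_X_INST_BIT) | (y << OPSEL_Y_INST_BIT);
}
```

Under that assumption, byte index 3 sets both Inst{11} and Inst{12}, and byte index 0 sets neither. Inst{13} and Inst{14} are left untouched here, since the commit notes that OPSEL[2] is ignored in the asm syntax and opsel[z] is meaningless for the f32 variants.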