llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	34aed0ed56	[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194)	2025-08-05 15:15:21 -07:00
zGoldthorpe	d7074b63ed	[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216) This patch exposes builtins for atomic `add`, `max`, and `min` operations that operate over buffer resource pointers.	2025-08-05 11:04:15 -06:00
Stanislav Mekhanoshin	a153e83e41	[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036)	2025-08-04 19:16:34 -07:00
Stanislav Mekhanoshin	7598c25b5a	[AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (#151804)	2025-08-02 10:45:02 -07:00
Stanislav Mekhanoshin	c7bb105e97	[AMDGPU] Add v_cvt_scale_pk8_* gfx1250 instructions (#151616)	2025-07-31 18:55:59 -07:00
Changpeng Fang	d7a38a94cd	[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540)	2025-07-24 16:23:33 -07:00
Stanislav Mekhanoshin	9deb7f6062	[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466)	2025-07-24 12:13:59 -07:00
Changpeng Fang	d6094370cb	AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-21 10:09:42 -07:00
YexuanXiao	7c402b8b81	Reland [Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types (#149613) The checks for the 'z' and 't' format specifiers added in the original PR #143653 had some issues and were overly strict, causing some build failures, and were consequently reverted at `4c85bf2fe8`. In the latest commit `27c58629ec`, I relaxed the checks for the 'z' and 't' format specifiers, so warnings are now only issued when they are used with mismatched types. The original intent of these checks was to diagnose code that assumes the underlying type of `size_t` is `unsigned` or `unsigned long`, for example: `printf("%zu", 1ul); // Not portable, but not an error when size_t is unsigned long`. However, the checks produced a significant number of false positives. This was partly because Clang does not treat the `typedef` `size_t` and `__size_t` as having a common "sugar" type, and partly because a large amount of existing code either assumes `unsigned` (or `unsigned long`) is `size_t`, or defines the equivalent of size_t in its own way (such as sanitizer_internal_defs.h, `2e67dcfdcd/compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h (L203)`).	2025-07-19 03:44:14 -03:00
Kazu Hirata	4c85bf2fe8	Revert "[Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types (#143653)" This reverts commit c27e283cfbca2bd22f34592430e98ee76ed60ad8. A buildbot failure has been reported: https://lab.llvm.org/buildbot/#/builders/186/builds/10819/steps/10/logs/stdio I'm also getting a large number of warnings related to %zu and %zx.	2025-07-17 21:04:01 -07:00
YexuanXiao	c27e283cfb	[Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types (#143653) This includes the results of `sizeof`, `sizeof...`, `__datasizeof`, `__alignof`, `_Alignof`, `alignof`, `_Countof`, `size_t` literals, and signed `size_t` literals, the results of pointer-pointer subtraction, and checks for standard library functions (and their calls). The goal is to enable clang and downstream tools such as clangd and clang-tidy to provide more portable hints and diagnostics. The previous discussion can be found at #136542. This PR implements this feature by introducing a new subtype of `Type` called `PredefinedSugarType`, which was considered appropriate in discussions. I tried to keep `PredefinedSugarType` simple enough yet not limited to `size_t` and `ptrdiff_t` so that it can be used for other purposes. `PredefinedSugarType` wraps a canonical `Type` and provides a name, conceptually similar to a compiler-internal `TypedefType` but without depending on a `TypedefDecl` or a source file. Additionally, checks for the `z` and `t` format specifiers in format strings for `scanf` and `printf` were added. They precisely match expressions using `typedef`s or built-in expressions. The affected tests indicate that this works very well. Several code paths require that `SizeType` is canonical, so I kept `SizeType` in its canonical form. The failed tests in CI are allowed to fail. See the [comment](https://github.com/llvm/llvm-project/pull/135386#issuecomment-3049426611) in another PR #135386.	2025-07-17 22:45:57 -03:00
zGoldthorpe	85349b4936	[clang][amdgpu] Add builtin for struct buffer lds load (#148950) This is essentially a revision of #137678 that only exposes a builtin for the intrinsic `llvm.amdgcn.struct.ptr.buffer.load.lds`, which expects an `__amdgpu_buffer_rsrc_t` rather than a `v4i32` as its first argument. The other intrinsics exposed by the cited PR are excluded because the intrinsics taking a `v4i32` are legacy and should be deprecated.	2025-07-16 07:23:09 -06:00
Changpeng Fang	c962f2b29d	AMDGPU: Implement builtins for gfx1250 wmma instructions (#148991) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>	2025-07-15 18:17:12 -07:00
Changpeng Fang	8c1b516948	AMDGPU: Implement s_wait_asynccnt and s_wait_tensorcnt for gfx1250 (#148292) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2025-07-11 15:07:21 -07:00
Shilei Tian	d258457d42	[AMDGPU] Add support for `v_cvt_f32_fp8` on gfx1250 (#147579) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-08 16:21:24 -04:00
Changpeng Fang	eda3161c35	AMDGPU: Implement tensor load and store instructions for gfx1250 (#146636) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-03 13:49:34 -07:00
Juan Manuel Martinez Caamaño	03919ef8d9	[Clang][OpenCL] Declare cl_amd_media_ops/cl_amd_media_ops2 builtins with -fdeclare-opencl-builtins (#143507) cl_amd_media_ops/cl_amd_media_ops2 builtins are currently not declared with -fdeclare-opencl-builtins. This patch adds support for these builtins.	2025-07-02 16:55:24 +02:00
Shilei Tian	749c7c5dc4	[AMDGPU] Add support for `v_cvt_f16_bf8` on gfx1250 (#146305) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-06-30 07:54:55 -04:00
Shilei Tian	a99c964d7f	[AMDGPU] Add support for `v_cvt_f16_fp8` on gfx1250 (#146302) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-06-30 07:51:00 -04:00
Changpeng Fang	bb982e733c	AMDGPU: support s_monitor_sleep on gfx1250 (#146293) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-06-29 21:01:47 -07:00
Stanislav Mekhanoshin	40eee8ec7f	[AMDGPU] Add s_setprio_inc_wg gfx1250 instruction (#145152)	2025-06-22 12:52:05 -07:00
Aaron Ballman	9eef4d1c5f	Remove delayed typo expressions (#143423) This removes the delayed typo correction functionality from Clang (regular typo correction still remains) due to fragility of the solution. An RFC was posted here: https://discourse.llvm.org/t/rfc-removing-support-for-delayed-typo-correction/86631 and while that RFC was asking for folks to consider stepping up to be maintainers, and we did have a few new contributors show some interest, experiments show that it's likely worth it to remove this functionality entirely and focus efforts on improving regular typo correction. This removal fixes ~20 open issues (quite possibly more), improves compile time performance by roughly 0.3-0.4% (https://llvm-compile-time-tracker.com/?config=Overview&stat=instructions%3Au&remote=AaronBallman&sortBy=date), and does not appear to regress diagnostic behavior in a way we wouldn't find acceptable. Fixes #142457 Fixes #139913 Fixes #138850 Fixes #137867 Fixes #137860 Fixes #107840 Fixes #93308 Fixes #69470 Fixes #59391 Fixes #58172 Fixes #46215 Fixes #45915 Fixes #45891 Fixes #44490 Fixes #36703 Fixes #32903 Fixes #23312 Fixes #69874	2025-06-13 06:45:40 -04:00
Krzysztof Drewniak	4bdd116b80	[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425) This PR adds an amdgcn.load.to.lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation. This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features). It also plumbs the intrinsic through to clang.	2025-05-19 07:15:04 -07:00
Aaron Ballman	a635bbf141	[C] Update the -Wdefault-const-init-unsafe wording (#138266) This drops the "and is incompatible with C++" phrasing from the diagnostic unless -Wc++-compat is explicitly passed. This makes the diagnostic less confusing when it is on by default rather than enabled because of C++ compatibility concerns.	2025-05-02 12:23:40 -04:00
Aaron Ballman	576161cb60	[C] Warn on uninitialized const objects (#137166) Unlike C++, C allows the definition of an uninitialized `const` object. If the object has static or thread storage duration, it is still zero-initialized; otherwise, the object is left uninitialized. In either case, the code is not compatible with C++. This adds a new diagnostic group, `-Wdefault-const-init-unsafe`, which is on by default and diagnoses any definition of a `const` object which remains uninitialized. It also adds another new diagnostic group, `-Wdefault-const-init` (which also enables the `unsafe` variant), that diagnoses any definition of a `const` object (including ones which are zero-initialized). This diagnostic is off by default. Finally, it adds `-Wdefault-const-init` to `-Wc++-compat`. GCC diagnoses these situations under this flag. Fixes #19297	2025-04-25 08:21:41 -04:00
Chris B	52e0337ea3	[HLSL][OpenCL] Strip addrspace from implicit cast diags (#135830) The address space of a source value for an implicit cast isn't really relevant when emitting conversion warnings. Since the lvalue-to-rvalue cast effectively removes the address space, address spaces don't factor into the conversion, but they do create visual noise in the diagnostics. This is a small quality-of-life fixup to get in as HLSL adopts more address space annotations.	2025-04-16 12:13:19 -05:00
Matheus Izvekov	fceb9cecdf	[clang] consistently quote expressions in diagnostics (#134769)	2025-04-15 04:18:23 -03:00
Juan Manuel Martinez Caamaño	d6c1ef576f	[AMDGPU] vmem-to-lds-load-insts incoherence between TargetParser and AMDGPU.td (#135376) The vmem-to-lds-load-insts feature is only available on gfx9/10, but TargetParser was also enabling it for gfx6, 7, and 8.	2025-04-11 16:31:04 +02:00
Jay Foad	e2fe78797f	[Clang] Use "syncscope" instead of "synchscope". NFC. (#134616) This matches the spelling of the keyword in LLVM IR.	2025-04-07 13:32:36 +01:00
Juan Manuel Martinez Caamaño	041e84261a	[Clang][AMDGPU] Expose buffer load lds as a clang builtin (#132048) CK is using either inline assembly or inline LLVM-IR builtins to generate buffer_load_dword lds instructions. This patch exposes this instruction as a Clang builtin available on gfx9 and gfx10. Related to SWDEV-519702 and SWDEV-518861	2025-04-03 09:22:38 +02:00
Juan Manuel Martinez Caamaño	beae0e9f1a	[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055) This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.	2025-04-02 20:00:09 +02:00
Juan Manuel Martinez Caamaño	0375ef07c3	[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741) This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as a 4-bit signed integer and returns `0.0625f * src`. SWDEV-518861	2025-04-02 19:51:40 +02:00
Alexander Shaposhnikov	2dc123b33d	[clang][opencl] Allow passing all zeros to reqd_work_group_size (#131543) Allow passing all zeros to reqd_work_group_size. Test plan: ninja check-all	2025-03-16 16:21:46 -07:00
Fabian Ritter	029c8e783d	[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all occurrences of gfx940/gfx941 from clang that can be removed without changes in the llvm directory. The target-invalid-cpu-note/amdgcn.c test is not included here since it tests a list of targets that is defined in llvm/lib/TargetParser/TargetParser.cpp. For SWDEV-512631	2025-02-19 10:11:48 +01:00
Sven van Haastregt	957213f60b	[OpenCL] Diagnose block references in selection operator (#114824) In addition to the invocation case that is already diagnosed, also diagnose when a block reference appears on either side of a ternary selection operator. Until now, clang would accept the added test case only to crash during code generation.	2025-01-22 09:49:56 +01:00
Matt Arsenault	a2c3e0c4cb	AMDGPU/clang: Add global_load_lds size check support for gfx950 (#117825) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:41:09 -05:00
Matt Arsenault	5615657209	AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:37:05 -05:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Matt Arsenault	265e209ceb	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_{bf8|fp8}_{f16|bf16|f32} (#117821) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:24:01 -05:00
Matt Arsenault	76715787f4	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_pk_fp4 instructions (#117798) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 19:59:14 -05:00
Matt Arsenault	c8ee1ee057	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk_fp4_{f|bf}16 for gfx950 (#117794) These instructions have non-standard use of OPSEL bits to select the dest write byte. The src2_modifiers operand is used without its corresponding src2 operand by introducing a dummy src2. OPSEL ASM Syntax: opsel:[a,b,c,d], where a & b are meaningless and c & d together decide the byte to write in the dst reg. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:38:23 -05:00
Matt Arsenault	065dc93d96	AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 for gfx950 (#117793) OPSEL[0] selects src_word to read. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:35:18 -05:00
Matt Arsenault	eeb76880f3	AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{f|bf}16_fp4 for gfx950 (#117744) OPSEL ASM Syntax for v_cvt_scalef32_pk_{f|bf}16_fp4: opsel:[x,y,z], where x & y (i.e. OPSEL[1:0]) select which src_byte to read. Note: conventional Inst{13}, i.e. OPSEL[2], is ignored in asm syntax. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:23:15 -05:00
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743) OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4: opsel:[x,y,z], where x & y (i.e. OPSEL[1:0]) select which src_byte to read. OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32: opsel:[a,b,c,d], where c & d (i.e. OPSEL[3:2]) select which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	4527894143	Builtins & Codegen support for v_cvt_scalef32_pk_{fp|bf}8_{f|bf}16 for gfx950 (#117742) OPSEL[3] determines low/high 16 bits of word to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:16:08 -05:00
Matt Arsenault	62584f32eb	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_f32_{fp8|bf8} for gfx950 (#117741) OPSEL[0] determines low/high 16 bits of src0 to read. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:12:18 -05:00
Matt Arsenault	803bd812b1	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_{fp8|bf8}_f32 for gfx950 (#117740) OPSEL[3] determines low/high 16 bits of word to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:57:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739) OPSEL[1:0] collectively decide which byte to read from the src input. The builtin takes an additional imm argument which represents the index (valid values: [0:3]) of the src byte read. Out-of-bounds checks will be added in the next patch. OPSEL ASM Syntax: opsel:[x,y,z], where opsel[x] = Inst{11} = src0_modifier{2}, opsel[y] = Inst{12} = src1_modifier{2}, opsel[z] = Inst{14} = src0_modifier{3}. Note: Inst{13}, i.e. OPSEL[2], is ignored in asm syntax, and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp|bf}8. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	7d544c64e3	AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (#117259)	2024-11-22 12:11:06 -08:00
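Commit 0375ef07c3 describes `V_CVT_OFF_F32_I4` as treating its input as a 4-bit signed integer and returning `0.0625f * src`. Below is a minimal host-side C sketch of that arithmetic, which may help when working out expected values for tests. The function name `cvt_off_f32_i4_ref` is hypothetical and is not an LLVM or AMD API. The sketch also assumes that only the low 4 bits of the 32-bit input matter, because the commit message does not say how the upper bits are handled.

```c
#include <stdint.h>

/* Hypothetical reference model (not an LLVM or AMD API) of the arithmetic
   attributed to V_CVT_OFF_F32_I4: interpret the input as a 4-bit signed
   integer and return 0.0625f * src.
   Assumption: only bits [3:0] of the 32-bit input are significant. */
float cvt_off_f32_i4_ref(uint32_t src) {
    int32_t nibble = (int32_t)(src & 0xFu); /* keep bits [3:0] only */
    if (nibble & 0x8)                        /* sign bit of the nibble set */
        nibble -= 16;                        /* sign-extend: 8..15 -> -8..-1 */
    return 0.0625f * (float)nibble;          /* scale by 1/16 */
}
```

Under these assumptions, an input of 7 maps to 0.4375f, 8 maps to -0.5f, and 15 maps to -0.0625f. The sixteen possible results therefore run from -0.5 to 0.4375 in steps of 0.0625. All of these values are exactly representable as `float`.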


546 Commits