llvm-project

Author	SHA1	Message	Date
macurtis-amd	0c480dd4b6	[clang][CodeGen] cast addr space of ReturnValue if needed (#154380 ) Fixes a bug on AMDGPU targets where a pointer was stored as address space 5, but then loaded as address space 0. Issue found as part of [Kokkos](https://github.com/kokkos/kokkos) testing, specifically `hip.atomics` (see [core/unit_test/TestAtomics.hpp](https://github.com/kokkos/kokkos/blob/develop/core/unit_test/TestAtomics.hpp)). Issue was introduced by commit [39ec9de7c230](https://github.com/llvm/llvm-project/commit/39ec9de7c230) - [clang][CodeGen] sret args should always point to the alloca AS, so use that (https://github.com/llvm/llvm-project/pull/114062).	2025-08-21 04:38:55 -05:00
Stanislav Mekhanoshin	d0ee82040c	[AMDGPU] Add s_barrier_init|join|leave instructions (#153296 )	2025-08-12 15:07:07 -07:00
Leon Clark	9115bef8ee	[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138 ) Reopen #128938. Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-12 14:08:37 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Stanislav Mekhanoshin	34aed0ed56	[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194 )	2025-08-05 15:15:21 -07:00
zGoldthorpe	d7074b63ed	[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216 ) This patch exposes builtins for atomic `add`, `max`, and `min` operations that operate over buffer resource pointers.	2025-08-05 11:04:15 -06:00
Stanislav Mekhanoshin	a153e83e41	[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036 )	2025-08-04 19:16:34 -07:00
Simon Pilgrim	88c6448fa2	Revert "[VectorCombine] Shrink loads used in shufflevector rebroadcasts" (#151960 ) Reverts llvm/llvm-project#128938 while a crash regression is investigated	2025-08-04 15:03:53 +01:00
Leon Clark	1feed444aa	[VectorCombine] Shrink loads used in shufflevector rebroadcasts (#128938 ) Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-04 10:49:27 +01:00
Stanislav Mekhanoshin	d18511e10a	[AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (#151810 )	2025-08-02 15:21:59 -07:00
Stanislav Mekhanoshin	bc463c059c	[AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (#151807 )	2025-08-02 12:42:12 -07:00
Stanislav Mekhanoshin	7598c25b5a	[AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (#151804 )	2025-08-02 10:45:02 -07:00
Stanislav Mekhanoshin	0988510ad4	[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773 )	2025-08-01 20:12:35 -07:00
Stanislav Mekhanoshin	cc3932bf29	[AMDGPU] gfx1250 v_cvt_scalef32_sr_pk8_* instructions (#151765 )	2025-08-01 19:25:57 -07:00
Stanislav Mekhanoshin	962ee7a568	[AMDGPU] gfx1250 v_cvt_scalef32_pk8_* instructions (#151758 )	2025-08-01 18:29:45 -07:00
Stanislav Mekhanoshin	33abf05af4	[AMDGPU] gfx1250 v_permlane_* instructions (#151749 )	2025-08-01 16:14:19 -07:00
Piotr Sobczak	ae6a9ce000	[AMDGPU] Update tests (#151688 ) Fix two minor issues: - Add double quote - Remove unused prefix	2025-08-01 14:34:56 +02:00
Stanislav Mekhanoshin	c7bb105e97	[AMDGPU] Add v_cvt_scale_pk8_* gfx1250 instructions (#151616 )	2025-07-31 18:55:59 -07:00
Stanislav Mekhanoshin	49d89bc9f4	[AMDGPU] Add gfx1250 cvt_pk|sr_fp8|bf8_f32 instructions (#151595 )	2025-07-31 16:04:46 -07:00
Stanislav Mekhanoshin	e46d938ddf	[AMDGPU] v_cvt_sr_pk_f16_f32 gfx1250 instruction (#151482 )	2025-07-31 12:25:55 -07:00
Stanislav Mekhanoshin	ce40863209	[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415 )	2025-07-30 17:24:45 -07:00
Stanislav Mekhanoshin	62187a60e6	[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385 )	2025-07-30 14:02:03 -07:00
Changpeng Fang	3b66d4a987	[AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058 )	2025-07-29 08:20:05 -07:00
Changpeng Fang	30ad2e24ab	[AMDGPU] Allow readonly features to be written to IR when there is no target (#148141 ) Fixes: SWDEV-541399	2025-07-29 08:18:00 -07:00
Changpeng Fang	d7a38a94cd	[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540 )	2025-07-24 16:23:33 -07:00
Stanislav Mekhanoshin	9deb7f6062	[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466 )	2025-07-24 12:13:59 -07:00
Changpeng Fang	d6094370cb	AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-21 10:09:42 -07:00
Shilei Tian	2c50e4cac2	[AMDGPU] Add support for `v_sat_pk4_i4_[i8,u8]` on gfx1250 (#149528 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Foad, Jay <Jay.Foad@amd.com>	2025-07-18 13:08:50 -04:00
Shilei Tian	e11d28faee	[AMDGPU] Add support for `v_permlane16_swap_b32` on gfx1250 (#149518 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-18 13:05:08 -04:00
Shilei Tian	95b69e0e70	[AMDGPU] Add support for `v_prng_b32` on gfx1250 (#149450 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-18 10:59:47 -04:00
Shilei Tian	602d43cfd1	[Clang][AMDGPU] Add the missing builtin `__builtin_amdgcn_sqrt_bf16` (#149447 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-18 08:43:08 -04:00
Shilei Tian	aecd44818a	[AMDGPU] Add support for `v_tanh_f16` on gfx1250 (#149439 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-18 00:21:04 -04:00
Shilei Tian	7e105fbdbe	[AMDGPU] Add support for `v_tanh_f32` on gfx1250 (#149360 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 15:42:35 -04:00
Shilei Tian	fd5fc76c91	[AMDGPU] Add support for `v_cos_bf16` on gfx1250 (#149355 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 14:43:34 -04:00
Shilei Tian	a102342990	[AMDGPU] Add support for `v_sin_bf16` on gfx1250 (#149241 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 08:49:45 -04:00
Shilei Tian	a6b5ece75e	[AMDGPU] Add support for `v_exp_bf16` on gfx1250 (#149229 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 08:46:01 -04:00
Wenju He	b41398294c	[SPIR] Set MaxAtomicInlineWidth minimum size to 32 for spir32 and 64 for spir64 (#148997 ) Set MaxAtomicInlineWidth the same way as SPIR-V targets in 3cfd0c0d3697. This PR fixes build warning in scoped atomic built-in in #146814: `warning: large atomic operation may incur significant performance penalty; ; the access size (2 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment]`	2025-07-17 09:02:10 +08:00
Shilei Tian	ad6d5d2821	[AMDGPU] Add support for `v_log_bf16` on gfx1250 (#149201 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-16 19:09:34 -04:00
Shilei Tian	7d2a58e87d	[AMDGPU] Add support for `v_rsq_bf16` on gfx1250 (#149194 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-16 19:06:03 -04:00
zGoldthorpe	85349b4936	[clang][amdgpu] Add builtin for struct buffer lds load (#148950 ) This is essentially just a revision of #137678 which only exposes a builtin for the intrinsic `llvm.amdgcn.struct.ptr.buffer.load.lds`, which expects an `__amdgpu_buffer_rsrc_t` rather than a `v4i32` as its first argument. The reason for excluding the other intrinsics exposed by the cited PR is because the intrinsics taking a `v4i32` are legacy and should be deprecated.	2025-07-16 07:23:09 -06:00
Changpeng Fang	c962f2b29d	AMDGPU: Implement builtins for gfx1250 wmma instructions (#148991 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>	2025-07-15 18:17:12 -07:00
Shilei Tian	dabc8e2ec1	[AMDGPU] Add support for `v_rcp_bf16` on gfx1250 (#148916 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-15 16:12:51 -04:00
Shilei Tian	d7ec80c897	[AMDGPU] Add support for `v_tanh_bf16` on gfx1250 (#147425 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-14 16:30:18 -04:00
Changpeng Fang	8c1b516948	AMDGPU: Implement s_wait_asynccnt and s_wait_tensorcnt for gfx1250 (#148292 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2025-07-11 15:07:21 -07:00
jofrn	15d36aa4ce	[clang][CodeGen] Preserve addrspace of enqueue_kernel builtin. (#148062 ) __enqueue_kernel_varargs' last parameter is in addrspace(5), but CodeGen currently misses this qualifier. This commit fixes the code to preserve the qualifier by referencing Alloca, which has its casts removed, rather than TmpPtr.	2025-07-11 17:00:28 -04:00
Shilei Tian	d258457d42	[AMDGPU] Add support for `v_cvt_f32_fp8` on gfx1250 (#147579 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-08 16:21:24 -04:00
Changpeng Fang	eda3161c35	AMDGPU: Implement tensor load and store instructions for gfx1250 (#146636 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-03 13:49:34 -07:00
Changpeng Fang	5035d20dcb	AMDGPU: Implement ds_atomic_async_barrier_arrive_b64/ds_atomic_barrier_arrive_rtn_b64 (#146409 ) These two instructions are supported by gfx1250. We define the instructions and implement the corresponding intrinsic and builtin. Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-01 11:08:49 -07:00
Shilei Tian	749c7c5dc4	[AMDGPU] Add support for `v_cvt_f16_bf8` on gfx1250 (#146305 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-06-30 07:54:55 -04:00
Shilei Tian	a99c964d7f	[AMDGPU] Add support for `v_cvt_f16_fp8` on gfx1250 (#146302 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-06-30 07:51:00 -04:00

945 Commits
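Commits 1feed444aa (#128938) and 9115bef8ee (#153138) try to shrink vector loads when only some of the loaded lanes are used by shufflevector rebroadcasts. The Python sketch below is a simplified model of that idea, not the VectorCombine implementation. The helper names are hypothetical. It assumes each shuffle takes the loaded vector as its first operand and poison as its second, and it ignores the legality and cost checks a real pass has to make. The narrower width is one past the highest lane that any mask reads.

```python
from typing import List, Optional

POISON = -1  # an undefined ("poison") shuffle-mask element


def shrunk_load_width(load_width: int, masks: List[List[int]]) -> Optional[int]:
    """Return how many leading lanes the load still needs, or None if
    shrinking would not help.

    Assumes every mask belongs to a shufflevector whose first operand is
    the loaded vector and whose second operand is poison, so only indices
    in [0, load_width) read loaded lanes.
    """
    used = [i for mask in masks for i in mask if 0 <= i < load_width]
    if not used:
        return None  # no lane of the load is read; leave it to other folds
    needed = max(used) + 1  # lanes 0..max(used) must still be loaded
    return needed if needed < load_width else None


def remap_mask(mask: List[int], new_width: int) -> List[int]:
    """Rewrite a mask for the narrower load. Precondition: new_width came
    from shrunk_load_width() for the same masks. Indices into the poison
    second operand become explicit poison; first-operand indices are kept."""
    return [i if 0 <= i < new_width else POISON for i in mask]


# Example: an 8-lane load whose shuffles only ever read lanes 0 and 1.
masks = [[0, 1, 0, 1], [1, 1, 1, 1]]
width = shrunk_load_width(8, masks)
new_masks = [remap_mask(m, width) for m in masks]
```

Here only lanes 0 and 1 are read, so the model suggests a 2-lane load and leaves both masks unchanged.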
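Commit b41398294c (#148997) sets MaxAtomicInlineWidth to at least 32 bits for spir32 and 64 bits for spir64. The sketch below is a rough model of why that change silences the quoted warning. It assumes the warning fires when the access width in bits exceeds MaxAtomicInlineWidth. The helper names and the width table keys are hypothetical, and clang's real check lives in its C++ sources, not in this form.

```python
# Minimum MaxAtomicInlineWidth values from the commit title, in bits.
# The keys are illustrative labels, not target triples.
MIN_MAX_ATOMIC_INLINE_WIDTH = {"spir32": 32, "spir64": 64}


def exceeds_lock_free_size(access_bytes: int, max_inline_width_bits: int) -> bool:
    """True when an atomic access is wider than the target's inline
    (lock-free) atomic width, i.e. the "large atomic operation" case."""
    return access_bytes * 8 > max_inline_width_bits


def oversize_note(access_bytes: int, max_inline_width_bits: int) -> str:
    """Format the size part of the warning quoted in the commit message."""
    return (f"the access size ({access_bytes} bytes) exceeds the max "
            f"lock-free size ({max_inline_width_bits // 8} bytes)")
```

With a width of 0 bits, the 2-byte access from #146814 is reported as oversized. With the new spir32 minimum of 32 bits, it fits and no warning is expected.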