945 Commits

Author SHA1 Message Date
macurtis-amd
0c480dd4b6
[clang][CodeGen] cast addr space of ReturnValue if needed (#154380)
Fixes a bug on AMDGPU targets where a pointer was stored as address
space 5, but then loaded as address space 0.

The issue was found during [Kokkos](https://github.com/kokkos/kokkos)
testing, specifically `hip.atomics` (see
[core/unit_test/TestAtomics.hpp](https://github.com/kokkos/kokkos/blob/develop/core/unit_test/TestAtomics.hpp)).

The issue was introduced by commit
[39ec9de7c230](https://github.com/llvm/llvm-project/commit/39ec9de7c230)
- [clang][CodeGen] sret args should always point to the alloca AS, so
use that (https://github.com/llvm/llvm-project/pull/114062).
2025-08-21 04:38:55 -05:00
Stanislav Mekhanoshin
d0ee82040c
[AMDGPU] Add s_barrier_init|join|leave instructions (#153296) 2025-08-12 15:07:07 -07:00
Leon Clark
9115bef8ee
[VectorCombine] Shrink loads used in shufflevector rebroadcasts. (#153138)
Reopen #128938.

Attempt to shrink the size of vector loads where only some of the
incoming lanes are used for rebroadcasts in shufflevector instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-12 14:08:37 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Stanislav Mekhanoshin
34aed0ed56
[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194) 2025-08-05 15:15:21 -07:00
zGoldthorpe
d7074b63ed
[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216)
This patch exposes builtins for atomic `add`, `max`, and `min` operations that
operate over buffer resource pointers.
2025-08-05 11:04:15 -06:00
Stanislav Mekhanoshin
a153e83e41
[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036) 2025-08-04 19:16:34 -07:00
Simon Pilgrim
88c6448fa2
Revert "[VectorCombine] Shrink loads used in shufflevector rebroadcasts" (#151960)
Reverts llvm/llvm-project#128938 while a crash regression is investigated.
2025-08-04 15:03:53 +01:00
Leon Clark
1feed444aa
[VectorCombine] Shrink loads used in shufflevector rebroadcasts (#128938)
Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions.

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-04 10:49:27 +01:00
Stanislav Mekhanoshin
d18511e10a
[AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (#151810) 2025-08-02 15:21:59 -07:00
Stanislav Mekhanoshin
bc463c059c
[AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (#151807) 2025-08-02 12:42:12 -07:00
Stanislav Mekhanoshin
7598c25b5a
[AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (#151804) 2025-08-02 10:45:02 -07:00
Stanislav Mekhanoshin
0988510ad4
[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773) 2025-08-01 20:12:35 -07:00
Stanislav Mekhanoshin
cc3932bf29
[AMDGPU] gfx1250 v_cvt_scalef32_sr_pk8_* instructions (#151765) 2025-08-01 19:25:57 -07:00
Stanislav Mekhanoshin
962ee7a568
[AMDGPU] gfx1250 v_cvt_scalef32_pk8_* instructions (#151758) 2025-08-01 18:29:45 -07:00
Stanislav Mekhanoshin
33abf05af4
[AMDGPU] gfx1250 v_permlane_* instructions (#151749) 2025-08-01 16:14:19 -07:00
Piotr Sobczak
ae6a9ce000
[AMDGPU] Update tests (#151688)
Fix two minor issues:
- Add double quote
- Remove unused prefix
2025-08-01 14:34:56 +02:00
Stanislav Mekhanoshin
c7bb105e97
[AMDGPU] Add v_cvt_scale_pk8_* gfx1250 instructions (#151616) 2025-07-31 18:55:59 -07:00
Stanislav Mekhanoshin
49d89bc9f4
[AMDGPU] Add gfx1250 cvt_pk|sr_fp8|bf8_f32 instructions (#151595) 2025-07-31 16:04:46 -07:00
Stanislav Mekhanoshin
e46d938ddf
[AMDGPU] v_cvt_sr_pk_f16_f32 gfx1250 instruction (#151482) 2025-07-31 12:25:55 -07:00
Stanislav Mekhanoshin
ce40863209
[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415) 2025-07-30 17:24:45 -07:00
Stanislav Mekhanoshin
62187a60e6
[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385) 2025-07-30 14:02:03 -07:00
Changpeng Fang
3b66d4a987
[AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058) 2025-07-29 08:20:05 -07:00
Changpeng Fang
30ad2e24ab
[AMDGPU] Allow readonly features to be written to IR when there is no target (#148141)
Fixes: SWDEV-541399
2025-07-29 08:18:00 -07:00
Changpeng Fang
d7a38a94cd
[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540) 2025-07-24 16:23:33 -07:00
Stanislav Mekhanoshin
9deb7f6062
[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466) 2025-07-24 12:13:59 -07:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Shilei Tian
2c50e4cac2
[AMDGPU] Add support for v_sat_pk4_i4_[i8,u8] on gfx1250 (#149528)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Foad, Jay <Jay.Foad@amd.com>
2025-07-18 13:08:50 -04:00
Shilei Tian
e11d28faee
[AMDGPU] Add support for v_permlane16_swap_b32 on gfx1250 (#149518)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 13:05:08 -04:00
Shilei Tian
95b69e0e70
[AMDGPU] Add support for v_prng_b32 on gfx1250 (#149450)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 10:59:47 -04:00
Shilei Tian
602d43cfd1
[Clang][AMDGPU] Add the missing builtin __builtin_amdgcn_sqrt_bf16 (#149447)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 08:43:08 -04:00
Shilei Tian
aecd44818a
[AMDGPU] Add support for v_tanh_f16 on gfx1250 (#149439)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 00:21:04 -04:00
Shilei Tian
7e105fbdbe
[AMDGPU] Add support for v_tanh_f32 on gfx1250 (#149360)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 15:42:35 -04:00
Shilei Tian
fd5fc76c91
[AMDGPU] Add support for v_cos_bf16 on gfx1250 (#149355)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 14:43:34 -04:00
Shilei Tian
a102342990
[AMDGPU] Add support for v_sin_bf16 on gfx1250 (#149241)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:49:45 -04:00
Shilei Tian
a6b5ece75e
[AMDGPU] Add support for v_exp_bf16 on gfx1250 (#149229)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:46:01 -04:00
Wenju He
b41398294c
[SPIR] Set MaxAtomicInlineWidth minimum size to 32 for spir32 and 64 for spir64 (#148997)
Set MaxAtomicInlineWidth the same way as SPIR-V targets in 3cfd0c0d3697.
This PR fixes a build warning in the scoped atomic built-in from #146814:
`warning: large atomic operation may incur significant performance
penalty; the access size (2 bytes) exceeds the max lock-free size (0
bytes) [-Watomic-alignment]`
2025-07-17 09:02:10 +08:00
Shilei Tian
ad6d5d2821
[AMDGPU] Add support for v_log_bf16 on gfx1250 (#149201)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:09:34 -04:00
Shilei Tian
7d2a58e87d
[AMDGPU] Add support for v_rsq_bf16 on gfx1250 (#149194)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:06:03 -04:00
zGoldthorpe
85349b4936
[clang][amdgpu] Add builtin for struct buffer lds load (#148950)
This is essentially a revision of #137678 that only exposes a
builtin for the intrinsic `llvm.amdgcn.struct.ptr.buffer.load.lds`,
which expects an `__amdgpu_buffer_rsrc_t` rather than a `v4i32` as its
first argument.

The other intrinsics exposed by the cited PR are excluded because the
intrinsics taking a `v4i32` are legacy and should be deprecated.
2025-07-16 07:23:09 -06:00
Changpeng Fang
c962f2b29d
AMDGPU: Implement builtins for gfx1250 wmma instructions (#148991)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 18:17:12 -07:00
Shilei Tian
dabc8e2ec1
[AMDGPU] Add support for v_rcp_bf16 on gfx1250 (#148916)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-15 16:12:51 -04:00
Shilei Tian
d7ec80c897
[AMDGPU] Add support for v_tanh_bf16 on gfx1250 (#147425)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14 16:30:18 -04:00
Changpeng Fang
8c1b516948
AMDGPU: Implement s_wait_asynccnt and s_wait_tensorcnt for gfx1250 (#148292)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2025-07-11 15:07:21 -07:00
jofrn
15d36aa4ce
[clang][CodeGen] Preserve addrspace of enqueue_kernel builtin. (#148062)
The last parameter of `__enqueue_kernel_varargs` is in addrspace(5), but
CodeGen currently omits this qualifier. This commit preserves the
qualifier by referencing Alloca, which has its casts removed, rather
than TmpPtr.
2025-07-11 17:00:28 -04:00
Shilei Tian
d258457d42
[AMDGPU] Add support for v_cvt_f32_fp8 on gfx1250 (#147579)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-08 16:21:24 -04:00
Changpeng Fang
eda3161c35
AMDGPU: Implement tensor load and store instructions for gfx1250 (#146636)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-03 13:49:34 -07:00
Changpeng Fang
5035d20dcb
AMDGPU: Implement ds_atomic_async_barrier_arrive_b64/ds_atomic_barrier_arrive_rtn_b64 (#146409)
These two instructions are supported by gfx1250. We define the
instructions and implement the corresponding intrinsics and builtins.

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-01 11:08:49 -07:00
Shilei Tian
749c7c5dc4
[AMDGPU] Add support for v_cvt_f16_bf8 on gfx1250 (#146305)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-06-30 07:54:55 -04:00
Shilei Tian
a99c964d7f
[AMDGPU] Add support for v_cvt_f16_fp8 on gfx1250 (#146302)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-06-30 07:51:00 -04:00