25 Commits

Author SHA1 Message Date
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Shilei Tian
602d43cfd1
[Clang][AMDGPU] Add the missing builtin __builtin_amdgcn_sqrt_bf16 (#149447)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 08:43:08 -04:00
Shilei Tian
aecd44818a
[AMDGPU] Add support for v_tanh_f16 on gfx1250 (#149439)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 00:21:04 -04:00
Shilei Tian
7e105fbdbe
[AMDGPU] Add support for v_tanh_f32 on gfx1250 (#149360)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 15:42:35 -04:00
Shilei Tian
fd5fc76c91
[AMDGPU] Add support for v_cos_bf16 on gfx1250 (#149355)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 14:43:34 -04:00
Shilei Tian
a102342990
[AMDGPU] Add support for v_sin_bf16 on gfx1250 (#149241)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:49:45 -04:00
Shilei Tian
a6b5ece75e
[AMDGPU] Add support for v_exp_bf16 on gfx1250 (#149229)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:46:01 -04:00
Shilei Tian
ad6d5d2821
[AMDGPU] Add support for v_log_bf16 on gfx1250 (#149201)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:09:34 -04:00
Shilei Tian
7d2a58e87d
[AMDGPU] Add support for v_rsq_bf16 on gfx1250 (#149194)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:06:03 -04:00
Changpeng Fang
c962f2b29d
AMDGPU: Implement builtins for gfx1250 wmma instructions (#148991)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 18:17:12 -07:00
Shilei Tian
dabc8e2ec1
[AMDGPU] Add support for v_rcp_bf16 on gfx1250 (#148916)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-15 16:12:51 -04:00
Shilei Tian
d7ec80c897
[AMDGPU] Add support for v_tanh_bf16 on gfx1250 (#147425)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14 16:30:18 -04:00
Changpeng Fang
1f5f381920
AMDGPU: Implement intrinsic/builtins for gfx1250 load transpose instructions (#146289) 2025-06-29 14:33:31 -07:00
choikwa
77de8a0c0a
[AMDGPU][clang] provide device implementation for __builtin_logb and … (#129347)
…__builtin_scalbn

Clang generates library calls for __builtin_* functions which can be a
problem for GPUs that cannot handle them. This patch generates call to
device implementation for __builtin_logb and ldexp intrinsic for
__builtin_scalbn.
2025-05-19 14:11:31 -04:00
Krzysztof Drewniak
4bdd116b80
[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)
This PR adds a amdgns_load_to_lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.

This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).

It also plumbs the intrinsic through to clang.
2025-05-19 07:15:04 -07:00
Matt Arsenault
9bdd9dc895
AMDGPU: Mark workitem ID intrinsics with range attribute (#136196)
This avoids the need to have special handling at every use site.
Unfortunately this means we unnecessarily emit AssertZext in the DAG
(where we already directly understand the range of the intrinsic), andt
we regress in undefined cases as we don't fold out asserts on undef.
2025-04-18 12:27:38 +02:00
Shilei Tian
9e90e10e76
[AMDGPU][Clang] Add builtins for gfx12 ray tracing intrinsics (#135224) 2025-04-11 09:33:32 -04:00
Farzon Lotfi
82103dfae9
Revert "Reland [Clang][Cmake] fix libtool duplicate member name warnings" (#134656)
Reverts llvm/llvm-project#133850
2025-04-07 10:00:53 -04:00
Farzon Lotfi
0d71d9ab28
Reland [Clang][Cmake] fix libtool duplicate member name warnings (#133850)
fixes https://github.com/llvm/llvm-project/issues/133199

As of the third commit the fix to the linker missing references in
`Targets/DirectX.cpp` found in
https://github.com/llvm/llvm-project/pull/133776 was fixed by moving
`HLSLBufferLayoutBuilder.cpp` to `clang/lib/CodeGen/Targets/`.

It fixes the circular reference issue found in
https://github.com/llvm/llvm-project/pull/133619 for all
`-DBUILD_SHARED_LIBS=ON` builds by removing `target_link_libraries` from
the sub directory cmake files.

testing for amdgpu offload was done via
`cmake -B ../llvm_amdgpu -S llvm -GNinja -C
offload/cmake/caches/Offload.cmake -DCMAKE_BUILD_TYPE=Release`

PR https://github.com/llvm/llvm-project/pull/132252 Created a second
file that shared <TargetName>.cpp in clang/lib/CodeGen/CMakeLists.txt

For example There were two AMDGPU.cpp's one in TargetBuiltins and the
other in Targets. Even though these were in different directories
libtool warns that it might not distinguish them because they share the
same base name.

There are two potential fixes. The easy fix is to rename one of them and
keep one cmake file. That solution though doesn't future proof this
problem in the event of a third <TargetName>.cpp and it seems teams want
to just use the target name

https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483.

The alternative fix that this PR went with is to seperate the cmake
files into their own sub directories as static libs.
2025-04-07 09:53:07 -04:00
Farzon Lotfi
bdae91b08b
Revert "[Clang][Cmake] fix libtool duplicate member name warnings" (#133795)
Reverts llvm/llvm-project#133619
2025-03-31 17:00:38 -04:00
Farzon Lotfi
cc2b432614
[Clang][Cmake] fix libtool duplicate member name warnings (#133619)
fixes #133199
 
PR #132252 Created a second file that shared `<TargetName>.cpp` in
`clang/lib/CodeGen/CMakeLists.txt`

For example There were two `AMDGPU.cpp`'s one in `TargetBuiltins` and
the other in `Targets`. Even though these were in different directories
`libtool` warns that it might not distinguish them because they share
the same base name.

There are two potential fixes. The easy fix is to rename one of them and
keep one cmake file. That solution though doesn't future proof this
problem in the event of a third `<TargetName>.cpp` and it seems teams
want to just use the target name

https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483.

The alternative fix is to seperate the cmake files into their own sub
directories. I chose to create static libraries. It might of been
possible to build an OBJECT, but I only saw examples of this in
compiler-rt and test directories so assumed there was a reason it wasn't
used.
2025-03-31 14:21:22 -04:00
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
Farzon Lotfi
59d06071e9
[NFC][HLSL] Move emitter out of AMDGPU.cpp (#133251)
- Move all HLSL code out of AMDGPU.cpp to CGHLSLBuiltins.cpp
- Fixes accidental reorganization of HLSL code into AMDGPU caused by
(https://github.com/llvm/llvm-project/pull/132252,
https://github.com/llvm/llvm-project/commit/7f920e2e5f70b)
2025-03-27 11:47:14 -04:00
Jonathan Thackray
a1a74c9e80
[NFC][clang] Remove superfluous header files after refactor in #132252 (#132495)
Remove superfluous header files after refactor in #132252
2025-03-26 14:45:00 +00:00
Jonathan Thackray
7f920e2e5f
[NFC][clang] Split clang/lib/CodeGen/CGBuiltin.cpp into target-specific files (#132252)
clang/lib/CodeGen/CGBuiltin.cpp is over 1MB long (>23k LoC), and can
take minutes to recompile (depending on compiler and host system) when
modified, and 5 seconds for clangd to update for every edit. Splitting
this file was discussed in this thread:

   https://discourse.llvm.org/t/splitting-clang-s-cgbuiltin-cpp-over-23k-lines-long-takes-1min-to-compile/

and the idea has received a number of +1 votes, hence this change.
2025-03-21 19:09:39 +00:00