546 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
34aed0ed56
[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194) 2025-08-05 15:15:21 -07:00
zGoldthorpe
d7074b63ed
[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216)
This patch exposes builtins for atomic `add`, `max`, and `min` operations that
operate over buffer resource pointers.
2025-08-05 11:04:15 -06:00
Stanislav Mekhanoshin
a153e83e41
[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036) 2025-08-04 19:16:34 -07:00
Stanislav Mekhanoshin
7598c25b5a
[AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (#151804) 2025-08-02 10:45:02 -07:00
Stanislav Mekhanoshin
c7bb105e97
[AMDGPU] Add v_cvt_scale_pk8_* gfx1250 instructions (#151616) 2025-07-31 18:55:59 -07:00
Changpeng Fang
d7a38a94cd
[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540) 2025-07-24 16:23:33 -07:00
Stanislav Mekhanoshin
9deb7f6062
[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466) 2025-07-24 12:13:59 -07:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
YexuanXiao
7c402b8b81
Reland [Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types (#149613)
The checks for the 'z' and 't' format specifiers added in the original
PR #143653 had some issues and were overly strict, causing some build
failures and were consequently reverted at
4c85bf2fe8.

In the latest commit
27c58629ec,
I relaxed the checks for the 'z' and 't' format specifiers, so warnings
are now only issued when they are used with mismatched types.

The original intent of these checks was to diagnose code that assumes
the underlying type of `size_t` is `unsigned` or `unsigned long`, for
example:

```c
printf("%zu", 1ul); // Not portable, but not an error when size_t is unsigned long
```  

However, it produced a significant number of false positives. This was
partly because Clang does not treat the `typedef` `size_t` and
`__size_t` as having a common "sugar" type, and partly because a large
amount of existing code either assumes `unsigned` (or `unsigned long`)
is `size_t`, or they define the equivalent of size_t in their own way
(such as
sanitizer_internal_defs.h).2e67dcfdcd/compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h (L203)
2025-07-19 03:44:14 -03:00
Kazu Hirata
4c85bf2fe8 Revert "[Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types (#143653)"
This reverts commit c27e283cfbca2bd22f34592430e98ee76ed60ad8.

A builbot failure has been reported:
https://lab.llvm.org/buildbot/#/builders/186/builds/10819/steps/10/logs/stdio

I'm also getting a large number of warnings related to %zu and %zx.
2025-07-17 21:04:01 -07:00
YexuanXiao
c27e283cfb
[Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types (#143653)
Including the results of `sizeof`, `sizeof...`, `__datasizeof`,
`__alignof`, `_Alignof`, `alignof`, `_Countof`, `size_t` literals, and
signed `size_t` literals, the results of pointer-pointer subtraction and
checks for standard library functions (and their calls).

The goal is to enable clang and downstream tools such as clangd and
clang-tidy to provide more portable hints and diagnostics.

The previous discussion can be found at #136542.

This PR implements this feature by introducing a new subtype of `Type`
called `PredefinedSugarType`, which was considered appropriate in
discussions. I tried to keep `PredefinedSugarType` simple enough yet not
limited to `size_t` and `ptrdiff_t` so that it can be used for other
purposes. `PredefinedSugarType` wraps a canonical `Type` and provides a
name, conceptually similar to a compiler internal `TypedefType` but
without depending on a `TypedefDecl` or a source file.

Additionally, checks for the `z` and `t` format specifiers in format
strings for `scanf` and `printf` were added. It will precisely match
expressions using `typedef`s or built-in expressions.

The affected tests indicates that it works very well.

Several code require that `SizeType` is canonical, so I kept `SizeType`
to its canonical form.

The failed tests in CI are allowed to fail. See the
[comment](https://github.com/llvm/llvm-project/pull/135386#issuecomment-3049426611)
in another PR #135386.
2025-07-17 22:45:57 -03:00
zGoldthorpe
85349b4936
[clang][amdgpu] Add builtin for struct buffer lds load (#148950)
This is essentially just a revision of #137678 which only exposes a
builtin for the intrinsic `llvm.amdgcn.struct.ptr.buffer.load.lds`,
which expects an `__amdgpu_buffer_rsrc_t` rather than a `v4i32` as its
first argument.

The reason for excluding the other intrinsics exposed by the cited PR is
because the intrinsics taking a `v4i32` are legacy and should be
deprecated.
2025-07-16 07:23:09 -06:00
Changpeng Fang
c962f2b29d
AMDGPU: Implement builtins for gfx1250 wmma instructions (#148991)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 18:17:12 -07:00
Changpeng Fang
8c1b516948
AMDGPU: Implement s_wait_asynccnt and s_wait_tensorcnt for gfx1250 (#148292)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2025-07-11 15:07:21 -07:00
Shilei Tian
d258457d42
[AMDGPU] Add support for v_cvt_f32_fp8 on gfx1250 (#147579)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-08 16:21:24 -04:00
Changpeng Fang
eda3161c35
AMDGPU: Implement tensor load and store instructions for gfx1250 (#146636)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-03 13:49:34 -07:00
Juan Manuel Martinez Caamaño
03919ef8d9
[Clang][OpenCL] Declare cl_amd_media_ops/cl_amd_media_ops2 builtins with -fdeclare-opencl-builtins (#143507)
cl_amd_media_ops/cl_amd_media_ops2 builtins are currently not declared
with -fdeclare-opencl-builtins.

This patch adds support for these builtins.
2025-07-02 16:55:24 +02:00
Shilei Tian
749c7c5dc4
[AMDGPU] Add support for v_cvt_f16_bf8 on gfx1250 (#146305)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-06-30 07:54:55 -04:00
Shilei Tian
a99c964d7f
[AMDGPU] Add support for v_cvt_f16_fp8 on gfx1250 (#146302)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-06-30 07:51:00 -04:00
Changpeng Fang
bb982e733c
AMDGPU: support s_monitor_sleep on gfx1250 (#146293)
Co-Authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-06-29 21:01:47 -07:00
Stanislav Mekhanoshin
40eee8ec7f
[AMDGPU] Add s_setprio_inc_wg gfx1250 instruction (#145152) 2025-06-22 12:52:05 -07:00
Aaron Ballman
9eef4d1c5f
Remove delayed typo expressions (#143423)
This removes the delayed typo correction functionality from Clang
(regular typo correction still remains) due to fragility of the
solution.

An RFC was posted here:
https://discourse.llvm.org/t/rfc-removing-support-for-delayed-typo-correction/86631
and while that RFC was asking for folks to consider stepping up to be
maintainers, and we did have a few new contributors show some interest,
experiments show that it's likely worth it to remove this functionality
entirely and focus efforts on improving regular typo correction.

This removal fixes ~20 open issues (quite possibly more), improves
compile time performance by roughly .3-.4%
(https://llvm-compile-time-tracker.com/?config=Overview&stat=instructions%3Au&remote=AaronBallman&sortBy=date),
and does not appear to regress diagnostic behavior in a way we wouldn't
find acceptable.

Fixes #142457
Fixes #139913
Fixes #138850
Fixes #137867
Fixes #137860
Fixes #107840
Fixes #93308
Fixes #69470
Fixes #59391
Fixes #58172
Fixes #46215
Fixes #45915
Fixes #45891
Fixes #44490
Fixes #36703
Fixes #32903
Fixes #23312
Fixes #69874
2025-06-13 06:45:40 -04:00
Krzysztof Drewniak
4bdd116b80
[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)
This PR adds a amdgns_load_to_lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.

This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).

It also plumbs the intrinsic through to clang.
2025-05-19 07:15:04 -07:00
Aaron Ballman
a635bbf141
[C] Update the -Wdefault-const-init-unsafe wording (#138266)
This drops the "and is incompatible with C++" phrasing from the
diagnostic unless -Wc++-compat is explicitly passed. This makes the
diagnostic less confusing when it is on by default rather than enabled
because of C++ compatibility concerns
2025-05-02 12:23:40 -04:00
Aaron Ballman
576161cb60
[C] Warn on uninitialized const objects (#137166)
Unlike C++, C allows the definition of an uninitialized `const` object.
If the object has static or thread storage duration, it is still
zero-initialized, otherwise, the object is left uninitialized. In either
case, the code is not compatible with C++.

This adds a new diagnostic group, `-Wdefault-const-init-unsafe`, which
is on by default and diagnoses any definition of a `const` object which
remains uninitialized.

It also adds another new diagnostic group, `-Wdefault-const-init` (which
also enabled the `unsafe` variant) that diagnoses any definition of a
`const` object (including ones which are zero-initialized). This
diagnostic is off by default.

Finally, it adds `-Wdefault-const-init` to `-Wc++-compat`. GCC diagnoses
these situations under this flag.

Fixes #19297
2025-04-25 08:21:41 -04:00
Chris B
52e0337ea3
[HLSL][OpenCL] Strip addrspace from implicit cast diags (#135830)
The address space of a source value for an implicit cast isn't really
relevant when emitting conversion warnings. Since the lvalue->rvalue
cast effectively removes the address space they don't factor in, but
they do create visual noise in the diagnostics.

This is a small quality-of-life fixup to get in as HLSL adopts more
address space annotations.
2025-04-16 12:13:19 -05:00
Matheus Izvekov
fceb9cecdf
[clang] consistently quote expressions in diagnostics (#134769) 2025-04-15 04:18:23 -03:00
Juan Manuel Martinez Caamaño
d6c1ef576f
[AMDGPU] vmem-to-lds-load-insts incoherence between TargetParser and AMDGPU.td (#135376)
The vmem-to-lds-loads-insts feature is only available on gfx9/10. While
target-parser was also enabling it for gfx6,7,8.
2025-04-11 16:31:04 +02:00
Jay Foad
e2fe78797f
[Clang] Use "syncscope" instead of "synchscope". NFC. (#134616)
This matches the spelling of the keyword in LLVM IR.
2025-04-07 13:32:36 +01:00
Juan Manuel Martinez Caamaño
041e84261a
[Clang][AMDGPU] Expose buffer load lds as a clang builtin (#132048)
CK is using either inline assembly or inline LLVM-IR builtins to
generate buffer_load_dword lds instructions.

This patch exposes this instruction as a Clang builtin available on gfx9 and gfx10.

Related to SWDEV-519702 and SWDEV-518861
2025-04-03 09:22:38 +02:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
Juan Manuel Martinez Caamaño
0375ef07c3
[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741)
This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as
a 4-bit signed integer and returns `0.0625f * src`.

SWDEV-518861
2025-04-02 19:51:40 +02:00
Alexander Shaposhnikov
2dc123b33d
[clang][opencl] Allow passing all zeros to reqd_work_group_size (#131543)
Allow passing all zeros to reqd_work_group_size.

Test plan: ninja check-all
2025-03-16 16:21:46 -07:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Sven van Haastregt
957213f60b
[OpenCL] Diagnose block references in selection operator (#114824)
In addition to the invocation case that is already diagnosed, also
diagnose when a block reference appears on either side of a ternary
selection operator.

Until now, clang would accept the added test case only to crash during
code generation.
2025-01-22 09:49:56 +01:00
Matt Arsenault
a2c3e0c4cb
AMDGPU/clang: Add global_load_lds size check support for gfx950 (#117825)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:41:09 -05:00
Matt Arsenault
5615657209
AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:37:05 -05:00
Matt Arsenault
62dc8f3069
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 23:33:07 -05:00
Matt Arsenault
265e209ceb
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_{bf8|fp8}_{f16|bf16|f32} (#117821)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:24:01 -05:00
Matt Arsenault
76715787f4
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_pk_fp4 instructions (#117798)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 19:59:14 -05:00
Matt Arsenault
c8ee1ee057
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk_fp4_{f|bf}16 for gfx950 (#117794)
These instructions have non-standard use of OPSEL bits to select
dest write byte. The src2_modifiers operand is used without having
its corresponding src2 operand by introducing dummy src2.

OPSEL ASM OPSEL Syntax: opsel:[a,b,c,d]
a & b are meaningless, c & d together decides byte to write in dst reg.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:38:23 -05:00
Matt Arsenault
065dc93d96
AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 for gfx950 (#117793)
OPSEL[0] selects src_word to read.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:35:18 -05:00
Matt Arsenault
eeb76880f3
AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{f|bf}16_fp4 for gfx950 (#117744)
OPSEL ASM Syntax for v_cvt_scalef32_pk_{f|bf}16_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.

Note: Conventional Inst{13} i.e. OPSEL[2] is ignored in asm syntax.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:23:15 -05:00
Matt Arsenault
2b9e947d43
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.

OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d]
where, c & d i.e. OPSEL[3 : 2] selects which dst_byte  to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:20:09 -05:00
Matt Arsenault
4527894143
Builtins & Codegen support for v_cvt_scalef32_pk_{fp|bf}8_{f|bf}16 for gfx950 (#117742)
OPSEL[3] determines low/high 16 bits of word to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:16:08 -05:00
Matt Arsenault
62584f32eb
AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_f32_{fp8|bf8} for gfx950 (#117741)
OPSEL[0] determines low/high 16 bits of src0 to read.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:12:18 -05:00
Matt Arsenault
803bd812b1
AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_{fp8|bf8}_f32 for gfx950 (#117740)
OPSEL[3] determines low/high 16 bits of word to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 14:57:09 -05:00
Matt Arsenault
815069c701
AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739)
OPSEL[1:0] collectively decide which byte to read
from src input.

Builtin takes additional imm argument which
represents index (with valid values:[0:3]) of src
byte read. Out of bounds checks will added in next
patch.

OPSEL ASM Syntax: opsel:[x,y,z]
where,
    opsel[x] = Inst{11} = src0_modifier{2}
    opsel[y] = Inst{12} = src1_modifier{2}
    opsel[z] = Inst{14} = src0_modifier{3}

Note: Inst{13} i.e. OPSEL[2] is ignored in
asm syntax and opsel[z] is meaningless
for v_cvt_scalef32_f32_{fp|bf}8

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 14:54:10 -05:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Matt Arsenault
7d544c64e3
AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (#117259) 2024-11-22 12:11:06 -08:00