546 Commits

Author SHA1 Message Date
Matt Arsenault
90dc644d73
AMDGPU: Add v_smfmac_f32_32x32x64_fp8_bf8 for gfx950 (#117258) 2024-11-22 12:08:15 -08:00
Matt Arsenault
8d3435f8a1
AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (#117257) 2024-11-22 12:02:18 -08:00
Matt Arsenault
8a5c24149d
AMDGPU: Add v_smfmac_f32_32x32x64_bf8_bf8 for gfx950 (#117256) 2024-11-22 11:59:06 -08:00
Matt Arsenault
836d2dcf60
AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (#117235) 2024-11-21 17:06:06 -08:00
Matt Arsenault
33124910c9
AMDGPU: Add v_smfmac_f32_16x16x128_fp8_bf8 for gfx950 (#117234) 2024-11-21 17:03:03 -08:00
Matt Arsenault
3678f8a8aa
AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (#117233) 2024-11-21 17:00:08 -08:00
Matt Arsenault
7baadb2a4e
AMDGPU: Add v_smfmac_f32_16x16x128_bf8_bf8 for gfx950 (#117232) 2024-11-21 16:57:01 -08:00
Matt Arsenault
3e6f3508ad
AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (#117214) 2024-11-21 15:01:03 -08:00
Matt Arsenault
8c53036146
AMDGPU: Add v_smfmac_i32_16x16x128_i8 for gfx950 (#117213) 2024-11-21 14:58:11 -08:00
Matt Arsenault
42dd114a46
AMDGPU: Add v_smfmac_f32_32x32x32_bf16 for gfx950 (#117212) 2024-11-21 14:52:11 -08:00
Matt Arsenault
95ddc1a63b
AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (#117211) 2024-11-21 14:46:43 -08:00
Matt Arsenault
e50eaa2cf1
AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (#117205) 2024-11-21 14:43:33 -08:00
Matt Arsenault
2ab178820b
AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (#117202) 2024-11-21 14:40:30 -08:00
Matt Arsenault
1c47d67abc
AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (#117053) 2024-11-21 14:28:05 -08:00
Matt Arsenault
f4ed79b160
AMDGPU: Add v_mfma_i32_32x32x32_i8 for gfx950 (#117052) 2024-11-21 09:08:15 -08:00
Matt Arsenault
76b24640e5
AMDGPU: Add v_mfma_i32_16x16x64_i8 for gfx950 (#116728) 2024-11-21 09:02:12 -08:00
Matt Arsenault
01c9a14ccf
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
2024-11-21 08:51:58 -08:00
Matt Arsenault
130a3150ec
AMDGPU: Define v_mfma_f32_32x32x16_bf16 for gfx950 (#116679)
Unlike the existing gfx940 intrinsics using short/i16 in place of
bfloat, this uses the natural bfloat type.
2024-11-18 21:53:56 -08:00
Matt Arsenault
0c421687f8
AMDGPU: Add first gfx950 mfma instructions (#116312)
Scheduling info and hazards are wrong and TBD.
2024-11-18 13:38:07 -08:00
Krzysztof Parzyszek
e44c28f07e
[clang] Replace "can't" and "can not" in diagnostics with "cannot" (#116623)
See
https://discourse.llvm.org/t/cant-cannot-can-not-in-diagnostic-messages/83171
2024-11-18 15:28:17 -06:00
Stanislav Mekhanoshin
ba1a09da8d
[AMDGPU] Allow overload of __builtin_amdgcn_mov_dpp8 (#113610)
The same handling as for __builtin_amdgcn_mov_dpp.
2024-10-31 02:19:20 -07:00
Gang Chen
4ac0e7e400
[AMDGPU] Add a type for the named barrier (#113614) 2024-10-25 11:24:47 -07:00
Stanislav Mekhanoshin
03fef62b84
[AMDGPU] Relax __builtin_amdgcn_update_dpp sema check (#113341)
A recent change applied too strict a check that the old and src operands match.
These shall be compatible, but not necessarily exactly the same.

Fixes: SWDEV-493072
2024-10-22 12:32:08 -07:00
Stanislav Mekhanoshin
622e398d88
[AMDGPU] Allow overload of __builtin_amdgcn_mov/update_dpp (#112447)
We need to support 64-bit data types (the intrinsics do support them). We were
also silently converting FP arguments to integer; that is fixed as well.
2024-10-21 11:57:18 -07:00
Aaron Ballman
1881f648e2
Remove ^^ as a token in OpenCL (#108224)
OpenCL has a reserved operator (^^), the use of which was diagnosed as
an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL
also encourages working with the blocks language extension. This token
has a parsing ambiguity as a result. Consider:

  unsigned x=0;
  unsigned y=x^^{return 0;}();

This should result in y holding the value zero (0^0) through an
immediately invoked block call as the right-hand side of the xor
operator. However, it causes errors instead because of this reserved
token: https://godbolt.org/z/navf7jTv1

This token is still reserved in OpenCL 3.0, so we still wish to issue a
diagnostic for its use. However, we do not need to create a token for an
extension point that's been unused for about a decade. So this patch
moves the diagnostic from a parsing diagnostic to a lexing diagnostic
and no longer forms a single token. The diagnostic behavior is slightly
worse as a result, but still seems acceptable.

Part of the reason this is coming up is because WG21 is considering
using ^^ as a token for reflection, so this token may come back in the
future.
2024-09-16 07:46:58 -04:00
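The tokenization difference described in the `^^` commit above can be illustrated with a toy lexer. This is only a sketch, not Clang's lexer: the `lex` function, its `caret_caret_is_token` flag, and the tiny token set are assumptions made for the example.

```python
import re

def lex(src, caret_caret_is_token):
    """Split src into tokens with a tiny regex lexer (toy model, not Clang)."""
    # When "^^" is a token it must be listed before "^" so it wins the match.
    ops = ["^^", "^"] if caret_caret_is_token else ["^"]
    pattern = "|".join(
        [r"[A-Za-z_]\w*", r"\d+"]
        + [re.escape(op) for op in ops]
        + [r"[=;{}()]"]
    )
    return re.findall(pattern, src)

src = "y=x^^{return 0;}();"
old_tokens = lex(src, caret_caret_is_token=True)   # "^^" forms a single token
new_tokens = lex(src, caret_caret_is_token=False)  # two separate "^" tokens
```

With `^^` as a single token the parser never sees the caret that starts the block literal. With two `^` tokens, the first can be parsed as xor and the second as the start of the block.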
Shilei Tian
af5352fe8e
[Clang][AMDGPU] Use unsigned data type for __builtin_amdgcn_raw_buffer_store_* (#99546) 2024-07-18 16:34:59 -04:00
Shilei Tian
892c58cf74
[Clang][AMDGPU] Add builtins for intrinsic llvm.amdgcn.raw.ptr.buffer.load (#99258) 2024-07-18 15:33:03 -04:00
Stanislav Mekhanoshin
f363e30f15
[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633) 2024-07-09 14:25:58 -07:00
Shilei Tian
c9f083a994
[Clang][AMDGPU] Add builtins for intrinsic llvm.amdgcn.raw.ptr.buffer.store (#94576)
Depends on https://github.com/llvm/llvm-project/pull/96313.
2024-06-25 09:55:37 -04:00
Shilei Tian
ad599211a7
[Clang][AMDGPU] Add a new builtin type for buffer rsrc (#94830)
This patch adds a new builtin type for AMDGPU's buffer rsrc data type,
which is effectively an AS 8 pointer. This is needed because we'd like
to expose certain intrinsics to users via builtins which take buffer
rsrc as argument.
2024-06-18 20:46:53 -04:00
Alex Voicu
88e2bb4092
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, in which case the extant SPIRV datalayout
string would lead to pointers to functions pointing to the private
address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
2024-06-07 11:50:23 +01:00
Shilei Tian
170d45c0eb
[Clang][AMDGPU] Use I to decorate imm argument for __builtin_amdgcn_global_load_lds (#94376) 2024-06-06 09:29:59 -04:00
Pierre van Houtryve
c1ac6d2dd4
[AMDGPU] Add amdgpu-as MMRA for fences (#78572)
Using MMRAs, allow `__builtin_amdgcn_fence` to emit fences that only
target one or more address spaces, instead of fencing all address spaces
at once.

This is done through an `amdgpu-as` MMRA. Currently focused on OpenCL
fences, but can very easily support more AS names and codegen on more
than just fences.
2024-05-27 12:17:04 +02:00
Aaron Ballman
b49ce9c304
Fix more diagnostic wording for style; NFC (#93190)
This tries to fix all of the places where a diagnostic message starts
with a capital letter (other than acronyms or proper nouns) or ends
with punctuation (other than a question mark).

This is in support of a planned change to tablegen to start diagnosing
incorrect diagnostic message styles.
2024-05-23 14:50:29 -04:00
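The style rule in the diagnostic-wording commit above can be approximated by a small checker. This is a hedged sketch, not the planned tablegen check: the function name `style_problems`, the all-caps acronym heuristic, and the `proper_nouns` allowlist are assumptions introduced for illustration.

```python
def style_problems(message, proper_nouns=frozenset()):
    """Return a list of style problems found in one diagnostic message."""
    problems = []
    text = message.strip()
    if not text:
        return problems
    first_word = text.split()[0]
    # Assumption: an all-caps first word of 2+ characters is an acronym.
    looks_like_acronym = len(first_word) > 1 and first_word.isupper()
    if (first_word[0].isupper()
            and not looks_like_acronym
            and first_word not in proper_nouns):
        problems.append("starts with a capital letter")
    # A trailing question mark is allowed; other punctuation is not.
    if text[-1] in ".!,;:":
        problems.append("ends with punctuation")
    return problems
```

Proper nouns cannot be detected mechanically here, so the caller passes them in, for example `style_problems("OpenCL version is unsupported", proper_nouns={"OpenCL"})`.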
Shilei Tian
7e476eb11c
[AMDGPU][Clang] Add check of size for __builtin_amdgcn_global_load_lds (#93064) 2024-05-23 10:36:03 -04:00
Fangrui Song
7c1d9b15ee
[test] %clang_cc1: remove redundant actions 2024-05-04 23:08:11 -07:00
Timm Bäder
3590ede848
[clang][Interp] Support vec_step 2024-04-29 18:21:24 +02:00
Mariya Podchishchaeva
8697bbe2d4
[clang] Use CPlusPlus language option instead of Bool (#80975)
As it was pointed out in
https://github.com/llvm/llvm-project/pull/80724, we should not be
checking `getLangOpts().Bool` when determining something related to
logical operators, since it only indicates that the bool keyword is present,
not which semantics logical operators have. As a side effect, a missing
`-Wpointer-bool-conversion` in OpenCL C was restored since, like C23,
OpenCL C has the bool keyword but logical operators still return int.
2024-02-08 14:31:57 +01:00
Joseph Huber
d1722868d3
[Clang] Make AMDGPU OpenCL tests require AMD registered target
Summary:
These tests likely always failed but were hidden by the expected return
value. Simply make them require AMDGPU as a registered target so they
don't fail on other machines.
2024-02-05 09:08:31 -06:00
Joseph Huber
5249379d74
[AMDGPU] Allow w64 ballot to be used on w32 targets (#80183)
Summary:
Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64
targets even though it is valid. This is relevant for making library
code that can handle both without needing to check the wavefront size.
This patch relaxes the semantic check for w64 so it can be used
normally.
2024-02-05 08:42:28 -06:00
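Why a 64-bit ballot is still meaningful on a wave32 target can be modeled on the host. The sketch below is not the builtin itself: `ballot_w64` here is a plain Python stand-in, and the assumption that a wave32 result simply leaves the upper 32 bits clear is mine, not stated in the commit.

```python
def ballot_w64(lane_predicates):
    """Pack per-lane booleans into a 64-bit mask; lane i sets bit i."""
    if len(lane_predicates) not in (32, 64):
        raise ValueError("expected 32 or 64 lanes")
    mask = 0
    for lane, active in enumerate(lane_predicates):
        if active:
            mask |= 1 << lane
    return mask

# Wave32: only the low 32 bits can ever be set (assumed behavior).
wave32_mask = ballot_w64([True] * 32)

# Wave64: lanes 32..63 contribute to the upper half of the mask.
wave64_lanes = [False] * 64
wave64_lanes[40] = True
wave64_mask = ballot_w64(wave64_lanes)
```

Library code written against the 64-bit result can then count or scan set bits without checking the wavefront size, which is the use case the commit describes.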
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Yaxun (Sam) Liu
00448a548c
[clang] Allow fp in atomic fetch max/min builtins
LLVM IR already allows floating point type in atomicrmw.
Update clang atomic fetch max/min builtins to accept
floating point type like we did for fetch add/sub.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D150985

Fixes: SWDEV-401056
2023-05-31 15:19:31 -04:00
Tobias Hieta
dd3c26a045
[NFC][Py Reformat] Reformat python files in clang and clang-tools-extra
This is an ongoing series of commits that are reformatting our
Python code.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D150761
2023-05-23 08:29:52 +02:00
Aaron Ballman
5d8aaad445
[C2x] Implement support for empty brace initialization (WG14 N2900 and WG14 N3011)
This implements support for allowing {} to consistently zero initialize
objects. We already supported most of this work as a GNU extension, but
the C2x feature goes beyond what the GNU extension allowed.

The changes in this patch are:

* Removed the -Wgnu-empty-initializer warning group. The extension is
  now a C2x extension warning instead. Note that use of
  `-Wno-gnu-empty-initializer` seems to be quite low in the wild
(https://sourcegraph.com/search?q=context%3Aglobal+-file%3A.*test.*+%22-Wno-gnu-empty-initializer%22&patternType=standard&sm=1&groupBy=repo
  which currently only gives 8 hits total), so this is not expected to
  be an overly disruptive change. But I'm adding the clang vendors
  review group just in case this expectation is wrong.
* Reworded the diagnostic wording to be about a C2x extension, added a
  pre-C2x compat warning.
* Allow {} to zero initialize a VLA

This functionality is exposed as an extension in all older C modes
(same as the GNU extension was), but does *not* allow the extension for
VLA initialization in C++ due to concern about handling non-trivially
constructible types.

Differential Revision: https://reviews.llvm.org/D147349
2023-04-03 15:22:52 -04:00
Ayal Zaks
eae70ccbf9
[Clang][OpenCL] Allow pointers in structs as kernel arguments from 2.0
Structs that contain global or local pointers can be passed as kernel
arguments starting with OpenCL v2.0, which introduced shared virtual memory.

Differential Revision: https://reviews.llvm.org/D143849
2023-03-13 18:59:26 +02:00
Nikita Popov
8421307b6b
[Clang] Convert some tests to opaque pointers (NFC) 2023-02-16 15:48:10 +01:00
Sven van Haastregt
a60b8f4681
[OpenCL] Allow undefining header-only features
`opencl-c-base.h` always defines 5 particular feature macros for
SPIR-V, making it impossible to disable those features.

To allow disabling any of those features, let the header recognize
`__undef_<feature>` macros.  The user can then pass the
`-D__undef_<feature>` flag on the command line to disable a specific
feature.  The __undef macro could potentially also be set from
`-cl-ext=-feature`, but for now only change the header and only
provide __undef macros for the 5 features that are always enabled in
`opencl-c-base.h`.

Differential Revision: https://reviews.llvm.org/D141297
2023-01-16 11:32:12 +00:00
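The `__undef_<feature>` mechanism from the commit above can be modeled in a few lines. This is a sketch of the logic only, not the contents of `opencl-c-base.h`: the names in `ALWAYS_DEFINED` are hypothetical placeholders, and `visible_feature_macros` is an invented helper.

```python
# Hypothetical stand-ins for the feature macros the header always defines.
ALWAYS_DEFINED = ("feature_a", "feature_b", "feature_c")

def visible_feature_macros(command_line_defines):
    """Return the feature macros still defined after the header is processed."""
    defined = set(ALWAYS_DEFINED)
    for feature in ALWAYS_DEFINED:
        # Passing -D__undef_<feature> asks the header to drop that feature.
        if "__undef_" + feature in command_line_defines:
            defined.discard(feature)
    return sorted(defined)
```

For example, `visible_feature_macros({"__undef_feature_b"})` leaves only `feature_a` and `feature_c` visible, mirroring a `-D__undef_feature_b` flag on the command line.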
Matt Arsenault
e630d9b299
AMDGPU/clang: Remove target features from address space test builtins
It turns out we can codegen these on targets without flat addressing,
although the runtime probably didn't put anything useful there. The
proper diagnostic would be to disallow flat pointer uses or languages
with them, not this one edge case. Allows removing one of the special
cases requiring subtarget support in the device libraries.
2022-12-29 18:46:41 -05:00
Matt Arsenault
f4bcd7f598
AMDGPU/clang: Add builtins for llvm.amdgcn.ballot
Use explicit _w32/_w64 suffixes for the wave size to be consistent
with the existing other wave dependent intrinsics. Also start
diagnosing trying to use both wave32 and wave64.

I would have preferred to avoid the +wavefrontsize64 spam on targets
where that's the only option, but avoiding this seems to be more work
than I expected.
2022-12-29 17:58:55 -05:00
Xiang Li
7e04c0ad63
[HLSL] Add groupshare address space.
Added keyword, LangAS and TypeAttribute for groupshared.

Translate it to LangAS with asHLSLLangAS.

Make sure it is translated into address space 3 for DirectX target.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D135060
2022-10-20 09:29:09 -07:00