44 Commits

Author SHA1 Message Date
Alex Voicu
b08b56381c
[NFC][AMDGPU] Clean-up feature parsing for AMDGCNSPIRV. (#123519)
When we did the initial AMDGCNSPIRV commits we left the initialisation
of the feature map in a relatively disorderly state. This change
corrects that oversight:

- We make sure that AMDGCNSPIRV actually advertises the union of all
AMDGCN features, as some were not included;
- We keep feature initialisation in sorted order to make it easy to pick
an insertion point when features are added in the future.
2025-01-20 02:30:29 +00:00
Matt Arsenault
5615657209
AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
2024-11-26 23:37:05 -05:00
Matt Arsenault
62dc8f3069
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 23:33:07 -05:00
Matt Arsenault
0f4fcca546
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp|bf]6 for gfx950 (#117745)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:26:07 -05:00
Matt Arsenault
2b9e947d43
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.

OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d]
where, c & d i.e. OPSEL[3 : 2] selects which dst_byte  to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:20:09 -05:00
Matt Arsenault
815069c701
AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739)
OPSEL[1:0] collectively decide which byte to read
from src input.

Builtin takes additional imm argument which
represents index (with valid values:[0:3]) of src
byte read. Out of bounds checks will added in next
patch.

OPSEL ASM Syntax: opsel:[x,y,z]
where,
    opsel[x] = Inst{11} = src0_modifier{2}
    opsel[y] = Inst{12} = src1_modifier{2}
    opsel[z] = Inst{14} = src0_modifier{3}

Note: Inst{13} i.e. OPSEL[2] is ignored in
asm syntax and opsel[z] is meaningless
for v_cvt_scalef32_f32_{fp|bf}8

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 14:54:10 -05:00
Matt Arsenault
7fc71f7909
AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:54:50 -08:00
Matt Arsenault
716364ebd6
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:51:01 -08:00
Matt Arsenault
aa7eb5723c
AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597)
v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16.
All three instructions were part of Dot9 instructions in the compiler.

This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16)
into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16).

All necessary changes to gfx11 and gfx12 are updated to reflect this change.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:47:48 -08:00
Matt Arsenault
5d650a62a3
AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596)
This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:44:47 -08:00
Matt Arsenault
22503a9df1
AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25 19:27:01 -08:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Matt Arsenault
ca1b35a6c8
AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)
Rand num instruction for stochastic rounding.
2024-11-18 10:54:54 -08:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Stanislav Mekhanoshin
f363e30f15
[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633) 2024-07-09 14:25:58 -07:00
Alex Voicu
88e2bb4092
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
2024-06-07 11:50:23 +01:00
Shilei Tian
1ca0055f45
[AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Konstantin Zhuravlyov
775f1cd34d
AMDGPU: Add gfx12-generic target (#93875) 2024-05-31 12:46:44 -04:00
Changpeng Fang
96813de52d
AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248)
FeatureDot11Insts (dot11-insts) for:
  v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8,
  v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8
2024-03-06 14:37:03 -08:00
Pierre van Houtryve
43c7eb5d7b
[AMDGPU] Replace '.' with '-' in generic target names (#81718)
The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs &
the associated clang driver logic are also confused by the dot.

After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.
2024-02-14 15:19:04 +01:00
Pierre van Houtryve
f93aa5157a
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.

Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and #76954
as well.
2024-02-12 10:18:20 +01:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Jay Foad
e21b0b083e
[AMDGPU] Remove gws feature from GFX12 (#78711)
This was already done for LLVM. This patch just updates the Clang
builtin handling to match.
2024-01-19 15:45:53 +00:00
Jay Foad
ed12388082
[AMDGPU] Do not emit V_DOT2C_F32_F16_e32 on GFX12 (#78709)
That instruction is not supported on GFX12.
Added a testcase which previously crashed without this change.

Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>
2024-01-19 14:36:27 +00:00
Mariusz Sikora
3e6589f21c
[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917)
- image_atomic_pk_add_f16
- image_atomic_pk_add_bf16
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16
- buffer_atomic_pk_add_bf16
2024-01-18 14:01:09 +01:00
Mariusz Sikora
264fd9e13e
[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439) 2024-01-18 08:46:53 +01:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Jay Foad
9b374a800d [AMDGPU] Add some clang-format off/on markers
This keeps clang-format happy on future patches.
2023-11-23 09:50:55 +00:00
Jay Foad
e0d93d5aaa [AMDGPU] Reindent some tables
This keeps clang-format happy on future patches.
2023-11-23 09:49:03 +00:00
Ivan Kosarev
096eba148d
[TargetParser][AMDGPU] Fix getArchEntry(). (#69222)
It's supposed to return null when an unknown target id is passed.
2023-10-17 14:54:29 +01:00
Yaxun (Sam) Liu
b8a9c50f22 [AMDGPU] Add target feature gws to clang
Reviewed by: Matt Arsenault

Differential Revision: https://reviews.llvm.org/D158367
2023-08-25 11:50:47 -04:00
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Yaxun (Sam) Liu
c0f0d50653 [HIP] emit macro __HIP_NO_IMAGE_SUPPORT
HIP texture/image support is optional as some devices
do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT
is defined for device not supporting images (d0448aa4c4/docs/reference/kernel_language.md (L426) )

Currently the macro is defined by HIP header based on predefined macros
for GPU, e.g __gfx*__ , which is error prone. This patch let clang
emit the predefined macro.

Reviewed by: Matt Arsenault, Artem Belevich

Differential Revision: https://reviews.llvm.org/D151349
2023-06-14 22:53:41 -04:00
Kazu Hirata
143e131aa7 Do not unnecessarily include StringSwitch.h 2023-06-11 13:19:22 -07:00
Yaxun (Sam) Liu
6adb9a0602 [AMDGPU] Emit predefined macro __AMDGCN_CUMODE__
Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes.

If WGP mode is not supported, ignore -mno-cumode and emit a warning.

This is needed for implementing device functions like __smid
(312dff7b79/include/hip/amd_detail/amd_device_functions.h (L957))

Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner

Differential Revision: https://reviews.llvm.org/D145343
2023-05-12 18:50:52 -04:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Dominik Adamski
e43247dd32 [Clang][Flang][AMDGPU] Add support for AMDGPU to Flang driver
Scope of changes:
  1) Extract common code between Clang and Flang for parsing AMDGPU features
  2) Add function which adds implicit target features for AMDGPU as Clang does
  3) Add AMDGPU target as one of valid targets for Flang

Differential Revision: https://reviews.llvm.org/D145579

Reviewed By: yaxunl, awarzynski
2023-03-29 02:23:37 -05:00
Francesco Petrogalli
ac1ffd3cac [TargetParser] Generate the defs for RISCV CPUs using llvm-tblgen.
Rework the change to prevent build failures. NFCI.

The failing code was submitted as
cf7a8305a2b4ddfd299c748136cb9a2960ef7089 and reverted via
8bd65e535fb33bc48805bafed8217b16a853e158.

The rework in this new commit prevents failures like the following:

FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/Targets/RISCV.cpp.o
/usr/bin/c++  [bunch of non interesting stuff]  -c <path-to>/llvm-project/clang/lib/Basic/Targets/RISCV.cpp
In file included from <path-to>/llvm-project/clang/lib/Basic/Targets/RISCV.cpp:19:
<path-to>/llvm-project/llvm/include/llvm/TargetParser/RISCVTargetParser.h:29:10: fatal error: llvm/TargetParser/RISCVTargetParserDef.inc: No such file or directory
  29 | #include "llvm/TargetParser/RISCVTargetParserDef.inc"
     |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These failures happen because the library LLVMTargetParser depends on
RISCVTargetParserTableGen, which is a tablegen target that generates
the list of CPUs in
llvm/TargetParser/RISCVTargetParserDef.inc. This *.inc file is
included by the public header file
llvm/TargetParser/RISCVTargetParser.h.

The header file llvm/TargetParser/RISCVTargetParser.h is also used in
components (clangDriver and clangBasic) that link into
LLVMTargetParser, but on some configurations such components might end
up being built before TargetParser is ready.

The fix is to make sure that clangDriver and clangBasic depend on the
tablegen target RISCVTargetParserTableGen, which generates the .inc
file whether or not LLVMTargetParser is ready.

WRT the original patch at https://reviews.llvm.org/D137517, this
commit is just adding RISCVTargetParserTableGen in the DEPENDS list of
clangDriver and clangBasic.
2023-01-11 11:18:44 +01:00
Francesco Petrogalli
8bd65e535f Revert "[TargetParser] Generate the defs for RISCV CPUs using llvm-tblgen."
This reverts commit cf7a8305a2b4ddfd299c748136cb9a2960ef7089.
2023-01-11 10:22:56 +01:00
Francesco Petrogalli
cf7a8305a2 [TargetParser] Generate the defs for RISCV CPUs using llvm-tblgen.
This patch removes the file `llvm/include/llvm/TargetParser/RISCVTargetParser.def` and replaces it with a tablegen-generated `.inc` file out of `llvm/lib/Target/RISCV/RISCV.td`.

The module system has been updated to make sure we can build clang/llvm with `-DLLVM_ENABLE_MODULES=On`

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137517
2023-01-11 10:00:04 +01:00
Archibald Elliott
f09cf34d00 [Support] Move TargetParsers to new component
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
  component into a new LLVM Component called "TargetParser". This
  potentially enables using tablegen to maintain this information, as
  is shown in https://reviews.llvm.org/D137517. This cannot currently
  be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
  information in the TargetParser:
  - `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
    the current Host machine for info about it, primarily to support
    getting the host triple, but also for `-mcpu=native` support in e.g.
    Clang. This is fairly tightly intertwined with the information in
    `X86TargetParser.h`, so keeping them in the same component makes
    sense.
  - `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
    the target triple parser and representation. This is very intertwined
    with the Arm target parser, because the arm architecture version
    appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.

And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM

Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.

If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.

Differential Revision: https://reviews.llvm.org/D137838
2022-12-20 11:05:50 +00:00