1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
The reland patch addressed the comment
https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120
This reverts commit 9208b343e962b9f1140ee345c0050a3920bdcbf2.
TargetParser shouldn't re-run the PPC subtarget tablegen target, it
should define its own `-gen-ppc-target-def` rule like all the other
targets do in llvm/include/llvm/TargetParser/CMakeLists.txt .
One user reported that there are incorrect CMake dependencies after this
change, so I will roll this back in the meantime.
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.
This feature is only available on gfx9/10.
A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.
Documentation changes will follow.
For SWDEV-512631
When we did the initial AMDGCNSPIRV commits we left the initialisation
of the feature map in a relatively disorderly state. This change
corrects that oversight:
- We make sure that AMDGCNSPIRV actually advertises the union of all
AMDGCN features, as some were not included;
- We keep feature initialisation in sorted order to make it easy to pick
an insertion point when features are added in the future.
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z]
where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.
OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d]
where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
OPSEL[1:0] collectively decide which byte to read
from src input.
Builtin takes additional imm argument which
represents index (with valid values:[0:3]) of src
byte read. Out of bounds checks will added in next
patch.
OPSEL ASM Syntax: opsel:[x,y,z]
where,
opsel[x] = Inst{11} = src0_modifier{2}
opsel[y] = Inst{12} = src1_modifier{2}
opsel[z] = Inst{14} = src0_modifier{3}
Note: Inst{13} i.e. OPSEL[2] is ignored in
asm syntax and opsel[z] is meaningless
for v_cvt_scalef32_f32_{fp|bf}8
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16.
All three instructions were part of Dot9 instructions in the compiler.
This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16)
into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16).
All necessary changes to gfx11 and gfx12 are updated to reflect this change.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.
The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs &
the associated clang driver logic are also confused by the dot.
After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.
Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.
This contains the documentation changes for both this change and #76954
as well.
That instruction is not supported on GFX12.
Added a testcase which previously crashed without this change.
Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>
HIP texture/image support is optional as some devices
do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT
is defined for device not supporting images (d0448aa4c4/docs/reference/kernel_language.md (L426) )
Currently the macro is defined by HIP header based on predefined macros
for GPU, e.g __gfx*__ , which is error prone. This patch let clang
emit the predefined macro.
Reviewed by: Matt Arsenault, Artem Belevich
Differential Revision: https://reviews.llvm.org/D151349