198 Commits

Author SHA1 Message Date
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Jay Foad
19f4cec676
[AMDGPU] Add GFX12 encoding for VINTERP instructions (#74616) 2023-12-07 10:15:31 +00:00
Jay Foad
44ff904d21
[AMDGPU] Add VEXPORT encoding for GFX12 (#74615)
In GFX12 the exp instruction is renamed to export, but exp is still
accepted as an alias.

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2023-12-07 10:06:20 +00:00
Mariusz Sikora
19c9f9c0bf
[AMDGPU] GFX12: Add s_prefetch_inst/data instructions (#74448)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-06 10:01:29 +01:00
Jay Foad
1c55b227fe
[AMDGPU] Add GFX12 encoding and aliases for existing SOP (SALU) instructions (#74305) 2023-12-05 10:07:06 +00:00
Mirko Brkušanin
f5868cb6a6
[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062) 2023-12-04 13:04:42 +01:00
Konstantin Zhuravlyov
6cfb64276d
AMDGPU: Minor updates to program resource registers (#69525)
- Be explicit about which program resource register is supported by
which target
    - RSRC1
      - FP16_OVFL is GFX9+
      - WGP_MODE is GFX10+
      - MEM_ORDERED is GFX10+
      - FWD_PROGRESS is GFX10+
    - RSRC3
      - INST_PREF_SIZE is GFX11+
      - TRAP_ON_START is GFX11+
      - TRAP_ON_END is GFX11+
      - IMAGE_OP is GFX11+
  - Do not emit GFX11+ fields when disassembling GFX10 code objects
  - Tighten enforcement of reserved bits in disassembler

---------

Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
2023-10-19 12:40:19 -04:00
Stanislav Mekhanoshin
ab6c3d5034
[AMDGPU] Change the representation of double literals in operands (#68740)
A 64-bit literal can be used as a 32-bit zero or sign extended operand.
In case of double zeroes are added to the low 32 bits. Currently asm
parser stores only high 32 bits of a double into an operand. To support
codegen as requested by the
https://github.com/llvm/llvm-project/issues/67781 we need to change the
representation to store a full 64-bit value so that codegen can simply
add immediates to an instruction.

There is some code to support compatibility with existing tests and asm
kernels. We allow to use short hex strings to represent only a high 32
bit of a double value as a valid literal.
2023-10-12 14:45:45 -07:00
Kazu Hirata
b05dbc4d5f [llvm] Use llvm::endianness::{big,little,native} (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
support::endianness::{big,little,native} with
llvm::endianness::{big,little,native}.
2023-10-10 20:14:20 -07:00
Kazu Hirata
7c9d7b73e4 Revert "[AMDGPU] Add [[maybe_unused]] to several unused functions (NFC)"
This reverts commit fff16807c2e6c64d671b2f6b7b3ae76f5e16e38d.

We no longer need this workaround after
053478bbd0ae5329ea7993261225e6541f728858.
2023-09-25 11:16:47 -07:00
Kazu Hirata
fff16807c2 [AMDGPU] Add [[maybe_unused]] to several unused functions (NFC)
Ivan is planning to introduce actual uses of these functions in near
future.
2023-09-25 11:13:01 -07:00
Ivan Kosarev
9310baa596 [AMDGPU][NFC] Add True16 operand definitions.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D156103
2023-09-25 16:48:46 +01:00
Ivan Kosarev
fab28e0e14 Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.

Analysis of failures on buildbots with expensive checks enabled showed
that the problem was triggered by changes in another commit,
469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug
addressed in #67245.
2023-09-23 22:07:41 +01:00
Ivan Kosarev
6cb3866b1c Revert "[AMDGPU] Introduce real and keep fake True16 instructions."
This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to
failures on expensive checks.
2023-09-22 15:40:26 +01:00
Ivan Kosarev
0f864c7b8b [AMDGPU] Introduce real and keep fake True16 instructions.
The existing fake True16 instructions using 32-bit VGPRs are supposed to
co-exist with real ones until all the necessary True16 functionality is
implemented and relevant tests are updated.

Reviewed By: arsenm, Joe_Nash

Differential Revision: https://reviews.llvm.org/D156101
2023-09-22 10:57:56 +01:00
Mirko Brkušanin
923285b139
[AMDGPU] Add gfx1150 SALU Float instructions (#66884) 2023-09-21 11:22:15 +02:00
Austin Kerbow
69447d6afe [AMDGPU] Add ASM and MC updates for preloading kernargs
Add assembler directives for preloading kernel arguments that correspond
to new fields in the kernel descriptor for the length and offset of
arguments that will be placed in SGPRs prior to kernel launch. Alignment
of the arguments in SGPRs is equivalent to the kernarg segment when
accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated
directly after other user SGPRs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159459
2023-09-19 15:45:16 -07:00
Ivan Kosarev
289ae6525d [AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions.
The patch adds the support for 'noa16' operands in non-A16 variants of
the instructions, fixes validation of A16 operands and eliminates the
custom conversion to MCInst.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155057
2023-07-13 19:46:03 +01:00
Scott Linder
986001c827 [AMDGPU] Improve assembler + disassembler handling of kernel descriptors
* Relax the AsmParser to accept `.amdhsa_wavefront_size32 0` when the
  `.amdhsa_shared_vgpr_count` directive is present.
* Teach the KD disassembler to respect the setting of
  KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32 when calculating the
  value of `.amdhsa_next_free_vgpr`.
* Teach the KD disassembler to disassemble COMPUTE_PGM_RSRC3 for gfx90a
  and gfx10+.
* Include "pseudo directive" comments for gfx10 fields which are not
  controlled by any assembler directive.
* Fix disassembleObject failure diagnostic in llvm-objdump to not
  hard-code a comment string, and to follow the convention of not
  capitalizing the first sentence.

Reviewed By: rochauha

Differential Revision: https://reviews.llvm.org/D128014
2023-07-06 21:20:51 +00:00
Ivan Kosarev
dac5957ca9 [AMDGPU][AsmParser][NFC] Clean up the implementation of KImmFP operands.
addKImmFPOperands() duplicates the KImmFP-specific logic implemented in
addLiteralImmOperand() and therefore can be removed.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D154427
2023-07-05 10:43:41 +01:00
Ivan Kosarev
b7e8a55a83 [AMDGPU][AsmParser][NFC] Simplify parsing of sopp_brtarget operands.
Also refine the definitions while there.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: mbrkusanin

Differential Revision: https://reviews.llvm.org/D154061
2023-06-30 16:35:46 +01:00
Scott Linder
ede070a28d [NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.

Use split-file to simplify and extend existing kernel-descriptor
disassembly tests.

Add a comment to AMDHSAKernelDescriptor.h, as at least one small set
towards keeping all kernel-descriptor sensitive code in sync.

Reviewed By: MaskRay, kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D130105
2023-06-29 16:06:09 +00:00
Ivan Kosarev
e23891a382 [AMDGPU][Disassembler] Fix a spurious error message in an instruction comment.
The patch prevents pollution of instruction comments with error messages
generated during unsuccessful decoding attempts.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D149049
2023-04-26 12:50:17 +01:00
Mirko Brkusanin
b3dc0e69cf [AMDGPU][MC][GFX11] Add Partial NSA format for image sample instructions
Image sample instructions that need more than 5 VGPRs for VAddr can use
partial NSA for NSA encoding format. VGPRs that can not fit into the
encoding are sequential after the last one.
This patch adds assembly and disassembly parts.

Differential Revision: https://reviews.llvm.org/D144033
2023-02-23 13:33:34 +01:00
Fangrui Song
432caca39a Simplify with hasFeature. NFC 2023-02-17 18:22:24 -08:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Changpeng Fang
7ca3444fba AMDGPU: Use module flag to get code object version at IR level folow-up
Summary:
  This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.

Reviewers: arsenm

Differential Revision
  https://reviews.llvm.org/D143293
2023-02-10 11:16:38 -08:00
Petar Avramovic
13512f84f3 AMDGPU/MC: Fix decoders for VSrc_v2b32 and VSrc_v2f32 RegisterOperands
Decoder should make 32 bit value when decoding immediates, not 64 bit.

Differential Revision: https://reviews.llvm.org/D143574
2023-02-09 12:16:46 +01:00
Petar Avramovic
a256e1d97a AMDGPU/MC: Fix indentation and remove unused macro after D142636 2023-02-06 13:19:03 +01:00
Petar Avramovic
b0c1a45ba5 AMDGPU/MC: Refactor decoders. Rework decoders for float immediates
decodeFPImmed creates immediate operand using register operand width,
but size of created immediate should correspond to OperandType for
RegisterOperand.
e.g. OPW128 could be used for RegisterOperands that use v2f64 v4f32
and v8f16. Each RegisterOperands would have different OperandType and
require that immediate is decoded using 64, 32 and 16 bit immediate
respectively.
decodeOperand_<RegClass> only provides width for register decoding,
introduce decodeOperand_<RegClass>_Imm<ImmWidth> that also provides
width for immediate decoding.
Refactor RegisterOperands:
 - decoders get _Imm<ImmWidth> suffix in some cases
 - removed unused RegisterOperands defined via multiclass
 - use different RegisterOperand in a few places, new RegisterOperand's
   decoder corresponds to the number of bits used for operand's encoding
Refactor decoder functions:
 - add asserts for the size of encoding that will be decoded
 - regroup them according to the method of decoding
decodeOperand_<RegClass> (register only, no immediate) decoders can now
create immediate of consistent size, use it for better diagnostic of
'invalid immediate'.

Differential Revision: https://reviews.llvm.org/D142636
2023-02-01 16:52:57 +01:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Fangrui Song
f4c16c4473 [MC] llvm::Optional => std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 21:36:08 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Mateja Marjanovic
595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Dmitry Preobrazhensky
869fc7eabd [AMDGPU][MC][MI100+] Enable VOP3 variants of dot2c/dot4c/dot8c opcodes
Differential Revision: https://reviews.llvm.org/D138494
2022-11-29 17:38:18 +03:00
Pierre van Houtryve
220147d536 [AMDGPU] Make aperture registers 64 bit
Makes the SRC_(SHARED|PRIVATE)_(BASE|LIMIT) registers 64 bit instead of 32.
They're still usable as 32 bit operands by using the _LO suffix.

Preparation for D137542

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137767
2022-11-22 09:17:59 +00:00
Ivan Kosarev
1b560e6ab7 [AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D137783
2022-11-14 15:36:18 +00:00
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Abinav Puthan Purayil
3d9f011a9c [AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later
Unfortunately, we have a broken handling of this in the runtime of rocm
5.3. The runtime is expected to handle this correctly when v5 becomes
the default.

Differential Revision: https://reviews.llvm.org/D134714
2022-10-11 23:28:43 +05:30
Joe Nash
3648fc5b42 [AMDGPU] Make disassembler convertFMAanyK call more generic
Make support more generic to support future instructions.
Currently NFC.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D135678
2022-10-11 11:22:25 -04:00
Kazu Hirata
7f90597be6 [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:800:17:
  error: unused variable 'DST_IDX' [-Werror,-Wunused-variable]
2022-10-07 08:27:02 -07:00
Dmitry Preobrazhensky
8f8e4e3b38 [AMDGPU][MC][GFX11] Correct v_fmac_.*_e64_dpp
Differential Revision: https://reviews.llvm.org/D134961
2022-10-07 16:21:55 +03:00
Dmitry Preobrazhensky
485c539391 [AMDGPU][MC][GFX11] Disable non-null src0 for s_waitcnt_*cnt
Differential Revision: https://reviews.llvm.org/D134809
2022-09-29 19:56:03 +03:00
Scott Linder
552539bdac Revert "[NFC][AMDGPU] Refactor AMDGPUDisassembler"
This reverts commit f5831514612cd9e014e4fc7455b75411531fe6e1.
2022-09-21 18:48:42 +00:00
Scott Linder
f583151461 [NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.

Use lit.local.cfg substitutions and more idiomatic use of split-file to
simplify and extend existing kernel-descriptor disassembly tests.

Add a comment to AMDHSAKernelDescriptor.h, as at least one small set
towards keeping all kernel-descriptor sensitive code in sync.

Reviewed By: kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D130105
2022-09-20 20:37:19 +00:00
Joe Nash
b982ba2a6e [AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that
    disables the use of VGPRs above 128. This patch removes the need for
    that hack.

    We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
    operands of VOP1, VOP2, and VOPC instructions. This register class only has the
    low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
    VOP2, and VOPC instructions are correctly limited to use the first 128
    VGPRs, while the other instructions can freely use all 256.

    We introduce new pseduo-instructions used on GFX11 which have the suffix
    t16 (True 16) to use the VGPR_32_Lo128 register class.

Reviewed By: foad, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D133723
2022-09-20 09:56:28 -04:00