102 Commits

Author SHA1 Message Date
Aleksandar Spasojevic
1b47135c9d
[AMDGPU] Ensure positive InstOffset for buffer operations (#145504)
GFX12+ buffer ops require positive InstOffset per AMD hardware spec.
Modified assembler/disassembler to reject negative buffer offsets.
2025-09-04 15:37:46 +02:00
Rahul Joshi
dafffe262d
[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865)
This change adds an option to specialize decoders per bitwidth, which
can help reduce the (compiled) code size of the decoder code.

**Current state**:
Currently, the code generated by the decoder emitter consists of two key
functions: `decodeInstruction` which is the entry point into the
generated code and `decodeToMCInst` which is invoked when a decode op is
reached while traversing through the decoder table. Both functions are
templated on `InsnType` which is the raw instruction bits that are
supplied to `decodeInstruction`.

Several backends call `decodeInstruction` with different `InsnType`
types, leading to several template instantiations of these functions in
the final code. As an example, AMDGPU instantiates this function with
type `DecoderUInt128` type for decoding 96/128-bit instructions,
`uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding
32-bit instructions. Since there is just one `decodeToMCInst` in the
generated code, it has code that handles decoding for *all* instruction
sizes. However, the decoders emitted for different instructions sizes
rarely have any intersection with each other. That means, in the AMDGPU
case, the instantiation with InsnType == DecoderUInt128 has decoder code
for 32/64-bit instructions that is *never exercised*. Conversely, the
instantiation with InsnType == uint64_t has decoder code for
128/96/32-bit instructions that is never exercised. This leads to
unnecessary dead code in the generated disassembler binary (that the
compiler cannot eliminate by itself).

**New state**:
With this change, we introduce an option
`specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter
will generate several versions of `decodeToMCInst` function, one for
each bitwidth. The code is still templated, but will require backends to
specify, for each `InsnType` used, the bitwidth of the instruction that
the type is used to represent using a type-trait `InsnBitWidth`. This
will enable the templated code to choose the right variant of
`decodeToMCInst`. Under this mode, a particular instantiation will only
end up instantiating a single variant of `decodeToMCInst` generated and
that will include only those decoders that are applicable to a single
bitwidth, resulting in elimination of the code duplication through
instantiation and a reduction in code size.

Additionally, under this mode, decoders are uniqued only within a given
bitwidth (as opposed to across all bitwidths without this option), so
the decoder index values assigned are smaller, and consume less bytes in
their ULEB128 encoding. As a result, the generated decoder tables can
also reduce in size.

Adopt this feature for the AMDGPU and RISCV backend. In a release build,
this results in a net 55% reduction in the .text size of
libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For
RISCV, which today uses a single `uint64_t` type, this results in a 3.7%
increase in code size (expected as we instantiate the code 3 times now).

Actual measured sizes are as follows:
```
Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7
Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled.
 
AMDGPU        Before       After      Change
======================================================
.text         612327       275607     55% reduction
.rodata       369728       351336      5% reduction          

RISCV:
======================================================
.text          47407       49187      3.7% increase   
.rodata        35768       35839      0.1% increase
```
2025-09-01 13:44:18 -07:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Stanislav Mekhanoshin
f090554359
[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 intructions (#148282) 2025-07-11 14:17:03 -07:00
Stanislav Mekhanoshin
00a85e5704
[AMDGPU] gfx1250: MC support for 64-bit literals (#147861) 2025-07-09 22:25:47 -07:00
Rahul Joshi
23b4f4eb9b
[NFC][TableGen] Change DecoderEmitter insertBits to use integer types only (#147613)
The `insertBits` templated function generated by DecoderEmitter is
called with variable `tmp` of type `TmpType` which is:

```
using TmpType = std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;
```

That is, `TmpType` is always an integral type. Change the generated
`insertBits` to be valid only for integer types, and eliminate the
unused `insertBits` function from `DecoderUInt128` in
AMDGPUDisassembler.h

Additionally, drop some of the requirements `InsnType` must support as
they no longer seem to be required.
2025-07-09 08:56:07 -07:00
Stanislav Mekhanoshin
fa0b84f23c
[AMDGPU] Rename call instructions from b64 to i64 (#145103)
These get renamed in gfx1250 and on from B64 to I64:

  S_CALL_I64
  S_GET_PC_I64
  S_RFE_I64
  S_SET_PC_I64
  S_SWAP_PC_I64
2025-06-21 21:42:09 -07:00
Ivan Kosarev
66d3980b53
[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)
All immediates are deferred now.
2025-05-09 10:10:53 +01:00
Ivan Kosarev
71f8f2b155
[AMDGPU][NFC] Get rid of OPW constants. (#139074)
We can infer the widths from register classes and represent them as
numbers.
2025-05-08 18:42:07 +01:00
Ivan Kosarev
d9bdc2d6a2
[AMDGPU][Disassembler][NFCI] Always defer immediate operands. (#138885)
Removes the need to parameterise decoders with OperandSemantics,
ImmWidth and MandatoryLiteral.

Likely allows further simplification of handling _DEFERRED immediates.

Tested to work downstream.
2025-05-08 11:43:50 +01:00
Rahul Joshi
6c4caae449
[LLVM][TableGen] Move DecoderEmitter output to anonymous namespace (#136214)
- Move the code generated by DecoderEmitter to anonymous namespace.
- Move AMDGPU's usage of this code from header file to .cpp file.

Note, we get build errors like "call to function 'decodeInstruction'
that is neither visible in the template definition nor found by
argument-dependent lookup" if we do not change AMDGPU.
2025-04-18 04:35:05 -07:00
Brox Chen
5e26ff35c1
[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499)
True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.
2025-01-14 10:03:36 -05:00
Jun Wang
b2adeae865
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.
2025-01-03 11:49:51 -08:00
Matt Arsenault
5dd48c4901
AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25 19:20:51 -08:00
Matt Arsenault
01c9a14ccf
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
2024-11-21 08:51:58 -08:00
Ivan Kosarev
162386693f
[AMDGPU][MC] Support UC_VERSION_* constants. (#95618)
Our other tools support them, so we want them in LLVM
assembler/disassembler too.
2024-06-18 15:44:14 +01:00
Emma Pilkington
68e814d911
[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400)
These mostly are checking for various reserved bits being set. The diagnostics
for gpu-dependent reserved bits have a bit more context since they seem like the
most likely ones to be observed in practice.

This commit also improves the error handling mechanism for
MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter
that was just being ignored by llvm-objdump, now it returns errors using
Expected<T>.
2024-04-18 13:44:22 -04:00
Jay Foad
60e7ae3f30
[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992)
Speed up disassembly by only calling tryDecodeInst for DecoderTables
that make sense for the current subtarget.

This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my
Release+Asserts build.
2024-02-26 13:02:08 +00:00
Jay Foad
42f6f95e08
[AMDGPU] Simplify AMDGPUDisassembler::getInstruction by removing Res. (#82775)
Remove all the code that set and tested Res. Change all convert*
functions to return void since none of them can fail. getInstruction
only has one main point of failure, after all calls to tryDecodeInst
have failed.
2024-02-23 18:44:02 +00:00
Jay Foad
bcbffd99c4
[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379)
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
2024-02-22 09:40:46 +00:00
Stanislav Mekhanoshin
13e64958a0
[AMDGPU] Fix decoder for BF16 inline constants (#82276)
Fix #82039.
2024-02-19 13:45:23 -08:00
Ivan Kosarev
7d19dc50de
[AMDGPU][True16] Support VOP3 source DPP operands. (#80892) 2024-02-08 16:23:00 +00:00
Emma Pilkington
4eb0810922
[llvm-objdump][AMDGPU] Pass ELF ABIVersion through disassembler (#78907)
Admittedly, its a bit ugly to pass the ABIVersion through onSymbolStart
but I'm not sure what a better place for it would be.
2024-02-01 11:26:42 -05:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Mirko Brkušanin
f5868cb6a6
[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062) 2023-12-04 13:04:42 +01:00
Stanislav Mekhanoshin
ab6c3d5034
[AMDGPU] Change the representation of double literals in operands (#68740)
A 64-bit literal can be used as a 32-bit zero or sign extended operand.
In case of double zeroes are added to the low 32 bits. Currently asm
parser stores only high 32 bits of a double into an operand. To support
codegen as requested by the
https://github.com/llvm/llvm-project/issues/67781 we need to change the
representation to store a full 64-bit value so that codegen can simply
add immediates to an instruction.

There is some code to support compatibility with existing tests and asm
kernels. We allow to use short hex strings to represent only a high 32
bit of a double value as a valid literal.
2023-10-12 14:45:45 -07:00
Ivan Kosarev
9310baa596 [AMDGPU][NFC] Add True16 operand definitions.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D156103
2023-09-25 16:48:46 +01:00
Ivan Kosarev
fab28e0e14 Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.

Analysis of failures on buildbots with expensive checks enabled showed
that the problem was triggered by changes in another commit,
469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug
addressed in #67245.
2023-09-23 22:07:41 +01:00
Ivan Kosarev
6cb3866b1c Revert "[AMDGPU] Introduce real and keep fake True16 instructions."
This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to
failures on expensive checks.
2023-09-22 15:40:26 +01:00
Ivan Kosarev
0f864c7b8b [AMDGPU] Introduce real and keep fake True16 instructions.
The existing fake True16 instructions using 32-bit VGPRs are supposed to
co-exist with real ones until all the necessary True16 functionality is
implemented and relevant tests are updated.

Reviewed By: arsenm, Joe_Nash

Differential Revision: https://reviews.llvm.org/D156101
2023-09-22 10:57:56 +01:00
Austin Kerbow
69447d6afe [AMDGPU] Add ASM and MC updates for preloading kernargs
Add assembler directives for preloading kernel arguments that correspond
to new fields in the kernel descriptor for the length and offset of
arguments that will be placed in SGPRs prior to kernel launch. Alignment
of the arguments in SGPRs is equivalent to the kernarg segment when
accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated
directly after other user SGPRs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159459
2023-09-19 15:45:16 -07:00
Scott Linder
986001c827 [AMDGPU] Improve assembler + disassembler handling of kernel descriptors
* Relax the AsmParser to accept `.amdhsa_wavefront_size32 0` when the
  `.amdhsa_shared_vgpr_count` directive is present.
* Teach the KD disassembler to respect the setting of
  KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32 when calculating the
  value of `.amdhsa_next_free_vgpr`.
* Teach the KD disassembler to disassemble COMPUTE_PGM_RSRC3 for gfx90a
  and gfx10+.
* Include "pseudo directive" comments for gfx10 fields which are not
  controlled by any assembler directive.
* Fix disassembleObject failure diagnostic in llvm-objdump to not
  hard-code a comment string, and to follow the convention of not
  capitalizing the first sentence.

Reviewed By: rochauha

Differential Revision: https://reviews.llvm.org/D128014
2023-07-06 21:20:51 +00:00
Scott Linder
ede070a28d [NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.

Use split-file to simplify and extend existing kernel-descriptor
disassembly tests.

Add a comment to AMDHSAKernelDescriptor.h, as at least one small set
towards keeping all kernel-descriptor sensitive code in sync.

Reviewed By: MaskRay, kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D130105
2023-06-29 16:06:09 +00:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Ivan Kosarev
e23891a382 [AMDGPU][Disassembler] Fix a spurious error message in an instruction comment.
The patch prevents pollution of instruction comments with error messages
generated during unsuccessful decoding attempts.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D149049
2023-04-26 12:50:17 +01:00
Petar Avramovic
b0c1a45ba5 AMDGPU/MC: Refactor decoders. Rework decoders for float immediates
decodeFPImmed creates immediate operand using register operand width,
but size of created immediate should correspond to OperandType for
RegisterOperand.
e.g. OPW128 could be used for RegisterOperands that use v2f64 v4f32
and v8f16. Each RegisterOperands would have different OperandType and
require that immediate is decoded using 64, 32 and 16 bit immediate
respectively.
decodeOperand_<RegClass> only provides width for register decoding,
introduce decodeOperand_<RegClass>_Imm<ImmWidth> that also provides
width for immediate decoding.
Refactor RegisterOperands:
 - decoders get _Imm<ImmWidth> suffix in some cases
 - removed unused RegisterOperands defined via multiclass
 - use different RegisterOperand in a few places, new RegisterOperand's
   decoder corresponds to the number of bits used for operand's encoding
Refactor decoder functions:
 - add asserts for the size of encoding that will be decoded
 - regroup them according to the method of decoding
decodeOperand_<RegClass> (register only, no immediate) decoders can now
create immediate of consistent size, use it for better diagnostic of
'invalid immediate'.

Differential Revision: https://reviews.llvm.org/D142636
2023-02-01 16:52:57 +01:00
Fangrui Song
f4c16c4473 [MC] llvm::Optional => std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 21:36:08 +00:00
Mateja Marjanovic
595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Dmitry Preobrazhensky
8f8e4e3b38 [AMDGPU][MC][GFX11] Correct v_fmac_.*_e64_dpp
Differential Revision: https://reviews.llvm.org/D134961
2022-10-07 16:21:55 +03:00
Scott Linder
552539bdac Revert "[NFC][AMDGPU] Refactor AMDGPUDisassembler"
This reverts commit f5831514612cd9e014e4fc7455b75411531fe6e1.
2022-09-21 18:48:42 +00:00
Scott Linder
f583151461 [NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.

Use lit.local.cfg substitutions and more idiomatic use of split-file to
simplify and extend existing kernel-descriptor disassembly tests.

Add a comment to AMDHSAKernelDescriptor.h, as at least one small set
towards keeping all kernel-descriptor sensitive code in sync.

Reviewed By: kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D130105
2022-09-20 20:37:19 +00:00
Joe Nash
b982ba2a6e [AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that
    disables the use of VGPRs above 128. This patch removes the need for
    that hack.

    We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
    operands of VOP1, VOP2, and VOPC instructions. This register class only has the
    low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
    VOP2, and VOPC instructions are correctly limited to use the first 128
    VGPRs, while the other instructions can freely use all 256.

    We introduce new pseduo-instructions used on GFX11 which have the suffix
    t16 (True 16) to use the VGPR_32_Lo128 register class.

Reviewed By: foad, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D133723
2022-09-20 09:56:28 -04:00
Dmitry Preobrazhensky
2a6532d542 [AMDGPU][MC][GFX11] Correct disassembly of *_e64_dpp opcodes which support op_sel
These opcodes cannot be disassembled because op_sel operand is missing - it must be added manually.
See https://github.com/llvm/llvm-project/issues/56512 for detailed issue analysis.

Differential Revision: https://reviews.llvm.org/D129637
2022-07-15 13:11:59 +03:00
Joe Nash
07b7fada73 [AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32.
This patch includes MC layer support only.

A VOPD instruction is constituted of an X component (for which there are
13 possible opcodes) and a Y component (for which there are the 13 X
opcodes plus 3 more). Most of the complexity in defining and parsing
a VOPD operation arises from the possible different total numbers of
operands and deferred parsing of certain operands depending on the
constituent X and Y opcodes.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D128218
2022-06-24 11:08:39 -04:00
Joe Nash
be1082c6d5 [AMDGPU] gfx11 VOPC instructions
Supports encoding existing instrutions on gfx11 and MC support for the new VOPC
dpp instructions.

Patch 19/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126978

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126989
2022-06-09 15:22:42 -04:00
Joe Nash
40f35cef89 [AMDGPU] gfx11 VOP3P instruction MC support
Includes dpp versions of VOP3P instructions.

Patch 18/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126917

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126978
2022-06-08 13:32:01 -04:00
Joe Nash
e243ead6fc Reland [AMDGPU] gfx11 vop3dpp instructions
There was an issue with encoding wide (>64 bit) instructions on
BigEndian hosts, which is fixed in D127195. Therefore reland this.

gfx11 adds the ability to use dpp modifiers on vop3 instructions.
This patch adds machine code layer support for that. The MCCodeEmitter
is changed to use APInt instead of uint64_t to support these wider
instructions.

Patch 16/N for upstreaming of AMDGPU gfx11 architecture

Differential Revision: https://reviews.llvm.org/D126483
2022-06-07 14:49:13 -04:00
Joe Nash
eaed07eb7e Revert "[AMDGPU] gfx11 vop3dpp instructions"
This reverts commit 99a83b1286748501e0ccf199a582dc3ec5451ef5.
2022-06-06 17:12:09 -04:00
Joe Nash
99a83b1286 [AMDGPU] gfx11 vop3dpp instructions
gfx11 adds the ability to use dpp modifiers on vop3 instructions.
This patch adds machine code layer support for that. The MCCodeEmitter
is changed to use APInt instead of uint64_t to support these wider
instructions.

Patch 16/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126475

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126483
2022-06-06 09:34:59 -04:00
Maksim Panchenko
bed9efed71 [MCDisassembler] Disambiguate Size parameter in tryAddingSymbolicOperand()
MCSymbolizer::tryAddingSymbolicOperand() overloaded the Size parameter
to specify either the instruction size or the operand size depending on
the architecture. However, for proper symbolic disassembly on X86, we
need to know both sizes, as an instruction can have two operands, and
the instruction size cannot be reliably calculated based on the operand
offset and its size. Hence, split Size into OpSize and InstSize.

For X86, the new interface allows to fix a couple of issues:
  * Correctly adjust the value of PC-relative operands.
  * Set operand size to zero when the operand is specified implicitly.

Differential Revision: https://reviews.llvm.org/D126101
2022-05-25 13:44:32 -07:00