llvm-project

Author	SHA1	Message	Date
Mirko Brkušanin	b7a0445565	[AMDGPU][MC] Update old and add new min/max instructions for gfx1170 (#184601 )	2026-03-09 18:38:45 +01:00
Mirko Brkušanin	d0f50d5574	[AMDGPU] Remove DX10_CLAMP and IEEE bits from gfx1170 (#182107 ) Add `DX10ClampAndIEEEMode` feature and set it for every subtarget prior to gfx1170	2026-03-04 12:16:41 +01:00
Mariusz Sikora	037fd6eaaf	[AMDGPU] Add VINTERP encoding to gfx13 (#182481 )	2026-03-02 10:50:08 +01:00
Son Tuan Vu	b2eb876ca6	[AMDGPU][NFC] Use range-based for loops in KD reserved bytes checks (#182340 )	2026-02-20 07:53:36 -08:00
Mirko Brkušanin	829afc4c91	[AMDGPU] Add WMMA and SWMMAC instructions for gfx1170 (#180731 ) Introduce two new subtarget features: - WMMA256bInsts for GFX11 WMMA instructions and - WMMA128bInsts for GFX1170 and GFX12 WMMA and SWMMAC instructions Some WMMA instructions have changed from GFX 11.0 to GFX 11.7 so new Real versions were added with "_gfx1170" suffix. For consistency all WMMA and SWMMAC GFX11.7 instructions use this suffix. To resolve decoding issues between different formats for some WMMA instructions between GFX 11 and GFX 11.7, new decoding tables were added.	2026-02-18 19:17:48 +01:00
Mariusz Sikora	ab85d5f642	[AMDGPU] Add VOP1 support for gfx13 (#177603 ) Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>	2026-01-27 13:40:10 +01:00
Mariusz Sikora	3c0f5045e1	[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567 ) For now list of features is based on gfx12 and gfx1250 --------- Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-01-26 14:16:36 +01:00
Shilei Tian	c253b9f9ca	[AMDGPU] Fix inline constant encoding for `v_pk_fmac_f16` (#176659 ) This PR handles `v_pk_fmac_f16` inline constant encoding/decoding differences between pre-GFX11 and GFX11+ hardware. - Pre-GFX11: fp16 inline constants produce `(f16, 0)` - value in low 16 bits, zero in high. - GFX11+: fp16 inline constants are duplicated to both halves `(f16, f16)`. Fixes #94116.	2026-01-20 19:14:59 -05:00
Craig Topper	8eb28ca83d	[AMDGPU] Remove implicit conversions of MCRegister to unsigned. NFC (#167284 ) Use MCRegister instead of MCPhysReg or use MCRegister::id().	2025-11-11 08:54:27 -08:00
Jun Wang	b8c7013060	[AMDGPU][MC] Fix disassembler warning for v_cmpx instructions in GFX9 (#163825 ) In GFX10+, the v_cmpx_* instructions use EXEC as the implicit dst and do not have explicit dst. Therefore a warning is issued by the disassembler when the dst is not EXEC. However, in GFX9 and earlier, those instructions have EXEC as the implicit dst as well as an explicit dst. The aforementioned warning should not be issued.	2025-10-17 11:11:58 -07:00
Ivan Kosarev	33503d016e	[AMDGPU] Preserve literal operands on disassembling. (#163376 ) Fixes round-tripping where literals used to be reassembled into inline constants. Also fix the %extract-encodings substitution in lit tests to emit each instruction code once and not twice. Eliminate the Literal64 field.	2025-10-16 11:23:53 +01:00
Ivan Kosarev	20f41ed8c1	[AMDGPU][MC] Avoid creating lit64() operands unless asked or needed. (#161191 ) There should normally be no need to generate implicit lit64() modifiers on the assembler side. It's the encoder's responsibility to recognise literals that are implicitly 64 bits wide. The exceptions are where we rewrite floating-point operand values as integer ones, which would not be assembled back to the original values unless wrapped into lit64(). Respect explicit lit() modifiers for non-inline values as necessary to avoid regressions in MC tests. This change still doesn't prevent use of inline constants where lit()/lit64 is specified; subject to a separate patch. On disassembling, only create lit64() operands where necessary for correct round-tripping. Add round-tripping tests where useful and feasible.	2025-10-08 10:51:55 +01:00
Matt Arsenault	1a5494ca4a	AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272 ) This removes special case processing in TargetInstrInfo::getRegClass to fixup register operands which depending on the subtarget support AGPRs, or require even aligned registers. This regresses assembler diagnostics, which currently work by hackily accepting invalid cases and then post-rejecting a validly parsed instruction. On the plus side this now emits a comment when disassembling unaligned registers for targets with the alignment requirement.	2025-10-08 11:19:54 +09:00
Shilei Tian	173063cf05	[AMDGPU][Disassembler] Use target feature for `.amdhsa_reserve_xnack_mask` instead of hard code zero (#161771 ) There is no test change at this moment because we don't have a target that has this feature by default yet.	2025-10-03 09:16:57 -04:00
Ivan Kosarev	9e55d81c68	[AMDGPU][AsmParser] Introduce MC representation for lit() and lit64(). (#160316 ) And rework the lit64() support to use it. The rules for when to add lit64() can be simplified and improved. In this change, however, we just follow the existing conventions on the assembler and disassembler sides. In codegen we do not (and normally should not need to) add explicit lit() and lit64() modifiers, so the codegen tests lose them. The change is an NFCI otherwise. Simplifies printing operands.	2025-09-24 12:35:50 +01:00
Matt Arsenault	4644099b54	AMDGPU: Remove most manual AVLdSt decoder code (#157861 ) This was additional hacking around using incorrect register class constraints for paired data operands. I'm not really sure why we need any of what's left. In particular the IS_VGPR special case seems backwards from how the encoding works.	2025-09-11 08:13:58 +09:00
Pierre van Houtryve	dcaa29c8ed	Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588 )" (#157639 ) This reverts commit be17791f2624f22b3ed24a2539406164a379125d. This is not necessary for gfx1250 anymore.	2025-09-10 10:20:59 +02:00
Aleksandar Spasojevic	1b47135c9d	[AMDGPU] Ensure positive InstOffset for buffer operations (#145504 ) GFX12+ buffer ops require positive InstOffset per AMD hardware spec. Modified assembler/disassembler to reject negative buffer offsets.	2025-09-04 15:37:46 +02:00
Stanislav Mekhanoshin	6aebbb0a85	[AMDGPU] Define 1024 VGPRs on gfx1250 (#156765 ) This is a baseline support, it is not useable yet.	2025-09-03 16:25:18 -07:00
Rahul Joshi	0196d7ec69	[MC][DecoderEmitter] Fix build warning: explicit specialization cannot have a storage class (#156375 ) Move `InsnBitWidth` template into anonymous namespace in the generated code and move template specialization of `InsnBitWidth` to anonymous namespace as well, and drop `static` for them. This makes `InsnBitWidth` completely private to each target and fixes the "explicit specialization cannot have a storage class" warning as well as any potential linker errors if `InsnBitWidth` is kept in the `llvm::MCD` namespace.	2025-09-02 07:28:36 -07:00
Kazu Hirata	e8b5fbd5fa	[AMDGPU, RISCV] Fix warnings This patch fixes: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:451:13: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class] llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:452:13: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class] llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:454:1: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class] llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:456:1: error: explicit specialization cannot have a storage class [-Werror,-Wexplicit-specialization-storage-class] While I am at it, this patch changes the storage types of InsnBitWidth specialization to "inline constexpr" to avoid linker errors.	2025-09-01 16:34:17 -07:00
Rahul Joshi	dafffe262d	[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865 ) This change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code. Current state: Currently, the code generated by the decoder emitter consists of two key functions: `decodeInstruction` which is the entry point into the generated code and `decodeToMCInst` which is invoked when a decode op is reached while traversing through the decoder table. Both functions are templated on `InsnType` which is the raw instruction bits that are supplied to `decodeInstruction`. Several backends call `decodeInstruction` with different `InsnType` types, leading to several template instantiations of these functions in the final code. As an example, AMDGPU instantiates this function with type `DecoderUInt128` type for decoding 96/128-bit instructions, `uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding 32-bit instructions. Since there is just one `decodeToMCInst` in the generated code, it has code that handles decoding for all instruction sizes. However, the decoders emitted for different instructions sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is never exercised. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary (that the compiler cannot eliminate by itself). New state: With this change, we introduce an option `specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter will generate several versions of `decodeToMCInst` function, one for each bitwidth. 
The code is still templated, but will require backends to specify, for each `InsnType` used, the bitwidth of the instruction that the type is used to represent using a type-trait `InsnBitWidth`. This will enable the templated code to choose the right variant of `decodeToMCInst`. Under this mode, a particular instantiation will only end up instantiating a single variant of `decodeToMCInst` generated and that will include only those decoders that are applicable to a single bitwidth, resulting in elimination of the code duplication through instantiation and a reduction in code size. Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume fewer bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size. Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For RISCV, which today uses a single `uint64_t` type, this results in a 3.7% increase in code size (expected as we instantiate the code 3 times now). Actual measured sizes are as follows:
```
Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7
Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled.

AMDGPU          Before     After      Change
======================================================
.text           612327     275607     55% reduction
.rodata         369728     351336     5% reduction

RISCV:
======================================================
.text           47407      49187      3.7% increase
.rodata         35768      35839      0.1% increase
```	2025-09-01 13:44:18 -07:00
Stanislav Mekhanoshin	438c099c23	[AMDGPU] gfx1250 kernel descriptor update (#155008 )	2025-08-22 12:58:41 -07:00
Rahul Joshi	22f8693248	[NFC][MC][Decoder] Extract fixed pieces of decoder code into new header file (#154802 ) Extract fixed functions generated by decoder emitter into a new MCDecoder.h header.	2025-08-21 15:06:43 -07:00
Gang Chen	ef68d1587d	[AMDGPU] upstream barrier count reporting part1 (#154409 )	2025-08-19 16:42:31 -07:00
Stanislav Mekhanoshin	d08c2977e8	[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203 )	2025-08-05 14:35:48 -07:00
Stanislav Mekhanoshin	37fe9f6382	[AMDGPU] Add gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 MC support (#152014 ) This adds new VOP3PX2e encoding	2025-08-04 14:20:12 -07:00
Pierre van Houtryve	be17791f26	[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588 ) Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.	2025-07-29 11:38:43 +02:00
Changpeng Fang	d6094370cb	AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-21 10:09:42 -07:00
Stanislav Mekhanoshin	f090554359	[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282 )	2025-07-11 14:17:03 -07:00
Stanislav Mekhanoshin	00a85e5704	[AMDGPU] gfx1250: MC support for 64-bit literals (#147861 )	2025-07-09 22:25:47 -07:00
Shilei Tian	473f992c1f	[AMDGPU] Add the support for `v_cvt_f32_bf16` on gfx1250 (#145632 ) Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-06-25 16:02:40 -04:00
Stanislav Mekhanoshin	d06c2efd67	[AMDGPU] Support v_lshl_add_u64 in gfx1250 (#145591 ) It also brings in some DPP changes needed to define it.	2025-06-24 15:49:01 -07:00
Matt Arsenault	092ef1da45	AMDGPU: Use reportFatalUsageError for unsupported disassembly error (#145264 )	2025-06-23 17:52:27 +09:00
Stanislav Mekhanoshin	fa0b84f23c	[AMDGPU] Rename call instructions from b64 to i64 (#145103 ) These get renamed in gfx1250 and on from B64 to I64: S_CALL_I64 S_GET_PC_I64 S_RFE_I64 S_SET_PC_I64 S_SWAP_PC_I64	2025-06-21 21:42:09 -07:00
Andrew Rogers	19658d1474	[llvm] annotate interfaces in llvm/Target for DLL export (#143615 ) ## Purpose This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build. ## Background This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst). A sub-set of these changes were generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`. The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for these functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it I was concerned it would unnecessarily impact compile times. In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC. ## Validation Local builds and tests to validate cross-platform compatibility. 
This included llvm, clang, and lldb on the following configurations: - Windows with MSVC - Windows with Clang - Linux with GCC - Linux with Clang - Darwin with Clang	2025-06-17 13:28:45 -07:00
Ivan Kosarev	66d3980b53	[AMDGPU][NFC] Remove _DEFERRED operands. (#139123 ) All immediates are deferred now.	2025-05-09 10:10:53 +01:00
Ivan Kosarev	71f8f2b155	[AMDGPU][NFC] Get rid of OPW constants. (#139074 ) We can infer the widths from register classes and represent them as numbers.	2025-05-08 18:42:07 +01:00
Ivan Kosarev	d9bdc2d6a2	[AMDGPU][Disassembler][NFCI] Always defer immediate operands. (#138885 ) Removes the need to parameterise decoders with OperandSemantics, ImmWidth and MandatoryLiteral. Likely allows further simplification of handling _DEFERRED immediates. Tested to work downstream.	2025-05-08 11:43:50 +01:00
Rahul Joshi	6c4caae449	[LLVM][TableGen] Move DecoderEmitter output to anonymous namespace (#136214 ) - Move the code generated by DecoderEmitter to anonymous namespace. - Move AMDGPU's usage of this code from header file to .cpp file. Note, we get build errors like "call to function 'decodeInstruction' that is neither visible in the template definition nor found by argument-dependent lookup" if we do not change AMDGPU.	2025-04-18 04:35:05 -07:00
Mariusz Sikora	575fde0995	[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038 ) - Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction. - Add llvm_v10i32_ty and llvm_v10f32_ty --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>	2025-03-19 07:35:09 +01:00
Ivan Kosarev	15869a861b	[AMDGPU][MC] Don't crash on decoding invalid SOP1 ssrc0 operands. (#130302 ) These are encoded as 8-bit fields.	2025-03-08 01:10:09 +00:00
Jun Wang	bc91accbfe	[AMDGPU][MC] Disassembler warning for v_cmpx instructions (#127925 ) For GFX10+ the destination reg of v_cmpx instructions is implicitly EXEC, which is encoded as 0x7E. However, the disassembler does not check this field, thus allowing any value. With this patch, if the field is not EXEC a warning is issued.	2025-02-27 09:17:18 -08:00
Pierre van Houtryve	5231736329	[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851 ) M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is effectively useless. Represent this by restricting the dest RC of that instruction to `SReg_32_XM0` which excludes M0. There are a lot of test changes due to the register class changing, but most changes are trivial. In some cases, an extra register and `s_mov_b32` is needed. Fixes SWDEV-513269	2025-02-26 13:14:03 +01:00
Rahul Joshi	bee9664970	[TableGen] Emit OpName as an enum class instead of a namespace (#125313 ) - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. - This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two. - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES. - Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers. - Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).	2025-02-12 08:19:30 -08:00
Stanislav Mekhanoshin	7639242155	[AMDGPU] Create new directive .amdhsa_inst_pref_size (#126622 ) The field INST_PREF_SIZE is available since gfx11.	2025-02-11 08:35:45 -08:00
Brox Chen	5e26ff35c1	[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499 ) True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.	2025-01-14 10:03:36 -05:00
Jun Wang	b2adeae865	[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200 ) For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.	2025-01-03 11:49:51 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of the v_dot2c_f32_bf16 opcode is the same as v_mac_f32 in gfx90a, both from the gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
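
Note on dafffe262d (specialize decoders per bitwidth): the commit message describes a type trait `InsnBitWidth` that lets templated decoder code pick a per-bitwidth variant of `decodeToMCInst`. The following is only a minimal sketch of that idea, not the code TableGen generates. `Wide128`, the three `decodeToMCInstNN` helpers, the string return values, and `checkDispatch` are hypothetical names invented for illustration; only `InsnBitWidth`, `InsnType`, and `decodeToMCInst` come from the commit message. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

namespace {

// Hypothetical stand-in for a 96/128-bit instruction word (not the LLVM type).
struct Wide128 {
  uint64_t Lo = 0;
  uint64_t Hi = 0;
};

// Type trait: maps an instruction word type to its bit width. The primary
// template yields 0 so that unmapped types trip the static_assert below.
// The specializations carry no storage class, matching the warning fix
// described in 0196d7ec69.
template <typename InsnType> constexpr size_t InsnBitWidth = 0;
template <> constexpr size_t InsnBitWidth<uint32_t> = 32;
template <> constexpr size_t InsnBitWidth<uint64_t> = 64;
template <> constexpr size_t InsnBitWidth<Wide128> = 128;

// Hypothetical per-bitwidth decoder bodies. In this sketch each one just
// reports which variant ran and which decoder index it was asked for.
std::string decodeToMCInst32(unsigned Idx) {
  return "32-bit decoder #" + std::to_string(Idx);
}
std::string decodeToMCInst64(unsigned Idx) {
  return "64-bit decoder #" + std::to_string(Idx);
}
std::string decodeToMCInst128(unsigned Idx) {
  return "128-bit decoder #" + std::to_string(Idx);
}

// Templated entry point: each instantiation references exactly one of the
// per-bitwidth bodies, so the other widths are not pulled into it.
template <typename InsnType> std::string decodeToMCInst(unsigned Idx) {
  constexpr size_t Width = InsnBitWidth<InsnType>;
  static_assert(Width == 32 || Width == 64 || Width == 128,
                "InsnBitWidth must be specialized for this InsnType");
  if constexpr (Width == 32)
    return decodeToMCInst32(Idx);
  else if constexpr (Width == 64)
    return decodeToMCInst64(Idx);
  else
    return decodeToMCInst128(Idx);
}

// Usage sketch: each word type reaches only its own decoder body.
void checkDispatch() {
  assert(decodeToMCInst<uint32_t>(3) == "32-bit decoder #3");
  assert(decodeToMCInst<uint64_t>(7) == "64-bit decoder #7");
  assert(decodeToMCInst<Wide128>(1) == "128-bit decoder #1");
}

} // namespace
```

Because the branch is chosen with `if constexpr`, the discarded branches are never instantiated for a given `InsnType`, which is the mechanism the commit message credits for removing dead decoder code from each instantiation.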

294 Commits