llvm-project

Author	SHA1	Message	Date
Gang Chen	ef68d1587d	[AMDGPU] upstream barrier count reporting part1 (#154409 )	2025-08-19 16:42:31 -07:00
Stanislav Mekhanoshin	d08c2977e8	[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203 )	2025-08-05 14:35:48 -07:00
Stanislav Mekhanoshin	37fe9f6382	[AMDGPU] Add gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 MC support (#152014 ) This adds new VOP3PX2e encoding	2025-08-04 14:20:12 -07:00
Pierre van Houtryve	be17791f26	[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588 ) Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.	2025-07-29 11:38:43 +02:00
Changpeng Fang	d6094370cb	AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-21 10:09:42 -07:00
Stanislav Mekhanoshin	f090554359	[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282 )	2025-07-11 14:17:03 -07:00
Stanislav Mekhanoshin	00a85e5704	[AMDGPU] gfx1250: MC support for 64-bit literals (#147861 )	2025-07-09 22:25:47 -07:00
Rahul Joshi	23b4f4eb9b	[NFC][TableGen] Change DecoderEmitter `insertBits` to use integer types only (#147613 ) The `insertBits` templated function generated by DecoderEmitter is called with variable `tmp` of type `TmpType` which is: ``` using TmpType = std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>; ``` That is, `TmpType` is always an integral type. Change the generated `insertBits` to be valid only for integer types, and eliminate the unused `insertBits` function from `DecoderUInt128` in AMDGPUDisassembler.h Additionally, drop some of the requirements `InsnType` must support as they no longer seem to be required.	2025-07-09 08:56:07 -07:00
Shilei Tian	473f992c1f	[AMDGPU] Add the support for `v_cvt_f32_bf16` on gfx1250 (#145632 ) Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-06-25 16:02:40 -04:00
Stanislav Mekhanoshin	d06c2efd67	[AMDGPU] Support v_lshl_add_u64 in gfx1250 (#145591 ) It also brings in some DPP changes needed to define it.	2025-06-24 15:49:01 -07:00
Matt Arsenault	092ef1da45	AMDGPU: Use reportFatalUsageError for unsupported disassembly error (#145264 )	2025-06-23 17:52:27 +09:00
Stanislav Mekhanoshin	fa0b84f23c	[AMDGPU] Rename call instructions from b64 to i64 (#145103 ) These get renamed in gfx1250 and on from B64 to I64: S_CALL_I64 S_GET_PC_I64 S_RFE_I64 S_SET_PC_I64 S_SWAP_PC_I64	2025-06-21 21:42:09 -07:00
Andrew Rogers	19658d1474	[llvm] annotate interfaces in llvm/Target for DLL export (#143615 ) ## Purpose This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build. ## Background This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst). A sub-set of these changes were generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`. The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for these functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it I was concerned it would unnecessarily impact compile times. In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC. ## Validation Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations: - Windows with MSVC - Windows with Clang - Linux with GCC - Linux with Clang - Darwin with Clang	2025-06-17 13:28:45 -07:00
Ivan Kosarev	66d3980b53	[AMDGPU][NFC] Remove _DEFERRED operands. (#139123 ) All immediates are deferred now.	2025-05-09 10:10:53 +01:00
Ivan Kosarev	71f8f2b155	[AMDGPU][NFC] Get rid of OPW constants. (#139074 ) We can infer the widths from register classes and represent them as numbers.	2025-05-08 18:42:07 +01:00
Ivan Kosarev	d9bdc2d6a2	[AMDGPU][Disassembler][NFCI] Always defer immediate operands. (#138885 ) Removes the need to parameterise decoders with OperandSemantics, ImmWidth and MandatoryLiteral. Likely allows further simplification of handling _DEFERRED immediates. Tested to work downstream.	2025-05-08 11:43:50 +01:00
Rahul Joshi	6c4caae449	[LLVM][TableGen] Move DecoderEmitter output to anonymous namespace (#136214 ) - Move the code generated by DecoderEmitter to anonymous namespace. - Move AMDGPU's usage of this code from header file to .cpp file. Note, we get build errors like "call to function 'decodeInstruction' that is neither visible in the template definition nor found by argument-dependent lookup" if we do not change AMDGPU.	2025-04-18 04:35:05 -07:00
Mariusz Sikora	575fde0995	[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038 ) - Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction. - Add llvm_v10i32_ty and llvm_v10f32_ty --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>	2025-03-19 07:35:09 +01:00
Ivan Kosarev	15869a861b	[AMDGPU][MC] Don't crash on decoding invalid SOP1 ssrc0 operands. (#130302 ) These are encoded as 8-bit fields.	2025-03-08 01:10:09 +00:00
Jun Wang	bc91accbfe	[AMDGPU][MC] Disassembler warning for v_cmpx instructions (#127925 ) For GFX10+ the destination reg of v_cmpx instructions is implicitly EXEC, which is encoded as 0x7E. However, the disassembler does not check this field, thus allowing any value. With this patch, if the field is not EXEC a warning is issued.	2025-02-27 09:17:18 -08:00
Pierre van Houtryve	5231736329	[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851 ) M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is effectively useless. Represent this by restricting the dest RC of that instruction to `SReg_32_XM0` which excludes M0. There is a lot of test changes due to the register class changing, but most changes are trivial. In some cases, an extra register and `s_mov_b32` is needed. Fixes SWDEV-513269	2025-02-26 13:14:03 +01:00
Rahul Joshi	bee9664970	[TableGen] Emit OpName as an enum class instead of a namespace (#125313 ) - Change InstrInfoEmitter to emit OpName as an enum class instead of an anonymous enum in the OpName namespace. - This will help clearly distinguish between values that are OpNames vs just operand indices and should help avoid bugs due to confusion between the two. - Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES. - Emit declaration of getOperandIdx() along with the OpName enum so it doesn't have to be repeated in various headers. - Also updated AMDGPU, RISCV, and WebAssembly backends to conform to the new definition of OpName (mostly mechanical changes).	2025-02-12 08:19:30 -08:00
Stanislav Mekhanoshin	7639242155	[AMDGPU] Create new directive .amdhsa_inst_pref_size (#126622 ) The field INST_PREF_SIZE is available since gfx11.	2025-02-11 08:35:45 -08:00
Brox Chen	5e26ff35c1	[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499 ) True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.	2025-01-14 10:03:36 -05:00
Jun Wang	b2adeae865	[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200 ) For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.	2025-01-03 11:49:51 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
Matt Arsenault	5dd48c4901	AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:20:51 -08:00
Matt Arsenault	cd20fc0772	AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410 ) This is a refinement for the existing hack. With this, the default target will have neither wavefrontsize feature present, unless it was explicitly specified. That is, getWavefrontSize() == 64 no longer implies +wavefrontsize64. getWavefrontSize() == 32 does imply +wavefrontsize32. Continue to assume the value is 64 with no wavesize feature. This maintains the codegenable property without any code that directly cares about the wavesize needing to worry about it. Introduce an isWaveSizeKnown helper to check if we know the wavesize is accurate based on having one of the features explicitly set, or a known target-cpu. I'm not sure what's going on in wave_any.s. It's testing what happens when both wavesizes are enabled, but this is treated as an error in codegen. We now treat wave32 as the winning case, so some cases that were previously printed as vcc are now vcc_lo.	2024-11-23 09:27:47 -08:00
Matt Arsenault	8b087d6422	AMDGPU: Move default wavesize hack for disassembler (#117422 ) You cannot adjust the disassembler's subtarget. llvm-mc passes the originally constructed MCSubtargetInfo around, rather than querying the pointer in the disassembler instance.	2024-11-23 09:24:44 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Brox Chen	9fb01fcd9f	[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233 ) Support true16 format for VOP2 instructions in MC. This patch updates the true16 and fake16 vop_profile for the following instructions and updates the asm/dasm tests: v_fmac_f16, v_fmamk_f16, v_fmaak_f16. It seems vop2_t16_promote.s files are not yet updated with true16 flag in the previous batch update. It will be updated separately.	2024-11-20 11:33:04 -05:00
Brox Chen	abff8fe2a9	[AMDGPU][True16][MC] VINTERP instructions supporting true16/fake16 (#113634 ) Update VInterp instructions with true16 and fake16 formats. This patch includes instructions: v_interp_p10_f16_f32, v_interp_p2_f16_f32, v_interp_p10_rtz_f16_f32, v_interp_p2_rtz_f16_f32. The dasm test vinterp-fake16.txt is removed and its test lines are merged into vinterp.txt, which handles both true16/fake16 cases.	2024-11-14 18:22:37 -05:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Craig Topper	fd50cdfb94	[AMDGPU] Use MCRegister. NFC	2024-09-28 11:40:25 -07:00
Jun Wang	f6a8eb98b1	[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730 ) Some flat instructions have an saddr operand. When 'null' is provided as saddr, it may have the same encoding as another instruction. For example, the instructions 'global_atomic_add v1, v2, null' and 'global_atomic_add v[1:2], v2, off' have the same encoding. This patch disallows having null as saddr.	2024-09-24 11:08:41 +04:00
Jay Foad	73b8074e68	[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414 )	2024-09-20 13:45:04 +01:00
Brox Chen	35e27c0ee5	[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510 ) This is a large patch that includes the MC level support for V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format. This patch includes the asm/disasm changes to encode/decode the 16bit vsrc, vdst and src modifiers for vop and dpp format. This patch is a dependency for many 16 bit instructions while only three instructions are updated to make it easier to review. There will be another patch to support these three instructions in the codeGen level, this patch just replaces these two instructions with its fake16 format.	2024-09-11 10:48:11 -04:00
Craig Topper	c1b3ebba79	[MC] Update MCOperand::getReg/setReg/createReg and MCInstBuilder::addReg to use MCRegister. (#106015 ) Replace unsigned with MCRegister. Update some ternary operators that started giving errors.	2024-08-26 09:37:49 -07:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Stanislav Mekhanoshin	b132dd41eb	[AMDGPU] Remove wavefrontsize feature from GFX10+ (#98400 ) Processor definition shall not include a default feature which may be switched off by a different wave size. This allows not to write -mattr=-wavefrontsize32,+wavefrontsize64 in tests.	2024-07-16 01:02:25 -07:00
Carl Ritson	e83e53b702	[AMDGPU][MC] Allow UC_VERSION_* constant reuse (#96461 ) If more than one disassembler is created for a context then allow reuse of existing constants. Warn if constants values do not match.	2024-07-07 17:39:03 +09:00
Jay Foad	bb973785c9	[AMDGPU] Only reinitialize disassembler Bytes array when needed. NFC. (#96666 )	2024-06-27 15:45:30 +01:00
Ivan Kosarev	162386693f	[AMDGPU][MC] Support UC_VERSION_* constants. (#95618 ) Our other tools support them, so we want them in LLVM assembler/disassembler too.	2024-06-18 15:44:14 +01:00
luolent	a98a6e95be	Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391 ) Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868) Parenthesis are added as requested on ternary operators with non trivial conditions. I used this [precedence table](https://en.cppreference.com/w/cpp/language/operator_precedence) for reference, to make sure we get the expected behavior on each change.	2024-05-04 18:38:45 +01:00
Stanislav Mekhanoshin	6e722bbe30	[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244 )	2024-04-26 13:02:57 -07:00
Emma Pilkington	68e814d911	[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400 ) These mostly are checking for various reserved bits being set. The diagnostics for gpu-dependent reserved bits have a bit more context since they seem like the most likely ones to be observed in practice. This commit also improves the error handling mechanism for MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter that was just being ignored by llvm-objdump, now it returns errors using Expected<T>.	2024-04-18 13:44:22 -04:00
Jay Foad	60e7ae3f30	[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992 ) Speed up disassembly by only calling tryDecodeInst for DecoderTables that make sense for the current subtarget. This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my Release+Asserts build.	2024-02-26 13:02:08 +00:00
Jay Foad	42f6f95e08	[AMDGPU] Simplify AMDGPUDisassembler::getInstruction by removing Res. (#82775 ) Remove all the code that set and tested Res. Change all convert* functions to return void since none of them can fail. getInstruction only has one main point of failure, after all calls to tryDecodeInst have failed.	2024-02-23 18:44:02 +00:00
Jay Foad	3b7d43301e	[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491 ) Now that there is no special checking for valid DPP encodings, these instructions can use the same DecoderNamespace as other 64- or 96-bit instructions. Also clean up setting DecoderNamespace: in most cases it should be set as a pair with AssemblerPredicate.	2024-02-22 11:18:18 +00:00
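
The #147613 entry above describes restricting the generated `insertBits` helper to integer types. The snippet below is a minimal sketch of that idea, not the code DecoderEmitter actually emits: only the `TmpType` alias is quoted from the commit message (wrapped here in an alias template for illustration), while the body, the masking behaviour and the parameter names of `insertBits` are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Alias quoted in the commit message, wrapped in an alias template here:
// it always resolves to an integral type.
template <typename InsnType>
using TmpType =
    std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;

// Hypothetical sketch: OR the low `numBits` bits of `bits` into `field`,
// starting at bit `startBit`. enable_if removes the overload for any
// non-integer type, so no class type needs its own insertBits.
template <typename IntType>
std::enable_if_t<std::is_integral<IntType>::value, void>
insertBits(IntType &field, IntType bits, unsigned startBit, unsigned numBits) {
  assert(numBits > 0 && startBit + numBits <= sizeof(IntType) * 8);
  using U = std::make_unsigned_t<IntType>;
  // Build a mask of numBits ones without shifting by the full type width.
  const U mask = numBits >= sizeof(U) * 8
                     ? static_cast<U>(~U(0))
                     : static_cast<U>((U(1) << numBits) - 1);
  field = static_cast<IntType>(
      static_cast<U>(field) | ((static_cast<U>(bits) & mask) << startBit));
}
```

With this shape, a call such as `insertBits<uint32_t>(field, value, 4, 3)` compiles, while instantiating it with a class type (for example a 128-bit wrapper) fails overload resolution. That matches the commit's stated goal of dropping the unused member from `DecoderUInt128`.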

280 Commits