llvm-project

Author	SHA1	Message	Date
Jun Wang	b2adeae865	[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200 ) For GFX10+, currently null cannot be used as dst reg in instructions that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4). This patch fixes this problem while ensuring null cannot be used as S#, T#, or V#.	2025-01-03 11:49:51 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
Matt Arsenault	5dd48c4901	AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:20:51 -08:00
Matt Arsenault	cd20fc0772	AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410 ) This is a refinement for the existing hack. With this, the default target will have neither wavefrontsize feature present, unless it was explicitly specified. That is, getWavefrontSize() == 64 no longer implies +wavefrontsize64. getWavefrontSize() == 32 does imply +wavefrontsize32. Continue to assume the value is 64 with no wavesize feature. This maintains the codegenable property without any code that directly cares about the wavesize needing to worry about it. Introduce an isWaveSizeKnown helper to check if we know the wavesize is accurate based on having one of the features explicitly set, or a known target-cpu. I'm not sure what's going on in wave_any.s. It's testing what happens when both wavesizes are enabled, but this is treated as an error in codegen. We now treat wave32 as the winning case, so some cases that were previously printed as vcc are now vcc_lo.	2024-11-23 09:27:47 -08:00
Matt Arsenault	8b087d6422	AMDGPU: Move default wavesize hack for disassembler (#117422 ) You cannot adjust the disassembler's subtarget. llvm-mc passes the originally constructed MCSubtargetInfo around, rather than querying the pointer in the disassembler instance.	2024-11-23 09:24:44 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Brox Chen	9fb01fcd9f	[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233 ) Support true16 format for VOP2 instructions in MC. This patch updates the true16 and fake16 vop_profile for the following instructions and updates the asm/dasm tests: v_fmac_f16, v_fmamk_f16, v_fmaak_f16. It seems the vop2_t16_promote.s files were not updated with the true16 flag in the previous batch update. They will be updated separately.	2024-11-20 11:33:04 -05:00
Brox Chen	abff8fe2a9	[AMDGPU][True16][MC] VINTERP instructions supporting true16/fake16 (#113634 ) Update VInterp instructions with true16 and fake16 formats. This patch includes the instructions v_interp_p10_f16_f32, v_interp_p2_f16_f32, v_interp_p10_rtz_f16_f32, and v_interp_p2_rtz_f16_f32. The dasm test vinterp-fake16.txt is removed and its test lines are merged into vinterp.txt, which handles both true16 and fake16 cases.	2024-11-14 18:22:37 -05:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Craig Topper	fd50cdfb94	[AMDGPU] Use MCRegister. NFC	2024-09-28 11:40:25 -07:00
Jun Wang	f6a8eb98b1	[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730 ) Some flat instructions have an saddr operand. When 'null' is provided as saddr, it may have the same encoding as another instruction. For example, the instructions 'global_atomic_add v1, v2, null' and 'global_atomic_add v[1:2], v2, off' have the same encoding. This patch disallows having null as saddr.	2024-09-24 11:08:41 +04:00
Jay Foad	73b8074e68	[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414 )	2024-09-20 13:45:04 +01:00
Brox Chen	35e27c0ee5	[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510 ) This is a large patch that includes the MC level support for V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format. This patch includes the asm/disasm changes to encode/decode the 16bit vsrc, vdst and src modifiers for vop and dpp format. This patch is a dependency for many 16 bit instructions, while only three instructions are updated to make it easier to review. There will be another patch to support these three instructions in the codeGen level; this patch just replaces these two instructions with their fake16 format.	2024-09-11 10:48:11 -04:00
Craig Topper	c1b3ebba79	[MC] Update MCOperand::getReg/setReg/createReg and MCInstBuilder::addReg to use MCRegister. (#106015 ) Replace unsigned with MCRegister. Update some ternary operators that started giving errors.	2024-08-26 09:37:49 -07:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Stanislav Mekhanoshin	b132dd41eb	[AMDGPU] Remove wavefrontsize feature from GFX10+ (#98400 ) Processor definition shall not include a default feature which may be switched off by a different wave size. This allows not to write -mattr=-wavefrontsize32,+wavefrontsize64 in tests.	2024-07-16 01:02:25 -07:00
Carl Ritson	e83e53b702	[AMDGPU][MC] Allow UC_VERSION_* constant reuse (#96461 ) If more than one disassembler is created for a context then allow reuse of existing constants. Warn if constants values do not match.	2024-07-07 17:39:03 +09:00
Jay Foad	bb973785c9	[AMDGPU] Only reinitialize disassembler Bytes array when needed. NFC. (#96666 )	2024-06-27 15:45:30 +01:00
Ivan Kosarev	162386693f	[AMDGPU][MC] Support UC_VERSION_* constants. (#95618 ) Our other tools support them, so we want them in LLVM assembler/disassembler too.	2024-06-18 15:44:14 +01:00
luolent	a98a6e95be	Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391 ) Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868) Parentheses are added as requested on ternary operators with non-trivial conditions. I used this [precedence table](https://en.cppreference.com/w/cpp/language/operator_precedence) for reference, to make sure we get the expected behavior on each change.	2024-05-04 18:38:45 +01:00
Stanislav Mekhanoshin	6e722bbe30	[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244 )	2024-04-26 13:02:57 -07:00
Emma Pilkington	68e814d911	[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400 ) These mostly are checking for various reserved bits being set. The diagnostics for gpu-dependent reserved bits have a bit more context since they seem like the most likely ones to be observed in practice. This commit also improves the error handling mechanism for MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter that was just being ignored by llvm-objdump, now it returns errors using Expected<T>.	2024-04-18 13:44:22 -04:00
Jay Foad	60e7ae3f30	[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992 ) Speed up disassembly by only calling tryDecodeInst for DecoderTables that make sense for the current subtarget. This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my Release+Asserts build.	2024-02-26 13:02:08 +00:00
Jay Foad	42f6f95e08	[AMDGPU] Simplify AMDGPUDisassembler::getInstruction by removing Res. (#82775 ) Remove all the code that set and tested Res. Change all convert* functions to return void since none of them can fail. getInstruction only has one main point of failure, after all calls to tryDecodeInst have failed.	2024-02-23 18:44:02 +00:00
Jay Foad	3b7d43301e	[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491 ) Now that there is no special checking for valid DPP encodings, these instructions can use the same DecoderNamespace as other 64- or 96-bit instructions. Also clean up setting DecoderNamespace: in most cases it should be set as a pair with AssemblerPredicate.	2024-02-22 11:18:18 +00:00
Jay Foad	b9ce237980	[AMDGPU] Clean up conversion of DPP instructions in AMDGPUDisassembler (#82480 ) Convert DPP instructions after all calls to tryDecodeInst, just like we do for all other instruction types. NFCI.	2024-02-22 10:39:43 +00:00
Jay Foad	bcbffd99c4	[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379 ) Split Dpp8FI and Dpp16FI into two different operands sharing an AsmOperandClass. They are parsed and rendered identically as fi:1 but the encoding is different: for DPP16 FI is a single bit, but for DPP8 it uses two different special values in the src0 field. Having a dedicated decoder for Dpp8FI allows it to reject other (non-special) src0 values so that AMDGPUDisassembler::getInstruction no longer needs to call isValidDPP8 to do post hoc validation of decoded DPP8 instructions.	2024-02-22 09:40:46 +00:00
Jay Foad	ddba6b271c	[AMDGPU] Stop using SDWA DecoderNamespaces. NFCI. (#82233 ) 64-bit SDWA encodings have to be checked first because their first 32 bits are a special case of the corresponding 32-bit non-SDWA encoding of the same instruction. But all 64-bit encodings are checked first, so we don't need special handling for SDWA.	2024-02-20 12:58:07 +00:00
Jay Foad	a4d4615771	[AMDGPU] Try decoding instructions longest first. NFCI. (#82014 ) AMDGPUDisassembler::getInstruction tries decoding instructions using different DecoderTables in a confusing order: first 96-bit instructions, then some 64-bit, then 32-bit, then some more 64-bit. This patch changes it to always try longer encodings first. The motivation is to make getInstruction easier to understand, and to pave the way for combining some 64-bit tables that do not need to be separate.	2024-02-20 12:09:21 +00:00
Stanislav Mekhanoshin	13e64958a0	[AMDGPU] Fix decoder for BF16 inline constants (#82276 ) Fix #82039.	2024-02-19 13:45:23 -08:00
Jay Foad	ded3ca224f	[AMDGPU] Set predicates more consistently for BUF instructions (#81865 ) Set DecoderNamespace and AssemblerPredicate in the base class for Real instructions for each subtarget. This avoids some ad hoc "let" around groups of instructions definitions, and fixes some missed cases like BUFFER_GL0_INV_gfx10 which was missing DecoderNamespace.	2024-02-17 13:19:39 +00:00
Jay Foad	d3b825f80a	[AMDGPU] Use consistent DecoderNamespace for wave64 instructions. NFC. (#81863 ) For wave64 WMMA instructions, putting W64 in the DecoderNamespace is more descriptive than WMMA, and matches other uses for GFX12 GLOBAL_LOAD_TR instructions.	2024-02-15 14:47:46 +00:00
Ivan Kosarev	4c931091a3	[AMDGPU][NFC] Get rid of some operand decoders defined using macros. (#81482 ) Use templates instead. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-02-13 10:27:56 +00:00
Ivan Kosarev	7d19dc50de	[AMDGPU][True16] Support VOP3 source DPP operands. (#80892 )	2024-02-08 16:23:00 +00:00
Emma Pilkington	4eb0810922	[llvm-objdump][AMDGPU] Pass ELF ABIVersion through disassembler (#78907 ) Admittedly, it's a bit ugly to pass the ABIVersion through onSymbolStart but I'm not sure what a better place for it would be.	2024-02-01 11:26:42 -05:00
Simon Pilgrim	70fbcdb41d	Fix MSVC "signed/unsigned mismatch" warning. NFC.	2024-01-26 14:40:10 +00:00
Ivan Kosarev	2aa8945d59	[AMDGPU][NFC] Use templates to decode AV operands. (#79313 ) Eliminates the need to define them manually. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-01-25 11:30:04 +00:00
Ivan Kosarev	2e81ac25b4	[AMDGPU][NFC] Simplify AGPR/VGPR load/store operand definitions. (#79289 ) Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-01-24 15:38:16 +00:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414 ) Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Piotr Sobczak	57f6a3f7ea	[AMDGPU] Add global_load_tr for GFX12 (#77772 ) Support new amdgcn_global_load_tr instructions for load with transpose. * MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128 * Intrinsic int_amdgcn_global_load_tr * Clang builtins amdgcn_global_load_tr*	2024-01-18 15:14:42 +01:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522 ) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Jay Foad	c01e844a7e	[AMDGPU] Update compute program resource registers for GFX12 (#75911 ) Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>	2024-01-02 13:24:42 +00:00
Ivan Kosarev	8c6172b0ac	[AMDGPU][True16] Don't use the VGPR_LO/HI16 register classes. (#76440 ) Removing the classes requires updating tests and so is planned to be done with a separate change.	2023-12-28 11:48:25 +00:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Mirko Brkušanin	c1a6974d6b	[AMDGPU][MC] Add GFX12 SMEM encoding (#75215 )	2023-12-15 09:00:54 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00

247 Commits