llvm-project

Author	SHA1	Message	Date
Matt Arsenault	5d650a62a3	AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596 ) This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:44:47 -08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Brox Chen	4cc278587f	[AMDGPU][True16][MC] VOPC profile fake16 pseudo update (#113175 ) Update VOPC profile with VOP3 pseudo: 1. On GFX11+, v_cmp_class_f16 has src1 type f16 for literals, however it's semantically interpreted as an integer. Update VOPC class f16 profile from operand type f16, i16 to f16, f16, currently updating it for fake16 format, and will update t16 format in the following patch. 2. 16bit V_CMP_CLASS instructions (V_CMP_**_U/I/F16) are named with `t16`, but actually using 32 bit registers. Correct it by updating the pseudo definitions with useRealTrue16/useFakeTrue16 predicates and rename these `t16` instructions to `fake16`. 3. Update the inst select so that `t16`/`fake16` instructions are selected in true16/fake16 flow. 4. The MIR test files are affected by the renaming of these 16 bit V_CMP instructions, but there is no functional change to the emitted code	2024-11-22 12:12:13 -05:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Matt Arsenault	201f4f6bcc	AMDGPU: Add v_mfma_ld_scale_b32 for gfx950 (#116722 )	2024-11-20 10:52:38 -08:00
Matt Arsenault	b7d635ed30	AMDGPU: Copy correct predicates for SDWA reals (#116288 ) There are a lot of messes in the special case predicate handling. Currently broad let blocks override specific predicates with more general cases. For instructions with SDWA, the HasSDWA predicate was overriding the SubtargetPredicate for the instruction. This fixes enough to properly disallow new instructions that support SDWA on older targets.	2024-11-18 08:38:35 -08:00
Brox Chen	7b4c8b35d4	[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031 ) Modify VOP3 profile and pseudo, and add encoding info for VOP3 True16 including DPP and DPP8 in true16 and fake16 format. This patch applies true16/fake16 changes and asm/dasm changes to V_ADD_NC_U16 V_ADD_NC_I16 V_SUB_NC_U16 V_SUB_NC_I16	2024-10-16 10:27:44 -04:00
Scott Egerton	396f677514	[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769 )	2024-09-24 10:58:00 +01:00
Jeffrey Byrnes	7bcf4d63cf	[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276 ) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op" This includes the case where the second op is SDWA with same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue. Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 -- this PR adds support for opsel[2] != 0 as well	2024-08-22 11:38:24 -07:00
Acim Maravic	9398cc2ec5	[LLVM][AMDGPU] Copy isConvergent from Pseudo to Real instructions (#99658 ) This patch copies the flag isConvergent from pseudo instructions to the corresponding real instructions, so that isConvergent flag is also defined for real instructions. Flags are not required by the compiler, but for consistency it would be nice to have them. Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>	2024-07-25 18:01:07 +02:00
Ivan Kosarev	47c3eca489	[AMDGPU][NFC] Make GFX*Gen records globally available. (#97291 ) And use them to simplify SOP-related definitions. Introduces GFX10Gen.	2024-07-01 16:09:56 +01:00
Scott Egerton	0a57a20aa5	[AMDGPU] NFC: Remove duplicate VOP_DPP_Pseudo TableGen definitions (#95370 ) After recent changes, VOP_DPP_Pseudo now inherits from VOP_Pseudo. This commit removes some of the duplicate definitions in VOP_DPP_Pseudo that are exactly the same as definitions inherited from VOP_Pseudo.	2024-06-14 15:52:28 +01:00
Scott Egerton	4a305d40a3	[AMDGPU] Exclude certain opcodes from being marked as single use (#91802 ) The s_singleuse_vdst instruction is used to mark regions of instructions that produce values that have only one use. Certain instructions take more than one cycle to execute, resulting in regions being incorrectly marked. This patch excludes these multi-cycle instructions from being marked as either producing single use values or consuming single use values or both depending on the instruction.	2024-06-12 10:43:23 +01:00
Fabian Ritter	0821b7937c	[AMDGPU] Copy Defs and Uses from Pseudo to Real Instructions (#93004 ) Currently, the tablegen files that generate the instruction definitions in lib/Target/AMDGPU/AMDGPUGenInstrInfo.inc often only include implicit operands for the architecture-independent pseudo instructions, but not for the corresponding real instructions. The missing implicit operands (most prominently: the EXEC mask) do not affect code generation, since that operates on pseudo instructions, but they are problematic when working with real instructions, e.g., as a decoding result from the MC layer. This patch copies the implicit Defs and Uses from pseudo instructions to the corresponding real instructions, so that implicit operands are also defined for real instructions. Addresses issue #89830.	2024-05-31 08:40:54 +02:00
Joe Nash	fe0b7983a2	[AMDGPU] Create AMDGPUMnemonicAlias tablegen class (#89288 ) AMDGPUMnemonicAlias is a MnemonicAlias that inherits from GCNPredicateControl, so that we can set predicates on the alias the same way as Instructions. Use AssemblerPredicate instead of Requires on aliases NFC.	2024-05-09 11:37:56 -04:00
Stanislav Mekhanoshin	a70ad96b3c	[AMDGPU] Fix condition in VOP3_Real_Base. NFCI. (#91373 )	2024-05-07 13:45:58 -07:00
Stanislav Mekhanoshin	57216f7bd6	[AMDGPU] Support byte_sel modifier for v_cvt_f32_fp8 and v_cvt_f32_bf8 (#90887 )	2024-05-02 12:03:51 -07:00
Stanislav Mekhanoshin	6e722bbe30	[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244 )	2024-04-26 13:02:57 -07:00
Stanislav Mekhanoshin	ce1b6783d2	[AMDGPU] simplify VOP3_Real definitions. NFC. (#89656 )	2024-04-22 14:51:11 -07:00
Joe Nash	e29228efae	[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418 ) Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on GFX1150Plus. The w32 and w64 _e64_dpp assembler-only real instructions were unused, and erroneously constructed in a way that bugged parsing of the new instructions. They are removed. This patch is a follow up to PR https://github.com/llvm/llvm-project/pull/87382	2024-04-03 14:51:27 -04:00
Changpeng Fang	839a8fecb4	AMDGPU: Copy SubtargetPredicate from pseudo to real for dpp16 and dpp8 (#84517 ) We usually expect to copy SubtargetPredicate (and OtherPredicates) from pseudo to real. However, in dpp16 and dpp8, there are assignments like SubtargetPredicate = HasDPP/HasDPP16/HasDpp8. These assignments override predicates copied from pseudo, and thus the predicates used to define pseudo get lost. Losing predicates is a subtle issue usually not easy to be found. It may result in instructions being generated on GPUs that do not support the features to generate them. https://github.com/llvm/llvm-project/pull/84354 addressed one of such issues, and inspired this work. Fortunately, we found that the assignment of SubtargetPredicate usually comes together with assignment of AssemblerPredicate, and with the same value. For example: let AssemblerPredicate = HasDPP16; let SubtargetPredicate = HasDPP16; One of them is redundant and can be removed. In this work, we remove the redundant assignment of SubtargetPredicate, and then copy it from pseudo for VOP_DPP and VOP_DPP8. With this change, we can safely use SubtargetPredicate to define pseudo instructions.	2024-03-08 10:30:01 -08:00
Stanislav Mekhanoshin	d7b73c8d01	[AMDGPU] Copy WaveSizePredicate into VOP3_Real. NFCI. (#83352 )	2024-02-28 15:42:31 -08:00
Changpeng Fang	9de78c4e24	AMDGPU: Simplify FP8 conversion definitions. NFC. (#83043 ) Reals should inherit predicates from the corresponding Pseudo.	2024-02-26 10:13:40 -08:00
Stanislav Mekhanoshin	3dfca24dda	[AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710 ) The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match SP3 and does not set op_sel_hi bits properly.	2024-02-23 03:50:00 -08:00
Jay Foad	3b7d43301e	[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491 ) Now that there is no special checking for valid DPP encodings, these instructions can use the same DecoderNamespace as other 64- or 96-bit instructions. Also clean up setting DecoderNamespace: in most cases it should be set as a pair with AssemblerPredicate.	2024-02-22 11:18:18 +00:00
Changpeng Fang	d3fcf31031	AMDGPU: Use HasFP8ConversionInsts appropriately, NFC (#82433 ) The corresponding fp8 conversion instructions are available for a subtarget when and only when the subtarget "HasFP8ConversionInsts". We should not assume all the future subtargets (gfx12+) have FP8ConversionInsts. In this patch, we use OtherPredicates to carry HasFP8ConversionInsts feature. This is because SubtargetPredicate is not copied from pseudos to reals for DPP16 and DPP6. To avoid overriding OtherPredicates in a few places, we use the newly introduced True16Predicate to hold UseRealTrue16Insts instead. This work replaces the inadvertently closed pull request: https://github.com/llvm/llvm-project/pull/82024	2024-02-20 16:03:54 -08:00
Jay Foad	ddba6b271c	[AMDGPU] Stop using SDWA DecoderNamespaces. NFCI. (#82233 ) 64-bit SDWA encodings have to be checked first because their first 32 bits are a special case of the corresponding 32-bit non-SDWA encoding of the same instruction. But all 64-bit encodings are checked first, so we don't need special handling for SDWA.	2024-02-20 12:58:07 +00:00
Ivan Kosarev	f122268c04	[AMDGPU][NFC] Extend PredicateControl to support True16 predicates. (#82245 ) Using OtherPredicates for True16 predicates is often problematic due to interference with other kinds of predicates, particularly when this overrides predicates inherited from pseudo instructions.	2024-02-20 11:37:44 +00:00
Stanislav Mekhanoshin	f847c72be0	[AMDGPU] Use HasClamp instead of HasIntClamp in VOP3_Pseudo. NFC. (#82020 ) There is no real reason to differentiate.	2024-02-17 00:48:37 -08:00
Konstantin Zhuravlyov	fcef407aa2	AMDGPU/NFC: Remove some bits from TSFlags (#81525 ) - AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags - Moved to helper function in SIInstInfo - AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags - This flag did not make sense / remnants of something else I think	2024-02-12 16:43:48 -05:00
Mirko Brkušanin	815e0485a4	[AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instructions (#81158 )	2024-02-12 19:01:58 +01:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414 ) Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Mariusz Sikora	28b7e498b6	AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892 ) Encoding is VOP3P. Tagged as deep/machine learning instructions. i32 type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32 fneg(neg_lo[2]) and f32 fabs(neg_hi[2]). --------- Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>	2024-01-18 14:00:27 +01:00
Stanislav Mekhanoshin	8e9e4f8809	[AMDGPU] Remove VT helpers isFloatType, isPackedType, simplify isIntType (#77987 )	2024-01-16 02:08:22 -08:00
Ivan Kosarev	60bb5c54f6	[AMDGPU] Fix predicates for various True16 instructions. (#77581 ) Resolves AsmParser ambiguities, e.g., between V_SUBREV_F16_t16_dpp8_gfx11 and V_SUBREV_F16_t16_dpp8_gfx12. Part of <https://github.com/llvm/llvm-project/issues/69256>.	2024-01-10 12:58:18 +00:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853 ) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Ivan Kosarev	fab28e0e14	Reapply "[AMDGPU] Introduce real and keep fake True16 instructions." Reverts 6cb3866b1ce9d835402e414049478cea82427cf1. Analysis of failures on buildbots with expensive checks enabled showed that the problem was triggered by changes in another commit, 469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug addressed in #67245.	2023-09-23 22:07:41 +01:00
Ivan Kosarev	6cb3866b1c	Revert "[AMDGPU] Introduce real and keep fake True16 instructions." This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to failures on expensive checks.	2023-09-22 15:40:26 +01:00
Ivan Kosarev	0f864c7b8b	[AMDGPU] Introduce real and keep fake True16 instructions. The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionality is implemented and relevant tests are updated. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D156101	2023-09-22 10:57:56 +01:00
Stanislav Mekhanoshin	cfe9a134bb	[AMDGPU] Rename 64BitDPP feature and fix the checks Names '64BitDPP' and especially 'DPP64' were found misleading, and DPP64 can easily be mixed with DPP16 and DPP8 while these are different concepts. DPP16 and DPP8 refers to lanes where DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465	2023-08-22 11:00:10 -07:00
Matt Arsenault	fb54afd1b7	AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers This isn't always folded to fneg for a freestanding fsub depending on the denormal mode. When matching source modifiers, we're implicitly canonicalizing the input so we can fold it here. Doesn't bother handling the VOP3P case since it's only relevant with DAZ, which nobody really uses with f16. For f64, tests show an existing bug where DAGCombiner tries to respect the denormal mode for fsub -0, x, but not after it's lowered to fadd -0, (fneg x). Either the fold is wrong or we shouldn't restrict the fsub case based on the denormal mode. https://reviews.llvm.org/D155652	2023-07-20 19:29:40 -04:00
Jay Foad	23b0df72d2	[AMDGPU] Remove BoolToList class Replace all: foreach _ = BoolToList<cond>.ret in with: if cond then Thanks to Philip Reames for D145711 which enabled this.	2023-03-13 09:22:52 +00:00
Janek van Oirschot	322966f8f8	[AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic for llvm.is.fpclass	2022-11-28 16:00:36 -05:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Petar Avramovic	8de1f04c77	[AMDGPU] gfx11 Fix VOP3 dot instructions Fix src modifiers for operands with bf16 type. op_sel[0:1] are ignored. Differential Revision: https://reviews.llvm.org/D129084	2022-07-22 11:43:35 +02:00
Dmitry Preobrazhensky	2a6532d542	[AMDGPU][MC][GFX11] Correct disassembly of *_e64_dpp opcodes which support op_sel These opcodes cannot be disassembled because op_sel operand is missing - it must be added manually. See https://github.com/llvm/llvm-project/issues/56512 for detailed issue analysis. Differential Revision: https://reviews.llvm.org/D129637	2022-07-15 13:11:59 +03:00
Jay Foad	4dbc2876cf	[AMDGPU] GFX11 trivial NFC tweaks A few miscellaneous comment, whitespace and indentation tweaks.	2022-07-05 17:20:17 +01:00
Joe Nash	0483c91eee	[AMDGPU] gfx11 CodeGen for new DPP instructions Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC with DPP. Depends on D128656 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128682	2022-07-05 10:17:59 -04:00

126 Commits