llvm-project

Author	SHA1	Message	Date
Jun Wang	82f00ea40a	[AMDGPU][MC] In GFX11+ v_pk_fmac_f16 should not allow DPP (#148751 ) In GFX11+ the instruction v_pk_fmac_f16 should not allow DPP.	2025-07-30 13:44:24 -07:00
Stanislav Mekhanoshin	2346968807	[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291 )	2025-07-23 13:17:56 -07:00
Stanislav Mekhanoshin	5277021c3c	[AMDGPU] Add gfx1250 v_fmac_f64 implementation (#148725 )	2025-07-14 15:39:04 -07:00
Stanislav Mekhanoshin	f090554359	[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282 )	2025-07-11 14:17:03 -07:00
Stanislav Mekhanoshin	7920dff394	[AMDGPU] VOPD/VOPD3 changes for gfx1250 (#147602 )	2025-07-10 14:15:01 -07:00
Stanislav Mekhanoshin	d0a4af725e	[AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594 ) Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2025-07-08 14:32:44 -07:00
Jun Wang	063cee7bde	[AMDGPU][MC] Allow opsel for v_max_i16 etc in GFX10 (#143982 ) In GFX10, a number of VOP3 instructions should allow opsel, including V_MAX_I16, V_MAX_U16, V_MIN_I16, V_MIN_U16, V_MUL_LO_U16, V_LSHLREV_B16, V_LSHRREV_B16, and V_ASHRREV_I16.	2025-06-26 14:08:13 -07:00
Jun Wang	46d33b6102	[AMDGPU][MC] Allow dpp in v_pk_fmac_f16 for GFX9 and GFX10 (#144782 ) Allows dpp in v_pk_fmac_f16 for GFX9, and both dpp and dpp8 for GFX10.	2025-06-24 15:14:00 -07:00
Stanislav Mekhanoshin	849ecbc3ba	[AMDGPU] Simplify getIns64. NFCI. (#139981 ) This big switch is unmaintainable and buggy. In particular it unconditionally adds clamp if there is omod to VOP3.	2025-05-15 02:59:46 -07:00
Ivan Kosarev	66d3980b53	[AMDGPU][NFC] Remove _DEFERRED operands. (#139123 ) All immediates are deferred now.	2025-05-09 10:10:53 +01:00
Pierre van Houtryve	0f0d3fb6b5	[AMDGPU] Do not allow M0 as v_readlane_b32 dst (#128867 ) See #128851 - this is the same patch, but for v_readlane_b32. This instruction is used much less often so there were less changes required.	2025-02-26 14:13:39 +01:00
Pravin Jagtap	7c2ebe5dbb	AMDGPU: Restrict src0 to VGPRs only for certain cvt scale opcodes. (#127464 ) The Src0 operand of cvt_scale opcodes operating on FP6/BF6/FP4, when wider than 32 bits, needs to be restricted to VGPRs only.	2025-02-21 07:27:25 +05:30
Brox Chen	8a0c2e7567	[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736 ) Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and fake16 flow. Since we are replacing `v_cndmask_b16` to `v_cndmask_b16_t16/fake16`, we have to at least update the fake16 codeGen to get codeGen test passing. For this case, we have to update the true16 and with fake16 together, otherwise some of the true16 tests will fail	2025-01-16 17:18:28 -05:00
Brox Chen	0f3aeca16f	[AMDGPU][True16][CodeGen] Update and/or/xor codegen pattern for i16 (#121835 ) In true16 flow, remove and/or/xor 32bit patterns for i16	2025-01-13 16:48:00 -05:00
Brox Chen	c3241a9a4d	[AMDGPU][True16][MC] test update for v_subrev_f16 in true16 (#119315 ) This is an NFC change. Update mc test for v_subrev_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-18 13:01:08 -05:00
Brox Chen	5270e63cdc	[AMDGPU][True16][MC] test update for v_ldexp_f16 in true16 (#119313 ) This is an NFC change. Update mc test for v_ldexp_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-18 13:00:07 -05:00
Brox Chen	f9a9173b6c	[AMDGPU][True16][MC] test update for v_mul_f16 in true16 (#119314 ) This is an NFC change. Update mc test for v_mul_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-17 13:24:32 -05:00
Brox Chen	8bbbcaddbb	[AMDGPU][True16][MC] test update for v_max_f16/v_min_f16 in true16 (#119291 ) This is an NFC change. Update mc test for v_max/min_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-17 13:12:39 -05:00
Brox Chen	cbed714f2a	[AMDGPU][True16][MC] test update for v_add/sub_f16 in true16 (#118926 ) This is an NFC change. Update mc test for v_add/sub_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-09 17:58:21 -05:00
Jay Foad	f9d6d46a8e	[AMDGPU] Add assembler/disassembler support for v_dual_dot2acc_f32_bf16 (#118984 ) There is still no codegen support because the corresponding v_dot2c_f32_bf16 instruction is not supported on GFX11.	2024-12-09 09:47:22 +00:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Brox Chen	9fb01fcd9f	[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233 ) Support true16 format for VOP2 instructions in MC This patch updates the true16 and fake16 vop_profile for the following instructions and update the asm/dasm tests: v_fmac_f16 v_fmamk_f16 v_fmaak_f16 It seems vop2_t16_promote.s files are not yet updated with true16 flag in the previous batch update. It will be updated separately	2024-11-20 11:33:04 -05:00
Matt Arsenault	b7d635ed30	AMDGPU: Copy correct predicates for SDWA reals (#116288 ) There are a lot of messes in the special case predicate handling. Currently broad let blocks override specific predicates with more general cases. For instructions with SDWA, the HasSDWA predicate was overriding the SubtargetPredicate for the instruction. This fixes enough to properly disallow new instructions that support SDWA on older targets.	2024-11-18 08:38:35 -08:00
Brox Chen	e8644e3b47	[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436 ) Some old "t16" VOP2 instructions are actually in fake16 format. Correct and update test file	2024-11-05 16:12:49 -05:00
Jay Foad	b3acb25735	[AMDGPU] Don't rely on !eq comparing int with bits<5>. NFC. (#113279 ) Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int value (-1) with a bits<5> value. This is to avoid a change in behaviour when #112904 lands, which is a bug fix which has the side effect of implicitly casting template arguments to the declared template parameter type.	2024-10-22 12:20:36 +01:00
Brox Chen	7b4c8b35d4	[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031 ) Modify VOP3 profile and pseudo, and add encoding info for VOP3 True16 including DPP and DPP8 in true16 and fake16 format. This patch applies true16/fake16 changes and asm/dasm changes to V_ADD_NC_U16 V_ADD_NC_I16 V_SUB_NC_U16 V_SUB_NC_I16	2024-10-16 10:27:44 -04:00
Yaxun (Sam) Liu	3b88805ca2	[AMDGPU] Fix SDWA commuting (#106920 ) SDWA insts miss reverse opcode, which causes them to be treated as commutable with default reverse opcode i.e. their own opcode. As a result, SDWA F16 sub A, B and sub B, A are merged by machine CSE. The correct behavior is to merge sub A, B and subrev B, A instead of sub B, A. This issue caused failures in rocFFT tests. Another issue is that src0_sel and src1_sel are not swapped when SDWA insts are commuted. Verified that this fixes rocFFT tests failure.	2024-10-04 15:53:40 -04:00
Brox Chen	2672037e36	[AMDGPU][True16][MC] Support VOP3 only instructions with true16 and fake16 (#109891 ) Update VOP3 only instructions with true16 and fake16 formats. This patch includes instructions: V_MUL_LO_U16 V_MAX_U16 V_MAX_I16 V_MIN_U16 V_MIN_I16 V_LSHLREV_B16 V_LSHRREV_B16 V_ASHRREV_I16	2024-10-01 09:25:36 -04:00
Corbin Robeck	661666d43a	[AMDGPU] Move renamedInGFX9 from TableGen to SIInstrInfo helper function/macro to free up a bit slot (#82787 ) Follow on to #81525 and #81901 in the series of consolidating bits in TSFlags. Remove renamedInGFX9 from SIInstrFormats.td and move to helper function/macro in SIInstrInfo. renamedInGFX9 points to V_{add, sub, subrev, addc, subb, subbrev}_U32 and V_{div_fixup_F16, fma_F16, interp_p2_F16, mad_F16, mad_U16, mad_I16}.	2024-09-25 20:38:51 -04:00
Scott Egerton	396f677514	[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769 )	2024-09-24 10:58:00 +01:00
Brox Chen	35e27c0ee5	[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510 ) This is a large patch includes the MC level support for V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format. This patch includes the asm/disasm changes to encode/decode the 16bit vsrc, vdst and src modifiers for vop and dpp format. This patch is a dependency for many 16 bit instructions while only three instructions are updated to make it easier to review. There will be another patch to support these three instructions in the codeGen level, this patch just replaces these two instructions with its fake16 format.	2024-09-11 10:48:11 -04:00
Jay Foad	935b9f6274	[AMDGPU] Make use of multiclass inheritance. NFC.	2024-09-11 10:39:48 +01:00
Brox Chen	afd42fb303	[AMDGPU][True16][CodeGen] Support AND/OR/XOR and LDEXP True16 format (#102620 ) Support AND/OR/XOR true16 and LDEXP true/fake16 format. These instructions are previously implemented with fake16 profile. Fixing the implementation. Added a RA hint so that when using 16bit register in a 32bit instruction, try to use the register directly without an extra 16bit move --------- Co-authored-by: guochen2 <guochen2@amd.com>	2024-08-13 12:23:39 -04:00
Matt Arsenault	0a62980ad3	AMDGPU: Support VALU add instructions in localstackalloc (#101692 ) Pre-enable this optimization before allowing folds of frame indexes into add instructions. Disables this fold when using scratch instructions for now. I see some code size improvements with it, but the optimization needs to be smarter about the uses depending on the register classes.	2024-08-08 23:22:48 +04:00
Acim Maravic	9398cc2ec5	[LLVM][AMDGPU] Copy isConvergent from Pseudo to Real instructions (#99658 ) This patch copies the flag isConvergent from pseudo instructions to the corresponding real instructions, so that isConvergent flag is also defined for real instructions. Flags are not required by the compiler, but for consistency it would be nice to have them. Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>	2024-07-25 18:01:07 +02:00
Vikram Hegde	5feb32ba92	[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217 ) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <vikhegde@amd.com>	2024-06-25 14:35:19 +05:30
Scott Egerton	4a305d40a3	[AMDGPU] Exclude certain opcodes from being marked as single use (#91802 ) The s_singleuse_vdst instruction is used to mark regions of instructions that produce values that have only one use. Certain instructions take more than one cycle to execute, resulting in regions being incorrectly marked. This patch excludes these multi-cycle instructions from being marked as either producing single use values or consuming single use values or both depending on the instruction.	2024-06-12 10:43:23 +01:00
Ivan Kosarev	6b91a3be46	[AMDGPU][NFC] Rename the clamp modifier definition to follow the prevailing convention. (#94353 ) Allows to simplify the definition itself. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-06-04 16:31:27 +01:00
Joe Nash	fe0b7983a2	[AMDGPU] Create AMDGPUMnemonicAlias tablegen class (#89288 ) AMDGPUMnemonicAlias is a MnemonicAlias that inherits from GCNPredicateControl, so that we can set predicates on the alias the same way as Instructions. Use AssemblerPredicate instead of Requires on aliases NFC.	2024-05-09 11:37:56 -04:00
Joe Nash	6a13bbf92f	[AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excluding VOPC (#87382 ) Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions should allow sgpr and imm operands. PR #67461 added support for this with int operands, but it was missing a piece for float. Changing VOPC e64_dpp will be in a different patch because there is a bug preventing that change.	2024-04-03 11:34:12 -04:00
Joe Nash	2a3f27cce8	[AMDGPU][True16] Make NotHasTrue16BitInsts a True16Predicate (#84771 ) NFC. Test coverage on VOPC shows NotHasTrue16BitInsts on the pre-gfx11 instructions is necessary (we cannot use the default NoTrue16Predicate). Update the VOP2 instructions in the same manner.	2024-03-11 13:58:45 -04:00
Changpeng Fang	839a8fecb4	AMDGPU: Copy SubtargetPredicate from pseudo to real for dpp16 and dpp8 (#84517 ) We usually expect to copy SubtargetPredicate (and OtherPredicates) from pseudo to real. However, in dpp16 and dpp8, there are assignments like SubtargetPredicate = HasDPP/HasDPP16/HasDpp8. These assignments override predicates copied from pseudo, and thus the predicates used to define pseudo get lost. Losing predicates is a subtle issue usually not easy to be found. It may result in instructions being generated on GPUs that do not support the features to generate them. https://github.com/llvm/llvm-project/pull/84354 addressed one of such issues, and inspired this work. Fortunately, we found that the assignment of SubtargetPredicate usually comes together with assignment of AssemblerPredicate, and with the same value. For example: let AssemblerPredicate = HasDPP16; let SubtargetPredicate = HasDPP16; One of them is redundant and can be removed. In this work, we remove the redundant assignment of SubtargetPredicate, and then copy it from pseudo for VOP_DPP and VOP_DPP8. With this change, we can safely use SubtargetPredicate to define pseudo instructions.	2024-03-08 10:30:01 -08:00
Changpeng Fang	f862265733	AMDGPU: Use True16Predicate for UseRealTrue16Insts in VOP2 Reals (#84394 ) We can not use OtherPredicates or SubtargetPredicate because they should be copied from pseudo to real, and we should not override them.	2024-03-07 15:39:41 -08:00
Jay Foad	3b7d43301e	[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491 ) Now that there is no special checking for valid DPP encodings, these instructions can use the same DecoderNamespace as other 64- or 96-bit instructions. Also clean up setting DecoderNamespace: in most cases it should be set as a pair with AssemblerPredicate.	2024-02-22 11:18:18 +00:00
Jay Foad	bcbffd99c4	[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379 ) Split Dpp8FI and Dpp16FI into two different operands sharing an AsmOperandClass. They are parsed and rendered identically as fi:1 but the encoding is different: for DPP16 FI is a single bit, but for DPP8 it uses two different special values in the src0 field. Having a dedicated decoder for Dpp8FI allows it to reject other (non-special) src0 values so that AMDGPUDisassembler::getInstruction no longer needs to call isValidDPP8 to do post hoc validation of decoded DPP8 instructions.	2024-02-22 09:40:46 +00:00
Jay Foad	ddba6b271c	[AMDGPU] Stop using SDWA DecoderNamespaces. NFCI. (#82233 ) 64-bit SDWA encodings have to be checked first because their first 32 bits are a special case of the corresponding 32-bit non-SDWA encoding of the same instruction. But all 64-bit encodings are checked first, so we don't need special handling for SDWA.	2024-02-20 12:58:07 +00:00
Ivan Kosarev	4b8e55cb04	[AMDGPU][AsmParser][NFC] Rename integer modifier operands to follow the convention. (#79284 ) Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-01-25 11:40:42 +00:00
Ivan Kosarev	5a458767dd	[AMDGPU][True16] Support source DPP operands. (#79025 )	2024-01-23 09:52:49 +00:00
Ivan Kosarev	03abf7fe09	[AMDGPU] Fix predicates for V_DOT instructions. (#78198 ) Resolves AsmParser ambiguities, e.g., between V_DOT4C_I32_I8_dpp_vi and V_DOT4C_I32_I8_dpp_gfx10. The latter is predicated with isGFX10Only while the first has no subtarget generation predicates. Part of <https://github.com/llvm/llvm-project/issues/69256>.	2024-01-16 21:23:55 +00:00
Stanislav Mekhanoshin	8e9e4f8809	[AMDGPU] Remove VT helpers isFloatType, isPackedType, simplify isIntType (#77987 )	2024-01-16 02:08:22 -08:00
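
Commit 3b88805ca2 explains why commuting an SDWA subtraction must switch to the reverse opcode and swap src0_sel/src1_sel. The following is a simplified Python model, not LLVM code: integers stand in for F16 values, and the names `select`, `execute`, `commute`, `buggy_commute` and the `WORD_0`/`WORD_1` selector strings are hypothetical stand-ins for the SDWA operand selectors.

```python
# Simplified, hypothetical model of SDWA sub/subrev commuting (not LLVM code).
# Integers stand in for F16 values; a selector picks one 16-bit half of a 32-bit source.

REVERSE_OPCODE = {"sub": "subrev", "subrev": "sub"}


def select(value: int, sel: str) -> int:
    """Extract the 16-bit half named by sel ("WORD_0" = low, "WORD_1" = high)."""
    return value & 0xFFFF if sel == "WORD_0" else (value >> 16) & 0xFFFF


def execute(opcode: str, src0: int, src1: int, src0_sel: str, src1_sel: str) -> int:
    """sub computes a - b; subrev computes b - a (operands reversed)."""
    a = select(src0, src0_sel)
    b = select(src1, src1_sel)
    return a - b if opcode == "sub" else b - a


def commute(opcode, src0, src1, src0_sel, src1_sel):
    """Correct commute: swap operands, swap their selectors, use the reverse opcode."""
    return REVERSE_OPCODE[opcode], src1, src0, src1_sel, src0_sel


def buggy_commute(opcode, src0, src1, src0_sel, src1_sel):
    """Failure mode described in the commit: the opcode acts as its own reverse
    and the selectors are not swapped."""
    return opcode, src1, src0, src0_sel, src1_sel


A = 0x0003_0010  # high half 3, low half 16
B = 0x0005_0001  # high half 5, low half 1
original = ("sub", A, B, "WORD_1", "WORD_0")
print(execute(*original))                  # sub: 3 - 1 = 2
print(execute(*commute(*original)))        # subrev: 3 - 1 = 2
print(execute(*buggy_commute(*original)))  # sub: 5 - 16 = -11
```

Only the correct commute preserves the result, which is why `sub A, B` may be merged with `subrev B, A` but not with `sub B, A`.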


238 Commits
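
Commit bcbffd99c4 notes that `fi` is a single bit in DPP16 but is encoded in DPP8 as one of two special src0 values, so a dedicated DPP8 decoder can reject any other src0 value outright. The minimal Python sketch below illustrates that rule. The constants 0xE9/0xEA are assumed to match `DPP8_FI_0`/`DPP8_FI_1` in LLVM's SIDefines.h, the DPP16 bit position (bit 18 of the DPP dword) is likewise an assumption, and the function names are hypothetical.

```python
from typing import Optional

# Assumed special src0 values for DPP8 (believed to match DPP8_FI_0/DPP8_FI_1 in SIDefines.h).
DPP8_FI_0 = 0xE9
DPP8_FI_1 = 0xEA

# Assumed position of the single FI bit within the 32-bit DPP16 word.
DPP16_FI_BIT = 18


def decode_dpp8_fi(src0_field: int) -> Optional[int]:
    """Return fi (0 or 1) for a DPP8 encoding, or None to reject a non-special src0 value."""
    if src0_field == DPP8_FI_0:
        return 0
    if src0_field == DPP8_FI_1:
        return 1
    return None  # not a valid DPP8 encoding: decoding fails here, no post hoc check needed


def decode_dpp16_fi(dpp_word: int) -> int:
    """DPP16 stores fi as one bit, so every word decodes to 0 or 1."""
    return (dpp_word >> DPP16_FI_BIT) & 1


for src0 in (0xE9, 0xEA, 0x05):
    print(hex(src0), decode_dpp8_fi(src0))
```

Returning `None` instead of a default value reflects the point of the commit: invalid DPP8 encodings are rejected during decoding rather than by a separate validity check afterwards.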