238 Commits

Author SHA1 Message Date
Jun Wang
82f00ea40a
[AMDGPU][MC] In GFX11+ v_pk_fmac_f16 should not allow DPP (#148751)
In GFX11+ the instruction v_pk_fmac_f16 should not allow DPP.
2025-07-30 13:44:24 -07:00
Stanislav Mekhanoshin
2346968807
[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291) 2025-07-23 13:17:56 -07:00
Stanislav Mekhanoshin
5277021c3c
[AMDGPU] Add gfx1250 v_fmac_f64 implementation (#148725) 2025-07-14 15:39:04 -07:00
Stanislav Mekhanoshin
f090554359
[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282) 2025-07-11 14:17:03 -07:00
Stanislav Mekhanoshin
7920dff394
[AMDGPU] VOPD/VOPD3 changes for gfx1250 (#147602) 2025-07-10 14:15:01 -07:00
Stanislav Mekhanoshin
d0a4af725e
[AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594)
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2025-07-08 14:32:44 -07:00
Jun Wang
063cee7bde
[AMDGPU][MC] Allow opsel for v_max_i16 etc in GFX10 (#143982)
In GFX10, a number of VOP3 instructions should allow opsel, including
V_MAX_I16, V_MAX_U16, V_MIN_I16, V_MIN_U16, V_MUL_LO_U16, V_LSHLREV_B16,
V_LSHRREV_B16, and V_ASHRREV_I16.
2025-06-26 14:08:13 -07:00
Jun Wang
46d33b6102
[AMDGPU][MC] Allow dpp in v_pk_fmac_f16 for GFX9 and GFX10 (#144782)
Allows dpp in v_pk_fmac_f16 for GFX9, and both dpp and dpp8 for GFX10.
2025-06-24 15:14:00 -07:00
Stanislav Mekhanoshin
849ecbc3ba
[AMDGPU] Simplify getIns64. NFCI. (#139981)
This big switch is unmaintainable and buggy. In particular it
unconditionally adds clamp if there is omod to VOP3.
2025-05-15 02:59:46 -07:00
Ivan Kosarev
66d3980b53
[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)
All immediates are deferred now.
2025-05-09 10:10:53 +01:00
Pierre van Houtryve
0f0d3fb6b5
[AMDGPU] Do not allow M0 as v_readlane_b32 dst (#128867)
See #128851 - this is the same patch, but for v_readlane_b32.

This instruction is used much less often, so fewer changes were
required.
2025-02-26 14:13:39 +01:00
Pravin Jagtap
7c2ebe5dbb
AMDGPU: Restrict src0 to VGPRs only for certain cvt scale opcodes. (#127464)
Src0 operands wider than 32 bits in cvt_scale opcodes operating on
FP6/BF6/FP4 need to be restricted to take only VGPRs.
2025-02-21 07:27:25 +05:30
Brox Chen
8a0c2e7567
[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736)
Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and
fake16 flow.

Since we are replacing `v_cndmask_b16` with `v_cndmask_b16_t16/fake16`, we
have to at least update the fake16 codeGen to get the codeGen tests passing.
In this case we have to update true16 and fake16 together;
otherwise some of the true16 tests will fail.
2025-01-16 17:18:28 -05:00
Brox Chen
0f3aeca16f
[AMDGPU][True16][CodeGen] Update and/or/xor codegen pattern for i16 (#121835)
In true16 flow, remove and/or/xor 32bit patterns for i16
2025-01-13 16:48:00 -05:00
Brox Chen
c3241a9a4d
[AMDGPU][True16][MC] test update for v_subrev_f16 in true16 (#119315)
This is an NFC change. Update the MC test for v_subrev_f16 in true16 format.

The MC source change was done by a previous patch and automatically enabled by
the t16 pseudo.
2024-12-18 13:01:08 -05:00
Brox Chen
5270e63cdc
[AMDGPU][True16][MC] test update for v_ldexp_f16 in true16 (#119313)
This is an NFC change. Update the MC test for v_ldexp_f16 in true16 format.

The MC source change was done by a previous patch and automatically enabled by
the t16 pseudo.
2024-12-18 13:00:07 -05:00
Brox Chen
f9a9173b6c
[AMDGPU][True16][MC] test update for v_mul_f16 in true16 (#119314)
This is an NFC change. Update the MC test for v_mul_f16 in true16 format.

The MC source change was done by a previous patch and automatically enabled by
the t16 pseudo.
2024-12-17 13:24:32 -05:00
Brox Chen
8bbbcaddbb
[AMDGPU][True16][MC] test update for v_max_f16/v_min_f16 in true16 (#119291)
This is an NFC change. Update the MC test for v_max/min_f16 in true16 format.

The MC source change was done by a previous patch and automatically enabled by
the t16 pseudo.
2024-12-17 13:12:39 -05:00
Brox Chen
cbed714f2a
[AMDGPU][True16][MC] test update for v_add/sub_f16 in true16 (#118926)
This is an NFC change. Update the MC test for v_add/sub_f16 in true16 format.

The MC source change was done by a previous patch and automatically enabled by
the t16 pseudo.
2024-12-09 17:58:21 -05:00
Jay Foad
f9d6d46a8e
[AMDGPU] Add assembler/disassembler support for v_dual_dot2acc_f32_bf16 (#118984)
There is still no codegen support because the corresponding 
v_dot2c_f32_bf16 instruction is not supported on GFX11.
2024-12-09 09:47:22 +00:00
Matt Arsenault
716364ebd6
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of the v_dot2c_f32_bf16 opcode is the same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
2024-11-25 19:51:01 -08:00
Brox Chen
9fb01fcd9f
[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233)
Support true16 format for VOP2 instructions in MC

This patch updates the true16 and fake16 vop_profile for the following
instructions and updates the asm/dasm tests:
v_fmac_f16
v_fmamk_f16
v_fmaak_f16

It seems the vop2_t16_promote.s files were not updated with the true16 flag
in the previous batch update. They will be updated separately.
2024-11-20 11:33:04 -05:00
Matt Arsenault
b7d635ed30
AMDGPU: Copy correct predicates for SDWA reals (#116288)
The special case predicate handling is messy
in several places. Currently broad let blocks
override specific predicates with more general
cases. For instructions with SDWA, the HasSDWA
predicate was overriding the SubtargetPredicate
for the instruction.

This fixes enough to properly disallow new instructions
that support SDWA on older targets.
2024-11-18 08:38:35 -08:00
Brox Chen
e8644e3b47
[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436)
Some old "t16" VOP2 instructions are actually in fake16 format. Correct
and update test file
2024-11-05 16:12:49 -05:00
Jay Foad
b3acb25735
[AMDGPU] Don't rely on !eq comparing int with bits<5>. NFC. (#113279)
Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int
value (-1) with a bits<5> value. This is to avoid a change in behaviour
when #112904 lands, which is a bug fix which has the side effect of
implicitly casting template arguments to the declared template parameter
type.
2024-10-22 12:20:36 +01:00
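The fragility described in #113279 can be illustrated outside TableGen. The following is a minimal Python sketch, not TableGen code: it assumes that an implicit cast from an int to a five-bit value keeps only the low five bits, and the helper name `to_bits` is hypothetical.

```python
def to_bits(value: int, width: int) -> int:
    """Keep only the low `width` bits of `value`.

    Assumed model of an implicit int -> bits<width> conversion.
    """
    return value & ((1 << width) - 1)


SENTINEL = -1  # a "no value" marker expressed as a plain int

# Under the assumed cast, the sentinel becomes 0b11111 as a 5-bit field,
# so an equality test against the int -1 would no longer match.
as_bits5 = to_bits(SENTINEL, 5)
still_equal = (as_bits5 == SENTINEL)
```

Under this assumption, a comparison written against the int sentinel silently changes its result once the argument is cast, which is the kind of behaviour change the commit avoids depending on.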
Brox Chen
7b4c8b35d4
[AMDGPU][True16][MC] VOP3 profile in True16 format (#109031)
Modify VOP3 profile and pseudo, and add encoding info for VOP3 True16
including DPP and DPP8 in true16 and fake16 format.

This patch applies true16/fake16 changes and asm/dasm changes to
V_ADD_NC_U16
V_ADD_NC_I16
V_SUB_NC_U16
V_SUB_NC_I16
2024-10-16 10:27:44 -04:00
Yaxun (Sam) Liu
3b88805ca2
[AMDGPU] Fix SDWA commuting (#106920)
SDWA insts are missing a reverse opcode, which causes them to be treated as
commutable with the default reverse opcode, i.e. their own opcode. As a
result, SDWA F16 sub A, B and sub B, A are merged by machine CSE. The
correct behavior is to merge sub A, B and subrev B, A instead of sub B,
A. This issue caused failures in rocFFT tests.

Another issue is that src0_sel and src1_sel are not swapped when SDWA
insts are commuted.

Verified that this fixes the rocFFT test failures.
2024-10-04 15:53:40 -04:00
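The two parts of the SDWA commuting fix (#106920) can be sketched as a small Python model. This is only an illustration under stated assumptions: the `SdwaInst` class, the `REVERSE_OPCODE` table, and the selector names are hypothetical and are not taken from the LLVM sources.

```python
from dataclasses import dataclass, replace

# Hypothetical table: each opcode maps to the opcode that computes the
# same result once src0 and src1 have been exchanged.
REVERSE_OPCODE = {"sub": "subrev", "subrev": "sub", "add": "add"}


@dataclass(frozen=True)
class SdwaInst:
    opcode: str
    src0: str
    src1: str
    src0_sel: str
    src1_sel: str


def commute(inst: SdwaInst) -> SdwaInst:
    """Swap the sources and their selectors, and use the reverse opcode."""
    return replace(
        inst,
        opcode=REVERSE_OPCODE[inst.opcode],
        src0=inst.src1,
        src1=inst.src0,
        src0_sel=inst.src1_sel,
        src1_sel=inst.src0_sel,
    )


a_minus_b = SdwaInst("sub", "A", "B", "WORD_0", "WORD_1")
commuted = commute(a_minus_b)
```

In this model `commuted` is `subrev B, A` with the selectors exchanged, so it still computes A - B. Keeping the opcode as `sub` while swapping the operands would instead describe B - A, which is the incorrect merge the commit message describes.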
Brox Chen
2672037e36
[AMDGPU][True16][MC] Support VOP3 only instructions with true16 and fake16 (#109891)
Update VOP3 only instructions with true16 and fake16 formats. 

This patch includes instructions:
V_MUL_LO_U16
V_MAX_U16
V_MAX_I16
V_MIN_U16
V_MIN_I16
V_LSHLREV_B16
V_LSHRREV_B16
V_ASHRREV_I16
2024-10-01 09:25:36 -04:00
Corbin Robeck
661666d43a
[AMDGPU] Move renamedInGFX9 from TableGen to SIInstrInfo helper function/macro to free up a bit slot (#82787)
Follow on to #81525 and #81901 in the series of consolidating bits in
TSFlags.

Remove renamedInGFX9 from SIInstrFormats.td and move to helper
function/macro in SIInstrInfo. renamedInGFX9 points to V_{add, sub,
subrev, addc, subb, subbrev}_ U32 and V_{div_fixup_F16, fma_F16,
interp_p2_F16, mad_F16, mad_U16, mad_I16}.
2024-09-25 20:38:51 -04:00
Scott Egerton
396f677514
[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769) 2024-09-24 10:58:00 +01:00
Brox Chen
35e27c0ee5
[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510)
This large patch includes the MC level support for V_CVT_F16_F32,
V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.

This patch includes the asm/disasm changes to encode/decode the 16bit
vsrc, vdst and src modifiers for vop and dpp format. This patch is a
dependency for many 16 bit instructions, but only three instructions
are updated to make it easier to review.

There will be another patch to support these three instructions at the
codeGen level; this patch just replaces these two instructions with their
fake16 format.
2024-09-11 10:48:11 -04:00
Jay Foad
935b9f6274 [AMDGPU] Make use of multiclass inheritance. NFC. 2024-09-11 10:39:48 +01:00
Brox Chen
afd42fb303
[AMDGPU][True16][CodeGen] Support AND/OR/XOR and LDEXP True16 format (#102620)
Support AND/OR/XOR true16 and LDEXP true/fake16 format.

These instructions were previously implemented with a fake16 profile.
This fixes the implementation.

Added an RA hint so that when using a 16bit register in a 32bit
instruction, the allocator tries to use the register directly without an
extra 16bit move.

---------

Co-authored-by: guochen2 <guochen2@amd.com>
2024-08-13 12:23:39 -04:00
Matt Arsenault
0a62980ad3
AMDGPU: Support VALU add instructions in localstackalloc (#101692)
Pre-enable this optimization before allowing folds of frame
indexes into add instructions. Disables this fold when using
scratch instructions for now. I see some code size improvements
with it, but the optimization needs to be smarter about the
uses depending on the register classes.
2024-08-08 23:22:48 +04:00
Acim Maravic
9398cc2ec5
[LLVM][AMDGPU] Copy isConvergent from Pseudo to Real instructions (#99658)
This patch copies the flag isConvergent from pseudo instructions to the
corresponding real instructions, so that isConvergent flag is also
defined for real instructions.

Flags are not required by the compiler, but for consistency it would be
nice to have them.

Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
2024-07-25 18:01:07 +02:00
Vikram Hegde
5feb32ba92
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series with the end goal of
adapting the atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel.

---------

Co-authored-by: vikramRH <vikhegde@amd.com>
2024-06-25 14:35:19 +05:30
Scott Egerton
4a305d40a3
[AMDGPU] Exclude certain opcodes from being marked as single use (#91802)
The s_singleuse_vdst instruction is used to mark regions of instructions
that produce values that have only one use.
Certain instructions take more than one cycle to execute, resulting in
regions being incorrectly marked.
This patch excludes these multi-cycle instructions from being marked as
either producing single use values or consuming single use values
or both depending on the instruction.
2024-06-12 10:43:23 +01:00
Ivan Kosarev
6b91a3be46
[AMDGPU][NFC] Rename the clamp modifier definition to follow the prevailing convention. (#94353)
This allows simplifying the definition itself.

Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-06-04 16:31:27 +01:00
Joe Nash
fe0b7983a2
[AMDGPU] Create AMDGPUMnemonicAlias tablegen class (#89288)
AMDGPUMnemonicAlias is a MnemonicAlias that inherits from
GCNPredicateControl, so that we can set predicates on the alias the same
way as Instructions.
Use AssemblerPredicate instead of Requires on aliases

NFC.
2024-05-09 11:37:56 -04:00
Joe Nash
6a13bbf92f
[AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excludi… (#87382)
…ng VOPC.

Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
2024-04-03 11:34:12 -04:00
Joe Nash
2a3f27cce8
[AMDGPU][True16] Make NotHasTrue16BitInsts a True16Predicate (#84771)
NFC.
Test coverage on VOPC shows NotHasTrue16BitInsts on the pre-gfx11
instructions is necessary (we cannot use the default NoTrue16Predicate).
Update the VOP2 instructions in the same manner.
2024-03-11 13:58:45 -04:00
Changpeng Fang
839a8fecb4
AMDGPU: Copy SubtargetPredicate from pseudo to real for dpp16 and dpp8 (#84517)
We usually expect to copy SubtargetPredicate (and OtherPredicates) from
pseudo to real. However, in dpp16 and dpp8, there are assignments like
SubtargetPredicate = HasDPP/HasDPP16/HasDpp8. These assignments override
predicates copied from pseudo, and thus the predicates used to define
pseudo get lost.

Losing predicates is a subtle issue that is usually not easy to find. It may
result in instructions being generated on GPUs that do not support the
features to generate them.
https://github.com/llvm/llvm-project/pull/84354 addressed one such
issue, and inspired this work.

Fortunately, we found that the assignment of SubtargetPredicate usually
comes together with assignment of AssemblerPredicate, and with the same
value. For example:
  let AssemblerPredicate = HasDPP16;
  let SubtargetPredicate = HasDPP16;
One of them is redundant and can be removed.

In this work, we remove the redundant assignment of SubtargetPredicate,
and then copy it from pseudo for VOP*_DPP and VOP*_DPP8. With this
change, we can safely use SubtargetPredicate to define pseudo
instructions.
2024-03-08 10:30:01 -08:00
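The predicate loss described in #84517 can be sketched with plain dictionaries. This is a minimal Python model, not TableGen: the `make_real` helper and the feature name `HasSomeFeature` are hypothetical, and records are reduced to field/value maps.

```python
def make_real(pseudo: dict, overrides: dict) -> dict:
    """Copy all fields from the pseudo, then apply the real's own assignments."""
    real = dict(pseudo)
    real.update(overrides)
    return real


# Hypothetical pseudo instruction guarded by a feature predicate.
pseudo = {"SubtargetPredicate": "HasSomeFeature"}

# Before the change: the DPP16 real assigns both predicates, so the value
# copied from the pseudo is overwritten and lost.
lossy = make_real(
    pseudo,
    {"AssemblerPredicate": "HasDPP16", "SubtargetPredicate": "HasDPP16"},
)

# After the change: only AssemblerPredicate is assigned, so the pseudo's
# SubtargetPredicate survives the copy.
kept = make_real(pseudo, {"AssemblerPredicate": "HasDPP16"})
```

In this model `lossy` no longer records `HasSomeFeature` anywhere, while `kept` carries both the DPP16 assembler predicate and the pseudo's subtarget predicate.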
Changpeng Fang
f862265733
AMDGPU: Use True16Predicate for UseRealTrue16Insts in VOP2 Reals (#84394)
We cannot use OtherPredicates or SubtargetPredicate because they
should be copied from pseudo to real, and we should not override them.
2024-03-07 15:39:41 -08:00
Jay Foad
3b7d43301e
[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491)
Now that there is no special checking for valid DPP encodings, these
instructions can use the same DecoderNamespace as other 64- or 96-bit
instructions.

Also clean up setting DecoderNamespace: in most cases it should be set
as a pair with AssemblerPredicate.
2024-02-22 11:18:18 +00:00
Jay Foad
bcbffd99c4
[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379)
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
2024-02-22 09:40:46 +00:00
Jay Foad
ddba6b271c
[AMDGPU] Stop using SDWA DecoderNamespaces. NFCI. (#82233)
64-bit SDWA encodings have to be checked first because their first 32
bits are a special case of the corresponding 32-bit non-SDWA encoding of
the same instruction. But all 64-bit encodings are checked first, so we
don't need special handling for SDWA.
2024-02-20 12:58:07 +00:00
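The ordering argument in #82233 can be sketched as a decoder that tries longer encodings first. This Python model is a toy: the marker value 0xF9, the nine-bit src0 field position, and all function names are assumptions for illustration, not the actual disassembler logic.

```python
from typing import Callable, List, Tuple

SDWA_MARKER = 0xF9  # assumed src0 value that signals a 64-bit SDWA form


def matches_sdwa64(words: List[int]) -> bool:
    """A 64-bit SDWA form: two words, with the marker in the low 9 bits."""
    return len(words) >= 2 and (words[0] & 0x1FF) == SDWA_MARKER


def matches_plain32(words: List[int]) -> bool:
    """A 32-bit form: in this toy model any single word matches."""
    return len(words) >= 1


# Ordered longest first. The 32-bit matcher would also accept the first
# word of an SDWA instruction, so it must come after the 64-bit matcher.
DECODERS: List[Tuple[str, int, Callable[[List[int]], bool]]] = [
    ("sdwa64", 8, matches_sdwa64),
    ("plain32", 4, matches_plain32),
]


def decode(words: List[int]) -> Tuple[str, int]:
    """Return (form name, size in bytes) of the first matching decoder."""
    for name, size, matches in DECODERS:
        if matches(words):
            return name, size
    raise ValueError("no encoding matches")
```

For example, `decode([0x7E0000F9, 0x06060000])` selects the 64-bit form, while `decode([0x7E000001])` falls through to the 32-bit form. Since the general rule "64-bit before 32-bit" already gives SDWA the priority it needs, no SDWA-specific namespace is required in this model.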
Ivan Kosarev
4b8e55cb04
[AMDGPU][AsmParser][NFC] Rename integer modifier operands to follow the convention. (#79284)
Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-01-25 11:40:42 +00:00
Ivan Kosarev
5a458767dd
[AMDGPU][True16] Support source DPP operands. (#79025) 2024-01-23 09:52:49 +00:00
Ivan Kosarev
03abf7fe09
[AMDGPU] Fix predicates for V_DOT instructions. (#78198)
Resolves AsmParser ambiguities, e.g., between V_DOT4C_I32_I8_dpp_vi and
V_DOT4C_I32_I8_dpp_gfx10. The latter is predicated with isGFX10Only
while the first has no subtarget generation predicates.

Part of <https://github.com/llvm/llvm-project/issues/69256>.
2024-01-16 21:23:55 +00:00
Stanislav Mekhanoshin
8e9e4f8809
[AMDGPU] Remove VT helpers isFloatType, isPackedType, simplify isIntType (#77987) 2024-01-16 02:08:22 -08:00