In GFX10, a number of VOP3 instructions should allow opsel, including
V_MAX_I16, V_MAX_U16, V_MIN_I16, V_MIN_U16, V_MUL_LO_U16, V_LSHLREV_B16,
V_LSHRREV_B16, and V_ASHRREV_I16.
Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and
fake16 flow.
Since we are replacing `v_cndmask_b16` to `v_cndmask_b16_t16/fake16`, we
have to at least update the fake16 codeGen to get codeGen test passing.
For this case, we have to update the true16 and with fake16 together,
otherwise some of the true16 tests will fail
This is a NFC change. Update mc test for v_subrev_f16 in true16 format.
MC source change was done by previous patch and automatically enabled by
t16 pesudo
This is a NFC change. Update mc test for v_ldexp_f16 in true16 format.
MC source change was done by previous patch and automatically enabled by
t16 pesudo
This is a NFC change. Update mc test for v_mul_f16 in true16 format.
MC source change was done by previous patch and automatically enabled by
t16 pesudo
This is a NFC change. Update mc test for v_max/min_f16 in true16 format.
MC source change was done by previous patch and automatically enabled by
t16 pesudo
This is a NFC change. Update mc test for v_add/sub_f16 in true16 format.
MC source change was done by previous patch and automatically enabled by
t16 pesudo
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
Support true16 format for VOP2 instructions in MC
This patch updates the true16 and fake16 vop_profile for the following
instructions and update the asm/dasm tests:
v_fmac_f16
v_fmamk_f16
v_fmaak_f16
It seems vop2_t16_promote.s files are not yet updated with true16 flag
in the previous batch update. It will be updated seperately
There are a lot of messes in the special case
predicate handling. Currently broad let blocks
override specific predicates with more general
cases. For instructions with SDWA, the HasSDWA
predicate was overriding the SubtargetPredicate
for the instruction.
This fixes enough to properly disallow new instructions
that support SDWA on older targets.
Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int
value (-1) with a bits<5> value. This is to avoid a change in behaviour
when #112904 lands, which is a bug fix which has the side effect of
implicitly casting template arguments to the declared template parameter
type.
Modify VOP3 profile and pesudo, and add encoding info for VOP3 True16
including DPP and DPP8 in true16 and fake16 format.
This patch applies true16/fake16 changes and asm/dasm changes to
V_ADD_NC_U16
V_ADD_NC_I16
V_SUB_NC_U16
V_SUB_NC_I16
SDWA insts miss reverse opcode, which causes them to be treated as
commutable with default reverse opcode i.e. their own opcode. As a
result, SWDA F16 sub A, B and Sub B, A are merged by machine CSE. The
correct behavior is to merged sub A, B and subrev B, A instead of sub B,
A. This issues caused failures in rocFFT tests.
Another issue is that src0_sel and src1_sel are not swapped when SDWA
insts are commuted.
Verified that this fixes rocFFT tests failure.
Update VOP3 only instructions with true16 and fake16 formats.
This patch includes instructions:
V_MUL_LO_U16
V_MAX_U16
V_MAX_I16
V_MIN_U16
V_MIN_I16
V_LSHLREV_B16
V_LSHRREV_B16
V_ASHRREV_I16
Follow on to #81525 and #81901 in the series of consolidating bits in
TSFlags.
Remove renamedInGFX9 from SIInstrFormats.td and move to helper
function/macro in SIInstrInfo. renamedInGFX9 points to V_{add, sub,
subrev, addc, subb, subbrev}_ U32 and V_{div_fixup_F16, fma_F16,
interp_p2_F16, mad_F16, mad_U16, mad_I16}.
This is a large patch includes the MC level support for V_CVT_F16_F32,
V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.
This patch includes the asm/disasm changes to encode/decode the 16bit
vsrc, vdst and src modifieres for vop and dpp format. This patch is a
dependency for many 16 bit instructions while only three instructions
are updated to make it easier to review.
There will be another patch to support these three instructions in the
codeGen level, this patch just replaces these two instructions with its
fake16 format.
Support AND/OR/XOR true16 and LDEXP true/fake16 format.
These instructions are previously implemented with fake16 profile.
Fixing the implementation.
Added a RA hint so that when using 16bit register in a 32bit
instruction, try to use the register directly without an extra 16bit
move
---------
Co-authored-by: guochen2 <guochen2@amd.com>
Pre-enable this optimization before allowing folds of frame
indexes into add instructions. Disables this fold when using
scratch instructions for now. I see some code size improvements
with it, but the optimization needs to be smarter about the
uses depending on the register classes.
This patch copies the flag isConvergent from pseudo instructions to the
corresponding real instructions, so that isConvergent flag is also
defined for real instructions.
Flags are not required by the compiler, but for consistency it would be
nice to have them.
Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
The s_singleuse_vdst instruction is used to mark regions of instructions
that produce values that have only one use.
Certain instructions take more than one cycle to execute, resulting in
regions being incorrectly marked.
This patch excludes these multi-cycle instructions from being marked as
either producing single use values or consuming single use values
or both depending on the instruction.
AMDGPUMnemonicAlias is a MnemonicAlias that inherits from
GCNPredicateControl, so that we can set predicates on the alias the same
way as Instructions.
Use AssemblerPredicate instead of Requires on aliases
NFC.
…ng VOPC.
Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
NFC.
Test coverage on VOPC shows NotHasTrue16BitInsts on the pre-gfx11
instructions is necessary (we cannot use the default NoTrue16Predicate).
Update the VOP2 instructions in the same manner.
We usually expect to copy SubtargetPredicate (and OtherPredicates) from
pseudo to real. However, in dpp16 and dpp8, there are assignments like
SubtargetPredicate = HasDPP/HasDPP16/HasDpp8. These assignments override
predicates copied from pseudo, and thus the predicates used to define
pseudo get lost.
Losing predicates is a subtle issue usually not easy to be found. It may
result in instructions being generated on GPUs that do not support the
features to generate them.
https://github.com/llvm/llvm-project/pull/84354 addressed one of such
issues, and inspired this work.
Fortunately, we found that the assignment of SubtargetPredicate usually
comes together with assignment of AssemblerPredicate, and with the same
value. For example:
let AssemblerPredicate = HasDPP16;
let SubtargetPredicate = HasDPP16;
One of them is redundant and can be removed.
In this work, we remove the redundant assignment of SubtargetPredicate,
and then copy it from pseudo for VOP*_DPP and VOP*_DPP8. With this
change, we can safely use SubtargetPredicate to define pseudo
instructions.
Now that there is no special checking for valid DPP encodings, these
instructions can use the same DecoderNamespace as other 64- or 96-bit
instructions.
Also clean up setting DecoderNamespace: in most cases it should be set
as a pair with AssemblerPredicate.
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
64-bit SDWA encodings have to be checked first because their first 32
bits are a special case of the corresponding 32-bit non-SDWA encoding of
the same instruction. But all 64-bit encodings are checked first, so we
don't need special handling for SDWA.
Resolves AsmParser ambiguities, e.g., between V_DOT4C_I32_I8_dpp_vi and
V_DOT4C_I32_I8_dpp_gfx10. The latter is predicated with isGFX10Only
while the first has no subtarget generation predicates.
Part of <https://github.com/llvm/llvm-project/issues/69256>.