Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus
The w32 and w64 _e64_dpp assembler only real instructions were unused,
and erroneously constructed in a way that bugged parsing of the new
instructions. They are removed.
This patch is a follow up to PR
https://github.com/llvm/llvm-project/pull/87382
…ng VOPC.
Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
NFC.
gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the
assembler and disassembler tests to be sorted based on real16 or fake16
instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16
(fake16 in encoding, but not by name) upstream, so that is why the test
files have a -fake16 suffix.
One test input is changed, and that is the disassembler test for
unsupported bits in the instruction. It is now an input that is valid on
both GFX11 and GFX12. This was necessary because the size of the opcode
field changed.
NFC.
The whitespace needs to be consistently formatted in some manner. Might
as well use -strict-whitespace as the standard.
Adds -strict-whitespace to RUN lines and adjust CHECK line space padding
accordingly.
Also test REAL16 and FAKE16 CHECK lines with wave64.
There is nothing in these instruction definitions that depends on wave
size so testing both seems like overkill. The corresponding assembler
tests do not do it.
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
Endoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).
---------
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.
Source DPP and desitnation VOPDstOperand_t16 operands are still not
supported and will be addressed separately.
GFX12 has separate wait instructions per counter e.g. S_WAIT_LOADCNT.
S_WAITCNT still exists but is deprecated and codegen should stop using
it. S_WAITCNT_* (e.g. S_WAITCNT_VSCNT) are removed.
This patch adds/removes MC layer support for these instructions.
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and lead to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.
Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.
Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.
Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.
Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable to use of opsel on src2 as a future optimization.
A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.
Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)