…ng VOPC.
Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
Rename getNumVGPRBlocks to getEncodedNumVGPRBlocks, to clarify that it's
using the encoding granule. This is used to program the hardware. In
practice, the hardware will use the alloc granule instead, so this patch
also adds a new helper, getAllocatedNumVGPRBlocks, which can be useful
when driving heuristics.
Inline constant 1/(2*pi) has the truncated value 0x3e22. According to
the spec it is not rounded. A bf16 value in a nutshall is a fp32 value
with cleared 16 bites of mantissa. The value 0x3e22 converted to fp32 is
0.158203125 and the next representable value 0x3e23 means 0.1591796875.
The fp32 value of 1/(2*pi) = 0.15915494 cannot be represented in bf16.
Although since bf16 values are essentailly truncated fp32 values we can
use 0.15915494 as an idiomatic representation of 1/(2*pi) inline
constant. This is also consistent with sp3 behaviour. The patch fixes
the problem that value we are printing for inv2pi inline constant is not
parsed as inv2pi by the asm parser and gets rounded.
- AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags
- Moved to helper function in SIInstInfo
- AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags
- This flag did not make sense / remnants of something else I think
Currently the asm parser takes `v_writelane_b32 v1, s13, m0` as illegal
instruction for pre-gfx11 because it uses two constant buses while the
hardware
can only allow one. However, based on the comment of
`AMDGPUInstructionSelector::selectWritelane`,
it is allowed to have M0 as lane selector and a SGPR used as SRC0
because the
lane selector doesn't count as a use of constant bus. In fact, codegen
can already
generate this form, but this inconsistency is not exposed because the
validation
of constant bus limitation only happens when paring an assembly but we
don't have
a test case when both SGPR and M0 used as source operands for the
instruction.
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
Endoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).
---------
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.
Source DPP and desitnation VOPDstOperand_t16 operands are still not
supported and will be addressed separately.
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and lead to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.
Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.
Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.
Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.
Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable to use of opsel on src2 as a future optimization.
A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.
Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
CPol and CPol_GLC1 operand classes have identical predicates, which
means AsmParser cannot differentiate between the RTN and non-RTN
variants of the instructions. When it currently selects the wrong
instruction, a hack in cvtSMEMAtomic() corrects the op-code. Using the
new predicated-value operands makes this hack and the whole conversion
function not needed.
Other uses of CPol_GLC1 operands are to be addressed separately.
Resolves about half of the remaining ~1000 pairs of ambiguous
instructions.
Part of <https://github.com/llvm/llvm-project/issues/69256>.
- Be explicit about which program resource register is supported by
which target
- RSRC1
- FP16_OVFL is GFX9+
- WGP_MODE is GFX10+
- MEM_ORDERED is GFX10+
- FWD_PROGRESS is GFX10+
- RSRC3
- INST_PREF_SIZE is GFX11+
- TRAP_ON_START is GFX11+
- TRAP_ON_END is GFX11+
- IMAGE_OP is GFX11+
- Do not emit GFX11+ fields when disassembling GFX10 code objects
- Tighten enforcement of reserved bits in disassembler
---------
Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
A 64-bit literal can be used as a 32-bit zero or sign extended operand.
In case of double zeroes are added to the low 32 bits. Currently asm
parser stores only high 32 bits of a double into an operand. To support
codegen as requested by the
https://github.com/llvm/llvm-project/issues/67781 we need to change the
representation to store a full 64-bit value so that codegen can simply
add immediates to an instruction.
There is some code to support compatibility with existing tests and asm
kernels. We allow to use short hex strings to represent only a high 32
bit of a double value as a valid literal.
A literal can be optionally wrapped in a lit() expression. This does not
force literal encoding for an inline immediate, but neither does SP3.
The syntax is dummy for compatibility only.