A processor definition shall not include a default wave size feature that
would have to be switched off when a different wave size is selected. This
avoids having to write -mattr=-wavefrontsize32,+wavefrontsize64 in tests.
These mostly check for various reserved bits being set. The diagnostics
for GPU-dependent reserved bits carry a bit more context, since they seem the
most likely ones to be observed in practice.
This commit also improves the error handling mechanism for
MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter
that was simply ignored by llvm-objdump; now it returns errors using
Expected<T>.
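As a rough illustration of the new error flow (the names below are invented
stand-ins, not the exact MCDisassembler signatures):

    #include "llvm/Support/Error.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    // Stand-in for the new-style hook: failure is an Error the caller must
    // consume, instead of text dumped to an often-ignored comment stream.
    static Expected<bool> onSymbolStartSketch(bool BadDescriptor) {
      if (BadDescriptor)
        return createStringError(inconvertibleErrorCode(),
                                 "invalid kernel descriptor");
      return true; // the symbol start was handled
    }

    static bool caller() {
      Expected<bool> Handled = onSymbolStartSketch(false);
      if (!Handled) {
        // llvm-objdump can now report this instead of dropping it.
        logAllUnhandledErrors(Handled.takeError(), errs(), "warning: ");
        return false;
      }
      return *Handled;
    }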
Speed up disassembly by only calling tryDecodeInst for DecoderTables
that make sense for the current subtarget.
This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my
Release+Asserts build.
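The gist, as a hedged sketch (the real code keys off subtarget features
rather than this invented predicate):

    #include <vector>

    // Invented stand-ins for a decoder table and its applicability test.
    struct TableSketch {
      bool (*AppliesTo)(unsigned Gen);
      unsigned Id;
    };
    static bool tryDecodeInstSketch(unsigned TableId) {
      (void)TableId; // stand-in for the real per-table decode attempt
      return false;
    }

    static bool decodeSketch(const std::vector<TableSketch> &Tables,
                             unsigned Gen) {
      for (const TableSketch &T : Tables) {
        if (!T.AppliesTo(Gen))
          continue; // previously tryDecodeInst ran even on hopeless tables
        if (tryDecodeInstSketch(T.Id))
          return true;
      }
      return false;
    }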
Remove all the code that set and tested Res. Change all convert*
functions to return void, since none of them can fail. getInstruction
now has only one main point of failure, after all calls to tryDecodeInst
have failed.
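Schematically, the resulting shape (names invented):

    enum class StatusSketch { Success, Fail };

    // convert* helpers now return void: none of them can fail, so there
    // is no Res value to set and re-test after each call.
    static void convertInstSketch() {}

    static StatusSketch getInstructionSketch(bool AnyTableMatched) {
      if (AnyTableMatched) {
        convertInstSketch();
        return StatusSketch::Success;
      }
      // The one main point of failure: every tryDecodeInst call failed.
      return StatusSketch::Fail;
    }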
Now that there is no special checking for valid DPP encodings, these
instructions can use the same DecoderNamespace as other 64- or 96-bit
instructions.
Also clean up setting DecoderNamespace: in most cases it should be set
as a pair with AssemblerPredicate.
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
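A hedged sketch of what a dedicated Dpp8FI decoder buys (the special src0
values below are placeholders, not the documented encodings):

    #include <cstdint>
    #include <optional>

    // For DPP8, FI is conveyed by two special values in the src0 field;
    // any other src0 value means this is not a valid DPP8 FI operand.
    constexpr uint32_t Dpp8FiOff = 0xE9; // placeholder special value
    constexpr uint32_t Dpp8FiOn  = 0xEA; // placeholder special value

    static std::optional<unsigned> decodeDpp8FISketch(uint32_t Src0) {
      if (Src0 == Dpp8FiOff) return 0;
      if (Src0 == Dpp8FiOn)  return 1;
      return std::nullopt; // reject here; no post hoc isValidDPP8 pass
    }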
64-bit SDWA encodings have to be checked first because their first 32
bits are a special case of the corresponding 32-bit non-SDWA encoding of
the same instruction. However, all 64-bit encodings are now checked first,
so we no longer need special handling for SDWA.
AMDGPUDisassembler::getInstruction tries decoding instructions using
different DecoderTables in a confusing order: first 96-bit instructions,
then some 64-bit, then 32-bit, then some more 64-bit.
This patch changes it to always try longer encodings first. The
motivation is to make getInstruction easier to understand, and to pave
the way for combining some 64-bit tables that do not need to be
separate.
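In outline, assuming an invented per-width helper:

    static bool tryDecodeWidthSketch(unsigned BitWidth) {
      (void)BitWidth; // stand-in for the per-width decoder tables
      return false;
    }

    static bool decodeLongestFirstSketch() {
      // Longer encodings first: a 64-bit instruction whose first 32 bits
      // alias a valid 32-bit encoding (the SDWA case above) must be
      // matched before the shorter interpretation is tried.
      for (unsigned Width : {96u, 64u, 32u})
        if (tryDecodeWidthSketch(Width))
          return true;
      return false;
    }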
Set DecoderNamespace and AssemblerPredicate in the base class for Real
instructions for each subtarget. This avoids some ad hoc "let" around
groups of instruction definitions, and fixes some missed cases like
BUFFER_GL0_INV_gfx10 which was missing DecoderNamespace.
For wave64 WMMA instructions, putting W64 in the DecoderNamespace is
more descriptive than WMMA, and matches other uses for GFX12
GLOBAL_LOAD_TR instructions.
Add a new assembler directive named '.amdhsa_code_object_version'. This
directive sets e_ident[ABIVERSION] in the ELF header, and its value is used
as the assumed COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
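The new precedence, as a hypothetical sketch (helper and parameter names are
illustrative; the real helpers live in AMDGPUBaseInfo.h):

    #include <optional>

    static unsigned resolveCOVSketch(std::optional<unsigned> DirectiveOrIR,
                                     std::optional<unsigned> CLFlag,
                                     unsigned Default) {
      if (DirectiveOrIR)        // .amdhsa_code_object_version or IR flag
        return *DirectiveOrIR;  // now wins over the command-line flag
      if (CLFlag)               // --amdhsa-code-object-version
        return *CLFlag;         // demoted to a fallback
      return Default;
    }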
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and led to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.
Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.
Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used, as shown by
tests.
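Roughly, the permutation idea looks like this (invented helper; the real
code also has to account for op_sel modifiers):

    #include <cstdint>
    #include <optional>

    static bool isInlineConstantSketch(uint32_t V) {
      return V == 0; // placeholder for the real inline-constant check
    }

    static std::optional<uint32_t> tryPackedPermsSketch(uint32_t Packed) {
      uint32_t Lo = Packed & 0xFFFFu, Hi = Packed >> 16;
      const uint32_t Candidates[] = {
          Packed,          // as-is
          (Lo << 16) | Hi, // halves swapped
          (Lo << 16) | Lo, // splat of the low half
          (Hi << 16) | Hi, // splat of the high half
      };
      for (uint32_t C : Candidates)
        if (isInlineConstantSketch(C))
          return C; // a permutation that encodes as an inline constant
      return std::nullopt;
    }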
Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.
Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable the use of opsel on src2 as a future optimization.
A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.
Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
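For example:

    #include "llvm/ADT/StringRef.h"

    static bool isGCNTriple(llvm::StringRef T) {
      // was: T.startswith("amdgcn")
      return T.starts_with("amdgcn"); // mirrors std::string_view in C++20
    }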
In GFX12 the exp instruction is renamed to export, but exp is still
accepted as an alias.
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
- Be explicit about which program resource register is supported by
  which target:
  - RSRC1:
    - FP16_OVFL is GFX9+
    - WGP_MODE is GFX10+
    - MEM_ORDERED is GFX10+
    - FWD_PROGRESS is GFX10+
  - RSRC3:
    - INST_PREF_SIZE is GFX11+
    - TRAP_ON_START is GFX11+
    - TRAP_ON_END is GFX11+
    - IMAGE_OP is GFX11+
- Do not emit GFX11+ fields when disassembling GFX10 code objects
- Tighten enforcement of reserved bits in the disassembler (see the sketch
  after this list)
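A hedged sketch of the per-target gating (helper names invented; field
availability as listed above):

    struct TargetSketch { unsigned Gfx; }; // e.g. 9, 10, or 11

    // RSRC1 fields
    static bool hasFP16Ovfl(TargetSketch T)     { return T.Gfx >= 9; }
    static bool hasWGPMode(TargetSketch T)      { return T.Gfx >= 10; }
    // RSRC3 fields
    static bool hasInstPrefSize(TargetSketch T) { return T.Gfx >= 11; }
    static bool hasImageOp(TargetSketch T)      { return T.Gfx >= 11; }

    // Disassembling a GFX10 object: GFX11+ fields are never printed, and
    // a set reserved bit is diagnosed rather than silently ignored.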
Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
A 64-bit literal can be used as a 32-bit zero- or sign-extended operand.
In the case of a double, zeroes are added to the low 32 bits. Currently
the asm parser stores only the high 32 bits of a double in an operand. To
support codegen as requested in
https://github.com/llvm/llvm-project/issues/67781, we need to change the
representation to store the full 64-bit value, so that codegen can simply
add immediates to an instruction.
There is some code to support compatibility with existing tests and asm
kernels: we still allow short hex strings that represent only the high 32
bits of a double value as a valid literal.
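To illustrate (helper names invented):

    #include <cstdint>
    #include <cstring>

    // Store the full 64-bit pattern so codegen can attach the literal
    // directly to the instruction.
    static uint64_t encodeDoubleLiteralSketch(double D) {
      uint64_t Bits;
      std::memcpy(&Bits, &D, sizeof(Bits));
      return Bits; // previously only the high 32 bits were kept
    }

    // Compatibility: a short hex string still denotes the high 32 bits of
    // a double, with zeroes implied in the low half.
    static uint64_t widenShortHexSketch(uint32_t High32) {
      return uint64_t(High32) << 32;
    }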
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
support::endianness::{big,little,native} with
llvm::endianness::{big,little,native}.
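For example:

    #include "llvm/ADT/bit.h"

    constexpr bool IsLittleEndianHost =
        llvm::endianness::native == llvm::endianness::little;
    // was: llvm::support::endianness::native == ...::little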
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.
Analysis of failures on buildbots with expensive checks enabled showed
that the problem was triggered by changes in another commit,
469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug
addressed in #67245.