Currently the bitwise NOT instruction is not recognized. Add support for
using NOT on data registers. This is a partial implementation that puts
NOT at the same level of support as NEG currently enjoys.
Using not rather than eori cuts the length of the encoded instruction
in half or in thirds, leading to a reduction of 4-10 cycles per
instruction, on the original 68000.
This change includes tests for both bitwise and arithmetic negation.
Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus
The w32 and w64 _e64_dpp assembler only real instructions were unused,
and erroneously constructed in a way that bugged parsing of the new
instructions. They are removed.
This patch is a follow up to PR
https://github.com/llvm/llvm-project/pull/87382
…ng VOPC.
Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
- The opcode of the mina.fmt and max.fmt is documented wrong, the
object code compiled from the same assembly with LLVM behaves
differently than one compiled with GCC and Binutils.
- Modify the opcodes to match Binutils. The actual opcodes are as
follows:
{5,3} | bits {2,0} of func
| ... | 100 | 101 | 110 | 111
-----+-----+-----+-----+-----+-----
010 | ... | min | mina | max | maxa
NFC.
gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the
assembler and disassembler tests to be sorted based on real16 or fake16
instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16
(fake16 in encoding, but not by name) upstream, so that is why the test
files have a -fake16 suffix.
One test input is changed, and that is the disassembler test for
unsupported bits in the instruction. It is now an input that is valid on
both GFX11 and GFX12. This was necessary because the size of the opcode
field changed.
Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83.
T1 allow for an optional registers list,
the register list must be {d0-d15}.
T2 define a mandatory register list,
the register list must be {d0-d31}.
The requirements for T1/T2 are as follows:
T1 T2
Require: v8-M.Main, v8.1-M.Main,
secure state secure state
16 D Regs valid valid
32 D Regs UNDEFINED valid
No D Regs NOP NOP
APX assembly syntax recommendations:
https://cdrdv2.intel.com/v1/dl/getContent/817241
NOTE:
The change in llvm/tools/llvm-exegesis/lib/X86/Target.cpp is for test
LLVM ::
tools/llvm-exegesis/X86/latency/latency-SETCCr-cond-codes-sweep.s
For `SETcc`, llvm-exegesis would randomly choose 1 other instruction to
test with `SETcc`, after selecting the instruction, llvm-exegesis would
check if the operand is initialized and valid, if not
`randomizeTargetMCOperand` would choose a value for invalid operand, it
misses support for condition code operand, which cause the flaky failure
after `CCMP` supported.
llvm-exegesis can choose `CCMP` without specifying ccmp feature b/c it
use `MCSubtarget` and only16/32/64 bit is considered.
llvm-exegesis doesn't choose other instructions b/c requirement in
`hasAliasingRegistersThrough`: the instruction should use GPR (defined
by `SETcc`) and define `EFLAGS` (used by `SETcc`).
NFC.
The whitespace needs to be consistently formatted in some manner. Might
as well use -strict-whitespace as the standard.
Adds -strict-whitespace to RUN lines and adjust CHECK line space padding
accordingly.
Also test REAL16 and FAKE16 CHECK lines with wave64.
T1 allows for an optional registers list, the register list must be {d0-d15}.
T2 defines a mandatory register list, the register list must be {d0-d31}.
The requirements for T1/T2 are as follows:
T1 T2
Require: v8-M.Main, v8.1-M.Main,
secure state secure state
16 D Regs valid valid
32 D Regs UNDEFINED valid
No D Regs NOP NOP
Fixes#82557. APX specification states that the high bits found in REX2
used to encode GPRs can also be used to encode control and debug
registers, although all of them will #UD. Therefore, when disassembling
we reject attempts to create control or debug registers with a value of
16 or more.
See page 22 of the
[specification](https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html):
> Note that the R, X and B register identifiers can also address non-GPR
register types, such as vector registers, control registers and debug
registers. When any of them does, the highest-order bits REX2.R4,
REX2.X4 or REX2.B4 are generally ignored, except when the register being
addressed is a control or debug register. [...] The exception is that
REX2.R4 and REX2.R3 [*sic*] are not ignored when the R register
identifier addresses a control or debug register. Furthermore, if any
attempt is made to access a non-existent control register (CR*) or debug
register (DR*) using the REX2 prefix and one of the following
instructions:
“MOV CR*, r64”, “MOV r64, CR*”, “MOV DR*, r64”, “MOV r64, DR*”. #UD is
raised.
The invalid encodings are 64-bit only because `0xd5` is a valid
instruction in 32-bit mode.
There is nothing in these instruction definitions that depends on wave
size so testing both seems like overkill. The corresponding assembler
tests do not do it.
The Ampere1B is Ampere's third-generation core implementing a
superscalar, out-of-order microarchitecture with nested virtualization,
speculative side-channel mitigation and architectural support for
defense against ROP/JOP style software attacks.
Ampere1B is an ARMv8.7+ implementation, adding support for the FEAT
WFxT, FEAT CSSC, FEAT PAN3 and FEAT AFP extensions. It also includes all
features of the second-generation Ampere1A, such as the Memory Tagging
Extension and SM3/SM4 cryptography instructions.
The FP/SIMD instructions are optional for v8-R, so they should not be
marked as a dependency of HasV8_0rOps. This had the effect of disabling
some v8R-specific system registers when any of these features was
disabled.
I've moved these features to be enabled by default for Cortex-R82
(currently the only v8-R AArch64 core), matching the previous behavior,
and clang's default.
Based on a patch by Simi Pallipurath <simi.pallipurath@arm.com>
https://github.com/llvm/llvm-project/pull/76125 supported the enc/dec
for CMPCCXADD instructions, this patch
1. Add lowering test for promoted CMPCCXADD
2. Update the representation of condition code for promoted CMPCCXADD to
align with the existing one
Address review comments in #76709
Add `NoCD8` to class `ITy`, and rewrite the promoted instructions with
`ITy` to avoid unexpected incorrect encoding about `NoCD8`.
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*