2274 Commits

Author SHA1 Message Date
Freddy Ye
f4509cf284
[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-04-11 10:18:29 +08:00
Peter Lafreniere
614a578034
[M68k] Add support for bitwise NOT instruction (#88049)
Currently the bitwise NOT instruction is not recognized. Add support for
using NOT on data registers. This is a partial implementation that puts
NOT at the same level of support as NEG currently enjoys.

Using not rather than eori cuts the length of the encoded instruction
in half or in thirds, leading to a reduction of 4-10 cycles per
instruction, on the original 68000.

This change includes tests for both bitwise and arithmetic negation.
2024-04-09 09:07:26 -07:00
Joe Nash
e29228efae
[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418)
Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus

The w32 and w64 _e64_dpp assembler only real instructions were unused,
and erroneously constructed in a way that bugged parsing of the new
instructions. They are removed.

This patch is a follow up to PR
https://github.com/llvm/llvm-project/pull/87382
2024-04-03 14:51:27 -04:00
Joe Nash
6a13bbf92f
[AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excludi… (#87382)
…ng VOPC.

Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
2024-04-03 11:34:12 -04:00
Cinhi Young
8b859c6e4a
[MIPS] Fix the opcode of max.fmt and mina.fmt (#85609)
- The opcode of the mina.fmt and max.fmt is documented wrong, the
  object code compiled from the same assembly with LLVM behaves
  differently than one compiled with GCC and Binutils.
- Modify the opcodes to match Binutils. The actual opcodes are as
follows:

  {5,3} | bits {2,0} of func
           |    ...   | 100  | 101    | 110   | 111
  -----+-----+-----+-----+-----+-----
   010  |   ...   |  min  | mina | max  | maxa
2024-04-03 10:14:02 +08:00
Freddy Ye
db7d243978
[X86][MC] Support enc/dec for IMULZU. (#86653)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-03-29 15:52:41 +08:00
Joe Nash
44278f2326 [AMDGPU][MC] Fix GFX12 check line typo and move test
NFC.
Fix CHECK lines that seem to have a copy paste error.
Move the test that was formerly in gfx12_dasm_vinterp.txt (see #85949).
2024-03-21 10:46:07 -04:00
Joe Nash
d1f182c895
[AMDGPU][MC][True16] Rename and combine VINTERP MC tests (#85949)
NFC.
gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the
assembler and disassembler tests to be sorted based on real16 or fake16
instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16
(fake16 in encoding, but not by name) upstream, so that is why the test
files have a -fake16 suffix.

One test input is changed, and that is the disassembler test for
unsupported bits in the instruction. It is now an input that is valid on
both GFX11 and GFX12. This was necessary because the size of the opcode
field changed.
2024-03-21 10:42:39 -04:00
XinWang10
7b766a6f50
[X86] Support APX CMOV/CFCMOV instructions (#82592)
This patch support ND CMOV instructions and CFCMOV instructions.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-03-17 20:18:56 +08:00
Stanislav Mekhanoshin
0b0e52836d
[AMDGPU] Fix GFX11 sendmsg codes (#85299)
The code MSG_RTN_GET_TBA_TO_PC was missing, and the next code is off by
1 as a result.
2024-03-15 09:46:58 -07:00
Jay Foad
36dece0013
[AMDGPU] Add missing GFX10 buffer format d16 hi instructions (#84809) 2024-03-12 08:20:08 +00:00
Shengchen Kan
71590e7d1e [X86][test] Add missing enc/dec tests for CTEST
These tests were accidentally missed in #83863
2024-03-12 13:11:16 +08:00
Jay Foad
212604698c
[AMDGPU] Add missing tests for GFX10 (t)buffer format d16 instructions (#84789) 2024-03-11 18:25:49 +00:00
Sivan Shani
5e688f0dbd [llvm][arm] add T1 and T2 assembly options for vlldm and vlstm
Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

T1 allow for an optional registers list,
the register list must be {d0-d15}.
T2 define a mandatory register list,
the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-03-11 14:27:28 +00:00
Shengchen Kan
1ca8092e87
[X86][MC] Support encoding/decoding for APX CCMP/CTEST (#83863)
APX assembly syntax recommendations:
  https://cdrdv2.intel.com/v1/dl/getContent/817241

NOTE:
The change in llvm/tools/llvm-exegesis/lib/X86/Target.cpp is for test
LLVM ::
tools/llvm-exegesis/X86/latency/latency-SETCCr-cond-codes-sweep.s

For `SETcc`, llvm-exegesis would randomly choose 1 other instruction to
test with `SETcc`, after selecting the instruction, llvm-exegesis would
check if the operand is initialized and valid, if not
`randomizeTargetMCOperand` would choose a value for invalid operand, it
misses support for condition code operand, which cause the flaky failure
after `CCMP` supported.

llvm-exegesis can choose `CCMP` without specifying ccmp feature b/c it
use `MCSubtarget` and only16/32/64 bit is considered.
llvm-exegesis doesn't choose other instructions b/c requirement in
`hasAliasingRegistersThrough`: the instruction should use GPR (defined
by `SETcc`) and define `EFLAGS` (used by `SETcc`).
2024-03-08 20:54:33 +08:00
Joe Nash
3b1512c477 [AMDGPU] Make gfx11 vop2 disassembler tests use strict-whitespace
NFC.
Adds -strict-whitespace to RUN lines and adjusts CHECK line space
padding accordingly.

See also  (#84078)
2024-03-06 16:07:30 -05:00
Joe Nash
f448b8ec03
[AMDGPU] Make gfx11 vop1 disassembler tests use strict-whitespace (#84078)
NFC.
The whitespace needs to be consistently formatted in some manner. Might
as well use -strict-whitespace as the standard.
Adds -strict-whitespace to RUN lines and adjust CHECK line space padding
accordingly.

Also test REAL16 and FAKE16 CHECK lines with wave64.
2024-03-06 10:11:10 -05:00
Ivan Kosarev
a888f5e4d7
[AMDGPU][NFC] Update tests to use -triple= instead of -arch=. (#84153) 2024-03-06 12:44:19 +00:00
Tomas Matheson
03420f570e Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)"
This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

Failing EXPENSIVE_CHECKS builds with "undefined physical register".
2024-02-29 09:48:29 +00:00
XinWang10
ffa48f0c94
[X86][MC] Teach disassembler to recognize apx instructions which ignores W bit (#82747)
Extended VMX instructions and 8 bit apx extended instructions don't need
W bit, they are marked as W ignored in spec.
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-02-29 11:44:41 +08:00
SivanShani-Arm
634b0243b8
[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)
T1 allows for an optional registers list, the register list must be {d0-d15}.
T2 defines a mandatory register list, the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-02-28 17:02:51 +00:00
Timothy Herchen
ae91a427ac
[X86][MC] Reject out-of-range control and debug registers encoded with APX (#82584)
Fixes #82557. APX specification states that the high bits found in REX2
used to encode GPRs can also be used to encode control and debug
registers, although all of them will #UD. Therefore, when disassembling
we reject attempts to create control or debug registers with a value of
16 or more.

See page 22 of the
[specification](https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html):

> Note that the R, X and B register identifiers can also address non-GPR
register types, such as vector registers, control registers and debug
registers. When any of them does, the highest-order bits REX2.R4,
REX2.X4 or REX2.B4 are generally ignored, except when the register being
addressed is a control or debug register. [...] The exception is that
REX2.R4 and REX2.R3 [*sic*] are not ignored when the R register
identifier addresses a control or debug register. Furthermore, if any
attempt is made to access a non-existent control register (CR*) or debug
register (DR*) using the REX2 prefix and one of the following
instructions:
“MOV CR*, r64”, “MOV r64, CR*”, “MOV DR*, r64”, “MOV r64, DR*”. #UD is
raised.

The invalid encodings are 64-bit only because `0xd5` is a valid
instruction in 32-bit mode.
2024-02-23 12:49:05 -08:00
Stanislav Mekhanoshin
3dfca24dda
[AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710)
The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match
SP3 and does not set op_sel_hi bits properly.
2024-02-23 03:50:00 -08:00
John Brawn
48101edc8d
[AArch64] Fix syntax of gcsstr and gcssttr instructions (#82385)
The address register should be surrounded by square brackets, like in
all the other str instructions.

Fixes https://github.com/llvm/llvm-project/issues/81846
2024-02-21 10:05:50 +00:00
Stanislav Mekhanoshin
98db8d0cb7
[AMDGPU] Fix v_dot2_f16_f16/v_dot2_bf16_bf16 operands (#82423)
src0 and src1 are packed f16/bf16, we are printing literals like
0x40002000, but we cannot parse it.
2024-02-20 16:34:40 -08:00
Shilei Tian
2ad43fa467
[AMDGPU] Fix operand types for V_DOT2_F32_BF16 (#82044) 2024-02-20 08:25:01 -05:00
Stanislav Mekhanoshin
13e64958a0
[AMDGPU] Fix decoder for BF16 inline constants (#82276)
Fix #82039.
2024-02-19 13:45:23 -08:00
Ivan Kosarev
0ec524b120
[AMDGPU][MC][True16] Support V_RCP/SQRT/RSQ/LOG/EXP_F16. (#81131)
[AMDGPU][MC][True16] Support V_RCP/SQRT/RSQ/LOG/EXP_F16.

Also add missing v_ceil/floor_f16 tests.

Includes https://github.com/llvm/llvm-project/pull/80892.
2024-02-19 15:50:48 +00:00
Shilei Tian
46734aa1e5
[AMDGPU] Use bf16 instead of i16 for bfloat (#80908)
Currently we generally use `i16` to represent `bf16` in those tablegen
files. This patch is trying to use `bf16` directly.

Fix #79369.
2024-02-16 15:58:30 -05:00
Jay Foad
cb8f910035
[AMDGPU] Do not test both wave sizes for DSDIR disassembly (#81719)
There is nothing in these instruction definitions that depends on wave
size so testing both seems like overkill. The corresponding assembler
tests do not do it.
2024-02-14 10:15:06 +00:00
Konstantin Zhuravlyov
cf55e61dd9
AMDGPU: Don't allow s_barrier on gfx12 (#81317)
- s_barrier is not present on gfx12
2024-02-12 11:32:46 -05:00
Philipp Tomsich
fbba818a78
[AArch64] Add the Ampere1B core (#81297)
The Ampere1B is Ampere's third-generation core implementing a
superscalar, out-of-order microarchitecture with nested virtualization,
speculative side-channel mitigation and architectural support for
defense against ROP/JOP style software attacks.

Ampere1B is an ARMv8.7+ implementation, adding support for the FEAT
WFxT, FEAT CSSC, FEAT PAN3 and FEAT AFP extensions. It also includes all
features of the second-generation Ampere1A, such as the Memory Tagging
Extension and SM3/SM4 cryptography instructions.
2024-02-09 15:22:09 -08:00
Ivan Kosarev
7d19dc50de
[AMDGPU][True16] Support VOP3 source DPP operands. (#80892) 2024-02-08 16:23:00 +00:00
XinWang10
d9e875dcc1
[X86][MC] Support encoding/decoding for APX variant LZCNT/TZCNT/POPCNT instructions (#79954)
Two variants: promoted legacy, NF (no flags update).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-31 21:10:02 +08:00
Rageking8
5f4c89edd0
Fix unsigned typos (#76670) 2024-01-27 22:20:08 -08:00
Shengchen Kan
8a4cb7b607 [X86][test] Add MRM7r/MRM7m entries in evex format enc/dec tests 2024-01-26 17:22:07 +08:00
XinWang10
02d56801ee
[X86] Support APX promoted RAO-INT and MOVBE instructions (#77431)
R16-R31 was added into GPRs in
https://github.com/llvm/llvm-project/pull/70958,
This patch supports the promoted RAO-INT and MOVBE instructions in EVEX
space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-26 14:33:45 +08:00
XinWang10
6d0080b5de
[X86] Support promoted ENQCMD, KEYLOCKER and USERMSR (#77293)
R16-R31 was added into GPRs in
https://github.com/llvm/llvm-project/pull/70958,
This patch supports the promoted ENQCMD, KEYLOCKER and USER-MSR
instructions in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-26 14:24:43 +08:00
XinWang10
816cc9d24b
[X86][MC] Support Enc/Dec for NF BMI instructions (#76709)
Promoted BMI instructions were supported in #73899
2024-01-25 10:33:14 +08:00
ostannard
5469010ba7
[AArch64] FP/SIMD is not mandatory for v8-R (#79004)
The FP/SIMD instructions are optional for v8-R, so they should not be
marked as a dependency of HasV8_0rOps. This had the effect of disabling
some v8R-specific system registers when any of these features was
disabled.

I've moved these features to be enabled by default for Cortex-R82
(currently the only v8-R AArch64 core), matching the previous behavior,
and clang's default.

Based on a patch by Simi Pallipurath <simi.pallipurath@arm.com>
2024-01-24 13:12:03 +00:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Ivan Kosarev
5a458767dd
[AMDGPU][True16] Support source DPP operands. (#79025) 2024-01-23 09:52:49 +00:00
Shengchen Kan
5c68c6d70f
[X86] Support encoding/decoding and lowering for APX variant SHL/SHR/SAR/ROL/ROR/RCL/RCR/SHLD/SHRD (#78853)
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-23 10:23:27 +08:00
Stanislav Mekhanoshin
1000cefc04
[AMDGPU] Remove s_set_inst_prefetch_distance support from GFX12 (#78786)
This instruction is not supported by GFX12.
2024-01-22 14:31:17 -08:00
XinWang10
d3cd1ce6ab
[X86] Add lowering tests for promoted CMPCCXADD and update CC representation (#78685)
https://github.com/llvm/llvm-project/pull/76125 supported the enc/dec
for CMPCCXADD instructions, this patch
1. Add lowering test for promoted CMPCCXADD
2. Update the representation of condition code for promoted CMPCCXADD to
align with the existing one
2024-01-22 11:32:03 +08:00
Mariusz Sikora
2c78f3b860
[AMDGPU][GFX12] Add tests for flat_atomic_pk (#78683) 2024-01-19 12:08:17 +01:00
XinWang10
d124b02242
[X86][MC] Fix wrong encoding of promoted BMI instructions due to missing NoCD8 (#78386)
Address review comments in #76709

Add `NoCD8` to class `ITy`, and rewrite the promoted instructions with
`ITy` to avoid unexpected incorrect encoding about `NoCD8`.
2024-01-19 00:27:16 +08:00
Piotr Sobczak
57f6a3f7ea
[AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.

* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
2024-01-18 15:14:42 +01:00
Mariusz Sikora
3e6589f21c
[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917)
- image_atomic_pk_add_f16
- image_atomic_pk_add_bf16
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16
- buffer_atomic_pk_add_bf16
2024-01-18 14:01:09 +01:00