2664 Commits

Author SHA1 Message Date
Gang Chen
ef68d1587d
[AMDGPU] upstream barrier count reporting part1 (#154409) 2025-08-19 16:42:31 -07:00
Stanislav Mekhanoshin
80d430df5d
[AMDGPU] Add MSG_SAVEWAVE_HAS_TDM on gfx1250 (#153483) 2025-08-13 23:01:50 -07:00
Stanislav Mekhanoshin
fc911fe928
[AMDGPU] Add HW_REG_IB_STS2 on gfx1250 (#153479) 2025-08-13 23:01:28 -07:00
Jonathan Thackray
7bd0c5fa66
[AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (#146331)
This is a series of patches (4/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests whose .s tests have functions
 * makes the .s tests have a roundabout run line to test both encoding
   and assembly

See also #146328, #146329 and #146330.

Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
2025-08-13 14:40:41 +00:00
Jonathan Thackray
8453f205eb
[AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (#146330)
This is a series of patches (3/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests which have multiple feature dependencies
 * makes the .s tests have a roundabout run line to test both encoding
   and assembly
 * creates diagnostic tests when needed

See also #146328, #146329 and #146331.

Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
2025-08-13 14:07:28 +00:00
Jonathan Thackray
b878793739
[AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (#146329)
This is a series of patches (2/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests which have only one feature required
 * makes the .s tests have a roundabout run line to test both encoding
   and assembly
 * creates diagnostic tests when needed
 * fixes naming convention of tests
 
See also #146328, #146330 and #146331.

Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
2025-08-13 13:39:45 +00:00
Jonathan Thackray
69452d50ce
[AArch64][llvm] Unify AArch64 tests into a single file (1/4) (NFC) (#146328)
This is a series of patches (1/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * unifies errorless .s and .txt tests into a single file
 * removes .txt tests which don't have feature requirements
 * makes the .s tests have a roundabout run line to test both encoding
   and assembly
 
See also #146329, #146330 and #146331.

---------

Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
2025-08-13 13:45:25 +01:00
Stanislav Mekhanoshin
d0ee82040c
[AMDGPU] Add s_barrier_init|join|leave instructions (#153296) 2025-08-12 15:07:07 -07:00
Sam Elliott
4e11f89904
[RISCV] Basic Objdump Mapping Symbol Support (#151452)
This implements very basic support for RISC-V mapping symbols in
llvm-objdump, sharing the implementation that Arm/AArch64/CSKY already
use for this feature.

This only supports the `$x` (instruction) and `$d` (data) mapping
symbols for RISC-V, and not the version of `$x` which includes an
architecture string suffix.
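
As a rough illustration of what the two plain mapping symbols mean for a disassembler, the sketch below labels an address as code or data from the nearest preceding `$x` or `$d`. It is not the llvm-objdump implementation; `region_kind`, the symbol list, and the choice to treat bytes before the first mapping symbol as code are all assumptions made for the example.

```python
from bisect import bisect_right

# Only the two plain mapping symbols from the commit message are recognised.
KINDS = {"$x": "code", "$d": "data"}

def region_kind(mapping_symbols, address, default="code"):
    """Return "code" or "data" for `address`.

    `mapping_symbols` is a list of (address, name) pairs. Any name other
    than exactly "$x" or "$d" (for example "$x" followed by an architecture
    string) is ignored in this sketch. `default` is a hypothetical fallback
    for addresses that precede every mapping symbol.
    """
    marks = sorted((addr, KINDS[name]) for addr, name in mapping_symbols
                   if name in KINDS)
    starts = [addr for addr, _ in marks]
    # Index of the last mapping symbol at or before `address`.
    i = bisect_right(starts, address)
    return marks[i - 1][1] if i else default

# Hypothetical symbol table: code, then a data island, then code again.
symbols = [(0x0, "$x"), (0x8, "$d"), (0x10, "$x"), (0x20, "$xrv64gc")]
```

Here `region_kind(symbols, 0xC)` falls inside the data island, and the `$xrv64gc` entry has no effect because the suffixed form is skipped.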
2025-08-07 11:28:07 -07:00
Stanislav Mekhanoshin
b296ea9c14
[AMDGPU] s_get_shader_cycles_u64 gfx1250 instruction (#152390)
It is the same as reading SHADER_CYCLES_LO and SHADER_CYCLES_HI
but with a single instruction.
2025-08-06 15:32:28 -07:00
Stanislav Mekhanoshin
66392a8d8d
[AMDGPU] Add XNACK_STATE_PRIV and _MASK gfx1250 registers (#152374)
Co-authored-by: Pierre Vanhoutryve <pierre.vanhoutryve@amd.com>
2025-08-06 14:44:17 -07:00
Stanislav Mekhanoshin
c3103068b7
[AMDGPU] Add more gfx1250 MC tests. NFC. (#152388)
These are already working, but left downstream.
2025-08-06 14:38:28 -07:00
Stanislav Mekhanoshin
184821b63d
[AMDGPU] Add gfx1250 DS MC tests. NFC. (#152378) 2025-08-06 14:15:35 -07:00
Jonathan Thackray
c3d24217bf
[AArch64][llvm] Fix disassembly of ldt{add,set,clr} instructions using xzr/wzr (#152292)
The current disassembly of `ldt{add,set,clr}` instructions when using
`xzr/wzr` is incorrect. The Armv9.6-A Memory Systems specification says:

```
  For each of LDT{ADD|SET|CLR}{L}, there is the corresponding STT{ADD|SET|CLR}{L}
  alias, for the case where the register selected by the Rt field is XZR or WZR
```
and:
```
  LDT{ADD|SET|CLR}{A}{L} is equivalent to LD{ADD|SET|CLR}{A}{L} except that: <..conditions..>
```

The Arm ARM specifies the preferred form of disassembly for these
aliases:
```
   STADD <Xs>, [<Xn|SP>]
   is equivalent to
   LDADD <Xs>, XZR, [<Xn|SP>]
   and is always the preferred disassembly.
```
(ref: DDI 0487L.b C6-2317)

This means that `sttadd` is the preferred disassembly for `ldtadd w0,
wzr, [x2]` when Rt is `xzr` or `wzr`.
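
The alias rule quoted above can be sketched as a small lookup. This is only a model of the printing decision, not the LLVM TableGen change; `preferred_disassembly` and its string-based operands are hypothetical, and the byte and halfword forms are left out for brevity.

```python
ZERO_REGS = {"wzr", "xzr"}

def _build_alias_table():
    # LD{ADD|SET|CLR}{L} -> ST{ADD|SET|CLR}{L}, and the same for the LDT/STT
    # forms. Acquire variants (the "a" forms) are deliberately absent, since
    # the quoted text only lists aliases for the optional {L} suffix.
    table = {}
    for load_prefix, store_prefix in (("ld", "st"), ("ldt", "stt")):
        for op in ("add", "set", "clr"):
            for release in ("", "l"):
                table[load_prefix + op + release] = store_prefix + op + release
    return table

LOAD_TO_STORE = _build_alias_table()

def preferred_disassembly(mnemonic, rs, rt, base):
    """Return the text a disassembler would prefer for one instruction."""
    store = LOAD_TO_STORE.get(mnemonic)
    if store is not None and rt in ZERO_REGS:
        # Rt is the zero register, so the store alias is preferred.
        return f"{store} {rs}, [{base}]"
    return f"{mnemonic} {rs}, {rt}, [{base}]"
```

With this model, `preferred_disassembly("ldtadd", "w0", "wzr", "sp")` gives `sttadd w0, [sp]`, while the same mnemonic with a real destination register keeps the load form.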

This change also aligns llvm disassembly with GNU binutils, as shown by
the following examples:

llvm before this change:
```
% cat test.s
stadd w0, [sp]
sttadd w0, [sp]
ldadd w0, wzr, [sp]
ldtadd w0, wzr, [sp]

% llvm-mc-20 -triple aarch64 -mattr=+lse,+lsui test.s
        stadd   w0, [sp]
        ldtadd  w0, wzr, [sp]
        stadd   w0, [sp]
        ldtadd  w0, wzr, [sp]
```
llvm after this change:
```
% llvm-mc -triple aarch64 -mattr=+lse,+lsui test.s
        stadd   w0, [sp]
        sttadd  w0, [sp]
        stadd   w0, [sp]
        sttadd  w0, [sp]
```
GCC-15 test:
```
% gas test.s -march=armv8-a+lsui+lse -o test.o
% objdump -dr test.o
   0:   b82003ff        stadd   w0, [sp]
   4:   192007ff        sttadd  w0, [sp]
   8:   b82003ff        stadd   w0, [sp]
   c:   192007ff        sttadd  w0, [sp]
```
Many thanks to Ezra Sitorus and Alice Carlotti for reporting and
confirming this issue.
2025-08-06 15:44:15 +01:00
Stanislav Mekhanoshin
34aed0ed56
[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194) 2025-08-05 15:15:21 -07:00
Stanislav Mekhanoshin
d08c2977e8
[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203) 2025-08-05 14:35:48 -07:00
Oliver Stannard
f6c2a357e7
[AArch64] Add Apple assembly syntax for recent instructions (#152111)
Some vector instructions override AsmString in the tablegen description,
but did not include the Apple syntax variant, so were printed without
operands.

Fixes #151330
2025-08-05 16:04:25 +01:00
Stanislav Mekhanoshin
37fe9f6382
[AMDGPU] Add gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 MC support (#152014)
This adds the new VOP3PX2e encoding.
2025-08-04 14:20:12 -07:00
Stanislav Mekhanoshin
dd0737bd99
[AMDGPU] gfx1250 v_wmma_ld_scale instructions (#152010) 2025-08-04 11:36:48 -07:00
Stanislav Mekhanoshin
849009c635
[AMDGPU] Add missing v_permlane_up_b32 test. NFC. (#151811) 2025-08-02 15:22:29 -07:00
Stanislav Mekhanoshin
d18511e10a
[AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (#151810) 2025-08-02 15:21:59 -07:00
Stanislav Mekhanoshin
bc463c059c
[AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (#151807) 2025-08-02 12:42:12 -07:00
Stanislav Mekhanoshin
7598c25b5a
[AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (#151804) 2025-08-02 10:45:02 -07:00
Stanislav Mekhanoshin
0988510ad4
[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773) 2025-08-01 20:12:35 -07:00
Stanislav Mekhanoshin
cc3932bf29
[AMDGPU] gfx1250 v_cvt_scalef32_sr_pk8_* instructions (#151765) 2025-08-01 19:25:57 -07:00
Stanislav Mekhanoshin
962ee7a568
[AMDGPU] gfx1250 v_cvt_scalef32_pk8_* instructions (#151758) 2025-08-01 18:29:45 -07:00
Stanislav Mekhanoshin
33abf05af4
[AMDGPU] gfx1250 v_permlane_* instructions (#151749) 2025-08-01 16:14:19 -07:00
Stanislav Mekhanoshin
c7bb105e97
[AMDGPU] Add v_cvt_scale_pk8_* gfx1250 instructions (#151616) 2025-07-31 18:55:59 -07:00
Stanislav Mekhanoshin
49d89bc9f4
[AMDGPU] Add gfx1250 cvt_pk|sr_fp8|bf8_f32 instructions (#151595) 2025-07-31 16:04:46 -07:00
Stanislav Mekhanoshin
e46d938ddf
[AMDGPU] v_cvt_sr_pk_f16_f32 gfx1250 instruction (#151482) 2025-07-31 12:25:55 -07:00
Stanislav Mekhanoshin
7f93487862
[AMDGPU] Add v_cvt_pk_f16_f32 instruction for gfx1250 (#151469) 2025-07-31 10:45:06 -07:00
Stanislav Mekhanoshin
ce40863209
[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415) 2025-07-30 17:24:45 -07:00
Stanislav Mekhanoshin
b3b36d3590
[AMDGPU] Add V_ASHR_PK_I8_I32 and V_ASHR_PK_U8_I32 on gfx1250 (#151389) 2025-07-30 16:30:47 -07:00
Stanislav Mekhanoshin
62187a60e6
[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385) 2025-07-30 14:02:03 -07:00
Stanislav Mekhanoshin
d70f228e83
[AMDGPU] Add gfx1250 V_ADD_{MIN|MAX}_{U|I}32 instructions (#151379) 2025-07-30 13:12:14 -07:00
Stanislav Mekhanoshin
3dfd939a16
[AMDGPU] gfx1250 V_{MIN|MAX}_{I|U}64 opcodes (#151256) 2025-07-29 19:13:51 -07:00
Changpeng Fang
9b4a44d63d
[AMDGPU] Update MC tests for vflat instructions on GFX1250 (#151232)
These instructions are already supported (at the MC layer) by the
current upstream code base.
2025-07-29 15:39:14 -07:00
Stanislav Mekhanoshin
7eaf1f2b2d
[AMDGPU] Bitop3 opcodes for gfx1250 (#151235) 2025-07-29 15:36:56 -07:00
Stanislav Mekhanoshin
d99238263c
[AMDGPU] Implement v_mad_u32/v_mad_nc_u|i64_u32 on gfx1250 (#151226) 2025-07-29 15:06:35 -07:00
Changpeng Fang
6184ef1c2f
[AMDGPU] Support f64 atomics on gfx1250 (#151172)
- BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64
- DS_ADD_F64

Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>
2025-07-29 09:41:00 -07:00
Pierre van Houtryve
be17791f26
[AMDGPU][gfx1250] Add cu-store subtarget feature (#150588)
Determines whether we can use `SCOPE_CU` stores (on by default), or
whether all stores must be done at `SCOPE_SE` minimum.
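
One way to picture the effect of the feature is as a clamp on the store scope. The sketch below is an assumption-laden model, not the backend code: `Scope`, its numeric values (used only to order the scopes from narrowest to widest), and `store_scope` are all hypothetical names.

```python
from enum import IntEnum

class Scope(IntEnum):
    # Illustrative ordering only, narrowest to widest; the values are not
    # taken from the ISA encoding.
    CU = 0
    SE = 1
    DEV = 2
    SYS = 3

def store_scope(requested, cu_store=True):
    """Return the scope a store would actually use.

    With `cu_store` enabled (the default, as in the commit message) the
    requested scope is used as-is. Without it, anything narrower than
    SCOPE_SE is widened to SCOPE_SE.
    """
    if cu_store:
        return requested
    return max(requested, Scope.SE)
```

For example, `store_scope(Scope.CU, cu_store=False)` yields `Scope.SE`, while wider scopes pass through unchanged.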
2025-07-29 11:38:43 +02:00
Changpeng Fang
67e2faa50c
[AMDGPU] MC support for async load and store on gfx1250 (#151030) 2025-07-28 13:45:37 -07:00
Craig Topper
1669bd3ae9
[RISCV] Accept c.slli/c.srli/c.srai with a 0 immediate as hints. (#150689)
These encodings were previously assigned to c.slli64/srli64/srai64, and
designated as hints for RV32 and RV64. Those mnemonics no longer appear
in the ISA manual after RV128 was removed. The spec now says that
c.slli/c.srli/c.srai with an immediate of 0 is a hint.

This patch updates the assembler to accept this. I've left the old
spelling for backwards compatibility, but we disassemble to a shift with
a zero immediate. The C_SLLI64_HINT/C_SRLI64_HINT/C_SRAI64_HINT
instructions are removed and the predicates for C_SLLI/C_SRLI/C_SRAI now
accept a 0 immediate.
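
The behaviour described above can be modelled as a normalisation step: old spellings are accepted on input, and the shift with a zero immediate is what gets printed. This is a sketch only; `canonical_form` and `is_shift_hint` are hypothetical helpers, not functions from the RISC-V backend.

```python
# Old mnemonics that took only a register, mapped to their current spelling.
OLD_TO_NEW = {"c.slli64": "c.slli", "c.srli64": "c.srli", "c.srai64": "c.srai"}
COMPRESSED_SHIFTS = set(OLD_TO_NEW.values())

def canonical_form(mnemonic, reg, shamt=None):
    """Return the spelling a disassembler would print for the instruction."""
    if mnemonic in OLD_TO_NEW:
        if shamt is not None:
            raise ValueError(f"{mnemonic} takes no immediate")
        # Old spelling is accepted, but printed as a shift by zero.
        return f"{OLD_TO_NEW[mnemonic]} {reg}, 0"
    if mnemonic not in COMPRESSED_SHIFTS or shamt is None:
        raise ValueError("expected a compressed shift with an immediate")
    return f"{mnemonic} {reg}, {shamt}"

def is_shift_hint(mnemonic, shamt):
    """True when a compressed shift has a zero immediate, i.e. is a hint."""
    return mnemonic in COMPRESSED_SHIFTS and shamt == 0
```

So `canonical_form("c.slli64", "a0")` and `canonical_form("c.slli", "a0", 0)` both produce `c.slli a0, 0`, and `is_shift_hint` reports that form as a hint.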

Fixes #150304
2025-07-26 00:05:33 -07:00
Changpeng Fang
34b6587249
[AMDGPU] MC support for load monitor instructions on gfx1250 (#150496) 2025-07-24 12:16:47 -07:00
Stanislav Mekhanoshin
a70f7dafc1
[AMDGPU] gfx1250 flat and global prefetch MC support (#150455) 2025-07-24 11:00:56 -07:00
Changpeng Fang
473bc0d188
[AMDGPU] Support V_FMA_MIX*_BF16 instructions on gfx1250 (#150381)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-24 09:43:49 -07:00
Changpeng Fang
9a563b08e2
[AMDGPU] Support V_PK_MIN3/MAX3_NUM_F16 on gfx1250 (#150326)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-23 15:15:19 -07:00
Changpeng Fang
203ea0a97e
AMDGPU: Support V_PK_MAXIMUM3_F16 and V_PK_MINIMUM3_F16 on gfx1250 (#150307)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-23 13:45:01 -07:00
Stanislav Mekhanoshin
2346968807
[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291) 2025-07-23 13:17:56 -07:00
Changpeng Fang
bc1f85d234
AMDGPU: Support packed bf16 instructions on gfx1250 (#150283)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-23 12:01:23 -07:00