llvm-project

Author	SHA1	Message	Date
Gang Chen	ef68d1587d	[AMDGPU] upstream barrier count reporting part1 (#154409 )	2025-08-19 16:42:31 -07:00
Stanislav Mekhanoshin	80d430df5d	[AMDGPU] Add MSG_SAVEWAVE_HAS_TDM on gfx1250 (#153483 )	2025-08-13 23:01:50 -07:00
Stanislav Mekhanoshin	fc911fe928	[AMDGPU] Add HW_REG_IB_STS2 on gfx1250 (#153479 )	2025-08-13 23:01:28 -07:00
Jonathan Thackray	7bd0c5fa66	[AArch64][llvm] Unify AArch64 tests into a single file (4/4) (NFC) (#146331 ) This is a series of patches (4/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests whose .s tests have functions * makes the .s tests have a roundabout run line to test both encoding and assembly See also #146328, #146329 and #146330. Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>	2025-08-13 14:40:41 +00:00
Jonathan Thackray	8453f205eb	[AArch64][llvm] Unify AArch64 tests into a single file (3/4) (NFC) (#146330 ) This is a series of patches (3/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests which have multiple feature dependencies * makes the .s tests have a roundabout run line to test both encoding and assembly * creates diagnostic tests when needed See also #146328, #146329 and #146331. Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>	2025-08-13 14:07:28 +00:00
Jonathan Thackray	b878793739	[AArch64][llvm] Unify AArch64 tests into a single file (2/4) (NFC) (#146329 ) This is a series of patches (2/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests which have only one feature required * makes the .s tests have a roundabout run line to test both encoding and assembly * creates diagnostic tests when needed * fixes naming convention of tests See also #146328, #146330 and #146331. Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>	2025-08-13 13:39:45 +00:00
Jonathan Thackray	69452d50ce	[AArch64][llvm] Unify AArch64 tests into a single file (1/4) (NFC) (#146328 ) This is a series of patches (1/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * unifies errorless .s and .txt tests into a single file * removes .txt tests which don't have feature requirements * makes the .s tests have a roundabout run line to test both encoding and assembly See also #146329, #146330 and #146331. Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>	2025-08-13 13:45:25 +01:00
Stanislav Mekhanoshin	d0ee82040c	[AMDGPU] Add s_barrier_init\|join\|leave instructions (#153296 )	2025-08-12 15:07:07 -07:00
Sam Elliott	4e11f89904	[RISCV] Basic Objdump Mapping Symbol Support (#151452 ) This implements very basic support for RISC-V mapping symbols in llvm-objdump, sharing the implementation with how Arm/AArch64/CSKY implement this feature. This only supports the `$x` (instruction) and `$d` (data) mapping symbols for RISC-V, and not the version of `$x` which includes an architecture string suffix.	2025-08-07 11:28:07 -07:00
Stanislav Mekhanoshin	b296ea9c14	[AMDGPU] s_get_shader_cycles_u64 gfx1250 instruction (#152390 ) It is the same as reading SHADER_CYCLES_LO and SHADER_CYCLES_HI but with a single instruction.	2025-08-06 15:32:28 -07:00
Stanislav Mekhanoshin	66392a8d8d	[AMDGPU] Add XNACK_STATE_PRIV and _MASK gfx1250 registers (#152374 ) Co-authored-by: Pierre Vanhoutryve <pierre.vanhoutryve@amd.com>	2025-08-06 14:44:17 -07:00
Stanislav Mekhanoshin	c3103068b7	[AMDGPU] Add more gfx1250 MC tests. NFC. (#152388 ) These are already working, but left downstream.	2025-08-06 14:38:28 -07:00
Stanislav Mekhanoshin	184821b63d	[AMDGPU] Add gfx1250 DS MC tests. NFC. (#152378 )	2025-08-06 14:15:35 -07:00
Jonathan Thackray	c3d24217bf	[AArch64][llvm] Fix disassembly of `ldt{add,set,clr}` instructions using `xzr/wzr` (#152292 ) The current disassembly of `ldt{add,set,clr}` instructions when using `xzr/wzr` is incorrect. The Armv9.6-A Memory Systems specification says: ``` For each of LDT{ADD\|SET\|CLR}{L}, there is the corresponding STT{ADD\|SET\|CLR}{L} alias, for the case where the register selected by the Rt field is XZR or WZR ``` and: ``` LDT{ADD\|SET\|CLR}{A}{L} is equivalent to LD{ADD\|SET\|CLR}{A}{L} except that: <..conditions..> ``` The Arm ARM specifies the preferred form of disassembly for these aliases: ``` STADD <Xs>, [<Xn\|SP>] is equivalent to LDADD <Xs>, XZR, [<Xn\|SP>] and is always the preferred disassembly. ``` (ref: DDI 0487L.b C6-2317) This means that `sttadd` is the preferred disassembly for `ldtadd w0, wzr, [x2]` when Rt is `xzr` or `wzr`. This change also aligns llvm disassembly with GNU binutils, as shown by the following examples: llvm before this change: ``` % cat test.s stadd w0, [sp] sttadd w0, [sp] ldadd w0, wzr, [sp] ldtadd w0, wzr, [sp] % llvm-mc-20 -triple aarch64 -mattr=+lse,+lsui test.s stadd w0, [sp] ldtadd w0, wzr, [sp] stadd w0, [sp] ldtadd w0, wzr, [sp] ``` llvm after this change: ``` % llvm-mc -triple aarch64 -mattr=+lse,+lsui test.s stadd w0, [sp] sttadd w0, [sp] stadd w0, [sp] sttadd w0, [sp] ``` GCC-15 test: ``` % gas test.s -march=armv8-a+lsui+lse -o test.o % objdump -dr test.o 0: b82003ff stadd w0, [sp] 4: 192007ff sttadd w0, [sp] 8: b82003ff stadd w0, [sp] c: 192007ff sttadd w0, [sp] ``` Many thanks to Ezra Sitorus and Alice Carlotti for reporting and confirming this issue.	2025-08-06 15:44:15 +01:00
Stanislav Mekhanoshin	34aed0ed56	[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194 )	2025-08-05 15:15:21 -07:00
Stanislav Mekhanoshin	d08c2977e8	[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203 )	2025-08-05 14:35:48 -07:00
Oliver Stannard	f6c2a357e7	[AArch64] Add Apple assembly syntax for recent instructions (#152111 ) Some vector instructions override AsmString in the tablegen description, but did not include the Apple syntax variant, so were printed without operands. Fixes #151330	2025-08-05 16:04:25 +01:00
Stanislav Mekhanoshin	37fe9f6382	[AMDGPU] Add gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 MC support (#152014 ) This adds new VOP3PX2e encoding	2025-08-04 14:20:12 -07:00
Stanislav Mekhanoshin	dd0737bd99	[AMDGPU] gfx1250 v_wmma_ld_scale instructions (#152010 )	2025-08-04 11:36:48 -07:00
Stanislav Mekhanoshin	849009c635	[AMDGPU] Add missing v_permlane_up_b32 test. NFC. (#151811 )	2025-08-02 15:22:29 -07:00
Stanislav Mekhanoshin	d18511e10a	[AMDGPU] v_cvt_scalef32_sr_pk16_* gfx1250 instructions (#151810 )	2025-08-02 15:21:59 -07:00
Stanislav Mekhanoshin	bc463c059c	[AMDGPU] v_cvt_scalef32_pk16_* gfx1250 instructions (#151807 )	2025-08-02 12:42:12 -07:00
Stanislav Mekhanoshin	7598c25b5a	[AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (#151804 )	2025-08-02 10:45:02 -07:00
Stanislav Mekhanoshin	0988510ad4	[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773 )	2025-08-01 20:12:35 -07:00
Stanislav Mekhanoshin	cc3932bf29	[AMDGPU] gfx1250 v_cvt_scalef32_sr_pk8_* instructions (#151765 )	2025-08-01 19:25:57 -07:00
Stanislav Mekhanoshin	962ee7a568	[AMDGPU] gfx1250 v_cvt_scalef32_pk8_* instructions (#151758 )	2025-08-01 18:29:45 -07:00
Stanislav Mekhanoshin	33abf05af4	[AMDGPU] gfx1250 v_permlane_* instructions (#151749 )	2025-08-01 16:14:19 -07:00
Stanislav Mekhanoshin	c7bb105e97	[AMDGPU] Add v_cvt_scale_pk8_* gfx1250 instructions (#151616 )	2025-07-31 18:55:59 -07:00
Stanislav Mekhanoshin	49d89bc9f4	[AMDGPU] Add gfx1250 cvt_pk\|sr_fp8\|bf8_f32 instructions (#151595 )	2025-07-31 16:04:46 -07:00
Stanislav Mekhanoshin	e46d938ddf	[AMDGPU] v_cvt_sr_pk_f16_f32 gfx1250 instruction (#151482 )	2025-07-31 12:25:55 -07:00
Stanislav Mekhanoshin	7f93487862	[AMDGPU] Add v_cvt_pk_f16_f32 instruction for gfx1250 (#151469 )	2025-07-31 10:45:06 -07:00
Stanislav Mekhanoshin	ce40863209	[AMDGPU] Add v_cvt_sr\|pk_bf8\|fp8_f16 gfx1250 instructions (#151415 )	2025-07-30 17:24:45 -07:00
Stanislav Mekhanoshin	b3b36d3590	[AMDGPU] Add V_ASHR_PK_I8_I32 and V_ASHR_PK_U8_I32 on gfx1250 (#151389 )	2025-07-30 16:30:47 -07:00
Stanislav Mekhanoshin	62187a60e6	[AMDGPU] Add gfx1250 v_cvt_sr_pk_bf16_f32 instruction (#151385 )	2025-07-30 14:02:03 -07:00
Stanislav Mekhanoshin	d70f228e83	[AMDGPU] Add gfx1250 V_ADD_{MIN\|MAX}_{U\|I}32 instructions (#151379 )	2025-07-30 13:12:14 -07:00
Stanislav Mekhanoshin	3dfd939a16	[AMDGPU] gfx1250 V_{MIN\|MAX}_{I\|U}64 opcodes (#151256 )	2025-07-29 19:13:51 -07:00
Changpeng Fang	9b4a44d63d	[AMDGPU] Update MC tests for vflat instructions on GFX1250 (#151232 ) These instructions have already been supported (at MC layer) with current upstream code base.	2025-07-29 15:39:14 -07:00
Stanislav Mekhanoshin	7eaf1f2b2d	[AMDGPU] Bitop3 opcodes for gfx1250 (#151235 )	2025-07-29 15:36:56 -07:00
Stanislav Mekhanoshin	d99238263c	[AMDGPU] Implement v_mad_u32/v_mad_nc_u\|i64_u32 on gfx1250 (#151226 )	2025-07-29 15:06:35 -07:00
Changpeng Fang	6184ef1c2f	[AMDGPU] Support f64 atomics on gfx1250 (#151172 ) - BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64 - DS_ADD_F64 Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>	2025-07-29 09:41:00 -07:00
Pierre van Houtryve	be17791f26	[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588 ) Determines whether we can use `SCOPE_CU` stores (on by default), or whether all stores must be done at `SCOPE_SE` minimum.	2025-07-29 11:38:43 +02:00
Changpeng Fang	67e2faa50c	[AMDGPU] MC support for async load and store on gfx1250 (#151030 )	2025-07-28 13:45:37 -07:00
Craig Topper	1669bd3ae9	[RISCV] Accept c.slli/c.srli/c.srai with a 0 immediate as hints. (#150689 ) These encodings were previously assigned to c.slli64/srli64/srai64, and designated as hints for RV32 and RV64. Those mnemonics no longer appear in the ISA manual after RV128 was removed. The spec now says that c.slli/c.srli/c.srai with an immediate of 0 is a hint. This patch updates the assembler to accept this. I've left the old spelling for backwards compatibility, but we disassemble to a shift with a zero immediate. The C_SLLI64_HINT/C_SRLI_HINT/C_SRAI_HINT instructions are removed and the predicates for C_SLLI/C_SRLI/C_SRAI now accept a 0 immediate. Fixes #150304	2025-07-26 00:05:33 -07:00
Changpeng Fang	34b6587249	[AMDGPU] MC support for load monitor instructions on gfx1250 (#150496 )	2025-07-24 12:16:47 -07:00
Stanislav Mekhanoshin	a70f7dafc1	[AMDGPU] gfx1250 flat and global prefetch MC support (#150455 )	2025-07-24 11:00:56 -07:00
Changpeng Fang	473bc0d188	[AMDGPU] Support V_FMA_MIX*_BF16 instructions on gfx1250 (#150381 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-24 09:43:49 -07:00
Changpeng Fang	9a563b08e2	[AMDGPU] Support V_PK_MIN3/MAX3_NUM_F16 on gfx1250 (#150326 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-23 15:15:19 -07:00
Changpeng Fang	203ea0a97e	AMDGPU: Support V_PK_MAXIMUM3_F16 and V_PK_MINIMUM3_F16 on gfx1250 (#150307 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-23 13:45:01 -07:00
Stanislav Mekhanoshin	2346968807	[AMDGPU] Add V_ADD\|SUB\|MUL_U64 gfx1250 opcodes (#150291 )	2025-07-23 13:17:56 -07:00
Changpeng Fang	bc1f85d234	AMDGPU: Support packed bf16 instructions on gfx1250 (#150283 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-23 12:01:23 -07:00

2664 Commits