llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	af67e0f94f	[AMDGPU] Remove obsolete comments from VOP1Instructions.td. NFC. (#153249 )	2025-08-12 14:29:21 -07:00
Stanislav Mekhanoshin	3589234568	[AMDGPU] Remove dead VOP1_Real_NO_DPP_OP_SEL_with_name. NFC. (#153245 )	2025-08-12 12:37:07 -07:00
Chris Jackson	0824811b0b	[AMDGPU] Remove dead code in VOP1 tablegen (NFC) (#151932 ) Remove dead class in VOP1Instructions.td.	2025-08-04 11:29:20 +01:00
Shilei Tian	2c50e4cac2	[AMDGPU] Add support for `v_sat_pk4_i4_[i8,u8]` on gfx1250 (#149528 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Foad, Jay <Jay.Foad@amd.com>	2025-07-18 13:08:50 -04:00
Shilei Tian	e11d28faee	[AMDGPU] Add support for `v_permlane16_swap_b32` on gfx1250 (#149518 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-18 13:05:08 -04:00
Shilei Tian	95b69e0e70	[AMDGPU] Add support for `v_prng_b32` on gfx1250 (#149450 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-18 10:59:47 -04:00
Shilei Tian	aecd44818a	[AMDGPU] Add support for `v_tanh_f16` on gfx1250 (#149439 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-18 00:21:04 -04:00
Shilei Tian	7e105fbdbe	[AMDGPU] Add support for `v_tanh_f32` on gfx1250 (#149360 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 15:42:35 -04:00
Shilei Tian	fd5fc76c91	[AMDGPU] Add support for `v_cos_bf16` on gfx1250 (#149355 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 14:43:34 -04:00
Shilei Tian	a102342990	[AMDGPU] Add support for `v_sin_bf16` on gfx1250 (#149241 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 08:49:45 -04:00
Shilei Tian	a6b5ece75e	[AMDGPU] Add support for `v_exp_bf16` on gfx1250 (#149229 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-17 08:46:01 -04:00
Shilei Tian	ad6d5d2821	[AMDGPU] Add support for `v_log_bf16` on gfx1250 (#149201 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-16 19:09:34 -04:00
Shilei Tian	7d2a58e87d	[AMDGPU] Add support for `v_rsq_bf16` on gfx1250 (#149194 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-16 19:06:03 -04:00
Shilei Tian	23ac7b938d	[AMDGPU] Add support for `v_sqrt_bf16` on gfx1250 (#148921 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-15 16:15:47 -04:00
Shilei Tian	dabc8e2ec1	[AMDGPU] Add support for `v_rcp_bf16` on gfx1250 (#148916 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-15 16:12:51 -04:00
Shilei Tian	d7ec80c897	[AMDGPU] Add support for `v_tanh_bf16` on gfx1250 (#147425 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-14 16:30:18 -04:00
Stanislav Mekhanoshin	eb97422e00	[AMDGPU] Disable DPP with v_mov_b64 on gfx1250 (#148054 )	2025-07-10 16:20:13 -07:00
Stanislav Mekhanoshin	fd894f6e9e	[AMDGPU] gfx1250 MC support for v_mov_b64 (#147859 ) It is incomplete in terms of the DPP diagnostics; that is a much more involved change.	2025-07-09 21:31:27 -07:00
Brox Chen	0d2b47ae4a	[AMDGPU][True16][CodeGen] stop emitting sgpr_lo16 from isel (#144819 ) When true16 is enabled, isel starts to emit sgpr_lo16 registers when a trunc/sext i16/i32 is generated, or when a salu32 is used by a vgpr16 or vice versa. This causes a problem because sgpr_lo16 is not fully supported in the pipeline. True16 mode works fine at -O3 since the folding pass removes sgpr_lo16 from the pipeline. However, it hits a problem at -O0 because the folding pass is skipped. This patch: 1. stops emitting sgpr_lo16 from isel 2. updates the codegen patterns to split uniform/divergent patterns for i16/i32 conversion 3. updates the fix-sgpr-copy pass to address the legalization requirement in true16 mode, and updates the fix-sgpr-copies-f16-true16.mir test to include all possible combinations. This patch is tested with cts and a downstream repo with -O0 testing	2025-07-09 16:17:14 -04:00
Shilei Tian	d258457d42	[AMDGPU] Add support for `v_cvt_f32_fp8` on gfx1250 (#147579 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-08 16:21:24 -04:00
Jun Wang	641ad52b6a	[AMDGPU][MC] Fix disassembly for v_permlane16_swap_b32 for GFX950 (#146600 ) When targeting GFX950, disassembly of v_permlane16_swap_b32 and v_permlane32_swap_b32 instructions produces errors when they use certain vdst operand values, e.g., v_permlane16_swap_b32 v218, v219. This patch fixes this problem.	2025-07-02 10:05:25 -07:00
Shilei Tian	749c7c5dc4	[AMDGPU] Add support for `v_cvt_f16_bf8` on gfx1250 (#146305 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-06-30 07:54:55 -04:00
Shilei Tian	a99c964d7f	[AMDGPU] Add support for `v_cvt_f16_fp8` on gfx1250 (#146302 ) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-06-30 07:51:00 -04:00
Shilei Tian	ce1c1a0e6d	[AMDGPU] Add support for `v_cvt_pk_f16_bf8` on gfx1250 (#145753 ) Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-06-25 17:03:52 -04:00
Shilei Tian	ff23ee40d6	[AMDGPU] Add support for `v_cvt_pk_f16_fp8` on gfx1250 (#145747 ) Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-06-25 17:00:10 -04:00
Shilei Tian	473f992c1f	[AMDGPU] Add the support for `v_cvt_f32_bf16` on gfx1250 (#145632 ) Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-06-25 16:02:40 -04:00
Vigneshwar Jayakumar	1b83f10072	[AMDGPU] Fix to prevent sinking of PERMLANE_SWAP instruction (#144423 ) Permlane_swap instruction depends on exec mask, added isConvergent flag to prevent sinking of instruction. Fixes: SWDEV-537232	2025-06-20 20:44:23 +09:00
Stanislav Mekhanoshin	7b7b5a397d	[AMDGPU] Remove AsmVOP3OpSel field completely. NFCI. (#144574 )	2025-06-17 13:29:45 -07:00
Stanislav Mekhanoshin	be6c1684c0	[AMDGPU] Automate creation of byte_sel dags. NFCI. (#140155 )	2025-05-16 08:54:04 -07:00
Stanislav Mekhanoshin	f113cab191	[AMDGPU] Cleanup bytesel variables. NFC. (#140131 ) Somehow we ended up with 2 sets of td variables: Is...ByteSel and Has...ByteSel. Keep only Has... form.	2025-05-15 16:04:24 -07:00
Ivan Kosarev	66d3980b53	[AMDGPU][NFC] Remove _DEFERRED operands. (#139123 ) All immediates are deferred now.	2025-05-09 10:10:53 +01:00
Brox Chen	d4706e17f5	[AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 (#118037 ) i16 can be selected into sgpr32 or vgpr16 in isel lowering in true16 mode. And thus, it creates cases that we copy from vgpr16 to sgpr32 in ext selection and this seems inevitable without sgpr16 support. legalize the src/dst reg when we decide to lower this special copy to a readfirstlane in fix-sgpr-copy pass and add a lit test	2025-05-05 15:17:34 -04:00
Brox Chen	fb0e7b5f16	[AMDGPU][True16][CodeGen] Implement sgpr folding in true16 (#128929 ) We haven't implemented 16-bit SGPRs. Currently allow 32-bit SGPRs to be folded into True16 instructions taking 16-bit values. Also use sgpr_32 when an Imm is copied to sgpr_lo16 so it can be further folded. This improves generated code quality.	2025-04-02 16:08:26 -04:00
Juan Manuel Martinez Caamaño	0375ef07c3	[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741 ) This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as a 4-bit signed integer and returns `0.0625f * src`. SWDEV-518861	2025-04-02 19:51:40 +02:00
Pierre van Houtryve	5231736329	[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851 ) M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is effectively useless. Represent this by restricting the dest RC of that instruction to `SReg_32_XM0` which excludes M0. There is a lot of test changes due to the register class changing, but most changes are trivial. In some cases, an extra register and `s_mov_b32` is needed. Fixes SWDEV-513269	2025-02-26 13:14:03 +01:00
Ivan Kosarev	983562d8c5	[AMDGPU][NFC] Simplify t16/fake16 TableGen definitions. (#122693 ) Infer mnemonics from the names of the records.	2025-01-29 12:46:05 +00:00
Brox Chen	4af3332015	[AMDGPU][True16][MC] true16 for v_cvt_u32_u16 (#120646 ) Support true16 format for v_cvt_u32_u16 in MC	2025-01-06 15:28:48 -05:00
Brox Chen	d7acf03cec	[AMDGPU][True16][MC] true16 for v_rndne_f16 (#120691 ) Support true16 format for v_rndne_f16 in MC	2025-01-03 16:32:15 -05:00
Brox Chen	bf274b3d80	[AMDGPU][True16][MC] true16 for v_cos_f16 (#120639 ) Support true16 format for v_cos_f16 in MC	2025-01-03 15:46:41 -05:00
Brox Chen	b71a6fd042	[AMDGPU][True16][MC] true16 for v_cvt_i32_i16 (#120645 ) Support true16 format for v_cvt_i32_i16 in MC	2025-01-03 15:46:06 -05:00
Brox Chen	dc307be1b5	[AMDGPU][True16][MC] true16 for v_fract_f16 (#120647 ) Support true16 format for v_fract_f16 in MC	2025-01-03 15:45:33 -05:00
Brox Chen	3b72c62e7f	[AMDGPU][True16][MC] true16 for v_frexp_mant_f16 (#120653 ) Support true16 format for v_frexp_mant_f16 in MC	2025-01-03 14:42:39 -05:00
Brox Chen	34d2c3b934	[AMDGPU][True16][MC] true16 for v_sin_f16 (#120692 ) Support true16 format for v_sin_f16 in MC	2025-01-03 14:11:25 -05:00
Brox Chen	d37aa5135c	[AMDGPU][True16][MC] true16 for v_not_b16 (#120659 ) Support true16 format for v_not_b16 in MC	2025-01-03 13:09:23 -05:00
Brox Chen	e5acb167b7	[AMDGPU][True16][MC] true16 for v_trunc_f16 (#120693 ) Support true16 format for v_trunc_f16 in MC	2025-01-03 11:43:45 -05:00
Brox Chen	322f16e624	[AMDGPU][True16][MC] true16 for v_sat_pk_u8_i16 (#120634 ) Support true16 format for v_sat_pk_u8_i16 in MC	2025-01-03 11:43:07 -05:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	6dceb0e34e	AMDGPU: Add V_CVT_F32_BF16 for gfx950 (#116311 )	2024-11-18 13:33:05 -08:00
Matt Arsenault	ca1b35a6c8	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310 ) Rand num instruction for stochastic rounding.	2024-11-18 10:54:54 -08:00
Matt Arsenault	b7d635ed30	AMDGPU: Copy correct predicates for SDWA reals (#116288 ) There are a lot of messes in the special case predicate handling. Currently broad let blocks override specific predicates with more general cases. For instructions with SDWA, the HasSDWA predicate was overriding the SubtargetPredicate for the instruction. This fixes enough to properly disallow new instructions that support SDWA on older targets.	2024-11-18 08:38:35 -08:00


223 Commits
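
The commit message for `__builtin_amdgcn_cvt_off_f32_i4` (0375ef07c3) describes the instruction as treating its input as a 4-bit signed integer and returning `0.0625f * src`. A minimal Python sketch of that arithmetic follows. The function name `cvt_off_f32_i4` is hypothetical, and masking the input to its low four bits is an assumption about how wider inputs are handled; this only models the formula quoted in the log, not the hardware or the Clang builtin itself.

```python
def cvt_off_f32_i4(src: int) -> float:
    """Model of the formula from the commit message: 0.0625 * (4-bit signed src).

    Assumption: only the low 4 bits of src are used.
    """
    nibble = src & 0xF  # keep the low 4 bits (assumed behaviour)
    # Sign-extend: bit 3 set means the value is negative in two's complement.
    signed = nibble - 16 if nibble & 0x8 else nibble
    return 0.0625 * signed


if __name__ == "__main__":
    # Print the full 16-entry table for inspection.
    for bits in range(16):
        print(f"{bits:04b} -> {cvt_off_f32_i4(bits)}")
```

Under these assumptions the results span -0.5 (input `0b1000`) to 0.4375 (input `0b0111`) in steps of 0.0625.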