223 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
af67e0f94f
[AMDGPU] Remove obsolete comments from VOP1Instructions.td. NFC. (#153249) 2025-08-12 14:29:21 -07:00
Stanislav Mekhanoshin
3589234568
[AMDGPU] Remove dead VOP1_Real_NO_DPP_OP_SEL_with_name. NFC. (#153245) 2025-08-12 12:37:07 -07:00
Chris Jackson
0824811b0b
[AMDGPU] Remove dead code in VOP1 tablegen (NFC) (#151932)
Remove dead class in VOP1Instructions.td.
2025-08-04 11:29:20 +01:00
Shilei Tian
2c50e4cac2
[AMDGPU] Add support for v_sat_pk4_i4_[i8,u8] on gfx1250 (#149528)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Foad, Jay <Jay.Foad@amd.com>
2025-07-18 13:08:50 -04:00
Shilei Tian
e11d28faee
[AMDGPU] Add support for v_permlane16_swap_b32 on gfx1250 (#149518)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 13:05:08 -04:00
Shilei Tian
95b69e0e70
[AMDGPU] Add support for v_prng_b32 on gfx1250 (#149450)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 10:59:47 -04:00
Shilei Tian
aecd44818a
[AMDGPU] Add support for v_tanh_f16 on gfx1250 (#149439)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 00:21:04 -04:00
Shilei Tian
7e105fbdbe
[AMDGPU] Add support for v_tanh_f32 on gfx1250 (#149360)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 15:42:35 -04:00
Shilei Tian
fd5fc76c91
[AMDGPU] Add support for v_cos_bf16 on gfx1250 (#149355)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 14:43:34 -04:00
Shilei Tian
a102342990
[AMDGPU] Add support for v_sin_bf16 on gfx1250 (#149241)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:49:45 -04:00
Shilei Tian
a6b5ece75e
[AMDGPU] Add support for v_exp_bf16 on gfx1250 (#149229)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:46:01 -04:00
Shilei Tian
ad6d5d2821
[AMDGPU] Add support for v_log_bf16 on gfx1250 (#149201)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:09:34 -04:00
Shilei Tian
7d2a58e87d
[AMDGPU] Add support for v_rsq_bf16 on gfx1250 (#149194)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:06:03 -04:00
Shilei Tian
23ac7b938d
[AMDGPU] Add support for v_sqrt_bf16 on gfx1250 (#148921)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-15 16:15:47 -04:00
Shilei Tian
dabc8e2ec1
[AMDGPU] Add support for v_rcp_bf16 on gfx1250 (#148916)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-15 16:12:51 -04:00
Shilei Tian
d7ec80c897
[AMDGPU] Add support for v_tanh_bf16 on gfx1250 (#147425)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14 16:30:18 -04:00
Stanislav Mekhanoshin
eb97422e00
[AMDGPU] Disable DPP with v_mov_b64 on gfx1250 (#148054) 2025-07-10 16:20:13 -07:00
Stanislav Mekhanoshin
fd894f6e9e
[AMDGPU] gfx1250 MC support for v_mov_b64 (#147859)
It is incomplete in terms of the DPP diagnostics; that is a much
more involved change.
2025-07-09 21:31:27 -07:00
Brox Chen
0d2b47ae4a
[AMDGPU][True16][CodeGen] stop emitting sgpr_lo16 from isel (#144819)
When true16 is enabled, isel starts to emit the sgpr_lo16 register when a
trunc/sext i16/i32 is generated, or when a salu32 is used by a vgpr16 or
vice versa. This causes a problem, as sgpr_lo16 is not fully supported in
the pipeline.

True16 mode works fine in -O3 mode since the folding pass removes sgpr_lo16
from the pipeline. However, it hits a problem in -O0 mode, as the folding
pass is skipped.

This patch:
1. stops emitting sgpr_lo16 from isel
2. updates the codegen patterns to split uniform/divergent patterns for
i16/i32 conversion
3. updates the fix-sgpr-copy pass to address the legalization requirement
in true16 mode, and updates the fix-sgpr-copies-f16-true16.mir
test to include all possible combinations

This patch is tested with cts and a downstream repo with -O0 testing.
2025-07-09 16:17:14 -04:00
Shilei Tian
d258457d42
[AMDGPU] Add support for v_cvt_f32_fp8 on gfx1250 (#147579)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-08 16:21:24 -04:00
Jun Wang
641ad52b6a
[AMDGPU][MC] Fix disassembly for v_permlane16_swap_b32 for GFX950 (#146600)
When targeting GFX950, disassembly of v_permlane16_swap_b32 and
v_permlane32_swap_b32 instructions produces errors when they use certain
vdst operand values, e.g., v_permlane16_swap_b32 v218, v219. This patch
fixes this problem.
2025-07-02 10:05:25 -07:00
Shilei Tian
749c7c5dc4
[AMDGPU] Add support for v_cvt_f16_bf8 on gfx1250 (#146305)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-06-30 07:54:55 -04:00
Shilei Tian
a99c964d7f
[AMDGPU] Add support for v_cvt_f16_fp8 on gfx1250 (#146302)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-06-30 07:51:00 -04:00
Shilei Tian
ce1c1a0e6d
[AMDGPU] Add support for v_cvt_pk_f16_bf8 on gfx1250 (#145753)
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-06-25 17:03:52 -04:00
Shilei Tian
ff23ee40d6
[AMDGPU] Add support for v_cvt_pk_f16_fp8 on gfx1250 (#145747)
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-06-25 17:00:10 -04:00
Shilei Tian
473f992c1f
[AMDGPU] Add the support for v_cvt_f32_bf16 on gfx1250 (#145632)
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-06-25 16:02:40 -04:00
Vigneshwar Jayakumar
1b83f10072
[AMDGPU] Fix to prevent sinking of PERMLANE_SWAP instruction (#144423)
The Permlane_swap instruction depends on the exec mask; add the isConvergent
flag to prevent sinking of the instruction.

Fixes: SWDEV-537232
2025-06-20 20:44:23 +09:00
Stanislav Mekhanoshin
7b7b5a397d
[AMDGPU] Remove AsmVOP3OpSel field completely. NFCI. (#144574) 2025-06-17 13:29:45 -07:00
Stanislav Mekhanoshin
be6c1684c0
[AMDGPU] Automate creation of byte_sel dags. NFCI. (#140155) 2025-05-16 08:54:04 -07:00
Stanislav Mekhanoshin
f113cab191
[AMDGPU] Cleanup bytesel variables. NFC. (#140131)
Somehow we ended up with 2 sets of td variables: Is...ByteSel and
Has...ByteSel. Keep only the Has... form.
2025-05-15 16:04:24 -07:00
Ivan Kosarev
66d3980b53
[AMDGPU][NFC] Remove _DEFERRED operands. (#139123)
All immediates are deferred now.
2025-05-09 10:10:53 +01:00
Brox Chen
d4706e17f5
[AMDGPU][True16][CodeGen] readfirstlane for vgpr16 copy to sgpr32 (#118037)
i16 can be selected into sgpr32 or vgpr16 in isel lowering in true16
mode. This creates cases where we copy from vgpr16 to sgpr32 in
ext selection, which seems inevitable without sgpr16 support.

Legalize the src/dst reg when we decide to lower this special copy to a
readfirstlane in the fix-sgpr-copy pass, and add a lit test.
2025-05-05 15:17:34 -04:00
Brox Chen
fb0e7b5f16
[AMDGPU][True16][CodeGen] Implement sgpr folding in true16 (#128929)
We haven't implemented 16-bit SGPRs. For now, allow 32-bit SGPRs to be
folded into True16 instructions taking 16-bit values. Also use
sgpr_32 when Imm is copied to sgpr_lo16 so it can be further folded.
This improves generated code quality.
2025-04-02 16:08:26 -04:00
Juan Manuel Martinez Caamaño
0375ef07c3
[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741)
This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as
a 4-bit signed integer and returns `0.0625f * src`.
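
As a rough illustration of the arithmetic described above, here is a minimal Python sketch. The function name `cvt_off_f32_i4` is hypothetical, and masking the input to its low 4 bits is an assumption; the commit message only states that the input is treated as a 4-bit signed integer and scaled by `0.0625f`. This models the arithmetic only, not the hardware or the LLVM lowering.

```python
def cvt_off_f32_i4(src: int) -> float:
    """Illustrative model only; hypothetical helper, not an LLVM or Clang API."""
    low = src & 0xF  # assumption: only the low 4 bits of the input are used
    # Interpret the 4 bits as a two's-complement signed value in [-8, 7].
    signed = low - 16 if low >= 8 else low
    # Scale by 1/16, matching the `0.0625f * src` wording in the commit message.
    return 0.0625 * signed


if __name__ == "__main__":
    for bits in (0x0, 0x1, 0x7, 0x8, 0xF):
        print(f"{bits:#06b} -> {cvt_off_f32_i4(bits)}")
```

Under these assumptions the result always lies in the range [-0.5, 0.4375] in steps of 0.0625.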

SWDEV-518861
2025-04-02 19:51:40 +02:00
Pierre van Houtryve
5231736329
[AMDGPU] Do not allow M0 as v_readfirstlane_b32 dst (#128851)
M0 can only be written to by the SALU, so `v_readfirstlane_b32 m0` is
effectively useless. Represent this by restricting the dest RC of that
instruction to `SReg_32_XM0` which excludes M0.

There are a lot of test changes due to the register class changing, but
most changes are trivial. In some cases, an extra register and
`s_mov_b32` are needed.

Fixes SWDEV-513269
2025-02-26 13:14:03 +01:00
Ivan Kosarev
983562d8c5
[AMDGPU][NFC] Simplify t16/fake16 TableGen definitions. (#122693)
Infer mnemonics from the names of the records.
2025-01-29 12:46:05 +00:00
Brox Chen
4af3332015
[AMDGPU][True16][MC] true16 for v_cvt_u32_u16 (#120646)
Support true16 format for v_cvt_u32_u16 in MC
2025-01-06 15:28:48 -05:00
Brox Chen
d7acf03cec
[AMDGPU][True16][MC] true16 for v_rndne_f16 (#120691)
Support true16 format for v_rndne_f16 in MC
2025-01-03 16:32:15 -05:00
Brox Chen
bf274b3d80
[AMDGPU][True16][MC] true16 for v_cos_f16 (#120639)
Support true16 format for v_cos_f16 in MC
2025-01-03 15:46:41 -05:00
Brox Chen
b71a6fd042
[AMDGPU][True16][MC] true16 for v_cvt_i32_i16 (#120645)
Support true16 format for v_cvt_i32_i16 in MC
2025-01-03 15:46:06 -05:00
Brox Chen
dc307be1b5
[AMDGPU][True16][MC] true16 for v_fract_f16 (#120647)
Support true16 format for v_fract_f16 in MC
2025-01-03 15:45:33 -05:00
Brox Chen
3b72c62e7f
[AMDGPU][True16][MC] true16 for v_frexp_mant_f16 (#120653)
Support true16 format for v_frexp_mant_f16 in MC
2025-01-03 14:42:39 -05:00
Brox Chen
34d2c3b934
[AMDGPU][True16][MC] true16 for v_sin_f16 (#120692)
Support true16 format for v_sin_f16 in MC
2025-01-03 14:11:25 -05:00
Brox Chen
d37aa5135c
[AMDGPU][True16][MC] true16 for v_not_b16 (#120659)
Support true16 format for v_not_b16 in MC
2025-01-03 13:09:23 -05:00
Brox Chen
e5acb167b7
[AMDGPU][True16][MC] true16 for v_trunc_f16 (#120693)
Support true16 format for v_trunc_f16 in MC
2025-01-03 11:43:45 -05:00
Brox Chen
322f16e624
[AMDGPU][True16][MC] true16 for v_sat_pk_u8_i16 (#120634)
Support true16 format for v_sat_pk_u8_i16 in MC
2025-01-03 11:43:07 -05:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Matt Arsenault
6dceb0e34e
AMDGPU: Add V_CVT_F32_BF16 for gfx950 (#116311) 2024-11-18 13:33:05 -08:00
Matt Arsenault
ca1b35a6c8
AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)
Random number instruction for stochastic rounding.
2024-11-18 10:54:54 -08:00
Matt Arsenault
b7d635ed30
AMDGPU: Copy correct predicates for SDWA reals (#116288)
The special-case predicate handling is quite
messy. Currently, broad let blocks
override specific predicates with more general
cases. For instructions with SDWA, the HasSDWA
predicate was overriding the SubtargetPredicate
for the instruction.

This fixes enough to properly disallow new instructions
that support SDWA on older targets.
2024-11-18 08:38:35 -08:00