131 Commits

Author SHA1 Message Date
Matt Arsenault
6f8e7c11cf
AMDGPU: Add MC support for gfx950 V_BITOP3_B32/B16 (#117379)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-25 09:42:07 -08:00
Matt Arsenault
cd20fc0772
AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410)
This is a refinement for the existing hack. With this,
the default target will have neither wavefrontsize feature
present, unless it was explicitly specified. That is,
getWavefrontSize() == 64 no longer implies +wavefrontsize64.
getWavefrontSize() == 32 does imply +wavefrontsize32.

Continue to assume the value is 64 with no wavesize feature.
This maintains the codegenable property without any code
that directly cares about the wavesize needing to worry about it.

Introduce an isWaveSizeKnown helper to check if we know the
wavesize is accurate based on having one of the features explicitly
set, or a known target-cpu.

I'm not sure what's going on in wave_any.s. It's testing what
happens when both wavesizes are enabled, but this is treated
as an error in codegen. We now treat wave32 as the winning
case, so some cases that were previously printed as vcc are now
vcc_lo.
2024-11-23 09:27:47 -08:00
Jay Foad
ade0750e35
[AMDGPU] Fix some cache policy checks for GFX12+ (#116396)
Fix coding errors found by inspection and check that the swz bit still
serves to prevent merging of buffer loads/stores on GFX12+.
2024-11-21 08:22:59 +00:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Kazu Hirata
4048c64306
[llvm] Remove redundant control flow statements (NFC) (#115831)
Identified with readability-redundant-control-flow.
2024-11-12 10:09:42 -08:00
Fangrui Song
facdae62b7 [MCInstPrinter] Make printRegName non-const
Similar to printInst. printRegName may change states (e.g. #113834).
2024-10-29 19:14:54 -07:00
Craig Topper
fd50cdfb94 [AMDGPU] Use MCRegister. NFC 2024-09-28 11:40:25 -07:00
Jun Wang
cd5f5b7690
[AMDGPU][MC] Implement fft and rotate modes for ds_swizzle_b32 (#108064)
In addition to the basic mode, the ds_swizzle_b32 is supposed to support
two specific modes: fft and rotate. This patch implements those two
modes.
2024-09-27 10:18:34 -07:00
Matt Arsenault
d2b6a8ee67
AMDGPU: Fix asserting when trying to print scc (#101175)
This is printable using inline assembly. Also we should
handle using scc directly as instruction operands.
2024-07-30 17:41:13 +04:00
Ivan Kosarev
5bd3aef5e2
[AMDGPU] Use a generic printer for NamedIntOperands. (#100399)
This includes simplifying printing dmask modifiers where we don't need
to mask the value to print.

Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-07-29 09:54:05 +01:00
Ivan Kosarev
7a3bc44c89
[AMDGPU][MC][NFCI] Eliminate printU4ImmDecOperand(). (#100589)
This is hoped to make things a bit safer not masking the value to print
and to make the logic in printDPPCtrl() a bit more explicit.

Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-07-26 10:10:30 +01:00
Ivan Kosarev
24a18aafa3
[AMDGPU] Simplify printing row/bank_mask modifiers. (#100575)
And fix a codegen test to use mask values that fit their encoding
fields.

Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-07-25 16:44:44 +01:00
Ivan Kosarev
430cf6537b
[AMDGPU][NFCI] Declare offset0/1 operands to be i32. (#100560)
Being of type i8 makes them signed, which they aren't, and requires
extra work masking them on verbalisation.

Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-07-25 14:32:19 +01:00
Jay Foad
63fae3ed65
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) 2024-07-17 21:11:00 +01:00
Ivan Kosarev
162386693f
[AMDGPU][MC] Support UC_VERSION_* constants. (#95618)
Our other tools support them, so we want them in LLVM
assembler/disassembler too.
2024-06-18 15:44:14 +01:00
Pierre van Houtryve
9ab601fa53 Reland "[NFC][AMDGPU] Do not flush after printing every instruction (#95237)"
It's very expensive and doesn't achieve anything.

I one test I did, it saves almost 10s on a 2m23s build, bringing it down
to 2m15s using a downstream branch.
2024-06-13 08:30:06 +02:00
pvanhout
04c4cf45fa Revert "[NFC][AMDGPU] Do not flush after printing every instruction (#95237)"
This reverts commit ad9fe3b2a949fb3379e0a1bafbcd2ca81f5fa414.
2024-06-12 15:24:21 +02:00
Pierre van Houtryve
ad9fe3b2a9
[NFC][AMDGPU] Do not flush after printing every instruction (#95237)
It's very expensive and doesn't achieve anything.

I one test I did, it saves almost 10s on a 2m23s build, bringing it down
to 2m15s using a downstream branch.
2024-06-12 14:57:41 +02:00
Stanislav Mekhanoshin
6e722bbe30
[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244) 2024-04-26 13:02:57 -07:00
Shilei Tian
e963d0740e
[AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
2024-03-08 14:49:52 -05:00
Shilei Tian
e9c1dbb408 Revert "[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345)"
This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks
downstream.
2024-03-06 08:42:54 -05:00
Shilei Tian
530f0e64ec
[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345) 2024-03-04 08:40:42 -05:00
Ivan Kosarev
dfa1d9b027
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined constants
and functions.
2024-02-23 17:34:55 +00:00
Shilei Tian
46734aa1e5
[AMDGPU] Use bf16 instead of i16 for bfloat (#80908)
Currently we generally use `i16` to represent `bf16` in those tablegen
files. This patch is trying to use `bf16` directly.

Fix #79369.
2024-02-16 15:58:30 -05:00
Mirko Brkušanin
815e0485a4
[AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instrucitons (#81158) 2024-02-12 19:01:58 +01:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Mariusz Sikora
28b7e498b6
AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892)
Endoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).

---------

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
2024-01-18 14:00:27 +01:00
Nicolai Hähnle
49b492048a
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and lead to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.

Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.

Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.

Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.

Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable to use of opsel on src2 as a future optimization.

A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.

Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2024-01-04 00:10:15 +01:00
Mirko Brkušanin
82e33d6203
[AMDGPU] Add VDSDIR instructions for GFX12 (#75197) 2024-01-03 16:32:00 +01:00
Mirko Brkušanin
47615ddc84
[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193) 2023-12-14 14:22:04 +01:00
Mirko Brkušanin
ac406b4817
[AMDGPU][MC] Add GFX12 VBUFFER encoding (#75195) 2023-12-14 12:58:18 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Mirko Brkušanin
f5868cb6a6
[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062) 2023-12-04 13:04:42 +01:00
Stanislav Mekhanoshin
ab6c3d5034
[AMDGPU] Change the representation of double literals in operands (#68740)
A 64-bit literal can be used as a 32-bit zero or sign extended operand.
In case of double zeroes are added to the low 32 bits. Currently asm
parser stores only high 32 bits of a double into an operand. To support
codegen as requested by the
https://github.com/llvm/llvm-project/issues/67781 we need to change the
representation to store a full 64-bit value so that codegen can simply
add immediates to an instruction.

There is some code to support compatibility with existing tests and asm
kernels. We allow to use short hex strings to represent only a high 32
bit of a double value as a valid literal.
2023-10-12 14:45:45 -07:00
Ivan Kosarev
c62f208c05 [AMDGPU] Don't suppress printing the .l and .h register suffixes.
We don't seem to have a use for the -amdgpu-keep-16-bit-reg-suffixes
option anymore. Was introduced in <https://reviews.llvm.org/D79435>.

Reviewed By: Joe_Nash, foad

Differential Revision: https://reviews.llvm.org/D156102
2023-09-22 11:13:05 +01:00
Carl Ritson
6ebc179978
[AMDGPU][MC][GFX11] Always output wait_vdst and wait_exp (#66610)
Always output values of wait_vdst and wait_exp in assembly even when
they are zero.

While we normally avoid outputing default/zero parameters in assembly,
the values of these parameters still imply wait behaviour when zero.
Outputing zero values makes the intent more obvious to human readers,
and avoid any future ambiguity if we choose to change the defaults to
something other than zero.

Fixes #66383
2023-09-22 09:25:02 +09:00
Stanislav Mekhanoshin
cfe9a134bb [AMDGPU] Rename 64BitDPP feature and fix the checks
Names '64BitDPP' and especially 'DPP64' were found misleading, and
DPP64 can easily be mixed with DPP16 and DPP8 while these are
different concepts. DPP16 and DPP8 refers to lanes where DPP64
refers to the operand size.

In fact the essential part here is that these instructions are
executed on the DP ALU, so rename the feature accordingly.

I have also found a bug in a check for these instructions, which is
fixed here and a common utility function is now used.

Differential Revision: https://reviews.llvm.org/D158465
2023-08-22 11:00:10 -07:00
Reid Kleckner
f86c81b2a8 [AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc
This required two substantial changes:
1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils
   and into CodeGen
2. Passing the string function name to AMDGPUPALMetadata instead of the
   MachineFunction

Other changes are minor or updates to accommodate the first two.

See issue #64166 for more information on the layering issue.

Differential Revision: https://reviews.llvm.org/D156486
2023-07-27 15:19:24 -07:00
Ivan Kosarev
7208fde09e [AMDGPU][AsmParser][NFC] Generate printers for named-bit operands automatically.
Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D154433
2023-07-05 10:53:33 +01:00
Ivan Kosarev
12460cf90f [AMDGPU][AsmParser] Simplify the implementation of SWZ operands.
Those are implicit helper operands and therefore don't need any parsers
or printers.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: piotr, foad

Differential Revision: https://reviews.llvm.org/D154432
2023-07-05 10:45:12 +01:00
Ivan Kosarev
59fd48d71e [AMDGPU][AsmParser][NFC] Simplify instruction operand definitions.
This addresses the trivial cases that only require removing the
operand classes and renaming related entities.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D153965
2023-06-29 10:51:44 +01:00
Ivan Kosarev
212af2c081 [AMDGPU][AsmParser] Refine parsing of some 32-bit instruction operands.
Eliminates the need for the custom code in parseCustomOperand().

The remaining uses of NamedOperandU32 are to be addressed separately.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D150204
2023-05-19 16:54:30 +01:00
Fangrui Song
432caca39a Simplify with hasFeature. NFC 2023-02-17 18:22:24 -08:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Ivan Kosarev
3d6b108a87 [AMDGPU] Remove the unused u8imm operand definition.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D142193
2023-02-07 11:48:38 +00:00
Archibald Elliott
8e3d7cf5de [NFC][TargetParser] Remove llvm/Support/TargetParser.h 2023-02-07 11:08:21 +00:00
Petar Avramovic
b0c1a45ba5 AMDGPU/MC: Refactor decoders. Rework decoders for float immediates
decodeFPImmed creates immediate operand using register operand width,
but size of created immediate should correspond to OperandType for
RegisterOperand.
e.g. OPW128 could be used for RegisterOperands that use v2f64 v4f32
and v8f16. Each RegisterOperands would have different OperandType and
require that immediate is decoded using 64, 32 and 16 bit immediate
respectively.
decodeOperand_<RegClass> only provides width for register decoding,
introduce decodeOperand_<RegClass>_Imm<ImmWidth> that also provides
width for immediate decoding.
Refactor RegisterOperands:
 - decoders get _Imm<ImmWidth> suffix in some cases
 - removed unused RegisterOperands defined via multiclass
 - use different RegisterOperand in a few places, new RegisterOperand's
   decoder corresponds to the number of bits used for operand's encoding
Refactor decoder functions:
 - add asserts for the size of encoding that will be decoded
 - regroup them according to the method of decoding
decodeOperand_<RegClass> (register only, no immediate) decoders can now
create immediate of consistent size, use it for better diagnostic of
'invalid immediate'.

Differential Revision: https://reviews.llvm.org/D142636
2023-02-01 16:52:57 +01:00
Mateja Marjanovic
f84d3dd0fd [AMDGPU] Make flat_offset a 32-bit operand instead of 16-bits
Differential Revision: https://reviews.llvm.org/D142549
2023-01-25 17:52:26 +01:00