406 Commits

Author SHA1 Message Date
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Mirko Brkušanin
47615ddc84
[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193) 2023-12-14 14:22:04 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Jay Foad
99430c589f
[AMDGPU] Update sendmsg types for GFX12 (#74888) 2023-12-11 09:02:00 +00:00
Kazu Hirata
55531e715f [Target] Remove unused forward declarations (NFC) 2023-12-10 10:38:55 -08:00
Jay Foad
8d4977affb
[AMDGPU] Update hardware registers for GFX12 (#74445) 2023-12-06 10:04:52 +00:00
Mariusz Sikora
19c9f9c0bf
[AMDGPU] GFX12: Add s_prefetch_inst/data instructions (#74448)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-06 10:01:29 +01:00
Mirko Brkušanin
f5868cb6a6
[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062) 2023-12-04 13:04:42 +01:00
Kazu Hirata
92c2529ccd [llvm] Stop including vector (NFC)
Identified with clangd.
2023-12-03 22:32:21 -08:00
Emma Pilkington
6c62f7cbfb
[MsgPack] Handle Expected<T> errors in document reader (#73183)
This was causing an assert on invalid in the modified test case.
2023-11-28 13:13:05 -05:00
Stanislav Mekhanoshin
87d884b5c8
[AMDGPU] Fix folding of v2i16/v2f16 splat imms (#72709)
We can use inline constants with packed 16-bit operands, but these
should use op_sel. Currently splat of inlinable constants is considered
legal, which is not really true if we fail to fold it with op_sel and
drop the high half. It may be legal as a literal but not as inline
constant, but then usual literal checks must be performed.

This patch makes these splat literals illegal but adds additional logic
to the operand folding to keep current folds. This logic is somewhat
heavy though.

This has fixed constant bus violation in the fdot2 test.
2023-11-28 09:07:26 -08:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Diana
61332cb047
[AMDGPU] Emit backend_stack_size PAL metadata (#72509)
For chain functions, PAL uses a `backend_stack_size` metadata item,
which at the moment has the same meaning as `stack_frame_size_in_bytes`.
We emit both for now in order to simplify coordination with PAL.

The new item must be emitted in the `shader_functions` section, just as
the metadata for other module entry functions. For simplicity, we mark
chain functions as module entry functions and emit the same metadata for
all of them.
2023-11-20 10:01:13 +01:00
Jay Foad
cc5c2efe4a
[AMDGPU] Fix and use isSISrcInlinableOperand. NFC. (#72101) 2023-11-13 11:29:56 +00:00
Jun Wang
54470176af
[AMDGPU] Add inreg support for SGPR arguments (#67182)
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg.
---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-11-08 11:35:52 -08:00
Pierre van Houtryve
4428b01faa Reland: [AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-11-07 12:23:03 +01:00
Ivan Kosarev
d1e3d32088
[AMDGPU][NFCI] Decouple actual register encodings from HWEncoding values. (#69452)
The HWEncoding values currently form a strange mix of actual register
codes for some subtargets and types of operands and informational flags.
This patch removes the dependency allowing arbitrary changes in the
structure of HWEncoding values without breaking register encodings.

Such changes, in turn, would make it possible to speed up and simplify
getAVOperandEncoding() testing for AGPRs as well as other functions
dealing with register codes downstream. They would also allow to
maintain the same format of HWEncoding values across our downstream code
bases, thus simplifying merging in mainline changes.
2023-10-25 13:24:50 +01:00
Konstantin Zhuravlyov
6cfb64276d
AMDGPU: Minor updates to program resource registers (#69525)
- Be explicit about which program resource register is supported by
which target
    - RSRC1
      - FP16_OVFL is GFX9+
      - WGP_MODE is GFX10+
      - MEM_ORDERED is GFX10+
      - FWD_PROGRESS is GFX10+
    - RSRC3
      - INST_PREF_SIZE is GFX11+
      - TRAP_ON_START is GFX11+
      - TRAP_ON_END is GFX11+
      - IMAGE_OP is GFX11+
  - Do not emit GFX11+ fields when disassembling GFX10 code objects
  - Tighten enforcement of reserved bits in disassembler

---------

Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
2023-10-19 12:40:19 -04:00
pvanhout
868abf0961 Revert "[AMDGPU] Remove Code Object V3 (#67118)"
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-18 12:55:36 +02:00
Pierre van Houtryve
544d91280c
[AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-16 08:21:48 +02:00
Stanislav Mekhanoshin
ab6c3d5034
[AMDGPU] Change the representation of double literals in operands (#68740)
A 64-bit literal can be used as a 32-bit zero or sign extended operand.
In case of double zeroes are added to the low 32 bits. Currently asm
parser stores only high 32 bits of a double into an operand. To support
codegen as requested by the
https://github.com/llvm/llvm-project/issues/67781 we need to change the
representation to store a full 64-bit value so that codegen can simply
add immediates to an instruction.

There is some code to support compatibility with existing tests and asm
kernels. We allow to use short hex strings to represent only a high 32
bit of a double value as a valid literal.
2023-10-12 14:45:45 -07:00
Kazu Hirata
b05dbc4d5f [llvm] Use llvm::endianness::{big,little,native} (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
support::endianness::{big,little,native} with
llvm::endianness::{big,little,native}.
2023-10-10 20:14:20 -07:00
Mirko Brkušanin
2cd2445c21
[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461)
In order to avoid duplicating every dpp pseudo opcode that has src1, we
allow it for all opcodes and add manual checks on subtargets that do not
support it.
2023-09-29 11:54:49 +02:00
Stanislav Mekhanoshin
2024bfec9c
[AMDGPU] Remove int types from isSISrcFPOperand (#67401)
This is NFCI, I don't believe there are any instructions using packed
types in the ins dag, only in patterns, and the affected function is
only used in the asm parser. However, int types shall not be reported as
fp types.

This may be usesul if we create an asm syntax for packed fp literals
which we currently don't. If/when we do it that shall affect if we
accept FP modifiers on these types or not. Say we could create a syntax
like v2(-lit1, |lit2|) that would matter then.
2023-09-26 01:38:09 -07:00
Ivan Kosarev
9310baa596 [AMDGPU][NFC] Add True16 operand definitions.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D156103
2023-09-25 16:48:46 +01:00
Ruiling, Song
45e425e355
AMDGPU: Teach isArgPassedInSGPR() about cs_chain* calling convention (#67086)
This cs_chain and cs_chain_preserve use InReg attribute to indicate
argument passed through SGPR.
2023-09-22 22:24:17 +08:00
Pierre van Houtryve
fe2f67e4ba
[AMDGPU] Remove Code Object V2 (#65715)
Code Object V2 has been deprecated for more than a year now. We can
safely remove it from LLVM.

- [clang] Remove support for the `-mcode-object-version=2` option.
- [lld] Remove/refactor tests that were still using COV2
- [llvm] Update AMDGPUUsage.rst
- Code Object V2 docs are left for informational purposes because those
code objects may still be supported by the runtime/loaders for a while.
- [AMDGPU] Remove COV2 emission capabilities.
- [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2
- [AMDGPU] Update all tests that were still using COV2 - They are either
deleted or ported directly to code object v4 (as v3 is also planned to
be removed soon).
2023-09-21 12:00:45 +02:00
Austin Kerbow
69447d6afe [AMDGPU] Add ASM and MC updates for preloading kernargs
Add assembler directives for preloading kernel arguments that correspond
to new fields in the kernel descriptor for the length and offset of
arguments that will be placed in SGPRs prior to kernel launch. Alignment
of the arguments in SGPRs is equivalent to the kernarg segment when
accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated
directly after other user SGPRs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159459
2023-09-19 15:45:16 -07:00
Saiyedul Islam
466a8149b3
Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12 15:13:59 +05:30
Saiyedul Islam
0a8d17e79b
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
TY-AMD
b1b6c06567
[AMDGPU] Erase ShaderFunctions in AMDGPUPALMetadata::reset() (#65247) 2023-09-04 08:03:01 -04:00
Stanislav Mekhanoshin
cfe9a134bb [AMDGPU] Rename 64BitDPP feature and fix the checks
Names '64BitDPP' and especially 'DPP64' were found misleading, and
DPP64 can easily be mixed with DPP16 and DPP8 while these are
different concepts. DPP16 and DPP8 refers to lanes where DPP64
refers to the operand size.

In fact the essential part here is that these instructions are
executed on the DP ALU, so rename the feature accordingly.

I have also found a bug in a check for these instructions, which is
fixed here and a common utility function is now used.

Differential Revision: https://reviews.llvm.org/D158465
2023-08-22 11:00:10 -07:00
Diana Picus
26dc284498 [AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions
Lower formal arguments and returns for functions with the
`amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions:

* Put `inreg` arguments into SGPRs, starting at s0, and other arguments
into VGPRs, starting at v8. No arguments should end up on the stack, if
we don't have enough registers we should error out.

* Lower the return (which is always void) as an S_ENDPGM.

* Set the ScratchRSrc register to s48:51, as described in the docs.

* Set the SP to s32, matching amdgpu_gfx. This might be revisited in a
future patch.

Differential Revision: https://reviews.llvm.org/D153517
2023-08-21 11:16:17 +02:00
Mirko Brkusanin
de82fde22d AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent
Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D157091
2023-08-18 18:23:40 +02:00
Jay Foad
e61ca23289 [AMDGPU] Add and use SIInstrFlags::GWS. NFC.
This reduces the number of places where we have to check for a list of
DS_GWS_* opcodes.

Differential Revision: https://reviews.llvm.org/D157099
2023-08-07 12:05:14 +01:00
Reid Kleckner
f86c81b2a8 [AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc
This required two substantial changes:
1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils
   and into CodeGen
2. Passing the string function name to AMDGPUPALMetadata instead of the
   MachineFunction

Other changes are minor or updates to accommodate the first two.

See issue #64166 for more information on the layering issue.

Differential Revision: https://reviews.llvm.org/D156486
2023-07-27 15:19:24 -07:00
Stanislav Mekhanoshin
237784dd30 [AMDGPU] Remove std::optional from VOPD::ComponentProps. NFC.
This class has to be fast and efficient with a trivial copy
constructor.

Differential Revision: https://reviews.llvm.org/D155881
2023-07-21 10:52:37 -07:00
Jon Chesterfield
a222951148 [amdgpu][nfc] Use unsigned for getIntegerPairAttribute to match the only call sites 2023-07-15 20:42:13 +01:00
Ivan Kosarev
289ae6525d [AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions.
The patch adds the support for 'noa16' operands in non-A16 variants of
the instructions, fixes validation of A16 operands and eliminates the
custom conversion to MCInst.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155057
2023-07-13 19:46:03 +01:00
Stephen Thomas
8aedad0fa0 [AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*()
for each of vm_vsrc, va_vdst and sa_sdst. These are now used in
AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with
S_WAITCNT_DEPCTR operands easier and more readable.

Differential Revision: https://reviews.llvm.org/D154424
2023-07-04 11:02:12 +01:00
Stephen Thomas
fed6cd9c15 [AMDGPU] Remove unused method Waitcnt::dominates(). NFC
Differential Revision: https://reviews.llvm.org/D153322
2023-06-20 15:10:56 +01:00
Stanislav Mekhanoshin
e2903abc15 [AMDGPU] Remove integer division in VOPD checks
There is no way any compiler can simplify this division, while
the check is done rather often.

Differential Revision: https://reviews.llvm.org/D152613
2023-06-12 15:01:53 -07:00
Ivan Kosarev
4e312abdfd [AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152257
2023-06-07 11:41:11 +01:00
NAKAMURA Takumi
5d71ec6e44 Split out CodeGenTypes from CodeGen for LLT/MVT
This reduces dependencies on `llvm-tblgen` so much.

`CodeGenTypes` depends on `Support` at the moment.
Be careful to append deps on this, since Targets' tablegens
depend on this.

Depends on D149024

Differential Revision: https://reviews.llvm.org/D148769
2023-05-03 00:13:20 +09:00
NAKAMURA Takumi
9cfeba5b12 Restore CodeGen/LowLevelType from Support
This is rework of;
  - D30046 (LLT)

Since I have introduced `llvm-min-tblgen` as D146352, `llvm-tblgen`
may depend on `CodeGen`.

`LowLevlType.h` originally belonged to `CodeGen`. Almost all userse are
still under `CodeGen` or `Target`. I think `CodeGen` is the right place
to put `LowLevelType.h`.

`MachineValueType.h` may be moved as well. (later, D149024)

I have made many modules depend on `CodeGen`. It is consistent but
inefficient. It will be split out later, D148769

Besides, I had to isolate MVT and LLT in modmap, since
`llvm::PredicateInfo` clashes between `TableGen/CodeGenSchedule.h`
and `Transforms/Utils/PredicateInfo.h`.
(I think better to introduce namespace llvm::TableGen)

Depends on D145937, D146352, and D148768.

Differential Revision: https://reviews.llvm.org/D148767
2023-05-03 00:13:19 +09:00
NAKAMURA Takumi
7d5d987e93 [CMake] Reorder and reformat deps 2023-04-17 00:32:16 +09:00
David Stuttard
fc83f1de5d [AMDGPU] Add backend support for new PAL ELF Metadata 3.0
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
Rather than using opaque registers which can change between different
architectures and requires encoding the bitfield information in the backend,
which may change between versions.

This is the initial minimal implementation that enables the use of PAL Metadata
3.0.

The change itself should be NFC for non-PAL, although the way RSRC2 register is
handled has been changed slightly.

The test is fairly minimal, but checks that the metadata format looks as
expected and verifies a couple of special cases such as tgid_[xyz]_en handling
and PsInputAddr/Ena which also change to explicit fields.

Differential Revision: https://reviews.llvm.org/D147143
2023-04-14 09:57:13 +01:00