349 Commits

Author SHA1 Message Date
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Jay Foad
6f956e3117
[AMDGPU] Rename LocalMemorySize features to AddressableLocalMemorySize (#110242)
Change the names of the TableGen features to match the names used by
AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed
by a single workgroup. Add some explanatory comments. NFC.
2024-09-30 10:29:31 +01:00
Craig Topper
fd50cdfb94 [AMDGPU] Use MCRegister. NFC 2024-09-28 11:40:25 -07:00
Scott Egerton
396f677514
[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769) 2024-09-24 10:58:00 +01:00
Youngsuk Kim
d31e314131 [llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
2024-09-20 12:19:59 -05:00
Jeffrey Byrnes
7bcf4d63cf
[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)
MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position" and "VALU op
consumes result of that op"

This includes the case where the second op is SDWA with same dest and
dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there
is an implicit read of the first op dst and the compiler needs to
resolve this hazard. Confirmed with HW team.

We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand,
so this PR checks for that.

MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and
CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue.
Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 --
this PR adds support for opsel[2] != 0 as well
2024-08-22 11:38:24 -07:00
Mariusz Sikora
2f89c1c76c
[AMDGPU][NFC] Remove duplicate code by using getAddressableLocalMemorySize (#104604) 2024-08-17 08:11:03 +02:00
Ivan Kosarev
f0fe6c66cb
[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888)
And declare it to take an MCRegister.

Also rename related entities and remove a comment for the function that,
depending on its purpose, is either irrelevant or misleading.
2024-08-14 17:04:15 +01:00
Jay Foad
63fae3ed65
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) 2024-07-17 21:11:00 +01:00
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Matt Arsenault
b3f5c7247d
AMDGPU: Assume true in getVOPNIsSingle helpers (#98516)
If we have something we don't recognize, we should conservatively
avoid printing an additional suffix. For isCodeGenOnly pseudoinstructions,
no encoded instruction is added to the tables that this queries, and the
null case would assume true.

This happens to fix the case I ran into, but this isn't a holistic fix.
These really should be encoded directly in the TSFlags of the MCInstrDesc,
which would allow encoding pseudos to work correctly.
2024-07-12 21:12:56 +04:00
vangthao95
3aef525aa4
[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165)
For unbuffered smem loads, it is illegal for the immediate offset to be
negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is
negative.

New PR of https://github.com/llvm/llvm-project/pull/79553.
2024-06-24 14:18:23 -07:00
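The offset rule stated in #89165 can be sketched as a small predicate. This is only an illustration under assumptions, not the code from the patch: the function name `isLegalUnbufferedSMEMOffset` and its parameters are hypothetical, and `base` stands for whichever of SGPR[Offset], M0, or zero the instruction uses, assuming its value is known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true when an immediate
// offset is acceptable for an unbuffered SMEM load under the rule quoted
// in the commit message. `base` models SGPR[Offset], M0, or zero.
bool isLegalUnbufferedSMEMOffset(int64_t immOffset, uint32_t base) {
  if (immOffset >= 0)
    return true; // This rule never rejects a non-negative immediate.
  // A negative immediate is only acceptable if the sum stays non-negative.
  return static_cast<int64_t>(base) + immOffset >= 0;
}
```

With this sketch, an immediate of -4 is rejected when the base is zero but accepted when the base is 8.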
Matt Arsenault
8520061281
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2
copies of clang builtins for this case, and the newer one
erroneously requires gfx8-insts.
2024-06-18 18:34:34 +02:00
Scott Egerton
4a305d40a3
[AMDGPU] Exclude certain opcodes from being marked as single use (#91802)
The s_singleuse_vdst instruction is used to mark regions of instructions
that produce values that have only one use.
Certain instructions take more than one cycle to execute, resulting in
regions being incorrectly marked.
This patch excludes these multi-cycle instructions from being marked as
either producing single use values or consuming single use values
or both depending on the instruction.
2024-06-12 10:43:23 +01:00
Janek van Oirschot
a699ccbf0c
MCExpr-ify amd_kernel_code_t (#91587)
Redefines the amd_kernel_code_t struct with MCExprs for members that would be
derived from SIProgramInfo MCExpr members.
2024-05-22 13:45:45 +01:00
Janek van Oirschot
d86b68afd7
MCExpr-ify SIProgramInfo (#88257)
Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.
2024-05-09 13:02:32 +01:00
Emma Pilkington
dcc7ef3ce8
[AMDGPU][MC] Disable sendmsg SYSMSG_OP_HOST_TRAP_ACK on gfx9+ (#90203)
This is no longer supported as of gfx9. Fixes #52903

This commit also includes some refactoring of sendmsg operand parsing:
  - Use CustomOperand for sendmsg operations, this allows them to be
    conditionally available based on a STI check (and automatically in
    sync with SIDefines.h).
  - Move CustomOperand table lookups from AMDGPUBaseInfo to
    AMDGPUAsmUtils. This cleans up an awkward interface where
    AMDGPUAsmUtils defined a table/size as globals that AMDGPUBaseInfo
    had to loop over.
  - Clean up a few of the operand lookup functions while moving them.
2024-05-07 07:38:58 -04:00
Emma Pilkington
607b4bc602
[AMDGPU] Add a missing COV6 case to getAMDHSACodeObjectVersion() (#87492) 2024-04-03 15:36:58 -04:00
Joe Nash
e29228efae
[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418)
Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus

The w32 and w64 _e64_dpp assembler-only real instructions were unused,
and were erroneously constructed in a way that broke parsing of the new
instructions. They are removed.

This patch is a follow up to PR
https://github.com/llvm/llvm-project/pull/87382
2024-04-03 14:51:27 -04:00
Janek van Oirschot
1103a2a337
Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494)
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.

Relands #80855 with fixes
2024-03-27 11:59:56 +00:00
Mariusz Sikora
94a550dab2
[AMDGPU][NFC] Rename Feature GFX11FullVGPRs to 1_5xVGPRs (#86468) 2024-03-25 11:00:59 +01:00
David Stuttard
75e528fdd9
[AMDGPU] Extend zero initialization of return values for TFE (#85759)
buffer_load instructions that use TFE also need to zero initialize
return values similar to how the image instructions currently work. Add
support for this with standard zero init of all results + zero init of
just TFE flag when enable-prt-strict-null subtarget feature is disabled.
2024-03-25 09:01:46 +00:00
Janek van Oirschot
797336b127
Revert "[AMDGPU] MCExpr-ify MC layer kernel descriptor" (#86151)
Reverts llvm/llvm-project#80855
2024-03-21 10:19:54 -07:00
Janek van Oirschot
857161c367
[AMDGPU] MCExpr-ify MC layer kernel descriptor (#80855)
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.
2024-03-21 13:57:10 +00:00
Carl Ritson
c29b265eb9 Reapply "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)"
This reverts commit 7d508eb5d38f4bbbab4230a666d9e742e271af61.
2024-03-14 10:56:43 +09:00
Jun Wang
c4e517f59c
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
Shilei Tian
e963d0740e
[AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes that a 16-bit
inlinable literal is either an `i16` or an `fp16`. This is not always true
because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just
looking at the value. This patch splits `isInlinableLiteral16` into three
versions, for `i16`, `fp16`, and `bf16` respectively, and calls the
corresponding version.
2024-03-08 14:49:52 -05:00
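Why the type has to be known can be shown with a simplified sketch. The three function names below are hypothetical and the bodies are not the ones from #84402: they only cover the small-integer range and the bit patterns for 0.0, ±0.5, ±1.0, ±2.0 and ±4.0, and deliberately leave out any other inline constants a subtarget may support.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified checks; not the LLVM implementations.

// Small integers in [-16, 64].
bool inlinableAsI16(int16_t v) { return v >= -16 && v <= 64; }

// IEEE half bit patterns for 0.0, +-0.5, +-1.0, +-2.0, +-4.0.
bool inlinableAsFP16(uint16_t bits) {
  switch (bits) {
  case 0x0000: case 0x3800: case 0xB800: case 0x3C00: case 0xBC00:
  case 0x4000: case 0xC000: case 0x4400: case 0xC400:
    return true;
  default:
    return false;
  }
}

// bfloat16 bit patterns for the same values (upper 16 bits of the float).
bool inlinableAsBF16(uint16_t bits) {
  switch (bits) {
  case 0x0000: case 0x3F00: case 0xBF00: case 0x3F80: case 0xBF80:
  case 0x4000: case 0xC000: case 0x4080: case 0xC080:
    return true;
  default:
    return false;
  }
}
```

The pattern 0x3F80 is 1.0 as `bf16` but 1.875 as `fp16`, while 0x3C00 is 1.0 as `fp16` but a small fraction as `bf16`, so a single check on the raw 16 bits cannot give the right answer for both types.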
Diana Picus
0086cc95b3
[AMDGPU] Rename getNumVGPRBlocks. NFC (#84161)
Rename getNumVGPRBlocks to getEncodedNumVGPRBlocks, to clarify that it's
using the encoding granule. This is used to program the hardware. In
practice, the hardware will use the alloc granule instead, so this patch
also adds a new helper, getAllocatedNumVGPRBlocks, which can be useful
when driving heuristics.
2024-03-07 12:46:42 +01:00
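The difference between counting blocks with the encoding granule and with the alloc granule in #84161 can be sketched as follows. The helper `numBlocks` is hypothetical and ignores any bias the real hardware encoding may apply; the granule values in the usage note are made up for illustration and are not real subtarget values.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): number of granule-sized blocks
// needed to hold numVGPRs registers, rounding up.
unsigned numBlocks(unsigned numVGPRs, unsigned granule) {
  assert(granule != 0 && "granule must be non-zero");
  return (numVGPRs + granule - 1) / granule;
}
```

Assuming an encoding granule of 8 and an alloc granule of 16, a kernel using 40 VGPRs would count as `numBlocks(40, 8) == 5` blocks for encoding purposes but `numBlocks(40, 16) == 3` blocks, that is 48 registers, for allocation purposes. A heuristic that cares about actual occupancy would want the second number.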
Emma Pilkington
4490003a22
[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)
The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
2024-03-06 09:51:48 -05:00
Shilei Tian
e9c1dbb408 Revert "[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345)"
This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks
downstream.
2024-03-06 08:42:54 -05:00
Shilei Tian
530f0e64ec
[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345) 2024-03-04 08:40:42 -05:00
Ivan Kosarev
680c780a36
[AMDGPU][AsmParser] Support structured HWREG operands. (#82805)
Symbolic values are to be supported separately.
2024-02-28 14:44:34 +00:00
Jeffrey Byrnes
113052b2b0 [AMDGPU] Prefer lower total register usage in regions with spilling
Change-Id: Ia5c434b0945bdcbc357c5e06c3164118fc91df25
2024-02-26 12:19:52 -08:00
Ivan Kosarev
dfa1d9b027
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined constants
and functions.
2024-02-23 17:34:55 +00:00
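One way such field helpers could look is sketched below. This is a hypothetical `BitField` template written for illustration, not the helper introduced by #82772, and the two example fields are an invented layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: describes a field occupying bits [LowBit, HighBit]
// of a 32-bit encoding and provides matching encode/decode functions.
template <unsigned HighBit, unsigned LowBit>
struct BitField {
  static_assert(HighBit >= LowBit && HighBit < 32, "invalid bit range");
  static constexpr unsigned Width = HighBit - LowBit + 1;
  // Computed in 64 bits so that a full 32-bit field does not overflow.
  static constexpr uint32_t Mask =
      static_cast<uint32_t>((uint64_t{1} << Width) - 1);

  // Place `value` at the field position; bits outside the field are zero.
  static constexpr uint32_t encode(uint32_t value) {
    return (value & Mask) << LowBit;
  }

  // Extract the field from a full encoding.
  static constexpr uint32_t decode(uint32_t encoded) {
    return (encoded >> LowBit) & Mask;
  }
};

// Invented example layout: a 6-bit id in bits 5..0, a 5-bit offset in 10..6.
using IdField = BitField<5, 0>;
using OffsetField = BitField<10, 6>;
```

Compared with hand-written shift and mask constants, the bit range is stated once in the type, so `encode` and `decode` cannot drift apart.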
Shilei Tian
46734aa1e5
[AMDGPU] Use bf16 instead of i16 for bfloat (#80908)
Currently we generally use `i16` to represent `bf16` in those tablegen
files. This patch is trying to use `bf16` directly.

Fix #79369.
2024-02-16 15:58:30 -05:00
Mirko Brkušanin
815e0485a4
[AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instructions (#81158) 2024-02-12 19:01:58 +01:00
Carl Ritson
7d508eb5d3 Revert "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)"
This reverts commit d6c7253d32e4bdff619c39708170f1c1fa01ff95.

Change causing CTS failures due to incomplete metadata.
2024-02-07 17:09:56 +09:00
David Stuttard
d6c7253d32
[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
The previous approach used opaque registers which can change between different
architectures and required encoding the bitfield information in the backend,
which may change between versions.

This change is an extension of the previously added support - which only handled
entry functions. This adds support for all functions.

The change also includes some re-factoring to separate common code.
2024-02-06 15:34:36 +00:00
Pierre van Houtryve
500846d2f5
[AMDGPU] Introduce Code Object V6 (#76954)
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.

Docs changes are part of the follow-up patch #76955
2024-02-05 08:19:53 +01:00
Emma Pilkington
4eb0810922
[llvm-objdump][AMDGPU] Pass ELF ABIVersion through disassembler (#78907)
Admittedly, it's a bit ugly to pass the ABIVersion through onSymbolStart
but I'm not sure what a better place for it would be.
2024-02-01 11:26:42 -05:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414)

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Saiyedul Islam
082f87c9d4
[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.
2024-01-23 17:08:18 +05:30
Emma Pilkington
bc82cfb38d
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
2024-01-21 11:54:47 -05:00
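The new precedence described in #76267 amounts to treating the command-line value as a fallback. A minimal model, assuming a hypothetical function `effectiveCodeObjectVersion` that is not part of AMDGPUBaseInfo.h:

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the precedence: a version set by the IR module flag
// or the asm directive wins; the command-line value is used only when
// neither of them provided one.
unsigned effectiveCodeObjectVersion(
    std::optional<unsigned> fromModuleOrDirective, unsigned fromCommandLine) {
  return fromModuleOrDirective.value_or(fromCommandLine);
}
```

Under this model a file without the directive and with the command-line value 5 resolves to 5, while a file that sets version 4 keeps 4 regardless of the command line.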
Jay Foad
ba52f06f9d
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-18 10:47:45 +00:00
Nicolai Hähnle
49b492048a
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and led to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.

Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.

Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.

Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.

Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable the use of opsel on src2 as a future optimization.

A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.

Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2024-01-04 00:10:15 +01:00
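The idea in #76522 of treating a packed 16-bit operand as one 32-bit value can be illustrated with a small sketch. Both helpers are hypothetical, the inline-constant test only covers the integer range [-16, 64] and omits float constants, and the first element of a pair such as (1, 0) is assumed to be the low half.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack two 16-bit halves into the 32-bit value that
// the operand actually holds.
constexpr uint32_t packV2x16(uint16_t lo, uint16_t hi) {
  return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

// Hypothetical, integer-only inline-constant test on the full 32-bit value.
constexpr bool isInlineInt32(uint32_t bits) {
  const int32_t v = static_cast<int32_t>(bits);
  return v >= -16 && v <= 64;
}
```

In this model the non-splat pair (1, 0) packs to the 32-bit integer 1 and passes the check directly, whereas the splat (1, 1) packs to 0x00010001 and does not. That is consistent with the commit's description of checking the 32-bit value and then trying permutations of the low and high halves.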
Ivan Kosarev
8c6172b0ac
[AMDGPU][True16] Don't use the VGPR_LO/HI16 register classes. (#76440)
Removing the classes requires updating tests and so is planned to be
done with a separate change.
2023-12-28 11:48:25 +00:00
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Mirko Brkušanin
47615ddc84
[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193) 2023-12-14 14:22:04 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00