331 Commits

Author SHA1 Message Date
Matt Arsenault
c13556c0b0
AMDGPU: Document more backend recognized attributes (#80239) 2024-03-28 14:27:14 +03:00
Matt Arsenault
b6b703b2df
AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948)
SIMachineFunctionInfo has a scan  of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.
    
Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs but it seems to be trickier
to eliminate.
2024-03-21 14:24:06 +05:30
Janek van Oirschot
f7bebc1914
Reland [AMDGPU] Add AMDGPU specific variadic operation MCExprs (#84562)
Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'. 

Relands #82022 with fixes
2024-03-14 14:31:00 +00:00
Matt Arsenault
5c3d001668
AMDGPU: Don't use table for metadata docs, and fix section headers (#85046) 2024-03-13 18:34:23 +05:30
Matt Arsenault
cd2f616313
AMDGPU: Use list-table for metadata table (#85024)
The table syntax for sphinx is really insufferably whitespace dependent.
I've been meaning to convert the existing attribute and intrinsic tables
to use list-table, which is less painful to merge.
2024-03-13 12:42:15 +05:30
Jun Wang
c4e517f59c
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
Pierre van Houtryve
63c77d8475
[AMDGPU] Make generic versioning docs easier to find (#84761) 2024-03-11 15:56:17 +01:00
Florian Mayer
0083c3eb83
Revert "[AMDGPU] Add AMDGPU specific variadic operation MCExprs" (#84273)
Reverts llvm/llvm-project#82022

Fails on hwasan build bot:
https://lab.llvm.org/buildbot/#/builders/236/builds/9874/steps/10/logs/stdio
2024-03-06 19:37:49 -08:00
Janek van Oirschot
bec2d105c7
[AMDGPU] Add AMDGPU specific variadic operation MCExprs (#82022)
Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'.
2024-03-06 21:01:54 +00:00
Mirko Brkušanin
1fd1f4c0e1
[AMDGPU] Handle amdgpu.last.use metadata (#83816)
Convert !amdgpu.last.use metadata into MachineMemOperand for last use
and handle it in SIMemoryLegalizer similar to nontemporal and volatile.
2024-03-06 16:33:52 +01:00
Joseph Huber
1fc5e50ceb
[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906)
Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes it through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.
2024-03-06 08:11:54 -06:00
Pierre van Houtryve
43c7eb5d7b
[AMDGPU] Replace '.' with '-' in generic target names (#81718)
The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs &
the associated clang driver logic are also confused by the dot.

After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.
2024-02-14 15:19:04 +01:00
Pierre van Houtryve
87d7711934
[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450)
Fixes SWDEV-443292
2024-02-13 09:07:51 +01:00
Austin Kerbow
4bcbeaed63
[AMDGPU] Enable kernel arg preloading with gfx90a (#81180)
Add a trap instruction to the beginning of the kernel prologue to handle
cases where preloading is attempted on HW loaded with incompatible
firmware.
2024-02-12 22:33:29 -08:00
Konstantin Zhuravlyov
75a1c4e10b
AMDGPU/NFC: Reserve 0x055 MACH in e_flag for future use (#81501) 2024-02-12 13:37:25 -05:00
Mariusz Sikora
0c63453714
[AMDGPU][NFC] Docs - remove duplicates (#81465) 2024-02-12 12:25:54 +01:00
Pierre van Houtryve
f93aa5157a
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.

Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and #76954
as well.
2024-02-12 10:18:20 +01:00
pvanhout
f5399e89a2 Remove trailing whitespaces in AMDGPUUsage.rst 2024-02-12 09:30:10 +01:00
Corbin Robeck
fcb59203c8
[AMDGPU][DOC] Add MI200 Names to AMDGPUUsage Doc (#81252) 2024-02-09 10:05:26 -05:00
Jan Patrick Lehr
f661057865
Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)
Reverts llvm/llvm-project#79586

This broke the AMDGPU OpenMP Offload buildbot.
The typical error message was that the GPU attempted to read beyong the
largest legal address.

Error message:
AMDGPU fatal error 1: Received error in queue 0x7f8363f22000:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to
access memory beyond the largest legal address.
2024-02-09 09:57:38 +01:00
alex-t
88e52511ca
[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)
This change implements synthesizing the private buffer resource
descriptor in the kernel prolog instead of using the preloaded kernel
argument.
2024-02-08 20:27:36 +01:00
Saiyedul Islam
082f87c9d4
[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.
2024-01-23 17:08:18 +05:30
Konstantin Zhuravlyov
726d940586
AMDGPU/Docs: Add link to MI300 Instruction Set Architecture (#78777) 2024-01-22 10:32:35 -05:00
Emma Pilkington
bc82cfb38d
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
2024-01-21 11:54:47 -05:00
Jay Foad
9ca36932b5
[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186) 2024-01-18 10:23:27 +00:00
Mariusz Sikora
c99da46fc1
[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2024-01-17 19:23:42 +01:00
Chaitanya
9803de0e8e
[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)
"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.

hidden argument will be added in below cases:

- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.
2024-01-04 19:05:12 +05:30
Jay Foad
c01e844a7e
[AMDGPU] Update compute program resource registers for GFX12 (#75911)
Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
2024-01-02 13:24:42 +00:00
Jeffrey Byrnes
f1156fb622
[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416)
Makes constructing SchedGroups of this type easier, and provides ability
to create them with __builtin_amdgcn_sched_group_barrier
2023-12-19 16:54:18 -08:00
Jessica Del
32f9983c06
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Michael Halkenhaeuser
19fa27605c [NFC][docs] Add AMDGPU documentation for LIBOMPTARGET_STACK_SIZE
Add documentation w.r.t. changes by #72606, which allows to set the dynamic
callstack size.
2023-11-28 14:09:42 -05:00
Jay Foad
a1b3b78c55
[AMDGPU] Clarify description of _HI relocation types (#73663)
Clarify how the addend is used in _HI relocation types like
R_AMDGPU_ABS32_HI based on the current behaviour of the Mesa and AMDPAL
ELF loaders.

This affects Mesa and AMDPAL because they use REL relocation records, so
the addend for these types is the 32-bit literal value from the
instruction being relocated. AMDHSA is not affected because it uses RELA
relocation records which have a 64-bit addend.
2023-11-28 17:06:11 +00:00
Jay Foad
82a5708c7a
[AMDGPU] Document that PAL uses Elf64_Rel relocation records (#73648) 2023-11-28 14:37:10 +00:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Pierre van Houtryve
4428b01faa Reland: [AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-11-07 12:23:03 +01:00
bcahoon
f4b54f799a
[AMDGPU] Add documentation for scheduler intrinsics (#69854)
Adding sched_barrier, sched_group_barrier, and iglp_opt.
2023-11-02 18:47:45 -05:00
Konstantin Zhuravlyov
8b36a19b3f
AMDGPU/Docs: Memory model updates for GFX940, GFX941, GFX942 (#71091)
- Update memory model sequences for GFX940, GFX941, GFX942 to match
implementation
  - Re-title "Memory Model GFX940" to "Memory Model GFX942"

Co-authored with @t-tye

Change-Id: I82f1707b7c3e010ce1fe8207fcca18c4570057a3

Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
2023-11-02 14:48:21 -04:00
Konstantin Zhuravlyov
8b61ef0925
AMDGPU/Docs: Add links to instruction descriptions for gfx941, gfx942 (#70941)
Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
2023-11-01 14:14:48 -04:00
Austin Kerbow
d681461098
[AMDGPU] Add doc updates for kernarg preloading (#67516) 2023-10-19 13:43:35 -07:00
pvanhout
868abf0961 Revert "[AMDGPU] Remove Code Object V3 (#67118)"
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-18 12:55:36 +02:00
Pierre van Houtryve
544d91280c
[AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-16 08:21:48 +02:00
Pierre van Houtryve
fe2f67e4ba
[AMDGPU] Remove Code Object V2 (#65715)
Code Object V2 has been deprecated for more than a year now. We can
safely remove it from LLVM.

- [clang] Remove support for the `-mcode-object-version=2` option.
- [lld] Remove/refactor tests that were still using COV2
- [llvm] Update AMDGPUUsage.rst
- Code Object V2 docs are left for informational purposes because those
code objects may still be supported by the runtime/loaders for a while.
- [AMDGPU] Remove COV2 emission capabilities.
- [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2
- [AMDGPU] Update all tests that were still using COV2 - They are either
deleted or ported directly to code object v4 (as v3 is also planned to
be removed soon).
2023-09-21 12:00:45 +02:00
Saiyedul Islam
466a8149b3
Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12 15:13:59 +05:30
Saiyedul Islam
0a8d17e79b
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
Matt Arsenault
17bd80601e AMDGPU: Implement llvm.get.fpmode
Currently s_getreg_b32 is missing the possible mode use. Really we
need separate pseudos for mode-only accesses, but leave this as a
pre-existing issue.

https://reviews.llvm.org/D152710
2023-09-10 10:19:19 +03:00
Matt Arsenault
5f8ee45d5a AMDGPU: Implement llvm.get.rounding
There are really two rounding modes, so only return the standard
values if both modes are the same. Otherwise, return a bitmask
representing the two modes.

Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use
a simple integer table we can shift into to convert.

https://reviews.llvm.org/D153158
2023-08-30 14:06:13 -04:00
Kazu Hirata
3a14993fa4 Fix typos in documentation 2023-08-27 00:18:14 -07:00
Jeffrey Byrnes
3ba8dabbf3 [AMDGPU] Add sdot4 / sdot8 intrinsics for gfx11
This provides a uniform way to lower into the relevant instructions across all generations.

Differential Revision: https://reviews.llvm.org/D158468

Change-Id: I1f7ba4b15ee470738535cf1c7d177a11fc471e43
2023-08-25 11:45:55 -07:00
Diana Picus
71eb8c07dd [AMDGPU] Update amdgpu_cs_chain_preserve docs. NFC
We no longer allow calls to functions with the `amdgpu_gfx` calling
convention from functions with the `amdgpu_cs_chain_preserve` calling
convention. See D153517.

Also mention that we can't have a chain call from
amdgpu_cs_chain_preserve using more VGPRs than it has received.

Differential Revision: https://reviews.llvm.org/D156408
2023-08-24 10:17:00 +02:00