383 Commits

Author SHA1 Message Date
Fabian Ritter
db597084c5
[AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs (#126887)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631
2025-02-19 10:31:47 +01:00
Fabian Ritter
8615f9aaff
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.

Documentation changes will follow.

For SWDEV-512631
2025-02-19 10:20:48 +01:00
Krzysztof Drewniak
f7d03707d1
[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828)
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemneted, addressspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation)

To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` s from being produces, this
commit extends amdgcn.make.buffer.rsrc to also be variadic in its result
type, auto-upgrading old manglings.

The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* .

This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
2025-02-18 14:15:28 -06:00
Stanislav Mekhanoshin
7639242155
[AMDGPU] Create new directive .amdhsa_inst_pref_size (#126622)
The field INST_PREF_SIZE is available since gfx11.
2025-02-11 08:35:45 -08:00
Konstantin Zhuravlyov
fc4210fb6c
AMDGPU/Docs: Fix target properties for gfx9-4-generic (#125593)
gfx9-4-generic has architected flat scratch, not absolute
2025-02-04 21:47:43 -05:00
Carl Ritson
1f38d38d54 [AMDGPU] Fix documentation table formatting from #118750 (NFC) 2025-01-30 14:27:25 +09:00
Carl Ritson
a3a3e6997b
[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass (#118750)
- Algorithm operates over whole IR to attempt to minimize waits.
- Add support for VALU->VALU SGPR hazards via VA_SDST/VA_VCC.
2025-01-30 11:21:11 +09:00
Jeffrey Byrnes
db1ee18eda NFC: Typo fix
Change-Id: I08470bc617490558250136ea35a4964003fa9981
2025-01-24 15:59:13 -08:00
Jun Wang
77c23fd0aa
[AMDGPU] Update AMDGPUUsage.rst to document two intrinsics (#123816)
The AMDGPUUsage.rst file is updated to document two intrinsics:
llvm.amdgcn.mov.dpp and llvm.amdgcn.update.dpp.
2025-01-24 14:12:18 -08:00
Austin Kerbow
2e5c298281
[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167)
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equipped with
incompatible firmware. If hardware has compatible firmware the 256 bytes
at the start of the kernel entry will be skipped. This skipping is done
automatically by hardware that supports the feature.

A pass is added which is intended to be run at the very end of the
pipeline to avoid any optimizations that would assume the prologue is a
real predecessor block to the actual code start. In reality we have two
possible entry points for the function. 1. The optimized path that
supports kernarg preloading which begins at an offset of 256 bytes. 2.
The backwards compatible entry point which starts at offset 0.
2025-01-10 11:39:02 -08:00
Austin Kerbow
aebd3389a9
[AMDGPU] Fix user SGPR alloc order in docs (#119092)
NFC. Preload kernarg SGPRs are allocated after the private segment size
SGPR. This patch updates AMDGPUUsage.rst to reflect this.
2024-12-07 13:08:35 -08:00
Jeffrey Byrnes
9ac52ce8d6
[AMDGPU] Add iglp_opt(3) for simple mfma / exp interleaving (#117269)
Adds a minimal iglp_opt to do simple exp / mfma interleaving.
2024-12-06 15:19:07 -08:00
Shilei Tian
17cfd016b4 [AMDGPU][Doc] Add gfx950 to gfx9-4-generic in the document 2024-12-03 11:17:22 -05:00
Matt Arsenault
d1cca3133a
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
2024-11-22 20:12:50 -08:00
Matt Arsenault
01c9a14ccf
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.
2024-11-21 08:51:58 -08:00
Matt Arsenault
5a556d55fb
AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309) 2024-11-18 10:48:56 -08:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Diana Picus
2aa6cedfa8
[AMDGPU] Clarify amdgpu.cs.chain + init whole wave. NFC (#115452)
Add some docs clarifying how inactive lanes are handled in the
amdgpu_cs_chain calling convention when the llvm.amdgcn.init.whole.wave
intrinsic is used.
2024-11-14 10:10:33 +01:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Matt Arsenault
0b40f97929
AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751)
0 does not make sense as a value for this to be, much less the default.
Also stop emitting each individual field if it is the default, rather than
if any element was the default. Also fix the name of the test since it didn't
exactly match the real attribute name.
2024-11-05 12:50:44 -08:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Jay Foad
e7f1dae412
[AMDGPU] gfx1152 does not have Feature1_5xVGPRs (#113163) 2024-10-22 11:12:00 +01:00
Petar Avramovic
7b0d56be1d
AMDGPU/GlobalISel: Fix inst-selection of ballot (#109986)
Both input and output of ballot are lane-masks:
result is lane-mask with 'S32/S64 LLT and SGPR bank'
input is lane-mask with 'S1 LLT and VCC reg bank'.
Ballot copies bits from input lane-mask for
all active lanes and puts 0 for inactive lanes.
GlobalISel did not set 0 in result for inactive lanes
for non-constant input.
2024-10-11 11:40:27 +02:00
Pierre van Houtryve
924a64a348
[AMDGPU] Only emit SCOPE_SYS global_wb (#110636)
global_wb with scopes lower than SCOPE_SYS is unnecessary for
correctness.

I was initially optimistic they would be very cheap no-ops but they can
actually be quite expensive so let's avoid them.
2024-10-07 07:35:31 +02:00
Austin Kerbow
c4d89203f3
[AMDGPU] Support preloading hidden kernel arguments (#98861)
Adds hidden kernel arguments to the function signature and marks them
inreg if they should be preloaded into user SGPRs. The normal kernarg
preloading logic then takes over with some additional checks for the
correct implicitarg_ptr alignment.

Special care is needed so that metadata for the hidden arguments is not
added twice when generating the code object.
2024-10-06 17:44:33 -07:00
Jakub Kuderski
5d45815473
[docs][amdgpu] Update kernarg documentation for gfx90a (#109690)
Update the docs to mention that kernel argument preloading is not
supported on MI210.
2024-09-30 13:51:41 -04:00
Janek van Oirschot
c897c13dde
[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913)
Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to to MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.

Fixes https://github.com/llvm/llvm-project/issues/64863
2024-09-30 11:43:34 +01:00
Scott Egerton
396f677514
[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769) 2024-09-24 10:58:00 +01:00
Jay Foad
8663a75fa2
[AMDGPU] Add link to RDNA 3.5 docs (#108977) 2024-09-17 16:32:27 +01:00
Pierre van Houtryve
eaac4a2613
[AMDGPU] Document & Finalize GFX12 Memory Model (#98599)
Documents the memory model implemented as of #98591, with some
fixes/optimizations to the implementation.
2024-09-09 15:35:28 +02:00
Scott Linder
9171881d64 [AMDGPU][Docs] DWARF aspace-aware base types (post-review fixes) 2024-09-04 22:19:25 +00:00
Aarni Koskela
df5840f9f0
[AMDGPU][Docs] Update product names for some targets (#106973)
Based on
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus.
2024-09-04 16:58:17 +04:00
Scott Linder
22825ddd88 [AMDGPU][Docs] DWARF aspace-aware base types
Propose an extension to base type DIEs such that DW_ATE_address-encoded
base types can include an architecture specific address space. Use this
to implement DW_OP_convert conversions between AMDGPU address space
addresses where meaningful.
2024-08-19 19:55:15 +00:00
lancesix
cc78639453
[AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (#104419)
This patch adds a description of the core file format used for AMDGPU.

Reference implementation for creating and loading AMDGPU core dump is
available in
[ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb)
2024-08-16 12:22:19 +02:00
pvanhout
db27905a0b [AMDGPU] Remove trailing spaces in AMDGPUUsage.rst 2024-07-12 09:02:46 +02:00
Matt Arsenault
62d949766b
AMDGPU: Add description for new atomicrmw metadata (#85052)
Add a spec for yet-to-be-implemented metadata to allow the backend to
fully handle atomicrmw lowering. This is the base of an alternative
to #69229, which inverts the direction to be correct by default, and
extends to cover the peer device case.
2024-07-10 17:39:04 +04:00
Vikram Hegde
35f7b60aa6
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)
These are incremental changes over #89217 , with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
2024-06-26 09:24:09 +05:30
Vikram Hegde
5feb32ba92
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel

---------

Co-authored-by: vikramRH <vikhegde@amd.com>
2024-06-25 14:35:19 +05:30
Nicolai Hähnle
4a70981d21
AMDGPU/gfx12: Minor documentation update (#96079) 2024-06-19 16:49:18 +02:00
Pierre van Houtryve
a45080f091
[AMDGPU] Document amdgpu-as in AMDGPUUsage (#94335)
Add a section about fence & address spaces that covers amdgpu-as.
2024-06-11 14:31:26 +02:00
Shilei Tian
1ca0055f45
[AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Krzysztof Drewniak
e31bfc040a
[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801)
The various preloaded registers (workitem IDs, workgroup IDs, and
various implicit pointers) always have a finite, invariant, well-defined
value throughout a well-defined program.

In cases where the compiler infers or the user declares that some
implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the
behavior of the entire program is undefined, since that misdeclaration
can cause arbitrary other preloaded-register intrinsics to access the
wrong register. This case is not expected to arise in practice, but
could occur when the no implicit argument attributes were not cleared
correctly in the presence of external functions, indrect calls, or other
means of executing un-analyzable code. Failure to detect that case would
be a bug in the attributor.

This commit updates the documentation to reflect this long-standing
reality.

Then, on the basis that all implicit arguments are defined in all
correct programs, the intrinsics that return those values are
annototated with `noundef``. Some implicit pointer arguments gain a
`nonnull`, but the kernel argument segment pointer or implicit argument
pointers don't necessarily have this property.

This will prevent spurious calls to `freeze` in front-end optimizations
that destroy user-provided ranges on built-in IDs.

(While I'm here, this commit adds a test for `noundef` on kernel
arguments which is currently unimplemented)
2024-06-03 16:37:08 -05:00
Konstantin Zhuravlyov
775f1cd34d
AMDGPU: Add gfx12-generic target (#93875) 2024-05-31 12:46:44 -04:00
Konstantin Zhuravlyov
949ef57dd2
AMDGPU/NFC: Reserve 0x058 EF_AMDGPU_MACHs (#93696) 2024-05-29 12:52:34 -04:00
Lu Weining
74014b5a34
Fix typo in AMDGPUUsage. NFC (#93652)
The vendor name is mesa but not mesa3d.
2024-05-29 17:39:38 +08:00
Konstantin Zhuravlyov
315a83145b
AMDGPU/NFC: Reserve 0x056 and 0x057 EF_AMDGPU_MACHs (#92917) 2024-05-21 13:35:39 -04:00
Krzysztof Drewniak
ac0d415552
Update documentation for buffer fat pointers (#92034)
Now that we've got (minus some issues around datatypes and invariant
loads) working lowerings for address space 7, update the table in the
AMDGPU usage guide to properly indicate the nature of these address
spaces.
2024-05-14 10:03:48 -05:00
Matt Arsenault
d654278bde
Reapply "AMDGPU: Implement llvm.set.rounding (#88587)" series (#91113)
Revert "Revert 4 last AMDGPU commits to unbreak Windows bots"

This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f.

MSVC does not like constexpr on the definition after an extern
declaration of a global.
2024-05-06 09:09:19 +02:00
Mehdi Amini
0d493ed2c6 Revert 4 last AMDGPU commits to unbreak Windows bots
Revert "AMDGPU: Try to fix build error with old gcc"
This reverts commit c7ad12d0d7606b0b9fb531b0b273bdc5f1490ddb.

Revert "AMDGPU: Use umin in set.rounding expansion"
This reverts commit a56f0b51dd988ad2b533de759c98457c1ed42456.

Revert "AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588)"
This reverts commit b4e751e2ab0ff152ed18dea59ebf9691e963e1dd.

Revert "AMDGPU: Implement llvm.set.rounding (#88587)"
This reverts commit 9731b77e80261c627d79980f8c275700bdaf6591.
2024-05-04 19:57:33 +02:00
Matt Arsenault
9731b77e80
AMDGPU: Implement llvm.set.rounding (#88587)
Use a shift of a magic constant and some offseting to convert from
flt_rounds values.

I don't know why the enum defines Dynamic = 7. The standard suggests -1
is the cannot determine value. If we could start the extended values at
4 we wouldn't need the extra compare sub and select.

https://reviews.llvm.org/D153257
2024-05-03 09:41:27 +02:00