414 Commits

Author SHA1 Message Date
Matt Arsenault
0b40f97929
AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751)
0 does not make sense as a value for this attribute, much less as the default.
Also stop emitting each individual field if it is the default, rather than
if any element was the default. Also fix the name of the test since it didn't
exactly match the real attribute name.
2024-11-05 12:50:44 -08:00
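The per-field rule described in this commit can be illustrated with a small Python model. This is only a sketch of the idea, not the backend's code: the function name `fields_to_emit` and the field names `x`, `y`, `z` are hypothetical, and the only fact taken from the commit is that uint32_max is the default.

```python
UINT32_MAX = 0xFFFFFFFF  # default value per the commit above

def fields_to_emit(max_num_workgroups):
    """Return only the dimensions whose value differs from the default.

    Each field is checked on its own, so one default element does not
    suppress the others. Field names are hypothetical.
    """
    names = ("x", "y", "z")
    return {n: v for n, v in zip(names, max_num_workgroups) if v != UINT32_MAX}

# Only the x dimension carries a real limit here.
print(fields_to_emit((16, UINT32_MAX, UINT32_MAX)))
```

With this rule, a kernel that limits only one dimension still has that limit emitted, while a kernel left entirely at the default emits nothing.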
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Jay Foad
e7f1dae412
[AMDGPU] gfx1152 does not have Feature1_5xVGPRs (#113163) 2024-10-22 11:12:00 +01:00
Petar Avramovic
7b0d56be1d
AMDGPU/GlobalISel: Fix inst-selection of ballot (#109986)
Both input and output of ballot are lane-masks:
result is lane-mask with 'S32/S64 LLT and SGPR bank'
input is lane-mask with 'S1 LLT and VCC reg bank'.
Ballot copies bits from input lane-mask for
all active lanes and puts 0 for inactive lanes.
GlobalISel did not set 0 in result for inactive lanes
for non-constant input.
2024-10-11 11:40:27 +02:00
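The ballot semantics described in this commit can be modeled on plain integers. The Python sketch below is only an illustration, not the GlobalISel implementation: `ballot`, `cond_mask`, and `exec_mask` are hypothetical names, and each lane is modeled as one bit of an integer.

```python
def ballot(cond_mask: int, exec_mask: int, wave_size: int = 32) -> int:
    """Model of ballot: copy condition bits for active lanes, 0 elsewhere."""
    lane_bits = (1 << wave_size) - 1  # keep only bits that map to real lanes
    return cond_mask & exec_mask & lane_bits

# Lanes 0-3 are active; the condition is true in lanes 1, 3 and 5.
result = ballot(cond_mask=0b101010, exec_mask=0b001111)
print(bin(result))
```

Lane 5 is inactive in this example, so its bit is cleared in the result even though the condition bit was set. That is the case the commit says was mishandled for non-constant input.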
Pierre van Houtryve
924a64a348
[AMDGPU] Only emit SCOPE_SYS global_wb (#110636)
global_wb with scopes lower than SCOPE_SYS is unnecessary for
correctness.

I was initially optimistic they would be very cheap no-ops but they can
actually be quite expensive so let's avoid them.
2024-10-07 07:35:31 +02:00
Austin Kerbow
c4d89203f3
[AMDGPU] Support preloading hidden kernel arguments (#98861)
Adds hidden kernel arguments to the function signature and marks them
inreg if they should be preloaded into user SGPRs. The normal kernarg
preloading logic then takes over with some additional checks for the
correct implicitarg_ptr alignment.

Special care is needed so that metadata for the hidden arguments is not
added twice when generating the code object.
2024-10-06 17:44:33 -07:00
Jakub Kuderski
5d45815473
[docs][amdgpu] Update kernarg documentation for gfx90a (#109690)
Update the docs to mention that kernel argument preloading is not
supported on MI210.
2024-09-30 13:51:41 -04:00
Janek van Oirschot
c897c13dde
[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913)
Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to the MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.

Fixes https://github.com/llvm/llvm-project/issues/64863
2024-09-30 11:43:34 +01:00
Scott Egerton
396f677514
[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769) 2024-09-24 10:58:00 +01:00
Jay Foad
8663a75fa2
[AMDGPU] Add link to RDNA 3.5 docs (#108977) 2024-09-17 16:32:27 +01:00
Pierre van Houtryve
eaac4a2613
[AMDGPU] Document & Finalize GFX12 Memory Model (#98599)
Documents the memory model implemented as of #98591, with some
fixes/optimizations to the implementation.
2024-09-09 15:35:28 +02:00
Scott Linder
9171881d64 [AMDGPU][Docs] DWARF aspace-aware base types (post-review fixes) 2024-09-04 22:19:25 +00:00
Aarni Koskela
df5840f9f0
[AMDGPU][Docs] Update product names for some targets (#106973)
Based on
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus.
2024-09-04 16:58:17 +04:00
Scott Linder
22825ddd88 [AMDGPU][Docs] DWARF aspace-aware base types
Propose an extension to base type DIEs such that DW_ATE_address-encoded
base types can include an architecture specific address space. Use this
to implement DW_OP_convert conversions between AMDGPU address space
addresses where meaningful.
2024-08-19 19:55:15 +00:00
lancesix
cc78639453
[AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (#104419)
This patch adds a description of the core file format used for AMDGPU.

A reference implementation for creating and loading AMDGPU core dumps is
available in
[ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb).
2024-08-16 12:22:19 +02:00
pvanhout
db27905a0b [AMDGPU] Remove trailing spaces in AMDGPUUsage.rst 2024-07-12 09:02:46 +02:00
Matt Arsenault
62d949766b
AMDGPU: Add description for new atomicrmw metadata (#85052)
Add a spec for yet-to-be-implemented metadata to allow the backend to
fully handle atomicrmw lowering. This is the base of an alternative
to #69229, which inverts the direction to be correct by default, and
extends to cover the peer device case.
2024-07-10 17:39:04 +04:00
Vikram Hegde
35f7b60aa6
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)
These are incremental changes over #89217, with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable
64-bit optimizations in the atomic optimizer.
2024-06-26 09:24:09 +05:30
Vikram Hegde
5feb32ba92
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series with the end goal of
adapting the atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64-bit readlane,
writelane and readfirstlane ops pre-ISel.

---------

Co-authored-by: vikramRH <vikhegde@amd.com>
2024-06-25 14:35:19 +05:30
Nicolai Hähnle
4a70981d21
AMDGPU/gfx12: Minor documentation update (#96079) 2024-06-19 16:49:18 +02:00
Pierre van Houtryve
a45080f091
[AMDGPU] Document amdgpu-as in AMDGPUUsage (#94335)
Add a section about fence & address spaces that covers amdgpu-as.
2024-06-11 14:31:26 +02:00
Shilei Tian
1ca0055f45
[AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Krzysztof Drewniak
e31bfc040a
[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801)
The various preloaded registers (workitem IDs, workgroup IDs, and
various implicit pointers) always have a finite, invariant, well-defined
value throughout a well-defined program.

In cases where the compiler infers or the user declares that some
implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the
behavior of the entire program is undefined, since that misdeclaration
can cause arbitrary other preloaded-register intrinsics to access the
wrong register. This case is not expected to arise in practice, but
could occur when the no implicit argument attributes were not cleared
correctly in the presence of external functions, indirect calls, or other
means of executing un-analyzable code. Failure to detect that case would
be a bug in the attributor.

This commit updates the documentation to reflect this long-standing
reality.

Then, on the basis that all implicit arguments are defined in all
correct programs, the intrinsics that return those values are
annotated with `noundef`. Some implicit pointer arguments gain a
`nonnull`, but the kernel argument segment pointer or implicit argument
pointers don't necessarily have this property.

This will prevent spurious calls to `freeze` in front-end optimizations
that destroy user-provided ranges on built-in IDs.

(While I'm here, this commit adds a test for `noundef` on kernel
arguments which is currently unimplemented)
2024-06-03 16:37:08 -05:00
Konstantin Zhuravlyov
775f1cd34d
AMDGPU: Add gfx12-generic target (#93875) 2024-05-31 12:46:44 -04:00
Konstantin Zhuravlyov
949ef57dd2
AMDGPU/NFC: Reserve 0x058 EF_AMDGPU_MACHs (#93696) 2024-05-29 12:52:34 -04:00
Lu Weining
74014b5a34
Fix typo in AMDGPUUsage. NFC (#93652)
The vendor name is mesa, not mesa3d.
2024-05-29 17:39:38 +08:00
Konstantin Zhuravlyov
315a83145b
AMDGPU/NFC: Reserve 0x056 and 0x057 EF_AMDGPU_MACHs (#92917) 2024-05-21 13:35:39 -04:00
Krzysztof Drewniak
ac0d415552
Update documentation for buffer fat pointers (#92034)
Now that we've got (minus some issues around datatypes and invariant
loads) working lowerings for address space 7, update the table in the
AMDGPU usage guide to properly indicate the nature of these address
spaces.
2024-05-14 10:03:48 -05:00
Matt Arsenault
d654278bde
Reapply "AMDGPU: Implement llvm.set.rounding (#88587)" series (#91113)
Revert "Revert 4 last AMDGPU commits to unbreak Windows bots"

This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f.

MSVC does not like constexpr on the definition after an extern
declaration of a global.
2024-05-06 09:09:19 +02:00
Mehdi Amini
0d493ed2c6 Revert 4 last AMDGPU commits to unbreak Windows bots
Revert "AMDGPU: Try to fix build error with old gcc"
This reverts commit c7ad12d0d7606b0b9fb531b0b273bdc5f1490ddb.

Revert "AMDGPU: Use umin in set.rounding expansion"
This reverts commit a56f0b51dd988ad2b533de759c98457c1ed42456.

Revert "AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588)"
This reverts commit b4e751e2ab0ff152ed18dea59ebf9691e963e1dd.

Revert "AMDGPU: Implement llvm.set.rounding (#88587)"
This reverts commit 9731b77e80261c627d79980f8c275700bdaf6591.
2024-05-04 19:57:33 +02:00
Matt Arsenault
9731b77e80
AMDGPU: Implement llvm.set.rounding (#88587)
Use a shift of a magic constant and some offsetting to convert from
flt_rounds values.

I don't know why the enum defines Dynamic = 7. The standard suggests -1
is the cannot determine value. If we could start the extended values at
4 we wouldn't need the extra compare sub and select.

https://reviews.llvm.org/D153257
2024-05-03 09:41:27 +02:00
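The "shift of a magic constant" technique mentioned in this commit is a lookup table packed into an integer. The Python sketch below shows the general idea only: the hardware encoding, the 2-bit entry width, and the resulting constant are assumptions made for illustration, not the values used by the backend, and the extended rounding values discussed above are not modeled.

```python
# Assumed encodings, for illustration only.
# FLT_ROUNDS style input: 0 = toward zero, 1 = nearest, 2 = +inf, 3 = -inf
# Hypothetical hardware:  0 = nearest, 1 = +inf, 2 = -inf, 3 = toward zero
HW_FOR_FLT_ROUNDS = [3, 0, 1, 2]

BITS = 2  # width of one table entry
# Pack the table into one integer: entry i lives at bit offset BITS * i.
MAGIC = sum(hw << (BITS * i) for i, hw in enumerate(HW_FOR_FLT_ROUNDS))

def to_hw_rounding(flt_rounds: int) -> int:
    """Look up an entry by shifting the packed table and masking."""
    return (MAGIC >> (BITS * flt_rounds)) & ((1 << BITS) - 1)

print(hex(MAGIC), [to_hw_rounding(v) for v in range(4)])
```

The lookup needs only a shift and a mask, with no memory access or branch.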
Emma Pilkington
68e814d911
[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400)
These mostly are checking for various reserved bits being set. The diagnostics
for gpu-dependent reserved bits have a bit more context since they seem like the
most likely ones to be observed in practice.

This commit also improves the error handling mechanism for
MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter
that was just being ignored by llvm-objdump; now it returns errors using
Expected<T>.
2024-04-18 13:44:22 -04:00
Fabian Ritter
7b8625ec16
[AMDGPU][Docs] Fix broken link to HRF memory model reference (#88696)
The link to the Heterogeneous-race-free Memory Models ASPLOS'14 paper by
Hower et al. pointed to a bogus website, probably because the domain
ownership has changed.
This patch updates it to a version hosted on research.cs.wisc.edu.
2024-04-17 14:54:14 +02:00
Matt Arsenault
c13556c0b0
AMDGPU: Document more backend recognized attributes (#80239) 2024-03-28 14:27:14 +03:00
Matt Arsenault
b6b703b2df
AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948)
SIMachineFunctionInfo has a scan of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.

Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs but it seems to be trickier
to eliminate.
2024-03-21 14:24:06 +05:30
Janek van Oirschot
f7bebc1914
Reland [AMDGPU] Add AMDGPU specific variadic operation MCExprs (#84562)
Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'.

Relands #82022 with fixes
2024-03-14 14:31:00 +00:00
Matt Arsenault
5c3d001668
AMDGPU: Don't use table for metadata docs, and fix section headers (#85046) 2024-03-13 18:34:23 +05:30
Matt Arsenault
cd2f616313
AMDGPU: Use list-table for metadata table (#85024)
The table syntax for sphinx is really insufferably whitespace dependent.
I've been meaning to convert the existing attribute and intrinsic tables
to use list-table, which is less painful to merge.
2024-03-13 12:42:15 +05:30
Jun Wang
c4e517f59c
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
Pierre van Houtryve
63c77d8475
[AMDGPU] Make generic versioning docs easier to find (#84761) 2024-03-11 15:56:17 +01:00
Florian Mayer
0083c3eb83
Revert "[AMDGPU] Add AMDGPU specific variadic operation MCExprs" (#84273)
Reverts llvm/llvm-project#82022

Fails on hwasan build bot:
https://lab.llvm.org/buildbot/#/builders/236/builds/9874/steps/10/logs/stdio
2024-03-06 19:37:49 -08:00
Janek van Oirschot
bec2d105c7
[AMDGPU] Add AMDGPU specific variadic operation MCExprs (#82022)
Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'.
2024-03-06 21:01:54 +00:00
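As a rough model of what a variadic 'max' or 'or' operation computes once all of its operands are known, consider the Python sketch below. It is not the MCExpr implementation: `eval_variadic` is a hypothetical helper, and operands are plain integers rather than symbolic expressions.

```python
from functools import reduce
import operator

def eval_variadic(op: str, *operands: int) -> int:
    """Fold a variadic 'max' or 'or' over already-resolved integer operands."""
    if not operands:
        raise ValueError("at least one operand is required")
    if op == "max":
        return max(operands)
    if op == "or":
        return reduce(operator.or_, operands)  # bitwise OR of all operands
    raise ValueError(f"unknown operation: {op}")

print(eval_variadic("max", 3, 10, 7))
print(eval_variadic("or", 0b0001, 0b0100, 0b1000))
```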
Mirko Brkušanin
1fd1f4c0e1
[AMDGPU] Handle amdgpu.last.use metadata (#83816)
Convert !amdgpu.last.use metadata into MachineMemOperand for last use
and handle it in SIMemoryLegalizer similar to nontemporal and volatile.
2024-03-06 16:33:52 +01:00
Joseph Huber
1fc5e50ceb
[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906)
Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes it through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.
2024-03-06 08:11:54 -06:00
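The 64-bit encoding described in this commit, a concatenation of mode bits and trap status, can be sketched in Python as follows. The half assignment is an assumption made for illustration (mode bits in the low 32 bits, trap status in the high 32 bits), and `pack_fpenv` and `unpack_fpenv` are hypothetical helpers, not LLVM or clang APIs.

```python
MASK32 = 0xFFFFFFFF

def pack_fpenv(mode_bits: int, trap_bits: int) -> int:
    """Concatenate two 32-bit values into one 64-bit environment word.

    Assumption: mode bits occupy the low half, trap status the high half.
    """
    return ((trap_bits & MASK32) << 32) | (mode_bits & MASK32)

def unpack_fpenv(fpenv: int):
    """Split the 64-bit word back into (mode_bits, trap_bits)."""
    return fpenv & MASK32, (fpenv >> 32) & MASK32

env = pack_fpenv(mode_bits=0x2F0, trap_bits=0x1)
print(hex(env), unpack_fpenv(env))
```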
Pierre van Houtryve
43c7eb5d7b
[AMDGPU] Replace '.' with '-' in generic target names (#81718)
The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs &
the associated clang driver logic are also confused by the dot.

After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.
2024-02-14 15:19:04 +01:00
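The extension-parsing problem described in this commit is easy to reproduce with a generic path utility. The Python sketch below uses `os.path.splitext` as a stand-in for such a tool. The file names are hypothetical, and this is not a claim about how any particular LLVM tool parses names.

```python
import os.path

old_name = "kernel-gfx10.3-generic"  # hypothetical temporary file name
new_name = "kernel-gfx10-3-generic"  # same name with '.' replaced by '-'

# With the dot, everything after it is treated as a file extension.
print(os.path.splitext(old_name))
# Without the dot, the whole name is kept as the stem.
print(os.path.splitext(new_name))
```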
Pierre van Houtryve
87d7711934
[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450)
Fixes SWDEV-443292
2024-02-13 09:07:51 +01:00
Austin Kerbow
4bcbeaed63
[AMDGPU] Enable kernel arg preloading with gfx90a (#81180)
Add a trap instruction to the beginning of the kernel prologue to handle
cases where preloading is attempted on HW loaded with incompatible
firmware.
2024-02-12 22:33:29 -08:00
Konstantin Zhuravlyov
75a1c4e10b
AMDGPU/NFC: Reserve 0x055 MACH in e_flag for future use (#81501) 2024-02-12 13:37:25 -05:00
Mariusz Sikora
0c63453714
[AMDGPU][NFC] Docs - remove duplicates (#81465) 2024-02-12 12:25:54 +01:00
Pierre van Houtryve
f93aa5157a
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPUs, at the cost of fewer
optimization opportunities.

Note that this is just doing the compiler side of things, device libs and
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and #76954
as well.
2024-02-12 10:18:20 +01:00