355 Commits

Author SHA1 Message Date
Austin Kerbow
2e5c298281
[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167)
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equipped with
incompatible firmware. If hardware has compatible firmware the 256 bytes
at the start of the kernel entry will be skipped. This skipping is done
automatically by hardware that supports the feature.

A pass is added which is intended to be run at the very end of the
pipeline to avoid any optimizations that would assume the prologue is a
real predecessor block to the actual code start. In reality we have two
possible entry points for the function. 1. The optimized path that
supports kernarg preloading which begins at an offset of 256 bytes. 2.
The backwards compatible entry point which starts at offset 0.
2025-01-10 11:39:02 -08:00
Shilei Tian
86734c8577 [NFC][AMDGPU] Remove redundant code in AMDGPUAsmPrinter.cpp 2024-11-20 15:08:26 -05:00
Matt Arsenault
5a556d55fb
AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309) 2024-11-18 10:48:56 -08:00
Shilei Tian
6548b6354d Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
2024-11-08 20:21:16 -05:00
Janek van Oirschot
7f60f1312a
[AMDGPU] Fix resource usage information for unnamed functions (#115320)
Resource usage information would try to overwrite unnamed functions if
there are multiple within the same compilation unit. This aims to either
use the `MCSymbol` assigned to the unnamed function (i.e.,
`CurrentFnSym`), or, rematerialize the `MCSymbol` for the unnamed
function.
2024-11-07 18:24:54 +00:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Thomas Symalla
b95d50e5d8
Add and call AMDGPUMCResourceInfo::reset method (#110818)
When compiling multiple pipelines, the `MCRegisterInfo` instance in
`AMDGPUAsmPrinter` gets re-used even after finalization, so it calls
`finalize()` multiple times.

Add a reset method and call it in
`AMDGPUAsmPrinter::doFinalization`.

Different approach would be to make it a `unique_ptr`.

---------

Co-authored-by: Thomas Symalla <tsymalla@amd.com>
2024-10-02 14:17:01 +02:00
Janek van Oirschot
c897c13dde
[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913)
Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to to MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.

Fixes https://github.com/llvm/llvm-project/issues/64863
2024-09-30 11:43:34 +01:00
Austin Kerbow
954ab83e6a
[AMDGPU] Include unused preload kernarg in KD total SGPR count (#104743)
Unlike with implicitly preloaded data UserSGPRs firmware is unable to
handle cases where SGPRs for kernel arguments contain preloaded data but
not are not explicitly referenced in the kernel. We need to include
these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation
to not clobber SGPRs in adjacent waves.
2024-09-23 13:48:22 -07:00
Janek van Oirschot
bfce1aae76
[AMDGPU] MCExpr printing helper with KnownBits support (#95951)
Walks over the MCExpr and uses KnownBits to deduce whether an expression
is known and if so, prints said known value. Should support the most
common MCExpr cases for AMDGPU metadata.
2024-08-15 13:43:13 +01:00
Joseph Huber
f39bd0a24e
[AMDGPU] Do not print kernel-resource-usage information on non-kernels (#99720)
Summary:
This pass is used to get helpful information about the kernel resources
without needing to insepct the binary. However, it currently prints on
every function. These values will always be zero, so it's just spam on
the terminal, at best an indication that a function wasn't internalized
/ optimized out. This patch makes it only print for kernels to make it
more useful in practice.
2024-07-22 07:07:51 -05:00
Jay Foad
63fae3ed65
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) 2024-07-17 21:11:00 +01:00
Jay Foad
5e338f1f4a [AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC. 2024-07-17 08:27:35 +01:00
Jay Foad
f10a78b7e4 [AMDGPU] clang-tidy: use std::make_unique. NFC. 2024-07-17 07:58:09 +01:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Janek van Oirschot
17eaa23f7e
[AMDGPU] MCExpr-ify AMDGPU HSAMetadata (#94788)
Enables MCExpr for HSAMetadata, particularly, HSAMetadata's msgpack format.
2024-06-26 16:39:08 +01:00
Ivan Kosarev
13ed349c44
[AMDGPU][NFC] Rename AMDGPUVariadicMCExpr to AMDGPUMCExpr. (#96618)
Some of our custom expressions are not variadic and there seems to be
little benefit in mentioning the variadic nature of expression nodes in
the name anyway.
2024-06-25 15:32:09 +01:00
Nicolai Hähnle
7e9b49f6b8
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent change.
2024-06-25 16:20:51 +02:00
Janek van Oirschot
3d1705d00c
MCExpr-ify AMDGPU PALMetadata (#93236)
Allows MCExprs as passed values to PALMetadata. Also adds related
`DelayedMCExpr` classes which serve as a pseudo-fixup to resolve MCExprs
as late as possible (i.e., right before emit through string or blob,
where they should be resolvable).
2024-06-13 13:59:31 +01:00
Jie Fu
a1c29df572 [AMDGPU] Fix -Wunused-variable in AMDGPUAsmPrinter.cpp (NFC)
/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:653:13:
error: unused variable 'PGMRSrc3' [-Werror,-Wunused-variable]
    int64_t PGMRSrc3;
            ^
1 error generated.
2024-06-10 20:35:09 +08:00
Janek van Oirschot
bc022b406d
[AMDGPU] Support SIProgramInfo MCExpr for comments and remarks (#94350)
Eliminates assumption that MCExpr comments/remarks being emitted are
always resolvable
2024-06-10 12:34:06 +01:00
Janek van Oirschot
a699ccbf0c
MCExpr-ify amd_kernel_code_t (#91587)
Redefines the amd_kernel_code_t struct with MCExprs for members that would be
derived from SIProgramInfo MCExpr members.
2024-05-22 13:45:45 +01:00
Fangrui Song
9500a5d02e [MC] Make UseAssemblerInfoForParsing mostly true
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).

`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.

I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.

```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
  ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc     # succeeded

% clang -c b.cc     # succeeded with this patch
% clang -S b.cc     # still failed
<inline asm>:4:5: error: expected absolute expression
    4 | .if . -_start == 1
      |     ^
1 error generated.
```

However, removing `getUseAssemblerInfoForParsing` would make
MCDwarfFrameEmitter::Emit (for .eh_frame FDE) slow (~4% compile time
regression for sqlite3.c amalgamation) due to expensive
`AttemptToFoldSymbolOffsetDifference`. For now, make
`UseAssemblerInfoForParsing` false in MCDwarfFrameEmitter::Emit.

Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841

Pull Request: https://github.com/llvm/llvm-project/pull/91082
2024-05-19 23:35:15 -07:00
Nikita Popov
fa750f09be Revert "[MC] Remove UseAssemblerInfoForParsing"
This reverts commit 03c53c69a367008da689f0d2940e2197eb4a955c.

This causes very large compile-time regressions in some cases,
e.g. sqlite3 at O0 regresses by 5%.
2024-05-16 09:56:07 +09:00
Fangrui Song
03c53c69a3
[MC] Remove UseAssemblerInfoForParsing
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).

`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.

I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.

```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
  ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc     # succeeded

% clang -c b.cc     # succeeded with this patch
% clang -S b.cc     # still failed
<inline asm>:4:5: error: expected absolute expression
    4 | .if . -_start == 1
      |     ^
1 error generated.
```

Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841

Pull Request: https://github.com/llvm/llvm-project/pull/91082
2024-05-15 09:18:39 -07:00
Jie Fu
c2a87d7e03 [AMDGPU] Remove unused lambda capture (NFC)
/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:733:30:
error: lambda capture 'Ctx' is not used [-Werror,-Wunused-lambda-capture]
  auto TryGetMCExprValue = [&Ctx](const MCExpr *Value, uint64_t &Res) -> bool {
                            ~^~~
1 error generated.
2024-05-09 20:15:41 +08:00
Janek van Oirschot
d86b68afd7
MCExpr-ify SIProgramInfo (#88257)
Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.
2024-05-09 13:02:32 +01:00
Kazu Hirata
c18bcd0a57
[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  38 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-05 13:43:10 -07:00
Carl Ritson
44648ccb8b
[AMDGPU] Always emit lds_size in PAL ELF Metadata 3.0 (#87222)
Emit lds_size for all shader types in PAL metadata.
2024-05-03 17:01:03 +09:00
Janek van Oirschot
1103a2a337
Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494)
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.

Relands #80855 with fixes
2024-03-27 11:59:56 +00:00
Janek van Oirschot
797336b127
Revert "[AMDGPU] MCExpr-ify MC layer kernel descriptor" (#86151)
Reverts llvm/llvm-project#80855
2024-03-21 10:19:54 -07:00
Janek van Oirschot
857161c367
[AMDGPU] MCExpr-ify MC layer kernel descriptor (#80855)
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.
2024-03-21 13:57:10 +00:00
Carl Ritson
c29b265eb9 Reapply "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)"
This reverts commit 7d508eb5d38f4bbbab4230a666d9e742e271af61.
2024-03-14 10:56:43 +09:00
Diana Picus
0086cc95b3
[AMDGPU] Rename getNumVGPRBlocks. NFC (#84161)
Rename getNumVGPRBlocks to getEncodedNumVGPRBlocks, to clarify that it's
using the encoding granule. This is used to program the hardware. In
practice, the hardware will use the alloc granule instead, so this patch
also adds a new helper, getAllocatedNumVGPRBlocks, which can be useful
when driving heuristics.
2024-03-07 12:46:42 +01:00
Austin Kerbow
4bcbeaed63
[AMDGPU] Enable kernel arg preloading with gfx90a (#81180)
Add a trap instruction to the beginning of the kernel prologue to handle
cases where preloading is attempted on HW loaded with incompatible
firmware.
2024-02-12 22:33:29 -08:00
Pierre van Houtryve
f93aa5157a
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.

Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and #76954
as well.
2024-02-12 10:18:20 +01:00
Carl Ritson
7d508eb5d3 Revert "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)"
This reverts commit d6c7253d32e4bdff619c39708170f1c1fa01ff95.

Change causing CTS failures due to incomplete metadata.
2024-02-07 17:09:56 +09:00
David Stuttard
d6c7253d32
[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
The previous approach used opaque registers which can change between different
architectures and required encoding the bitfield information in the backend,
which may change between versions.

This change is an extension the previously added support - which only handled
entry functions. This adds support for all functions.

The change also includes some re-factoring to separate common code.
2024-02-06 15:34:36 +00:00
Pierre van Houtryve
500846d2f5
[AMDGPU] Introduce Code Object V6 (#76954)
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.

Docs change are part of the follow-up patch #76955
2024-02-05 08:19:53 +01:00
Emma Pilkington
bc82cfb38d
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
2024-01-21 11:54:47 -05:00
Jay Foad
42b9ea841e
[AMDGPU] Increase max scratch allocation for GFX12 (#77625) 2024-01-17 10:25:28 +00:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Pierre van Houtryve
ecd2f56a80
[AMDGPU] Warn if 'amdgpu-waves-per-eu' target occupancy was not met (#74055)
This should make it a bit harder to miss this type of issue. The warning
only shows if amdgpu-waves-per-eu is used.

See SWDEV-434482
2023-12-06 10:46:46 +01:00
Pierre van Houtryve
4428b01faa Reland: [AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-11-07 12:23:03 +01:00
Jay Foad
521ac12a25
[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407)
The special handling for blocks ending with a long branch has been
unnecessary since D106445:
"[amdgpu] Add 64-bit PC support when expanding unconditional branches."
2023-11-06 16:29:52 +00:00
pvanhout
868abf0961 Revert "[AMDGPU] Remove Code Object V3 (#67118)"
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-18 12:55:36 +02:00
Pierre van Houtryve
544d91280c
[AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-16 08:21:48 +02:00
Yashwant Singh
7ac532efc8
[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091)
Use this flag to give more context to implicit def comments in assembly.

Reviewed on phabricator: 
https://reviews.llvm.org/D153754
2023-09-29 11:15:01 +05:30
Austin Kerbow
0455596e1e [AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patches
in the series will make the codegen backwards compatible. This patch
should only be submitted alongside that subsequent patch.

Preloading here begins from the start of the kernel arguments until the
amount of arguments indicated by the CL flag
amdgpu-kernarg-preload-count.

Aggregates and arguments passed by-ref are not supported.

Special care for the alignment of the kernarg segment is needed as well
as consideration of the alignment of addressable SGPR tuples when we
cannot directly use misaligned large tuples that the arguments are
loaded to.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D158579
2023-09-25 09:32:59 -07:00
Pierre van Houtryve
fe2f67e4ba
[AMDGPU] Remove Code Object V2 (#65715)
Code Object V2 has been deprecated for more than a year now. We can
safely remove it from LLVM.

- [clang] Remove support for the `-mcode-object-version=2` option.
- [lld] Remove/refactor tests that were still using COV2
- [llvm] Update AMDGPUUsage.rst
- Code Object V2 docs are left for informational purposes because those
code objects may still be supported by the runtime/loaders for a while.
- [AMDGPU] Remove COV2 emission capabilities.
- [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2
- [AMDGPU] Update all tests that were still using COV2 - They are either
deleted or ported directly to code object v4 (as v3 is also planned to
be removed soon).
2023-09-21 12:00:45 +02:00