375 Commits

Author SHA1 Message Date
Diana Picus
a910a6a8b5
[AMDGPU] AsmPrinter: Unify arg handling (#151672)
When computing the number of registers required by entry functions, the
`AMDGPUAsmPrinter` needs to take into account both the register usage
computed by the `AMDGPUResourceUsageAnalysis` pass, and the number
of registers initialized by the hardware. At the moment, the way it
computes the latter is different for graphics vs compute, due to differences in
the implementation. For kernels, all the information needed is available in
the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over
the `Function` arguments in the `AMDGPUAsmPrinter`. This pretty much
repeats some of the logic from instruction selection.

This patch introduces two new members to `SIMachineFunctionInfo`, one
for SGPRs and one for VGPRs. Both will be computed during instruction
selection and then used by the `AMDGPUAsmPrinter`, removing the need
to refer to the `Function` when printing assembly.
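
Roughly, the change has this shape (a sketch only; the member and accessor
names below are illustrative, not necessarily the ones used in the patch):

```cpp
// Illustrative only: per-function counters that instruction selection fills
// in and that the AsmPrinter later reads, instead of re-walking the IR
// Function's arguments when emitting assembly.
class SIMachineFunctionInfo { // trimmed to the two new counters
  unsigned NumDispatchSGPRs = 0; // hypothetical member name
  unsigned NumDispatchVGPRs = 0; // hypothetical member name

public:
  void setNumDispatchSGPRs(unsigned Count) { NumDispatchSGPRs = Count; }
  void setNumDispatchVGPRs(unsigned Count) { NumDispatchVGPRs = Count; }
  unsigned getNumDispatchSGPRs() const { return NumDispatchSGPRs; }
  unsigned getNumDispatchVGPRs() const { return NumDispatchVGPRs; }
};
```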

This patch is NFC except for the fact that we now add the extra SGPRs
(VCC, XNACK, etc.) to the number of SGPRs computed for graphics entry points.
I'm not sure why these weren't included before. It would be nice if
someone could confirm if that was just an oversight or if we have some docs
somewhere that I haven't managed to find. Only one test is affected (its SGPR
usage increases because we now take into account the XNACK registers).
2025-08-08 12:00:37 +02:00
Tim Renouf
e99c565cd2
MC,AMDGPU: Don't pad .text with s_code_end if it would otherwise be empty (#147980)
We don't want that padding in a module that only contains data, not
code.

Also fix MCSection::hasInstructions() so it works with the asm streamer
too.
2025-08-06 13:25:45 +01:00
Pierre van Houtryve
be17791f26
[AMDGPU][gfx1250] Add cu-store subtarget feature (#150588)
Determines whether we can use `SCOPE_CU` stores (on by default), or
whether all stores must be done at `SCOPE_SE` minimum.
2025-07-29 11:38:43 +02:00
Jay Foad
1c49ce676c
[AMDGPU] Enable FWD_PROGRESS bit for GFX10+ on PAL (#139895)
Performance testing shows no significant gains or losses on graphics
workloads, so this is mostly to make the behavior consistent across all
supported OSes instead of special-casing HSA.
2025-07-21 17:29:06 +01:00
Vikram Hegde
1ccd779324
[AMDGPU][NewPM] Port "AMDGPUResourceUsageAnalysis" to NPM (#130959) 2025-07-10 13:35:43 +05:30
Diana Picus
a201f8872a
[AMDGPU] Replace dynamic VGPR feature with attribute (#133444)
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.
2025-06-24 11:09:36 +02:00
Matt Arsenault
7031280218
AMDGPU: Use reportFatalUsageError for unsupported code object version (#145133) 2025-06-21 13:01:18 +09:00
Andrew Rogers
19658d1474
[llvm] annotate interfaces in llvm/Target for DLL export (#143615)
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

A sub-set of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.

The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for this functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.

In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-17 13:28:45 -07:00
Diana Picus
a5cbd2ab0b
Revert "[AMDGPU] Skip register uses in AMDGPUResourceUsageAnalysis (#… (#144039)
…133242)"

This reverts commit 130080fab11cde5efcb338b77f5c3b31097df6e6 because it
causes issues in testcases similar to coalescer_remat.ll [1], i.e. when
we use a VGPR tuple but only write to its lower parts. The high VGPRs
would then not be included in the vgpr_count, and accessing them would
be an out of bounds violation.

[1]
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/coalescer_remat.ll
2025-06-13 12:48:24 +02:00
Diana Picus
130080fab1
[AMDGPU] Skip register uses in AMDGPUResourceUsageAnalysis (#133242)
Don't count register uses when determining the maximum number of
registers used by a function. Count only the defs. This is really an
underestimate of the true register usage, but in practice that's not
a problem because if a function uses a register, then it has either
defined it earlier, or some other function that executed before has
defined it.

In particular, the register counts are used:
1. When launching an entry function - in which case we're safe because
   the register counts of the entry function will include the register
   counts of all callees.
2. At function boundaries in dynamic VGPR mode. In this case it's safe
   because whenever we set the new VGPR allocation we take into account
   the outgoing_vgpr_count set by the middle-end.

The main advantage of doing this is that the artificial VGPR arguments
used only for preserving the inactive lanes when using the
llvm.amdgcn.init.whole.wave intrinsic are no longer counted. This
enables us to allocate only the registers we need in dynamic VGPR mode.
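
A simplified sketch of the defs-only counting (not the actual pass code;
`vgprIndexOf` is a hypothetical stand-in for the target's register-info
queries):

```cpp
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include <algorithm>
#include <optional>

// Hypothetical helper: maps a physical register to its VGPR index, or
// std::nullopt if it is not a VGPR.
std::optional<unsigned> vgprIndexOf(llvm::Register Reg);

// Highest VGPR index written anywhere in the function. Operands that only
// read a register are skipped: whatever defined that register has already
// been counted, either here or in a function that ran before us.
unsigned maxDefinedVGPR(const llvm::MachineFunction &MF) {
  unsigned MaxVGPR = 0;
  for (const llvm::MachineBasicBlock &MBB : MF)
    for (const llvm::MachineInstr &MI : MBB)
      for (const llvm::MachineOperand &MO : MI.operands()) {
        if (!MO.isReg() || !MO.isDef())
          continue;
        if (std::optional<unsigned> Idx = vgprIndexOf(MO.getReg()))
          MaxVGPR = std::max(MaxVGPR, *Idx);
      }
  return MaxVGPR;
}
```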

---------

Co-authored-by: Thomas Symalla <5754458+tsymalla@users.noreply.github.com>
2025-06-03 11:20:48 +02:00
Matthias Braun
675cb70641
Register assembly printer passes (#138348)
Register assembly printer passes in the pass registry.

This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.

Adds a `char &ID` parameter to the `AsmPrinter` constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. The new parameter currently has a default value so it won't
break any targets that have not been updated.
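
A loose sketch of the pattern this enables, with an invented target name;
exact signatures may differ from the real patch:

```cpp
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/MC/MCStreamer.h"
#include <memory>
#include <utility>

using namespace llvm;

// Hypothetical target: forwards its own pass ID to the base AsmPrinter.
// Once the ID is registered via the INITIALIZE_PASS machinery in the
// target's initializer, the pass can be addressed by name, e.g. with
//   llc -start-before=example-asm-printer ...
class ExampleAsmPrinter : public AsmPrinter {
public:
  static char ID;
  ExampleAsmPrinter(TargetMachine &TM, std::unique_ptr<MCStreamer> Streamer)
      : AsmPrinter(TM, std::move(Streamer), ID) {}
};

char ExampleAsmPrinter::ID = 0;
```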
2025-05-06 18:01:17 -07:00
Nikita Popov
b492ec5899
[ErrorHandling] Add reportFatalInternalError + reportFatalUsageError (NFC) (#138251)
This implements the result of the discussion at:

https://discourse.llvm.org/t/rfc-report-fatal-error-and-the-default-value-of-gencrashdialog/73587

There are two different use cases for report_fatal_error, so replace it
with two functions reportFatalInternalError() and
reportFatalUsageError(). The former indicates a bug in LLVM and
generates a crash dialog. The latter does not. The names have been
suggested by rnk and people seemed to like them.

This replaces a lot of the usages that passed an explicit value for
GenCrashDiag. I did not bulk-replace the remaining report_fatal_error usages
-- they probably require case-by-case review to decide which function to use.
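
A minimal usage sketch, assuming the new functions take the same kinds of
string arguments as report_fatal_error (the surrounding functions and the
version bounds here are invented for illustration):

```cpp
#include "llvm/Support/ErrorHandling.h"

// Bad user input / unsupported configuration: no crash dialog and no
// "please file a bug report" banner.
void checkCodeObjectVersion(unsigned Version) {
  if (Version < 4 || Version > 6) // illustrative bounds
    llvm::reportFatalUsageError("unsupported code object version");
}

// A broken invariant inside LLVM itself: reported as a compiler bug, with
// the usual crash diagnostics.
void checkLoweringInvariant(bool LoweringSucceeded) {
  if (!LoweringSucceeded)
    llvm::reportFatalInternalError("lowering left the module in an invalid state");
}
```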
2025-05-05 12:10:03 +02:00
Diana Picus
72c3c30452
[AMDGPU] Allocate scratch space for dVGPRs for CWSR (#130055)
The CWSR trap handler needs to save and restore the VGPRs. When dynamic
VGPRs are in use, the fixed function hardware will only allocate enough
space for one VGPR block. The rest will have to be stored in scratch, at
offset 0.

This patch allocates the necessary space by:
- generating a prologue that checks at runtime whether we're on a compute
queue (since CWSR only works on compute queues); for this we check the
ME_ID bits of the ID_HW_ID2 register - if they are non-zero, we can assume
we're on a compute queue and initialize the SP and FP with enough room for
the dynamic VGPRs
- forcing all compute entry functions to use a FP so they can access
their locals/spills correctly (this isn't ideal but it's the quickest to
implement)

Note that at the moment we allocate enough space for the theoretical
maximum number of VGPRs that can be allocated dynamically (for blocks of
16 registers, this will be 128, of which we subtract the first 16, which
are already allocated by the fixed function hardware). Future patches
may decide to allocate less if they can prove the shader never allocates
that many blocks.

Also note that this should not affect any reported stack sizes (e.g. PAL
backend_stack_size etc).
2025-03-19 13:49:19 +01:00
Diana Picus
0a21ef9536
[AMDGPU] Add SubtargetFeature for dynamic VGPR mode (#130030)
This represents a hardware mode supported only for wave32 compute
shaders. When enabled, we set the `.dynamic_vgpr_en` field of
`.compute_registers` to true in the PAL metadata.

This will be changed to use an attribute after downstream consumers
have been migrated.
2025-03-18 11:48:01 +01:00
Alex Voicu
c1fabd681f
[llvm][AMDGPU] Enable FWD_PROGRESS bit for GFX10+ (#128367)
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability, as it is beneficial for algorithms that
rely on more complex concurrent coordination and it is generally
performance neutral otherwise.
2025-03-17 23:17:46 +00:00
Stanislav Mekhanoshin
6c9a9d9fe2
[AMDGPU] Set inst_pref_size to maximum (#126981)
On gfx11 and gfx12, set the initial instruction prefetch size to the
minimum of the kernel size and the maximum allowed value.

Fixes: SWDEV-513122
2025-03-03 10:40:31 -08:00
Stanislav Mekhanoshin
2479479285
[AMDGPU] Extend ComputePGMRSrc3 to gfx10+. NFCI. (#129289)
ComputePGMRSrc3 exists on gfx90a and on gfx10+. The current code
only expects gfx90a. This is NFCI since we do not fill it on
gfx10+ yet.
2025-03-03 08:22:15 -08:00
Stanislav Mekhanoshin
d19187f5fe
[AMDGPU] Move into SIProgramInfo and cache getFunctionCodeSize. NFCI. (#127111)
This moves the function as is; improvements to the estimate go into
a subsequent patch.
2025-02-17 18:22:48 -08:00
Lucas Ramirez
6206f5444f
[AMDGPU] Occupancy w.r.t. workgroup size range is also a range (#123748)
Occupancy (i.e., the number of waves per EU) depends, in addition to
register usage, on per-workgroup LDS usage as well as on the range of
possible workgroup sizes. Mirroring the latter, occupancy should
therefore be expressed as a range since different group sizes generally
yield different achievable occupancies.

`getOccupancyWithLocalMemSize` currently returns a scalar occupancy
based on the maximum workgroup size and LDS usage. With respect to the
workgroup size range, this scalar can be the minimum, the maximum, or
neither of the two of the range of achievable occupancies. This commit
fixes the function by making it compute and return the range of
achievable occupancies w.r.t. workgroup size and LDS usage; it also
renames it to `getOccupancyWithWorkGroupSizes` since it is the range of
workgroup sizes that produces the range of achievable occupancies.

Computing the achievable occupancy range is surprisingly involved.
Minimum/maximum workgroup sizes do not necessarily yield maximum/minimum
occupancies i.e., sometimes workgroup sizes inside the range yield the
occupancy bounds. The implementation finds these sizes in constant time;
heavy documentation explains the rationale behind the sometimes
relatively obscure calculations.

As a justifying example, consider a target with 10 waves / EU, 4 EUs/CU,
64-wide waves. Also consider a function with no LDS usage and a flat
workgroup size range of [513,1024].

- A group of 513 items requires 9 waves per group. Only 4 groups made up
of 9 waves each can fit fully on a CU at any given time, for a total of
36 waves on the CU, or 9 per EU. However, filling the remaining 40-36=4
wave slots as much as possible without decreasing the number of groups
reveals that a larger group of 640 items yields 40 waves on the CU, or
10 per EU.
- Similarly, a group of 1024 items requires 16 waves per group. Only 2
groups made up of 16 waves each can fit fully on a CU at any given time,
for a total of 32 waves on the CU, or 8 per EU. However, removing as
many waves as possible from the groups without being able to fit another
equal-sized group on the CU reveals that a smaller group of 896 items
yields 28 waves on the CU, or 7 per EU.

Therefore the achievable occupancy range for this function is not [8,9]
as the group size bounds directly yield, but [7,10].
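
The example can be checked by brute force; a quick sketch (a simplification
of the in-tree logic, which finds the bounds in constant time, and assuming
waves spread evenly over the EUs):

```cpp
// 10 waves/EU, 4 EUs/CU, 64-wide waves, no LDS, flat workgroup size range
// [513, 1024]: scan every group size in the range and track the occupancy.
#include <algorithm>
#include <cstdio>

int main() {
  const unsigned WavesPerEU = 10, EUsPerCU = 4, WaveSize = 64;
  const unsigned WaveSlotsPerCU = WavesPerEU * EUsPerCU; // 40
  unsigned MinOcc = WavesPerEU, MaxOcc = 0;
  for (unsigned GroupSize = 513; GroupSize <= 1024; ++GroupSize) {
    unsigned WavesPerGroup = (GroupSize + WaveSize - 1) / WaveSize;
    unsigned GroupsPerCU = WaveSlotsPerCU / WavesPerGroup;
    unsigned WavesOnCU = WavesPerGroup * GroupsPerCU;
    unsigned OccPerEU = WavesOnCU / EUsPerCU;
    MinOcc = std::min(MinOcc, OccPerEU);
    MaxOcc = std::max(MaxOcc, OccPerEU);
  }
  // Prints [7, 10]: 896 items give 7 waves/EU, 640 items give 10 waves/EU.
  std::printf("achievable occupancy range: [%u, %u]\n", MinOcc, MaxOcc);
  return 0;
}
```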

Naturally this change causes a lot of test churn as instruction
scheduling is driven by achievable occupancy estimates. In most unit
tests the flat workgroup size range is the default [1,1024] which,
ignoring potential LDS limitations, would previously produce a scalar
occupancy of 8 (derived from 1024) on a lot of targets, whereas we now
consider the maximum occupancy to be 10 in such cases. Most tests are
updated automatically and checked manually for sanity. I also manually
changed some non-automatically generated assertions when necessary.

Fixes #118220.
2025-01-23 16:07:57 +01:00
Janek van Oirschot
82944595fa
[AMDGPU] Change scope of resource usage info symbols (#114810)
Change scope of resource usage info MC symbols to align with the function linkage type
2025-01-21 13:10:06 +00:00
Austin Kerbow
2e5c298281
[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167)
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equipped with
incompatible firmware. If the hardware has compatible firmware, the 256
bytes at the start of the kernel entry will be skipped. This skipping is
done automatically by hardware that supports the feature.

A pass is added which is intended to be run at the very end of the
pipeline to avoid any optimizations that would assume the prologue is a
real predecessor block to the actual code start. In reality we have two
possible entry points for the function:
1. The optimized path that supports kernarg preloading, which begins at an
offset of 256 bytes.
2. The backwards compatible entry point, which starts at offset 0.
2025-01-10 11:39:02 -08:00
Shilei Tian
86734c8577 [NFC][AMDGPU] Remove redundant code in AMDGPUAsmPrinter.cpp 2024-11-20 15:08:26 -05:00
Matt Arsenault
5a556d55fb
AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309) 2024-11-18 10:48:56 -08:00
Shilei Tian
6548b6354d Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
2024-11-08 20:21:16 -05:00
Janek van Oirschot
7f60f1312a
[AMDGPU] Fix resource usage information for unnamed functions (#115320)
Resource usage information would try to overwrite unnamed functions if
there are multiple within the same compilation unit. This aims to either
use the `MCSymbol` assigned to the unnamed function (i.e.,
`CurrentFnSym`), or, rematerialize the `MCSymbol` for the unnamed
function.
2024-11-07 18:24:54 +00:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Thomas Symalla
b95d50e5d8
Add and call AMDGPUMCResourceInfo::reset method (#110818)
When compiling multiple pipelines, the `AMDGPUMCResourceInfo` instance in
`AMDGPUAsmPrinter` gets re-used even after finalization, so `finalize()`
gets called multiple times.

Add a reset method and call it in
`AMDGPUAsmPrinter::doFinalization`.

A different approach would be to make it a `unique_ptr`.

---------

Co-authored-by: Thomas Symalla <tsymalla@amd.com>
2024-10-02 14:17:01 +02:00
Janek van Oirschot
c897c13dde
[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913)
Converts the AMDGPUResourceUsageAnalysis pass from a Module pass to a
MachineFunction pass. Moves function resource info propagation to the MC
layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for
every function resource, which the emitters have been prepped for.

Fixes https://github.com/llvm/llvm-project/issues/64863
2024-09-30 11:43:34 +01:00
Austin Kerbow
954ab83e6a
[AMDGPU] Include unused preload kernarg in KD total SGPR count (#104743)
Unlike with the implicitly preloaded data UserSGPRs, firmware is unable to
handle cases where SGPRs for kernel arguments contain preloaded data but
are not explicitly referenced in the kernel. We need to include these
preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation so as
not to clobber SGPRs in adjacent waves.
2024-09-23 13:48:22 -07:00
Janek van Oirschot
bfce1aae76
[AMDGPU] MCExpr printing helper with KnownBits support (#95951)
Walks over the MCExpr and uses KnownBits to deduce whether an expression
is known and if so, prints said known value. Should support the most
common MCExpr cases for AMDGPU metadata.
2024-08-15 13:43:13 +01:00
Joseph Huber
f39bd0a24e
[AMDGPU] Do not print kernel-resource-usage information on non-kernels (#99720)
Summary:
This pass is used to get helpful information about the kernel resources
without needing to inspect the binary. However, it currently prints for
every function. For non-kernel functions these values will always be zero,
so it's just spam on the terminal, at best an indication that a function
wasn't internalized / optimized out. This patch makes it only print for
kernels to make it more useful in practice.
2024-07-22 07:07:51 -05:00
Jay Foad
63fae3ed65
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) 2024-07-17 21:11:00 +01:00
Jay Foad
5e338f1f4a [AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC. 2024-07-17 08:27:35 +01:00
Jay Foad
f10a78b7e4 [AMDGPU] clang-tidy: use std::make_unique. NFC. 2024-07-17 07:58:09 +01:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
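
A small illustration of the new helper versus the old pattern:

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// ABI alignment of a pointer in address space 0, taken from the module's
// data layout.
Align pointerABIAlign(const Function &F) {
  // Old pattern: const DataLayout &DL = F.getParent()->getDataLayout();
  const DataLayout &DL = F.getDataLayout(); // new helper
  return DL.getPointerABIAlignment(/*AddressSpace=*/0);
}
```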
2024-06-28 08:36:49 +02:00
Janek van Oirschot
17eaa23f7e
[AMDGPU] MCExpr-ify AMDGPU HSAMetadata (#94788)
Enables MCExpr for HSAMetadata, particularly, HSAMetadata's msgpack format.
2024-06-26 16:39:08 +01:00
Ivan Kosarev
13ed349c44
[AMDGPU][NFC] Rename AMDGPUVariadicMCExpr to AMDGPUMCExpr. (#96618)
Some of our custom expressions are not variadic and there seems to be
little benefit in mentioning the variadic nature of expression nodes in
the name anyway.
2024-06-25 15:32:09 +01:00
Nicolai Hähnle
7e9b49f6b8
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. This will be used in a subsequent change.
2024-06-25 16:20:51 +02:00
Janek van Oirschot
3d1705d00c
MCExpr-ify AMDGPU PALMetadata (#93236)
Allows MCExprs as passed values to PALMetadata. Also adds related
`DelayedMCExpr` classes which serve as a pseudo-fixup to resolve MCExprs
as late as possible (i.e., right before emit through string or blob,
where they should be resolvable).
2024-06-13 13:59:31 +01:00
Jie Fu
a1c29df572 [AMDGPU] Fix -Wunused-variable in AMDGPUAsmPrinter.cpp (NFC)
/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:653:13:
error: unused variable 'PGMRSrc3' [-Werror,-Wunused-variable]
    int64_t PGMRSrc3;
            ^
1 error generated.
2024-06-10 20:35:09 +08:00
Janek van Oirschot
bc022b406d
[AMDGPU] Support SIProgramInfo MCExpr for comments and remarks (#94350)
Eliminates the assumption that MCExpr comments/remarks being emitted are
always resolvable.
2024-06-10 12:34:06 +01:00
Janek van Oirschot
a699ccbf0c
MCExpr-ify amd_kernel_code_t (#91587)
Redefines the amd_kernel_code_t struct with MCExprs for members that would be
derived from SIProgramInfo MCExpr members.
2024-05-22 13:45:45 +01:00
Fangrui Song
9500a5d02e [MC] Make UseAssemblerInfoForParsing mostly true
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).

`getUseAssemblerInfoForParsing` was added to make `clang -c` handling of
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.

I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, an MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.

```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
  ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc     # succeeded

% clang -c b.cc     # succeeded with this patch
% clang -S b.cc     # still failed
<inline asm>:4:5: error: expected absolute expression
    4 | .if . -_start == 1
      |     ^
1 error generated.
```

However, removing `getUseAssemblerInfoForParsing` would make
MCDwarfFrameEmitter::Emit (for .eh_frame FDE) slow (~4% compile time
regression for sqlite3.c amalgamation) due to expensive
`AttemptToFoldSymbolOffsetDifference`. For now, make
`UseAssemblerInfoForParsing` false in MCDwarfFrameEmitter::Emit.

Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841

Pull Request: https://github.com/llvm/llvm-project/pull/91082
2024-05-19 23:35:15 -07:00
Nikita Popov
fa750f09be Revert "[MC] Remove UseAssemblerInfoForParsing"
This reverts commit 03c53c69a367008da689f0d2940e2197eb4a955c.

This causes very large compile-time regressions in some cases,
e.g. sqlite3 at O0 regresses by 5%.
2024-05-16 09:56:07 +09:00
Fangrui Song
03c53c69a3
[MC] Remove UseAssemblerInfoForParsing
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).

`getUseAssemblerInfoForParsing` was added to make `clang -c` handling of
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.

I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, an MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.

```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
  ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc     # succeeded

% clang -c b.cc     # succeeded with this patch
% clang -S b.cc     # still failed
<inline asm>:4:5: error: expected absolute expression
    4 | .if . -_start == 1
      |     ^
1 error generated.
```

Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841

Pull Request: https://github.com/llvm/llvm-project/pull/91082
2024-05-15 09:18:39 -07:00
Jie Fu
c2a87d7e03 [AMDGPU] Remove unused lambda capture (NFC)
/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:733:30:
error: lambda capture 'Ctx' is not used [-Werror,-Wunused-lambda-capture]
  auto TryGetMCExprValue = [&Ctx](const MCExpr *Value, uint64_t &Res) -> bool {
                            ~^~~
1 error generated.
2024-05-09 20:15:41 +08:00
Janek van Oirschot
d86b68afd7
MCExpr-ify SIProgramInfo (#88257)
Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.
2024-05-09 13:02:32 +01:00
Kazu Hirata
c18bcd0a57
[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  38 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-05 13:43:10 -07:00
Carl Ritson
44648ccb8b
[AMDGPU] Always emit lds_size in PAL ELF Metadata 3.0 (#87222)
Emit lds_size for all shader types in PAL metadata.
2024-05-03 17:01:03 +09:00
Janek van Oirschot
1103a2a337
Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494)
Kernel descriptor attributes, with their respective emit and asm parse functionality, are converted to MCExpr.

Relands #80855 with fixes
2024-03-27 11:59:56 +00:00