When computing the number of registers required by entry functions, the
`AMDGPUAsmPrinter` needs to take into account both the register usage
computed by the `AMDGPUResourceUsageAnalysis` pass, and the number
of registers initialized by the hardware. At the moment, the way it
computes the latter is different for graphics vs compute, due to differences in
the implementation. For kernels, all the information needed is available in
the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over
the `Function` arguments in the `AMDGPUAsmPrinter`. This pretty much
repeats some of the logic from instruction selection.
This patch introduces two new members to `SIMachineFunctionInfo`, one
for SGPRs and one for VGPRs. Both are computed during instruction
selection and then used by the `AMDGPUAsmPrinter`, removing the need
to refer to the `Function` when printing assembly.
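Roughly, the new state looks like the sketch below (member and accessor names here are illustrative, not necessarily the ones used in the patch):

```cpp
// Sketch only: hypothetical members mirroring what this patch describes; the
// actual names in SIMachineFunctionInfo may differ.
class EntryFunctionRegisterCountsSketch {
  // Registers initialized by the hardware for this entry point, recorded
  // during instruction selection and later read by the AMDGPUAsmPrinter.
  unsigned NumHwInitializedSGPRs = 0;
  unsigned NumHwInitializedVGPRs = 0;

public:
  void setNumHwInitializedSGPRs(unsigned N) { NumHwInitializedSGPRs = N; }
  void setNumHwInitializedVGPRs(unsigned N) { NumHwInitializedVGPRs = N; }
  unsigned getNumHwInitializedSGPRs() const { return NumHwInitializedSGPRs; }
  unsigned getNumHwInitializedVGPRs() const { return NumHwInitializedVGPRs; }
};
```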
This patch is NFC except for the fact that we now add the extra SGPRs
(VCC, XNACK etc) to the number of SGPRs computed for graphics entry points.
I'm not sure why these weren't included before. It would be nice if
someone could confirm if that was just an oversight or if we have some docs
somewhere that I haven't managed to find. Only one test is affected (its SGPR
usage increases because we now take into account the XNACK registers).
Performance testing shows no significant gains or losses on graphics
workloads, so this is mostly to make the behavior consistent across all
supported OSes instead of special-casing HSA.
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A subset of these changes was generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because these files do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for these functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large header with a lot of preprocessor x-macro
machinery, I was concerned it would unnecessarily impact compile times.
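For illustration, an annotated definition looks roughly like this (the target, file, and exact macro placement below are examples, not copied from the patch):

```cpp
// Illustrative only: the real definitions live in .cpp files under
// llvm/lib/Target and may spell the annotation slightly differently.
#include "llvm/Support/Compiler.h" // provides LLVM_ABI

extern "C" LLVM_ABI void LLVMInitializeX86Target() {
  // ... unchanged body: register the X86 target as before; only the
  // LLVM_ABI annotation on the definition is new.
}
```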
In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.
## Validation
Local builds and tests were run to validate cross-platform compatibility. This
included llvm, clang, and lldb in the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
This reverts commit 130080fab11cde5efcb338b77f5c3b31097df6e6 because it
causes issues in test cases similar to coalescer_remat.ll [1], i.e. when
we use a VGPR tuple but write only to its lower parts. The high VGPRs
would then not be included in the vgpr_count, and accessing them would
be an out-of-bounds violation.
[1]
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/coalescer_remat.ll
Don't count register uses when determining the maximum number of
registers used by a function. Count only the defs. This is really an
underestimate of the true register usage, but in practice that's not
a problem because if a function uses a register, then it has either
defined it earlier, or some other function that executed before has
defined it.
In particular, the register counts are used:
1. When launching an entry function - in which case we're safe because
the register counts of the entry function will include the register
counts of all callees.
2. At function boundaries in dynamic VGPR mode. In this case it's safe
because whenever we set the new VGPR allocation we take into account
the outgoing_vgpr_count set by the middle-end.
The main advantage of doing this is that the artificial VGPR arguments
used only for preserving the inactive lanes when using the
llvm.amdgcn.init.whole.wave intrinsic are no longer counted. This
enables us to allocate only the registers we need in dynamic VGPR mode.
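As a rough sketch of the def-only counting (not the in-tree analysis, which also handles register tuples, AGPRs, calls, etc.; the `IsVGPR`/`HwIndex` callbacks below are made up):

```cpp
// Sketch: walk a machine function and track the highest VGPR index that is
// written, ignoring uses entirely.
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/Register.h"
#include <algorithm>

static unsigned countDefinedVGPRs(
    const llvm::MachineFunction &MF,
    llvm::function_ref<bool(llvm::Register)> IsVGPR,
    llvm::function_ref<unsigned(llvm::Register)> HwIndex) {
  unsigned NumVGPRs = 0;
  for (const llvm::MachineBasicBlock &MBB : MF)
    for (const llvm::MachineInstr &MI : MBB)
      for (const llvm::MachineOperand &MO : MI.operands())
        if (MO.isReg() && MO.isDef() && IsVGPR(MO.getReg()))
          NumVGPRs = std::max(NumVGPRs, HwIndex(MO.getReg()) + 1);
  return NumVGPRs; // highest written VGPR index + 1; uses are ignored
}
```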
---------
Co-authored-by: Thomas Symalla <5754458+tsymalla@users.noreply.github.com>
Register assembly printer passes in the pass registry.
This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.
Adds a `char &ID` parameter to the `AsmPrinter` constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. The parameter currently has a default value so it won't break
any targets that have not been updated.
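A hedged sketch of what a target printer can now do (names are illustrative and the surrounding boilerplate is elided):

```cpp
// Sketch: a target asm printer that owns a pass ID and forwards it to the
// AsmPrinter base constructor via the new `char &ID` parameter.
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/Target/TargetMachine.h"
#include <memory>
#include <utility>

class MyTargetAsmPrinter : public llvm::AsmPrinter {
public:
  static char ID; // this address identifies the pass in the registry

  MyTargetAsmPrinter(llvm::TargetMachine &TM,
                     std::unique_ptr<llvm::MCStreamer> Streamer)
      : llvm::AsmPrinter(TM, std::move(Streamer), ID) {}
};

char MyTargetAsmPrinter::ID = 0;
```

Registration then goes through the usual `INITIALIZE_PASS` machinery, which is what makes the `<target>-asm-printer` name visible to `llc -start-before`.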
This implements the result of the discussion at:
https://discourse.llvm.org/t/rfc-report-fatal-error-and-the-default-value-of-gencrashdialog/73587
There are two different use cases for report_fatal_error, so replace it
with two functions reportFatalInternalError() and
reportFatalUsageError(). The former indicates a bug in LLVM and
generates a crash dialog. The latter does not. The names have been
suggested by rnk and people seemed to like them.
This replaces a lot of the usages that passed an explicit value for
GenCrashDiag. I did not bulk replace remaining report_fatal_error usage
-- they probably require case by case review for which function to use.
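For example, a call site that previously passed `GenCrashDiag` explicitly now calls the function matching its intent (the header locations below are my assumption):

```cpp
// Sketch: replace report_fatal_error(..., /*GenCrashDiag=*/...) with the
// function that matches the kind of failure.
#include "llvm/Support/Error.h"
#include "llvm/Support/ErrorHandling.h"

void validate(bool TripleSupported, bool InvariantHolds) {
  // Old style:
  //   report_fatal_error("unsupported target triple", /*GenCrashDiag=*/false);
  if (!TripleSupported)
    llvm::reportFatalUsageError("unsupported target triple"); // user error, no crash dialog
  if (!InvariantHolds)
    llvm::reportFatalInternalError("invariant violated");     // LLVM bug, crash dialog
}
```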
The CWSR trap handler needs to save and restore the VGPRs. When dynamic
VGPRs are in use, the fixed function hardware will only allocate enough
space for one VGPR block. The rest will have to be stored in scratch, at
offset 0.
This patch allocates the necessary space by:
- generating a prologue that checks at runtime whether we're on a compute
queue (since CWSR only works on compute queues); for this we check
the ME_ID bits of the ID_HW_ID2 register, and if they are non-zero
we can assume we're on a compute queue and initialize the SP and FP with
enough room for the dynamic VGPRs
- forcing all compute entry functions to use an FP so they can access
their locals/spills correctly (this isn't ideal, but it's the quickest to
implement)
Note that at the moment we allocate enough space for the theoretical
maximum number of VGPRs that can be allocated dynamically (for blocks of
16 registers this is 128, minus the first 16, which are already
allocated by the fixed-function hardware). Future patches
may decide to allocate less if they can prove the shader never allocates
that many blocks.
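For a rough sense of scale, assuming wave32 (dynamic VGPRs are a wave32-only mode) and 4 bytes per lane per VGPR, the per-wave save area works out as follows; this is back-of-the-envelope arithmetic, not the backend's exact computation:

```cpp
// Back-of-the-envelope only; the 128/16 constants come from the description
// above, while wave32 and 4 bytes per lane per VGPR are my assumptions.
#include <cstdio>

int main() {
  const unsigned MaxDynamicVGPRs = 128; // theoretical maximum with 16-VGPR blocks
  const unsigned HwAllocatedVGPRs = 16; // first block comes from fixed-function hardware
  const unsigned Lanes = 32;            // wave32
  const unsigned BytesPerLanePerVGPR = 4;

  unsigned SpilledVGPRs = MaxDynamicVGPRs - HwAllocatedVGPRs;         // 112
  unsigned BytesPerWave = SpilledVGPRs * Lanes * BytesPerLanePerVGPR; // 14336
  std::printf("CWSR VGPR save area per wave: %u bytes\n", BytesPerWave);
  return 0;
}
```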
Also note that this should not affect any reported stack sizes (e.g. PAL
backend_stack_size etc).
This represents a hardware mode supported only for wave32 compute
shaders. When enabled, we set the `.dynamic_vgpr_en` field of
`.compute_registers` to true in the PAL metadata.
This will be changed to use an attribute after downstream consumers
have been migrated.
From GFX10 onwards it is possible to employ benevolent scheduling of
waves. This patch unconditionally enables, for the `amdhsa` OS, the bit
which controls that capability, as it is beneficial for algorithms that
rely on more complex concurrent coordination and it is generally
performance neutral otherwise.
Occupancy (i.e., the number of waves per EU) depends, in addition to
register usage, on per-workgroup LDS usage as well as on the range of
possible workgroup sizes. Mirroring the latter, occupancy should
therefore be expressed as a range since different group sizes generally
yield different achievable occupancies.
`getOccupancyWithLocalMemSize` currently returns a scalar occupancy
based on the maximum workgroup size and LDS usage. With respect to the
workgroup size range, this scalar can be the minimum, the maximum, or
neither of the two of the range of achievable occupancies. This commit
fixes the function by making it compute and return the range of
achievable occupancies w.r.t. workgroup size and LDS usage; it also
renames it to `getOccupancyWithWorkGroupSizes` since it is the range of
workgroup sizes that produces the range of achievable occupancies.
Computing the achievable occupancy range is surprisingly involved.
Minimum/maximum workgroup sizes do not necessarily yield maximum/minimum
occupancies; i.e., sometimes workgroup sizes strictly inside the range yield
the occupancy bounds. The implementation finds these sizes in constant time;
heavy documentation explains the rationale behind the sometimes
relatively obscure calculations.
As a justifying example, consider a target with 10 waves / EU, 4 EUs/CU,
64-wide waves. Also consider a function with no LDS usage and a flat
workgroup size range of [513,1024].
- A group of 513 items requires 9 waves per group. Only 4 groups made up
of 9 waves each can fit fully on a CU at any given time, for a total of
36 waves on the CU, or 9 per EU. However, filling the remaining
40-36=4 wave slots as much as possible without decreasing the number of groups
reveals that a larger group of 640 items yields 40 waves on the CU, or
10 per EU.
- Similarly, a group of 1024 items requires 16 waves per group. Only 2
groups made up of 16 waves each can fit fully on a CU at any given time,
for a total of 32 waves on the CU, or 8 per EU. However, removing as
many waves as possible from the groups without being able to fit another
equal-sized group on the CU reveals that a smaller group of 896 items
yields 28 waves on the CU, or 7 per EU.
Therefore the achievable occupancy range for this function is not [8,9]
as the group size bounds directly yield, but [7,10].
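A brute-force check of this example (the in-tree code finds the bounds in constant time; rounding waves-per-EU down and ignoring other per-CU limits are simplifications on my part):

```cpp
// Scan every group size in [513,1024] and compute the resulting waves per EU
// for the example target above (10 waves/EU, 4 EUs/CU, 64-wide waves).
#include <algorithm>
#include <cstdio>

int main() {
  const unsigned WavesPerEU = 10, EUsPerCU = 4, WaveSize = 64;
  const unsigned MaxWavesPerCU = WavesPerEU * EUsPerCU; // 40
  unsigned MinOcc = ~0u, MaxOcc = 0;
  for (unsigned GroupSize = 513; GroupSize <= 1024; ++GroupSize) {
    unsigned WavesPerGroup = (GroupSize + WaveSize - 1) / WaveSize; // ceil
    unsigned GroupsPerCU = MaxWavesPerCU / WavesPerGroup; // whole groups only
    unsigned WavesOnCU = GroupsPerCU * WavesPerGroup;
    unsigned Occ = WavesOnCU / EUsPerCU;                  // waves per EU
    MinOcc = std::min(MinOcc, Occ);
    MaxOcc = std::max(MaxOcc, Occ);
  }
  std::printf("achievable occupancy range: [%u, %u]\n", MinOcc, MaxOcc); // [7, 10]
  return 0;
}
```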
Naturally this change causes a lot of test churn as instruction
scheduling is driven by achievable occupancy estimates. In most unit
tests the flat workgroup size range is the default [1,1024] which,
ignoring potential LDS limitations, would previously produce a scalar
occupancy of 8 (derived from 1024) on a lot of targets, whereas we now
consider the maximum occupancy to be 10 in such cases. Most tests are
updated automatically and checked manually for sanity. I also manually
changed some non-automatically generated assertions when necessary.
Fixes #118220.
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equipped with
incompatible firmware. If hardware has compatible firmware the 256 bytes
at the start of the kernel entry will be skipped. This skipping is done
automatically by hardware that supports the feature.
A pass is added which is intended to be run at the very end of the
pipeline to avoid any optimizations that would assume the prologue is a
real predecessor block of the actual code start. In reality we have two
possible entry points for the function:
1. The optimized path that supports kernarg preloading, which begins at an offset of 256 bytes.
2. The backwards-compatible entry point, which starts at offset 0.
Resource usage information would try to overwrite unnamed functions if
there are multiple within the same compilation unit. This patch either
uses the `MCSymbol` assigned to the unnamed function (i.e.,
`CurrentFnSym`) or rematerializes the `MCSymbol` for the unnamed
function.
When compiling multiple pipelines, the `MCRegisterInfo` instance in
`AMDGPUAsmPrinter` gets reused even after finalization, so `finalize()`
ends up being called on it multiple times.
Add a reset method and call it in
`AMDGPUAsmPrinter::doFinalization`.
A different approach would be to make it a `unique_ptr`.
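The shape of the fix, as a minimal sketch (the class, member, and method names are illustrative, not the actual ones):

```cpp
// Sketch only: an object that must not be finalized twice gains a reset(),
// which the asm printer calls when it finishes a compilation.
#include <cassert>

class FinalizableInfoSketch {
  bool Finalized = false;

public:
  void finalize() {
    assert(!Finalized && "finalize() called twice without reset()");
    Finalized = true;
    // ... resolve and emit the collected state ...
  }
  // Called from AMDGPUAsmPrinter::doFinalization so the next pipeline
  // compilation can reuse the instance and call finalize() again.
  void reset() {
    Finalized = false;
    // ... clear per-compilation state ...
  }
};
```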
---------
Co-authored-by: Thomas Symalla <tsymalla@amd.com>
Converts the AMDGPUResourceUsageAnalysis pass from a Module pass to a
MachineFunction pass. Moves function resource info propagation to the MC layer
(through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource, which the emitters have been prepared to handle.
Fixes https://github.com/llvm/llvm-project/issues/64863
Unlike with implicitly preloaded data in UserSGPRs, firmware is unable to
handle cases where SGPRs for kernel arguments contain preloaded data but
are not explicitly referenced in the kernel. We need to include
these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation
so as not to clobber SGPRs in adjacent waves.
Walks over the MCExpr and uses KnownBits to deduce whether an expression
is known and, if so, prints the known value. This should support the most
common MCExpr cases for AMDGPU metadata.
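The final step amounts to something like the sketch below (only the KnownBits calls are standard API; the fallback is simplified, whereas the real code keeps printing the symbolic expression):

```cpp
// Sketch: once the walk over the expression has produced a KnownBits value,
// print the constant if every bit is known.
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/raw_ostream.h"

static void printIfKnown(const llvm::KnownBits &KB, llvm::raw_ostream &OS) {
  if (KB.isConstant())      // every bit of the expression's value was deduced
    OS << KB.getConstant(); // print the fully known value
  else
    OS << "<unresolved>";   // placeholder for the symbolic expression
}
```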
Summary:
This pass is used to get helpful information about the kernel resources
without needing to inspect the binary. However, it currently prints for
every function, even though for non-kernels these values will always be
zero, so it's just spam on the terminal, at best an indication that a
function wasn't internalized or optimized out. This patch makes it print
only for kernels to make it more useful in practice.
Some of our custom expressions are not variadic and there seems to be
little benefit in mentioning the variadic nature of expression nodes in
the name anyway.
Allows MCExprs as passed values to PALMetadata. Also adds related
`DelayedMCExpr` classes which serve as a pseudo-fixup to resolve MCExprs
as late as possible (i.e., right before emit through string or blob,
where they should be resolvable).
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handle
inline assembly similarly to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, an MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
However, removing `getUseAssemblerInfoForParsing` would make
MCDwarfFrameEmitter::Emit (for .eh_frame FDE) slow (~4% compile time
regression for sqlite3.c amalgamation) due to expensive
`AttemptToFoldSymbolOffsetDifference`. For now, make
`UseAssemblerInfoForParsing` false in MCDwarfFrameEmitter::Emit.
Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
This reverts commit 03c53c69a367008da689f0d2940e2197eb4a955c.
This causes very large compile-time regressions in some cases,
e.g. sqlite3 at O0 regresses by 5%.
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handle
inline assembly similarly to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, an MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
Close #62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
38 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
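Call sites change mechanically, e.g.:

```cpp
// Before/after sketch of the replacement.
#include "llvm/ADT/StringRef.h"

bool isFoo(llvm::StringRef S) {
  // return S.equals("foo");  // old style, being removed
  return S == "foo";          // preferred: mirrors std::string_view
}

bool isNotStr(llvm::StringRef LongExpression) {
  // return !LongExpression.equals("str");  // harder to read
  return LongExpression != "str";
}
```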