llvm-project

Author	SHA1	Message	Date
Austin Kerbow	3e4efe3ed4	[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling (#169616 ) This patch adds the initial coexec scheduler scaffold for machine learning workloads on gfx1250. It introduces function and module-level controls for selecting the AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type` module flag that maps ML workloads to coexec preRA scheduling and a nop postRA scheduler by default. It also updates the coexec scheduler to use a simplified top-down candidate selection path that considers both available and pending queues through a single flow, setting up follow-on heuristic work.	2026-03-23 09:30:01 -07:00
Jay Foad	79d1a2c418	[AMDGPU] Standardize on using AMDGPU::getNullPointerValue. NFC. (#187037 ) AMDGPUTargetMachine also had a static method which did the same thing. Remove it so that we have a single source of truth.	2026-03-17 17:08:16 +00:00
Yoonseo Choi	0ab9053327	[AMDGPU] Cgscc amdgpu attributor boilerplate NFC (#179719 ) This PR adds boilerplate for a CGSCC AMDGPUAttributor pass (amdgpu-attributor-cgscc) by refactoring the existing Module AMDGPUAttributor pass (amdgpu-attributor). The CGSCC AMDGPUAttributor pass sets `AttributorConfig.IsModulePass = false` and makes Attributor's `Functions` set contain only the functions in an SCC. The main implementations of abstract attributes have not changed - NFC. In future work, some of the AMDGPU abstract attributors might move to be handled by the CGSCC pass. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2026-03-09 15:14:03 -04:00
Mirko Brkušanin	d0f50d5574	[AMDGPU] Remove DX10_CLAMP and IEEE bits from gfx1170 (#182107 ) Add `DX10ClampAndIEEEMode` feature and set it for every subtarget prior to gfx1170	2026-03-04 12:16:41 +01:00
Matt Arsenault	3a8081cedc	AMDGPU: Stop checking for r600 in printf pass (#183536 ) Avoid adding the pass to the pipeline in the first place.	2026-02-26 17:18:36 +01:00
Aiden Grossman	abc443ba0a	[CodeGen][NewPM] Adjust pipeline for AsmPrinter AsmPrinter needs to be split into three passes (begin, per MF, end) to avoid the need to materialize all machine functions at the same time. Update the CodeGenPassBuilder hooks for this. Reviewers: aeubanks, paperchalice, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/182795	2026-02-23 17:23:25 -08:00
Aiden Grossman	683d15f810	[CodeGen][NewPM] Plumb MCContext through buildCodeGenPipeline Otherwise we cannot create an MCStreamer without getting MMI, which we cannot do until we have started running AsmPrinter without also plumbing MMI through CodeGenPassBuilder. Reviewers: arsenm, paperchalice, aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/182794	2026-02-23 17:21:32 -08:00
Dark Steve	9e6a6be8a8	[AMDGPU] Remove AMDGPUArgumentUsageInfo pass (#182490 ) `AMDGPUArgumentUsageInfo` provided a per-function map that `lowerFormalArguments` would write each function's implicit argument register layout into, and `passSpecialInputs` would read back when lowering calls to look up the callee's layout. This per-function map is redundant for all non-entry callees, which already use the same `FixedABIFunctionInfo` register layout. GlobalISel already used `FixedABIFunctionInfo` unconditionally. This change makes SelectionDAG do the same.	2026-02-23 18:47:01 +05:30
Teja Alaghari	a03f82d7e5	[AMDGPU][NPM] Add target-specific register allocation options (#178889 ) Add below AMDGPU-specific options for its SGPR, WWM & VGPR registers allocation in NPM - - `-sgpr-regalloc-npm` - `-wwm-regalloc-npm` - `-vgpr-regalloc-npm`	2026-02-04 11:27:47 +05:30
Carl Ritson	5cc4b05380	[AMDGPU] Add scheduling DAG mutation for hazard latencies (#170075 ) Improve waitcnt merging in ML kernel loops by increasing latencies on VALU writes to SGPRs. Specifically this helps with the case of V_CMP output feeding V_CNDMASK instructions.	2026-02-03 11:10:28 +09:00
Vikram Hegde	537276af2d	[AMDGPU][NPM] Complete fast regalloc pipeline (#174096 )	2026-01-23 20:47:54 +01:00
Vikram Hegde	ced1c00c91	[AMDGPU][NPM] Enable "AMDGPURewriteAGPRCopyMFMAPass" (#173487 )	2026-01-21 17:57:34 +01:00
Vikram Hegde	324888bdb6	[AMDGPU][NPM] Obey "enable-amdgpu-aa" option (#173486 )	2026-01-21 13:16:05 +05:30
Vikram Hegde	e2e2c50beb	[AMDGPU][NPM] Disable few non useful passes (#172796 ) Matches the legacy pipeline	2026-01-01 13:19:41 +05:30
Vikram Hegde	3b7a97379b	[AMDGPU][NPM] add "addPostBBSections()" to NPM (#172793 ) Matches Legacy pipeline, GCNPassConfig::addPostBBSections()	2025-12-31 10:39:33 +05:30
Aiden Grossman	5678c939e8	[CodeGen][NPM] Do not implicitly flush pipeline when switching to CGSCC (#173315 )	2025-12-22 16:33:57 -08:00
Aiden Grossman	5f4fc30b1c	[NPM] Remove unused includes for CodeGenPassBuilder (#172575 ) There are a couple around. Remove them to better comply with IWYU.	2025-12-21 19:24:35 +00:00
Aiden Grossman	6b183f4cfd	[Codegen][NewPM] Explicitly Nest Passes in CodegenPassBuilder (#169867 ) This implements the major piece of https://discourse.llvm.org/t/rfc-codegen-new-pass-manager-pipeline-construction-design/84659, making it explicit when we break the function pipeline up. We essentially get rid of the AddPass and AddMachinePass helpers and replace them with explicit functions for the pass types. The user then needs to explicitly call flushFPMstoMPM before breaking. This is sort of a hybrid of the current construction and what the RFC proposed. The alternative would be passing around FunctionPassManagers and having the pipeline actually explicitly constructed. I think this compromises ergonomics slightly (needing to pass a FPM in many more places). It is also nice to assert that the function pass manager is empty when adding a module pass, which is easier when CodegenPassBuilder owns the FPM and MFPM.	2025-12-16 15:42:28 -08:00
Vikram Hegde	c590b35f0f	[AMDGPU][NPM] Enable SIModeRegister and SIInsertHardclauses passes (#168831 ) Passes already ported.	2025-12-09 14:01:15 +05:30
Dark Steve	cc19f420b9	[AMDGPU][NPM] Port AMDGPUArgumentUsageInfo to NPM (#170886 ) Port AMDGPUArgumentUsageInfo analysis to the NPM to fix suboptimal code generation when NPM is enabled by default. Previously, DAG.getPass() returns nullptr when using NPM, causing the argument usage info to be unavailable during ISel. This resulted in fallback to FixedABIFunctionInfo which assumes all implicit arguments are needed, generating unnecessary register setup code for entry functions. Fixes LLVM::CodeGen/AMDGPU/cc-entry.ll Changes: - Split AMDGPUArgumentUsageInfo into a data class and NPM analysis wrapper - Update SIISelLowering to use DAG.getMFAM() for NPM path - Add RequireAnalysisPass in addPreISel() to ensure analysis availability This follows the same pattern used for PhysicalRegisterUsageInfo.	2025-12-08 20:38:00 +05:30
Matt Arsenault	1d30ae6e40	AMDGPU: Stop forcing RequiresCodeGenSCCOrder (#169522 ) This hasn't been strictly necessary since c897c13dde. Practically this makes little difference; we still enable IPRA by default which implies this option. By removing this explicit force, -enable-ipra=0 has the expected change in the pass pipeline to remove the DummyCGSCC runs.	2025-11-25 13:23:55 -05:00
Matt Arsenault	84df446af9	AMDGPU: Remove DummyCGSCC use after buffer lowering passes (#169519 ) The fixme the comment refers to was removed.	2025-11-25 12:53:13 -05:00
tyb0807	29d1e1857d	[AMDGPU] Enable serializing of allocated preload kernarg SGPRs info (#168374 ) - Support serialization of the number of allocated preload kernarg SGPRs - Support serialization of the first preload kernarg SGPR allocated Together they enable reconstructing correctly MIR with preload kernarg SGPRs.	2025-11-22 14:03:14 -08:00
Carl Ritson	711a295479	[AMDGPU] Ignore wavefront barrier latency during scheduling DAG mutation (#168500 ) Do not add latency for wavefront and singlethread scope fences during barrier latency DAG mutation. These scopes do not typically introduce any latency and adjusting schedules based on them significantly impacts latency hiding.	2025-11-19 17:49:14 +09:00
Chaitanya	49d5bb0ad0	[AMDGPU] Add amdgpu-lower-exec-sync pass to lower named-barrier globals (#165692 ) This PR introduces `amdgpu-lower-exec-sync` pass which specifically lowers named-barrier LDS globals introduced by #114550 . Changes include: - Moving the logic of lowering named-barrier LDS globals from `amdgpu-lower-module-lds` pass to this new pass. - This PR adds the pass to pipeline, remove the existing lowering logic for named-barrier LDS in `amdgpu-lower-module-lds` See #161827 for discussion on this topic.	2025-11-17 10:08:40 +05:30
Jakub Kuderski	4c21d0cb14	[ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020 ) Update all uses of variadic `.Cases` to use the initializer list overload instead. I plan to mark variadic `.Cases` as deprecated in a followup PR. For more context, see https://github.com/llvm/llvm-project/pull/163117.	2025-11-02 00:12:33 +00:00
Pankaj Dwivedi	4d7093b806	[AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (#162819 ) This PR enables the AMDGPUUniformIntrinsicCombine pass in the llc pipeline. It also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag to enable/disable the pass. See the PR: https://github.com/llvm/llvm-project/pull/116953	2025-10-30 12:32:32 +05:30
Pankaj Dwivedi	20532c0aab	[AMDGPU] make AMDGPUUniformIntrinsicCombine a function pass (#165265 ) There has been an issue(using function analysis inside the module pass in OPM) integrating this pass into the LLC pipeline, which currently lacks NPM support. I tried finding a way to get the per-function analysis, but it seems that in OPM, we don't have that option. So the best approach would be to make it a function pass. Ref: https://github.com/llvm/llvm-project/pull/116953	2025-10-29 11:56:43 +05:30
Sam Clegg	7ebc3dbe8b	[llvm] Make getEffectiveRelocModel helper consistent across targets. NFC (#165121 ) - On targets that don't require the Triple, don't pass it. - Use `.value_or` to where possible.	2025-10-25 21:20:20 -07:00
paperchalice	f3df058b03	[Passes] Report error when pass requires target machine (#142550 ) Fixes #142146 Do nullptr check when pass accept `const TargetMachine &` in constructor, but it is still not exhaustive.	2025-10-23 12:57:03 +08:00
Carl Ritson	af6fa77a35	[AMDGPU] Add DAG mutation to improve scheduling before barriers (#142716 ) Add scheduler DAG mutation to add data dependencies between atomic fences and preceding memory reads. This allows some modelling of the impact an atomic fence can have on outstanding memory accesses. This is beneficial when a fence would cause wait count insertion, as more instructions will be scheduled before the fence hiding memory latency.	2025-10-21 13:28:52 +09:00
Pankaj Dwivedi	53aad35208	[AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (#116953 ) This pass introduces optimizations for AMDGPU intrinsics by leveraging the uniformity of their arguments. When an intrinsic's arguments are detected as uniform, redundant computations are eliminated, and the intrinsic calls are simplified accordingly. By utilizing the UniformityInfo analysis, this pass identifies cases where intrinsic calls are uniform across all lanes, allowing transformations that reduce unnecessary operations and improve the IR's efficiency. These changes enhance performance by streamlining intrinsic usage in uniform scenarios without altering the program's semantics. For background, see PR #99878	2025-10-09 12:44:56 +05:30
Shilei Tian	4ddc0f3ffd	[AMDGPU] Add the missing enabling check of AMDGPUAttributor (#162420 )	2025-10-08 04:21:04 +00:00
James Y Knight	783c1a7617	AMDGPU: skip AMDGPUAttributor pass on R600 some more. (#162418 ) This is a follow-up for #162207, where I neglected to skip the second use of AMDGPUAttributor for R600 targets. This use is covered by the test lld/test/ELF/lto/r600.ll.	2025-10-08 03:40:30 +00:00
James Y Knight	2f4275b195	AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600. (#162207 ) These passes call `getSubtarget<GCNSubtarget>`, which doesn't work on R600 targets, as that uses an `R600Subtarget` type, instead. Unfortunately, `TargetMachine::getSubtarget<ST>` does an unchecked static_cast to `ST&`, which makes it easy for this error to go undetected. The modifications here were verified by running check-llvm with an assert added to getSubtarget. However, that assert requires that RTTI is enabled, which LLVM doesn't use, so I've reverted the assert before sending this fix upstream. These errors have been present for some time, but were detected after #162040 caused an uninitialized memory read to be reported by asan/msan.	2025-10-07 16:17:59 +00:00
Gang Chen	640644d68a	[AMDGPU] Move LowerBufferFatPointers after LoadStoreVectorizer and remove the fixme (#161531 ) Move LowerBufferFatPointers pass after CodegenPrepare and LoadStoreVectorizer pass, and remove the fixme about that.	2025-10-01 17:52:15 -07:00
Reid Kleckner	f3efbce4a7	[llvm] Move data layout string computation to TargetParser (#157612 ) Clang and other frontends generally need the LLVM data layout string in order to generate LLVM IR modules for LLVM. MLIR clients often need it as well, since MLIR users often lower to LLVM IR. Before this change, the LLVM datalayout string was computed in the LLVM${TGT}CodeGen library in the relevant TargetMachine subclass. However, none of the logic for computing the data layout string requires any details of code generation. Clients who want to avoid duplicating this information were forced to link in LLVMCodeGen and all registered targets, leading to bloated binaries. This happened in PR #145899, which measurably increased binary size for some of our users. By moving this information to the TargetParser library, we can delete the duplicate datalayout strings in Clang, and retain the ability to generate IR for unregistered targets. This is intended to be a very mechanical LLVM-only change, but there is an immediately obvious follow-up to clang, which will be prepared separately. The vast majority of data layouts are computable with two inputs: the triple and the "ABI name". There is only one exception, NVPTX, which has a cl::opt to enable short device pointers. I invented a "shortptr" ABI name to pass this option through the target independent interface. Everything else fits. Mips is a bit awkward because it uses a special MipsABIInfo abstraction, which includes members with codegen-like concepts like ABI physical registers that can't live in TargetParser. I think the string logic of looking for "n32" "n64" etc is reasonable to duplicate. We have plenty of other minor duplication to preserve layering. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>	2025-09-11 11:05:29 -07:00
Stanislav Mekhanoshin	5901d896f4	[AMDGPU] Register amdgpu-lower-vgpr-encoding pass in npm (#156971 )	2025-09-05 00:07:47 -07:00
Stanislav Mekhanoshin	1f0f3473e6	[AMDGPU] High VGPR lowering on gfx1250 (#156965 )	2025-09-04 16:20:47 -07:00
Nicolai Hähnle	353b5e43c6	AMDGPU: Refactor lowering of s_barrier to split barriers (#154648 ) Let's do the lowering of non-split into split barriers in a new IR pass, AMDGPULowerIntrinsics. That way, there is no code duplication between SelectionDAG and GlobalISel. This simplifies some upcoming extensions to the code.	2025-08-28 07:01:20 -07:00
Ivan Kosarev	faca8c9ed4	[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769 ) Saves around 125-210 MB of compilation memory usage per source for roughly one third of our backend sources, ~60 MB on average.	2025-08-22 10:05:06 +01:00
Shoreshen	04aebbfbe2	[AMDGPU] Delete AMDGPU Unify Metadata pass (#153548 ) Fixes #153150	2025-08-14 16:16:32 +08:00
Vikram Hegde	e50bd78d54	Reapply "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#151098 ) Reapplies https://github.com/llvm/llvm-project/pull/148114 includes shared lib build failure fixes for AMDGPU and X86.	2025-07-30 13:29:15 +05:30
Alex Voicu	6bcff9eb13	[HIPSTDPAR] Add handling for math builtins (#140158 ) When compiling in `--hipstdpar` mode, the builtins corresponding to the standard library might end up in code that is expected to execute on the accelerator (e.g. by using the `std::` prefixed functions from `<cmath>`). We do not have uniform handling for this in AMDGPU, and the errors that obtain are quite arcane. Furthermore, the user-space changes required to work around this tend to be rather intrusive. This patch adds an additional `--hipstdpar` specific pass which forwards to the run time component of HIPSTDPAR the intrinsics / libcalls which result from the use of the math builtins, and which are not properly handled. In the long run we will want to stop relying on this and handle things in the compiler, but it is going to be a rather lengthy journey, which makes this medium term escape hatch necessary. The paired change in the run time component is here <https://github.com/ROCm/rocThrust/pull/551>.	2025-07-28 22:29:31 +01:00
Vikram Hegde	495774d6d5	Revert "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#150883 ) Reverts llvm/llvm-project#148114 will update with fixed PR.	2025-07-28 11:28:00 +05:30
Vikram Hegde	d35bf478a8	[CodeGen][NPM] Stitch up loop passes in codegen pipeline (#148114 ) same as https://github.com/llvm/llvm-project/pull/133050 Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>	2025-07-28 11:13:44 +05:30
Matt Arsenault	8f3e78f971	AMDGPU: Add pass to replace constant materialize with AV pseudos (#149292 ) If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate, replace it with a pseudo which writes to the combined AV_* class. This relaxes the operand constraints, which will allow the allocator to inflate the register class to AV_* to potentially avoid spilling. The allocator does not know how to replace an instruction to enable the change of register class. I originally tried to do this by changing all of the places we introduce v_mov_b32 with immediate, but it's a long tail of niche cases that require manual updating. Plus we can restrict this to only run on functions where we know we will be allocating AGPRs.	2025-07-18 17:15:38 +09:00
Vikram Hegde	72c61a6a25	[AMDGPU][NPM] Fill in addPreSched2 passes (#148112 ) same as https://github.com/llvm/llvm-project/pull/139516 Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>	2025-07-17 12:26:27 +05:30
Vikram Hegde	1ccd779324	[AMDGPU][NewPM] Port "AMDGPUResourceUsageAnalysis" to NPM (#130959 )	2025-07-10 13:35:43 +05:30
Akshat Oke	edaf656d5e	[CodeGen][NPM] Differentiate pipeline-required and opt-required passes (#135752 ) "Required" passes relate to actually running the pass on the IR, regardless of whether they are in the pipeline. CGPassBuilder was mistakenly still adding them to the pipeline. The test `llc -stop-after=greedy -enable-new-pm` would still add `greedy` to the pipeline otherwise.	2025-07-09 14:52:58 +05:30

718 Commits