llvm-project

Author	SHA1	Message	Date
Diana Picus	40a7dce9ef	[AMDGPU] Remove duplicated/confusing helpers. NFCI (#142598) Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead of keeping two copies for DAG/Global ISel. Also remove isKernelCC, which doesn't agree with isKernel and doesn't seem very useful. While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and mark them constexpr.	2025-06-05 11:19:20 +02:00
Pengcheng Wang	f393986b53	[MISched] Add templates for creating custom schedulers (#141935) We rename `createGenericSchedLive` and `createGenericSchedPostRA` to `createSchedLive` and `createSchedPostRA`, and add a template parameter `Strategy` which is the generic implementation by default. This can simplify some code for targets that have custom scheduler strategy.	2025-06-03 11:37:40 +08:00
Shilei Tian	4d48673562	Reapply "Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)"" This reverts commit 37ea3b32cdcb6c0dcecbcc4bf844f5190c7378dd.	2025-05-30 22:11:22 -04:00
Shilei Tian	37ea3b32cd	Revert "Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)"" This reverts commit 4efc13f8ff1eaf4f9fb1fcea8d4552b3eca052ca.	2025-05-30 22:06:16 -04:00
Shilei Tian	4efc13f8ff	Reapply "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)" This reverts commit 3c6211c183885afb5d89259a53c4f4f46a6bf399.	2025-05-30 21:56:24 -04:00
Shilei Tian	3c6211c183	Revert "[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)" This reverts commit 9bf6b2a8cb0467b62173659306e43a0346f063a2.	2025-05-30 21:15:25 -04:00
Shilei Tian	9bf6b2a8cb	[AMDGPU] Make `getAssumedAddrSpace` return AS1 for pointer kernel arguments (#137488)	2025-05-30 17:30:42 -04:00
Shilei Tian	84a69a0f8f	[AMDGPU] Move InferAddressSpacesPass to middle end optimization pipeline (#138604) It will run twice in the non-LTO pipeline with `O1` or higher. In LTO post link pipeline, it will be run once with `O2` or higher, since inline and SROA don't run in `O1`.	2025-05-29 17:20:56 -04:00
Carl Ritson	e6b43bdde3	[AMDGPU] Cluster export instructions in PostRA Scheduler (#141399) DAG mutation needs to be applied post-RA to maintain order established during pre-RA scheduler.	2025-05-26 18:00:43 +09:00
Alexander Richardson	07e2ba445d	[AMDGPU] Set AS8 address width to 48 bits Of the 128 bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logical conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419	2025-05-19 17:26:05 -07:00
Austin Kerbow	2c9a46cce3	[AMDGPU] Move kernarg preload logic to separate pass (#130434) Moves kernarg preload logic to its own module pass. Cloned function declarations are removed when preloading hidden arguments. The inreg attribute is now added in this pass instead of AMDGPUAttributor. The rest of the logic is copied from AMDGPULowerKernelArguments which now only checks whether an argument is marked inreg to avoid replacing direct uses of preloaded arguments. This change requires test updates to remove inreg from lit tests with kernels that don't actually want preloading.	2025-05-11 21:18:11 -07:00
Matthias Braun	675cb70641	Register assembly printer passes (#138348) Register assembly printer passes in the pass registry. This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests. Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow targets to use the `INITIALIZE_PASS` macros and register the pass in the pass registry. This currently has a default parameter so it won't break any targets that have not been updated.	2025-05-06 18:01:17 -07:00
Shilei Tian	d6dbe7799e	[AMDGPU][Attributor] Add `ThinOrFullLTOPhase` as an argument (#123994)	2025-05-02 11:33:56 -04:00
Akshat Oke	e91cbd4f29	[CodeGen][NPM] Port VirtRegRewriter to NPM (#130564)	2025-04-30 14:10:46 +05:30
Sergei Barannikov	bb1765179e	[TTI] Simplify implementation (NFCI) (#136674) Replace "concept based polymorphism" with simpler PImpl idiom. This pursues two goals: * Enforce static type checking. Previously, target implementations hid base class methods and type checking was impossible. Now that they override the methods, the compiler will complain on mismatched signatures. * Make the code easier to navigate. Previously, if you asked your favorite LSP server to show a method (e.g. `getInstructionCost()`), it would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`, `TTIImplBase`, and target overrides. Now it is two less :) There are three commits to hopefully simplify the review. The first commit removes `TTI::Model`. This is done by deriving `TargetTransformInfoImplBase` from `TTI::Concept`. This is possible because they implement the same set of interfaces with identical signatures. The first commit makes `TargetTransformImplBase` polymorphic, which means all derived classes should `override` its methods. This is done in second commit to make the first one smaller. It appeared infeasible to extract this into a separate PR because the first commit landed separately would result in tons of `-Woverloaded-virtual` warnings (and break `-Werror` builds). The third commit eliminates `TTI::Concept` by merging it with the only derived class `TargetTransformImplBase`. This commit could be extracted into a separate PR, but it touches the same lines in `TargetTransformInfoImpl.h` (removes `override` added by the second commit and adds `virtual`), so I thought it may make sense to land these two commits together. Pull Request: https://github.com/llvm/llvm-project/pull/136674	2025-04-26 15:25:40 +03:00
Christudasan Devadasan	940108b24d	[AMDGPU][NewPM] Make the pass flow consistent with the legacy pipeline. (#136551)	2025-04-21 15:34:15 +05:30
Akshat Oke	84082223c8	[AMDGPU][NPM] Cleanup AMDGPUPassRegistry.def (#130071) Finishing up AMDGPU specific passes. Only ones remaining are assembly printer, virt reg rewriter and PEI.	2025-04-17 10:08:22 +05:30
Jun Wang	31f39c8325	[AMDGPU] Remove the AnnotateKernelFeatures pass (#130198) Previously the AnnotateKernelFeatures pass infers two attributes: amdgpu-calls and amdgpu-stack-objects, which are used to help determine if flat scratch init is allowed. PR #118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch makes use of this attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also leads to the removal of the AnnotateKernelFeatures pass.	2025-04-15 15:17:33 -07:00
Alex Voicu	1bcec036e1	[HIP][HIPSTDPAR][NFC] Re-order & adapt `hipstdpar` specific passes (#134753) The `hipstdpar` specific passes were not ordered ideally, especially for `fgpu-rdc` compilations, which meant that we'd eagerly run accelerator code selection and remove symbols that might end up used. This change corrects that aspect by ensuring that accelerator code selection is only done after linking (this will have to be revisited in the future once the closed-world assumption no longer holds). Furthermore, we take the opportunity to move allocation interposition so that it properly gets printed when print-pipeline-passes is requested. NFC.	2025-04-15 00:47:09 +03:00
Akshat Oke	b283ff7eb1	[CodeGen][NPM] Port BranchRelaxation to NPM (#130067) This completes the PreEmitPasses.	2025-04-14 10:19:42 +05:30
Jeffrey Byrnes	6e7fe85247	[AMDGPU] Teach iterative schedulers about IGLP (#134953) This adds IGLP mutation to the iterative schedulers (`gcn-iterative-max-occupancy-experimental`, `gcn-iterative-minreg`, and `gcn-iterative-ilp`). The `gcn-iterative-minreg` and `gcn-iterative-ilp` schedulers never actually applied the mutations added, so this also has the effect of teaching them about mutations in general. The `gcn-iterative-max-occupancy-experimental` scheduler has calls to `ScheduleDAGMILive::schedule()`, so, before this, mutations were applied at this point. Now this is done during calls to `BuildDAG`, with IGLP superseding other mutations (similar to the other schedulers). We may end up scheduling regions multiple times, with mutations being applied each time, so we need to track for `AMDGPU::SchedulingPhase::PreRAReentry`	2025-04-11 15:34:49 -07:00
Jeffrey Byrnes	5de3118c67	[AMDGPU] Make the iterative schedulers selectable via amdgpu-sched-strategy (#135042) Currently, the only way for users to try these schedulers is via `-misched=`. However, this overrides the default scheduler for all targets. This causes problems for various toolchains / drivers which spawn jobs for both x86 and AMDGPU -- e.g. hipcc. On the other hand, `amdgpu-sched-strategy` only changes the scheduler for AMDGPU target.	2025-04-10 14:43:42 -07:00
Akshat Oke	2f6b06b264	[CodeGen][NPM] Port PostRAHazardRecognizer to NPM (#130066)	2025-04-09 16:36:22 +05:30
Akshat Oke	fcaefc2c19	[AMDGPU][NPM] Port SIPreEmitPeephole to NPM (#130065)	2025-04-08 17:58:48 +05:30
Rahul Joshi	a3754ade63	[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410) - Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767	2025-04-07 17:27:50 -07:00
Akshat Oke	a13a51b91f	[AMDGPU][NPM] Port AMDGPUSetWavePriority to NPM (#130064)	2025-04-02 16:28:05 +05:30
Akshat Oke	719b029c16	[AMDGPU][NPM] Port SILateBranchLowering to NPM (#130063)	2025-03-26 19:28:19 +05:30
Akshat Oke	f8e908a0ed	[AMDGPU][NPM] Port SIInsertHardClauses to NPM (#130062)	2025-03-25 15:33:32 +05:30
Akshat Oke	f10dc76f03	[AMDGPU][NPM] Port SIInsertWaitcnts to NPM (#130061)	2025-03-24 21:36:45 +05:30
Akshat Oke	6cc23faaac	[AMDGPU][NPM] Port AMDGPUMarkLastScratchLoad to NPM (#131738) This finishes all passes for the optimized regalloc path. --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-03-19 09:27:05 +05:30
Shilei Tian	51c706c119	[NFC][AMDGPU] Replace direct arch comparison with `isAMDGCN()` (#131357)	2025-03-14 14:21:44 -04:00
Akshat Oke	f34385dd1b	[AMDGPU][NPM] Port GCNCreateVOPD to NPM (#130059)	2025-03-14 10:22:45 +05:30
Petar Avramovic	3ad810ea9a	AMDGPU/GlobalISel: Disable LCSSA pass (#124297) Disable LCSSA pass in preparation for implementing temporal divergence lowering in amdgpu divergence lowering. Breaks all cases where sgpr or i1 values are used outside of the cycle with divergent exit. Regenerate regression tests for amdgpu divergence lowering with LCSSA disabled. Update IntrinsicLaneMaskAnalyzer to stop tracking lcssa phis that are lane masks.	2025-03-12 11:09:50 +01:00
Akshat Oke	c22c5643db	[AMDGPU][NPM] Port SIMemoryLegalizer to NPM (#130060)	2025-03-12 14:30:35 +05:30
Matt Arsenault	0d2c55cb96	AMDGPU: Move enqueued block handling into clang (#128519) The previous implementation wasn't maintaining a faithful IR representation of how this really works. The value returned by createEnqueuedBlockKernel wasn't actually used as a function, and hacked up later to be a pointer to the runtime handle global variable. In reality, the enqueued block is a struct where the first field is a pointer to the kernel descriptor, not the kernel itself. We were also relying on passing around a reference to a global using a string attribute containing its name. It's better to base this on a proper IR symbol reference during final emission. This now avoids using a function attribute on kernels and avoids using the additional "runtime-handle" attribute to populate the final metadata. Instead, associate the runtime handle reference to the kernel with the !associated global metadata. We can then get a final, correctly mangled name at the end. I couldn't figure out how to get rename-with-external-symbol behavior using a combination of comdats and aliases, so leaves an IR pass to externalize the runtime handles for codegen. If anything breaks, it's most likely this, so leave avoiding this for a later step. Use a special section name to enable this behavior. This also means it's possible to declare enqueuable kernels in source without going through the dedicated block syntax or other dedicated compiler support. We could move towards initializing the runtime handle in the compiler/linker. I have a working patch where the linker sets up the first field of the handle, avoiding the need to export the block kernel symbol for the runtime. We would need new relocations to get the private and group sizes, but that would avoid the runtime's special case handling that requires the device_enqueue_symbol metadata field. https://reviews.llvm.org/D141700	2025-03-10 19:54:04 +07:00
Akshat Oke	52225d2702	[AMDGPU][NewPM] Port AMDGPUReserveWWMRegs to NPM (#123722)	2025-03-10 17:36:35 +05:30
Akshat Oke	6c87ec4f4d	[AMDGPU][NPM] Port SIModeRegister to NPM (#129014)	2025-03-04 10:51:03 +05:30
Akshat Oke	852923822f	[AMDGPU][NewPM] Port AMDGPUInsertDelayAlu to NPM (#128003)	2025-02-26 09:50:09 +05:30
Akshat Oke	bd16a87d05	[AMDGPU][NewPM] Port SIPostRABundler to NPM (#123717)	2025-02-21 16:05:58 +05:30
Akshat Oke	9855d761f3	[AMDGPU][NewPM] Port SIOptimizeExecMaskingPreRA to NPM (#125351)	2025-02-20 17:35:56 +05:30
Vikram Hegde	663db5c70d	[AMDGPU][NewPM] Port GCNNSAReassign pass to new pass manager (#125034) tests to be added while porting virtregrewrite and greedy regalloc	2025-02-18 11:13:31 +05:30
Scott Linder	29ca3b8b28	[AMDGPU] Push amdgpu-preload-kern-arg-prolog after livedebugvalues (#126148) This is effectively a workaround for a bug in livedebugvalues, but seems to potentially be a general improvement, as BB sections seems like it could ruin the special 256-byte prelude scheme that amdgpu-preload-kern-arg-prolog requires anyway. Moving it even later doesn't seem to have any material impact, and just adds livedebugvalues to the list of things which no longer have to deal with pseudo multiple-entry functions. AMDGPU debug-info isn't supported upstream yet, so the bug being avoided isn't testable here. I am posting the patch upstream to avoid an unnecessary diff with AMD's fork.	2025-02-17 13:29:56 -05:00
Vikram Hegde	06a3abd9e8	[AMDGPU][NewPM] Port "SIFormMemoryClauses" to NPM (#127181)	2025-02-17 11:07:17 +05:30
Akshat Oke	7b60e03d73	Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684) `RegisterClassInfo` was supposed to be kept alive between pass runs, which wasn't being done leading to recomputations increasing the compile time. Now the Impl class is a member of the legacy and new passes so that it is not reconstructed on every pass run. --------- Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2025-02-12 18:54:39 +05:30
Vikram Hegde	9c725ef368	[AMDGPU][NewPM] Port "GCNRewritePartialRegUses" pass to NPM (#126024)	2025-02-12 11:21:40 +05:30
Vikram Hegde	3293bff5d2	[AMDGPU][NewPM] Port "GCNPreRAOptimizations" pass to NPM (#126040)	2025-02-11 11:09:38 +05:30
Shilei Tian	bde8ce6a5c	[AMDGPU] Only run `AMDGPUPrintfRuntimeBindingPass` at non-prelink phase (#125162)	2025-02-10 08:24:50 -05:00
Akshat Oke	564b9b7f4d	Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268) This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I investigate what's causing the compile-time regression.	2025-02-08 15:36:48 +05:30
Christudasan Devadasan	814db6c53f	[CodeGen][NewPM] Port GCNPreRALongBranchReg to NPM. (#125844)	2025-02-05 18:46:27 +05:30
Christudasan Devadasan	b83c960bad	[CodeGen][NewPM] Port SIWholeQuadMode to NPM. (#125833)	2025-02-05 18:44:57 +05:30

659 Commits