llvm-project

Author	SHA1	Message	Date
John Paul Jepko	cabebddac9	[NFC] Remove unused-but-set global variables (#189315) Remove four global variables that are set but never read to fix -Wunused-but-set-global warnings: - `MFMAChainLength` in AMDGPUIGroupLP.cpp - `Wide` in llvm-objdump.cpp - `SaveTemps` in ClangSYCLLinker.cpp - `DeprecatedDriverCommand` in ClangScanDeps.cpp Follow up to #178342	2026-03-30 19:29:50 +02:00
Bruce Changlong Xu	4cbb67a962	[AMDGPU] Use empty() instead of size() comparisons. NFC. (#187424) Replace `.size() == 0`, `.size() != 0`, and `.size() > 0` with `.empty()` / `!.empty()` across the AMDGPU backend.	2026-03-19 16:08:56 +00:00
Frederik Harwath	39f2740fac	[AMDGPU] IGroupLP: Avoid repeating reachability checks in greedy algorithm (#182463) In the greedy pipeline solver, the group cost is found using the addEdges function and the edges must be removed from the DAG after processing each group. The best group edges are then reinserted using the same function. This repeats the costly reachability checks inside the function which become problematic for pipelines with many SchedGroups. The algorithm is changed to remember the best group edges instead of recomputing them. Additionally, SchedGroup::tryAddEdge is refactored to avoid a redundant cycle check which is already performed by DAG->addEdge.	2026-03-03 09:40:30 +00:00
Frederik Harwath	0933b634c6	[AMDGPU] IGroupLP: Refactor SchedGroup::initSchedGroup (NFC) (#184122) There are three overloaded SchedGroup::initSchedGroup functions, two of which are only used for specific types of SchedGroups, namely SCHED_BARRIER and SCHED_GROUP_BARRIER. This seems to have led to some confusion since the different functions perform checks which are not needed for their intended restricted use cases. Furthermore, there are several wrong comments surrounding those functions. Simplify the functions and inline the actual initialization parts of the SCHED_BARRIER and SCHED_GROUP_BARRIER variants at their only call sites. Extract a function that finds the candidate SUnits for a given SchedGroup and use this instead of initSchedGroup. Fix comments.	2026-03-03 09:09:44 +00:00
Patrick Simmons	0d0249e543	Try To Guess SGMasks for Inline Asm Instructions (#155491) Addresses SWDEV-549227	2026-01-23 12:43:53 -06:00
David Spickett	80e3548372	[llvm][AMDGPU] Fix signed/unsigned comparison warning in 32-bit builds (#172623) llvm::count_if calls std::count_if which returns a difference_type. difference_type is always signed but is never going to be a negative value when used as the result of count_if. This resulted in warnings in our 32-bit Arm builds like: ``` AMDGPUIGroupLP.cpp:1050:20: warning: comparison of integers of different signs: 'typename iterator_traits<const SDep *>::difference_type' (aka 'int') and 'unsigned int' [-Wsign-compare] 1050 | if (SuccSize >= Size) | ~~~~~~~~ ^ ~~~~ ``` I presume these warnings are not generated in 64-bit builds because unsigned is 32-bit even for 64-bit platforms and there is no risk in extending 32-bit unsigned into 64-bit signed. To fix the warning I've changed the type of SuccSize to unsigned, and the assignment acts like a static_cast into that type.	2025-12-18 11:11:09 +00:00
Robert Imschweiler	576e1affab	[NFC][AMDGPU] IGLP: Fixes for unsigned int handling (#135090) Fixes unsigned int underflows in `MFMASmallGemmSingleWaveOpt::applyIGLPStrategy`.	2025-11-18 20:47:41 +01:00
Jay Foad	72c69aefba	[AMDGPU] Make use of getFunction and getMF. NFC. (#167872)	2025-11-14 11:00:57 +00:00
Jeffrey Byrnes	b86503efe5	[AMDGPU] Do not put memory instructions in ALU SchedGroups (#162560) Classifying some memory instructions as VALU leads to unexpected behavior from the schedbarrier intrinsics.	2025-10-13 11:49:32 -07:00
Iris Shi	bdf03fcff3	Revert "[llvm][NFC] Use `llvm::sort()`" (#140668)	2025-05-20 11:27:03 +08:00
Iris Shi	061a7699f3	[llvm][NFC] Use `llvm::sort()` (#140335)	2025-05-17 14:49:46 +08:00
Robert Imschweiler	e55172f139	[AMDGPU] Classify FLAT instructions as VMEM (#137148) Also adapt hazard and wait handling.	2025-05-07 09:20:52 +02:00
Kazu Hirata	d144c13ae5	[Target] Remove unused local variables (NFC) (#138443)	2025-05-04 07:56:38 -07:00
Kazu Hirata	4f71e1ebfc	[AMDGPU] Use llvm::count_if (NFC) (#137492)	2025-04-26 23:27:54 -07:00
anjenner	a3d05e8987	Remove an incorrect assert in MFMASmallGemmSingleWaveOpt. (#130131) This assert was failing in a fuzzing test. I consulted with @jrbyrnes who said: The MFMASmallGemmSingleWaveOpt::apply() method is invoked if and only if the user has inserted an intrinsic llvm.amdgcn.iglp.opt(i32 1) into their source code. This intrinsic applies a highly specialized DAG mutation to result in specific scheduling for a specific set of kernels. These assertions are really just confirming that the characteristics of the kernel match what is expected (i.e. the kernels are similar to the ones this DAG mutation strategy was designed against). However, if we apply this DAG mutation to kernels for which it was not designed, then we may not find the types of instructions we are looking for, and may end up with empty caches. I think it should be fine to just return false if the cache is empty instead of the assert.	2025-04-24 09:22:24 +01:00
Kazu Hirata	515564aa6e	[AMDGPU] Partially revert my llvm::less_second patch (#136615) This patch partially reverts: commit 5e1b0f97735083b6762834b83fdbb35e76002e03 Author: Kazu Hirata <kazu@google.com> Date: Fri Apr 18 10:05:55 2025 -0700 to fix: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir under LLVM_ENABLE_EXPENSIVE_CHECKS.	2025-04-21 14:55:08 -07:00
Kazu Hirata	5e1b0f9773	[llvm] Use llvm::less_first and llvm::less_second (NFC) (#136272)	2025-04-18 10:05:55 -07:00
Kazu Hirata	1380a8259e	[AMDGPU] Use llvm::find and llvm::find_if (NFC) (#135582)	2025-04-13 23:46:57 -07:00
Rahul Joshi	a3754ade63	[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410) - Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767	2025-04-07 17:27:50 -07:00
Kazu Hirata	bfe93aedcc	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]	2025-01-11 13:06:37 -08:00
Austin Kerbow	657fb4433e	[AMDGPU] Add target hook to isGlobalMemoryObject (#112781) We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.	2025-01-11 09:57:57 -08:00
Jeffrey Byrnes	9ac52ce8d6	[AMDGPU] Add iglp_opt(3) for simple mfma / exp interleaving (#117269) Adds a minimal iglp_opt to do simple exp / mfma interleaving.	2024-12-06 15:19:07 -08:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Kazu Hirata	141574bacb	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415)	2024-10-23 10:44:09 -07:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Kazu Hirata	d07dc5aa50	[AMDGPU] Avoid repeated hash lookups (NFC) (#110788)	2024-10-02 06:52:21 -07:00
Kazu Hirata	3b9f183958	[AMDGPU] Use llvm::any_of, llvm::all_of, and llvm::none_of (NFC) (#103007)	2024-08-13 00:07:54 -07:00
Kazu Hirata	e40915b740	[AMDGPU] Use llvm::any_of and llvm::none_of (NFC) (#102794)	2024-08-12 10:45:24 -07:00
Jay Foad	c7309dadbf	[AMDGPU] Use range-based for loops. NFC. (#99047)	2024-07-17 10:18:03 +01:00
Jay Foad	5e338f1f4a	[AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC.	2024-07-17 08:27:35 +01:00
Jay Foad	aeafdc21d2	[AMDGPU] Use using instead of typedef. NFC.	2024-07-16 16:44:12 +01:00
Jay Foad	78dea4c1ea	[AMDGPU] Use bool literals for bools. NFC.	2024-07-16 15:44:49 +01:00
Kazu Hirata	fef144cebb	Revert "[llvm] Use llvm::sort (NFC) (#96434)" This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d. Reverting the patch fixes the following under EXPENSIVE_CHECKS: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll	2024-06-25 11:18:40 -07:00
Kazu Hirata	05d167fc20	[llvm] Use llvm::sort (NFC) (#96434)	2024-06-23 10:38:51 -07:00
Kazu Hirata	5dc99af487	[llvm] Use llvm::is_contained (NFC) (#95362)	2024-06-13 08:09:13 -07:00
Jeffrey Byrnes	cf1c97b2d2	[AMDGPU] Do not attempt to fallback to default mutations (#83208) IGLP itself will be in SavedMutations via mutations added during Scheduler creation, thus falling back results in reapplying IGLP. In PostRA scheduling, if we have multiple regions with IGLP instructions, then we may have an infinite loop. Disable the feature for now.	2024-02-27 18:04:59 -08:00
Jeffrey Byrnes	8f2bd8ae68	[AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (#81342) This implements the basic pipelining structure of exp/mfma interleaving for better extensibility. While it does have improved extensibility, there are controls which only enable it for DAGs with certain characteristics (matching the DAGs it has been designed against).	2024-02-23 17:13:20 -08:00
Jeffrey Byrnes	f1156fb622	[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416) Makes constructing SchedGroups of this type easier, and provides ability to create them with __builtin_amdgcn_sched_group_barrier	2023-12-19 16:54:18 -08:00
Jeffrey Byrnes	6d8b44a506	[AMDGPU] [IGLP]: Fix assert (#73710) We can also re-enter IGLP mutation via later `SchedStage`s in the `GCNMaxOccupancySchedStrategy`. This is sort of NFC in that there is no changed behavior for the only current client of `IsReentry`	2023-12-07 17:10:10 -08:00
Craig Topper	35baff8b6a	[AMDGPU] Correct assert that incorrectly chained multiple == operators. (#70291) I believe this assert was trying to check that 3 variables were equal to 0. I think it instead got interpreted as ((DSWCount == DSWWithPermCount) == DSWWithSharedVMEMCount) == 0 I guess (DSWCount == DSWWithPermCount) was true because both counts were 0. Then true got compared to DSWWithSharedVMEMCount, and since DSWWithSharedVMEMCount is 0, that compare was false. And then that false compared equal to the final 0.	2023-10-26 08:02:10 -07:00
Kazu Hirata	6e18003a23	[llvm] Use llvm::any_of (NFC)	2023-10-22 10:42:18 -07:00
Jeffrey Byrnes	6afceba510	[AMDGPU][IGLP] SingleWaveOpt: Cache DSW Counters from PreRA (#67759) Save the DSW counters from PreRA scheduling. While this avoids recalculation in the postRA pass, that isn't the main purpose. This is required because of physical register dependencies in PostRA scheduling -- they alter the DAG s.t. our counters may become incorrect -- which alters the layout of the pipeline. By preserving the values from PreRA, we can be sure that we accurately construct the pipeline. Additionally, remove a bad assert in SharesPredWithPrevNthGroup -- it is possible that we will have an empty cache if OtherGroup has no elements which have a V_PERM pred (possible if the V_PERM SG is empty).	2023-10-06 17:34:14 -07:00
Kazu Hirata	8a7f4eeb60	[llvm] Use llvm::is_contained (NFC)	2023-09-22 17:09:27 -07:00
Luke Drummond	471d9c57af	[NFC][AMDGPU] assert we've found a value before use The sync pipeline should always contain the candidate ID. If it doesn't, something's gone awry; assert on that. Reviewed by: jrbyrnes Differential Revision: https://reviews.llvm.org/D158845	2023-08-28 10:14:47 +01:00
Jeffrey Byrnes	6b7805fcb1	[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering. The kernels of interest have these specific phases: NT: 1, 2a, 2c; NN: 1, 2a, 2b; TT: 1, 2b, 2c; TN: 1, 2b. The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver. Differential Revision: https://reviews.llvm.org/D149773 Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96	2023-07-13 12:03:04 -07:00
Jeffrey Byrnes	db61927951	[AMDGPU][IGLP]: Add rules to SchedGroups Differential Revision: https://reviews.llvm.org/D146774 Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859	2023-06-06 19:19:21 -07:00
Jeffrey Byrnes	1721e72d6e	[AMDGPU][IGLP] Parameterize the SchedGroup processing / linking order in Solver Currently the PipelineSolver processes SchedGroups in a bottom-up manner. However, there is no compelling reason to require this. Providing the option to toggle this affords greater experimentation capability, and makes usage a bit more intuitive. Importantly, it makes designing rules much easier. Differential Revision: https://reviews.llvm.org/D149393 Change-Id: Ic4abd3408f9faa105c0eef72eab7873d46083ee4	2023-05-30 14:43:14 -07:00
Nico Weber	72e01ef1f1	Revert "[AMDGPU] Add Lower Bound to PipelineSolver" This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9. Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724	2023-04-06 12:35:44 -04:00
Jeff Byrnes	3c42a58c4f	[AMDGPU] Add Lower Bound to PipelineSolver	2023-04-05 14:54:59 -07:00
Stanislav Mekhanoshin	63e7e9c875	[AMDGPU] Treat WMMA the same as MFMA for sched_barrier MFMA and WMMA are essentially the same thing, but appear on different ASICs. Differential Revision: https://reviews.llvm.org/D142062	2023-01-19 10:52:31 -08:00
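
Note on commit 35baff8b6a (chained == in an assert): the parse described in that message can be reproduced with a small standalone sketch. The function names `chainedEquals` and `allZero` are hypothetical, and the three parameters only stand in for DSWCount, DSWWithPermCount and DSWWithSharedVMEMCount; this is not the code from AMDGPUIGroupLP.cpp. The first function spells out the left-to-right grouping of `A == B == C == 0` with explicit parentheses and conversions, the second is what such a check was presumably meant to say.

```cpp
#include <cassert>

// Hypothetical illustration: A, B and C stand in for the three counters
// named in the commit message.

// Equivalent to the unparenthesized chain `A == B == C == 0`.
// == associates left to right, and each intermediate bool result is
// converted to an integer (true -> 1, false -> 0) before the next compare.
bool chainedEquals(unsigned A, unsigned B, unsigned C) {
  unsigned AB = static_cast<unsigned>(A == B);   // 1 if A equals B, else 0
  unsigned ABC = static_cast<unsigned>(AB == C); // compares 0/1 against C
  return ABC == 0;
}

// The presumably intended check: all three counters are zero.
bool allZero(unsigned A, unsigned B, unsigned C) {
  return A == 0 && B == 0 && C == 0;
}
```

With all three counters at 0 both functions return true, which matches the explanation in the commit message of why the original assert happened to pass. With inputs such as (1, 1, 0) the chained form still returns true while `allZero` returns false, so the chained assert would not have caught a nonzero counter.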

66 Commits