llvm-project

Author	SHA1	Message	Date
Iris Shi	bdf03fcff3	Revert "[llvm][NFC] Use `llvm::sort()`" (#140668 )	2025-05-20 11:27:03 +08:00
Iris Shi	061a7699f3	[llvm][NFC] Use `llvm::sort()` (#140335 )	2025-05-17 14:49:46 +08:00
Robert Imschweiler	e55172f139	[AMDGPU] Classify FLAT instructions as VMEM (#137148 ) Also adapt hazard and wait handling.	2025-05-07 09:20:52 +02:00
Kazu Hirata	d144c13ae5	[Target] Remove unused local variables (NFC) (#138443 )	2025-05-04 07:56:38 -07:00
Kazu Hirata	4f71e1ebfc	[AMDGPU] Use llvm::count_if (NFC) (#137492 )	2025-04-26 23:27:54 -07:00
anjenner	a3d05e8987	Remove an incorrect assert in MFMASmallGemmSingleWaveOpt. (#130131 ) This assert was failing in a fuzzing test. I consulted with @jrbyrnes who said: The MFMASmallGemmSingleWaveOpt::apply() method is invoked if and only if the user has inserted an intrinsic llvm.amdgcn.iglp.opt(i32 1) into their source code. This intrinsic applies a highly specialized DAG mutation to result in specific scheduling for a specific set of kernels. These assertions are really just confirming that the characteristics of the kernel match what is expected (i.e. the kernels are similar to the ones this DAG mutation strategy was designed against). However, if we apply this DAG mutation to kernels for which it was not designed, then we may not find the types of instructions we are looking for, and may end up with empty caches. I think it should be fine to just return false if the cache is empty instead of the assert.	2025-04-24 09:22:24 +01:00
Kazu Hirata	515564aa6e	[AMDGPU] Partially revert my llvm::less_second patch (#136615 ) This patch partially reverts: commit 5e1b0f97735083b6762834b83fdbb35e76002e03 Author: Kazu Hirata <kazu@google.com> Date: Fri Apr 18 10:05:55 2025 -0700 to fix: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir under LLVM_ENABLE_EXPENSIVE_CHECKS.	2025-04-21 14:55:08 -07:00
Kazu Hirata	5e1b0f9773	[llvm] Use llvm::less_first and llvm::less_second (NFC) (#136272 )	2025-04-18 10:05:55 -07:00
Kazu Hirata	1380a8259e	[AMDGPU] Use llvm::find and llvm::find_if (NFC) (#135582 )	2025-04-13 23:46:57 -07:00
Rahul Joshi	a3754ade63	[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410 ) - Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767	2025-04-07 17:27:50 -07:00
Kazu Hirata	bfe93aedcc	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private field 'DAG' is not used [-Werror,-Wunused-private-field]	2025-01-11 13:06:37 -08:00
Austin Kerbow	657fb4433e	[AMDGPU] Add target hook to isGlobalMemoryObject (#112781 ) We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.	2025-01-11 09:57:57 -08:00
Jeffrey Byrnes	9ac52ce8d6	[AMDGPU] Add iglp_opt(3) for simple mfma / exp interleaving (#117269 ) Adds a minimal iglp_opt to do simple exp / mfma interleaving.	2024-12-06 15:19:07 -08:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Kazu Hirata	141574bacb	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415 )	2024-10-23 10:44:09 -07:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Kazu Hirata	d07dc5aa50	[AMDGPU] Avoid repeated hash lookups (NFC) (#110788 )	2024-10-02 06:52:21 -07:00
Kazu Hirata	3b9f183958	[AMDGPU] Use llvm::any_of, llvm::all_of, and llvm::none_of (NFC) (#103007 )	2024-08-13 00:07:54 -07:00
Kazu Hirata	e40915b740	[AMDGPU] Use llvm::any_of and llvm::none_of (NFC) (#102794 )	2024-08-12 10:45:24 -07:00
Jay Foad	c7309dadbf	[AMDGPU] Use range-based for loops. NFC. (#99047 )	2024-07-17 10:18:03 +01:00
Jay Foad	5e338f1f4a	[AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC.	2024-07-17 08:27:35 +01:00
Jay Foad	aeafdc21d2	[AMDGPU] Use using instead of typedef. NFC.	2024-07-16 16:44:12 +01:00
Jay Foad	78dea4c1ea	[AMDGPU] Use bool literals for bools. NFC.	2024-07-16 15:44:49 +01:00
Kazu Hirata	fef144cebb	Revert "[llvm] Use llvm::sort (NFC) (#96434 )" This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d. Reverting the patch fixes the following under EXPENSIVE_CHECKS: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll	2024-06-25 11:18:40 -07:00
Kazu Hirata	05d167fc20	[llvm] Use llvm::sort (NFC) (#96434 )	2024-06-23 10:38:51 -07:00
Kazu Hirata	5dc99af487	[llvm] Use llvm::is_contained (NFC) (#95362 )	2024-06-13 08:09:13 -07:00
Jeffrey Byrnes	cf1c97b2d2	[AMDGPU] Do not attempt to fallback to default mutations (#83208 ) IGLP itself will be in SavedMutations via mutations added during Scheduler creation, thus falling back results in reapplying IGLP. In PostRA scheduling, if we have multiple regions with IGLP instructions, then we may have infinite loop. Disable the feature for now.	2024-02-27 18:04:59 -08:00
Jeffrey Byrnes	8f2bd8ae68	[AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (#81342 ) This implements the basic pipelining structure of exp/mfma interleaving for better extensibility. While it does have improved extensibility, there are controls which only enable it for DAGs with certain characteristics (matching the DAGs it has been designed against).	2024-02-23 17:13:20 -08:00
Jeffrey Byrnes	f1156fb622	[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416 ) Makes constructing SchedGroups of this type easier, and provides ability to create them with __builtin_amdgcn_sched_group_barrier	2023-12-19 16:54:18 -08:00
Jeffrey Byrnes	6d8b44a506	[AMDGPU] [IGLP]: Fix assert (#73710 ) We can also re-enter IGLP mutation via later `SchedStage`s in the `GCNMaxOccupancySchedStrategy` . This is sort of NFC in that there is no changed behavior for the only current client of `IsReentry`	2023-12-07 17:10:10 -08:00
Craig Topper	35baff8b6a	[AMDGPU] Correct assert that incorrectly chained multiple == operators. (#70291 ) I believe this assert was trying to check that 3 variables were equal to 0. I think it instead got interpreted as ((DSWCount == DSWWithPermCount) == DSWWithSharedVMEMCount) == 0 I guess (DSWCount == DSWWithPermCount) was true because both counts were 0. Then true got compared to DSWWithSharedVMEMCount, and since DSWWithSharedVMEMCount is 0, that compare was false. And then that false compared equal to the final 0.	2023-10-26 08:02:10 -07:00
Kazu Hirata	6e18003a23	[llvm] Use llvm::any_of (NFC)	2023-10-22 10:42:18 -07:00
Jeffrey Byrnes	6afceba510	[AMDGPU][IGLP] SingleWaveOpt: Cache DSW Counters from PreRA (#67759 ) Save the DSW counters from PreRA scheduling. While this avoids recalculation in the postRA pass, that isn't the main purpose. This is required because of physical register dependencies in PostRA scheduling -- they alter the DAG s.t. our counters may become incorrect -- which alters the layout of the pipeline. By preserving the values from PreRA, we can be sure that we accurately construct the pipeline. Additionally, remove a bad assert in SharesPredWithPrevNthGroup -- it is possible that we will have an empty cache if OtherGroup has no elements which have a V_PERM pred (possible if the V_PERM SG is empty).	2023-10-06 17:34:14 -07:00
Kazu Hirata	8a7f4eeb60	[llvm] Use llvm::is_contained (NFC)	2023-09-22 17:09:27 -07:00
Luke Drummond	471d9c57af	[NFC][AMDGPU] assert we've found a value before use The sync pipeline should always contain the candidate ID. If it doesn't something's gone awry. assert on that. Reviewed by: jrbyrnes Differential Revision: https://reviews.llvm.org/D158845	2023-08-28 10:14:47 +01:00
Jeffrey Byrnes	6b7805fcb1	[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering. The kernels of interest have these specific phases: NT: 1, 2a, 2c NN: 1, 2a, 2b TT: 1, 2b, 2c TN: 1, 2b The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver. Differential Revision: https://reviews.llvm.org/D149773 Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96	2023-07-13 12:03:04 -07:00
Jeffrey Byrnes	db61927951	[AMDGPU][IGLP]: Add rules to SchedGroups Differential Revision: https://reviews.llvm.org/D146774 Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859	2023-06-06 19:19:21 -07:00
Jeffrey Byrnes	1721e72d6e	[AMDGPU][IGLP] Parameterize the SchedGroup processing / linking order in Solver Currently the PipelineSolver processes SchedGroups in a bottom-up manner. However, there is no compelling reason to require this. Providing the option to toggle this affords greater experimentation capability, and makes usage a bit more intuitive. Importantly, it makes designing rules much easier. Differential Revision: https://reviews.llvm.org/D149393 Change-Id: Ic4abd3408f9faa105c0eef72eab7873d46083ee4	2023-05-30 14:43:14 -07:00
Nico Weber	72e01ef1f1	Revert "[AMDGPU] Add Lower Bound to PipelineSolver" This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9. Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724	2023-04-06 12:35:44 -04:00
Jeff Byrnes	3c42a58c4f	[AMDGPU] Add Lower Bound to PipelineSolver	2023-04-05 14:54:59 -07:00
Stanislav Mekhanoshin	63e7e9c875	[AMDGPU] Treat WMMA the same as MFMA for sched_barrier MFMA and WMMA are essentially the same thing, but appear on different ASICs. Differential Revision: https://reviews.llvm.org/D142062	2023-01-19 10:52:31 -08:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Austin Kerbow	f9c76a1198	[AMDGPU] Update MFMASmallGemmOpt with better performing strategy Based on experiments this does better with target small GEMM kernels. Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D139227	2022-12-09 19:03:51 -08:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Kazu Hirata	7d8c2d17eb	[llvm] Use range-based for loops (NFC) Identified with modernize-loop-convert.	2022-09-03 23:27:25 -07:00
Kazu Hirata	9861a68a7c	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 10:41:50 -07:00
Kazu Hirata	ce9f007c7c	[llvm] Use llvm::find_if (NFC)	2022-08-28 10:41:48 -07:00
Austin Kerbow	b0f4678b90	[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other mutations or clustering that may interfere with the desired pipeline. The first optimization strategy that is added here is designed to improve the performance of small gemm kernels on gfx90a. Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D132079	2022-08-19 15:38:36 -07:00
Jeffrey Byrnes	1c8d7ea973	[AMDGPU] Implement pipeline solver for non-trivial pipelines Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t the choice we make when mapping an instruction -> SchedGroup is an NP problem. This patch implements the PipelineSolver class which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm. Differential Revision: https://reviews.llvm.org/D130797	2022-08-17 16:21:59 -07:00
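
The chained `==` pitfall described in commit 35baff8b6a can be reproduced in isolation. The sketch below is not the code from that patch: the function names `asParsed` and `asIntended` and the parameters `A`, `B`, `C` are hypothetical stand-ins for the three counters named in the commit message. It assumes the assert was meant to check that all three counters are zero, as the commit author suggests.

```cpp
// Hypothetical illustration; not taken from AMDGPUIGroupLP.cpp.

// How C++ groups `A == B == C == 0`: strictly left to right, so each
// intermediate bool result is compared against the next operand.
constexpr bool asParsed(unsigned A, unsigned B, unsigned C) {
  return (static_cast<unsigned>(A == B) == C) == false;
}

// What the assert presumably meant: every counter is zero.
constexpr bool asIntended(unsigned A, unsigned B, unsigned C) {
  return A == 0 && B == 0 && C == 0;
}

// All counters zero: both forms hold, which matches the walk-through
// in the commit message.
static_assert(asParsed(0, 0, 0) && asIntended(0, 0, 0),
              "both forms accept all-zero counters");

// Non-zero counters: the chained form still evaluates to true,
// while the intended check rejects the input.
static_assert(asParsed(1, 1, 5) && !asIntended(1, 1, 5),
              "chained form accepts counters that are not zero");
```

Spelling out each comparison and joining them with `&&` avoids relying on how the operators associate.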

57 Commits