57 Commits

Author SHA1 Message Date
Iris Shi
bdf03fcff3
Revert "[llvm][NFC] Use llvm::sort()" (#140668) 2025-05-20 11:27:03 +08:00
Iris Shi
061a7699f3
[llvm][NFC] Use llvm::sort() (#140335) 2025-05-17 14:49:46 +08:00
Robert Imschweiler
e55172f139
[AMDGPU] Classify FLAT instructions as VMEM (#137148)
Also adapt hazard and wait handling.
2025-05-07 09:20:52 +02:00
Kazu Hirata
d144c13ae5
[Target] Remove unused local variables (NFC) (#138443) 2025-05-04 07:56:38 -07:00
Kazu Hirata
4f71e1ebfc
[AMDGPU] Use llvm::count_if (NFC) (#137492) 2025-04-26 23:27:54 -07:00
anjenner
a3d05e8987
Remove an incorrect assert in MFMASmallGemmSingleWaveOpt. (#130131)
This assert was failing in a fuzzing test. I consulted with @jrbyrnes
who said:

The MFMASmallGemmSingleWaveOpt::apply() method is invoked if and only if
the user has inserted an intrinsic llvm.amdgcn.iglp.opt(i32 1) into
their source code. This intrinsic applies a highly specialized DAG
mutation to result in specific scheduling for a specific set of kernels.
These assertions are really just confirming that the characteristics of
the kernel match what is expected (i.e. The kernels are similar to the
ones this DAG mutation strategy were designed against).

However, if we apply this DAG mutation to kernels for which is was not
designed, then we may not find the types of instructions we are looking
for, and may end up with empty caches.

I think it should be fine to just return false if the cache is empty
instead of the assert.
2025-04-24 09:22:24 +01:00
Kazu Hirata
515564aa6e
[AMDGPU] Partially revert my llvm::less_second patch (#136615)
This patch partially reverts:

  commit 5e1b0f97735083b6762834b83fdbb35e76002e03
  Author: Kazu Hirata <kazu@google.com>
  Date:   Fri Apr 18 10:05:55 2025 -0700

to fix:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir

under LLVM_ENABLE_EXPENSIVE_CHECKS.
2025-04-21 14:55:08 -07:00
Kazu Hirata
5e1b0f9773
[llvm] Use llvm::less_first and llvm::less_second (NFC) (#136272) 2025-04-18 10:05:55 -07:00
Kazu Hirata
1380a8259e
[AMDGPU] Use llvm::find and llvm::find_if (NFC) (#135582) 2025-04-13 23:46:57 -07:00
Rahul Joshi
a3754ade63
[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:50 -07:00
Kazu Hirata
bfe93aedcc [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private
  field 'DAG' is not used [-Werror,-Wunused-private-field]
2025-01-11 13:06:37 -08:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
2025-01-11 09:57:57 -08:00
Jeffrey Byrnes
9ac52ce8d6
[AMDGPU] Add iglp_opt(3) for simple mfma / exp interleaving (#117269)
Adds a minimal iglp_opt to do simple exp / mfma interleaving.
2024-12-06 15:19:07 -08:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Kazu Hirata
141574bacb
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415) 2024-10-23 10:44:09 -07:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Kazu Hirata
d07dc5aa50
[AMDGPU] Avoid repeated hash lookups (NFC) (#110788) 2024-10-02 06:52:21 -07:00
Kazu Hirata
3b9f183958
[AMDGPU] Use llvm::any_of, llvm::all_of, and llvm::none_of (NFC) (#103007) 2024-08-13 00:07:54 -07:00
Kazu Hirata
e40915b740
[AMDGPU] Use llvm::any_of and llvm::none_of (NFC) (#102794) 2024-08-12 10:45:24 -07:00
Jay Foad
c7309dadbf
[AMDGPU] Use range-based for loops. NFC. (#99047) 2024-07-17 10:18:03 +01:00
Jay Foad
5e338f1f4a [AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC. 2024-07-17 08:27:35 +01:00
Jay Foad
aeafdc21d2 [AMDGPU] Use using instead of typedef. NFC. 2024-07-16 16:44:12 +01:00
Jay Foad
78dea4c1ea [AMDGPU] Use bool literals for bools. NFC. 2024-07-16 15:44:49 +01:00
Kazu Hirata
fef144cebb Revert "[llvm] Use llvm::sort (NFC) (#96434)"
This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d.

Reverting the patch fixes the following under EXPENSIVE_CHECKS:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir
  LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll
  LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll
2024-06-25 11:18:40 -07:00
Kazu Hirata
05d167fc20
[llvm] Use llvm::sort (NFC) (#96434) 2024-06-23 10:38:51 -07:00
Kazu Hirata
5dc99af487
[llvm] Use llvm::is_contained (NFC) (#95362) 2024-06-13 08:09:13 -07:00
Jeffrey Byrnes
cf1c97b2d2
[AMDGPU] Do not attempt to fallback to default mutations (#83208)
IGLP itself will be in SavedMutations via mutations added during
Scheduler creation, thus falling back results in reapplying IGLP.

In PostRA scheduling, if we have multiple regions with IGLP
instructions, then we may have infinite loop.

Disable the feature for now.
2024-02-27 18:04:59 -08:00
Jeffrey Byrnes
8f2bd8ae68
[AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (#81342)
This implements the basic pipelining structure of exp/mfma interleaving
for better extensibility. While it does have improved extensibility,
there are controls which only enable it for DAGs with certain
characteristics (matching the DAGs it has been designed against).
2024-02-23 17:13:20 -08:00
Jeffrey Byrnes
f1156fb622
[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416)
Makes constructing SchedGroups of this type easier, and provides ability
to create them with __builtin_amdgcn_sched_group_barrier
2023-12-19 16:54:18 -08:00
Jeffrey Byrnes
6d8b44a506
[AMDGPU] [IGLP]: Fix assert (#73710)
We can also re-enter IGLP mutation via later `SchedStage`s in the
`GCNMaxOccupancySchedStrategy` . This is sort of NFC in that there is no
changed behavior for the only current client of `IsReentry`
2023-12-07 17:10:10 -08:00
Craig Topper
35baff8b6a
[AMDGPU] Correct assert that incorrectly chained multiple == operators. (#70291)
I believe this assert was trying to check that 3 variables were equal to
0.

I think it instead got interpreted as ((DSWCount == DSWWithPermCount) ==
DSWWithSharedVMEMCount) == 0 I guess (DSWCount == DSWWithPermCount) was
true because both counts were 0. Then true got compared to
DSWWithSharedVMEMCount, and since DSWWithSharedVMEMCount is 0, that
compare was false. And then that false compared equal to the final 0.
2023-10-26 08:02:10 -07:00
Kazu Hirata
6e18003a23 [llvm] Use llvm::any_of (NFC) 2023-10-22 10:42:18 -07:00
Jeffrey Byrnes
6afceba510
[AMDGPU][IGLP] SingleWaveOpt: Cache DSW Counters from PreRA (#67759)
Save the DSW counters from PreRA scheduling. While this avoids recalculation in the postRA pass, that isn't the main purpose.

This is required because of physical register dependencies in PostRA scheduling -- they alter the DAG s.t. our counters may become incorrect -- which alters the layout of the pipeline. By preserving the values from PreRA, we can be sure that we accurately construct the pipeline.

Additionally, remove a bad assert in SharesPredWithPrevNthGroup -- it is possible that we will have an empty cache if OtherGroup has no elements which have a V_PERM pred (possible if the V_PERM SG is empty).
2023-10-06 17:34:14 -07:00
Kazu Hirata
8a7f4eeb60 [llvm] Use llvm::is_contained (NFC) 2023-09-22 17:09:27 -07:00
Luke Drummond
471d9c57af [NFC][AMDGPU] assert we've found a value before use
The sync pipeline should always contain the candidate ID. If it doesn't
something's gone awry. assert on that.

Reviewed by: jrbyrnes

Differential Revision: https://reviews.llvm.org/D158845
2023-08-28 10:14:47 +01:00
Jeffrey Byrnes
6b7805fcb1 [AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms
This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering.

The kernels of interest have these specific phases:
NT: 1, 2a, 2c
NN: 1, 2a, 2b
TT: 1, 2b, 2c
TN: 1, 2b

The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver.

Differential Revision: https://reviews.llvm.org/D149773

Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96
2023-07-13 12:03:04 -07:00
Jeffrey Byrnes
db61927951 [AMDGPU][IGLP]: Add rules to SchedGroups
Differential Revision: https://reviews.llvm.org/D146774

Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859
2023-06-06 19:19:21 -07:00
Jeffrey Byrnes
1721e72d6e [AMDGPU][IGLP] Parameterize the SchedGroup processing / linking order in Solver
Currently the PipelineSolver processes SchedGroups in bottom up manner. However, there is no compelling reason to require this. Providing the option to toggle this affords greater experimentation capability, and make usage a bit more intuitive. Importantly, it makes designing rules much easier.

Differential Revision: https://reviews.llvm.org/D149393

Change-Id: Ic4abd3408f9faa105c0eef72eab7873d46083ee4
2023-05-30 14:43:14 -07:00
Nico Weber
72e01ef1f1 Revert "[AMDGPU] Add Lower Bound to PipelineSolver"
This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9.
Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724
2023-04-06 12:35:44 -04:00
Jeff Byrnes
3c42a58c4f [AMDGPU] Add Lower Bound to PipelineSolver 2023-04-05 14:54:59 -07:00
Stanislav Mekhanoshin
63e7e9c875 [AMDGPU] Treat WMMA the same as MFMA for sched_barrier
MFMA and WMMA essentially the same thing, but apear on different ASICs.

Differential Revision: https://reviews.llvm.org/D142062
2023-01-19 10:52:31 -08:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Austin Kerbow
f9c76a1198 [AMDGPU] Update MFMASmallGemmOpt with better performing stategy
Based on experiments this does better with target small GEMM kernels.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D139227
2022-12-09 19:03:51 -08:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Kazu Hirata
7d8c2d17eb [llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
2022-09-03 23:27:25 -07:00
Kazu Hirata
9861a68a7c [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 10:41:50 -07:00
Kazu Hirata
ce9f007c7c [llvm] Use llvm::find_if (NFC) 2022-08-28 10:41:48 -07:00
Austin Kerbow
b0f4678b90 [AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D132079
2022-08-19 15:38:36 -07:00
Jeffrey Byrnes
1c8d7ea973 [AMDGPU] Implement pipeline solver for non-trivial pipelines
Requested SchedGroup pipelines may be non-trivial to satisify. A minimimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t the choice we make when mapping an instruction -> SchedGroup is an NP problem. This patch implements the PipelineSolver class which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential time exact algorithm and a greedy algorithm. The patch includes some controls which allows the user to select the greedy/exact algorithm.

Differential Revision: https://reviews.llvm.org/D130797
2022-08-17 16:21:59 -07:00