66 Commits

Author SHA1 Message Date
John Paul Jepko
cabebddac9
[NFC] Remove unused-but-set global variables (#189315)
Remove four global variables that are set but never read to fix
-Wunused-but-set-global warnings:

- `MFMAChainLength` in AMDGPUIGroupLP.cpp
- `Wide` in llvm-objdump.cpp
- `SaveTemps` in ClangSYCLLinker.cpp
- `DeprecatedDriverCommand` in ClangScanDeps.cpp

Follow up to #178342
2026-03-30 19:29:50 +02:00
Bruce Changlong Xu
4cbb67a962
[AMDGPU] Use empty() instead of size() comparisons. NFC. (#187424)
Replace `.size() == 0`, `.size() != 0`, and `.size() > 0` with
`.empty()` / `!.empty()` across the AMDGPU backend.
2026-03-19 16:08:56 +00:00
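As a rough illustration of the pattern that #187424 replaces, the sketch below uses a hypothetical helper (`hasPendingWork` and its `Worklist` parameter are assumptions, not code from the AMDGPU backend):

```cpp
#include <vector>

// Hypothetical helper, not taken from the AMDGPU backend.
// Before: return Worklist.size() != 0;
// After:  the emptiness check states the intent directly.
bool hasPendingWork(const std::vector<int> &Worklist) {
  return !Worklist.empty();
}
```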
Frederik Harwath
39f2740fac
[AMDGPU] IGroupLP: Avoid repeating reachability checks in greedy algorithm (#182463)
In the greedy pipeline solver, the group cost is found using the
addEdges function and the edges must be removed from the DAG after
processing each group. The best group edges are then reinserted using
the same function. This repeats the costly reachability checks inside
the function which become problematic for pipelines with many
SchedGroups.

The algorithm is changed to remember the best group edges instead of
recomputing them. Additionally, SchedGroup::tryAddEdge is refactored to
avoid a redundant cycle check which is already performed by DAG->addEdge.
2026-03-03 09:40:30 +00:00
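The idea described in #182463 (keep the edges of the best candidate instead of recomputing them) can be sketched roughly as follows. This is not the IGroupLP solver: `Trial`, `Edge`, `pickBestEdges`, and the `Evaluate` callback are hypothetical names, and `Evaluate` merely stands in for the expensive per-group work.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (from, to) node ids; hypothetical

// Result of trying one group: its cost and the edges that were added.
struct Trial {
  int Cost;
  std::vector<Edge> Edges;
};

// Greedy selection that calls the expensive Evaluate step once per group
// and remembers the edges of the cheapest trial, so the caller can insert
// them directly instead of running the evaluation a second time.
template <typename EvalFn>
std::vector<Edge> pickBestEdges(std::size_t NumGroups, EvalFn Evaluate) {
  bool HaveBest = false;
  int BestCost = 0;
  std::vector<Edge> BestEdges;
  for (std::size_t G = 0; G < NumGroups; ++G) {
    Trial T = Evaluate(G);
    if (!HaveBest || T.Cost < BestCost) {
      HaveBest = true;
      BestCost = T.Cost;
      BestEdges = std::move(T.Edges); // remember, do not recompute later
    }
  }
  return BestEdges;
}
```

The trade-off in this sketch is a little extra memory for the remembered edge list in exchange for skipping the second evaluation of the winning group.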
Frederik Harwath
0933b634c6
[AMDGPU] IGroupLP: Refactor SchedGroup::initSchedGroup (NFC) (#184122)
There are three overloaded SchedGroup::initSchedGroup functions, two of
which are only used for specific types of SchedGroups, namely
SCHED_BARRIER and SCHED_GROUP_BARRIER. This seems to have led to some
confusion since the different functions perform checks which are not
needed for their intended restricted use cases. Furthermore, there are
several wrong comments surrounding those functions.

Simplify the functions and inline the actual initialization parts of the
SCHED_BARRIER and SCHED_GROUP_BARRIER variants at their only call sites.
Extract a function that finds the candidate SUnits for a given
SchedGroup and use this instead of initSchedGroup. Fix comments.
2026-03-03 09:09:44 +00:00
Patrick Simmons
0d0249e543
Try To Guess SGMasks for Inline Asm Instructions (#155491)
Addresses SWDEV-549227
2026-01-23 12:43:53 -06:00
David Spickett
80e3548372
[llvm][AMDGPU] Fix signed/unsigned comparison warning in 32-bit builds (#172623)
llvm::count_if calls std::count_if which returns a difference_type.
difference_type is always signed but is never going to be a negative
value when used as the result of count_if.

This resulted in warnings in our 32-bit Arm builds like: 
```
AMDGPUIGroupLP.cpp:1050:20: warning: comparison of integers of different signs: 
'typename iterator_traits<const SDep *>::difference_type' (aka 'int') and 'unsigned int' [-Wsign-compare]
 1050 |       if (SuccSize >= Size)
      |           ~~~~~~~~ ^  ~~~~
```

I presume these warnings are not generated in 64-bit builds because
unsigned is 32-bit even for 64-bit platforms and there is no risk in
extending 32-bit unsigned into 64-bit signed.

To fix the warning I've changed the type of SuccSize to unsigned, and
the assignment acts like a static_cast into that type.
2025-12-18 11:11:09 +00:00
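A minimal sketch of the fix described in #172623, using `std::count_if` directly. The function `hasAtLeast` and its predicate are hypothetical; only the names `SuccSize` and `Size` come from the quoted warning.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the check at the warning site: are there at
// least Size positive entries in Succs?
bool hasAtLeast(const std::vector<int> &Succs, unsigned Size) {
  // std::count_if returns a signed difference_type. Storing the result in
  // an unsigned variable converts it implicitly, so the comparison below
  // is unsigned against unsigned and -Wsign-compare has nothing to report.
  unsigned SuccSize = std::count_if(Succs.begin(), Succs.end(),
                                    [](int S) { return S > 0; });
  return SuccSize >= Size;
}
```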
Robert Imschweiler
576e1affab
[NFC][AMDGPU] IGLP: Fixes for unsigned int handling (#135090)
Fixes unsigned int underflows in
`MFMASmallGemmSingleWaveOpt::applyIGLPStrategy`.
2025-11-18 20:47:41 +01:00
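For readers unfamiliar with the class of bug fixed in #135090, the sketch below shows how an unsigned subtraction wraps around and one possible way to guard it. Both functions are hypothetical illustrations and do not reproduce the actual change in `applyIGLPStrategy`.

```cpp
// Naive version: when Used > Total the result wraps around to a very
// large unsigned value instead of going negative.
unsigned remainingNaive(unsigned Total, unsigned Used) {
  return Total - Used;
}

// Guarded version: clamp at zero so later loops or size computations do
// not see a huge bogus count.
unsigned remainingClamped(unsigned Total, unsigned Used) {
  return Used < Total ? Total - Used : 0;
}
```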
Jay Foad
72c69aefba
[AMDGPU] Make use of getFunction and getMF. NFC. (#167872) 2025-11-14 11:00:57 +00:00
Jeffrey Byrnes
b86503efe5
[AMDGPU] Do not put memory instructions in *ALU SchedGroups (#162560)
Classifying some memory instructions as VALU leads to unexpected
behavior from the sched*barrier intrinsics.
2025-10-13 11:49:32 -07:00
Iris Shi
bdf03fcff3
Revert "[llvm][NFC] Use llvm::sort()" (#140668) 2025-05-20 11:27:03 +08:00
Iris Shi
061a7699f3
[llvm][NFC] Use llvm::sort() (#140335) 2025-05-17 14:49:46 +08:00
Robert Imschweiler
e55172f139
[AMDGPU] Classify FLAT instructions as VMEM (#137148)
Also adapt hazard and wait handling.
2025-05-07 09:20:52 +02:00
Kazu Hirata
d144c13ae5
[Target] Remove unused local variables (NFC) (#138443) 2025-05-04 07:56:38 -07:00
Kazu Hirata
4f71e1ebfc
[AMDGPU] Use llvm::count_if (NFC) (#137492) 2025-04-26 23:27:54 -07:00
anjenner
a3d05e8987
Remove an incorrect assert in MFMASmallGemmSingleWaveOpt. (#130131)
This assert was failing in a fuzzing test. I consulted with @jrbyrnes
who said:

The MFMASmallGemmSingleWaveOpt::apply() method is invoked if and only if
the user has inserted an intrinsic llvm.amdgcn.iglp.opt(i32 1) into
their source code. This intrinsic applies a highly specialized DAG
mutation to result in specific scheduling for a specific set of kernels.
These assertions are really just confirming that the characteristics of
the kernel match what is expected (i.e. the kernels are similar to the
ones this DAG mutation strategy was designed against).

However, if we apply this DAG mutation to kernels for which it was not
designed, then we may not find the types of instructions we are looking
for, and may end up with empty caches.

I think it should be fine to just return false if the cache is empty
instead of the assert.
2025-04-24 09:22:24 +01:00
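The change in #130131 (return false on an empty cache rather than asserting) follows a pattern that can be sketched as below. The function `matchesLastCached` and the `int`-based cache are assumptions for illustration, not the real `MFMASmallGemmSingleWaveOpt` code.

```cpp
#include <vector>

// Hypothetical rule check: does Candidate match the most recently cached
// element?
bool matchesLastCached(const std::vector<int> &Cache, int Candidate) {
  // An assert(!Cache.empty()) here would fire on kernels the strategy was
  // not designed for. Treating an empty cache as "rule does not apply"
  // keeps the access below safe without aborting.
  if (Cache.empty())
    return false;
  return Cache.back() == Candidate;
}
```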
Kazu Hirata
515564aa6e
[AMDGPU] Partially revert my llvm::less_second patch (#136615)
This patch partially reverts:

  commit 5e1b0f97735083b6762834b83fdbb35e76002e03
  Author: Kazu Hirata <kazu@google.com>
  Date:   Fri Apr 18 10:05:55 2025 -0700

to fix:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir

under LLVM_ENABLE_EXPENSIVE_CHECKS.
2025-04-21 14:55:08 -07:00
Kazu Hirata
5e1b0f9773
[llvm] Use llvm::less_first and llvm::less_second (NFC) (#136272) 2025-04-18 10:05:55 -07:00
Kazu Hirata
1380a8259e
[AMDGPU] Use llvm::find and llvm::find_if (NFC) (#135582) 2025-04-13 23:46:57 -07:00
Rahul Joshi
a3754ade63
[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:50 -07:00
Kazu Hirata
bfe93aedcc [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:255:18: error: private
  field 'DAG' is not used [-Werror,-Wunused-private-field]
2025-01-11 13:06:37 -08:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handling for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
2025-01-11 09:57:57 -08:00
Jeffrey Byrnes
9ac52ce8d6
[AMDGPU] Add iglp_opt(3) for simple mfma / exp interleaving (#117269)
Adds a minimal iglp_opt to do simple exp / mfma interleaving.
2024-12-06 15:19:07 -08:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Kazu Hirata
141574bacb
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415) 2024-10-23 10:44:09 -07:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Kazu Hirata
d07dc5aa50
[AMDGPU] Avoid repeated hash lookups (NFC) (#110788) 2024-10-02 06:52:21 -07:00
Kazu Hirata
3b9f183958
[AMDGPU] Use llvm::any_of, llvm::all_of, and llvm::none_of (NFC) (#103007) 2024-08-13 00:07:54 -07:00
Kazu Hirata
e40915b740
[AMDGPU] Use llvm::any_of and llvm::none_of (NFC) (#102794) 2024-08-12 10:45:24 -07:00
Jay Foad
c7309dadbf
[AMDGPU] Use range-based for loops. NFC. (#99047) 2024-07-17 10:18:03 +01:00
Jay Foad
5e338f1f4a [AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC. 2024-07-17 08:27:35 +01:00
Jay Foad
aeafdc21d2 [AMDGPU] Use using instead of typedef. NFC. 2024-07-16 16:44:12 +01:00
Jay Foad
78dea4c1ea [AMDGPU] Use bool literals for bools. NFC. 2024-07-16 15:44:49 +01:00
Kazu Hirata
fef144cebb Revert "[llvm] Use llvm::sort (NFC) (#96434)"
This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d.

Reverting the patch fixes the following under EXPENSIVE_CHECKS:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir
  LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll
  LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll
2024-06-25 11:18:40 -07:00
Kazu Hirata
05d167fc20
[llvm] Use llvm::sort (NFC) (#96434) 2024-06-23 10:38:51 -07:00
Kazu Hirata
5dc99af487
[llvm] Use llvm::is_contained (NFC) (#95362) 2024-06-13 08:09:13 -07:00
Jeffrey Byrnes
cf1c97b2d2
[AMDGPU] Do not attempt to fallback to default mutations (#83208)
IGLP itself will be in SavedMutations via mutations added during
Scheduler creation, thus falling back results in reapplying IGLP.

In PostRA scheduling, if we have multiple regions with IGLP
instructions, then we may have an infinite loop.

Disable the feature for now.
2024-02-27 18:04:59 -08:00
Jeffrey Byrnes
8f2bd8ae68
[AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (#81342)
This implements the basic pipelining structure of exp/mfma interleaving
for better extensibility. While it does have improved extensibility,
there are controls which only enable it for DAGs with certain
characteristics (matching the DAGs it has been designed against).
2024-02-23 17:13:20 -08:00
Jeffrey Byrnes
f1156fb622
[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416)
Makes constructing SchedGroups of this type easier, and provides ability
to create them with __builtin_amdgcn_sched_group_barrier
2023-12-19 16:54:18 -08:00
Jeffrey Byrnes
6d8b44a506
[AMDGPU] [IGLP]: Fix assert (#73710)
We can also re-enter IGLP mutation via later `SchedStage`s in the
`GCNMaxOccupancySchedStrategy`. This is sort of NFC in that there is no
changed behavior for the only current client of `IsReentry`.
2023-12-07 17:10:10 -08:00
Craig Topper
35baff8b6a
[AMDGPU] Correct assert that incorrectly chained multiple == operators. (#70291)
I believe this assert was trying to check that 3 variables were equal to
0.

I think it instead got interpreted as ((DSWCount == DSWWithPermCount) ==
DSWWithSharedVMEMCount) == 0. I guess (DSWCount == DSWWithPermCount) was
true because both counts were 0. Then true got compared to
DSWWithSharedVMEMCount, and since DSWWithSharedVMEMCount is 0, that
compare was false. And then that false compared equal to the final 0.
2023-10-26 08:02:10 -07:00
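The evaluation order described in #70291 can be reproduced in isolation. In the sketch below, `chainedEq` spells out with explicit parentheses how a chain such as `A == B == C == 0` is parsed, and `allZero` is what the assert presumably meant to check. Both function names are hypothetical.

```cpp
// How a chained comparison is parsed: == is left-associative, and each
// step yields a bool that is then compared with the next operand.
bool chainedEq(unsigned A, unsigned B, unsigned C) {
  return ((A == B) == C) == 0;
}

// The presumably intended check: all three values are zero.
bool allZero(unsigned A, unsigned B, unsigned C) {
  return A == 0 && B == 0 && C == 0;
}
```

With all three counts at zero the chained form happens to be true, which is why the original assert did not fire in that case. It is also true for inputs such as (1, 1, 0), where the intended condition does not hold.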
Kazu Hirata
6e18003a23 [llvm] Use llvm::any_of (NFC) 2023-10-22 10:42:18 -07:00
Jeffrey Byrnes
6afceba510
[AMDGPU][IGLP] SingleWaveOpt: Cache DSW Counters from PreRA (#67759)
Save the DSW counters from PreRA scheduling. While this avoids recalculation in the postRA pass, that isn't the main purpose.

This is required because of physical register dependencies in PostRA scheduling -- they alter the DAG s.t. our counters may become incorrect -- which alters the layout of the pipeline. By preserving the values from PreRA, we can be sure that we accurately construct the pipeline.

Additionally, remove a bad assert in SharesPredWithPrevNthGroup -- it is possible that we will have an empty cache if OtherGroup has no elements which have a V_PERM pred (possible if the V_PERM SG is empty).
2023-10-06 17:34:14 -07:00
Kazu Hirata
8a7f4eeb60 [llvm] Use llvm::is_contained (NFC) 2023-09-22 17:09:27 -07:00
Luke Drummond
471d9c57af [NFC][AMDGPU] assert we've found a value before use
The sync pipeline should always contain the candidate ID. If it doesn't,
something's gone awry. Assert on that.

Reviewed by: jrbyrnes

Differential Revision: https://reviews.llvm.org/D158845
2023-08-28 10:14:47 +01:00
Jeffrey Byrnes
6b7805fcb1 [AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms
This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering.

The kernels of interest have these specific phases:
NT: 1, 2a, 2c
NN: 1, 2a, 2b
TT: 1, 2b, 2c
TN: 1, 2b

The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver.

Differential Revision: https://reviews.llvm.org/D149773

Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96
2023-07-13 12:03:04 -07:00
Jeffrey Byrnes
db61927951 [AMDGPU][IGLP]: Add rules to SchedGroups
Differential Revision: https://reviews.llvm.org/D146774

Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859
2023-06-06 19:19:21 -07:00
Jeffrey Byrnes
1721e72d6e [AMDGPU][IGLP] Parameterize the SchedGroup processing / linking order in Solver
Currently the PipelineSolver processes SchedGroups in a bottom-up manner. However, there is no compelling reason to require this. Providing the option to toggle this affords greater experimentation capability, and makes usage a bit more intuitive. Importantly, it makes designing rules much easier.

Differential Revision: https://reviews.llvm.org/D149393

Change-Id: Ic4abd3408f9faa105c0eef72eab7873d46083ee4
2023-05-30 14:43:14 -07:00
Nico Weber
72e01ef1f1 Revert "[AMDGPU] Add Lower Bound to PipelineSolver"
This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9.
Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724
2023-04-06 12:35:44 -04:00
Jeff Byrnes
3c42a58c4f [AMDGPU] Add Lower Bound to PipelineSolver 2023-04-05 14:54:59 -07:00
Stanislav Mekhanoshin
63e7e9c875 [AMDGPU] Treat WMMA the same as MFMA for sched_barrier
MFMA and WMMA are essentially the same thing, but appear on different ASICs.

Differential Revision: https://reviews.llvm.org/D142062
2023-01-19 10:52:31 -08:00