36 Commits

Author SHA1 Message Date
Joseph Huber
923cc2d43b
[AMDGPU] Fix alias handling in module splitting functionality (#187295)
Summary:
The module splitting used for `-flto-partitions=8` support (which is
passed by default) did not correctly handle aliases. We mainly need to
do two things: keep the aliases in the they are used in and externalize
them. Internalize linkage needs to be handled conservatively.

This is needed because these aliases show up in PGO contexts.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
2026-03-19 10:51:39 -05:00
Jan Svoboda
de4a1a77e1
[clang][modules] Prevent deadlock in module cache (#182722)
When there's a dependency cycle between modules, the dependency scanner
may encounter a deadlock. This was caused by not respecting the lock
timeout. But even with the timeout implemented, leaving
`unsafeMaybeUnlock()` unimplemented means trying to take a lock after a
timeout would still fail and prevent making progress. This PR implements
this API in a way to avoid UB on `std::mutex` (when it's unlocked by
someone else than the owner). Lastly, this PR makes sure that
`unsafeUnlock()` ends the wait of existing threads, so that they don't
need to hit the full timeout amount.

This PR also implements `-fimplicit-modules-lock-timeout=<seconds>` that
allows tweaking the default 90-second lock timeout, and adds `#pragma
clang __debug sleep` that makes it easier to achieve desired execution
ordering.

rdar://170738600
2026-02-27 09:36:14 -08:00
Henry
c6f6efba3b
[NFC] Implicit container copy cleanup (#174702)
A set of cleanup for redundant implicit container copies. Fixed with
const reference or move semantics.

e8996cb24 [AMDGPU] replace copy with const reference (NFC)
25ceecee8 [-Wunsafe-buffer-usage] Replace vector copy with reference
(NFC)
e1f5254e0 [AMDGPU] Replace copy with move semantics (NFC)
8261250d7 [InstCombine] Replace vector copy with move semantic (NFC)
749bb21de [CommandLine] Avoid vector copy for const argument (NFC)
b89526f90 [LoongArch] Remove unnecessary vector copy (NFC)
6b22bcf56 [TextAPI] Replace map copy with const reference (NFC)
a121519d8 [BlockExtract] Avoid copy semantic for ctor (NFC)
3034d3063 [LifetimeSafety] Avoid map copy for dump methods (NFC)

---------

Co-authored-by: sfu <afwbu8tp6@mozmail.com>
2026-01-14 11:16:32 +01:00
Kazu Hirata
00ef94805a
[AMDGPU] Remove const on a return type. (#168490)
While I am at it, this patch switches to the constructor that takes
a container instead of a pair of begin/end.

Identified with readability-const-return-type.
2025-11-18 07:16:58 -08:00
Pierre van Houtryve
2168455ef4
[AMDGPU][SplitModule] Do not create empty modules (#135761)
Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which
can happen),
we import global with non-local linkage into the first non-empty
partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they
can deal with less modules
than the number requested, but no. We already return only 1 module in
some cases and
it seems to be handled just fine.

Fixes SWDEV-523146
2025-04-25 09:36:41 +02:00
David Green
dd3de590eb [CostModel] Fix InlineSizeEstimatorAnalysis after #135596
Fix a reference to getValue() being optional in InlineSizeEstimatorAnalysis, a
file that is not included in the default build. A "warning: enumerated and
non-enumerated type in conditional expression" warning is fixed in AMDGPU too.
2025-04-23 08:20:12 +01:00
David Green
98b6f8dc69
[CostModel] Remove optional from InstructionCost::getValue() (#135596)
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.
2025-04-23 07:46:27 +01:00
Ramkumar Ramachandra
fd6260f13b
[EquivClasses] Shorten members_{begin,end} idiom (#134373)
Introduce members() iterator-helper to shorten the members_{begin,end}
idiom. A previous attempt of this patch was #130319, which had to be
reverted due to unit-test failures when attempting to call members() on
the end iterator. In this patch, members() accepts either an ECValue or
an ElemTy, which is more intuitive and doesn't suffer from the same
issue.
2025-04-04 14:34:08 +01:00
Florian Hahn
3bdf9a0880
[EquivalenceClasses] Use SmallVector for deterministic iteration order. (#134075)
Currently iterators over EquivalenceClasses will iterate over std::set,
which guarantees the order specified by the comperator. Unfortunately in
many cases, EquivalenceClasses are used with pointers, so iterating over
std::set of pointers will not be deterministic across runs.

There are multiple places that explicitly try to sort the equivalence
classes before using them to try to get a deterministic order
(LowerTypeTests, SplitModule), but there are others that do not at the
moment and this can result at least in non-determinstic value naming in
Float2Int.

This patch updates EquivalenceClasses to keep track of all members via a
extra SmallVector and removes code from LowerTypeTests and SplitModule
to sort the classes before processing.

Overall it looks like compile-time slightly decreases in most cases, but
close to noise:

https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions

PR: https://github.com/llvm/llvm-project/pull/134075
2025-04-02 20:27:43 +01:00
Florian Hahn
9e5bfbf77d
[EquivalenceClasses] Update member_begin to take ECValue (NFC).
Remove a level of indirection and update code to use range-based for
loops.
2025-04-01 09:28:46 +01:00
Kazu Hirata
71935281e0
[Target] Use *Set::insert_range (NFC) (#132140)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 09:09:30 -07:00
Jan Svoboda
d8dfdafc1d
[Support] Introduce new AdvisoryLock interface (#130989)
This PR abstracts the `LockFileManager` API into new `AdvisoryLock`
interface. This is so that we can create an alternative implementation
for Clang implicitly-built modules that is optimized for single-process
environment.
2025-03-13 10:54:51 -07:00
Jan Svoboda
f4d599cda9
[Support] Do not remove lock file on failure (#130834)
Clients of `LockFileManager` call `unsafeRemoveLockFile()` whenever
`tryLock()` fails. However looking at the code, there are no scenarios
where this actually does something useful. This PR removes such calls.
2025-03-12 08:51:00 -07:00
Jan Svoboda
dafb566710
[Support] Return LockFileManager errors right away (#130627)
This patch removes some internal state out of `LockFileManager` by
moving the locking code from the constructor into new member function
`tryLock()` which returns the errors right away. This simplifies and
modernizes the interface.
2025-03-11 13:49:44 -07:00
Siu Chi Chan
83ad90d851 [AMDGPU] Fix module split's assumption on kernels
Module split assumes that a kernel function must have an external
linkage; however, that isn't the case.  For example, a static kernel
function will have a weak_odr linkage

Change-Id: I1e5dee0de1fd866b365f4090a574e1b2961f8dca
2024-12-06 22:56:31 +00:00
Fraser Cormack
345b3319c8
[AMDGPU][SplitModule] Fix unintentional integer division (#117586)
A static analysis tool warned that a division was always being performed
in integer division, so was either 0.0 or 1.0.

This doesn't seem intentional, so has been fixed to return a true ratio
using floating-point division. This in turn showed a bug where a
comparison against this ratio was incorrect.
2024-11-27 10:18:52 +00:00
Fraser Cormack
3414993eaf
[AMDGPU][SplitModule] Fix potential divide by zero (#117602)
A static analysis tool found that ModuleCost could be zero, so would
perform divide by zero when being printed. Perhaps this is unreachable
in practice, but the fix is straightforward enough and unlikely to be a
performance concern.
2024-11-26 10:05:09 +00:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Pierre van Houtryve
b3a8400afa (reland) [AMDGPU][SplitModule] Handle !callees metadata (#108802)
(reland with fixed sed command for macos)

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
2024-10-15 07:16:57 +02:00
Nico Weber
140cbca83d Revert "[AMDGPU][SplitModule] Handle !callees metadata (#108802)"
This reverts commit 4a0dc3ef36ceff20787ff277a1fb6a1b513c4934.
Breaks tests, see comments on
https://github.com/llvm/llvm-project/pull/108802
2024-10-14 17:26:15 -04:00
Pierre van Houtryve
4a0dc3ef36
[AMDGPU][SplitModule] Handle !callees metadata (#108802)
See #106528 to review the first commit.

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
2024-10-14 08:55:12 +02:00
Pierre van Houtryve
d656b20632
[AMDGPU][SplitModule] Cleanup CallsExternal Handling (#106528)
- Don't treat inline ASM as indirect calls
- Remove call to alias testing, which was broken (only working by pure
luck right now) and isn't needed anyway. GlobalOpt should take care of
them for us.
2024-10-11 08:37:20 +02:00
pvanhout
959d84044a [AMDGPU] Remove unused SplitGraph::Node::getFullCost 2024-09-09 13:50:48 +02:00
Pierre van Houtryve
9347b66cfc
Reland "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)" (#107076)
Relands #104763 with
- Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator
failing if the input is shuffled first)
 - Fix for broken proposal selection
 - c3cb27370af40e491446164840766478d3258429 included

Original commit description below
---

Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
and LLVM_DEBUG, like any other pass in LLVM.
- The SML system started from good intentions, but it was too flawed and
messy to be of any real use. It was also a real pain to use and made the
code more annoying to maintain.
 - Graph-based module representation with DOTGraph printing support
- The graph represents the module accurately, with bidirectional, typed
edges between nodes (a node usually represents one function).
- Nodes are assigned IDs starting from 0, which allows us to represent a
set of nodes as a BitVector. This makes comparing 2 sets of nodes to
find common dependencies a trivial task. Merging two clusters of nodes
together is also really trivial.
 - No more defaulting to "P0" for external calls
- Roots that can reach non-copyable dependencies (such as external
calls) are now grouped together in a single "cluster" that can go into
any partition.
 - No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
and compared.
- Graph-search algorithm that can explore multiple branches/assignments
for a cluster of functions, up to a maximum depth.
- With the default max depth of 8, we can create up to 256 propositions
to try and find the best one.
- We can still fall back to a greedy approach upon reaching max depth.
That greedy approach uses almost identical heuristics to the previous
version of the pass.

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other type of relations
between nodes, etc.

I also designed the graph representation & the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being >1% of the runtime.
2024-09-09 09:06:34 +02:00
Danial Klimkin
6345604ae5
Revert: [AMDGPU] Graph-based Module Splitting Rewrite (llvm#104763) (#106707)
* Revert "Fix MSVC "not all control paths return a value" warning. NFC."
Dep to revert c9b6e01b2e4fc930dac91dd44c0592ad7e36d967

* Revert "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)"
Breaks tests.
2024-08-30 13:39:30 +02:00
Simon Pilgrim
c3cb27370a Fix MSVC "not all control paths return a value" warning. NFC. 2024-08-29 11:56:15 +01:00
Pierre van Houtryve
c9b6e01b2e
[AMDGPU] Graph-based Module Splitting Rewrite (#104763)
Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
and LLVM_DEBUG, like any other pass in LLVM.
- The SML system started from good intentions, but it was too flawed and
messy to be of any real use. It was also a real pain to use and made the
code more annoying to maintain.
 - Graph-based module representation with DOTGraph printing support
- The graph represents the module accurately, with bidirectional, typed
edges between nodes (a node usually represents one function).
- Nodes are assigned IDs starting from 0, which allows us to represent a
set of nodes as a BitVector. This makes comparing 2 sets of nodes to
find common dependencies a trivial task. Merging two clusters of nodes
together is also really trivial.
 - No more defaulting to "P0" for external calls
- Roots that can reach non-copyable dependencies (such as external
calls) are now grouped together in a single "cluster" that can go into
any partition.
 - No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
and compared.
- Graph-search algorithm that can explore multiple branches/assignments
for a cluster of functions, up to a maximum depth.
- With the default max depth of 8, we can create up to 256 propositions
to try and find the best one.
- We can still fall back to a greedy approach upon reaching max depth.
That greedy approach uses almost identical heuristics to the previous
version of the pass.

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other type of relations
between nodes, etc.

I also designed the graph representation & the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being >1% of the runtime.
2024-08-29 10:39:57 +02:00
Fraser Cormack
2e9f3f3b84
[AMDGPU][llvm-split] Fix another division by zero (#104421)
Somehow I missed this in #98888. It requires a log file, or the debug
flag to be passed.
2024-08-15 12:54:26 +01:00
Jay Foad
5e338f1f4a [AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC. 2024-07-17 08:27:35 +01:00
Fraser Cormack
075f7542f1
[AMDGPU][llvm-split] Fix division by zero (#98888)
An empty module, or one containing only declarations, would result in a
division by a zero cost.
2024-07-15 15:06:37 +01:00
Pierre van Houtryve
1c025fb02d
[AMDGPU][SplitModule] Allow non-kernels to be treated as roots (#95902)
I initially assumed only kernels could be roots, but that is wrong. A
function with no callers also needs to be a root to ensure it is
correctly handled.
They're very rare because we usually internalize everything, and
internal functions with no callers would be deleted.

When they are present, we need to also consider their dependencies and
act accordingly. Previously, we could put a function "by default" in P0,
but it could call another function with internal linkage defined in
another module which was of course incorrect.

Fixes SWDEV-467695
2024-06-24 08:46:53 +02:00
Pierre van Houtryve
d95b82c49a
[NFC][AMDGPU] Make AMDGPUSplitModule a ModulePass (#95773)
It allows it to access TTI correctly, and opens the door to accessing
more analysis in the future.

I went back and forth between this, and also making the default
SplitModule a Pass too to make it uniform, but I decided against it
because it's just needless complications. Neither llvm-split or
LTOBackend have a PM ready to use so we need to create one anyway. Let's
keep all the mess hidden in the AMDGPU version for now to keep this
change more self-contained.
2024-06-18 09:16:32 +02:00
Pierre van Houtryve
42c4027729
[AMDGPU][SplitModule] Keep looking for more dependencies after finding an indirect call (#93480)
This is just something I noticed while going over this pass logic one
more time and didn't cause issues (yet). If we find an indirect call, we
stop looking assuming we added all functions to the list, but if not all
functions in the module were indirectly callable, some may still be
missing.

Just to be safe, keep looking until we did everything we could to find
dependencies, so we don't accidentally miss one.
2024-05-28 08:18:53 +02:00
Pierre van Houtryve
43fd244b3d Reland "[AMDGPU] Add AMDGPU-specific module splitting (#89245)"
(with fix for ubsan)

This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of
them to split modules in a way that avoids compilation issue (such as
resource usage being incorrectly represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG which allows
printing logs to uniquely named files, and optionally with all value
names hidden so they can be safely shared without leaking informatiton
about the source. Logs can also be enabled through an environment
variable, which avoids the sometimes complicated process of passing a
-mllvm option all the way from clang driver to the offload linker that
handles full LTO codegen.
2024-05-27 10:43:00 +02:00
Vitaly Buka
5a48223d1c
Revert "[AMDGPU] Add AMDGPU-specific module splitting" (#93275)
Fails on https://lab.llvm.org/buildbot/#/builders/85/builds/24181
and https://lab.llvm.org/buildbot/#/builders/5/builds/43589

Reverts llvm/llvm-project#89245
2024-05-23 23:48:39 -07:00
Pierre van Houtryve
d7c3713000
[AMDGPU] Add AMDGPU-specific module splitting (#89245)
This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of
them to split modules in a way that avoids compilation issue (such as
resource usage being incorrectly represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG which allows
printing logs to uniquely named files, and optionally with all value
names hidden so they can be safely shared without leaking informatiton
about the source. Logs can also be enabled through an environment
variable, which avoids the sometimes complicated process of passing a
-mllvm option all the way from clang driver to the offload linker that
handles full LTO codegen.
2024-05-23 12:26:24 +02:00