32 Commits

Author SHA1 Message Date
Pierre van Houtryve
b3a8400afa (reland) [AMDGPU][SplitModule] Handle !callees metadata (#108802)
(reland with fixed sed command for macos)

Handle the `!callees` metadata to further reduce the number of indirect
calls that end up conservatively assuming that any indirectly
callable function is a potential target.
2024-10-15 07:16:57 +02:00
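A minimal sketch of the idea behind the `!callees` commit above (b3a8400afa), assuming a simplified model of a call site. `CallSite` and `possible_targets` are hypothetical names, not the pass's code. A call with `!callees` only needs its listed targets. An indirect call without the metadata still falls back to "any indirectly callable function".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CallSite:
    """Hypothetical model of one call instruction (not LLVM's API)."""
    callee: Optional[str]                   # direct callee name, None for an indirect call
    callees_md: Optional[list[str]] = None  # targets listed in `!callees`, if any


def possible_targets(call: CallSite, indirectly_callable: set[str]) -> set[str]:
    """Return the functions this call site may reach."""
    if call.callee is not None:
        return {call.callee}             # direct call: exactly one target
    if call.callees_md is not None:
        return set(call.callees_md)      # `!callees` narrows the candidate set
    return set(indirectly_callable)      # no metadata: assume any indirectly callable function


# Usage: with `!callees`, only "f" has to follow the caller into its partition.
ic = {"f", "g", "h"}
print(sorted(possible_targets(CallSite(None, ["f"]), ic)))  # ['f']
print(sorted(possible_targets(CallSite(None), ic)))         # ['f', 'g', 'h']
```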
Nico Weber
140cbca83d Revert "[AMDGPU][SplitModule] Handle !callees metadata (#108802)"
This reverts commit 4a0dc3ef36ceff20787ff277a1fb6a1b513c4934.
Breaks tests, see comments on
https://github.com/llvm/llvm-project/pull/108802
2024-10-14 17:26:15 -04:00
Pierre van Houtryve
4a0dc3ef36
[AMDGPU][SplitModule] Handle !callees metadata (#108802)
See #106528 to review the first commit.

Handle the `!callees` metadata to further reduce the number of indirect
calls that end up conservatively assuming that any indirectly
callable function is a potential target.
2024-10-14 08:55:12 +02:00
Pierre van Houtryve
d656b20632
[AMDGPU][SplitModule] Cleanup CallsExternal Handling (#106528)
- Don't treat inline ASM as indirect calls
- Remove the alias-testing call, which was broken (it only worked by pure
  luck) and isn't needed anyway: GlobalOpt should take care of aliases
  for us.
2024-10-11 08:37:20 +02:00
Pierre van Houtryve
9347b66cfc
Reland "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)" (#107076)
Relands #104763 with
- Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator
  failing if the input is shuffled first)
- Fix for broken proposal selection
- c3cb27370af40e491446164840766478d3258429 included

Original commit description below
---

Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
  and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and
    messy to be of any real use. It was also a real pain to use and made the
    code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support
  - The graph represents the module accurately, with bidirectional, typed
    edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a
    set of nodes as a BitVector. This makes comparing 2 sets of nodes to
    find common dependencies a trivial task. Merging two clusters of nodes
    together is also really trivial.
- No more defaulting to "P0" for external calls
  - Roots that can reach non-copyable dependencies (such as external
    calls) are now grouped together in a single "cluster" that can go into
    any partition.
- No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
  and compared.
- Graph-search algorithm that can explore multiple branches/assignments
  for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 propositions
    to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth.
    That greedy approach uses almost identical heuristics to the previous
    version of the pass.

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other types of relations
between nodes, etc.

I also designed the graph representation & the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being <1% of the runtime.
2024-09-09 09:06:34 +02:00
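The graph-search described in the reland above can be illustrated with a small sketch. It rests on assumptions that are not taken from the pass. Clusters are node bitmasks. A partition's cost counts shared nodes once. A proposal is graded by its most expensive partition. Each step branches between the least-loaded partition and the partition sharing the most nodes. Below `MAX_DEPTH` both branches are explored, giving at most 2**8 = 256 leaves, and past it only the greedy choice is taken. All names are hypothetical.

```python
from __future__ import annotations

MAX_DEPTH = 8  # branching stops after 8 levels, so at most 2**8 = 256 proposals


def partition_cost(mask: int, node_cost: list[int]) -> int:
    # A partition holds the union of its clusters' nodes; shared nodes count once.
    return sum(c for i, c in enumerate(node_cost) if mask >> i & 1)


def grade(parts: list[int], node_cost: list[int]) -> int:
    # Lower is better: the most expensive partition bounds the parallel build.
    return max(partition_cost(p, node_cost) for p in parts)


def search(clusters: list[int], parts: list[int], node_cost: list[int],
           depth: int = 0) -> list[int]:
    """Assign each cluster (a node bitmask) to a partition; return the best masks."""
    if not clusters:
        return parts
    cluster, rest = clusters[0], clusters[1:]
    loads = [partition_cost(p, node_cost) for p in parts]
    least_loaded = loads.index(min(loads))
    overlaps = [bin(p & cluster).count("1") for p in parts]
    most_shared = overlaps.index(max(overlaps))
    candidates = [least_loaded]  # greedy choice, always explored
    if depth < MAX_DEPTH and most_shared != least_loaded:
        candidates.append(most_shared)  # below max depth: explore this branch too
    best: list[int] | None = None
    for idx in candidates:
        new_parts = parts.copy()
        new_parts[idx] |= cluster
        result = search(rest, new_parts, node_cost, depth + 1)
        if best is None or grade(result, node_cost) < grade(best, node_cost):
            best = result
    return best


node_cost = [4, 4, 1, 1]
best = search([0b0001, 0b0010, 0b0100, 0b1000], [0, 0], node_cost)
print(grade(best, node_cost))  # 5
```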
Danial Klimkin
6345604ae5
Revert: [AMDGPU] Graph-based Module Splitting Rewrite (llvm#104763) (#106707)
* Revert "Fix MSVC "not all control paths return a value" warning. NFC."
Dep to revert c9b6e01b2e4fc930dac91dd44c0592ad7e36d967

* Revert "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)"
Breaks tests.
2024-08-30 13:39:30 +02:00
pvanhout
575be3efb0 [AMDGPU][llvm-split] Make declarations test more stable
Delete the previous files if present, to ensure it won't fail if the output directory of the tests wasn't cleared.
2024-08-29 11:17:17 +02:00
pvanhout
31684c676a [AMDGPU][llvm-split] Remove declarations-debug
Test didn't have a FileCheck line and is obsolete after #104763
2024-08-29 10:52:40 +02:00
Pierre van Houtryve
c9b6e01b2e
[AMDGPU] Graph-based Module Splitting Rewrite (#104763)
Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
  and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and
    messy to be of any real use. It was also a real pain to use and made the
    code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support
  - The graph represents the module accurately, with bidirectional, typed
    edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a
    set of nodes as a BitVector. This makes comparing 2 sets of nodes to
    find common dependencies a trivial task. Merging two clusters of nodes
    together is also really trivial.
- No more defaulting to "P0" for external calls
  - Roots that can reach non-copyable dependencies (such as external
    calls) are now grouped together in a single "cluster" that can go into
    any partition.
- No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
  and compared.
- Graph-search algorithm that can explore multiple branches/assignments
  for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 propositions
    to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth.
    That greedy approach uses almost identical heuristics to the previous
    version of the pass.

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other types of relations
between nodes, etc.

I also designed the graph representation & the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being <1% of the runtime.
2024-08-29 10:39:57 +02:00
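The commit above (c9b6e01b2e) numbers nodes from 0 so that a set of nodes is a BitVector. The sketch below shows why that makes finding common dependencies and merging clusters trivial. It uses a Python `int` as the bitset in place of LLVM's `BitVector` and a hypothetical adjacency-list call graph. Intersection is a single AND and merging is a single OR.

```python
from __future__ import annotations


def reachable(graph: dict[int, list[int]], root: int) -> int:
    """Return the set of node IDs reachable from `root`, as a bitmask."""
    seen = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if seen >> node & 1:
            continue
        seen |= 1 << node  # node with ID i is bit i
        stack.extend(graph.get(node, []))
    return seen


def ids(mask: int) -> list[int]:
    return [i for i in range(mask.bit_length()) if mask >> i & 1]


# Hypothetical call graph: two roots (0 and 1) sharing callees 2 and 4.
graph = {0: [2], 1: [2, 3], 2: [4]}
a, b = reachable(graph, 0), reachable(graph, 1)
print(ids(a & b))  # common dependencies: [2, 4]
print(ids(a | b))  # merged cluster: [0, 1, 2, 3, 4]
```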
Fraser Cormack
2e9f3f3b84
[AMDGPU][llvm-split] Fix another division by zero (#104421)
Somehow I missed this in #98888. The division is only reached when a log
file is requested or the debug flag is passed.
2024-08-15 12:54:26 +01:00
Fraser Cormack
075f7542f1
[AMDGPU][llvm-split] Fix division by zero (#98888)
An empty module, or one containing only declarations, would result in a
division by a zero cost.
2024-07-15 15:06:37 +01:00
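A sketch of the division-by-zero issue fixed in 075f7542f1 and 2e9f3f3b84. It assumes that partition costs are reported as a percentage of the total module cost and that declarations contribute no cost. Both helpers are hypothetical. A module that is empty or contains only declarations has a total cost of 0, so the ratio has to be guarded.

```python
from __future__ import annotations


def module_cost(functions: list[tuple[str, bool, int]]) -> int:
    # Each entry is (name, is_declaration, size); declarations have no body, so no cost.
    return sum(0 if is_decl else size for _, is_decl, size in functions)


def cost_percent(partition_cost: int, total_cost: int) -> float:
    # An empty or declaration-only module has a total cost of 0; guard the division.
    if total_cost == 0:
        return 0.0
    return partition_cost * 100.0 / total_cost


print(cost_percent(0, module_cost([("ext", True, 0)])))    # 0.0 instead of ZeroDivisionError
print(cost_percent(25, module_cost([("f", False, 100)])))  # 25.0
```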
Ilia Sergachev
c02e8f762a
[llvm][transforms] Add a new algorithm to SplitModule (#95941)
The new round-robin algorithm overrides the hash-based distribution of
functions to modules. It achieves a more even number of functions per
module when the number of functions is close to the number of requested
modules. It's not in use by default and is available under a new flag.
2024-07-03 21:26:21 +02:00
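The round-robin commit above (c02e8f762a) contrasts two distribution strategies. The sketch below is only an illustration. Using MD5 over the name as the hash-based baseline is an assumption, and both function names are hypothetical. Hashing ties a function's partition to its name, so a few functions can land in the same module. Round-robin hands out partitions in turn, so the counts differ by at most one.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def hash_partition(name: str, n: int) -> int:
    # Hash-based: the partition depends only on the name, so counts may be uneven.
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "little") % n


def round_robin(names: list[str], n: int) -> dict[str, int]:
    # Round-robin: the i-th function goes to partition i % n.
    return {name: i % n for i, name in enumerate(names)}


names = ["f0", "f1", "f2", "f3"]
print(Counter(hash_partition(f, 4) for f in names))  # may put several in one partition
print(Counter(round_robin(names, 4).values()))       # exactly one function per partition
```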
Pierre van Houtryve
1c025fb02d
[AMDGPU][SplitModule] Allow non-kernels to be treated as roots (#95902)
I initially assumed only kernels could be roots, but that is wrong. A
function with no callers also needs to be a root to ensure it is
correctly handled.
Such functions are very rare because we usually internalize everything,
and internal functions with no callers would be deleted.

When they are present, we need to also consider their dependencies and
act accordingly. Previously, we could put such a function "by default" in
P0, even though it could call a function with internal linkage that had
been placed in another module, which was of course incorrect.

Fixes SWDEV-467695
2024-06-24 08:46:53 +02:00
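A sketch of the root selection described in 1c025fb02d, assuming a plain list of call edges. `find_roots` is a hypothetical helper. Roots are kernels plus any function with no caller. This sketch also treats a function whose only caller is itself as having no caller, which is an assumption on our part.

```python
from __future__ import annotations


def find_roots(is_kernel: dict[str, bool], calls: list[tuple[str, str]]) -> set[str]:
    """Roots are kernels plus any function that nothing else calls."""
    # Assumption: self-recursion does not count as having a caller.
    called = {callee for caller, callee in calls if caller != callee}
    return {f for f, kernel in is_kernel.items() if kernel or f not in called}


# "orphan" is a non-kernel with no callers: it must be a root so that its
# dependencies are tracked instead of it being dropped into P0 by default.
funcs = {"kernel": True, "helper": False, "leaf": False, "orphan": False}
calls = [("kernel", "helper"), ("helper", "leaf"), ("orphan", "leaf")]
print(sorted(find_roots(funcs, calls)))  # ['kernel', 'orphan']
```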
Pierre van Houtryve
42c4027729
[AMDGPU][SplitModule] Keep looking for more dependencies after finding an indirect call (#93480)
This is just something I noticed while going over this pass logic one
more time; it didn't cause issues (yet). If we find an indirect call, we
stop looking, assuming we added all functions to the list. But if not all
functions in the module were indirectly callable, some may still be
missing.

Just to be safe, keep looking until we did everything we could to find
dependencies, so we don't accidentally miss one.
2024-05-28 08:18:53 +02:00
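The fix in 42c4027729 can be pictured as a worklist traversal that does not stop at the first indirect call. In this sketch the call-graph inputs are hypothetical dictionaries and sets. An indirect call adds every indirectly callable function, and the traversal keeps going so that direct callees that are not address-taken are still found.

```python
from __future__ import annotations


def collect_dependencies(root: str,
                         direct_calls: dict[str, list[str]],
                         has_indirect_call: set[str],
                         indirectly_callable: set[str]) -> set[str]:
    """Worklist traversal of everything `root` may need in its partition."""
    seen: set[str] = set()
    worklist = [root]
    while worklist:
        fn = worklist.pop()
        if fn in seen:
            continue
        seen.add(fn)
        worklist.extend(direct_calls.get(fn, []))
        if fn in has_indirect_call:
            # Any indirectly callable function may be the target, but keep
            # going: direct callees that are not address-taken are still needed.
            worklist.extend(indirectly_callable)
    return seen


deps = collect_dependencies("kernel", {"kernel": ["helper"], "helper": ["leaf"]},
                            {"kernel"}, {"f"})
print(sorted(deps))  # ['f', 'helper', 'kernel', 'leaf']
```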
Pierre van Houtryve
43fd244b3d Reland "[AMDGPU] Add AMDGPU-specific module splitting (#89245)"
(with fix for ubsan)

This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of them to split modules in a way that
avoids compilation issues (such as resource usage being incorrectly
represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG, which allows printing logs to uniquely named files, optionally
with all value names hidden so they can be safely shared without leaking
information about the source. Logs can also be enabled through an environment
variable, which avoids the sometimes complicated process of passing a
-mllvm option all the way from clang driver to the offload linker that
handles full LTO codegen.
2024-05-27 10:43:00 +02:00
Vitaly Buka
5a48223d1c
Revert "[AMDGPU] Add AMDGPU-specific module splitting" (#93275)
Fails on https://lab.llvm.org/buildbot/#/builders/85/builds/24181
and https://lab.llvm.org/buildbot/#/builders/5/builds/43589

Reverts llvm/llvm-project#89245
2024-05-23 23:48:39 -07:00
Pierre van Houtryve
d7c3713000
[AMDGPU] Add AMDGPU-specific module splitting (#89245)
This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of them to split modules in a way that
avoids compilation issues (such as resource usage being incorrectly
represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG, which allows printing logs to uniquely named files, optionally
with all value names hidden so they can be safely shared without leaking
information about the source. Logs can also be enabled through an environment
variable, which avoids the sometimes complicated process of passing a
-mllvm option all the way from clang driver to the offload linker that
handles full LTO codegen.
2024-05-23 12:26:24 +02:00
pvanhout
83f7a3a21f [llvm-split] Require x86-registered-target for target-specific-split.ll 2024-04-22 09:46:36 +02:00
Pierre van Houtryve
e86ebe4ff8
[LTO] Allow target-specific module splitting (#83128)
Allow targets to implement custom module splitting logic for
--lto-partitions, see #89245

https://discourse.llvm.org/t/rfc-lto-target-specific-module-splittting/77252
2024-04-22 08:59:18 +02:00
Matt Arsenault
a74c5707be Fix some test files with executable permissions 2022-12-02 17:12:03 -05:00
Matt Arsenault
7dc1009d13 llvm-split: Convert tests to opaque pointers
global.ll and scc-const-alias.ll needed some manual fixups; the script
seems to not correctly deal with constantexpr bitcasts.
2022-11-28 09:48:21 -05:00
Timm Bäder
924d62ca4a [llvm][tools] Hide remaining unrelated llvm- tool options
Differential Revision: https://reviews.llvm.org/D106430
2021-07-22 09:47:55 +02:00
Rafael Espindola
9fbc040599 Make GlobalValues with non-default visibility dso_local.
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.

The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).

llvm-svn: 322806
2018-01-18 02:08:23 +00:00
Rafael Espindola
e4b0231c63 Make internal/private GVs implicitly dso_local.
While updating clang tests for having clang set dso_local I noticed
that:

- There are *a lot* of tests to update.
- Many of the updates are redundant.

They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.

llvm-svn: 322317
2018-01-11 22:15:05 +00:00
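The two dso_local commits above (r322806 and r322317) together define when a global is implicitly dso_local. The sketch below models that rule. The `GlobalValue` class is a hypothetical Python model, not LLVM's C++ API. Internal and private globals are always dso_local. Non-default visibility also implies dso_local, with extern_weak as the special case. An explicit `dso_local` keyword covers everything else.

```python
from dataclasses import dataclass


@dataclass
class GlobalValue:
    """Hypothetical model of an IR global; not LLVM's C++ API."""
    linkage: str             # e.g. "external", "internal", "private", "extern_weak"
    visibility: str          # "default", "hidden" or "protected"
    dso_local: bool = False  # explicit `dso_local` keyword in the textual IR

    def is_implicit_dso_local(self) -> bool:
        # internal/private globals are always dso_local (r322317).
        if self.linkage in ("internal", "private"):
            return True
        # Non-default visibility implies dso_local, except for extern_weak (r322806).
        return self.visibility != "default" and self.linkage != "extern_weak"

    def is_dso_local(self) -> bool:
        return self.dso_local or self.is_implicit_dso_local()


print(GlobalValue("external", "hidden").is_dso_local())     # True
print(GlobalValue("extern_weak", "hidden").is_dso_local())  # False
```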
Evgeniy Stepanov
f74f091ea6 Preserve blockaddress use edges in the module splitter.
"blockaddress" cannot apply to an external function. All
blockaddress constant uses must belong to the same module as the
definition of the target function.

llvm-svn: 265061
2016-03-31 21:55:11 +00:00
Evgeniy Stepanov
a614ab7b71 Preserve extern_weak linkage in CloneModule.
Only force "extern" linkage if the function used to be a definition
in the source module. Declarations keep their original linkage.

llvm-svn: 265043
2016-03-31 20:21:31 +00:00
Evgeniy Stepanov
f575b2687c Remove personality for declarations in CloneModule.
Personality is copied as part of copyFunctionAttributes, but it is
invalid on a declaration. Remove the personality attribute if the
function body is not cloned.

Also add a verifier run over output modules in the llvm-split tool.

llvm-svn: 264667
2016-03-28 21:37:02 +00:00
Sergei Larin
427f570ce1 [SplitModule] In split module utility we should never separate alias with its aliasee.
Summary: When splitting a module while preserving locals, we currently do not handle the case of a global alias being separated from its aliasee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16585

llvm-svn: 259075
2016-01-28 18:59:28 +00:00
Sergei Larin
d19d4d30d8 Add an SCC-based method to the split module utility that avoids globalizing local variables.
Summary:
Currently llvm::SplitModule, as its first step, globalizes all local objects, which might not be desirable in some scenarios.
This change adds a new flag to llvm::SplitModule that uses an SCC approach to search for a balanced partition without the need to externalize symbols.
Such a partition might not be possible or fully balanced for a given number of partitions; it depends on the module properties (global/local dependencies within the module).

Joint development Tobias Edler von Koch (tobias@codeaurora.org) and Sergei Larin (slarin@codeaurora.org)

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D16124

llvm-svn: 258083
2016-01-18 21:07:13 +00:00
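A sketch of splitting without globalizing locals, as in d19d4d30d8 and the alias fix 427f570ce1 above. The commit describes an SCC approach. This sketch substitutes a simpler technique: union-find over "must stay together" pairs, such as a global and the local it references, or an alias and its aliasee. The largest-group-first placement into the currently smallest partition is also an assumption. All names and sizes are hypothetical.

```python
from __future__ import annotations

import heapq


def find(parent: dict[str, str], x: str) -> str:
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def split(sizes: dict[str, int], keep_together: list[tuple[str, str]],
          n: int) -> list[list[str]]:
    """Partition globals so that no pair in `keep_together` is separated."""
    parent = {g: g for g in sizes}
    for a, b in keep_together:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    groups: dict[str, list[str]] = {}
    for g in sizes:
        groups.setdefault(find(parent, g), []).append(g)
    # Place the largest groups first, each into the currently smallest partition.
    ordered = sorted(groups.values(), key=lambda grp: -sum(sizes[g] for g in grp))
    heap = [(0, i) for i in range(n)]
    parts: list[list[str]] = [[] for _ in range(n)]
    for grp in ordered:
        load, i = heapq.heappop(heap)
        parts[i].extend(grp)
        heapq.heappush(heap, (load + sum(sizes[g] for g in grp), i))
    return parts


sizes = {"a": 5, "b": 1, "c": 4, "alias_c": 0, "d": 2}
# "a" references the local "b"; "alias_c" is an alias of "c".
print(split(sizes, [("a", "b"), ("alias_c", "c")], 2))
# [['a', 'b'], ['c', 'alias_c', 'd']]
```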
Rafael Espindola
d1beb07d39 Have a single way for creating unique value names.
We had two code paths. One would create names like "foo.1" and the other
names like "foo1".

For globals it is important to use "foo.1" to help C++ name demangling.
For locals there is no strong reason to go one way or the other so I
kept the most common mangling (foo1).

llvm-svn: 253804
2015-11-22 00:16:24 +00:00
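A sketch of the single naming path described in d1beb07d39 above. `make_unique_name` is a hypothetical helper, and restarting the counter at 1 for each request is a simplification. The only difference between globals and locals is the separator: "foo.1" for globals, which helps C++ demangling, and "foo1" for locals.

```python
from __future__ import annotations


def make_unique_name(base: str, taken: set[str], is_global: bool) -> str:
    """Append an increasing counter until the name is free."""
    if base not in taken:
        taken.add(base)
        return base
    sep = "." if is_global else ""  # globals: "foo.1"; locals: "foo1"
    n = 1
    while True:
        candidate = f"{base}{sep}{n}"
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        n += 1


taken = {"foo"}
print(make_unique_name("foo", taken, is_global=True))   # foo.1
print(make_unique_name("foo", taken, is_global=False))  # foo1
```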
David Blaikie
2f40830dde [opaque pointer type] Add textual IR support for explicit type parameter for global aliases
update.py:
import fileinput
import sys
import re

alias_match_prefix = r"(.*(?:=|:|^)\s*(?:external |)(?:(?:private|internal|linkonce|linkonce_odr|weak|weak_odr|common|appending|extern_weak|available_externally) )?(?:default |hidden |protected )?(?:dllimport |dllexport )?(?:unnamed_addr |)(?:thread_local(?:\([a-z]*\))? )?alias"
plain = re.compile(alias_match_prefix + r" (.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|addrspacecast|\[\[[a-zA-Z]|\{\{).*$)")
cast  = re.compile(alias_match_prefix + r") ((?:bitcast|inttoptr|addrspacecast)\s*\(.* to (.*?)(| addrspace\(\d+\) *)\*\)\s*(?:;.*)?$)")
gep   = re.compile(alias_match_prefix + r") ((?:getelementptr)\s*(?:inbounds)?\s*\((?P<type>.*), (?P=type)(?:\s*addrspace\(\d+\)\s*)?\* .*\)\s*(?:;.*)?$)")

def conv(line):
  m = re.match(cast, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(gep, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(plain, line)
  if m:
    return m.group(1) + ", " + m.group(2) + m.group(3) + "*" + m.group(4) + "\n"
  return line

for line in sys.stdin:
  sys.stdout.write(conv(line))

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name '*.ll' | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name '*.mm' -o -name '*.m' -o -name '*.cpp' -o -name '*.c' | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name '*.ll' | xargs ./apply.sh

llvm-svn: 247378
2015-09-11 03:22:04 +00:00
Peter Collingbourne
1dc6a8d179 TransformUtils: Introduce module splitter.
The module splitter splits a module into linkable partitions. It will
be used to implement parallel LTO code generation.

This initial version of the splitter does not attempt to deal with the
somewhat subtle symbol visibility issues around module splitting. These
will be dealt with in a future change.

Differential Revision: http://reviews.llvm.org/D12132

llvm-svn: 245662
2015-08-21 02:48:20 +00:00