I initially assumed only kernels could be roots, but that is wrong. A
function with no callers also needs to be a root so that it is handled
correctly.
Such functions are very rare because we usually internalize everything,
and internal functions with no callers get deleted.
When they are present, we also need to consider their dependencies and
act accordingly. Previously, we could put such a function in P0 "by
default", even though it could call a function with internal linkage
defined in another module, which was of course incorrect.
Fixes SWDEV-467695
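
A minimal sketch of the root selection described above, on a toy call
graph rather than LLVM IR. Fn, Module, findRoots and collectDependencies
are hypothetical names made up for illustration; they are not the names
used in the actual pass.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a function definition (hypothetical, not an LLVM type).
struct Fn {
  bool isKernel = false;
  std::vector<std::string> callees; // direct calls only
};

using Module = std::map<std::string, Fn>;

// A function is a root if it is a kernel, or if nothing else calls it.
std::set<std::string> findRoots(const Module &m) {
  std::set<std::string> called;
  for (const auto &entry : m)
    for (const auto &callee : entry.second.callees)
      if (callee != entry.first) // a self-call does not count as a caller
        called.insert(callee);

  std::set<std::string> roots;
  for (const auto &entry : m)
    if (entry.second.isKernel || called.count(entry.first) == 0)
      roots.insert(entry.first);
  return roots;
}

// Everything reachable from a root has to end up in the same partition
// as that root, including for roots that are not kernels.
std::set<std::string> collectDependencies(const Module &m,
                                          const std::string &root) {
  assert(m.count(root) && "root must be defined in the module");
  std::set<std::string> deps;
  std::vector<std::string> worklist{root};
  while (!worklist.empty()) {
    const std::string cur = worklist.back();
    worklist.pop_back();
    auto it = m.find(cur);
    if (it == m.end())
      continue; // declaration only, nothing to walk
    for (const auto &callee : it->second.callees)
      if (callee != root && deps.insert(callee).second)
        worklist.push_back(callee);
  }
  return deps;
}
```

With this model, a non-kernel function that nobody calls is treated as a
root, and the internal function it calls is reported as its dependency
instead of being left behind in another partition.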
This lets the splitting logic access TTI correctly, and opens the door
to accessing more analyses in the future.
I went back and forth between this and also making the default
SplitModule a Pass to keep things uniform, but I decided against it
because it would just be a needless complication. Neither llvm-split nor
LTOBackend has a PM ready to use, so we need to create one anyway. Let's
keep all the mess hidden in the AMDGPU version for now to keep this
change more self-contained.
This is just something I noticed while going over this pass's logic one
more time; it hasn't caused issues (yet). If we find an indirect call,
we stop looking, assuming we added all functions to the list. But if not
all functions in the module are indirectly callable, some may still be
missing.
Just to be safe, keep looking until we have done everything we can to
find dependencies, so we don't accidentally miss one.
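
A rough sketch of the "keep looking" behavior on a toy call graph. The
Fn and Module types, the addressTaken flag and collectDependencies are
hypothetical and only illustrate the idea; the real pass works on LLVM
IR and is structured differently.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a function definition (hypothetical, not an LLVM type).
struct Fn {
  bool hasIndirectCall = false; // contains a call through a pointer
  bool addressTaken = false;    // may be the target of an indirect call
  std::vector<std::string> callees; // direct calls only
};

using Module = std::map<std::string, Fn>;

std::set<std::string> collectDependencies(const Module &m,
                                          const std::string &root) {
  assert(m.count(root) && "root must be defined in the module");
  std::set<std::string> deps;
  std::vector<std::string> worklist{root};
  bool addedIndirectTargets = false;

  auto enqueue = [&](const std::string &name) {
    if (name != root && deps.insert(name).second)
      worklist.push_back(name);
  };

  while (!worklist.empty()) {
    const std::string cur = worklist.back();
    worklist.pop_back();
    auto it = m.find(cur);
    if (it == m.end())
      continue; // declaration only, nothing to walk
    const Fn &fn = it->second;

    if (fn.hasIndirectCall && !addedIndirectTargets) {
      // Conservatively depend on every possible indirect call target...
      addedIndirectTargets = true;
      for (const auto &entry : m)
        if (entry.second.addressTaken)
          enqueue(entry.first);
      // ...but do NOT stop here: functions that are only called directly
      // are not covered by the loop above.
    }

    for (const auto &callee : fn.callees)
      enqueue(callee);
  }
  return deps;
}
```

In this model, returning right after the indirect-call branch would miss
any function that is reachable only through direct calls, which is the
case the change guards against.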
(with fix for ubsan)
This enables the --lto-partitions option to work more consistently.
This module splitting logic is fully aware of AMDGPU modules and their
specificities, and takes advantage of them to split modules in a way
that avoids compilation issues (such as resource usage being incorrectly
represented).
This also includes a logging system that's more elaborate than just
LLVM_DEBUG. It allows printing logs to uniquely named files, optionally
with all value names hidden so they can be safely shared without leaking
information about the source. Logs can also be enabled through an
environment variable, which avoids the sometimes complicated process of
passing a -mllvm option all the way from the clang driver to the offload
linker that handles full LTO codegen.
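
A small sketch of the two logging ideas above: enabling logs from the
environment, and hiding value names before printing. Everything here is
an assumption for illustration: the variable name EXAMPLE_SPLIT_LOG, the
helpers loggingEnabled and printableName, and the use of std::hash to
obscure names are all made up and do not reflect how the pass does it.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical environment variable name, chosen for this example only.
constexpr const char *kLogEnvVar = "EXAMPLE_SPLIT_LOG";

// Logs are on if requested on the command line, or if the environment
// variable is set to something other than an empty string or "0".
bool loggingEnabled(bool flagFromCommandLine) {
  if (flagFromCommandLine)
    return true;
  const char *value = std::getenv(kLogEnvVar);
  return value != nullptr && *value != '\0' && std::string(value) != "0";
}

// Returns the name to print in a log. With hideNames set, the real name
// is replaced by a stable placeholder so the log can be shared.
std::string printableName(const std::string &name, bool hideNames) {
  assert(!name.empty() && "expected a named value");
  if (!hideNames)
    return name;
  std::ostringstream os;
  os << "fn_" << std::hex << std::hash<std::string>{}(name);
  return os.str();
}
```

The placeholder is stable within a run, so the same function can still
be followed through a log even though its real name never appears.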