This commit moves the bulk of
LinkGraphLinkingLayer::registerDependencies into a new static method,
LinkGraphLinkingLayer::calculateDepGroups, where the behavior can be
unit tested.
The new method returns a list of LinkGraphLinkingLayer::SymbolDepGroups:
```cpp
struct SymbolDepGroup {
  SmallVector<jitlink::Symbol*> Defs; // symbols defined by this group
  DenseSet<jitlink::Symbol*> Deps;    // symbols those definitions depend on
};
```
The existing registerDependencies method converts these into
orc::SymbolDependenceGroups for registration with the ExecutionSession.
The calculateDepGroups method uses a new algorithm for calculating
dependencies between symbols in the LinkGraph. As before, the goal is to
compute dependencies of non-locally-scoped symbols defined within the
graph on other non-locally-scoped symbols in the graph (whether defined
by the graph or external). It is sufficient to record the first
non-locally-scoped symbol defined for each block reached (since such
symbols will have their own dependencies reported, and all such symbols
for a given block will have the same dependencies). The new algorithm
uses SCCIterator to visit strongly connected components within the
subgraph formed by edges to "anonymous" blocks (i.e. blocks that do not
define any non-locally-scoped symbols). These are visited in reverse-DFS
order, allowing dependencies to be efficiently propagated.
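The propagation scheme above can be sketched in standalone C++. This is not the JITLink implementation: `Block`, `DepGroup`, and `calculateDepGroupsSketch` are hypothetical stand-ins, blocks are vector indices, and a hand-written Tarjan pass replaces SCCIterator. Tarjan's algorithm finishes each SCC only after every SCC it can reach, so dependencies can be merged in a single pass over the anonymous-block subgraph.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a LinkGraph block.
struct Block {
  std::vector<std::string> Defs;    // non-locally-scoped symbols (empty = anonymous)
  std::vector<int> Edges;           // indices of target blocks
  std::vector<std::string> ExtDeps; // external symbols referenced
};

// Mirrors the shape of SymbolDepGroup using std containers.
struct DepGroup {
  std::vector<std::string> Defs;
  std::set<std::string> Deps;
};

std::vector<DepGroup> calculateDepGroupsSketch(const std::vector<Block> &G) {
  const int N = static_cast<int>(G.size());
  auto IsAnon = [&](int B) { return G[B].Defs.empty(); };
  // Direct deps: the first def of each named target block, plus externals.
  auto AddDirectDeps = [&](int B, std::set<std::string> &Out) {
    for (int T : G[B].Edges)
      if (!IsAnon(T))
        Out.insert(G[T].Defs.front());
    Out.insert(G[B].ExtDeps.begin(), G[B].ExtDeps.end());
  };

  // Tarjan's SCC algorithm over anonymous blocks only. SCCs complete in
  // reverse topological order, so successor SCCs are always finished first.
  std::vector<int> Index(N, -1), Low(N, 0), SCCOf(N, -1), Stack;
  std::vector<bool> OnStack(N, false);
  std::vector<std::set<std::string>> SCCDeps;
  int Counter = 0;
  std::function<void(int)> Visit = [&](int B) {
    Index[B] = Low[B] = Counter++;
    Stack.push_back(B);
    OnStack[B] = true;
    for (int T : G[B].Edges) {
      if (!IsAnon(T))
        continue;
      if (Index[T] == -1) {
        Visit(T);
        Low[B] = std::min(Low[B], Low[T]);
      } else if (OnStack[T]) {
        Low[B] = std::min(Low[B], Index[T]);
      }
    }
    if (Low[B] != Index[B])
      return;
    // B roots an SCC: pop its members; they all share one dependency set.
    const int Id = static_cast<int>(SCCDeps.size());
    std::vector<int> Members;
    int M;
    do {
      M = Stack.back();
      Stack.pop_back();
      OnStack[M] = false;
      SCCOf[M] = Id;
      Members.push_back(M);
    } while (M != B);
    std::set<std::string> Deps;
    for (int Mem : Members) {
      AddDirectDeps(Mem, Deps);
      for (int T : G[Mem].Edges)
        if (IsAnon(T) && SCCOf[T] != Id) // already-finished successor SCC
          Deps.insert(SCCDeps[SCCOf[T]].begin(), SCCDeps[SCCOf[T]].end());
    }
    SCCDeps.push_back(std::move(Deps));
  };
  for (int B = 0; B < N; ++B)
    if (IsAnon(B) && Index[B] == -1)
      Visit(B);

  // Each named block: its direct deps plus the deps of its anonymous targets.
  std::vector<DepGroup> Groups;
  for (int B = 0; B < N; ++B) {
    if (IsAnon(B))
      continue;
    DepGroup DG{G[B].Defs, {}};
    AddDirectDeps(B, DG.Deps);
    for (int T : G[B].Edges)
      if (IsAnon(T))
        DG.Deps.insert(SCCDeps[SCCOf[T]].begin(), SCCDeps[SCCOf[T]].end());
    for (const auto &D : G[B].Defs)
      DG.Deps.erase(D); // drop self-dependencies (a choice of this sketch)
    Groups.push_back(std::move(DG));
  }
  return Groups;
}
```

A named block that reaches an anonymous cycle picks up every named and external symbol reachable through that cycle, while each named target contributes only its first definition, matching the "first non-locally-scoped symbol per block" rule described above.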
This change results in a ~2x speedup when JIT-loading clang (as tested
on a 2023 MacBook Pro, 12-core M3 Pro, 18GB).
This removes an unnecessary coupling between ExecutorProcessControl and
DylibManager, allowing clients to select DylibManager implementations
independently.
To simplify the transition, the
ExecutorProcessControl::createDefaultJITDylib method will return an
instance of whatever DylibManager the ExecutorProcessControl
implementation had been using previously.
This updates EPCGenericDylibManager to implement the DylibManager
interface, and drops the DylibManager implementation from
SimpleRemoteEPC. Since SimpleRemoteEPC already owned an
EPCGenericDylibManager, it can simply provide that as its DylibManager
implementation. This change should not affect the behavior of
SimpleRemoteEPC from the perspective of API clients.
SelfExecutorProcessControl no longer implements DylibManager. Instead, a
private inner class, InProcessDylibManager, is used to implement this
interface. This change should not affect the behavior of
SelfExecutorProcessControl from the perspective of API clients.
This is a step towards decoupling ExecutorProcessControl implementations
from other interfaces.
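A minimal sketch of the composition pattern, assuming a drastically simplified interface: `DylibManagerSketch`, `SelfEPCSketch`, and `makeDefaultDylibManager` are hypothetical names, and none of the real DylibManager methods are modeled. The point is that the process-control object hands out a separate manager object (here a private inner class) instead of implementing the interface itself.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, cut-down interface; the real DylibManager API differs.
class DylibManagerSketch {
public:
  virtual ~DylibManagerSketch() = default;
  virtual std::optional<uint64_t> lookup(const std::string &Name) = 0;
};

// Process-control stand-in that no longer *is* a dylib manager; it only
// creates a default one, and clients may supply a different implementation.
class SelfEPCSketch {
  // Private inner implementation, analogous to InProcessDylibManager.
  class InProcessDylibManager : public DylibManagerSketch {
  public:
    explicit InProcessDylibManager(std::map<std::string, uint64_t> Syms)
        : Syms(std::move(Syms)) {}
    std::optional<uint64_t> lookup(const std::string &Name) override {
      auto I = Syms.find(Name);
      if (I == Syms.end())
        return std::nullopt;
      return I->second;
    }

  private:
    std::map<std::string, uint64_t> Syms;
  };

public:
  std::unique_ptr<DylibManagerSketch> makeDefaultDylibManager() {
    // Fake in-process symbol table, for illustration only.
    return std::make_unique<InProcessDylibManager>(
        std::map<std::string, uint64_t>{{"malloc", 0x1000}});
  }
};
```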
Adds a new EPCGenericJITLinkMemoryManager convenience constructor that
constructs an instance by looking up the given symbol names in the
bootstrap JITDylib of the given ExecutionSession.
The symbol names default to the SimpleNativeMemoryMap SPS-interface
symbol names provided by the new ORC runtime.
Create takes a JITDylib and a SymbolNames struct, looks up the
implementation symbol addresses in the given JITDylib, and uses them to
construct an EPCGenericJITLinkMemoryManager instance. This makes it
easier for ORC clients to construct the memory manager from named
symbols (e.g. in a bootstrap JITDylib) rather than raw addresses.
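The lookup-then-construct flow can be sketched as follows, assuming a JITDylib is modeled as a plain name-to-address map. `SymbolNames`, its placeholder default names, and `MemMgrSketch::Create` are hypothetical; the real defaults are the SimpleNativeMemoryMap symbol names from the ORC runtime, and the real API uses LLVM error handling rather than `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in: a "JITDylib" is just a name -> address map here.
using DylibMap = std::map<std::string, uint64_t>;

// Placeholder names, NOT the real ORC runtime symbol names.
struct SymbolNames {
  std::string Reserve = "placeholder_reserve";
  std::string Finalize = "placeholder_finalize";
  std::string Release = "placeholder_release";
};

struct MemMgrSketch {
  uint64_t ReserveAddr, FinalizeAddr, ReleaseAddr;

  // Look up every implementation symbol; fail if any of them is missing.
  static std::optional<MemMgrSketch> Create(const DylibMap &JD,
                                            const SymbolNames &SNs = {}) {
    auto Find = [&](const std::string &N) -> std::optional<uint64_t> {
      auto I = JD.find(N);
      if (I == JD.end())
        return std::nullopt;
      return I->second;
    };
    auto R = Find(SNs.Reserve), F = Find(SNs.Finalize), Rel = Find(SNs.Release);
    if (!R || !F || !Rel)
      return std::nullopt;
    return MemMgrSketch{*R, *F, *Rel};
  }
};
```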
The ExecutionSession constructor now creates a "<bootstrap>" JITDylib
and populates it with the bootstrap symbols from the
ExecutorProcessControl object. This allows bootstrap symbols to be
looked up via ExecutionSession::lookup, providing greater consistency
with other JIT symbol lookups.
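As a rough model of this constructor behavior, the hypothetical `SessionSketch` below copies the executor's bootstrap symbols into a dylib named "<bootstrap>" when it is constructed, so bootstrap symbols go through the same `lookup` path as symbols in any other dylib. None of these types are the ORC API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical executor model exposing bootstrap symbols.
struct EPCSketch {
  std::map<std::string, uint64_t> BootstrapSymbols;
};

class SessionSketch {
public:
  explicit SessionSketch(const EPCSketch &EPC) {
    // Populate a "<bootstrap>" dylib from the executor's bootstrap symbols.
    Dylibs["<bootstrap>"] = EPC.BootstrapSymbols;
  }

  // One lookup path for bootstrap and ordinary dylibs alike.
  std::optional<uint64_t> lookup(const std::string &JD,
                                 const std::string &Sym) const {
    auto D = Dylibs.find(JD);
    if (D == Dylibs.end())
      return std::nullopt;
    auto S = D->second.find(Sym);
    if (S == D->second.end())
      return std::nullopt;
    return S->second;
  }

private:
  std::map<std::string, std::map<std::string, uint64_t>> Dylibs;
};
```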
The function instructionsWithoutDebug serves two purposes: skipping debug
intrinsics and skipping pseudo instructions. However, these functions are
expensive because the filtering is done out-of-line through
std::function. Ideally, the filter would be inlined, but that would
require including IntrinsicInst.h in BasicBlock.h.
We no longer use debug intrinsics, so the first use (parameter false) is
no longer needed. The second use is sometimes needed, but in many cases
PseudoProbe instructions can be filtered out more easily at the call
sites.
Therefore, remove instructionsWithoutDebug/sizeWithoutDebug.
Compile-time tracker (c-t-t), stage2-O3: -0.21%.
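To illustrate the cost difference, here is a standalone sketch with a hypothetical `Inst` type: `countIf` filters through a `std::function` predicate, as the removed helpers did, while `countNonProbes` puts the pseudo-probe check directly in the caller's loop, where the compiler can inline it. In LLVM itself the call-site check would be something like `isa<PseudoProbeInst>(I)`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical instruction model.
enum class Kind { Normal, PseudoProbe };
struct Inst {
  Kind K;
};

// Old style: every element goes through an opaque std::function call.
std::size_t countIf(const std::vector<Inst> &BB,
                    const std::function<bool(const Inst &)> &Keep) {
  std::size_t N = 0;
  for (const Inst &I : BB)
    if (Keep(I))
      ++N;
  return N;
}

// Call-site style: the filter is a plain, inlinable condition.
std::size_t countNonProbes(const std::vector<Inst> &BB) {
  std::size_t N = 0;
  for (const Inst &I : BB) {
    if (I.K == Kind::PseudoProbe)
      continue; // skipped exactly where this caller needs it
    ++N;
  }
  return N;
}
```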
Now that CondBrInst and UncondBrInst are explicit subclasses, use them
instead.
HotColdSplitting was also inspecting prof metadata on unconditional
branches; fix this.
Also introduce C API cast functions and deprecate LLVMIsConditional in
favor of LLVMIsACondBrInst.
This patch covers all LLVM uses outside of Transforms, Analysis,
CodeGen/Target, SandboxIR, Frontend/OpenMP, tools, examples.
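The HotColdSplitting fix follows the usual LLVM RTTI pattern: read branch weights only after checking that the branch is conditional. The sketch below reproduces that pattern without LLVM; `BranchBase`, `CondBrSketch`, `UncondBrSketch`, and `dynCast` are hypothetical, and `TrueWeight` stands in for !prof metadata.

```cpp
#include <cassert>
#include <optional>

// Hypothetical base with a kind tag, as in LLVM-style RTTI.
struct BranchBase {
  enum class Kind { Cond, Uncond };
  explicit BranchBase(Kind K) : K(K) {}
  virtual ~BranchBase() = default;
  Kind getKind() const { return K; }

private:
  Kind K;
};

struct CondBrSketch : BranchBase {
  explicit CondBrSketch(std::optional<unsigned> TrueWeight)
      : BranchBase(Kind::Cond), TrueWeight(TrueWeight) {}
  static bool classof(const BranchBase *B) { return B->getKind() == Kind::Cond; }
  std::optional<unsigned> TrueWeight; // stands in for !prof metadata
};

struct UncondBrSketch : BranchBase {
  UncondBrSketch() : BranchBase(Kind::Uncond) {}
  static bool classof(const BranchBase *B) { return B->getKind() == Kind::Uncond; }
};

// Minimal dyn_cast analogue driven by classof.
template <typename To> To *dynCast(BranchBase *B) {
  return To::classof(B) ? static_cast<To *>(B) : nullptr;
}

// Only conditional branches carry branch weights worth inspecting.
std::optional<unsigned> branchTrueWeight(BranchBase *B) {
  if (auto *CB = dynCast<CondBrSketch>(B))
    return CB->TrueWeight;
  return std::nullopt;
}
```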
WaitingOnGraph is critical to the performance of LLVM's JIT (see e.g.
https://github.com/llvm/llvm-project/issues/179611), and these facilities will
make it easier to capture and investigate test cases, and build a performance
regression suite.
WaitingOnGraph::OpRecorder provides an interface for classes that want to
capture the essential WaitingOnGraph operations: simplify-and-emit, and fail.
WaitingOnGraph::simplify and WaitingOnGraph::fail now take an optional
OpRecorder pointer.
WaitingOnGraphOpStreamRecorder (WaitingOnGraphOpReplay.h) is an OpRecorder
implementation that serializes operations to a line-oriented text format on a
raw_ostream. WaitingOnGraphOpReplay provides types and utilities for iterating
over and replaying recorded operations. readWaitingOnGraphOpsFromBuffer returns
an iterator range over the ops in a serialized buffer.
The new ExecutionSession::setWaitingOnGraphOpRecorder method can be used to
install a recorder to capture ops from OL_notifyEmitted and IL_failSymbols.
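A minimal sketch of the record-and-replay idea, assuming a much simpler operation model than WaitingOnGraph's: each op is a name plus a list of symbols, written one op per line. `OpRecorderSketch`, `StreamRecorderSketch`, `readOpsSketch`, and the text format are all hypothetical and do not match the real serialization.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical recorder interface; the real OpRecorder records richer data.
struct OpRecorderSketch {
  virtual ~OpRecorderSketch() = default;
  virtual void recordSimplifyAndEmit(const std::vector<std::string> &Syms) = 0;
  virtual void recordFail(const std::vector<std::string> &Syms) = 0;
};

// Writes one operation per line: "<op> <sym> <sym> ...".
class StreamRecorderSketch : public OpRecorderSketch {
public:
  explicit StreamRecorderSketch(std::ostream &OS) : OS(OS) {}
  void recordSimplifyAndEmit(const std::vector<std::string> &Syms) override {
    write("emit", Syms);
  }
  void recordFail(const std::vector<std::string> &Syms) override {
    write("fail", Syms);
  }

private:
  void write(const char *Op, const std::vector<std::string> &Syms) {
    OS << Op;
    for (const auto &S : Syms)
      OS << ' ' << S;
    OS << '\n';
  }
  std::ostream &OS;
};

struct RecordedOp {
  std::string Op;
  std::vector<std::string> Syms;
};

// Parses a captured buffer back into ops so they can be replayed.
std::vector<RecordedOp> readOpsSketch(const std::string &Buf) {
  std::vector<RecordedOp> Ops;
  std::istringstream In(Buf);
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream LS(Line);
    RecordedOp R;
    LS >> R.Op;
    for (std::string S; LS >> S;)
      R.Syms.push_back(S);
    if (!R.Op.empty())
      Ops.push_back(R);
  }
  return Ops;
}
```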
The llvm-jitlink tool gains two new options:
- -waiting-on-graph-capture <filename> records all WaitingOnGraph operations
during a regular llvm-jitlink invocation.
- -waiting-on-graph-replay <filename> replays the operations from a capture
file. In this mode other arguments are ignored.
If there is no debug information, `DebugObject::collectTargetAlloc` is
never called in the post-allocation phase. As a result, in the
post-fixup phase `DebugObject::awaitTargetMem` fails with
_"std::future_error: No associated state"_ because the std::future was
never populated.
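The failure mode can be reproduced with a bare `std::promise`/`std::future` pair. In the hypothetical `DebugObjectSketch` below, the future is only populated when an allocation is collected, and `awaitTargetMem` checks `valid()` before calling `get()`. This is one defensive pattern, not necessarily the fix this patch applies; calling `get()` on a future without shared state is undefined by the standard, and libstdc++ reports it with the error quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <optional>

class DebugObjectSketch {
public:
  // Only called when there is debug information to emit.
  void collectTargetAlloc(uint64_t Addr) {
    Future = Promise.get_future();
    Promise.set_value(Addr);
  }

  // Report "nothing to await" instead of calling get() with no shared state.
  std::optional<uint64_t> awaitTargetMem() {
    if (!Future.valid())
      return std::nullopt;
    return Future.get();
  }

private:
  std::promise<uint64_t> Promise;
  std::future<uint64_t> Future; // default-constructed: valid() == false
};
```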
This PR fixes:
1. Issue #175509: missing support for deinitialize on the ELF platform.
2. Missing support for priority-based execution order in both the
initialize and deinitialize stages.
cc: @tqchen @joker-eph
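A sketch of priority-ordered initialization and deinitialization, assuming GCC-style semantics (a smaller priority number runs earlier for initializers, and the opposite holds for deinitializers) and assuming ties keep registration order. `PrioritizedFn`, `initOrder`, and `deinitOrder` are hypothetical; the ELF platform's actual ordering rules may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for an initializer or deinitializer.
struct PrioritizedFn {
  std::string Name;
  int Priority;
};

// Initializers: ascending priority; stable_sort keeps registration order
// among equal priorities.
std::vector<std::string> initOrder(std::vector<PrioritizedFn> Fns) {
  std::stable_sort(Fns.begin(), Fns.end(),
                   [](const PrioritizedFn &A, const PrioritizedFn &B) {
                     return A.Priority < B.Priority;
                   });
  std::vector<std::string> Order;
  for (const auto &F : Fns)
    Order.push_back(F.Name);
  return Order;
}

// Deinitializers: the mirror image, so larger priority numbers run first and
// later-registered functions run first among equal priorities.
std::vector<std::string> deinitOrder(std::vector<PrioritizedFn> Fns) {
  auto Order = initOrder(std::move(Fns));
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```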
Updates CompactUnwindTraits_MachO_arm64 and
CompactUnwindTraits_MachO_x86_64 encodingCanBeMerged methods to use
switch statements that clearly list mergeable encodings, and have a
default "false" case.
Since the new scheme explicitly covers DWARF modes (always
non-mergeable), this patch removes the separate DWARF mode check from
mergeRecords in CompactUnwindSupport.h.
Compact unwind record merging is an optimization. Using a can-be-merged
predicate is preferable to a "cannot-be-merged" predicate, as the former
encourages conservatively correct implementations: "what is safe to
merge" is easier to reason about than "what is not safe to merge".
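The can-be-merged style looks roughly like this. The mode constants are loosely modeled on the arm64 compact unwind mode field, but which modes count as mergeable here is an illustrative assumption, not the list used by CompactUnwindTraits_MachO_arm64. `Record` and `mergeRecords` are hypothetical simplifications of the merging step.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mode field loosely modeled on arm64 compact unwind encodings.
constexpr uint32_t ModeMask = 0x0F000000;
constexpr uint32_t ModeFrameless = 0x02000000;
constexpr uint32_t ModeDwarf = 0x03000000;
constexpr uint32_t ModeFrame = 0x04000000;

// Positive predicate: list what is known to be safe; everything else is no.
bool encodingCanBeMerged(uint32_t Encoding) {
  switch (Encoding & ModeMask) {
  case ModeFrameless: // assumed mergeable for this illustration
  case ModeFrame:     // assumed mergeable for this illustration
    return true;
  default: // DWARF mode and any mode not explicitly reviewed
    return false;
  }
}

struct Record {
  uint64_t FnAddr;
  uint32_t Encoding;
};

// Drop a record when it matches the previous one and is mergeable.
std::vector<Record> mergeRecords(const std::vector<Record> &In) {
  std::vector<Record> Out;
  for (const Record &R : In) {
    if (!Out.empty() && Out.back().Encoding == R.Encoding &&
        encodingCanBeMerged(R.Encoding))
      continue; // covered by the previous record's range
    Out.push_back(R);
  }
  return Out;
}
```

Any mode nobody has reviewed falls into `default` and is left unmerged, which costs some table size but never correctness.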
Add missing spaces to error messages, and use Triple::getArchName (which
gives the canonical arch name on Darwin, e.g. "arm64" rather than
"aarch64").
No testcase for this one: the change is cosmetic, and the error message
format is not relied upon anywhere.
Optimize GOT and Stub relocations when the edge target address is in
range of the call site: the indirect jump through the PLT stub can then
be replaced with a direct jump to the target during post-allocation
optimization.
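The range check behind this optimization can be sketched as a signed-displacement test. `canBypassStub` is hypothetical; the `Bits`/`Scale` pairs in its comment describe common branch encodings, and a real relocation must also account for each architecture's PC bias (for example, an x86-64 rel32 call is relative to the end of the instruction).

```cpp
#include <cassert>
#include <cstdint>

// Can a PC-relative branch at CallSite reach Target directly through a
// signed Bits-bit displacement measured in units of Scale bytes?
// e.g. Bits=32, Scale=1 ~ x86-64 rel32; Bits=26, Scale=4 ~ AArch64 B/BL.
bool canBypassStub(uint64_t CallSite, uint64_t Target, unsigned Bits,
                   unsigned Scale) {
  int64_t Delta = static_cast<int64_t>(Target - CallSite);
  if (Delta % static_cast<int64_t>(Scale) != 0)
    return false; // target not representable at this granularity
  int64_t Units = Delta / static_cast<int64_t>(Scale);
  int64_t Limit = int64_t{1} << (Bits - 1);
  return Units >= -Limit && Units < Limit;
}
```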
When running `perf inject` on a JIT dump generated through the perf
plugin, perf fails with the following error:
```
jitdump file contains invalid or unsupported flags 0xf5880666c26c
0x2b750 [0xa8]: failed to process type: 10 [Operation not permitted]
```
It turns out that Header's Flags field was never initialized, so the
value could be random.
This patch fixes the issue by initialising all Header's fields.
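A minimal sketch of the fix pattern: give every header field a default member initializer so that nothing reaches the file uninitialized. The field list follows the jitdump file header described in perf's jitdump specification, but the struct and `makeHeader` are hypothetical and are not the plugin's code.

```cpp
#include <cassert>
#include <cstdint>

// Every field has a default, so an unset Flags can no longer be garbage.
struct JitDumpHeader {
  uint32_t Magic = 0;
  uint32_t Version = 0;
  uint32_t TotalSize = 0;
  uint32_t ElfMach = 0;
  uint32_t Pad1 = 0;
  uint32_t Pid = 0;
  uint64_t Timestamp = 0;
  uint64_t Flags = 0; // the field perf rejected when left uninitialized
};

JitDumpHeader makeHeader(uint32_t Magic, uint32_t Pid, uint64_t Timestamp) {
  JitDumpHeader H; // all fields start from their default initializers
  H.Magic = Magic;
  H.Version = 1;
  H.TotalSize = sizeof(JitDumpHeader);
  H.Pid = Pid;
  H.Timestamp = Timestamp;
  return H; // ElfMach, Pad1 and Flags stay 0 unless set explicitly
}
```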
Co-authored-by: Lang Hames <lhames@gmail.com>
The stub function is generated for R_MIPS_26 relocations, which may be
used for local jumps inside a function; such jumps do not expect any
temporary register to be clobbered.
Use AT instead of T9 for the stub function; otherwise, functions that
use T9 will have it clobbered by the stub.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
…ces"
This reapplies 906b48616c03948a4df62a5a144f7108f3c455e8, which was
reverted in c11df52f9b847170b766fb71defd2a9222d95a8d due to bot
failures.
The testcase has been dropped from this recommit as it failed on several
bots (possibly due to differing backtrace formats or failure modes). I'll
re-introduce the testcase in a follow-up commit so that it can be
iterated on (and re-reverted if necessary) without affecting the options
introduced by this commit. (Since these options are best-effort
debugging tools, it's ok if they live in-tree without a test for now.)
The CMake ADDITIONAL_HEADER_DIRS directive for two Orc libraries,
specifically Shared and TargetProcess, used incorrect values that
pointed to the parent library's include directory instead of their own.
This is now fixed.
This reverts commit 906b48616c03948a4df62a5a144f7108f3c455e8.
The forward fix for this got reverted in
25976e83606f1a7615e3725e6038bb53ee96c3d5, so this reverts the original
commit, since it is still broken and the forward fix that mitigated most
of the issues is no longer in tree.
StringMap copies every option name into a new allocation, which is not
necessary. Instead, a DenseMap can be keyed by the same StringRef that
the Option already uses. This reduces the number of allocations when
loading libLLVM.
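A rough standard-library analogue of the change: key the map by a non-owning view of the name the option already holds, so registering an option copies only a pointer and a length. `OptionSketch`, `OptionMap`, and `registerOption` are hypothetical. Note that `std::unordered_map` still allocates a node per entry, unlike DenseMap's single bucket array, so this sketch shows only the key-sharing half of the saving.

```cpp
#include <cassert>
#include <string_view>
#include <unordered_map>

// Hypothetical option whose name refers to storage that outlives it
// (e.g. a string literal), like the StringRef held by a command-line option.
struct OptionSketch {
  std::string_view Name;
};

// Keys are views of the option's own name bytes: no character copies.
using OptionMap = std::unordered_map<std::string_view, OptionSketch *>;

void registerOption(OptionMap &Map, OptionSketch &Opt) {
  Map.emplace(Opt.Name, &Opt);
}
```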
This patch adds tools for capturing symbol information from JIT'd code
and using it to symbolicate backtraces. This is useful for debugging
crashes in JIT-compiled code where traditional symbolication tools may
not have access to the JIT symbol table. These tools are not a general
solution to the JIT symbolication problem (that will require further
integration with system components like libunwind, the dynamic linker,
and/or crash tracing tools), but will aid in JIT debugging and
development until a general solution is available.
APIs Added:
1. SymbolTableDumpPlugin - A LinkGraphLinkingLayer::Plugin that captures
symbol information as code is JIT'd and writes it to a file.
- Create(StringRef Path) ->
Expected<std::shared_ptr<SymbolTableDumpPlugin>>: Creates a plugin that
appends symbol information to the specified file.
- Symbol table format: "<link graph name>" <address> <symbol name>
<address> <symbol name> ...
The plugin uses a PostAllocationPass to write symbols after addresses
have been assigned but before the code is finalized.
2. DumpedSymbolTable - A class for symbolicating backtraces using a
previously dumped symbol table.
- Create(StringRef Path) -> Expected<DumpedSymbolTable>: Loads and parses
a symbol table from a file.
- symbolicate(StringRef Backtrace) -> std::string: Given the text of a
backtrace, adds the symbol name, offset, and defining graph name to each
row that ends with a hex address.
New `llvm-jitlink` Command Line Options:
1. -write-symtab=<path> Enables the SymbolTableDumpPlugin to write
symbol information to the specified file as objects are JIT'd. The
symbol table can then be used to symbolicate backtraces from crashes or
signal handlers.
2. -symbolicate-with=<path> Runs llvm-jitlink in symbolication mode.
Reads the symbol table from <path> and symbolicates backtraces read from
stdin or input files.
Usage Examples:
$ llvm-jitlink -write-symtab=symbols.txt mycode.o
$ llvm-jitlink -symbolicate-with=symbols.txt - < backtrace.txt
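The symbolication step can be sketched as an address-sorted map plus a predecessor lookup. This assumes a line-oriented reading of the format above (a quoted graph-name line followed by one `0x<address> <symbol>` line per symbol) and an invented output suffix; `parseSymTable` and `symbolicate` here are hypothetical and do not reproduce DumpedSymbolTable's exact parsing or output.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Assumed layout: "\"<graph name>\"" lines, then "0x<address> <symbol>" lines.
struct SymInfo {
  std::string Name, Graph;
};
using SymTable = std::map<uint64_t, SymInfo>; // sorted by address

SymTable parseSymTable(const std::string &Text) {
  SymTable T;
  std::istringstream In(Text);
  std::string Line, Graph;
  while (std::getline(In, Line)) {
    if (Line.size() >= 2 && Line.front() == '"' && Line.back() == '"') {
      Graph = Line.substr(1, Line.size() - 2); // start of a new graph
      continue;
    }
    std::istringstream LS(Line);
    std::string Addr, Name;
    if (LS >> Addr >> Name && Addr.rfind("0x", 0) == 0)
      T[std::stoull(Addr, nullptr, 16)] = {Name, Graph};
  }
  return T;
}

// Appends " (symbol + 0xoff, graph)" to lines whose last token is a hex address.
std::string symbolicate(const SymTable &T, const std::string &Backtrace) {
  std::istringstream In(Backtrace);
  std::ostringstream Out;
  std::string Line;
  while (std::getline(In, Line)) {
    Out << Line;
    auto Pos = Line.find_last_of(' ');
    std::string Last = Line.substr(Pos == std::string::npos ? 0 : Pos + 1);
    if (Last.size() > 2 && Last.rfind("0x", 0) == 0) {
      uint64_t Addr = std::stoull(Last, nullptr, 16);
      auto I = T.upper_bound(Addr); // first symbol strictly above Addr
      if (I != T.begin()) {
        --I; // nearest symbol at or below Addr
        Out << " (" << I->second.Name << " + 0x" << std::hex
            << (Addr - I->first) << std::dec << ", " << I->second.Graph << ")";
      }
    }
    Out << '\n';
  }
  return Out.str();
}
```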
This Error can be returned from operations on JITDylibs that cannot
proceed because the target JITDylib has been closed.
This patch uses the new error to replace an unsafe assertion in
JITDylib::define: If a JITDylib::define operation is run by an in-flight
task after the target JITDylib is closed it should error out rather than
asserting.
See also https://github.com/llvm/llvm-project/issues/174922
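The assert-to-error change can be sketched with a hypothetical `ClosableDylibSketch` whose `define` returns an error message once the dylib is closed, so an in-flight task that races with closing fails gracefully. The real code returns an llvm::Error of the new type; `std::optional<std::string>` stands in for it here.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

class ClosableDylibSketch {
public:
  // Returns an error message instead of asserting when the dylib is closed.
  std::optional<std::string> define(const std::string &Sym,
                                    unsigned long Addr) {
    if (Closed)
      return "cannot define " + Sym + ": JITDylib " + Name + " is closed";
    Syms[Sym] = Addr;
    return std::nullopt;
  }
  void close() { Closed = true; }

  std::string Name = "main";
  std::map<std::string, unsigned long> Syms;

private:
  bool Closed = false;
};
```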
Allow C programs to pass a ReserveAlloc flag to the constructor of
llvm::SectionMemoryManager, using a new variant of
LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager that has
...ReserveAlloc() appended to its name.
- Replace getLibraries() with cursor-based iteration.
- Simplify search logic and handle new libraries during scanning.
- Make symbol resolution faster with single enumeration and early exit.
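A sketch of cursor-based scanning, assuming a hypothetical `LibSketch` record and a simulated discovery step: iterating by index, rather than over a snapshot returned by something like getLibraries(), means libraries appended mid-scan are still searched, and the first definition found ends the search.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical library model: a name plus the symbols it exports.
struct LibSketch {
  std::string Name;
  std::set<std::string> Exports;
};

std::optional<std::string>
findDefiningLib(std::vector<LibSketch> &Libs, const std::string &Sym,
                const std::vector<LibSketch> &DiscoveredDeps) {
  std::size_t Appended = 0;
  // Re-check Libs.size() each step so newly appended libraries are visited.
  for (std::size_t Cursor = 0; Cursor < Libs.size(); ++Cursor) {
    if (Libs[Cursor].Exports.count(Sym))
      return Libs[Cursor].Name; // early exit on the first definition
    // Simulate the scan discovering a new library mid-iteration.
    if (Appended < DiscoveredDeps.size())
      Libs.push_back(DiscoveredDeps[Appended++]);
  }
  return std::nullopt;
}
```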
fa7f7a4cab4 changed the jit-dispatch function signature used in the
orc_rt_lite_reoptimize_helper function, but jit-dispatch still takes a
raw data pointer and size argument.
This should fix the failure in
https://lab.llvm.org/buildbot/#/builders/169/builds/18319 and similar
builds.
Updates ExecutionSession::runJITDispatchHandler to take the argument
buffer for the function as a WrapperFunctionBuffer, rather than an
ArrayRef<char>.
This is a first step towards more efficient jit-dispatch handler calls:
1. Handlers can now be run as tasks, since they own their argument
buffer (so there's no risk of it being deallocated before they're run)
2. In in-process JIT setups, this will allow argument buffers to be
passed in directly from the ORC runtime, rather than having to copy the
buffer.
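Why ownership matters can be shown with a hypothetical task queue: the handler's argument bytes are moved into the deferred task, so the caller's storage can be released before the task runs. `OwnedBuffer`, `TaskQueue`, and `dispatchAsTask` are stand-ins, not WrapperFunctionBuffer or the ExecutionSession API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Owning argument buffer; a non-owning view could dangle once the
// caller's storage is freed.
using OwnedBuffer = std::vector<char>;

struct TaskQueue {
  std::vector<std::function<void()>> Tasks;
  void runAll() {
    for (auto &T : Tasks)
      T();
    Tasks.clear();
  }
};

// The task owns its arguments, so it is safe to run it later.
void dispatchAsTask(TaskQueue &Q, OwnedBuffer ArgBytes,
                    std::function<void(const OwnedBuffer &)> Handler) {
  Q.Tasks.push_back([Args = std::move(ArgBytes), H = std::move(Handler)]() {
    H(Args);
  });
}
```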
ReOptimizeLayer was building LLVM IR to define a precomputed,
SPS-serialized argument buffer, then inserting calls directly to
__orc_rt_jit_dispatch, passing the address of the precomputed buffer and
an __orc_rt_reoptimize_tag defined by the ORC runtime. This design is
non-canonical, requiring the ORC runtime to be loaded (or an extra
definition for __orc_rt_reoptimize_tag to be inserted) while not using
the runtime to perform the serialization.
This commit updates ReOptimizeLayer to instead insert calls to an
__orc_rt_reoptimize function implemented in the ORC runtime. This
function will perform serialization and call __orc_rt_jit_dispatch,
similar to other functions in the ORC runtime.
To maintain support for in-process JITs that don't use the ORC runtime,
this commit adds a ReOptimizeLayer::addOrcRTLiteSupport method which
injects IR to define __orc_rt_reoptimize (calling through to an
orc_rt_lite_reoptimize_helper function defined in LLVM) and
__orc_rt_reoptimize_tag. The ReOptimizeLayerTest is updated to use
addOrcRTLiteSupport.
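Moving serialization into a runtime function can be sketched as follows. `reoptimizeSketch`, its `MUID` and `CurVersion` parameters, and the little-endian encoding are hypothetical stand-ins (the real function signature and SPS wire format are not reproduced). Callers pass plain values, and the runtime-side function builds the argument buffer before handing a tag, data pointer, and size to the dispatch hook.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical dispatch hook taking (tag, data pointer, size).
using DispatchFn =
    std::function<void(const void *Tag, const char *Data, std::size_t Size)>;

// Plain little-endian encoding, standing in for the real wire format.
static void appendLE(std::vector<char> &Buf, uint64_t V, unsigned Bytes) {
  for (unsigned I = 0; I < Bytes; ++I)
    Buf.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

// Runtime-side entry point: serializes its arguments, then dispatches.
void reoptimizeSketch(uint64_t MUID, uint32_t CurVersion, const void *Tag,
                      const DispatchFn &Dispatch) {
  std::vector<char> Buf;
  appendLE(Buf, MUID, 8);
  appendLE(Buf, CurVersion, 4);
  Dispatch(Tag, Buf.data(), Buf.size());
}
```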
If `Alloc.finalize()` fails in the post-allocation pass, we store the
error in `FinalizePromise`. If the post-fixup pass is never reached
afterwards, the error leaks. This patch adds another case to the
DebugObject destructor that checks the `Expected<T>` and reports the
error.
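A sketch of the destructor-side check, using a hypothetical `DebugObjectErrSketch` in which `std::optional<std::string>` stands in for the stored `Expected<T>`: if the consuming phase never runs, the destructor reports the pending error instead of dropping it.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

class DebugObjectErrSketch {
public:
  explicit DebugObjectErrSketch(std::function<void(const std::string &)> Report)
      : Report(std::move(Report)) {}

  // Post-allocation path: stash the finalize result (empty = success).
  void setFinalizeResult(std::optional<std::string> Err) {
    PendingErr = std::move(Err);
  }

  // Post-fixup path: consume the stored result.
  std::optional<std::string> takeFinalizeResult() {
    auto E = std::move(PendingErr);
    PendingErr.reset(); // a moved-from optional still holds a value
    return E;
  }

  ~DebugObjectErrSketch() {
    if (PendingErr) // never consumed: report it rather than leak it
      Report(*PendingErr);
  }

private:
  std::function<void(const std::string &)> Report;
  std::optional<std::string> PendingErr;
};
```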
Implements `reserveAllocationSpace` and provides an option to enable
`needsToReserveAllocationSpace` for large-memory environments with
AArch64.
The [AArch64
ABI](https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst#7code-models)
has restrictions on the distance between TEXT and GOT sections, as the
instructions used to reference them are limited to 2 or 4GB. Allocating
sections in multiple blocks can result in distances greater than that on
systems with lots of memory. In those environments, several projects
using SectionMemoryManager with MCJIT have run into assertion failures
for the R_AARCH64_ADR_PREL_PG_HI21 relocation as it attempts to address
across distances greater than 2GB (an int32).
Fixes #71963 by allocating all sections in a single contiguous memory
allocation, limiting the distance required for instruction offsets
similar to how pre-compiled binaries would be loaded into memory.
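A sketch of single-reservation layout, assuming a hypothetical `reserveLayout` helper that places code, read-only data, and read-write data at page-aligned offsets within one block (page alignment lets each segment receive its own memory protections). Because every section lies inside the same reservation, the largest TEXT-to-GOT distance is bounded by the reservation size rather than by wherever separate allocations happen to land.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Offsets of each segment within one contiguous reservation.
struct Layout {
  uint64_t CodeOff, RODataOff, RWDataOff, TotalSize;
};

constexpr uint64_t alignTo(uint64_t V, uint64_t A) { return (V + A - 1) / A * A; }

// Code sits at the start of the reservation (assumed page-aligned); each
// following segment starts on a page boundary or its own stricter alignment.
Layout reserveLayout(uint64_t CodeSize, uint64_t RODataSize,
                     uint64_t RODataAlign, uint64_t RWDataSize,
                     uint64_t RWDataAlign, uint64_t PageSize) {
  Layout L{};
  L.CodeOff = 0;
  L.RODataOff = alignTo(L.CodeOff + CodeSize, std::max(PageSize, RODataAlign));
  L.RWDataOff = alignTo(L.RODataOff + RODataSize, std::max(PageSize, RWDataAlign));
  L.TotalSize = alignTo(L.RWDataOff + RWDataSize, PageSize);
  return L;
}
```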
Co-authored-by: Lang Hames <lhames@gmail.com>
Also renames CWrapperFunctionResult to CWrapperFunctionBuffer.
These types are used as argument buffers, as well as result buffers. The
new name better reflects their purpose, and is consistent with naming in
the new ORC runtime (llvm-project/orc-rt).
On ppc64le some sections like .toc get merged into other sections by
JITLink. As such, some sections in the object file may not be present in
the link graph. Skip those sections.