DeferredAAs should only capture bootstrap actions, but after 30b73ed7bd it was
capturing all actions, including those from other plugins. This is problematic
as other plugins may introduce actions that need to run before the platform
actions (e.g. on arm64e we need pointer signing to run before we access any
global pointers in the graph).
Note that this effectively undoes 30b73ed7bd, which was a buggy attempt to
synchronize writes to the DeferredAAs vector. This patch fixes that issue the
obvious way by locking the bootstrap mutex while accessing the DeferredAAs
vector.
No testcase yet: So far I've only seen this fail during bootstrap of arm64e
JIT'd programs.
This reapplies 6d72bf47606, which was reverted in 57447d3ddf to investigate
build failures, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/10114.
The original patch contained an invalid unused friend declaration of
std::make_shared. This has been removed.
Also adds a new IdleTask type and updates DynamicThreadPoolTaskDispatcher to
schedule IdleTasks whenever the total number of threads running is less than
the maximum number of MaterializationThreads.
A SimpleLazyReexportsSpeculator instance maintains a list of speculation
suggestions ((JITDylib, Function) pairs) and registered lazy reexports. When
speculation opportunities are available (having been added via
addSpeculationSuggestions or when lazy reexports were created) it schedules
an IdleTask that triggers the next speculative lookup as soon as resources
are available. Speculation suggestions are processed first, followed by
lookups for lazy reexport bodies. A callback can be registered at object
construction time to record lazy reexport executions as they happen, and these
executions can be fed back into the speculator as suggestions on subsequent
executions.
The llvm-jitlink tool is updated to support speculation when lazy linking is
used via three new arguments:
-speculate=[none|simple] : When the 'simple' value is specified a
SimpleLazyReexportsSpeculator instances is used
for speculation.
-speculate-order <path> : Specifies a path to a CSV containing
(jit dylib name, function name) triples to use
as speculative suggestions in the current run.
-record-lazy-execs <path> : Specifies a path in which to record lazy function
executions as a CSV of (jit dylib name, function
name) pairs, suitable for use with
-speculate-order.
The same path can be passed to -speculate-order and -record-lazy-execs, in
which case the file will be overwritten at the end of the execution.
No testcase yet: Speculative linking is difficult to test (since by definition
execution behavior should be unaffected by speculation) and this is an new
prototype of the concept*. Tests will be added in the future once the interface
and behavior settle down.
* An earlier implementation of the speculation concept can be found in
llvm/include/llvm/ExecutionEngine/Orc/Speculation.h. Both systems have the
same goal (hiding compilation latency) but different mechanisms. This patch
relies entirely on information available in the controller, where the old
system could receive additional information from the JIT'd runtime via
callbacks. I aim to combine the two in the future, but want to gain more
practical experience with speculation first.
Threads created by DynamicThreadPoolTaskDispatcher::dispatch had been holding a
unique_ptr to the most recent Task, meaning that the Task would be destroyed
when the thread object was destroyed, but this would happen *after* the thread
signaled the Dispatcher that it was finished. This could cause
DynamicThreadPoolTaskDispatcher::shutdown to return (and consequently
ExecutionSession to be destroyed) before all Tasks were destroyed, with Task
destructors accessing ExecutionSession and related objects after they were
freed.
The fix is to reset the Task pointer immediately after it is run to trigger
cleanup, *then* (if there are no other tasks to run) signal the Dispatcher that
the thread is finished.
This patch also updates DynamicThreadPoolTaskDispatcher::dispatch to reject any
new Tasks dispatched after DynamicThreadPoolTaskDispatcher::shutdown is called.
ORC and JITLink debugging output write the dbgs() raw_ostream, which isn't
thread-safe. Use -num-threads=0 to force single-threaded linking for tests that
produce debugging output.
The llvm-jitlink tool is updated to suggest -num-threads=0 when debugging
output is enabled.
Similar to a9e75b1d4d1: During MachOPlatform bootstrap we need to defer actions
until essential platform functionality has been loaded, but the platform itself
may be loaded under a concurrent dispatcher so we have to guard against the
deferred actions vector being accessed concurrently.
This fixes a probablistic failure in the ORC runtime regression tests on
Darwin/x86-64 that was spotted after edca1d9bad2 (which turned on concurrent
linking by default in llvm-jitlink).
The debug section map was using MachO section names (with the "__" prefix), but
DWARFContext expects section names with the object format prefix stripped off.
This was preventing DWARFContext from accessing the debug_str section,
resulting in bogus source name strings.
This fixes a concurrency bug in the COFF_x86_64 backend: lookupAsync was used
to find __ImageBase without blocking to wait for the result.
Rather than adding promises / futures, this patch just adds an external
__ImageBase symbol if needed. This is the canonical JITLink solution for
resolving external symbols, and is simpler and more concurrency friendly than
using promises / futures with lookupAsync.
Cleanup to recent LazyReexportManager changes: KeyToReentryAddr now maps to
multiple addrs, so make its name plural. Use vector insert rather than a for
loop.
NFC.
Introduces a new layer interface, LinkGraphLayer, that can be used to
add LinkGraphs to an ExecutionSession.
This patch moves most of ObjectLinkingLayer's functionality into a new
LinkGraphLinkingLayer which should (in the future) be able to be used
without linking libObject. ObjectLinkingLayer now inherits from
LinkGraphLinkingLayer and just handles conversion of object files to
LinkGraphs, which are then handed down to LinkGraphLinkingLayer to be
linked.
Apologies for the large change, I looked for ways to break this up and
all of the ones I saw added real complexity. This change focuses on the
option's prefixed names and the array of prefixes. These are present in
every option and the dominant source of dynamic relocations for PIE or
PIC users of LLVM and Clang tooling. In some cases, 100s or 1000s of
them for the Clang driver which has a huge number of options.
This PR addresses this by building a string table and a prefixes table
that can be referenced with indices rather than pointers that require
dynamic relocations. This removes almost 7k dynmaic relocations from the
`clang` binary, roughly 8% of the remaining dynmaic relocations outside
of vtables. For busy-boxing use cases where many different option tables
are linked into the same binary, the savings add up a bit more.
The string table is a straightforward mechanism, but the prefixes
required some subtlety. They are encoded in a Pascal-string fashion with
a size followed by a sequence of offsets. This works relatively well for
the small realistic prefixes arrays in use.
Lots of code has to change in order to land this though: both all the
option library code has to be updated to use the string table and
prefixes table, and all the users of the options library have to be
updated to correctly instantiate the objects.
Some follow-up patches in the works to provide an abstraction for this
style of code, and to start using the same technique for some of the
other strings here now that the infrastructure is in place.
This re-applies 570ecdcf8b4, which was reverted in 74e8a37ff32 due to bot
failures. This commit renames sysv_resolve.cpp to resolve.cpp, which was the
cause of the config errors.
This reapplies 570ecdcf8b4, which was reverted in 6073dd923b8 due to bot
failures.
The test failures on Linux were fixed by:
1. Removing an overly restrictive assertion (query dependence on a symbol no
longer implies a MaterializingInfo for that symbol)
2. Adding reentry and resolver files to the ORC runtime CMakeLists.txt for
Linux.
3. Adding the __orc_rt_reentry -> __orc_rt_sysv_reentry alias to ELFNixPlatform.
…d reentry.
These utilities provide new, more generic and easier to use support for
lazy compilation in ORC.
LazyReexportsManager is an alternative to LazyCallThroughManager. It
takes requests for lazy re-entry points in the form of an alias map:
lazy-reexports = {
( <entry point symbol #1>, <implementation symbol #1> ),
( <entry point symbol #2>, <implementation symbol #2> ),
...
( <entry point symbol #n>, <implementation symbol #n> )
}
LazyReexportsManager then:
1. binds the entry points to the implementation names in an internal
table.
2. creates a JIT re-entry trampoline for each entry point.
3. creates a redirectable symbol for each of the entry point name and
binds redirectable symbol to the corresponding reentry trampoline.
When an entry point symbol is first called at runtime (which may be on
any thread of the JIT'd program) it will re-enter the JIT via the
trampoline and trigger a lookup for the implementation symbol stored in
LazyReexportsManager's internal table. When the lookup completes the
entry point symbol will be updated (via the RedirectableSymbolManager)
to point at the implementation symbol, and execution will proceed to the
implementation symbol.
Actual construction of the re-entry trampolines and redirectable symbols
is delegated to an EmitTrampolines functor and the
RedirectableSymbolsManager respectively.
JITLinkReentryTrampolines.h provides a JITLink-based implementation of
the EmitTrampolines functor. (AArch64 only in this patch, but other
architectures will be added in the near future).
Register state save and reentry functionality is added to the ORC
runtime in the __orc_rt_sysv_resolve and __orc_rt_resolve_implementation
functions (the latter is generic, the former will need custom
implementations for each ABI and architecture to be supported, however
this should be much less effort than the existing OrcABISupport
approach, since the ORC runtime allows this code to be written as native
assembly).
The resulting system:
1. Works equally well for in-process and out-of-process JIT'd code.
2. Requires less boilerplate to set up.
Given an ObjectLinkingLayer and PlatformJD (JITDylib containing the ORC
runtime), setup is just:
```c++
auto RSMgr = JITLinkRedirectableSymbolManager::Create(OLL);
if (!RSMgr)
return RSMgr.takeError();
auto LRMgr = createJITLinkLazyReexportsManager(OLL, **RSMgr, PlatformJD);
if (!LRMgr)
return LRMgr.takeError();
```
after which lazy reexports can be introduced with:
```c++
JD.define(lazyReexports(LRMgr, <alias map>));
```
LazyObectLinkingLayer is updated to use this new method, but the LLVM-IR
level CompileOnDemandLayer will continue to use LazyCallThroughManager
and OrcABISupport until the new system supports a wider range of
architectures and ABIs.
The llvm-jitlink utility's -lazy option now uses the new scheme. Since
it depends on the ORC runtime, the lazy-link.ll testcase and associated
helpers are moved to the ORC runtime.
Make EPCGenericMemoryAccess the default implementation for the MemoryAccess
object in SimpleRemoteEPC, and add support for the WritePointers operation to
OrcTargetProcess (previously this operation was unimplemented and would have
triggered an error if accessed in a remote-JIT setup).
No testcase yet: This functionality requires cross-process JITing to test (or a
much more elaborate unit-test setup). It can be tested once the new top-level
ORC runtime project lands.
Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning
on the boundary between JITLink and ORC, and allows pointer comparisons (rather
than string comparisons) between Symbol names. This should improve the
performance and readability of code that bridges between JITLink and ORC (e.g.
ObjectLinkingLayer and ObjectLinkingLayer::Plugins).
To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to
LinkGraph and threaded through to its construction sites in LLVM and Bolt. All
LinkGraphs that are to have symbol names compared by pointer equality must point
to the same SymbolStringPool instance, which in ORC sessions should be the pool
attached to the ExecutionSession.
---------
Co-authored-by: Lang Hames <lhames@gmail.com>
This reapplies aba6bb0820b, which was reverted in 28e2a891210 due to bot
failures. It contains fixes to silence warnings for uncovered switches,
and for incorrect initializer-symbol handling on ELF and COFF.
SideEffectsOnly is a new jitlink::Scope value that corresponds to the
JITSymbolFlags::MaterializationSideEffectsOnly flag: Symbols with this scope
can be looked up (and form part of the initial interface of a LinkGraph) but
never actually resolve to an address (so can only be looked up with a
WeaklyReferencedSymbol lookup).
Previously ObjectLinkingLayer implicitly treated JITLink symbols as having this
scope, regardless of a Symbol's actual scope, if the
MaterializationSideEffectsOnly flag was set on the corresponding symbol in the
MaterializationResponsibility object. Using an explicit scope in JITLink for
this (1) allows JITLink plugins to identify and correctly handle
side-effects-only symbols, and (2) allows raw LinkGraphs to define
side-effects-only symbols without clients having to manually modify their
`MaterializationUnit::Interface`.
This reapplies 427fb5cc5ac, which was reverted in 08c1a6b3e18 due to bot
failures.
The fix was to remove an incorrect assertion: In IL_emit, during the initial
worklist loop, an EDU can have all of its dependencies removed without becoming
ready (because it may still have implicit dependencies that will be added back
during the subsequent propagateExtraEmitDeps operation). The EDU will be marked
Ready at the end of IL_emit if its Dependencies set is empty at that point.
Prior to that we can only assert that it's either Emitted or Ready (which is
already covered by other assertions).
AsynchronousSymbolQuery tracks the symbols that it depends on in order to (1)
detach the query in the event of a failure, and (2) report those dependencies
to clients of the ExecutionSession::lookup method (via the RegisterDependencies
argument). Previously we tracked only dependencies on symbols that didn't meet
the required state (the only symbols that the query needs to be attached to),
but this is insufficient to report all necessary dependencies to lookup clients.
E.g. A lookup requiring SymbolState::Resolved where some matched symbol is
already Resolved but not yet Emitted or Ready would result in the dependency on
that symbol not being reported, which could result in illegal access in
concurrent JIT setups. (This bug was discovered by @mikaoP on discord with a
simple concurrent JIT setup).
This patch tracks and reports all dependencies on symbols that aren't Ready yet,
correcting the under-reporting issue. AsynchronousSymbolQuery::detach is updated
to stop asserting that all depended-upon symbols have a query attached.
ErrorAsOutParameter's Error* constructor supports cases where an Error might not
be passed in (because in the calling context it's known that this call won't
fail). Most clients always have an Error present however, and for them an Error&
overload is more convenient.
Check that we're not reusing any handler tag addresses before installing any
handlers. This ensures that either all of the handlers are installed*, or none
of them are, simplifying error recovery.
* Ignoring handlers whose tags couldn't be resolved at all: these were never
installed.
The __mod_init_func section contains pointers to static initializer functions.
In the static compilation model for MachO/arm64e these are unsigned pointers
that are signed by dyld before being called. This patch teaches JITLink's
MachO/arm64 backend to sign __mod_init_func pointers using the PAC signing
function introduced in a432f11a52d (signing is triggered by rewriting all
Pointer64 edges in the section to Pointer64Authenticated edges). This means
that unlike the static compilation model the linked __mod_init_func section
will contain signed pointers.
Note: Signing of init pointers could instead have been handled by the ORC
runtime in a manner similar to dyld, but this would have come at the cost of
adding an extra signing oracle. Using the signing function avoids this.
Testing this change requires execution. It is covered by the
trivial-cxx-constructor.cpp testcase that was added to the ORC runtime in
7c0786363e6.
Adds two new JITLink passes to create and populate a pointer-signing function
that can be called via an allocation-action attached to the LinkGraph:
* createEmptyPointerSigningFunction creates a pointer signing function in a
custome section, reserving sufficient space for the signing code. It should
be run as a post-prune pass (to ensure that memory is reserved prior to
allocation).
* lowerPointer64AuthEdgesToSigningFunction pass populates the signing function
by walking the graph, decoding the ptrauth info (encoded in the edge addend) and
writing an instruction sequence to sign all ptrauth fixup locations.
rdar://61956998
LazyObjectLinkingLayer can be used to add object files that will not be linked
into the executor unless some function that they define is called at runtime.
(References to data members defined by these objects will still trigger
immediate linking)
To implement lazy linking, LazyObjectLinkingLayer uses the lazyReexports
utility to construct stubs for each function in a given object file, and an
ObjectLinkingLayer::Plugin to rename the function bodies at link-time. (Data
symbols are not renamed)
The llvm-jitlink utility is extended with a -lazy option that can be
passed before input files or archives to add them using the lazy linking
layer rather than the base ObjectLinkingLayer.
Symbol::setSize asserts that the new size does not overflow the containing
block, so we need to point the Symbol at the correct Block before updating its
size (otherwise we may get a spurious overflow assertion).