Get the shared cache filepath and uuid that the inferior process is
using from debugserver, try to open that shared cache on the lldb host
mac and if the UUID matches, index all of the binaries in that shared
cache. When looking for binaries loaded in the process, get them from
the already-indexed shared cache.
Every time a binary is loaded, PlatformMacOSX may query the shared cache
filepath and uuid from the Process, and pass that to
HostInfo::GetSharedCacheImageInfo() if available (else fall back to the
old HostInfo::GetSharedCacheImageInfo method which only looks at lldb's
own shared cache), to get the file being requested.
ProcessGDBRemote caches the shared cache filepath and uuid from the
inferior, once it has a non-zero UUID. I added a lock for this ivar
specifically, so I don't have 20 threads all asking for the shared cache
information from debugserver and updating the cached answer. If we never
get back a non-zero UUID shared cache reply, we will re-query at every
library loaded notification. debugserver has been providing the shared
cache UUID since 2013, although I only added the shared cache filepath
field last November.
Note that a process will not report its shared cache filepath or uuid at
initial launch. As dyld gets a chance to execute a bit, it will start
returning binaries -- it will be available at the point when libraries
start loading. (it won't be available yet when the binary & dyld are the
only two binaries loaded in the process)
I tested this by disabling lldb's scan of its own shared cache
pre-execution -- only loading the system shared cache when the inferior
process reports that it is using that. I got 6-7 additional testsuite
failures running lldb like that, because no system binaries were loaded
before exeuction start, and the tests assumed they would be.
rdar://148939795
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Taking advantage of a few new SPI in macOS 26.4 libdyld, it is possible
for lldb to load binaries out of a shared cache binary blob, instead of
needing discrete files on disk. lldb has had one special case where it
has done this for years -- if the debugee process and lldb itself are
using the same shared cache, it could create ObjectFiles based on its
own memory contents. This new method requires only the shared cache on
disk, not depending on it being mapped into lldb's address space
already.
In HostInfoMacOSX.mm, we create an array of binaries in lldb's shared
cache, by one of two methods depending on the availability of SPI/SDKs.
This PR adds a new third method for loading lldb's shared cache off disk
as a proof of concept. It will prefer this new method when the needed
SPI are available at runtime. There is also a user setting to disable
this new method in case we uncover a problem as it is deployed.
I did change the internal store of the shared cache files from a single
array, to being organized by shared cache UUIDs, so we can have multiple
shared caches indexed in the future.
In HostInfoBase.h's SharedCacheImageInfo class, you can now create an
ImageInfo with a DataExtractorSP or a void* baton. I added GetUUID and
GetExtractor methods, and the latter will use the libdyld SPI to map the
segments for a specific binary into lldb's memory and return a
DataExtractorSP.
The setting is currently called symbols.shared-cache-binary-loading.
In DynamicLoaderDarwin::FindTargetModuleForImageInfo there was an
ordering mistake where we would always consult the HostInfoMacOSX.mm
shared cache provider, instead of checking lldb's own global module
cache first when looking for a binary, resulting in creating a new
Module repeatedly for shared cache binaries with the new method, parsing
the symbol table repeatedly. I fixed the ordering so we look at existing
Modules before we check the shared cache for one.
In ObjectFileMachOTest, it tests a TEXT and a DATA symbol, checking that
the contents of the function/data object match the bytes we got from the
shared cache. The test was using a DATA_DIRTY symbol, which was fine
when using lldb's own shared cache memory, but when we worked on the
shared cache binary on-disk directly, we were seeing different values
for the bytes because of relocations in there. I changed this to a
constant DATA symbol.
rdar://148939795
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Co-authored-by: Alex Langford <nirvashtzero@gmail.com>
In a PR last month I changed the ObjectFile CreateInstance etc methods
to accept an optional DataExtractorSP instead of a DataBufferSP, and
retain the extractor in a shared pointer internally in all of the
ObjectFile subclasses. This is laying the groundwork for using a
VirtualDataExtractor for some Mach-O binaries on macOS, where the
segments of the binary are out-of-order in actual memory, and we add a
lookup table to make it appear that the TEXT segment is at offset 0 in
the Extractor, etc. Working on the actual implementation, I realized we
were still using DataBufferSP's in ModuleSpec and Module, as well as in
ObjectFile::GetModuleSpecifications.
I originally was making a much larger NFC change where I had all
ObjectFile subclasses operating on DataExtractors throughout their
implementation, as well as in the DWARF parser. It was a very large
patchset. Many subclasses start with their DataExtractor, then create
smaller DataExtractors for parts of the binary image - the string table,
the symbol table, etc., for processing.
After consideration and discussion with Jonas, we agreed that a
segment/section of a binary will never require a lookup table to access
the bytes within it, so I changed
VirtualDataExtractor::GetSubsetExtractorSP to (1) require that the
Subset be contained within a single lookup table entry, and (2) return a
simple DataExtractor bounded on that byte range. By doing this, I was
able to remove all of my very-invasive changes to the ObjectFile
subclass internals; it's only when they are operating on the entire
binary image that care is needed.
One pattern that subclasses like ObjectFileBreakpad use is to take an
ArrayRef of the DataBuffer for a binary, then create a StringRef of
that, then look for strings in it. With a VirtualDataExtractor and
out-of-order binary segments, with gaps between them, this allows us to
search the entire buffer looking for a string, and segfault when it gets
to an unmapped region of the buffer. I added a
VirtualDataExtractor::GetSubsetExtractorSP(0) which gets the largest
contiguous memory region starting at offset 0 for this use case, and I
added a comment about what was being done there because I know it is not
obvious, and people not working on macOS wouldn't be familiar with the
requirement. (when we have a ModuleSpec with a DataExtractor, any of the
ObjectFile subclasses get a shot at Creating, so they all have to be
able to iterate on these)
rdar://148939795
Main executables were bypassing the locate module callback that shared
libraries use, preventing custom symbol file location logic from working
consistently.
This PR fix this by
* Adding target context to ModuleSpec
* Leveraging that context to use target search path and platform's
locate module callback in ModuleList::GetSharedModule
This ensures both main executables and shared libraries get the same
callback treatment for symbol file resolution.
---------
Co-authored-by: George Hu <hyubo@meta.com>
Co-authored-by: George Hu <georgehuyubo@gmail.com>
This patch removes all of the Set.* methods from Status.
This cleanup is part of a series of patches that make it harder use the
anti-pattern of keeping a long-lives Status object around and updating
it while dropping any errors it contains on the floor.
This patch is largely NFC, the more interesting next steps this enables
is to:
1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()
Note that step (2) will bring 90% of the benefits for users, and step
(3) will dramatically clean up the error handling code in various
places. In the end my goal is to convert all APIs that are of the form
` ResultTy DoFoo(Status& error)
`
to
` llvm::Expected<ResultTy> DoFoo()
`
How to read this patch?
The interesting changes are in Status.h and Status.cpp, all other
changes are mostly
` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git
grep -l SetErrorString lldb/source)
`
plus the occasional manual cleanup.
These don't need to be ConstStrings. They don't really benefit much from
deduplication and comparing them isn't on a hot path, so they don't
really benefit much from quick comparisons.
Differential Revision: https://reviews.llvm.org/D152331
Xcode 14 no longer puts the Rosetta expanded shared cache in a directory
named "16.0". Instead, it includes the real version number (e.g. 13.0),
the build string and the architecture, similar to the device support
directory names for iOS, tvOS and watchOS.
Currently, when there are multiple directories, we might end up picking
the wrong one in GetSDKDirectoryForCurrentOSVersion. The problem is that
without the build string we have no way to differentiate between
multiple directories with the same version number. This patch fixes the
problem by using GetOSBuildString which, as the name implies, returns
the build string if known.
This also adds a test for Rosetta debugging on Apple Silicon. Depending
on whether the Rosetta expanded shared cache is present, the test
ensures that there is or isn't a diagnostic about reading out of memory.
rdar://97576121
Differential revision: https://reviews.llvm.org/D130540
Make GetSharedModuleWithLocalCache consider the device support
directory. In the past we only needed the device support directory to
debug remote processes. Since the introduction of Apple Silicon and
Rosetta this stopped being true.
When debugging a Rosetta process on macOS we need to consider the
Rosetta expanded shared cache. This patch and it dependencies move that
logic out of PlatfromRemoteDarwinDevice into a new abstract class called
PlatfromDarwinDevice. The new platform sit in between PlatformDarwin and
PlatformMacOSX and PlatformRemoteDarwinDevice and has all the necessary
logic to deal with the device support directory.
Technically I could have moved everything in PlatfromDarwinDevice into
PlatfromDarwin but decided that this logic is sufficiently self
contained that it warrants its own abstraction.
rdar://91966349
Differential revision: https://reviews.llvm.org/D124801