8 Commits

Author SHA1 Message Date
Jason Molenda
d406ce8e20
[lldb][macOS] Don't fetch settings in Host, to keep layering (#181406)
I introduced a dependency from Host on Core without realizing it in an
earlier PR, while adding a setting to disable the new shared cache
binary blob scanning/reading in HostInfoMacOSX, which caused build
problems. Thanks to Alex for figuring out the build failure I caused.

Add a bool to the methods in HostInfoMacOSX, and have the callers (in
Core and various plugins etc) all fetch the
symbols.shared-cache-binary-loading setting from ModuleList, and pass
the result in.

The least obvious part of this is in ProcessGDBRemote: when we first
learn the shared cache filepath and UUID, it calls
HostInfoMacOSX::SharedCacheIndexFiles(). That method is only called when
shared cache binary loading is enabled, so I conditionalize the call to
it based on the setting.
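
The layering fix above can be pictured with a minimal sketch. Everything
in it is hypothetical (the `host` and `core` namespaces, `IndexSharedCache`,
`LoadSharedCache`, and the `Settings` struct are illustrative stand-ins, not
lldb's real classes); it only shows the lower layer taking a bool instead
of reading a setting it should not know about.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Host layer. It knows nothing about settings;
// the caller passes the decision in as a plain bool.
namespace host {
struct SharedCacheIndex {
  bool used_binary_blob = false;
  std::vector<std::string> images;
};

inline SharedCacheIndex IndexSharedCache(bool use_binary_loading) {
  SharedCacheIndex index;
  index.used_binary_blob = use_binary_loading;
  // Placeholder entry; a real implementation would enumerate the cache.
  index.images.push_back(use_binary_loading ? "from-cache-blob"
                                            : "from-own-memory");
  assert(!index.images.empty());
  return index;
}
} // namespace host

// Hypothetical stand-in for the Core layer, which owns the setting.
namespace core {
struct Settings {
  bool shared_cache_binary_loading = true;
};

inline host::SharedCacheIndex LoadSharedCache(const Settings &settings) {
  // Core reads its own setting and hands only the value down to Host.
  return host::IndexSharedCache(settings.shared_cache_binary_loading);
}
} // namespace core
```

With this shape the dependency arrow only points from the upper layer down
to the lower one, which is the property the build needed.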

rdar://148939795
2026-02-13 16:31:05 -08:00
Jason Molenda
e3c72cf008
[lldb] Add a new way of loading files from a shared cache (#179881)
Taking advantage of a few new SPI in macOS 26.4 libdyld, it is possible
for lldb to load binaries out of a shared cache binary blob, instead of
needing discrete files on disk. lldb has had one special case where it
has done this for years -- if the debuggee process and lldb itself are
using the same shared cache, it could create ObjectFiles based on its
own memory contents. This new method requires only the shared cache on
disk, not depending on it being mapped into lldb's address space
already.

In HostInfoMacOSX.mm, we create an array of binaries in lldb's shared
cache, by one of two methods depending on the availability of SPI/SDKs.
This PR adds a new third method for loading lldb's shared cache off disk
as a proof of concept. It will prefer this new method when the needed
SPI are available at runtime. There is also a user setting to disable
this new method in case we uncover a problem as it is deployed.

I also changed the internal store of the shared cache files from a
single array to one organized by shared cache UUID, so we can have
multiple shared caches indexed in the future.
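
As a rough illustration of that reorganization, here is a sketch of a store
keyed by shared cache UUID. The names (`SharedCacheStore`, `ImageInfo`,
`AddImage`, `GetImages`) and the use of plain strings for UUIDs are
assumptions made for the example; the actual data structures in
HostInfoMacOSX.mm may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one binary found in a shared cache.
struct ImageInfo {
  std::string path;
};

// Hypothetical store: one image list per shared cache UUID, rather than a
// single flat array that can only describe one cache.
class SharedCacheStore {
public:
  void AddImage(const std::string &cache_uuid, const std::string &path) {
    assert(!cache_uuid.empty());
    ImageInfo info;
    info.path = path;
    m_images_by_uuid[cache_uuid].push_back(info);
  }

  // Returns nullptr when no cache with this UUID has been indexed.
  const std::vector<ImageInfo> *GetImages(const std::string &cache_uuid) const {
    auto it = m_images_by_uuid.find(cache_uuid);
    return it == m_images_by_uuid.end() ? nullptr : &it->second;
  }

  std::size_t GetNumCaches() const { return m_images_by_uuid.size(); }

private:
  std::map<std::string, std::vector<ImageInfo>> m_images_by_uuid;
};
```

Keying by UUID means the same install path can appear in two different
caches without the entries colliding.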

In HostInfoBase.h's SharedCacheImageInfo class, you can now create an
ImageInfo with a DataExtractorSP or a void* baton. I added GetUUID and
GetExtractor methods, and the latter will use the libdyld SPI to map the
segments for a specific binary into lldb's memory and return a
DataExtractorSP.

The setting is currently called symbols.shared-cache-binary-loading.

In DynamicLoaderDarwin::FindTargetModuleForImageInfo there was an
ordering mistake where we would always consult the HostInfoMacOSX.mm
shared cache provider, instead of checking lldb's own global module
cache first when looking for a binary, resulting in creating a new
Module repeatedly for shared cache binaries with the new method, parsing
the symbol table repeatedly. I fixed the ordering so we look at existing
Modules before we check the shared cache for one.
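
The ordering fix can be sketched as follows. This is a simplified model,
not the code in DynamicLoaderDarwin: `ModuleFinder`, `Find`, and
`CreateFromSharedCache` are hypothetical names, and the creation counter
exists only to make the repeated-work problem visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical, heavily simplified Module.
struct Module {
  std::string path;
};

class ModuleFinder {
public:
  std::shared_ptr<Module> Find(const std::string &path) {
    // 1. Consult the already-known modules first.
    auto it = m_known_modules.find(path);
    if (it != m_known_modules.end())
      return it->second;
    // 2. Only on a miss, fall back to the (expensive) shared cache provider.
    std::shared_ptr<Module> module = CreateFromSharedCache(path);
    m_known_modules.emplace(path, module);
    return module;
  }

  int GetNumCreations() const { return m_num_creations; }

private:
  // Stands in for building a Module from the shared cache, which in the
  // real debugger would also mean parsing a symbol table.
  std::shared_ptr<Module> CreateFromSharedCache(const std::string &path) {
    ++m_num_creations;
    auto module = std::make_shared<Module>();
    module->path = path;
    assert(module);
    return module;
  }

  std::unordered_map<std::string, std::shared_ptr<Module>> m_known_modules;
  int m_num_creations = 0;
};
```

Looking up the same path twice returns the same Module and creates it once;
with the two steps swapped, every lookup would pay the creation cost.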

ObjectFileMachOTest tests a TEXT and a DATA symbol, checking that
the contents of the function/data object match the bytes we got from the
shared cache. The test was using a DATA_DIRTY symbol, which was fine
when using lldb's own shared cache memory, but when we worked on the
shared cache binary on-disk directly, we were seeing different values
for the bytes because of relocations in there. I changed this to a
constant DATA symbol.

rdar://148939795

---------

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Co-authored-by: Alex Langford <nirvashtzero@gmail.com>
2026-02-05 18:38:20 -08:00
Jason Molenda
2aa020f49b
[lldb][NFC] Module, ModuleSpec, GetSectionData use DataExtractorSP (#178347)
In a PR last month I changed the ObjectFile CreateInstance etc methods
to accept an optional DataExtractorSP instead of a DataBufferSP, and
retain the extractor in a shared pointer internally in all of the
ObjectFile subclasses. This is laying the groundwork for using a
VirtualDataExtractor for some Mach-O binaries on macOS, where the
segments of the binary are out-of-order in actual memory, and we add a
lookup table to make it appear that the TEXT segment is at offset 0 in
the Extractor, etc. Working on the actual implementation, I realized we
were still using DataBufferSP's in ModuleSpec and Module, as well as in
ObjectFile::GetModuleSpecifications.

I originally was making a much larger NFC change where I had all
ObjectFile subclasses operating on DataExtractors throughout their
implementation, as well as in the DWARF parser. It was a very large
patchset. Many subclasses start with their DataExtractor, then create
smaller DataExtractors for parts of the binary image - the string table,
the symbol table, etc., for processing.

After consideration and discussion with Jonas, we agreed that a
segment/section of a binary will never require a lookup table to access
the bytes within it, so I changed
VirtualDataExtractor::GetSubsetExtractorSP to (1) require that the
Subset be contained within a single lookup table entry, and (2) return a
simple DataExtractor bounded on that byte range. By doing this, I was
able to remove all of my very-invasive changes to the ObjectFile
subclass internals; it's only when they are operating on the entire
binary image that care is needed.
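
To make the two rules concrete, here is a sketch of the bounds check under
stated assumptions: the lookup table is modeled as a list of entries that
map a virtual offset range onto a physical offset, and the result is just a
byte range rather than a real DataExtractor. `LookupEntry`, `ByteRange`,
and `GetSubsetRange` are hypothetical names, not the VirtualDataExtractor
API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical lookup table entry: `size` bytes starting at `virtual_offset`
// in the extractor's view live at `physical_offset` in real memory.
struct LookupEntry {
  uint64_t virtual_offset;
  uint64_t size;
  uint64_t physical_offset;
};

// A plain contiguous range, standing in for a simple DataExtractor.
struct ByteRange {
  uint64_t physical_offset = 0;
  uint64_t size = 0;
};

// Succeeds only when [offset, offset + length) fits inside one entry, so
// the caller never needs the lookup table to read the returned range.
inline bool GetSubsetRange(const std::vector<LookupEntry> &table,
                           uint64_t offset, uint64_t length,
                           ByteRange &result) {
  assert(length > 0 && "this sketch only handles non-empty subsets");
  for (const LookupEntry &entry : table) {
    if (offset < entry.virtual_offset)
      continue;
    const uint64_t delta = offset - entry.virtual_offset;
    if (delta >= entry.size)
      continue; // The start of the subset is not in this entry.
    if (length > entry.size - delta)
      return false; // The subset would spill past the end of the entry.
    result.physical_offset = entry.physical_offset + delta;
    result.size = length;
    return true;
  }
  return false; // No entry covers the requested offset.
}
```

A request that straddles two entries is rejected rather than stitched
together, which matches the agreement that a segment or section never needs
the table to reach its own bytes.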

One pattern that subclasses like ObjectFileBreakpad use is to take an
ArrayRef of the DataBuffer for a binary, then create a StringRef of
that, then look for strings in it. With a VirtualDataExtractor and
out-of-order binary segments, with gaps between them, this allows us to
search the entire buffer looking for a string, and segfault when it gets
to an unmapped region of the buffer. I added a
VirtualDataExtractor::GetSubsetExtractorSP(0) which gets the largest
contiguous memory region starting at offset 0 for this use case, and I
added a comment about what was being done there because I know it is not
obvious, and people not working on macOS wouldn't be familiar with the
requirement. (When we have a ModuleSpec with a DataExtractor, any of the
ObjectFile subclasses get a shot at Creating, so they all have to be
able to iterate on these.)
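
A small sketch of why the search has to be bounded. It assumes the region
of interest is the lookup table entry that begins at virtual offset 0, and
it models the image as a std::string in which the gap is merely padding;
in a real process the gap would be unmapped and reading it would fault.
`Region`, `SizeOfRegionAtStart`, and `FirstRegionContains` are hypothetical
names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one mapped piece of the image.
struct Region {
  uint64_t virtual_offset;
  uint64_t size;
};

// Size of the mapped region that begins at virtual offset 0, or 0 if none.
inline uint64_t SizeOfRegionAtStart(const std::vector<Region> &table) {
  for (const Region &region : table)
    if (region.virtual_offset == 0)
      return region.size;
  return 0;
}

// Searches for `needle`, but never looks past the end of the first region.
inline bool FirstRegionContains(const std::string &image,
                                const std::vector<Region> &table,
                                const std::string &needle) {
  const uint64_t limit = SizeOfRegionAtStart(table);
  assert(limit <= image.size());
  return image.substr(0, limit).find(needle) != std::string::npos;
}
```

A header string at the start of the image is found, while a string that
only occurs after the gap is deliberately not searched for.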

rdar://148939795
2026-01-29 15:36:40 -08:00
Alex Langford
9ca02a13a4
[lldb][NFC] Mark Symbol pointers as const where easily possible (#177472)
These are the places that required no modifications to surrounding code.
2026-01-27 15:23:49 -08:00
Jason Molenda
f961d6a89a
Revert "[lldb] Set default object format to MachO in ObjectFileMachO (#142704)"
This reverts commit d4d2f069dec4fb8b13447f52752d4ecd08d976d6.

Temporarily reverting until we can find a way to get the correct
ObjectFile set in Module's Triples without adding "-macho" to the
triple string for each Module.  This is breaking TestUniversal.py
on the x86_64 macOS CI bots.
2025-06-05 16:24:31 -07:00
royitaqi
d4d2f069de
[lldb] Set default object format to MachO in ObjectFileMachO (#142704)
# The Change

This patch sets the **default** object format of `ObjectFileMachO` to be
`MachO` (instead of what currently ends up being `ELF`, see below). This
should be **the correct thing to do**, because the code before the line
of change has already verified the Mach-O header.

The existing logic:
* In `ObjectFileMachO`, the object format is unassigned by default. So
it's `UnknownObjectFormat` (see
[code](54d544b831/llvm/lib/TargetParser/Triple.cpp (L1024))).
* The code then looks at load commands like `LC_VERSION_MIN_*`
([code](54d544b831/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (L5180-L5217)))
and `LC_BUILD_VERSION`
([code](54d544b831/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (L5231-L5252)))
and assigns the Triple's OS and Environment if they exist.
* If the above sets the Triple's OS to macOS, then the object format
defaults to `MachO`; otherwise it is `ELF`
([code](54d544b831/llvm/lib/TargetParser/Triple.cpp (L936-L937)))
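
The logic in the list above can be modeled with a small sketch. It is a
deliberate simplification: the enums and the functions `DefaultFormatFor`
and `ResolveFormat` are hypothetical, and the default rule is reduced to
"`MachO` for macOS, `ELF` otherwise" as described in this message, whereas
the real `llvm::Triple` handles many more cases.

```cpp
#include <cassert>

// Hypothetical, reduced versions of the triple's OS and object format.
enum class OS { Unknown, MacOSX, Linux };
enum class ObjectFormat { Unknown, ELF, MachO };

// Simplified default rule taken from the description above.
inline ObjectFormat DefaultFormatFor(OS os) {
  ObjectFormat format =
      os == OS::MacOSX ? ObjectFormat::MachO : ObjectFormat::ELF;
  assert(format != ObjectFormat::Unknown);
  return format;
}

// An explicitly assigned format wins; otherwise the OS decides.
inline ObjectFormat ResolveFormat(ObjectFormat assigned, OS os) {
  if (assigned != ObjectFormat::Unknown)
    return assigned;
  return DefaultFormatFor(os);
}
```

In this model, a Mach-O file without the version load commands leaves the OS
unknown, so `ResolveFormat(ObjectFormat::Unknown, OS::Unknown)` yields `ELF`;
assigning `MachO` up front, as the patch does, makes the result `MachO`
regardless of the OS.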

# Impact

For **production usage** where Mach-O files have the said load commands
(which is
[expected](https://www.google.com/search?q=Are+mach-o+files+expected+to+have+the+LC_BUILD_VERSION+load+command%3F)),
this patch won't change anything.
* **Important note**: It's not clear if there are legitimate production
use cases where the Mach-O files don't have said load commands. If there
are, the existing code thinks they are `ELF`. This patch changes it to
`MachO`. This is considered a fix for such files.

For **unit tests**, this patch will simplify the yaml data by not
requiring the said load commands.

# Test

See PR.
2025-06-04 17:07:07 -07:00
Jonas Devlieghere
4ad19b80ea
[lldb] Test parsing the symtab with indirect symbols from the shared cache
This patch adds a test for b0dc2fae6025. That commit fixed a bug where
we could increment the indirect symbol offset every time we parsed the
symbol table.
2022-03-23 21:13:55 -07:00
Fred Riss
8113a8bb79
[lldb/ObjectFileMachO] Fetch shared cache images from our own shared cache
Summary:
On macOS 11, the libraries that have been integrated in the system
shared cache are not present on the filesystem anymore. LLDB was
using those files to get access to the symbols of those libraries.
LLDB can get the images from the target process memory though.

This has 2 consequences:
 - LLDB cannot load the images before the process starts, reporting
   an error if someone tries to break on a system symbol.
 - Loading the symbols by downloading the data from the inferior
   is super slow. It takes tens of seconds at the start of the
   debug session to populate the Module list.

To fix this, we can use the library images LLDB has in its own
mapping of the shared cache. Shared cache images are somewhat
special as their LINKEDIT segment is moved to the end of the cache
and thus the images are not contiguous in memory. All of this can be
hidden in ObjectFileMachO.

This patch fixes a number of test failures on macOS 11 due to the
first problem described above and adds some specific unittesting
for the new SharedCache Host utilities.

Reviewers: jasonmolenda, labath

Subscribers: llvm-commits, lldb-commits

Tags: #lldb, #llvm

Differential Revision: https://reviews.llvm.org/D83023
2020-07-16 10:37:37 -07:00