- it is always passed as zero
- a lot of plugins aren't using it correctly
- the data extractor class already has the capability to look at a
subset of bytes
Some plugins were returning the number of specifications they have
added, while others were returning the total final number. Particularly
devious plugins (Minidump) were clearing the specification list
altogether. This resulted in nondeterministic failures (depending on
plugin ininitialization order) in TestSBModule.
This PR defines the problem away by having each plugin only return the
specifications it is responsible for. If the caller wants to merge them,
it is free to do so. This *might* be slighly less efficient, but this is
hardly hot code.
I'm not touching the ObjectFile::GetModuleSpecifications function (the
caller of all these functions) as the PR is big enough, although the
same approach might be warranted there as well.
Fixes https://github.com/llvm/llvm-project/issues/178625.
ObjectFile::FindPlugin iterates over plugins to find one that can handle
the binary provided. It is currently sending the one DataExtractorSP to
each subclass, but some subclasses may modify this DataExtractor during
their processing, e.g. calling DataExtractor::SetData on it, and I think
it is safer to isolate these with a copy of the DataExtractor so the
order the plugins are tried cannot possibly change behavior.
This PR replaces the Get*CallbackAtIndex pattern in the PluginManager
with returning a snapshot of callbacks that the caller can iterate over
using a range-based for loop. This is a continuation of #184452 which
added thread safety by using snapshots. However, that introduced a bunch
of unnecessary copies which are largely eliminated again by getting the
snapshot once when gather all the callbacks, rather than doing that on
each iteration when querying a plugin for a given index. It also
eliminates the possibility of the snapshot changing underneath you when
iterating over the plugins.
This change was largely mechanical and I used Claude to do the menial
work of updating the signatures and call sites.
This was originally introduced to support kalimba DSPs featuring 24-bit
bytes by f03e6d84 and also c928de3e, but the kalimba support was mostly
removed by f8819bd5. This change removes the rest of the support, which
was far from complete.
In a PR last month I changed the ObjectFile CreateInstance etc methods
to accept an optional DataExtractorSP instead of a DataBufferSP, and
retain the extractor in a shared pointer internally in all of the
ObjectFile subclasses. This is laying the groundwork for using a
VirtualDataExtractor for some Mach-O binaries on macOS, where the
segments of the binary are out-of-order in actual memory, and we add a
lookup table to make it appear that the TEXT segment is at offset 0 in
the Extractor, etc. Working on the actual implementation, I realized we
were still using DataBufferSP's in ModuleSpec and Module, as well as in
ObjectFile::GetModuleSpecifications.
I originally was making a much larger NFC change where I had all
ObjectFile subclasses operating on DataExtractors throughout their
implementation, as well as in the DWARF parser. It was a very large
patchset. Many subclasses start with their DataExtractor, then create
smaller DataExtractors for parts of the binary image - the string table,
the symbol table, etc., for processing.
After consideration and discussion with Jonas, we agreed that a
segment/section of a binary will never require a lookup table to access
the bytes within it, so I changed
VirtualDataExtractor::GetSubsetExtractorSP to (1) require that the
Subset be contained within a single lookup table entry, and (2) return a
simple DataExtractor bounded on that byte range. By doing this, I was
able to remove all of my very-invasive changes to the ObjectFile
subclass internals; it's only when they are operating on the entire
binary image that care is needed.
One pattern that subclasses like ObjectFileBreakpad use is to take an
ArrayRef of the DataBuffer for a binary, then create a StringRef of
that, then look for strings in it. With a VirtualDataExtractor and
out-of-order binary segments, with gaps between them, this allows us to
search the entire buffer looking for a string, and segfault when it gets
to an unmapped region of the buffer. I added a
VirtualDataExtractor::GetSubsetExtractorSP(0) which gets the largest
contiguous memory region starting at offset 0 for this use case, and I
added a comment about what was being done there because I know it is not
obvious, and people not working on macOS wouldn't be familiar with the
requirement. (when we have a ModuleSpec with a DataExtractor, any of the
ObjectFile subclasses get a shot at Creating, so they all have to be
able to iterate on these)
rdar://148939795
This fixes a regression in `ObjectFile` and `ObjectFileELF` introduced
by #171574.
The original code created a `DataBuffer` using `MapFileDataWritable`.
```
data_sp = MapFileDataWritable(*file, length, file_offset);
if (!data_sp)
return nullptr;
data_offset = 0;
```
The new code requires converting the `DataBuffer` to a `DataExtractor`:
```
DataBufferSP buffer_sp = MapFileDataWritable(*file, length, file_offset);
if (!buffer_sp)
return nullptr;
extractor_sp = std::make_shared<DataExtractor>();
extractor_sp->SetData(buffer_sp, data_offset, buffer_sp->GetByteSize());
data_offset = 0;
```
The issue is that once we get a data buffer back from
MapFileDataWritable, we don't have to adjust for the `data_offset` again
when calling `SetData` as the `DataBuffer` is already normalized to have
a zero start offset. A similar issue exists in `ObjectFile`.
rdar://168317174
Avoid redundant calls to `std::shared_ptr::get()`. The class provides a
dereference operator and using that is the standard, idiomatic way to
access the underlying object.
This is another instance where we weren't checking that the result of
FileSystem::CreateDataBuffer and unconditionally accessing it, similar
to the bug in SourceManager last week. In this particular case,
ObjectFile was assuming that we can read the contents non-zero, which
isn't true for directory nodes.
Jim figured this one out yesterday. I'm just putting up the patch and
adding a test.
rdar://167796036
The ObjectFile plugin interface accepts an optional DataBufferSP
argument. If the caller has the contents of the binary, it can provide
this in that DataBufferSP. The ObjectFile subclasses in their
CreateInstance methods will fill in the DataBufferSP with the actual
binary contents if it is not set.
ObjectFile base class creates an ivar DataExtractor from the
DataBufferSP passed in.
My next patch will be a caller that creates a VirtualDataExtractor with
the binary data, and needs to pass that in to the ObjectFile plugin,
instead of the bag-of-bytes DataBufferSP. It builds on the previous
patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP
so I could pass in a subclass in the shared ptr. And it will be using
the VirtualDataExtractor that Jonas added in
https://github.com/llvm/llvm-project/pull/168802
No behavior is changed by the patch; we're simply moving the creation of
the DataExtractor to the caller, instead of a DataBuffer that is
immediately used to set up the ObjectFile DataExtractor. The patch is a
bit complicated because all of the ObjectFile subclasses have to
initialize their DataExtractor to pass in to the base class.
I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it
under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked
lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the
SME, that explains why the testsuite passes)
rdar://148939795
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
ObjectFile has an m_data DataExtractor ivar which may be default
constructed initially, or initialized with a DataBuffer passed in to its
ctor. If the DataExtractor does not get a DataBuffer source passed in,
the subclass will initialize it with access to the object file's data.
When a DataBuffer is passed in to the base class ctor, the DataExtractor
only has its buffer initialized; ObjectFile doesn't yet know the address
size and endianness to fully initialize the DataExtractor.
This patch changes ObjectFile to instead have a DataExtractorSP ivar
which is always initialized with at least a default-constructed
DataExtractor object in the base class ctor. The next patch I will be
writing is to change the ObjectFile ctor to take an optional
DataExtractorSP, so the caller can pass a DataExtractor subclass -- the
VirtualizeDataExtractor being added via
https://github.com/llvm/llvm-project/pull/168802
instead of a DataBuffer which is trivially saved into the DataExtractor.
The change is otherwise mechanical; all `m_data.` changed to
`m_data_sp->` and all the places where `m_data` was passed in for a
by-ref call were changed to `*m_data_sp.get()`. The shared pointer is
always initialized to contain an object.
I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for
getting the Linux testsuite to run on SME-only systems David). All of
the ObjectFile subclasses I modifed compile cleanly, but I haven't
tested them beyond any unit tests they may have (prob breakpad).
rdar://148939795
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.
For more context, see https://github.com/llvm/llvm-project/pull/163117.
This PR adds support for parsing the WebAssembly symbol table. The
symbol table is encoded in the "names" section and contains names and
indexes into other sections. For now we only support parsing function
(code) symbols. The result is that you can set breakpoints by symbol
name, while previously breakpoints by name required debug info (DWARF).
This is also necessary for Swift, which checks for the presence of
`swift_release` as a heuristic to determine if there's a static Swift
stdlib.
Different object file formats support DWARF sections (COFF, ELF, MachO,
PE/COFF, WASM). COFF and PE/COFF only matched a subset. This caused some
GCC executables produced on MinGW to have issue later on when debugging.
One example is that `.debug_rnglists` was not matched, which caused
range-extraction to fail when printing a backtrace.
This unifies the parsing of section names in
`ObjectFile::GetDWARFSectionTypeFromName`, so all file formats can use
the same naming convention. Since the prefixes are different,
`GetDWARFSectionTypeFromName` only matches the suffixes (i.e. `.debug_`
needs to be stripped before).
I added two tests to ensure the sections are correctly identified on
Windows executables.
Fix a [test
failure](https://github.com/llvm/llvm-project/pull/136236#issuecomment-2819772879)
in #136236, apply a minor renaming of statistics, and remerge. See
details below.
# Changes in #136236
Currently, `DebuggerStats::ReportStatistics()` calls
`Module::GetSymtab(/*can_create=*/false)`, but then the latter calls
`SymbolFile::GetSymtab()`. This will load symbols if haven't yet. See
stacktrace below.
The problem is that `DebuggerStats::ReportStatistics` should be
read-only. This is especially important because it reports stats for
symtab parsing/indexing time, which could be affected by the reporting
itself if it's not read-only.
This patch fixes this problem by adding an optional parameter
`SymbolFile::GetSymtab(bool can_create = true)` and receiving the
`false` value passed down from `Module::GetSymtab(/*can_create=*/false)`
when the call is initiated from `DebuggerStats::ReportStatistics()`.
---
Notes about the following stacktrace:
1. This can be reproduced. Create a helloworld program on **macOS** with
dSYM, add `settings set target.preload-symbols false` to `~/.lldbinit`,
do `lldb a.out`, then `statistics dump`.
2. `ObjectFile::GetSymtab` has `llvm::call_once`. So the fact that it
called into `ObjectFileMachO::ParseSymtab` means that the symbol table
is actually being parsed.
```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000124c4d5a0 LLDB`ObjectFileMachO::ParseSymtab(this=0x0000000111504e40, symtab=0x0000600000a05e00) at ObjectFileMachO.cpp:2259:44
* frame #1: 0x0000000124fc50a0 LLDB`lldb_private::ObjectFile::GetSymtab()::$_0::operator()(this=0x000000016d35c858) const at ObjectFile.cpp:761:9
frame #5: 0x0000000124fc4e68 LLDB`void std::__1::__call_once_proxy[abi:v160006]<std::__1::tuple<lldb_private::ObjectFile::GetSymtab()::$_0&&>>(__vp=0x000000016d35c7f0) at mutex:652:5
frame #6: 0x0000000198afb99c libc++.1.dylib`std::__1::__call_once(unsigned long volatile&, void*, void (*)(void*)) + 196
frame #7: 0x0000000124fc4dd0 LLDB`void std::__1::call_once[abi:v160006]<lldb_private::ObjectFile::GetSymtab()::$_0>(__flag=0x0000600003920080, __func=0x000000016d35c858) at mutex:670:9
frame #8: 0x0000000124fc3cb0 LLDB`void llvm::call_once<lldb_private::ObjectFile::GetSymtab()::$_0>(flag=0x0000600003920080, F=0x000000016d35c858) at Threading.h:88:5
frame #9: 0x0000000124fc2bc4 LLDB`lldb_private::ObjectFile::GetSymtab(this=0x0000000111504e40) at ObjectFile.cpp:755:5
frame #10: 0x0000000124fe0a28 LLDB`lldb_private::SymbolFileCommon::GetSymtab(this=0x0000000104865200) at SymbolFile.cpp:158:39
frame #11: 0x0000000124d8fedc LLDB`lldb_private::Module::GetSymtab(this=0x00000001113041a8, can_create=false) at Module.cpp:1027:21
frame #12: 0x0000000125125bdc LLDB`lldb_private::DebuggerStats::ReportStatistics(debugger=0x000000014284d400, target=0x0000000115808200, options=0x000000014195d6d1) at Statistics.cpp:329:30
frame #13: 0x0000000125672978 LLDB`CommandObjectStatsDump::DoExecute(this=0x000000014195d540, command=0x000000016d35d820, result=0x000000016d35e150) at CommandObjectStats.cpp:144:18
frame #14: 0x0000000124f29b40 LLDB`lldb_private::CommandObjectParsed::Execute(this=0x000000014195d540, args_string="", result=0x000000016d35e150) at CommandObject.cpp:832:9
frame #15: 0x0000000124efbd70 LLDB`lldb_private::CommandInterpreter::HandleCommand(this=0x0000000141b22f30, command_line="statistics dump", lazy_add_to_history=eLazyBoolCalculate, result=0x000000016d35e150, force_repeat_command=false) at CommandInterpreter.cpp:2134:14
frame #16: 0x0000000124f007f4 LLDB`lldb_private::CommandInterpreter::IOHandlerInputComplete(this=0x0000000141b22f30, io_handler=0x00000001419b2aa8, line="statistics dump") at CommandInterpreter.cpp:3251:3
frame #17: 0x0000000124d7b5ec LLDB`lldb_private::IOHandlerEditline::Run(this=0x00000001419b2aa8) at IOHandler.cpp:588:22
frame #18: 0x0000000124d1e8fc LLDB`lldb_private::Debugger::RunIOHandlers(this=0x000000014284d400) at Debugger.cpp:1225:16
frame #19: 0x0000000124f01f74 LLDB`lldb_private::CommandInterpreter::RunCommandInterpreter(this=0x0000000141b22f30, options=0x000000016d35e63c) at CommandInterpreter.cpp:3543:16
frame #20: 0x0000000122840294 LLDB`lldb::SBDebugger::RunCommandInterpreter(this=0x000000016d35ebd8, auto_handle_events=true, spawn_thread=false) at SBDebugger.cpp:1212:42
frame #21: 0x0000000102aa6d28 lldb`Driver::MainLoop(this=0x000000016d35ebb8) at Driver.cpp:621:18
frame #22: 0x0000000102aa75b0 lldb`main(argc=1, argv=0x000000016d35f548) at Driver.cpp:829:26
frame #23: 0x0000000198858274 dyld`start + 2840
```
# Changes in this PR top of the above
Fix a [test
failure](https://github.com/llvm/llvm-project/pull/136236#issuecomment-2819772879)
in `TestStats.py`. The original version of the added test checks that
all modules have symbol count zero when `target.preload-symbols ==
false`. The test failed on macOS. Due to various reasons, on macOS,
symbols can be loaded for dylibs even with that setting, but not for the
main module. For now, the fix of the test is to limit the assertion to
only the main module. The test now passes on macOS. In the future, when
we have a way to control a specific list of plug-ins to be loaded, there
may be a configuration that this test can use to assert that all modules
have symbol count zero.
Apply a minor renaming of statistics, per the
[suggestion](https://github.com/llvm/llvm-project/pull/136226#issuecomment-2825080275)
in #136226 after merge.
This reverts commit d5b40c71f6be972f677de5d9886f91866df007b5.
This change broke greendragon lldb test:
lldb-api :: commands/statistics/basic/TestStats.py
And is therefore being reverted.
Currently, `DebuggerStats::ReportStatistics()` calls
`Module::GetSymtab(/*can_create=*/false)`, but then the latter calls
`SymbolFile::GetSymtab()`. This will load symbols if haven't yet. See
stacktrace below.
The problem is that `DebuggerStats::ReportStatistics` should be
read-only. This is especially important because it reports stats for
symtab parsing/indexing time, which could be affected by the reporting
itself if it's not read-only.
This patch fixes this problem by adding an optional parameter
`SymbolFile::GetSymtab(bool can_create = true)` and receive the `false`
value passed down from `Module::GetSymtab(/*can_create=*/false)` when
the call was initiated from `DebuggerStats::ReportStatistics()`.
Add ObjectFile::GetObjectName and SymbolFile::GetObjectName to retrieve
the name of the object file, including the `.a` for static libraries.
We currently do something similar in CommandObjectTarget, but the code
for dumping this is a lot more involved than what's being offered by the
new method. We have options to print he full path, the base name, and
the directoy of the path and trim it to a specific width.
This is motivated by #133211, where Greg pointed out that the old code
would print the static archive (the .a file) rather than the actual
object file inside of it.
Lots of code around LLDB was directly accessing the target's section
load list. This NFC patch makes the section load list private so the
Target class can access it, but everyone else now uses accessor
functions. This allows us to control the resolving of addresses and will
allow for functionality in LLDB which can lazily resolve addresses in
JIT plug-ins with a future patch.
Compared to the python version, this also does type checking and error
handling, so it's slightly longer, however, it's still comfortably
under 500 lines.
Relanding with more explicit type conversions.
This reverts commit f6012a209dca6b1866d00e6b4f96279469884320.
Revert "[lldb] Add cast to fix compile error on 32-but platforms"
This reverts commit d300337e93da4ed96b044557e4b0a30001967cf0.
Revert "[lldb] Improve log message to include missing strings"
This reverts commit 0be33484853557bc0fd9dfb94e0b6c15dda136ce.
Revert "[lldb] Add comment"
This reverts commit e2bb47443d2e5c022c7851dd6029e3869fc8835c.
Revert "[lldb] Implement a formatter bytecode interpreter in C++"
This reverts commit 9a9c1d4a6155a96ce9be494cec7e25731d36b33e.
Compared to the python version, this also does type checking and error
handling, so it's slightly longer, however, it's still comfortably
under 500 lines.
Add support for type summaries embedded into the binary.
These embedded summaries will typically be generated by Swift macros,
but can also be generated by any other means.
rdar://115184658
This patch improves the ability of a ObjectFileELF instance to read the
.dynamic section. It adds the ability to read the .dynamic section from
the PT_DYNAMIC program header which is useful for ELF files that have no
section headers and for ELF files that are read from memory. It cleans
up the usage of the .dynamic entries so that
ObjectFileELF::ParseDynamicSymbols() is the only code that parses
.dynamic entries, teaches that function the read and store the string
values for each .dynamic entry. We now dump the .dynamic entries in the
output of "image dump objfile". It also cleans up the code that gets the
dynamic string table so that it can grab it from the DT_STRTAB and
DT_STRSZ .dynamic entries for when we have a ELF file with no section
headers or we are reading it from memory.
This patch improves the ability of a ObjectFileELF instance to read the .dynamic section. It adds the ability to read the .dynamic section from the PT_DYNAMIC program header which is useful for ELF files that have no section headers and for ELF files that are read from memory. It cleans up the usage of the .dynamic entries so that ObjectFileELF::ParseDynamicSymbols() is the only code that parses .dynamic entries, teaches that function the read and store the string values for each .dynamic entry. We now dump the .dynamic entries in the output of "image dump objfile". It also cleans up the code that gets the dynamic string table so that it can grab it from the DT_STRTAB and DT_STRSZ .dynamic entries for when we have a ELF file with no section headers or we are reading it from memory.
Currently, LLDB prints out a rather unhelpful error message when passed
a file that it doesn't recognize as an executable.
> error: '/path/to/file' doesn't contain any 'host' platform
> architectures: arm64, armv7, armv7f, armv7k, armv7s, armv7m, armv7em,
> armv6m, armv6, armv5, armv4, arm, thumbv7, thumbv7k, thumbv7s,
> thumbv7f, thumbv7m, thumbv7em, thumbv6m, thumbv6, thumbv5, thumbv4t,
> thumb, x86_64, x86_64, arm64, arm64e
I did a quick search internally and found at least 24 instances of users
being confused by this. This patch improves the error message when it
doesn't recognize the file as an executable, but keeps the existing
error message otherwise, i.e. when it's an object file we understand,
but the current platform doesn't support.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
In Apple's downstream fork, there is support for understanding the swift
AST sections in various binaries. Even though the lldb on llvm.org does
not have support for debugging swift, I think it makes sense to move
support for recognizing swift ast sections upstream.
Differential Revision: https://reviews.llvm.org/D159142
There can be zero padding bytes at a section end for file alignment in
PECOFF. Exclude those padding bytes when reading the section data.
Differential Revision: https://reviews.llvm.org/D157059
This patch adds support for creating modules from JSON object files.
This is necessary for the crashlog use case where we don't have either a
module or a symbol file. In that case the ObjectFileJSON serves as both.
The patch adds support for an object file type (i.e. executable, shared
library, etc). It also adds the ability to specify sections, which is
necessary in order specify symbols by address. Finally, this patch
improves error handling and fixes a bug where we wouldn't read more than
the initial 512 bytes in GetModuleSpecifications.
Differential revision: https://reviews.llvm.org/D148062
Currently, all data buffers are assumed to be writable. This is a
problem on macOS where it's not allowed to load unsigned binaries in
memory as writable. To be more precise, MAP_RESILIENT_CODESIGN and
MAP_RESILIENT_MEDIA need to be set for mapped (unsigned) binaries on our
platform.
Binaries are mapped through FileSystem::CreateDataBuffer which returns a
DataBufferLLVM. The latter is backed by a llvm::WritableMemoryBuffer
because every DataBuffer in LLDB is considered to be writable. In order
to use a read-only llvm::MemoryBuffer I had to split our abstraction
around it.
This patch distinguishes between a DataBuffer (read-only) and
WritableDataBuffer (read-write) and updates LLDB to use the appropriate
one.
rdar://74890607
Differential revision: https://reviews.llvm.org/D122856
Most of our code was including Log.h even though that is not where the
"lldb" log channel is defined (Log.h defines the generic logging
infrastructure). This worked because Log.h included Logging.h, even
though it should.
After the recent refactor, it became impossible the two files include
each other in this direction (the opposite inclusion is needed), so this
patch removes the workaround that was put in place and cleans up all
files to include the right thing. It also renames the file to LLDBLog to
better reflect its purpose.
This is an updated version of the https://reviews.llvm.org/D113789 patch with the following changes:
- We no longer modify modification times of the cache files
- Use LLVM caching and cache pruning instead of making a new cache mechanism (See DataFileCache.h/.cpp)
- Add signature to start of each file since we are not using modification times so we can tell when caches are stale and remove and re-create the cache file as files are changed
- Add settings to control the cache size, disk percentage and expiration in days to keep cache size under control
This patch enables symbol tables to be cached in the LLDB index cache directory. All cache files are in a single directory and the files use unique names to ensure that files from the same path will re-use the same file as files get modified. This means as files change, their cache files will be deleted and updated. The modification time of each of the cache files is not modified so that access based pruning of the cache can be implemented.
The symbol table cache files start with a signature that uniquely identifies a file on disk and contains one or more of the following items:
- object file UUID if available
- object file mod time if available
- object name for BSD archive .o files that are in .a files if available
If none of these signature items are available, then the file will not be cached. This keeps temporary object files from expressions from being cached.
When the cache files are loaded on subsequent debug sessions, the signature is compare and if the file has been modified (uuid changes, mod time changes, or object file mod time changes) then the cache file is deleted and re-created.
Module caching must be enabled by the user before this can be used:
symbols.enable-lldb-index-cache (boolean) = false
(lldb) settings set symbols.enable-lldb-index-cache true
There is also a setting that allows the user to specify a module cache directory that defaults to a directory that defaults to being next to the symbols.clang-modules-cache-path directory in a temp directory:
(lldb) settings show symbols.lldb-index-cache-path
/var/folders/9p/472sr0c55l9b20x2zg36b91h0000gn/C/lldb/IndexCache
If this setting is enabled, the finalized symbol tables will be serialized and saved to disc so they can be quickly loaded next time you debug.
Each module can cache one or more files in the index cache directory. The cache file names must be unique to a file on disk and its architecture and object name for .o files in BSD archives. This allows universal mach-o files to support caching multuple architectures in the same module cache directory. Making the file based on the this info allows this cache file to be deleted and replaced when the file gets updated on disk. This keeps the cache from growing over time during the compile/edit/debug cycle and prevents out of space issues.
If the cache is enabled, the symbol table will be loaded from the cache the next time you debug if the module has not changed.
The cache also has settings to control the size of the cache on disk. Each time LLDB starts up with the index cache enable, the cache will be pruned to ensure it stays within the user defined settings:
(lldb) settings set symbols.lldb-index-cache-expiration-days <days>
A value of zero will disable cache files from expiring when the cache is pruned. The default value is 7 currently.
(lldb) settings set symbols.lldb-index-cache-max-byte-size <size>
A value of zero will disable pruning based on a total byte size. The default value is zero currently.
(lldb) settings set symbols.lldb-index-cache-max-percent <percentage-of-disk-space>
A value of 100 will allow the disc to be filled to the max, a value of zero will disable percentage pruning. The default value is zero.
Reviewed By: labath, wallace
Differential Revision: https://reviews.llvm.org/D115324
Symbol table parsing has evolved over the years and many plug-ins contained duplicate code in the ObjectFile::GetSymtab() that used to be pure virtual. With this change, the "Symbtab *ObjectFile::GetSymtab()" is no longer virtual and will end up calling a new "void ObjectFile::ParseSymtab(Symtab &symtab)" pure virtual function to actually do the parsing. This helps centralize the code for parsing the symbol table and allows the ObjectFile base class to do all of the common work, like taking the necessary locks and creating the symbol table object itself. Plug-ins now just need to parse when they are asked to parse as the ParseSymtab function will only get called once.
This is a retry of the original patch https://reviews.llvm.org/D113965 which was reverted. There was a deadlock in the Manual DWARF indexing code during symbol preloading where the module was asked on the main thread to preload its symbols, and this would in turn cause the DWARF manual indexing to use a thread pool to index all of the compile units, and if there were relocations on the debug information sections, these threads could ask the ObjectFile to load section contents, which could cause a call to ObjectFileELF::RelocateSection() which would ask for the symbol table from the module and it would deadlock. We can't lock the module in ObjectFile::GetSymtab(), so the solution I am using is to use a llvm::once_flag to create the symbol table object once and then lock the Symtab object. Since all APIs on the symbol table use this lock, this will prevent anyone from using the symbol table before it is parsed and finalized and will avoid the deadlock I mentioned. ObjectFileELF::GetSymtab() was never locking the module lock before and would put off creating the symbol table until somewhere inside ObjectFileELF::GetSymtab(). Now we create it one time inside of the ObjectFile::GetSymtab() and immediately lock it which should be safe enough. This avoids the deadlocks and still provides safety.
Differential Revision: https://reviews.llvm.org/D114288
This reverts commit 951b107eedab1829f18049443f03339dbb0db165.
Buildbots were failing, there is a deadlock in /Users/gclayton/Documents/src/llvm/clean/llvm-project/lldb/test/Shell/SymbolFile/DWARF/DW_AT_range-DW_FORM_sec_offset.s when ELF files try to relocate things.
Symbol table parsing has evolved over the years and many plug-ins contained duplicate code in the ObjectFile::GetSymtab() that used to be pure virtual. With this change, the "Symbtab *ObjectFile::GetSymtab()" is no longer virtual and will end up calling a new "void ObjectFile::ParseSymtab(Symtab &symtab)" pure virtual function to actually do the parsing. This helps centralize the code for parsing the symbol table and allows the ObjectFile base class to do all of the common work, like taking the necessary locks and creating the symbol table object itself. Plug-ins now just need to parse when they are asked to parse as the ParseSymtab function will only get called once.
Differential Revision: https://reviews.llvm.org/D113965
This is a resubmission of https://reviews.llvm.org/D105160 after fixing testing issues.
This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.
Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.
This patch fixes the issue by:
not adding a name to synthetic symbols at creation time, and allows name to be dynamically generated when accessed
doesn't add synthetic symbol names to the name indexes, but catches this special case as name lookup time. Users won't typically set breakpoints or lookup these synthetic names, but support was added to do the lookup in case it does happen
removes the object file baseanme from the generated names to allow the names to be shared in the constant string pool
Prior to this fix the startup times for a large application was:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)
After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)
The names of the symbols are auto generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string and is only done when the name is requested from a synthetic symbol if it has no name.
Differential Revision: https://reviews.llvm.org/D106837
This reverts commit c8164d0276b97679e80db01adc860271ab4a5d11 and
43f6dad2344247976d5777f56a1fc29e39c6c717 because it breaks
TestDyldTrieSymbols.py on GreenDragon.
This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.
Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.
This patch fixes the issue by:
- not adding a name to synthetic symbols at creation time, and allows name to be dynamically generated when accessed
- doesn't add synthetic symbol names to the name indexes, but catches this special case as name lookup time. Users won't typically set breakpoints or lookup these synthetic names, but support was added to do the lookup in case it does happen
- removes the object file baseanme from the generated names to allow the names to be shared in the constant string pool
Prior to this fix the startup times for a large application was:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)
After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)
The names of the symbols are auto generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string and is only done when the name is requested from a synthetic symbol if it has no name.
Differential Revision: https://reviews.llvm.org/D105160
Reverts commits:
"Fix failing tests after https://reviews.llvm.org/D104488."
"Fix buildbot failure after https://reviews.llvm.org/D104488."
"Create synthetic symbol names on demand to improve memory consumption and startup times."
This series of commits broke the windows lldb bot and then failed to fix all of the failing tests.
This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.
Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.
This patch fixes the issue by:
- not adding a name to synthetic symbols at creation time, and allows name to be dynamically generated when accessed
- doesn't add synthetic symbol names to the name indexes, but catches this special case as name lookup time. Users won't typically set breakpoints or lookup these synthetic names, but support was added to do the lookup in case it does happen
- removes the object file baseanme from the generated names to allow the names to be shared in the constant string pool
Prior to this fix the startup times for a large application was:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)
After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)
The names of the symbols are auto generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string and is only done when the name is requested from a synthetic symbol if it has no name.
Differential Revision: https://reviews.llvm.org/D104488
Some larger projects were loading quite slowly with the current LLDB on macOS and macOS simulator builds. I did some instrument traces and found 3 main culprits:
- a LLDB timer that was put into a function that was called too often
- a std::set that was keeping track of the address of symbols that were already added
- a unnamed function generator in ObjectFile that was going slow due to allocations
In order to see this in action I ran the latest LLDB on a large application with many frameworks using the following method:
(lldb) script import time; start_time = time.perf_counter()
(lldb) file Large.app
(lldb) script print(time.perf_counter() - start_time)
I first range "sudo purge" to clear the system file caches to simulate a cold startup of the debugger, followed by two iterations with warm file caches.
Prior to this fix I was seeing the following timings:
17.68 (cold)
14.56 (warm 1)
14.52 (warm 2)
After this fix I was seeing:
11.32 (cold)
8.43 (warm 1)
8.49 (warm 2)
Differential Revision: https://reviews.llvm.org/D103504