This diff introduces a new symbols on-demand feature which skips
loading a module's debug info unless it is explicitly requested.
This provides a significant performance improvement
for applications in dynamic linking mode that have a large
number of modules.
The feature can be turned on with:
"settings set symbols.load-on-demand true"
The feature works by creating a new SymbolFileOnDemand class for
each module which wraps the actual SymbolFile subclass as a member
variable. By default, most virtual methods on SymbolFileOnDemand are
skipped so that it looks like there is no debug info for that module.
But once the module's debug info is explicitly requested to
be enabled (in the conditions mentioned below), SymbolFileOnDemand
will allow all methods to pass through and forward to the actual SymbolFile,
which hydrates the module's debug info on demand.
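The wrapping scheme described above can be sketched as a small decorator. This is a minimal sketch, not LLDB's code: `SymbolFileSketch`, `OnDemandSymbolFileSketch`, and `SetLoadDebugInfoEnabled` are hypothetical names, and the real SymbolFile interface has far more virtual methods.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a SymbolFile interface.
class SymbolFileSketch {
public:
  virtual ~SymbolFileSketch() = default;
  virtual std::vector<std::string> FindFunctions(const std::string &name) = 0;
  virtual std::size_t GetNumCompileUnits() = 0;
};

// Hypothetical wrapper: hides the wrapped symbol file until it is enabled.
class OnDemandSymbolFileSketch : public SymbolFileSketch {
public:
  explicit OnDemandSymbolFileSketch(std::unique_ptr<SymbolFileSketch> actual)
      : m_actual(std::move(actual)) {}

  // Called when a trigger (line breakpoint, backtrace, symbol match)
  // decides this module's debug info is needed.
  void SetLoadDebugInfoEnabled() { m_enabled = true; }

  std::vector<std::string> FindFunctions(const std::string &name) override {
    if (!m_enabled)
      return {}; // Pretend the module has no debug info.
    return m_actual->FindFunctions(name);
  }

  std::size_t GetNumCompileUnits() override {
    return m_enabled ? m_actual->GetNumCompileUnits() : 0;
  }

private:
  std::unique_ptr<SymbolFileSketch> m_actual;
  bool m_enabled = false;
};
```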
In an internal benchmark, we are seeing more than 95% improvement
for a 3000-module application.
Currently we provide several ways to hydrate
a module's debug info on demand:
* Source line breakpoint: matching in supported files
* Stack trace: resolving symbol context for an address
* Symbolic breakpoint: symbol table match guided promotion
* Global variable: symbol table match guided promotion
In all above situations the module's debug info will be on-demand
parsed and indexed.
Some follow-ups for this feature:
* Add a command that allows users to load debug info explicitly while using a
new or existing command when this feature is enabled
* Add settings for "never load any of these executables in Symbols On Demand"
that takes a list of globs
* Add settings for "always load the debug info for executables in Symbols
On Demand" that takes a list of globs
* Add a new column in "image list" that shows up by default when Symbols On
Demand is enabled to show the status for each shlib like "not enabled for
this", "debug info off" and "debug info on" (as a single character or
short string, not the ones I just typed)
Differential Revision: https://reviews.llvm.org/D121631
Unlike for any of the other shells, we were escaping $ when using tcsh.
There's nothing special about $ in tcsh and this prevents you from
expanding shell variables, one of the main reasons this functionality
exists in the first place.
Differential revision: https://reviews.llvm.org/D123690
LLDB supports having globbing regexes in the process launch arguments
that will be resolved using the user's shell. This requires that we pass
the launch args to the shell and then read back the expanded arguments
using LLDB's argdumper utility.
As the shell will not just expand the globbing regexes but all special
characters, we need to escape all non-globbing characters such as $, &,
<, >, etc. as those otherwise are interpreted and removed in the step
where we expand the globbing characters. Also because the special
characters are shell-specific, LLDB needs to maintain a list of all the
characters that need to be escaped for each specific shell.
This patch adds the missing semicolon character to the escape list for
all currently supported shells. Without this, having a semicolon in the
binary path or having a semicolon in the launch arguments will cause the
argdumping process to fail. E.g., lldb -- ./calc "a;b" was failing
before but is working now.
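A minimal sketch of per-shell escaping, assuming a simple backslash scheme. The character tables in the hypothetical `CharsToEscapeFor` are illustrative and not LLDB's actual lists; they only show that globbing characters (*, ?, [, ]) are left alone, that ';' is escaped for every shell, and that '$' is not escaped for tcsh.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical per-shell tables (not LLDB's real lists). Globbing characters
// are deliberately absent so the shell can still expand them.
std::string_view CharsToEscapeFor(std::string_view shell) {
  if (shell == "tcsh")
    return " '\"<>()&;`";  // No '$': tcsh should expand variables.
  return " '\"<>()&;`$";   // Default used here for sh/bash/zsh and others.
}

// Backslash-escape each special character for the given shell.
std::string EscapeForShell(std::string_view arg, std::string_view shell) {
  const std::string_view special = CharsToEscapeFor(shell);
  std::string out;
  out.reserve(arg.size());
  for (char c : arg) {
    if (special.find(c) != std::string_view::npos)
      out.push_back('\\');
    out.push_back(c);
  }
  return out;
}
```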
Fixes rdar://55776943
Differential revision: https://reviews.llvm.org/D104629
Currently, all data buffers are assumed to be writable. This is a
problem on macOS where it's not allowed to load unsigned binaries in
memory as writable. To be more precise, MAP_RESILIENT_CODESIGN and
MAP_RESILIENT_MEDIA need to be set for mapped (unsigned) binaries on our
platform.
Binaries are mapped through FileSystem::CreateDataBuffer which returns a
DataBufferLLVM. The latter is backed by a llvm::WritableMemoryBuffer
because every DataBuffer in LLDB is considered to be writable. In order
to use a read-only llvm::MemoryBuffer I had to split our abstraction
around it.
This patch distinguishes between a DataBuffer (read-only) and
WritableDataBuffer (read-write) and updates LLDB to use the appropriate
one.
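The read-only/writable split can be sketched as a small class hierarchy. The class names here are hypothetical stand-ins for DataBuffer, WritableDataBuffer, and a heap-backed buffer; the point is that mutable access exists only on the writable type, so read-only consumers can accept a buffer that was mapped without write access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical read-only base: only const access to the bytes.
class DataBufferSketch {
public:
  virtual ~DataBufferSketch() = default;
  virtual const uint8_t *GetBytes() const = 0;
  virtual std::size_t GetByteSize() const = 0;
};

// Hypothetical writable subclass: adds mutable access.
class WritableDataBufferSketch : public DataBufferSketch {
public:
  using DataBufferSketch::GetBytes; // Keep the const overload visible.
  virtual uint8_t *GetBytes() = 0;
};

// A heap-backed buffer is naturally writable.
class HeapBufferSketch : public WritableDataBufferSketch {
public:
  explicit HeapBufferSketch(std::size_t n) : m_data(n) {}
  const uint8_t *GetBytes() const override { return m_data.data(); }
  uint8_t *GetBytes() override { return m_data.data(); }
  std::size_t GetByteSize() const override { return m_data.size(); }

private:
  std::vector<uint8_t> m_data;
};

// Read-only code takes the base type, so a read-only mapped file would do.
uint8_t FirstByte(const DataBufferSketch &buf) {
  return buf.GetByteSize() ? buf.GetBytes()[0] : 0;
}
```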
rdar://74890607
Differential revision: https://reviews.llvm.org/D122856
I found this function somewhat hard to read, so I removed a few entirely
redundant checks and converted it to use early exits.
Differential Revision: https://reviews.llvm.org/D122912
Applied modernize-use-equals-default clang-tidy check over LLDB.
This check is already present in the lldb/.clang-tidy config.
Differential Revision: https://reviews.llvm.org/D121844
Update the response schema of the TraceGetState packet and add
Intel PT specific response structure that contains the TSC conversion,
if it exists. The IntelPTCollector loads the TSC conversion and caches
it to prevent unnecessary calls to perf_event_open. Move the TSC conversion
calculation from Perf.h to TraceIntelPTGDBRemotePackets.h to remove
dependency on Linux specific headers.
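For reference, a TSC conversion of this kind typically follows the formula documented for `perf_event_mmap_page` in linux/perf_event.h (fields `time_mult`, `time_shift`, `time_zero`). The struct below is a hypothetical sketch of that arithmetic, not the actual type in TraceIntelPTGDBRemotePackets.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct holding the three perf-provided conversion fields.
struct TscConversionSketch {
  uint32_t time_mult;
  uint16_t time_shift;
  uint64_t time_zero;

  // Convert a TSC value to perf time in nanoseconds. The value is split into
  // quotient and remainder so the intermediate products stay small enough
  // not to overflow 64 bits for large TSC values.
  uint64_t ToNanos(uint64_t tsc) const {
    const uint64_t quot = tsc >> time_shift;
    const uint64_t rem = tsc & ((uint64_t(1) << time_shift) - 1);
    return time_zero + quot * time_mult + ((rem * time_mult) >> time_shift);
  }
};
```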
Differential Revision: https://reviews.llvm.org/D122246
Applied modernize-use-default-member-init clang-tidy check over LLDB.
It appears that in many files we had already switched to in-class member initialization but
never updated the constructors to reflect that. This check is already present in
the lldb/.clang-tidy config.
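An illustrative before/after for modernize-use-default-member-init (and the related modernize-use-equals-default), using a hypothetical `Widget` class: once a member's default lives in the class body, the constructor init list no longer needs to repeat it, and a constructor that does nothing can become `= default`.

```cpp
#include <cassert>
#include <string>

// Before (illustrative): the default is stated twice.
//   class Widget {
//   public:
//     Widget() : m_count(0), m_name() {}
//   private:
//     int m_count = 0;
//     std::string m_name;
//   };

// After: defaults live only in the class body.
class Widget {
public:
  Widget() = default;
  explicit Widget(int count) : m_count(count) {}
  int GetCount() const { return m_count; }
  const std::string &GetName() const { return m_name; }

private:
  int m_count = 0;
  std::string m_name;
};
```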
Differential Revision: https://reviews.llvm.org/D121481
This patch removes the ability to instantiate the LLDB FileSystem class
with a FileCollector. It keeps the ability to collect files, but uses
the FileCollectorFileSystem to do that transparently.
Because the two are intertwined, this patch also removes the
finalization logic which copied the files over out of process.
1) Make the BreakpointEventData::Dump actually do something useful.
2) Make the Breakpoint events print when the break log channel is on
without having to turn on the events channel.
Differential Revision: https://reviews.llvm.org/D120917
I was looking at Stream::PutRawBytes and thought I spotted a bug because
both loops are using `i < src_len` as the loop condition despite them
iterating in opposite directions.
On closer inspection, the existing code is correct, because it relies on
well-defined unsigned integer wrapping. Correct doesn't mean readable,
so this patch changes the loop condition to compare against 0 when
decrementing i while still covering the edge case of src_len potentially
being 0 itself.
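The two loop shapes can be compared directly. This is a sketch of the pattern, not the literal Stream::PutRawBytes code; `ReverseOld` and `ReverseNew` are hypothetical helpers that copy bytes in reverse order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Old shape: relies on unsigned wrap-around. Decrementing i past 0 yields
// SIZE_MAX, which fails `i < src_len` and ends the loop. With src_len == 0,
// i starts at SIZE_MAX and the body never runs.
std::vector<uint8_t> ReverseOld(const uint8_t *src, std::size_t src_len) {
  std::vector<uint8_t> out;
  for (std::size_t i = src_len - 1; i < src_len; --i)
    out.push_back(src[i]);
  return out;
}

// Clearer shape: compare against 0 and index with i - 1, which also
// handles src_len == 0 without any wrap-around.
std::vector<uint8_t> ReverseNew(const uint8_t *src, std::size_t src_len) {
  std::vector<uint8_t> out;
  for (std::size_t i = src_len; i > 0; --i)
    out.push_back(src[i - 1]);
  return out;
}
```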
Differential revision: https://reviews.llvm.org/D119857
Don't resize DataBufferHeap if the newly requested size exceeds the
capacity of the underlying data structure, i.e. std::vector<uint8_t>.
This matches the existing check in the DataBufferHeap constructor.
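A minimal sketch of the guard, using a hypothetical `HeapSketch` class in place of DataBufferHeap. It assumes the check compares the request against `std::vector::max_size()`; oversized requests leave the buffer unchanged instead of making `resize()` throw.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified heap buffer.
class HeapSketch {
public:
  // Returns the resulting size; unchanged if the request cannot fit.
  uint64_t SetByteSize(uint64_t new_size) {
    if (new_size < m_data.max_size())
      m_data.resize(new_size);
    return m_data.size();
  }

private:
  std::vector<uint8_t> m_data;
};
```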
Most of our code was including Log.h even though that is not where the
"lldb" log channel is defined (Log.h defines the generic logging
infrastructure). This worked because Log.h included Logging.h, even
though it shouldn't.
After the recent refactor, it became impossible for the two files to include
each other in this direction (the opposite inclusion is needed), so this
patch removes the workaround that was put in place and cleans up all
files to include the right thing. It also renames the file to LLDBLog to
better reflect its purpose.
Remove ConstString::StaticMemorySize as it is unused and superseded by
GetMemoryStats. It is referenced in a bunch of doc comments but I don't
really understand why. My best guess is that the comments were
copy-pasted from ConstString::MemorySize() even though it didn't make
sense there either. The implementation of StaticMemorySize was being
called on the MemoryPool, not on the ConstString itself.
Differential revision: https://reviews.llvm.org/D118091
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
This patch makes use of C++ type checking and scoped enums to make
logging statements shorter and harder to misuse.
Defines like LIBLLDB_LOG_PROCESS are replaced with LLDBLog::Process.
Because it now carries type information we do not need to worry about
matching a specific enum value with the right getter function -- the
compiler will now do that for us.
The main entry point for the logging machinery becomes the GetLog
(template) function, which will obtain the correct Log object based on
the enum type. It achieves this through another template function
(LogChannelFor<T>), which must be specialized for each type, and should
return the appropriate channel object.
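The GetLog / LogChannelFor<T> dispatch described above can be sketched as follows. All names with a `Sketch` suffix are hypothetical, and the channel is reduced to a name and a bitmask; the real Log class does much more.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical minimal channel: a name plus a mask of enabled categories.
struct ChannelSketch {
  std::string name;
  uint64_t enabled_mask = 0;
};

// Scoped enum of categories, in the spirit of LLDBLog::Process.
enum class LLDBLogSketch : uint64_t {
  Process = 1u << 0,
  Thread = 1u << 1,
  Breakpoints = 1u << 2,
};

// Declared but not defined: an enum without a specialization fails to link.
template <typename Cat> ChannelSketch &LogChannelFor();

template <> ChannelSketch &LogChannelFor<LLDBLogSketch>() {
  static ChannelSketch channel{"lldb"};
  return channel;
}

// The enum type selects the channel; returns nullptr when disabled,
// matching the `if (Log *log = GetLog(...))` idiom.
template <typename Cat> ChannelSketch *GetLog(Cat category) {
  ChannelSketch &channel = LogChannelFor<Cat>();
  const auto bits = static_cast<uint64_t>(category);
  return (channel.enabled_mask & bits) ? &channel : nullptr;
}
```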
This patch also removes the ability to log a message if multiple
categories are enabled simultaneously as it was unused and confusing.
This patch does not actually remove any of the existing interfaces. The
defines and log retrieval functions are left around as wrappers around
the new interfaces. They will be removed in follow-up patch.
Differential Revision: https://reviews.llvm.org/D117490
Add statistics about the memory usage of the string pool. I'm
particularly interested in the memory used by the allocator, i.e. the
number of bytes actually used by the allocator itself as well as the
number of bytes allocated through the allocator.
Differential revision: https://reviews.llvm.org/D117914
The logic of `g_quiet` was inverted in D26243. This corrects the issue.
Without this, running `log timers enable` produces a high volume of incremental
timer output.
Differential Revision: https://reviews.llvm.org/D117837
Instrument the SB API with signposts on Darwin. This gives us a time
profile on whose behalf LLDB spends time (particularly when run via the
SBAPI from an IDE).
Differential revision: https://reviews.llvm.org/D117632
Remove the last remaining references to the reproducers from the
instrumentation. This patch renames the relevant files and macros.
Differential revision: https://reviews.llvm.org/D117712
Multithreaded applications using fork(2) need to be extra careful about
what they do in the fork child. Without any special precautions (which
only really work if you can fully control all threads) they can only
safely call async-signal-safe functions. This is because the forked
child will contain a snapshot of the parent's memory at a random moment in
the execution of all of the non-forking threads (this is where the
similarity with signals comes in).
For example, the other threads could have been holding locks that can
now never be released in the child process and any attempt to obtain
them would block. This is what sometimes happens when using tcmalloc --
our fork child ends up hanging in the memory allocation routine. It is
also what happened with our logging code, which is why we added a
pthread_atfork hackaround.
This patch implements a proper fix to the problem, which is to make
the child code async-signal-safe. The ProcessLaunchInfo structure is
transformed into a simpler ForkLaunchInfo representation, one which can
be read without allocating memory and invoking complex library
functions.
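A minimal sketch of the idea, assuming a POSIX system: the parent flattens everything into C strings and a null-terminated argv up front, so the child only reads prebuilt data before calling exec. `ForkLaunchInfoSketch` and `ChildExecSketch` are hypothetical and far simpler than LLDB's ForkLaunchInfo (no file actions, environment, or signal handling).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>
#include <unistd.h>

// Hypothetical flattened launch info, built in the parent before fork().
struct ForkLaunchInfoSketch {
  std::string path;
  std::string working_dir;
  std::vector<std::string> arg_storage;
  std::vector<const char *> argv; // Null-terminated; points into arg_storage.

  ForkLaunchInfoSketch(std::string exe, std::string wd,
                       std::vector<std::string> args)
      : path(std::move(exe)), working_dir(std::move(wd)),
        arg_storage(std::move(args)) {
    for (const std::string &a : arg_storage)
      argv.push_back(a.c_str());
    argv.push_back(nullptr);
  }
  // Copying would leave argv pointing into another object's strings.
  ForkLaunchInfoSketch(const ForkLaunchInfoSketch &) = delete;
  ForkLaunchInfoSketch &operator=(const ForkLaunchInfoSketch &) = delete;
};

// Runs in the child after fork(): reads prebuilt fields only, then calls
// chdir/execv/_exit. No allocation, locks, or logging happen here.
[[noreturn]] void ChildExecSketch(const ForkLaunchInfoSketch &info) {
  if (!info.working_dir.empty() && chdir(info.working_dir.c_str()) != 0)
    _exit(126);
  execv(info.path.c_str(), const_cast<char *const *>(info.argv.data()));
  _exit(127); // Only reached if execv failed.
}
```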
Strictly speaking this implementation is not async-signal-safe, as it
still invokes library functions outside of the posix-blessed set of
entry points. Strictly adhering to the spec would mean reimplementing a
lot of the functionality in pure C, so instead I rely on the fact that
any reasonable implementation of some functions (e.g.,
basic_string::c_str()) will not start allocating memory or doing other
unsafe things.
The new child code does not call into our logging infrastructure, which
enables us to remove the pthread_atfork call from there.
Differential Revision: https://reviews.llvm.org/D116165
This is an updated version of the https://reviews.llvm.org/D113789 patch with the following changes:
- We no longer modify modification times of the cache files
- Use LLVM caching and cache pruning instead of making a new cache mechanism (See DataFileCache.h/.cpp)
- Add signature to start of each file since we are not using modification times so we can tell when caches are stale and remove and re-create the cache file as files are changed
- Add settings to control the cache size, disk percentage and expiration in days to keep cache size under control
This patch enables symbol tables to be cached in the LLDB index cache directory. All cache files are in a single directory and the files use unique names to ensure that files from the same path will re-use the same file as files get modified. This means as files change, their cache files will be deleted and updated. The modification time of each of the cache files is not modified so that access based pruning of the cache can be implemented.
The symbol table cache files start with a signature that uniquely identifies a file on disk and contains one or more of the following items:
- object file UUID if available
- object file mod time if available
- object name for BSD archive .o files that are in .a files if available
If none of these signature items are available, then the file will not be cached. This keeps temporary object files from expressions from being cached.
When the cache files are loaded on subsequent debug sessions, the signature is compared and if the file has been modified (uuid changes, mod time changes, or object file mod time changes) then the cache file is deleted and re-created.
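The validity and staleness checks can be sketched as below. `CacheSignatureSketch` and `CacheIsUsable` are hypothetical names, and the three optional fields follow the items mentioned above (UUID, mod time, and object file mod time) rather than LLDB's actual on-disk encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical signature; each item is present only if available.
struct CacheSignatureSketch {
  std::optional<std::string> uuid;
  std::optional<uint32_t> mod_time;
  std::optional<uint32_t> obj_mod_time; // .o files inside BSD archives.

  // Nothing identifies the file (e.g. a temporary expression object file),
  // so it should not be cached at all.
  bool IsValid() const {
    return uuid.has_value() || mod_time.has_value() || obj_mod_time.has_value();
  }

  bool operator==(const CacheSignatureSketch &rhs) const {
    return uuid == rhs.uuid && mod_time == rhs.mod_time &&
           obj_mod_time == rhs.obj_mod_time;
  }
};

// A mismatch means the cache file is stale and must be rebuilt.
bool CacheIsUsable(const CacheSignatureSketch &on_disk,
                   const CacheSignatureSketch &current) {
  return current.IsValid() && on_disk == current;
}
```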
Module caching must be enabled by the user before this can be used:
symbols.enable-lldb-index-cache (boolean) = false
(lldb) settings set symbols.enable-lldb-index-cache true
There is also a setting that allows the user to specify a module cache directory, which defaults to a directory next to the symbols.clang-modules-cache-path directory in a temp directory:
(lldb) settings show symbols.lldb-index-cache-path
/var/folders/9p/472sr0c55l9b20x2zg36b91h0000gn/C/lldb/IndexCache
If this setting is enabled, the finalized symbol tables will be serialized and saved to disk so they can be quickly loaded next time you debug.
Each module can cache one or more files in the index cache directory. The cache file names must be unique to a file on disk and its architecture and object name for .o files in BSD archives. This allows universal mach-o files to support caching multiple architectures in the same module cache directory. Basing the file name on this info allows the cache file to be deleted and replaced when the file gets updated on disk. This keeps the cache from growing over time during the compile/edit/debug cycle and prevents out of space issues.
If the cache is enabled, the symbol table will be loaded from the cache the next time you debug if the module has not changed.
The cache also has settings to control the size of the cache on disk. Each time LLDB starts up with the index cache enabled, the cache will be pruned to ensure it stays within the user defined settings:
(lldb) settings set symbols.lldb-index-cache-expiration-days <days>
A value of zero will disable cache files from expiring when the cache is pruned. The default value is 7 currently.
(lldb) settings set symbols.lldb-index-cache-max-byte-size <size>
A value of zero will disable pruning based on a total byte size. The default value is zero currently.
(lldb) settings set symbols.lldb-index-cache-max-percent <percentage-of-disk-space>
A value of 100 will allow the disk to be filled to the max; a value of zero will disable percentage pruning. The default value is zero.
Reviewed By: labath, wallace
Differential Revision: https://reviews.llvm.org/D115324
This version drops the change to ArchSpec::SetArchitecture that set the
ObjectFile of a mach-o binary to llvm::Triple::MachO. It's not
necessary for my patch, and it changes the output of image list -t
causing TestUniversal.py to fail on x86_64 systems. The bots
turned up the failure, I was developing and testing this on
an Apple Silicon mac.
With arm64e ARMv8.3 pointer authentication, lldb needs to know how
many bits are used for addressing and how many are used for pointer
auth signing. This should be determined dynamically from the inferior
system / corefile, but there are some workflows where it still isn't
recorded and we fall back on a default value that is correct on some
Darwin environments.
This patch also explicitly sets the vendor of mach-o binaries to
Apple, so we select an Apple ABI instead of a random other ABI.
It adds a function pointer formatter for systems where pointer
authentication is in use, and we can strip the ptrauth bits off
of the function pointer address and get a different value that
points to an actual symbol.
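Stripping the signature bits amounts to masking off everything above the addressing bits. This sketch assumes a user-space pointer whose high bits are zero once the signature is removed; `StripPtrAuthBits` is a hypothetical helper, and the number of addressing bits is passed in rather than detected from the inferior.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `addressing_bits` bits of a signed pointer.
// Kernel-style addresses (upper bits all ones) are not handled here.
uint64_t StripPtrAuthBits(uint64_t ptr, unsigned addressing_bits) {
  assert(addressing_bits > 0 && addressing_bits < 64);
  const uint64_t mask = (uint64_t(1) << addressing_bits) - 1;
  return ptr & mask;
}
```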
Differential Revision: https://reviews.llvm.org/D115431
rdar://84644661
DataEncoder was previously made to modify data within an existing buffer. As the code progressed, new clients started using DataEncoder to create binary data. In these cases the use of this class was possible, but only if you knew exactly how large your buffer would be ahead of time. This patch adds the ability for DataEncoder to own a buffer that can be dynamically resized as data is appended to the buffer.
Change in this patch:
- Allow a DataEncoder object to be created that owns a DataBufferHeap object that can dynamically grow as data is appended
- Add new methods that start with "Append" to append data to the buffer and grow it as needed
- Adds full testing of the API to ensure modifications don't regress any functionality
- Has two constructors: one that uses caller owned data and one that creates an object with object owned data
- "Append" methods only work if the object owns its own data
- Removes the ability to specify a shared memory buffer as no one was using this functionality. This allows us to switch to a case where the object owns its own data in a DataBufferHeap that can be resized as data is added
"Put" methods work on both caller and object owned data.
"Append" methods work on only object owned data where we can grow the buffer. These methods will return false if called on a DataEncoder object that has caller owned data.
The main reason for these modifications is to be able to use the DataEncoder objects instead of llvm::gsym::FileWriter in https://reviews.llvm.org/D113789. This patch wants to add the ability to create symbol table caching to LLDB and the code needs to build binary caches and save them to disk.
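The Put/Append rules described above can be sketched with a trimmed-down encoder. `EncoderSketch`, `PutU8`, and `AppendU8` are hypothetical and return bool for brevity; they are not DataEncoder's real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical encoder showing caller-owned vs object-owned data.
class EncoderSketch {
public:
  // Object-owned, growable buffer.
  EncoderSketch() : m_owned(std::make_unique<std::vector<uint8_t>>()) {}
  // Caller-owned, fixed-size buffer.
  EncoderSketch(uint8_t *data, std::size_t size) : m_data(data), m_size(size) {}

  // Put works for both kinds, but only within the current size.
  bool PutU8(std::size_t offset, uint8_t value) {
    if (offset >= GetSize())
      return false;
    GetData()[offset] = value;
    return true;
  }

  // Append only works when this object owns (and can grow) its buffer.
  bool AppendU8(uint8_t value) {
    if (!m_owned)
      return false;
    m_owned->push_back(value);
    return true;
  }

  std::size_t GetSize() const { return m_owned ? m_owned->size() : m_size; }

private:
  uint8_t *GetData() { return m_owned ? m_owned->data() : m_data; }

  std::unique_ptr<std::vector<uint8_t>> m_owned;
  uint8_t *m_data = nullptr;
  std::size_t m_size = 0;
};
```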
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D115073
Since a8b54834a186f5570b49b614e31b961a9cf1cbfe, there are two
distinct Windows path styles, `windows_backslash` (with the old
`windows` being an alias for it) and `windows_slash`.
4e4883e1f394f7c47ff3adee48039aa8374bb8d0 added helpers for
inspecting path styles.
The newly added windows_slash path style isn't used in
LLDB yet anyway, as LLDB is quite decoupled from most of
llvm::sys::path and uses its own FileSpec class. To put it to
use, it could be hooked up in `FileSpec::Style::GetNativeStyle`
(in lldb/source/Utility/FileSpec.cpp) just like in the `real_style`
function in llvm/lib/Support/Path.cpp in
df0ba47c36f6bd0865e3286853b76d37e037c2d7.
It is not currently clear whether there's a real need for using
the Windows path style with forward slashes in LLDB (if there are any
other applications interacting with it, expecting that style), and
what other changes in LLDB are needed for that to work, but this
at least makes some of the checks more ready for the new style,
simplifying code a bit.
Differential Revision: https://reviews.llvm.org/D113255
Use llvm::Optional<uint16_t> instead of int for port number
in UriParser::Parse(), and use llvm::None to indicate missing port
instead of a magic value of -1.
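A minimal sketch of returning an optional port instead of -1, parsing only a "host:port" string rather than a full URI. `ParsePortSketch` is hypothetical, uses std::optional as a stand-in for llvm::Optional (and std::nullopt for llvm::None), and ignores IPv6 bracket syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Parse the port after the last ':'; std::nullopt means "no valid port".
std::optional<uint16_t> ParsePortSketch(std::string_view hostport) {
  const std::size_t colon = hostport.rfind(':');
  if (colon == std::string_view::npos)
    return std::nullopt; // No port given: no magic -1 needed.
  const std::string_view digits = hostport.substr(colon + 1);
  if (digits.empty() || digits.size() > 5)
    return std::nullopt;
  uint32_t value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9')
      return std::nullopt;
    value = value * 10 + static_cast<uint32_t>(c - '0');
  }
  if (value > 65535)
    return std::nullopt;
  return static_cast<uint16_t>(value);
}
```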
Differential Revision: https://reviews.llvm.org/D112309
Remove Status::WasInterrupted() that checks whether the underlying error
code matches EINTR. ProcessGDBRemote::ConnectToDebugserver() is its
only call site, and it does not seem correct there. After all, EINTR
is precisely when we want to retry, not stop retrying. Furthermore,
it should not really matter since we should be catching EINTR
immediately via llvm::sys::RetryAfterSignal() but that's another story.
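The retry-on-EINTR idiom referred to above looks roughly like this. `RetryAfterSignalSketch` is a hypothetical simplification of the idea behind llvm::sys::RetryAfterSignal, not its exact signature.

```cpp
#include <cassert>
#include <cerrno>

// Keep calling f() while it returns `fail` with errno == EINTR,
// then return the last result.
template <typename Fail, typename Fn>
auto RetryAfterSignalSketch(const Fail &fail, const Fn &f) -> decltype(f()) {
  decltype(f()) result;
  do {
    errno = 0;
    result = f();
  } while (result == fail && errno == EINTR);
  return result;
}
```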
Differential Revision: https://reviews.llvm.org/D111908
This makes the compiler generated code for accessing the thread local
variable much simpler (no need for wrapper functions and weak pointers
to potential init functions), and can avoid toolchain bugs regarding how
to access TLS variables.
In particular, this fixes LLDB when built with current GCC/binutils for
MinGW, see https://github.com/msys2/MINGW-packages/issues/8868.
Differential Revision: https://reviews.llvm.org/D111779
This reverts commits f9aba9a5afe09788eceb9879aa5c3ad345e0f1e9 and
035217ff515b8ecdc871e39fa840f3cba1b9cec7.
As explained in the original commit message, this didn't have the
intended effect of improving the common LLDB use case, but still
provided a marginal improvement for the places where LLDB creates a
scoped timer with a string literal.
The reason for the revert is that this change pulls in the os/signpost.h
header in Signposts.h. The former transitively includes loader.h, which
contains a series of macro defines that conflict with MachO.h. There are
ways to work around that, but Adrian and I concluded that none of them
are worth the trade-off in complicating Signposts.h even further.
Implement the simpler vRun packet and prefer it over the A packet.
Unlike the latter, it transmits command-line arguments without redundant
indices and lengths. This also improves GDB compatibility since modern
versions of gdbserver do not implement the A packet at all.
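For illustration, the GDB remote protocol documents vRun as `vRun;filename[;argument]...` with each field hex-encoded, whereas the A packet also carries a length and index per argument. The helper below is a hypothetical sketch of building the vRun payload; packet framing (`$...#checksum`) is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Append s as lowercase hex, two digits per byte.
void AppendHex(std::string &out, const std::string &s) {
  static const char digits[] = "0123456789abcdef";
  for (unsigned char c : s) {
    out.push_back(digits[c >> 4]);
    out.push_back(digits[c & 0xf]);
  }
}

// Build "vRun;<hex(argv[0])>;<hex(argv[1])>...". argv[0] is the program.
std::string MakeVRunPacket(const std::vector<std::string> &argv) {
  std::string packet = "vRun";
  for (const std::string &arg : argv) {
    packet.push_back(';');
    AppendHex(packet, arg);
  }
  return packet;
}
```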
Make qLaunchSuccess optional when using vRun. It is not
implemented by gdbserver, and since vRun returns the stop reason,
we can assume it to be successful.
Differential Revision: https://reviews.llvm.org/D107931
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
Add a new SaveCore() process method that can be used to request a core
dump. This is currently implemented on NetBSD via the PT_DUMPCORE
ptrace(2) request, and enabled via 'savecore' extension.
Protocol-wise, a new qSaveCore packet is introduced. It accepts zero
or more semicolon-separated key:value options, invokes the core dump
and returns a key:value response. Currently the only option supported
is "path-hint", and the return value contains the "path" actually used.
The support for the feature is exposed via the qSaveCore qSupported feature.
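Both the qSaveCore options and its response are semicolon-separated key:value lists. A hypothetical parser for that shape is sketched below; it treats values as raw text, so any encoding the real packet applies to values (for example, hex) is an assumption left out of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Split "key1:value1;key2:value2;" into a map. Items without ':' are
// skipped; for duplicate keys, the first occurrence wins.
std::map<std::string, std::string> ParseKeyValues(std::string_view s) {
  std::map<std::string, std::string> result;
  while (!s.empty()) {
    const std::size_t semi = s.find(';');
    const std::string_view item = s.substr(0, semi);
    const std::size_t colon = item.find(':');
    if (colon != std::string_view::npos)
      result.emplace(std::string(item.substr(0, colon)),
                     std::string(item.substr(colon + 1)));
    if (semi == std::string_view::npos)
      break;
    s.remove_prefix(semi + 1);
  }
  return result;
}
```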
Differential Revision: https://reviews.llvm.org/D101285