Instead of returning an error when:
- it can't obtain section information from a module.
- there are other issues calculating the size.
when we encounter such an error we log the error and continue with the
other modules.
tested with
lldb/test/API/functionalities/process_save_core_minidump/TestProcessSaveCoreMinidump.py
---------
Co-authored-by: Bar Soloveychik <barsolo@fb.com>
lldb/test/API/functionalities/process_save_core_minidump/TestProcessSaveCoreMinidump64b.py
fails under msan with uninitialized memory access errors. The problem is
that a few structs are written to the dump without having been fully
initialized. This change makes them default-initialized so dumping the
fields that aren't explicitly written to won't trigger UB.
### Context
Over a year ago, I landed support for 64b Memory ranges in Minidump
(#95312). In this patch we added the Memory64 list stream, which is
effectively a Linked List on disk. The layout is a sixteen byte header
and then however many Memory descriptors.
### The Bug
This is a classic off-by one error, where I added 8 bytes instead of 16
for the header. This caused the first region to start 8 bytes before the
correct RVA, thus shifting all memory reads by 8 bytes. We are correctly
writing all the regions to disk correctly, with no physical corruption
but the RVA is defined wrong, meaning we were incorrectly reading memory

### Why wasn't this caught?
One problem we've had is forcing Minidump to actually use the 64b mode,
it would be a massive waste of resources to have a test that actually
wrote >4.2gb of IO to validate the 64b regions, and so almost all
validation has been manual. As a weakness of manual testing, this issue
is psuedo non-deterministic, as what regions end up in 64b or 32b is
handled greedily and iterated in the order it's laid out in
/proc/pid/maps. We often validated 64b was written correctly by
hexdumping the Minidump itself, which was not corrupted (other than the
BaseRVA)

### Why is this showing up now?
During internal usage, we had a bug report that the Minidump wasn't
displaying values. I was unable to repro the issue, but during my
investigation I saw the variables were in the 64b regions which resulted
in me identifying the bug.
### How do we prevent future regressions?
To prevent regressions, and honestly to save my sanity for figuring out
where 8 bytes magically came from, I've added a new API to
SBSaveCoreOptions.
```SBSaveCoreOptions::GetMemoryRegionsToSave()```
The ability to get the memory regions that we intend to include in the Coredump. I added this so we can compare what we intended to include versus what was actually included. Traditionally we've always had issues comparing regions because Minidump includes `/proc/pid/maps` and it can be difficult to know what memoryregion read failure was a genuine error or just a page that wasn't meant to be included.
We are also leveraging this API to choose the memory regions to be generated, as well as for testing what regions should be bytewise 1:1.
After much debate with @clayborg, I've moved all non-stack memory to the Memory64 List. This list doesn't incur us any meaningful overhead and Greg originally suggested doing this in the original 64b PR. This also means we're exercising the 64b path every single time we save a Minidump, preventing regressions on this feature from slipping through testing in the future.
Snippet produced by [minidump.py](https://github.com/clayborg/scripts)
```
MINIDUMP_MEMORY_LIST:
NumberOfMemoryRanges = 0x00000002
MemoryRanges[0] = [0x00007f61085ff9f0 - 0x00007f6108601000) @ 0x0003f655
MemoryRanges[1] = [0x00007ffe47e50910 - 0x00007ffe47e52000) @ 0x00040c65
MINIDUMP_MEMORY64_LIST:
NumberOfMemoryRanges = 0x000000000000002e
BaseRva = 0x0000000000042669
MemoryRanges[0] = [0x00005584162d8000 - 0x00005584162d9000)
MemoryRanges[1] = [0x00005584162d9000 - 0x00005584162db000)
MemoryRanges[2] = [0x00005584162db000 - 0x00005584162dd000)
MemoryRanges[3] = [0x00005584162dd000 - 0x00005584162ff000)
MemoryRanges[4] = [0x00007f6100000000 - 0x00007f6100021000)
MemoryRanges[5] = [0x00007f6108800000 - 0x00007f6108828000)
MemoryRanges[6] = [0x00007f6108828000 - 0x00007f610899d000)
MemoryRanges[7] = [0x00007f610899d000 - 0x00007f61089f9000)
MemoryRanges[8] = [0x00007f61089f9000 - 0x00007f6108a08000)
MemoryRanges[9] = [0x00007f6108bf5000 - 0x00007f6108bf7000)
```
### Misc
As a part of this fix I had to look at LLDB logs a lot, you'll notice I added `0x` to many of the PRIx64 `LLDB_LOGF`. This is so the user (or I) can directly copy paste the address in the logs instead of adding the hex prefix themselves.
Added some SBSaveCore tests for the new GetMemoryAPI, and Docstrings.
CC: @DavidSpickett, @da-viper @labath because we've been working together on save-core plugins, review it optional and I didn't tag you but figured you'd want to know
In #129307, we introduced read write in chunks, and during the final
revision of the PR I changed the behavior for 64b memory regions and did
not test an actual 64b memory range.
This caused LLDB to crash whenever we generated a 64b memory region.
64b regions has been a problem in testing for some time as it's a waste
of test resources to generation a 5gb+ Minidump. I will work with
@clayborg and @labath to come up with a way to specify creating a 64b
list instead of a 32b list (likely via the yamilizer).
Add a generous amount of buffer directories. I found out some LLDB forks
(internal and external) had custom ranges that could fail because we
didn't pre-account for those. To prevent this from being a problem, I've
added a large number of buffer directories at the cost of 240 bytes.
I've recently been working on Minidump File Builder again, and some of
the comments are out of date, and many of the includes are no longer
used. This patch removes unneeded includes and rephrases some comments
to better fit with the current state after the read write chunks pr.
I recently received an internal error report that LLDB was OOM'ing when
creating a Minidump. In my 64b refactor we made a decision to acquire
buffers the size of the largest memory region so we could read all of
the contents in one call. This made error handling very simple (and
simpler coding for me!) but had the trade off of large allocations if
huge pages were enabled.
This patch is one I've had on the back burner for awhile, but we can
read and write the Minidump memory sections in discrete chunks which we
already do for writing to disk.
I had to refactor the error handling a bit, but it remains the same. We
make a best effort attempt to read as much of the memory region as
possible, but fail immediately if we receive an error writing to disk. I
did not add new tests for this because our existing test suite is quite
good, but I did manually verify a few Minidumps couldn't read beyond the
red_zone.
```
(lldb) reg read $sp
rsp = 0x00007fffffffc3b0
(lldb) p/x 0x00007fffffffc3b0 - 128
(long) 0x00007fffffffc330
(lldb) memory read 0x00007fffffffc330
0x7fffffffc330: 60 c3 ff ff ff 7f 00 00 60 cd ff ff ff 7f 00 00 `.......`.......
0x7fffffffc340: 60 c3 ff ff ff 7f 00 00 65 e6 26 00 00 00 00 00 `.......e.&.....
(lldb) memory read 0x00007fffffffc329
error: could not parse memory info (Success!)
```
I'm not sure how to quantify the memory improvement other than we would
allocate the largest size regardless of the size. So a 2gb unreadable
region would cause a 2gb allocation even if we were reading 4096 kb. Now
we will take the range size or the max chunk size of 128 mb.
In #119598 my recent TLS feature seems to break crashpad symbols. I have
a few ideas on how this is happening, but for now as a mitigation I'm
checking if the Minidump was LLDB generated, and if so leveraging the
dynamic loader.
Recently my coworker @jeffreytan81 pointed out that Minidumps don't show
breakpoints when collected. This was prior blocked because Minidumps
could only contain 1 exception, now that we support N signals/sections
we can save all the threads stopped on breakpoints.
In my prior two save core API's, I experimented on how to save stacks
with the new API. I incorrectly left these in, as the existing
`m_thread_by_range_end` was the correct choice.
I have removed the no-op collection, and moved to use the proper one.
It's worth noting this was not caught by testing because we do not
verify where the items are contained in the minidump. This would require
a test being aware of how minidumps are structured, or adding a textual
tool that we can then scan the output of.
In #95312 Minidump file creation was moved from being created at the
end, to the file being emitted in chunks. This causes some undesirable
behavior where the file can still be present after an error has
occurred. To resolve this we will now delete the file upon an error.
Recently in #107731 this change was revereted due to excess memory size
in `TestSkinnyCore`. This was due to a bug where a range's end was being
passed as size. Creating massive memory ranges.
Additionally, and requiring additional review, I added more unit tests
and more verbose logic to the merging of save core memory regions.
@jasonmolenda as an FYI.
Reapplies #106293, testing identified issue in the merging code. I used
this opportunity to strip CoreFileMemoryRanges to it's own file and then
add unit tests on it's behavior.
A follow up to #106473 Minidump wasn't collecting fs or gs_base. This
patch extends the x86_64 register context and gated reading it behind an
lldb specific flag. Additionally these registers are explicitly checked
in the tests.
This patch addresses a bug where `cs`/`fs` and other segmentation flags
were being identified as having a type of `32b` and `64b` for `rflags`.
In that case the register value was returning the fail value `0xF...`
and this was corrupting some minidumps. Here we just read it as a 64b
value and truncate it.
In addition to that fix, I added comparing the registers from the live
process to the loaded core for the generic minidump test. Prior only
being ARM register tests. This explains why this was not detected
before.
This patch removes all of the Set.* methods from Status.
This cleanup is part of a series of patches that make it harder use the
anti-pattern of keeping a long-lives Status object around and updating
it while dropping any errors it contains on the floor.
This patch is largely NFC, the more interesting next steps this enables
is to:
1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()
Note that step (2) will bring 90% of the benefits for users, and step
(3) will dramatically clean up the error handling code in various
places. In the end my goal is to convert all APIs that are of the form
` ResultTy DoFoo(Status& error)
`
to
` llvm::Expected<ResultTy> DoFoo()
`
How to read this patch?
The interesting changes are in Status.h and Status.cpp, all other
changes are mostly
` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git
grep -l SetErrorString lldb/source)
`
plus the occasional manual cleanup.
This patch adds the option to specify specific memory ranges to be
included in a given core file. The current implementation lets user
specified ranges either be in addition to a certain save style, or
independent of them via the newly added custom enum.
To achieve being inclusive of save style, I've moved from a std::vector
of ranges to a RangeDataVector, and to join overlapping ranges to
prevent duplication of memory ranges in the core file.
As a non function bonus, when SBSavecore was initially created, the
header was included in the lldb-private interfaces, and I've fixed that
and moved it the forward declare as an oversight. CC @bulbazord in case
we need to include that into swift.
This PR is in response to a bug my coworker @mbucko discovered where on
MacOS Minidumps were being created where the 64b memory regions were
readable, but were not being listed in
`SBProcess.GetMemoryRegionList()`. This went unnoticed in #95312 due to
all the linux testing including /proc/pid maps. On MacOS generated dumps
(or any dump without access to /proc/pid) we would fail to properly map
Memory Regions due to there being two independent methods for 32b and
64b mapping.
In this PR I addressed this minor bug and merged the methods, but in
order to add test coverage required additions to `obj2yaml` and
`yaml2obj` which make up the bulk of this patch.
Lastly, there are some non-required changes such as the addition of the
`Memory64ListHeader` type, to make writing/reading the header section of
the Memory64List easier.
Reapply #100443 and #101770. These were originally reverted due to a
test failure and an MSAN failure. I changed the test attribute to
restrict to x86 (following the other existing tests). I could not
reproduce the test or the MSAN failure and no repo steps were provided.
In #98403 I enabled the SBSaveCoreOptions object, which allows users via
the scripting API to define what they want saved into their core file.
As the first option I've added a threadlist, so users can scan and
identify which threads and corresponding stacks they want to save.
In order to support this, I had to add a new method to `Process.h` on
how we identify which threads are to be saved, and I had to change the
book keeping in minidump to ensure we don't double save the stacks.
Important to @jasonmolenda I also changed the MachO coredump to accept
these new APIs.
In #95312 I incorrectly set `m_expected_directories` to uint, this broke
the windows build and is the incorrect type.
`size_t` is more accurate because this value only ever represents the
expected upper bound of the directory vector.
Currently, LLDB does not support taking a minidump over the 4.2gb limit imposed by uint32. In fact, currently it writes the RVA's and the headers to the end of the file, which can become corrupted due to the header offset only supporting a 32b offset.
This change reorganizes how the file structure is laid out. LLDB will precalculate the number of directories required and preallocate space at the top of the file to fill in later. Additionally, thread stacks require a 32b offset, and we provision empty descriptors and keep track of them to clean up once we write the 32b memory list.
For
[MemoryList64](https://learn.microsoft.com/en-us/windows/win32/api/minidumpapiset/ns-minidumpapiset-minidump_memory64_list),
the RVA to the start of the section itself will remain in a 32b addressable space. We achieve this by predetermining the space the memory regions will take, and only writing up to 4.2 gb of data with some buffer to allow all the MemoryDescriptor64s to also still be 32b addressable.
I did not add any explicit tests to this PR because allocating 4.2gb+ to test is very expensive. However, we have 32b automation tests and I validated with in several ways, including with 5gb+ array/object and would be willing to add this as a test case.
Currently in Core dumps, the entire pthread is copied, including the
unused space beyond the stack pointer. This causes large amounts of core
dump inflation when the number of threads is high, but the stack usage
is low. Such as when an application is using a thread pool.
This change will optimize for these situations in addition to generally
improving the core dump performance for all of lldb.
Summary:
AddMemoryList() was returning the last error status returned by
ReadMemory(). So if an invalid memory region was read last, the function
would return an error.
Test Plan:
./bin/llvm-lit -sv
~/src/llvm-project/lldb/test/API/functionalities/process_save_core_minidump/TestProcessSaveCoreMinidump.py
Reviewers:
kevinfrei,clayborg
Subscribers:
Tasks:
Tags:
There are users reporting saving minidump from lldb-dap does not work.
Turns out our stack trace request always evaluate a function call which
caused JIT object file like "__lldb_caller_function" to be created which
can fail minidump builder to get its module size.
This patch fixes "getModuleFileSize" for ObjectFileJIT so that module
list can be saved. I decided to create a lldb-dap test so that this
end-to-end functionality can be validated from our tests (instead of
only command line lldb).
The patch also improves several other small things in the workflow:
1. It logs any minidump stream failure so that it is easier to find out
what stream saving fails. In future, we should show process and error to
end users.
2. It handles error from "getModuleFileSize" llvm::Expected<T> otherwise
it will complain the error is not handled.
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
This patch adds support for saving minidumps with the arm64
architecture. It also will cause unsupported architectures to emit an
error where before this patch it would emit a minidump with partial
information. This new code is tested by the arm64 windows buildbot that
was failing:
https://lab.llvm.org/buildbot/#/builders/219/builds/6868
This is needed following this PR:
https://github.com/llvm/llvm-project/pull/71772
Prior to this patch, each core file plugin (ObjectFileMachO.cpp and
ObjectFileMinindump.cpp) would calculate the address ranges to save in
different ways. This patch adds a new function to Process.h/.cpp:
```
Status Process::CalculateCoreFileSaveRanges(lldb::SaveCoreStyle core_style, CoreFileMemoryRanges &ranges);
```
The patch updates the ObjectFileMachO::SaveCore(...) and
ObjectFileMinindump::SaveCore(...) to use same code. This will allow
core files to be consistent with the lldb::SaveCoreStyle across
different core file creators and will allow us to add new core file
saving features that do more complex things in future patches.
This is a re-submission of 24d240558811604354a8d6080405f6bad8d15b5c
without the hunks in HostNativeThreadBase.{h,cpp}, which break builds
on Windows.
Identified with modernize-use-nullptr.
This reverts commit 913457acf07be7f22d71ac41ad1076517d7f45c6.
It again broke builds on Windows:
lldb/source/Host/common/HostNativeThreadBase.cpp(37,14): error:
assigning to 'lldb::thread_result_t' (aka 'unsigned int') from
incompatible type 'std::nullptr_t'
This is a re-submission of 24d240558811604354a8d6080405f6bad8d15b5c
without the hunk in HostNativeThreadBase.h, which breaks builds on
Windows.
Identified with modernize-use-nullptr.
This reverts commit 24d240558811604354a8d6080405f6bad8d15b5c.
Breaks building on Windows:
../../lldb/include\lldb/Host/HostNativeThreadBase.h(49,36): error:
cannot initialize a member subobject of type 'lldb::thread_result_t'
(aka 'unsigned int') with an rvalue of type 'std::nullptr_t'
lldb::thread_result_t m_result = nullptr;
^~~~~~~
1 error generated.
Fixes the following warning:
$llvm_project/lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp:744:11: warning:
format specifies type 'long' but the argument has type 'lldb::offset_t' (aka 'unsigned long long') [-Wformat]
m_data.GetByteSize());
^~~~~~~~~~~~~~~~~~~~
This change adds save-core functionality into the ObjectFileELF that enables
saving minidump of a stopped process. This change is mainly targeting Linux
running on x86_64 machines. Minidump should contain basic information needed
to examine state of threads, local variables and stack traces. Full support
for other platforms is not so far implemented. API tests are using LLDB's
MinidumpParser.
This relands commit aafa05e, reverted in 1f986f6.
Failed tests were fixed.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D108233
This change adds save-core functionality into the ObjectFileELF that enables
saving minidump of a stopped process. This change is mainly targeting Linux
running on x86_64 machines. Minidump should contain basic information needed
to examine state of threads, local variables and stack traces. Full support
for other platforms is not so far implemented. API tests are using LLDB's
MinidumpParser.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D108233