11 Commits

Author SHA1 Message Date
Pavel Labath
17491d9130
[lldb] Remove data_offset arg from GetModuleSpecifications (#188978)
- it is always passed as zero
- a lot of plugins aren't using it correctly
- the data extractor class already has the capability to look at a
subset of bytes
2026-03-30 08:39:52 +02:00
Pavel Labath
08c94c0ac3
[lldb] Clear up GetModuleSpecifications return value confusion (#188276)
Some plugins were returning the number of specifications they have
added, while others were returning the total final number. Particularly
devious plugins (Minidump) were clearing the specification list
altogether. This resulted in nondeterministic failures (depending on
plugin ininitialization order) in TestSBModule.

This PR defines the problem away by having each plugin only return the
specifications it is responsible for. If the caller wants to merge them,
it is free to do so. This *might* be slighly less efficient, but this is
hardly hot code.

I'm not touching the ObjectFile::GetModuleSpecifications function (the
caller of all these functions) as the PR is big enough, although the
same approach might be warranted there as well.

Fixes https://github.com/llvm/llvm-project/issues/178625.
2026-03-25 15:55:04 +01:00
Jason Molenda
2aa020f49b
[lldb][NFC] Module, ModuleSpec, GetSectionData use DataExtractorSP (#178347)
In a PR last month I changed the ObjectFile CreateInstance etc methods
to accept an optional DataExtractorSP instead of a DataBufferSP, and
retain the extractor in a shared pointer internally in all of the
ObjectFile subclasses. This is laying the groundwork for using a
VirtualDataExtractor for some Mach-O binaries on macOS, where the
segments of the binary are out-of-order in actual memory, and we add a
lookup table to make it appear that the TEXT segment is at offset 0 in
the Extractor, etc. Working on the actual implementation, I realized we
were still using DataBufferSP's in ModuleSpec and Module, as well as in
ObjectFile::GetModuleSpecifications.

I originally was making a much larger NFC change where I had all
ObjectFile subclasses operating on DataExtractors throughout their
implementation, as well as in the DWARF parser. It was a very large
patchset. Many subclasses start with their DataExtractor, then create
smaller DataExtractors for parts of the binary image - the string table,
the symbol table, etc., for processing.

After consideration and discussion with Jonas, we agreed that a
segment/section of a binary will never require a lookup table to access
the bytes within it, so I changed
VirtualDataExtractor::GetSubsetExtractorSP to (1) require that the
Subset be contained within a single lookup table entry, and (2) return a
simple DataExtractor bounded on that byte range. By doing this, I was
able to remove all of my very-invasive changes to the ObjectFile
subclass internals; it's only when they are operating on the entire
binary image that care is needed.

One pattern that subclasses like ObjectFileBreakpad use is to take an
ArrayRef of the DataBuffer for a binary, then create a StringRef of
that, then look for strings in it. With a VirtualDataExtractor and
out-of-order binary segments, with gaps between them, this allows us to
search the entire buffer looking for a string, and segfault when it gets
to an unmapped region of the buffer. I added a
VirtualDataExtractor::GetSubsetExtractorSP(0) which gets the largest
contiguous memory region starting at offset 0 for this use case, and I
added a comment about what was being done there because I know it is not
obvious, and people not working on macOS wouldn't be familiar with the
requirement. (when we have a ModuleSpec with a DataExtractor, any of the
ObjectFile subclasses get a shot at Creating, so they all have to be
able to iterate on these)

rdar://148939795
2026-01-29 15:36:40 -08:00
Jason Molenda
e4c83b7b11
[lldb][NFC] Change ObjectFile argument type (#171574)
The ObjectFile plugin interface accepts an optional DataBufferSP
argument. If the caller has the contents of the binary, it can provide
this in that DataBufferSP. The ObjectFile subclasses in their
CreateInstance methods will fill in the DataBufferSP with the actual
binary contents if it is not set.
ObjectFile base class creates an ivar DataExtractor from the
DataBufferSP passed in.

My next patch will be a caller that creates a VirtualDataExtractor with
the binary data, and needs to pass that in to the ObjectFile plugin,
instead of the bag-of-bytes DataBufferSP. It builds on the previous
patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP
so I could pass in a subclass in the shared ptr. And it will be using
the VirtualDataExtractor that Jonas added in
https://github.com/llvm/llvm-project/pull/168802

No behavior is changed by the patch; we're simply moving the creation of
the DataExtractor to the caller, instead of a DataBuffer that is
immediately used to set up the ObjectFile DataExtractor. The patch is a
bit complicated because all of the ObjectFile subclasses have to
initialize their DataExtractor to pass in to the base class.

I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it
under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked
lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the
SME, that explains why the testsuite passes)

rdar://148939795

---------

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
2025-12-11 10:08:56 -08:00
Jason Molenda
69511ae804
[lldb] Unwind through ARM Cortex-M exceptions automatically (#153922)
When a processor faults/is interrupted/gets an exception, it will stop
running code and jump to an exception catcher routine. Most processors
will store the pc that was executing in a system register, and the
catcher functions have special instructions to retrieve that & possibly
other registers. It may then save those values to stack, and the author
can add .cfi directives to tell lldb's unwinder where to find those
saved values.

ARM Cortex-M (microcontroller) processors have a simpler mechanism where
a fixed set of registers are saved to the stack on an exception, and a
unique value is put in the link register to indicate to the caller that
this has taken place. No special handling needs to be written into the
exception catcher, unless it wants to inspect these preserved values.
And it is possible for a general stack walker to walk the stack with no
special knowledge about what the catch function does.

This patch adds an Architecture plugin method to allow an Architecture
to override/augment the UnwindPlan that lldb would use for a stack
frame, given the contents of the return address register. It resembles a
feature where the LanguageRuntime can replace/augment the unwind plan
for a function, but it is doing it at offset by one level. The
LanguageRuntime is looking at the local register context and/or symbol
name to decide if it will override the unwind rules. For the Cortex-M
exception unwinds, we need to modify THIS frame's unwind plan if the
CALLER's LR had a specific value. RegisterContextUnwind has to retrieve
the caller's LR value before it has completely decided on the UnwindPlan
it will use for THIS stack frame.

This does mean that we will need one additional read of stack memory
than we currently do when unwinding, on Armv7 Cortex-M targets. The
unwinder walks the stack lazily, as stack frames are requested, and so
now if you ask for 2 stack frames, we will read enough stack to walk 2
frames, plus we will read one extra word of memory, the spilled LR value
from the stack. In practice, with 512-byte memory cache reads, this is
unlikely to be a real performance hit.

This PR includes a test with a yaml corefile description and a JSON
ObjectFile, incorporating all of the necessary stack memory and symbol
names from a real debug session I worked on. The architectural default
unwind plans are used for all stack frames except the 0th because
there's no instructions for the functions, and no unwind info. I may
need to add an encoding of unwind fules to ObjectFileJSON in the future
as we create more test cases like this.

This PR depends on the yaml2macho-core utility from
https://github.com/llvm/llvm-project/pull/153911 to run its API test.

rdar://110663219
2025-09-09 14:11:39 -07:00
Pavel Labath
2c4f67794b
[lldb/cmake] Implicitly pass arguments to llvm_add_library (#142583)
If we're not touching them, we don't need to do anything special to pass
them along -- with one important caveat: due to how cmake arguments
work, the implicitly passed arguments need to be specified before
arguments that we handle.

This isn't particularly nice, but the alternative is enumerating all
arguments that can be used by llvm_add_library and the macros it calls
(it also relies on implicit passing of some arguments to
llvm_process_sources).
2025-06-04 11:33:37 +02:00
Greg Clayton
8ac359ba0d
Add complete ObjectFileJSON support for sections. (#129916)
Sections now support specifying:
- user IDs
- file offset/size
- alignment
- flags
- bool values for fake, encrypted and thread specific sections
2025-03-07 15:34:27 -08:00
Greg Clayton
27901cec0e
Add subsection and permissions support to ObjectFileJSON. (#129801)
This patch adds the ability to create subsections in a section and
allows permissions to be specified.
2025-03-04 16:19:20 -08:00
Greg Clayton
7b596ce362
[lldb] Fix ObjectFileJSON to section addresses. (#129648)
ObjectFileJSON sections didn't work, they were set to zero all of the
time. Fixed the bug and fixed the test to ensure it was testing real
values.
2025-03-04 14:35:42 -08:00
Jonas Devlieghere
0e5cdbf07e
[lldb] Make ObjectFileJSON loadable as a module
This patch adds support for creating modules from JSON object files.
This is necessary for the crashlog use case where we don't have either a
module or a symbol file. In that case the ObjectFileJSON serves as both.

The patch adds support for an object file type (i.e. executable, shared
library, etc). It also adds the ability to specify sections, which is
necessary in order specify symbols by address. Finally, this patch
improves error handling and fixes a bug where we wouldn't read more than
the initial 512 bytes in GetModuleSpecifications.

Differential revision: https://reviews.llvm.org/D148062
2023-04-13 14:08:19 -07:00
Jonas Devlieghere
cf3524a574
[lldb] Introduce new SymbolFileJSON and ObjectFileJSON
Introduce a new object and symbol file format with the goal of mapping
addresses to symbol names. I'd like to think of is as an extremely
simple textual symtab. The file format consists of a triple, a UUID and
a list of symbols. JSON is used for the encoding, but that's mostly an
implementation detail. The goal of the format was to be simple and human
readable.

The new file format is motivated by two use cases:

 - Stripped binaries: when a binary is stripped, you lose the ability to
   do thing like setting symbolic breakpoints. You can keep the
   unstripped binary around, but if all you need is the stripped
   symbols then that's a lot of overhead. Instead, we could save the
   stripped symbols to a file and load them in the debugger when
   needed. I want to extend llvm-strip to have a mode where it emits
   this new file format.

 - Interactive crashlogs: with interactive crashlogs, if we don't have
   the binary or the dSYM for a particular module, we currently show an
   unnamed symbol for those frames. This is a regression compared to the
   textual format, that has these frames pre-symbolicated. Given that
   this information is available in the JSON crashlog, we need a way to
   tell LLDB about it. With the new symbol file format, we can easily
   synthesize a symbol file for each of those modules and load them to
   symbolicate those frames.

Here's an example of the file format:

 {
     "triple": "arm64-apple-macosx13.0.0",
     "uuid": "36D0CCE7-8ED2-3CA3-96B0-48C1764DA908",
     "symbols": [
         {
             "name": "main",
             "type": "code",
             "size": 32,
             "address": 4294983568
         },
         {
             "name": "foo",
             "type": "code",
             "size": 8,
             "address": 4294983560
         }
     ]
 }

Differential revision: https://reviews.llvm.org/D145180
2023-03-08 20:56:11 -08:00