This was originally introduced to support kalimba DSPs featuring 24-bit
bytes by f03e6d84 and also c928de3e, but the kalimba support was mostly
removed by f8819bd5. This change removes the rest of the support, which
was far from complete.
In a PR last month I changed the ObjectFile CreateInstance etc methods
to accept an optional DataExtractorSP instead of a DataBufferSP, and
retain the extractor in a shared pointer internally in all of the
ObjectFile subclasses. This is laying the groundwork for using a
VirtualDataExtractor for some Mach-O binaries on macOS, where the
segments of the binary are out-of-order in actual memory, and we add a
lookup table to make it appear that the TEXT segment is at offset 0 in
the Extractor, etc. Working on the actual implementation, I realized we
were still using DataBufferSP's in ModuleSpec and Module, as well as in
ObjectFile::GetModuleSpecifications.
I originally was making a much larger NFC change where I had all
ObjectFile subclasses operating on DataExtractors throughout their
implementation, as well as in the DWARF parser. It was a very large
patchset. Many subclasses start with their DataExtractor, then create
smaller DataExtractors for parts of the binary image - the string table,
the symbol table, etc., for processing.
After consideration and discussion with Jonas, we agreed that a
segment/section of a binary will never require a lookup table to access
the bytes within it, so I changed
VirtualDataExtractor::GetSubsetExtractorSP to (1) require that the
Subset be contained within a single lookup table entry, and (2) return a
simple DataExtractor bounded on that byte range. By doing this, I was
able to remove all of my very-invasive changes to the ObjectFile
subclass internals; it's only when they are operating on the entire
binary image that care is needed.
One pattern that subclasses like ObjectFileBreakpad use is to take an
ArrayRef of the DataBuffer for a binary, then create a StringRef of
that, then look for strings in it. With a VirtualDataExtractor and
out-of-order binary segments, with gaps between them, this allows us to
search the entire buffer looking for a string, and segfault when it gets
to an unmapped region of the buffer. I added a
VirtualDataExtractor::GetSubsetExtractorSP(0) which gets the largest
contiguous memory region starting at offset 0 for this use case, and I
added a comment about what was being done there because I know it is not
obvious, and people not working on macOS wouldn't be familiar with the
requirement. (when we have a ModuleSpec with a DataExtractor, any of the
ObjectFile subclasses get a shot at Creating, so they all have to be
able to iterate on these)
rdar://148939795
The ObjectFile plugin interface accepts an optional DataBufferSP
argument. If the caller has the contents of the binary, it can provide
this in that DataBufferSP. The ObjectFile subclasses in their
CreateInstance methods will fill in the DataBufferSP with the actual
binary contents if it is not set.
ObjectFile base class creates an ivar DataExtractor from the
DataBufferSP passed in.
My next patch will be a caller that creates a VirtualDataExtractor with
the binary data, and needs to pass that in to the ObjectFile plugin,
instead of the bag-of-bytes DataBufferSP. It builds on the previous
patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP
so I could pass in a subclass in the shared ptr. And it will be using
the VirtualDataExtractor that Jonas added in
https://github.com/llvm/llvm-project/pull/168802
No behavior is changed by the patch; we're simply moving the creation of
the DataExtractor to the caller, instead of a DataBuffer that is
immediately used to set up the ObjectFile DataExtractor. The patch is a
bit complicated because all of the ObjectFile subclasses have to
initialize their DataExtractor to pass in to the base class.
I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it
under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked
lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the
SME, that explains why the testsuite passes)
rdar://148939795
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
LLDB can use the wasm name section to populate its symbol table and get
names for functions. However the index space used in the name section is
the "function index space" which includes imported as well as locally
defined functions.
ObjectFile has an m_data DataExtractor ivar which may be default
constructed initially, or initialized with a DataBuffer passed in to its
ctor. If the DataExtractor does not get a DataBuffer source passed in,
the subclass will initialize it with access to the object file's data.
When a DataBuffer is passed in to the base class ctor, the DataExtractor
only has its buffer initialized; ObjectFile doesn't yet know the address
size and endianness to fully initialize the DataExtractor.
This patch changes ObjectFile to instead have a DataExtractorSP ivar
which is always initialized with at least a default-constructed
DataExtractor object in the base class ctor. The next patch I will be
writing is to change the ObjectFile ctor to take an optional
DataExtractorSP, so the caller can pass a DataExtractor subclass -- the
VirtualizeDataExtractor being added via
https://github.com/llvm/llvm-project/pull/168802
instead of a DataBuffer which is trivially saved into the DataExtractor.
The change is otherwise mechanical; all `m_data.` changed to
`m_data_sp->` and all the places where `m_data` was passed in for a
by-ref call were changed to `*m_data_sp.get()`. The shared pointer is
always initialized to contain an object.
I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for
getting the Linux testsuite to run on SME-only systems David). All of
the ObjectFile subclasses I modifed compile cleanly, but I haven't
tested them beyond any unit tests they may have (prob breakpad).
rdar://148939795
My original implementation for parsing Wasm segments was wrong in two
related ways. I had a bug in calculating the file vm address and I
didn't fully understand the difference between active and passive
segments and how that impacted their file vm address.
With this PR, we now support parsing init expressions for active
segments, rather than just skipping over them. This is necessary to
determine where they get loaded.
Similar to llvm-objdump, we currently only support simple opcodes (i.e.
constants). We also currently do not support active segments that use a
non-zero memory index. However this covers all segments for a
non-trivial Swift binary compiled to Wasm.
Improve error handling in ObjectFileWasm by using helpers that wrap
their result in an llvm::Expected. The helper to read a Wasm string now
return an Expected<std::string> and I created a helper to parse 32-bit
ULEBs that returns an Expected<uint32_t>.
This is a continuation of #153494. In a WebAssembly file, the "name"
section contains names for the segments in the data section
(WASM_NAMES_DATA_SEGMENT). We already parse these as symbols, and with
this PR, we now also create sub-sections for each of the segments.
This PR adds support for parsing the data symbols from the WebAssembly
name section, which consists of a name and address range for the
segments in the Wasm data section. Unlike other object file formats,
Wasm has no symbols for referencing items within those segments (i.e.
symbols the user has defined).
Use std::numeric_limits<uint32_t>::max() for all overflow checks in
ObjectFileWasm and fix a few locations where I incorrectly used `>=`
instead of `>`.
This PR adds support for parsing the WebAssembly symbol table. The
symbol table is encoded in the "names" section and contains names and
indexes into other sections. For now we only support parsing function
(code) symbols. The result is that you can set breakpoints by symbol
name, while previously breakpoints by name required debug info (DWARF).
This is also necessary for Swift, which checks for the presence of
`swift_release` as a heuristic to determine if there's a static Swift
stdlib.
Add support for DW_OP_WASM_location in DWARFExpression. This PR rebases
#78977 and cleans up the unit test. The DWARF extensions are documented
at https://yurydelendik.github.io/webassembly-dwarf/ and supported by
LLVM-based toolchains such as Clang, Swift, Emscripten, and Rust.
Different object file formats support DWARF sections (COFF, ELF, MachO,
PE/COFF, WASM). COFF and PE/COFF only matched a subset. This caused some
GCC executables produced on MinGW to have issue later on when debugging.
One example is that `.debug_rnglists` was not matched, which caused
range-extraction to fail when printing a backtrace.
This unifies the parsing of section names in
`ObjectFile::GetDWARFSectionTypeFromName`, so all file formats can use
the same naming convention. Since the prefixes are different,
`GetDWARFSectionTypeFromName` only matches the suffixes (i.e. `.debug_`
needs to be stripped before).
I added two tests to ensure the sections are correctly identified on
Windows executables.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
There are 3 places where we were using WASM_SEC_TAG as the "last" known
section type, which requires updating (or leaves a bug) when a new known
section type is added. Instead add a "last type" to the enum for this
purpose.
Differential Revision: https://reviews.llvm.org/D127164
Currently, all data buffers are assumed to be writable. This is a
problem on macOS where it's not allowed to load unsigned binaries in
memory as writable. To be more precise, MAP_RESILIENT_CODESIGN and
MAP_RESILIENT_MEDIA need to be set for mapped (unsigned) binaries on our
platform.
Binaries are mapped through FileSystem::CreateDataBuffer which returns a
DataBufferLLVM. The latter is backed by a llvm::WritableMemoryBuffer
because every DataBuffer in LLDB is considered to be writable. In order
to use a read-only llvm::MemoryBuffer I had to split our abstraction
around it.
This patch distinguishes between a DataBuffer (read-only) and
WritableDataBuffer (read-write) and updates LLDB to use the appropriate
one.
rdar://74890607
Differential revision: https://reviews.llvm.org/D122856
The current design allows that the object file contents could be mapped
by one object file plugin and then used by another. Presumably the idea
here was to avoid mapping the same file twice.
This becomes an issue when one object file plugin wants to map the file
differently from the others. For example, ObjectFileELF needs to map its
memory as writable while others likeObjectFileMachO needs it to be
mapped read-only.
This patch prevents plugins from changing the buffer by passing them is
by value rather than by reference.
Differential revision: https://reviews.llvm.org/D122944
Most of our code was including Log.h even though that is not where the
"lldb" log channel is defined (Log.h defines the generic logging
infrastructure). This worked because Log.h included Logging.h, even
though it should.
After the recent refactor, it became impossible the two files include
each other in this direction (the opposite inclusion is needed), so this
patch removes the workaround that was put in place and cleans up all
files to include the right thing. It also renames the file to LLDBLog to
better reflect its purpose.
Symbol table parsing has evolved over the years and many plug-ins contained duplicate code in the ObjectFile::GetSymtab() that used to be pure virtual. With this change, the "Symbtab *ObjectFile::GetSymtab()" is no longer virtual and will end up calling a new "void ObjectFile::ParseSymtab(Symtab &symtab)" pure virtual function to actually do the parsing. This helps centralize the code for parsing the symbol table and allows the ObjectFile base class to do all of the common work, like taking the necessary locks and creating the symbol table object itself. Plug-ins now just need to parse when they are asked to parse as the ParseSymtab function will only get called once.
This is a retry of the original patch https://reviews.llvm.org/D113965 which was reverted. There was a deadlock in the Manual DWARF indexing code during symbol preloading where the module was asked on the main thread to preload its symbols, and this would in turn cause the DWARF manual indexing to use a thread pool to index all of the compile units, and if there were relocations on the debug information sections, these threads could ask the ObjectFile to load section contents, which could cause a call to ObjectFileELF::RelocateSection() which would ask for the symbol table from the module and it would deadlock. We can't lock the module in ObjectFile::GetSymtab(), so the solution I am using is to use a llvm::once_flag to create the symbol table object once and then lock the Symtab object. Since all APIs on the symbol table use this lock, this will prevent anyone from using the symbol table before it is parsed and finalized and will avoid the deadlock I mentioned. ObjectFileELF::GetSymtab() was never locking the module lock before and would put off creating the symbol table until somewhere inside ObjectFileELF::GetSymtab(). Now we create it one time inside of the ObjectFile::GetSymtab() and immediately lock it which should be safe enough. This avoids the deadlocks and still provides safety.
Differential Revision: https://reviews.llvm.org/D114288
This reverts commit 951b107eedab1829f18049443f03339dbb0db165.
Buildbots were failing, there is a deadlock in /Users/gclayton/Documents/src/llvm/clean/llvm-project/lldb/test/Shell/SymbolFile/DWARF/DW_AT_range-DW_FORM_sec_offset.s when ELF files try to relocate things.
Symbol table parsing has evolved over the years and many plug-ins contained duplicate code in the ObjectFile::GetSymtab() that used to be pure virtual. With this change, the "Symbtab *ObjectFile::GetSymtab()" is no longer virtual and will end up calling a new "void ObjectFile::ParseSymtab(Symtab &symtab)" pure virtual function to actually do the parsing. This helps centralize the code for parsing the symbol table and allows the ObjectFile base class to do all of the common work, like taking the necessary locks and creating the symbol table object itself. Plug-ins now just need to parse when they are asked to parse as the ParseSymtab function will only get called once.
Differential Revision: https://reviews.llvm.org/D113965
This patch deals with ObjectFile, ObjectContainer and OperatingSystem
plugins. I'll convert the other types in separate patches.
In order to enable piecemeal conversion, I am leaving some ConstStrings
in the lowest PluginManager layers. I'll convert those as the last step.
Differential Revision: https://reviews.llvm.org/D112061
This is a step towards making the initialize and terminate calls be
generated by CMake, which in turn is towards making it possible to
disable plugins at configuration time.
Differential revision: https://reviews.llvm.org/D74245
This patch has a couple of outstanding issues. The test is not python3
compatible, and it also seems to fail with python2 (at least under some
circumstances) due to an overambitious assertion.
This reverts the patch as well as subsequent fixup attempts:
014ea9337624fe20aca8892e73b6b3f741d8da9e,
f5f70d1c8fbf12249b4b9598f10a10f12d4db029.
4697e701b8cb40429818609814c7422e49b2ee07.
5c15e8e682e365b3a7fcf35200df79f3fb93b924.
3ec28da6d6430a00b46780555a87acd43fcab790.
size_t and uint64_t are spelled slightly differently on macOS, which was
causing the compiler to error out calling std::min - since the two types have
to be the same.
I fixed this by casting the uint64_t computation to a size_t. That's probably
not the cleanest solution, but it gets us back to building.
Summary:
This is the first in a series of patches to enable LLDB debugging of
WebAssembly targets.
Current versions of Clang emit (partial) DWARF debug information in WebAssembly
modules and we can leverage this debug information to give LLDB the ability to
do source-level debugging of Wasm code that runs in a WebAssembly engine.
A way to do this could be to use the remote debugging functionalities provided
by LLDB via the GDB-remote protocol. Remote debugging can indeed be useful not
only to connect a debugger to a process running on a remote machine, but also to
connect the debugger to a managed VM or script engine that runs locally,
provided that the engine implements a GDB-remote stub that offers the ability to
access the engine runtime internal state.
To make this work, the GDB-remote protocol would need to be extended with a few
Wasm-specific custom query commands, used to access aspects of the Wasm engine
state (like the Wasm memory, Wasm local and global variables, and so on).
Furthermore, the DWARF format would need to be enriched with a few Wasm-specific
extensions, here detailed: https://yurydelendik.github.io/webassembly-dwarf.
This CL introduce classes **ObjectFileWasm**, a file plugin to represent a Wasm
module loaded in a debuggee process. It knows how to parse Wasm modules and
store the Code section and the DWARF-specific sections.
Reviewers: jasonmolenda, clayborg, labath
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D71575