In a PR last month I changed the ObjectFile CreateInstance etc methods
to accept an optional DataExtractorSP instead of a DataBufferSP, and
retain the extractor in a shared pointer internally in all of the
ObjectFile subclasses. This is laying the groundwork for using a
VirtualDataExtractor for some Mach-O binaries on macOS, where the
segments of the binary are out-of-order in actual memory, and we add a
lookup table to make it appear that the TEXT segment is at offset 0 in
the Extractor, etc. Working on the actual implementation, I realized we
were still using DataBufferSP's in ModuleSpec and Module, as well as in
ObjectFile::GetModuleSpecifications.
I originally was making a much larger NFC change where I had all
ObjectFile subclasses operating on DataExtractors throughout their
implementation, as well as in the DWARF parser. It was a very large
patchset. Many subclasses start with their DataExtractor, then create
smaller DataExtractors for parts of the binary image - the string table,
the symbol table, etc., for processing.
After consideration and discussion with Jonas, we agreed that a
segment/section of a binary will never require a lookup table to access
the bytes within it, so I changed
VirtualDataExtractor::GetSubsetExtractorSP to (1) require that the
Subset be contained within a single lookup table entry, and (2) return a
simple DataExtractor bounded on that byte range. By doing this, I was
able to remove all of my very-invasive changes to the ObjectFile
subclass internals; it's only when they are operating on the entire
binary image that care is needed.
One pattern that subclasses like ObjectFileBreakpad use is to take an
ArrayRef of the DataBuffer for a binary, then create a StringRef of
that, then look for strings in it. With a VirtualDataExtractor and
out-of-order binary segments, with gaps between them, this allows us to
search the entire buffer looking for a string, and segfault when it gets
to an unmapped region of the buffer. I added a
VirtualDataExtractor::GetSubsetExtractorSP(0) which gets the largest
contiguous memory region starting at offset 0 for this use case, and I
added a comment about what was being done there because I know it is not
obvious, and people not working on macOS wouldn't be familiar with the
requirement. (when we have a ModuleSpec with a DataExtractor, any of the
ObjectFile subclasses get a shot at Creating, so they all have to be
able to iterate on these)
rdar://148939795
The ObjectFile plugin interface accepts an optional DataBufferSP
argument. If the caller has the contents of the binary, it can provide
this in that DataBufferSP. The ObjectFile subclasses in their
CreateInstance methods will fill in the DataBufferSP with the actual
binary contents if it is not set.
ObjectFile base class creates an ivar DataExtractor from the
DataBufferSP passed in.
My next patch will be a caller that creates a VirtualDataExtractor with
the binary data, and needs to pass that in to the ObjectFile plugin,
instead of the bag-of-bytes DataBufferSP. It builds on the previous
patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP
so I could pass in a subclass in the shared ptr. And it will be using
the VirtualDataExtractor that Jonas added in
https://github.com/llvm/llvm-project/pull/168802
No behavior is changed by the patch; we're simply moving the creation of
the DataExtractor to the caller, instead of a DataBuffer that is
immediately used to set up the ObjectFile DataExtractor. The patch is a
bit complicated because all of the ObjectFile subclasses have to
initialize their DataExtractor to pass in to the base class.
I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it
under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked
lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the
SME, that explains why the testsuite passes)
rdar://148939795
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
ObjectFile has an m_data DataExtractor ivar which may be default
constructed initially, or initialized with a DataBuffer passed in to its
ctor. If the DataExtractor does not get a DataBuffer source passed in,
the subclass will initialize it with access to the object file's data.
When a DataBuffer is passed in to the base class ctor, the DataExtractor
only has its buffer initialized; ObjectFile doesn't yet know the address
size and endianness to fully initialize the DataExtractor.
This patch changes ObjectFile to instead have a DataExtractorSP ivar
which is always initialized with at least a default-constructed
DataExtractor object in the base class ctor. The next patch I will be
writing is to change the ObjectFile ctor to take an optional
DataExtractorSP, so the caller can pass a DataExtractor subclass -- the
VirtualizeDataExtractor being added via
https://github.com/llvm/llvm-project/pull/168802
instead of a DataBuffer which is trivially saved into the DataExtractor.
The change is otherwise mechanical; all `m_data.` changed to
`m_data_sp->` and all the places where `m_data` was passed in for a
by-ref call were changed to `*m_data_sp.get()`. The shared pointer is
always initialized to contain an object.
I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for
getting the Linux testsuite to run on SME-only systems David). All of
the ObjectFile subclasses I modifed compile cleanly, but I haven't
tested them beyond any unit tests they may have (prob breakpad).
rdar://148939795
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.
For more context, see https://github.com/llvm/llvm-project/pull/163117.
If we're not touching them, we don't need to do anything special to pass
them along -- with one important caveat: due to how cmake arguments
work, the implicitly passed arguments need to be specified before
arguments that we handle.
This isn't particularly nice, but the alternative is enumerating all
arguments that can be used by llvm_add_library and the macros it calls
(it also relies on implicit passing of some arguments to
llvm_process_sources).
Add MSP430 to the list of available targets, implement MSP430 ABI, add support for debugging targets with 16-bit address size.
The update is intended for use with MSPDebug, a GDB server implementation for MSP430.
Reviewed By: bulbazord, DavidSpickett
Differential Revision: https://reviews.llvm.org/D146965
Add MSP430 to the list of available targets, implement MSP430 ABI, add support for debugging targets with 16-bit address size.
The update is intended for use with MSPDebug, a GDB server implementation for MSP430.
Reviewed By: bulbazord, DavidSpickett
Differential Revision: https://reviews.llvm.org/D146965
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
component into a new LLVM Component called "TargetParser". This
potentially enables using tablegen to maintain this information, as
is shown in https://reviews.llvm.org/D137517. This cannot currently
be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
information in the TargetParser:
- `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
the current Host machine for info about it, primarily to support
getting the host triple, but also for `-mcpu=native` support in e.g.
Clang. This is fairly tightly intertwined with the information in
`X86TargetParser.h`, so keeping them in the same component makes
sense.
- `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
the target triple parser and representation. This is very intertwined
with the Arm target parser, because the arm architecture version
appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.
And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM
Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.
If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.
Differential Revision: https://reviews.llvm.org/D137838
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Previously, depending on how you constructed a UUID from data or a
StringRef, an input value of all zeros was valid (e.g. setFromData)
or not (e.g. setFromOptionalData). Since there was no way to tell
which interpretation to use, it was done somewhat inconsistently.
This standardizes the meaning of a UUID of all zeros to Not Valid,
and removes all the Optional methods and their uses, as well as the
static factories that supported them.
Differential Revision: https://reviews.llvm.org/D132191
The current design allows that the object file contents could be mapped
by one object file plugin and then used by another. Presumably the idea
here was to avoid mapping the same file twice.
This becomes an issue when one object file plugin wants to map the file
differently from the others. For example, ObjectFileELF needs to map its
memory as writable while others likeObjectFileMachO needs it to be
mapped read-only.
This patch prevents plugins from changing the buffer by passing them is
by value rather than by reference.
Differential revision: https://reviews.llvm.org/D122944
Symbol table parsing has evolved over the years and many plug-ins contained duplicate code in the ObjectFile::GetSymtab() that used to be pure virtual. With this change, the "Symbtab *ObjectFile::GetSymtab()" is no longer virtual and will end up calling a new "void ObjectFile::ParseSymtab(Symtab &symtab)" pure virtual function to actually do the parsing. This helps centralize the code for parsing the symbol table and allows the ObjectFile base class to do all of the common work, like taking the necessary locks and creating the symbol table object itself. Plug-ins now just need to parse when they are asked to parse as the ParseSymtab function will only get called once.
This is a retry of the original patch https://reviews.llvm.org/D113965 which was reverted. There was a deadlock in the Manual DWARF indexing code during symbol preloading where the module was asked on the main thread to preload its symbols, and this would in turn cause the DWARF manual indexing to use a thread pool to index all of the compile units, and if there were relocations on the debug information sections, these threads could ask the ObjectFile to load section contents, which could cause a call to ObjectFileELF::RelocateSection() which would ask for the symbol table from the module and it would deadlock. We can't lock the module in ObjectFile::GetSymtab(), so the solution I am using is to use a llvm::once_flag to create the symbol table object once and then lock the Symtab object. Since all APIs on the symbol table use this lock, this will prevent anyone from using the symbol table before it is parsed and finalized and will avoid the deadlock I mentioned. ObjectFileELF::GetSymtab() was never locking the module lock before and would put off creating the symbol table until somewhere inside ObjectFileELF::GetSymtab(). Now we create it one time inside of the ObjectFile::GetSymtab() and immediately lock it which should be safe enough. This avoids the deadlocks and still provides safety.
Differential Revision: https://reviews.llvm.org/D114288
This reverts commit 951b107eedab1829f18049443f03339dbb0db165.
Buildbots were failing, there is a deadlock in /Users/gclayton/Documents/src/llvm/clean/llvm-project/lldb/test/Shell/SymbolFile/DWARF/DW_AT_range-DW_FORM_sec_offset.s when ELF files try to relocate things.
Symbol table parsing has evolved over the years and many plug-ins contained duplicate code in the ObjectFile::GetSymtab() that used to be pure virtual. With this change, the "Symbtab *ObjectFile::GetSymtab()" is no longer virtual and will end up calling a new "void ObjectFile::ParseSymtab(Symtab &symtab)" pure virtual function to actually do the parsing. This helps centralize the code for parsing the symbol table and allows the ObjectFile base class to do all of the common work, like taking the necessary locks and creating the symbol table object itself. Plug-ins now just need to parse when they are asked to parse as the ParseSymtab function will only get called once.
Differential Revision: https://reviews.llvm.org/D113965
Teach LLDB to understand INLINE and INLINE_ORIGIN records in breakpad.
They have the following formats:
```
INLINE inline_nest_level call_site_line call_site_file_num origin_num [address size]+
INLINE_ORIGIN origin_num name
```
`INLNIE_ORIGIN` is simply a string pool for INLINE so that we won't have
duplicated names for inlined functions and can show up anywhere in the symbol
file.
`INLINE` follows immediately after `FUNC` represents the ranges of momery
address that has functions inlined inside the function.
Differential Revision: https://reviews.llvm.org/D113330
This patch deals with ObjectFile, ObjectContainer and OperatingSystem
plugins. I'll convert the other types in separate patches.
In order to enable piecemeal conversion, I am leaving some ConstStrings
in the lowest PluginManager layers. I'll convert those as the last step.
Differential Revision: https://reviews.llvm.org/D112061
There is no reason why this function should be returning a ConstString.
While modifying these files, I also fixed several instances where
GetPluginName and GetPluginNameStatic were returning different strings.
I am not changing the return type of GetPluginNameStatic in this patch, as that
would necessitate additional changes, and this patch is big enough as it is.
Differential Revision: https://reviews.llvm.org/D111877
In all these years, we haven't found a use for this function (it has
zero callers). Lets just remove the boilerplate.
Differential Revision: https://reviews.llvm.org/D109600
SBTarget::AddModule currently handles the UUID parameter in a very
weird way: UUIDs with more than 16 bytes are trimmed to 16 bytes. On
the other hand, shorter-than-16-bytes UUIDs are completely ignored. In
this patch, we change the parsing code to handle UUIDs of arbitrary
size.
To support arbitrary size UUIDs in SBTarget::AddModule, this patch
changes UUID::SetFromStringRef to parse UUIDs of arbitrary length. We
subtly change the semantics of SetFromStringRef - SetFromStringRef now
only succeeds if the entire input is consumed to prevent some
prefix-parsing confusion. This is up for discussion, but I believe
this is more consistent - we always return false for invalid UUIDs
rather than sometimes truncating to a valid prefix. Also, all the
call-sites except the API and interpreter seem to expect to consume
the entire input.
This also adds tests for adding existing modules 4-, 16-, and 20-byte
build-ids. Finally, we took the liberty of testing the minidump
scenario we care about - removing placeholder module from minidump and
replacing it with the real module.
Reviewed By: labath, friss
Differential Revision: https://reviews.llvm.org/D80755
LLDB has a few different styles of header guards and they're not very
consistent because things get moved around or copy/pasted. This patch
unifies the header guards across LLDB and converts everything to match
LLVM's style.
Differential revision: https://reviews.llvm.org/D74743
This is a step towards making the initialize and terminate calls be
generated by CMake, which in turn is towards making it possible to
disable plugins at configuration time.
Differential revision: https://reviews.llvm.org/D74245
Summary:
A *.cpp file header in LLDB (and in LLDB) should like this:
```
//===-- TestUtilities.cpp -------------------------------------------------===//
```
However in LLDB most of our source files have arbitrary changes to this format and
these changes are spreading through LLDB as folks usually just use the existing
source files as templates for their new files (most notably the unnecessary
editor language indicator `-*- C++ -*-` is spreading and in every review
someone is pointing out that this is wrong, resulting in people pointing out that this
is done in the same way in other files).
This patch removes most of these inconsistencies including the editor language indicators,
all the different missing/additional '-' characters, files that center the file name, missing
trailing `===//` (mostly caused by clang-format breaking the line).
Reviewers: aprantl, espindola, jfb, shafik, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: dexonsmith, wuzish, emaste, sdardis, nemanjai, kbarton, MaskRay, atanasyan, arphaman, jfb, abidh, jsji, JDevlieghere, usaxena95, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D73258
Summary: The fields that aren't useful for us right now are simply ignored.
Reviewers: amccarth, markmentovai
Subscribers: rnk, lldb-commits
Differential Revision: https://reviews.llvm.org/D66633
llvm-svn: 369892
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368933
Summary:
The contents of the gnu_debuglink section were passed through the
GetDebugSymbolFilePaths interface, which was more generic than needed.
As the only class implementing this function is ObjectFileELF, we can
modify the function to return just a single FileSpec (instead of a
list). Also, since the SymbolVendorELF already assumes ELF object files,
we don't have to make this method available on the generic ObjectFile
interface -- instead we can put it on ObjectFileELF directly.
This change also makes is so that if the Module has an explicit symbol
file spec set, we disregard the value the value of the debug link
(instead of doing a secondary lookup using that). I think it makes sense
to honor the users wishes if he had explicitly set the symbol file spec,
and this seems to be consistent with what SymbolVendorMacOSX is doing
(SymbolVendorMacOSX.cpp:125).
The main reason for making these changes is to make the treatment of
build-ids and debug links simpler in the follow-up patch.
Reviewers: clayborg, jankratochvil, mgorny, espindola
Subscribers: emaste, arichardson, MaskRay, lldb-commits
Differential Revision: https://reviews.llvm.org/D65560
llvm-svn: 367824
Summary:
On the heels of D62934, this patch uses the same approach to introduce
llvm RTTI support to the ObjectFile hierarchy. It also replaces the
existing uses of GetPluginName doing run-time type checks with
llvm::dyn_cast and friends.
This formally introduces new dependencies from some other plugins to
ObjectFile plugins. However, I believe this is fine because:
- these dependencies were already kind of there, and the only reason
we could get away with not modeling them explicitly was because the
code was relying on magically knowing what will GetPluginName() return
for a particular kind of object files.
- the dependencies themselves are logical (it makes sense for
SymbolVendorELF to depend on ObjectFileELF), or at least don't
actively get in the way (the JitLoaderGDB->MachO thing).
- they don't introduce any new dependency loops as ObjectFile plugins
don't depend on any other plugins
Reviewers: xiaobai, JDevlieghere, espindola
Subscribers: emaste, mgorny, arichardson, MaskRay, lldb-commits
Differential Revision: https://reviews.llvm.org/D65450
llvm-svn: 367413
D59433 and D60501 changed the way UUIDs are computed from minidump
files. This was done to synchronize the U(G)UID representation with the
native tools of given platforms, but it created a mismatch between
minidumps and breakpad files.
This updates the breakpad algorithm to match the one found in minidumps,
and also adds a couple of tests which should fail if these two ever get
out of sync. Incidentally, this means that the module id in the breakpad
files is almost identical to our notion of UUIDs, so the computation
algorithm can be somewhat simplified.
llvm-svn: 358500
A lot of comments in LLDB are surrounded by an ASCII line to delimit the
begging and end of the comment.
Its use is not really consistent across the code base, sometimes the
lines are longer, sometimes they are shorter and sometimes they are
omitted. Furthermore, it looks kind of weird with the 80 column limit,
where the comment actually extends past the line, but not by much.
Furthermore, when /// is used for Doxygen comments, it looks
particularly odd. And when // is used, it incorrectly gives the
impression that it's actually a Doxygen comment.
I assume these lines were added to improve distinguishing between
comments and code. However, given that todays editors and IDEs do a
great job at highlighting comments, I think it's worth to drop this for
the sake of consistency. The alternative is fixing all the
inconsistencies, which would create a lot more churn.
Differential revision: https://reviews.llvm.org/D60508
llvm-svn: 358135
Summary:
This patch adds support for parsing STACK CFI records from breakpad
files. The expressions specifying the values of registers are not
parsed.The idea is that these will be handed off to the postfix
expression -> dwarf compiler, once it is extracted from the internals of
the NativePDB plugin.
Reviewers: clayborg, amccarth, markmentovai
Subscribers: aprantl, lldb-commits
Differential Revision: https://reviews.llvm.org/D60268
llvm-svn: 357975
Previously we would classify all STACK records into a single bucket.
This is not really helpful, because there are three distinct types of
records beginning with the token "STACK" (STACK CFI INIT, STACK CFI,
STACK WIN). To be consistent with how we're treating other records, we
should classify these as three different record types.
It also implements the logic to put "STACK CFI INIT" and "STACK CFI"
records into the same "section" of the breakpad file, as they are meant
to be read together (similar to how FUNC and LINE records are treated).
The code which performs actual parsing of these records will come in a
separate patch.
llvm-svn: 357691
The `ap` suffix is a remnant of lldb's former use of auto pointers,
before they got deprecated. Although all their uses were replaced by
unique pointers, some variables still carried the suffix.
In r353795 I removed another auto_ptr remnant, namely redundant calls to
::get for unique_pointers. Jim justly noted that this is a good
opportunity to clean up the variable names as well.
I went over all the changes to ensure my find-and-replace didn't have
any undesired side-effects. I hope I didn't miss any, but if you end up
at this commit doing a git blame on a weirdly named variable, please
know that the change was unintentional.
llvm-svn: 353912
instead of returning the UUID through by-ref argument and a boolean
value indicating success, we can just return it directly. Since the UUID
class already has an invalid state, it can be used to denote the failure
without the additional bool.
llvm-svn: 353714
The two records aren't used by anything yet, but this part can be
separated out easily, so I am comitting it separately to simplify
reviews of the followup patch.
llvm-svn: 352507
Summary:
This addresses the issues raised in D56844. It removes the accessors from the
breakpad record structures by making the fields public. Also, I refactor the
UUID parsing code to remove hard-coded constants.
Reviewers: lemo
Subscribers: clayborg, lldb-commits
Differential Revision: https://reviews.llvm.org/D57037
llvm-svn: 352021
This patch extends SymbolFileBreakpad::AddSymbols to include the symbols
from the FUNC records too. These symbols come from the debug info and
have a size associated with them, so they are given preference in case
there is a PUBLIC record for the same address.
To achieve this, I first pre-process the symbols into a temporary
DenseMap, and then insert the uniqued symbols into the module's symtab.
Reviewers: clayborg, lemo, zturner
Reviewed By: clayborg
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D56590
llvm-svn: 351781
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This centralizes parsing of breakpad records, which was previously
spread out over ObjectFileBreakpad and SymbolFileBreakpad.
For each record type X there is a separate breakpad::XRecord class, and
an associated parse function. The classes just store the information in
the breakpad records in a more accessible form. It is up to the users to
determine what to do with that data.
This separation also made it possible to write some targeted tests for
the parsing code, which was previously unaccessible, so I write a couple
of those too.
Reviewers: clayborg, lemo, zturner
Reviewed By: clayborg
Subscribers: mgorny, fedor.sergeev, lldb-commits
Differential Revision: https://reviews.llvm.org/D56844
llvm-svn: 351541
Summary:
This patch allows ObjectFileBreakpad to parse the contents of Breakpad
files into sections. This sounds slightly odd at first, but in essence
its not too different from how other object files handle things. For
example in elf files, the symtab section consists of a number of
"records", where each record represents a single symbol. The same is
true for breakpad's PUBLIC section, except in this case, the records will be
textual instead of binary.
To keep sections contiguous, I create a new section every time record
type changes. Normally, the breakpad processor will group all records of
the same type in one block, but the format allows them to be intermixed,
so in general, the "object file" may contain multiple sections with the
same record type.
Reviewers: clayborg, zturner, lemo, markmentovai, amccarth
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D55434
llvm-svn: 350511