Hexagon instructions are VLIW "bundles" of up to four instruction words
encoded as a single MCInst with operands for each sub-instruction.
Previously, the disassembler's getInstruction() returned the full
bundle, which made it difficult to work with llvm-objdump.
For example, since all instructions are bundles, and bundles do not
branch, branch targets could not be printed.
This patch modifies the Hexagon disassembler to return individual
sub-instructions instead of entire bundles, enabling correct printing of
branch targets and relocations. It also introduces
`MCDisassembler::getInstructionBundle` for cases where the full bundle
is still needed.
By default, llvm-objdump separates instructions with newlines. However,
this does not work well for Hexagon syntax:
{ inst1
inst2
inst3
inst4 <branch> } :endloop0
Instructions may be followed by a closing brace, a closing brace with
`:endloop`, or a newline. Branches must appear within the braces.
To address this, `PrettyPrinter::getInstructionSeparator()` is added and
overridden for Hexagon.
(cherry picked from commit ac7ceb3dabfac548caa993e7b77bbadc78af4464)
Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#…
(#145959)
This reapplies cbf781f0bdf2f680abbe784faedeefd6f84c246e, with fixes for
the shared-library build and the unconventional sanitizer-runtime build.
Original Description:
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
I am happy to put it in another location, or to structure it differently
if that makes sense. Some have suggested in BinaryFormat, but it is not
a great fit there. But if that makes more sense to the reviewers, I can
do that.
Another possibility would be to use pass-through headers to allow
clients who don't care to depend only on DebugInfo/DWARF. This would be
a much less invasive change, and perhaps easier for clients. But also a
system that hides details.
Either way, I'm open.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
Similar to the existing implementations for X86 and PPC, support
symbolizing branch targets for AArch64. Do not omit the address for ADRP
as the target is typically not at an intended location.
Pull Request: https://github.com/llvm/llvm-project/pull/145009
(Revised version of a previous, unreviewed, PR.)
Move all expression verification into its only client: DWARFVerifier.
Move all printing code (which was a mix of static and member functions)
into a separate class.
This is one in a series of refactoring PRs to separate dwarf
functionality into lower-level pieces usable without object files and
sections at build time. The code is already written this way via various
"if (section == nullptr)" and similar conditionals. So the functionality
itself is needed and exists, but only as a runtime feature. The goal of
these refactors is to remove the build-time dependencies, which will
allow the existing functionality to be used from lower-level parts of
the compiler. Particularly from lib/MC/.... More information at:
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Utilize the new extensions to the LLVM Offloading API to extend to
llvm-objdump to handle dumping fatbin offload bundles generated by HIP.
This extension to llvm-objdump adds the option --offload-fatbin.
Specifying this option will take the input object/executable and extract
all offload fatbin bundle entries into distinct code object files with
names reflecting the source file name combined with the Bundle Entry ID.
Users can also use the --arch-name option to filter offload fatbin
bundle entries by their target triple.
---------
Co-authored-by: dsalinas <dsalinas@MKM-L1-DSALINAS.amd.com>
llvm-objdump currently calls MCDisassembler::getInstruction with
unadjusted address and MCInstPrinter::printInst with adjusted address.
The decoded branch targets will be adjusted as expected for most targets
(as the getInstruction address is insignificant) but not for SystemZ
(where the getInstruction address is displayed).
Specify an adjust address to fix SystemZInstPrinter output.
The added test utilizes llvm/utils/update_test_body.py to make updates
easier and additionally checks that we don't adjust SHN_ABS symbol
addresses.
Pull Request: https://github.com/llvm/llvm-project/pull/140471
Utilize the new extensions to the LLVM Offloading API to extend to
llvm-objdump to handle dumping fatbin offload bundles generated by HIP.
This extension to llvm-objdump adds the option --offload-fatbin.
Specifying this option will take the input object/executable and extract
all offload fatbin bundle entries into distinct code object files with
names reflecting the source file name combined with the Bundle Entry ID.
Users can also use the --arch-name option to filter offload fatbin
bundle entries by their target triple.
---------
Co-authored-by: dsalinas <dsalinas@MKM-L1-DSALINAS.amd.com>
--adjust-vma adjusts the current section address. The address printed
for inline relocs is relative to the current section address instead of
the section that the referenced symbol resides in.
Fix https://github.com/llvm/llvm-project/issues/75444
Utilize the new extensions to the LLVM Offloading API to extend to
llvm-objdump to handle dumping fatbin offload bundles generated by HIP.
This extension to llvm-objdump adds the option --offload-fatbin.
Specifying this option will take the input object/executable and extract
all offload fatbin bundle entries into distinct code object files with
names reflecting the source file name combined with the Bundle Entry ID.
Users can also use the --arch-name option to filter offload fatbin
bundle entries by their target triple.
---------
Co-authored-by: dsalinas <dsalinas@MKM-L1-DSALINAS.amd.com>
This implements arm, armeb, thumb, thumbeb PLT entries parsing support
in ELF for llvm-objdump.
Implementation is similar to AArch64MCInstrAnalysis::findPltEntries. PLT
entry signatures are based on LLD code for PLT generation
(ARM::writePlt).
llvm-objdump tests are produced from lld/test/ELF/arm-plt-reloc.s,
lld/test/ELF/armv8-thumb-plt-reloc.s.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
Currently, `DIContext::getLineInfoForAddress` and
`DIContext::getLineInfoForDataAddress` returns empty DILineInfo when the
debug info is missing for the given address. This is not differentiable
with the case when debug info is found for the given address but the
debug info is default value (filename:linenum is <invalid>:0).
This change wraps the return types of `DIContext::getLineInfoForAddress`
and `DIContext::getLineInfoForDataAddress` with `std::optional`.
and fix crash when vd_aux is invalid (#86611).
vd_version, vd_flags, vd_ndx, and vd_cnt in Elf{32,64}_Verdef are
16-bit. Change VerDef to use uint16_t instead.
vda_name specifies a NUL-terminated string. Update getVersionDefinitions
to remove some `.c_str()`.
Pull Request: https://github.com/llvm/llvm-project/pull/128434
The dynamic string table used by the dynamic section is referenced by
the sh_link field of that section, so we should use that directly,
rather than going via the dynamic symbol table.
More info:
https://github.com/llvm/llvm-project/pull/125679#discussion_r1961333454
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Now that we have a dedicated abstraction for string tables, switch the
option parser library's string table over to it rather than using a raw
`const char*`. Also try to use the `StringTable::Offset` type rather
than a raw `unsigned` where we can to avoid accidental increments or
other issues.
This is based on review feedback for the initial switch of options to a
string table. Happy to tweak or adjust if desired here.
Adds support to objdump and readobj for reading the `UOP_Epilog` entries
of Windows x64 unwind v2.
`UOP_Epilog` has a weird format:
The first `UOP_Epilog` in the unwind data is the "header":
* The least-significant bit of `OpInfo` is the "At End" flag, which
signifies that there is an epilog at the very end of the associated
function.
* `CodeOffset` is the length each epilog described by the current unwind
information (all epilogs have the same length).
Any subsequent `UOP_Epilog` represents another epilog for the current
function, where `OpInfo` and `CodeOffset` are combined to a 12-bit value
which is the offset of the beginning of the epilog from the end of the
current function. If the offset is 0, then this entry is actually
padding and can be ignored.
Apologies for the large change, I looked for ways to break this up and
all of the ones I saw added real complexity. This change focuses on the
option's prefixed names and the array of prefixes. These are present in
every option and the dominant source of dynamic relocations for PIE or
PIC users of LLVM and Clang tooling. In some cases, 100s or 1000s of
them for the Clang driver which has a huge number of options.
This PR addresses this by building a string table and a prefixes table
that can be referenced with indices rather than pointers that require
dynamic relocations. This removes almost 7k dynmaic relocations from the
`clang` binary, roughly 8% of the remaining dynmaic relocations outside
of vtables. For busy-boxing use cases where many different option tables
are linked into the same binary, the savings add up a bit more.
The string table is a straightforward mechanism, but the prefixes
required some subtlety. They are encoded in a Pascal-string fashion with
a size followed by a sequence of offsets. This works relatively well for
the small realistic prefixes arrays in use.
Lots of code has to change in order to land this though: both all the
option library code has to be updated to use the string table and
prefixes table, and all the users of the options library have to be
updated to correctly instantiate the objects.
Some follow-up patches in the works to provide an abstraction for this
style of code, and to start using the same technique for some of the
other strings here now that the infrastructure is in place.
Swap `!DisassembleZeroes` and `if (DumpARMELFData)` conditions so that
in the false DisassembleZeroes case (default), `...` will be printed for
long consecutive zeroes, even when a data mapping symbol is active.
This is especially useful for certain lld tests that insert a huge
padding within a code section. Without `...` the output will be huge.
Pull Request: https://github.com/llvm/llvm-project/pull/109553
In a mach_header, the cpusubtype is a 32-bit field, but it's split in 2
subfields:
- the low 24 bits containing the cpu subtype proper, (e.g.,
CPU_SUBTYPE_ARM64E 2)
- the high 8 bits containing a capability field used for additional
feature flags.
Notably, it's only the subtype subfield that participates in fat file
slice discrimination: the caps are ignored.
arm64e uses the caps subfield to encode a ptrauth ABI version:
- 0x80 (CPU_SUBTYPE_PTRAUTH_ABI) denotes a versioned binary
- 0x40 denotes a kernel-ABI binary
- 0x00-0x0F holds the ptrauth ABI version
This teaches the basic obj tools to decode that (or ignore it when
unneeded).
It also teaches the MachO writer to default to emitting versioned
binaries, but with a version of 0 (and without the kernel ABI flag).
Modern arm64e requires versioned binaries: a binary with 0x00 caps in
cpusubtype is now rejected by the linker and everything after. We can
live without the sophistication of specifying the version and kernel ABI
for now.
Co-authored-by: Francis Visoiu Mistrih <francisvm@apple.com>
The section field has been repurposed for some STAB symbol types, and if
we blindly look it up we'll produce an error and terminate. Logic
already existed
Existing stabs test had a section that was in range. Unfortunately I
don't know of an easy way to produce stabs entries in LLVM (I thought
they died in the 90s until this came up) so I just binary-edited it to
cause a failure on existing llvm-objdump.
Enable local labels computation for BPF disassembly when
`--symbolize-operands` option is specified.
This relies on `MCInstrAnalysis::evaluateBranch()` method, which is
already defined in `BPFMCInstrAnalysis::evaluateBranch`.
After this change the assembly code below:
if r1 > 42 goto +1
r1 -= 10
...
Would be printed as:
if r1 > 42 goto +1 <L0>
r1 -= 10
<L0>:
...
(when `--symbolize-operands` option is set).
See https://reviews.llvm.org/D84191 for the main part of the
`--symbolize-operands` implementation logic.
Extract the llvm-readelf decoder to `decodeCrel` (#91280) and reuse it
for llvm-objdump.
Because the section representation of LLVMObject (`SectionRef`) is
64-bit, insufficient to hold all decoder states, `section_rel_begin` is
modified to decode CREL eagerly and hold the decoded relocations inside
ELFObjectFile<ELFT>.
The test is adapted from llvm/test/tools/llvm-readobj/ELF/crel.test.
Pull Request: https://github.com/llvm/llvm-project/pull/97382
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
70 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".