Intel Vtune/SEP has supported collecting LBR on Windows and generating
perf-script file which is same format as Linux perf script. This patch
teaches llvm-profgen to disassemble COFF binary so that we can do
Sampling based PGO on Windows.
GNU addr2line supports lookup by symbol name in addition to the existing
address lookup. llvm-symbolizer starting from
e144ae54dcb96838a6176fd9eef21028935ccd4f supports lookup by symbol name.
This change extends this lookup with possibility to specify optional
offset.
Now the address for which source information is searched for can be
specified with offset:
llvm-symbolize --obj=abc.so "SYMBOL func_22+0x12"
It decreases the gap in features of llvm-symbolizer and GNU addr2line.
This lookup now is supported for code only.
Migrated from: https://reviews.llvm.org/D139859
Pull request: https://github.com/llvm/llvm-project/pull/75067
Recent versions of GNU binutils starting from 2.39 support symbol+offset
lookup in addition to the usual numeric address lookup. This change adds
symbol lookup to llvm-symbolize and llvm-addr2line.
Now llvm-symbolize behaves closer to GNU addr2line, - if the value specified
as address in command line or input stream is not a number, it is treated as
a symbol name. For example:
llvm-symbolize --obj=abc.so func_22
llvm-symbolize --obj=abc.so "CODE func_22"
This lookup is now supported only for functions. Specification with
offset is not supported yet.
This is a recommit of 2b27948783e4bbc1132d3220d8517ef62607b558, reverted
in 39fec5457c0925bd39f67f63fe17391584e08258 because the test
llvm/test/Support/interrupts.test started failing on Windows. The test was
changed in 18f036d0105589c3175bb51a518c5d272dae61e2 and is also updated in
this commit.
Differential Revision: https://reviews.llvm.org/D149759
Recent versions of GNU binutils starting from 2.39 support symbol+offset
lookup in addition to the usual numeric address lookup. This change adds
symbol lookup to llvm-symbolize and llvm-addr2line.
Now llvm-symbolize behaves closer to GNU addr2line, - if the value specified
as address in command line or input stream is not a number, it is treated as
a symbol name. For example:
llvm-symbolize --obj=abc.so func_22
llvm-symbolize --obj=abc.so "CODE func_22"
This lookup is now supported only for functions. Specification with
offset is not supported yet.
Differential Revision: https://reviews.llvm.org/D149759
"BTF" is a debug information format used by LLVM's BPF backend.
The format is much smaller in scope than DWARF, the following info is
available:
- full set of C types used in the binary file;
- types for global values;
- line number / line source code information .
BTF information is embedded in ELF as .BTF and .BTF.ext sections.
Detailed format description could be found as a part of Linux Source
tree, e.g. here: [1].
This commit modifies `llvm-objdump` utility to use line number
information provided by BTF if DWARF information is not available.
E.g., the goal is to make the following to print source code lines,
interleaved with disassembly:
$ clang --target=bpf -g test.c -o test.o
$ llvm-strip --strip-debug test.o
$ llvm-objdump -Sd test.o
test.o: file format elf64-bpf
Disassembly of section .text:
<foo>:
; void foo(void) {
r1 = 0x1
; consume(1);
call -0x1
r1 = 0x2
; consume(2);
call -0x1
; }
exit
A common production use case for BPF programs is to:
- compile separate object files using clang with `-g -c` flags;
- link these files as a final "static" binary using bpftool linker ([2]).
The bpftool linker discards most of the DWARF sections
(line information sections as well) but merges .BTF and .BTF.ext sections.
Hence, having `llvm-objdump` capable to print source code using .BTF.ext
is valuable.
The commit consists of the following modifications:
- llvm/lib/DebugInfo/BTF aka `DebugInfoBTF` component is added to host
the code needed to process BTF (with assumption that BTF support
would be added to some other tools as well, e.g. `llvm-readelf`):
- `DebugInfoBTF` provides `llvm::BTFParser` class, that loads information
from `.BTF` and `.BTF.ext` sections of a given `object::ObjectFile`
instance and allows to query this information.
Currently only line number information is loaded.
- `DebugInfoBTF` also provides `llvm::BTFContext` class, which is an
implementation of `DIContext` interface, used by `llvm-objdump` to
query information about line numbers corresponding to specific
instructions.
- Structure `DILineInfo` is modified with field `LineSource`.
`DIContext` interface uses `DILineInfo` structure to communicate
line number and source code information.
Specifically, `DILineInfo::Source` field encodes full file source code,
if available. BTF only stores source code for selected lines of the
file, not a complete source file. Moreover, stored lines are not
guaranteed to be sorted in a specific order.
To avoid reconstruction of a file source code from a set of
available lines, this commit adds `LineSource` field instead.
- `Symbolize` class is modified to use `BTFContext` instead of
`DWARFContext` when DWARF sections are not available but BTF
sections are present in the object file.
(`Symbolize` is instantiated by `llvm-objdump`).
- Integration and unit tests.
Note, that DWARF has a notion of "instruction sequence".
DWARF implementation of `DIContext::getLineInfoForAddress()` provides
inexact responses if exact address information is not available but
address falls within "instruction sequence" with some known line
information (see `DWARFDebugLine::LineTable::findRowInSeq()`).
BTF does not provide instruction sequence groupings, thus
`getLineInfoForAddress()` queries only return exact matches.
This does not seem to be a big issue in practice, but output
of the `llvm-objdump -Sd` might differ slightly when BTF
is used instead of DWARF.
[1] https://www.kernel.org/doc/html/latest/bpf/btf.html
[2] https://github.com/libbpf/bpftool
Depends on https://reviews.llvm.org/D149501
Reviewed By: MaskRay, yonghong-song, nickdesaulniers, #debug-info
Differential Revision: https://reviews.llvm.org/D149058
GNU addr2line exits immediately if -e (default to a.out) specifies a file that
cannot be open or a directory. llvm-addr2line used to wait for input on if the
input file cannot be open and addresses are not specified in command line.
Replace the D147652 checkFileExists with getOrCreateModuleInfo to avoid
a separate `sys::fs::status` operation.
Reviewed By: sepavloff
Differential Revision: https://reviews.llvm.org/D153595
GNU addr2line exits immediately if it cannot open the file specified as
executable/relocatable. In contrast llvm-addr2line does not exit and, if
addresses are not specified in command line, waits for input on stdin. This
causes the test compiler-rt/test/asan/TestCases/Posix/asan-symbolize-bad-path.cc to block
forever on Gentoo (see https://reviews.llvm.org/rG27c4777f41d2ab204c1cf84ff1cccd5ba41354da#1190273).
To fix this issue the behavior llvm-addr2line now exits if
executable/relocatable file cannot be found.
It fixes https://github.com/llvm/llvm-project/issues/42099 (llvm-addr2line
does not exit when passed a non-existent file).
Differential Revision: https://reviews.llvm.org/D147652
This should be last of the "bottom-up conversions" of various demanglers
to accept std::string_view. After this, D149104 may be revisited.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D152176
D149104 converted llvm::demangle to use std::string_view. Enabling
"expensive checks" (via -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON) causes
lld/test/wasm/why-extract.s to fail. The reason for this is obscure:
Reason #10007 why std::string_view is dangerous:
Consider the following pattern:
std::string_view s = ...;
const char *c = s.data();
std::strlen(c);
Is c a NUL-terminated C style string? It depends; but if it's not then
it's not safe to call std::strlen on the std::string_view::data().
std::string_view::length() should be used instead.
Fixing this fixes the one lone test that caught this.
microsoftDemangle, rustDemangle, and dlangDemangle should get this same
treatment, too. I will do that next.
Reviewed By: MaskRay, efriedma
Differential Revision: https://reviews.llvm.org/D149675
This is a recommit of 75f1f158812d, reverted in 7a443b1c493d, because
it caused compilation error in
compiler-rt/lib/sanitizer_common/symbolizer/sanitizer_symbolize.cpp.
The error was fixed by Kasimir Georgiev in de4c038c7ba2, but this
commit was reverted in de088dd3a0aa, because the initial commit was
reverted.
This commit reverts both the reverting commits, 7a443b1c493d and
de088dd3a0aa.
Original commit message is below.
If llvm-symbolize did not find module, the error looked like:
LLVMSymbolizer: error reading file: No such file or directory
This message does not follow common practice: LLVMSymbolizer is not an
utility name. Also the message did not not contain the name of missed file.
With this change the error message looks differently:
llvm-symbolizer: error: 'abc': No such file or directory
This format is closer to messages produced by other utilities and allow
proper coloring.
Differential Revision: https://reviews.llvm.org/D148032
If llvm-symbolize did not find module, the error looked like:
LLVMSymbolizer: error reading file: No such file or directory
This message does not follow common practice: LLVMSymbolizer is not an
utility name. Also the message did not not contain the name of missed file.
With this change the error message looks differently:
llvm-symbolizer: error: 'abc': No such file or directory
This format is closer to messages produced by other utilities and allow
proper coloring.
Differential Revision: https://reviews.llvm.org/D148032
This makes parsing for build IDs in the markup filter slightly more
permissive, in line with fromHex.
It also removes the distinction between missing build ID and empty build
ID; empty build IDs aren't a useful concept, since their purpose is to
uniquely identify a binary. This removes a layer of indirection wherever
build IDs are obtained.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D147485
On i386 Windows, after C++ names have been Itanium-mangled, the C name
mangling specific to its call convention may also be applied on top.
This change teaches symbolizer to be able to demangle this type of
mangled names.
As part of this change, `demanglePE32ExternCFunc` has also been modified
to fix unwanted stripping for vectorcall names when the demangled name
is supposed to contain a leading `_`. Notice that the vectorcall
mangling does not add either an `_` or `@` prefix. The old code always
tries to strip the prefix first, which for Itanium mangled names in
vectorcall, the leading underscore of the Itanium name gets stripped
instead and breaks the Itanium demangler.
Differential Revision: https://reviews.llvm.org/D144049
This creates a library for fetching debug info by build ID, whether
locally or remotely via debuginfod. The functionality was refactored
out of existing code in the Symboliize library. Existing utilities
were refactored to use this library.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D132504
This library implements the class `DebuginfodCollection`, which scans a set of directories for binaries, classifying them according to whether they contain debuginfo. This also provides the `DebuginfodServer`, an `HTTPServer` which serves debuginfod's `/debuginfo` and `/executable` endpoints. This is intended as the final new supporting library required for `llvm-debuginfod`.
As implemented here, `DebuginfodCollection` only finds ELF binaries and DWARF debuginfo. All other files are ignored. However, the class interface is format-agnostic. Generalizing to support other platforms will require refactoring of LLVM's object parsing libraries to eliminate use of `report_fatal_error` ([[ https://github.com/llvm/llvm-project/blob/main/llvm/lib/Object/WasmObjectFile.cpp#L74 | e.g. when reading WASM files ]]), so that the debuginfod daemon does not crash when it encounters a malformed file on the disk.
The `DebuginfodCollection` is tested by end-to-end tests of the debuginfod server (D114846).
Reviewed By: mysterymath
Differential Revision: https://reviews.llvm.org/D114845
Follow up to 1e396affca6a0d21247d960c93a415e8f6fe0301. On some standard
library configurations these have a dependency on the complete type of
SymbolizableModule.
This change adds a simple LRU cache to the Symbolize class to put a cap
on llvm-symbolizer memory usage. Previously, the Symbolizer's virtual
memory footprint would grow without bound as additional binaries were
referenced.
I'm putting this out there early for an informal review, since there may be
a dramatically different/better way to go about this. I still need to
figure out a good default constant for the memory cap and benchmark the
implementation against a large symbolization workload. Right now I've
pegged max memory usage at zero for testing purposes, which evicts the whole
cache every time.
Unfortunately, it looks like StringRefs in the returned DI objects can
directly refer to the contents of binaries. Accordingly, the cache
pruning must be explicitly requested by the caller, as the caller must
guarantee that none of the returned objects will be used afterwards.
For llvm-symbolizer this a light burden; symbolization occurs
line-by-line, and the returned objects are discarded after each.
Implementation wise, there are a number of nested caches that depend
on one another. I've implemented a simple Evictor callback system to
allow derived caches to register eviction actions to occur when the
underlying binaries are evicted.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D119784
On some standard library configurations these have a dependency on the
complete type of SymbolizableModule. They also do a lot of
copying/freeing so no point in inlining them.
This adds a --build-id=<hex build ID> flag to llvm-symbolizer. If --obj
is unspecified, this will attempt to look up the provided build ID using
whatever mechanisms are available to the Symbolizer (typically,
debuginfod). The semantics are then as if the found binary were given
using the --obj flag.
Reviewed By: jhenderson, phosek
Differential Revision: https://reviews.llvm.org/D118633
Debuginfod can pull in libcurl as a dependency, which isn't appropriate
for libLLVM. (See
https://gitlab.freedesktop.org/mesa/mesa/-/issues/5732).
This change breaks out debuginfod into a separate non-component library
that can be used directly in llvm-symbolizer. The tool can inject
debuginfod into the Symbolizer library via an abstract DebugInfoFetcher
interface, breaking the dependency of Symbolizer on debuinfod.
See https://github.com/llvm/llvm-project/issues/52731
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D118413
This change moves the SymbolizableObjectFile header to
include/llvm/DebugInfo/Symbolize. Making this header available to other
llvm libraries simplifies use cases where implicit caching, multiple
platform support and other features of the Symbolizer class are not
required. This also makes the dependent libraries easier to unit test
by having mocks which derive from SymbolizableModule.
Differential Revision: https://reviews.llvm.org/D116781
Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Updates compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh to include Debuginfod library to fix sanitizer-x86_64-linux breakage.
Reviewed By: jhenderson, vitalybuka
Differential Revision: https://reviews.llvm.org/D113717
Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Updates compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh to include Debuginfod library to fix sanitizer-x86_64-linux breakage.
Reviewed By: jhenderson, vitalybuka
Differential Revision: https://reviews.llvm.org/D113717
Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Adds new symbolizer symbols to `global_symbols.txt`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D113717
This reverts commit 02cc8d698c4941f8f0120ea1a5d7205fb33a312d because it
caused buildbot failures. The issue appears to be simply that we need to
only enable debuginfod when the HTTPClient has been initialized by the
running tool, since InitLLVM does not do the initialization step anymore.
Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D113717
Add support for demangling Rust v0 symbols to LLVM symbolizer by reusing
nonMicrosoftDemangle which supports both Itanium and Rust mangling.
Reviewed By: dblaikie, jhenderson
Part of https://reviews.llvm.org/D110664
When we build with split dwarf in single mode the .o files that contain both "normal" debug sections and dwo sections, along with relocaiton sections for "normal" debug sections.
When we create DWARF context in DWARFObjInMemory we process relocations and store them in the map for .debug_info, etc section.
For DWO Context we also do it for non dwo dwarf sections. Which I believe is not necessary. This leads to a lot of memory being wasted. We observed 70GB extra memory being used.
I went with context sensitive approach, flag is passed in. I am not sure if it's always safe not to process relocations for regular debug sections if Obj contains .dwo sections.
If it is alternatvie might be just to scan, in constructor, sections and if there are .dwo sections not to process regular debug ones.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106624
This patch intended to provide additional interface to LLVMsymbolizer
such that they work directly on object files. There is an existing
method - symbolizecode which takes an object file, this patch provides
similar overloads for symbolizeInlinedCode, symbolizeData,
symbolizeFrame. This can be useful for clients who already have a
in-memory object files to symbolize for.
Patch By: pvellien (praveen velliengiri)
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D95232
llvm-symbolizer used to use the DIA SDK for symbolization on
Windows; this patch switches to using native symbolization, which was
implemented recently.
Users can still make the symbolizer use DIA by adding the `-dia` flag
in the LLVM_SYMBOLIZER_OPTS environment variable.
Differential Revision: https://reviews.llvm.org/D91814