33 Commits

Author SHA1 Message Date
Kevin Frei
3bdc4c702d
Gsymutil aggregation similar to DwarfDump --verify (#81154)
GsymUtil, like DwarfDump --verify, spews a *lot* of data necessary to
understand/diagnose issues with DWARF data. The trouble is that the kind
of information necessary to make the messages useful also makes them
nearly impossible to easily categorize. I put together a similar output
categorizer (https://github.com/llvm/llvm-project/pull/79648) that will
emit a summary of issues identified at the bottom of the (very verbose)
output, enabling easier tracking of issues as they arise or are
addressed.

There's a single output change, where a message "warning: Unable to
retrieve DWO .debug_info section for some object files. (Remove the
--quiet flag for full output)" was being dumped the first time it was
encountered (in what looks like an attempt to make something easily
grep-able), but rather than keep the output in the same order, that
message is now a 'category' so gets emitted at the end of the output.
The test 'tools/llvm-gsymutil/X86/elf-dwo.yaml' was changed to reflect
this difference.
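
A minimal sketch of the aggregation idea, assuming a hypothetical IssueSummary class (this is not the class added by the patch): detailed messages are still printed as they occur, while a per-category count is kept and emitted once at the end.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical aggregator, for illustration only. It is not the
// implementation used by llvm-gsymutil.
class IssueSummary {
  std::map<std::string, unsigned> Counts; // category -> number of reports

public:
  // Print the detailed message immediately and count it under its category.
  void report(const std::string &Category, const std::string &Detail,
              std::ostream &OS) {
    assert(!Category.empty() && "every issue needs a category");
    OS << Detail << '\n';
    ++Counts[Category];
  }

  // Emit one line per category, sorted by category name, after all of the
  // verbose output has been written.
  void summarize(std::ostream &OS) const {
    for (const auto &Entry : Counts)
      OS << Entry.first << ": " << Entry.second << '\n';
  }
};
```

Keeping the counts in a std::map gives a stable, sorted summary, which makes it easier to diff the summary between runs.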

---------

Co-authored-by: Kevin Frei <freik@meta.com>
2024-02-12 16:57:02 -08:00
kusmour
59c9a48d5e
[llvm-gsymutil] Fix assert failure on FileEntry.Dir empty (#79926)
Summary:
FileEntry.Dir can be empty if the debug info only contains a relative path.
This caused an assertion failure when gsym segmentation tried to
copy a file entry with an empty dir. As the first entry of StringTable is
always empty (and is preserved), `StringOffsetMap` doesn't have key 0.
Hence, `find(0)` returns `End` and `operator->()` fails the assertion.

Test Plan:
./bin/llvm-lit -sv llvm/test/tools/llvm-gsymutil/X86/elf-empty-dir.yaml
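
A minimal sketch of the failure mode and one way to guard against it. The helper name remapStringOffset and the use of std::unordered_map are assumptions for illustration; the actual fix in the patch may be shaped differently.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical remapping helper. Offset 0 is the preserved empty string, so
// it is never stored in the map and must be handled before the lookup.
uint32_t remapStringOffset(
    const std::unordered_map<uint32_t, uint32_t> &OffsetMap,
    uint32_t OldOffset) {
  if (OldOffset == 0)
    return 0; // the empty string keeps offset 0 in every string table
  auto It = OffsetMap.find(OldOffset);
  // Without the early return above, a lookup of 0 would reach this point
  // with It == end() and dereferencing it would be invalid.
  assert(It != OffsetMap.end() && "non-empty string must have been copied");
  return It->second;
}
```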
2024-01-29 20:41:02 -05:00
Kazu Hirata
d7b18d5083 Use llvm::endianness{,::little,::native} (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
llvm::support::endianness with llvm::endianness.
2023-10-09 00:54:47 -07:00
Greg Clayton
a3afff9fd5 Remove some noisy log messages from showing up in llvm-gsymutil output.
This patch removes two log messages that were causing noisy output:
- when we have a zero sized symbol that gets removed in favor of something with a size or with debug info
- when an inlined function's address range has the same high and low pc, don't emit an error message as this is a common technique to indicate a function has been stripped or is no longer present.

Differential Revision: https://reviews.llvm.org/D156834
2023-08-09 14:13:03 -07:00
Greg Clayton
1d9c7c4161 Increase performance of llvm-gsymutil by up to 200%.
llvm-gsymutil was maintaining an address ranges collection behind a mutex and having the multi-threaded code access this and hold the mutex was causing slowdown when converting DWARF to GSYM. This patch does the following:
- removes the "Ranges" variable from the GsymCreator and any functions and places that used it
- clients don't try to detect if a function has already been added for an address range. We now remove any inferior copies of information in the GsymCreator::finalize() routine, as was done before; we just have more items to remove, though performance is greater due to less mutex locking
- after I started adding all of the inferior function info objects, the previous patch that tried to remove inferior debug info turned out to have bugs in it, so I replaced the removeIfBinary() function in GsymCreator with a more efficient and easier to debug approach, which copies items from GsymCreator::Funcs into a new vector of FunctionInfo objects and then replaces GsymCreator::Funcs at the end.
- Sorting of FunctionInfo objects has been modified to also compare InlineInfo objects. We found cases where LTO was ruining inline function address ranges and we ended up with a variety of FunctionInfo objects for the same range that had varying amounts of valid debug info. This patch now ensures that when two function info objects have different inline info for the same function address range, the best one is picked to ensure the greatest fidelity.
- If we detect that a DW_TAG_subprogram has inline functions and after parsing it, we don't end up with any valid inline information, we set the optional to std::nullopt to avoid emitting empty inline information and wasting space.
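
The "copy the best entries into a new vector" step described above can be sketched as follows. FuncEntry and its Quality score are hypothetical stand-ins for gsym::FunctionInfo and its comparison of line table and inline info; the real ordering in GsymCreator is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for gsym::FunctionInfo. Quality is an invented
// score where a higher value means more complete debug info.
struct FuncEntry {
  uint64_t Start;
  uint64_t End;
  int Quality;
};

// Sort so that entries for the same range are adjacent with the best one
// last, then copy one entry per range into a fresh vector.
std::vector<FuncEntry> keepBestPerRange(std::vector<FuncEntry> Funcs) {
  std::sort(Funcs.begin(), Funcs.end(),
            [](const FuncEntry &A, const FuncEntry &B) {
              return std::tie(A.Start, A.End, A.Quality) <
                     std::tie(B.Start, B.End, B.Quality);
            });
  std::vector<FuncEntry> Result;
  Result.reserve(Funcs.size());
  for (const FuncEntry &F : Funcs) {
    if (!Result.empty() && Result.back().Start == F.Start &&
        Result.back().End == F.End)
      Result.back() = F; // same range: the later entry sorts as better
    else
      Result.push_back(F);
  }
  assert(Result.size() <= Funcs.size());
  return Result;
}
```

Building a new vector avoids repeated erasure from the middle of the original one, and no lock is needed because this runs once during finalization in this sketch.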

My tests show a 200% performance increase on M1 Macs and a 100% performance increase on Linux machines for the same complex large DWARF input binary.

Differential Revision: https://reviews.llvm.org/D156773
2023-08-01 13:48:06 -07:00
Kazu Hirata
0fa1ee8f1c [DebugInfo] Fix a warning
This patch fixes:

  llvm/lib/DebugInfo/GSYM/GsymCreator.cpp:117:18: error: unused
  variable 'MaxAddressOffset' [-Werror,-Wunused-variable]
2023-03-06 19:28:35 -08:00
Greg Clayton
d8e077e2ca Add the ability to segment GSYM files.
Some workflows can generate large GSYM files, and sharding GSYM files into segments can help workflows that can take advantage of smaller GSYM files. This patch adds a new --segment-size option to llvm-gsymutil. This option specifies a rough size in bytes of how large each segment should be.

Segmented GSYM files contain only the strings and files that are needed for the FunctionInfo objects that are added to each shard. The output file path gets the first address of the first contained function info appended as a suffix to the filename. If a base address of an image is set in the GsymCreator, then all segments will use this same base address which allows lookups for symbolication to happen correctly when the image has been slid in memory.

Code has been added to refactor and re-use methods within the GsymCreator to allow for segments to be created easily and tested.

Example of segmenting GSYM files:

$ llvm-gsymutil --convert llvm-gsymutil.dSYM -o llvm-gsymutil.gsym --segment-size 10485760
$ ls -l llvm-gsymutil.gsym-*
-rw-r--r--  1 gclayton  staff  10485839 Feb  9 10:45 llvm-gsymutil.gsym-0x1000030c0
-rw-r--r--  1 gclayton  staff  10485765 Feb  9 10:45 llvm-gsymutil.gsym-0x100668888
-rw-r--r--  1 gclayton  staff  10485881 Feb  9 10:45 llvm-gsymutil.gsym-0x100c948b8
-rw-r--r--  1 gclayton  staff  10485954 Feb  9 10:45 llvm-gsymutil.gsym-0x101659e70
-rw-r--r--  1 gclayton  staff  10485792 Feb  9 10:45 llvm-gsymutil.gsym-0x1022b1dc0
-rw-r--r--  1 gclayton  staff  10485889 Feb  9 10:45 llvm-gsymutil.gsym-0x102a18b10
-rw-r--r--  1 gclayton  staff  10485893 Feb  9 10:45 llvm-gsymutil.gsym-0x1030b05d0
-rw-r--r--  1 gclayton  staff  10485802 Feb  9 10:45 llvm-gsymutil.gsym-0x1037caaac
-rw-r--r--  1 gclayton  staff  10485781 Feb  9 10:45 llvm-gsymutil.gsym-0x103e767a0
-rw-r--r--  1 gclayton  staff  10485832 Feb  9 10:45 llvm-gsymutil.gsym-0x10452d0d4
-rw-r--r--  1 gclayton  staff  10485782 Feb  9 10:45 llvm-gsymutil.gsym-0x104b93310
-rw-r--r--  1 gclayton  staff   6255785 Feb  9 10:45 llvm-gsymutil.gsym-0x10526bf34
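
The naming rule visible in the listing above (output path, then the first address of the first contained function as a hex suffix) can be sketched like this. segmentPath is a hypothetical helper, not a function from llvm-gsymutil.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "<base>-0x<lowercase hex address>", matching
// the file names shown in the example listing.
std::string segmentPath(const std::string &BasePath, uint64_t FirstFuncAddr) {
  char Suffix[32];
  const int N = std::snprintf(Suffix, sizeof(Suffix), "-0x%llx",
                              static_cast<unsigned long long>(FirstFuncAddr));
  assert(N > 0 && static_cast<std::size_t>(N) < sizeof(Suffix));
  (void)N; // silence unused-variable warnings when asserts are disabled
  return BasePath + Suffix;
}
```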

Differential Revision: https://reviews.llvm.org/D145448
2023-03-06 16:08:37 -08:00
Douglas Yung
a14e3c2aa7 Revert "Add the ability to segment GSYM files."
This reverts commit fe758254181a824d73ad960b651b42f671f8936b.

This change was causing several buildbot failures:
- https://lab.llvm.org/buildbot/#/builders/38/builds/10105
- https://lab.llvm.org/buildbot/#/builders/192/builds/562
- https://lab.llvm.org/buildbot/#/builders/109/builds/58893
- https://lab.llvm.org/buildbot/#/builders/16/builds/44360
- https://lab.llvm.org/buildbot/#/builders/247/builds/2095
- https://lab.llvm.org/buildbot/#/builders/196/builds/27236
- https://lab.llvm.org/buildbot/#/builders/54/builds/3714
2023-03-03 00:25:06 -08:00
Greg Clayton
fe75825418 Add the ability to segment GSYM files.
Some workflows can generate large GSYM files, and sharding GSYM files into segments can help workflows that can take advantage of smaller GSYM files. This patch adds a new --segment-size option to llvm-gsymutil. This option specifies a rough size in bytes of how large each segment should be.

Segmented GSYM files contain only the strings and files that are needed for the FunctionInfo objects that are added to each shard. The output file path gets the first address of the first contained function info appended as a suffix to the filename. If a base address of an image is set in the GsymCreator, then all segments will use this same base address which allows lookups for symbolication to happen correctly when the image has been slid in memory.

Code has been added to refactor and re-use methods within the GsymCreator to allow for segments to be created easily and tested.

Example of segmenting GSYM files:

$ llvm-gsymutil --convert llvm-gsymutil.dSYM -o llvm-gsymutil.gsym --segment-size 10485760
$ ls -l llvm-gsymutil.gsym-*
-rw-r--r--  1 gclayton  staff  10485839 Feb  9 10:45 llvm-gsymutil.gsym-0x1000030c0
-rw-r--r--  1 gclayton  staff  10485765 Feb  9 10:45 llvm-gsymutil.gsym-0x100668888
-rw-r--r--  1 gclayton  staff  10485881 Feb  9 10:45 llvm-gsymutil.gsym-0x100c948b8
-rw-r--r--  1 gclayton  staff  10485954 Feb  9 10:45 llvm-gsymutil.gsym-0x101659e70
-rw-r--r--  1 gclayton  staff  10485792 Feb  9 10:45 llvm-gsymutil.gsym-0x1022b1dc0
-rw-r--r--  1 gclayton  staff  10485889 Feb  9 10:45 llvm-gsymutil.gsym-0x102a18b10
-rw-r--r--  1 gclayton  staff  10485893 Feb  9 10:45 llvm-gsymutil.gsym-0x1030b05d0
-rw-r--r--  1 gclayton  staff  10485802 Feb  9 10:45 llvm-gsymutil.gsym-0x1037caaac
-rw-r--r--  1 gclayton  staff  10485781 Feb  9 10:45 llvm-gsymutil.gsym-0x103e767a0
-rw-r--r--  1 gclayton  staff  10485832 Feb  9 10:45 llvm-gsymutil.gsym-0x10452d0d4
-rw-r--r--  1 gclayton  staff  10485782 Feb  9 10:45 llvm-gsymutil.gsym-0x104b93310
-rw-r--r--  1 gclayton  staff   6255785 Feb  9 10:45 llvm-gsymutil.gsym-0x10526bf34

Differential Revision: https://reviews.llvm.org/D143793
2023-03-02 20:40:07 -08:00
Greg Clayton
838a57e1a5 Fix a bug introduced by the move of AddressRanges.h into ADT.
The bug was introduced when the AddressRange class was no longer able to modify the End address directly and the entire range of the .text address range that contained the trailing empty symbol was replaced. There was no unit test for this, so it wasn't caught. I fixed the bug and added a unit test for it.

The effects of this bug are serious as the AddressOffsetSize in the header would be incorrectly calculated and an invalid GSYM would be created.

Differential Revision: https://reviews.llvm.org/D127811
2022-06-16 10:50:46 -07:00
Alexey Lapshin
854c33946f [llvm-gsymutil][NFC] refactor AddressRange&AddressRanges structures.
llvm-gsymutil has an implementation of AddressRange and AddressRanges
classes. That implementation might be reused in other parts of llvm.
This patch moves AddressRange and AddressRanges classes into llvm/ADT.

Differential Revision: https://reviews.llvm.org/D124350
2022-04-26 12:00:43 +03:00
Greg Clayton
ab546ead3b Fix a case where multiple symbols with zero size would cause duplicate entries in gsym files.
Symbol tables can have symbols with no size in mach-o files that were failing to get combined into a single entry. This resulted in many duplicate entries for the same address and made gsym files larger.

Differential Revision: https://reviews.llvm.org/D105068
2021-06-28 18:26:26 -07:00
Simon Giesecke
5f2d4b23b4 Add --quiet option to llvm-gsymutil to suppress output of warnings.
Differential Revision: https://reviews.llvm.org/D102829
2021-05-27 12:36:34 +00:00
Simon Giesecke
81b2fcf26f Use a non-recursive mutex in GsymCreator.
There doesn't seem to be a need to support recursive locking,
and a recursive mutex is unnecessarily inefficient.

Differential Revision: https://reviews.llvm.org/D102486
2021-05-19 10:06:47 +00:00
Simon Giesecke
4ea4d9c066 Move FunctionInfo in addFunctionInfo rather than copying.
Differential Revision: https://reviews.llvm.org/D102485
2021-05-19 10:06:47 +00:00
Simon Giesecke
f29c4c6097 Avoid calculating the string hash twice in GsymCreator::insertString.
Do the single hash calculation before acquiring the lock, to reduce
lock contention. If Copy is true, and the string was not yet contained
in the StringStorage, use the new address from StringStorage, but
reuse the hash we already calculated.
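
A minimal sketch of hashing before locking, assuming a hypothetical StringPool built on std::unordered_map rather than the LLVM containers used by GsymCreator. The key carries its precomputed hash, so the map never hashes the string again while the lock is held.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical key that stores the hash computed outside the lock.
struct HashedString {
  std::string Str;
  std::size_t Hash;
};
struct UseStoredHash {
  std::size_t operator()(const HashedString &K) const { return K.Hash; }
};
struct SameString {
  bool operator()(const HashedString &A, const HashedString &B) const {
    return A.Str == B.Str;
  }
};

// Hypothetical pool that hands out string table offsets.
class StringPool {
  std::mutex Mutex;
  std::unordered_map<HashedString, uint32_t, UseStoredHash, SameString> Offsets;
  uint32_t NextOffset = 1; // offset 0 is reserved for the empty string

public:
  uint32_t insert(const std::string &S) {
    if (S.empty())
      return 0;
    // The only hash computation happens here, before the lock is taken.
    HashedString Key{S, std::hash<std::string>{}(S)};
    std::lock_guard<std::mutex> Guard(Mutex);
    auto Result = Offsets.emplace(std::move(Key), NextOffset);
    if (Result.second) // newly inserted: reserve room for string + NUL
      NextOffset += static_cast<uint32_t>(Result.first->first.Str.size()) + 1;
    assert(Result.first->second != 0);
    return Result.first->second;
  }
};
```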

Differential Revision: https://reviews.llvm.org/D102484
2021-05-19 10:06:47 +00:00
Simon Giesecke
e102fd50f9 Reformat GSYMCreator.cpp
Differential Revision: https://reviews.llvm.org/D102483
2021-05-19 10:06:47 +00:00
Greg Clayton
e5bdacba2e Optimize GSymCreator::finalize.
The algorithm removing duplicates from the Funcs list used to have
amortized quadratic time complexity because it was potentially
removing each entry using std::vector::erase individually. This
patch is now using an erase-remove idiom with an adapted
removeIfBinary algorithm.
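
For illustration, the same idea using only the standard library: std::unique compacts the kept elements in one linear pass and a single erase trims the tail. This swaps in std::unique for the patch's removeIfBinary helper and uses plain addresses instead of FunctionInfo objects, so it is a sketch of the idiom rather than the code in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Removes adjacent duplicates from a sorted list in O(n). Erasing each
// duplicate individually with vector::erase would shift the tail every
// time, which is what made the old approach quadratic.
void dropDuplicateAddresses(std::vector<uint64_t> &SortedAddrs) {
  assert(std::is_sorted(SortedAddrs.begin(), SortedAddrs.end()));
  SortedAddrs.erase(std::unique(SortedAddrs.begin(), SortedAddrs.end()),
                    SortedAddrs.end());
}
```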

Probably this was made under the assumption that these removals are
rare, but there are cases where the case of duplicate entries is
occurring frequently. In these cases, the actual runtime was very
poor, taking hours to process a single binary of around 1 GiB size
including debug info. Another factor contributing to that is the
frequent output of the warning, which is now removed.

It seems this is particularly an issue with GCC-compiled binaries,
rather than clang-built binaries.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D102219
2021-05-12 15:18:07 -07:00
Kazu Hirata
352fcfc697 [llvm] Use llvm::sort (NFC) 2021-01-17 10:39:45 -08:00
Greg Clayton
ffe6695acf Fix buildbots with merge that didn't happen for 4050b01ba9ece02721ec496383baee219ca8cc2b. 2020-03-04 19:28:24 -08:00
Greg Clayton
4050b01ba9 Fix GSYM tests to run the yaml files and fix test failures on some machines.
YAML files were not being run during lit testing as there was no lit.local.cfg file. Once this was fixed, some buildbots would fail due to a StringRef that pointed to a std::string inside of a temporary llvm::Triple object. These issues are fixed here by making a local triple object that stays around long enough so the StringRef points to valid data. Fixed memory sanitizer bot bugs as well.
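
A minimal sketch of the lifetime issue, using std::string_view and a hypothetical FakeTriple type in place of llvm::StringRef and llvm::Triple. The pattern is the same: a view into a string owned by a temporary dangles once the temporary is destroyed, so the owner is given a named local.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for llvm::Triple: owns a string and exposes a view.
struct FakeTriple {
  std::string Arch;
  std::string_view getArchName() const { return Arch; }
};

FakeTriple makeTriple() { return FakeTriple{"x86_64"}; }

std::string archNameCopy() {
  // Unsafe (left as a comment on purpose):
  //   std::string_view Name = makeTriple().getArchName();
  // The temporary FakeTriple dies at the end of that statement, leaving
  // Name pointing at freed memory.

  // Safe: keep the owner alive for as long as the view is used.
  FakeTriple T = makeTriple();
  std::string_view Name = T.getArchName();
  assert(Name.data() == T.Arch.data());
  return std::string(Name);
}
```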

Differential Revision: https://reviews.llvm.org/D75390
2020-03-04 19:14:08 -08:00
Mitch Phillips
58079aa91b Revert "Fix GSYM tests to run the yaml files and fix test failures on some machines."
This reverts commit 8d41f1a02369537cae1a7d00c0fa717fc3aca575.

This change broke the MSan buildbots - see comments in
https://reviews.llvm.org/D75390 for more information.
2020-03-04 10:21:54 -08:00
Greg Clayton
8d41f1a023 Fix GSYM tests to run the yaml files and fix test failures on some machines.
YAML files were not being run during lit testing as there was no lit.local.cfg file. Once this was fixed, some buildbots would fail due to a StringRef that pointed to a std::string inside of a temporary llvm::Triple object. These issues are fixed here by making a local triple object that stays around long enough so the StringRef points to valid data. Also fixed an issue where strings for files in the file table could be added in opposite order due to parameters to function calls not having a strong ordering, which caused tests to fail. Added new arch specific directories so when targets are not enabled, we continue to function just fine.

Differential Revision: https://reviews.llvm.org/D75390
2020-03-02 15:40:11 -08:00
Greg Clayton
e3afe5952d Revert "Fix GSYM tests to run the yaml files and fix test failures on some machines."
This reverts commit 57688350adea307e7bccb83b68a5b7333de31fd7.

Need to conditionalize for ARM targets, this is failing on machines that don't have ARM targets.
2020-03-02 13:07:58 -08:00
Greg Clayton
57688350ad Fix GSYM tests to run the yaml files and fix test failures on some machines.
YAML files were not being run during lit testing as there was no lit.local.cfg file. Once this was fixed, some buildbots would fail due to a StringRef that pointed to a std::string inside of a temporary llvm::Triple object. These issues are fixed here by making a local triple object that stays around long enough so the StringRef points to valid data. Also fixed an issue where strings for files in the file table could be added in opposite order due to parameters to function calls not having a strong ordering, which caused tests to fail.

Differential Revision: https://reviews.llvm.org/D75390
2020-03-02 12:52:53 -08:00
Greg Clayton
2f6cc21f44 Add a llvm-gsymutil tool that can convert object files to GSYM and perform lookups.
Summary:
This patch creates the llvm-gsymutil binary that can convert object files to GSYM using the --convert <path> option. It can also dump and lookup addresses within GSYM files that have been saved to disk.

To dump a file:

llvm-gsymutil /path/to/a.gsym

To perform address lookups, like with atos, on GSYM files:

llvm-gsymutil --address 0x1000 --address 0x1100 /path/to/a.gsym

To convert a mach-o or ELF file, including any DWARF debug info contained within the object files:

llvm-gsymutil --convert /path/to/a.out --out-file /path/to/a.out.gsym

Conversion highlights:
- convert DWARF debug info in mach-o or ELF files to GSYM
- convert symbols in symbol table to GSYM and don't convert symbols that overlap with DWARF debug info
- extract UUID from object files
- extract .text (read + execute) section address ranges and filter out any DWARF or symbols that don't fall in those ranges.
- if .text sections are extracted, and if the last gsym::FunctionInfo object has no size, cap the size to the end of the section the function was contained in

Dumping GSYM files will dump all sections of the GSYM file in textual format.

Reviewers: labath, aadsm, serhiy.redko, jankratochvil, xiaobai, wallace, aprantl, JDevlieghere, jdoerfert

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74883
2020-02-25 21:11:05 -08:00
Greg Clayton
5e13e0ce4c [NFC] Move ValidTextRanges out of DwarfTransformer and into GsymCreator and unify address is not in GSYM errors so all strings match. 2020-02-15 16:48:23 -08:00
Greg Clayton
19602b7194 Add a DWARF transformer class that converts DWARF to GSYM.
Summary:
The DWARF transformer is added as a class so it can be unit tested fully.

The DWARF is converted to GSYM format and handles many special cases for functions:
- omit functions in compile units with 4 byte addresses whose address is UINT32_MAX (dead stripped)
- omit functions in compile units with 8 byte addresses whose address is UINT64_MAX (dead stripped)
- omit any functions whose high PC is <= low PC (dead stripped)
- StringTable builder doesn't copy strings, so we need to make backing copies of strings but only when needed. Many strings come from sections in object files and won't need to have backing copies, but some do.
- When a function doesn't have a mangled name, store the fully qualified name by creating a string by traversing the parent decl context DIEs and then. If we don't do this, we end up having cases where some function might appear in the GSYM as "erase" instead of "std::vector<int>::erase".
- omit any functions whose address isn't in the optional TextRanges member variable of DwarfTransformer. This allows object files to register address ranges that are known valid code ranges and can help omit functions that should have been dead stripped, but just had their low PC values set to zero. In this case we have many functions that all appear at address zero and can omit these functions by making sure they fall into good address ranges on the object file. Many compilers do this when the DWARF has a DW_AT_low_pc with a DW_FORM_addr, and a DW_AT_high_pc with a DW_FORM_data4 as the offset from the low PC. In this case the linker can't write the same address to both the high and low PC since there is only a relocation for the DW_AT_low_pc, so many linkers tend to just zero it out.
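
The address-based omission rules in the list above can be sketched as a single predicate. shouldKeepFunction and the Range alias are hypothetical names; the real DwarfTransformer works on DWARF DIEs and reports why a function was skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<uint64_t, uint64_t>; // half-open [first, second)

// Hypothetical filter mirroring the rules listed above. An empty TextRanges
// vector stands for "no text ranges were registered", in which case the
// range check is skipped.
bool shouldKeepFunction(uint64_t LowPC, uint64_t HighPC, unsigned AddrSize,
                        const std::vector<Range> &TextRanges) {
  assert(AddrSize == 4 || AddrSize == 8);
  // Dead-stripped marker: all ones at the compile unit's address size.
  const uint64_t DeadAddr = AddrSize == 4 ? UINT32_MAX : UINT64_MAX;
  if (LowPC == DeadAddr)
    return false;
  // High PC at or below low PC also indicates a dead-stripped function.
  if (HighPC <= LowPC)
    return false;
  if (TextRanges.empty())
    return true;
  // Functions zeroed out by the linker fall outside every valid code range.
  for (const Range &R : TextRanges)
    if (LowPC >= R.first && LowPC < R.second)
      return true;
  return false;
}
```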

Reviewers: aprantl, dblaikie, probinson

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74450
2020-02-13 10:48:37 -08:00
Nico Weber
cae2662104 Fix Windows build after r374381
llvm-svn: 374413
2019-10-10 18:20:16 +00:00
Reid Kleckner
f05ed6601f Remove strings.h include to fix GSYM Windows build
Fifth time's the charm.

llvm-svn: 374411
2019-10-10 18:17:24 +00:00
Greg Clayton
d665bfcf7c Fix buildbots by using memset instead of bzero.
llvm-svn: 374409
2019-10-10 18:11:49 +00:00
Greg Clayton
4c145df6a7 Unbreak llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast buildbot.
llvm-svn: 374398
2019-10-10 17:52:33 +00:00
Greg Clayton
4b6c9de868 Add GsymCreator and GsymReader.
This patch adds the ability to create GSYM files with GsymCreator, and read them with GsymReader. Full testing has been added for both new classes.

This patch differs from the original patch https://reviews.llvm.org/D53379 in that it uses a StringTableBuilder class from llvm instead of a custom version. Support for big and little endian files has been added. If the endianness matches the current host, we use efficient extraction for the header, address table and address info offset tables.
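
A minimal sketch of the host-endianness fast path, assuming hypothetical helpers hostIsLittleEndian and readU32. GsymReader itself uses LLVM's data extraction utilities, so this only illustrates the idea: when the file and host byte orders match, the bytes can be copied directly, otherwise they are assembled one at a time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Detect the host byte order by inspecting the first byte of a known value.
bool hostIsLittleEndian() {
  const uint16_t Probe = 1;
  unsigned char FirstByte;
  std::memcpy(&FirstByte, &Probe, 1);
  return FirstByte == 1;
}

// Hypothetical reader for one 32-bit field.
uint32_t readU32(const unsigned char *Data, bool FileIsLittleEndian) {
  assert(Data && "need at least four readable bytes");
  if (FileIsLittleEndian == hostIsLittleEndian()) {
    uint32_t Value;
    std::memcpy(&Value, Data, sizeof(Value)); // fast path: no byte swapping
    return Value;
  }
  // Slow path: walk the bytes from most significant to least significant.
  uint32_t Value = 0;
  for (int I = 0; I < 4; ++I) {
    const int Index = FileIsLittleEndian ? 3 - I : I;
    Value = (Value << 8) | Data[Index];
  }
  return Value;
}
```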

Differential Revision: https://reviews.llvm.org/D68744

llvm-svn: 374381
2019-10-10 17:10:11 +00:00