Reapply "[NFC][DebugInfo][DWARF] Create new low-level dwarf library (#…
(#145959)
This reapplies cbf781f0bdf2f680abbe784faedeefd6f84c246e, with fixes for
the shared-library build and the unconventional sanitizer-runtime build.
Original Description:
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
This is the culmination of a series of changes described in [1].
Although somewhat large by line count, it is almost entirely mechanical,
creating a new library in DebugInfo/DWARF/LowLevel. This new library has
very minimal dependencies, allowing it to be used from more places than
the normal DebugInfo/DWARF library--in particular from MC.
I am happy to put it in another location, or to structure it differently
if that makes sense. Some have suggested in BinaryFormat, but it is not
a great fit there. But if that makes more sense to the reviewers, I can
do that.
Another possibility would be to use pass-through headers to allow
clients who don't care to depend only on DebugInfo/DWARF. This would be
a much less invasive change, and perhaps easier for clients. But also a
system that hides details.
Either way, I'm open.
1.
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665/2
Some of the functions in `#functions` may have several inlined
instances, but also an out-of-line definition.
Therefore, for complex enough DWARF input, `#functions` - `#inlined
functions` would not give us the number of out-of-line function
definitions.
`llvm-dwarfdump`, however, already keeps track of those; print it as
part of the statistics, as this number is useful in certain scenarios.
This patch adds a new set of statistics to llvm-dwarfdump that provide
additional information about .debug_line regarding the number of bytes
covered by the line table (and how many of those are covered by line 0
entries), and the number of entries within the table and how many of
those are is_stmt, unique, or unique and non-line-0 (where "uniqueness"
is based on file, line, and column only).
Collectively these give a little more insight into the state of debug
line information, rather than variables (as most of the dwarfdump
statistics are currently oriented towards). I've added all of the stats
that were useful to some degree, but I think the most generally useful
stat is "unique line entries", since it gives the most straightforward
indication of regressions, i.e. when the number goes down it means that
fewer source lines are reachable in the program.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
70 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.
We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.
Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:
llvm/DebugInfo/DWARF/DWARFContext.h no longer includes:
- "llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h"
- "llvm/DebugInfo/DWARF/DWARFCompileUnit.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAranges.h"
- "llvm/DebugInfo/DWARF/DWARFDebugFrame.h"
- "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
- "llvm/DebugInfo/DWARF/DWARFDebugMacro.h"
- "llvm/DebugInfo/DWARF/DWARFGdbIndex.h"
- "llvm/DebugInfo/DWARF/DWARFSection.h"
- "llvm/DebugInfo/DWARF/DWARFTypeUnit.h"
- "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"
Plus llvm/Support/Errc.h not included by a bunch of llvm/DebugInfo/DWARF/DWARF*.h files
Preprocessed lines to build llvm on my setup:
after: 1065629059
before: 1066621848
Which is a great diff!
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119723
With link-time optimizations enabled, resulting DWARF mayend up containing
cross CU references (through the DW_AT_abstract_origin attribute).
Consider the following example:
// sum.c
__attribute__((always_inline)) int sum(int a, int b)
{
return a + b;
}
// main.c
extern int sum(int, int);
int main()
{
int a = 5, b = 10, c = sum(a, b);
return 0;
}
Compiled as follows:
$ clang -g -flto -fuse-ld=lld main.c sum.c -o main
Results in the following DWARF:
-- sum.c CU: abstract instance tree
...
0x000000b0: DW_TAG_subprogram
DW_AT_name ("sum")
DW_AT_decl_file ("sum.c")
DW_AT_decl_line (1)
DW_AT_prototyped (true)
DW_AT_type (0x000000d3 "int")
DW_AT_external (true)
DW_AT_inline (DW_INL_inlined)
0x000000bc: DW_TAG_formal_parameter
DW_AT_name ("a")
DW_AT_decl_file ("sum.c")
DW_AT_decl_line (1)
DW_AT_type (0x000000d3 "int")
0x000000c7: DW_TAG_formal_parameter
DW_AT_name ("b")
DW_AT_decl_file ("sum.c")
DW_AT_decl_line (1)
DW_AT_type (0x000000d3 "int")
...
-- main.c CU: concrete inlined instance tree
...
0x0000006d: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x00000000000000b0 "sum")
DW_AT_low_pc (0x00000000002016ef)
DW_AT_high_pc (0x00000000002016f1)
DW_AT_call_file ("main.c")
DW_AT_call_line (5)
DW_AT_call_column (0x19)
0x00000081: DW_TAG_formal_parameter
DW_AT_location (DW_OP_reg0 RAX)
DW_AT_abstract_origin (0x00000000000000bc "a")
0x00000088: DW_TAG_formal_parameter
DW_AT_location (DW_OP_reg2 RCX)
DW_AT_abstract_origin (0x00000000000000c7 "b")
...
Note that each entry within the concrete inlined instance tree in
the main.c CU has a DW_AT_abstract_origin attribute which
refers to a corresponding entry within the abstract instance
tree in the sum.c CU.
llvm-dwarfdump --statistics did not properly report
DW_TAG_formal_parameters/DW_TAG_variables from concrete inlined
instance trees which had 0% location coverage and which
referred to a different CU, mainly because information about abstract
instance trees and their parameters/variables was stored
locally - just for the currently processed CU,
rather than globally - for all CUs.
In particular, if the concrete inlined instance tree from
the example above was to look like this
(i.e. parameter b has 0% location coverage, hence why it's missing):
0x0000006d: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x00000000000000b0 "sum")
DW_AT_low_pc (0x00000000002016ef)
DW_AT_high_pc (0x00000000002016f1)
DW_AT_call_file ("main.c")
DW_AT_call_line (5)
DW_AT_call_column (0x19)
0x00000081: DW_TAG_formal_parameter
DW_AT_location (DW_OP_reg0 RAX)
DW_AT_abstract_origin (0x00000000000000bc "a")
llvm-dwarfdump --statistics would have not reported b as such.
Patch by Dimitrije Milosevic.
Differential revision: https://reviews.llvm.org/D113465
3d1d8c767be5537eb5510ee0522e2f3475fe7c04 made
DWARFExpression::iterator's Operation member `mutable`. After a few prep
commits, the iterator can instead be made a `const` iterator since no
caller can change the Operation.
Differential Revision: https://reviews.llvm.org/D113958
There are cases where a concrete DIE with DW_TAG_subprogram can have
abstract_origin attribute, so we handle that situation as well.
Differential Revision: https://reviews.llvm.org/D101025
This is preparation for the https://reviews.llvm.org/D101025.
The D101025 will start calculating var locstats for concrete fns
that refere to an abstract origin as well.
Initial (D96045) patch didn't handle split dwarf cases,
so this fixes that bug.
In addition, before applying this patch, we had a slowdown
that happened after the D96045. With this patch,
the slowdown will be fixed as well.
Differential Revision: https://reviews.llvm.org/D100951
Small optimization of the code -- No need to calculate any stats
for NULL nodes, and also no need to call the collectStatsForDie()
if it is the CU itself.
Differential Revision: https://reviews.llvm.org/D96871
The presence or absence of an inline variable (as well as formal
parameter) with only an abstract_origin ref (without DW_AT_location)
should not change the location coverage.
It means, for both:
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x0000004e "f")
DW_AT_low_pc (0x0000000000000010)
DW_AT_high_pc (0x0000000000000013)
DW_TAG_formal_parameter
DW_AT_abstract_origin (0x0000005a "b")
and,
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x0000004e "f")
DW_AT_low_pc (0x0000000000000010)
DW_AT_high_pc (0x0000000000000013)
we should report 0% location coverage. If we add DW_AT_location,
for both cases the coverage should be improved.
Differential Revision: https://reviews.llvm.org/D96045
Fixes PR46575.
Bump statistics version to 6.
Without this patch, for a variable described with a location list the stat
'sum_all_variables(#bytes in parent scope covered by DW_AT_location)' is
calculated by summing all bytes covered by the location ranges in the list and
capping the result to the number of bytes in the parent scope. With the patch,
only bytes which overlap with the parent DIE scope address ranges contribute to
the stat. A new stat 'sum_all_variables(#bytes in any scope covered by
DW_AT_location)' has been added which displays the total bytes covered when
ignoring scopes.
so that the user does not have to pipe the output to `jq` or `python -m json.tool`.
This change makes testing more convenient because `-NEXT` patterns can be used.
The "prettify by default" is a good tradeoff to make. The output size increases a bit.
Differential Revision: https://reviews.llvm.org/D86318
Then it is trivial to make the output indented (the second parameter of
json::OStream::OStream specifies the indentation).
Reviewed By: jhenderson, echristo
Differential Revision: https://reviews.llvm.org/D86045
With a fix to uninitialized EndOffset.
DW_OP_call_ref is the only operation that has an operand which depends
on the DWARF format. The patch fixes handling that operation in DWARF64
units.
Differential Revision: https://reviews.llvm.org/D79501
DW_OP_call_ref is the only operation that has an operand which depends
on the DWARF format. The patch fixes handling that operation in DWARF64
units.
Differential Revision: https://reviews.llvm.org/D79501
This addresses:
-Clean up the source code
-Refactor the JSON fields
-Fix the test cases
-Improve the docs for the stats output
Differential Revision: https://reviews.llvm.org/D77789
This patch moves interface declarations into llvm-dwarfdump.h and wrap
declarations in anonymous namespaces as appropriate. At the same time,
the externals are moved into the `llvm::dwarfdump` namespace`.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D77848
Add an option to llvm-dwarfdump to calculate the bytes within
the debug sections. Dump this numbers when using --statistics
option as well.
This is an initial patch (e.g. we should support other units,
since we only support 'bytes' now).
Differential Revision: https://reviews.llvm.org/D74205
According to the Twine.h comment, the Twines should only
be used as const references in arguments.
Differential Revision: https://reviews.llvm.org/D75727
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
It isn't known how many times we've seen the same variable or member in
the global scope (unlike in functions), but there still can be some duplicates
among different CUs.
So, this patch proposes to count variables in the global scope just as a sum of
the number of vars, constant members and artificial entities.
Reviewed by: aprantl
Differential Revision: https://reviews.llvm.org/D73004
A few DW_TAG_formal_parameter's of the same function may have the same
name (e.g. variadic (template) functions) or don't have a name at all
(if the parameter isn't used inside the function body), but we still
need to be able to distinguish between them to get correct number of 'total vars'
and 'availability' metric.
Reviewed by: aprantl
Differential Revision: https://reviews.llvm.org/D73003
Here may be more than one out-of-line instance of the same function
among different CUs. All of them should be accounted for to get an accurate
total number of variables/parameters.
Reviewed by: aprantl
Differential Revision: https://reviews.llvm.org/D73002
DW_TAG_subroutine_type is not really useful for statistics purposes, as it never
has location information. But it may contain DW_TAG_formal_parameter
children that generate number of parameters w/o location and decrease
'availability' metric significantly.
Reviewed by: djtodoro
Differential Revision: https://reviews.llvm.org/D72983
Different variables and functions might have the same name in different CU.
To calculate 'Availability' metric more accurate (i.e. to avoid getting
availability above 100%), we need to have some additional logic to
distinguish between them.
The patch introduces a DIE identifier that consists of a function/variable name
and declaration information: a filename and a line number. This allows
distinguishing different functions/variables (different means declared in
different files/lines) with the same name, keeping duplicates counted
as duplicates.
Reviewed by: aprantl, djtodoro
Differential Revision: https://reviews.llvm.org/D72797
The Version was used only to determine the size of an operand of
DW_OP_call_ref. The size was 4 for all versions apart from 2, but
the DW_OP_call_ref operation was introduced only in DWARF3. Thus,
the code may be simplified and using of Version may be eliminated.
Differential Revision: https://reviews.llvm.org/D73264
Summary:
This is a follow up for D70548.
Currently, variables with debug info coverage between 0% and 1% are put into
zero-bucket. D70548 changed the way statistics calculate a variable's coverage:
we began to use enclosing scope rather than a possible variable life range.
Thus more variables might be moved to zero-bucket despite they have some debug
info coverage.
The patch is to distinguish between a variable that has location info but
it's significantly less than its enclosing scope and a variable that doesn't
have it at all.
Reviewers: djtodoro, aprantl, dblaikie, avl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71070
Summary:
This changes the representation of 'coverage buckets' in llvm-dwarfdump and
llvm-locstats to one that makes more clear what the buckets contain.
See some related details in D71070.
Reviewers: djtodoro, aprantl, cmtice, jhenderson
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71366
Summary:
The patch removes OffsetToFirstDefinition in the 'scope bytes total'
statistic computation. Thus it unifies the way the scope and the coverage
buckets are computed. The rationals behind that are the following:
1. OffsetToFirstDefinition was used to calculate the variable's life range.
However, there is no simple way to do it accurately, so the scope calculated
this way might be misleading. See D69027 for more details on the subject.
2. Both 'scope bytes total' and coverage buckets seem to be intended
to represent the same data in different ways. Otherwise, the statistics
might be controversial and confusing.
Note that the approach gives up a thorough evaluation of debug information
completeness (i.e. coverage buckets by themselves doesn't tell how good
the debug information is). Only changes in coverage over time make
a 'physical' sense.
Reviewers: djtodoro, aprantl, vsk, dblaikie, avl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70548
Summary:
This patch removes manual location list handling in the statistics code
and replaces it with the new DWARFDie api, which provides access to a
"cooked" location list. This has the following effects:
- the code now properly handles split-dwarf location lists
- it will automatically support dwarf5 location lists once support for
those is added
- it properly handles location lists with base address selection entries
- it fixes a bug where the location list code was using the first
DW_AT_ranges range as a "base address" of the compile unit (it should
have used DW_AT_low_pc instead. The effect of this was that the
computation of the start address of a variable in its scope was broken
for these kinds of compile units. This only manifested itself on
linked files, since in object files the first DW_AT_ranges range
normally starts at 0.
Since pretty much every kind of location list was broken in some way,
it's hard to verify that the new implementation is correct -- the output
will be different in all non-trivial cases, and mostly with good reason.
Most of the existing statistics tests continue to pass though, and a
visual inspection of the statistics for non-trivial inputs shows that
the data is more "reasonable" now. I have updated the "dwo statistics"
test to include the new numbers, as the previous ones were completely
bogus, and I have added a targeted test for the "base address" bug.
Reviewers: dblaikie, cmtice, vsk
Subscribers: aprantl, SouraVX, JDevlieghere, djtodoro, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70444
Summary:
This adds a visitLocationList function to the DWARF v4 location lists,
similar to what already exists for DWARF v5. It follows the approach
outlined in previous patches (D69672), where the parsed form is always
stored in the DWARF v5 format, which makes it easier for generic code to
be built on top of that. v4 location lists are "upgraded" during
parsing, and then this upgrade is undone while dumping.
Both "inline" and section-based dumping is rewritten to reuse the
existing "generic" location list dumper. This means that the output
format is consistent for all location lists (the only thing one needs to
implement is the function which prints the "raw" form of a location
list), and that debug_loc dumping correctly processes base address
selection entries, etc.
The previous existing debug_loc functionality (e.g.,
parseOneLocationList) is rewritten on top of the new API, but it is not
removed as there is still code which uses them. This will be done in
follow-up patches, after I build the API to access the "interpreted"
location lists in a generic way (as that is what those users really
want).
Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69847