Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.
Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.
The other changes are supporting changes, to handle the new constructs
generated by that pass.
There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.
There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.
I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.
Thanks to @dpaoliello for extensive testing and suggestions.
(Originally posted as https://reviews.llvm.org/D157547 .)
This implements the ideas discussed in [1].
To summarize, this commit changes AsmPrinter so that it outputs
DW_IDX_parent information for debug_name entries. It will enable
debuggers to speed up queries for fully qualified types (based on a
DWARFDeclContext) significantly, as debuggers will no longer need to
parse the entire CU in order to inspect the parent chain of a DIE.
Instead, a debugger can simply take the parent DIE offset from the
accelerator table and peek at its name in the debug_info/debug_str
sections.
The implementation uses two types of DW_FORM for the DW_IDX_parent
attribute:
1. DW_FORM_ref4, which points to the accelerator table entry for the
parent.
2. DW_FORM_flag_present, when the entry has a parent that is not in the
table (that is, the parent doesn't have a name, or isn't allowed to be
in the table as per the DWARF spec). This is space-efficient, since it
takes 0 bytes.
The implementation works by:
1. Changing how abbreviations are encoded (so that they encode which
form, if
any, was used to encode IDX_Parent)
2. Creating an MCLabel per accelerator table entry, so that they may be
referred by IDX_parent references.
When all patches related to this are merged, we are able to show that
evaluating an expression such as:
```
lldb --batch -o 'b CodeGenFunction::GenerateCode' -o run -o 'expr Fn' -- \
clang++ -c -g test.cpp -o /dev/null
```
is far faster: from ~5000 ms to ~1500ms.
Building llvm-project + clang with and without this patch, and looking
at its impact on object file size:
```
ls -la $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5} END {printf "%\047d\n", s}'
11,507,327,592
-la $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5} END {printf "%\047d\n", s}'
11,436,446,616
```
That is, an increase of 0.62% in total object file size.
Looking only at debug_names:
```
$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3} END {printf "%\047d\n", s}'
440,772,348
$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3} END {printf "%\047d\n", s}'
369,867,920
```
That is an increase of 19%.
DWARF Linkers need to be changed in order to support this. This commit
already brings support to "base" linker, but it does not attempt to
modify the parallel linker. Accelerator entries refer to the
corresponding DIE offset, and this patch also requires the parent DIE
offset -- it's not clear how the parallel linker can access this. It may
be obvious to someone familiar with it, but it would be nice to get help
from its authors.
[1]:
https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151/
Without this gcc warned
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3585:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
3584 | ((&Current == &AccelDebugNames) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3585 | (Unit.getUnitDie().getTag() != dwarf::DW_TAG_type_unit)) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
3586 | "Kind is CU but TU is being processed.");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3589:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
3588 | ((&Current == &AccelTypeUnitsDebugNames) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3589 | (Unit.getUnitDie().getTag() == dwarf::DW_TAG_type_unit)) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
3590 | "Kind is TU but CU is being processed.");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bug 1 is triggered when a TU is already created, and we process the same
DICompositeType at a top level. We would switch to TU accelerator table,
but
would not switch back on early exit. As the result we would add CU
entries to the TU
accelerator table. When we try to write out TUs and normalize entries,
the
offsets for DIEs that are part of a CU would not have been computed, and
it
would assert on getOffset().
Bug 2 is triggered when processing nested TUs. When we exit from
addDwarfTypeUnitType we switched back to CU accelerator table. If we
were processing nested TUs, the rest of the entries from TUs would be
added to CU accelerator table. When we write out TUs, all the DIE
pointers will become invalid. Eventually it will assert during
normalization step after CU is processed.
- [DebugMetadata][DwarfDebug] Support function-local types in lexical
block scopes (4/7)
- [CloneFunction][DebugInfo] Avoid cloning DILocalVariables of inlined
functions
This is a follow-up for https://reviews.llvm.org/D144006, fixing a crash
reported
in Chromium (https://reviews.llvm.org/D144006#4651955).
The first commit is added for convenience, as it has already been
accepted.
If DISubpogram was not cloned (e.g. we are cloning a function that has
other
functions inlined into it, and subprograms of the inlined functions are
not supposed to be cloned), it doesn't make sense to clone its
DILocalVariables as well.
Otherwise get duplicated DILocalVariables not tracked in their
subprogram's retainedNodes, that crash LTO with Chromium.
This is meant to be committed along with
https://reviews.llvm.org/D144006.
Specializations of AccelTableBase are always interested in accessing the
derived versions of their data classes (e.g. DWARF5AccelTableData). They
do so by sprinkling `static_casts` all over the code.
This commit adds a helper function to simplify this process, reducinng
the number of casts that have to be made in the middle of code, making
it easier to read.
Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF
section. Implements filecheck tests to verify emitting new fields.
This patch emits optional PGO related analyses into the bb-addr-map ELF
section during AsmPrinter. This currently supports Function Entry Count,
Machine Block Frequencies. and Machine Branch Probabilities. Each is
independently enabled via the `feature` byte of `bb-addr-map` for the given
function.
A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).
Renaming a member variable from "Endoding" to "Encoding".
Also replace inlined code for "isNormalized" with a call to the
function, so that if the definition of normalization ever changes, we
only need to change the one place.
Type dereferenced fragments are specified by offset and length in bits.
The representation in CodeView is defined in terms of byte offsets. If
the bit slice overlaps at a byte that is included, we would create
invalid definition ranges.
Consider the following scenario:
~~~
01234567 01234567
---------+---------
==== ======
~~~
Here bits 1-4 are marked as defined as well as bits 7-9. The byte range
for the second portion overlaps and so we would say that bytes 1 and 2
are valid though there is potentially a hole. There is no way to
represent this in the defined range for the local variable in CodeView.
We simply can drop the fragment definition in such a scenario with the
variables are "optimized out".
Thanks to @rnk and @hjyamauchi for the discussion around this.
... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time.
Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.
This adds support to help LLDB when binary is built with split dwarf,
has
.debug_names accelerator table and DWP file.
Final linked binary might have Type Units (TUs) with the same type
signature in multiple
compilation units. Although the signature is the same, TUs are not
guranted to
be bit identical. This is not a problem when they are in .o/.dwo files
as LLDB
can find them by looking at the right one based on
DW_AT_comp_dir/DW_AT_name in
skeleton CU. Once DWP is created, TUs are de-duplicated, and we need to
know
from which CU remaining one came from.
This approach allows LLDB to figure it out, with minimal changes to the
rest of
the tooling. As would have been the case if .debug_tu_index section in
DWP was
modified.
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
Enables Type Units with DWARF5 accelerator tables for split dwarf. It is
still
under discussion what is the best way to implement support for
de-duplication in
DWP. This will be in follow up PR.
Enable Type Units with DWARF5 accelerator tables for monolithic DWARF.
Implementation relies on linker to tombstone offset in LocalTU list to
-1 when
it deduplciates type units using COMDAT.
The DWARF 5 specification says that:
> The name index must contain an entry for each debugging information
entry that
> defines a named [...] label [...].
The verifier currently verifies this, but the AsmPrinter does not add
entries for TAG_labels in debug_names. This patch addresses the issue by
ensuring we add labels in the accelerator tables once we have a fully
completed DIE for the TAG_label entry.
We also respect the spec as follows:
> DW_TAG_label debugging information entries without an address
attribute
> (DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges, or DW_AT_entry_pc) are
excluded.
The effect of this on the size of accelerator tables is minimal, as
TAG_labels are usually created by C/C++ labels (see example in test),
which are typically paired with "goto" statements.
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
28b9126879
introduced the path cloning format in the basic-block-sections profile.
This PR validates and applies path clonings.
A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Each two consecutive bb ids in the path have a successor relationship
in the CFG.
3. The path does not include a block with indirect branches, except
possibly as the last block.
Applying a path cloning involves cloning all blocks in the path (except
the first one) and setting up their branches.
Once all clonings are applied, the cluster information is used to guide
block layout in the modified function.
This is pre-cursor patch to enabling type units with DWARF5 acceleration
tables.
With this change it allows for entries to contain offsets directly, this
way type
units do not need to be preserved until .debug_names is written out.
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.
We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.
Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
llvm::support::endianness with llvm::endianness.
The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.
We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as float so should be unaffected.
This caused asserts:
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2331:
virtual void llvm::DwarfDebug::endFunctionImpl(const llvm::MachineFunction *):
Assertion `LScopes.getAbstractScopesList().size() == NumAbstractSubprograms &&
"getOrCreateAbstractScope() inserted an abstract subprogram scope"' failed.
See comment on the code review for reproducer.
> RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
>
> Similar to imported declarations, the patch tracks function-local types in
> DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
> the aforementioned metadata change and provided a support of function-local
> types scoped within a lexical block.
>
> The patch assumes that DICompileUnit's 'enums field' no longer tracks local
> types and DwarfDebug would assert if any locally-scoped types get placed there.
>
> Reviewed By: jmmartinez
>
> Differential Revision: https://reviews.llvm.org/D144006
This reverts commit f8aab289b5549086062588fba627b0e4d3a5ab15.
The operators are defined in DwarfDebug.cpp but are
referenced in the struct definitions of FrameIndexExpr and
EntryValueInfo in DwarfDebug.h, and since they weren't
declared before, gcc warned with
[694/5646] Building CXX object lib/CodeGen/AsmPrinter/CMakeFiles/LLVMAsmPrinter.dir/DwarfDebug.cpp.o
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:273:6: warning: 'bool llvm::operator<(const llvm::FrameIndexExpr&, const llvm::FrameIndexExpr&)' has not been declared within 'llvm'
273 | bool llvm::operator<(const FrameIndexExpr &LHS, const FrameIndexExpr &RHS) {
| ^~~~
In file included from ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:13:
../lib/CodeGen/AsmPrinter/DwarfDebug.h:112:15: note: only here as a 'friend'
112 | friend bool operator<(const FrameIndexExpr &LHS, const FrameIndexExpr &RHS);
| ^~~~~~~~
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:278:6: warning: 'bool llvm::operator<(const llvm::EntryValueInfo&, const llvm::EntryValueInfo&)' has not been declared within 'llvm'
278 | bool llvm::operator<(const EntryValueInfo &LHS, const EntryValueInfo &RHS) {
| ^~~~
In file included from ../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:13:
../lib/CodeGen/AsmPrinter/DwarfDebug.h:121:15: note: only here as a 'friend'
121 | friend bool operator<(const EntryValueInfo &LHS, const EntryValueInfo &RHS);
| ^~~~~~~~
The `S_INLINEES` debug symbol is used to record all the functions that
are directly inlined within the current function (nested inlining is
ignored).
This change implements support for emitting the `S_INLINEES` debug
symbol in LLVM, and cleans up how the `S_INLINEES` and `S_CALLEES` debug
symbols are dumped.
This patch is extracted from D96035, it adds support for the accelerator
tables to the DWARFLinkerParallel functionality.
Differential Revision: https://reviews.llvm.org/D154793
Fold constructVariableDIEImpl into constructVariableDIE, simplify it and
group related functions.
Pull out the previously inline lambdas for visiting the active variant
of the DbgVariable to add location and related attributes as an overload
set for a private method
applyConcreteDbgVariableAttributes.
Rename applyVariableAttribute to reflect what kinds of attributes it
applies, and to contrast it with the new
applyConcreteDbgVariableAttributes.
Move constructLabelDIE down in the implementation file, so all of the
constructVariableDIE-related function impls are adjacent.
This avoids the need for a mutable member to implement deferred sorting,
and some bespoke code to maintain a SmallVector as a set.
The performance impact seems to be negligible in some small tests, and
so seems acceptable to greatly simplify the code.
An old FIXME and accompanying workaround are dropped. It is ostensibly
dead-code within the current codebase.
We were losing the function entry count, which is useful to check profile quality. For the original cases where we want
entrypoint-relative MBB frequencies, the user would just need to divide these values by the entrypoint (first MBB, with ID=0) value.
Revision c383f4d6550e enabled using variadic-form debug values to represent
single-location, non-stack-value debug values, and a further patch made all
DBG_INSTR_REFs use variadic form. Not all code paths were updated correctly to
handle the new syntax however, with entry values in still expecting an expression
that begins exactly DW_OP_LLVM_entry_value, 1.
A function already exists to select non-variadic-like expressions; this patch
adds an extra function to cheaply simplify such cases to non-variadic form, which
we use prior to any entry-value processing to put DBG_INSTR_REFs and DBG_VALUEs
down the same code path. We also use it for a few DIExpression functions that
check for whether the first element(s) of a DIExpression match a particular
pattern, so that they will return the same result for
DIExpression(DW_OP_LLVM_arg, 0, <ops>) as for DIExpression(<ops>).
Differential Revision: https://reviews.llvm.org/D158185