In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.
This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.
When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.
For example:
.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".
It is alright to skip this as we already check the section corresponding
to the debugloc entry.
Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
This patch makes it so that specifying all or none for -pgo-analysis-map
along with an explicit option causes an error as this set of options
does not really make sense.
This patch adds none and all options to the -pgo-analysis-map flag,
which do basically what they say on the tin. The none option is added to
enable forcing the pgo-analysis-map by overriding an earlier invocation
of the flag. The all option is just added for convenience.
Store Swift mangled names in DW_AT_linkage_name. The Swift compiler
emits only the type mangled name in debug information, and LLDB uses
those mangled names as keys to look up size, alignment, fields, etc
from either reflection metadata or Swift modules.
Additionally, emit types linkage names for types into the accelerator
table if they exist and they're different from the display name.
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on.
Built by GCC 11.
Fix warnings:
llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp: In member
function ‘void llvm::AsmPrinter::emitJumpTableSizesSection(const
llvm::MachineJumpTableInfo*, const llvm::Function&) const’:
llvm-project/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:2852:31: error:
enumerated and non-enumerated type in conditional expression
[-Werror=extra]
2852 | int Flags = F.hasComdat() ? ELF::SHF_GROUP : 0;
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
Enable .debug_loc section for NVPTX backend.
This commit makes NVPTX omit DW_AT_low_pc (and DW_AT_high_pc) for
DW_TAG_compile_unit. This is because cuda-gdb uses the compile unit's
low_pc as a base address, and adds the addresses in the debug_loc
section to it. Removing low_pc is equivalent to setting that base
address to zero, so addition doesn't break the location ranges.
Additionally, this patch forces debug_loc label emission to emit single
labels with no subtraction or base. This would not be necessary if we
could emit `label1 - label2` expressions in PTX. The PTX documentation
at
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#debugging-directives-section
makes it seem like this is supported, but it doesn't actually work. I
believe when that documentation says that you can subtract “label
addresses between labels in the same dwarf section”, it doesn't merely
mean that the labels need to be in the same section as each other, but
in fact they need to be in the same section as the use. If support for
label subtraction is supported such that in the debug_loc section you
can subtract labels from the main code section, then we can remove the
workarounds added in this PR.
Also, since this now emits valid .debug_loc sections, it replaces the
empty .debug_loc to force existence of at least one debug section with
an empty .debug_macinfo section, which matches what nvcc does.
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
This is the final piece to enable register debugging for variables in
registers that have single locations that last throughout their
enclosing scope.
The next step after this for supporting register debugging for NVPTX is
to support the .debug_loc section.
Stacked on top of: https://github.com/llvm/llvm-project/pull/109495
This patch adds support for encoding PTX registers for DWARF, using the
encoding supported by nvcc and cuda-gcc.
There are some other features still needed for proper register debugging
that this patch does not address, such as DW_AT_address_class.
This PR is stacked on: https://github.com/llvm/llvm-project/pull/109494
The register encoding used by NVPTX and cuda-gdb basically use strings
encoded as numbers. They are always within 64-bits, but typically
outside of 32-bits, since they often need at least 5 characters.
This patch changes the signature of `MCRegisterInfo::getDwarfRegNum` and
some related data structures to use 64-bit numbers to accommodate
encodings like this.
Additionally, `MCRegisterInfo::getDwarfRegNum` is marked as virtual, so
that targets with peculiar dwarf register mapping schemes (such as
NVPTX) can override its behavior.
I originally tried to do a broader switch to 64-bit types for registers,
but it caused many problems. There are various places in code generation
where registers are not just treated as 32-bit numbers, but also treat
certain bit offsets as flags. So I limited the change as much as
possible to just the output of `getDwarfRegNum`. Keeping the types used
by `DwarfLLVMRegPair` as unsigned preserves the current behaviors. The
only way to give a 64-bit output from `getDwarfRegNum` that actually
needs more than 32-bits is to override `getDwarfRegNum` and provide an
implementation that sidesteps the use of the `DwarfLLVMRegPair` maps
defined in tablegen files.
First layer of stack supporting:
https://github.com/llvm/llvm-project/pull/109495
When emitting debug info for code alignment, it was possible to emit a
.loc directive with a file number of zero, which is invalid for DWARF 4
and earlier. This happened because getCurrentDwarfLoc() returned a
zero-initialised value when there hadn't been a previous .loc directive
emitted.
---------
Co-authored-by: Paul T Robinson <paul.robinson@sony.com>
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a
minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when
`--lto-basic-block-sections=labels` is passed.
This feature is supported via the newer option
`-fbasic-block-address-map`. Using the old option still works by
delegating to the newer option, while a warning is printed to show
deprecation.
AsmPrinter always creates a symbol for the end of function if valid
debug info is present. However, this breaks SPIR-V target's output,
because SPIR-V specification allows label instructions only inside a
block, not after the function body (see
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLabel).
This PR proposes to disable emission of label instructions after the
function body if the target is SPIR-V.
This PR is a fix of the
https://github.com/llvm/llvm-project/issues/102732 issue.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Reverted due to large .debug_line size regressions for some
configurations; work currently in place to improve the output of this
behaviour in PR #108251.
This patch also modifies two tests that were created or modified after
the original commit landed and are affected by the revert:
llvm/test/CodeGen/X86/pseudo_cmov_lower2.ll
llvm/test/DebugInfo/X86/empty-line-info.ll
This reverts commit 5fef40c2c477e92187bd4e5c18091eca6b8465cc.
In degenerate but legal inputs, we can have functions that have no source
locations at all -- all the DebugLocs attached to instructions are empty.
LLVM didn't produce any source location for the function; with this patch
it will at least emit the function-scope source location. Demonstrated by
empty-line-info.ll
The XCOFF test modified has similar symptoms -- with this patch, the size
of the ".dwline" section grows a bit, thus shifting some of the file
internal offsets, which I've updated.
This patch will make LLVM emit a new section .llvm_jump_table_sizes
containing tuples of (jump table address, entry count) in object files.
This section is useful for tools that need to statically reconstruct the
control flow of executables.
At the moment this is only enabled by default for the PS5 target.
Currently, we identify the end of the prologue as being "the instruction
that first has *this* DebugLoc". It works well enough, but I feel
identifying a position in a function is best communicated by a
MachineInstr. Plus, I've got some patches coming that depend upon this.
Seemingly this goes back to fd07a2a in 2015 -- I anticipate that back
then the metadata layout was radically different. But nowadays at least, we
can just directly look up the subprogram.
The InitUndef pass currently uses target-specific pseudo instructions,
with one pseudo per register class.
Instead, add a generic pseudo instruction, which can be used by all
targets and register classes.
Inlining and zero-cost abstractions tend to produce volumes of debug
info with identical ranges. When built with full debugging information
(the equivalent of -g2) librustc_driver.so has 2.1 million entries in
.debug_ranges. But only 1.1 million of those entries are unique. While
in principle all duplicates could be eliminated with a hashtable,
checking to see if the new range is exactly identical to the previous
range and skipping a new addition if it is is sufficient to eliminate
99.99% of the duplicates. This reduces the size of librustc_driver.so's
.debug_ranges section by 35%, or the overall binary size a little more
than 1%.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
Fixes the previous buildbot error by adding an explicit triple to the test,
ensuring that llc can produce a valid object file.
This reverts commit 926f0979af4f6172d4ed3dea5603aa97c800bef1.
Reverted (along with the NFC followup fix) due to buildbot failure:
https://lab.llvm.org/buildbot/#/builders/160/builds/4142
This reverts commit 3ef37e2f8f672393ee409fde8309198df0981735, and commit
616f7d3d4f6d9bea6f776e357c938847e522a681.
Fixes: https://github.com/llvm/llvm-project/issues/104695
This patch adds the is_stmt flag to line table entries for the first
instruction with a non-0 line location in each basic block, to ensure
that it will be used for stepping even if the last instruction in the
previous basic block had the same line number; this is important for
cases where the new BB is reachable from BBs other than the preceding
block.
All Mach-O targets have this property, so just remove this variable,
which could lure contributors to add unneeded object file format
specific properties.
Since ce0c205813c74b4225180ac8a6e40fd52ea88229, we are doing that if a
single (LTO) compilation contains more than one compile unit, but the
same thing can happen if the non-lto and single-cu lto compilations,
typically when the CU ends up (nearly) empty. In my case, this happened
when LTO emptied two compile units.
Note that the source file name is already a part of the hash, so this
can only happen when a single file is compiled and linked twice into the
same application (most likely with different preprocessor defintiions).
While not exactly common, this pattern is used by some C code to
implement "templates".
The 2017 patch already hinted at the possibility of doing this
unconditionally, and this patch implements that. While the DWARF spec
hints at the option of using the type signature hashing algorithm for
the DWO_id purposes, AFAICT it does not actually require it, so I
believe this change is still conforming.
The relevant section of the spec is in Section 3.1.2 "Skeleton
Compilation Unit Entries" (in non-normative text):
```
The means of determining a compilation unit ID does not need to be
similar or related to the means of determining a type unit signature.
However, it should be suitable for detecting file version skew or other
kinds of mismatched files and for looking up a full split unit in a
DWARF package file (see Section 7.3.5 on page 190).
```
This patch makes LLVM emit line 0 source locations for nops emitted as
basic block padding.
---------
Co-authored-by: Orlando Cazalet-Hyams <orlando.hyams@sony.com>
This avoids another unserializable field. Move the DbgInfoAvailable
field into the AsmPrinter, which is only really a cache/convenience
bit for checking a direct IR module metadata check.
I assume getSubprogram will do the correct thing in hasDebugInfo,
and this is redundant with the debug_compile_units distance check.
This is in preparation for removing the field.
Some of SIE's post-mortem analysis infrastructure currently makes use of
.debug_aranges, so we'd like to ensure the section's presence in
PlayStation binaries. The simplest way to do this is to force emission
when the debugger tuning is set to SCE (which is in turn typically
initialized from the target triple). This also simplifies the driver.
llvm/test/DebugInfo/debuglineinfo-path.ll has been marked as UNSUPPORTED
on PlayStation. When aranges are emitted, the DWARF in the test case is
such that relocations need to be applied to the aranges section in order
for symbolization to work. An alternative approach would be to implement
the application of relocations in DWARFDebugArangeSet. While experiments
show that this can be made to work with a modest patch, the test cases
would be rather contrived. Since I expect the only utility for such a
change would be to make this test case pass for PlayStation targets, and
few - if any - outside of PlayStation care about aranges, UNSUPPORTED
would seem to be a more practical option.
This was originally commited as 22eb290a96 (#99629) and later reverted
at 84658fb82b (#99711) due to test failures on SIE built bots. These
failures shouldn't recur due to 3b24e5d450 (#99897) and the
aforementioned change to debuglineinfo-path.ll.
SIE tracker: TOOLCHAIN-16951
Enabled in clang using:
-fptrauth-indirect-gotos
and at the IR level using function attribute:
"ptrauth-indirect-gotos"
Signing uses IA and a per-function integer discriminator. The
discriminator isn't ABI-visible, and is currently:
ptrauth_string_discriminator("<function_name> blockaddress")
A sufficiently sophisticated frontend could benefit from per-indirectbr
discrimination, which would need additional machinery, such as allowing
"ptrauth" bundles on indirectbr. For our purposes, the simple scheme
above is sufficient.
This approach doesn't support subtracting label addresses and using
the result as offsets, because each label address is signed.
Pointer arithmetic on signed pointers corrupts the signature bits,
and because label address expressions aren't typed beyond void*,
we can't do anything reliably intelligent on the arithmetic exprs.
Not signing addresses when used to form offsets would allow
easily hijacking control flow by overwriting the offset.
This diagnoses the basic cases (`&&lbl2 - &&lbl1`) in the frontend,
while we evaluate either alternative implementations (e.g., lowering
blockaddress to a bb number, and indirectbr to a checked jump-table),
or better diagnostics (both at the frontend level and on unencodable
IR constants).
Some of SIE's post-mortem analysis infrastructure currently makes use of
.debug_aranges, so we'd like to ensure the section's presence in
PlayStation binaries. The simplest way to do this is force emission when
the debugger tuning is set to SCE (which is in turn typically
initialized from the target triple). This also simplifies the driver.
SIE tracker: TOOLCHAIN-16951