This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).
Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).
The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
// Per section that contains calls to imported functions:
// uint32_t SectionSize: Size in bytes for information in this section.
// uint32_t Section Number
// Per call to imported function in section:
// uint32_t Kind: the kind of imported function.
// uint32_t BranchOffset: the offset of the branch instruction in its
// parent section.
// uint32_t TargetSymbolId: the symbol id of the called function.
```
NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.
The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.
The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).
I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
llvm-mc --assemble prints an initial `.text` from `initSections`.
This is weird for quick assembly tasks that do not specify `.text`.
Omit the .text by moving section directive printing from `changeSection`
to `switchSection`. switchSectionNoPrint now correctly calls the
`changeSection` hook (needed by MachO).
The initial directives of clang -S are now reordered. On ELF targets, we
get `.file "a.c"; .text` instead of `.text; .file "a.c"`.
If there is no function, `.text` will be omitted.
**Summary**
This patch introduces a new compiler option `-mllvm
-emit-func-debug-line-table-offsets` that enables the emission of
per-function line table offsets and end sequences in DWARF debug
information. This enhancement allows tools and debuggers to accurately
attribute line number information to their corresponding functions, even
in scenarios where functions are merged or share the same address space
due to optimizations like Identical Code Folding (ICF) in the linker.
**Background**
RFC: [New DWARF Attribute for Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)
Previous similar PR:
[#93137](https://github.com/llvm/llvm-project/pull/93137) – This PR was
very similar to the current one but at the time, the assembler had no
support for emitting labels within the line table. That support was
added in PR [#99710](https://github.com/llvm/llvm-project/pull/99710) -
and in this PR we use some of the support added in the assembler PR.
In the current implementation, Clang generates line information in the
`debug_line` section without directly associating line entries with
their originating `DW_TAG_subprogram` DIEs. This can lead to issues when
post-compilation optimizations merge functions, resulting in overlapping
address ranges and ambiguous line information.
For example, when functions are merged by ICF in LLD, multiple functions
may end up sharing the same address range. Without explicit linkage
between functions and their line entries, tools cannot accurately
attribute line information to the correct function, adversely affecting
debugging and call stack resolution.
**Implementation Details**
To address the above issue, the patch makes the following key changes:
**`DW_AT_LLVM_stmt_sequence` Attribute**: Introduces a new LLVM-specific
attribute `DW_AT_LLVM_stmt_sequence` to each `DW_TAG_subprogram` DIE.
This attribute holds a label pointing to the offset in the line table
where the function's line entries begin.
**End-of-Sequence Markers**: Emits an explicit DW_LNE_end_sequence after
each function's line entries in the line table. This marks the end of
the line information for that function, ensuring that line entries are
correctly delimited.
**Assembler and Streamer Modifications**: Modifies the MCStreamer and
related classes to support emitting the necessary labels and tracking
the current function's line entries. A new flag
GenerateFuncLineTableOffsets is added to control this behavior.
**Compiler Option**: Introduces the `-mllvm
-emit-func-debug-line-table-offsets` option to enable this
functionality, allowing users to opt-in as needed.
These specify that the value of the given register in the previous frame
is the CFA plus some offset. This isn't very common but can be necessary
if the original value is normally reconstructed from the stack/frame
pointer instead of being saved on the stack and reloaded from there.
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
GNU assembler 2.26 introduced the .cfi_label directive. It does not
expand to any CFI instructions, but defines a label in
.eh_frame/.debug_frame, which can be used by runtime patching code to
locate the FDE. .cfi_label is not allowed for CIE's initial
instructions, and can therefore be used to force the next instruction to
be placed in a FDE instead of a CIE.
In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force
DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have
copied this use.
```
.cfi_startproc
// DW_CFA_undefined is allowed for CIE's initial instructions.
// Without .cfi_label, gas would place DW_CFA_undefined in a CIE.
.cfi_label .Ldummy
.cfi_undefined ra
.cfi_endproc
```
No CFI instruction is associated with .cfi_label, so the `case
MCCFIInstruction::OpLabel:` code in BOLT is unreachable and onlt to make
-Wswitch happy.
Close#97222
Pull Request: https://github.com/llvm/llvm-project/pull/97922
21101b32318647f600584d966c697d8773f59629 (2013) added SymbolOrdering,
which essentially became useless when
e3a20f57d927e422874a8e7730bb7590515b586d (2015) removed `AssignSection`
from `EmitLabel`. `assignFragment` is still used in very few places like
emitTBSSSymbol, which do not make a difference if we remove
SymbolOrdering.
Follow-up to 05ba5c0648ae5e80d5afce270495bf3b1eef9af4. uint32_t is
preferred over const MCExpr * in the section stack uses because it
should only be evaluated once. Change the paramter type to match.
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.
ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.
This allows to remove the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
However, removing `getUseAssemblerInfoForParsing` would make
MCDwarfFrameEmitter::Emit (for .eh_frame FDE) slow (~4% compile time
regression for sqlite3.c amalgamation) due to expensive
`AttemptToFoldSymbolOffsetDifference`. For now, make
`UseAssemblerInfoForParsing` false in MCDwarfFrameEmitter::Emit.
Close#62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
This reverts commit 03c53c69a367008da689f0d2940e2197eb4a955c.
This causes very large compile-time regressions in some cases,
e.g. sqlite3 at O0 regresses by 5%.
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
Close#62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.
Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D153600
to help debug and report better diagnostics for functions like
relaxDwarfCallFrameFragment (D153167).
In MCStreamer, some emitCFI* functions already take a SMLoc argument. Add a
SMLoc argument to the remaining functions that generate a MCCFIInstruction.
Reasons for rolling forward:
- the crash reported from Chromium was fixed in D151824 (not related to this patch at all)
- since D152824 was committed, it should now be safe to roll this forward.
New change:
- add an additional _ in name check
This reverts commit 4980eead4d0b4666d53dad07afb091375b3a13a0.
Details: https://github.com/rust-lang/rust/issues/102754
The MachO format uses 2 bits to encode these personality funtions, with 0 reserved for "no-personality".
This means we can only have up to 3 personality. There are already three popular personalities: __gxx_personality_v0, __gcc_personality_v0, and __objc_personality_v0.
As a result, any system that needs custom-personality will run into a problem.
This patch implemented jyknight's proposal to simply force DWARFs for all non-canonical personality functions.
Differential Revision: https://reviews.llvm.org/D144999
Encoding FS discriminators for block probes. Decoding them correspondingly.
The encoding/decoding of FS discriminators are conditional, only for probes with a non-zero discriminator. This saves encoding size, also ensures downwards-compatiblity.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D147651
This doesn't change the output in any way, but we have a bunch of
emitFill for padding. When emitting an array of floats we'd end up with
DataFragment float1
FillFragment 0
DataFragment float2
FillFragment 0
... and so on
We never actually emit anything for those fills, neither in asm nor obj
emission mode, they just consume RAM for no reason.
Summary: A R_REF relocation as a non-relocating reference is required to prevent garbage collection (by the binder) of the ref symbol in object generation.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D144356
This patch adds handling of debug_macinfo/debug_macro tables to the DWARFLinker.
It uses already existing code for reading tables from DWARFDebugMacro.h.
It adds new code writing tables into the DwarfStreamer::emitMacroTables.
Differential Revision: https://reviews.llvm.org/D140223
In the same vein as D139439, the patch is not NFC as there is no way to check all downstream implementations but the patch seems pretty safe.
Differential Revision: https://reviews.llvm.org/D139548
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer:emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC as `0` values are illegal in `emitZeroFill` anyways, `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` are provably `>0`.
Differential Revision: https://reviews.llvm.org/D139439
This breaks Windows bots with
`warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)`
Some shift operators are lacking a proper literal unit ('1ULL' instead of
'1'). Will reland once fixed.
This reverts commit c621c1a8e81856e6bf2be79714767d80466e9ede.
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer:emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC as `0` values are illegal in `emitZeroFill` anyways, `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` are provably `>0`.
Differential Revision: https://reviews.llvm.org/D139439
This patch makes code less readable but it will clean itself after all functions are converted.
Differential Revision: https://reviews.llvm.org/D138665
Currently pseudo probe encoding for a function is like:
- For the first probe, a relocation from it to its physical position in the code body
- For subsequent probes, an incremental offset from the current probe to the previous probe
The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address.
A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name.
Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function
```
Foo() {
…
Probe 1
…
Probe 2
}
```
If it is transformed into two binary functions:
```
Foo:
…
Foo.outlined:
…
```
The encoding for the two binary functions will be separate:
```
GUID of Foo
Probe 1
GUID of Foo
Sentinel probe of Foo.outlined
Probe 2
```
Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address.
On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues.
The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding.
Reviewed By: wenlei, maksfb
Differential Revision: https://reviews.llvm.org/D135912
Differential Revision: https://reviews.llvm.org/D135914
Differential Revision: https://reviews.llvm.org/D136394