As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
GNU assembler 2.26 introduced the .cfi_label directive. It does not
expand to any CFI instructions, but defines a label in
.eh_frame/.debug_frame, which can be used by runtime patching code to
locate the FDE. .cfi_label is not allowed for CIE's initial
instructions, and can therefore be used to force the next instruction to
be placed in a FDE instead of a CIE.
In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force
DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have
copied this use.
```
.cfi_startproc
// DW_CFA_undefined is allowed for CIE's initial instructions.
// Without .cfi_label, gas would place DW_CFA_undefined in a CIE.
.cfi_label .Ldummy
.cfi_undefined ra
.cfi_endproc
```
No CFI instruction is associated with .cfi_label, so the `case
MCCFIInstruction::OpLabel:` code in BOLT is unreachable and onlt to make
-Wswitch happy.
Close#97222
Pull Request: https://github.com/llvm/llvm-project/pull/97922
21101b32318647f600584d966c697d8773f59629 (2013) added SymbolOrdering,
which essentially became useless when
e3a20f57d927e422874a8e7730bb7590515b586d (2015) removed `AssignSection`
from `EmitLabel`. `assignFragment` is still used in very few places like
emitTBSSSymbol, which do not make a difference if we remove
SymbolOrdering.
Follow-up to 05ba5c0648ae5e80d5afce270495bf3b1eef9af4. uint32_t is
preferred over const MCExpr * in the section stack uses because it
should only be evaluated once. Change the paramter type to match.
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.
ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.
This allows to remove the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
However, removing `getUseAssemblerInfoForParsing` would make
MCDwarfFrameEmitter::Emit (for .eh_frame FDE) slow (~4% compile time
regression for sqlite3.c amalgamation) due to expensive
`AttemptToFoldSymbolOffsetDifference`. For now, make
`UseAssemblerInfoForParsing` false in MCDwarfFrameEmitter::Emit.
Close#62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
This reverts commit 03c53c69a367008da689f0d2940e2197eb4a955c.
This causes very large compile-time regressions in some cases,
e.g. sqlite3 at O0 regresses by 5%.
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
Close#62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.
Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D153600
to help debug and report better diagnostics for functions like
relaxDwarfCallFrameFragment (D153167).
In MCStreamer, some emitCFI* functions already take a SMLoc argument. Add a
SMLoc argument to the remaining functions that generate a MCCFIInstruction.
Reasons for rolling forward:
- the crash reported from Chromium was fixed in D151824 (not related to this patch at all)
- since D152824 was committed, it should now be safe to roll this forward.
New change:
- add an additional _ in name check
This reverts commit 4980eead4d0b4666d53dad07afb091375b3a13a0.
Details: https://github.com/rust-lang/rust/issues/102754
The MachO format uses 2 bits to encode these personality funtions, with 0 reserved for "no-personality".
This means we can only have up to 3 personality. There are already three popular personalities: __gxx_personality_v0, __gcc_personality_v0, and __objc_personality_v0.
As a result, any system that needs custom-personality will run into a problem.
This patch implemented jyknight's proposal to simply force DWARFs for all non-canonical personality functions.
Differential Revision: https://reviews.llvm.org/D144999
Encoding FS discriminators for block probes. Decoding them correspondingly.
The encoding/decoding of FS discriminators are conditional, only for probes with a non-zero discriminator. This saves encoding size, also ensures downwards-compatiblity.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D147651
This doesn't change the output in any way, but we have a bunch of
emitFill for padding. When emitting an array of floats we'd end up with
DataFragment float1
FillFragment 0
DataFragment float2
FillFragment 0
... and so on
We never actually emit anything for those fills, neither in asm nor obj
emission mode, they just consume RAM for no reason.
Summary: A R_REF relocation as a non-relocating reference is required to prevent garbage collection (by the binder) of the ref symbol in object generation.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D144356
This patch adds handling of debug_macinfo/debug_macro tables to the DWARFLinker.
It uses already existing code for reading tables from DWARFDebugMacro.h.
It adds new code writing tables into the DwarfStreamer::emitMacroTables.
Differential Revision: https://reviews.llvm.org/D140223
In the same vein as D139439, the patch is not NFC as there is no way to check all downstream implementations but the patch seems pretty safe.
Differential Revision: https://reviews.llvm.org/D139548
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer:emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC as `0` values are illegal in `emitZeroFill` anyways, `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` are provably `>0`.
Differential Revision: https://reviews.llvm.org/D139439
This breaks Windows bots with
`warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)`
Some shift operators are lacking a proper literal unit ('1ULL' instead of
'1'). Will reland once fixed.
This reverts commit c621c1a8e81856e6bf2be79714767d80466e9ede.
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer:emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC as `0` values are illegal in `emitZeroFill` anyways, `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` are provably `>0`.
Differential Revision: https://reviews.llvm.org/D139439
This patch makes code less readable but it will clean itself after all functions are converted.
Differential Revision: https://reviews.llvm.org/D138665
Currently pseudo probe encoding for a function is like:
- For the first probe, a relocation from it to its physical position in the code body
- For subsequent probes, an incremental offset from the current probe to the previous probe
The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address.
A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name.
Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function
```
Foo() {
…
Probe 1
…
Probe 2
}
```
If it is transformed into two binary functions:
```
Foo:
…
Foo.outlined:
…
```
The encoding for the two binary functions will be separate:
```
GUID of Foo
Probe 1
GUID of Foo
Sentinel probe of Foo.outlined
Probe 2
```
Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address.
On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues.
The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding.
Reviewed By: wenlei, maksfb
Differential Revision: https://reviews.llvm.org/D135912
Differential Revision: https://reviews.llvm.org/D135914
Differential Revision: https://reviews.llvm.org/D136394
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests
Reviewed By: shchenz, RKSimon
Differential Revision: https://reviews.llvm.org/D132146
Summary:
Introduce NeverAlign fragment type.
The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.
In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).
This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.
The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.
LLVM: https://reviews.llvm.org/D97982
Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613
Test Plan: sandcastle
Reviewers: #llvm-bolt
Subscribers: phabricatorlinter
Differential Revision: https://phabricator.intern.facebook.com/D31361547