The fixed-size content of the MCFragment object is now stored as
trailing data, replacing ContentStart/ContentEnd with ContentSize. The
available space for trailing data is tracked using `FragSpace`. If the
available space is insufficient, a new block is allocated within the
bump allocator `MCObjectStreamer::FragStorage`.
FragList::Tail cannot be reused when switching sections or subsections,
as it is not associated with the fragment space tracked by `FragSpace`.
Instead, allocate a new fragment, which becomes less expensive after #150574.
Data can only be appended to the tail fragment of a subsection, not to
fragments in the middle. Post-assembler-layout adjustments (such as
.llvm_addrsig and .llvm.call-graph-profile) have been updated to use the
variable-size part instead.
---
This reverts commit a2fef664c29a53bfa8a66927fcf8b2e5c9da4876,
which reverted the innocent f1aa6050bd90f8ec4273da55d362e23905ad3a81 .
Commit df71243fa885cd3db701dc35a0c8d157adaf93b3, the MCOrgFragment fix,
has fixed the root cause of https://github.com/ClangBuiltLinux/linux/issues/2116
This reverts commit f1aa6050bd90f8ec4273da55d362e23905ad3a81 (reland of #150846),
fixing conflicts.
It caused https://github.com/ClangBuiltLinux/linux/issues/2116 ,
which surfaced after a subsequent commit faa931b717c02d57f0814caa9133219040e6a85b decreased sizeof(MCFragment).
```
% /tmp/Debug/bin/clang "-cc1as" "-triple" "aarch64" "-filetype" "obj" "-main-file-name" "a.s" "-o" "a.o" "a.s"
clang: /home/ray/llvm/llvm/lib/MC/MCAssembler.cpp:615: void llvm::MCAssembler::writeSectionData(raw_ostream &, const MCSection *) const: Assertion `getContext().hadError() || OS.tell() - Start == getSectionAddressSize(*Sec)' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /tmp/Debug/bin/clang -cc1as -triple aarch64 -filetype obj -main-file-name a.s -o a.o a.s
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 libLLVMSupport.so.22.0git 0x00007cf91eb753cd llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 61
fish: Job 1, '/tmp/Debug/bin/clang "-cc1as" "…' terminated by signal SIGABRT (Abort)
```
The test is sensitive to precise fragment offsets. Using llvm-mc
-filetype=obj -triple=aarch64 a.s does not replicate the issue. However,
clang -cc1as includes an unnecessary `initSection` (adding an extra
FT_Align), which causes the problem.
Reapply after #151724 switched to `char *Data`, fixing a
-fsanitize=pointer-overflow issue in MCAssembler::layout.
---
The fixed-size content of the MCFragment object is now stored as
trailing data, replacing ContentStart/ContentEnd with ContentSize. The
available space for trailing data is tracked using `FragSpace`. If the
available space is insufficient, a new block is allocated within the
bump allocator `MCObjectStreamer::FragStorage`.
FragList::Tail cannot be reused when switching sections or subsections,
as it is not associated with the fragment space tracked by `FragSpace`.
Instead, allocate a new fragment, which becomes less expensive after #150574.
Data can only be appended to the tail fragment of a subsection, not to
fragments in the middle. Post-assembler-layout adjustments (such as
.llvm_addrsig and .llvm.call-graph-profile) have been updated to use the
variable-size part instead.
Pull Request: https://github.com/llvm/llvm-project/pull/150846
The fixed-size content of the MCFragment object is now stored as
trailing data, replacing ContentStart/ContentEnd with ContentSize. The
available space for trailing data is tracked using `FragSpace`. If the
available space is insufficient, a new block is allocated within the
bump allocator `MCObjectStreamer::FragStorage`.
FragList::Tail cannot be reused when switching sections or subsections,
as it is not associated with the fragment space tracked by `FragSpace`.
Instead, allocate a new fragment, which becomes less expensive after #150574.
Data can only be appended to the tail fragment of a subsection, not to
fragments in the middle. Post-assembler-layout adjustments (such as
.llvm_addrsig and .llvm.call-graph-profile) have been updated to use the
variable-size part instead.
Pull Request: https://github.com/llvm/llvm-project/pull/150846
Add an assert to ensure `CurFrag` is either null or an `FT_Data` fragment.
Follow-up to 39c8cfb70d203439e3296dfdfe3d41f1cb2ec551.
Extracted from #149721
Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`,
previously encoded as
`MCDataFragment(nop); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`,
to
```
MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
```
Changes:
* Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
* The new MCFragment contains a fixed-size content (similar to previous
MCDataFragment) and an optional variable-size tail.
* The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
FT_DwarfFrame, with plans to extend to other fragment types.
dyn_cast/isa should be avoided for the converted fragment subclasses.
* In `setVarFixups`, source fixup offsets are relative to the variable part's start.
Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start.
A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`,
expecting the fixup offset to be relative to the fixed part's start.
* HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
associated instruction for a fixup. We have to add a `const MCFragment &` parameter.
* In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to
FT_Relaxable as otherwise there would be many more FT_DwarfFrame
fragments in -g compilations.
https://llvm-compile-time-tracker.com/compare.php?from=28e1473e8e523150914e8c7ea50b44fb0d2a8d65&to=778d68ad1d48e7f111ea853dd249912c601bee89&stat=instructions:u
```
stage2-O0-g instructins:u geomeon (-0.07%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
```
```
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
```
Pull Request: https://github.com/llvm/llvm-project/pull/148544
Adds support for emitting Windows x64 Unwind V2 information, includes
support `/d2epilogunwind` in clang-cl.
Unwind v2 adds information about the epilogs in functions such that the
unwinder can unwind even in the middle of an epilog, without having to
disassembly the function to see what has or has not been cleaned up.
Unwind v2 requires that all epilogs are in "canonical" form:
* If there was a stack allocation (fixed or dynamic) in the prolog, then
the first instruction in the epilog must be a stack deallocation.
* Next, for each `PUSH` in the prolog there must be a corresponding
`POP` instruction in exact reverse order.
* Finally, the epilog must end with the terminator.
This change adds a pass to validate epilogs in modules that have Unwind
v2 enabled and, if they pass, emits new pseudo instructions to MC that
1) note that the function is using unwind v2 and 2) mark the start of
the epilog (this is either the first `POP` if there is one, otherwise
the terminator instruction). If a function does not meet these
requirements, it is downgraded to Unwind v1 (i.e., these new pseudo
instructions are not emitted).
Note that the unwind v2 table only marks the size of the epilog in the
"header" unwind code, but it's possible for epilogs to use different
terminator instructions thus they are not all the same size. As a work
around for this, MC will assume that all terminator instructions are
1-byte long - this still works correctly with the Windows unwinder as it
is only using the size to do a range check to see if a thread is in an
epilog or not, and since the instruction pointer will never be in the
middle of an instruction and the terminator is always at the end of an
epilog the range check will function correctly. This does mean, however,
that the "at end" optimization (where an epilog unwind code can be
elided if the last epilog is at the end of the function) can only be
used if the terminator is 1-byte long.
One other complication with the implementation is that the unwind table
for a function is emitted during streaming, however we can't calculate
the distance between an epilog and the end of the function at that time
as layout hasn't been completed yet (thus some instructions may be
relaxed). To work around this, epilog unwind codes are emitted via a
fixup. This also means that we can't pre-emptively downgrade a function
to Unwind v1 if one of these offsets is too large, so instead we raise
an error (but I've passed through the location information, so the user
will know which of their functions is problematic).
This should fix#66912. When emitting SEH unwind info, we need to be
able to calculate the exact length of functions before alignments are
fixed. Until that limitation is overcome, just disable all loop
alignment on Windows targets.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
I don't think compiler-generated code could actually be affected by
this, but better to be thorough.
Differential Revision: https://reviews.llvm.org/D139048
Representing this as 12 separate operations is a bit ugly, but
trying to represent the different modes using a bitfield seemed worse.
Differential Revision: https://reviews.llvm.org/D135417
This matches what was done for the ARM implementation (where getting
the instruction sizes right is even more tricky, and hence needed
tighter testing).
This will allow catching any future cases where prologs and epilogs
don't match the instructions within them.
Differential Revision: https://reviews.llvm.org/D131394
This new opcode was initially documented as "pac_sign_return_address"
in https://github.com/MicrosoftDocs/cpp-docs/pull/4202, but was
soon afterwards renamed into "pac_sign_lr" in
https://github.com/MicrosoftDocs/cpp-docs/pull/4209, as the other
name was unwieldy, and there were no other external references to
that name anywhere.
Rename our external .seh assembler directive - it hasn't been merged
for very long yet, so there's probably no external use to account for.
Rename all other internal references to the opcode similarly.
Differential Revision: https://reviews.llvm.org/D135762
Exclude the terminating end opcode from the epilog - it doesn't
correspond to an actual instruction that is included in the epilog
itself (within the .seh_startepilogue/.seh_endepilogue range).
In most (all?) cases, an epilog is followed by a matching terminating
instruction though (a ret or a branch to a tail call), but it's not
strictly within the .seh_startepilogue/.seh_endepilogue range.
This fixes a number of failed asserts in cases where the codegen
has incorrectly reoredered SEH opcodes so they don't match up
exactly with their instructions.
However this still just avoids failing the assertion; the root cause
of generating unexpected epilogs is still present (and fixing that is
a less obvious issue).
Differential Revision: https://reviews.llvm.org/D131393
Create function segments and emit unwind info of them.
A segment must be less than 1MB and no prolog or epilog is splitted between two
segments.
This patch should generate correct, though not optimal, unwind info for large
functions. Currently it only generate pacted info (.pdata) only for functions
that are less than 1MB (single-segment functions). This is NFC from before this
patch.
The next step is to enable (.pdata) only unwind info for the first segment or
segments that have neither prolog or epilog in a multi-segment function.
Another future work item is to further split segments that require more than 255
code words or have more than 65535 epilogs.
Reference:
https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling#function-fragments
Differential Revision: https://reviews.llvm.org/D130049
It's a fairly common issue that the generating code incorrectly marks
instructions as narrow or wide; check that the instruction lengths
add up to the expected value, and error out if it doesn't. This allows
catching code generation bugs.
Also check that prologs and epilogs are properly terminated, to
catch other code generation issues.
Differential Revision: https://reviews.llvm.org/D125647
This includes .seh_* directives for generating it from assembly.
It is designed fairly similarly to the ARM64 handling.
For .seh_handler directives, such as
".seh_handler __C_specific_handler, @except" (which is supported
on x86_64 and aarch64 so far), the "@except" bit doesn't work in
ARM assembly, as '@' is used as a comment character (on all current
platforms).
Allow using '%' instead of '@' for this purpose. This convention
is used by GAS in similar contexts already,
e.g. [1]:
Note on targets where the @ character is the start of a comment
(eg ARM) then another character is used instead. For example the
ARM port uses the % character.
In practice, this unfortunately means that all such .seh_handler
directives will need ifdefs for ARM.
Contrary to ARM64, on ARM, it's quite common that we can't evaluate
e.g. the function length at this point, due to instructions whose
length is finalized later. (Also, inline jump tables end with
a ".p2align 1".)
If unable to to evaluate the function length immediately, emit
it as an MCExpr instead. If we'd implement splitting the unwind
info for a function (which isn't implemented for ARM64 yet either),
we wouldn't know whether we need to split it though.
Avoid calling getFrameIndexOffset() on an unset
FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the
preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once
MSVC exception handling is fully implemented, those changes
can be reverted.)
[1] https://sourceware.org/binutils/docs/as/Section.html#Section
Differential Revision: https://reviews.llvm.org/D125645
For ARM SEH, the epilogs will need a little more associated data than
just the plain list of opcodes.
This is a preparatory refactoring for D125645.
Differential Revision: https://reviews.llvm.org/D125879
There's an inconsistency regarding the epilogs of packed ARM64
unwind info with homed parameters; according to the documentation
(and according to common sense), the epilog wouldn't have a series
of nop instructions matching the stp x0-x7 in the prolog - however
in practice, RtlVirtualUnwind still seems to behave as if the epilog
does have the mirrored nops from the prolog.
In practice, MSVC doesn't seem to produce packed unwind info with
homed parameters, which might be why this inconsistency hasn't
been noticed.
Thus, to play it safe, avoid creating such packed unwind info with
homed parameters. (LLVM's current behaviour matches the current
runtime behaviour of RtlVirtualUnwind, but if it later is bug fixed
to match the documentation, such unwind information would be
incorrect.)
See https://github.com/llvm/llvm-project/issues/54879 for further
discussion on the matter.
Differential Revision: https://reviews.llvm.org/D125876
This allows sharing opcodes between prolog and epilog even when there
is more than one epilog.
I didn't make any handcrafted special MC level testcases for this (yet
at least), but it does seem to have the expected effect on two existing
CodeGen level testcases.
Differential Revision: https://reviews.llvm.org/D125619
The "packed epilog" form only implies that the epilog is located
exactly at the end of the function (so the location of the epilog
is implicit from the epilog opcodes), but it doesn't have to share
opcodes with the prolog - as long as the total number of opcode
bytes and the offset to the epilog fit within the bitfields.
This avoids writing a 4 byte epilog scope in many cases. (I haven't
measured how much this shrinks actual xdata sections in practice
though.)
Differential Revision: https://reviews.llvm.org/D125536
operator== and operator!= were added in
1308bb99e06752ab0b5175c92da31083f91af921 / D87369, but this existing
codepath wasn't updated to use them.
Also fix the indentation of the enclosed liens.
Differential Revision: https://reviews.llvm.org/D125368
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
Previously the relocations pointed at the public user facing,
possibly external symbol.
When the function itself is weak, that symbol may be overridden at
link time, pointing at another strong implementation of the same
function instead. In that case, there's two conflicting pdata entries
pointing at the same address, and the wrong unwind info might end up
used.
Both GCC/binutils and MSVC produce pdata pointing at internal static
symbols. (GCC/binutils point at the .text section just as LLVM does
after this change, MSVC points at special label type symbols with the
type IMAGE_SYM_CLASS_LABEL and names like '$LN4'.)
This fixes unwinding through an overridden "operator new" with a
statically linked C++ library in MinGW mode. (Building libc++ with
-ffunction-sections and linking with --gc-sections might avoid the
issue too.)
This makes the produced object files a little less user friendly
to debug, but with other recent improvements for llvm-readobj, the
unwind info debugging experience should be pretty much the same.
Differential Revision: https://reviews.llvm.org/D109651
This reapplies 36c64af9d7f97414d48681b74352c9684077259b in updated
form.
Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)
The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.
Differential Revision: https://reviews.llvm.org/D87448
In practice, this only gives modest savings (for a 6.5 MB DLL with
230 KB xdata, the xdata sections shrinks by around 2.5 KB); to
gain more, the frame lowering would need to be tweaked to more often
generate frame layouts that match the canonical layouts that can
be written in packed form.
Differential Revision: https://reviews.llvm.org/D87371
This gives a pretty substantial size reduction; for a 6.5 MB
DLL with 300 KB .xdata, the .xdata shrinks by 66 KB.
Differential Revision: https://reviews.llvm.org/D87369