Reland #150574 with a MCStreamer::changeSection change:
In Mach-O, DWARF sections use Begin as a temporary label, requiring a label
definition, unlike section symbols in other file formats.
(Tested by dec978036ef1037753e7de5b78c978e71c49217b)
---
13a79bbfe583e1d8cc85d241b580907260065eb8 (2017) introduced fragment
creation in MCContext for createELFSectionImpl, which was inappropriate.
Fragments should only be created when using MCStreamer, not during
`MCContext::get*Section` calls.
`initMachOMCObjectFileInfo` defines multiple sections, some of which may
not be used by the code generator. This caused symbol names matching
these sections to be incorrectly marked as undefined (see
https://reviews.llvm.org/D55173).
The fragment code was later replicated in other file formats, such as
WebAssembly (see https://reviews.llvm.org/D46561), XCOFF, and GOFF.
This patch fixes the problem by moving initial fragment allocation from
MCContext::createSection to MCStreamer::changeSection.
While MCContext still creates a section symbol, the symbol is not
attached to the initial fragment. In addition,
* Move `emitLabel`/`setFragment` from `switchSection*` and
overridden changeSection to `MCObjectStreamer::changeSection` for
consistency.
* De-virtualize `switchSectionNoPrint`.
* test/CodeGen/XCore/section-name.ll now passes. XCore doesn't support
MCObjectStreamer. I don't think the MCAsmStreamer output behavior
change matters.
Pull Request: https://github.com/llvm/llvm-project/pull/150574
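A minimal sketch of the idea, using toy stand-ins rather than the real MC classes: `ToyContext`, `ToyStreamer`, `ToySection`, `ToySymbol` and `ToyFragment` are hypothetical names, and the real code paths are more involved. The context creates the section and its symbol only; the first `changeSection` allocates the initial fragment and attaches the symbol to it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins; none of these are the real LLVM MC classes.
struct ToyFragment {};

struct ToySymbol {
  std::string Name;
  ToyFragment *Frag = nullptr; // null means "not attached to a fragment yet"
  bool isDefined() const { return Frag != nullptr; }
};

struct ToySection {
  std::string Name;
  ToySymbol Begin;                                 // section symbol
  std::vector<std::unique_ptr<ToyFragment>> Frags; // empty until first use
};

class ToyContext {
  std::map<std::string, std::unique_ptr<ToySection>> Sections;

public:
  // Creates the section and its symbol, but allocates no fragment.
  ToySection &getSection(const std::string &Name) {
    auto [It, Inserted] = Sections.try_emplace(Name);
    if (Inserted) {
      It->second = std::make_unique<ToySection>();
      It->second->Name = Name;
      It->second->Begin.Name = Name;
    }
    return *It->second;
  }
};

class ToyStreamer {
  ToySection *Cur = nullptr;

public:
  // The first switch to a section allocates its initial fragment and
  // attaches the section symbol to it.
  void changeSection(ToySection &S) {
    if (S.Frags.empty()) {
      S.Frags.push_back(std::make_unique<ToyFragment>());
      S.Begin.Frag = S.Frags.front().get();
    }
    assert(S.Begin.isDefined() && "section symbol must be attached by now");
    Cur = &S;
  }
  ToySection *currentSection() const { return Cur; }
};
```

In this model a section that is registered but never switched to keeps an empty fragment list, which is the property the patch is after.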
The object file format specific derived classes are used in context like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
13a79bbfe583e1d8cc85d241b580907260065eb8 (2017) introduced fragment
creation in MCContext for createELFSectionImpl, which was inappropriate.
Fragments should only be created when using MCStreamer, not during
`MCContext::get*Section` calls.
`initMachOMCObjectFileInfo` defines multiple sections, some of which may
not be used by the code generator. This caused symbol names matching
these sections to be incorrectly marked as undefined (see
https://reviews.llvm.org/D55173).
The fragment code was later replicated in other file formats, such as
WebAssembly (see https://reviews.llvm.org/D46561), XCOFF, and GOFF.
This patch fixes the problem by moving initial fragment allocation from
MCContext::createSection to MCStreamer::changeSection.
While MCContext still creates a section symbol, the symbol is not
attached to the initial fragment.
In addition, move `emitLabel`/`setFragment` from `switchSection*` and
overridden changeSection to `MCObjectStreamer::changeSection` for
consistency.
* test/CodeGen/XCore/section-name.ll now passes. XCore doesn't support
MCObjectStreamer. I don't think the MCAsmStreamer output behavior
change matters.
Pull Request: https://github.com/llvm/llvm-project/pull/150574
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
While we are at it, this patch simplifies the code with structured
binding.
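For illustration, the same pattern with a standard container (`std::map` also provides `try_emplace` since C++17). `Symbol` and `getOrCreate` are hypothetical names for this sketch, not the code touched by the patch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Symbol {
  std::string Name;
};

using SymbolTable = std::map<std::string, std::unique_ptr<Symbol>>;

// Returns the symbol for Name, creating it on first use.
Symbol *getOrCreate(SymbolTable &Table, const std::string &Name) {
  // try_emplace(Key) value-initializes the mapped value (a null pointer
  // here), so there is no need to spell out insert({Key, nullptr}).
  auto [It, Inserted] = Table.try_emplace(Name);
  if (Inserted)
    It->second = std::make_unique<Symbol>(Symbol{Name}); // overwritten anyway
  return It->second.get();
}
```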
Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`,
previously encoded as
`MCDataFragment(push rax); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`,
to
```
MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
```
Changes:
* Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
* The new MCFragment contains a fixed-size content (similar to previous
MCDataFragment) and an optional variable-size tail.
* The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
FT_DwarfFrame, with plans to extend to other fragment types.
dyn_cast/isa should be avoided for the converted fragment subclasses.
* In `setVarFixups`, source fixup offsets are relative to the variable part's start.
Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start.
A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`,
expecting the fixup offset to be relative to the fixed part's start.
* HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
associated instruction for a fixup. We have to add a `const MCFragment &` parameter.
* In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to
FT_Relaxable as otherwise there would be many more FT_DwarfFrame
fragments in -g compilations.
https://llvm-compile-time-tracker.com/compare.php?from=28e1473e8e523150914e8c7ea50b44fb0d2a8d65&to=778d68ad1d48e7f111ea853dd249912c601bee89&stat=instructions:u
```
stage2-O0-g instructions:u geomean (-0.07%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
```
```
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
```
Pull Request: https://github.com/llvm/llvm-project/pull/148544
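The offset convention for `setVarFixups` can be sketched with a toy fragment. `ToyFragment`, `ToyFixup` and `fixupAddress` are hypothetical and only model the arithmetic described above; the real MCFragment stores its content and fixups differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyFixup {
  uint32_t Offset;
};

struct ToyFragment {
  std::vector<uint8_t> Fixed; // fixed-size content, e.g. `push rax`
  std::vector<uint8_t> Var;   // optional variable-size tail, e.g. `jmp foo`
  std::vector<ToyFixup> VarFixups;

  // Incoming offsets are relative to the start of the variable part;
  // stored offsets are relative to the start of the fixed part.
  void setVarFixups(const std::vector<ToyFixup> &Src) {
    VarFixups.clear();
    for (ToyFixup F : Src) {
      F.Offset += static_cast<uint32_t>(Fixed.size());
      VarFixups.push_back(F);
    }
  }
};

// Models the common `getFragmentOffset(Frag) + Fixup.getOffset()` pattern,
// which expects the fixup offset to be relative to the fixed part's start.
uint64_t fixupAddress(uint64_t FragOffset, const ToyFixup &F) {
  return FragOffset + F.Offset;
}
```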
... due to their close relationship. MCSection's inline functions (e.g.
iterator) access MCFragment, and we want MCFragment's inline functions
to access MCSection similarly (#146307).
Pull Request: https://github.com/llvm/llvm-project/pull/146315
Sections which are not allowed to carry data are marked as virtual. The
only complication when writing out the text is that it must be written in
chunks of 32k-1 bytes, which is done by having a wrapper stream write
those records.
Data of BSS sections is not written, since the contents are known to be
zero. Instead, the fill byte value is used.
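A rough sketch of the chunking step only, assuming "32k-1" means 32 * 1024 - 1 = 32767 bytes. `splitIntoChunks` and `MaxChunk` are hypothetical names; the actual writer emits GOFF records through a wrapper stream rather than building vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: "32k-1" is read as 32 * 1024 - 1 bytes per chunk.
constexpr std::size_t MaxChunk = 32 * 1024 - 1;

// Splits Data into consecutive chunks of at most Limit bytes each.
std::vector<std::vector<uint8_t>>
splitIntoChunks(const std::vector<uint8_t> &Data,
                std::size_t Limit = MaxChunk) {
  assert(Limit > 0 && "chunk limit must be positive");
  std::vector<std::vector<uint8_t>> Chunks;
  for (std::size_t Pos = 0; Pos < Data.size(); Pos += Limit) {
    std::size_t Len = std::min(Limit, Data.size() - Pos);
    Chunks.emplace_back(Data.begin() + Pos, Data.begin() + Pos + Len);
  }
  return Chunks;
}
```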
Unlike other formats, the GOFF object file format uses a two-dimensional structure
to define the location of data. For example, the equivalent of the ELF .text
section is made up of a Section Definition (SD) and a class (Element Definition;
ED). The name of the SD symbol depends on the application, while the class has
the predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64, respectively.
Data can be placed into this structure in two ways. First, the data (in a text
record) can be associated with an ED symbol. To refer to data, a Label
Definition (LD) is used to give an offset into the data a name. When binding,
the whole data is pulled into the resulting executable, and the addresses
given by the LD symbols are resolved.
The alternative is to use a Part Definition (PR). In this case, the data (in
a text record) is associated with the part. When binding, only the data of
referenced PRs is pulled into the resulting binary.
Both approaches are used. SD, ED, and PR elements are modeled by nested
MCSectionGOFF instances, while LD elements are associated with MCSymbolGOFF
instances.
At the binary level, a record called "External Symbol Definition" (ESD) is used. The
ESD has a type (SD, ED, PR, LD), and depending on the type a different subset of
the fields is used.
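The nesting can be pictured with a toy model. This is a sketch under the assumption that an ED sits inside an SD and a PR sits inside an ED; `ToyGOFFSection`, `ESDType` and `isWellNested` are hypothetical and do not reflect the actual MCSectionGOFF interface.

```cpp
#include <cassert>
#include <string>

enum class ESDType { SD, ED, PR };

struct ToyGOFFSection {
  ESDType Type;
  std::string Name;
  const ToyGOFFSection *Parent; // enclosing element, null for an SD
};

// Checks the nesting assumed here: SD at the top, ED inside an SD,
// PR inside an ED.
bool isWellNested(const ToyGOFFSection &S) {
  switch (S.Type) {
  case ESDType::SD:
    return S.Parent == nullptr;
  case ESDType::ED:
    return S.Parent && S.Parent->Type == ESDType::SD &&
           isWellNested(*S.Parent);
  case ESDType::PR:
    return S.Parent && S.Parent->Type == ESDType::ED &&
           isWellNested(*S.Parent);
  }
  return false;
}
```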
When MCContext is used for the second compile of
llvm/test/MC/ELF/twice.ll, ensure that .rel.text and .rel.eh_frame
strings do not come from the previous compilation copies.
GNU Assembler supports symbol reassignment via .set, .equ, or =.
However, LLVM's integrated assembler only allows reassignment for
MCConstantExpr cases, as it struggles with scenarios like:
```
.data
.set x, 0
.long x // reference the first instance
x = .-.data
.long x // reference the second instance
.set x,.-.data
.long x // reference the third instance
```
Between two assignments, we cannot ensure that a reference binds
to the earlier assignment. We use MCSymbol::IsUsed and other conditions
to reject potentially unsafe reassignments, but certain MCConstantExpr
uses could be unsafe as well.
This patch enables reassignment by cloning the symbol upon reassignment
and updating the symbol table. Existing references to the original
symbol remain unchanged, and the original symbol is excluded from the
emitted symbol table.
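The cloning approach can be modeled roughly as follows. `ToySymbolTable` and `ToySymbol` are hypothetical; the sketch only shows that earlier references keep pointing at the old symbol while the name now resolves to the clone.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct ToySymbol {
  std::string Name;
  int64_t Value = 0;
  bool InSymtab = true; // false once superseded by a clone
};

class ToySymbolTable {
  std::vector<std::unique_ptr<ToySymbol>> Storage; // keeps old symbols alive
  std::map<std::string, ToySymbol *> ByName;       // name -> latest symbol

public:
  // Defines Name, or reassigns it by creating a fresh symbol object.
  ToySymbol *assign(const std::string &Name, int64_t Value) {
    auto [It, Inserted] = ByName.try_emplace(Name);
    if (!Inserted)
      It->second->InSymtab = false; // old symbol is kept but not emitted
    Storage.push_back(std::make_unique<ToySymbol>());
    ToySymbol *Sym = Storage.back().get();
    Sym->Name = Name;
    Sym->Value = Value;
    It->second = Sym;
    return Sym;
  }

  // Returns the symbol Name currently resolves to, or null.
  ToySymbol *lookup(const std::string &Name) const {
    auto It = ByName.find(Name);
    return It == ByName.end() ? nullptr : It->second;
  }
};
```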
try_emplace is much shorter and simpler if we are
default-constructing the value.
While I'm at it, this patch uses structured bindings to receive the
return value from try_emplace.
gas has supported " quoted symbols since 2015.
Both \ and " need to be escaped.
https://sourceware.org/pipermail/binutils/2015-August/090003.html
We don't unescape \\ or \" in assembly strings, leading to clang -c
--save-temps vs clang -c difference for the following C code:
```
int x asm("a\"\\b");
```
Fix #138390
MC/COFF/safeseh.h looks incorrect. \01 in `.safeseh "\01foo"` is not a
correct escape sequence. Change it to \\
Pull Request: https://github.com/llvm/llvm-project/pull/138817
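A small sketch of the escaping rule for quoted symbol names, covering only `\` and `"`. `quoteSymbol` and `unquoteSymbol` are hypothetical helpers, not the functions changed by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Wraps Name in double quotes, escaping backslash and double quote.
std::string quoteSymbol(const std::string &Name) {
  std::string Out = "\"";
  for (char C : Name) {
    if (C == '\\' || C == '"')
      Out += '\\';
    Out += C;
  }
  Out += '"';
  return Out;
}

// Reverses quoteSymbol: strips the outer quotes and turns \\ and \" back
// into \ and ". Any other backslash sequence is kept unchanged.
std::string unquoteSymbol(const std::string &Quoted) {
  assert(Quoted.size() >= 2 && Quoted.front() == '"' && Quoted.back() == '"');
  std::string Out;
  for (std::size_t I = 1, E = Quoted.size() - 1; I < E; ++I) {
    char C = Quoted[I];
    if (C == '\\' && I + 1 < E &&
        (Quoted[I + 1] == '\\' || Quoted[I + 1] == '"'))
      C = Quoted[++I];
    Out += C;
  }
  return Out;
}
```

With both directions in place, printing a name and parsing it back yields the original name, which is the round trip that `--save-temps` relies on.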
They have the same semantics. NonUniqueID is friendlier to the isUnique
implementation in MCSectionELF.
History: 97837b7 added support for unique IDs in sections and added
GenericSectionID. Later, 1dc16c7 added NonUniqueID.
The check for `isOSWindows() || isUEFI()` is used in several places
across the codebase. Introduce `isOSWindowsOrUEFI()` in Triple.h
to simplify these checks.
Currently, the WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:
1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
- [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
- [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested
See each commit message for more details and rationale.
This is part of the effort to add code coverage support in the Wasm target
of the Swift toolchain.
Add `createLocalSymbol` to create a local, non-temporary symbol.
Different from `createRenamableSymbol`, the `Used` bit is ignored,
therefore multiple local symbols might share the same name.
Utilizing `createLocalSymbol` in AArch64 allows for efficient mapping
symbol creation with non-unique names, saving .strtab space.
The behavior matches GNU assembler.
Pull Request: https://github.com/llvm/llvm-project/pull/99836
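The difference between the two creation paths, modeled with a toy context. `ToyContext` is hypothetical, and the counter-suffix renaming scheme is an assumption for illustration only; the real `createRenamableSymbol` may choose names differently.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

struct ToySymbol {
  std::string Name;
};

class ToyContext {
  std::deque<ToySymbol> Storage;   // deque keeps element addresses stable
  std::set<std::string> UsedNames; // names already handed out

public:
  // Picks a fresh name by appending a counter when Name is already used.
  ToySymbol *createRenamableSymbol(const std::string &Name) {
    std::string Unique = Name;
    for (unsigned I = 0; !UsedNames.insert(Unique).second; ++I)
      Unique = Name + std::to_string(I);
    Storage.push_back({Unique});
    return &Storage.back();
  }

  // Ignores the used-name set, so several symbols may share one name and
  // the string table only needs to hold that name once.
  ToySymbol *createLocalSymbol(const std::string &Name) {
    Storage.push_back({Name});
    return &Storage.back();
  }
};
```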
This was introduced in dcb71c06c7b059e313f22e46bc9c41343a03f1eb to help
migrate away from raw `operator new` and refactor the fragment
representation.
This is now unneeded after `MCStreamer::CurFrag` and
`MCSection::CurFragList` refactoring.
13a79bbfe583e1d8cc85d241b580907260065eb8 (2017) unified `BeginSymbol` and
section symbol for ELF. This patch does the same for COFF.
* In getCOFFSection, all sections now have a `BeginSymbol` (section
symbol). We do not need a dummy symbol name when `getBeginSymbol` is
needed (used by AsmParser::Run and DWARF generation).
* Section symbols are in the global symbol table. `call .text` will
reference the section symbol instead of an undefined symbol. This
matches GNU assembler. Unlike GNU, redefining the section symbol will
cause a "symbol 'foo0' is already defined" error (see
`section-sym-err.s`).
Pull Request: https://github.com/llvm/llvm-project/pull/96459
MCAssembler::layout ensures that every section has at least one
fragment, which simplifies MCAsmLayout::getSectionAddressSize (see
e73353c7201a3080851d99a16f5fe2c17f7697c6 from 2010). It's better to
ensure the condition is satisfied at create time (COFF, GOFF, Mach-O) to
simplify more fragment processing.
Follow-up to 05ba5c0648ae5e80d5afce270495bf3b1eef9af4. uint32_t is
preferred over const MCExpr * in the section stack uses because it
should only be evaluated once. Change the parameter type to match.
#95197 and 75006466296ed4b0f845cbbec4bf77c21de43b40 eliminated all raw
`new MCXXXFragment`. We can now place fragments in a bump allocator.
In addition, remove the dead `Kind == FragmentType(~0)` condition.
~CodeViewContext may call `StrTabFragment->destroy()`, so the
CodeViewContext needs to be reset before `FragmentAllocator.Reset()`.
Tested by llvm/test/MC/COFF/cv-compiler-info.ll using asan.
Pull Request: https://github.com/llvm/llvm-project/pull/96402
Internal label names never occur in the symbol table, so when using an
object streamer, there's no point in constructing these names and then
adding them to hash tables -- they are never visible in the output.
It's not possible to reuse createTempSymbol, because BPF has a
different prefix for globals and basic blocks right now.
Previously, a symbol insertion required (at least) three hash table
operations:
- Lookup/create entry in Symbols (main symbol table)
- Lookup NextUniqueID to deduplicate identical temporary labels
- Add entry to UsedNames, which is also used to serve as storage for the
symbol name in the MCSymbol.
All three lookups are done with the same name, so combining these into a
single table reduces the number of lookups to one. Thus, a pointer to a
symbol table entry can be passed to createSymbol to avoid a duplicate
lookup of the same name.
The new symbol table entry value is placed in a separate header to avoid
including MCContext in MCSymbol or vice versa.
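A rough model of the combined table, assuming a node-based map whose keys also serve as name storage. `ToyEntry`, `ToySymbol` and `ToyContext` are hypothetical and the real entry layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

struct ToySymbol;

// One entry carries what used to live in three separate tables.
struct ToyEntry {
  ToySymbol *Sym = nullptr;  // the symbol, if one was created
  unsigned NextUniqueID = 0; // suffix counter for temporary labels (unused here)
  bool Used = false;         // the name has been handed out
};

using ToyTable = std::unordered_map<std::string, ToyEntry>;

struct ToySymbol {
  const ToyTable::value_type *Entry; // the name lives in the table key
  const std::string &name() const { return Entry->first; }
};

class ToyContext {
  ToyTable Table;
  std::deque<ToySymbol> Storage; // deque keeps element addresses stable

public:
  ToySymbol *getOrCreateSymbol(const std::string &Name) {
    auto It = Table.try_emplace(Name).first; // the only hash lookup
    ToyEntry &E = It->second;
    if (!E.Sym) {
      // Pass the entry along so symbol creation needs no second lookup.
      Storage.push_back(ToySymbol{&*It});
      E.Sym = &Storage.back();
      E.Used = true;
    }
    return E.Sym;
  }
  std::size_t numEntries() const { return Table.size(); }
};
```

`std::unordered_map` is used here because pointers to its elements stay valid across rehashing, so a symbol can safely point at its own entry.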
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.
ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.
This allows removing the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
`allocFragment` might be changed to a placement new when the allocation
strategy changes.
`allocInitialFragment` is to deduplicate the following pattern
```
auto *F = new MCDataFragment();
Result->addFragment(*F);
F->setParent(Result);
```
Pull Request: https://github.com/llvm/llvm-project/pull/95197
Also delete `AllowTemporaryLabels = true` from MCContext::reset: when
llc supports -save-temp-labels in the next change, this assignment
should be removed to support -compile-twice.
This avoids std::map, which is slow, and uses a StringMap. Section name,
group name, linked-to name and unique id are encoded into the key for
fast lookup.
This gives a measurable performance boost for applications that compile
many small object files (e.g., functions in JIT compilers).
---
Now also the second case works properly. That's what happens when you do
that last refactoring without re-running all tests... sorry.
There's only one way to create unnamed symbols (createTempSymbol).
Previously, the name was evaluated unconditionally, but often
unnecessarily. Avoid this.
Also the parameter names in the header were wrong, fix these.
Reverts llvm/llvm-project#95006
Seems like there's some bug where the section name is empty in the `if
(!Section.isSingleStringRef())`. Revert for now to get builds back to
green.
This avoids std::map, which is slow, and uses a StringMap. Section name,
group name, linked-to name and unique id are encoded into the key for
fast lookup.
This gives a measurable performance boost (>3%) for applications that
compile many small object files (e.g., functions in JIT compilers).
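One way to fold the four fields into a single string key is sketched below. The length-prefixed layout and the name `makeSectionKey` are assumptions for illustration; the encoding used by the patch may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Builds one lookup key from the four identifying fields. Each string is
// prefixed with its length so that ("a", "bc") and ("ab", "c") cannot
// collide.
std::string makeSectionKey(const std::string &Section,
                           const std::string &Group,
                           const std::string &LinkedTo, uint32_t UniqueID) {
  std::string Key;
  auto Append = [&Key](const std::string &S) {
    Key += std::to_string(S.size());
    Key += ':';
    Key += S;
  };
  Append(Section);
  Append(Group);
  Append(LinkedTo);
  Key += std::to_string(UniqueID);
  return Key;
}
```

A single hash lookup on this key then replaces an ordered-map search that compares the fields one by one.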
There is no need for an ordered std::map and also no need to duplicate
the section name, which is owned by the ELFSectionKey. Therefore, use a
DenseMap instead and don't copy the string. As a further, minor
performance optimization, avoid the hash table lookup in
isELFGenericMergeableSection when the section name was just added.
This slightly improves compilation performance in our application, where
we occasionally compile many small object files.
Fixes #85578, a use-after-free caused by some `MCSymbolWasm` data being
freed too early.
Previously, `WebAssemblyAsmParser` owned the data that is moved to
`MCContext` by this PR, which caused problems when handling module ASM,
because the ASM parser was destroyed after parsing the module ASM, but
the symbols persisted.
The added test passes locally with an LLVM build with AddressSanitizer
enabled.
Implementation notes:
* I've called the added method
<code>allocate<b><i>Generic</i></b>String</code> and added the second
paragraph of its documentation to maybe guide people a bit on when to
use this method (based on my (limited) understanding of the `MCContext`
class). We could also just call it `allocateString` and remove that
second paragraph.
* The added `createWasmSignature` method does not support taking the
return and parameter types as arguments: Specifying them afterwards is
barely any longer and prevents them from being accidentally specified in
the wrong order.
* This removes a _"TODO: Do the uniquing of Signatures here instead of
ObjectFileWriter?"_ since the field it's attached to is also removed.
Let me know if you think that TODO should be preserved somewhere.