451 Commits

Author SHA1 Message Date
Daniil Kovalev
cca9115b1c
[lld][AArch64][ELF][PAC] Support AUTH relocations and AUTH ELF marking (#72714)
This patch adds lld support for:

- Dynamic R_AARCH64_AUTH_* relocations (without including RELR compressed AUTH
relocations) as described here:
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#auth-variant-dynamic-relocations

- .note.AARCH64-PAUTH-ABI-tag section as defined here
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#elf-marking

Depends on #72713 and #85231

---------

Co-authored-by: Peter Collingbourne <peter@pcc.me.uk>
Co-authored-by: Fangrui Song <i@maskray.me>
2024-04-04 12:38:09 +03:00
Fangrui Song
dd7151094f [ELF] Simplify parseArmCMSEImportLib. NFC 2024-03-25 15:46:52 -07:00
Fangrui Song
36146d2b6b [ELF] Make LinkerDrive::link a template. NFC
This avoids many invokeELFT in `link`.
2024-03-19 17:12:40 -07:00
Fangrui Song
e115c00565
[ELF] Reject certain unknown section types (#85173)
Unknown section sections may require special linking rules, and
rejecting such sections for older linkers may be desired. For example,
if we introduce a new section type to replace a control structure (e.g.
relocations), it would be nice for older linkers to reject the new
section type. GNU ld allows certain unknown section types:

* [SHT_LOUSER,SHT_HIUSER] and non-SHF_ALLOC
* [SHT_LOOS,SHT_HIOS] and non-SHF_OS_NONCONFORMING

but reports errors and stops linking for others (unless
--no-warn-mismatch is specified). Port its behavior. For convenience, we
additionally allow all [SHT_LOPROC,SHT_HIPROC] types so that we don't
have to hard code all known types for each processor.

Close https://github.com/llvm/llvm-project/issues/84812
2024-03-15 09:50:23 -07:00
Fangrui Song
f1ca2a0967
[ELF] Add --compress-section to compress matched non-SHF_ALLOC sections
--compress-sections <section-glib>=[none|zlib|zstd] is similar to
--compress-debug-sections but applies to broader sections without the
SHF_ALLOC flag. lld will report an error if a SHF_ALLOC section is
matched. An interesting use case is to compress `.strtab`/`.symtab`,
which consume a significant portion of the file size (15.1% for a
release build of Clang).

An older revision is available at https://reviews.llvm.org/D154641 .
This patch focuses on non-allocated sections for safety. Moving
`maybeCompress` as D154641 does not handle STT_SECTION symbols for
`-r --compress-debug-sections=zlib` (see `relocatable-section-symbol.s`
from #66804).

Since different output sections may use different compression
algorithms, we need CompressedData::type to generalize
config->compressDebugSections.

GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27452

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: https://github.com/llvm/llvm-project/pull/84855
2024-03-12 10:56:14 -07:00
Fangrui Song
78762357d4 [ELF] Support placing .lbss/.lrodata/.ldata after .bss
https://reviews.llvm.org/D150510 places .lrodata before .rodata to
minimize the number of permission transitions in the memory image.
However, this layout is less ideal for -fno-pic code (which is still
important).

Small code model -fno-pic code has R_X86_64_32S relocations with a range
of `[0,2**31)` (if we ignore the negative area). Placing `.lrodata`
earlier exerts relocation pressure on such code. Non-x86 64-bit
architectures generally have a similar `[0,2**31)` limitation if they
don't use PC-relative relocations.

If we place .lrodata later, we will need one extra PT_LOAD. Two layouts
are appealing:

* .bss/.lbss/.lrodata/.ldata (GNU ld)
* .bss/.ldata/.lbss/.lrodata

The GNU ld layout has the nice property that there is only one BSS
(except .tbss/.relro_padding). Add -z lrodata-after-bss to support
this layout.

Since a read-only PT_LOAD segment (for large data sections) may appear
after RW PT_LOAD segments. The placement of `_etext` has to be adjusted.

Pull Request: https://github.com/llvm/llvm-project/pull/81224
2024-02-20 13:59:49 -08:00
Rahman Lavaee
acec6419e8
[SHT_LLVM_BB_ADDR_MAP] Allow basic-block-sections and labels be used together by decoupling the handling of the two features. (#74128)
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
    
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.

First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.

| | |
|--|--|
| Address of the function | Function Address |
|  Number of basic blocks in this function | NumBlocks |
|  BB entry 1
|  BB entry 2
|   ...
|  BB entry #NumBlocks
    
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
    
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section section as before: the base address of the section, its
number of blocks, and BB entries for its basic block. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
|  Number of sections for this function   | NumBBRanges |
| Section 1 begin address                     | BaseAddress[1]  |
| Number of basic blocks in section 1 | NumBlocks[1]    |
| BB entries for Section 1
|..................|
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges |
NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges
    
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
    
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
    
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.

We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handing to their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
  - New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
'-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
2024-02-01 17:50:46 -08:00
Fangrui Song
43b13341fb
[ELF] Add internal InputFile (#78944)
Based on https://reviews.llvm.org/D45375 . Introduce a new InputFile
kind `InternalKind`, use it for

* `ctx.internalFile`: for linker-defined symbols and some synthesized
`Undefined`
* `createInternalFile`: for symbol assignments and --defsym

I picked "internal" instead of "synthetic" to avoid confusion with
SyntheticSection.

Currently a symbol's file is one of: nullptr, ObjKind, SharedKind,
BitcodeKind, BinaryKind. Now it's non-null (I plan to add an
`assert(file)` to Symbol::Symbol and change `toString(const InputFile
*)`
separately).

Debugging and error reporting gets improved. The immediate user-facing
difference is more descriptive "File" column in the --cref output. This
patch may unlock further simplification.

Currently each symbol assignment gets its own
`createInternalFile(cmd->location)`. Two symbol assignments in a linker
script do not share the same file. Making the file the same would be
nice, but would require non trivial code.
2024-01-22 09:09:46 -08:00
Kazu Hirata
d7b18d5083 Use llvm::endianness{,::little,::native} (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
llvm::support::endianness with llvm::endianness.
2023-10-09 00:54:47 -07:00
spupyrev
904b3f66f5 [ELF] A new code layout algorithm for function reordering [3a/3]
We are brining a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.
Refer to https://reviews.llvm.org/D152834 for the actual implementation of the
reordering algorithm.

This diff adds a linker option to replace the existing C^3 heuristic with CDS.
The new behavior can be turned on by passing "--use-cache-directed-sort".
(the plan is to make it default in a next diff)

**Perf-impact**
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%]

Note that function layout affects the perf the most on older machines (with
smaller instruction/iTLB caches) and when huge pages are not enabled. The impact
on newer processors with huge pages enabled is likely neutral/minor.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D152840
2023-09-26 06:24:34 -07:00
Fangrui Song
8c556b7e2b [ELF] Change --call-graph-profile-sort to accept an argument
Change the FF form --call-graph-profile-sort to --call-graph-profile-sort={none,hfsort}.
This will be extended to support llvm/lib/Transforms/Utils/CodeLayout.cpp.

--call-graph-profile-sort is not used in the wild but
--no-call-graph-profile-sort is (Chromium). Make --no-call-graph-profile-sort an
alias for --call-graph-profile-sort=none.

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D159544
2023-09-25 09:49:40 -07:00
modimo
272bd6f9cc [WPD][LLD] Add option to validate RTTI is enabled on all native types and prevent devirtualization on types with native RTTI
Discussion about this approach: https://discourse.llvm.org/t/rfc-safer-whole-program-class-hierarchy-analysis/65144/18

When enabling WPD in an environment where native binaries are present, types we want to optimize can be derived from inside these native files and devirtualizing them can lead to correctness issues. RTTI can be used as a way to determine all such types in native files and exclude them from WPD providing a safe checked way to enable WPD.

The approach is:
1. In the linker, identify if RTTI is available for all native types. If not, under `--lto-validate-all-vtables-have-type-infos` `--lto-whole-program-visibility` is automatically disabled. This is done by examining all .symtab symbols in object files and .dynsym symbols in DSOs for vtable (_ZTV) and typeinfo (_ZTI) symbols and ensuring there's always a match for every vtable symbol.
2. During thinlink, if `--lto-validate-all-vtables-have-type-infos` is set and RTTI is available for all native types, identify all typename (_ZTS) symbols via their corresponding typeinfo (_ZTI) symbols that are used natively or outside of our summary and exclude them from WPD.

Testing:
ninja check-all
large Meta service that uses boost, glog and libstdc++.so runs successfully with WPD via --lto-whole-program-visibility. Previously, native types in boost caused incorrect devirtualization that led to crashes.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D155659
2023-09-18 15:51:49 -07:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Fangrui Song
65a15a56d5 [ELF] Respect orders of symbol assignments and DEFINED (#65866)
Fix #64600: the currently implementation is minimal (see
https://reviews.llvm.org/D83758), and an assignment like
`__TEXT_REGION_ORIGIN__ = DEFINED(__TEXT_REGION_ORIGIN__) ? __TEXT_REGION_ORIGIN__ : 0;`
(used by avr-ld[1]) leads to a value of zero (default value in `declareSymbol`),
which is unexpected.

Assign orders to symbol assignments and references so that
for a script-defined symbol, the `DEFINED` results match users'
expectation. I am unclear about GNU ld's exact behavior, but this hopefully
matches its behavior in the majority of cases.

[1]: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/scripttempl/avr.sc
2023-09-11 10:54:49 -07:00
Shoaib Meenai
97e39f96c8 [ELF] Add -Bsymbolic-non-weak
This adds a new -Bsymbolic option that directly binds all non-weak
symbols. There's a couple of reasons motivating this:
* The new flag will match the default behavior on Mach-O, so you can get
  consistent behavior across platforms.
* We have use cases for which making weak data preemptible is useful,
  but we don't want to pessimize access to non-weak data. (For a large
  internal app, we measured 2000+ data symbols whose accesses would be
  unnecessarily pessimized by `-Bsymbolic-functions`.)

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158322
2023-08-21 09:11:51 -07:00
Paul Kirth
14e3bec8fc Reland "[lld] Preliminary fat-lto-object support"
This patch adds support to lld for --fat-lto-objects. We add a new
--fat-lto-objects option to LLD, and slightly change how it chooses input
files in the driver when the option is set.

Fat LTO objects contain both LTO compatible IR, as well as generated object
code. This allows users to defer the choice of whether to use LTO or not to
link-time. This is a feature available in GCC for some time, and makes the
existing -ffat-lto-objects option functional in the same way as GCC's.

If the --fat-lto-objects option is passed to LLD and the input files are fat
object files, then the linker will chose the LTO compatible bitcode sections
embedded within the fat object and link them together using LTO. Otherwise,
standard object file linking is done using the assembly section in the object
files.

The previous version of this patch had a missing `REQUIRES: x86` line in
`fatlto.invalid.s`. Additionally, it was reported that this patch caused
a test failure in `export-dynamic-symbols.s`, however,
29112a994694baee070a2021e00f772f1913d214 disabled the
`export-dynamic-symbols.s` test on Windows due to a quotation difference
between platforms, unrelated to this patch.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D146778
2023-08-18 22:51:25 +00:00
Paul Kirth
1733d94963 Revert "[lld] Preliminary fat-lto-object support"
This reverts commit c9953d9891a6067549a78e7d07ca8eb6a7596792 and a
forward fix in 3a45b843dec1bca195884aa1c5bc56bd0e6755b4.

D14677 causes some failure on windows bots that the forward fix did not
address. Thus I'm reverting until the underlying cause can me triaged.
2023-07-20 03:37:48 +00:00
Paul Kirth
3a45b843de [lld] Preliminary fat-lto-object support
This patch adds support to lld for --fat-lto-objects. We add a new
--fat-lto-objects flag to LLD, and slightly change how it chooses input
files in the driver when the flag is set.

Fat LTO objects contain both LTO compatible IR, as well as generated object
code. This allows users to defer the choice of whether to use LTO or not to
link-time. This is a feature available in GCC for some time, and makes the
existing -ffat-lto-objects flag functional in the same way as GCC's.

If the --fat-lto-objects option is passed to LLD and the input files are fat
object files, then the linker will chose the LTO compatible bitcode sections
embedded within the fat object and link them together using LTO. Otherwise,
standard object file linking is done using the assembly section in the object
files.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Depends on D146777

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D146778
2023-07-19 23:07:42 +00:00
Matthew Voss
ab9b3c84a5 [lld] A Unified LTO Bitcode Frontend
The unified LTO pipeline creates a single LTO bitcode structure that can
be used by Thin or Full LTO. This means that the LTO mode can be chosen
at link time and that all LTO bitcode produced by the pipeline is
compatible, from an optimization perspective. This makes the behavior of
LTO a bit more predictable by normalizing the set of LTO features
supported by each LTO bitcode file.

Example usage:

clang -flto -funified-lto -fuse-ld=lld foo.c

clang -flto=thin -funified-lto -fuse-ld=lld foo.c

clang -c -flto -funified-lto foo.c  # -flto={full,thin} are identical in
terms of compilation actions
clang -flto=thin -fuse-ld=lld foo.o # pass --lto=thin to ld.lld

clang -c -flto -funified-lto foo.c clang -flto -fuse-ld=lld foo.o

The RFC discussing the details and rational for this change is here:
https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774

Differential Revision: https://reviews.llvm.org/D123805
2023-07-18 16:13:58 -07:00
Alex Brachet
67a212af4c [ELF] Make subsequent opens to auxiliary files append
Previously, the same file could be used across diagnostic options but
the file would be silently overwritten by the whichever option gets
handled last.

Differential Revision: https://reviews.llvm.org/D153873
2023-07-11 11:08:57 +00:00
Amilendra Kodithuwakku
9acbab60e5 [LLD][ELF] Cortex-M Security Extensions (CMSE) Support
This commit provides linker support for Cortex-M Security Extensions (CMSE).
The specification for this feature can be found in ARM v8-M Security Extensions:
Requirements on Development Tools.

The linker synthesizes a security gateway veneer in a special section;
`.gnu.sgstubs`, when it finds non-local symbols `__acle_se_<entry>` and `<entry>`,
defined relative to the same text section and having the same address. The
address of `<entry>` is retargeted to the starting address of the
linker-synthesized security gateway veneer in section `.gnu.sgstubs`.

In summary, the linker translates input:

```
    .text
  entry:
  __acle_se_entry:
    [entry_code]

```
into:

```
    .section .gnu.sgstubs
  entry:
    SG
    B.W __acle_se_entry

    .text
  __acle_se_entry:
    [entry_code]
```

If addresses of `__acle_se_<entry>` and `<entry>` are not equal, the linker
considers that `<entry>` already defines a secure gateway veneer so does not
synthesize one.

If `--out-implib=<out.lib>` is specified, the linker writes the list of secure
gateway veneers into a CMSE import library `<out.lib>`. The CMSE import library
will have 3 sections: `.symtab`, `.strtab`, `.shstrtab`. For every secure gateway
veneer <entry> at address `<addr>`, `.symtab` contains a `SHN_ABS` symbol `<entry>` with
value `<addr>`.

If `--in-implib=<in.lib>` is specified, the linker reads the existing CMSE import
library `<in.lib>` and preserves the entry function addresses in the resulting
executable and new import library.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D139092
2023-07-06 11:34:07 +01:00
Simi Pallipurath
f146763e07 Revert "Revert "[lld][Arm] Big Endian - Byte invariant support.""
This reverts commit d8851384c6ac2a1cea15e05228dbde5f13654e23.

Reason: Applied the fix for the Asan buildbot failures.
2023-06-22 16:10:18 +01:00
Mitch Phillips
cd116e0460 Revert "Revert "Revert "[LLD][ELF] Cortex-M Security Extensions (CMSE) Support"""
This reverts commit 9246df7049b0bb83743f860caff4221413c63de2.

Reason: This patch broke the UBSan buildbots. See more information in
the original phabricator review: https://reviews.llvm.org/D139092
2023-06-22 14:33:57 +02:00
Amilendra Kodithuwakku
9246df7049 Revert "Revert "[LLD][ELF] Cortex-M Security Extensions (CMSE) Support""
This reverts commit a685ddf1d104b3ce9d53cf420521f5aaff429630.

This relands Arm CMSE support (D139092) and fixes the GCC build bot errors.
2023-06-21 22:27:13 +01:00
Amilendra Kodithuwakku
a685ddf1d1 Revert "[LLD][ELF] Cortex-M Security Extensions (CMSE) Support"
This reverts commit c4fea3905617af89d1ad87319893e250f5b72dd6.

I am reverting this for now until I figure out how to fix
the build bot errors and warnings.

Errors:
llvm-project/lld/ELF/Arch/ARM.cpp:1300:29: error: expected primary-expression before ‘>’ token
 osec->writeHeaderTo<ELFT>(++sHdrs);

Warnings:
llvm-project/lld/ELF/Arch/ARM.cpp:1306:31: warning: left operand of comma operator has no effect [-Wunused-value]
2023-06-21 16:13:44 +01:00
Amilendra Kodithuwakku
c4fea39056 [LLD][ELF] Cortex-M Security Extensions (CMSE) Support
This commit provides linker support for Cortex-M Security Extensions (CMSE).
The specification for this feature can be found in ARM v8-M Security Extensions:
Requirements on Development Tools.

The linker synthesizes a security gateway veneer in a special section;
`.gnu.sgstubs`, when it finds non-local symbols `__acle_se_<entry>` and `<entry>`,
defined relative to the same text section and having the same address. The
address of `<entry>` is retargeted to the starting address of the
linker-synthesized security gateway veneer in section `.gnu.sgstubs`.

In summary, the linker translates input:

```
    .text
  entry:
  __acle_se_entry:
    [entry_code]

```
into:

```
    .section .gnu.sgstubs
  entry:
    SG
    B.W __acle_se_entry

    .text
  __acle_se_entry:
    [entry_code]
```

If addresses of `__acle_se_<entry>` and `<entry>` are not equal, the linker
considers that `<entry>` already defines a secure gateway veneer so does not
synthesize one.

If `--out-implib=<out.lib>` is specified, the linker writes the list of secure
gateway veneers into a CMSE import library `<out.lib>`. The CMSE import library
will have 3 sections: `.symtab`, `.strtab`, `.shstrtab`. For every secure gateway
veneer <entry> at address `<addr>`, `.symtab` contains a `SHN_ABS` symbol `<entry>` with
value `<addr>`.

If `--in-implib=<in.lib>` is specified, the linker reads the existing CMSE import
library `<in.lib>` and preserves the entry function addresses in the resulting
executable and new import library.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D139092
2023-06-21 14:47:34 +01:00
Simi Pallipurath
d8851384c6 Revert "[lld][Arm] Big Endian - Byte invariant support."
This reverts commit 8cf8956897ce9bca3176c6339077b1ca17b27abc.
2023-06-20 17:27:44 +01:00
Simi Pallipurath
8cf8956897 [lld][Arm] Big Endian - Byte invariant support.
Arm has BE8 big endian configuration called a byte-invariant(every byte has the same address on little and big-endian systems).

When in BE8 mode:
  1. Instructions are big-endian in relocatable objects but
     little-endian in executables and shared objects.
  2. Data is big-endian.
  3. The data encoding of the ELF file is ELFDATA2MSB.

To support BE8 without an ABI break for relocatable objects,the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8:
  1. The linker sets the flag EF_ARM_BE8 in the ELF header.
  2. The linker endian reverses the instructions, but not data.

This patch adds BE8 big endian support for Arm. To endian reverse the instructions we'll need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions
as half-words and ignore literal data.The only way to find these transitions precisely is by using mapping symbols. The instruction reversal will need to take place after relocation. For Arm BE8 code sections (Section has SHF_EXECINSTR flag ) we inserted a step after relocation to endian reverse the instructions. The implementation strategy i have used here is to write all sections BE32  including SyntheticSections then endian reverse all code in InputSections via mapping symbols.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D150870
2023-06-20 14:08:21 +01:00
Petr Hosek
811cbfc262 [lld][ELF] Implement –print-memory-usage
This option was introduced in GNU ld in
https://sourceware.org/legacy-ml/binutils/2015-06/msg00086.html and is
often used in embedded development. This change implements this option
in LLD matching the GNU ld output verbatim.

Differential Revision: https://reviews.llvm.org/D150644
2023-05-25 07:14:18 +00:00
Fangrui Song
39c20a63b1 [ELF] Add --remap-inputs= and --remap-inputs-file=
--remap-inputs-file= can be specified multiple times, each naming a
remap file that contains `from-glob=to-file` lines or `#`-led comments.
('=' is used a separator a la -fdebug-prefix-map=)
--remap-inputs-file= can be used to:

* replace an input file. E.g. `"*/libz.so=exp/libz.so"` can replace a resolved
  `-lz` without updating the input file list or (if used) a response file.
  When debugging an application where a bug is isolated to one single
  input file, this option gives a convenient way to test fixes.
* remove an input file with `/dev/null` (changed to `NUL` on Windows), e.g.
  `"a.o=/dev/null"`. A build system may add unneeded dependencies.
  This option gives a convenient way to test the result removing some inputs.

`--remap-inputs=a.o=aa.o` can be specified to provide one pattern without using
an extra file.
(bash/zsh process substitution is handy for specifying a pattern without using
a remap file, e.g. `--remap-inputs-file=<(printf 'a.o=aa.o')`, but it may be
unavailable in some systems. An extra file can be inconvenient for a build
system.)

Exact patterns are tested before wildcard patterns. In case of a tie, the first
patterns wins. This is an implementation detail that users should not rely on.

Co-authored-by: Marco Elver <elver@google.com>
Link: https://discourse.llvm.org/t/rfc-support-exclude-inputs/70070

Reviewed By: melver, peter.smith

Differential Revision: https://reviews.llvm.org/D148859
2023-04-26 13:18:55 -07:00
Craig Topper
85444794cd [lld][RISCV] Implement GP relaxation for R_RISCV_HI20/R_RISCV_LO12_I/R_RISCV_LO12_S.
This implements support for relaxing these relocations to use the GP
register to compute addresses of globals in the .sdata and .sbss
sections.

This feature is off by default and must be enabled by passing
--relax-gp to the linker.

The GP register might not always be the "global pointer". It can
be used for other purposes. See discussion here
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/371

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D143673
2023-04-13 10:52:15 -07:00
Ivan Tadeu Ferreira Antunes Filho
73fd9d310f [lld] Support separate native object file path in --thinlto-prefix-replace
Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:

1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.
This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).

The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"

This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.

Reviewed By: tejohnson, MaskRay

Differential Revision: https://reviews.llvm.org/D144596
2023-04-04 11:24:51 -07:00
Justin Cady
447aa48b4a [ELF] Add REVERSE input section description keyword
The `REVERSE` keyword is described here:

https://sourceware.org/bugzilla/show_bug.cgi?id=27565

It complements `SORT` by allowing the order of input sections to be reversed.

This is particularly useful for order-dependent sections such as .init_array,
where `REVERSE` can be used to either detect static initialization order fiasco
issues or as a mechanism to maintain .ctors element order while transitioning to
the modern .init_array. Such a transition is described here:

https://discourse.llvm.org/t/is-it-possible-to-manually-specify-init-array-order/68649

Differential Revision: https://reviews.llvm.org/D145381
2023-03-07 12:44:02 -05:00
Scott Linder
45ee0a9afc [LLD] Add --lto-CGO[0-3] option
Allow controlling the CodeGenOpt::Level independent of the LTO
optimization level in LLD via new options for the COFF, ELF, MachO, and
wasm frontends to lld. Most are spelled as --lto-CGO[0-3], but COFF is
spelled as -opt:lldltocgo=[0-3].

See D57422 for discussion surrounding the issue of how to set the CG opt
level. The ultimate goal is to let each function control its CG opt
level, but until then the current default means it is impossible to
specify a CG opt level lower than 2 while using LTO. This option gives
the user a means to control it for as long as it is not handled on a
per-function basis.

Reviewed By: MaskRay, #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D141970
2023-02-15 17:34:35 +00:00
Nikita Popov
d7cf7ab61c [LLD] Remove no-opaque-pointers plugin option
We always use opaque pointers. The opaque-pointers option is
retained as a no-op, same as no-lto-legacy-pass-manager.
2023-01-25 12:29:59 +01:00
serge-sans-paille
984b800a03
Move from llvm::makeArrayRef to ArrayRef deduction guides - last part
This is a follow-up to https://reviews.llvm.org/D140896, split into
several parts as it touches a lot of files.

Differential Revision: https://reviews.llvm.org/D141298
2023-01-10 11:47:43 +01:00
Fangrui Song
7d43c3ba51 IR: HotnessThreshold llvm::Optional => std::optional 2022-12-04 19:06:47 +00:00
Fangrui Song
4191fda69c [ELF] Change most llvm::Optional to std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 19:19:15 -08:00
Fangrui Song
2b153088be [ELF] Set DF_STATIC_TLS for AArch64/PPC32/PPC64 2022-10-16 12:08:08 -07:00
Fangrui Song
14f996dca8 [ELF] Move inputSections/ehInputSections into Ctx. NFC 2022-10-16 00:49:48 -07:00
Fangrui Song
f596d82385 [ELF] Move driver into ctx and remove indirection. NFC
This removes one global variable and removes GOT and unique_ptr indirection.
2022-10-01 15:12:50 -07:00
Fangrui Song
34fa860048 [ELF] Remove ctx indirection. NFC
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection. We can move other global variables into ctx without
indirection concern. In the long term we may consider passing Ctx
as a parameter to various functions and eliminate global state as
much as possible and then remove `Ctx::reset`.
2022-10-01 12:06:33 -07:00
Fangrui Song
a623a4c8b4 [ELF] Remove elf::config indirection. NFC
`config` has 1000+ uses so we try to avoid changing `config->foo`. Define a
wrapper with LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection.

My x86-64 lld executable is 11+KiB smaller.
2022-10-01 11:39:45 -07:00
Mircea Trofin
c625c17b88 [lld][thinlto] Include -mllvm options in the thinlto cache key
They may modify thinlto optimization.

This patch only extends support for `-mllvm`. There is another way to
pass llvm flags, `-plugin-opt`, but its processing is different and will
be provided in a subsequent patch.

Differential Revision: https://reviews.llvm.org/D134013
2022-09-19 12:04:17 -07:00
Fangrui Song
12607f57da [ELF] Cache compute_thread_count. NFC 2022-09-12 19:09:08 -07:00
Fangrui Song
e6aebff674 [ELF] Parallelize relocation scanning
* Change `Symbol::flags` to a `std::atomic<uint16_t>`
* Add `llvm::parallel::threadIndex` as a thread-local non-negative integer
* Add `relocsVec` to part.relaDyn and part.relrDyn so that relative relocations can be added without a mutex
* Arbitrarily change -z nocombreloc to move relative relocations to the end. Disable parallelism for deterministic output.

MIPS and PPC64 use global states for relocation scanning. Keep serial scanning.

Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:

* clang (Release): 1.27x as fast
* clang (Debug): 1.06x as fast
* chrome (default): 1.05x as fast
* scylladb (default): 1.04x as fast

Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):

* clang (Release): 1.31x as fast
* scylladb (default): 1.06x as fast

Reviewed By: andrewng

Differential Revision: https://reviews.llvm.org/D133003
2022-09-12 12:56:35 -07:00
Fangrui Song
449f2ca146 [ELF] Add --compress-debug-sections=zstd
`clang -gz=zstd a.o` passes this option to the linker. This option compresses output
debug sections with zstd and sets ch_type to ELFCOMPRESS_ZSTD. As of today, very
few DWARF consumers recognize ELFCOMPRESS_ZSTD.

Use the llvm::zstd::compress API with level llvm::zstd::DefaultCompression (5),
which we may tune after we have more experience with zstd output.
zstd has built-in parallel compression support (so we don't need to do D117853
for zlib), which is not leveraged yet.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D133548
2022-09-09 10:30:18 -07:00
Nico Weber
87248ba5b1 [lld/elf] Use C++17 nested namespace syntax in most places
Like D131405, but for ELF.

No behavior change.

Differential Revision: https://reviews.llvm.org/D131612
2022-08-10 16:47:30 -04:00
Alex Brachet
dbd04b853b [ELF] Support --package-metadata
This was recently introduced in GNU linkers and it makes sense for
ld.lld to have the same support. This implementation omits checking if
the input string is valid json to reduce size bloat.

Differential Revision: https://reviews.llvm.org/D131439
2022-08-08 21:31:58 +00:00
Fangrui Song
4b2b68d5ab [lld] Change vector to SmallVector. NFC
My lld executable is 1.6KiB smaller and some functions are now more efficient.
2022-07-30 18:11:21 -07:00