845 Commits

Author SHA1 Message Date
Jessica Clarke
06d5e87e65
[NFCI][ELF][Mips] Replace MipsMultiGotPage with new RE_MIPS_OSEC_LOCAL_PAGE
Instead of having a special DynamicReloc::Kind, we can just use a new
RelExpr for the calculation needed. The only odd thing we do that allows
this is to keep a representative symbol for the OutputSection in
question (the first we see for it) around to use in this relocation for
the addend calculation.

This reduces DynamicReloc to just AddendOnly vs AgainstSymbol, plus the
internal Computed.

Reviewers: MaskRay, arichardson

Reviewed By: MaskRay, arichardson

Pull Request: https://github.com/llvm/llvm-project/pull/150810
2025-07-30 17:07:22 +01:00
Zhaoxin Yang
2c1900860c
[lld][LoongArch] Support TLSDESC GD/LD to IE/LE (#123715)
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while use existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
    
In normal or medium code model, there are two forms of code sequences:
* pcalau12i  $a0, %desc_pc_hi20(sym_desc)
* addi.d     $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
------
* pcaddi     $a0, %desc_pcrel_20(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
    
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0 src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero

Simplicity, whether tlsdescToIe or tlsdescToLe, we always tend to
convert the preceding instructions to NOPs, due to both forms of code
sequence (corresponding to relocation combinations:
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) have same process.
    
TODO: When relaxation enables, redundant NOPs can be removed. It will be
implemented in a future patch.
    
Note: All forms of TLSDESC code sequences should not appear interleaved
in the normal, medium or extreme code model, which compilers do not
generate and lld is unsupported. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
2025-07-02 16:09:51 +08:00
Peter Collingbourne
494a74882b
Reapply "ELF: Add branch-to-branch optimization."
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.

This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.

Original commit message follows:

When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
        0.0243538 +/- 0.00233202
        1.87831% +/- 0.179859%
        (Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Reviewers: zmodem, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/145579
2025-06-24 22:16:18 -07:00
Hans Wennborg
1e95349dbe Revert "ELF: Add branch-to-branch optimization."
This caused assertion failures in applyBranchToBranchOpt():

  llvm/include/llvm/Support/Casting.h:578:
  decltype(auto) llvm::cast(From*)
  [with To = lld:🧝:InputSection; From = lld:🧝:InputSectionBase]:
  Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)

This reverts commit 491b82a5ec1add78d2c93370580a2f1897b6a364.

This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"

This reverts commit 2ac293f5ac4cf65c0c038bf75a88f1d6715e467d.
2025-06-23 13:26:02 +02:00
Peter Collingbourne
491b82a5ec ELF: Add branch-to-branch optimization.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
	0.0243538 +/- 0.00233202
	1.87831% +/- 0.179859%
	(Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Pull Request: https://github.com/llvm/llvm-project/pull/138366
2025-06-20 13:16:24 -07:00
Kazu Hirata
19f00c0570
[lld] Remove unused includes (NFC) (#141421) 2025-05-25 10:55:39 -07:00
Fangrui Song
6b87f01aaa [ELF] MergeInputSection: replace Fatal with Err
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
2025-01-25 16:20:27 -08:00
Fangrui Song
7db789b570 [ELF] Replace a few Fatal with Err
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
2025-01-25 16:00:51 -08:00
Daniil Kovalev
9178708c3b
[PAC][lld][AArch64][ELF] Support signed TLSDESC (#113817)
Depends on #120010

Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12`
and `R_AARCH64_AUTH_TLSDESC_LD64_LO12` static relocations and
`R_AARCH64_AUTH_TLSDESC` dynamic relocation. IE/LE optimization is not
currently supported for AUTH TLSDESC.
2025-01-22 12:18:05 +03:00
Daniil Kovalev
1ef5b987a4
[PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (#113816)
Depends on #114525

Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19`
GOT-generating relocations. A corresponding `RE_AARCH64_AUTH_GOT_PC` member
of `RelExpr` is added, which is an AUTH-specific variant of `R_GOT_PC`.
2024-12-18 09:41:54 +03:00
Daniil Kovalev
417d2d7ce6
[PAC][lld][AArch64][ELF] Support signed GOT (#113815)
Depends on #113811

Support `R_AARCH64_AUTH_ADR_GOT_PAGE`, `R_AARCH64_AUTH_GOT_LO12_NC` and
`R_AARCH64_AUTH_GOT_ADD_LO12_NC` GOT-generating relocations. For preemptible
symbols, dynamic relocation `R_AARCH64_AUTH_GLOB_DAT` is emitted. Otherwise,
we unconditionally emit `R_AARCH64_AUTH_RELATIVE` dynamic relocation since
pointers in signed GOT needs to be signed during dynamic link time.
2024-12-17 10:23:01 +03:00
Fangrui Song
04996a28b7
[ELF] Rename target-specific RelExpr enumerators
RelExpr enumerators are named `R_*`, which can be confused with ELF
relocation type names. Rename the target-specific ones to `RE_*` to
avoid confusion.

For consistency, the target-independent ones can be renamed as well, but
that's not urgent. The relocation processing mechanism with RelExpr has
non-trivial overhead compared with mold's approach, and we might make
more code into Arch/*.cpp files and decrease the enumerators.

Pull Request: https://github.com/llvm/llvm-project/pull/118424
2024-12-03 09:17:17 -08:00
Fangrui Song
603d41ab7b [ELF] decompress: remove mutex
decompress() is in the parallel code path splitIntoPieces
and we should avoid mutex.
2024-12-01 14:02:44 -08:00
Fangrui Song
1f13713dbb [ELF] Change getSrcMsg to use ELFSyncStream. NFC 2024-11-29 17:18:22 -08:00
Fangrui Song
666de79595 [ELF] Move some ObjFile members to ELFFileBase to simplify getSrcMsg 2024-11-29 15:48:46 -08:00
Fangrui Song
05ff6e7948 [ELF] Use lower case offset in getObjMsg
to improve consistency with other diagnostics. While here, migrate to
use ELFSyncStream to drop toStr/getCtx uses and avoid string overhead.
2024-11-29 13:00:36 -08:00
Oliver Stannard
6512e488f6
[LLD][ARM] Allow R_ARM_SBREL32 relocations in debug info (#116956)
The R_ARM_SBREL32 relocation is used in debug info for ARM RWPI
(read-write position independent) code. Compiler-generated DWARF info
will use an expression to add the relocated value to the actual value of
the static base (held in r9) at run-time, so it should be relocated as
if the static base is at address 0.
2024-11-25 08:51:27 +00:00
Fangrui Song
1cd627562b [ELF] Remove unneeded Twine in ELFSyncStream 2024-11-24 12:13:02 -08:00
Fangrui Song
099a52fd2f [ELF] Reorder SectionBase/InputSectionBase members
Move `sectionKind` outside the bitfield and move bss/keepUnique to
InputSectionBase.

* sizeof(InputSection) decreases from 160 to 152 on 64-bit systems.
* The numerous `sectionKind` accesses are faster.
2024-11-23 16:34:21 -08:00
Fangrui Song
43e3871a32 [ELF] Make section member orders consistent
SectionBase, InputSectionBase, InputSection, MergeInputSection, and
OutputSection have different member orders. Make them consistent and
adopt the order similar to the raw Elf64_Shdr.
2024-11-23 14:22:24 -08:00
Fangrui Song
6d98f11f3b [ELF] Work around extra "warning: $" with MSVC 14.41.34120 2024-11-17 11:08:32 -08:00
Fangrui Song
2991a4e209 [ELF] Replace functions bAlloc/saver/uniqueSaver with member access 2024-11-16 22:34:13 -08:00
Fangrui Song
834457a134 [ELF] Simplify relocateNonAlloc diagnostic 2024-11-16 20:38:27 -08:00
Fangrui Song
483516fd83 [ELF] Remove unneeded Twine() 2024-11-16 20:32:44 -08:00
Fangrui Song
a626eb2a2f [ELF] Pass ctx to bAlloc/saver/uniqueSaver 2024-11-16 15:20:21 -08:00
Fangrui Song
6c19fa4bfc [ELF] Remove unneeded toString(Error) when using ELFSyncStream 2024-11-16 13:31:05 -08:00
Fangrui Song
38870fe124 [ELF] Remove unneeded toString(Error) when using ELFSyncStream 2024-11-16 13:22:06 -08:00
Fangrui Song
a6755bdad1 [ELF] Replace global ctx with getCtx() 2024-11-16 12:11:00 -08:00
Fangrui Song
58a971f42f [ELF] Replace contex-less toString(x) with toStr(ctx, x)
so that we can remove the global `ctx` from toString implementations.
Rename lld::toString (to lld:🧝:toStr) to simplify name lookup (we
have many llvm::toString and another lld::toString(const llvm::opt::Arg
&)).
2024-11-16 11:58:10 -08:00
Fangrui Song
3fb83f65c4 [ELF] Replace toString(RelType) with operator<< while using ELFSyncStream 2024-11-16 11:45:46 -08:00
Fangrui Song
942928f3df [ELF] Migrate away from global ctx 2024-11-14 23:04:18 -08:00
Fangrui Song
3d57c79728 [ELF] Migrate away from global ctx 2024-11-14 22:50:53 -08:00
Fangrui Song
d69cc05bcf [ELF] Migrate away from global ctx 2024-11-14 22:30:29 -08:00
Fangrui Song
9b058bb42d [ELF] Replace errorOrWarn(...) with Err 2024-11-06 22:33:51 -08:00
Fangrui Song
f8bae3af74 [ELF] Replace warn(...) with Warn 2024-11-06 22:19:31 -08:00
Fangrui Song
09c2c5e1e9 [ELF] Replace error(...) with ErrAlways or Err
Most are migrated to ErrAlways mechanically.
In the future we should change most to Err.
2024-11-06 22:04:52 -08:00
Fangrui Song
63c6fe4a0b [ELF] Replace fatal(...) with Fatal or Err 2024-11-06 21:17:26 -08:00
Fangrui Song
e6625a2c10 [ELF] Pass Ctx & 2024-10-19 21:08:50 -07:00
Fangrui Song
861bd36bce [ELF] Pass Ctx & to Symbol::getVA 2024-10-19 20:32:58 -07:00
Fangrui Song
d0606c265e [ELF] Make .comment have a non-full file
This ensures that SectionBase::file is non-null except
InputSection::discarded.
2024-10-11 20:55:21 -07:00
Fangrui Song
c33133279b [ELF] Pass Ctx & to InputSection 2024-10-11 20:39:53 -07:00
Fangrui Song
9bf2e20b17 [ELF] Pass Ctx & to OutputSection 2024-10-11 20:28:58 -07:00
Fangrui Song
6dd773b650 [ELF] Pass Ctx & 2024-10-11 20:15:02 -07:00
Fangrui Song
1c28f31133 [ELF] Pass Ctx & 2024-10-11 18:35:02 -07:00
Fangrui Song
81bd712f92 [ELF] Revert Ctx & parameters from SyntheticSection
Since Ctx &ctx is a member variable,
1f391a75af8685e6bba89421443d72ac6a186599
7a5b9ef54eb96abd8415fd893576c42e51fd95db
e2f0ec3a3a8a2981be8a1aac2004cfb9064c61e8 can be reverted.
2024-10-10 23:43:21 -07:00
Fangrui Song
c22588c7cd [ELF] Move InputSectionBase::file to SectionBase
... and add getCtx (file->ctx). This allows InputSectionBase and
OutputSection to access ctx without taking an extra function argument.
2024-10-10 22:15:10 -07:00
Sam Elliott
db1a762069
[LLD][RISCV] Error on PCREL_LO referencing other Section (#107558)
The RISC-V psABI states that "The `R_RISCV_PCREL_LO12_I` or
`R_RISCV_PCREL_LO12_S` relocations contain a label pointing to an
instruction in the same section with an `R_RISCV_PCREL_HI20` relocation
entry that points to the target symbol."

Without this patch, GNU ld errors, but LLD does not -- I think because LLD is
doing the right thing, certainly in the testcase provided.

Nonetheless, I think an error is good here to bring LLD in line with
what GNU ld is doing in showing that the object the user provided is not
following the psABI as written.

Fixes #107304
2024-10-08 12:45:01 +01:00
Fangrui Song
cfd3289a1f [ELF] Pass Ctx & to some free functions 2024-10-06 19:36:21 -07:00
Fangrui Song
acb2b1e779 [ELF] Pass Ctx & to Symbols 2024-10-06 16:59:04 -07:00
Fangrui Song
2b5cb1bf62 [ELF] getRelocTargetVA: pass Ctx and Relocation. NFC 2024-10-06 16:34:09 -07:00