710 Commits

Author SHA1 Message Date
Jessica Clarke
489d36cedc [NFC][ELF] Add missing blank line between functions
Fixes: b42f96bc057f ("[lld] Add thunks for hexagon (#111217)")
(cherry picked from commit b03d1e1e2e8e4b0b4b9e035b7ad9fb86dccefb93)
2025-08-05 10:56:10 +02:00
Brian Cain
f6c4f0eb70 [lld] Add thunks for hexagon (#111217)
Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.

When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.

The relocations which will be extended by thunks are:

* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL
relocations have a range of ± 8MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes

Fixes #149689

Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>

---------

Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
(cherry picked from commit b42f96bc057fd9e31572069b241ba130c21144e5)
2025-07-28 09:27:54 +02:00
Brian Cain
51245ebda1 [lld] [hexagon] guard allocateAux: only if idx nonzero (#149690)
While building libclang_rt.asan-hexagon.so, lld would assert in
lld:🧝:hexagonTLSSymbolUpdate().

Fixes #132766

(cherry picked from commit 3e9ceae29f39456508eef5b4af4d3c895048706a)
2025-07-22 10:37:04 +02:00
Zhaoxin Yang
2c1900860c
[lld][LoongArch] Support TLSDESC GD/LD to IE/LE (#123715)
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while use existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
    
In normal or medium code model, there are two forms of code sequences:
* pcalau12i  $a0, %desc_pc_hi20(sym_desc)
* addi.d     $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
------
* pcaddi     $a0, %desc_pcrel_20(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
    
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0 src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero

Simplicity, whether tlsdescToIe or tlsdescToLe, we always tend to
convert the preceding instructions to NOPs, due to both forms of code
sequence (corresponding to relocation combinations:
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) have same process.
    
TODO: When relaxation enables, redundant NOPs can be removed. It will be
implemented in a future patch.
    
Note: All forms of TLSDESC code sequences should not appear interleaved
in the normal, medium or extreme code model, which compilers do not
generate and lld is unsupported. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
2025-07-02 16:09:51 +08:00
Peter Collingbourne
494a74882b
Reapply "ELF: Add branch-to-branch optimization."
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.

This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.

Original commit message follows:

When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
        0.0243538 +/- 0.00233202
        1.87831% +/- 0.179859%
        (Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Reviewers: zmodem, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/145579
2025-06-24 22:16:18 -07:00
Hans Wennborg
1e95349dbe Revert "ELF: Add branch-to-branch optimization."
This caused assertion failures in applyBranchToBranchOpt():

  llvm/include/llvm/Support/Casting.h:578:
  decltype(auto) llvm::cast(From*)
  [with To = lld:🧝:InputSection; From = lld:🧝:InputSectionBase]:
  Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)

This reverts commit 491b82a5ec1add78d2c93370580a2f1897b6a364.

This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"

This reverts commit 2ac293f5ac4cf65c0c038bf75a88f1d6715e467d.
2025-06-23 13:26:02 +02:00
Peter Collingbourne
491b82a5ec ELF: Add branch-to-branch optimization.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
	0.0243538 +/- 0.00233202
	1.87831% +/- 0.179859%
	(Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Pull Request: https://github.com/llvm/llvm-project/pull/138366
2025-06-20 13:16:24 -07:00
Fangrui Song
2fcaa00d1e [ELF] -z undefs: handle relocations referencing undefined non-weak like undefined weak
* Merge the special case into isStaticLinkTimeConstant
* Generalize isUndefWeak to isUndefined. undefined non-weak is an error
  case. We choose to be general, which also brings us in line with GNU ld.
2025-06-11 20:37:15 -07:00
Alexander Ziaee
44a7ecd1d7
[doc] Use ISO nomenclature for 1024 byte units (#133148)
Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB, 1000 bytes, and the hardware industry as well as
several operating systems have now switched to using 1000 byte kBs.

If this change is acceptable, sometimes GitHub mangles merges to use the
original email of the account. $dayjob asks contributions have my work
email. Thanks!
2025-06-11 13:27:23 +02:00
Kazu Hirata
19f00c0570
[lld] Remove unused includes (NFC) (#141421) 2025-05-25 10:55:39 -07:00
Peter Collingbourne
f53eb88d25
ELF: Remove lock from MTE global relocation handling code.
This lock is unnecessary because we can add the relocations to
shards and let them be sorted later.

Reviewers: smithp35, fmayer, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/135123
2025-04-10 10:38:59 -07:00
Zhaoxin Yang
bd84d66700
[lld][LoongArch] Convert TLS IE to LE in the normal or medium code model (#123680)
Original code sequence:
* pcalau12i $a0, %ie_pc_hi20(sym)
* ld.d           $a0, $a0, %ie_pc_lo12(sym)

The code sequence converted is as follows:
* lu12i.w   $a0, %le_hi20(sym)         # le_hi20 != 0, otherwise NOP
* ori          $a0, src, %le_lo12(sym)  # le_hi20 != 0, src = $a0,
                                                         # otherwise,    src = $zero

TODO: When relaxation is enabled, redundant NOP can be removed. This
will be implemented in a future patch.
    
Note: In the normal or medium code model, original code sequence with
relocations allow interleaving, because converted code sequence
calculates the absolute offset. However, in extreme code model, to
identify the current code model, the first four instructions with
relocations must appear consecutively.
2025-04-07 19:58:48 +08:00
Fangrui Song
ba2de8f22d [ELF] Allow absolute relocation referencing symbol index 0 in PIC mode
The value of an absolute relocation, like R_RISCV_HI20 or R_PPC64_LO16,
with a symbol index of 0, the resulting value should be treated as
absolute and permitted in both -pie and -shared links.

This change also resolves an absolute relocation referencing an
undefined symbol in statically-linked executables.

PPC64 has unfortunate exceptions:

* R_PPC64_TOCBASE uses symbol index 0 but it should be treated as
  referencing the linker-defined .TOC.
* R_PPC64_PCREL_OPT (https://reviews.llvm.org/D84360) could no longer
  rely on `isAbsoluteValue` return false.
2025-03-28 20:44:07 -07:00
Fangrui Song
f21c35d54f [ELF] Replace some Fatal with Err
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
2025-01-25 17:29:28 -08:00
Daniil Kovalev
9178708c3b
[PAC][lld][AArch64][ELF] Support signed TLSDESC (#113817)
Depends on #120010

Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12`
and `R_AARCH64_AUTH_TLSDESC_LD64_LO12` static relocations and
`R_AARCH64_AUTH_TLSDESC` dynamic relocation. IE/LE optimization is not
currently supported for AUTH TLSDESC.
2025-01-22 12:18:05 +03:00
Daniil Kovalev
1ef5b987a4
[PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (#113816)
Depends on #114525

Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19`
GOT-generating relocations. A corresponding `RE_AARCH64_AUTH_GOT_PC` member
of `RelExpr` is added, which is an AUTH-specific variant of `R_GOT_PC`.
2024-12-18 09:41:54 +03:00
Daniil Kovalev
417d2d7ce6
[PAC][lld][AArch64][ELF] Support signed GOT (#113815)
Depends on #113811

Support `R_AARCH64_AUTH_ADR_GOT_PAGE`, `R_AARCH64_AUTH_GOT_LO12_NC` and
`R_AARCH64_AUTH_GOT_ADD_LO12_NC` GOT-generating relocations. For preemptible
symbols, dynamic relocation `R_AARCH64_AUTH_GLOB_DAT` is emitted. Otherwise,
we unconditionally emit `R_AARCH64_AUTH_RELATIVE` dynamic relocation since
pointers in signed GOT needs to be signed during dynamic link time.
2024-12-17 10:23:01 +03:00
Fangrui Song
3733ed6f1c [ELF] Introduce Symbol::isExported to cache includeInDynsym
isExported, intended to replace exportDynamic, is primarily set in two
locations, (a) after parseSymbolVersion and (b) during demoteSymbols.

In the future, we should try removing exportDynamic. Currently,
merging exportDynamic/isExported would cause
riscv-gp.s to fail:

* The first isExported computation considers the undefined symbol exported
* Defined as a linker-synthesized symbol
* isExported remains true, while it should be false
2024-12-08 22:40:14 -08:00
Fangrui Song
c650880958 [ELF] Simplify handling of exportDynamic and canBeOmittedFromSymbolTable
When computing whether a defined symbol is exported, we set
`exportDynamic` in Defined and CommonSymbol's ctor and merge the bit in
symbol resolution. The complexity is for the LTO special case
canBeOmittedFromSymbolTable, which can be simplified by introducing a
new bit.

We might simplify the state by caching includeInDynsym in exportDynamic
in the future.
2024-12-08 09:33:48 -08:00
Fangrui Song
8669028c18 [ELF] Remove unneeded sym->file check
After #78944 and some follow-ups, sym->file, unless in the initial
Placeholder stage, is guaranteed to be non-null.
2024-12-07 20:46:02 -08:00
Fangrui Song
ae5fdaea43 [ELF] Simplify printLocation
sym.file is always non-null (since around #78944).
2024-12-07 20:31:50 -08:00
pcc
970d6d2096
ELF: Have __rela_iplt_{start,end} surround .rela.iplt with --pack-dyn-relocs=android.
In #86751 we moved the IRELATIVE relocations to .rela.plt when
--pack-dyn-relocs=android was enabled but we neglected to also move
the __rela_iplt_{start,end} symbols. As a result, static binaries
linked with this flag were unable to find their IRELATIVE relocations.
Fix it by having the symbols surround the correct section.

Reviewers: MaskRay, smithp35

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/118585
2024-12-04 17:35:05 -08:00
Fangrui Song
04996a28b7
[ELF] Rename target-specific RelExpr enumerators
RelExpr enumerators are named `R_*`, which can be confused with ELF
relocation type names. Rename the target-specific ones to `RE_*` to
avoid confusion.

For consistency, the target-independent ones can be renamed as well, but
that's not urgent. The relocation processing mechanism with RelExpr has
non-trivial overhead compared with mold's approach, and we might make
more code into Arch/*.cpp files and decrease the enumerators.

Pull Request: https://github.com/llvm/llvm-project/pull/118424
2024-12-03 09:17:17 -08:00
Fangrui Song
1f13713dbb [ELF] Change getSrcMsg to use ELFSyncStream. NFC 2024-11-29 17:18:22 -08:00
Fangrui Song
d8495ede01 [ELF] Change getLocation to use ELFSyncStream. NFC 2024-11-24 11:16:52 -08:00
Fangrui Song
ff97b28334 [ELF] Simplif reportUndefinedSymbol. NFC 2024-11-24 10:34:11 -08:00
Fangrui Song
37e39667cc [ELF] Make ThunkCreator take ownership of thunks
This removes many SpecificAlloc instantiations and makes my lld (x86-64
Release+Assertions) smaller by ~36k.
2024-11-19 23:16:35 -08:00
Fangrui Song
8f238f662c [ELF] Make Ctx inherit from CommonLinkerContext
link calls `new CommonLinkerContext`. Now that `Ctx ctx` is a local
variable, we can make it inherit from CommonLinkerContext.
2024-11-16 22:28:55 -08:00
Fangrui Song
c1a6defd9f [ELF] Make RelType a struct type
otherwise operator<<(const ELFSyncStream &s, RelType type) applies to
non-reloc-type uint32_t, which can be confusing.
2024-11-16 20:26:34 -08:00
Fangrui Song
e57331ec63 [ELF] Move global relocMutex/undefs into Ctx 2024-11-16 19:22:11 -08:00
Fangrui Song
9664ce6d59 [ELF] Simplify complex diagnostics 2024-11-16 19:11:58 -08:00
Fangrui Song
24c7d97cff [ELF] Replace context-less errorHandler() and error() with ctx.errHandler 2024-11-16 14:14:54 -08:00
Fangrui Song
58a971f42f [ELF] Replace contex-less toString(x) with toStr(ctx, x)
so that we can remove the global `ctx` from toString implementations.
Rename lld::toString (to lld:🧝:toStr) to simplify name lookup (we
have many llvm::toString and another lld::toString(const llvm::opt::Arg
&)).
2024-11-16 11:58:10 -08:00
Fangrui Song
47e6673006 [ELF] Replace toString(RelType) with operator<< while using ELFSyncStream 2024-11-16 10:12:08 -08:00
Peter Smith
098b0d18ad
[LLD][AArch64] Detach Landing Pad creation from Thunk creation (#116402)
Move Landing Pad Creation to a new function that checks each thunk every
pass to see if it needs a landing pad. This permits a thunk to be
created without needing a landing pad, but later needing one due to
drifting out of direct branch range and requiring an indirect branch.

We record all the Thunks created so far in a new vector rather than
trying to iterate over the DenseMap as we need a deterministic order of
adding LandingPadThunks due to the short branch fall through. We cannot
use normalizeExistingThunk() either as that only iterates through live
thunks.

Fixes: https://crbug.com/377438309
Original PR: https://github.com/llvm/llvm-project/pull/108989

Sending without a new test case to fix existing test. A new regression
test will come in a separate PR as coming up with a small enough
reproducer for this case is non-trivial.
2024-11-15 18:18:18 +00:00
Fangrui Song
3d57c79728 [ELF] Migrate away from global ctx 2024-11-14 22:50:53 -08:00
Fangrui Song
9b058bb42d [ELF] Replace errorOrWarn(...) with Err 2024-11-06 22:33:51 -08:00
Fangrui Song
f8bae3af74 [ELF] Replace warn(...) with Warn 2024-11-06 22:19:31 -08:00
Fangrui Song
09c2c5e1e9 [ELF] Replace error(...) with ErrAlways or Err
Most are migrated to ErrAlways mechanically.
In the future we should change most to Err.
2024-11-06 22:04:52 -08:00
Fangrui Song
63c6fe4a0b [ELF] Replace fatal(...) with Fatal or Err 2024-11-06 21:17:26 -08:00
Fangrui Song
861bd36bce [ELF] Pass Ctx & to Symbol::getVA 2024-10-19 20:32:58 -07:00
Fangrui Song
fe8af49a1b [ELF] Pass Ctx & to Defined & CommonSymbol 2024-10-20 01:38:16 +00:00
Fangrui Song
682925ef43 [ELF] Pass Ctx & to Partition 2024-10-15 22:58:07 -07:00
Fangrui Song
a3bad9adcb [ELF] Pass Ctx & 2024-10-12 09:56:05 -07:00
Fangrui Song
9bf2e20b17 [ELF] Pass Ctx & to OutputSection 2024-10-11 20:28:58 -07:00
Fangrui Song
81bd712f92 [ELF] Revert Ctx & parameters from SyntheticSection
Since Ctx &ctx is a member variable,
1f391a75af8685e6bba89421443d72ac6a186599
7a5b9ef54eb96abd8415fd893576c42e51fd95db
e2f0ec3a3a8a2981be8a1aac2004cfb9064c61e8 can be reverted.
2024-10-10 23:43:21 -07:00
Fangrui Song
25cda9e069 [ELF] Pass Ctx & to SyntheticSection 2024-10-10 23:07:02 -07:00
Fangrui Song
acb2b1e779 [ELF] Pass Ctx & to Symbols 2024-10-06 16:59:04 -07:00
Fangrui Song
6d03a69034 [ELF] Pass Ctx & to Arch/ 2024-10-06 00:14:12 -07:00
Fangrui Song
87d199ff24 [ELF] Pass Ctx & to Relocations 2024-10-05 09:37:27 -07:00