719 Commits

Author SHA1 Message Date
Kazu Hirata
4831d92005
[lld] Replace SmallSet with SmallPtrSet (NFC) (#154263)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 30 instances that rely on this "redirection".  Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.

I'm planning to remove the redirection eventually.
2025-08-18 22:39:45 -07:00
Peter Collingbourne
53c41f19e3
ELF: Add support for R_AARCH64_PATCHINST relocation type.
The R_AARCH64_PATCHINST relocation type is to support deactivation
symbols. For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Part of the AArch64 psABI extension:
https://github.com/ARM-software/abi-aa/issues/340

Reviewers: smithp35, davemgreen, MaskRay

Reviewed By: MaskRay, davemgreen, smithp35

Pull Request: https://github.com/llvm/llvm-project/pull/133534
2025-08-11 11:33:08 -07:00
Jessica Clarke
723b40a8d9 [ELF][Hexagon] Fix host endianness assumption
Fixes: b42f96bc057f ("[lld] Add thunks for hexagon (#111217)")
2025-08-03 21:30:17 +01:00
Jessica Clarke
de15d36574 [NFC][ELF][Hexagon] Avoid pointless ArrayRef::drop_front
Fixes: b42f96bc057f ("[lld] Add thunks for hexagon (#111217)")
2025-08-03 21:30:17 +01:00
Jessica Clarke
b03d1e1e2e [NFC][ELF] Add missing blank line between functions
Fixes: b42f96bc057f ("[lld] Add thunks for hexagon (#111217)")
2025-08-03 21:30:16 +01:00
Jessica Clarke
52ddcfd8d6
[NFC][ELF] Replace DynamicReloc::Kind with the equivalent bool in APIs
DynamicReloc::AgainstSymbol is now true and DynamicReloc::AddendOnly is
now false; uses of the constants were replaced mechanically.

Reviewers: rnk, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/150813
2025-07-30 17:08:36 +01:00
Jessica Clarke
54df4b8c35
[NFCI][ELF] Merge AgainstSymbol and AgainstSymbolWithTargetVA
The former is just a special case of the latter, ignoring the expr and
always just using the addend. If we use R_ADDEND as expr (which
previously had no effect, and so was misleadingly R_ABS not R_ADDEND in
all but one use) then we don't need to maintain this as a separate case.

Aside from the internal Computed Kind, this just leaves MipsMultiGotPage
as a special case; the only difference between the other two Kind values
is what needsDynSymIndex returns.

Reviewers: MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/150798
2025-07-30 17:06:59 +01:00
Jessica Clarke
e027b9258a
[NFCI][ELF] Merge AddendOnly and AddendOnlyWithTargetVA
The former is just a special case of the latter, ignoring the expr and
always just using the addend, allowing (and enforcing) the sym is null.
If we just use dummySym then we don't need to maintain this as a
separate case, since R_ADDEND will return the addend unmodified for the
call to getRelocTargetVA.

Reviewers: MaskRay, arichardson

Reviewed By: MaskRay, arichardson

Pull Request: https://github.com/llvm/llvm-project/pull/150797
2025-07-30 17:05:15 +01:00
Jessica Clarke
58e6bc87b7
[ELF] Add a dummySym member to Ctx
This ensures subsequent calls to elf::postScanRelocations with a new Ctx
will correctly use an instance with the right internalFile (with the old
one presumably deleted, even). It also avoids having to create a new
instance in elf::getErrorPlace, and will allow more uses of such a dummy
symbol in future commits.

Reviewers: MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/150796
2025-07-30 17:03:34 +01:00
Brian Cain
3e9ceae29f
[lld] [hexagon] guard allocateAux: only if idx nonzero (#149690)
While building libclang_rt.asan-hexagon.so, lld would assert in
lld:🧝:hexagonTLSSymbolUpdate().

Fixes #132766
2025-07-20 14:39:03 -05:00
Brian Cain
b42f96bc05
[lld] Add thunks for hexagon (#111217)
Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.

When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.

The relocations which will be extended by thunks are:

* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL
relocations have a range of ± 8MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes

Fixes #149689 

Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>

---------

Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
2025-07-20 11:46:31 -05:00
Fangrui Song
3cb0c7f45b MC: Rework .reloc directive and fix the offset when it evaluates to a constant
* Fix `.reloc constant` to mean section_symbol+constant instead of
  .+constant . The initial .reloc support from MIPS incorrectly
  interpreted the offset.
* Delay the evaluation of the offset expression after
  MCAssembler::layout, deleting a lot of code working with MCFragment.
* Delete many FIXME from https://reviews.llvm.org/D79625
* Some lld/ELF/Arch/LoongArch.cpp relaxation tests rely on .reloc .,
  R_LARCH_ALIGN generating ALIGN relocations at specific location.
  Sort the relocations.
2025-07-17 00:36:11 -07:00
Zhaoxin Yang
2c1900860c
[lld][LoongArch] Support TLSDESC GD/LD to IE/LE (#123715)
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while use existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
    
In normal or medium code model, there are two forms of code sequences:
* pcalau12i  $a0, %desc_pc_hi20(sym_desc)
* addi.d     $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
------
* pcaddi     $a0, %desc_pcrel_20(sym_desc)
* ld.d       $ra, $a0, %desc_ld(sym_desc)
* jirl       $ra, $ra, %desc_call(sym_desc)
    
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd]   $a0, $a0, %ie_pc_lo12(sym_ie)

Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0 src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero

Simplicity, whether tlsdescToIe or tlsdescToLe, we always tend to
convert the preceding instructions to NOPs, due to both forms of code
sequence (corresponding to relocation combinations:
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) have same process.
    
TODO: When relaxation enables, redundant NOPs can be removed. It will be
implemented in a future patch.
    
Note: All forms of TLSDESC code sequences should not appear interleaved
in the normal, medium or extreme code model, which compilers do not
generate and lld is unsupported. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
2025-07-02 16:09:51 +08:00
Peter Collingbourne
494a74882b
Reapply "ELF: Add branch-to-branch optimization."
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.

This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.

Original commit message follows:

When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
        0.0243538 +/- 0.00233202
        1.87831% +/- 0.179859%
        (Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Reviewers: zmodem, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/145579
2025-06-24 22:16:18 -07:00
Hans Wennborg
1e95349dbe Revert "ELF: Add branch-to-branch optimization."
This caused assertion failures in applyBranchToBranchOpt():

  llvm/include/llvm/Support/Casting.h:578:
  decltype(auto) llvm::cast(From*)
  [with To = lld:🧝:InputSection; From = lld:🧝:InputSectionBase]:
  Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)

This reverts commit 491b82a5ec1add78d2c93370580a2f1897b6a364.

This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"

This reverts commit 2ac293f5ac4cf65c0c038bf75a88f1d6715e467d.
2025-06-23 13:26:02 +02:00
Peter Collingbourne
491b82a5ec ELF: Add branch-to-branch optimization.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
	0.0243538 +/- 0.00233202
	1.87831% +/- 0.179859%
	(Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Pull Request: https://github.com/llvm/llvm-project/pull/138366
2025-06-20 13:16:24 -07:00
Fangrui Song
2fcaa00d1e [ELF] -z undefs: handle relocations referencing undefined non-weak like undefined weak
* Merge the special case into isStaticLinkTimeConstant
* Generalize isUndefWeak to isUndefined. undefined non-weak is an error
  case. We choose to be general, which also brings us in line with GNU ld.
2025-06-11 20:37:15 -07:00
Alexander Ziaee
44a7ecd1d7
[doc] Use ISO nomenclature for 1024 byte units (#133148)
Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB, 1000 bytes, and the hardware industry as well as
several operating systems have now switched to using 1000 byte kBs.

If this change is acceptable, sometimes GitHub mangles merges to use the
original email of the account. $dayjob asks contributions have my work
email. Thanks!
2025-06-11 13:27:23 +02:00
Kazu Hirata
19f00c0570
[lld] Remove unused includes (NFC) (#141421) 2025-05-25 10:55:39 -07:00
Peter Collingbourne
f53eb88d25
ELF: Remove lock from MTE global relocation handling code.
This lock is unnecessary because we can add the relocations to
shards and let them be sorted later.

Reviewers: smithp35, fmayer, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/135123
2025-04-10 10:38:59 -07:00
Zhaoxin Yang
bd84d66700
[lld][LoongArch] Convert TLS IE to LE in the normal or medium code model (#123680)
Original code sequence:
* pcalau12i $a0, %ie_pc_hi20(sym)
* ld.d           $a0, $a0, %ie_pc_lo12(sym)

The code sequence converted is as follows:
* lu12i.w   $a0, %le_hi20(sym)         # le_hi20 != 0, otherwise NOP
* ori          $a0, src, %le_lo12(sym)  # le_hi20 != 0, src = $a0,
                                                         # otherwise,    src = $zero

TODO: When relaxation is enabled, redundant NOP can be removed. This
will be implemented in a future patch.
    
Note: In the normal or medium code model, original code sequence with
relocations allow interleaving, because converted code sequence
calculates the absolute offset. However, in extreme code model, to
identify the current code model, the first four instructions with
relocations must appear consecutively.
2025-04-07 19:58:48 +08:00
Fangrui Song
ba2de8f22d [ELF] Allow absolute relocation referencing symbol index 0 in PIC mode
The value of an absolute relocation, like R_RISCV_HI20 or R_PPC64_LO16,
with a symbol index of 0, the resulting value should be treated as
absolute and permitted in both -pie and -shared links.

This change also resolves an absolute relocation referencing an
undefined symbol in statically-linked executables.

PPC64 has unfortunate exceptions:

* R_PPC64_TOCBASE uses symbol index 0 but it should be treated as
  referencing the linker-defined .TOC.
* R_PPC64_PCREL_OPT (https://reviews.llvm.org/D84360) could no longer
  rely on `isAbsoluteValue` return false.
2025-03-28 20:44:07 -07:00
Fangrui Song
f21c35d54f [ELF] Replace some Fatal with Err
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
2025-01-25 17:29:28 -08:00
Daniil Kovalev
9178708c3b
[PAC][lld][AArch64][ELF] Support signed TLSDESC (#113817)
Depends on #120010

Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12`
and `R_AARCH64_AUTH_TLSDESC_LD64_LO12` static relocations and
`R_AARCH64_AUTH_TLSDESC` dynamic relocation. IE/LE optimization is not
currently supported for AUTH TLSDESC.
2025-01-22 12:18:05 +03:00
Daniil Kovalev
1ef5b987a4
[PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (#113816)
Depends on #114525

Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19`
GOT-generating relocations. A corresponding `RE_AARCH64_AUTH_GOT_PC` member
of `RelExpr` is added, which is an AUTH-specific variant of `R_GOT_PC`.
2024-12-18 09:41:54 +03:00
Daniil Kovalev
417d2d7ce6
[PAC][lld][AArch64][ELF] Support signed GOT (#113815)
Depends on #113811

Support `R_AARCH64_AUTH_ADR_GOT_PAGE`, `R_AARCH64_AUTH_GOT_LO12_NC` and
`R_AARCH64_AUTH_GOT_ADD_LO12_NC` GOT-generating relocations. For preemptible
symbols, dynamic relocation `R_AARCH64_AUTH_GLOB_DAT` is emitted. Otherwise,
we unconditionally emit `R_AARCH64_AUTH_RELATIVE` dynamic relocation since
pointers in signed GOT needs to be signed during dynamic link time.
2024-12-17 10:23:01 +03:00
Fangrui Song
3733ed6f1c [ELF] Introduce Symbol::isExported to cache includeInDynsym
isExported, intended to replace exportDynamic, is primarily set in two
locations, (a) after parseSymbolVersion and (b) during demoteSymbols.

In the future, we should try removing exportDynamic. Currently,
merging exportDynamic/isExported would cause
riscv-gp.s to fail:

* The first isExported computation considers the undefined symbol exported
* Defined as a linker-synthesized symbol
* isExported remains true, while it should be false
2024-12-08 22:40:14 -08:00
Fangrui Song
c650880958 [ELF] Simplify handling of exportDynamic and canBeOmittedFromSymbolTable
When computing whether a defined symbol is exported, we set
`exportDynamic` in Defined and CommonSymbol's ctor and merge the bit in
symbol resolution. The complexity is for the LTO special case
canBeOmittedFromSymbolTable, which can be simplified by introducing a
new bit.

We might simplify the state by caching includeInDynsym in exportDynamic
in the future.
2024-12-08 09:33:48 -08:00
Fangrui Song
8669028c18 [ELF] Remove unneeded sym->file check
After #78944 and some follow-ups, sym->file, unless in the initial
Placeholder stage, is guaranteed to be non-null.
2024-12-07 20:46:02 -08:00
Fangrui Song
ae5fdaea43 [ELF] Simplify printLocation
sym.file is always non-null (since around #78944).
2024-12-07 20:31:50 -08:00
pcc
970d6d2096
ELF: Have __rela_iplt_{start,end} surround .rela.iplt with --pack-dyn-relocs=android.
In #86751 we moved the IRELATIVE relocations to .rela.plt when
--pack-dyn-relocs=android was enabled but we neglected to also move
the __rela_iplt_{start,end} symbols. As a result, static binaries
linked with this flag were unable to find their IRELATIVE relocations.
Fix it by having the symbols surround the correct section.

Reviewers: MaskRay, smithp35

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/118585
2024-12-04 17:35:05 -08:00
Fangrui Song
04996a28b7
[ELF] Rename target-specific RelExpr enumerators
RelExpr enumerators are named `R_*`, which can be confused with ELF
relocation type names. Rename the target-specific ones to `RE_*` to
avoid confusion.

For consistency, the target-independent ones can be renamed as well, but
that's not urgent. The relocation processing mechanism with RelExpr has
non-trivial overhead compared with mold's approach, and we might make
more code into Arch/*.cpp files and decrease the enumerators.

Pull Request: https://github.com/llvm/llvm-project/pull/118424
2024-12-03 09:17:17 -08:00
Fangrui Song
1f13713dbb [ELF] Change getSrcMsg to use ELFSyncStream. NFC 2024-11-29 17:18:22 -08:00
Fangrui Song
d8495ede01 [ELF] Change getLocation to use ELFSyncStream. NFC 2024-11-24 11:16:52 -08:00
Fangrui Song
ff97b28334 [ELF] Simplif reportUndefinedSymbol. NFC 2024-11-24 10:34:11 -08:00
Fangrui Song
37e39667cc [ELF] Make ThunkCreator take ownership of thunks
This removes many SpecificAlloc instantiations and makes my lld (x86-64
Release+Assertions) smaller by ~36k.
2024-11-19 23:16:35 -08:00
Fangrui Song
8f238f662c [ELF] Make Ctx inherit from CommonLinkerContext
link calls `new CommonLinkerContext`. Now that `Ctx ctx` is a local
variable, we can make it inherit from CommonLinkerContext.
2024-11-16 22:28:55 -08:00
Fangrui Song
c1a6defd9f [ELF] Make RelType a struct type
otherwise operator<<(const ELFSyncStream &s, RelType type) applies to
non-reloc-type uint32_t, which can be confusing.
2024-11-16 20:26:34 -08:00
Fangrui Song
e57331ec63 [ELF] Move global relocMutex/undefs into Ctx 2024-11-16 19:22:11 -08:00
Fangrui Song
9664ce6d59 [ELF] Simplify complex diagnostics 2024-11-16 19:11:58 -08:00
Fangrui Song
24c7d97cff [ELF] Replace context-less errorHandler() and error() with ctx.errHandler 2024-11-16 14:14:54 -08:00
Fangrui Song
58a971f42f [ELF] Replace contex-less toString(x) with toStr(ctx, x)
so that we can remove the global `ctx` from toString implementations.
Rename lld::toString (to lld:🧝:toStr) to simplify name lookup (we
have many llvm::toString and another lld::toString(const llvm::opt::Arg
&)).
2024-11-16 11:58:10 -08:00
Fangrui Song
47e6673006 [ELF] Replace toString(RelType) with operator<< while using ELFSyncStream 2024-11-16 10:12:08 -08:00
Peter Smith
098b0d18ad
[LLD][AArch64] Detach Landing Pad creation from Thunk creation (#116402)
Move Landing Pad Creation to a new function that checks each thunk every
pass to see if it needs a landing pad. This permits a thunk to be
created without needing a landing pad, but later needing one due to
drifting out of direct branch range and requiring an indirect branch.

We record all the Thunks created so far in a new vector rather than
trying to iterate over the DenseMap as we need a deterministic order of
adding LandingPadThunks due to the short branch fall through. We cannot
use normalizeExistingThunk() either as that only iterates through live
thunks.

Fixes: https://crbug.com/377438309
Original PR: https://github.com/llvm/llvm-project/pull/108989

Sending without a new test case to fix existing test. A new regression
test will come in a separate PR as coming up with a small enough
reproducer for this case is non-trivial.
2024-11-15 18:18:18 +00:00
Fangrui Song
3d57c79728 [ELF] Migrate away from global ctx 2024-11-14 22:50:53 -08:00
Fangrui Song
9b058bb42d [ELF] Replace errorOrWarn(...) with Err 2024-11-06 22:33:51 -08:00
Fangrui Song
f8bae3af74 [ELF] Replace warn(...) with Warn 2024-11-06 22:19:31 -08:00
Fangrui Song
09c2c5e1e9 [ELF] Replace error(...) with ErrAlways or Err
Most are migrated to ErrAlways mechanically.
In the future we should change most to Err.
2024-11-06 22:04:52 -08:00
Fangrui Song
63c6fe4a0b [ELF] Replace fatal(...) with Fatal or Err 2024-11-06 21:17:26 -08:00
Fangrui Song
861bd36bce [ELF] Pass Ctx & to Symbol::getVA 2024-10-19 20:32:58 -07:00