Move the "offset is outside the section" error for merge sections from
getSectionPiece to getSymVA, where we know the offset comes from a
section symbol + addend. Include the offset value in the diagnostic.
Accept offset == section_size (one-past-end) to match GNU ld behavior,
while rejecting offset > section_size. Skip out-of-bounds offsets in
MarkLive to avoid assertion failures in getSectionPiece.
Implement RISCV::scanSectionImpl, following the pattern established
for x86 (#178846) and AArch64 (#181099). This merges the getRelExpr
and TLS handling for SHF_ALLOC sections into the target-specific
scanner, enabling devirtualization and eliminating abstraction
overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle TLS IE and GD directly (RISC-V does not optimize GD/LD/IE).
- Replace TLS-optimization-specific expressions for TLSDESC, following
the x86 pattern: R_RELAX_TLS_GD_TO_IE -> R_GOT_PC,
R_RELAX_TLS_GD_TO_LE -> R_TPREL. Update relocateAlloc and relax()
to dispatch on relocation type instead of RelExpr for TLSDESC.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and preprocessRelocs.
- Remove RISC-V-specific checks from handleTlsRelocation (isRISCV
variable, TLSDESC label special cases).
- Move R_RISCV_VENDOR handling into the relocation type switch. An
undefined vendor symbol now gets the standard undefined symbol error
instead of a vendor-specific diagnostic.
Implement LoongArch::scanSectionImpl, following the pattern established
for x86, PPC64, SystemZ, AArch64. This merges the getRelExpr and TLS
handling for SHF_ALLOC sections into the target-specific scanner,
enabling devirtualization and eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Inline TLS handling: IE->LE optimization for _PC_ variants only (not
_PCADD_ or absolute), TLSDESC->IE/LE for non-extreme code model,
GD/LD flag setting without going through generic handleTlsRelocation.
- Remove adjustTlsExpr by inlining its logic into scanSectionImpl.
- Remove LoongArch-specific code from Relocations.cpp:
handleTlsRelocation, execOptimizeInLoongArch, and the sort condition.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc, scanEhSection, and the extreme code model fallback
in relocateAlloc.
Implement AArch64::scanSectionImpl, following the pattern established
for x86 (#178846), PPC64 (#181496), and SystemZ (#181563). This merges
the getRelExpr and TLS handling for SHF_ALLOC sections into the
target-specific scanner, enabling devirtualization and eliminating
abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations, and handleTlsIe/handleTlsDesc for TLS IE/TLSDESC.
- Remove some AArch64-specific RelExpr members (RE_AARCH64_AUTH_GOT,
RE_AARCH64_AUTH_GOT_PC, RE_AARCH64_AUTH_GOT_PAGE_PC,
RE_AARCH64_AUTH_TLSDESC_PAGE, RE_AARCH64_AUTH_TLSDESC,
RE_AARCH64_RELAX_TLS_GD_TO_IE_PAGE_PC) by using regular RelExpr
members with flag-based dispatch (NEEDS_GOT_AUTH, NEEDS_TLSDESC_AUTH).
AUTH GOT relocations now call `sym.setFlags(NEEDS_GOT | NEEDS_GOT_AUTH)`
and `rs.processAux` directly.
- Remove adjustTlsExpr and handleAArch64PAuthTlsRelocation by inlining
their logic into scanSectionImpl and relocateAlloc.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and EhInputSection::preprocessRelocs.
Implement PPC::scanSectionImpl, following the pattern established for
x86 (#178846) and PPC64 (#181496). This merges the getRelExpr and TLS
handling for SHF_ALLOC sections into the target-specific scanner,
enabling devirtualization and eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle R_PPC_PLTREL24 inline with addend masking via processAux,
removing the EM_PPC special case from process().
- Handle TLS GD/LD/IE directly, eliminating handleTlsRelocation,
getTlsGdRelaxSkip, and adjustTlsExpr overrides. Use handleTlsIe
for TLS IE, and handleTlsGd for R_PPC_GOT_TLSGD16.
- Use R_DTPREL unconditionally for DTPREL relocations, removing
R_RELAX_TLS_LD_TO_LE_ABS (PPC32 was the only user).
- Move TLS relaxation dispatch from relocateAlloc into relocate,
removing the relocateAlloc override.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc and .eh_frame.
Implement PPC64::scanSectionImpl, following the pattern established for
x86. This merges the getRelExpr and TLS handling for SHF_ALLOC sections
into the target-specific scanner, enabling devirtualization and
eliminating abstraction overhead.
- Inline relocation classification into scanSectionImpl with a switch
on relocation type, replacing the generic rs.scan() path.
- Use processR_PC/processR_PLT_PC for common PC-relative and PLT
relocations.
- Handle TLS GD, LD, and DTPREL directly, eliminating
handleTlsRelocation, getTlsGdRelaxSkip, and adjustTlsExpr overrides.
Use handleTlsIe for TLS IE, enabling IE-to-LE optimization even when
ppc64DisableTLSRelax is set (lifted a limitation from
the workaround patch https://reviews.llvm.org/D92959).
- Use processAux for R_PPC64_PCREL_OPT. Remove the PPC64-specific
special case from process().
- Replace RE_PPC64_RELAX_GOT_PC with R_RELAX_GOT_PC, which computes
the same value (sym + addend - PC).
- Replace RE_PPC64_RELAX_TOC with R_GOTREL, moving the
ctx.arg.tocOptimize check to relocateAlloc.
- Switch relocateAlloc from expr-based to type-based dispatch.
- Simplify getRelExpr to only handle relocations needed by
relocateNonAlloc.
The current implementation in addRelativeReloc makes it look like we're
writing the symbol's VA + addend to the section, because that's what the
given relocation will evaluate to, but we're supposed to be writing the
negated original addend (since the relative relocation's addend will be
the sum of the symbol's VA and the original addend). This only works
because deep down in AArch64::relocate we throw away the computed value
and peek back inside the relocation to extract the addend and negate it.
Do this properly by having a relocation that evaluates to the right
value instead.
Reviewers: kovdan01, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171182
.eh_frame sections require special sub-section processing, specifically,
CIEs are de-duplicated and FDEs are garbage collected. Create a
specialized scanEhSection() function utilizing the just-added
EhInputSection::rels. OffsetGetter is moved to scanEhSection.
This improves separation of concerns between InputSection and
EhInputSection processing.
This removes another `relsOrRelas` call using `supportsCrel=false`.
DWARF.cpp now has the last call.
Pull Request: https://github.com/llvm/llvm-project/pull/161091
Store relocations directly as `SmallVector<Relocation, 0>` within
EhInputSection to avoid processing different relocation formats
(REL/RELA/CREL) throughout the codebase.
Next: Refactor RelocationScanner to utilize EhInputSection::rels
Pull Request: https://github.com/llvm/llvm-project/pull/161041
relocateAlloc can be called with either InputSection (including
SyntheticSection like GotSection) or EhInputSection.
Introduce relocateEh so that we can remove some boilerplate and replace
relocateAlloc's parameter type with `InputSection`.
Pull Request: https://github.com/llvm/llvm-project/pull/160031
Instead of having a special DynamicReloc::Kind, we can just use a new
RelExpr for the calculation needed. The only odd thing we do that allows
this is to keep a representative symbol for the OutputSection in
question (the first we see for it) around to use in this relocation for
the addend calculation.
This reduces DynamicReloc to just AddendOnly vs AgainstSymbol, plus the
internal Computed.
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150810
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while use existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
In normal or medium code model, there are two forms of code sequences:
* pcalau12i $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
------
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd] $a0, $a0, %ie_pc_lo12(sym_ie)
Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0 src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero
Simplicity, whether tlsdescToIe or tlsdescToLe, we always tend to
convert the preceding instructions to NOPs, due to both forms of code
sequence (corresponding to relocation combinations:
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) have same process.
TODO: When relaxation enables, redundant NOPs can be removed. It will be
implemented in a future patch.
Note: All forms of TLSDESC code sequences should not appear interleaved
in the normal, medium or extreme code model, which compilers do not
generate and lld is unsupported. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.
This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.
Original commit message follows:
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Reviewers: zmodem, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/145579
This caused assertion failures in applyBranchToBranchOpt():
llvm/include/llvm/Support/Casting.h:578:
decltype(auto) llvm::cast(From*)
[with To = lld:🧝:InputSection; From = lld:🧝:InputSectionBase]:
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)
This reverts commit 491b82a5ec1add78d2c93370580a2f1897b6a364.
This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"
This reverts commit 2ac293f5ac4cf65c0c038bf75a88f1d6715e467d.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Pull Request: https://github.com/llvm/llvm-project/pull/138366
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
Depends on #120010
Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12`
and `R_AARCH64_AUTH_TLSDESC_LD64_LO12` static relocations and
`R_AARCH64_AUTH_TLSDESC` dynamic relocation. IE/LE optimization is not
currently supported for AUTH TLSDESC.
Depends on #114525
Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19`
GOT-generating relocations. A corresponding `RE_AARCH64_AUTH_GOT_PC` member
of `RelExpr` is added, which is an AUTH-specific variant of `R_GOT_PC`.
Depends on #113811
Support `R_AARCH64_AUTH_ADR_GOT_PAGE`, `R_AARCH64_AUTH_GOT_LO12_NC` and
`R_AARCH64_AUTH_GOT_ADD_LO12_NC` GOT-generating relocations. For preemptible
symbols, dynamic relocation `R_AARCH64_AUTH_GLOB_DAT` is emitted. Otherwise,
we unconditionally emit `R_AARCH64_AUTH_RELATIVE` dynamic relocation since
pointers in signed GOT needs to be signed during dynamic link time.
RelExpr enumerators are named `R_*`, which can be confused with ELF
relocation type names. Rename the target-specific ones to `RE_*` to
avoid confusion.
For consistency, the target-independent ones can be renamed as well, but
that's not urgent. The relocation processing mechanism with RelExpr has
non-trivial overhead compared with mold's approach, and we might make
more code into Arch/*.cpp files and decrease the enumerators.
Pull Request: https://github.com/llvm/llvm-project/pull/118424
The R_ARM_SBREL32 relocation is used in debug info for ARM RWPI
(read-write position independent) code. Compiler-generated DWARF info
will use an expression to add the relocated value to the actual value of
the static base (held in r9) at run-time, so it should be relocated as
if the static base is at address 0.
Move `sectionKind` outside the bitfield and move bss/keepUnique to
InputSectionBase.
* sizeof(InputSection) decreases from 160 to 152 on 64-bit systems.
* The numerous `sectionKind` accesses are faster.
SectionBase, InputSectionBase, InputSection, MergeInputSection, and
OutputSection have different member orders. Make them consistent and
adopt the order similar to the raw Elf64_Shdr.
so that we can remove the global `ctx` from toString implementations.
Rename lld::toString (to lld:🧝:toStr) to simplify name lookup (we
have many llvm::toString and another lld::toString(const llvm::opt::Arg
&)).