Commit a94060ca0c87 ("[ELF] Pass Ctx & to Relocations") swapped the
InputSectionBase &c argument for an InputSectionBase *sec member, and so
"c." was replaced with "sec->". However, this must have done in such a
way that "Local-Exec." was transformed to "Local-Exesec->" and
"RISCV::relocateAlloc." to "RISCV::relocateAllosec->", i.e. without the
use of something like clangd, and without appropriate word boundaries in
a regex.
For a non-preemptible ifunc, `handleNonPreemptibleIfunc` creates a
cloned
symbol (`directSym`) to compute the addend of the IRELATIVE dynamic
relocation.
This cloned symbol wasn't tracked by `initSymbolAnchors`, so its value
wasn't adjusted during RISC-V/LoongArch linker relaxation.
This caused IRELATIVE addends to point to pre-relaxation addresses.
Fix this by:
- Tracking cloned IRELATIVE symbols in `ctx.irelativeSyms`
- Adding these symbols to `relaxAux->anchors` in `initSymbolAnchors`
This allows R_AARCH64_AUTH_ABS64 to follow R_AARCH64_ABS64's flow rather
than being implemented on the side in the place that is normally for
symbolic relocations.
Note that this has one implementation change: the RelExpr passed to
relaDyn is now RE_AARCH64_AUTH rather than R_ABS, but the two are
handled identically by InputSectionbase::getRelocTargetVA, and it was
inconsistent with relrAuthDyn which was passed RE_AARCH64_AUTH.
Reviewers: kovdan01
Pull Request: https://github.com/llvm/llvm-project/pull/171180
Currently, R_AARCH64_AUTH_ABS64 against a tagged global just ignores the
tagging and so, if out of the symbol's bounds, does not write the
negated original addend for the loader to determine which granule's tag
to use for it. Handle the composition of the two.
Note that R_AARCH64_AUTH_ABS64/RELATIVE encode the signing schema in the
upper 32 bits of the value at the relocation target, and so only the
lower 32 bits are available for use as an addend, including for Memtag's
disambiguation, and so if a wildly out-of-bounds PAuth relocation
against a tagged global is used we have no choice but to error out with
the current ABI.
Reviewers: MaskRay, kovdan01, smithp35, asl
Reviewed By: smithp35
Pull Request: https://github.com/llvm/llvm-project/pull/173291
Original crash was observed in Chromium, in [1]. The problem occurs in
elf::isAArch64BTILandingPad because it didn't handle synthetic sections,
which can have a nullptr as a buf, so it crashed while trying to read
that buf.
After fixing that, a second issue occurs: When the patched code grows
too
much, it gets far away from the short jump, and the current
implementation
assumes a R_AARCH64_JUMP26 will be enough.
This PR changes the implementation to:
(a) In isAArch64BTILandingPad, checks if a section is synthetic, and
assumes that it'll NOT contain a landing pad, avoiding the buffer check;
(b) Suppress the size rounding for thunks that preceeds section
(Making the situation less likely to happen);
(c) Reimplements the patch by using a R_AARCH64_ABS64 in case the
patched code is still far away.
[1] https://issues.chromium.org/issues/440019454
---------
Co-authored-by: Tarcisio Fischer <tarcisio.fischer@arm.com>
This call to addRelativeReloc is the same as the one at the end of the
function, so skip the relrDyn code for this case and add the special
out-of-bounds handling code to the end of the function. This makes it
obvious where MTE globals differ in behaviour rather than having to
compare the two different implementations.
This also adds a comment documenting why relrDyn isn't used, and in it
highlights that it's probably safe to use relrDyn so long as the offset
is within the symbol's bounds.
Reviewers: pcc, kovdan01, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171181
This makes addRelativeReloc a bit more readable and uniform, as well as
the relrAuthDyn call in RelocScan::process.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171178
There's no need to poke into the internals, we can just use the more
abstract member function like everywhere else in LLD.
Reviewers: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171176
The current implementation in addRelativeReloc makes it look like we're
writing the symbol's VA + addend to the section, because that's what the
given relocation will evaluate to, but we're supposed to be writing the
negated original addend (since the relative relocation's addend will be
the sum of the symbol's VA and the original addend). This only works
because deep down in AArch64::relocate we throw away the computed value
and peek back inside the relocation to extract the addend and negate it.
Do this properly by having a relocation that evaluates to the right
value instead.
Reviewers: kovdan01, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171182
The only difference between these calls is whether rel or type is passed
as the first argument, but AArch64::getDynRel returns type unchanged for
R_AARCH64_AUTH_ABS64, so they are the same.
Reviewers: MaskRay, kovdan01
Pull Request: https://github.com/llvm/llvm-project/pull/171179
NEEDS_TLSGD_TO_IE is only ever set when the symbol is preeptible, in
which case addTpOffsetGotEntry will just add the symbol to the GOT and
emit a symbolic tlsGotRel anyway, so there is no need to give it its own
special case.
As well as simplifying the code upstream, this is useful downstream for
Morello, which doesn't really have a proper GD/IE-to-LE relaxation, and
so for GD-to-IE can benefit from being able to use the optimisations
addTpOffsetGotEntry has for non-preemptible symbols, rather than having
to reimplement them here.
R_AARCH64_FUNCINIT64 is a dynamic relocation type for relocating
word-sized data in the output file using the return value of
a function. An R_AARCH64_FUNCINIT64 shall be relocated as an
R_AARCH64_IRELATIVE with the target symbol address if the target
symbol is non-preemptible, and it shall be a usage error to relocate an
R_AARCH64_FUNCINIT64 with a preemptible or STT_GNU_IFUNC target symbol.
The initial use case for this relocation type shall be for emitting
global variable field initializers for structure protection. With
structure protection, the relocation value computation is tied to the
compiler implementation in such a way that it would not be reasonable to
define a relocation type for it (for example, it may involve computing
a hash using a compiler-determined algorithm), hence the need for the
computation to be implemented as code in the binary.
Part of the AArch64 psABI extension:
https://github.com/ARM-software/abi-aa/issues/340
Reviewers: smithp35, fmayer, MaskRay
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/156564
- Extract RelocScan to RelocScan.h. The file includes Target.h, and
cannot be merged with Relocations.h
- Add MIPS and PPC64 specific relocation scanners, removing runtime
checks for other targets.
This refactoring prepares the codebase for better target-specific
optimizations and easier addition of target-specific behavior.
.eh_frame sections require special sub-section processing, specifically,
CIEs are de-duplicated and FDEs are garbage collected. Create a
specialized scanEhSection() function utilizing the just-added
EhInputSection::rels. OffsetGetter is moved to scanEhSection.
This improves separation of concerns between InputSection and
EhInputSection processing.
This removes another `relsOrRelas` call using `supportsCrel=false`.
DWARF.cpp now has the last call.
Pull Request: https://github.com/llvm/llvm-project/pull/161091
relocateAlloc can be called with either InputSection (including
SyntheticSection like GotSection) or EhInputSection.
Introduce relocateEh so that we can remove some boilerplate and replace
relocateAlloc's parameter type with `InputSection`.
Pull Request: https://github.com/llvm/llvm-project/pull/160031
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 30 instances that rely on this "redirection". Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.
I'm planning to remove the redirection eventually.
DynamicReloc::AgainstSymbol is now true and DynamicReloc::AddendOnly is
now false; uses of the constants were replaced mechanically.
Reviewers: rnk, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150813
The former is just a special case of the latter, ignoring the expr and
always just using the addend. If we use R_ADDEND as expr (which
previously had no effect, and so was misleadingly R_ABS not R_ADDEND in
all but one use) then we don't need to maintain this as a separate case.
Aside from the internal Computed Kind, this just leaves MipsMultiGotPage
as a special case; the only difference between the other two Kind values
is what needsDynSymIndex returns.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150798
The former is just a special case of the latter, ignoring the expr and
always just using the addend, allowing (and enforcing) the sym is null.
If we just use dummySym then we don't need to maintain this as a
separate case, since R_ADDEND will return the addend unmodified for the
call to getRelocTargetVA.
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150797
This ensures subsequent calls to elf::postScanRelocations with a new Ctx
will correctly use an instance with the right internalFile (with the old
one presumably deleted, even). It also avoids having to create a new
instance in elf::getErrorPlace, and will allow more uses of such a dummy
symbol in future commits.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150796
Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.
When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.
The relocations which will be extended by thunks are:
* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL
relocations have a range of ± 8MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes
Fixes#149689
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
---------
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
* Fix `.reloc constant` to mean section_symbol+constant instead of
.+constant . The initial .reloc support from MIPS incorrectly
interpreted the offset.
* Delay the evaluation of the offset expression after
MCAssembler::layout, deleting a lot of code working with MCFragment.
* Delete many FIXME from https://reviews.llvm.org/D79625
* Some lld/ELF/Arch/LoongArch.cpp relaxation tests rely on .reloc .,
R_LARCH_ALIGN generating ALIGN relocations at specific location.
Sort the relocations.
Support TLSDESC to initial-exec or local-exec optimizations. Introduce a
new hook RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC and use existing
R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, while use existing
R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
In normal or medium code model, there are two forms of code sequences:
* pcalau12i $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
------
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd] $a0, $a0, %ie_pc_lo12(sym_ie)
Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0 src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero
Simplicity, whether tlsdescToIe or tlsdescToLe, we always tend to
convert the preceding instructions to NOPs, due to both forms of code
sequence (corresponding to relocation combinations:
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) have same process.
TODO: When relaxation enables, redundant NOPs can be removed. It will be
implemented in a future patch.
Note: All forms of TLSDESC code sequences should not appear interleaved
in the normal, medium or extreme code model, which compilers do not
generate and lld is unsupported. This is thanks to the guard in
PostRASchedulerList.cpp in llvm.
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.
This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.
Original commit message follows:
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Reviewers: zmodem, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/145579
This caused assertion failures in applyBranchToBranchOpt():
llvm/include/llvm/Support/Casting.h:578:
decltype(auto) llvm::cast(From*)
[with To = lld:🧝:InputSection; From = lld:🧝:InputSectionBase]:
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)
This reverts commit 491b82a5ec1add78d2c93370580a2f1897b6a364.
This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"
This reverts commit 2ac293f5ac4cf65c0c038bf75a88f1d6715e467d.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Pull Request: https://github.com/llvm/llvm-project/pull/138366
* Merge the special case into isStaticLinkTimeConstant
* Generalize isUndefWeak to isUndefined. undefined non-weak is an error
case. We choose to be general, which also brings us in line with GNU ld.
Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB, 1000 bytes, and the hardware industry as well as
several operating systems have now switched to using 1000 byte kBs.
If this change is acceptable, sometimes GitHub mangles merges to use the
original email of the account. $dayjob asks contributions have my work
email. Thanks!
This lock is unnecessary because we can add the relocations to
shards and let them be sorted later.
Reviewers: smithp35, fmayer, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/135123
Original code sequence:
* pcalau12i $a0, %ie_pc_hi20(sym)
* ld.d $a0, $a0, %ie_pc_lo12(sym)
The code sequence converted is as follows:
* lu12i.w $a0, %le_hi20(sym) # le_hi20 != 0, otherwise NOP
* ori $a0, src, %le_lo12(sym) # le_hi20 != 0, src = $a0,
# otherwise, src = $zero
TODO: When relaxation is enabled, redundant NOP can be removed. This
will be implemented in a future patch.
Note: In the normal or medium code model, original code sequence with
relocations allow interleaving, because converted code sequence
calculates the absolute offset. However, in extreme code model, to
identify the current code model, the first four instructions with
relocations must appear consecutively.
The value of an absolute relocation, like R_RISCV_HI20 or R_PPC64_LO16,
with a symbol index of 0, the resulting value should be treated as
absolute and permitted in both -pie and -shared links.
This change also resolves an absolute relocation referencing an
undefined symbol in statically-linked executables.
PPC64 has unfortunate exceptions:
* R_PPC64_TOCBASE uses symbol index 0 but it should be treated as
referencing the linker-defined .TOC.
* R_PPC64_PCREL_OPT (https://reviews.llvm.org/D84360) could no longer
rely on `isAbsoluteValue` return false.
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
Depends on #120010
Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12`
and `R_AARCH64_AUTH_TLSDESC_LD64_LO12` static relocations and
`R_AARCH64_AUTH_TLSDESC` dynamic relocation. IE/LE optimization is not
currently supported for AUTH TLSDESC.
Depends on #114525
Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19`
GOT-generating relocations. A corresponding `RE_AARCH64_AUTH_GOT_PC` member
of `RelExpr` is added, which is an AUTH-specific variant of `R_GOT_PC`.
Depends on #113811
Support `R_AARCH64_AUTH_ADR_GOT_PAGE`, `R_AARCH64_AUTH_GOT_LO12_NC` and
`R_AARCH64_AUTH_GOT_ADD_LO12_NC` GOT-generating relocations. For preemptible
symbols, dynamic relocation `R_AARCH64_AUTH_GLOB_DAT` is emitted. Otherwise,
we unconditionally emit `R_AARCH64_AUTH_RELATIVE` dynamic relocation since
pointers in signed GOT needs to be signed during dynamic link time.
isExported, intended to replace exportDynamic, is primarily set in two
locations, (a) after parseSymbolVersion and (b) during demoteSymbols.
In the future, we should try removing exportDynamic. Currently,
merging exportDynamic/isExported would cause
riscv-gp.s to fail:
* The first isExported computation considers the undefined symbol exported
* Defined as a linker-synthesized symbol
* isExported remains true, while it should be false
When computing whether a defined symbol is exported, we set
`exportDynamic` in Defined and CommonSymbol's ctor and merge the bit in
symbol resolution. The complexity is for the LTO special case
canBeOmittedFromSymbolTable, which can be simplified by introducing a
new bit.
We might simplify the state by caching includeInDynsym in exportDynamic
in the future.
In #86751 we moved the IRELATIVE relocations to .rela.plt when
--pack-dyn-relocs=android was enabled but we neglected to also move
the __rela_iplt_{start,end} symbols. As a result, static binaries
linked with this flag were unable to find their IRELATIVE relocations.
Fix it by having the symbols surround the correct section.
Reviewers: MaskRay, smithp35
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/118585