Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.
When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.
The relocations which will be extended by thunks are:
* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL
relocations have a range of ± 8MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes
Fixes#149689
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
---------
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
Include the offset of a thunk in the ThunkSection when adding symbols.
At Thunk creation time the offset is set to 0 as we don't know where in
the ThunkSection the Thunk will end up. The symbol values are updated by
the setOffset() call in assignOffsets().
When we transform a thunk from a short to a long, we sometimes add a
mapping symbol. At this point the offset of the thunk is non zero and we
need to account for that when defining the symbol, as the setOffset()
call subtracts the offset before adding the new one back in.
To test; added a second thunk that is converted to a long thunk to
aarch64-thunk-bit-multipass. This second thunk is given a non zero
offset from the start of the Thunk Section so we can observe the mapping
symbol being put in the wrong place without accounting for the offset.
fixes: https://github.com/llvm/llvm-project/issues/142326
This permits an AArch64AbsLongThunk to be used in an environment where
unaligned accesses are disabled.
The AArch64AbsLongThunk does a load of an 8-byte address. When unaligned
accesses are disabled this address must be 8-byte aligned.
The vast majority of AArch64 systems will have unaligned accesses
enabled in userspace. However, after a reset, before the MMU has been
enabled, all memory accesses are to "device" memory, which requires
aligned accesses. In systems with multi-stage boot loaders a thunk may
be required to a later stage before the MMU has been enabled.
As we only want to increase the alignment when the ldr is used we delay
the increase in thunk alignment until we know we are going to write an
ldr. We also need to account for the ThunkSection alignment increase
when this happens.
In some of the test updates, particularly those with shared CHECK lines
with position independent thunks it was easier to ensure that the thunks
started at an 8-byte aligned address in all cases.
Thumb short thunks use the B.w instruction. This instruction is not
present on Arm v6-m so we should prevent these targets from using
short-thunks. We want to permit Arm v8-m.base targets to continue using
short thunks as it does have the B.w instruction despite not
implementing all of Thumb 2.
Add a check to see if the Movt and Movw instructions are present before
enabling short thunks for Thumb. The v6-m architecture has
J1J2BranchEncoding, but it does not have Movt and Movw, whereas
v8-m.base has both.
The memory map and limited flash size of an Arm v6-m CPU makes a short
thunk very unlikely in practice, but it is worth getting it right just
in case.
When we create a thunk we don't know whether it will be short or long.
Move the emission of the long thunk mapping symbol to when we transition
to a long thunk. This improves disassembly and binary analysis as tools
like BOLT identify thunks by disassembly.
This removes a FIXME added in #108989 aarch64-thunk-bti-multipass.s
which had a corrupt disassembly due to missing mapping symbols.
When Branch Target Identification BTI is enabled all indirect branches
must target a BTI instruction. A long branch thunk is a source of
indirect branches. To date LLD has been assuming that the object
producer is responsible for putting a BTI instruction at all places the
linker might generate an indirect branch to. This is true for clang, but
not for GCC. GCC will elide the BTI instruction when it can prove that
there are no indirect branches from outside the translation unit(s). GNU
ld was fixed to generate a landing pad stub (gnu ld speak for thunk) for
the destination when a long range stub was needed [1].
This means that using GCC compiled objects with LLD may lead to LLD
generating an indirect branch to a location without a BTI. The ABI [2]
has also been clarified to say that it is a static linker's
responsibility to generate a landing pad when the target does not have a
BTI.
This patch implements the same mechansim as GNU ld. When the output ELF
file is setting the
GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, then we check the
destination to see if it has a BTI instruction. If it does not we
generate a landing pad consisting of:
BTI c
B <destination>
The B <destination> can be elided if the thunk can be placed so that
control flow drops through. For example:
BTI c
<destination>:
This will be common when -ffunction-sections is used.
The landing pad thunks are effectively alternative entry points for the
function. Direct branches are unaffected but any linker generated
indirect branch needs to use the alternative. We place these as close as
possible to the destination section.
There is some further optimization possible. Consider the case:
.text
fn1
...
fn2
...
If we need landing pad thunks for both fn1 and fn2 we could order them
so that the thunk for fn1 immediately precedes fn1. This could save a
single branch. However I didn't think that would be worth the additional
complexity.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] https://github.com/ARM-software/abi-aa/issues/196
Ctx was introduced in March 2022 as a more suitable place for such
singletons.
llvm/Support/thread.h includes <thread>, which transitively includes
sstream in libc++ and uses ios_base::in, so we cannot use `#define in ctx.sec`.
`symtab, config, ctx` are now the only variables using
LLVM_LIBRARY_VISIBILITY.
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Strip calls to raw_string_ostream::str(), to avoid excess layer of indirection.
As per the ABI at
https://github.com/ARM-software/abi-aa/blob/main/memtagabielf64/memtagabielf64.rst,
this patch interprets the SHT_AARCH64_MEMTAG_GLOBALS_STATIC section,
which contains R_NONE relocations to tagged globals, and emits a
SHT_AARCH64_MEMTAG_GLOBALS_DYNAMIC section, with the correct
DT_AARCH64_MEMTAG_GLOBALS and DT_AARCH64_MEMTAG_GLOBALSSZ dynamic
entries. This section describes, in a uleb-encoded stream, global memory
ranges that should be tagged with MTE.
We are also out of bits to spare in the LLD Symbol class. As a result,
I've reused the 'needsTocRestore' bit, which is a PPC64 only feature.
Now, it's also used for 'isTagged' on AArch64.
An entry in SHT_AARCH64_MEMTAG_GLOBALS_STATIC is practically a guarantee
from an objfile that all references to the linked symbol are through the
GOT, and meet correct alignment requirements. As a result, we go through
all symbols and make sure that, for all symbols $SYM, all object files
that reference $SYM also have a SHT_AARCH64_MEMTAG_GLOBALS_STATIC entry
for $SYM. If this isn't the case, we demote the symbol to being
untagged. Symbols that are imported from other DSOs should always be
fine, as they're GOT-referenced (and thus the GOT entry either has the
correct tag or not, depending on whether it's tagged in the defining DSO
or not).
Additionally hand-tested by building {libc, libm, libc++, libm, and libnetd}
on Android with some experimental MTE globals support in the
linker/libc.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D152921
This patch adds a thunk for Thumb long branch on V6-M for eXecute Only.
Note that there is currently no support for a position independant and
eXecute Only V6-M long branch thunk
Differential Revision: https://reviews.llvm.org/D153772
Arm has BE8 big endian configuration called a byte-invariant(every byte has the same address on little and big-endian systems).
When in BE8 mode:
1. Instructions are big-endian in relocatable objects but
little-endian in executables and shared objects.
2. Data is big-endian.
3. The data encoding of the ELF file is ELFDATA2MSB.
To support BE8 without an ABI break for relocatable objects,the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8:
1. The linker sets the flag EF_ARM_BE8 in the ELF header.
2. The linker endian reverses the instructions, but not data.
This patch adds BE8 big endian support for Arm. To endian reverse the instructions we'll need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions
as half-words and ignore literal data.The only way to find these transitions precisely is by using mapping symbols. The instruction reversal will need to take place after relocation. For Arm BE8 code sections (Section has SHF_EXECINSTR flag ) we inserted a step after relocation to endian reverse the instructions. The implementation strategy i have used here is to write all sections BE32 including SyntheticSections then endian reverse all code in InputSections via mapping symbols.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D150870
As per section 4.2.2 of the PowerPC ELFv2 ABI, this value tells the dynamic linker which optimizations it is allowed to do.
Specifically, the higher order bit of the two tells the dynamic linker that there may be multiple TOC pointers in the binary.
When we resolve any NOTOC relocations during linking, we need to set this value because we may be calling
TOC functions from NOTOC functions when the NOTOC function already clobbered the TOC pointer.
In practice, this ensures that the PLT resolver always resolves the call to the GEP (global entry point) of
the TOC function (which will set up the TOC for the TOC function).
Original patch by nemanjai
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D150631
Relocations R_AVR_LO8_LDI_GS/R_AVR_HI8_LDI_GS (indirect calls
via function pointers) only cover range 128KiB. They are
equivalent to R_AVR_LO8_LDI_PM/R_AVR_HI8_LDI_PM within this
range.
But for function addresses beyond this range, GNU-ld emits
trampolines. And this patch implements corresponding thunks
for them in lld.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D147364
The AArch64 branch immediate instruction has a 128MiB range. This
makes it suitable for use a short range thunk in the same way as
short thunks are implemented in Arm and PPC. This patch adds
support for short range thunks to AArch64.
Adding short range thunk support should mean that OutputSections
can grow to nearly 256 MiB in size without needing long-range
indirect branches.
Differential Revision: https://reviews.llvm.org/D148701
Adding BTI to those PLT's which accessed with by a range extension thunk due to those preform an indirect call.
Fixes: #62140
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D148704
Changes:
- Adding BE32 big endian Support for Arm.
- Replace the writele and readle with their endian-aware versions.
- Adding test cases for the big-endian be32 arm configuration.
Patch by: Milosz Plichta. This patch merges all the changes from
this patch https://reviews.llvm.org/D140203 as well.
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D140202
When a caller that does not use TOC calls a function, a call stub is needed if
the function may use TOC. --no-power10-stubs avoids PC-relative instructions in
the code sequence.
The --no-power10-stubs=no implementation added in D94627 is wrong.
First, the first instruction incorrectly uses `mflr 0` (instead of `mflr 12`).
Second, for the PLT case, it uses addis+addi with getVA instead of addis+ld with
getGotPltVA.
PPC64PCRelPLTStub (from D83669) duplicates lot of code from
PPC64R12SetupStub. Just merge them.
Note: PPC64R12SetupStub does not correctly handle long branch to a
non-preemptible non-TOC code.
Change:
- Replacing the memcpy that assume little endian with the endian-aware write.
Shouldn't affect the output for now, just a prerequisite for the next patches.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D140201
- Position independent thunks now work for both Armv4 and Armv4T
- Armv4 arm->arm thunks don't emit a BX anymore, which doesn't exist for the
arch. This fixes https://github.com/llvm/llvm-project/issues/50764.
- Armv4 and Armv4T both have the same arm->arm behaviour. Which also is
desirable for the above ticket.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D141272
In ThumbThunk::isCompatibleWith, we check if we can use short thunks if we are
within branch range. However these short thumb thunks will generate b.w
instructions, and these are not available on pre branch range extension
architectures.
On these architectures (v4, v5, and most of v6), we could replace the b.w with a
Thumb b (2) instruction, but that would in an ideal situation only give us an
extra range of 2048 bytes on top of the 4MB range of a BL, if a thunk section
happens to be placed on the outer range of a BL and the stars are aligned. It
doesn't seem worth it.
What would be worth it is a state change to Arm and a subsequent branch to
either Arm or Thumb code. But that's the subject of another patch.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D140633
changes:
- BLX: The Arm architecture versions that support the branch and link
instruction (BLX), can rewrite BLs in place when a state change from Arm<->Thumb
is required. Armv4T does not have BLX and so needs thunks for state changes.
- v4T Thumb long branches needed their own thunk. We could have used the v6M
implementation, but v6M doesn't have Arm state and must resolve to rather
inefficient stack reshuffling. We also can't reuse v7 thumb thunks as they use
MOVV/MOVT, which wasn't available yet for v4T.
- Remove the `lack of BLX' warning. LLVM only supports Arm Architecture versions
upwards of v4, which we now all support in LLD.
- renamed existing thunks to better reflect their use:
ARMV5ABSLongThunk -> ARMV5LongLdrPcThunk,
ARMV5PILongThunk -> ARMV4PILongThunk
- removed isCompatibleWith method from ARMV5ABSLongThunk and ARMV5PILongThunk,
as they were identical to the ARMThunk parent class implementation.
Support for (efficient) position independent thunks for v4T will be added in a
follow-up patch, including possible related thunk renaming and code comment
cleanup.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D139888