156 Commits

Author SHA1 Message Date
Brian Cain
b42f96bc05
[lld] Add thunks for hexagon (#111217)
Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.

When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.

The relocations which will be extended by thunks are:

* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL
relocations have a range of ± 8MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes

Fixes #149689 

Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>

---------

Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
2025-07-20 11:46:31 -05:00
Peter Smith
eb0f1dc00e
[LLD][ELF] Include offset when adding Thunk symbols (#144995)
Include the offset of a thunk in the ThunkSection when adding symbols.

At Thunk creation time the offset is set to 0 as we don't know where in
the ThunkSection the Thunk will end up. The symbol values are updated by
the setOffset() call in assignOffsets().

When we transform a thunk from a short to a long, we sometimes add a
mapping symbol. At this point the offset of the thunk is non zero and we
need to account for that when defining the symbol, as the setOffset()
call subtracts the offset before adding the new one back in.

To test; added a second thunk that is converted to a long thunk to
aarch64-thunk-bit-multipass. This second thunk is given a non zero
offset from the start of the Thunk Section so we can observe the mapping
symbol being put in the wrong place without accounting for the offset.

fixes: https://github.com/llvm/llvm-project/issues/142326
2025-06-20 10:11:42 +01:00
Peter Smith
e47d3a3088
[LLD][AArch64] Increase alignment of AArch64AbsLongThunk to 8 (#133738)
This permits an AArch64AbsLongThunk to be used in an environment where
unaligned accesses are disabled.

The AArch64AbsLongThunk does a load of an 8-byte address. When unaligned
accesses are disabled this address must be 8-byte aligned.

The vast majority of AArch64 systems will have unaligned accesses
enabled in userspace. However, after a reset, before the MMU has been
enabled, all memory accesses are to "device" memory, which requires
aligned accesses. In systems with multi-stage boot loaders a thunk may
be required to a later stage before the MMU has been enabled.

As we only want to increase the alignment when the ldr is used we delay
the increase in thunk alignment until we know we are going to write an
ldr. We also need to account for the ThunkSection alignment increase
when this happens.

In some of the test updates, particularly those with shared CHECK lines
with position independent thunks it was easier to ensure that the thunks
started at an 8-byte aligned address in all cases.
2025-04-01 09:49:27 +01:00
Csanád Hajdú
6e457c2001
[LLD][ELF][AArch64] Add support for SHF_AARCH64_PURECODE ELF section flag (3/3) (#125689)
Add support for the new SHF_AARCH64_PURECODE ELF section flag:
https://github.com/ARM-software/abi-aa/pull/304

The general implementation follows the existing one for ARM targets. The
output section only has the `SHF_AARCH64_PURECODE` flag set if all input
sections have it set.

Related PRs:
* LLVM: https://github.com/llvm/llvm-project/pull/125687
* Clang: https://github.com/llvm/llvm-project/pull/125688
2025-02-21 09:01:38 -08:00
Peter Smith
457e14b926
[LLD][ARM] Arm v6-m should not use short Thunks. (#118111)
Thumb short thunks use the B.w instruction. This instruction is not
present on Arm v6-m so we should prevent these targets from using
short-thunks. We want to permit Arm v8-m.base targets to continue using
short thunks as it does have the B.w instruction despite not
implementing all of Thumb 2.

Add a check to see if the Movt and Movw instructions are present before
enabling short thunks for Thumb. The v6-m architecture has
J1J2BranchEncoding, but it does not have Movt and Movw, whereas
v8-m.base has both.

The memory map and limited flash size of an Arm v6-m CPU makes a short
thunk very unlikely in practice, but it is worth getting it right just
in case.
2024-12-09 11:24:45 +00:00
Peter Smith
27923f7e1a
[LLD][AArch64][ARM] Delay adding long thunk mapping symbols (#116975)
When we create a thunk we don't know whether it will be short or long.
Move the emission of the long thunk mapping symbol to when we transition
to a long thunk. This improves disassembly and binary analysis as tools
like BOLT identify thunks by disassembly.

This removes a FIXME added in #108989 aarch64-thunk-bti-multipass.s
which had a corrupt disassembly due to missing mapping symbols.
2024-11-21 14:26:25 +00:00
Fangrui Song
37e39667cc [ELF] Make ThunkCreator take ownership of thunks
This removes many SpecificAlloc instantiations and makes my lld (x86-64
Release+Assertions) smaller by ~36k.
2024-11-19 23:16:35 -08:00
Fangrui Song
2991a4e209 [ELF] Replace functions bAlloc/saver/uniqueSaver with member access 2024-11-16 22:34:13 -08:00
Fangrui Song
a626eb2a2f [ELF] Pass ctx to bAlloc/saver/uniqueSaver 2024-11-16 15:20:21 -08:00
Fangrui Song
d69cc05bcf [ELF] Migrate away from global ctx 2024-11-14 22:30:29 -08:00
Fangrui Song
8440ced89f [ELF] Change a Fatal to assert in addThunkAArch64. NFC 2024-11-07 19:43:18 -08:00
Fangrui Song
63c6fe4a0b [ELF] Replace fatal(...) with Fatal or Err 2024-11-06 21:17:26 -08:00
Fangrui Song
861bd36bce [ELF] Pass Ctx & to Symbol::getVA 2024-10-19 20:32:58 -07:00
Fangrui Song
0dbc85a59f [ELF] Pass Ctx & to Arch-specific code 2024-10-13 11:08:06 -07:00
Fangrui Song
002ca63b3f [ELF] Pass Ctx & to (read|write)(16|64) 2024-10-13 10:47:18 -07:00
Fangrui Song
38dfcd9ac9 [ELF] Pass Ctx & to read32/write32 2024-10-13 10:37:47 -07:00
Fangrui Song
6dd773b650 [ELF] Pass Ctx & 2024-10-11 20:15:02 -07:00
Fangrui Song
4fadf41c2f [ELF] Pass Ctx & to ARM/AArch64 2024-10-07 23:29:11 -07:00
Fangrui Song
acb2b1e779 [ELF] Pass Ctx & to Symbols 2024-10-06 16:59:04 -07:00
Fangrui Song
6d03a69034 [ELF] Pass Ctx & to Arch/ 2024-10-06 00:14:12 -07:00
Fangrui Song
67c0846357 [ELF] Don't call getPPC64TargetInfo outside Driver. NFC
getPPC64TargetInfo should only be called once per link invocation.
2024-10-05 10:29:25 -07:00
Peter Smith
c4d9cd8b74
[LLD][ELF][AArch64] Add BTI Aware long branch thunks (#108989)
When Branch Target Identification BTI is enabled all indirect branches
must target a BTI instruction. A long branch thunk is a source of
indirect branches. To date LLD has been assuming that the object
producer is responsible for putting a BTI instruction at all places the
linker might generate an indirect branch to. This is true for clang, but
not for GCC. GCC will elide the BTI instruction when it can prove that
there are no indirect branches from outside the translation unit(s). GNU
ld was fixed to generate a landing pad stub (gnu ld speak for thunk) for
the destination when a long range stub was needed [1].

This means that using GCC compiled objects with LLD may lead to LLD
generating an indirect branch to a location without a BTI. The ABI [2]
has also been clarified to say that it is a static linker's
responsibility to generate a landing pad when the target does not have a
BTI.

This patch implements the same mechansim as GNU ld. When the output ELF
file is setting the
GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, then we check the
destination to see if it has a BTI instruction. If it does not we
generate a landing pad consisting of:
BTI c
B <destination>

The B <destination> can be elided if the thunk can be placed so that
control flow drops through. For example:
BTI c
<destination>:
This will be common when -ffunction-sections is used.

The landing pad thunks are effectively alternative entry points for the
function. Direct branches are unaffected but any linker generated
indirect branch needs to use the alternative. We place these as close as
possible to the destination section.

There is some further optimization possible. Consider the case:
.text
fn1
...
fn2
...

If we need landing pad thunks for both fn1 and fn2 we could order them
so that the thunk for fn1 immediately precedes fn1. This could save a
single branch. However I didn't think that would be worth the additional
complexity.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] https://github.com/ARM-software/abi-aa/issues/196
2024-10-01 13:12:29 +01:00
Fangrui Song
04e69ad727 [ELF] Pass Ctx & to Thunk 2024-09-29 15:20:01 -07:00
Fangrui Song
cf30e8e153 [ELF] Pass Ctx & to Thunk 2024-09-29 14:59:57 -07:00
Fangrui Song
c3e4998c0b [ELF] Pass Ctx & to TargetInfo. NFC 2024-09-28 21:48:26 -07:00
Fangrui Song
29783f70db [ELF] Pass Ctx & to Relocations 2024-09-28 19:17:18 -07:00
Fangrui Song
ff8d55f8d5 [ELF] Replace config-> with ctx.arg. in Relocations and Thunks 2024-09-21 19:56:07 -07:00
Fangrui Song
e88b7ff016 [ELF] Move InStruct into Ctx. NFC
Ctx was introduced in March 2022 as a more suitable place for such
singletons.

llvm/Support/thread.h includes <thread>, which transitively includes
sstream in libc++ and uses ios_base::in, so we cannot use `#define in ctx.sec`.

`symtab, config, ctx` are now the only variables using
LLVM_LIBRARY_VISIBILITY.
2024-09-15 22:15:02 -07:00
JOE1994
4b27b5800f [lld] Nits on uses of raw_string_ostream (NFC)
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Strip calls to raw_string_ostream::str(), to avoid excess layer of indirection.
2024-09-15 04:23:11 -04:00
Fangrui Song
b4feb26606 [ELF] Move target to Ctx. NFC
Ctx was introduced in March 2022 as a more suitable place for such
singletons.

Follow-up to driver (2022-10) and script (2024-08).
2024-08-21 23:53:36 -07:00
Fangrui Song
c62fa63ff1 [ELF] Move mainPart to Ctx. NFC
Ctx was introduced in March 2022 as a more suitable place for such
singletons.
2024-08-21 20:08:11 -07:00
Nico Weber
bf3d5dbe2f [lld/ELF] fix typos to cycle bots 2024-02-13 18:31:32 -05:00
Mitch Phillips
ca35a19aca [lld] Synthesize metadata for MTE globals
As per the ABI at
https://github.com/ARM-software/abi-aa/blob/main/memtagabielf64/memtagabielf64.rst,
this patch interprets the SHT_AARCH64_MEMTAG_GLOBALS_STATIC section,
which contains R_NONE relocations to tagged globals, and emits a
SHT_AARCH64_MEMTAG_GLOBALS_DYNAMIC section, with the correct
DT_AARCH64_MEMTAG_GLOBALS and DT_AARCH64_MEMTAG_GLOBALSSZ dynamic
entries. This section describes, in a uleb-encoded stream, global memory
ranges that should be tagged with MTE.

We are also out of bits to spare in the LLD Symbol class. As a result,
I've reused the 'needsTocRestore' bit, which is a PPC64 only feature.
Now, it's also used for 'isTagged' on AArch64.

An entry in SHT_AARCH64_MEMTAG_GLOBALS_STATIC is practically a guarantee
from an objfile that all references to the linked symbol are through the
GOT, and meet correct alignment requirements. As a result, we go through
all symbols and make sure that, for all symbols $SYM, all object files
that reference $SYM also have a SHT_AARCH64_MEMTAG_GLOBALS_STATIC entry
for $SYM. If this isn't the case, we demote the symbol to being
untagged. Symbols that are imported from other DSOs should always be
fine, as they're GOT-referenced (and thus the GOT entry either has the
correct tag or not, depending on whether it's tagged in the defining DSO
or not).

Additionally hand-tested by building {libc, libm, libc++, libm, and libnetd}
on Android with some experimental MTE globals support in the
linker/libc.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D152921
2023-07-31 17:07:42 +02:00
Keith Walker
5c42ba9837 [ARM] armv6m eXecute Only (XO) long branch Thunk
This patch adds a thunk for Thumb long branch on V6-M for eXecute Only.

Note that there is currently no support for a position independant and
eXecute Only V6-M long branch thunk

Differential Revision: https://reviews.llvm.org/D153772
2023-06-30 11:56:41 +01:00
Simi Pallipurath
f146763e07 Revert "Revert "[lld][Arm] Big Endian - Byte invariant support.""
This reverts commit d8851384c6ac2a1cea15e05228dbde5f13654e23.

Reason: Applied the fix for the Asan buildbot failures.
2023-06-22 16:10:18 +01:00
Simi Pallipurath
d8851384c6 Revert "[lld][Arm] Big Endian - Byte invariant support."
This reverts commit 8cf8956897ce9bca3176c6339077b1ca17b27abc.
2023-06-20 17:27:44 +01:00
Simi Pallipurath
8cf8956897 [lld][Arm] Big Endian - Byte invariant support.
Arm has BE8 big endian configuration called a byte-invariant(every byte has the same address on little and big-endian systems).

When in BE8 mode:
  1. Instructions are big-endian in relocatable objects but
     little-endian in executables and shared objects.
  2. Data is big-endian.
  3. The data encoding of the ELF file is ELFDATA2MSB.

To support BE8 without an ABI break for relocatable objects,the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8:
  1. The linker sets the flag EF_ARM_BE8 in the ELF header.
  2. The linker endian reverses the instructions, but not data.

This patch adds BE8 big endian support for Arm. To endian reverse the instructions we'll need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions
as half-words and ignore literal data.The only way to find these transitions precisely is by using mapping symbols. The instruction reversal will need to take place after relocation. For Arm BE8 code sections (Section has SHF_EXECINSTR flag ) we inserted a step after relocation to endian reverse the instructions. The implementation strategy i have used here is to write all sections BE32  including SyntheticSections then endian reverse all code in InputSections via mapping symbols.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D150870
2023-06-20 14:08:21 +01:00
Stefan Pintilie
658f23fc46 [LLD] Emit DT_PPC64_OPT into the dynamic section
As per section 4.2.2 of the PowerPC ELFv2 ABI, this value tells the dynamic linker which optimizations it is allowed to do.
Specifically, the higher order bit of the two tells the dynamic linker that there may be multiple TOC pointers in the binary.

When we resolve any NOTOC relocations during linking, we need to set this value because we may be calling
TOC functions from NOTOC functions when the NOTOC function already clobbered the TOC pointer.

In practice, this ensures that the PLT resolver always resolves the call to the GEP (global entry point) of
the TOC function (which will set up the TOC for the TOC function).

Original patch by nemanjai

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D150631
2023-06-05 12:18:29 -04:00
Ben Shi
229fcad7fc [lld][ELF] Support relocations R_AVR_LO8_LDI_GS/R_AVR_HI8_LDI_GS
Relocations R_AVR_LO8_LDI_GS/R_AVR_HI8_LDI_GS (indirect calls
via function pointers) only cover range 128KiB. They are
equivalent to R_AVR_LO8_LDI_PM/R_AVR_HI8_LDI_PM within this
range.

But for function addresses beyond this range, GNU-ld emits
trampolines. And this patch implements corresponding thunks
for them in lld.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D147364
2023-04-28 11:42:06 +08:00
Peter Smith
d0cdc5ddd7 [LLD][ELF][AArch64] Add AArch64 short range thunk support
The AArch64 branch immediate instruction has a 128MiB range. This
makes it suitable for use a short range thunk in the same way as
short thunks are implemented in Arm and PPC. This patch adds
support for short range thunks to AArch64.

Adding short range thunk support should mean that OutputSections
can grow to nearly 256 MiB in size without needing long-range
indirect branches.

Differential Revision: https://reviews.llvm.org/D148701
2023-04-24 13:48:22 +01:00
Daniel Kiss
60827df765 [lld][AArch64] Add BTI landing pad to PLT when it is accessed by a range extension thunk.
Adding BTI to those PLT's which accessed with by a range extension thunk due to those preform an indirect call.
Fixes: #62140

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D148704
2023-04-23 23:17:02 +02:00
Ben Shi
f1f6ca582e [lld][ELF][NFC] Simplify method "Thunk *elf::addThunk()"
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D147124
2023-03-30 11:35:07 +08:00
Simi Pallipurath
2f68ddc604 [lld][ARM][2/3]Big Endian support - Word invariant support
Changes:
 - Adding BE32 big endian Support for Arm.
 - Replace the writele and readle with their endian-aware versions.
 - Adding test cases for the big-endian be32 arm configuration.

     Patch by: Milosz Plichta. This patch merges all the changes from
     this patch https://reviews.llvm.org/D140203 as well.

Reviewed By: peter.smith, MaskRay

Differential Revision: https://reviews.llvm.org/D140202
2023-03-29 10:21:00 +01:00
Fangrui Song
7198c87f42 [ELF][PPC64] Actually implement --no-power10-stubs
When a caller that does not use TOC calls a function, a call stub is needed if
the function may use TOC. --no-power10-stubs avoids PC-relative instructions in
the code sequence.

The --no-power10-stubs=no implementation added in D94627 is wrong.
First, the first instruction incorrectly uses `mflr 0` (instead of `mflr 12`).
Second, for the PLT case, it uses addis+addi with getVA instead of addis+ld with
getGotPltVA.
2023-02-27 16:19:13 -08:00
Fangrui Song
8ce135123e [ELF][PPC64] Merge PPC64R12SetupStub and PPC64PCRelPLTStub. NFC
PPC64PCRelPLTStub (from D83669) duplicates lot of code from
PPC64R12SetupStub. Just merge them.

Note: PPC64R12SetupStub does not correctly handle long branch to a
non-preemptible non-TOC code.
2023-02-27 14:33:18 -08:00
Simi Pallipurath
674f094d85 [lld][ARM][NFCI][1/3]Big Endian support - Removing assumptions
Change:
 - Replacing the memcpy that assume little endian with the endian-aware write.

Shouldn't affect the output for now, just a prerequisite for the next patches.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140201
2023-02-15 11:42:49 +00:00
Ties Stuij
6f9ff1beee [lld][ARM] support position independent thunks for Armv4(T)
- Position independent thunks now work for both Armv4 and Armv4T
- Armv4 arm->arm thunks don't emit a BX anymore, which doesn't exist for the
  arch. This fixes https://github.com/llvm/llvm-project/issues/50764.
- Armv4 and Armv4T both have the same arm->arm behaviour. Which also is
  desirable for the above ticket.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D141272
2023-01-13 11:54:41 +00:00
Ties Stuij
747fc27ee4 [lld][ARM] don't use short thumb thunks if no branch range extension
In ThumbThunk::isCompatibleWith, we check if we can use short thunks if we are
within branch range. However these short thumb thunks will generate b.w
instructions, and these are not available on pre branch range extension
architectures.

On these architectures (v4, v5, and most of v6), we could replace the b.w with a
Thumb b (2) instruction, but that would in an ideal situation only give us an
extra range of 2048 bytes on top of the 4MB range of a BL, if a thunk section
happens to be placed on the outer range of a BL and the stars are aligned. It
doesn't seem worth it.

What would be worth it is a state change to Arm and a subsequent branch to
either Arm or Thumb code. But that's the subject of another patch.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140633
2023-01-09 11:45:49 +00:00
Ties Stuij
62c605771a [lld][ARM] support absolute thunks for Armv4T Thumb and interworking
changes:
- BLX: The Arm architecture versions that support the branch and link
  instruction (BLX), can rewrite BLs in place when a state change from Arm<->Thumb
  is required. Armv4T does not have BLX and so needs thunks for state changes.
- v4T Thumb long branches needed their own thunk. We could have used the v6M
  implementation, but v6M doesn't have Arm state and must resolve to rather
  inefficient stack reshuffling. We also can't reuse v7 thumb thunks as they use
  MOVV/MOVT, which wasn't available yet for v4T.
- Remove the `lack of BLX' warning. LLVM only supports Arm Architecture versions
  upwards of v4, which we now all support in LLD.
- renamed existing thunks to better reflect their use:
  ARMV5ABSLongThunk -> ARMV5LongLdrPcThunk,
  ARMV5PILongThunk -> ARMV4PILongThunk
- removed isCompatibleWith method from ARMV5ABSLongThunk and ARMV5PILongThunk,
  as they were identical to the ARMThunk parent class implementation.

Support for (efficient) position independent thunks for v4T will be added in a
follow-up patch, including possible related thunk renaming and code comment
cleanup.

Reviewed By: MaskRay, peter.smith

Differential Revision: https://reviews.llvm.org/D139888
2022-12-21 11:04:32 +00:00
Fangrui Song
4191fda69c [ELF] Change most llvm::Optional to std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 19:19:15 -08:00