The existing implementation didn't handle when the input text section
was some offset from the output section.
This resulted in an assert in relaxGot() with an lld built with asserts
for some large binaries, or even worse, a silently broken binary with an
lld without asserts.
Close#77201. When linking code with a R_X86_64_TPOFF64 relocation, LLD
exits with an 'unknown reloaction' error message due to two missing
cases in relocation switch statements. This patch adds in those cases so
that LLD successfully links code R_X86_64_TPOFF64 relocations.
For each R_X86_64_(REX_)GOTPCRELX relocation, check that the offset to the symbol is representable with 2^32 signed offset. If not, add a GOT entry for it and set its expr to R_GOT_PC so that we emit the GOT load instead of the relaxed lea. Do this in finalizeAddressDependentContent() where we iteratively attempt this (e.g. RISCV uses this for relaxation, ARM uses this to insert thunks).
Decided not to do the opposite of inserting GOT entries initially and removing them when relaxable because removing GOT entries isn't simple.
One drawback of this approach is that if we see any GOTPCRELX relocation, we'll create an empty .got even if it's not required in the end.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D157020
When the .eh_frame section is placed at a non-zero
offset within its output section, the relocation
value within .eh_frame are computed incorrectly.
We had similar issue in .ARM.exidx section and it has been
fixed already in https://reviews.llvm.org/D148033.
While applying the relocation using S+A-P, the value
of P (the location of the relocation) is getting wrong.
P is:
P = SecAddr + rel.offset, But SecAddr points to the
starting address of the outputsection rather than the
starting address of the eh frame section within that
output section.
This issue affects all targets which generates .eh_frame
section. Hence fixing in all the corresponding targets it affecting.
to prepare for changing `relocations` from a SmallVector to a pointer.
Also change the `isec` parameter in `addAddendOnlyRelocIfNonPreemptible` to `GotSection &`.
The target-specific code (AArch64, PPC64) does not fit into the generic code and
adds virtual function overhead. Move relocateAlloc into ELF/Arch/ instead. This
removes many virtual functions (relaxTls*). In addition, this helps get rid of
getRelocTargetVA dispatch and many RelExpr members in the future.
In many call sites we know uncompression cannot happen (non-SHF_ALLOC, or the
data (even if compressed) must have been uncompressed by a previous pass).
Prefer rawData in these cases. data() increases code size and prevents
optimization on rawData.
to decrease sizeof(SymbolUnion) by 8 on ELF64 platforms.
Symbols needing such information are typically 1% or fewer (5134 out of 560520
when linking clang, 19898 out of 5550705 when linking chrome). Storing them
elsewhere can decrease memory usage and symbol initialization time.
There is a ~0.8% saving on max RSS when linking a large program.
Future direction:
* Move some of dynsymIndex/verdefIndex/versionId to SymbolAux
* Support mixed TLSDESC and TLS GD without increasing sizeof(SymbolUnion)
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D116281
The current TLSDESC optimization code assumes:
```
leaq x@tlsdesc(%rip), %rax
call *x@tlscall(%rax) # adjacent
```
From https://gitlab.freedesktop.org/mesa/mesa/-/issues/5665 , it seems that the
two instructions may not be adjacent in GCC 10's output:
```
leaq x@tlsdesc(%rip), %rax
something else
call *x@tlscall(%rax)
```
This patch supports the case. While here, support non-RAX registers for
R_X86_64_GOTPC32_TLSDESC, in case the compiler generates inefficient:
```
leaq x@tlsdesc(%rip), %rcx # or %rdx, %rbx, %rdi, ...
movq %rcx, %rax
call *x@tlscall(%rax) # GNU ld/gold error for non-RAX
```
Differential Revision: https://reviews.llvm.org/D114416
This brings back the original version of D81359.
I have found several use cases now.
* Unlike GNU ld, LLD's relocation processing is one pass. If we decide to
optimize(relax) R_X86_64_{,REX_}GOTPCRELX, we will suppress GOT generation and
cannot undo the decision later. Optimizing R_X86_64_REX_GOTPCRELX can usually
make it easy to hit `relocation R_X86_64_REX_GOTPCRELX out of range` because
the distance to GOT is usually shorter. Without --no-relax, the user has to
recompile with `-Wa,-mrelax-relocations=no`.
* The option would help during my investigationg of the root cause of https://git.kernel.org/linus/09e43968db40c33a73e9ddbfd937f46d5c334924
* There is need for relaxation for AArch64 & RISC-V. Implementing this for
x86-64 improves consistency with little target-specific cost (two-line
X86_64.cpp change).
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D113615
For a function call (using the default `-fplt`), GCC `-mcmodel=large` generates an assembly modifier which
leads to an R_X86_64_PLTOFF64 relocation. In real world,
http://git.ageinghacker.net/jitter (used by GNU poke) uses `-mcmodel=large`.
R_X86_64_PLTOFF64's formula is (if preemptible) `L - GOT + A` or (if non-preemptible) `S - GOT + A`
where `GOT` is (confusingly) the address of `.got.plt`
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D112386
Most architectures use .got instead of .got.plt, so switching the default can
minimize customization.
This fixes an issue for SPARC V9 which uses .got .
AVR, AMDGPU, and MSP430 don't seem to use _GLOBAL_OFFSET_TABLE_.
I found this missing case with the new --check-dynamic-relocation flag
while running the lld tests with --apply-dynamic-relocs enabled by default.
This also fixes a broken CHECK in lld/test/ELF/x86-64-gotpc-relax.s:
The test wasn't using CHECK-NEXT, so it was passing despite the output
actually containing relocations. I am not sure when this changed, but I
think this behaviour is correct.
Found with D101450 + enabling --apply-dynamic-relocs by default.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101452
D62727 removed GotEntrySize and GotPltEntrySize with a comment that they
are always equal to wordsize(), but that is not entirely true: X32 has a
word size of 4, but needs 8-byte GOT entries. This restores gotEntrySize
for both, adjusted for current naming conventions, but defaults it to
config->wordsize to keep things simple for architectures other than
x86_64.
This partially reverts D62727.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D102509
The scope of R_TLS (TP offset relocation types (TPREL/TPOFF) used for the
local-exec TLS model) is actually narrower than its name may imply. R_TLS_NEG
is only used by Solaris R_386_TLS_LE_32.
Rename them so that they will be less confusing.
Reviewed By: grimar, psmith, rprichard
Differential Revision: https://reviews.llvm.org/D93467
clang may produce `movl x@GOTPCREL+4(%rip), %eax` when loading the high 32 bits
of the address of a global variable in -fpic/-fpie mode.
If assembled by GNU as, the fixup emits an R_X86_64_GOTPCRELX with an
addend != -4. The instruction loads from the GOT entry with an offset
and thus it is incorrect to relax the instruction.
If assembled by the integrated assembler, we emit R_X86_64_GOTPCREL for
relocations that definitely cannot be relaxed (D92114), so this patch is not
needed.
This patch disables the relaxation, which is compatible with the implementation in GNU ld
("Add R_X86_64_[REX_]GOTPCRELX support to gas and ld").
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D91993
With this change, `TargetInfo::adjustRelaxExpr` is only related to TLS
relaxations and a subsequent clean-up can delete the `data` parameter.
Differential Revision: https://reviews.llvm.org/D92079
While MC did not produce R_X86_64_GOTPCRELX for test/binop instructions
(movl/adcl/addl/andl/...) before the previous commit, this code path has been
exercised by -fno-integrated-as for GNU as since 2016: -no-pie relaxing
may incorrectly access loc[-3] and produce a corrupted instruction.
Simply handle test/binop R_X86_64_GOTPCRELX like R_X86_64_GOTPCREL.
This is part of the Propeller framework to do post link code layout
optimizations. Please see the RFC here:
https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the
detailed RFC doc here:
https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf
This patch adds lld support for basic block sections and performs relaxations
after the basic blocks have been reordered.
After the linker has reordered the basic block sections according to the
desired sequence, it runs a relaxation pass to optimize jump instructions.
Currently, the compiler emits the long form of all jump instructions. AMD64 ISA
supports variants of jump instructions with one byte offset or a four byte
offset. The compiler generates jump instructions with R_X86_64 32-bit PC
relative relocations. We would like to use a new relocation type for these jump
instructions as it makes it easy and accurate while relaxing these instructions.
The relaxation pass does two things:
First, it deletes all explicit fall-through direct jump instructions between
adjacent basic blocks. This is done by discarding the tail of the basic block
section.
Second, If there are consecutive jump instructions, it checks if the first
conditional jump can be inverted to convert the second into a fall through and
delete the second.
The jump instructions are relaxed by using jump instruction mods, something
like relocations. These are used to modify the opcode of the jump instruction.
Jump instruction mods contain three values, instruction offset, jump type and
size. While writing this jump instruction out to the final binary, the linker
uses the jump instruction mod to determine the opcode and the size of the
modified jump instruction. These mods are required because the input object
files are memory-mapped without write permissions and directly modifying the
object files requires copying these sections. Copying a large number of basic
block sections significantly bloats memory.
Differential Revision: https://reviews.llvm.org/D68065
Symbol information can be used to improve out-of-range/misalignment diagnostics.
It also helps R_ARM_CALL/R_ARM_THM_CALL which has different behaviors with different symbol types.
There are many (67) relocateOne() call sites used in thunks, {Arm,AArch64}errata, PLT, etc.
Rename them to `relocateNoSym()` to be clearer that there is no symbol information.
Reviewed By: grimar, peter.smith
Differential Revision: https://reviews.llvm.org/D73254
These functions call relocateOne(). This patch is a prerequisite for
making relocateOne() aware of `Symbol` (D73254).
Reviewed By: grimar, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D73250
This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang.
It adds Intel CET (Control-flow Enforcement Technology) support to lld.
The implementation follows the draft version of psABI which you can
download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.
CET introduces a new restriction on indirect jump instructions so that
you can limit the places to which you can jump to using indirect jumps.
In order to use the feature, you need to compile source files with
-fcf-protection=full.
* IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt.
* SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified.
IBT-enabled executables/shared objects have two PLT sections, ".plt" and
".plt.sec". For the details as to why we have two sections, please read
the comments.
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D59780
PltSection is used by both PLT and IPLT. The PLT section may have a
header while the IPLT section does not. Split off IpltSection from
PltSection to be clearer.
Unlike other targets, PPC64 cannot use the same code sequence for PLT
and IPLT. This helps make a future PPC64 patch (D71509) more isolated.
On EM_386 and EM_X86_64, when PLT is empty while IPLT is not, currently
we are inconsistent whether the PLT header is conceptually attached to
in.plt or in.iplt . Consistently attach the header to in.plt can make
the -z retpolineplt logic simpler. It also makes `jmp` point to an
aesthetically better place for non-retpolineplt cases.
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D71519
This change only affects EM_386. relOff can be computed from `index`
easily, so it is unnecessarily passed as a parameter.
Both in.plt and in.iplt entries are written by writePLT. For in.iplt,
the instruction `push reloc_offset` will change because `index` is now
different. Fortunately, this does not matter because `push; jmp` is only
used by PLT. IPLT does not need the code sequence.
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D71518
This makes it clear `ELF/**/*.cpp` files define things in the `lld::elf`
namespace and simplifies `elf::foo` to `foo`.
Reviewed By: atanasyan, grimar, ruiu
Differential Revision: https://reviews.llvm.org/D68323
llvm-svn: 373885
This patch is mechanically generated by clang-llvm-rename tool that I wrote
using Clang Refactoring Engine just for creating this patch. You can see the
source code of the tool at https://reviews.llvm.org/D64123. There's no manual
post-processing; you can generate the same patch by re-running the tool against
lld's code base.
Here is the main discussion thread to change the LLVM coding style:
https://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html
In the discussion thread, I proposed we use lld as a testbed for variable
naming scheme change, and this patch does that.
I chose to rename variables so that they are in camelCase, just because that
is a minimal change to make variables to start with a lowercase letter.
Note to downstream patch maintainers: if you are maintaining a downstream lld
repo, just rebasing ahead of this commit would cause massive merge conflicts
because this patch essentially changes every line in the lld subdirectory. But
there's a remedy.
clang-llvm-rename tool is a batch tool, so you can rename variables in your
downstream repo with the tool. Given that, here is how to rebase your repo to
a commit after the mass renaming:
1. rebase to the commit just before the mass variable renaming,
2. apply the tool to your downstream repo to mass-rename variables locally, and
3. rebase again to the head.
Most changes made by the tool should be identical for a downstream repo and
for the head, so at the step 3, almost all changes should be merged and
disappear. I'd expect that there would be some lines that you need to merge by
hand, but that shouldn't be too many.
Differential Revision: https://reviews.llvm.org/D64121
llvm-svn: 365595
Similar to R_AARCH64_ABS32, R_PPC64_ADDR32 can represent either a signed
value or unsigned value, thus we should use `[-2**(n-1), 2**n)` instead of
`[-2**(n-1), 2**(n-1))` to check overflows.
The issue manifests as a bogus linker error when linking the powerpc64le Linux kernel.
The new behavior is compatible with ld.bfd's complain_overflow_bitfield.
The upper bound of the error message is not correct. Fix it as well.
The changes to R_PPC_ADDR16, R_PPC64_ADDR16, R_X86_64_8 and R_X86_64_16 are similar.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D63690
llvm-svn: 364164
The current rule is loose: `!Sym.IsPreemptible || Expr == R_GOT`.
When the symbol is non-preemptable, this allows absolute relocation
types with smaller numbers of bits, e.g. R_X86_64_{8,16,32}. They are
disallowed by ld.bfd and gold, e.g.
ld.bfd: a.o: relocation R_X86_64_8 against `.text' can not be used when making a shared object; recompile with -fPIC
This patch:
a) Add TargetInfo::SymbolicRel to represent relocation types that resolve to a
symbol value (e.g. R_AARCH_ABS64, R_386_32, R_X86_64_64).
As a side benefit, we currently (ab)use GotRel (R_*_GLOB_DAT) to resolve
GOT slots that are link-time constants. Since we now use Target->SymbolRel
to do the job, we can remove R_*_GLOB_DAT from relocateOne() for all targets.
R_*_GLOB_DAT cannot be used as static relocation types.
b) Change the condition to `!Sym.IsPreemptible && Type != Target->SymbolicRel || Expr == R_GOT`.
Some tests are caught by the improved error checking (ld.bfd/gold also
issue errors on them). Many misuse .long where .quad should be used
instead.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D63121
llvm-svn: 363059
We create several types of synthetic sections for loadable partitions, including:
- The dynamic symbol table. This allows code outside of the loadable partitions
to find entry points with dlsym.
- Creating a dynamic symbol table also requires the creation of several other
synthetic sections for the partition, such as the dynamic table and hash table
sections.
- The partition's ELF header is represented as a synthetic section in the
combined output file, and will be used by llvm-objcopy to extract partitions.
Differential Revision: https://reviews.llvm.org/D62350
llvm-svn: 362819
GotEntrySize and GotPltEntrySize were added in D22288. Later, with
the introduction of wordsize() (then Config->Wordsize), they become
redundant, because there is no target that sets GotEntrySize or
GotPltEntrySize to a number different from Config->Wordsize.
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D62727
llvm-svn: 362220
This handles two initial relocation types R_X86_64_GOTPC32_TLSDESC and
R_X86_64_TLSDESC_CALL, as well as the GD->LE and GD->IE relaxations.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D62513
llvm-svn: 361911
This is based on D54720 by Sean Fertile.
When accessing a global symbol which is not defined in the translation unit,
compilers will generate instructions that load the address from the toc entry.
If the symbol is defined, non-preemptable, and addressable with a 32-bit
signed offset from the toc pointer, the address can be computed
directly. e.g.
addis 3, 2, .LC0@toc@ha # R_PPC64_TOC16_HA
ld 3, .LC0@toc@l(3) # R_PPC64_TOC16_LO_DS, load the address from a .toc entry
ld/lwa 3, 0(3) # load the value from the address
.section .toc,"aw",@progbits
.LC0: .tc var[TC],var
can be relaxed to
addis 3,2,var@toc@ha # this may be relaxed to a nop,
addi 3,3,var@toc@l # then this becomes addi 3,2,var@toc
ld/lwa 3, 0(3) # load the value from the address
We can delete the test ppc64-got-indirect.s as its purpose is covered by
newly added ppc64-toc-relax.s and ppc64-toc-relax-constants.s
Reviewed By: ruiu, sfertile
Differential Revision: https://reviews.llvm.org/D60958
llvm-svn: 360112
Summary:
Fixes PR35242. A simplified reproduce:
thread_local int i; int f() { return i; }
% {g++,clang++} -fPIC -shared -ftls-model=local-dynamic -fuse-ld=lld a.cc
ld.lld: error: can't create dynamic relocation R_X86_64_DTPOFF32 against symbol: i in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output
In isStaticLinkTimeConstant(), Syn.IsPreemptible is true, so it is not
seen as a constant. The error is then issued in processRelocAux().
A symbol of the local-dynamic TLS model cannot be preempted but it can
preempt symbols of the global-dynamic TLS model in other DSOs.
So it makes some sense that the variable is not static.
This patch fixes the linking error by changing getRelExpr() on
R_386_TLS_LDO_32 and R_X86_64_DTPOFF{32,64} from R_ABS to R_DTPREL.
R_PPC64_DTPREL_* and R_MIPS_TLS_DTPREL_* need similar fixes, but they are not handled in this patch.
As a bonus, we use `if (Expr == R_ABS && !Config->Shared)` to find
ld-to-le opportunities. R_ABS is overloaded here for such STT_TLS symbols.
A dedicated R_DTPREL is clearer.
Differential Revision: https://reviews.llvm.org/D60945
llvm-svn: 358870