The inaccurate #111945 condition fixes a PROVIDE regression (#111478)
but introduces another regression: in a DSO link, if a symbol referenced
only by bitcode files is defined as PROVIDE_HIDDEN, lld would not set
the visibility correctly, leading to an assertion failure in
DynamicReloc::getSymIndex (https://reviews.llvm.org/D123985).
This is because `(sym->isUsedInRegularObj || sym->exportDynamic)` is
initially false (bitcode undef does not set `isUsedInRegularObj`) then
true (in `addSymbol`, after LTO compilation).
Fix this by making the condition accurate: use a map to track defined
symbols.
Reviewers: smithp35
Reviewed By: smithp35
Pull Request: https://github.com/llvm/llvm-project/pull/112386
When encountering an instruction like `if (p0) r0 = add(r0,##bar@GOT)`,
lld would fail with:
```
ld.lld: error: unrecognized instruction for 16_X type: 0x7400C000
```
This issue was encountered while building libreadline with clang 19.1.0.
Fixes: #111876
Case: `PROVIDE(f1 = bar);` when both `f1` and `bar` are in separate
sections that would be discarded by GC.
Due to `demoteDefined`, `shouldAddProvideSym(f1)` may initially return
false (when Defined) and then return true (been demoted to Undefined).
```
addScriptReferencedSymbolsToSymTable
shouldAddProvideSym(f1): false
// the RHS (bar) is not added to `referencedSymbols` and may be GCed
declareSymbols
shouldAddProvideSym(f1): false
markLive
demoteSymbolsAndComputeIsPreemptible
// demoted f1 to Undefined
processSymbolAssignments
addSymbol
shouldAddProvideSym(f1): true
```
The inconsistency can cause `cmd->expression()` in `addSymbol` to be
evaluated, leading to `symbol not found: bar` errors (since `bar` in the
RHS is not in `referencedSymbols` and is GCed) (#111478).
Fix this by adding a `sym->isUsedInRegularObj` condition, making
`shouldAddProvideSym(f1)` values consistent. In addition, we need a
`sym->exportDynamic` condition to keep provide-shared.s working.
Fixes: ebb326a51fec37b5a47e5702e8ea157cd4f835cd
Pull Request: https://github.com/llvm/llvm-project/pull/111945
The RISC-V psABI states that "The `R_RISCV_PCREL_LO12_I` or
`R_RISCV_PCREL_LO12_S` relocations contain a label pointing to an
instruction in the same section with an `R_RISCV_PCREL_HI20` relocation
entry that points to the target symbol."
Without this patch, GNU ld errors, but LLD does not -- I think because LLD is
doing the right thing, certainly in the testcase provided.
Nonetheless, I think an error is good here to bring LLD in line with
what GNU ld is doing in showing that the object the user provided is not
following the psABI as written.
Fixes#107304
We've noticed that for large builds executing thin-link can take on the
order of 10s of minutes. We are only using a single thread to write the
sharded indices and import files for each input bitcode file. While we
need to ensure the index file produced lists modules in a deterministic
order, that doesn't prevent us from executing the rest of the work in
parallel.
In this change we use a thread pool to execute as much of the backend's
work as possible in parallel. In local testing on a machine with 80
cores, this change makes a thin-link for ~100,000 input files run in ~2
minutes. Without this change it takes upwards of 10 minutes.
---------
Co-authored-by: Nuri Amari <nuriamari@fb.com>
When Branch Target Identification BTI is enabled all indirect branches
must target a BTI instruction. A long branch thunk is a source of
indirect branches. To date LLD has been assuming that the object
producer is responsible for putting a BTI instruction at all places the
linker might generate an indirect branch to. This is true for clang, but
not for GCC. GCC will elide the BTI instruction when it can prove that
there are no indirect branches from outside the translation unit(s). GNU
ld was fixed to generate a landing pad stub (gnu ld speak for thunk) for
the destination when a long range stub was needed [1].
This means that using GCC compiled objects with LLD may lead to LLD
generating an indirect branch to a location without a BTI. The ABI [2]
has also been clarified to say that it is a static linker's
responsibility to generate a landing pad when the target does not have a
BTI.
This patch implements the same mechansim as GNU ld. When the output ELF
file is setting the
GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, then we check the
destination to see if it has a BTI instruction. If it does not we
generate a landing pad consisting of:
BTI c
B <destination>
The B <destination> can be elided if the thunk can be placed so that
control flow drops through. For example:
BTI c
<destination>:
This will be common when -ffunction-sections is used.
The landing pad thunks are effectively alternative entry points for the
function. Direct branches are unaffected but any linker generated
indirect branch needs to use the alternative. We place these as close as
possible to the destination section.
There is some further optimization possible. Consider the case:
.text
fn1
...
fn2
...
If we need landing pad thunks for both fn1 and fn2 we could order them
so that the thunk for fn1 immediately precedes fn1. This could save a
single branch. However I didn't think that would be worth the additional
complexity.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] https://github.com/ARM-software/abi-aa/issues/196
Swap `!DisassembleZeroes` and `if (DumpARMELFData)` conditions so that
in the false DisassembleZeroes case (default), `...` will be printed for
long consecutive zeroes, even when a data mapping symbol is active.
This is especially useful for certain lld tests that insert a huge
padding within a code section. Without `...` the output will be huge.
Pull Request: https://github.com/llvm/llvm-project/pull/109553
Similar to commit 686cff17cc310884e48ae963bf7507f96950cc90 for SHT_REL (#57693).
CREL hasn't been tested with ICF before.
And avoid a pitfall that eqClass[0] might interfere with ICF.
When both CREL and the experimental lld partitions feature are enabled,
the relocation section may look like .crel.llvm_sympart.f1, and
`rels.relas` is empty. While here, support relocation sections with zero
entry.
In AArch64, the endianness of instruction encodings is always little,
whereas the endianness of data swaps between LE and BE modes. So
getImplicitAddend must use the right one of read32() and read32le(), for
data and code respectively. It was using read32() throughout, causing
instructions to be read as big-endian in BE mode, getting the wrong
addend.
Fixed, and updated the existing test to check both endiannesses. The
expected results for data must be byte-swapped, but the ones for code
need no adjustment.
Time trace profiler support was added into LLVMgold in
cd3255abede5e3687c1538f2d3857deb2c51af1b. This patch adds its
`-plugin-opt` counterpart, which is just an alias to `--time-trace=`,
into LLD for compatibility.
This retries #90692 which was reverted previously due to issues with
lld-available being set, even if the copy of lld is not built from
source.
This does not change any code compared to #90692 to address the
lld-available issue.
The main change w.r.t, lld-available is xfailing tests in PR #99056
(until a longer term fix is available).
The start state of a new section is `EMS_None`, often leading to a
$d/$x at offset 0. Introduce a MCTargetOption/cl::opt
"implicit-mapsyms" to allow an alternative behavior
(https://github.com/ARM-software/abi-aa/issues/274):
* Set the start state to `EMS_Data` or `EMS_A64`.
* For text sections, add an ending $x only if the final data is not instructions.
* For non-text sections, add an ending $d only if the final data is not data commands.
```
.section .text.1,"ax"
nop
// emit $d
.long 42
// emit $x
.section .text.2,"ax"
nop
```
This new behavior decreases the .symtab size significantly:
```
% bloaty a64-2/bin/clang -- a64-0/bin/clang
FILE SIZE VM SIZE
-------------- --------------
-5.4% -1.13Mi [ = ] 0 .strtab
-50.9% -4.09Mi [ = ] 0 .symtab
-4.0% -5.22Mi [ = ] 0 TOTAL
```
---
This scheme works as long as the user can rule out some error scenarios:
* .text.1 assembled using the traditional behavior is combined with .text.2 using the new behavior
* A linker script combining non-text sections and text sections. The
lack of mapping symbols in the non-text sections could make them
treated as code, unless the linker inserts extra mapping symbols.
The above mix-and-match scenarios aren't an issue at all for a
significant portion of users.
A text section may start with data commands in rare cases (e.g.
-fsanitize=function) that many users don't care about. When combing
`(.text.0; .word 0)` and `(.text.1; .word 0)`, the ending $x of .text.0
and the initial $d of .text.1 may have the same address. If both
sections reside in the same file, ensure the ending symbol comes before
the initial $d of .text.1, so that a dumb linker respecting the symbol
order will place the ending $x before the initial $d.
Disassemblers using stable sort will see both symbols at the same
address, and the second will win.
When section ordering mechanisms (e.g. --symbol-ordering-file,
--call-graph-profile-sort, `.text : { second.o(.text) first.o(.text) }`)
are involved, the initial data in a text section following a text
section with trailing data could be misidentified as code, but the issue
is local and the risk could be acceptable.
Pull Request: https://github.com/llvm/llvm-project/pull/99718
* Use split-file
* Remove -o /dev/null
* Avoid `{ list; }` compound command not supported by the lit internal shell (#102382)
* Don't test "ld.lld" before "error:" as per convention
Pull Request: https://github.com/llvm/llvm-project/pull/105454
RISC-V GCC with `-fdata-sections` will emit `.sbss.<name>`,
`.srodata.<name>`, and `.sdata.<name>` sections for small data items of
different kinds. Clang/LLVM already emits `.srodata.*` sections, and we
intend to emit the other two section name patterns in #87040.
This change ensures that any input sections starting `.sbss` are
combined into one output section called `.sbss`, and the same
respectively for `.srodata` and `.sdata`. This also allows the existing
RISC-V specific code for determining an output order for `.sbss` and
`.sdata` sections to apply to placing the sections.
so that --noinhibit-exec downgrades the error to a warning, which helps
debugging when `PHDRS` is specified without `PT_TLS`. Also update the
message to make it accurate: STT_TLS may exist in the absence of PT_TLS.
In addition, invoking `exitLld(1)` (through `fatal`) is problematic
(#66974): When a thread is `exitLld(1)`, triggering `llvm_shutdown`,
another thread may be at `relocateAlloc`, accessing `sec.relocs()` which
got destroyed(tampered?), leading to
incorrect `llvm_unreachable("invalid expression")`.
Follow-up to #98115. For EhInputSection, RelocationScanner::scan calls
sortRels, which doesn't support the CREL iterator. We should set
supportsCrel to false to ensure that the initial_location fields in
.eh_frame FDEs are relocated.
Previously, we selected the Thumb2 PLT sequences if any input object is
marked as not supporting the ARM ISA, which then causes assertion
failures when calls from ARM code in other objects are seen. I think the
intention here was to only use Thumb PLTs when the target does not have
the ARM ISA available, signalled by no objects being marked as having it
available. To do that we need to track which ISAs we have seen as we
parse the build attributes, and defer the decision about PLTs until all
input objects have been parsed.
This bug was triggered by real code in picolibc, which have some
versions of string.h functions built with Thumb2-only build attributes,
so that they are compatible with v7-A, v7-R and v7-M.
Fixes#99008.
This allows the input section matching algorithm to be separated from
output section descriptions. This allows a group of sections to be
assigned to multiple output sections, providing an explicit version of
--enable-non-contiguous-regions's spilling that doesn't require altering
global linker script matching behavior with a flag. It also makes the
linker script language more expressive even if spilling is not intended,
since input section matching can be done in a different order than
sections are placed in an output section.
The implementation reuses the backend mechanism provided by
--enable-non-contiguous-regions, so it has roughly similar semantics and
limitations. In particular, sections cannot be spilled into or out of
INSERT, OVERWRITE_SECTIONS, or /DISCARD/. The former two aren't
intrinsic, so it may be possible to relax those restrictions later.
... using the temporary section type code 0x40000020
(`clang -c -Wa,--crel,--allow-experimental-crel`). LLVM will change the
code and break compatibility (Clang and lld of different versions are
not guaranteed to cooperate, unlike other features). CREL with implicit
addends are not supported.
---
Introduce `RelsOrRelas::crels` to iterate over SHT_CREL sections and
update users to check `crels`.
(The decoding performance is critical and error checking is difficult.
Follow `skipLeb` and `R_*LEB128` handling, do not use
`llvm::decodeULEB128`, whichs compiles to a lot of code.)
A few users (e.g. .eh_frame, LLDDwarfObj, s390x) require random access. Pass
`/*supportsCrel=*/false` to `relsOrRelas` to allocate a buffer and
convert CREL to RELA (`relas` instead of `crels` will be used). Since
allocating a buffer increases, the conversion is only performed when
absolutely necessary.
---
Non-alloc SHT_CREL sections may be created in -r and --emit-relocs
links. SHT_CREL and SHT_RELA components need reencoding since
r_offset/r_symidx/r_type/r_addend may change. (r_type may change because
relocations referencing a symbol in a discarded section are converted to
`R_*_NONE`).
* SHT_CREL components: decode with `RelsOrRelas` and re-encode (`OutputSection::finalizeNonAllocCrel`)
* SHT_RELA components: convert to CREL (`relToCrel`). An output section can only have one relocation section.
* SHT_REL components: print an error for now.
SHT_REL to SHT_CREL conversion for -r/--emit-relocs is complex and
unsupported yet.
Link: https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600
Pull Request: https://github.com/llvm/llvm-project/pull/98115
GNU ld since 2.41 supports this option, which is mildly useful. It omits
the section header table and non-ALLOC sections (including
.symtab/.strtab (--strip-all)).
This option is simple to implement and might be used by LLDB to test
program headers parsing without the section header table (#100900).
-z sectionheader, which is the default, is also added.
Pull Request: https://github.com/llvm/llvm-project/pull/101286
If an included script is under the sysroot directory, when it opens an
absolute path file (`INPUT` or `GROUP`), add sysroot before the absolute
path. When the included script ends, the `isUnderSysroot` state is
restored.
Fix#93947: the cycle detection mechanism added by
https://reviews.llvm.org/D37524 also disallowed including a file twice,
which is an unnecessary limitation.
Now that we have an include stack #100493, supporting multiple inclusion
is trivial. Note: a filename can be referenced with many different
paths, e.g. a.lds, ./a.lds, ././a.lds. We don't attempt to detect the
cycle in the earliest point.
`getLineNumber` is both imprecise (when `INCLUDE` is used) and
inefficient (see https://reviews.llvm.org/D104137). Track line number
precisely now that we have `struct Buffer` abstraction from #100493.
After #100493, the idiom `while (!errorCount() && !consume("}"))` could
lead to inaccurate diagnostics or dead loops. Introduce till to change
the code pattern.
The current tokenize-whole-file approach has a few limitations.
* Lack of state information: `maybeSplitExpr` is needed to parse
expressions. It's infeasible to add new states to behave more like GNU
ld.
* `readInclude` may insert tokens in the middle, leading to a time
complexity issue with N-nested `INCLUDE`.
* line/column information for diagnostics are inaccurate, especially
after an `INCLUDE`.
* `getLineNumber` cannot be made more efficient without significant code
complexity and memory consumption. https://reviews.llvm.org/D104137
The patch switches to a traditional lexer that generates tokens lazily.
* `atEOF` behavior is modified: we need to call `peek` to determine EOF.
* `peek` and `next` cannot call `setError` upon `atEOF`.
* Since `consume` no longer reports an error upon `atEOF`, the idiom `while (!errorCount() && !consume(")"))`
would cause a dead loop. Use `while (peek() != ")" && !atEOF()) { ... } expect(")")` instead.
* An include stack is introduced to handle `readInclude`. This can be
utilized to address #93947 properly.
* `tokens` and `pos` are removed.
* `commandString` is reimplemented. Since it is used in -Map output,
`\n` needs to be replaced with space.
Pull Request: https://github.com/llvm/llvm-project/pull/100493