Ctx was introduced in March 2022 as a more suitable place for such
singletons.
We now use default-initialization for `LinkerScript` and should pay
attention to non-class types (e.g. `dot` is initialized by commit
503907dc505db1e439e7061113bf84dd105f2e35).
This allows the input section matching algorithm to be separated from
output section descriptions. This allows a group of sections to be
assigned to multiple output sections, providing an explicit version of
--enable-non-contiguous-regions's spilling that doesn't require altering
global linker script matching behavior with a flag. It also makes the
linker script language more expressive even if spilling is not intended,
since input section matching can be done in a different order than
sections are placed in an output section.
The implementation reuses the backend mechanism provided by
--enable-non-contiguous-regions, so it has roughly similar semantics and
limitations. In particular, sections cannot be spilled into or out of
INSERT, OVERWRITE_SECTIONS, or /DISCARD/. The former two aren't
intrinsic, so it may be possible to relax those restrictions later.
If an included script is under the sysroot directory, when it opens an
absolute path file (`INPUT` or `GROUP`), add sysroot before the absolute
path. When the included script ends, the `isUnderSysroot` state is
restored.
Fix#93947: the cycle detection mechanism added by
https://reviews.llvm.org/D37524 also disallowed including a file twice,
which is an unnecessary limitation.
Now that we have an include stack #100493, supporting multiple inclusion
is trivial. Note: a filename can be referenced with many different
paths, e.g. a.lds, ./a.lds, ././a.lds. We don't attempt to detect the
cycle in the earliest point.
After #100493, the idiom `while (!errorCount() && !consume("}"))` could
lead to inaccurate diagnostics or dead loops. Introduce till to change
the code pattern.
The current tokenize-whole-file approach has a few limitations.
* Lack of state information: `maybeSplitExpr` is needed to parse
expressions. It's infeasible to add new states to behave more like GNU
ld.
* `readInclude` may insert tokens in the middle, leading to a time
complexity issue with N-nested `INCLUDE`.
* line/column information for diagnostics are inaccurate, especially
after an `INCLUDE`.
* `getLineNumber` cannot be made more efficient without significant code
complexity and memory consumption. https://reviews.llvm.org/D104137
The patch switches to a traditional lexer that generates tokens lazily.
* `atEOF` behavior is modified: we need to call `peek` to determine EOF.
* `peek` and `next` cannot call `setError` upon `atEOF`.
* Since `consume` no longer reports an error upon `atEOF`, the idiom `while (!errorCount() && !consume(")"))`
would cause a dead loop. Use `while (peek() != ")" && !atEOF()) { ... } expect(")")` instead.
* An include stack is introduced to handle `readInclude`. This can be
utilized to address #93947 properly.
* `tokens` and `pos` are removed.
* `commandString` is reimplemented. Since it is used in -Map output,
`\n` needs to be replaced with space.
Pull Request: https://github.com/llvm/llvm-project/pull/100493
Support `preinit_array . (TYPE=SHT_PREINIT_ARRAY) : { QUAD(16) }`
Follow-up to https://reviews.llvm.org/D118840
peek2() could be eliminated by a future change.
Implement the two commands described by
https://sourceware.org/binutils/docs/ld/Miscellaneous-Commands.html
After `outputSections` is available, check each output section described
by at least one `NOCROSSREFS`/`NOCROSSERFS_TO` command. For each checked
output section, scan relocations from its input sections.
This step is slow, therefore utilize `parallelForEach(isd->sections, ...)`.
To support non SHF_ALLOC sections, `InputSectionBase::relocations`
(empty) cannot be used. In addition, we may explore eliminating this
member to speed up relocation scanning.
Some parse code is adapted from #95714.
Close#41825
Pull Request: https://github.com/llvm/llvm-project/pull/98773
This patch improves GNU ld compatibility.
Close#87891: Support `OUTPUT_FORMAT(binary)`, which is like
--oformat=binary. --oformat=binary takes precedence over an ELF
`OUTPUT_FORMAT`.
In addition, if more than one OUTPUT_FORMAT command is specified, only
check the first one.
Pull Request: https://github.com/llvm/llvm-project/pull/98837
- Add support for `.openbsd.mutable`
(rebaser's note) adapted from:
bd249b5664
New auto-coalescing sections removed
In the linkers, collect objects in section "openbsd.mutable" and place
them into a page-aligned region in the bss, with the right markers for
kernel/ld.so to identify the region and skip making it immutable. While
here, fix readelf/objdump versions to show all of this. ok miod kettenis
- Add support for `.openbsd.syscalls`
(rebaser's note) adapted from:
42a61acefa
Collect .openbsd.syscalls sections into a new PT_OPENBSD_SYSCALLS
segment. This will be used soon to pin system calls to designated call
sites.
ok deraadt@
- Scope OpenBSD special section handling under that ELFOSABI
As a preexisting comment in `ELF/Writer.cpp` says:
> section names shouldn't be significant in ELF in spirit.
so scoping OSABI-specific magic name hacks to just the OSABI in
question limits the degree to which we deviate from that "spirit" for
all other OSABIs.
OpenBSD in particular is very fast moving, having added a number of
special sections, etc. in recent years. It is unclear how possible /
reasonable it is for upstream to implement all these features in any
event, but scoping like this at least mitigates the fallout for other
OSABIs systems which wish to be more slow-moving.
Co-authored-by: deraadt <deraadt@openbsd.org>
Since `assignAddresses` is executed more than once, error reporting
during `assignAddresses` would be duplicated. Generalize #66854 to cover
more errors.
Note: address-related errors exposed in one invocation might not be
errors in another invocation.
Pull Request: https://github.com/llvm/llvm-project/pull/96361
Previously, linker was unnecessarily including a PROVIDE symbol which
was referenced by another unused PROVIDE symbol. For example, if a
linker script contained the below code and 'not_used_sym' provide symbol
is not included, then linker was still unnecessarily including 'foo' PROVIDE
symbol because it was referenced by 'not_used_sym'. This commit fixes
this behavior.
PROVIDE(not_used_sym = foo)
PROVIDE(foo = 0x1000)
This commit fixes this behavior by using dfs-like algorithm to find
all the symbols referenced in provide expressions of included provide
symbols.
This commit also fixes the issue of unused section not being garbage-collected
if a symbol of the section is referenced by an unused PROVIDE symbol.
Closes#74771Closes#84730
Co-authored-by: Fangrui Song <i@maskray.me>
The lexer is overly permissive. When parsing file patterns in an input
section description and there is a missing `)`, we would accept many
non-sensible tokens (e.g. `}`) as patterns, leading to confusion, e.g.
`*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.text*)) } PROVIDE_HIDDEN(__code_end = .)`
(#81804).
Ideally, the lexer should be stateful to report more errors like GNU ld
and get rid of hacks like `ScriptLexer::maybeSplitExpr`, but that would
require a large rewrite of the lexer. For now, just reject certain
non-wildcard meta characters to detect common mistakes.
Pull Request: https://github.com/llvm/llvm-project/pull/84130
This patch adds full support for linking SystemZ (ELF s390x) object
files. Support should be generally complete:
- All relocation types are supported.
- Full shared library support (DYNAMIC, GOT, PLT, ifunc).
- Relaxation of TLS and GOT relocations where appropriate.
- Platform-specific test cases.
In addition to new platform code and the obvious changes, there were a
few additional changes to common code:
- Add three new RelExpr members (R_GOTPLT_OFF, R_GOTPLT_PC, and
R_PLT_GOTREL) needed to support certain s390x relocations. I chose not
to use a platform-specific name since nothing in the definition of these
relocs is actually platform-specific; it is well possible that other
platforms will need the same.
- A couple of tweaks to TLS relocation handling, as the particular
semantics of the s390x versions differ slightly. See comments in the
code.
This was tested by building and testing >1500 Fedora packages, with only
a handful of failures; as these also have issues when building with LLD
on other architectures, they seem unrelated.
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@redhat.com>
Based on https://reviews.llvm.org/D45375 . Introduce a new InputFile
kind `InternalKind`, use it for
* `ctx.internalFile`: for linker-defined symbols and some synthesized
`Undefined`
* `createInternalFile`: for symbol assignments and --defsym
I picked "internal" instead of "synthetic" to avoid confusion with
SyntheticSection.
Currently a symbol's file is one of: nullptr, ObjKind, SharedKind,
BitcodeKind, BinaryKind. Now it's non-null (I plan to add an
`assert(file)` to Symbol::Symbol and change `toString(const InputFile
*)`
separately).
Debugging and error reporting gets improved. The immediate user-facing
difference is more descriptive "File" column in the --cref output. This
patch may unlock further simplification.
Currently each symbol assignment gets its own
`createInternalFile(cmd->location)`. Two symbol assignments in a linker
script do not share the same file. Making the file the same would be
nice, but would require non trivial code.
https://reviews.llvm.org/D44780 implemented rudimentary support for
OVERLAY. The start address and `AT(ldaddr)` in `OVERLAY [start] :
[NOCROSSREFS] [AT ( ldaddr )]` are not optional.
In addition, there are two issues:
* When the start address is `.`, subsequent sections don't share the
address of the first overlay section.
* When the first overlay section is empty and discardable, `p_paddr` is
incorrectly zero. This is because a discarded section has a zero
address, causing `prev->getLMA() + prev->size` where `prev` refers to
the first section to evaluate to zero.
This patch supports optional start address and LMA and fix the issues.
Close#77265
Pull Request: https://github.com/llvm/llvm-project/pull/77272
Close#57618: currently we align the end of PT_GNU_RELRO to a
common-page-size
boundary, but do not align the end of the associated PT_LOAD. This is
benign
when runtime_page_size >= common-page-size.
However, when runtime_page_size < common-page-size, it is possible that
`alignUp(end(PT_LOAD), page_size) < alignDown(end(PT_GNU_RELRO),
page_size)`.
In this case, rtld's mprotect call for PT_GNU_RELRO will apply to
unmapped
regions and lead to an error, e.g.
```
error while loading shared libraries: cannot apply additional memory protection after relocation: Cannot allocate memory
```
To fix the issue, add a padding section .relro_padding like mold, which
is contained in the PT_GNU_RELRO segment and the associated PT_LOAD
segment. The section also prevents strip from corrupting PT_LOAD program
headers.
.relro_padding has the largest `sortRank` among RELRO sections.
Therefore, it is naturally placed at the end of `PT_GNU_RELRO` segment
in the absence of `PHDRS`/`SECTIONS` commands.
In the presence of `SECTIONS` commands, we place .relro_padding
immediately before a symbol assignment using DATA_SEGMENT_RELRO_END (see
also https://reviews.llvm.org/D124656), if present.
DATA_SEGMENT_RELRO_END is changed to align to max-page-size instead of
common-page-size.
Some edge cases worth mentioning:
* ppc64-toc-addis-nop.s: when PHDRS is present, do not append
.relro_padding
* avoid-empty-program-headers.s: when the only RELRO section is .tbss,
it is not part of PT_LOAD segment, therefore we do not append
.relro_padding.
---
Close#65002: GNU ld from 2.39 onwards aligns the end of PT_GNU_RELRO to
a
max-page-size boundary (https://sourceware.org/PR28824) so that the last
page is
protected even if runtime_page_size > common-page-size.
In my opinion, losing protection for the last page when the runtime page
size is
larger than common-page-size is not really an issue. Double mapping a
page of up
to max-common-page for the protection could cause undesired VM waste.
Internally
we had users complaining about 2MiB max-page-size applying to shared
objects.
Therefore, the end of .relro_padding is padded to a common-page-size
boundary. Users who are really anxious can set common-page-size to match
their runtime page size.
---
17 tests need updating as there are lots of change detectors.
Fix#64600: the currently implementation is minimal (see
https://reviews.llvm.org/D83758), and an assignment like
`__TEXT_REGION_ORIGIN__ = DEFINED(__TEXT_REGION_ORIGIN__) ? __TEXT_REGION_ORIGIN__ : 0;`
(used by avr-ld[1]) leads to a value of zero (default value in `declareSymbol`),
which is unexpected.
Assign orders to symbol assignments and references so that
for a script-defined symbol, the `DEFINED` results match users'
expectation. I am unclear about GNU ld's exact behavior, but this hopefully
matches its behavior in the majority of cases.
[1]: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/scripttempl/avr.sc
This adds support for the LoongArch ELF psABI v2.00 [1] relocation
model to LLD. The deprecated stack-machine-based psABI v1 relocs are not
supported.
The code is tested by successfully bootstrapping a Gentoo/LoongArch
stage3, complete with common GNU userland tools and both the LLVM and
GNU toolchains (GNU toolchain is present only for building glibc,
LLVM+Clang+LLD are used for the rest). Large programs like QEMU are
tested to work as well.
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
Reviewed By: MaskRay, SixWeining
Differential Revision: https://reviews.llvm.org/D138135
Section names used in ELF linker scripts can be quoted, but such
quotes must not be propagated to the binary ELF section names. As
such strip the quotes from the section names when processing them, and
also strip them from linker script functions that take section names
as parameters.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D124266
The left/right shift linker script operators may trigger UB.
E.g. in linkerscript/end-overflow-check.test, the initial REGION1__PADDED_SR_SHIFT is
uint64_t(-3), cause the following expression to trigger an out-of-range shift in
a ubsan build of lld.
REGION1__PADDED_SR_SIZE = MAX(1 << REGION1__PADDED_SR_SHIFT, 32);
Protect such UBs by making RHS less than 64.
This patch migrates uses of StringRef::{starts,ends}with_insensitive
to StringRef::{starts,ends}_with_insensitive so that we can use names
similar to those used in std::string_view.
Note that the llvm/ directory has migrated in commit
6c3ea866e93003e16fc55d3b5cedd3bc371d1fde.
I'll post a separate patch to deprecate
StringRef::{starts,ends}with_insensitive.
Differential Revision: https://reviews.llvm.org/D150506
In D72756 the change to add INPUT_SECTION_FLAGS inadvertantly
removed the line to parse the program header assignment information for
OutputSections within an OVERLAY.
This change adds back the missing line and adds a test for it.
Differential Revision: https://reviews.llvm.org/D150445
Changes:
- Adding BE32 big endian Support for Arm.
- Replace the writele and readle with their endian-aware versions.
- Adding test cases for the big-endian be32 arm configuration.
Patch by: Milosz Plichta. This patch merges all the changes from
this patch https://reviews.llvm.org/D140203 as well.
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D140202
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716