270 Commits

Author SHA1 Message Date
Fangrui Song
b4feb26606 [ELF] Move target to Ctx. NFC
Ctx was introduced in March 2022 as a more suitable place for such
singletons.

Follow-up to driver (2022-10) and script (2024-08).
2024-08-21 23:53:36 -07:00
Fangrui Song
4629aa1797 [ELF] Move script into Ctx. NFC
Ctx was introduced in March 2022 as a more suitable place for such
singletons.

We now use default-initialization for `LinkerScript` and should pay
attention to non-class types (e.g. `dot` is initialized by commit
503907dc505db1e439e7061113bf84dd105f2e35).
2024-08-21 21:23:28 -07:00
Daniel Thornburgh
7e8a9020b1 [LLD] Add CLASS syntax to SECTIONS (#95323)
This allows the input section matching algorithm to be separated from
output section descriptions. A group of sections can thus be assigned to
multiple output sections, providing an explicit version of the spilling
done by --enable-non-contiguous-regions that doesn't require altering
global linker script matching behavior with a flag. It also makes the
linker script language more expressive even if spilling is not intended,
since input section matching can be done in a different order than
sections are placed in an output section.

The implementation reuses the backend mechanism provided by
--enable-non-contiguous-regions, so it has roughly similar semantics and
limitations. In particular, sections cannot be spilled into or out of
INSERT, OVERWRITE_SECTIONS, or /DISCARD/. The former two aren't
intrinsic, so it may be possible to relax those restrictions later.
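
A minimal C++ sketch of the spilling idea, under simplified assumptions: the input sections of one class are already matched (in matching order), and each output section that references the class has a fixed capacity. Each section goes to the first output section with room (first fit) and spills to later ones otherwise. `Section`, `OutputDesc` and `assignClass` are hypothetical names, not lld's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified input section.
struct Section {
  std::string name;
  uint64_t size;
};

// Hypothetical output section that references the class; `capacity` stands in
// for whatever limits it (e.g. the space left in its memory region).
struct OutputDesc {
  std::string name;
  uint64_t capacity;
  std::vector<Section> placed;
};

// Place the members of one class, already matched and in matching order, into
// the output sections that reference it. Each section goes to the first output
// section with enough room; if it does not fit, it spills to the next one.
// Returns the sections that fit nowhere.
std::vector<Section> assignClass(const std::vector<Section> &cls,
                                 std::vector<OutputDesc> &outs) {
  std::vector<Section> unplaced;
  for (const Section &s : cls) {
    bool done = false;
    for (OutputDesc &o : outs) {
      uint64_t used = 0; // recomputed for clarity, not speed
      for (const Section &p : o.placed)
        used += p.size;
      if (used + s.size <= o.capacity) {
        o.placed.push_back(s);
        done = true;
        break;
      }
    }
    if (!done)
      unplaced.push_back(s);
  }
  return unplaced;
}
```

Because matching happens once, before placement, the order in which sections are matched no longer has to follow the order of the output section descriptions.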
2024-08-05 13:06:45 -07:00
Fangrui Song
ff7f97a819 [ELF] --defsym: support quoted LHS
Also move `=` splitting from Driver.cpp to ScriptParser.cpp.
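
A sketch of how a `--defsym` argument with a quoted LHS might be split, assuming a quoted name runs to the next `"` and must be followed by `=`, while an unquoted name ends at the first `=`. `splitDefsym` is a hypothetical helper; it ignores escape sequences and is not the ScriptParser.cpp code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Split "name=expr" into {name, expr}. A quoted name ("a=b"=1) may contain '='.
// Returns std::nullopt on malformed input. Hypothetical helper, not lld's parser.
std::optional<std::pair<std::string, std::string>>
splitDefsym(const std::string &arg) {
  std::string name;
  std::size_t eq;
  if (!arg.empty() && arg[0] == '"') {
    std::size_t close = arg.find('"', 1);
    if (close == std::string::npos)
      return std::nullopt; // unclosed quote
    name = arg.substr(1, close - 1);
    eq = close + 1;
    if (eq >= arg.size() || arg[eq] != '=')
      return std::nullopt; // '=' must directly follow the closing quote
  } else {
    eq = arg.find('=');
    if (eq == std::string::npos)
      return std::nullopt;
    name = arg.substr(0, eq);
  }
  if (name.empty())
    return std::nullopt;
  return std::make_pair(name, arg.substr(eq + 1));
}
```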
2024-07-28 12:38:10 -07:00
Fangrui Song
a7e8bddfc1 [ELF] Respect --sysroot for INCLUDE
If an included script is under the sysroot directory, when it opens an
absolute path file (`INPUT` or `GROUP`), add sysroot before the absolute
path. When the included script ends, the `isUnderSysroot` state is
restored.
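
A minimal sketch of the include-stack bookkeeping described above, assuming each stack frame records whether its script lives under the sysroot. `ScriptFrame` and `ScriptState` are hypothetical names, and the sysroot test is a naive string-prefix check (it would also match `/srv` for a sysroot of `/sr`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical frame for one open linker script.
struct ScriptFrame {
  std::string path;
  bool isUnderSysroot;
};

struct ScriptState {
  std::string sysroot;
  std::vector<ScriptFrame> stack; // include stack; back() is the current script

  // Naive prefix check, for illustration only.
  bool underSysroot(const std::string &path) const {
    return !sysroot.empty() && path.compare(0, sysroot.size(), sysroot) == 0;
  }
  // INCLUDE pushes a frame whose flag comes from the included file's path.
  void enterInclude(const std::string &path) {
    stack.push_back({path, underSysroot(path)});
  }
  // When the included script ends, popping restores the includer's flag.
  void leaveInclude() { stack.pop_back(); }

  // Resolve an absolute INPUT/GROUP path opened from the current script.
  std::string resolveInput(const std::string &path) const {
    if (!path.empty() && path[0] == '/' && !stack.empty() &&
        stack.back().isUnderSysroot)
      return sysroot + path;
    return path;
  }
};
```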
2024-07-28 11:43:27 -07:00
Fangrui Song
a4921f10e0 [ELF] Output section phdr: support quoted names 2024-07-27 17:40:51 -07:00
Fangrui Song
9c16a4a2dc [ELF] INSERT [AFTER|BEFORE]: support quoted names 2024-07-27 17:34:37 -07:00
Fangrui Song
8f72b0cb08 [ELF] Fix INCLUDE cycle detection
Fix #93947: the cycle detection mechanism added by
https://reviews.llvm.org/D37524 also disallowed including a file twice,
which is an unnecessary limitation.

Now that we have an include stack (#100493), supporting multiple inclusion
is trivial. Note: a file can be referenced via many different paths,
e.g. a.lds, ./a.lds, ././a.lds. We don't attempt to detect such a cycle
at the earliest point.
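
A sketch of the relaxed rule, assuming files are identified by the path as written: an inclusion fails only if the file is still open on the include stack, so including the same file twice in sequence is fine. `IncludeStack` is a hypothetical class. Because paths are compared as spelled, `a.lds` and `./a.lds` are distinct, which mirrors the note that a cycle is not necessarily caught at the earliest point.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical include-stack check: a file may be included many times, but
// not while it is still being read (that would be a cycle).
class IncludeStack {
  std::vector<std::string> active;

public:
  // Returns false (an error) if `path` is already open on the stack.
  bool push(const std::string &path) {
    if (std::find(active.begin(), active.end(), path) != active.end())
      return false;
    active.push_back(path);
    return true;
  }
  // Called when an included file has been fully read.
  void pop() { active.pop_back(); }
};
```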
2024-07-27 17:25:13 -07:00
Fangrui Song
dbd65a07f2 [ELF] OUTPUT_ARCH: report unclosed error 2024-07-27 16:52:47 -07:00
Fangrui Song
74f843d05f [ELF] Replace unquote(next()) with readName. NFC 2024-07-27 16:47:18 -07:00
Fangrui Song
0d8bc10acb [ELF] Memory region: support quoted names 2024-07-27 16:39:15 -07:00
Fangrui Song
e689515491 [ELF] OVERLAY: support quoted output section names 2024-07-27 16:33:18 -07:00
Fangrui Song
74ef53a01a [ELF] REGION_ALIAS: support quoted names 2024-07-27 16:29:43 -07:00
Fangrui Song
c89566f317 [ELF] Replace unquote(next()) with readName. NFC 2024-07-27 16:27:05 -07:00
Fangrui Song
30ec2bf58d [ELF] PROVIDE: allow quoted names to be discarded
Extend commit ebb326a51fec37b5a47e5702e8ea157cd4f835cd for (#74771) to
support quoted names, e.g. `PROVIDE("f1" = f2 + f3);`.
2024-07-27 16:19:57 -07:00
Fangrui Song
edcc60e403 [ELF] Simplify readAssignment
After #100493, the `=` support from
fe0de25b2195b66d1ebac5d3ebdb18f9e1e776da can be simplified.
2024-07-27 16:04:38 -07:00
Hongyu Chen
f1a7d146e0 [ELF] Updated some while conditions with till (#100893)
This change is based on commit b32c38ab5b for cleaner API usage.
Thanks to @MaskRay!
2024-07-27 14:16:12 -07:00
Fangrui Song
b32c38ab5b [ELF] Replace some while (peek() != ")" && !atEOF()) with till 2024-07-26 17:25:23 -07:00
Fangrui Song
10bb296dfc [ELF] Replace some while (peek() != ")" && !atEOF()) with till 2024-07-26 17:19:04 -07:00
Fangrui Song
2a89356d64 [ELF] Add till and rewrite while (... consume("}"))
After #100493, the idiom `while (!errorCount() && !consume("}"))` could
lead to inaccurate diagnostics or dead loops. Introduce till to change
the code pattern.
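
A sketch of what a `till` helper can look like over a token vector. `Parser` and its members are illustrative, not lld's ScriptLexer: the helper ends the loop either when the closing token is consumed or, recording an error, at EOF, so a missing `}` cannot cause an endless loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token stream; peek() and next() return "" at EOF.
struct Parser {
  std::vector<std::string> toks;
  std::size_t pos = 0;
  int errors = 0;

  bool atEOF() const { return pos >= toks.size(); }
  std::string peek() const { return atEOF() ? "" : toks[pos]; }
  std::string next() { return atEOF() ? "" : toks[pos++]; }

  // Loop condition: consumes `tok` and returns false when it is next; at EOF
  // it records an error (e.g. "unexpected EOF") and also returns false.
  bool till(const std::string &tok) {
    if (peek() == tok) {
      ++pos;
      return false;
    }
    if (atEOF()) {
      ++errors;
      return false;
    }
    return true;
  }
};
```

A loop such as `while (p.till("}")) p.next();` then consumes tokens up to and including the `}`, and terminates even when the `}` is missing.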
2024-07-26 17:13:37 -07:00
Fangrui Song
1978c21d96 [ELF] ScriptLexer: generate tokens lazily
The current tokenize-whole-file approach has a few limitations.

* Lack of state information: `maybeSplitExpr` is needed to parse
  expressions. It's infeasible to add new states to behave more like GNU
  ld.
* `readInclude` may insert tokens in the middle, leading to a time
  complexity issue with N-nested `INCLUDE`.
* Line/column information for diagnostics is inaccurate, especially
  after an `INCLUDE`.
* `getLineNumber` cannot be made more efficient without significant code
  complexity and memory consumption. https://reviews.llvm.org/D104137

The patch switches to a traditional lexer that generates tokens lazily.

* `atEOF` behavior is modified: we need to call `peek` to determine EOF.
* `peek` and `next` cannot call `setError` upon `atEOF`.
* Since `consume` no longer reports an error upon `atEOF`, the idiom `while (!errorCount() && !consume(")"))`
  would cause a dead loop. Use `while (peek() != ")" && !atEOF()) { ... } expect(")")` instead.
* An include stack is introduced to handle `readInclude`. This can be
  utilized to address #93947 properly.
* `tokens` and `pos` are removed.
* `commandString` is reimplemented. Since it is used in -Map output,
  `\n` needs to be replaced with space.
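
A compact sketch of a lazy lexer with an include stack, under the simplifying assumption that tokens are whitespace-separated words plus the single characters `{}();=`. `LazyLexer`, `lex` and `include` are hypothetical names; only `peek`, `next` and `atEOF` mirror the operations mentioned above. `INCLUDE` pushes a new buffer instead of splicing tokens into an array, and EOF is discovered through `peek`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazy lexer (not lld's ScriptLexer). Tokens are produced on
// demand from a stack of buffers; the top buffer is the innermost INCLUDE.
class LazyLexer {
  struct Buffer {
    std::string text;
    std::size_t pos = 0;
  };
  std::vector<Buffer> buffers;
  std::string cur;     // token returned by the last peek()
  bool peeked = false; // whether `cur` is valid

  std::string lex() {
    static const std::string punct = "{}();=";
    while (!buffers.empty()) {
      Buffer &b = buffers.back();
      while (b.pos < b.text.size() &&
             std::isspace(static_cast<unsigned char>(b.text[b.pos])))
        ++b.pos;
      if (b.pos == b.text.size()) {
        buffers.pop_back(); // included file ended: resume the includer
        continue;
      }
      std::size_t start = b.pos++;
      if (punct.find(b.text[start]) == std::string::npos)
        while (b.pos < b.text.size() &&
               !std::isspace(static_cast<unsigned char>(b.text[b.pos])) &&
               punct.find(b.text[b.pos]) == std::string::npos)
          ++b.pos;
      return b.text.substr(start, b.pos - start);
    }
    return ""; // an empty token signals EOF
  }

public:
  explicit LazyLexer(std::string text) { buffers.push_back({std::move(text)}); }

  const std::string &peek() {
    if (!peeked) {
      cur = lex();
      peeked = true;
    }
    return cur;
  }
  std::string next() {
    std::string tok = peek();
    peeked = false;
    return tok;
  }
  // EOF is only known after lexing ahead, so atEOF() goes through peek().
  bool atEOF() { return peek().empty(); }

  // Start reading an INCLUDEd file: the new buffer becomes the current one.
  void include(std::string text) {
    assert(!peeked && "call include() right after consuming the file name");
    buffers.push_back({std::move(text)});
  }
};
```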

Pull Request: https://github.com/llvm/llvm-project/pull/100493
2024-07-26 14:26:38 -07:00
Hongyu Chen
2ae862b74b [ELF] Remove consumeLabel in ScriptLexer (#99567)
This commit removes `consumeLabel` since the `consume` function provides
the same functionality.
2024-07-23 22:03:46 -07:00
Hongyu Chen
b828c13f3c [ELF] Delete peek2 in Lexer (#99790)
Thanks to Fangrui's change 28045ceab0, peek2 can be removed.
2024-07-20 16:35:38 -07:00
Fangrui Song
efa833dd0f [ELF] Simplify readExpr. NFC 2024-07-20 14:36:55 -07:00
Fangrui Song
28045ceab0 [ELF] Support (TYPE=<value>) beside output section address
Support `preinit_array . (TYPE=SHT_PREINIT_ARRAY) : { QUAD(16) }`

Follow-up to https://reviews.llvm.org/D118840

peek2() could be eliminated by a future change.
2024-07-20 14:13:02 -07:00
Fangrui Song
0778f5c1f1 [ELF] Support NOCROSSREFS and NOCROSSREFS_TO
Implement the two commands described by
https://sourceware.org/binutils/docs/ld/Miscellaneous-Commands.html

After `outputSections` is available, check each output section described
by at least one `NOCROSSREFS`/`NOCROSSREFS_TO` command. For each checked
output section, scan relocations from its input sections.
This step is slow, so it uses `parallelForEach(isd->sections, ...)`.

To support non-SHF_ALLOC sections, `InputSectionBase::relocations`
(which is empty for them) cannot be used. In addition, we may explore
eliminating this member to speed up relocation scanning.
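
A sequential sketch of the check, assuming each relocation has already been reduced to a (from output section, to output section) pair; the commit itself scans the input sections' relocations in parallel. `Ref`, `NoCrossRefs` and `findViolations` are hypothetical names. Per the GNU ld manual, `NOCROSSREFS` forbids references among all listed sections, while `NOCROSSREFS_TO(to from...)` forbids references from the other listed sections to the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical relocation summary: a reference between two output sections.
struct Ref {
  std::string from; // output section containing the relocation
  std::string to;   // output section containing the referenced symbol
};

// One NOCROSSREFS (toFirst == false) or NOCROSSREFS_TO (toFirst == true) command.
struct NoCrossRefs {
  std::vector<std::string> sections;
  bool toFirst;
};

static bool containsFrom(const std::vector<std::string> &v,
                         const std::string &s, std::size_t begin) {
  for (std::size_t i = begin; i < v.size(); ++i)
    if (v[i] == s)
      return true;
  return false;
}

std::vector<Ref> findViolations(const NoCrossRefs &cmd,
                                const std::vector<Ref> &refs) {
  std::vector<Ref> bad;
  for (const Ref &r : refs) {
    if (r.from == r.to)
      continue; // references within one output section are allowed
    bool violates =
        cmd.toFirst
            ? (!cmd.sections.empty() && r.to == cmd.sections[0] &&
               containsFrom(cmd.sections, r.from, 1))
            : (containsFrom(cmd.sections, r.from, 0) &&
               containsFrom(cmd.sections, r.to, 0));
    if (violates)
      bad.push_back(r);
  }
  return bad;
}
```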

Some parse code is adapted from #95714.

Close #41825

Pull Request: https://github.com/llvm/llvm-project/pull/98773
2024-07-17 10:45:59 -07:00
Brian Cain
9078036685 [lld] Add emulation support for hexagon (#98857) 2024-07-16 15:01:27 -05:00
Fangrui Song
6464dd21b5 [ELF] OUTPUT_FORMAT: support "binary" and ignore extra OUTPUT_FORMAT commands
This patch improves GNU ld compatibility.

Close #87891: Support `OUTPUT_FORMAT(binary)`, which is like
--oformat=binary. --oformat=binary takes precedence over an ELF
`OUTPUT_FORMAT`.

In addition, if more than one OUTPUT_FORMAT command is specified, only
check the first one.

Pull Request: https://github.com/llvm/llvm-project/pull/98837
2024-07-16 10:28:09 -07:00
John Ericson
d7fd8b19e5 [LLD] Extend special OpenBSD support, but scope under ELFOSABI (#97122)
- Add support for `.openbsd.mutable`

  (rebaser's note) adapted from:

  bd249b5664
  New auto-coalescing sections removed

  In the linkers, collect objects in section "openbsd.mutable" and place
  them into a page-aligned region in the bss, with the right markers for
  kernel/ld.so to identify the region and skip making it immutable. While
  here, fix readelf/objdump versions to show all of this. ok miod kettenis

- Add support for `.openbsd.syscalls`

  (rebaser's note) adapted from:

  42a61acefa

  Collect .openbsd.syscalls sections into a new PT_OPENBSD_SYSCALLS
  segment. This will be used soon to pin system calls to designated call
  sites.

  ok deraadt@

- Scope OpenBSD special section handling under that ELFOSABI

  As a preexisting comment in `ELF/Writer.cpp` says:

  > section names shouldn't be significant in ELF in spirit.

  so scoping OSABI-specific magic name hacks to just the OSABI in
  question limits the degree to which we deviate from that "spirit" for
  all other OSABIs.

  OpenBSD in particular is very fast-moving, having added a number of
  special sections, etc. in recent years. It is unclear how feasible or
  reasonable it is for upstream to implement all these features in any
  event, but scoping like this at least mitigates the fallout for other
  OSABIs that wish to be more slow-moving.

Co-authored-by: deraadt <deraadt@openbsd.org>
2024-07-12 14:34:17 -04:00
Fangrui Song
ee4c12f87d [ELF] Postpone more linker script errors
Since `assignAddresses` is executed more than once, error reporting
during `assignAddresses` would be duplicated. Generalize #66854 to cover
more errors.

Note: address-related errors exposed in one invocation might not be
errors in another invocation.
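
A minimal sketch of postponed reporting, assuming a hypothetical `Layout` that clears its buffered errors at the start of each `assignAddresses` pass, so that only the errors from the final pass remain and each is reported once.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: a pass may hit errors that only hold for the tentative
// addresses of that pass, so errors are buffered instead of reported directly.
struct Layout {
  std::vector<std::string> pendingErrors;

  // One address-assignment pass; `end` is the computed end address and
  // `limit` the end of the memory region.
  void assignAddresses(uint64_t end, uint64_t limit) {
    pendingErrors.clear(); // errors from an earlier pass may no longer apply
    if (end > limit)
      pendingErrors.push_back("section overflows memory region");
  }
};
```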

Pull Request: https://github.com/llvm/llvm-project/pull/96361
2024-06-24 10:15:28 -07:00
Parth Arora
ebb326a51f [ELF] Fix unnecessary inclusion of unreferenced provide symbols
Previously, the linker unnecessarily included a PROVIDE symbol that was
referenced only by another unused PROVIDE symbol. For example, if a
linker script contained the code below and the 'not_used_sym' PROVIDE
symbol was not included, the linker still included the 'foo' PROVIDE
symbol because it was referenced by 'not_used_sym'.

    PROVIDE(not_used_sym = foo)
    PROVIDE(foo = 0x1000)

This commit fixes this behavior by using a DFS-like algorithm to find
all the symbols referenced in the expressions of included PROVIDE
symbols.

This commit also fixes the issue of an unused section not being
garbage-collected if a symbol of the section is referenced by an unused
PROVIDE symbol.
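
A sketch of the DFS-like reachability step, assuming each `PROVIDE` is modeled as the list of symbols its expression references and that the roots are symbols referenced from input files. `ProvideMap` and `neededProvides` are hypothetical; the sketch ignores the rule that a `PROVIDE` only takes effect when no input file defines the symbol.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: PROVIDE(sym = expr) maps sym to the symbols expr references.
using ProvideMap = std::map<std::string, std::vector<std::string>>;

// Starting from symbols referenced by input files, include a PROVIDE only if
// it is reachable, then follow the references in its expression.
std::set<std::string> neededProvides(const ProvideMap &provides,
                                     const std::vector<std::string> &referenced) {
  std::set<std::string> needed;
  std::vector<std::string> work(referenced.begin(), referenced.end());
  while (!work.empty()) {
    std::string sym = work.back();
    work.pop_back();
    auto it = provides.find(sym);
    if (it == provides.end() || !needed.insert(sym).second)
      continue; // not a PROVIDE symbol, or already visited
    for (const std::string &dep : it->second)
      work.push_back(dep);
  }
  return needed;
}
```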

Closes #74771
Closes #84730

Co-authored-by: Fangrui Song <i@maskray.me>
2024-03-25 16:11:21 -07:00
Fangrui Song
551e20d190 [ELF] Reject error-prone meta characters in input section description
The lexer is overly permissive. When parsing file patterns in an input
section description and there is a missing `)`, we would accept many
nonsensical tokens (e.g. `}`) as patterns, leading to confusion, e.g.
`*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.text*)) } PROVIDE_HIDDEN(__code_end = .)`
(#81804).

Ideally, the lexer should be stateful to report more errors like GNU ld
and get rid of hacks like `ScriptLexer::maybeSplitExpr`, but that would
require a large rewrite of the lexer. For now, just reject certain
non-wildcard meta characters to detect common mistakes.

Pull Request: https://github.com/llvm/llvm-project/pull/84130
2024-03-06 17:19:59 -08:00
Ulrich Weigand
fe3406e349 [lld] Add target support for SystemZ (s390x) (#75643)
This patch adds full support for linking SystemZ (ELF s390x) object
files. Support should be generally complete:
- All relocation types are supported.
- Full shared library support (DYNAMIC, GOT, PLT, ifunc).
- Relaxation of TLS and GOT relocations where appropriate.
- Platform-specific test cases.

In addition to new platform code and the obvious changes, there were a
few additional changes to common code:

- Add three new RelExpr members (R_GOTPLT_OFF, R_GOTPLT_PC, and
R_PLT_GOTREL) needed to support certain s390x relocations. I chose not
to use a platform-specific name since nothing in the definition of these
relocs is actually platform-specific; it is quite possible that other
platforms will need the same.

- A couple of tweaks to TLS relocation handling, as the particular
semantics of the s390x versions differ slightly. See comments in the
code.

This was tested by building and testing >1500 Fedora packages, with only
a handful of failures; as these also have issues when building with LLD
on other architectures, they seem unrelated.

Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@redhat.com>
2024-02-13 11:29:21 +01:00
Fangrui Song
43b13341fb [ELF] Add internal InputFile (#78944)
Based on https://reviews.llvm.org/D45375 . Introduce a new InputFile
kind `InternalKind`, use it for

* `ctx.internalFile`: for linker-defined symbols and some synthesized
`Undefined`
* `createInternalFile`: for symbol assignments and --defsym

I picked "internal" instead of "synthetic" to avoid confusion with
SyntheticSection.

Currently a symbol's file is one of: nullptr, ObjKind, SharedKind,
BitcodeKind, BinaryKind. Now it's non-null (I plan to add an
`assert(file)` to Symbol::Symbol and change `toString(const InputFile *)`
separately).

Debugging and error reporting improve. The immediate user-facing
difference is a more descriptive "File" column in the --cref output. This
patch may unlock further simplification.

Currently each symbol assignment gets its own
`createInternalFile(cmd->location)`. Two symbol assignments in a linker
script do not share the same file. Making the file the same would be
nice, but would require non-trivial code.
2024-01-22 09:09:46 -08:00
Fangrui Song
7c89b20e02 [ELF] OVERLAY: support optional start address and LMA
https://reviews.llvm.org/D44780 implemented rudimentary support for
OVERLAY. The start address and `AT(ldaddr)` in `OVERLAY [start] :
[NOCROSSREFS] [AT ( ldaddr )]` are not optional.

In addition, there are two issues:

* When the start address is `.`, subsequent sections don't share the
  address of the first overlay section.
* When the first overlay section is empty and discardable, `p_paddr` is
  incorrectly zero. This is because a discarded section has a zero
  address, causing `prev->getLMA() + prev->size` where `prev` refers to
  the first section to evaluate to zero.

This patch makes the start address and LMA optional and fixes the issues.
Close #77265
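
A sketch of overlay address assignment under stated assumptions: the start address defaults to `.` and is evaluated once, and the LMA defaults to the VMA when `AT` is absent. Keeping a running LMA, rather than reading `prev->getLMA() + prev->size`, is one way to keep an empty first member from resetting the chain to zero. `OverlaySection` and `layoutOverlay` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical overlay member.
struct OverlaySection {
  uint64_t size;
  uint64_t vma = 0, lma = 0;
};

// All members share one VMA; LMAs are laid out back to back from `ldaddr`.
void layoutOverlay(std::vector<OverlaySection> &secs, uint64_t dot,
                   std::optional<uint64_t> start,
                   std::optional<uint64_t> ldaddr) {
  const uint64_t vma = start.value_or(dot); // evaluated once, shared by all
  uint64_t lma = ldaddr.value_or(vma);
  for (OverlaySection &s : secs) {
    s.vma = vma;
    s.lma = lma;
    lma += s.size; // an empty member simply adds 0
  }
}
```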

Pull Request: https://github.com/llvm/llvm-project/pull/77272
2024-01-08 16:12:49 -08:00
Fangrui Song
1bd5df7af6 [ELF] Correct a comment about ^=. NFC
GNU ld added ^= support in July 2023.
2023-09-15 17:52:48 -07:00
Fangrui Song
5a58e98c20 [ELF] Align the end of PT_GNU_RELRO associated PT_LOAD to a common-page-size boundary (#66042)
Close #57618: currently we align the end of PT_GNU_RELRO to a
common-page-size boundary, but do not align the end of the associated
PT_LOAD. This is benign when runtime_page_size >= common-page-size.

However, when runtime_page_size < common-page-size, it is possible that
`alignUp(end(PT_LOAD), page_size) < alignDown(end(PT_GNU_RELRO), page_size)`.
In this case, rtld's mprotect call for PT_GNU_RELRO will apply to
unmapped regions and lead to an error, e.g.

```
error while loading shared libraries: cannot apply additional memory protection after relocation: Cannot allocate memory
```

To fix the issue, add a padding section .relro_padding like mold, which
is contained in the PT_GNU_RELRO segment and the associated PT_LOAD
segment. The section also prevents strip from corrupting PT_LOAD program
headers.

.relro_padding has the largest `sortRank` among RELRO sections.
Therefore, it is naturally placed at the end of the `PT_GNU_RELRO` segment
in the absence of `PHDRS`/`SECTIONS` commands.

In the presence of `SECTIONS` commands, we place .relro_padding
immediately before a symbol assignment using DATA_SEGMENT_RELRO_END (see
also https://reviews.llvm.org/D124656), if present.
DATA_SEGMENT_RELRO_END is changed to align to max-page-size instead of
common-page-size.

Some edge cases worth mentioning:

* ppc64-toc-addis-nop.s: when PHDRS is present, do not append
  .relro_padding.
* avoid-empty-program-headers.s: when the only RELRO section is .tbss,
  it is not part of a PT_LOAD segment, therefore we do not append
  .relro_padding.

---

Close #65002: GNU ld from 2.39 onwards aligns the end of PT_GNU_RELRO to
a max-page-size boundary (https://sourceware.org/PR28824) so that the last
page is protected even if runtime_page_size > common-page-size.

In my opinion, losing protection for the last page when the runtime page
size is larger than common-page-size is not really an issue. Double mapping
a page of up to max-page-size for the protection could cause undesired VM
waste. Internally we had users complaining about 2MiB max-page-size
applying to shared objects.

Therefore, the end of .relro_padding is padded to a common-page-size
boundary. Users who are really anxious can set common-page-size to match
their runtime page size.
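
The padding arithmetic can be sketched as follows, assuming power-of-two page sizes; `alignUp` and `relroPaddingSize` are defined locally for illustration and are not lld's functions. The padding extends both PT_GNU_RELRO and its PT_LOAD to the next common-page-size boundary.

```cpp
#include <cassert>
#include <cstdint>

// Round `x` up to a multiple of `align`, which must be a power of two.
constexpr uint64_t alignUp(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

// Size of a .relro_padding-like section that extends the RELRO region (and
// the PT_LOAD containing it) from `relroEnd` to a common-page-size boundary.
constexpr uint64_t relroPaddingSize(uint64_t relroEnd, uint64_t commonPageSize) {
  return alignUp(relroEnd, commonPageSize) - relroEnd;
}
```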

---

17 tests need updating as there are lots of change detectors.
2023-09-14 10:33:11 -07:00
Fangrui Song
65a15a56d5 [ELF] Respect orders of symbol assignments and DEFINED (#65866)
Fix #64600: the current implementation is minimal (see
https://reviews.llvm.org/D83758), and an assignment like
`__TEXT_REGION_ORIGIN__ = DEFINED(__TEXT_REGION_ORIGIN__) ? __TEXT_REGION_ORIGIN__ : 0;`
(used by avr-ld[1]) leads to a value of zero (the default value in `declareSymbol`),
which is unexpected.

Assign orders to symbol assignments and references so that
for a script-defined symbol, the `DEFINED` results match users'
expectations. I am unsure about GNU ld's exact behavior, but this hopefully
matches it in the majority of cases.

[1]: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/scripttempl/avr.sc
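
A sketch of order-aware `DEFINED`, assuming every statement has the form `sym = DEFINED(sym) ? sym : fallback;` and that the initial map holds symbols defined before the script runs (for example by input files). `CondAssign` and `evaluate` are hypothetical. Because statements are evaluated in order, `DEFINED` only sees earlier definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical statement: `sym = DEFINED(sym) ? sym : fallback;`
struct CondAssign {
  std::string sym;
  uint64_t fallback;
};

// Evaluate statements in script order; DEFINED(sym) sees only definitions
// that precede the statement.
std::map<std::string, uint64_t>
evaluate(std::map<std::string, uint64_t> defined,
         const std::vector<CondAssign> &script) {
  for (const CondAssign &a : script) {
    auto it = defined.find(a.sym);
    uint64_t value = it != defined.end() ? it->second : a.fallback;
    defined[a.sym] = value;
  }
  return defined;
}
```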
2023-09-11 10:54:49 -07:00
WANG Xuerui
6084ee7420 [lld][ELF] Support LoongArch
This adds support for the LoongArch ELF psABI v2.00 [1] relocation
model to LLD. The deprecated stack-machine-based psABI v1 relocs are not
supported.

The code is tested by successfully bootstrapping a Gentoo/LoongArch
stage3, complete with common GNU userland tools and both the LLVM and
GNU toolchains (GNU toolchain is present only for building glibc,
LLVM+Clang+LLD are used for the rest). Large programs like QEMU are
tested to work as well.

[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html

Reviewed By: MaskRay, SixWeining

Differential Revision: https://reviews.llvm.org/D138135
2023-07-25 17:06:07 +08:00
Fangrui Song
fae96104d4 [ELF] Support operator ^ and ^=
GNU ld added ^ support in July 2023 and it looks like ^= is in plan as
well.

For now, we don't support `a^=0` (^= without a preceding space).
2023-07-15 14:10:40 -07:00
Fangrui Song
49dfbc6efc [ELF] Remove one unneeded unquote from D124266
This one is unneeded after commit d60ef9338deb734541ff1c9d0771807815d5d9e6 (2023-02-03).
2023-07-05 15:08:53 -07:00
Roger Pau Monne
7cab385a8f [lld/elf] support quote usage in section names
Section names used in ELF linker scripts can be quoted, but such
quotes must not be propagated to the binary ELF section names. Therefore,
strip the quotes from the section names when processing them, and
also strip them from linker script functions that take section names
as parameters.
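
A sketch of the quote stripping, assuming a name counts as quoted only when it both starts and ends with `"` and that escape sequences are not interpreted. This standalone `unquote` is illustrative and not necessarily identical to lld's helper of the same name.

```cpp
#include <cassert>
#include <string>

// Strip one pair of surrounding double quotes, if present.
std::string unquote(const std::string &s) {
  if (s.size() >= 2 && s.front() == '"' && s.back() == '"')
    return s.substr(1, s.size() - 2);
  return s;
}
```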

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D124266
2023-07-05 14:56:16 -07:00
Fangrui Song
daba24ee7b [ELF] << >>: make RHS less than 64
The left/right shift linker script operators may trigger UB.
E.g. in linkerscript/end-overflow-check.test, the initial REGION1__PADDED_SR_SHIFT is
uint64_t(-3), causing the following expression to trigger an out-of-range shift in
a UBSan build of lld.

    REGION1__PADDED_SR_SIZE = MAX(1 << REGION1__PADDED_SR_SHIFT, 32);

Prevent such UB by making the RHS less than 64.
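
A sketch of the guard: in C++, shifting a 64-bit value by 64 or more is undefined behavior, and reducing the RHS modulo 64 keeps it in [0, 63]. This is one way to make the RHS less than 64; `shiftLeft` and `shiftRight` are hypothetical names and may not match lld's exact code.

```cpp
#include <cassert>
#include <cstdint>

// rhs % 64 is always in [0, 63], so these shifts are well defined.
uint64_t shiftLeft(uint64_t lhs, uint64_t rhs) { return lhs << (rhs % 64); }
uint64_t shiftRight(uint64_t lhs, uint64_t rhs) { return lhs >> (rhs % 64); }
```

With the test's value uint64_t(-3), the effective shift amount is 61 instead of an undefined huge shift.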
2023-06-15 10:34:33 -07:00
Fangrui Song
8d85c96e0e [lld] StringRef::{starts,ends}with => {starts,ends}_with. NFC
The latter form is now preferred to be similar to C++20 starts_with.
This replacement also removes one function call when startswith is not inlined.
2023-06-05 14:36:19 -07:00
Kazu Hirata
ed1539c6ad Migrate {starts,ends}with_insensitive to {starts,ends}_with_insensitive (NFC)
This patch migrates uses of StringRef::{starts,ends}with_insensitive
to StringRef::{starts,ends}_with_insensitive so that we can use names
similar to those used in std::string_view.

Note that the llvm/ directory has migrated in commit
6c3ea866e93003e16fc55d3b5cedd3bc371d1fde.

I'll post a separate patch to deprecate
StringRef::{starts,ends}with_insensitive.

Differential Revision: https://reviews.llvm.org/D150506
2023-05-16 10:12:42 -07:00
Peter Smith
e16af8a281 [LLD][ELF] Add missing program header parsing to OVERLAY
In D72756 the change to add INPUT_SECTION_FLAGS inadvertently
removed the line to parse the program header assignment information for
OutputSections within an OVERLAY.

This change adds back the missing line and adds a test for it.

Differential Revision: https://reviews.llvm.org/D150445
2023-05-15 10:04:33 +01:00
Simi Pallipurath
2f68ddc604 [lld][ARM][2/3]Big Endian support - Word invariant support
Changes:
 - Add BE32 big-endian support for Arm.
 - Replace writele and readle with their endian-aware versions.
 - Add test cases for the big-endian BE32 Arm configuration.

Patch by: Milosz Plichta. This patch merges all the changes from
https://reviews.llvm.org/D140203 as well.

Reviewed By: peter.smith, MaskRay

Differential Revision: https://reviews.llvm.org/D140202
2023-03-29 10:21:00 +01:00
Justin Cady
447aa48b4a [ELF] Add REVERSE input section description keyword
The `REVERSE` keyword is described here:

https://sourceware.org/bugzilla/show_bug.cgi?id=27565

It complements `SORT` by allowing the order of input sections to be reversed.

This is particularly useful for order-dependent sections such as .init_array,
where `REVERSE` can be used to either detect static initialization order fiasco
issues or as a mechanism to maintain .ctors element order while transitioning to
the modern .init_array. Such a transition is described here:

https://discourse.llvm.org/t/is-it-possible-to-manually-specify-init-array-order/68649
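
A sketch of where `REVERSE` fits, assuming the matched sections start in input order, are optionally sorted by name, and are then reversed. `orderSections` is a hypothetical helper; how lld combines `REVERSE` with each `SORT` variant is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Order the sections matched by one input section description.
std::vector<std::string> orderSections(std::vector<std::string> matched,
                                       bool sortByName, bool reverse) {
  if (sortByName)
    std::stable_sort(matched.begin(), matched.end());
  if (reverse)
    std::reverse(matched.begin(), matched.end());
  return matched;
}
```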

Differential Revision: https://reviews.llvm.org/D145381
2023-03-07 12:44:02 -05:00
Fangrui Song
d60ef9338d [ELF] Support quoted output section names
Similar to e7a7ad134fe182aad190cb3ebc441164470e92f5 and
2bf06d9345caeb26520be8e830c092683bbdf0f7 for other linker script syntax.

Close https://github.com/llvm/llvm-project/issues/60496
2023-02-03 11:03:00 -08:00
Kazu Hirata
c68af42fa8 [lld] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 23:12:36 -08:00