The linker was crashing due to stack overflow when parsing ':ALIGN' in
an output section description. This commit fixes the linker script
parser so that the crash does not happen.
The root cause of the stack overflow is how we parse expressions
(readExpr) in linker script and the behavior of ScriptLexer::expect(...)
utility. ScriptLexer::expect does not do anything if errors have already
been encountered during linker script parsing. In particular, it never
increments the current token position in the script file, even if the
current token is the same as the expected token. This causes an infinite
call cycle on parsing an expression such as '(4096)' when an error has
already been encountered.
readExpr() calls readPrimary()
readPrimary() calls readParenExpr()
readParenExpr():
expect("("); // no-op, current token still points to '('
Expression *E = readExpr(); // The cycle continues...
Closes#146722
Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
assignAddresses is executed more than once. When an ASSERT expression
evaluates to zero, we should only report an error for the last
assignAddresses. Make a change similar to #66854 and #96361.
This change might help https://github.com/ClangBuiltLinux/linux/issues/2094
This allows NOCROSSREFS to be specified in OVERLAY linker script
descriptions. This is a particularly useful part of the OVERLAY syntax,
since it's very rarely possible for one overlay section to sensibly
reference another.
Closes#128790
This allows the contents of OVERLAYs to be attributed to memory regions.
This is the only clean way to overlap VMAs in linker scripts that choose
to primarily use memory regions to lay out addresses.
This also simplifies OVERLAY expansion to better match GNU LD.
Expressions for the first section's LMA and VMA are not generated if the
user did not provide them. This allows the LMA/VMA offset to be
preserved across multiple overlays in the same region, as with regular
sections.
Closes#129816
When attempting to add KEEP within an OVERLAY description, which the
Linux kernel would like to do for ARCH=arm to avoid dropping the
.vectors sections with '--gc-sections' [1], ld.lld errors with:
ld.lld: error: ./arch/arm/kernel/vmlinux.lds:37: section pattern is expected
>>> __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } ...
>>> ^
readOverlaySectionDescription() does not handle all input section
description keywords, despite GNU ld's documentation stating that "The
section definitions within the OVERLAY construct are identical to those
within the general SECTIONS construct, except that no addresses and no
memory regions may be defined for sections within an OVERLAY."
Reuse the existing parsing in readInputSectionDescription(), which
handles KEEP, allowing the Linux kernel's use case to work properly.
[1]: https://lore.kernel.org/20250221125520.14035-1-ceggers@arri.de/
Remove the global variable `symtab` and add a member variable
(`std::unique_ptr<SymbolTable>`) to `Ctx` instead.
This is one step toward eliminating global states.
Pull Request: https://github.com/llvm/llvm-project/pull/109612
Ctx was introduced in March 2022 as a more suitable place for such
singletons.
We now use default-initialization for `LinkerScript` and should pay
attention to non-class types (e.g. `dot` is initialized by commit
503907dc505db1e439e7061113bf84dd105f2e35).
This allows the input section matching algorithm to be separated from
output section descriptions. This allows a group of sections to be
assigned to multiple output sections, providing an explicit version of
--enable-non-contiguous-regions's spilling that doesn't require altering
global linker script matching behavior with a flag. It also makes the
linker script language more expressive even if spilling is not intended,
since input section matching can be done in a different order than
sections are placed in an output section.
The implementation reuses the backend mechanism provided by
--enable-non-contiguous-regions, so it has roughly similar semantics and
limitations. In particular, sections cannot be spilled into or out of
INSERT, OVERWRITE_SECTIONS, or /DISCARD/. The former two aren't
intrinsic, so it may be possible to relax those restrictions later.
If an included script is under the sysroot directory, when it opens an
absolute path file (`INPUT` or `GROUP`), add sysroot before the absolute
path. When the included script ends, the `isUnderSysroot` state is
restored.
Fix#93947: the cycle detection mechanism added by
https://reviews.llvm.org/D37524 also disallowed including a file twice,
which is an unnecessary limitation.
Now that we have an include stack #100493, supporting multiple inclusion
is trivial. Note: a filename can be referenced with many different
paths, e.g. a.lds, ./a.lds, ././a.lds. We don't attempt to detect the
cycle in the earliest point.
After #100493, the idiom `while (!errorCount() && !consume("}"))` could
lead to inaccurate diagnostics or dead loops. Introduce till to change
the code pattern.
The current tokenize-whole-file approach has a few limitations.
* Lack of state information: `maybeSplitExpr` is needed to parse
expressions. It's infeasible to add new states to behave more like GNU
ld.
* `readInclude` may insert tokens in the middle, leading to a time
complexity issue with N-nested `INCLUDE`.
* line/column information for diagnostics are inaccurate, especially
after an `INCLUDE`.
* `getLineNumber` cannot be made more efficient without significant code
complexity and memory consumption. https://reviews.llvm.org/D104137
The patch switches to a traditional lexer that generates tokens lazily.
* `atEOF` behavior is modified: we need to call `peek` to determine EOF.
* `peek` and `next` cannot call `setError` upon `atEOF`.
* Since `consume` no longer reports an error upon `atEOF`, the idiom `while (!errorCount() && !consume(")"))`
would cause a dead loop. Use `while (peek() != ")" && !atEOF()) { ... } expect(")")` instead.
* An include stack is introduced to handle `readInclude`. This can be
utilized to address #93947 properly.
* `tokens` and `pos` are removed.
* `commandString` is reimplemented. Since it is used in -Map output,
`\n` needs to be replaced with space.
Pull Request: https://github.com/llvm/llvm-project/pull/100493
Support `preinit_array . (TYPE=SHT_PREINIT_ARRAY) : { QUAD(16) }`
Follow-up to https://reviews.llvm.org/D118840
peek2() could be eliminated by a future change.
Implement the two commands described by
https://sourceware.org/binutils/docs/ld/Miscellaneous-Commands.html
After `outputSections` is available, check each output section described
by at least one `NOCROSSREFS`/`NOCROSSERFS_TO` command. For each checked
output section, scan relocations from its input sections.
This step is slow, therefore utilize `parallelForEach(isd->sections, ...)`.
To support non SHF_ALLOC sections, `InputSectionBase::relocations`
(empty) cannot be used. In addition, we may explore eliminating this
member to speed up relocation scanning.
Some parse code is adapted from #95714.
Close#41825
Pull Request: https://github.com/llvm/llvm-project/pull/98773
This patch improves GNU ld compatibility.
Close#87891: Support `OUTPUT_FORMAT(binary)`, which is like
--oformat=binary. --oformat=binary takes precedence over an ELF
`OUTPUT_FORMAT`.
In addition, if more than one OUTPUT_FORMAT command is specified, only
check the first one.
Pull Request: https://github.com/llvm/llvm-project/pull/98837