This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.
This commit contains the following:
* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.
* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.
* Defines a `__wasm_first_page_end` symbol, whose address points to the
first page in the Wasm linear memory, a.k.a. is the Wasm memory's page
size. This allows writing code that is compatible with any page size,
and doesn't require re-compiling its object code. At the same time,
because it just lowers to a constant rather than a memory access or
something, it enables link-time optimization.
* Adds tests for these new features.
r? @sbc100
cc @sunfishcode
The rule here, which I'm copying from the ELF linker, is that shared
library symbols should take presence, unless the symbol has already be
extracted from the archive. e.g:
```
$ wasm-ld foo.a foo.so ref.o // .so wins
$ wasm-ld foo.a ref.o foo.so // .a wins
```
In the first case the shared library takes precedence because the lazy
symbol is replaced by the .so symbol before it is extracted from the
archive. In the second example the ref.o file causes the archive to be
exracted before the .so file is processed, so in that case the archive
file wins.
Fixes: https://github.com/emscripten-core/emscripten/issues/23501
Change the global variable reference to a member access of another
variable `ctx`. In the future, we may pass through `ctx` to functions to
eliminate global variables.
Pull Request: https://github.com/llvm/llvm-project/pull/119835
Add `allow-multiple-definition` flag to `wasm-ld`. This follows the ELF
linker logic. In case of duplication, the first symbol met is used.
This PR resolves the #97543
Previously we would ignore all undefined symbols when using
`-shared` or `-pie`. All undefined symbols would be treated as imports
regardless of whether those symbols we defined in any shared library.
With this change we now track symbol in shared libraries and report
undefined symbols in the main program by default.
The old behavior is still available via the
`--unresolved-symbols=import-dynamic` command line flag.
This rationale for allowing this type of breaking change is that `-pie`
and `-shared` are both still experimental will warn as such, unless
`--experimental-pic` is passed.
As part of this change the linker now models shared library symbols
via new SharedFunctionSymbol and SharedDataSymbol types.
I've also added a new `--no-shlib-sigcheck` option that bypassed the
checking of functions signature in shared libraries. This is
specifically required by emscripten the case where the imports/exports
of shared libraries have been modified by via JS type legalization (this
is only needed when targeting old JS engines where bigint is not yet
available
See https://github.com/emscripten-core/emscripten/issues/18198
When reference types are enabled clang will generate call_indirect
instructions that explicitly reference the global
`__indirect_function_table` symbol.
In this case the resulting global symbol was not being correctly marked
with explicit import name/module, resulting in the linker reporting
errors when it was referenced.
This issue was reported in
https://github.com/WebAssembly/tool-conventions/issues/158
Followup to #78658, which caused a regression in emscripten.
When a lazy symbol is added, which resolved and existing undefined
symbol, we don't need/want to replace the undefined symbol with the lazy
one. Instead we called extract, which replaces the undefined symbol with
the defined one.
The fact that we were first replacing the undefined symbol with a lazy
one before extracting the archive member doesn't normally matter but, in
the case of the function symbol, replacing the undefined symbol with a
lazy symbol means that `addDefinedFunction` sees the existing symbol as
lazy and simply replaces it.
Note that this is consistent with both the ELF code in
`Symbol::resolve(const LazySymbol &other)` and the wasm code prior to
#78658, neither of which replace the existing symbol with the lazy one
in this case.
The ELF linker transitioned away from archive indexes in
https://reviews.llvm.org/D117284.
This paves the way for supporting `--start-lib`/`--end-lib` (See #77960)
The ELF linker unified library handling with `--start-lib`/`--end-lib` and removed
the ArchiveFile class in https://reviews.llvm.org/D119074.
Also convert from std::vector to SmallVector.
This matches the ELF linker where these were moved into the ctx object
in 9a572164d592e and converted to SmallVector in ba948c5a9c524b.
LLVM models some features found in the binary format with raw integers
and others with nested or enumerated types. This PR switches modeling of
tables and segments to use wasm::ValType rather than uint32_t. This NFC
change is in preparation for modeling more reference types, but IMO is
also cleaner and closer to the spec.
When doing LTO on multiple archives, the order with which bitcodes are
linked to the LTO module is hard to control, given that processing
undefined symbols can lead to parsing of an object file, which in turn
lead to parsing of another object file before finishing parsing of the
previous file. This can result in encountering a non-prevailing comdat
first when linking, which can make the the symbol undefined, and the
real definition is added later with an additional prefix to avoid
duplication (e.g. `__cxx_global_var_init` and `__cxx_global_var_init.2`)
So this one-line fix ensures we compile bitcodes in the order that we
process comdats, so that when multiple archived bitcode files have the
same variable with the same comdat, we make sure that the prevailing
comdat will be linked first in the LTO.
Fixes#62243.
Fixes a crash in `-Wl,-emit-relocs` where the linker was not able to
write linker-synthetic absolute symbols to the symbol table.
This change adds a new symbol flag (`WASM_SYMBOL_ABS`), which means that
the symbol's offset is absolute and not relative to a given segment.
Such symbols include `__stack_low` and `__stack_low`.
Note that wasm object files never contains such symbols, only binaries
linked with `-Wl,-emit-relocs`.
Fixes: #67111
The ELF backend originally used `getSymbols()` but went though a
sequence of changes that resulted in this method being called
`symbols()`.
d8f8abbd4a2823f223bd7bc56445541fb221b512 replaced `getSymbols()` with
`forEachSymbol`.
a2fc96441788fba1e4709d63677f34ed8e321dae replaced `forEachSymbol` with
`llvm::iterator_range`.
e9262edf0d11a907763098d8e101219ccd9c43e9 replaced `llvm::iterator_range`
with `symbols()`.
Differential Revision: https://reviews.llvm.org/D131284
A LazySymbol is one that lives in `.a` archive and gets pulled in by a
strong reference. However, weak references to such symbols do not
result in them be loaded from the archive. In this case we want to
treat such symbols at undefined rather then lazy, once symbols
resolution is complete.
This fixes a crash bug in the linker when weakly referenced symbol that
lives in an archive file is live at the end of the link. In the case of
dynamic linking this is expected to turn into an import with (in the
case of a function symbol) a function index.
Differential Revision: https://reviews.llvm.org/D130736
Symbols from LTO objects don't contain Wasm signatures, but we need a
signature when we create undefined/stub functions for missing weakly
undefined symbols.
Luckily, after LTO, we know that symbols that are not referenced by a
regular object file must not be needed in the final output so there
is no need to generate undefined/stub function for them.
Differential Revision: https://reviews.llvm.org/D126554
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html
Differential Revision: https://reviews.llvm.org/D108850
This makes Wasm EH work with dynamic linking. So far we were only able
to handle destructors, which do not use any tags or LSDA info.
1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols,
which points to the address of per-function LSDA info. It is more
convenient to use than `MCSymbol` because it can take additional
target flags.
2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the
symbol relative to `__memory_base` and generate the `add` node. If
PIC is disabled, continue to use the absolute address.
3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in
the backend, because it is hard to make it work with dynamic
linking's loading order. Instead, we make all tag symbols undefined
in the LLVM backend and import it from JS.
4. Add support for undefined tags to the linker.
Companion patches:
- https://github.com/WebAssembly/binaryen/pull/4223
- https://github.com/emscripten-core/emscripten/pull/15266
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D111388
This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
uint8_t Attribute;
uint32_t SigIndex;
};
```
Currently the attribute field is not used and reserved for future use,
and always 0. And that this class contains `SigIndex` as its property is
a little weird in the place, because the tag type's signature index is
not an inherent property of a tag but rather a reference to another
section that changes after linking. This makes tag handling in the
linker also weird that tag-related methods are taking both `WasmTagType`
and `WasmSignature` even though `WasmTagType` contains a signature
index. This is because the signature index changes in linking so it
doesn't have any info at this point. This instead moves `SigIndex` to
`struct WasmTag` itself, as we did for `struct WasmFunction` in D111104.
In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.
I think this makes things simpler and makes tag handling more in line
with function handling. These two shares similar properties in that both
of them have signatures, but they are kind of nominal so having the same
signature doesn't mean they are the same element.
Also a drive-by fix: the reserved 'attirubute' part's encoding changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0 and its encoding is the same for the both
encoding.
This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.
Reviewed By: sbc100, tlively
Differential Revision: https://reviews.llvm.org/D111086
In PIC mode we import function address via `GOT.mem` imports but for
direct function calls we still import the first class function.
However, if the function is never directly called we can avoid the first
class import completely.
Differential Revision: https://reviews.llvm.org/D108345
The main motivation for this refactor is to remove the subclass
relationship between the InputSegment and MergeInputSegment and
SyntenticMergedInputSegment so that we can use the merging classes for
debug sections which are not data segments.
In the process of refactoring I also remove all the virtual functions
from the class hierarchy and try to reuse techniques used in the ELF
linker (see `lld/ELF/InputSections.h`).
Differential Revision: https://reviews.llvm.org/D102546
Specifically:
- InputChunk::outputOffset -> outSecOffset
- Symbol::get/setVirtualAddress -> get/setVA
- add InputChunk::getOffset helper that takes an offset
These are mostly in preparation for adding support for
SHF_MERGE/SHF_STRINGS but its also good to align with ELF where
possible.
Differential Revision: https://reviews.llvm.org/D97595
This commit regroups commonalities among InputGlobal, InputEvent, and
InputTable into the new InputElement. The subclasses are defined
inline in the new InputElement.h. NFC.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D94677
addOptionalGlobalSymbols should be addOptionalGlobalSymbol.
Also, remove unnecessary additional argument to make the signature match
the sibling function: addOptionalDataSymbol.
Differential Revision: https://reviews.llvm.org/D96305
This patch adds support to wasm-ld for linking multiple table references
together, in a manner similar to wasm globals. The indirect function
table is synthesized as needed.
To manage the transitional period in which the compiler doesn't yet
produce TABLE_NUMBER relocations and doesn't residualize table symbols,
the linker will detect object files which have table imports or
definitions, but no table symbols. In that case it will synthesize
symbols for the defined and imported tables.
As a change, relocatable objects are now written with table symbols,
which can cause symbol renumbering in some of the tests. If no object
file requires an indirect function table, none will be written to the
file. Note that for legacy ObjFile inputs, this test is conservative: as
we don't have relocs for each use of the indirecy function table, we
just assume that any incoming indirect function table should be
propagated to the output.
Differential Revision: https://reviews.llvm.org/D91870
This patch adds support to wasm-ld for linking multiple table references
together, in a manner similar to wasm globals. The indirect function
table is synthesized as needed.
To manage the transitional period in which the compiler doesn't yet
produce TABLE_NUMBER relocations and doesn't residualize table symbols,
the linker will detect object files which have table imports or
definitions, but no table symbols. In that case it will synthesize
symbols for the defined and imported tables.
As a change, relocatable objects are now written with table symbols,
which can cause symbol renumbering in some of the tests. If no object
file requires an indirect function table, none will be written to the
file. Note that for legacy ObjFile inputs, this test is conservative: as
we don't have relocs for each use of the indirecy function table, we
just assume that any incoming indirect function table should be
propagated to the output.
Differential Revision: https://reviews.llvm.org/D91870
This commit adds table symbol support in a partial way, while still
including some special cases for the __indirect_function_table symbol.
No change in tests.
Differential Revision: https://reviews.llvm.org/D94075
Without this extra flag we can't distingish between stub functions and
functions that happen to have address 0 (relative to __table_base).
Adding this flag bit the base symbol class actually avoids growing the
SymbolUnion struct which would not be true if we added it to the
FunctionSymbol subclass (due to bitbacking).
The previous approach of setting it's table index to zero worked for
normal static relocations but not for `-fPIC` code.
See https://github.com/emscripten-core/emscripten/issues/12819
Differential Revision: https://reviews.llvm.org/D92038
This is a more full featured version of ``--allow-undefined``.
The semantics of the different methods are as follows:
report-all:
Report all unresolved symbols. This is the default. Normally the
linker will generate an error message for each reported unresolved
symbol but the option ``--warn-unresolved-symbols`` can change this
to a warning.
ignore-all:
Resolve all undefined symbols to zero. For data and function
addresses this is trivial. For direct function calls, the linker
will generate a trapping stub function in place of the undefined
function.
import-functions:
Generate WebAssembly imports for any undefined functions. Undefined
data symbols are resolved to zero as in `ignore-all`. This
corresponds to the legacy ``--allow-undefined`` flag.
The plan is to followup with a new mode called `import-dynamic` which
allows for statically linked binaries to refer to both data and
functions symbols from the embedder.
Differential Revision: https://reviews.llvm.org/D79248
Previously we limited the use of atomics and TLS to programs
linked with `--shared-memory`.
However, as of https://reviews.llvm.org/D79530 we now allow
programs that use atomic to be linked without `--shared-memory`.
For this to be useful we also want to all TLS usage in such
programs. In this case, since we know we are single threaded
we simply include the TLS data as a regular active segment
and create an immutable `__tls_base` global that point to the
start of this segment.
Fixes: https://github.com/emscripten-core/emscripten/issues/12489
Differential Revision: https://reviews.llvm.org/D91115