Towards
#https://github.com/llvm/llvm-project/issues/134809#issuecomment-2787206873
This change moves WasmSym from a static global struct to an instance
owned by Ctx, allowing it to be reset cleanly between linker runs. This
enables safe support for multiple invocations of wasm-ld within the same
process
Changes done
- Converted WasmSym from a static struct to a regular struct with
instance members.
- Added a std::unique_ptr<WasmSym> wasmSym field inside Ctx.
- Reset wasmSym in Ctx::reset() to clear state between links.
- Replaced all WasmSym:: references with ctx.wasmSym->.
- Removed global symbol definitions from Symbols.cpp that are no longer
needed.
Clearing wasmSym in ctx.reset() ensures a clean slate for each link
invocation, preventing symbol leakage across runs—critical when using
wasm-ld/lld as a reentrant library where global state can cause subtle,
hard-to-debug errors.
---------
Co-authored-by: Vassil Vassilev <v.g.vassilev@gmail.com>
This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.
This commit contains the following:
* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.
* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.
* Defines a `__wasm_first_page_end` symbol, whose address points to the
first page in the Wasm linear memory, a.k.a. is the Wasm memory's page
size. This allows writing code that is compatible with any page size,
and doesn't require re-compiling its object code. At the same time,
because it just lowers to a constant rather than a memory access or
something, it enables link-time optimization.
* Adds tests for these new features.
r? @sbc100
cc @sunfishcode
In most circumstances BSS segments are not required in the output binary
but combineOutputSegments was erroneously including them. This meant
that PIC binaries were including the BSS data as zero in the binary.
Fixes: https://github.com/emscripten-core/emscripten/issues/23683
Change the global variable reference to a member access of another
variable `ctx`. In the future, we may pass through `ctx` to functions to
eliminate global variables.
Pull Request: https://github.com/llvm/llvm-project/pull/119835
Instead of always generating __wasm_apply_data_relocs when relevant
options like -pie and -shared are specified, generate it only when the
relevant relocations are actually necessary.
Note: omitting empty __wasm_apply_data_relocs is not a problem because
the export is optional in the spec (DynamicLinking.md) and all runtime
linker implementations I'm aware of implement it that way. (emscripten,
toywasm, wasm-tools)
Motivations:
* This possibly reduces the module size
* This is also a preparation to fix
https://github.com/llvm/llvm-project/issues/107387, for which it isn't
obvious if we need these relocations at the time of
createSyntheticSymbols. (unless we introduce a new explicit option like
--non-pie-dynamic-link.)
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Strip calls to raw_string_ostream::str(), to avoid excess layer of indirection.
This change is enough to allow `--strip-debug` to work on object files,
without breaking the relocation information or symbol table.
A more complete version of this change would instead reconstruct the
symbol table and relocation sections, but that is much larger change.
Bug: #102002
Previously we would ignore all undefined symbols when using
`-shared` or `-pie`. All undefined symbols would be treated as imports
regardless of whether those symbols we defined in any shared library.
With this change we now track symbol in shared libraries and report
undefined symbols in the main program by default.
The old behavior is still available via the
`--unresolved-symbols=import-dynamic` command line flag.
This rationale for allowing this type of breaking change is that `-pie`
and `-shared` are both still experimental will warn as such, unless
`--experimental-pic` is passed.
As part of this change the linker now models shared library symbols
via new SharedFunctionSymbol and SharedDataSymbol types.
I've also added a new `--no-shlib-sigcheck` option that bypassed the
checking of functions signature in shared libraries. This is
specifically required by emscripten the case where the imports/exports
of shared libraries have been modified by via JS type legalization (this
is only needed when targeting old JS engines where bigint is not yet
available
See https://github.com/emscripten-core/emscripten/issues/18198
We recently added `--initial-heap` - an option that allows one to up the
initial memory size without the burden of having to know exactly how
much is needed.
However, in the process of implementing support for this in Emscripten
(https://github.com/emscripten-core/emscripten/pull/21071), we have
realized that `--initial-heap` cannot support the use-case of
non-growable memories by itself, since with it we don't know what to set
`--max-memory` to.
We have thus agreed to move the above work forward by introducing
another option to the linker (see
https://github.com/emscripten-core/emscripten/pull/21071#discussion_r1491755616),
one that would allow users to explicitly specify they want a
non-growable memory.
This change does this by introducing `--no-growable-memory`: an option
that is mutally exclusive with `--max-memory` (for simplicity - we can
also decide that it should override or be overridable by `--max-memory`.
In Emscripten a similar mix of options results in `--no-growable-memory`
taking precedence). The option specifies that the maximum memory size
should be set to the initial memory size, effectively disallowing memory
growth.
Closes#81932.
Also convert from std::vector to SmallVector.
This matches the ELF linker where these were moved into the ctx object
in 9a572164d592e and converted to SmallVector in ba948c5a9c524b.
It is beneficial to preallocate a certain number of pages in the linear
memory (i. e. use the "minimum" field of WASM memories) so that fewer
"memory.grow"s are needed at startup.
So far, the way to do that has been to pass the "--initial-memory"
option to the linker. It works, but has the very significant downside of
requiring the user to know the size of static data beforehand, as it
must not exceed the number of bytes passed-in as "--initial-memory".
The new "--initial-heap" option avoids this downside by simply appending
the specified number of pages to static data (and stack), regardless of
how large they already are.
Ref: https://github.com/emscripten-core/emscripten/issues/20888.
Similar to recent changes to ELF (e.g., D154813), Mach-O, and COFF to
improve hashing performance.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D155752
This patch fixes:
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp:1334:51:
warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
lld/wasm/Writer.cpp:250:39: warning: suggest parentheses around ‘&&’
within ‘||’ [-Wparentheses]
Implement the --build-id flag similarly to ELF, and generate a
build_id section according to the WebAssembly tool convention
specified in https://github.com/WebAssembly/tool-conventions/pull/183
The default style ("fast" aka "tree") hashes the contents of the
output and (unlike ELF) generates a v5 UUID based on the hash (using a
random namespace). It also supports generating a random v4 UUID, a
sha1 hash, and a user-specified string (as ELF does).
Differential Revision: https://reviews.llvm.org/D107662
Fix MSVC build by std::copy on the underying buffer rather than
directly from std::array to llvm::MutableArrayRef
Implement the --build-id flag similarly to ELF, and generate a build_id
section according to the WebAssembly tool convention specified in
https://github.com/WebAssembly/tool-conventions/pull/183
The default style ("fast" aka "tree") hashes the contents of the output
and (unlike ELF) generates a v5 UUID based on the hash (using a random
namespace).
It also supports generating a random v4 UUID, a sha1 hash,
and a user-specified string (as ELF does).
Differential Revision: https://reviews.llvm.org/D107662
This only effects folks building with wasm64 + shared memory which
is not currently a supported configuration in emscripten or any other
wasm toolchain.
Differential Revision: https://reviews.llvm.org/D141005
std::optional::value() has undesired exception checking semantics and is
unavailable in some older Xcode. The call sites block std::optional migration.
I intend to slowly upgrade all alignments to the Align type in lld as well.
Some places talk about alignment in Bytes while other specify them as Log2(Bytes).
Let's make sure all of this is coherent.
Differential Revision: https://reviews.llvm.org/D139181
This adds an `--export-memory` option to wasm-ld which allows passing
a name to give to the exported memory, and extends `--import-memory` to
allow passing a <module>,<name> pair specifying where the memory should
be imported from.
This is based on https://reviews.llvm.org/D131376, with the main
difference being that it only supports exporting memory by one name.
Differential Revision: https://reviews.llvm.org/D135898
Add some checks around this combination of flags
Also, honor `--global-base` when specified in `--stack-first` mode
rather than ignoring it. But error out if the specified base preseeds
the end of the stack.
Differential Revision: https://reviews.llvm.org/D136117
Define a `__heap_end` symbol that marks the end of the memory region
that starts at `__heap_base`. This will allow malloc implementations to
know how much memory they can use at `__heap_base` even if someone has
done a `memory.grow` before they can initialize their state.
Differential Revision: https://reviews.llvm.org/D136110
This flag acts just like the existing `--features` flag but instead
of replacing the set of inferred features it adds to it.
This is useful for example if you want to `--export` a mutable global
but none of the input of object were built with mutable global support.
In that case you can do `--extra-features=mutable-globals` to avoid the
linker error that would otherwise be generated in this case:
wasm-ld: error: mutable global exported but 'mutable-globals' feature not present in inputs: `__stack_pointer`. Use --no-check-features to suppress.
Differential Revision: https://reviews.llvm.org/D135831
The ELF backend originally used `getSymbols()` but went though a
sequence of changes that resulted in this method being called
`symbols()`.
d8f8abbd4a2823f223bd7bc56445541fb221b512 replaced `getSymbols()` with
`forEachSymbol`.
a2fc96441788fba1e4709d63677f34ed8e321dae replaced `forEachSymbol` with
`llvm::iterator_range`.
e9262edf0d11a907763098d8e101219ccd9c43e9 replaced `llvm::iterator_range`
with `symbols()`.
Differential Revision: https://reviews.llvm.org/D131284
Instead, export `__wasm_apply_data_relocs` and `__wasm_call_ctors`
separately.
This is required since user code in a shared library (such as static
constructors) should not be run until relocations have been applied to
all loaded libraries.
See: https://github.com/emscripten-core/emscripten/issues/17295
Differential Revision: https://reviews.llvm.org/D128515