Towards
#https://github.com/llvm/llvm-project/issues/134809#issuecomment-2787206873
This change moves WasmSym from a static global struct to an instance
owned by Ctx, allowing it to be reset cleanly between linker runs. This
enables safe support for multiple invocations of wasm-ld within the same
process
Changes done
- Converted WasmSym from a static struct to a regular struct with
instance members.
- Added a std::unique_ptr<WasmSym> wasmSym field inside Ctx.
- Reset wasmSym in Ctx::reset() to clear state between links.
- Replaced all WasmSym:: references with ctx.wasmSym->.
- Removed global symbol definitions from Symbols.cpp that are no longer
needed.
Clearing wasmSym in ctx.reset() ensures a clean slate for each link
invocation, preventing symbol leakage across runs—critical when using
wasm-ld/lld as a reentrant library where global state can cause subtle,
hard-to-debug errors.
---------
Co-authored-by: Vassil Vassilev <v.g.vassilev@gmail.com>
This commit adds support for WebAssembly's custom-page-sizes proposal to
`wasm-ld`. An overview of the proposal can be found
[here](https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md).
In a sentence, it allows customizing a Wasm memory's page size, enabling
Wasm to target environments with less than 64KiB of memory (the default
Wasm page size) available for Wasm memories.
This commit contains the following:
* Adds a `--page-size=N` CLI flag to `wasm-ld` for configuring the
linked Wasm binary's linear memory's page size.
* When the page size is configured to a non-default value, then the
final Wasm binary will use the encodings defined in the
custom-page-sizes proposal to declare the linear memory's page size.
* Defines a `__wasm_first_page_end` symbol, whose address points to the
first page in the Wasm linear memory, a.k.a. is the Wasm memory's page
size. This allows writing code that is compatible with any page size,
and doesn't require re-compiling its object code. At the same time,
because it just lowers to a constant rather than a memory access or
something, it enables link-time optimization.
* Adds tests for these new features.
r? @sbc100
cc @sunfishcode
Change the global variable reference to a member access of another
variable `ctx`. In the future, we may pass through `ctx` to functions to
eliminate global variables.
Pull Request: https://github.com/llvm/llvm-project/pull/119835
and forward it to LinkerDriver's ctor so that some uses of the global
`config` can be dropped. This is similar to how the ELF port
migrates away from the global `config`.
Pull Request: https://github.com/llvm/llvm-project/pull/119829
Add `allow-multiple-definition` flag to `wasm-ld`. This follows the ELF
linker logic. In case of duplication, the first symbol met is used.
This PR resolves the #97543
Previously we would ignore all undefined symbols when using
`-shared` or `-pie`. All undefined symbols would be treated as imports
regardless of whether those symbols we defined in any shared library.
With this change we now track symbol in shared libraries and report
undefined symbols in the main program by default.
The old behavior is still available via the
`--unresolved-symbols=import-dynamic` command line flag.
This rationale for allowing this type of breaking change is that `-pie`
and `-shared` are both still experimental will warn as such, unless
`--experimental-pic` is passed.
As part of this change the linker now models shared library symbols
via new SharedFunctionSymbol and SharedDataSymbol types.
I've also added a new `--no-shlib-sigcheck` option that bypassed the
checking of functions signature in shared libraries. This is
specifically required by emscripten the case where the imports/exports
of shared libraries have been modified by via JS type legalization (this
is only needed when targeting old JS engines where bigint is not yet
available
See https://github.com/emscripten-core/emscripten/issues/18198
Search *.so libraries regardless of -pie to make it a bit more
straightforward to build non-pie dynamic-linked executables.
Flip the default to -Bstatic (unless -pie or -shared is specified) as I
think it's what most users expect for the default as of today.
The assumption here is that, because dynamic-linking is not widely used
for WebAssembly, the most users do not specify -Bdynamic or -Bstatic,
expecting static link.
Although the recent wasi-sdk ships *.so files, there are not many wasm
runtimes providing the support of dynamic-linking. (only emscripten and
toywasm as far as i know.)
We recently added `--initial-heap` - an option that allows one to up the
initial memory size without the burden of having to know exactly how
much is needed.
However, in the process of implementing support for this in Emscripten
(https://github.com/emscripten-core/emscripten/pull/21071), we have
realized that `--initial-heap` cannot support the use-case of
non-growable memories by itself, since with it we don't know what to set
`--max-memory` to.
We have thus agreed to move the above work forward by introducing
another option to the linker (see
https://github.com/emscripten-core/emscripten/pull/21071#discussion_r1491755616),
one that would allow users to explicitly specify they want a
non-growable memory.
This change does this by introducing `--no-growable-memory`: an option
that is mutally exclusive with `--max-memory` (for simplicity - we can
also decide that it should override or be overridable by `--max-memory`.
In Emscripten a similar mix of options results in `--no-growable-memory`
taking precedence). The option specifies that the maximum memory size
should be set to the initial memory size, effectively disallowing memory
growth.
Closes#81932.
This mirrors how the ELF linker works. I wasn't able to find anywhere
where this is currently tested.
Followup to #78640, which triggered a regression.
Also convert from std::vector to SmallVector.
This matches the ELF linker where these were moved into the ctx object
in 9a572164d592e and converted to SmallVector in ba948c5a9c524b.
It is beneficial to preallocate a certain number of pages in the linear
memory (i. e. use the "minimum" field of WASM memories) so that fewer
"memory.grow"s are needed at startup.
So far, the way to do that has been to pass the "--initial-memory"
option to the linker. It works, but has the very significant downside of
requiring the user to know the size of static data beforehand, as it
must not exceed the number of bytes passed-in as "--initial-memory".
The new "--initial-heap" option avoids this downside by simply appending
the specified number of pages to static data (and stack), regardless of
how large they already are.
Ref: https://github.com/emscripten-core/emscripten/issues/20888.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
This change writes the module name to the name section of the wasm
binary. We use the `-soname` argument to determine the name and we
default the output file basename if this option is not specified.
In the future we will likely want to embed the soname in the dylink
section too, but this the first step in supporting `-soname`.
Differential Revision: https://reviews.llvm.org/D158001
Implement the --build-id flag similarly to ELF, and generate a
build_id section according to the WebAssembly tool convention
specified in https://github.com/WebAssembly/tool-conventions/pull/183
The default style ("fast" aka "tree") hashes the contents of the
output and (unlike ELF) generates a v5 UUID based on the hash (using a
random namespace). It also supports generating a random v4 UUID, a
sha1 hash, and a user-specified string (as ELF does).
Differential Revision: https://reviews.llvm.org/D107662
Fix MSVC build by std::copy on the underying buffer rather than
directly from std::array to llvm::MutableArrayRef
Implement the --build-id flag similarly to ELF, and generate a build_id
section according to the WebAssembly tool convention specified in
https://github.com/WebAssembly/tool-conventions/pull/183
The default style ("fast" aka "tree") hashes the contents of the output
and (unlike ELF) generates a v5 UUID based on the hash (using a random
namespace).
It also supports generating a random v4 UUID, a sha1 hash,
and a user-specified string (as ELF does).
Differential Revision: https://reviews.llvm.org/D107662
Allow controlling the CodeGenOpt::Level independent of the LTO
optimization level in LLD via new options for the COFF, ELF, MachO, and
wasm frontends to lld. Most are spelled as --lto-CGO[0-3], but COFF is
spelled as -opt:lldltocgo=[0-3].
See D57422 for discussion surrounding the issue of how to set the CG opt
level. The ultimate goal is to let each function control its CG opt
level, but until then the current default means it is impossible to
specify a CG opt level lower than 2 while using LTO. This option gives
the user a means to control it for as long as it is not handled on a
per-function basis.
Reviewed By: MaskRay, #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D141970
This adds an `--export-memory` option to wasm-ld which allows passing
a name to give to the exported memory, and extends `--import-memory` to
allow passing a <module>,<name> pair specifying where the memory should
be imported from.
This is based on https://reviews.llvm.org/D131376, with the main
difference being that it only supports exporting memory by one name.
Differential Revision: https://reviews.llvm.org/D135898
This flag acts just like the existing `--features` flag but instead
of replacing the set of inferred features it adds to it.
This is useful for example if you want to `--export` a mutable global
but none of the input of object were built with mutable global support.
In that case you can do `--extra-features=mutable-globals` to avoid the
linker error that would otherwise be generated in this case:
wasm-ld: error: mutable global exported but 'mutable-globals' feature not present in inputs: `__stack_pointer`. Use --no-check-features to suppress.
Differential Revision: https://reviews.llvm.org/D135831
- Add support -Bdynamic/-Bstatic and their aliases
- Add support for `--library` and `--library-path` long form args
- Add test based on test/ELF/libsearch.s
- In `-Bdynamic` mode search for `.so` files in preference to `.a`.
- Unlike ELF continue to default to static mode until `-pie` or
`-shared` are used.
Differential Revision: https://reviews.llvm.org/D135087
This removes options for performing LTO with the legacy pass
manager in LLD. Options that explicitly enable the new pass manager
are retained as no-ops.
Differential Revision: https://reviews.llvm.org/D123219
In particular we use these in two places:
1. When building PIC code we no longer need to combine output segments
into a single segment that can be initialized at `__memory_base`.
Instead each segment can encode its offset from `__memory_base` in
its initializer. e.g.
```
(i32.add (global.get __memory_base) (i32.const offset)
```
2. When building PIC code we no longer need to relocation internalized
global addresses. We can just initialize them with their correct
offsets.
Differential Revision: https://reviews.llvm.org/D121420
This is a new mode for handling unresolved symbols that allows all
symbols to be imported in the same that they would be in the case of
`-fpie` or `-shared`, but generting an otherwise fixed/non-relocatable
binary.
Code linked in this way should still be compiled with `-fPIC` so that
data symbols can be resolved via imports.
This essentially allows the building of static binaries that have
dynamic imports. See:
https://github.com/emscripten-core/emscripten/issues/12682
As with other uses of the experimental dynamic linking ABI, this
behaviour will produce a warning unless run with `--experimental-pic`.
Differential Revision: https://reviews.llvm.org/D91577
Previously we were relying on the dynamic loader to take care of this
but it simple and correct for us to do it here instead.
Now we initialize bss segments as part of `__wasm_init_memory` at the
same time we initialize passive segments.
In addition we extent the us of `__wasm_init_memory` outside of shared
memory situations. Specifically it is now used to initialize bss
segments when the memory is imported.
Differential Revision: https://reviews.llvm.org/D112667
This change revisits https://reviews.llvm.org/D79248 which originally
added support for the --unresolved-symbols flag.
At the time I thought it would make sense to add a third option to this
flag called `import-functions` but it turns out (as was suspects by on
the reviewers IIRC) that this option can be authoganal.
Instead I've added a new option called `--import-undefined` that only
operates on symbols that can be imported (for example, function symbols
can always be imported as opposed to data symbols we can only be
imported when compiling with PIC).
This option gives us the full expresivitiy that emscripten needs to be
able allow reporting of undefined data symbols as well as the option to
disable that.
This change does remove the `--unresolved-symbols=import-functions`
option, which is been in the codebase now for about a year but I would
be extremely surprised if anyone was using it.
Differential Revision: https://reviews.llvm.org/D103290
MVP object files may import at most one table, and if they do, it must
be assigned table number zero in the output, as the references to that
table are not relocatable. Ensure that this is the case, even if some
inputs define other tables.
Differential Revision: https://reviews.llvm.org/D96001
This is a more full featured version of ``--allow-undefined``.
The semantics of the different methods are as follows:
report-all:
Report all unresolved symbols. This is the default. Normally the
linker will generate an error message for each reported unresolved
symbol but the option ``--warn-unresolved-symbols`` can change this
to a warning.
ignore-all:
Resolve all undefined symbols to zero. For data and function
addresses this is trivial. For direct function calls, the linker
will generate a trapping stub function in place of the undefined
function.
import-functions:
Generate WebAssembly imports for any undefined functions. Undefined
data symbols are resolved to zero as in `ignore-all`. This
corresponds to the legacy ``--allow-undefined`` flag.
The plan is to followup with a new mode called `import-dynamic` which
allows for statically linked binaries to refer to both data and
functions symbols from the embedder.
Differential Revision: https://reviews.llvm.org/D79248
This flag works in a similar way to the ELF linker in that it
will resolve any defined symbols to their local definition with
a shared library or -pie executable.
This flag has no effect on static linking.
Differential Revision: https://reviews.llvm.org/D89152
The meaning of -shared and -pie are expected to be changed in the
future when Module Linking-style libraries are implemented. Begin
issuing warnings to give people a heads-up that they will be changing.
For compatibility with Emscripten, add a --experimental-pic flag which
disables these warnings.
Differential Revision: https://reviews.llvm.org/D81760
Summary:
A previous change (53211a) had updated the argument parsing to handle
large max memories, but 4294967296 would still wrap to zero after the
options were parsed. This change updates the configuration to use a
64-bit integer to store the max memory to avoid that overflow.
Reviewers: sbc100
Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77437
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.
One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.
When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).
Differential Revision: https://reviews.llvm.org/D75153
Summary:
- `__wasm_init_memory` is now the WebAssembly start function instead
of being called from `__wasm_call_ctors` or called directly by the
runtime.
- Adds a new synthetic data symbol `__wasm_init_memory_flag` that is
atomically incremented from zero to one by the thread responsible
for initializing memory.
- All threads now unconditionally perform data.drop on all passive
segments.
- Removes --passive-segments and --active-segments flags and controls
segment type based on --shared-memory instead. The deleted flags
were only present to ameliorate the upgrade path in Emscripten.
Reviewers: sbc100, aheejin
Subscribers: dschuff, jgravelle-google, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65783
llvm-svn: 370965
Adds --growable-table flag to handle building wasm modules with tables
that can grow.
Wasm tables that we use to store function pointers. In order to add functions
to that table at runtime, we need to either preallocate space, or grow the table.
In order to specify a table with no maximum size, we need some flag to handle
that case, separately from a potential --max-table-size= flag.
Note that the number of elements in the table isn't knowable until link-time,
so it's unclear if we will want a --max-table-size= flag in the future.
llvm-svn: 370127