This defines some new target features. These are subsets of existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
This is split out from https://github.com/llvm/llvm-project/pull/112035.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
When there was a type checking error, we didn't run `InstPrinter`. This
can be confusing because when there is an error in, say, block parameter
type, `InstPrinter` doesn't run even if it has nothing to do with block
parameter types, and all those updates to `ControlFlowStack` or
`TryStack` do not happen:
c20b90ab85/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.cpp (L135-L151)
For example,
```wast
block (i32) -> () ;; Block input parameter error
end_block ;; Now this errors out as "End marker mismatch"
```
This is confusing because there is a `block` and the `end_block` is not
a mismatch. Only that `block` has a type checking error, but that's not
an end marker mismatch.
I think we can just print the instruction whether we had a type checking
error or not, and this will be less confusing.
This makes the type checker handle blocks with input parameters and
return types, branches, and polymorphic stacks correctly.
We maintain the stack of "block info", which contains its input
parameter type, return type, and whether it is a loop or not. And this
is used when checking the validity of the value stack at the `end`
marker and all branches targeting the block.
`StackType` now supports a new variant `Polymorphic`, which indicates
the stack is in the polymorphic state. `Polymorphic`s are not popped
even when `popType` is executed; they are only popped when the current
block ends.
When popping from the value stack, we ensure we don't pop more than we
are allowed to at the given block level and print appropriate error
messages instead. Also after a block ends, the value stack is guaranteed
to have the right types based on the block return type. For example,
```wast
block i32
unreachable
end_block
;; You can expect to have an i32 on the stack here
```
This also adds handling for `br_if`. Previously only `br`s were checked.
`checkEnd` and `checkBr` were removed and their contents have been
inlined to the main `typeCheck` function, because they are called only
from a single callsite.
This also fixes two existing bugs in AsmParser, which were required to
make the tests passing. I added Github comments about them inline.
This modifies several existing invalid tests, those that passed
(incorrectly) before but do not pass with the new type checker anymore.
Fixes#107524.
This unifies the way we check types in various places in AsmTypeCheck.
The objectives of this PR are:
- We now use `checkTypes` for all type checking and `checkAndPopTypes`
for type checking + popping. All other functions are helper functions to
call these two functions.
- We now support comparisons of types between vectors. This lets us
printing error messages in more readable way. When an instruction takes
[i32, i64] but the stack top is [f32, f64], now instead of
```console
error: type mismatch, expected i64 but got f64
error: type mismatch, expected i32 but got f32
```
we can print this
```console
error: type mismatch, expected [i32, i64] but got [f32, f64]
```
which is also the format Wabt checker prints. This also helps printing
more meaningful messages when there are superfluous values on the stack
at the end of the function, such as:
```console
error: type mismatch, expected [] but got [i32, exnref]
```
Actually, many instructions are not utilizing this batch printing now,
which still causes multiple error messages to be printed for a single
instruction. This will be improved in a follow-up.
- The value stack now supports `Any` and `Ref`. There are instructions
that requires the type to be anything. Also instructions like
`ref.is_null` requires the type to be any reference types. Type
comparison function will handle this types accordingly, meaning
`match(I32, Any)` or `match(externref, Ref)` will succeed.
The changes in `type-checker-errors.s` are mostly the message format
changes. One downside of the new message format is that it doesn't have
instruction names in it. I plan to improve that in a potential
follow-up.
This also made some modifications in the instructions in
`type-checker-errors.s`. Currently, except for a few functions I've
recently added at the end, each function tests for a single error,
because the type checker used to bail out after the first error until
#109705. But many functions included multiple errors anyway, which I
don't think was the intention of the original writer. So I added some
instructions to remove the other errors which are not being tested. (In
some cases I added more error checking lines instead, when I felt that
could be relevant.)
Thanks to the new `ExactMatch` option in `checkTypes` function family,
we now can distinguish the cases when to check against only the top of
the value stack and when to check against the whole stack (e.g. to check
whether we have any superfluous values remaining at the end of the
function). `return` or `return_call(_indirect)` can set `ExactMatch` to
`false` because they don't care about the superfluous values. This makes
`type-checker-return.s` succeed and I was able to remove the `FIXME`.
This is the basis of the PR that fixes block parameter/return type
handling in the checker, but does not yet include the actual
block-related functionality, which will be submitted separately after
this PR.
Fixes are mostly one of these:
- `auto` -> `auto *` when the type is a pointer
- Function names start with a lowercase letter
- Variable names start with an uppercase letter
- No need to have an `else` after a `return`
Diff without whitespaces is easier to view.
This adds assembly parsing support for the new EH (exnref) proposal.
`try_table` parsing is a little tricky because catch clause lists use
`()` and the multivalue block return types also use `()`. This handles
all combinations below:
- No return type (void) + no catch list
- No return type (void) + catch list
- Single return type + no catch list
- Single return type + catch list
- Multivalue return type + no catch list
- Multivalue return type + catch list
This does not include AsmTypeCheck support yet. That's the reason why
this adds a new test file and use `--no-type-check` in the command line.
After the type checker is added as a follow-up, I plan to merge
https://github.com/llvm/llvm-project/blob/main/llvm/test/MC/WebAssembly/eh-assembly-legacy.s
with this file. (Turning on `-mattr=+exception-handling` adds support
for all legacy and new EH instructions in the assembly.
`-wasm-enable-exnref` in `llc` only controls which instructions to
generate and it doesn't affect `llvm-mc` and assembly parsing.)
This adds the basic assembly generation support for the final EH
proposal, which was newly adopted in Sep 2023 and advanced into Phase 4
in Jul 2024:
https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md
This adds support for the generation of new `try_table` and `throw_ref`
instruction in .s asesmbly format. This does NOT yet include
- Block annotation comment generation for .s format
- .o object file generation
- .s assembly parsing
- Type checking (AsmTypeCheck)
- Disassembler
- Fixing unwind mismatches in CFGStackify
These will be added as follow-up PRs.
---
The format for `TRY_TABLE`, both for `MachineInstr` and `MCInst`, is as
follows:
```
TRY_TABLE type number_of_catches catch_clauses*
```
where `catch_clause` is
```
catch_opcode tag+ destination
```
`catch_opcode` should be one of 0/1/2/3, which denotes
`CATCH`/`CATCH_REF`/`CATCH_ALL`/`CATCH_ALL_REF` respectively. (See
`BinaryFormat/Wasm.h`) `tag` exists when the catch is one of `CATCH` or
`CATCH_REF`.
The MIR format is printed as just the list of raw operands. The
(stack-based) assembly instruction supports pretty-printing, including
printing `catch` clauses by name, in InstPrinter.
In addition to the new instructions `TRY_TABLE` and `THROW_REF`, this
adds four pseudo instructions: `CATCH`, `CATCH_REF`, `CATCH_ALL`, and
`CATCH_ALL_REF`. These are pseudo instructions to simulate block return
values of `catch`, `catch_ref`, `catch_all`, `catch_all_ref` clauses in
`try_table` respectively, given that we don't support block return
values except for one case (`fixEndsAtEndOfFunction` in CFGStackify).
These will be omitted when we lower the instructions to `MCInst` at the
end.
LateEHPrepare now will have one more stage to covert
`CATCH`/`CATCH_ALL`s to `CATCH_REF`/`CATCH_ALL_REF`s when there is a
`RETHROW` to rethrow its exception. The pass also converts `RETHROW`s
into `THROW_REF`. Note that we still use `RETHROW` as an interim pseudo
instruction until we convert them to `THROW_REF` in LateEHPrepare.
CFGStackify has a new `placeTryTableMarker` function, which places
`try_table`/`end_try_table` markers with a necessary `catch` clause and
also `block`/`end_block` markers for the destination of the `catch`
clause.
In MCInstLower, now we need to support one more case for the multivalue
block signature (`catch_ref`'s destination's `(i32, exnref)` return
type).
InstPrinter has a new routine to print the `catch_list` type, which is
used to print `try_table` instructions.
The new test, `exception.ll`'s source is the same as
`exception-legacy.ll`, with the FileCheck expectations changed. One
difference is the commands in this file have `-wasm-enable-exnref` to
test the new format, and don't have `-wasm-disable-explicit-locals
-wasm-keep-registers`, because the new custom InstPrinter routine to
print `catch_list` only works for the stack-based instructions (`_S`),
and we can't use `-wasm-keep-registers` for them.
As in `exception-legacy.ll`, the FileCheck lines for the new tests do
not contain the whole program; they mostly contain only the control flow
instructions for readability.
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.
ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.
This allows to remove the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
Fixes#85578, a use-after-free caused by some `MCSymbolWasm` data being
freed too early.
Previously, `WebAssemblyAsmParser` owned the data that is moved to
`MCContext` by this PR, which caused problems when handling module ASM,
because the ASM parser was destroyed after parsing the module ASM, but
the symbols persisted.
The added test passes locally with an LLVM build with AddressSanitizer
enabled.
Implementation notes:
* I've called the added method
<code>allocate<b><i>Generic</i></b>String</code> and added the second
paragraph of its documentation to maybe guide people a bit on when to
use this method (based on my (limited) understanding of the `MCContext`
class). We could also just call it `allocateString` and remove that
second paragraph.
* The added `createWasmSignature` method does not support taking the
return and parameter types as arguments: Specifying them afterwards is
barely any longer and prevents them from being accidentally specified in
the wrong order.
* This removes a _"TODO: Do the uniquing of Signatures here instead of
ObjectFileWriter?"_ since the field it's attached to is also removed.
Let me know if you think that TODO should be preserved somewhere.
LLVM models some features found in the binary format with raw integers
and others with nested or enumerated types. This PR switches modeling of
tables and segments to use wasm::ValType rather than uint32_t. This NFC
change is in preparation for modeling more reference types, but IMO is
also cleaner and closer to the spec.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This finishes the work of replacing OperandMatchResultTy with
ParseStatus, started in D154101.
As a drive-by change, rename some RegNo variables to just Reg
(a leftover from the days when RegNo had 'unsigned' type).
Conventionally, parsing methods return false on success and true on
error. However, directive parsing methods need a third state: the
directive is not target specific. AsmParser::parseStatement detected
this case by using a fragile heuristic: if the target parser did not
consume any tokens, the directive is assumed to be not target-specific.
Some targets fail to follow the convention: they return success after
emitting an error or do not consume the entire line and return failure
on successful parsing. This was partially worked around by checking for
pending errors in parseStatement.
This patch tries to improve the situation by introducing parseDirective
method that returns ParseStatus -- three-state class. The new method
should eventually replace the old one returning bool.
ParseStatus is intentionally implicitly constructible from bool to allow
uses like `return Error(Loc, "message")`. It also has a potential to
replace OperandMatchResulTy as it is more convenient to use due to the
implicit construction from bool and more type safe.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D154101
When we encounter an `else`, `catch`, or `catch_all`, we currently just
push the structure `NestingType` and don't preserve the original `if`
and `try`'s signature. So after we pass `else`/`catch`/`catch_all`, we
can't check if the values on stack have the correct types when we
encounter `end_if` or `end_try`. This CL fixes the issue, and modifies
the existing test to be correct (some of them had `try` without
`catch`).
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D147881
The current code is
```
ExpectBlockType = false;
TC.setLastSig(*Signature.get());
if (ExpectBlockType)
NestingStack.back().Sig = *Signature.get();
```
Because of the first line, the third line's `if (ExpectBlockType)` is
always false and we don't get to update `NestingStack.back().Sig`. This
results in not correctly erroring out when the types of remaining values
on the stack do not match the block type if the block type is written in
the form of a function type. We should set `ExpectBlockType` to false
after the `if`.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D147837
WebAssemblyUtils depends on CodeGen which depends on all middle end
optimization libraries.
This component is used by WebAssembly's AsmParser, Disassembler, and
MCTargetDesc libraries. Because of this, any MC layer tool built with
WebAssembly support includes a larger portion of LLVM than it should.
To fix this I've created an MC only version of WebAssemblyTypeUtilities.cpp
in MCTargetDesc to be used by the MC components.
This shrinks llvm-objdump and llvm-mc on my local release+asserts
build by 5-6 MB.
Reviewed By: MaskRay, aheejin
Differential Revision: https://reviews.llvm.org/D144354
Local info is supposed to be emitted in the start of every function.
When there are locals, `.local` section should be present, and we emit
local info according to the section.
If there is no locals, empty local info should be emitted. This empty
local info is emitted whenever a first instruction is emitted within a
function without encountering a `.local` section. If there is no
instruction, `end_function` pseudo instruction should be present and the
empty local info will be emitted when parsing the pseudo instruction.
The following assembly is malformed because the function `test` doesn't
have an `end_function` at the end, and the parser doesn't end up
emitting the empty local info needed. But currently we don't error out
and silently produce an invalid binary.
```
.functype test () -> ()
test:
```
This patch adds one extra state to the Wasm assembly parser,
`FunctionLabel` to detect whether a function label is parsed but not
ended properly when the next function starts or the file ends.
It is somewhat tricky to distinguish `FunctionLabel` and
`FunctionStart`, because it is not always possible to ensure the state
goes from `FunctionLabel` -> `FunctionStart`. `.functype` directive does
not seem to be mandated before a function label, in which case we don't
know if the label is a function at the time of parsing. But when we do
know the label is function, we would like to ensure it ends with an
`end_function` properly. Also we would like to error out when it does
not.
For example,
```
.functype test() -> ()
test:
```
We should error out for this because we know `test` is a function and it
doesn't end with an `end_function`. This PR fixes this.
```
test:
```
We don't error out for this because there is no info that `test` is a
function, so we don't know whether there should be an `end_function` or
not.
```
test:
.functype test() -> ()
```
We error out for this currently already, because we currently switch to
`FunctionStart` state when we first see `.functype` directive after its
label definition.
Fixes https://github.com/llvm/llvm-project/issues/57427.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D141103
Although we only currently have one error produced in this function I am
working on changes right now that add some more. This change makes the
error location more accurate.
Differential Revision: https://reviews.llvm.org/D133016
Warn if `.size` is specified for a function symbol. The size of a
function symbol is determined solely by its content.
I noticed this simplification was possible while debugging #57427, but
this change doesn't fix that specific issue.
Differential Revision: https://reviews.llvm.org/D132929
The previous code didn't take account for the fact that parseExpression
my lex additional tokens - because of this, it's necessary to record the
location of the current token ahead of the call. This patch additionally
makes use of the fact parseExpression will set its End parameter to the
end of the expression.
Although this fix could be added independently of D122127, I've opted to
make it a child patch in order to ensure the change has some test
coverage.
Differential Revision: https://reviews.llvm.org/D122128
This addresses a series of FIXMEs introduced in D122020.
A follow-up patch (D122128) addresses the bug that is exposed by this
change (an issue with source location information when lexing
identifiers).
Differential Revision: https://reviews.llvm.org/D122127
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:
llvm/MC/MCParser/MCAsmParser.h no longer includes llvm/MC/MCParser/MCAsmLexer.h
Preprocessed lines to build llvm on my setup:
after: 1068185081
before: 1068324320
So no compile time benefit to expect, but we still get the looser coupling
between files which is great.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119359
This patch implements the intrinsic for ref.null.
In the process of implementing int_wasm_ref_null_func() and
int_wasm_ref_null_extern() intrinsics, it removes the redundant
HeapType.
This also causes the textual assembler syntax for ref.null to
change. Instead of receiving an argument: `func` or `extern`, the
instruction mnemonic is either ref.null_func or ref.null_extern,
without the need for a further operand.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D114979
To support return (it not being supported well was the ground cause for
https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have
at least a basic notion of unreachable, which in this case just means to stop
type checking until there is an end_block (an incoming control flow edge).
This is conservative (may miss on some type checking opportunities) but is
simple and an improvement over what we had before.
Differential Revision: https://reviews.llvm.org/D112953
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
This to protect against non-sensical instruction sequences being assembled,
which would either cause asserts/crashes further down, or a Wasm module being output that doesn't validate.
Unlike a validator, this type checker is able to give type-errors as part of the parsing process, which makes the assembler much friendlier to be used by humans writing manual input.
Because the MC system is single pass (instructions aren't even stored in MC format, they are directly output) the type checker has to be single pass as well, which means that from now on .globaltype and .functype decls must come before their use. An extra pass is added to Codegen to collect information for this purpose, since AsmPrinter is normally single pass / streaming as well, and would otherwise generate this information on the fly.
A `-no-type-check` flag was added to llvm-mc (and any other tools that take asm input) that surpresses type errors, as a quick escape hatch for tests that were not intended to be type correct.
This is a first version of the type checker that ignores control flow, i.e. it checks that types are correct along the linear path, but not the branch path. This will still catch most errors. Branch checking could be added in the future.
Differential Revision: https://reviews.llvm.org/D104945
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
This change was originally landed in: 5000a1b4b9edeb9e994f2a5b36da8d48599bea49
It was reverted in: 061e071d8c9b98526f35cad55a918a4f1615afd4
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
This reverts commit 5000a1b4b9edeb9e994f2a5b36da8d48599bea49.
Breaks tests, see https://reviews.llvm.org/D97657#2749151
Easily repros locally with `ninja check-llvm-mc-webassembly`.
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
This CL
1. Creates Utils/ directory under lib/Target/WebAssembly
2. Moves existing WebAssemblyUtilities.cpp|h into the Utils/ directory
3. Creates Utils/WebAssemblyTypeUtilities.cpp|h and put type
declarataions and type conversion functions scattered in various
places into this single place.
It has been suggested several times that it is not easy to share utility
functions between subdirectories (AsmParser, DIsassembler, MCTargetDesc,
...). Sometimes we ended up [[ https://reviews.llvm.org/D92840#2478863 | duplicating ]] the same function because of
this.
There are already other targets doing this: AArch64, AMDGPU, and ARM
have Utils/ subdirectory under their target directory.
This extracts the utility functions into a single directory Utils/ and
make them sharable among all passes in WebAssembly/ and its
subdirectories. Also I believe gathering all type-related conversion
functionalities into a single place makes it more usable. (Actually I
was working on another CL that uses various type conversion functions
scattered in multiple places, which became the motivation for this CL.)
Reviewed By: dschuff, aardappel
Differential Revision: https://reviews.llvm.org/D100995
This commit adds a full WasmTableType to MCSymbolWasm, differing from
the current situation (just an ElemType) in that it additionally records
a WasmLimits.
We add support for specifying the limits in .S files also, via the
following syntax variations:
.tabletype SYM, ELEMTYPE
.tabletype SYM, ELEMTYPE, MINSIZE
.tabletype SYM, ELEMTYPE, MINSIZE, MAXSIZE
Depends on D99186.
Differential Revision: https://reviews.llvm.org/D99191
The WebAssembly text and binary formats have different operand orders
for the "type" and "table" fields of call_indirect (and
return_call_indirect). In LLVM we use the binary order for the MCInstr,
but when we produce or consume the text format we should use the text
order. For compilation units targetting WebAssembly 1.0 (without the
reference types feature), we omit the table operand entirely.
Differential Revision: https://reviews.llvm.org/D97761