This replaces the existing `-wasm-enable-exnref` with
`-wasm-use-legacy-eh` option, in an effort to make the new standardized
exnref proposal the 'default' state and the legacy proposal needs to be
separately enabled an option. But given that most users haven't switched
to the new proposal and major web browsers haven't turned it on by
default, this `-wasm-use-legacy-eh` is turned on by default, so nothing
will change for now for the functionality perspective.
This also removes the restriction that `-wasm-enable-exnref` be only
used with `-wasm-enable-eh` because this option is enabled by default.
This option does not have any effect when `-wasm-enable-eh` is not used.
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM
`datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is
currently using the defaults for both, so it's inconsistent. Fix this by
adding `-i128:128` to Wasm's `datalayout` string so that it aligns
`i128` to 16 bytes too.
This is similar to
[llvm/llvm-project@dbad963](dbad963a69)
for SPARC.
This fixesrust-lang/rust#133991; see that issue for further discussion.
[defaults to aligning `__int128_t` to 16 bytes]:
f8b4182f07/clang/lib/Basic/TargetInfo.cpp (L77)
[default to aligning `i128` to 8 bytes]:
https://llvm.org/docs/LangRef.html#langref-datalayout
[Reverts d57892a2a153ab71a796f07e39d939eae6910c21]
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
---------
Co-authored-by: Paul Walker <paul.walker@arm.com>
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
This adds WebAssembly support for the new [Lime1 CPU].
First, this defines some new target features. These are subsets of
existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong
encoding for the `call_indirect` immediate, and not the actual reference
types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
Next, this defines a new target CPU, "lime1", which enables
mutable-globals,
bulk-memory-opt, multivalue, sign-ext, nontrapping-fptoint,
extended-const,
and call-indirect-overlong. Unlike the default "generic" CPU, "lime1" is
meant
to be frozen, and followed up by "lime2" and so on when new features are
desired.
[Lime1 CPU]:
https://github.com/WebAssembly/tool-conventions/blob/main/Lime.md#lime1
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
This defines some new target features. These are subsets of existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
This is split out from https://github.com/llvm/llvm-project/pull/112035.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
This patch moves the `areInlineCompatible` implementation from multiple
subclasses (`AArch64TTIImpl`, `RISCVTTIImpl`, `WebAssemblyTTIImpl`) to
the base class `BasicTTIImpl`. The new implementation checks whether the
callee's target features are a subset of the caller's, enabling
consistent behavior across targets. Subclasses now simply delegate to
the base implementation, reducing code duplication and improving
maintainability.
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
Note that PointerUnion::{is,get,dyn_cast} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
%0 = cleanuppad ...
call @destructor
cleanupret from %0 unwind label %catch.dispatch
```
If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
%0 = cleanuppad ...
invoke @throwing_func
to label %unreachale unwind label %catch.dispatch.i
catch.dispatch.i:
catchswitch ... [ label %catch.start.i ]
catch.start.i:
%1 = catchpad ...
invoke @some_function
to label %invoke.cont.i unwind label %terminate.i
invoke.cont.i:
catchret from %1 to label %destructor.exit
destructor.exit:
cleanupret from %0 unwind label %catch.dispatch
```
We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.
The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).
This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:
5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.
This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.
---
- Details about changes in the Wasm backend:
After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.
In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.
---
Fixes#114600.
This fixes unwind mismatches for the new EH spec.
The main flow is similar to that of the legacy EH's unwind mismatch
fixing. The new EH shared `fixCallUnwindMismatches` and
`fixCatchUnwindMismatches` functions, which gather the range of
instructions we need to fix their unwind destination for, with the
legacy EH. But unlike the legacy EH that uses `try`-`delegate`s to fix
them, the new EH wrap those instructions with nested
`try_table`-`end_try_table`s that jump to a "trampoline" BB, where we
rethrow (using a `throw_ref`) the exception to the correct `try_table`.
For a simple example of a call unwind mismatch, suppose if `call foo`
should unwind to the outer `try_table` but is wrapped in another
`try_table` (not shown here):
```wast
try_table
...
call foo ;; Unwind mismatch. Should unwind to the outer try_table
...
end_try_table
```
Then we wrap the call with a new nested `try_table`-`end_try_table`, add
a `block` / `end_block` right inside the target `try_table`, and make
the nested `try_table` jump to it using a `catch_all_ref` clause, and
rethrow the exception using a `throw_ref`:
```wast
try_table
block $l0 exnref
...
try_table (catch_all_ref $l0)
call foo
end_try_table
...
end_block ;; Trampoline BB
throw_ref
end_try_table
```
---
This fixes two existing bugs. These are not easy to test independently
without the unwind mismatch fixing. The first one is how we calculate
`ScopeTops`. Turns out, we should do it in the same way as in the legacy
EH even though there is no `end_try` at the end of `catch` block
anymore. `nested_try` in `cfg-stackify-eh.ll` tests this case.
The second bug is in `rewriteDepthImmediates`. `try_table`'s immediates
should be computed without the `try_table` itself, meaning
```wast
block
try_table (catch ... 0)
end_try_table
end_block
```
Here 0 should target not `end_try_table` but `end_block`. This bug
didn't crash the program because `placeTryTableMarker` generated only
the simple form of `try_table` that has a single catch clause and an
`end_block` follows right after the `end_try_table` in the same BB, so
jumping to an `end_try_table` is the same as jumping to the `end_block`.
But now we generate `catch` clauses with depths greater than 0 with when
fixing unwind mismatches, which uncovered this bug.
---
One case that needs a special treatment was when `end_loop` precedes an
`end_try_table` within a BB and this BB is a (true) unwind destination
when fixing unwind mismatches. In this case we need to split this
`end_loop` into a predecessor BB. This case is tested in
`unwind_mismatches_with_loop` in `cfg-stackify-eh.ll`.
---
`cfg-stackify-eh.ll` contains mostly the same set of tests with the
existing `cfg-stackify-eh-legacy.ll` with the updated FileCheck
expectations. As in `cfg-stackify-eh-legacy.ll`, the FileCheck lines
mostly only contain control flow instructions and calls for readability.
- `nested_try` and `unwind_mismatches_with_loop` are added to test newly
found bugs in the new EH.
- Some tests in `cfg-stackify-eh-legacy.ll` about the legacy-EH-specific
asepcts have not been added to `cfg-stackify-eh.ll`.
(`remove_unnecessary_instrs`, `remove_unnecessary_br`,
`fix_function_end_return_type_with_try_catch`, and
`branch_remapping_after_fixing_unwind_mismatches_0/1`)
WebAssembly's `memory.fill` and `memory.copy` instructions trap if the
pointers are out of bounds, even if the length is zero. This is
different from LLVM, which expects that it can call `memcpy` on
arbitrary invalid pointers if the length is zero. To avoid spurious
traps, branch around `memory.fill` and `memory.copy` when the length is
zero.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
This commit implements the [wide-arithmetic] proposal which has recently
reached phase 2 in the WebAssembly proposals process. The goal here is
to implement support in LLVM for emitting these instructions which are
gated behind a new feature flag by default. A new `wide-arithmetic`
feature flag is introduced which gates these four new instructions from
being emitted.
Emission of each instruction itself is relatively simple given LLVM's
preexisting lowering rules and infrastructure. The main gotcha is that
due to the multi-result nature of all of these instructions it needed
the lowerings to be implemented in C++ rather than in TableGen.
[wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `WebAssemblyRuntimeLibcallSignatures.cpp`: Add `RTLIB::ATAN2*` to
RuntimeLibcallSignatureTable
- Add atan2 calls to `CodeGen/WebAssembly/libcalls-trig.ll` and update
test checks
Part of: Implement the atan2 HLSL Function #70096.
This fixes a problem introduced in #80094. That PR copied negative
features from the TargetMachine to the end of the feature string. This
is not correct, because even if we have a baseline TM of say `-simd128`,
but a function with `+simd128`, the coalesced feature string should have
`+simd128`, not `-simd128`.
To address the original motivation of that PR, we should instead
explicitly materialize the negative features in the target feature
string, so that explicitly disabled default features are honored.
Unfortunately, there doesn't seem to be any way to actually test this
using llc, because `-mattr` appends the specified features to the end of
the `"target-features"` attribute. I've tested this locally by making it
prepend the features instead.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
This adds supports for the new EH instructions (`try_table` and
`throw_ref`) to the type checker.
One thing I'd like to improve on is the locations in the errors for
`catch_***` clauses. Currently they just point to the starting column of
`try_table` instruction itself. But to figure out where catch clauses
start you need to traverse `OperandVector` and check
`WebAssemblyOperand::isCatchList` on them to see which one is the catch
list operand, but `WebAssemblyOperand` class is in AsmParser and
AsmTypeCheck does not have access to it:
cdfdc857cb/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp (L43-L204)
And even if AsmTypeCheck has access to it, currently it treats the list
of catch clauses as a single `WebAssemblyOperand` so there is no way to
get the starting location of each `catch_***` clause in the current
structure.
This also renames `valTypeToStackType` to `valTypesToStackTypes`, given
that it takes two type lists.
When there was a type checking error, we didn't run `InstPrinter`. This
can be confusing because when there is an error in, say, block parameter
type, `InstPrinter` doesn't run even if it has nothing to do with block
parameter types, and all those updates to `ControlFlowStack` or
`TryStack` do not happen:
c20b90ab85/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.cpp (L135-L151)
For example,
```wast
block (i32) -> () ;; Block input parameter error
end_block ;; Now this errors out as "End marker mismatch"
```
This is confusing because there is a `block` and the `end_block` is not
a mismatch. Only that `block` has a type checking error, but that's not
an end marker mismatch.
I think we can just print the instruction whether we had a type checking
error or not, and this will be less confusing.
This makes the type checker handle blocks with input parameters and
return types, branches, and polymorphic stacks correctly.
We maintain the stack of "block info", which contains its input
parameter type, return type, and whether it is a loop or not. And this
is used when checking the validity of the value stack at the `end`
marker and all branches targeting the block.
`StackType` now supports a new variant `Polymorphic`, which indicates
the stack is in the polymorphic state. `Polymorphic`s are not popped
even when `popType` is executed; they are only popped when the current
block ends.
When popping from the value stack, we ensure we don't pop more than we
are allowed to at the given block level and print appropriate error
messages instead. Also after a block ends, the value stack is guaranteed
to have the right types based on the block return type. For example,
```wast
block i32
unreachable
end_block
;; You can expect to have an i32 on the stack here
```
This also adds handling for `br_if`. Previously only `br`s were checked.
`checkEnd` and `checkBr` were removed and their contents have been
inlined to the main `typeCheck` function, because they are called only
from a single callsite.
This also fixes two existing bugs in AsmParser, which were required to
make the tests passing. I added Github comments about them inline.
This modifies several existing invalid tests, those that passed
(incorrectly) before but do not pass with the new type checker anymore.
Fixes#107524.
Now that we support 'any' type in the value stack in the checker, this
uses it in more places.
When an instruction pops multiple values, rather than popping in one by
one and generating multiple error messages, it adds them to a vector and
pops them at once. When the type to be popped is not clear, it pops
'any', at least makes sure there are correct number of values on the
stack. So for example, in case of `table.fill`, which expects `[i32 t
i32]` (where t is the type of the elements in the table), it pops them
at once, generating an error message like
```console
error: type mismatch, expected [i32, externref, i32] but got [...]
```
In case the table is invalid so we don't know the type, it tries to pop
an 'any' instead, popping whatever value there is:
```console
error: type mismatch, expected [i32, any, i32] but got [...]
```
Checks done on other instructions based on the register info are already
popping and pushing types in vectors, after #110094:
a52251675f/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.cpp (L515-L536)
This also pushes 'any' in case the type to push is unclear. For example,
`local/global.set` pushes a value of the type specified in the local or
global, but in case that local or global is invalid, we push 'any'
instead, which will match with whatever type.
The objective of all these is not to make one instruction's error
propragate continuously into subsequent instructions. This also matches
Wabt's behavior.
This also renames `checkAndPopTypes` to just `popTypes`, to be
consistent with a single-element version `popType`. `popType(s)` also
does type checks.
This unifies the way we check types in various places in AsmTypeCheck.
The objectives of this PR are:
- We now use `checkTypes` for all type checking and `checkAndPopTypes`
for type checking + popping. All other functions are helper functions to
call these two functions.
- We now support comparisons of types between vectors. This lets us
printing error messages in more readable way. When an instruction takes
[i32, i64] but the stack top is [f32, f64], now instead of
```console
error: type mismatch, expected i64 but got f64
error: type mismatch, expected i32 but got f32
```
we can print this
```console
error: type mismatch, expected [i32, i64] but got [f32, f64]
```
which is also the format Wabt checker prints. This also helps printing
more meaningful messages when there are superfluous values on the stack
at the end of the function, such as:
```console
error: type mismatch, expected [] but got [i32, exnref]
```
Actually, many instructions are not utilizing this batch printing now,
which still causes multiple error messages to be printed for a single
instruction. This will be improved in a follow-up.
- The value stack now supports `Any` and `Ref`. There are instructions
that requires the type to be anything. Also instructions like
`ref.is_null` requires the type to be any reference types. Type
comparison function will handle this types accordingly, meaning
`match(I32, Any)` or `match(externref, Ref)` will succeed.
The changes in `type-checker-errors.s` are mostly the message format
changes. One downside of the new message format is that it doesn't have
instruction names in it. I plan to improve that in a potential
follow-up.
This also made some modifications in the instructions in
`type-checker-errors.s`. Currently, except for a few functions I've
recently added at the end, each function tests for a single error,
because the type checker used to bail out after the first error until
#109705. But many functions included multiple errors anyway, which I
don't think was the intention of the original writer. So I added some
instructions to remove the other errors which are not being tested. (In
some cases I added more error checking lines instead, when I felt that
could be relevant.)
Thanks to the new `ExactMatch` option in `checkTypes` function family,
we now can distinguish the cases when to check against only the top of
the value stack and when to check against the whole stack (e.g. to check
whether we have any superfluous values remaining at the end of the
function). `return` or `return_call(_indirect)` can set `ExactMatch` to
`false` because they don't care about the superfluous values. This makes
`type-checker-return.s` succeed and I was able to remove the `FIXME`.
This is the basis of the PR that fixes block parameter/return type
handling in the checker, but does not yet include the actual
block-related functionality, which will be submitted separately after
this PR.
This allows multiple errors to be reported within a function, rather
than returning on the first error and not looking at the rest of the
function.
I think the rationale for the previous behavior was that upon
encountering the first error, the value stack was not in the correct
status anymore and the rest of the function checking was not very
meaningful. But this patch makes the instruction push the correct result
type upon its completion, so the while the possibility of previous error
affecting later instructions is not zero, I think this can be more
helpful to assembly hand-writers. This also allows us to write multiple
error test cases without creating as many functions.
This is what Wabt and Binaryen wast checker/validator do as well.
Also this makes sure we return a value (true/false) within an `if` for
each instruction, removing the need for the long `if`-`else if`-`else
if` chain and making them all just `if`s. I also added newlines between
the `if`s, which I feel is easier to read.
Fixes are mostly one of these:
- `auto` -> `auto *` when the type is a pointer
- Function names start with a lowercase letter
- Variable names start with an uppercase letter
- No need to have an `else` after a `return`
Diff without whitespaces is easier to view.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
This adds support for binary generation for the new EH proposal.
So far the only case that we emitted variable immediate operands in
binary has been `br_table`'s destinations. (Other `variable_ops` uses in
TableGen files are register operands, such as the operands of `call`, so
they don't get emitted in binary as a part of the same instruction.)
With this PR, variable immediate operands can include `try_table`'s
operands:
- The number of of catch clauses
- catch clauses sub-opcodes
- `catch`: 0x00
- `catch_ref`: 0x01
- `catch_all`: 0x02
- `catch_all_ref`: 0x03
- catch clauses' destinations
With `try_table`, we now have variable expr operands for `try_table`'s
catch clauses' tags. We treat their fixups in the same way we do for
tags in other instructions such as in `throw`.
Diff without whitespace will be easier to view.