1210 Commits

Author SHA1 Message Date
David Green
6c27817294
[SelectionDAG] Use SimplifyDemandedBits from SimplifyDemandedVectorElts Bitcast. (#133717)
This adds a call to SimplifyDemandedBits from bitcasts with scalar input
types in SimplifyDemandedVectorElts, which can help simplify the input
scalar.
2025-04-03 11:14:08 +01:00
Florian Hahn
3bdf9a0880
[EquivalenceClasses] Use SmallVector for deterministic iteration order. (#134075)
Currently iterators over EquivalenceClasses will iterate over std::set,
which guarantees the order specified by the comperator. Unfortunately in
many cases, EquivalenceClasses are used with pointers, so iterating over
std::set of pointers will not be deterministic across runs.

There are multiple places that explicitly try to sort the equivalence
classes before using them to try to get a deterministic order
(LowerTypeTests, SplitModule), but there are others that do not at the
moment and this can result at least in non-determinstic value naming in
Float2Int.

This patch updates EquivalenceClasses to keep track of all members via a
extra SmallVector and removes code from LowerTypeTests and SplitModule
to sort the classes before processing.

Overall it looks like compile-time slightly decreases in most cases, but
close to noise:

https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions

PR: https://github.com/llvm/llvm-project/pull/134075
2025-04-02 20:27:43 +01:00
Sam Clegg
a30caa6a73
[WebAssembly] Add missing tests from #133289 (#133938) 2025-04-01 10:47:35 -07:00
Alex Crichton
a415b7f86e
[WebAssembly] Add more lowerings for wide-arithmetic (#132430)
This commit is the result of investigation and discussion on
WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128`
instruction were discussed but ultimately deferred to a future proposal.
In spite of this though I wanted to apply a few changes to the LLVM
backend here with `wide-arithmetic` enabled for a few minor changes:

* A lowering for the `ISD::UADDO` node is added which uses `add128`
where the upper bits of the two operands are constant zeros and the
result of the 128-bit addition is the result of the overflowing
addition.
* The high bits of a `I64_ADD128` node are now flagged as "known zero"
if the upper bits of the inputs are also zero, assisting this `UADDO`
lowering to ensure the backend knows that the carry result is a 1-bit
result.

A few tests were then added to showcase various lowerings for various
operations that can be done with wide-arithmetic. They don't all
optimize super well at this time but I wanted to add them as a reference
here regardless to have them on-hand for future evaluations if
necessary.
2025-03-31 11:36:32 -07:00
Sam Parker
103119a435
[WebAssembly] Lower wide SIMD i8 muls (#130785)
Currently, 'wide' i32 simd multiplication, with extended i8 elements,
will perform the multiplication with i32 So, for IR like the following:
```
  %wide.a = sext <8 x i8> %a to <8 x i32>
  %wide.b = sext <8 x i8> %a to <8 x i32>
  %mul = mul <8 x i32> %wide.a, %wide.b
  ret <8 x i32> %mul
```

We would generate the following sequence:
```
  i16x8.extend_low_i8x16_s $push6=, $1
  local.tee $push5=, $3=, $pop6
  i32x4.extmul_low_i16x8_s $push0=, $pop5, $3
  v128.store 0($0), $pop0
  i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  i16x8.extend_low_i8x16_s $push4=, $pop1
  local.tee $push3=, $1=, $pop4
  i32x4.extmul_low_i16x8_s $push2=, $pop3, $1
  v128.store 16($0), $pop2
  return
```

But now we perform the multiplication with i16, resulting in:
```
  i16x8.extmul_low_i8x16_s $push3=, $1, $1
  local.tee $push2=, $1=, $pop3
  i32x4.extend_high_i16x8_s $push0=, $pop2
  v128.store 16($0), $pop0
  i32x4.extend_low_i16x8_s $push1=, $1
  v128.store 0($0), $pop1
  return
```
2025-03-21 06:57:57 +00:00
yonghong-song
0ffe83feac
[SelectionDAG] Not issue TRAP node if naked function (#132147)
In [1], Nikita Popov suggested that during lowering 'unreachable' insn
should not generate extra code for naked functions, and this applies to
all architectures. Note that for naked functions, 'unreachable' insn is
necessary in IR since the basic block needs a terminator to end.

This patch checked whether a function is naked function or not. If it is
a naked function, 'unreachable' insn will not generate ISD::TRAP.

  [1] https://github.com/llvm/llvm-project/pull/131731

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
2025-03-20 18:18:03 -07:00
Heejin Ahn
494fe0b414
[WebAssembly] Remove wasm-specific findWasmUnwindDestinations (#130374)
Unlike in Itanium EH IR, WinEH IR's unwinding instructions (e.g.
`invoke`s) can have multiple possible unwind destinations.

For example:
```ll
entry:
  invoke void @foo()
     to label %cont unwind label %catch.dispatch

catch.dispatch:                                ; preds = %entry
  %0 = catchswitch within none [label %catch.start] unwind label %terminate

catch.start:                                   ; preds = %catch.dispatch
  %1 = catchpad within %0 [ptr null]
  ...

terminate:                                     ; preds = %catch.dispatch
  %2 = catchpad within none []
  ...
...
```

In this case, if an exception is not caught by `catch.dispatch` (and
thus `catch.start`), it should next unwind to `terminate`.
`findUnwindDestination` in ISel gathers the list of this unwind
destinations traversing the unwind edges:

ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2089-L2150)
But we don't use that, and instead use our custom
`findWasmUnwindDestinations` that only adds the first unwind
destination, `catch.start`, to the successor list of `entry`, and not
`terminate`:

ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2037-L2087)

The reason behind it was, as described in the comment block in the code,
it was assumed that there always would be an `invoke` that connects
`catch.start` and `terminate`. In case of `catch (type)`, there will be
`call void @llvm.wasm.rethrow()` in `catch.start`'s predecessor that
unwinds to the next destination. For example:

0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L429-L430)
In case of `catch (...)`, `__cxa_end_catch` can throw, so it becomes an
`invoke` that unwinds to the next destination. For example:
0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L537-L538)
So the unwind ordering relationship between `catch.start` and
`terminate` here would be preserved.

But turns out this assumption does not always hold. For example:
```ll
entry:
  invoke void @foo()
     to label %cont unwind label %catch.dispatch

catch.dispatch:                                ; preds = %entry
  %0 = catchswitch within none [label %catch.start] unwind label %terminate

catch.start:                                   ; preds = %catch.dispatch
  %1 = catchpad within %0 [ptr null]
  ...
  call void @_ZSt9terminatev()
  unreachable

terminate:                                     ; preds = %catch.dispatch
  %2 = catchpad within none []
  call void @_ZSt9terminatev()
  unreachable
...
```
In this case there is no `invoke` that connects `catch.start` to
`terminate`. So after `catch.dispatch` BB is removed in ISel,
`terminate` is considered unreachable and incorrectly removed in DCE.

This makes Wasm just use the general `findUnwindDestination`. In that
case `entry`'s successor is going to be [`catch.start`, `terminate`]. We
can get the first unwind destination by just traversing the list from
the front.

---

This required another change in WinEHPrepare. WinEHPrepare demotes all
PHIs in EH pads because they are funclets in Windows and funclets can't
have PHIs. When used in Wasm they are not funclets so we don't need to
do that wholesale but we still need to demote PHIs in `catchswitch` BBs
because they are deleted during ISel. (So we created
[`-demote-catchswitch-only`](a5588b6d20/llvm/lib/CodeGen/WinEHPrepare.cpp (L57-L59))
option for that)

But turns out we need to remove PHIs that have a `catchswitch` BB as an
incoming block too:
```ll
...
catch.dispatch:
  %0 = catchswitch within none [label %catch.start] unwind label %terminate

catch.start:
  ...

somebb:
  ...

ehcleanup                                      ; preds = %catch.dispatch, %somebb
  %1 = phi i32 [ 10, %catch.dispatch ], [ 20, %somebb ]
  ...
```
In this case the `phi` in `ehcleanup` BB should be demoted too because
`catch.dispatch` BB will be removed in ISel so one if its incoming block
will be gone. This pattern didn't manifest before presumably due to how
`findWasmUnwindDestinations` worked. (In this example, in our
`findWasmUnwindDestinations`, `catch.dispatch` would have had only one
successor, `catch.start`. But now `catch.dispatch` has both
`catch.start` and `ehcleanup` as successors, revealing this bug. This
case is
[represented](ab87206c4b/llvm/test/CodeGen/WebAssembly/exception.ll (L445))
by `rethrow_terminator` function in `exception.ll` (or
`exception-legacy.ll`) and without the WinEHPrepare fix it will crash.

---

Discovered by the reproducer provided in #126916, even though the bug
reported there was not this one.
2025-03-10 20:56:38 -07:00
Derek Schuff
6916438b65
[WebAssembly] Add Libcall signatures for modf and variants (#130201)
Clang now lowers modf/modff/modfl as builtins using the llvm.modf
intrinsic.
2025-03-06 15:48:39 -08:00
Daniel Paoliello
16e051f0b9
[win] NFC: Rename EHCatchret to EHCont to allow for EH Continuation targets that aren't catchret instructions (#129953)
This change splits out the renaming and comment updates from #129612 as a non-functional change.
2025-03-06 09:28:44 -08:00
Sam Clegg
147d9d6915
[WebAssemblyLowerEmscriptenEHSjLj] Avoid setting import_name where possible (#128564)
This change effectively reverts 296ccef
(https://reviews.llvm.org/D77192)

Most of these symbols are just normal C symbols that get imported from
wither libcompiler-rt or from emscripten's JS library code. In most
cases it should not be necessary to give them explicit import names.

The advantage of doing this is that we can wasm-ld can/will fail with a
useful error message when these symbols are missing. As opposed to today
where it will simply import them and defer errors until later (when they
are less specific).
2025-02-26 14:05:00 -08:00
Brendan Dahl
9102afcd01
[WebAssembly] Use the same lowerings for f16x8 as other float vectors. (#127897)
This fixes failures to select the various compare operations that
weren't being expanded for f16x8.
2025-02-25 11:01:32 -08:00
Brendan Dahl
67056c280a
[WebAssembly] Support shuffle for F16x8 vectors. (#127857) 2025-02-25 10:39:54 -08:00
Heejin Ahn
d2d469eb79
[WebAssembly] Make llvm.wasm.throw invokable (#128104)
`llvm.wasm.throw` intrinsic can throw but it was not invokable. Not sure
what the rationale was when it was first written that way, but I think
at least in Emscripten's C++ exception support with the Wasm port of
libunwind, `__builtin_wasm_throw`, which is lowered down to
`llvm.wasm.rethrow`, is used only within `_Unwind_RaiseException`, which
is an one-liner and thus does not need an `invoke`:
720e97f76d/system/lib/libunwind/src/Unwind-wasm.c (L69)
(`_Unwind_RaiseException` is called by `__cxa_throw`, which is generated
by the `throw` C++ keyword)

But this does not address other direct uses of the builtin in C++, whose
use I'm not sure about but is not prohibited. Also other language
frontends may need to use the builtin in different functions, which has
`try`-`catch`es or destructors.

This makes `llvm.wasm.throw` invokable in the backend. To do that, this
adds a custom lowering routine to `SelectionDAGBuilder::visitInvoke`,
like we did for `llvm.wasm.rethrow`.

This does not generate `invoke`s for `__builtin_wasm_throw` yet, which
will be done by a follow-up PR.

Addresses #124710.
2025-02-25 09:53:01 -08:00
Sam Parker
ea7897a617
[WebAssembly] Enable interleaved memory accesses (#125696)
Enable the vectorizer to access interleaved memory. This means that,
when it's decided to be profitable, the memory accesses can be
vectorized instead of the value being built up by a sequence of
load_lane instructions. This will often increase the vectorization
factor of the loop, leading to significantly better performance.

I run a reasonably large collection of benchmarks and most are not
affected by this change, with most performance changes <1%. But I see a
2.5% speedup for the total run time of TSVC, 1% speedup for SPEC2017
x265, 28% speedup for a ResNet workload and 95% for libyuv. This is
running V8 on an AArch64 box.
2025-02-17 09:09:52 +00:00
Sam Parker
948a8477c6
[WebAssembly] Recognise EXTEND_HIGH (#123325)
When lowering EXTEND_VECTOR_INREG, check whether the operand is a
shuffle that is moving the top half of a vector into the lower half. If
so, we can EXTEND_HIGH the input to the shuffle instead.
2025-02-17 09:04:29 +00:00
Sam Parker
df2de13695
[WebAssembly] Autovec support for dot (#123207)
Enable the use of partial.reduce.add that we can lower to dot or a tree
of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support
both v8i16 and v16i8 inputs.
2025-02-03 08:58:43 +00:00
Sam Parker
28d7880618
[WebAssembly] getMemoryOpCost and getCastInstrCost (#122896)
Add inital implementations of these TTI methods for SIMD types. For
casts, The costing covers the free extensions provided by extmul_low as
well as extend_low. For memory operations we consider the use of
load32_zero and load64_zero, as well as full width v128 loads.
2025-01-31 10:33:31 +00:00
Heejin Ahn
539b2e0654
[WebAssembly] Fix catch block type in wasm64 (#124381)
`try_table`'s `catch` or `catch_ref`'s target block's return type should
be `i64` and `(i64, exnref)` in case of wasm64.
2025-01-27 11:01:48 -08:00
Heejin Ahn
c3dfd34e54
[WebAssembly] Add unreachable before catch destinations (#123915)
When `try_table`'s catch clause's destination has a return type, as in
the case of catch with a concrete tag, catch_ref, and catch_all_ref. For
example:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...
```

This code is not valid because the block's body type is not exnref. So
we add an unreachable after the 'end_try_table' to make the code valid
here:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block
```
Because 'unreachable' is a terminator we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the
code below, we create a "trampoline BB" that will be the destination for
the nested `try_table`~`end_try_table` added to fix a unwind mismatch:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
While the `block` added for the trampoline BB has the return type
`exnref`, its body, which contains the nested `try_table` and other
code, wouldn't have the `exnref` return type. Most times it didn't
become a problem because the block's body ended with something like `br`
or `return`, but that may not always be the case, especially when there
is a loop. So we add an `unreachable` to make the code valid here too:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
In this case we just append the `unreachable` at the end of the layout
predecessor BB. (This was tricky to do in the first (non-mismatch) case
because there `end_try_table` and `end_block` were added in the
beginning of an EH pad in `placeTryTableMarker` and moving
`end_try_table` and the new `unreachable` to the previous BB caused
other problems.)

---

This adds many `unreaachable`s to the output, but this adds
`unreachable` to only a few places to see if this is working. The
FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already
heavily redacted to only leave important control-flow instructions, so I
don't think it's worth adding `unreachable`s everywhere.
2025-01-22 22:39:43 -08:00
Matt Arsenault
5e79ae60a6
DAG: Fix vector_shuffle -> splat fold defining undef lanes (#123596)
For shuffle vector splats with undef lanes in the mask,
this was introducing real values. Filter out build_vector
results based on the undef elements in the mask.

This avoids AMDGPU test regressions in a future change.

test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse
but I didn't investigate.
2025-01-21 23:55:50 +07:00
Heejin Ahn
a8e1135baa
[WebAssembly] Add -wasm-use-legacy-eh option (#122158)
This replaces the existing `-wasm-enable-exnref` with
`-wasm-use-legacy-eh` option, in an effort to make the new standardized
exnref proposal the 'default' state and the legacy proposal needs to be
separately enabled an option. But given that most users haven't switched
to the new proposal and major web browsers haven't turned it on by
default, this `-wasm-use-legacy-eh` is turned on by default, so nothing
will change for now for the functionality perspective.

This also removes the restriction that `-wasm-enable-exnref` be only
used with `-wasm-enable-eh` because this option is enabled by default.
This option does not have any effect when `-wasm-enable-eh` is not used.
2025-01-09 22:36:10 -08:00
Dan Gohman
c5ab70c508
[WebAssembly] Add -i128:128 to the datalayout string. (#119204)
Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM
`datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is
currently using the defaults for both, so it's inconsistent. Fix this by
adding `-i128:128` to Wasm's `datalayout` string so that it aligns
`i128` to 16 bytes too.

This is similar to
[llvm/llvm-project@dbad963](dbad963a69)
for SPARC.

This fixes rust-lang/rust#133991; see that issue for further discussion.

[defaults to aligning `__int128_t` to 16 bytes]:
f8b4182f07/clang/lib/Basic/TargetInfo.cpp (L77)
[default to aligning `i128` to 8 bytes]:
https://llvm.org/docs/LangRef.html#langref-datalayout
2024-12-10 09:21:58 -08:00
Dan Gohman
e665e781dc
[SelectionDAG] Use the nuw flag when expanding loads. (#119288)
When expanding a load into two loads, use nuw for the add that computes
the offset from the base of the second load, because the original load
doesn't straddle the address space.

It turns out there's already a dedicated helper function for doing this,
`getObjectPtrOffset`.

This is in target-independent code, however in practice it only seems to
affact WebAssembly code, because WebAssembly load and store
instructions' constant offsets don't perform wrapping, so constant
folding often depends on the nuw flag being present.

This was noticed in the development of #119204.
2024-12-10 06:28:09 -08:00
Dan Gohman
35cce408ee
[WebAssembly] Support the new "Lime1" CPU (#112035)
This adds WebAssembly support for the new [Lime1 CPU].

First, this defines some new target features. These are subsets of
existing
features that reflect implementation concerns:

- "call-indirect-overlong" - implied by "reference-types"; just the
overlong
encoding for the `call_indirect` immediate, and not the actual reference
   types.

 - "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
   `memory.fill`, and not the other instructions in the bulk-memory
    proposal.

Next, this defines a new target CPU, "lime1", which enables
mutable-globals,
bulk-memory-opt, multivalue, sign-ext, nontrapping-fptoint,
extended-const,
and call-indirect-overlong. Unlike the default "generic" CPU, "lime1" is
meant
to be frozen, and followed up by "lime2" and so on when new features are
desired.

[Lime1 CPU]:
https://github.com/WebAssembly/tool-conventions/blob/main/Lime.md#lime1

---------

Co-authored-by: Heejin Ahn <aheejin@gmail.com>
2024-12-03 16:35:23 -08:00
Dan Gohman
c3536b263f
[WebAssembly] Define call-indirect-overlong and bulk-memory-opt features (#117087)
This defines some new target features. These are subsets of existing
features that reflect implementation concerns:

- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.

- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.

This is split out from https://github.com/llvm/llvm-project/pull/112035.

---------

Co-authored-by: Heejin Ahn <aheejin@gmail.com>
2024-12-02 17:08:07 -08:00
Sam Clegg
ea58410d0f
[WebAssembly] Implement %llvm.thread.pointer intrinsic (#117817)
We can simply use the `__tls_base` global for this which is guaranteed
to be non-zero and unique per thread.

Fixes: #117433
2024-11-26 17:19:14 -08:00
Heejin Ahn
492812f613
[WebAssembly] Fix rethrow's index calculation (#114693)
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @destructor
  cleanupret from %0 unwind label %catch.dispatch
```

If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachale unwind label %catch.dispatch.i

catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]

catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i

invoke.cont.i:
  catchret from %1 to label %destructor.exit

destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```

We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.

The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).

This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:

5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.

This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.

---

- Details about changes in the Wasm backend:

After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.

In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.

---

Fixes #114600.
2024-11-05 21:45:13 -08:00
Heejin Ahn
380fd09d98
[WebAssembly] Fix unwind mismatches in new EH (#114361)
This fixes unwind mismatches for the new EH spec.

The main flow is similar to that of the legacy EH's unwind mismatch
fixing. The new EH shared `fixCallUnwindMismatches` and
`fixCatchUnwindMismatches` functions, which gather the range of
instructions we need to fix their unwind destination for, with the
legacy EH. But unlike the legacy EH that uses `try`-`delegate`s to fix
them, the new EH wrap those instructions with nested
`try_table`-`end_try_table`s that jump to a "trampoline" BB, where we
rethrow (using a `throw_ref`) the exception to the correct `try_table`.

For a simple example of a call unwind mismatch, suppose if `call foo`
should unwind to the outer `try_table` but is wrapped in another
`try_table` (not shown here):
```wast
try_table
  ...
  call foo    ;; Unwind mismatch. Should unwind to the outer try_table
  ...
end_try_table
```

Then we wrap the call with a new nested `try_table`-`end_try_table`, add
a `block` / `end_block` right inside the target `try_table`, and make
the nested `try_table` jump to it using a `catch_all_ref` clause, and
rethrow the exception using a `throw_ref`:
```wast
try_table
  block $l0 exnref
    ...
    try_table (catch_all_ref $l0)
      call foo
    end_try_table
    ...
  end_block             ;; Trampoline BB
  throw_ref
end_try_table
```

---

This fixes two existing bugs. These are not easy to test independently
without the unwind mismatch fixing. The first one is how we calculate
`ScopeTops`. Turns out, we should do it in the same way as in the legacy
EH even though there is no `end_try` at the end of `catch` block
anymore. `nested_try` in `cfg-stackify-eh.ll` tests this case.

The second bug is in `rewriteDepthImmediates`. `try_table`'s immediates
should be computed without the `try_table` itself, meaning
```wast
block
  try_table (catch ... 0)
  end_try_table
end_block
```
Here 0 should target not `end_try_table` but `end_block`. This bug
didn't crash the program because `placeTryTableMarker` generated only
the simple form of `try_table` that has a single catch clause and an
`end_block` follows right after the `end_try_table` in the same BB, so
jumping to an `end_try_table` is the same as jumping to the `end_block`.
But now we generate `catch` clauses with depths greater than 0 with when
fixing unwind mismatches, which uncovered this bug.

---

One case that needs a special treatment was when `end_loop` precedes an
`end_try_table` within a BB and this BB is a (true) unwind destination
when fixing unwind mismatches. In this case we need to split this
`end_loop` into a predecessor BB. This case is tested in
`unwind_mismatches_with_loop` in `cfg-stackify-eh.ll`.

---

`cfg-stackify-eh.ll` contains mostly the same set of tests with the
existing `cfg-stackify-eh-legacy.ll` with the updated FileCheck
expectations. As in `cfg-stackify-eh-legacy.ll`, the FileCheck lines
mostly only contain control flow instructions and calls for readability.
- `nested_try` and `unwind_mismatches_with_loop` are added to test newly
found bugs in the new EH.
- Some tests in `cfg-stackify-eh-legacy.ll` about the legacy-EH-specific
asepcts have not been added to `cfg-stackify-eh.ll`.
(`remove_unnecessary_instrs`, `remove_unnecessary_br`,
`fix_function_end_return_type_with_try_catch`, and
`branch_remapping_after_fixing_unwind_mismatches_0/1`)
2024-11-05 09:40:41 -08:00
Heejin Ahn
d1d3e4b273
[WebAssembly] Add/Reorder legacy EH tests (#114363)
These tests are added to match the standard EH tests in #114361:
- `nested_try`
- `unwind_mismatches_with_loop`

These tests are useful to test certain aspects of the new EH but I think
they add more coverage to the legaacy tests as well.

And `unstackify_when_fixing_unwind_mismatch` and `unwind_mismatches_5`
have not changed; they have been just moved.

This also fixes some comments.
2024-11-04 16:07:52 -08:00
Dan Gohman
1bc2cd98c5
[WebAssembly] Enable nontrapping-fptoint and bulk-memory by default. (#112049)
We were prepared to enable these features [back in February], but they
got pulled for what appear to be unrelated reasons. So let's have
another try at enabling them!

Another motivation here is that it'd be convenient for the [Lime1
proposal] if "lime1" is close to a subset of "generic" (missing only
for extended-const).

[back in February]:
https://github.com/WebAssembly/tool-conventions/issues/158#issuecomment-1931119512
[Lime1 proposal]: https://github.com/llvm/llvm-project/pull/112035
2024-10-25 13:52:51 -07:00
Dan Gohman
118445841d
[WebAssembly] Protect memory.fill and memory.copy from zero-length ranges. (#112617)
WebAssembly's `memory.fill` and `memory.copy` instructions trap if the
pointers are out of bounds, even if the length is zero. This is
different from LLVM, which expects that it can call `memcpy` on
arbitrary invalid pointers if the length is zero. To avoid spurious
traps, branch around `memory.fill` and `memory.copy` when the length is
zero.

---------

Co-authored-by: Heejin Ahn <aheejin@gmail.com>
2024-10-24 14:13:58 -07:00
Alex Crichton
c2293b33dd
[WebAssembly] Implement the wide-arithmetic proposal (#111598)
This commit implements the [wide-arithmetic] proposal which has recently
reached phase 2 in the WebAssembly proposals process. The goal here is
to implement support in LLVM for emitting these instructions which are
gated behind a new feature flag by default. A new `wide-arithmetic`
feature flag is introduced which gates these four new instructions from
being emitted.

Emission of each instruction itself is relatively simple given LLVM's
preexisting lowering rules and infrastructure. The main gotcha is that
due to the multi-result nature of all of these instructions it needed
the lowerings to be implemented in C++ rather than in TableGen.

[wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic
2024-10-23 11:39:58 -07:00
Heejin Ahn
5c92f2331c
[WebAssembly] Fix MIR printing of reference types (#113028)
When printing a memory operand in MIR, this line

d37bc32a65/llvm/lib/CodeGen/MachineOperand.cpp (L1247)
calls this
d37bc32a65/llvm/include/llvm/Support/Alignment.h (L238)
which assumes `Rhs` (the size in this case) is positive.

But Wasm reference types' size is set to 0:
d37bc32a65/llvm/include/llvm/CodeGen/ValueTypes.td (L326-L328)

`getSize() > 0` condition was added with the Wasm reference types
support in
46667a1003,
and it looks it was removed in #84751. This revives the condition so
that Wasm reference types will not crash the MIR printer.
2024-10-22 13:48:00 -07:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
Tex Riddell
2bebeea2a1
[WebAssembly] Add atan2 to RuntimeLibcallSignatureTable (#112613)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `WebAssemblyRuntimeLibcallSignatures.cpp`: Add `RTLIB::ATAN2*` to
RuntimeLibcallSignatureTable
- Add atan2 calls to `CodeGen/WebAssembly/libcalls-trig.ll` and update
test checks

Part of: Implement the atan2 HLSL Function #70096.
2024-10-17 10:39:36 -07:00
Yuta Saito
d4efc3e097
[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332)
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:

1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
    - [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
    - [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested

See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
2024-10-15 02:41:43 +09:00
Heejin Ahn
115cb402d8
[WebAssembly] Don't fold non-nuw add/sub in FastISel (#111278)
We should not fold one of add/sub operands into a load/store's offset
when `nuw` (no unsigned wrap) is not present, because the address
calculation, which adds the offset with the operand, does not wrap.

This is handled correctly in the normal ISel:

6de5305b3d/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp (L328-L332)
but not in FastISel.

This positivity check in FastISel is not sufficient to avoid this case
fully:

6de5305b3d/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp (L348-L352)

because
1. Even if RHS is within signed int range, depending on the value of the
LHS, the resulting value can exceed uint32 max.
2. When one of the operands is a label, `Address` can contain a
`GlobalValue` and a `Reg` at the same time, so the `GlobalValue` becomes
incorrectly an offset:
6de5305b3d/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp (L53-L69)
6de5305b3d/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp (L409-L417)

Both cases are in the newly added test.

We should handle `SUB` too because `SUB` is the same as `ADD` when RHS's
sign changes. I checked why our current normal ISel only handles `ADD`,
and the reason it's OK for the normal ISel to handle only `ADD` seems
that DAGCombiner replaces `SUB` with `ADD` here:
6de5305b3d/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L3904-L3907)

Fixes #111018.
2024-10-09 14:31:16 -07:00
Nikita Popov
4b3ba64ba7
[SCEVExpander] Clear flags when reusing GEP (#109293)
As pointed out in the review of #102133, SCEVExpander currently
incorrectly reuses GEP instructions that have poison-generating flags
set. Fix this by clearing the flags on the reused instruction.
2024-10-01 14:22:54 +02:00
Craig Topper
92a8b81bdf
[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for fixed vectors in some cases. (#109232)
Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and
ARM.
2024-09-30 11:44:49 -07:00
Simon Pilgrim
f8f0a266e0
[clang][wasm] Replace the target integer sub saturate intrinsics with the equivalent generic __builtin_elementwise_sub_sat intrinsics (#109405)
Remove the Intrinsic::wasm_sub_sat_signed/wasm_sub_sat_unsigned entries
and just use sub_sat_s/sub_sat_u directly
2024-09-22 10:12:41 +01:00
Heejin Ahn
08bba6503b
[WebAssembly] Support binary generation for new EH (#109027)
This adds support for binary generation for the new EH proposal.

So far the only case that we emitted variable immediate operands in
binary has been `br_table`'s destinations. (Other `variable_ops` uses in
TableGen files are register operands, such as the operands of `call`, so
they don't get emitted in binary as a part of the same instruction.)

With this PR, variable immediate operands can include `try_table`'s
operands:
- The number of of catch clauses
- catch clauses sub-opcodes
  - `catch`: 0x00
  - `catch_ref`: 0x01
  - `catch_all`: 0x02
  - `catch_all_ref`: 0x03
- catch clauses' destinations

With `try_table`, we now have variable expr operands for `try_table`'s
catch clauses' tags. We treat their fixups in the same way we do for
tags in other instructions such as in `throw`.

Diff without whitespace will be easier to view.
2024-09-17 14:58:19 -07:00
Brendan Dahl
c076638c70
[WebAssembly] Support BUILD_VECTOR with F16x8. (#108117)
Convert BUILD_VECTORS with FP16x8 to I16x8 since there's no FP16 scalar
value to intialize v128.const.
2024-09-11 10:00:10 -07:00
Brendan Dahl
415288a2a7
[WebAssembly] Add load and store patterns for V8F16. (#108119) 2024-09-11 09:53:53 -07:00
Heejin Ahn
6bbf7f06d8
[WebAssembly] Add assembly support for final EH proposal (#107917)
This adds the basic assembly generation support for the final EH
proposal, which was newly adopted in Sep 2023 and advanced into Phase 4
in Jul 2024:

https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md

This adds support for the generation of new `try_table` and `throw_ref`
instruction in .s asesmbly format. This does NOT yet include
- Block annotation comment generation for .s format
- .o object file generation
- .s assembly parsing
- Type checking (AsmTypeCheck)
- Disassembler
- Fixing unwind mismatches in CFGStackify

These will be added as follow-up PRs.

---

The format for `TRY_TABLE`, both for `MachineInstr` and `MCInst`, is as
follows:
```
TRY_TABLE type number_of_catches catch_clauses*
```
where `catch_clause` is
```
catch_opcode tag+ destination
```
`catch_opcode` should be one of 0/1/2/3, which denotes
`CATCH`/`CATCH_REF`/`CATCH_ALL`/`CATCH_ALL_REF` respectively. (See
`BinaryFormat/Wasm.h`) `tag` exists when the catch is one of `CATCH` or
`CATCH_REF`.
The MIR format is printed as just the list of raw operands. The
(stack-based) assembly instruction supports pretty-printing, including
printing `catch` clauses by name, in InstPrinter.

In addition to the new instructions `TRY_TABLE` and `THROW_REF`, this
adds four pseudo instructions: `CATCH`, `CATCH_REF`, `CATCH_ALL`, and
`CATCH_ALL_REF`. These are pseudo instructions to simulate block return
values of `catch`, `catch_ref`, `catch_all`, `catch_all_ref` clauses in
`try_table` respectively, given that we don't support block return
values except for one case (`fixEndsAtEndOfFunction` in CFGStackify).
These will be omitted when we lower the instructions to `MCInst` at the
end.

LateEHPrepare now will have one more stage to covert
`CATCH`/`CATCH_ALL`s to `CATCH_REF`/`CATCH_ALL_REF`s when there is a
`RETHROW` to rethrow its exception. The pass also converts `RETHROW`s
into `THROW_REF`. Note that we still use `RETHROW` as an interim pseudo
instruction until we convert them to `THROW_REF` in LateEHPrepare.

CFGStackify has a new `placeTryTableMarker` function, which places
`try_table`/`end_try_table` markers with a necessary `catch` clause and
also `block`/`end_block` markers for the destination of the `catch`
clause.

In MCInstLower, now we need to support one more case for the multivalue
block signature (`catch_ref`'s destination's `(i32, exnref)` return
type).

InstPrinter has a new routine to print the `catch_list` type, which is
used to print `try_table` instructions.

The new test, `exception.ll`'s source is the same as
`exception-legacy.ll`, with the FileCheck expectations changed. One
difference is the commands in this file have `-wasm-enable-exnref` to
test the new format, and don't have `-wasm-disable-explicit-locals
-wasm-keep-registers`, because the new custom InstPrinter routine to
print `catch_list` only works for the stack-based instructions (`_S`),
and we can't use `-wasm-keep-registers` for them.

As in `exception-legacy.ll`, the FileCheck lines for the new tests do
not contain the whole program; they mostly contain only the control flow
instructions for readability.
2024-09-10 21:32:24 -07:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2024-09-06 16:19:20 +01:00
Heejin Ahn
aecbc92410
[WebAssembly] Rename CATCH/CATCH_ALL to *_LEGACY (#107187)
This renames MIR instruction `CATCH` and `CATCH_ALL` to `CATCH_LEGACY`
and `CATCH_ALL_LEGACY` respectively.

Follow-up PRs for the new EH (exnref) implementation will use `CATCH`,
`CATCH_REF`, `CATCH_ALL`, and `CATCH_ALL_REF` as pseudo-instructions
that return extracted values or `exnref` or both, because we don't
currently support block return values in LLVM. So to give the old (real)
`CATCH`es and the new (pseudo) `CATCH`es different names, this attaches
`_LEGACY` prefix to the old names.

This also rearranges `WebAssemblyInstrControl.td` so that the old legacy
instructions are listed all together at the end.
2024-09-04 16:14:13 -07:00
Heejin Ahn
b2223b4d7e
[WebAssembly] Rename legacy EH mir tests (#107189)
We added `-legacy` suffix to the legacy EH `ll` tests in #107166 but
forgot to do the same for `mir` tests.
2024-09-04 09:52:42 -07:00
Heejin Ahn
8b28e2ebb3
[WebAssembly] Rename legacy EH tests (#107166)
Give each test in `cfg-stackify-eh-legacy.ll` a name rather than
something like `test5`, because I plan to copy many of these test into a
new file that tests for the new EH (exnref) and some of the tests here
are not applicable to the new EH so the numbering will be different,
which can make things confusing.

Also this removes `test_` prefixes in the test function names in
`exception-legacy.ll`, because, well, we all know they are tests.
2024-09-03 21:14:36 -07:00
Brendan Dahl
5703d8572f
[WebAssembly] Add intrinsics to wasm_simd128.h for all FP16 instructions (#106465)
Getting this to work required a few additional changes:
- Add builtins for any instructions that can't be done with plain C
currently.
- Add support for the saturating version of fp_to_<s,i>_I16x8. Other
vector sizes supported this already.
- Support bitcast of f16x8 to v128. Needed to return a __f16x8 as
v128_t.
2024-08-30 08:42:37 -07:00
Brendan Dahl
7d373cef49
[WebAssembly] Change half-precision feature name to fp16. (#105434)
This better aligns with how the feature is being referred to and what
runtimes (V8) are calling it.
2024-08-22 09:44:33 -07:00