215 Commits

Author SHA1 Message Date
Sam Parker
1e0114c21d
[WebAssembly] Zero and NaN checks for min/max (#177968)
Custom lower FMINNUM, FMINIMUMNUM, FMAXNUM and FMAXIMUMNUM to generate
relaxed_min and relaxed_max when the inputs cannot be NaN or signed
zero.

Tablegen patterns have also been modified to check the above conditions
when trying to match relaxed min/max using the pmin/pmax pattern.
2026-01-28 09:25:41 +00:00
Sam Parker
b84ffe040b
[WebAssembly] LoadLane matching with offsets (#176005) 2026-01-15 08:39:42 +00:00
hanbeom
1171e30cb0
[WebAssembly] Support v128.load{32,64}_zero for f32 and f64 types (#172291)
This patch extends the `load_zero` pattern matching to
support floating-point vector types (`v4f32` and `v2f64`).

Previously, the optimization to generate `v128.load32_zero` and
`v128.load64_zero` was only enabled for integer types
(`v4i32` and `v2i64`). This change adds the necessary TableGen
patterns to correctly match scalar floating-point loads inserted
into zero-initialized vectors.
2026-01-08 09:28:14 +09:00
Jasmine Tang
672757bf55
[WebAssembly] Add patterns for extadd pairwise (#167960)
Add a few patterns for extadd pairwise.
2025-11-18 02:41:16 -08:00
Jasmine Tang
e6cd7a52bc
[WebAssembly] [Codegen] Add pattern for relaxed min max from pmin/pmax-based patterns over v4f32 and v2f64 (#164486)
Related to https://github.com/llvm/llvm-project/issues/55932
2025-10-23 01:39:02 -07:00
Jasmine Tang
1fbfac30f1
[WebAssembly] [Codegen] Add pattern for relaxed min max from fminimum/fmaximum over v4f32 and v2f64 (#162948)
Related to #55932
2025-10-22 03:08:24 -07:00
Sam Parker
aa63949428
[WebAssembly] Avoid dot for v16i8 partial_smla (#163796)
The sequence is shorter, by two extend operations, if we just use extmul
and extadd_pairwise.
2025-10-20 09:12:00 +01:00
Jasmine Tang
893b1d4187
[WebAssembly] [Codegen] Add patterns for relaxed dot (#163266)
The pattern I added for `relaxed dot` similar to normal dot @
https://github.com/llvm/llvm-project/pull/151775.

For `relaxed dot add`, i noticed that in the proposal the portion of dot
implementation is similar to `relaxed dot`, so I think we can add a
pattern where after we do relaxed dot and do extadd pairwise, we can do
`relaxed dot add`.

One current obstacles is I don't think there is any pattern to singly
create a extadd pairwise from other instructions so the `relaxed dot
add` pattern would not cover a wide range of instructions.

related to https://github.com/llvm/llvm-project/issues/55932
2025-10-16 15:01:57 +00:00
Sam Parker
65363e64f8
[WebAssembly] Partial SMLA with relaxed dot (#163529)
Lower v16i8 to v4i32 partial_smla to relaxed_dot_add. I'm still unsure
whether we could/should take advantage of the unknown signedness of the
rhs, and also lower the partial_sumla operation too.
2025-10-16 07:09:16 +01:00
Jasmine Tang
55d4e92c88
[WebAssembly] Add extra pattern for dot (#151775)
Fixes https://github.com/llvm/llvm-project/issues/50154
2025-10-13 10:27:12 -07:00
Sam Parker
1820102167
Wasm fmuladd relaxed (#163177)
Reland #161355, after fixing up the cross-projects-tests for the wasm
simd intrinsics.

Original commit message:
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 16:50:53 +01:00
Sam Parker
30d3441cf0
Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171)
Reverts llvm/llvm-project#161355

Looks like I've broken some intrinsic code generation.
2025-10-13 11:53:40 +01:00
Sam Parker
a4eb7ea225
[WebAssembly] Lower fmuladd to madd and nmadd (#161355)
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 10:36:08 +01:00
Folkert de Vries
761be78dd7
[WebAssembly] recognize saturating truncation (#155470)
fixes https://github.com/llvm/llvm-project/issues/153838
using the same approach as
https://github.com/llvm/llvm-project/pull/155377

Recognize a manual saturating truncation and select the corresponding
instruction. This is useful in general, but came up specifically in
https://github.com/rust-lang/stdarch because it will allow us to drop
more target-specific intrinsics in favor of cross-platform ones.
2025-10-08 11:52:18 -07:00
Sam Parker
156e9b4b69
[WebAssembly] Use partial_reduce_mla ISD nodes (#161184)
Addresssing issue #160847.
 
Move away from combining the intrinsic call and instead lower the ISD
nodes, using tablegen for pattern matching.
2025-09-30 08:28:56 +01:00
Sam Parker
586c0ad918
[WebAssembly] Support partial-reduce accumulator (#158060)
We currently only support partial.reduce.add in the case where we are
performing a multiply-accumulate. Now add support for any partial
reduction where the input is being extended, where we can take advantage
of extadd_pairwise.
2025-09-12 07:03:49 +01:00
Sam Parker
6dacdc31ec
[WebAssembly] extadd_pairwise for PartialReduce (#157669)
Avoid using extends, and adding the high and low half and use
extadd_pairwise instead.
2025-09-10 08:13:46 +01:00
Jasmine Tang
7fcee5fe08
[WebAssembly] Add support for avgr_u in loops (#153252)
Fixes https://github.com/llvm/llvm-project/issues/150550.

With the test case 
```
void f(unsigned char *x, unsigned char *y, int n) {
  // should have been vectorized into avgr_u instead of seperated vectorized add and logical right shift
  for (int i = 0; i < n; i++)
    x[i] = (x[i] + y[i] + 1) / 2;
}
```

the backend failed to recognize that this can be reduced to avgr_u since
the loop vectorizer doesn't transform into the existing pattern in
tablegen.

This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to
avgr_u in the tablegen file.
2025-08-22 09:52:49 -07:00
Jasmine Tang
522ac23609
[WebAssembly] Add pattern for relaxed nmadd (#150684)
Following footstep of https://github.com/llvm/llvm-project/pull/147487
(support for madd), this PR adds support for nmadd.

https://github.com/llvm/llvm-project/issues/55932 tracks this
2025-07-28 10:20:04 -07:00
jjasmine
6640b0a293
[WebAssembly] Add patterns for relaxed madd (#147487)
[WebAssembly] Fold fadd contract (fmul contract) to relaxed madd w/
-mattr=+simd128,+relaxed-simd

Fixes #121311

- Precommit test for #121311
- Fold fadd contract (fmul contract) to relaxed madd w/
-mattr=+simd128,+relaxed-simd
- Move PatFrag of fadd_contract in ARM.td and WebAssembly.td to
TargetSelectionDAG.td for reuse of pattern
2025-07-15 00:56:28 +08:00
Brendan Dahl
67056c280a
[WebAssembly] Support shuffle for F16x8 vectors. (#127857) 2025-02-25 10:39:54 -08:00
Sam Parker
df2de13695
[WebAssembly] Autovec support for dot (#123207)
Enable the use of partial.reduce.add that we can lower to dot or a tree
of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support
both v8i16 and v16i8 inputs.
2025-02-03 08:58:43 +00:00
Jay Foad
922992a22f
Fix typo "instrinsic" (#112899) 2024-10-18 15:58:33 +01:00
Simon Pilgrim
f8f0a266e0
[clang][wasm] Replace the target integer sub saturate intrinsics with the equivalent generic __builtin_elementwise_sub_sat intrinsics (#109405)
Remove the Intrinsic::wasm_sub_sat_signed/wasm_sub_sat_unsigned entries
and just use sub_sat_s/sub_sat_u directly
2024-09-22 10:12:41 +01:00
Brendan Dahl
07a7bdc806
[WebAssembly] Fix lane index size for f16x8 extract_lane. (#108118) 2024-09-11 15:27:38 -07:00
Brendan Dahl
415288a2a7
[WebAssembly] Add load and store patterns for V8F16. (#108119) 2024-09-11 09:53:53 -07:00
Brendan Dahl
923a1c1fc3
[WebAssembly] Update FP16 opcodes to match current spec. (#106759)
f267a3d544/proposals/half-precision/Overview.md (binary-format)
2024-08-30 13:01:16 -07:00
Brendan Dahl
5703d8572f
[WebAssembly] Add intrinsics to wasm_simd128.h for all FP16 instructions (#106465)
Getting this to work required a few additional changes:
- Add builtins for any instructions that can't be done with plain C
currently.
- Add support for the saturating version of fp_to_<s,i>_I16x8. Other
vector sizes supported this already.
- Support bitcast of f16x8 to v128. Needed to return a __f16x8 as
v128_t.
2024-08-30 08:42:37 -07:00
Brendan Dahl
7d373cef49
[WebAssembly] Change half-precision feature name to fp16. (#105434)
This better aligns with how the feature is being referred to and what
runtimes (V8) are calling it.
2024-08-22 09:44:33 -07:00
Sam Parker
08decd20a9
[WebAssembly] load_zero to initialise build_vector (#100610)
Instead of splatting a single lane, to initialise a build_vector, lower
to scalar_to_vector which can be selected to load_zero.

Also add load_zero and load_lane patterns for f32x4 and f64x2.
2024-08-02 10:11:21 +01:00
Brendan Dahl
0dbd72d6ab
[WebAssembly] Implement f16x8.replace_lane instruction. (#99388)
Use a builtin and intrinsic until half types are better supported for
instruction selection.
2024-07-24 11:55:36 -07:00
Sam Parker
a3de21cac1
[WebAssembly] Ofast pmin/pmax pattern matchers (#100107)
With fast-math, the ordered setcc nodes are converted to setcc nodes
which do not care about NaNs, so add patterns that use setlt, setle,
setgt and setge.
2024-07-24 09:23:49 +01:00
Brendan Dahl
928b780840
[WebAssembly] Implement trunc_sat and convert instructions for f16x8. (#95180)
These instructions can be generated using regular LL intrinsics.

Specified at:

29a9b9462c/proposals/half-precision/Overview.md
2024-06-25 10:39:05 -07:00
Brendan Dahl
3ab6d12625
[WebAssembly] Implement f16x8 madd and nmadd instructions. (#95151)
Implemented with intrinsics and builtins.

Specified at:

https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
2024-06-11 16:10:00 -07:00
Brendan Dahl
dfd1a2f081
[WebAssembly] Implement all f16x8 unary instructions. (#94063)
All of these instructions can be generated using regular LL intrinsics.

Specified at:

29a9b9462c/proposals/half-precision/Overview.md
2024-06-04 13:06:16 -04:00
Brendan Dahl
8aa8019975
[WebAssembly] Implement all f16x8 relation instructions. (#93751)
All of these instructions can be generated using regular LL
instructions.

Specified at:

29a9b9462c/proposals/half-precision/Overview.md
2024-05-30 09:02:17 -07:00
Brendan Dahl
60bce6eab4
[WebAssembly] Implement all f16x8 binary instructions. (#93360)
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.

add/sub/mul/div - use regular LL instructions
min/max - use the minimum/maximum intrinsic, and also have builtins
pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins

Specified at:

29a9b9462c/proposals/half-precision/Overview.md
2024-05-28 16:33:20 -07:00
Brendan Dahl
4ebe9bba59
[WebAssembly] Implement prototype f16x8.extract_lane instruction. (#93272)
Specified at:

29a9b9462c/proposals/half-precision/Overview.md

Note: the current spec has f16x8.extract_lane as opcode 0x124, but this
is incorrect and will be changed to 0x121 soon.
2024-05-24 08:31:07 -07:00
Brendan Dahl
09c5525610
[WebAssembly] Implement prototype f16x8.splat instruction. (#93228)
Adds a builtin and intrinsic for the f16x8.splat instruction.

Specified at:

29a9b9462c/proposals/half-precision/Overview.md

Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
2024-05-23 20:05:22 -07:00
Thomas Lively
767e0c8bce
[WebAssembly] Select BUILD_VECTOR with large unsigned lane values (#85880)
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.

Fixes #63817.
2024-03-20 08:42:42 -07:00
xortoast
bb648c9177 [WebAssembly] Add lowering for llvm.rint and llvm.roundeven
WebAssembly doesn't expose inexact exceptions, so frint can be mapped to
fnearbyint. Likewise, WebAssembly always rounds ties-to-even, so
froundeven can be mapped to fnearbyint.

Differential Revision: https://reviews.llvm.org/D153451
2023-06-23 14:07:11 -07:00
Thomas Lively
72a72315b0 [WebAssembly] Mark @llvm.wasm.shuffle lane indices as immediates
This intrinsic is meant to lower directly to the i8x16.shuffle instruction,
which takes its lane index arguments as immmediates. The ISel for the intrinsic
assumed that the lane index arguments were constants, so bitcode that
"incorrectly" used this intrinsic with non-immediate arguments caused an
assertion failure in the backend.

Avoid the crash by defining the lane index arguments to be immediates, matching
the underlying instruction. Update ISel accordingly. This change means that the
bitcode that previously caused a crash will now fail to validate.

Fixes #55559.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D149898
2023-05-05 08:12:41 -07:00
Thomas Lively
abdb5e041c [WebAssembly] Remove incorrect result from wasm64 store_lane instructions
The wasm64 versions of the v128.storeX_lane instructions was incorrectly defined
as returning a v128 value, which resulted in spurious drop instructions being
emitted and causing validation to fail. This was not caught earlier because
wasm64 has been experimental and not well tested. Update the relevant test file
to test both wasm32 and wasm64.

Fixes #62443.

Differential Revision: https://reviews.llvm.org/D149780
2023-05-03 16:00:20 -07:00
Samuel Parker
28ee604071 [WebAssembly] pmin/pmax fixes
Reverse the operand ordering to ? rhs : lhs.

Differential Revision: https://reviews.llvm.org/D144466
2023-02-22 10:02:16 +00:00
Jun Ma
e9d7f96a11 [WebAssembly] Add more combine pattern for vector shift
After change with D144169, the codegen generates redundant instructions
like and and wrap. This fixes it.

Differential Revision: https://reviews.llvm.org/D144360
2023-02-22 09:53:00 +08:00
Samuel Parker
a674a12dd5 [WebAssembly] Additional patterns for pmin/pax
Each operation was missing their inverted condition using olt or ogt.
Also, as we don't need to discern +/-0, I think we should also be
able to use ole and oge.

Differential Revision: https://reviews.llvm.org/D143581
2023-02-10 09:54:45 +00:00
Luke Lau
f841ad30d7 [WebAssembly] Replace LOAD_SPLAT with SPLAT_VECTOR
Splats were selected by matching on uses of `build_vector` with
identical elements, but a while back a target independent node for
vector splatting was added.
This removes the WebAssembly specific LOAD_SPLAT intrinsic, and instead
makes SPLAT_VECTOR legal and adds patterns for splat loads.

Differential Revision: https://reviews.llvm.org/D139871
2023-01-04 15:07:47 +00:00
Luke Lau
0cd9c51766 [WebAssembly] Use ComplexPattern on remaining memory instructions
This continues the refactoring work of selecting offset + address
operands with the AddrOpsN pattern, previously called LoadOpsN.

This is not an NFC, since constant addresses are now folded into the
offset in more places for v128.storeN_lane.

Differential Revision: https://reviews.llvm.org/D139950
2022-12-15 10:20:06 +00:00
Luke Lau
982b8e0bbb [WebAssembly][NFC] Add ComplexPattern for loads
This refactors out the offset and address operand pattern matching into
a ComplexPattern, so that one pattern fragment can match the dynamic and
static (offset) addresses in all possible positions.

Split out from D139530, which also contained an improvement to global
address folding.

Differential Revision: https://reviews.llvm.org/D139631
2022-12-14 12:11:30 +00:00
Thomas Lively
ae96b5bd2d [WebAssembly] Update relaxed-simd instruction names
Including builtin and intrinsic names. These should be the final names for the
proposal.
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md

Reviewed By: aheejin, maratyszcza

Differential Revision: https://reviews.llvm.org/D138249
2022-11-21 12:40:15 -08:00