172 Commits

Author SHA1 Message Date
Samuel Parker
28ee604071 [WebAssembly] pmin/pmax fixes
Reverse the operand ordering to ? rhs : lhs.

Differential Revision: https://reviews.llvm.org/D144466
2023-02-22 10:02:16 +00:00
Jun Ma
e9d7f96a11 [WebAssembly] Add more combine pattern for vector shift
After change with D144169, the codegen generates redundant instructions
like and and wrap. This fixes it.

Differential Revision: https://reviews.llvm.org/D144360
2023-02-22 09:53:00 +08:00
Samuel Parker
a674a12dd5 [WebAssembly] Additional patterns for pmin/pax
Each operation was missing their inverted condition using olt or ogt.
Also, as we don't need to discern +/-0, I think we should also be
able to use ole and oge.

Differential Revision: https://reviews.llvm.org/D143581
2023-02-10 09:54:45 +00:00
Luke Lau
f841ad30d7 [WebAssembly] Replace LOAD_SPLAT with SPLAT_VECTOR
Splats were selected by matching on uses of `build_vector` with
identical elements, but a while back a target independent node for
vector splatting was added.
This removes the WebAssembly specific LOAD_SPLAT intrinsic, and instead
makes SPLAT_VECTOR legal and adds patterns for splat loads.

Differential Revision: https://reviews.llvm.org/D139871
2023-01-04 15:07:47 +00:00
Luke Lau
0cd9c51766 [WebAssembly] Use ComplexPattern on remaining memory instructions
This continues the refactoring work of selecting offset + address
operands with the AddrOpsN pattern, previously called LoadOpsN.

This is not an NFC, since constant addresses are now folded into the
offset in more places for v128.storeN_lane.

Differential Revision: https://reviews.llvm.org/D139950
2022-12-15 10:20:06 +00:00
Luke Lau
982b8e0bbb [WebAssembly][NFC] Add ComplexPattern for loads
This refactors out the offset and address operand pattern matching into
a ComplexPattern, so that one pattern fragment can match the dynamic and
static (offset) addresses in all possible positions.

Split out from D139530, which also contained an improvement to global
address folding.

Differential Revision: https://reviews.llvm.org/D139631
2022-12-14 12:11:30 +00:00
Thomas Lively
ae96b5bd2d [WebAssembly] Update relaxed-simd instruction names
Including builtin and intrinsic names. These should be the final names for the
proposal.
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md

Reviewed By: aheejin, maratyszcza

Differential Revision: https://reviews.llvm.org/D138249
2022-11-21 12:40:15 -08:00
Fanchen Kong
8a2729fea7 [WebAssembly] Improve codegen for loading scalars from memory to v128
Use load32_zero instead of load32_splat to load the low 32 bits from memory to
v128. Test cases are added to cover this change.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D134257
2022-09-21 21:05:44 -07:00
Thomas Lively
ac3b8df8f2 [WebAssembly] Prototype f32x4.relaxed_dot_bf16x8_add_f32
As proposed in https://github.com/WebAssembly/relaxed-simd/issues/77. Only an
LLVM intrinsic and a clang builtin are implemented. Since there is no bfloat16
type, use u16 to represent the bfloats in the builtin function arguments.

Differential Revision: https://reviews.llvm.org/D133428
2022-09-08 08:07:49 -07:00
Thomas Lively
b19de814ad [WebAssembly] Improve codegen for v128.bitselect
Add patterns selecting ((v1 ^ v2) & c) ^ v2 and ((v1 ^ v2) & ~c) ^ v2 to
v128.bitselect.

Resolves #56827.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D131131
2022-08-03 23:28:37 -07:00
Thomas Lively
aff679a48c [WebAssembly] Implement remaining relaxed SIMD instructions
Add codegen, intrinsics, and builtins for the i16x8.relaxed_q15mulr_s,
i16x8.dot_i8x16_i7x16_s, and i32x4.dot_i8x16_i7x16_add_s instructions. These are
the last instructions from the relaxed SIMD proposal[1] that had not been
implemented.

[1]:
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md.

Differential Revision: https://reviews.llvm.org/D127170
2022-06-08 10:32:10 -07:00
Thomas Lively
576b8245c8 [WebAssembly][NFC] RelaxedBinary tablegen multiclass for relaxed SIMD
Refactor the tablegen definitions for relaxed SIMD min/max instructions to use a
shared RelaxedBinary multiclass modeled on the existing SIMDBinary multiclass. A
future commit will add further instruction definitions that use RelaxedBinary.

Also rename the SIMD_RELAXED_CONVERT multiclass to RelaxedConvert to better fit
existing naming conventions.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D127157
2022-06-06 17:56:39 -07:00
Thomas Lively
82a13d05ab [WebAssembly] Update relaxed SIMD opcodes and names
to reflect the latest state of the proposal:
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md#binary-format.
Moves code around to match the instruction order from the proposal, but the only
functional changes are to the names and opcodes.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D125726
2022-05-16 17:51:45 -07:00
Thomas Lively
7e8913d775 [WebAssembly] Fix names of SIMD instructions containing '_zero'
Fix the instruction names to match the WebAssembly spec:

 - `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
 - `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`

Also rename related things like intrinsics, builtins, and test functions to
match.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D121661
2022-03-16 13:34:57 -07:00
Jing Bao
2a4a229d6d [WebAssembly] Custom optimization for truncate
When possible, optimize TRUNCATE to generate Wasm SIMD narrow
instructions (i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u), rather than generate
lots of extract_lane and replace_lane.

Closes #50350.
2021-12-14 08:42:39 -08:00
Thomas Lively
fb67f3d969 [WebAssembly] Add prototype relaxed float to int trunc instructions
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.

These are only exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112186
2021-10-28 14:01:53 -07:00
Zhi An Ng
e1fb13401e [WebAssembly] Add prototype relaxed float min max instructions
Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112146
2021-10-20 09:41:51 -07:00
Zhi An Ng
2542bfa43a [WebAssembly] Add prototype relaxed swizzle instructions
Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112022
2021-10-19 17:53:04 -07:00
Zhi An Ng
da07942834 [WebAssembly] Add prototype relaxed laneselect instructions
Add i8x16, i16x8, i32x4, i64x2 laneselect instructions. These are only
exposed as builtins, and require user opt-in.
2021-10-15 17:45:09 -07:00
Thomas Lively
2f519825ba [WebAssembly] Add prototype relaxed SIMD fma/fms instructions
Add experimental clang builtins, LLVM intrinsics, and backend definitions for
the new {f32x4,f64x2}.{fma,fms} instructions in the relaxed SIMD proposal:
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md.
Do not allow these instructions to be selected without explicit user opt-in.

Differential Revision: https://reviews.llvm.org/D110295
2021-09-23 11:01:36 -07:00
Thomas Lively
fec4749200 [WebAssembly] Lower v2f32 to v2f64 extending loads with promote_low
Previously extra wide v4f32 to v4f64 extending loads would be legalized to v2f32
to v2f64 extending loads, which would then be scalarized by legalization. (v2f32
to v2f64 extending loads not produced by legalization were already being emitted
correctly.) Instead, mark v2f32 to v2f64 extending loads as legal and explicitly
lower them using promote_low. This regresses the addressing modes supported for
the extloads not produced by legalization, but that's a fine trade off for now.

Differential Revision: https://reviews.llvm.org/D108496
2021-09-01 10:27:42 -07:00
Thomas Lively
88962cea46 [WebAssembly] Restore builtins and intrinsics for pmin/pmax
Partially reverts 85157c007903, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.

Differential Revision: https://reviews.llvm.org/D108387
2021-08-20 09:21:31 -07:00
Thomas Lively
b69374ca58 [WebAssembly] Legalize vector types by widening
The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.

Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.

Differential Revision: https://reviews.llvm.org/D107502
2021-08-19 12:07:33 -07:00
Thomas Lively
33786576fd [WebAssembly] Codegen for extmul SIMD instructions
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
2021-07-27 08:41:30 -07:00
Thomas Lively
85157c0079 [WebAssembly] Codegen for pmin and pmax
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.

Differential Revision: https://reviews.llvm.org/D106612
2021-07-23 14:49:21 -07:00
Thomas Lively
39c0e4afce [WebAssembly][NFC] Simplify SIMD bitconvert pattern
Differential Revision: https://reviews.llvm.org/D106680
2021-07-23 14:43:48 -07:00
Thomas Lively
8af333cf1a [WebAssembly] Replace @llvm.wasm.popcnt with @llvm.ctpop.v16i8
Use the standard target-independent intrinsic to take advantage of standard
optimizations.

Differential Revision: https://reviews.llvm.org/D106506
2021-07-21 16:45:54 -07:00
Thomas Lively
1a57ee1276 [WebAssembly] Codegen for v128.load{32,64}_zero
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.

Differential Revision: https://reviews.llvm.org/D106400
2021-07-21 09:02:12 -07:00
Thomas Lively
4a4229f70f [WebAssembly] Codegen for v128.storeX_lane instructions
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.

Differential Revision: https://reviews.llvm.org/D106019
2021-07-14 16:15:25 -07:00
Thomas Lively
970e090010 [WebAssembly] Codegen for v128.loadX_lane instructions
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.

Differential Revision: https://reviews.llvm.org/D105950
2021-07-14 11:31:53 -07:00
Thomas Lively
cbabfc63b1 [WebAssembly] Custom combines for f32x4.demote_zero_f64x2
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.

Differential Revision: https://reviews.llvm.org/D105755
2021-07-12 10:32:18 -07:00
Thomas Lively
e5220104d0 [WebAssembly] Custom combines for f64x2.promote_low_f32x4
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.

Differential Revision: https://reviews.llvm.org/D105675
2021-07-09 18:59:29 -07:00
Thomas Lively
f8c5a4c670 [WebAssembly] Optimize out shift masks
WebAssembly's shift instructions implicitly masks the shift count, so optimize
out redundant explicit masks of the shift count. For vector shifts, this
currently only works if the mask is applied before splatting the shift count,
but this should be addressed in a future commit. Resolves PR49655.

Differential Revision: https://reviews.llvm.org/D105600
2021-07-07 23:14:31 -07:00
Craig Topper
3067520bf4 [SelectionDAG] Use a VTSDNode to store the saturation width for FP_TO_SINT_SAT/FP_TO_UINT_SAT
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.

This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101262
2021-04-27 14:38:42 -07:00
Thomas Lively
e657c84fa1 [WebAssembly] Use v128.const instead of splats for constants
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.

Differential Revision: https://reviews.llvm.org/D100716
2021-04-19 12:43:59 -07:00
Thomas Lively
5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Thomas Lively
6a18cc23ef [WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u}
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.

Differential Revision: https://reviews.llvm.org/D100402
2021-04-14 13:43:09 -07:00
Thomas Lively
af7925b4dd [WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u}
Add a custom DAG combine and ISD opcode for detecting patterns like

  (uint_to_fp (extract_subvector ...))

before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.

Differential Revision: https://reviews.llvm.org/D100425
2021-04-14 10:42:45 -07:00
Thomas Lively
af7ab81ce3 [WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.

For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.

Differential Revision: https://reviews.llvm.org/D100411
2021-04-14 09:19:27 -07:00
Thomas Lively
ea8dd3ee2e [WebAssembly] Update v128.any_true
In the final SIMD spec, there is only a single v128.any_true instruction, rather
than one for each lane interpretation because the semantics do not depend on the
lane interpretation.

Differential Revision: https://reviews.llvm.org/D100241
2021-04-11 11:13:16 -07:00
Thomas Lively
45783d0e8a [WebAssembly] Implement i64x2 comparisons
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.

Differential Revision: https://reviews.llvm.org/D99623
2021-03-31 10:46:17 -07:00
Thomas Lively
a1b8b0739a [WebAssembly] Fix i8x16.popcnt opcode
When I updated the SIMD opcodes in f5764a8654e3, I accidentally missed updating
i8x16.popcnt. This patch fixes the omission.

Differential Revision: https://reviews.llvm.org/D99536
2021-03-29 17:23:15 -07:00
Thomas Lively
f5764a8654 [WebAssembly] Finalize SIMD names and opcodes
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.

Depends on D98466.

Differential Revision: https://reviews.llvm.org/D98676
2021-03-18 11:21:25 -07:00
Thomas Lively
2f2ae08da9 [WebAssembly] Remove experimental SIMD instructions
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.

Depends on D98457.

Differential Revision: https://reviews.llvm.org/D98466
2021-03-18 11:21:24 -07:00
Thomas Lively
8638c897f4 [WebAssembly] Remove unimplemented-simd target feature
Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.

Differential Revision: https://reviews.llvm.org/D98457
2021-03-18 10:23:12 -07:00
Simon Pilgrim
db5abfbbb4 [WebAssembly] Fix multiclass template parameter types. NFC.
Fixes TableGen parser errors reported by D95874 due to incompatible types being used on multiclass templates.

Differential Revision: https://reviews.llvm.org/D96205
2021-02-08 09:36:56 +00:00
Thomas Lively
4b68b64dcc [WebAssembly] Prototype i8x16 to i32x4 widening instructions
As proposed in https://github.com/WebAssembly/simd/pull/395 and matching the
opcodes used in V8:
https://chromium-review.googlesource.com/c/v8/v8/+/2617385/4/src/wasm/wasm-opcodes.h

Differential Revision: https://reviews.llvm.org/D95557
2021-01-28 10:59:32 -08:00
Thomas Lively
11802eced5 [WebAssembly] Prototype new f64x2 conversions
As proposed in https://github.com/WebAssembly/simd/pull/383.

Differential Revision: https://reviews.llvm.org/D95012
2021-01-20 11:28:06 -08:00
Thomas Lively
497026c902 [WebAssembly] Prototype prefetch instructions
As proposed in https://github.com/WebAssembly/simd/pull/352 and using the
opcodes used in the V8 prototype:
https://chromium-review.googlesource.com/c/v8/v8/+/2543167. These instructions
are only usable via intrinsics and clang builtins to make them opt-in while they
are being benchmarked.

Differential Revision: https://reviews.llvm.org/D93883
2021-01-05 11:32:03 -08:00
Thomas Lively
5e09e9979b [WebAssembly] Prototype extending pairwise add instructions
As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes
the new instructions available only via clang builtins and LLVM intrinsics to
make their use opt-in while they are still being evaluated for inclusion in the
SIMD proposal.

Depends on D93771.

Differential Revision: https://reviews.llvm.org/D93775
2020-12-28 14:11:14 -08:00