Getting this to work required a few additional changes:
- Add builtins for any instructions that can't currently be expressed in
plain C.
- Add support for the saturating versions of fp_to_<s,u>int for i16x8.
Other vector sizes supported this already.
- Support bitcast of f16x8 to v128. This is needed to return a __f16x8 as
a v128_t.
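To make the last point concrete, here is a minimal sketch (the __f16x8
typedef is spelled out as an assumption about the header's internal
definition, and it presumes a toolchain with the half-precision feature
enabled):

    #include <wasm_simd128.h>

    /* Assumed internal typedef: eight __fp16 lanes in a 128-bit vector. */
    typedef __fp16 __f16x8 __attribute__((__vector_size__(16), __aligned__(16)));

    static v128_t as_v128(__f16x8 v) {
      return (v128_t)v; /* the new f16x8-to-v128 bitcast, a no-op at runtime */
    }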
Instead of splatting a single lane to initialise a build_vector, lower it
to scalar_to_vector, which can be selected to load_zero.
Also add load_zero and load_lane patterns for f32x4 and f64x2.
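As an illustrative sketch of the shape of code this helps, a build_vector
with one loaded lane and zero fill can now select v128.load32_zero:

    #include <wasm_simd128.h>

    typedef float __f32x4 __attribute__((__vector_size__(16)));

    /* One loaded lane, remaining lanes zero: lowered via scalar_to_vector
     * and selected to v128.load32_zero instead of a load plus a splat. */
    static v128_t load_low32(const float *p) {
      return (v128_t)(__f32x4){*p, 0.0f, 0.0f, 0.0f};
    }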
With fast-math, the ordered setcc nodes are converted to setcc nodes
which do not care about NaNs, so add patterns that use setlt, setle,
setgt and setge.
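For example (a sketch; the vector typedefs are written out to keep it
self-contained), compiling a plain ordered compare with fast-math enabled
produces exactly such NaN-agnostic setcc nodes:

    typedef float __f32x4 __attribute__((__vector_size__(16)));
    typedef int   __i32x4 __attribute__((__vector_size__(16)));

    /* With -ffast-math the "olt" here becomes a bare setlt, which the new
     * patterns still select to a single SIMD compare instruction. */
    static __i32x4 less_than(__f32x4 a, __f32x4 b) {
      return a < b; /* element-wise compare, -1/0 per lane */
    }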
This reuses most of the code that was created for the f32x4 and f64x2
binary instructions and tries to follow how they were implemented.
- add/sub/mul/div - use regular LLVM IR instructions
- min/max - use the minimum/maximum intrinsics, and also have builtins
- pmin/pmax - use the wasm.pmin/wasm.pmax intrinsics, and also have builtins
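A rough usage sketch (the builtin names follow this change but are spelled
here as assumptions, as is the __f16x8 typedef):

    typedef __fp16 __f16x8 __attribute__((__vector_size__(16)));

    /* Plain C arithmetic lowers through ordinary fadd/fsub/fmul/fdiv. */
    static __f16x8 arith(__f16x8 a, __f16x8 b) {
      return (a + b) * (a - b);
    }

    /* min/max via the builtins (names assumed). */
    static __f16x8 clamp(__f16x8 v, __f16x8 lo, __f16x8 hi) {
      return __builtin_wasm_min_f16x8(__builtin_wasm_max_f16x8(v, lo), hi);
    }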
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Adds a builtin and intrinsic for the f16x8.splat instruction.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
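Usage sketch (the signature is assumed: a float argument splatted into the
eight-lane half vector):

    typedef __fp16 __f16x8 __attribute__((__vector_size__(16)));

    static __f16x8 splat(float x) {
      return __builtin_wasm_splat_f16x8(x); /* -> f16x8.splat */
    }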
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.
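A hypothetical reduction of the failure mode (not the actual test case):

    typedef unsigned short __u16x8 __attribute__((__vector_size__(16)));

    /* 0xFF00 is a valid unsigned i16 lane value but lies outside the
     * signed i16 range that instruction selection previously insisted on. */
    static __u16x8 mask(void) {
      return (__u16x8){0xFF00, 0xFF00, 0xFF00, 0xFF00,
                       0xFF00, 0xFF00, 0xFF00, 0xFF00};
    }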
Fixes #63817.
WebAssembly doesn't expose inexact exceptions, so frint can be mapped to
fnearbyint. Likewise, WebAssembly always rounds ties-to-even, so
froundeven can be mapped to fnearbyint.
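For instance (scalar forms shown; the vector forms map the same way):

    /* Both functions can be emitted as a single f32.nearest: WebAssembly
     * raises no inexact exception (rint) and rounds ties to even
     * (roundeven). */
    float via_rint(float x)      { return __builtin_rintf(x); }      /* frint */
    float via_roundeven(float x) { return __builtin_roundevenf(x); } /* froundeven */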
Differential Revision: https://reviews.llvm.org/D153451
This intrinsic is meant to lower directly to the i8x16.shuffle instruction,
which takes its lane index arguments as immediates. The ISel for the intrinsic
assumed that the lane index arguments were constants, so bitcode that
"incorrectly" used this intrinsic with non-immediate arguments caused an
assertion failure in the backend.
Avoid the crash by defining the lane index arguments to be immediates, matching
the underlying instruction. Update ISel accordingly. This change means that the
bitcode that previously caused a crash will now fail to validate.
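Correct usage keeps every lane index an integer constant expression, as in
this sketch:

    #include <wasm_simd128.h>

    /* All sixteen indices are immediates, matching i8x16.shuffle. */
    v128_t interleave_low_bytes(v128_t a, v128_t b) {
      return wasm_i8x16_shuffle(a, b, 0, 16, 1, 17, 2, 18, 3, 19,
                                4, 20, 5, 21, 6, 22, 7, 23);
    }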
Fixes #55559.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D149898
The wasm64 versions of the v128.storeX_lane instructions were incorrectly defined
as returning a v128 value, which resulted in spurious drop instructions being
emitted and causing validation to fail. This was not caught earlier because
wasm64 has been experimental and not well tested. Update the relevant test file
to test both wasm32 and wasm64.
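A small sketch of the affected shape of code:

    #include <wasm_simd128.h>

    /* v128.store64_lane stores one 64-bit lane and produces no value, so
     * no drop should follow it on either wasm32 or wasm64. */
    void store_high_lane(double *p, v128_t v) {
      wasm_v128_store64_lane(p, v, 1);
    }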
Fixes #62443.
Differential Revision: https://reviews.llvm.org/D149780
After the change in D144169, codegen generates redundant instructions
such as `and` and `wrap`. This fixes that.
Differential Revision: https://reviews.llvm.org/D144360
Each operation was missing its inverted condition using olt or ogt. Also,
since we don't need to distinguish +/-0 here, we should also be able to
use ole and oge.
Differential Revision: https://reviews.llvm.org/D143581
Splats were selected by matching on uses of `build_vector` with
identical elements, but a while back a target-independent node for
vector splatting was added.
This removes the WebAssembly specific LOAD_SPLAT intrinsic, and instead
makes SPLAT_VECTOR legal and adds patterns for splat loads.
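For example (a sketch), a splat fed directly from memory now selects the
load-and-splat form:

    #include <wasm_simd128.h>

    /* Selected as v128.load32_splat rather than a scalar load followed by
     * lane shuffling. */
    v128_t splat_from_mem(const float *p) {
      return wasm_f32x4_splat(*p);
    }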
Differential Revision: https://reviews.llvm.org/D139871
This continues the refactoring work of selecting offset + address
operands with the AddrOpsN pattern, previously called LoadOpsN.
This is not an NFC, since constant addresses are now folded into the
offset in more places for v128.storeN_lane.
Differential Revision: https://reviews.llvm.org/D139950
This refactors out the offset and address operand pattern matching into
a ComplexPattern, so that one pattern fragment can match the dynamic and
static (offset) addresses in all possible positions.
Split out from D139530, which also contained an improvement to global
address folding.
Differential Revision: https://reviews.llvm.org/D139631
Use load32_zero instead of load32_splat to load the low 32 bits from memory to
v128. Test cases are added to cover this change.
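Sketch of the affected pattern:

    #include <wasm_simd128.h>

    /* Loads 32 bits into the low lane and zeroes the rest; previously a
     * v128.load32_splat was emitted here. */
    v128_t low32(const void *p) {
      return wasm_v128_load32_zero(p);
    }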
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D134257
Refactor the tablegen definitions for relaxed SIMD min/max instructions to use a
shared RelaxedBinary multiclass modeled on the existing SIMDBinary multiclass. A
future commit will add further instruction definitions that use RelaxedBinary.
Also rename the SIMD_RELAXED_CONVERT multiclass to RelaxedConvert to better fit
existing naming conventions.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D127157
Fix the instruction names to match the WebAssembly spec:
- `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
- `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`
Also rename related things like intrinsics, builtins, and test functions to
match.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D121661
When possible, optimize TRUNCATE to generate the Wasm SIMD narrow
instructions (i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u) rather than
generating lots of extract_lane and replace_lane operations.
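A sketch of the kind of input that benefits (types written with GNU vector
extensions):

    typedef unsigned int   __u32x8 __attribute__((__vector_size__(32)));
    typedef unsigned short __u16x8 __attribute__((__vector_size__(16)));

    /* An 8-lane i32 -> i16 truncate: now emitted as masking plus
     * i16x8.narrow_i32x4_u over the two 128-bit halves instead of a long
     * extract_lane/replace_lane chain. */
    static __u16x8 trunc_u32(__u32x8 v) {
      return __builtin_convertvector(v, __u16x8);
    }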
Closes #50350.
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.
These are only exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112186
Add relaxed f32x4.min, f32x4.max, f64x2.min, and f64x2.max. These are only
exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112146
Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112022
Previously, extra-wide v4f32 to v4f64 extending loads would be legalized to v2f32
to v2f64 extending loads, which would then be scalarized by legalization. (v2f32
to v2f64 extending loads not produced by legalization were already being emitted
correctly.) Instead, mark v2f32 to v2f64 extending loads as legal and explicitly
lower them using promote_low. This regresses the addressing modes supported for
the extloads not produced by legalization, but that's a fine trade off for now.
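A sketch of the newly legal shape:

    typedef float  __f32x2 __attribute__((__vector_size__(8)));
    typedef double __f64x2 __attribute__((__vector_size__(16)));

    /* A v2f32 -> v2f64 extending load: now a 64-bit load into the low half
     * of a v128 followed by f64x2.promote_low_f32x4, rather than two
     * scalarized load/promote pairs. */
    static __f64x2 extend_load(const __f32x2 *p) {
      return __builtin_convertvector(*p, __f64x2);
    }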
Differential Revision: https://reviews.llvm.org/D108496
Partially reverts 85157c007903, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.
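Illustrative use (the header intrinsic is real; the scenario is the one
described above):

    #include <wasm_simd128.h>

    /* Lowers through the reintroduced intrinsic, so it still produces
     * f32x4.pmin even when optimization splits the equivalent
     * compare-and-select across basic blocks. */
    v128_t clamp_upper(v128_t v, v128_t bound) {
      return wasm_f32x4_pmin(v, bound);
    }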
Differential Revision: https://reviews.llvm.org/D108387
The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.
Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.
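For instance (a sketch), a half-wide vector operation now keeps its data
in the low 64 bits:

    typedef int __i32x2 __attribute__((__vector_size__(8)));

    /* Each 64-bit operand load can now be selected with the much more
     * common v128.load64_splat pattern, leaving the two i32 lanes in the
     * low half of the v128 with no per-lane extension or masking. */
    static __i32x2 add2(const __i32x2 *a, const __i32x2 *b) {
      return *a + *b;
    }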
Differential Revision: https://reviews.llvm.org/D107502
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.
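Sketch of the header pattern in question:

    #include <wasm_simd128.h>

    /* v128_t is an integer vector, so the header wrapper bitcasts to a
     * double vector and back around the operation; the extra patterns see
     * through those bitcasts and still select f64x2.pmax. */
    v128_t pmax_wrapped(v128_t a, v128_t b) {
      return wasm_f64x2_pmax(a, b);
    }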
Differential Revision: https://reviews.llvm.org/D106612
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.
Differential Revision: https://reviews.llvm.org/D106400
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.
Differential Revision: https://reviews.llvm.org/D106019
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.
Differential Revision: https://reviews.llvm.org/D105950