Getting this to work required a few additional changes:
- Add builtins for any instructions that can't currently be expressed in
plain C.
- Add support for the saturating versions of fp_to_<s,u>int for i16x8.
Other vector sizes supported this already.
- Support bitcast of f16x8 to v128. This is needed to return a __f16x8 as
a v128_t.
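To make the last point concrete, here is a minimal sketch (the __f16x8
typedef is spelled out as an assumption about the header's internal
definition, and it presumes a toolchain with the half-precision feature
enabled):

    #include <wasm_simd128.h>

    /* Assumed internal typedef: eight __fp16 lanes in a 128-bit vector. */
    typedef __fp16 __f16x8 __attribute__((__vector_size__(16), __aligned__(16)));

    static v128_t as_v128(__f16x8 v) {
      return (v128_t)v; /* the new f16x8-to-v128 bitcast, a no-op at runtime */
    }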
Instead of splatting a single lane to initialise a build_vector, lower it
to scalar_to_vector, which can be selected to load_zero.
Also add load_zero and load_lane patterns for f32x4 and f64x2.
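As an illustrative sketch of the shape of code this helps, a build_vector
with one loaded lane and zero fill can now select v128.load32_zero:

    #include <wasm_simd128.h>

    typedef float __f32x4 __attribute__((__vector_size__(16)));

    /* One loaded lane, remaining lanes zero: lowered via scalar_to_vector
     * and selected to v128.load32_zero instead of a load plus a splat. */
    static v128_t load_low32(const float *p) {
      return (v128_t)(__f32x4){*p, 0.0f, 0.0f, 0.0f};
    }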
With fast-math, the ordered setcc nodes are converted to setcc nodes
which do not care about NaNs, so add patterns that use setlt, setle,
setgt and setge.
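For example (a sketch; the vector typedefs are written out to keep it
self-contained), compiling a plain ordered compare with fast-math enabled
produces exactly such NaN-agnostic setcc nodes:

    typedef float __f32x4 __attribute__((__vector_size__(16)));
    typedef int   __i32x4 __attribute__((__vector_size__(16)));

    /* With -ffast-math the "olt" here becomes a bare setlt, which the new
     * patterns still select to a single SIMD compare instruction. */
    static __i32x4 less_than(__f32x4 a, __f32x4 b) {
      return a < b; /* element-wise compare, -1/0 per lane */
    }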
This reuses most of the code that was created for the f32x4 and f64x2
binary instructions and tries to follow how they were implemented.
- add/sub/mul/div - use regular LLVM IR instructions
- min/max - use the minimum/maximum intrinsics, and also have builtins
- pmin/pmax - use the wasm.pmin/wasm.pmax intrinsics, and also have builtins
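A rough usage sketch (the builtin names follow this change but are spelled
here as assumptions, as is the __f16x8 typedef):

    typedef __fp16 __f16x8 __attribute__((__vector_size__(16)));

    /* Plain C arithmetic lowers through ordinary fadd/fsub/fmul/fdiv. */
    static __f16x8 arith(__f16x8 a, __f16x8 b) {
      return (a + b) * (a - b);
    }

    /* min/max via the builtins (names assumed). */
    static __f16x8 clamp(__f16x8 v, __f16x8 lo, __f16x8 hi) {
      return __builtin_wasm_min_f16x8(__builtin_wasm_max_f16x8(v, lo), hi);
    }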
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Adds a builtin and intrinsic for the f16x8.splat instruction.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f16x8.splat as opcode 0x123, but this is
incorrect and will be changed to 0x120 soon.
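Usage sketch (the signature is assumed: a float argument splatted into the
eight-lane half vector):

    typedef __fp16 __f16x8 __attribute__((__vector_size__(16)));

    static __f16x8 splat(float x) {
      return __builtin_wasm_splat_f16x8(x); /* -> f16x8.splat */
    }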
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.
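A hypothetical reduction of the failure mode (not the actual test case):

    typedef unsigned short __u16x8 __attribute__((__vector_size__(16)));

    /* 0xFF00 is a valid unsigned i16 lane value but lies outside the
     * signed i16 range that instruction selection previously insisted on. */
    static __u16x8 mask(void) {
      return (__u16x8){0xFF00, 0xFF00, 0xFF00, 0xFF00,
                       0xFF00, 0xFF00, 0xFF00, 0xFF00};
    }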
Fixes #63817.
WebAssembly doesn't expose inexact exceptions, so frint can be mapped to
fnearbyint. Likewise, WebAssembly always rounds ties-to-even, so
froundeven can be mapped to fnearbyint.
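For instance (scalar forms shown; the vector forms map the same way):

    /* Both functions can be emitted as a single f32.nearest: WebAssembly
     * raises no inexact exception (rint) and rounds ties to even
     * (roundeven). */
    float via_rint(float x)      { return __builtin_rintf(x); }      /* frint */
    float via_roundeven(float x) { return __builtin_roundevenf(x); } /* froundeven */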
Differential Revision: https://reviews.llvm.org/D153451
This intrinsic is meant to lower directly to the i8x16.shuffle instruction,
which takes its lane index arguments as immediates. The ISel for the intrinsic
assumed that the lane index arguments were constants, so bitcode that
"incorrectly" used this intrinsic with non-immediate arguments caused an
assertion failure in the backend.
Avoid the crash by defining the lane index arguments to be immediates, matching
the underlying instruction. Update ISel accordingly. This change means that the
bitcode that previously caused a crash will now fail to validate.
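Correct usage keeps every lane index an integer constant expression, as in
this sketch:

    #include <wasm_simd128.h>

    /* All sixteen indices are immediates, matching i8x16.shuffle. */
    v128_t interleave_low_bytes(v128_t a, v128_t b) {
      return wasm_i8x16_shuffle(a, b, 0, 16, 1, 17, 2, 18, 3, 19,
                                4, 20, 5, 21, 6, 22, 7, 23);
    }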
Fixes #55559.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D149898
The wasm64 versions of the v128.storeX_lane instructions were incorrectly defined
as returning a v128 value, which resulted in spurious drop instructions being
emitted and causing validation to fail. This was not caught earlier because
wasm64 has been experimental and not well tested. Update the relevant test file
to test both wasm32 and wasm64.
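A small sketch of the affected shape of code:

    #include <wasm_simd128.h>

    /* v128.store64_lane stores one 64-bit lane and produces no value, so
     * no drop should follow it on either wasm32 or wasm64. */
    void store_high_lane(double *p, v128_t v) {
      wasm_v128_store64_lane(p, v, 1);
    }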
Fixes #62443.
Differential Revision: https://reviews.llvm.org/D149780
After the change in D144169, codegen generates redundant instructions
such as `and` and `wrap`. This fixes that.
Differential Revision: https://reviews.llvm.org/D144360
Each operation was missing its inverted condition using olt or ogt. Also,
since we don't need to distinguish +/-0 here, we should also be able to
use ole and oge.
Differential Revision: https://reviews.llvm.org/D143581
Splats were selected by matching on uses of `build_vector` with
identical elements, but a while back a target-independent node for
vector splatting was added.
This removes the WebAssembly specific LOAD_SPLAT intrinsic, and instead
makes SPLAT_VECTOR legal and adds patterns for splat loads.
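For example (a sketch), a splat fed directly from memory now selects the
load-and-splat form:

    #include <wasm_simd128.h>

    /* Selected as v128.load32_splat rather than a scalar load followed by
     * lane shuffling. */
    v128_t splat_from_mem(const float *p) {
      return wasm_f32x4_splat(*p);
    }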
Differential Revision: https://reviews.llvm.org/D139871
This continues the refactoring work of selecting offset + address
operands with the AddrOpsN pattern, previously called LoadOpsN.
This is not an NFC, since constant addresses are now folded into the
offset in more places for v128.storeN_lane.
Differential Revision: https://reviews.llvm.org/D139950
This refactors out the offset and address operand pattern matching into
a ComplexPattern, so that one pattern fragment can match the dynamic and
static (offset) addresses in all possible positions.
Split out from D139530, which also contained an improvement to global
address folding.
Differential Revision: https://reviews.llvm.org/D139631
Use load32_zero instead of load32_splat to load the low 32 bits from memory to
v128. Test cases are added to cover this change.
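Sketch of the affected pattern:

    #include <wasm_simd128.h>

    /* Loads 32 bits into the low lane and zeroes the rest; previously a
     * v128.load32_splat was emitted here. */
    v128_t low32(const void *p) {
      return wasm_v128_load32_zero(p);
    }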
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D134257
Refactor the tablegen definitions for relaxed SIMD min/max instructions to use a
shared RelaxedBinary multiclass modeled on the existing SIMDBinary multiclass. A
future commit will add further instruction definitions that use RelaxedBinary.
Also rename the SIMD_RELAXED_CONVERT multiclass to RelaxedConvert to better fit
existing naming conventions.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D127157
Fix the instruction names to match the WebAssembly spec:
- `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
- `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`
Also rename related things like intrinsics, builtins, and test functions to
match.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D121661
When possible, optimize TRUNCATE to generate the Wasm SIMD narrow
instructions (i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u) rather than
generating lots of extract_lane and replace_lane operations.
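A sketch of the kind of input that benefits (types written with GNU vector
extensions):

    typedef unsigned int   __u32x8 __attribute__((__vector_size__(32)));
    typedef unsigned short __u16x8 __attribute__((__vector_size__(16)));

    /* An 8-lane i32 -> i16 truncate: now emitted as masking plus
     * i16x8.narrow_i32x4_u over the two 128-bit halves instead of a long
     * extract_lane/replace_lane chain. */
    static __u16x8 trunc_u32(__u32x8 v) {
      return __builtin_convertvector(v, __u16x8);
    }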
Closes #50350.
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.
These are only exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112186
Add relaxed f32x4.min, f32x4.max, f64x2.min, and f64x2.max. These are only
exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112146
Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112022
Previously, extra-wide v4f32 to v4f64 extending loads would be legalized to v2f32
to v2f64 extending loads, which would then be scalarized by legalization. (v2f32
to v2f64 extending loads not produced by legalization were already being emitted
correctly.) Instead, mark v2f32 to v2f64 extending loads as legal and explicitly
lower them using promote_low. This regresses the addressing modes supported for
the extloads not produced by legalization, but that's a fine trade off for now.
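A sketch of the newly legal shape:

    typedef float  __f32x2 __attribute__((__vector_size__(8)));
    typedef double __f64x2 __attribute__((__vector_size__(16)));

    /* A v2f32 -> v2f64 extending load: now a 64-bit load into the low half
     * of a v128 followed by f64x2.promote_low_f32x4, rather than two
     * scalarized load/promote pairs. */
    static __f64x2 extend_load(const __f32x2 *p) {
      return __builtin_convertvector(*p, __f64x2);
    }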
Differential Revision: https://reviews.llvm.org/D108496
Partially reverts 85157c007903, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.
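Illustrative use (the header intrinsic is real; the scenario is the one
described above):

    #include <wasm_simd128.h>

    /* Lowers through the reintroduced intrinsic, so it still produces
     * f32x4.pmin even when optimization splits the equivalent
     * compare-and-select across basic blocks. */
    v128_t clamp_upper(v128_t v, v128_t bound) {
      return wasm_f32x4_pmin(v, bound);
    }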
Differential Revision: https://reviews.llvm.org/D108387
The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.
Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.
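For instance (a sketch), a half-wide vector operation now keeps its data
in the low 64 bits:

    typedef int __i32x2 __attribute__((__vector_size__(8)));

    /* Each 64-bit operand load can now be selected with the much more
     * common v128.load64_splat pattern, leaving the two i32 lanes in the
     * low half of the v128 with no per-lane extension or masking. */
    static __i32x2 add2(const __i32x2 *a, const __i32x2 *b) {
      return *a + *b;
    }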
Differential Revision: https://reviews.llvm.org/D107502
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.
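Sketch of the header pattern in question:

    #include <wasm_simd128.h>

    /* v128_t is an integer vector, so the header wrapper bitcasts to a
     * double vector and back around the operation; the extra patterns see
     * through those bitcasts and still select f64x2.pmax. */
    v128_t pmax_wrapped(v128_t a, v128_t b) {
      return wasm_f64x2_pmax(a, b);
    }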
Differential Revision: https://reviews.llvm.org/D106612
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.
Differential Revision: https://reviews.llvm.org/D106400
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.
Differential Revision: https://reviews.llvm.org/D106019
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.
Differential Revision: https://reviews.llvm.org/D105950