1288 Commits

Author SHA1 Message Date
Jasmine Tang
672757bf55
[WebAssembly] Add patterns for extadd pairwise (#167960)
Add a few patterns for extadd pairwise.
2025-11-18 02:41:16 -08:00
Hongyu Chen
63e6373efd
[WebAssembly] Truncate extra bits of large elements in BUILD_VECTOR (#167223)
Fixes https://github.com/llvm/llvm-project/issues/165713
This patch handles out-of-bound vector elements and truncates extra
bits.
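A minimal C sketch of what the truncation amounts to. It assumes the offending elements are integer constants. SelectionDAG allows integer BUILD_VECTOR operands to be wider than the vector element type, and the extra bits are implicitly truncated. `truncate_to_lane` is a hypothetical helper for illustration, not code from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low `lane_bits` bits of a constant
   BUILD_VECTOR operand, mirroring the implicit truncation to the lane type. */
static uint64_t truncate_to_lane(uint64_t value, unsigned lane_bits) {
    if (lane_bits >= 64)
        return value; /* nothing to drop; also avoids an undefined 64-bit shift */
    return value & ((UINT64_C(1) << lane_bits) - 1);
}
```

For example, an i32 operand of 0x1FF that feeds a v16i8 lane should end up as 0xFF.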
2025-11-17 10:39:18 +00:00
Matt Arsenault
dfdada1b78
CodeGen: Remove target hook for terminal rule (#165962)
Enables the terminal rule for remaining targets
2025-11-12 21:12:19 +00:00
Hongyu Chen
9697f4b9e4
[WebAssembly][FastISel] Bail out on meeting non-integer type in selectTrunc (#167165)
Fixes https://github.com/llvm/llvm-project/issues/165438
With `simd128` enabled, FastISel may encounter vector truncations.
To respect #138479, this patch only bails out on non-integer IR types,
though I would prefer bailing out on all non-simple types, as most targets
(X86, AArch64) do.
2025-11-12 04:33:41 +08:00
Sam Parker
d47fdfec2b
[NFC][WebAssembly] Precommit test. (#167520) 2025-11-11 16:20:12 +00:00
Sam Parker
d10a85167a
[WebAssembly] Implement more of getCastInstrCost (#164612)
Fill out more information for sign and zero extend and add some truncate
information; however, the primary change is to int/fp conversions. In
particular, fp to (narrow) int appears to be relatively expensive.
2025-11-10 08:07:16 +00:00
Sam Parker
9e6a31f832
[WebAssembly] vf32 to vi8, vi16 lowering (#164644)
Avoid scalarizing the conversion and use trunc_sat and narrow instead.
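A scalar C model of one lane of the trunc_sat + narrow chain. It assumes signed conversions; the unsigned path would use the `_u` forms. The function names are hypothetical models of the Wasm SIMD instructions, and the real narrow instructions take two input vectors, which this per-lane sketch ignores. In-range values pass through unchanged. Out-of-range values saturate, which is acceptable because `fptosi` yields poison when the result does not fit.

```c
#include <math.h>
#include <stdint.h>

/* Model of one lane of i32x4.trunc_sat_f32x4_s: NaN becomes 0, out-of-range
   values saturate, and everything else truncates toward zero. */
static int32_t trunc_sat_f32_s(float f) {
    if (isnan(f))
        return 0;
    if (f <= -2147483648.0f)
        return INT32_MIN;
    if (f >= 2147483648.0f)
        return INT32_MAX;
    return (int32_t)f;
}

/* Model of one lane of i16x8.narrow_i32x4_s (signed saturation). */
static int16_t narrow_i32_s(int32_t v) {
    if (v < INT16_MIN)
        return INT16_MIN;
    if (v > INT16_MAX)
        return INT16_MAX;
    return (int16_t)v;
}

/* Model of one lane of i8x16.narrow_i16x8_s (signed saturation). */
static int8_t narrow_i16_s(int16_t v) {
    if (v < INT8_MIN)
        return INT8_MIN;
    if (v > INT8_MAX)
        return INT8_MAX;
    return (int8_t)v;
}

/* f32 -> i16: trunc_sat followed by one narrow. */
static int16_t f32_to_i16(float f) { return narrow_i32_s(trunc_sat_f32_s(f)); }

/* f32 -> i8: trunc_sat followed by two narrows. */
static int8_t f32_to_i8(float f) { return narrow_i16_s(f32_to_i16(f)); }
```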
2025-11-06 08:32:44 +00:00
Kleis Auke Wolthuizen
4b367e0b85
[WebAssembly] Use IRBuilder in FixFunctionBitcasts (NFC) (#164268)
Simplifies the code a bit.
2025-11-05 01:35:15 +00:00
Jasmine Tang
e6cd7a52bc
[WebAssembly] [Codegen] Add pattern for relaxed min max from pmin/pmax-based patterns over v4f32 and v2f64 (#164486)
Related to https://github.com/llvm/llvm-project/issues/55932
2025-10-23 01:39:02 -07:00
Florian Hahn
a7672fee0f
[WebAssembly] Fixup test after bfc322dd724735.
Test update was missed in bfc322dd724735 due to a codegen test running
loop-vectorize directly. The loop no longer gets vectorized.
2025-10-22 22:34:47 +01:00
Sam Parker
20340accf2
[NFC][WebAssembly] FP conversion interleave tests (#164576) 2025-10-22 11:43:44 +01:00
Jasmine Tang
1fbfac30f1
[WebAssembly] [Codegen] Add pattern for relaxed min max from fminimum/fmaximum over v4f32 and v2f64 (#162948)
Related to #55932
2025-10-22 03:08:24 -07:00
Sam Parker
aa63949428
[WebAssembly] Avoid dot for v16i8 partial_smla (#163796)
The sequence is shorter, by two extend operations, if we just use extmul
and extadd_pairwise.
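A scalar C model of the extmul + extadd_pairwise sequence for a v16i8 partial_smla into a v4i32 accumulator. It assumes signed i8 inputs, and the lane functions are hypothetical models of the Wasm instructions. The dot-based form needs i16 inputs, so the i8 lanes would have to be extended first. Here, extmul widens and multiplies in one step. The check compares totals on the assumption that only the sum across lanes matters for a partial reduction.

```c
#include <stdint.h>

/* Model of i16x8.extmul_low_i8x16_s (half = 0) and extmul_high (half = 1):
   sign-extend eight i8 lanes of each operand and multiply into i16 lanes. */
static void extmul_i8x16_s(const int8_t a[16], const int8_t b[16], int half,
                           int16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(a[8 * half + i] * b[8 * half + i]);
}

/* Model of i32x4.extadd_pairwise_i16x8_s: add adjacent i16 lanes into i32. */
static void extadd_pairwise_i16x8_s(const int16_t in[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
}

/* v16i8 -> v4i32 partial signed multiply-accumulate without any dot. */
static void partial_smla_v16i8(int32_t acc[4], const int8_t a[16],
                               const int8_t b[16]) {
    int16_t lo[8], hi[8];
    int32_t lo_sum[4], hi_sum[4];
    extmul_i8x16_s(a, b, 0, lo);
    extmul_i8x16_s(a, b, 1, hi);
    extadd_pairwise_i16x8_s(lo, lo_sum);
    extadd_pairwise_i16x8_s(hi, hi_sum);
    for (int i = 0; i < 4; i++)
        acc[i] += lo_sum[i] + hi_sum[i];
}
```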
2025-10-20 09:12:00 +01:00
Jasmine Tang
893b1d4187
[WebAssembly] [Codegen] Add patterns for relaxed dot (#163266)
The pattern I added for `relaxed dot` is similar to the one for normal dot in
https://github.com/llvm/llvm-project/pull/151775.

For `relaxed dot add`, I noticed that in the proposal the dot portion of the
implementation is similar to `relaxed dot`, so I think we can add a
pattern that matches a relaxed dot followed by an extadd pairwise and
selects `relaxed dot add`.

One current obstacle is that I don't think there is any pattern that
creates an extadd pairwise on its own from other instructions, so the
`relaxed dot add` pattern would not cover a wide range of instructions.

related to https://github.com/llvm/llvm-project/issues/55932
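A scalar C sketch of why relaxed dot followed by extadd pairwise matches `relaxed dot add`. It assumes every lane of `b` is in [0, 127]. When the top bit is set, the relaxed-SIMD proposal leaves the result implementation-defined, so this sketch does not model that case. The function names are hypothetical models of i16x8.relaxed_dot_i8x16_i7x16_s and i32x4.relaxed_dot_i8x16_i7x16_add_s.

```c
#include <stdint.h>

/* Model of i16x8.relaxed_dot_i8x16_i7x16_s, assuming b lanes are in [0, 127]
   so each pairwise sum fits in i16 and the result is fully defined. */
static void relaxed_dot_i8x16_i7x16_s(const int8_t a[16], const int8_t b[16],
                                      int16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]);
}

/* Model of i32x4.relaxed_dot_i8x16_i7x16_add_s: the relaxed dot above, then
   an i16 -> i32 extadd pairwise, then add the accumulator c. */
static void relaxed_dot_i8x16_i7x16_add_s(const int8_t a[16], const int8_t b[16],
                                          const int32_t c[4], int32_t out[4]) {
    int16_t dot[8];
    relaxed_dot_i8x16_i7x16_s(a, b, dot);
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)dot[2 * i] + (int32_t)dot[2 * i + 1] + c[i];
}
```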
2025-10-16 15:01:57 +00:00
Sam Parker
65363e64f8
[WebAssembly] Partial SMLA with relaxed dot (#163529)
Lower v16i8 to v4i32 partial_smla to relaxed_dot_add. I'm still unsure
whether we could/should take advantage of the unknown signedness of the
rhs, and lower the partial_sumla operation as well.
2025-10-16 07:09:16 +01:00
Derek Schuff
19a58a5208
[WebAssembly] Optimize lowering of constant-sized memcpy and memset (#163294)
We currently emit a check that the size operand isn't zero, to avoid
executing the wasm memory.copy instruction when it would trap.
But this isn't necessary if the operand is a constant.
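A small C model of why the guard exists. Wasm's memory.copy bounds-checks both ranges before copying, so a zero-length copy at an out-of-bounds address still traps. `llvm.memcpy` with length 0 is a no-op. With a constant size, the outcome of the `n == 0` test is known at compile time, so the branch is unnecessary. The function names and the `mem_size` parameter (memory size in bytes) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of memory.copy's bounds check: it traps when either range extends
   past the end of memory, even when n is 0. */
static bool memory_copy_traps(uint64_t dst, uint64_t src, uint64_t n,
                              uint64_t mem_size) {
    return dst + n > mem_size || src + n > mem_size;
}

/* Guarded lowering for a non-constant size: skip memory.copy when n == 0,
   matching llvm.memcpy, where a zero length is a no-op. */
static bool guarded_copy_traps(uint64_t dst, uint64_t src, uint64_t n,
                               uint64_t mem_size) {
    if (n == 0)
        return false;
    return memory_copy_traps(dst, src, n, mem_size);
}
```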

Fixes #163245
2025-10-14 22:00:25 +00:00
Derek Schuff
3e22438320
[CodeGen] Use getObjectPtrOffset to generate loads/stores for mem intrinsics (#80184)
This causes address arithmetic to be generated with the 'nuw' flag, 
allowing WebAssembly constant offset folding.

Fixes #79692
2025-10-13 17:22:48 -07:00
Jasmine Tang
55d4e92c88
[WebAssembly] Add extra pattern for dot (#151775)
Fixes https://github.com/llvm/llvm-project/issues/50154
2025-10-13 10:27:12 -07:00
Sam Parker
1820102167
Wasm fmuladd relaxed (#163177)
Reland #161355, after fixing up the cross-projects-tests for the wasm
simd intrinsics.

Original commit message:
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 16:50:53 +01:00
Sam Parker
30d3441cf0
Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171)
Reverts llvm/llvm-project#161355

Looks like I've broken some intrinsic code generation.
2025-10-13 11:53:40 +01:00
Sam Parker
a4eb7ea225
[WebAssembly] Lower fmuladd to madd and nmadd (#161355)
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 10:36:08 +01:00
Folkert de Vries
761be78dd7
[WebAssembly] recognize saturating truncation (#155470)
fixes https://github.com/llvm/llvm-project/issues/153838
using the same approach as
https://github.com/llvm/llvm-project/pull/155377

Recognize a manual saturating truncation and select the corresponding
instruction. This is useful in general, but came up specifically in
https://github.com/rust-lang/stdarch because it will allow us to drop
more target-specific intrinsics in favor of cross-platform ones.
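A C illustration of the idea. The message does not spell out the exact IR the combine matches, so the loop below is a hypothetical clamp-then-truncate idiom for the unsigned i16 to u8 case. Per lane, it computes the same result as i8x16.narrow_i16x8_u, which reads each lane as signed i16 and saturates to [0, 255]. That equivalence is what allows the instruction to be selected.

```c
#include <stdint.h>

/* A clamp-then-truncate loop, written the way portable code might spell a
   saturating i16 -> u8 conversion (hypothetical example). */
static void sat_trunc_i16_to_u8(const int16_t *in, uint8_t *out, int n) {
    for (int i = 0; i < n; i++) {
        int16_t v = in[i];
        if (v < 0)
            v = 0;
        if (v > 255)
            v = 255;
        out[i] = (uint8_t)v;
    }
}

/* Lane semantics of i8x16.narrow_i16x8_u: each lane is read as signed i16
   and saturated to the unsigned range [0, 255]. */
static uint8_t narrow_i16_u_lane(int16_t v) {
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}
```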
2025-10-08 11:52:18 -07:00
Derek Schuff
abc8aac6d2
[WebAssembly] Check intrinsic argument count before Any/All combine (#162163)
This code runs on every INTRINSIC_WO_CHAIN node but only handles
a few intrinsics. However, it was reading the arguments before
checking which intrinsic it was handling, which fails for intrinsics
that have no arguments.
2025-10-07 23:52:25 +00:00
Yatao Wang
178e2a704b
[LLVM][CodeGen] Check Non Saturate Case in isSaturatingMinMax (#160637)
Fix Issue #160611
2025-10-03 20:39:45 +01:00
Sam Parker
156e9b4b69
[WebAssembly] Use partial_reduce_mla ISD nodes (#161184)
Addressing issue #160847.
 
Move away from combining the intrinsic call and instead lower the ISD
nodes, using tablegen for pattern matching.
2025-09-30 08:28:56 +01:00
Heejin Ahn
e5b2a06546
[WebAssembly] Remove FAKE_USEs before ExplicitLocals (#160768)
`FAKE_USE`s are essentially no-ops, so they have to be removed before
running ExplicitLocals so that `drop`s will be correctly inserted to
drop those values used by the `FAKE_USE`s.

---

This is a reapplication of #160228, which broke the Wasm waterfall. This PR
additionally prevents the uses of `FAKE_USE`s from being stackified.

Previously, a 'def' whose first use was a `FAKE_USE` could be
stackified as a `TEE`:
- Before
```
Reg = INST ...            // Def
FAKE_USE ..., Reg, ...    // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```

- After RegStackify
```
DefReg = INST ...            // Def
TeeReg, Reg = TEE ... DefReg
FAKE_USE ..., TeeReg, ...    // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```
And this assumes `DefReg` and `TeeReg` are stackified.

But this PR removes `FAKE_USE`s at the beginning of ExplicitLocals, and
later in ExplicitLocals we have a routine to unstackify registers that
have no uses left:

7b28fcd2b1/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp (L257-L269)
(This was added in #149626. At the time it didn't seem it would trigger the
same assertions for `TEE`s, because it was fixing a bug where a
terminator was removed in CFGSort (#149097).
Details here:
https://github.com/llvm/llvm-project/pull/149432#issuecomment-3091444141)

- After `FAKE_USE` removal and unstackification
```
DefReg = INST ...
TeeReg, Reg = TEE ... DefReg
INST ..., Reg, ...
INST ..., Reg, ...
```
And now `TeeReg` is unstackified. This triggered the assertion here,
that `TeeReg` should be stackified:

7b28fcd2b1/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp (L316)

This PR prevents `FAKE_USE`s' uses from being stackified altogether,
including the `TEE` transformation. A plain single-use stackification
(without a `TEE`) does not trigger the assertion, but there's no point
stackifying it given that it will be deleted.

---

Fixes https://github.com/emscripten-core/emscripten/issues/25301.
2025-09-25 14:49:25 -07:00
Derek Schuff
3bdf05a05a
Revert "[WebAssembly] Remove FAKE_USEs before ExplicitLocals" (#160553)
Reverts llvm/llvm-project#160228
See
https://github.com/llvm/llvm-project/pull/160228#issuecomment-3329752471
2025-09-24 16:55:48 +00:00
Heejin Ahn
d27654f9d8
[WebAssembly] Remove FAKE_USEs before ExplicitLocals (#160228)
`FAKE_USE`s are essentially no-ops, so they have to be removed before
running ExplicitLocals so that `drop`s will be correctly inserted to
drop those values used by the `FAKE_USE`s.

Fixes https://github.com/emscripten-core/emscripten/issues/25301.
2025-09-23 14:14:40 -07:00
Sam Clegg
cac54a8ad0
[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined externally (#159143)
Rather than defining these tags in each object file that requires them,
we can declare them as undefined and require that they be defined
externally in, for example, compiler-rt or libcxxabi.
2025-09-19 10:11:15 -07:00
Sam Parker
586c0ad918
[WebAssembly] Support partial-reduce accumulator (#158060)
We currently only support partial.reduce.add in the case where we are
performing a multiply-accumulate. Now add support for any partial
reduction where the input is being extended, where we can take advantage
of extadd_pairwise.
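A scalar C sketch of a partial reduction whose input is extended. It covers `partial.reduce.add(acc, sext(v16i8))` into v4i32 and assumes signed extension; the unsigned case would use the `_u` forms. Two extadd_pairwise steps replace the explicit extends and adds. The function names are hypothetical models of the Wasm instructions, and the test checks only the total across lanes.

```c
#include <stdint.h>

/* Model of i16x8.extadd_pairwise_i8x16_s: add adjacent i8 lanes into i16. */
static void extadd_pairwise_i8x16_s(const int8_t in[16], int16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(in[2 * i] + in[2 * i + 1]);
}

/* Model of i32x4.extadd_pairwise_i16x8_s: add adjacent i16 lanes into i32. */
static void extadd_pairwise_i16x8_s(const int16_t in[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
}

/* Partial reduction of a sign-extended v16i8 into a v4i32 accumulator. */
static void partial_reduce_add_sext_v16i8(int32_t acc[4], const int8_t in[16]) {
    int16_t halves[8];
    int32_t words[4];
    extadd_pairwise_i8x16_s(in, halves);
    extadd_pairwise_i16x8_s(halves, words);
    for (int i = 0; i < 4; i++)
        acc[i] += words[i];
}
```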
2025-09-12 07:03:49 +01:00
Arthur Eubanks
984251acad
Revert "[DAGCombiner] Relax condition for extract_vector_elt combine" (#157953)
Reverts llvm/llvm-project#157658

Causes hangs, see
https://github.com/llvm/llvm-project/pull/157658#issuecomment-3276441812
2025-09-10 21:33:44 +00:00
ZhaoQi
4621e17dee
[DAGCombiner] Relax condition for extract_vector_elt combine (#157658)
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows
more optimization opportunities. In particular, if a target wants to
mark `extract_vector_elt` as `Custom` rather than `Legal` in order to
optimize certain cases, this combiner would otherwise miss some
improvements.

Previously, using `isOperationLegalOrCustom` was avoided due to the risk
of getting stuck in infinite loops (as noted in
61ec738b60).
After testing, the issue no longer reproduces, but the coverage is
limited to the regression/unit tests and the test-suite.
2025-09-10 15:51:52 +08:00
Sam Parker
6dacdc31ec
[WebAssembly] extadd_pairwise for PartialReduce (#157669)
Instead of extending and then adding the high and low halves, use
extadd_pairwise.
2025-09-10 08:13:46 +01:00
Trevor Gross
79d2961626
[WebAssembly] Update the test for half (NFC) (#152832)
Replace the existing `f16` test with the version that is used for other
architectures (typically as `half.ll`). This still covers the
conversions from the existing test, but also adds checks for most simple
ops.

Additionally, rename `half-precision.ll` to `fp-intrinsics.ll` to keep
the name similar to this test.
2025-09-08 15:43:51 +00:00
Derek Schuff
a3c41ddcaf
[WebAssembly] Guard use of getSymbolName with isSymbol (#156105)
WebAssemblyRegStackify checks for writes to the stack pointer to avoid
stackifying across them, but it wasn't prepared for other global_set
instructions (such as writes in addrspace 1).

Fixes #156055

Thanks to @QuantumSegfault for reporting and identifying the offending
code.
2025-09-02 16:21:35 -07:00
Sam Parker
7b3e77f8d9
[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)
First pass where we calculate the cost of the memory operation, as well
as the shuffles required. Interleaving by a factor of two should be
relatively cheap, as many ISAs have dedicated instructions to perform
the (de)interleaving. Several of these permutations can be combined for
an interleave stride of 4 and this is the highest stride we allow.

I've costed larger vectors, and more lanes, as more expensive because
not only is more work needed but the risk of codegen going 'wrong'
rises dramatically. I also filled in a bit of cost modelling for vector
stores.

It appears the main vector plan to avoid is an interleave factor of 4
with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on
AArch64, and observed a geomean improvement of ~3%, with some kernels
improving 40-60%.

I know there is still significant performance being left on the table,
so this will need more development along with the rest of the cost
model.
2025-08-27 12:43:52 +01:00
Sam Parker
e557ad687b
[WebAssembly] v8i8 mul support (#151145)
During DAG combine, promote the operands to v8i16 by concatenating them with an
undef vector and then use extmul_low to perform the mul at i16. Finally,
shuffle the low bytes out of the i16 elements into the result vector.
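A scalar C model of the v8i8 mul sequence. Only the low byte of each i16 product is kept, so the signedness of extmul does not affect the result. This sketch assumes the unsigned form and models the undef upper half as zeros. Keeping the low byte of each lane stands in for the final byte shuffle. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Model of i16x8.extmul_low_i8x16_u: widen the low eight u8 lanes of each
   operand and multiply them into u16 lanes. */
static void extmul_low_i8x16_u(const uint8_t a[16], const uint8_t b[16],
                               uint16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] * b[i]);
}

/* v8i8 mul: put each operand in the low half of a v16i8 (zeros stand in for
   undef), multiply at i16, then keep the low byte of every i16 lane. */
static void mul_v8i8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
    uint8_t wide_a[16] = {0}, wide_b[16] = {0};
    uint16_t prod[8];
    memcpy(wide_a, a, 8);
    memcpy(wide_b, b, 8);
    extmul_low_i8x16_u(wide_a, wide_b, prod);
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(prod[i] & 0xFF);
}
```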
2025-08-27 11:39:26 +01:00
Jasmine Tang
7fcee5fe08
[WebAssembly] Add support for avgr_u in loops (#153252)
Fixes https://github.com/llvm/llvm-project/issues/150550.

With the test case 
```
void f(unsigned char *x, unsigned char *y, int n) {
  // should have been vectorized into avgr_u instead of separate vectorized add and logical right shift
  for (int i = 0; i < n; i++)
    x[i] = (x[i] + y[i] + 1) / 2;
}
```

the backend failed to recognize that this can be reduced to avgr_u, since
the loop vectorizer doesn't produce the pattern that already exists in
tablegen.

This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to
avgr_u in the tablegen file.
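A C sketch of why the rewrite is sound. In the test case, C promotes `x[i] + y[i] + 1` to `int`, so nothing overflows. AVGCEIL_U / avgr_u computes the same rounding average within 8 bits. The lane model below uses the overflow-free identity `(a | b) - ((a ^ b) >> 1)`, which is one way to compute ceil((a + b) / 2). It is not necessarily how engines implement the instruction. The function names are hypothetical.

```c
#include <stdint.h>

/* Lane model of i8x16.avgr_u: rounding average computed in 8 bits without
   overflow, using (a | b) - ((a ^ b) >> 1) == ceil((a + b) / 2). */
static uint8_t avgr_u_lane(uint8_t a, uint8_t b) {
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* The loop body from the test case; the arithmetic happens in int. */
static uint8_t loop_body(uint8_t x, uint8_t y) {
    return (uint8_t)((x + y + 1) / 2);
}
```

Checking all 65536 input pairs confirms that the two agree. This is why selecting avgr_u for the vectorized add-plus-shift is valid.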
2025-08-22 09:52:49 -07:00
Jasmine Tang
d7a29e5d56
[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703)
This PR reapplies https://github.com/llvm/llvm-project/pull/149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.

For example, with
```llvm
  %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR produces all_true and then also compares the result
equal to 0 (using the same SETEQ in the returned setcc), meaning that
semantically it is effectively icmp ne.

Instead, the PR should have used SETNE in the returned setcc. This way,
all_true returns 1 when the inputs are equal, and comparing that ne 0
is equivalent to icmp eq.
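A scalar C model of the condition-code bug. It assumes a 16-byte memcmp compared for equality. `all_true_eq` is a hypothetical model of i8x16.all_true applied to a lane-wise i8x16.eq. It returns 1 exactly when the buffers match, so `icmp eq (memcmp ...), 0` must become `all_true(...) != 0`. Reusing SETEQ flips the answer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Model of i8x16.all_true over the lane-wise result of i8x16.eq. */
static int all_true_eq(const uint8_t a[16], const uint8_t b[16]) {
    for (int i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Reference semantics of `icmp eq (memcmp a, b, 16), 0`. */
static bool memcmp_eq_reference(const uint8_t *a, const uint8_t *b) {
    return memcmp(a, b, 16) == 0;
}

/* Fixed combine: compare all_true against 0 with SETNE. */
static bool memcmp_eq_fixed(const uint8_t *a, const uint8_t *b) {
    return all_true_eq(a, b) != 0;
}

/* Original combine: reusing SETEQ computes icmp ne instead. */
static bool memcmp_eq_buggy(const uint8_t *a, const uint8_t *b) {
    return all_true_eq(a, b) == 0;
}
```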
2025-08-15 12:06:47 -07:00
Jasmine Tang
d32793ca6e
Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360)
Reverts llvm/llvm-project#149461

The first test with memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it.

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js`

Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
2025-08-13 07:41:44 +00:00
Philip Reames
4d629f9744
[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226)
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it became clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
2025-08-12 11:23:05 -07:00
Jasmine Tang
348f01f89c
[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461)
Fixes https://github.com/llvm/llvm-project/issues/149230

Previously, even with simd enabled via `-mattr=+simd128`, the compiler
could not use v128 to optimize loads and setcc of i128, and instead
legalized them to consecutive i64s.

This PR then adds support for setcc of i128 by converting them to
v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16
bytes or more (when simd128 is present).

The optimization is enabled when the comparison operand is either a load
or an i128 integer, the condition code is either `EQ` or `NE`, and the
function does not have the `NoImplicitFloat` attribute.

Inspiration taken from RISCV's isel lowering.
2025-08-12 11:04:37 -07:00
Trevor Gross
00c4be3c9e
[Test] Add and update tests for lrint/llrint (NFC) (#152662)
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:

* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls

There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
2025-08-12 09:56:51 +09:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Hood Chatham
b9c328480c
[clang][WebAssembly] Support reftypes & varargs in test_function_pointer_signature (#150921)
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).

I added tests for structs and unions, which already work. With the
multivalue ABI they crash in the backend, so I added a sema check that
rejects structs and unions for that ABI.

It will also crash in the backend if passed an int128 or float128 type.
2025-08-07 13:07:04 -07:00
Hood Chatham
8dd91996f0
[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151294)
Also alphabetize feature list, add `-mgc` and `-mno-gc` flags, and add
some missing feature tests.

Reland of #151107.

https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-30 13:04:02 -07:00
ronlieb
a7e029bd0b
Revert "[WebAssembly] Add gc target feature to addBleedingEdgeFeatures" (#151268)
Reverts llvm/llvm-project#151107
2025-07-29 22:07:17 -07:00
Hood Chatham
fe25445ded
[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151107)
See suggestion here:
https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-29 17:54:12 -07:00
Sam Parker
29e02d792b
[NFC][WebAssembly] Precommit test for v8i8 mul (#151139) 2025-07-29 13:49:51 +01:00
Sam Parker
68152f1301
[WebAssembly] v16i8 mul support (#150209)
During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle
for v16i8 mul.

On my AArch64 machine, using V8, I observe a 3.14% geomean improvement
across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv
and 1.8% for ncnn.
2025-07-29 09:23:31 +01:00