1260 Commits

Author SHA1 Message Date
Sam Clegg
cac54a8ad0
[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined externally (#159143)
Rather then defining these tags in each object file that requires them
we can can declare them as undefined and require that they defined
externally in, for example, compiler-rt or libcxxabi.
2025-09-19 10:11:15 -07:00
Sam Parker
586c0ad918
[WebAssembly] Support partial-reduce accumulator (#158060)
We currently only support partial.reduce.add in the case where we are
performing a multiply-accumulate. Now add support for any partial
reduction where the input is being extended, where we can take advantage
of extadd_pairwise.
2025-09-12 07:03:49 +01:00
Arthur Eubanks
984251acad
Revert "[DAGCombiner] Relax condition for extract_vector_elt combine" (#157953)
Reverts llvm/llvm-project#157658

Causes hangs, see
https://github.com/llvm/llvm-project/pull/157658#issuecomment-3276441812
2025-09-10 21:33:44 +00:00
ZhaoQi
4621e17dee
[DAGCombiner] Relax condition for extract_vector_elt combine (#157658)
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows
more optimization opportunities. In particular, if a target wants to
mark `extract_vector_elt` as `Custom` rather than `Legal` in order to
optimize some certain cases, this combiner would otherwise miss some
improvements.

Previously, using `isOperationLegalOrCustom` was avoided due to the risk
of getting stuck in infinite loops (as noted in
61ec738b60).
After testing, the issue no longer reproduces, but the coverage is
limited to the regression/unit tests and the test-suite.
2025-09-10 15:51:52 +08:00
Sam Parker
6dacdc31ec
[WebAssembly] extadd_pairwise for PartialReduce (#157669)
Avoid using extends, and adding the high and low half and use
extadd_pairwise instead.
2025-09-10 08:13:46 +01:00
Trevor Gross
79d2961626
[WebAssembly] Update the test for half (NFC) (#152832)
Replace the existing `f16` test with the version that is uses for other
architectures (typically as `half.ll`). This still covers the
conversions from the existing test, but also adds checks for most simple
ops.

Additionally, rename `half-precision.ll` to `fp-intrinsics.ll` to keep
the name similar to this test.
2025-09-08 15:43:51 +00:00
Derek Schuff
a3c41ddcaf
[WebAssembly] Guard use of getSymbolName with isSymbol (#156105)
WebAssemblyRegStackfy checks for writes to the stack pointer to avoid
stackifying across them, but it wasn't prepared for other global_set
instructions (such as writes in addrspace 1).

Fixes #156055

Thanks to @QuantumSegfault for reporting and identifying the offending
code.
2025-09-02 16:21:35 -07:00
Sam Parker
7b3e77f8d9
[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)
First pass where we calculate the cost of the memory operation, as well
as the shuffles required. Interleaving by a factor of two should be
relatively cheap, as many ISAs have dedicated instructions to perform
the (de)interleaving. Several of these permutations can be combined for
an interleave stride of 4 and this is the highest stride we allow.

I've costed larger vectors, and more lanes, as more expensive because
not only is more work is needed but the risk of codegen going 'wrong'
rises dramatically. I also filled in a bit of cost modelling for vector
stores.

It appears the main vector plan to avoid is an interleave factor of 4
with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on
AArch64, and observe geomean improvement of ~3% with some kernels
improving 40-60%.

I know there is still significant performance being left on the table,
so this will need more development along with the rest of the cost
model.
2025-08-27 12:43:52 +01:00
Sam Parker
e557ad687b
[WebAssembly] v8i8 mul support (#151145)
During DAG combine, promote the operands to v8i16 by concanting with an
undef vector and then use extmul_low to perform the mul at i16. Finally,
shuffle the low bytes out of the i16 elements into the result vector.
2025-08-27 11:39:26 +01:00
Jasmine Tang
7fcee5fe08
[WebAssembly] Add support for avgr_u in loops (#153252)
Fixes https://github.com/llvm/llvm-project/issues/150550.

With the test case 
```
void f(unsigned char *x, unsigned char *y, int n) {
  // should have been vectorized into avgr_u instead of seperated vectorized add and logical right shift
  for (int i = 0; i < n; i++)
    x[i] = (x[i] + y[i] + 1) / 2;
}
```

the backend failed to recognize that this can be reduced to avgr_u since
the loop vectorizer doesn't transform into the existing pattern in
tablegen.

This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to
avgr_u in the tablegen file.
2025-08-22 09:52:49 -07:00
Jasmine Tang
d7a29e5d56
[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703)
This PR reapplies https://github.com/llvm/llvm-project/pull/149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.

For example, with
```llvm
 %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR producese all_true and then also compares the result
equal to 0 (using the same SETEQ in the returning setcc), meaning that
semantically, it effectively is calling icmp ne.

Instead, the PR should have use SETNE in the returning setcc, this way,
all true return 1, then it is compared again ne 0, which is equivalent
to icmp eq.
2025-08-15 12:06:47 -07:00
Jasmine Tang
d32793ca6e
Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360)
Reverts llvm/llvm-project#149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`

Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
2025-08-13 07:41:44 +00:00
Philip Reames
4d629f9744
[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226)
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
2025-08-12 11:23:05 -07:00
Jasmine Tang
348f01f89c
[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461)
Fixes https://github.com/llvm/llvm-project/issues/149230

Previously, even with simd enabled via `-mattr=+simd128`, the compiler
cannot utilize v128 to optimize loads and setcc of i128, instead
legalizing it to consecutive i64s.

This PR then adds support for setcc of i128 by converting them to
v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16
bytes or more (when simd128 is present).

The check for enabling this optimization is if the comparison operand is
either a load or an integer in i128, with the comparison code being
either `EQ | NE`, without `NoImplicitFloat` function flag.

Inspiration taken from RISCV's isel lowering.
2025-08-12 11:04:37 -07:00
Trevor Gross
00c4be3c9e
[Test] Add and update tests for lrint/llrint (NFC) (#152662)
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:

* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls

There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
2025-08-12 09:56:51 +09:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Hood Chatham
b9c328480c
[clang][WebAssembly] Support reftypes & varargs in test_function_pointer_signature (#150921)
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).

I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.

It will also crash in the backend if passed an int128 or float128 type.
2025-08-07 13:07:04 -07:00
Hood Chatham
8dd91996f0
[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151294)
Also alphebetize feature list, add `-mgc` and `-mno-gc` flags, and add
some missing feature tests.

Reland of #151107.

https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-30 13:04:02 -07:00
ronlieb
a7e029bd0b
Revert "[WebAssembly] Add gc target feature to addBleedingEdgeFeatures" (#151268)
Reverts llvm/llvm-project#151107
2025-07-29 22:07:17 -07:00
Hood Chatham
fe25445ded
[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151107)
See suggestion here:
https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-29 17:54:12 -07:00
Sam Parker
29e02d792b
[NFC][WebAssembly] Precommit test for v8i8 mul (#151139) 2025-07-29 13:49:51 +01:00
Sam Parker
68152f1301
[WebAssembly] v16i8 mul support (#150209)
During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle
for v16i8 mul.

On my AArch64 machine, using V8, I observe a 3.14% geomean improvement
across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv
and 1.8% for ncnn.
2025-07-29 09:23:31 +01:00
Nikita Popov
fc90685354
[WebAssemblyLowerEmscriptenEHSjLj] Avoid lifetime of phi (#150932)
After #149310 lifetime intrinsics require an alloca argument, an
invariant that this pass can break.

I've fixed this in two ways:
* First, move static allocas into the entry block. Currently, the way
the pass splits the entry block makes all allocas dynamic, which I
assume was not actually intended. This will avoid unnecessary SSA
reconstruction for allocas as well, and thus avoid the problem.
* If this fails (for dynamic allocas) drop all lifetime intrinsics if
any one of them would require a rewrite during SSA reconstruction.

Fixes https://github.com/llvm/llvm-project/issues/150498.
2025-07-29 09:58:57 +02:00
Jasmine Tang
522ac23609
[WebAssembly] Add pattern for relaxed nmadd (#150684)
Following footstep of https://github.com/llvm/llvm-project/pull/147487
(support for madd), this PR adds support for nmadd.

https://github.com/llvm/llvm-project/issues/55932 tracks this
2025-07-28 10:20:04 -07:00
Hood Chatham
15b03687ff
[WebAssembly,clang] Add __builtin_wasm_test_function_pointer_signature (#150201)
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.

Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
2025-07-25 16:52:39 -07:00
Jasmine Tang
8e6a05d471
[WebAssembly] Added vectorized version of fexp10 to the supported list (#150564)
Fixes https://github.com/llvm/llvm-project/issues/117200.

The default behavior in TargetLoweringBase is only scalar floats on fexp
are supported by default, not the vectorized version. This PR adds
`ISD::FEXP10` to the supported list.
2025-07-25 12:30:59 -07:00
Nikita Popov
129a35454c [WebAssemblyOptimizeReturned] Skip lifetime intrinsic uses
Replacing an alloca with a call result in a lifetime intrinsic
will cause a verifier error.

Fixes https://github.com/llvm/llvm-project/issues/150498.
2025-07-25 12:12:26 +02:00
Hood Chatham
e3b79afa67
[WebAssembly,llvm] Fix buildbot problems with llvm.wasm.ref.test.func (#150116)
PR #147486 broke the sanitizer and expensive-checks buildbot. 

These captures were needed when toWasmValType emitted a diagnostic but
are no longer needed since we changed it to an assertion failure. This
removes the unneeded captures and should fix the sanitizer-buildbot.

I also fixed the codegen in the wasm64 target: table.get requires an i32
but in wasm64 the function pointer is an i64. We need an additional
`i32.wrap_i64` to convert it. I also added `-verify-machineinstrs` to
the tests so that the test suite validates this fix.

Finally, I noticed that #150201 uses a feature of the intrinsic that is
not covered by the tests, namely `ptr` arguments. So I added one
additional test case to ensure that it works properly.

cc @dschuff
2025-07-23 09:52:05 -07:00
Heejin Ahn
b13bca7387
[WebAssembly] Unstackify registers with no uses in ExplicitLocals (#149626)
There are cases we end up removing some intructions that use stackified
registers after RegStackify. For example,

```wasm
bb.0:
  %0 = ...    ;; %0 is stackified
  br_if %bb.1, %0
bb.1:
```

In this code, br_if will be removed in CFGSort, so we should unstackify
%0 so that it can be correctly dropped in ExplicitLocals.

Rather than handling this in case-by-case basis, this PR just
unstackifies all stackifies register with no uses in the beginning of
ExplicitLocals, so that they can be correctly dropped.

Fixes #149097.
2025-07-22 15:34:23 -07:00
Hood Chatham
15715f4089
[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486)
This adds an llvm intrinsic for WebAssembly to test the type of a
function. It is intended for adding a future clang builtin
` __builtin_wasm_test_function_pointer_signature` so we can test whether
calling a function pointer will fail with function signature mismatch.

Since the type of a function pointer is just `ptr` we can't figure out
the expected type from that.
The way I figured out to encode the type was by passing 0's of the
appropriate type to the intrinsic.
The first argument gives the expected type of the return type and the
later values give the expected
type of the arguments. So
```llvm
@llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0)
```
tests if `%func` is of type `(double, i32) -> (i32)`. It will lower to:
```wat
local.get $func
table.get $__indirect_function_table
ref.test (double, i32) -> (i32)
```
To indicate the function should be void, I somewhat arbitrarily picked
`token poison`, so the following tests for `(i32) -> ()`:
```llvm
@llvm.wasm.ref.test.func(ptr %func, token poison, i32 0)
```

To lower this intrinsic, we need some place to put the type information.
With `encodeFunctionSignature()` we encode the signature information
into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in
`WebAssemblyMCInstLower.cpp`.
2025-07-22 14:07:34 -07:00
Sam Parker
03b90486da
[WebAssembly] Memory interleave test (#149045)
Precommit codegen test for vectorization cost modelling.
2025-07-22 09:50:47 +01:00
Arseny Kapoulkine
5b98992fb9
[WebAssembly] Optimize convert_iKxN_u into convert_iKxN_s (#149609)
convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is
known to have sign bit 0. This results in emitting Wasm opcodes that, on
some targets (like x86_64), are dramatically slower than signed versions
on major engines.

Similarly to X86, we now fix this up in isel when the instruction has
nonneg flag from canonicalization or if we know the source has zero sign
bit.

Fixes #149457.
2025-07-21 09:17:29 -07:00
Jasmine Tang
343f7475be
[WebAssembly] Add support for memcmp expansion (#148298)
Fixes https://github.com/llvm/llvm-project/issues/61400

Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
2025-07-20 10:27:42 -07:00
jjasmine
6640b0a293
[WebAssembly] Add patterns for relaxed madd (#147487)
[WebAssembly] Fold fadd contract (fmul contract) to relaxed madd w/
-mattr=+simd128,+relaxed-simd

Fixes #121311

- Precommit test for #121311
- Fold fadd contract (fmul contract) to relaxed madd w/
-mattr=+simd128,+relaxed-simd
- Move PatFrag of fadd_contract in ARM.td and WebAssembly.td to
TargetSelectionDAG.td for reuse of pattern
2025-07-15 00:56:28 +08:00
jjasmine
44481f5067
[DAGCombine] Change isBuildVectorAll* -> isConstantSplatVectorAll* for Vselect (#147305)
Change isBuildVectorAll* -> isConstantSplatVectorAll* in VSelect in case
the fold happens after BuildVector has been canonically transformed to
Splat or if the Splat is initially in vselect already

- Fixes #73454
- Update related test cases, add extra tests in wasm

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-11 10:13:05 +01:00
Matt Arsenault
1e26443cf9
CodeGen: Remove redundant REQUIRES registered-target from backend tests (#147475)
These are already applied to all the tests in the target subdirectory
2025-07-09 09:25:53 +09:00
Matt Arsenault
3697d6dd98
DAG: Fall back to separate sin and cos when softening sincos (#147468)
Fix asserting in the error case.
2025-07-09 01:52:46 +09:00
Matt Arsenault
4a507b1f56
WebAssembly: Add test for sincos intrinsic (#147467) 2025-07-09 01:49:17 +09:00
jjasmine
cbc2ac5db8
[WebAssembly] Fold TargetGlobalAddress with added offset (#145829)
Previously we only folded TargetGlobalAddresses into the memarg if they
were on their own, so this patch supports folding TargetGlobalAddresses
that are added to some other offset.

Previously we weren't able to do this because we didn't have nuw on the
add, but we can now that getelementptr has nuw and is plumbed through to
the add in 0564d0665b302d1c7861e03d2995612f46613a0f.

Fixes #61930
2025-07-03 11:01:36 +01:00
Matt Arsenault
6ab7e52dd8
WebAssembly: Move validation of EH flags to TargetMachine construct time (#146634) 2025-07-03 07:25:38 +09:00
Alex Crichton
a8a9a7f95a
[WebAssembly] Fix inline assembly with vector types (#146574)
This commit fixes using inline assembly with v128 results. Previously
this failed with an internal assertion about a failure to legalize a
`CopyFromReg` where the source register was typed `v8f16`. It looks like
the type used for the destination register was whatever was listed first
in the `def V128 : WebAssemblyRegClass` listing, so the types were
shuffled around to have a default-supported type.

A small test was added as well which failed to generate previously and
should now pass in generation. This test passed on LLVM 18 additionally
and regressed by accident in #93228 which was first included in LLVM 19.
2025-07-01 20:26:30 -07:00
jjasmine
e9c9f8f374
[WebAssembly] Fold any/alltrue (setcc x, 0, eq/ne) to [not] any/alltrue x (#144741)
Fixes https://github.com/llvm/llvm-project/issues/50142, a miss of
further vectorization, where we can only achieve zext (xor (any_true),
-1).

Now in test case simd-setcc-reductions, it's converted to all_true.

Also fixes https://github.com/llvm/llvm-project/issues/145177, which is

all_true (setcc x, 0, eq) -> not any_true
any_true (setcc x, 0, ne) -> any_true
all_true (setcc x, 0, ne) -> all_true

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:27:37 -07:00
jjasmine
4a8c1f7d12
[WebAssembly] [Backend] Wasm optimize illegal bitmask (#145627)
[WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980.

Currently, the case for illegal bitmask (v32i8 or v64i8) is that at the
SelectionDag level, two (four) vectors of v128 will be concatenated
together, then they'll all be SETCC by the same pseudo illegal
instruction, which requires expansion later on.

I opt for SETCC-ing them seperately, bitcast and zext them and then add
them up together in the end.

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:13:08 -07:00
SingleAccretion
cd46354dbd
[WebAssembly] Enable a limited amount of stackification for debug code (#136510)
This change is a step towards fixing one long-standing problem with
LLVM's debug WASM codegen: excessive use of locals. One local for each
temporary value in IR (roughly speaking).

This has a lot of problems:
1) It makes it easy to hit engine limitations of 50K locals with certain
code patterns and large functions.
2) It makes for larger binaries that are slower to load and slower to
compile to native code.
3) It makes certain compilation strategies (spill all WASM locals to
stack, for example) for debug code excessively expensive and makes debug
WASM code either run very slow, or be less debuggable.
4) It slows down LLVM itself.

This change addresses these partially by running a limited version of
the stackification pass for unoptimized code, one that gets rid of the
most 'obviously' unnecessary locals.

Care needs to be taken to not impact LLVM's ability to produce high
quality debug variable locations with this pass. To that end:
1) We only allow stackification when it doesn't require moving any
instructions.
2) We disable stackification of any locals that are used in
DEBUG_VALUEs, or as a frame base.
I have verified on a moderately large example that the baseline and the
diff produce the same kinds (local/global/stack) of locations, and the
only differences are due to the shifting of instruction offsets, with
many local.[get|set]s not being present anymore.

Even with this quite conservative approach, the results are pretty good:
1) 30% reduction in raw code size, up to 10x reduction in the number of
locals for select large methods (~1000 => ~100).
2) ~10% reduction in instructions retired for an "llc -O0" run on a
moderately sized input.
2025-06-24 11:40:47 -07:00
Fangrui Song
28bda77843
Introduce MCAsmInfo::UsesSetToEquateSymbol and prefer = to .set
Introduce MCAsmInfo::UsesSetToEquateSymbol to control the preferred
syntax for symbol equating. We now favor the more readable and common
`symbol = expression` syntax over `.set`. This aligns with pre- https://reviews.llvm.org/D44256 behavior.

On Apple platforms, this resolves a clang -S vs -c behavior difference (resolves #104623).

For targets whose = support is unconfirmed, UsesSetToEquateSymbol is set to false.
This also minimizes test updates.

Pull Request: https://github.com/llvm/llvm-project/pull/142289
2025-06-11 22:19:31 -07:00
Iris Shi
24d730b380
Reland "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (#143651) 2025-06-11 15:56:37 +08:00
Iris Shi
8c890eaa3f
Revert "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (#143648) 2025-06-11 10:19:12 +08:00
Iris Shi
bfb48363b0
[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets (#137641) 2025-06-09 17:57:15 +08:00
Pavel Verigo
9d89b05f11
[WebAssembly] Fix trunc in FastISel (#138479)
Previous logic did not handle the case where the result bit size was
between 32 and 64 bits inclusive. I updated the if-statements for more
precise handling.

An alternative solution would have been to abort FastISel in case the
result type is not legal for FastISel.

Resolves: #64222.

This PR began as an investigation into the root cause of
https://github.com/ziglang/zig/issues/20966.

Godbolt link showing incorrect codegen on 20.1.0:
https://godbolt.org/z/cEr4vY7d4.
2025-05-06 14:16:35 -07:00
Nikita Popov
6feb4a8ef4
[IR] Don't allow values of opaque type (#137625)
Consider opaque types as non-first-class types, i.e. do not allow SSA
values to have opaque type.
2025-04-30 15:01:00 +02:00