448 Commits

Author SHA1 Message Date
Austin Jiang
e6cdfb75ac
Fix typos and spelling errors across codebase (#156270)
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.

Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2026-01-13 11:52:46 -05:00
Sam Parker
e5b6833e49
[WebAssembly] vi8 mul cost modelling. (#175177)
We've already optimised these, so update the cost model to reflect it.
And skip the isBeforeLegalize check when lowering i8 muls, because it
then misses the cases where, say, v32i8 has been type legalised into 2x
v16i8.

Also explicitly disable memory interleaving for any factor other than
two or four.
2026-01-12 09:25:54 +00:00
Derek Schuff
7a22bea512
[WebAssembly] Expand vector frem instructions (#174854)
Commit 6ad41bcc49 changed how frem is expanded during legalization and it
broke WebAssembly but we were missing test coverage. We want to maintain
our previous behavior of unrolling vectors and using a libcall to
implement scalar frem. I'm not sure why this now has to be different
(in ISelLowering) from other libcalls like fsin which work the same way
in the end, but this code does accurately describe what we want.

Fixes: https://github.com/emscripten-core/emscripten/issues/25991
2026-01-08 16:19:44 -08:00
Islam Imad
7ceecfad40
[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413)
Fixes #171608
2025-12-28 18:51:18 +00:00
Frederik Harwath
6ad41bcc49
[CodeGen] expand-fp: Change frem expansion criterion (#158285)
The existing condition for checking whether or not to expand an frem
instruction in expand-fp is not sufficiently precise.
The expansion on targets other than AMDGPU - which is the only intended
user right now - is only prevented due to the interaction with the
MaxLegalFpConvertBitWidth check.  Relying on this is conceptually wrong
and limits the use of the pass for other targets and further expansions
(e.g. merging with the similar ExpandLargeDivRem pass).

Change the expansion criterion to always expand frem of a given type
for targets that use "Expand" as the legalization action for the 
underlying scalar type and use this to exit the pass early for targets 
which do not require any expansions. This requires changing the
frem legalization action for all targets which do not want frem to 
be expanded in this pass from "Expand" to "LibCall".

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-16 17:31:26 +01:00
Derek Schuff
6d60d3d7e4
Revert "[WebAssembly] Implement addrspacecast to funcref" (#170785)
Reverts llvm/llvm-project#166820
There was a failure in the ENABLE_EXPENSIVE_CHECKS configuration.
2025-12-04 17:24:14 -08:00
Demetrius Kanios
d3b9fd0f86
[WebAssembly] Implement addrspacecast to funcref (#166820)
Adds lowering of `addrspacecast [0 -> 20]` to allow easy conversion of
function pointers to Wasm `funcref`

When given a constant function pointer, it lowers to a direct
`ref.func`. Otherwise it lowers to a `table.get` from
`__indirect_function_table` using the provided pointer as the index.
2025-12-04 16:34:42 -08:00
Robert Imschweiler
5c3c0020af
[NFC] Refactor TargetLowering::getTgtMemIntrinsic to take CallBase parameter (#170334)
cf.
https://github.com/llvm/llvm-project/pull/133907#discussion_r2578576548
2025-12-02 19:42:31 +01:00
Sam Parker
e44646b795
[WebAssembly] Lower ANY_EXTEND_VECTOR_INREG (#167529)
Treat it in the same manner as zero_extend_vector_inreg and generate an
extend_low_u if possible. This is to try to prevent expensive shuffles
from being generated instead. computeKnownBitsForTargetNode has also
been updated to specify known zeros on extend_low_u.
2025-11-20 08:57:08 +00:00
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Hongyu Chen
63e6373efd
[WebAssembly] Truncate extra bits of large elements in BUILD_VECTOR (#167223)
Fixes https://github.com/llvm/llvm-project/issues/165713
This patch handles out-of-bound vector elements and truncates extra
bits.
2025-11-17 10:39:18 +00:00
Sam Parker
9e6a31f832
[WebAssembly] vf32 to vi8, vi16 lowering (#164644)
Avoid scalarizing the conversion and use trunc_sat and narrow instead.
2025-11-06 08:32:44 +00:00
Sergei Barannikov
0c73009236
[WebAssembly] TableGen-erate SDNode descriptions (#166259)
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

CALL and RET_CALL do not have a description in td files, and it is not
currently possible to add one as these nodes have both variable operands
and variable results.

This also fixes a subtle bug detected by the enabled verification
functionality. `LOCAL_GET` is declared with `SDNPHasChain` property, and
thus should have both a chain operand and a chain result. The original
code created a node without a chain result, which caused a check in
`SDNodeInfo::verifyNode()` to fail.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/166259
2025-11-05 06:24:53 +03:00
Jasmine Tang
1fbfac30f1
[WebAssembly] [Codegen] Add pattern for relaxed min max from fminimum/fmaximum over v4f32 and v2f64 (#162948)
Related to #55932
2025-10-22 03:08:24 -07:00
Derek Schuff
19a58a5208
[WebAssembly] Optimize lowering of constant-sized memcpy and memset (#163294)
We currently emit a check that the size operand isn't zero, to avoid
executing the wasm memory.copy instruction when it would trap.
But this isn't necessary if the operand is a constant.

Fixes #163245
2025-10-14 22:00:25 +00:00
Sam Parker
1820102167
Wasm fmuladd relaxed (#163177)
Reland #161355, after fixing up the cross-projects-tests for the wasm
simd intrinsics.

Original commit message:
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 16:50:53 +01:00
Sam Parker
30d3441cf0
Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171)
Reverts llvm/llvm-project#161355

Looks like I've broken some intrinsic code generation.
2025-10-13 11:53:40 +01:00
Sam Parker
a4eb7ea225
[WebAssembly] Lower fmuladd to madd and nmadd (#161355)
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 10:36:08 +01:00
Derek Schuff
abc8aac6d2
[WebAssembly] Check intrinsic argument count before Any/All combine (#162163)
This code is activated on all INTRINSIC_WO_CHAIN but only handles
a selection. However it was trying to read the arguments before
checking which intrinsic it was handling. This fails for intrinsics
that have no arguments.
2025-10-07 23:52:25 +00:00
Sam Parker
156e9b4b69
[WebAssembly] Use partial_reduce_mla ISD nodes (#161184)
Addressing issue #160847.
 
Move away from combining the intrinsic call and instead lower the ISD
nodes, using tablegen for pattern matching.
2025-09-30 08:28:56 +01:00
Sander de Smalen
17e008db17
[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
2025-09-17 11:44:47 +01:00
Sam Parker
586c0ad918
[WebAssembly] Support partial-reduce accumulator (#158060)
We currently only support partial.reduce.add in the case where we are
performing a multiply-accumulate. Now add support for any partial
reduction where the input is being extended, where we can take advantage
of extadd_pairwise.
2025-09-12 07:03:49 +01:00
Sam Parker
6dacdc31ec
[WebAssembly] extadd_pairwise for PartialReduce (#157669)
Avoid using extends and then adding the high and low halves; use
extadd_pairwise instead.
2025-09-10 08:13:46 +01:00
Sam Parker
e557ad687b
[WebAssembly] v8i8 mul support (#151145)
During DAG combine, promote the operands to v8i16 by concatenating with an
undef vector and then use extmul_low to perform the mul at i16. Finally,
shuffle the low bytes out of the i16 elements into the result vector.
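
The reason the widened multiply is safe can be checked with a scalar model: the low byte of a 16-bit product equals the wrapping 8-bit product. The sketch below is only an illustration of that arithmetic, not the backend code; the function names are hypothetical, and unsigned widening is used because the signedness of the extension does not affect the low byte.

```cpp
#include <cstdint>

// Scalar model of one lane: widen both i8 operands to i16, multiply
// (as an extmul_low would), then keep only the low byte (as the final
// shuffle would). All names here are illustrative, not LLVM APIs.
inline uint8_t mulViaWidening(uint8_t a, uint8_t b) {
  uint16_t wide = static_cast<uint16_t>(static_cast<uint16_t>(a) *
                                        static_cast<uint16_t>(b));
  return static_cast<uint8_t>(wide & 0xFF);
}

// Reference: plain wrapping 8-bit multiply.
inline uint8_t mulWrapping(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(a * b);
}

// Exhaustively compares both forms over all 65536 operand pairs.
inline bool wideningMatchesWrapping() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      if (mulViaWidening(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
          mulWrapping(static_cast<uint8_t>(a), static_cast<uint8_t>(b)))
        return false;
  return true;
}
```

Calling `wideningMatchesWrapping()` covers every operand pair, so the equivalence does not depend on sampled inputs.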
2025-08-27 11:39:26 +01:00
Jasmine Tang
7fcee5fe08
[WebAssembly] Add support for avgr_u in loops (#153252)
Fixes https://github.com/llvm/llvm-project/issues/150550.

With the test case 
```
void f(unsigned char *x, unsigned char *y, int n) {
  // should have been vectorized into avgr_u instead of separate vectorized add and logical right shift
  for (int i = 0; i < n; i++)
    x[i] = (x[i] + y[i] + 1) / 2;
}
```

the backend failed to recognize that this can be reduced to avgr_u since
the loop vectorizer doesn't transform into the existing pattern in
tablegen.

This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to
avgr_u in the tablegen file.
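
As a rough scalar illustration of what the rounding average computes, the sketch below compares the expression from the test case with an overflow-free 8-bit form. This is an assumption-labeled model of the unsigned rounding average, not the tablegen pattern itself, and the function names are hypothetical.

```cpp
#include <cstdint>

// Rounding average as written in the C loop: integer promotion to int
// makes the sum exact before the division.
inline uint8_t avgFromSource(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((x + y + 1) / 2);
}

// Overflow-free form that never needs more than 8 bits:
// ceil((x + y) / 2) == (x | y) - ((x ^ y) >> 1).
inline uint8_t avgCeilU8(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((x | y) - ((x ^ y) >> 1));
}

// Exhaustively checks that both forms agree for every pair of bytes.
inline bool avgFormsAgree() {
  for (int x = 0; x < 256; ++x)
    for (int y = 0; y < 256; ++y)
      if (avgFromSource(static_cast<uint8_t>(x), static_cast<uint8_t>(y)) !=
          avgCeilU8(static_cast<uint8_t>(x), static_cast<uint8_t>(y)))
        return false;
  return true;
}
```

The second form shows why a single lane-wise instruction can replace the widened add and shift: no intermediate value exceeds the lane width.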
2025-08-22 09:52:49 -07:00
Jasmine Tang
d7a29e5d56
[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703)
This PR reapplies https://github.com/llvm/llvm-project/pull/149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.

For example, with
```llvm
 %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR produces all_true and then also compares the result
equal to 0 (using the same SETEQ in the returned setcc), meaning that
semantically, it is effectively calling icmp ne.

Instead, the PR should have used SETNE in the returned setcc. This way,
all_true returns 1, which is then compared ne 0, which is equivalent
to icmp eq.
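
The condition-code mix-up can be reproduced with a small scalar model. The sketch below assumes a 16-lane byte compare followed by all_true, and the helper names are hypothetical; it is meant only to show why SETEQ inverts the result while SETNE preserves it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V16 = std::array<uint8_t, 16>;

// Model of a lane-wise equality compare followed by all_true:
// returns 1 only if every lane matches, otherwise 0.
inline int allTrueEq(const V16 &a, const V16 &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != b[i])
      return 0;
  return 1;
}

// What the IR asks for: icmp eq (memcmp(a, b, 16)), 0.
inline bool reference(const V16 &a, const V16 &b) {
  return std::memcmp(a.data(), b.data(), a.size()) == 0;
}

// Buggy combine: the all_true result is compared with SETEQ against 0.
inline bool combinedWithSetEq(const V16 &a, const V16 &b) {
  return allTrueEq(a, b) == 0;
}

// Fixed combine: the all_true result is compared with SETNE against 0.
inline bool combinedWithSetNe(const V16 &a, const V16 &b) {
  return allTrueEq(a, b) != 0;
}
```

For two equal buffers, `combinedWithSetNe` agrees with `reference`, while `combinedWithSetEq` returns the opposite answer.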
2025-08-15 12:06:47 -07:00
Nikita Popov
240c454c4d
[CodeGen] Remove default ctors for InputArg and OutputArg (#153205)
These make it easy to forget to initialize some members, like the newly
added OrigTy. Force these to always go through the ctor instead.
2025-08-13 10:51:43 +02:00
Jasmine Tang
d32793ca6e
Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360)
Reverts llvm/llvm-project#149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`

Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
2025-08-13 07:41:44 +00:00
Jasmine Tang
348f01f89c
[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461)
Fixes https://github.com/llvm/llvm-project/issues/149230

Previously, even with simd enabled via `-mattr=+simd128`, the compiler
cannot utilize v128 to optimize loads and setcc of i128, instead
legalizing it to consecutive i64s.

This PR then adds support for setcc of i128 by converting them to
v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16
bytes or more (when simd128 is present).

This optimization is enabled when the comparison operand is either a
load or an i128 integer, the comparison code is `EQ` or `NE`, and the
function does not have the `NoImplicitFloat` flag.

Inspiration taken from RISCV's isel lowering.
2025-08-12 11:04:37 -07:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Sam Parker
68152f1301
[WebAssembly] v16i8 mul support (#150209)
During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle
for v16i8 mul.

On my AArch64 machine, using V8, I observe a 3.14% geomean improvement
across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv
and 1.8% for ncnn.
2025-07-29 09:23:31 +01:00
Jasmine Tang
8e6a05d471
[WebAssembly] Added vectorized version of fexp10 to the supported list (#150564)
Fixes https://github.com/llvm/llvm-project/issues/117200.

By default, TargetLoweringBase supports fexp only on scalar floats,
not on the vectorized version. This PR adds
`ISD::FEXP10` to the supported list.
2025-07-25 12:30:59 -07:00
Hood Chatham
15715f4089
[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486)
This adds an llvm intrinsic for WebAssembly to test the type of a
function. It is intended for adding a future clang builtin
` __builtin_wasm_test_function_pointer_signature` so we can test whether
calling a function pointer will fail with function signature mismatch.

Since the type of a function pointer is just `ptr` we can't figure out
the expected type from that.
The way I figured out to encode the type was by passing 0's of the
appropriate type to the intrinsic.
The first argument gives the expected type of the return type and the
later values give the expected
type of the arguments. So
```llvm
@llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0)
```
tests if `%func` is of type `(double, i32) -> (float)`. It will lower to:
```wat
local.get $func
table.get $__indirect_function_table
ref.test (double, i32) -> (float)
```
To indicate the function should be void, I somewhat arbitrarily picked
`token poison`, so the following tests for `(i32) -> ()`:
```llvm
@llvm.wasm.ref.test.func(ptr %func, token poison, i32 0)
```

To lower this intrinsic, we need some place to put the type information.
With `encodeFunctionSignature()` we encode the signature information
into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in
`WebAssemblyMCInstLower.cpp`.
2025-07-22 14:07:34 -07:00
Arseny Kapoulkine
5b98992fb9
[WebAssembly] Optimize convert_iKxN_u into convert_iKxN_s (#149609)
convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is
known to have sign bit 0. This results in emitting Wasm opcodes that, on
some targets (like x86_64), are dramatically slower than signed versions
on major engines.

Similarly to X86, we now fix this up in isel when the instruction has
nonneg flag from canonicalization or if we know the source has zero sign
bit.

Fixes #149457.
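
The soundness of the swap rests on signed and unsigned conversion agreeing whenever the sign bit is clear. A minimal scalar sketch of that property follows; the function names are hypothetical, and the cast from `uint32_t` to `int32_t` assumes the usual two's-complement wraparound.

```cpp
#include <cstdint>

// Scalar model of an unsigned integer-to-float conversion.
inline float convertU(uint32_t x) { return static_cast<float>(x); }

// Scalar model of a signed conversion applied to the same bit pattern.
// Assumes two's-complement reinterpretation of the bits.
inline float convertS(uint32_t x) {
  return static_cast<float>(static_cast<int32_t>(x));
}

// The precondition under which the two conversions are interchangeable.
inline bool signBitClear(uint32_t x) { return (x >> 31) == 0; }
```

When `signBitClear(x)` holds, `convertU(x)` and `convertS(x)` return the same value; for inputs with the top bit set they differ, which is why the fix-up is guarded.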
2025-07-21 09:17:29 -07:00
Jasmine Tang
343f7475be
[WebAssembly] Add support for memcmp expansion (#148298)
Fixes https://github.com/llvm/llvm-project/issues/61400

Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
2025-07-20 10:27:42 -07:00
Matt Arsenault
d8ef156379
DAG: Remove verifyReturnAddressArgumentIsConstant (#147240)
The intrinsic argument is already marked with immarg so non-constant
values are rejected by the IR verifier.
2025-07-07 16:28:47 +09:00
jjasmine
e9c9f8f374
[WebAssembly] Fold any/alltrue (setcc x, 0, eq/ne) to [not] any/alltrue x (#144741)
Fixes https://github.com/llvm/llvm-project/issues/50142, a miss of
further vectorization, where we can only achieve zext (xor (any_true),
-1).

Now in test case simd-setcc-reductions, it's converted to all_true.

Also fixes https://github.com/llvm/llvm-project/issues/145177, which is

all_true (setcc x, 0, eq) -> not any_true
any_true (setcc x, 0, ne) -> any_true
all_true (setcc x, 0, ne) -> all_true
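
The three folds listed above can be sanity-checked with a scalar model of a 4-lane vector. This is only a sketch under the assumption that true lanes of a compare are all-ones; the helper names are hypothetical and do not correspond to LLVM functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<int32_t, 4>;

// any_true: at least one lane is non-zero.
inline bool anyTrue(const V4 &v) {
  for (int32_t x : v)
    if (x != 0)
      return true;
  return false;
}

// all_true: every lane is non-zero.
inline bool allTrue(const V4 &v) {
  for (int32_t x : v)
    if (x == 0)
      return false;
  return true;
}

// Lane-wise setcc against zero; true lanes become -1 (all ones).
inline V4 setccEqZero(const V4 &v) {
  V4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = (v[i] == 0) ? -1 : 0;
  return r;
}

inline V4 setccNeZero(const V4 &v) {
  V4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = (v[i] != 0) ? -1 : 0;
  return r;
}

// Checks the three folds over every 4-lane combination of a few values.
inline bool foldsHold() {
  const int32_t vals[] = {0, 1, -7};
  for (int32_t a : vals)
    for (int32_t b : vals)
      for (int32_t c : vals)
        for (int32_t d : vals) {
          V4 x{{a, b, c, d}};
          if (allTrue(setccEqZero(x)) != !anyTrue(x)) return false;
          if (anyTrue(setccNeZero(x)) != anyTrue(x)) return false;
          if (allTrue(setccNeZero(x)) != allTrue(x)) return false;
        }
  return true;
}
```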

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:27:37 -07:00
jjasmine
4a8c1f7d12
[WebAssembly] [Backend] Wasm optimize illegal bitmask (#145627)
[WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980.

Currently, the case for illegal bitmask (v32i8 or v64i8) is that at the
SelectionDag level, two (four) vectors of v128 will be concatenated
together, then they'll all be SETCC by the same pseudo illegal
instruction, which requires expansion later on.

I opt for SETCC-ing them separately, bitcast and zext them and then add
them up together in the end.
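
A scalar model of the split approach for the v32i8 case might look like the sketch below. It assumes each 16-lane half yields a 16-bit mask that is zero-extended, and that the high half is shifted into the upper 16 bits before the add; the shift and all function names are assumptions made for illustration, not taken from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Model of a 16-lane bitmask: bit i is the top bit of lane i.
inline uint32_t bitmask16(const uint8_t *lanes) {
  uint32_t m = 0;
  for (std::size_t i = 0; i < 16; ++i)
    m |= static_cast<uint32_t>(lanes[i] >> 7) << i;
  return m;
}

// Reference: a single 32-lane bitmask over the whole (illegal) vector.
inline uint32_t bitmask32Reference(const std::array<uint8_t, 32> &v) {
  uint32_t m = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    m |= static_cast<uint32_t>(v[i] >> 7) << i;
  return m;
}

// Split form: mask each half separately, zero-extend, place the high
// half in the upper 16 bits (assumed), and add the two together.
inline uint32_t bitmask32Split(const std::array<uint8_t, 32> &v) {
  uint32_t lo = bitmask16(v.data());
  uint32_t hi = bitmask16(v.data() + 16);
  return lo + (hi << 16);
}

// Test helper: builds lanes whose top bits spell out the given mask.
inline std::array<uint8_t, 32> lanesFromMask(uint32_t mask) {
  std::array<uint8_t, 32> v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = ((mask >> i) & 1u) ? 0x80 : 0x01;
  return v;
}
```

Because the two partial masks occupy disjoint bit ranges, the add behaves like a bitwise or and cannot carry.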

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:13:08 -07:00
Sam Parker
d12fb1fc37
[WebAssembly] Refactor PerformSETCCCombine (#144875)
Extract the logic into a templated helper function.
2025-06-25 08:56:35 +01:00
Matt Arsenault
ba7369c49c
WebAssembly: Move runtime libcall setting out of TargetLowering (#142624)
RuntimeLibcallInfo needs to be correct outside of codegen contexts.
2025-06-16 10:46:05 +09:00
Kazu Hirata
dd702b3969
[llvm] Remove unused local variables (NFC) (#140422)
2025-05-18 07:31:51 -07:00
Kazu Hirata
b4ab53c3b0
[Target] Use llvm::max_element (NFC) (#137926)
2025-05-01 23:44:28 -07:00
Alex Crichton
c63246645e
[WebAssembly] Add a missing break statement (#133783)
This fixes an issue introduced in #132430 where a `break;` statement was
accidentally missing causing unintended fall-through.
2025-03-31 12:58:06 -07:00
Alex Crichton
a415b7f86e
[WebAssembly] Add more lowerings for wide-arithmetic (#132430)
This commit is the result of investigation and discussion on
WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128`
instruction were discussed but ultimately deferred to a future proposal.
Despite this, I wanted to apply a few minor changes to the LLVM
backend here for when `wide-arithmetic` is enabled:

* A lowering for the `ISD::UADDO` node is added which uses `add128`
where the upper bits of the two operands are constant zeros and the
result of the 128-bit addition is the result of the overflowing
addition.
* The high bits of a `I64_ADD128` node are now flagged as "known zero"
if the upper bits of the inputs are also zero, assisting this `UADDO`
lowering to ensure the backend knows that the carry result is a 1-bit
result.

A few tests were then added to showcase various lowerings for various
operations that can be done with wide-arithmetic. They don't all
optimize super well at this time but I wanted to add them as a reference
here regardless to have them on-hand for future evaluations if
necessary.
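
The `UADDO` idea described in the bullets can be modeled in scalar code: with both upper halves zero, the high word of the 128-bit sum is exactly the carry out of the 64-bit add. The sketch below is an illustration under that reading; the struct and function names are hypothetical.

```cpp
#include <cstdint>

struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Scalar model of a 128-bit add built from two 64-bit halves per operand.
inline U128 add128(uint64_t aLo, uint64_t aHi, uint64_t bLo, uint64_t bHi) {
  U128 r;
  r.lo = aLo + bLo;                        // wraps modulo 2^64
  uint64_t carry = (r.lo < aLo) ? 1 : 0;   // carry out of the low half
  r.hi = aHi + bHi + carry;
  return r;
}

struct UAddO {
  uint64_t sum;
  bool overflow;
};

// Overflowing 64-bit add expressed through add128 with zero upper halves.
// The high word can only be 0 or 1, which matches the "known zero" upper
// bits mentioned above.
inline UAddO uaddoViaAdd128(uint64_t a, uint64_t b) {
  U128 r = add128(a, 0, b, 0);
  return {r.lo, r.hi != 0};
}
```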
2025-03-31 11:36:32 -07:00
Sam Parker
103119a435
[WebAssembly] Lower wide SIMD i8 muls (#130785)
Currently, 'wide' i32 simd multiplication, with extended i8 elements,
will perform the multiplication with i32. So, for IR like the following:
```
  %wide.a = sext <8 x i8> %a to <8 x i32>
  %wide.b = sext <8 x i8> %a to <8 x i32>
  %mul = mul <8 x i32> %wide.a, %wide.b
  ret <8 x i32> %mul
```

We would generate the following sequence:
```
  i16x8.extend_low_i8x16_s $push6=, $1
  local.tee $push5=, $3=, $pop6
  i32x4.extmul_low_i16x8_s $push0=, $pop5, $3
  v128.store 0($0), $pop0
  i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  i16x8.extend_low_i8x16_s $push4=, $pop1
  local.tee $push3=, $1=, $pop4
  i32x4.extmul_low_i16x8_s $push2=, $pop3, $1
  v128.store 16($0), $pop2
  return
```

But now we perform the multiplication with i16, resulting in:
```
  i16x8.extmul_low_i8x16_s $push3=, $1, $1
  local.tee $push2=, $1=, $pop3
  i32x4.extend_high_i16x8_s $push0=, $pop2
  v128.store 16($0), $pop0
  i32x4.extend_low_i16x8_s $push1=, $1
  v128.store 0($0), $pop1
  return
```
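
Multiplying at i16 and extending afterwards gives the same result because the product of two sign-extended i8 values always fits in 16 bits (the extremes are -16256 and 16384). A scalar sketch of that check follows; the function names are hypothetical and the code only models the arithmetic, not the instruction selection.

```cpp
#include <cstdint>

// Old sequence, modeled per lane: sign-extend to i32, multiply at i32.
inline int32_t mulAtI32(int8_t a, int8_t b) {
  return static_cast<int32_t>(a) * static_cast<int32_t>(b);
}

// New sequence, modeled per lane: multiply at i16 (as an i8-to-i16
// extending multiply would), then sign-extend the i16 product to i32.
inline int32_t mulAtI16ThenExtend(int8_t a, int8_t b) {
  int16_t p = static_cast<int16_t>(static_cast<int16_t>(a) *
                                   static_cast<int16_t>(b));
  return static_cast<int32_t>(p);
}

// Exhaustively checks that the narrower multiply loses no information.
inline bool narrowMulIsExact() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b)
      if (mulAtI32(static_cast<int8_t>(a), static_cast<int8_t>(b)) !=
          mulAtI16ThenExtend(static_cast<int8_t>(a), static_cast<int8_t>(b)))
        return false;
  return true;
}
```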
2025-03-21 06:57:57 +00:00
Brendan Dahl
9102afcd01
[WebAssembly] Use the same lowerings for f16x8 as other float vectors. (#127897)
This fixes failures to select the various compare operations that
weren't being expanded for f16x8.
2025-02-25 11:01:32 -08:00
Brendan Dahl
67056c280a
[WebAssembly] Support shuffle for F16x8 vectors. (#127857)
2025-02-25 10:39:54 -08:00
Nikita Popov
cc539138ac
[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880)
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.

Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
2025-02-19 10:16:57 +01:00
Sam Parker
948a8477c6
[WebAssembly] Recognise EXTEND_HIGH (#123325)
When lowering EXTEND_VECTOR_INREG, check whether the operand is a
shuffle that is moving the top half of a vector into the lower half. If
so, we can EXTEND_HIGH the input to the shuffle instead.
2025-02-17 09:04:29 +00:00
Sam Parker
df2de13695
[WebAssembly] Autovec support for dot (#123207)
Enable the use of partial.reduce.add that we can lower to dot or a tree
of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support
both v8i16 and v16i8 inputs.
2025-02-03 08:58:43 +00:00