423 Commits

Author SHA1 Message Date
Jasmine Tang
d7a29e5d56
[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703)
This PR reapplies https://github.com/llvm/llvm-project/pull/149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.

For example, with
```llvm
 %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR producese all_true and then also compares the result
equal to 0 (using the same SETEQ in the returning setcc), meaning that
semantically, it effectively is calling icmp ne.

Instead, the PR should have use SETNE in the returning setcc, this way,
all true return 1, then it is compared again ne 0, which is equivalent
to icmp eq.
2025-08-15 12:06:47 -07:00
Nikita Popov
240c454c4d
[CodeGen] Remove default ctors for InputArg and OutputArg (#153205)
These make it easy to forget to initialize some members, like the newly
added OrigTy. Force these to always go through the ctor instead.
2025-08-13 10:51:43 +02:00
Jasmine Tang
d32793ca6e
Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360)
Reverts llvm/llvm-project#149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`

Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
2025-08-13 07:41:44 +00:00
Jasmine Tang
348f01f89c
[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461)
Fixes https://github.com/llvm/llvm-project/issues/149230

Previously, even with simd enabled via `-mattr=+simd128`, the compiler
cannot utilize v128 to optimize loads and setcc of i128, instead
legalizing it to consecutive i64s.

This PR then adds support for setcc of i128 by converting them to
v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16
bytes or more (when simd128 is present).

The check for enabling this optimization is if the comparison operand is
either a load or an integer in i128, with the comparison code being
either `EQ | NE`, without `NoImplicitFloat` function flag.

Inspiration taken from RISCV's isel lowering.
2025-08-12 11:04:37 -07:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Sam Parker
68152f1301
[WebAssembly] v16i8 mul support (#150209)
During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle
for v16i8 mul.

On my AArch64 machine, using V8, I observe a 3.14% geomean improvement
across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv
and 1.8% for ncnn.
2025-07-29 09:23:31 +01:00
Jasmine Tang
8e6a05d471
[WebAssembly] Added vectorized version of fexp10 to the supported list (#150564)
Fixes https://github.com/llvm/llvm-project/issues/117200.

The default behavior in TargetLoweringBase is only scalar floats on fexp
are supported by default, not the vectorized version. This PR adds
`ISD::FEXP10` to the supported list.
2025-07-25 12:30:59 -07:00
Hood Chatham
15715f4089
[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486)
This adds an llvm intrinsic for WebAssembly to test the type of a
function. It is intended for adding a future clang builtin
` __builtin_wasm_test_function_pointer_signature` so we can test whether
calling a function pointer will fail with function signature mismatch.

Since the type of a function pointer is just `ptr` we can't figure out
the expected type from that.
The way I figured out to encode the type was by passing 0's of the
appropriate type to the intrinsic.
The first argument gives the expected type of the return type and the
later values give the expected
type of the arguments. So
```llvm
@llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0)
```
tests if `%func` is of type `(double, i32) -> (i32)`. It will lower to:
```wat
local.get $func
table.get $__indirect_function_table
ref.test (double, i32) -> (i32)
```
To indicate the function should be void, I somewhat arbitrarily picked
`token poison`, so the following tests for `(i32) -> ()`:
```llvm
@llvm.wasm.ref.test.func(ptr %func, token poison, i32 0)
```

To lower this intrinsic, we need some place to put the type information.
With `encodeFunctionSignature()` we encode the signature information
into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in
`WebAssemblyMCInstLower.cpp`.
2025-07-22 14:07:34 -07:00
Arseny Kapoulkine
5b98992fb9
[WebAssembly] Optimize convert_iKxN_u into convert_iKxN_s (#149609)
convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is
known to have sign bit 0. This results in emitting Wasm opcodes that, on
some targets (like x86_64), are dramatically slower than signed versions
on major engines.

Similarly to X86, we now fix this up in isel when the instruction has
nonneg flag from canonicalization or if we know the source has zero sign
bit.

Fixes #149457.
2025-07-21 09:17:29 -07:00
Jasmine Tang
343f7475be
[WebAssembly] Add support for memcmp expansion (#148298)
Fixes https://github.com/llvm/llvm-project/issues/61400

Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
2025-07-20 10:27:42 -07:00
Matt Arsenault
d8ef156379
DAG: Remove verifyReturnAddressArgumentIsConstant (#147240)
The intrinsic argument is already marked with immarg so non-constant
values are rejected by the IR verifier.
2025-07-07 16:28:47 +09:00
jjasmine
e9c9f8f374
[WebAssembly] Fold any/alltrue (setcc x, 0, eq/ne) to [not] any/alltrue x (#144741)
Fixes https://github.com/llvm/llvm-project/issues/50142, a miss of
further vectorization, where we can only achieve zext (xor (any_true),
-1).

Now in test case simd-setcc-reductions, it's converted to all_true.

Also fixes https://github.com/llvm/llvm-project/issues/145177, which is

all_true (setcc x, 0, eq) -> not any_true
any_true (setcc x, 0, ne) -> any_true
all_true (setcc x, 0, ne) -> all_true

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:27:37 -07:00
jjasmine
4a8c1f7d12
[WebAssembly] [Backend] Wasm optimize illegal bitmask (#145627)
[WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980.

Currently, the case for illegal bitmask (v32i8 or v64i8) is that at the
SelectionDag level, two (four) vectors of v128 will be concatenated
together, then they'll all be SETCC by the same pseudo illegal
instruction, which requires expansion later on.

I opt for SETCC-ing them seperately, bitcast and zext them and then add
them up together in the end.

---------

Co-authored-by: badumbatish <--show-origin>
2025-07-01 15:13:08 -07:00
Sam Parker
d12fb1fc37
[WebAssembly] Refactor PerformSETCCCombine (#144875)
Extract the logic into a templated helper function.
2025-06-25 08:56:35 +01:00
Matt Arsenault
ba7369c49c
WebAssembly: Move runtime libcall setting out of TargetLowering (#142624)
RuntimeLibcallInfo needs to be correct outside of codegen contexts.
2025-06-16 10:46:05 +09:00
Kazu Hirata
dd702b3969
[llvm] Remove unused local variables (NFC) (#140422) 2025-05-18 07:31:51 -07:00
Kazu Hirata
b4ab53c3b0
[Target] Use llvm::max_element (NFC) (#137926) 2025-05-01 23:44:28 -07:00
Alex Crichton
c63246645e
[WebAssembly] Add a missing break statement (#133783)
This fixes an issue introduced in #132430 where a `break;` statement was
accidentally missing causing unintended fall-through.
2025-03-31 12:58:06 -07:00
Alex Crichton
a415b7f86e
[WebAssembly] Add more lowerings for wide-arithmetic (#132430)
This commit is the result of investigation and discussion on
WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128`
instruction were discussed but ultimately deferred to a future proposal.
In spite of this though I wanted to apply a few changes to the LLVM
backend here with `wide-arithmetic` enabled for a few minor changes:

* A lowering for the `ISD::UADDO` node is added which uses `add128`
where the upper bits of the two operands are constant zeros and the
result of the 128-bit addition is the result of the overflowing
addition.
* The high bits of a `I64_ADD128` node are now flagged as "known zero"
if the upper bits of the inputs are also zero, assisting this `UADDO`
lowering to ensure the backend knows that the carry result is a 1-bit
result.

A few tests were then added to showcase various lowerings for various
operations that can be done with wide-arithmetic. They don't all
optimize super well at this time but I wanted to add them as a reference
here regardless to have them on-hand for future evaluations if
necessary.
2025-03-31 11:36:32 -07:00
Sam Parker
103119a435
[WebAssembly] Lower wide SIMD i8 muls (#130785)
Currently, 'wide' i32 simd multiplication, with extended i8 elements,
will perform the multiplication with i32 So, for IR like the following:
```
  %wide.a = sext <8 x i8> %a to <8 x i32>
  %wide.b = sext <8 x i8> %a to <8 x i32>
  %mul = mul <8 x i32> %wide.a, %wide.b
  ret <8 x i32> %mul
```

We would generate the following sequence:
```
  i16x8.extend_low_i8x16_s $push6=, $1
  local.tee $push5=, $3=, $pop6
  i32x4.extmul_low_i16x8_s $push0=, $pop5, $3
  v128.store 0($0), $pop0
  i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  i16x8.extend_low_i8x16_s $push4=, $pop1
  local.tee $push3=, $1=, $pop4
  i32x4.extmul_low_i16x8_s $push2=, $pop3, $1
  v128.store 16($0), $pop2
  return
```

But now we perform the multiplication with i16, resulting in:
```
  i16x8.extmul_low_i8x16_s $push3=, $1, $1
  local.tee $push2=, $1=, $pop3
  i32x4.extend_high_i16x8_s $push0=, $pop2
  v128.store 16($0), $pop0
  i32x4.extend_low_i16x8_s $push1=, $1
  v128.store 0($0), $pop1
  return
```
2025-03-21 06:57:57 +00:00
Brendan Dahl
9102afcd01
[WebAssembly] Use the same lowerings for f16x8 as other float vectors. (#127897)
This fixes failures to select the various compare operations that
weren't being expanded for f16x8.
2025-02-25 11:01:32 -08:00
Brendan Dahl
67056c280a
[WebAssembly] Support shuffle for F16x8 vectors. (#127857) 2025-02-25 10:39:54 -08:00
Nikita Popov
cc539138ac
[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880)
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.

Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
2025-02-19 10:16:57 +01:00
Sam Parker
948a8477c6
[WebAssembly] Recognise EXTEND_HIGH (#123325)
When lowering EXTEND_VECTOR_INREG, check whether the operand is a
shuffle that is moving the top half of a vector into the lower half. If
so, we can EXTEND_HIGH the input to the shuffle instead.
2025-02-17 09:04:29 +00:00
Sam Parker
df2de13695
[WebAssembly] Autovec support for dot (#123207)
Enable the use of partial.reduce.add that we can lower to dot or a tree
of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support
both v8i16 and v16i8 inputs.
2025-02-03 08:58:43 +00:00
yingopq
754ed95b66
[Mips] Fix compiler crash when returning fp128 after calling a functi… (#117525)
…on returning { i8, i128 }

Fixes https://github.com/llvm/llvm-project/issues/96432.
2025-01-20 16:47:40 +08:00
Sergei Barannikov
9ae92d7056
[SelectionDAG] Virtualize isTargetStrictFPOpcode / isTargetMemoryOpcode (#119969)
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode (#119709).

Pull Request: https://github.com/llvm/llvm-project/pull/119969
2024-12-21 05:29:51 +03:00
David Sherwood
8630a7ba7c
Reapply "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566)" (#118823)
[Reverts d57892a2a153ab71a796f07e39d939eae6910c21]

For IR like this:

%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into

%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file

CodeGen/AArch64/extract-vector-cmp.ll

because scalarizeExtractedBinOp only works if one of the input
operands is a constant.

---------

Co-authored-by: Paul Walker <paul.walker@arm.com>
2024-12-09 10:56:44 +00:00
Vitaly Buka
d57892a2a1
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693)
Reverts llvm/llvm-project#117566

Breaks libc++ tests with HWASAN
https://lab.llvm.org/buildbot/#/builders/55/builds/3959
2024-12-04 12:36:46 -08:00
David Sherwood
4675db5f39
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566)
For IR like this:

%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into

%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file

CodeGen/AArch64/extract-vector-cmp.ll

because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
2024-12-04 10:26:51 +00:00
Dan Gohman
c3536b263f
[WebAssembly] Define call-indirect-overlong and bulk-memory-opt features (#117087)
This defines some new target features. These are subsets of existing
features that reflect implementation concerns:

- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.

- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.

This is split out from https://github.com/llvm/llvm-project/pull/112035.

---------

Co-authored-by: Heejin Ahn <aheejin@gmail.com>
2024-12-02 17:08:07 -08:00
Sam Clegg
ea58410d0f
[WebAssembly] Implement %llvm.thread.pointer intrinsic (#117817)
We can simply use the `__tls_base` global for this which is guaranteed
to be non-zero and unique per thread.

Fixes: #117433
2024-11-26 17:19:14 -08:00
David Sherwood
9b76e7fc60
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)" (#117556)
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
2024-11-25 13:49:21 +00:00
David Sherwood
22ec44f509
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)
For IR like this:

  %icmp = icmp ult <4 x i32> %a, splat (i32 5)
  %res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into

  %ext = extractelement <4 x i32> %a, i32 1
  %res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file

CodeGen/AArch64/extract-vector-cmp.ll

because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
2024-11-25 09:25:01 +00:00
Kazu Hirata
43570a2841
[WebAssembly] Remove unused includes (NFC) (#116318)
Identified with misc-include-cleaner.
2024-11-15 07:26:37 -08:00
Dan Gohman
118445841d
[WebAssembly] Protect memory.fill and memory.copy from zero-length ranges. (#112617)
WebAssembly's `memory.fill` and `memory.copy` instructions trap if the
pointers are out of bounds, even if the length is zero. This is
different from LLVM, which expects that it can call `memcpy` on
arbitrary invalid pointers if the length is zero. To avoid spurious
traps, branch around `memory.fill` and `memory.copy` when the length is
zero.

---------

Co-authored-by: Heejin Ahn <aheejin@gmail.com>
2024-10-24 14:13:58 -07:00
Jordan Rupprecht
33363521ca
[NFC][WebAssembly] Inline var only used in assertion (#113507) 2024-10-23 18:51:25 -05:00
Alex Crichton
c2293b33dd
[WebAssembly] Implement the wide-arithmetic proposal (#111598)
This commit implements the [wide-arithmetic] proposal which has recently
reached phase 2 in the WebAssembly proposals process. The goal here is
to implement support in LLVM for emitting these instructions which are
gated behind a new feature flag by default. A new `wide-arithmetic`
feature flag is introduced which gates these four new instructions from
being emitted.

Emission of each instruction itself is relatively simple given LLVM's
preexisting lowering rules and infrastructure. The main gotcha is that
due to the multi-result nature of all of these instructions it needed
the lowerings to be implemented in C++ rather than in TableGen.

[wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic
2024-10-23 11:39:58 -07:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Simon Pilgrim
f8f0a266e0
[clang][wasm] Replace the target integer sub saturate intrinsics with the equivalent generic __builtin_elementwise_sub_sat intrinsics (#109405)
Remove the Intrinsic::wasm_sub_sat_signed/wasm_sub_sat_unsigned entries
and just use sub_sat_s/sub_sat_u directly
2024-09-22 10:12:41 +01:00
Brendan Dahl
c076638c70
[WebAssembly] Support BUILD_VECTOR with F16x8. (#108117)
Convert BUILD_VECTORS with FP16x8 to I16x8 since there's no FP16 scalar
value to intialize v128.const.
2024-09-11 10:00:10 -07:00
Brendan Dahl
415288a2a7
[WebAssembly] Add load and store patterns for V8F16. (#108119) 2024-09-11 09:53:53 -07:00
Brendan Dahl
5703d8572f
[WebAssembly] Add intrinsics to wasm_simd128.h for all FP16 instructions (#106465)
Getting this to work required a few additional changes:
- Add builtins for any instructions that can't be done with plain C
currently.
- Add support for the saturating version of fp_to_<s,i>_I16x8. Other
vector sizes supported this already.
- Support bitcast of f16x8 to v128. Needed to return a __f16x8 as
v128_t.
2024-08-30 08:42:37 -07:00
Sergei Barannikov
4d7a0abae8
[DataLayout] Change return type of getStackAlignment to MaybeAlign (#105478)
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.

This change also makes `exceedsNaturalStackAlignment` method redundant.
2024-08-27 22:59:33 +03:00
Brendan Dahl
7d373cef49
[WebAssembly] Change half-precision feature name to fp16. (#105434)
This better aligns with how the feature is being referred to and what
runtimes (V8) are calling it.
2024-08-22 09:44:33 -07:00
Sam Parker
76c4529515
[WebAssembly] Fix assertion in LowerBUILD_VECTOR (#101961)
The assertion was failing in the case where we were trying to lower to
loadxx_zero, but lane zero was undef.
2024-08-05 14:38:12 -07:00
Sam Parker
08decd20a9
[WebAssembly] load_zero to initialise build_vector (#100610)
Instead of splatting a single lane, to initialise a build_vector, lower
to scalar_to_vector which can be selected to load_zero.

Also add load_zero and load_lane patterns for f32x4 and f64x2.
2024-08-02 10:11:21 +01:00
Amara Emerson
f270a4dd66
[AArch64] Don't tail call memset if it would convert to a bzero. (#98969)
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.

This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.

rdar://131419786
2024-07-17 01:31:52 -07:00
Roger Ferrer Ibáñez
05e6bb40eb
[SelectionDAG] Add an ISD::CLEAR_CACHE node to lower llvm.clear_cache (#93795)
The current way of lowering `llvm.clear_cache` is a bit unusual. As
suggested by Matt Arsenault we are better off using an ISD node.

This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall
by default named `__clear_cache` and the default legalisation is a
libcall.

This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE`
needed by RISC-V on some platforms.
2024-05-30 14:55:32 +02:00
Brendan Dahl
60bce6eab4
[WebAssembly] Implement all f16x8 binary instructions. (#93360)
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.

add/sub/mul/div - use regular LL instructions
min/max - use the minimum/maximum intrinsic, and also have builtins
pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins

Specified at:

29a9b9462c/proposals/half-precision/Overview.md
2024-05-28 16:33:20 -07:00