llvm-project

Author	SHA1	Message	Date
Sam Parker	a4eb7ea225	[WebAssembly] Lower fmuladd to madd and nmadd (#161355 ) Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 10:36:08 +01:00
Derek Schuff	abc8aac6d2	[WebAssembly] Check intrinsic argument count before Any/All combine (#162163 ) This code is activated on all INTRINSIC_WO_CHAIN but only handles a selection. However it was trying to read the arguments before checking which intrinsic it was handling. This fails for intrinsics that have no arguments.	2025-10-07 23:52:25 +00:00
Sam Parker	156e9b4b69	[WebAssembly] Use partial_reduce_mla ISD nodes (#161184 ) Addressing issue #160847. Move away from combining the intrinsic call and instead lower the ISD nodes, using tablegen for pattern matching.	2025-09-30 08:28:56 +01:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
Sam Parker	586c0ad918	[WebAssembly] Support partial-reduce accumulator (#158060 ) We currently only support partial.reduce.add in the case where we are performing a multiply-accumulate. Now add support for any partial reduction where the input is being extended, where we can take advantage of extadd_pairwise.	2025-09-12 07:03:49 +01:00
Sam Parker	6dacdc31ec	[WebAssembly] extadd_pairwise for PartialReduce (#157669 ) Avoid using extends and adding the high and low halves; use extadd_pairwise instead.	2025-09-10 08:13:46 +01:00
Sam Parker	e557ad687b	[WebAssembly] v8i8 mul support (#151145 ) During DAG combine, promote the operands to v8i16 by concatenating with an undef vector and then use extmul_low to perform the mul at i16. Finally, shuffle the low bytes out of the i16 elements into the result vector.	2025-08-27 11:39:26 +01:00
Jasmine Tang	7fcee5fe08	[WebAssembly] Add support for avgr_u in loops (#153252 ) Fixes https://github.com/llvm/llvm-project/issues/150550. With the test case ``` void f(unsigned char *x, unsigned char *y, int n) { // should have been vectorized into avgr_u instead of separate vectorized add and logical right shift for (int i = 0; i < n; i++) x[i] = (x[i] + y[i] + 1) / 2; } ``` the backend failed to recognize that this can be reduced to avgr_u since the loop vectorizer doesn't transform into the existing pattern in tablegen. This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to avgr_u in the tablegen file.	2025-08-22 09:52:49 -07:00
Jasmine Tang	d7a29e5d56	[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703 ) This PR reapplies https://github.com/llvm/llvm-project/pull/149461 In the original `combineVectorSizedSetCCEquality`, the result of setcc is being negated by returning setcc with the same cond code, leading to wrong logic. For example, with ```llvm %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16) %res = icmp eq i32 %cmp_16, 0 ``` the original PR produces all_true and then also compares the result equal to 0 (using the same SETEQ in the returning setcc), meaning that semantically, it effectively is calling icmp ne. Instead, the PR should have used SETNE in the returning setcc; this way, all_true returns 1, which is then compared ne 0, which is equivalent to icmp eq.	2025-08-15 12:06:47 -07:00
Nikita Popov	240c454c4d	[CodeGen] Remove default ctors for InputArg and OutputArg (#153205 ) These make it easy to forget to initialize some members, like the newly added OrigTy. Force these to always go through the ctor instead.	2025-08-13 10:51:43 +02:00
Jasmine Tang	d32793ca6e	Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360 ) Reverts llvm/llvm-project#149461 The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the Emscripten test suite has failed. This PR applies a revert so I can take a closer look at it Test case link: https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js` Original comment report: https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746	2025-08-13 07:41:44 +00:00
Jasmine Tang	348f01f89c	[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461 ) Fixes https://github.com/llvm/llvm-project/issues/149230 Previously, even with simd enabled via `-mattr=+simd128`, the compiler could not use v128 to optimize loads and setcc of i128, instead legalizing them to consecutive i64s. This PR adds support for setcc of i128 by converting them to v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16 bytes or more (when simd128 is present). The optimization is enabled when the comparison operand is either a load or an integer in i128, the comparison code is either `EQ | NE`, and the function lacks the `NoImplicitFloat` flag. Inspiration taken from RISCV's isel lowering.	2025-08-12 11:04:37 -07:00
Nikita Popov	406d9b1dd6	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319 ) The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).	2025-08-07 09:12:40 +02:00
Sam Parker	68152f1301	[WebAssembly] v16i8 mul support (#150209 ) During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle for v16i8 mul. On my AArch64 machine, using V8, I observe a 3.14% geomean improvement across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv and 1.8% for ncnn.	2025-07-29 09:23:31 +01:00
Jasmine Tang	8e6a05d471	[WebAssembly] Added vectorized version of fexp10 to the supported list (#150564 ) Fixes https://github.com/llvm/llvm-project/issues/117200. By default, TargetLoweringBase supports fexp only on scalar floats, not on vectors. This PR adds `ISD::FEXP10` to the supported list.	2025-07-25 12:30:59 -07:00
Hood Chatham	15715f4089	[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486 ) This adds an llvm intrinsic for WebAssembly to test the type of a function. It is intended for adding a future clang builtin ` __builtin_wasm_test_function_pointer_signature` so we can test whether calling a function pointer will fail with function signature mismatch. Since the type of a function pointer is just `ptr` we can't figure out the expected type from that. The way I figured out to encode the type was by passing 0's of the appropriate type to the intrinsic. The first argument gives the expected type of the return type and the later values give the expected type of the arguments. So ```llvm @llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0) ``` tests if `%func` is of type `(double, i32) -> (float)`. It will lower to: ```wat local.get $func table.get $__indirect_function_table ref.test (double, i32) -> (float) ``` To indicate the function should be void, I somewhat arbitrarily picked `token poison`, so the following tests for `(i32) -> ()`: ```llvm @llvm.wasm.ref.test.func(ptr %func, token poison, i32 0) ``` To lower this intrinsic, we need some place to put the type information. With `encodeFunctionSignature()` we encode the signature information into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in `WebAssemblyMCInstLower.cpp`.	2025-07-22 14:07:34 -07:00
Arseny Kapoulkine	5b98992fb9	[WebAssembly] Optimize convert_iKxN_u into convert_iKxN_s (#149609 ) convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is known to have sign bit 0. This results in emitting Wasm opcodes that, on some targets (like x86_64), are dramatically slower than signed versions on major engines. Similarly to X86, we now fix this up in isel when the instruction has nonneg flag from canonicalization or if we know the source has zero sign bit. Fixes #149457.	2025-07-21 09:17:29 -07:00
Jasmine Tang	343f7475be	[WebAssembly] Add support for memcmp expansion (#148298 ) Fixes https://github.com/llvm/llvm-project/issues/61400 Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll	2025-07-20 10:27:42 -07:00
Matt Arsenault	d8ef156379	DAG: Remove verifyReturnAddressArgumentIsConstant (#147240 ) The intrinsic argument is already marked with immarg so non-constant values are rejected by the IR verifier.	2025-07-07 16:28:47 +09:00
jjasmine	e9c9f8f374	[WebAssembly] Fold any/alltrue (setcc x, 0, eq/ne) to [not] any/alltrue x (#144741 ) Fixes https://github.com/llvm/llvm-project/issues/50142, a missed vectorization opportunity, where we can only achieve zext (xor (any_true), -1). Now in test case simd-setcc-reductions, it's converted to all_true. Also fixes https://github.com/llvm/llvm-project/issues/145177, which is: all_true (setcc x, 0, eq) -> not any_true; any_true (setcc x, 0, ne) -> any_true; all_true (setcc x, 0, ne) -> all_true. --------- Co-authored-by: badumbatish <--show-origin>	2025-07-01 15:27:37 -07:00
jjasmine	4a8c1f7d12	[WebAssembly] [Backend] Wasm optimize illegal bitmask (#145627 ) [WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980. Currently, the case for illegal bitmask (v32i8 or v64i8) is that at the SelectionDag level, two (four) vectors of v128 will be concatenated together, then they'll all be SETCC by the same pseudo illegal instruction, which requires expansion later on. I opt for SETCC-ing them separately, bitcast and zext them and then add them up together in the end. --------- Co-authored-by: badumbatish <--show-origin>	2025-07-01 15:13:08 -07:00
Sam Parker	d12fb1fc37	[WebAssembly] Refactor PerformSETCCCombine (#144875 ) Extract the logic into a templated helper function.	2025-06-25 08:56:35 +01:00
Matt Arsenault	ba7369c49c	WebAssembly: Move runtime libcall setting out of TargetLowering (#142624 ) RuntimeLibcallInfo needs to be correct outside of codegen contexts.	2025-06-16 10:46:05 +09:00
Kazu Hirata	dd702b3969	[llvm] Remove unused local variables (NFC) (#140422 )	2025-05-18 07:31:51 -07:00
Kazu Hirata	b4ab53c3b0	[Target] Use llvm::max_element (NFC) (#137926 )	2025-05-01 23:44:28 -07:00
Alex Crichton	c63246645e	[WebAssembly] Add a missing `break` statement (#133783 ) This fixes an issue introduced in #132430 where a `break;` statement was accidentally missing causing unintended fall-through.	2025-03-31 12:58:06 -07:00
Alex Crichton	a415b7f86e	[WebAssembly] Add more lowerings for wide-arithmetic (#132430 ) This commit is the result of investigation and discussion on WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128` instruction were discussed but ultimately deferred to a future proposal. In spite of this though I wanted to apply a few changes to the LLVM backend here with `wide-arithmetic` enabled for a few minor changes: * A lowering for the `ISD::UADDO` node is added which uses `add128` where the upper bits of the two operands are constant zeros and the result of the 128-bit addition is the result of the overflowing addition. * The high bits of a `I64_ADD128` node are now flagged as "known zero" if the upper bits of the inputs are also zero, assisting this `UADDO` lowering to ensure the backend knows that the carry result is a 1-bit result. A few tests were then added to showcase various lowerings for various operations that can be done with wide-arithmetic. They don't all optimize super well at this time but I wanted to add them as a reference here regardless to have them on-hand for future evaluations if necessary.	2025-03-31 11:36:32 -07:00
Sam Parker	103119a435	[WebAssembly] Lower wide SIMD i8 muls (#130785 ) Currently, 'wide' i32 simd multiplication, with extended i8 elements, will perform the multiplication with i32. So, for IR like the following: ``` %wide.a = sext <8 x i8> %a to <8 x i32> %wide.b = sext <8 x i8> %a to <8 x i32> %mul = mul <8 x i32> %wide.a, %wide.b ret <8 x i32> %mul ``` We would generate the following sequence: ``` i16x8.extend_low_i8x16_s $push6=, $1 local.tee $push5=, $3=, $pop6 i32x4.extmul_low_i16x8_s $push0=, $pop5, $3 v128.store 0($0), $pop0 i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 i16x8.extend_low_i8x16_s $push4=, $pop1 local.tee $push3=, $1=, $pop4 i32x4.extmul_low_i16x8_s $push2=, $pop3, $1 v128.store 16($0), $pop2 return ``` But now we perform the multiplication with i16, resulting in: ``` i16x8.extmul_low_i8x16_s $push3=, $1, $1 local.tee $push2=, $1=, $pop3 i32x4.extend_high_i16x8_s $push0=, $pop2 v128.store 16($0), $pop0 i32x4.extend_low_i16x8_s $push1=, $1 v128.store 0($0), $pop1 return ```	2025-03-21 06:57:57 +00:00
Brendan Dahl	9102afcd01	[WebAssembly] Use the same lowerings for f16x8 as other float vectors. (#127897 ) This fixes failures to select the various compare operations that weren't being expanded for f16x8.	2025-02-25 11:01:32 -08:00
Brendan Dahl	67056c280a	[WebAssembly] Support shuffle for F16x8 vectors. (#127857 )	2025-02-25 10:39:54 -08:00
Nikita Popov	cc539138ac	[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880 ) The standard libcalls for half to float and float to half conversion are __extendhfsf2 and __truncsfhf2. However, LLVM currently uses __gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these libcalls are an ARM-ism and only provided by libgcc on that platform. compiler-rt always provides both libcalls. Use the standard libcalls by default, and only use the __gnu libcalls on ARM.	2025-02-19 10:16:57 +01:00
Sam Parker	948a8477c6	[WebAssembly] Recognise EXTEND_HIGH (#123325 ) When lowering EXTEND_VECTOR_INREG, check whether the operand is a shuffle that is moving the top half of a vector into the lower half. If so, we can EXTEND_HIGH the input to the shuffle instead.	2025-02-17 09:04:29 +00:00
Sam Parker	df2de13695	[WebAssembly] Autovec support for dot (#123207 ) Enable the use of partial.reduce.add that we can lower to dot or a tree of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support both v8i16 and v16i8 inputs.	2025-02-03 08:58:43 +00:00
yingopq	754ed95b66	[Mips] Fix compiler crash when returning fp128 after calling a function returning { i8, i128 } (#117525 ) Fixes https://github.com/llvm/llvm-project/issues/96432.	2025-01-20 16:47:40 +08:00
Sergei Barannikov	9ae92d7056	[SelectionDAG] Virtualize isTargetStrictFPOpcode / isTargetMemoryOpcode (#119969 ) With this change, targets are no longer required to put memory / strict-fp opcodes after special `ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers. This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709). Pull Request: https://github.com/llvm/llvm-project/pull/119969	2024-12-21 05:29:51 +03:00
David Sherwood	8630a7ba7c	Reapply "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 )" (#118823 ) [Reverts d57892a2a153ab71a796f07e39d939eae6910c21] For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant. --------- Co-authored-by: Paul Walker <paul.walker@arm.com>	2024-12-09 10:56:44 +00:00
Vitaly Buka	d57892a2a1	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693 ) Reverts llvm/llvm-project#117566 Breaks libc++ tests with HWASAN https://lab.llvm.org/buildbot/#/builders/55/builds/3959	2024-12-04 12:36:46 -08:00
David Sherwood	4675db5f39	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-12-04 10:26:51 +00:00
Dan Gohman	c3536b263f	[WebAssembly] Define call-indirect-overlong and bulk-memory-opt features (#117087 ) This defines some new target features. These are subsets of existing features that reflect implementation concerns: - "call-indirect-overlong" - implied by "reference-types"; just the overlong encoding for the `call_indirect` immediate, and not the actual reference types. - "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and `memory.fill`, and not the other instructions in the bulk-memory proposal. This is split out from https://github.com/llvm/llvm-project/pull/112035. --------- Co-authored-by: Heejin Ahn <aheejin@gmail.com>	2024-12-02 17:08:07 -08:00
Sam Clegg	ea58410d0f	[WebAssembly] Implement %llvm.thread.pointer intrinsic (#117817 ) We can simply use the `__tls_base` global for this which is guaranteed to be non-zero and unique per thread. Fixes: #117433	2024-11-26 17:19:14 -08:00
David Sherwood	9b76e7fc60	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 )" (#117556 ) This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.	2024-11-25 13:49:21 +00:00
David Sherwood	22ec44f509	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-11-25 09:25:01 +00:00
Kazu Hirata	43570a2841	[WebAssembly] Remove unused includes (NFC) (#116318 ) Identified with misc-include-cleaner.	2024-11-15 07:26:37 -08:00
Dan Gohman	118445841d	[WebAssembly] Protect memory.fill and memory.copy from zero-length ranges. (#112617 ) WebAssembly's `memory.fill` and `memory.copy` instructions trap if the pointers are out of bounds, even if the length is zero. This is different from LLVM, which expects that it can call `memcpy` on arbitrary invalid pointers if the length is zero. To avoid spurious traps, branch around `memory.fill` and `memory.copy` when the length is zero. --------- Co-authored-by: Heejin Ahn <aheejin@gmail.com>	2024-10-24 14:13:58 -07:00
Jordan Rupprecht	33363521ca	[NFC][WebAssembly] Inline var only used in assertion (#113507 )	2024-10-23 18:51:25 -05:00
Alex Crichton	c2293b33dd	[WebAssembly] Implement the wide-arithmetic proposal (#111598 ) This commit implements the [wide-arithmetic] proposal which has recently reached phase 2 in the WebAssembly proposals process. The goal here is to implement support in LLVM for emitting these instructions which are gated behind a new feature flag by default. A new `wide-arithmetic` feature flag is introduced which gates these four new instructions from being emitted. Emission of each instruction itself is relatively simple given LLVM's preexisting lowering rules and infrastructure. The main gotcha is that due to the multi-result nature of all of these instructions it needed the lowerings to be implemented in C++ rather than in TableGen. [wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic	2024-10-23 11:39:58 -07:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564 ) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
Simon Pilgrim	f8f0a266e0	[clang][wasm] Replace the target integer sub saturate intrinsics with the equivalent generic `__builtin_elementwise_sub_sat` intrinsics (#109405 ) Remove the Intrinsic::wasm_sub_sat_signed/wasm_sub_sat_unsigned entries and just use sub_sat_s/sub_sat_u directly	2024-09-22 10:12:41 +01:00
Brendan Dahl	c076638c70	[WebAssembly] Support BUILD_VECTOR with F16x8. (#108117 ) Convert BUILD_VECTORS with FP16x8 to I16x8 since there's no FP16 scalar value to initialize v128.const.	2024-09-11 10:00:10 -07:00
Brendan Dahl	415288a2a7	[WebAssembly] Add load and store patterns for V8F16. (#108119 )	2024-09-11 09:53:53 -07:00
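
The v8i8 and v16i8 mul commits listed above (e557ad687b, 68152f1301) widen the operands, multiply at i16, and shuffle the low bytes back out. The following is a minimal scalar sketch in C++ of why that should be sound, assuming wrapping i8 multiplication: the low byte of the widened product equals the wrapped 8-bit product. The names `extmulLane`, `mulViaWidening`, and `lowByteMatchesWrappingMul` are hypothetical; this models the arithmetic only and is not the backend's actual DAG combine.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of one lane of i16x8.extmul_low_i8x16_s:
// sign-extend both i8 inputs, multiply, and keep 16 bits of the result.
inline std::uint16_t extmulLane(std::int8_t a, std::int8_t b) {
  const int wide = static_cast<int>(a) * static_cast<int>(b);
  return static_cast<std::uint16_t>(wide);
}

// Multiply sixteen i8 lanes by widening each lane and keeping the low byte,
// which is the byte the final shuffle selects in the described lowering.
inline std::array<std::uint8_t, 16>
mulViaWidening(const std::array<std::uint8_t, 16> &a,
               const std::array<std::uint8_t, 16> &b) {
  std::array<std::uint8_t, 16> out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    const std::uint16_t wide = extmulLane(static_cast<std::int8_t>(a[i]),
                                          static_cast<std::int8_t>(b[i]));
    out[i] = static_cast<std::uint8_t>(wide & 0xFF);
  }
  return out;
}

// Exhaustively compare the widened path against a plain wrapping 8-bit
// multiply for all 256 x 256 input pairs.
inline bool lowByteMatchesWrappingMul() {
  for (int x = 0; x < 256; ++x) {
    for (int y = 0; y < 256; ++y) {
      const auto expected = static_cast<std::uint8_t>(x * y);
      const std::uint16_t wide =
          extmulLane(static_cast<std::int8_t>(static_cast<std::uint8_t>(x)),
                     static_cast<std::int8_t>(static_cast<std::uint8_t>(y)));
      if (static_cast<std::uint8_t>(wide & 0xFF) != expected)
        return false;
    }
  }
  return true;
}
```

For example, calling `mulViaWidening` with every lane of the first operand set to 200 and every lane of the second set to 3 yields 88 in each lane (600 mod 256). The sign extension does not affect the low byte because the signed and unsigned readings of an 8-bit value are congruent modulo 256.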

431 Commits