llvm-project

Author	SHA1	Message	Date
Jasmine Tang	522ac23609	[WebAssembly] Add pattern for relaxed nmadd (#150684 ) Following in the footsteps of https://github.com/llvm/llvm-project/pull/147487 (support for madd), this PR adds support for nmadd. https://github.com/llvm/llvm-project/issues/55932 tracks this work.	2025-07-28 10:20:04 -07:00
Hood Chatham	15b03687ff	[WebAssembly,clang] Add __builtin_wasm_test_function_pointer_signature (#150201 ) Tests if the runtime type of the function pointer matches the static type. If this returns false, calling the function pointer will trap. Uses `@llvm.wasm.ref.test.func` added in #147486. Also adds a "gc" wasm feature to gate the use of the ref.test instruction.	2025-07-25 16:52:39 -07:00
Jasmine Tang	8e6a05d471	[WebAssembly] Added vectorized version of fexp10 to the supported list (#150564 ) Fixes https://github.com/llvm/llvm-project/issues/117200. The default behavior in TargetLoweringBase is that only scalar floats are supported for fexp, not the vectorized version. This PR adds `ISD::FEXP10` to the supported list.	2025-07-25 12:30:59 -07:00
Nikita Popov	129a35454c	[WebAssemblyOptimizeReturned] Skip lifetime intrinsic uses Replacing an alloca with a call result in a lifetime intrinsic will cause a verifier error. Fixes https://github.com/llvm/llvm-project/issues/150498.	2025-07-25 12:12:26 +02:00
Hood Chatham	e3b79afa67	[WebAssembly,llvm] Fix buildbot problems with llvm.wasm.ref.test.func (#150116 ) PR #147486 broke the sanitizer and expensive-checks buildbot. These captures were needed when toWasmValType emitted a diagnostic but are no longer needed since we changed it to an assertion failure. This removes the unneeded captures and should fix the sanitizer-buildbot. I also fixed the codegen in the wasm64 target: table.get requires an i32 but in wasm64 the function pointer is an i64. We need an additional `i32.wrap_i64` to convert it. I also added `-verify-machineinstrs` to the tests so that the test suite validates this fix. Finally, I noticed that #150201 uses a feature of the intrinsic that is not covered by the tests, namely `ptr` arguments. So I added one additional test case to ensure that it works properly. cc @dschuff	2025-07-23 09:52:05 -07:00
Heejin Ahn	b13bca7387	[WebAssembly] Unstackify registers with no uses in ExplicitLocals (#149626 ) There are cases where we end up removing some instructions that use stackified registers after RegStackify. For example, ```wasm bb.0: %0 = ... ;; %0 is stackified br_if %bb.1, %0 bb.1: ``` In this code, br_if will be removed in CFGSort, so we should unstackify %0 so that it can be correctly dropped in ExplicitLocals. Rather than handling this on a case-by-case basis, this PR just unstackifies all stackified registers with no uses at the beginning of ExplicitLocals, so that they can be correctly dropped. Fixes #149097.	2025-07-22 15:34:23 -07:00
Hood Chatham	15715f4089	[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486 ) This adds an llvm intrinsic for WebAssembly to test the type of a function. It is intended for adding a future clang builtin `__builtin_wasm_test_function_pointer_signature` so we can test whether calling a function pointer will fail with a function signature mismatch. Since the type of a function pointer is just `ptr` we can't figure out the expected type from that. The way I figured out to encode the type was by passing 0's of the appropriate type to the intrinsic. The first argument after the function pointer gives the expected return type and the later values give the expected types of the arguments. So ```llvm @llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0) ``` tests if `%func` is of type `(double, i32) -> (float)`. It will lower to: ```wat local.get $func table.get $__indirect_function_table ref.test (double, i32) -> (float) ``` To indicate the function should be void, I somewhat arbitrarily picked `token poison`, so the following tests for `(i32) -> ()`: ```llvm @llvm.wasm.ref.test.func(ptr %func, token poison, i32 0) ``` To lower this intrinsic, we need some place to put the type information. With `encodeFunctionSignature()` we encode the signature information into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in `WebAssemblyMCInstLower.cpp`.	2025-07-22 14:07:34 -07:00
Sam Parker	03b90486da	[WebAssembly] Memory interleave test (#149045 ) Precommit codegen test for vectorization cost modelling.	2025-07-22 09:50:47 +01:00
Arseny Kapoulkine	5b98992fb9	[WebAssembly] Optimize convert_iKxN_u into convert_iKxN_s (#149609 ) convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is known to have sign bit 0. This results in emitting Wasm opcodes that, on some targets (like x86_64), are dramatically slower than the signed versions on major engines. Similarly to X86, we now fix this up in isel when the instruction has the nonneg flag from canonicalization or when we know the source has a zero sign bit. Fixes #149457.	2025-07-21 09:17:29 -07:00
Jasmine Tang	343f7475be	[WebAssembly] Add support for memcmp expansion (#148298 ) Fixes https://github.com/llvm/llvm-project/issues/61400 Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll	2025-07-20 10:27:42 -07:00
jjasmine	6640b0a293	[WebAssembly] Add patterns for relaxed madd (#147487 ) [WebAssembly] Fold fadd contract (fmul contract) to relaxed madd w/ -mattr=+simd128,+relaxed-simd Fixes #121311 - Precommit test for #121311 - Fold fadd contract (fmul contract) to relaxed madd w/ -mattr=+simd128,+relaxed-simd - Move PatFrag of fadd_contract in ARM.td and WebAssembly.td to TargetSelectionDAG.td for reuse of pattern	2025-07-15 00:56:28 +08:00
jjasmine	44481f5067	[DAGCombine] Change isBuildVectorAll* -> isConstantSplatVectorAll* for Vselect (#147305 ) Change isBuildVectorAll* -> isConstantSplatVectorAll* in VSelect in case the fold happens after BuildVector has been canonically transformed to Splat or if the Splat is initially in vselect already - Fixes #73454 - Update related test cases, add extra tests in wasm --------- Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-07-11 10:13:05 +01:00
Matt Arsenault	1e26443cf9	CodeGen: Remove redundant REQUIRES registered-target from backend tests (#147475 ) These are already applied to all the tests in the target subdirectory	2025-07-09 09:25:53 +09:00
Matt Arsenault	3697d6dd98	DAG: Fall back to separate sin and cos when softening sincos (#147468 ) Fix an assertion failure in the error case.	2025-07-09 01:52:46 +09:00
Matt Arsenault	4a507b1f56	WebAssembly: Add test for sincos intrinsic (#147467 )	2025-07-09 01:49:17 +09:00
jjasmine	cbc2ac5db8	[WebAssembly] Fold TargetGlobalAddress with added offset (#145829 ) Previously we only folded TargetGlobalAddresses into the memarg if they were on their own, so this patch supports folding TargetGlobalAddresses that are added to some other offset. Previously we weren't able to do this because we didn't have nuw on the add, but we can now that getelementptr has nuw and is plumbed through to the add in 0564d0665b302d1c7861e03d2995612f46613a0f. Fixes #61930	2025-07-03 11:01:36 +01:00
Matt Arsenault	6ab7e52dd8	WebAssembly: Move validation of EH flags to TargetMachine construct time (#146634 )	2025-07-03 07:25:38 +09:00
Alex Crichton	a8a9a7f95a	[WebAssembly] Fix inline assembly with vector types (#146574 ) This commit fixes using inline assembly with v128 results. Previously this failed with an internal assertion about a failure to legalize a `CopyFromReg` where the source register was typed `v8f16`. It looks like the type used for the destination register was whatever was listed first in the `def V128 : WebAssemblyRegClass` listing, so the types were shuffled around to have a default-supported type. A small test was added as well which failed to generate previously and should now pass in generation. This test passed on LLVM 18 additionally and regressed by accident in #93228 which was first included in LLVM 19.	2025-07-01 20:26:30 -07:00
jjasmine	e9c9f8f374	[WebAssembly] Fold any/alltrue (setcc x, 0, eq/ne) to [not] any/alltrue x (#144741 ) Fixes https://github.com/llvm/llvm-project/issues/50142, a missed optimization where we could only produce zext (xor (any_true), -1). Now, in the test case simd-setcc-reductions, it's converted to all_true. Also fixes https://github.com/llvm/llvm-project/issues/145177, which covers the folds: all_true (setcc x, 0, eq) -> not any_true x; any_true (setcc x, 0, ne) -> any_true x; all_true (setcc x, 0, ne) -> all_true x. --------- Co-authored-by: badumbatish <--show-origin>	2025-07-01 15:27:37 -07:00
jjasmine	4a8c1f7d12	[WebAssembly] [Backend] Wasm optimize illegal bitmask (#145627 ) [WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980. Currently, for an illegal bitmask (v32i8 or v64i8), two (or four) v128 vectors are concatenated together at the SelectionDAG level, and then they are all SETCC'd by the same illegal pseudo instruction, which requires expansion later on. I opted to SETCC them separately, bitcast and zext each result, and then add them together at the end. --------- Co-authored-by: badumbatish <--show-origin>	2025-07-01 15:13:08 -07:00
SingleAccretion	cd46354dbd	[WebAssembly] Enable a limited amount of stackification for debug code (#136510 ) This change is a step towards fixing one long-standing problem with LLVM's debug WASM codegen: excessive use of locals, one local for each temporary value in IR (roughly speaking). This has a lot of problems: 1) It makes it easy to hit engine limitations of 50K locals with certain code patterns and large functions. 2) It makes for larger binaries that are slower to load and slower to compile to native code. 3) It makes certain compilation strategies (spill all WASM locals to stack, for example) for debug code excessively expensive and makes debug WASM code either run very slowly, or be less debuggable. 4) It slows down LLVM itself. This change addresses these partially by running a limited version of the stackification pass for unoptimized code, one that gets rid of the most 'obviously' unnecessary locals. Care needs to be taken to not impact LLVM's ability to produce high quality debug variable locations with this pass. To that end: 1) We only allow stackification when it doesn't require moving any instructions. 2) We disable stackification of any locals that are used in DEBUG_VALUEs, or as a frame base. I have verified on a moderately large example that the baseline and the diff produce the same kinds (local/global/stack) of locations, and the only differences are due to the shifting of instruction offsets, with many local.[get|set]s not being present anymore. Even with this quite conservative approach, the results are pretty good: 1) 30% reduction in raw code size, up to 10x reduction in the number of locals for select large methods (~1000 => ~100). 2) ~10% reduction in instructions retired for an "llc -O0" run on a moderately sized input.	2025-06-24 11:40:47 -07:00
Fangrui Song	28bda77843	Introduce MCAsmInfo::UsesSetToEquateSymbol and prefer = to .set Introduce MCAsmInfo::UsesSetToEquateSymbol to control the preferred syntax for symbol equating. We now favor the more readable and common `symbol = expression` syntax over `.set`. This aligns with pre- https://reviews.llvm.org/D44256 behavior. On Apple platforms, this resolves a clang -S vs -c behavior difference (resolves #104623). For targets whose = support is unconfirmed, UsesSetToEquateSymbol is set to false. This also minimizes test updates. Pull Request: https://github.com/llvm/llvm-project/pull/142289	2025-06-11 22:19:31 -07:00
Iris Shi	24d730b380	Reland "[SelectionDAG] Make `(a & x) | (~a & y) -> (a & (x ^ y)) ^ y` available for all targets" (#143651 )	2025-06-11 15:56:37 +08:00
Iris Shi	8c890eaa3f	Revert "[SelectionDAG] Make `(a & x) | (~a & y) -> (a & (x ^ y)) ^ y` available for all targets" (#143648 )	2025-06-11 10:19:12 +08:00
Iris Shi	bfb48363b0	[SelectionDAG] Make `(a & x) | (~a & y) -> (a & (x ^ y)) ^ y` available for all targets (#137641 )	2025-06-09 17:57:15 +08:00
Pavel Verigo	9d89b05f11	[WebAssembly] Fix trunc in FastISel (#138479 ) Previous logic did not handle the case where the result bit size was between 32 and 64 bits inclusive. I updated the if-statements for more precise handling. An alternative solution would have been to abort FastISel in case the result type is not legal for FastISel. Resolves: #64222. This PR began as an investigation into the root cause of https://github.com/ziglang/zig/issues/20966. Godbolt link showing incorrect codegen on 20.1.0: https://godbolt.org/z/cEr4vY7d4.	2025-05-06 14:16:35 -07:00
Nikita Popov	6feb4a8ef4	[IR] Don't allow values of opaque type (#137625 ) Consider opaque types as non-first-class types, i.e. do not allow SSA values to have opaque type.	2025-04-30 15:01:00 +02:00
David Green	6c27817294	[SelectionDAG] Use SimplifyDemandedBits from SimplifyDemandedVectorElts Bitcast. (#133717 ) This adds a call to SimplifyDemandedBits from bitcasts with scalar input types in SimplifyDemandedVectorElts, which can help simplify the input scalar.	2025-04-03 11:14:08 +01:00
Florian Hahn	3bdf9a0880	[EquivalenceClasses] Use SmallVector for deterministic iteration order. (#134075 ) Currently iterators over EquivalenceClasses will iterate over std::set, which guarantees the order specified by the comparator. Unfortunately in many cases, EquivalenceClasses are used with pointers, so iterating over std::set of pointers will not be deterministic across runs. There are multiple places that explicitly try to sort the equivalence classes before using them to try to get a deterministic order (LowerTypeTests, SplitModule), but there are others that do not at the moment and this can result at least in non-deterministic value naming in Float2Int. This patch updates EquivalenceClasses to keep track of all members via an extra SmallVector and removes code from LowerTypeTests and SplitModule to sort the classes before processing. Overall it looks like compile-time slightly decreases in most cases, but close to noise: https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions PR: https://github.com/llvm/llvm-project/pull/134075	2025-04-02 20:27:43 +01:00
Sam Clegg	a30caa6a73	[WebAssembly] Add missing tests from #133289 (#133938 )	2025-04-01 10:47:35 -07:00
Alex Crichton	a415b7f86e	[WebAssembly] Add more lowerings for wide-arithmetic (#132430 ) This commit is the result of investigation and discussion on WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128` instruction were discussed but ultimately deferred to a future proposal. In spite of this, though, I wanted to apply a few minor changes to the LLVM backend for when `wide-arithmetic` is enabled: * A lowering for the `ISD::UADDO` node is added which uses `add128` where the upper bits of the two operands are constant zeros and the result of the 128-bit addition is the result of the overflowing addition. * The high bits of a `I64_ADD128` node are now flagged as "known zero" if the upper bits of the inputs are also zero, assisting this `UADDO` lowering to ensure the backend knows that the carry result is a 1-bit result. A few tests were then added to showcase various lowerings for various operations that can be done with wide-arithmetic. They don't all optimize super well at this time but I wanted to add them as a reference here regardless to have them on-hand for future evaluations if necessary.	2025-03-31 11:36:32 -07:00
Sam Parker	103119a435	[WebAssembly] Lower wide SIMD i8 muls (#130785 ) Currently, 'wide' i32 simd multiplication, with extended i8 elements, will perform the multiplication with i32. So, for IR like the following: ``` %wide.a = sext <8 x i8> %a to <8 x i32> %wide.b = sext <8 x i8> %a to <8 x i32> %mul = mul <8 x i32> %wide.a, %wide.b ret <8 x i32> %mul ``` We would generate the following sequence: ``` i16x8.extend_low_i8x16_s $push6=, $1 local.tee $push5=, $3=, $pop6 i32x4.extmul_low_i16x8_s $push0=, $pop5, $3 v128.store 0($0), $pop0 i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 i16x8.extend_low_i8x16_s $push4=, $pop1 local.tee $push3=, $1=, $pop4 i32x4.extmul_low_i16x8_s $push2=, $pop3, $1 v128.store 16($0), $pop2 return ``` But now we perform the multiplication with i16, resulting in: ``` i16x8.extmul_low_i8x16_s $push3=, $1, $1 local.tee $push2=, $1=, $pop3 i32x4.extend_high_i16x8_s $push0=, $pop2 v128.store 16($0), $pop0 i32x4.extend_low_i16x8_s $push1=, $1 v128.store 0($0), $pop1 return ```	2025-03-21 06:57:57 +00:00
yonghong-song	0ffe83feac	[SelectionDAG] Not issue TRAP node if naked function (#132147 ) In [1], Nikita Popov suggested that during lowering 'unreachable' insn should not generate extra code for naked functions, and this applies to all architectures. Note that for naked functions, 'unreachable' insn is necessary in IR since the basic block needs a terminator to end. This patch checks whether a function is a naked function. If it is, the 'unreachable' insn will not generate ISD::TRAP. [1] https://github.com/llvm/llvm-project/pull/131731 Co-authored-by: Yonghong Song <yonghong.song@linux.dev>	2025-03-20 18:18:03 -07:00
Heejin Ahn	494fe0b414	[WebAssembly] Remove wasm-specific findWasmUnwindDestinations (#130374 ) Unlike in Itanium EH IR, WinEH IR's unwinding instructions (e.g. `invoke`s) can have multiple possible unwind destinations. For example: ```ll entry: invoke void @foo() to label %cont unwind label %catch.dispatch catch.dispatch: ; preds = %entry %0 = catchswitch within none [label %catch.start] unwind label %terminate catch.start: ; preds = %catch.dispatch %1 = catchpad within %0 [ptr null] ... terminate: ; preds = %catch.dispatch %2 = catchpad within none [] ... ... ``` In this case, if an exception is not caught by `catch.dispatch` (and thus `catch.start`), it should next unwind to `terminate`. `findUnwindDestination` in ISel gathers the list of these unwind destinations by traversing the unwind edges: `ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2089-L2150)` But we don't use that, and instead use our custom `findWasmUnwindDestinations` that only adds the first unwind destination, `catch.start`, to the successor list of `entry`, and not `terminate`: `ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2037-L2087)` The reason behind it was that, as described in the comment block in the code, it was assumed that there always would be an `invoke` that connects `catch.start` and `terminate`. In case of `catch (type)`, there will be `call void @llvm.wasm.rethrow()` in `catch.start`'s predecessor that unwinds to the next destination. For example: `0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L429-L430)` In case of `catch (...)`, `__cxa_end_catch` can throw, so it becomes an `invoke` that unwinds to the next destination. For example: `0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L537-L538)` So the unwind ordering relationship between `catch.start` and `terminate` here would be preserved. But it turns out this assumption does not always hold. 
For example: ```ll entry: invoke void @foo() to label %cont unwind label %catch.dispatch catch.dispatch: ; preds = %entry %0 = catchswitch within none [label %catch.start] unwind label %terminate catch.start: ; preds = %catch.dispatch %1 = catchpad within %0 [ptr null] ... call void @_ZSt9terminatev() unreachable terminate: ; preds = %catch.dispatch %2 = catchpad within none [] call void @_ZSt9terminatev() unreachable ... ``` In this case there is no `invoke` that connects `catch.start` to `terminate`. So after `catch.dispatch` BB is removed in ISel, `terminate` is considered unreachable and incorrectly removed in DCE. This makes Wasm just use the general `findUnwindDestination`. In that case `entry`'s successors are going to be [`catch.start`, `terminate`]. We can get the first unwind destination by just traversing the list from the front. --- This required another change in WinEHPrepare. WinEHPrepare demotes all PHIs in EH pads because they are funclets in Windows and funclets can't have PHIs. When used in Wasm they are not funclets, so we don't need to do that wholesale, but we still need to demote PHIs in `catchswitch` BBs because they are deleted during ISel. (So we created [`-demote-catchswitch-only`](`a5588b6d20/llvm/lib/CodeGen/WinEHPrepare.cpp (L57-L59)`) option for that.) But it turns out we need to demote PHIs that have a `catchswitch` BB as an incoming block too: ```ll ... catch.dispatch: %0 = catchswitch within none [label %catch.start] unwind label %terminate catch.start: ... somebb: ... ehcleanup ; preds = %catch.dispatch, %somebb %1 = phi i32 [ 10, %catch.dispatch ], [ 20, %somebb ] ... ``` In this case the `phi` in `ehcleanup` BB should be demoted too because `catch.dispatch` BB will be removed in ISel, so one of its incoming blocks will be gone. This pattern didn't manifest before, presumably due to how `findWasmUnwindDestinations` worked. 
(In this example, in our `findWasmUnwindDestinations`, `catch.dispatch` would have had only one successor, `catch.start`. But now `catch.dispatch` has both `catch.start` and `ehcleanup` as successors, revealing this bug.) This case is [represented](`ab87206c4b/llvm/test/CodeGen/WebAssembly/exception.ll (L445)`) by the `rethrow_terminator` function in `exception.ll` (or `exception-legacy.ll`), and without the WinEHPrepare fix it will crash. --- Discovered by the reproducer provided in #126916, even though the bug reported there was not this one.	2025-03-10 20:56:38 -07:00
Derek Schuff	6916438b65	[WebAssembly] Add Libcall signatures for modf and variants (#130201 ) Clang now lowers modf/modff/modfl as builtins using the llvm.modf intrinsic.	2025-03-06 15:48:39 -08:00
Daniel Paoliello	16e051f0b9	[win] NFC: Rename `EHCatchret` to `EHCont` to allow for EH Continuation targets that aren't `catchret` instructions (#129953 ) This change splits out the renaming and comment updates from #129612 as a non-functional change.	2025-03-06 09:28:44 -08:00
Sam Clegg	147d9d6915	[WebAssemblyLowerEmscriptenEHSjLj] Avoid setting import_name where possible (#128564 ) This change effectively reverts 296ccef (https://reviews.llvm.org/D77192). Most of these symbols are just normal C symbols that get imported from either libcompiler-rt or from emscripten's JS library code. In most cases it should not be necessary to give them explicit import names. The advantage of doing this is that wasm-ld can/will fail with a useful error message when these symbols are missing, as opposed to today, where it will simply import them and defer errors until later (when they are less specific).	2025-02-26 14:05:00 -08:00
Brendan Dahl	9102afcd01	[WebAssembly] Use the same lowerings for f16x8 as other float vectors. (#127897 ) This fixes failures to select the various compare operations that weren't being expanded for f16x8.	2025-02-25 11:01:32 -08:00
Brendan Dahl	67056c280a	[WebAssembly] Support shuffle for F16x8 vectors. (#127857 )	2025-02-25 10:39:54 -08:00
Heejin Ahn	d2d469eb79	[WebAssembly] Make llvm.wasm.throw invokable (#128104 ) The `llvm.wasm.throw` intrinsic can throw, but it was not invokable. Not sure what the rationale was when it was first written that way, but I think at least in Emscripten's C++ exception support with the Wasm port of libunwind, `__builtin_wasm_throw`, which is lowered to `llvm.wasm.throw`, is used only within `_Unwind_RaiseException`, which is a one-liner and thus does not need an `invoke`: `720e97f76d/system/lib/libunwind/src/Unwind-wasm.c (L69)` (`_Unwind_RaiseException` is called by `__cxa_throw`, which is generated by the `throw` C++ keyword.) But this does not address other direct uses of the builtin in C++, which I'm not sure about but which are not prohibited. Also, other language frontends may need to use the builtin in different functions, which have `try`-`catch`es or destructors. This makes `llvm.wasm.throw` invokable in the backend. To do that, this adds a custom lowering routine to `SelectionDAGBuilder::visitInvoke`, like we did for `llvm.wasm.rethrow`. This does not generate `invoke`s for `__builtin_wasm_throw` yet, which will be done by a follow-up PR. Addresses #124710.	2025-02-25 09:53:01 -08:00
Sam Parker	ea7897a617	[WebAssembly] Enable interleaved memory accesses (#125696 ) Enable the vectorizer to access interleaved memory. This means that, when it's decided to be profitable, the memory accesses can be vectorized instead of the value being built up by a sequence of load_lane instructions. This will often increase the vectorization factor of the loop, leading to significantly better performance. I ran a reasonably large collection of benchmarks and most are not affected by this change, with most performance changes <1%. But I saw a 2.5% speedup for the total run time of TSVC, 1% speedup for SPEC2017 x265, 28% speedup for a ResNet workload and 95% for libyuv. This was measured running V8 on an AArch64 box.	2025-02-17 09:09:52 +00:00
Sam Parker	948a8477c6	[WebAssembly] Recognise EXTEND_HIGH (#123325 ) When lowering EXTEND_VECTOR_INREG, check whether the operand is a shuffle that is moving the top half of a vector into the lower half. If so, we can EXTEND_HIGH the input to the shuffle instead.	2025-02-17 09:04:29 +00:00
Sam Parker	df2de13695	[WebAssembly] Autovec support for dot (#123207 ) Enable the use of partial.reduce.add that we can lower to dot or a tree of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support both v8i16 and v16i8 inputs.	2025-02-03 08:58:43 +00:00
Sam Parker	28d7880618	[WebAssembly] getMemoryOpCost and getCastInstrCost (#122896 ) Add initial implementations of these TTI methods for SIMD types. For casts, the costing covers the free extensions provided by extmul_low as well as extend_low. For memory operations we consider the use of load32_zero and load64_zero, as well as full width v128 loads.	2025-01-31 10:33:31 +00:00
Heejin Ahn	539b2e0654	[WebAssembly] Fix catch block type in wasm64 (#124381 ) `try_table`'s `catch` or `catch_ref`'s target block's return type should be `i64` and `(i64, exnref)` in case of wasm64.	2025-01-27 11:01:48 -08:00
Heejin Ahn	c3dfd34e54	[WebAssembly] Add unreachable before catch destinations (#123915 ) When `try_table`'s catch clause's destination has a return type, as in the case of catch with a concrete tag, catch_ref, and catch_all_ref, the generated code can be invalid. For example: ```wasm block exnref try_table (catch_all_ref 0) ... end_try_table end_block ... use exnref ... ``` This code is not valid because the block's body type is not exnref. So we add an unreachable after the 'end_try_table' to make the code valid here: ```wasm block exnref try_table (catch_all_ref 0) ... end_try_table unreachable ;; Newly added end_block ``` Because 'unreachable' is a terminator we also need to split the BB. --- We need to handle the same thing for unwind mismatch handling. In the code below, we create a "trampoline BB" that will be the destination for the nested `try_table`~`end_try_table` added to fix an unwind mismatch: ```wasm try_table (catch ... ) block exnref ... try_table (catch_all_ref N) some code end_try_table ... end_block ;; Trampoline BB throw_ref end_try_table ``` While the `block` added for the trampoline BB has the return type `exnref`, its body, which contains the nested `try_table` and other code, wouldn't have the `exnref` return type. Most times it didn't become a problem because the block's body ended with something like `br` or `return`, but that may not always be the case, especially when there is a loop. So we add an `unreachable` to make the code valid here too: ```wasm try_table (catch ... ) block exnref ... try_table (catch_all_ref N) some code end_try_table ... unreachable ;; Newly added end_block ;; Trampoline BB throw_ref end_try_table ``` In this case we just append the `unreachable` at the end of the layout predecessor BB. (This was tricky to do in the first (non-mismatch) case because there `end_try_table` and `end_block` were added in the beginning of an EH pad in `placeTryTableMarker` and moving `end_try_table` and the new `unreachable` to the previous BB caused other problems.) 
--- This adds many `unreachable`s to the output, but the tests check for `unreachable` in only a few places to see if this is working. The FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already heavily redacted to only leave important control-flow instructions, so I don't think it's worth adding `unreachable`s everywhere.	2025-01-22 22:39:43 -08:00
Matt Arsenault	5e79ae60a6	DAG: Fix vector_shuffle -> splat fold defining undef lanes (#123596 ) For shuffle vector splats with undef lanes in the mask, this was introducing real values. Filter out build_vector results based on the undef elements in the mask. This avoids AMDGPU test regressions in a future change. test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse but I didn't investigate.	2025-01-21 23:55:50 +07:00
Heejin Ahn	a8e1135baa	[WebAssembly] Add -wasm-use-legacy-eh option (#122158 ) This replaces the existing `-wasm-enable-exnref` option with a `-wasm-use-legacy-eh` option, in an effort to make the new standardized exnref proposal the 'default' state, with the legacy proposal enabled separately via an option. But given that most users haven't switched to the new proposal and major web browsers haven't turned it on by default, `-wasm-use-legacy-eh` is turned on by default, so nothing changes for now from a functionality perspective. This also removes the restriction that `-wasm-enable-exnref` be only used with `-wasm-enable-eh` because this option is enabled by default. This option does not have any effect when `-wasm-enable-eh` is not used.	2025-01-09 22:36:10 -08:00
Dan Gohman	c5ab70c508	[WebAssembly] Add `-i128:128` to the `datalayout` string. (#119204 ) Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM `datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is currently using the defaults for both, so it's inconsistent. Fix this by adding `-i128:128` to Wasm's `datalayout` string so that it aligns `i128` to 16 bytes too. This is similar to [llvm/llvm-project@dbad963](`dbad963a69`) for SPARC. This fixes rust-lang/rust#133991; see that issue for further discussion. [defaults to aligning `__int128_t` to 16 bytes]: `f8b4182f07/clang/lib/Basic/TargetInfo.cpp (L77)` [default to aligning `i128` to 8 bytes]: https://llvm.org/docs/LangRef.html#langref-datalayout	2024-12-10 09:21:58 -08:00
Dan Gohman	e665e781dc	[SelectionDAG] Use the nuw flag when expanding loads. (#119288 ) When expanding a load into two loads, use nuw for the add that computes the offset from the base of the second load, because the original load doesn't straddle the address space. It turns out there's already a dedicated helper function for doing this, `getObjectPtrOffset`. This is target-independent code; however, in practice it only seems to affect WebAssembly code, because WebAssembly load and store instructions' constant offsets don't perform wrapping, so constant folding often depends on the nuw flag being present. This was noticed in the development of #119204.	2024-12-10 06:28:09 -08:00
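
The rewrite in the `[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets` commits listed above is a pure bitwise identity, so it can be sanity-checked exhaustively at small widths. The Python below is an illustrative sketch and is not LLVM code. `bitselect_or`, `bitselect_xor`, and `check_identity` are hypothetical helper names, and a 4-bit width is assumed to keep the search small. Both forms take the bits of `x` where `a` is 1 and the bits of `y` where `a` is 0. The second form does this without a separate NOT of `a`.

```python
def bitselect_or(a: int, x: int, y: int, width: int) -> int:
    """Original pattern (a & x) | (~a & y), truncated to `width` bits."""
    mask = (1 << width) - 1
    return ((a & x) | (~a & y)) & mask


def bitselect_xor(a: int, x: int, y: int, width: int) -> int:
    """Rewritten pattern (a & (x ^ y)) ^ y, truncated to `width` bits."""
    mask = (1 << width) - 1
    return ((a & (x ^ y)) ^ y) & mask


def check_identity(width: int = 4) -> bool:
    """Exhaustively compare both forms for every (a, x, y) of the given width."""
    n = 1 << width
    return all(
        bitselect_or(a, x, y, width) == bitselect_xor(a, x, y, width)
        for a in range(n)
        for x in range(n)
        for y in range(n)
    )


if __name__ == "__main__":
    print(check_identity(4))  # True
    print(bin(bitselect_xor(0b1100, 0b1010, 0b0110, 4)))  # 0b1010
```

An exhaustive check over 4 bits suffices here because every bit position is computed independently of the others.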

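The three folds listed in the any/alltrue commit (e9c9f8f374) can be checked lane by lane with a small model. This is a hypothetical Python sketch, not the DAG combine itself, and it rests on several assumptions. A vector is a list of integer lanes. A compare against zero yields all-ones (-1) for true lanes and 0 for false lanes. `any_true` and `all_true` return 1 or 0, like the corresponding Wasm instructions. `check_folds` tries every 4-lane vector drawn from an assumed small set of lane values.

```python
from itertools import product
from typing import List


def any_true(v: List[int]) -> int:
    """1 if any lane is non-zero, else 0 (models v128.any_true)."""
    return int(any(lane != 0 for lane in v))


def all_true(v: List[int]) -> int:
    """1 if every lane is non-zero, else 0 (models an iNxM.all_true)."""
    return int(all(lane != 0 for lane in v))


def setcc_zero(v: List[int], ne: bool) -> List[int]:
    """Lane-wise compare against 0 (ne=True for SETNE, False for SETEQ).
    True lanes become all-ones (-1), false lanes become 0."""
    return [-1 if (lane != 0) == ne else 0 for lane in v]


def check_folds(lanes: int = 4, values=(0, 1, 255)) -> bool:
    """Check the three folds against the model for every candidate vector."""
    for combo in product(values, repeat=lanes):
        v = list(combo)
        # all_true (setcc x, 0, eq) -> not any_true x
        if all_true(setcc_zero(v, ne=False)) != 1 - any_true(v):
            return False
        # any_true (setcc x, 0, ne) -> any_true x
        if any_true(setcc_zero(v, ne=True)) != any_true(v):
            return False
        # all_true (setcc x, 0, ne) -> all_true x
        if all_true(setcc_zero(v, ne=True)) != all_true(v):
            return False
    return True


if __name__ == "__main__":
    print(check_folds())  # True
```

In this model, "not any_true" is written as `1 - any_true(v)`, which stands in for an `i32.eqz` on the boolean result.
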
1237 Commits
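
For the wide-arithmetic commit (a415b7f86e) above, the `ISD::UADDO` lowering through `add128` can be modeled in a few lines of Python. This is a sketch, not LLVM code. `i64_add128` and `uaddo_via_add128` are hypothetical helpers, and the operand order (lhs_lo, lhs_hi, rhs_lo, rhs_hi) is assumed from the wide-arithmetic proposal. When both upper halves are zero, the low result is the wrapped 64-bit sum and the high result is the carry, which can only be 0 or 1. That is why every bit of the high half except the lowest can be treated as known zero.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def i64_add128(lhs_lo: int, lhs_hi: int, rhs_lo: int, rhs_hi: int) -> tuple:
    """Hypothetical model of i64.add128: adds two 128-bit values given as
    (lo, hi) i64 halves, wraps modulo 2**128, and returns (lo, hi)."""
    lhs = (lhs_hi << 64) | lhs_lo
    rhs = (rhs_hi << 64) | rhs_lo
    total = (lhs + rhs) & MASK128
    return total & MASK64, total >> 64


def uaddo_via_add128(a: int, b: int) -> tuple:
    """Unsigned add-with-overflow of two u64 values, using add128 with zero
    upper halves. Returns (sum mod 2**64, carry); the carry is 0 or 1."""
    if not (0 <= a <= MASK64 and 0 <= b <= MASK64):
        raise ValueError("operands must be unsigned 64-bit values")
    return i64_add128(a, 0, b, 0)


if __name__ == "__main__":
    print(uaddo_via_add128(2, 3))       # (5, 0)
    print(uaddo_via_add128(MASK64, 1))  # (0, 1)
```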