llvm-project

Author	SHA1	Message	Date
Austin Jiang	e6cdfb75ac	Fix typos and spelling errors across codebase (#156270 ) Corrected various spelling mistakes such as 'occurred', 'receiver', 'initialized', 'length', and others in comments, variable names, function names, and documentation throughout the project. These changes improve code readability and maintain consistency in naming and documentation. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>	2026-01-13 11:52:46 -05:00
Sam Parker	e5b6833e49	[WebAssembly] vi8 mul cost modelling. (#175177 ) We've already optimised these, so update the cost model to reflect it. And skip the isBeforeLegalize check when lowering i8 muls, because it then misses the cases where, say v32i8, has been type legalised into 2x v16i8. Also explicitly disable memory interleaving for any factor other than two or four.	2026-01-12 09:25:54 +00:00
Derek Schuff	7a22bea512	[WebAssembly] Expand vector frem instructions (#174854 ) Commit `6ad41bcc49` changed how frem is expanded during legalization and it broke WebAssembly but we were missing test coverage. We want to maintain our previous behavior of unrolling vectors and using a libcall to implement scalar frem. I'm not sure why this now has to be different (in ISelLowering) from other libcalls like fsin which work the same way in the end, but this code does accurately describe what we want. Fixes: https://github.com/emscripten-core/emscripten/issues/25991	2026-01-08 16:19:44 -08:00
Islam Imad	7ceecfad40	[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413 ) Fixes #171608	2025-12-28 18:51:18 +00:00
Frederik Harwath	6ad41bcc49	[CodeGen] expand-fp: Change frem expansion criterion (#158285 ) The existing condition for checking whether or not to expand an frem instruction in expand-fp is not sufficiently precise. The expansion on other targets than AMDGPU - which is the only intended user right now - is only prevented due to the interaction with the MaxLegalFpConvertBitWidth check. Relying on this is conceptually wrong and limits the use of the pass for other targets and further expansions (e.g. merging with the similar ExpandLargeDivRem pass). Change the expansion criterion to always expand frem of a given type for targets that use "Expand" as the legalization action for the underlying scalar type and use this to exit the pass early for targets which do not require any expansions. This requires to change the frem legalization action for all targets which do not want frem to be expanded in this pass from "Expand" to "LibCall". --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-16 17:31:26 +01:00
Derek Schuff	6d60d3d7e4	Revert "[WebAssembly] Implement addrspacecast to funcref" (#170785 ) Reverts llvm/llvm-project#166820 There was a failure in the ENABLE_EXPENSIVE_CHECKS configuration.	2025-12-04 17:24:14 -08:00
Demetrius Kanios	d3b9fd0f86	[WebAssembly] Implement addrspacecast to funcref (#166820 ) Adds lowering of `addrspacecast [0 -> 20]` to allow easy conversion of function pointers to Wasm `funcref` When given a constant function pointer, it lowers to a direct `ref.func`. Otherwise it lowers to a `table.get` from `__indirect_function_table` using the provided pointer as the index.	2025-12-04 16:34:42 -08:00
Robert Imschweiler	5c3c0020af	[NFC] Refactor TargetLowering::getTgtMemIntrinsic to take CallBase parameter (#170334 ) cf. https://github.com/llvm/llvm-project/pull/133907#discussion_r2578576548	2025-12-02 19:42:31 +01:00
Sam Parker	e44646b795	[WebAssembly] Lower ANY_EXTEND_VECTOR_INREG (#167529 ) Treat it in the same manner as zero_extend_vector_inreg and generate an extend_low_u if possible. This is to try and prevent expensive shuffles from being generated instead. computeKnownBitsForTargetNode has also been updated to specify known zeros on extend_low_u.	2025-11-20 08:57:08 +00:00
Matt Arsenault	a757c4e74e	CodeGen: Add subtarget to TargetLoweringBase constructor (#168620 ) Currently LibcallLoweringInfo is defined inside of TargetLowering, which is owned by the subtarget. Pass in the subtarget so we can construct LibcallLoweringInfo with the subtarget. This is a temporary step that should be revertable in the future, after LibcallLoweringInfo is moved out of TargetLowering.	2025-11-19 19:18:13 +00:00
Hongyu Chen	63e6373efd	[WebAssembly] Truncate extra bits of large elements in BUILD_VECTOR (#167223 ) Fixes https://github.com/llvm/llvm-project/issues/165713 This patch handles out-of-bound vector elements and truncates extra bits.	2025-11-17 10:39:18 +00:00
Sam Parker	9e6a31f832	[WebAssembly] vf32 to vi8, vi16 lowering (#164644 ) Avoid scalarizing the conversion and use trunc_sat and narrow instead.	2025-11-06 08:32:44 +00:00
Sergei Barannikov	0c73009236	[WebAssembly] TableGen-erate SDNode descriptions (#166259 ) This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. CALL and RET_CALL do not have a description in td files, and it is not currently possible to add one as these nodes have both variable operands and variable results. This also fixes a subtle bug detected by the enabled verification functionality. `LOCAL_GET` is declared with `SDNPHasChain` property, and thus should have both a chain operand and a chain result. The original code created a node without a chain result, which caused a check in `SDNodeInfo::verifyNode()` to fail. Part of #119709. Pull Request: https://github.com/llvm/llvm-project/pull/166259	2025-11-05 06:24:53 +03:00
Jasmine Tang	1fbfac30f1	[WebAssembly] [Codegen] Add pattern for relaxed min max from fminimum/fmaximum over v4f32 and v2f64 (#162948 ) Related to #55932	2025-10-22 03:08:24 -07:00
Derek Schuff	19a58a5208	[WebAssembly] Optimize lowering of constant-sized memcpy and memset (#163294 ) We currently emit a check that the size operand isn't zero, to avoid executing the wasm memory.copy instruction when it would trap. But this isn't necessary if the operand is a constant. Fixes #163245	2025-10-14 22:00:25 +00:00
Sam Parker	1820102167	Wasm fmuladd relaxed (#163177 ) Reland #161355, after fixing up the cross-projects-tests for the wasm simd intrinsics. Original commit message: Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 16:50:53 +01:00
Sam Parker	30d3441cf0	Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171 ) Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.	2025-10-13 11:53:40 +01:00
Sam Parker	a4eb7ea225	[WebAssembly] Lower fmuladd to madd and nmadd (#161355 ) Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 10:36:08 +01:00
Derek Schuff	abc8aac6d2	[WebAssembly] Check intrinsic argument count before Any/All combine (#162163 ) This code is activated on all INTRINSIC_WO_CHAIN but only handles a selection. However it was trying to read the arguments before checking which intrinsic it was handling. This fails for intrinsics that have no arguments.	2025-10-07 23:52:25 +00:00
Sam Parker	156e9b4b69	[WebAssembly] Use partial_reduce_mla ISD nodes (#161184 ) Addressing issue #160847. Move away from combining the intrinsic call and instead lower the ISD nodes, using tablegen for pattern matching.	2025-09-30 08:28:56 +01:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
Sam Parker	586c0ad918	[WebAssembly] Support partial-reduce accumulator (#158060 ) We currently only support partial.reduce.add in the case where we are performing a multiply-accumulate. Now add support for any partial reduction where the input is being extended, where we can take advantage of extadd_pairwise.	2025-09-12 07:03:49 +01:00
Sam Parker	6dacdc31ec	[WebAssembly] extadd_pairwise for PartialReduce (#157669 ) Avoid using extends, and adding the high and low half and use extadd_pairwise instead.	2025-09-10 08:13:46 +01:00
Sam Parker	e557ad687b	[WebAssembly] v8i8 mul support (#151145 ) During DAG combine, promote the operands to v8i16 by concatenating with an undef vector and then use extmul_low to perform the mul at i16. Finally, shuffle the low bytes out of the i16 elements into the result vector.	2025-08-27 11:39:26 +01:00
Jasmine Tang	7fcee5fe08	[WebAssembly] Add support for avgr_u in loops (#153252 ) Fixes https://github.com/llvm/llvm-project/issues/150550. With the test case ``` void f(unsigned char *x, unsigned char *y, int n) { // should have been vectorized into avgr_u instead of separate vectorized add and logical right shift for (int i = 0; i < n; i++) x[i] = (x[i] + y[i] + 1) / 2; } ``` the backend failed to recognize that this can be reduced to avgr_u since the loop vectorizer doesn't transform into the existing pattern in tablegen. This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to avgr_u in the tablegen file.	2025-08-22 09:52:49 -07:00
Jasmine Tang	d7a29e5d56	[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC (#153703 ) This PR reapplies https://github.com/llvm/llvm-project/pull/149461 In the original `combineVectorSizedSetCCEquality`, the result of setcc is being negated by returning setcc with the same cond code, leading to wrong logic. For example, with ```llvm %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16) %res = icmp eq i32 %cmp_16, 0 ``` the original PR produces all_true and then also compares the result equal to 0 (using the same SETEQ in the returning setcc), meaning that semantically, it effectively is calling icmp ne. Instead, the PR should have used SETNE in the returning setcc; this way, all_true returns 1, then it is compared again ne 0, which is equivalent to icmp eq.	2025-08-15 12:06:47 -07:00
Nikita Popov	240c454c4d	[CodeGen] Remove default ctors for InputArg and OutputArg (#153205 ) These make it easy to forget to initialize some members, like the newly added OrigTy. Force these to always go through the ctor instead.	2025-08-13 10:51:43 +02:00
Jasmine Tang	d32793ca6e	Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360 ) Reverts llvm/llvm-project#149461 The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the Emscripten test suite has failed. This PR applies a revert so I can take a closer look at it Test case link: https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js` Original comment report: https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746	2025-08-13 07:41:44 +00:00
Jasmine Tang	348f01f89c	[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 (#149461 ) Fixes https://github.com/llvm/llvm-project/issues/149230 Previously, even with simd enabled via `-mattr=+simd128`, the compiler cannot utilize v128 to optimize loads and setcc of i128, instead legalizing it to consecutive i64s. This PR then adds support for setcc of i128 by converting them to v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16 bytes or more (when simd128 is present). The check for enabling this optimization is if the comparison operand is either a load or an integer in i128, with the comparison code being either `EQ \| NE`, without `NoImplicitFloat` function flag. Inspiration taken from RISCV's isel lowering.	2025-08-12 11:04:37 -07:00
Nikita Popov	406d9b1dd6	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319 ) The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).	2025-08-07 09:12:40 +02:00
Sam Parker	68152f1301	[WebAssembly] v16i8 mul support (#150209 ) During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle for v16i8 mul. On my AArch64 machine, using V8, I observe a 3.14% geomean improvement across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv and 1.8% for ncnn.	2025-07-29 09:23:31 +01:00
Jasmine Tang	8e6a05d471	[WebAssembly] Added vectorized version of fexp10 to the supported list (#150564 ) Fixes https://github.com/llvm/llvm-project/issues/117200. The default behavior in TargetLoweringBase is only scalar floats on fexp are supported by default, not the vectorized version. This PR adds `ISD::FEXP10` to the supported list.	2025-07-25 12:30:59 -07:00
Hood Chatham	15715f4089	[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486 ) This adds an llvm intrinsic for WebAssembly to test the type of a function. It is intended for adding a future clang builtin ` __builtin_wasm_test_function_pointer_signature` so we can test whether calling a function pointer will fail with function signature mismatch. Since the type of a function pointer is just `ptr` we can't figure out the expected type from that. The way I figured out to encode the type was by passing 0's of the appropriate type to the intrinsic. The first argument gives the expected type of the return type and the later values give the expected type of the arguments. So ```llvm @llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0) ``` tests if `%func` is of type `(double, i32) -> (i32)`. It will lower to: ```wat local.get $func table.get $__indirect_function_table ref.test (double, i32) -> (i32) ``` To indicate the function should be void, I somewhat arbitrarily picked `token poison`, so the following tests for `(i32) -> ()`: ```llvm @llvm.wasm.ref.test.func(ptr %func, token poison, i32 0) ``` To lower this intrinsic, we need some place to put the type information. With `encodeFunctionSignature()` we encode the signature information into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in `WebAssemblyMCInstLower.cpp`.	2025-07-22 14:07:34 -07:00
Arseny Kapoulkine	5b98992fb9	[WebAssembly] Optimize convert_iKxN_u into convert_iKxN_s (#149609 ) convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is known to have sign bit 0. This results in emitting Wasm opcodes that, on some targets (like x86_64), are dramatically slower than signed versions on major engines. Similarly to X86, we now fix this up in isel when the instruction has nonneg flag from canonicalization or if we know the source has zero sign bit. Fixes #149457.	2025-07-21 09:17:29 -07:00
Jasmine Tang	343f7475be	[WebAssembly] Add support for memcmp expansion (#148298 ) Fixes https://github.com/llvm/llvm-project/issues/61400 Added test case in llvm/test/CodeGen/WebAssembly/memcmp-expand.ll	2025-07-20 10:27:42 -07:00
Matt Arsenault	d8ef156379	DAG: Remove verifyReturnAddressArgumentIsConstant (#147240 ) The intrinsic argument is already marked with immarg so non-constant values are rejected by the IR verifier.	2025-07-07 16:28:47 +09:00
jjasmine	e9c9f8f374	[WebAssembly] Fold any/alltrue (setcc x, 0, eq/ne) to [not] any/alltrue x (#144741 ) Fixes https://github.com/llvm/llvm-project/issues/50142, a miss of further vectorization, where we can only achieve zext (xor (any_true), -1). Now in test case simd-setcc-reductions, it's converted to all_true. Also fixes https://github.com/llvm/llvm-project/issues/145177, which is all_true (setcc x, 0, eq) -> not any_true any_true (setcc x, 0, ne) -> any_true all_true (setcc x, 0, ne) -> all_true --------- Co-authored-by: badumbatish <--show-origin>	2025-07-01 15:27:37 -07:00
jjasmine	4a8c1f7d12	[WebAssembly] [Backend] Wasm optimize illegal bitmask (#145627 ) [WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980. Currently, the case for illegal bitmask (v32i8 or v64i8) is that at the SelectionDag level, two (four) vectors of v128 will be concatenated together, then they'll all be SETCC by the same pseudo illegal instruction, which requires expansion later on. I opt for SETCC-ing them separately, bitcast and zext them and then add them up together in the end. --------- Co-authored-by: badumbatish <--show-origin>	2025-07-01 15:13:08 -07:00
Sam Parker	d12fb1fc37	[WebAssembly] Refactor PerformSETCCCombine (#144875 ) Extract the logic into a templated helper function.	2025-06-25 08:56:35 +01:00
Matt Arsenault	ba7369c49c	WebAssembly: Move runtime libcall setting out of TargetLowering (#142624 ) RuntimeLibcallInfo needs to be correct outside of codegen contexts.	2025-06-16 10:46:05 +09:00
Kazu Hirata	dd702b3969	[llvm] Remove unused local variables (NFC) (#140422 )	2025-05-18 07:31:51 -07:00
Kazu Hirata	b4ab53c3b0	[Target] Use llvm::max_element (NFC) (#137926 )	2025-05-01 23:44:28 -07:00
Alex Crichton	c63246645e	[WebAssembly] Add a missing `break` statement (#133783 ) This fixes an issue introduced in #132430 where a `break;` statement was accidentally missing causing unintended fall-through.	2025-03-31 12:58:06 -07:00
Alex Crichton	a415b7f86e	[WebAssembly] Add more lowerings for wide-arithmetic (#132430 ) This commit is the result of investigation and discussion on WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128` instruction were discussed but ultimately deferred to a future proposal. In spite of this though I wanted to apply a few changes to the LLVM backend here with `wide-arithmetic` enabled for a few minor changes: * A lowering for the `ISD::UADDO` node is added which uses `add128` where the upper bits of the two operands are constant zeros and the result of the 128-bit addition is the result of the overflowing addition. * The high bits of a `I64_ADD128` node are now flagged as "known zero" if the upper bits of the inputs are also zero, assisting this `UADDO` lowering to ensure the backend knows that the carry result is a 1-bit result. A few tests were then added to showcase various lowerings for various operations that can be done with wide-arithmetic. They don't all optimize super well at this time but I wanted to add them as a reference here regardless to have them on-hand for future evaluations if necessary.	2025-03-31 11:36:32 -07:00
Sam Parker	103119a435	[WebAssembly] Lower wide SIMD i8 muls (#130785 ) Currently, 'wide' i32 simd multiplication, with extended i8 elements, will perform the multiplication with i32 So, for IR like the following: ``` %wide.a = sext <8 x i8> %a to <8 x i32> %wide.b = sext <8 x i8> %a to <8 x i32> %mul = mul <8 x i32> %wide.a, %wide.b ret <8 x i32> %mul ``` We would generate the following sequence: ``` i16x8.extend_low_i8x16_s $push6=, $1 local.tee $push5=, $3=, $pop6 i32x4.extmul_low_i16x8_s $push0=, $pop5, $3 v128.store 0($0), $pop0 i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 i16x8.extend_low_i8x16_s $push4=, $pop1 local.tee $push3=, $1=, $pop4 i32x4.extmul_low_i16x8_s $push2=, $pop3, $1 v128.store 16($0), $pop2 return ``` But now we perform the multiplication with i16, resulting in: ``` i16x8.extmul_low_i8x16_s $push3=, $1, $1 local.tee $push2=, $1=, $pop3 i32x4.extend_high_i16x8_s $push0=, $pop2 v128.store 16($0), $pop0 i32x4.extend_low_i16x8_s $push1=, $1 v128.store 0($0), $pop1 return ```	2025-03-21 06:57:57 +00:00
Brendan Dahl	9102afcd01	[WebAssembly] Use the same lowerings for f16x8 as other float vectors. (#127897 ) This fixes failures to select the various compare operations that weren't being expanded for f16x8.	2025-02-25 11:01:32 -08:00
Brendan Dahl	67056c280a	[WebAssembly] Support shuffle for F16x8 vectors. (#127857 )	2025-02-25 10:39:54 -08:00
Nikita Popov	cc539138ac	[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880 ) The standard libcalls for half to float and float to half conversion are __extendhfsf2 and __truncsfhf2. However, LLVM currently uses __gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these libcalls are an ARM-ism and only provided by libgcc on that platform. compiler-rt always provides both libcalls. Use the standard libcalls by default, and only use the __gnu libcalls on ARM.	2025-02-19 10:16:57 +01:00
Sam Parker	948a8477c6	[WebAssembly] Recognise EXTEND_HIGH (#123325 ) When lowering EXTEND_VECTOR_INREG, check whether the operand is a shuffle that is moving the top half of a vector into the lower half. If so, we can EXTEND_HIGH the input to the shuffle instead.	2025-02-17 09:04:29 +00:00
Sam Parker	df2de13695	[WebAssembly] Autovec support for dot (#123207 ) Enable the use of partial.reduce.add that we can lower to dot or a tree of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support both v8i16 and v16i8 inputs.	2025-02-03 08:58:43 +00:00
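
Commit 7fcee5fe08 above maps the loop body `(x[i] + y[i] + 1) / 2` onto avgr_u, the unsigned rounding average. As a rough scalar illustration only (not code from the LLVM backend), the sketch below models one 8-bit lane and checks that the widened form from the test case matches a form that never leaves 8 bits. The function names `avgr_u_reference`, `avgr_u_narrow`, and `count_mismatches` are hypothetical, and the narrow identity `(x | y) - ((x ^ y) >> 1)` is a standard bit trick chosen here for illustration; the commit message does not say how any engine implements the instruction.

```cpp
#include <cassert>
#include <cstdint>

// Reference lane semantics, as written in the commit's test case: the sum is
// formed in a wider type so that the "+ 1" cannot overflow 8 bits.
constexpr std::uint8_t avgr_u_reference(std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>(
        (static_cast<unsigned>(x) + static_cast<unsigned>(y) + 1u) / 2u);
}

// Same value without widening (illustrative assumption, not from the source):
// x + y == 2 * (x | y) - (x ^ y), so ceil((x + y) / 2) == (x | y) - ((x ^ y) >> 1).
constexpr std::uint8_t avgr_u_narrow(std::uint8_t x, std::uint8_t y) {
    return static_cast<std::uint8_t>((x | y) - ((x ^ y) >> 1));
}

// Exhaustively compare both forms over all 256 * 256 input pairs and return
// the number of pairs on which they disagree.
inline int count_mismatches() {
    int mismatches = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            const auto a = static_cast<std::uint8_t>(x);
            const auto b = static_cast<std::uint8_t>(y);
            if (avgr_u_reference(a, b) != avgr_u_narrow(a, b)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

// Convenience check: aborts if the two forms ever differ.
inline void verify_avgr_u() {
    assert(count_mismatches() == 0);
}
```

Calling `verify_avgr_u()` from a test harness exercises every input pair. The point of the comparison is that the rounding average fits in the lane width, which is what lets a single instruction replace the widened add and shift that the vectorizer would otherwise emit.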


448 Commits