llvm-project

Author	SHA1	Message	Date
Alex Crichton	c63246645e	[WebAssembly] Add a missing `break` statement (#133783 ) This fixes an issue introduced in #132430 where a `break;` statement was accidentally missing causing unintended fall-through.	2025-03-31 12:58:06 -07:00
Alex Crichton	a415b7f86e	[WebAssembly] Add more lowerings for wide-arithmetic (#132430 ) This commit is the result of investigation and discussion on WebAssembly/wide-arithmetic#6 where alternatives to the `i64.add128` instruction were discussed but ultimately deferred to a future proposal. In spite of this though I wanted to apply a few changes to the LLVM backend here with `wide-arithmetic` enabled for a few minor changes: * A lowering for the `ISD::UADDO` node is added which uses `add128` where the upper bits of the two operands are constant zeros and the result of the 128-bit addition is the result of the overflowing addition. * The high bits of a `I64_ADD128` node are now flagged as "known zero" if the upper bits of the inputs are also zero, assisting this `UADDO` lowering to ensure the backend knows that the carry result is a 1-bit result. A few tests were then added to showcase various lowerings for various operations that can be done with wide-arithmetic. They don't all optimize super well at this time but I wanted to add them as a reference here regardless to have them on-hand for future evaluations if necessary.	2025-03-31 11:36:32 -07:00
Sam Parker	103119a435	[WebAssembly] Lower wide SIMD i8 muls (#130785 ) Currently, 'wide' i32 simd multiplication, with extended i8 elements, will perform the multiplication with i32. So, for IR like the following: ``` %wide.a = sext <8 x i8> %a to <8 x i32> %wide.b = sext <8 x i8> %a to <8 x i32> %mul = mul <8 x i32> %wide.a, %wide.b ret <8 x i32> %mul ``` We would generate the following sequence: ``` i16x8.extend_low_i8x16_s $push6=, $1 local.tee $push5=, $3=, $pop6 i32x4.extmul_low_i16x8_s $push0=, $pop5, $3 v128.store 0($0), $pop0 i8x16.shuffle $push1=, $1, $1, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 i16x8.extend_low_i8x16_s $push4=, $pop1 local.tee $push3=, $1=, $pop4 i32x4.extmul_low_i16x8_s $push2=, $pop3, $1 v128.store 16($0), $pop2 return ``` But now we perform the multiplication with i16, resulting in: ``` i16x8.extmul_low_i8x16_s $push3=, $1, $1 local.tee $push2=, $1=, $pop3 i32x4.extend_high_i16x8_s $push0=, $pop2 v128.store 16($0), $pop0 i32x4.extend_low_i16x8_s $push1=, $1 v128.store 0($0), $pop1 return ```	2025-03-21 06:57:57 +00:00
Brendan Dahl	9102afcd01	[WebAssembly] Use the same lowerings for f16x8 as other float vectors. (#127897 ) This fixes failures to select the various compare operations that weren't being expanded for f16x8.	2025-02-25 11:01:32 -08:00
Brendan Dahl	67056c280a	[WebAssembly] Support shuffle for F16x8 vectors. (#127857 )	2025-02-25 10:39:54 -08:00
Nikita Popov	cc539138ac	[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880 ) The standard libcalls for half to float and float to half conversion are __extendhfsf2 and __truncsfhf2. However, LLVM currently uses __gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these libcalls are an ARM-ism and only provided by libgcc on that platform. compiler-rt always provides both libcalls. Use the standard libcalls by default, and only use the __gnu libcalls on ARM.	2025-02-19 10:16:57 +01:00
Sam Parker	948a8477c6	[WebAssembly] Recognise EXTEND_HIGH (#123325 ) When lowering EXTEND_VECTOR_INREG, check whether the operand is a shuffle that is moving the top half of a vector into the lower half. If so, we can EXTEND_HIGH the input to the shuffle instead.	2025-02-17 09:04:29 +00:00
Sam Parker	df2de13695	[WebAssembly] Autovec support for dot (#123207 ) Enable the use of partial.reduce.add that we can lower to dot or a tree of (add (extmul_low_u, extmul_high_u)) for the unsigned case. We support both v8i16 and v16i8 inputs.	2025-02-03 08:58:43 +00:00
yingopq	754ed95b66	[Mips] Fix compiler crash when returning fp128 after calling a functi… (#117525 ) …on returning { i8, i128 } Fixes https://github.com/llvm/llvm-project/issues/96432.	2025-01-20 16:47:40 +08:00
Sergei Barannikov	9ae92d7056	[SelectionDAG] Virtualize isTargetStrictFPOpcode / isTargetMemoryOpcode (#119969 ) With this change, targets are no longer required to put memory / strict-fp opcodes after special `ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers. This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709). Pull Request: https://github.com/llvm/llvm-project/pull/119969	2024-12-21 05:29:51 +03:00
David Sherwood	8630a7ba7c	Reapply "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 )" (#118823 ) [Reverts d57892a2a153ab71a796f07e39d939eae6910c21] For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant. --------- Co-authored-by: Paul Walker <paul.walker@arm.com>	2024-12-09 10:56:44 +00:00
Vitaly Buka	d57892a2a1	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693 ) Reverts llvm/llvm-project#117566 Breaks libc++ tests with HWASAN https://lab.llvm.org/buildbot/#/builders/55/builds/3959	2024-12-04 12:36:46 -08:00
David Sherwood	4675db5f39	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-12-04 10:26:51 +00:00
Dan Gohman	c3536b263f	[WebAssembly] Define call-indirect-overlong and bulk-memory-opt features (#117087 ) This defines some new target features. These are subsets of existing features that reflect implementation concerns: - "call-indirect-overlong" - implied by "reference-types"; just the overlong encoding for the `call_indirect` immediate, and not the actual reference types. - "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and `memory.fill`, and not the other instructions in the bulk-memory proposal. This is split out from https://github.com/llvm/llvm-project/pull/112035. --------- Co-authored-by: Heejin Ahn <aheejin@gmail.com>	2024-12-02 17:08:07 -08:00
Sam Clegg	ea58410d0f	[WebAssembly] Implement %llvm.thread.pointer intrinsic (#117817 ) We can simply use the `__tls_base` global for this which is guaranteed to be non-zero and unique per thread. Fixes: #117433	2024-11-26 17:19:14 -08:00
David Sherwood	9b76e7fc60	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 )" (#117556 ) This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.	2024-11-25 13:49:21 +00:00
David Sherwood	22ec44f509	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-11-25 09:25:01 +00:00
Kazu Hirata	43570a2841	[WebAssembly] Remove unused includes (NFC) (#116318 ) Identified with misc-include-cleaner.	2024-11-15 07:26:37 -08:00
Dan Gohman	118445841d	[WebAssembly] Protect memory.fill and memory.copy from zero-length ranges. (#112617 ) WebAssembly's `memory.fill` and `memory.copy` instructions trap if the pointers are out of bounds, even if the length is zero. This is different from LLVM, which expects that it can call `memcpy` on arbitrary invalid pointers if the length is zero. To avoid spurious traps, branch around `memory.fill` and `memory.copy` when the length is zero. --------- Co-authored-by: Heejin Ahn <aheejin@gmail.com>	2024-10-24 14:13:58 -07:00
Jordan Rupprecht	33363521ca	[NFC][WebAssembly] Inline var only used in assertion (#113507 )	2024-10-23 18:51:25 -05:00
Alex Crichton	c2293b33dd	[WebAssembly] Implement the wide-arithmetic proposal (#111598 ) This commit implements the [wide-arithmetic] proposal which has recently reached phase 2 in the WebAssembly proposals process. The goal here is to implement support in LLVM for emitting these instructions which are gated behind a new feature flag by default. A new `wide-arithmetic` feature flag is introduced which gates these four new instructions from being emitted. Emission of each instruction itself is relatively simple given LLVM's preexisting lowering rules and infrastructure. The main gotcha is that due to the multi-result nature of all of these instructions it needed the lowerings to be implemented in C++ rather than in TableGen. [wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic	2024-10-23 11:39:58 -07:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564 ) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
Simon Pilgrim	f8f0a266e0	[clang][wasm] Replace the target integer sub saturate intrinsics with the equivalent generic `__builtin_elementwise_sub_sat` intrinsics (#109405 ) Remove the Intrinsic::wasm_sub_sat_signed/wasm_sub_sat_unsigned entries and just use sub_sat_s/sub_sat_u directly	2024-09-22 10:12:41 +01:00
Brendan Dahl	c076638c70	[WebAssembly] Support BUILD_VECTOR with F16x8. (#108117 ) Convert BUILD_VECTORS with FP16x8 to I16x8 since there's no FP16 scalar value to initialize v128.const.	2024-09-11 10:00:10 -07:00
Brendan Dahl	415288a2a7	[WebAssembly] Add load and store patterns for V8F16. (#108119 )	2024-09-11 09:53:53 -07:00
Brendan Dahl	5703d8572f	[WebAssembly] Add intrinsics to wasm_simd128.h for all FP16 instructions (#106465 ) Getting this to work required a few additional changes: - Add builtins for any instructions that can't be done with plain C currently. - Add support for the saturating version of fp_to_<s,i>_I16x8. Other vector sizes supported this already. - Support bitcast of f16x8 to v128. Needed to return a __f16x8 as v128_t.	2024-08-30 08:42:37 -07:00
Sergei Barannikov	4d7a0abae8	[DataLayout] Change return type of `getStackAlignment` to `MaybeAlign` (#105478 ) Currently, `getStackAlignment` asserts if the stack alignment wasn't specified. This makes it inconvenient to use and complicates testing. This change also makes `exceedsNaturalStackAlignment` method redundant.	2024-08-27 22:59:33 +03:00
Brendan Dahl	7d373cef49	[WebAssembly] Change half-precision feature name to fp16. (#105434 ) This better aligns with how the feature is being referred to and what runtimes (V8) are calling it.	2024-08-22 09:44:33 -07:00
Sam Parker	76c4529515	[WebAssembly] Fix assertion in LowerBUILD_VECTOR (#101961 ) The assertion was failing in the case where we were trying to lower to loadxx_zero, but lane zero was undef.	2024-08-05 14:38:12 -07:00
Sam Parker	08decd20a9	[WebAssembly] load_zero to initialise build_vector (#100610 ) Instead of splatting a single lane, to initialise a build_vector, lower to scalar_to_vector which can be selected to load_zero. Also add load_zero and load_lane patterns for f32x4 and f64x2.	2024-08-02 10:11:21 +01:00
Amara Emerson	f270a4dd66	[AArch64] Don't tail call memset if it would convert to a bzero. (#98969 ) Well, not quite that simple. We can tc memset since it returns the first argument but bzero doesn't do that and therefore we can end up miscompiling. This patch also refactors the logic out of isInTailCallPosition() into the callers. As a result memcpy and memmove are also modified to do the same thing for consistency. rdar://131419786	2024-07-17 01:31:52 -07:00
Roger Ferrer Ibáñez	05e6bb40eb	[SelectionDAG] Add an ISD::CLEAR_CACHE node to lower llvm.clear_cache (#93795 ) The current way of lowering `llvm.clear_cache` is a bit unusual. As suggested by Matt Arsenault we are better off using an ISD node. This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall by default named `__clear_cache` and the default legalisation is a libcall. This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE` needed by RISC-V on some platforms.	2024-05-30 14:55:32 +02:00
Brendan Dahl	60bce6eab4	[WebAssembly] Implement all f16x8 binary instructions. (#93360 ) This reuses most of the code that was created for f32x4 and f64x2 binary instructions and tries to follow how they were implemented. add/sub/mul/div - use regular LL instructions min/max - use the minimum/maximum intrinsic, and also have builtins pmin/pmax - use the wasm.pmax/pmin intrinsics and also have builtins Specified at: `29a9b9462c/proposals/half-precision/Overview.md`	2024-05-28 16:33:20 -07:00
Heejin Ahn	c179d50fd3	[WebAssembly] Add exnref type (#93586 ) This adds (back) the exnref type restored in the new EH proposal adopted in Oct 2023 CG meeting: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md	2024-05-28 16:10:11 -07:00
Brendan Dahl	09c5525610	[WebAssembly] Implement prototype f16x8.splat instruction. (#93228 ) Adds a builtin and intrinsic for the f16x8.splat instruction. Specified at: `29a9b9462c/proposals/half-precision/Overview.md` Note: the current spec has f16x8.splat as opcode 0x123, but this is incorrect and will be changed to 0x120 soon.	2024-05-23 20:05:22 -07:00
Sam Clegg	39d32b238d	[WebAssembly] Use 64-bit table when targeting wasm64 (#92042 ) See https://github.com/WebAssembly/memory64/issues/51	2024-05-23 18:25:58 -07:00
Brendan Dahl	8a3277acbc	[WebAssembly] Implement prototype f32.store_f16 instruction. (#91545 ) Adds a builtin and intrinsic for the f32.store_f16 instruction. The instruction stores an f32 value as an f16 in memory. Specified at: `29a9b9462c/proposals/half-precision/Overview.md` Note: the current spec has f32.store_f16 as opcode 0xFD0121, but this is incorrect and will be changed to 0xFC31 soon.	2024-05-09 15:38:13 -07:00
Brendan Dahl	1a2a1fbd7c	[WebAssembly] Implement prototype f32.load_f16 instruction. (#90906 ) Adds a builtin and intrinsic for the f32.load_f16 instruction. The instruction loads an f16 value from memory and puts it in an f32. Specified at: `29a9b9462c/proposals/half-precision/Overview.md` Note: the current spec has f32.load_f16 as opcode 0xFD0120, but this is incorrect and will be changed to 0xFC30 soon.	2024-05-07 11:33:10 -07:00
Heejin Ahn	c921ac724f	[WebAssembly] Enable multivalue return when multivalue ABI is used (#88492 ) Multivalue feature of WebAssembly has been standardized for several years now. I think it makes sense to be able to enable it in the feature section by default for our clang/llvm-produced binaries so that the multivalue feature can be used as necessary when necessary within our toolchain and also when running other optimizers (e.g. wasm-opt) after the LLVM code generation. But some WebAssembly toolchains, such as Emscripten, do not provide both multivalue-returning and not-multivalue-returning versions of libraries. Also allowing the uses of multivalue in the features section does not necessarily mean we generate them whenever we can to the fullest, which is a different code generation / optimization option. So this makes the lowering of multivalue returns conditional on the use of 'experimental-mv' target ABI. This ABI is turned off by default and turned on by passing `-Xclang -target-abi -Xclang experimental-mv` to `clang`, or `-target-abi experimental-mv` to `clang -cc1` or `llc`. But the purpose of this PR is not tying the multivalue lowering to this specific 'experimental-mv'. 'experimental-mv' is just one multivalue ABI we currently have, and it is still experimental, meaning it is not very well optimized or tuned for performance. (e.g. it does not have the limitation of the max number of multivalue-lowered values, which can be detrimental to performance.) We may change the name of this ABI, or improve it, or add a new multivalue ABI in the future. Also I heard that WASI is planning to add their multivalue ABI soon. So the plan is, whenever any one of multivalue ABIs is enabled, we enable the lowering of multivalue returns in the backend. We currently have only 'experimental-mv' in the repo so we only check for that in this PR. 
Related past discussions: #82714 https://github.com/WebAssembly/tool-conventions/pull/223#issuecomment-2008298652	2024-04-23 17:48:59 +09:00
Arthur Eubanks	94c988bcfd	[NFC] Remove unused parameter from shouldAssumeDSOLocal()	2024-03-11 19:48:17 +00:00
Heejin Ahn	8506a63bf7	Revert "[WebAssembly] Disable multivalue emission temporarily (#82714 )" This reverts commit 6e6bf9f81756ba6655b4eea8dc45469a47f89b39. It turned out the multivalue feature had active outside users and it could cause some disruptions to them, so I'd like to investigate more about the workarounds before doing this.	2024-02-28 01:02:39 +00:00
Heejin Ahn	6e6bf9f817	[WebAssembly] Disable multivalue emission temporarily (#82714 ) We plan to enable multivalue in the features section soon (#80923) for other reasons, such as the feature having been standardized for many years and other features being developed (e.g. EH) depending on it. This is separate from enabling Clang experimental multivalue ABI (`-Xclang -target-abi -Xclang experimental-mv`), but it turned out we generate some multivalue code in the backend as well if it is enabled in the features section. Given that our backend multivalue generation still has not been much used nor tested, and enabling the feature in the features section can be a separate decision from how much multivalue (including none) we decide to generate for now, I'd like to temporarily disable the actual generation of multivalue in our backend. To do that, this adds an internal flag `-wasm-emit-multivalue` that defaults to false. All our existing multivalue tests can use this to test multivalue code. This flag can be removed later when we are confident the multivalue generation is well tested.	2024-02-22 19:17:15 -08:00
Alex Bradbury	197214e39b	[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710 ) This follows on from #76708, allowing `cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just `N->getAsZExtVal();` Introduced via `git grep -l "cast<ConstantSDNode>$.$.getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.*)$->getZExtValue/\1->getAsZExtVal/'` and then using `git clang-format` on the result.	2024-01-09 12:25:17 +00:00
Benjamin Kramer	858d6a15a0	[wasm] Don't crash on non-simple value types during shuffle combine These still exist during the DAGCombine phase.	2023-10-24 12:35:43 +02:00
Björn Pettersson	4acb96c99f	[SelectionDAG] Tidy up around endianness and isConstantSplat (#68212 ) The BuildVectorSDNode::isConstantSplat function could depend on endianness, and it takes a bool argument that can be used to indicate if big or little endian should be considered when internally casting from a vector to a scalar. However, that argument is default set to false (= little endian). And in many situations, even in target generic code such as DAGCombiner, the endianness isn't specified when using the function. The intent with this patch is to highlight that endianness doesn't matter, depending on the context in which the function is used. In DAGCombiner the code is slightly refactored. Back in the days when the code was written it wasn't possible to request a MinSplatBits size when calling isConstantSplat. Instead the code re-expanded the found SplatValue to match with the EltBitWidth. Now we can just provide EltBitWidth as MinSplatBits and remove the logic for doing the re-expand. While being at it, tidying up around isConstantSplat, this patch also adds an explicit check in BuildVectorSDNode::isConstantSplat to break out from the loop if trying to split an odd VecWidth into two halves. Haven't been able to prove that there could be miscompiles involved if not doing so. There are lit tests that trigger that scenario, although I think they happen to later discard the returned SplatValue for other reasons.	2023-10-16 14:53:53 +02:00
Paulo Matos	a29e8ef1c3	[WebAssembly] Add path to PIC mode for wasm tables (#67545 ) Currently tables cannot be shared between compilation units, therefore no special treatment is needed for tables. Fixes #65191	2023-10-03 08:00:21 +02:00
Yolanda Chen	291101aa8e	[WebAssembly] Optimize vector shift using a splat value from outside block The vector shift operation in WebAssembly uses an i32 shift amount type, while LLVM IR requires that a binary operator's operands have the same type. When the shift amount operand is splatted from a different block, the splat source will not be exported and the vector shift will be unrolled to scalar shifts. This patch enables the vector shift to identify the splat source value from the other block, and generate expected WebAssembly bytecode when lowering. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D158399	2023-08-25 08:13:27 -07:00
Reid Kleckner	984dc4b9cd	[WebAssembly] Create separation between MC and CodeGen layers Move WebAssemblyUtilities from Utils to the CodeGen library. It primarily deals in MIR layer types, so it really lives in the CodeGen library. Move a variety of other things around to try to create better separation. See issue #64166 for more info on layering. Move llvm/include/CodeGen/WasmAddressSpaces.h back to llvm/lib/Target/WebAssembly/Utils. Differential Revision: https://reviews.llvm.org/D156472	2023-08-18 14:08:37 -07:00
Thomas Lively	4f065fcb57	[WebAssembly] Fix incorrect assertion in SIMD reduction codegen The codegen routine introduced in 18077e9fd688 did not account for vectors with more than 16 lanes. Remove the incorrect assertion and bail out of the optimization when encountering this case. Add test cases that previously triggered the assertion. Unfortunately, these test cases now have terrible codegen, but that is at least better than crashing. Fixes #63500. Differential Revision: https://reviews.llvm.org/D154124	2023-06-30 11:30:18 -07:00
xortoast	bb648c9177	[WebAssembly] Add lowering for llvm.rint and llvm.roundeven WebAssembly doesn't expose inexact exceptions, so frint can be mapped to fnearbyint. Likewise, WebAssembly always rounds ties-to-even, so froundeven can be mapped to fnearbyint. Differential Revision: https://reviews.llvm.org/D153451	2023-06-23 14:07:11 -07:00
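
The reasoning in the last entry (bb648c9177) can be illustrated outside LLVM. With rounding fixed at ties-to-even and no floating-point exception flags to observe, `rint`, `nearbyint` and `roundeven` all compute the same value, so one lowering can serve all three. The following is a minimal Python sketch of that value semantics only, not code from the commit; the helper name `nearby_int_ties_even` is hypothetical.

```python
import math


def nearby_int_ties_even(x: float) -> float:
    """Round x to the nearest integer, ties to even, returning a float.

    Hypothetical helper for illustration; it models the value produced,
    not the WebAssembly instruction itself.
    """
    # NaN and infinities pass through unchanged.
    if math.isnan(x) or math.isinf(x):
        return x
    # Python's built-in round() on a float uses round-half-to-even.
    rounded = float(round(x))
    # Keep the sign of the input so that -0.4 rounds to -0.0, not 0.0.
    return math.copysign(rounded, x)


# Without an observable inexact exception, the three operations coincide.
rint = nearbyint = roundeven = nearby_int_ties_even
```

The only difference between `rint` and `nearbyint` in C is whether the inexact exception may be raised, which is why the commit message points to WebAssembly not exposing that exception.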


406 Commits