llvm-project

Author	SHA1	Message	Date
Matt Arsenault	a521774217	DAG: Use poison for unused shuffle operands in legalizer (#177578 )	2026-01-23 18:20:56 +01:00
Matt Arsenault	01e6245af4	DAG: Avoid querying libcall info from TargetLowering (#176268 ) Libcall lowering decisions should come from the LibcallLoweringInfo analysis. Query this through the DAG, so eventually the source can be the analysis. For the moment this is just a wrapper around the TargetLowering information.	2026-01-16 09:02:49 +00:00
Ramkumar Ramachandra	9e5e267a03	[ISel] Introduce llvm.clmul intrinsic (#168731 ) In line with a std proposal to introduce the llvm.clmul family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsics work, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Co-authored-by: Craig Topper <craig.topper@sifive.com>	2026-01-05 20:24:06 +00:00
Islam Imad	7ceecfad40	[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413 ) Fixes #171608	2025-12-28 18:51:18 +00:00
Benjamin Maxwell	baa49835da	[AArch64] Support lowering v4i16/f16 VECTOR_COMPRESS nodes to SVE (#173256 ) This is a follow-up to #171162, which broke the (untested) lowering of v4i16/f16 to SVE. See: https://github.com/llvm/llvm-project/pull/171162#discussion_r2601901376	2025-12-24 14:26:13 +00:00
Sam Tebbs	19e1011df5	[SelectionDAG] Fix unsafe cases for loop.dependence.{war/raw}.mask (#168565 ) Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are currently hard to split correctly, and there are a number of incorrect cases. The difficulty comes from how the intrinsics are defined. For example, take `LOOP_DEPENDENCE_WAR_MASK`. It is defined as the OR of: * `(ptrB - ptrA) <= 0` * `elementSize * lane < (ptrB - ptrA)` Now, if we want to split a loop dependence mask for the high half of the mask we want to compute: * `(ptrB - ptrA) <= 0` * `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)` However, with the current opcode definitions, we can only modify ptrA or ptrB, which may change the result of the first case, which should be invariant to the lane. This patch resolves these cases by adding a "lane offset" to the ISD opcodes. The lane offset is always a constant. For scalable masks, it is implicitly multiplied by vscale. This makes splitting trivial as we increment the lane offset by `LoVT.getElementCount()` now. Note: In the AArch64 backend, we only support zero lane offsets (as other cases are tricky to lower to whilewr/rw). --------- Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>	2025-12-12 08:44:33 +00:00
Matt Arsenault	a3aaa1a391	DAG: Use RuntimeLibcalls to legalize vector frem calls (#170719 ) This continues the replacement of TargetLibraryInfo uses in codegen with RuntimeLibcallsInfo started in 821d2825a4f782da3da3c03b8a002802bff4b95c. The series there handled all of the multiple result calls. This extends for the other handled case, which happened to be frem. For some reason the Libcall for these are prefixed with "REM_", for the instruction "frem", which maps to the libcall "fmod".	2025-12-11 13:33:27 +00:00
AZero13	d831f8df52	[SelectionDAG] Fix AArch64 machine verifier bug when expanding LOOP_DEPENDENCE_MASK (#168221 ) TargetConstant nodes don't match TableGen ImmLeaf patterns during instruction selection. When this zero constant flows into the AArch64 CCMP formation code, the machine verifier hits an assertion in expensive checks. Fixes: #168227	2025-11-15 21:12:11 +00:00
Matt Arsenault	c5aace4236	DAG: Move expandMultipleResultFPLibCall to TargetLowering (NFC) (#166988 ) This kind of helper is higher level and not general enough to go directly in SelectionDAG. Most similar utilities are in TargetLowering.	2025-11-12 03:50:33 +00:00
Matt Arsenault	95f2728b5c	DAG: Stop using TargetLibraryInfo for multi-result FP intrinsic codegen (#166987 ) Only use RuntimeLibcallsInfo. Remove the helper functions used to transition.	2025-11-12 02:47:28 +00:00
Matt Arsenault	4b9771e41a	DAG: Use modf vector libcalls through RuntimeLibcalls (#166986 ) Copy new process from sincos/sincospi	2025-11-11 18:05:35 -08:00
Matt Arsenault	de68181d7f	DAG: Use sincos vector libcalls through RuntimeLibcalls (#166984 ) Copy new process from sincospi.	2025-11-11 10:51:23 -08:00
Matt Arsenault	821d2825a4	RuntimeLibcalls: Remove incorrect sincospi from most targets (#166982 ) sincospi/sincospif/sincospil does not appear to exist on common targets. Darwin targets have __sincospi and __sincospif, so define and use those implementations. I have no idea what version added those calls, so I'm just guessing it's the same conditions as __sincos_stret. Most of this patch is working to preserve codegen when a vector library is explicitly enabled. This only covers sleef and armpl, as those are the only cases tested. The multiple result libcalls have an aberrant process where the legalizer looks for the scalar type's libcall in RuntimeLibcalls, and then cross references TargetLibraryInfo to find a matching vector call. This was unworkable in the sincospi case, since the common case is there is no scalar call available. To preserve codegen if the call is available, first try to match a libcall with the vector type before falling back on the old scalar search. Eventually all of this logic should be contained in RuntimeLibcalls, without the link to TargetLibraryInfo. In principle we should perform the same legalization logic as for an ordinary operation, trying to find a matching subvector type with a libcall.	2025-11-10 11:05:08 -08:00
Damian Heaton	70f4b596cf	Add `llvm.vector.partial.reduce.fadd` intrinsic (#159776 ) With this intrinsic, and supporting SelectionDAG nodes, we can better make use of instructions such as AArch64's `FDOT`.	2025-11-07 15:36:54 +00:00
Sam Tebbs	569d738d4e	[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007 ) It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects if the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.	2025-09-02 15:35:15 +01:00
Nikita Popov	01bc742185	[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817 ) This ensures that the required fields are set, and also makes the construction more convenient.	2025-08-15 18:06:07 +02:00
Craig Topper	8d549cf036	[SelectionDAG] Pass SDNodeFlags through getNode instead of setFlags. (#149852 ) getNode updates flags correctly for CSE. Calling setFlags after getNode may set the flags where they don't apply. I've added a Flags argument to getSelectCC and the signature of getNode that takes an ArrayRef of EVTs.	2025-07-22 08:06:30 -07:00
Paul Walker	68732ce8e0	[LLVM][CodeGen][SVE] Add isel for bfloat unordered reductions. (#143540 ) The omissions are VECREDUCE_SEQ_* and MUL. The former goes down a different code path and the latter is unsupported across all element types.	2025-06-20 11:46:25 +01:00
Philip Reames	939666380f	[SDAG] Add partial_reduce_sumla node (#141267 ) We have recently added the partial_reduce_smla and partial_reduce_umla nodes to represent Acc += ext(a) * ext(b) where the two extends have to have the same source type, and have the same extend kind. For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which correspond to the existing nodes, but we also have vqdotsu which represents the case where the two extends are sign and zero respectively (i.e. not the same type of extend). This patch adds a partial_reduce_sumla node which has sign extension for A, and zero extension for B. The addition is somewhat mechanical.	2025-06-09 07:17:45 -07:00
Philip Reames	1651aa2943	[SDAG] Split the partial reduce legalize table by opcode [nfc] (#141970 ) On its own, this change should be non-functional. This is a preparatory change for https://github.com/llvm/llvm-project/pull/141267 which adds a new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that review, AArch64 needs a different set of legal and custom types for the PARTIAL_REDUCE_SUMLA variant than the currently existing PARTIAL_REDUCE_UMLA/SMLA.	2025-05-29 14:05:31 -07:00
Philip Reames	cf2f558501	[DAG/RISCV] Continue migrating to getInsertSubvector and getExtractSubvector Follow up to 6e654caab, use the new routines in more places. Note that I've excluded from this patch any case which uses a getConstant index instead of a getVectorIdxConstant index just to minimize room for error. I'll get those in a separate follow up.	2025-05-08 09:40:45 -07:00
Nicholas Guy	a1f369e630	[AArch64][SVE] Add dot product lowering for PARTIAL_REDUCE_MLA node (#130933 ) Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. Only happens when the combine has been performed on the ISD node. Also adds in check to only do the DAG combine when the node can then eventually be lowered, so changes neon tests too. --------- Co-authored-by: James Chesterman <james.chesterman@arm.com>	2025-04-23 13:19:41 +01:00
Jim Lin	94f6b6d538	[SelectionDAG][RISCV] Promote VECREDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM} (#128800 ) This patch also adds the tests for VP_REDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM}, which have been supported for a while.	2025-02-28 23:13:30 +08:00
James Chesterman	d4a0848dc6	[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207 ) Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line argument (aarch64-enable-partial-reduce-nodes) that indicates whether the intrinsic experimental_vector_partial_reduce_add will be transformed into the new ISD node. Lowering with the new ISD nodes will, for now, always be done as an expand.	2025-02-18 09:08:47 +00:00
Benjamin Maxwell	19556eccf6	[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705 ) This makes the name more consistent with the other helpers.	2025-02-11 11:51:35 +00:00
Benjamin Maxwell	701223ac20	[IR] Add llvm.sincospi intrinsic (#125873 ) This adds the `llvm.sincospi` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.sincospi` intrinsic takes a floating-point value and returns both the sine and cosine of the value multiplied by pi. It computes the result more accurately than the naive approach of doing the multiplication ahead of time, especially for large input values. ``` declare { float, float } @llvm.sincospi.f32(float %Val) declare { double, double } @llvm.sincospi.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val) ``` Currently, the default lowering of this intrinsic relies on the `sincospi[f|l]` functions being available in the target's runtime (e.g. libc).	2025-02-11 09:01:30 +00:00
Benjamin Maxwell	4bf97aa818	[IR] Add `llvm.modf` intrinsic (#121948 ) This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.modf` intrinsic takes a floating-point value and returns both the integral and fractional parts (as a struct). ``` declare { float, float } @llvm.modf.f32(float %Val) declare { double, double } @llvm.modf.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val) ``` This corresponds to the libm `modf` function but returns multiple values in a struct (rather than take output pointers), which makes it easier to vectorize.	2025-02-07 09:25:13 +00:00
Graham Hunter	d9f165ddea	[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810 ) Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder. The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.	2025-01-20 12:57:05 +00:00
Craig Topper	8ce81f17a1	[LegalizeVectorOps][RISCV] Use VP_FP_EXTEND/ROUND when promoting VP_FP* operations. (#122784 ) This preserves the original VL leading to more reuse of VL for vsetvli. The VLOptimizer can also clean up a lot of this, but I'm not sure if it gets all of it. There are some regressions in here from propagating the mask too, but I'm not sure if that's a concern.	2025-01-13 15:18:41 -08:00
abhishek-kaushik22	366e62a0cb	[X86] Combine `uitofp <v x i32> to <v x half>` (#121809 ) Closes #121793	2025-01-08 16:49:29 +08:00
Simon Pilgrim	923675193b	[DAG] VectorLegalizer::ExpandUINT_TO_FLOAT- pull out repeated getValueType calls. NFC.	2025-01-06 18:49:51 +00:00
Phoebe Wang	1547382033	[X86] Support lowering of FMINIMUMNUM/FMAXIMUMNUM (#121464 )	2025-01-06 21:28:58 +08:00
Craig Topper	e32afded92	[LegalizeVectorOps] Use getBoolConstant instead of getAllOnesConstant in VectorLegalizer::UnrollVSETCC. (#121526 ) This code should follow the target preference for boolean contents of a vector type. We shouldn't assume that true is negative one.	2025-01-03 10:46:37 -08:00
Benjamin Maxwell	ea6b8fa4b9	[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() (#114792 ) This merges the logic for expanding both FFREXP and FSINCOS into one method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).	2024-11-06 11:06:06 +00:00
Benjamin Maxwell	89a8c71db6	[SDAG] Support expanding `FSINCOS` to vector library calls (#114039 ) This shares most of its code with the scalar sincos expansion. It allows expanding vector FSINCOS nodes to a library call from the specified `-vector-library`. The upside of this is it will mean the vectorizer only needs to handle the sincos intrinsic, which has no memory effects, and this can handle lowering the intrinsic to a call that takes output pointers.	2024-10-31 12:41:43 +00:00
Yingwei Zheng	cf9d1c1486	[SDAG] Simplify `SDNodeFlags` with bitwise logic (#114061 ) This patch allows using enumeration values directly and simplifies the implementation with bitwise logic. It addresses the comment in https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.	2024-10-31 08:10:07 +08:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
Tex Riddell	875afa939d	[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 Based on example PR #96222 and fix PR #101268, with some differences due to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp). - Add llvm.experimental.constrained.atan2 - Intrinsics.td, ConstrainedOps.def, LangRef.rst - Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp - Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp, and LegalizeVectorTypes.cpp - Update isKnownNeverNaN in SelectionDAG.cpp - Update SelectionDAGDumper.cpp - Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp - TargetLoweringBase.cpp - Expand for vectors, promote f16 - X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC Part 4 for Implement the atan2 HLSL Function #70096.	2024-10-16 11:43:17 -07:00
Paul Walker	02dd6b1014	[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803 ) Specifically: fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm, fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt & ftrunc	2024-10-07 13:01:59 +01:00
Craig Topper	92a8b81bdf	[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for fixed vectors in some cases. (#109232 ) Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and ARM.	2024-09-30 11:44:49 -07:00
Craig Topper	d21a43579e	[LegalizeVectorOps][RISCV] Don't scalarize FNEG in ExpandFNEG if FSUB is marked Promote. We have a special check that tries to determine if vector FP operations are supported for the type to determine whether to scalarize or not. If FP arithmetic would be promoted, don't unroll. This improves Zvfhmin codegen on RISC-V.	2024-09-18 18:19:21 -07:00
Craig Topper	da46244e49	Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific." This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a. Regression was reported in Halide for arm32.	2024-09-17 09:04:43 -07:00
Craig Topper	f36580fcb5	[LegalizeVectorOps] Remove calls to DAG.UnrollVectorsOps from some expansion handlers. NFC (#108930 ) Instead, return SDValue() to tell the caller to do the unrolling. This is consistent with how some other handlers work, especially the handlers that live in TLI. ExpandBITREVERSE was rewritten to not take the Results vector as an argument.	2024-09-17 08:35:22 -07:00
Craig Topper	884ff9e3f9	[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific. Only scalarize single element vectors when vector FSUB is not supported and scalar FNEG is supported.	2024-09-16 21:48:42 -07:00
Craig Topper	3e798476de	[LegalizeDAG][RISCV] Don't promote f16 vector ISD::FNEG/FABS/FCOPYSIGN to f32 when we don't have Zvfh. (#106652 ) The fp_extend will canonicalize NaNs which is not the semantics of FNEG/FABS/FCOPYSIGN. For fixed vectors I'm scalarizing due to test changes on other targets where the scalarization is expected. I will try to address in a follow up. For scalable vectors, we bitcast to integer and use integer logic ops.	2024-09-03 22:44:49 -07:00
Craig Topper	366ac8c090	[LegalizeVectorOps] Defer UnrollVectorOp in ExpandFNEG to caller. (#106783 ) Make ExpandFNEG return SDValue() when it doesn't expand. The caller already knows how to Unroll when Results is empty.	2024-09-02 16:16:12 -07:00
Yingwei Zheng	affc0c64b6	[SDAG] Expand vector [u|s]cmp in VectorLegalizer (#106883 ) Address comment https://github.com/llvm/llvm-project/pull/106747#issuecomment-2322922855.	2024-09-01 22:35:52 +08:00
Craig Topper	c25293c6dd	[LegalizeVectorOps][RISCV] Don't promote VP_FABS/FNEG/FCOPYSIGN. (#106659 ) Promoting canonicalizes NaNs which changes the semantics. Bitcast to integer and use logic ops instead.	2024-08-30 09:44:51 -07:00
Craig Topper	aa91d90cb0	[LegalizeVectorOps][PowerPC] Use xor to expand fneg. (#106595 ) This preserves the semantics of fneg and matches what we do in LegalizeDAG. I kept the legal FSUB check to force unrolling for some targets that don't have FSUB but have XOR. On AArch64, using xor broke some tests that expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg (extractvectorelt X)))))) pattern.	2024-08-29 15:00:23 -07:00
Sumanth Gundapaneni	e78156a0e2	Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054 ) The verifier is updated in a different patch to allow vector types for the llvm.lround and llvm.llround intrinsics.	2024-08-21 12:13:56 -05:00
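
The lane-offset change described in 19e1011df5 (#168565) can be illustrated with a small model. This is only a sketch in Python, not LLVM code: the function names `war_mask`, `split_war_mask`, and `naive_high_half` are hypothetical, it models fixed-length masks only (the implicit vscale multiplication for scalable masks is not modelled), and the "naive" split by bumping ptrA is an assumed illustration of why modifying a pointer can change the lane-invariant first clause.

```python
def war_mask(ptr_a, ptr_b, element_size, num_lanes, lane_offset=0):
    """Model of the WAR mask as stated in the commit message.

    A lane is active if either
      (ptrB - ptrA) <= 0, or
      elementSize * (lane + lane_offset) < (ptrB - ptrA).
    lane_offset is the constant offset added by the patch.
    """
    diff = ptr_b - ptr_a
    return [
        diff <= 0 or element_size * (lane + lane_offset) < diff
        for lane in range(num_lanes)
    ]


def split_war_mask(ptr_a, ptr_b, element_size, num_lanes):
    """Split the mask in two; the high half only bumps the lane offset."""
    half = num_lanes // 2
    lo = war_mask(ptr_a, ptr_b, element_size, half, lane_offset=0)
    hi = war_mask(ptr_a, ptr_b, element_size, num_lanes - half,
                  lane_offset=half)
    return lo, hi


def naive_high_half(ptr_a, ptr_b, element_size, num_lanes):
    """Hypothetical split without a lane offset: advance ptrA instead.

    This also changes (ptrB - ptrA) in the first clause, which should not
    depend on the lane, so the result can be wrong.
    """
    half = num_lanes // 2
    return war_mask(ptr_a + element_size * half, ptr_b, element_size,
                    num_lanes - half)
```

With `ptr_a = 0`, `ptr_b = 8`, `element_size = 4`, and 8 lanes, only lanes 0 and 1 are active in the full mask. `split_war_mask` reproduces that as two halves, whereas `naive_high_half` makes the pointer difference negative and marks every high lane active.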

354 Commits