llvm-project

Author	SHA1	Message	Date
ZhaoQi	38e280d8a4	[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617 ) When creating new nodes with illegal types after type legalization, we should try to use promoted type to avoid creating nodes with illegal types. Fixes: https://github.com/llvm/llvm-project/issues/177155	2026-02-03 09:56:20 +00:00
niqiangpro-cell	603b625b21	[Analysis] Add Intrinsics::CLMUL case to cost calculations to getIntrinsicInstrCost / getTypeBasedIntrinsicInstrCost (#176552 ) This patch adds a case in getIntrinsicInstrCost and getTypeBasedIntrinsicInstrCost in llvm/include/llvm/CodeGen/BasicTTIImpl.h for Intrinsic::clmul. This patch uses TLI->isOperationLegalOrCustom to check if the instruction is cheap. If not cheap, it sums up the cost of the arithmetic operations (AND, SHIFT, XOR) multiplied by the bit width. Fixes #176354	2026-02-01 12:56:41 +00:00
Osama Abdelkader	aad7259ff6	[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030 ) This change improves memset code generation for non-zero values on AArch64 by using NEON's DUP instruction instead of the less efficient multiplication with 0x01010101 pattern. For small sizes, the value is extracted from a larger DUP. For non-power-of-two sizes, overlapping stores are used in some cases. TargetLowering::findOptimalMemOpLowering is modified to allow explicitly specifying the size of the constant in cases where the constant is larger than the store operations. Fixes #165949	2026-01-29 13:03:38 -08:00
Anikesh Parashar	fd45140ed6	[DAG] SimplifyDemandedBits - ICMP_SLT(X,0) - only sign mask of X is required (#164946 ) Resolves #164589	2026-01-28 17:30:23 +00:00
valadaptive	cdc6a84c14	TargetLowering: Allow FMINNUM/FMAXNUM to lower to FMINIMUM/FMAXIMUM even without `nsz` (#177828 ) This restriction was originally added in https://reviews.llvm.org/D143256, with the given justification: > Currently, in TargetLowering, if the target does not support fminnum, we lower to fminimum if neither operand could be a NaN. But this isn't quite correct because fminnum and fminimum treat +/-0 differently; so, we need to prove that one of the operands isn't a zero. As far as I can tell, this was never correct. Before https://github.com/llvm/llvm-project/pull/172012, `minnum` and `maxnum` were nondeterministic with regards to signed zero, so it's always been perfectly legal to lower them to operations that order signed zeroes.	2026-01-25 18:24:12 -05:00
Simon Pilgrim	15cd9f736b	[DAG] expandIntMINMAX - use getOppositeSignednessMinMaxOpcode helper to flip min/max signedness. NFC. (#177450 )	2026-01-22 20:38:35 +00:00
Luke Lau	cee36b23cc	[IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (#174693 ) Following on from #170796, this PR implements the second part of https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974 by allowing non-constant offsets in the vector splice intrinsics. Previously @llvm.vector.splice had a restriction enforced by the verifier that the offset had to be known to be within the range of the vector at compile time. Because we can't enforce this with non-constant offsets, it's been relaxed so that offsets that would slide the vector out of bounds return a poison value, similar to insertelement/extractelement. @llvm.vector.splice.left also previously only allowed offsets within the range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so that it's consistent with @llvm.vector.splice.right. In lieu of the verifier checks that were removed, InstSimplify has been taught to fold splices to poison when the offset is out of bounds. The cost model isn't implemented in this PR, and just returns invalid for any non-constant offsets for now. I think the correct way to cost these non-constant offsets isn't through getShuffleCost because it can't handle variable masks, but instead just through getIntrinsicInstrCost.	2026-01-21 10:58:40 +00:00
Simon Pilgrim	c7af813b52	[DAG] expandCLMUL - if a target supports CLMUL+CLMULH then CLMULR can be merged from the results (#176644 ) If a target supports CLMUL + CLMULH, then we can funnel shift the results together to form CLMULR. Helps x86 PCLMUL targets particularly.	2026-01-18 21:17:36 +00:00
Valeriy Savchenko	9391d46389	[SelectionDAG] Eliminate redundant setcc on comparison results (#171431 ) When comparisons produce all-zeros or all-ones in scalars or per lane in vectors, comparing results of such comparisons against 0 is an identity operation. This change eliminates redundant comparison instructions after another comparison operation.	2026-01-16 16:45:19 +00:00
Phoebe Wang	e83021ab16	[SelectionDAG][InlineAsm] Check VT isSimple before getSimpleVT (#176323 ) Fixes: #170024	2026-01-16 19:57:52 +08:00
Florian Hahn	68a04c1ada	[SelDag] Use BoolVT size when expanding find-last-active, if larger. (#175971 ) On some targets, BoolVT may have been widened earlier. In those cases, choosing StepVT to be smaller can cause crashes when widening the mismatched select. Without the fix, the new test @extract_last_active_v4i32_penryn crashes when trying to widen. It also improves codegen for other cases. PR: https://github.com/llvm/llvm-project/pull/175971	2026-01-14 20:46:16 +00:00
DaKnig	aa299269ea	[SDAG] (setcc (sub nsw a, b), zero, s??) -> (setcc a, b, s??) (#175459 ) This often happens when the dag combiner produces sign/zero extends and realizes that nsw/nuw can be added, for example in the case of `(abds (sext a), (sext b))` alive2: - slt, nsw: [link](https://alive2.llvm.org/ce/z/cgjMSx) - sgt, nsw: [link](https://alive2.llvm.org/ce/z/JP7h2f) - sle, nsw: [link](https://alive2.llvm.org/ce/z/n5Wuc_) - sge, nsw: [link](https://alive2.llvm.org/ce/z/Eps53-)	2026-01-13 17:00:00 +00:00
Liao Chunyu	b5401031d6	[DAG]Add ISD::SPLAT_VECTOR to TargetLowering::getNegatedExpression (#173967 ) Fold splat_vector(fneg(X)) -> splat_vector(-X) Call the getCheaperNegatedExpression function, and ISD::SPLAT_VECTOR return NegatibleCost::Cheaper. This optimization is applied only to the fneg instruction.	2026-01-09 18:07:10 +08:00
Florian Hahn	f444467a38	[ISel] Handle TypeWidenVector in expandVectorFindLastActive. (#174384 ) When widening extract.last.active, the element count changes. Create a step vector with only the original elements valid and zeros for padding. Also widen the mask accordingly. This fixes a hang when lowering on X86, where widening is required in some cases. Fixes https://github.com/llvm/llvm-project/issues/171831. PR: https://github.com/llvm/llvm-project/pull/174384	2026-01-06 12:40:34 +00:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether or not the offset is positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it in line with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalents based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow-up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra	9e5e267a03	[ISel] Introduce llvm.clmul intrinsic (#168731 ) In line with a std proposal to introduce the llvm.clmul family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsics work, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Co-authored-by: Craig Topper <craig.topper@sifive.com>	2026-01-05 20:24:06 +00:00
Simon Pilgrim	19a1c407f9	[X86] LowerMINMAX - use valuetracking to attempt to find a smaller type that can efficiently lower min/max ops (#174294 ) We currently use the generic expansions to custom lower integer min/max instructions, but if we have sufficient leading bits, SSE/AVX is always better off handling it directly with smaller types. vXi64 cmp/min/max is particularly weak, and as we narrow the types the better legality we have - this approach seems to work well for x86, but I'm not sure if it's valid enough to try generically in this manner. However, I added the signed/unsigned generic flip fold to expandIntMINMAX to further improve SSE2 codegen, similar to what we already attempt in DAGCombiner (which with a bit more work we might be able to remove now). All that's missing is better ComputeNumSignBits handling for vXi64 ashr expansion, which still misses a lot of cases when split across vXi32 types and shuffles. Fixes #174169	2026-01-04 17:27:50 +00:00
Islam Imad	7ceecfad40	[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413 ) Fixes #171608	2025-12-28 18:51:18 +00:00
Craig Topper	877df9e4b9	[SelectionDAG] Make SSHLSAT/USHLSAT obey getShiftAmountTy(). (#173216 ) Treat these like other shift operations by allowing the shift amount to be a different type than the result. The PromoteIntOp_Shift and LegalizeDAG code are not tested due to lack of target support. I'm looking at adding SSHLSAT for the RISC-V P extension. I don't need this support for that since RISC-V only has one legal type. I just thought it was odd that they weren't like other shifts.	2025-12-22 10:28:04 -08:00
guan jian	4e675a0c45	[SelectionDAG] Lowering usub.sat(a, 1) to a - (a != 0) (#170076 ) I recently observed that LLVM generates the following code: ``` addi a1, a0, -1 sltu a0, a0, a1 addi a0, a0, -1 and a0, a0, a1 ret ``` This could be optimized using the snez instruction instead.	2025-12-18 14:31:53 +00:00
Valeriy Savchenko	e7892d702f	[DAGCombiner] Fix assertion failure in vector division lowering (#172321 )	2025-12-17 22:09:54 +00:00
Craig Topper	816c9d64a7	[TargetLowering] Use getNegative. NFC (#172526 ) This also fixes the type for the SUB to be ShVT instead of VT. I guess we only test this when ShVT == VT.	2025-12-16 16:45:18 -08:00
natanelh-mobileye	cef490d94b	[SDAG] Check context node for free truncates in DemandedBits (#171266 ) Allow ShrinkDemandedOp to use Node-specific info	2025-12-10 14:38:56 +00:00
Matt Arsenault	27bf5fdcc6	DAG: Add overload of getExternalSymbol using RTLIB::LibcallImpl (#170587 )	2025-12-05 22:39:57 +00:00
Matt Arsenault	fde7819ad1	DAG: Add overload of makeLibCall which calls an RTLIB::LibcallImpl (#170584 )	2025-12-05 15:49:23 +01:00
Valeriy Savchenko	e7f3226e4f	[DAGCombiner] Handle type-promoted constants in SDIV exact lowering (#169950 ) Builds up on the solution proposed for #169491 and #169924 and applies it for SDIV exact as well. Almost a carbon copy of UDIV exact solution from #169949.	2025-12-04 12:56:01 +00:00
Valeriy Savchenko	8e53a88de3	[DAGCombiner] Handle type-promoted constants in SDIV lowering (#169924 ) Builds up on the solution proposed for #169491 and applies it for SDIV as well.	2025-12-04 11:33:19 +00:00
Valeriy Savchenko	73ef27c74c	[DAGCombiner] Handle type-promoted constants in UDIV exact lowering (#169949 ) Builds up on the solution proposed for https://github.com/llvm/llvm-project/pull/169491 and applies it for UDIV exact as well.	2025-12-04 10:57:32 +00:00
YunQiang Su	e5c3a538a7	expandFMINIMUMNUM_FMAXIMUMNUM: Improve compare between zeros (#140193 ) 1. On GPR32 platforms, expandIS_FPCLASS may fail because the ISD::BITCAST from double to int64 may fail. Let's FP_ROUND double to float first. Since we use it only if MinMax is zero, the flushing won't break anything. 2. Only one IS_FPCLASS is needed. MinMax will always be RHS if equal, so we can select between LHS and MinMax. It is even safe if FP_ROUND flushes a small LHS: if LHS is not zero, then MinMax won't be zero, so we will always use MinMax. --------- Co-authored-by: Nikita Popov <github@npopov.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-04 10:20:02 +08:00
Valeriy Savchenko	c5fa1f8c4b	[DAGCombiner] Handle type-promoted constants in UDIV lowering (#169491 )	2025-12-03 19:34:21 +00:00
Matt Arsenault	cdb501064f	DAG: Avoid more uses of getLibcallName (#170402 )	2025-12-03 13:01:04 -05:00
Matt Arsenault	8d6c5cddf2	DAG: Use LibcallImpl in various getLibFunc helpers (#170400 ) Avoid using getLibcallName in favor of querying the libcall impl, and getting the ABI details from that.	2025-12-03 13:00:45 -05:00
Luke Lau	d1500d12be	[SelectionDAG] Add SelectionDAG::getTypeSize. NFC (#169764 ) Similar to how getElementCount avoids the need to reason about fixed and scalable ElementCounts separately, this patch adds getTypeSize to do the same for TypeSize. It also goes through and replaces some of the manual uses of getVScale with getTypeSize/getElementCount where possible.	2025-12-01 10:33:50 +00:00
Benjamin Maxwell	135ddf1e8e	[AArch64][SVE] Add basic support for `@llvm.masked.compressstore` (#168350 ) This patch adds SVE support for the `masked.compressstore` intrinsic via the existing `VECTOR_COMPRESS` lowering and compressing the store mask via `VECREDUCE_ADD`. Currently, only `nxv4[i32\|f32]` and `nxv2[i64\|f64]` are directly supported, with other types promoted to these, where possible. This is done in preparation for LV support of this intrinsic, which is currently being worked on in #140723.	2025-11-28 10:17:36 +00:00
Matt Arsenault	a757c4e74e	CodeGen: Add subtarget to TargetLoweringBase constructor (#168620 ) Currently LibcallLoweringInfo is defined inside of TargetLowering, which is owned by the subtarget. Pass in the subtarget so we can construct LibcallLoweringInfo with the subtarget. This is a temporary step that should be revertable in the future, after LibcallLoweringInfo is moved out of TargetLowering.	2025-11-19 19:18:13 +00:00
Matt Arsenault	c5aace4236	DAG: Move expandMultipleResultFPLibCall to TargetLowering (NFC) (#166988 ) This kind of helper is higher level and not general enough to go directly in SelectionDAG. Most similar utilities are in TargetLowering.	2025-11-12 03:50:33 +00:00
Damian Heaton	70f4b596cf	Add `llvm.vector.partial.reduce.fadd` intrinsic (#159776 ) With this intrinsic, and supporting SelectionDAG nodes, we can better make use of instructions such as AArch64's `FDOT`.	2025-11-07 15:36:54 +00:00
Fabian Ritter	8ea447b4c4	[SDAG] Set InBounds when computing offsets into memory objects (#165425 ) When a load or store accesses N bytes starting from a pointer P, and we want to compute an offset pointer within these N bytes after P, we know that the arithmetic to add the offset must be inbounds. This is for example relevant when legalizing too-wide memory accesses, when lowering memcpy&Co., or when optimizing "vector-load -> extractelement" into an offset load. For SWDEV-516125.	2025-10-31 11:27:55 +01:00
AZero13	5d0f1591f8	[DAGCombine] Improve bswap lowering for machines that support bit rotates (#164848 ) Source: Hacker's delight.	2025-10-25 10:17:15 -07:00
Sam Parker	1820102167	Wasm fmuladd relaxed (#163177 ) Reland #161355, after fixing up the cross-projects-tests for the wasm simd intrinsics. Original commit message: Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 16:50:53 +01:00
Sam Parker	30d3441cf0	Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171 ) Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.	2025-10-13 11:53:40 +01:00
Sam Parker	a4eb7ea225	[WebAssembly] Lower fmuladd to madd and nmadd (#161355 ) Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 10:36:08 +01:00
paperchalice	b0a755b2bf	[TargetLowering] Remove NoSignedZerosFPMath uses (#160975 ) Remove NoSignedZerosFPMath in TargetLowering part, users should always use instruction level fast math flags.	2025-09-29 14:33:56 +08:00
Lewis Crawford	a27baf9c96	[SelectionDAG] Improve v2f16 maximumnum expansion (#160723 ) On targets where f32 maximumnum is legal, but maximumnum on vectors of smaller types is not legal (e.g. v2f16), try unrolling the vector first as part of the expansion. Only fall back to expanding the full maximumnum computation into compares + selects if maximumnum on the scalar element type cannot be supported.	2025-09-26 11:37:29 +01:00
AZero13	151a80bbce	[TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ucmp (#159889 ) Same deal we use for determining ucmp vs scmp. Using selects on platforms that like selects is better than using usubo. Rename function to be more general fitting this new description.	2025-09-25 10:33:05 +09:00
Craig Topper	ef1372af43	[KnownBits] Add setAllConflict to set all bits in Zero and One. NFC (#159815 ) This is a common pattern to initialize KnownBits that occurs before loops that call intersectWith.	2025-09-19 13:15:54 -07:00
Fabian Ritter	a2dcc88f39	[AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (#145330 ) There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125.	2025-09-19 10:19:38 +02:00
Björn Pettersson	1c4c7bd808	[SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (#143102 ) As reported in https://github.com/llvm/llvm-project/issues/141034 SelectionDAG::getNode had some unexpected behaviors when trying to create vectors with UNDEF elements. Since we treat both UNDEF and POISON as undefined (when using isUndef()) we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on isUndef(), as that could make the resulting vector more poisonous. Same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR. Here are some examples: This fold was done even if vec[idx] was POISON: INSERT_VECTOR_ELT vec, UNDEF, idx -> vec This fold was done even if any of vec[idx..idx+size] was POISON: INSERT_SUBVECTOR vec, UNDEF, idx -> vec This fold was done even if the elements not extracted from vec could be POISON: sub = EXTRACT_SUBVECTOR vec, idx INSERT_SUBVECTOR UNDEF, sub, idx -> vec With this patch we avoid such folds unless we can prove that the result isn't more poisonous when eliminating the insert. Fixes https://github.com/llvm/llvm-project/issues/141034	2025-09-17 21:04:00 +00:00
Björn Pettersson	593f24cac6	[SelectionDAG] Clean up SCALAR_TO_VECTOR handling in SimplifyDemandedVectorElts (#157027 ) This patch reverts changes from commit 585e65d3307f5f0 (https://reviews.llvm.org/D104250), as it doesn't seem to be needed nowadays. The removed code was doing a recursive call to SimplifyDemandedVectorElts trying to simplify the vector %vec when finding things like (SCALAR_TO_VECTOR (EXTRACT_VECTOR_ELT %vec, 0)) I figure that (EXTRACT_VECTOR_ELT %vec, 0) would be simplified based on only demanding element zero regardless of being used in a SCALAR_TO_VECTOR operation or not. It would have been different if the code tried to simplify the whole expression as %vec. That could also have motivated making element zero a special case. But it only simplified %vec without folding away the SCALAR_TO_VECTOR.	2025-09-05 15:08:49 +02:00
Craig Topper	c65d6cb0a1	[SelectionDAG] Return std::optional<unsigned> from getValidShiftAmount and friends. NFC (#156224 ) Instead of std::optional<uint64_t>. Shift amounts must be less than or equal to our maximum supported bit widths which fit in unsigned. Most of the callers already assumed it fit in unsigned.	2025-08-31 11:29:07 -07:00
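
Commit 4e675a0c45 above rewrites usub.sat(a, 1) as a - (a != 0). The following is a minimal sketch of why the two forms agree, written as plain C++ rather than SelectionDAG code. The function names are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of unsigned saturating subtraction: clamp at zero.
// (Hypothetical helper name, for illustration only.)
inline uint32_t usub_sat(uint32_t a, uint32_t b) {
  return a > b ? a - b : 0u;
}

// The rewritten form from the commit title: a - (a != 0).
// The comparison yields 1 for any non-zero a and 0 for a == 0,
// so zero stays at zero and everything else is decremented.
inline uint32_t usub_sat_one(uint32_t a) {
  return a - static_cast<uint32_t>(a != 0);
}

// Compare both forms over every 16-bit input plus the 32-bit maximum.
inline bool usub_sat_one_matches_reference() {
  for (uint32_t a = 0; a <= 0xFFFFu; ++a)
    if (usub_sat_one(a) != usub_sat(a, 1u))
      return false;
  return usub_sat_one(0xFFFFFFFFu) == usub_sat(0xFFFFFFFFu, 1u);
}
```

On a target with a set-if-not-zero instruction (the commit message mentions snez), the second form needs only that instruction and one subtract.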

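
Commit aa299269ea above folds (setcc (sub nsw a, b), zero, s??) into (setcc a, b, s??). As a rough sanity check, separate from the alive2 proofs linked in that message, the sketch below brute-forces the claim on 8-bit values. The function name is hypothetical, and pairs whose difference overflows are skipped because they violate the nsw precondition.

```cpp
#include <cassert>
#include <cstdint>

// For every pair of 8-bit signed values whose difference fits in 8 bits
// (the "nsw" precondition), check that comparing (a - b) against zero
// gives the same answer as comparing a against b directly.
inline bool sub_nsw_setcc_fold_holds() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      int wide = a - b;                  // exact difference, computed in int
      if (wide < -128 || wide > 127)
        continue;                        // would overflow i8: nsw violated
      int8_t d = static_cast<int8_t>(wide);
      if ((d < 0) != (a < b)) return false;    // slt
      if ((d > 0) != (a > b)) return false;    // sgt
      if ((d <= 0) != (a <= b)) return false;  // sle
      if ((d >= 0) != (a >= b)) return false;  // sge
    }
  }
  return true;
}
```

Dropping the overflow check makes the loop fail, for example at a = 127, b = -1, which is why the fold requires the nsw flag.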

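
Commit 5d0f1591f8 above improves bswap lowering on machines with bit rotates and cites Hacker's Delight. The sketch below shows one well-known rotate-based formulation of a 32-bit byte swap. Whether this is the exact sequence the commit emits is an assumption, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rotate helpers; n must be in the range 1..31 to avoid an undefined shift.
inline uint32_t rotl32(uint32_t x, unsigned n) {
  return (x << n) | (x >> (32 - n));
}
inline uint32_t rotr32(uint32_t x, unsigned n) {
  return (x >> n) | (x << (32 - n));
}

// Byte swap built from two rotates and two masks:
// rotating right by 8 puts bytes 0 and 2 into their final positions,
// rotating left by 8 does the same for bytes 1 and 3.
inline uint32_t bswap32_rot(uint32_t x) {
  return (rotr32(x, 8) & 0xFF00FF00u) | (rotl32(x, 8) & 0x00FF00FFu);
}

// Straightforward shift-and-mask reference for comparison.
inline uint32_t bswap32_ref(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

The rotate form uses four operations plus the final OR, while the reference form uses four shifts, two masks, and three ORs.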
1657 Commits
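
Commits 603b625b21 and c7af813b52 above describe carry-less multiplication as a per-bit AND/SHIFT/XOR expansion, and forming CLMULR by funnel-shifting the CLMUL and CLMULH results together. The sketch below models both on 32-bit values. It assumes CLMULR means bits 31..62 of the 64-bit carry-less product (the RISC-V Zbc convention), and the function names are hypothetical stand-ins for the ISD nodes, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Full 64-bit carry-less product of two 32-bit values:
// one test (AND), one SHIFT, and one XOR per bit of b.
inline uint64_t clmul_wide(uint32_t a, uint32_t b) {
  uint64_t acc = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      acc ^= static_cast<uint64_t>(a) << i;
  return acc;
}

// Low and high halves of the product.
inline uint32_t clmul(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(clmul_wide(a, b));
}
inline uint32_t clmulh(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(clmul_wide(a, b) >> 32);
}

// Assumed definition of CLMULR: bits 31..62 of the product.
inline uint32_t clmulr_ref(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(clmul_wide(a, b) >> 31);
}

// Funnel shift left of the pair (hi:lo) by n, with n in 1..31.
inline uint32_t fshl32(uint32_t hi, uint32_t lo, unsigned n) {
  return (hi << n) | (lo >> (32 - n));
}

// CLMULR merged from the two halves with a funnel shift by one bit.
inline uint32_t clmulr_merged(uint32_t a, uint32_t b) {
  return fshl32(clmulh(a, b), clmul(a, b), 1);
}
```

Because the top bit of a 32x32 carry-less product is always zero, shifting the high half left by one loses nothing, so the merged form matches the reference for every input.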