llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	ea2ee5dc2f	[DAG] Add legalization handling for AVGCEIL/AVGFLOOR nodes (#92096 ) Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization. I've removed the X86 custom AVGCEILU pattern detection and replaced with combines to try and convert other AVG nodes to AVGCEILU.	2024-06-12 14:11:07 +01:00
c8ef	0e346eeac6	[DAG] fold avgu(zext(x), zext(y)) -> zext(avgu(x, y)) (#95134 ) close: #86301	2024-06-12 12:58:49 +01:00
David Green	a284bdb311	[DAG] Fold fdiv X, c2 -> fmul X, 1/c2 without AllowReciprocal if exact (#93882 ) This moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath \|\| Flags.hasAllowReciprocal()' block, so that it triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms.	2024-06-09 12:28:20 +01:00
Yingwei Zheng	d9507a3e10	[DAGCombine] Fix miscompilation caused by PR94008 (#94850 ) The PR description in #94008 does not match the code. > + When VT is smaller than ShiftVT, it is safe to use trunc. > + When VT is larger than ShiftVT, it is safe to use zext iff `is_zero_poison` is true (i.e., `opcode == ISD::CTTZ_ZERO_UNDEF`). See also the counterexample `src_shl_cttz2 -> tgt_shl_cttz2` in the alive2 proofs. Closes #94824.	2024-06-08 21:40:57 +08:00
Quentin Colombet	25506f4864	[SDISel][Combine] Constant fold FP16_TO_FP (#94790 ) In some cases, constants can survive early constant folding because they are hidden behind several layers of type changes. E.g., consider the following sequence (extracted from the ARM test that this commit changes): ``` t2: v1f16 = BUILD_VECTOR ConstantFP:f16<APFloat(0)> t4: v1f16 = insert_vector_elt t2, ConstantFP:f16<APFloat(0)>, Constant:i32<0> t5: f16 = bitcast t4 t6: f32 = fp_extend t5 ``` Because the constant (APFloat(0)) is hidden behind a <1 x ty> type, all the constant folding that normally happens for scalar nodes when using `SelectionDAG::getNode` is blocked. As a result, the constant survives as an actual conversion instruction down to the select phase: ``` t11: f32 = fp16_to_fp Constant:i32<0> ``` With the change in this patch, we try to do constant folding one more time during DAG combine, which in the motivating example results in the much better sequence: ``` t7: ch = CopyToReg t0, Register:f32 %0, ConstantFP:f32<0.000000e+00> ``` Note: I'm sure we have this problem in a lot of other places. Generally speaking I believe SDISel is not that good with <1 x ty> compared to pure scalar. However, I only changed what I could easily test.	2024-06-08 11:31:13 +02:00
aengelke	74d62c2f73	[CodeGen][SDAG] Remove CombinedNodes SmallPtrSet (#94609 ) This "small" set grows quite large and it's more performant to store whether a node has been combined before in the node itself. As this information is only relevant for nodes that are currently not in the worklist, add a second state to the CombinerWorklistIndex (-2) to indicate that a node is currently not in a worklist, but was combined before. This brings a substantial performance improvement.	2024-06-07 13:17:27 +02:00
Simon Pilgrim	af3ffff34f	[DAG] Always allow folding XOR patterns to ABS pre-legalization (#94601 ) Removes residual ARM handling for vXi64 ABS nodes to prevent infinite loops.	2024-06-07 11:02:50 +01:00
Simon Pilgrim	03a2fe9a75	[DAG] visitSUB - update the ABS matching code to use SDPatternMatch and hasOperation. Avoids the need to explicitly test both commuted variants and doesn't match custom lowering after legalization. Cleanup for #94504	2024-06-06 10:06:57 +01:00
aengelke	6150e84cfc	[CodeGen][SDAG] Remove Combiner WorklistMap (#92900 ) DenseMap for pointer lookup is expensive, and this is only used for deduplication and index lookup. Instead, store the worklist index in the node itself. This brings a substantial performance improvement.	2024-06-05 17:58:08 +02:00
Yingwei Zheng	47fd32f81c	[DAGCombine] Fix type mismatch in `(shl X, cttz(Y)) -> (mul (Y & -Y), X)` (#94008 ) Proof: https://alive2.llvm.org/ce/z/J7GBMU Same as https://github.com/llvm/llvm-project/pull/92753, the types of LHS and RHS in shift nodes may differ. + When VT is smaller than ShiftVT, it is safe to use trunc. + When VT is larger than ShiftVT, it is safe to use zext iff `is_zero_poison` is true (i.e., `opcode == ISD::CTTZ_ZERO_UNDEF`). See also the counterexample `src_shl_cttz2 -> tgt_shl_cttz2` in the alive2 proofs. Fixes issue https://github.com/llvm/llvm-project/pull/85066#issuecomment-2142553617.	2024-06-01 19:04:55 +08:00
Jay Foad	b1be480b03	[DAGCombiner] Move CanReassociate down to first use. NFC.	2024-05-31 09:44:47 +01:00
Jianjian Guan	db6de1a20f	[DAGCombiner][VP] Add DAGCombine for VP_MUL (#80105 ) Use visitMUL to combine VP_MUL, share most logic of MUL with VP_MUL. Migrate from https://reviews.llvm.org/D121187	2024-05-31 10:17:11 +08:00
Yingwei Zheng	9e8ecce88e	[DAGCombine] Transform `shl X, cttz(Y)` to `mul (Y & -Y), X` if cttz is unsupported (#85066 ) This patch folds `shl X, cttz(Y)` to `mul (Y & -Y), X` if cttz is unsupported by the target. Alive2: https://alive2.llvm.org/ce/z/AtLN5Y Fixes https://github.com/llvm/llvm-project/issues/84763.	2024-05-29 18:26:54 +08:00
Matt Arsenault	16a5fd3fdb	DAG: Use flags in isLegalToCombineMinNumMaxNum (#93555 )	2024-05-28 18:57:38 +02:00
Shengchen Kan	eeb2f72a49	[SelectionDAG][X86] Fix the assertion failure in Release build after #91747 (#93434 ) In #91747, we changed the SDNode from `X86ISD::SUB` (FROM) to `X86ISD::CCMP` (TO) in the DAGCombine. The value type of `X86ISD::SUB` can be `i8, i32` while the value type of `X86ISD::CCMP` is i32. This breaks the assumption that the value type should match after the combine and triggers the error ``` SelectionDAG.cpp:10942: void llvm::SelectionDAG::transferDbgValues(llvm::SDValue, llvm::SDValue, unsigned int, unsigned int, bool): Assertion `FromNode && ToNode && "Can't modify dbg values"' failed. ``` when running tests llvm/test/CodeGen/X86/apx/ccmp.ll llvm/test/CodeGen/X86/apx/ctest.ll in Release build when LLVM_ENABLE_ASSERTIONS is on. In this patch, we fix it by creating a merged value.	2024-05-27 11:33:23 +08:00
Shengchen Kan	331eb8a004	[X86][CodeGen] Support lowering for CCMP/CTEST (#91747 ) DAG combine for `CCMP` and `CTESTrr`: ``` and/or(setcc(cc0, flag0), setcc(cc1, sub (X, Y))) -> setcc(cc1, ccmp(X, Y, ~cflags/cflags, cc0/~cc0, flag0)) and/or(setcc(cc0, flag0), setcc(cc1, cmp (X, 0))) -> setcc(cc1, ctest(X, X, ~cflags/cflags, cc0/~cc0, flag0)) ``` where `cflags` is determined by `cc1`. Generic DAG combine: ``` cmp(setcc(cc, X), 0) brcond ne -> X brcond cc sub(setcc(cc, X), 1) brcond ne -> X brcond ~cc ``` Post DAG transform: `ANDrr/rm + CTESTrr -> CTESTrr/CTESTmr` Pattern match for `CTESTri`: ``` X= and A, B ctest(X, X, cflags, cc0/, flag0) -> ctest(A, B, cflags, cc0/, flag0) ``` `CTESTmi` is already handled by the memory folding mechanism in MIR.	2024-05-26 18:32:23 +08:00
Simon Pilgrim	729fdb6bb6	[DAG] visitFunnelShift - pull out repeated SDLoc.	2024-05-24 14:50:42 +01:00
Simon Pilgrim	7273ad1238	[DAG] visitABD - rewrite "(abs x, 0)" folds to use SDPatternMatch. No need for this to be vector specific, and it's more likely that scalar cases will appear after #92576	2024-05-19 11:49:51 +01:00
Simon Pilgrim	9f5c8de386	[DAG] visitAVG - rewrite "fold (avgfloor x, 0) -> x >> 1" to use SDPatternMatch. No need for this to be vector specific, and it's more likely that scalar cases will appear after #92096	2024-05-19 11:30:20 +01:00
David Green	4c98f5b439	[DAG] Use copysign in frem power-2 fold. (#91751 ) As a small addition to #91148, this uses copysign to produce the correct sign for zero when converting frem to div/trunc/mul when we do not know that the input is positive (and we care about sign bits). The copysign lets us get the sign of zero correct. In testing, the only case this produced different results than fmod was: frem -inf, 4.0 -> nan vs -nan	2024-05-18 22:50:19 +01:00
Patrick O'Neill	4ab2ac22d0	[DAGCombiner] Mark vectors as not AllAddOne/AllSubOne on type mismatch (#92195 ) Fixes #92193.	2024-05-15 12:39:28 -07:00
Simon Pilgrim	7f3e3785d0	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.	2024-05-10 22:40:23 +01:00
Simon Pilgrim	7e6879b245	[X86] scalarizeExtractedBinop - reuse existing SDLoc. NFC.	2024-05-10 22:40:23 +01:00
David Green	8fc9e3d577	[DAG] Lower frem of power-2 using div/trunc/mul+sub (#91148 ) If we are lowering a frem and the divisor is known to be an integer power-2, we can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more expensive call to fmod. The results are identical to fmod's so long as d is a power-2 (so the mul does not round incorrectly), and the sign of the return is either always positive or not important for zeroes (nsz). Unfortunately Alive2 does not handle this well at the moment. I was using exhaustive checking to test this: (https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64). I found this in CPython's implementation of float_pow. I currently added it as a DAG combine for frem with power-2 fp constants.	2024-05-10 14:58:48 +01:00
David Green	23b673e5b4	[DAG][AArch64] Handle vscale addressing modes in reassociationCanBreakAddressingModePattern (#89908 ) reassociationCanBreakAddressingModePattern tries to prevent bad add reassociations that would break addressing mode patterns. This adds support for vscale offset addressing modes, making sure we don't break patterns that already exist. It does not optimize _to_ the correct addressing modes yet, but prevents us from optimizing _away_ from them.	2024-05-10 09:27:02 +01:00
David Green	fcf945f4ed	[DAG] Fold add(mul(add(A, CA), CM), CB) -> add(mul(A, CM), CM*CA+CB) (#90860 ) This is useful when the inner add has multiple uses, and so cannot be canonicalized by pushing the constants down through the mul. This patch adds patterns for both `add(mul(add(A, CA), CM), CB)` and with an extra add `add(add(mul(add(A, CA), CM), B) CB)` as the second can come up when lowering geps.	2024-05-08 22:11:18 +01:00
Craig Topper	ef84452571	[DAGCombiner] Be more careful about looking through extends and truncates in mergeTruncStores. (#91375 ) Previously we recursively looked through extends and truncates on both SourceValue and WideVal. SourceValue is the largest source found for each of the stores we are combining. WideVal is the source for the current store. Previously we could incorrectly look through a (zext (trunc X)) pair and incorrectly believe X to be a good source. I think we could also look through a zext on one store and a sext on another store and arbitrarily pick one of the extends as the final source. With this patch we only look through one level of extend or truncate. And we don't look through extends/truncs on both SourceValue and WideVal at the same time. This may lose some optimization cases, but keeps everything we had tests for. Fixes #90936.	2024-05-07 21:17:50 -07:00
Simon Pilgrim	522b4bfe5b	[DAG] Fold bitreverse(shl/srl(bitreverse(x),y)) -> srl/shl(x,y) (#89897 ) Noticed while investigating GFNI per-element vector shifts (we can form SHL but not SRL/SRA) Alive2: https://alive2.llvm.org/ce/z/fSH-rf	2024-05-06 11:13:05 +01:00
Simon Pilgrim	caacf8685a	[DAG] Fold freeze(shuffle(x,y,m)) -> shuffle(freeze(x),freeze(y),m) (#90952 ) If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node. This requires special case handling to create a new ShuffleVectorSDNode. Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.	2024-05-04 12:03:10 +01:00
Craig Topper	3563af6c06	[DAGCombiner] In mergeTruncStore, make sure we aren't storing shifted in bits. (#90939 ) When looking through a right shift, we need to make sure that all of the bits we are using from the shift come from the shift input and not the sign or zero bits that are shifted in. Fixes #90936.	2024-05-03 09:59:33 -07:00
Simon Pilgrim	91c52b966a	[DAG] Pull out repeated SDLoc() from SHL/SRL/SRA combines. NFC. We were always calling SDLoc(N) at the top of each visitSHL/SRL/SRA for the FoldConstantArithmetic call, so just reuse this as much as possible.	2024-04-30 17:30:43 +01:00
Luke Lau	5e03c0af47	[DAGCombiner] Fix mayAlias not accounting for scalable MMOs with offsets (#90573 ) In #70452 DAGCombiner::mayAlias was taught to handle scalable sizes, but when it checks via AA->isNoAlias it didn't take into account the case where the size is scalable but there was an offset too. For the fixed length case the offset was just accounted for by adding to the LocationSize, but for the scalable case there doesn't seem to be a way to represent both a scalable and fixed part in it. So this patch works around it by bailing if there is an offset. Fixes #90559	2024-04-30 20:20:40 +08:00
Bjorn Pettersson	55c6bda01e	Revert "Revert "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921 )" and more..." This reverts commit 16bd10a38730fed27a3bf111076b8ef7a7e7b3ee. Re-applies: b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)" 8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)" 73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)" with a fix in DAGCombiner::visitFREEZE.	2024-04-29 13:08:52 +02:00
David Spickett	16bd10a387	Revert "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921 )" and more... This reverts: b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)" (because it updates a test case that I don't know how to resolve the conflict for) 8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)" 73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)" Due to a test suite failure on AArch64 when compiling for SVE. https://lab.llvm.org/buildbot/#/builders/197/builds/13955 clang: ../llvm/llvm/include/llvm/CodeGen/ValueTypes.h:307: MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.	2024-04-29 09:47:41 +01:00
Björn Pettersson	b3c55b7071	[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921 ) [SelectionDAG] Handle more opcodes in canCreateUndefOrPoison Handle SELECT_CC similarly as SETCC. Handle these operations that only propagate poison/undef based on the input operands: SADDSAT, UADDSAT, SSUBSAT, USUBSAT, MULHU, MULHS, SMIN, SMAX, UMIN, UMAX These operations may create poison based on shift amount and exact flag being violated: SRL, SRA One goal here is to allow pushing freeze through these operations when allowed, as well as letting analyses such as isGuaranteedNotToBeUndefOrPoison to not break on such operations. Since some problems have been observed with pushing freeze through SRA/SRL we block that explicitly in DAGCombiner::visitFreeze now. That way we can still model SRA/SRL properly in SelectionDAG::canCreateUndefOrPoison, e.g. when used by isGuaranteedNotToBeUndefOrPoison, even if we do not want to push freeze through those instructions.	2024-04-29 07:56:49 +02:00
Matt Arsenault	405c018c71	DAG: Simplify demanded bits for truncating atomic_store (#90113 ) It's really unfortunate that STORE and ATOMIC_STORE are separate opcodes. This duplicates a basic simplify demanded for the truncating case. This avoids some AMDGPU lit regressions in a future patch. I'm not sure how to craft a test that exposes this without first introducing the regressions by promoting half to i16.	2024-04-26 15:21:44 +02:00
Simon Pilgrim	55d85c84ac	[DAG] visitORCommutative - fold build_pair(not(x),not(y)) -> not(build_pair(x,y)) style patterns (#90050 ) (Sorry, not an actual build_pair node just a similar pattern). For cases where we're concatenating 2 integers into a double width integer, see if both integer sources are NOT patterns. We could take this further and handle all logic ops with constant operands, but I just wanted to handle the case reported on #89533 initially. Fixes #89533	2024-04-26 14:11:03 +01:00
Bjorn Pettersson	8e2f6495c0	[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932 ) Avoid turning a BUILD_VECTOR that can be recognized as "all zeros", "all ones" or "constant" into something that depends on freeze(undef), as that would destroy those properties. Instead we replace undef by 0/-1 in such vectors, making it possible to fold away the freeze. We typically use -1 if the BUILD_VECTOR would identify as "all ones", and otherwise we use the value 0.	2024-04-26 13:41:21 +02:00
Simon Pilgrim	d51a17f684	[DAG] visitORCommutative - pull out repeated SDLoc(). NFC.	2024-04-25 14:23:36 +01:00
Björn Pettersson	f9b419b7a0	[DAGCombiner] Fix miscompile bug in combineShiftOfShiftedLogic (#89616 ) Ensure that the sum of the shift amounts does not overflow the shift amount type when combining shifts in combineShiftOfShiftedLogic. Solves a miscompile bug found when testing the C23 BitInt feature. Targets like X86 that only use an i8 for shift amounts after legalization seem to be especially susceptible to bugs like this, as it isn't legal to shift more than 255 steps.	2024-04-23 14:11:34 +02:00
Simon Pilgrim	ca9a44ef47	[DAG] visitORCommutative - use sd_match to reduce the need for commutative operand matching. NFCI. Use sd_match to match commutative inner AND/OR/XOR node arguments instead of some messy manual matching of each commutation.	2024-04-22 10:41:57 +01:00
Simon Pilgrim	c88b84d467	[DAG] visitOR/visitORLike - merge repeated SDLoc calls.	2024-04-22 10:28:02 +01:00
Craig Topper	ce48f43f05	[SelectionDAG] Require UADDO_CARRY carryin and carryout to have the same type. (#89255 ) This requires type legalization to keep them the same. This means we no longer need to legalize the operand since it will be legalized when we legalize the second result.	2024-04-19 12:38:53 -07:00
Simon Pilgrim	2e68ba99de	[DAG] visitADDLike - update "(x - y) + -1 -> add (xor y, -1), x" fold to accept UNDEF in a splat vector of -1 Make sure we use getNOT instead of reusing the allones (with undefs) vector	2024-04-19 13:47:29 +01:00
Craig Topper	ba1158813d	[DAGCombiner][AArch64] Make combineCarryDiamond avoid creating UADDO_CARRY with carry in larger than setcc result type. (#89121 ) In the attached test case we were creating a UADDO_CARRY with i1 carry out and i41 carry in. i41 is larger than the setcc result type for AArch64, which is i32. i41 needs to be promoted to i64 since it is larger than i32. The type legalizer tried to use promoteTargetBoolean, but that can only promote from a type smaller than setcc result type. The easiest fix here is to force the carryin type to match the carryout type at the time of creation. This should ensure the node won't exceed setcc result type as long as the output type doesn't. I think we should explore requiring the types to match for this node. Fixes #88966	2024-04-18 08:34:51 -07:00
Simon Pilgrim	73b255c9f8	[DAG] Ensure extract_subvector(insert_subvector(x,y,c1),c2) --> extract_subvector(y,c2-c1) is working on fixed vector types #87925 failed to ensure we weren't removing the extracted subvector from a scalable vector type Thanks to @antmox for the headsup.	2024-04-18 13:21:52 +01:00
Simon Pilgrim	c18a3b6bd3	[DAG] Fold extract_subvector(insert_subvector(x,y,c1),c2) --> extract_subvector(y,c2-c1) (#87925 ) (REAPPLIED) If the extract_subvector is cheap, attempt to extract directly from an inserted subvector Reapplied with a check to ensure we only attempt this for fixed vectors	2024-04-16 12:30:27 +01:00
Alina Sbirlea	40bbdb609f	Revert "[DAG] Fold extract_subvector(insert_subvector(x,y,c1),c2) --> extract_subvector(y,c2-c1) (#87925 )" This reverts commit 8c0f52e9d5a99bf96bb64ac23b5893482c292527. Reverting to green, reproducer attached in the PR/revision comments.	2024-04-15 17:38:52 -07:00
Yingwei Zheng	4d28d3f93b	[SDAG] Turn umin into smin if the saturation pattern is broken (#88505 ) As we canonicalize smin with non-negative operands into umin in the middle-end, the saturation pattern is broken. This patch reverts the transform in DAGCombine to fix the regression on ARM. Fixes https://github.com/llvm/llvm-project/issues/85706.	2024-04-16 01:28:28 +08:00
fengfeng	36230f90ee	[SelectionDAG] Propagate Disjoint flag. (#88370 ) Signed-off-by: feng.feng <feng.feng@iluvatar.com>	2024-04-15 11:01:15 +02:00
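
Two of the arithmetic folds named in the log above can be sanity-checked outside LLVM. The following is a minimal Python sketch, not LLVM code: it models `shl X, cttz(Y) -> mul (Y & -Y), X` (9e8ecce88e) on an assumed 8-bit integer type, and the power-of-2 `frem` expansion `x - trunc(x / d) * d` with the `copysign` fix for the sign of zero (8fc9e3d577, 4c98f5b439). All function names and the chosen bit width are hypothetical, and the float check is a handful of spot values rather than the exhaustive testing mentioned in the commit message.

```python
import math

BITS = 8                      # illustrative width (an assumption, not from the log)
MASK = (1 << BITS) - 1

def cttz(y):
    """Count trailing zeros of a nonzero value."""
    return (y & -y).bit_length() - 1

def shl_cttz(x, y):
    """Model of `shl X, cttz(Y)`, truncated to BITS bits."""
    return (x << cttz(y)) & MASK

def mul_lowest_bit(x, y):
    """Model of `mul (Y & -Y), X`, truncated to BITS bits."""
    return ((y & -y) * x) & MASK

def frem_pow2(x, d):
    """frem x, d as x - trunc(x / d) * d; copysign restores the sign of zero."""
    r = x - math.trunc(x / d) * d
    return math.copysign(r, x)

def same_float(a, b):
    """Equal value and equal sign bit (distinguishes 0.0 from -0.0)."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

# Exhaustive over the 8-bit model; Y == 0 is skipped because cttz(0) is
# the zero-is-poison case discussed in the later fixes.
for x in range(1 << BITS):
    for y in range(1, 1 << BITS):
        assert shl_cttz(x, y) == mul_lowest_bit(x, y)

# Spot checks against fmod for finite inputs and power-of-2 divisors.
for d in (0.5, 2.0, 4.0):
    for x in (-8.0, -7.5, -0.0, 0.0, 0.3, 5.75, 123456.789, 1e300):
        assert same_float(frem_pow2(x, d), math.fmod(x, d))
```

The sketch deliberately leaves out infinities and NaNs, where the log notes that the expansion and `fmod` can differ in the sign of the resulting NaN.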

4027 Commits