llvm-project

Author	SHA1	Message	Date
David Green	57ff805a6d	[DAG] Create fptoui.sat from clamped fptosi As an extension to D111976, this converts an fptosi clamped between 0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with conversions that naturally saturate, such as Arm. X86 disables the transform as some of the test cases increase in size. A fptoui.sat necessitates a fp clamp without native support, so there is little use in converting if the instruction is just going to be expanded. Differential Revision: https://reviews.llvm.org/D112428	2021-12-05 09:25:52 +00:00
Simon Pilgrim	19d34f6e95	[X86] combinePMULH - recognise 'cheap' truncations via PACKS/PACKUS as well as SEXT/ZEXT combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation. But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation. This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result. Differential Revision: https://reviews.llvm.org/D113371	2021-12-01 16:37:49 +00:00
Bradley Smith	0eb1efb92c	[DAGCombiner] When combining REM ensure optimized div nodes are unique The REM DAG combine uses the visitDivLike functions to try and get an optimized DIV node to provide better codegen, however in some cases this visitDivLike call ends up in the BuildSDIVPow2 target hook, which in turn sometimes will return the same node passed in to indicate not to change it. The REM DAG combine does not anticipate this and creates a cycle in the DAG because of it. Fix this by ensuring any such optimized div node returned is distinct from the node being combined. Differential Revision: https://reviews.llvm.org/D114716	2021-12-01 11:24:26 +00:00
Simon Pilgrim	9981dd142f	[DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC. Reduce diff in D114582	2021-12-01 11:23:47 +00:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have less strict semantics than the fptosi_sat, with fewer values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit 52ff3b009388f1bef4854f1b6470b4ec19d10b0e.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have less strict semantics than the fptosi_sat, with fewer values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
Bradley Smith	6180806632	[AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom This allows the generic DAG combine to fold fp_extend/fp_trunc into loads/stores which we can then lower into an integer extending load/truncating store plus an FP_EXTEND/FP_ROUND. The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked types hence lowering them introduces an unpack/zip. By allowing these nodes to be combined with loads/stores we make it much easier to have this unpack/zip combined into the load/store by our custom lowering. Differential Revision: https://reviews.llvm.org/D114580	2021-11-29 11:56:07 +00:00
Simon Pilgrim	812e64ef0c	[DAG] MatchRotate - support rotate-by-constant of illegal types Patch to fix some of the regressions in D77804. By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443. Differential Revision: https://reviews.llvm.org/D113192	2021-11-19 11:12:04 +00:00
Craig Topper	233def40f7	[DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs. It's possible that the mask is already a NOT. At least if InstCombine hasn't canonicalized the input. In that case we will form an ANDN with X instead of with Y. So we don't need to worry about Y being a constant. We might need to check that X isn't a constant instead, but we don't have a test case for that yet. This fixes a size regression found when trying to enable this combine for RISCV in D113937. Differential Revision: https://reviews.llvm.org/D113948	2021-11-15 17:15:51 -08:00
Simon Pilgrim	7bac1985f4	[DAG] SimplifyVBinOp - add SDLoc() argument Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats	2021-11-15 10:43:56 +00:00
Simon Pilgrim	8658d20724	[DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC.	2021-11-15 10:43:55 +00:00
Sanjay Patel	254c5246e9	[DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit This was noted as a follow-up to D113212 / D113426: 4fc1fc4005f7 7e30404c3b6c 11522cfcad6b https://alive2.llvm.org/ce/z/e4o96b The canonicalization rules for these IR patterns are complicated, and we were not matching the expected forms in 2 out of the 3 cases. We can make codegen more robust by matching the swapped forms (and that will also work if these patterns are created late).	2021-11-14 09:35:26 -05:00
Kazu Hirata	99d5cbbd7e	[CodeGen] Use SDNode::uses (NFC)	2021-11-12 07:33:29 -08:00
Simon Pilgrim	010b09b0c5	[DAG] reassociateOpsCommutative - test getNode result directly. NFC Matches the clean code style we use directly above	2021-11-11 18:45:50 +00:00
Sanjay Patel	11522cfcad	[DAGCombiner] add fold for vselect based on mask of signbit, part 3 (Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1 https://alive2.llvm.org/ce/z/mGCBrd This was suggested as a potential enhancement in D113212 (also 7e30404c3b6c ). There's an improvement for AArch64 that could be generalized ( X > -1 --> X >= 0 ). For x86, we have a counter-acting fold for most cases that turns the shift+not back into a setcc, so that needs a work-around to get more cases to use "pandn": D113603 Note that this pattern (and a previous one) are not currently canonical forms in IR: https://alive2.llvm.org/ce/z/e4o96b Adding swapped variants is left as a TODO item here, but is planned as a near-term follow-up patch. Differential Revision: https://reviews.llvm.org/D113426	2021-11-11 10:27:37 -05:00
Simon Pilgrim	82b74363a9	[DAG] reassociateOpsCommutative - peek through bitcasts to find constants Now that FoldConstantArithmetic can fold bitcasted constants, we should peek through bitcasts of binop operands to try and find foldable constants	2021-11-11 12:00:22 +00:00
Simon Pilgrim	381d14775e	[DAG] reassociateOpsCommutative - pull out repeated getOperand() calls. NFC.	2021-11-10 15:19:13 +00:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage, it's not exhaustive, just the more obvious cases where the intention is obvious. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
Simon Pilgrim	f60d3ec0c7	[DAG] Add BuildVectorSDNode::getConstantRawBits helper We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation. This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data. Differential Revision: https://reviews.llvm.org/D113351	2021-11-08 12:07:38 +00:00
Simon Pilgrim	0ff1edeeec	[DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes) still requires more work.	2021-11-07 12:11:46 +00:00
Sanjay Patel	39c4c7d391	[DAGCombiner] remove vselect fold that was accidentally added This diff snuck into the unrelated: 025a2f73a319 It's a suggested follow-up for D113212, but I need to add test coverage first.	2021-11-06 09:34:30 -04:00
Sanjay Patel	025a2f73a3	[InstCombine] add tests for umax with sub; NFC	2021-11-06 08:32:52 -04:00
Sanjay Patel	7e30404c3b	[DAGCombiner] add fold for vselect based on mask of signbit, part 2 This is the 'or' sibling for the fold added with: D113212 https://alive2.llvm.org/ce/z/tgnp7K Note that neither of these transforms is poison-safe, but it does not seem to matter at this level. We have had the scalar version of D113212 for a long time, so this is just making optimizer behavior consistent. We do not have the scalar version of this fold, however, so that is another follow-up.	2021-11-05 15:02:12 -04:00
Simon Pilgrim	9e6506299a	[DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic. We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code. Differential Revision: https://reviews.llvm.org/D113276	2021-11-05 14:36:17 +00:00
Sanjay Patel	4fc1fc4005	[DAGCombiner] add fold for vselect based on mask of signbit (X s< 0) ? Y : 0 --> (X s>> BW-1) & Y We canonicalize to the icmp+select form in IR, and we already have this fold for scalar select in SDAG, so I think it's an oversight that we don't have the fold for vectors. It seems neutral for AArch64 and saves some instructions on x86. Whether we should also have the sibling folds for the inverse condition or all-ones true value may depend on target-specific factors such as whether there's an "and-not" instruction. Differential Revision: https://reviews.llvm.org/D113212	2021-11-05 10:06:16 -04:00
jacquesguan	a39eadcf16	[DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector. Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b), if c can truncate to i32 without loss. Reviewed By: frasercrmck, craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D108129	2021-11-02 12:04:23 +00:00
Simon Pilgrim	37e17f278f	[DAG] MatchRotate - remove (redundant) legal type check. Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.	2021-11-02 11:24:50 +00:00
Abinav Puthan Purayil	db8d7b6e2d	[DAGCombine][NFC] s/it's/its in the comment of hasNoInfs().	2021-10-29 07:36:38 +05:30
Sanjay Patel	6e46b66e2a	[DAGCombiner] make matching bit-hack form of usubsat more flexible (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation. Differential Revision: https://reviews.llvm.org/D112377	2021-10-25 09:01:52 -04:00
Simon Pilgrim	a5f56342b0	[DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.	2021-10-22 18:32:14 +01:00
Craig Topper	04c184bba7	[TargetLowering] Simplify the interface of expandABS. NFC Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331	2021-10-22 10:22:23 -07:00
Sanjay Patel	d2198771e9	[DAGCombiner] fold bit-hack form of usubsat (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR....that I hope ends here! The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need to add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties. Differential Revision: https://reviews.llvm.org/D112085	2021-10-21 09:47:19 -04:00
Arthur Eubanks	6ea7437ca5	[SelectionDAG] Bail out of mergeTruncStores when not optimizing With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores. Fixes PR51827. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111596	2021-10-20 16:58:22 -07:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos. Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Sanjay Patel	2a3cc4d461	[Analysis] add utility function for unary shuffle mask creation This is NFC-intended for the callers. Posting in case there are other potential users that I missed. I would also use this from VectorCombine in a patch for: https://llvm.org/PR52178 ( D111901 ) Differential Revision: https://reviews.llvm.org/D111891	2021-10-18 09:00:39 -04:00
Mingming Liu	cfd155c41b	[SelectionDAG] Fix typo in option help Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D111867	2021-10-15 11:27:40 -07:00
Roman Lebedev	684cbae89a	[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places	2021-10-11 23:36:06 +03:00
Wang, Pengfei	c236883b6b	[X86] Optimize fdiv with reciprocal instructions for half type Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110557	2021-10-08 09:41:13 +08:00
David Sherwood	37edb7d3e2	[SVE] Fix incorrect DAG combines when extracting fixed-width from scalable vectors We were previously silently generating incorrect code when extracting a fixed-width vector from a scalable vector. This is worse than crashing, since the user will have no indication that this is currently unsupported behaviour. I have fixed the code to only perform DAG combines when safe to do so, i.e. the input and output vectors are both fixed-width or both scalable. Test added here: CodeGen/AArch64/sve-extract-scalable-vector.ll Differential revision: https://reviews.llvm.org/D110624	2021-10-06 09:27:44 +01:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Simon Pilgrim	df672f66b6	[DAG] scalarizeExtractedVectorLoad - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit extracted loads to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback. I've also cleaned up the alignment calculation code - if we have a constant extraction index then the alignment can be based on an offset from the original vector load alignment, but for non-constant indices we should assume the worst (single element alignment only). Differential Revision: https://reviews.llvm.org/D110486	2021-10-01 21:07:34 +01:00
Fraser Cormack	e2b46e336b	[DAGCombiner][VP] Fold zero-length or false-masked VP ops This patch adds a generic DAGCombine for vector-predicated (VP) nodes. Those for which we can determine that no vector element is active can be replaced by either undef or, for reductions, the start value. This is tested rather trivially at the IR level, where it's possible that we want to teach instcombine to perform this optimization. However, we can also see the zero-evl case arise during SelectionDAG legalization, when wide VP operations can be split into two and the upper operation emerges as trivially false. It's possible that we could perform this optimization "proactively" (both on legal vectors and before splitting) and reduce the width of an operation and insert it into a larger undef vector: ``` v8i32 vp_add x, y, mask, 4 -> v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0 ``` This is somewhat analogous to similar vector narrow/widening optimizations, but it's unclear at this point whether that's beneficial to do this for VP ops for any/all targets. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109148	2021-09-27 11:30:09 +01:00
Simon Pilgrim	18c8ed5416	[DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.	2021-09-25 18:35:57 +01:00
Simon Pilgrim	6bd5b1b1ce	[DAG] combineShiftToMULH - move getValueType() inside assert. NFCI. Avoids an unnecessary (void).	2021-09-25 11:56:35 +01:00
Bjorn Pettersson	c3ae8ecb52	[DAGCombiner] Rename isAlias as mayAlias. NFC Differential Revision: https://reviews.llvm.org/D110062	2021-09-23 09:54:42 +02:00
Michael Liao	5fb3ae525f	[SelectionDAG] Re-calculate scoped AA metadata when merging stores. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D102821	2021-09-21 11:41:17 -04:00
Matt Arsenault	54d755a034	DAG: Fix incorrect folding of fmul -1 to fneg The fmul is a canonicalizing operation, and fneg is not so this would break denormals that need flushing and also would not quiet signaling nans. Fold to fsub instead, which is also canonicalizing.	2021-09-14 21:25:02 -04:00
David Truby	915e9e76bf	[llvm][sve] Lowering for VLS masked extending loads This extends the custom lowering for extending loads on fixed length vectors in SVE to support masked extending loads. The existing tests for correct behaviour of masked extending loads exhibit bad code generation due to the legalisation of i1 vectors. They have been left as-is and new tests have been added that do not exhibit this behaviour. Differential Revision: https://reviews.llvm.org/D108200	2021-09-13 11:13:25 +01:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
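
The sign-bit select folds (4fc1fc4005, 11522cfcad) and the usubsat bit-hack (d2198771e9, 6e46b66e2a) quoted in the messages above can be sanity-checked by brute force at i8. The following is a minimal Python sketch, not LLVM code: every helper name here (`ashr`, `usubsat`, `check_all`, ...) is hypothetical and only models 8-bit integer semantics as an assumption about what the patterns mean.

```python
BW = 8                      # model i8, as in the usubsat commit messages
MASK = (1 << BW) - 1

def to_signed(v):
    """Interpret the low BW bits of v as a two's-complement signed value."""
    v &= MASK
    return v - (1 << BW) if v & (1 << (BW - 1)) else v

def ashr(v, n):
    """Arithmetic shift right (s>>) on a BW-bit value, result as unsigned bits."""
    return (to_signed(v) >> n) & MASK

def usubsat(a, b):
    """Unsigned saturating subtract: clamps at 0 instead of wrapping."""
    return max((a & MASK) - (b & MASK), 0)

def select_signbit(x, y):
    # (X s< 0) ? Y : 0
    return y if to_signed(x) < 0 else 0

def fold_signbit(x, y):
    # (X s>> BW-1) & Y
    return ashr(x, BW - 1) & y

def select_not_signbit(x, y):
    # (X s> -1) ? Y : 0
    return y if to_signed(x) > -1 else 0

def fold_not_signbit(x, y):
    # ~(X s>> BW-1) & Y
    return (~ashr(x, BW - 1)) & MASK & y

def usubsat_xor(x):
    # (X ^ 128) & (X s>> 7)
    return (x ^ 128) & ashr(x, 7)

def usubsat_add(x):
    # (X + 128) & (X s>> 7), the 'add' variant of the same pattern
    return ((x + 128) & MASK) & ashr(x, 7)

def check_all():
    """Exhaustively compare each source pattern with its folded form."""
    for x in range(1 << BW):
        if usubsat_xor(x) != usubsat(x, 128):
            return False
        if usubsat_add(x) != usubsat(x, 128):
            return False
        for y in range(1 << BW):
            if select_signbit(x, y) != fold_signbit(x, y):
                return False
            if select_not_signbit(x, y) != fold_not_signbit(x, y):
                return False
    return True

if __name__ == "__main__":
    print(check_all())
```

This only checks plain integer values; it says nothing about poison or undef, which the part 2 message notes these transforms do not preserve.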

3138 Commits