llvm-project

Author	SHA1	Message	Date
David Sherwood	2078771759	[SVE][CodeGen] Add simple integer add tests for SVE tuple types I have added tests to: CodeGen/AArch64/sve-intrinsics-int-arith.ll for doing simple integer add operations on tuple types. Since these tests introduced new warnings due to incorrect use of getVectorNumElements() I have also fixed up these warnings in the same patch. These fixes are: 1. In narrowExtractedVectorBinOp I have changed the code to bail out early for scalable vector types, since we've not yet hit a case that proves the optimisations are profitable for scalable vectors. 2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced calls to getVectorNumElements with getVectorMinNumElements in cases that work with scalable vectors. For the other cases I have added asserts that the vector is not scalable because we should not be using shuffle vectors and build vectors in such cases. Differential revision: https://reviews.llvm.org/D84016	2020-07-29 13:32:10 +01:00
Changpeng Fang	9162b70e51	DADCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit Summary: In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000. We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose to give up simplification with the consideration of compile time. Reviewers: @spatel, @arsenm Differential Revision: https://reviews.llvm.org/D84204	2020-07-25 21:20:59 -07:00
Eli Friedman	b8f765a1e1	[AArch64][SVE] Add support for trunc to <vscale x N x i1>. This isn't a natively supported operation, so convert it to a mask+compare. In addition to the operation itself, fix up some surrounding stuff to make the testcase work: we need concat_vectors on i1 vectors, we need legalization of i1 vector truncates, and we need to fix up all the relevant uses of getVectorNumElements(). Differential Revision: https://reviews.llvm.org/D83811	2020-07-20 13:11:02 -07:00
Roger Ferrer Ibanez	14bc5e149d	[DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1) The existing code already considered this case. Unfortunately a typo in the condition prevents it from triggering. Also the existing code, had it run, forgot to do the folding. This fixes PR42876. Differential Revision: https://reviews.llvm.org/D65802	2020-07-15 07:34:22 +00:00
David Sherwood	3b8eaf26db	[SVE][CodeGen] Fix implicit TypeSize->uint64_t conversion in TransformFPLoadStorePair In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable property of TypeSize when trying to create an integer type of equivalent size. In fact, this optimisation makes no sense for scalable types since we don't know the size at compile time. I have changed the code to bail out when encountering scalable type sizes. I've added a test to llvm/test/CodeGen/AArch64/sve-fp.ll that exercises this code path. The test already emits an error if it encounters warnings due to implicit TypeSize->uint64_t conversions. Differential Revision: https://reviews.llvm.org/D83572	2020-07-14 08:07:30 +01:00
Sanjay Patel	8779b11410	[DAGCombiner] rot i16 X, 8 --> bswap X We have this generic transform in IR (instcombine), but as shown in PR41098: http://bugs.llvm.org/PR41098 ...the pattern may emerge in codegen too. x86 has a potential refinement/reversal opportunity here, but that should come later or needs a target hook to avoid the transform. Converting to bswap is the more specific form, so we should use it if it is available.	2020-07-13 12:01:53 -04:00
Sanjay Patel	2df46a5743	[DAGCombiner] allow load/store merging if pairs can be rotated into place This carves out an exception for a pair of consecutive loads that are reversed from the consecutive order of a pair of stores. All of the existing profitability/legality checks for the memops remain between the 2 altered hunks of code. This should give us the same x86 base-case asm that gcc gets in PR41098 and PR44895: http://bugs.llvm.org/PR41098 http://bugs.llvm.org/PR44895 I think we are missing a potential subsequent conversion to use "movbe" if the target supports that. That might be similar to what AArch64 would use to get "rev16". Differential Revision: https://reviews.llvm.org/D83567	2020-07-13 08:57:00 -04:00
Sanjay Patel	f1bbf3acb4	Revert "[DAGCombiner] allow load/store merging if pairs can be rotated into place" This reverts commit 591a3af5c7acc05617c0eacf6ae4f76bd8a9a6ce. The commit message was cut off and failed to include the review citation.	2020-07-13 08:55:29 -04:00
Sanjay Patel	591a3af5c7	[DAGCombiner] allow load/store merging if pairs can be rotated into place This carves out an exception for a pair of consecutive loads that are reversed from the consecutive order of a pair of stores. All of the existing profitability/legality checks for the memops remain between the 2 altered hunks of code. This should give us the same x86 base-case asm that gcc gets in PR41098 and PR44895:i http://bugs.llvm.org/PR41098 http://bugs.llvm.org/PR44895 I think we are missing a potential subsequent conversion to use "movbe" if the target supports that. That might be similar to what AArch64 would use to get "rev16". Differential Revision:	2020-07-13 08:53:06 -04:00
Sanjay Patel	39009a8245	[DAGCombiner] tighten fast-math constraints for fma fold fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E) This is only allowed when "reassoc" is present on the fadd. As discussed in D80801, this transform goes beyond what is allowed by "contract" FMF (-ffp-contract=fast). That is because we are fusing the trailing add of 'E' with a multiply, but without "reassoc", the code mandates that the products AB and CD are added together before adding in 'E'. I've added this example to the LangRef to try to clarify the meaning of "contract". If that seems reasonable, we should probably do something similar for the clang docs because there does not appear to be any formal spec for the behavior of -ffp-contract=fast. Differential Revision: https://reviews.llvm.org/D82499	2020-07-12 08:51:49 -04:00
Sanjay Patel	02fec9d2a5	[DAGCombiner] move/rename variables for readability; NFC	2020-07-10 11:28:51 -04:00
Sanjay Patel	a46cf40240	[DAGCombiner] convert if-chain in store merging to switch; NFC	2020-07-09 17:20:04 -04:00
Sanjay Patel	b476e6a642	[DAGCombiner] add helper function for store merging of loaded values; NFC	2020-07-09 17:20:04 -04:00
Sanjay Patel	f98a602c2e	[DAGCombiner] add helper function for store merging of extracts; NFC	2020-07-09 17:20:03 -04:00
Sanjay Patel	8d74cb01b7	[DAGCombiner] add helper function for store merging of constants; NFC	2020-07-09 17:20:03 -04:00
Sanjay Patel	6890e2a17b	[DAGCombiner] add helper function to manage list of consecutive stores; NFC	2020-07-09 17:20:03 -04:00
Sanjay Patel	1265eb2d5f	[DAGCombiner] clean up in mergeConsecutiveStores(); NFC	2020-07-08 14:48:05 -04:00
Sanjay Patel	12c2271e53	[DAGCombiner] fix code comment and improve readability; NFC	2020-07-08 14:48:05 -04:00
Sanjay Patel	683a7f7025	[DAGCombiner] fix function-name formatting; NFC	2020-07-08 12:49:59 -04:00
Sanjay Patel	39329d5724	[DAGCombiner] add enum for store source value; NFC This removes existing code duplication and allows us to assert that we are handling the expected cases. We have a list of outstanding bugs that could benefit by handling truncated source values, so that's a possible addition going forward.	2020-07-08 12:49:59 -04:00
Ties Stuij	26a22478cd	[CodeGen] Don't combine extract + concat vectors with non-legal types Summary: The following combine currently breaks in the DAGCombiner: ``` extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x -> extract_vector_elt a, x ``` This happens because after we have combined these nodes we have inserted nodes that use individual instances of the vector element type. In the above example i16. However this isn't a legal type on all backends, and when the combining pass calls the legalizer it breaks as it expects types to already be legal. The type legalizer has already been run, and running it again would make a mess of the nodes. In the example code at least, the generated code is still efficient after the change. Reviewers: miyuki, arsenm, dmgreen, lebedev.ri Reviewed By: miyuki, lebedev.ri Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83231	2020-07-08 15:29:57 +01:00
Kerry McLaughlin	5e8084beba	[SVE][CodeGen] Legalisation of unpredicated load instructions Summary: When splitting a load of a scalable type, the new address is calculated in SplitVecRes_LOAD using a vscale and an add instruction. This patch also adds a DAG combiner fold to visitADD for vscale: - Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1))) Reviewers: sdesmalen, efriedma, david-arm Reviewed By: david-arm Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82792	2020-07-07 11:05:03 +01:00
Sanjay Patel	ea71ba11ab	[DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division X / (fabs(A) * sqrt(Z)) --> X / sqrt(AAZ) --> X * rsqrt(AAZ) In the motivating case from PR46406: https://bugs.llvm.org/show_bug.cgi?id=46406 ...this is restoring the sequence that was originally in the source code. We extracted a term from within the sqrt because we do not know in instcombine whether a target will expand a sqrt call. Note: we could say that the transform in IR should be restricted, but that would not solve the problem if the source was originally in the pattern shown here. This is a gray area for fast-math-flag requirements. I think we should at least check fast-math-flags on the fdiv and fmul because I view this transform as 2 pieces: reassociate the fmul operands and form reciprocal from the fdiv (as with the existing transform). We could argue that the sqrt also needs FMF, but that was not required before, so we should change that in a follow-up patch if that seems better. We don't currently have a way to check that the target will produce a sqrt or recip estimate without actually creating nodes (the APIs are SDValue getSqrtEstimate() and SDValue getRecipEstimate()), so we clean up speculatively created nodes if we are not able to create an estimate. The x86 test with doubles verifies that we are not changing a test with no estimate sequence. Differential Revision: https://reviews.llvm.org/D82716	2020-07-06 19:12:21 -04:00
Craig Topper	76123d338d	[DAGCombiner] visitSIGN_EXTEND_INREG should fold sext_vector_inreg(undef) to 0 not undef. We need to ensure that the sign bits of the result all match so we can't fold to undef. Similar to PR46585. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D83163	2020-07-04 14:35:49 -07:00
Craig Topper	120c5f1057	[DAGCombiner] Don't fold zext_vector_inreg/sext_vector_inreg(undef) to undef. Fold to 0. zext_vector_inreg needs to produces 0s in the extended bits and sext_vector_inreg needs to produce upper bits that are all the same. So we should fold them to a 0 vector instead of undef. Fixes PR46585.	2020-07-04 11:42:53 -07:00
Sander de Smalen	143e324e75	[CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics, that shouldn't really have been there (I suspect this was a remnant from when we expected the wider vector always to have come from a vector CONCAT). When I tried to create a more minimal reproducer, I found a bug in DAGCombiner where it drops the scalable flag when trying to fold: extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index') This patch fixes both issues. Reviewers: david-arm, efriedma, spatel Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D82910	2020-07-02 10:16:43 +01:00
David Sherwood	f11305780f	[CodeGen] Fix warnings in DAGCombiner::visitSCALAR_TO_VECTOR In visitSCALAR_TO_VECTOR we try to optimise cases such as: scalar_to_vector (extract_vector_elt %x) into vector shuffles of %x. However, it led to numerous warnings when %x is a scalable vector type, so for now I've changed the code to only perform the combination on fixed length vectors. Although we probably could change the code to work with scalable vectors in certain cases, without a proper profit analysis it doesn't seem worth it at the moment. This change fixes up one of the warnings in: llvm/test/CodeGen/AArch64/sve-merging-stores.ll I've also added a simplified version of the same test to: llvm/test/CodeGen/AArch64/sve-fp.ll which already has checks for no warnings. Differential Revision: https://reviews.llvm.org/D82872	2020-07-01 18:47:13 +01:00
Guillaume Chatelet	ef36f5143d	[Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment As per documentation of `hasPairLoad`: "`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load." In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned. There is only one implementor of this interface. Differential Revision: https://reviews.llvm.org/D82958	2020-07-01 14:32:30 +00:00
David Sherwood	97a7a9abb2	[CodeGen] Fix up warnings in visitEXTRACT_SUBVECTOR It's perfectly valid to do certain DAG combines where we extract subvectors from a concat vector when we have scalable vector types. However, we can do this in a way that avoids generating compiler warnings by replacing calls to getVectorNumElements() with getVectorMinNumElements(). Due to the way subvector extracts are designed to work with scalable vector types this is ok. This eliminates some warnings from existing tests in this file: llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll Differential Revision: https://reviews.llvm.org/D82655	2020-07-01 15:10:53 +01:00
Guillaume Chatelet	5f8bdb3e6a	[Alignment][NFC] TargetLowering::allowsMemoryAccess Second patch of a series to adapt TargetLowering::allowsXXX functions This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82785	2020-06-30 08:17:00 +00:00
David Sherwood	46a7f4d6f4	[SVE][CodeGen] Fix bug in DAGCombiner::reduceBuildVecToShuffle When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's important that we carefully check the vector types that led to that BUILD_VECTOR. In the test I have attached to this commit there is a case where the results of two SVE faddv instructions are being stored to consecutive memory locations. With my fix, as part of merging those stores we discover that each BUILD_VECTOR element came from an extract of a SVE vector element and therefore bail out. Differential Revision: https://reviews.llvm.org/D82564	2020-06-30 07:28:15 +01:00
Guillaume Chatelet	3500d9ec95	Fix invalid alignment in DAGCombiner::isLegalNarrowLdSt `ShAmt / 8` can be a non power of two, this can lead to an invalid alignment. context: https://reviews.llvm.org/D41350#inline-749165 Differential Revision: https://reviews.llvm.org/D82565	2020-06-29 09:22:15 +00:00
Simon Pilgrim	6bdb3ce452	[DAG] reduceBuildVecExtToExtBuildVec - don't combine if it would break a splat. reduceBuildVecExtToExtBuildVec was breaking a splat(zext(x)) pattern into buildvector(x, 0, x, 0, ..) resulting in much more complex insert+shuffle codegen. We already go to some lengths to avoid this in SimplifyDemandedVectorElts etc. when we encounter splat buildvectors. It should be OK to fold all splat(aext(x)) patterns - we might need to tighten this if we find a case where we mustn't introduce a buildvector(x, undef, x, undef, ..) but I can't find one. Fixes PR46461.	2020-06-27 11:03:57 +01:00
Sanjay Patel	e7f7715eb9	[DAGCombiner] rename variables for readability; NFC PR46406 shows a pattern where we can do better, so try to clean this up before adding more code.	2020-06-26 14:22:11 -04:00
Simon Pilgrim	bcc0dc3832	[DAG] visitSIGN_EXTEND_INREG - rename EVT variable. NFCI. We had a EVT type variable called EVT, which isn't a good idea....	2020-06-23 10:45:27 +01:00
Michael Liao	b1360caa82	[SDAG] Add new AssertAlign ISD node. Summary: - AssertAlign node records the guaranteed alignment on its source node, where these alignments are retrieved from alignment attributes in LLVM IR. These tracked alignments could help DAG combining and lowering generating efficient code. - In this patch, the basic support of AssertAlign node is added. So far, we only generate AssertAlign nodes on return values from intrinsic calls. - Addressing selection in AMDGPU is revised accordingly to capture the new (base + offset) patterns. Reviewers: arsenm, bogner Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81711	2020-06-23 00:51:11 -04:00
David Sherwood	7edc7f6edb	[CodeGen] Fix SimplifyDemandedBits for scalable vectors For now I have changed SimplifyDemandedBits and it's various callers to assume we know nothing for scalable vectors and to ignore the demanded bits completely. I have also done something similar for SimplifyDemandedVectorElts. These changes fix up lots of warnings due to calls to EVT::getVectorNumElements() for types with scalable vectors. These functions are all used for optimisations, rather than functional requirements. In future we can revisit this code if there is a need to improve code quality for SVE. Differential Revision: https://reviews.llvm.org/D80537	2020-06-19 07:59:35 +01:00
Qiu Chaofan	f8ef7c99a0	[DAGCombiner] Require ninf for division estimation Current implementation of division estimation isn't correct for some cases like 1.0/0.0 (result is nan, not expected inf). And this change exposes a potential infinite loop: we use isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the divisor is some constant. But it doesn't work after legalized on some platforms. This patch restricts the method to act before LegalDAG. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D80542	2020-06-14 22:58:22 +08:00
Michael Liao	e7b920e6fe	[DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)) Reviewers: arsenm Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81708	2020-06-12 13:53:08 -04:00
Simon Pilgrim	5509e2cc2e	[DAG] foldAddSubOfSignBit - add support for non-uniform vector constants	2020-06-12 14:58:15 +01:00
Sanjay Patel	702cf93356	[DAGCombiner] allow more folding of fadd + fmul into fma If fmul and fadd are separated by an fma, we can fold them together to save an instruction: fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1)) The fold implemented here is actually a specialization - we should be able to peek through >1 fma to find this pattern. That's another patch if we want to try that enhancement though. This transform was guarded by the TLI hook enableAggressiveFMAFusion(), so it was done for some in-tree targets like PowerPC, but not AArch64 or x86. The hook is protecting against forming a potentially more expensive computation when fma takes longer to execute than a single fadd. That hook may be needed for other transforms, but in this case, we are replacing fmul+fadd with fma, and the fma should never take longer than the 2 individual instructions. 'contract' FMF is all we need to allow this transform. That flag corresponds to -ffp-contract=fast in Clang, so we are allowed to form fma ops freely across expressions. Differential Revision: https://reviews.llvm.org/D80801	2020-06-09 10:41:27 -04:00
Guillaume Chatelet	800e100588	Revert "[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess" This reverts commit f21c52667ed147903015a94643b0057319189d4e.	2020-06-09 10:43:59 +00:00
Guillaume Chatelet	f21c52667e	[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81379	2020-06-09 10:11:07 +00:00
Sanjay Patel	302cc8a121	[DAGCombiner] clean-up FMA+FMUL folds; NFC D80801 suggests some readability improvements before mocing this block.	2020-06-06 10:32:54 -04:00
Sanjay Patel	652b3757c8	[x86] add test/code comment for chain value use (PR46195); NFC	2020-06-04 09:15:17 -04:00
Simon Pilgrim	adf10dcf2e	[DAG] scalarizeBinOpOfSplats - extract from the source of splat vector (PR46189) D79003/rG9fa58d1bf2f8 exposed an issue with scalarizeBinOpOfSplats that we were extracting from the splatted vector result instead of the source, the splat index is only valid for the source vector not the result, which may contain undefs, including at the splat index.	2020-06-04 11:58:59 +01:00
Tim Northover	87e24c3200	Revert "[DAGCombiner] avoid unnecessary indirection from SDNode/SDValue; NFCI" This reverts commit 21dadd774f56778ef68c1ce307205dfbdacc793a. In at least PromoteIntBinOps, they wanted to know about users of all values produced by the node not just the integer being promoted. For example not replacing chain users if the operation was a load breaks the ordering of the DAG.	2020-06-04 11:53:14 +01:00
Kadir Cetinkaya	af86a10bad	[llvm] Fix unused variable warning	2020-06-02 22:46:24 +02:00
Amy Kwan	a3ada630d8	[DAGCombiner] Combine shifts into multiply-high This patch implements a target independent DAG combine to produce multiply-high instructions from shifts. This DAG combine will combine shifts for any type as long as the MULH on the narrow type is legal. For now, it is enabled on PowerPC as PowerPC is the only target that has an implementation of the isMulhCheaperThanMulShift TLI hook introduced in D78271. Moreover, this DAG combine focuses on catching the pattern: (shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>) to produce mulhs when we have a sign-extend, and mulhu when we have a zero-extend. The patch performs the following checks: - Operation is a right shift arithmetic (sra) or logical (srl) - Input to the shift is a multiply - Both operands to the shift are sext/zext nodes - The extends into the multiply are both the same - The narrow type is half the width of the wide type - The shift amount is the width of the narrow type - The respective mulh operation is legal Differential Revision: https://reviews.llvm.org/D78272	2020-06-02 15:22:48 -05:00
Craig Topper	a4dd45b7d0	[DAGCombiner] Move debug message and statistic update into CommitTargetLoweringOpt. This code was repeated in two callers of CommitTargetLoweringOpt. But CommitTargetLoweringOpt is also called from TargetLowering. We should print a message for those calls to. So sink the repeated code into CommitTargetLoweringOpt to catch those calls.	2020-05-30 19:47:07 -07:00

... 5 6 7 8 9 ...

3171 Commits