llvm-project

Author	SHA1	Message	Date
Nirav Dave	beabf456df	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations that require more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits. 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Removes Chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence. 6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values. 7. Adds a peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll). 8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252	2017-02-25 11:43:58 +00:00
Sanjay Patel	832b1622d8	[DAGCombiner] add missing folds for scalar select of {-1,0,1}. The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). I.e., the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops than for a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization are: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2. Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137	2017-02-24 17:17:33 +00:00
Sanjay Patel	4a4fbe162f	[DAG] add convenience function to get -1 constant; NFCI llvm-svn: 296004	2017-02-23 19:02:33 +00:00
Bill Seurer	8e48f416ad	[DAGCombiner] revert r295336. r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849	2017-02-22 16:27:33 +00:00
Matt Arsenault	f0a4823b91	DAG: Check if extract_vector_elt is legal or custom. Avoids test regressions in future AMDGPU commits when more vector types are custom lowered. llvm-svn: 295782	2017-02-21 22:47:27 +00:00
Simon Pilgrim	c0dc9a4913	Strip trailing whitespace. llvm-svn: 295653	2017-02-20 11:56:43 +00:00
Sanjay Patel	7f2e58972c	[DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI. I can't find any tests of the non-i1 code path, so it may be unnecessary at this point. llvm-svn: 295463	2017-02-17 17:13:27 +00:00
Simon Pilgrim	0429c0cf8b	Fix signed/unsigned comparison warning. llvm-svn: 295453	2017-02-17 16:01:16 +00:00
Simon Pilgrim	511d788a95	[DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks. During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing). This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles, resulting in an unnecessary extension+truncation stage for widened illegal types. The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712). Differential Revision: https://reviews.llvm.org/D29454 llvm-svn: 295451	2017-02-17 15:14:48 +00:00
Sanjay Patel	5573042035	[DAGCombiner] improve readability; NFCI llvm-svn: 295447	2017-02-17 14:21:59 +00:00
Artur Pilipenko	85d758299e	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine. Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336	2017-02-16 17:07:27 +00:00
Artur Pilipenko	a1b384c4ce	Revert -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine". This change causes some of the AMDGPU and PowerPC tests to fail. llvm-svn: 295316	2017-02-16 13:04:46 +00:00
Artur Pilipenko	daaa0c0f7d	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine. Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314	2017-02-16 12:53:26 +00:00
Michael Kuperstein	ba80db39d7	[DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source. We currently can't legalize those, but we should really not be creating them in the first place, since legalization would probably look similar to the way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD. This fixes PR311956. Differential Revision: https://reviews.llvm.org/D29961 llvm-svn: 295213	2017-02-15 18:37:26 +00:00
Craig Topper	3668bde371	[DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of an undef vector can just use the original input to the extract. llvm-svn: 294932	2017-02-13 04:53:33 +00:00
Craig Topper	aa46204ed9	[DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR. This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors. llvm-svn: 294930	2017-02-12 23:49:49 +00:00
Craig Topper	b633adedc7	[DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats. llvm-svn: 294875	2017-02-11 22:57:09 +00:00
Artur Pilipenko	4a64031954	[DAGCombiner] Support non-zero offset in load combine. Enable folding patterns which load the value from a non-zero offset: i8 *a = ...; i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24) => i32 val = *((i32*)(a+4)). Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29394 llvm-svn: 294582	2017-02-09 12:06:01 +00:00
Artur Pilipenko	045ab08252	[DAGCombiner] NFC. Mark ByteProvider accessors as const llvm-svn: 294494	2017-02-08 17:59:34 +00:00
Amaury Sechet	4b946916ac	[DAGCombiner] Push truncate through adde when the carry isn't used. Summary: As per title. Reviewers: mkuper, spatel, bkramer, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29528 llvm-svn: 294394	2017-02-08 00:32:36 +00:00
Daniel Jasper	84b3cc394d	Revert "[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)". This reverts commit r294186. On an internal test, this triggers an out-of-memory error on PPC, presumably because there is another dagcombine that does the exact opposite, triggering an endless loop consuming more and more memory. Chandler has started creating a reduced test case and we'll attach it as soon as possible. llvm-svn: 294288	2017-02-07 08:57:50 +00:00
Artur Pilipenko	d3464bf9ad	[DAGCombiner] Support bswap as a part of load combine patterns Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29397 llvm-svn: 294201	2017-02-06 17:48:08 +00:00
Amaury Sechet	8a3b32941d	[DAGCombiner] Make DAGCombiner smarter about overflow Summary: Leverage it to transform addc into add. Reviewers: mkuper, spatel, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29524 llvm-svn: 294187	2017-02-06 14:54:49 +00:00
Amaury Sechet	1d466f598e	[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry) Summary: This is extracted from D29443. Reviewers: mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29564 llvm-svn: 294186	2017-02-06 14:28:39 +00:00
Amaury Sechet	143902c29f	[DAGCombiner] Leverage add's commutativity Summary: This avoids the need to duplicate all patterns, and actually ends up exposing some opportunities to optimize existing patterns that did not exist in both directions on an existing test case. Reviewers: mkuper, spatel, bkramer, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29541 llvm-svn: 294125	2017-02-05 14:22:20 +00:00
Craig Topper	42b83f8d6e	[DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs. Based on similar code for INSERT_VECTOR_ELT. llvm-svn: 294110	2017-02-04 23:26:39 +00:00
Craig Topper	04dce84ead	[DAGCombiner] Use DAG.getAnyExtOrTrunc to simplify some code. NFC llvm-svn: 294109	2017-02-04 23:26:37 +00:00
Craig Topper	ceaf9c1633	[DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS. llvm-svn: 294108	2017-02-04 23:26:34 +00:00
Amaury Sechet	6e2d8e49ec	Formatting in DAGCombiner. NFC llvm-svn: 294091	2017-02-04 13:01:53 +00:00
Nirav Dave	93f9d5ce04	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915	2017-02-02 18:24:55 +00:00
Amaury Sechet	f3e421d6e9	Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC llvm-svn: 293903	2017-02-02 16:07:44 +00:00
Nirav Dave	4442667fc5	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations that require more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits. 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Removes Chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence. 6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values. 7. Adds a peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll). 8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893	2017-02-02 14:39:42 +00:00
Nicolai Haehnle	8813d5d221	[DAGCombine] require UnsafeFPMath for re-association of addition Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to an fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635	2017-01-31 14:35:37 +00:00
Craig Topper	4753736abf	[DAGCombiner] Use unsigned for a constant vector index instead of APInt. The type system requires that the number of vector elements should fit in 32-bits so this should be safe. llvm-svn: 293414	2017-01-29 04:38:21 +00:00
Craig Topper	d15730902b	[DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR. The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue. llvm-svn: 293413	2017-01-29 04:38:19 +00:00
Craig Topper	24cdbe8fa6	[DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal. llvm-svn: 293412	2017-01-29 04:38:16 +00:00
Nirav Dave	d32a421f75	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188	2017-01-26 16:46:13 +00:00
Nirav Dave	de6516c466	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. * Simplify Consecutive Merge Store Candidate Search. Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations that require more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits. 4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Removes Chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence. 6. Forwards load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values. 7. Adds a peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll). 8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184	2017-01-26 16:02:24 +00:00
Craig Topper	001aad7da7	[DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors. llvm-svn: 293152	2017-01-26 05:38:46 +00:00
Artur Pilipenko	bc93452420	Fix buildbot failures introduced by 293036. Fix unused variable, specify types explicitly to make the VC compiler happy. llvm-svn: 293039	2017-01-25 09:10:07 +00:00
Artur Pilipenko	41c0005aa3	[DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2. The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm. http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows the traversal to stop earlier if we know that the result won't be used. From the original commit: Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it. Assuming a little-endian target: i8 *a = ...; i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24) => i32 val = *((i32*)a); and i8 *a = ...; i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3] => i32 val = BSWAP(*((i32*)a)). This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations. Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part: i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24). Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses. The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory, verify that they are adjacent and match the little- or big-endian encoding of a wider value. If so, and the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed. Reviewed By: RKSimon, filcab, chandlerc Differential Revision: https://reviews.llvm.org/D27861 llvm-svn: 293036	2017-01-25 08:53:31 +00:00
Matt Arsenault	732a531506	DAG: Recognize no-signed-zeros-fp-math attribute. Clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024	2017-01-25 06:08:42 +00:00
Matt Arsenault	8a27aee6ae	DAGCombiner: Allow negating ConstantFP after legalize llvm-svn: 293019	2017-01-25 04:54:34 +00:00
Matt Arsenault	4e305c6c1e	DAG: Don't fold vector extract into load if target doesn't want to. Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU when dynamically indexing a vector. llvm-svn: 292842	2017-01-23 22:48:53 +00:00
Simon Pilgrim	db73dbcc7c	[SelectionDAG] Add support for BITREVERSE constant folding. We were relying on constant folding of the legalized instructions to do what constant folding we had previously. llvm-svn: 292114	2017-01-16 13:39:00 +00:00
Benjamin Kramer	061f4a5fe6	Apply clang-tidy's performance-unnecessary-value-param to LLVM. With some minor manual fixes for using function_ref instead of std::function. No functional change intended. llvm-svn: 291904	2017-01-13 14:39:03 +00:00
Craig Topper	7af39837a9	Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent." Some test appears to be hanging on the build bots. llvm-svn: 291650	2017-01-11 04:59:25 +00:00
Craig Topper	577d258569	[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent. llvm-svn: 291645	2017-01-11 04:02:23 +00:00
Matt Arsenault	e482403e1c	DAGCombiner: Add hasOneUse checks to fadd/fma combine. Even with aggressive fusion enabled, this requires duplicating the fmul, or increases an fadd to another fma, which is not an improvement. llvm-svn: 291642	2017-01-11 02:02:12 +00:00
Craig Topper	588c734b0f	[DAGCombiner] Merge together duplicate checks for folding (select C, 1, X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC. I guess previously we just assumed that if the result type was i1, then the condition type must also be i1? llvm-svn: 291548	2017-01-10 07:42:57 +00:00
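
A minimal C++ sketch of the byte-origin matching described in the load-combine commits above (r293036, extended to non-zero offsets in r294582): for each byte of an OR tree, find the single memory byte that provides it, then check that those bytes are adjacent in little- or big-endian order. Everything here is hypothetical and simplified (Expr, byteLoad, shl, orOf, originOf and matchLoadCombine are illustrative names, not LLVM's SelectionDAG API); it assumes a little-endian target and only models zero-extended i8 loads, shifts by whole bytes, and ORs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy expression tree standing in for SelectionDAG nodes.
struct Expr {
  enum class Kind { ZextByteLoad, Shl, Or };
  Kind kind = Kind::ZextByteLoad;
  int64_t offset = 0;     // ZextByteLoad: byte offset from the base pointer
  unsigned shiftBits = 0; // Shl: left-shift amount in bits
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr byteLoad(int64_t offset) { // models zext(load i8 from a[offset])
  auto e = std::make_shared<Expr>();
  e->offset = offset;
  return e;
}
ExprPtr shl(ExprPtr x, unsigned bits) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::Shl;
  e->shiftBits = bits;
  e->lhs = std::move(x);
  return e;
}
ExprPtr orOf(ExprPtr a, ExprPtr b) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Kind::Or;
  e->lhs = std::move(a);
  e->rhs = std::move(b);
  return e;
}

// Origin of one result byte: a known-zero byte or one byte of memory.
struct ByteOrigin {
  bool isZero;
  int64_t memOffset; // valid only when !isZero
};

// Which byte provides byte `index` (0 = least significant) of `e`?
// Returns std::nullopt when the byte has no single, known origin.
std::optional<ByteOrigin> originOf(const Expr &e, unsigned index) {
  switch (e.kind) {
  case Expr::Kind::ZextByteLoad:
    if (index == 0) return ByteOrigin{false, e.offset};
    return ByteOrigin{true, 0}; // zero-extended high bytes
  case Expr::Kind::Shl: {
    assert(e.lhs && "Shl needs an operand");
    if (e.shiftBits % 8 != 0) return std::nullopt; // not byte-granular
    unsigned byteShift = e.shiftBits / 8;
    if (index < byteShift) return ByteOrigin{true, 0}; // shifted-in zeros
    return originOf(*e.lhs, index - byteShift);
  }
  case Expr::Kind::Or: {
    assert(e.lhs && e.rhs && "Or needs two operands");
    auto l = originOf(*e.lhs, index);
    auto r = originOf(*e.rhs, index);
    if (!l || !r) return std::nullopt;
    if (l->isZero) return r;
    if (r->isZero) return l;
    return std::nullopt; // two sources for one byte: not a plain load
  }
  }
  return std::nullopt;
}

struct CombinedLoad {
  int64_t offset;  // byte offset of the single wide load
  bool needsBswap; // true when the bytes appear in big-endian order
};

// Try to replace `e` (a widthBytes-wide value) by a single load,
// assuming a little-endian target as in the commit message's examples.
std::optional<CombinedLoad> matchLoadCombine(const Expr &e, unsigned widthBytes) {
  if (widthBytes == 0) return std::nullopt;
  std::vector<int64_t> offs;
  for (unsigned i = 0; i < widthBytes; ++i) {
    auto o = originOf(e, i);
    if (!o || o->isZero) return std::nullopt; // every byte must come from memory
    offs.push_back(o->memOffset);
  }
  int64_t base = *std::min_element(offs.begin(), offs.end());
  bool little = true, big = true;
  for (unsigned i = 0; i < widthBytes; ++i) {
    little = little && offs[i] == base + int64_t(i);
    big = big && offs[i] == base + int64_t(widthBytes - 1 - i);
  }
  if (little) return CombinedLoad{base, false};
  if (big) return CombinedLoad{base, true};
  return std::nullopt;
}
```

With this model, a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24) matches as one load at offset 4 with no swap, and the reversed byte order of a[0..3] matches with needsBswap set. Because bytes are resolved one at a time, the match gives up at the first byte without a unique memory origin. The real combine must additionally check that the wider load (and bswap) is legal for the target, which this sketch ignores.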

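The select-of-constants commit (r296137) argues that most targets generate better code for ext/add/not sequences than for a select of constants. The identities such rewrites rely on can be checked with a small C++ model. This is a sketch under stated assumptions: an i1 condition is a bool, an i32 value is a wrapping uint32_t, and zext1, sext1 and selectI32 are hypothetical helpers. The log does not list the exact folds the patch adds, so treat the set below as illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model: an i1 condition is a bool and an i32 value is a
// uint32_t, so additions wrap like two's-complement DAG arithmetic.
uint32_t zext1(bool c) { return static_cast<uint32_t>(c); }      // true -> 1
uint32_t sext1(bool c) { return 0u - static_cast<uint32_t>(c); } // true -> 0xFFFFFFFF (-1)
uint32_t selectI32(bool c, uint32_t t, uint32_t f) { return c ? t : f; }

// True if each select-of-constants equals its ext/add/not rewrite for (c, k).
bool selectFoldsHold(bool c, uint32_t k) {
  const uint32_t allOnes = 0xFFFFFFFFu; // i32 -1
  return selectI32(c, allOnes, 0) == sext1(c) &&   // select c, -1, 0 -> sext c
         selectI32(c, 0, allOnes) == sext1(!c) &&  // select c, 0, -1 -> sext (not c)
         selectI32(c, 1, 0) == zext1(c) &&         // select c, 1, 0 -> zext c
         selectI32(c, 0, 1) == zext1(!c) &&        // select c, 0, 1 -> zext (not c)
         selectI32(c, k + 1, k) == zext1(c) + k && // select c, K+1, K -> add (zext c), K
         selectI32(c, k - 1, k) == sext1(c) + k;   // select c, K-1, K -> add (sext c), K
}

// Exhaustive over the condition, spot-checked over a few constants,
// including the wrap-around edges.
void checkSelectFolds() {
  const uint32_t ks[] = {0u, 1u, 0x7FFFFFFFu, 0xFFFFFFFFu};
  for (int ci = 0; ci < 2; ++ci)
    for (uint32_t k : ks)
      assert(selectFoldsHold(ci != 0, k));
}
```

The last two identities generalize the {-1,0,1} cases: a select between two constants that differ by one becomes an extension of the condition plus a constant, which is the kind of rewrite the commit's first follow-up step proposes applying more broadly.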

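Two of the commits above gate floating-point folds on fast-math information: r293635 requires UnsafeFPMath for transforms that implicitly reassociate addition, and r293024 starts honoring the no-signed-zeros-fp-math attribute. The C++ sketch below shows, with ordinary IEEE-754 doubles, why such permission is needed. It assumes default round-to-nearest and a build without -ffast-math; x + 0.0 -> x is used only as a familiar signed-zero-sensitive fold, not as the specific place r293024 changed.

```cpp
#include <cassert>
#include <cmath>

// Reassociation: equal in real arithmetic, not in IEEE-754 double precision.
double sumLeft(double a, double b, double c) { return (a + b) + c; }
double sumRight(double a, double b, double c) { return a + (b + c); }

// The fold (x + 0.0) -> x is only safe when signed zeros can be ignored:
// under round-to-nearest, -0.0 + 0.0 produces +0.0, not -0.0.
double addPositiveZero(double x) { return x + 0.0; }
bool isNegativeZero(double x) { return x == 0.0 && std::signbit(x); }

void checkFpExamples() {
  // 1e20 + -1e20 is exactly 0, so the left grouping keeps the 1.0 ...
  assert(sumLeft(1e20, -1e20, 1.0) == 1.0);
  // ... but -1e20 + 1.0 rounds back to -1e20, so the right grouping loses it.
  assert(sumRight(1e20, -1e20, 1.0) == 0.0);
  assert(isNegativeZero(-0.0));
  assert(!isNegativeZero(addPositiveZero(-0.0))); // x + 0.0 != x for x = -0.0
}
```

Neither grouping is wrong; they are different roundings of the same real-number sum, which is why a compiler may only substitute one for the other when the user has opted out of strict IEEE semantics.
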
1744 Commits