llvm-project

Author	SHA1	Message	Date
Roman Lebedev	c45f765c0d	[SimplifyCFG] Teach SimplifyBranchOnICmpChain() to preserve DomTree	2020-12-30 23:58:40 +03:00
Kazu Hirata	16d20e2554	[Transforms/Utils] Construct SmallVector with iterator ranges (NFC)	2020-12-29 19:23:23 -08:00
Roman Lebedev	39a56f7f17	[SimplifyCFG] Teach SimplifyTerminatorOnSelect() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	ec0b671a61	[SimplifyCFG] Teach SimplifyCondBranchToCondBranch() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	307156246f	[SimplifyCFG] Teach mergeConditionalStoreToAddress() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	d4c0abb4a3	[SimplifyCFG] Teach FoldCondBranchOnPHI() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	b8121b2e62	[SimplifyCFG] Teach SinkCommonCodeFromPredecessors() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	18c407bf4c	[SimplifyCFG] Teach HoistThenElseCodeToIf() to preserve DomTree	2020-12-30 00:48:10 +03:00
Roman Lebedev	fe9bdd9621	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 2	2020-12-30 00:48:10 +03:00
Roman Lebedev	6027e05dbf	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 1	2020-12-30 00:48:10 +03:00
Roman Lebedev	ef93f7a11c	[SimplifyCFG] FoldBranchToCommonDest: gracefully handle unreachable code () We might be dealing with unreachable code, so the bonus instruction we clone might be self-referencing. There is a sanity check that all uses of bonus instructions that are not in the original block with said bonus instructions are PHI nodes, and that is obviously not the case for self-referencing instructions. So if we find such a use, just rewrite it. Thanks to Mikael Holmén for the reproducer! Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8	2020-12-28 23:31:19 +03:00
Roman Lebedev	ff3749fc79	[NFC] SimplifyCFGOpt::simplifyUnreachable(): pacify unused variable warning Thanks to Luke Benes for pointing it out.	2020-12-24 21:20:46 +03:00
Roman Lebedev	c043f5055e	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1 ... for conditional branch case	2020-12-20 00:18:36 +03:00
Roman Lebedev	262ff9c23e	[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree	2020-12-20 00:18:36 +03:00
Roman Lebedev	6a1617d67c	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2 ... for the custom case returning void.	2020-12-20 00:18:36 +03:00
Roman Lebedev	b94520c9ee	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1 ... for the general case of returning a value.	2020-12-20 00:18:35 +03:00
Roman Lebedev	4d87a6ad13	[NFCI][SimplifyCFG] SimplifyCondBranchToTwoReturns(): pull out BI->getParent() into a variable	2020-12-20 00:18:35 +03:00
Roman Lebedev	83659c7076	[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-20 00:18:34 +03:00
Roman Lebedev	b7d00e29b7	[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	c209b88dd4	[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	76e74d9395	[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree	2020-12-20 00:18:33 +03:00
Roman Lebedev	4be8707e64	[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree Still boring, simply drop all edges to successors of DomBlock, and add an edge to BB instead.	2020-12-20 00:18:33 +03:00
Roman Lebedev	b43b77ff9b	[NFCI][SimlifyCFG] simplifyOnce(): also perform DomTree validation And that exposes that a number of tests don't actually manage to maintain DomTree validity, which is in line with my observations. Once again, SimplifyCFG pass currently does not require/preserve DomTree by default, so this is effectively NFC.	2020-12-20 00:18:32 +03:00
Roman Lebedev	2d07414ee5	[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree Pretty boring, removeUnwindEdge() already knows how to update DomTree, so if we are to call it, we must first flush our own pending updates; otherwise, we just stop predecessors from branching to us, and for certain predecessors, stop their predecessors from branching to them also.	2020-12-18 00:37:22 +03:00
Roman Lebedev	2ee724863e	[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:22 +03:00
Roman Lebedev	164e0847a5	[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:21 +03:00
Florian Hahn	75c04bfc61	[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest. When folding a branch to a common destination, preserve !annotation on the created instruction, if the terminator of the BB that is going to be removed has !annotation. This should ensure that !annotation is attached to the instructions that 'replace' the original terminator. Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D93410	2020-12-17 14:06:58 +00:00
Roman Lebedev	5cce4aff18	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	49dac4aca0	[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	e113317958	[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware Two observations: 1. Unavailability of DomTree makes it impossible to make `FoldBranchToCommonDest()` transform in certain cases, where the successor is dominated by predecessor, because we then don't have PHI's, and can't recreate them, well, without handrolling 'is dominated by' check, which doesn't really look like a great solution to me. 2. Avoiding invalidating DomTree in SimplifyCFG will decrease the number of `Dominator Tree Construction` by 5 (from 28 now, i.e. -18%) in `-O3` old-pm pipeline (as per `llvm/test/Other/opt-O3-pipeline.ll`) This might or might not be beneficial for compile time. So the plan is to make SimplifyCFG preserve DomTree, and then eventually make DomTree fully required and preserved by the pass. Now, SimplifyCFG is ~7KLOC. I don't think it will be nice to do all this uplifting in a single mega-commit, nor would it be possible to review it in any meaningful way. But, i believe, it should be possible to do this in smaller steps, introducing the new behavior, in an optional way, off-by-default, opt-in option, and gradually fixing transforms one-by-one and adding the flag to appropriate test coverage. Then, eventually, the default should be flipped, and eventually^2 the flag removed. And that is what is happening here - when the new off-by-default option is specified, DomTree is required and is claimed to be preserved, and SimplifyCFG-internal assertions verify that the DomTree is still OK.	2020-12-16 00:38:00 +03:00
Roman Lebedev	59560e8589	[SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450) Even though d38205144febf4dc42c9270c6aa3d978f1ef65e1 was mostly a correct fix for the external non-PHI users, it's not a generally correct fix, because the 'placeholder' values in those trivial PHI's we create shouldn't be always 'undef', but the PHI itself for the backedges, else we end up with wrong value, as the `@pr48450_2` test shows. But we can't just do that, because we can't check that the PHI can be its own incoming value when coming from a certain predecessor, because we don't have a dominator tree. So until we can address this correctness problem properly, ensure that we don't perform the transformation if there are such problematic external uses. Making dominator tree available there is going to be involved, since `-simplifycfg` pass currently does not preserve/update domtree...	2020-12-14 20:14:31 +03:00
Roman Lebedev	e8360a8e1e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): pull out 'common successor' into a variable Makes it easier to use it elsewhere	2020-12-14 20:14:31 +03:00
Kazu Hirata	5891ad4e22	[Transforms] Use llvm::erase_value (NFC)	2020-12-13 09:48:47 -08:00
Roman Lebedev	d38205144f	[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450) In particular, if the successor block, which is about to get a new predecessor block, currently only has a single predecessor, then the bonus instructions will be directly used within said successor, which is fine, since the block with bonus instructions dominates that successor. But once there's a new predecessor, the IR is no longer valid, and we don't fix it, because we only update PHI nodes. Which means, the live-out bonus instructions must be exclusively used by the PHI nodes in successor blocks. So we have to form trivial PHI nodes, which will then be successfully updated to receive cloned bonus instns. This all works fine, except for the fact that we don't have access to the dominator tree, and we don't ignore unreachable code, so we sometimes do end up having to deal with some weird IR. Fixes https://bugs.llvm.org/show_bug.cgi?id=48450	2020-12-13 00:06:57 +03:00
Roman Lebedev	15f8060f6f	[SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction There is no correctness need for that, and since we allow live-out uses, this could theoretically happen, because currently nothing will move the cond to right before the branch in those tests. But regardless, lifting that restriction even makes the transform easier to understand. This makes the transform happen in 81 more cases (+0.55%)	2020-12-01 15:13:06 +03:00
Roman Lebedev	b0e9b7c59f	[NFC][SimplifyCFG] Add STATISTIC() to the FoldValueComparisonIntoPredecessors() fold	2020-11-30 12:27:16 +03:00
Roman Lebedev	b33fbbaa34	Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions This was originally committed in 2245fb8aaa1c1f85f53f7b19a1ee3ac69b1a1dfe, but was immediately reverted in f3abd54958ab90ba7c100d3fa936a3ce0dd2ad04 because of a PHI handling issue. Original commit message: 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, i'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though this seemed like it would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-27 12:47:15 +03:00
Roman Lebedev	f3abd54958	Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions" Many bots are unhappy, at the very least missed a few codegen tests, and possibly this has a logic hole inducing a miscompile (will be really awesome to have ready reproducer..) Need to investigate. This reverts commit 2245fb8aaa1c1f85f53f7b19a1ee3ac69b1a1dfe.	2020-11-26 23:13:43 +03:00
Roman Lebedev	2245fb8aaa	[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, i'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though this seemed like it would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-26 22:51:22 +03:00
Roman Lebedev	65db7d38e0	[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold	2020-11-26 22:51:21 +03:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues: 1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. 
The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit 408c4408facc3a79ee4ff7e9983cc972f797e176. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Arthur Eubanks	5c31b8b94f	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit 10f2a0d662d8d72eaac48d3e9b31ca8dc90df5a4. More uint64_t overflows.	2020-10-31 00:25:32 -07:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit 73f01e3df58dca9d1596440b866b52929e3878de. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Arthur Eubanks	10f2a0d662	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-30 10:03:46 -07:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
Nico Weber	2a4e704c92	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit e5766f25c62c185632e3a75bf45b313eadab774b. Makes clang assert when building Chromium, see https://crbug.com/1142813 for a repro.	2020-10-27 09:26:21 -04:00
Arthur Eubanks	e5766f25c6	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-26 20:24:04 -07:00
Zequan Wu	2f29341114	Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation"" This reverts commit 716f7636e1ec7880a6d2f2205f54f65191cf8f9a.	2020-10-21 17:08:56 -07:00
Zequan Wu	716f7636e1	Revert "SimplifyCFG: Clean up optforfuzzing implementation" See discussion: https://reviews.llvm.org/D89590 This reverts commit cdd006eec9409923f9a56b9026ce2cb72e7b71dc.	2020-10-21 16:56:32 -07:00


1287 Commits