llvm-project

Author	SHA1	Message	Date
Roman Lebedev	2245fb8aaa	[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions 1. It doesn't make sense to enforce that the bonus instruction is only used once in it's basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, i'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else afterall, so something else also has a similar 'artificial' restriction.	2020-11-26 22:51:22 +03:00
Roman Lebedev	65db7d38e0	[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold	2020-11-26 22:51:21 +03:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues: 1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit 408c4408facc3a79ee4ff7e9983cc972f797e176. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Arthur Eubanks	5c31b8b94f	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit 10f2a0d662d8d72eaac48d3e9b31ca8dc90df5a4. More uint64_t overflows.	2020-10-31 00:25:32 -07:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit 73f01e3df58dca9d1596440b866b52929e3878de. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Arthur Eubanks	10f2a0d662	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-30 10:03:46 -07:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
Nico Weber	2a4e704c92	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit e5766f25c62c185632e3a75bf45b313eadab774b. Makes clang assert when building Chromium, see https://crbug.com/1142813 for a repro.	2020-10-27 09:26:21 -04:00
Arthur Eubanks	e5766f25c6	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-26 20:24:04 -07:00
Zequan Wu	2f29341114	Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation"" This reverts commit 716f7636e1ec7880a6d2f2205f54f65191cf8f9a.	2020-10-21 17:08:56 -07:00
Zequan Wu	716f7636e1	Revert "SimplifyCFG: Clean up optforfuzzing implementation" See discussion: https://reviews.llvm.org/D89590 This reverts commit cdd006eec9409923f9a56b9026ce2cb72e7b71dc.	2020-10-21 16:56:32 -07:00
Jeremy Morse	05659606a2	Revert "[gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC" Some of the buildbots have croaked with this patch, for examples failures that begin in this build: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/29933 This reverts commit 674f57870f4c8a7fd7b629bffc85b149cbefd3e0.	2020-09-30 09:52:12 +01:00
Vedant Kumar	674f57870f	[gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC	2020-09-29 17:39:07 -07:00
Sam Parker	65f78e73ad	[SimplifyCFG] Consider cost of combining predicates. Modify FoldBranchToCommonDest to consider the cost of inserting instructions when attempting to combine predicates to fold blocks. The threshold can be controlled via a new option: -simplifycfg-branch-fold-threshold which defaults to '2' to allow the insertion of a not and another logical operator. Differential Revision: https://reviews.llvm.org/D86526	2020-09-07 10:04:50 +01:00
Benjamin Kramer	8782c72765	Strength-reduce SmallVectors to arrays. NFCI.	2020-08-28 21:14:20 +02:00
Sam Parker	8ce450da32	[NFCI][SimplifyCFG] Combine select costs and checks Combine the cost modelling and validity checks for the phi to select conversion in SpeculativelyExecuteBB, extracting the logic out into a function.	2020-08-24 09:16:11 +01:00
Sam Parker	bfc6d8b59b	[NFC][SimplifyCFG] Formatting and variable rename	2020-08-21 13:11:17 +01:00
Sam Parker	47251582f5	[SimplifyCFG] Cost required selects Before we speculatively execute a basic block, query the cost of inserting the necessary select instructions against the phi folding threshold. For non-trivial insertions, a more accurate decision can probably be made during machine if-conversion. With minsize we query the CodeSize cost, otherwise we use SizeAndLatency. Differential Revision: https://reviews.llvm.org/D82438	2020-08-21 09:52:52 +01:00
Roman Lebedev	e492f0e03b	[SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics SimplifyCFG has two main folds for resumes - one when resume is directly using the landingpad, and the other one where resume is using a PHI node. While for the first case, we were already correctly ignoring all the PHI nodes, and both the debug info intrinsics and lifetime intrinsics, in the PHI-based-one, we weren't ignoring PHI's in the resume block, and weren't ignoring lifetime intrinsics. That is clearly a bug. On RawSpeed library, this results in +9.34% (+81) more invoke->call folds, -0.19% (-39) landing pads, -0.24% (-81) invoke instructions but +51 call instructions and -132 basic blocks. Though, the run-time performance impact appears to be within the noise.	2020-08-08 20:00:28 +03:00
Roman Lebedev	1f452ac1d7	[NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based	2020-08-08 20:00:28 +03:00
Roman Lebedev	a587bf3eb0	[NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks	2020-08-08 20:00:27 +03:00
Max Kazantsev	360ab70712	[SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls We do not thread blocks with convergent calls, but this check was missing when we decide to insert PR Phis into it (which we only do for threading). Differential Revision: https://reviews.llvm.org/D83936 Reviewed By: nikic	2020-07-22 13:53:50 +07:00
Roman Lebedev	04b729d076	[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag Common code sinking is already guarded with a (with default-off!) flag, so add a flag for hoisting, too. D84108 will hopefully make hoisting off-by-default too.	2020-07-20 10:29:57 +03:00
Jon Roelofs	a0537fc35f	[SimplifyCFG] Fix crash in the EXPENSIVE_CHECKS build SimplifyCFG was incorrectly reporting to the pass manager that it had not made changes after folding away a PHI. This is detected in the EXPENSIVE_CHECKS build when the function's hash changes. Differential Revision: https://reviews.llvm.org/D83985	2020-07-16 15:34:41 -06:00
Roman Lebedev	2815429d08	[NFC][SimplifyCFG] HoistThenElseCodeToIf(): after hoisting terminator, do return Changed, not just true Otherwise, if Changed was still false before that, we would not account for that hoist in NumHoistCommonCode statistic.	2020-07-16 00:32:48 +03:00
Roman Lebedev	1cfc24fd67	[NFC][SimplifyCFG] HoistThenElseCodeToIf(): count number of common instruction "blocks" hoisted I.e. out of all the times HoistThenElseCodeToIf() was called, how many times did it actually hoist something?	2020-07-16 00:21:56 +03:00
Roman Lebedev	7b53ad88d4	[NFC][SimplifyCFG] HoistThenElseCodeToIf(): count number of common instructions hoisted	2020-07-16 00:21:56 +03:00
Roman Lebedev	3fc1defc0b	[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): count number of instruction "blocks" actually sunk Out of all the times the function was called, how many times did we actually sink anything?	2020-07-16 00:21:56 +03:00
Roman Lebedev	9ed65c76c0	[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): add debug output when failing to actually sink instr	2020-07-16 00:21:55 +03:00
Roman Lebedev	4c79864488	[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): early return if nothing to sink If we can't sink even one instruction, early return, to increase readability.	2020-07-16 00:21:55 +03:00
Roman Lebedev	702a3c6410	[NFC][SimplifyCFG] Rename statistic NumSinkCommons into NumSinkCommonInstrs It really counts instructions added into common block, not number of instruction groups sunk.	2020-07-16 00:21:55 +03:00
Guillaume Chatelet	8dbafd24d6	[Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82977	2020-07-02 11:28:02 +00:00
Max Kazantsev	f01d9e6fc3	[SimplifyCFG] Fix inconsistency in block size assessment for threading Sometimes SimplifyCFG may decide to perform jump threading. In order to do it, it follows the following algorithm: 1. Checks if the block is small enough for threading; 2. If yes, inserts a PR Phi relying that the next iteration will remove it by performing jump threading; 3. The next iteration checks the block again and performs the threading. This logic has a corner case: inserting the PR Phi increases block's size by 1. If the block size at first check was max possible, one more Phi will exceed this size, and we will neither perform threading nor remove the created Phi node. As result, we will end up with worse IR than before. This patch fixes this situation by excluding Phis from block size computation. Excluding Phis from size computation for threading also makes sense by itself because in case of threadign all those Phis will be removed. Differential Revision: https://reviews.llvm.org/D81835 Reviewed By: asbirlea, nikic	2020-06-30 12:40:07 +07:00
serge-sans-paille	b4130e6e99	Correctly report Changed status in FoldBranchToCommonDest It's possible for the first loop trip(s) to set the `Changed` Status, and to a later one to early exit, in which case `Changed` must be return. Differential Revision: https://reviews.llvm.org/D82753	2020-06-29 18:13:42 +02:00
Vedant Kumar	f8bd6a75ed	[SimplifyCFG] Drop debug loc in SpeculativelyExecuteBB Summary: According to HowToUpdateDebugInfo.rst: ``` Preserving the debug locations of speculated instructions can make it seem like a condition is true when it's not (or vice versa), which leads to a confusing single-stepping experience ``` This patch follows the recommendation to drop debug locations on speculated instructions. Reviewers: aprantl, davide Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82420	2020-06-23 18:25:52 -07:00
Davide Italiano	8cdd2a158c	[SimplifyCFG] Update debug location when folding branch to common destination Sometimes a dead block gets folded and the debug information is still retained. This manifests as jumpy stepping in lldb, see the bugzilla PR for an end-to-end C testcase. Fixes https://bugs.llvm.org/show_bug.cgi?id=46008 Differential Revision: https://reviews.llvm.org/D82062	2020-06-18 12:33:32 -07:00
serge-sans-paille	9daccb7a47	Correctly update Changed status for SimplifyCFG Interestingly, this leads to better output in one of the test case. Differential Revision: https://reviews.llvm.org/D81237	2020-06-10 16:54:15 +02:00
Matt Arsenault	cdd006eec9	SimplifyCFG: Clean up optforfuzzing implementation This should function as any other SimplifyCFGOption rather than having the transform check and specially consider the attribute itself.	2020-05-23 13:49:50 -04:00
Zequan Wu	cb22ab7403	Add nomerge function attribute to supress tail merge optimization in simplifyCFG We want to add a way to avoid merging identical calls so as to keep the separate debug-information for those calls. There is also an asan usecase where having this attribute would be beneficial to avoid alternative work-arounds. Here is the link to the feature request: https://bugs.llvm.org/show_bug.cgi?id=42783. `nomerge` is different from `noline`. `noinline` prevents function from inlining at callsites, but `nomerge` prevents multiple identical calls from being merged into one. This patch adds `nomerge` to disable the optimization in IR level. A followup patch will be needed to let backend understands `nomerge` and avoid tail merge at backend. Reviewed By: asbirlea, rnk Differential Revision: https://reviews.llvm.org/D78659	2020-05-12 16:49:20 -07:00
Ricky Zhou	b38d77f185	[SimplifyCFG] Remap rewritten debug intrinsic operands. FoldBranchToCommonDest clones instructions to a different basic block, but handles debug intrinsics in a separate path. Previously, when cloning debug intrinsics, their operands were not updated to reference the correct cloned values. As a result, we would emit debug.value intrinsics with broken operand references which are discarded in later passes. This leads to incorrect debuginfo that reports incorrect values for variables. Fix this by remapping debug intrinsic operands when cloning them. Fixes https://bugs.llvm.org/show_bug.cgi?id=45667. Differential Revision: https://reviews.llvm.org/D79602	2020-05-08 11:10:25 -07:00
Sam Parker	e9c9329aa4	[TTI] Add TargetCostKind argument to getUserCost There are several different types of cost that TTI tries to provide explicit information for: throughput, latency, code size along with a vague 'intersection of code-size cost and execution cost'. The vectorizer is a keen user of RecipThroughput and there's at least 'getInstructionThroughput' and 'getArithmeticInstrCost' designed to help with this cost. The latency cost has a single use and a single implementation. The intersection cost appears to cover most of the rest of the API. getUserCost is explicitly called from within TTI when the user has been explicit in wanting the code size (also only one use) as well as a few passes which are concerned with a mixture of size and/or a relative cost. In many cases these costs are closely related, such as when multiple instructions are required, but one evident diverging cost in this function is for div/rem. This patch adds an argument so that the cost required is explicit, so that we can make the important distinction when necessary. Differential Revision: https://reviews.llvm.org/D78635	2020-04-28 08:57:45 +01:00
Craig Topper	a58b62b4a2	[IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand(). This method has been commented as deprecated for a while. Remove it and replace all uses with the equivalent getCalledOperand(). I also made a few cleanups in here. For example, to removes use of getElementType on a pointer when we could just use getFunctionType from the call. Differential Revision: https://reviews.llvm.org/D78882	2020-04-27 22:17:03 -07:00
Tyker	97ecd91e20	[NFC] Refactor SimplifyCFG to make propagating information easier. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77742	2020-04-24 22:22:20 +02:00
Craig Topper	05a11974ae	[CallSite removal] Remove unneeded includes of CallSite.h. NFC	2020-04-22 00:07:13 -07:00
Mircea Trofin	4213bc761a	[llvm][NFC][CallSite] Removed CallSite from some implementation details. Reviewers: craig.topper, dblaikie Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78256	2020-04-15 22:27:05 -07:00
Zequan Wu	eccfa35d53	Fix lifetime call in landingpad blocking Simplifycfg pass Fix lifetime call in landingpad blocks simplifycfg from removing the landingpad. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D77188	2020-04-09 13:07:32 -07:00
Matt Arsenault	43d98a0ecf	Allow replacing intrinsic operands with variables Since intrinsics can now specify when an argument is required to be constant, it is now OK to replace arguments with variables if they aren't. This means intrinsics must now be accurately marked with immarg.	2020-03-23 15:51:57 -04:00
Sanjay Patel	94f5d73182	[SimplifyCFG] fix formatting; NFC	2020-03-13 14:12:28 -04:00
Sanjay Patel	51e53af11c	[SimplifyCFG] fix debug print formatting; NFC	2020-03-13 14:12:28 -04:00

1 2 3 4 5 ...

999 Commits