llvm-project

Author	SHA1	Message	Date
Benjamin Kramer	8fb4a4f7bb	[SampleFDO] Silence -Wnon-virtual-dtor warning There's no polymorphic deletion happening here.	2021-02-10 23:37:15 +01:00
Rong Xu	db0d7d0ba9	[SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen Break SampleProfileLoader into to a base and a derived class. Base class (SampleProfileLoaderBaseImpl) includes the common code for IR and MachineIR (CodeGen) sample loader. It will be templatelized in the later patch. Inline and Probe related code will remain in the derived class of SampleProfileLoader and stays in SampleProfile.cpp. We need to refactor some functions: (1) getInstWeight() to enable the code sharing -- put the core into getInstWeightImpl(). (2) emitAnnotation() and propagateWeights() to carve out the code specific to SampleProfileLoader. (3) make getInstWeight() and findFunctionSamples() virtual and override in SampleProfileLoader as they need to access the fields in the derived class. Differential Revision: https://reviews.llvm.org/D95832	2021-02-10 13:29:15 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e, opeq transform 4. MIR function argument copy elision 5. IR stack protection. (though not perf-critical but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Johannes Doerfert	81429abd83	[Attributor][FIX] Do not create UB by introducing a `noundef undef` This was reported as PR49104. The reproducer uses varargs but the issue is the same, we know an argument is dead but can't change the signature for some reason. The PR49104 situation was: We are in an CG-SCC traversal and we remove all the uses of an argument and proof it thereby dead. However, if we do not remove the argument, via signature rewrite, we need to ensure that the `undef` we introduce at the call site doesn't clash with a `noundef` attribute.	2021-02-09 13:02:38 -06:00
Kazu Hirata	302313a264	[Transforms] Use range-based for loops (NFC)	2021-02-08 22:33:53 -08:00
Teresa Johnson	3a27933ec2	[LTT] Don't attempt to lower type tests used only by assumes Type tests used only by assumes were original for devirtualization, but are meant to be kept through the first invocation of LTT so that they can be used for additional optimization. In the regular LTO case where the IR is analyzed we may find a resolution for the type test and end up rewriting the associated vtable global, which can have implications on section splitting. Simply ignore these type tests. Fixes PR48245. Differential Revision: https://reviews.llvm.org/D96083	2021-02-06 09:02:10 -08:00
Simon Pilgrim	f7d07dbb29	IROutliner.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:38:09 +00:00
Simon Pilgrim	476b912e7c	SampleProfile.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:31:17 +00:00
Simon Pilgrim	89edda7084	IROutliner.cpp - fix Wdocumentation warnings. NFCI.	2021-02-05 11:21:00 +00:00
Kazu Hirata	be37475897	[Transforms/IPO] Use range-based for loops (NFC)	2021-02-03 20:41:20 -08:00
Rong Xu	b8f13db5b7	[SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker This patch detaches SampleProfileLoader from class SampleCoverageTracker. We plan to move SampleProfileLoader to a template class. This would remain SampleCoverageTracker as a class. Also make callsiteIsHot() as a file static function. Differential Revision: https://reviews.llvm.org/D95823	2021-02-03 11:38:04 -08:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Wenlei He	1645f465be	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. This is resubmit of D95024, with build break and overtighten assertion fixed. Test Plan:	2021-02-02 07:55:08 -08:00
Adrian Kuegel	48ca6da9d2	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit 9a03058d6322edb8abc803ba3e436cc62647d979.	2021-02-02 11:51:04 +01:00
Adrian Kuegel	3a65ec4bf9	Revert "Fix build break from D95024" This reverts commit 09cd849fdef2b2d3de2d0b0a5c512100957e0ef6.	2021-02-02 11:51:04 +01:00
Wenlei He	09cd849fde	Fix build break from D95024	2021-02-02 01:01:12 -08:00
Wenlei He	9a03058d63	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024	2021-02-02 00:34:06 -08:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO. New FDO Inliner Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Hongtao Yu	224fee8219	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791	2021-02-01 13:56:40 -08:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Kazu Hirata	8ed1636184	[llvm] Use isa instead of dyn_cast (NFC)	2021-01-29 23:23:37 -08:00
Hongtao Yu	7e99bddfea	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
Teresa Johnson	1487747e99	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
Fangrui Song	54fb3ca96e	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variable get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
Bjorn Pettersson	a9bd3d37bd	[NewPM] Add ExtraVectorizerPasses support As it looks like NewPM generally is using SimpleLoopUnswitch instead of LoopUnswitch, this patch also use SimpleLoopUnswitch in the ExtraVectorizerPasses sequence (compared with LegacyPM which use the LoopUnswitch pass). Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D95457	2021-01-26 22:59:10 +01:00
Florian Hahn	35b3989a30	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsVectorized Program base patch diff test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5% test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4% test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9% test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0% test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1% test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8% test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5% test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1% test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0% Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsAnalyzed Program base patch diff test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4% test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3% test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2% test-suite...lications/minisat/minisat.test 134.00 135.00 0.7% test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7% test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5% test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3% test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3% test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1% test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1% test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0% test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0% test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0% test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0% Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
modimo	ce7f9cdb50	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm newly added test correctly replays CGSCC inlining decisions Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit 53176c168061d6f26dcf3ce4fa59288b7d67255e, which introduceed a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Wei Mi	c9cd9a0066	[SampleFDO] Report error when reading a bad/incompatible profile instead of turning off SampleFDO silently. Currently sample loader pass turns off SampleFDO optimization silently when it sees error in reading the profile. This behavior will defeat the tests which could have caught those bad/incompatible profile problems. This patch change the behavior to report error. Differential Revision: https://reviews.llvm.org/D95269	2021-01-25 10:28:23 -08:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Xun Li	bd3ca6666d	[Inlining] Delete redundant optnone/alwaysinline check The same check is done in InlineCost: `8b0bd54d0e/llvm/lib/Analysis/InlineCost.cpp (L2537-L2552)` Also, doing a check on the callee here is confusing, because anything that deals with callee should be done in the inner loop where we proecss all calls from the same caller. Differential Revision: https://reviews.llvm.org/D95186	2021-01-21 18:38:10 -08:00
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Kazu Hirata	e53472de68	[Transforms] Use llvm::append_range (NFC)	2021-01-20 21:35:54 -08:00
Mircea Trofin	ccec2cf1d9	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit d97f776be5f8cd3cd446fe73827cd355f6bab4e1. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	95ce32c787	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Mircea Trofin	d97f776be5	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit e8aec763a57e211420dfceb2a8dc6b88574924f3.	2021-01-20 11:19:34 -08:00
Mircea Trofin	e8aec763a5	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats would be output-ed would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
David Sherwood	255a507716	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null. This makes many transformations like below legal: ``` %p = gep inbounds %x, 1 ; % p is non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` Instead, this semantics makes propagation of `nonnull` to caller illegal. The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument. Having `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	21b1ad0340	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have be vectorized earlier (by scalarzing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Wei Mi	86341247c4	[NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h to Pass.h. In some compiler passes like SampleProfileLoaderPass, we want to know which LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and it also cannot represent phase in full LTO. The patch extends it to include full LTO phases and move it from PassBuilder.h to Pass.h, then it is much easier for PassBuilder to communiate with each pass about current LTO phase. Differential Revision: https://reviews.llvm.org/D94613	2021-01-13 15:55:40 -08:00

1 2 3 4 5 ...

4436 Commits