llvm-project

Author	SHA1	Message	Date
wlei	bfefeeb139	[SamplePGO] Fix ICE that callee samples returns null while finding import functions We found that in a special condition, the input callee `Samples` is null for `findExternalInlineCandidate`, which caused an ICE. In some rare cases, call instruction could be changed after being pushed into inline candidate queue, this is because earlier inlining may expose constant propagation which can change indirect call to direct call. When this happens, we may fail to find matching function samples for the candidate later(for example if the profile is stale), even if a match was found when the candidate was enqueued. See this reduced program: file1.c: ``` int bar(int x); int(foo())() { return bar; }; void func() { int (fptr)(int); fptr = foo(); a += (fptr)(10); } ``` file2.c: ``` int bar(int x) { return x + 1;} ``` The two CALL: `foo` and `(ptr)` are pushed into the queue at the beginning, say `foo` is hotter and popped first for inlining. During the inlining of `foo`, it performs the constant propagation for the function pointer `bar` and then changed `(ptr)` to a direct call `bar(..)`. Note that at this time, `(ptr)/bar` is still in the queue, later while it's popped out for inlining, it use the a different target name(bar) to look for the callee samples. At the same time, if the profile is stale and the new function is different from the old function in the profile, then this led the return of the null callee sample. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D154637	2023-07-07 14:56:21 -07:00
wlei	444d2e1a54	[CSSPGO] Enable stale profile matching by default for CSSPGO We tested the stale profile matching on several Meta's internal services, all results are positive, for instance, in one service that refreshed its profile every one or two weeks, it consistently gave 1~2% performance improvement. We also observed an instance that a trivial refactoring caused a 2% regression and the matching can successfully recover the whole regression. Therefore, we'd like to turn it on by default for CSSPGO. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D154027	2023-06-29 11:18:51 -07:00
Amir Ayupov	b244a4c4c9	[profi][NFC] Get rid of afdo_detail::TypeMap Parametrize SampleProfileInference and SampleProfileLoaderBaseImpl by function type (Function/MachineFunction) instead of block type (BasicBlock/MachineBasicBlock). Move out specializations to appropriate locations. This change makes it possible to use GraphTraits instead of a custom TypeMap and make SampleProfileInference not dependent on LLVM types, paving the way for generalizing SampleProfileInference interfaces to BOLT IR types (BinaryFunction/BinaryBasicBlock) in stale profile matching (D144500). Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D152187	2023-06-06 13:48:37 -07:00
Hongtao Yu	b7d9322b49	[FS-AFDO] Load pseudo probe profile on MIR This change enables loading pseudo-probe based profile on MIR. Different from the IR profile loader, callsites are excluded from MIR profile loading since they are not assinged a FS discriminator. Using zero as the discriminator is not accurate and would undo the distribution work done by the IR loader based on pseudo probe distribution factor. We reply on block probes only for FS profile loading. Some refactoring is done to the IR profile loader so that `getProbeWeight` can be shared by both loaders. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D148584	2023-05-10 11:29:37 -07:00
wlei	ba3cbc7aad	fix a use-after-free failure	2023-04-28 15:55:51 -07:00
wlei	892daede72	[SamplePGO] Stale profile matching(part 2) Part 2 of https://reviews.llvm.org/D147456 Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular: - Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles. - Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped. The input of the algorithm: - IR locations: the anchor is the callee name of direct callsite. - Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`. The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor. For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`. ``` // The current build source code: int main() { 1. ... 2. foo(); 3. ... 4 ... 5. ... 6. bar(); 7. ... } ``` IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`. ``` ; The "stale" profile: main:350:1 1: 1 2: 3 3: 100 foo:100 4: 2 7: 2 8: 200 bar:200 9: 30 ``` Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]` Matching heuristic: - Match all the anchors in lexical order first. - Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor. So the example above is matched like: ``` [1, 2(foo), 3, 5, 6(bar), 7] \| \| \| \| \| \| [1, 2, 3(foo), 4, 7, 8(bar), 9] ``` 3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`. The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9]. For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`) Clang-self build benchmark: Current build version: clang-10 The profiled version: clang-9 Results compared to a refresh profile(collected profile on clang-10) and to be fair, we invalidated new functions' profiles(both refresh and stale profile use the same profile list). 1) Regression to using refresh profile with this off : -3.93% 2) Regression to using refresh profile with this on : -1.1% So this algorithm can recover ~72% of the regression. Internal(Meta) large-scale services. we saw one real instance of a 3 week stale profile., it delivered a ~1.8% win. Notes or future work: - Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number are shared the LineLocation structure. - The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future. - Profile call target name are only available when the call is hit by samples, the missing anchor might mislead the matching, this can be mitigated in llvm-profgen to generate the call target for the zero samples. - This doesn't handle function name mismatch, we plan to solve it in future. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D147545	2023-04-28 13:07:32 -07:00
wlei	a98d6a11ea	[SamplePGO] Stale profile matching(part 1) AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary. This patch is the part 1 of stale profile matching, summary of the implementation: - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile). - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`. - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location. - Added a new switch `--salvage-stale-profile`. - Some refactoring for the staleness detection. Test case is in part 2 with the matching algorithm. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D147456	2023-04-28 13:07:32 -07:00
Bjorn Pettersson	a20f7efbc5	Remove several no longer needed includes. NFCI Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer has support for the legacy PM.	2023-04-17 13:54:19 +02:00
wlei	339b8a0019	[AutoFDO] Use flattened profiles for profile staleness metrics For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching. Example for profile flattening: ``` Original profile: _Z3bazi:20301:1000 1: 1000 3: 2000 5: inline1:1600 1: 600 3: inline2:500 1: 500 Flattened profile: _Z3bazi:18701:1000 1: 1000 3: 2000 5: 600 inline1:600 inline1:1100:600 1: 600 3: 500 inline2: 500 inline2:500:500 1: 500 ``` This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D146452	2023-03-30 11:05:10 -07:00
Arthur Eubanks	fa6ea7a419	[AlwaysInliner] Make legacy pass like the new pass The legacy pass is only used in AMDGPU codegen, which doesn't care about running it in call graph order (it actually has to work around that fact). Make the legacy pass a module pass and share code with the new pass. This allows us to remove the legacy inliner infrastructure. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D146446	2023-03-21 11:04:22 -07:00
Arthur Eubanks	eecb8c5f06	[SampleProfile] Use LazyCallGraph instead of CallGraph The function order in some tests had to be changed because they relied on ordering of functions returned in an SCC which is consistent but unspecified.	2023-03-20 13:43:54 -07:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
Steven Wu	516e301752	[NFC][Profile] Access profile through VirtualFileSystem Make the access to profile data going through virtual file system so the inputs can be remapped. In the context of the caching, it can make sure we capture the inputs and provided an immutable input as profile data. Reviewed By: akyrtzi, benlangmuir Differential Revision: https://reviews.llvm.org/D139052	2023-02-01 09:25:02 -08:00
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
wlei	97e2aeab71	[AutoFDO] Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness Fix two issues for profile staleness report. 1) It should be more accurate to use the sum of all entry count(`getHeadSamplesEstimate`) for the callsite samples than the total samples, since even the top-level callsite is mismatched, it does affect the inlining but it can still be merged into base profile and used later. 2) I accidentally missed to persist the num of mismatched callsite into binary. Also added the asm testing to test the decoding of the section. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D140063	2022-12-15 11:21:18 -08:00
Fangrui Song	75801e3b45	Transforms/IPO: llvm::Optional => std::optional	2022-12-05 07:07:19 +00:00
Fangrui Song	89fae41ef1	[IR] llvm::Optional => std::optional Many llvm/IR/* files have been migrated by other contributors. This migrates most remaining files.	2022-12-05 04:13:11 +00:00
Vasileios Porpodas	8a1ccb8ae0	[NFC] Removed call to getInstList() from range loops on BBs. Differential Revision: https://reviews.llvm.org/D138605	2022-11-29 17:33:10 -08:00
wlei	47b0758049	[SampleFDO] Persist profile staleness metrics into binary With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system. For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(`-persist-profile-staleness`) In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D136698	2022-11-09 22:34:33 -08:00
wlei	63c27c5c75	Use getCanonicalFnName for callee name	2022-10-27 13:14:47 -07:00
wlei	d6a0585dd1	[SampleFDO] Compute and report profile staleness metrics When a profile is stale and profile mismatch could happen, the mismatched samples are discarded, so we'd like to compute the mismatch metrics to quantify how stale the profile is, which will suggest user to refresh the profile if the number is high. Two sets of metrics are introduced here: - (Num_of_mismatched_funchash/Total_profiled_funchash), (Samples_of_mismached_func_hash / Samples_of_profiled_function) : Here it leverages the FunctionSamples's checksums attribute which is a feature of pseudo probe. When the source code CFG changes, the function checksums will be different, later sample loader will discard the whole functions' samples, this metrics can show the percentage of samples are discarded due to this. - (Num_of_mismatched_callsite/Total_profiled_callsite), (Samples_of_mismached_callsite / Samples_of_profiled_callsite) : This shows how many mismatching for the callsite location as callsite location mismatch will affect the inlining which is highly correlated with the performance. It goes through all the callsite location in the IR and profile, use the call target name to match, report the num of samples in the profile that doesn't match a IR callsite. This is implemented in a new class(SampleProfileMatcher) and under a switch("--report-profile-staleness"), we plan to extend it with a fuzzy profile matching feature in the future. Reviewed By: hoy, wenlei, davidxl Differential Revision: https://reviews.llvm.org/D136627	2022-10-26 21:06:52 -07:00
Kazu Hirata	00874c48ea	[IPO] Reorder parameters of InlineFunction (NFC) With the recent addition of new parameter MergeAttributes (D134117), callers need to specify several default parameters before getting to specify the new parameter. This patch reorders the parameters so that callers do not have to specify as many default parameters. Differential Revision: https://reviews.llvm.org/D134125	2022-09-20 09:09:38 -07:00
Kazu Hirata	284f0397e2	[Transforms] Merge function attributes within InlineFunction (NFC) In the past, we've had a bug resulting in a compiler crash after forgetting to merge function attributes (D105729). This patch teaches InlineFunction to merge function attributes. This way, we minimize the "time" when the IR is valid, but the function attributes are not. Differential Revision: https://reviews.llvm.org/D134117	2022-09-17 23:10:23 -07:00
Kazu Hirata	56ea4f9bd3	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-27 21:21:02 -07:00
Paul Kirth	3155e3070c	[llvm][misexpect] Re-enable MisExpect for SampleProfiling MisExpect was occasionally crashing under SampleProfiling, due to a division by zero. We worked around that in D124302 by changing the assert to an early return. This patch is intended to add a test case for the crashing scenario and re-enable MisExpect for SampleProfiling. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124481	2022-08-26 20:24:10 +00:00
Kazu Hirata	50724716cd	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-14 12:51:58 -07:00
Arthur Eubanks	a3ac1cfaed	[SampleProfile] Fix non-determinism in promoteMergeNotInlinedContextSamples() We're seeing non-determinism with loading sample profiles. It seems to be related to the order in which we merge FunctionSamples in promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over NonInlinedCallSites in the order entries were inserted. Reviewed By: wenlei, davidxl Differential Revision: https://reviews.llvm.org/D131592	2022-08-12 10:13:25 -07:00
Mircea Trofin	7b81a81d5f	[NFC] FunctionSamples::getEntrySamples -> getHeadSamplesEstimate The name `getEntrySamples` was misleading for 2 reasons. One, it's close in name to `Function::getEntryCount`, but the equivalent here is `getHeadSamples`; second, as opposed to the other get* APIs in `FunctionSamples`, it performs an estimate/heuristic rather than just retrieving raw data (or a non-heuristic derivate off that data, like `getMaxCountInside`) The new name should more clearly communicate its intent; and, being close (in name) to `getHeadSamples`, it should allow the reader discover the relation between them. Also updated the doc comments for both `getHeadSamples[Estimate]` so a reader may better understand the relation between them. Differential Revision: https://reviews.llvm.org/D130281	2022-07-22 09:17:59 -07:00
Fangrui Song	dd5e3f0e27	[LegacyPM] Remove SampleProfileLoaderLegacyPass Following recent changes removing non-core features of the legacy PM/optimization pipeline (e.g. PGO), remove SamplePGO.	2022-07-17 12:09:46 -07:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
wlei	7e86b13c63	[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner. One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node. Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic. Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127031	2022-06-27 23:22:21 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Mingming Liu	e0d069598b	[Inline] Annotate inline pass name with link phase information for analysis. The annotation is flag gated; flag is turned off by default. Differential Revision: https://reviews.llvm.org/D125495	2022-06-24 10:06:43 -07:00
Mingming Liu	bc856eb3fc	[SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information. Differential Revision: https://reviews.llvm.org/D126833	2022-06-22 17:00:53 -07:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Hongtao Yu	3113e5bb52	[CSSPGO] Relax size limitation for priority inlining with preinlined profile As a follow-up to D124632, I'm turning on unlimited size caps for inlining with preinlined profile. It should be safe as a preinlined profile has "bounded" inline contexts. No noticeable size or perf delta was seen with two of our internal large services, but I think this is still a good change to be consistent with the other case. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124793	2022-05-03 18:43:07 -07:00
Hongtao Yu	e95ae395aa	[CSSPGO][NFC] Replace SampleProfileLoader::ProfileIsCS with FunctionSamples::ProfileIsCS. The two fields have the same meaning. Their values come from the reader. Therefore I'm removing one. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124788	2022-05-03 18:32:37 -07:00
Hongtao Yu	bdb8c50a1c	[CSSPGO] Turn on priority inlining for probe-only profile We have seen that the prioirty inliner delivered on-par performance with the old inliner for probe-only CSSPGO profile, as long as without a size budget. I'm turning on the priority inliner for probe-only profile by default. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124632	2022-04-29 17:31:56 -07:00
Hongtao Yu	e36786d15f	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602	2022-04-29 17:03:52 -07:00
Paul Kirth	4683a2effa	[llvm][misexpect] Avoid division by 0 when using sample profiling MisExpect diagnostics should not prevent compilation from succeeding, and the assertion is insufficient to prevent division by zero in release builds. This patch addresses that by replacing the assert with an early return. Additionally, it disables MisExpect diagnostics when using sample profiling, since this is the only known case where this error has manifested. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D124302	2022-04-22 22:48:00 +00:00
Paul Kirth	bac6cd5bf8	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-04-19 21:23:48 +00:00
Jorge Gorbe Moya	fc7573f29c	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit 46774df307159444d65083c2fd82f8574f0ab1d9.	2022-03-31 14:54:41 -07:00
Paul Kirth	46774df307	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-03-31 17:38:21 +00:00
Paul Kirth	90cb325abd	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit 2add3fbd976d7b80a3a7fc14ef0deb9b1ca6beee.	2022-03-29 06:20:30 +00:00
Paul Kirth	2add3fbd97	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D115907	2022-03-28 23:30:04 +00:00
Hongtao Yu	7a316c0a1f	[CSSPGO] Turn on profi and ext-tsp when using probe-based profile. Probe-based profile leads to a better performance when combined with profi and ext-tsp block layout. I'm turning them on by default. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122442	2022-03-25 09:09:21 -07:00

1 2 3 4 5 ...

334 Commits