llvm-project

Author	SHA1	Message	Date
Mircea Trofin	bb6497ffa6	[BPI] Reuse the AsmWriter's BB naming scheme in BranchProbabilityPrinterPass (#73593 ) When using `BranchProbabilityPrinterPass`, if a BB has no name, we get pretty unusable information like `edge -> has probability...` (i.e. we have no idea what the vertices of that edge are). This patch uses `printAsOperand`, which uses the same naming scheme as `Function::dump`, so for example during debugging sessions, the IR obtained from a function and the names used by `BranchProbabilityPrinterPass` will match. A shortcoming is that `printAsOperand` will result in the numbering algorithm re-running for every edge and every vertex (when `BranchProbabilityPrinterPass` is run on a function). If, for the given scenario, this is a problem, we can revisit this subsequently. Another nuance is that the entry basic block will be numbered, which may be slightly confusing when it's anonymous, but it's easily identifiable - the first edge would have it as source (and the number should be easily recognizable)	2023-12-02 13:01:48 -08:00
William Junda Huang	683f2df6e5	[SampleProfile] Fix bug where remapper returns empty string and crashing Sample Profile loader (#71479) Normally SampleContext does not allow using an empty StringRef to construct an object; this is to prevent bugs when reading the profile. However, empty names may be emitted by a function whose name is intentionally set to empty, or by a bug in the remapper that returns an empty string. Regardless, converting it to FunctionId first will prevent the assert, and that assert check is unnecessary, which will be addressed in another patch	2023-11-10 21:38:13 +00:00
Matthias Braun	e3cf80c5c1	BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that: * Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room. * Spread the difference between hottest/coldest block as much as possible to increase precision. * If the hot/cold spread cannot be represented, lose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.	2023-10-24 20:27:39 -07:00
Fangrui Song	001af0f894	[MC] Actually make .pseudoprobe created sections deterministic Fix a18ee8b7c95c6dfa410c6acaaf8cffcfde1220b5 to use a comparator that actually works: assign an ordinal to registered section.	2023-09-20 22:41:28 -07:00
HaohaiWen	954979d681	Reland [InlineCost] Enable the cost benefit analysis for Sample PGO (#66457 ) Enables the cost-benefit-analysis-based inliner by default if we have sample profile. No extra fix is required.	2023-09-21 12:44:24 +08:00
Haohai Wen	486fc81583	Revert "[InlineCost] Enable the cost benefit analysis for Sample PGO (#66457 )" This reverts commit 2f2319cf2406d9830a331cbf015881c55ae78806.	2023-09-21 10:30:39 +08:00
HaohaiWen	2f2319cf24	[InlineCost] Enable the cost benefit analysis for Sample PGO (#66457 ) Enables the cost-benefit-analysis-based inliner by default if we have sample profile.	2023-09-21 09:21:55 +08:00
Fangrui Song	a18ee8b7c9	[MC] Make .pseudo_probe created sections deterministic after D91878 MCPseudoProbeSections::emit iterates over MCProbeDivisions and creates sections. When the map key is MCSymbol *, the iteration order is not stable. The underlying BumpPtrAllocator largely decreases the flakiness. That said, two elements may sit in two different allocations from BumpPtrAllocator, with an unpredictable order. Under tcmalloc, llvm/test/Transforms/SampleProfile/pseudo-probe-emit.ll fails about 7 times per 1000 runs.	2023-09-20 18:11:14 -07:00
Simon Pilgrim	e6b85c3027	[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED) Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future. Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test Differential Revision: https://reviews.llvm.org/D158068	2023-09-13 12:33:39 +01:00
Hongtao Yu	6b856abc6f	[PseudoProbe] Use probe id as the base dwarf discriminator for callsites (#65685 ) With `-fpseudo-probe-for-profiling`, the dwarf discriminator for a callsite will be overwritten to pseudo probe related information for that callsite. The probe information is encoded in a special format (i.e., with all lowest three digits be one) in order to be distinguished from regular dwarf discriminator. The special encoding format will be decoded to zero by the regular discriminator logic. This means all callsites would have a zero discriminator in both the sample profile and the compiler, for classic AutoFDO. This is inconvenient in that no decent classic AutoFDO can be generated from a pseudo probe build. I'm mitigating the issue by allowing callsite probe id to be used as the base dwarf discriminator for classic AutoFDO, since probe id is also unique and can be used to differentiate callsites on the same source line.	2023-09-08 09:49:54 -07:00
wlei	4bb6bbb9bf	[CSSPGO] Skip reporting staleness metrics for imported functions Accumulating the staleness metrics at pre-link time is less accurate than doing it at post-link time (assuming we use the offline profile mismatch as the baseline). The reason is that there are duplicated reports for the same functions: for example, one template function could be included in multiple TUs, but at post-thin-link time only one function is kept (linkonce_odr) and the others are marked as available-externally functions. Hence, this change skips reporting the metrics for imported functions (available-externally). I saw that the post-link number is now very close to the offline number (dump the mismatched functions and count the metrics offline based on the entire profile), slightly smaller than the offline number due to some missing inlined functions. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D156725	2023-08-30 18:00:23 -07:00
wlei	3365cd4544	[CSSPGO] Compute checksum mismatch recursively on nested profile Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D158900	2023-08-30 18:00:23 -07:00
wlei	62a3f6c96e	[CSSPGO] Retire FlattenProfileForMatching - Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors. - Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile) Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D158891	2023-08-30 18:00:23 -07:00
wlei	062af2e763	[CSSPGO] Support stale profile matching for LTO Because callsites can be optimized out by inlining at pre-link time, we don't have those original call targets in the IR at LTO time. Additionally, the inlined code doesn't actually belong to the original function, so the IR locations or pseudo probes parsed from it are incorrect and could mislead the matching later. This change adds support for extracting the original IR location info from the inlined code. Specifically, it makes sure to skip all the inlined code that doesn't belong to the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target (name). Measured on some stale profile instances, all showed some perf improvements. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D156722	2023-08-30 18:00:23 -07:00
wlei	148cceb0d6	[CSSPGO] Refactoring SampleProfileMatcher::runOnFunction - rename `IRLocation` --> `IRAnchors`, `ProfileLocation` --> `ProfileAnchors` - reorganize runOnFunction, factoring out the code that finds IR anchors into `findIRAnchors` - introduce a new function `findProfileAnchors` to populate the profile-related anchors; the result is saved into `ProfileAnchors` and later used for both the mismatch report and matching, which avoids parsing `getBodySamples` and `getCallsiteSamples` multiple times. - move the `MatchedCallsiteLocs` handling from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics are computed in one function. - move all matching-related code into `runStaleProfileMatching`, and move all mismatch reporting into `countProfileMismatches` Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D158817	2023-08-30 18:00:23 -07:00
Sameer Sahasrabuddhe	f7031c41ec	[NFC] strengthen some CHECK-NOT lines The affected lit tests failed when they were run in a path that contained the word "call". CHECK-NOT lines that were supposed to match only the IR ended up matching the path printed in the output. Fixed this by checking for "call void" instead.	2023-08-07 16:50:03 +05:30
wlei	bfefeeb139	[SamplePGO] Fix ICE that callee samples returns null while finding import functions We found that in a special condition, the input callee `Samples` is null for `findExternalInlineCandidate`, which caused an ICE. In some rare cases, call instruction could be changed after being pushed into inline candidate queue, this is because earlier inlining may expose constant propagation which can change indirect call to direct call. When this happens, we may fail to find matching function samples for the candidate later(for example if the profile is stale), even if a match was found when the candidate was enqueued. See this reduced program: file1.c: ``` int bar(int x); int(foo())() { return bar; }; void func() { int (fptr)(int); fptr = foo(); a += (fptr)(10); } ``` file2.c: ``` int bar(int x) { return x + 1;} ``` The two CALL: `foo` and `(ptr)` are pushed into the queue at the beginning, say `foo` is hotter and popped first for inlining. During the inlining of `foo`, it performs the constant propagation for the function pointer `bar` and then changed `(ptr)` to a direct call `bar(..)`. Note that at this time, `(ptr)/bar` is still in the queue, later while it's popped out for inlining, it use the a different target name(bar) to look for the callee samples. At the same time, if the profile is stale and the new function is different from the old function in the profile, then this led the return of the null callee sample. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D154637	2023-07-07 14:56:21 -07:00
Fangrui Song	2cb8d5ca3a	[Pseudo Probe] Do not place functions in nodeduplicate COMDATs For a function not in an IR COMDAT, currently we place it into a nodeduplicate IR COMDAT so that its text section and its associated .pseudo_probe section will be in the same section group, which can be retained or discarded by the linker as a unit. However, the section group wastes space. After D153189 uses SHF_LINK_ORDER to ensure a .pseudo_probe section will be discarded when its associated text section is discarded, we can remove the nodeduplicate IR change. In the following example, the .pseudo_probe associated with .text.f is discarded as expected. ``` clang -c -ffunction-sections -fpseudo-probe-for-profiling -xc =(printf 'void _start(){} void f(){}') -o a.o ld.lld --gc-sections --print-gc-sections a.o ``` Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D153191	2023-06-17 15:40:20 -07:00
Fangrui Song	62d8614223	[Pseudo Probe] Make .pseudo_probe GC-able * Add the SHF_LINK_ORDER flag so that the .pseudo_probe section is discarded when the associated text section is discarded. * Add unique ID so that with `clang -ffunction-sections -fno-unique-section-names`, there is one separate .pseudo_probe for each text section (disambiguated by `.section ....,unique,id` in assembly) The changes allow .pseudo_probe GC even if we don't place instrumented functions in an IR comdat (see `getOrCreateFunctionComdat` in SampleProfileProbe.cpp). Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D153189	2023-06-16 23:46:36 -07:00
Arthur Eubanks	3e39cfe5b4	Revert "Revert "InstSimplify: Require instruction be parented"" This reverts commit 0c03f48480f69b854f86d31235425b5cb71ac921. Going to fix forward size regression instead due to more dependent patches needing to be reverted otherwise.	2023-06-16 13:53:31 -07:00
Arthur Eubanks	0c03f48480	Revert "InstSimplify: Require instruction be parented" This reverts commit 1536e299e63d7788f38117b0212ca50eb76d7a3b. Causes large binary size regressions, see comments on https://reviews.llvm.org/rG1536e299e63d7788f38117b0212ca50eb76d7a3b.	2023-06-16 11:24:29 -07:00
Alan Zhao	d6b4f6786b	Revert "Revert "InstSimplify: Require instruction be parented"" This reverts commit 00264eac4d0938ae8a0826da38e4777be269124c. Reason: caused a bunch of bots to break	2023-06-16 10:58:54 -07:00
Alan Zhao	00264eac4d	Revert "InstSimplify: Require instruction be parented" This reverts commit 1536e299e63d7788f38117b0212ca50eb76d7a3b. Reason: causes a regression in the inliner (see https://crbug.com/1454531 and https://reviews.llvm.org/rG1536e299e63d7788f38117b0212ca50eb76d7a3b#1217141)	2023-06-16 10:36:49 -07:00
Matt Arsenault	1536e299e6	InstSimplify: Require instruction be parented Unlike every other analysis and transform, simplifyInstruction permitted operating on instructions which are not inserted into a function. This created an edge case no other code needs to really worry about, and limited transforms in cases that can make use of the context function. Only the inliner and a handful of other utilities were making use of this, so just fix up these edge cases. Results in some IR ordering differences since cloned blocks are inserted eagerly now. Plus some additional simplifications trigger (e.g. some add 0s now folded out that previously didn't).	2023-06-02 18:14:28 -04:00
Hongtao Yu	23da210624	[PseudoProbe] Do not force the callsite debug loc to inlined probes from __nodebug__ functions. For pseudo probes we would like to keep their original dwarf discriminator (either a zero or null) until the first FS-discriminator pass. The inliner violates that, since it assigns the callsite's debug location to inlinee instructions that have no debug info. This is being disabled in this patch. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D151568	2023-05-26 13:00:16 -07:00
Shengchen Kan	c81a121f3f	Revert "Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI"" This reverts commit cb16b33a03aff70b2499c3452f2f817f3f92d20d. In fact, the test https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 already passed after 5586bc539acb26cb94e461438de01a5080513401	2023-05-19 22:21:56 +08:00
Hans Wennborg	cb16b33a03	Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI" This caused compiler assertions, see comment on https://reviews.llvm.org/D150107. This also reverts the dependent follow-up change: > [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI > > This is follow-up of D150107. > > In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be > shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp. > > Differential Revision: https://reviews.llvm.org/D150949 This reverts commit 2ef8ae134828876ab3ebda4a81bb2df7b095d030 and 5586bc539acb26cb94e461438de01a5080513401.	2023-05-19 14:43:33 +02:00
Shengchen Kan	5586bc539a	[X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI This is follow-up of D150107. In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp. Differential Revision: https://reviews.llvm.org/D150949	2023-05-19 18:22:30 +08:00
Hongtao Yu	b7d9322b49	[FS-AFDO] Load pseudo probe profile on MIR This change enables loading pseudo-probe based profile on MIR. Different from the IR profile loader, callsites are excluded from MIR profile loading since they are not assigned a FS discriminator. Using zero as the discriminator is not accurate and would undo the distribution work done by the IR loader based on pseudo probe distribution factor. We rely on block probes only for FS profile loading. Some refactoring is done to the IR profile loader so that `getProbeWeight` can be shared by both loaders. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D148584	2023-05-10 11:29:37 -07:00
Hongtao Yu	9272d0f079	[PseudoProbe] Clean up dwarf discriminator and avoid duplicating factor. A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted. I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D148569	2023-05-10 11:26:23 -07:00
William Huang	d38d6ca179	[llvm-profdata] Deprecate Compact Binary Sample Profile Format Remove support for compact binary sample profile format Reviewed By: davidxl, wenlei Differential Revision: https://reviews.llvm.org/D149400	2023-05-01 17:10:08 +00:00
wlei	892daede72	[SamplePGO] Stale profile matching(part 2) Part 2 of https://reviews.llvm.org/D147456 Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular: - Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles. - Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped. The input of the algorithm: - IR locations: the anchor is the callee name of direct callsite. - Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`. The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor. For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`. ``` // The current build source code: int main() { 1. ... 2. foo(); 3. ... 4 ... 5. ... 6. bar(); 7. ... } ``` IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`. ``` ; The "stale" profile: main:350:1 1: 1 2: 3 3: 100 foo:100 4: 2 7: 2 8: 200 bar:200 9: 30 ``` Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]` Matching heuristic: - Match all the anchors in lexical order first. - Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor. So the example above is matched like: ``` [1, 2(foo), 3, 5, 6(bar), 7] \| \| \| \| \| \| [1, 2, 3(foo), 4, 7, 8(bar), 9] ``` 3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`. The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9]. 
For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`) Clang-self build benchmark: Current build version: clang-10 The profiled version: clang-9 Results compared to a refresh profile(collected profile on clang-10) and to be fair, we invalidated new functions' profiles(both refresh and stale profile use the same profile list). 1) Regression to using refresh profile with this off : -3.93% 2) Regression to using refresh profile with this on : -1.1% So this algorithm can recover ~72% of the regression. Internal(Meta) large-scale services. we saw one real instance of a 3 week stale profile., it delivered a ~1.8% win. Notes or future work: - Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number are shared the LineLocation structure. - The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future. - Profile call target name are only available when the call is hit by samples, the missing anchor might mislead the matching, this can be mitigated in llvm-profgen to generate the call target for the zero samples. - This doesn't handle function name mismatch, we plan to solve it in future. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D147545	2023-04-28 13:07:32 -07:00
wlei	339b8a0019	[AutoFDO] Use flattened profiles for profile staleness metrics For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching. Example for profile flattening: ``` Original profile: _Z3bazi:20301:1000 1: 1000 3: 2000 5: inline1:1600 1: 600 3: inline2:500 1: 500 Flattened profile: _Z3bazi:18701:1000 1: 1000 3: 2000 5: 600 inline1:600 inline1:1100:600 1: 600 3: 500 inline2: 500 inline2:500:500 1: 500 ``` This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D146452	2023-03-30 11:05:10 -07:00
wlei	8ab9eebb18	[Pseudo Probe] Add the test for probe desc Added a test to https://reviews.llvm.org/D146657, make sure the guid and name are computed using the debug info name. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D146826	2023-03-24 12:43:16 -07:00
Arthur Eubanks	eecb8c5f06	[SampleProfile] Use LazyCallGraph instead of CallGraph The function order in some tests had to be changed because they relied on ordering of functions returned in an SCC which is consistent but unspecified.	2023-03-20 13:43:54 -07:00
Yuanfang Chen	9aae408d55	[NFC] fix typo `funciton` -> `function` credits to @jmagee	2023-03-10 18:05:25 -08:00
Hongtao Yu	c38c8d6743	[PseudoProbe] Refactoring a test As titled. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D144137	2023-02-15 14:07:51 -08:00
Hongtao Yu	eddec9de44	[Pseudo probe] Duplicate probes in vectorized loop body. Previously, pseudo probes were dropped out of a vectorized loop body during loop vectorization. This can result in the samples of the loop entry being used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicitly allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", the vectorizer will duplicate it by the number of vector lanes. For one internal service, I'm seeing the change increase the size of the .pseudoprobe section by 0.7%, which should account for around 0.2% of the whole binary size. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D144066	2023-02-15 10:18:08 -08:00
Hongtao Yu	950487bddf	[Pseudo Probe] Do not instrument EH blocks. This change avoids inserting probes to EH blocks. Pseudo probe can prevent block merging when probes in the blocks look different. This has a chained effect to passes incurring exponential IR growth (such as jump threading) and as a consequence the compilation may time out. Not inserting probes to EH blocks could mitigate the issue. Another benefit is that both IR size and binary size are smaller. Since EH blocks are usually cold, the change should have minimal impact to profile quality. Testing: Out of two internal large benchmarks, no perf impact seen. 1% size savings to both the `text` and the `pseudo_probe` section. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D142747	2023-01-30 13:26:56 -08:00
spupyrev	45b155924e	[BOLT] using jump weights in profi We want to use profile inference (profi) in BOLT for stale profile matching. This is the second change for existing usages of profi (e.g., CSSPGO): (i) Added the ability to provide (estimated) jump weights for the algorithm. The goal of the algorithm is to create a valid control flow for a given function (that is, one in which incoming counts equal outgoing counts for every basic block while minimally modifying the original input block and jump weights). The input jump weights will be provided based on collected LBR profiles in BOLT. (ii) Added the corresponding options to ProfiParams. (iii) Slightly modified / simplified the construction of the flow network in profi so as it utilizes fewer auxiliary nodes. This is done by introducing parallel edges to the network (which is supported by MMF) and reduces the size of the network from 3\|V\| to 2\|V\|, where \|V\| is the number of basic blocks in the function. Inference (profile quality) impact: The diff is supposed to be a no-op for the inferred counts. However, our implementation of MCF is not fully deterministic and might return different results depending on the input network model. Since we changed the model construction, there are a few differences in comparison to the original implementation. I checked manually on an internal benchmark and see a minor difference (+/- 1 count for certain basic blocks) in just a dozen of instances (out of 10000+ input functions). Hence, the diff is highly unlikely to have an impact for existing prod workloads. Runtime impact: I measure up to 10% speedup for block-only (ie CSSPGO/AutoFDO) inference and up to 50% speedup for block+jump inference (ie BOLT) in comparison to the original unoptimized version. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D139870	2023-01-11 14:34:43 -08:00
spupyrev	61eb12e1f4	[BOLT] introducing profi params We want to use profile inference (profi) in BOLT for stale profile matching. To this end, I am making a few changes modifying the interface of the algorithm. This is the first change for existing usages of profi (e.g., CSSPGO): - introducing an object holding the algorithmic parameters; - some renaming of existing options; - dropped unused option, SampleProfileInferEntryCount, as we don't plan to change its default value; - no changes in the output / tests. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D134756	2023-01-09 12:03:28 -08:00
Nikita Popov	25450788a4	[SampleProfile] Avoid branch on undef UB in tests (NFC)	2023-01-03 14:23:25 +01:00
Nikita Popov	bf5f05e3fe	[SampleProfile] Regenerate test checks (NFC)	2022-12-22 16:24:03 +01:00
wlei	97e2aeab71	[AutoFDO] Use getHeadSamplesEstimate instead of getTotalSamples to compute profile callsite staleness Fix two issues for profile staleness report. 1) It should be more accurate to use the sum of all entry count(`getHeadSamplesEstimate`) for the callsite samples than the total samples, since even the top-level callsite is mismatched, it does affect the inlining but it can still be merged into base profile and used later. 2) I accidentally missed to persist the num of mismatched callsite into binary. Also added the asm testing to test the decoding of the section. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D140063	2022-12-15 11:21:18 -08:00
Bjorn Pettersson	3528e63d89	[test] Remove duplicate RUN lines in Transform tests	2022-12-08 11:47:16 +01:00
Roman Lebedev	b2c2d49edc	[NFC] Port all SampleProfile tests to `-passes=` syntax	2022-12-08 02:38:50 +03:00
Bjorn Pettersson	a11faeed44	[test] Switch to use -passes syntax in various test cases	2022-12-01 21:25:59 +01:00
Arthur Eubanks	4b3202e639	[opt] Remove "new-pm" from some cl::opt names	2022-11-28 11:00:45 -08:00
Matt Arsenault	0d2271bb44	SampleProfile: Convert tests to opaque pointers syntax.ll required removing some diffs that apparently looked like pointers in message checking.	2022-11-27 21:27:50 -05:00
Matt Arsenault	5e49649d16	SampleProfile: Don't use anonymous values in test These interfered with converting the test to opaque pointers.	2022-11-27 09:40:00 -05:00
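
The anchor-based matching heuristic described in commit 892daede72 ([SamplePGO] Stale profile matching(part 2)) can be sketched roughly as follows. This is a minimal Python illustration and not the LLVM implementation: locations are plain integers rather than LLVM's location structures, the name `match_stale_locations` and its argument layout are hypothetical, and the choice that the middle element of an odd-sized run stays with the start anchor is an assumption.

```python
def match_stale_locations(ir_locs, profile_anchors):
    """Map IR locations to stale-profile locations using callee names as anchors.

    ir_locs: list of (location, callee_name_or_None), sorted by location.
    profile_anchors: list of (location, call_target_name), sorted by location.
    Returns {ir_location: profile_location} for locations whose position changed.
    (Hypothetical helper; simplified integer locations are an assumption.)
    """
    mapping = {}
    delta = 0         # offset implied by the most recently matched anchor
    pending = []      # non-anchor IR locations seen since the last matched anchor
    next_anchor = 0   # profile anchors before this index are already used

    def record(ir_loc, prof_loc):
        if ir_loc == prof_loc:
            mapping.pop(ir_loc, None)   # unchanged locations need no entry
        else:
            mapping[ir_loc] = prof_loc

    for loc, callee in ir_locs:
        matched = None
        if callee is not None:
            # Anchors are matched in lexical order, so only look forward.
            for i in range(next_anchor, len(profile_anchors)):
                if profile_anchors[i][1] == callee:
                    matched = i
                    break
        if matched is None:
            # Non-anchor: tentatively match it relative to the start anchor.
            pending.append(loc)
            record(loc, loc + delta)
            continue
        next_anchor = matched + 1
        delta = profile_anchors[matched][0] - loc
        # Re-match the second half of the pending run relative to this end anchor.
        for ir_loc in pending[(len(pending) + 1) // 2:]:
            record(ir_loc, ir_loc + delta)
        pending.clear()
        record(loc, loc + delta)
    return mapping


# Example taken from the commit message above.
ir_locs = [(1, None), (2, "foo"), (3, None), (5, None), (6, "bar"), (7, None)]
profile_anchors = [(3, "foo"), (8, "bar")]
result = match_stale_locations(ir_locs, profile_anchors)
print(result)  # {2: 3, 3: 4, 5: 7, 6: 8, 7: 9}
```

For the example in the commit message, the sketch reproduces the stated mapping `[2->3, 3->4, 5->7, 6->8, 7->9]`: location 3 follows anchor `foo` and location 5 follows anchor `bar`.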

359 Commits