llvm-project

Author	SHA1	Message	Date
Alexis Engelke	efcd3b6108	[IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596 ) Refactor remaining parts of Transforms apart from Scalar and Utils.	2026-03-14 17:49:00 +00:00
HighW4y2H3ll	e615400861	[SampleProfile] Skip counting mismatched weak symbols during profile loading (#185514 ) Weak symbols may be overridden during linking, and this may cause profile mismatch when compiling the weak symbols, while the profile was created based on the overriding function. Skip counting the weak symbol while checking the mismatched function profiles to avoid false alarm on rejecting legit profiles.	2026-03-12 14:53:57 -07:00
serge-sans-paille	2095ea5b40	Remove unused <set> and <map> inclusion (#167175 )	2025-11-09 15:15:21 +00:00
Kazu Hirata	0028ef667a	[llvm] Remove unused local variables (NFC) (#167106 ) Identified with bugprone-unused-local-non-trivial-variable.	2025-11-08 07:41:07 -08:00
Nicolai Hähnle	11a4b2d950	Cleanup the LLVM exported symbols namespace (#161240 ) There's a pattern throughout LLVM of cl::opts being exported. That in itself is probably a bit unfortunate, but what's especially bad about it is that a lot of those symbols are in the global namespace. Move them into the llvm namespace. While doing this, I noticed some other variables in the global namespace and moved them as well.	2025-10-01 15:32:07 -07:00
Mircea Trofin	240b73e10f	[SimplifyCFG][PGO] Reuse existing `setBranchWeights` (#160629 ) The main difference between SimplifyCFG's `setBranchWeights` and the ProfDataUtils' is that the former doesn't propagate all-zero weights. That seems like a sensible thing to do, so updated the latter accordingly, and added a flag to control the behavior. Also moved to ProfDataUtils the logic fitting 64-bit weights to 32-bit. As side-effect, this fixes some profcheck failures.	2025-10-01 09:54:30 -07:00
Aiden Grossman	c91fa95fc7	[SampleProfile] Always use FAM to get ORE The split in this code path was left over from when we had to support the old PM and the new PM at the same time. Now that the legacy pass has been dropped, this simplifies the code a little bit and swaps pointers for references in a couple places. Reviewers: aeubanks, efriedma-quic, wlei-llvm Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/159858	2025-09-19 15:54:40 -07:00
Kazu Hirata	47b9837dad	[llvm] Use std::tie to implement comparison functors (NFC) (#141353 )	2025-05-24 09:37:32 -07:00
Kazu Hirata	3667f29dfd	[llvm] Use std::optional::value_or (NFC) (#140014 )	2025-05-15 07:18:55 -07:00
Owen Rodley	d3d856ad84	Clean up external users of GlobalValue::getGUID(StringRef) (#129644 ) See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context. This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class. This does the following: * Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit. * Where possible, migrate users to directly calling getGUID on a GlobalValue instance. * Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit. There are a few cases where neither of the above are possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.	2025-04-28 11:09:43 +10:00
Kazu Hirata	d8b078d550	[Transforms] Use llvm::append_range (NFC) (#133607 )	2025-03-29 18:57:50 -07:00
Jinjie Huang	c8b69c9076	[NFC][SampleFDO] Clean the unneeded field and the related loop (#132376 ) Clean the unneeded field 'TotalCollectedSamples' and the unnecessary loop. The field seems introduced in:https://reviews.llvm.org/D31952, and its uses were removed in: https://reviews.llvm.org/D19287, but this field and unnecessary calculation were not cleaned up. This patch will remove these unneeded codes.	2025-03-28 11:06:00 +08:00
Kazu Hirata	ad8d549505	[IPO] Avoid repeated hash lookups (NFC) (#132588 )	2025-03-22 22:25:25 -07:00
Lei Wang	068d0c0f4b	[CSSPGO] Turn on call-graph matching by default for CSSPGO (#125938 ) Tested call-graph matching on some of Meta's large services, it works to reuse some renamed function profiles, no negative perf or significant build speed regression observed. Turned it on by default for CSSPGO mode.	2025-02-06 11:54:59 -08:00
Jeremy Morse	81d18ad864	[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction's as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction's and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.	2025-01-27 16:27:54 +00:00
Ryan Mansfield	67efbd0bf1	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955 )	2025-01-08 11:07:23 +01:00
Haohai Wen	ccc8e45404	[PseudoProbe] Fix cleanup for pseudo probe after annotation (#119660 ) When using -sample-profile-remove-probe, pseudo probe desc should also be removed and dwarf discriminator for call instruction should be restored.	2024-12-13 17:05:03 +08:00
Lei Wang	bc1aa2863b	[SampleFDO] Support enabling sample loader pass in O0 mode (#113985 ) Add support for enabling sample loader pass in O0 mode(under `-fsample-profile-use`). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy. - Explicitly disable the sample loader inlining to ensure it only emits sampling annotation. - Use flattened profile for O0 mode. - Add the pass after `AddDiscriminatorsPass` pass to work with `-fdebug-info-for-profiling`.	2024-11-08 15:29:44 -08:00
Kazu Hirata	98ea1a81a2	[IPO] Remove unused includes (NFC) (#114716 ) Identified with misc-include-cleaner.	2024-11-03 13:48:55 -08:00
Antonio Frighetto	2ae968a0d9	[Instrumentation] Move out to Utils (NFC) (#108532 ) Utility functions have been moved out to Utils. Minor opportunity to drop the header where not needed.	2024-09-15 21:07:40 -07:00
Lei Wang	ce8c43fe27	Fix assertion of null pointer samples in inline replay mode (#99378 ) Fix https://github.com/llvm/llvm-project/issues/97108. In inline replay mode, `CalleeSamples` may be null and the order doesn't matter.	2024-07-18 10:16:44 -07:00
Lei Wang	18cdfa72e0	[SampleFDO] Stale profile call-graph matching (#95135) Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as `int foo()` --> `long foo()` ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable. This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile. Some noteworthy details: 1. Extend the LCS based CFG level matching to identify new function. - Extend to match function and profile have different name instead of the exact function name matching. This leverages LCS, i.e during the finding of callsite anchor matching, when two function name are different, try matching the functions instead of return. - In LCS, the equal function check is replaced by `functionMatchesProfile`. - Only try matching functions that are new functions(neither appears on each side). This reduces the matching scope as we don't need to match the originally matched function. 2. Determine the matching by call-site anchor similarity check. - A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to check the renaming for the possible <IRFunc, ProfFunc> pair, use the LCS(diff) matching to compute the equal set and we define: `Similarity = |equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if the similarity is above a threshold(`-func-profile-similarity-threshold`) 3. Process the matching in top-down function order - when a caller's is done matching, the new function names are saved for later use, using top-down order will maximize the reused results. - `ProfileNameToFuncMap` is used to save or cache the matching result. 4. Update the original profile at the end using `ProfileNameToFuncMap`. 5. Added a new switch --salvage-unused-profile to control this, default is false. Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function)	2024-07-17 10:33:00 -07:00
Mircea Trofin	ce03155a1b	[NFC] Coding style fixes: SampleProf (#98208 ) Also some control flow simplifications. Notably, this doesn't address `sampleprof_error`. I think the style there tries to match `std::error_category`. Also left `hash_value` as-is, because it matches what we do in Hashing.h	2024-07-09 14:35:49 -07:00
Kazu Hirata	3cf762b7b7	[Transforms] Migrate to a new version of getValueProfDataFromInst (#96380 )	2024-06-30 14:54:07 -07:00
Kazu Hirata	836ca5bbf7	[Transforms] Migrate to a new version of getValueProfDataFromInst (#95485 ) Note that the version of getValueProfDataFromInst that returns bool has been "deprecated" since: commit 1e15371dd8843dfc52b9435afaa133997c1773d8 Author: Mingming Liu <mingmingl@google.com> Date: Mon Apr 1 15:14:49 2024 -0700	2024-06-13 18:21:09 -07:00
Paul Kirth	294f3ce5dd	Reapply "[llvm][IR] Extend BranchWeightMetadata to track provenance o… (#95281) …f weights" #95136 Reverts #95060, and relands #86609, with the unintended code generation changes addressed. This patch implements the changes to LLVM IR discussed in https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032 In this patch, we add an optional field to MD_prof metadata nodes for branch weights, which can be used to distinguish weights added from llvm.expect* intrinsics from those added via other methods, e.g. from profiles or inserted by the compiler. One of the major motivations, is for use with MisExpect diagnostics, which need to know if branch_weight metadata originates from an llvm.expect intrinsic. Without that information, we end up checking branch weights multiple times in the case if ThinLTO + SampleProfiling, leading to some inaccuracy in how we report MisExpect related diagnostics to users. Since we change the format of MD_prof metadata in a fundamental way, we need to update code handling branch weights in a number of places. We also update the lang ref for branch weights to reflect the change.	2024-06-12 12:52:28 -07:00
Paul Kirth	607afa0b63	Revert "[llvm][IR] Extend BranchWeightMetadata to track provenance of weights" (#95060 ) Reverts llvm/llvm-project#86609 This change causes compile-time regressions for stage2 builds (https://llvm-compile-time-tracker.com/compare.php?from=3254f31a66263ea9647c9547f1531c3123444fcd&to=c5978f1eb5eeca8610b9dfce1fcbf1f473911cd8&stat=instructions:u). It also introduced unintended changes to `.text` which should be addressed before relanding.	2024-06-11 08:06:06 +02:00
Paul Kirth	c5978f1eb5	[llvm][IR] Extend BranchWeightMetadata to track provenance of weights (#86609 ) This patch implements the changes to LLVM IR discussed in https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032 In this patch, we add an optional field to MD_prof metadata nodes for branch weights, which can be used to distinguish weights added from `llvm.expect*` intrinsics from those added via other methods, e.g. from profiles or inserted by the compiler. One of the major motivations, is for use with MisExpect diagnostics, which need to know if branch_weight metadata originates from an llvm.expect intrinsic. Without that information, we end up checking branch weights multiple times in the case if ThinLTO + SampleProfiling, leading to some inaccuracy in how we report MisExpect related diagnostics to users. Since we change the format of MD_prof metadata in a fundamental way, we need to update code handling branch weights in a number of places. We also update the lang ref for branch weights to reflect the change.	2024-06-10 11:27:21 -07:00
William Junda Huang	5a23d31c50	[Sample Profile] Check hot callsite threshold when inlining a function with a sample profile (#93286 ) Currently if a callsite is hot as determined by the sample profile, it is unconditionally inlined barring invalid cases (such as recursion). Inline cost check should still apply because a function's hotness and its inline cost are two different things. For example if a function is calling another very large function multiple times (at different code paths), the large function should not be inlined even if its hot.	2024-05-28 16:41:53 -04:00
Nabeel Omer	686a206b26	[SampleProfileLoader] Fix integer overflow in generateMDProfMetadata (#90217 ) This patch fixes an integer overflow in the SampleProfileLoader pass. The issue occurs when weights are saturated and Profi isn't being used. This patch also adds a newline to a debug message to make it more readable.	2024-05-08 14:32:56 +01:00
Lei Wang	b7248d5363	[PseudoProbe] Add an option to remove pseudo probes after profile annotation (#90293 ) This can be used for testing perf overhead of pseudo-probe.	2024-04-29 09:27:33 -07:00
Lei Wang	1aceee7bb6	Remove unused variable (#88223 ) fix the CI	2024-04-09 19:25:08 -07:00
Lei Wang	1d99d7a6f8	[SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988 ) Move all the stale profile matching stuffs into new files so that it can be shared for unit testing.	2024-03-28 20:03:03 -07:00
Lei Wang	f8bab38b6d	[CSSPGO] Fix the issue of missing callee profile matches (#85715 ) Two fixes related to the callee/inlinee profile: 1. Fix the bug that the matching results are missing to distribute to the callee profiles (should be pass-by-reference). 2. Narrow imported function matching to checksum mismatched functions. More context: before we run matchings for all imported functions even checksums are matched, however, after we fix 1), we got a regression, it's likely due to the matching is not no-op for checksum matched function, so we want to make it consistent to only run matching for checksum mismatched (imported)functions. Since the metadata(pseudo_probe_desc) are dropped for imported function, we leverage the function attribute mechanism and add a new function attribute(`profile-checksum-mismatch`) to transfer the info from pre-link to post-link.	2024-03-27 22:27:22 -07:00
Lei Wang	2598aa67c8	[CSSPGO] Reject high checksum mismatched profile (#84097 ) Error out the build if the checksum mismatch is extremely high, it's better to drop the profile rather than apply the bad profile. Note that the check is on a module level, the user could make big changes to functions in one single module but those changes might not be performance significant to the whole binary, so we want to be conservative, only expect to catch big perf regression. To do this, we select a set of the "hot" functions for the check. We use two parameter(`hot-func-cutoff-for-staleness-error` and `min-functions-for-staleness-error`) to control the function selection to make sure the selected are hot enough and the num of function is not small. Tuned the parameters on our internal services, it works to catch big perf regression due to the high mismatch .	2024-03-27 11:14:21 -07:00
Lei Wang	12a2bc301f	[CSSPGO] Fix the issue of preinliner import function list (#85719 ) By design, when the nested profile is pre-inliner based, we should fully honor pre-inliner decision, fix it by setting threshold to zero. We observed a perf win on one internal service, no negative impact for other big services.	2024-03-19 16:50:48 -07:00
Lei Wang	c98da372cb	[CSSPGO] Compute and report profile matching recovered callsites and samples (#79090) This change adds the support to compute and report the staleness metrics after stale profile matching so that we can know how effective the fuzzy matching is, i.e. how many callsites and samples are recovered by the matching. Some implementation notes: - The function checksum mismatch metrics are not applicable here as they are function-level metrics, checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples. - Added two new counters `NumRecoveredCallsites`, `RecoveredCallsiteSamples` for this and removed `TotalCallsiteSamples` as we can now use the `TotalFuncHashSamples` as base, and renamed some counters. - In profile matching, we changed to use a state machine to represent the callsite's matching state changes. See the `MatchState` for the state, and used a new function `recordCallsiteMatchStates` to compute and record the callsite's match states changes before and after the matching; the result is compressed and saved into a `FuncCallsiteMatchStates` map for later counting use. - Changed the counting function to run on module-level and moved it to the end of the whole process(`computeAndReportProfileStaleness`). The reason is before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, which is more accurate.	2024-02-19 11:36:20 -08:00
Benjamin Kramer	9423e45987	[ProfileData] Copy CallTargetMaps a bit less. NFCI	2023-12-24 17:48:18 +01:00
Matthias Braun	cb4627d150	Add setBranchWeigths convenience function. NFC (#72446 ) Add `setBranchWeights` convenience function to ProfDataUtils.h and use it where appropriate.	2023-11-16 10:55:19 -08:00
William Junda Huang	683f2df6e5	[SampleProfile] Fix bug where remapper returns empty string and crashing Sample Profile loader (#71479) Normally SampleContext does not allow using an empty StringRef to construct an object, this is to prevent bugs reading the profile. However empty names may be emitted by a function whose name is intentionally set to empty, or a bug in the remapper that returns an empty string. Regardless, converting it to FunctionId first will prevent the assert, and that assert check is unnecessary, which will be addressed in another patch	2023-11-10 21:38:13 +00:00
William Junda Huang	ef0e0adccd	[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164 ) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In previous implementation, when a MD5 Sample Profile is read, the reader first converts the MD5 values to strings, and then create a StringRef as if the numerical strings are regular function names, and later on IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but it can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when being used in associative containers. ProfileFuncRef guarantees the same function name in string form or in MD5 form has the same hash value, which also fix a few issue in IPO passes where function matching/lookup only check for function name string, while returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec in average, and reading function offset table from 0.78s to 0.7s	2023-10-17 21:09:39 +00:00
Fangrui Song	111fcb0df0	[llvm] Fix duplicate word typos. NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 18:25:16 -07:00
wlei	f14a5ff635	[CSSPGO] Refactoring findIRAnchors Address feedback in https://reviews.llvm.org/D158817. Since `extractProbe` can be used for both callsite and BB probe, we can leverage this to unify the callsite handling code. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D159169	2023-08-31 16:25:47 -07:00
Jie Fu	3b51881dd5	[CSSPGO] Silence -Wunused-but-set-variable warning without asserts (NFC) /data/home/jiefu/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2189:8: error: variable 'IsFuncHashMismatch' set but not used [-Werror,-Wunused-but-set-variable] bool IsFuncHashMismatch = false; ^ 1 error generated.	2023-08-31 09:58:29 +08:00
wlei	4bb6bbb9bf	[CSSPGO] Skip reporting staleness metrics for imported functions Accumulating the staleness metrics from pre-link is less accurate than doing it from post-link time(assuming we use the offline profile mismatch as baseline), the reason is that there are some duplicated reports for the same functions, for example, one template function could be included in multiple TUs, but in post thin link time, only one function is kept(linkonce_odr) and others are marked as available-externally function. Hence, this change skips reporting the metrics for imported functions(available-externally). I saw the post-link number is now very close to the offline number(dump the mismatched functions and count the metrics offline based on the entire profile), slightly smaller than offline number due to some missing inlined functions. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D156725	2023-08-30 18:00:23 -07:00
wlei	3365cd4544	[CSSPGO] Compute checksum mismatch recursively on nested profile Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D158900	2023-08-30 18:00:23 -07:00
wlei	62a3f6c96e	[CSSPGO] Retire FlattenProfileForMatching - Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors. - Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile) Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D158891	2023-08-30 18:00:23 -07:00
wlei	062af2e763	[CSSPGO] Support stale profile matching for LTO As in pre-link time, callsites could be optimized out by inlining, we don't have those original call targets in the IR in LTO time. Additionally, the inlined code doesn't actually belong to the original function, the IR locations or pseudo probe parsed from it are incorrect and could mislead the matching later. This change adds the support to extract the original IR location info from the inlined code, specifically, it makes sure to skip all the inlined code that doesn't belong to the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target(name). Measured on some stale profile instances, all showed some perf improvements. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D156722	2023-08-30 18:00:23 -07:00
wlei	148cceb0d6	[CSSPGO] Refactoring SampleProfileMatcher::runOnFunction - rename `IRLocation` --> `IRAnchors`, `ProfileLocation` --> `ProfileAnchors` - reorganize runOnFunction, factor out the code that finds IR anchors into `findIRAnchors` - introduce a new function `findProfileAnchors` to populate the profile related anchors, the result is saved into `ProfileAnchors`, it's later used for both mismatch report and matching, this avoids parsing `getBodySamples` and `getCallsiteSamples` multiple times. - move the `MatchedCallsiteLocs` logic from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics report are computed in one function. - move all matching related code into `runStaleProfileMatching`, and move all mismatch reporting into `countProfileMismatches` Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D158817	2023-08-30 18:00:23 -07:00
William Huang	da2855c0ba	[SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples SampleProfileLoader::promoteMergeNotInlinedContextSample adds certain uninlined functions to the sample profile map (unordered_map, which is previously read from a profile file). This action may cause the map to be rehashed, invalidating all pointers to FunctionSamples used by many members of SampleProfileLoader, while the existing code did nothing to guard against that. This bug is theoretical since adding a few new functions to a large profile usually won't trigger a rehash, or even if there's a rehash std::unordered_map tries its best to expand its capacity in-place. This bug will trigger if the container type of sample profile map is changed to llvm::DenseMap or other implementation, such as in D147740, for SampleProfReader's performance reason. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D157061	2023-08-16 20:32:15 +00:00
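
The last entry above (da2855c0ba) describes pointers into the sample profile map going stale when new functions are added to it. Note that `std::unordered_map` keeps pointers and references to its elements valid across a rehash; containers with flat storage, such as `llvm::DenseMap`, do not. The following is a minimal sketch of the defensive pattern, not the code from that patch: `FlatSampleMap` and `addSamplesAfterPromotion` are hypothetical names, and a `std::vector` stands in for any container whose storage can move on insertion. The idea is to keep the key and look the entry up again after every batch of insertions, instead of holding a raw pointer across them.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat profile map: all entries live in one
// contiguous buffer, so growing the buffer may move every entry.
class FlatSampleMap {
  std::vector<std::pair<std::string, std::uint64_t>> Entries;

public:
  // Returns a pointer that is only valid until the next insertion.
  std::uint64_t *find(const std::string &Name) {
    for (auto &E : Entries)
      if (E.first == Name)
        return &E.second;
    return nullptr;
  }

  // Inserts Name with zero samples if it is not present yet.
  void insertIfMissing(const std::string &Name) {
    if (!find(Name))
      Entries.emplace_back(Name, 0);
  }
};

// Registers NewFuncs, then adds Count samples to Caller. The caller entry is
// looked up only after all insertions, so no pointer outlives a possible
// reallocation of the underlying buffer.
bool addSamplesAfterPromotion(FlatSampleMap &Map, const std::string &Caller,
                              const std::vector<std::string> &NewFuncs,
                              std::uint64_t Count) {
  for (const std::string &Name : NewFuncs)
    Map.insertIfMissing(Name);
  std::uint64_t *Samples = Map.find(Caller); // fresh lookup, not a cached pointer
  if (!Samples)
    return false;
  *Samples += Count;
  return true;
}
```

Caching `Map.find(Caller)` before the loop and dereferencing it afterwards would be undefined behaviour in this sketch as soon as the vector reallocates, which is the class of bug the commit message warns about.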


384 Commits
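
The call-graph matching entry in the list above (18cdfa72e0) defines `Similarity = |equalSet * 2| / (|A| + |B|)` over the call-site anchors of an IR function and a profile function. A rough sketch of that formula follows, under stated assumptions: anchors are modelled as plain strings, the equal set is computed with a textbook dynamic-programming LCS rather than the diff-based matcher the commit refers to, and `lcsLength`, `anchorSimilarity`, and the handling of two empty sequences are illustrative choices, not LLVM APIs.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Length of the longest common subsequence of two anchor sequences,
// computed with the classic O(|A| * |B|) table reduced to two rows.
std::size_t lcsLength(const std::vector<std::string> &A,
                      const std::vector<std::string> &B) {
  std::vector<std::size_t> Prev(B.size() + 1, 0), Cur(B.size() + 1, 0);
  for (std::size_t I = 1; I <= A.size(); ++I) {
    for (std::size_t J = 1; J <= B.size(); ++J)
      Cur[J] = A[I - 1] == B[J - 1] ? Prev[J - 1] + 1
                                    : std::max(Prev[J], Cur[J - 1]);
    std::swap(Prev, Cur); // Prev now holds the row just computed
  }
  return Prev[B.size()];
}

// Similarity = 2 * |equalSet| / (|A| + |B|), a value in [0, 1].
// Returning 0 for two empty sequences is an assumption of this sketch.
double anchorSimilarity(const std::vector<std::string> &A,
                        const std::vector<std::string> &B) {
  if (A.empty() && B.empty())
    return 0.0;
  return 2.0 * static_cast<double>(lcsLength(A, B)) /
         static_cast<double>(A.size() + B.size());
}
```

A caller would compare the result against a threshold, as the commit does with `-func-profile-similarity-threshold`, and treat the profile name as a rename of the IR function only when the similarity is above it.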