384 Commits

Author SHA1 Message Date
Alexis Engelke
efcd3b6108
[IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596)
Refactor remaining parts of Transforms apart from Scalar and Utils.
2026-03-14 17:49:00 +00:00
HighW4y2H3ll
e615400861
[SampleProfile] Skip counting mismatched weak symbols during profile loading (#185514)
Weak symbols may be overridden during linking, and this may cause
profile mismatch when compiling the weak symbols, while the profile was
created based on the overriding function. Skip counting the weak symbol
while checking the mismatched function profiles to avoid false alarm on
rejecting legit profiles.
2026-03-12 14:53:57 -07:00
serge-sans-paille
2095ea5b40
Remove unused <set> and <map> inclusion (#167175) 2025-11-09 15:15:21 +00:00
Kazu Hirata
0028ef667a
[llvm] Remove unused local variables (NFC) (#167106)
Identified with bugprone-unused-local-non-trivial-variable.
2025-11-08 07:41:07 -08:00
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Mircea Trofin
240b73e10f
[SimplifyCFG][PGO] Reuse existing setBranchWeights (#160629)
The main difference between SimplifyCFG's `setBranchWeights`​ and the ProfDataUtils' is that the former doesn't propagate all-zero weights. That seems like a sensible thing to do, so updated the latter accordingly, and added a flag to control the behavior.

Also moved to ProfDataUtils the logic fitting 64-bit weights to 32-bit.

As side-effect, this fixes some profcheck failures.
2025-10-01 09:54:30 -07:00
Aiden Grossman
c91fa95fc7
[SampleProfile] Always use FAM to get ORE
The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.

Reviewers: aeubanks, efriedma-quic, wlei-llvm

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/159858
2025-09-19 15:54:40 -07:00
Kazu Hirata
47b9837dad
[llvm] Use std::tie to implement comparison functors (NFC) (#141353) 2025-05-24 09:37:32 -07:00
Kazu Hirata
3667f29dfd
[llvm] Use std::optional::value_or (NFC) (#140014) 2025-05-15 07:18:55 -07:00
Owen Rodley
d3d856ad84
Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
Kazu Hirata
d8b078d550
[Transforms] Use llvm::append_range (NFC) (#133607) 2025-03-29 18:57:50 -07:00
Jinjie Huang
c8b69c9076
[NFC][SampleFDO] Clean the unneeded field and the related loop (#132376)
Clean the unneeded field 'TotalCollectedSamples' and the unnecessary
loop.
The field seems introduced in:https://reviews.llvm.org/D31952, and its
uses were removed in: https://reviews.llvm.org/D19287, but this field
and unnecessary calculation were not cleaned up.
This patch will remove these unneeded codes.
2025-03-28 11:06:00 +08:00
Kazu Hirata
ad8d549505
[IPO] Avoid repeated hash lookups (NFC) (#132588) 2025-03-22 22:25:25 -07:00
Lei Wang
068d0c0f4b
[CSSPGO] Turn on call-graph matching by default for CSSPGO (#125938)
Tested call-graph matching on some of Meta's large services, it works to
reuse some renamed function profiles, no negative perf or significant
build speed regression observed. Turned it on by default for CSSPGO
mode.
2025-02-06 11:54:59 -08:00
Jeremy Morse
81d18ad864
[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.

Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
2025-01-27 16:27:54 +00:00
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) 2025-01-08 11:07:23 +01:00
Haohai Wen
ccc8e45404
[PseudoProbe] Fix cleanup for pseudo probe after annotation (#119660)
When using -sample-profile-remove-probe, pseudo probe desc should
also be removed and dwarf discriminator for call instruction should
be restored.
2024-12-13 17:05:03 +08:00
Lei Wang
bc1aa2863b
[SampleFDO] Support enabling sample loader pass in O0 mode (#113985)
Add support for enabling sample loader pass in O0 mode(under
`-fsample-profile-use`). This can help verify PGO raw profile count
quality or provide a more accurate performance proxy(predictor), as O0
mode has minimal or no compiler optimizations that might otherwise
impact profile count accuracy.
- Explicitly disable the sample loader inlining to ensure it only emits
sampling annotation.
- Use flattened profile for O0 mode.
- Add the pass after `AddDiscriminatorsPass` pass to work with
`-fdebug-info-for-profiling`.
2024-11-08 15:29:44 -08:00
Kazu Hirata
98ea1a81a2
[IPO] Remove unused includes (NFC) (#114716)
Identified with misc-include-cleaner.
2024-11-03 13:48:55 -08:00
Antonio Frighetto
2ae968a0d9
[Instrumentation] Move out to Utils (NFC) (#108532)
Utility functions have been moved out to Utils. Minor opportunity to
drop the header where not needed.
2024-09-15 21:07:40 -07:00
Lei Wang
ce8c43fe27
Fix assertion of null pointer samples in inline replay mode (#99378)
Fix https://github.com/llvm/llvm-project/issues/97108. In inline replay
mode, `CalleeSamples` may be null and the order doesn't matter.
2024-07-18 10:16:44 -07:00
Lei Wang
18cdfa72e0
[SampleFDO] Stale profile call-graph matching (#95135)
Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
function signature( such as `int foo()` --> `long foo()` ) can make the
mangled name different, the function profile(including all nested
children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting
at identifying the trivial function renaming and reusing the old
function profile.

Some noteworthy details:

1. Extend the LCS based CFG level matching to identify new function. 
- Extend to match function and profile have different name instead of
the exact function name matching. This leverages LCS, i.e during the
finding of callsite anchor matching, when two function name are
different, try matching the functions instead of return.
- In LCS, the equal function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new functions(neither appears on
each side). This reduces the matching scope as we don't need to match
the originally matched function.
2.  Determine the matching by call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for the possible <IRFunc, ProfFunc> pair, use the
LCS(diff) matching to compute the equal set and we define: `Similarity =
|equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if
the similarity is above a
threshold(`-func-profile-similarity-threshold`)

3.  Process the matching in top-down function order 
- when a caller's is done matching, the new function names are saved for
later use, using top-down order will maximize the reused results.
- `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Added a new switch --salvage-unused-profile to control this, default
is false.

Verified on one Meta's internal big service, confirmed 90%+ of the found
renaming pair is good. (There could be incorrect renaming pair if the
num of the anchor is small, but checked that those functions are simple
cold function)
2024-07-17 10:33:00 -07:00
Mircea Trofin
ce03155a1b
[NFC] Coding style fixes: SampleProf (#98208)
Also some control flow simplifications.

Notably, this doesn't address `sampleprof_error`. I *think* the style
there tries to match `std::error_category`.

Also left `hash_value` as-is, because it matches what we do in Hashing.h
2024-07-09 14:35:49 -07:00
Kazu Hirata
3cf762b7b7
[Transforms] Migrate to a new version of getValueProfDataFromInst (#96380) 2024-06-30 14:54:07 -07:00
Kazu Hirata
836ca5bbf7
[Transforms] Migrate to a new version of getValueProfDataFromInst (#95485)
Note that the version of getValueProfDataFromInst that returns bool
has been "deprecated" since:

  commit 1e15371dd8843dfc52b9435afaa133997c1773d8
  Author: Mingming Liu <mingmingl@google.com>
  Date:   Mon Apr 1 15:14:49 2024 -0700
2024-06-13 18:21:09 -07:00
Paul Kirth
294f3ce5dd
Reapply "[llvm][IR] Extend BranchWeightMetadata to track provenance o… (#95281)
…f weights" #95136

Reverts #95060, and relands #86609, with the unintended code generation
changes addressed.

This patch implements the changes to LLVM IR discussed in
https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032

In this patch, we add an optional field to MD_prof meatdata nodes for
branch weights, which can be used to distinguish weights added from
llvm.expect* intrinsics from those added via other methods, e.g. from
profiles or inserted by the compiler.

One of the major motivations, is for use with MisExpect diagnostics,
which need to know if branch_weight metadata originates from an
llvm.expect intrinsic. Without that information, we end up checking
branch weights multiple times in the case if ThinLTO + SampleProfiling,
leading to some inaccuracy in how we report MisExpect related
diagnostics to users.

Since we change the format of MD_prof metadata in a fundamental way, we
need to update code handling branch weights in a number of places.

We also update the lang ref for branch weights to reflect the change.
2024-06-12 12:52:28 -07:00
Paul Kirth
607afa0b63
Revert "[llvm][IR] Extend BranchWeightMetadata to track provenance of weights" (#95060)
Reverts llvm/llvm-project#86609

This change causes compile-time regressions for stage2 builds
(https://llvm-compile-time-tracker.com/compare.php?from=3254f31a66263ea9647c9547f1531c3123444fcd&to=c5978f1eb5eeca8610b9dfce1fcbf1f473911cd8&stat=instructions:u).
It also introduced unintended changes to `.text` which should be
addressed before relanding.
2024-06-11 08:06:06 +02:00
Paul Kirth
c5978f1eb5
[llvm][IR] Extend BranchWeightMetadata to track provenance of weights (#86609)
This patch implements the changes to LLVM IR discussed in

https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032

In this patch, we add an optional field to MD_prof metadata nodes for
branch weights, which can be used to distinguish weights added from
`llvm.expect*` intrinsics from those added via other methods, e.g.
from profiles or inserted by the compiler.

One of the major motivations, is for use with MisExpect diagnostics,
which need to know if branch_weight metadata originates from an
llvm.expect intrinsic. Without that information, we end up checking
branch weights multiple times in the case if ThinLTO + SampleProfiling,
leading to some inaccuracy in how we report MisExpect related
diagnostics to users.

Since we change the format of MD_prof metadata in a fundamental way, we
need to update code handling branch weights in a number of places.

We also update the lang ref for branch weights to reflect the change.
2024-06-10 11:27:21 -07:00
William Junda Huang
5a23d31c50
[Sample Profile] Check hot callsite threshold when inlining a function with a sample profile (#93286)
Currently if a callsite is hot as determined by the sample profile, it
is unconditionally inlined barring invalid cases (such as recursion).
Inline cost check should still apply because a function's hotness and
its inline cost are two different things.
For example if a function is calling another very large function
multiple times (at different code paths), the large function should not
be inlined even if its hot.
2024-05-28 16:41:53 -04:00
Nabeel Omer
686a206b26
[SampleProfileLoader] Fix integer overflow in generateMDProfMetadata (#90217)
This patch fixes an integer overflow in the SampleProfileLoader pass.
The issue occurs when weights are saturated and Profi isn't being used.

This patch also adds a newline to a debug message to make it more
readable.
2024-05-08 14:32:56 +01:00
Lei Wang
b7248d5363
[PseudoProbe] Add an option to remove pseudo probes after profile annotation (#90293)
This can be used for testing perf overhead of pseudo-probe.
2024-04-29 09:27:33 -07:00
Lei Wang
1aceee7bb6
Remove unused variable (#88223)
fix the CI
2024-04-09 19:25:08 -07:00
Lei Wang
1d99d7a6f8
[SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988)
Move all the stale profile matching stuffs into new files so that it can
be shared for unit testing.
2024-03-28 20:03:03 -07:00
Lei Wang
f8bab38b6d
[CSSPGO] Fix the issue of missing callee profile matches (#85715)
Two fixes related to the callee/inlinee profile:

1. Fix the bug that the matching results are missing to distribute to
the callee profiles (should be pass-by-reference).
2. Narrow imported function matching to checksum mismatched functions. 

More context: before we run matchings for all imported functions even
checksums are matched, however, after we fix 1), we got a regression,
it's likely due to the matching is not no-op for checksum matched
function, so we want to make it consistent to only run matching for
checksum mismatched (imported)functions. Since the
metadata(pseudo_probe_desc) are dropped for imported function, we
leverage the function attribute mechanism and add a new function
attribute(`profile-checksum-mismatch`) to transfer the info from
pre-link to post-link.
2024-03-27 22:27:22 -07:00
Lei Wang
2598aa67c8
[CSSPGO] Reject high checksum mismatched profile (#84097)
Error out the build if the checksum mismatch is extremely high, it's
better to drop the profile rather than apply the bad profile.
Note that the check is on a module level, the user could make big
changes to functions in one single module but those changes might not be
performance significant to the whole binary, so we want to be
conservative, only expect to catch big perf regression. To do this, we
select a set of the "hot" functions for the check. We use two
parameter(`hot-func-cutoff-for-staleness-error` and
`min-functions-for-staleness-error`) to control the function selection
to make sure the selected are hot enough and the num of function is not
small.
Tuned the parameters on our internal services, it works to catch big
perf regression due to the high mismatch .
2024-03-27 11:14:21 -07:00
Lei Wang
12a2bc301f
[CSSPGO] Fix the issue of preinliner import function list (#85719)
By design, when the nested profile is pre-inliner based, we should fully
honor pre-inliner decision, fix it by setting threshold to zero. We
observed a perf win on one internal service, no negative impact for
other big services.
2024-03-19 16:50:48 -07:00
Lei Wang
c98da372cb
[CSSPGO] Compute and report profile matching recovered callsites and samples (#79090)
This change adds the support to compute and report the staleness metrics
after stale profile matching so that we can know how effective the fuzzy
matching is, i. e. how many callsites and samples are recovered by the
matching.
Some implementation notes:
- The function checksum mismatch metrics are not applicable here as it's
function-level metrics, checksum mismatch remains the same before and
after matching, so we need to compute based on the callsite samples.
- Added two new counters `NumRecoveredCallsites`,
`RecoveredCallsiteSamples` for this and removed `TotalCallsiteSamples`
as now the we can use the `TotalFuncHashSamples` as base, and renamed
some counters.
- In profile matching, we changed to use a state machine to represent
the callsite's matching state changes. See the `MatchState` for the
state, and used a new function `recordCallsiteMatchStates` to compute
and record the callsite's match states changes before and after the
matching, , the result is compressed and saved into a
`FuncCallsiteMatchStates` map for later counting use.
- Changed the counting function to run on module-level and moved it to
the end of the whole process(`computeAndReportProfileStaleness`). The
reason is before the callsite is only counted on top-level function,
this change extends it to count(recursively) on the inlined functions
and samples, which is more accurate.
2024-02-19 11:36:20 -08:00
Benjamin Kramer
9423e45987 [ProfileData] Copy CallTargetMaps a bit less. NFCI 2023-12-24 17:48:18 +01:00
Matthias Braun
cb4627d150
Add setBranchWeigths convenience function. NFC (#72446)
Add `setBranchWeights` convenience function to ProfDataUtils.h and use
it where appropriate.
2023-11-16 10:55:19 -08:00
William Junda Huang
683f2df6e5
[SampleProfile] Fix bug where remapper returns empty string and crashing Sample Profile loader (#71479)
Normally SampleContext does not allow using an empty StirngRef to
construct an object, this is to prevent bugs reading the profile.
However empty names may be emitted by a function which its name is
intentionally set to empty, or a bug in the remapper that returns an
empty string. Regardless, converting it to FunctionId first will prevent
the assert, and that assert check is unnecessary, which will be
addressed in another patch
2023-11-10 21:38:13 +00:00
William Junda Huang
ef0e0adccd
[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)
This is phase 2 of the MD5 refactoring on Sample Profile following
https://reviews.llvm.org/D147740
    
In previous implementation, when a MD5 Sample Profile is read, the
reader first converts the MD5 values to strings, and then create a
StringRef as if the numerical strings are regular function names, and
later on IPO transformation passes perform string comparison over these
numerical strings for profile matching. This is inefficient since it
causes many small heap allocations.
In this patch I created a class `ProfileFuncRef` that is similar to
`StringRef` but it can represent a hash value directly without any
conversion, and it will be more efficient (I will attach some benchmark
results later) when being used in associative containers.

ProfileFuncRef guarantees the same function name in string form or in
MD5 form has the same hash value, which also fix a few issue in IPO
passes where function matching/lookup only check for function name
string, while returns a no-match if the profile is MD5.

When testing on an internal large profile (> 1 GB, with more than 10
million functions), the full profile load time is reduced from 28 sec to
25 sec in average, and reading function offset table from 0.78s to 0.7s
2023-10-17 21:09:39 +00:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
wlei
f14a5ff635 [CSSPGO] Refactoring findIRAnchors
Address feedback in https://reviews.llvm.org/D158817. Since `extractProbe` can be used for both calliste and BB probe, we can leverage this to unify the callsite handling code.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D159169
2023-08-31 16:25:47 -07:00
Jie Fu
3b51881dd5 [CSSPGO] Silence -Wunused-but-set-variable warning without asserts (NFC)
/data/home/jiefu/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2189:8: error: variable 'IsFuncHashMismatch' set but not used [-Werror,-Wunused-but-set-variable]
  bool IsFuncHashMismatch = false;
       ^
1 error generated.
2023-08-31 09:58:29 +08:00
wlei
4bb6bbb9bf [CSSPGO] Skip reporting staleness metrics for imported functions
Accumulating the staleness metrics from per-link is less accurate than doing it from post-link time(assuming we use the offline profile mismatch as baseline), the reason is that there are some duplicated reports for the same functions, for example, one template function could be included in multiple TUs, but in post thin link time, only one function are kept(linkonce_odr) and others are marked as available-externally function. Hence, this change skips reporting the metrics for imported functions(available-externally).

I saw the post-link number is now very close to the offline number(dump the mismatched functions and count the metrics offline based on the entire profile), sightly smaller than offline number due to some missing inlined functions.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D156725
2023-08-30 18:00:23 -07:00
wlei
3365cd4544 [CSSPGO] Compute checksum mismatch recursively on nested profile
Follow-up diff for https://reviews.llvm.org/D158891. Compute the checksum mismatch based on the original nested profile. Additionally, use a recursive way to compute the children mismatched samples in the nested tree even the top-level func checksum is matched.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D158900
2023-08-30 18:00:23 -07:00
wlei
62a3f6c96e [CSSPGO] Retire FlattenProfileForMatching
- Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors.

- Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile)

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D158891
2023-08-30 18:00:23 -07:00
wlei
062af2e763 [CSSPGO] Support stale profile matching for LTO
As in per-link time, callsites could be optimized out by inlining, we don't have those original call targets in the IR in LTO time. Additionally, the inlined code doesn't actually belong to the original function, the IR locations or pseudo probe parsed from it are incorrect and could mislead the matching later.

This change adds the support to extract the original IR location info from the inlined code, specifically, it make sure to skip all the inlined code that doesn't belong the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target(name).

Measured on some stale profile instances, all showed some perf improvements.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D156722
2023-08-30 18:00:23 -07:00
wlei
148cceb0d6 [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction
- rename `IRLocation` --> `IRAnchors`,  `ProfileLocation` --> `ProfileAnchors`
- reorganize runOnFunction, fact out the finding IR anchors code into `findIRAnchors`
- introduce a new function `findProfileAnchors` to populate the profile related anchors, the result is saved into `ProfileAnchors`, it's later used for both mismatch report and matching, this can avoid to parse the `getBodySamples` and `getCallsiteSamples` for multiple times.
- move the `MatchedCallsiteLocs` stuffs from `findIRAnchors` to `countProfileMismatches` so that all the staleness metrics report are computed in one function.
- move all matching related into `runStaleProfileMatching`, and move all mismatching report into `countProfileMismatches`

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D158817
2023-08-30 18:00:23 -07:00
William Huang
da2855c0ba [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples
SampleProfileLoader::promoteMergeNotInlinedContextSample adds
certain uninlined functions to the sample profile map (unordered_map, which is
previously read from a profile file). This action may cause the map to
be rehashed, invalidating all pointers to FunctionSamples used by many
members of SampleProfileLoader, while the existing code did nothing to
guard against that. This bug is theoretical since adding a few
new functions to a large profile usually won't trigger a rehash, or even
if there's a rehash std::unordered_map tries its best to expand its
capacity in-place.

This bug will trigger if the container type of sample profile map is
changed to llvm::DenseMap or other implementation, such as in D147740,
for SampleProfReader's performance reason.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D157061
2023-08-16 20:32:15 +00:00