We aren't currently deduplicating contexts that are identical or nearly
identical (differing only in inline frame information) when generating
the profile. When we have multiple identical contexts we end up
conservatively marking them as non-cold, even if some are much smaller
in terms of bytes allocated.
This was causing us to lose sight of a very large cold context, because
we had a small non-cold one that only differed in the inlining (which we
don't consider when matching as the inlining could change or be
incomplete at that point in compilation). Likely the smaller one was
from a binary with much smaller usage and therefore not yet detected as
cold.
Deduplicate the alloc contexts for a function before applying the
profile, selecting the largest one, or conservatively selecting the
non-cold one if they are the same size.
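The selection rule above can be sketched in standalone C++. This is only
an illustration under stated assumptions: `AllocContext`, `AllocType` and
`dedupContexts` are hypothetical simplified stand-ins, not the types or
functions used by the pass, and call stacks are assumed to already have
their inline frame information stripped so that near-identical contexts
compare equal.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-ins for the real profile data structures.
enum class AllocType { NotCold, Cold };

struct AllocContext {
  std::vector<uint64_t> CallStack; // Frame ids, inline info already ignored.
  uint64_t TotalSize;              // Bytes allocated through this context.
  AllocType Type;
};

// Keep one context per call stack: the largest one wins, and on a size tie
// the non-cold one wins (the conservative choice).
std::vector<AllocContext> dedupContexts(const std::vector<AllocContext> &In) {
  std::map<std::vector<uint64_t>, AllocContext> Best;
  for (const AllocContext &C : In) {
    auto [It, Inserted] = Best.try_emplace(C.CallStack, C);
    if (Inserted)
      continue;
    AllocContext &Kept = It->second;
    bool Larger = C.TotalSize > Kept.TotalSize;
    bool TieNonCold =
        C.TotalSize == Kept.TotalSize && C.Type == AllocType::NotCold;
    if (Larger || TieNonCold)
      Kept = C;
  }
  std::vector<AllocContext> Out;
  Out.reserve(Best.size());
  for (const auto &KV : Best)
    Out.push_back(KV.second);
  // Deduplication can only shrink the set of contexts.
  assert(Out.size() <= In.size());
  return Out;
}
```

With this rule, a 100-byte non-cold context and a 100000-byte cold context
sharing the same call stack collapse into the cold one, which is the case
the old behavior lost.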
This caused a minor difference to an existing test
(memprof_loop_unroll.ll), which now only gets one message for the
duplicate context instead of 2. While here, convert to the text version
of the profile.
This change introduces new helper functions to check if a global
variable is eligible for section prefix annotation.
This shared logic is used by both MemProfUse and StaticDataSplitter to
avoid annotating ineligible variables.
This is the 2nd patch as a split of
https://github.com/llvm/llvm-project/pull/155337
The codegen pass in the pipeline can read the module flag to tell
whether the IR is compiled with data access profile, to support two use
cases when `memprof-annotate-static-data-prefix=true` is enabled:
1. The binary is compiled with data access profiles.
- The module flag will have value 1, and the codegen pass should regard
an empty section prefix as 'unknown' and conservatively not place the
data into `.unlikely` data sections.
2. The binary is compiled without data access profiles (e.g., during
incremental rollout).
- The module flag will have value 0, and the codegen pass can override
an empty section prefix based on PGO counters.
https://github.com/llvm/llvm-project/pull/155337 shows the motivating
use case in function `StaticDataProfileInfo::getConstantSectionPrefix`
in `llvm/lib/Analysis/StaticDataProfileInfo.cpp`
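As a rough sketch of the two cases above (not the code from the linked
PR), the decision could look like the following. The function name
`resolveSectionPrefix` and its parameters are hypothetical, and letting a
non-`unlikely` PGO hint through when the flag is 1 is an assumption made
for illustration; the description only requires that such data is not
placed into `.unlikely` sections.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper; none of these names are LLVM APIs.
// AnnotatedPrefix:    prefix already on the global ("" if none).
// PGOPrefix:          what PGO counters alone would suggest
//                     ("", "hot" or "unlikely").
// DataAccessProfFlag: value of the module flag (0 or 1).
std::string resolveSectionPrefix(const std::string &AnnotatedPrefix,
                                 const std::string &PGOPrefix,
                                 unsigned DataAccessProfFlag) {
  assert(DataAccessProfFlag <= 1 && "flag is expected to be 0 or 1");
  // An explicit annotation always wins.
  if (!AnnotatedPrefix.empty())
    return AnnotatedPrefix;
  if (DataAccessProfFlag == 1) {
    // Compiled with data access profiles: an empty prefix means "unknown",
    // so never demote the data to .unlikely (assumption: other PGO hints
    // are still allowed through).
    if (PGOPrefix == "unlikely")
      return std::string();
    return PGOPrefix;
  }
  // Compiled without data access profiles: PGO counters may decide.
  return PGOPrefix;
}
```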
This is the 1st patch as a split of
https://github.com/llvm/llvm-project/pull/155337
f3f28323ad
introduces the data access profile format as a payload inside
[memprof](https://llvm.org/docs/InstrProfileFormat.html#memprof-profile-data),
and the MemProfUse pass reads the memprof payload.
This change extends the MemProfUse pass to read the data access profiles
to annotate global variables' section prefix.
1. If there are samples for a global variable, it's annotated as hot.
2. If a global variable is seen in the profiled binary file but doesn't
have access samples, it's annotated as unlikely.
Introduce an option `annotate-static-data-prefix` to flag-gate the
global-variable annotation path, and make it false by default.
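The annotation rule can be sketched as follows. This is a minimal
illustration, not the pass's implementation: `DataAccessProfile` and
`sectionPrefixFor` are hypothetical names, the profile is modeled as two
plain containers, and leaving variables that are absent from the profile
unannotated is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical, simplified view of a data access profile.
struct DataAccessProfile {
  // Globals with access samples, keyed by symbol name.
  std::unordered_map<std::string, uint64_t> SampleCounts;
  // Globals seen in the profiled binary but never sampled.
  std::unordered_set<std::string> SeenWithoutSamples;
};

// Returns the section prefix to annotate, or std::nullopt to leave the
// global untouched. AnnotateEnabled models the flag gate, off by default.
std::optional<std::string> sectionPrefixFor(const std::string &Name,
                                            const DataAccessProfile &Profile,
                                            bool AnnotateEnabled = false) {
  assert(!Name.empty() && "global is expected to have a name");
  if (!AnnotateEnabled)
    return std::nullopt;
  auto It = Profile.SampleCounts.find(Name);
  if (It != Profile.SampleCounts.end() && It->second > 0)
    return std::string("hot");
  if (Profile.SeenWithoutSamples.count(Name) != 0)
    return std::string("unlikely");
  // Not in the profile at all: nothing is known (assumption).
  return std::nullopt;
}
```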
https://github.com/llvm/llvm-project/pull/155337 is the (WIP) draft
change to "reconcile" two sources of hotness.
Now that readMemProf calls two helper functions handleAllocSite and
handleCallSite, we can simplify the control flow. We don't need to
use "continue" anymore.
Continuing the effort to refactor readMemProf, this patch introduces
handleCallSite to handle, well, call sites.
Moving the code requires taking CallSiteEntry and CallSiteEntryHash
out of readMemProf.
We could simplify some code, but I'm keeping this patch very simple to
facilitate the review process. For example, we could simplify the
control flow near the end of readMemProf, but we can address that
later.
This patch creates a helper function named handleAllocSite to handle
the allocation site. It makes readMemProf a little bit shorter.
I'm planning to move the code to handle call sites in a subsequent
patch. Doing so in this patch would make this patch a lot longer
because we need to move other things like CallSiteEntry and
CallSiteEntryHash.
Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the Use part grows (with undrifting, diagnostics, etc.), I figured it would be good to separate these two implementations.