Factor the record-fetching code shared by `populateCounters()` and
`populateCoverage()` into `PGOUseFunc::getRecord()` to reduce code
duplication, and return `NamedInstrProfRecord` from `getInstrProfRecord()`
to avoid an unnecessary cast. No functional change is intended.
When no profile is provided, but the new --empty-profile option is
specified, the export/report/show commands now emit coverage data
equivalent to that obtained from a profile with all zero counters
("baseline coverage").
This is useful for build systems (e.g. Bazel) that can track coverage
information for each build target, even those that are never linked into
tests and thus don't have runtime coverage data recorded. By merging in
baseline coverage, lines in files that aren't linked into tests are
correctly reported as uncovered.
Reland with fixes to `CoverageMappingTest.cpp`.
Reverts llvm/llvm-project#144121
Reverts llvm/llvm-project#117910
```
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ProfileData/CoverageMappingTest.cpp:281:28: error: 'std::reference_wrapper' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
281 | std::make_optional(std::reference_wrapper(*ProfileReader));
| ^
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../include/c++/8/bits/refwrap.h:289:11: note: add a deduction guide to suppress this warning
289 | class reference_wrapper
| ^
```
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/ProfileData`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS on
Linux:
- Manually annotate the file
`llvm/include/llvm/ProfileData/InstrProfData.inc` because it is skipped
by IDS.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates (illustrated in the sketch after this list).
- Add `LLVM_ABI_FRIEND` to friend member functions declared with
`LLVM_ABI`.
- Add `LLVM_ABI` to a small number of symbols that require export but
are not declared in headers.
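For illustration, here is a hypothetical header fragment showing roughly where
each annotation goes. The declarations are invented and are not the actual
`llvm/ProfileData` symbols touched by this patch; the authoritative placement
rules are in the InterfaceExportAnnotations document linked above.

```cpp
#include "llvm/Support/Compiler.h" // defines LLVM_ABI and related macros

namespace llvm {

// An exported free function.
LLVM_ABI void hypotheticalProfileUtility();

class HypotheticalReader {
  // Friend declaration of a function that is itself declared LLVM_ABI below.
  friend LLVM_ABI_FRIEND bool operator==(const HypotheticalReader &,
                                         const HypotheticalReader &);

public:
  // An exported member function.
  LLVM_ABI void read();
};

LLVM_ABI bool operator==(const HypotheticalReader &, const HypotheticalReader &);

template <typename T> class HypotheticalTable { /* ... */ };

// Exported explicit instantiation: the header declaration takes
// LLVM_TEMPLATE_ABI; the matching definition in a .cpp file would use
// LLVM_EXPORT_TEMPLATE, e.g.
//   template class LLVM_EXPORT_TEMPLATE HypotheticalTable<int>;
extern template class LLVM_TEMPLATE_ABI HypotheticalTable<int>;

} // namespace llvm
```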
## Validation
Local builds and tests were run to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Address post-commit review comments from PR141805. Mostly miscellaneous
cleanup, but the biggest change is moving some common utilities to new
MemProfCommon files to reduce unnecessary includes.
This patch adds support for a basic MemProf summary section, which is
built along with the indexed MemProf profile (e.g. when reading the raw
or YAML profiles), and serialized through the indexed profile just after
the header.
Currently only 6 fields are written, specifically the number of contexts
(total, cold, hot), and the max context size (cold, warm, hot).
To support forwards and backwards compatibility for fields added to the
indexed profile, the number of fields is serialized first. The code is
written to support forwards compatibility (reading newer profiles with
additional summary fields), and comments indicate how to implement
backwards compatibility (reading older profiles with fewer summary
fields) as needed.
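A minimal standalone sketch of that count-prefixed scheme follows. It is not
the actual MemProfSummary serialization code, and the names are invented; it
only illustrates how a reader tolerates a field count different from the one
it was built with.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of summary fields this version of the code knows about.
constexpr uint64_t NumKnownFields = 6;

// Writer: emit the field count first, then the fields themselves.
void writeSummary(const uint64_t (&Fields)[NumKnownFields],
                  std::vector<uint64_t> &Out) {
  Out.push_back(NumKnownFields);
  for (uint64_t F : Fields)
    Out.push_back(F);
}

// Reader: honor the serialized count, not the compiled-in one. Extra fields
// written by a newer producer are skipped (forwards compatibility); a count
// smaller than NumKnownFields signals an older profile whose missing fields
// need defaults (backwards compatibility).
std::vector<uint64_t> readSummary(const std::vector<uint64_t> &In,
                                  size_t &Pos) {
  const uint64_t SerializedCount = In[Pos++];
  std::vector<uint64_t> Fields;
  for (uint64_t I = 0; I < SerializedCount; ++I) {
    uint64_t V = In[Pos++];
    if (I < NumKnownFields)
      Fields.push_back(V);
    // else: field added by a newer writer; ignore it.
  }
  return Fields;
}
```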
Support is added to print the summary as YAML comments when displaying
both the raw and indexed profiles via `llvm-profdata show`. Because they
are YAML comments, the YAML reader ignores these (the summary is always
recomputed when building the indexed profile as described above).
This necessitated moving some options and a couple of interfaces out of
Analysis/MemoryProfileInfo.cpp and into the new
ProfileData/MemProfSummary.cpp file, as we need to classify context
hotness earlier and also compute context ids to build the summary from
older indexed profiles.
Re-apply https://github.com/llvm/llvm-project/pull/139997 after fixing the use-of-uninitialized-memory error
(https://lab.llvm.org/buildbot/#/builders/94/builds/7373).
Tested: The error is reproduced with
https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_msan.sh
without the fix, and the tests pass with the fix.
**Original commit message:**
https://github.com/llvm/llvm-project/pull/138170 introduces classes to
operate on data access profiles. This change supports the read and write
of `DataAccessProfData` in the indexed format of MemProf (v4) as well as
in its text (YAML) format.
For indexed format:
* InstrProfWriter owns (by `std::unique_ptr<DataAccessProfData>`) the
data access profiles, and gives a non-owned copy when it calls
`writeMemProf`.
* The MemProf v4 header has a new `uint64_t` recording the byte offset of
the data access profiles. This field is zero if the data access profile
is not set (nullptr); see the sketch after these lists.
* MemProfReader reads the offset from the v4 header and deserializes the
in-memory bytes into a `DataAccessProfData` instance.
For textual format:
* MemProfYAML.h adds the YAML mapping for the DAP class and makes DAP
optional for both read and write.
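A standalone sketch of the offset scheme described above (this is not the
real MemProf v4 header layout or serialization code; the struct and names are
invented):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Invented stand-in for the v4 header: one uint64_t holds the byte offset of
// the data access profile section, and 0 means the section is absent.
struct HeaderSketch {
  uint64_t Version = 4;
  uint64_t DataAccessProfOffset = 0;
};

std::vector<uint8_t>
serializeSketch(const std::vector<uint8_t> &RecordPayload,
                const std::vector<uint8_t> *DataAccessPayload) {
  std::vector<uint8_t> Out(sizeof(HeaderSketch)); // reserve header space
  Out.insert(Out.end(), RecordPayload.begin(), RecordPayload.end());

  HeaderSketch H;
  if (DataAccessPayload) {
    H.DataAccessProfOffset = Out.size(); // the section starts at this byte
    Out.insert(Out.end(), DataAccessPayload->begin(),
               DataAccessPayload->end());
  }
  std::memcpy(Out.data(), &H, sizeof(H)); // patch the header in place
  return Out;
}

// A reader mirrors this: it deserializes the data access section only when
// DataAccessProfOffset != 0.
```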
099a0fa (by @snehasish) introduces v4 which contains CalleeGuids in
CallSiteInfo, and this change extends the v4 format in place with data
access profiles. The current plan is to bump the version and enable v4
profiles with both features, assuming waiting for this change won't
delay the callsite change too long.
---------
Co-authored-by: Kazu Hirata <kazu@google.com>
Context: For
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744#p-336543-background-3,
we propose to profile memory loads and stores via hardware events,
symbolize the addresses of binary static data sections, and feed the
profile back into the compiler for data partitioning.
This change adds the profile format for static data layout, and the
classes to operate on it.
The profile and its format
1. Conceptually, a piece of data (call it a symbol) is represented by
its symbol name or its content hash. The former applies to the majority
of data, whose mangled names remain relatively stable across binary
releases, and the latter applies to string literals (with name patterns
like `.str.<N>[.llvm.<hash>]`).
- The symbols with samples are hot data. The number of hot symbols is
small relative to all symbols. The profile tracks their sampled counts
and locations. Sampled counts come from hardware events, and locations
come from debug information in the profiled binary. The symbols without
samples are cold data. The number of such cold symbols is large. The
profile tracks their representation (the name or content hash).
- Based on a preliminary study, debug information coverage for data
symbols is partial and best-effort. In the LLVM IR, global variables
with source code correspondence may or may not have debug information.
Therefore the location information is optional in the profiles.
2. The profile-and-compile cycle is similar to SamplePGO. Profiles are
sampled from production binaries and used in subsequent binary releases.
Known cold symbols and new hot symbols can both have zero sampled
counts, so the profile records known cold symbols to tell the two apart
in the next compile.
In the profile's serialization format, strings are concatenated together
and compressed. Individual records store the index.
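A standalone sketch of that layout, with compression omitted and invented
names (the real classes added by this change differ):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Invented sketch of a deduplicated string table: symbol names (or content
// hashes rendered as strings) are stored once, and records refer to them by
// index. The real format additionally compresses the concatenated strings.
struct StringTableSketch {
  std::vector<std::string> Strings;
  std::unordered_map<std::string, uint32_t> IndexOf;

  uint32_t intern(const std::string &S) {
    auto [It, Inserted] =
        IndexOf.try_emplace(S, static_cast<uint32_t>(Strings.size()));
    if (Inserted)
      Strings.push_back(S);
    return It->second;
  }
};

// Invented sketch of a per-symbol record: hot symbols carry sampled counts
// and, when debug info allows, locations; known-cold symbols are recorded
// with a zero count so the next compile can tell them apart from new hot
// symbols.
struct DataRecordSketch {
  uint32_t SymbolIndex = 0; // index into the string table
  uint64_t SampleCount = 0; // 0 for known-cold symbols
  bool HasLocation = false; // location info is optional / best-effort
};
```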
A separate PR will connect this class to InstrProfReader/Writer via
MemProfReader/Writer.
---------
Co-authored-by: Kazu Hirata <kazu@google.com>
This patch adds CalleeGuids to the serialized format and increments the version number to 4. The unit tests are updated to include a new test for v4, and the YAML format is also updated so that it can round-trip the v4 format.
Collect profiles for functions we encounter when collecting a contextual profile that are not associated with a call site. This is expected to happen for signal handlers, but it also (problematically) currently happens for memset/memcpy/memmove, whose calls are currently inserted after profile instrumentation.
Collecting a "regular" flat profile in these cases would hide the problem: that we lose better profiling opportunities.
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare counters across contextual profiles and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as a multiplier for the counter values collected under that root.
We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:
* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler-rt, is to flush out the profile as quickly as possible, and performing multiplications adds overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
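Taking that description literally, here is a minimal sketch of the kind of normalization left to a consumer of the profile. The struct and function names are invented, and whether or how to combine profiles this way is entirely up to the consumer.

```cpp
#include <cstdint>
#include <vector>

// Invented stand-in for what a consumer reads back for one context root: the
// raw counters collected while the root was "taken", plus the TotalEntries
// field described above.
struct RootProfileSketch {
  uint64_t TotalEntries = 0;
  std::vector<uint64_t> RawCounters;
};

// Scale the raw counters by TotalEntries so counters collected under
// different roots (or in flat profiles) can be reasoned about in aggregate.
std::vector<double> normalizeCounters(const RootProfileSketch &Root) {
  std::vector<double> Scaled;
  Scaled.reserve(Root.RawCounters.size());
  for (uint64_t C : Root.RawCounters)
    Scaled.push_back(static_cast<double>(C) *
                     static_cast<double>(Root.TotalEntries));
  return Scaled;
}
```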
* Added YAML traits for `CallSiteInfo`
* Updated the `MemProfReader` to pass `Frames` instead of the entire
`CallSiteInfo`
* Updated test cases to use `testing::Field`
* Add YAML sequence traits for CallSiteInfo in MemProfYAML
* Also extend IndexedMemProfRecord
* XFAIL the MemProfYaml round trip test until we update the profile
format
For now, we only read and write the additional information in the YAML
format. The YAML round trip test will be enabled when the serialized format is updated.
The profile format now has a separate section called "Contexts"; there will be a corresponding one for flat profiles. The root has a separate tag because, unlike all the other context nodes under it, it does not have a callsite ID, and it will gain additional fields in subsequent patches.
The rest of this patch amounts to a bit of refactoring in the reader/writer (for better reuse later) and test fixups.
This patch introduces IndexedCallstackIdConveter as a convenience
wrapper around FrameIdConverter and CallStackIdConverter just for
tests.
With the new wrapper, we get to replace idioms like:
```cpp
FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
    MemProfData.Frames);
CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
    MemProfData.CallStacks, FrameIdConv);
```
with:
```cpp
IndexedCallstackIdConveter CSIdConv(MemProfData);
```
Unfortunately, this exact pattern occurs in tests only; the
combinations of the frame ID converter and call stack ID converter are
diverse in production code.
This patch checks the result of YAML parsing at the level of
MemProfRecord instead of IndexedMemProfRecord, thereby avoiding use of
Frame::hash and hashCallStacks. This makes sense because we
ultimately care about consumers like MemProfiler.cpp obtaining
MemProfRecord correctly; IndexedMemProfData and hash values are just
intermediaries.
Once this patch lands, we call Frame::hash and hashCallStacks only
when adding Frames or call stacks to their respective data structures.
In other words, the hash functions are essentially internal details of
IndexedMemProfRecord.
In these tests, we just want to add one instance of
IndexedMemProfRecord to MemProfData.Records and retrieve it from
MemProfReader. There is no particular reason to associate F1.hash()
with the IndexedMemProfRecord instance. A fake value suffices.
While I am at it, I'm switching to try_emplace so that I can move
FakeRecord.
Migrating away from Frame::hash and hashCallStack further encapsulates
how the IDs are calculated.
Note that unit tests are the only places where Frame::hash and
hashCallStack are used. The code proper (i.e. llvm/lib) uses
IndexedMemProfData::{addFrame,addCallStack}; they do not directly use
Frame::hash or hashCallStack.
Here IndexedMemProfRecord just needs to reference a CallStackID, so we
can use addCallStack for a real hash-based CallStackId instead of a
fake value like 0x222.
This patch uses IndexedMemProfData in unit tests even when we only
need CallStacks. This way, we get to use addCallStack. Also, the code
looks more consistent with other unit tests, where we typically do:
```cpp
IndexedMemProfData MemProfData;
MemProfData.addFrame(...);
MemProfData.addCallStack(...);
// Run some tests
```
This patch does two things:
- During deserialization, we accept a function name for a Frame as an
alternative to the usual GUID expressed as a hexadecimal number.
- During serialization, we print the GUID of a Frame as a 16-digit
hexadecimal number prefixed with 0x in the usual way. (Without this
patch, we print a decimal number, which is not customary.)
The patch uses a machinery called "normalization" in YAML I/O, which
lets us serialize and deserialize into an alternative data structure.
For our use case, we have an alternative Frame data structure, which
is identical to "struct Frame" except that Function is of type
GUIDHex64 instead of GlobalValue::GUID. This alternative type
supports the two bullet points above without modifying "struct Frame"
at all.
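A sketch of that normalization machinery is shown below, following the pattern documented in llvm/docs/YamlIO.rst but using a simplified stand-in type rather than the real MemProf Frame (in the real code, the Function field of the normalized type is a GUIDHex64 so that it round-trips as 0x-prefixed hex, or resolves a function name on input):

```cpp
#include "llvm/Support/YAMLTraits.h"
#include <cstdint>

// Simplified stand-in for the real Frame.
struct FrameSketch {
  uint64_t Function = 0; // GUID
  uint32_t LineOffset = 0;
};

template <> struct llvm::yaml::MappingTraits<FrameSketch> {
  // Alternative representation used only during YAML I/O.
  struct NormalizedFrame {
    NormalizedFrame(llvm::yaml::IO &) {}
    NormalizedFrame(llvm::yaml::IO &, FrameSketch &F)
        : Function(F.Function), LineOffset(F.LineOffset) {}
    FrameSketch denormalize(llvm::yaml::IO &) {
      return FrameSketch{Function, LineOffset};
    }

    uint64_t Function = 0;
    uint32_t LineOffset = 0;
  };

  static void mapping(llvm::yaml::IO &Io, FrameSketch &F) {
    // On output, NormalizedFrame is built from F; on input, denormalize()
    // converts the parsed fields back into F.
    llvm::yaml::MappingNormalization<NormalizedFrame, FrameSketch> Keys(Io, F);
    Io.mapRequired("Function", Keys->Function);
    Io.mapRequired("LineOffset", Keys->LineOffset);
  }
};
```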
This patch does two things:
- During deserialization, we accept a function name as an alternative
to the usual GUID represented as a hexadecimal number.
- During serialization, we print a GUID as a 16-digit hexadecimal
number prefixed with 0x in the usual way. (Without this patch, we
print a decimal number, which is not customary.)
In YAML, the MemProf profile is a vector of pairs of GUID and
MemProfRecord. This patch accepts a function name for the GUID, but
it does not accept a function name for the GUID used in Frames yet.
That will be addressed in a subsequent patch.
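For reference, a tiny standalone example of the difference in how a 64-bit GUID is rendered (the GUID value is made up):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t GUID = 0x1db5b7ef8a4b47c6ULL; // made-up GUID value
  std::printf("%" PRIu64 "\n", GUID);          // before: decimal, not customary
  std::printf("0x%016" PRIx64 "\n", GUID);     // after: 0x-prefixed 16-digit hex
  return 0;
}
```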
When we call IndexedMemProfRecord::toMemProfRecord, we care about
getting the original (that is, non-indexed) MemProfRecord back, so we
should just verify that, not the hash values, which are
intermediaries.
There is a remote possibility of hash collisions where call stack
{F1, F2} might come back as {F1, F1} if F1.hash() == F2.hash() for
example. However, since FrameId uses BLAKE, the hash values should be
consistent across architectures. That is, if this test case works on
one architecture, it should work on others as well.
IndexedMemProfData eliminates the need for the "using" directives.
Also, we do not need to declare maps for individual components of the
MemProf profile.
We always call getFrameMapping and getCallStackMapping together in
InstrProfTest.cpp. This patch combines the two functions into a new
function, getMemProfDataForTest.
This patch replaces FrameIdMap and CallStackIdMap with
IndexedMemProfData, which comes with recently introduced methods like
addFrame and addCallStack.
This patch adds a helper function to replace an idiom like:
```cpp
CallStackId CSId = hashCallStack(CallStack);
MemProfData.CallStacks.try_emplace(CSId, CallStack);
// Do something with CSId.
```
For Frames, we prefer the inline notation for brevity.
For PortableMemInfoBlock, we go through all member fields and print
out those that are populated.
This iteration works around the unavailability of
ScalarTraits<uintptr_t> on macOS.