This patch refactors the serialization of MemProf data to a switch
statement style:
switch (Version) {
case Version0:
return ...;
case Version1:
return ...;
}
just like IndexedMemProfRecord::serialize.
A reasonable amount of code is shared and factored out to helper
functions between writeMemProfV0 and writeMemProfV1 to the extent that
doens't hamper readability.
commit d89914f30bc7c180fe349a5aa0f03438ae6c20a4
Author: Kazu Hirata <kazu@google.com>
Date: Wed Apr 3 21:48:38 2024 -0700
changed RecordWriterTrait to a template class with IndexedVersion as a
template parameter. This patch changes the class back to a
non-template one while retaining the ability to serialize multiple
versions.
The reason I changed RecordWriterTrait to a template class was
because, even if RecordWriterTrait had IndexedVersion as a member
variable, RecordWriterTrait::EmitKeyDataLength, being a static
function, would not have access to the variable.
Since OnDiskChainedHashTableGenerator calls EmitKeyDataLength as:
const std::pair<offset_type, offset_type> &Len =
InfoObj.EmitKeyDataLength(Out, I->Key, I->Data);
we can make EmitKeyDataLength a member function, but we have one
problem. InstrProfWriter::writeImpl calls:
void insert(typename Info::key_type_ref Key,
typename Info::data_type_ref Data) {
Info InfoObj;
insert(Key, Data, InfoObj);
}
which default-constructs RecordWriterTrait without a specific version
number. This patch fixes the problem by adjusting
InstrProfWriter::writeImpl to call the other form of insert instead:
void insert(typename Info::key_type_ref Key,
typename Info::data_type_ref Data, Info &InfoObj)
To prevent an accidental invocation of the default constructor of
RecordWriterTrait, this patch deletes the default constructor.
I'm currently developing a new version of the indexed memprof format
where we deduplicate call stacks in IndexedAllocationInfo::CallStack
and IndexedMemProfRecord::CallSites. We refer to call stacks with
integer IDs, namely CallStackId, just as we refer to Frame with
FrameId. The deduplication will cut down the profile file size by 80%
in a large memprof file of mine.
As a step toward the goal, this patch teaches
IndexedMemProfRecord::{serialize,deserialize} to speak Version2. A
subsequent patch will add Version2 support to llvm-profdata.
The essense of the patch is to replace the serialization of a call
stack, a vector of FrameIDs, with that of a CallStackId. That is:
const IndexedAllocationInfo &N = ...;
...
LE.write<uint64_t>(N.CallStack.size());
for (const FrameId &Id : N.CallStack)
LE.write<FrameId>(Id);
becomes:
LE.write<CallStackId>(N.CSId);
Add annotated vtable GUID as referenced variables in per function
summary, and update bitcode writer to create value-ids for these
referenced vtables.
- This is the part3 of type profiling work, and described in the "Virtual Table Definition Import" [1] section of the
RFC.
[1] https://github.com/llvm/llvm-project/pull/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
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)
* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support
* Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
* Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
* Indexed profile writer collects the list of vtable names, and stores that to index profiles.
* Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
This patch adds a version field to the MemProf section of the indexed
profile format, calling the new version "version 1". The existing
version is called "version 0".
The writer supports both versions via a command-line option:
llvm-profdata merge --memprof-version=1 ...
The reader supports both versions by automatically detecting the
version from the header.
- The direct use case (in [1]) is to add `llvm::IntervalMap` [2] and the allocator required by IntervalMap ctor [3]
to class `InstrProfSymtab` as owned members. The allocator class doesn't have a move-assignment operator;
and it's going to take much effort to implement move-assignment operator for the allocator class such that the
enclosing class is movable.
- There is only one use of compiler-generated move-assignment operator in the repo, which is in
CoverageMappingReader.cpp. Luckily it's possible to use std::unique_ptr<InstrProfSymtab> instead, so did the change.
[1] https://github.com/llvm/llvm-project/pull/66825
[2] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L936)
[3] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L1041)
There are two ways to create in-memory instances of
IndexedAllocationInfo -- deserialization of the raw MemProf data and
that of the indexed MemProf data.
With:
commit 74799f424063a2d751e0f9ea698db1f4efd0d8b2
Author: Kazu Hirata <kazu@google.com>
Date: Sat Mar 23 19:50:15 2024 -0700
we compute CallStackId for each call stack in IndexedAllocationInfo
while deserializing the raw MemProf data.
This patch does the same while deserilizing the indexed MemProf data.
As with the patch above, this patch does not add any use of
CallStackId yet.
The indexed MemProf file has a huge amount of redundancy. In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.
We should work toward deduplicating call stacks by referring to them
with unique IDs with actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.
At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next. We should be able
to read (but not write) the current version of the MemProf file even
after we move onto the next one.
With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner. We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.
This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field. It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.
With one raw memprof file I have, NumPCs averages about 41. Given the
default number of inline elements being 8 for SmallVector<uint64_t>,
we should reserve the storage in advance.
Enable temporary support to ease use of new llvm-profdata with slightly
older indexed profiles after 16e74fd48988ac95551d0f64e1b36f78a82a89a2,
which bumped the indexed format for type profiling.
New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.
1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))
**Original Description**
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
This replaces `SmallVector<CondState>` and emulates it.
- -------- True False DontCare
- Values: True False False
- Visited: True True False
`findIndependencePairs()` can be optimized with logical ops.
FIXME: Specialize `findIndependencePairs()` for the single word.
To reduce conditional judges in the loop in `findIndependencePairs()`, I
have tried a couple of tweaks.
* Isolate the final result in `TestVectors`
`using TestVectors = llvm::SmallVector<std::pair<TestVector,
CondState>>;`
The final result was just piggybacked on `TestVector`, so it has been
isolated.
* Filter out and sort `ExecVectors` by the final result
It will cost more in constructing `ExecVectors`, but it can reduce at
least one conditional judgement in the loop.
This patch inserts 1-byte counters instead of an 8-byte counters into
llvm profiles for source-based code coverage. The origial idea was
proposed as block-cov for PGO, and this patch repurposes that idea for
coverage: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
The current 8-byte counters mechanism add counters to minimal regions,
and infer the counters in the remaining regions via adding or
subtracting counters. For example, it infers the counter in the if.else
region by subtracting the counters between if.entry and if.then regions
in an if statement. Whenever there is a control-flow merge, it adds the
counters from all the incoming regions. However, we are not going to be
able to infer counters by subtracting two execution counts when using
single-byte counters. Therefore, this patch conservatively inserts
additional counters for the cases where we need to add or subtract
counters.
RFC:
https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685
This is a preparation of incoming Clang changes (#82448) and just checks
`TVIdx` is calculated correctly. NFC.
`TVIdxBuilder` calculates deterministic Indices for each Condition Node.
It is used for `clang` to emit `TestVector` indices (aka ID) and for
`llvm-cov` to reconstruct `TestVectors`.
This includes the unittest `CoverageMappingTest.TVIdxBuilder`.
See also
https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798
- Prune `RegionMCDCBitmapMap` and `RegionCondIDMap`. They are handled
by `MCDCState`.
- Rename `s/BitmapMap/DecisionByStmt/`. It can handle Decision stuff.
- Rename `s/CondIDMap/BranchByStmt/`. It can be handle Branch stuff.
- `MCDCRecordProcessor`: Use `DecisionParams.BitmapIdx` directly.
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
Also, Let `NumConditions` `uint16_t`.
It is smarter to handle the ID as signed.
Narrowing to `int16_t` will reduce costs of handling byvalue. (See also
#81221 and #81227)
External behavior doesn't change. They below handle values as internal
values plus 1.
* `-dump-coverage-mapping`
* `CoverageMappingReader.cpp`
* `CoverageMappingWriter.cpp`
Introduce `mcdc::DecisionParameters` and `mcdc::BranchParameters` and make
sure them not initialized as zero.
FIXME: Could we make `CoverageMappingRegion` as a smart tagged union?
They can be also used in `clang`.
Introduce the lightweight header instead of `CoverageMapping.h`.
This includes for now:
* `mcdc::ConditionID`
* `mcdc::Parameters`
Deprecate `TestVectors`, since no one uses it.
This affects the output order of ExecVectors.
The current impl emits sorted by binary value of ExecVector. This impl
emits along the traversal of `buildTestVector()`.
- Function getParsedIRPGOFuncName splits name by delimiter. The `[filename;]mangled-name` format could be generalized for non-function global values (e.g., vtables for type profiling). So rename the
function.
- Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.
* `getFunctionBitmap()` stores not `std::vector<uint8_t>` but
`BitVector`.
* `CounterMappingContext` holds `Bitmap` (instead of the ref of bytes)
* `Bitmap` and `BitmapIdx` are used instead of `evaluateBitmap()`.
FIXME: `InstrProfRecord` itself should handle `Bitmap` as `BitVector`.
The current implementation (D138849) assumes `Branch`(es) would follow
after the corresponding `Decision`. It is not true if `Branch`(es) are
forwarded to expanded file ID. As a result, consecutive `Decision`(s)
would be confused with insufficient number of `Branch`(es).
`Expansion` will point `Branch`(es) in other file IDs if `Expansion` is
included in the range of `Decision`.
Fixes#77871
---------
Co-authored-by: Alan Phipps <a-phipps@ti.com>
To relax scanning record, tweak order by `Decision < Expansion`, or
`Expansion` could not be distinguished whether it belonged to `Decision`
or not.
Relevant to #77871
Update code from https://reviews.llvm.org/D138847
`buildTestVector` is a standard DFS (walking a reduced ordered binary
decision diagram). Avoid shouldCopyOffTestVectorFor{True,False}Path
complexity and redundant `Map[ID]` lookups.
`findIndependencePairs` unnecessarily uses four nested loops (n<=6) to
find independence pairs. Instead, enumerate the two execution vectors
and find the number of mismatches. This algorithm can be optimized using
the marking function technique described in _Efficient Test Coverage
Measurement for MC/DC, 2013_, but this may be overkill.
In `CoverageMapping.cpp:getMaxBitmapSize()`,
this assumed that the last `Decision` has the maxmum `BitmapIdx`.
Let it scan `max(BitmapIdx)`.
Note that `<=` is used insted of `<`, because `BitmapIdx == 0` is valid
and `MaxBitmapID` is `unsigned`. `BitmapIdx` is unique in the record.
Fixes#78922
The life of `MCDCRecordProcessor`'s instance is short. It may accept
`const` objects to process.
On the other hand, the life of `MCDCBranches` is shorter than `Record`.
It may be rewritten with reference, rather than copying.
`if constexpr` and `if consteval` conditional statements code coverage
should behave more like a preprocesor `#if`-s than normal
ConditionalStmt. This PR should fix that.
---------
Co-authored-by: cor3ntin <corentinjabot@gmail.com>