We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader.
A subsequent patch will address deserialization.
(thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
The result of "Independence pairs" is not mergeable. This change makes
defers re-calculation of "Independence pairs" after merging test
vectors.
No apparent behavior changes.
This patch makes sure YAML files are opened for reading as text file to
trigger auto-conversion from EBCDIC encoding into expected ASCII
encoding on z/OS platform. This is required to fix the following lit
tests:
```
LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-gsym-callsite-info-exe.yaml
LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-gsym-callsite-info-obj.test
LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-gsym-callsite-info-dsym.yaml
LLVM :: Transforms/PGOProfile/memprof_undrift_missing_leaf.ll
```
Now that IndexedMemProfData::{addFrame,addCallStack} are the only
callers of Frame::hash and hashCallStack, respectively, this patch
moves those functions into IndexedMemProfData and makes them private.
With this patch, we can obtain FrameId and CallStackId only through
addFrame and addCallStack, respectively.
This patch adds YAML read/write support to llvm-profdata. The primary
intent is to accommodate MemProf profiles in test cases, thereby
avoiding the binary format.
The read support is via llvm-profdata merge. This is useful when we
want to verify that the compiler does the right thing on a given .ll
file and a MemProf profile in a test case. In the test case, we would
convert the MemProf profile in YAML to an indexed profile and invoke
the compiler on the .ll file along with the indexed profile.
The write support is via llvm-profdata show --memory. This is useful
when we wish to convert an indexed MemProf profile to YAML while
writing tests. We would compile a test case in C++, run it for an
indexed MemProf profile, and then convert it to the text format.
This patch adds a helper function to replace an idiom like:
CallStackId CSId = hashCallStack(CallStack)
MemProfData.CallStacks.try_emplace(CSId, CallStack);
// Do something with CSId.
The patch makes InstrProfWriter::writeImpl less monolithic by adding
InstrProfWriter::writeBinaryIds to serialize binary IDs. This way,
InstrProfWriter::writeImpl can simply call the new function instead of
handling all the details within writeImpl.
This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
19fb5b467b,
and helps to cover cases where the loop exit is never executed.
An example where this can occur are event handling loops.
Note that change does NOT change the default behavior.
Now that MemProf format version 1 has been removed, nobody uses:
- IndexedAllocationInfo::CallStack
- IndexedMemProfRecord::CallSites
This patch removed the dead struct fields.
You might notice that IndexedMemProfRecord::{clear,merge} do not
mention CallSiteIds at all. I think it's an oversight. clear doesn't
matter at the moment because we call it during serialization to reduce
memory footprint. merge is simply not as well tested as it should be.
I'll follow up with a separate patch to address these issues.
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.
This iteration works around the unavailability of
ScalarTraits<uintptr_t> on macOS.
This reverts commit c00e53208db638c35499fc80b555f8e14baa35f0.
It looks like this breaks building LLVM on macOS and some other
platform/compiler combos
https://lab.llvm.org/buildbot/#/builders/23/builds/5252https://green.lab.llvm.org/job/llvm.org/job/clang-san-iossim/5356/console
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:34:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MemProfReader.h:24:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfReader.h:22:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfCorrelator.h:21:
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:1173:36: error: implicit instantiation of undefined template 'llvm::yaml::MissingTrait<unsigned long>'
char missing_yaml_trait_for_type[sizeof(MissingTrait<T>)];
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:961:7: note: in instantiation of function template specialization 'llvm::yaml::yamlize<unsigned long>' requested here
yamlize(*this, Val, Required, Ctx);
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:883:11: note: in instantiation of function template specialization 'llvm::yaml::IO::processKey<unsigned long, llvm::yaml::EmptyContext>' requested here
this->processKey(Key, Val, true, Ctx);
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MIBEntryDef.inc:55:1: note: in instantiation of function template specialization 'llvm::yaml::IO::mapRequired<unsigned long>' requested here
MIBEntryDef(AccessHistogram = 27, AccessHistogram, uintptr_t)
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:77:8: note: expanded from macro 'MIBEntryDef'
Io.mapRequired(KeyStr.str().c_str(), MIB.Name); \
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:310:8: note: template is declared here
struct MissingTrait;
^
1 error generated.
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.
IndexedMemProfRecord contains a complete package of the MemProf
profile, including frames, call stacks, and records. This patch
replaces the three member variables of MemProfReader with
IndexedMemProfRecord.
This transition significantly simplies both the constructor and the
final "take" method:
MemProfReader(IndexedMemProfData MemProfData)
: MemProfData(std::move(MemProfData)) {}
IndexedMemProfData takeMemProfData() { return std::move(MemProfData); }
CallStackRadixTreeBuilder::build takes the parameter
MemProfFrameIndexes by value, involving copies:
std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>>
MemProfFrameIndexes
Then "build" makes another copy of MemProfFrameIndexe and passes it to
encodeCallStack for every call stack, which is painfully slow.
This patch changes the type to a pointer so that we don't have to make
a copy every time we pass the argument.
Without this patch, it takes 553 seconds to run "llvm-profdata merge"
on a large MemProf raw profile. This patch shortenes that down to 67
seconds.
This patch adds a quick validity check to
InstrProfWriter::addMemProfData. Specifically, we check to see if we
have all (or none) of the MemProf profile components (frames, call
stacks, records).
The credit goes to Teresa Johnson for suggesting this assert.
This patch removes two functions to verify the consistency between:
- IndexedAllocationInfo::CallStack
- IndexedAllocationInfo::CSId
Now that MemProf format Version 1 has been removed,
IndexedAllocationInfo::CallStack doesn't participate in either
serialization or deserialization, so we don't care about the
consistency between the two fields in IndexAllocationInfo.
Subsequent patches will remove uses of the old field and eventually
remove the field.
This reverts commit fdb050a5024320ec29d2edf3f2bc686c3a84abaa, and
restores ccb4702038900d82d1041ff610788740f5cef723, with a fix for build
bot failures.
Specifically, add ProfileData to the dependences of the BitWriter
library, which was causing shared library builds of LLVM to fail.
Reproduced the failure with a shared library build and confirmed this
change fixes that build failure.
Leverage the support added to represent allocation contexts in a more
compact way via a radix tree in the indexed profile to similarly reduce
sizes of the bitcode summaries.
For a large target, this reduced the size of the per-module summaries by
about 18% and in the distributed combined index files by 28%.
Prepare for usage in the bitcode reader/writer where we already have a
LinearFrameId:
- templatize input frame id type in CallStackRadixTreeBuilder
- templatize input frame id type in computeFrameHistogram
- make the map from FrameId to LinearFrameId optional
We plan to use the same radix format in the ThinLTO summary records,
where we already have a LinearFrameId.
This patch adds MemProfReader::takeMemProfData, a function to return
the complete MemProf profile from the reader. We can directly pass
its return value to InstrProfWriter::addMemProfData without having to
deal with the indivual components of the MemProf profile. The new
function is named "take", but it doesn't do std::move yet because of
type differences (DenseMap v.s. MapVector).
The end state I'm trying to get to is roughly as follows:
- MemProfReader accepts IndexedMemProfData as a parameter as opposed
to the three individual components (frames, call stacks, and
records).
- MemProfReader keeps IndexedMemProfData as a class member without
decomposing it into its individual components.
- MemProfReader returns IndexedMemProfData like:
IndexedMemProfData takeMemProfData() {
return std::move(MemProfData);
}
This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.
Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time. The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case. Otherwise, we fall back to
adding one item at a time behind the scene.
Here are a couple of reasons why we should add this function:
- We've had a bug where we forgot to add one of the three data
structures (frames, call stacks, and records) to the writer context,
resulting in a nearly empty indexed profile. We should always
package the three data structures together, especially on API
boundaries.
- We expose a little too much of the MemProf detail to
InstrProfWriter. I'd like to gradually transform
InstrProfReader/Writer to entities managing buffers (sequences of
bytes), with actual serialization/deserialization left to external
classes. We already do some of this in InstrProfReader, where
InstrProfReader "contracts out" to IndexedMemProfReader to handle
MemProf details.
I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector. I'll resolve these
mismatches in subsequent patches.
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.
I'm not touching version 1 for now because some tests still rely on
version 1.
Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
This patch further speeds up the extraction of caller-callee pairs
from the profile.
Recall that we reconstruct a call stack by traversing the radix tree
from one of its leaf nodes toward a root. The implication is that
when we decode many different call stacks, we end up visiting nodes
near the root(s) repeatedly. That in turn adds many duplicates to our
data structure:
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;
only to be deduplicated later with sort+unique for each vector.
This patch makes the extraction process more efficient by keeping
track of indices of the radix tree array we've visited so far and
terminating traversal as soon as we encounter an element previously
visited.
Note that even with this improvement, we still add at least one
caller-callee pair to the data structure above for each call stack
because we do need to add a caller-callee pair for the leaf node with
the callee GUID being 0.
Without this patch, it takes 4 seconds to extract caller-callee pairs
from a large MemProf profile. This patch shortenes that down to
900ms.
We've seen bugs where we lost track of error states stored in the
functor because we passed the functor by value (that is,
std::function) as opposed to reference (llvm::function_ref).
This patch fixes a couple of places we pass functors by value.
While we are at it, this patch adds curly braces around a "for" loop
spanning multiple lines.
We know that the MemProf profile has a lot of duplicate call stacks.
Extracting caller-callee pairs from a call stack we've seen before is
a wasteful effort.
This patch makes the extraction more efficient by first coming up with
a work list of linear call stack IDs -- the set of starting positions
in the radix tree array -- and then extract caller-callee pairs from
each call stack in the work list.
We implement the work list as a bit vector because we expect the work
list to be dense in the range [0, RadixTreeSize). Also, we want the
set insertion to be cheap.
Without this patch, it takes 25 seconds to extract caller-callee pairs
from a large MemProf profile. This patch shortenes that down to 4
seconds.
Undrifting the MemProf profile requires two sets of information:
- caller-callee pairs from the profile
- callee-callee pairs from the IR
This patch adds a function to do the former. The latter has been
addressed by extractCallsFromIR.
Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile. "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling. To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.
Conceptually, we would extract caller-callee pairs with:
for each MemProfRecord in the profile:
for each call stack in AllocSites:
extract caller-callee pairs from adjacent pairs of Frames
However, this is highly inefficient. Obtaining MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.
This patch adds an efficient way of doing the above. Specifically, we
- go though all IndexedMemProfRecords,
- look at each linear call stack ID
- extract caller-callee pairs from each call stack
The extraction is done by a new class CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array. For our purposes, we skip the
reconstruction and immediately populates the data structure for
caller-callee pairs.
The resulting caller-callee-pairs is of the type:
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;
which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.
Further performance optimizations are possible for the new functions
in this patch. I'll address those in follow-up patches.
ICP builds a symtab from the symbols in the module allowing mapping from
the VP metadata GUIDs to the Function. MemProf uses this same symtab
handling for its ICP during cloning. When symbols are added to the
symtab, the handling adds both a GUID computed from the function name,
or from the attached PGOFuncName metadata for locals, as well as a GUID
computed from the "canonicalized" name, which strips all "." suffixes
other than ".__uniq". This was originally meant to remove the ".llvm.*"
suffix added to promoted locals (done earlier in the ThinLTO backend).
In theory, it should no longer be needed as locals should have
PGOFuncName metadata.
However, this was causing a linker unsat, in code that used coroutines.
For an original coroutine function, there were several additional
functions created that had the same name, but different "." suffixes.
Therefore the canonical name for these additional functions had the same
GUID as that of the original function, leading to extra entries in the
symtab, and to selecting the wrong function for promotion. For regular
ICP this can happen, but is just a performance issue. However, for
memprof the promoted direct call calls a memprof clone, and because we
called the wrong function, in this case it didn't have a memprof clone
and we got a linker unsat.
We may be able to remove the canonical name handling for ICP in general,
but for now disable it for MemProf. At worst this could lead to not
finding a GUID in the symtab and not performing an ICP, so should be
conservatively correct.
My change in bb3915149a7c9b1660db9caebfc96343352e8454 added a call to
std::time which worked generally as there must be some transitive
include of <ctime>. However, I saw one MSVC bot failure:
InstrProfWriter.cpp(202): error C2039: 'time': is not a member of 'std'
from https://lab.llvm.org/buildbot/#/builders/63/builds/2325.
Presumably explictly including <ctime> should fix this.
Add support for generating random hotness in the memprof profile writer,
to be used for testing. The random seed is printed to stderr, and an
additional option enables providing a specific seed in order to
reproduce a particular random profile.
On WebAssembly, most coverage metadata contents read by llvm-cov (like
`__llvm_covmap` and `__llvm_covfun`) are stored in custom sections
because they are not referenced at runtime. However, `__llvm_prf_names`
is referenced at runtime by the profile runtime library and is read by
llvm-cov post-processing tools, so it needs to be stored in a data
segment, which is allocatable at runtime and accessible by tools as long
as "name" section is present in the binary.
This patch changes the way llvm-cov reads `__llvm_prf_names` on
WebAssembly. Instead of looking for a section, it looks for a data
segment with the same name.
This reverts commit 157f10ddf2d851125a85a71e530dc9d50cb032a2 and fixes
PE/COFF `.lprfn$A` section handling.
Currently both True/False counts were folded. It lost the information,
"It is True or False before folding." It prevented recalling branch
counts in merging template instantiations.
In `llvm-cov`, a folded branch is shown as:
- `[True: n, Folded]`
- `[Folded, False n]`
In the case If `n` is zero, a branch is reported as "uncovered". This is
distinguished from "folded" branch. When folded branches are merged,
`Folded` may be dissolved.
In the coverage map, either `Counter` is `Zero`. Currently both were
`Zero`.
Since "partial fold" has been introduced, either case in `switch` is
omitted as `Folded`.
Each `case:` in `switch` is reported as `[True: n, Folded]`, since
`False` count doesn't show meaningful value.
When `switch` doesn't have `default:`, `switch (Cond)` is reported as
`[Folded, False: n]`, since `True` count was just the sum of `case`(s).
`switch` with `default` can be considered as "the statement that doesn't
have any `False`(s)".
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:
1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
- [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
- [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested
See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.