Collect profiles for functions we encounter when collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{memset|copy|move|set}, which are currently inserted after profile instrumentation.
Collecting a "regular" flat profile in these cases would hide the problem - that we loose better profile opportunities.
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare profiles between contextual profiles, and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as multiplier for the counter values collected under that root.
We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:
* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
A section for flat profiles (i.e. non-contextual). This is useful for debugging or for intentional cases where a root isn't identified.
This patch adds the reader/writer support. `compiler-rt` changes follow in a subsequent change.
The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches.
The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
This is a follow-up from PR #122545, which enabled converting yaml to contextual profiles.
This change uses the lower level yaml APIs because:
- the mapping APIs `llvm::yaml` offers don't work with `const` values, because they (the APIs) want to enable both serialization and deserialization
- building a helper data structure would be an alternative, but it'd be either memory-consuming or overly-complex design, given the recursive nature of the contextual profiles.
- the set used for targets under a callsite is simpler to use if iterators
are stable (it gets manipulated during updates)
- the set used to fetch the transitive closure of GUIDs under a node can
be left as a choice to the user.
This requires output-ing a "Magic" 4-byte header. We also emit a block info block, to describe our blocks and records. The output of `llvm-bcanalyzer` would look like:
```
<BLOCKINFO_BLOCK/>
<Metadata NumWords=17 BlockCodeSize=2>
<Version op0=1/>
<Context NumWords=13 BlockCodeSize=2>
<GUID op0=2/>
<Counters op0=1 op1=2 op2=3/>
```
Instead of having `Unknown` for block and record IDs.
This reverts commit 03c7458a3603396d2d0e1dee43399d3d1664a264.
One of the problems was addressed in #92208
The other problem: needed to add `BitstreamReader` to the list of
link deps of `LLVMProfileData`
Utility converting a profile coming from `compiler_rt` to bitstream, and
a reader.
`PGOCtxProfileWriter::write` would be used as the `Writer` parameter for
`__llvm_ctx_profile_fetch` API. This is expected to happen in user code,
for example in the RPC hanler tasked with collecting a profile, and
would look like this:
```
// set up an output stream "Out", which could contain other stuff
{
// constructing the Writer will start the section, in Out, containing
// the collected contextual profiles.
PGOCtxProfWriter Writer(Out);
__llvm_ctx_profile_fetch(&Writer, +[](void* W, const ContextNode &N) {
reinterpret_cast<PGOCtxProfWriter*>(W)->write(N);
});
// Writer going out of scope will finish up the section.
}
```
The reader produces a data structure suitable for maintenance during IPO
transformations.