(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)
* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support
* Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
* Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
* Indexed profile writer collects the list of vtable names, and stores that to index profiles.
* Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
This patch adds a version field to the MemProf section of the indexed
profile format, calling the new version "version 1". The existing
version is called "version 0".
The writer supports both versions via a command-line option:
llvm-profdata merge --memprof-version=1 ...
The reader supports both versions by automatically detecting the
version from the header.
Enable temporary support to ease use of new llvm-profdata with slightly
older indexed profiles after 16e74fd48988ac95551d0f64e1b36f78a82a89a2,
which bumped the indexed format for type profiling.
New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.
1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))
**Original Description**
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
The on-disk hash table for the memprof writer holds copies of all the
memprof records to be written. These hold a lot of memory in aggregate,
due to the lists of alloc sites (which each have a list of context
frames) and call sites. Clear each one after emitting it.
This drops the peak memory when writing a very large indexed memprof
profile by about 2.5G.
The MemProf InstrProfWriter uses a couple of MapVector for building the
lists of records it needs to write. Once its entries are all added to
the associated OnDiskChainedHashTableGenerator, it is no longer used.
Clearing these MapVectors, which grow quite large for large profiles,
saved 4G for a large memory profile.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class. This patch replaces
{big,little,native} with llvm::endianness::{big,little,native}.
This patch completes the migration to llvm::endianness and
llvm::endianness::{big,little,native}. I'll post a separate patch to
remove the migration helpers in llvm/Support/Endian.h:
using endianness = llvm::endianness;
constexpr llvm::endianness big = llvm::endianness::big;
constexpr llvm::endianness little = llvm::endianness::little;
constexpr llvm::endianness native = llvm::endianness::native;
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
support::endianness with llvm::endianness.
- This function looks up MD5ToNameMap to return a name for a given MD5.
https://github.com/llvm/llvm-project/pull/66825 adds MD5 of global
variable names into this map. So rename methods and update comments
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.
> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846
This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
If two functions are inserted to the same bucket, their order in the
serialized profile is dependent on StringMap iteration order, which is
not guaranteed to be deterministic.
(https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h).
Use a sort like we do in writeText.
Add an overload for InstrProfWriter::write so that users can emit the
buffer to a string. Also use this new overload for existing unit test
usecases.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D153904
As discussed in [0], add a `weight` field to temporal profiling traces found in profiles. This allows users to use the `--weighted-input=` flag in the `llvm-profdata merge` command to weight traces from different scenarios differently.
Note that this is a breaking change, but since [1] landed very recently and there is no way to "use" this trace data, there should be no users of this feature. We believe it is acceptable to land this change without bumping the profile format version.
[0] https://reviews.llvm.org/D147812#4259507
[1] https://reviews.llvm.org/D147287
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D148150
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.
Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.
This patch guarantees that indexed profile is backwards compatible
after adding binary ids.
Differential Revision: https://reviews.llvm.org/D135929
This patch adds support for including binary ids in an indexed profile.
It adds a new field into the header that points to the offset of the
binary id section. The binary id section consists of a size of the
section, and a list of binary ids (if they are present) that consist
of two parts: length and data.
This patch guarantees that indexed profile is backwards compatible
after adding binary ids.
Differential Revision: https://reviews.llvm.org/D135929
Previously, we only checked for duplicate zero entries when merging a
MemOPSize table (see D92074), but a user recently provided a reproducer
demonstrating that other entries can also be duplicated. As demonstrated
by the test in this patch, PGOMemOPSizeOpt can potentially generate
invalid IR for non-zero, non-consecutive duplicate entries. This seems
to be a rare case, since the duplicate entry is often below the
threshold, but possible. This patch extends the existing warning to
check for any duplicate values in the table, both in the optimization
and in llvm-profdata.
Differential Revision: https://reviews.llvm.org/D136211
The current implementation of memprof information in the indexed profile
format stores the representation of each calling context fram inline.
This patch uses an interned representation where the frame contents are
stored in a separate on-disk hash table. The table is indexed via a hash
of the contents of the frame. With this patch, the compressed size of a
large memprof profile reduces by ~22%.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D123094
This reverts commit 0d362c90d335509c57c0fbd01ae1829e2b9c3765.
Reason: Causes the MSan buildbot to fail (see comments on
https://reviews.llvm.org/D121179 for more information
To ease profile annotation, each of the callsites in a function can be
annotated with profile data - "IR metadata format for MemProf" [1]. This
patch extends the on-disk serialized record format to store the debug
information for allocation callsites incl inline frames. This change is
incompatible with the existing format i.e. indexed profiles must be
regenerated, raw profiles are unaffected.
[1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D121179
Based on the discussion in D115393, I've updated the names to be more
descriptive.
Reviewed By: ellis, MaskRay
Differential Revision: https://reviews.llvm.org/D120092
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
This commit also includes the changes reviewed separately in D120093.
Differential Revision: https://reviews.llvm.org/D120103
This reverts commit 85355a560a33897453df2ef959e255ee725eebce.
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
This reverts commit 0f73fb18ca333e38cdb9ffa701a8db026c56041d.
Use llvm/Profile/MIBEntryDef.inc instead of relative path.
Generated the raw profile data with `-mllvm
-enable-name-compression=false` so that builbots where the reader is
built without zlib do not fail.
Also updated the test build instructions.
This reverts commit 43c2348c5b926df6bdbc5b70efaa35ecdefe12d5.
Buildbots are failing with an error on reading memprof testdata.
"Inputs/basic.profraw: profile uses zlib
compression but the profile reader was built without zlib support"
https://lab.llvm.org/buildbot/#/builders/16/builds/24490
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
This variable was added to `InstrProfWriter.cpp` in D115693 by mistake and it isn't needed.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D118664
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
This change refactors the ProfileKind enum into a bitset enum to
represent the different attributes a profile can have. This change
simplifies the logic in the instrprof writer when multiple profiles are
merged together. In the future we plan on introducing a new memory
profile section which will extend the enum by one additional entry.
Without this change when accounting for memory profiles will have to be
maintained separately and will make the logic more complex.
Differential Revision: https://reviews.llvm.org/D115393
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115693
This reverts commit 800bf8ed29fbcaa9436540e83bc119ec92e7d40f.
The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.
Differential Revision: https://reviews.llvm.org/D115689
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114565