(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)
* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support
* Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
* Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
* Indexed profile writer collects the list of vtable names, and stores that to index profiles.
* Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.
1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))
**Original Description**
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
When merging instrFDO profiles with afdo profile as supplementary, instrFDO counters for static functions are stored with function's PGO name (with filename.cpp; prefix).
- This pull request fixes the delimiter used when a PGO function name is 'normalized' for AFDO look-up.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.
> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846
This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
Cleanup profile reader classes to prepare for complex refactoring as propsed in D147740 (Use MD5 as key for profile map). Change is too complicated so I am cleaning up the reader implementation first with these goals.
- Reduce duplicated/similar logic
- Reduce virtual functions, changing them to non-virtual
- Reduce unnecessry checks, indirections, and dead writes.
This is patch 1/n. This patch refactors NameTable
Explaining several decisions here
1. useMD5() means whether names of the profiles (the ProfileMap) are represented as MD5. It is NOT whether the input profile format is MD5. This function is an interface for IPO passes to decide whether to match function names or function MD5. There are two motives here:
(a) Eventually we want to use MD5 to represent all function contexts because it is much faster to use it as a key for lookup tables (prototype implementation D147740), so in compilation mode we call setProfileUseMD5() to force use MD5. While in tools mode (llvm-profdata) we want to keep the function name info if it's in the input profile.
(b) We also propose to allow multiple name tables and profile sections in ExtBinary format, and it could consist of name tables with or without using MD5, in this case MD5 prevails and other name tables are converted to MD5.
2. MD5 handling logic is pushed up to BinaryReader base class, because this trades a non-devirtualized virtual function call with a predictable branch. ReadStringFromTable() accounts for >5% time when loading a full 1 GB profile, it should not be virtual.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D148868
For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching.
Example for profile flattening:
```
Original profile:
_Z3bazi:20301:1000
1: 1000
3: 2000
5: inline1:1600
1: 600
3: inline2:500
1: 500
Flattened profile:
_Z3bazi:18701:1000
1: 1000
3: 2000
5: 600 inline1:600
inline1:1100:600
1: 600
3: 500 inline2: 500
inline2:500:500
1: 500
```
This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D146452
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145190
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145190
The memprof profiles and binaries need to be updated in case of version
updates. This change adds three scripts for llvm-profdata, clang and
llvm tests where memprof profiles are used as inputs. Also update the
tests, profiles and binaries in this change. Change based on the review
suggestions in D145023.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145644
This change replaces the binary profiles and executables used for
testing the memprof profile reader with tests where the profiles are
generated on the fly. This reduces toil when the profile version
changes. The tests are moved from tools/llvm-profdata to
compiler-rt/test/memprof due to the following reasons:
1. Adding dependency on memprof lit.cfg.py for llvm-profdata is
preferable to adding a dependency on compiler-rt for llvm/test.
2. All the tests can now be run with `ninja check-memprof`.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D145023
Text editors can introduce spaces aligning the previous line's indentation. This crashes llvm-profdata. Added check to handle this case.
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D143369
Track min/max/avg access density (accesses per byte and accesses per
byte per lifetime second) metrics directly during profiling. This allows
more accurate use of these metrics in profile analysis and use, instead
of trying to compute them from already aggregated data in the profile.
This required regenerating some of the raw profile and executable inputs
for a few tests. While here, make the llvm-profdata memprof tests more
resilient to differences in things like memory mapping, timestamps and
cpu ids to make future test updates easier.
Differential Revision: https://reviews.llvm.org/D141558
We need to flatten the SampleFDO profile in profile supplementation
because the InstrFDO profile does not have inlined callsite counters.
Without flattening profile, FDO optimizations are not stable:
we will not supplement the second generation profile when the modified
functions are all inlined.
This patch fixes this issue: we will flatten the profile for functions
that appears in FDO profile.
Note that we only need to find the hot/warm functions in SampleFDO
profile, so we will not perform a full flatten. We will use
a DFS traversal to compute the accumulated entry count and max bodycount.
This is much cheaper than full flattening.
Differential Revision: https://reviews.llvm.org/D138893
Current implementation promotes a non-cold function in the SampleFDO profile
into a hot function in the FDO profile. This is too aggressive. This patch
promotes a hot functions in the SampleFDO profile into a hot function, and a
warm function in SampleFDO into a warm function in FDO.
Differential Revision: https://reviews.llvm.org/D132601
This patch has the following changes:
(1) Handling of internal linkage functions (static functions)
Static functions in FDO have a prefix of source file name, while they do not
have one in SampleFDO. Current implementation does not handle this and we are
not updating the profile for static functions. This patch fixes this.
(2) Handling of -funique-internal-linakge-symbols
Again this is for the internal linkage functions. Option
-funique-internal-linakge-symbols can now be applied to both FDO and SampleFDO
compilation. When it is used, it demangles internal linkage function names and
adds a hash value as the postfix.
When both SampleFDO and FDO profiles use this option, or both
not use this option, changes in (1) should handle this.
Here we also handle when the SampleFDO profile using this option while FDO
profile not using this option, or vice versa.
There is one case where this patch won't work: If one of the profiles used
mangled name and the other does not. For example, if the SampleFDO profile
uses clang c-compiler and without -funique-internal-linakge-symbols, while
the FDO profile uses -funique-internal-linakge-symbols. The SampleFDO profile
contains unmangled names while the FDO profile contains mangled names. If
both profiles use c++ compiler, this won't happen. We think this use case
is rare and does not justify the effort to fix.
Differential Revision: https://reviews.llvm.org/D132600
1) We now use the count size in FDO as the main factor to deal with pre-inliner.
Currently we use the number of sample records in the SampleFDO profile. But
that only counts the top-level body sample records (not including the nested
call-sites). We are seeing some big functions not being updated because of
this. I think using the count size in FDO profile is more reasonable to judge if
the function is likely to be inlined to the callers in pre-inliner.
(2) We use getMaxCount in SampleFDO rather the HeadSample to determine if
if the function is hot in SampleFDO. This is in-sync with the logic
in the compiler (also HeadSample can be 0).
Differential Revision: https://reviews.llvm.org/D132602
Add a check to detect that the profiled binary was build with position
independent code. Add a test with a pie binary to which can be reused
later when support is added. Also clean up the error messages with
trailing colons.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D128564
This reverts commit 0d362c90d335509c57c0fbd01ae1829e2b9c3765.
Reason: Causes the MSan buildbot to fail (see comments on
https://reviews.llvm.org/D121179 for more information
To ease profile annotation, each of the callsites in a function can be
annotated with profile data - "IR metadata format for MemProf" [1]. This
patch extends the on-disk serialized record format to store the debug
information for allocation callsites incl inline frames. This change is
incompatible with the existing format i.e. indexed profiles must be
regenerated, raw profiles are unaffected.
[1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D121179
We add to ensure that we are observing the correct callstack order in memprof
during symbolization. There was some confusion whether the order of
DIFrame objects were reversed but in reality the leaf function is at
index 0 so no code changes are required.
Differential Revision: https://reviews.llvm.org/D121759
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
This commit also includes the changes reviewed separately in D120093.
Differential Revision: https://reviews.llvm.org/D120103
This reverts commit 85355a560a33897453df2ef959e255ee725eebce.
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
This reverts commit 0f73fb18ca333e38cdb9ffa701a8db026c56041d.
Use llvm/Profile/MIBEntryDef.inc instead of relative path.
Generated the raw profile data with `-mllvm
-enable-name-compression=false` so that builbots where the reader is
built without zlib do not fail.
Also updated the test build instructions.
This reverts commit 43c2348c5b926df6bdbc5b70efaa35ecdefe12d5.
Buildbots are failing with an error on reading memprof testdata.
"Inputs/basic.profraw: profile uses zlib
compression but the profile reader was built without zlib support"
https://lab.llvm.org/buildbot/#/builders/16/builds/24490
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
This reverts commit dbf47d227d080e4eb7239b589660f51d7b08afa9.
Reapply https://reviews.llvm.org/D116784 now that
https://reviews.llvm.org/D118413 has landed with a couple of fixes:
* fix raw profile reader unaligned access identified by ubsan
* fix windows build by using MOCK_CONST_METHOD3 instead of MOCK_METHOD.
This change extends the RawMemProfReader to read all the sections of the
raw profile and symbolize the virtual addresses recorded as part of the
callstack for each allocation. For now the symbolization is used to
display the contents of the profile with llvm-profdata.
Differential Revision: https://reviews.llvm.org/D116784
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12
2: 24
3: 28 _Z5funcAi:18
3.1: 28 _Z5funcBi:30
3: _Z5funcAi:1467398
0: 10
1: 10 _Z8funcLeafi:11
3: 24
1: _Z8funcLeafi:1467299
0: 6
1: 6
3: 287884
4: 287864 _Z3fibi:315608
15: 23
!CFGChecksum: 138828622701
!Attributes: 2
!CFGChecksum: 281479271677951
!Attributes: 2
```
Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
The first 8b of each raw profile section need to be aligned to 8b since
the first item in each section is a u64 count of the number of items in
the section.
Summary of changes:
* Assert alignment when reading counts.
* Update test to check alignment, relax some size checks to allow padding.
* Update raw binary inputs for llvm-profdata tests.
Differential Revision: https://reviews.llvm.org/D114826