131 Commits

Author SHA1 Message Date
Kazu Hirata
02f67c097d Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class. This patch replaces
{big,little,native} with llvm::endianness::{big,little,native}.

This patch completes the migration to llvm::endianness and
llvm::endianness::{big,little,native}.  I'll post a separate patch to
remove the migration helpers in llvm/Support/Endian.h:

  using endianness = llvm::endianness;
  constexpr llvm::endianness big = llvm::endianness::big;
  constexpr llvm::endianness little = llvm::endianness::little;
  constexpr llvm::endianness native = llvm::endianness::native;
2023-10-13 23:16:25 -07:00
Kazu Hirata
b3ec0595d3 [llvm] Drop unaligned from calls to llvm::support::endian::{read,write} (NFC)
The last template parameter of llvm::support::endian::{read,write}
defaults to unaligned, so we can drop that at call sites.
2023-10-10 18:57:14 -07:00
William Junda Huang
3a807b5ac6
[llvm-profdata] Move error handling logic out of normal input's code path (#67177)
Based on disassembly and profiling results, it looks like the cost of
std::error_code and llvm::ErrorOr<> are non-trivial, as it involves
virtual function calls that are not optimized in -O3.

This patch moves error handling logic into the conditional branch in
error checking, only generates std::error_code on actual error instead
of the default case to generate sampleprof_error::success.

In the next patch I am looking to change the API to uses a plain old
enum for error checking instead, because based on profiling results
error handling related code accounts for 7-8% of the profile loading
time even if there's no error at all.
2023-09-25 16:08:27 +00:00
William Huang
7624de5bea [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-08-17 20:10:45 +00:00
Aaron Ballman
1a53b5c367 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292.

Addressing issues found by:
https://lab.llvm.org/buildbot/#/builders/245/builds/11732
https://lab.llvm.org/buildbot/#/builders/187/builds/12251
https://lab.llvm.org/buildbot/#/builders/186/builds/11099
https://lab.llvm.org/buildbot/#/builders/182/builds/6976
2023-07-28 09:41:38 -04:00
William Huang
66ba71d913 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-07-27 23:08:27 +00:00
Haojian Wu
58056ae299 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7.

The commit has broken the buildbot, see comment https://reviews.llvm.org/D147740#4451540
2023-06-27 15:19:35 +02:00
William Huang
12e9c7aaa6 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-06-27 00:06:05 +00:00
Douglas Yung
c9a8a0e8a9 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e.

This change is causing build failures on many Windows build bots:

https://lab.llvm.org/buildbot/#/builders/216/builds/22833
https://lab.llvm.org/buildbot/#/builders/123/builds/19602
https://lab.llvm.org/buildbot/#/builders/172/builds/28315
https://lab.llvm.org/buildbot/#/builders/119/builds/13870
https://lab.llvm.org/buildbot/#/builders/233/builds/794
https://lab.llvm.org/buildbot/#/builders/235/builds/387
https://lab.llvm.org/buildbot/#/builders/13/builds/36921
https://lab.llvm.org/buildbot/#/builders/127/builds/50510
2023-06-23 17:58:22 -07:00
William Huang
31af18bcce [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-06-23 21:48:52 +00:00
William Huang
4fe91e083a [llvm-profdata] ProfileReader cleanup - preparation for MD5 refactoring - 3
Cleanup profile reader classes to prepare for complex refactoring as propsed in D147740, continuing D148872
This is patch 3/n. This patch changes the behavior of function offset table.

Previously when reading ExtBinary profile, the funcOffsetTable (map) is always populated, and in addition if the profile is CS, the orderedFuncOffsets (list) is also populated. However when reading the function samples, only one of the container is being used, never both, so it's a huge waste of time to populate both. Added logic to select which one to use, and completely skip reading function offset table if we are in tool mode (all function samples are to be read sequentially regardless)

Reviewed By: davidxl, wenlei

Differential Revision: https://reviews.llvm.org/D149124
2023-05-12 20:45:33 +00:00
William Huang
776bb279d6 [llvm-profdata] ProfileReader cleanup - preparation for MD5 refactoring - 2
Cleanup profile reader classes to prepare for complex refactoring as propsed in D147740, continuing D148868
This is patch 2/n. This patch refactors CSNameTable and related things

The decision to move CSNameTable up to the base class is because a planned improvement (D147740) to use MD5 to lookup Functions/Context frames. In this case we want a unified data structure between contextless function or Context frames, so that it can be mapped by MD5 value. Since Context Frames can represent contextless functions, it is being used for MD5 lookup, therefore exposing it to the base class

Reviewed By: snehasish, wenlei

Differential Revision: https://reviews.llvm.org/D148872
2023-05-09 04:38:01 +00:00
William Huang
4357824c63 [llvm-profdata] ProfileReader cleanup - preparation for MD5 refactoring
Cleanup profile reader classes to prepare for complex refactoring as propsed in D147740 (Use MD5 as key for profile map). Change is too complicated so I am cleaning up the reader implementation first with these goals.
-  Reduce duplicated/similar logic
-  Reduce virtual functions, changing them to non-virtual
-  Reduce unnecessry checks, indirections, and dead writes.

This is patch 1/n. This patch refactors NameTable

Explaining several decisions here

1. useMD5() means whether  names of the profiles (the ProfileMap) are represented as MD5. It is NOT whether the input profile format is MD5. This function is an interface for IPO passes to decide whether to match function names or function MD5. There are two motives here:
(a) Eventually we want to use MD5 to represent all function contexts because it is much faster to use it as a key for lookup tables (prototype implementation D147740), so in compilation mode we call setProfileUseMD5() to force use MD5. While in tools mode (llvm-profdata) we want to keep the function name info if it's in the input profile.
(b) We also propose to allow multiple name tables and profile sections in ExtBinary format, and it could consist of name tables with or without using MD5, in this case MD5 prevails and other name tables are converted to MD5.

2. MD5 handling logic is pushed up to BinaryReader base class, because this trades a non-devirtualized virtual function call with a predictable branch. ReadStringFromTable() accounts for >5% time when loading a full 1 GB profile, it should not be virtual.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D148868
2023-05-06 00:21:03 +00:00
William Huang
d38d6ca179 [llvm-profdata] Deprecate Compact Binary Sample Profile Format
Remove support for compact binary sample profile format

Reviewed By: davidxl, wenlei

Differential Revision: https://reviews.llvm.org/D149400
2023-05-01 17:10:08 +00:00
William Huang
30037b7a6a [llvm-profdata] Fixed various issue with Sample Profile Reader
Fixed various undefind behaviors with current Sample Profile Reader when reading unusual input. Furthermore, add the following rule on allowing multiple name table sections (current Reader has conflicted code handling such case):

When a new name table section is read (in the order sections are read), the names in the previous name table are cleared. Any subsequent sections referring to function names will index into the most recent read name table.

Also changed name table index to uint64_t to be consistent since there's a mix of using uint32_t and uint64_t.

Reviewed By: snehasish, huangjd

Differential Revision: https://reviews.llvm.org/D146182
2023-04-13 21:31:26 +00:00
William Huang
84719e2f03 [llvm-profdata] Fix bug llvm-profdata crashes when reading a text sample profile with an empty line with spaces.
Text editors can introduce spaces aligning the previous line's indentation. This crashes llvm-profdata. Added check to handle this case.

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D143369
2023-02-06 22:29:20 +00:00
Steven Wu
516e301752 [NFC][Profile] Access profile through VirtualFileSystem
Make the access to profile data going through virtual file system so the
inputs can be remapped. In the context of the caching, it can make sure
we capture the inputs and provided an immutable input as profile data.

Reviewed By: akyrtzi, benlangmuir

Differential Revision: https://reviews.llvm.org/D139052
2023-02-01 09:25:02 -08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
William Huang
07107f3390 [llvm-profdata] Remove unnecessary file size check
Unsure why profile reader checks profile size to be less than 4 GB. This breaks builds using a very large profile.
The limit is not seen anywhere else, so I am not sure why is it there in the first place.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D140741
2023-01-03 19:24:45 +00:00
Fangrui Song
4e0e0bbd6b [ProfileData] llvm::Optional => std::optional 2022-12-12 09:11:55 +00:00
Kazu Hirata
aadaaface2 [llvm] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:44 -08:00
Fangrui Song
367997d0d6 [Support] Rename llvm::compression::{zlib,zstd}::uncompress to more appropriate decompress
This improves consistency with other places (e.g. llvm::compression::decompress,
llvm::object::Decompressor::decompress, llvm-objcopy).
Note: when zstd::uncompress was added, we noticed that the API `ZSTD_decompress`
is fine while the zlib API `uncompress` is a misnomer.
2022-09-17 12:35:17 -07:00
Kazu Hirata
0e9d37ff95 [llvm] Qualify auto in range-based for loops (NFC) 2022-08-28 23:29:00 -07:00
William Huang
b105656207 [Sample Profile Reader] Fix potential integer overflow/infinite loop bug in sample profile reader
Change loop induction variable type to match the type of "SIZE" where it's compared against, to prevent infinite loop caused by overflow wraparound if there are more than 2^32 samples

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D132493
2022-08-23 20:35:23 +00:00
liaochunyu
e36a71d87b [llvm-profdata][NFC] fix warning
warning: unused variable ‘FC’

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D132429
2022-08-23 14:09:19 +08:00
Kazu Hirata
a044d0491e [llvm-profdata] Support JSON as as an output-only format
This patch teaches llvm-profdata to output the sample profile in the
JSON format.  The new option is intended to be used for research and
development purposes.  For example, one can write a Python script to
take a JSON file and analyze how similar different inline instances of
a given function are to each other.

I've chosen JSON because Python can parse it reasonably fast, and it
just takes a couple of lines to read the whole data:

  import json
  with open ('profile.json') as f:
    profile = json.load(f)

Differential Revision: https://reviews.llvm.org/D130944
2022-08-09 16:24:53 -07:00
Fangrui Song
e690137dde [Support] Change compression::zlib::{compress,uncompress} to use uint8_t *
It's more natural to use uint8_t * (std::byte needs C++17 and llvm has
too much uint8_t *) and most callers use uint8_t * instead of char *.
The functions are recently moved into `llvm::compression::zlib::`, so
downstream projects need to make adaption anyway.
2022-07-13 16:26:54 -07:00
Cole Kissane
ea61750c35 [NFC] Refactor llvm::zlib namespace
* Refactor compression namespaces across the project, making way for a possible
  introduction of alternatives to zlib compression.
  Changes are as follows:
  * Relocate the `llvm::zlib` namespace to `llvm::compression::zlib`.

Reviewed By: MaskRay, leonardchan, phosek

Differential Revision: https://reviews.llvm.org/D128953
2022-07-08 11:19:07 -07:00
Hongtao Yu
e36786d15f [CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles.  `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D122602
2022-04-29 17:03:52 -07:00
Kazu Hirata
d3e5f0ab01 Apply clang-tidy fixes for readability-redundant-smartptr-get in SampleProfReader.cpp (NFC) 2022-03-28 09:18:39 -07:00
serge-sans-paille
fc97efa409 Cleanup includes: ProfileData
Estimation of the impact on preprocessor output:

before: 1067349756
after: 1065940348

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120434
2022-02-24 13:25:11 +01:00
Fangrui Song
045f07b7dc [ProfileData] Remove unused and racy FunctionSamples::Format after D51643
The write may be racy if ThinLTO creates multiple `InProcessThinBackend` instances.
2022-02-22 20:29:08 -08:00
Hongtao Yu
62ef77ca63 [CSSPGO] Do not merge a context that is already duplicated into the base profile.
Do not merge a context that is already duplicated into the base profile.

Also fixing a typo caused by previous refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D119735
2022-02-14 18:07:11 -08:00
Hongtao Yu
ff0b634d97 [CSSPGO] Print "context-nested" instead of "preilnined" for ProfileSummarySection.
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D117141
2022-01-18 18:10:42 -08:00
Hongtao Yu
5740bb801a [CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.

For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts,  since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.

In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.

A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes,  a nested profile shown as below can also have a CFG checksum.

```

main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```

Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
 - Changes in the sample loader to support probe-based nested profile.

I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115205
2021-12-14 14:40:25 -08:00
Zarko Todorovski
95875d246a [LLVM][NFC]Inclusive language: remove occurances of sanity check/test from llvm
Part of work to use more inclusive language in clang/llvm. Rewording
some comments and change function and variable names.
2021-11-24 17:29:55 -05:00
Hongtao Yu
d0eb472f33 [llvm-profdata] Print out section flags for FunctionMetadata section
As titled.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D113064
2021-11-02 17:59:22 -07:00
Simon Pilgrim
cb8d96e72f Fix Wdocumentation unknown parameter warning. NFCI. 2021-09-04 15:06:53 +01:00
Hongtao Yu
f4711e0d00 [CSSPGO] Sort function offset table to speed up profile loading.
With the context split work, the context-based (an array of strings) sorting performed at profile load time is way more expansive than single-string-based sorting. This is likely due to auxiliary operations done on each array element, such as indirect references, std::min operations, also likely cache misses. In this change I'm presorting profiles during profile generation time to avoid sorting at compile time.

Compared to the previous context-split work, this effectively cuts down compile time by 20% for one of our large services and brings us closer to non-CS build, with still a small gap in build time.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D109036
2021-09-01 12:17:48 -07:00
Hongtao Yu
7ca8030030 [CSSPGO] Enable loading MD5 CS profile.
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.

There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D108342
2021-09-01 09:19:47 -07:00
Hongtao Yu
b9db70369b [CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.

A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is  time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.

When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.

Testing
This is no-diff change in terms of code quality and profile content (for text profile).

For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.

The compile time of ads is dropped by 25%.

Differential Revision: https://reviews.llvm.org/D107299
2021-08-30 20:09:29 -07:00
Wenlei He
a45d72e024 [CSSPGO] Add switch for sample loader to honor global pre-inliner decision from llvm-profgen
The change adds a switch to allow sample loader to use global pre-inliner's decision instead. The pre-inliner in llvm-profgen makes inline decision globally based on whole program profile and function byte size as cost proxy.

Since pre-inliner also adjusts/merges context profile based on its inline decision, honoring its inline decision in sample loader would lead to better post-inline profile quality especially for thinlto where cross module profile merging isn't possible without pre-inliner.

Minor fix in profile reader is also included. When pre-inliner is use, we now also turn off the default merging and trimming logic unless it's explicitly asked.

Differential Revision: https://reviews.llvm.org/D108677
2021-08-25 17:20:15 -07:00
Rong Xu
24201b6437 [SampleFDO] Set ProfileIsFS bit properly from the internal option
We have "-profile-isfs" internal option for text, binary, and
compactbinary format (mostly for debug and test purpose). We
need to set the related flag in FunctionSamples so that ProfileIsFS
is written to the header in extbinary format.

Differential Revision: https://reviews.llvm.org/D108707
2021-08-25 09:07:34 -07:00
Hongtao Yu
f27fee623d [SamplePGO][NFC] Dump function profiles in order
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D108147
2021-08-16 17:22:30 -07:00
Rong Xu
9b8425e42c Reapply commit b7425e956
The commit b7425e956: [NFC] fix typos
is harmless but was reverted by accident. Reapply.
2021-08-16 12:18:40 -07:00
Kostya Kortchinsky
80ed75e7fb Revert "[NFC] Fix typos"
This reverts commit b7425e956be60a73004d7ae5bb37da85872c29fb.
2021-08-16 11:13:05 -07:00
Rong Xu
b7425e956b [NFC] Fix typos
s/senstive/senstive/g
2021-08-16 10:15:30 -07:00
Rong Xu
8d581857d7 [SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is for llvm-profdata part of change. It sets the bit masks for the
profile reader in llvm-profdata. Also add an internal option
"-fs-discriminator-pass" for show and merge command to process the profile
offline.

This patch also moved setDiscriminatorMaskedBitFrom() to
SampleProfileReader::create() to simplify the interface.

Differential Revision: https://reviews.llvm.org/D103550
2021-06-04 11:22:06 -07:00
Rong Xu
6745ffe4fa [SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly for ProfileData part of change. It will load
FS Profile when such profile is detected. For an extbinary format profile,
create_llvm_prof tool will add a flag to profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.

This patch also simplified the bit API used by FS discriminators.

Differential Revision: https://reviews.llvm.org/D103041
2021-06-02 10:32:52 -07:00
Jonathan Crowther
e71994a239 [SystemZ][z/OS] Add IsText Argument to GetFile and GetFileOrSTDIN
Add the `IsText` argument to `GetFile` and `GetFileOrSTDIN` which will help z/OS distinguish between text and binary correctly. This is an extension to [this patch](https://reviews.llvm.org/D97785)

Reviewed By: abhina.sreeskantharajan, amccarth

Differential Revision: https://reviews.llvm.org/D100488
2021-04-16 10:08:36 -04:00