llvm-project

Author	SHA1	Message	Date
Ellis Hoag	9a2df55f47	[InstrProf] No linkage prefixes in IRPGO names (#76994 ) Change the format of IRPGO counter names to `[<filepath>;]<mangled-name>` which is computed by `GlobalValue::getGlobalIdentifier()` to fix #74565. In fe051934cbb0aaf25d960d7d45305135635d650b (https://reviews.llvm.org/D156569) the format of IRPGO counter names was changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O (yes, it is confusing that `<linkage-name>` is computed with `Mangler().getNameWithPrefix()` while `<mangled-name>` is just `F.getName()`). We discovered in #74565 that this causes some missed import issues on some targets and #74008 is a partial fix. Since `<mangled-name>` may not match the `<linkage-name>` on some targets like Mach-O, we will need to post-process the output of `llvm-profdata order` before passing to the linker via `-order_file`. Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will become stale after this diff, but I think this is acceptable since that patch landed after the LLVM 18 cut which hasn't been released yet.	2024-01-04 16:13:57 -08:00
Mingming Liu	665e46c268	[llvm-profdata] Use semicolon as the delimiter for supplementary profiles. (#75080 ) When merging instrFDO profiles with afdo profile as supplementary, instrFDO counters for static functions are stored with function's PGO name (with filename.cpp; prefix). - This pull request fixes the delimiter used when a PGO function name is 'normalized' for AFDO look-up.	2024-01-04 15:03:18 -08:00
Kazu Hirata	9664ab570a	[llvm-profdata] Modernize FuncSampleStats, ValueSitesStats, and HotFuncInfo (NFC)	2023-12-21 10:43:04 -08:00
Zequan Wu	ab3430f891	[Profile] Add binary profile correlation for code coverage. (#69493 )
## Motivation
Since we don't need the metadata sections at runtime, we can offload them from memory at runtime. Initially, I explored [debug info correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113), which is used for PGO with value profiling disabled. However, it currently only works with DWARF, and it would be hard to add such artificial debug info for every function to CodeView, which is used on Windows. So, offloading profile metadata sections at runtime seems to be a platform-independent option.
## Design
The idea is to use new section names for profile name and data sections and mark them as metadata sections. Under this mode, the new sections are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime and can be stripped away as a post-linking step. After the process exits, the generated raw profiles will contain only headers + counters. llvm-profdata can be used to correlate raw profiles with the unstripped binary to generate an indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size overhead due to instrumentation is reduced from 64M to 38.8M (39.4%) and the raw profile file size is reduced from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL

$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet, so when a raw profile doesn't belong to the binary being digested by llvm-profdata, merging will fail. Once this is implemented, llvm-profdata should be able to merge only raw profiles with the same binary id as the binary and discard the rest (with mismatched/missing binary id). The workflow I have in mind is to have scripts invoke llvm-profdata to get the binary ids of all raw profiles, and then pass only the raw profiles with a matching binary id, together with the binary, to llvm-profdata for merging.
2. Note: In COFF, the new sections are currently still loaded into memory but not used. I didn't do it in this patch because I noticed that `.lcovmap` and `.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should work with PGO when value profiling is disabled, as debug info correlation currently does, though I haven't tested this yet.	2023-12-14 14:16:38 -05:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
lifengxiang1025	58199dfb55	[NFC] Typo fix (#74033 ) Fix spelling error from `linakge` to `linkage`. Co-authored-by: lifengxiang <lifengxiang.1025@bytedance.com>	2023-12-01 14:16:01 +08:00
Mingming Liu	493e2400ca	[nfc][llvm-profdata] Use cl::Subcommand to organize subcommand and options in llvm-profdata (#71328 ) - The motivation is to reduce the number of arguments passed around (e.g., from `show_main` to `show*Profile`). In order to do this, move function-defined options to global variables, and create `cl::SubCommand` for {show, merge, overlap, order} to organize options. - The side effect of extracting function-local options to a C++ namespace is that the extracted options are no longer (lazily) initialized when the enclosing function runs for the first time. - `cl::Subcommand` support (introduced in https://lists.llvm.org/pipermail/llvm-dev/2016-June/101804.html) could put options in a per-subcommand namespace. - One option could belong to multiple subcommands. This patch defines most of the options once and associates them with multiple subcommands except 1. `overlap` and `show` both have `value-cutoff` with different default values ([former](`64f62de966/llvm/tools/llvm-profdata/llvm-profdata.cpp (L2352)`) vs [latter](`64f62de966/llvm/tools/llvm-profdata/llvm-profdata.cpp (L3009)`)). Define 'OverlapValueCutoff' and 'ShowValueCutoff' respectively. 2. `show` supports three profile formats in `ProfileKind` while {`merge`, `overlap`} support two. Define separate options. - Clean up obsolete code as a result, including `-h` and `--version` customizations. These two options are supported for all commands. Results pasted. - [-h and --help](https://gist.github.com/minglotus-6/387490e5eeda2dd2f9c440a424d6f360) output. - [--version](https://gist.github.com/minglotus-6/f905abcc3a346957bd797f2f84c18c1b) - [llvm-profdata show --help](https://gist.github.com/minglotus-6/f143079f02af243a94758138c0af471a) This PR should be `llvm-profdata` only. It depends on https://github.com/llvm/llvm-project/pull/71981	2023-11-14 10:19:13 -08:00
Zequan Wu	3c97c8b6fc	[Profile] Refactor profile correlation. (#70856 ) Refactor some code from https://github.com/llvm/llvm-project/pull/69493. #70712 was reverted due to linking failures. So, `-debug-info-correlate` remains unchanged and no new flag added.	2023-11-01 14:16:43 -04:00
Zequan Wu	89a2e70159	[llvm-profdata] Emit warning when counter value is greater than 2^56. (#69513 ) Fixes #65416	2023-10-31 16:40:51 -04:00
Zequan Wu	db7a1ed9a2	Revert "[Profile] Refactor profile correlation. (#70712 )" This reverts commit 4b383d0af93136b80841fc140da0823dfc441dd4.	2023-10-31 10:53:45 -04:00
Zequan Wu	4b383d0af9	[Profile] Refactor profile correlation. (#70712 ) Refactor some code from https://github.com/llvm/llvm-project/pull/69493. Rebase of https://github.com/llvm/llvm-project/pull/69656 on top of main as it was messed up.	2023-10-31 10:41:01 -04:00
William Junda Huang	ef0e0adccd	[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164 ) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In the previous implementation, when an MD5 Sample Profile is read, the reader first converts the MD5 values to strings and then creates a StringRef as if the numerical strings were regular function names, and later IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when used in associative containers. ProfileFuncRef guarantees that the same function name in string form or in MD5 form has the same hash value, which also fixes a few issues in IPO passes where function matching/lookup only checks the function name string and returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec on average, and reading the function offset table from 0.78s to 0.7s	2023-10-17 21:09:39 +00:00
Mingming Liu	1c2634e316	[NFC]Rename InstrProf::getFuncName{,orExternalSymbol} to getFuncOrValName{,IfDefined} (#68240 ) - This function looks up MD5ToNameMap to return a name for a given MD5. https://github.com/llvm/llvm-project/pull/66825 adds MD5 of global variable names into this map. So rename methods and update comments	2023-10-04 11:56:28 -07:00
Kazu Hirata	d27614e1d3	[llvm-profdata] Modernize SampleOverlapStats (NFC)	2023-08-19 07:56:37 -07:00
William Huang	7624de5bea	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as key, so that existing code still works. It will check for MD5 collision (unlikely but not too unlikely, since we only take the lower 64 bits) and handle it to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all (see exception at (5)). (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensured an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). 
Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-08-17 20:10:45 +00:00
Ellis Hoag	d687caae00	[InstrProf] Emit warnings when correlating lightweight profiles Emit warnings when `InstrProfCorrelator` finds problems with debug info for lightweight instrumentation profile correlation. To prevent excessive printing, only emit the first 5 warnings. In addition, remove a diagnostic about missing debug info in `InstrProfiling.cpp`. Some compiler-generated functions, e.g., `__clang_call_terminate`, do not emit debug info and will fail a build if `-Werror` is used. This warning is not actionable by the user and I have not seen non-compiler-generated functions fail this test. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D156006	2023-08-15 15:28:16 -07:00
Ellis Hoag	fe051934cb	[InstrProf] Encode linkage names in IRPGO counter names Prior to this diff, names in the `__llvm_prf_names` section had the format `[<filepath>:]<function-name>`, e.g., `main.cpp:foo`, `bar`. `<filepath>` is used to discriminate between possibly identical function names when linkage is local and `<function-name>` simply comes from `F.getName()`. This has two problems: * `:` is commonly found in Objective-C functions so that names like `main.mm:-[C foo::]` and `-[C bar::]` are difficult to parse * `<function-name>` might be different from the linkage name, so it cannot be used to pass a function order to the linker via `-symbol-ordering-file` or `-order_file` (see https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068) Instead, this diff changes the format to `[<filepath>;]<linkage-name>`, e.g., `main.cpp;_foo`, `_bar`. The hope is that `;` won't realistically be found in either `<filepath>` or `<linkage-name>`. To prevent invalidating all prior IRPGO profiles, we also lookup the prior name format when a record is not found (see `InstrProfSymtab::create()`, `readMemprof()`, and `getInstrProfRecord()`). It seems that Swift and Clang FE-PGO rely on the original `getPGOFuncName()`, so we cannot simply replace it. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D156569	2023-08-07 10:15:08 -07:00
Aaron Ballman	1a53b5c367	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292. Addressing issues found by: https://lab.llvm.org/buildbot/#/builders/245/builds/11732 https://lab.llvm.org/buildbot/#/builders/187/builds/12251 https://lab.llvm.org/buildbot/#/builders/186/builds/11099 https://lab.llvm.org/buildbot/#/builders/182/builds/6976	2023-07-28 09:41:38 -04:00
William Huang	66ba71d913	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as key, so that existing code still works. It will check for MD5 collision (unlikely but not too unlikely, since we only take the lower 64 bits) and handle it to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all (see exception at (5)). (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensured an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). 
Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-07-27 23:08:27 +00:00
Haojian Wu	58056ae299	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7. The commit has broken the buildbot, see comment https://reviews.llvm.org/D147740#4451540	2023-06-27 15:19:35 +02:00
William Huang	12e9c7aaa6	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as key, so that existing code still works. It will check for MD5 collision (unlikely but not too unlikely, since we only take the lower 64 bits) and handle it to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all (see exception at (5)). (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensured an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). 
Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-06-27 00:06:05 +00:00
Douglas Yung	c9a8a0e8a9	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e. This change is causing build failures on many Windows build bots: https://lab.llvm.org/buildbot/#/builders/216/builds/22833 https://lab.llvm.org/buildbot/#/builders/123/builds/19602 https://lab.llvm.org/buildbot/#/builders/172/builds/28315 https://lab.llvm.org/buildbot/#/builders/119/builds/13870 https://lab.llvm.org/buildbot/#/builders/233/builds/794 https://lab.llvm.org/buildbot/#/builders/235/builds/387 https://lab.llvm.org/buildbot/#/builders/13/builds/36921 https://lab.llvm.org/buildbot/#/builders/127/builds/50510	2023-06-23 17:58:22 -07:00
William Huang	31af18bcce	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing an interface that allows using SampleContext as key, so that existing code still works. It will check for MD5 collision (unlikely but not too unlikely, since we only take the lower 64 bits) and handle it to at least guarantee compilation correctness (the conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all (see exception at (5)). (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensured an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). 
Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-06-23 21:48:52 +00:00
serge-sans-paille	f6be0814c3	[llvm-profdata] Fix llvm-profdata help and make sure it remains in sync This makes the new `order` subcommand part of the help. As a side effect, also make llvm::map_range compatible with plain arrays. Differential Revision: https://reviews.llvm.org/D153303	2023-06-20 10:25:29 +02:00
Ellis Hoag	1117b9a284	[InstrProf] Use BalancedPartitioning to order temporal profiling trace data In [0] we described an algorithm called //BalancedPartitioning// (bp) to consume function traces [1] and compute a function order that reduces the number of page faults during startup. This patch adds the `order` command to the `llvm-profdata` tool which uses bp to output a function order that can be passed to the linker via `--symbol-ordering-file=`. Special thanks to Sergey Pupyrev and Julian Mestre for designing this balanced partitioning algorithm. [0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068 [1] https://reviews.llvm.org/D147287 Reviewed By: spupyrev Differential Revision: https://reviews.llvm.org/D147812	2023-06-06 11:59:57 -07:00
Michael Platings	6521905389	[llvm-profdata] Accept --version argument The `llvm-profdata --version` output now looks like: llvm-profdata LLVM (http://llvm.org/): LLVM version 17.0.0git Optimized build with assertions. This makes llvm-profdata more consistent with other tools. Reviewed By: simon_tatham Differential Revision: https://reviews.llvm.org/D150964	2023-05-22 14:44:03 +01:00
William Huang	d1d4e56433	[llvm-profdata] Change default output format of llvm-profdata to ExtBinary ExtBinary is compatible with, and superior to, the Binary format, which is the current default output format. In the long run we are looking to support only the ExtBinary format and the Text format (for visual inspection), and to drop the Binary format as well. Since the Binary format was the default, we expect many users are still using it, so let's change the default output format first, and hopefully the usage decreases over time. Reviewed By: davidxl, hoy Differential Revision: https://reviews.llvm.org/D149700	2023-05-04 19:34:12 +00:00
William Huang	d38d6ca179	[llvm-profdata] Deprecate Compact Binary Sample Profile Format Remove support for compact binary sample profile format Reviewed By: davidxl, wenlei Differential Revision: https://reviews.llvm.org/D149400	2023-05-01 17:10:08 +00:00
Jessica Paquette	17cfd2e025	[profiling] Improve error message for raw profile header mismatches When a user uses a mismatched clang + llvm-profdata, they didn't get a very informative error message. It would just say "unsupported version". As a result, users are often confused as to what they are supposed to do and tend to assume that it's a bug in the profiling runtime. This patch improves the error message by: - Adding a new class of error (`raw_profile_version_mismatch`) to make it clear that, specifically, the raw profile version is unsupported because of a tool mismatch. - Adding an error message that tells the user which raw profile version was encountered, which version was expected, and instructs them to align their tool versions. To support this, this patch also updates `InstrProfError::take` to also propagate the optional error message. Differential Revision: https://reviews.llvm.org/D149361	2023-04-27 14:51:38 -07:00
Ellis Hoag	4bddef4117	[InstrProf][Temporal] Add weight field to traces As discussed in [0], add a `weight` field to temporal profiling traces found in profiles. This allows users to use the `--weighted-input=` flag in the `llvm-profdata merge` command to weight traces from different scenarios differently. Note that this is a breaking change, but since [1] landed very recently and there is no way to "use" this trace data, there should be no users of this feature. We believe it is acceptable to land this change without bumping the profile format version. [0] https://reviews.llvm.org/D147812#4259507 [1] https://reviews.llvm.org/D147287 Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D148150	2023-04-13 10:37:05 -07:00
Ellis Hoag	244be0b0de	[InstrProf] Temporal Profiling As described in [0], this extends IRPGO to support //Temporal Profiling//. When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile. Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively. In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup. Special thanks to Julian Mestre for helping with reservoir sampling. [0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068 Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D147287	2023-04-11 08:30:52 -07:00
wlei	339b8a0019	[AutoFDO] Use flattened profiles for profile staleness metrics For the profile staleness report, previously only the top-level function samples in a nested profile were counted; the samples in inlinees were ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile, and the flattened profile is now used as the input for stale profile detection and matching. Example for profile flattening: ``` Original profile: _Z3bazi:20301:1000 1: 1000 3: 2000 5: inline1:1600 1: 600 3: inline2:500 1: 500 Flattened profile: _Z3bazi:18701:1000 1: 1000 3: 2000 5: 600 inline1:600 inline1:1100:600 1: 600 3: 500 inline2: 500 inline2:500:500 1: 500 ``` This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D146452	2023-03-30 11:05:10 -07:00
Kazu Hirata	b595eb83e5	[llvm] Use *{Set,Map}::contains (NFC)	2023-03-14 18:56:07 -07:00
Alex Brachet	1f173a0653	[llvm-driver] Pass extra arguments to tools Differential Revision: https://reviews.llvm.org/D137799	2023-02-10 19:42:32 +00:00
William Huang	79971d0d77	[llvm-profdata] Add option to cap profile output size D139603 (add option to llvm-profdata to reduce output profile size) contains test cases that are not cross-platform. Moving those tests to unit test and making sure the feature is callable from llvm library Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D141446	2023-02-08 22:21:33 +00:00
William Huang	981218e0f8	Revert "[llvm-profdata] Add option to cap profile output size" This reverts commit 48f163b889a8f373474c7d198c43e27779f38692.	2023-02-08 02:29:12 +00:00
William Huang	48f163b889	[llvm-profdata] Add option to cap profile output size D139603 (add option to llvm-profdata to reduce output profile size) contains test cases that are not cross-platform. Moving those tests to unit test and making sure the feature is callable from llvm library Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D141446	2023-02-08 02:17:12 +00:00
Steven Wu	516e301752	[NFC][Profile] Access profile through VirtualFileSystem Route access to profile data through the virtual file system so the inputs can be remapped. In the context of caching, this ensures that we capture the inputs and provide an immutable input as profile data. Reviewed By: akyrtzi, benlangmuir Differential Revision: https://reviews.llvm.org/D139052	2023-02-01 09:25:02 -08:00
Vitaly Buka	c37694817a	Revert "Fix to D139603(reverted) - moved size check to unit test so that it is cross-platform" Several bots are broken, details in https://reviews.llvm.org/D141446 This reverts commit c268f850a2998eb5370c07c74d7d0756dcc851c9.	2023-01-11 23:24:22 -08:00
William Huang	c268f850a2	Fix to D139603(reverted) - moved size check to unit test so that it is cross-platform D139603 (add option to llvm-profdata to reduce output profile size) contains test cases that are not cross-platform. Moving those tests to unit test and making sure the feature is callable from llvm library Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D141446	2023-01-12 00:40:57 +00:00
Fangrui Song	72bdcee557	[llvm-profdata] Remove an unused include after D115915	2023-01-11 15:18:10 -08:00
Douglas Yung	ac07911b45	Revert "[llvm-profdata] Add option to cap profile output size" This reverts commit 5b72d0e4f5eeb8f90c744cac8e0728cffeca61a9. The test added is failing on Mac/Windows. See review for buildbot failure links.	2023-01-09 23:53:14 -08:00
William Huang	5b72d0e4f5	[llvm-profdata] Add option to cap profile output size Allow the user to specify `--output-size-limit=n` to cap the size of the generated profile to be strictly under n. Functions with the lowest total sample count are dropped first if necessary. Because a heuristic is used, more functions than necessary may be dropped to satisfy the size requirement. Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D139603	2023-01-09 22:01:10 +00:00
Gulfem Savrun Yeniceri	1ae7d83803	[profile] Add binary ids into indexed profiles This patch adds support for including binary ids in an indexed profile. It adds a new field into the header that points to the offset of the binary id section. The binary id section consists of a size of the section, and a list of binary ids (if they are present) that consist of two parts: length and data. This patch guarantees that indexed profile is backwards compatible after adding binary ids. Differential Revision: https://reviews.llvm.org/D135929	2022-12-29 18:46:56 +00:00
Gulfem Savrun Yeniceri	59b3d8f1db	Revert "[profile] Add binary ids into indexed profiles" This reverts commit 7734053fd98e7d5ddc749808ce38134686425fb7 because it broke powerpc64 bot: https://lab.llvm.org/buildbot#builders/231/builds/6229	2022-12-14 21:48:41 +00:00
Gulfem Savrun Yeniceri	7734053fd9	[profile] Add binary ids into indexed profiles This patch adds support for including binary ids in an indexed profile. It adds a new field into the header that points to the offset of the binary id section. The binary id section consists of a size of the section, and a list of binary ids (if they are present) that consist of two parts: length and data. This patch guarantees that indexed profile is backwards compatible after adding binary ids. Differential Revision: https://reviews.llvm.org/D135929	2022-12-14 20:26:36 +00:00
Ellis Hoag	769c7ad2b1	[InstrProf] Fix bug when merging empty profile with multiple threads When merging profiles with multiple threads, the `mergeWriterContexts()` function is used to merge profile data between writers. This must be in sync with `loadInput()` which merges profiles to a single writer. This diff merges the profile kind correctly in `mergeWriterContexts()` to fix a subtle bug. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D139755	2022-12-12 09:23:26 -08:00
Hongtao Yu	ad03f40792	[llvm-profdata] Drop profile symbol list during merging AutoFDO profiles. Adding a switch to drop profile symbol list during merging AutoFDO profiles. This is needed to minimize the impact on default profiles when the profile symbol list is enabled for the source input profiles. The symbol list is quite large and could potentially slow down the compiler. Reviewed By: davidxl, wenlei Differential Revision: https://reviews.llvm.org/D139486	2022-12-06 21:11:50 -08:00
Rong Xu	077baefc99	[llvm-profdata] Use flattening sample profile in profile supplementation We need to flatten the SampleFDO profile in profile supplementation because the InstrFDO profile does not have inlined callsite counters. Without flattening the profile, FDO optimizations are not stable: we will not supplement the second generation profile when the modified functions are all inlined. This patch fixes this issue: we will flatten the profile for functions that appear in the FDO profile. Note that we only need to find the hot/warm functions in the SampleFDO profile, so we will not perform a full flatten. We will use a DFS traversal to compute the accumulated entry count and max body count. This is much cheaper than full flattening. Differential Revision: https://reviews.llvm.org/D138893	2022-11-29 22:23:47 -08:00
Kazu Hirata	20d2432040	[llvm-profdata] Use std::optional in llvm-profdata.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-26 18:59:41 -08:00
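
Note on 5b72d0e4f5 (`--output-size-limit=n`): the message says the coldest functions are dropped first until the profile is strictly under the limit. The Python sketch below illustrates only that idea. It is not the llvm-profdata implementation; `cap_profile`, the `total_samples` and `size` fields, and the assumption that each function's serialized size is known exactly are all hypothetical.

```python
def cap_profile(functions, size_limit):
    """Drop the coldest functions until the total size is strictly under size_limit.

    `functions` is a list of dicts with hypothetical keys:
      "name", "total_samples", and "size" (serialized size in bytes, assumed known).
    """
    # Hottest first, so the coldest function sits at the end of the list.
    kept = sorted(functions, key=lambda f: f["total_samples"], reverse=True)
    total = sum(f["size"] for f in kept)
    # "Strictly under n": keep dropping while the total is still >= the limit.
    while kept and total >= size_limit:
        total -= kept.pop()["size"]
    return kept


example = [
    {"name": "a", "total_samples": 100, "size": 40},
    {"name": "b", "total_samples": 5, "size": 40},
    {"name": "c", "total_samples": 50, "size": 40},
]
names_under_100 = [f["name"] for f in cap_profile(example, 100)]
names_under_80 = [f["name"] for f in cap_profile(example, 80)]
```

Because this sketch knows every size exactly, it drops only as many functions as needed. The commit message notes that the real tool relies on a heuristic and may drop more.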


285 Commits
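
Note on 244be0b0de (Temporal Profiling): the message credits reservoir sampling for bounding the number of traces kept in a profile, alongside a cap on trace length. The sketch below shows classic reservoir sampling (Algorithm R) in Python as an illustration of the technique. It is not taken from the LLVM sources; `sample_traces` and its parameters are hypothetical names that loosely mirror `-temporal-profile-trace-reservoir-size` and `-max-temporal-profile-trace-length`.

```python
import random


def sample_traces(traces, reservoir_size, max_trace_length, rng=None):
    """Keep at most `reservoir_size` traces, each truncated to `max_trace_length`.

    Uses Algorithm R: after `seen` traces have been processed, every trace has
    had the same chance (reservoir_size / seen) of being in the reservoir.
    """
    rng = rng or random.Random()
    reservoir = []
    for seen, trace in enumerate(traces, start=1):
        trace = trace[:max_trace_length]  # cap the length of each trace
        if len(reservoir) < reservoir_size:
            reservoir.append(trace)  # fill phase: keep everything
        else:
            slot = rng.randrange(seen)  # uniform integer in [0, seen)
            if slot < reservoir_size:
                reservoir[slot] = trace  # replace a random resident
    return reservoir


all_traces = [[f"fn{i}", f"fn{i + 1}", f"fn{i + 2}"] for i in range(50)]
sampled = sample_traces(all_traces, reservoir_size=5, max_trace_length=2,
                        rng=random.Random(0))
```

The reservoir never grows past the requested size, so memory stays bounded no matter how many raw profiles are merged.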