llvm-project

Author	SHA1	Message	Date
Kazu Hirata	9040dd469d	[memprof] Improve the way we express Frames in YAML (#119629 ) This patch does two things: - During deserialization, we accept a function name for Frame as an alternative to the usual GUID expressed as a hexadecimal number. - During serialization, we print a GUID of Frame as a 16-digit hexadecimal number prefixed with 0x in the usual way. (Without this patch, we print a decimal number, which is not customary.) The patch uses a machinery called "normalization" in YAML I/O, which lets us serialize and deserialize into an alternative data structure. For our use case, we have an alternative Frame data structure, which is identical to "struct Frame" except that Function is of type GUIDHex64 instead of GlobalValue::GUID. This alternative type supports the two bullet points above without modifying "struct Frame" at all.	2024-12-11 17:58:23 -08:00
Kazu Hirata	66edefaee5	[memprof] Move YAML support to MemProfYAML.h (NFC) (#119515 ) The YAML support is increasing in size, so this patch moves it to a separate file.	2024-12-11 09:17:16 -08:00
Kazu Hirata	684e79f254	[memprof] Add YAML read/write support to llvm-profdata (#118915 ) This patch adds YAML read/write support to llvm-profdata. The primary intent is to accommodate MemProf profiles in test cases, thereby avoiding the binary format. The read support is via llvm-profdata merge. This is useful when we want to verify that the compiler does the right thing on a given .ll file and a MemProf profile in a test case. In the test case, we would convert the MemProf profile in YAML to an indexed profile and invoke the compiler on the .ll file along with the indexed profile. The write support is via llvm-profdata show --memory. This is useful when we wish to convert an indexed MemProf profile to YAML while writing tests. We would compile a test case in C++, run it for an indexed MemProf profile, and then convert it to the text format.	2024-12-07 20:22:05 -08:00
ronryvchin	ff281f7d37	[PGO] Add option to always instrumenting loop entries (#116789 ) This patch extends the PGO infrastructure with an option to prefer the instrumentation of loop entry blocks. This option is a generalization of `19fb5b467b`, and helps to cover cases where the loop exit is never executed. An example where this can occur are event handling loops. Note that change does NOT change the default behavior.	2024-12-04 07:56:46 +01:00
Kazu Hirata	ad2bdd8fab	[memprof] Remove MemProf format Version 1 (#117357 ) This patch removes MemProf format Version 1 now that Version 2 and 3 are working well.	2024-11-22 11:53:31 -08:00
Kazu Hirata	f97c610d1f	[memprof] Add MemProfReader::takeMemProfData (#116769 ) This patch adds MemProfReader::takeMemProfData, a function to return the complete MemProf profile from the reader. We can directly pass its return value to InstrProfWriter::addMemProfData without having to deal with the indivual components of the MemProf profile. The new function is named "take", but it doesn't do std::move yet because of type differences (DenseMap v.s. MapVector). The end state I'm trying to get to is roughly as follows: - MemProfReader accepts IndexedMemProfData as a parameter as opposed to the three individual components (frames, call stacks, and records). - MemProfReader keeps IndexedMemProfData as a class member without decomposing it into its individual components. - MemProfReader returns IndexedMemProfData like: IndexedMemProfData takeMemProfData() { return std::move(MemProfData); }	2024-11-19 19:33:26 -08:00
lifengxiang1025	314e9b1cff	[llvm-profdata] fix typo (#116754 )	2024-11-20 10:52:16 +08:00
Kazu Hirata	0d38f64e7d	[memprof] Remove MemProf format Version 0 (#116442 ) This patch removes MemProf format Version 0 now that version 2 and 3 seem to be working well. I'm not touching version 1 for now because some tests still rely on version 1. Note that Version 0 is identical to Version 1 except that the MemProf section of the indexed format has a MemProf version field.	2024-11-15 15:37:00 -08:00
Teresa Johnson	bb3915149a	[MemProf] Support for random hotness when writing profile (#113998 ) Add support for generating random hotness in the memprof profile writer, to be used for testing. The random seed is printed to stderr, and an additional option enables providing a specific seed in order to reproduce a particular random profile.	2024-10-29 22:10:33 -07:00
Kazu Hirata	61a286ac08	[tools] Don't call StringRef::str() when calling StringMap::find (NFC) (#113119 ) StringMap::find takes StringRef. We don't need to create an instance of std::string from StringRef only to convert it right back to StringRef.	2024-10-21 06:50:34 -07:00
Kazu Hirata	75774c1c36	[llvm-profdata] Default to MemProf version 3 (#108863 ) It's very confusing to have support for Verion 3 but not default to it. This patch teaches llvm-profdata to use MemProf version 3 by default.	2024-10-11 08:56:45 -07:00
Kazu Hirata	55dd29c61d	[llvm-profdata] Avoid repeated hash lookups (NFC) (#111629 )	2024-10-08 23:02:04 -07:00
Corentin Kerisit	01135de401	[llvm-profdata] Fix typo in usage (#110434 ) From `profata` to `profdata`	2024-09-30 11:15:29 -04:00
gulfemsavrun	787cd8f0fe	[InstrProf] Add debuginfod correlation support (#106606 ) This patch adds debuginfod support into llvm-profdata to find the assosicated executable by a build id in a raw profile to correlate a profile with a provided correlation kind (debug-info or binary).	2024-09-06 13:28:23 -07:00
William Junda Huang	75e9d191f5	[llvm-profdata] Enabled functionality to write split-layout profile (#101795 ) Using the flag `-split_layout` in llvm-profdata merge, the output profile can write profiles with and without inlined function into two different extbinary sections (and their FuncOffsetTable too). The section without inlined functions are marked with `SecFlagFlat` and is skipped by ThinLTO because it provides no useful info. The split layout feature was already implemented in SampleProfWriter but previously there is no way to use it from llvm-profdata.	2024-08-28 20:33:54 -04:00
Mircea Trofin	afbd7d1e7c	[NFC] Coding style: drop `k` in `kGlobalIdentifierDelimiter` (#98230 )	2024-07-09 15:44:55 -07:00
Mircea Trofin	ce03155a1b	[NFC] Coding style fixes: SampleProf (#98208 ) Also some control flow simplifications. Notably, this doesn't address `sampleprof_error`. I think the style there tries to match `std::error_category`. Also left `hash_value` as-is, because it matches what we do in Hashing.h	2024-07-09 14:35:49 -07:00
Mingming Liu	3f78d89a2e	[TypeProf][InstrFDO]Omit vtable symbols in indexed profiles by default (#96520 ) - The indexed iFDO profiles contains compressed vtable names for `llvm-profdata show --show-vtables` debugging usage. An optimized build doesn't need it and doesn't decompress the blob now [1], since optimized binary has the source code and IR to find vtable symbols. - The motivation is to avoid increasing profile size when it's not necessary. - This doesn't change the indexed profile format and thereby doesn't need a version change. [1] `eac925fb81/llvm/include/llvm/ProfileData/InstrProfReader.h (L696-L699)`	2024-06-26 11:38:20 -07:00
Kazu Hirata	1365ce22e9	[llvm-profdata] Clean up traverseAllValueSites (NFC) (#95467 ) If NV == 0, nothing interesting happens after the "if" statement. We should just "continue" to the next value site. While I am at it, this patch migrates a use of getValueForSite to getValueArrayForSite.	2024-06-13 13:59:01 -07:00
Kazu Hirata	9e89d107a6	[memprof] Add MemProf format Version 3 (#93608 ) This patch adds Version 3 for development purposes. For now, this patch adds V3 as a copy of V2. For the most part, this patch adds "case Version3:" wherever "case Version2:" appears. One exception is writeMemProfV3, which is copied from writeMemProfV2 but updated to write out memprof::Version3 to the MemProf header. We'll incrementally modify writeMemProfV3 in subsequent patches.	2024-05-28 13:30:00 -07:00
Ellis Hoag	73eb9b3314	[InstrProf] Evaluate function order using test traces (#92451 ) The `llvm-profdata order` command is used to compute a function order using traces from the input profile. Add the `--num-test-traces` flag to keep aside N traces to evalute this order. These test traces are assumed to be the actual function execution order in some experiment. The output is a number that represents how many page faults we got. Lower is better. I tested on a large profile I already had. ``` llvm-profdata order default.profdata --num-test-traces=30 # Ordered 149103 functions # Total area under the page fault curve: 2.271827e+09 ... ``` I also improved `TemporalProfTraceTy::createBPFunctionNodes()` in a few ways: * Simplified how `UN`s are computed * Change how the initial `Node` order is computed * Filter out rare and common `UN`s * Output vector is an aliased argument instead of a return These changes slightly improved the evaluation in my test. ``` llvm-profdata order default.profdata --num-test-traces=30 # Ordered 149103 functions # Total area under the page fault curve: 2.268586e+09 ... ```	2024-05-23 11:19:29 -07:00
Fangrui Song	3fa6b3bbdb	[llvm-profdata] Fix some style and clang-tidy issues Fix #92761 Fix #92762	2024-05-20 17:15:52 -07:00
Ellis Hoag	f05c068429	[InstrProf] Remove unused argv in llvm-profdata.cpp (#92435 ) https://github.com/llvm/llvm-project/pull/71328 refactored `llvm-profdata.cpp` to use subcommands (which is super nice), but left many unused `argv` variables. This opts to use `ProgName` where necessary, and removes `argv` otherwise.	2024-05-16 15:40:02 -05:00
Kazu Hirata	4c8ec8f8bc	[memprof] Reduce schema for Version2 (#89876 ) Curently, the compiler only uses several fields of MemoryInfoBlock. Serializing all fields into the indexed MemProf file simply wastes storage. This patch limits the schema down to four fields for Version2 by default. It retains the old behavior of serializing all fields via: llvm-profdata merge --memprof-version=2 --memprof-full-schema This patch reduces the size of the indexed MemProf profile I have by 40% (1.6GB down to 1.0GB).	2024-04-24 16:25:35 -07:00
Kazu Hirata	172f6ddfa7	[memprof] Add Version2 of the indexed MemProf format (#89100 ) This patch adds Version2 of the indexed MemProf format. The new format comes with a hash table from CallStackId to actual call stacks llvm::SmallVector<FrameId>. The rest of the format refers to call stacks with CallStackId. This "values + references" model effectively deduplicates call stacks. Without this patch, a large indexed memprof file of mine shrinks from 4.4GB to 1.6GB, a 64% reduction. This patch does not make Version2 generally available yet as I am planning to make a few more changes to the format.	2024-04-18 14:12:58 -07:00
Kazu Hirata	3749e0d43f	[memprof] Use structured binding (NFC) (#88363 )	2024-04-11 09:54:41 -07:00
Kazu Hirata	2bede6873d	[memprof] Rename RawMemProfReader.{cpp,h} to MemProfReader.{cpp,h} (NFC) (#88200 ) This patch renames RawMemProfReader.{cpp,h} to MemProfReader.{cpp,h}, respectively. Also, it re-creates RawMemProfReader.h just to include MemProfReader.h for compatibility with out-of-tree users.	2024-04-10 22:03:20 -07:00
Mingming Liu	1351d17826	[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825 ) (The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691) * For InstrFDO value profiling, implement instrumentation and lowering for virtual table address. * This is controlled by `-enable-vtable-value-profiling` and off by default. * When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads. * Implement profile reader and writer support * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols. * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't happen since IR is used to construct InstrProfSymtab. * Indexed profile writer collects the list of vtable names, and stores that to index profiles. * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type. * `llvm-profdata show -show-vtables <args> <profile>` is implemented. rfc in https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7	2024-04-01 08:52:35 -07:00
Kazu Hirata	44253a9ce6	[memprof] Add MemProf version (#86414 ) This patch adds a version field to the MemProf section of the indexed profile format, calling the new version "version 1". The existing version is called "version 0". The writer supports both versions via a command-line option: llvm-profdata merge --memprof-version=1 ... The reader supports both versions by automatically detecting the version from the header.	2024-03-28 14:29:34 -07:00
Teresa Johnson	08ddd2ce40	[PGO] Add support for writing previous indexed format (#84505 ) Enable temporary support to ease use of new llvm-profdata with slightly older indexed profiles after 16e74fd48988ac95551d0f64e1b36f78a82a89a2, which bumped the indexed format for type profiling.	2024-03-08 12:27:46 -08:00
Mehdi Amini	716042a63f	Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702 ) The base class llvm::ThreadPoolInterface will be renamed llvm::ThreadPool in a subsequent commit. This is a breaking change: clients who use to create a ThreadPool must now create a DefaultThreadPool instead.	2024-03-05 18:00:46 -08:00
Mingming Liu	05091aa3ac	[NFC][InstrProf]Generalize getParsedIRPGOFuncName to getParsedIRPGOName (#81054 ) - Function getParsedIRPGOFuncName splits name by delimiter. The `[filename;]mangled-name` format could be generalized for non-function global values (e.g., vtables for type profiling). So rename the function. - Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.	2024-02-07 20:03:44 -08:00
Kazu Hirata	7d7f358cee	[llvm-profdata] Simplify a string equality comparison (NFC)	2024-01-27 22:20:27 -08:00
William Junda Huang	2b8649fbec	Added feature in llvm-profdata merge to filter functions from the profile (#78378 ) `--function=<regex>` Include functions matching regex in the output `--no-function=<regex>` Exclude functions matching regex from the output If both are specified, `--no-function` has a higher precedence if a function name matches both filters	2024-01-23 16:19:45 -05:00
Alexandre Ganea	3c6f47d6b8	[llvm-driver] Fix usage of `InitLLVM` on Windows (#76306 ) Previously, some tools such as `clang` or `lld` which require strict order for certain command-line options, such as `clang -cc1` or `lld -flavor`, would not longer work on Windows, when these tools were linked as part of `llvm-driver`. This was caused by `InitLLVM` which was part of the `*_main()` function of these tools, which in turn calls `windows::GetCommandLineArguments`. That function completly replaces argc/argv by new UTF-8 contents, so any ajustements to argc/argv made by `llvm-driver` prior to calling these tools was reset. `InitLLVM` is now called by the `llvm-driver`. Any tool that participates in (or is part of) the `llvm-driver` doesn't call `InitLLVM` anymore.	2024-01-11 19:08:28 -05:00
Ellis Hoag	9a2df55f47	[InstrProf] No linkage prefixes in IRPGO names (#76994 ) Change the format of IRPGO counter names to `[<filepath>;]<mangled-name>` which is computed by `GlobalValue::getGlobalIdentifier()` to fix #74565. In fe051934cbb0aaf25d960d7d45305135635d650b (https://reviews.llvm.org/D156569) the format of IRPGO counter names was changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O (yes, it is confusing that `<linkage-name>` is computed with `Mangler().getNameWithPrefix()` while `<mangled-name>` is just `F.getName()`). We discovered in #74565 that this causes some missed import issues on some targets and #74008 is a partial fix. Since `<mangled-name>` may not match the `<linkage-name>` on some targets like Mach-O, we will need to post-process the output of `llvm-profdata order` before passing to the linker via `-order_file`. Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will become stale after this diff, but I think this is acceptable since that patch landed after the LLVM 18 cut which hasn't been released yet.	2024-01-04 16:13:57 -08:00
Mingming Liu	665e46c268	[llvm-profdata] Use semicolon as the delimiter for supplementary profiles. (#75080 ) When merging instrFDO profiles with afdo profile as supplementary, instrFDO counters for static functions are stored with function's PGO name (with filename.cpp; prefix). - This pull request fixes the delimiter used when a PGO function name is 'normalized' for AFDO look-up.	2024-01-04 15:03:18 -08:00
Kazu Hirata	9664ab570a	[llvm-profdata] Modernize FuncSampleStats, ValueSitesStats, and HotFuncInfo (NFC)	2023-12-21 10:43:04 -08:00
Zequan Wu	ab3430f891	[Profile] Add binary profile correlation for code coverage. (#69493 ) ## Motivation Since we don't need the metadata sections at runtime, we can somehow offload them from memory at runtime. Initially, I explored [debug info correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113), which is used for PGO with value profiling disabled. However, it currently only works with DWARF and it's be hard to add such artificial debug info for every function in to CodeView which is used on Windows. So, offloading profile metadata sections at runtime seems to be a platform independent option. ## Design The idea is to use new section names for profile name and data sections and mark them as metadata sections. Under this mode, the new sections are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime and can be stripped away as a post-linking step. After the process exits, the generated raw profiles will contains only headers + counters. llvm-profdata can be used correlate raw profiles with the unstripped binary to generate indexed profile. ## Data For chromium base_unittests with code coverage on linux, the binary size overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and the raw profile files size reduce from 128M to 68M (46.9%) ``` $ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped FILE SIZE VM SIZE -------------- -------------- +121% +30.4Mi +121% +30.4Mi .text [NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data [NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names [NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts +95% +1.75Mi +95% +1.75Mi .eh_frame +108% +400Ki +108% +400Ki .eh_frame_hdr +9.5% +211Ki +9.5% +211Ki .rela.dyn +9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro +5.0% +87.3Ki +5.0% +87.3Ki .rodata [ = ] 0 +13% +47.0Ki .bss +40% +1.78Ki +40% +1.78Ki .got +12% +1.49Ki +12% +1.49Ki .gcc_except_table [ = ] 0 +65% +1.23Ki .relro_padding +62% +1.20Ki [ = ] 0 [Unmapped] +13% +448 +19% +448 .init_array +8.8% +192 [ = ] 0 [ELF Section Headers] +0.0% +136 +0.0% +80 [7 Others] +0.1% +96 +0.1% +96 .dynsym +1.2% +96 +1.2% +96 .rela.plt +1.5% +80 +1.2% +64 .plt [ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]] +195% +64.0Mi +194% +64.0Mi TOTAL $ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped FILE SIZE VM SIZE -------------- -------------- +121% +30.4Mi +121% +30.4Mi .text [NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts +95% +1.75Mi +95% +1.75Mi .eh_frame +108% +400Ki +108% +400Ki .eh_frame_hdr +9.5% +211Ki +9.5% +211Ki .rela.dyn +9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro +5.0% +87.3Ki +5.0% +87.3Ki .rodata [ = ] 0 +13% +47.0Ki .bss +40% +1.78Ki +40% +1.78Ki .got +12% +1.49Ki +12% +1.49Ki .gcc_except_table +13% +448 +19% +448 .init_array +0.1% +96 +0.1% +96 .dynsym +1.2% +96 +1.2% +96 .rela.plt +1.2% +64 +1.2% +64 .plt +2.9% +64 [ = ] 0 [ELF Section Headers] +0.0% +40 +0.0% +40 .data +1.2% +32 +1.2% +32 .got.plt +0.0% +24 +0.0% +8 [5 Others] [ = ] 0 -22.9% -872 [LOAD #5 [RW]] -74.5% -1.44Ki [ = ] 0 [Unmapped] [ = ] 0 -76.5% -1.45Ki .relro_padding +118% +38.8Mi +117% +38.8Mi TOTAL ``` A few things to note: 1. llvm-profdata doesn't support filter raw profiles by binary id yet, so when a raw profile doesn't belongs to the binary being digested by llvm-profdata, merging will fail. Once this is implemented, llvm-profdata should be able to only merge raw profiles with the same binary id as the binary and discard the rest (with mismatched/missing binary id). The workflow I have in mind is to have scripts invoke llvm-profdata to get all binary ids for all raw profiles, and selectively choose the raw pnrofiles with matching binary id and the binary to llvm-profdata for merging. 2. Note: In COFF, currently they are still loaded into memory but not used. I didn't do it in this patch because I noticed that `.lcovmap` and `.lcovfunc` are loaded into memory. A separate patch will address it. 3. This should works with PGO when value profiling is disabled as debug info correlation currently doing, though I haven't tested this yet.	2023-12-14 14:16:38 -05:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
lifengxiang1025	58199dfb55	[NFC] Typo fix (#74033 ) Fix spelling error from `linakge` to `linkage`. Co-authored-by: lifengxiang <lifengxiang.1025@bytedance.com>	2023-12-01 14:16:01 +08:00
Mingming Liu	493e2400ca	[nfc][llvm-profdata] Use cl::Subcommand to organize subcommand and options in llvm-profdata (#71328 ) - The motivation is to reduce the number of arguments passed around (e.g., from `show_main` to `show*Profile`). In order to do this, move function-defined options to global variables, and create `cl::SubCommand` for {show, merge, overlap, order} to organize options. - The side-effect by extracting function local options to a C++ namespace is that the extracted options are no longer (lazily) initialized when the enclosing function runs for the first time. - `cl::Subcommand` support (introduced in https://lists.llvm.org/pipermail/llvm-dev/2016-June/101804.html) could put options in a per-subcommand namespace. - One option could belong to multiple subcommand. This patch defines most of the options once and associates them with multiple subcommands except 1. `overlap` and `show` both has `value-cutoff` with different default values ([former](`64f62de966/llvm/tools/llvm-profdata/llvm-profdata.cpp (L2352)`) vs [latter](`64f62de966/llvm/tools/llvm-profdata/llvm-profdata.cpp (L3009)`)). Define 'OverlapValueCutoff' and 'ShowValueCutoff' respectively. 2. `show` supports three profile formats in `ProfileKind` while {`merge`, `overlap`} supports two. Define separate options. - Clean up obsolete code as a result, including `-h` and `--version` customizations. These two options are supported for all commands. Results pasted. - [-h and --help](https://gist.github.com/minglotus-6/387490e5eeda2dd2f9c440a424d6f360) output. - [--version](https://gist.github.com/minglotus-6/f905abcc3a346957bd797f2f84c18c1b) - [llvm-profdata show --help](https://gist.github.com/minglotus-6/f143079f02af243a94758138c0af471a) This PR should be `llvm-profdata` only. It depends on https://github.com/llvm/llvm-project/pull/71981	2023-11-14 10:19:13 -08:00
Zequan Wu	3c97c8b6fc	[Profile] Refactor profile correlation. (#70856 ) Refactor some code from https://github.com/llvm/llvm-project/pull/69493. #70712 was reverted due to linking failures. So, `-debug-info-correlate` remains unchanged and no new flag added.	2023-11-01 14:16:43 -04:00
Zequan Wu	89a2e70159	[llvm-profdata] Emit warning when counter value is greater than 2^56. (#69513 ) Fixes #65416	2023-10-31 16:40:51 -04:00
Zequan Wu	db7a1ed9a2	Revert "[Profile] Refactor profile correlation. (#70712 )" This reverts commit 4b383d0af93136b80841fc140da0823dfc441dd4.	2023-10-31 10:53:45 -04:00
Zequan Wu	4b383d0af9	[Profile] Refactor profile correlation. (#70712 ) Refactor some code from https://github.com/llvm/llvm-project/pull/69493. Rebase of https://github.com/llvm/llvm-project/pull/69656 on top of main as it was messed up.	2023-10-31 10:41:01 -04:00
William Junda Huang	ef0e0adccd	[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164 ) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In previous implementation, when a MD5 Sample Profile is read, the reader first converts the MD5 values to strings, and then create a StringRef as if the numerical strings are regular function names, and later on IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but it can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when being used in associative containers. ProfileFuncRef guarantees the same function name in string form or in MD5 form has the same hash value, which also fix a few issue in IPO passes where function matching/lookup only check for function name string, while returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec in average, and reading function offset table from 0.78s to 0.7s	2023-10-17 21:09:39 +00:00
Mingming Liu	1c2634e316	[NFC]Rename InstrProf::getFuncName{,orExternalSymbol} to getFuncOrValName{,IfDefined} (#68240 ) - This function looks up MD5ToNameMap to return a name for a given MD5. https://github.com/llvm/llvm-project/pull/66825 adds MD5 of global variable names into this map. So rename methods and update comments	2023-10-04 11:56:28 -07:00
Kazu Hirata	d27614e1d3	[llvm-profdata] Modernize SampleOverlapStats (NFC)	2023-08-19 07:56:37 -07:00
William Huang	7624de5bea	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) ) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740	2023-08-17 20:10:45 +00:00

1 2 3 4 5 ...

320 Commits