312 Commits

Author SHA1 Message Date
Kazu Hirata
4d1bb7699b
[memprof] Fix a typo in writeMemProfV1 (#87890)
This patch borrows memprof-merge.test to test --memprof-version.
2024-04-07 15:06:13 -07:00
Mingming Liu
1351d17826
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825)
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
 
* Implement profile reader and writer support 
  * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to index profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01 08:52:35 -07:00
Teresa Johnson
08ddd2ce40
[PGO] Add support for writing previous indexed format (#84505)
Enable temporary support to ease use of new llvm-profdata with slightly
older indexed profiles after 16e74fd48988ac95551d0f64e1b36f78a82a89a2,
which bumped the indexed format for type profiling.
2024-03-08 12:27:46 -08:00
Alan Zhao
922a431e10
[profdata][nfc] Disable several tests on Windows (#83907)
Several profdata tests pass the byte 012 to printf. This causes these
tests to fail when using GnuWin32's version of printf because printf
will detect that 012 is the LF character and will prepend the byte 015
(CR) in front of LF.

This change is required after
https://github.com/llvm/llvm-project/pull/82711 which bumped the version
number.
2024-03-04 13:15:50 -08:00
Mingming Liu
16e74fd489
Reland "[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling." (#82711)
New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.

1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
   
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))

**Original Description**

* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-27 11:07:40 -08:00
Mingming Liu
0e8d1877cd
Revert type profiling change as compiler-rt test break on Windows. (#82583)
Examples
https://lab.llvm.org/buildbot/#/builders/127/builds/62532/steps/8/logs/stdio
2024-02-21 21:41:33 -08:00
Mingming Liu
db7e9e6841
[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling. (#81691)
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-21 20:59:42 -08:00
William Junda Huang
2b8649fbec
Added feature in llvm-profdata merge to filter functions from the profile (#78378)
`--function=<regex>` Include functions matching regex in the output
`--no-function=<regex>` Exclude functions matching regex from the output

If both are specified, `--no-function` has a higher precedence if a
function name matches both filters
2024-01-23 16:19:45 -05:00
Mingming Liu
665e46c268
[llvm-profdata] Use semicolon as the delimiter for supplementary profiles. (#75080)
When merging instrFDO profiles with afdo profile as supplementary, instrFDO counters for static functions are stored with function's PGO name (with filename.cpp; prefix).
- This pull request fixes the delimiter used when a PGO function name is 'normalized' for AFDO look-up.
2024-01-04 15:03:18 -08:00
Kazu Hirata
1c1eaf75f5 [llvm-profdata] Make tests more readable (NFC)
This patch splits a couple of lines of printf into four for
readability so that each corresponds to one field or padding.  They
correspond to NumCounters, NumValueSites, NumBitmapBytes, and padding,
respectively.
2023-12-27 13:28:37 -08:00
Kazu Hirata
ce02357795 [llvm-profdata] Make tests more readable (NFC)
This patch splits a couple of lines of printf into four for
readability so that each corresponds to one field or padding.  They
correspond to NumCounters, NumValueSites, NumBitmapBytes, and padding,
respectively.
2023-12-27 13:19:09 -08:00
Kazu Hirata
b8424eaede [llvm-profdata] Make tests more readable (NFC)
These tests generally use one printf for each field of
RawInstrProf::ProfileData except the lines being touched in this
patch.  These lines print two fields, namely NumValueSites and
NumBitmapBytes, with one printf, which is very confusing.  (Note that
the 4-byte printf at the end of the group is padding to make the
struct size a multiple of 8 bytes.)

This patch makes the tests a litle more readable by splitting
NumValueSites and NumBitmapBytes into two separate lines.
2023-12-26 23:31:21 -08:00
Mingming Liu
493e2400ca
[nfc][llvm-profdata] Use cl::Subcommand to organize subcommand and options in llvm-profdata (#71328)
- The motivation is to reduce the number of arguments passed around
(e.g., from `show_main` to `show*Profile`). In order to do this, move
function-defined options to global variables, and create
`cl::SubCommand` for {show, merge, overlap, order} to organize options.
- The side-effect by extracting function local options to a C++
namespace is that the extracted options are no longer (lazily)
initialized when the enclosing function runs for the first time.
- `cl::Subcommand` support (introduced in
https://lists.llvm.org/pipermail/llvm-dev/2016-June/101804.html) could
put options in a per-subcommand namespace.
- One option could belong to multiple subcommand. This patch defines
most of the options once and associates them with multiple subcommands
except
1. `overlap` and `show` both has `value-cutoff` with different default
values
([former](64f62de966/llvm/tools/llvm-profdata/llvm-profdata.cpp (L2352))
vs
[latter](64f62de966/llvm/tools/llvm-profdata/llvm-profdata.cpp (L3009))).
Define 'OverlapValueCutoff' and 'ShowValueCutoff' respectively.
2. `show` supports three profile formats in `ProfileKind` while
{`merge`, `overlap`} supports two. Define separate options.
- Clean up obsolete code as a result, including `-h` and `--version`
customizations. These two options are supported for all commands.
Results pasted.
- [-h and
--help](https://gist.github.com/minglotus-6/387490e5eeda2dd2f9c440a424d6f360)
output.
-
[--version](https://gist.github.com/minglotus-6/f905abcc3a346957bd797f2f84c18c1b)
- [llvm-profdata show
--help](https://gist.github.com/minglotus-6/f143079f02af243a94758138c0af471a)

This PR should be `llvm-profdata` only. It depends on
https://github.com/llvm/llvm-project/pull/71981
2023-11-14 10:19:13 -08:00
Zequan Wu
89a2e70159
[llvm-profdata] Emit warning when counter value is greater than 2^56. (#69513)
Fixes #65416
2023-10-31 16:40:51 -04:00
Alan Phipps
f95b2f1acf Reland "[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)"
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.

Differential Revision: https://reviews.llvm.org/D138846
2023-10-30 11:15:02 -05:00
Zequan Wu
3c34245c47
[Profile] Use upper 32 bits of profile version for profile variants. (#67695)
Currently all upper 8 bits are reserved for different profile variants.
We need more bits for new mods in the future.
Context:
https://discourse.llvm.org/t/how-to-add-a-new-mode-to-llvm-raw-profile-version/73688
2023-10-03 10:15:22 -04:00
Hans Wennborg
53a2923bf6 Revert "[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)"
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.

> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846

This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.
2023-09-21 12:20:24 +02:00
Alan Phipps
a50486fd73 [InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.

Differential Revision: https://reviews.llvm.org/D138846
2023-09-19 17:07:23 -05:00
William Huang
7624de5bea [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-08-17 20:10:45 +00:00
Aaron Ballman
1a53b5c367 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292.

Addressing issues found by:
https://lab.llvm.org/buildbot/#/builders/245/builds/11732
https://lab.llvm.org/buildbot/#/builders/187/builds/12251
https://lab.llvm.org/buildbot/#/builders/186/builds/11099
https://lab.llvm.org/buildbot/#/builders/182/builds/6976
2023-07-28 09:41:38 -04:00
William Huang
66ba71d913 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-07-27 23:08:27 +00:00
Fangrui Song
4c2980c1a3 [llvm-profdata] Stabilize iteration order for InstrProfWriter
If two functions are inserted to the same bucket, their order in the
serialized profile is dependent on StringMap iteration order, which is
not guaranteed to be deterministic.
(https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h).
Use a sort like we do in writeText.
2023-07-20 18:31:41 -07:00
Haojian Wu
58056ae299 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7.

The commit has broken the buildbot, see comment https://reviews.llvm.org/D147740#4451540
2023-06-27 15:19:35 +02:00
William Huang
12e9c7aaa6 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-06-27 00:06:05 +00:00
Douglas Yung
c9a8a0e8a9 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e.

This change is causing build failures on many Windows build bots:

https://lab.llvm.org/buildbot/#/builders/216/builds/22833
https://lab.llvm.org/buildbot/#/builders/123/builds/19602
https://lab.llvm.org/buildbot/#/builders/172/builds/28315
https://lab.llvm.org/buildbot/#/builders/119/builds/13870
https://lab.llvm.org/buildbot/#/builders/233/builds/794
https://lab.llvm.org/buildbot/#/builders/235/builds/387
https://lab.llvm.org/buildbot/#/builders/13/builds/36921
https://lab.llvm.org/buildbot/#/builders/127/builds/50510
2023-06-23 17:58:22 -07:00
William Huang
31af18bcce [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-06-23 21:48:52 +00:00
Ellis Hoag
c1d935ece3 [InstrProf] Fix BalancedPartitioning when threads are disabled
In https://reviews.llvm.org/D147812 we introduced the class
`BalancedPartitioning` which includes some threading code. The tests in
that diff run forever when built with `-DLLVM_ENABLE_THREADS=OFF` so
some bots were broken.

These tests were skipped in
https://reviews.llvm.org/rGa4845eaf2e9aa18dd900d7cbeff4e5ff52e4b50e
because of this.

This diff disables the threading code if `LLVM_ENABLE_THREADS` is
disabled so we can re-enable the tests.

Reviewed By: luporl

Differential Revision: https://reviews.llvm.org/D152390
2023-06-07 12:04:35 -07:00
Leandro Lupori
a4845eaf2e [InstrProf] Skip Balanced Partitioning tests on ARM
Balanced Partitioning tests, added by 1794532bb942, currently
hang on ARM, what causes check-all to never finish.

Issue #63168

Differential Revision: https://reviews.llvm.org/D147812
2023-06-07 17:39:41 +00:00
Ellis Hoag
1117b9a284 [InstrProf] Use BalancedPartitioning to order temporal profiling trace data
In [0] we described an algorithm called //BalancedPartitioning// (bp) to consume function traces [1] and compute a function order that reduces the number of page faults during startup.

This patch adds the `order` command to the `llvm-profdata` tool which uses bp to output a function order that can be passed to the linker via `--symbol-ordering-file=`.

Special thanks to Sergey Pupyrev and Julian Mestre for designing this balanced partitioning algorithm.

[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
[1] https://reviews.llvm.org/D147287

Reviewed By: spupyrev

Differential Revision: https://reviews.llvm.org/D147812
2023-06-06 11:59:57 -07:00
Michael Platings
1023d7e40d [llvm-profdata] Fix test on Windows
Output on Windows is "llvm-profdata.exe"
2023-05-22 16:00:44 +01:00
Michael Platings
6521905389 [llvm-profdata] Accept --version argument
The `llvm-profdata --version` output now looks like:

  llvm-profdata
  LLVM (http://llvm.org/):
    LLVM version 17.0.0git
    Optimized build with assertions.

This makes llvm-profdata more consistent with other tools.

Reviewed By: simon_tatham

Differential Revision: https://reviews.llvm.org/D150964
2023-05-22 14:44:03 +01:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
William Huang
4357824c63 [llvm-profdata] ProfileReader cleanup - preparation for MD5 refactoring
Cleanup profile reader classes to prepare for complex refactoring as propsed in D147740 (Use MD5 as key for profile map). Change is too complicated so I am cleaning up the reader implementation first with these goals.
-  Reduce duplicated/similar logic
-  Reduce virtual functions, changing them to non-virtual
-  Reduce unnecessry checks, indirections, and dead writes.

This is patch 1/n. This patch refactors NameTable

Explaining several decisions here

1. useMD5() means whether  names of the profiles (the ProfileMap) are represented as MD5. It is NOT whether the input profile format is MD5. This function is an interface for IPO passes to decide whether to match function names or function MD5. There are two motives here:
(a) Eventually we want to use MD5 to represent all function contexts because it is much faster to use it as a key for lookup tables (prototype implementation D147740), so in compilation mode we call setProfileUseMD5() to force use MD5. While in tools mode (llvm-profdata) we want to keep the function name info if it's in the input profile.
(b) We also propose to allow multiple name tables and profile sections in ExtBinary format, and it could consist of name tables with or without using MD5, in this case MD5 prevails and other name tables are converted to MD5.

2. MD5 handling logic is pushed up to BinaryReader base class, because this trades a non-devirtualized virtual function call with a predictable branch. ReadStringFromTable() accounts for >5% time when loading a full 1 GB profile, it should not be virtual.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D148868
2023-05-06 00:21:03 +00:00
William Huang
d38d6ca179 [llvm-profdata] Deprecate Compact Binary Sample Profile Format
Remove support for compact binary sample profile format

Reviewed By: davidxl, wenlei

Differential Revision: https://reviews.llvm.org/D149400
2023-05-01 17:10:08 +00:00
Jessica Paquette
17cfd2e025 [profiling] Improve error message for raw profile header mismatches
When a user uses a mismatched clang + llvm-profdata, they didn't get a very
informative error message. It would just say "unsupported version".

As a result, users are often confused as to what they are supposed to do and
tend to assume that it's a bug in the profiling runtime.

This patch improves the error message by:

- Adding a new class of error (`raw_profile_version_mismatch`) to make it clear
  that, specifically, the *raw profile* version is unsupported because of a
  tool mismatch.

- Adding an error message that tells the user which raw profile version was
  encountered, which version was expected, and instructs them to align their
  tool versions.

To support this, this patch also updates `InstrProfError::take` to also
propagate the optional error message.

Differential Revision: https://reviews.llvm.org/D149361
2023-04-27 14:51:38 -07:00
Snehasish Kumar
932d7b9ddd [memprof] Print out profile build ids in the error message.
When no --profiled-binary flag is provided we can print out the build
ids of the modules in the profile. This can help the user fetch the
correct binary from e.g. remote object store.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D148301
2023-04-17 17:53:57 +00:00
Ellis Hoag
4bddef4117 [InstrProf][Temporal] Add weight field to traces
As discussed in [0], add a `weight` field to temporal profiling traces found in profiles. This allows users to use the `--weighted-input=` flag in the `llvm-profdata merge` command to weight traces from different scenarios differently.

Note that this is a breaking change, but since [1] landed very recently and there is no way to "use" this trace data, there should be no users of this feature. We believe it is acceptable to land this change without bumping the profile format version.

[0] https://reviews.llvm.org/D147812#4259507
[1] https://reviews.llvm.org/D147287

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D148150
2023-04-13 10:37:05 -07:00
Ellis Hoag
244be0b0de [InstrProf] Temporal Profiling
As described in [0], this extends IRPGO to support //Temporal Profiling//.

When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.

Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.

In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.

Special thanks to Julian Mestre for helping with reservoir sampling.

[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D147287
2023-04-11 08:30:52 -07:00
Mingming Liu
f91b0f23c2 [AutoFDO]Merge called target in body samples when flattening profiles
- Body samples could have call targets, merge them as well.

Differential Revision: https://reviews.llvm.org/D147297
2023-03-31 13:24:59 -07:00
wlei
3520f6d666 Fix a missing checksum field 2023-03-31 10:16:06 -07:00
wlei
339b8a0019 [AutoFDO] Use flattened profiles for profile staleness metrics
For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching.
Example for profile flattening:

```
Original profile:
_Z3bazi:20301:1000
 1: 1000
 3: 2000
 5: inline1:1600
   1: 600
   3: inline2:500
     1: 500

Flattened profile:
_Z3bazi:18701:1000
 1: 1000
 3: 2000
 5: 600 inline1:600
inline1:1100:600
 1: 600
 3: 500 inline2: 500
inline2:500:500
 1: 500
```
This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D146452
2023-03-30 11:05:10 -07:00
Snehasish Kumar
cef71d0105 [memprof] Support symbolization of PIE binaries.
Support symolization of PIE binaries in memprof. We assume that the
profiled binary has one executable text segment for simplicity. Update
the memprof-pic test to now expect the same output as the memprof-basic test.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D146181
2023-03-21 20:13:18 +00:00
Snehasish Kumar
a1bbf5ac3c [memprof] Record BuildIDs in the raw profile.
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D145190
2023-03-14 18:16:38 +00:00
Snehasish Kumar
debe80cb8d Revert "[memprof] Record BuildIDs in the raw profile."
This reverts commit 287177a47a396ca6cc0bef7696108cdaa0c68e5f.
2023-03-13 20:09:46 +00:00
Snehasish Kumar
287177a47a [memprof] Record BuildIDs in the raw profile.
This patch adds support for recording BuildIds usng the sanitizer
ListOfModules API. We add another entry to the SegmentEntry struct and
change the memprof raw version.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D145190
2023-03-13 19:28:38 +00:00
Kirill Stoimenov
011b4d4706 [HWASAN][LSAN] Disable tests which don't pass in HWASAN+LSAN mode
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D145727
2023-03-10 00:51:55 +00:00
Snehasish Kumar
e99b5ad383 [memprof] Add scripts to automate testdata regeneration.
The memprof profiles and binaries need to be updated in case of version
updates. This change adds three scripts for llvm-profdata, clang and
llvm tests where memprof profiles are used as inputs. Also update the
tests, profiles and binaries in this change. Change based on the review
suggestions in D145023.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D145644
2023-03-09 19:54:23 +00:00
Snehasish Kumar
e1b569b96a Revert "[memprof] Refactor tests to generate binaries and profiles on the fly."
This reverts commit 599b7690fa917ea4e9cd67275e34d0b5a0f51aa9. Since
adding a cross project dependency is a concern.
2023-03-06 23:48:52 +00:00
Snehasish Kumar
599b7690fa [memprof] Refactor tests to generate binaries and profiles on the fly.
This change replaces the binary profiles and executables used for
testing the memprof profile reader with tests where the profiles are
generated on the fly. This reduces toil when the profile version
changes. The tests are moved from tools/llvm-profdata to
compiler-rt/test/memprof due to the following reasons:
1. Adding dependency on memprof lit.cfg.py for llvm-profdata is
   preferable to adding a dependency on compiler-rt for llvm/test.
2. All the tests can now be run with `ninja check-memprof`.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D145023
2023-03-06 21:24:40 +00:00
Erik Desjardins
87d02e0dfd Recommit "[Support] change StringMap hash function from djbHash to xxHash"
This reverts commit 37eb9d13f891f7656f811516e765b929b169afe0.

Test failures have been fixed:

- ubsan failure fixed by 72eac42f21c0f45a27f3eaaff9364cbb5189b9e4
- warn-unsafe-buffer-usage-fixits-local-var-span.cpp fixed by
  03cc52dfd1dbb4a59b479da55e87838fb93d2067 (wasn't related)
- test-output-format.ll failure was spurious, build failed at
  https://lab.llvm.org/buildbot/#/builders/54/builds/3545 (b4431b2d945b6fc19b1a55ac6ce969a8e06e1e93)
  but passed at
  https://lab.llvm.org/buildbot/#/builders/54/builds/3546 (5ae99be0377248c74346096dc475af254a3fc799)
  which is before my revert
  b4431b2d94...5ae99be037

Original commit message:

    Depends on https://reviews.llvm.org/D142861.

    Alternative to https://reviews.llvm.org/D137601.

    xxHash is much faster than djbHash. This makes a simple Rust test case with a large constant string 10% faster to compile.

    Previous attempts at changing this hash function (e.g. https://reviews.llvm.org/D97396) had to be reverted due to breaking tests that depended on iteration order.
    No additional tests fail with this patch compared to `main` when running `check-all` with `-DLLVM_ENABLE_PROJECTS="all"` (on a Linux host), so I hope I found everything that needs to be changed.

    Differential Revision: https://reviews.llvm.org/D142862
2023-02-19 16:52:26 -05:00