220 Commits

Author SHA1 Message Date
Kazu Hirata
4ce65423be
[memprof] Use const ref for IndexedRecord (#94114)
The type of *Iter here is "const IndexedMemProfRecord &" as defined in
RecordLookupTrait.  Assigning *Iter to a variable of type
"const IndexedMemProfRecord &" avoids a copy, reducing the cycle and
instruction counts by 1.8% and 0.2%, respectively, with
"llvm-profdata show" modified to deserialize all MemProfRecords.

Note that RecordLookupTrait has an internal copy of
IndexedMemProfRecord, so we don't have to worry about a dangling
reference to a temporary.
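
The effect can be illustrated outside LLVM with a minimal sketch. `Record`, `Lookup`, and `CopyCount` are hypothetical stand-ins, not the memprof types; the only property borrowed from the message above is that the lookup object keeps its own copy of the record and hands out a const reference to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Counts copy constructions of Record (hypothetical instrumentation).
int CopyCount = 0;

// Hypothetical stand-in for IndexedMemProfRecord.
struct Record {
  std::vector<uint64_t> Payload;
  explicit Record(std::vector<uint64_t> P) : Payload(std::move(P)) {}
  Record(const Record &Other) : Payload(Other.Payload) { ++CopyCount; }
};

// Hypothetical stand-in for the lookup trait: it owns the record it hands
// out, so a reference to the result stays valid after the call returns.
class Lookup {
  Record Stored;

public:
  explicit Lookup(std::vector<uint64_t> P) : Stored(std::move(P)) {}
  const Record &get() const { return Stored; }
};

// Binding the result to a value copies the whole record.
std::size_t sizeByValue(const Lookup &L) {
  Record R = L.get();
  return R.Payload.size();
}

// Binding the result to a const reference copies nothing.
std::size_t sizeByConstRef(const Lookup &L) {
  const Record &R = L.get();
  assert(&R == &L.get() && "R aliases the object owned by the lookup");
  return R.Payload.size();
}
```

Calling `sizeByValue` increments `CopyCount` once per call, while `sizeByConstRef` leaves it unchanged.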
2024-06-02 13:30:32 -07:00
Kazu Hirata
90acfbf90d
[memprof] Use linear IDs for Frames and call stacks (#93740)
With this patch, we stop using on-disk hash tables for Frames and call
stacks.  Instead, we'll write out all the Frames as a flat array while
maintaining mappings from FrameIds to the indexes into the array.
Then we serialize call stacks in terms of those indexes.

Likewise, we'll write out all the call stacks as another flat array
while maintaining mappings from CallStackIds to the indexes into the
call stack array.  One minor difference from Frames is that the
indexes into the call stack array are not contiguous because call
stacks are variable-length objects.

Then we serialize IndexedMemProfRecords in terms of the indexes
into the call stack array.

Now, we describe each call stack with 32-bit indexes into the Frame
array (as opposed to the 64-bit FrameIds in Version 2).  The use of
the smaller type cuts down the profile file size by about 40% relative
to Version 2.  The departure from the on-disk hash tables contributes
a little bit to the savings, too.

For now, IndexedMemProfRecords refer to call stacks with 64-bit
indexes into the call stack array.  As a follow-up, I'll change that
to uint32_t, including necessary updates to RecordWriterTrait.
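
A rough sketch of the idea, under stated assumptions: `Frame` is reduced to a single field, `FrameTable`, `buildFrameTable`, `encodeCallStack`, and `appendCallStack` are hypothetical names, and the "length, then entries" layout of the call stack array is assumed for illustration only. It is not the on-disk format.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using FrameId = uint64_t;       // 64-bit ID, as in Version 2
using LinearFrameId = uint32_t; // index into the flat Frame array

// Hypothetical, heavily simplified Frame.
struct Frame {
  uint64_t Function;
};

struct FrameTable {
  std::vector<Frame> Flat;                            // written out as-is
  std::unordered_map<FrameId, LinearFrameId> IndexOf; // kept for later passes
};

// Assigns indexes 0, 1, 2, ... in the order Frames are appended.
FrameTable
buildFrameTable(const std::vector<std::pair<FrameId, Frame>> &Frames) {
  FrameTable T;
  for (const auto &Entry : Frames) {
    auto Index = static_cast<LinearFrameId>(T.Flat.size());
    if (T.IndexOf.emplace(Entry.first, Index).second)
      T.Flat.push_back(Entry.second);
  }
  return T;
}

// Rewrites a call stack of 64-bit FrameIds as 32-bit indexes.
std::vector<LinearFrameId> encodeCallStack(const std::vector<FrameId> &Stack,
                                           const FrameTable &T) {
  std::vector<LinearFrameId> Out;
  Out.reserve(Stack.size());
  for (FrameId Id : Stack) {
    auto It = T.IndexOf.find(Id);
    assert(It != T.IndexOf.end() && "call stack refers to an unknown Frame");
    Out.push_back(It->second);
  }
  return Out;
}

// Appends one encoded call stack to the flat call stack array and returns
// the index where it starts. Assumed layout: length, then entries.
uint64_t appendCallStack(std::vector<LinearFrameId> &Array,
                         const std::vector<LinearFrameId> &Enc) {
  uint64_t Start = Array.size();
  Array.push_back(static_cast<LinearFrameId>(Enc.size()));
  Array.insert(Array.end(), Enc.begin(), Enc.end());
  return Start;
}
```

Because each call stack occupies a variable number of slots, the start indexes returned by `appendCallStack` are not consecutive, which matches the remark above about non-contiguous call stack indexes.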
2024-05-30 14:28:22 -07:00
Kazu Hirata
99b9ab45cd
[memprof] Reorder MemProf sections in profile (#93640)
This patch teaches the V3 format to serialize Frames, call stacks, and
IndexedMemProfRecords, in that order.

I'm planning to use linear IDs for Frames.  That is, Frames will be
numbered 0, 1, 2, and so on in the order we serialize them.  In turn,
we will serialize the call stacks in terms of those linear IDs.

Likewise, I'm planning to use linear IDs for call stacks and then
serialize IndexedMemProfRecords in terms of those linear IDs for call
stacks.

With the new order, we can successively free data structures as we
serialize them.  That is, once we serialize Frames, we can free the
Frames' data proper and just retain mappings from FrameIds to linear
IDs.  A similar story applies to call stacks.
2024-05-29 12:18:24 -07:00
Mingming Liu
737a3018e8
[nfc][InstrFDO] Add Header::getIndexedProfileVersion and use it to decide profile version. (#93613)
This is a split of https://github.com/llvm/llvm-project/pull/93346 as
discussed.
2024-05-29 10:15:17 -07:00
Kazu Hirata
9e89d107a6
[memprof] Add MemProf format Version 3 (#93608)
This patch adds Version 3 for development purposes.  For now, this
patch adds V3 as a copy of V2.

For the most part, this patch adds "case Version3:" wherever "case
Version2:" appears.  One exception is writeMemProfV3, which is copied
from writeMemProfV2 but updated to write out memprof::Version3 to the
MemProf header.  We'll incrementally modify writeMemProfV3 in
subsequent patches.
2024-05-28 13:30:00 -07:00
Kazu Hirata
8ad980d7dc
[memprof] Refactor getMemProfRecord (NFC) (#93138)
This patch refactors getMemProfRecord for readability while adding
consistency checks.

- This patch adds a switch statement on the MemProf version just like
  most places dealing with MemProf serialization/deserialization.

- This patch adds asserts to ensure that the exact set of data
  structures is available while ones we do not use are not present.
  That is, getMemProfRecord no longer determines the version based on
  the availability of MemProfCallStackTable.
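
The shape of such a check might look like the sketch below. `LoadedTables`, `tablesMatchVersion`, and `tablesConsulted` are hypothetical names, and the mapping from versions to tables (call stack table only in Version2) is an assumption inferred from the message, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

enum IndexedVersion : uint64_t { Version0 = 0, Version1 = 1, Version2 = 2 };

// Hypothetical summary of which tables a reader has loaded.
struct LoadedTables {
  bool HasRecordTable = false;
  bool HasFrameTable = false;
  bool HasCallStackTable = false;
};

// Returns true if the loaded tables are exactly those the version needs:
// required tables present, unused tables absent.
bool tablesMatchVersion(IndexedVersion V, const LoadedTables &T) {
  switch (V) {
  case Version0:
  case Version1:
    return T.HasRecordTable && T.HasFrameTable && !T.HasCallStackTable;
  case Version2:
    return T.HasRecordTable && T.HasFrameTable && T.HasCallStackTable;
  }
  return false;
}

// Hypothetical lookup entry point: the version drives the dispatch, and the
// table set is only asserted on, never used to guess the version.
int tablesConsulted(IndexedVersion V, const LoadedTables &T) {
  assert(tablesMatchVersion(V, T) && "unexpected set of tables for version");
  switch (V) {
  case Version0:
  case Version1:
    return 2; // record table + frame table
  case Version2:
    return 3; // ... plus the call stack table
  }
  return 0;
}
```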
2024-05-23 14:13:20 -07:00
Mingming Liu
b66779b5bf
[nfc][InstrProfReader]Store header fields in native endianness (#92947)
- Use `Header.Version` directly and remove Header::formatVersion

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
2024-05-21 21:25:12 -07:00
Kazu Hirata
352602010f
Reapply [memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places.  This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.

The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions.  This
patch splits it into two phases for flexibility (but makes them
composable) because some places only require the FrameId conversion.

This iteration fixes a problem uncovered with ubsan, where we were
dereferencing an uninitialized std::unique_ptr.
2024-04-28 11:44:45 -07:00
Vitaly Buka
7aa6896dd7
Revert "[memprof] Introduce FrameIdConverter and CallStackIdConverter" (#90318)
Reverts llvm/llvm-project#90307

Breaks bots https://lab.llvm.org/buildbot/#/builders/5/builds/42943
2024-04-27 00:15:08 -07:00
Kazu Hirata
e04df693bf
[memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places.  This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.

The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions.  This
patch splits it into two phases for flexibility (but makes them
composable) because some places only require the FrameId conversion.
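
A simplified sketch of two composable function objects, assuming `std::map` for the tables and `std::function` for the composition. The struct layouts, the `LastUnmappedId` members, and the simplified `Frame` are illustrative assumptions, not the code in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <vector>

using FrameId = uint64_t;
using CallStackId = uint64_t;

// Hypothetical, heavily simplified Frame.
struct Frame {
  uint64_t Function;
};

// Phase 1: FrameId -> Frame. Remembers the last ID it could not map.
struct FrameIdConverter {
  const std::map<FrameId, Frame> &Map;
  std::optional<FrameId> LastUnmappedId;

  Frame operator()(FrameId Id) {
    auto It = Map.find(Id);
    if (It == Map.end()) {
      LastUnmappedId = Id;
      return Frame{0};
    }
    return It->second;
  }
};

// Phase 2: CallStackId -> call stack, delegating each frame to phase 1.
struct CallStackIdConverter {
  const std::map<CallStackId, std::vector<FrameId>> &Map;
  std::function<Frame(FrameId)> FrameIdToFrame;
  std::optional<CallStackId> LastUnmappedId;

  std::vector<Frame> operator()(CallStackId Id) {
    std::vector<Frame> Frames;
    auto It = Map.find(Id);
    if (It == Map.end()) {
      LastUnmappedId = Id;
      return Frames;
    }
    for (FrameId F : It->second)
      Frames.push_back(FrameIdToFrame(F));
    return Frames;
  }
};
```

Callers that only need frames use `FrameIdConverter` on its own; callers that need whole call stacks pass it (for example via `std::ref`) into `CallStackIdConverter`.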
2024-04-26 19:22:17 -07:00
Kazu Hirata
777d2e54a9
[memprof] Drop the trait parameter (NFC) (#89461)
OnDiskIterableChainedHashTable::Create can default-construct a trait
object for us.  We don't need to construct one on our own unless we
need to customize something (like a version number).
2024-04-19 17:00:57 -07:00
Kazu Hirata
b64c69d5b1
[memprof] Introduce IndexedMemProfReader (NFC) (#89331)
Without this patch, a lot of details about the deserialization of the
MemProf data are exposed in IndexedInstrProfReader -- four
MemProf-related variables in IndexedInstrProfReader plus 90+ lines of
deserialization code within IndexedInstrProfReader::readHeader.

This patch encapsulates them into a separate class, exposing only
three methods, namely the default constructor, deserialize, and
getMemProfRecord.
2024-04-18 21:16:46 -07:00
Kazu Hirata
172f6ddfa7
[memprof] Add Version2 of the indexed MemProf format (#89100)
This patch adds Version2 of the indexed MemProf format.  The new
format comes with a hash table from CallStackId to actual call stacks
llvm::SmallVector<FrameId>.  The rest of the format refers to call
stacks with CallStackId.  This "values + references" model effectively
deduplicates call stacks.  With this patch, a large indexed memprof
file of mine shrinks from 4.4GB to 1.6GB, a 64% reduction.

This patch does not make Version2 generally available yet as I am
planning to make a few more changes to the format.
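
The "values + references" model can be sketched as follows. The FNV-1a hash used to derive a `CallStackId` is a stand-in chosen for illustration (the hash used by the format is not stated above), and `DedupedProfile` and `dedup` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using FrameId = uint64_t;
using CallStackId = uint64_t;
using CallStack = std::vector<FrameId>;

// Stand-in ID function: FNV-1a over the bytes of each FrameId.
CallStackId hashCallStack(const CallStack &CS) {
  uint64_t H = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  for (FrameId Id : CS)
    for (int Byte = 0; Byte < 8; ++Byte) {
      H ^= (Id >> (8 * Byte)) & 0xff;
      H *= 0x100000001b3ULL; // FNV-1a 64-bit prime
    }
  return H;
}

struct DedupedProfile {
  std::map<CallStackId, CallStack> Values; // each call stack stored once
  std::vector<CallStackId> References;     // one entry per use
};

// Stores every distinct call stack once and refers to it by ID elsewhere.
DedupedProfile dedup(const std::vector<CallStack> &PerUse) {
  DedupedProfile P;
  for (const CallStack &CS : PerUse) {
    CallStackId Id = hashCallStack(CS);
    P.Values.emplace(Id, CS); // no-op if this call stack is already stored
    P.References.push_back(Id);
  }
  return P;
}
```

The more often the same call stack recurs, the more the references (one ID each) save over repeating the full vector of frames.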
2024-04-18 14:12:58 -07:00
Kazu Hirata
f430e37446
[llvm] Drop unaligned from calls to readNext (NFC) (#88841)
Now readNext defaults to unaligned accesses.  This patch drops
unaligned to improve readability.
2024-04-16 12:47:02 -07:00
Kazu Hirata
db9a17a407
[memprof] Use std::optional (NFC) (#88366)
2024-04-11 09:56:01 -07:00
Kazu Hirata
d89914f30b
[memprof] Add Version2 of IndexedMemProfRecord serialization (#87455)
I'm currently developing a new version of the indexed memprof format
where we deduplicate call stacks in IndexedAllocationInfo::CallStack
and IndexedMemProfRecord::CallSites.  We refer to call stacks with
integer IDs, namely CallStackId, just as we refer to Frame with
FrameId.  The deduplication will cut down the profile file size by 80%
in a large memprof file of mine.

As a step toward the goal, this patch teaches
IndexedMemProfRecord::{serialize,deserialize} to speak Version2.  A
subsequent patch will add Version2 support to llvm-profdata.

The essence of the patch is to replace the serialization of a call
stack, a vector of FrameIds, with that of a CallStackId.  That is:

  const IndexedAllocationInfo &N = ...;
  ...
  LE.write<uint64_t>(N.CallStack.size());
  for (const FrameId &Id : N.CallStack)
    LE.write<FrameId>(Id);

becomes:

  LE.write<CallStackId>(N.CSId);
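
To make the size difference concrete, here is a self-contained sketch of both layouts. It assumes `FrameId` and `CallStackId` are 64-bit and that values are written little-endian; `writeLE64`, `writeInline`, and `writeById` are hypothetical helpers, not the LLVM writer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using FrameId = uint64_t;
using CallStackId = uint64_t;

// Appends V as 8 little-endian bytes.
void writeLE64(std::vector<uint8_t> &Out, uint64_t V) {
  for (int I = 0; I < 8; ++I)
    Out.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Earlier layout: a length prefix followed by every FrameId.
void writeInline(std::vector<uint8_t> &Out, const std::vector<FrameId> &CS) {
  writeLE64(Out, CS.size());
  for (FrameId Id : CS)
    writeLE64(Out, Id);
}

// Version2 layout: a single CallStackId per call stack reference.
void writeById(std::vector<uint8_t> &Out, CallStackId CSId) {
  writeLE64(Out, CSId);
}
```

Under these assumptions a call stack with N frames costs 8 + 8N bytes inline and 8 bytes by ID, with the frames themselves stored once elsewhere.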
2024-04-03 21:48:38 -07:00
Mingming Liu
1351d17826
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825)
(The profile format change is split into a standalone change in https://github.com/llvm/llvm-project/pull/81691)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
 
* Implement profile reader and writer support 
  * Raw profile reader is used by `llvm-profdata` but not the compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime addresses to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and the compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of the string. When used in `llvm-profdata`, the reader decompresses the string to show symbols of a profiled site. When used in the compiler, string decompression doesn't happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to index profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01 08:52:35 -07:00
Kazu Hirata
44253a9ce6
[memprof] Add MemProf version (#86414)
This patch adds a version field to the MemProf section of the indexed
profile format, calling the new version "version 1".  The existing
version is called "version 0".

The writer supports both versions via a command-line option:

  llvm-profdata merge --memprof-version=1 ...

The reader supports both versions by automatically detecting the
version from the header.
2024-03-28 14:29:34 -07:00
Kazu Hirata
9855134d07
[memprof] Use #ifdef EXPENSIVE_CHECKS (#86585)
This patch replaces:

  #if EXPENSIVE_CHECKS

with:

  #ifdef EXPENSIVE_CHECKS

to follow the existing conventions.
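
As background on the two forms: `#ifdef` only asks whether the macro is defined, whereas `#if` evaluates its value, so a macro defined with an empty body is accepted by `#ifdef` but is a preprocessing error under `#if`. A minimal sketch, where `expensiveChecksEnabled` is a hypothetical helper and the bare `#define` stands in for whatever the build system provides:

```cpp
#include <cassert>

// Defined with no value. "#if EXPENSIVE_CHECKS" would not compile here,
// because the directive would be left with no expression to evaluate.
#define EXPENSIVE_CHECKS

// Reports whether the expensive-checks code path is compiled in.
bool expensiveChecksEnabled() {
#ifdef EXPENSIVE_CHECKS
  return true;
#else
  return false;
#endif
}
```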
2024-03-25 14:36:03 -07:00
Kazu Hirata
6646fe884c
[memprof] Compute CallStackId when deserializing IndexedAllocationInfo (#86421)
There are two ways to create in-memory instances of
IndexedAllocationInfo -- deserialization of the raw MemProf data and
that of the indexed MemProf data.

With:

  commit 74799f424063a2d751e0f9ea698db1f4efd0d8b2
  Author: Kazu Hirata <kazu@google.com>
  Date:   Sat Mar 23 19:50:15 2024 -0700

we compute CallStackId for each call stack in IndexedAllocationInfo
while deserializing the raw MemProf data.

This patch does the same while deserializing the indexed MemProf data.

As with the patch above, this patch does not add any use of
CallStackId yet.
2024-03-25 14:21:49 -07:00
Mingming Liu
16e74fd489
Reland "[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling." (#82711)
New changes on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.

1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames`, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` return `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
   
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))

**Original Description**

* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-27 11:07:40 -08:00
Mingming Liu
0e8d1877cd
Revert type profiling change as compiler-rt test break on Windows. (#82583)
Examples
https://lab.llvm.org/buildbot/#/builders/127/builds/62532/steps/8/logs/stdio
2024-02-21 21:41:33 -08:00
Mingming Liu
db7e9e6841
[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling. (#81691)
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-21 20:59:42 -08:00
NAKAMURA Takumi
f035c018a6
InstrProf::getFunctionBitmap: Fix BE hosts (#80608)
2024-02-05 15:04:57 +09:00
NAKAMURA Takumi
4926f12ff5
[Coverage] ProfileData: Handle MC/DC Bitmap as BitVector. NFC. (#80608)
* `getFunctionBitmap()` stores not `std::vector<uint8_t>` but
`BitVector`.
* `CounterMappingContext` holds `Bitmap` (instead of the ref of bytes)
* `Bitmap` and `BitmapIdx` are used instead of `evaluateBitmap()`.

FIXME: `InstrProfRecord` itself should handle `Bitmap` as `BitVector`.
2024-02-05 12:42:08 +09:00
Mingming Liu
eba2b789d3
[RawProfReader]When constructing symbol table, read the MD5 of function name in the proper byte order (#76312)
Before this patch, when the field `NameRef` is generated in little-endian systems and read back in big-endian systems, the information gets dropped.
- The bug gets caught by a buildbot
https://lab.llvm.org/buildbot/#/builders/94/builds/17931. In the error message (pasted below),
two indirect call targets are not imported.

```
   ; IMPORTS-DAG: Import _Z7callee1v
               ^
<stdin>:1:1: note: scanning from here
main.ll: Import _Z11global_funcv from lib.cc
^
<stdin>:1:10: note: possible intended match here
main.ll: Import _Z11global_funcv from lib.cc
         ^
Input file: <stdin>
Check file:
/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll
-dump-input=help explains the following input dump.
Input was:
<<<<<<
          1: main.ll: Import _Z11global_funcv from lib.cc 
dag:34'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match
found
dag:34'1 ? possible intended match
```

[This commit](b3999246b1 (diff-b196b796c5a396c7cdf93b347fe47e2b29b72d0b7dd0e2b88abb964d376ee50e)) gates the fix behind a flag and provides test data by creating big-endian profiles (rather than reading the little-endian data on a big-endian system that might require a VM).

- [This](b3999246b1 (diff-643176077ddbe537bd0a05d2a8a53bdff6339420a30e8511710bf232afdda8b9)) is a hexdump of little-endian profile data, and [this](b3999246b1 (diff-1736a3ee25dde02bba55d670df78988fdb227e5a85b94b8707cf182cf70b28f0)) is the big-endian version of it.
- The [README.md](b3999246b1 (diff-6717b6a385de3ae60ab3aec9638af2a43b55adaf6784b6f0393ebe1a6639438b)) shows the result of `llvm-profdata show -ic-targets` before and after the fix when the profile is in big-endian.
2024-01-02 10:23:29 -08:00
Mingming Liu
78a195e100
Reland the reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. " (#75954)
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously, name
matchers didn't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924

This is the second reland; it fixes errors caught in the first reland
(https://github.com/llvm/llvm-project/pull/75860)

**Original commit message**
Since commit fe05193 (phab D156569), IRPGO names use the format
`[<filepath>;]<linkage-name>` while the prior format is
`[<filepath>:]<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifier` to use the
semicolon.

To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
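
The mismatch described above can be reproduced with a small sketch. `makeIdentifier` is a hypothetical helper that only mimics the `[<filepath><delimiter>]<name>` shape; it is not `getGlobalIdentifier`. Strings are compared directly in place of their MD5 hashes, since differing strings are what lead to differing hashes.

```cpp
#include <cassert>
#include <string>

// Builds "<filepath><delimiter><name>" for local-linkage symbols and the
// bare name otherwise. Hypothetical helper for illustration only.
std::string makeIdentifier(const std::string &Name, bool IsLocalLinkage,
                           const std::string &FilePath, char Delimiter) {
  if (!IsLocalLinkage)
    return Name;
  return FilePath + Delimiter + Name;
}

// True when the profile writer and the importer would derive the same key.
bool keysAgree(const std::string &Name, bool IsLocalLinkage,
               const std::string &FilePath, char WriterDelimiter,
               char ImporterDelimiter) {
  return makeIdentifier(Name, IsLocalLinkage, FilePath, WriterDelimiter) ==
         makeIdentifier(Name, IsLocalLinkage, FilePath, ImporterDelimiter);
}
```

With `;` on one side and `:` on the other, local-linkage names disagree while external names are unaffected, which is why only local callees fail to import.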
2023-12-19 12:25:56 -08:00
Mingming Liu
6ce23ea0ab
Revert "Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. "" (#75888)
Reverts llvm/llvm-project#75860
- Mangled name mismatch on Windows
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)
2023-12-18 19:31:18 -08:00
Mingming Liu
c5871712ae
Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. " (#75860)
Fixed build-bot failures caught by post-submit tests
1) Add the list of command line tools needed by new compiler-rt test into dependency.
2) Use `starts_with` to replace deprecated `startswith`.

**Original commit message**
Since commit fe05193 (phab D156569), IRPGO names use the format
`[<filepath>;]<linkage-name>` while the prior format is
`[<filepath>:]<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifier` to use the
semicolon.

To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-18 17:43:40 -08:00
Mingming Liu
3aa5d71127
Revert "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables." (#75835)
Reverts llvm/llvm-project#74008

The compiler-rt test failed due to `llvm-dis` not found
(https://lab.llvm.org/buildbot/#/builders/127/builds/59884)
Will revert and investigate how to require the proper dependency.
2023-12-18 09:39:55 -08:00
Mingming Liu
245cddae70
[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage variables. (#74008)
Since commit fe05193 (phab D156569), IRPGO names use the format
`[<filepath>;]<linkage-name>` while the prior format is
`[<filepath>:]<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifier` to use the
semicolon.

To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

* `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-18 09:10:39 -08:00
Zequan Wu
ab3430f891
[Profile] Add binary profile correlation for code coverage. (#69493)
## Motivation
Since we don't need the metadata sections at runtime, we can keep them
out of memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF, and it would be hard to add such artificial
debug info for every function to CodeView, which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform-independent option.

## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contain only headers + counters.
llvm-profdata can be used to correlate raw profiles with the unstripped
binary to generate an indexed profile.

## Data
For chromium base_unittests with code coverage on Linux, the binary size
overhead due to instrumentation is reduced from 64M to 38.8M (39.4%) and
the raw profile file size is reduced from 128M to 68M (46.9%).
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```

A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet,
so when a raw profile doesn't belong to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively pass the raw profiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should work with PGO when value profiling is disabled, as debug
info correlation currently does, though I haven't tested this yet.
2023-12-14 14:16:38 -05:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Zequan Wu
b9951b3fe6
[llvm-profdata] Fix binary ids with multiple raw profiles in a single… (#72740)
Save binary ids when iterating through `RawInstrProfReader`.

Fixes #72699.
2023-11-20 14:25:24 -05:00
Ellis Hoag
b0154c36d6
[InstrProf] Add pgo use block coverage test (#72443)
Back in https://reviews.llvm.org/D124490 we added a block coverage mode
that instruments a subset of basic blocks using single byte counters to
get coverage for the whole function.

This commit adds a test to make sure that we correctly assign branch
weights based on the coverage profile.

I noticed this test was missing after seeing that we had no coverage on
`PGOUseFunc::populateCoverage()`

https://lab.llvm.org/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp.html#L1383
2023-11-20 09:25:33 -06:00
Zequan Wu
3c97c8b6fc
[Profile] Refactor profile correlation. (#70856)
Refactor some code from https://github.com/llvm/llvm-project/pull/69493.

#70712 was reverted due to linking failures. So, `-debug-info-correlate` remains unchanged and no new flag is added.
2023-11-01 14:16:43 -04:00
Zequan Wu
89a2e70159
[llvm-profdata] Emit warning when counter value is greater than 2^56. (#69513)
Fixes #65416
2023-10-31 16:40:51 -04:00
Zequan Wu
db7a1ed9a2
Revert "[Profile] Refactor profile correlation. (#70712)"
This reverts commit 4b383d0af93136b80841fc140da0823dfc441dd4.
2023-10-31 10:53:45 -04:00
Zequan Wu
4b383d0af9
[Profile] Refactor profile correlation. (#70712)
Refactor some code from https://github.com/llvm/llvm-project/pull/69493.

Rebase of https://github.com/llvm/llvm-project/pull/69656 on top of main
as it was messed up.
2023-10-31 10:41:01 -04:00
Alan Phipps
f95b2f1acf
Reland "[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)"
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.

Differential Revision: https://reviews.llvm.org/D138846
2023-10-30 11:15:02 -05:00
Kazu Hirata
02f67c097d
Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class. This patch replaces
{big,little,native} with llvm::endianness::{big,little,native}.

This patch completes the migration to llvm::endianness and
llvm::endianness::{big,little,native}.  I'll post a separate patch to
remove the migration helpers in llvm/Support/Endian.h:

  using endianness = llvm::endianness;
  constexpr llvm::endianness big = llvm::endianness::big;
  constexpr llvm::endianness little = llvm::endianness::little;
  constexpr llvm::endianness native = llvm::endianness::native;
2023-10-13 23:16:25 -07:00
Kazu Hirata
b8885926f8 Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an enum.
This patch replaces llvm::support::{big,little,native} with
llvm::endianness::{big,little,native}.
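
To show why the qualified spelling is needed once the type is an enum class, here is a small standalone sketch. The `sketch` namespace and the `read32` helper are assumptions for illustration and do not reproduce LLVM's own definitions.

```cpp
#include <cstdint>

namespace sketch {
// A scoped enum: enumerators must be written as endianness::big and
// endianness::little; a bare `big` or `little` no longer compiles.
enum class endianness { big, little };

// Hypothetical helper: assemble a 32-bit value from four bytes in the
// requested byte order.
inline uint32_t read32(const unsigned char *P, endianness E) {
  const uint32_t B0 = P[0], B1 = P[1], B2 = P[2], B3 = P[3];
  if (E == endianness::little)
    return B0 | (B1 << 8) | (B2 << 16) | (B3 << 24);
  return (B0 << 24) | (B1 << 16) | (B2 << 8) | B3;
}
} // namespace sketch
```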
2023-10-10 22:54:51 -07:00
Kazu Hirata
d7b18d5083 Use llvm::endianness{,::little,::native} (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
llvm::support::endianness with llvm::endianness.
2023-10-09 00:54:47 -07:00
Zequan Wu
3c34245c47
[Profile] Use upper 32 bits of profile version for profile variants. (#67695)
Currently, the upper 8 bits are all reserved for different profile variants.
We need more bits for new modes in the future.
Context:
https://discourse.llvm.org/t/how-to-add-a-new-mode-to-llvm-raw-profile-version/73688
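
A minimal sketch of the split described in the title, assuming the upper 32 bits of the 64-bit version word carry variant flags and the lower 32 bits carry the version number. All names below, including the example flag at bit 40, are hypothetical.

```cpp
#include <cstdint>

// Assumed layout: upper 32 bits = profile variant flags,
// lower 32 bits = version number.
constexpr uint64_t kVariantMask = 0xffffffff00000000ULL;

// Strip the variant flags to recover the plain version number.
constexpr uint64_t versionNumber(uint64_t Raw) { return Raw & ~kVariantMask; }

// Keep only the variant flags.
constexpr uint64_t variantBits(uint64_t Raw) { return Raw & kVariantMask; }

// Hypothetical flag at bit 40: it lies outside the former upper-8-bit range
// (bits 56 to 63) but inside the widened upper-32-bit range.
constexpr uint64_t kExampleVariantFlag = uint64_t{1} << 40;

static_assert(versionNumber(kExampleVariantFlag | 9) == 9,
              "flags must not leak into the version number");
```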
2023-10-03 10:15:22 -04:00
Kazu Hirata
7df88212d4 [ProfileData] Use llvm::byteswap instead of sys::getSwappedBytes (NFC) 2023-09-23 13:00:47 -07:00
Hans Wennborg
53a2923bf6 Revert "[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)"
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.

> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846

This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.
2023-09-21 12:20:24 +02:00
Alan Phipps
a50486fd73 [InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.

Differential Revision: https://reviews.llvm.org/D138846
2023-09-19 17:07:23 -05:00
Arthur Eubanks
a6f33ad447 [NFC][Profile] Rename Counters/DataSize to NumCounters/Data
Fixes some FIXMEs.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158466
2023-08-22 09:03:59 -07:00
Ellis Hoag
fe051934cb [InstrProf] Encode linkage names in IRPGO counter names
Prior to this diff, names in the `__llvm_prf_names` section had the format `[<filepath>:]<function-name>`, e.g., `main.cpp:foo`, `bar`. `<filepath>` is used to discriminate between possibly identical function names when linkage is local and `<function-name>` simply comes from `F.getName()`. This has two problems:
  * `:` is commonly found in Objective-C functions so that names like `main.mm:-[C foo::]` and `-[C bar::]` are difficult to parse
  * `<function-name>` might be different from the linkage name, so it cannot be used to pass a function order to the linker via `-symbol-ordering-file` or `-order_file` (see https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068)

Instead, this diff changes the format to `[<filepath>;]<linkage-name>`, e.g., `main.cpp;_foo`, `_bar`. The hope is that `;` won't realistically be found in either `<filepath>` or `<linkage-name>`.

To prevent invalidating all prior IRPGO profiles, we also look up the prior name format when a record is not found (see `InstrProfSymtab::create()`, `readMemprof()`, and `getInstrProfRecord()`). It seems that Swift and Clang FE-PGO rely on the original `getPGOFuncName()`, so we cannot simply replace it.
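
The naming scheme and the fallback lookup can be sketched roughly as follows. This is an illustration under stated assumptions: `makeProfileName`, `lookupCount`, and the `std::map` standing in for the profile are hypothetical and do not mirror the actual LLVM implementation.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: builds "[<filepath><Delim>]<name>". The file path
// prefix is only added for local-linkage functions.
std::string makeProfileName(const std::string &FilePath,
                            const std::string &Name, bool IsLocal,
                            char Delim) {
  return IsLocal ? FilePath + Delim + Name : Name;
}

// Hypothetical lookup: try the new ';' + linkage-name key first, then fall
// back to the older ':' + function-name key so existing profiles still match.
std::optional<int> lookupCount(const std::map<std::string, int> &Profile,
                               const std::string &FilePath,
                               const std::string &LinkageName,
                               const std::string &FunctionName, bool IsLocal) {
  auto It = Profile.find(makeProfileName(FilePath, LinkageName, IsLocal, ';'));
  if (It == Profile.end())
    It = Profile.find(makeProfileName(FilePath, FunctionName, IsLocal, ':'));
  if (It == Profile.end())
    return std::nullopt;
  return It->second;
}
```

With this shape, a profile keyed by `main.cpp:foo` is still found when the caller asks for the local function with linkage name `_foo` and function name `foo`.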

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D156569
2023-08-07 10:15:08 -07:00
Jessica Paquette
17cfd2e025 [profiling] Improve error message for raw profile header mismatches
When a user used a mismatched clang + llvm-profdata, they didn't get a very
informative error message. It would just say "unsupported version".

As a result, users are often confused as to what they are supposed to do and
tend to assume that it's a bug in the profiling runtime.

This patch improves the error message by:

- Adding a new class of error (`raw_profile_version_mismatch`) to make it clear
  that, specifically, the *raw profile* version is unsupported because of a
  tool mismatch.

- Adding an error message that tells the user which raw profile version was
  encountered, which version was expected, and instructs them to align their
  tool versions.

To support this, this patch also updates `InstrProfError::take` to
propagate the optional error message.
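
A sketch of the kind of message described above, reporting the version encountered and the version expected, and asking the user to align their tools. The function name and the exact wording are assumptions for this example, not the text emitted by LLVM.

```cpp
#include <cstdint>
#include <string>

// Hypothetical formatter: names the raw profile version that was found,
// the version the tool expected, and the suggested remedy.
std::string rawVersionMismatchMessage(uint64_t Found, uint64_t Expected) {
  return "raw profile version mismatch: profile reports raw profile version " +
         std::to_string(Found) + ", but the tool expects version " +
         std::to_string(Expected) +
         "; use matching versions of clang and llvm-profdata";
}
```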

Differential Revision: https://reviews.llvm.org/D149361
2023-04-27 14:51:38 -07:00