240 Commits

Author SHA1 Message Date
ronryvchin
ff281f7d37
[PGO] Add option to always instrumenting loop entries (#116789)
This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
19fb5b467b,
and helps to cover cases where the loop exit is never executed.
An example where this can occur are event handling loops.

Note that change does NOT change the default behavior.
2024-12-04 07:56:46 +01:00
Kazu Hirata
a0153eaa65 [memprof] Fix builds under EXPENSIVE_CHECKS
memprof::Version1 has been removed, so the whole block of code is
dead.
2024-11-22 17:23:16 -08:00
Kazu Hirata
ad2bdd8fab
[memprof] Remove MemProf format Version 1 (#117357)
This patch removes MemProf format Version 1 now that Version 2 and 3
are working well.
2024-11-22 11:53:31 -08:00
Kazu Hirata
4f1b20f023
[ProfileData] Remove unused includes (NFC) (#116751)
Identified with misc-include-cleaner.
2024-11-19 19:42:20 -08:00
Kazu Hirata
0d38f64e7d
[memprof] Remove MemProf format Version 0 (#116442)
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.

I'm not touching version 1 for now because some tests still rely on
version 1.

Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
2024-11-15 15:37:00 -08:00
Kazu Hirata
57ed628fb3
[memprof] Speed up caller-callee pair extraction (Part 2) (#116441)
This patch further speeds up the extraction of caller-callee pairs
from the profile.

Recall that we reconstruct a call stack by traversing the radix tree
from one of its leaf nodes toward a root.  The implication is that
when we decode many different call stacks, we end up visiting nodes
near the root(s) repeatedly.  That in turn adds many duplicates to our
data structure:

  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;

only to be deduplicated later with sort+unique for each vector.

This patch makes the extraction process more efficient by keeping
track of indices of the radix tree array we've visited so far and
terminating traversal as soon as we encounter an element previously
visited.

Note that even with this improvement, we still add at least one
caller-callee pair to the data structure above for each call stack
because we do need to add a caller-callee pair for the leaf node with
the callee GUID being 0.

Without this patch, it takes 4 seconds to extract caller-callee pairs
from a large MemProf profile.  This patch shortenes that down to
900ms.
2024-11-15 15:33:23 -08:00
Kazu Hirata
ec353b7418
[memprof] Use llvm::function_ref instead of std::function (#116306)
We've seen bugs where we lost track of error states stored in the
functor because we passed the functor by value (that is,
std::function) as opposed to reference (llvm::function_ref).

This patch fixes a couple of places we pass functors by value.

While we are at it, this patch adds curly braces around a "for" loop
spanning multiple lines.
2024-11-15 13:03:24 -08:00
Kazu Hirata
59da1afd2a
[memprof] Speed up caller-callee pair extraction (#116184)
We know that the MemProf profile has a lot of duplicate call stacks.
Extracting caller-callee pairs from a call stack we've seen before is
a wasteful effort.

This patch makes the extraction more efficient by first coming up with
a work list of linear call stack IDs -- the set of starting positions
in the radix tree array -- and then extract caller-callee pairs from
each call stack in the work list.

We implement the work list as a bit vector because we expect the work
list to be dense in the range [0, RadixTreeSize).  Also, we want the
set insertion to be cheap.

Without this patch, it takes 25 seconds to extract caller-callee pairs
from a large MemProf profile.  This patch shortenes that down to 4
seconds.
2024-11-14 15:54:55 -08:00
Kazu Hirata
9a730d878e
[memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs (#115807)
Undrifting the MemProf profile requires two sets of information:

- caller-callee pairs from the profile
- callee-callee pairs from the IR

This patch adds a function to do the former.  The latter has been
addressed by extractCallsFromIR.

Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile.  "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling.  To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.

Conceptually, we would extract caller-callee pairs with:

  for each MemProfRecord in the profile:
    for each call stack in AllocSites:
      extract caller-callee pairs from adjacent pairs of Frames

However, this is highly inefficient.  Obtaining MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.

This patch adds an efficient way of doing the above.  Specifically, we

- go though all IndexedMemProfRecords,
- look at each linear call stack ID
- extract caller-callee pairs from each call stack

The extraction is done by a new class CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array.  For our purposes, we skip the
reconstruction and immediately populates the data structure for
caller-callee pairs.

The resulting caller-callee-pairs is of the type:

  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;

which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.

Further performance optimizations are possible for the new functions
in this patch.  I'll address those in follow-up patches.
2024-11-13 23:40:12 -08:00
gulfemsavrun
787cd8f0fe
[InstrProf] Add debuginfod correlation support (#106606)
This patch adds debuginfod support into llvm-profdata to
find the assosicated executable by a build id in a raw
profile to correlate a profile with a provided correlation
kind (debug-info or binary).
2024-09-06 13:28:23 -07:00
Kazu Hirata
6c8ff4cbb8
[ProfileData] Take ArrayRef<InstrProfValueData> in addValueData (NFC) (#97363)
This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.

addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites.  AFAICT, no caller of
addValueData uses updated incoming values.  With this patch, we add
value data to ValueSites first and then remaps values there.  This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
2024-07-11 16:38:44 -07:00
Mircea Trofin
afbd7d1e7c
[NFC] Coding style: drop k in kGlobalIdentifierDelimiter (#98230) 2024-07-09 15:44:55 -07:00
Kazu Hirata
fef144cebb Revert "[llvm] Use llvm::sort (NFC) (#96434)"
This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d.

Reverting the patch fixes the following under EXPENSIVE_CHECKS:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir
  LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll
  LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll
2024-06-25 11:18:40 -07:00
Kazu Hirata
05d167fc20
[llvm] Use llvm::sort (NFC) (#96434) 2024-06-23 10:38:51 -07:00
Kazu Hirata
4403cdbaf0
[ProfileData] Refactor BinaryIdsStart and BinaryIdsSize (NFC) (#94922)
BinaryIdsStart and BinaryIdsSize in IndexedInstrProfReader are always
used together, so this patch packages them into an ArrayRef<uint8_t>.

For now, readBinaryIdsInternal immediately unpacks ArrayRef into its
constituents to avoid touching the rest of readBinaryIdsInternal.
2024-06-09 20:08:45 -07:00
Kazu Hirata
521238d19b
[ProfileData] Refactor VTableNamePtr and CompressedVTableNamesLen (NFC) (#94859)
VTableNamePtr and CompressedVTableNamesLen are always used together to
create a StringRef in getSymtab.

We can create the StringRef ahead of time in readHeader.  This way,
IndexedInstrProfReader becomes a tiny bit simpler with fewer member
variables.  Also, StringRef default-constructs itself with its Data
and Length set to nullptr and 0, respectively, which is exactly what
we need.
2024-06-09 15:33:00 -07:00
Kazu Hirata
089c4bb589
[ProfileData] Use ArrayRef instead of const std::vector<T> & (NFC) (#94878) 2024-06-09 10:26:18 -07:00
Kazu Hirata
e62c2146aa
[ProfileData] Simplify calls to readNext in readBinaryIdsInternal (NFC) (#94862)
readNext has two variants:

- readNext<uint64_t, endian>(ptr)
- readNext<uint64_t>(ptr, endian)

This patch uses the latter to simplify readBinaryIdsInternal.  Both
forms default to unaligned.
2024-06-08 14:03:30 -07:00
Kazu Hirata
38124fef7e
[ProfileData] Use a range-based for loop (NFC) (#94856)
While I am at it, this patch adds const to a couple of places.
2024-06-08 08:36:42 -07:00
Kazu Hirata
eb33e462ba
[memprof] Clean up IndexedMemProfReader (NFC) (#94710)
Parameter "Version" is confusing in deserializeV012 and deserializeV3
because we also have member variable "Version".  Fortunately,
parameter "Version" and member variable "Version" always have the same
value because IndexedMemProfReader::deserialize initializes the member
variable and passes it to deserializeV012 and deserializeV3.

This patch removes the parameter.
2024-06-07 07:04:17 -07:00
Kazu Hirata
4ce65423be
[memprof] Use const ref for IndexedRecord (#94114)
The type of *Iter here is "const IndexedMemProfRecord &" as defined in
RecordLookupTrait.  Assigning *Iter to a variable of type
"const IndexedMemProfRecord &" avoids a copy, reducing the cycle and
instruction counts by 1.8% and 0.2%, respectively, with
"llvm-profdata show" modified to deserialize all MemProfRecords.

Note that RecordLookupTrait has an internal copy of
IndexedMemProfRecord, so we don't have to worry about a dangling
reference to a temporary.
2024-06-02 13:30:32 -07:00
Kazu Hirata
90acfbf90d
[memprof] Use linear IDs for Frames and call stacks (#93740)
With this patch, we stop using on-disk hash tables for Frames and call
stacks.  Instead, we'll write out all the Frames as a flat array while
maintaining mappings from FrameIds to the indexes into the array.
Then we serialize call stacks in terms of those indexes.

Likewise, we'll write out all the call stacks as another flat array
while maintaining mappings from CallStackIds to the indexes into the
call stack array.  One minor difference from Frames is that the
indexes into the call stack array are not contiguous because call
stacks are variable-length objects.

Then we serialize IndexedMemProfRecords in terms of the indexes
into the call stack array.

Now, we describe each call stack with 32-bit indexes into the Frame
array (as opposed to the 64-bit FrameIds in Version 2).  The use of
the smaller type cuts down the profile file size by about 40% relative
to Version 2.  The departure from the on-disk hash tables contributes
a little bit to the savings, too.

For now, IndexedMemProfRecords refer to call stacks with 64-bit
indexes into the call stack array.  As a follow-up, I'll change that
to uint32_t, including necessary updates to RecordWriterTrait.
2024-05-30 14:28:22 -07:00
Kazu Hirata
99b9ab45cd
[memprof] Reorder MemProf sections in profile (#93640)
This patch teaches the V3 format to serialize Frames, call stacks, and
IndexedMemProfRecords, in that order.

I'm planning to use linear IDs for Frames.  That is, Frames will be
numbered 0, 1, 2, and so on in the order we serialize them.  In turn,
we will seialize the call stacks in terms of those linear IDs.

Likewise, I'm planning to use linear IDs for call stacks and then
serialize IndexedMemProfRecords in terms of those linear IDs for call
stacks.

With the new order, we can successively free data structures as we
serialize them.  That is, once we serialize Frames, we can free the
Frames' data proper and just retain mappings from FrameIds to linear
IDs.  A similar story applies to call stacks.
2024-05-29 12:18:24 -07:00
Mingming Liu
737a3018e8
[nfc][InstrFDO] Add Header::getIndexedProfileVersion and use it to decide profile version. (#93613)
This is a split of https://github.com/llvm/llvm-project/pull/93346 as
discussed.
2024-05-29 10:15:17 -07:00
Kazu Hirata
9e89d107a6
[memprof] Add MemProf format Version 3 (#93608)
This patch adds Version 3 for development purposes.  For now, this
patch adds V3 as a copy of V2.

For the most part, this patch adds "case Version3:" wherever "case
Version2:" appears.  One exception is writeMemProfV3, which is copied
from writeMemProfV2 but updated to write out memprof::Version3 to the
MemProf header.  We'll incrementally modify writeMemProfV3 in
subsequent patches.
2024-05-28 13:30:00 -07:00
Kazu Hirata
8ad980d7dc
[memprof] Refactor getMemProfRecord (NFC) (#93138)
This patch refactors getMemProfRecord for readability while adding
consistency checks.

- This patch adds a switch statement on the MemProf version just like
  most places dealing with MemProf serialization/deserialization.

- This patch adds asserts to ensure that the exact set of data
  structures are available while ones we do not use are not present.
  That is, getMemProfRecord no longer determines the version based on
  the availability of MemProfCallStackTable.
2024-05-23 14:13:20 -07:00
Mingming Liu
b66779b5bf
[nfc][InstrProfReader]Store header fields in native endianness (#92947)
- Use `Header.Version` directly and remove Header::formatVersion

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
2024-05-21 21:25:12 -07:00
Kazu Hirata
352602010f Repply [memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places.  This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.

The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions.  This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.

This iteration fixes a problem uncovered with ubsan, where we were
dereferencing an uninitialized std::unique_ptr.
2024-04-28 11:44:45 -07:00
Vitaly Buka
7aa6896dd7
Revert "[memprof] Introduce FrameIdConverter and CallStackIdConverter" (#90318)
Reverts llvm/llvm-project#90307

Breaks bots https://lab.llvm.org/buildbot/#/builders/5/builds/42943
2024-04-27 00:15:08 -07:00
Kazu Hirata
e04df693bf
[memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places.  This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.

The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions.  This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.
2024-04-26 19:22:17 -07:00
Kazu Hirata
777d2e54a9
[memprof] Drop the trait parameter (NFC) (#89461)
OnDiskIterableChainedHashTable::Create can default-contruct a trait
object for us.  We don't need to construct one on our own unless we
need to customize something (like a version number).
2024-04-19 17:00:57 -07:00
Kazu Hirata
b64c69d5b1
[memprof] Introduce IndexedMemProfReader (NFC) (#89331)
Without this patch, a lot of details about the deserilization of the
MemProf data are exposed in IndexedInstrProfReader -- four
MemProf-related variables in IndexedInstrProfReader plus 90+ lines of
deserilization code within IndexedInstrProfReader::readHeader.

This patch encapsulates them into a separate class, exposing only
three methods, namely the default constructor, deserialize, and
getMemProfRecord.
2024-04-18 21:16:46 -07:00
Kazu Hirata
172f6ddfa7
[memprof] Add Version2 of the indexed MemProf format (#89100)
This patch adds Version2 of the indexed MemProf format.  The new
format comes with a hash table from CallStackId to actual call stacks
llvm::SmallVector<FrameId>.  The rest of the format refers to call
stacks with CallStackId.  This "values + references" model effectively
deduplicates call stacks.  Without this patch, a large indexed memprof
file of mine shrinks from 4.4GB to 1.6GB, a 64% reduction.

This patch does not make Version2 generally available yet as I am
planning to make a few more changes to the format.
2024-04-18 14:12:58 -07:00
Kazu Hirata
f430e37446
[llvm] Drop unaligned from calls to readNext (NFC) (#88841)
Now readNext defaults to unaligned accesses.  This patch drops
unaligned to improve readability.
2024-04-16 12:47:02 -07:00
Kazu Hirata
db9a17a407
[memprof] Use std::optional (NFC) (#88366) 2024-04-11 09:56:01 -07:00
Kazu Hirata
d89914f30b
[memprof] Add Version2 of IndexedMemProfRecord serialization (#87455)
I'm currently developing a new version of the indexed memprof format
where we deduplicate call stacks in IndexedAllocationInfo::CallStack
and IndexedMemProfRecord::CallSites.  We refer to call stacks with
integer IDs, namely CallStackId, just as we refer to Frame with
FrameId.  The deduplication will cut down the profile file size by 80%
in a large memprof file of mine.

As a step toward the goal, this patch teaches
IndexedMemProfRecord::{serialize,deserialize} to speak Version2.  A
subsequent patch will add Version2 support to llvm-profdata.

The essense of the patch is to replace the serialization of a call
stack, a vector of FrameIDs, with that of a CallStackId.  That is:

  const IndexedAllocationInfo &N = ...;
  ...
  LE.write<uint64_t>(N.CallStack.size());
  for (const FrameId &Id : N.CallStack)
    LE.write<FrameId>(Id);

becomes:

  LE.write<CallStackId>(N.CSId);
2024-04-03 21:48:38 -07:00
Mingming Liu
1351d17826
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825)
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
 
* Implement profile reader and writer support 
  * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to index profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01 08:52:35 -07:00
Kazu Hirata
44253a9ce6
[memprof] Add MemProf version (#86414)
This patch adds a version field to the MemProf section of the indexed
profile format, calling the new version "version 1".  The existing
version is called "version 0".

The writer supports both versions via a command-line option:

  llvm-profdata merge --memprof-version=1 ...

The reader supports both versions by automatically detecting the
version from the header.
2024-03-28 14:29:34 -07:00
Kazu Hirata
9855134d07
[memprof] Use #ifdef EXPENSIVE_CHECKS (#86585)
This patch replaces:

  #if EXPENSIVE_CHECKS

with:

  #ifdef EXPENSIVE_CHECKS

to follow the existing conventions.
2024-03-25 14:36:03 -07:00
Kazu Hirata
6646fe884c
[memprof] Compute CallStackId when deserializing IndexedAllocationInfo (#86421)
There are two ways to create in-memory instances of
IndexedAllocationInfo -- deserialization of the raw MemProf data and
that of the indexed MemProf data.

With:

  commit 74799f424063a2d751e0f9ea698db1f4efd0d8b2
  Author: Kazu Hirata <kazu@google.com>
  Date:   Sat Mar 23 19:50:15 2024 -0700

we compute CallStackId for each call stack in IndexedAllocationInfo
while deserializing the raw MemProf data.

This patch does the same while deserilizing the indexed MemProf data.

As with the patch above, this patch does not add any use of
CallStackId yet.
2024-03-25 14:21:49 -07:00
Mingming Liu
16e74fd489
Reland "[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling." (#82711)
New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.

1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
   
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))

**Original Description**

* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-27 11:07:40 -08:00
Mingming Liu
0e8d1877cd
Revert type profiling change as compiler-rt test break on Windows. (#82583)
Examples
https://lab.llvm.org/buildbot/#/builders/127/builds/62532/steps/8/logs/stdio
2024-02-21 21:41:33 -08:00
Mingming Liu
db7e9e6841
[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling. (#81691)
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-21 20:59:42 -08:00
NAKAMURA Takumi
f035c018a6 InstrProf::getFunctionBitmap: Fix BE hosts (#80608) 2024-02-05 15:04:57 +09:00
NAKAMURA Takumi
4926f12ff5
[Coverage] ProfileData: Handle MC/DC Bitmap as BitVector. NFC. (#80608)
* `getFunctionBitmap()` stores not `std::vector<uint8_t>` but
`BitVector`.
* `CounterMappingContext` holds `Bitmap` (instead of the ref of bytes)
* `Bitmap` and `BitmapIdx` are used instead of `evaluateBitmap()`.

FIXME: `InstrProfRecord` itself should handle `Bitmap` as `BitVector`.
2024-02-05 12:42:08 +09:00
Mingming Liu
eba2b789d3
[RawProfReader]When constructing symbol table, read the MD5 of function name in the proper byte order (#76312)
Before this patch, when the field `NameRef` is generated in little-endian systems and read back in big-endian systems, the information gets dropped.
- The bug gets caught by a buildbot
https://lab.llvm.org/buildbot/#/builders/94/builds/17931. In the error message (pasted below),
two indirect call targets are not imported.

```
   ; IMPORTS-DAG: Import _Z7callee1v
               ^
<stdin>:1:1: note: scanning from here
main.ll: Import _Z11global_funcv from lib.cc
^
<stdin>:1:10: note: possible intended match here
main.ll: Import _Z11global_funcv from lib.cc
         ^
Input file: <stdin>
Check file:
/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll
-dump-input=help explains the following input dump.
Input was:
<<<<<<
          1: main.ll: Import _Z11global_funcv from lib.cc 
dag:34'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match
found
dag:34'1 ? possible intended match
```

[This commit](b3999246b1 (diff-b196b796c5a396c7cdf93b347fe47e2b29b72d0b7dd0e2b88abb964d376ee50e)) gates the fix by flag and provide test data by creating big-endian profiles (rather than reading the little-endian data on a big-endian system that might require a VM).

- [This](b3999246b1 (diff-643176077ddbe537bd0a05d2a8a53bdff6339420a30e8511710bf232afdda8b9)) is a hexdump of little-endian profile data, and [this](b3999246b1 (diff-1736a3ee25dde02bba55d670df78988fdb227e5a85b94b8707cf182cf70b28f0)) is the big-endian version of it.
- The [README.md](b3999246b1 (diff-6717b6a385de3ae60ab3aec9638af2a43b55adaf6784b6f0393ebe1a6639438b)) shows the result of `llvm-profdata show -ic-targets` before and after the fix when the profile is in big-endian.
2024-01-02 10:23:29 -08:00
Mingming Liu
78a195e100
Reland the reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75954)
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously name
matchers don't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924

This is the second reland and fixed errors caught in first reland
(https://github.com/llvm/llvm-project/pull/75860)

**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-19 12:25:56 -08:00
Mingming Liu
6ce23ea0ab
Revert "Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. "" (#75888)
Reverts llvm/llvm-project#75860
- Mangled name mismatch on Windows
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)
2023-12-18 19:31:18 -08:00
Mingming Liu
c5871712ae
Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75860)
Fixed build-bot failures caught by post-submit tests
1) Add the list of command line tools needed by new compiler-rt test into dependency.
2) Use `starts_with` to replace deprecated `startswith`.

**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-18 17:43:40 -08:00
Mingming Liu
3aa5d71127
Revert "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles." (#75835)
Reverts llvm/llvm-project#74008

The compiler-rt test failed due to `llvm-dis` not found
(https://lab.llvm.org/buildbot/#/builders/127/builds/59884)
Will revert and investigate how to require the proper dependency.
2023-12-18 09:39:55 -08:00