185 Commits

Author SHA1 Message Date
Ellis Hoag
f18cfb9108
[InstrProf] Factor out getRecord() and use NamedInstrProfRecord (#145417)
Factor out code in `populateCounters()` and `populateCoverage()` used to
grab the record into `PGOUseFunc::getRecord()` to reduce code
duplication. And return `NamedInstrProfRecord` in `getInstrProfRecord()`
to avoid an unnecessary cast. No functional change is intented.
2025-06-24 09:52:47 -07:00
Snehasish Kumar
0528848def
[NFC][MemProf] Move IndexedMemProfData to its own header. (#140503)
Part of a larger refactoring with the following goals
1. Reduce the size of MemProf.h 
2. Avoid including ModuleSummaryIndex just for a couple of types
2025-05-19 16:21:51 -07:00
Snehasish Kumar
a53b306c47
[NFC][MemProf] Move Radix tree methods to their own header and cpp. (#140501)
Part of a larger refactoring with the following goals
1. Reduce the size of MemProf.h 
2. Avoid including ModuleSummaryIndex just for a couple of types
2025-05-19 16:16:09 -07:00
Snehasish Kumar
7268c4e7b3
[NFC][MemProf] Fix typo in type name (#140500) 2025-05-19 16:12:57 -07:00
Snehasish Kumar
e1ac57d53a
[MemProf] Extend CallSite information to include potential callees. (#130441)
* Added YAML traits for `CallSiteInfo`
* Updated the `MemProfReader` to pass `Frames` instead of the entire
`CallSiteInfo`
* Updated test cases to use `testing::Field`
* Add YAML sequence traits for CallSiteInfo in MemProfYAML
* Also extend IndexedMemProfRecord
* XFAIL the MemProfYaml round trip test until we update the profile
format

For now we only read and write the additional information from the YAML
format. The YAML round trip test will be enabled when the serialized format is updated.
2025-03-12 09:55:56 -07:00
Kazu Hirata
10d054e954
[memprof] Introduce IndexedCallstackIdConveter (NFC) (#120540)
This patch introduces IndexedCallstackIdConveter as a convenience
wrapper around FrameIdConverter and CallStackIdConverter just for
tests.

With the new wrapper, we get to replace idioms like:

  FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
      MemProfData.Frames);
  CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
      MemProfData.CallStacks, FrameIdConv);

with:

  IndexedCallstackIdConveter CSIdConv(MemProfData);

Unfortunately, this exact pattern occurs in tests only; the
combinations of the frame ID converter and call stack ID converter are
diverse in production code.
2024-12-19 12:20:25 -08:00
Kazu Hirata
b6dfdd2b1e
[memprof] Drop memprof:: in unit tests (NFC) (#119113)
This patch replaces memprof::Foo with Foo if we have corresponding:

  using llvm::memprof::Foo;
2024-12-08 11:34:29 -08:00
Kazu Hirata
4cf0bd89ee
[memprof] Add getMemProfDataForTest for unit tests (#119061)
We always call getFrameMapping and getCallStackMapping together in
InstrProfTest.cpp.  This patch combines the two functions into new
function getMemProfDataForTest.
2024-12-07 10:41:42 -08:00
Kazu Hirata
427172a861
[memprof] Remove a stale comment in a unit test (#119060)
We've removed IndexedAllocationInfo::CallStack, so we don't need to
mention it.
2024-12-07 10:41:26 -08:00
Kazu Hirata
32f7f0010b
[memprof] Use gtest matchers at more places (#119050)
These gtest matchers reduce the number of times we mention the
variables under examined.
2024-12-06 22:46:54 -08:00
Kazu Hirata
ad2bdd8fab
[memprof] Remove MemProf format Version 1 (#117357)
This patch removes MemProf format Version 1 now that Version 2 and 3
are working well.
2024-11-22 11:53:31 -08:00
Kazu Hirata
ec5b729e65
[memprof] Upgrade a unit test to MemProf Version 3 (#117063)
This patch upgrades a unit test to MemProf Version 3 while removing
those bits that cannot be upgraded to Version 3.

The bits being removed expect instrprof_error::hash_mismatch from a
broken MemProf profile that references a frame that doesn't actually
exist.  Now, Version 3 no longer issues
instrprof_error::hash_mismatch.  Even if it still issued
instrprof_error::hash_mismatch, we would have a couple of hurdles:

- InstrProfWriter::addMemProfData will soon require all (or none) of
  the fields (frames, call stacks, and records) be populated.  That
  is, it won't accept an instance of IndexedMemProfData with frames
  missing.

- writeMemProfV3 asserts that every frame occurs at least once:

  assert(MemProfData.Frames.size() == FrameHistogram.size());

This patch gives up on instrprof_error::hash_mismatch and tries to
trigger instrprof_error::unknown_function with the empty profile.
2024-11-20 14:03:27 -08:00
Kazu Hirata
e468653ee7
[memprof] Use LineLocation in a unit test (NFC) (#117031)
We've switched to LineLocation from FieldsAre in MemProfUseTest.cpp.
This patch does the same thing in InstrProfTest.cpp.

llvm/unittests/Transforms/Instrumentation/MemProfUseTest.cpp
2024-11-20 12:58:07 -08:00
Kazu Hirata
a4e1a3dc8b
[memprof] Add another constructor to IndexedAllocationInfo (NFC) (#116684)
This patch adds another constructor to IndexedAllocationInfo that is
identical to the existing constructor except that the new one leaves
the CallStack field empty.

I'm planning to remove MemProf format Version 1.  Then we will migrate
the users of the existing constructor to the new one as nobody will be
using the CallStack field anymore.

Adding the new constructor now allows us to migrate a few existing
users of the old constructor even before we remove the CallStack
field.  In turn, that simplifies the patch to actually remove the
field.
2024-11-18 14:09:21 -08:00
Kazu Hirata
6bf8f08989
[memprof] Add InstrProfWriter::addMemProfData (#116528)
This patch adds InstrProfWriter::addMemProfData, which adds the
complete MemProf profile (frames, call stacks, and records) to the
writer context.

Without this function, functions like loadInput in llvm-profdata.cpp
and InstrProfWriter::mergeRecordsFromWriter must add one item (frame,
call stack, or record) at a time.  The new function std::moves the
entire MemProf profile to the writer context if the destination is
empty, which is the common use case.  Otherwise, we fall back to
adding one item at a time behind the scene.

Here are a couple of reasons why we should add this function:

- We've had a bug where we forgot to add one of the three data
  structures (frames, call stacks, and records) to the writer context,
  resulting in a nearly empty indexed profile.  We should always
  package the three data structures together, especially on API
  boundaries.

- We expose a little too much of the MemProf detail to
  InstrProfWriter.  I'd like to gradually transform
  InstrProfReader/Writer to entities managing buffers (sequences of
  bytes), with actual serialization/deserialization left to external
  classes.  We already do some of this in InstrProfReader, where
  InstrProfReader "contracts out" to IndexedMemProfReader to handle
  MemProf details.

I am not changing loadInput or InstrProfWriter::mergeRecordsFromWriter
for now because MemProfReader uses DenseMap for frames and call
stacks, whereas MemProfData uses MapVector.  I'll resolve these
mismatches in subsequent patches.
2024-11-18 08:56:25 -08:00
Kazu Hirata
fd9f3beb0f
[memprof] Upgrade a unit test to Version 3 (#116516)
I'm planning to remove MemProf Version 1, which is a maintenance
burden because it uses a different set of struct fields in
IndexedAllocationInfo and IndexedMemProfRecord compared to Version 2
and 3.  (FWIW, Version 2 and 3 are much closer to each other.)

Before we remove the old version, we need to upgrade
test_memprof_merge to Version 3.  Here are some remarks:

- Without this patch, we call Writer.addMemProfFrame, which I don't
  think is correct.  This way, we are adding IndexedMemProfRecord to
  Writer2 without any frame.  I'm changing that to
  Writer2.addMemProfFrame.

- This patch adds a call to getCallStackMapping.  Version 2 and 3
  require a map from call stack IDs to call stacks.

- I'm calling makeRecordV2 instead of makeRecord to populate the
  struct fields used by Version 2 and 3.

- Version 1 uses MemProfRecord::MemProfRecord (that is, a constructor)
  to convert IndexedMemProfRecord to MemProfRecord.  Version 2 and 3
  use MemProfRecord::toMemProfRecord, a member function, to do the
  same task.
2024-11-16 19:06:10 -08:00
Kazu Hirata
612b779963
[memprof] Update a comment (NFC) (#116500)
Note that Version0 has been removed in #116442.
2024-11-16 11:06:04 -08:00
Kazu Hirata
9a730d878e
[memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs (#115807)
Undrifting the MemProf profile requires two sets of information:

- caller-callee pairs from the profile
- callee-callee pairs from the IR

This patch adds a function to do the former.  The latter has been
addressed by extractCallsFromIR.

Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile.  "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling.  To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.

Conceptually, we would extract caller-callee pairs with:

  for each MemProfRecord in the profile:
    for each call stack in AllocSites:
      extract caller-callee pairs from adjacent pairs of Frames

However, this is highly inefficient.  Obtaining MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.

This patch adds an efficient way of doing the above.  Specifically, we

- go though all IndexedMemProfRecords,
- look at each linear call stack ID
- extract caller-callee pairs from each call stack

The extraction is done by a new class CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array.  For our purposes, we skip the
reconstruction and immediately populates the data structure for
caller-callee pairs.

The resulting caller-callee-pairs is of the type:

  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;

which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.

Further performance optimizations are possible for the new functions
in this patch.  I'll address those in follow-up patches.
2024-11-13 23:40:12 -08:00
Jay Foad
eb6e7e8f89
[unittests] Use {} instead of std::nullopt to initialize empty ArrayRef (#109388)
Follow up to #109133.
2024-09-21 10:59:50 +01:00
JOE1994
459a82e689 [llvm][unittests] Don't call raw_string_ostream::flush() (NFC)
raw_string_ostream::flush() is essentially a no-op (also specified in docs).
Don't call it in tests that aren't meant to test 'raw_string_ostream' itself.

p.s. remove a few redundant calls to raw_string_ostream::str()
2024-09-13 19:55:44 -04:00
Kazu Hirata
6c8ff4cbb8
[ProfileData] Take ArrayRef<InstrProfValueData> in addValueData (NFC) (#97363)
This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.

addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites.  AFAICT, no caller of
addValueData uses updated incoming values.  With this patch, we add
value data to ValueSites first and then remaps values there.  This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
2024-07-11 16:38:44 -07:00
Kazu Hirata
40a5a86df9
[ProfileData] Migrate to a new version of getValueProfDataFromInst (#96381) 2024-06-29 00:39:43 -07:00
Kazu Hirata
beba2e7385
[ProfileData] Teach addValueData to honor parameter Site (#96233)
This patch teaches addValueData to honor Site for verification
purposes.  It does not affect the profile data in any manner.
2024-06-20 22:25:19 -07:00
Kazu Hirata
8f7d30abb2
[ProfileData] Sink the length checks (#95604)
The new API getValueArrayForSite returns ArrayRef<InstrProfValueData>,
packaging the array length and contents together.

This patch sinks the array length checks just before we check the
contents.  This way, we check both the array length and contents
immediately after calling getValueArrayForSite.
2024-06-14 15:48:06 -07:00
Kazu Hirata
76b64aeb88
[ProfileData] Migrate to a new version of getValueProfDataFromInst (#95568)
Note that the version of getValueProfDataFromInst that returns bool
has been "deprecated" since:

  commit 1e15371dd8843dfc52b9435afaa133997c1773d8
  Author: Mingming Liu <mingmingl@google.com>
  Date:   Mon Apr 1 15:14:49 2024 -0700
2024-06-14 15:32:11 -07:00
Kazu Hirata
bbe9119d9c
[ProfileData] Sink the length checks (#95559)
The new API getValueArrayForSite returns ArrayRef<InstrProfValueData>,
packaging the array length and contents together.

This patch sinks the array length checks just before we check the
contents.  This way, we check both the array length and contents
immediately after calling getValueArrayForSite.
2024-06-14 09:21:18 -07:00
Kazu Hirata
180a536665
[ProfileData] Fix the order of tests (#95549)
Without this patch, we call getValueForSite before veryfing that we
have an expected number of value sites with getNumValueSites.

This patch fixes the order by "sinking" the call to getValueForSite.

While I am at it, this patch migrates the use of getValueForSite to
getValueArrayForSite.
2024-06-14 09:18:31 -07:00
Kazu Hirata
9ad102f03b
[ProfileData] Migrate to getValueArrayForSite (#95493)
This patch migrates uses of getValueForSite to getValueArrayForSite.
Each hunk is self-contained, meaning that each one can be applied
independently of the others.

In the unit test, there are cases where the array length check is
performed a lot earlier than the array content check.  For now, I'm
leaving the length checks where they are.  I'll consider moving them
when I migrate uses of getNumValueDataForSite to getValueArrayForSite
in a follow-up patch.
2024-06-14 06:38:48 -07:00
Kazu Hirata
41587739a6
[ProfileData] Migrate to getValueArrayForSite (#95457)
This patch is a collection of one-liner migrations to
getValueArrayForSite.
2024-06-13 13:04:50 -07:00
Kazu Hirata
4b493e31b2
[ProfileData] Add getValueArrayForSite (#95335)
Without this patch, a typical traversal over the value data looks
like:

  uint32_t NV = Func.getNumValueDataForSite(VK, S);
std::unique_ptr<InstrProfValueData[]> VD = Func.getValueForSite(VK, S);
  for (uint32_t V = 0; V < NV; V++)
    Do something with VD[V].Value and/or VD[V].Count;

This patch adds getValueArrayForSite, which returns
ArrayRef<InstrProfValueData>, so we can do:

  for (const auto &V : Func.getValueArrayForSite(VK, S))
    Do something with V.Value and/or V.Count;

I'm planning to migrate the existing uses of getValueForSite to
getValueArrayForSite in follow-up patches and remove getValueForSite
and getNumValueDataForSite.
2024-06-13 11:08:17 -07:00
Kazu Hirata
00fa3fbfb8
[ProfileData] Compute sum in annotateValueSite (NFC) (#95199)
getValueForSite computes the total count -- the total number of times
a given value site is visited.  The problem is that, excluding tests,
annotateValueSite is the only place that needs the total count.

This patch moves the total count computation to annotateValueSite.
2024-06-12 10:14:33 -07:00
Min Hsu
8466480bda [ProfData] Remove unused variable in unittest
Removed unused `VTables` in unittests/ProfileData/InstrProfTest.cpp.

NFC.
2024-05-09 14:42:52 -07:00
Mingming Liu
98c1ba460a
[InstrProf] Add vtables with type metadata into symtab (#81051)
The indirect-call-promotion pass will look up the vtable to find out
the virtual function [1],
and add vtable-derived information in icall
candidate [2] for cost-benefit analysis.

[1] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR395-R416
[2] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR195-R199
2024-05-09 10:41:23 -07:00
Kazu Hirata
c9dae43438
[memprof] Add access checks to PortableMemInfoBlock::get* (#90121)
commit 4c8ec8f8bc3fb4dda4fd36c3b2ad745bd3451970
  Author: Kazu Hirata <kazu@google.com>
  Date:   Wed Apr 24 16:25:35 2024 -0700

introduced the idea of serializing/deserializing a subset of the
fields in PortableMemInfoBlock.  While it reduces the size of the
indexed MemProf profile file, we now could inadvertently access
unavailable fields and go without noticing.

To protect ourselves from the risk, this patch adds access checks to
PortableMemInfoBlock::get* methods by embedding a bit set representing
available fields into PortableMemInfoBlock.
2024-04-28 12:49:08 -07:00
Kazu Hirata
352602010f Repply [memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places.  This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.

The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions.  This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.

This iteration fixes a problem uncovered with ubsan, where we were
dereferencing an uninitialized std::unique_ptr.
2024-04-28 11:44:45 -07:00
Vitaly Buka
7aa6896dd7
Revert "[memprof] Introduce FrameIdConverter and CallStackIdConverter" (#90318)
Reverts llvm/llvm-project#90307

Breaks bots https://lab.llvm.org/buildbot/#/builders/5/builds/42943
2024-04-27 00:15:08 -07:00
Kazu Hirata
e04df693bf
[memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places.  This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.

The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions.  This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.
2024-04-26 19:22:17 -07:00
Kazu Hirata
4c8ec8f8bc
[memprof] Reduce schema for Version2 (#89876)
Curently, the compiler only uses several fields of MemoryInfoBlock.
Serializing all fields into the indexed MemProf file simply wastes
storage.

This patch limits the schema down to four fields for Version2 by
default.  It retains the old behavior of serializing all fields via:

  llvm-profdata merge --memprof-version=2 --memprof-full-schema

This patch reduces the size of the indexed MemProf profile I have by
40% (1.6GB down to 1.0GB).
2024-04-24 16:25:35 -07:00
Kazu Hirata
446371f5cc
[memprof] Use std::optional (NFC) (#89317)
This is partly for readability and partly for consistency with other
id-to-frame callbacks.
2024-04-18 14:56:33 -07:00
Kazu Hirata
172f6ddfa7
[memprof] Add Version2 of the indexed MemProf format (#89100)
This patch adds Version2 of the indexed MemProf format.  The new
format comes with a hash table from CallStackId to actual call stacks
llvm::SmallVector<FrameId>.  The rest of the format refers to call
stacks with CallStackId.  This "values + references" model effectively
deduplicates call stacks.  Without this patch, a large indexed memprof
file of mine shrinks from 4.4GB to 1.6GB, a 64% reduction.

This patch does not make Version2 generally available yet as I am
planning to make a few more changes to the format.
2024-04-18 14:12:58 -07:00
Kazu Hirata
db9a17a407
[memprof] Use std::optional (NFC) (#88366) 2024-04-11 09:56:01 -07:00
Mingming Liu
1351d17826
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825)
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
 
* Implement profile reader and writer support 
  * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to index profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01 08:52:35 -07:00
Kazu Hirata
74799f4240
[memprof] Add call stack IDs to IndexedAllocationInfo (#85888)
The indexed MemProf file has a huge amount of redundancy.  In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.

We should work toward deduplicating call stacks by referring to them
with unique IDs with actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.

At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next.  We should be able
to read (but not write) the current version of the MemProf file even
after we move onto the next one.

With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner.  We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.

This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field.  It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.
2024-03-23 19:50:15 -07:00
Mingming Liu
05091aa3ac
[NFC][InstrProf]Generalize getParsedIRPGOFuncName to getParsedIRPGOName (#81054)
- Function getParsedIRPGOFuncName splits name by delimiter. The `[filename;]mangled-name` format could be generalized for non-function global values (e.g., vtables for type profiling). So rename the
function.
- Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.
2024-02-07 20:03:44 -08:00
Ellis Hoag
9a2df55f47
[InstrProf] No linkage prefixes in IRPGO names (#76994)
Change the format of IRPGO counter names to
`[<filepath>;]<mangled-name>` which is computed by
`GlobalValue::getGlobalIdentifier()` to fix #74565.

In fe051934cbb0aaf25d960d7d45305135635d650b
(https://reviews.llvm.org/D156569) the format of IRPGO counter names was
changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is
basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O
(yes, it is confusing that `<linkage-name>` is computed with
`Mangler().getNameWithPrefix()` while `<mangled-name>` is just
`F.getName()`). We discovered in #74565 that this causes some missed
import issues on some targets and #74008 is a partial fix.

Since `<mangled-name>` may not match the `<linkage-name>` on some
targets like Mach-O, we will need to post-process the output of
`llvm-profdata order` before passing to the linker via `-order_file`.

Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will
become stale after this diff, but I think this is acceptable since that
patch landed after the LLVM 18 cut which hasn't been released yet.
2024-01-04 16:13:57 -08:00
Mingming Liu
78a195e100
Reland the reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75954)
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously name
matchers don't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924

This is the second reland and fixed errors caught in first reland
(https://github.com/llvm/llvm-project/pull/75860)

**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-19 12:25:56 -08:00
Mingming Liu
6ce23ea0ab
Revert "Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. "" (#75888)
Reverts llvm/llvm-project#75860
- Mangled name mismatch on Windows
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)
2023-12-18 19:31:18 -08:00
Mingming Liu
c5871712ae
Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75860)
Fixed build-bot failures caught by post-submit tests
1) Add the list of command line tools needed by new compiler-rt test into dependency.
2) Use `starts_with` to replace deprecated `startswith`.

**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-18 17:43:40 -08:00
Mingming Liu
3aa5d71127
Revert "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles." (#75835)
Reverts llvm/llvm-project#74008

The compiler-rt test failed due to `llvm-dis` not found
(https://lab.llvm.org/buildbot/#/builders/127/builds/59884)
Will revert and investigate how to require the proper dependency.
2023-12-18 09:39:55 -08:00
Mingming Liu
245cddae70
[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. (#74008)
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.

To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-18 09:10:39 -08:00