301 Commits

Author SHA1 Message Date
Mircea Trofin
7efceca079
[nfc][pgo] const-ify some APIs in InstrProfSymtab (#153284)
The main reason some `const`-sounding APIs weren't const is that their state is lazily updated (ensuring ordering).
2025-08-13 18:08:08 +02:00
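The pattern this commit describes (a logically const lookup that lazily updates internal ordering) can be sketched roughly as follows. This is a minimal illustration, not the InstrProfSymtab code: `LazySymtab`, `add`, `lookup`, and `ensureSorted` are hypothetical names, and the use of `mutable` is an assumption about how such an API could be made const.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol table whose lookups are logically
// const but require the entries to be sorted first.
class LazySymtab {
  // `mutable` lets const member functions refresh the cached ordering.
  mutable std::vector<std::pair<uint64_t, std::string>> Entries;
  mutable bool Sorted = true;

  // Sort on demand; a no-op when the ordering is already up to date.
  void ensureSorted() const {
    if (Sorted)
      return;
    std::sort(Entries.begin(), Entries.end());
    Sorted = true;
  }

public:
  void add(uint64_t Hash, std::string Name) {
    Entries.emplace_back(Hash, std::move(Name));
    Sorted = false;
  }

  // Const API: the observable contents do not change, only the ordering.
  std::string lookup(uint64_t Hash) const {
    ensureSorted();
    auto It = std::lower_bound(
        Entries.begin(), Entries.end(), Hash,
        [](const std::pair<uint64_t, std::string> &E, uint64_t H) {
          return E.first < H;
        });
    if (It != Entries.end() && It->first == Hash)
      return It->second;
    return std::string();
  }
};
```

With this shape, callers holding a `const LazySymtab &` can still call `lookup`, and the sort happens at most once per batch of insertions.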
Kazu Hirata
cf18e5e0f8
[ProfileData] Remove an unnecessary cast (NFC) (#152087)
new already returns ValueProfData *.
2025-08-05 07:39:14 -07:00
Kazu Hirata
b809d5e2ac
[ProfileData] Use lambdas instead of std::bind (NFC) (#146625)
Lambdas are a lot shorter than std::bind here.
2025-07-01 22:50:04 -07:00
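As a rough illustration of why a lambda tends to be shorter than `std::bind`, the sketch below builds the same callback both ways. `RemapFn`, `addOffset`, and the two factory functions are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical callback type and helper, for illustration only.
using RemapFn = std::function<uint64_t(uint64_t)>;

inline uint64_t addOffset(uint64_t Offset, uint64_t Value) {
  return Value + Offset;
}

// std::bind version: needs a placeholder for the argument left unbound.
inline RemapFn makeRemapWithBind(uint64_t Offset) {
  return std::bind(addOffset, Offset, std::placeholders::_1);
}

// Lambda version: the capture and the parameter are spelled out directly.
inline RemapFn makeRemapWithLambda(uint64_t Offset) {
  return [Offset](uint64_t Value) { return addOffset(Offset, Value); };
}
```

Both factories return callables with the same behavior; the lambda simply makes the captured state and the call signature visible at the definition site.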
Mircea Trofin
82cbd68504
[NFC][PGO] Use constants rather than free strings for metadata labels (#145721) 2025-06-25 16:20:10 -07:00
Mingming Liu
f3f28323ad
[StaticDataLayout][PGO] Add profile format for static data layout, and the classes to operate on the profiles. (#138170)
Context: For
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744#p-336543-background-3,
we propose to profile memory loads and stores via hardware events,
symbolize the addresses of binary static data sections and feed the
profile back into compiler for data partitioning.

This change adds the profile format for static data layout, and the
classes to operate on it.

The profile and its format
1. Conceptually, a piece of data (call it a symbol) is represented by
its symbol name or its content hash. The former applies to the majority of
data whose mangled name remains relatively stable over binary releases,
and the latter applies to string literals (with name patterns like
`.str.<N>[.llvm.<hash>]`).
- The symbols with samples are hot data. The number of hot symbols is
small relative to all symbols. The profile tracks their sampled counts and
locations. Sampled counts come from hardware events, and locations come
from debug information in the profiled binary. The symbols without
samples are cold data. The number of such cold symbols is large. The
profile tracks their representation (the name or content hash).
- Based on a preliminary study, debug information coverage for data
symbols is partial and best-effort. In the LLVM IR, global variables
with source code correspondence may or may not have debug information.
Therefore the location information is optional in the profiles.
2. The profile-and-compile cycle is similar to SamplePGO. Profiles are
sampled from production binaries, and used in the next binary releases.
Known cold symbols and new hot symbols can both have zero sampled
counts, so the profile records known cold symbols to tell the two apart
for the next compile.

In the profile's serialization format, strings are concatenated together
and compressed. Individual records store the index.

A separate PR will connect this class to InstrProfReader/Writer via
MemProfReader/Writer.

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
2025-05-15 18:31:50 -07:00
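The "strings are concatenated, records store the index" idea from this commit can be sketched as below. This is a simplified model under stated assumptions: `DataRecord`, `StringTable`, `intern`, and the NUL-separated blob are hypothetical, and compression is deliberately omitted. It does not reproduce the actual profile format.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record: refers to its symbol name by index into a table.
struct DataRecord {
  uint32_t NameIndex;
  uint64_t SampledCount; // zero for known cold symbols
};

// Hypothetical string table: names are deduplicated and indexed.
class StringTable {
  std::vector<std::string> Names;
  std::unordered_map<std::string, uint32_t> IndexOf;

public:
  // Return the index of Name, adding it to the table on first use.
  uint32_t intern(const std::string &Name) {
    auto It = IndexOf.find(Name);
    if (It != IndexOf.end())
      return It->second;
    uint32_t Idx = static_cast<uint32_t>(Names.size());
    Names.push_back(Name);
    IndexOf.emplace(Name, Idx);
    return Idx;
  }

  const std::string &get(uint32_t Idx) const { return Names.at(Idx); }

  // Concatenate all names with a NUL separator. A real writer would
  // compress this blob; that step is left out of the sketch.
  std::string serialize() const {
    std::string Blob;
    for (const std::string &N : Names) {
      Blob += N;
      Blob.push_back('\0');
    }
    return Blob;
  }
};
```

Storing an index per record keeps records fixed-size and lets a name shared by several records appear once in the blob.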
Kazu Hirata
2f3067ed69
[llvm] Remove unused local variables (NFC) (#138454) 2025-05-04 09:38:16 -07:00
Owen Rodley
d3d856ad84
Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
Mingming Liu
2f0cd0c68e
[NFCI] Move ProfOStream from InstrProfWriter.cpp to InstrProf.h/cpp (#136791)
ProfOStream is a wrapper class for output stream, and used by
InstrProfWriter.cpp to serialize various profiles, like PGO profiles and
MemProf.

This change proposes to move it into InstrProf.h/cpp. After this is in,
InstrProfWriter can dispatch serialization of various formats into
methods like `obj->serialize()`, and the serialization code could be
moved out of InstrProfWriter.cpp into individual classes (each in a
smaller cpp file). One example is that we can gradually move
writeMemprof [1] into llvm/*/ProfileData/MemProf.h/cpp, where a couple
of classes already have `serialize/deserialize` methods.


[1]
85b35a9077/llvm/lib/ProfileData/InstrProfWriter.cpp (L774-L791)
2025-04-23 09:21:07 -07:00
Nick Sarnie
48b7530273
[clang][flang][Triple][llvm] Add isOffload function to LangOpts and isGPU function to Triple (#126956)
I'm adding support for SPIR-V, so let's consolidate these checks.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-03-28 14:19:20 +00:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibility purposes, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Kazu Hirata
e264b0e856
[ProfileData] Avoid repeated hash lookups (NFC) (#128829) 2025-02-26 00:57:28 -08:00
Teresa Johnson
594e11ce42
[MemProf] Avoid incorrect ICP symtab canonicalization (#115419)
ICP builds a symtab from the symbols in the module allowing mapping from
the VP metadata GUIDs to the Function. MemProf uses this same symtab
handling for its ICP during cloning. When symbols are added to the
symtab, the handling adds both a GUID computed from the function name,
or from the attached PGOFuncName metadata for locals, as well as a GUID
computed from the "canonicalized" name, which strips all "." suffixes
other than ".__uniq". This was originally meant to remove the ".llvm.*"
suffix added to promoted locals (done earlier in the ThinLTO backend).
In theory, it should no longer be needed as locals should have
PGOFuncName metadata.

However, this was causing a linker unsat, in code that used coroutines.
For an original coroutine function, there were several additional
functions created that had the same name, but different "." suffixes.
Therefore the canonical name for these additional functions had the same
GUID as that of the original function, leading to extra entries in the
symtab, and to selecting the wrong function for promotion. For regular
ICP this can happen, but is just a performance issue. However, for
memprof the promoted direct call calls a memprof clone, and because we
called the wrong function, in this case it didn't have a memprof clone
and we got a linker unsat.

We may be able to remove the canonical name handling for ICP in general,
but for now disable it for MemProf. At worst this could lead to not
finding a GUID in the symtab and not performing an ICP, so should be
conservatively correct.
2024-11-07 21:00:42 -08:00
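To see how suffix stripping can make distinct functions collide, here is a deliberately simplified sketch of the canonicalization rule described in this commit (strip "." suffixes, keep `.__uniq`). The function `canonicalize` is hypothetical and is not the LLVM implementation; the coroutine-style names used with it are made up for illustration.

```cpp
#include <string>

// Simplified, hypothetical canonicalization: drop "." suffixes, but keep a
// ".__uniq.<hash>" suffix if one is present.
inline std::string canonicalize(const std::string &Name) {
  const std::string Uniq = ".__uniq.";
  std::string::size_type Start = 0;
  std::string::size_type U = Name.find(Uniq);
  if (U != std::string::npos)
    Start = U + Uniq.size(); // look for the next '.' after the uniq hash
  std::string::size_type Dot = Name.find('.', Start);
  return Dot == std::string::npos ? Name : Name.substr(0, Dot);
}
```

Under this rule, names such as `foo.resume` and `foo.destroy` both canonicalize to `foo`, so a GUID computed from the canonical name would be shared by all three functions. That is the kind of collision the commit message describes.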
Ethan Luis McDonough
fde2d23ee2
[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.

> This pull request is the first part of an ongoing effort to extend
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-08-22 01:10:54 -05:00
Kazu Hirata
6c8ff4cbb8
[ProfileData] Take ArrayRef<InstrProfValueData> in addValueData (NFC) (#97363)
This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.

addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites.  AFAICT, no caller of
addValueData uses updated incoming values.  With this patch, we add
value data to ValueSites first and then remap values there.  This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
2024-07-11 16:38:44 -07:00
Mircea Trofin
afbd7d1e7c
[NFC] Coding style: drop k in kGlobalIdentifierDelimiter (#98230) 2024-07-09 15:44:55 -07:00
Mircea Trofin
e291f31f89
[NFC] Coding style fixes in InstrProf.cpp (#98211) 2024-07-09 13:28:35 -07:00
Kazu Hirata
b8eaa5bb10
[ProfileData] Remove the old version of getValueProfDataFromInst (#97374)
I've migrated uses of the old version of getValueProfDataFromInst to
the one that returns SmallVector<InstrProfValueData, 4>.  This patch
removes the old version.
2024-07-02 11:46:31 -07:00
Mingming Liu
1518b260ce
[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (#81442)
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
no-op.
    
* Function-comparison (before):

```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
   call %func
```

* VTable-comparison (after):

```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func
```
    
Key changes:
1. Find out virtual calls and the vtables they come from.
- The ICP relies on type intrinsic `llvm.type.test` to find out virtual
calls and the compatible vtables, and relies on type metadata to find
the address point for comparison.
2. ICP pass does cost-benefit analysis and compares vtable only when the
number of vtables for a function candidate is within (option specified)
threshold.
3. Sink the function addressing and vtable load instruction to indirect
fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated
     1) Update vtable value profiles after inline
     2) For either function-based comparison or vtable-based comparison,
          update both vtable and indirect call value profiles.
2024-06-29 23:21:33 -07:00
Ethan Luis McDonough
2c8b912f63
Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4
[PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extend PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
Kazu Hirata
b0ae923ada
[ProfileData] Add a variant of getValueProfDataFromInst (#95993)
This patch adds a variant of getValueProfDataFromInst that returns
std::vector<InstrProfValueData> instead of
std::unique_ptr<InstrProfValueData[]>.  The new return type carries the
length with it, so we can drop out parameter ActualNumValueData.
Also, the caller can directly feed the return value into a range-based
for loop as shown in the patch.

I'm planning to migrate other callers of getValueProfDataFromInst to
the new variant in follow-up patches.
2024-06-22 00:40:36 -07:00
Kazu Hirata
beba2e7385
[ProfileData] Teach addValueData to honor parameter Site (#96233)
This patch teaches addValueData to honor Site for verification
purposes.  It does not affect the profile data in any manner.
2024-06-20 22:25:19 -07:00
Kazu Hirata
d6b0b7acf3
[ProfileData] Remove getValueProfDataFromInst (#95617)
I've migrated all uses to the new version of getValueProfDataFromInst
that returns std::unique_ptr<InstrProfValueData[]>.
2024-06-17 18:50:08 -07:00
Kazu Hirata
9ad102f03b
[ProfileData] Migrate to getValueArrayForSite (#95493)
This patch migrates uses of getValueForSite to getValueArrayForSite.
Each hunk is self-contained, meaning that each one can be applied
independently of the others.

In the unit test, there are cases where the array length check is
performed a lot earlier than the array content check.  For now, I'm
leaving the length checks where they are.  I'll consider moving them
when I migrate uses of getNumValueDataForSite to getValueArrayForSite
in a follow-up patch.
2024-06-14 06:38:48 -07:00
Kazu Hirata
31440738bd
[ProfileData] Use std::vector for ValueData (NFC) (#95194)
This patch changes the type of ValueData to
std::vector<InstrProfValueData> so that, in a follow-up patch, we can
teach getValueForSite to return ArrayRef<InstrProfValueData>.

Currently, a typical traversal over the value data looks like:

  uint32_t NV = Func.getNumValueDataForSite(VK, I);
  std::unique_ptr<InstrProfValueData[]> VD = Func.getValueForSite(VK, I);
  for (uint32_t V = 0; V < NV; V++)
    Do something with VD[V].Value and/or VD[V].Count;

Note that we need to call getNumValueDataForSite and getValueForSite
separately.  If getValueForSite returns ArrayRef<InstrProfValueData>
in the future, then we'll be able to do something like:

  for (const auto &V : Func.getValueForSite(VK, I))
    Do something with V.Value and/or V.Count;

If ArrayRef<InstrProfValueData> directly points to ValueData, then
getValueForSite won't need to allocate memory with std::make_unique.

Now, switching to std::vector requires us to update several places:

- sortByTargetValues switches to llvm::sort because we don't need to
  worry about sort stability.

- sortByCount retains sort stability because std::list::sort also
  performs stable sort.

- merge builds another array and moves it back to ValueData to avoid a
  potential quadratic behavior with std::vector::insert into the
  middle of a vector.
2024-06-12 11:22:49 -07:00
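The last bullet (merge into a fresh array, then move it back) can be sketched as follows. This is an illustration under assumptions: `ValueCount` is a hypothetical stand-in for InstrProfValueData, both inputs are assumed sorted by `Value`, and `mergeSorted` and `totalCount` are made-up names rather than the functions touched by the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for InstrProfValueData.
struct ValueCount {
  uint64_t Value;
  uint64_t Count;
};

// Merge Src into Dst, where both are sorted by Value, summing the counts of
// equal values. The result is built in a fresh vector and moved back, so no
// element is ever inserted into the middle of a vector.
inline void mergeSorted(std::vector<ValueCount> &Dst,
                        const std::vector<ValueCount> &Src) {
  std::vector<ValueCount> Merged;
  Merged.reserve(Dst.size() + Src.size());
  std::size_t I = 0, J = 0;
  while (I < Dst.size() && J < Src.size()) {
    if (Dst[I].Value < Src[J].Value) {
      Merged.push_back(Dst[I++]);
    } else if (Src[J].Value < Dst[I].Value) {
      Merged.push_back(Src[J++]);
    } else {
      Merged.push_back({Dst[I].Value, Dst[I].Count + Src[J].Count});
      ++I;
      ++J;
    }
  }
  for (; I < Dst.size(); ++I)
    Merged.push_back(Dst[I]);
  for (; J < Src.size(); ++J)
    Merged.push_back(Src[J]);
  Dst = std::move(Merged);
}

// Range-based traversal: the container carries its own length.
inline uint64_t totalCount(const std::vector<ValueCount> &VD) {
  uint64_t Sum = 0;
  for (const ValueCount &V : VD)
    Sum += V.Count;
  return Sum;
}
```

The merge is linear in the combined size, whereas repeated `std::vector::insert` calls into the middle could shift elements on every insertion.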
Kazu Hirata
00fa3fbfb8
[ProfileData] Compute sum in annotateValueSite (NFC) (#95199)
getValueForSite computes the total count -- the total number of times
a given value site is visited.  The problem is that, excluding tests,
annotateValueSite is the only place that needs the total count.

This patch moves the total count computation to annotateValueSite.
2024-06-12 10:14:33 -07:00
Kazu Hirata
bfa937a487
[ProfileData] Add const to a few places (NFC) (#94803) 2024-06-07 15:06:04 -07:00
Kazu Hirata
7476c20c48
[ProfileData] Remove swapToHostOrder (#94665)
This patch removes swapToHostOrder in favor of
llvm::support::endian::readNext as swapToHostOrder is too thin a
wrapper around readNext.

Note that there are two variants of readNext:

- readNext<type, endian, align>(ptr)
- readNext<type, align>(ptr, endian)

swapToHostOrder uses the former, but this patch switches to the latter.

While we are at it, this patch teaches readNext to default to
unaligned just as I did in:

  commit 568368a43e5b4adb3c5d105a0eff3e0c13c0af8c
  Author: Kazu Hirata <kazu@google.com>
  Date:   Mon Apr 15 19:05:30 2024 -0700
2024-06-06 13:25:52 -07:00
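The idea behind a "read and advance" helper can be sketched in standalone C++ as below. `readNextLE` is a hypothetical function written for this illustration: it only handles little-endian unsigned integers and does not mirror the signature of `llvm::support::endian::readNext`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: read an unsigned integer stored in little-endian
// byte order from a possibly unaligned pointer, then advance the pointer
// past the bytes that were read.
template <typename T>
inline T readNextLE(const unsigned char *&Ptr) {
  static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
  T Result = 0;
  // Assemble the value byte by byte, so no aligned load is required.
  for (std::size_t I = 0; I < sizeof(T); ++I)
    Result |= static_cast<T>(static_cast<T>(Ptr[I]) << (8 * I));
  Ptr += sizeof(T);
  return Result;
}
```

Because the pointer is taken by reference, consecutive calls walk the buffer in the forward direction without the caller tracking offsets by hand.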
Mingming Liu
c803c29039
[nfc][InstrProf]Remove 'offsetOf' when parsing indexed profiles (#93346)
- In `Header::readFromBuffer`, read the buffer in the forward direction by using `readNext`.
- When computing the header size, spell out the constant.

With the changes above, we can remove `offsetOf` in InstrProf.cpp

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
2024-05-30 12:44:29 -07:00
Mingming Liu
737a3018e8
[nfc][InstrFDO] Add Header::getIndexedProfileVersion and use it to decide profile version. (#93613)
This is a split of https://github.com/llvm/llvm-project/pull/93346 as
discussed.
2024-05-29 10:15:17 -07:00
Ellis Hoag
73eb9b3314
[InstrProf] Evaluate function order using test traces (#92451)
The `llvm-profdata order` command is used to compute a function order
using traces from the input profile. Add the `--num-test-traces` flag to
keep aside N traces to evaluate this order. These test traces are assumed
to be the actual function execution order in some experiment. The output
is a number that represents how many page faults we got. Lower is
better.

I tested on a large profile I already had.
```
llvm-profdata order default.profdata --num-test-traces=30
# Ordered 149103 functions
# Total area under the page fault curve: 2.271827e+09
...
```

I also improved `TemporalProfTraceTy::createBPFunctionNodes()` in a few
ways:
* Simplified how `UN`s are computed
* Change how the initial `Node` order is computed
* Filter out rare and common `UN`s
* Output vector is an aliased argument instead of a return

These changes slightly improved the evaluation in my test.
```
llvm-profdata order default.profdata --num-test-traces=30
# Ordered 149103 functions
# Total area under the page fault curve: 2.268586e+09
...
```
2024-05-23 11:19:29 -07:00
Mingming Liu
b66779b5bf
[nfc][InstrProfReader]Store header fields in native endianness (#92947)
- Use `Header.Version` directly and remove Header::formatVersion

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
2024-05-21 21:25:12 -07:00
Mingming Liu
98c1ba460a
[InstrProf] Add vtables with type metadata into symtab (#81051)
The indirect-call-promotion pass will look up the vtable to find out
the virtual function [1], and add vtable-derived information in icall
candidate [2] for cost-benefit analysis.

[1] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR395-R416
[2] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR195-R199
2024-05-09 10:41:23 -07:00
Kazu Hirata
bb6df0804b
[llvm] Use StringRef::operator== instead of StringRef::equals (NFC) (#91441)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  70 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-08 10:33:53 -07:00
Kazu Hirata
f430e37446
[llvm] Drop unaligned from calls to readNext (NFC) (#88841)
Now readNext defaults to unaligned accesses.  This patch drops
unaligned to improve readability.
2024-04-16 12:47:02 -07:00
Mingming Liu
08e210c6af
[NFC][IndirectCallProm] Refactor function-based conditional devirtualization and indirect call value profile update into one helper function (#80762)
* The motivation is to move the indirect callee profile update inside the
function-based speculative indirect-call promotion, so that there are
fewer diffs when the vtable-based transformation and profile update are
implemented in a follow-up patch.
* The parent patch is https://github.com/llvm/llvm-project/pull/79381
2024-04-11 13:28:20 -07:00
Mingming Liu
1e15371dd8
[ThinLTO][TypeProf] Implement vtable def import (#79381)
Add annotated vtable GUID as referenced variables in per function
summary, and update bitcode writer to create value-ids for these
referenced vtables.

- This is part 3 of the type profiling work, described in the "Virtual Table Definition Import" [1] section of the RFC.

[1] https://github.com/llvm/llvm-project/pull/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
2024-04-01 15:14:49 -07:00
Mingming Liu
1351d17826
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825)
(The profile format change is split into a standalone change in https://github.com/llvm/llvm-project/pull/81691)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
 
* Implement profile reader and writer support 
  * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and the compiler. When initialized, the reader stores a pointer to the beginning of the in-memory compressed vtable names and the length of the string. When used in `llvm-profdata`, the reader decompresses the string to show symbols of a profiled site. When used in the compiler, string decompression doesn't happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to indexed profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01 08:52:35 -07:00
wanglei
f439c71373 [InstrProf][NFC] Fix -Wimplicit-fallthrough warning in InstrProf.cpp after #82711 2024-03-06 10:20:30 +08:00
Mingming Liu
16e74fd489
Reland "[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling." (#82711)
New changes on top of the [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.

1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames`, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on a Mac laptop (for Darwin) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` return `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
   
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))

**Original Description**

* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/writer changes and llvm-profdata changes.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-27 11:07:40 -08:00
Mingming Liu
0e8d1877cd
Revert type profiling change as compiler-rt test break on Windows. (#82583)
Examples
https://lab.llvm.org/buildbot/#/builders/127/builds/62532/steps/8/logs/stdio
2024-02-21 21:41:33 -08:00
Mingming Liu
db7e9e6841
[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling. (#81691)
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
  - Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
  
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/writer changes and llvm-profdata changes.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
  
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600

---------

Co-authored-by: modiking <modiking213@gmail.com>
2024-02-21 20:59:42 -08:00
Mingming Liu
2422e969bf
[NFC][InstrProf]Factor out getCanonicalName to compute the canonical name given a pgo name. (#81547)
- Also update the `InstrProf::addFuncWithName` to call the newly added
`getCanonicalName`.
2024-02-13 10:49:35 -08:00
Mingming Liu
05091aa3ac
[NFC][InstrProf]Generalize getParsedIRPGOFuncName to getParsedIRPGOName (#81054)
- Function getParsedIRPGOFuncName splits name by delimiter. The `[filename;]mangled-name` format could be generalized for non-function global values (e.g., vtables for type profiling). So rename the function.
- Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.
2024-02-07 20:03:44 -08:00
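Splitting a `[filename;]mangled-name` string at the delimiter can be sketched as below. `splitGlobalIdentifier` is a hypothetical helper for illustration, with the semicolon passed as a default argument. It is not the body of getParsedIRPGOName, and its handling of edge cases is an assumption.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch: split "[filename;]name" at the first delimiter.
// Returns {filename, name}; filename is empty when no delimiter is present.
inline std::pair<std::string, std::string>
splitGlobalIdentifier(const std::string &PGOName, char Delimiter = ';') {
  std::string::size_type Pos = PGOName.find(Delimiter);
  if (Pos == std::string::npos)
    return {std::string(), PGOName};
  return {PGOName.substr(0, Pos), PGOName.substr(Pos + 1)};
}
```

Nothing in the split is specific to functions, which is why the same format can also cover other global values such as vtables.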
spupyrev
30aa9fb4c1 Revert "[InstrProf] Adding utility weights to BalancedPartitioning (#72717)"
This reverts commit 5954b9dca21bb0c69b9e991b2ddb84c8b05ecba3
due to broken Windows build
2024-01-19 15:13:47 -08:00
spupyrev
5954b9dca2
[InstrProf] Adding utility weights to BalancedPartitioning (#72717)
Adding weights to utility nodes in BP so that we can give more
importance to certain utilities. This is useful when we optimize
several objectives jointly.
2024-01-19 13:36:59 -08:00
Fangrui Song
0c6dc80531
BalancedPartitioning: minor updates (#77568)
When LargestTraceSize is a power of two, createBPFunctionNodes does not
allocate a group ID for Trace[LargestTraceSize-1] (as N is off by 1).
Fix this and change floor+log2 to Log2_64.

BalancedPartitioning::bisect can use unstable sort because `Nodes`
contains distinct `InputOrderIndex`s.

BalancedPartitioning::runIterations: use one DenseMap and simplify the
node renumbering code.
2024-01-17 10:46:34 -08:00
Ellis Hoag
9a2df55f47
[InstrProf] No linkage prefixes in IRPGO names (#76994)
Change the format of IRPGO counter names to
`[<filepath>;]<mangled-name>` which is computed by
`GlobalValue::getGlobalIdentifier()` to fix #74565.

In fe051934cbb0aaf25d960d7d45305135635d650b
(https://reviews.llvm.org/D156569) the format of IRPGO counter names was
changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is
basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O
(yes, it is confusing that `<linkage-name>` is computed with
`Mangler().getNameWithPrefix()` while `<mangled-name>` is just
`F.getName()`). We discovered in #74565 that this causes some missed
import issues on some targets and #74008 is a partial fix.

Since `<mangled-name>` may not match the `<linkage-name>` on some
targets like Mach-O, we will need to post-process the output of
`llvm-profdata order` before passing to the linker via `-order_file`.

Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will
become stale after this diff, but I think this is acceptable since that
patch landed after the LLVM 18 cut which hasn't been released yet.
2024-01-04 16:13:57 -08:00
Mingming Liu
78a195e100
Reland the reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75954)
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously, name
matchers didn't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924

This is the second reland; it fixes errors caught in the first reland
(https://github.com/llvm/llvm-project/pull/75860)

**Original commit message**
Since commit fe05193 (phab D156569), IRPGO names use the format
`[<filepath>;]<linkage-name>` while the prior format is
`[<filepath>:]<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`

This patch changes `GlobalValue::getGlobalIdentifier` to use the
semicolon.

To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.

*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
2023-12-19 12:25:56 -08:00
Mingming Liu
6ce23ea0ab
Revert "Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. "" (#75888)
Reverts llvm/llvm-project#75860
- Mangled name mismatch on Windows
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)
2023-12-18 19:31:18 -08:00