27 Commits

Author SHA1 Message Date
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Owen Rodley
d3d856ad84
Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
Mircea Trofin
93b74f7178
[ctxprof] Scale up everything under a root by its TotalRootEntryCount (#136015)
`TotalRootEntryCount` captures how many times that root was entered - regardless if a profile was also collected or not (profile collection for a given root happens on only one thread at a time).

We don't do this in compiler_rt because the goal there is to flush out the data as fast as possible, so traversing and multiplying vectors is punted to the profile user.

We really just need to do this when flattening the profile so that the values across roots and flat profiles match. We could do it earlier, too - like when loading the profile - but it seems beneficial (at least for debugging) to keep the counter values the same as the loaded ones. We can revisit this later.
2025-04-21 08:43:21 -07:00
Mircea Trofin
4903a7b77b
[ctxprof][nfc] Move profile annotator to Analysis (#135871)
This moves the utility that propagates counter values such that we can reuse it elsewhere. Specifically, in a subsequent patch, it'll be used to guide ICP: we need to prioritize promoting indirect calls that dominate larger portions of the dynamic instruction count. We can compare them based on the dynamic count of IR instructions, and we can get that early with this counter propagation logic.

The patch is mostly a move of the existing logic, with a pimpl - style implementation to hide all the current complexity.
2025-04-16 12:10:08 -07:00
Mircea Trofin
442050ce8f
[ctxprof] Flatten indirect call info in pre-thinlink compilation (#134766)
Same idea as in #134723 - flatten indirect call info in `"VP"` `MD_prof` metadata for the thinlinker, for cases that aren't covered by a contextual profile. If we don't ICP an indirect call target in the specialized module, the call will fall to the copy of that target outside the specialized module. If the graph under that target also has some indirect calls, in the absence of this pass, we'd have a steeper performance regression - because none of those would have a chance to be ICPed.
2025-04-08 17:33:37 -07:00
Mircea Trofin
4c90d977db
[ctxprof] Use the flattened contextual profile pre-thinlink (#134723)
Flatten the profile pre-thinlink so that ThinLTO has something to work with for the parts of the binary that aren't covered by contextual profiles. Post-thinlink, the flattener is re-run and will actually change profile info, but just for the modules containing contextual trees ("specialized modules"). For the rest, the flattener just yanks out the instrumentation.
2025-04-08 17:30:49 -07:00
Mircea Trofin
6a3e5f89bb
[ctxprof] Only prune the profile in modules containing only context trees (#134340)
We will subsequently treat the whole profile as "flat" in the frontend, (i.e flatten and combine with the flat profile section), so we can have a profile for ThinLTO for parts of the application that don't come under the contextual profile. After ThinLTO, we will treat the module(s) containing contextual trees differently: they'll have only the contextual profile pertinent to them. The rest of the modules (non-contextual) will proceed "as usual", off the flattened profile.

This patch implements pruning of the contextual profile to enable the above.
2025-04-07 19:52:03 -07:00
Mircea Trofin
5223ddd83f
[ctxprof] Prepare profile format for flat profiles (#129626)
The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches.

The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
2025-03-05 07:22:35 -08:00
Mircea Trofin
2068a18c86
[ctxprof][nfc] Prepare CtxProfAnalysis for flat profiles (#129623)
Mostly remove the equivalence "no contexts == no CtxProfAnalysis result", and instead check explicitly there are no contextual profiles.
2025-03-04 16:42:47 -08:00
Mircea Trofin
b15845c005
[ctxprof] dump profiles using yaml (for testing) (#123108)
This is a follow-up from PR #122545, which enabled converting yaml to contextual profiles.

This change uses the lower level yaml APIs because:
- the mapping APIs `llvm::yaml` offers don't work with `const` values, because they (the APIs) want to enable both serialization and deserialization
- building a helper data structure would be an alternative, but it'd be either memory-consuming or overly-complex design, given the recursive nature of the contextual profiles.
2025-01-15 16:49:59 -08:00
Mircea Trofin
c4952e513f
[nfc][ctx_prof] Efficient profile traversal and update (#110052)
This optimizes profile updates and visits, where we want to access contexts for a specific function. These are all the current update cases. We do so by maintaining a list of contexts for each function, preserving preorder traversal. The list is updated whenever contexts are `std::move`-d or deleted.
2024-09-27 08:09:10 -07:00
Mircea Trofin
3d01af78a9 [nfc][ctx_prof] Remove unnecessary include
Removed dependency on `Transforms/Utils` from
`CtxProfAnalysis.cpp` - it was unnecessary to
begin with.
2024-09-25 18:42:47 -07:00
Mircea Trofin
c8365feed7
[ctx_prof] Simple ICP criteria during module inliner (#109881)
This is mostly for test: under contextual profiling, we perform ICP for those indirect callsites which have targets marked as `alwaysinline`.

This helped uncover a bug with the way the profile was updated upon ICP, where we were skipping over the update if the target wasn't called in that context. That was resulting in incorrect counts for the indirect BB.

Also flyby fix to the total/direct count values, they should be 64-bit (as all counters are in the contextual profile)
2024-09-25 15:05:52 -07:00
Mircea Trofin
783bac7ffb
[ctx_prof] Handle select and its step instrumentation (#109185)
The `step` instrumentation shouldn't be treated, during use, like an `increment`. The latter is treated as a BB ID. The step isn't that, it's more of a type of value profiling. We need to distinguish between the 2 when really looking for BB IDs (==increments), and handle appropriately `step`s. In particular, we need to know when to elide them because `select`s may get elided by function cloning, if the condition of the select is statically known.
2024-09-23 15:21:25 -07:00
Mircea Trofin
ee5709b3b4
[nfc][ctx_prof] Don't try finding callsite annotation for un-instrumentable callsites (#109184)
Reinforcing properties ensured at instrumentation time.
2024-09-18 21:13:48 -07:00
Mircea Trofin
82266d3a2b
[nfc][ctx_prof] Factor the callsite instrumentation exclusion criteria (#108471)
Reusing this in the logic fetching the instrumentation in `CtxProfAnalysis`.
2024-09-13 21:25:47 -07:00
Mircea Trofin
6cb2d40387
[ctx_prof] Handle case when no root is in this Module. (#107463)
If none of the functions in this `Module` are roots in the contextual profile, we can't use it and should just return the `{}` case.
2024-09-06 13:44:05 -07:00
Mircea Trofin
3209766608
[ctx_prof] Add Inlining support (#106154)
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
   - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
   - the contexts of the callee (at the inlined callsite) are moved to the caller.
   - the callee context at the inlined callsite is deleted.
2024-09-03 16:14:05 -07:00
Mircea Trofin
73c3b7337b
[ctx_prof] Add support for ICP (#105469)
An overload of `llvm::promoteCallWithIfThenElse` that updates the contextual profile.

High-level, this is very simple: after creating the `if... then (direct call) else (indirect call)` structure, we instrument the new callsites and BBs (the instrumentation will help with tracking for other IPO transformations, and, ultimately, to match counter values before flattening to `MD_prof`).

In more detail:

- move the callsite instrumentation of the indirect call to the `else` BB, before the indirect call
- create a new callsite instrumentation for the direct call
- create instrumentation for both the `then` and `else` BBs - we could instrument just one (MST-style) but we're not running the binary with this instrumentation, and at most this would save some space (less counters tracked). For simplicity instrumenting both at this point
- update each context belonging to the caller by updating the counters, and moving the indirect callee to the new, direct callsite ID

Issue #89287
2024-08-27 15:50:13 -07:00
Mircea Trofin
1e70122cbc
[ctx_prof] API to get the instrumentation of a BB (#105468)
Analogous to PR #104491 

Issue #89287
2024-08-21 17:17:46 -07:00
Mircea Trofin
22d3fb182c
[ctx_prof] Profile flatterner (#104539)
Eventually we'll need to flatten the profile (at the end of all IPO) and lower to "vanilla" `MD_prof`. This is the first part of that.

Issue #89287
2024-08-21 10:52:10 -07:00
Mircea Trofin
c8a678b1e4
[ctx_prof] Add analysis utility to fetch ID of a callsite (#104491)
This will be needed when maintaining the contextual profile for ICP or inlining - we'll need to first fetch the ID of a callsite, which is in an instrumentation instruction (intrinsic) preceding the callsite.
2024-08-20 10:49:42 -07:00
Mircea Trofin
50c876a486
[nfc][ctx_prof] Remove the need for PassBuilder to know about UseCtxProfile (#104492) 2024-08-15 13:16:55 -07:00
Haojian Wu
8d03710728 [ctx_prof] Remove an unneeded include in CtxProfAnalysis.cpp 2024-08-15 08:17:22 +02:00
Mircea Trofin
aca01bff07
[ctx_prof] CtxProfAnalysis: populate module data (#102930)
Continuing from #102084, which introduced the analysis, we now populate
it with info about functions contained in the module.

When we will update the profile due to e.g. inlined callsites, we'll
ingest the callee's counters and callsites to the caller. We'll move
those to the caller's respective index space (counter and callers), so
we need to know and maintain where those currently end.

We also don't need to keep profiles not pertinent to this module.

This patch also introduces an arguably much simpler way to track the
GUID of a function from the frontend compilation, through ThinLTO, and
into the post-thinlink compilation step, which doesn't rely on keeping
names around. A separate RFC and patches will discuss extending this to
the current PGO (instrumented and sampled) and other consumers as an
infrastructural component.
2024-08-14 18:46:25 -07:00
Mircea Trofin
4a2bf05980 Reapply "[ctx_prof] Fix the pre-thinlink "use" case (#102511)"
This reverts commit 967185eeb85abb77bd6b6cdd2b026d5c54b7d4f3.

The problem was link dependencies, moved `UseCtxProfile` to `Analysis`.
2024-08-08 17:04:00 -07:00
Mircea Trofin
dbbf0762b6
[ctx_prof] CtxProfAnalysis (#102084)
This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as result of their changes.
2024-08-07 14:39:48 -04:00