20 Commits

Author SHA1 Message Date
Mircea Trofin
1576fa1010
[ctxprof] Extend the notion of "cannot return" (#135651)
At the time of instrumentation (and instrumentation lowering), `noreturn` is not applied uniformously. Rather than running `FunctionAttrs` pass, we just need to use `llvm::canReturn` exposed in PR #135650
2025-04-16 10:39:34 -07:00
Mircea Trofin
e7aed23d32
[ctxprof] Handle instrumenting functions with musttail calls (#135121)
Functions with `musttail` calls can't be roots because we can't instrument their `ret` to release the context. This patch tags their `CtxRoot` field in their `FunctionData`. In compiler-rt we then know not to allow such functions become roots, and also not confuse `CtxRoot == 0x1` with there being a context root.

Currently we also lose the context tree under such cases. We can, in a subsequent patch, have the root detector search past these functions.
2025-04-14 10:01:25 -07:00
Mircea Trofin
cfa6a5940e
[ctxprof] Don't lower instrumentation for noreturn functions (#134932)
`noreturn` functions are doubtfully interesting for performance optimization / profiling.
2025-04-08 14:48:41 -07:00
Mircea Trofin
b2dea4fd22
[ctxprof] root autodetection mechanism (#133147)
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).

High-level, when collection is requested - which should happen when a server has already been set up and handing requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run for each `PerThreadCallsiteTrie` the root detection logic. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special case `FunctionData` objects, in `__llvm_ctx_profile_get_context, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.

For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired. 

Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.

The mechanism could be used in combination with explicit root specification, too.
2025-04-08 06:59:38 -07:00
Mircea Trofin
dd191d3d4f
[ctxprof][nfc] Share the definition of FunctionData between compiler-rt and llvm (#132136)
Mechanism to keep the compiler-rt and llvm view of `FunctionData` in sync. Since CtxInstrContextNode.h is exactly the same on both sides (there's an existing test, `compiler-rt/test/ctx_profile/TestCases/check-same-ctx-node.test`, checking that), we capture the structure in a macro that is then generated as `struct` fields on the compiler-rt side, and as `Type` objects on the llvm side. The macro needs to be told how to render a few kinds of fields. 

If we add more fields to FunctionData that can be described by the current known types of fields, then the llvm side would automatically be updated. If we need to add more kinds of fields, which we do by adding parameters to the macro, the llvm side (if not updated) would trigger a compilation error.
2025-03-20 12:48:18 -07:00
Mircea Trofin
1757a235e3
[ctxprof] Make ContextRoot an implementation detail (#131416)
`ContextRoot` `FunctionData` are currently known by the llvm side, which has to instantiate and zero-initialize them. 

This patch makes `FunctionData` the only global value that needs to be known and instantiated by the compiler. On the compiler-rt side, `ContextRoot`s are hung off `FunctionData`, when applicable.

This is for two reasons. First, it is a step towards root autodetection (in a subsequent patch). An autodetection mechanism would instantiate the `ContextRoot` for the detected roots, and then `__llvm_ctx_profile_get_context` would detect that and route to `__llvm_ctx_profile_start_context`.

The second reason is that we will hang off `ContextRoot` more complex datatypes (next patch), and we want to avoid too deep of a coupling between llvm and compiler-rt. Acting as a place to hang related data, `FunctionData` can stay simple - pointers and an (atomic) int (the mutex).
2025-03-18 22:03:26 -07:00
Mircea Trofin
b034905c82
[ctxprof] Capture sampling info for context roots (#131201)
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare profiles between contextual profiles, and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as multiplier for the counter values collected under that root.

We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:

* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
2025-03-14 21:10:22 -07:00
Mircea Trofin
61adca7c97
[ctxprof] Fix initializer in PGOCtxProfLowering (#131269)
It needs to match the structure of `FunctionData` in compiler-rt, but
missed a field in PR #130655.
2025-03-13 21:04:27 -07:00
Mircea Trofin
07d86d25c9
[ctxprof] Flat profile collection (#130655)
Collect flat profiles. We only do this for function activations that aren't otherwise collectible under a context root are encountered. 

This allows us to reason about the full profile without concerning ourselves wether we are double-counting. For example we can combine (during profile use) flattened contextual profiles with flat profiles.
2025-03-12 07:47:58 -07:00
Mircea Trofin
f6703a4ff5
[ctxprof] don't inline weak symbols after instrumentation (#128811)
Contextual profiling identifies functions by GUID. Functions that may get overridden by the linker with a prevailing copy may have, during instrumentation, different variants in different modules. If these variants get inlined before linking (here I assume thinlto), they will identify themselves to the ctxprof runtime as their GUID, leading to issues - they may have different counter counts, for instance.

If we block their inlining in the pre-thinlink compilation, only the prevailing copy will survive post-thinlink and the confusion is avoided.

The change introduces a small pass just for this purpose, which marks any symbols that could be affected by the above as `noinline` (even if they were `alwaysinline`). We already carried out some inlining (via the preinliner), before instrumenting, so technically the `alwaysinline` directives were honored.

We could later (different patch) choose to mark them back to their original attribute (none or `alwaysinline`) post-thinlink, if we want to - but experimentally that doesn't really change much of the performance of the instrumented binary.
2025-02-26 11:01:37 -08:00
Youngsuk Kim
f0ed31ce4b
[llvm][PGOCtxProfLowering] Avoid Type::getPointerTo() (NFC) (#111857)
`Type::getPointerTo()` is to be deprecated & removed soon.
2024-10-10 16:02:13 -04:00
Mircea Trofin
f32e5bdcef
[NFC] Rename the Nr abbreviation to Num (#107151)
It's more clear. (This isn't exhaustive).
2024-09-05 12:34:47 -07:00
Mircea Trofin
960a210b1f
[ctx_prof] Remove the dependency on the "name" GlobalVariable (#105731)
We don't need that name variable for contextual instrumentation, we just
use the function to get its GUID which we pass to the runtime, and rely
on metadata to capture it through the various optimization passes. This
change removes the need for the name global variable.
2024-08-23 10:31:43 -07:00
Mircea Trofin
aca01bff07
[ctx_prof] CtxProfAnalysis: populate module data (#102930)
Continuing from #102084, which introduced the analysis, we now populate
it with info about functions contained in the module.

When we will update the profile due to e.g. inlined callsites, we'll
ingest the callee's counters and callsites to the caller. We'll move
those to the caller's respective index space (counter and callers), so
we need to know and maintain where those currently end.

We also don't need to keep profiles not pertinent to this module.

This patch also introduces an arguably much simpler way to track the
GUID of a function from the frontend compilation, through ThinLTO, and
into the post-thinlink compilation step, which doesn't rely on keeping
names around. A separate RFC and patches will discuss extending this to
the current PGO (instrumented and sampled) and other consumers as an
infrastructural component.
2024-08-14 18:46:25 -07:00
Mircea Trofin
4a2bf05980 Reapply "[ctx_prof] Fix the pre-thinlink "use" case (#102511)"
This reverts commit 967185eeb85abb77bd6b6cdd2b026d5c54b7d4f3.

The problem was link dependencies, moved `UseCtxProfile` to `Analysis`.
2024-08-08 17:04:00 -07:00
Aiden Grossman
967185eeb8 Revert "[ctx_prof] Fix the pre-thinlink "use" case (#102511)"
This reverts commit 1a6d60e0162b3ef767c87c95512dd453bf4f4746.

Broke some buildbots.
2024-08-08 21:14:56 +00:00
Mircea Trofin
1a6d60e016
[ctx_prof] Fix the pre-thinlink "use" case (#102511)
Didn't notice in #101338 that the instrumentation in `llvm/test/Transforms/PGOProfile/ctx-prof-use-prelink.ll` was actually incorrect.
2024-08-08 16:45:04 -04:00
Nikita Popov
74deadf196
[IRBuilder] Don't include Module.h (NFC) (#97159)
This used to be necessary to fetch the DataLayout, but isn't anymore.
2024-06-29 15:05:04 +02:00
Mircea Trofin
96568f3539
[llvm][ctx_profile] Add instrumentation lowering (#90821)
This adds the instrumentation lowering pass.

(Tracking Issue: #89287, RFC referenced there)
2024-05-08 16:49:08 -07:00
Mircea Trofin
c2d892668b
[llvm][ctx_profile] Add instrumentation (#90136)
This adds instrumenting callsites to PGOInstrumentation, *if* contextual profiling is requested. The latter also enables inserting counters in the entry basic block and disables value profiling (the latter is a point in time change)

This change adds the skeleton of the contextual profiling lowering pass, just so we can introduce the flag controlling that and the API to check that. The actual lowering pass will be introduced in a subsequent patch.

(Tracking Issue: #89287, RFC referenced there)
2024-05-01 14:47:49 -07:00