13 Commits

Author SHA1 Message Date
Mircea Trofin
e7aed23d32
[ctxprof] Handle instrumenting functions with musttail calls (#135121)
Functions with `musttail` calls can't be roots because we can't instrument their `ret` to release the context. This patch tags their `CtxRoot` field in their `FunctionData`. In compiler-rt we then know not to allow such functions become roots, and also not confuse `CtxRoot == 0x1` with there being a context root.

Currently we also lose the context tree under such cases. We can, in a subsequent patch, have the root detector search past these functions.
2025-04-14 10:01:25 -07:00
Mircea Trofin
b2dea4fd22
[ctxprof] root autodetection mechanism (#133147)
This is an optional mechanism that automatically detects roots. It's a best-effort mechanism, and its main goal is to *avoid* pointing at the message pump function as a root. This is the function that polls message queue(s) in an infinite loop, and is thus a bad root (it never exits).

High-level, when collection is requested - which should happen when a server has already been set up and handing requests - we spend a bit of time sampling all the server's threads. Each sample is a stack which we insert in a `PerThreadCallsiteTrie`. After a while, we run for each `PerThreadCallsiteTrie` the root detection logic. We then traverse all the `FunctionData`, find the ones matching the detected roots, and allocate a `ContextRoot` for them. From here, we special case `FunctionData` objects, in `__llvm_ctx_profile_get_context, that have a `CtxRoot` and route them to `__llvm_ctx_profile_start_context`.

For this to work, on the llvm side, we need to have all functions call `__llvm_ctx_profile_release_context` because they _might_ be roots. This comes at a slight (percentages) penalty during collection - which we can afford since the overall technique is ~5x faster than normal instrumentation. We can later explore conditionally enabling autoroot detection and avoiding this penalty, if desired. 

Note that functions that `musttail call` can't have their return instrumented this way, and a subsequent patch will harden the mechanism against this case.

The mechanism could be used in combination with explicit root specification, too.
2025-04-08 06:59:38 -07:00
Mircea Trofin
dd191d3d4f
[ctxprof][nfc] Share the definition of FunctionData between compiler-rt and llvm (#132136)
Mechanism to keep the compiler-rt and llvm view of `FunctionData` in sync. Since CtxInstrContextNode.h is exactly the same on both sides (there's an existing test, `compiler-rt/test/ctx_profile/TestCases/check-same-ctx-node.test`, checking that), we capture the structure in a macro that is then generated as `struct` fields on the compiler-rt side, and as `Type` objects on the llvm side. The macro needs to be told how to render a few kinds of fields. 

If we add more fields to FunctionData that can be described by the current known types of fields, then the llvm side would automatically be updated. If we need to add more kinds of fields, which we do by adding parameters to the macro, the llvm side (if not updated) would trigger a compilation error.
2025-03-20 12:48:18 -07:00
Mircea Trofin
0668bb28cc
[ctxprof] Track unhandled call targets (#131417)
Collect profiles for functions we encounter when collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{memset|copy|move|set}, which are currently inserted after profile instrumentation.

Collecting a "regular" flat profile in these cases would hide the problem - that we loose better profile opportunities.
2025-03-19 13:51:22 -07:00
Mircea Trofin
b034905c82
[ctxprof] Capture sampling info for context roots (#131201)
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare profiles between contextual profiles, and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as multiplier for the counter values collected under that root.

We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:

* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles
* the information itself may be interesting as an indication of relative sampling of various contexts.
2025-03-14 21:10:22 -07:00
Mircea Trofin
07d86d25c9
[ctxprof] Flat profile collection (#130655)
Collect flat profiles. We only do this for function activations that aren't otherwise collectible under a context root are encountered. 

This allows us to reason about the full profile without concerning ourselves wether we are double-counting. For example we can combine (during profile use) flattened contextual profiles with flat profiles.
2025-03-12 07:47:58 -07:00
Mircea Trofin
5223ddd83f
[ctxprof] Prepare profile format for flat profiles (#129626)
The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches.

The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
2025-03-05 07:22:35 -08:00
Mircea Trofin
1b46db7776
[ctxprof] ProfileWriter abstraction (#129590)
Introduce a `ProfileWriter` abstraction to replace the callback passed to `__llvm_ctx_profile_fetch`. Subsequent changes will add support for flat profile collection (as in, collection of non-contextual profile for those functions not under a contextual root), which require also a change in the profile format. The abstraction makes it easy to add "write flat" - related capabilities without constantly complicating the signature of `__llvm_ctx_profile_fetch`.
2025-03-04 12:41:16 -08:00
c8ef
59770a4382
[NFC] Correct imprecise file location in the comment. (#115630) 2024-11-10 15:23:58 +08:00
c8ef
b57b3f6425
[NFC] Simple typo correction. (#114548) 2024-11-02 00:40:57 +08:00
Mircea Trofin
f32e5bdcef
[NFC] Rename the Nr abbreviation to Num (#107151)
It's more clear. (This isn't exhaustive).
2024-09-05 12:34:47 -07:00
Mircea Trofin
0fd017ce43 [nfc][ctx_profile] Move CtxInstrContextNode.h in include
Follow-up from PR #91669.
2024-05-09 17:30:46 -07:00
Mircea Trofin
e4763ca83b
[ctx_profile] Pull ContextNode in a .inc file (#91669)
This pulls out `ContextNode` as we need to use it pretty much as-is to implement a writer. The writer will be implemented on the LLVM side because it takes a dependency on BitStreamWriter.

Since we can't reuse a header between compiler-rt and llvm, we use a header file which is copied on both sides, and test that the 2 copies are identical.

The changes adds the necessary other stuff for compiler-rt/ctx_profile testing.
2024-05-09 16:58:40 -07:00