On Windows there is no way to distinguish callstack data originating
from random sampling from callstack data produced by context switches.
Each callstack's timestamp has to be matched against the context switch
data in order to decide its origin, which is a non-trivial process.
On some other platforms the origin information may be available right
away. In that case the matching against the context switch data, which
may involve postponing callstacks for later processing, can be omitted
entirely.
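The following is a minimal sketch of how such timestamp matching could
work. All type and member names here are hypothetical illustrations,
not Tracy's actual implementation: context switch events are assumed to
arrive in timestamp order, and a callstack can only be classified once
the context switch stream has caught up to its timestamp; until then it
has to be postponed.

```cpp
#include <cstdint>
#include <deque>

struct ContextSwitch { int64_t time; uint32_t thread; };
struct Callstack     { int64_t time; uint32_t thread; /* frames... */ };

enum class Origin { Sampling, ContextSwitch, Unknown };

class CallstackMatcher
{
public:
    // Context switch events are assumed to arrive in timestamp order.
    void AddContextSwitch( const ContextSwitch& cs )
    {
        m_ctxSwitches.push_back( cs );
        m_lastCtxTime = cs.time;
    }

    // Returns Unknown when the context switch stream has not yet
    // caught up to the callstack's timestamp; the caller must then
    // postpone the callstack and retry once more data has arrived.
    Origin Classify( const Callstack& cs ) const
    {
        if( cs.time > m_lastCtxTime ) return Origin::Unknown;
        for( const auto& v : m_ctxSwitches )
        {
            if( v.time == cs.time && v.thread == cs.thread )
            {
                return Origin::ContextSwitch;
            }
        }
        return Origin::Sampling;
    }

private:
    // A real implementation would prune entries older than any
    // callstack that can still arrive.
    std::deque<ContextSwitch> m_ctxSwitches;
    int64_t m_lastCtxTime = -1;
};
```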
Both Windows and Linux use 32-bit thread identifiers. macOS has a
64-bit counter, but in practice it will never overflow during a
profiling session, so no false aliasing will occur.
These changes are made only on the client and in the network protocol.
The server still uses 64-bit thread identifiers, which enables features
such as virtual threads.
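A sketch of the identifier widths at each boundary, assuming a simple
truncate-on-send, widen-on-receive scheme (the function names are
illustrative, not part of Tracy's code):

```cpp
#include <cstdint>

// Client side / network protocol: thread identifiers are 32 bits.
// On Windows and Linux the kernel id already fits; on macOS the
// 64-bit counter is truncated, relying on it never overflowing
// within a single profiling session.
uint32_t WireThreadId( uint64_t osThreadId )
{
    return static_cast<uint32_t>( osThreadId );
}

// Server side: identifiers are widened back to 64 bits, leaving the
// upper range free for synthetic entries such as virtual threads.
uint64_t ServerThreadId( uint32_t wireThreadId )
{
    return static_cast<uint64_t>( wireThreadId );
}
```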
Manual control over the profiler lifetime is extremely useful for
ecosystems such as Rust. There are a couple of reasons why:
1. Rust strongly advises against relying on life before/after main
   (code that runs before `main` starts or after it returns), as it
   is difficult to reason about. Most users working in Rust will be
   quite surprised when encountering this concept.
2. Rust and its package manager make it easy to use packages (crates)
   and somewhat less straightforward to consider the implications of
   including a dependency.
   In the case of the `rust_tracy_client` set of packages, I
   currently have to warn throughout the documentation that merely
   adding a dependency on the bindings package is enough to
   potentially broadcast a lot of information about the instrumented
   binary to the broader world. This seems like a major footgun,
   given how easy it is to forget about having added this dependency.
The ability to manually manage the lifetime of the profiler would be a
great solution to both of these problems.
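As a rough illustration, an explicit lifetime API could look like the
sketch below. The function names and stub bodies are hypothetical,
chosen only to show the intended usage pattern: no profiler state
exists and nothing is broadcast until the application opts in.

```cpp
#include <cstdio>

namespace tracy
{
    // Stub bodies standing in for constructing and destroying the
    // profiler; a real implementation would manage the singleton,
    // the worker thread, and the network announcement here.
    void StartupProfiler()  { std::puts( "profiler started" ); }
    void ShutdownProfiler() { std::puts( "profiler stopped" ); }
}

int main()
{
    // Nothing runs before main and nothing is broadcast yet, even
    // if an instrumented dependency is linked in.
    tracy::StartupProfiler();

    // ... run the instrumented workload ...

    // After this point the profiler no longer announces itself on
    // the network.
    tracy::ShutdownProfiler();
}
```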