Both Windows and Linux use 32-bit thread identifiers. MacOS has a 64-bit
counter, but in practice it will never overflow during profiling and no false
aliasing will happen.
These changes are only done client-side and in the network protocol. The
server still uses 64-bit thread identifiers, to enable virtual threads, etc.
This removes dependency on unistd.h header (required for syscall() on
linux), which includes a definition of getopt(), which may conflict with
a custom getopt implementation (for example, one that does work on
windows).
This saves us a non-inlineable function call. Thread local block is
accessed anyway, since we need to get the token, so we already have the
pointer and don't need to get it a second time (which is done inside
Windows' GetCurrentThreadId()). We also don't need to store the thread
id in ScopedZone anymore, as it was a micro-optimization to save us the
second GetThreadHandle() call.
This change has a measurable effect of reducing enqueue time from ~10 to
~8 ns.
A further optimization would be to completely skip thread handle
retrieval during zone capture and do it instead on retrieval of data
from the queue. Since each thread has its own producer ("token"), the
thread handle should be accessible during the dequeue operation. This is
a much more invasive change, that would require a) modification of the
queue, b) additional processing of dequeued data to inject the thread
handle.
_WINDOWS_ is the macro defined by the windows.h header guard,
checking it whether the symbols have already been included before
forward declaring our own.
It was working before, because there was _GNU_SOURCE define injection.
Without this macro defined pthread_[gs]etname_np() functions are not
exposed in the API.