The C++11 spec states in [basic.stc.thread] thread storage duration:
2. A variable with thread storage duration shall be initialized before its
first odr-use (3.2) and, if constructed, shall be destroyed on thread exit.
Previously Tracy relied on the TLS data being initialized:
- During thread creation (MSVC).
- Or during first use in a thread, but the initialization was performed for
the whole TLS block.
It seems that new compilers are more granular with how they perform the
initialization, hence rpmalloc init has to be checked before each allocation,
as it cannot be "folded" into, for example, initialization of the profiler
itself.
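
A minimal sketch of the per-allocation check described above, assuming a
thread-local "ready" flag. All names here (t_allocatorReady,
ThreadAllocatorInit, ProfilerAlloc, ProfilerFree) are hypothetical and are not
taken from Tracy or rpmalloc; std::malloc stands in for the real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical per-thread state; the real allocator keeps its own.
static thread_local bool t_allocatorReady = false;
static thread_local int t_initCount = 0;

// Stand-in for the allocator's per-thread setup routine.
static void ThreadAllocatorInit()
{
    ++t_initCount;
    t_allocatorReady = true;
}

// Cheap check executed on every allocation. It does not assume that some
// other thread-local object was initialized first and did the setup for us.
static inline void EnsureThreadAllocator()
{
    if( !t_allocatorReady ) ThreadAllocatorInit();
}

static void* ProfilerAlloc( std::size_t size )
{
    EnsureThreadAllocator();
    assert( t_allocatorReady );
    return std::malloc( size );   // placeholder for the real allocator call
}

static void ProfilerFree( void* ptr )
{
    std::free( ptr );
}
```

The flag is a plain bool with constant initialization, so reading it does not
itself depend on any dynamic TLS initialization order.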
With gcc/clang the wrapper functions for intrinsics are annoyingly inserted at
the top level of stack traces, making it hard to see the call site. Filter out
all known intrinsic headers.
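
One way such a filter could look, as a sketch only. The header list below
contains real gcc/clang intrinsic header names but is just an example and may
not match the list used by Tracy. The Frame struct and both function names are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Example list only; the actual set of filtered headers may differ.
static const char* const s_intrinsicHeaders[] = {
    "xmmintrin.h", "emmintrin.h", "pmmintrin.h", "tmmintrin.h",
    "smmintrin.h", "avxintrin.h", "immintrin.h", "arm_neon.h"
};

// Returns true if the file name component of `path` is a known intrinsic
// header. A null path (unknown source file) is never treated as a match.
static bool IsIntrinsicHeader( const char* path )
{
    if( !path ) return false;
    const std::size_t len = std::strlen( path );
    for( const char* hdr : s_intrinsicHeaders )
    {
        const std::size_t hlen = std::strlen( hdr );
        if( len < hlen ) continue;
        const char* tail = path + len - hlen;
        if( std::strcmp( tail, hdr ) != 0 ) continue;
        // Require a directory separator so "my_xmmintrin.h" does not match.
        if( tail == path || tail[-1] == '/' || tail[-1] == '\\' ) return true;
    }
    return false;
}

// Hypothetical decoded stack frame.
struct Frame
{
    const char* function;
    const char* file;
};

// Returns the index of the first frame that is not inside an intrinsic
// header, or `count` if every frame is.
static std::size_t SkipIntrinsicFrames( const Frame* frames, std::size_t count )
{
    assert( frames != nullptr || count == 0 );
    std::size_t i = 0;
    while( i < count && IsIntrinsicHeader( frames[i].file ) ) i++;
    return i;
}
```

This sketch only skips leading frames, which is where the wrappers show up.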
Fuck knows how this is supposed to work. perf_event_open() opens the
descriptor successfully, but it produces no samples if precise_ip is not 0.
There are no such problems on ARM (where precise_ip is 3, but maybe it is not
supported at all on that architecture, again, fuck knows), and on AMD
perf_event_open() does not succeed when precise_ip > 0.
Sets m_compare to the index of the matched zone. Matching supports multiple
flags: it can compare function name, source file, line number, or any
combination thereof. When searching for a match, we do 3 passes, stopping
as soon as one of them succeeds.
1. Look for zone with same function name, source file, line number.
2. Look for zone with same function name, source file.
3. Look for zone with same function name.
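
The three passes above can be sketched roughly as follows. This is an
illustration, not the actual implementation: SourceLocation, MatchFlags,
Matches and FindBestMatch are hypothetical names, and the return value
stands in for what would be stored in m_compare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a zone's source location.
struct SourceLocation
{
    std::string function;
    std::string file;
    uint32_t line;
};

// Flags selecting which fields take part in the comparison.
enum MatchFlags : uint8_t
{
    MatchFunction = 1,
    MatchFile     = 2,
    MatchLine     = 4
};

static bool Matches( const SourceLocation& a, const SourceLocation& b, uint8_t flags )
{
    if( ( flags & MatchFunction ) && a.function != b.function ) return false;
    if( ( flags & MatchFile ) && a.file != b.file ) return false;
    if( ( flags & MatchLine ) && a.line != b.line ) return false;
    return true;
}

// Tries the strictest comparison first and relaxes it on each pass.
// Returns the index of the first match of the first successful pass,
// or -1 if no candidate shares even the function name.
static int FindBestMatch( const std::vector<SourceLocation>& candidates, const SourceLocation& wanted )
{
    assert( !wanted.function.empty() );
    static const uint8_t passes[] = {
        MatchFunction | MatchFile | MatchLine,
        MatchFunction | MatchFile,
        MatchFunction
    };
    for( uint8_t flags : passes )
    {
        for( std::size_t i = 0; i < candidates.size(); i++ )
        {
            if( Matches( candidates[i], wanted, flags ) ) return static_cast<int>( i );
        }
    }
    return -1;
}
```

With candidates Update@a.cpp:10, Update@b.cpp:20 and Update@b.cpp:30, a
search for Update@b.cpp:30 picks the third entry in the first pass, while a
search for Update@b.cpp:99 falls through to the second pass and picks the
second entry.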
When comparing traces, where multiple classes share the same zone
names, the behavior prior to this patch was to auto-select the first
matching zone name in the other trace. Instead, find the most correct
zone by using filename and line number.
Original commit a6b25497 by xavier <xavierb@gmail.com>:
add TRACY_CALLSTACK_IGNORE_INLINES to tradeoff speed vs precision in win32 DecodeCallstackPtr()
SymQueryInlineTrace() is too slow in some cases:
300000 queries backlog getting processed at ~70 per second is prohibitive.
(without inlines resolution, it's more like ~20000 queries per second)