There might have been new modules loaded by another thread between the `SymInitialize` and `EnumProcessModules` calls.
Since we register the enumerated modules in the cache, we need to make sure that symbols for these modules are loaded.
The only way to do that is to call `SymLoadModuleEx`, just like we do when finding new modules after `InitCallstack`.
gcc error:
public/tracy/../client/TracyScoped.hpp:102:9: error: ‘___tracy_scoped_zone.tracy::ScopedZone::m_connectionId’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
if( GetProfiler().ConnectionId() != m_connectionId ) return;
^~
assert() compiles to nothing in release configurations, while abort() is marked [[noreturn]] and is always available.
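A minimal sketch of the pattern behind this kind of warning. The names `Kind` and `SizeOf` are hypothetical and not taken from the Tracy sources. With `assert( false )` in the default branch, a release build would leave a path on which `size` is returned without being set; `std::abort()` is `[[noreturn]]`, so the compiler can see that no such path exists.

```cpp
#include <cstdlib>

// Hypothetical enum, for illustration only.
enum class Kind { Small, Large };

int SizeOf( Kind kind )
{
    int size;
    switch( kind )
    {
    case Kind::Small:
        size = 1;
        break;
    case Kind::Large:
        size = 8;
        break;
    default:
        // assert( false ) would vanish under NDEBUG and let control fall
        // through to the return with 'size' unset. abort() never returns.
        std::abort();
    }
    return size;
}
```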
gcc error:
error: ‘type’ may be used uninitialized in this function [-Werror=maybe-uninitialized]:
public/tracy/../client/../common/TracyAlign.hpp: In function ‘void tracy::SysTraceWorker(void*)’:
public/tracy/../client/../common/TracyAlign.hpp:22:11: error: ‘type’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
memcpy( ptr, &val, sizeof( T ) );
~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from public/TracyClient.cpp:26,
from X.cpp:
public/client/TracySysTrace.cpp:1258:35: note: ‘type’ was declared here
QueueType type;
^~~~
Red and blue channels were mislabeled. Otherwise, encoding and decoding were
performed correctly, as long as the user follows the color channel order
described in the manual.
No change to the binary protocol was made.
These functions are only defined when -DTRACY_FIBERS is set. However,
they are declared regardless of whether it is set, which seems
like it could lead to obscure linking errors. I haven’t encountered any
of these specifically, but in my case, this distinction makes it more
difficult to produce correctly auto-generated bindings.
This fixes usage with TRACY_HAS_CALLSTACK undefined, allowing compilation of
otherwise unused functions, which are already protected from being called
through macro redirections.
See https://github.com/wolfpld/tracy/pull/492 for more information.
Provide a custom no-op implementation of dlclose(), in order to prevent shared
object data from disappearing from profiler view. The server makes queries for
program executable code, which has to be always available, otherwise wrong
data may be provided, or the program may crash, due to referencing no longer
mapped memory.
The dlclose() documentation states that the function internally decreases the
reference count, and only unloads the shared object when the count reaches
zero. There is no guarantee that the shared object data will be unloaded
immediately after any dlclose call originating from the program. This function
override exploits this fact.
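A minimal sketch of such an override, assuming a dynamically linked executable in which a definition with the same name and C linkage takes precedence over the libc one. This is an illustration of the idea, not necessarily the exact code used in the profiler.

```cpp
// No-op replacement for dlclose(). The handle is ignored, so the shared
// object stays mapped and its code remains readable for later queries.
// Returning 0 reports success, which callers cannot distinguish from a
// real dlclose() that merely decremented a non-zero reference count.
extern "C" int dlclose( void* /*handle*/ )
{
    return 0;
}
```

The trade-off is that shared objects are never unloaded for the lifetime of the process, which is acceptable only because the documented behavior never promised an immediate unload.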
Should the symbol thread crash, mark that it is gone. This will allow the
profiler to transmit the crash call stack, including resolved symbol names and
locations (which will resolve on the main profiler thread).
DecodeCallstackPtrFast() may be called outside the symbol processing thread,
for example in the crash handler. Using the less-capable dladdr functionality
doesn't have a big impact here. Callstack decoding in this context is used to
remove the uninteresting top part of the callstack, so that the callstack ends
at the crashing function, and not in the crash handler. Even if this
functionality were impacted by this change, the damage would be close to
none.
The alternative is to take a lock each time libbacktrace is used, which
does not seem worthwhile, considering that the problem only occurs in a
very rare code path.
NB everything was working when it was first implemented, because back then the
callstack decoding was still performed on the main thread, and not on a
separate, dedicated one.
The profiled app might install handlers to track crashes, write minidumps,
etc. - this patch makes sure the app's exception handler is called when
a crash happens while profiling with Tracy.
Since commit 940f32c1a8, building the Tracy
library on Linux with a GCC version older than 11 results in compile errors
due to redefinitions of __get_cpuid_max, __get_cpuid and
__get_cpuid_count.
This is because, prior to GCC 11, the cpuid.h header file did not have any
include guards, and thus including this header more than once would
produce the above-mentioned errors.
To work around this issue, including cpuid.h has been wrapped into a
custom header file that itself uses include guards and thus shields
cpuid.h from being included multiple times.
Fixes #452
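A sketch of what such a wrapper header might look like. The guard macro `EXAMPLE_CPUID_WRAPPER_HPP` and the helper `MaxBasicCpuidLeaf` are hypothetical names chosen for illustration; the architecture check is an added assumption so that the sketch also compiles on targets without cpuid.h.

```cpp
// Custom include guard: however many times this wrapper is included,
// <cpuid.h> itself is pulled in at most once per translation unit.
#ifndef EXAMPLE_CPUID_WRAPPER_HPP
#define EXAMPLE_CPUID_WRAPPER_HPP

#if ( defined( __GNUC__ ) || defined( __clang__ ) ) && \
    ( defined( __i386__ ) || defined( __x86_64__ ) )
#  include <cpuid.h>
#  define EXAMPLE_HAS_CPUID 1
#endif

// Hypothetical helper: highest supported basic CPUID leaf, or 0 when
// cpuid.h is not available on this target.
inline unsigned int MaxBasicCpuidLeaf()
{
#ifdef EXAMPLE_HAS_CPUID
    return __get_cpuid_max( 0, nullptr );
#else
    return 0;
#endif
}

#endif
```

All other files would then include the wrapper instead of including cpuid.h directly.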
Previously a bitmap of buffers was repeatedly scanned to see which buffers
still contain data. This process was needlessly wasting cycles (seen as a
hotspot when profiled) and, worse yet, the workload increased with the number
of CPU cores (and thus of buffers) to handle.
The new implementation instead maintains a list of buffer indices that have to
be handled. This list does not contain empty buffers, so each loop iteration
performs some work, instead of just spinning in search for buffers to handle.
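The idea can be sketched as follows. This is a simplified illustration, not the actual implementation: `Buffer` is a hypothetical stand-in for a per-CPU ring buffer, and "handling" an item is reduced to removing it.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a per-CPU ring buffer.
using Buffer = std::deque<int>;

// Drains all buffers round-robin and returns the number of items handled.
std::size_t DrainBuffers( std::vector<Buffer>& buffers )
{
    // Collect indices of non-empty buffers once, up front.
    std::vector<std::size_t> active;
    for( std::size_t i = 0; i < buffers.size(); i++ )
    {
        if( !buffers[i].empty() ) active.push_back( i );
    }

    std::size_t handled = 0;
    while( !active.empty() )
    {
        for( std::size_t pos = 0; pos < active.size(); )
        {
            Buffer& buf = buffers[active[pos]];
            buf.pop_front();    // every visit does real work
            handled++;
            if( buf.empty() )
            {
                // Swap-remove the exhausted buffer. The slot now holds a
                // different index, so 'pos' is not advanced.
                active[pos] = active.back();
                active.pop_back();
            }
            else
            {
                pos++;
            }
        }
    }
    return handled;
}
```

The cost of each pass is proportional to the number of buffers that still hold data, not to the total number of buffers.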
Initializing structures for callstack processing (building memory map of the
process, gathering kernel symbols, etc) takes some time, which in some cases
may be significant.
Callstack queries are now handled on a separate thread. In such a setup it no
longer makes sense to block main thread execution with this lengthy init
process.
The entire heavy initialization phase has now been moved to this separate
processing thread. Some initial callstack queries may now not produce
responses as promptly as before, but this is only because the main thread is
able to start working earlier.
Some parts of the initialization process may be critical to do in the main
thread, for example because the function responsible for gathering callstacks
must be loaded first. This is still done on the main thread, in a new function
InitCallstackCritical().
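The resulting split can be sketched as below. Only the function names `InitCallstackCritical` and `InitCallstack` come from the text; their bodies, the two flags and `StartSymbolWorker` are hypothetical placeholders.

```cpp
#include <atomic>
#include <thread>

// Hypothetical flags standing in for the real initialization state.
static std::atomic<bool> s_criticalReady { false };
static std::atomic<bool> s_heavyReady { false };

// Placeholder: the small part that must run before any callstack can be
// captured (for example, loading the capture function).
void InitCallstackCritical()
{
    s_criticalReady.store( true, std::memory_order_release );
}

// Placeholder: the slow part (memory map, kernel symbols, and so on).
void InitCallstack()
{
    s_heavyReady.store( true, std::memory_order_release );
}

// Runs the critical part on the calling (main) thread, then hands the
// heavy part to a dedicated worker so the caller can continue at once.
std::thread StartSymbolWorker()
{
    InitCallstackCritical();
    return std::thread( [] {
        InitCallstack();
        // A real worker would go on to service callstack queries here.
    } );
}
```

Queries that arrive before the worker finishes `InitCallstack()` simply wait in its queue, which is why early responses may be delayed.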