Merge pull request #657 from tiago-rodrigues/trodrigues/tracy_libunwind

Add support for using libunwind
Bartosz Taudul 2023-11-13 20:19:06 +01:00 committed by GitHub
commit 2f2f9939db
6 changed files with 51 additions and 6 deletions


@@ -84,6 +84,7 @@ set_option(TRACY_MANUAL_LIFETIME "Enable the manual lifetime management of the p
 set_option(TRACY_FIBERS "Enable fibers support" OFF)
 set_option(TRACY_NO_CRASH_HANDLER "Disable crash handling" OFF)
 set_option(TRACY_TIMER_FALLBACK "Use lower resolution timers" OFF)
+set_option(TRACE_CLIENT_LIBUNWIND_BACKTRACE "Use libunwind backtracing where supported" OFF)
 if(NOT TRACY_STATIC)
     target_compile_definitions(TracyClient PRIVATE TRACY_EXPORTS)


@@ -258,12 +258,16 @@ This is a very OS-specific task. It is split into two parts: getting the call st
 On some platforms a bit of setup work is required. This is done in the \texttt{InitCallstack()} function.
+On Windows, Tracy will attempt to preload symbols at \texttt{InitCallstack()} time. It does this for device drivers and process modules. As this process can be slow when a lot of PDBs are involved, you can set the \texttt{TRACY\_NO\_DBHELP\_INIT\_LOAD} environment variable to "1" to disable this behavior and rely on on-demand symbol loading.
 \subsubsection{Getting the frames}
 Call stack collection is initiated by calling the \texttt{Callstack()} procedure, with maximum stack depth to be collected passed as a parameter. Stack unwinding must be performed in the place in which call stack was queried, as further execution of the application will change the stack contents. The unfortunate part is that the stack unwinding on platforms other than x86 is not a fast operation.
 To perform unwinding various OS functions are used: \texttt{RtlWalkFrameChain()}, \texttt{\_Unwind\_Backtrace()}, \texttt{backtrace()}. A list of returned frame pointers is saved in a buffer, which will be later sent to the server. The maximum unwinding depth limit (63 entries) is due to the specifics of the underlying OS functionality.
+On some platforms you can define \texttt{TRACE\_CLIENT\_LIBUNWIND\_BACKTRACE} to use libunwind to perform callstack captures, as it might be a faster alternative to the default implementation. If you do, you must compile and link your client against libunwind. See \url{https://github.com/libunwind/libunwind} for more details.
 \subsubsection{Decoding stack frames}
 Unlike the always changing call stack, stack frames themselves are immutable pointers to a specific place in the executable code. As such, the decoding process can be performed at any time (even outside of the program execution, as exemplified by debuggers). Frame decoding is only performed when the server asks for the details of a frame (section~\ref{communicationsprotocol}).


@@ -1698,6 +1698,14 @@ logo=\bclampe
 Tracy will prepare for call stack collection regardless of whether you use the functionality or not. In some cases, this may be unwanted or otherwise troublesome for the user. To disable support for collecting call stacks, define the \texttt{TRACY\_NO\_CALLSTACK} macro.
 \end{bclogo}
+\begin{bclogo}[
+noborder=true,
+couleur=black!5,
+logo=\bclampe
+]{libunwind}
+On some platforms you can define \texttt{TRACE\_CLIENT\_LIBUNWIND\_BACKTRACE} to use libunwind to perform callstack captures, as it might be a faster alternative to the default implementation. If you do, you must compile and link your client against libunwind. See \url{https://github.com/libunwind/libunwind} for more details.
+\end{bclogo}
 \subsubsection{Debugging symbols}
 You must compile the profiled application with debugging symbols enabled to have correct call stack information. You can achieve that in the following way:
@@ -1768,6 +1776,8 @@ void DbgHelpUnlock() { ReleaseMutex(dbgHelpLock); }
 }
 \end{lstlisting}
+At initialization time, Tracy will attempt to preload symbols for device drivers and process modules. As this process can be slow when a lot of PDBs are involved, you can set the \texttt{TRACY\_NO\_DBHELP\_INIT\_LOAD} environment variable to "1" to disable this behavior and rely on on-demand symbol loading.
 \paragraph{Disabling resolution of inline frames}
 Inline frames retrieval on Windows can be multiple orders of magnitude slower than just performing essential symbol resolution. This manifests as profiler seemingly being stuck for a long time, having hundreds of thousands of query backlog entries queued, which are slowly trickling down. If your use case requires speed of operation rather than having call stacks with inline frames included, you may define the \texttt{TRACY\_NO\_CALLSTACK\_INLINES} macro, which will make the profiler stick to the basic but fast frame resolution mode.
@@ -2049,7 +2059,7 @@ Tracy will perform an automatic collection of system data without user intervent
 Some profiling data can only be retrieved using the kernel facilities, which are not available to users with normal privilege level. To collect such data, you will need to elevate your rights to the administrator level. You can do so either by running the profiled program from the \texttt{root} account on Unix or through the \emph{Run as administrator} option on Windows\footnote{To make this easier, you can run MSVC with admin privileges, which will be inherited by your program when you start it from within the IDE.}. On Android, you will need to have a rooted device (see section~\ref{androidlunacy} for additional information).
-As this system-level tracing functionality is part of the automated collection process, no user intervention is necessary to enable it (assuming that the program was granted the rights needed). However, if, for some reason, you would want to prevent your application from trying to access kernel data, you may recompile your program with the \texttt{TRACY\_NO\_SYSTEM\_TRACING} define.
+As this system-level tracing functionality is part of the automated collection process, no user intervention is necessary to enable it (assuming that the program was granted the rights needed). However, if, for some reason, you would want to prevent your application from trying to access kernel data, you may recompile your program with the \texttt{TRACY\_NO\_SYSTEM\_TRACING} define. If you want to disable this functionality dynamically at runtime instead, you can set the \texttt{TRACY\_NO\_SYS\_TRACE} environment variable to "1".
 \begin{bclogo}[
 noborder=true,


@@ -157,9 +157,20 @@ void InitCallstack()
     SymInitialize( GetCurrentProcess(), nullptr, true );
     SymSetOptions( SYMOPT_LOAD_LINES );
+    // use TRACY_NO_DBHELP_INIT_LOAD=1 to skip preloading driver and process
+    // module symbols at startup time; they will be loaded on demand later.
+    // Preloading can sometimes take a very long time and prevents resolving
+    // callstack frame symbols until it finishes.
+    const char* noInitLoadEnv = GetEnvVar( "TRACY_NO_DBHELP_INIT_LOAD" );
+    const bool initTimeModuleLoad = !( noInitLoadEnv && noInitLoadEnv[0] == '1' );
+    if ( !initTimeModuleLoad )
+    {
+        TracyDebug("TRACY: skipping init time dbghelper module load\n");
+    }
     DWORD needed;
     LPVOID dev[4096];
-    if( EnumDeviceDrivers( dev, sizeof(dev), &needed ) != 0 )
+    if( initTimeModuleLoad && EnumDeviceDrivers( dev, sizeof(dev), &needed ) != 0 )
     {
         char windir[MAX_PATH];
         if( !GetWindowsDirectoryA( windir, sizeof( windir ) ) ) memcpy( windir, "c:\\windows", 11 );
@@ -214,7 +225,7 @@ void InitCallstack()
     HANDLE proc = GetCurrentProcess();
     HMODULE mod[1024];
-    if( EnumProcessModules( proc, mod, sizeof( mod ), &needed ) != 0 )
+    if( initTimeModuleLoad && EnumProcessModules( proc, mod, sizeof( mod ), &needed ) != 0 )
     {
         const auto sz = needed / sizeof( HMODULE );
         for( size_t i=0; i<sz; i++ )
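
Both runtime switches touched by this change (TRACY_NO_DBHELP_INIT_LOAD here and TRACY_NO_SYS_TRACE in the profiler) are read the same way: the variable counts as set only when its first character is '1'. The following is a minimal sketch of that check, not code from the commit. The helper name EnvFlagSet is hypothetical, and std::getenv stands in for Tracy's GetEnvVar() wrapper.

```cpp
#include <cstdlib>

// Hypothetical helper (not part of Tracy): returns true when the named
// environment variable exists and its first character is '1'.
// std::getenv is an assumed stand-in for Tracy's GetEnvVar().
inline bool EnvFlagSet( const char* name )
{
    const char* value = std::getenv( name );
    // Only the first character is inspected, so "1" and "10" both enable
    // the flag, while "0", "true" and an empty string do not.
    return value && value[0] == '1';
}
```

One consequence of testing only the first character is that values such as "true" or "yes" are silently ignored, so "1" is the value to use.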


@@ -8,10 +8,15 @@
 #if TRACY_HAS_CALLSTACK == 2 || TRACY_HAS_CALLSTACK == 5
 #  include <unwind.h>
 #elif TRACY_HAS_CALLSTACK >= 3
+#  ifdef TRACE_CLIENT_LIBUNWIND_BACKTRACE
+     // libunwind is, in general, significantly faster than execinfo-based backtraces
+#    define UNW_LOCAL_ONLY
+#    include <libunwind.h>
+#  else
-#  include <execinfo.h>
+#    include <execinfo.h>
+#  endif
 #endif
 #ifndef TRACY_HAS_CALLSTACK
 namespace tracy
@@ -127,7 +132,13 @@ static tracy_force_inline void* Callstack( int depth )
     assert( depth >= 1 );
     auto trace = (uintptr_t*)tracy_malloc( ( 1 + (size_t)depth ) * sizeof( uintptr_t ) );
+#ifdef TRACE_CLIENT_LIBUNWIND_BACKTRACE
+    size_t num = unw_backtrace( (void**)(trace+1), depth );
+#else
     const auto num = (size_t)backtrace( (void**)(trace+1), depth );
+#endif
     *trace = num;
     return trace;
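
The buffer layout used by Callstack() can be tried outside of Tracy with a stand-alone sketch like the one below. It is an illustration under stated assumptions rather than the commit's code: the function name CaptureCallstack is hypothetical, std::malloc replaces tracy_malloc, and a null check is added. Without TRACE_CLIENT_LIBUNWIND_BACKTRACE defined it uses backtrace() from execinfo.h (available with glibc). Defining the macro selects unw_backtrace() instead, which requires linking against libunwind.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

#ifdef TRACE_CLIENT_LIBUNWIND_BACKTRACE
#  define UNW_LOCAL_ONLY
#  include <libunwind.h>
#else
#  include <execinfo.h>
#endif

// Hypothetical stand-alone variant of the capture routine.
// Layout of the returned buffer: trace[0] holds the number of captured
// frames, trace[1..trace[0]] hold the return addresses.
// The caller owns the buffer and releases it with std::free().
inline uintptr_t* CaptureCallstack( int depth )
{
    assert( depth >= 1 );
    auto trace = (uintptr_t*)std::malloc( ( 1 + (size_t)depth ) * sizeof( uintptr_t ) );
    if( !trace ) return nullptr;
#ifdef TRACE_CLIENT_LIBUNWIND_BACKTRACE
    // libunwind path: same (buffer, size) signature as backtrace()
    const auto num = (size_t)unw_backtrace( (void**)( trace + 1 ), depth );
#else
    // default path: execinfo backtrace()
    const auto num = (size_t)backtrace( (void**)( trace + 1 ), depth );
#endif
    trace[0] = num;
    return trace;
}
```

Because both functions take the same arguments and return the frame count, only the one call site differs between the two configurations.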


@@ -1439,7 +1439,15 @@ Profiler::Profiler()
 void Profiler::SpawnWorkerThreads()
 {
 #ifdef TRACY_HAS_SYSTEM_TRACING
-    if( SysTraceStart( m_samplingPeriod ) )
+    // use TRACY_NO_SYS_TRACE=1 to force disabling sys tracing (even if available in the underlying system)
+    // as it can have significant impact on the size of the traces
+    const char* noSysTrace = GetEnvVar( "TRACY_NO_SYS_TRACE" );
+    const bool disableSystrace = (noSysTrace && noSysTrace[0] == '1');
+    if( disableSystrace )
+    {
+        TracyDebug("TRACY: Sys Trace was disabled by 'TRACY_NO_SYS_TRACE=1'\n");
+    }
+    else if( SysTraceStart( m_samplingPeriod ) )
     {
         s_sysTraceThread = (Thread*)tracy_malloc( sizeof( Thread ) );
         new(s_sysTraceThread) Thread( SysTraceWorker, nullptr );