Frame profiler
Bartosz Taudul a56c47a6a0 Store thread handle in a thread local variable.
This saves us a non-inlineable function call. Thread local block is
accessed anyway, since we need to get the token, so we already have the
pointer and don't need to get it a second time (which is done inside
Windows' GetCurrentThreadId()). We also don't need to store the thread
id in ScopedZone anymore, as it was a micro-optimization to save us the
second GetThreadHandle() call.

This change has a measurable effect of reducing enqueue time from ~10 to
~8 ns.

A further optimization would be to completely skip thread handle
retrieval during zone capture and do it instead on retrieval of data
from the queue. Since each thread has its own producer ("token"), the
thread handle should be accessible during the dequeue operation. This is
a much more invasive change, that would require a) modification of the
queue, b) additional processing of dequeued data to inject the thread
handle.
2019-06-24 19:19:47 +02:00
capture Extract text printing functions. 2019-06-18 20:43:28 +02:00
client Store thread handle in a thread local variable. 2019-06-24 19:19:47 +02:00
common Store thread handle in a thread local variable. 2019-06-24 19:19:47 +02:00
doc Add histogram, compare screenshots to README. 2019-01-15 19:55:41 +01:00
extra X11 colors conversion program. 2018-07-04 18:26:57 +02:00
icon Add icon files. 2019-06-02 18:05:49 +02:00
imgui Update imgui to 1.71. 2019-06-12 22:53:23 +02:00
imguicolortextedit Update ImGuiColorTextEdit to 0a88824f7de8d. 2019-06-22 14:19:10 +02:00
libbacktrace Fixed compiler warnings. 2019-02-20 17:50:49 +01:00
manual Update manual. 2019-06-23 00:21:56 +02:00
nfd All variables must be defined before goto. 2019-06-23 00:36:25 +02:00
profiler Can't read negative number of bytes. 2019-06-22 14:08:48 +02:00
server Use proper popcnt for gcc/clang (including cygwin). 2019-06-24 18:56:04 +02:00
test Add frame images to test application. 2019-06-13 01:53:47 +02:00
update Wait for source location zones in update tool. 2019-03-13 01:28:42 +01:00
.appveyor.yml Build test application on appveyor. 2019-06-19 22:17:11 +02:00
.gitignore Add icon to win32 profiler executable. 2019-06-02 18:05:49 +02:00
AUTHORS Briefly describe contributor's work. 2019-03-09 12:36:54 +01:00
FAQ.md Mention on-demand mode in FAQ. 2018-07-12 13:32:49 +02:00
LICENSE Update year in copyright notice. 2018-12-30 17:51:17 +01:00
NEWS Update NEWS. 2019-06-22 14:25:35 +02:00
README.md Update README. 2019-02-19 20:46:17 +01:00
Tracy.hpp Frame image may need flipping. 2019-06-12 15:28:32 +02:00
TracyC.h Use language neutral header for callstack capability detection. 2019-01-27 13:41:32 +01:00
TracyClient.cpp Add ETC1 compressor. 2019-06-07 00:31:51 +02:00
TracyClientDLL.cpp Clean up imported functions in multi-dll projects. 2019-06-07 19:50:08 +03:00
TracyLua.hpp Clean up imported functions in multi-dll projects. 2019-06-07 19:50:08 +03:00
TracyOpenGL.hpp Support GL_EXT_disjoint_timer_query with EXT postfix. 2019-06-18 16:34:27 +02:00
TracyVulkan.hpp Hide rest of statics. 2019-02-19 19:33:37 +01:00

Tracy Profiler

Tracy is a real-time, nanosecond-resolution frame profiler that can be used for remote or embedded telemetry of your application. It can profile CPU (C, C++11, Lua), GPU (OpenGL, Vulkan) and memory. It can also display locks held by threads and their interactions with each other.

The following compilers are supported:

  • MSVC
  • gcc
  • clang

The following platforms are confirmed to be working (this is not a complete list):

  • Windows (x86, x64)
  • Linux (x86, x64, ARM, ARM64)
  • Android (ARM, x86)
  • FreeBSD (x64)
  • Cygwin (x64)
  • WSL (x64)
  • OSX (x64)

Introduction to Tracy Profiler v0.2
New features in Tracy Profiler v0.3
New features in Tracy Profiler v0.4

A quick FAQ.
List of changes.

High-level overview

Tracy is split into a client side and a server side. The client collects events using a high-efficiency queue and waits for an incoming connection. The server connects to the client and receives the collected data, which is then reconstructed into a viewable timeline. The transfer is performed over a TCP connection.
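The client-side half of this split can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `Event`, `EventQueue` and `ScopedEvent` are hypothetical names, the queue is guarded by a plain mutex rather than being the high-efficiency queue Tracy actually uses, and the network transfer is left out entirely. It only shows the shape of the idea: scoped markup enqueues a begin and an end event, and a separate consumer drains whatever has been collected.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical event record; not Tracy's real data layout.
enum class EventType : std::uint8_t { ZoneBegin, ZoneEnd };

struct Event
{
    EventType type;
    std::int64_t timeNs;
};

// Stand-in for the client-side queue. A mutex keeps the sketch short;
// it is NOT the high-efficiency queue the real client uses.
class EventQueue
{
public:
    // Producer side: timestamp the event and store it.
    void Enqueue(EventType type)
    {
        const auto now = std::chrono::steady_clock::now().time_since_epoch();
        const auto ns = static_cast<std::int64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
        std::lock_guard<std::mutex> lock(m_mutex);
        m_events.push_back(Event{type, ns});
    }

    // Consumer side: hand over everything collected so far and leave the queue empty.
    // In a real client this is where data would be sent to the server.
    std::vector<Event> Drain()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<Event> out;
        out.swap(m_events);
        return out;
    }

private:
    std::mutex m_mutex;
    std::vector<Event> m_events;
};

// RAII helper (hypothetical): reports a begin event on construction
// and an end event on destruction.
class ScopedEvent
{
public:
    explicit ScopedEvent(EventQueue& queue) : m_queue(queue)
    {
        m_queue.Enqueue(EventType::ZoneBegin);
    }
    ~ScopedEvent() { m_queue.Enqueue(EventType::ZoneEnd); }

    ScopedEvent(const ScopedEvent&) = delete;
    ScopedEvent& operator=(const ScopedEvent&) = delete;

private:
    EventQueue& m_queue;
};
```

Wrapping a block of code in a `ScopedEvent` leaves two entries in the queue, a begin and an end, which a consumer can later pair up to reconstruct the duration of that block on a timeline.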

Performance impact

To check how much slowdown is introduced by using Tracy, I profiled etcpak, which is the fastest ETC texture compression utility there is. I used an 8192×8192 test image as input data and instrumented everything down to the 4×4 pixel block compression function (that's 4 million blocks to compress). It should be noted that Tracy needs to calibrate its internal timers at each run. This introduces a delay of 115 ms (on my machine), which is negligible for lengthy profiling runs, but skews the results of etcpak timing. The following times have this delay subtracted, to focus on the impact of zone collection, which is what really matters here.

| Scenario | Zones | Clean run | Profiling run | Difference |
|---|---|---|---|---|
| Compression of an image to ETC1 format | 4194568 | 0.94 s | 1.003 s | +0.063 s |
| Compression of an image to ETC2 format, with mip-maps | 5592822 | 1.034 s | 1.119 s | +0.085 s |

In both scenarios the per-zone time cost is ~15 ns. This is in line with the measured 8 ns single-event collection time, since each zone has to report both a start and an end event.
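The per-zone figure follows directly from the measurements: divide the extra run time by the number of zones. A minimal sketch of that arithmetic, where the function name `PerZoneOverheadNs` is a hypothetical helper and not part of Tracy:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-zone overhead in nanoseconds: the extra wall time of the profiled run
// divided by the number of zones that were collected.
double PerZoneOverheadNs(double cleanSeconds, double profiledSeconds, std::uint64_t zones)
{
    return (profiledSeconds - cleanSeconds) * 1e9 / static_cast<double>(zones);
}
```

With the ETC1 measurements (0.94 s, 1.003 s, 4194568 zones) this gives roughly 15.0 ns per zone, and with the ETC2 measurements (1.034 s, 1.119 s, 5592822 zones) roughly 15.2 ns, close to twice the 8 ns cost of a single event.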

Usage instructions

The user manual for Tracy is available at the following address. It provides information about the integration process, required code markup and so on.

Features

Histogram of function execution times

Comparison of two profiling runs

Marking locks

Plotting data

Message log