**But rdtscp is serializing!** No, it's not. Quoting the Intel Instruction Set Reference:

> The RDTSCP instruction is not a serializing instruction, but it does wait until all previous instructions have executed and all previous loads are globally visible. But it does not wait for previous stores to be globally visible, and subsequent instructions may begin execution before the read operation is performed.

> The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed.

So, the difference is in waiting for prior instructions to finish executing. Notice that even in the rdtscp case, execution of the following instructions may commence before the time measurement is finished, and data stores may still be pending.

But, you may say, Intel in its "How to Benchmark Code Execution Times" document shows that using rdtscp is superior to rdtsc. Well, not exactly. What they do show is that when a *single function* is considered, there are ways to measure its execution time with little to no error. This is not what Tracy is doing. In our case there is no way to determine absolute "this is before" and "this is after" points of a zone, as we are probably already inside another zone. Stopping the CPU execution so that a deeply nested zone may be measured with great precision would skew the measurements of all parent zones, and that is not what we want to measure anyway. We are not interested in how a *single function* behaves, but in how a *whole program* behaves. The out-of-order CPU behavior may influence the measurements? Good! We are interested in that. We want to see *how* the code is really executed. How is *stopping* the CPU to make a timer read an appropriate thing to do, when we want to see how a program is performing?

At least that's the theory. And besides all that, the profiling overhead is now reduced.
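To make the distinction concrete, here is a minimal sketch (assuming GCC or Clang on x86-64, using the `__rdtsc`/`__rdtscp` intrinsics from `<x86intrin.h>`) that reads the timestamp counter both ways. This is illustrative only and is not Tracy's actual timer code.

```cpp
// Minimal sketch: reading the TSC with rdtsc vs rdtscp (GCC/Clang, x86-64).
#include <cstdint>
#include <cstdio>
#include <x86intrin.h>

int main()
{
    // Plain rdtsc: no ordering guarantees at all; the CPU may reorder it
    // freely with surrounding instructions.
    uint64_t t0 = __rdtsc();

    // rdtscp: waits for prior instructions to execute and prior loads to
    // become globally visible, but does NOT wait for prior stores, and
    // later instructions may still start before the read completes.
    unsigned int aux;   // receives IA32_TSC_AUX (typically identifies the core)
    uint64_t t1 = __rdtscp( &aux );

    std::printf( "rdtsc=%llu rdtscp=%llu aux=%u\n",
                 (unsigned long long)t0, (unsigned long long)t1, aux );
}
```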
# Tracy Profiler
Tracy is a real-time, nanosecond-resolution frame profiler that can be used for remote or embedded telemetry of your application. It can profile CPU (C, C++11, Lua), GPU (OpenGL, Vulkan) and memory. It can also display locks held by threads and their interactions with each other.
The following compilers are supported:
- MSVC
- gcc
- clang
The following platforms are confirmed to be working (this is not a complete list):
- Windows (x86, x64)
- Linux (x86, x64, ARM, ARM64)
- Android (ARM, x86)
- FreeBSD (x64)
- Cygwin (x64)
- WSL (x64)
- OSX (x64)
- Introduction to Tracy Profiler v0.2
- New features in Tracy Profiler v0.3
- New features in Tracy Profiler v0.4
- New features in Tracy Profiler v0.5
## High-level overview
Tracy is split into client and server sides. The client side collects events using a high-efficiency queue and awaits an incoming connection. The server side connects to the client and receives the collected data, which is then reconstructed into a viewable timeline. The transfer is performed over a TCP connection.
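For illustration, the shape of that client-side pipeline might look roughly like the sketch below. Tracy's real queue is a lock-free, high-efficiency structure; this version uses a mutex-guarded deque purely to show the producer/consumer relationship, and all names here (`Event`, `EventQueue`) are made up for the example.

```cpp
// Illustrative only: a simplified model of the client-side event flow.
#include <cstdint>
#include <deque>
#include <mutex>

struct Event
{
    uint64_t timestamp;   // raw timer reading (e.g. rdtsc)
    uint32_t zoneId;      // which zone this event belongs to
    bool     begin;       // zone start or zone end
};

class EventQueue
{
public:
    // Called from instrumented application threads (the producer side).
    void Push( const Event& ev )
    {
        std::lock_guard<std::mutex> lock( m_mutex );
        m_events.push_back( ev );
    }

    // Drained by a background thread that ships data to the server over TCP.
    bool Pop( Event& out )
    {
        std::lock_guard<std::mutex> lock( m_mutex );
        if( m_events.empty() ) return false;
        out = m_events.front();
        m_events.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::deque<Event> m_events;
};
```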
## Performance impact
To check how much slowdown is introduced by using Tracy, I have profiled etcpak, the fastest ETC texture compression utility there is. I used an 8192×8192 test image as input data and instrumented everything down to the 4×4 pixel block compression function (that's 4 million blocks to compress). It should be noted that Tracy needs to calibrate its internal timers at each run. This introduces a delay of 115 ms (on my machine), which is negligible during lengthy profiling runs, but it skews the etcpak timing results. The following times have this delay subtracted, to focus on the zone-collection impact, which is what really matters here.
Scenario | Zones | Clean run | Profiling run | Difference |
---|---|---|---|---|
Compression of an image to ETC1 format | 4194568 | 0.94 s | 1.003 s | +0.063 s |
Compression of an image to ETC2 format, with mip-maps | 5592822 | 1.034 s | 1.119 s | +0.085 s |
In both scenarios the per-zone time cost is about 15 ns (e.g. 0.063 s ÷ 4,194,568 zones ≈ 15 ns). This is in line with the measured 8 ns single-event collection time, as each zone has to report both a start and an end event.
## Usage instructions
The user manual for Tracy is available at the following address. It provides information about the integration process, required code markup, and so on.
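As a quick taste of the markup before reading the manual: `ZoneScoped`, `ZoneScopedN` and `FrameMark` are Tracy's public macros, and the client becomes active when the application is built with `TRACY_ENABLE` defined and `TracyClient.cpp` compiled in. The surrounding functions below are a made-up example, not part of Tracy.

```cpp
#include "Tracy.hpp"   // from the repository root

void CompressBlock()
{
    ZoneScoped;                   // measures this function as a zone
    // ... actual work would go here ...
}

void RenderFrame()
{
    ZoneScopedN( "RenderFrame" ); // a zone with an explicit name
    CompressBlock();              // nested zones appear as children
}

int main()
{
    for( int i=0; i<100; i++ )
    {
        RenderFrame();
        FrameMark;                // marks the end of a frame on the timeline
    }
}
```

Without `TRACY_ENABLE` defined, all of these macros compile down to nothing, so the markup can stay in release builds at zero cost.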