From 699ff43f1e1df99ee629812b534ae6c8f12fb60f Mon Sep 17 00:00:00 2001
From: Bartosz Taudul <wolf.pld@gmail.com>
Date: Sun, 20 Oct 2019 22:18:20 +0200
Subject: [PATCH] Update timings.

---
 FAQ.md           |  4 ++--
 manual/tracy.tex | 16 +++++++---------
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index aacf588a..822f4a5a 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -10,11 +10,11 @@ Telemetry license costs about 8000 $ per year. Tracy is open source software. Te
 
 ### You can use the free Brofiler. Crytek does use it, so it has to be good.
 
-After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 5 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
+After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 4 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
 
 ### So tracy is supposedly faster?
 
-My measurements show that logging a single zone with tracy takes only 15 ns. In theory, if the program was doing nothing else, tracy should be able to log 66 million zones per second.
+My measurements show that logging a single zone with tracy takes only 2.25 ns. In theory, if the program was doing nothing else, tracy should be able to log 444 million zones per second.
 
 ### Bullshit, RAD is advertising that they are able only to log about a million zones, over the network nevertheless: "Capture over a million timing zones per second in real-time!"
 
diff --git a/manual/tracy.tex b/manual/tracy.tex
index 5780e0ee..8ed9b836 100644
--- a/manual/tracy.tex
+++ b/manual/tracy.tex
@@ -227,26 +227,24 @@ In Tracy terminology, the profiled application is a \emph{client} and the profil
 \subsection{Performance impact}
 \label{perfimpact}
 
-To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we will use etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. Let's use an $8192 \times 8192$ pixels test image as input data and instrument everything down to the $4 \times 4$ pixel block compression function (that's 4 million blocks to compress).
+To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we used etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. The input data was a $16384 \times 16384$ pixel test image, and the $4 \times 4$ pixel block compression function was selected for instrumentation. The image was compressed on 12 parallel threads, and the timing data represents the mean compression time of a single image.
 
-The resulting timing information is presented in table~\ref{PerformanceImpact}. As can be seen, the cost of a single-zone capture (consisting of the zone begin and zone end events) is \textasciitilde 15 \si{\nano\second}.
+The results are presented in table~\ref{PerformanceImpact}. Dividing the average run time difference (37.7 \si{\milli\second}) by the number of zones captured per image (\num{16777216}) shows that the impact of profiling is only 2.25 \si{\nano\second} per zone (this includes two events: the start and the end of a zone).
 
 \begin{table}[h]
 \centering
-\begin{tabular}[h]{c|c|c|c|c}
-\textbf{Output} & \textbf{Zones} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
-ETC1 & \num{4194568} & 0.94 \si{\second} & 1.003 \si{\second} & +0.063 \si{\second} \\
-ETC2 + mip-maps & \num{5592822} & 1.034 \si{\second} & 1.119 \si{\second} & +0.085 \si{\second}
+\begin{tabular}[h]{c|c|c|c|c|c}
+\textbf{Mode} & \textbf{Zones (total)} & \textbf{Zones (single image)} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
+ETC1 & \num{201326592} & \num{16777216} & 110.9 \si{\milli\second} & 148.2 \si{\milli\second} & +37.3 \si{\milli\second} \\
+ETC2 & \num{201326592} & \num{16777216} & 212.4 \si{\milli\second} & 250.5 \si{\milli\second} & +38.1 \si{\milli\second}
 \end{tabular}
 \caption{Zone capture time cost.}
 \label{PerformanceImpact}
 \end{table}
 
-It should be noted that Tracy has a constant initialization cost, needed to perform timer calibration. This cost was subtracted from the profiling run times, as it is irrelevant to the single-zone capture time.
-
 \subsubsection{Assembly analysis}
 
-To see how such small overhead (only 15 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging start of a zone. Do note that it is generated by compiling fully portable C++.
+To see how such a small overhead (only 2.25 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging the start of a zone. Do note that it is generated by compiling fully portable C++.
 
 \begin{lstlisting}[language={[x86masm]Assembler}]
 mov         byte ptr [rsp+0C0h],1           ; store zone activity information