Update timings.

Bartosz Taudul 2019-10-20 22:18:20 +02:00
parent 07b66cd4ab
commit 699ff43f1e
2 changed files with 9 additions and 11 deletions

FAQ.md

@@ -10,11 +10,11 @@ Telemetry license costs about 8000 $ per year. Tracy is open source software. Te
### You can use the free Brofiler. Crytek does use it, so it has to be good.
After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 5 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 4 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
### So tracy is supposedly faster?
My measurements show that logging a single zone with tracy takes only 15 ns. In theory, if the program was doing nothing else, tracy should be able to log 66 million zones per second.
My measurements show that logging a single zone with tracy takes only 2.25 ns. In theory, if the program was doing nothing else, tracy should be able to log 444 million zones per second.
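The throughput figures quoted in these answers follow from inverting the per-zone cost. Below is a back-of-the-envelope sketch in Python; the helper name `zones_per_second` is made up for illustration and is not part of Tracy.

```python
def zones_per_second(cost_ns: float) -> float:
    """Theoretical zone throughput if the program did nothing but log zones."""
    return 1e9 / cost_ns  # one second is 1e9 nanoseconds


old_rate = zones_per_second(15.0)   # per-zone cost quoted before this commit
new_rate = zones_per_second(2.25)   # per-zone cost quoted after this commit

print(f"{old_rate / 1e6:.1f} million zones/s")
print(f"{new_rate / 1e6:.1f} million zones/s")
```

This gives roughly 66.7 and 444.4 million zones per second, which the FAQ rounds down to 66 and 444 million.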
### Bullshit, RAD is advertising that they are able only to log about a million zones, over the network nevertheless: "Capture over a million timing zones per second in real-time!"


@@ -227,26 +227,24 @@ In Tracy terminology, the profiled application is a \emph{client} and the profil
\subsection{Performance impact}
\label{perfimpact}
To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we will use etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. Let's use an $8192 \times 8192$ pixels test image as input data and instrument everything down to the $4 \times 4$ pixel block compression function (that's 4 million blocks to compress).
To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we have used etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. The input data was a $16384 \times 16384$ pixels test image and the $4 \times 4$ pixel block compression function was selected to be instrumented. The image was compressed on 12 parallel threads, and the timing data represents a mean compression time of a single image.
The resulting timing information is presented in table~\ref{PerformanceImpact}. As can be seen, the cost of a single-zone capture (consisting of the zone begin and zone end events) is \textasciitilde 15 \si{\nano\second}.
The results are presented in table~\ref{PerformanceImpact}. Dividing the average of the run time differences (37.7 \si{\milli\second}) by the number of zones captured per single image (\num{16777216}) shows that the impact of profiling is only 2.25 \si{\nano\second} per zone (this includes two events: start and end of a zone).
\begin{table}[h]
\centering
\begin{tabular}[h]{c|c|c|c|c}
\textbf{Output} & \textbf{Zones} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
ETC1 & \num{4194568} & 0.94 \si{\second} & 1.003 \si{\second} & +0.063 \si{\second} \\
ETC2 + mip-maps & \num{5592822} & 1.034 \si{\second} & 1.119 \si{\second} & +0.085 \si{\second}
\begin{tabular}[h]{c|c|c|c|c|c}
\textbf{Mode} & \textbf{Zones (total)} & \textbf{Zones (single image)} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
ETC1 & \num{201326592} & \num{16777216} & 110.9 \si{\milli\second} & 148.2 \si{\milli\second} & +37.3 \si{\milli\second} \\
ETC2 & \num{201326592} & \num{16777216} & 212.4 \si{\milli\second} & 250.5 \si{\milli\second} & +38.1 \si{\milli\second}
\end{tabular}
\caption{Zone capture time cost.}
\label{PerformanceImpact}
\end{table}
It should be noted that Tracy has a constant initialization cost, needed to perform timer calibration. This cost was subtracted from the profiling run times, as it is irrelevant to the single-zone capture time.
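The 2.25 \si{\nano\second} figure can be re-derived from the updated table values. The following Python sketch only repeats that arithmetic; the variable names are illustrative, and it assumes one zone per $4 \times 4$ block of the $16384 \times 16384$ image, which matches the per-image zone count in the table.

```python
# Run times in milliseconds, copied from the updated PerformanceImpact table.
clean_ms = {"ETC1": 110.9, "ETC2": 212.4}
profiled_ms = {"ETC1": 148.2, "ETC2": 250.5}

IMAGE_SIDE = 16384  # pixels per image side
BLOCK_SIDE = 4      # pixels per block side; assumed: one zone per block

zones_per_image = (IMAGE_SIDE // BLOCK_SIDE) ** 2

# Mean slowdown per image across both compression modes.
diffs_ms = [profiled_ms[mode] - clean_ms[mode] for mode in clean_ms]
mean_diff_ms = sum(diffs_ms) / len(diffs_ms)

# Convert milliseconds to nanoseconds, then spread over all zones.
ns_per_zone = mean_diff_ms * 1e6 / zones_per_image

print(f"zones per image: {zones_per_image}")
print(f"mean difference: {mean_diff_ms:.1f} ms")
print(f"cost per zone:   {ns_per_zone:.2f} ns")
```

The sketch reproduces the \num{16777216} zones per image, the 37.7 \si{\milli\second} mean difference, and a per-zone cost that rounds to 2.25 \si{\nano\second}.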
\subsubsection{Assembly analysis}
To see how such small overhead (only 15 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging start of a zone. Do note that it is generated by compiling fully portable C++.
To see how such small overhead (only 2.25 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging start of a zone. Do note that it is generated by compiling fully portable C++.
\begin{lstlisting}[language={[x86masm]Assembler}]
mov byte ptr [rsp+0C0h],1 ; store zone activity information