From 699ff43f1e1df99ee629812b534ae6c8f12fb60f Mon Sep 17 00:00:00 2001
From: Bartosz Taudul
Date: Sun, 20 Oct 2019 22:18:20 +0200
Subject: [PATCH] Update timings.

---
 FAQ.md           |  4 ++--
 manual/tracy.tex | 16 +++++++---------
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index aacf588a..822f4a5a 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -10,11 +10,11 @@ Telemetry license costs about 8000 $ per year. Tracy is open source software. Te

 ### You can use the free Brofiler. Crytek does use it, so it has to be good.

-After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 5 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
+After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 4 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.

 ### So tracy is supposedly faster?

-My measurements show that logging a single zone with tracy takes only 15 ns. In theory, if the program was doing nothing else, tracy should be able to log 66 million zones per second.
+My measurements show that logging a single zone with tracy takes only 2.25 ns. In theory, if the program was doing nothing else, tracy should be able to log 444 million zones per second.

 ### Bullshit, RAD is advertising that they are able only to log about a million zones, over the network nevertheless: "Capture over a million timing zones per second in real-time!"

diff --git a/manual/tracy.tex b/manual/tracy.tex
index 5780e0ee..8ed9b836 100644
--- a/manual/tracy.tex
+++ b/manual/tracy.tex
@@ -227,26 +227,24 @@ In Tracy terminology, the profiled application is a \emph{client} and the profil
 \subsection{Performance impact}
 \label{perfimpact}

-To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we will use etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. Let's use an $8192 \times 8192$ pixels test image as input data and instrument everything down to the $4 \times 4$ pixel block compression function (that's 4 million blocks to compress).
+To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we have used etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. The input data was a $16384 \times 16384$ pixels test image and the $4 \times 4$ pixel block compression function was selected to be instrumented. The image was compressed on 12 parallel threads, and the timing data represents a mean compression time of a single image.

-The resulting timing information is presented in table~\ref{PerformanceImpact}. As can be seen, the cost of a single-zone capture (consisting of the zone begin and zone end events) is \textasciitilde 15 \si{\nano\second}.
+The results are presented in table~\ref{PerformanceImpact}. Dividing the average of run time differences (37.7 \si{\milli\second}) by a number of captured zones per single image (\num{16777216}) shows us that the impact of profiling is only 2.25 \si{\nano\second} per zone (this includes two events: start and end of a zone).

 \begin{table}[h]
 \centering
-\begin{tabular}[h]{c|c|c|c|c}
-\textbf{Output} & \textbf{Zones} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
-ETC1 & \num{4194568} & 0.94 \si{\second} & 1.003 \si{\second} & +0.063 \si{\second} \\
-ETC2 + mip-maps & \num{5592822} & 1.034 \si{\second} & 1.119 \si{\second} & +0.085 \si{\second}
+\begin{tabular}[h]{c|c|c|c|c|c}
+\textbf{Mode} & \textbf{Zones (total)} & \textbf{Zones (single image)} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
+ETC1 & \num{201326592} & \num{16777216} & 110.9 \si{\milli\second} & 148.2 \si{\milli\second} & +37.3 \si{\milli\second} \\
+ETC2 & \num{201326592} & \num{16777216} & 212.4 \si{\milli\second} & 250.5 \si{\milli\second} & +38.1 \si{\milli\second}
 \end{tabular}
 \caption{Zone capture time cost.}
 \label{PerformanceImpact}
 \end{table}

-It should be noted that Tracy has a constant initialization cost, needed to perform timer calibration. This cost was subtracted from the profiling run times, as it is irrelevant to the single-zone capture time.
-
 \subsubsection{Assembly analysis}

-To see how such small overhead (only 15 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging start of a zone. Do note that it is generated by compiling fully portable C++.
+To see how such small overhead (only 2.25 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging start of a zone. Do note that it is generated by compiling fully portable C++.

 \begin{lstlisting}[language={[x86masm]Assembler}]
 mov byte ptr [rsp+0C0h],1 ; store zone activity information
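
Reviewer note (not part of the patch): the new per-zone figures can be re-derived from the table values added above. The following is a minimal sketch of that arithmetic in Python. The variable names are illustrative only, and the assumption that exactly one zone is recorded per $4 \times 4$ block is inferred from the zone count in the table rather than stated by the patch.

```python
# Sanity check of the timings quoted in this patch.
# All numbers are copied from the updated table; names are hypothetical.

# Assumption: one zone per 4x4 pixel block of a 16384x16384 image.
ZONES_PER_IMAGE = (16384 * 16384) // (4 * 4)

# (clean run, profiling run) in milliseconds, per single image.
runs_ms = {
    "ETC1": (110.9, 148.2),
    "ETC2": (212.4, 250.5),
}

# Run time difference per mode, then the mean over both modes.
diffs_ms = [profiled - clean for clean, profiled in runs_ms.values()]
mean_diff_ms = sum(diffs_ms) / len(diffs_ms)

# Cost of one zone (begin + end event) in nanoseconds.
cost_ns = mean_diff_ms * 1e6 / ZONES_PER_IMAGE

# Theoretical throughput, using the rounded 2.25 ns figure as FAQ.md does.
zones_per_second = 1e9 / round(cost_ns, 2)

print(f"zones per image : {ZONES_PER_IMAGE}")
print(f"mean difference : {mean_diff_ms:.1f} ms")
print(f"cost per zone   : {cost_ns:.2f} ns")
print(f"throughput      : {zones_per_second / 1e6:.0f} million zones/s")
```

The throughput line deliberately starts from the rounded per-zone cost, since that appears to be how the FAQ.md figure was obtained; starting from the unrounded value gives a slightly higher number.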