mirror of
https://github.com/wolfpld/tracy.git
synced 2024-11-22 14:44:34 +00:00
Update timings.
This commit is contained in:
parent
07b66cd4ab
commit
699ff43f1e
4
FAQ.md
4
FAQ.md
@ -10,11 +10,11 @@ Telemetry license costs about 8000 $ per year. Tracy is open source software. Te
|
||||
|
||||
### You can use the free Brofiler. Crytek does use it, so it has to be good.
|
||||
|
||||
After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 5 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
|
||||
After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 4 ns timer resolution. Brofiler event logging infrastructure seems to be over-engineered. Brofiler can't track lock contention, nor does it have Lua bindings.
|
||||
|
||||
### So tracy is supposedly faster?
|
||||
|
||||
My measurements show that logging a single zone with tracy takes only 15 ns. In theory, if the program was doing nothing else, tracy should be able to log 66 million zones per second.
|
||||
My measurements show that logging a single zone with tracy takes only 2.25 ns. In theory, if the program was doing nothing else, tracy should be able to log 444 million zones per second.
|
||||
|
||||
### Bullshit, RAD is advertising that they are able only to log about a million zones, over the network nevertheless: "Capture over a million timing zones per second in real-time!"
|
||||
|
||||
|
@ -227,26 +227,24 @@ In Tracy terminology, the profiled application is a \emph{client} and the profil
|
||||
\subsection{Performance impact}
|
||||
\label{perfimpact}
|
||||
|
||||
To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we will use etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. Let's use an $8192 \times 8192$ pixels test image as input data and instrument everything down to the $4 \times 4$ pixel block compression function (that's 4 million blocks to compress).
|
||||
To check how much slowdown is introduced by using Tracy, let's profile an example application. For this purpose we have used etcpak\footnote{\url{https://bitbucket.org/wolfpld/etcpak}}. The input data was a $16384 \times 16384$ pixels test image and the $4 \times 4$ pixel block compression function was selected to be instrumented. The image was compressed on 12 parallel threads, and the timing data represents a mean compression time of a single image.
|
||||
|
||||
The resulting timing information is presented in table~\ref{PerformanceImpact}. As can be seen, the cost of a single-zone capture (consisting of the zone begin and zone end events) is \textasciitilde 15 \si{\nano\second}.
|
||||
The results are presented in table~\ref{PerformanceImpact}. Dividing the average of run time differences (37.7 \si{\milli\second}) by a number of captured zones per single image (\num{16777216}) shows us that the impact of profiling is only 2.25 \si{\nano\second} per zone (this includes two events: start and end of a zone).
|
||||
|
||||
\begin{table}[h]
|
||||
\centering
|
||||
\begin{tabular}[h]{c|c|c|c|c}
|
||||
\textbf{Output} & \textbf{Zones} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
|
||||
ETC1 & \num{4194568} & 0.94 \si{\second} & 1.003 \si{\second} & +0.063 \si{\second} \\
|
||||
ETC2 + mip-maps & \num{5592822} & 1.034 \si{\second} & 1.119 \si{\second} & +0.085 \si{\second}
|
||||
\begin{tabular}[h]{c|c|c|c|c|c}
|
||||
\textbf{Mode} & \textbf{Zones (total)} & \textbf{Zones (single image)} & \textbf{Clean run} & \textbf{Profiling run} & \textbf{Difference} \\ \hline
|
||||
ETC1 & \num{201326592} & \num{16777216} & 110.9 \si{\milli\second} & 148.2 \si{\milli\second} & +37.3 \si{\milli\second} \\
|
||||
ETC2 & \num{201326592} & \num{16777216} & 212.4 \si{\milli\second} & 250.5 \si{\milli\second} & +38.1 \si{\milli\second}
|
||||
\end{tabular}
|
||||
\caption{Zone capture time cost.}
|
||||
\label{PerformanceImpact}
|
||||
\end{table}
|
||||
|
||||
It should be noted that Tracy has a constant initialization cost, needed to perform timer calibration. This cost was subtracted from the profiling run times, as it is irrelevant to the single-zone capture time.
|
||||
|
||||
\subsubsection{Assembly analysis}
|
||||
|
||||
To see how such small overhead (only 15 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging start of a zone. Do note that it is generated by compiling fully portable C++.
|
||||
To see how such small overhead (only 2.25 \si{\nano\second}) is achieved, let's take a look at the assembly. The following x64 code is responsible for logging start of a zone. Do note that it is generated by compiling fully portable C++.
|
||||
|
||||
\begin{lstlisting}[language={[x86masm]Assembler}]
|
||||
mov byte ptr [rsp+0C0h],1 ; store zone activity information
|
||||
|
Loading…
Reference in New Issue
Block a user