mirror of
https://github.com/wolfpld/tracy.git
synced 2024-11-10 10:41:50 +00:00
c774534b47
But rdtscp is serializing! No, it's not. Quoting the Intel Instruction Set Reference: "The RDTSCP instruction is not a serializing instruction, but it does wait until all previous instructions have executed and all previous loads are globally visible. But it does not wait for previous stores to be globally visible, and subsequent instructions may begin execution before the read operation is performed.", "The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed." So the only difference is that rdtscp waits for prior instructions to finish executing. Notice that even in the rdtscp case, execution of the following instructions may commence before the time measurement is finished, and data stores may still be pending.

But, you may say, Intel in its "How to Benchmark Code Execution Times" document shows that using rdtscp is superior to rdtsc. Well, not exactly. What they do show is that when a *single function* is considered, there are ways to measure its execution time with little to no error. This is not what Tracy is doing. In our case there is no way to determine absolute "this is before" and "this is after" points of a zone, as we are probably already inside another zone. Stopping the CPU execution, so that a deeply nested zone may be measured with great precision, will skew the measurements of all parent zones.

And this is not what we want to measure, anyway. We are not interested in how a *single function* behaves, but in how a *whole program* behaves. The out-of-order CPU behavior may influence the measurements? Good! We are interested in that. We want to see *how* the code is really executed. How is *stopping* the CPU to make a timer read an appropriate thing to do, when we want to see how a program is performing? At least that's the theory.

And besides all that, the profiling overhead is now reduced.
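
To make the two timer reads concrete, here is a minimal sketch, not Tracy's actual code. It assumes GCC or Clang on x86 (where `<x86intrin.h>` provides `__rdtsc` and `__rdtscp`) and falls back to `std::chrono::steady_clock` elsewhere. The names `ReadTimerUnordered`, `ReadTimerWaitForPrior`, `ZoneSketch` and `MeasureZone` are hypothetical and exist only for illustration.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#  include <x86intrin.h>   // GCC/Clang header for __rdtsc and __rdtscp
#  define SKETCH_HAS_TSC 1
#else
#  define SKETCH_HAS_TSC 0
#endif

// Fallback for non-x86 targets (an assumption of this sketch only).
inline std::uint64_t FallbackTicks()
{
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}

// Plain rdtsc: the CPU does not wait for earlier instructions to finish,
// and later instructions may start before the counter is read.
inline std::uint64_t ReadTimerUnordered()
{
#if SKETCH_HAS_TSC
    return __rdtsc();
#else
    return FallbackTicks();
#endif
}

// rdtscp: waits until earlier instructions have executed, but is still
// not serializing; later instructions may begin before the read.
inline std::uint64_t ReadTimerWaitForPrior()
{
#if SKETCH_HAS_TSC
    unsigned int aux;      // receives IA32_TSC_AUX, unused here
    return __rdtscp(&aux);
#else
    return FallbackTicks();
#endif
}

// Hypothetical zone record: two timestamps around a piece of work.
struct ZoneSketch
{
    std::uint64_t begin;
    std::uint64_t end;
};

// Times `body` with the cheaper, unordered read on both sides. Zones nest,
// so any stall added here would also inflate every enclosing zone.
template <typename F>
ZoneSketch MeasureZone(F&& body)
{
    ZoneSketch zone{};
    zone.begin = ReadTimerUnordered();
    body();
    zone.end = ReadTimerUnordered();
    return zone;
}
```

Calling `MeasureZone` from inside the body of another `MeasureZone` call mimics a nested zone: the inner timestamps are taken while the outer zone is still open, which is why a read that stalls the pipeline in the inner zone would show up as extra time in the outer one.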

icons
techdoc.tex
tracy.tex