Commit Graph

3940 Commits

Author SHA1 Message Date
Bartosz Taudul
7f07f5beb4 Free child time stack. 2019-10-26 23:32:16 +02:00
Bartosz Taudul
312b7190f8 Mention that only release builds should be profiled. 2019-10-26 16:59:54 +02:00
Bartosz Taudul
f024a05a01 Document another funny optimization. 2019-10-26 16:49:52 +02:00
Bartosz Taudul
01985f50ef Cache source location zones counter search. 2019-10-26 16:33:40 +02:00
Bartosz Taudul
dfe99c2604 Update capture utility in the manual. 2019-10-26 16:33:40 +02:00
Bartosz Taudul
f1fe2df780 Add data transferred display to capture utility. 2019-10-26 16:18:03 +02:00
Bartosz Taudul
dda192985a General updates to the manual. 2019-10-26 16:05:43 +02:00
Bartosz Taudul
0b142a7b29 Remove FAQ. 2019-10-26 15:33:40 +02:00
Bartosz Taudul
492b7f9134 Update connection speed in the manual. 2019-10-26 14:37:45 +02:00
Bartosz Taudul
6aab54cfc4 Improve frame time graph in the manual. 2019-10-26 14:10:47 +02:00
Bartosz Taudul
f7155d7a77 Update context switches in the manual. 2019-10-26 14:00:32 +02:00
Bartosz Taudul
cccabe9b64 Update connection popup in the manual. 2019-10-26 13:54:57 +02:00
Bartosz Taudul
1d0084aa28 Add cache for last accessed source location zones. 2019-10-25 21:29:55 +02:00
Bartosz Taudul
b5419944aa Only write to memory if value has changed. 2019-10-25 21:28:55 +02:00
Bartosz Taudul
779063a18b Cache last shrinked source location. 2019-10-25 21:07:28 +02:00
Bartosz Taudul
294793367f Cache last CheckSourceLocation query.
Just knowing that the query was performed is enough here -- this
function adds a new source location entry, if there already isn't one.
2019-10-25 21:01:33 +02:00
Bartosz Taudul
0f2503d334 Send time deltas in GPU time events. 2019-10-25 19:52:01 +02:00
Bartosz Taudul
1ce25d3aef Init cache in-place. 2019-10-25 19:19:35 +02:00
Bartosz Taudul
8fa5188176 Send delta times for context switches. 2019-10-25 19:13:11 +02:00
Bartosz Taudul
25b3cdc1ee Send thread wakeups when handling disconnect request. 2019-10-25 18:22:42 +02:00
Bartosz Taudul
c8e5489e99 Group caches together. 2019-10-25 18:16:27 +02:00
Bartosz Taudul
29c42cc8d7 Fix assert. 2019-10-25 01:00:32 +02:00
Bartosz Taudul
17a51c898e No need to check if vector is empty. 2019-10-25 00:54:46 +02:00
Bartosz Taudul
b5e759bc5a Don't calculate child index twice. 2019-10-25 00:54:46 +02:00
Bartosz Taudul
70f1074490 Don't iterate over children to calculate zone self time. 2019-10-25 00:33:44 +02:00
Bartosz Taudul
d6a8a8532f Prevent storing variable on stack. 2019-10-24 23:40:21 +02:00
Bartosz Taudul
1fe76be955 Don't reconstruct lock event time on insert. 2019-10-24 23:25:04 +02:00
Bartosz Taudul
b83d0f46d9 Improve updating last time.
Avoid LHS, don't write if don't need to.
2019-10-24 23:23:52 +02:00
Bartosz Taudul
721f3c8925 Callstack is already zero-initialized. 2019-10-24 23:05:39 +02:00
Bartosz Taudul
45332fd837 Don't read memory when setting values. 2019-10-24 23:03:13 +02:00
Bartosz Taudul
c9da5f1474 Use cached thread retriever. 2019-10-24 22:34:18 +02:00
Bartosz Taudul
5873561b54 Add cached thread retriever. 2019-10-24 22:33:48 +02:00
Bartosz Taudul
06bc802107 Avoid load-hit-store. 2019-10-24 22:24:00 +02:00
Bartosz Taudul
04b132b6e2 Check if requested data size doesn't overflow buffer. 2019-10-24 21:22:22 +02:00
Bartosz Taudul
01ceedb57a Focus out labels in connection window. 2019-10-24 00:54:19 +02:00
Bartosz Taudul
c5a6c7bf63 Display transferred data size. 2019-10-24 00:47:25 +02:00
Bartosz Taudul
1cfb5adc44 Count transferred data size. 2019-10-24 00:47:16 +02:00
Bartosz Taudul
2d31ca993e Update NEWS. 2019-10-24 00:13:12 +02:00
Bartosz Taudul
ba61a9ed84 Transfer time deltas, not absolute times.
This change significantly reduces network bandwidth requirements.

Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
2019-10-24 00:06:41 +02:00
Bartosz Taudul
cf88265304 Full 64-bit register is set by rdtsc. 2019-10-21 01:13:55 +02:00
Bartosz Taudul
699ff43f1e Update timings. 2019-10-20 22:18:20 +02:00
Bartosz Taudul
07b66cd4ab Move fake source location out of loop. 2019-10-20 22:18:05 +02:00
Bartosz Taudul
909503403b Simplify delay calibration. 2019-10-20 22:13:29 +02:00
Bartosz Taudul
411e4d42ac Move disassembly from FAQ to manual. 2019-10-20 21:23:16 +02:00
Bartosz Taudul
c774534b47 Use rdtsc instead of rdtscp.
But rdtscp is serializing!

No, it's not. Quoting the Intel Instruction Set Reference:

"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible. But it does not wait for previous stores to
be globally visible, and subsequent instructions may begin execution
before the read operation is performed.",

"The RDTSC instruction is not a serializing instruction. It does not
necessarily wait until all previous instructions have been executed
before reading the counter. Similarly, subsequent instructions may begin
execution before the read operation is performed."

So, the difference is in waiting for prior instructions to finish
executing. Notice that even in the rdtscp case, execution of the
following instructions may commence before time measurement is finished
and data stores may be still pending.

But, you may say, Intel in its "How to Benchmark Code Execution Times"
document shows that using rdtscp is superior to rdstc. Well, not
exactly. What they do show is that when a *single function* is
considered, there are ways to measure its execution time with little to
no error.

This is not what Tracy is doing.

In our case there is no way to determine absolute "this is before" and
"this is after" points of a zone, as we probably already are inside
another zone.  Stopping the CPU execution, so that a deeply nested zone
may be measured with great precision, will skew the measurements of all
parent zones.

And this is not what we want to measure, anyway. We are not interested
in how a *single function* behaves, but how a *whole program* behaves.
The out-of-order CPU behavior may influence the measurements? Good! We
are interested in that. We want to see *how* the code is really
executed. How is *stopping* the CPU to make a timer read an appropriate
thing to do, when we want to see how a program is performing?

At least that's the theory.

And besides all that, the profiling overhead is now reduced.
2019-10-20 20:52:33 +02:00
Bartosz Taudul
30fc2f02ab Omit calculation of on-stack variable address. 2019-10-20 19:42:29 +02:00
Bartosz Taudul
5c92eae3ed Add early exit for invalid times. 2019-10-20 18:47:50 +02:00
Bartosz Taudul
d592af9c2f Fix TRACY_NO_STATISTICS build. 2019-10-20 17:32:20 +02:00
Bartosz Taudul
5816dc2b11 Don't cache timedist data if ctx switch data is incomplete. 2019-10-20 17:03:30 +02:00
Bartosz Taudul
ccdc102d5a Cache zone time distribution data. 2019-10-20 03:24:58 +02:00