Commit Graph

5353 Commits

Author SHA1 Message Date
Bartosz Taudul
2d31ca993e Update NEWS. 2019-10-24 00:13:12 +02:00
Bartosz Taudul
ba61a9ed84 Transfer time deltas, not absolute times.
This change significantly reduces network bandwidth requirements.

Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
2019-10-24 00:06:41 +02:00
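A minimal sketch of the delta idea described in ba61a9ed84: each timestamp is sent as the difference from the previous one and rebuilt on the receiving side. The function names EncodeDeltas and DecodeDeltas and the starting reference of 0 are assumptions made for illustration; they are not taken from Tracy's source. The commit does not say why bandwidth drops. A plausible reason is that small deltas leave most high-order bytes zero, which suits compact or compressed encodings.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not Tracy's actual API.

// Sender side: replace each absolute time with its distance from the
// previously sent time. Both sides start from the same reference (0 here).
std::vector<int64_t> EncodeDeltas(const std::vector<int64_t>& times)
{
    std::vector<int64_t> deltas;
    deltas.reserve(times.size());
    int64_t prev = 0;
    for (int64_t t : times)
    {
        deltas.push_back(t - prev);
        prev = t;
    }
    return deltas;
}

// Receiver side: accumulate the deltas to recover absolute times.
std::vector<int64_t> DecodeDeltas(const std::vector<int64_t>& deltas)
{
    std::vector<int64_t> times;
    times.reserve(deltas.size());
    int64_t prev = 0;
    for (int64_t d : deltas)
    {
        prev += d;
        times.push_back(prev);
    }
    return times;
}
```

In a real stream each event category listed above (CPU zones, GPU zones, locks, plots, memory events) would likely keep its own reference time, so that deltas stay small within a category; that detail is also an assumption.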
Bartosz Taudul
cf88265304 Full 64-bit register is set by rdtsc. 2019-10-21 01:13:55 +02:00
Bartosz Taudul
699ff43f1e Update timings. 2019-10-20 22:18:20 +02:00
Bartosz Taudul
07b66cd4ab Move fake source location out of loop. 2019-10-20 22:18:05 +02:00
Bartosz Taudul
909503403b Simplify delay calibration. 2019-10-20 22:13:29 +02:00
Bartosz Taudul
411e4d42ac Move disassembly from FAQ to manual. 2019-10-20 21:23:16 +02:00
Bartosz Taudul
c774534b47 Use rdtsc instead of rdtscp.
But rdtscp is serializing!

No, it's not. Quoting the Intel Instruction Set Reference:

"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible. But it does not wait for previous stores to
be globally visible, and subsequent instructions may begin execution
before the read operation is performed.",

"The RDTSC instruction is not a serializing instruction. It does not
necessarily wait until all previous instructions have been executed
before reading the counter. Similarly, subsequent instructions may begin
execution before the read operation is performed."

So, the difference is in waiting for prior instructions to finish
executing. Notice that even in the rdtscp case, execution of the
following instructions may commence before time measurement is finished
and data stores may still be pending.

But, you may say, Intel in its "How to Benchmark Code Execution Times"
document shows that using rdtscp is superior to rdtsc. Well, not
exactly. What they do show is that when a *single function* is
considered, there are ways to measure its execution time with little to
no error.

This is not what Tracy is doing.

In our case there is no way to determine absolute "this is before" and
"this is after" points of a zone, as we probably already are inside
another zone. Stopping the CPU execution, so that a deeply nested zone
may be measured with great precision, will skew the measurements of all
parent zones.

And this is not what we want to measure, anyway. We are not interested
in how a *single function* behaves, but how a *whole program* behaves.
The out-of-order CPU behavior may influence the measurements? Good! We
are interested in that. We want to see *how* the code is really
executed. How is *stopping* the CPU to make a timer read an appropriate
thing to do, when we want to see how a program is performing?

At least that's the theory.

And besides all that, the profiling overhead is now reduced.
2019-10-20 20:52:33 +02:00
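The two timer reads contrasted in c774534b47 can be sketched with compiler intrinsics. This is an illustration under stated assumptions, not Tracy's implementation: ReadTimeNonWaiting, ReadTimeWaiting, Zone and MeasureZone are hypothetical names, and the std::chrono fallback exists only so the sketch builds on non-x86 targets.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  if defined(_MSC_VER)
#    include <intrin.h>
#  else
#    include <x86intrin.h>
#  endif
#  define SKETCH_HAS_TSC 1
#endif

// Fallback clock for targets without a TSC (assumption for portability).
inline uint64_t FallbackTime()
{
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}

// rdtsc: does not wait for earlier instructions to finish executing.
inline uint64_t ReadTimeNonWaiting()
{
#ifdef SKETCH_HAS_TSC
    return __rdtsc();
#else
    return FallbackTime();
#endif
}

// rdtscp: waits for earlier instructions, but later ones may still start
// before the read completes. Shown only for contrast.
inline uint64_t ReadTimeWaiting()
{
#ifdef SKETCH_HAS_TSC
    unsigned int aux;  // receives IA32_TSC_AUX; unused in this sketch
    return __rdtscp(&aux);
#else
    return FallbackTime();
#endif
}

// Hypothetical zone record: raw timer values at entry and exit.
struct Zone
{
    uint64_t start;
    uint64_t end;
};

// Time a callable with the non-waiting read. Zones nest naturally, and
// no zone stalls the pipeline on behalf of its parents.
template <typename F>
Zone MeasureZone(F&& body)
{
    Zone z;
    z.start = ReadTimeNonWaiting();
    std::forward<F>(body)();
    z.end = ReadTimeNonWaiting();
    return z;
}
```

Calling MeasureZone from inside another MeasureZone body mirrors the nested-zone situation the commit message describes: a waiting read in the inner zone would add its stall to every enclosing zone as well.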
Bartosz Taudul
30fc2f02ab Omit calculation of on-stack variable address. 2019-10-20 19:42:29 +02:00
Bartosz Taudul
5c92eae3ed Add early exit for invalid times. 2019-10-20 18:47:50 +02:00
Bartosz Taudul
d592af9c2f Fix TRACY_NO_STATISTICS build. 2019-10-20 17:32:20 +02:00
Bartosz Taudul
5816dc2b11 Don't cache timedist data if ctx switch data is incomplete. 2019-10-20 17:03:30 +02:00
Bartosz Taudul
ccdc102d5a Cache zone time distribution data. 2019-10-20 03:24:58 +02:00
Bartosz Taudul
4d761def61 Microoptimize comparison. 2019-10-16 20:26:39 +02:00
Bartosz Taudul
14292f9e35 Update manual. 2019-10-15 21:57:49 +02:00
Bartosz Taudul
f89bc970ee Update NEWS. 2019-10-15 21:50:22 +02:00
Bartosz Taudul
bfbd09b619 Add CPU usage graph tooltip. 2019-10-15 21:47:37 +02:00
Bartosz Taudul
7a9d4aecd3 Fix graph height calculation. 2019-10-15 21:41:06 +02:00
Bartosz Taudul
4372ad1bc3 Allow disabling CPU usage graph. 2019-10-15 21:37:16 +02:00
Bartosz Taudul
c28bab59b5 Improve look of CPU usage graph. 2019-10-15 21:20:00 +02:00
Bartosz Taudul
5aeeefefbd Draw CPU usage graph. 2019-10-15 16:55:15 +02:00
Bartosz Taudul
3ae5c125f6 Implement counting CPU usage (ctx switch) at a given time. 2019-10-15 16:54:43 +02:00
Bartosz Taudul
3ce6b1205f Don't iterate over 256 CPUs. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
eccb0b1e4a Track max CPU present in context switch data. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
bdb8516d04 Make sure context switch end time wasn't set already. 2019-10-15 14:54:28 +02:00
Bartosz Taudul
a20c6604c3 Add natvis for ContextSwitchData and ContextSwitchCpu. 2019-10-15 14:11:02 +02:00
Bartosz Taudul
fefa3b4693 Improve options UI. 2019-10-15 01:49:36 +02:00
Bartosz Taudul
dffe65f8e2 Update manual. 2019-10-14 20:52:18 +02:00
Bartosz Taudul
f0c77b4ef4 Add annotation list window. 2019-10-14 20:52:18 +02:00
Bartosz Taudul
1ad246b4ca Update manual. 2019-10-14 20:17:28 +02:00
Bartosz Taudul
c6207ed0e9 Move extra tools to main window button bar popup. 2019-10-14 20:07:55 +02:00
Bartosz Taudul
fc7f77eb7a Add implementation of disablable button. 2019-10-14 20:06:57 +02:00
Bartosz Taudul
6de8e6987f Sort annotations. 2019-10-14 19:04:37 +02:00
Bartosz Taudul
5c47467c88 Fix includes. 2019-10-13 17:13:15 +02:00
Bartosz Taudul
671a8f673e Don't interact with unfocused annotations. 2019-10-13 17:01:55 +02:00
Bartosz Taudul
98ab83c69b Update manual. 2019-10-13 17:00:07 +02:00
Bartosz Taudul
ae2c9b4859 Update NEWS. 2019-10-13 16:30:07 +02:00
Bartosz Taudul
e462335f83 Save/load annotations. 2019-10-13 16:29:24 +02:00
Bartosz Taudul
c2f38d0db7 Implement removal of user data files. 2019-10-13 16:29:02 +02:00
Bartosz Taudul
9d0316342d Move Annotation struct to a proper place. 2019-10-13 16:28:40 +02:00
Bartosz Taudul
20cf1d9f83 Implement color selection for annotation region. 2019-10-13 16:14:22 +02:00
Bartosz Taudul
f9e860f559 Display annotation text on timeline. 2019-10-13 15:59:48 +02:00
Bartosz Taudul
1527e7bc10 Add annotation modification window. 2019-10-13 15:50:37 +02:00
Bartosz Taudul
5fed86dae7 Allow adding annotations to timeline. 2019-10-13 15:28:52 +02:00
Bartosz Taudul
215dc8a804 More compact GpuEvent struct (save 4 bytes).
Memory usage reduction of various traces:

big         9011 -> 9007
frameimages 561  -> 552
fi-big      4144 -> 4139
long        5253 -> 5125
2019-10-13 14:42:52 +02:00
Bartosz Taudul
c044df6324 Display number of GPU zones. 2019-10-13 14:21:28 +02:00
Bartosz Taudul
1ae49c14a2 GPU zone count accessor. 2019-10-13 14:13:28 +02:00
Bartosz Taudul
5e1894dd79 Count GPU zones. 2019-10-13 14:13:04 +02:00
Bartosz Taudul
c3870f8837 Use proper type. 2019-10-10 20:30:08 +02:00
Bartosz Taudul
707f113bda Add missing NOMINMAX definitions. 2019-10-10 20:29:06 +02:00