Commit Graph

714 Commits

Author SHA1 Message Date
Bartosz Taudul
b9cdf2cbb7 Expose srcloc allocation in C API. 2019-12-06 00:25:52 +01:00
Bartosz Taudul
399b87fecc Add allocated srcloc zone begin emit functions to C API. 2019-12-06 00:22:49 +01:00
Bartosz Taudul
68ff33d0ba Extract source location allocation functionality. 2019-12-06 00:15:46 +01:00
Bartosz Taudul
e8fcc250a1 Report CPU topology on Linux. 2019-11-30 01:51:29 +01:00
Bartosz Taudul
712403e9fd Transfer, display, save CPU topology data. 2019-11-29 22:41:41 +01:00
Bartosz Taudul
59371eef5a Obtain CPU topology on windows. 2019-11-29 18:29:31 +01:00
thedmd
a1e2c533f6 libbacktrace: Add support for Mach-O (dSYM)
`macho.cpp` was backported from official libbacktrace repository.
2019-11-29 12:04:47 +01:00
Bartosz Taudul
a7d2d5f08b Fix DeferItem() call. 2019-11-26 01:10:50 +01:00
Bartosz Taudul
4551553eb4 Implement setting client parameters from server. 2019-11-25 23:59:48 +01:00
Bartosz Taudul
c5c9dfb0c9 Native callstacks are now optional in allocated callstack messages. 2019-11-25 22:54:10 +01:00
Bartosz Taudul
37eef59d54 Implement reading sys time on BSD. 2019-11-21 20:41:57 +01:00
Bartosz Taudul
c7a22cc1ff Use libbacktrace on BSD. 2019-11-21 20:41:57 +01:00
Bartosz Taudul
bd7b0a8197 Support callstack capture on BSD. 2019-11-21 02:34:42 +01:00
Bartosz Taudul
c79449a6a1 Get proper program name on BSD. 2019-11-21 02:16:12 +01:00
Bartosz Taudul
7940977dba Report physical memory size on BSD. 2019-11-21 02:14:08 +01:00
Bartosz Taudul
3854ae11b2 Revert "Remove dead code."
This reverts commit a36b73f745.
2019-11-17 17:38:02 +01:00
Bartosz Taudul
a36b73f745 Remove dead code. 2019-11-16 18:34:05 +01:00
Bartosz Taudul
8286b0b72f Plumbing for message call stacks. 2019-11-14 23:40:41 +01:00
Bartosz Taudul
0befc75f83 Fix conflicts with X.h. 2019-11-14 18:24:29 +01:00
Bartosz Taudul
655864eb7c Enable crash handler on cygwin.
Crash is properly recorded, but the profiler hangs while waiting for
shutdown finish.
2019-11-07 19:20:13 +01:00
Bartosz Taudul
3fd74a92f9 Native threads are used on mingw. 2019-11-07 19:02:54 +01:00
Bartosz Taudul
351e220d30 Don't calculate queue delay if delayed init is used.
Queue calibration requires queue access during profiler construction. This
in turn requires construction of profiler data block, *which at this point
is underway*, because the profiler is being constructed.
2019-06-19 17:29:04 +02:00
Bartosz Taudul
c98f1f0b6b Make sure profiler is initialized only once in delayed init scenario. 2019-06-19 17:28:18 +02:00
Bartosz Taudul
d4f58ddaf3 Use native windows threads on cygwin, mingw. 2019-11-06 01:42:14 +01:00
Bartosz Taudul
ca198e44d3 Remove dead code from concurrentqueue. 2019-11-05 21:40:52 +01:00
Bartosz Taudul
b5590ed197 Include <mutex> for std::once. 2019-11-05 21:40:35 +01:00
Bartosz Taudul
3e9bb80217 More header cleanup. 2019-11-05 20:15:53 +01:00
Bartosz Taudul
6bbf273581 Partial header inclusion cleanup. 2019-11-05 20:09:40 +01:00
Bartosz Taudul
907574e637 Allow remote plot configuration. 2019-11-05 17:45:19 +01:00
Bartosz Taudul
f34609fd9b Set per-cpu kernel buffer size to 512 KB.
The default setting was causing events to be lost on Android.
2019-11-03 21:52:20 +01:00
Bartosz Taudul
b8d459d48b Use proper string size (for consistency).
On Android code path this value is ignored.
2019-11-03 21:51:49 +01:00
Bartosz Taudul
ca0fae33d1 Remove obsolete assert.
Before-terminate-events now include events that have time delta
processing, with no memory to free.
2019-11-01 20:10:24 +01:00
Bartosz Taudul
1f0c18882c Don't collect sys time after application has exited. 2019-10-29 23:05:14 +01:00
Bartosz Taudul
0f2503d334 Send time deltas in GPU time events. 2019-10-25 19:52:01 +02:00
Bartosz Taudul
8fa5188176 Send delta times for context switches. 2019-10-25 19:13:11 +02:00
Bartosz Taudul
25b3cdc1ee Send thread wakeups when handling disconnect request. 2019-10-25 18:22:42 +02:00
Bartosz Taudul
04b132b6e2 Check if requested data size doesn't overflow buffer. 2019-10-24 21:22:22 +02:00
Bartosz Taudul
ba61a9ed84 Transfer time deltas, not absolute times.
This change significantly reduces network bandwidth requirements.

Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
2019-10-24 00:06:41 +02:00
Bartosz Taudul
cf88265304 Full 64-bit register is set by rdtsc. 2019-10-21 01:13:55 +02:00
Bartosz Taudul
07b66cd4ab Move fake source location out of loop. 2019-10-20 22:18:05 +02:00
Bartosz Taudul
909503403b Simplify delay calibration. 2019-10-20 22:13:29 +02:00
Bartosz Taudul
c774534b47 Use rdtsc instead of rdtscp.
But rdtscp is serializing!

No, it's not. Quoting the Intel Instruction Set Reference:

"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible. But it does not wait for previous stores to
be globally visible, and subsequent instructions may begin execution
before the read operation is performed.",

"The RDTSC instruction is not a serializing instruction. It does not
necessarily wait until all previous instructions have been executed
before reading the counter. Similarly, subsequent instructions may begin
execution before the read operation is performed."

So, the difference is in waiting for prior instructions to finish
executing. Notice that even in the rdtscp case, execution of the
following instructions may commence before time measurement is finished
and data stores may be still pending.

But, you may say, Intel in its "How to Benchmark Code Execution Times"
document shows that using rdtscp is superior to rdstc. Well, not
exactly. What they do show is that when a *single function* is
considered, there are ways to measure its execution time with little to
no error.

This is not what Tracy is doing.

In our case there is no way to determine absolute "this is before" and
"this is after" points of a zone, as we probably already are inside
another zone.  Stopping the CPU execution, so that a deeply nested zone
may be measured with great precision, will skew the measurements of all
parent zones.

And this is not what we want to measure, anyway. We are not interested
in how a *single function* behaves, but how a *whole program* behaves.
The out-of-order CPU behavior may influence the measurements? Good! We
are interested in that. We want to see *how* the code is really
executed. How is *stopping* the CPU to make a timer read an appropriate
thing to do, when we want to see how a program is performing?

At least that's the theory.

And besides all that, the profiling overhead is now reduced.
2019-10-20 20:52:33 +02:00
Bartosz Taudul
30fc2f02ab Omit calculation of on-stack variable address. 2019-10-20 19:42:29 +02:00
Bartosz Taudul
c3870f8837 Use proper type. 2019-10-10 20:30:08 +02:00
Bartosz Taudul
707f113bda Add missing NOMINMAX definitions. 2019-10-10 20:29:06 +02:00
Bartosz Taudul
7cf3608493 Avoid unused variables. 2019-10-05 02:11:45 +02:00
Bartosz Taudul
e481b5ba22 Add missing thread sent indication. 2019-10-04 19:18:47 +02:00
Bartosz Taudul
9e1935f070 Make C API symbols visible across dlls. 2019-10-03 22:39:26 +02:00
Bartosz Taudul
130365f4ff Inject tracy_systrace into filesystem and use instead of cat.
Statistics for a one-minute trace:

  Capture tool | Running time | Running regions
---------------+--------------+-----------------
      cat      |    25.11 s   |     392,300
tracy_systrace |    10.41 s   |      12,249
2019-09-27 15:51:29 +02:00
Bartosz Taudul
3dba4088ee Embed precompiled tracy_systrace for android. 2019-09-27 15:50:58 +02:00
Bartosz Taudul
e13cbf52fd Allow changing tracy port in client. 2019-09-21 15:11:15 +02:00
Bartosz Taudul
a221f121ba Extract lock state handling to a separate context class. 2019-09-21 14:55:14 +02:00
Bartosz Taudul
37661fd2ee Fix 32 bit NEON version of DXT1 compression.
This reverts commit b32e8fa24e.

Apparently it is possible to receive non-uniform data in alpha channel, which
breaks the original assumption about not needing the mask. This seemed to be a
problem only on 32 bit NEON implementation of DXT1 compression. Other
implementations handle such data without degradation of visual output.
2019-09-03 21:37:07 +02:00
Bartosz Taudul
7a6564feae Only recycle producers, if there's no data in queue.
("The queue" is per-thread partial queue here.)

This fixes a problem where one thread writes to the queue, then is
terminated, making the (partially filled) queue available for other
threads to recycle. If another thread re-owns the queue, it will change
the associated thread id, while part of the queue was filled by the
original thread. This obviously created invalid data during dequeue.

The fix makes the recycling process check not only for queue inactivity
(which is marked when the original thread terminates), but also if the
queue is empty, preventing mixing data from different threads.
2019-08-30 14:28:44 +02:00
Bartosz Taudul
00b26c1acf Fix TRACY_NO_SYSTEM_TRACING. 2019-08-26 18:02:10 +02:00
Bartosz Taudul
fbeee3cf61 Fix (?) invalid function pointer signature. 2019-08-26 17:59:58 +02:00
Bartosz Taudul
78127dc357 System threads only allow limited information queries. 2019-08-25 00:33:22 +02:00
Bartosz Taudul
deb59b4c38 Somehow fix event ordering. 2019-08-24 01:43:55 +02:00
Bartosz Taudul
1e74a89924 Check if there's data to read from kernel.
Reading from kernel pipe, while being a blocking operation, spin locks the
thread.
2019-08-24 01:06:21 +02:00
Bartosz Taudul
8f6e94d75c Sleep if sys trace pipe buffer underruns. 2019-08-24 00:42:00 +02:00
Bartosz Taudul
2d50d07438 Allow completely disabling system tracing. 2019-08-21 01:16:25 +02:00
Bartosz Taudul
0cbb853945 Add missing SetThreadName() calls. 2019-08-20 16:23:00 +02:00
Bartosz Taudul
332262dd84 Shorter thread names. 2019-08-20 16:22:54 +02:00
Bartosz Taudul
247acd03ee Kernel tracing on android. 2019-08-20 15:49:40 +02:00
Bartosz Taudul
e427d67347 Don't bail out if unimportant variables are not available. 2019-08-20 12:19:05 +02:00
Bartosz Taudul
bfda30be0b Use su on android to set tracing variables. 2019-08-20 12:18:46 +02:00
Bartosz Taudul
9d87a8394d Add missing getline() implementation for android API < 18. 2019-08-19 15:26:09 +02:00
Bartosz Taudul
9be6f4a414 Fix typo. 2019-08-19 13:03:37 +02:00
Bartosz Taudul
d209bb4d01 Add missing function pointer checks. 2019-08-19 12:47:27 +02:00
Bartosz Taudul
20e8a5ecc8 Create tid to pid mapping. 2019-08-17 22:32:41 +02:00
Bartosz Taudul
678e942e9f Transfer PID of profiled program. 2019-08-17 22:19:04 +02:00
Bartosz Taudul
77c636c3fd Retrieve module name for threads with no names on windows. 2019-08-17 21:24:40 +02:00
Bartosz Taudul
f7589bde02 Trace thread wakeups on linux. 2019-08-17 17:18:11 +02:00
Bartosz Taudul
414f903cc5 Collect thread wakeup data. 2019-08-17 17:05:29 +02:00
Bartosz Taudul
e9080bdbcd Hardcode windows PID 4 as "System". 2019-08-17 03:44:47 +02:00
Bartosz Taudul
40eb8a5a03 Proper check for invalid handle. 2019-08-17 03:44:11 +02:00
Bartosz Taudul
6c1dd8eaec Cast thread handle to DWORD. 2019-08-16 21:21:37 +02:00
Bartosz Taudul
d7104c752a Cygwin compat layer. 2019-08-16 21:16:04 +02:00
Bartosz Taudul
819ef2a82b External process/thread name retrieval on linux. 2019-08-16 21:00:42 +02:00
Bartosz Taudul
e975c4d7bf Also retrieve external thread names. 2019-08-16 19:49:16 +02:00
Bartosz Taudul
fe7f56b022 Implement retrieval of external process names. 2019-08-16 19:22:23 +02:00
Bartosz Taudul
83fddd9aa6 Fix unicode builds. 2019-08-16 13:09:27 +02:00
Bartosz Taudul
9d5240c597 Mutable char array is required here due to shit API design. 2019-08-16 13:03:20 +02:00
Bartosz Taudul
14a373a3b8 Add number of CPU cores to host info. 2019-08-15 02:28:35 +02:00
Bartosz Taudul
69077e4e6f Finish sending context switches during disconnect. 2019-08-14 23:06:13 +02:00
Bartosz Taudul
6dc79cf14e Cosmetics. 2019-08-14 23:05:58 +02:00
Bartosz Taudul
c0b524d8de Add a separate method for clearing serial queue. 2019-08-14 22:39:12 +02:00
Bartosz Taudul
71b54dd48a Always collect thread names.
This fixes an issue when a thread was destroyed before its name could be
retrieved.
2019-08-14 16:52:04 +02:00
Bartosz Taudul
5e199d1ab3 Support ftrace on ARM. 2019-08-14 16:28:54 +02:00
Bartosz Taudul
5fbb811f5d Degrade ARM timer to monotonic raw clock.
The monotonic raw clock has the same accuracy as reading cntvct registers, but
using clock_gettime() has a measurable impact on queueing time (135 us vs
83 us).

This change is needed to enable ftrace time readings on ARM linux, which
doesn't provide any way to get raw cntvct readings, like x86-tsc on x86.
2019-08-14 16:19:02 +02:00
Bartosz Taudul
42865d7c7b Don't set x86-tsc clock on non-x86 platforms. 2019-08-14 15:14:36 +02:00
Bartosz Taudul
54a9132bb5 Skip context switch events in on demand mode, if no connection. 2019-08-14 15:09:33 +02:00
Bartosz Taudul
602c38c6c0 Allow checking timer implementation. 2019-08-14 14:35:44 +02:00
Bartosz Taudul
3988b56c92 Capture context switches on linux. 2019-08-14 13:56:15 +02:00
Bartosz Taudul
92b6da7cc2 SetThreadName() only works on the current thread.
This breaking change is required, because kernel trace facilities use
kernel thread ids, which are inaccessible from the pthread_t level.
2019-08-14 02:22:45 +02:00
Bartosz Taudul
73cbf2eead Use windows thread ids on cygwin. 2019-08-13 16:22:58 +02:00
Bartosz Taudul
b313e46139 Keep event trace properties to terminate trace on exit. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
90d26cb1b6 Collect and send context switch events. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
fe0f1aea07 Add system tracing skeleton. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
8aa0be39d5 Drop support for CPU id queries. 2019-08-12 23:05:34 +02:00