Commit Graph

720 Commits

Author SHA1 Message Date
Bartosz Taudul
707f113bda Add missing NOMINMAX definitions. 2019-10-10 20:29:06 +02:00
Bartosz Taudul
7cf3608493 Avoid unused variables. 2019-10-05 02:11:45 +02:00
Bartosz Taudul
e481b5ba22 Add missing thread sent indication. 2019-10-04 19:18:47 +02:00
Bartosz Taudul
9e1935f070 Make C API symbols visible across dlls. 2019-10-03 22:39:26 +02:00
Bartosz Taudul
130365f4ff Inject tracy_systrace into filesystem and use instead of cat.
Statistics for a one-minute trace:

  Capture tool | Running time | Running regions
---------------+--------------+-----------------
      cat      |    25.11 s   |     392,300
tracy_systrace |    10.41 s   |      12,249
2019-09-27 15:51:29 +02:00
Bartosz Taudul
3dba4088ee Embed precompiled tracy_systrace for android. 2019-09-27 15:50:58 +02:00
Bartosz Taudul
e13cbf52fd Allow changing tracy port in client. 2019-09-21 15:11:15 +02:00
Bartosz Taudul
a221f121ba Extract lock state handling to a separate context class. 2019-09-21 14:55:14 +02:00
Bartosz Taudul
37661fd2ee Fix 32 bit NEON version of DXT1 compression.
This reverts commit b32e8fa24e.

Apparently it is possible to receive non-uniform data in alpha channel, which
breaks the original assumption about not needing the mask. This seemed to be a
problem only on 32 bit NEON implementation of DXT1 compression. Other
implementations handle such data without degradation of visual output.
2019-09-03 21:37:07 +02:00
Bartosz Taudul
7a6564feae Only recycle producers, if there's no data in queue.
("The queue" is per-thread partial queue here.)

This fixes a problem where one thread writes to the queue, then is
terminated, making the (partially filled) queue available for other
threads to recycle. If another thread re-owns the queue, it will change
the associated thread id, while part of the queue was filled by the
original thread. This obviously created invalid data during dequeue.

The fix makes the recycling process check not only for queue inactivity
(which is marked when the original thread terminates), but also if the
queue is empty, preventing mixing data from different threads.
2019-08-30 14:28:44 +02:00
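The recycling rule described in 7a6564feae can be sketched roughly as below. This is a minimal illustration, not the actual queue code: `Producer`, its fields, and `TryRecycle` are hypothetical names, and a `std::deque` stands in for the real per-thread partial queue.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical per-thread partial queue; not the actual Tracy type.
struct Producer
{
    uint64_t threadId = 0;
    bool inactive = false;      // set when the owning thread terminates
    std::deque<int> items;      // events that were enqueued but not yet dequeued
};

// Returns a producer that a new thread may take over, or nullptr if none qualifies.
// A producer qualifies only if its owner is gone AND it holds no data, so events
// written by two different threads never end up under one thread id.
inline Producer* TryRecycle( std::vector<Producer>& producers, uint64_t newThreadId )
{
    for( auto& p : producers )
    {
        if( p.inactive && p.items.empty() )
        {
            p.inactive = false;
            p.threadId = newThreadId;
            return &p;
        }
    }
    return nullptr;
}
```

With only the `inactive` check, a producer that still holds events from a terminated thread would be handed over and relabeled, which is the invalid-data case the commit message describes.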
Bartosz Taudul
00b26c1acf Fix TRACY_NO_SYSTEM_TRACING. 2019-08-26 18:02:10 +02:00
Bartosz Taudul
fbeee3cf61 Fix (?) invalid function pointer signature. 2019-08-26 17:59:58 +02:00
Bartosz Taudul
78127dc357 System threads only allow limited information queries. 2019-08-25 00:33:22 +02:00
Bartosz Taudul
deb59b4c38 Somehow fix event ordering. 2019-08-24 01:43:55 +02:00
Bartosz Taudul
1e74a89924 Check if there's data to read from kernel.
Reading from kernel pipe, while being a blocking operation, spin locks the
thread.
2019-08-24 01:06:21 +02:00
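One common way to check for readable data before issuing a read on POSIX systems is `poll()`. The sketch below assumes that approach; the commit message for 1e74a89924 does not say which mechanism is used, and `HasData` is a hypothetical helper name.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Returns true if fd has data ready to read within timeoutMs milliseconds.
// A timeout of 0 makes this a non-blocking check.
inline bool HasData( int fd, int timeoutMs )
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    return poll( &pfd, 1, timeoutMs ) > 0 && ( pfd.revents & POLLIN ) != 0;
}
```

A reader loop could call `HasData()` first and sleep briefly when it returns false, rather than entering a read that keeps the thread busy.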
Bartosz Taudul
8f6e94d75c Sleep if sys trace pipe buffer underruns. 2019-08-24 00:42:00 +02:00
Bartosz Taudul
2d50d07438 Allow completely disabling system tracing. 2019-08-21 01:16:25 +02:00
Bartosz Taudul
0cbb853945 Add missing SetThreadName() calls. 2019-08-20 16:23:00 +02:00
Bartosz Taudul
332262dd84 Shorter thread names. 2019-08-20 16:22:54 +02:00
Bartosz Taudul
247acd03ee Kernel tracing on android. 2019-08-20 15:49:40 +02:00
Bartosz Taudul
e427d67347 Don't bail out if unimportant variables are not available. 2019-08-20 12:19:05 +02:00
Bartosz Taudul
bfda30be0b Use su on android to set tracing variables. 2019-08-20 12:18:46 +02:00
Bartosz Taudul
9d87a8394d Add missing getline() implementation for android API < 18. 2019-08-19 15:26:09 +02:00
Bartosz Taudul
9be6f4a414 Fix typo. 2019-08-19 13:03:37 +02:00
Bartosz Taudul
d209bb4d01 Add missing function pointer checks. 2019-08-19 12:47:27 +02:00
Bartosz Taudul
20e8a5ecc8 Create tid to pid mapping. 2019-08-17 22:32:41 +02:00
Bartosz Taudul
678e942e9f Transfer PID of profiled program. 2019-08-17 22:19:04 +02:00
Bartosz Taudul
77c636c3fd Retrieve module name for threads with no names on windows. 2019-08-17 21:24:40 +02:00
Bartosz Taudul
f7589bde02 Trace thread wakeups on linux. 2019-08-17 17:18:11 +02:00
Bartosz Taudul
414f903cc5 Collect thread wakeup data. 2019-08-17 17:05:29 +02:00
Bartosz Taudul
e9080bdbcd Hardcode windows PID 4 as "System". 2019-08-17 03:44:47 +02:00
Bartosz Taudul
40eb8a5a03 Proper check for invalid handle. 2019-08-17 03:44:11 +02:00
Bartosz Taudul
6c1dd8eaec Cast thread handle to DWORD. 2019-08-16 21:21:37 +02:00
Bartosz Taudul
d7104c752a Cygwin compat layer. 2019-08-16 21:16:04 +02:00
Bartosz Taudul
819ef2a82b External process/thread name retrieval on linux. 2019-08-16 21:00:42 +02:00
Bartosz Taudul
e975c4d7bf Also retrieve external thread names. 2019-08-16 19:49:16 +02:00
Bartosz Taudul
fe7f56b022 Implement retrieval of external process names. 2019-08-16 19:22:23 +02:00
Bartosz Taudul
83fddd9aa6 Fix unicode builds. 2019-08-16 13:09:27 +02:00
Bartosz Taudul
9d5240c597 Mutable char array is required here due to shit API design. 2019-08-16 13:03:20 +02:00
Bartosz Taudul
14a373a3b8 Add number of CPU cores to host info. 2019-08-15 02:28:35 +02:00
Bartosz Taudul
69077e4e6f Finish sending context switches during disconnect. 2019-08-14 23:06:13 +02:00
Bartosz Taudul
6dc79cf14e Cosmetics. 2019-08-14 23:05:58 +02:00
Bartosz Taudul
c0b524d8de Add a separate method for clearing serial queue. 2019-08-14 22:39:12 +02:00
Bartosz Taudul
71b54dd48a Always collect thread names.
This fixes an issue when a thread was destroyed before its name could be
retrieved.
2019-08-14 16:52:04 +02:00
Bartosz Taudul
5e199d1ab3 Support ftrace on ARM. 2019-08-14 16:28:54 +02:00
Bartosz Taudul
5fbb811f5d Degrade ARM timer to monotonic raw clock.
The monotonic raw clock has the same accuracy as reading cntvct registers, but
using clock_gettime() has a measurable impact on queueing time (135 us vs
83 us).

This change is needed to enable ftrace time readings on ARM linux, which
doesn't provide any way to get raw cntvct readings, like x86-tsc on x86.
2019-08-14 16:19:02 +02:00
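Reading the monotonic raw clock, as mentioned in 5fbb811f5d, might look like the sketch below. `GetTimeNs` is a hypothetical name, and the nanosecond return unit is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Returns the raw monotonic clock reading in nanoseconds.
// CLOCK_MONOTONIC_RAW is not adjusted by NTP slewing.
inline int64_t GetTimeNs()
{
    struct timespec ts;
    const int rc = clock_gettime( CLOCK_MONOTONIC_RAW, &ts );
    assert( rc == 0 );
    (void)rc;
    return int64_t( ts.tv_sec ) * 1000000000ll + int64_t( ts.tv_nsec );
}
```

The trade-off named in the commit message is that this call is slower than reading the counter register directly, but it yields timestamps that can be matched against ftrace readings.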
Bartosz Taudul
42865d7c7b Don't set x86-tsc clock on non-x86 platforms. 2019-08-14 15:14:36 +02:00
Bartosz Taudul
54a9132bb5 Skip context switch events in on demand mode, if no connection. 2019-08-14 15:09:33 +02:00
Bartosz Taudul
602c38c6c0 Allow checking timer implementation. 2019-08-14 14:35:44 +02:00
Bartosz Taudul
3988b56c92 Capture context switches on linux. 2019-08-14 13:56:15 +02:00
Bartosz Taudul
92b6da7cc2 SetThreadName() only works on the current thread.
This breaking change is required, because kernel trace facilities use
kernel thread ids, which are inaccessible from the pthread_t level.
2019-08-14 02:22:45 +02:00
Bartosz Taudul
73cbf2eead Use windows thread ids on cygwin. 2019-08-13 16:22:58 +02:00
Bartosz Taudul
b313e46139 Keep event trace properties to terminate trace on exit. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
90d26cb1b6 Collect and send context switch events. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
fe0f1aea07 Add system tracing skeleton. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
8aa0be39d5 Drop support for CPU id queries. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
d6f32a0839 Serialize lock processing.
This makes it much easier to process on the server and opens new
optimization possibilities. It also fixes theoretical problems, which
may be caused by invalid ordering of events with the same timestamp.
2019-08-12 13:51:01 +02:00
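The idea of serializing lock events (d6f32a0839) can be illustrated with a single queue that all threads push into under a mutex, so that events sharing a timestamp still have a well-defined order. This is a sketch under that assumption; the commit message does not describe the real serial queue's implementation, and `LockEvent` and `SerialQueue` are hypothetical types.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical lock event; field names are illustrative only.
struct LockEvent
{
    int64_t time;
    uint32_t lockId;
    uint64_t thread;
};

// All threads push through one mutex, so insertion order is a total order.
class SerialQueue
{
public:
    void Push( const LockEvent& ev )
    {
        std::lock_guard<std::mutex> guard( m_mutex );
        m_items.push_back( ev );
    }

    // Hands over everything queued so far, in insertion order.
    std::vector<LockEvent> Drain()
    {
        std::vector<LockEvent> out;
        std::lock_guard<std::mutex> guard( m_mutex );
        out.swap( m_items );
        return out;
    }

private:
    std::mutex m_mutex;
    std::vector<LockEvent> m_items;
};
```

Because the consumer sees events in the order they were pushed, it does not need to sort or disambiguate events that carry identical timestamps.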
Bartosz Taudul
0431c03556 Add serial queue interface. 2019-08-12 13:27:15 +02:00
Bartosz Taudul
4d2c7899ab Allow skipping invariant TSC check. 2019-08-08 19:21:39 +02:00
Bartosz Taudul
3a221dafde Display error messages on console, if available. 2019-08-08 19:18:05 +02:00
Bartosz Taudul
aada588129 Proper buffer reset. 2019-08-04 17:48:19 +02:00
Rokas Kupstys
b391e4c21a Fix multiple build errors when compiling with MinGW. 2019-08-04 15:49:46 +03:00
Bartosz Taudul
12969ee497 Track thread context.
This change exploits the fact that events are processed in batches
originating from a single thread. A single message changing thread
context is enough to handle multiple messages, as opposed to inclusion
of thread identifier in each message.
2019-08-02 20:18:08 +02:00
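The thread context scheme from 12969ee497 can be sketched from the receiving side as below. The message layout, `MsgType`, `Msg`, `Event`, and `Decode` are all hypothetical and exist only to show how one context message covers the batch that follows it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class MsgType : uint8_t { ThreadContext, ZoneBegin, ZoneEnd };

// Hypothetical wire message: no thread id on regular events.
struct Msg
{
    MsgType type;
    uint64_t payload;   // thread id for ThreadContext, timestamp otherwise
};

// Decoded event with the thread id filled in by the receiver.
struct Event
{
    MsgType type;
    uint64_t thread;
    uint64_t time;
};

// Remember the last announced thread and apply it to every following
// event until the context changes again.
inline std::vector<Event> Decode( const std::vector<Msg>& stream )
{
    std::vector<Event> out;
    uint64_t currentThread = 0;
    bool haveContext = false;
    for( const auto& m : stream )
    {
        if( m.type == MsgType::ThreadContext )
        {
            currentThread = m.payload;
            haveContext = true;
        }
        else
        {
            assert( haveContext && "event received before any thread context" );
            out.push_back( { m.type, currentThread, m.payload } );
        }
    }
    return out;
}
```

For a batch of many events from one thread, the per-event thread identifier is replaced by a single context message at the start of the batch.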
Bartosz Taudul
a4e7a341c0 Proper handling of disconnect request. 2019-08-01 23:14:09 +02:00
Bartosz Taudul
ca3571fd2b Still more. 2019-07-30 01:30:31 +02:00
Bartosz Taudul
47423e6263 And more. 2019-07-30 01:29:13 +02:00
Bartosz Taudul
d3783ae359 Remove magic template syntax. 2019-07-30 01:28:21 +02:00
Bartosz Taudul
9c28b82954 RPMallocInit and RPMallocThreadInit are identical. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
a6a3f45810 Fill in thread id during dequeue, not during enqueue. 2019-07-30 01:15:14 +02:00
Bartosz Taudul
142ef53b42 Dequeue items from a single thread. 2019-07-29 23:44:08 +02:00
Bartosz Taudul
c7f769c52b Allow dequeuing from a single producer, retrieving thread id. 2019-07-29 23:29:30 +02:00
Bartosz Taudul
6cad76ae67 Store thread id in queue producer. 2019-07-29 23:13:06 +02:00
Bartosz Taudul
7ae9a28e32 Drop BlockingConcurrentQueue. 2019-07-29 22:58:13 +02:00
Bartosz Taudul
480a427e07 No need to hash thread ids anymore. 2019-07-29 22:36:04 +02:00
Bartosz Taudul
c60af95053 Remove unused const. 2019-07-29 22:33:32 +02:00
Bartosz Taudul
2d42abf552 Remove CannotAlloc functions. 2019-07-29 22:31:32 +02:00
Bartosz Taudul
b142860c8d More implicit producer removal. 2019-07-29 22:29:39 +02:00
Bartosz Taudul
db6eceb1a6 Producers must be explicit. 2019-07-29 22:25:28 +02:00
Bartosz Taudul
89928fde7b Queue must be always able to alloc. 2019-07-29 22:13:16 +02:00
Bartosz Taudul
a03734afa6 Remove more debug code. 2019-07-29 22:01:06 +02:00
Bartosz Taudul
e9a0145cd5 Remove MCDBGQ_NOLOCKFREE_IMPLICITPRODBLOCKINDEX. 2019-07-29 21:56:53 +02:00
Bartosz Taudul
b496f1ff90 Remove MOODYCAMEL_QUEUE_INTERNAL_DEBUG. 2019-07-29 21:52:49 +02:00
Bartosz Taudul
beaadc3a56 Remove always disabled MCDBGQ_TRACKMEM code. 2019-07-29 21:51:29 +02:00
Bartosz Taudul
82a4a6d9cc Add tracy_ prefix to concurrentqueue.h file name. 2019-07-29 21:47:50 +02:00
Bartosz Taudul
276d764141 Fix cygwin. 2019-07-26 00:02:57 +02:00
Bartosz Taudul
36de7b2cc7 Fix incomplete headers. 2019-07-25 23:41:42 +02:00
Bartosz Taudul
e659220602 Use generic std::call_once() on other platforms. 2019-07-25 23:30:47 +02:00
Bartosz Taudul
d31d1f5946 Detect and report clang-cl. 2019-07-25 19:03:58 +02:00
Bartosz Taudul
092e830264 Use shifts instead of const vector and. 2019-07-22 19:56:47 +02:00
Bartosz Taudul
178dc9eba7 Combine block data directly in AVX registers. 2019-07-20 14:52:34 +02:00
Bartosz Taudul
a6300ef7d1 Ditto on ARM. 2019-07-19 22:13:56 +02:00
Bartosz Taudul
dc49f2f76a Move DXT1 index conversion to server. 2019-07-19 21:46:58 +02:00
Bartosz Taudul
11ba77ced5 Use pthread_once() to initialize rpmalloc on linux. 2019-07-19 20:15:56 +02:00
Bartosz Taudul
4c28593031 Fix races in rpmalloc initialization.
Ensure rpmalloc_thread_initialize() in worker threads is called only after
rpmalloc_initialize() was called on the main profiler thread.
2019-07-19 19:25:27 +02:00
Bartosz Taudul
cef8124247 Replace or with addition to enable usra instruction. 2019-07-19 01:40:27 +02:00
Bartosz Taudul
fd4689a6e2 Don't perform unnecessary ands. 2019-07-19 01:19:52 +02:00
Bartosz Taudul
f65373ece7 Replace two packs with one shuffle. 2019-07-13 20:01:12 +02:00
Bartosz Taudul
fc83f97ad3 Same for AVX/SSE. 2019-07-13 19:34:08 +02:00
Bartosz Taudul
62a167541c No need to mask out indices. 2019-07-13 19:07:25 +02:00
Alex
0c5ea710b0 Merged in z33ky/tracy/const-frame-image (pull request #37)
Constify frame-image pointer in API.
2019-07-13 13:09:21 +00:00
Bartosz Taudul
7bb9549e84 ARM64 specific NEON implementation of DXT1 compression. 2019-07-13 14:31:33 +02:00
Alexander 'z33ky' Hirsch
c6e8dc8d63 Constify frame-image pointer in API. 2019-07-13 12:33:55 +02:00
Bartosz Taudul
60d2384a6a Allow sending application information messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
a1ce5fc1f6 Add include for built-in __get_cpuid() on gcc/clang. 2019-07-10 02:09:19 +02:00
Bartosz Taudul
c164a70b9d Check for rdtscp/invariant tsc support. 2019-07-10 02:04:14 +02:00
Bartosz Taudul
c0670848d2 Reuse variable. 2019-07-08 02:08:06 +02:00
Bartosz Taudul
17dbbe67de Remove dependency on range subtraction. 2019-07-08 00:14:36 +02:00
Bartosz Taudul
af1bd3e1fa Faster horizontal add. 2019-07-07 23:57:23 +02:00
Bartosz Taudul
b32e8fa24e Ditto for NEON. 2019-07-06 00:18:53 +02:00
Bartosz Taudul
d236d4b70f Ditto for AVX2. 2019-07-06 00:05:32 +02:00
Bartosz Taudul
f62b21c21d Masking alpha out is not needed.
We assume that alpha value is constant for the whole image. The range
calculation is max - min, so alpha zeroes out. The color normalization
to range is color - min, so alpha also zeroes out here.
2019-07-05 23:58:19 +02:00
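The reasoning in f62b21c21d can be checked with a small scalar example: if alpha is the same in every pixel, its max and min are equal, so the range for that channel is zero without any masking. The sketch below is not the SIMD code; `Pixel` and `ChannelRange` are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Pixel = std::array<uint8_t, 4>;   // R, G, B, A

// Per-channel range (max - min) over a block of pixels, alpha included.
template<std::size_t N>
inline Pixel ChannelRange( const std::array<Pixel, N>& block )
{
    static_assert( N > 0, "block must not be empty" );
    Pixel mn = block[0];
    Pixel mx = block[0];
    for( const auto& p : block )
    {
        for( std::size_t c = 0; c < 4; c++ )
        {
            mn[c] = std::min( mn[c], p[c] );
            mx[c] = std::max( mx[c], p[c] );
        }
    }
    Pixel range;
    for( std::size_t c = 0; c < 4; c++ ) range[c] = uint8_t( mx[c] - mn[c] );
    return range;
}
```

The same subtraction applies to normalization (color - min), which also yields zero for a constant alpha. If alpha varies across the block, the alpha range is no longer zero and the assumption behind skipping the mask does not hold.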
Bartosz Taudul
03189a30b8 Two ands less in NEON DXT1 compression. 2019-07-05 18:37:25 +02:00
Bartosz Taudul
275d992cb1 Two ands less in AVX2 DXT1 compression. 2019-07-05 18:22:42 +02:00
Bartosz Taudul
c89358d6b9 Two ands less in SSE DXT1 compression. 2019-07-05 18:17:50 +02:00
Bartosz Taudul
5bfc62f1bf iOS device name decoding. 2019-06-19 09:59:46 +02:00
Bartosz Taudul
59b4f84ce5 Display unknown implementer, part as hex values. 2019-07-03 21:18:17 +02:00
Bartosz Taudul
c6f6c368b2 Decode ARM CPU names. 2019-07-03 21:01:34 +02:00
Bartosz Taudul
e26ab8e9f6 Make forwarding functions more compact. 2019-07-03 18:05:38 +02:00
Bartosz Taudul
bdfb568742 Fix div tables for max range on all channels. 2019-07-01 12:31:06 +02:00
Bartosz Taudul
684a119a2c Fix order of checks for including intrinsics. 2019-07-01 11:45:16 +02:00
Bartosz Taudul
983c48994b Write block data directly to memory. 2019-06-30 11:44:32 +02:00
Bartosz Taudul
9b8c18f99e Improve readability. 2019-06-30 11:44:00 +02:00
Bartosz Taudul
52b6bdb55a Force inline ProcessRGB functions. 2019-06-30 03:33:14 +02:00
Bartosz Taudul
8c06f7288c AVX2 DXT1 compression. 2019-06-30 03:20:58 +02:00
Bartosz Taudul
2e893bba91 Use division tables. 2019-06-29 12:16:49 +02:00
Bartosz Taudul
ab9f036f5e Integrate CheckSolid into ProcessRGB. 2019-06-29 02:04:08 +02:00
Bartosz Taudul
faf6bb97a4 DXT1 NEON color index packing. 2019-06-28 22:36:44 +02:00
Bartosz Taudul
2df1eaaa7e Pack color indices using SSE. 2019-06-28 21:58:10 +02:00
Bartosz Taudul
fcb5b4b888 NEON DXT1 compression. 2019-06-28 14:24:16 +02:00
Bartosz Taudul
e8d4ba492b Unify shifts. 2019-06-28 13:05:32 +02:00
Bartosz Taudul
be4900c822 NEON CheckSolid. 2019-06-28 01:47:04 +02:00
Bartosz Taudul
3c066f1527 Simplify code. 2019-06-27 22:40:03 +02:00
Bartosz Taudul
72a0d4c2ab Rest of SSE DXTC compression. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
137b28e110 SSE CheckSolid. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
3d590b6b8c Initialize rpmalloc in compression thread. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
1939c31165 Experimental DXT1 compressor. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
79eb1b9029 Swap queue and dequeue only if queue has contents. 2019-06-27 13:37:09 +02:00
Bartosz Taudul
bb35f9a897 Compress frame images in a separate thread. 2019-06-27 13:24:35 +02:00
Bartosz Taudul
7ebd2162c6 Add ETC1 compression thread. 2019-06-26 22:57:24 +02:00
Bartosz Taudul
f565e11976 Store frame images in queue. 2019-06-26 22:52:24 +02:00
Bartosz Taudul
281dcf7c1f Cast to proper types. 2019-06-26 19:33:37 +02:00
Bartosz Taudul
8ce41b3543 Proper init order of thread local thread handle. 2019-06-26 19:32:52 +02:00
Bartosz Taudul
bc7f2c49c8 GetThreadHandle() might be used by application's code. 2019-06-25 15:44:49 +02:00
Bartosz Taudul
c749a2e3fe Add C API for plots and messages. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
48e08acb62 Add C API for frame markup. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
ee99ce833c Implement memory allocation tracking for C API. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
281477f7f9 Tokens must be retrieved for each enqueue. 2019-06-24 20:12:14 +02:00
Bartosz Taudul
06a41708a7 Move TLS accesses close together. 2019-06-24 19:38:44 +02:00
Bartosz Taudul
c4f0965851 Don't use cached thread id to retrieve main thread id. 2019-06-24 19:38:07 +02:00
Bartosz Taudul
a56c47a6a0 Store thread handle in a thread local variable.
This saves us a non-inlineable function call. Thread local block is
accessed anyway, since we need to get the token, so we already have the
pointer and don't need to get it a second time (which is done inside
Windows' GetCurrentThreadId()). We also don't need to store the thread
id in ScopedZone anymore, as it was a micro-optimization to save us the
second GetThreadHandle() call.

This change has a measurable effect of reducing enqueue time from ~10 to
~8 ns.

A further optimization would be to completely skip thread handle
retrieval during zone capture and do it instead on retrieval of data
from the queue. Since each thread has its own producer ("token"), the
thread handle should be accessible during the dequeue operation. This is
a much more invasive change, that would require a) modification of the
queue, b) additional processing of dequeued data to inject the thread
handle.
2019-06-24 19:19:47 +02:00
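The caching described in a56c47a6a0 can be approximated with a thread-local variable that is filled once per thread. This is a simplified sketch, not the actual change: `QueryThreadHandle` is a hypothetical stand-in for the operating system call, and `s_slowQueries` exists only to make the caching observable.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>

// Counts slow-path queries made by this thread (illustration only).
static thread_local int s_slowQueries = 0;

// Stand-in for a non-inlineable OS call such as GetCurrentThreadId().
static uint64_t QueryThreadHandle()
{
    s_slowQueries++;
    return uint64_t( std::hash<std::thread::id>{}( std::this_thread::get_id() ) );
}

// Returns the cached handle; the slow query runs once per thread.
static uint64_t GetThreadHandle()
{
    static thread_local const uint64_t handle = QueryThreadHandle();
    return handle;
}
```

Every call after the first on a given thread reads the cached value, which is the kind of saving the commit message reports as a drop from ~10 to ~8 ns per enqueue.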