Commit Graph

247 Commits

Author SHA1 Message Date
Bartosz Taudul
9360df89b1 Store announce and terminate time of locks. 2018-12-16 21:07:26 +01:00
Bartosz Taudul
f42d52923a No-op processing of lock terminate events. 2018-12-16 20:46:33 +01:00
Bartosz Taudul
793e955480 Fix crash when loading a trace with unresolved strings.
Unresolved strings ("???") are not saved, but the internal string
pointers are saved. Resolving such string pointers caused a crash.
2018-10-21 16:38:20 +02:00
Bartosz Taudul
9211ce42da Non-on-demand client is only able to handle one connection. 2018-09-09 19:42:06 +02:00
Bartosz Taudul
984a711666 Send protocol version to verify handshake. 2018-09-09 19:28:53 +02:00
Bartosz Taudul
270072b09e Require shibboleth match at start of connection. 2018-09-09 18:26:53 +02:00
Bartosz Taudul
806c8de463 Only one outgoing server connection is supported. 2018-09-09 17:47:20 +02:00
Bartosz Taudul
9f4d6692dc Proper way to get full frame count. 2018-09-01 12:38:12 +02:00
Bartosz Taudul
8f1acf2571 Store explicit program name and capture time. 2018-08-29 01:02:29 +02:00
Bartosz Taudul
bc6a553a3a Fetch thread names in memory events. 2018-08-28 01:48:19 +02:00
Bartosz Taudul
99b7a39c52 Save/load crash information. 2018-08-20 02:27:24 +02:00
Bartosz Taudul
3b526b074e Send crash report. 2018-08-20 02:23:55 +02:00
Bartosz Taudul
366ea35593 Allow crash event reporting.
When crash happens there's no longer anything to profile -- don't wait
for unfinished zones to finish before sending client terminate
confirmation.
2018-08-20 01:03:16 +02:00
Bartosz Taudul
e0a4b9c56a Save/load host info. 2018-08-19 18:28:48 +02:00
Bartosz Taudul
203d9b4b85 Store host info. 2018-08-19 18:21:56 +02:00
Bartosz Taudul
a15a287a6b Don't over-allocate vectors, when exact needed size is known.
This reduces memory usage when loading saved traces. Memory usage
reduction observed on a selected number of traces:

5625.76 MB -> 5330.29 MB
3292.94 MB -> 2978.66 MB
632.77 MB  -> 479.58 MB
681.32 MB  -> 506.27 MB
11.9 GB    -> 11.22 GB
854.21 MB  -> 806.17 MB
10.57 GB   -> 7175.31 MB
67.38 MB   -> 66.63 MB
2026.12 MB -> 1744.2 MB
86.55 MB   -> 85.57 MB
343.64 MB  -> 244.81 MB
201.93 MB  -> 162.25 MB
2018-08-09 19:41:15 +02:00
Bartosz Taudul
a14a6fa8fb Don't shadow variables. 2018-08-09 19:41:15 +02:00
Bartosz Taudul
a51da71fa4 Add lock, plot counts to worker. 2018-08-08 19:21:53 +02:00
Bartosz Taudul
d36b0aff45 Fix progress of loading GPU zones. 2018-08-05 13:07:58 +02:00
Bartosz Taudul
9d051cf5ee Add support for discontinuous frames. 2018-08-05 02:15:54 +02:00
Bartosz Taudul
6b8a3b25ba Fix drawing of last frame. 2018-08-04 23:19:35 +02:00
Bartosz Taudul
9b4348b497 Handle frame name queries. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
4424a7d7e8 Last time should never be zero. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
23dfc2e3fc Multiple frame sets support. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
0b4c2724ce Add strings to map directly in StringDiscovery. 2018-08-04 17:10:45 +02:00
Bartosz Taudul
ada9f78678 Use StringDiscovery for plots. 2018-08-04 16:33:03 +02:00
Bartosz Taudul
e174e2c12a Remove obsolete comment.
Nothing happens with the source data, as the strings are uniquely stored
in the StoreString() function.
2018-08-04 15:46:10 +02:00
Bartosz Taudul
6ef2d2d9a3 Track progress of loading plots. 2018-08-04 15:17:37 +02:00
Bartosz Taudul
18896044c4 Display explicit names of loaded things. 2018-07-29 16:56:46 +02:00
Bartosz Taudul
9f13475b52 Track trace version in worker. 2018-07-29 15:33:48 +02:00
Bartosz Taudul
13509c14f1 Save size of 'active' and 'frees' memory data structures. 2018-07-29 15:29:56 +02:00
Bartosz Taudul
00d07e39f7 Save threadExpand size to allow vector preallocation. 2018-07-29 15:19:44 +02:00
Bartosz Taudul
bff6eb4c34 Save source location zones counts.
This allows preallocation of zones-in-source-location vectors.
2018-07-29 14:58:01 +02:00
Bartosz Taudul
12b90d1630 Move tracy version to a separate header. 2018-07-29 14:20:44 +02:00
Bartosz Taudul
ccc5c37af5 Always count source location zones. 2018-07-29 14:16:13 +02:00
Bartosz Taudul
4456c8a454 Reserve space for string data. 2018-07-29 14:13:29 +02:00
Bartosz Taudul
648070e6a1 Include each loaded zone in sub progress. 2018-07-28 19:22:28 +02:00
Bartosz Taudul
4741dab833 Track sub progress. 2018-07-28 19:05:01 +02:00
Bartosz Taudul
a14238c199 Add sub progress display. 2018-07-28 18:56:52 +02:00
Bartosz Taudul
a46425f4e9 Adjust load stages. 2018-07-28 18:26:00 +02:00
Bartosz Taudul
0bf0ceed3d Track trace loading progress. 2018-07-28 17:59:17 +02:00
Bartosz Taudul
d84d0b7754 Don't try to read empty timelines. 2018-07-22 21:15:28 +02:00
Bartosz Taudul
25116a8059 Don't try to compress invalid thread. 2018-07-22 21:13:42 +02:00
Bartosz Taudul
010cf66e43 Call Vector destructors. 2018-07-22 21:01:45 +02:00
Bartosz Taudul
29159069ab Properly initialize child index. 2018-07-22 20:14:55 +02:00
Bartosz Taudul
7d7877517e Also remove child vectors from GPU events. 2018-07-22 19:47:01 +02:00
Bartosz Taudul
3a934b2ba3 Store children vectors in a separate data collection.
This reduces per-zone memory cost by 9 bytes if there are no children
and increases it by 4 bytes, if there are children. This is universally
a better solution, as the following data shows:

+++ /home/wolf/desktop/tracy-old/android.tracy +++
Vectors: 2794480
Size 0: 2373070 (84.92%)
Size 1: 70237 (2.51%)
Size 2+: 351173 (12.57%)
+++ /home/wolf/desktop/tracy-old/asset-new.tracy +++
Vectors: 1799227
Size 0: 1482691 (82.41%)
Size 1: 93272 (5.18%)
Size 2+: 223264 (12.41%)
+++ /home/wolf/desktop/tracy-old/asset-new-id.tracy +++
Vectors: 1977996
Size 0: 1640817 (82.95%)
Size 1: 97198 (4.91%)
Size 2+: 239981 (12.13%)
+++ /home/wolf/desktop/tracy-old/asset-old.tracy +++
Vectors: 1782395
Size 0: 1471437 (82.55%)
Size 1: 88813 (4.98%)
Size 2+: 222145 (12.46%)
+++ /home/wolf/desktop/tracy-old/big.tracy +++
Vectors: 180794047
Size 0: 172696094 (95.52%)
Size 1: 2799772 (1.55%)
Size 2+: 5298181 (2.93%)
+++ /home/wolf/desktop/tracy-old/darkrl.tracy +++
Vectors: 12014129
Size 0: 11611324 (96.65%)
Size 1: 134980 (1.12%)
Size 2+: 267825 (2.23%)
+++ /home/wolf/desktop/tracy-old/mem.tracy +++
Vectors: 383097
Size 0: 321932 (84.03%)
Size 1: 854 (0.22%)
Size 2+: 60311 (15.74%)
+++ /home/wolf/desktop/tracy-old/new.tracy +++
Vectors: 77536
Size 0: 63035 (81.30%)
Size 1: 8886 (11.46%)
Size 2+: 5615 (7.24%)
+++ /home/wolf/desktop/tracy-old/selfprofile.tracy +++
Vectors: 22940871
Size 0: 22704868 (98.97%)
Size 1: 73000 (0.32%)
Size 2+: 163003 (0.71%)
+++ /home/wolf/desktop/tracy-old/tbrowser.tracy +++
Vectors: 962682
Size 0: 695380 (72.23%)
Size 1: 43007 (4.47%)
Size 2+: 224295 (23.30%)
+++ /home/wolf/desktop/tracy-old/virtualfile_hc.tracy +++
Vectors: 529170
Size 0: 449386 (84.92%)
Size 1: 15694 (2.97%)
Size 2+: 64090 (12.11%)
+++ /home/wolf/desktop/tracy-old/zfile_hc.tracy +++
Vectors: 264849
Size 0: 220589 (83.29%)
Size 1: 9386 (3.54%)
Size 2+: 34874 (13.17%)
2018-07-22 16:05:50 +02:00
Bartosz Taudul
fc310ce15a Fix check. 2018-07-17 18:29:07 +02:00
Rokas Kupstys
8a8faa3d6c Added __has_include(<execution>) back. 2018-07-17 19:25:26 +03:00
Rokas Kupstys
5c75fe292f Fix msvc builds when required c++ standard version is set to lower than c++17.
Also use latest available c++ standard which allows using older VS versions that only support c++14.
2018-07-17 18:29:48 +03:00
Bartosz Taudul
c6ea032de3 GPU source location may not yet be available. 2018-07-15 19:00:40 +02:00
Bartosz Taudul
21da3bca63 Don't create lz4buf on stack. 2018-07-14 16:02:33 +02:00
Bartosz Taudul
561d2dc360 Use the fastest mutex available.
The selection is based on the following test results:

MSVC:
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.641 ns/iter
     2 thread contention: 141.559 ns/iter
     3 thread contention: 242.733 ns/iter
     4 thread contention: 409.807 ns/iter
     5 thread contention: 561.544 ns/iter
     6 thread contention: 785.845 ns/iter
=> std::mutex
     No contention: 19.190 ns/iter
     2 thread contention: 39.305 ns/iter
     3 thread contention: 58.999 ns/iter
     4 thread contention: 59.532 ns/iter
     5 thread contention: 103.539 ns/iter
     6 thread contention: 110.314 ns/iter
=> std::shared_timed_mutex
     No contention: 45.487 ns/iter
     2 thread contention: 96.351 ns/iter
     3 thread contention: 142.871 ns/iter
     4 thread contention: 184.999 ns/iter
     5 thread contention: 336.608 ns/iter
     6 thread contention: 542.551 ns/iter
=> std::shared_mutex
     No contention: 10.861 ns/iter
     2 thread contention: 17.495 ns/iter
     3 thread contention: 31.126 ns/iter
     4 thread contention: 40.468 ns/iter
     5 thread contention: 15.677 ns/iter
     6 thread contention: 64.505 ns/iter

Cygwin (clang):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.536 ns/iter
     2 thread contention: 121.082 ns/iter
     3 thread contention: 396.430 ns/iter
     4 thread contention: 672.555 ns/iter
     5 thread contention: 1327.761 ns/iter
     6 thread contention: 14151.955 ns/iter
=> std::mutex
     No contention: 62.583 ns/iter
     2 thread contention: 3990.464 ns/iter
     3 thread contention: 7161.189 ns/iter
     4 thread contention: 9870.820 ns/iter
     5 thread contention: 12355.178 ns/iter
     6 thread contention: 14694.903 ns/iter
=> std::shared_timed_mutex
     No contention: 91.687 ns/iter
     2 thread contention: 1115.037 ns/iter
     3 thread contention: 4183.792 ns/iter
     4 thread contention: 15283.491 ns/iter
     5 thread contention: 27812.477 ns/iter
     6 thread contention: 35028.140 ns/iter
=> std::shared_mutex
     No contention: 91.764 ns/iter
     2 thread contention: 1051.826 ns/iter
     3 thread contention: 5574.720 ns/iter
     4 thread contention: 15721.416 ns/iter
     5 thread contention: 27721.487 ns/iter
     6 thread contention: 35420.404 ns/iter

Linux (x64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 13.487 ns/iter
     2 thread contention: 210.317 ns/iter
     3 thread contention: 430.855 ns/iter
     4 thread contention: 510.533 ns/iter
     5 thread contention: 1003.609 ns/iter
     6 thread contention: 1787.683 ns/iter
=> std::mutex
     No contention: 12.403 ns/iter
     2 thread contention: 157.122 ns/iter
     3 thread contention: 186.791 ns/iter
     4 thread contention: 265.073 ns/iter
     5 thread contention: 283.778 ns/iter
     6 thread contention: 270.687 ns/iter
=> std::shared_timed_mutex
     No contention: 21.509 ns/iter
     2 thread contention: 150.179 ns/iter
     3 thread contention: 256.574 ns/iter
     4 thread contention: 415.351 ns/iter
     5 thread contention: 611.532 ns/iter
     6 thread contention: 944.695 ns/iter
=> std::shared_mutex
     No contention: 20.805 ns/iter
     2 thread contention: 157.034 ns/iter
     3 thread contention: 244.025 ns/iter
     4 thread contention: 406.269 ns/iter
     5 thread contention: 387.985 ns/iter
     6 thread contention: 468.550 ns/iter

Linux (arm64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 20.891 ns/iter
     2 thread contention: 211.037 ns/iter
     3 thread contention: 409.962 ns/iter
     4 thread contention: 657.441 ns/iter
     5 thread contention: 828.405 ns/iter
     6 thread contention: 1131.827 ns/iter
=> std::mutex
     No contention: 50.884 ns/iter
     2 thread contention: 103.620 ns/iter
     3 thread contention: 332.429 ns/iter
     4 thread contention: 620.802 ns/iter
     5 thread contention: 783.943 ns/iter
     6 thread contention: 834.002 ns/iter
=> std::shared_timed_mutex
     No contention: 64.948 ns/iter
     2 thread contention: 173.191 ns/iter
     3 thread contention: 490.352 ns/iter
     4 thread contention: 660.668 ns/iter
     5 thread contention: 1014.546 ns/iter
     6 thread contention: 1451.553 ns/iter
=> std::shared_mutex
     No contention: 64.521 ns/iter
     2 thread contention: 195.222 ns/iter
     3 thread contention: 490.819 ns/iter
     4 thread contention: 654.786 ns/iter
     5 thread contention: 955.759 ns/iter
     6 thread contention: 1282.544 ns/iter
2018-07-14 00:39:01 +02:00
Bartosz Taudul
96042891f7 Reintroduce explicit template type for std::lock_guard.
Requested in issue #4 for support of older MSVC versions.
2018-07-13 12:30:29 +02:00
Bartosz Taudul
90a874f311 Require MSVC 15.7 for <execution> support. 2018-07-13 12:26:02 +02:00
Bartosz Taudul
c8b5b9447d Ignore dangling memory frees in on-demand mode. 2018-07-12 01:35:32 +02:00
Bartosz Taudul
e5064dec1e Store on-demand connection state. 2018-07-12 01:21:04 +02:00
Bartosz Taudul
d1ddaa8d59 Store frame offset in trace dumps. 2018-07-10 22:56:41 +02:00
Bartosz Taudul
a78981e040 Store on-demand frame offset. 2018-07-10 22:42:00 +02:00
Bartosz Taudul
6a9caabc63 Send on-demand initial payload message. 2018-07-10 22:37:39 +02:00
Bartosz Taudul
c056f3be41 Send keep alive messages to determine if client disconnected. 2018-07-10 21:39:17 +02:00
Bartosz Taudul
cb100e261c Return custom zone names. 2018-06-29 16:12:40 +02:00
Bartosz Taudul
053284b1c7 Process custom free-form zone names. 2018-06-29 16:12:17 +02:00
Bartosz Taudul
865e8d8506 Extract zone name getting functionality. 2018-06-29 15:14:20 +02:00
Bartosz Taudul
4a467b6d03 Remove GPU resync leftovers. 2018-06-28 00:48:23 +02:00
Bartosz Taudul
ab2945b988 Slab allocator is not thread safe. 2018-06-24 17:10:46 +02:00
Bartosz Taudul
b0aa13f4af Callstack getters are const. 2018-06-24 16:15:49 +02:00
Bartosz Taudul
11cf650be6 Fix GPU queries ordering.
With multithreaded Vulkan rendering it is possible that GPU time queries
will be sent in a different order than the originating CPU queries were
made. This commit changes the in-order queue to a map of queries,
waiting to be resolved.
2018-06-22 16:37:54 +02:00
Bartosz Taudul
af0c64c888 Remove GPU resync support.
The whole concept is not really reliable. And it forces CPU to GPU sync,
which is bad.
2018-06-22 16:34:51 +02:00
Bartosz Taudul
cd5ca3e754 Don't use hash table to store 256 pointers. 2018-06-22 15:14:44 +02:00
Bartosz Taudul
3a885bb8fd Support callstack collection for OpenGL GPU zones. 2018-06-22 02:13:35 +02:00
Bartosz Taudul
35dc2f796e Process GpuZoneBeginCallstack queue event. 2018-06-22 01:56:32 +02:00
Bartosz Taudul
4992ae6b39 Take callstack field in ZoneEvent into account in save/load. 2018-06-22 01:30:08 +02:00
Bartosz Taudul
5e01a8ead9 Process callstack queue event. 2018-06-22 01:15:49 +02:00
Bartosz Taudul
205a4e4ca2 Add callstack index to ZoneEvent. 2018-06-22 01:11:03 +02:00
Bartosz Taudul
978e168cbd Handle ZoneBeginCallstack queue event.
This is identical to ZoneBegin handling, but requires some additional
bookkeeping to account for the incoming callstack information.
2018-06-22 01:07:25 +02:00
Bartosz Taudul
973eab2b4a Fix typo. 2018-06-20 23:42:00 +02:00
Bartosz Taudul
2a618c90d5 Properly save compressed thread in GPU events. 2018-06-20 23:12:49 +02:00
Bartosz Taudul
7912807133 Wait for transfer of pending callback frames. 2018-06-20 14:57:48 +02:00
Bartosz Taudul
60395c85e0 Wait for pending callstacks. 2018-06-20 14:54:08 +02:00
Bartosz Taudul
9a5329b97d Save and load callstack frames. 2018-06-20 01:59:25 +02:00
Bartosz Taudul
e56ee377f4 Fix off-by-one. 2018-06-20 01:54:27 +02:00
Bartosz Taudul
88b1955a5a Filename in callstack frame is not a persistent pointer. 2018-06-20 01:26:05 +02:00
Bartosz Taudul
4000f27e15 Stack frame accessor. 2018-06-20 01:18:59 +02:00
Bartosz Taudul
0c0afa5ac7 Process callstack frames. 2018-06-20 01:07:09 +02:00
Bartosz Taudul
203744cdd9 Callstack frame queries. 2018-06-20 00:25:26 +02:00
Bartosz Taudul
06f34052a5 Have to track callstacks of both alloc and free. 2018-06-19 22:08:47 +02:00
Bartosz Taudul
0de279005b Load saved callstack payload. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
14b71e988b Properly skip memory event data. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
4033d74479 Callstack payload index 0 is invalid. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
b6e71dd909 Load memory event callstack index. 2018-06-19 21:51:06 +02:00
Bartosz Taudul
7c1333ce2f Save callstack payload. 2018-06-19 21:39:52 +02:00
Bartosz Taudul
2940230fcf Save callstack index in memory events. 2018-06-19 21:39:42 +02:00
Bartosz Taudul
77db91253b Assign callstack idx to memory event. 2018-06-19 21:34:36 +02:00
Bartosz Taudul
c28465aa7c Store unique callstack payloads. 2018-06-19 21:16:02 +02:00
Bartosz Taudul
cbc9ede3ca No-op callstack payload handling. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
6a63d09a49 Don't check for each type, if range check is possible. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
e51eef3dcd Process memory events with callstack. 2018-06-19 18:52:45 +02:00
Bartosz Taudul
59dc55002b Callstack ptr in server data structures.
Will be probably reduced to 32-bit index later on.
2018-06-19 18:52:10 +02:00
Bartosz Taudul
bb0631585c Store thread id of GPU events. 2018-06-17 19:07:07 +02:00