695 Commits

Author SHA1 Message Date
Bartosz Taudul
561d2dc360 Use the fastest mutex available.
The selection is based on the following test results:

MSVC:
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.641 ns/iter
     2 thread contention: 141.559 ns/iter
     3 thread contention: 242.733 ns/iter
     4 thread contention: 409.807 ns/iter
     5 thread contention: 561.544 ns/iter
     6 thread contention: 785.845 ns/iter
=> std::mutex
     No contention: 19.190 ns/iter
     2 thread contention: 39.305 ns/iter
     3 thread contention: 58.999 ns/iter
     4 thread contention: 59.532 ns/iter
     5 thread contention: 103.539 ns/iter
     6 thread contention: 110.314 ns/iter
=> std::shared_timed_mutex
     No contention: 45.487 ns/iter
     2 thread contention: 96.351 ns/iter
     3 thread contention: 142.871 ns/iter
     4 thread contention: 184.999 ns/iter
     5 thread contention: 336.608 ns/iter
     6 thread contention: 542.551 ns/iter
=> std::shared_mutex
     No contention: 10.861 ns/iter
     2 thread contention: 17.495 ns/iter
     3 thread contention: 31.126 ns/iter
     4 thread contention: 40.468 ns/iter
     5 thread contention: 15.677 ns/iter
     6 thread contention: 64.505 ns/iter

Cygwin (clang):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.536 ns/iter
     2 thread contention: 121.082 ns/iter
     3 thread contention: 396.430 ns/iter
     4 thread contention: 672.555 ns/iter
     5 thread contention: 1327.761 ns/iter
     6 thread contention: 14151.955 ns/iter
=> std::mutex
     No contention: 62.583 ns/iter
     2 thread contention: 3990.464 ns/iter
     3 thread contention: 7161.189 ns/iter
     4 thread contention: 9870.820 ns/iter
     5 thread contention: 12355.178 ns/iter
     6 thread contention: 14694.903 ns/iter
=> std::shared_timed_mutex
     No contention: 91.687 ns/iter
     2 thread contention: 1115.037 ns/iter
     3 thread contention: 4183.792 ns/iter
     4 thread contention: 15283.491 ns/iter
     5 thread contention: 27812.477 ns/iter
     6 thread contention: 35028.140 ns/iter
=> std::shared_mutex
     No contention: 91.764 ns/iter
     2 thread contention: 1051.826 ns/iter
     3 thread contention: 5574.720 ns/iter
     4 thread contention: 15721.416 ns/iter
     5 thread contention: 27721.487 ns/iter
     6 thread contention: 35420.404 ns/iter

Linux (x64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 13.487 ns/iter
     2 thread contention: 210.317 ns/iter
     3 thread contention: 430.855 ns/iter
     4 thread contention: 510.533 ns/iter
     5 thread contention: 1003.609 ns/iter
     6 thread contention: 1787.683 ns/iter
=> std::mutex
     No contention: 12.403 ns/iter
     2 thread contention: 157.122 ns/iter
     3 thread contention: 186.791 ns/iter
     4 thread contention: 265.073 ns/iter
     5 thread contention: 283.778 ns/iter
     6 thread contention: 270.687 ns/iter
=> std::shared_timed_mutex
     No contention: 21.509 ns/iter
     2 thread contention: 150.179 ns/iter
     3 thread contention: 256.574 ns/iter
     4 thread contention: 415.351 ns/iter
     5 thread contention: 611.532 ns/iter
     6 thread contention: 944.695 ns/iter
=> std::shared_mutex
     No contention: 20.805 ns/iter
     2 thread contention: 157.034 ns/iter
     3 thread contention: 244.025 ns/iter
     4 thread contention: 406.269 ns/iter
     5 thread contention: 387.985 ns/iter
     6 thread contention: 468.550 ns/iter

Linux (arm64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 20.891 ns/iter
     2 thread contention: 211.037 ns/iter
     3 thread contention: 409.962 ns/iter
     4 thread contention: 657.441 ns/iter
     5 thread contention: 828.405 ns/iter
     6 thread contention: 1131.827 ns/iter
=> std::mutex
     No contention: 50.884 ns/iter
     2 thread contention: 103.620 ns/iter
     3 thread contention: 332.429 ns/iter
     4 thread contention: 620.802 ns/iter
     5 thread contention: 783.943 ns/iter
     6 thread contention: 834.002 ns/iter
=> std::shared_timed_mutex
     No contention: 64.948 ns/iter
     2 thread contention: 173.191 ns/iter
     3 thread contention: 490.352 ns/iter
     4 thread contention: 660.668 ns/iter
     5 thread contention: 1014.546 ns/iter
     6 thread contention: 1451.553 ns/iter
=> std::shared_mutex
     No contention: 64.521 ns/iter
     2 thread contention: 195.222 ns/iter
     3 thread contention: 490.819 ns/iter
     4 thread contention: 654.786 ns/iter
     5 thread contention: 955.759 ns/iter
     6 thread contention: 1282.544 ns/iter
2018-07-14 00:39:01 +02:00
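One way to read the numbers above is as a per-platform choice of mutex type made at compile time. The following is a minimal sketch of that idea, not the code from this commit: the alias name `FastMutex` and the `GuardedCounter` class are hypothetical, and the Cygwin case (where NonRecursiveBenaphore wins in the results above) is left out because it would need a custom lock implementation.

```cpp
#include <mutex>
#if defined(_MSC_VER)
#  include <shared_mutex>
#endif

// Hypothetical alias; the name FastMutex is an assumption made for this sketch.
#if defined(_MSC_VER)
// MSVC results above: std::shared_mutex is the fastest at every contention level.
using FastMutex = std::shared_mutex;
#else
// Linux results above: std::mutex is the fastest, or close to it, under contention.
using FastMutex = std::mutex;
#endif

// Small guarded counter showing that callers depend only on the alias.
class GuardedCounter
{
public:
    void Increment()
    {
        // std::lock_guard takes an exclusive lock on either mutex type.
        std::lock_guard<FastMutex> lock( m_lock );
        ++m_value;
    }

    int Get()
    {
        std::lock_guard<FastMutex> lock( m_lock );
        return m_value;
    }

private:
    FastMutex m_lock;
    int m_value = 0;
};
```

The explicit `std::lock_guard<FastMutex>` template argument keeps the sketch usable with compilers that lack class template argument deduction.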
Bartosz Taudul
96042891f7 Reintroduce explicit template type for std::lock_guard.
Requested in issue #4 for support of older MSVC versions.
2018-07-13 12:30:29 +02:00
Bartosz Taudul
90a874f311 Require MSVC 15.7 for <execution> support. 2018-07-13 12:26:02 +02:00
Bartosz Taudul
c8b5b9447d Ignore dangling memory frees in on-demand mode. 2018-07-12 01:35:32 +02:00
Bartosz Taudul
e5064dec1e Store on-demand connection state. 2018-07-12 01:21:04 +02:00
Bartosz Taudul
d1ddaa8d59 Store frame offset in trace dumps. 2018-07-10 22:56:41 +02:00
Bartosz Taudul
a78981e040 Store on-demand frame offset. 2018-07-10 22:42:00 +02:00
Bartosz Taudul
6a9caabc63 Send on-demand initial payload message. 2018-07-10 22:37:39 +02:00
Bartosz Taudul
c056f3be41 Send keep alive messages to determine if client disconnected. 2018-07-10 21:39:17 +02:00
Bartosz Taudul
cb100e261c Return custom zone names. 2018-06-29 16:12:40 +02:00
Bartosz Taudul
053284b1c7 Process custom free-form zone names. 2018-06-29 16:12:17 +02:00
Bartosz Taudul
865e8d8506 Extract zone name getting functionality. 2018-06-29 15:14:20 +02:00
Bartosz Taudul
4a467b6d03 Remove GPU resync leftovers. 2018-06-28 00:48:23 +02:00
Bartosz Taudul
ab2945b988 Slab allocator is not thread safe. 2018-06-24 17:10:46 +02:00
Bartosz Taudul
b0aa13f4af Callstack getters are const. 2018-06-24 16:15:49 +02:00
Bartosz Taudul
11cf650be6 Fix GPU queries ordering.
With multithreaded Vulkan rendering it is possible that GPU time queries
will be sent in a different order than the originating CPU queries were
made. This commit changes the in-order queue to a map of queries,
waiting to be resolved.
2018-06-22 16:37:54 +02:00
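The change from an in-order queue to a map of pending queries can be sketched as follows. This is an illustration under assumptions, not the commit's code: `GpuZone`, `PendingGpuQueries`, the 16-bit query id and the use of `std::unordered_map` are all hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified GPU zone record.
struct GpuZone
{
    int64_t cpuStart = 0;
    int64_t gpuStart = -1;   // stays -1 until the GPU timestamp arrives
};

// Queries waiting to be resolved, keyed by query id instead of arrival order.
class PendingGpuQueries
{
public:
    // Called when the CPU side issues a timestamp query for a zone.
    void Issue( uint16_t queryId, GpuZone* zone )
    {
        assert( m_pending.find( queryId ) == m_pending.end() );
        m_pending.emplace( queryId, zone );
    }

    // Called when a GPU timestamp arrives, in any order.
    // Returns false if no query with this id is pending.
    bool Resolve( uint16_t queryId, int64_t gpuTime )
    {
        auto it = m_pending.find( queryId );
        if( it == m_pending.end() ) return false;
        it->second->gpuStart = gpuTime;
        m_pending.erase( it );
        return true;
    }

    std::size_t Size() const { return m_pending.size(); }

private:
    std::unordered_map<uint16_t, GpuZone*> m_pending;
};
```

With a plain queue, a result for query 1 arriving before the result for query 0 would be attached to the wrong zone. Looking the zone up by id avoids that.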
Bartosz Taudul
af0c64c888 Remove GPU resync support.
The whole concept is not really reliable. And it forces CPU to GPU sync,
which is bad.
2018-06-22 16:34:51 +02:00
Bartosz Taudul
cd5ca3e754 Don't use hash table to store 256 pointers. 2018-06-22 15:14:44 +02:00
Bartosz Taudul
3a885bb8fd Support callstack collection for OpenGL GPU zones. 2018-06-22 02:13:35 +02:00
Bartosz Taudul
35dc2f796e Process GpuZoneBeginCallstack queue event. 2018-06-22 01:56:32 +02:00
Bartosz Taudul
4992ae6b39 Take callstack field in ZoneEvent into account in save/load. 2018-06-22 01:30:08 +02:00
Bartosz Taudul
5e01a8ead9 Process callstack queue event. 2018-06-22 01:15:49 +02:00
Bartosz Taudul
205a4e4ca2 Add callstack index to ZoneEvent. 2018-06-22 01:11:03 +02:00
Bartosz Taudul
978e168cbd Handle ZoneBeginCallstack queue event.
This is identical to ZoneBegin handling, but requires some additional
bookkeeping to account for the incoming callstack information.
2018-06-22 01:07:25 +02:00
Bartosz Taudul
973eab2b4a Fix typo. 2018-06-20 23:42:00 +02:00
Bartosz Taudul
2a618c90d5 Properly save compressed thread in GPU events. 2018-06-20 23:12:49 +02:00
Bartosz Taudul
7912807133 Wait for transfer of pending callback frames. 2018-06-20 14:57:48 +02:00
Bartosz Taudul
60395c85e0 Wait for pending callstacks. 2018-06-20 14:54:08 +02:00
Bartosz Taudul
9a5329b97d Save and load callstack frames. 2018-06-20 01:59:25 +02:00
Bartosz Taudul
e56ee377f4 Fix off-by-one. 2018-06-20 01:54:27 +02:00
Bartosz Taudul
88b1955a5a Filename in callstack frame is not a persistent pointer. 2018-06-20 01:26:05 +02:00
Bartosz Taudul
4000f27e15 Stack frame accessor. 2018-06-20 01:18:59 +02:00
Bartosz Taudul
0c0afa5ac7 Process callstack frames. 2018-06-20 01:07:09 +02:00
Bartosz Taudul
203744cdd9 Callstack frame queries. 2018-06-20 00:25:26 +02:00
Bartosz Taudul
06f34052a5 Have to track callstacks of both alloc and free. 2018-06-19 22:08:47 +02:00
Bartosz Taudul
0de279005b Load saved callstack payload. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
14b71e988b Properly skip memory event data. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
4033d74479 Callstack payload index 0 is invalid. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
b6e71dd909 Load memory event callstack index. 2018-06-19 21:51:06 +02:00
Bartosz Taudul
7c1333ce2f Save callstack payload. 2018-06-19 21:39:52 +02:00
Bartosz Taudul
2940230fcf Save callstack index in memory events. 2018-06-19 21:39:42 +02:00
Bartosz Taudul
77db91253b Assign callstack idx to memory event. 2018-06-19 21:34:36 +02:00
Bartosz Taudul
c28465aa7c Store unique callstack payloads. 2018-06-19 21:16:02 +02:00
Bartosz Taudul
cbc9ede3ca No-op callstack payload handling. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
6a63d09a49 Don't check for each type, if range check is possible. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
e51eef3dcd Process memory events with callstack. 2018-06-19 18:52:45 +02:00
Bartosz Taudul
59dc55002b Callstack ptr in server data structures.
Will probably be reduced to a 32-bit index later on.
2018-06-19 18:52:10 +02:00
Bartosz Taudul
bb0631585c Store thread id of GPU events. 2018-06-17 19:07:07 +02:00
Bartosz Taudul
cfd7ac3957 Map compressed thread id 0 to real thread id 0. 2018-06-17 19:03:06 +02:00
Bartosz Taudul
d5a4c693d8 Take GPU timestamp period into account. 2018-06-17 18:49:56 +02:00
Bartosz Taudul
dcd6cac078 Save GPU timestamp period.
Bump file version to 0.3.2.
2018-06-17 18:27:42 +02:00
Bartosz Taudul
2be1d1d2b2 Use proper type. 2018-06-07 13:30:46 +02:00
Bartosz Taudul
b7930f67da Calculate total self time of zones. 2018-06-06 00:39:22 +02:00
Bartosz Taudul
53aea660c8 Store thread id in MessageData. 2018-05-25 21:10:38 +02:00
Bartosz Taudul
bb0246730f Don't save MessageData padding.
This requires file version bump to 0.3.1.
2018-05-25 21:10:38 +02:00
Bartosz Taudul
312c20b0bc Fallback to pdqsort if parallel STL is not available. 2018-05-12 22:41:18 +02:00
Bartosz Taudul
920bfc8c82 Parallelize (big) sorts in worker. 2018-05-08 01:40:22 +02:00
Bartosz Taudul
dbc963d55c Drop template argument from std::lock_guard. 2018-05-08 01:25:16 +02:00
Bartosz Taudul
3768ed5dd7 Don't reconstruct mem plot if there's no mem event data. 2018-05-04 16:08:16 +02:00
Bartosz Taudul
e7ffe288e6 One less FileWrite::Write() call. 2018-05-04 15:11:19 +02:00
Bartosz Taudul
e058bb34c1 CompressThread body must be available. 2018-05-03 18:43:51 +02:00
Bartosz Taudul
b18841aa75 Store ordered list of memory frees. 2018-05-02 17:59:50 +02:00
Bartosz Taudul
754e79b443 Setup memory plot pointer on dump load. 2018-05-02 17:18:52 +02:00
Bartosz Taudul
7266a979c3 Omit stack. 2018-05-01 02:13:49 +02:00
Bartosz Taudul
8beb1c1a39 Add thread compression cache.
Observation: calls to CompressThread() are likely to be repeated with
the same value. Exploit that by storing last query and its result.
2018-05-01 01:29:25 +02:00
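The observation above amounts to a one-entry cache in front of the regular lookup. A minimal sketch, assuming 64-bit thread ids are mapped to 16-bit indices; the `ThreadCompressor` class and its members are hypothetical, and only the `CompressThread()` name comes from the commit message:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical compressor: maps 64-bit thread ids to small 16-bit indices.
class ThreadCompressor
{
public:
    uint16_t CompressThread( uint64_t thread )
    {
        // Fast path: same thread as in the previous call, skip the map lookup.
        if( m_hasLast && thread == m_lastThread ) return m_lastIdx;

        const uint16_t idx = CompressThreadSlow( thread );
        m_lastThread = thread;
        m_lastIdx = idx;
        m_hasLast = true;
        return idx;
    }

    uint64_t DecompressThread( uint16_t idx ) const
    {
        assert( idx < m_expand.size() );
        return m_expand[idx];
    }

private:
    // Slow path: look the id up, assigning the next free index to new threads.
    uint16_t CompressThreadSlow( uint64_t thread )
    {
        auto it = m_map.find( thread );
        if( it != m_map.end() ) return it->second;

        assert( m_expand.size() < 65536 );
        const auto idx = uint16_t( m_expand.size() );
        m_expand.push_back( thread );
        m_map.emplace( thread, idx );
        return idx;
    }

    std::unordered_map<uint64_t, uint16_t> m_map;
    std::vector<uint64_t> m_expand;
    uint64_t m_lastThread = 0;
    uint16_t m_lastIdx = 0;
    bool m_hasLast = false;
};
```

The cache pays off when consecutive events come from the same thread, which is the access pattern the commit message describes.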
Bartosz Taudul
ec58aa4ce1 Don't increase vector size in each iteration. 2018-04-30 13:57:12 +02:00
Bartosz Taudul
553e3ca38b Optimize mem plot reconstruction loop. 2018-04-30 13:45:36 +02:00
Bartosz Taudul
76f0c8fafe Sort source location zones on a separate thread. 2018-04-30 03:54:09 +02:00
Bartosz Taudul
63e4f6fa04 Directly store values. 2018-04-30 03:30:19 +02:00
Bartosz Taudul
e5cb241c19 Optimize creation of vector of frees. 2018-04-29 13:40:47 +02:00
Bartosz Taudul
3eb73b8d43 Move memory plot reconstruction to a background thread. 2018-04-29 13:40:04 +02:00
Bartosz Taudul
bc84ebc338 Read/write LockEvent data in one go. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
c5133e0b4e Walk lockmap timeline pointer. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
9769cc4d7d Read/write most of MemEvent in one go. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
d5f0f0939d No need to track min memory usage.
At least if client instrumentation was not broken and the data makes
sense.
2018-04-29 02:57:20 +02:00
Bartosz Taudul
7fdc6f5453 Zero as initial max value is fine too. 2018-04-29 02:56:23 +02:00
Bartosz Taudul
723f98d24b Overflow checks are not needed. 2018-04-29 02:47:25 +02:00
Bartosz Taudul
b06f445de9 Don't use stack to write two values... 2018-04-29 02:32:20 +02:00
Bartosz Taudul
333d3a92c8 Perform memory usage calculation on doubles. 2018-04-29 02:29:06 +02:00
Bartosz Taudul
aceaed25b9 Walk plot data pointer. 2018-04-29 02:11:47 +02:00
Bartosz Taudul
868fbace5a Don't compress thread twice, if it's the same. 2018-04-29 02:04:51 +02:00
Bartosz Taudul
fdaebc2bd8 No need to perform space check here. 2018-04-29 01:38:54 +02:00
Bartosz Taudul
d64f0390da Don't use std::sort. 2018-04-29 01:23:30 +02:00
Bartosz Taudul
7df7bf1745 Begin memory plot with no memory usage. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
a0b8ed2e50 Restore memory plot when loading data dump. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
d8bfe7de2e Create memory plot based on memory alloc/free events. 2018-04-28 15:49:12 +02:00
Bartosz Taudul
cd34ed6968 Two plot types: user and memory.
Only user plots are saved in a dump file.
2018-04-28 15:48:05 +02:00
Bartosz Taudul
1fb47899b2 Fix skipping lock data with new dump version. 2018-04-22 01:26:51 +02:00
Bartosz Taudul
436cd2b6cf Drop '###Profiler' from capture name. 2018-04-21 23:29:28 +02:00
Bartosz Taudul
d1e185e176 Cleanup message data. 2018-04-21 20:36:33 +02:00
Bartosz Taudul
4cd9cf5dd9 Cleanup zone data. 2018-04-21 20:34:29 +02:00
Bartosz Taudul
0de5bcacaf Free plot data. 2018-04-21 20:12:16 +02:00
Bartosz Taudul
dda25cf66a Cosmetics. 2018-04-21 20:11:59 +02:00
Bartosz Taudul
cb298893e7 Fix skipping lock data. 2018-04-21 16:02:36 +02:00
Bartosz Taudul
121cced681 Don't save unneeded lock data.
Store only the minimal lock information required and calculate lock
counts, wait lists, etc. at load time.
2018-04-21 15:42:08 +02:00
Bartosz Taudul
a63f214964 Use static assert where static assert is due. 2018-04-21 14:47:15 +02:00
Bartosz Taudul
36efe96e9d Throw exception when trying to open unsupported dump version. 2018-04-21 14:18:42 +02:00
Bartosz Taudul
d9fd1ce74a Add dump file header. 2018-04-21 13:45:48 +02:00
Bartosz Taudul
84fd351fba Allow partial load of data from dump. 2018-04-20 16:03:09 +02:00
Bartosz Taudul
3df7c70f99 Optimize mem alloc processing. 2018-04-10 16:06:01 +02:00
Bartosz Taudul
be50fb26b5 Remove useless assert. 2018-04-10 14:37:17 +02:00
Bartosz Taudul
4e1dbb3973 Fix lock announce processing. 2018-04-09 14:28:40 +02:00
Bartosz Taudul
bf99bff87d Store MemEvents directly in the vector. 2018-04-03 14:17:51 +02:00
Bartosz Taudul
821b08fbe4 Thread compression state is not preserved. 2018-04-02 14:52:36 +02:00
Bartosz Taudul
1fa943d109 Save/load memory data. 2018-04-02 02:05:39 +02:00
Bartosz Taudul
52f59c90bf Track memory usage. 2018-04-02 00:00:49 +02:00
Bartosz Taudul
a574f98f0c Memory events are now serialized. 2018-04-01 20:13:01 +02:00
Bartosz Taudul
b12375815c Broken memory events processing. 2018-04-01 02:03:34 +02:00
Bartosz Taudul
991fc6bd95 Memory allocations tracker. 2018-03-31 21:56:05 +02:00
Bartosz Taudul
225423bd21 Cosmetics. 2018-03-24 14:42:48 +01:00
Bartosz Taudul
a9e1a9bddb Calculate total time spent in source location.
This simple solution doesn't handle recursion at all.
2018-03-24 14:24:30 +01:00
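The recursion caveat is easiest to see with a small example. The sketch below sums zone durations per source location in the simplest possible way. `ZoneSpan` and `TotalTimePerSourceLocation` are hypothetical names, and the real data structures are not shown in this log.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical flattened zone record: source location id plus start/end times.
struct ZoneSpan
{
    int32_t srcloc;
    int64_t start;
    int64_t end;
};

// Sums (end - start) for every zone, grouped by source location.
// A zone nested inside another zone of the same source location is
// counted again, so recursive calls inflate the total.
std::unordered_map<int32_t, int64_t> TotalTimePerSourceLocation( const std::vector<ZoneSpan>& zones )
{
    std::unordered_map<int32_t, int64_t> total;
    for( const auto& z : zones )
    {
        total[z.srcloc] += z.end - z.start;
    }
    return total;
}
```

For an outer zone spanning 0 to 100 and a recursive inner zone of the same source location spanning 10 to 60, this reports 150 time units even though only 100 elapsed.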
Bartosz Taudul
fea0234a60 Change zone end "-1" comparisons to "0" comparisons. 2018-03-24 02:00:20 +01:00
Bartosz Taudul
6a4e58b545 Force inline compress/decompress thread id. 2018-03-24 01:31:58 +01:00
Bartosz Taudul
69b49f527d Inline GetZoneEndDirect(). 2018-03-23 02:06:44 +01:00
Bartosz Taudul
6e6addfa81 Use pdqsort. 2018-03-20 19:19:07 +01:00
Bartosz Taudul
ae55360a6d Don't sort zones if statistics are disabled. 2018-03-20 19:12:42 +01:00
Bartosz Taudul
d8f7903a97 Use flat hash map for ptr mapping during data load. 2018-03-20 15:44:13 +01:00
Bartosz Taudul
ceeae3c2cf Restore ordering of source location zones after load. 2018-03-20 14:56:42 +01:00
Bartosz Taudul
05eb4b7ebc Don't use memcpy to terminate string. 2018-03-19 15:41:28 +01:00
Bartosz Taudul
2eece7c1f3 Reorder instructions. 2018-03-18 23:46:34 +01:00
Bartosz Taudul
d0519499f4 Store thread id next to zone ptr in source location zone list. 2018-03-18 20:45:49 +01:00
Bartosz Taudul
777d672e05 Thread id compression/decompression. 2018-03-18 20:45:22 +01:00
Bartosz Taudul
40c6f01a41 Perform search after condition was verified, not before. 2018-03-18 20:25:00 +01:00
Bartosz Taudul
3ac98beb5a Use precalculated min/max time spans. 2018-03-18 20:20:24 +01:00
Bartosz Taudul
0f1f7c6813 Calculate min/max time spans for source locations. 2018-03-18 20:15:45 +01:00
Bartosz Taudul
43c3fe25ba Put source location zone data into a struct. 2018-03-18 20:08:57 +01:00
Bartosz Taudul
7a4e7cbf86 Reduce data collection if TRACY_NO_STATISTICS is defined.
Statistical data collection is only useful if it's meant to be used.
Otherwise it only incurs CPU and memory cost.
2018-03-18 12:55:54 +01:00
Bartosz Taudul
e6b3f373c5 Add direct zone end getter. 2018-03-18 02:53:00 +01:00
Bartosz Taudul
c807b3f7ef Getter for source location zones. 2018-03-18 02:35:39 +01:00
Bartosz Taudul
9830fa297e Store per-source-location zone lists. 2018-03-18 02:05:33 +01:00
Bartosz Taudul
c5c81a73bc Skip initialization of StringIdx.
That memory will be loaded from file.
2018-03-17 14:43:02 +01:00
Bartosz Taudul
81ff554c7d Don't call ReadTimeline() when there's nothing to read. 2018-03-15 22:54:10 +01:00
Bartosz Taudul
9dfa9c95cb Read and write whole ZoneEvent/GpuEvent data at once. 2018-03-15 21:59:16 +01:00
Bartosz Taudul
e5796af196 More efficient vector filling. 2018-03-15 21:42:00 +01:00
Bartosz Taudul
c510c9705b No need to check for reserved space. 2018-03-15 21:32:06 +01:00
Bartosz Taudul
b7ba64a223 Microoptimize ReadTimeline(). 2018-03-15 21:27:36 +01:00
Bartosz Taudul
5cb917e868 No nonsense union. 2018-03-04 17:52:51 +01:00
Bartosz Taudul
e9395cd988 Reconstruct source location payload map on data load. 2018-03-04 17:22:34 +01:00
Bartosz Taudul
a374114358 Use proper encoding of source location. 2018-03-04 17:17:37 +01:00
Bartosz Taudul
9170cfd943 First entry in sourceLocationExpand is special. 2018-03-04 16:57:57 +01:00
Bartosz Taudul
b48602f5d1 Implement search for matching source locations. 2018-03-04 16:52:45 +01:00
Bartosz Taudul
f99c6eec78 Simplify code. 2018-03-04 16:23:28 +01:00
Bartosz Szreder
3b9639a9de Tweak included header files in View and Worker. 2018-02-23 15:08:20 +01:00
Bartosz Szreder
bae1c02ad0 Worker thread will take care of itself. 2018-02-21 16:41:37 +01:00
Bartosz Szreder
9e3f18a62a Split data handling code from the view. 2018-02-21 16:41:37 +01:00