Commit Graph

695 Commits

Author SHA1 Message Date
Bartosz Taudul
c9da5f1474 Use cached thread retriever. 2019-10-24 22:34:18 +02:00
Bartosz Taudul
5873561b54 Add cached thread retriever. 2019-10-24 22:33:48 +02:00
Bartosz Taudul
06bc802107 Avoid load-hit-store. 2019-10-24 22:24:00 +02:00
Bartosz Taudul
1cfb5adc44 Count transferred data size. 2019-10-24 00:47:16 +02:00
Bartosz Taudul
ba61a9ed84 Transfer time deltas, not absolute times.
This change significantly reduces network bandwidth requirements.

Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
2019-10-24 00:06:41 +02:00
Bartosz Taudul
5c92eae3ed Add early exit for invalid times. 2019-10-20 18:47:50 +02:00
Bartosz Taudul
4d761def61 Microoptimize comparison. 2019-10-16 20:26:39 +02:00
Bartosz Taudul
3ae5c125f6 Implement counting CPU usage (ctx switch) at a given time. 2019-10-15 16:54:43 +02:00
Bartosz Taudul
3ce6b1205f Don't iterate over 256 CPUs. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
eccb0b1e4a Track max CPU present in context switch data. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
bdb8516d04 Make sure context switch end time wasn't set already. 2019-10-15 14:54:28 +02:00
Bartosz Taudul
215dc8a804 More compact GpuEvent struct (save 4 bytes).
Memory usage reduction of various traces:

big         9011 -> 9007
frameimages 561  -> 552
fi-big      4144 -> 4139
long        5253 -> 5125
2019-10-13 14:42:52 +02:00
Bartosz Taudul
5e1894dd79 Count GPU zones. 2019-10-13 14:13:04 +02:00
Bartosz Taudul
65ea33a60f Store memory callstack data as 24-bit ints.
This reduces MemEvent size from 40 to 38 bytes.

Memory usage reduction:

chicken     2027 -> 2019
mem         6468 -> 6308
q3bsp-mt    5304 -> 5283
2019-10-01 22:38:17 +02:00
Bartosz Taudul
f0b957ec56 Store callstacks on 24 bits.
ZoneEvent is now 27 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9224 -> 9011  (97%)
chicken         2044 -> 2027  (99%)
drl-l-b         1443 -> 1383  (95%)
long            5327 -> 5253  (98%)
q3bsp-mt        5400 -> 5304  (98%)
selfprofile     1403 -> 1382  (98%)
2019-10-01 22:38:17 +02:00
Bartosz Taudul
717a212563 Save another 2 bytes per ZoneEvent.
ZoneEvent is not 28 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9527 -> 9224  (96%)
chicken         2107 -> 2044  (97%)
drl-l-b         1479 -> 1443  (97%)
long            5412 -> 5327  (98%)
q3bsp-mt        5592 -> 5400  (96%)
selfprofile     1443 -> 1403  (97%)
2019-10-01 01:05:37 +02:00
Bartosz Taudul
2470936050 Don't perform background tasks during trace upgrade. 2019-09-29 20:52:25 +02:00
Bartosz Taudul
d228bcb622 Pack StringIdx in 24 bits.
This reduces ZoneEvent size from 32 to 30 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9902 -> 9527  (96%)
chicken         2172 -> 2107  (97%)
ctx-big          311 ->  309  (99%)
drl-l-b         1570 -> 1479  (94%)
long            5496 -> 5412  (98%)
mem             6468 -> 6468  (100%)
q3bsp-mt        5784 -> 5592  (96%)
selfprofile     1486 -> 1443  (97%)
2019-09-29 20:32:42 +02:00
Aleksei Skriabin
c0c2f4536a strstr_nocase() typo fix. 2019-09-28 14:20:29 +05:00
Bartosz Taudul
a5ba74ed13 Handle multiple Vulkan threads. 2019-09-23 17:27:49 +02:00
Bartosz Taudul
82cd667b30 Allow specifying network port in server. 2019-09-21 15:43:01 +02:00
Bartosz Taudul
d8e0853cd8 Multithreaded frame image compression. 2019-09-20 23:03:12 +02:00
Bartosz Taudul
e1e5d6bd47 Add const version of PackFrameImage().
Temporary buffer needs to be handled outside of the function.
2019-09-20 22:55:55 +02:00
Bartosz Taudul
8fe9b56b6f Calculate frame statistics. 2019-09-16 22:02:47 +02:00
Bartosz Taudul
7673028dba Fix skipping memory data. 2019-09-16 15:42:25 +02:00
Bartosz Taudul
cdc4575dba Setup tid -> thread data mapping when loading trace. 2019-09-08 14:15:40 +02:00
Bartosz Taudul
ea6a0a58a7 Thread data accessor. 2019-09-08 14:07:16 +02:00
Bartosz Taudul
aac0a36a2d Don't use source location zones before they are ready. 2019-09-07 17:23:11 +02:00
Bartosz Taudul
86cb477811 Pack ZoneThreadData.
This reduces struct size from 10 to 8 bytes. Assumes 48-bit pointers
(4-level paging)!

Memory savings (MB):

android     2766    ->  2757    (99%)
big         10.29 G ->  9902    (96%)
chicken     2244    ->  2172    (96%)
ctx-android 228     ->  224     (98%)
drl-l-b     1635    ->  1570    (96%)
gn-vulkan   244     ->  240     (98%)
long        5656    ->  5496    (97%)
q3bsp-mt    6043    ->  5784    (95%)
selfprofile 1554    ->  1486    (95%)
2019-08-31 00:55:51 +02:00
Bartosz Taudul
3ec534cdf3 Prevent "ntdll.dll" from appearing as a thread name. 2019-08-30 23:09:07 +02:00
Bartosz Taudul
1c0c6311ec Fix skipping data when loading traces. 2019-08-30 01:16:42 +02:00
Bartosz Taudul
a2f968d843 Compress thread id in MessageData. 2019-08-28 21:03:01 +02:00
Bartosz Taudul
fd5014be6f GetThreadString() is no longer used. 2019-08-28 20:08:16 +02:00
Bartosz Taudul
3c092b4bec Add thread name getter combining local and external thread names. 2019-08-27 23:00:13 +02:00
Bartosz Taudul
f76f38777e Signed minus unsigned is unsigned... 2019-08-26 19:09:12 +02:00
Bartosz Taudul
1712431dfd Compress external threads. Saves 4 bytes per ctx switch.
Dropped support for loading context switch data in previous versions of
traces.
2019-08-19 23:09:58 +02:00
Bartosz Taudul
21e7a4bb16 Extract thread compression into a separate class. 2019-08-19 22:56:58 +02:00
Bartosz Taudul
94382f54ca Move FileVersion() to TracyFileHeader.hpp. 2019-08-19 22:56:58 +02:00
Bartosz Taudul
19857473e3 Also collect information on local threads. 2019-08-18 14:56:17 +02:00
Bartosz Taudul
3b8518f7b6 Save/load CPU thread data. 2019-08-18 01:53:38 +02:00
Bartosz Taudul
62dbe522c5 Add accessors. 2019-08-18 01:51:02 +02:00
Bartosz Taudul
103645c2fa Calculate cpu thread data statistics. 2019-08-18 01:50:49 +02:00
Bartosz Taudul
1498417a8d Save/load tid to pid mapping. 2019-08-17 22:36:21 +02:00
Bartosz Taudul
20e8a5ecc8 Create tid to pid mapping. 2019-08-17 22:32:41 +02:00
Bartosz Taudul
678e942e9f Transfer PID of profiled program. 2019-08-17 22:19:04 +02:00
Bartosz Taudul
414f903cc5 Collect thread wakeup data. 2019-08-17 17:05:29 +02:00
Bartosz Taudul
26be78530f Use signed number to calculate frame offset. 2019-08-17 15:22:54 +02:00
Bartosz Taudul
6c53cac15e Fix uninitialized variable. 2019-08-16 21:20:04 +02:00
Bartosz Taudul
e975c4d7bf Also retrieve external thread names. 2019-08-16 19:49:16 +02:00
Bartosz Taudul
ccaf92afc4 Save/load external process names. 2019-08-16 19:24:38 +02:00
Bartosz Taudul
fe7f56b022 Implement retrieval of external process names. 2019-08-16 19:22:23 +02:00
Bartosz Taudul
c212661714 Allow determining whether thread is local to profiled program. 2019-08-16 17:59:25 +02:00
Bartosz Taudul
cef7e4b8d0 Save/load per-cpu context switches. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
8bc4258e29 Display count of per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
69527d2f71 Collect per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
42c71d7e46 Fix loading old traces. 2019-08-16 00:24:29 +02:00
Bartosz Taudul
889eddd646 Pack ContextSwitchData. Saves 3 bytes per context switch region. 2019-08-15 23:53:47 +02:00
Bartosz Taudul
c22c259a13 Pack time and thread in MemEvent.
This saves 4 bytes per logged memory allocation. Memory savings for
selected traces:

android     2945 MB -> 2766 MB
chicken     2261 MB -> 2245 MB
q3bsp-mt    6085 MB -> 6043 MB
mem         6788 MB -> 6468 MB
2019-08-15 23:02:43 +02:00
Bartosz Taudul
9618ee3581 Fix skipping locks. 2019-08-15 22:24:27 +02:00
Bartosz Taudul
8b73dece98 Preserve magic time values when loading old traces. 2019-08-15 21:30:37 +02:00
Bartosz Taudul
3db3952135 Hackfix for broken lock terminate times. 2019-08-15 20:45:00 +02:00
Bartosz Taudul
5e20b3f28a Pack time and source location in LockEvent. 2019-08-15 20:39:16 +02:00
Bartosz Taudul
bf3ad57456 Pack start time and srcloc together in ZoneEvent.
This reduces ZoneEvent struct size by 2 bytes. Memory savings on various
captures:

10.62 GB -> 10.29 GB
 2342 MB ->  2276 MB
 1706 MB ->  1635 MB
 6277 MB ->  6085 MB
2019-08-15 20:17:36 +02:00
Bartosz Taudul
042e6c9e11 Set initial time of old traces to 0. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
b322d20c19 Store received timestamps offset to 0. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
659907c972 Store srcloc identifiers using 16 bit.
This reduces various structure sizes by 2 bytes. Memory usage reduction
on various traces:

big               11 GB -> 10.62 GB
chicken         2436 MB ->  2342 MB
drl-light-big   1761 MB ->  1706 MB
q3bsp-mt        6469 MB ->  6277 MB
2019-08-15 20:15:48 +02:00
Bartosz Taudul
416113fdcb Drop support for ETC1 frame images. 2019-08-15 16:29:50 +02:00
Bartosz Taudul
9a364fe5fe Cache context switch data queries. 2019-08-14 20:16:11 +02:00
Bartosz Taudul
3e01ca3269 Calculate how long thread was in running time. 2019-08-14 17:12:48 +02:00
Bartosz Taudul
0bb0c10e3c Revert "Save one byte on ContextSwitchData."
Counting bits is hard, let's go shopping.
2019-08-14 13:55:05 +02:00
Bartosz Taudul
3996516fce One more SetThreadName() to change. 2019-08-14 02:27:01 +02:00
Bartosz Taudul
f285e0f5cc Save one byte on ContextSwitchData. 2019-08-13 15:16:46 +02:00
Bartosz Taudul
9417ad994d Save/load context switch data. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
1c937ad9bb Implement skipping frame image data. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
8c494eabbf Display number of context switch regions. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
0b03fed61c Add context switch accessor. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
419f74280d Store context switches. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
8aa0be39d5 Drop support for CPU id queries. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
d6f32a0839 Serialize lock processing.
This makes is much easier to process on the server and opens new
optimization possibilities. It also fixes theoretical problems, which
may be caused by invalid ordering of events with the same timestamp.
2019-08-12 13:51:01 +02:00
Bartosz Taudul
6398ecb344 Drop support for pre-0.4 traces. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
a76622d17a Cache last searched ThreadData. 2019-08-03 14:35:01 +02:00
Bartosz Taudul
12969ee497 Track thread context.
This change exploits the fact that events are processed in batches
originating from a single thread. A single message changing thread
context is enough to handle multiple messages, as opposed to inclusion
of thread identifier in each message.
2019-08-02 20:18:08 +02:00
Bartosz Taudul
a4e7a341c0 Proper handling of disconnect request. 2019-08-01 23:14:09 +02:00
Bartosz Taudul
dc49f2f76a Move DXT1 index conversion to server. 2019-07-19 21:46:58 +02:00
Bartosz Taudul
2e774f4626 Save/load application info. 2019-07-12 18:45:35 +02:00
Bartosz Taudul
d64ab7db5a Store app info messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
10bcc8c770 Switch to DXT1 textures in profiler utility. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
7dc7ece2bd Add staging area for frame images.
Compressing frame images on a separate thread may cause frame image
arrival before frames are sent. Fix this issue by creating a staging
area in which frame images will wait for frames to arrive.

This probably breaks playback functionality, as non-existent frames may
be queried, but this problem seems to be very hard to find, so let's
ignore it for now.
2019-06-27 13:24:35 +02:00
Bartosz Taudul
bb35f9a897 Compress frame images in a separate thread. 2019-06-27 13:24:35 +02:00
Bartosz Taudul
1c41229766 Use proper type for buffer size comparison. 2019-06-22 14:07:53 +02:00
Bartosz Taudul
6a82f666a7 Cosmetics. 2019-06-22 14:05:18 +02:00
Bartosz Taudul
eb4c7ca9ea Ignore useless warnings. 2019-06-22 13:40:00 +02:00
Bartosz Taudul
850815534e Insert frame mark at beginning of on-demand connection. 2019-06-21 19:39:41 +02:00
Bartosz Taudul
31a4a45b14 Ignore memory free faults if running on apple.
There's a case in MoltenVK initialization where overloading operator new
and operator delete works for std::string destruction, but not
construction.
2019-06-13 14:15:17 +02:00
Bartosz Taudul
37d1457b44 Frame image may need flipping. 2019-06-12 15:28:32 +02:00
Bartosz Taudul
eb6ac5e6e1 Store frame reference in frame images. 2019-06-12 00:55:02 +02:00
Bartosz Taudul
5f8eadfb16 Release zone id stack. 2019-06-09 17:56:41 +02:00
Bartosz Taudul
b1f8d9fba1 Send server termination query on server disconnect. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
2c780f1af4 Allow sending immediate termination query from server. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
d6d7b82529 Ignore invalid frame images in on-demand mode. 2019-06-09 15:37:49 +02:00
Bartosz Taudul
50cda7720f Handle frame image instrumentation failures. 2019-06-09 13:44:53 +02:00
Bartosz Taudul
bef1988800 Compress frame images using LZ4. 2019-06-08 12:17:18 +02:00
Bartosz Taudul
fc5a8f7e3a Assign frame image to the correct frame (including offset). 2019-06-07 20:13:08 +02:00
Bartosz Taudul
42a30bffe1 Frame images are now ETC1 compressed. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
646e7327b8 Show loading progress of frame images. 2019-06-06 23:40:37 +02:00
Bartosz Taudul
6b2741ccdb Save/load frame images. 2019-06-06 23:08:19 +02:00
Bartosz Taudul
cd2f572a2f Use proper index. 2019-06-06 22:22:57 +02:00
Bartosz Taudul
af56f41e32 Add frame image accessor. 2019-06-06 22:14:51 +02:00
Bartosz Taudul
34b84bb284 Add frame image index to frame data. 2019-06-06 21:44:48 +02:00
Bartosz Taudul
e5bb6011c5 Frame image transfer prototype. 2019-06-06 21:39:54 +02:00
Bartosz Taudul
5681096486 Track status of worker background tasks. 2019-06-02 15:00:38 +02:00
Bartosz Taudul
845f3a2ddf Use std::shared_mutex for locking worker access. 2019-05-28 19:21:53 +02:00
Bartosz Taudul
0da1e8551f Track lock contention status. 2019-05-12 16:17:17 +02:00
Bartosz Taudul
a714cd4369 Typo. 2019-05-12 15:59:53 +02:00
Bartosz Taudul
74575250a5 Save message color data in trace dumps. 2019-05-10 20:32:47 +02:00
Bartosz Taudul
4850e19ebd Store color in message data. 2019-05-10 20:26:27 +02:00
Bartosz Taudul
797ebd3caf Cosmetics. 2019-05-10 20:20:08 +02:00
Bartosz Taudul
efc54babe3 Transfer of colored messages. 2019-05-10 20:17:44 +02:00
Bartosz Taudul
20e6813461 Store send queue size in mbps block. 2019-04-01 19:55:37 +02:00
Bartosz Taudul
9010b2c142 Put queries into queue if send buffer is full. 2019-04-01 19:47:29 +02:00
Bartosz Taudul
deeea0ee70 Track space left in send buffer. 2019-04-01 19:37:39 +02:00
Bartosz Taudul
57dff0abc9 Add server query queue. 2019-04-01 19:26:50 +02:00
Bartosz Taudul
c07c6d11b7 Define server query packet. 2019-04-01 19:21:53 +02:00
Bartosz Taudul
fef417f286 Store total number of CPU and GPU zones in trace. 2019-03-27 01:46:54 +01:00
Bartosz Taudul
2e6ac050f4 Use custom vector swap. 2019-03-26 23:02:39 +01:00
Bartosz Taudul
a632d9e2a3 Add zone vector cache.
Zone children will be now collected in staging vectors. When the zone is
ended (and no children can be added anymore to it), a size-fitted vector
is allocated using slab allocation. The over-allocated vector is then
put into cache for use in future zones.

This is only active for vectors <= 8192 elements, or 64 KB (chosen
arbitrarily), to reduce time spent on copying memory.

Overall, this change should have the following effects:
- System memory allocation pressure reduction, due to re-usage of
  vectors, which eliminates the need for constant growth.
- Reduction of memory usage, because children vectors are now fitted to
  required size.
- Slight increase of zone processing time, due to memory copying?
2019-03-26 22:06:00 +01:00
Bartosz Taudul
11f4dcbf1e Consistent variable naming. 2019-03-26 21:41:44 +01:00
Bartosz Taudul
99fca9e069 Fix loading old traces when skipping locks. 2019-03-26 20:25:29 +01:00
Bartosz Taudul
e79fa04a8b Don't fail when timer accuracy is low. 2019-03-21 21:24:07 +01:00
Bartosz Taudul
7e6a8135df Remove double indirection in GetNextLockEvent(). 2019-03-16 14:18:43 +01:00
Bartosz Taudul
67f14be6aa Update lock ranges when loading trace. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
8ced8a457c Update thread time range on lock event insert. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
dc981550a1 Load lock event time to a variable. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
71e20e7e7f Store lock map as flat_hash_map with pointer values. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
5fbc14c487 Fix skipping plots in version >= 0.4.5. 2019-03-15 15:27:37 +01:00
Bartosz Taudul
a0299cc63a Optimize calculation of standard deviation. 2019-03-14 01:23:37 +01:00
Bartosz Taudul
d64f07f853 Don't search for thread for empty timelines. 2019-03-14 01:10:57 +01:00
Bartosz Taudul
b7fe29f750 Offload timeline statistics update to a background thread. 2019-03-13 01:46:05 +01:00
Bartosz Taudul
d0d7131e35 Properly restore threadMap.
This fixes thread ids returned by CompressThread in loaded traces. The
bug was manifesting by not displaying memory events in zone info window.
2019-03-07 00:49:25 +01:00
Bartosz Taudul
07bcca9dc0 Don't pre-fill threadExpand, if not needed. 2019-03-07 00:49:06 +01:00
Bartosz Taudul
bc87762012 Merge native callstack with the allocated one. 2019-03-05 19:44:08 +01:00
Bartosz Taudul
ebf09bebae Only one callstack may be in-flight at any time.
Save for the allocated callstack, but this will be solved in another
way. There's no need to keep pending callstacks in a map.
2019-03-05 19:44:08 +01:00
Bartosz Taudul
3e81e44d75 Separate processing for allocated callstacks.
Only native callstack is used for the moment.
2019-03-05 02:42:50 +01:00
Bartosz Taudul
bef31ba073 Separate message for zone begin with alloc src loc and callstack. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
66b8a13e77 Store callstack alloc payloads. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
9fc022346b Replace frame pointers with callstack frame ids. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
664847211c Pack/unpack frame pointer to callstack frame id. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
42c94f7142 Handle discontinuous frame end mismatch failure. 2019-02-28 19:32:42 +01:00
Bartosz Taudul
fb96d60256 Adjust last time in mem alloc/free and sys time messages. 2019-02-24 18:26:23 +01:00
Bartosz Taudul
e190faa7e1 Save/load CPU usage plot. 2019-02-21 22:56:59 +01:00
Bartosz Taudul
e9baa80bf3 Process CPU usage reports. 2019-02-21 22:56:59 +01:00
Bartosz Taudul
4422fce55c Don't decompress GpuZone threads while saving trace.
Saving the threads compressed in GPU zones and memory event has the
following result on trace dump sizes:

043/aa.tracy (0.4.3) {10055 KB} -> 044/aa.tracy (0.4.4) {9975 KB}  99.20% size change
043/android.tracy (0.4.3) {542739 KB} -> 044/android.tracy (0.4.4) {519248 KB}  95.67% size change
043/asset-new.tracy (0.4.3) {78403 KB} -> 044/asset-new.tracy (0.4.4) {75899 KB}  96.81% size change
043/asset-new-id.tracy (0.4.3) {84341 KB} -> 044/asset-new-id.tracy (0.4.4) {81771 KB}  96.95% size change
043/asset-old.tracy (0.4.3) {80688 KB} -> 044/asset-old.tracy (0.4.4) {78410 KB}  97.18% size change
043/big.tracy (0.4.3) {939577 KB} -> 044/big.tracy (0.4.4) {938427 KB}  99.88% size change
043/callstack.tracy (0.4.3) {14557 KB} -> 044/callstack.tracy (0.4.4) {14465 KB}  99.37% size change
043/callstack-linux.tracy (0.4.3) {6949 KB} -> 044/callstack-linux.tracy (0.4.4) {6942 KB}  99.90% size change
043/crash.tracy (0.4.3) {131 KB} -> 044/crash.tracy (0.4.4) {127 KB}  97.10% size change
043/crash2.tracy (0.4.3) {1422 KB} -> 044/crash2.tracy (0.4.4) {1412 KB}  99.29% size change
043/darkrl.tracy (0.4.3) {15767 KB} -> 044/darkrl.tracy (0.4.4) {15663 KB}  99.34% size change
043/darkrl2.tracy (0.4.3) {7947 KB} -> 044/darkrl2.tracy (0.4.4) {7886 KB}  99.23% size change
043/darkrl-old.tracy (0.4.3) {67448 KB} -> 044/darkrl-old.tracy (0.4.4) {67004 KB}  99.34% size change
043/deadlock.tracy (0.4.3) {5984 KB} -> 044/deadlock.tracy (0.4.4) {5986 KB}  100.03% size change
043/gn-opengl.tracy (0.4.3) {29005 KB} -> 044/gn-opengl.tracy (0.4.4) {28885 KB}  99.59% size change
043/gn-vulkan.tracy (0.4.3) {29352 KB} -> 044/gn-vulkan.tracy (0.4.4) {29257 KB}  99.68% size change
043/long.tracy (0.4.3) {1182800 KB} -> 044/long.tracy (0.4.4) {1176584 KB}  99.47% size change
043/mem.tracy (0.4.3) {1369067 KB} -> 044/mem.tracy (0.4.4) {1262406 KB}  92.21% size change
043/multi.tracy (0.4.3) {8004 KB} -> 044/multi.tracy (0.4.4) {7944 KB}  99.24% size change
043/new.tracy (0.4.3) {1108 KB} -> 044/new.tracy (0.4.4) {1099 KB}  99.18% size change
043/q3bsp-mt.tracy (0.4.3) {949855 KB} -> 044/q3bsp-mt.tracy (0.4.4) {937574 KB}  98.71% size change
043/q3bsp-st.tracy (0.4.3) {240347 KB} -> 044/q3bsp-st.tracy (0.4.4) {230092 KB}  95.73% size change
043/selfprofile.tracy (0.4.3) {197708 KB} -> 044/selfprofile.tracy (0.4.4) {197659 KB}  99.98% size change
043/tbrowser.tracy (0.4.3) {9503 KB} -> 044/tbrowser.tracy (0.4.4) {9503 KB}  100.00% size change
043/test.tracy (0.4.3) {40700 KB} -> 044/test.tracy (0.4.4) {40699 KB}  100.00% size change
043/virtualfile_hc.tracy (0.4.3) {72424 KB} -> 044/virtualfile_hc.tracy (0.4.4) {72304 KB}  99.83% size change
043/zfile_hc.tracy (0.4.3) {39419 KB} -> 044/zfile_hc.tracy (0.4.4) {39328 KB}  99.77% size change
2019-02-17 00:29:01 +01:00
Bartosz Taudul
760e9105d0 Don't decompress memory thread data while saving trace. 2019-02-17 00:27:41 +01:00
Bartosz Taudul
9ee494c0f4 Store thread compression layout in trace dump. 2019-02-16 22:48:29 +01:00
Bartosz Taudul
d030674b83 Simplify loading memory events. 2019-02-16 22:32:14 +01:00
Bartosz Taudul
569a9fb9be Change order of file version checks during loading memory events. 2019-02-16 22:26:50 +01:00
Bartosz Taudul
88b7961421 Allocate memory for all zones at the current level at once. 2019-02-16 20:53:07 +01:00
Bartosz Taudul
c127f51767 Load time offsets to scratch buffers. 2019-02-15 02:46:25 +01:00
Bartosz Taudul
7b023e533d Use big allocation mode for Vector's reserve_exact. 2019-02-15 01:59:33 +01:00
Bartosz Taudul
127be8e995 GpuEvent doesn't need init. 2019-02-15 01:31:58 +01:00
Bartosz Taudul
92c1420c30 Improve handling of post-load background jobs.
Background tasks (source location zones sorting, reconstruction of
memory plot) are now started only after trace loading is finished.
Multithreaded sorting was previously impacting trace load times.

Only one thread is used to perform both jobs, one after another.
2019-02-14 01:17:37 +01:00
Bartosz Taudul
080873003b Simplify support for 0.2 traces. 2019-02-14 01:13:11 +01:00
Bartosz Taudul
bd1c1d044b Force inline read/write time offset functions. 2019-02-14 00:17:50 +01:00
Bartosz Taudul
08642d034b Preserve string length in string map. 2019-02-12 22:11:15 +01:00
Bartosz Taudul
d32c070a9e Two more places where connection can silently drop. 2019-02-12 11:07:12 +01:00
Bartosz Taudul
7f11260bf0 Handle dropped connection during handshake. 2019-02-12 01:41:09 +01:00
Bartosz Taudul
76186f3221 Allow zone name retrieval from source location. 2019-02-10 16:45:19 +01:00
Bartosz Taudul
c7e64bb8a8 Replace select() with poll(). 2019-02-10 15:45:23 +01:00
Bartosz Taudul
e801943b90 Array index is changing here. 2019-01-31 18:37:59 +01:00
Bartosz Taudul
852fe03cbc More references. 2019-01-29 22:10:14 +01:00
Bartosz Taudul
d6c616848c Use reference instead of repeated deep dereferences. 2019-01-29 21:59:52 +01:00
Bartosz Taudul
d86e36cc62 Fix progress of loading CPU zones. 2019-01-26 22:18:07 +01:00
Bartosz Taudul
39680ad315 Boost lock loading time. 2019-01-24 22:44:09 +01:00
Bartosz Taudul
42af2d14cc Calculate self min and max times of source location zones. 2019-01-23 14:24:22 +01:00
Bartosz Taudul
ddad475c19 Make it possible to store multiple frames at single frame address. 2019-01-20 19:11:48 +01:00
Rokas Kupstys
8157e3a0b3 Fix builds with MingW. 2019-01-19 13:53:10 +02:00
Bartosz Taudul
92f3a4bba0 Add ZoneText and ZoneName to the C API. 2019-01-16 02:10:21 +01:00
Bartosz Taudul
49e270d8a6 Detect zone end without begin failure. 2019-01-16 00:45:48 +01:00
Bartosz Taudul
708fdfea49 Track memory alloc+free matching failures. 2019-01-15 18:56:26 +01:00
Bartosz Taudul
ecf9a299de Check for proper number of failure reasons. 2019-01-15 18:56:17 +01:00
Bartosz Taudul
76ab70a948 Simplify failure detection code. 2019-01-15 18:55:47 +01:00
Bartosz Taudul
9944a73444 Store failure reason strings in Worker. 2019-01-15 18:42:15 +01:00
Bartosz Taudul
ac6e7439e2 TODO: track memory allocation tracking failures. 2019-01-14 23:26:32 +01:00
Bartosz Taudul
c3246ca3b5 Gracefully store failure states. 2019-01-14 23:22:31 +01:00
Bartosz Taudul
4dc339c933 Close connection when zone validation fails. 2019-01-14 23:12:11 +01:00
Bartosz Taudul
c3b67e4482 Perform zone stack validation. 2019-01-14 23:08:34 +01:00
Bartosz Taudul
dcc6bee607 Process zone validation messages. 2019-01-14 22:56:10 +01:00
Bartosz Taudul
da8b01357d Proper skipping of locks in 0.4.1+ (fixes compare menu). 2019-01-08 17:19:04 +01:00
Bartosz Taudul
13a0ddfe03 No need to perform capture here. 2019-01-06 21:11:36 +01:00
Bartosz Taudul
6a1c552c61 Reduce zone loading time. 2019-01-06 20:49:37 +01:00
Bartosz Taudul
980c54e349 Track trace loading time. 2019-01-06 19:09:50 +01:00
Bartosz Taudul
5ac26ce084 Init common Worker variables in header. 2019-01-06 19:04:50 +01:00
Bartosz Taudul
a313ed4720 Track separate time offset for GPU times.
This is second version of 0.4.2 dump file format. Previous 0.4.2 format
cannot be read anymore.

041/aa.tracy (0.4.1) {18987 KB} -> 042/aa.tracy (0.4.2) {10051 KB}  52.94% size change
041/android.tracy (0.4.1) {696753 KB} -> 042/android.tracy (0.4.2) {542738 KB}  77.90% size change
041/asset-new.tracy (0.4.1) {97163 KB} -> 042/asset-new.tracy (0.4.2) {78402 KB}  80.69% size change
041/asset-new-id.tracy (0.4.1) {105683 KB} -> 042/asset-new-id.tracy (0.4.2) {84341 KB}  79.81% size change
041/asset-old.tracy (0.4.1) {100205 KB} -> 042/asset-old.tracy (0.4.2) {80688 KB}  80.52% size change
041/big.tracy (0.4.1) {2246014 KB} -> 042/big.tracy (0.4.2) {939578 KB}  41.83% size change
041/crash.tracy (0.4.1) {143 KB} -> 042/crash.tracy (0.4.2) {131 KB}  91.37% size change
041/crash2.tracy (0.4.1) {3411 KB} -> 042/crash2.tracy (0.4.2) {1420 KB}  41.63% size change
041/darkrl.tracy (0.4.1) {31818 KB} -> 042/darkrl.tracy (0.4.2) {15762 KB}  49.54% size change
041/darkrl2.tracy (0.4.1) {18778 KB} -> 042/darkrl2.tracy (0.4.2) {7945 KB}  42.31% size change
041/darkrl-old.tracy (0.4.1) {151346 KB} -> 042/darkrl-old.tracy (0.4.2) {67449 KB}  44.57% size change
041/deadlock.tracy (0.4.1) {53 KB} -> 042/deadlock.tracy (0.4.2) {52 KB}  98.55% size change
041/gn-opengl.tracy (0.4.1) {45860 KB} -> 042/gn-opengl.tracy (0.4.2) {29005 KB}  63.25% size change
041/gn-vulkan.tracy (0.4.1) {45618 KB} -> 042/gn-vulkan.tracy (0.4.2) {29352 KB}  64.34% size change
041/long.tracy (0.4.1) {1583550 KB} -> 042/long.tracy (0.4.2) {1182800 KB}  74.69% size change
041/mem.tracy (0.4.1) {1243058 KB} -> 042/mem.tracy (0.4.2) {1369067 KB}  110.14% size change
041/multi.tracy (0.4.1) {14519 KB} -> 042/multi.tracy (0.4.2) {8000 KB}  55.10% size change
041/new.tracy (0.4.1) {1439 KB} -> 042/new.tracy (0.4.2) {1105 KB}  76.75% size change
041/q3bsp-mt.tracy (0.4.1) {1414323 KB} -> 042/q3bsp-mt.tracy (0.4.2) {949855 KB}  67.16% size change
041/q3bsp-st.tracy (0.4.1) {301334 KB} -> 042/q3bsp-st.tracy (0.4.2) {240347 KB}  79.76% size change
041/selfprofile.tracy (0.4.1) {399648 KB} -> 042/selfprofile.tracy (0.4.2) {197704 KB}  49.47% size change
041/tbrowser.tracy (0.4.1) {13052 KB} -> 042/tbrowser.tracy (0.4.2) {9503 KB}  72.81% size change
041/test.tracy (0.4.1) {60309 KB} -> 042/test.tracy (0.4.2) {40700 KB}  67.49% size change
041/virtualfile_hc.tracy (0.4.1) {108967 KB} -> 042/virtualfile_hc.tracy (0.4.2) {72424 KB}  66.46% size change
041/zfile_hc.tracy (0.4.1) {58814 KB} -> 042/zfile_hc.tracy (0.4.2) {39418 KB}  67.02% size change
2019-01-03 21:52:43 +01:00
Bartosz Taudul
f8ef5b726a Store time deltas, instead of absolute time in trace dumps.
This change greatly reduces the size of saved dumps, but increase the
cost of processing during loading. One notable outlier in the dataset
below is mem.tracy, which increased in size, even if changes in the
memory dump saving scheme decrease size of the other traces.

041/aa.tracy (0.4.1) {18987 KB} -> 042/aa.tracy (0.4.2) {10140 KB}  53.40% size change
041/android.tracy (0.4.1) {696753 KB} -> 042/android.tracy (0.4.2) {542738 KB}  77.90% size change
041/asset-new.tracy (0.4.1) {97163 KB} -> 042/asset-new.tracy (0.4.2) {78402 KB}  80.69% size change
041/asset-new-id.tracy (0.4.1) {105683 KB} -> 042/asset-new-id.tracy (0.4.2) {84341 KB}  79.81% size change
041/asset-old.tracy (0.4.1) {100205 KB} -> 042/asset-old.tracy (0.4.2) {80688 KB}  80.52% size change
041/big.tracy (0.4.1) {2246014 KB} -> 042/big.tracy (0.4.2) {943083 KB}  41.99% size change
041/crash.tracy (0.4.1) {143 KB} -> 042/crash.tracy (0.4.2) {131 KB}  91.39% size change
041/crash2.tracy (0.4.1) {3411 KB} -> 042/crash2.tracy (0.4.2) {1425 KB}  41.80% size change
041/darkrl.tracy (0.4.1) {31818 KB} -> 042/darkrl.tracy (0.4.2) {15897 KB}  49.96% size change
041/darkrl2.tracy (0.4.1) {18778 KB} -> 042/darkrl2.tracy (0.4.2) {8002 KB}  42.62% size change
041/darkrl-old.tracy (0.4.1) {151346 KB} -> 042/darkrl-old.tracy (0.4.2) {67945 KB}  44.89% size change
041/deadlock.tracy (0.4.1) {53 KB} -> 042/deadlock.tracy (0.4.2) {52 KB}  98.55% size change
041/gn-opengl.tracy (0.4.1) {45860 KB} -> 042/gn-opengl.tracy (0.4.2) {30983 KB}  67.56% size change
041/gn-vulkan.tracy (0.4.1) {45618 KB} -> 042/gn-vulkan.tracy (0.4.2) {31349 KB}  68.72% size change
041/long.tracy (0.4.1) {1583550 KB} -> 042/long.tracy (0.4.2) {1225316 KB}  77.38% size change
041/mem.tracy (0.4.1) {1243058 KB} -> 042/mem.tracy (0.4.2) {1369291 KB}  110.15% size change
041/multi.tracy (0.4.1) {14519 KB} -> 042/multi.tracy (0.4.2) {8110 KB}  55.86% size change
041/new.tracy (0.4.1) {1439 KB} -> 042/new.tracy (0.4.2) {1108 KB}  77.01% size change
041/q3bsp-mt.tracy (0.4.1) {1414323 KB} -> 042/q3bsp-mt.tracy (0.4.2) {949855 KB}  67.16% size change
041/q3bsp-st.tracy (0.4.1) {301334 KB} -> 042/q3bsp-st.tracy (0.4.2) {240347 KB}  79.76% size change
041/selfprofile.tracy (0.4.1) {399648 KB} -> 042/selfprofile.tracy (0.4.2) {197713 KB}  49.47% size change
041/tbrowser.tracy (0.4.1) {13052 KB} -> 042/tbrowser.tracy (0.4.2) {9503 KB}  72.81% size change
041/test.tracy (0.4.1) {60309 KB} -> 042/test.tracy (0.4.2) {40700 KB}  67.49% size change
041/virtualfile_hc.tracy (0.4.1) {108967 KB} -> 042/virtualfile_hc.tracy (0.4.2) {72839 KB}  66.85% size change
041/zfile_hc.tracy (0.4.1) {58814 KB} -> 042/zfile_hc.tracy (0.4.2) {39608 KB}  67.35% size change
2018-12-30 23:42:17 +01:00
Bartosz Taudul
8c5670489c Freeing nullptr is valid. 2018-12-20 17:03:09 +01:00
Bartosz Taudul
a220f38fbd Add support for matching source locations ignoring case. 2018-12-18 16:52:29 +01:00
Bartosz Taudul
acddcbd9bf Add case-ignoring string matcher. 2018-12-18 16:52:05 +01:00
Bartosz Taudul
7376ec65b0 Store lock announce and terminate time in trace dump. 2018-12-16 21:09:37 +01:00
Bartosz Taudul
9360df89b1 Store announce and terminate time of locks. 2018-12-16 21:07:26 +01:00
Bartosz Taudul
f42d52923a No-op processing of lock terminate events. 2018-12-16 20:46:33 +01:00
Bartosz Taudul
793e955480 Fix crash when loading a trace with unresolved strings.
Unresolved strings ("???") are not saved, but the internal string
pointers are saved. Resolving such string pointers caused a crash.
2018-10-21 16:38:20 +02:00
Bartosz Taudul
9211ce42da Non-on-demand client is only able to handle one connection. 2018-09-09 19:42:06 +02:00
Bartosz Taudul
984a711666 Send protocol version to verify handshake. 2018-09-09 19:28:53 +02:00
Bartosz Taudul
270072b09e Require shibboleth match at start of connection. 2018-09-09 18:26:53 +02:00
Bartosz Taudul
806c8de463 Only one outgoing server connection is supported. 2018-09-09 17:47:20 +02:00
Bartosz Taudul
9f4d6692dc Proper way to get full frame count. 2018-09-01 12:38:12 +02:00
Bartosz Taudul
8f1acf2571 Store explicit program name and capture time. 2018-08-29 01:02:29 +02:00
Bartosz Taudul
bc6a553a3a Fetch thread names in memory events. 2018-08-28 01:48:19 +02:00
Bartosz Taudul
99b7a39c52 Save/load crash information. 2018-08-20 02:27:24 +02:00
Bartosz Taudul
3b526b074e Send crash report. 2018-08-20 02:23:55 +02:00
Bartosz Taudul
366ea35593 Allow crash event reporting.
When crash happens there's no longer anything to profile -- don't wait
for unfinished zones to finish before sending client terminate
confirmation.
2018-08-20 01:03:16 +02:00
Bartosz Taudul
e0a4b9c56a Save/load host info. 2018-08-19 18:28:48 +02:00
Bartosz Taudul
203d9b4b85 Store host info. 2018-08-19 18:21:56 +02:00
Bartosz Taudul
a15a287a6b Don't over-allocate vectors, when exact needed size is known.
This reduces memory usage when loading saved traces. Memory usage
reduction observed on a selected number of traces:

5625.76 MB -> 5330.29 MB
3292.94 MB -> 2978.66 MB
632.77 MB  -> 479.58 MB
681.32 MB  -> 506.27 MB
11.9 GB    -> 11.22 GB
854.21 MB  -> 806.17 MB
10.57 GB   -> 7175.31 MB
67.38 MB   -> 66.63 MB
2026.12 MB -> 1744.2 MB
86.55 MB   -> 85.57 MB
343.64 MB  -> 244.81 MB
201.93 MB  -> 162.25 MB
2018-08-09 19:41:15 +02:00
Bartosz Taudul
a14a6fa8fb Don't shadow variables. 2018-08-09 19:41:15 +02:00
Bartosz Taudul
a51da71fa4 Add lock, plot counts to worker. 2018-08-08 19:21:53 +02:00
Bartosz Taudul
d36b0aff45 Fix progress of loading GPU zones. 2018-08-05 13:07:58 +02:00
Bartosz Taudul
9d051cf5ee Add support for discontinuous frames. 2018-08-05 02:15:54 +02:00
Bartosz Taudul
6b8a3b25ba Fix drawing of last frame. 2018-08-04 23:19:35 +02:00
Bartosz Taudul
9b4348b497 Handle frame name queries. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
4424a7d7e8 Last time should never be zero. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
23dfc2e3fc Multiple frame sets support. 2018-08-04 21:10:45 +02:00
Bartosz Taudul
0b4c2724ce Add strings to map directly in StringDiscovery. 2018-08-04 17:10:45 +02:00
Bartosz Taudul
ada9f78678 Use StringDiscovery for plots. 2018-08-04 16:33:03 +02:00
Bartosz Taudul
e174e2c12a Remove obsolete comment.
Nothing happens with the source data, as the strings are uniquely stored
in the StoreString() function.
2018-08-04 15:46:10 +02:00
Bartosz Taudul
6ef2d2d9a3 Track progress of loading plots. 2018-08-04 15:17:37 +02:00
Bartosz Taudul
18896044c4 Display explicit names of loaded things. 2018-07-29 16:56:46 +02:00
Bartosz Taudul
9f13475b52 Track trace version in worker. 2018-07-29 15:33:48 +02:00
Bartosz Taudul
13509c14f1 Save size of 'active' and 'frees' memory data structures. 2018-07-29 15:29:56 +02:00
Bartosz Taudul
00d07e39f7 Save threadExpand size to allow vector preallocation. 2018-07-29 15:19:44 +02:00
Bartosz Taudul
bff6eb4c34 Save source location zones counts.
This allows preallocation of zones-in-source-location vectors.
2018-07-29 14:58:01 +02:00
Bartosz Taudul
12b90d1630 Move tracy version to a separate header. 2018-07-29 14:20:44 +02:00
Bartosz Taudul
ccc5c37af5 Always count source location zones. 2018-07-29 14:16:13 +02:00
Bartosz Taudul
4456c8a454 Reserve space for string data. 2018-07-29 14:13:29 +02:00
Bartosz Taudul
648070e6a1 Include each loaded zone in sub progress. 2018-07-28 19:22:28 +02:00
Bartosz Taudul
4741dab833 Track sub progress. 2018-07-28 19:05:01 +02:00
Bartosz Taudul
a14238c199 Add sub progress display. 2018-07-28 18:56:52 +02:00
Bartosz Taudul
a46425f4e9 Adjust load stages. 2018-07-28 18:26:00 +02:00
Bartosz Taudul
0bf0ceed3d Track trace loading progress. 2018-07-28 17:59:17 +02:00
Bartosz Taudul
d84d0b7754 Don't try to read empty timelines. 2018-07-22 21:15:28 +02:00
Bartosz Taudul
25116a8059 Don't try to compress invalid thread. 2018-07-22 21:13:42 +02:00
Bartosz Taudul
010cf66e43 Call Vector destructors. 2018-07-22 21:01:45 +02:00
Bartosz Taudul
29159069ab Properly initialize child index. 2018-07-22 20:14:55 +02:00
Bartosz Taudul
7d7877517e Also remove child vectors from GPU events. 2018-07-22 19:47:01 +02:00
Bartosz Taudul
3a934b2ba3 Store children vectors in a separate data collection.
This reduces per-zone memory cost by 9 bytes if there are no children
and increases it by 4 bytes, if there are children. This is universally
a better solution, as the following data shows:

+++ /home/wolf/desktop/tracy-old/android.tracy +++
Vectors: 2794480
Size 0: 2373070 (84.92%)
Size 1: 70237 (2.51%)
Size 2+: 351173 (12.57%)
+++ /home/wolf/desktop/tracy-old/asset-new.tracy +++
Vectors: 1799227
Size 0: 1482691 (82.41%)
Size 1: 93272 (5.18%)
Size 2+: 223264 (12.41%)
+++ /home/wolf/desktop/tracy-old/asset-new-id.tracy +++
Vectors: 1977996
Size 0: 1640817 (82.95%)
Size 1: 97198 (4.91%)
Size 2+: 239981 (12.13%)
+++ /home/wolf/desktop/tracy-old/asset-old.tracy +++
Vectors: 1782395
Size 0: 1471437 (82.55%)
Size 1: 88813 (4.98%)
Size 2+: 222145 (12.46%)
+++ /home/wolf/desktop/tracy-old/big.tracy +++
Vectors: 180794047
Size 0: 172696094 (95.52%)
Size 1: 2799772 (1.55%)
Size 2+: 5298181 (2.93%)
+++ /home/wolf/desktop/tracy-old/darkrl.tracy +++
Vectors: 12014129
Size 0: 11611324 (96.65%)
Size 1: 134980 (1.12%)
Size 2+: 267825 (2.23%)
+++ /home/wolf/desktop/tracy-old/mem.tracy +++
Vectors: 383097
Size 0: 321932 (84.03%)
Size 1: 854 (0.22%)
Size 2+: 60311 (15.74%)
+++ /home/wolf/desktop/tracy-old/new.tracy +++
Vectors: 77536
Size 0: 63035 (81.30%)
Size 1: 8886 (11.46%)
Size 2+: 5615 (7.24%)
+++ /home/wolf/desktop/tracy-old/selfprofile.tracy +++
Vectors: 22940871
Size 0: 22704868 (98.97%)
Size 1: 73000 (0.32%)
Size 2+: 163003 (0.71%)
+++ /home/wolf/desktop/tracy-old/tbrowser.tracy +++
Vectors: 962682
Size 0: 695380 (72.23%)
Size 1: 43007 (4.47%)
Size 2+: 224295 (23.30%)
+++ /home/wolf/desktop/tracy-old/virtualfile_hc.tracy +++
Vectors: 529170
Size 0: 449386 (84.92%)
Size 1: 15694 (2.97%)
Size 2+: 64090 (12.11%)
+++ /home/wolf/desktop/tracy-old/zfile_hc.tracy +++
Vectors: 264849
Size 0: 220589 (83.29%)
Size 1: 9386 (3.54%)
Size 2+: 34874 (13.17%)
2018-07-22 16:05:50 +02:00
Bartosz Taudul
fc310ce15a Fix check. 2018-07-17 18:29:07 +02:00
Rokas Kupstys
8a8faa3d6c Added __has_include(<execution>) back. 2018-07-17 19:25:26 +03:00
Rokas Kupstys
5c75fe292f Fix msvc builds when required c++ standard version is set to lower than c++17.
Also use latest available c++ standard which allows using older VS versions that only support c++14.
2018-07-17 18:29:48 +03:00
Bartosz Taudul
c6ea032de3 GPU source location may not yet be available. 2018-07-15 19:00:40 +02:00
Bartosz Taudul
21da3bca63 Don't create lz4buf on stack. 2018-07-14 16:02:33 +02:00
Bartosz Taudul
561d2dc360 Use the fastest mutex available.
The selection is based on the following test results:

MSVC:
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.641 ns/iter
     2 thread contention: 141.559 ns/iter
     3 thread contention: 242.733 ns/iter
     4 thread contention: 409.807 ns/iter
     5 thread contention: 561.544 ns/iter
     6 thread contention: 785.845 ns/iter
=> std::mutex
     No contention: 19.190 ns/iter
     2 thread contention: 39.305 ns/iter
     3 thread contention: 58.999 ns/iter
     4 thread contention: 59.532 ns/iter
     5 thread contention: 103.539 ns/iter
     6 thread contention: 110.314 ns/iter
=> std::shared_timed_mutex
     No contention: 45.487 ns/iter
     2 thread contention: 96.351 ns/iter
     3 thread contention: 142.871 ns/iter
     4 thread contention: 184.999 ns/iter
     5 thread contention: 336.608 ns/iter
     6 thread contention: 542.551 ns/iter
=> std::shared_mutex
     No contention: 10.861 ns/iter
     2 thread contention: 17.495 ns/iter
     3 thread contention: 31.126 ns/iter
     4 thread contention: 40.468 ns/iter
     5 thread contention: 15.677 ns/iter
     6 thread contention: 64.505 ns/iter

Cygwin (clang):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 11.536 ns/iter
     2 thread contention: 121.082 ns/iter
     3 thread contention: 396.430 ns/iter
     4 thread contention: 672.555 ns/iter
     5 thread contention: 1327.761 ns/iter
     6 thread contention: 14151.955 ns/iter
=> std::mutex
     No contention: 62.583 ns/iter
     2 thread contention: 3990.464 ns/iter
     3 thread contention: 7161.189 ns/iter
     4 thread contention: 9870.820 ns/iter
     5 thread contention: 12355.178 ns/iter
     6 thread contention: 14694.903 ns/iter
=> std::shared_timed_mutex
     No contention: 91.687 ns/iter
     2 thread contention: 1115.037 ns/iter
     3 thread contention: 4183.792 ns/iter
     4 thread contention: 15283.491 ns/iter
     5 thread contention: 27812.477 ns/iter
     6 thread contention: 35028.140 ns/iter
=> std::shared_mutex
     No contention: 91.764 ns/iter
     2 thread contention: 1051.826 ns/iter
     3 thread contention: 5574.720 ns/iter
     4 thread contention: 15721.416 ns/iter
     5 thread contention: 27721.487 ns/iter
     6 thread contention: 35420.404 ns/iter

Linux (x64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 13.487 ns/iter
     2 thread contention: 210.317 ns/iter
     3 thread contention: 430.855 ns/iter
     4 thread contention: 510.533 ns/iter
     5 thread contention: 1003.609 ns/iter
     6 thread contention: 1787.683 ns/iter
=> std::mutex
     No contention: 12.403 ns/iter
     2 thread contention: 157.122 ns/iter
     3 thread contention: 186.791 ns/iter
     4 thread contention: 265.073 ns/iter
     5 thread contention: 283.778 ns/iter
     6 thread contention: 270.687 ns/iter
=> std::shared_timed_mutex
     No contention: 21.509 ns/iter
     2 thread contention: 150.179 ns/iter
     3 thread contention: 256.574 ns/iter
     4 thread contention: 415.351 ns/iter
     5 thread contention: 611.532 ns/iter
     6 thread contention: 944.695 ns/iter
=> std::shared_mutex
     No contention: 20.805 ns/iter
     2 thread contention: 157.034 ns/iter
     3 thread contention: 244.025 ns/iter
     4 thread contention: 406.269 ns/iter
     5 thread contention: 387.985 ns/iter
     6 thread contention: 468.550 ns/iter

Linux (arm64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
     No contention: 20.891 ns/iter
     2 thread contention: 211.037 ns/iter
     3 thread contention: 409.962 ns/iter
     4 thread contention: 657.441 ns/iter
     5 thread contention: 828.405 ns/iter
     6 thread contention: 1131.827 ns/iter
=> std::mutex
     No contention: 50.884 ns/iter
     2 thread contention: 103.620 ns/iter
     3 thread contention: 332.429 ns/iter
     4 thread contention: 620.802 ns/iter
     5 thread contention: 783.943 ns/iter
     6 thread contention: 834.002 ns/iter
=> std::shared_timed_mutex
     No contention: 64.948 ns/iter
     2 thread contention: 173.191 ns/iter
     3 thread contention: 490.352 ns/iter
     4 thread contention: 660.668 ns/iter
     5 thread contention: 1014.546 ns/iter
     6 thread contention: 1451.553 ns/iter
=> std::shared_mutex
     No contention: 64.521 ns/iter
     2 thread contention: 195.222 ns/iter
     3 thread contention: 490.819 ns/iter
     4 thread contention: 654.786 ns/iter
     5 thread contention: 955.759 ns/iter
     6 thread contention: 1282.544 ns/iter
2018-07-14 00:39:01 +02:00
Bartosz Taudul
96042891f7 Reintroduce explicit template type for std::lock_guard.
Requested in issue #4 for support of older MSVC versions.
2018-07-13 12:30:29 +02:00
Bartosz Taudul
90a874f311 Require MSVC 15.7 for <execution> support. 2018-07-13 12:26:02 +02:00
Bartosz Taudul
c8b5b9447d Ignore dangling memory frees in on-demand mode. 2018-07-12 01:35:32 +02:00
Bartosz Taudul
e5064dec1e Store on-demand connection state. 2018-07-12 01:21:04 +02:00
Bartosz Taudul
d1ddaa8d59 Store frame offset in trace dumps. 2018-07-10 22:56:41 +02:00
Bartosz Taudul
a78981e040 Store on-demand frame offset. 2018-07-10 22:42:00 +02:00
Bartosz Taudul
6a9caabc63 Send on-demand initial payload message. 2018-07-10 22:37:39 +02:00
Bartosz Taudul
c056f3be41 Send keep alive messages to determine if client disconnected. 2018-07-10 21:39:17 +02:00
Bartosz Taudul
cb100e261c Return custom zone names. 2018-06-29 16:12:40 +02:00
Bartosz Taudul
053284b1c7 Process custom free-form zone names. 2018-06-29 16:12:17 +02:00
Bartosz Taudul
865e8d8506 Extract zone name getting functionality. 2018-06-29 15:14:20 +02:00
Bartosz Taudul
4a467b6d03 Remove GPU resync leftovers. 2018-06-28 00:48:23 +02:00
Bartosz Taudul
ab2945b988 Slab allocator is not thread safe. 2018-06-24 17:10:46 +02:00
Bartosz Taudul
b0aa13f4af Callstack getters are const. 2018-06-24 16:15:49 +02:00
Bartosz Taudul
11cf650be6 Fix GPU queries ordering.
With multithreaded Vulkan rendering it is possible that GPU time queries
will be sent in a different order than the originating CPU queries were
made. This commit changes the in-order queue to a map of queries,
waiting to be resolved.
2018-06-22 16:37:54 +02:00
Bartosz Taudul
af0c64c888 Remove GPU resync support.
The whole concept is not really reliable. And it forces CPU to GPU sync,
which is bad.
2018-06-22 16:34:51 +02:00
Bartosz Taudul
cd5ca3e754 Don't use hash table to store 256 pointers. 2018-06-22 15:14:44 +02:00
Bartosz Taudul
3a885bb8fd Support callstack collection for OpenGL GPU zones. 2018-06-22 02:13:35 +02:00
Bartosz Taudul
35dc2f796e Process GpuZoneBeginCallstack queue event. 2018-06-22 01:56:32 +02:00
Bartosz Taudul
4992ae6b39 Take callstack field in ZoneEvent into account in save/load. 2018-06-22 01:30:08 +02:00
Bartosz Taudul
5e01a8ead9 Process callstack queue event. 2018-06-22 01:15:49 +02:00
Bartosz Taudul
205a4e4ca2 Add callstack index to ZoneEvent. 2018-06-22 01:11:03 +02:00
Bartosz Taudul
978e168cbd Handle ZoneBeginCallstack queue event.
This is identical to ZoneBegin handling, but requires some additional
bookkeeping to account for the incoming callstack information.
2018-06-22 01:07:25 +02:00
Bartosz Taudul
973eab2b4a Fix typo. 2018-06-20 23:42:00 +02:00
Bartosz Taudul
2a618c90d5 Properly save compressed thread in GPU events. 2018-06-20 23:12:49 +02:00
Bartosz Taudul
7912807133 Wait for transfer of pending callback frames. 2018-06-20 14:57:48 +02:00
Bartosz Taudul
60395c85e0 Wait for pending callstacks. 2018-06-20 14:54:08 +02:00
Bartosz Taudul
9a5329b97d Save and load callstack frames. 2018-06-20 01:59:25 +02:00
Bartosz Taudul
e56ee377f4 Fix off-by-one. 2018-06-20 01:54:27 +02:00
Bartosz Taudul
88b1955a5a Filename in callstack frame is not a persistent pointer. 2018-06-20 01:26:05 +02:00
Bartosz Taudul
4000f27e15 Stack frame accessor. 2018-06-20 01:18:59 +02:00
Bartosz Taudul
0c0afa5ac7 Process callstack frames. 2018-06-20 01:07:09 +02:00
Bartosz Taudul
203744cdd9 Callstack frame queries. 2018-06-20 00:25:26 +02:00
Bartosz Taudul
06f34052a5 Have to track callstacks of both alloc and free. 2018-06-19 22:08:47 +02:00
Bartosz Taudul
0de279005b Load saved callstack payload. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
14b71e988b Properly skip memory event data. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
4033d74479 Callstack payload index 0 is invalid. 2018-06-19 22:05:15 +02:00
Bartosz Taudul
b6e71dd909 Load memory event callstack index. 2018-06-19 21:51:06 +02:00
Bartosz Taudul
7c1333ce2f Save callstack payload. 2018-06-19 21:39:52 +02:00
Bartosz Taudul
2940230fcf Save callstack index in memory events. 2018-06-19 21:39:42 +02:00
Bartosz Taudul
77db91253b Assign callstack idx to memory event. 2018-06-19 21:34:36 +02:00
Bartosz Taudul
c28465aa7c Store unique callstack payloads. 2018-06-19 21:16:02 +02:00
Bartosz Taudul
cbc9ede3ca No-op callstack payload handling. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
6a63d09a49 Don't check for each type, if range check is possible. 2018-06-19 19:31:16 +02:00
Bartosz Taudul
e51eef3dcd Process memory events with callstack. 2018-06-19 18:52:45 +02:00
Bartosz Taudul
59dc55002b Callstack ptr in server data structures.
Will be probably reduced to 32-bit index later on.
2018-06-19 18:52:10 +02:00
Bartosz Taudul
bb0631585c Store thread id of GPU events. 2018-06-17 19:07:07 +02:00
Bartosz Taudul
cfd7ac3957 Map compressed thread id 0 to real thread id 0. 2018-06-17 19:03:06 +02:00
Bartosz Taudul
d5a4c693d8 Take GPU timestamp period into account. 2018-06-17 18:49:56 +02:00
Bartosz Taudul
dcd6cac078 Save GPU timestamp period.
Bump file version to 0.3.2.
2018-06-17 18:27:42 +02:00
Bartosz Taudul
2be1d1d2b2 Use proper type. 2018-06-07 13:30:46 +02:00
Bartosz Taudul
b7930f67da Calculate total self time of zones. 2018-06-06 00:39:22 +02:00
Bartosz Taudul
53aea660c8 Store thread id in MessageData. 2018-05-25 21:10:38 +02:00
Bartosz Taudul
bb0246730f Don't save MessageData padding.
This requires file version bump to 0.3.1.
2018-05-25 21:10:38 +02:00
Bartosz Taudul
312c20b0bc Fallback to pdqsort if parallel STL is not available. 2018-05-12 22:41:18 +02:00
Bartosz Taudul
920bfc8c82 Parallelize (big) sorts in worker. 2018-05-08 01:40:22 +02:00
Bartosz Taudul
dbc963d55c Drop template argument from std::lock_guard. 2018-05-08 01:25:16 +02:00
Bartosz Taudul
3768ed5dd7 Don't reconstruct mem plot if there's no mem event data. 2018-05-04 16:08:16 +02:00
Bartosz Taudul
e7ffe288e6 One less FileWrite::Write() call. 2018-05-04 15:11:19 +02:00
Bartosz Taudul
e058bb34c1 CompressThread body must be available. 2018-05-03 18:43:51 +02:00
Bartosz Taudul
b18841aa75 Store ordered list of memory frees. 2018-05-02 17:59:50 +02:00
Bartosz Taudul
754e79b443 Setup memory plot pointer on dump load. 2018-05-02 17:18:52 +02:00
Bartosz Taudul
7266a979c3 Omit stack. 2018-05-01 02:13:49 +02:00
Bartosz Taudul
8beb1c1a39 Add thread compression cache.
Observation: calls to CompressThread() are likely to be repeated with
the same value. Exploit that by storing last query and its result.
2018-05-01 01:29:25 +02:00
Bartosz Taudul
ec58aa4ce1 Don't increase vector size in each iteration. 2018-04-30 13:57:12 +02:00
Bartosz Taudul
553e3ca38b Optimize mem plot reconstruction loop. 2018-04-30 13:45:36 +02:00
Bartosz Taudul
76f0c8fafe Sort source location zones on a separate thread. 2018-04-30 03:54:09 +02:00
Bartosz Taudul
63e4f6fa04 Directly store values. 2018-04-30 03:30:19 +02:00
Bartosz Taudul
e5cb241c19 Optimize creation of vector of frees. 2018-04-29 13:40:47 +02:00
Bartosz Taudul
3eb73b8d43 Move memory plot reconstruction to a background thread. 2018-04-29 13:40:04 +02:00
Bartosz Taudul
bc84ebc338 Read/write LockEvent data in one go. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
c5133e0b4e Walk lockmap timeline pointer. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
9769cc4d7d Read/write most of MemEvent in one go. 2018-04-29 03:41:58 +02:00
Bartosz Taudul
d5f0f0939d No need to track min memory usage.
At least if client instrumentation was not broken and the data makes
sense.
2018-04-29 02:57:20 +02:00
Bartosz Taudul
7fdc6f5453 Zero as initial max value is fine too. 2018-04-29 02:56:23 +02:00
Bartosz Taudul
723f98d24b Overflow checks are not needed. 2018-04-29 02:47:25 +02:00
Bartosz Taudul
b06f445de9 Don't use stack to write two values... 2018-04-29 02:32:20 +02:00
Bartosz Taudul
333d3a92c8 Perform memory usage calculation on doubles. 2018-04-29 02:29:06 +02:00
Bartosz Taudul
aceaed25b9 Walk plot data pointer. 2018-04-29 02:11:47 +02:00
Bartosz Taudul
868fbace5a Don't compress thread twice, if it's the same. 2018-04-29 02:04:51 +02:00
Bartosz Taudul
fdaebc2bd8 No need to perform space check here. 2018-04-29 01:38:54 +02:00
Bartosz Taudul
d64f0390da Don't use std::sort. 2018-04-29 01:23:30 +02:00
Bartosz Taudul
7df7bf1745 Begin memory plot with no memory usage. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
a0b8ed2e50 Restore memory plot when loading data dump. 2018-04-28 16:26:45 +02:00
Bartosz Taudul
d8bfe7de2e Create memory plot based on memory alloc/free events. 2018-04-28 15:49:12 +02:00
Bartosz Taudul
cd34ed6968 Two plot types: user and memory.
Only user plots are saved in a dump file.
2018-04-28 15:48:05 +02:00
Bartosz Taudul
1fb47899b2 Fix skipping lock data with new dump version. 2018-04-22 01:26:51 +02:00
Bartosz Taudul
436cd2b6cf Drop '###Profiler' from capture name. 2018-04-21 23:29:28 +02:00
Bartosz Taudul
d1e185e176 Cleanup message data. 2018-04-21 20:36:33 +02:00
Bartosz Taudul
4cd9cf5dd9 Cleanup zone data. 2018-04-21 20:34:29 +02:00
Bartosz Taudul
0de5bcacaf Free plot data. 2018-04-21 20:12:16 +02:00
Bartosz Taudul
dda25cf66a Cosmetics. 2018-04-21 20:11:59 +02:00
Bartosz Taudul
cb298893e7 Fix skipping lock data. 2018-04-21 16:02:36 +02:00
Bartosz Taudul
121cced681 Don't save unneeded lock data.
Store only the minimal lock information required and calculate lock
counts, wait lists, etc. at load time.
2018-04-21 15:42:08 +02:00
Bartosz Taudul
a63f214964 Use static assert where static assert is due. 2018-04-21 14:47:15 +02:00
Bartosz Taudul
36efe96e9d Throw exception when trying to open unsupported dump version. 2018-04-21 14:18:42 +02:00
Bartosz Taudul
d9fd1ce74a Add dump file header. 2018-04-21 13:45:48 +02:00
Bartosz Taudul
84fd351fba Allow partial load of data from dump. 2018-04-20 16:03:09 +02:00
Bartosz Taudul
3df7c70f99 Optimize mem alloc processing. 2018-04-10 16:06:01 +02:00