Commit Graph

264 Commits

Author SHA1 Message Date
Bartosz Taudul
a40bbacb17 Use short ptr for CPU zone data. 2019-11-02 16:54:12 +01:00
Bartosz Taudul
cb20bf01f9 Use short ptr for GPU zone data. 2019-11-02 16:54:11 +01:00
Bartosz Taudul
16bc862904 Save sizes of children vectors to prevent reallocation. 2019-11-02 12:38:32 +01:00
Bartosz Taudul
25b610a36f Pack child into GPU start/end in GpuEvent (saves 4 bytes).
long    5152 MB -> 5061 MB
2019-10-30 23:50:37 +01:00
Bartosz Taudul
789b95f259 Force inline small functions. 2019-10-29 01:32:09 +01:00
Bartosz Taudul
8050622b0f Read and decompress network data on a separate thread. 2019-10-28 23:22:50 +01:00
Bartosz Taudul
788ca2e5df Spawn no-op network thread. 2019-10-28 22:45:10 +01:00
Bartosz Taudul
01985f50ef Cache source location zones counter search. 2019-10-26 16:33:40 +02:00
Bartosz Taudul
1d0084aa28 Add cache for last accessed source location zones. 2019-10-25 21:29:55 +02:00
Bartosz Taudul
779063a18b Cache last shrinked source location. 2019-10-25 21:07:28 +02:00
Bartosz Taudul
294793367f Cache last CheckSourceLocation query.
Just knowing that the query was performed is enough here -- this
function adds a new source location entry, if there already isn't one.
2019-10-25 21:01:33 +02:00
Bartosz Taudul
0f2503d334 Send time deltas in GPU time events. 2019-10-25 19:52:01 +02:00
Bartosz Taudul
1ce25d3aef Init cache in-place. 2019-10-25 19:19:35 +02:00
Bartosz Taudul
8fa5188176 Send delta times for context switches. 2019-10-25 19:13:11 +02:00
Bartosz Taudul
c8e5489e99 Group caches together. 2019-10-25 18:16:27 +02:00
Bartosz Taudul
d6a8a8532f Prevent storing variable on stack. 2019-10-24 23:40:21 +02:00
Bartosz Taudul
1fe76be955 Don't reconstruct lock event time on insert. 2019-10-24 23:25:04 +02:00
Bartosz Taudul
45332fd837 Don't read memory when setting values. 2019-10-24 23:03:13 +02:00
Bartosz Taudul
5873561b54 Add cached thread retriever. 2019-10-24 22:33:48 +02:00
Bartosz Taudul
1cfb5adc44 Count transferred data size. 2019-10-24 00:47:16 +02:00
Bartosz Taudul
ba61a9ed84 Transfer time deltas, not absolute times.
This change significantly reduces network bandwidth requirements.

Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
2019-10-24 00:06:41 +02:00
Bartosz Taudul
3ae5c125f6 Implement counting CPU usage (ctx switch) at a given time. 2019-10-15 16:54:43 +02:00
Bartosz Taudul
eccb0b1e4a Track max CPU present in context switch data. 2019-10-15 16:13:53 +02:00
Bartosz Taudul
1ae49c14a2 GPU zone count accessor. 2019-10-13 14:13:28 +02:00
Bartosz Taudul
5e1894dd79 Count GPU zones. 2019-10-13 14:13:04 +02:00
Bartosz Taudul
f0b957ec56 Store callstacks on 24 bits.
ZoneEvent is now 27 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9224 -> 9011  (97%)
chicken         2044 -> 2027  (99%)
drl-l-b         1443 -> 1383  (95%)
long            5327 -> 5253  (98%)
q3bsp-mt        5400 -> 5304  (98%)
selfprofile     1403 -> 1382  (98%)
2019-10-01 22:38:17 +02:00
Bartosz Taudul
717a212563 Save another 2 bytes per ZoneEvent.
ZoneEvent is not 28 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9527 -> 9224  (96%)
chicken         2107 -> 2044  (97%)
drl-l-b         1479 -> 1443  (97%)
long            5412 -> 5327  (98%)
q3bsp-mt        5592 -> 5400  (96%)
selfprofile     1443 -> 1403  (97%)
2019-10-01 01:05:37 +02:00
Bartosz Taudul
2470936050 Don't perform background tasks during trace upgrade. 2019-09-29 20:52:25 +02:00
Bartosz Taudul
d228bcb622 Pack StringIdx in 24 bits.
This reduces ZoneEvent size from 32 to 30 bytes.

Memory usage reduction on selected traces (sizes in MB):

big             9902 -> 9527  (96%)
chicken         2172 -> 2107  (97%)
ctx-big          311 ->  309  (99%)
drl-l-b         1570 -> 1479  (94%)
long            5496 -> 5412  (98%)
mem             6468 -> 6468  (100%)
q3bsp-mt        5784 -> 5592  (96%)
selfprofile     1486 -> 1443  (97%)
2019-09-29 20:32:42 +02:00
Bartosz Taudul
c91ae667d1 Add string count getter. 2019-09-29 19:22:15 +02:00
Bartosz Taudul
82cd667b30 Allow specifying network port in server. 2019-09-21 15:43:01 +02:00
Bartosz Taudul
e1e5d6bd47 Add const version of PackFrameImage().
Temporary buffer needs to be handled outside of the function.
2019-09-20 22:55:55 +02:00
Bartosz Taudul
ea6a0a58a7 Thread data accessor. 2019-09-08 14:07:16 +02:00
Bartosz Taudul
86cb477811 Pack ZoneThreadData.
This reduces struct size from 10 to 8 bytes. Assumes 48-bit pointers
(4-level paging)!

Memory savings (MB):

android     2766    ->  2757    (99%)
big         10.29 G ->  9902    (96%)
chicken     2244    ->  2172    (96%)
ctx-android 228     ->  224     (98%)
drl-l-b     1635    ->  1570    (96%)
gn-vulkan   244     ->  240     (98%)
long        5656    ->  5496    (97%)
q3bsp-mt    6043    ->  5784    (95%)
selfprofile 1554    ->  1486    (95%)
2019-08-31 00:55:51 +02:00
Bartosz Taudul
fd5014be6f GetThreadString() is no longer used. 2019-08-28 20:08:16 +02:00
Bartosz Taudul
3c092b4bec Add thread name getter combining local and external thread names. 2019-08-27 23:00:13 +02:00
Bartosz Taudul
1712431dfd Compress external threads. Saves 4 bytes per ctx switch.
Dropped support for loading context switch data in previous versions of
traces.
2019-08-19 23:09:58 +02:00
Bartosz Taudul
21e7a4bb16 Extract thread compression into a separate class. 2019-08-19 22:56:58 +02:00
Bartosz Taudul
62dbe522c5 Add accessors. 2019-08-18 01:51:02 +02:00
Bartosz Taudul
103645c2fa Calculate cpu thread data statistics. 2019-08-18 01:50:49 +02:00
Bartosz Taudul
20e8a5ecc8 Create tid to pid mapping. 2019-08-17 22:32:41 +02:00
Bartosz Taudul
678e942e9f Transfer PID of profiled program. 2019-08-17 22:19:04 +02:00
Bartosz Taudul
414f903cc5 Collect thread wakeup data. 2019-08-17 17:05:29 +02:00
Bartosz Taudul
e975c4d7bf Also retrieve external thread names. 2019-08-16 19:49:16 +02:00
Bartosz Taudul
fe7f56b022 Implement retrieval of external process names. 2019-08-16 19:22:23 +02:00
Bartosz Taudul
c212661714 Allow determining whether thread is local to profiled program. 2019-08-16 17:59:25 +02:00
Bartosz Taudul
cef7e4b8d0 Save/load per-cpu context switches. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
8bc4258e29 Display count of per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
a92034d59d CPU data accessor. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
69527d2f71 Collect per-cpu context switch data. 2019-08-16 16:51:18 +02:00
Bartosz Taudul
41beff29a9 Remove redundant GetTimeBegin().
Traces now start at zero time.
2019-08-15 21:04:20 +02:00
Bartosz Taudul
bf3ad57456 Pack start time and srcloc together in ZoneEvent.
This reduces ZoneEvent struct size by 2 bytes. Memory savings on various
captures:

10.62 GB -> 10.29 GB
 2342 MB ->  2276 MB
 1706 MB ->  1635 MB
 6277 MB ->  6085 MB
2019-08-15 20:17:36 +02:00
Bartosz Taudul
b322d20c19 Store received timestamps offset to 0. 2019-08-15 20:17:36 +02:00
Bartosz Taudul
659907c972 Store srcloc identifiers using 16 bit.
This reduces various structure sizes by 2 bytes. Memory usage reduction
on various traces:

big               11 GB -> 10.62 GB
chicken         2436 MB ->  2342 MB
drl-light-big   1761 MB ->  1706 MB
q3bsp-mt        6469 MB ->  6277 MB
2019-08-15 20:15:48 +02:00
Bartosz Taudul
416113fdcb Drop support for ETC1 frame images. 2019-08-15 16:29:50 +02:00
Bartosz Taudul
a194c93740 Allow checking if context switch data is available. 2019-08-14 20:26:55 +02:00
Bartosz Taudul
9a364fe5fe Cache context switch data queries. 2019-08-14 20:16:11 +02:00
Bartosz Taudul
9417ad994d Save/load context switch data. 2019-08-13 13:10:37 +02:00
Bartosz Taudul
8c494eabbf Display number of context switch regions. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
0b03fed61c Add context switch accessor. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
419f74280d Store context switches. 2019-08-13 02:35:32 +02:00
Bartosz Taudul
8aa0be39d5 Drop support for CPU id queries. 2019-08-12 23:05:34 +02:00
Bartosz Taudul
154c902e03 Handle legacy file versions. 2019-08-12 12:36:37 +02:00
Bartosz Taudul
a76622d17a Cache last searched ThreadData. 2019-08-03 14:35:01 +02:00
Bartosz Taudul
12969ee497 Track thread context.
This change exploits the fact that events are processed in batches
originating from a single thread. A single message changing thread
context is enough to handle multiple messages, as opposed to inclusion
of thread identifier in each message.
2019-08-02 20:18:08 +02:00
Bartosz Taudul
a4e7a341c0 Proper handling of disconnect request. 2019-08-01 23:14:09 +02:00
Bartosz Taudul
8c9d46ef29 Display application info in info window. 2019-07-12 18:39:07 +02:00
Bartosz Taudul
d64ab7db5a Store app info messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
10bcc8c770 Switch to DXT1 textures in profiler utility. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
7dc7ece2bd Add staging area for frame images.
Compressing frame images on a separate thread may cause frame image
arrival before frames are sent. Fix this issue by creating a staging
area in which frame images will wait for frames to arrive.

This probably breaks playback functionality, as non-existent frames may
be queried, but this problem seems to be very hard to find, so let's
ignore it for now.
2019-06-27 13:24:35 +02:00
Bartosz Taudul
8f7be5a0fa Allow only 2^32-1 frame images. 2019-06-22 14:11:45 +02:00
Bartosz Taudul
6a82f666a7 Cosmetics. 2019-06-22 14:05:18 +02:00
Bartosz Taudul
31a4a45b14 Ignore memory free faults if running on apple.
There's a case in MoltenVK initialization where overloading operator new
and operator delete works for std::string destruction, but not
construction.
2019-06-13 14:15:17 +02:00
Bartosz Taudul
38b76ea32d Add frame images vector accessor. 2019-06-12 00:14:44 +02:00
Bartosz Taudul
2c780f1af4 Allow sending immediate termination query from server. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
50cda7720f Handle frame image instrumentation failures. 2019-06-09 13:44:53 +02:00
Bartosz Taudul
bef1988800 Compress frame images using LZ4. 2019-06-08 12:17:18 +02:00
Bartosz Taudul
646e7327b8 Show loading progress of frame images. 2019-06-06 23:40:37 +02:00
Bartosz Taudul
f8a4909c96 Display number of frame images in a trace. 2019-06-06 23:21:36 +02:00
Bartosz Taudul
6b2741ccdb Save/load frame images. 2019-06-06 23:08:19 +02:00
Bartosz Taudul
af56f41e32 Add frame image accessor. 2019-06-06 22:14:51 +02:00
Bartosz Taudul
e5bb6011c5 Frame image transfer prototype. 2019-06-06 21:39:54 +02:00
Bartosz Taudul
5681096486 Track status of worker background tasks. 2019-06-02 15:00:38 +02:00
Bartosz Taudul
845f3a2ddf Use std::shared_mutex for locking worker access. 2019-05-28 19:21:53 +02:00
Bartosz Taudul
b3812146cb Fix atomics initialization. 2019-05-27 14:09:55 +02:00
Bartosz Taudul
efc54babe3 Transfer of colored messages. 2019-05-10 20:17:44 +02:00
Bartosz Taudul
20e6813461 Store send queue size in mbps block. 2019-04-01 19:55:37 +02:00
Bartosz Taudul
57dff0abc9 Add server query queue. 2019-04-01 19:26:50 +02:00
Bartosz Taudul
c07c6d11b7 Define server query packet. 2019-04-01 19:21:53 +02:00
Bartosz Taudul
a632d9e2a3 Add zone vector cache.
Zone children will be now collected in staging vectors. When the zone is
ended (and no children can be added anymore to it), a size-fitted vector
is allocated using slab allocation. The over-allocated vector is then
put into cache for use in future zones.

This is only active for vectors <= 8192 elements, or 64 KB (chosen
arbitrarily), to reduce time spent on copying memory.

Overall, this change should have the following effects:
- System memory allocation pressure reduction, due to re-usage of
  vectors, which eliminates the need for constant growth.
- Reduction of memory usage, because children vectors are now fitted to
  required size.
- Slight increase of zone processing time, due to memory copying?
2019-03-26 22:06:00 +01:00
Bartosz Taudul
11f4dcbf1e Consistent variable naming. 2019-03-26 21:41:44 +01:00
Bartosz Taudul
71e20e7e7f Store lock map as flat_hash_map with pointer values. 2019-03-16 02:50:51 +01:00
Bartosz Taudul
a0299cc63a Optimize calculation of standard deviation. 2019-03-14 01:23:37 +01:00
Bartosz Taudul
f57cac9042 Initialize SourceLocationZones in-place. 2019-03-14 01:15:19 +01:00
Bartosz Taudul
ebf09bebae Only one callstack may be in-flight at any time.
Save for the allocated callstack, but this will be solved in another
way. There's no need to keep pending callstacks in a map.
2019-03-05 19:44:08 +01:00
Bartosz Taudul
3e81e44d75 Separate processing for allocated callstacks.
Only native callstack is used for the moment.
2019-03-05 02:42:50 +01:00
Bartosz Taudul
dc74297439 Add missing const qualifiers. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
bef31ba073 Separate message for zone begin with alloc src loc and callstack. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
66b8a13e77 Store callstack alloc payloads. 2019-03-03 18:05:03 +01:00
Bartosz Taudul
9fc022346b Replace frame pointers with callstack frame ids. 2019-03-03 18:05:03 +01:00