Bartosz Taudul
3a934b2ba3
Store children vectors in a separate data collection.
...
This reduces per-zone memory cost by 9 bytes if there are no children
and increases it by 4 bytes if there are children. This is a better
solution for every trace tested, as the following data shows:
+++ /home/wolf/desktop/tracy-old/android.tracy +++
Vectors: 2794480
Size 0: 2373070 (84.92%)
Size 1: 70237 (2.51%)
Size 2+: 351173 (12.57%)
+++ /home/wolf/desktop/tracy-old/asset-new.tracy +++
Vectors: 1799227
Size 0: 1482691 (82.41%)
Size 1: 93272 (5.18%)
Size 2+: 223264 (12.41%)
+++ /home/wolf/desktop/tracy-old/asset-new-id.tracy +++
Vectors: 1977996
Size 0: 1640817 (82.95%)
Size 1: 97198 (4.91%)
Size 2+: 239981 (12.13%)
+++ /home/wolf/desktop/tracy-old/asset-old.tracy +++
Vectors: 1782395
Size 0: 1471437 (82.55%)
Size 1: 88813 (4.98%)
Size 2+: 222145 (12.46%)
+++ /home/wolf/desktop/tracy-old/big.tracy +++
Vectors: 180794047
Size 0: 172696094 (95.52%)
Size 1: 2799772 (1.55%)
Size 2+: 5298181 (2.93%)
+++ /home/wolf/desktop/tracy-old/darkrl.tracy +++
Vectors: 12014129
Size 0: 11611324 (96.65%)
Size 1: 134980 (1.12%)
Size 2+: 267825 (2.23%)
+++ /home/wolf/desktop/tracy-old/mem.tracy +++
Vectors: 383097
Size 0: 321932 (84.03%)
Size 1: 854 (0.22%)
Size 2+: 60311 (15.74%)
+++ /home/wolf/desktop/tracy-old/new.tracy +++
Vectors: 77536
Size 0: 63035 (81.30%)
Size 1: 8886 (11.46%)
Size 2+: 5615 (7.24%)
+++ /home/wolf/desktop/tracy-old/selfprofile.tracy +++
Vectors: 22940871
Size 0: 22704868 (98.97%)
Size 1: 73000 (0.32%)
Size 2+: 163003 (0.71%)
+++ /home/wolf/desktop/tracy-old/tbrowser.tracy +++
Vectors: 962682
Size 0: 695380 (72.23%)
Size 1: 43007 (4.47%)
Size 2+: 224295 (23.30%)
+++ /home/wolf/desktop/tracy-old/virtualfile_hc.tracy +++
Vectors: 529170
Size 0: 449386 (84.92%)
Size 1: 15694 (2.97%)
Size 2+: 64090 (12.11%)
+++ /home/wolf/desktop/tracy-old/zfile_hc.tracy +++
Vectors: 264849
Size 0: 220589 (83.29%)
Size 1: 9386 (3.54%)
Size 2+: 34874 (13.17%)
2018-07-22 16:05:50 +02:00
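The trade-off described in this commit can be sketched as follows. This is a minimal illustration, not the actual Tracy data structures: `Zone`, `ZoneStore`, `ChildrenOf` and `NetByteChange` are hypothetical names, and the only figures taken from the commit message are the 9-byte saving and the 4-byte cost. With those figures the change pays off whenever more than 4/13 (about 31%) of zones have no children, and the lowest share in the data above is 72.23%.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical zone record: instead of embedding a child vector, it keeps a
// 4-byte index into a separate collection (-1 means "no children").
struct Zone
{
    int64_t start = 0;
    int64_t end = 0;
    int32_t child = -1;
};

struct ZoneStore
{
    std::vector<Zone> zones;
    std::vector<std::vector<uint32_t>> children;   // filled only on demand

    // Returns the child list of a zone, creating it on first use.
    // The returned reference is valid until the next list is created.
    std::vector<uint32_t>& ChildrenOf( Zone& zone )
    {
        if( zone.child < 0 )
        {
            zone.child = int32_t( children.size() );
            children.emplace_back();
        }
        return children[zone.child];
    }
};

// Net memory change in bytes, using the per-zone figures from the commit
// message: -9 bytes per zone without children, +4 bytes per zone with them.
inline int64_t NetByteChange( int64_t vectors, int64_t empty )
{
    assert( empty >= 0 && empty <= vectors );
    return -9 * empty + 4 * ( vectors - empty );
}
```

For the android.tracy numbers above, `NetByteChange( 2794480, 2373070 )` evaluates to -19671990, a saving of roughly 19.7 MB.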
Bartosz Taudul
fc310ce15a
Fix check.
2018-07-17 18:29:07 +02:00
Rokas Kupstys
8a8faa3d6c
Added __has_include(<execution>) back.
2018-07-17 19:25:26 +03:00
Rokas Kupstys
5c75fe292f
Fix MSVC builds when the required C++ standard version is set lower than C++17.
...
Also use the latest available C++ standard, which allows using older VS versions that only support C++14.
2018-07-17 18:29:48 +03:00
Bartosz Taudul
c6ea032de3
GPU source location may not yet be available.
2018-07-15 19:00:40 +02:00
Bartosz Taudul
21da3bca63
Don't create lz4buf on stack.
2018-07-14 16:02:33 +02:00
Bartosz Taudul
561d2dc360
Use the fastest mutex available.
...
The selection is based on the following test results:
MSVC:
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 11.641 ns/iter
2 thread contention: 141.559 ns/iter
3 thread contention: 242.733 ns/iter
4 thread contention: 409.807 ns/iter
5 thread contention: 561.544 ns/iter
6 thread contention: 785.845 ns/iter
=> std::mutex
No contention: 19.190 ns/iter
2 thread contention: 39.305 ns/iter
3 thread contention: 58.999 ns/iter
4 thread contention: 59.532 ns/iter
5 thread contention: 103.539 ns/iter
6 thread contention: 110.314 ns/iter
=> std::shared_timed_mutex
No contention: 45.487 ns/iter
2 thread contention: 96.351 ns/iter
3 thread contention: 142.871 ns/iter
4 thread contention: 184.999 ns/iter
5 thread contention: 336.608 ns/iter
6 thread contention: 542.551 ns/iter
=> std::shared_mutex
No contention: 10.861 ns/iter
2 thread contention: 17.495 ns/iter
3 thread contention: 31.126 ns/iter
4 thread contention: 40.468 ns/iter
5 thread contention: 15.677 ns/iter
6 thread contention: 64.505 ns/iter
Cygwin (clang):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 11.536 ns/iter
2 thread contention: 121.082 ns/iter
3 thread contention: 396.430 ns/iter
4 thread contention: 672.555 ns/iter
5 thread contention: 1327.761 ns/iter
6 thread contention: 14151.955 ns/iter
=> std::mutex
No contention: 62.583 ns/iter
2 thread contention: 3990.464 ns/iter
3 thread contention: 7161.189 ns/iter
4 thread contention: 9870.820 ns/iter
5 thread contention: 12355.178 ns/iter
6 thread contention: 14694.903 ns/iter
=> std::shared_timed_mutex
No contention: 91.687 ns/iter
2 thread contention: 1115.037 ns/iter
3 thread contention: 4183.792 ns/iter
4 thread contention: 15283.491 ns/iter
5 thread contention: 27812.477 ns/iter
6 thread contention: 35028.140 ns/iter
=> std::shared_mutex
No contention: 91.764 ns/iter
2 thread contention: 1051.826 ns/iter
3 thread contention: 5574.720 ns/iter
4 thread contention: 15721.416 ns/iter
5 thread contention: 27721.487 ns/iter
6 thread contention: 35420.404 ns/iter
Linux (x64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 13.487 ns/iter
2 thread contention: 210.317 ns/iter
3 thread contention: 430.855 ns/iter
4 thread contention: 510.533 ns/iter
5 thread contention: 1003.609 ns/iter
6 thread contention: 1787.683 ns/iter
=> std::mutex
No contention: 12.403 ns/iter
2 thread contention: 157.122 ns/iter
3 thread contention: 186.791 ns/iter
4 thread contention: 265.073 ns/iter
5 thread contention: 283.778 ns/iter
6 thread contention: 270.687 ns/iter
=> std::shared_timed_mutex
No contention: 21.509 ns/iter
2 thread contention: 150.179 ns/iter
3 thread contention: 256.574 ns/iter
4 thread contention: 415.351 ns/iter
5 thread contention: 611.532 ns/iter
6 thread contention: 944.695 ns/iter
=> std::shared_mutex
No contention: 20.805 ns/iter
2 thread contention: 157.034 ns/iter
3 thread contention: 244.025 ns/iter
4 thread contention: 406.269 ns/iter
5 thread contention: 387.985 ns/iter
6 thread contention: 468.550 ns/iter
Linux (arm64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 20.891 ns/iter
2 thread contention: 211.037 ns/iter
3 thread contention: 409.962 ns/iter
4 thread contention: 657.441 ns/iter
5 thread contention: 828.405 ns/iter
6 thread contention: 1131.827 ns/iter
=> std::mutex
No contention: 50.884 ns/iter
2 thread contention: 103.620 ns/iter
3 thread contention: 332.429 ns/iter
4 thread contention: 620.802 ns/iter
5 thread contention: 783.943 ns/iter
6 thread contention: 834.002 ns/iter
=> std::shared_timed_mutex
No contention: 64.948 ns/iter
2 thread contention: 173.191 ns/iter
3 thread contention: 490.352 ns/iter
4 thread contention: 660.668 ns/iter
5 thread contention: 1014.546 ns/iter
6 thread contention: 1451.553 ns/iter
=> std::shared_mutex
No contention: 64.521 ns/iter
2 thread contention: 195.222 ns/iter
3 thread contention: 490.819 ns/iter
4 thread contention: 654.786 ns/iter
5 thread contention: 955.759 ns/iter
6 thread contention: 1282.544 ns/iter
2018-07-14 00:39:01 +02:00
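One way to act on these measurements is a compile-time type alias. The sketch below is an assumption about how such a selection could look, not the code from this commit: `FastMutex` and `LockedIncrement` are hypothetical names. It picks `std::shared_mutex` on MSVC, where it was the fastest, and `std::mutex` elsewhere. The Cygwin case, where NonRecursiveBenaphore did best, is left out because that type is not part of the standard library.

```cpp
#include <mutex>
#if defined _MSC_VER
#  include <shared_mutex>   // std::shared_mutex needs C++17
#endif

// Hypothetical alias: choose the lock type per platform at compile time.
#if defined _MSC_VER
using FastMutex = std::shared_mutex;
#else
using FastMutex = std::mutex;
#endif

// Example critical section. The explicit template argument on
// std::lock_guard keeps the code usable with pre-C++17 compilers.
inline int LockedIncrement( FastMutex& m, int& counter )
{
    std::lock_guard<FastMutex> lock( m );
    return ++counter;
}
```

Because callers only name `FastMutex`, changing the selection after a new round of benchmarks touches a single place.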
Bartosz Taudul
96042891f7
Reintroduce explicit template type for std::lock_guard.
...
Requested in issue #4 for support of older MSVC versions.
2018-07-13 12:30:29 +02:00
Bartosz Taudul
90a874f311
Require MSVC 15.7 for <execution> support.
2018-07-13 12:26:02 +02:00
Bartosz Taudul
c8b5b9447d
Ignore dangling memory frees in on-demand mode.
2018-07-12 01:35:32 +02:00
Bartosz Taudul
e5064dec1e
Store on-demand connection state.
2018-07-12 01:21:04 +02:00
Bartosz Taudul
d1ddaa8d59
Store frame offset in trace dumps.
2018-07-10 22:56:41 +02:00
Bartosz Taudul
a78981e040
Store on-demand frame offset.
2018-07-10 22:42:00 +02:00
Bartosz Taudul
6a9caabc63
Send on-demand initial payload message.
2018-07-10 22:37:39 +02:00
Bartosz Taudul
c056f3be41
Send keep alive messages to determine if client disconnected.
2018-07-10 21:39:17 +02:00
Bartosz Taudul
cb100e261c
Return custom zone names.
2018-06-29 16:12:40 +02:00
Bartosz Taudul
053284b1c7
Process custom free-form zone names.
2018-06-29 16:12:17 +02:00
Bartosz Taudul
865e8d8506
Extract zone name getting functionality.
2018-06-29 15:14:20 +02:00
Bartosz Taudul
4a467b6d03
Remove GPU resync leftovers.
2018-06-28 00:48:23 +02:00
Bartosz Taudul
ab2945b988
Slab allocator is not thread safe.
2018-06-24 17:10:46 +02:00
Bartosz Taudul
b0aa13f4af
Callstack getters are const.
2018-06-24 16:15:49 +02:00
Bartosz Taudul
11cf650be6
Fix GPU queries ordering.
...
With multithreaded Vulkan rendering it is possible that GPU time queries
will be sent in a different order than the originating CPU queries were
made. This commit changes the in-order queue to a map of queries
waiting to be resolved.
2018-06-22 16:37:54 +02:00
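The idea of replacing an in-order queue with a map of pending queries can be sketched like this. All names here (`GpuZone`, `PendingGpuQueries`, `Add`, `Resolve`) and the 16-bit query id are assumptions made for illustration. The commit message only states that a map is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical GPU zone: the GPU timestamp is filled in once it arrives.
struct GpuZone
{
    int64_t cpuStart = 0;
    int64_t gpuTime = -1;
};

class PendingGpuQueries
{
public:
    // Registers a zone that is waiting for the result of the given query.
    void Add( uint16_t queryId, GpuZone* zone )
    {
        assert( zone != nullptr );
        m_pending[queryId] = zone;
    }

    // Matches a GPU time to its zone by id, regardless of arrival order.
    // Returns false if no query with this id is waiting.
    bool Resolve( uint16_t queryId, int64_t gpuTime )
    {
        auto it = m_pending.find( queryId );
        if( it == m_pending.end() ) return false;
        it->second->gpuTime = gpuTime;
        m_pending.erase( it );
        return true;
    }

    std::size_t Size() const { return m_pending.size(); }

private:
    std::unordered_map<uint16_t, GpuZone*> m_pending;
};
```

With a plain queue, a result arriving for the second query first would be attached to the first zone. Looking the zone up by id avoids that mismatch.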
Bartosz Taudul
af0c64c888
Remove GPU resync support.
...
The whole concept is not really reliable, and it forces a CPU-to-GPU
sync, which is bad.
2018-06-22 16:34:51 +02:00
Bartosz Taudul
cd5ca3e754
Don't use hash table to store 256 pointers.
2018-06-22 15:14:44 +02:00
Bartosz Taudul
3a885bb8fd
Support callstack collection for OpenGL GPU zones.
2018-06-22 02:13:35 +02:00
Bartosz Taudul
35dc2f796e
Process GpuZoneBeginCallstack queue event.
2018-06-22 01:56:32 +02:00
Bartosz Taudul
4992ae6b39
Take callstack field in ZoneEvent into account in save/load.
2018-06-22 01:30:08 +02:00
Bartosz Taudul
5e01a8ead9
Process callstack queue event.
2018-06-22 01:15:49 +02:00
Bartosz Taudul
205a4e4ca2
Add callstack index to ZoneEvent.
2018-06-22 01:11:03 +02:00
Bartosz Taudul
978e168cbd
Handle ZoneBeginCallstack queue event.
...
This is identical to ZoneBegin handling, but requires some additional
bookkeeping to account for the incoming callstack information.
2018-06-22 01:07:25 +02:00
Bartosz Taudul
973eab2b4a
Fix typo.
2018-06-20 23:42:00 +02:00
Bartosz Taudul
2a618c90d5
Properly save compressed thread in GPU events.
2018-06-20 23:12:49 +02:00
Bartosz Taudul
7912807133
Wait for transfer of pending callback frames.
2018-06-20 14:57:48 +02:00
Bartosz Taudul
60395c85e0
Wait for pending callstacks.
2018-06-20 14:54:08 +02:00
Bartosz Taudul
9a5329b97d
Save and load callstack frames.
2018-06-20 01:59:25 +02:00
Bartosz Taudul
e56ee377f4
Fix off-by-one.
2018-06-20 01:54:27 +02:00
Bartosz Taudul
88b1955a5a
Filename in callstack frame is not a persistent pointer.
2018-06-20 01:26:05 +02:00
Bartosz Taudul
4000f27e15
Stack frame accessor.
2018-06-20 01:18:59 +02:00
Bartosz Taudul
0c0afa5ac7
Process callstack frames.
2018-06-20 01:07:09 +02:00
Bartosz Taudul
203744cdd9
Callstack frame queries.
2018-06-20 00:25:26 +02:00
Bartosz Taudul
06f34052a5
Have to track callstacks of both alloc and free.
2018-06-19 22:08:47 +02:00
Bartosz Taudul
0de279005b
Load saved callstack payload.
2018-06-19 22:05:15 +02:00
Bartosz Taudul
14b71e988b
Properly skip memory event data.
2018-06-19 22:05:15 +02:00
Bartosz Taudul
4033d74479
Callstack payload index 0 is invalid.
2018-06-19 22:05:15 +02:00
Bartosz Taudul
b6e71dd909
Load memory event callstack index.
2018-06-19 21:51:06 +02:00
Bartosz Taudul
7c1333ce2f
Save callstack payload.
2018-06-19 21:39:52 +02:00
Bartosz Taudul
2940230fcf
Save callstack index in memory events.
2018-06-19 21:39:42 +02:00
Bartosz Taudul
77db91253b
Assign callstack idx to memory event.
2018-06-19 21:34:36 +02:00
Bartosz Taudul
c28465aa7c
Store unique callstack payloads.
2018-06-19 21:16:02 +02:00
Bartosz Taudul
cbc9ede3ca
No-op callstack payload handling.
2018-06-19 19:31:16 +02:00
Bartosz Taudul
6a63d09a49
Don't check for each type if a range check is possible.
2018-06-19 19:31:16 +02:00
Bartosz Taudul
e51eef3dcd
Process memory events with callstack.
2018-06-19 18:52:45 +02:00
Bartosz Taudul
59dc55002b
Callstack ptr in server data structures.
...
Will probably be reduced to a 32-bit index later on.
2018-06-19 18:52:10 +02:00
Bartosz Taudul
bb0631585c
Store thread id of GPU events.
2018-06-17 19:07:07 +02:00
Bartosz Taudul
cfd7ac3957
Map compressed thread id 0 to real thread id 0.
2018-06-17 19:03:06 +02:00
Bartosz Taudul
d5a4c693d8
Take GPU timestamp period into account.
2018-06-17 18:49:56 +02:00
Bartosz Taudul
dcd6cac078
Save GPU timestamp period.
...
Bump file version to 0.3.2.
2018-06-17 18:27:42 +02:00
Bartosz Taudul
2be1d1d2b2
Use proper type.
2018-06-07 13:30:46 +02:00
Bartosz Taudul
b7930f67da
Calculate total self time of zones.
2018-06-06 00:39:22 +02:00
Bartosz Taudul
53aea660c8
Store thread id in MessageData.
2018-05-25 21:10:38 +02:00
Bartosz Taudul
bb0246730f
Don't save MessageData padding.
...
This requires file version bump to 0.3.1.
2018-05-25 21:10:38 +02:00
Bartosz Taudul
312c20b0bc
Fallback to pdqsort if parallel STL is not available.
2018-05-12 22:41:18 +02:00
Bartosz Taudul
920bfc8c82
Parallelize (big) sorts in worker.
2018-05-08 01:40:22 +02:00
Bartosz Taudul
dbc963d55c
Drop template argument from std::lock_guard.
2018-05-08 01:25:16 +02:00
Bartosz Taudul
3768ed5dd7
Don't reconstruct mem plot if there's no mem event data.
2018-05-04 16:08:16 +02:00
Bartosz Taudul
e7ffe288e6
One less FileWrite::Write() call.
2018-05-04 15:11:19 +02:00
Bartosz Taudul
e058bb34c1
CompressThread body must be available.
2018-05-03 18:43:51 +02:00
Bartosz Taudul
b18841aa75
Store ordered list of memory frees.
2018-05-02 17:59:50 +02:00
Bartosz Taudul
754e79b443
Setup memory plot pointer on dump load.
2018-05-02 17:18:52 +02:00
Bartosz Taudul
7266a979c3
Omit stack.
2018-05-01 02:13:49 +02:00
Bartosz Taudul
8beb1c1a39
Add thread compression cache.
...
Observation: calls to CompressThread() are likely to be repeated with
the same value. Exploit that by storing the last query and its result.
2018-05-01 01:29:25 +02:00
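A last-query cache of this kind could look roughly like the sketch below. It is written under assumptions: the class name `ThreadCompressor`, the 16-bit compressed id and the `SlowLookups` counter are hypothetical and exist only to make the effect of the cache visible. Only `CompressThread()` is named in the commit message, and its real signature is not shown there.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

class ThreadCompressor
{
public:
    // Maps a 64-bit thread id to a small index, remembering the last answer.
    uint16_t Compress( uint64_t thread )
    {
        // Fast path: same thread as in the previous call.
        if( m_valid && thread == m_lastThread ) return m_lastId;

        // Slow path: hash table lookup, inserting a new id if needed.
        ++m_slowLookups;
        auto it = m_map.find( thread );
        if( it == m_map.end() )
        {
            assert( m_map.size() < 65535 );
            it = m_map.emplace( thread, uint16_t( m_map.size() ) ).first;
        }
        m_valid = true;
        m_lastThread = thread;
        m_lastId = it->second;
        return m_lastId;
    }

    int SlowLookups() const { return m_slowLookups; }

private:
    std::unordered_map<uint64_t, uint16_t> m_map;
    uint64_t m_lastThread = 0;
    uint16_t m_lastId = 0;
    bool m_valid = false;
    int m_slowLookups = 0;
};
```

When events from one thread arrive in long runs, most calls take the fast path and skip the hash lookup entirely.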
Bartosz Taudul
ec58aa4ce1
Don't increase vector size in each iteration.
2018-04-30 13:57:12 +02:00
Bartosz Taudul
553e3ca38b
Optimize mem plot reconstruction loop.
2018-04-30 13:45:36 +02:00
Bartosz Taudul
76f0c8fafe
Sort source location zones on a separate thread.
2018-04-30 03:54:09 +02:00
Bartosz Taudul
63e4f6fa04
Directly store values.
2018-04-30 03:30:19 +02:00
Bartosz Taudul
e5cb241c19
Optimize creation of vector of frees.
2018-04-29 13:40:47 +02:00
Bartosz Taudul
3eb73b8d43
Move memory plot reconstruction to a background thread.
2018-04-29 13:40:04 +02:00
Bartosz Taudul
bc84ebc338
Read/write LockEvent data in one go.
2018-04-29 03:41:58 +02:00
Bartosz Taudul
c5133e0b4e
Walk lockmap timeline pointer.
2018-04-29 03:41:58 +02:00
Bartosz Taudul
9769cc4d7d
Read/write most of MemEvent in one go.
2018-04-29 03:41:58 +02:00
Bartosz Taudul
d5f0f0939d
No need to track min memory usage.
...
At least if client instrumentation was not broken and the data makes
sense.
2018-04-29 02:57:20 +02:00
Bartosz Taudul
7fdc6f5453
Zero as initial max value is fine too.
2018-04-29 02:56:23 +02:00
Bartosz Taudul
723f98d24b
Overflow checks are not needed.
2018-04-29 02:47:25 +02:00
Bartosz Taudul
b06f445de9
Don't use stack to write two values...
2018-04-29 02:32:20 +02:00
Bartosz Taudul
333d3a92c8
Perform memory usage calculation on doubles.
2018-04-29 02:29:06 +02:00
Bartosz Taudul
aceaed25b9
Walk plot data pointer.
2018-04-29 02:11:47 +02:00
Bartosz Taudul
868fbace5a
Don't compress thread twice if it's the same.
2018-04-29 02:04:51 +02:00
Bartosz Taudul
fdaebc2bd8
No need to perform space check here.
2018-04-29 01:38:54 +02:00
Bartosz Taudul
d64f0390da
Don't use std::sort.
2018-04-29 01:23:30 +02:00
Bartosz Taudul
7df7bf1745
Begin memory plot with no memory usage.
2018-04-28 16:26:45 +02:00
Bartosz Taudul
a0b8ed2e50
Restore memory plot when loading data dump.
2018-04-28 16:26:45 +02:00
Bartosz Taudul
d8bfe7de2e
Create memory plot based on memory alloc/free events.
2018-04-28 15:49:12 +02:00
Bartosz Taudul
cd34ed6968
Two plot types: user and memory.
...
Only user plots are saved in a dump file.
2018-04-28 15:48:05 +02:00
Bartosz Taudul
1fb47899b2
Fix skipping lock data with new dump version.
2018-04-22 01:26:51 +02:00
Bartosz Taudul
436cd2b6cf
Drop '###Profiler' from capture name.
2018-04-21 23:29:28 +02:00
Bartosz Taudul
d1e185e176
Cleanup message data.
2018-04-21 20:36:33 +02:00
Bartosz Taudul
4cd9cf5dd9
Cleanup zone data.
2018-04-21 20:34:29 +02:00
Bartosz Taudul
0de5bcacaf
Free plot data.
2018-04-21 20:12:16 +02:00
Bartosz Taudul
dda25cf66a
Cosmetics.
2018-04-21 20:11:59 +02:00
Bartosz Taudul
cb298893e7
Fix skipping lock data.
2018-04-21 16:02:36 +02:00