Bartosz Taudul
|
0bf0ceed3d
|
Track trace loading progress.
|
2018-07-28 17:59:17 +02:00 |
|
Bartosz Taudul
|
7d7877517e
|
Also remove child vectors from GPU events.
|
2018-07-22 19:47:01 +02:00 |
|
Bartosz Taudul
|
3a934b2ba3
|
Store children vectors in a separate data collection.
This reduces per-zone memory cost by 9 bytes if there are no children
and increases it by 4 bytes if there are children. This is a net win
for every trace measured, as the following data shows:
+++ /home/wolf/desktop/tracy-old/android.tracy +++
Vectors: 2794480
Size 0: 2373070 (84.92%)
Size 1: 70237 (2.51%)
Size 2+: 351173 (12.57%)
+++ /home/wolf/desktop/tracy-old/asset-new.tracy +++
Vectors: 1799227
Size 0: 1482691 (82.41%)
Size 1: 93272 (5.18%)
Size 2+: 223264 (12.41%)
+++ /home/wolf/desktop/tracy-old/asset-new-id.tracy +++
Vectors: 1977996
Size 0: 1640817 (82.95%)
Size 1: 97198 (4.91%)
Size 2+: 239981 (12.13%)
+++ /home/wolf/desktop/tracy-old/asset-old.tracy +++
Vectors: 1782395
Size 0: 1471437 (82.55%)
Size 1: 88813 (4.98%)
Size 2+: 222145 (12.46%)
+++ /home/wolf/desktop/tracy-old/big.tracy +++
Vectors: 180794047
Size 0: 172696094 (95.52%)
Size 1: 2799772 (1.55%)
Size 2+: 5298181 (2.93%)
+++ /home/wolf/desktop/tracy-old/darkrl.tracy +++
Vectors: 12014129
Size 0: 11611324 (96.65%)
Size 1: 134980 (1.12%)
Size 2+: 267825 (2.23%)
+++ /home/wolf/desktop/tracy-old/mem.tracy +++
Vectors: 383097
Size 0: 321932 (84.03%)
Size 1: 854 (0.22%)
Size 2+: 60311 (15.74%)
+++ /home/wolf/desktop/tracy-old/new.tracy +++
Vectors: 77536
Size 0: 63035 (81.30%)
Size 1: 8886 (11.46%)
Size 2+: 5615 (7.24%)
+++ /home/wolf/desktop/tracy-old/selfprofile.tracy +++
Vectors: 22940871
Size 0: 22704868 (98.97%)
Size 1: 73000 (0.32%)
Size 2+: 163003 (0.71%)
+++ /home/wolf/desktop/tracy-old/tbrowser.tracy +++
Vectors: 962682
Size 0: 695380 (72.23%)
Size 1: 43007 (4.47%)
Size 2+: 224295 (23.30%)
+++ /home/wolf/desktop/tracy-old/virtualfile_hc.tracy +++
Vectors: 529170
Size 0: 449386 (84.92%)
Size 1: 15694 (2.97%)
Size 2+: 64090 (12.11%)
+++ /home/wolf/desktop/tracy-old/zfile_hc.tracy +++
Vectors: 264849
Size 0: 220589 (83.29%)
Size 1: 9386 (3.54%)
Size 2+: 34874 (13.17%)
|
2018-07-22 16:05:50 +02:00 |
|
Bartosz Taudul
|
9291a88020
|
Zones can now also be grouped by call stack.
|
2018-07-21 20:26:13 +02:00 |
|
Bartosz Taudul
|
561d2dc360
|
Use the fastest mutex available.
The selection is based on the following test results:
MSVC:
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 11.641 ns/iter
2 thread contention: 141.559 ns/iter
3 thread contention: 242.733 ns/iter
4 thread contention: 409.807 ns/iter
5 thread contention: 561.544 ns/iter
6 thread contention: 785.845 ns/iter
=> std::mutex
No contention: 19.190 ns/iter
2 thread contention: 39.305 ns/iter
3 thread contention: 58.999 ns/iter
4 thread contention: 59.532 ns/iter
5 thread contention: 103.539 ns/iter
6 thread contention: 110.314 ns/iter
=> std::shared_timed_mutex
No contention: 45.487 ns/iter
2 thread contention: 96.351 ns/iter
3 thread contention: 142.871 ns/iter
4 thread contention: 184.999 ns/iter
5 thread contention: 336.608 ns/iter
6 thread contention: 542.551 ns/iter
=> std::shared_mutex
No contention: 10.861 ns/iter
2 thread contention: 17.495 ns/iter
3 thread contention: 31.126 ns/iter
4 thread contention: 40.468 ns/iter
5 thread contention: 15.677 ns/iter
6 thread contention: 64.505 ns/iter
Cygwin (clang):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 11.536 ns/iter
2 thread contention: 121.082 ns/iter
3 thread contention: 396.430 ns/iter
4 thread contention: 672.555 ns/iter
5 thread contention: 1327.761 ns/iter
6 thread contention: 14151.955 ns/iter
=> std::mutex
No contention: 62.583 ns/iter
2 thread contention: 3990.464 ns/iter
3 thread contention: 7161.189 ns/iter
4 thread contention: 9870.820 ns/iter
5 thread contention: 12355.178 ns/iter
6 thread contention: 14694.903 ns/iter
=> std::shared_timed_mutex
No contention: 91.687 ns/iter
2 thread contention: 1115.037 ns/iter
3 thread contention: 4183.792 ns/iter
4 thread contention: 15283.491 ns/iter
5 thread contention: 27812.477 ns/iter
6 thread contention: 35028.140 ns/iter
=> std::shared_mutex
No contention: 91.764 ns/iter
2 thread contention: 1051.826 ns/iter
3 thread contention: 5574.720 ns/iter
4 thread contention: 15721.416 ns/iter
5 thread contention: 27721.487 ns/iter
6 thread contention: 35420.404 ns/iter
Linux (x64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 13.487 ns/iter
2 thread contention: 210.317 ns/iter
3 thread contention: 430.855 ns/iter
4 thread contention: 510.533 ns/iter
5 thread contention: 1003.609 ns/iter
6 thread contention: 1787.683 ns/iter
=> std::mutex
No contention: 12.403 ns/iter
2 thread contention: 157.122 ns/iter
3 thread contention: 186.791 ns/iter
4 thread contention: 265.073 ns/iter
5 thread contention: 283.778 ns/iter
6 thread contention: 270.687 ns/iter
=> std::shared_timed_mutex
No contention: 21.509 ns/iter
2 thread contention: 150.179 ns/iter
3 thread contention: 256.574 ns/iter
4 thread contention: 415.351 ns/iter
5 thread contention: 611.532 ns/iter
6 thread contention: 944.695 ns/iter
=> std::shared_mutex
No contention: 20.805 ns/iter
2 thread contention: 157.034 ns/iter
3 thread contention: 244.025 ns/iter
4 thread contention: 406.269 ns/iter
5 thread contention: 387.985 ns/iter
6 thread contention: 468.550 ns/iter
Linux (arm64):
=== Lock test, 6 threads ===
=> NonRecursiveBenaphore
No contention: 20.891 ns/iter
2 thread contention: 211.037 ns/iter
3 thread contention: 409.962 ns/iter
4 thread contention: 657.441 ns/iter
5 thread contention: 828.405 ns/iter
6 thread contention: 1131.827 ns/iter
=> std::mutex
No contention: 50.884 ns/iter
2 thread contention: 103.620 ns/iter
3 thread contention: 332.429 ns/iter
4 thread contention: 620.802 ns/iter
5 thread contention: 783.943 ns/iter
6 thread contention: 834.002 ns/iter
=> std::shared_timed_mutex
No contention: 64.948 ns/iter
2 thread contention: 173.191 ns/iter
3 thread contention: 490.352 ns/iter
4 thread contention: 660.668 ns/iter
5 thread contention: 1014.546 ns/iter
6 thread contention: 1451.553 ns/iter
=> std::shared_mutex
No contention: 64.521 ns/iter
2 thread contention: 195.222 ns/iter
3 thread contention: 490.819 ns/iter
4 thread contention: 654.786 ns/iter
5 thread contention: 955.759 ns/iter
6 thread contention: 1282.544 ns/iter
|
2018-07-14 00:39:01 +02:00 |
|
Bartosz Taudul
|
c8b5b9447d
|
Ignore dangling memory frees in on-demand mode.
|
2018-07-12 01:35:32 +02:00 |
|
Bartosz Taudul
|
e5064dec1e
|
Store on-demand connection state.
|
2018-07-12 01:21:04 +02:00 |
|
Bartosz Taudul
|
a78981e040
|
Store on-demand frame offset.
|
2018-07-10 22:42:00 +02:00 |
|
Bartosz Taudul
|
053284b1c7
|
Process custom free-form zone names.
|
2018-06-29 16:12:17 +02:00 |
|
Bartosz Taudul
|
865e8d8506
|
Extract zone name getting functionality.
|
2018-06-29 15:14:20 +02:00 |
|
Bartosz Taudul
|
b0aa13f4af
|
Callstack getters are const.
|
2018-06-24 16:15:49 +02:00 |
|
Bartosz Taudul
|
858628918b
|
Force inline AddCallstackPayload.
|
2018-06-24 15:28:09 +02:00 |
|
Bartosz Taudul
|
af0c64c888
|
Remove GPU resync support.
The whole concept is not really reliable. And it forces CPU to GPU sync,
which is bad.
|
2018-06-22 16:34:51 +02:00 |
|
Bartosz Taudul
|
cd5ca3e754
|
Don't use hash table to store 256 pointers.
|
2018-06-22 15:14:44 +02:00 |
|
Bartosz Taudul
|
55ddb64352
|
GPU context counter is now 8 bit.
|
2018-06-22 15:10:23 +02:00 |
|
Bartosz Taudul
|
35dc2f796e
|
Process GpuZoneBeginCallstack queue event.
|
2018-06-22 01:56:32 +02:00 |
|
Bartosz Taudul
|
4992ae6b39
|
Take callstack field in ZoneEvent into account in save/load.
|
2018-06-22 01:30:08 +02:00 |
|
Bartosz Taudul
|
5e01a8ead9
|
Process callstack queue event.
|
2018-06-22 01:15:49 +02:00 |
|
Bartosz Taudul
|
978e168cbd
|
Handle ZoneBeginCallstack queue event.
This is identical to ZoneBegin handling, but requires some additional
bookkeeping to account for the incoming callstack information.
|
2018-06-22 01:07:25 +02:00 |
|
Bartosz Taudul
|
973eab2b4a
|
Fix typo.
|
2018-06-20 23:42:00 +02:00 |
|
Bartosz Taudul
|
7912807133
|
Wait for transfer of pending callback frames.
|
2018-06-20 14:57:48 +02:00 |
|
Bartosz Taudul
|
4000f27e15
|
Stack frame accessor.
|
2018-06-20 01:18:59 +02:00 |
|
Bartosz Taudul
|
0c0afa5ac7
|
Process callstack frames.
|
2018-06-20 01:07:09 +02:00 |
|
Bartosz Taudul
|
203744cdd9
|
Callstack frame queries.
|
2018-06-20 00:25:26 +02:00 |
|
Bartosz Taudul
|
4eea85fdad
|
Callstack payload accessor.
|
2018-06-19 22:19:20 +02:00 |
|
Bartosz Taudul
|
06f34052a5
|
Have to track callstacks of both alloc and free.
|
2018-06-19 22:08:47 +02:00 |
|
Bartosz Taudul
|
c28465aa7c
|
Store unique callstack payloads.
|
2018-06-19 21:16:02 +02:00 |
|
Bartosz Taudul
|
e51eef3dcd
|
Process memory events with callstack.
|
2018-06-19 18:52:45 +02:00 |
|
Bartosz Taudul
|
bb0631585c
|
Store thread id of GPU events.
|
2018-06-17 19:07:07 +02:00 |
|
Bartosz Taudul
|
b7930f67da
|
Calculate total self time of zones.
|
2018-06-06 00:39:22 +02:00 |
|
Bartosz Taudul
|
e058bb34c1
|
CompressThread body must be available.
|
2018-05-03 18:43:51 +02:00 |
|
Bartosz Taudul
|
8beb1c1a39
|
Add thread compression cache.
Observation: calls to CompressThread() are likely to be repeated with
the same value. Exploit that by storing last query and its result.
|
2018-05-01 01:29:25 +02:00 |
|
Bartosz Taudul
|
76f0c8fafe
|
Sort source location zones on a separate thread.
|
2018-04-30 03:54:09 +02:00 |
|
Bartosz Taudul
|
3eb73b8d43
|
Move memory plot reconstruction to a background thread.
|
2018-04-29 13:40:04 +02:00 |
|
Bartosz Taudul
|
d8bfe7de2e
|
Create memory plot based on memory alloc/free events.
|
2018-04-28 15:49:12 +02:00 |
|
Bartosz Taudul
|
36efe96e9d
|
Throw exception when trying to open unsupported dump version.
|
2018-04-21 14:18:42 +02:00 |
|
Bartosz Taudul
|
84fd351fba
|
Allow partial load of data from dump.
|
2018-04-20 16:03:09 +02:00 |
|
Bartosz Taudul
|
cd3bba8063
|
Memory data accessor.
|
2018-04-01 20:34:58 +02:00 |
|
Bartosz Taudul
|
a574f98f0c
|
Memory events are now serialized.
|
2018-04-01 20:13:01 +02:00 |
|
Bartosz Taudul
|
16a98c8c17
|
Move benaphore to common directory.
|
2018-04-01 18:59:55 +02:00 |
|
Bartosz Taudul
|
b12375815c
|
Broken memory events processing.
|
2018-04-01 02:03:34 +02:00 |
|
Bartosz Taudul
|
aa9d9575e0
|
Allow raw access to source location zones data.
|
2018-03-24 14:48:52 +01:00 |
|
Bartosz Taudul
|
d8ac7dee83
|
Expose worker data state (static/dynamic).
|
2018-03-24 14:43:57 +01:00 |
|
Bartosz Taudul
|
a9e1a9bddb
|
Calculate total time spent in source location.
This simple solution doesn't handle recursion at all.
|
2018-03-24 14:24:30 +01:00 |
|
Bartosz Taudul
|
fea0234a60
|
Change zone end "-1" comparisons to "0" comparisons.
|
2018-03-24 02:00:20 +01:00 |
|
Bartosz Taudul
|
6a4e58b545
|
Force inline compress/decompress thread id.
|
2018-03-24 01:31:58 +01:00 |
|
Bartosz Taudul
|
c0577fd5b2
|
Unordered map is no longer used.
|
2018-03-23 21:18:52 +01:00 |
|
Bartosz Taudul
|
f4b88b9c05
|
Use flat hash map for reverse plot lookup.
|
2018-03-23 21:18:00 +01:00 |
|
Bartosz Taudul
|
6cb2fec48e
|
Use flat hash map for string map.
|
2018-03-23 21:12:29 +01:00 |
|
Bartosz Taudul
|
69b49f527d
|
Inline GetZoneEndDirect().
|
2018-03-23 02:06:44 +01:00 |
|
Bartosz Taudul
|
765a1ececf
|
Move nohash<> from TracyWorker to flat hash map.
|
2018-03-20 15:40:11 +01:00 |
|
Bartosz Taudul
|
d0519499f4
|
Store thread id next to zone ptr in source location zone list.
|
2018-03-18 20:45:49 +01:00 |
|
Bartosz Taudul
|
777d672e05
|
Thread id compression/decompression.
|
2018-03-18 20:45:22 +01:00 |
|
Bartosz Taudul
|
3ac98beb5a
|
Use precalculated min/max time spans.
|
2018-03-18 20:20:24 +01:00 |
|
Bartosz Taudul
|
0f1f7c6813
|
Calculate min/max time spans for source locations.
|
2018-03-18 20:15:45 +01:00 |
|
Bartosz Taudul
|
43c3fe25ba
|
Put source location zone data into a struct.
|
2018-03-18 20:08:57 +01:00 |
|
Bartosz Taudul
|
7a4e7cbf86
|
Reduce data collection if TRACY_NO_STATISTICS is defined.
Statistical data collection is only useful if it's meant to be used.
Otherwise it only incurs CPU and memory cost.
|
2018-03-18 12:55:54 +01:00 |
|
Bartosz Taudul
|
4baea4a74f
|
Don't hash source location zones keys.
|
2018-03-18 03:25:14 +01:00 |
|
Bartosz Taudul
|
e6b3f373c5
|
Add direct zone end getter.
|
2018-03-18 02:53:00 +01:00 |
|
Bartosz Taudul
|
c807b3f7ef
|
Getter for source location zones.
|
2018-03-18 02:35:39 +01:00 |
|
Bartosz Taudul
|
9830fa297e
|
Store per-source-location zone lists.
|
2018-03-18 02:05:33 +01:00 |
|
Bartosz Taudul
|
81ff554c7d
|
Don't call ReadTimeline() when there's nothing to read.
|
2018-03-15 22:54:10 +01:00 |
|
Bartosz Taudul
|
b48602f5d1
|
Implement search for matching source locations.
|
2018-03-04 16:52:45 +01:00 |
|
Bartosz Taudul
|
f8c5f28372
|
Use Vector for source location expand storage.
|
2018-03-04 16:32:51 +01:00 |
|
Bartosz Szreder
|
bae1c02ad0
|
Worker thread will take care of itself.
|
2018-02-21 16:41:37 +01:00 |
|
Bartosz Szreder
|
9e3f18a62a
|
Split data handling code from the view.
|
2018-02-21 16:41:37 +01:00 |
|