Bartosz Taudul
14ec246659
Fix typo.
2020-04-24 00:55:57 +02:00
Bartosz Taudul
a5bff2f7e5
Sleep to force rescheduling main thread during init.
...
This fixes problems with first context switch data region possibly not being
available for the main thread, if no rescheduling was performed after sys
tracing has started.
2020-04-14 22:45:32 +02:00
Bartosz Taudul
3b85c51e5f
Search for free listen port, if default is occupied.
2020-04-13 21:40:52 +02:00
Bartosz Taudul
b389ccbb38
Issue just one read call when handling server queries.
2020-04-13 14:32:31 +02:00
Bartosz Taudul
1bbece649f
Implement socket read without exit check.
2020-04-13 14:22:58 +02:00
Bartosz Taudul
a2187565d1
Optimize non-native-size memcpy.
2020-04-13 13:45:21 +02:00
Bartosz Taudul
b69aaf04e9
Add support for QPC timer.
2020-04-07 22:01:31 +02:00
Bartosz Taudul
34b512d04b
Don't declare unused variables on cygwin.
2020-04-07 21:41:12 +02:00
Bartosz Taudul
8d9a611874
Get rid of unicode ifdefs.
2020-04-07 21:35:37 +02:00
Bartosz Taudul
69c5e667ae
Dynamically load Get/SetThreadDescription.
2020-04-07 21:33:03 +02:00
Bartosz Taudul
7fca642c3d
Compress full-quality DXT1 on AVX2 path.
2020-04-05 17:10:43 +02:00
Bartosz Taudul
a6468b6b6e
Sleep when clearing queues if listen port is occupied.
2020-04-04 21:08:13 +02:00
Bartosz Taudul
b2a8b53efa
Query source location of each assembly instruction.
2020-04-01 21:43:03 +02:00
Bartosz Taudul
0ba0125eb5
Cosmetics.
2020-04-01 21:42:14 +02:00
Bartosz Taudul
a8e8a4a167
Add code address to function, line decoder.
2020-04-01 21:41:33 +02:00
Bartosz Taudul
36ddd0b98b
Don't use new to allocate memory on the client.
2020-03-28 21:27:19 +01:00
Bartosz Taudul
9b8eb69886
Apparently sampled call stacks may be empty.
2020-03-28 16:09:44 +01:00
Bartosz Taudul
40281ce2a1
Add default no-op to switch.
2020-03-26 01:07:25 +01:00
Bartosz Taudul
add5b29d03
Report CPU architecture in welcome message.
2020-03-25 21:28:02 +01:00
Bartosz Taudul
ce449ac0e2
Notify server that parameter was handled.
2020-03-25 20:37:26 +01:00
Bartosz Taudul
f114ec3f80
Add code transfer from client to server.
2020-03-25 20:04:55 +01:00
Bartosz Taudul
3e0e120222
Add extra parameter to server queries.
2020-03-25 20:04:01 +01:00
Bartosz Taudul
c999a74d34
Symbol length transfer.
2020-03-25 18:32:36 +01:00
Bartosz Taudul
d47e6819a8
Collect symbol sizes.
2020-03-25 18:28:28 +01:00
Bartosz Taudul
6c0c508280
Ignore kernel-only stacks.
...
It is common to receive duplicate stack traces for the same timestamp
(and thread), one containing proper user-space stack, and the second one
containing only kernel frames. Discard the second one, as there's no
documentation on how it should be interpreted and the kernel stack is
mostly useless.
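A minimal sketch of such a filter, not the code from this commit. It assumes a 64-bit target where kernel addresses occupy the upper half of the address space (top bit set); `IsKernelAddress` and `IsKernelOnlyStack` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: on 64-bit systems kernel addresses have the top bit set.
inline bool IsKernelAddress( uint64_t addr )
{
    return ( addr >> 63 ) != 0;
}

// Returns true if the stack has at least one frame and every frame is a
// kernel address. Empty stacks are not treated as kernel-only here; they
// would need separate handling.
inline bool IsKernelOnlyStack( const uint64_t* frames, size_t count )
{
    if( count == 0 ) return false;
    for( size_t i=0; i<count; i++ )
    {
        if( !IsKernelAddress( frames[i] ) ) return false;
    }
    return true;
}
```

A sample handler would call `IsKernelOnlyStack()` first and drop the sample when it returns true, so that only the stack containing user-space frames is kept for the given timestamp and thread.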
2020-03-21 15:25:30 +01:00
Bartosz Taudul
c7afda2562
Exit processing loops when trace has stopped.
2020-03-10 18:56:49 +01:00
Bartosz Taudul
c6bb08355c
Allow specification of port through env variable.
2020-03-08 16:14:36 +01:00
Bartosz Taudul
1da62c2190
Send deferred lock names.
2020-03-08 15:05:35 +01:00
Bartosz Taudul
127224acc6
Send listen port in broadcast message.
2020-03-08 14:37:59 +01:00
Bartosz Taudul
14c896573d
Separate config for data and broadcast port.
2020-03-08 14:34:09 +01:00
Bartosz Taudul
2ffaa88c9e
Fix typo.
2020-03-08 14:19:08 +01:00
Bartosz Taudul
e7240cb77d
Custom lock name transfer.
2020-03-08 13:47:38 +01:00
Bartosz Taudul
f945278959
Fix rpmalloc on android.
2020-03-02 17:10:47 +01:00
Bartosz Taudul
c36ed4b8b8
Boring warning fixes.
2020-03-01 01:48:20 +01:00
Bartosz Taudul
c23984dd6a
Fix static assert in rpmalloc.
2020-03-01 01:31:31 +01:00
Bartosz Taudul
e9a32d5dc7
Greatly increase queue block size.
...
Previous block size could hold only 256 elements (8KB), which stressed
out the memory allocator. Storing 65536 elements (2MB) per block almost
completely eliminates the allocator pressure.
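The numbers above imply 32 bytes per queue element (8 KB / 256 = 2 MB / 65536 = 32). The sketch below only checks that arithmetic and shows how many block allocations a burst of events would need under each setting. `ItemSize`, `BlockBytes` and `BlocksNeeded` are illustrative names, and the element size is inferred from the message, not taken from the source.

```cpp
#include <cstddef>

// Assumption: element size inferred from "256 elements (8KB)".
constexpr size_t ItemSize = 32;

constexpr size_t BlockBytes( size_t elements )
{
    return elements * ItemSize;
}

static_assert( BlockBytes( 256 ) == 8 * 1024, "old block is 8 KB" );
static_assert( BlockBytes( 65536 ) == 2 * 1024 * 1024, "new block is 2 MB" );

// Number of blocks that must be allocated to hold a given number of items.
constexpr size_t BlocksNeeded( size_t items, size_t itemsPerBlock )
{
    return ( items + itemsPerBlock - 1 ) / itemsPerBlock;
}
```

For one million queued events this gives 3907 block allocations with the old size and 16 with the new one, which is consistent with the reduced allocator pressure described above.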
2020-03-01 01:15:13 +01:00
Bartosz Taudul
82f463724c
Update rpmalloc to 1.4.0.
...
Notable changes: use C++11 atomics everywhere.
2020-03-01 01:02:25 +01:00
Bartosz Taudul
710a2a64e4
Fix copy pasta.
2020-02-27 14:08:56 +01:00
Bartosz Taudul
4346620afa
No need to copy module name.
2020-02-27 13:45:39 +01:00
Bartosz Taudul
fd8a9465d4
Cosmetics.
2020-02-27 13:40:41 +01:00
Bartosz Taudul
9ae71ac4ee
Dl_info doesn't destroy data.
2020-02-27 13:28:45 +01:00
Bartosz Taudul
5f6b3d2cd5
No need for module name intermediate buffer.
2020-02-27 13:24:36 +01:00
Bartosz Taudul
474383b656
Only copy symbol strings, if needed.
2020-02-27 13:17:26 +01:00
Bartosz Taudul
2df6f9068a
Don't retrieve symbol name for address.
2020-02-27 12:58:01 +01:00
Bartosz Taudul
be5793987e
Don't send symbol name.
2020-02-27 12:49:48 +01:00
Bartosz Taudul
56dce646cc
Symbol address decoding on unix.
2020-02-26 23:38:04 +01:00
Bartosz Taudul
4ddafdeeaf
Symbol address decoding for old androids.
2020-02-26 23:24:18 +01:00
Bartosz Taudul
7c506d5426
Remove unused variables.
2020-02-26 23:24:11 +01:00
Bartosz Taudul
26cee8acf0
Perform symbol information queries.
2020-02-26 22:35:15 +01:00
Bartosz Taudul
ef05570540
Symbol address decoding (win32 implementation).
2020-02-26 22:32:42 +01:00
Bartosz Taudul
03ff08a934
Increase max name size.
2020-02-26 22:32:09 +01:00
Bartosz Taudul
d1fcf80c2d
Move definition of max symbol name size to one place.
2020-02-26 22:30:11 +01:00
Bartosz Taudul
c0f49c648b
Validate size.
2020-02-26 22:27:10 +01:00
Bartosz Taudul
890cec9872
Retrieve symbol addresses on unix.
2020-02-26 02:25:45 +01:00
Bartosz Taudul
9231261d73
Retrieve image name on unix.
2020-02-26 02:11:51 +01:00
Bartosz Taudul
fe80a7ed46
Retrieve symbol address on old androids.
2020-02-26 02:06:44 +01:00
Bartosz Taudul
abf8c42a7c
Send module name.
2020-02-26 00:33:09 +01:00
Bartosz Taudul
7d0dac9ae2
Store callstack frame module name.
2020-02-26 00:32:47 +01:00
Bartosz Taudul
4cf520db93
Unify copying symbol strings.
2020-02-26 00:02:30 +01:00
Bartosz Taudul
c5b2d14f8c
Send sampling period in welcome message.
2020-02-25 23:12:31 +01:00
Bartosz Taudul
2b7f5091f1
Store sampling period.
2020-02-25 23:08:52 +01:00
Bartosz Taudul
3402d16548
Send symbol base address.
2020-02-25 23:03:40 +01:00
Bartosz Taudul
85ffe0ea04
Don't search module list for kernel addresses.
2020-02-24 23:04:53 +01:00
Bartosz Taudul
ece32b47df
Zero capacity is invalid.
2020-02-24 23:04:53 +01:00
Bartosz Taudul
9c9e854005
Replace list with vector.
...
Maybe next time let's not forget that there's already a custom
allocating vector available.
2020-02-24 23:04:53 +01:00
Bartosz Taudul
24cd73e366
Fix linux tracing with long pids.
2020-02-23 18:23:53 +01:00
Bartosz Taudul
0fa1d25d98
Disable trace annotations.
2020-02-23 18:20:48 +01:00
Bartosz Taudul
02d200878d
Process queue data in-place.
2020-02-23 15:18:24 +01:00
Bartosz Taudul
96034bca3e
Force inline AppendData(), NeedDataSize().
2020-02-23 14:44:19 +01:00
Bartosz Taudul
bd34c24b84
Increase block size.
2020-02-23 12:35:30 +01:00
Bartosz Taudul
26b13abac8
Pre-fill module cache.
2020-02-22 21:32:18 +01:00
Bartosz Taudul
0a02cf32be
Add module name cache.
2020-02-22 21:32:10 +01:00
Bartosz Taudul
096e8cd8ae
Retrieve module name if symbol name cannot be found.
2020-02-22 21:06:32 +01:00
Bartosz Taudul
d0930e9053
Use maximum possible sampling rate.
2020-02-22 19:08:15 +01:00
Bartosz Taudul
4502858407
Use maximum possible etw buffer size (1MB).
2020-02-22 18:52:38 +01:00
Bartosz Taudul
e270603117
Don't write reference time to memory in each iteration.
2020-02-22 18:52:37 +01:00
Bartosz Taudul
054a6f8563
Send time deltas in callstack sample data packets.
2020-02-22 16:42:47 +01:00
Bartosz Taudul
1ee80e0df5
Send/free callstack sample payloads.
2020-02-22 16:20:43 +01:00
Bartosz Taudul
3b0ed5337b
Provide TraceSetInformation() definition for cygwin.
2020-02-22 16:03:07 +01:00
Bartosz Taudul
baf8e6fe80
No support for sampling on 32-bit windows.
...
Note that 32-bit applications running on 64-bit windows will perform
sampling.
2020-02-22 14:16:04 +01:00
Bartosz Taudul
23fe3e623d
64-bit only version of callstack payload sender.
2020-02-22 14:05:01 +01:00
Bartosz Taudul
9e9c7db5b1
Send sampled call stacks.
2020-02-22 13:42:09 +01:00
Bartosz Taudul
f186540c4f
Fix callstack pointers in 32-bit builds.
2020-02-22 13:38:09 +01:00
Bartosz Taudul
9b9474ada1
Request stack traces for execution sampling events.
2020-02-22 13:13:49 +01:00
Bartosz Taudul
28d0f387ea
Setup execution sampling profiling.
2020-02-22 13:13:32 +01:00
Bartosz Taudul
ad77b4f73b
Store current process id.
2020-02-22 13:11:16 +01:00
Bartosz Taudul
1f671fbacc
Keep sys trace variables local.
2020-02-22 13:08:35 +01:00
Bartosz Taudul
539ccf5a61
Check provider id in etw callback.
2020-02-22 12:56:33 +01:00
Bartosz Taudul
0b82902618
Optimize scalar DXT1 compression.
2020-02-15 13:43:40 +01:00
Bartosz Taudul
838c0aaaa9
Check if BUS_MCEERR_AR and BUS_MCEERR_AO are defined.
2020-02-12 01:27:03 +01:00
Bartosz Taudul
2c8d519d70
Fix typo.
2020-02-11 15:12:06 +01:00
Bartosz Taudul
abfa4c65df
Update fun list of iDevices.
2020-02-10 16:13:32 +01:00
Bartosz Taudul
8d5f4d7363
Always use init once to initialize rpmalloc.
2020-01-30 20:08:34 +01:00
Bartosz Taudul
885fa16373
Don't retrieve connection id, if zone is not active.
2020-01-25 17:21:30 +01:00
Bartosz Taudul
aa94df0845
Replace rpmalloc_thread_initialize with InitRPMallocThread().
2020-01-25 17:16:08 +01:00
Bartosz Taudul
ab2fbd6164
Move ParamaterSetup() implementation to header.
2020-01-25 16:51:17 +01:00
Bartosz Taudul
13370dc01c
Hide RtlWalkFrameChain inside library.
2020-01-25 16:49:29 +01:00
Bartosz Taudul
a90004b983
Move Set/GetThreadName() to Tracy API.
2020-01-25 16:36:58 +01:00
Bartosz Taudul
6f31eb2a9d
Disable MSVC idiocy.
2020-01-20 22:49:03 +01:00
Bartosz Taudul
55d03cb03e
Hide async queue setup/commit behind macros.
2020-01-19 15:06:11 +01:00
Bartosz Taudul
25082b2bec
Don't report CPU topology if delayed init is active.
...
Reporting topology requires producer to be available, which creates a
deadlock during delayed init data structures construction.
Calling GetProducer() results in a call to GetProfilerThreadData(),
which in turn calls GetProfilerData() to construct its thread local
variable. However, at this point we already are calling
GetProfilerData() (to construct the profiler itself). This would result
in an incorrect double construction of data, but the code already
prevents this by allowing init code to be entered only once. Hence the
deadlock.
Currently this is a non-issue, as no platform which can report CPU
topology needs to use delayed init.
2020-01-14 19:41:34 +01:00
Bartosz Taudul
4f8eb53e8b
Capture exact tid to pid mapping on windows.
2020-01-14 02:06:22 +01:00
Bartosz Taudul
4ef2ce4622
Fix _mm256_cvtsi256_si32 on gcc.
2019-12-12 02:13:12 +01:00
Bartosz Taudul
129b80ef0f
Free source location, if zone is not active.
2019-12-06 00:42:42 +01:00
Bartosz Taudul
b9cdf2cbb7
Expose srcloc allocation in C API.
2019-12-06 00:25:52 +01:00
Bartosz Taudul
399b87fecc
Add allocated srcloc zone begin emit functions to C API.
2019-12-06 00:22:49 +01:00
Bartosz Taudul
68ff33d0ba
Extract source location allocation functionality.
2019-12-06 00:15:46 +01:00
Bartosz Taudul
e8fcc250a1
Report CPU topology on Linux.
2019-11-30 01:51:29 +01:00
Bartosz Taudul
712403e9fd
Transfer, display, save CPU topology data.
2019-11-29 22:41:41 +01:00
Bartosz Taudul
59371eef5a
Obtain CPU topology on windows.
2019-11-29 18:29:31 +01:00
thedmd
a1e2c533f6
libbacktrace: Add support for Mach-O (dSYM)
...
`macho.cpp` was backported from official libbacktrace repository.
2019-11-29 12:04:47 +01:00
Bartosz Taudul
a7d2d5f08b
Fix DeferItem() call.
2019-11-26 01:10:50 +01:00
Bartosz Taudul
4551553eb4
Implement setting client parameters from server.
2019-11-25 23:59:48 +01:00
Bartosz Taudul
c5c9dfb0c9
Native callstacks are now optional in allocated callstack messages.
2019-11-25 22:54:10 +01:00
Bartosz Taudul
37eef59d54
Implement reading sys time on BSD.
2019-11-21 20:41:57 +01:00
Bartosz Taudul
c7a22cc1ff
Use libbacktrace on BSD.
2019-11-21 20:41:57 +01:00
Bartosz Taudul
bd7b0a8197
Support callstack capture on BSD.
2019-11-21 02:34:42 +01:00
Bartosz Taudul
c79449a6a1
Get proper program name on BSD.
2019-11-21 02:16:12 +01:00
Bartosz Taudul
7940977dba
Report physical memory size on BSD.
2019-11-21 02:14:08 +01:00
Bartosz Taudul
3854ae11b2
Revert "Remove dead code."
...
This reverts commit a36b73f745.
2019-11-17 17:38:02 +01:00
Bartosz Taudul
a36b73f745
Remove dead code.
2019-11-16 18:34:05 +01:00
Bartosz Taudul
8286b0b72f
Plumbing for message call stacks.
2019-11-14 23:40:41 +01:00
Bartosz Taudul
0befc75f83
Fix conflicts with X.h.
2019-11-14 18:24:29 +01:00
Bartosz Taudul
655864eb7c
Enable crash handler on cygwin.
...
Crash is properly recorded, but the profiler hangs while waiting for
shutdown to finish.
2019-11-07 19:20:13 +01:00
Bartosz Taudul
3fd74a92f9
Native threads are used on mingw.
2019-11-07 19:02:54 +01:00
Bartosz Taudul
351e220d30
Don't calculate queue delay if delayed init is used.
...
Queue calibration requires queue access during profiler construction. This
in turn requires construction of profiler data block, *which at this point
is underway*, because the profiler is being constructed.
2019-06-19 17:29:04 +02:00
Bartosz Taudul
c98f1f0b6b
Make sure profiler is initialized only once in delayed init scenario.
2019-06-19 17:28:18 +02:00
Bartosz Taudul
d4f58ddaf3
Use native windows threads on cygwin, mingw.
2019-11-06 01:42:14 +01:00
Bartosz Taudul
ca198e44d3
Remove dead code from concurrentqueue.
2019-11-05 21:40:52 +01:00
Bartosz Taudul
b5590ed197
Include <mutex> for std::once.
2019-11-05 21:40:35 +01:00
Bartosz Taudul
3e9bb80217
More header cleanup.
2019-11-05 20:15:53 +01:00
Bartosz Taudul
6bbf273581
Partial header inclusion cleanup.
2019-11-05 20:09:40 +01:00
Bartosz Taudul
907574e637
Allow remote plot configuration.
2019-11-05 17:45:19 +01:00
Bartosz Taudul
f34609fd9b
Set per-cpu kernel buffer size to 512 KB.
...
The default setting was causing events to be lost on Android.
2019-11-03 21:52:20 +01:00
Bartosz Taudul
b8d459d48b
Use proper string size (for consistency).
...
On the Android code path this value is ignored.
2019-11-03 21:51:49 +01:00
Bartosz Taudul
ca0fae33d1
Remove obsolete assert.
...
Before-terminate-events now include events that have time delta
processing, with no memory to free.
2019-11-01 20:10:24 +01:00
Bartosz Taudul
1f0c18882c
Don't collect sys time after application has exited.
2019-10-29 23:05:14 +01:00
Bartosz Taudul
0f2503d334
Send time deltas in GPU time events.
2019-10-25 19:52:01 +02:00
Bartosz Taudul
8fa5188176
Send delta times for context switches.
2019-10-25 19:13:11 +02:00
Bartosz Taudul
25b3cdc1ee
Send thread wakeups when handling disconnect request.
2019-10-25 18:22:42 +02:00
Bartosz Taudul
04b132b6e2
Check if requested data size doesn't overflow buffer.
2019-10-24 21:22:22 +02:00
Bartosz Taudul
ba61a9ed84
Transfer time deltas, not absolute times.
...
This change significantly reduces network bandwidth requirements.
Implemented for:
- CPU zones,
- GPU zones,
- locks,
- plots,
- memory events.
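A minimal sketch of the idea, not the actual wire protocol: each side keeps the last timestamp it handled as a reference, the sender emits only the difference, and the receiver adds it back. `DeltaEncoder` and `DeltaDecoder` are hypothetical names. The deltas themselves are still 64-bit values here; the assumption is that the saving comes from small deltas compressing better than large absolute timestamps.

```cpp
#include <cassert>
#include <cstdint>

// Sender side: remembers the previous timestamp and emits the difference.
class DeltaEncoder
{
public:
    int64_t Encode( int64_t absoluteTime )
    {
        const int64_t delta = absoluteTime - m_ref;
        m_ref = absoluteTime;
        return delta;
    }

private:
    int64_t m_ref = 0;
};

// Receiver side: accumulates deltas to recover absolute timestamps.
class DeltaDecoder
{
public:
    int64_t Decode( int64_t delta )
    {
        m_ref += delta;
        return m_ref;
    }

private:
    int64_t m_ref = 0;
};

// Round-trips a few timestamps through both sides.
inline void DeltaRoundTripExample()
{
    DeltaEncoder enc;
    DeltaDecoder dec;
    const int64_t times[] = { 1000000, 1000040, 1000100 };
    for( int64_t t : times )
    {
        const int64_t delta = enc.Encode( t );
        assert( dec.Decode( delta ) == t );
    }
}
```

Both sides must start from the same reference value and process events in the same order, so a scheme like this would need a separate reference per independent event stream (for example, one for CPU zones of a thread and another for GPU zones).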
2019-10-24 00:06:41 +02:00
Bartosz Taudul
cf88265304
Full 64-bit register is set by rdtsc.
2019-10-21 01:13:55 +02:00
Bartosz Taudul
07b66cd4ab
Move fake source location out of loop.
2019-10-20 22:18:05 +02:00
Bartosz Taudul
909503403b
Simplify delay calibration.
2019-10-20 22:13:29 +02:00
Bartosz Taudul
c774534b47
Use rdtsc instead of rdtscp.
...
But rdtscp is serializing!
No, it's not. Quoting the Intel Instruction Set Reference:
"The RDTSCP instruction is not a serializing instruction, but it does
wait until all previous instructions have executed and all previous
loads are globally visible. But it does not wait for previous stores to
be globally visible, and subsequent instructions may begin execution
before the read operation is performed.",
"The RDTSC instruction is not a serializing instruction. It does not
necessarily wait until all previous instructions have been executed
before reading the counter. Similarly, subsequent instructions may begin
execution before the read operation is performed."
So, the difference is in waiting for prior instructions to finish
executing. Notice that even in the rdtscp case, execution of the
following instructions may commence before time measurement is finished
and data stores may be still pending.
But, you may say, Intel in its "How to Benchmark Code Execution Times"
document shows that using rdtscp is superior to rdtsc. Well, not
exactly. What they do show is that when a *single function* is
considered, there are ways to measure its execution time with little to
no error.
This is not what Tracy is doing.
In our case there is no way to determine absolute "this is before" and
"this is after" points of a zone, as we probably already are inside
another zone. Stopping the CPU execution, so that a deeply nested zone
may be measured with great precision, will skew the measurements of all
parent zones.
And this is not what we want to measure, anyway. We are not interested
in how a *single function* behaves, but how a *whole program* behaves.
The out-of-order CPU behavior may influence the measurements? Good! We
are interested in that. We want to see *how* the code is really
executed. How is *stopping* the CPU to make a timer read an appropriate
thing to do, when we want to see how a program is performing?
At least that's the theory.
And besides all that, the profiling overhead is now reduced.
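For illustration only, the two timer reads discussed above can be written with compiler intrinsics as below. This is a sketch, not the profiler's implementation: `ReadTimer` and `ReadTimerOrdered` are hypothetical names, and the `std::chrono` fallback for non-x86 targets is an addition made only so the example builds everywhere.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  ifdef _MSC_VER
#    include <intrin.h>
#  else
#    include <x86intrin.h>
#  endif
#  define EXAMPLE_HAS_TSC 1
#else
#  include <chrono>
#  define EXAMPLE_HAS_TSC 0
#endif

// Plain rdtsc: does not wait for earlier instructions to finish, so the
// out-of-order behavior of the surrounding code is left undisturbed.
inline uint64_t ReadTimer()
{
#if EXAMPLE_HAS_TSC
    return __rdtsc();
#else
    return (uint64_t)std::chrono::steady_clock::now().time_since_epoch().count();
#endif
}

// rdtscp: waits until previous instructions have executed before reading
// the counter. It also returns the IA32_TSC_AUX value, unused here.
inline uint64_t ReadTimerOrdered()
{
#if EXAMPLE_HAS_TSC
    unsigned int aux;
    return __rdtscp( &aux );
#else
    return ReadTimer();
#endif
}
```

A zone would then record `ReadTimer()` at entry and at exit. With nested zones, every ordered read in an inner zone would also stall the code being timed by the outer zones, which is the skew described above.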
2019-10-20 20:52:33 +02:00
Bartosz Taudul
30fc2f02ab
Omit calculation of on-stack variable address.
2019-10-20 19:42:29 +02:00
Bartosz Taudul
c3870f8837
Use proper type.
2019-10-10 20:30:08 +02:00
Bartosz Taudul
707f113bda
Add missing NOMINMAX definitions.
2019-10-10 20:29:06 +02:00
Bartosz Taudul
7cf3608493
Avoid unused variables.
2019-10-05 02:11:45 +02:00