577 Commits

Author SHA1 Message Date
Bartosz Taudul
4c28593031 Fix races in rpmalloc initialization.
Ensure rpmalloc_thread_initialize() is called in worker threads only after
rpmalloc_initialize() has been called on the main profiler thread.
2019-07-19 19:25:27 +02:00
Bartosz Taudul
cef8124247 Replace or with addition to enable usra instruction. 2019-07-19 01:40:27 +02:00
Bartosz Taudul
fd4689a6e2 Don't perform unnecessary ands. 2019-07-19 01:19:52 +02:00
Bartosz Taudul
f65373ece7 Replace two packs with one shuffle. 2019-07-13 20:01:12 +02:00
Bartosz Taudul
fc83f97ad3 Same for AVX/SSE. 2019-07-13 19:34:08 +02:00
Bartosz Taudul
62a167541c No need to mask out indices. 2019-07-13 19:07:25 +02:00
Alex
0c5ea710b0 Merged in z33ky/tracy/const-frame-image (pull request #37)
Constify frame-image pointer in API.
2019-07-13 13:09:21 +00:00
Bartosz Taudul
7bb9549e84 ARM64 specific NEON implementation of DXT1 compression. 2019-07-13 14:31:33 +02:00
Alexander 'z33ky' Hirsch
c6e8dc8d63 Constify frame-image pointer in API. 2019-07-13 12:33:55 +02:00
Bartosz Taudul
60d2384a6a Allow sending application information messages. 2019-07-12 18:34:46 +02:00
Bartosz Taudul
a1ce5fc1f6 Add include for built-in __get_cpuid() on gcc/clang. 2019-07-10 02:09:19 +02:00
Bartosz Taudul
c164a70b9d Check for rdtscp/invariant tsc support. 2019-07-10 02:04:14 +02:00
Bartosz Taudul
c0670848d2 Reuse variable. 2019-07-08 02:08:06 +02:00
Bartosz Taudul
17dbbe67de Remove dependency on range subtraction. 2019-07-08 00:14:36 +02:00
Bartosz Taudul
af1bd3e1fa Faster horizontal add. 2019-07-07 23:57:23 +02:00
Bartosz Taudul
b32e8fa24e Ditto for NEON. 2019-07-06 00:18:53 +02:00
Bartosz Taudul
d236d4b70f Ditto for AVX2. 2019-07-06 00:05:32 +02:00
Bartosz Taudul
f62b21c21d Masking alpha out is not needed.
We assume that alpha value is constant for the whole image. The range
calculation is max - min, so alpha zeroes out. The color normalization
to range is color - min, so alpha also zeroes out here.
2019-07-05 23:58:19 +02:00
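
The reasoning in this commit message can be checked with a small scalar sketch. It is not the SIMD code from the repository: the `Pixel`, `Block`, `BlockMinMax` and `Sub` names are hypothetical, and the R, G, B, A channel order is an assumption made for illustration. With a constant alpha, both `max - min` and `color - min` come out as zero in the alpha channel, so no separate mask is required.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical pixel layout for this sketch: R, G, B, A (channel 3 is alpha).
using Pixel = std::array<uint8_t, 4>;
using Block = std::array<Pixel, 16>;   // one 4x4 block

struct MinMax { Pixel min; Pixel max; };

// Per-channel minimum and maximum over the block, alpha included (no masking).
inline MinMax BlockMinMax(const Block& block)
{
    MinMax mm{ block[0], block[0] };
    for (const Pixel& p : block) {
        for (int c = 0; c < 4; c++) {
            mm.min[c] = std::min(mm.min[c], p[c]);
            mm.max[c] = std::max(mm.max[c], p[c]);
        }
    }
    return mm;
}

// Per-channel a - b. Callers pass b <= a in every channel, so nothing wraps.
inline Pixel Sub(const Pixel& a, const Pixel& b)
{
    Pixel r{};
    for (int c = 0; c < 4; c++) {
        assert(a[c] >= b[c]);
        r[c] = static_cast<uint8_t>(a[c] - b[c]);
    }
    return r;
}
```

If alpha varied within a block, the alpha lane of the range would be non-zero and this shortcut would no longer hold, which is why the commit states the constant-alpha assumption explicitly.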
Bartosz Taudul
03189a30b8 Two ands less in NEON DXT1 compression. 2019-07-05 18:37:25 +02:00
Bartosz Taudul
275d992cb1 Two ands less in AVX2 DXT1 compression. 2019-07-05 18:22:42 +02:00
Bartosz Taudul
c89358d6b9 Two ands less in SSE DXT1 compression. 2019-07-05 18:17:50 +02:00
Bartosz Taudul
5bfc62f1bf iOS device name decoding. 2019-06-19 09:59:46 +02:00
Bartosz Taudul
59b4f84ce5 Display unknown implementer, part as hex values. 2019-07-03 21:18:17 +02:00
Bartosz Taudul
c6f6c368b2 Decode ARM CPU names. 2019-07-03 21:01:34 +02:00
Bartosz Taudul
e26ab8e9f6 Make forwarding functions more compact. 2019-07-03 18:05:38 +02:00
Bartosz Taudul
bdfb568742 Fix div tables for max range on all channels. 2019-07-01 12:31:06 +02:00
Bartosz Taudul
684a119a2c Fix order of checks for including intrinsics. 2019-07-01 11:45:16 +02:00
Bartosz Taudul
983c48994b Write block data directly to memory. 2019-06-30 11:44:32 +02:00
Bartosz Taudul
9b8c18f99e Improve readability. 2019-06-30 11:44:00 +02:00
Bartosz Taudul
52b6bdb55a Force inline ProcessRGB functions. 2019-06-30 03:33:14 +02:00
Bartosz Taudul
8c06f7288c AVX2 DXT1 compression. 2019-06-30 03:20:58 +02:00
Bartosz Taudul
2e893bba91 Use division tables. 2019-06-29 12:16:49 +02:00
Bartosz Taudul
ab9f036f5e Integrate CheckSolid into ProcessRGB. 2019-06-29 02:04:08 +02:00
Bartosz Taudul
faf6bb97a4 DXT1 NEON color index packing. 2019-06-28 22:36:44 +02:00
Bartosz Taudul
2df1eaaa7e Pack color indices using SSE. 2019-06-28 21:58:10 +02:00
Bartosz Taudul
fcb5b4b888 NEON DXT1 compression. 2019-06-28 14:24:16 +02:00
Bartosz Taudul
e8d4ba492b Unify shifts. 2019-06-28 13:05:32 +02:00
Bartosz Taudul
be4900c822 NEON CheckSolid. 2019-06-28 01:47:04 +02:00
Bartosz Taudul
3c066f1527 Simplify code. 2019-06-27 22:40:03 +02:00
Bartosz Taudul
72a0d4c2ab Rest of SSE DXTC compression. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
137b28e110 SSE CheckSolid. 2019-06-27 22:29:44 +02:00
Bartosz Taudul
3d590b6b8c Initialize rpmalloc in compression thread. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
1939c31165 Experimental DXT1 compressor. 2019-06-27 19:14:51 +02:00
Bartosz Taudul
79eb1b9029 Swap queue and dequeue only if queue has contents. 2019-06-27 13:37:09 +02:00
Bartosz Taudul
bb35f9a897 Compress frame images in a separate thread. 2019-06-27 13:24:35 +02:00
Bartosz Taudul
7ebd2162c6 Add ETC1 compression thread. 2019-06-26 22:57:24 +02:00
Bartosz Taudul
f565e11976 Store frame images in queue. 2019-06-26 22:52:24 +02:00
Bartosz Taudul
281dcf7c1f Cast to proper types. 2019-06-26 19:33:37 +02:00
Bartosz Taudul
8ce41b3543 Proper init order of thread local thread handle. 2019-06-26 19:32:52 +02:00
Bartosz Taudul
bc7f2c49c8 GetThreadHandle() might be used by application's code. 2019-06-25 15:44:49 +02:00
Bartosz Taudul
c749a2e3fe Add C API for plots and messages. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
48e08acb62 Add C API for frame markup. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
ee99ce833c Implement memory allocation tracking for C API. 2019-06-24 21:03:39 +02:00
Bartosz Taudul
281477f7f9 Tokens must be retrieved for each enqueue. 2019-06-24 20:12:14 +02:00
Bartosz Taudul
06a41708a7 Move TLS accesses close together. 2019-06-24 19:38:44 +02:00
Bartosz Taudul
c4f0965851 Don't use cached thread id to retrieve main thread id. 2019-06-24 19:38:07 +02:00
Bartosz Taudul
a56c47a6a0 Store thread handle in a thread local variable.
This saves us a non-inlineable function call. Thread local block is
accessed anyway, since we need to get the token, so we already have the
pointer and don't need to get it a second time (which is done inside
Windows' GetCurrentThreadId()). We also don't need to store the thread
id in ScopedZone anymore, as it was a micro-optimization to save us the
second GetThreadHandle() call.

This change has a measurable effect of reducing enqueue time from ~10 to
~8 ns.

A further optimization would be to completely skip thread handle
retrieval during zone capture and do it instead on retrieval of data
from the queue. Since each thread has its own producer ("token"), the
thread handle should be accessible during the dequeue operation. This is
a much more invasive change that would require a) modification of the
queue, b) additional processing of dequeued data to inject the thread
handle.
2019-06-24 19:19:47 +02:00
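
A minimal sketch of the caching idea described in this commit, under stated assumptions: `SlowQueryThreadHandle`, `ThreadLocalData`, `GetThreadLocalData` and the call counter are hypothetical names for illustration, and the hash of `std::this_thread::get_id()` merely stands in for a platform call such as `GetCurrentThreadId()`. It is not the profiler's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>

// Counts calls to the "expensive" query, for illustration only.
static std::atomic<int> g_slowCalls{ 0 };

// Hypothetical stand-in for a non-inlineable OS call.
static uint64_t SlowQueryThreadHandle()
{
    g_slowCalls.fetch_add(1, std::memory_order_relaxed);
    return static_cast<uint64_t>(
        std::hash<std::thread::id>{}(std::this_thread::get_id()));
}

// Per-thread data that is touched on every enqueue anyway.
struct ThreadLocalData
{
    uint64_t threadHandle;  // queried once per thread, then cached
    uint32_t token;         // placeholder for the per-thread producer token
};

static ThreadLocalData& GetThreadLocalData()
{
    // Initialized on the first call made by each thread.
    thread_local ThreadLocalData data{ SlowQueryThreadHandle(), 0 };
    return data;
}

// Cheap accessor: reads the cached value from the thread local block.
static inline uint64_t GetThreadHandle()
{
    return GetThreadLocalData().threadHandle;
}
```

Because the token and the handle sit in the same thread local block, a caller that already fetched the block for the token gets the handle without a second lookup, which is the saving the commit message describes.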
Bartosz Taudul
fd9fc880a6 Send current time in on-demand welcome message. 2019-06-21 19:39:41 +02:00
Bartosz Taudul
5309e6d94a Broadcast client activity time. 2019-06-18 20:46:12 +02:00
Bartosz Taudul
aa5259b20a Use the same port (8086) for both TCP and UDP traffic. 2019-06-18 20:28:03 +02:00
Bartosz Taudul
0e5a7263d9 Define broadcast message, add versioning. 2019-06-18 20:26:40 +02:00
Bartosz Taudul
0b394c3f53 Don't need to keep last broadcast time in Profiler class. 2019-06-18 20:15:09 +02:00
Bartosz Taudul
11dc8e67e5 Change broadcast rate from 5s to 3s. 2019-06-17 19:57:17 +02:00
Bartosz Taudul
6bf8081f5b Remove debug leftovers. 2019-06-17 19:52:44 +02:00
Bartosz Taudul
de058d2a0d Don't hardcode broadcast port. 2019-06-17 18:37:34 +02:00
Bartosz Taudul
1b3b3a94a2 Broadcast protocol version and process name. 2019-06-17 18:34:35 +02:00
Bartosz Taudul
0b9ef7e514 Disable broadcast if TRACY_NO_BROADCAST is defined. 2019-06-17 18:18:58 +02:00
Bartosz Taudul
e609c0fdce UDP broadcast loop. 2019-06-17 02:25:09 +02:00
Bartosz Taudul
014c3ed63b Use non-reference, optimized NEON ETC1 compression. 2019-06-15 15:35:57 +02:00
Bartosz Taudul
ab4e99229d Indicate whether client is running on apple shitware. 2019-06-13 14:05:15 +02:00
Bartosz Taudul
e5d5abf59a Add NEON path for ETC1 compression. 2019-06-13 02:04:19 +02:00
Bartosz Taudul
d3e0163dd4 Add byteswap for apple. 2019-06-12 16:54:44 +02:00
Bartosz Taudul
37d1457b44 Frame image may need flipping. 2019-06-12 15:28:32 +02:00
Bartosz Taudul
04dd33f5c4 Fix mismatched linkage. 2019-06-11 23:51:12 +02:00
Rokas K. (rku)
c4e05b6264 Merged in rokups/tracy/dllimport-cleanup (pull request #36)
Clean up imported functions in multi-dll projects.

Approved-by: Till Rathmann <till.rathmann@gmx.de>
2019-06-11 15:04:34 +00:00
Bartosz Taudul
57b8b425ba Discard send buffer data after disconnect. 2019-06-10 02:11:29 +02:00
Bartosz Taudul
80dff1ede1 Add connection id for on-demand mode.
Long-lived zones could send their end events without begin events in the
following scenario:

1. On-demand connection is made.
2. Zone begin is emitted, m_active is set to true.
3. Connection is terminated.
4. A new connection is made.
5. Zone end is emitted, because m_active is true.

Until now it was assumed that all zone end events would happen before
a new connection is made, but that is not necessarily true.
2019-06-09 17:15:47 +02:00
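
The scenario listed in this commit can be reproduced with a small single-threaded sketch. The `Profiler` and `ScopedZone` types below are simplified, hypothetical stand-ins (only `m_active` is a name taken from the message), and the event list replaces the real queue. The zone remembers the connection id seen at begin and drops its end event if the id has changed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical profiler state for illustration.
struct Profiler
{
    std::atomic<bool> connected{ false };
    std::atomic<uint64_t> connectionId{ 0 };
    std::vector<std::string> events;   // stand-in for the queue, not thread-safe

    void Connect()
    {
        // Every new connection gets a fresh id.
        connectionId.fetch_add(1, std::memory_order_release);
        connected.store(true, std::memory_order_release);
    }
    void Disconnect() { connected.store(false, std::memory_order_release); }
};

class ScopedZone
{
public:
    explicit ScopedZone(Profiler& profiler)
        : m_profiler(profiler)
        , m_active(profiler.connected.load(std::memory_order_acquire))
        , m_connectionId(profiler.connectionId.load(std::memory_order_acquire))
    {
        if (m_active) m_profiler.events.push_back("begin");
    }

    ~ScopedZone()
    {
        if (!m_active) return;
        // The begin event belonged to another connection: skip the end event.
        if (m_profiler.connectionId.load(std::memory_order_acquire) != m_connectionId) return;
        m_profiler.events.push_back("end");
    }

    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

private:
    Profiler& m_profiler;
    const bool m_active;
    const uint64_t m_connectionId;
};
```

Checking `m_active` alone would emit the orphaned end event in step 5, since the flag was captured while the first connection was still alive.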
Bartosz Taudul
0db9c73d76 Immediately react to connection termination. 2019-06-09 16:51:39 +02:00
Bartosz Taudul
cc5bad294a More strict memory ordering for on-demand connection status. 2019-06-09 16:48:00 +02:00
Bartosz Taudul
e2d42fae2f We're done here, don't try to send termination request. 2019-06-09 16:25:52 +02:00
Bartosz Taudul
496f866add Don't send data when connection is terminated.
HandleServerQuery() returns false in only two cases: either data can't be
read from the socket (which is checked by the HasData() call before
HandleServerQuery() is called), or the server sent a termination query.
In both cases there's no need to send data anymore.
2019-06-09 16:19:40 +02:00
Bartosz Taudul
23e7850162 Make DequeueStatus enum class. 2019-06-09 16:14:30 +02:00
Bartosz Taudul
34d89d39a1 Prevent double freeing of socket. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
139299389b Add comments to client connection handling. 2019-06-09 16:10:49 +02:00
Bartosz Taudul
4c2ff80ac8 Restore frame counting for on-demand mode. 2019-06-09 15:23:01 +02:00
Bartosz Taudul
00a468162d Fix signed/unsigned comparison. 2019-06-08 00:57:25 +02:00
Bartosz Taudul
9ef128995a Add AVX2 version of etcpak. 2019-06-08 00:50:39 +02:00
Bartosz Taudul
7e9539ef2d AVX implies SSE 4.1. 2019-06-08 00:39:19 +02:00
Bartosz Taudul
784c4da53a Include frame offset in frame image message. 2019-06-07 20:09:29 +02:00
Rokas Kupstys
9bd1037347 Clean up imported functions in multi-dll projects. 2019-06-07 19:50:08 +03:00
Bartosz Taudul
d271634a95 Keep one ETC1 compression buffer. 2019-06-07 01:29:24 +02:00
Bartosz Taudul
34a6fe7055 _bswap may be already defined. 2019-06-07 01:07:51 +02:00
Bartosz Taudul
a654b642ef Compress frame images to ETC1 before sending. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
aff3246f82 Add ETC1 compressor. 2019-06-07 00:31:51 +02:00
Bartosz Taudul
e5bb6011c5 Frame image transfer prototype. 2019-06-06 21:39:54 +02:00
Bartosz Taudul
b3812146cb Fix atomics initialization. 2019-05-27 14:09:55 +02:00
Bartosz Taudul
340837e202 Callstack decode for android api <= 21.
libbacktrace/elf.cpp:3249:3: error: use of undeclared identifier 'dl_iterate_phdr'
2019-05-22 14:14:30 +02:00
Bartosz Taudul
84efe070fe Make callstack logic more obvious. 2019-05-22 14:05:44 +02:00
Bartosz Taudul
efc54babe3 Transfer of colored messages. 2019-05-10 20:17:44 +02:00
Bartosz Taudul
9ec8704dad Don't include LZ4 headers in tracy headers.
The LZ4 implementation is wrapped in tracy namespace, but it also adds
some defines, which may conflict with other LZ4 implementations.
2019-05-01 12:57:42 +02:00