Both Windows and Linux use 32-bit thread identifiers. MacOS has a 64-bit
counter, but in practice it will never overflow during profiling and no false
aliasing will happen.
These changes are only done client-side and in the network protocol. The
server still uses 64-bit thread identifiers, to enable virtual threads, etc.
Previous block size could hold only 256 elements (8KB), which stressed
out the memory allocator. Storing 65536 elements (2MB) per block almost
completely reduces the allocator pressure.
("The queue" is per-thread partial queue here.)
This fixes a problem where one thread writes to the queue, then is
terminated, making the (partially filled) queue available for other
threads to recycle. If another thread re-owns the queue, it will change
the associated thread id, while part of the queue was filled by the
original thread. This obviously created invalid data during dequeue.
The fix makes the recycling process check not only for queue inactivity
(which is marked when the original thread terminates), but also if the
queue is empty, preventing mixing data from different threads.