pushd %~dp0 makes the directory that contains the batch file the working directory. The matching popd at the end restores the previous working directory, to be a good citizen.
'program_invocation_short_name' is a GNU extension available on Linux; other
OSes such as macOS do not provide it.
Fixes build break on macOS 12.2 with _GNU_SOURCE defined.
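
A minimal sketch of one way to guard such a use, not necessarily the change
made in this commit. ShortProgramName and its fallback parameter are
hypothetical names introduced for illustration. getprogname() is the BSD/macOS
counterpart declared in <stdlib.h>.

```cpp
#include <cerrno>    // glibc declares program_invocation_short_name in <errno.h>
#include <cstdlib>   // getprogname() on macOS and the BSDs

// Hypothetical helper: returns a short program name, or `fallback` on
// platforms where neither facility is known to be available.
inline const char* ShortProgramName(const char* fallback)
{
    (void)fallback; // unused on the branches that have a native facility
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    return program_invocation_short_name;   // GNU extension
#elif defined(__APPLE__) || defined(__FreeBSD__)
    return getprogname();                   // BSD-style equivalent
#else
    return fallback;
#endif
}
```

A caller would use ShortProgramName("app") wherever the GNU variable was
referenced directly.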
pthread_setname_np() can only set the name of the current thread on
macOS, so only pass a single name argument.
Fixes build break on macOS 12.2 with _GNU_SOURCE defined.
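
A minimal sketch of the signature difference, assuming only Linux and macOS
need to be covered. SetCurrentThreadName is a hypothetical wrapper name, not
something taken from the code base. On Linux the name is limited to 16 bytes
including the terminating null.

```cpp
#include <pthread.h>

// Hypothetical wrapper: names the calling thread. Returns 0 on success.
inline int SetCurrentThreadName(const char* name)
{
#if defined(__APPLE__)
    // macOS: single argument, always applies to the current thread.
    return pthread_setname_np(name);
#elif defined(__linux__)
    // Linux: thread handle plus name.
    return pthread_setname_np(pthread_self(), name);
#else
    (void)name;
    return 0; // other platforms are left out of this sketch
#endif
}
```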
Consider running the following code with operator new and delete overloaded to
track allocations with call stacks:
    std::thread( []{ thread_local std::string str; } ).join();
Each call stack requires a memory allocation to be performed by the profiler,
to make the stack available at a later time. When the thread is created, the
TLS block is initialized and the std::string buffer can be allocated. To track
this allocation, rpmalloc has to be initialized. This initialization also
happens within the TLS block.
Now, when the thread exits, the heap managed by rpmalloc may be released first
during the TLS block destruction (and if the destruction is performed in
reverse creation order, then it *will* be destroyed first, as rpmalloc was
initialized only after the std::string initialization, to track the allocation
performed within). The next thing to happen is destruction of std::string and
release of the memory block it contains.
The release is tracked by the profiler, and as mentioned earlier, to save the
call stack for later use, a memory allocation is needed. But the allocator is
no longer available in this thread, because rpmalloc was released just before!
As a solution to this issue, the profiler now detects whether the allocator is
still available and skips the call stack if it is not. The alternative would
be to disable the rpmalloc thread cleanup, but that may cause leak-like
behavior when a large number of threads is spawned and destroyed.
Note that this is not a water-tight solution. Other functions will still want
to allocate memory for call stacks, but it is rather unlikely that such calls
would be performed during TLS block destruction. It is also possible that the
event queue will run out of allocated space for events at this very moment,
and in such a case the allocator will also fail.
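
The ordering problem and the availability check described above can be modeled
with the following sketch. It does not use rpmalloc or the profiler's real
code: ThreadHeap, TrackedBuffer, RecordEvent and the counters are hypothetical
stand-ins. It relies on thread-local objects being destroyed in reverse order
of construction.

```cpp
#include <atomic>
#include <cstdlib>
#include <thread>

// All names here are hypothetical; they only model the ordering problem.
static std::atomic<int> g_eventsWithStack{0};
static std::atomic<int> g_eventsWithoutStack{0};

// Trivially destructible, so it stays readable during TLS teardown.
static thread_local bool tl_heapAlive = false;

// Stand-in for the per-thread allocator state.
struct ThreadHeap
{
    ThreadHeap()  { tl_heapAlive = true; }
    ~ThreadHeap() { tl_heapAlive = false; }
};

static void InitThreadHeap()
{
    static thread_local ThreadHeap heap; // constructed on first use
    (void)heap;
}

static void RecordEvent(bool isAlloc)
{
    if (isAlloc) InitThreadHeap();       // first tracked allocation sets up the heap
    if (!tl_heapAlive)
    {
        ++g_eventsWithoutStack;          // allocator is gone: skip the call stack
        return;
    }
    void* stack = std::malloc(64);       // stands in for storing the call stack
    ++g_eventsWithStack;
    std::free(stack);
}

// Stand-in for an object whose buffer is allocated after construction.
struct TrackedBuffer
{
    bool owns = false;
    void Fill() { owns = true; RecordEvent(true); }
    ~TrackedBuffer() { if (owns) RecordEvent(false); }
};

static void RunWorker()
{
    std::thread t([] {
        thread_local TrackedBuffer buf;  // constructed before the heap
        buf.Fill();                      // heap is constructed here, after buf
    });
    t.join();                            // heap is destroyed first, then buf
}
```

After RunWorker() returns, the allocation has been recorded with a call stack
and the release, which happened after the heap was torn down, without one.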
For CPU statistics data, this entry is created when the source location is
created. This is not done for GPU zones, as it would needlessly increase the
number of held entries. This assumes that the number of GPU zones is
significantly lower than the number of CPU zones.
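
A rough sketch of the difference, assuming the GPU entry is instead created
lazily when the first zone for a source location is processed (the message
does not say when it is created). StatsStore, ZoneStats and the method names
are hypothetical and do not reflect the actual data structures.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical per-source-location statistics.
struct ZoneStats
{
    uint64_t count = 0;
    int64_t totalNs = 0;
};

struct StatsStore
{
    std::unordered_map<int, ZoneStats> cpu;
    std::unordered_map<int, ZoneStats> gpu;

    // CPU entry is created eagerly, together with the source location.
    void OnNewSourceLocation(int srcloc)
    {
        cpu.emplace(srcloc, ZoneStats{});
    }

    // GPU entry is created only when a GPU zone actually uses the location
    // (assumption of this sketch).
    void OnGpuZoneEnd(int srcloc, int64_t ns)
    {
        ZoneStats& s = gpu[srcloc];
        s.count++;
        s.totalNs += ns;
    }
};
```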