We (Julia) ship both support for using tracy to trace julia applications,
as well as using `rr` (https://github.com/rr-debugger/rr) for record-replay debugging.
After our most recent rebuild of tracy, users have been reporting signfificant performance
slowdowns when `rr` recording a session that happens to also load the tracy library
(even if tracing is not enabled). Upon further examination, the recompile happened
to trigger a protective heuristic that disabled rr's patching of tracy's use of
`rdtsc` because an earlier part of the same function happened to look like a
conditional branch into the patch region. See https://github.com/rr-debugger/rr/pull/3580
for details. To avoid this issue occurring again in future rebuilds of tracy,
adjust tracy's `rdtsc` sequence to be `nopl; rdtsc`, which (as of of the
linked PR) is a sequence that is guaranteed to bypass this heuristic
and not incur the additional overhead when run under rr.
This functionality is kept behind a compile-time flag `TRACY_PATCHABLE_NOPSLEDS`
in order to avoid polluting the instruction cache unnecessarily.
The windows.h header file defines the macro max. If the max macro is include
it will lead to name collisions with the std::numeric_limits<T>::max() function.
One solution is to define NOMINMAX before the inclusion of windows.h.
However, that might be a demanding task for a large codebase. Defining the
NOMINMAX as global define may also break previous code.
Another to solution to the problem is to wrap the numeric_limits function in
parenthesis to instruct the compiler to treat it as a function and not a macro.
This commit wraps the std::numeric_limits<T>::max() calls in the public
interfacing header files, with parenthesis.
This compile-time flag was being ignored on Linux. This change adds
gating for software-sampled stack trace sampling following the same
pattern as other `TRACY_NO_SAMPLE_*` options.
If `TRACY_NO_SAMPLING=1` is provided as an environment variable,
software stack sampling is also disabled.