We (Julia) ship both support for using tracy to trace julia applications,
as well as using `rr` (https://github.com/rr-debugger/rr) for record-replay debugging.
After our most recent rebuild of tracy, users have been reporting signfificant performance
slowdowns when `rr` recording a session that happens to also load the tracy library
(even if tracing is not enabled). Upon further examination, the recompile happened
to trigger a protective heuristic that disabled rr's patching of tracy's use of
`rdtsc` because an earlier part of the same function happened to look like a
conditional branch into the patch region. See https://github.com/rr-debugger/rr/pull/3580
for details. To avoid this issue occurring again in future rebuilds of tracy,
adjust tracy's `rdtsc` sequence to be `nopl; rdtsc`, which (as of of the
linked PR) is a sequence that is guaranteed to bypass this heuristic
and not incur the additional overhead when run under rr.
This functionality is kept behind a compile-time flag `TRACY_PATCHABLE_NOPSLEDS`
in order to avoid polluting the instruction cache unnecessarily.
Renamed TRACY_NO_SYS_TRACE -> TRACY_NO_SYSTEM_TRACING to match the
build flag name. Unlike the meson logic, the CMake logic directly
maps the option name to the build flag that is injected. With the
mismatched name, the flag wasn't being properly applied.
Added TRACY_TIMER_FALLBACK option to expose the same-named flag.
Moved signal.h include to get sigaction definition that was missing when
TRACY_NO_CALLSTACK was defined.