This speeds up the CI a bit (anecdotally ~10%) for those jobs, and it also helps ensure that we are clean w.r.t. Clang modules when we disable some of the carve-outs like no-localization or no-threads.
Implement the std::atomic_ref class template by reusing atomic_base_impl. Based on the work in https://reviews.llvm.org/D72240.