Weaken the barriers emitted for atomic ops to the weakest form that still enforces the memory model constraints. In particular, we try to avoid emitting expensive #StoreLoad barriers whenever possible. The emitted barriers conform to V9's RMO and V8's PSO memory models, and are compatible with GCC's lowering. A quick test with `pgbench` on a T4-1 shows a small (up to about 4%) but consistent speedup.
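
For illustration only, here is a minimal sketch of the general idea, using the conventional mapping from fence strength to SPARC V9 `membar` mask bits under RMO. This is not the table used by this change, and the names `fence_order`, `membar_mask_for_fence` and `fence_needs_storeload` are hypothetical. Only the four mmask bit values come from the V9 architecture. The point is that only the sequentially consistent case needs #StoreLoad.

```c
#include <assert.h>

/* SPARC V9 MEMBAR mmask bits (ordering constraints between memory ops). */
enum {
    MEMBAR_LOADLOAD   = 0x1,  /* earlier loads before later loads    */
    MEMBAR_STORELOAD  = 0x2,  /* earlier stores before later loads   */
    MEMBAR_LOADSTORE  = 0x4,  /* earlier loads before later stores   */
    MEMBAR_STORESTORE = 0x8   /* earlier stores before later stores  */
};

/* Hypothetical ordering levels, mirroring the C11 memory orders. */
typedef enum {
    ORDER_RELAXED,
    ORDER_ACQUIRE,
    ORDER_RELEASE,
    ORDER_ACQ_REL,
    ORDER_SEQ_CST
} fence_order;

/* Assumed mapping: return the weakest membar mask that gives a
 * standalone fence of the requested strength under RMO. */
unsigned membar_mask_for_fence(fence_order order)
{
    switch (order) {
    case ORDER_RELAXED:
        return 0;  /* no barrier needed */
    case ORDER_ACQUIRE:
        return MEMBAR_LOADLOAD | MEMBAR_LOADSTORE;
    case ORDER_RELEASE:
        return MEMBAR_LOADSTORE | MEMBAR_STORESTORE;
    case ORDER_ACQ_REL:
        return MEMBAR_LOADLOAD | MEMBAR_LOADSTORE | MEMBAR_STORESTORE;
    case ORDER_SEQ_CST:
        return MEMBAR_LOADLOAD | MEMBAR_STORELOAD |
               MEMBAR_LOADSTORE | MEMBAR_STORESTORE;
    }
    /* Unknown value: fall back to the full (most conservative) barrier. */
    return MEMBAR_LOADLOAD | MEMBAR_STORELOAD |
           MEMBAR_LOADSTORE | MEMBAR_STORESTORE;
}

/* Returns nonzero if the fence requires the expensive #StoreLoad bit. */
int fence_needs_storeload(fence_order order)
{
    return (membar_mask_for_fence(order) & MEMBAR_STORELOAD) != 0;
}

/* Small self-check of the mapping above. */
void membar_self_check(void)
{
    assert(!fence_needs_storeload(ORDER_ACQUIRE));
    assert(!fence_needs_storeload(ORDER_RELEASE));
    assert(fence_needs_storeload(ORDER_SEQ_CST));
}
```

Under stronger models some of these bits are already implied by the hardware (PSO, for example, already orders loads against later loads and stores), so the mask actually emitted can be smaller still.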