This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.
Additionally, adjust some comments, and remove partial code dealing with
larger-than-128bit atomics, as it's now unreachable.
AArch64 always supports 128-bit atomics, so there's no conditionals
needed here. (Though: we really ought to require that a 128-bit load is
available, not just a cmpxchg, which would mean conditioning on LSE2.
But that's future work.)
The arm64-irtranslator.ll test was adjusted as it was using an i258 type
as a hack to avoid IR atomic lowering to test GlobalISel behavior. Pass
-mattr=+lse and use i32, instead, to accomplish that goal in a way that
continues to work.