
Adding an extra argument before an `fp128` shifts the `fp128`'s stack offset by only four bytes, whereas the `fp128` should instead be placed in the next 16-byte-aligned slot. Add a test demonstrating the current behavior. `no_x86_scrub_sp` is added because the offset from the stack pointer is needed to show the problem. Relevant issue: https://github.com/llvm/llvm-project/issues/77401
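The arithmetic behind the mismatch can be modeled with a small sketch. It assumes that the outgoing argument area starts 16-byte aligned, that every preceding argument is a 4-byte value (for example `i32`) in a 4-byte slot, and that offsets are measured from the start of the argument area rather than from `%esp`. The helper names `align_up` and `fp128_offset` are hypothetical and are not part of LLVM.

```python
# Hypothetical model of where an fp128 stack argument lands; illustration only.
SLOT_ALIGN = 4    # assumed alignment of ordinary stack argument slots
FP128_ALIGN = 16  # alignment the fp128 argument is expected to receive


def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def fp128_offset(preceding_sizes, fp128_align):
    """Offset of an fp128 placed after arguments of the given byte sizes."""
    offset = 0
    for size in preceding_sizes:
        offset = align_up(offset, SLOT_ALIGN) + size
    return align_up(offset, fp128_align)


# Current behavior: each extra 4-byte argument moves the fp128 by 4 bytes.
current = [fp128_offset([4] * n, SLOT_ALIGN) for n in range(4)]
# Expected behavior: the fp128 moves to the next 16-byte-aligned slot.
expected = [fp128_offset([4] * n, FP128_ALIGN) for n in range(4)]

print(current)   # [0, 4, 8, 12]
print(expected)  # [0, 16, 16, 16]
```

With one preceding `i32`, the model places the `fp128` at offset 4 today instead of 16, which is the difference the new test makes visible once stack-pointer offsets are no longer scrubbed.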