Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge
footgun and use for anything other than compiler internal purposes is
heavily discouraged. The calling code must make sure that it does not
allocate fewer VGPRs than necessary - the intrinsic is NOT a request to
the backend to limit the number of VGPRs it uses (in essence it's not so
different from what we do with the dynamic VGPR flags of the
`amdgcn.cs.chain` intrinsic, it just makes it possible to use this
functionality in other scenarios).