This adds a scheduling mutation to pre-RA scheduling that inserts an
artificial dependency edge between a mask producer and the nearest
preceding instruction that uses the V0 register.
This keeps the live intervals of mask registers from overlapping,
which in turn reduces some spills/moves.
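
The idea above can be sketched outside of LLVM as a toy model. This is
only an illustration under stated assumptions, not the code in this
patch: ToyInstr, ToyEdge and addMaskEdges are hypothetical names, and
the instruction names in the usage note are just labels.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduling unit; not an LLVM type.
struct ToyInstr {
  std::string name;
  bool usesV0;        // reads the V0 mask register
  bool producesMask;  // defines a mask value (e.g. a vector compare)
};

// An artificial edge (pred, succ): succ may not be scheduled before pred.
using ToyEdge = std::pair<std::size_t, std::size_t>;

// Walk the region in program order, remember the most recent V0 user,
// and tie every later mask producer to it.
std::vector<ToyEdge> addMaskEdges(const std::vector<ToyInstr> &region) {
  std::vector<ToyEdge> edges;
  bool haveV0User = false;
  std::size_t nearestV0User = 0;
  for (std::size_t i = 0; i < region.size(); ++i) {
    // Check the producer first so that "previous" excludes the
    // instruction itself.
    if (haveV0User && region[i].producesMask) {
      assert(nearestV0User < i);
      edges.emplace_back(nearestV0User, i);
    }
    if (region[i].usesV0) {
      haveV0User = true;
      nearestV0User = i;
    }
  }
  return edges;
}
```

For a region of the form "vmseq (mask A), vmerge (uses V0), vmslt
(mask B), vmerge (uses V0)", the sketch yields a single edge from the
first vmerge to vmslt. In this model the scheduler can then no longer
hoist the second mask producer above the first V0 user, so the two mask
values are not live at the same time.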
The test changes show some improvements and also some regressions
(more vtype toggles).
Partially fixes #113489.