In some cases padding bubbles between sequential MFMA instructions may
lead to increased inter-wave performance. Add option to request to pad
some portion of these stall cycles with s_nops.
Fixes: SWDEV-326925
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121437