Adds logic to detect cases where the index operand of the
llvm.amdgcn.wave.shuffle intrinsic makes the result equivalent to one of
the Row Share flavors of DPP16 operations, and replaces the intrinsic,
together with the instructions computing the index, with an equivalent
llvm.amdgcn.update.dpp call.
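
The equivalence this change relies on can be illustrated with a small
lane-level model. This is only a sketch, not the code in this patch: the
helper names are hypothetical, and the index expression shown (own row base
OR'ed with a constant lane n) is just one assumed example of an index that
behaves like Row Share. The index patterns the patch actually recognizes
are not described here.

```python
ROW_SIZE = 16  # DPP16 operations work on rows of 16 lanes


def wave_shuffle(values, indices):
    """Model of a wave shuffle: each lane reads the value of the lane
    named by its own index operand."""
    return [values[indices[lane]] for lane in range(len(values))]


def row_share(values, n):
    """Model of Row Share n: every lane reads lane n of its own row."""
    return [values[(lane // ROW_SIZE) * ROW_SIZE + n]
            for lane in range(len(values))]


def row_share_index(lane, n):
    """Assumed index pattern: base of the lane's row, OR'ed with n."""
    return (lane & ~(ROW_SIZE - 1)) | n


def shuffle_matches_row_share(wave_size, n):
    """Check that shuffling with the assumed index equals Row Share n."""
    values = [lane * 7 + 3 for lane in range(wave_size)]
    indices = [row_share_index(lane, n) for lane in range(wave_size)]
    return wave_shuffle(values, indices) == row_share(values, n)
```

In this model, `shuffle_matches_row_share(wave_size, n)` holds for wave
sizes 32 and 64 and every n from 0 to 15, which is the property that lets
the shuffle and its index computation collapse into a single DPP operation.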