Changes:
* Decouple layout propagation from subgroup distribution and move it to an independent pass.
* Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while).
* Refine test cases.