Packed_A must be copied in every iteration of the outer tile, not just the first. This fixes llvm.org/PR50557.
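The copy requirement above can be illustrated with a minimal sketch. It assumes a BLIS-style loop nest and is not Polly's generated code: the tile sizes Nc, Kc, Mc, the functions naiveGemm and tiledGemm, the buffer name PackedA, and the flag CopyInEveryOuterTile are all hypothetical names introduced for illustration. Because the packed buffer is overwritten for each inner tile, skipping the copy after the first outer-tile iteration leaves stale data in it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes, chosen only to keep the example small.
constexpr std::size_t Nc = 4, Kc = 3, Mc = 2;

// Reference: C (M x N) = A (M x K) * B (K x N), all row-major.
std::vector<double> naiveGemm(const std::vector<double> &A,
                              const std::vector<double> &B, std::size_t M,
                              std::size_t N, std::size_t K) {
  std::vector<double> C(M * N, 0.0);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t p = 0; p < K; ++p)
        C[i * N + j] += A[i * K + p] * B[p * N + j];
  return C;
}

// Tiled variant that reads A through a packed buffer. If
// CopyInEveryOuterTile is false, the buffer is filled only while jc == 0,
// which mimics copying "just for the first iteration of the outer tile".
std::vector<double> tiledGemm(const std::vector<double> &A,
                              const std::vector<double> &B, std::size_t M,
                              std::size_t N, std::size_t K,
                              bool CopyInEveryOuterTile) {
  std::vector<double> C(M * N, 0.0);
  std::vector<double> PackedA(Mc * Kc, 0.0);
  for (std::size_t jc = 0; jc < N; jc += Nc) // outer tile
    for (std::size_t pc = 0; pc < K; pc += Kc)
      for (std::size_t ic = 0; ic < M; ic += Mc) {
        const std::size_t jEnd = std::min(jc + Nc, N);
        const std::size_t pEnd = std::min(pc + Kc, K);
        const std::size_t iEnd = std::min(ic + Mc, M);
        // PackedA is reused for every (pc, ic) tile, so whatever it held at
        // the end of the previous outer tile is stale here.
        if (CopyInEveryOuterTile || jc == 0)
          for (std::size_t i = ic; i < iEnd; ++i)
            for (std::size_t p = pc; p < pEnd; ++p)
              PackedA[(i - ic) * Kc + (p - pc)] = A[i * K + p];
        // Multiply the packed tile of A with the matching tile of B.
        for (std::size_t i = ic; i < iEnd; ++i)
          for (std::size_t j = jc; j < jEnd; ++j)
            for (std::size_t p = pc; p < pEnd; ++p)
              C[i * N + j] += PackedA[(i - ic) * Kc + (p - pc)] * B[p * N + j];
      }
  return C;
}
```

With more than one inner tile and more than one outer tile, calling tiledGemm with CopyInEveryOuterTile set to false yields a result that differs from naiveGemm, while setting it to true reproduces the reference.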
Functions shared between the generalized matrix-multiplication optimization and other post-reschedule optimizations (tiling, prevect) are moved into the schedule tree transformation utility ScheduleTreeTransform.