Extends the Vector to AMX conversion to attempt to populate AMX tiles
directly from memory.
When possible, the producers and consumers of a contraction are replaced
by AMX tile data transfer operations (tile loads and stores). This
shortens the data path by skipping intermediate register loads and stores.
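
As a rough illustration of what populating a tile directly from memory
means, the Python sketch below models a strided tile load and store on a
flat, row-major buffer, similar in spirit to how AMX tile loads take a
base address and a row stride. The helpers `tile_load` and `tile_store`
are hypothetical, count in elements rather than bytes, and model only the
data movement, not the MLIR rewrite itself.

```python
from typing import List


def tile_load(buf: List[float], base: int, stride: int,
              rows: int, cols: int) -> List[List[float]]:
    """Hypothetical tile load: read `rows` rows of `cols` elements from
    `buf`, starting at `base`, with consecutive rows `stride` elements apart."""
    if cols > stride:
        raise ValueError("row length exceeds stride; rows would overlap")
    if base + (rows - 1) * stride + cols > len(buf):
        raise ValueError("tile extends past the end of the buffer")
    return [buf[base + r * stride: base + r * stride + cols]
            for r in range(rows)]


def tile_store(buf: List[float], base: int, stride: int,
               tile: List[List[float]]) -> None:
    """Hypothetical tile store: write each tile row back into `buf`."""
    for r, row in enumerate(tile):
        start = base + r * stride
        buf[start:start + len(row)] = row


# A 4x6 row-major matrix stored flat: element (r, c) holds 10*r + c.
mem = [float(10 * r + c) for r in range(4) for c in range(6)]
# Load the 2x3 block starting at row 1, column 2 (base = 1*6 + 2).
t = tile_load(mem, base=1 * 6 + 2, stride=6, rows=2, cols=3)
print(t)  # [[12.0, 13.0, 14.0], [22.0, 23.0, 24.0]]
```

The row stride is what lets a single load pick a submatrix out of a
larger buffer without first copying it into a contiguous temporary.
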
Adds a pass for Vector to AMX operation conversion.
Initially, a direct rewrite of vector contractions in the packed VNNI
layout is supported. Operations must already have AMX-compatible shapes
for the rewrite to apply.