Use wrappers around `std::accumulate` to make the code more concise and
less bug-prone: https://github.com/llvm/llvm-project/pull/162129.
With `std::accumulate`, it's the initial value that determines the
accumulator type. `llvm::sum_of` and `llvm::product_of` pick the right
accumulator type based on the range element type.
Found some funny bugs like a local accumulate helper that calculated a
sum with initial value of 1 -- we didn't hit the bug because the code
was actually dead...
Extends Vector to AMX conversion to attempt populating AMX tiles
directly from memory.
When possible, contraction producers and consumers are replaced by AMX
tile data transfer operations. This shortens data path by skipping
intermediate register loads and stores.
Adds a pass for Vector to AMX operation conversion.
Initially, a direct rewrite for vector contraction in packed VNNI layout
is supported. Operations are expected to already be in shapes which are
AMX-compatible for the rewriting to occur.