This PR adds support for lowering `vector.reduction` and `vector.multi_reduction` ops in subgroup to work-item (WI) distribution. The following cases are currently handled (more support will be added later):

* `vector.reduction`: This assumes the source vector is distributed across all lanes, so the lanes must shuffle data to perform a collaborative reduction. The result is shared among all lanes. This is done by emitting `gpu::ShuffleOp`s and doing a butterfly reduction. Refer to `VectorDistribution` for more details.
* `vector.multi_reduction`: Two cases are considered:
  1. **Reduction is lane-local**: Simply lower to a lane-local multi-reduction op. Each lane does its own reduction, and the result is distributed.
  2. **Reduction is not lane-local**: This case is handled indirectly. The reduction is rewritten in terms of `vector.reduction` ops (plus extract/insert) before WI distribution begins. The whole thing is then distributed using `gpu::ShuffleOp`s later (not fully supported yet).
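For reviewers unfamiliar with the butterfly pattern mentioned for `vector.reduction`, here is a small host-side simulation in Python. It is only an illustrative sketch, not the code this PR emits. It assumes a power-of-two subgroup size, models lanes as list entries, and models an XOR-mode shuffle as plain indexing. The names `butterfly_reduce` and `distributed_reduction`, and the strided element-to-lane layout, are hypothetical choices made for the example.

```python
import operator
from typing import Callable, List, Sequence


def butterfly_reduce(lane_values: Sequence[float],
                     combine: Callable[[float, float], float] = operator.add
                     ) -> List[float]:
    """Simulate a butterfly reduction across the lanes of one subgroup.

    lane_values[i] is the partial value held by lane i. After log2(width)
    shuffle rounds every lane holds the full reduction.
    """
    width = len(lane_values)
    if width == 0 or width & (width - 1):
        raise ValueError("subgroup size must be a non-zero power of two")

    values = list(lane_values)
    offset = 1
    while offset < width:
        # Stand-in for an XOR shuffle: lane i reads the value of lane i ^ offset.
        shuffled = [values[lane ^ offset] for lane in range(width)]
        # Each lane combines its own value with the one it received.
        values = [combine(values[lane], shuffled[lane]) for lane in range(width)]
        offset *= 2
    return values


def distributed_reduction(vector: Sequence[float], subgroup_size: int,
                          combine: Callable[[float, float], float] = operator.add
                          ) -> List[float]:
    """Reduce a vector whose elements are spread across lanes.

    Assumed layout (for illustration only): lane i owns elements
    i, i + subgroup_size, i + 2 * subgroup_size, ...
    """
    if len(vector) % subgroup_size != 0:
        raise ValueError("vector length must be a multiple of the subgroup size")

    partials = []
    for lane in range(subgroup_size):
        owned = vector[lane::subgroup_size]
        acc = owned[0]
        for element in owned[1:]:
            acc = combine(acc, element)  # lane-local part, no shuffles needed
        partials.append(acc)
    # Cross-lane part: every lane ends up with the same final value.
    return butterfly_reduce(partials, combine)
```

For example, `distributed_reduction(list(range(8)), 4)` leaves the value 28 on each of the four simulated lanes, which matches the "result is shared among all lanes" behaviour described above.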