The two types duplicated mostly the same values.
Here they are decomposed to carry orthogonal and complimentary
information.
Use `utils::IteratorType` instead of `mesh::IteratorType`. It now has
only parallel and reduction values.
Rename `Partial` to `ReductionKind`.
Add `getReductionLoopIteratorKinds` method to `ShardingInterface`.
This op is the inverse of all-gather. It is useful to have an explicit
concise representation instead of having a blob of slicing logic.
Add lowering for the op that slices from the tensor based on the
in-group process index.
Make resharding generate an all-slice instead of inserting the slicing
logic directly.
Rename
* Op mesh.cluster -> mesh.mesh
* Op mesh.cluster_shape -> mesh.mesh_shape
* variables and attributes.
The name `mesh` is more specific to what it really represents. It is a
mesh of devices.
The name `cluster` implies a broader posibility of device
configurations. When just the word `mesh` is used the meaning can often
be inferred from the context whether it refers to the mesh dialect or a
device mesh. The full name can be used when needed.
Remove the somewhat redundant rank attribute.
Before this change
```
mesh.cluster @mesh(rank = 3, dim_sizes = 2x3)
```
After
```
mesh.cluster @mesh(shape = 2x3x?)
```
The rank is instead determined by the provided shape. With this change
no longer `getDimSizes()` can be wrongly assumed to have size equal to
the cluster rank.
Now `getShape().size()` will always equal `getRank()`.
* Rename mesh.process_index -> mesh.process_multi_index.
* Add mesh.process_linear_index op.
* Add lowering of mesh.process_multi_index into an expression using
mesh.process_linear_index, mesh.cluster_shape and
affine.delinearize_index.
This is useful to lower mesh ops and prepare them for further lowering
where the runtime may have only the linear index of a device/process.
For example in MPI we have a rank (linear index) in a communicator.
Does transformations like
all_reduce(x) + all_reduce(y) -> all_reduce(x + y)
max(all_reduce(x), all_reduce(y)) -> all_reduce(max(x, y))
when the all_reduce element-wise op is max.
Added general rewrite pattern HomomorphismSimplification and
EndomorphismSimplification that encapsulate the general algorithm.
Made specialization for all-reduce with respect to
addf, addi, minsi, maxsi, minimumf and maximumf
in the Arithmetic dialect.