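This patch extends the Linalg vectoriser so that scalar loads
(`tensor.extract` ops whose indices are all loop-invariant) are correctly
identified as scalar loads rather than gather loads. Below is an example
of a scalar load. Both indices are loop-invariant: `%c16` is a constant,
and `linalg.index 0` is always 0 because dimension 0 of the iteration
space has size 1: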
```
func.func @example(%arg0: tensor<80x16xf32>, %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
  %c8 = arith.constant 8 : index
  %c16 = arith.constant 16 : index
  %1 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]
  } outs(%arg2 : tensor<1x4xf32>) {
  ^bb0(%out: f32):
    %2 = linalg.index 0 : index
    %extracted = tensor.extract %arg0[%2, %c16] : tensor<80x16xf32>
    linalg.yield %extracted : f32
  } -> tensor<1x4xf32>
  return %1 : tensor<1x4xf32>
}
```
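The classification rule described above can be sketched as follows. This is a hypothetical, simplified Python model, not the vectoriser's C++ implementation: `Const`, `LoopIndex`, `is_loop_invariant` and `classify_extract` are made-up names, and the real vectoriser distinguishes more cases (e.g. contiguous loads) than the two modelled here.

```python
from dataclasses import dataclass
from typing import Sequence, Union


# Hypothetical model of the index operands of a tensor.extract inside a
# linalg.generic body. This is NOT the MLIR API.
@dataclass(frozen=True)
class Const:
    """An index defined outside the loop nest, e.g. `%c16 = arith.constant 16`."""
    value: int


@dataclass(frozen=True)
class LoopIndex:
    """`linalg.index <dim>`: the induction variable of loop `dim`."""
    dim: int


Index = Union[Const, LoopIndex]


def is_loop_invariant(idx: Index, loop_bounds: Sequence[int]) -> bool:
    # Constants never change across iterations. A loop index is also
    # invariant when its loop has a trip count of 1, since it is always 0.
    if isinstance(idx, Const):
        return True
    return loop_bounds[idx.dim] == 1


def classify_extract(indices: Sequence[Index], loop_bounds: Sequence[int]) -> str:
    # Simplified: all indices invariant -> scalar load, otherwise gather.
    if all(is_loop_invariant(i, loop_bounds) for i in indices):
        return "scalar"
    return "gather"


# The example above: loop bounds come from tensor<1x4xf32> and the
# extract indices are (linalg.index 0, %c16).
print(classify_extract([LoopIndex(0), Const(16)], [1, 4]))  # scalar
# Indexing rows with linalg.index 1 (trip count 4) varies per lane.
print(classify_extract([LoopIndex(1), Const(16)], [1, 4]))  # gather
```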
This patch also ensures that such scalar loads are lowered to a single
scalar load followed by a broadcast:
```
%extracted = tensor.extract %arg0[%1, %c16] : tensor<80x16xf32>
%2 = vector.broadcast %extracted : f32 to vector<1x4xf32>
```
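Why this lowering is valid can be illustrated with a small Python sketch, assuming nested lists stand in for tensors and vectors: when every lane reads the same element, one load plus a broadcast yields the same result as a per-lane (gather-style) load. The helper names `extract_per_lane` and `scalar_load_and_broadcast` and the 3x3 source data are hypothetical, chosen only for illustration.

```python
def extract_per_lane(src, row_index, col_index, shape):
    """Gather-style lowering: every lane of the result performs its own load.

    row_index and col_index are functions of the lane position (i, j).
    """
    rows, cols = shape
    return [[src[row_index(i, j)][col_index(i, j)] for j in range(cols)]
            for i in range(rows)]


def scalar_load_and_broadcast(src, row, col, shape):
    """Scalar-load lowering: one load, then replicate it to every lane."""
    rows, cols = shape
    scalar = src[row][col]                          # ~ tensor.extract
    return [[scalar] * cols for _ in range(rows)]   # ~ vector.broadcast


# Made-up 3x3 source data; the result has shape 1x4, like vector<1x4xf32>.
src = [[0.0, 1.0, 2.0],
       [3.0, 4.0, 5.0],
       [6.0, 7.0, 8.0]]
# Row index is the lane's i (like linalg.index 0), always 0 for a 1x4
# result; the column index is a constant, so all lanes read src[0][2].
gathered = extract_per_lane(src, lambda i, j: i, lambda i, j: 2, (1, 4))
broadcast = scalar_load_and_broadcast(src, 0, 2, (1, 4))
print(gathered == broadcast)  # True
print(broadcast)              # [[2.0, 2.0, 2.0, 2.0]]
```

The broadcast form issues one memory access instead of four, which is the point of recognising these loads as scalar.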
Differential Revision: https://reviews.llvm.org/D149678