This adds support for lowering vector.broadcast ops to SME if the
source is a scalar, 0-d vector, or 1-d vector, and the result is a
2-d scalable vector that aligns with SME tiles.
This follows on from D157005 which introduced a vector to tile slice op
that moves a 1-d scalable vector to a slice of a 2-d scalable vector
(tile). The lowering from vector.broadcast is similar, so a couple of
helper functions are added to avoid duplication.
Lowering of vector.broadcast contributes towards a path from linalg.fill
to SME.
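
For illustration only, the effect of this lowering can be modelled in
plain Python. This is a sketch, not MLIR and not the code in this patch:
`broadcast_to_tile` and `num_slices` are hypothetical names, and the
sketch assumes a scalar or 0-d source is first splatted to a 1-d vector
before being written to every tile slice.

```python
def broadcast_to_tile(source, num_slices):
    """Model broadcasting `source` into a num_slices x num_slices tile."""
    if isinstance(source, list):
        # 1-d source: must already match the length of a tile slice.
        if len(source) != num_slices:
            raise ValueError("1-d source does not match the tile slice length")
        vector = list(source)
    else:
        # Scalar (or 0-d vector) source: splat to a 1-d vector first.
        vector = [source] * num_slices
    # One write per tile slice, as a materialized loop would do.
    return [list(vector) for _ in range(num_slices)]


tile_from_scalar = broadcast_to_tile(7, 4)
tile_from_vector = broadcast_to_tile([1, 2, 3, 4], 4)
```

In both cases every slice of the resulting tile holds the same 1-d
vector; only the way that vector is obtained differs.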
Depends on D157005
Reviewed By: awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D158586
This adds a 'move_vector_to_tile_slice' op to the ArmSME dialect that
moves a 1-D scalable vector to a slice of a 2-D tile at a given index.
This is lowered to the 'llvm.aarch64.sme.write.horiz' intrinsic that
maps to the MOVA (vector to tile, single) SME instruction [1] when
lowering to LLVM. Like the SME load and store instructions, this operates
on ZA tile slices, which are 1-D vectors of horizontally or vertically
contiguous elements within a ZA tile.
This patch extends the lowering of 'arith.constant' to SME to support
non-zero constants using this new op. This requires materializing a
loop that broadcasts the constant to each tile slice with the
'move_vector_to_tile_slice' op. Unlike load and store, this is done during
conversion from Vector to ArmSME, rather than ArmSME to SCF. The latter
would require a higher-level custom op in the ArmSME dialect like
'tile_load' and 'tile_store', which isn't necessary. We may also
remove the load and store ops in the future in favour of lowering
straight from Vector, at which point this would converge.
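
The loop described above can be sketched in plain Python. This models
the semantics only and is not the MLIR emitted by this patch: the
function names mirror the op for readability, `materialize_splat_constant`
is a hypothetical helper, and the initial tile contents are an arbitrary
placeholder.

```python
def move_vector_to_tile_slice(vector, tile, slice_index):
    """Return a copy of `tile` with horizontal slice `slice_index` replaced."""
    if len(vector) != len(tile[slice_index]):
        raise ValueError("vector length does not match the tile slice length")
    updated = [list(row) for row in tile]
    updated[slice_index] = list(vector)
    return updated


def materialize_splat_constant(value, num_slices):
    """Fill a square tile with `value`, one horizontal slice at a time."""
    # Placeholder initial contents; every slice is overwritten below.
    tile = [[None] * num_slices for _ in range(num_slices)]
    # The constant is held as a 1-D vector and written once per slice.
    vector = [value] * num_slices
    for slice_index in range(num_slices):
        tile = move_vector_to_tile_slice(vector, tile, slice_index)
    return tile


tile = materialize_splat_constant(3, 4)
```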
Currently only horizontal tile slices are supported. A future patch will
extend this mechanism to support 'vector.broadcast'.
Depends on D156980 D157004
[1] https://developer.arm.com/documentation/ddi0602
Reviewed By: awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D157005
The arm_sme.zero op currently only supports 8-bit element tiles. This
extends the op and lowering from 'arith.constant dense<0>' ->
'arm_sme.zero' to support all tile types.
The lowering from arm_sme.zero to intrinsics is not updated as part of
this patch and will be done separately.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D156980
Enables the lowering of other tile types and values to match the
vector.store -> arm_sme.tile_store lowering.
Reviewed By: awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D156976
An 'arith.constant dense<0>' is currently lowered to 'arm_sme.zero' as
part of the 'vector.transfer_write' lowering during '-vector-to-arm-sme'
conversion. This patch makes this lowering independent of the
'vector.transfer_write'. This can then be extended for further tile
types and non-zero constants.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D156802
This extends the existing 'arm_sme.tile_store' op to support all tile
sizes and adds a new op 'arm_sme.tile_load', as well as lowerings from
vector -> custom ops and custom ops -> intrinsics. Currently there's no
lowering for i128.
Depends on D154867
Reviewed By: awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D155306
At the moment, the lowering from the Vector dialect to SME looks like
this:
* Vector --> SME LLVM IR intrinsics
This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension:
* Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.
This is motivated by 2 considerations:
1. Storing `ZA` to memory (e.g. `vector.transfer_write`) requires an
`scf.for` loop over all rows of `ZA`. Similar logic will apply to
"load to ZA from memory". This is a rather complex transformation and
a custom Op seems justified.
2. As discussed in [1], we need to prevent the LLVM type converter from
having to convert types unsupported in LLVM, e.g.
`vector<[16]x[16]xi8>`. A dedicated abstraction layer with custom Ops
opens a path to some fine tuning (e.g. custom type converters) that
will allow us to avoid this.
To facilitate this change, two new custom SME Ops are introduced:
* `TileStoreOp`, and
* `ZeroOp`.
Note that no new functionality is added - these Ops merely model what's
already supported. In particular, the following tile size is assumed
(dimension and element size are fixed):
* `vector<[16]x[16]xi8>`
The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the `-convert-vector-to-sme`
flag to run it. The following function:
```
func.func @example(%arg0 : memref<?x?xi8>) {
// (...)
%cst = arith.constant dense<0> : vector<[16]x[16]xi8>
vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
return
}
```
would be lowered to:
```
func.func @example(%arg0: memref<?x?xi8>) {
// (...)
%0 = arm_sme.zero : vector<[16]x[16]xi8>
arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
return
}
```
Later, a mechanism will be introduced to guarantee that `arm_sme.zero`
and `arm_sme.tile_store` operate on the same virtual tile. For `i8`
elements this is not required as there is only one tile.
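
To show why `i8` is the simple case, the relationship between element
width, tile shape and the number of ZA tiles can be sketched in Python.
The helper `sme_tile_info` is hypothetical; it assumes the minimum
streaming vector length of 128 bits, with each dimension then scaled by
vscale at runtime.

```python
MIN_SVL_BITS = 128  # minimum streaming vector length, in bits


def sme_tile_info(elem_bits):
    """Return (minimum elements per slice, number of ZA tiles)."""
    if elem_bits not in (8, 16, 32, 64, 128):
        raise ValueError("unsupported element width")
    # A tile is square: [n]x[n] elements, with n scaled by vscale.
    elems_per_slice = MIN_SVL_BITS // elem_bits
    # Wider elements split ZA into more, smaller tiles.
    num_tiles = elem_bits // 8
    return elems_per_slice, num_tiles


for bits in (8, 16, 32, 64, 128):
    n, tiles = sme_tile_info(bits)
    print(f"i{bits}: vector<[{n}]x[{n}]xi{bits}>, {tiles} tile(s)")
```

With more than one tile available for the wider element types, ops that
produce and consume a tile need some way to agree on which tile is used.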
In order to lower the above output to LLVM, use
* `-convert-vector-to-llvm="enable-arm-sme"`.
[1] https://github.com/openxla/iree/issues/14294
Reviewed By: WanderAway
Differential Revision: https://reviews.llvm.org/D154867