llvm-project

Author SHA1 Message Date

Author	SHA1	Message	Date
Benjamin Maxwell	e2296d8295	[mlir][ArmSME] Lower extract from 2D scalable create_mask to psel (#96066 ) Example: ```mlir %mask = vector.create_mask %a, %b : vector<[4]x[8]xi1> %slice = vector.extract %mask[%index] : vector<[8]xi1> from vector<[4]x[8]xi1> ``` Becomes: ```mlir %mask_rows = vector.create_mask %a : vector<[4]xi1> %mask_cols = vector.create_mask %b : vector<[8]xi1> %slice = arm_sve.psel %mask_cols, %mask_rows[%index] : vector<[8]xi1>, vector<[4]xi1> ``` Note: While psel is under ArmSVE it requires SME (or SVE 2.1), so this is currently the most logical place for this lowering.	2024-06-20 10:27:07 +01:00
Benjamin Maxwell	06881d222d	[mlir][ArmSME] Add constructor for `-convert-vector-to-arm-sme` pass (NFC) (#71705 ) This will be needed to construct the pass downstream (i.e. in IREE).	2023-11-09 10:03:33 +00:00
Andrzej Warzynski	447bb5bee4	[mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME) At the moment, the lowering from the Vector dialect to SME looks like this: * Vector --> SME LLVM IR intrinsics This patch introduces a new lowering layer between the Vector dialect and the Arm SME extension: * Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics. This is motivated by 2 considerations: 1. Storing `ZA` to memory (e.g. `vector.transfer_write`) requires an `scf.for` loop over all rows of `ZA`. Similar logic will apply to "load to ZA from memory". This is a rather complex transformation and a custom Op seems justified. 2. As discussed in [1], we need to prevent the LLVM type converter from having to convert types unsupported in LLVM, e.g. `vector<[16]x[16]xi8>`. A dedicated abstraction layer with custom Ops opens a path to some fine tuning (e.g. custom type converters) that will allow us to avoid this. To facilitate this change, two new custom SME Op are introduced: * `TileStoreOp`, and * `ZeroOp`. Note that no new functionality is added - these Ops merely model what's already supported. In particular, the following tile size is assumed (dimension and element size are fixed): * `vector<[16]x[16]xi8>` The new lowering layer is introduced via a conversion pass between the Vector and the SME dialects. You can use the `-convert-vector-to-sme` flag to run it. The following function: ``` func.func @example(%arg0 : memref<?x?xi8>) { // (...) %cst = arith.constant dense<0> : vector<[16]x[16]xi8> vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8> return } ``` would be lowered to: ``` func.func @example(%arg0: memref<?x?xi8>) { // (...) %0 = arm_sme.zero : vector<[16]x[16]xi8> arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8> return } ``` Later, a mechanism will be introduced to guarantee that `arm_sme.zero` and `arm_sme.tile_store` operate on the same virtual tile. For `i8` elements this is not required as there is only one tile. In order to lower the above output to LLVM, use * `-convert-vector-to-llvm="enable-arm-sme"`. [1] https://github.com/openxla/iree/issues/14294 Reviewed By: WanderAway Differential Revision: https://reviews.llvm.org/D154867	2023-07-18 08:04:59 +00:00

Benjamin Maxwell

e2296d8295

[mlir][ArmSME] Lower extract from 2D scalable create_mask to psel (#96066 )

Example:
```mlir
%mask = vector.create_mask %a, %b : vector<[4]x[8]xi1>
%slice = vector.extract %mask[%index]
           : vector<[8]xi1> from vector<[4]x[8]xi1>
```
Becomes:
```mlir
%mask_rows = vector.create_mask %a : vector<[4]xi1>
%mask_cols = vector.create_mask %b : vector<[8]xi1>
%slice = arm_sve.psel %mask_cols, %mask_rows[%index]
           : vector<[8]xi1>, vector<[4]xi1>
```

Note: While psel is under ArmSVE it requires SME (or SVE 2.1), so this
is currently the most logical place for this lowering.

2024-06-20 10:27:07 +01:00

Benjamin Maxwell

06881d222d

[mlir][ArmSME] Add constructor for -convert-vector-to-arm-sme pass (NFC) (#71705 )

This will be needed to construct the pass downstream (i.e. in IREE).

2023-11-09 10:03:33 +00:00

Andrzej Warzynski

447bb5bee4

[mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME)

At the moment, the lowering from the Vector dialect to SME looks like
this:

  * Vector --> SME LLVM IR intrinsics

This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension:

  * Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.

This is motivated by 2 considerations:
1. Storing `ZA` to memory (e.g. `vector.transfer_write`) requires an
   `scf.for` loop over all rows of `ZA`. Similar logic will apply to
   "load to ZA from memory". This is a rather complex transformation and
   a custom Op seems justified.
2. As discussed in [1], we need to prevent the LLVM type converter from
   having to convert types unsupported in LLVM, e.g.
   `vector<[16]x[16]xi8>`. A dedicated abstraction layer with custom Ops
   opens a path to some fine tuning (e.g. custom type converters) that
   will allow us to avoid this.

To facilitate this change, two new custom SME Op are introduced:

  * `TileStoreOp`, and
  * `ZeroOp`.

Note that no new functionality is added - these Ops merely model what's
already supported. In particular, the following tile size is assumed
(dimension and element size are fixed):

  * `vector<[16]x[16]xi8>`

The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the `-convert-vector-to-sme`
flag to run it. The following function:
```
func.func @example(%arg0 : memref<?x?xi8>) {
  // (...)
  %cst = arith.constant dense<0> : vector<[16]x[16]xi8>
  vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
  return
}
```
would be lowered to:
```
  func.func @example(%arg0: memref<?x?xi8>) {
    // (...)
    %0 = arm_sme.zero : vector<[16]x[16]xi8>
    arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
    return
  }
```

Later, a mechanism will be introduced to guarantee that `arm_sme.zero`
and `arm_sme.tile_store` operate on the same virtual tile. For `i8`
elements this is not required as there is only one tile.

In order to lower the above output to LLVM, use
  * `-convert-vector-to-llvm="enable-arm-sme"`.

[1] https://github.com/openxla/iree/issues/14294

Reviewed By: WanderAway

Differential Revision: https://reviews.llvm.org/D154867

2023-07-18 08:04:59 +00:00

3 Commits