3 Commits

Author SHA1 Message Date
Cullen Rhodes
fff86c6111
[mlir][ArmSME] Support 4-way widening outer products (#79288)
This patch introduces support for 4-way widening outer products. This
enables the fusion of 4 'arm_sme.outerproduct' operations that are
chained via the accumulator into single widened operations.

Changes:

- Adds the following operations:
  - smopa_4way, smops_4way
  - umopa_4way, umops_4way
  - sumopa_4way, sumops_4way
  - sumopa_4way, sumops_4way
- Implements conversions for the above ops to intrinsics in ArmSMEToLLVM.
- Extends 'arm-sme-outer-product' pass.

For a detailed description of these operations see the
'arm_sme.smopa_4way' description.
2024-02-07 08:17:47 +00:00
Cullen Rhodes
5f5b3bb22b
[mlir][ArmSME] Add rewrites to swap extract of extend (#80407)
In mixed matmul lowering (e.g., i8 to i32) we're seeing the following
sequence:

  %0 = arith.extsi %src : vector<4x[8]xi8> to vector<4x[8]xi32>
  %1 = vector.extract %0[0] : vector<[8]xi32> from vector<4x[8]xi32>
%lhs = vector.scalable.extract %1[0] : vector<[4]xi32> from
vector<[8]xi32>

  ... (same for rhs)

%2 = vector.outerproduct %lhs, %rhs, %acc vector<[4]xi32>,
vector<[4]xi32>

  // x4 chained by accumulator

This chain of 4 outer products can be fused into a single 4-way widening
variant but the pass doesn't match on the IR, as it expects the source
of the inputs to be an extend and it can't look through the extracts.

This patch fixes this with two rewrites that swaps extract(extend) into
extend(extract).

Related to #78975, #79288.
2024-02-05 14:13:53 +00:00
Cullen Rhodes
95ef8e3868
[mlir][ArmSME] Support 2-way widening outer products (#78975)
This patch introduces support for 2-way widening outer products. This
enables the fusion of 2 'arm_sme.outerproduct' operations that are
chained via the accumulator into a 2-way widening outer product
operation.

Changes:

- Add 'llvm.aarch64.sme.[us]mop[as].za32' intrinsics for 2-way variants.
  These map to instruction variants added in SME2 and use different
  intrinsics. Intrinsics are already implemented for widening variants
  from SME1.
- Adds the following operations:
  - fmopa_2way, fmops_2way
  - smopa_2way, smops_2way
  - umopa_2way, umops_2way
- Implements conversions for the above ops to intrinsics in
ArmSMEToLLVM.
- Adds a pass 'arm-sme-outer-product-fusion'  that fuses
  'arm_sme.outerproduct' operations.

For a detailed description of these operations see the
'arm_sme.fmopa_2way' description.

The reason for introducing many operations rather than one is the
signed/unsigned variants can't be distinguished with types (e.g., ui16,
si16) since 'arith.extui' and 'arith.extsi' only support signless
integers. A single operation would require this information and an
attribute (for example) for the sign doesn't feel right if
floating-point types are also supported where this wouldn't apply.
Furthermore, the SME FP8 extensions (FEAT_SME_F8F16, FEAT_SME_F8F32)
introduce FMOPA 2-way (FP8 to FP16) and 4-way (FP8 to FP32) variants but
no subtract variant. Whilst these are not supported in this patch, it
felt simpler to have separate ops for add/subtract given this.
2024-01-31 09:13:18 +00:00