llvm-project/mlir/test/Dialect/LLVM/transform-e2e.mlir
Erick Ochoa Lopez 613a5c555e
[mlir][vector] Replace OneDimMultiReductionToTwoDim with OneDimMultiReductionToReduction (#184241)
The `OneDimMultiReductionToTwoDim` pattern had some issues. For the
input program:

```mlir
func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %acc: f32) -> f32 {
    %0 = vector.multi_reduction <add>, %arg0, %acc [0] : vector<8xf32> to f32
    return %0 : f32
}
```

* when lowering using the inner-parallel strategy, the compiler would
essentially produce scalar code:
```mlir
func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %arg1: f32) -> f32 {
    %0 = vector.shape_cast %arg0 : vector<8xf32> to vector<1x8xf32>
    %1 = vector.broadcast %arg1 : f32 to vector<1xf32>
    %2 = vector.transpose %0, [1, 0] : vector<1x8xf32> to vector<8x1xf32>
    %3 = vector.extract %2[0] : vector<1xf32> from vector<8x1xf32>
    %4 = arith.addf %3, %1 : vector<1xf32>
    %5 = vector.extract %2[1] : vector<1xf32> from vector<8x1xf32>
    %6 = arith.addf %5, %4 : vector<1xf32>
    ... (repeats for all 8 elements) ...
    %17 = vector.extract %2[7] : vector<1xf32> from vector<8x1xf32>
    %18 = arith.addf %17, %16 : vector<1xf32>
    %19 = vector.extract %18[0] : f32 from vector<1xf32>
    return %19 : f32
}
```
* when lowering using the inner-reduction strategy, the compiler would
first unnecessarily rewrite it as a 2-D multi_reduction on
vector<1x8xf32>, then extract the vector<8xf32> row and apply a
reduction to it. Only after canonicalization and folding would it reach
the following final result:
```mlir
func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %arg1: f32) -> f32 {
    %0 = vector.reduction <add>, %arg0, %arg1 : vector<8xf32> into f32
    return %0 : f32
}
```

After this change, the compiler produces the following for both
strategies in a single step:
```mlir
func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %arg1: f32) -> f32 {
    %0 = vector.reduction <add>, %arg0, %arg1 : vector<8xf32> into f32
    return %0 : f32
}
```
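
As an illustration only (a plain-Python model, not MLIR and not part of this patch), the sketch below mimics the two outputs shown above for `<add>`: the old inner-parallel chain of one extract plus one `arith.addf` per element, and the single `vector.reduction`. The function names are hypothetical, and the model assumes the reduction accumulates left to right starting from the accumulator.

```python
from functools import reduce
import operator


def inner_parallel_chain(vec, acc):
    """Model of the old inner-parallel output: one extract + addf per element."""
    running = [acc]  # models: vector.broadcast %acc : f32 to vector<1xf32>
    for x in vec:
        # models: %e = vector.extract %2[i] ; %r = arith.addf %e, %running
        running = [x + running[0]]
    return running[0]  # models the final vector.extract %18[0]


def single_reduction(vec, acc):
    """Model of: vector.reduction <add>, %vec, %acc : vector<8xf32> into f32."""
    return reduce(operator.add, vec, acc)


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Both forms compute the same value; the new pattern just gets there in one op.
assert inner_parallel_chain(values, 0.5) == single_reduction(values, 0.5)
```

The point of the model is only that the two forms agree on the result, so the eight-step scalar chain buys nothing over the single reduction op.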

This change also helps an ongoing refactoring of the multi_reduction
patterns. `OneDimMultiReductionToTwoDim` was the only pattern that
increased the rank of a multi_reduction, so it would lead to an infinite
loop when attempting to reach a fixed point once we generalize the other
unrolling patterns.
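
The fixed-point concern can be sketched with a toy rewrite driver in Python. This is not MLIR's pattern driver: each "op" is reduced to just its multi_reduction rank (with 0 standing for "already lowered to a reduction"), and `generalized_unroll` is a hypothetical stand-in that assumes a generalized unrolling pattern would lower rank n to rank n - 1 for n >= 2.

```python
def apply_to_fixed_point(rank, patterns, max_iterations=10):
    """Apply the first matching pattern repeatedly; return (rank, converged)."""
    for _ in range(max_iterations):
        for pattern in patterns:
            new_rank = pattern(rank)
            if new_rank is not None:
                rank = new_rank
                break
        else:
            return rank, True  # no pattern matched: fixed point reached
    return rank, False  # iteration budget exhausted without converging


def one_dim_to_two_dim(rank):
    # Stand-in for the removed pattern: rank 1 is raised to rank 2.
    return 2 if rank == 1 else None


def one_dim_to_reduction(rank):
    # Stand-in for the new pattern: rank 1 becomes a reduction (modeled as 0).
    return 0 if rank == 1 else None


def generalized_unroll(rank):
    # Hypothetical generalized unrolling: rank n >= 2 is lowered to rank n - 1.
    return rank - 1 if rank >= 2 else None


# Old pattern set: 3 -> 2 -> 1 -> 2 -> 1 -> ... never settles.
_, old_converged = apply_to_fixed_point(3, [generalized_unroll, one_dim_to_two_dim])
# New pattern set: 3 -> 2 -> 1 -> 0, then nothing matches.
new_rank, new_converged = apply_to_fixed_point(3, [generalized_unroll, one_dim_to_reduction])
assert not old_converged
assert (new_rank, new_converged) == (0, True)
```

Under these assumptions, a rank-raising pattern and a rank-lowering pattern keep undoing each other, while a pattern set in which every rewrite lowers the rank always terminates.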

Assisted-by: Claude
2026-03-04 16:13:11 +00:00

// RUN: mlir-opt %s --transform-interpreter -test-transform-dialect-erase-schedule --test-lower-to-llvm --split-input-file | FileCheck %s

// CHECK-LABEL: llvm.func @matmul_tensors
func.func @matmul_tensors(
  %arg0: tensor<2x4xf32>, %arg1: tensor<4x6xf32>, %arg2: tensor<2x6xf32>)
    -> tensor<2x6xf32> {
// CHECK-NOT: linalg
// CHECK: llvm.intr.fmuladd{{.*}}
  %0 = linalg.matmul ins(%arg0, %arg1: tensor<2x4xf32>, tensor<4x6xf32>)
                     outs(%arg2: tensor<2x6xf32>)
    -> tensor<2x6xf32>
  return %0 : tensor<2x6xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%module_op: !transform.any_op {transform.consumed}) {
    %0 = transform.structured.match ops{["linalg.matmul"]} in %module_op : (!transform.any_op) -> !transform.any_op
    %1, %loops:3 = transform.structured.tile_using_for %0 tile_sizes [2, 2, 2] : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
    %2 = transform.get_parent_op %1 {isolated_from_above} : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize_children_and_apply_patterns %2 : (!transform.any_op) -> !transform.any_op
    %b = transform.bufferization.one_shot_bufferize layout{IdentityLayoutMap}
        %module_op {bufferize_function_boundaries = true}
        : (!transform.any_op) -> !transform.any_op

    %f = transform.structured.match ops{["func.func"]} in %b
      : (!transform.any_op) -> !transform.any_op

    // TODO: group these lower-level controls into various properly named vector
    // lowering TD macros.
    transform.apply_patterns to %f {
      transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
      transform.apply_patterns.vector.transfer_permutation_patterns
      transform.apply_patterns.vector.reorder_multi_reduction_dims lowering_strategy = "innerparallel"
      transform.apply_patterns.vector.multi_reduction_flattening lowering_strategy = "innerparallel"
      transform.apply_patterns.vector.multi_reduction_unrolling lowering_strategy = "innerparallel"
      transform.apply_patterns.vector.split_transfer_full_partial split_transfer_strategy = "linalg-copy"
      transform.apply_patterns.vector.transfer_to_scf max_transfer_rank = 1 full_unroll = true
      transform.apply_patterns.vector.lower_transfer max_transfer_rank = 1
      transform.apply_patterns.vector.lower_shape_cast
      transform.apply_patterns.vector.lower_transpose lowering_strategy = "shuffle_1d"
    } : !transform.any_op
    transform.yield
  }
}