22 Commits

Author SHA1 Message Date
Nicolas Vasilache
81c7f2e2f3 Cleanup resource management and rename recursive matchers
This CL follows up on a memory leak issue related to SmallVector growth that
escapes the BumpPtrAllocator.
The fix is to properly use ArrayRef and placement new to define away the
issue.

The following renaming is also applied:
1. MLFunctionMatcher -> NestedPattern
2. MLFunctionMatches -> NestedMatch

As a consequence all allocations are now guaranteed to live on the BumpPtrAllocator.

PiperOrigin-RevId: 231047766
2019-03-29 15:39:53 -07:00
River Riddle
6859f33292 Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.
PiperOrigin-RevId: 230605756
2019-03-29 15:33:20 -07:00
Nicolas Vasilache
9f3f39d61a Cleanup EDSCs
This CL performs a bunch of cleanups related to EDSCs that are generally
useful in the context of using them with a simple wrapping C API (not in this
CL) and with simple language bindings to Python and Swift.

PiperOrigin-RevId: 230066505
2019-03-29 15:27:58 -07:00
Nicolas Vasilache
4573a8da9a Fix improperly indexed DimOp in LowerVectorTransfers.cpp
This CL fixes a misunderstanding in how to build DimOp which triggered
execution issues in the CPU path.

The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`

Before this CL, we wold construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`

and expect the other dimensions to be constants.
This assumption seems consistent at first glance with the syntax of alloc:

```
    %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```

But this was actuallyincorrect.

This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.

PiperOrigin-RevId: 229622766
2019-03-29 15:24:13 -07:00
Nicolas Vasilache
424041ad58 Add EDSC sugar
This allows load, store and ForNest to be used with both Expr and Bindable.
This simplifies writing generic pieces of MLIR snippet.

For instance, a generic pointwise add can now be written:

```cpp
// Different Bindable ivs, one per loop in the loop nest.
auto ivs = makeBindables(shapeA.size());
Bindable zero, one;
// Same bindable, all equal to `zero`.
SmallVector<Bindable, 8> zeros(ivs.size(), zero);
// Same bindable, all equal to `one`.
SmallVector<Bindable, 8> ones(ivs.size(), one);
// clang-format off
Bindable A, B, C;
Stmt scalarA, scalarB, tmp;
Stmt block = edsc::Block({
  ForNest(ivs, zeros, shapeA, ones, {
    scalarA = load(A, ivs),
    scalarB = load(B, ivs),
    tmp = scalarA + scalarB,
    store(tmp, C, ivs)
  }),
});
// clang-format on
```

This CL also adds some extra support for pretty printing that will be used in
a future CL when we introduce standalone testing of EDSCs. At the momen twe
are lacking the basic infrastructure to write such tests.

PiperOrigin-RevId: 229375850
2019-03-29 15:16:53 -07:00
Nicolas Vasilache
d734c50c5f [MLIR] Clip all access dimensions during LowerVectorTransfers
This CL adds a short term remedy to an issue that was found during execution
tests.

Lowering of vector transfer ops uses the permutation map to determine which
ForInst have been super-vectorized. During materialization to HW vector sizes
however, some of those dimensions may be fully unrolled and do not appear in
the permutation map.
Such dimensions were then not clipped and may have accessed out of bounds.

This CL conservatively clips all dimensions to ensure no out of bounds access.
The longer term solution is still up for debate but will probably require
either passing more information between Materialization and lowering, or just
merging the 2 passes.

PiperOrigin-RevId: 228980787
2019-03-29 15:12:26 -07:00
Nicolas Vasilache
73f5c9c380 [MLIR] Sketch a simple set of EDSCs to declaratively write MLIR
This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs)
in MLIR components:
1. a `Type` system of shell classes that closely matches the MLIR type system. These
types are subdivided into `Bindable` leaf expressions and non-bindable `Expr`
expressions;
2. an `MLIREmitter` class whose purpose is to:
  a. maintain a map of `Bindable` leaf expressions to concrete SSAValue*;
  b. provide helper functionality to specify bindings of `Bindable` classes to
     SSAValue* while verifying comformable types;
  c. traverse the `Expr` and emit the MLIR.

This is used on a concrete example to implement MemRef load/store with clipping in the
LowerVectorTransfer pass. More specifically, the following pseudo-C++ code:
```c++
MLFuncBuilder *b = ...;
Location location = ...;
Bindable zero, one, expr, size;
// EDSL expression
auto access = select(expr < zero, zero, select(expr < size, expr, size - one));
auto ssaValue = MLIREmitter(b)
    .bind(zero, ...)
    .bind(one, ...)
    .bind(expr, ...)
    .bind(size, ...)
    .emit(location, access);
```
is used to emit all the MLIR for a clipped MemRef access.

This simple EDSL can easily be extended to more powerful patterns and should
serve as the counterpart to pattern matchers (and could potentially be unified
once we get enough experience).

In the future, most of this code should be TableGen'd but for now it has
concrete valuable uses: make MLIR programmable in a declarative fashion.

This CL also adds Stmt, proper supporting free functions and rewrites
VectorTransferLowering fully using EDSCs.

The code for creating the EDSCs emitting a VectorTransferReadOp as loops
with clipped loads is:

```c++
  Stmt block = Block({
    tmpAlloc = alloc(tmpMemRefType),
    vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
    ForNest(ivs, lbs, ubs, steps, {
      scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs),
      store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs),
    }),
    vectorValue = load(vectorView, zero),
    tmpDealloc = dealloc(tmpAlloc.getLHS())});
  emitter.emitStmt(block);
```

where `accessInfo.clippedScalarAccessExprs)` is created with:

```c++
select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one));
```

The generated MLIR resembles:

```mlir
    %1 = dim %0, 0 : memref<?x?x?x?xf32>
    %2 = dim %0, 1 : memref<?x?x?x?xf32>
    %3 = dim %0, 2 : memref<?x?x?x?xf32>
    %4 = dim %0, 3 : memref<?x?x?x?xf32>
    %5 = alloc() : memref<5x4x3xf32>
    %6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
    for %i4 = 0 to 3 {
      for %i5 = 0 to 4 {
        for %i6 = 0 to 5 {
          %7 = affine_apply #map0(%i0, %i4)
          %8 = cmpi "slt", %7, %c0 : index
          %9 = affine_apply #map0(%i0, %i4)
          %10 = cmpi "slt", %9, %1 : index
          %11 = affine_apply #map0(%i0, %i4)
          %12 = affine_apply #map1(%1, %c1)
          %13 = select %10, %11, %12 : index
          %14 = select %8, %c0, %13 : index
          %15 = affine_apply #map0(%i3, %i6)
          %16 = cmpi "slt", %15, %c0 : index
          %17 = affine_apply #map0(%i3, %i6)
          %18 = cmpi "slt", %17, %4 : index
          %19 = affine_apply #map0(%i3, %i6)
          %20 = affine_apply #map1(%4, %c1)
          %21 = select %18, %19, %20 : index
          %22 = select %16, %c0, %21 : index
          %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32>
          store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32>
        }
      }
    }
    %24 = load %6[%c0] : memref<1xvector<5x4x3xf32>>
    dealloc %5 : memref<5x4x3xf32>
```

In particular notice that only 3 out of the 4-d accesses are clipped: this
corresponds indeed to the number of dimensions in the super-vector.

This CL also addresses the cleanups resulting from the review of the prevous
CL and performs some refactoring to simplify the abstraction.

PiperOrigin-RevId: 227367414
2019-03-29 14:50:23 -07:00
Chris Lattner
dffc589ad2 Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops.  Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.

This is step 25/n towards merging instructions and statements.

PiperOrigin-RevId: 227243966
2019-03-29 14:46:58 -07:00
Chris Lattner
456ad6a8e0 Standardize naming of statements -> instructions, revisting the code base to be
consistent and moving the using declarations over.  Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245
2019-03-29 14:44:30 -07:00
Chris Lattner
315a466aed Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed once can be fixed up on demand.
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.

This is step 19/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227163082
2019-03-29 14:43:58 -07:00
Chris Lattner
69d9e990fa Eliminate the using decls for MLFunction and CFGFunction standardizing on
Function.

This is step 18/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227139399
2019-03-29 14:43:13 -07:00
Chris Lattner
d798f9bad5 Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(),
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.

This is step 17/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227121537
2019-03-29 14:42:40 -07:00
Chris Lattner
5187cfcf03 Merge Operation into OperationInst and standardize nomenclature around
OperationInst.  This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712
2019-03-29 14:42:23 -07:00
Chris Lattner
4c05f8cac6 Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new
FuncBuilder class.  Also rename SSAValue.cpp to Value.cpp

This is step 12/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227067644
2019-03-29 14:40:22 -07:00
Chris Lattner
3f190312f8 Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy.  This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate.  This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624
2019-03-29 14:40:06 -07:00
Chris Lattner
d613f5ab65 Refactor MLFunction to contain a StmtBlock for its body instead of inheriting
from it.  This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future.  This also removes IfClause and ForStmtBody.

This is step 5/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226936541
2019-03-29 14:36:35 -07:00
Chris Lattner
1301f907a1 Refactor ForStmt: having it contain a StmtBlock instead of subclassing
StmtBlock.  This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.

This is step 1/N towards merging BasicBlock and StmtBlock.  This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.

This is a mechanical patch, NFC.

PiperOrigin-RevId: 226684158
2019-03-29 14:35:19 -07:00
Alex Zinenko
4dbd94b543 Refactor LowerVectorTransfersPass using pattern rewriters
This introduces a generic lowering pass for ML functions.  The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.

While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality.  In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder.  When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.

Use LowerVectorTransfers as a testbed for the generic pass.

PiperOrigin-RevId: 225887424
2019-03-29 14:31:43 -07:00
Alex Zinenko
51c8a095a3 Materialize vector_type_cast operation in the SuperVector dialect
This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now.  For end-to-end
testing purposes, it has to be eventually lowered to LLVM IR.  Matching
abstract operation by name goes into the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future.  Register vector_type_cast operation as a part of the SuperVector
dialect.

Arguably, this operation is a special case of the `view` operation from the
Standard dialect.  The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation.  Additionally, using a custom
operation may help to achieve clear dialect separation.

PiperOrigin-RevId: 225887305
2019-03-29 14:31:13 -07:00
Alex Zinenko
bc52a639f9 Extract vector_transfer_* Ops into a SuperVectorDialect.
From the beginning, vector_transfer_read and vector_transfer_write opreations
were intended as a mid-level vectorization abstraction.  In particular, they
are lowered to the StandardOps dialect before further processing.  As such, it
does not make sense to keep them at the same level as StandardOps.  Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.

PiperOrigin-RevId: 225554492
2019-03-29 14:28:58 -07:00
Nicolas Vasilache
c28aeef901 [MLIR] Drop bug-prone global map indexed by MLFunction*
PiperOrigin-RevId: 224610805
2019-03-29 14:23:49 -07:00
Nicolas Vasilache
d9b6420fc9 [MLIR] Add LowerVectorTransfersPass
This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp
to a simple loop nest via local buffer allocations.

This is an MLIR->MLIR lowering based on builders.

A few TODOs are left to address in particular:
1. invert the permutation map so the accesses to the remote memref are coalesced;
2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory);
3. support broadcast / avoid copies when permutation_map is not of full column rank
4. add a proper "element_cast" op

One notable limitation is this does not plan on supporting boundary conditions.
It should be significantly easier to use pre-baked MLIR functions to handle such paddings.
This is left for future consideration.
Therefore the current CL only works properly for full-tile cases atm.

This CL also adds 2 simple tests:

```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N step 4 {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 step 4 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5)
            for %i6 = 0 to 3 {
              %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32>
              store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32>
       dealloc %1 : memref<5x4x3xf32>
```

and
```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>

```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            for %i6 = 0 to 3 {
              %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32>
              store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32>
        %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>>
        dealloc %1 : memref<5x4x3xf32>
```

PiperOrigin-RevId: 224552717
2019-03-29 14:23:05 -07:00