8 Commits

Author SHA1 Message Date
Nathan Malimban
689f9788d3
[ml_program] fix bufferizesToMemoryRead for ml_program.global_store (#177387)
This is a fix for the `BufferizableOpInterface` implementation for
`ml_program.global_store`.

`bufferizesToMemoryRead` currently returns false for
`GlobalStoreOpInterface`, but I believe it should return true, since
`ml_program.global_store` needs to read its input buffer to know what
value to store to the global.

This manifested in a bug where `one-shot-bufferize` would produce MLIR
that copies uninitialized data to the global var instead of the intended
value to be stored.

For the following MLIR:

```
module {
  ml_program.global private mutable @"state_tensor"(dense<0.0> : tensor<4x75xf32>) : tensor<4x75xf32>
  func.func @main() -> tensor<4x75xf32> {
    %c0 = arith.constant 0 : index
    %cst_val = arith.constant 1.0 : f32
    %initial_state = ml_program.global_load @"state_tensor" : tensor<4x75xf32>
    %val = tensor.extract %initial_state[%c0, %c0] : tensor<4x75xf32>
    %next_val = arith.addf %val, %cst_val : f32
    %updated_tensor = tensor.insert %next_val into %initial_state[%c0, %c0] : tensor<4x75xf32>
    ml_program.global_store @"state_tensor" = %updated_tensor : tensor<4x75xf32>
    return %updated_tensor : tensor<4x75xf32>
  }
}
```
`one-shot-bufferize` produces this incorrect MLIR:
```
module {
  memref.global "private" @state_tensor : memref<4x75xf32> = dense<0.000000e+00>
  func.func @main() -> tensor<4x75xf32> {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 1.000000e+00 : f32
    %0 = memref.get_global @state_tensor : memref<4x75xf32>
    %1 = memref.load %0[%c0, %c0] : memref<4x75xf32>
    %2 = arith.addf %1, %cst : f32
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    memref.copy %0, %alloc : memref<4x75xf32> to memref<4x75xf32>
    memref.store %2, %alloc[%c0, %c0] : memref<4x75xf32>
    %3 = bufferization.to_tensor %alloc : memref<4x75xf32> to tensor<4x75xf32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    %4 = memref.get_global @state_tensor : memref<4x75xf32>
    memref.copy %alloc_0, %4 : memref<4x75xf32> to memref<4x75xf32>
    return %3 : tensor<4x75xf32>
  }
}
```
Note that the `memref.copy` at the end copies an uninitialized `%alloc_0` to
the global variable.

But after the change we see the following MLIR:
```
module {
  memref.global "private" @state_tensor : memref<4x75xf32> = dense<0.000000e+00>
  func.func @main() -> tensor<4x75xf32> {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 1.000000e+00 : f32
    %0 = memref.get_global @state_tensor : memref<4x75xf32>
    %1 = memref.load %0[%c0, %c0] : memref<4x75xf32>
    %2 = arith.addf %1, %cst : f32
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    memref.copy %0, %alloc : memref<4x75xf32> to memref<4x75xf32>
    memref.store %2, %alloc[%c0, %c0] : memref<4x75xf32>
    %3 = bufferization.to_tensor %alloc : memref<4x75xf32> to tensor<4x75xf32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    memref.copy %alloc, %alloc_0 : memref<4x75xf32> to memref<4x75xf32>
    %4 = memref.get_global @state_tensor : memref<4x75xf32>
    memref.copy %alloc_0, %4 : memref<4x75xf32> to memref<4x75xf32>
    return %3 : tensor<4x75xf32>
  }
}
```
The relevant data is now copied to `%alloc_0` before it is
stored to the global.
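
The difference between the two outputs can be illustrated with a small toy model. This is only a sketch, assuming that a freshly allocated buffer is initialized with a copy only when the consuming operand is reported as a memory read. It is not MLIR's actual implementation: `Operand`, `reads`, and `materialize_out_of_place` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    """Hypothetical stand-in for a tensor operand of a bufferizable op."""
    name: str
    reads: bool  # plays the role of the bufferizesToMemoryRead answer


def materialize_out_of_place(source: str, new_buffer: str, operand: Operand) -> list[str]:
    """Return the pseudo-ops emitted when `operand` is given its own buffer."""
    ops = [f"{new_buffer} = alloc()"]
    # Assumption of this toy model: the old contents are copied into the
    # new buffer only if the operand is reported as being read.
    if operand.reads:
        ops.append(f"copy {source} -> {new_buffer}")
    return ops


# Before the fix: global_store reports that it does not read its input.
before = materialize_out_of_place(
    "%alloc", "%alloc_0", Operand("global_store.value", reads=False))

# After the fix: global_store reports that it reads its input.
after = materialize_out_of_place(
    "%alloc", "%alloc_0", Operand("global_store.value", reads=True))
```

In this model, `before` contains only the allocation, which corresponds to the uninitialized `%alloc_0` in the incorrect output, while `after` also contains the copy from `%alloc`.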

Co-authored-by: Nathan Malimban <nmalimba@ah-nmalimba-l.dhcp.mathworks.com>
2026-01-23 13:20:30 +01:00
Stella Laurenzo
dbbdee2ea2
[mlir] Make the ml_program dialect allow all of its operations to be inlined. (#85479) 2024-03-15 22:22:09 -07:00
Ryan Holt
fa10121415
[mlir][MLProgram] Add MLProgram to MemRef bufferization pass (#75103)
There is currently no lowering out of `ml_program` in the LLVM
repository. This change adds a lowering to `memref` so that it can be
lowered all the way to LLVM. This lowering was taken from the [reference
backend in
torch-mlir](f416953600
).

I had tried implementing the `BufferizableOpInterface` for `ml_program`
instead of adding a new pass but that did not work because
`OneShotBufferize` does not visit module-level ops like
`ml_program.global`.
2024-01-30 16:34:33 +01:00
Rob Suderman
cbd475040f [mlir][mlprogram] Add mlprogram-pipeline-globals optimization pass
The added pass optimizes MLProgram global operations by reducing them to
the minimal set of load/store operations for global tensors. This avoids
unnecessary global operations throughout a program and potentially
improves operation fusion.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D159228
2023-09-18 17:11:29 -07:00
Jacques Pienaar
d30c0221cf [mlir] Split MLProgram global load and store to Graph variants
* Split ops into X_graph variants as discussed;
* Remove tokens from non-Graph region variants and rely on side-effect
  modelling there while removing side-effect modelling from Graph
  variants and relying on explicit ordering there;
* Make tokens required to be produced by Graph variants - but kept
  explicit token type specification given previous discussion on this
  potentially being configurable in future;

This results in duplicating some code. I considered adding helper
functions but decided against adding an abstraction there early given
size of duplication and creating accidental coupling.

Differential Revision: https://reviews.llvm.org/D127813
2022-06-16 20:01:54 -07:00
Stella Laurenzo
3bb7999339 [mlir] Add global_load and global_store ops to ml_program.
* Adds simple, non-atomic, non-volatile, non-synchronized direct load/store ops.

Differential Revision: https://reviews.llvm.org/D126230
2022-06-01 11:32:15 -07:00
Stella Laurenzo
2bb252852c [mlir] Add GlobalOp, GlobalLoadConstOp to ml_program.
The approach I took was to define a dialect 'extern' attribute that a GlobalOp can take as a value to signify external linkage. I think this approach should compose well and should also work with wherever the OpaqueElements work goes in the future (since that is just another kind of attribute). I special cased the GlobalOp parser/printer for this case because it is significantly easier on the eyes.

In the discussion, Jeff Niu had proposed an alternative syntax for GlobalOp that I ended up not taking. I did try to implement it but a) I don't think it made anything easier to read in the common case, and b) it made the parsing/printing logic a lot more complicated (I think I would need a completely custom parser/printer to do it well). Please have a look at the common cases where the global type and initial value type match: I don't think how I have it is too bad. The less common cases seem ok to me.

I chose to implement only the direct, constant load op since that is non-side-effecting and there was still discussion pending on that.

Differential Revision: https://reviews.llvm.org/D124318
2022-05-18 23:08:28 -07:00
Stella Laurenzo
61352a580a [mlir] Introduce ml_program dialect.
Differential Revision: https://reviews.llvm.org/D120203
2022-04-13 21:38:14 -07:00