llvm-project

Author	SHA1	Message	Date
Nathan Malimban	689f9788d3	[ml_program] fix bufferizesToMemoryRead for ml_program.global_store (#177387)

This is a fix for the `BufferizableOpInterface` implementation for `ml_program.global_store`. `bufferizesToMemoryRead` currently returns false for `GlobalStoreOpInterface`, but I believe it should return true as `ml_program.global_store` needs to read its input buffer to know what value to store to global. This manifested in a bug where `one-shot-bufferize` would produce MLIR that copies uninitialized data to the global var instead of the intended value to be stored.

For the following MLIR:

```
module {
  ml_program.global private mutable @"state_tensor"(dense<0.0> : tensor<4x75xf32>) : tensor<4x75xf32>
  func.func @main() -> tensor<4x75xf32> {
    %c0 = arith.constant 0 : index
    %cst_val = arith.constant 1.0 : f32
    %initial_state = ml_program.global_load @"state_tensor" : tensor<4x75xf32>
    %val = tensor.extract %initial_state[%c0, %c0] : tensor<4x75xf32>
    %next_val = arith.addf %val, %cst_val : f32
    %updated_tensor = tensor.insert %next_val into %initial_state[%c0, %c0] : tensor<4x75xf32>
    ml_program.global_store @"state_tensor" = %updated_tensor : tensor<4x75xf32>
    return %updated_tensor : tensor<4x75xf32>
  }
}
```

`one-shot-bufferize` produces this incorrect MLIR:

```
module {
  memref.global "private" @state_tensor : memref<4x75xf32> = dense<0.000000e+00>
  func.func @main() -> tensor<4x75xf32> {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 1.000000e+00 : f32
    %0 = memref.get_global @state_tensor : memref<4x75xf32>
    %1 = memref.load %0[%c0, %c0] : memref<4x75xf32>
    %2 = arith.addf %1, %cst : f32
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    memref.copy %0, %alloc : memref<4x75xf32> to memref<4x75xf32>
    memref.store %2, %alloc[%c0, %c0] : memref<4x75xf32>
    %3 = bufferization.to_tensor %alloc : memref<4x75xf32> to tensor<4x75xf32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    %4 = memref.get_global @state_tensor : memref<4x75xf32>
    memref.copy %alloc_0, %4 : memref<4x75xf32> to memref<4x75xf32>
    return %3 : tensor<4x75xf32>
  }
}
```

Note that `memref.copy` at the end copies an uninitialized `alloc_0` to the global variable. But after the change we see the following MLIR:

```
module {
  memref.global "private" @state_tensor : memref<4x75xf32> = dense<0.000000e+00>
  func.func @main() -> tensor<4x75xf32> {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 1.000000e+00 : f32
    %0 = memref.get_global @state_tensor : memref<4x75xf32>
    %1 = memref.load %0[%c0, %c0] : memref<4x75xf32>
    %2 = arith.addf %1, %cst : f32
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    memref.copy %0, %alloc : memref<4x75xf32> to memref<4x75xf32>
    memref.store %2, %alloc[%c0, %c0] : memref<4x75xf32>
    %3 = bufferization.to_tensor %alloc : memref<4x75xf32> to tensor<4x75xf32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<4x75xf32>
    memref.copy %alloc, %alloc_0 : memref<4x75xf32> to memref<4x75xf32>
    %4 = memref.get_global @state_tensor : memref<4x75xf32>
    memref.copy %alloc_0, %4 : memref<4x75xf32> to memref<4x75xf32>
    return %3 : tensor<4x75xf32>
  }
}
```

We now see that the relevant data is copied to `alloc_0` before it is stored in global.

Co-authored-by: Nathan Malimban <nmalimba@ah-nmalimba-l.dhcp.mathworks.com>	2026-01-23 13:20:30 +01:00
Stella Laurenzo	dbbdee2ea2	[mlir] Make the ml_program dialect allow all of its operations to be inlined. (#85479)	2024-03-15 22:22:09 -07:00
Ryan Holt	fa10121415	[mlir][MLProgram] Add MLProgram to MemRef bufferization pass (#75103 ) There is currently no lowering out of `ml_program` in the LLVM repository. This change adds a lowering to `memref` so that it can be lowered all the way to LLVM. This lowering was taken from the [reference backend in torch-mlir](`f416953600` ). I had tried implementing the `BufferizableOpInterface` for `ml_program` instead of adding a new pass but that did not work because `OneShotBufferize` does not visit module-level ops like `ml_program.global`.	2024-01-30 16:34:33 +01:00
Rob Suderman	cbd475040f	[mlir][mlprogram] Add `mlprogram-pipeline-globals` optimization pass The added pass optimizes MLProgram global operations by reducing them to the minimal set of load/store operations for global tensors. This avoids unnecessary global operations throughout a program and potentially improves operation fusion. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D159228	2023-09-18 17:11:29 -07:00
Jacques Pienaar	d30c0221cf	[mlir] Split MLProgram global load and store to Graph variants * Split ops into X_graph variants as discussed; * Remove tokens from non-Graph region variants and rely on side-effect modelling there while removing side-effect modelling from Graph variants and relying on explicit ordering there; * Make tokens required to be produced by Graph variants - but kept explicit token type specification given previous discussion on this potentially being configurable in future; This results in duplicating some code. I considered adding helper functions but decided against adding an abstraction there early given size of duplication and creating accidental coupling. Differential Revision: https://reviews.llvm.org/D127813	2022-06-16 20:01:54 -07:00
Stella Laurenzo	3bb7999339	[mlir] Add global_load and global_store ops to ml_program. * Adds simple, non-atomic, non-volatile, non-synchronized direct load/store ops. Differential Revision: https://reviews.llvm.org/D126230	2022-06-01 11:32:15 -07:00
Stella Laurenzo	2bb252852c	[mlir] Add GlobalOp, GlobalLoadConstOp to ml_program. The approach I took was to define a dialect 'extern' attribute that a GlobalOp can take as a value to signify external linkage. I think this approach should compose well and should also work with wherever the OpaqueElements work goes in the future (since that is just another kind of attribute). I special cased the GlobalOp parser/printer for this case because it is significantly easier on the eyes. In the discussion, Jeff Niu had proposed an alternative syntax for GlobalOp that I ended up not taking. I did try to implement it but a) I don't think it made anything easier to read in the common case, and b) it made the parsing/printing logic a lot more complicated (I think I would need a completely custom parser/printer to do it well). Please have a look at the common cases where the global type and initial value type match: I don't think how I have it is too bad. The less common cases seem ok to me. I chose to only implement the direct, constant load op since that is non side effecting and there was still discussion pending on that. Differential Revision: https://reviews.llvm.org/D124318	2022-05-18 23:08:28 -07:00
Stella Laurenzo	61352a580a	[mlir] Introduce ml_program dialect. Differential Revision: https://reviews.llvm.org/D120203	2022-04-13 21:38:14 -07:00

8 Commits
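
The failure described in commit 689f9788d3 can be illustrated with a toy model. This sketch is not MLIR's API: `alloc`, `bufferize_store`, and the `reads_operand` flag are hypothetical names standing in for the real interface. It assumes that when an operand is bufferized out-of-place, its contents are copied into the new buffer only if the op is reported to read that operand, which is consistent with the symptom in the commit message.

```python
UNINIT = None  # stand-in for uninitialized memory in this toy model


def alloc(size):
    """Return a fresh buffer whose contents are uninitialized."""
    return [UNINIT] * size


def bufferize_store(global_buf, src, reads_operand):
    """Toy out-of-place bufferization of a store of `src` into `global_buf`.

    A new buffer is allocated for the operand. Its contents are copied from
    `src` only when the op is reported to read the operand (hypothetical
    analogue of bufferizesToMemoryRead returning true).
    """
    tmp = alloc(len(src))
    if reads_operand:
        tmp[:] = src          # analogue of: memref.copy %alloc, %alloc_0
    global_buf[:] = tmp       # analogue of: memref.copy %alloc_0, %4
    return global_buf


updated = [1.0, 0.0, 0.0, 0.0]

# Before the fix: the store claims not to read its operand, so the copy into
# the temporary is skipped and uninitialized data reaches the global.
broken = bufferize_store([0.0] * 4, updated, reads_operand=False)

# After the fix: the operand is copied first, so the global gets the value.
fixed = bufferize_store([0.0] * 4, updated, reads_operand=True)

print(broken)
print(fixed)
```

The only difference between the two calls is the flag, which mirrors how the fix changes a single interface answer rather than the store lowering itself.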