This allows for users to provide operand_range and result_range in builder.create<> calls, instead of requiring an explicit copy into a separate data structure like SmallVector/std::vector.
PiperOrigin-RevId: 284360710
In the replaceAllUsesExcept utility function called from loop coalescing the
iteration over the use-chain is incorrect. The use list nodes (IROperands) have
next/prev links, and bluntly resetting the use would make the loop to continue
on uses of the value that was replaced instead of the original one. As a
result, it could miss the existing uses and update the wrong ones. Make sure we
increment the iterator before updating the use in the loop body.
Reported-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#291.
PiperOrigin-RevId: 283754195
The current lowering of loops to GPU only supports lowering of loop
nests where the loops mapped to workgroups and workitems are perfectly
nested. Here a new lowering is added to handle lowering of imperfectly
nested loop body with the following properties
1) The loops partitioned to workgroups are perfectly nested.
2) The loop body of the inner most loop partitioned to workgroups can
contain one or more loop nests that are to be partitioned across
workitems. Each individual loops nests partitioned to workitems should
also be perfectly nested.
3) The number of workgroups and workitems are not deduced from the
loop bounds but are passed in by the caller of the lowering as values.
4) For statements within the perfectly nested loop nest partitioned
across workgroups that are not loops, it is valid to have all threads
execute that statement. This is NOT verified.
PiperOrigin-RevId: 277958868
- also remove stale terminology/references in docs
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#148
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/148 from bondhugula:cleanup e846b641a3c2936e874138aff480a23cdbf66591
PiperOrigin-RevId: 271618279
- allow symbols in index remapping provided for memref replacement
- fix memref normalize crash on cases with layout maps with symbols
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Reported by: Alex Zinenko
Closestensorflow/mlir#139
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/139 from bondhugula:memref-rep-symbols 2f48c1fdb5d4c58915bbddbd9f07b18541819233
PiperOrigin-RevId: 269851182
- turn copy/dma generation method into a utility in LoopUtils, allowing
it to be reused elsewhere.
- no functional/logic change to the pass/utility
- trim down header includes in files affected
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#124
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/124 from bondhugula:datacopy 9f346e62e5bd9dd1986720a30a35f302eb4d3252
PiperOrigin-RevId: 269106088
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring is that the explicit template parameter for the operation type is no longer needed for the explicit op walks. For example:
op->walk<AffineForOp>([](AffineForOp op) { ... });
is now accomplished via:
op->walk([](AffineForOp op) { ... });
PiperOrigin-RevId: 266209552
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so as to become mappable on a regular grid of processors whose identifiers are given by SSA values.
A corresponding unit test is added.
For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop:
```
loop.for %i = %lb to %ub step %step {
...
}
```
is rewritten into a version resembling the following pseudo-IR:
```
loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub
step %gridDim.x * blockDim.x {
...
}
```
PiperOrigin-RevId: 258945942
This CL adapts the recently introduced parametric tiling to have an API matching the tiling
of AffineForOp. The transformation using stripmineSink is more general and produces imperfectly nested loops.
Perfect nesting invariants of the tiled version are obtained by selectively applying hoisting of ops to isolate perfectly nested bands. Such hoisting may fail to produce a perfect loop nest in cases where ForOp transitively depend on enclosing induction variables. In such cases, the API provides a LogicalResult return but the SimpleParametricLoopTilingPass does not currently use this result.
A new unit test is added with a triangular loop for which the perfect nesting property does not hold. For this example, the old behavior was to produce IR that did not verify (some use was not dominated by its def).
PiperOrigin-RevId: 258928309
Multiple (perfectly) nested loops with independent bounds can be combined into
a single loop and than subdivided into blocks of arbitrary size for load
balancing or more efficient parallelism exploitation. However, MLIR wants to
preserve the multi-dimensional multi-loop structure at higher levels of
abstraction. Introduce a transformation that coalesces nested loops with
independent bounds so that they can be further subdivided by tiling.
PiperOrigin-RevId: 258151016
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.
PiperOrigin-RevId: 258123853
This CL splits the lowering of affine to LLVM into 2 parts:
1. affine -> std
2. std -> LLVM
The conversions mostly consists of splitting concerns between the affine and non-affine worlds from existing conversions.
Short-circuiting of affine `if` conditions was never tested or exercised and is removed in the process, it can be reintroduced later if needed.
LoopParametricTiling.cpp is updated to reflect the newly added ForOp::build.
PiperOrigin-RevId: 257794436
This allows for the attribute to hold symbolic references to other operations than FuncOp. This also allows for removing the dependence on FuncOp from the base Builder.
PiperOrigin-RevId: 257650017
Parametric tiling can be used to extract outer loops with fixed number of
iterations. This in turn enables mapping to GPU kernels on a fixed grid
independently of the range of the original loops, which may be unknown
statically, making the kernel adaptable to different sizes. Provide a utility
function that also computes the parametric tile size given the range of the
loop. Exercise the utility function through a simple pass that applies it to
all top-level loop nests. Permutability or parallelism checks must be
performed before calling this utility function in actual passes.
Note that parametric tiling cannot be implemented in a purely affine way,
although it can be encoded using semi-affine maps. The choice to implement it
on standard loops is guided by them being the common representation between
Affine loops, Linalg and GPU kernels.
PiperOrigin-RevId: 257180251
These methods assume that a function is a valid builtin top-level operation, and removing these methods allows for decoupling FuncOp and IR/. Utility "getParentOfType" methods have been added to Operation/OpState to allow for querying the first parent operation of a given type.
PiperOrigin-RevId: 257018913
Move the data members out of Function and into a new impl storage class 'FunctionStorage'. This allows for Function to become value typed, which will greatly simplify the transition of Function to FuncOp(given that FuncOp is also value typed).
PiperOrigin-RevId: 255983022
Due to legacy reasons (ML/CFG function separation), regions in affine control
flow operations require contained blocks not to have terminators. This is
inconsistent with the notion of the block and may complicate code motion
between regions of affine control operations and other regions.
Introduce `affine.terminator`, a special terminator operation that must be used
to terminate blocks inside affine operations and transfers the control back to
he region enclosing the affine operation. For brevity and readability reasons,
allow `affine.for` and `affine.if` to omit the `affine.terminator` in their
regions when using custom printing and parsing format. The custom parser
injects the `affine.terminator` if it is missing so as to always have it
present in constructed operations.
Update transformations to account for the presence of terminator. In
particular, most code motion transformation between loops should leave the
terminator in place, and code motion between loops and non-affine blocks should
drop the terminator.
PiperOrigin-RevId: 240536998
inherited constructors, which is cleaner and means you can now use DimOp()
to get a null op, instead of having to use Instruction::getNull<DimOp>().
This removes another 200 lines of code.
PiperOrigin-RevId: 240068113
This eliminate ConstOpPointer (but keeps OpPointer for now) by making OpPointer
implicitly launder const in a const incorrect way. It will eventually go away
entirely, this is a progressive step towards the new const model.
PiperOrigin-RevId: 239512640