This change removes dependencies declared as either 'LINK_LIBS' or
'LINK_COMPONENTS' across several MLIR libraries. The removed
dependencies appear
to be incorrect and may have been required in older versions of the
project.
These dependencies cause many high level dialects to have transitive
dependence on the LLVM dialect and the LLVM 'Core' library
('llvm/lib/IR').
Note that if using the 'Ninja' CMake generator, one can inspect the
dependencies
(including all transitive libraries) of any given MLIR target but using
the command `ninja -C <build dir> -t browse` and navigating to the
library
of interest in a web browser.
We recently discovered that the loop with a dynamic upper bound is
unexpectedly unrolled during the NVVM to PTX process. By attaching the
`llvm.loop_annotation`, we can control the unrolling behavior precisely.
This PR enables the `cf.cond_br` to retain the loop annotation of
`scf.for` after the `convert-scf-to-cf` pass. This change allows users
to have precise control over the loop behavior during backend
transformation.
---------
Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
There is currently no path to lower scf.forall to scf.parallel with the
goal of targeting the OpenMP dialect.
In the SCF->ControlFlow conversion, scf.forall is briefly converted to
scf.parallel, but the scf.parallel is lowered directly to a sequential
loop. This makes experimenting with scf.forall for CPU execution
difficult.
This change factors out the rewrite in the SCF->ControlFlow pass into a
utility function that can then be used in the SCF->ControlFlow lowering
and via a separate -scf-forall-to-parallel pass.
---------
Co-authored-by: Spenser Bauman <sabauma@fastmail>
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.
Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
-> f32, f32 {
%elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
%elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.addf %lhs, %rhs : f32
scf.reduce.return %res : f32
}, {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.mulf %lhs, %rhs : f32
scf.reduce.return %res : f32
}
}
```
`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
scf.forall ops without shared outputs (i.e., fully bufferized ops) are
lowered to scf.parallel. scf.forall ops are typically lowered by an
earlier pass depending on the execution target. E.g., there are
optimized lowerings for GPU execution. This new lowering is for
completeness (convert-scf-to-cf can now lower all SCF loop constructs)
and provides a simple CPU lowering strategy for testing purposes.
scf.parallel is currently lowered to scf.for, which executes
sequentially. The scf.parallel lowering could be improved in the future
to run on multiple threads.
Add two new helper functions `getBeforeBody` and `getAfterBody` to be consistent with "scf.for" (`getBody`) and to show in the API that both regions have exactly one block. Also simplify some code that assumed that there can be more than one block in a region.
Differential Revision: https://reviews.llvm.org/D157860
* `RewriterBase::mergeBlocks` is simplified: it is implemented in terms of `mergeBlockBefore`.
* The signature of `mergeBlockBefore` is consistent with other API (such as `inlineRegionBefore`): an overload for a `Block::iterator` is added.
* Additional safety checks are added to `mergeBlockBefore`: detect cases where the resulting IR could be invalid (no more `dropAllUses`) or partly unreachable (likely a case of incorrect API usage).
* Rename `mergeBlockBefore` to `inlineBlockBefore`.
Differential Revision: https://reviews.llvm.org/D144969
The patch adds operations to `BlockAndValueMapping` and renames it to `IRMapping`. When operations are cloned, old operations are mapped to the cloned operations. This allows mapping from an operation to a cloned operation. Example:
```
Operation *opWithRegion = ...
Operation *opInsideRegion = &opWithRegion->front().front();
IRMapping map
Operation *newOpWithRegion = opWithRegion->clone(map);
Operation *newOpInsideRegion = map.lookupOrNull(opInsideRegion);
```
Migration instructions:
All includes to `mlir/IR/BlockAndValueMapping.h` should be replaced with `mlir/IR/IRMapping.h`. All uses of `BlockAndValueMapping` need to be renamed to `IRMapping`.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D139665
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
This aligns the SCF dialect file layout with the majority of the dialects.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D128049