`IndexSwitchLowering` stored case values as `SmallVector<int32_t>`,
which silently truncated any `int64_t` case value larger than INT32_MAX
(e.g. `4294967296` became `0`). The `cf.switch` flag was also created
via `arith.index_cast index -> i32`, losing the upper 32 bits on 64-bit
platforms.
Fix: store case values as `SmallVector<APInt>` with 64-bit width, cast
the index argument to `i64`, and use the `ArrayRef<APInt>` overload of
`cf::SwitchOp::create` so the resulting switch correctly uses `i64` case
values and flag type.
Fixes#111589
Assisted-by: Claude Code
Add a pass option to `convert-scf-to-cf` to deactivate pattern rollback
for better performance. The lowering patterns from SCF->CF to benefit a
lot from this feature because `splitBlock` is expensive in the rollback
driver.
Do not access the erased `scf.while` operation in the lowering pattern.
That won't work anymore in a One-Shot Dialect Conversion and triggers a
use-after-free sanitizer error.
After the One-Shot Dialect Conversion refactoring, a
`ConversionPatternRewriter` will behave more like a normal
`PatternRewriter`.
Do not access the erased `scf.condition` operation in the lowering
pattern. That won't work anymore in a One-Shot Dialect Conversion and
triggers a use-after-free sanitizer error.
After the One-Shot Dialect Conversion refactoring, a
`ConversionPatternRewriter` will behave more like a normal
`PatternRewriter`.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
`let constructor` is deprecated since the table gen backend emits most
of the glue logic to build a pass. This PR retires the td method for
most (I need another pass) passes in the Conversion directory.
This change removes dependencies declared as either 'LINK_LIBS' or
'LINK_COMPONENTS' across several MLIR libraries. The removed
dependencies appear
to be incorrect and may have been required in older versions of the
project.
These dependencies cause many high level dialects to have transitive
dependence on the LLVM dialect and the LLVM 'Core' library
('llvm/lib/IR').
Note that if using the 'Ninja' CMake generator, one can inspect the
dependencies
(including all transitive libraries) of any given MLIR target but using
the command `ninja -C <build dir> -t browse` and navigating to the
library
of interest in a web browser.
We recently discovered that the loop with a dynamic upper bound is
unexpectedly unrolled during the NVVM to PTX process. By attaching the
`llvm.loop_annotation`, we can control the unrolling behavior precisely.
This PR enables the `cf.cond_br` to retain the loop annotation of
`scf.for` after the `convert-scf-to-cf` pass. This change allows users
to have precise control over the loop behavior during backend
transformation.
---------
Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
There is currently no path to lower scf.forall to scf.parallel with the
goal of targeting the OpenMP dialect.
In the SCF->ControlFlow conversion, scf.forall is briefly converted to
scf.parallel, but the scf.parallel is lowered directly to a sequential
loop. This makes experimenting with scf.forall for CPU execution
difficult.
This change factors out the rewrite in the SCF->ControlFlow pass into a
utility function that can then be used in the SCF->ControlFlow lowering
and via a separate -scf-forall-to-parallel pass.
---------
Co-authored-by: Spenser Bauman <sabauma@fastmail>
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.
Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
-> f32, f32 {
%elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
%elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.addf %lhs, %rhs : f32
scf.reduce.return %res : f32
}, {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.mulf %lhs, %rhs : f32
scf.reduce.return %res : f32
}
}
```
`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
scf.forall ops without shared outputs (i.e., fully bufferized ops) are
lowered to scf.parallel. scf.forall ops are typically lowered by an
earlier pass depending on the execution target. E.g., there are
optimized lowerings for GPU execution. This new lowering is for
completeness (convert-scf-to-cf can now lower all SCF loop constructs)
and provides a simple CPU lowering strategy for testing purposes.
scf.parallel is currently lowered to scf.for, which executes
sequentially. The scf.parallel lowering could be improved in the future
to run on multiple threads.
Add two new helper functions `getBeforeBody` and `getAfterBody` to be consistent with "scf.for" (`getBody`) and to show in the API that both regions have exactly one block. Also simplify some code that assumed that there can be more than one block in a region.
Differential Revision: https://reviews.llvm.org/D157860
* `RewriterBase::mergeBlocks` is simplified: it is implemented in terms of `mergeBlockBefore`.
* The signature of `mergeBlockBefore` is consistent with other API (such as `inlineRegionBefore`): an overload for a `Block::iterator` is added.
* Additional safety checks are added to `mergeBlockBefore`: detect cases where the resulting IR could be invalid (no more `dropAllUses`) or partly unreachable (likely a case of incorrect API usage).
* Rename `mergeBlockBefore` to `inlineBlockBefore`.
Differential Revision: https://reviews.llvm.org/D144969
The patch adds operations to `BlockAndValueMapping` and renames it to `IRMapping`. When operations are cloned, old operations are mapped to the cloned operations. This allows mapping from an operation to a cloned operation. Example:
```
Operation *opWithRegion = ...
Operation *opInsideRegion = &opWithRegion->front().front();
IRMapping map
Operation *newOpWithRegion = opWithRegion->clone(map);
Operation *newOpInsideRegion = map.lookupOrNull(opInsideRegion);
```
Migration instructions:
All includes to `mlir/IR/BlockAndValueMapping.h` should be replaced with `mlir/IR/IRMapping.h`. All uses of `BlockAndValueMapping` need to be renamed to `IRMapping`.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D139665
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
This aligns the SCF dialect file layout with the majority of the dialects.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D128049