We currently only support one level of aliases, which isn't great
in situations where an attribute/type can have multiple duplicated
components nested within it(e.g. debuginfo metadata). This commit
refactors alias generation to support nested aliases, which requires
changing alias grouping to take into account the depth of child
aliases, to ensure that attributes/types aren't printed before the
aliases they use.
The only real user facing change here was that we no longer print
0 as an alias suffix, which would be unnecessarily expensive to keep
in the new alias generation method (and isn't that valuable of a
behavior to preserve).
Differential Revision: https://reviews.llvm.org/D136541
This change allows analyzing ops from different block, in particular when used in programs that have `cf` branches.
Differential Revision: https://reviews.llvm.org/D135644
These operations have undefined behavior if the index is not less than the rank of the source tensor / memref, so they cannot be freely speculated like they were before this patch. After this patch we speculate them only if we can prove that they don't have UB.
Depends on D135505.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D135748
This patch takes the first step towards a more principled modeling of undefined behavior in MLIR as discussed in the following discourse threads:
1. https://discourse.llvm.org/t/semantics-modeling-undefined-behavior-and-side-effects/4812
2. https://discourse.llvm.org/t/rfc-mark-tensor-dim-and-memref-dim-as-side-effecting/65729
This patch in particular does the following:
1. Introduces a ConditionallySpeculatable OpInterface that dynamically determines whether an Operation can be speculated.
2. Re-defines `NoSideEffect` to allow undefined behavior, making it necessary but not sufficient for speculation. Also renames it to `NoMemoryEffect`.
3. Makes LICM respect the above semantics.
4. Changes all ops tagged with `NoSideEffect` today to additionally implement ConditionallySpeculatable and mark themselves as always speculatable. This combined trait is named `Pure`. This makes this change NFC.
For out of tree dialects:
1. Replace `NoSideEffect` with `Pure` if the operation does not have any memory effects, undefined behavior or infinite loops.
2. Replace `NoSideEffect` with `NoSideEffect` otherwise.
The next steps in this process are (I'm proposing to do these in upcoming patches):
1. Update operations like `tensor.dim`, `memref.dim`, `scf.for`, `affine.for` to implement a correct hook for `ConditionallySpeculatable`. I'm also happy to update ops in other dialects if the respective dialect owners would like to and can give me some pointers.
2. Update other passes that speculate operations to consult `ConditionallySpeculatable` in addition to `NoMemoryEffect`. I could not find any other than LICM on a quick skim, but I could have missed some.
3. Add some documentation / FAQs detailing the differences between side effects, undefined behavior, speculatabilty.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D135505
Fix crash in normalizeMemRefType. Correctly handle scenario and replace
assertion with a failure.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D135424
Many tests still depend on specific names of SSA values (!!).
This commit is a best effort cleanup that will set the stage for adding some pretty SSA result names.
All relevant operations have been switched to primarily use the strided
layout, but still support the affine map layout. Update the relevant
tests to use the strided format instead for compatibility with how ops
now print by default.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134045
Memref subview operation has been initially designed to work on memrefs with
strided layouts only and has never supported anything else. Port it to use the
recently added StridedLayoutAttr instead of extracting the strided from
implicitly from affine maps.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D133938
Introduce a new attribute to represent the strided memref layout. Strided
layouts are omnipresent in code generation flows and are the only kind of
layouts produced and supported by a half of operation in the memref dialect
(view-related, shape-related). However, they are internally represented as
affine maps that require a somewhat fragile extraction of the strides from the
linear form that also comes with an overhead. Furthermore, textual
representation of strided layouts as affine maps is difficult to read: compare
`affine_map<(d0, d1, d2)[s0, s1] -> (d0*32 + d1*s0 + s1 + d2)>` with
`strides: [32, ?, 1], offset: ?`. While a rudimentary support for parsing a
syntactically sugared version of the strided layout has existed in the codebase
for a long time, it does not go as far as this commit to make the strided
layout a first-class attribute in the IR.
This introduces the attribute and updates the tests that using the pre-existing
sugared form to use the new attribute instead. Most memref created
programmatically, e.g., in passes, still use the affine form with further
extraction of strides and will be updated separately.
Update and clean-up the memref type documentation that has gotten stale and has
been referring to the details of affine map composition that are long gone.
See https://discourse.llvm.org/t/rfc-materialize-strided-memref-layout-as-an-attribute/64211.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D132864
The current approach is convervative in which whenever there is a
non-normalizable operations in a function will the function be labelled
as non-normalizable. It means it requires that all operations must have
MemRefsNormalizable trait.
This patch relaxes the requirement that if the memref map layouts of a
non-normalizable operation are identity, this operation does not block
the normalization of the other operations in the same function.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D125854
This change add a helper function for computing a topological sorting of a list of ops. E.g. this can be useful in transforms where a subset of ops should be cloned without dominance errors.
The analysis reuses the existing implementation in TopologicalSortUtils.cpp.
Differential Revision: https://reviews.llvm.org/D131669
This reland includes changes to the Python bindings.
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.
Depends on D131801
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D131803
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.
Depends on D131738
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D131702
When dead-code analysis is run at the scope of a function, call ops to
other functions at the same level were being marked as unreachable,
since the analysis optimistically assumes the call op to have no known
predecessors and that all predecessors are known, but the callee would
never get visited.
This patch fixes the bug by checking if a referenced function is above
the top-level op of the analysis, and is thus considered an external
callable.
Fixes#56830
Reviewed By: zero9178
Differential Revision: https://reviews.llvm.org/D130829
Added a commutativity utility pattern and a function to populate it. The pattern sorts the operands of an op in ascending order of the "key" associated with each operand iff the op is commutative. This sorting is stable.
The function is intended to be used inside passes to simplify the matching of commutative operations. After the application of the above-mentioned pattern, since the commutative operands now have a deterministic order in which they occur in an op, the matching of large DAGs becomes much simpler, i.e., requires much less number of checks to be written by a user in her/his pattern matching function.
The "key" associated with an operand is the list of the "AncestorKeys" associated with the ancestors of this operand, in a breadth-first order.
The operand of any op is produced by a set of ops and block arguments. Each of these ops and block arguments is called an "ancestor" of this operand.
Now, the "AncestorKey" associated with:
1. A block argument is `{type: BLOCK_ARGUMENT, opName: ""}`.
2. A non-constant-like op, for example, `arith.addi`, is `{type: NON_CONSTANT_OP, opName: "arith.addi"}`.
3. A constant-like op, for example, `arith.constant`, is `{type: CONSTANT_OP, opName: "arith.constant"}`.
So, if an operand, say `A`, was produced as follows:
```
`<block argument>` `<block argument>`
\ /
\ /
`arith.subi` `arith.constant`
\ /
`arith.addi`
|
returns `A`
```
Then, the block arguments and operations present in the backward slice of `A`, in the breadth-first order are:
`arith.addi`, `arith.subi`, `arith.constant`, `<block argument>`, and `<block argument>`.
Thus, the "key" associated with operand `A` is:
```
{
{type: NON_CONSTANT_OP, opName: "arith.addi"},
{type: NON_CONSTANT_OP, opName: "arith.subi"},
{type: CONSTANT_OP, opName: "arith.constant"},
{type: BLOCK_ARGUMENT, opName: ""},
{type: BLOCK_ARGUMENT, opName: ""}
}
```
Now, if "keyA" is the key associated with operand `A` and "keyB" is the key associated with operand `B`, then:
"keyA" < "keyB" iff:
1. In the first unequal pair of corresponding AncestorKeys, the AncestorKey in operand `A` is smaller, or,
2. Both the AncestorKeys in every pair are the same and the size of operand `A`'s "key" is smaller.
AncestorKeys of type `BLOCK_ARGUMENT` are considered the smallest, those of type `CONSTANT_OP`, the largest, and `NON_CONSTANT_OP` types come in between. Within the types `NON_CONSTANT_OP` and `CONSTANT_OP`, the smaller ones are the ones with smaller op names (lexicographically).
---
Some examples of such a sorting:
Assume that the sorting is being applied to `foo.commutative`, which is a commutative op.
Example 1:
> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.commutative %1, %2
Here,
1. The key associated with %1 is:
```
{
{CONSTANT_OP, "foo.const"}
}
```
2. The key associated with %2 is:
```
{
{NON_CONSTANT_OP, "foo.mul"},
{BLOCK_ARGUMENT, ""},
{BLOCK_ARGUMENT, ""}
}
```
The key of %2 < the key of %1
Thus, the sorted `foo.commutative` is:
> %3 = foo.commutative %2, %1
Example 2:
> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.mul %2, %1
> %4 = foo.add %2, %1
> %5 = foo.commutative %1, %2, %3, %4
Here,
1. The key associated with %1 is:
```
{
{CONSTANT_OP, "foo.const"}
}
```
2. The key associated with %2 is:
```
{
{NON_CONSTANT_OP, "foo.mul"},
{BLOCK_ARGUMENT, ""}
}
```
3. The key associated with %3 is:
```
{
{NON_CONSTANT_OP, "foo.mul"},
{NON_CONSTANT_OP, "foo.mul"},
{CONSTANT_OP, "foo.const"},
{BLOCK_ARGUMENT, ""},
{BLOCK_ARGUMENT, ""}
}
```
4. The key associated with %4 is:
```
{
{NON_CONSTANT_OP, "foo.add"},
{NON_CONSTANT_OP, "foo.mul"},
{CONSTANT_OP, "foo.const"},
{BLOCK_ARGUMENT, ""},
{BLOCK_ARGUMENT, ""}
}
```
Thus, the sorted `foo.commutative` is:
> %5 = foo.commutative %4, %3, %2, %1
Signed-off-by: Srishti Srivastava <srishti.srivastava@polymagelabs.com>
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124750
When this was updated in D127139 the update in-place case was no longer
marked as pessimistic. Add back in.
Differential Revision: https://reviews.llvm.org/D130453
Convert arith.cmpi to the canonical form with constants on the right side
to simplify further optimizations and open more opportunities for CSE.
Differential Revision: https://reviews.llvm.org/D129929
In the current state, this is only special cased for Allocation effects, but any effects on results allocated by the operation may be ignored when checking whether the op may be removed, as none of them are possible to be observed if the result is unused.
A use case for this is for IRs for languages which always initialize on allocation. To correctly model such operations, a Write as well as an Allocation effect should be placed on the result. This would prevent the Op from being deleted if unused however. This patch fixes that issue.
Differential Revision: https://reviews.llvm.org/D129854
Ops that implement `RegionBranchOpInterface` are allowed to indicate that they can branch back to themselves in `getSuccessorRegions`, but there is no API that allows them to specify the forwarded operands. This patch enables that by changing `getSuccessorEntryOperands` to accept `None`.
Fixes#54928
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D127239
Operand's defining op may not be valid for adding to the worklist under
stict mode
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D127180
When `RegionBranchOpInterface::getSuccessorRegions` is called for anything other than the parent op, it expects the operands of the terminator of the source region to be passed, not the operands of the parent op. This was not always respected.
This fixes a bug in integer range inference and ForwardDataFlowSolver and changes `scf.while` to allow narrowing of successors using constant inputs.
Fixes#55873
Reviewed By: mehdi_amini, krzysz00
Differential Revision: https://reviews.llvm.org/D127261
The previous fix from af371f9f98da only applied when using a bottom-up
traversal. The change here applies the constant preprocessing logic to the
top-down case as well. This resolves the issue with the canonicalizer pass still
reordering constants, since it uses a top-down traversal by default.
Fixes#51892
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D125623
The canonicalize command-line options currently have no effect, as the pass is
reading the pass options in its constructor, before they're actually
initialized. This results in the default values of the options always being used.
The change here moves the initialization of the `GreedyRewriteConfig` out of the
constructor, so that it runs after the pass options have been parsed.
Fixes#55466
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D125621
This patch adds a topological sort utility and pass. A topological sort reorders
the operations in a block without SSA dominance such that, as much as possible,
users of values come after their producers.
The utility function sorts topologically the operation range in a given block
with an optional user-provided callback that can be used to virtually break cycles.
The toposort pass itself recursively sorts graph regions under the target op.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D125063
The current implementation of `cloneWithNewYields` has a few issues
- It clones the loop body of the original loop to create a new
loop. This is very expensive.
- It performs `erase` operations which are incompatible when this
method is called from within a pattern rewrite. All erases need to
go through `PatternRewriter`.
To address these a new utility method `replaceLoopWithNewYields` is added
which
- moves the operations from the original loop into the new loop.
- replaces all uses of the original loop with the corresponding
results of the new loop
- use a call back to allow caller to generate the new yield values.
- the original loop is modified to just yield the basic block
arguments corresponding to the iter_args of the loop. This
represents a no-op loop. The loop itself is dead (since all its uses
are replaced), but is not removed. The caller is expected to erase
the op. Consequently, this method can be called from within a
`matchAndRewrite` method of a `PatternRewriter`.
The `cloneWithNewYields` could be replaces with
`replaceLoopWithNewYields`, but that seems to trigger a failure during
walks, potentially due to the operations being moved. That is left as
a TODO.
Differential Revision: https://reviews.llvm.org/D125147
This was leftover from when the standard dialect was destroyed, and
when FuncOp moved to the func dialect. Now that these transitions
have settled a bit we can drop these.
Most updates were handled using a simple regex: replace `^( *)func` with `$1func.func`
Differential Revision: https://reviews.llvm.org/D124146
This is necessary to handle conversions of operations defined at runtime in extensible dialects.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D124353
Add RegionBranchOpInterface on affine.for op so that transforms relying
on RegionBranchOpInterface can support affine.for. E.g.:
buffer-deallocation pass.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D123568
This patch takes advantage of the Commutative trait on operation
to remove identical commutative operations where the operands are swapped.
The second operation below can be removed since `arith.addi` is commutative.
```
%1 = arith.addi %a, %b : i32
%2 = arith.addi %b, %a : i32
```
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123492
Changes the algorithm of LICM to support graph regions (no guarantee of topologically sorted order). Also fixes an issue where ops with recursive side effects and regions would not be hoisted if any nested ops used operands that were defined within the nested region.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D122465
This patch revamps the BranchOpInterface a bit and allows a proper implementation of what was previously `getMutableSuccessorOperands` for operations, which internally produce arguments to some of the block arguments. A motivating example for this would be an invoke op with a error handling path:
```
invoke %function(%0)
label ^success ^error(%1 : i32)
^error(%e: !error, %arg0 : i32):
...
```
The advantages of this are that any users of `BranchOpInterface` can still argue over remaining block argument operands (such as `%1` in the example above), as well as make use of the modifying capabilities to add more operands, erase an operand etc.
The way this patch implements that functionality is via a new class called `SuccessorOperands`, which is now returned by `getSuccessorOperands`. It basically contains an `unsigned` denoting how many operator produced operands exist, as well as a `MutableOperandRange`, which are the usual forwarded operands we are used to. The produced operands are assumed to the first few block arguments, followed by the forwarded operands afterwards. The role of `SuccessorOperands` is to provide various utility functions to modify and query the successor arguments from a `BranchOpInterface`.
Differential Revision: https://reviews.llvm.org/D123062
Reland Note: Adds a fix to properly mark a commutative operation as folded if we change the order
of its operands. This was uncovered by the fact that we no longer re-process constants.
This avoids accidentally reversing the order of constants during successive
application, e.g. when running the canonicalizer. This helps reduce the number
of iterations, and also avoids unnecessary changes to input IR.
Fixes#51892
Differential Revision: https://reviews.llvm.org/D122692
This patch enhances the CSE pass to deal with simple cases of duplicated
operations with MemoryEffects.
It allows the CSE pass to remove safely duplicate operations with the
MemoryEffects::Read that have no other side-effecting operations in
between. Other MemoryEffects::Read operation are allowed.
The use case is pretty simple so far so we can build on top of it to add
more features.
This patch is also meant to avoid a dedicated CSE pass in FIR and was
brought together afetr discussion on https://reviews.llvm.org/D112711.
It does not currently cover the full range of use cases described in
https://reviews.llvm.org/D112711 but the idea is to gradually enhance
the MLIR CSE pass to handle common use cases that can be used by
other dialects.
This patch takes advantage of the new CSE capabilities in Fir.
Reviewed By: mehdi_amini, rriddle, schweitz
Differential Revision: https://reviews.llvm.org/D122801
This significantly simplifies the boilerplate necessary for passes
to define nested pass pipelines.
Differential Revision: https://reviews.llvm.org/D122880
This shows that pushing constant to the right in a commutative op leads
to `applyPatternsAndFoldGreedily` to converge without applying all the
patterns.
Differential Revision: https://reviews.llvm.org/D122870
This reverts commit 59bbc7a0851b6e0054bb3ed47df0958822f08880.
This exposes an issue breaking the contract of
`applyPatternsAndFoldGreedily` where we "converge" without applying
remaining patterns.
This avoids accidentally reversing the order of constants during successive
application, e.g. when running the canonicalizer. This helps reduce the number
of iterations, and also avoids unnecessary changes to input IR.
Fixes#51892
Differential Revision: https://reviews.llvm.org/D122692
(This was a TODO from the initial patch).
The control-flow sink utility accepts a callback that is used to sink an operation into a region.
The `moveIntoRegion` is called on the same operation and region that return true for `shouldMoveIntoRegion`.
The callback must preserve the dominance of the operation within the region. In the default control-flow
sink implementation, this is moving the operation to the start of the entry block.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D122445
When the current implementation merges two blocks that have operands defined outside of their block respectively, it will merge these by adding a block argument in the resulting merged block and adding successor arguments to the predecessors.
There is a special case where this is incorrect however: If one of predecessors terminator produce the operand, inserting the block argument and updating the predecessor would lead to the terminator using its own result as successor argument.
IR Example:
```
%0 = "test.producing_br"()[^bb1, ^bb2] {
operand_segment_sizes = dense<0> : vector<2 x i32>
} : () -> i32
^bb1:
"test.br"(%0)[^bb4] : (i32) -> ()
```
where `^bb1` is then merged with another block would lead to:
```
%0 = "test.producing_br"(%0)[^bb1, ^bb2]
```
This patch fixes that issue during clustering by making sure that if the operand is from an outside block, that it is not produced by the terminator of a predecessor.
Differential Revision: https://reviews.llvm.org/D121988
This commit moves FuncOp out of the builtin dialect, and into the Func
dialect. This move has been planned in some capacity from the moment
we made FuncOp an operation (years ago). This commit handles the
functional aspects of the move, but various aspects are left untouched
to ease migration: func::FuncOp is re-exported into mlir to reduce
the actual API churn, the assembly format still accepts the unqualified
`func`. These temporary measures will remain for a little while to
simplify migration before being removed.
Differential Revision: https://reviews.llvm.org/D121266
A lot of test passes are currently anchored on FuncOp, but this
dependency
is generally just historical. A majority of these test passes can run on
any operation, or can operate on a specific interface
(FunctionOpInterface/SymbolOpInterface).
This allows for greatly reducing the API dependency on FuncOp, which
is slated to be moved out of the Builtin dialect.
Differential Revision: https://reviews.llvm.org/D121191
RegionBranchOpInterface and BranchOpInterface are allowed to make implicit type conversions along control-flow edges. In effect, this adds an interface method, `areTypesCompatible`, to both interfaces, which should return whether the types of corresponding successor operands and block arguments are compatible. Users of the interfaces, here on forth, must be aware that types may mismatch, although current users (in MLIR core), are not affected by this change. By default, type equality is used.
`async.execute` already has unequal types along control-flow edges (`!async.value<f32>` vs. `f32`), but it opted out of calling `RegionBranchOpInterface::verifyTypes` in its verifier. That method has now been removed and `RegionBranchOpInterface` will verify types along control edges by default in its verifier.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120790