Improve the bypass analysis for loop-like ops. Until now, loop-like ops
were treated like any other non-subset ops: they prevent hoisting of any
sort because the analysis does not know which parts of a tensor init
operand are accessed by the loop-like op. With this change, the analysis
can look into loop-like ops and analyze which subset they are operating
on.
Add a loop-invariant subset hoisting pass to `mlir/Interfaces`. This
pass hoists loop-invariant tensor subsets (subset extraction and subset
insertion ops) from loop-like ops. Extraction ops are moved before the
loop. Insertion ops are moved after the loop. The loop body operates on
newly added region iter_args (one per extraction-insertion pair).
This new pass will be improved in subsequent commits (to support more
cases/ops) and will eventually replace
`Linalg/Transforms/SubsetHoisting.cpp`. In contrast to the existing
Linalg subset hoisting, the new pass is op interface-based
(`SubsetOpInterface` and `LoopLikeOpInterface`).
This commit ensures that the CFG to SCF lifting does not accidentally
drop locations of loop latches during the lifting.
Note that I didn't add a test as we do not seem to have any tests for
location tracking in any of the similar passes.
This commit fixes a bug in the Mem2Reg operation erasure order.
Replacing the use-def based topological order with a dominance-based
weak order ensures that no operation is removed before all its uses have
been replaced. The order relation uses the topological order of blocks
and block internal ordering to determine a deterministic operation
order.
Additionally, the reliance on the `DenseMap` key order was eliminated by
switching to a `MapVector`, which gives a deterministic iteration order.
Example:
```
%ptr = alloca ...
...
%val0 = load %ptr ... // LOAD0
store %val0 %ptr ...
%val1 = load %ptr ... // LOAD1
```
When promoting the slot backing %ptr, it can happen that LOAD0 is
cleaned up before LOAD1. All uses of LOAD0 are then replaced by its
reaching definition before LOAD1's result is replaced by LOAD0's
result. The subsequent erasure of LOAD0 thus cannot succeed, as it
still has uses.
Reverts the revert commit and fixes the weak ordering requirement of
`llvm::sort`.
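As a rough illustration of the ordering requirement (this is not the
actual Mem2Reg code), the sketch below sorts operations by a
(block order, position in block) key. The `ToyOpPosition` type and both
function names are hypothetical. The comparator uses a strict `<` on the
key tuple, so it is irreflexive and asymmetric, which is what
`std::sort` and `llvm::sort` expect from a strict weak ordering.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an operation: the topological index of its
// block and its index inside that block.
struct ToyOpPosition {
  unsigned blockOrder;
  unsigned indexInBlock;
};

// Strict weak ordering: lexicographic "<" on (blockOrder, indexInBlock).
// Using "<=" here instead would return true for equal elements and
// violate the comparator contract.
inline bool comesBefore(const ToyOpPosition &a, const ToyOpPosition &b) {
  return std::tie(a.blockOrder, a.indexInBlock) <
         std::tie(b.blockOrder, b.indexInBlock);
}

// Returns a copy of `ops` sorted by block order, then by position.
inline std::vector<ToyOpPosition> sortByPosition(std::vector<ToyOpPosition> ops) {
  std::sort(ops.begin(), ops.end(), comesBefore);
  return ops;
}
```

Comparing an element with itself must yield `false`; a comparator that
breaks this rule is the kind of defect the libc++ debug check reports.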
Original commit message:
This commit fixes a bug in the Mem2Reg operation erasure order.
Replacing the topological order with a dominance based order ensures
that no operation is removed before all its uses have been replaced.
Additionally, the reliance on the `DenseMap` key order was eliminated by
switching to a `MapVector`, which gives a deterministic iteration order.
Example:
```
%ptr = alloca ...
...
%val0 = load %ptr ... // LOAD0
store %val0 %ptr ...
%val1 = load %ptr ... // LOAD1
```
When promoting the slot backing %ptr, it can happen that LOAD0 is
cleaned up before LOAD1. All uses of LOAD0 are then replaced by its
reaching definition before LOAD1's result is replaced by LOAD0's
result. The subsequent erasure of LOAD0 thus cannot succeed, as it
still has uses.
This commit causes the following issue with sanitizers:
`include/c++/v1/__debug_utils/strict_weak_ordering_check.h:52: assertion
!__comp(*(__first + __b), *(__first + __a)) failed: Your comparator is
not a valid strict-weak ordering`
probably due to an invalid sort().
Revert "[MLIR][Transforms] Fix Mem2Reg removal order to respect
dominance (#68687)"
This reverts commit be81f42b551c8b3c520132c3d60bc19cfc1c72fb.
This commit fixes a bug in the Mem2Reg operation erasure order.
Replacing the topological order with a dominance based order ensures
that no operation is removed before all its uses have been replaced.
Additionally, the reliance on the `DenseMap` key order was eliminated by
switching to a `MapVector`, which gives a deterministic iteration order.
Example:
```
%ptr = alloca ...
...
%val0 = load %ptr ... // LOAD0
store %val0 %ptr ...
%val1 = load %ptr ... // LOAD1
```
When promoting the slot backing %ptr, it can happen that LOAD0 is
cleaned up before LOAD1. All uses of LOAD0 are then replaced by its
reaching definition before LOAD1's result is replaced by LOAD0's
result. The subsequent erasure of LOAD0 thus cannot succeed, as it
still has uses.
The current loop-reduce-form transformation incorrectly assumes that any
use of a value in a block that isn't in the set of loop blocks is a use
outside the loop. This is correct for a pure CFG but is incorrect if
operations with subregions are present. In that case, a use may be in a
subregion of an operation that is part of the loop and be incorrectly
deemed outside the loop. This would later lead to transformations
producing code that does not verify.
This PR fixes that issue by checking the transitive parent block that is
in the same region as the loop rather than the immediate parent block.
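The parent walk described above can be sketched with toy IR types. All
of `ToyRegion`, `ToyBlock`, `ToyOp` and `findAncestorBlockInToyRegion`
are hypothetical names for illustration only, not the MLIR classes. The
helper climbs from a block through its enclosing operations until it
reaches a block that lives directly in the given region.

```cpp
// Toy IR: a region belongs to an op (or to nothing at the top level),
// a block belongs to a region, and an op belongs to a block.
struct ToyOp;
struct ToyRegion {
  ToyOp *parentOp; // nullptr for a top-level region
};
struct ToyBlock {
  ToyRegion *parentRegion;
};
struct ToyOp {
  ToyBlock *parentBlock;
};

// Returns the block directly contained in `region` that encloses `block`
// (possibly `block` itself), or nullptr if `block` is not nested in it.
inline ToyBlock *findAncestorBlockInToyRegion(ToyBlock *block,
                                              const ToyRegion *region) {
  while (block) {
    if (block->parentRegion == region)
      return block;
    ToyOp *enclosingOp =
        block->parentRegion ? block->parentRegion->parentOp : nullptr;
    block = enclosingOp ? enclosingOp->parentBlock : nullptr;
  }
  return nullptr;
}
```

A use inside a nested region then maps to the loop block that contains
the enclosing op, so it is classified as inside the loop.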
When cloning an op, the `notifyOperationInserted` callback is triggered
for all nested ops. Similarly, the `notifyOperationRemoved` callback
should be triggered for all nested ops when removing an op.
Listeners may inspect the IR during a `notifyOperationRemoved` callback.
Therefore, when multiple ops are removed in a single
`RewriterBase::eraseOp` call, the notifications must be triggered in an
order in which the ops could have been removed one-by-one:
* Op removals must be interleaved with `notifyOperationRemoved`
callbacks. A callback is triggered right before the respective op is
removed.
* Ops are removed post-order and in reverse order. Other traversal
orders could delete an op that still has uses. (This is not avoidable in
graph regions and with cyclic block graphs.)
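The two rules above can be sketched on a toy op tree. `ToyOp` and
`eraseWithNotifications` are hypothetical names and the sketch ignores
blocks and regions. Nested ops are erased last-to-first, each one fully
(post-order) before its parent, and the callback fires immediately
before the respective op is destroyed.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy op: a name plus nested ops in program order.
struct ToyOp {
  std::string name;
  std::vector<std::unique_ptr<ToyOp>> nested;
};

// Erases `op` and everything nested in it. The notification for an op is
// sent right before that op is destroyed, so a listener that inspects
// the remaining tree never sees an already-erased op.
inline void eraseWithNotifications(
    std::unique_ptr<ToyOp> op,
    const std::function<void(const ToyOp &)> &notifyRemoved) {
  // Reverse order: later nested ops (potential users) go first.
  while (!op->nested.empty()) {
    std::unique_ptr<ToyOp> last = std::move(op->nested.back());
    op->nested.pop_back();
    eraseWithNotifications(std::move(last), notifyRemoved);
  }
  notifyRemoved(*op); // callback, then removal
  op.reset();
}
```

For a root op containing `a` (with nested `a1`, `a2`) followed by `b`,
the callbacks arrive in the order `b`, `a2`, `a1`, `a`, `root`.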
Differential Revision: Imported from https://reviews.llvm.org/D144193.
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.
`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.
Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
In line with #66515, change `MutableOperandRange::begin`/`end` to
enumerate `OpOperand &` instead of `Value`. Also remove
`ForOp::getIterOpOperands`/`setIterArg`, which are now redundant.
Note: `MutableOperandRange` cannot be made a derived class of
`indexed_accessor_range_base` (like `OperandRange`), because
`MutableOperandRange::assign` can change the number of operands in the
range.
`operator[]` returns `OpOperand &` instead of `Value`.
* This allows users to get OpOperands by name instead of "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.
Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
Functions are always callable operations and thus every operation
implementing the `FunctionOpInterface` also implements the
`CallableOpInterface`. The only exception was the FuncOp in the toy
example. To make implementation of the `FunctionOpInterface` easier,
this commit lets `FunctionOpInterface` inherit from
`CallableOpInterface` and merges some of their methods. More precisely,
the `CallableOpInterface` has methods to get the argument and result
attributes and a method to get the result types of the callable region.
These methods are always implemented the same way as their analogues in
`FunctionOpInterface` and thus this commit moves all the argument and
result attribute handling methods to the callable interface as well as
the methods to get the argument and result types. The
`FunctionOpInterface` then does not have to declare them as well, but
just inherits them from the `CallableOpInterface`.
Adding the inheritance relation also required moving the
`FunctionOpInterface` from the IR directory to the Interfaces directory
since IR should not depend on Interfaces.
Reviewed By: jpienaar, springerm
Differential Revision: https://reviews.llvm.org/D157988
Do not inline IR with multiple blocks into ops that may not support unstructured control flow.
This fixes #64978.
Differential Revision: https://reviews.llvm.org/D159072
The current implementation is not very ergonomic or descriptive: It uses `std::optional<unsigned>` where `std::nullopt` represents the parent op and `unsigned` is the region number.
This doesn't give us any useful methods specific to region control flow and makes the code fragile to changes due to now taking the region number into account.
This patch introduces a new type called `RegionBranchPoint`, replacing all uses of `std::optional<unsigned>` in the interface. It can be implicitly constructed from a region or a `RegionSuccessor`, can be compared with a region to check whether the branch point is branching from the parent, adds `isParent` to check whether we are coming from a parent op and adds `RegionSuccessor::parent` as a descriptive way to indicate branching from the parent.
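To illustrate why a dedicated type reads better than a bare
`std::optional<unsigned>`, here is a minimal sketch of a wrapper with
named constructors and an `isParent` query. `ToyBranchPoint` and its
members are hypothetical and deliberately simplified; this is not the
definition of `RegionBranchPoint`.

```cpp
#include <optional>

// Toy branch point: either "the parent op" or "region number N".
class ToyBranchPoint {
public:
  // Named constructors replace the magic std::nullopt / index values.
  static ToyBranchPoint parent() { return ToyBranchPoint(std::nullopt); }
  static ToyBranchPoint region(unsigned index) { return ToyBranchPoint(index); }

  // True if control flow branches from the parent op.
  bool isParent() const { return !regionIndex.has_value(); }

  // Precondition: !isParent().
  unsigned getRegionIndex() const { return *regionIndex; }

  friend bool operator==(const ToyBranchPoint &a, const ToyBranchPoint &b) {
    return a.regionIndex == b.regionIndex;
  }

private:
  explicit ToyBranchPoint(std::optional<unsigned> index) : regionIndex(index) {}
  std::optional<unsigned> regionIndex;
};
```

Call sites then say `ToyBranchPoint::parent()` or ask `isParent()`
instead of comparing against `std::nullopt`.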
Differential Revision: https://reviews.llvm.org/D159116
This vector keeps track of recursive types through the recursive invocations
of `convertType()`. However this is something only useful for some specific
cases, in which the dedicated conversion callbacks can handle this stack
privately.
This allows removing a mutable member of the type converter.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158351
This commit fixes an error in the `RemoveDeadValues` pass that is
associated with its incorrect usage of the `cloneInto()` function.
The `setOperands()` function that is used by the `cloneInto()` function
requires all operands to not be null. But, that is not possible in this
pass because we drop uses of dead values, thus making them null. It is
only at the end of the pass that we are assured that such null values
won't exist but during the execution of the pass, there could be null
values.
To fix this, we replace the usage of the `cloneInto()` function to copy
a region with `moveBlock()` to move each block of the region one by one.
This function does not require the presence of non-null values and is
thus the right choice here. This implementation is also more efficient
because we are moving things instead of copying them, and moving was
always the goal.
Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>
Reviewed By: srishti-pm
Differential Revision: https://reviews.llvm.org/D158941
Large deep learning models rely on heavy computations. However, not
every computation is necessary. And, even when a computation is
necessary, it helps if the values needed for the computation are
available in registers (which have low-latency) rather than being in
memory (which has high-latency).
Compilers can use liveness analysis to:-
(1) Remove extraneous computations from a program before it executes on
hardware, and,
(2) Optimize register allocation.
Both these tasks help achieve one very important goal: reducing runtime.
Recently, liveness analysis was added to MLIR. Thus, this commit uses
the recently added liveness analysis utility to try to accomplish task
(1).
It adds a pass called `remove-dead-values` whose goal is
optimization (reducing runtime) by removing unnecessary instructions.
Unlike other passes that rely on local information gathered from
patterns to accomplish optimization, this pass uses a full analysis of
the IR, specifically, liveness analysis, and is thus more powerful.
Currently, this pass performs the following optimizations:
(A) Removes function arguments that are not live,
(B) Removes function return values that are not live across all callers of
the function,
(C) Removes unnecessary operands, results, region arguments, region
terminator operands of region branch ops, and,
(D) Removes simple and region branch ops that have all non-live results and
don't affect memory in any way,
iff
the IR doesn't have any non-function symbol ops, non-call symbol user ops
and branch ops.
Here, a "simple op" refers to an op that isn't a symbol op, symbol-user op,
region branch op, branch op, region branch terminator op, or return-like.
It is noteworthy that we do not refer to non-live values as "dead" in this
file to avoid confusing it with dead code analysis's "dead", which refers to
unreachable code (code that never executes on hardware) while "non-live"
refers to code that executes on hardware but is unnecessary. Thus, while the
removal of dead code helps little in reducing runtime, removing non-live
values should theoretically have significant impact (depending on the amount
removed).
It is also important to note that unlike other passes (like `canonicalize`)
that apply op-specific optimizations through patterns, this pass uses
different interfaces to handle various types of ops and tries to cover all
existing ops through these interfaces.
Its reliance on (a) liveness analysis and (b) interfaces is what makes it
so powerful: it can optimize ops that don't have a canonicalizer, and even
when an op does have a canonicalizer, it can perform more aggressive
optimizations, as observed in the test files associated with this pass.
Example of optimization (A):-
```
int add_2_to_y(int x, int y) {
return 2 + y
}
print(add_2_to_y(3, 4))
print(add_2_to_y(5, 6))
```
becomes
```
int add_2_to_y(int y) {
return 2 + y
}
print(add_2_to_y(4))
print(add_2_to_y(6))
```
Example of optimization (B):-
```
int, int get_incremented_values(int y) {
store y somewhere in memory
return y + 1, y + 2
}
y1, y2 = get_incremented_values(4)
y3, y4 = get_incremented_values(6)
print(y2)
```
becomes
```
int get_incremented_values(int y) {
store y somewhere in memory
return y + 2
}
y2 = get_incremented_values(4)
y4 = get_incremented_values(6)
print(y2)
```
Example of optimization (C):-
Assume only `%result1` is live here. Then,
```
%result1, %result2, %result3 = scf.while (%arg1 = %operand1, %arg2 = %operand2) {
%terminator_operand2 = add %arg2, %arg2
%terminator_operand3 = mul %arg2, %arg2
%terminator_operand4 = add %arg1, %arg1
scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3, %terminator_operand4
} do {
^bb0(%arg3, %arg4, %arg5):
%terminator_operand6 = add %arg4, %arg4
%terminator_operand5 = add %arg5, %arg5
scf.yield %terminator_operand5, %terminator_operand6
}
```
becomes
```
%result1, %result2 = scf.while (%arg2 = %operand2) {
%terminator_operand2 = add %arg2, %arg2
%terminator_operand3 = mul %arg2, %arg2
scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3
} do {
^bb0(%arg3, %arg4):
%terminator_operand6 = add %arg4, %arg4
scf.yield %terminator_operand6
}
```
It is interesting to see that `%result2` won't be removed even though it is
not live because `%terminator_operand3` forwards to it and cannot be
removed. And, that is because it also forwards to `%arg4`, which is live.
Example of optimization (D):-
```
int, int square_and_double_of_y(int y) {
square = y ^ 2
double = y * 2
return square, double
}
sq, do = square_and_double_of_y(5)
print(do)
```
becomes
```
int square_and_double_of_y(int y) {
double = y * 2
return double
}
do = square_and_double_of_y(5)
print(do)
```
Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>
Reviewed By: matthiaskramm, Mogball, jcai19
Differential Revision: https://reviews.llvm.org/D157049
It is surprising for the user that only some fields were honored.
Also make the FrozenRewritePatternSet a shared_ptr<const T>.
Fixes #64543
Differential Revision: https://reviews.llvm.org/D157469
To enable signature conversions to be used in CIRCT, locations should no longer be dropped from block arguments.
Reviewed By: Mogball, springerm
Differential Revision: https://reviews.llvm.org/D157882
This is a follow-up to https://reviews.llvm.org/D156889
Downstream projects may have more complicated ops than the control flow ops upstream and therefore need a more powerful interface to support the lifting process. Use cases include the propagation of (inherent) metadata that was previously on the control flow ops and now needs to be lifted to structured control flow ops.
Since the lifting process is inherently non-local with respect to the function body, we require stronger guarantees from the interface.
This patch therefore makes two changes to the interface:
* Passes the terminator that is being replaced to `createStructuredBranchRegionTerminatorOp`
* Adds a precondition to `createCFGSwitchOp` that its predecessors are already correctly established
Asserts have been added to verify these where it makes sense and to correctly state intent. I have not added tests purely because testing preconditions like these is not really feasible (and incredibly specific).
Differential Revision: https://reviews.llvm.org/D157981
ConversionPatterns do not (and should not) modify the type converter that they are using.
* Make `ConversionPattern::typeConverter` const.
* Make member functions of the `LLVMTypeConverter` const.
* Conversion patterns take a const type converter.
* Various helper functions (that are called from patterns) now also take a const type converter.
Differential Revision: https://reviews.llvm.org/D157601
Functions that materialize IR or convert types can be const.
Caching data structures inside the TypeConverter are marked as `mutable`.
Differential Revision: https://reviews.llvm.org/D157597
Structured control flow ops have proven very useful for many transformations doing analysis on conditional flow and loops. Doing these transformations on CFGs requires repeated analysis of the IR possibly leading to more complicated or less capable implementations. With structured control flow, a lot of the information is already present in the structure.
This patch therefore adds a transformation making it possible to lift arbitrary control flow graphs to structured control flow operations. The algorithm used is outlined in https://dl.acm.org/doi/10.1145/2693261. The complexity in implementing the algorithm was mostly spent correctly handling block arguments in MLIR (the paper only addresses the control flow graph part of it).
Note that the transformation has been implemented fully generically and does not depend on any dialect. An interface implemented by the caller is used to construct any operation necessary for the transformation, making it possible to create an interface implementation fit for one's IR.
For the purpose of testing and due to likely being a very common scenario, this patch adds an interface implementation lifting the control flow dialect to the SCF dialect.
Note the use of the word "lifting". Unlike other conversion passes, this pass is not 100% guaranteed to convert all ControlFlow ops.
Only if the input region being transformed contains a single kind of return-like operation is it guaranteed to replace all control flow ops. If that is not the case, exactly one control flow op will remain, branching to regions terminating with a given return-like operation (e.g. one region terminates with `llvm.return`, the other with `llvm.unreachable`).
Differential Revision: https://reviews.llvm.org/D156889
Add support for reasoning about operations with recursive memory effects
to CSE. The recursive effects are gathered by a helper function. I
decided to allow returning duplicates from the helper function because
there's no benefit to spending the computation time to remove them in
the existing use case.
Differential Revision: https://reviews.llvm.org/D156805
It is surprising for the user that only some fields were honored.
Also make the FrozenRewritePatternSet a shared_ptr<const T>.
Fixes #64543
Differential Revision: https://reviews.llvm.org/D157469
Also update the documentation of `Operation::fold`, which did not take into account in-place foldings.
Differential Revision: https://reviews.llvm.org/D155691
Commutative ops were previously folded with a special rule in `OperationFolder`. This change turns the folding into a proper `OpTrait` folder.
Differential Revision: https://reviews.llvm.org/D155687
`setListener` is dangerous because an already registered listener may accidentally be overwritten/replaced. (A `ForwardingListener` must be used in such cases.) This change updates a few trivial call sites of `setListener`, where no forwarding listener is needed.
Differential Revision: https://reviews.llvm.org/D155599
This fixes bad behavior of that class that surfaced in
https://reviews.llvm.org/D154299, where calling applySignatureConversion
left the insertion point different from before the call, which broke a
subsequent call to replaceOp. This patch introduces a fix in both
functions, each of which is enough to fix the specific problem in the
aforementioned diff: (1) applySignatureConversion now resets the
insertion point with a guard for the whole function and (2) replace sets
the insertion point to the op that should be replaced (and resets it
with a guard).
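The guard idiom mentioned in (1) and (2) can be sketched as an RAII
helper. `ToyBuilder`, `ToyInsertionGuard` and
`toyApplySignatureConversion` are hypothetical stand-ins, with the
insertion point modeled as a plain integer. The guard saves the
insertion point on construction and restores it on destruction, so the
caller sees the same insertion point before and after the call.

```cpp
// Toy builder whose insertion point is just an integer position.
class ToyBuilder {
public:
  int getInsertionPoint() const { return point; }
  void setInsertionPoint(int newPoint) { point = newPoint; }

private:
  int point = 0;
};

// Saves the insertion point and restores it when the scope ends.
class ToyInsertionGuard {
public:
  explicit ToyInsertionGuard(ToyBuilder &b)
      : builder(b), saved(b.getInsertionPoint()) {}
  ~ToyInsertionGuard() { builder.setInsertionPoint(saved); }
  ToyInsertionGuard(const ToyInsertionGuard &) = delete;
  ToyInsertionGuard &operator=(const ToyInsertionGuard &) = delete;

private:
  ToyBuilder &builder;
  int saved;
};

// Moves the insertion point internally, but the guard undoes that on exit.
inline void toyApplySignatureConversion(ToyBuilder &builder) {
  ToyInsertionGuard guard(builder);
  builder.setInsertionPoint(99); // stand-in for inserting new IR elsewhere
}
```

Because restoration happens in the destructor, it also covers early
returns inside the guarded function.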
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D154684
* All IR modifications are done with a rewriter.
* The new C++ entry point takes a `RewriterBase &`, which may have a listener attached to it.
This revision is useful because it allows users to run CSE and track IR modifications via a listener that can be attached to the rewriter.
This is a reupload. The original CL was reverted (9979417d4db4) due to a memory leak. The memory leak is unrelated to this change and fixed with D154185.
Differential Revision: https://reviews.llvm.org/D145226
Add an additional entry point so that CSE can be used without a pass. This allows CSE to be used from the Transform dialect without invalidating all handles.
* All IR modifications are done with a rewriter.
* The C++ entry point takes a `RewriterBase &`, which may have a listener attached to it. This allows users to track all IR modifications.
Differential Revision: https://reviews.llvm.org/D145226
This allows users of `applyPatternsAndFoldGreedily` to detect if any MLIR changes have occurred. An example use-case is where we expect the `applyPatternsAndFoldGreedily` to change the IR and want to validate that it indeed does change it.
Differential Revision: https://reviews.llvm.org/D153986
This revision introduces support for memset intrinsics in SROA and
mem2reg for the LLVM dialect. This is achieved for SROA by breaking
memsets of aggregates into multiple memsets of scalars, and for mem2reg
by promoting memsets of single integer slots into the value the memset
operation would yield.
The SROA logic supports breaking memsets of static size operating at the
start of a memory slot. The intended most common case is for memsets
covering the entirety of a struct, most often as a way to initialize it
to 0.
The mem2reg logic supports dynamic values and static sizes as input to
promotable memsets. This is achieved by lowering memsets into
`ceil(log_2(n))` LeftShift operations, `ceil(log_2(n))` Or operations
and up to one ZExt operation (for n the byte width of the integer),
computing in registers the integer value the memset would create. Only
byte-aligned integers are supported; more types could easily be added
afterwards.
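The doubling scheme behind the `ceil(log_2(n))` bound can be sketched in
plain C++. `memsetPattern` is a hypothetical helper, not code from this
revision, and it assumes `n` is between 1 and 8 bytes. The byte is
zero-extended once, then each step shifts the already filled part left
by its own width and ORs it back in, doubling the number of filled
bytes. The final mask models truncation to an `n`-byte integer.

```cpp
#include <cstdint>

// Computes the value an integer of `numBytes` bytes (1..8) holds after
// every one of its bytes was set to `byte`.
inline std::uint64_t memsetPattern(std::uint8_t byte, unsigned numBytes) {
  std::uint64_t value = byte; // the single zero-extension
  // Each iteration is one shift plus one OR; `filled` doubles every time,
  // so the loop runs ceil(log2(numBytes)) times.
  for (unsigned filled = 1; filled < numBytes; filled *= 2)
    value |= value << (8 * filled);
  // Drop bytes beyond the integer's width (needed when numBytes is not a
  // power of two).
  if (numBytes < 8)
    value &= (std::uint64_t(1) << (8 * numBytes)) - 1;
  return value;
}
```

For a 4-byte integer this takes two shift/OR pairs: `0xAB` becomes
`0xABAB`, then `0xABABABAB`.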
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D152367
`RewriterBase::Listener::notifyOperationReplaced` notifies observers that an op is about to be replaced with a range of values. This notification is not very useful for ops without results, because it does not specify the replacement op (and it cannot be deduced from the replacement values). It provides no additional information over the `notifyOperationRemoved` notification.
This revision adds an additional notification when a rewriter replaces an op with another op. By default, this notification triggers the original "op replaced with values" notification, so there is no functional change for existing code.
This new API is useful for the transform dialect, which needs to track op replacements. (Updated in a subsequent revision.)
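The forwarding default can be sketched with a toy listener.
`ToyListener`, `ToyRecordingListener` and both method names are
hypothetical, with ops and values modeled as strings. The
"replaced with op" hook forwards to the "replaced with values" hook
unless a subclass overrides it, so listeners written against the old
notification keep working.

```cpp
#include <string>
#include <vector>

struct ToyListener {
  virtual ~ToyListener() = default;

  // Original notification: op replaced with a range of values.
  virtual void notifyReplacedWithValues(const std::string &,
                                        const std::vector<std::string> &) {}

  // New notification: op replaced with another op. By default it forwards
  // the new op's results to the original notification.
  virtual void notifyReplacedWithOp(const std::string &op,
                                    const std::vector<std::string> &newOpResults) {
    notifyReplacedWithValues(op, newOpResults);
  }
};

// Existing listener that only knows about the original notification.
struct ToyRecordingListener : ToyListener {
  std::vector<std::string> replacedOps;
  void notifyReplacedWithValues(const std::string &op,
                                const std::vector<std::string> &) override {
    replacedOps.push_back(op);
  }
};
```

A listener that needs the replacement op itself would override the
second hook instead.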
Also includes minor documentation improvements.
Differential Revision: https://reviews.llvm.org/D152814
Instead of always taking the last op from the worklist, take a random one. For testing/debugging purposes only. This feature can be used to ensure that lowering pipelines work correctly regardless of the order in which ops are processed by the GreedyPatternRewriteDriver.
The randomizer can be enabled by setting a numeric `MLIR_GREEDY_REWRITE_RANDOMIZER_SEED` option.
Note: When enabled, 27 tests are currently failing, partly because FileCheck tests are looking for exact IR.
Discussion: https://discourse.llvm.org/t/discussion-fuzzing-pattern-application/67911
Differential Revision: https://reviews.llvm.org/D142447
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call, which has been agreed to be more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this change does not include the work needed to
support a functional cast/isa call for them.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
Encapsulate all worklist-related functionality in a separate `Worklist` class. This makes the remaining code more readable and allows for custom worklist implementations (e.g., a randomized worklist for fuzzing pattern application: D142447).
Differential Revision: https://reviews.llvm.org/D151345
Boolean compiler flags (such as `DMLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`) show up in `mlir-config.h` as preprocessor defines that are either 0 or 1. Use `#if` instead of `#ifdef`.
This should have been part of D144552.
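The difference matters because a flag defined as 0 is still "defined".
In the sketch below, `TOY_ENABLE_CHECKS` is a hypothetical macro
standing in for such a 0/1 define: `#ifdef` treats it as enabled, while
`#if` evaluates its value and treats it as disabled.

```cpp
// Hypothetical 0/1 flag, as a config header would emit it when disabled.
#define TOY_ENABLE_CHECKS 0

// Tests only whether the macro exists: true even though the value is 0.
inline bool enabledAccordingToIfdef() {
#ifdef TOY_ENABLE_CHECKS
  return true;
#else
  return false;
#endif
}

// Tests the macro's value: false for 0, true for 1.
inline bool enabledAccordingToIf() {
#if TOY_ENABLE_CHECKS
  return true;
#else
  return false;
#endif
}
```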