This makes it easy to compose the distribution computation with
other affine computations.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D98171
The current implementation of Value involves a pointer int pair with several different kinds of owners, i.e. BlockArgumentImpl*, Operation *, TrailingOpResult*. This design arose from the desire to save memory overhead for operations that have a very small number of results (generally 0-2). There are, unfortunately, many problematic aspects of the current implementation that make Values difficult to work with or just inefficient.
Operation result types are stored as a separate array on the Operation. This is very inefficient for many reasons: we use TupleType for multiple results, which can lead to huge amounts of memory usage if multi-result operations change types frequently(they do). It also means that simple methods like Value::getType/Value::setType now require complex logic to get to the desired type.
Value only has one pointer bit free, severely limiting the ability to use it in things like PointerUnion/PointerIntPair. Given that we store the kind of a Value along with the "owner" pointer, we only leave one bit free for users of Value. This creates situations where we end up nesting PointerUnions to be able to use Value in one.
As noted above, most of the methods in Value need to branch on at least 3 different cases which is both inefficient, possibly error prone, and verbose. The current storage of results also creates problems for utilities like ValueRange/TypeRange, which want to efficiently store base pointers to ranges (of which Operation* isn't really useful as one).
This revision greatly simplifies the implementation of Value by the introduction of a new ValueImpl class. This class contains all of the state shared between all of the various derived value classes; i.e. the use list, the type, and the kind. This shared implementation class provides several large benefits:
* Most of the methods on value are now branchless, and often one-liners.
* The "kind" of the value is now stored in ValueImpl instead of Value
This frees up all of Value's pointer bits, allowing for users to take full advantage of PointerUnion/PointerIntPair/etc. It also allows for storing more operation results as "inline", 6 now instead of 2, freeing up 1 word per new inline result.
* Operation result types are now stored in the result, instead of a side array
This drops the size of zero-result operations by 1 word. It also removes the memory crushing use of TupleType for operations results (which could lead up to hundreds of megabytes of "dead" TupleTypes in the context). This also allowed restructured ValueRange, making it simpler and one word smaller.
This revision does come with two conceptual downsides:
* Operation::getResultTypes no longer returns an ArrayRef<Type>
This conceptually makes some usages slower, as the iterator increment is slightly more complex.
* OpResult::getOwner is slightly more expensive, as it now requires a little bit of arithmetic
From profiling, neither of the conceptual downsides have resulted in any perceivable hit to performance. Given the advantages of the new design, most compiles are slightly faster.
Differential Revision: https://reviews.llvm.org/D97804
This patch continues detensorizing implementation by detensoring
internal control flow in functions.
In order to detensorize functions, all the non-entry block's arguments
are detensored and branches between such blocks are properly updated to
reflect the detensored types as well. Function entry block (signature)
is left intact.
This continues work towards handling github/google/iree#1159.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97148
Just a pure method renaming.
It is a preparation step for replacing "memory space as raw integer"
with more generic "memory space as attribute", which will be done in
separate commit.
The `MemRefType::getMemorySpace` method will return `Attribute` and
become the main API, while `getMemorySpaceAsInt` will be declared as
deprecated and will be replaced in all in-tree dialects (also in separate
commits).
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D97476
This also exposed a bug in Dialect loading where it was not correctly identifying identifiers that had the dialect namespace as a prefix.
Differential Revision: https://reviews.llvm.org/D97431
Fixes a bug in affine fusion pipeline where an incorrect fusion is performed
despite a Call Op that potentially modifies memrefs under consideration
exists between source and target.
Fixes part of https://bugs.llvm.org/show_bug.cgi?id=49220
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D97252
This patch handles defining ops between the source and dest loop nests, and prevents loop nests with `iter_args` from being fused.
If there is any SSA value in the dest loop nest whose defining op has dependence from the source loop nest, we cannot fuse the loop nests.
If there is a `affine.for` with `iter_args`, prevent it from being fused.
Reviewed By: dcaballe, bondhugula
Differential Revision: https://reviews.llvm.org/D97030
This prevents a bug in the pass instrumentation implementation where the main thread would end up with a different pass manager in different runs of the pass.
llvm::parallelTransformReduce does not schedule work on the caller thread, which becomes very costly for
the inliner where a majority of SCCs are small, often ~1 element. The switch to llvm::parallelForEach solves this,
and also aligns the implementation with the PassManager (which realistically should share the same implementation).
This change dropped compile time on an internal benchmark by ~1(25%) second.
Differential Revision: https://reviews.llvm.org/D96086
Affine parallel ops may contain and yield results from MemRefsNormalizable ops in the loop body. Thus, both affine.parallel and affine.yield should have the MemRefsNormalizable trait.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96821
The current implementation of tilePerfectlyNested utility doesn't handle
the non-unit step size. We have added support to perform tiling
correctly even if the step size of the loop to be tiled is non-unit.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49188.
Differential Revision: https://reviews.llvm.org/D97037
This commit fixes a bug in affine fusion pipeline where an
incorrect fusion is performed despite a dealloc op is present
between a producer and a consumer. This is done by creating a
node for dealloc op in the MDG.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D97032
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b924d86f623d452777eb76b83bf2787.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
Separating the AffineMapAccessInterface from AffineRead/WriteOp interface so that dialects which extend Affine capabilities (e.g. PlaidML PXA = parallel extensions for Affine) can utilize relevant passes (e.g. MemRef normalization).
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96284
SliceAnalysis originally was developed in the context of affine.for within mlfunc.
It predates the notion of region.
This revision updates it to not hardcode specific ops like scf::ForOp.
When rooted at an op, the behavior of the slice computation changes as it recurses into the regions of the op. This does not support gathering all values transitively depending on a loop induction variable anymore.
Additional variants rooted at a Value are added to also support the existing behavior.
Differential revision: https://reviews.llvm.org/D96702
These properties were useful for a few things before traits had a better integration story, but don't really carry their weight well these days. Most of these properties are already checked via traits in most of the code. It is better to align the system around traits, and improve the performance/cost of traits in general.
Differential Revision: https://reviews.llvm.org/D96088
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as no discard was always the intention from the beginning, but got lost along the way.
Differential Revision: https://reviews.llvm.org/D95841
In dialect conversion infrastructure, source materialization applies as part of
the finalization procedure to results of the newly produced operations that
replace previously existing values with values having a different type.
However, such operations may be created to replace operations created in other
patterns. At this point, it is possible that the results of the _original_
operation are still in use and have mismatching types, but the results of the
_intermediate_ operation that performed the type change are not in use leading
to the absence of source materialization. For example,
%0 = dialect.produce : !dialect.A
dialect.use %0 : !dialect.A
can be replaced with
%0 = dialect.other : !dialect.A
%1 = dialect.produce : !dialect.A // replaced, scheduled for removal
dialect.use %1 : !dialect.A
and then with
%0 = dialect.final : !dialect.B
%1 = dialect.other : !dialect.A // replaced, scheduled for removal
%2 = dialect.produce : !dialect.A // replaced, scheduled for removal
dialect.use %2 : !dialect.A
in the same rewriting, but only the %1->%0 replacement is currently considered.
Change the logic in dialect conversion to look up all values that were replaced
by the given value and performing source materialization if any of those values
is still in use with mismatching types. This is performed by computing the
inverse value replacement mapping. This arguably expensive manipulation is
performed only if there were some type-changing replacements. An alternative
could be to consider all replaced operations and not only those that resulted
in type changes, but it would harm pattern-level composability: the pattern
that performed the non-type-changing replacement would have to be made aware of
the type converter in order to call the materialization hook.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95626
We could extend this with an interface to allow dialect to perform a type
conversion, but that would make the folder creating operation which isn't
the case at the moment, and isn't necessarily always desirable.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95991
In dialect conversion, signature conversions essentially perform block argument
replacement and are added to the general value remapping. However, the replaced
values were not tracked, so if a signature conversion was rolled back, the
construction of operand lists for the following patterns could have obtained
block arguments from the mapping and give them to the pattern leading to
use-after-free. Keep track of signature conversions similarly to normal block
argument replacement, and erase such replacements from the general mapping when
the conversion is rolled back.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95688
Currently, for a scf.parallel (i,j,k) after the loop collapsing to 1D is done, the
IVs would be traversed as for an scf.parallel(k,j,i).
Differential Revision: https://reviews.llvm.org/D95693
This patch adds support for producer-consumer fusion scenarios with
multiple producer stores to the AffineLoopFusion pass. The patch
introduces some changes to the producer-consumer algorithm, including:
* For a given consumer loop, producer-consumer fusion iterates over its
producer candidates until a fixed point is reached.
* Producer candidates are gathered beforehand for each iteration of the
consumer loop and visited in reverse program order (not strictly guaranteed)
to maximize the number of loops fused per iteration.
In general, these changes were needed to simplify the multi-store producer
support and remove some of the workarounds that were introduced in the past
to support more fusion cases under the single-store producer limitation.
This patch also preserves the existing functionality of AffineLoopFusion with
one minor change in behavior. Producer-consumer fusion didn't fuse scenarios
with escaping memrefs and multiple outgoing edges (from a single store).
Multi-store producer scenarios will usually (always?) have multiple outgoing
edges so we couldn't fuse any with escaping memrefs, which would greatly limit
the applicability of this new feature. Therefore, the patch enables fusion for
these scenarios. Please, see modified tests for specific details.
Reviewed By: andydavis1, bondhugula
Differential Revision: https://reviews.llvm.org/D92876
This extracts the implementation of getType, setType, and getBody from
FunctionSupport.h into the mlir::impl namespace and defines them
generically in FunctionSupport.cpp. This allows them to be used
elsewhere for any FunctionLike ops that use FunctionType for their
type signature.
Using the new helpers, FuncOpSignatureConversion is generalized to
work with all such FunctionLike ops. Convenience helpers are added to
configure the pattern for a given concrete FunctionLike op type.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95021
This patch adds support for producer-consumer fusion scenarios with
multiple producer stores to the AffineLoopFusion pass. The patch
introduces some changes to the producer-consumer algorithm, including:
* For a given consumer loop, producer-consumer fusion iterates over its
producer candidates until a fixed point is reached.
* Producer candidates are gathered beforehand for each iteration of the
consumer loop and visited in reverse program order (not strictly guaranteed)
to maximize the number of loops fused per iteration.
In general, these changes were needed to simplify the multi-store producer
support and remove some of the workarounds that were introduced in the past
to support more fusion cases under the single-store producer limitation.
This patch also preserves the existing functionality of AffineLoopFusion with
one minor change in behavior. Producer-consumer fusion didn't fuse scenarios
with escaping memrefs and multiple outgoing edges (from a single store).
Multi-store producer scenarios will usually (always?) have multiple outgoing
edges so we couldn't fuse any with escaping memrefs, which would greatly limit
the applicability of this new feature. Therefore, the patch enables fusion for
these scenarios. Please, see modified tests for specific details.
Reviewed By: andydavis1, bondhugula
Differential Revision: https://reviews.llvm.org/D92876
Add a check if regions do not implement the RegionBranchOpInterface. This is not
allowed in the current deallocation steps. Furthermore, we handle edge-cases,
where a single region is attached and the parent operation has no results.
This fixes: https://bugs.llvm.org/show_bug.cgi?id=48575
Differential Revision: https://reviews.llvm.org/D94586
This revision adds a new `replaceOpWithIf` hook that replaces uses of an operation that satisfy a given functor. If all uses are replaced, the operation gets erased in a similar manner to `replaceOp`. DialectConversion support will be added in a followup as this requires adjusting how replacements are tracked there.
Differential Revision: https://reviews.llvm.org/D94632
This corrects the last 2 issues caught by tests when causing dialect
conversion rollbacks to occur.
Differential Revision: https://reviews.llvm.org/D94623
getDynOperands behavior is commonly used in a number of passes. Refactored to
use a helper function and avoid code reuse.
Differential Revision: https://reviews.llvm.org/D94340
This revision adds a new `initialize(MLIRContext *)` hook to passes that allows for them to initialize any heavy state before the first execution of the pass. A concrete use case of this is with patterns that rely on PDL, given that PDL is compiled at run time it is imperative that compilation results are cached as much as possible. The first use of this hook is in the Canonicalizer, which has the added benefit of reducing the number of expensive accesses to the context when collecting patterns.
Differential Revision: https://reviews.llvm.org/D93147
This class used to serve a few useful purposes:
* Allowed containing a null DictionaryAttr
* Provided some simple mutable API around a DictionaryAttr
The first of which is no longer an issue now that there is much better caching support for attributes in general, and a cache in the context for empty dictionaries. The second results in more trouble than it's worth because it mutates the internal dictionary on every action, leading to a potentially large number of dictionary copies. NamedAttrList is a much better alternative for the second use case, and should be modified as needed to better fit it's usage as a DictionaryAttrBuilder.
Differential Revision: https://reviews.llvm.org/D93442
This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified.
Differential Revision: https://reviews.llvm.org/D93432
The current condition implies that the target materialization will be
called even if the type is the new operand type is legal, but slightly
different. For example, if there is a bufferization pattern that changes
memref layout, then target materialization for an illegal type
(TensorType) would be called.
Differential Revision: https://reviews.llvm.org/D93126
Now that passes have support for running nested pipelines, the inliner can now allow for users to provide proper nested pipelines to use for optimization during inlining. This revision also changes the behavior of optimization during inlining to optimize before attempting to inline, which should lead to a more accurate cost model and prevents the need for users to schedule additional duplicate cleanup passes before/after the inliner that would already be run during inlining.
Differential Revision: https://reviews.llvm.org/D91211
This reverts commit 0d48d265db6633e4e575f81f9d3a52139b1dc5ca.
This reapplies the following commit, with a fix for CAPI/ir.c:
[mlir] Start splitting the `tensor` dialect out of `std`.
This starts by moving `std.extract_element` to `tensor.extract` (this
mirrors the naming of `vector.extract`).
Curiously, `std.extract_element` supposedly works on vectors as well,
and this patch removes that functionality. I would tend to do that in
separate patch, but I couldn't find any downstream users relying on
this, and the fact that we have `vector.extract` made it seem safe
enough to lump in here.
This also sets up the `tensor` dialect as a dependency of the `std`
dialect, as some ops that currently live in `std` depend on
`tensor.extract` via their canonicalization patterns.
Part of RFC: https://llvm.discourse.group/t/rfc-split-the-tensor-dialect-from-std/2347/2
Differential Revision: https://reviews.llvm.org/D92991
This reverts commit cab8dda90f48e15ee94b0d55ceac5b6a812e4743.
I mistakenly thought that CAPI/ir.c failure was unrelated to this
change. Need to debug it.
This starts by moving `std.extract_element` to `tensor.extract` (this
mirrors the naming of `vector.extract`).
Curiously, `std.extract_element` supposedly works on vectors as well,
and this patch removes that functionality. I would tend to do that in
separate patch, but I couldn't find any downstream users relying on
this, and the fact that we have `vector.extract` made it seem safe
enough to lump in here.
This also sets up the `tensor` dialect as a dependency of the `std`
dialect, as some ops that currently live in `std` depend on
`tensor.extract` via their canonicalization patterns.
Part of RFC: https://llvm.discourse.group/t/rfc-split-the-tensor-dialect-from-std/2347/2
Differential Revision: https://reviews.llvm.org/D92991