Application of sequence blocks in the transform interpreter assumes that
all operations (except for the terminator) in the sequence block have
the `TransformOpInterface`. For `SequenceOp`, this was already verified,
but not for `NamedSequenceOp`, causing assertion failures if the
assumption doesn't hold.
This change adds verification that all operations in the block except
for the terminator have the `TransformOpInterface`.
Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
This commit simplifies the design of the `RegionBranchOpInterface`. The
property of being a successor input is now independent of the region
branch point.
There is a new API for querying successor inputs:
`RegionBranchOpInterface::getSuccessorInputs(RegionSuccessor)`. Note
that this function does **not** take a `RegionBranchPoint` as parameter.
The `RegionSuccessor` API is now also simpler: it no longer stores
successor inputs. A region successor is simply `Region *`, wrapped
around a convenience API.
Note: This commit is mostly mechanical. Analyses / transformations that
build on top of the `RegionBranchOpInterface` (e.g.,
`visitNonControlFlowArguments` API) can likely be simplified in
follow-up commits.
Note for LLVM integration: Split
`RegionBranchOpInterface::getSuccessorRegion` implementations into two
functions: `getSuccessorRegion` and `getSuccessorInputs. (There are many
examples in this commit.)
RFC:
https://discourse.llvm.org/t/rfc-simplify-regionbranchopinterface-separate-successor-inputs-from-region-successor/89420/7
Simplify the design of `RegionSuccessor`. There is no need to store the
`Operation *` pointer when branching out of the region branch op (to the
parent). There is no API to even access the `Operation *` pointer.
Add a new helper function `RegionSuccessor::parent` to construct a
region successor that points to the parent. This aligns the
`RegionSuccessor` design and API with `RegionBranchPoint`:
* Both classes now have a `parent()` helper function.
`ClassName::parent()` can be used in documentation to precisely describe
the source/target of a region branch.
* Both classes now use `nullptr` internally to represent "parent".
This API change also protects against incorrect API usage: users can no
longer pass an incorrect parent op. If a region successor is not a
region of the region branch op, it *must* branch out of region branch op
itself ("parent"). However, the previous API allowed passing other
operations. There was one such API violation in a [test
case](https://github.com/llvm/llvm-project/pull/174945/files#diff-d5717e4a8d7344b2ff77762b8fa480bcfec0eeee97a86195c787d791a6217e13L71).
Also clean up the documentation to use the correct terminology (such as
"successor operands", "successor inputs") consistently.
Note: This PR effectively rolls back some changes from #161575. That PR
introduced `llvm::PointerUnion<Region *, Operation *>
successor{nullptr};`. It is unclear from the commit message why that
change was made.
Note for LLVM integration: You may have to slightly modify
`getSuccessorRegion` implementations: Replace
`RegionSuccessor(getOperation(), getOperation()->getResults())` with
`RegionSuccessor::parent(getResults())`.
This fixes a bug in the interpreter for transform.include op, which
crashes when attempting to copy out the handles from the yield op of a
failing sequence.
This is still somehow a WIP, we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested region or the parent
`RegionBranchOpInterface`
Some new methods with reasonnable default implementation are added to
help resolving the flow of values across the RegionBranchOpInterface.
It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires to know about the hidden loop iterator block argument and where
the iterargs start. The API is designed around forward-tracking of the
chain unfortunately.
Try to reland #161575 ; I suspect a buildbot incremental build issue.
This is still somehow a WIP, we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested region or the parent
`RegionBranchOpInterface`
Some new methods with reasonnable default implementation are added to
help resolving the flow of values across the RegionBranchOpInterface.
It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires to know about the hidden loop iterator block argument and where
the iterargs start. The API is designed around forward-tracking of the
chain unfortunately.
Fix crashes in the verifier of `transform.with_named_sequence` attribute
attached to a symbol table operation caused by it constructing a call
graph inside the symbol table. The call graph construction assumes calls
and callables, such as functions or named sequences, have been verified,
but it is not yet the case when the attribute verifier on the (parent)
symbol table operation runs. Trigger such verification manually before
constructing the call graph. This adds redundancy in verification, but
there is currently no mechanism to change the order of verificaiton. In
performance-critical scenarios, verification can be disabled altogether.
Remove unnecessary verfificaton from `transform::IncludeOp::getEffects`.
It was introduced along with the op definition as the op used to inspect
the body of callee, which assumed the body existed, to identify handle
consumption behavior. This was later evolved to having explicit argument
attributes on the callee, which handles the absence of such attributes
gracefully without the need for verification, but the verification was
never removed. It would have been causing infinite recursion if kept in
place.
Fixes#159646.
Fixes#159734.
Fixes#159736.
Also, improve LDBG() to accept debug type and level in any order, and
add unit-tests for LDBG() and LGDB_OS().
LDBG_OS() is a macro that behaves like LDBG() but instead of directly
using it to stream the output, it takes a callback function that will be
called with a raw_ostream.
This is a re-land with workarounds for older gcc and clang versions.
Previous attempts in #157194 and #158260
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Also, improve LDBG() to accept debug type and level in any order, and
add unit-tests for LDBG() and LGDB_OS().
LDBG_OS() is a macro that behaves like LDBG() but instead of directly
using it to stream the output, it takes a callback function that will be
called with a raw_ostream.
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Also, improve LDBG() to accept debug type and level in any order, and
add unit-tests for LDBG() and LGDB_OS().
LDBG_OS() is a macro that behaves like LDBG() but instead of directly
using it to stream the output, it takes a callback function that will be
called with a raw_ostream.
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
```
warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits]
```
`size_t` is unsigned and always non-negative, whereas `getInt()` returns
a signless `int64_t`. To ensure type compatibility and eliminate the
warning, `dynamicOptionIdx` should be changed to `int64_t`.
Interpret an option value with multiple values, either in the form of an
`ArrayAttr` (either static or passed through a param) or as the multiple
attrs associated to a param, as a comma-separated list, i.e. as a
ListOption on a pass.
Improve ApplyRegisteredPassOp's support for taking options by taking
them as a dict (vs a list of string-valued key-value pairs).
Values of options are provided as either static attributes or as params
(which pass in attributes at interpreter runtime). In either case, the
keys and value attributes are converted to strings and a single
options-string, in the format used on the commandline, is constructed to
pass to the `addToPipeline`-pass API.
Makes it possible to pass around the options to a pass inside a schedule.
The refactoring also makes it so that the pass manager and pass are only
constructed once per `apply()` of the transform op versus for each target
payload given to the op's `apply()`.
This is similar to other configuration objects used across MLIR.
Rename some fields to better reflect that they are no longer booleans.
Reland 04d261101b4f229189463136a794e3e362a793af / #132253.
Print operations are often used for debugging, immediately before the
compiler aborts. In such cases, it is sometimes possible that the output
isn't fully produced yet. Make sure it is by explicitly flushing the
output.
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
This PR addresses a [comment] made by @ftynse about the syntax for
`ForeachOp`. The syntax was modified by @muneebkhan85 in #82792, where
the attribute dictionary was moved to the middle.
This patch moves it back to its original place at the end. And
introduces an optional keyword for `zip_shortest`.
[comment]:
https://github.com/llvm/llvm-project/pull/82792#pullrequestreview-2132814144
This patch enables continuous tiling of a target structured op using
diminishing tile sizes. In cases where the tensor dimensions are not
exactly divisible by the tile size, we are left with leftover tensor
chunks that are irregularly tiled. This approach enables tiling of the
leftover chunk with a smaller tile size and repeats this process
recursively using exponentially diminishing tile sizes. This eventually
generates a chain of loops that apply tiling using diminishing tile
sizes.
Adds `continuous_tile_sizes` op to the transform dialect. This op, when
given a tile size and a dimension, computes a series of diminishing tile
sizes that can be used to tile the target along the given dimension.
Additionally, this op also generates a series of chunk sizes that the
corresponding tile sizes should be applied to along the given dimension.
Adds `multiway` attribute to `transform.structured.split` that enables
multiway splitting of a single target op along the given dimension, as
specified in a list enumerating the chunk sizes.
This patch adds more precise side effects to the current ops with memory
effects, allowing us to determine which OpOperand/OpResult/BlockArgument
the
operation reads or writes, rather than just recording the reading and
writing
of values. This allows for convenient use of precise side effects to
achieve
analysis and optimization.
Related discussions:
https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
Changes transform.foreach's interface to take multiple arguments, e.g.
transform.foreach %ops1, %ops2, %params : ... { ^bb0(%op1, %op2,
%param): BODY } The semantics are that the payloads for these handles
get iterated over as if the payloads have been zipped-up together - BODY
gets executed once for each such tuple. The documentation explains that
this implementation requires that the payloads have the same length.
This change also enables the target argument(s) to be any op/value/param
handle.
The added test cases demonstrate some use cases for this change.
A variable `typeConverterOp` may be nullptr after dynamic cast. There is
a security guard for this, but during logging error message the variable
getting dereferenced.
Found with static analysis.
It may be useful to have access to additional handles or parameters when
performing matches and actions in `foreach_match`, for example, to
parameterize the matcher by rank or restrict it in a non-trivial way.
Enable `foreach_match` to forward additional handles from operands to
matcher symbols and from action symbols to results.
Introduce 3 new optional attributes to the `transform.print` ops:
* `assume_verified`
* `use_local_scope`
* `skip_regions`
The primary motivation is to allow printing on large inputs that
otherwise take forever to print and verify. For the full context, see
this IREE issue: https://github.com/openxla/iree/issues/16901.
Also add some tests and fix the op description.
The original implementation was eagerly reporting silenceable failures
from actions as definite failures. Since silenceable failures are
intended for cases when the IR has not been irreversibly modified, it's
okay to propagate them as silenceable failures of the parent op.
Fixes#86834.
Transform op trait verification calls `getEffects`, and since trait
verification runs before op verification, this call cannot assume the op
to be valid. However, the operand getters now return a `TypedValue` that
unconditionally casts the value to the expected type, leading to an
assertion failure. Use the untyped mechanism instead.
Fixes#84701.