The affine.delinearize_index and affine.linearize_index operations, as
currently defined, require providing a length N basis to [de]linearize N
values. The first value in this basis is never used during lowering and
is unused during lowering. (Note that, even though it isn't used during
lowering it can still be used to, for example, remove length-1 outputs
from a delinearize).
This dead value makes sense in the original context of these operations,
which is linearizing or de-linearizing indexes to memref<>s, vector<>s,
and other shaped types, where that outer bound is avaliable and may be
useful for analysis.
However, other usecases exist where the outer bound is not known. For
example:
%thread_id_x = gpu.thread_id x : index
%0:3 = affine.delinearize_index %thread_id_x into (4, 16) : index,index,
index
In this code, we don't know the upper bound of the thread ID, but we do
want to construct the ?x4x16 grid of delinearized values in order to
further partition the GPU threads.
In order to support such usecases, we broaden the definition of
affine.delinearize_index and affine.linearize_index to make the outer
bound optional.
In the case of affine.delinearize_index, where the number of results is
a function of the size of the passed-in basis, we augment all existing
builders with a `hasOuterBound` argument, which, for backwards
compatibilty and to preserve the natural usage of the op, defaults to
`true`. If this flag is true, the op returns one result per basis
element, if it is false, it returns one extra result in position 0.
We also update existing canonicalization patterns (and move one of them
into the folder) to handle these cases. Note that disagreements about
the outer bound now no longer prevent delinearize/linearize
cancelations.
Fix the AffineIfOp's default builder such that it takes in an
IntegerSetAttr. AffineIfOp has skipDefaultBuilders=1 which effectively
skips the creation of the default AffineIfOp::builder on the C++ side.
(AffineIfOp has two custom OpBuilder defined in the
extraClassDeclaration.) However, on the python side, _affine_ops_gen.py
shows that the default builder is being created, but it does not accept
IntegerSet and thus is useless. This fix at line 411 makes the default
python AffineIfOp builder take in an IntegerSet input and does not
impact the C++ side of things.
This commit makes `affine.delinealize` join other indexing operators,
like `vector.extract`, which store a mixed static/dynamic set of sizes,
offsets, or such. In this case, the `basis` (the set of values that will
be used to decompose the linear index) is now stored as an array of
index attributes where the basis is statically known, eliminating the
need to cretae constants.
This commit also adds copies of the delinearize utility in the affine
dialect to allow it to take an array of `OpFoldResult`s and extends te
DynamicIndexList parser/printer to allow specifying the delimiters in
tablegen (this is needed to avoid breaking existing syntax).
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This PR creates the necessary files to support bindings for operations
in the affine dialect.
This is the first of many PRs which will progressively introduce
affine.load, affine.for, etc operations. I would like to
acknowledge the work by Nelli's author @makslevental :
https://github.com/makslevental/nelli/blob/main/nelli/mlir/affine/affine.py
which jump-starts the work.