This PR makes the winograd.output_transform op a destination-style op
and fixes handling of pre-existing data in its output argument (i.e. an
output possibly pre-initialized with bias, which was discarded before).
---------
Signed-off-by: Dmitriy Smirnov <dmitriy.smirnov@arm.com>
Winograd lowering involves a number of matmul and batch_matmul ops,
which are currently passed a tensor.empty result as their out
parameter. This results in undefined behaviour. This commit adds the
necessary linalg.fill.
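
As a rough illustration of why the fill matters, here is a minimal
plain-Python sketch, not the MLIR lowering itself. It assumes a
destination-style matmul that accumulates into its out argument, which
is how linalg.matmul treats its output operand. The helper names
matmul_into and filled are hypothetical.

```python
def matmul_into(a, b, out):
    """Destination-style matmul: accumulates a @ b into `out` in place."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            for k in range(len(b)):
                out[i][j] += a[i][k] * b[k][j]
    return out


def filled(rows, cols, value=0):
    """Build a rows x cols matrix where every element is `value`."""
    return [[value] * cols for _ in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Zero-filled destination: the result is exactly a @ b.
zero_init = matmul_into(a, b, filled(2, 2, 0))

# The value 100 stands in for whatever an uninitialized buffer might hold;
# the stale contents leak into every element of the result.
stale_init = matmul_into(a, b, filled(2, 2, 100))
```

The same accumulation is what lets a destination pre-initialized with
bias keep that bias, provided the op reads its output operand rather
than overwriting it.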
---------
Co-authored-by: Max191 <44243577+Max191@users.noreply.github.com>
To support conv2d input data of arbitrary size, implement
TilingInterface for the winograd operations. Before converting the
winograd operations into nested loops with matrix multiplications,
first tile the conv2d input into the supported size.
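
The tiling idea can be sketched in one dimension. This is an
illustration of the tiling scheme described in the referenced paper,
not of how TilingInterface slices tensors in MLIR. For F(m, r), each
input tile holds m + r - 1 elements and produces m outputs, so
neighbouring tiles overlap by r - 1 elements. The function name
winograd_tiles_1d is hypothetical, and the sketch assumes the output
length is a multiple of m.

```python
def winograd_tiles_1d(row, m, r):
    """Split a 1-D input into overlapping input tiles for F(m, r).

    Each tile has m + r - 1 elements and yields m outputs, so tile
    starts are m elements apart and tiles overlap by r - 1 elements.
    Assumes the output length, len(row) - r + 1, is a multiple of m.
    """
    alpha = m + r - 1
    out_len = len(row) - r + 1
    if out_len <= 0 or out_len % m != 0:
        raise ValueError("output length must be a positive multiple of m")
    return [row[start:start + alpha] for start in range(0, out_len, m)]


# Ten input elements with F(2, 3): eight outputs, hence four tiles of size 4.
tiles = winograd_tiles_1d(list(range(10)), m=2, r=3)
```

With the same input, F(4, 3) would yield two tiles of six elements
each.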
Add a transform operation structured.decompose_winograd_op to decompose
winograd operations. Before applying the transform op, use
tile_using_for to tile the input data into the supported size. The test
case shows how to tile and decompose winograd operations.
Convert Linalg winograd_filter_transform, winograd_input_transform, and
winograd_output_transform into nested loops with matrix multiplication
with constant transform matrices.
Support several configurations of Winograd Conv2D, including F(2, 3),
F(4, 3), and F(2, 5). These configurations show that the implementation
supports different kernel sizes (3 and 5) and different output sizes
(2 and 4). Besides the symmetric 3x3 and 5x5 kernels, this patch also
supports 1x3, 3x1, 1x5, and 5x1 kernels.
The implementation is based on the paper "Fast Algorithms for
Convolutional Neural Networks" (https://arxiv.org/abs/1509.09308).
Reviewers: ftynse, Max191, GeorgeARM, nicolasvasilache, MaheshRavishankar, dcaballe, rengolin
Reviewed By: ftynse, Max191
Pull Request: https://github.com/llvm/llvm-project/pull/96183
Define high-level winograd operators and convert conv_2d_nhwc_fhwc into
winograd operators. According to the Winograd Conv2D algorithm, we need
three transform operators for input, filter, and output transformation.
The formula of the Winograd Conv2D algorithm is

  Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A

  filter transform: G x g x G^T
  input transform: B^T x d x B
  output transform: A^T x y x A
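
The formula can be checked numerically with a small sketch. It reads
`x` as matrix multiplication and `@` as element-wise multiplication
(an assumption about the notation), and uses the F(2, 3) transform
matrices given in the referenced paper. The function names are
hypothetical, and this is plain Python rather than the Linalg ops
defined by this patch.

```python
from fractions import Fraction as Fr

# F(2, 3) transform matrices from the referenced paper.
B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]
G = [[Fr(1), Fr(0), Fr(0)],
     [Fr(1, 2), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 2), Fr(-1, 2), Fr(1, 2)],
     [Fr(0), Fr(0), Fr(1)]]
A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def winograd_f2_3(d, g):
    """Compute a 2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    u = matmul(matmul(G, g), transpose(G))        # filter transform: G g G^T
    v = matmul(matmul(B_T, d), transpose(B_T))    # input transform: B^T d B
    y = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]  # element-wise
    return matmul(matmul(A_T, y), transpose(A_T))  # output transform: A^T y A


def direct_conv(d, g):
    """Reference result: direct 3x3 sliding-window sum (no filter flip)."""
    return [[sum(d[i + p][j + q] * g[p][q] for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]


d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert winograd_f2_3(d, g) == direct_conv(d, g)
```

Exact fractions are used so the comparison with the direct computation
is not affected by floating-point rounding.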
The implementation is based on the paper "Fast Algorithms for
Convolutional Neural Networks" (https://arxiv.org/abs/1509.09308).
Reviewers: stellaraccident, ftynse, Max191, GeorgeARM, cxy-1993, nicolasvasilache, MaheshRavishankar, dcaballe, rengolin
Reviewed By: ftynse, Max191, stellaraccident
Pull Request: https://github.com/llvm/llvm-project/pull/96181