The `conv_2d_ngchw_fgchw` op implements 2-D grouped convolution with
dimensions ordered as given in the name. However, the current
implementation orders the weights as `gfchw` instead of `fgchw`. This was
already pointed out in an old Phabricator revision that never landed:
https://reviews.llvm.org/D150064
This patch:
1) Adds a new op `conv_2d_ngchw_gfchw`
2) Fixes the dimension ordering of the old op `conv_2d_ngchw_fgchw`
3) Adds tests with static dimensions so that the ops are easier to
understand.
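
For illustration, the difference between the two weight layouts can be sketched in plain Python. This is only a reference model, not the MLIR implementation: the helper names `conv_2d_ngchw` and `fgchw_to_gfchw` are hypothetical, and the sketch assumes an output layout of (n, g, f, oh, ow), unit strides, no dilation and no padding.

```python
def conv_2d_ngchw(inp, ker, kernel_layout):
    """Reference grouped 2-D convolution on nested lists (illustration only).

    Assumptions: input is (n, g, c, h, w), output is (n, g, f, oh, ow),
    unit strides, no dilation, no padding.
    """
    if kernel_layout == "fgchw":
        num_f = len(ker)  # filters per group is the outermost dimension
        weights = lambda g, f: ker[f][g]
    elif kernel_layout == "gfchw":
        num_f = len(ker[0])  # group is outermost, filters come second
        weights = lambda g, f: ker[g][f]
    else:
        raise ValueError(f"unknown kernel layout: {kernel_layout}")

    N, G, C = len(inp), len(inp[0]), len(inp[0][0])
    H, W = len(inp[0][0][0]), len(inp[0][0][0][0])
    KH, KW = len(weights(0, 0)[0]), len(weights(0, 0)[0][0])
    OH, OW = H - KH + 1, W - KW + 1

    out = [[[[[0] * OW for _ in range(OH)] for _ in range(num_f)]
            for _ in range(G)] for _ in range(N)]
    for n in range(N):
        for g in range(G):
            for f in range(num_f):
                w = weights(g, f)  # (c, kh, kw) slice for this group/filter
                for oh in range(OH):
                    for ow in range(OW):
                        acc = 0
                        for c in range(C):
                            for kh in range(KH):
                                for kw in range(KW):
                                    acc += inp[n][g][c][oh + kh][ow + kw] * w[c][kh][kw]
                        out[n][g][f][oh][ow] = acc
    return out


def fgchw_to_gfchw(ker):
    """Swap the two leading kernel dimensions: (f, g, ...) -> (g, f, ...)."""
    return [[ker[f][g] for f in range(len(ker))] for g in range(len(ker[0]))]


# One batch, two groups, one channel per group, 2x2 images.
inp = [[[[[1, 2], [3, 4]]], [[[5, 6], [7, 8]]]]]
# Three filters per group, 1x1 kernels, stored as (f, g, c, kh, kw).
ker_fgchw = [[[[[10 * f + g]]] for g in range(2)] for f in range(3)]

# Both layouts describe the same convolution once the weights are transposed.
assert conv_2d_ngchw(inp, ker_fgchw, "fgchw") == \
    conv_2d_ngchw(inp, fgchw_to_gfchw(ker_fgchw), "gfchw")
```

The two layouts differ only in which of the two leading kernel dimensions is indexed first, so feeding `gfchw` weights to an op named `fgchw` silently swaps the group and filter indices.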