21 Commits

Author SHA1 Message Date
Benjamin Maxwell
f36e909da0 [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print punctuation <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-11 09:29:54 +00:00
Mehdi Amini
1b272d21c8 Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors"
This reverts commit 490dae26cb3bee2e8401e4c2a7ad3e0996be67d0.

Bot is broken, seems like there is a problem of ambiguity in the parser.
2023-08-09 19:37:01 -07:00
Benjamin Maxwell
490dae26cb [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-09 11:47:18 +00:00
Benjamin Maxwell
b160442dd2 Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors"
This reverts commit 3875804a0725c6490b4c0e76e1c0e1e0dbccedf4.

This caused some test failures for the MLIR python bindings. Reverting
until those are addressed.
2023-08-09 09:54:05 +00:00
Benjamin Maxwell
3875804a07 [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-09 09:38:05 +00:00
Daniil Dudkin
8be07adfb4 [mlir][LLVM] Introduce reduction intrinsics for minimum/maximum
This patch adds supports for the reduction intrinsic
for floating point minimum and maximum that have
been added to LLVM by https://reviews.llvm.org/D152370.

Related to: #63969

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D155869
2023-07-22 16:25:32 +03:00
Tobias Hieta
f9008e6366
[NFC][Py Reformat] Reformat python files in mlir subdir
This is an ongoing series of commits that are reformatting our
Python code.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Differential Revision: https://reviews.llvm.org/D150782
2023-05-26 08:05:40 +02:00
Brandon Myers
85fe8e01a0 [mlir] Add mlir::LLVM::FastmathFlags to LLVM instrinsic vector reductions
Rationale:
The LLVM dialect supports passing fastmath flags from floating point ops to LLVMIR instructions. However, not all LLVM ops have the required attribute. This change adds support for fastmath flags to `llvm.intr.vector.reduce.{fmin,fmax}`. One scenario where this is useful is in lowering llvm.intr.vector.reduce.{fmax,fmin} to LLVMIR with `nnan` (NoNans) flag so it may be [[ 115c7beda7/llvm/lib/CodeGen/ExpandReductions.cpp (L159) | lowered to a shuffle reduction ]].

Changes:

  - Make `LLVM_VecReductionF` implement the `FastmathFlagsInterface`; change is modeled on `LLVM_UnaryIntrOpF`
  - Add an assembly format for `LLVM_VecReductionF` ops. The purpose is to keep existing functionality: avoid printing the fastmath flags attribute when it has its default value (`none`). Change is modeled on `LLVM_UnaryIntrOpBase`

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D145692
2023-03-10 09:14:16 -08:00
Markus Böck
9048ea28da Reland "[mlir] Make the vast majority of intgration and runner tests work on Windows"
This reverts commit 5561e174117ff395d65b6978d04b62c1a1275138

The logic was moved from cmake into lit fixing the issue that lead to the revert and potentially others with multi-config cmake generators

Differential Revision: https://reviews.llvm.org/D143925
2023-02-15 19:14:43 +01:00
Aart Bik
5561e17411 Revert "[mlir] Make the vast majority of integration and runner tests work on Windows"
This reverts commit 161b9d741a3c25f7bd79620598c5a2acf3f0f377.

REASON:

cmake --build . --target check-mlir-integration

Failed Tests (186):
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-addi-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-cmpi-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-compare-results-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-constants-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-max-min-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-muli-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-shli-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-shrsi-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-shrui-i16.mlir
  MLIR :: Integration/Dialect/Async/CPU/microbench-linalg-async-parallel-for.mlir
  MLIR :: Integration/Dialect/Async/CPU/microbench-scf-async-parallel-for.mlir
  MLIR :: Integration/Dialect/Async/CPU/test-async-parallel-for-1d.mlir
  MLIR :: Integration/Dialect/Async/CPU/test-async-parallel-for-2d.mlir
  MLIR :: Integration/Dialect/Complex/CPU/correctness.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/test-vector-reductions-fp.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/test-vector-reductions-int.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/matmul-vs-matvec.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/rank-reducing-subview.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-collapse-tensor.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-1d-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-1d-nwc-wcf-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-2d-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-2d-nhwc-hwcf-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-3d-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-3d-ndhwc-dhwcf-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-elementwise.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-expand-tensor.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-one-shot-bufferize.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-padtensor.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-subtensor-insert-multiple-uses.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-subtensor-insert.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-tensor-e2e.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-tensor-matmul.mlir
  MLIR :: Integration/Dialect/Memref/cast-runtime-verification.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/concatenate.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/dense_output.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/dense_output_bf16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/dense_output_f16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_abs.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_binary.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_cast.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_codegen_dim.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_codegen_foreach.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_complex32.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_complex64.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_complex_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_constant_to_sparse_tensor.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_1d_nwc_wcf.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_2d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_nhwc_hwcf.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_3d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_ndhwc_dhwcf.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_dyn.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_ptr.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_sparse2dense.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_sparse2sparse.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_dot.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_expand.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_file_io.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_filter_conv2d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_flatten.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_foreach_slices.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_index.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_index_dense.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_insert_1d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_insert_2d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_insert_3d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_matmul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_matrix_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_matvec.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_mttkrp.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_out_mult_elt.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_out_reduction.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_out_simple.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_pack.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_quantized_matmul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_re_im.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom_prod.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reductions.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reductions_prod.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reshape.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_rewrite_push_back.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort_coo.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sampled_matmul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sampled_mm_fusion.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_scale.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_scf_nested.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_select.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sign.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sorted_coo.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_spmm.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_storage.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum_bf16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum_c32.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum_f16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_tanh.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_tensor_mul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_tensor_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_unary.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_vector_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/python/test_SDDMM.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_SpMM.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_elementwise_add_sparse_output.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_output.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_stress.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_MTTKRP.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_SDDMM.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_SpMM.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_SpMV.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_Tensor.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_scalar_tensor_algebra.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_simple_tensor_algebra.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_tensor_complex.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_tensor_types.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_tensor_unary_ops.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_true_dense_tensor_algebra.py
  MLIR :: Integration/Dialect/SparseTensor/taco/unit_test_tensor_core.py
  MLIR :: Integration/Dialect/SparseTensor/taco/unit_test_tensor_io.py
  MLIR :: Integration/Dialect/SparseTensor/taco/unit_test_tensor_utils.py
  MLIR :: Integration/Dialect/Standard/CPU/test-ceil-floor-pos-neg.mlir
  MLIR :: Integration/Dialect/Standard/CPU/test_subview.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-mulf-full.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-mulf.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-muli-ext.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-muli-full.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-muli.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-tilezero-block.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-tilezero.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-dot.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-inline-asm-vector-avx512.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-mask-compress.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-rsqrt.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-sparse-dot-product.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-vp2intersect-i32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-0-d-vectors.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-broadcast.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-compress.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-constant-mask.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-contraction.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-create-mask-v4i1.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-create-mask.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-expand.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-extract-strided-slice.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-flat-transpose-col.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-flat-transpose-row.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-fma.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-gather.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-index-vectors.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-insert-strided-slice.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-maskedload.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-maskedstore.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-matrix-multiply-col.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-matrix-multiply-row.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-outerproduct-f32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-outerproduct-i64.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-print-int.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-realloc.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f32-reassoc.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f64-reassoc.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f64.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-i32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-i4.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-i64.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-si4.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-ui4.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-scan.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-scatter.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-shape-cast.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-shuffle.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-sparse-dot-matvec.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-sparse-saxpy-jagged-matvec.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read-3d.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-to-loops.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-write.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transpose.mlir

Testing Time: 0.29s
  Unsupported:  31
  Passed     :   5
  Failed     : 186

Differential Revision: https://reviews.llvm.org/D143970
2023-02-13 18:30:52 -08:00
Markus Böck
161b9d741a [mlir] Make the vast majority of integration and runner tests work on Windows
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically speaking, the JIT support for Windows has been lacking behind, but recent versions of ORC JIT have now caught up and works for basically all examples in repo.

Sadly due to these tests previously not working on Windows, basically all of them are making unix-like assumptions about things like filenames, paths, shell syntax etc.
This patch fixes all these issues in one big swoop and enables Windows support for the vast majority of integration tests.

More specifically, following changes had to be done:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific path and filename of these runtime libraries away by making the paths to the runtime libraries be passed from cmake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to prefixed using `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`

Some tests remain disabled on Windows as before:
* In SparseTensor some tests have non-trivial logic for finding the runtime libraries which seems to be required for the use of emulators. I do not have the time to port these so I simply kept them disabled
* Some tests requiring special hardware which I simply cannot test remain disabled on Windows. These include usage of AVX512 or AMX

The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well and so do the vast majority of `mlir-cpu-runner`.

Differential Revision: https://reviews.llvm.org/D143925
2023-02-13 22:24:20 +01:00
Quentin Colombet
cb4ccd38fa [mlir][Conversion] Rename the MemRefToLLVM pass
Since the recent MemRef refactoring that centralizes the lowering of
complex MemRef operations outside of the conversion framework, the
MemRefToLLVM pass doesn't directly convert these complex operations.

Instead, to fully convert the whole MemRef dialect space, MemRefToLLVM
needs to run after `expand-strided-metadata`.

Make this more obvious by changing the name of the pass and the option
associated with it from `convert-memref-to-llvm` to
`finalize-memref-to-llvm`.
The word "finalize" conveys that this pass needs to run after something
else and that something else is documented in its tablegen description.

This is a follow-up patch related to the conversation at:
https://discourse.llvm.org/t/psa-you-need-to-run-expand-strided-metadata-before-memref-to-llvm-now/66956/14

Differential Revision: https://reviews.llvm.org/D142463
2023-01-27 09:10:10 +00:00
zhanghb97
ee82b864f2 [mlir] Initial MLIR VP intrinsic integration test on host and RVV emulator.
This patch adds the initial VP intrinsic integration test on the host backend and RVV emulator. Please see more detailed [discussion on the discourse](https://discourse.llvm.org/t/mlir-vp-ops-on-rvv-backend-integration-test-and-issues-report/66343).

- Run the test cases on the host by configuring the CMake option: `-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`
- Build the RVV environment and run the test cases on RVV QEMU by [this doc](https://gist.github.com/zhanghb97/ad44407e169de298911b8a4235e68497).

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D137816
2022-11-22 17:30:24 +08:00
Christian Sigg
0f2ec35691 [MLIR] Switch lit tests to %mlir_lib_dir and %mlir_src_dir replacements.
The old replacements will be removed soon:
- `%linalg_test_lib_dir`
- `%cuda_wrapper_library_dir`
- `%spirv_wrapper_library_dir`
- `%vulkan_wrapper_library_dir`
- `%mlir_runner_utils_dir`
- `%mlir_integration_test_dir`

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D133270
2022-09-06 12:34:14 +02:00
Jeff Niu
491dd5a3b2 [mlir][LLVMIR] Fix syntax in integration tests (NFC) 2022-08-18 13:09:34 -04:00
Jeff Niu
b2ccfb4d95 [mlir][LLVMIR] Change ShuffleVectorOp to use assembly format
This patch moves `LLVM::ShuffleVectorOp` to assembly format and in the
process drops the extra type that can be inferred (both operand types
are required to be the same) and switches to a dense integer array.

The syntax change:

```
// Before
%0 = llvm.shufflevector %0, %1 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xf32>, vector<4xf32>
// After
%0 = llvm.shufflevector %0, %1 [0, 0, 0, 0] : vector<4xf32>
```

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D132038
2022-08-18 12:46:04 -04:00
Aart Bik
d926b3307e [mlir] add complex type to getZeroAttr
Fixes issue encountered with <sparse> complex constant
https://github.com/llvm/llvm-project/issues/56428

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D129325
2022-07-07 16:58:59 -07:00
Nicolas Vasilache
b2729fda60 [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Differential Revision: https://reviews.llvm.org/D114393
2021-11-23 07:31:22 +00:00
Mehdi Amini
e0b7bee7cf Revert "[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)"
This reverts commit a9e236bed835c58be381dadb973a1db0681e4795.
This broke the Windows build:

mlir\include\mlir/Dialect/X86Vector/Transforms.h(28): error C2061: syntax error: identifier 'uint'
2021-11-22 19:23:18 +00:00
Nicolas Vasilache
a9e236bed8 [mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:

```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Reviewed By: ftynse, dcaballe

Differential Revision: https://reviews.llvm.org/D114335
2021-11-22 10:32:34 +00:00
Mehdi Amini
99b0032ce0 Move the MLIR integration tests as a subdirectory of test (NFC)
This does not change the behavior directly: the tests only run when
`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON` is configured. However running
`ninja check-mlir` will not run all the tests within a single
lit invocation. The previous behavior would wait for all the integration
tests to complete before starting to run the first regular test. The
test results were also reported separately. This change is unifying all
of this and allow concurrent execution of the integration tests with
regular non-regression and unit-tests.

Differential Revision: https://reviews.llvm.org/D97241
2021-02-23 05:55:47 +00:00