This PR improves the documentation for the `gpu-lower-to-nvvm-pipeline`
(as it was remaning item for #75775)
- Changes pipeline `gpu-lower-to-nvvm` -> `gpu-lower-to-nvvm-pipeline`
- Adds a section in GPU Dialect in website. It clarifies the pipeline's
functionality in lowering primary dialects to NVVM targets.
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.
This PR updates the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves it within `InitAllPasses.h`. The aim is to call it from
Python, also having a standardize compilation process for nvvm.
The revert happened due to a build bot failure that threw 'CUDA_ERROR_UNSUPPORTED_PTX_VERSION'.
The failure's root cause was a pass using "+ptx76" for compilation and an old CUDA driver
on the bot. This commit relands the patch with "+ptx60".
Original Gh PR: #65768
Original commit message:
Migrate tests referencing `gpu-to-cubin` to the new compilation workflow
using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified
to use the new compilation workflow to simplify the introduction of
future tests.
The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't
allow for passing all options available in the `ConvertGpuOpsToNVVMOp`
pass.
Migrate tests referencing `gpu-to-cubin` to the new compilation workflow
using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified
to use the new compilation workflow to simplify the introduction of
future tests.
The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't
allow for passing all options available in the `ConvertGpuOpsToNVVMOp`
pass.
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print punctuation <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
This reverts commit 3875804a0725c6490b4c0e76e1c0e1e0dbccedf4.
This caused some test failures for the MLIR python bindings. Reverting
until those are addressed.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
This is an ongoing series of commits that are reformatting our
Python code.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Differential Revision: https://reviews.llvm.org/D150782
This reverts commit 5561e174117ff395d65b6978d04b62c1a1275138
The logic was moved from cmake into lit fixing the issue that lead to the revert and potentially others with multi-config cmake generators
Differential Revision: https://reviews.llvm.org/D143925
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically speaking, the JIT support for Windows has been lacking behind, but recent versions of ORC JIT have now caught up and works for basically all examples in repo.
Sadly due to these tests previously not working on Windows, basically all of them are making unix-like assumptions about things like filenames, paths, shell syntax etc.
This patch fixes all these issues in one big swoop and enables Windows support for the vast majority of integration tests.
More specifically, following changes had to be done:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific path and filename of these runtime libraries away by making the paths to the runtime libraries be passed from cmake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to prefixed using `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`
Some tests remain disabled on Windows as before:
* In SparseTensor some tests have non-trivial logic for finding the runtime libraries which seems to be required for the use of emulators. I do not have the time to port these so I simply kept them disabled
* Some tests requiring special hardware which I simply cannot test remain disabled on Windows. These include usage of AVX512 or AMX
The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well and so do the vast majority of `mlir-cpu-runner`.
Differential Revision: https://reviews.llvm.org/D143925
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.
This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D134900
These are test updates required for D135745, which disallows mixing
`-pass-pipeline` and the individual `-pass-name` options.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D135746
Add a pattern to do ad hoc lowering of vector.reduction to a sequence of
warp shuffles. This allow distributing reduction on a warp for GPU targets.
Also add an execution test for warp reduction.
co-authored with @springerm
Differential Revision: https://reviews.llvm.org/D127176
Add patterns to propagate vector distribution and remove dead
arguments. This handles propagation for several vector operations.
recommit after minor bug fix.
Differential Revision: https://reviews.llvm.org/D127167
Add patterns to propagate vector distribution and remove dead
arguments. This handles propagation for several vector operations.
Differential Revision: https://reviews.llvm.org/D127167
Add lowering of the vector.warp_execute_on_lane_0 into scf.if plus memory
transfer for the operands and yield values.
This also add an integration test running on GPU warp. The same tests can be
later re-used with different comment lines to tests distribution
transformations.
This is mostly from @springerm contribution.
Differential Revision: https://reviews.llvm.org/D125430