31 Commits

Author SHA1 Message Date
Thomas Raoux
7efdc117b1 [mlir][nvvm] Add lowering of gpu.printf to nvvm
When converting to nvvm lowering gpu.printf to vprintf allows us to
support printing when running on cuda.

Differential Revision: https://reviews.llvm.org/D141049
2023-01-06 17:29:30 +00:00
Ivan Butygin
befd167050 [mlir][gpu] Fix cuda integration tests
https://reviews.llvm.org/D138758 has added `uniform` flag to gpu reduce ops, update integration tests.

Differential Revision: https://reviews.llvm.org/D140014
2022-12-14 14:01:00 +01:00
Navdeep Katel
3d35546cd1 Support transpose mode for gpu.subgroup WMMA ops
Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode as well. Update the GPU to NVVM lowerings to support
`transpose` mode and update integration tests as well.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D139021
2022-12-05 22:37:02 +05:30
rkayaith
13bd410962 [mlir][Pass] Include anchor op in -pass-pipeline
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.

This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D134900
2022-11-03 11:36:12 -04:00
rkayaith
1c0f541a4d [mlir] Don't mix -pass-pipeline with other pass options
These are test updates required for D135745, which disallows mixing
`-pass-pipeline` and the individual `-pass-name` options.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D135746
2022-11-02 12:10:51 -04:00
Christian Sigg
0f2ec35691 [MLIR] Switch lit tests to %mlir_lib_dir and %mlir_src_dir replacements.
The old replacements will be removed soon:
- `%linalg_test_lib_dir`
- `%cuda_wrapper_library_dir`
- `%spirv_wrapper_library_dir`
- `%vulkan_wrapper_library_dir`
- `%mlir_runner_utils_dir`
- `%mlir_integration_test_dir`

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D133270
2022-09-06 12:34:14 +02:00
Krzysztof Drewniak
c2fc8d9b95 [mlir][GPU] Allow bare pointer memrefs when calling GPU kernels
In the ROCm runtime (and probably CUDA as well), all kernel arguments
are aligned. Therefore, enable using bare pointers for memref
arguments to kernels when these memrefs have static shape and a
trivial layout.

This is a substantial optimization to launching kernels that use
memrefs with known, static sizes, since it causes the kernel launch
packet to no longer include information already known to the kernel,
which can enable packing the kernel launch arguments into launch
packets instead of having to allocate an entire separate structure to
hold unneeded memref information.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D130716
2022-08-02 20:58:34 +00:00
Krzysztof Drewniak
d6ef3d20b4 [mlir] Remove VectorToROCDL
Between issues such as
https://github.com/llvm/llvm-project/issues/56323, the fact that this
lowering (unlike the code in amdgpu-to-rocdl) does not correctly set
up bounds checks (and thus will cause page faults on reads that might
need to be padded instead), and that fixing these problems would,
essentially, involve replicating amdgpu-to-rocdl, remove
--vector-to-rocdl for being broken. In addition, the lowering does not
support many aspects of transfer_{read,write}, like supervectors, and
may not work correctly in their presence.

We (the MLIR-based convolution generator at AMD) do not use this
conversion pass, nor are we aware of any other clients.

Migration strategies:
- Use VectorToLLVM
- If buffer ops are particularly needed in your application, use
amdgpu.raw_buffer_{load,store}

A VectorToAMDGPU pass may be introduced in the future.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D129308
2022-07-12 15:21:22 +00:00
Stella Stamenova
d4555698f8 [mlir] Fix the names of exported functions
The names of the functions that are supposed to be exported do not match the implementations. This is due in part to cac7aabbd8.

This change makes the implementations and declarations match and adds a couple missing declarations.

The new names follow the pattern of the existing `verify` functions where the prefix is maintained as `_mlir_ciface_` but the suffix follows the new naming convention.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D124891
2022-05-05 13:46:15 -07:00
River Riddle
87db8e4439 [mlir][NFC] Update textual references of func to func.func in Integration tests
The special case parsing of `func` operations is being removed.
2022-04-20 22:17:29 -07:00
River Riddle
5a7b919409 [mlir][NFC] Rename StandardToLLVM to FuncToLLVM
The current StandardToLLVM conversion patterns only really handle
the Func dialect. The pass itself adds patterns for Arithmetic/CFToLLVM, but
those should be/will be split out in a followup. This commit focuses solely
on being an NFC rename.

Aside from the directory change, the pattern and pass creation API have been renamed:
 * populateStdToLLVMFuncOpConversionPattern -> populateFuncToLLVMFuncOpConversionPattern
 * populateStdToLLVMConversionPatterns -> populateFuncToLLVMConversionPatterns
 * createLowerToLLVMPass -> createConvertFuncToLLVMPass

Differential Revision: https://reviews.llvm.org/D120778
2022-03-07 11:25:23 -08:00
River Riddle
ace01605e0 [mlir] Split out a new ControlFlow dialect from Standard
This dialect is intended to model lower level/branch based control-flow constructs. The initial set
of operations are: AssertOp, BranchOp, CondBranchOp, SwitchOp; all split out from the current
standard dialect.

See https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061

Differential Revision: https://reviews.llvm.org/D118966
2022-02-06 14:51:16 -08:00
Mogball
aae5125550 [mlir] Replace StrEnumAttr -> EnumAttr in core dialects
Removes uses of `StrEnumAttr` in core dialects

Reviewed By: mehdi_amini, rriddle

Differential Revision: https://reviews.llvm.org/D117514
2022-01-18 17:15:00 +00:00
Krzysztof Drewniak
e1da62910e [MLIR][GPU] Define gpu.printf op and its lowerings
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448
2021-12-09 15:54:31 +00:00
Krzysztof Drewniak
f849640a0c [MLIR] Make the ROCM integration tests runnable
- Move the #define s to the GPU Transform library from GPU Ops so that
SerializeToHsaco is non-trivially compiled

- Add required includes to SerializeToHsaco

- Move MCSubtargetInfo creation to the correct point in the
compilation process

- Change mlir in ROCM tests to account for renamed/moved ops

Differential Revision: https://reviews.llvm.org/D114184
2021-11-19 17:09:53 +00:00
Krzysztof Drewniak
fb1a06aa13 [MLIR][GPU] Add target arguments to SerializeToHsaco
Compiling code for AMD GPUs requires knowledge of which chipset is
being targeted, especially if the code uses chipset-specific
intrinsics (which is the case in a downstream convolution generator).
This commit adds `target`, `chipset` and `features` arguments to the
SerializeToHsaco constructor to enable passing in this required
information.

It also amends the ROCm integration tests to pass in the target
chipset, which is set to the chipset of the first GPU on the system
executing the tests.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114107
2021-11-18 16:28:44 +00:00
thomasraoux
eacd6e1ebe [mlir][GPUtoNVVM] Relax restriction on wmma op lowering
Allow lowering of wmma ops with 64bits indexes. Change the default
version of the test to use default layout.

Differential Revision: https://reviews.llvm.org/D112479
2021-10-27 21:31:55 -07:00
Mogball
a54f4eae0e [MLIR] Replace std ops with arith dialect ops
Precursor: https://reviews.llvm.org/D110200

Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.

Renamed all instances of operations in the codebase and in tests.

Reviewed By: rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D110797
2021-10-13 03:07:03 +00:00
Vladislav Vinogradov
505afd1e64 [mlir] Clean up boolean flags usage in LIT tests
* Call `llvm_canonicalize_cmake_booleans` for all CMake options,
  which are propagated to `lit.local.cfg` files.
* Use Python native boolean values instead of strings for such options.

This fixes the cases, when CMake variables have values other than `ON` (like `TRUE`).
This might happen due to IDE integration or due to CMake preset usage.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D110073
2021-10-12 11:44:48 +03:00
thomasraoux
750799b7bc [mlir][NFC] Don't outline kernel in MMA integration tests
This matches better how other gpu integration tests are done.

Differential Revision: https://reviews.llvm.org/D103099
2021-05-27 09:43:54 -07:00
thomasraoux
b44007bec2 [mlir][gpu] Relax restriction on MMA store op to allow chain of mma ops.
In order to allow large matmul operations using the MMA ops we need to chain
operations this is not possible unless "DOp" and "COp" type have matching
layout so remove the "DOp" layout and force accumulator and result type to
match.
Added a test for the case where the MMA value is accumulated.

Differential Revision: https://reviews.llvm.org/D103023
2021-05-27 09:13:51 -07:00
thomasraoux
dae9038611 [mlir] Lower sm version for TensorCore intergration tests
Those tests only require sm70, this allows to run those integration
tests on more hardware.

Differential Revision: https://reviews.llvm.org/D103049
2021-05-24 14:45:24 -07:00
Navdeep Kumar
e552fa28da [MLIR][GPU] Add CUDA Tensor core WMMA test
Add a test case to test the complete execution of WMMA ops on a Nvidia
GPU with tensor cores. These tests are enabled under
MLIR_RUN_CUDA_TENSOR_CORE_TESTS.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D95334
2021-05-22 16:19:36 +05:30
Eugene Zhulenev
8a316b00d6 [mlir] Convert async dialect passes from function passes to op agnostic passes
Differential Revision: https://reviews.llvm.org/D100401
2021-04-13 11:46:00 -07:00
thomasraoux
3587728ed5 [mlir] Fix cuda integration test failure 2021-03-19 10:33:55 -07:00
Christian Sigg
a825fb2c07 [mlir] Remove mlir-rocm-runner
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.

I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.

Reviewed By: whchung

Differential Revision: https://reviews.llvm.org/D98447
2021-03-19 00:24:10 -07:00
thomasraoux
1a572f4509 [mlir] Add vector op support to cuda-runner including vector.print
Differential Revision: https://reviews.llvm.org/D97346
2021-03-18 13:03:08 -07:00
Alex Zinenko
7aa6f3aa0c [mlir] fix integration tests post e2310704d890ad252aeb1ca28b4b84d29514b1d1
The commit in question moved some ops across dialects but did not update
some of the target-specific integration tests that use these ops,
presumably because the corresponding target hardware was not available.
Fix these tests.
2021-03-15 14:41:27 +01:00
Christian Sigg
1ef544d4a9 [mlir] Remove mlir-cuda-runner
Change CUDA integration tests to use mlir-opt + mlir-cpu-runner instead.

Depends On D98203

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D98396
2021-03-12 14:06:43 +01:00
Christian Sigg
2224221fb3 [mlir] Add NVVM to CUBIN conversion to mlir-opt
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.

The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.

Depends On D98279

Reviewed By: herhut, rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D98203
2021-03-11 10:07:11 +01:00
Christian Sigg
9d7be77bf9 [mlir] Move cuda tests
Move test inputs to test/Integration directory.
Move runtime wrappers to ExecutionEngine.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D97463
2021-03-03 13:16:51 +01:00