The current StandardToLLVM conversion patterns only really handle
the Func dialect. The pass itself adds patterns for Arithmetic/CFToLLVM, but
those should be/will be split out in a followup. This commit focuses solely
on being an NFC rename.
Aside from the directory change, the pattern and pass creation API have been renamed:
* populateStdToLLVMFuncOpConversionPattern -> populateFuncToLLVMFuncOpConversionPattern
* populateStdToLLVMConversionPatterns -> populateFuncToLLVMConversionPatterns
* createLowerToLLVMPass -> createConvertFuncToLLVMPass
Differential Revision: https://reviews.llvm.org/D120778
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered
This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.
And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.
In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110448
- Move the #define s to the GPU Transform library from GPU Ops so that
SerializeToHsaco is non-trivially compiled
- Add required includes to SerializeToHsaco
- Move MCSubtargetInfo creation to the correct point in the
compilation process
- Change mlir in ROCM tests to account for renamed/moved ops
Differential Revision: https://reviews.llvm.org/D114184
Compiling code for AMD GPUs requires knowledge of which chipset is
being targeted, especially if the code uses chipset-specific
intrinsics (which is the case in a downstream convolution generator).
This commit adds `target`, `chipset` and `features` arguments to the
SerializeToHsaco constructor to enable passing in this required
information.
It also amends the ROCm integration tests to pass in the target
chipset, which is set to the chipset of the first GPU on the system
executing the tests.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D114107
Allow lowering of wmma ops with 64bits indexes. Change the default
version of the test to use default layout.
Differential Revision: https://reviews.llvm.org/D112479
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
* Call `llvm_canonicalize_cmake_booleans` for all CMake options,
which are propagated to `lit.local.cfg` files.
* Use Python native boolean values instead of strings for such options.
This fixes the cases, when CMake variables have values other than `ON` (like `TRUE`).
This might happen due to IDE integration or due to CMake preset usage.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D110073
In order to allow large matmul operations using the MMA ops we need to chain
operations this is not possible unless "DOp" and "COp" type have matching
layout so remove the "DOp" layout and force accumulator and result type to
match.
Added a test for the case where the MMA value is accumulated.
Differential Revision: https://reviews.llvm.org/D103023
Add a test case to test the complete execution of WMMA ops on a Nvidia
GPU with tensor cores. These tests are enabled under
MLIR_RUN_CUDA_TENSOR_CORE_TESTS.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D95334
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.
I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D98447
The commit in question moved some ops across dialects but did not update
some of the target-specific integration tests that use these ops,
presumably because the corresponding target hardware was not available.
Fix these tests.
Change CUDA integration tests to use mlir-opt + mlir-cpu-runner instead.
Depends On D98203
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98396
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203