
### Problem

PR #142944 introduced a new canonicalization pattern which caused failures in the following GPU-related integration tests:

- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate multi-dimensional `vector.from_elements` operations (rank > 1), but the GPU lowering pipelines were not equipped to handle these during the conversion to LLVM.

### Fix

This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:

- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.

Co-authored-by: Yang Bai <yangb@nvidia.com>
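For illustration, a multi-dimensional `vector.from_elements` of the kind the canonicalization can now produce might look like the following sketch (the value names `%a`–`%d` are hypothetical):

```mlir
// A rank-2 (2x2) vector assembled from four scalars. Before this fix,
// the GPU-to-LLVM pipelines had no lowering for the rank > 1 form, so
// conversion would fail; the added patterns unroll it to rank-1
// vector.from_elements plus insertions that the pipelines already handle.
%v = vector.from_elements %a, %b, %c, %d : vector<2x2xf32>
```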