llvm-project

Author	SHA1	Message	Date
Jared Hoberock	90ec5f2f62	[MLIR][test] Re-disable FileCheck on async.mlir integration test (#190702 ) #190563 re-enabled FileCheck on `Integration/GPU/CUDA/async.mlir`, but the buildbot has shown intermittent wrong-output failures ([example](https://lab.llvm.org/buildbot/#/builders/116/builds/27026)): the test produces `[42, 42]` instead of the expected `[84, 84]`. This wrong-output flakiness is distinct from the cleanup-time `cuModuleUnload` errors that #190563 actually fixes — it's the underlying issue tracked by #170833. The merged commit message for #190563 incorrectly says `Fixes #170833`; that issue should be reopened, since the cleanup-error fix doesn't address the wrong-output behavior. This PR puts the test back in its previously-disabled state. The runtime cleanup fix in #190563 is unaffected.	2026-04-07 01:14:56 +02:00
Jared Hoberock	7087ece044	[MLIR][ExecutionEngine] Tolerate CUDA_ERROR_DEINITIALIZED in mgpuModuleUnload (#190563 )
`mgpuModuleUnload` may be called from a global destructor (registered by `SelectObjectAttr`'s `appendToGlobalDtors`) after the CUDA primary context has already been destroyed during program shutdown. In this case, `cuModuleUnload` returns `CUDA_ERROR_DEINITIALIZED`, which is benign since the module's resources are already freed with the context.
## Reproduction
Any program that uses `gpu.launch_func` and is AOT-compiled (via `mlir-translate --mlir-to-llvmir | llc | cc -lmlir_cuda_runtime`) will print `'cuModuleUnload(module)' failed with '<unknown>'` on exit. This is because `SelectObjectAttr` registers the module unload as a global destructor, which runs after the CUDA primary context is released. This script reproduces the error message from `mgpuModuleUnload` on my system:
```
#!/bin/bash
set -e
LLVM_BUILD=${LLVM_BUILD:-$HOME/dev/git/llvm-project-22/build}

cat > /tmp/repro.mlir << 'MLIR'
func.func @main() {
  %c1 = arith.constant 1 : index
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
             threads(%tx, %ty, %tz) in (%bsx = %c1, %bsy = %c1, %bsz = %c1) {
    gpu.terminator
  }
  return
}
MLIR

$LLVM_BUILD/bin/mlir-opt /tmp/repro.mlir \
  -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin" \
  | $LLVM_BUILD/bin/mlir-translate --mlir-to-llvmir -o /tmp/repro.ll
$LLVM_BUILD/bin/llc -relocation-model=pic -filetype=obj /tmp/repro.ll -o /tmp/repro.o
cc /tmp/repro.o \
  -L$LLVM_BUILD/lib -Wl,-rpath,$LLVM_BUILD/lib \
  -lmlir_cuda_runtime -lmlir_runner_utils -o /tmp/repro

echo "Running:"
/tmp/repro 2>&1
echo "Exit code: $?"
```
## Context
This matches how other projects handle the same shutdown ordering issue:
- Clang CUDA (D48613) switched module cleanup from `__attribute__((destructor))` to `atexit()`
- GCC libgomp checks context validity before `cuModuleUnload`
- Apache TVM silently ignores `CUDA_ERROR_DEINITIALIZED` on module unload
Fixes #170833	2026-04-06 21:11:58 +00:00
Stefan Mada	0769dde7a2	Removed Hardcoded SM Number from Mlir Test (#186917 ) This MR removes a hard-coded compute number in an MLIR test. This will allow the test to not need to be updated in the future. The default value will come from `NVVMOps.td`.	2026-03-17 11:12:52 -07:00
Jakub Kuderski	15e7177f08	[mlir][GPU] Fix double spaces in tests after ODS printer fix. NFC. (#185325 ) Follow-up to #184253. The ODS attr/type printer fix removed the leading space from generated print() methods. Update tests that checked for the old double-space output of GPU ops using GPU_DimensionAttr and GPU_MmaElementwiseOpAttr. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 18:46:54 -04:00
Zichen Lu	fbffdaa174	[MLIR][GPU] Update serializeToObject to use SerializedObject wrapper and include ISA compiler logs (#176697 ) This PR makes the compilation log from ISA compiler available to users by returning it as part of the `gpu::ObjectAttr` properties, following the existing pattern like `LLVMIRToISATimeInMs`. Currently, the compiler log (which contains useful information such as spill statistics when --verbose is passed) is only accessible in debug builds via `LLVM_DEBUG`. However, there are good reasons to make this information available in release builds as well: 1. Both `ptxas` and `libnvptxcompiler` are publicly available tools/libraries distributed with the CUDA Toolkit. The `--verbose` flag and its output are documented public features, not internal debug information. 2. The verbose output provides valuable insights for users. A new `SerializedObject` class is used to carry the metadata alongside the binary when returning from `serializeObject`.	2026-01-30 12:56:20 +01:00
Durgadoss R	22271c9e76	[MLIR][NVVM][Tests] Re-enable matmul.py tests (#175728 ) This patch re-enables the matmul.py tests: * Fix gpu.wait usages * Fix gpu.launchOp usage * Fix format-string for gpu.printf * Fix verification failure by removing the block[0] append. This is now done by the python script's init. * Fix the runtime error by adding the missing initialize() call during JIT. * Add the missing waitGroup(0) for _ws implementation. This was mistakenly removed in PR #113713. Without this fix, I see timing issues and the _ws tests with stage>1 randomly show output mismatch. With all these fixes, the test compiles and executes successfully on an sm90a machine. (locally verified for 1K iterations) Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2026-01-16 10:57:54 +05:30
Maksim Levental	ad5be31c30	[mlir][Python] fix NV examples after #172892 (#174481 )	2026-01-05 21:47:35 +00:00
Durgadoss R	6778f0d483	[MLIR][NVVM][Tests]: Update FileCheck primitives (#173252 ) This patch updates a few FileCheck primitives for the TMA test to use CHECK-PTX-DAG instead of CHECK-PTX to accommodate a slightly different ordering of BB's. The dump-ptx integration test fails when the PTX is generated through nvcc (intermediates) from public toolkit. This patch fixes it by allowing regex strings from both the backends. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-12-23 00:01:25 +05:30
Sang Ik Lee	528caf99a7	[MLIR] Fix GPU integration tests for SYCL and LevelZero runtime. (#171718 ) i1 type load / store lowering no longer works for SPIR-V kernels. Rewrite the test cases so that they do not use i1 load / store.	2025-12-19 09:10:44 -08:00
Mehdi Amini	d02471ae5e	[MLIR] Partially disable test/Integration/GPU/CUDA/async.mlir This test is flaky, needs investigation. See #170833	2025-12-05 03:54:04 -08:00
Sohaib Iftikhar	b31a398bcf	[MLIR][NVVM] Fix wmma test after d3edc94d (#170659 ) See discussion on #169061	2025-12-04 13:37:29 +00:00
Giacomo Castiglioni	d3edc94d11	[MLIR][GPU] subgroup_mma fp64 extension - take 2 (#169061 ) This PR re-lands #165873. This PR extends the gpu.subgroup_mma_* ops to support fp64 type. The extension requires special handling during the lowering to nvvm due to the return type for load ops for fragment a and b (they return a scalar instead of a struct). The original PR did not guard the new test based on the required architecture (sm80), which led to a failure on the cuda runners with T4 GPUs.	2025-12-01 07:39:59 -05:00
Fabian Mora	8c3f59f1b2	Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049 ) Reverts llvm/llvm-project#165873 The revert is triggered by a failing integration test on a couple of buildbots.	2025-11-21 10:02:59 -05:00
Giacomo Castiglioni	49995b2af0	[MLIR][GPU] subgroup_mma fp64 extension (#165873 ) This PR extends the `gpu.subgroup_mma_*` ops to support fp64 type. The extension requires special handling during the lowering to `nvvm` due to the return type for load ops for fragment a and b (they return a scalar instead of a struct).	2025-11-21 09:07:43 -05:00
Matthias Springer	951ab04d6c	[mlir][NVVM] Add no-rollback option to NVVM lowering passes (#168477 ) Add pass options to run lowerings to NVVM without pattern rollback. This makes the dialect conversions easier to debug and improves performance/memory usage.	2025-11-18 13:47:28 +08:00
Erick Ochoa Lopez	55d4d2ee0d	[mlir][spirv] Fix test (NFC) (#163413 ) This test had a CHECK-RAW command. The intention behind this command appears to be to avoid using the regular expression matching capabilities. However, this was interpreted as a comment by FileCheck. In order to check for literal strings, the {LITERAL} modifier should be used. https://llvm.org/docs/CommandGuide/FileCheck.html#directive-modifiers	2025-10-14 12:09:27 -04:00
Durgadoss R	fa366b4e9f	[MLIR][NVVM] Update TMA Load Op (#156347 ) This patch includes im2col and gather mode support for the TMA Load Op. The lowering is also updated to intrinsics except when a Predicate is given. This completes the Blackwell additions on this Op. * NVVM Dialect has support for Shared::Cluster address-space now. So, this patch also updates the Op to use AS(7) instead of AS(3). The corresponding inline-ptx based unit tests are also updated. * lit tests are added for all combinations. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-09-23 13:03:35 +05:30
lonely eagle	a579278312	[mlir][nvgpu] Fix nvgpu integration test (#154748 ) Fix nvgpu mlir file integration test. This PR fixes the bug by removing memref.get_global and then using memref.view.	2025-08-25 17:04:01 +08:00
Md Abdullah Shahneous Bari	281e6d2cc4	[mlir][ExecutionEngine] Add LevelZeroRuntimeWrapper. (#151038 ) Adds LevelZeroRuntime wrapper and tests. Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com> Co-authored-by: Nishant Patel <nishant.b.patel@intel.com>	2025-08-06 16:48:59 -05:00
Diego Caballero	33465bb2bb	[mlir][Vector] Remove `vector.extractelement` and `vector.insertelement` ops (#149603 ) This PR removes `vector.extractelement` and `vector.insertelement` ops from the code base in favor of the `vector.extract` and `vector.insert` counterparts. See RFC: https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops	2025-07-28 11:01:14 -07:00
lorenzo chelini	c5613dc863	[MLIR] Mark LLVM::FMAOp as legal (#144671 ) Mark LLVM::FMAOp as legal in configureGpuToNVVMConversionLegality, since we can handle intrinsic lowering in the NVPTX backend and emit fma.rn.f32.	2025-06-18 15:49:00 +02:00
Han-Chung Wang	77e2e3f641	[mlir][memref] Update tests to use memref.assume_alignment properly. (#142358 ) With `ffb9bbfd07`, memref.assume_alignment op returns a result value. The revision updates the tests to reflect the change: - Update all the lit tests to use the result of memref.assume_alignment, if it is present. - Capture the result of the op in lit tests. --------- Signed-off-by: hanhanW <hanhan0912@gmail.com>	2025-06-02 07:57:36 -07:00
Sang Ik Lee	3fa65dee14	[mlir] SYCL runtime wrapper: add memcpy support. (#141647 )	2025-05-28 11:33:15 -07:00
Mehdi Amini	c3858e55f4	[MLIR] Fix test run line: use `env` to set environment variable	2025-04-27 14:48:13 -07:00
Christian Sigg	7851b1bcf1	[mlir][gpu] Change GPU modules to globals (#135478 ) Load/unload GPU modules in global ctors/dtors instead of each time when launching a kernel. Loading GPU modules is a heavy-weight operation and synchronizes the GPU context. Now that the modules are loaded ahead of time, asynchronously launched kernels can run concurrently, see https://discourse.llvm.org/t/how-to-lower-the-combination-of-async-gpu-ops-in-gpu-dialect. The implementations of `embedBinary()` and `launchKernel()` use slightly different mechanics at the moment but I prefer to not change the latter more than necessary as part of this PR. I will prepare a follow-up NFC for `launchKernel()` to align them again.	2025-04-22 13:49:58 +02:00
Qinkun Bao	91d2ecf0d5	[NFC] Fix some typos in libc and mlir comments (#133374 )	2025-03-28 15:52:37 -04:00
Guray Ozen	e8dfd70fe2	[MLIR][NVGPU] Use `gpu.dynamic_shared_memory` in tests (#133122 ) Reland #133051	2025-03-26 18:00:22 +01:00
Karlo Basioli	3f82c3d5a8	Revert "[MLIR][NVGPU] Use `gpu.dynamic_shared_memory` in tests" (#133103 ) Reverts llvm/llvm-project#133051 due to failing integration tests	2025-03-26 15:39:14 +00:00
Guray Ozen	15f5a7a3ec	[MLIR][NVGPU] Use `gpu.dynamic_shared_memory` in tests (#133051 ) The `memref.subview` ops in the test case were incorrect: they extracted out-of-bounds.	2025-03-26 14:32:04 +01:00
Guray Ozen	837b89fc0f	[MLIR][NVVM] Add `ptxas-cmd-options` to pass flags to the downstream compiler (#127457 ) This PR adds `cmd-options` to the `gpu-lower-to-nvvm-pipeline` pipeline and the `nvvm-attach-target` pass, allowing users to pass flags to the downstream compiler, ptxas. Example: ``` mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_80 ptxas-cmd-options='-v --register-usage-level=8'" ```	2025-02-17 12:09:27 +01:00
Andrea Faulds	eb206e9ea8	[mlir] Rename mlir-cpu-runner to mlir-runner (#123776 ) With the removal of mlir-vulkan-runner (as part of #73457) in e7e3c45bc70904e24e2b3221ac8521e67eb84668, mlir-cpu-runner is now the only runner for all CPU and GPU targets, and the "cpu" name has been misleading for some time already. This commit renames it to mlir-runner.	2025-01-24 14:08:38 +01:00
Andrea Faulds	e7e3c45bc7	[mlir] Remove mlir-vulkan-runner and GPUToVulkan conversion passes (#123750 ) This follows up on 733be4ed7dcf976719f424c0cb81b77a14f91f5a, which made mlir-vulkan-runner and its associated passes redundant, and completes the main goal of #73457. The mlir-vulkan-runner tests become part of the integration test suite, and the Vulkan runner runtime components become part of ExecutionEngine, just as was done when removing other target-specific runners.	2025-01-21 16:51:27 +01:00
Benjamin Kramer	0d7022ed75	[MLIR][GPU] Fix gpu.printf test syntax after f50f9698ad012882df8dd605f5482e280c138266	2025-01-08 15:17:39 +01:00
Guray Ozen	f50f9698ad	[MLIR][GPU] Fix gpu.printf (#121940 )	2025-01-08 08:25:57 +01:00
Matthias Springer	599c739905	[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431 ) This commit adds an NVIDIA-specific lowering of `cf.assert` to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and `getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can be reused.	2025-01-06 12:00:11 +01:00
Matthias Springer	0dc086a787	[mlir] Fix integration tests after #120580 (#120729 ) This commit should have been part of #120580.	2024-12-20 14:01:23 +01:00
Matthias Springer	b03a09e74f	[mlir] Fix integration tests after #120548 (#120706 ) This should have been part of #120548.	2024-12-20 11:03:33 +01:00
Andrea Faulds	0e39b1348e	[mlir] Remove the mlir-spirv-cpu-runner (move to mlir-cpu-runner) (#114563 ) This commit builds on and completes the work done in 9f6c632ecda08bfff76b798c46d5d7cfde57b5e9 to eliminate the need for a separate mlir-spirv-cpu-runner binary. Since the MLIR processing is already done outside this runner, the only real difference between it and the mlir-cpu-runner is the final linking step between the nested LLVM IR modules. By moving this step into mlir-cpu-runner behind a new command-line flag (`--link-nested-modules`), this commit is able to completely remove the runner component of the mlir-spirv-cpu-runner. The runtime libraries and the tests are moved and renamed to fit into the Execution Engine and Integration tests, following the model of the similar migration done for the CUDA Runner in D97463.	2024-11-08 08:01:52 -05:00
Durgadoss R	13d6233e77	[MLIR][NVGPU] Fix nvgpu_arrive syntax in matmulBuilder.py (#113713 ) This patch updates the syntax for nvgpu_arrive Op in matmulBuilder.py. This fixes the compilation error for this test. For the warp-specialized matmul_kernel implementation, removing the WaitGroupSyncOp (after the mma-main-loop) fixes the hang observed. With these two fixes, the test compiles and executes successfully on an sm90a machine. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-10-26 11:15:50 +05:30
Durgadoss R	a8b5115441	[MLIR][NVGPU] Fix the cga_cluster.mlir test (#112191 ) This patch fixes the sm90 cluster test by: * Fixing a typo in LowerGpuOpsToNVVMOps where one of the ClusterDim Op conversion patterns should actually be for the ClusterDimBlocks Op. This addresses the compilation error for this test. * The grid-size should be (4,4,1) instead of (2,2,1). This passes the scf-if check against the threshold of 3 below and actually generates the required prints from the GPU. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-10-14 19:44:13 +05:30
Durgadoss R	9432f7074c	[MLIR][NVGPU-Tests] Fix a failing sm90 test (#111731 ) The memref.expand_shape explicitly takes an output_shape now. This patch adds it to the Op and fixes the failing test. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-10-10 10:51:59 +05:30
Matthias Springer	8e33ff7d56	[mlir][GPU][NFC] Move `dump-ptx.mlir` test case (#111142 )	2024-10-04 15:13:20 +02:00
Guray Ozen	816134b333	[MLIR] Dump sass (#110227 ) This PR dumps SASS by using nvdisasm.	2024-09-27 13:52:15 +02:00
Christian Sigg	b4ac5c4b7c	[mlir][cuda] NFC: Remove accidentally committed 'asd' file. (#105491 ) Co-authored-by: Christian Sigg <chsigg@users.noreply.github.com>	2024-08-22 10:52:50 +02:00
Guray Ozen	f2251f93ab	[mlir][gpu] Add mlir_c_runner_utils to fix #99035 This fixes the unit test that is broken in #99035.	2024-07-17 09:23:32 +02:00
Guray Ozen	20861f1f2f	[mlir][gpu] Use alloc OP's `host_shared` in cuda runtime (#99035 )	2024-07-17 07:25:11 +02:00
Matthias Springer	7775be4d48	[mlir] Fix GPU integration test (part 2) (#98918 ) Fix tests that were broken by #97903.	2024-07-15 17:39:16 +02:00
Matthias Springer	6469faf9fd	[mlir] Fix GPU integration test (#98917 ) Fix tests that were broken by #97903.	2024-07-15 17:30:04 +02:00
Guray Ozen	f8ff909471	[mlir][gpu] Add py binding for AsyncTokenType (#96466 ) The PR adds py binding for `AsyncTokenType`	2024-06-24 11:39:22 +02:00
Pradeep Kumar	bd6568c98a	[MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops (#95245 ) This commit adds support for `gpu.cluster_dim_blocks` and `gpu.cluster_block_id` Ops to represent number of blocks per cluster and block id inside a cluster respectively. Also, fixed the description of `gpu.cluster_dim` Op and updated the `cga_cluster.mlir` test file to use `gpu.cluster_dim_blocks` Co-authored-by: pradeepku <pradeepku@nvidia.com> Co-authored-by: Guray Ozen <guray.ozen@gmail.com>	2024-06-14 10:35:35 +05:30


133 Commits