llvm-project

Author	SHA1	Message	Date
Jared Hoberock	90ec5f2f62	[MLIR][test] Re-disable FileCheck on async.mlir integration test (#190702 ) #190563 re-enabled FileCheck on `Integration/GPU/CUDA/async.mlir`, but the buildbot has shown intermittent wrong-output failures ([example](https://lab.llvm.org/buildbot/#/builders/116/builds/27026)): the test produces `[42, 42]` instead of the expected `[84, 84]`. This wrong-output flakiness is distinct from the cleanup-time `cuModuleUnload` errors that #190563 actually fixes — it's the underlying issue tracked by #170833. The merged commit message for #190563 incorrectly says `Fixes #170833`; that issue should be reopened, since the cleanup-error fix doesn't address the wrong-output behavior. This PR puts the test back in its previously-disabled state. The runtime cleanup fix in #190563 is unaffected.	2026-04-07 01:14:56 +02:00
Jared Hoberock	7087ece044	[MLIR][ExecutionEngine] Tolerate CUDA_ERROR_DEINITIALIZED in mgpuModuleUnload (#190563 ) `mgpuModuleUnload` may be called from a global destructor (registered by `SelectObjectAttr`'s `appendToGlobalDtors`) after the CUDA primary context has already been destroyed during program shutdown. In this case, `cuModuleUnload` returns `CUDA_ERROR_DEINITIALIZED`, which is benign since the module's resources are already freed with the context. ## Reproduction Any program that uses `gpu.launch_func` and is AOT-compiled (via `mlir-translate --mlir-to-llvmir \| llc \| cc -lmlir_cuda_runtime`) will print `'cuModuleUnload(module)' failed with '<unknown>'` on exit. This is because `SelectObjectAttr` registers the module unload as a global destructor, which runs after the CUDA primary context is released. This script reproduces the error message from `mgpuModuleUnload` on my system: ``` #!/bin/bash set -e LLVM_BUILD=${LLVM_BUILD:-$HOME/dev/git/llvm-project-22/build} cat > /tmp/repro.mlir << 'MLIR' func.func @main() { %c1 = arith.constant 1 : index gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1) threads(%tx, %ty, %tz) in (%bsx = %c1, %bsy = %c1, %bsz = %c1) { gpu.terminator } return } MLIR $LLVM_BUILD/bin/mlir-opt /tmp/repro.mlir \ -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin" \ \| $LLVM_BUILD/bin/mlir-translate --mlir-to-llvmir -o /tmp/repro.ll $LLVM_BUILD/bin/llc -relocation-model=pic -filetype=obj /tmp/repro.ll -o /tmp/repro.o cc /tmp/repro.o \ -L$LLVM_BUILD/lib -Wl,-rpath,$LLVM_BUILD/lib \ -lmlir_cuda_runtime -lmlir_runner_utils -o /tmp/repro echo "Running:" /tmp/repro 2>&1 echo "Exit code: $?" 
``` ## Context This matches how other projects handle the same shutdown ordering issue: - Clang CUDA (D48613) switched module cleanup from `__attribute__((destructor))` to `atexit()` - GCC libgomp checks context validity before `cuModuleUnload` - Apache TVM silently ignores `CUDA_ERROR_DEINITIALIZED` on module unload Fixes #170833	2026-04-06 21:11:58 +00:00
Will Froom	f52b2616f4	[mlir][vector] Use non-native runner in gather.mlir test (#187243 ) Fix after https://github.com/llvm/llvm-project/pull/187071	2026-03-18 11:28:14 +00:00
Andrzej Warzyński	9cb9081049	[mlir][vector] Extend vector.gather e2e test (#187071 ) Extend the vector.gather e2e test to cover both available lowering paths: * Direct lowering to LLVM (via -test-lower-to-llvm) * Lowering via vector.load (via -test-vector-gather-lowering) This is a follow-up to https://github.com/llvm/llvm-project/pull/184706, which updated a pattern used by -test-vector-gather-lowering. The test is extended to operate on 2D memrefs so that the changes in https://github.com/llvm/llvm-project/pull/184706 are meaningfully exercised.	2026-03-18 09:23:17 +00:00
Stefan Mada	0769dde7a2	Removed Hardcoded SM Number from Mlir Test (#186917 ) This MR removes a hard-coded SM number from an MLIR test, so the test will not need to be updated in the future. The default value will come from `NVVMOps.td`.	2026-03-17 11:12:52 -07:00
Jakub Kuderski	ade6309229	[mlir][XeGPU] Fix double spaces in tests after ODS printer fix. NFC. (#185324 ) Follow-up to #184253. Update tests that checked for the old double-space output of gpu.block_id using GPU_DimensionAttr. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 20:59:47 -04:00
Jakub Kuderski	15e7177f08	[mlir][GPU] Fix double spaces in tests after ODS printer fix. NFC. (#185325 ) Follow-up to #184253. The ODS attr/type printer fix removed the leading space from generated print() methods. Update tests that checked for the old double-space output of GPU ops using GPU_DimensionAttr and GPU_MmaElementwiseOpAttr. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-08 18:46:54 -04:00
Sang Ik Lee	dcd1bfb0df	[MLIR][XeVM] Mark gpu.printf test with XFAIL. (#184215 ) The gpu.printf test is expected to fail until the vararg handling issue with the SPIR-V backend is resolved.	2026-03-05 23:30:18 +00:00
Erick Ochoa Lopez	613a5c555e	[mlir][vector] Replace OneDimMultiReductionToTwoDim with OneDimMultiReductionToReduction (#184241 ) The `OneDimMultiReductionToTwoDim` pattern had some issues. For the input program: ```mlir func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %acc: f32) -> f32 { %0 = vector.multi_reduction <add>, %arg0, %acc [0] : vector<8xf32> to f32 return %0 : f32 } ``` * when lowering using the inner-parallel strategy, the compiler would essentially produce scalar code: ```mlir func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %arg1: f32) -> f32 { %0 = vector.shape_cast %arg0 : vector<8xf32> to vector<1x8xf32> %1 = vector.broadcast %arg1 : f32 to vector<1xf32> %2 = vector.transpose %0, [1, 0] : vector<1x8xf32> to vector<8x1xf32> %3 = vector.extract %2[0] : vector<1xf32> from vector<8x1xf32> %4 = arith.addf %3, %1 : vector<1xf32> %5 = vector.extract %2[1] : vector<1xf32> from vector<8x1xf32> %6 = arith.addf %5, %4 : vector<1xf32> ... (repeats for all 8 elements) ... %17 = vector.extract %2[7] : vector<1xf32> from vector<8x1xf32> %18 = arith.addf %17, %16 : vector<1xf32> %19 = vector.extract %18[0] : f32 from vector<1xf32> return %19 : f32 } ``` * when lowering using the inner-reduction strategy, the compiler would first unnecessarily transform it into a 2-D multi_reduction operation on <1x8xf32>, then extract an <8xf32> vector and apply a reduction. Canonicalization and folding would lead to the following final result: ```mlir func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %arg1: f32) -> f32 { %0 = vector.reduction <add>, %arg0, %arg1 : vector<8xf32> into f32 return %0 : f32 } ``` Now, after this change: * when lowering, the compiler produces the following for both strategies in one step: 
``` func.func @rank1_multi_reduction(%arg0: vector<8xf32>, %arg1: f32) -> f32 { %0 = vector.reduction <add>, %arg0, %arg1 : vector<8xf32> into f32 return %0 : f32 } ``` This pattern is also useful for an ongoing refactoring that is happening in the multi_reduction patterns. It is the only pattern that increases multi_reduction in rank and would lead to an infinite loop when attempting to reach a fixed point once we generalize other unrolling patterns. Assisted-by: Claude	2026-03-04 16:13:11 +00:00
Adam Siemieniuk	0fff939c1a	[mlir][linalg] Lower unpack - capture handle to created copy op (#183744 ) Adds the created copy op, which was previously missing, to the unpack lowering results. The corresponding transform op is also updated with the new result value.	2026-03-03 08:26:04 +01:00
Adam Siemieniuk	e44fd05035	[mlir][x86] Move AMX dialect into X86 dialect (#183717 ) Unifies the two dialects that define x86 operations into a single one. The AMX dialect is moved into X86 in line with other x86 extensions. Following the dialect renaming, the X86 dialect is now a suitable home for a wider range of operations targeting specific hardware features. Moving the AMX definitions to the X86 dialect creates a single, centralized hub for defining all x86 intrinsic-like operations. The new grouping aims to eliminate the need for new dialects as new hardware extensions become available. The two dialects are simply merged together. X86 dialect refactoring will be addressed separately. List of changes: - operations: 'amx.tile_' => 'x86.amx.tile_' - types: '!amx.tile' => '!x86.amx.tile' - namespace: 'mlir::amx' => 'mlir::x86::amx' - test define: 'MLIR_RUN_AMX_TESTS' => 'MLIR_RUN_X86_AMX_TESTS' - vector lowering: AMX is enabled by default together with X86 The MLIR AMX tests are now nested under the X86 directory. To enable AMX integration tests, 'MLIR_RUN_X86_TESTS' must also be defined.	2026-03-02 11:47:30 +01:00
Adam Siemieniuk	67ac275fee	[mlir][x86] Rename x86vector to x86 (#183311 ) Renames the 'x86vector' dialect to 'x86'. This is the first PR in a series of cleanups around dialects targeting x86 platforms. The new naming scheme is shorter, cleaner, and opens the possibility of integrating other x86-specific operations that do not strictly fit a pure vector representation. For example, the generalization will allow for a future merger of the AMX dialect into the x86 dialect to create a one-stop collection of x86 operations and boost discoverability.	2026-02-26 11:21:58 +01:00
Erick Ochoa Lopez	eeb6b394c5	[mlir][vector] remove lower_multi_reduction (#182332 ) * Removes `ApplyLowerMultiReductionPatternsOp` (`apply_patterns.vector.lower_multi_reduction`) * Updates uses of `apply_patterns.vector.lower_multi_reduction` in tests to use: * reorder_and_expand_multi_reduction_dims * multi_reduction_flattening * multi_reduction_unrolling * Removes `populateVectorMultiReductionLoweringPatterns` (unused)	2026-02-20 08:33:24 -05:00
Sang Ik Lee	f481bf1031	[MLIR][XeVM] Update cache control values and metadata format. (#175274 ) Fix incorrect cache control metadata values and format.	2026-02-18 13:14:16 -08:00
Sang Ik Lee	a51bc254bc	[MLIR][XeGPU] Add LANE level integration test without XeGPU ops. (#181891 ) The XeGPU LANE level integration tests lack a test that does not use any XeGPU dialect ops. Add an integration test without XeGPU dialect ops.	2026-02-18 10:22:47 -08:00
Zichen Lu	fbffdaa174	[MLIR][GPU] Update serializeToObject to use SerializedObject wrapper and include ISA compiler logs (#176697 ) This PR makes the compilation log from the ISA compiler available to users by returning it as part of the `gpu::ObjectAttr` properties, following the existing pattern of `LLVMIRToISATimeInMs`. Currently, the compiler log (which contains useful information such as spill statistics when --verbose is passed) is only accessible in debug builds via `LLVM_DEBUG`. However, there are good reasons to make this information available in release builds as well: 1. Both `ptxas` and `libnvptxcompiler` are publicly available tools/libraries distributed with the CUDA Toolkit. The `--verbose` flag and its output are documented public features, not internal debug information. 2. The verbose output provides valuable insights for users. A new `SerializedObject` class is used to carry the metadata alongside the binary when returning from `serializeToObject`.	2026-01-30 12:56:20 +01:00
Maksim Levental	47f9e0ab2a	[mlir][math] Add vector support for math-to-apfloat (#172715 ) This PR adds vector type support to `math-to-apfloat`.	2026-01-16 10:14:00 -08:00
Durgadoss R	22271c9e76	[MLIR][NVVM][Tests] Re-enable matmul.py tests (#175728 ) This patch re-enables the matmul.py tests: * Fix gpu.wait usages * Fix gpu.launchOp usage * Fix format-string for gpu.printf * Fix verification failure by removing the block[0] append. This is now done by the python script's init. * Fix the runtime error by adding the missing initialize() call during JIT. * Add the missing waitGroup(0) for _ws implementation. This was mistakenly removed in PR #113713. Without this fix, I see timing issues and the _ws tests with stage>1 randomly show output mismatch. With all these fixes, the test compiles and executes successfully on an sm90a machine. (locally verified for 1K iterations) Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2026-01-16 10:57:54 +05:30
Maksim Levental	ad5be31c30	[mlir][Python] fix NV examples after #172892 (#174481 )	2026-01-05 21:47:35 +00:00
Durgadoss R	6778f0d483	[MLIR][NVVM][Tests]: Update FileCheck primitives (#173252 ) This patch updates a few FileCheck primitives for the TMA test to use CHECK-PTX-DAG instead of CHECK-PTX to accommodate a slightly different ordering of basic blocks. The dump-ptx integration test fails when the PTX is generated through nvcc (intermediates) from the public toolkit. This patch fixes it by allowing regex strings from both backends. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-12-23 00:01:25 +05:30
Sang Ik Lee	528caf99a7	[MLIR] Fix GPU integration tests for SYCL and LevelZero runtime. (#171718 ) i1 type load / store lowering no longer works for SPIR-V kernels. Rewrite the test cases so that they do not use i1 load / store.	2025-12-19 09:10:44 -08:00
Maksim Levental	54eee1e947	Reapply "[mlir][math] Add FP software implementation lowering pass: math-to-apfloat" (#172714 ) (#172716 ) Reapply https://github.com/llvm/llvm-project/pull/171221 - Fix builder by linking `MLIRTransformUtils`. Also move headers to `mlir/Conversion/ArithAndMathToAPFloat`.	2025-12-17 17:26:37 -08:00
Jianhui Li	2b9e47749c	[MLIR][XeGPU] Refactor Layout access interface (#172125 ) This PR builds on the anchor layout mechanism introduced in https://github.com/llvm/llvm-project/pull/169267 and performs the following refactoring: 1. Introduce getAnchorLayout() and setAnchorLayout() interface for anchor ops to get and set layout attributes. 2. Add getLocalLayout() and setLocalLayout() utility functions, and refactor workgroup/subgroup distribution patterns to use these APIs. These utilities access the layout information directly and locally, without relying on global propagation. 3. Introduce localPropagateLayoutsFromAnchor(), a utility used by subgroup distribution to unify non-anchor layout setup. This function is intended to be invoked upfront by all layout-based passes (including workgroup/subgroup distribution and unrolling) to propagate layouts from anchor ops to non-anchor ops. After this step, patterns within the pass should exclusively use getLocalLayout() / setLocalLayout(). 4. Refactor getDistributeLayoutAttr() and setDistributeLayoutAttr() to remove special-case handling. These APIs now operate in a uniform order: anchor ops first, then non-anchor ops, and finally block arguments. These APIs will be deprecated in the long run. 5. Refactor patterns in the wg/sg distribution and load optimization passes to use get/setAnchorLayout() and get/setLocalLayout(). 6. Update test cases to enforce that anchor ops must use, and only use, anchor layouts.	2025-12-17 12:04:58 -08:00
Maksim Levental	621fe03eaa	Revert "[mlir][math] Add FP software implementation lowering pass: math-to-apfloat" (#172714 ) Reverts llvm/llvm-project#171221 Broken builder https://lab.llvm.org/buildbot/#/builders/138/builds/23270	2025-12-17 10:52:43 -08:00
Maksim Levental	7f1a30ebd2	[mlir][math] Add FP software implementation lowering pass: math-to-apfloat (#171221 ) Add APFloat software implementation for `math.fma`, `math.abs`, `math.isnan`, `math.isfinite`, `math.isinf`, `math.isnormal` for reduced precision (`fp4`, `fp6`, `fp8*`).	2025-12-17 18:37:13 +00:00
Maksim Levental	8d5ade8feb	[mlir] enable APFloatWrappers on MacOS (#172070 )	2025-12-12 11:34:23 -08:00
Sang Ik Lee	b8ddbc4f03	[MLIR][XeVM] gpu.printf test: use correct runtime. (#170754 ) The gpu.printf test was not using the runtime required by lit.local.cfg. All other tests in the directory correctly use the Level Zero runtime, but the gpu.printf test was using the SYCL runtime.	2025-12-08 08:14:56 -08:00
Matthias Springer	8378a6fa4f	[mlir][arith] Fix build after #171024 (#171057 ) Fix build after #171024.	2025-12-07 21:48:00 +01:00
Matthias Springer	5dbd049662	[mlir][arith] `arith-to-apfloat`: Add vector support (#171024 ) Add support for vectorized operations such as `arith.addf ... : vector<4xf4E2M1FN>`. The computation is scalarized: scalar operands are extracted with `vector.to_elements`, multiple scalar computations are performed and the result is inserted back into a vector with `vector.from_elements`.	2025-12-07 20:55:48 +01:00
Mehdi Amini	d02471ae5e	[MLIR] Partially disable test/Integration/GPU/CUDA/async.mlir This test is flaky, needs investigation. See #170833	2025-12-05 03:54:04 -08:00
Sohaib Iftikhar	b31a398bcf	[MLIR][NVVM] Fix wmma test after d3edc94d (#170659 ) See discussion on #169061	2025-12-04 13:37:29 +00:00
Sang Ik Lee	c379f7cc01	[MLIR][XeGPU] Add integration with XeGPU load / store ops to / from memref subview. (#170385 ) Add an XeGPU integration test for a missing use case: base memory from a memref subview.	2025-12-03 09:22:18 -08:00
Jakub Kuderski	ad656d3a19	[mlir][linalg][arm] Fix use of fill in arm integration tests (#170143 ) Follow up to https://github.com/llvm/llvm-project/pull/169567#issuecomment-3596220014	2025-12-01 14:19:07 +00:00
Ryan Holt	b27301ff5d	[mlir][linalg] Re-enable linalg runtime verification test (#170129 ) Test seems to pass after re-enabling without any additional changes.	2025-12-01 08:52:20 -05:00
Giacomo Castiglioni	d3edc94d11	[MLIR][GPU] subgroup_mma fp64 extension - take 2 (#169061 ) This PR re-lands #165873. This PR extends the gpu.subgroup_mma_* ops to support the fp64 type. The extension requires special handling during the lowering to nvvm due to the return type for load ops for fragment a and b (they return a scalar instead of a struct). The original PR did not guard the new test based on the required architecture (sm80), which led to a failure on the CUDA runners with T4 GPUs.	2025-12-01 07:39:59 -05:00
Matthias Springer	147c466bcd	[mlir][arith] Add support for min/max to `ArithToAPFloat` (#169760 ) Add support for `arith.minnumf`, `arith.maxnumf`, `arith.minimumf`, `arith.maximumf`.	2025-12-01 08:50:02 +00:00
Matthias Springer	05b1989551	[mlir][arith] Add support for `negf` to `ArithToAPFloat` (#169759 ) Add support for `arith.negf`.	2025-12-01 08:28:23 +00:00
Matthias Springer	4d7abe5355	[mlir][arith] Add support for `cmpf` to `ArithToAPFloat` (#169753 ) Add support for `arith.cmpf`.	2025-12-01 09:12:11 +01:00
Jakub Kuderski	0bd2f12753	[mlir][linalg] Restrict fill initial value type to output element type (#169567 ) Disallow implicit casting, which is surprising, and, IME, usually indicative of copy-paste errors. Because the initial value must be a scalar, I don't expect this to affect any data movement.	2025-11-30 09:51:37 -05:00
Matthias Springer	6ec686735c	[mlir][arith] Add support for `sitofp`, `uitofp` to `ArithToAPFloat` (#169284 ) Add support for `arith.sitofp` and `arith.uitofp`.	2025-11-25 11:31:23 +09:00
Matthias Springer	3db8ed0500	[mlir][arith] Add support for `fptosi`, `fptoui` to `ArithToAPFloat` (#169277 ) Add support for `arith.fptosi` and `arith.fptoui`.	2025-11-25 10:50:20 +09:00
Matthias Springer	78994706d8	[mlir][arith] Add support for `extf`, `truncf` to `ArithToAPFloat` (#169275 ) Add support for `arith.extf` and `arith.truncf`. No support for custom rounding modes yet.	2025-11-25 10:09:26 +09:00
Fabian Mora	8c3f59f1b2	Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049 ) Reverts llvm/llvm-project#165873 The revert is triggered by a failing integration test on a couple of buildbots.	2025-11-21 10:02:59 -05:00
Giacomo Castiglioni	49995b2af0	[MLIR][GPU] subgroup_mma fp64 extension (#165873 ) This PR extends the `gpu.subgroup_mma_*` ops to support fp64 type. The extension requires special handling during the lowering to `nvvm` due to the return type for load ops for fragment a and b (they return a scalar instead of a struct).	2025-11-21 09:07:43 -05:00
Matthias Springer	951ab04d6c	[mlir][NVVM] Add no-rollback option to NVVM lowering passes (#168477 ) Add pass options to run lowerings to NVVM without pattern rollback. This makes the dialect conversions easier to debug and improves performance/memory usage.	2025-11-18 13:47:28 +08:00
Matthias Springer	7a53d33e7c	[mlir] Add FP software implementation lowering pass: `arith-to-apfloat` (#167848 ) Reland pass and fix linker errors. --------- Co-authored-by: Maksim Levental <maksim.levental@gmail.com>	2025-11-13 18:35:30 +09:00
Maksim Levental	140e07c862	Revert "Reland yet again: [mlir] Add FP software implementation lowering pass: `arith-to-apfloat`" (#167834 ) Reverts llvm/llvm-project#167608 Broken builder https://lab.llvm.org/buildbot/#/builders/52/builds/12781	2025-11-12 23:02:21 -08:00
Matthias Springer	73e70e0c88	[mlir][linalg] Fix Linalg runtime verification test (#167814 ) This integration test has been broken for a while. This commit partially fixes it. - Use `CHECK` + `CHECK-NEXT` to ensure that the correct error lines are matched together. - Move all `CHECK-NOT` to the end. Having a `CHECK` with the same string does not make sense after a `CHECK-NOT`. - Add a missing `CHECK: ERROR` for one of the test cases. - Deactivate `reverse_from_3`, which is broken, and put a TODO.	2025-11-13 12:40:43 +09:00
Maksim Levental	0bba1e7658	Reland yet again: [mlir] Add FP software implementation lowering pass: `arith-to-apfloat` (#167608 ) Fix both symbol visibility issue in the mlir_apfloat_wrappers lib and the linkage issue in ArithToAPFloat.	2025-11-12 17:57:53 -08:00
Hanumanth	81964597f9	[mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices (#166569 ) I hit another runtime verification issue (similar to https://github.com/llvm/llvm-project/pull/164878) while working with TFLite models. The verifier is incorrectly rejecting `tensor.extract_slice` operations when extracting an empty slice (size=0) that starts exactly at the tensor boundary. The current runtime verification unconditionally enforces `offset < dim_size`. This makes sense for non-empty slices, but it's too strict for empty slices, causing false positives that lead to spurious runtime assertions. Simple example that demonstrates the issue: ```mlir func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) { // When called with: tensor size=10, offset=10, size=0 // Runtime verification fails: "offset 0 is out-of-bounds" %slice = tensor.extract_slice %tensor[%offset] [%size] [1] : tensor<?xf32> to tensor<?xf32> return } ``` For the above example, the check evaluates `10 < 10` which is false, so verification fails. However, I believe this operation should be valid - we're extracting zero elements, so there's no actual out-of-bounds access. Real-world repro from the TensorFlow Lite models: This issue manifests while lowering TFLite models and a lot of our system tests are failing due to this. Here's a simplified version showing the problematic pattern: In this code, `%extracted_slice_0` becomes an empty tensor when SSA value `%15` reaches 10 (on the final loop iteration), making `%16 = 0`. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification. 
```mlir func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> { %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %c2 = arith.constant 2 : index %c10 = arith.constant 10 : index %c-1 = arith.constant -1 : index %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32> %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32> %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32> %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32> %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32> %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32> %c4_1 = tosa.const_shape {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1> %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) { %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1> %extracted = tensor.extract %7[] : tensor<i1> scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32> } do { ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>): %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32> // First slice %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32> %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32> %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32> %10 = index.casts %extracted_0 : i32 to index %11 = arith.cmpi eq, %10, %c-1 : index %12 = arith.select %11, %c10, %10 : index %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32> // Second slice - this is where the failure occurs %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32> %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32> 
%extracted_1 = tensor.extract %14[%c0] : tensor<3xi32> %15 = index.castu %extracted_1 : i32 to index %16 = arith.subi %c10, %15 : index // size = 10 - offset %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32> %17 = index.castu %extracted_2 : i32 to index %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32> %18 = index.castu %extracted_3 : i32 to index // On the last loop iteration: %15=10, %16=0 // %extracted_slice_0 becomes an empty tensor // Runtime verification fails: "offset 0 is out-of-bounds" %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x4x1xf32> %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32> scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32> } return %6#1 : tensor<10x4x1xf32> } ``` The fix: Make the offset check conditional on slice size: - Empty slice (size == 0): allow `0 <= offset <= dim_size` - Non-empty slice (size > 0): require `0 <= offset < dim_size` Question for reviewers: Should we also relax the static verifier to allow this edge case? Currently, the static verifier rejects the following IR: ```mlir %tensor = arith.constant dense<1.0> : tensor<10xf32> %slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32> ``` Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes. P.S. We have a similar issue with `memref.subview`. I will send a separate patch for the issue. Co-authored-by: Hanumanth Hanumantharayappa <hhanuman@ah-hhanuman-l.dhcp.mathworks.com>	2025-11-12 08:37:15 +09:00
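
The conditional offset check described in the last entry (#166569) can be modeled in a few lines. This is only an illustrative Python sketch of the rule as stated in that commit message, not the MLIR implementation; the helper name `offset_in_bounds` and its parameters are hypothetical.

```python
def offset_in_bounds(offset: int, size: int, dim_size: int) -> bool:
    """Model the relaxed runtime offset check for a single dimension.

    Hypothetical helper, not part of MLIR: it mirrors the rule stated in the
    commit message for tensor.extract_slice runtime verification.
    """
    if size == 0:
        # Empty slice: no element is read, so the offset may sit exactly
        # at the end of the dimension.
        return 0 <= offset <= dim_size
    # Non-empty slice: the first element accessed must exist.
    return 0 <= offset < dim_size


# The case from the commit message: tensor size 10, offset 10, slice size 0.
print(offset_in_bounds(offset=10, size=0, dim_size=10))  # True
# The same offset with a non-empty slice is still rejected.
print(offset_in_bounds(offset=10, size=1, dim_size=10))  # False
```

The sketch covers only the offset check for one dimension; any other checks the runtime verifier performs are outside its scope.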

1068 Commits