llvm-project

Author	SHA1	Message	Date
Jackson Stogel	7ccd92e5e6	[mlir][python] Disable pytype not-yet-supported error on Buffer import (#189440 ) For Python versions < 3.12, pytype complains that: ``` error: in <module>: collections.abc.Buffer not supported yet [not-supported-yet] from collections.abc import Buffer as _Buffer ``` Since this code appears intended to support Python < 3.12, disable the type error on this line.	2026-03-30 11:21:35 -07:00
Alexis Engelke	7581430722	[IR] Require well-formed IR for BasicBlock::getTerminator (#189416 ) BasicBlock::getTerminator() is frequently called on valid IR, yet the function has to check that the last instruction is in fact a terminator, even in release builds. This check can only be optimized away when the instruction is dereferenced. Therefore, introduce the functions hasTerminator() and getTerminatorOrNull() as replacement and require (assert) that getTerminator() always returns a valid terminator. As a side effect, this forces explicit expression of intent at call sites when unfinished basic blocks should be supported.	2026-03-30 18:57:37 +02:00
Nishant Patel	ad4d4c0f63	[MLIR][XeGPU] Support leading unit dims in vector.multi_reduction in sg to wi pass (#188767 ) This PR adds support for transforming vector.multi_reduction on vectors of rank greater than 2 that have leading unit dims.	2026-03-30 09:29:20 -07:00
jeanPerier	9a8c018081	[mlir][acc] add VariableInfo attribute to thread language specific information about privatized variables (#186368 ) Add a new acc::VariableInfoAttr attribute that can be extended and implemented by language dialects to carry language-specific information about variables that is not reflected in the MLIR type system and is needed in the implementation of the init/copy/destroy APIs. A new genPrivateVariableInfo API is added to the MappableTypeInterface to generate such an attribute from an mlir::Value for the host variable. The use case and motivation is the Fortran OPTIONAL attribute. This patch adds a new fir::OpenACCFortranVariableInfoAtt that implements the acc::VariableInfoAttr to carry the OPTIONAL information around.	2026-03-30 16:03:14 +02:00
Mehdi Amini	79a7b57a44	[mlir][memref] Fix invalid folds in ReinterpretCastOpConstantFolder for negative constants (#189237 ) `ReinterpretCastOpConstantFolder` could fold `memref.reinterpret_cast` ops whose offset or sizes contain negative constants (e.g. `-1 : index`). - A negative constant size passed into `ReinterpretCastOp::create` reaches `MemRefType::get`, which asserts that all static dimension sizes are non-negative, causing a crash. - A negative constant offset produces an op with a static negative offset, which the `ViewLikeInterface` verifier then rejects ("expected offsets to be non-negative"). Fix by skipping the fold when any constant size or the offset is negative. Negative strides are intentionally left foldable: they are valid in strided MemRef layouts (e.g. for reverse iteration) and neither `MemRefType::get` nor `ViewLikeInterface` places a non-negativity constraint on strides. Fixes https://github.com/llvm/llvm-project/issues/188407 Assisted-by: Claude Code	2026-03-30 12:45:55 +00:00
Mehdi Amini	25fee95684	[MLIR] Apply clang-tidy fixes for modernize-loop-convert in Deserializer.cpp (NFC)	2026-03-30 04:54:11 -07:00
Mehdi Amini	1ac60ce8a0	[MLIR] Apply clang-tidy fixes for performance-unnecessary-copy-initialization in ShardingInterfaceImpl.cpp (NFC)	2026-03-30 04:44:53 -07:00
Mehdi Amini	dfc866ca02	[MLIR] Apply clang-tidy fixes for bugprone-argument-comment in SparseTensorRewriting.cpp (NFC)	2026-03-30 04:44:53 -07:00
Mehdi Amini	b50d5ad507	[MLIR] Apply clang-tidy fixes for llvm-else-after-return in TypeConverter.cpp (NFC)	2026-03-30 04:44:52 -07:00
Mehdi Amini	4991abe079	[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in TestReshardingPartition.cpp (NFC)	2026-03-30 04:44:52 -07:00
Mehdi Amini	6c8782b347	[MLIR][Vector] Fix direct operand.set() bypassing rewriter in WarpOpScfIfOp/ForOp (#188948 ) In WarpOpScfIfOp and WarpOpScfForOp, the walk that updates users of escaping values (after moving them to the inner WarpOp) was calling operand.set() directly, bypassing the rewriter API. This causes the MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS fingerprint check to fail. Fix by wrapping the operand updates with rewriter.modifyOpInPlace(). Assisted-by: Claude Code Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.	2026-03-30 12:28:07 +02:00
Mehdi Amini	0bb0c7db2b	[MLIR][MPI] Fix direct getRefMutable().assign() bypassing rewriter in FoldCast (#188943 ) The FoldCast canonicalization pattern was calling op.getRefMutable().assign(src) directly, bypassing the rewriter. This violates the pattern API contract and causes fingerprint change failures when MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS is enabled. Wrap the modification with b.modifyOpInPlace() to properly notify the rewriter of the changes. Assisted-by: Claude Code Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.	2026-03-30 12:27:41 +02:00
Michael Marjieh	ccb64cb53e	[Value] Mark getOperandNumber as Const (#189267 )	2026-03-30 13:27:11 +03:00
Zhewen Yu	00698678e4	[mlir][affine] Add ValueBounds-based simplification for delinearize(linearize) pairs (#187245 ) The existing folding of `affine.delinearize_index`/`affine.linearize_index` pairs (`CancelDelinearizeOfLinearizeDisjointExactTail`) only matches when basis elements are exactly equal as `OpFoldResult` values. This means it cannot simplify cases where dynamic basis products are semantically equal but represented by different SSA values or affine expressions. This patch adds a new pass `affine-simplify-with-bounds` with two rewrite patterns that use `ValueBoundsConstraintSet` to prove equality of basis products: - `SimplifyDelinearizeOfLinearizeDisjointManyToOneTail`: matches when multiple consecutive linearize dimensions have a product equal to a single delinearize dimension (many-to-one). - `SimplifyDelinearizeOfLinearizeDisjointOneToManyTail`: matches when a single linearize dimension equals the product of multiple consecutive delinearize dimensions (one-to-many). Both patterns scan from the tail (innermost dimensions) and support partial matching. Unmatched prefix dimensions are left as residual linearize/delinearize operations. Assisted-by: Cursor (Claude) --------- Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>	2026-03-30 10:25:04 +01:00
Hocky Yudhiono	a7bc628e44	[mlir][tosa] Harden folds/canonicalizations for unranked and dynamic shapes (#188188 ) This MR fixes #188187 and #187974. Tighten TOSA constant folding and identity-style folds so they do not produce invalid or type-incorrect results when the op’s result type is unranked, rank-dynamic, or otherwise not a static `RankedTensorType`. Several paths previously assumed ranked/static shapes or folded through to the operand without checking that the result type matched the value being returned. `DenseElementsAttr::get`, `SplatElementsAttr::get` and similar builders need a static shape; folding with `tensor<*xT>` or dynamic dims must not fabricate dense attributes with the wrong shape. Returning the operand from a “no-op” fold is only valid when `operand.getType() == op.getType()`; otherwise the folder would change the IR’s type semantics (e.g. ranked → unranked). Such type refinement is supposed to be handled by `-tosa-infer-shapes` in the larger pipeline. Assisted-by: CLion code completion, GPT 5.3 - Codex --------- Co-authored-by: Sayan Saha <sayans@mathworks.com>	2026-03-30 10:23:01 +01:00
Mehdi Amini	4151f5d36f	[MLIR][LLVMIR] Allow llvm.call and llvm.invoke to use llvm.mlir.alias as callee (#189154 ) Previously, the verifier for `llvm.call` and `llvm.invoke` would reject calls where the callee was an `llvm.mlir.alias`, reporting that the symbol does not reference a valid LLVM function or IFunc. Similarly, the MLIR-to-LLVM-IR translation had no handling for aliases as callees. This patch extends both the verifier and the translation to accept `llvm.mlir.alias` as a valid callee for `llvm.call` and `llvm.invoke`, mirroring the existing support for `llvm.mlir.ifunc`. The function type for alias calls is derived from the call operands and result types, and the translation emits a call through the alias global value. Fixes #147057 Assisted-by: Claude Code	2026-03-29 13:40:58 +02:00
Sergei Lebedev	dd9bc6603c	[MLIR] [Python] The generated op definitions now use typed parameters (#188635 ) As with operand/result types this only handles standard dialects, but I think it is still useful as is. We could consider extensibility if/when necessary.	2026-03-29 12:39:48 +01:00
dwrank	fc01c81d03	[MLIR][build] Fix undefined references in debug shared libs (#189207 ) Fixes undefined references in debug shared libs when building MLIR: -DLLVM_ENABLE_PROJECTS="mlir" -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=1 Debug build (-O0) disables dead code elimination, resulting in undefined references in the following shared libs: MLIROpenMPDialect (needs to link with TargetParser) MLIRXeVMDialect (needs to link with TargetParser and MLIROpenMPDialect) MLIRNVVMDialect (needs to link with TargetParser and MLIROpenMPDialect) Fixes #189206 Assisted-by: Claude Code From: `d7e60d5250` Utils were added to OpenMP, particularly [[maybe_unused]] setOffloadModuleInterfaceAttributes() which calls llvm::Triple::normalize() creating a new dependency on TargetParser.	2026-03-29 13:03:37 +02:00
Nishant Patel	9f3a9ea6ae	[MLIR][XeGPU] Add distribution patterns for vector step, shape_cast & broadcast from sg-to-wi (#185960 ) This PR adds distribution patterns for vector.step, vector.shape_cast & vector.broadcast in the new sg-to-wi pass	2026-03-28 10:00:04 -07:00
Twice	e568136e94	[MLIR][Python] Add more field specifiers to Python-defined operations (#188064 ) This PR adds two new field specifiers (`operand` and `attribute`) and extends the existing one (`result`): - `default_factory` parameter is added for `result` and `attribute` to specify a default value via a lambda/function - `kw_only` parameter is added for all three specifiers, to make a field a keyword-only parameter (without giving a default value). ```python def result( *, infer_type: bool = False, default_factory: Optional[Callable[[], Any]] = None, kw_only: bool = False, ) -> Any: ... def operand( *, kw_only: bool = False, ) -> Any: ... def attribute( *, default_factory: Optional[Callable[[], Any]] = None, kw_only: bool = False, ) -> Any: ... ``` Examples of how to use them: ```python class OperandSpecifierOp(TestFieldSpecifiers.Operation, name="operand_specifier"): a: Operand[IntegerType[32]] = operand() b: Optional[Operand[IntegerType[32]]] = None c: Operand[IntegerType[32]] = operand(kw_only=True) class ResultSpecifierOp(TestFieldSpecifiers.Operation, name="result_specifier"): a: Result[IntegerType[32]] = result() b: Result[IntegerType[16]] = result(infer_type=True) c: Result[IntegerType] = result( default_factory=lambda: IntegerType.get_signless(8) ) d: Sequence[Result[IntegerType]] = result(default_factory=list) e: Result[IntegerType[32]] = result(kw_only=True) class AttributeSpecifierOp( TestFieldSpecifiers.Operation, name="attribute_specifier" ): a: IntegerAttr = attribute() b: IntegerAttr = attribute( default_factory=lambda: IntegerAttr.get(IntegerType.get_signless(32), 42) ) c: StringAttr["a"] | StringAttr["b"] = attribute( default_factory=lambda: StringAttr.get("a") ) d: IntegerAttr = attribute(kw_only=True) ``` --------- Co-authored-by: Rolf Morel <rolfmorel@gmail.com>	2026-03-28 21:46:21 +08:00
Mehdi Amini	3b76b85b15	[MLIR] Fix crash in test-bytecode-roundtrip when test dialect is absent (#189163 ) When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a module that contains no test dialect operations, the reader type callback in `runTest0` called `reader.getDialectVersion<test::TestDialect>()` and then immediately asserted that it succeeded. However, if the test dialect was never referenced in the bytecode (because no test dialect types appear in the module), the dialect's version information is not stored in the bytecode, so `getDialectVersion` legitimately returns failure. When the test dialect version is unavailable in the bytecode being read, the module contains no test dialect types, so no "funky"-group overrides are needed and the callback can safely skip by returning `success()`. A regression test is added with a module that has no test dialect ops, exercising the `test-dialect-version=2.0` path that previously crashed. Fixes #128321 Fixes #128325 Assisted-by: Claude Code	2026-03-28 12:54:43 +00:00
Mehdi Amini	00c6b4dabd	[MLIR][Vector] Fix crash in foldDenseElementsAttrDestInsertOp on poison index (#188508 ) When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into the static position of a vector.insert op, foldDenseElementsAttrDestInsertOp would proceed to call calculateInsertPosition, which returned -1. The subsequent iterator arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing an assertion in DenseElementsAttr::get. Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any static position equals kPoisonIndex, consistent with how InsertChainFullyInitialized already guards this case. Fixes #188404 Assisted-by: Claude Code	2026-03-28 10:08:23 +00:00
lonely eagle	1efef761c5	Revert "[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface" (#189150 ) Reverts llvm/llvm-project#187864, because it is causing some build bot failures. See https://lab.llvm.org/buildbot/#/builders/138/builds/27662 and https://lab.llvm.org/buildbot/#/builders/169/builds/21376/steps/11/logs/stdio for memory leak issues.	2026-03-28 08:51:01 +00:00
Jorn Tuyls	5ae2fe75c3	[mlir][vector] Reject alignment attribute on tensor-level gather/scatter (#188924 )	2026-03-28 09:06:19 +01:00
lonely eagle	eb53972051	[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface (#187864 ) To simplify the output of the reduction-tree pass, this PR introduces eraseRedundantBlocksInRegion. For regions containing multiple execution paths, this functionality selects the shortest 'interesting' path. Additionally, this PR adds the getSuccessorForwardOperands API to BranchOpInterface. This allows us to extract the forwarded operands for a specific path chosen from multiple alternatives, enabling the creation of a cf.br operation for the redirected jump.	2026-03-28 15:22:46 +08:00
Md Abdullah Shahneous Bari	8e59c3a816	[XeVM] Fix the cache-control metadata string generation. (#187591 ) Previously, it generated extra `single` quote marks around the outer braces (i.e., `'{'` `6442:\220,1\22` `'}'`). The SPIR-V backend does not expect that. It expects `{6442:\220,1\22}`.	2026-03-27 21:18:18 -05:00
Stanislav Mekhanoshin	a2d84b5d8d	[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115 ) These are previously covered by AMDGPUWmmaIntrinsicModsAllReuse.	2026-03-27 15:20:14 -07:00
Mehdi Amini	509f181f40	[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065 ) When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a non-last position within a struct() assembly format directive, the printed output is ambiguous: the comma-separated array elements are indistinguishable from the struct-level commas separating key-value pairs. Fix this by wrapping such parameters in square brackets in both the generated printer and parser. The printer emits '[' before and ']' after the array value; the parser calls parseLSquare()/parseRSquare() around the FieldParser call. Parameters with a custom printer or parser are unaffected (the user controls the format in that case). Fixes #156623 Assisted-by: Claude Code	2026-03-27 18:41:46 +00:00
Md Abdullah Shahneous Bari	88bc265295	[XeVM] Use `ocloc` for binary generation. (#188331 ) XeVM currently doesn't support native binary generation. This PR enables Ahead-of-Time (AOT) compilation of GPU modules to native binaries using `ocloc`. Currently, this only works with LevelZeroRuntimeWrappers.	2026-03-27 13:29:33 -05:00
Mehdi Amini	cb58fe9df5	[MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types (#189001 ) `loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds, which sign-extends the constant to `int64_t`. For unsigned `scf.for` loops with narrow integer types (e.g. i1, i2, i3), this produces wrong results: a bound such as `1 : i1` has `getSExtValue() == -1` but should be treated as `1` (unsigned). Two bugs were introduced by this: 1. Wrong epilogue detection: the comparison `upperBoundUnrolledCst < ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the sign-extended i1 value 1) evaluated to false, suppressing the epilogue that should execute the remaining iterations. 2. Zero step after overflow: when `tripCountEvenMultiple == 0` (all iterations go to the epilogue), `stepUnrolledCst = stepCst * unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A zero step causes `constantTripCount` to return `nullopt`, preventing the zero-trip main loop from being elided. Fix: - Use zero-extension (`getZExtValue`) instead of sign-extension when reading bounds for unsigned loops. - When `tripCountEvenMultiple == 0`, keep the original step for the main loop to avoid the zero-step issue (the step value is irrelevant for a zero-trip loop anyway). Fixes #163743 Assisted-by: Claude Code	2026-03-27 18:36:51 +01:00
Jianhui Li	28e2fa3247	[MLIR][XeGPU] Extend convert_layout op to support scalar type (#188874 ) This PR adds scalar type support to the convert_layout op's result and operand. It also enhances the convert_layout patterns in wg-to-sg, unrolling, and sg-to-lane distribution. This is needed to support reduction to a scalar, since layout propagation currently does not allow a scalar to carry any layout. The design choice is to insert a convert_layout op after the reduction-to-scalar op to record the layout information permanently across the passes.	2026-03-27 10:36:35 -07:00
Han-Chung Wang	9e44babdaf	[mlir][vector] Add support for dropping inner unit dims for transfer_read/write with masks. (#188841 ) This revision resolves a long-overdue TODO: it supports the lowering when transfer_read/write ops have a mask, by inserting a vector.shape_cast op for the masked value. --------- Signed-off-by: hanhanW <hanhan0912@gmail.com>	2026-03-27 10:21:20 -07:00
Mehdi Amini	40d5b19690	[mlir][IR] Add test for complex<i1> dense element roundtrip (#189047 ) Fixes #140302 Assisted-by: Claude Code	2026-03-27 16:32:13 +00:00
Mehdi Amini	79fdef22d6	[mlir][ods] Document and test DefaultValuedProp elision in prop-dict format (#189045 ) Issue #152743 reports that DefaultValuedProp is printed even when the property value equals the default, unlike DefaultValuedAttr which is not printed in that case. The fix for this was already present in the codebase since commit 8955e285e1ac ("[mlir] Add property combinators, initial ODS support"), which added elision of default-valued properties in the genPropDictPrinter function in OpFormatGen.cpp. This commit adds: - Documentation in Operations.md clarifying that DefaultValuedProp is also elided from prop-dict output when the value equals the default, consistent with the existing documentation for DefaultValuedAttr. - An explicit test in properties.mlir verifying that DefaultValuedProp with value equal to default is elided from prop-dict output, and that DefaultValuedProp with a non-default value is still printed. Fixes #152743 Assisted-by: Claude Code	2026-03-27 16:31:09 +00:00
Mehdi Amini	5d293008c2	[MLIR][Transforms] Fix two bugs in loop-invariant-subset-hoisting (#188761 ) Fix two issues in `MatchingSubsets::populateSubsetOpsAtIterArg`: 1. The `collectHoistableOps` parameter was declared but never used when inserting subset ops via `insert(subsetOp)`. As a result, when recursing into nested loops with `collectHoistableOps=false`, the nested loop's subset ops were incorrectly added to the hoistable extraction/insertion pairs of the parent loop. This caused spurious failures in the `allDisjoint` check, preventing valid hoisting when nested loop ops overlapped with outer loop ops. Fix by passing the parameter: `insert(subsetOp, collectHoistableOps)`. 2. In the nested loop handling branch, there was no guard to detect when a value has multiple nested loop uses (i.e., is used as an init arg in more than one nested loop). Without the guard, `nextValue` would be silently overwritten, leading to an incorrect use-def chain traversal. Add `if (nextValue) return failure()` before setting `nextValue` for the nested loop case, mirroring the existing guard for insertion ops. Fixes #147096 Assisted-by: Claude Code	2026-03-27 17:27:08 +01:00
Mehdi Amini	e9669fd6fb	[MLIR][EmitC] Fix crash in SwitchOp::getEntrySuccessorRegions on unsigned integer type (#188546 ) SwitchOp::getEntrySuccessorRegions and getRegionInvocationBounds called IntegerAttr::getInt() to retrieve the constant switch argument, but getInt() asserts that the attribute type must be a signless integer or index. For unsigned integer types (e.g. ui32), this assertion fired and crashed the process. Fix by selecting the appropriate accessor based on the attribute type: getInt() for signless/index, getSInt() for signed, and getUInt() (cast to int64_t) for unsigned integer types. Unknown types fall back to the conservative "all regions possible" path. The same fix is applied to getRegionInvocationBounds, which had an identical call to getInt(). Fixes #187973 Assisted-by: Claude Code	2026-03-27 17:26:45 +01:00
Mehdi Amini	23eec12169	[MLIR] Fix outdated restriction comment in RemoveDeadValuesPass (#189041 ) The RemoveDeadValuesPass previously emitted an error and skipped optimization when the IR contained non-function symbol ops, non-call symbol user ops, or branch ops. This restriction was later removed, but the comments in RemoveDeadValues.cpp and Passes.td still described the pass as operating "iff the IR doesn't have any non-function symbol ops, non-call symbol user ops and branch ops." Remove the stale restriction text from both the .cpp file comment and the Passes.td description. Also add a test that verifies dead function arguments are correctly removed inside a module that defines a symbol (has a sym_name attribute), which was the original failure case reported in issue #98700. Fixes #98700 Assisted-by: Claude Code	2026-03-27 16:22:47 +00:00
Mehdi Amini	3363a0ead9	[MLIR][Shard] Fix NormalizeSharding and FoldDuplicateShardOp direct mutations (#188981 ) Calling attribute setters and MutableOperandRange::assign() directly, without going through the PatternRewriter, bypassed the rewriter's change tracking and triggered "operation finger print changed" after the pattern returned success under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS. Assisted-by: Claude Code	2026-03-27 16:22:12 +00:00
Mehdi Amini	3c9938d955	[MLIR][XeVM] Wrap in-place op modifications in modifyOpInPlace in LLVMLoadStoreToOCLPattern (#188952 ) LLVMLoadStoreToOCLPattern::matchAndRewrite was calling op->removeAttr() and op->setOperand() directly without going through the rewriter API. This caused MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS to report "expected pattern to replace the root operation or modify it in place". Fix: wrap the direct mutations in rewriter.modifyOpInPlace(). Assisted-by: Claude Code Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.	2026-03-27 09:19:29 -07:00
Mehdi Amini	e9f51a39f9	[MLIR][SCF] Add regression tests for ConditionPropagation in nested ifs (#189036 ) Add explicit tests for condition propagation in scf.if then and else branches, including the void-return case. These tests serve as regression tests for the bug reported in #159165 where the SCFIfConditionPropagationPass (since reverted) had a visited-set that was never populated, causing the pass to not propagate conditions into nested scf.if statements. The current ConditionPropagation canonicalization pattern in SCF.cpp correctly handles both nested ifs and direct condition uses within branches using the getParentType() ancestor check. Fixes #159165 Assisted-by: Claude Code	2026-03-27 16:17:11 +00:00
Mehdi Amini	ea8b1608af	[GPUToLLVM] Support multiple async dependencies in gpu.launch_func lowering (#188987 ) LegalizeLaunchFuncOpPattern previously rejected gpu.launch_func ops with more than one async dependency. This change removes that limitation by synchronizing additional dependencies onto the primary stream using CUDA/HIP events, following the same approach already used in ConvertWaitAsyncOpToGpuRuntimeCallPattern for gpu.wait async. For each additional async dependency beyond the first: - If it is a stream (produced by mgpuStreamCreate), create an event, record it on that stream, wait for it on the primary stream, then destroy the event. - If it is already an event, wait for it directly on the primary stream and destroy it. Fixes #156984 Assisted-by: Claude Code	2026-03-27 16:09:19 +00:00
Bangtian Liu	52bb40fe37	[mlir][gpu] Add gpu.ballot operation to GPU dialect (#188647 ) Motivated by the need on the IREE side to support ArgMax/ArgMin-like operations using dpp and ballot operations (refer to https://github.com/iree-org/iree/discussions/23609#discussioncomment-16311655 for more details), this PR adds a `gpu.ballot` operation to the MLIR GPU dialect with ROCDL, NVVM, and SPIR-V lowering support. Assisted-by: [Claude Code](https://claude.ai/code) --------- Signed-off-by: Bangtian Liu <liubangtian@gmail.com>	2026-03-27 11:48:47 -04:00
Ravil Dorozhinskii	760c4292c0	[MLIR][AMDGPU] Added l2-prefetch op to AMDGPU (#188457 ) This PR adds `global_prefetch` op to prefetch a cache line to high-level caches using the aligned address of the source `memref` and an offset provided by the indices of the element containing the cache line. This provides temporal hints (e.g., regular or high-priority). Note that out-of-bounds access is allowed in speculative mode. Ensure the source `memref` is in address space `1`. --------- Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	2026-03-27 16:05:30 +01:00
Mehdi Amini	b959831ed2	[MLIR][Arith] Fix int-range-optimizations miscompile from stale solver state (#188992 ) The `--int-range-optimizations` pass runs the `DataFlowSolver` once, then calls `applyPatternsGreedily` with a `DataFlowListener` that erases solver state when ops are deleted. However, the greedy driver's `simplifyRegions` step (which calls `runRegionDCE` between pattern iterations) can remove block arguments without notifying the listener. This frees the `BlockArgumentImpl` storage, which may be reused by a subsequent allocation. The solver then finds stale lattice state keyed at the reused address and incorrectly treats the new block argument as a known constant, causing a miscompile. The existing `enableFolding(false)` was added for the same class of bug (folding can also remove block arguments). This patch extends the fix by also disabling region simplification, preventing dead-arg elimination from causing the same address-reuse problem. Fixes #137281 Fixes #126195 Assisted-by: Claude Code	2026-03-27 16:00:55 +01:00
Mehdi Amini	79658d7769	[MLIR][SCF] Fix ForLoopRangeFolding miscompile with non-positive MulIOp multiplier (#188995 ) The scf-for-loop-range-folding pass transforms loops of the form for (i = lb; i < ub; i += step) { use(i * c) } into for (j = lbc; j < ubc; j += step*c) { use(j) } This transformation is only valid when c is strictly positive, since scf.for requires a positive step. When c is zero or negative, the new step becomes zero or effectively negative (wrapping in unsigned arithmetic for index type), producing an incorrect loop. Add a guard that restricts the MulIOp folding to cases where the loop-invariant multiplier is a statically known positive integer constant. Non-constant loop-invariant multipliers are also excluded since their sign cannot be determined at compile time. Fixes #56235 Fixes #116664 Assisted-by: Claude Code	2026-03-27 16:00:28 +01:00
Mehdi Amini	c3bffc8e82	Revert "[MLIR] Fix ErasedOpsListener false positives for newly created ops/blocks" (#189010 ) Reverts llvm/llvm-project#188956 Hit "merge" by accident on the wrong tab, juggling too many PRs in parallel...	2026-03-27 14:38:33 +00:00
Davide Grohmann	71ea39a157	[mlir][spirv] Add Gather/Scatter/Resize ops in TOSA Ext Inst Set (#188497 ) This patch introduces the following operators: spirv.Tosa.Gather spirv.Tosa.Scatter spirv.Tosa.Resize Also dialect and serialization round-trip tests have been added. Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>	2026-03-27 15:13:53 +01:00
Mehdi Amini	3303578234	[MLIR][GPU] Reject nested symbol references in gpu-kernel-outlining (#188994 ) Nested symbol references (e.g. `@module::@func`) inside a `gpu.launch` body cannot be resolved after the body is outlined into a new `gpu.module`. Previously, `createKernelModule` used `getLeafReference()` to look up each symbol use, which silently skipped nested references when the leaf name could not be found in the parent symbol table. This left unresolvable cross-module references in the outlined kernel. This patch detects nested symbol references whose root exists in the parent symbol table — meaning the reference was valid before outlining but will become dangling after it — and emits a diagnostic error. Phantom references whose root does not exist in the parent are left as-is, preserving existing behavior for unregistered-op attributes (regression test from #185357). The existing `@nested_launch` test was inadvertently testing this broken behavior (silently producing invalid IR with a dangling `@nested_launch_kernel::@nested_launch_kernel` reference inside the outlined outer kernel module); it is updated to expect the new error. Fixes #187942 Assisted-by: Claude Code	2026-03-27 15:00:09 +01:00
Mehdi Amini	18ad761df0	[mlir] Fix generate-test-checks.py: don't put attr refs in CHECK-LABEL (#188985 ) Two related bugs in generate-test-checks.py when a top-level operation carries attribute alias references (e.g. `#map`, `#map1`) in its signature: 1. The attribute reference substitution (replacing `#map` with `#[[$ATTR_0]]`) ran before the pending attribute definitions were processed, so the names were not yet available and the references were left as-is in the output. 2. CHECK-LABEL lines do not support FileCheck variable references (e.g. `#[[$ATTR_0]]`), so even after substitution the generated check would be syntactically wrong. Fix both issues: - In the CHECK-LABEL branch, re-apply `process_attribute_references` to the label prefix and SSA-split rest after flushing pending attribute definitions, so that names are resolved. - Split the label prefix at attribute reference boundaries; keep only the text before the first reference in the CHECK-LABEL line and emit the remainder on a CHECK-SAME line. Before: // CHECK-LABEL: func.func @test() attributes {amap = #map, bmap = #map1} { After: // CHECK-LABEL: func.func @test() attributes {amap = // CHECK-SAME: #[[$ATTR_0]], bmap = #[[$ATTR_1]]} { Fixes #162310 Assisted-by: Claude Code	2026-03-27 14:59:42 +01:00
Mehdi Amini	9c0bce7ede	[MLIR][Transform] Fix crash in CheckUses when op lacks MemoryEffectOpInterface (#188998 ) collectFreedValues used cast<MemoryEffectOpInterface> unconditionally on every op encountered during the walk. Ops that do not implement the interface (e.g. pdl.pattern, pdl.operands, pdl.types inside a transform.with_pdl_patterns region) trigger the cast assertion, crashing mlir-opt when -transform-dialect-check-uses is requested. Change the cast to dyn_cast and skip ops that don't implement the interface; they cannot free transform values and are safely ignored. Fixes #120944 Assisted-by: Claude Code	2026-03-27 13:57:07 +00:00
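The unsigned narrow-integer bug fixed in cb58fe9df5 (#189001) comes down to how a bound's bit pattern is widened to 64 bits. The sketch below is a minimal Python model, not MLIR code: the helpers `sext` and `zext` are hypothetical stand-ins for sign- versus zero-extension (the `getSExtValue()`/`getZExtValue()` distinction named in the commit), and the constants mirror the `1 : i1` example from the commit message.

```python
def sext(raw: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `raw` (models getSExtValue())."""
    raw &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw


def zext(raw: int, bits: int) -> int:
    """Zero-extend the low `bits` bits of `raw` (models getZExtValue()).

    Also models truncating an arithmetic result to the bound type's width.
    """
    return raw & ((1 << bits) - 1)


BITS = 1                  # the loop bounds have type i1
UB_RAW = 1                # upper bound `1 : i1`
UPPER_BOUND_UNROLLED = 0  # tripCountEvenMultiple == 0, so the main loop is empty

# Signed read: 0 < -1 is False, so the epilogue is wrongly suppressed.
epilogue_signed = UPPER_BOUND_UNROLLED < sext(UB_RAW, BITS)
# Unsigned read: 0 < 1 is True, so the epilogue runs the remaining iteration.
epilogue_unsigned = UPPER_BOUND_UNROLLED < zext(UB_RAW, BITS)

# Step 1 times unroll factor 2, truncated to i1, wraps to 0 (the zero-step case).
unrolled_step = zext(1 * 2, BITS)

if __name__ == "__main__":
    print(epilogue_signed, epilogue_unsigned, unrolled_step)  # False True 0
```

The two fixes listed in the commit correspond to the two halves of this model: read unsigned bounds with zero-extension, and keep the original step when the main loop has zero trips so the truncated product never becomes 0.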


26501 Commits