26501 Commits

Author SHA1 Message Date
Jackson Stogel
7ccd92e5e6
[mlir][python] Disable pytype not-yet-supported error on Buffer import (#189440)
For Python versions <3.12, pytype complains that:

```
error: in <module>: collections.abc.Buffer not supported yet [not-supported-yet]
  from collections.abc import Buffer as _Buffer
```

Since this code appears intended to support <3.12, disable the
type error on this line.
2026-03-30 11:21:35 -07:00
Alexis Engelke
7581430722
[IR] Require well-formed IR for BasicBlock::getTerminator (#189416)
BasicBlock::getTerminator() is frequently called on valid IR, yet the
function has to check that the last instruction is in fact a terminator,
even in release builds. This check can only be optimized away when the
instruction is dereferenced.

Therefore, introduce the functions hasTerminator() and
getTerminatorOrNull() as replacement and require (assert) that
getTerminator() always returns a valid terminator. As a side effect,
this forces explicit expression of intent at call sites when unfinished
basic blocks should be supported.
2026-03-30 18:57:37 +02:00
Nishant Patel
ad4d4c0f63
[MLIR][XeGPU] Support leading unit dims in vector.multi_reduction in sg to wi pass (#188767)
This PR adds support for transforming vector.multi_reduction on
vectors of rank > 2 that have leading unit dims.
2026-03-30 09:29:20 -07:00
jeanPerier
9a8c018081
[mlir][acc] add VariableInfo attribute to thread language specific information about privatized variables (#186368)
Add a new acc::VariableInfoAttr attribute that can be extended and implemented by
language dialects to carry language specific information about variables that is
not reflected into the MLIR type system and is needed in the implementation
of the init/copy/destroy APIs.
A new genPrivateVariableInfo API is added to the MappableTypeInterface to generate
such attribute from an mlir::Value for the host variable.
The use case and motivation is the Fortran OPTIONAL attribute. This patch adds
a new fir::OpenACCFortranVariableInfoAtt that implements the acc::VariableInfoAttr
to carry the OPTIONAL information around.
2026-03-30 16:03:14 +02:00
Mehdi Amini
79a7b57a44
[mlir][memref] Fix invalid folds in ReinterpretCastOpConstantFolder for negative constants (#189237)
`ReinterpretCastOpConstantFolder` could fold `memref.reinterpret_cast`
ops whose offset or sizes contain negative constants (e.g. `-1 :
index`).

- A negative constant size passed into `ReinterpretCastOp::create` reaches
  `MemRefType::get`, which asserts that all static dimension sizes are
  non-negative, causing a crash.

- A negative constant offset produces an op with a static negative offset,
  which the `ViewLikeInterface` verifier then rejects ("expected offsets to
  be non-negative").

Fix by skipping the fold when any constant size or the offset is negative.
Negative strides are intentionally left foldable: they are valid in
strided MemRef layouts (e.g. for reverse iteration) and neither
`MemRefType::get` nor `ViewLikeInterface` places a non-negativity
constraint on strides.
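
The guard described above can be modeled as a small predicate. This is only a sketch: `can_fold_constants` is a hypothetical name, `None` stands in for a non-constant operand, and the real check lives in the C++ `ReinterpretCastOpConstantFolder`.

```python
from typing import Optional, Sequence


def can_fold_constants(
    offset: Optional[int],
    sizes: Sequence[Optional[int]],
    strides: Sequence[Optional[int]],
) -> bool:
    """Hypothetical model of the fold guard; None means 'not a constant'."""
    # A negative static offset would be rejected by the verifier.
    if offset is not None and offset < 0:
        return False
    # A negative static size would trip the assertion in the type builder.
    if any(size is not None and size < 0 for size in sizes):
        return False
    # Strides are deliberately not checked: negative strides stay foldable.
    return True
```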

Fixes https://github.com/llvm/llvm-project/issues/188407

Assisted-by: Claude Code
2026-03-30 12:45:55 +00:00
Mehdi Amini
25fee95684 [MLIR] Apply clang-tidy fixes for modernize-loop-convert in Deserializer.cpp (NFC) 2026-03-30 04:54:11 -07:00
Mehdi Amini
1ac60ce8a0 [MLIR] Apply clang-tidy fixes for performance-unnecessary-copy-initialization in ShardingInterfaceImpl.cpp (NFC) 2026-03-30 04:44:53 -07:00
Mehdi Amini
dfc866ca02 [MLIR] Apply clang-tidy fixes for bugprone-argument-comment in SparseTensorRewriting.cpp (NFC) 2026-03-30 04:44:53 -07:00
Mehdi Amini
b50d5ad507 [MLIR] Apply clang-tidy fixes for llvm-else-after-return in TypeConverter.cpp (NFC) 2026-03-30 04:44:52 -07:00
Mehdi Amini
4991abe079 [MLIR] Apply clang-tidy fixes for llvm-qualified-auto in TestReshardingPartition.cpp (NFC) 2026-03-30 04:44:52 -07:00
Mehdi Amini
6c8782b347
[MLIR][Vector] Fix direct operand.set() bypassing rewriter in WarpOpScfIfOp/ForOp (#188948)
In WarpOpScfIfOp and WarpOpScfForOp, the walk that updates users of
escaping values (after moving them to the inner WarpOp) was calling
operand.set() directly, bypassing the rewriter API. This causes the
MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS fingerprint check to fail.

Fix by wrapping the operand updates with rewriter.modifyOpInPlace().

Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
2026-03-30 12:28:07 +02:00
Mehdi Amini
0bb0c7db2b
[MLIR][MPI] Fix direct getRefMutable().assign() bypassing rewriter in FoldCast (#188943)
The FoldCast canonicalization pattern was calling
op.getRefMutable().assign(src) directly, bypassing the rewriter. This
violates the pattern API contract and causes fingerprint change failures
when MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS is enabled. Wrap the
modification with b.modifyOpInPlace() to properly notify the rewriter of
the changes.

Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
2026-03-30 12:27:41 +02:00
Michael Marjieh
ccb64cb53e
[Value] Mark getOperandNumber as Const (#189267) 2026-03-30 13:27:11 +03:00
Zhewen Yu
00698678e4
[mlir][affine] Add ValueBounds-based simplification for delinearize(linearize) pairs (#187245)
The existing canonicalization patterns for `affine.delinearize_index` /
`affine.linearize_index` pairs
(`CancelDelinearizeOfLinearizeDisjointExactTail`) only match when basis
elements are exactly equal as `OpFoldResult` values. This means they
cannot simplify cases where dynamic basis products are semantically
equal but represented by different SSA values or affine expressions.

This patch adds a new pass `affine-simplify-with-bounds` with two
rewrite patterns that use `ValueBoundsConstraintSet` to prove equality
of basis products:

- **`SimplifyDelinearizeOfLinearizeDisjointManyToOneTail`**: matches
when multiple consecutive linearize dimensions have a product equal to a
single delinearize dimension (many-to-one).
- **`SimplifyDelinearizeOfLinearizeDisjointOneToManyTail`**: matches
when a single linearize dimension equals the product of multiple
consecutive delinearize dimensions (one-to-many).

Both patterns scan from the tail (innermost dimensions) and support
partial matching. Unmatched prefix dimensions are left as residual
linearize/delinearize operations.
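
The many-to-one tail scan can be illustrated with plain integers. This is a rough sketch under simplifying assumptions: all basis elements are static positive integers (the actual patterns prove equality of dynamic products through `ValueBoundsConstraintSet`), a group may contain a single dimension, and `match_many_to_one_tail` is a hypothetical name.

```python
from typing import List, Tuple


def match_many_to_one_tail(
    lin_basis: List[int], delin_basis: List[int]
) -> Tuple[List[int], List[int], List[Tuple[List[int], int]]]:
    """Scan from the innermost dimension, grouping consecutive linearize
    dims whose product equals one delinearize dim.

    Returns (residual linearize prefix, residual delinearize prefix, matches).
    """
    matches = []
    i = len(lin_basis)    # exclusive end of the unmatched linearize prefix
    j = len(delin_basis)  # exclusive end of the unmatched delinearize prefix
    while i > 0 and j > 0:
        target = delin_basis[j - 1]
        product = 1
        k = i
        # Extend the group outward until the product reaches the target.
        while k > 0 and product < target:
            k -= 1
            product *= lin_basis[k]
        if product != target or k == i:
            break  # partial match: leave the remaining prefix untouched
        matches.append((lin_basis[k:i], target))
        i, j = k, j - 1
    matches.reverse()
    return lin_basis[:i], delin_basis[:j], matches
```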

Assisted-by: Cursor (Claude)

---------

Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
2026-03-30 10:25:04 +01:00
Hocky Yudhiono
a7bc628e44
[mlir][tosa] Harden folds/canonicalizations for unranked and dynamic shapes (#188188)
This MR fixes #188187 and #187974. Tighten TOSA constant folding and
identity-style folds so they do not produce invalid or type-incorrect
results when the op’s result type is unranked, rank-dynamic, or
otherwise not a static `RankedTensorType`. Several paths previously
assumed ranked/static shapes or folded through to the operand without
checking that the result type matched the value being returned.

`DenseElementsAttr::get`, `SplatElementsAttr::get` and similar builders
need a static shape; folding with `tensor<*xT>` or dynamic dims must not
fabricate dense attributes with the wrong shape.

Returning the operand from a “no-op” fold is only valid when
`operand.getType() == op.getType()`; otherwise the folder would change
the IR’s type semantics (e.g. ranked → unranked), which in the bigger
pipeline is supposed to be handled by `-tosa-infer-shapes`.

Assisted-by: CLion code completion, GPT 5.3 - Codex

---------

Co-authored-by: Sayan Saha <sayans@mathworks.com>
2026-03-30 10:23:01 +01:00
Mehdi Amini
4151f5d36f
[MLIR][LLVMIR] Allow llvm.call and llvm.invoke to use llvm.mlir.alias as callee (#189154)
Previously, the verifier for `llvm.call` and `llvm.invoke` would reject
calls where the callee was an `llvm.mlir.alias`, reporting that the
symbol does not reference a valid LLVM function or IFunc. Similarly, the
MLIR-to-LLVM-IR translation had no handling for aliases as callees.

This patch extends both the verifier and the translation to accept
`llvm.mlir.alias` as a valid callee for `llvm.call` and `llvm.invoke`,
mirroring the existing support for `llvm.mlir.ifunc`. The function type
for alias calls is derived from the call operands and result types, and
the translation emits a call through the alias global value.

Fixes #147057

Assisted-by: Claude Code
2026-03-29 13:40:58 +02:00
Sergei Lebedev
dd9bc6603c
[MLIR] [Python] The generated op definitions now use typed parameters (#188635)
As with operand/result types this only handles standard dialects, but I think it is still useful as is.

We could consider extensibility if/when necessary.
2026-03-29 12:39:48 +01:00
dwrank
fc01c81d03
[MLIR][build] Fix undefined references in debug shared libs (#189207)
Fixes undefined references in debug shared libs when building MLIR:
-DLLVM_ENABLE_PROJECTS="mlir"
-DCMAKE_BUILD_TYPE=Debug
-DBUILD_SHARED_LIBS=1

Debug build (-O0) disables dead code elimination, resulting in undefined
references in the following shared libs:

MLIROpenMPDialect (needs to link with TargetParser)
MLIRXeVMDialect (needs to link with TargetParser and MLIROpenMPDialect)
MLIRNVVMDialect (needs to link with TargetParser and MLIROpenMPDialect)

Fixes #189206

Assisted-by: Claude Code

From:

d7e60d5250
Utils were added to OpenMP, particularly [[maybe_unused]]
setOffloadModuleInterfaceAttributes() which calls
llvm::Triple::normalize() creating a new dependency on TargetParser.
2026-03-29 13:03:37 +02:00
Nishant Patel
9f3a9ea6ae
[MLIR][XeGPU] Add distribution patterns for vector step, shape_cast & broadcast from sg-to-wi (#185960)
This PR adds distribution patterns for vector.step, vector.shape_cast &
vector.broadcast in the new sg-to-wi pass
2026-03-28 10:00:04 -07:00
Twice
e568136e94
[MLIR][Python] Add more field specifiers to Python-defined operations (#188064)
This PR adds two new field specifiers (`operand` and `attribute`) and
extends the existing one (`result`):
- `default_factory` parameter is added for `result` and `attribute` to
specify default value via a lambda/function
- `kw_only` parameter is added for all these three specifiers, to make a
field a keyword-only parameter (without giving a default value).

```python
def result(
    *,
    infer_type: bool = False,
    default_factory: Optional[Callable[[], Any]] = None,
    kw_only: bool = False,
) -> Any: ...


def operand(
    *,
    kw_only: bool = False,
) -> Any: ...


def attribute(
    *,
    default_factory: Optional[Callable[[], Any]] = None,
    kw_only: bool = False,
) -> Any: ...
```

Examples of how to use them:
```python
class OperandSpecifierOp(TestFieldSpecifiers.Operation, name="operand_specifier"):
    a: Operand[IntegerType[32]] = operand()
    b: Optional[Operand[IntegerType[32]]] = None
    c: Operand[IntegerType[32]] = operand(kw_only=True)

class ResultSpecifierOp(TestFieldSpecifiers.Operation, name="result_specifier"):
    a: Result[IntegerType[32]] = result()
    b: Result[IntegerType[16]] = result(infer_type=True)
    c: Result[IntegerType] = result(
        default_factory=lambda: IntegerType.get_signless(8)
    )
    d: Sequence[Result[IntegerType]] = result(default_factory=list)
    e: Result[IntegerType[32]] = result(kw_only=True)

class AttributeSpecifierOp(
    TestFieldSpecifiers.Operation, name="attribute_specifier"
):
    a: IntegerAttr = attribute()
    b: IntegerAttr = attribute(
        default_factory=lambda: IntegerAttr.get(IntegerType.get_signless(32), 42)
    )
    c: StringAttr["a"] | StringAttr["b"] = attribute(
        default_factory=lambda: StringAttr.get("a")
    )
    d: IntegerAttr = attribute(kw_only=True)
```

---------

Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
2026-03-28 21:46:21 +08:00
Mehdi Amini
3b76b85b15
[MLIR] Fix crash in test-bytecode-roundtrip when test dialect is absent (#189163)
When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a
module that contains no test dialect operations, the reader type
callback in `runTest0` called
`reader.getDialectVersion<test::TestDialect>()` and then immediately
asserted that it succeeded. However, if the test dialect was never
referenced in the bytecode (because no test dialect types appear in the
module), the dialect's version information is not stored in the
bytecode, so `getDialectVersion` legitimately returns failure.

When the test dialect version is unavailable in the bytecode being read,
the module contains no test dialect types, so no "funky"-group overrides
are needed and the callback can safely skip by returning `success()`.

A regression test is added with a module that has no test dialect ops,
exercising the `test-dialect-version=2.0` path that previously crashed.

Fixes #128321
Fixes #128325

Assisted-by: Claude Code
2026-03-28 12:54:43 +00:00
Mehdi Amini
00c6b4dabd
[MLIR][Vector] Fix crash in foldDenseElementsAttrDestInsertOp on poison index (#188508)
When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into
the static position of a vector.insert op,
foldDenseElementsAttrDestInsertOp would proceed to call
calculateInsertPosition, which returned -1. The subsequent iterator
arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing
an assertion in DenseElementsAttr::get.

Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any
static position equals kPoisonIndex, consistent with how
InsertChainFullyInitialized already guards this case.
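
A minimal model of the early bail-out, assuming a row-major flat list for the destination constant and a full (scalar) insert position. The function name and data layout are illustrative only; the value `-1` for the poison sentinel is taken from the description above.

```python
from typing import List, Optional, Sequence

K_POISON_INDEX = -1  # sentinel value, as described in the commit message


def fold_insert(
    dest: List[int], shape: Sequence[int], position: Sequence[int], value: int
) -> Optional[List[int]]:
    """Hypothetical model: return the folded constant, or None if not foldable."""
    # Bail out before any index arithmetic when a position is poison.
    if any(p == K_POISON_INDEX for p in position):
        return None
    # Row-major linearization of the insert position.
    linear = 0
    for dim, p in zip(shape, position):
        linear = linear * dim + p
    result = list(dest)
    result[linear] = value
    return result
```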

Fixes #188404

Assisted-by: Claude Code
2026-03-28 10:08:23 +00:00
lonely eagle
1efef761c5
Revert "[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface" (#189150)
Reverts llvm/llvm-project#187864, because it is causing some build bot
failures. See https://lab.llvm.org/buildbot/#/builders/138/builds/27662
and
https://lab.llvm.org/buildbot/#/builders/169/builds/21376/steps/11/logs/stdio
for memory leak issues.
2026-03-28 08:51:01 +00:00
Jorn Tuyls
5ae2fe75c3
[mlir][vector] Reject alignment attribute on tensor-level gather/scatter (#188924) 2026-03-28 09:06:19 +01:00
lonely eagle
eb53972051
[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface (#187864)
To simplify the output of the reduction-tree pass, this PR introduces
the eraseRedundantBlocksInRegion. For regions containing multiple
execution paths, this functionality selects the shortest 'interesting'
path. Additionally, this PR adds the getSuccessorForwardOperands API to
BranchOpInterface. This allows us to extract the ForwardOperands for a
specific path chosen from multiple alternatives, enabling the creation
of a cf.br operation for the redirected jump.
2026-03-28 15:22:46 +08:00
Md Abdullah Shahneous Bari
8e59c3a816
[XeVM] Fix the cache-control metadata string generation. (#187591)
Previously, it generated extra `single` quote marks around the outer
braces (i.e., `'{'` `6442:\220,1\22` `'}'`). SPIR-V backend does not
expect that. It expects `{6442:\220,1\22}`.
2026-03-27 21:18:18 -05:00
Stanislav Mekhanoshin
a2d84b5d8d
[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115)
These are previously covered by AMDGPUWmmaIntrinsicModsAllReuse.
2026-03-27 15:20:14 -07:00
Mehdi Amini
509f181f40
[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065)
When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a
non-last position within a struct() assembly format directive, the
printed output is ambiguous: the comma-separated array elements are
indistinguishable from the struct-level commas separating key-value
pairs.

Fix this by wrapping such parameters in square brackets in both the
generated printer and parser. The printer emits '[' before and ']' after
the array value; the parser calls parseLSquare()/parseRSquare() around
the FieldParser call. Parameters with a custom printer or parser are
unaffected (the user controls the format in that case).
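
To make the ambiguity concrete, here is a toy printer for a struct-like key-value format. It is a sketch only: `print_struct` and its field names are hypothetical and it does not reproduce the generated TableGen code.

```python
from typing import Dict, List, Union

Value = Union[int, List[int]]


def print_struct(fields: Dict[str, Value], bracket_arrays: bool) -> str:
    """Toy model of printing `key = value` pairs separated by commas."""
    items = list(fields.items())
    parts = []
    for index, (key, value) in enumerate(items):
        if isinstance(value, list):
            text = ", ".join(str(v) for v in value)
            is_last = index == len(items) - 1
            # Only arrays in a non-last position need disambiguating brackets.
            if bracket_arrays and not is_last:
                text = f"[{text}]"
        else:
            text = str(value)
        parts.append(f"{key} = {text}")
    return ", ".join(parts)
```

Without brackets, `{"dims": [1, 2], "count": 3}` prints with array commas that look the same as the struct-level commas; with brackets the array extent is explicit.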

Fixes #156623

Assisted-by: Claude Code
2026-03-27 18:41:46 +00:00
Md Abdullah Shahneous Bari
88bc265295
[XeVM] Use ocloc for binary generation. (#188331)
XeVM currently doesn't support native binary generation. This PR enables
Ahead of Time (AOT) compilation of a gpu module to a native binary using
`ocloc`.

Currently, this only works with LevelZeroRuntimeWrappers.
2026-03-27 13:29:33 -05:00
Mehdi Amini
cb58fe9df5
[MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types (#189001)
`loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds,
which sign-extends the constant to `int64_t`. For unsigned `scf.for`
loops with narrow integer types (e.g. i1, i2, i3), this produces wrong
results: a bound such as `1 : i1` has `getSExtValue() == -1` but should
be treated as `1` (unsigned).

Two bugs were introduced by this:

1. **Wrong epilogue detection**: the comparison `upperBoundUnrolledCst <
ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the
sign-extended i1 value 1) evaluated to false, suppressing the epilogue
that should execute the remaining iterations.

2. **Zero step after overflow**: when `tripCountEvenMultiple == 0` (all
iterations go to the epilogue), `stepUnrolledCst = stepCst *
unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A
zero step causes `constantTripCount` to return `nullopt`, preventing the
zero-trip main loop from being elided.

Fix:
- Use zero-extension (`getZExtValue`) instead of sign-extension when
reading bounds for unsigned loops.
- When `tripCountEvenMultiple == 0`, keep the original step for the main
loop to avoid the zero-step issue (the step value is irrelevant for a
zero-trip loop anyway).
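
The arithmetic behind both bugs can be reproduced with two small helpers. This is a sketch, assuming fixed-width two's complement values modeled with Python integers; `sext` and `zext` are illustrative names, not MLIR APIs.

```python
def zext(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as an unsigned integer."""
    return value & ((1 << width) - 1)


def sext(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a signed integer."""
    value = zext(value, width)
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


# Upper bound `1 : i1` read with each extension.
ub_signed = sext(1, 1)    # -1
ub_unsigned = zext(1, 1)  # 1

# Epilogue detection: "is the unrolled upper bound (0) below the real one?"
needs_epilogue_signed = 0 < ub_signed      # False: epilogue wrongly dropped
needs_epilogue_unsigned = 0 < ub_unsigned  # True

# Step overflow: step 1 times unroll factor 2 wraps to 0 in an i1.
unrolled_step = zext(1 * 2, 1)  # 0
```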

Fixes #163743

Assisted-by: Claude Code
2026-03-27 18:36:51 +01:00
Jianhui Li
28e2fa3247
[MLIR][XeGPU] Extend convert_layout op to support scalar type (#188874)
This PR adds scalar type support to the convert_layout op's result and
operand. It also enhances the convert_layout patterns in wg-to-sg,
unrolling, and sg-to-lane distribution.

This is to support reduction to scalar, since the layout propagation
currently does not allow a scalar to carry any layout. The design
choice is to insert a convert_layout op after the reduction-to-scalar op
to record the layout information permanently across the passes.
2026-03-27 10:36:35 -07:00
Han-Chung Wang
9e44babdaf
[mlir][vector] Add support for dropping inner unit dims for transfer_read/write with masks. (#188841)
The revision clears a long-overdue TODO: it supports the lowering when
transfer_read/write ops have a mask by inserting a vector.shape_cast op
for the masked value.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2026-03-27 10:21:20 -07:00
Mehdi Amini
40d5b19690
[mlir][IR] Add test for complex<i1> dense element roundtrip (#189047)
Fixes #140302

Assisted-by: Claude Code
2026-03-27 16:32:13 +00:00
Mehdi Amini
79fdef22d6
[mlir][ods] Document and test DefaultValuedProp elision in prop-dict format (#189045)
Issue #152743 reports that DefaultValuedProp is printed even when the
property value equals the default, unlike DefaultValuedAttr which is not
printed in that case.

The fix for this was already present in the codebase since commit
8955e285e1ac ("[mlir] Add property combinators, initial ODS support"),
which added elision of default-valued properties in the
genPropDictPrinter function in OpFormatGen.cpp.

This commit adds:
- Documentation in Operations.md clarifying that DefaultValuedProp is
  also elided from prop-dict output when the value equals the default,
  consistent with the existing documentation for DefaultValuedAttr.
- An explicit test in properties.mlir verifying that DefaultValuedProp
  with value equal to default is elided from prop-dict output, and that
  DefaultValuedProp with a non-default value is still printed.

Fixes #152743

Assisted-by: Claude Code
2026-03-27 16:31:09 +00:00
Mehdi Amini
5d293008c2
[MLIR][Transforms] Fix two bugs in loop-invariant-subset-hoisting (#188761)
Fix two issues in `MatchingSubsets::populateSubsetOpsAtIterArg`:

1. The `collectHoistableOps` parameter was declared but never used when
inserting subset ops via `insert(subsetOp)`. As a result, when recursing
into nested loops with `collectHoistableOps=false`, the nested loop's
subset ops were incorrectly added to the hoistable extraction/insertion
pairs of the parent loop. This caused spurious failures in the
`allDisjoint` check, preventing valid hoisting when nested loop ops
overlapped with outer loop ops. Fix by passing the parameter:
`insert(subsetOp, collectHoistableOps)`.

2. In the nested loop handling branch, there was no guard to detect when
a value has multiple nested loop uses (i.e., is used as an init arg in
more than one nested loop). Without the guard, `nextValue` would be
silently overwritten, leading to an incorrect use-def chain traversal.
Add `if (nextValue) return failure()` before setting `nextValue` for the
nested loop case, mirroring the existing guard for insertion ops.

Fixes #147096

Assisted-by: Claude Code
2026-03-27 17:27:08 +01:00
Mehdi Amini
e9669fd6fb
[MLIR][EmitC] Fix crash in SwitchOp::getEntrySuccessorRegions on unsigned integer type (#188546)
SwitchOp::getEntrySuccessorRegions and getRegionInvocationBounds called
IntegerAttr::getInt() to retrieve the constant switch argument, but
getInt() asserts that the attribute type must be a signless integer or
index. For unsigned integer types (e.g. ui32), this assertion fired and
crashed the process.

Fix by selecting the appropriate accessor based on the attribute type:
getInt() for signless/index, getSInt() for signed, and getUInt() (cast
to int64_t) for unsigned integer types. Unknown types fall back to the
conservative "all regions possible" path.

The same fix is applied to getRegionInvocationBounds, which had an
identical call to getInt().

Fixes #187973

Assisted-by: Claude Code
2026-03-27 17:26:45 +01:00
Mehdi Amini
23eec12169
[MLIR] Fix outdated restriction comment in RemoveDeadValuesPass (#189041)
The RemoveDeadValuesPass previously emitted an error and skipped
optimization when the IR contained non-function symbol ops, non-call
symbol user ops, or branch ops. This restriction was later removed, but
the comments in RemoveDeadValues.cpp and Passes.td still described the
pass as operating "iff the IR doesn't have any non-function symbol ops,
non-call symbol user ops and branch ops."

Remove the stale restriction text from both the .cpp file comment and
the Passes.td description. Also add a test that verifies dead function
arguments are correctly removed inside a module that defines a symbol
(has a sym_name attribute), which was the original failure case reported
in issue #98700.

Fixes #98700

Assisted-by: Claude Code
2026-03-27 16:22:47 +00:00
Mehdi Amini
3363a0ead9
[MLIR][Shard] Fix NormalizeSharding and FoldDuplicateShardOp direct mutations (#188981)
These patterns called attribute setters and MutableOperandRange::assign()
without going through the PatternRewriter. Bypassing the rewriter's
change tracking triggered "operation finger print changed" after the
pattern returned success under
MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.

Assisted-by: Claude Code
2026-03-27 16:22:12 +00:00
Mehdi Amini
3c9938d955
[MLIR][XeVM] Wrap in-place op modifications in modifyOpInPlace in LLVMLoadStoreToOCLPattern (#188952)
LLVMLoadStoreToOCLPattern::matchAndRewrite was calling op->removeAttr()
and op->setOperand() directly without going through the rewriter API.
This caused MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS to report "expected
pattern to replace the root operation or modify it in place".

Fix: wrap the direct mutations in rewriter.modifyOpInPlace().

Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
2026-03-27 09:19:29 -07:00
Mehdi Amini
e9f51a39f9
[MLIR][SCF] Add regression tests for ConditionPropagation in nested ifs (#189036)
Add explicit tests for condition propagation in scf.if then and else
branches, including the void-return case. These tests serve as
regression tests for the bug reported in #159165 where the
SCFIfConditionPropagationPass (since reverted) had a visited-set that
was never populated, causing the pass to not propagate conditions into
nested scf.if statements.

The current ConditionPropagation canonicalization pattern in SCF.cpp
correctly handles both nested ifs and direct condition uses within
branches using the getParentType() ancestor check.

Fixes #159165

Assisted-by: Claude Code
2026-03-27 16:17:11 +00:00
Mehdi Amini
ea8b1608af
[GPUToLLVM] Support multiple async dependencies in gpu.launch_func lowering (#188987)
LegalizeLaunchFuncOpPattern previously rejected gpu.launch_func ops with
more than one async dependency. This change removes that limitation by
synchronizing additional dependencies onto the primary stream using
CUDA/HIP events, following the same approach already used in
ConvertWaitAsyncOpToGpuRuntimeCallPattern for gpu.wait async.

For each additional async dependency beyond the first:
- If it is a stream (produced by mgpuStreamCreate), create an event,
record it on that stream, wait for it on the primary stream, then
destroy the event.
- If it is already an event, wait for it directly on the primary stream
and destroy it.

Fixes #156984

Assisted-by: Claude Code
2026-03-27 16:09:19 +00:00
Bangtian Liu
52bb40fe37
[mlir][gpu] Add gpu.ballot operation to GPU dialect (#188647)
Motivated by the need from IREE side to support ArgMax/ArgMin-like
operation using dpp and ballot operation (refer to
https://github.com/iree-org/iree/discussions/23609#discussioncomment-16311655
for more details), this PR adds `gpu.ballot` operation to the MLIR GPU
dialect with ROCDL, NVVM, and SPIR-V lowering support.

Assisted-by:  [Claude Code](https://claude.ai/code)

---------

Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
2026-03-27 11:48:47 -04:00
Ravil Dorozhinskii
760c4292c0
[MLIR][AMDGPU] Added l2-prefetch op to AMDGPU (#188457)
This PR adds `global_prefetch` op to prefetch a cache line to high-level
caches using the aligned address of the source `memref` and an offset
provided by the indices of the element containing the cache line. This
provides temporal hints (e.g., regular or high-priority). Note that
out-of-bounds access is allowed in speculative mode. Ensure the source
`memref` is in address space `1`.

---------

Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
2026-03-27 16:05:30 +01:00
Mehdi Amini
b959831ed2
[MLIR][Arith] Fix int-range-optimizations miscompile from stale solver state (#188992)
The `--int-range-optimizations` pass runs the `DataFlowSolver` once,
then calls `applyPatternsGreedily` with a `DataFlowListener` that erases
solver state when ops are deleted. However, the greedy driver's
`simplifyRegions` step (which calls `runRegionDCE` between pattern
iterations) can remove block arguments without notifying the listener.
This frees the `BlockArgumentImpl` storage, which may be reused by a
subsequent allocation. The solver then finds stale lattice state keyed
at the reused address and incorrectly treats the new block argument as a
known constant, causing a miscompile.

The existing `enableFolding(false)` was added for the same class of bug
(folding can also remove block arguments). This patch extends the fix by
also disabling region simplification, preventing dead-arg elimination
from causing the same address-reuse problem.

Fixes #137281
Fixes #126195

Assisted-by: Claude Code
2026-03-27 16:00:55 +01:00
Mehdi Amini
79658d7769
[MLIR][SCF] Fix ForLoopRangeFolding miscompile with non-positive MulIOp multiplier (#188995)
The scf-for-loop-range-folding pass transforms loops of the form

  for (i = lb; i < ub; i += step) { use(i * c) }

into

  for (j = lb*c; j < ub*c; j += step*c) { use(j) }

This transformation is only valid when c is strictly positive, since
scf.for requires a positive step. When c is zero or negative, the new
step becomes zero or effectively negative (wrapping in unsigned
arithmetic for index type), producing an incorrect loop.

Add a guard that restricts the MulIOp folding to cases where the
loop-invariant multiplier is a statically known positive integer
constant. Non-constant loop-invariant multipliers are also excluded
since their sign cannot be determined at compile time.
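
The validity condition can be checked with a small model. This is a sketch under assumptions: Python integers stand in for index values (no wrapping), `scf_for` models a loop that requires a positive step, `None` stands for a multiplier that is not a compile-time constant, and all names are hypothetical.

```python
from typing import List, Optional


def scf_for(lb: int, ub: int, step: int) -> List[int]:
    """Induction values of a loop `for (i = lb; i < ub; i += step)`."""
    if step <= 0:
        raise ValueError("this model requires a positive step")
    values = []
    i = lb
    while i < ub:
        values.append(i)
        i += step
    return values


def original_uses(lb: int, ub: int, step: int, c: int) -> List[int]:
    """Values of `i * c` seen by the body of the original loop."""
    return [i * c for i in scf_for(lb, ub, step)]


def folded_uses(lb: int, ub: int, step: int, c: Optional[int]) -> Optional[List[int]]:
    """Values seen after folding, or None when the guard refuses to fold."""
    # Guard: only a statically known, strictly positive multiplier is folded.
    if c is None or c <= 0:
        return None
    return scf_for(lb * c, ub * c, step * c)
```

For a positive multiplier both loops visit the same values, for example `[0, 6, 12, 18]` for `lb=0, ub=10, step=3, c=2`; for zero, negative, or unknown multipliers the sketch declines to fold.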

Fixes #56235
Fixes #116664

Assisted-by: Claude Code
2026-03-27 16:00:28 +01:00
Mehdi Amini
c3bffc8e82
Revert "[MLIR] Fix ErasedOpsListener false positives for newly created ops/blocks" (#189010)
Reverts llvm/llvm-project#188956

Hit "merge" by accident on the wrong tab, juggling too many PRs in
parallel...
2026-03-27 14:38:33 +00:00
Davide Grohmann
71ea39a157
[mlir][spirv] Add Gather/Scatter/Resize ops in TOSA Ext Inst Set (#188497)
This patch introduces the following operators:

spirv.Tosa.Gather
spirv.Tosa.Scatter
spirv.Tosa.Resize

Also dialect and serialization round-trip tests have been added.

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
2026-03-27 15:13:53 +01:00
Mehdi Amini
3303578234
[MLIR][GPU] Reject nested symbol references in gpu-kernel-outlining (#188994)
Nested symbol references (e.g. `@module::@func`) inside a `gpu.launch`
body cannot be resolved after the body is outlined into a new
`gpu.module`. Previously, `createKernelModule` used `getLeafReference()`
to look up each symbol use, which silently skipped nested references
when the leaf name could not be found in the parent symbol table. This
left unresolvable cross-module references in the outlined kernel.

This patch detects nested symbol references whose root exists in the
parent symbol table — meaning the reference was valid before outlining
but will become dangling after it — and emits a diagnostic error.
Phantom references whose root does not exist in the parent are left
as-is, preserving existing behavior for unregistered-op attributes
(regression test from #185357).

The existing `@nested_launch` test was inadvertently testing this broken
behavior (silently producing invalid IR with a dangling
`@nested_launch_kernel::@nested_launch_kernel` reference inside the
outlined outer kernel module); it is updated to expect the new error.

Fixes #187942

Assisted-by: Claude Code
2026-03-27 15:00:09 +01:00
Mehdi Amini
18ad761df0
[mlir] Fix generate-test-checks.py: don't put attr refs in CHECK-LABEL (#188985)
Two related bugs in generate-test-checks.py when a top-level operation
carries attribute alias references (e.g. `#map`, `#map1`) in its
signature:

1. The attribute reference substitution (replacing `#map` with
`#[[$ATTR_0]]`) ran *before* the pending attribute definitions were
processed, so the names were not yet available and the references were
left as-is in the output.

2. CHECK-LABEL lines do not support FileCheck variable references (e.g.
`#[[$ATTR_0]]`), so even after substitution the generated check would be
syntactically wrong.

Fix both issues:
- In the CHECK-LABEL branch, re-apply `process_attribute_references` to
the label prefix and SSA-split rest after flushing pending attribute
definitions, so that names are resolved.
- Split the label prefix at attribute reference boundaries; keep only
the text before the first reference in the CHECK-LABEL line and emit the
remainder on a CHECK-SAME line.
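
The label-splitting step can be sketched as follows. This is not the script's actual code: `split_label` and the regular expression are assumptions chosen to match the `#[[$ATTR_N]]` form shown in this message.

```python
import re
from typing import Optional, Tuple

# Matches a substituted attribute reference such as #[[$ATTR_0]].
ATTR_REF = re.compile(r"#\[\[\$ATTR_\d+\]\]")


def split_label(line: str) -> Tuple[str, Optional[str]]:
    """Split a label line at the first attribute reference.

    Returns (text for CHECK-LABEL, text for CHECK-SAME or None).
    """
    match = ATTR_REF.search(line)
    if match is None:
        return line, None
    # Keep only the text before the first reference in the label.
    return line[: match.start()].rstrip(), line[match.start():]
```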

Before:
  // CHECK-LABEL: func.func @test() attributes {amap = #map, bmap = #map1} {

After:
  // CHECK-LABEL:   func.func @test() attributes {amap =
  // CHECK-SAME:      #[[$ATTR_0]], bmap = #[[$ATTR_1]]} {

Fixes #162310

Assisted-by: Claude Code
2026-03-27 14:59:42 +01:00
Mehdi Amini
9c0bce7ede
[MLIR][Transform] Fix crash in CheckUses when op lacks MemoryEffectOpInterface (#188998)
collectFreedValues used cast<MemoryEffectOpInterface> unconditionally on
every op encountered during the walk. Ops that do not implement the
interface (e.g. pdl.pattern, pdl.operands, pdl.types inside a
transform.with_pdl_patterns region) trigger the cast assertion, crashing
mlir-opt when -transform-dialect-check-uses is requested.

Change the cast to dyn_cast and skip ops that don't implement the
interface; they cannot free transform values and are safely ignored.

Fixes #120944

Assisted-by: Claude Code
2026-03-27 13:57:07 +00:00