26551 Commits

Author SHA1 Message Date
Julian Oppermann
018e048daf
[MLIR][Linalg] Generic to category specialization for unary elementwise ops (#187217)
Handle specialization of `linalg.generic` ops representing a unary
elementwise computation to the `linalg.elementwise` category op. This
implements a previously absent path in the linalg morphism.
2026-04-02 10:50:21 +02:00
yebinchon
495e1a4257
[mlir] added a check in the walk to prevent catching a cos in a nested region (#190064)
The walk in SincosFusion may detect a cos within a nested region of the
sin block. This triggers an assertion in `isBeforeInBlock` later on.
Add a check within the walk that filters out operations in nested
regions, which are not in the same block and should not be fused anyway.

---------

Co-authored-by: Yebin Chon <ychon@nvidia.com>
2026-04-01 20:10:56 -07:00
Nishant Patel
b3ca423a78
[MLIR][Vector] Enhance vector.multi_reduction unrolling to handle scalar result (#188633)
Previously, UnrollMultiReductionPattern bailed out when all the
dimensions were reduced to a scalar. This PR adds support for this case
by tiling the source vector and chaining partial reductions through the
accumulator operand.
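
As a rough illustration only (plain Python lists standing in for vectors;
`unrolled_sum` and its parameters are hypothetical names, not part of the
pattern), tiling the source and chaining partial reductions through the
accumulator can be sketched as:

```python
from itertools import product


def unrolled_sum(source, tile_shape, acc=0):
    """Reduce a 2-D list to a scalar by visiting tiles and chaining the accumulator."""
    rows, cols = len(source), len(source[0])
    tile_rows, tile_cols = tile_shape
    for r0, c0 in product(range(0, rows, tile_rows), range(0, cols, tile_cols)):
        # Extract one tile of the source "vector".
        tile = [row[c0:c0 + tile_cols] for row in source[r0:r0 + tile_rows]]
        # Each partial reduction takes the previous result as its accumulator.
        acc = acc + sum(sum(row) for row in tile)
    return acc


source = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(unrolled_sum(source, (1, 2)) == sum(map(sum, source)))  # True
```

The sketch uses addition; any associative combining kind would chain the
same way.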
2026-04-01 14:59:08 -07:00
Jianhui Li
1a1fbf967a
[MLIR][XeGPU] Support round-robin layout for constant and broadcast in wg-to-sg distribution (#189798)
As title.
2026-04-01 14:58:01 -07:00
Nishant Patel
9f50004651
[MLIR][XeGPU] Enhance the peephole optimization to remove the convert_layout after multi-reduction rewrite (#188849)
2026-04-01 13:55:11 -07:00
Jianhui Li
401ba6df84
[MLIR][XeGPU] Add Layout Propagation support for multi-reduction/reduction op with scalar result (#189133)
This PR adds layout propagation support for multi-reduction/reduction ops
with a scalar result:
1) Enhance setupMultiReductionResultLayout() and
LayoutInfoPropagation::visitVectorMultiReductionOp() to support a scalar
result.
2) Add propagation support for the vector.reduction op at the lane level,
since the op is only introduced at the lane level.
2026-04-01 13:01:34 -07:00
Jeff Sandoval
95a76886c1
[OpenMP][MLIR] Fix GPU teams reduction buffer size for by-ref reductions (#185460)
The `ReductionDataSize` field in `KernelEnvironmentTy` and the
`MaxDataSize` used to compute the `reduce_data_size` argument to
`__kmpc_nvptx_teams_reduce_nowait_v2` were both computed using pointer
types for by-ref reductions instead of the actual element types. This
caused the global teams reduction buffer to be undersized relative to
the offsets used by the copy/reduce callbacks, resulting in
out-of-bounds access faults at runtime.

For example, a by-ref reduction over `[4 x i32]` (16 bytes) would
allocate buffer slots based on `sizeof(ptr)` = 8 bytes, but the
generated callbacks would access 16 bytes per slot.

Fix both computation sites:

1. In MLIR's `getReductionDataSize()`, use
`DeclareReductionOp::getByrefElementType()` instead of `getType()` when
the reduction is by-ref, so the reduction buffer struct layout (and more
importantly its size) matches that emitted by the `OMPIRBuilder`.

2. In `OMPIRBuilder::createReductionsGPU()`, use
`ReductionInfo::ByRefElementType` instead of `ElementType` for by-ref
reductions when computing `MaxDataSize`. It seems that `MaxDataSize`
isn't actually used in the deviceRTL, but it's better to fix it to avoid
future propagation of this bug.

Finally, add CHECK lines to the existing array-descriptor reduction test
to verify both the kernel environment `ReductionDataSize` and the
`reduce_data_size` call argument reflect the actual element type size.
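
The size mismatch from the `[4 x i32]` example can be reproduced with a
small arithmetic sketch. This is only a model: it assumes an 8-byte
pointer, and `slot_size` / `overrun` are hypothetical helper names rather
than anything in MLIR or the `OMPIRBuilder`.

```python
POINTER_SIZE = 8  # bytes; assumes a 64-bit target


def slot_size(element_size, by_ref, use_element_type):
    """Bytes reserved per reduction slot in the teams reduction buffer."""
    if by_ref and not use_element_type:
        return POINTER_SIZE  # old behaviour: sized from the pointer type
    return element_size      # fixed behaviour: sized from the element type


def overrun(element_size, by_ref, use_element_type):
    """Bytes the copy/reduce callbacks touch beyond the reserved slot."""
    reserved = slot_size(element_size, by_ref, use_element_type)
    return max(0, element_size - reserved)


array_bytes = 4 * 4  # [4 x i32]
print(overrun(array_bytes, by_ref=True, use_element_type=False))  # 8
print(overrun(array_bytes, by_ref=True, use_element_type=True))   # 0
```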

Assisted-by: Claude Opus 4.6

---------

Co-authored-by: Jeffrey Sandoval <jeffrey.sandoval@hpe.com>
2026-04-01 14:59:16 -05:00
Nishant Patel
150aa6f2d3
[MLIR][XeGPU] Add support for convert layout with scalar in Sg to WI distribution (#189721)
2026-04-01 12:05:32 -07:00
Zhen Wang
3b3b556a12
[mlir][NVVM] Add managed attribute for global variables (#189751)
Add support for the `nvvm.managed` attribute on `llvm.mlir.global` ops.
When present, the LLVM IR translation emits `!nvvm.annotations` metadata
with `!"managed"` for the global variable, which the NVPTX backend uses
to generate `.attribute(.managed)` in PTX output.

This enables CUDA managed memory support for frontends that lower
through MLIR.
2026-04-01 18:18:20 +00:00
Vito Secona
fbf484009c
[mlir][sparse] add GPU num threads to sparsifier options (#189078)
This change adds a `gpu-num-threads` option to the sparsifier. This
allows users to specify the number of threads used for GPU codegen,
similar to the `num-threads` option in the `-sparse-gpu-codegen` pass.
2026-04-01 10:42:26 -07:00
Jan Leyonberg
91adaeceb1
[CIR][MLIR][OpenMP] Enable the MarkDeclareTarget pass for ClangIR (#189420)
This patch enables the MarkDeclareTarget pass for CIR by adding the pass
to the lowerings and attaching the declare target interface to
cir::FuncOp. The MarkDeclareTarget pass is also generalized to work on
FunctionOpInterface instead of func::FuncOp, since it needs to handle
cir::FuncOp as well.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-01 12:50:09 -04:00
Razvan Lupusoru
9506f20b4d
[flang][acc] Add AA implementation for acc operations (#189772)
This PR extends flang's alias analysis so it can reason about values
that originate from OpenACC data and privatization operations, including
values passed through block arguments.
2026-04-01 09:03:47 -07:00
Ming Yan
158f10fe24
[mlir][memref] Fold memref.reinterpret_cast operations with valid offset or size constants. (#189533)
When encountering an invalid offset or size, we only skip the current
invalid value and continue attempting to fold other valid offsets or
sizes.
2026-04-01 15:30:12 +02:00
AidinT
461a1c51bf
[mlir][ods] resolve the wrong indent issue (#189277)
`emitSummaryAndDescComments` generates the summary and description
comments for TableGen-generated classes and structs such as dialects and
interfaces. The generated summary and description are indented
incorrectly in the generated output file. For example,
`NVGPUDialect.h.inc` looks like the following:

```cpp
namespace mlir::nvgpu {

/// The `NVGPU` dialect provides a bridge between higher-level target-agnostic
///     dialects (GPU and Vector) and the lower-level target-specific dialect
///     (LLVM IR based NVVM dialect) for NVIDIA GPUs. This allow representing PTX
///     specific operations while using MLIR high level dialects such as Memref
///     and Vector for memory and target-specific register operands, respectively.
class NVGPUDialect : public ::mlir::Dialect {
    ...
  };

} // namespace mlir::nvgpu
```

This happens because `emitSummaryAndDescComments` trims the summary and
description from both sides, rendering the re-indentation useless. This
PR resolves this bug.
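
The effect can be mimicked outside of TableGen. The sketch below is an
analogy in Python using `textwrap.dedent`, not the ODS emitter itself, and
the two helper names are hypothetical. It shows why trimming both sides
before re-indenting leaves the continuation lines indented: once the first
line loses its leading whitespace, the lines no longer share a common
prefix to strip.

```python
import textwrap

description = """
    The `NVGPU` dialect provides a bridge between higher-level dialects
    and the lower-level NVVM dialect.
"""


def trim_then_reindent(text):
    # Problematic order: strip() removes the first line's indentation, so
    # dedent() finds no common leading whitespace and changes nothing.
    return textwrap.dedent(text.strip())


def reindent_then_trim(text):
    # Re-indent while every line still carries the common prefix, then trim.
    return textwrap.dedent(text).strip()


print(trim_then_reindent(description))
print(reindent_then_trim(description))
```

Re-indenting first is just one ordering that avoids the problem in this
model; the message above does not say which approach the patch takes.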
2026-04-01 15:20:54 +02:00
Erick Ochoa Lopez
5072c020aa
[mlir][vector] Drop trailing 1-dims from constant_mask (#187383)
Generalize TransferReadDropUnitDimsPattern to also drop unit dimensions
when `vector::ConstantMaskOp` is used.

Previously TransferReadDropUnitDimsPattern would only drop unit
dimensions when `vector::CreateMaskOp` with a statically known operand
was used.

Assisted-by: Cursor
2026-04-01 09:05:25 -04:00
Mehdi Amini
f6ffdbcbae
[MLIR][Affine] Fix dead store elimination for vector stores with different types (#189248)
affine-scalrep's findUnusedStore incorrectly classified an
affine.vector_store as dead when a subsequent store wrote to the same
base index but with a smaller vector type. A vector<1xi64> store at
[0,0] does not fully overwrite a vector<5xi64> store at [0,0], so the
first store must be preserved.

The loadCSE function in the same file already had the correct
type-equality check for loads; this patch adds the analogous check for
stores in findUnusedStore.
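
A toy memory model makes the miscompile visible. This is a sketch with
Python lists in place of memrefs; `run` and `is_dead` are hypothetical
names, and `is_dead` only mirrors the idea of the added type-equality
check.

```python
def run(stores, size=5):
    """Apply (base, values) stores in order to a zero-initialised buffer."""
    memory = [0] * size
    for base, values in stores:
        memory[base:base + len(values)] = values
    return memory


first = (0, [1, 2, 3, 4, 5])  # models a vector<5xi64> store at [0, 0]
second = (0, [9])             # models a vector<1xi64> store at [0, 0]

print(run([first, second]))  # [9, 2, 3, 4, 5]
print(run([second]))         # [9, 0, 0, 0, 0]  (first store wrongly removed)


def is_dead(earlier, later):
    """The earlier store is dead only if base and vector length both match."""
    return earlier[0] == later[0] and len(earlier[1]) == len(later[1])


print(is_dead(first, second))  # False
```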

Fixes #113687

Assisted-by: Claude Code
2026-04-01 10:40:53 +00:00
lonely eagle
6b2b0da40d
[mlir][CSE] Fix double-counting of numCSE statistic (#189802)
This PR fixes a regression where the numCSE statistic was being
incremented twice for a single operation elimination. The numCSE counter
is already internally incremented within the replaceUsesAndDelete
function. Manually incrementing it again after the function call leads
to an inaccurate total count. This is part of
https://github.com/llvm/llvm-project/pull/180556.
2026-04-01 17:10:20 +08:00
Mehdi Amini
249e871fa4
[MLIR][ArithToLLVM] Fix index_cast on memref types generating invalid LLVM IR (#189227)
`arith.index_cast` and `arith.index_castui` accept memref operands (via
`IndexCastTypeConstraint`), but `IndexCastOpLowering::matchAndRewrite`
did not handle this case. When the operand was a memref, the conversion
framework substituted the converted LLVM struct type, and the lowering
incorrectly attempted to emit `llvm.sext`/`llvm.zext`/`llvm.trunc` on a
struct value, producing invalid LLVM IR.

Since LLVM uses opaque pointers, all memrefs with integer or index
element types lower to the same `!llvm.struct<(ptr, ptr, i64, ...)>`
type, making `arith.index_cast` on memrefs a no-op at the LLVM level.
Add a check that treats the memref case as an identity conversion (same
as the same-bit-width path).

Fixes #92377

Assisted-by: Claude Code
2026-04-01 11:03:14 +02:00
Mehdi Amini
b1f8c28559
[MLIR] Validate APInt bitwidth in IntegerAttr::get(Type, APInt) (#188725)
IntegerAttr::get(Type, APInt) did not validate that the APInt's bit
width matched the expected bit width for the given type. For integer
types, the APInt width must equal the integer type's width. For index
types, the APInt width must equal IndexType::kInternalStorageBitWidth
(64 bits).

Passing an APInt with the wrong bit width could cause a
non-deterministic crash in StorageUniquer when comparing two IntegerAttr
instances for the same type but with different APInt widths.

This commit adds assertions in the get(Type, APInt) builder to catch
such misuse early in debug builds, providing a clear error message at
the call site rather than a cryptic crash in the storage uniquer.
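
The width rule can be modelled in a few lines. This is a sketch only: the
string spelling of types, `expected_width`, and `make_integer_attr` are
hypothetical stand-ins for the C++ builder, and the 64-bit index width is
taken from the description above.

```python
INDEX_STORAGE_BITS = 64  # IndexType::kInternalStorageBitWidth, per the text above


def expected_width(type_name):
    """Bit width an APInt must have for a type spelled 'index' or 'iN'."""
    if type_name == "index":
        return INDEX_STORAGE_BITS
    if type_name.startswith("i") and type_name[1:].isdigit():
        return int(type_name[1:])
    raise ValueError(f"not an integer or index type: {type_name}")


def make_integer_attr(type_name, value, bit_width):
    """Build a (type, value) pair, rejecting mismatched widths at the call site."""
    if bit_width != expected_width(type_name):
        raise AssertionError(
            f"APInt width {bit_width} does not match type {type_name}")
    return (type_name, value % (1 << bit_width))


print(make_integer_attr("i32", 7, 32))    # ('i32', 7)
print(make_integer_attr("index", 5, 64))  # ('index', 5)
```

Calling `make_integer_attr("i32", 7, 64)` raises immediately, which is the
early failure the assertions are meant to provide.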

Fixes #56401

Assisted-by: Claude Code
2026-04-01 10:47:36 +02:00
Lukas Sommer
6a31be68e3
[mlir][NFC] Remove conditionally unused type alias (#189894)
The `RawType` type alias is unused (`-Wunused-local-typedef`) in builds
with assertions disabled. In combination with `-Werror`, this causes
those builds to fail.

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
2026-04-01 10:25:58 +02:00
AidinT
585e2a015b
[MLIR] Convert BytecodeDialectInterface to ods (#188852)
This PR converts `BytecodeDialectInterface` to ODS.
2026-04-01 05:41:07 +02:00
AidinT
d52a5e8a5a
[MLIR] convert ConvertToEmitCPatternInterface to ODS (#188621)
This PR converts `ConvertToEmitCPatternInterface` dialect interface to ODS. Also makes changes to derived classes.
2026-04-01 05:30:12 +02:00
Krzysztof Drewniak
7fce7631a0
[mlir] Refactor opaque properties to make them type-safe (#185157)
At its core, this commit changes `OpaqueProperties` (aka a void*) to
`PropertyRef`, which is a {TypeID, void*}, where the TypeID is the ID of
the storage type of the given property (which can, as is often the case
for operations, be a struct of other properties).

Long-term, this change will allow for:
1) Some sort of getFooPropertyRef() on property structs, allowing
individual members to be extracted generically
2) Generic parsing (in combination with a bunch of other changes), by
having a property kind that is an OwningPropertyRef
3) Probably a safer C/Python API, because we'll be able to indicate
what's supposed to be under a given void*

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 19:49:40 -07:00
Krzysztof Drewniak
b813b0b4e4
[mlir][MemRef] Migrate memref dialect alias op folding to interface (#187168)
This PR updates FoldMemRefAliasOps / --fold-memref-alias-ops to use the
new IndexedMemoryAccessOpInterface and IndexedMemCopyOpInterface, and
implements those interfaces for the relevant operations in the memref
dialect.

This is a reordering of the changes planned in #177014 and #177016 to
make them more testable.

There are no behavior changes expected for how memref.load and
memref.store behave within the alias ops folding pass, though support
for new operations, like memref.prefetch, has been added.

Some error messages have been updated because certain laws of
memref.load/memref.store have been moved to IndexedAccessOpInterface.

Assisted-by: Claude 4.6 (helped deal with some of the boilerplate in the
rewrite patterns and with extracting the patch)
2026-03-31 14:44:27 -07:00
Zhen Wang
3ed48bf648
Revert "[flang][cuda] Support non-allocatable module-level managed variables" (#189745)
Reverts llvm/llvm-project#188526
2026-03-31 20:53:50 +00:00
Artem Gindinson
0454de8b54
[mlir][Arith] Avoid sign overflow when narrowing signed operations (#189676)
Whether an arith operation can be truncated to a given bitwidth should
also depend on the sign semantics of the operation itself. Consider:
```
%input = /* upper bound > INT32_MAX, <= UINT32_MAX */ : index
%c0 = arith.constant 0 : index
%cmp = arith.cmpi sle, %input, %c0 : index
```

Previously, `checkTruncatability()` would correctly judge that only an
unsigned truncation could be legal. However, the narrowing would still
proceed even though the `sle` predicate treats the MSB as the sign bit.

Ensure that the sign is checked for signed comparison predicates and for
signed elementwise operations by enforcing a `CastKind::Signed`
restriction, whereby the narrowing patterns bail out on incompatible
input range/operation signedness.
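
The hazard is easy to reproduce with plain integer arithmetic. The sketch
below models fixed-width two's complement values in Python (the helper
names are hypothetical) and compares `sle` against zero at 64 and 32 bits
for a value that fits in `UINT32_MAX` but exceeds `INT32_MAX`.

```python
def truncate(value, bits):
    """Keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)


def as_signed(value, bits):
    """Interpret the low `bits` bits as a two's complement signed integer."""
    value = truncate(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def sle(lhs, rhs, bits):
    """Signed less-than-or-equal at the given bit width."""
    return as_signed(lhs, bits) <= as_signed(rhs, bits)


value = 3_000_000_000  # > INT32_MAX, <= UINT32_MAX

print(truncate(value, 32) == value)  # True: the unsigned truncation is lossless
print(sle(value, 0, 64))             # False: the original comparison
print(sle(value, 0, 32))             # True: the narrowed comparison flips
```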

**AI tooling usage disclaimer**
LIT tests were expanded from manual reproducer examples with LLM
assistance. Those additional test cases were verified to act as
regression tests, then proofread and edited manually in accordance with
the "Human in the loop" policy. LLMs/generative tooling were not used for
implementation/documentation purposes.

---------

Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: GPT 5.4 <codex@openai.com>
2026-03-31 22:45:18 +02:00
Zhen Wang
c4e6cf0abf
[flang][cuda] Support non-allocatable module-level managed variables (#188526)
Add support for non-allocatable module-level CUDA managed variables
using pointer indirection through a companion global in
__nv_managed_data__. The CUDA runtime populates this pointer with the
unified memory address via __cudaRegisterManagedVar and
__cudaInitModule.

1. Create a .managed.ptr companion global in the __nv_managed_data__
section and register it with _FortranACUFRegisterManagedVariable
(CUFAddConstructor.cpp)
2. Call __cudaInitModule after registration to populate the managed
pointer (registration.cpp)
3. Annotate managed globals in gpu.module with nvvm.managed for PTX
.attribute(.managed) generation (cuda-code-gen.mlir)
4. Suppress cuf.data_transfer for assignments to/from non-allocatable
module managed variables, since cudaMemcpy would target the shadow
address rather than the actual unified memory (tools.h)
5. Preserve cuf.data_transfer for device_var = managed_var assignments
where explicit transfer is still required
2026-03-31 16:27:08 +00:00
Nishant Patel
65720adc15
[MLIR][XeGPU] Switch to the new sg to wi pass (#188627)
This PR contains the changes required to switch the pipeline to the new
sg-to-wi pass.
2026-03-31 09:20:11 -07:00
Chi-Chun, Chen
9e77a45935
[mlir][OpenMP][NFC] Refactor fillAffinityIteratorLoop (#189418)
Extract affinity-specific logic from fillAffinityIteratorLoop into a
callback so that the iterator loop codegen logic can be shared with
other clauses such as depend clause and target clause.
2026-03-31 11:12:00 -05:00
Chi-Chun, Chen
7ff0dc4b9f
[mlir][OpenMP] Add iterator support to depend clause (#189090)
Extend the depend clause to support `!omp.iterated<Ty>` handles
alongside plain depend vars, so the IR can represent both forms.

Assisted with copilot

This is part of feature work for
https://github.com/llvm/llvm-project/issues/188061
2026-03-31 11:11:08 -05:00
Mehdi Amini
6477f3aa16
[mlir][ArithToSPIRV] Fix invalid SPIRV and crashes when lowering integer ops on i1 (#189239)
Several arith integer operations on i1 / vector<Ni1> types were either
crashing or producing invalid SPIRV. The i1 type maps to spirv.bool in
SPIRV, not to a SPIRV integer — so standard integer SPIRV ops
(spirv.IAdd, spirv.UDiv, spirv.GLSMax, etc.) are illegal on it.
Add dedicated boolean patterns for all affected arith integer ops, each
with benefit=2 to take priority over the generic elementwise patterns.
The semantics for i1 follow from treating true = 1 / false = 0 with
two's complement wrapping:
   
- addi, subi → spirv.LogicalNotEqual (XOR on bits)
- muli, divui, divsi → spirv.LogicalAnd
- remui, remsi, shli, shrui → spirv.LogicalAnd(a, spirv.LogicalNot(b)),
  i.e. a & ~b
- shrsi → identity (arithmetic right shift of a 1-bit signed value is
  always the input)
- maxui, minsi → spirv.LogicalOr (for unsigned max and signed min, true
  wins: unsigned true = 1 > 0, signed true = -1 < 0)
- maxsi, minui → spirv.LogicalAnd (for signed max and unsigned min,
  false wins)
Fixes #61162
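
Part of the table above can be checked exhaustively, since i1 has only
four input pairs. The sketch below is a Python model, not the conversion
patterns themselves; all names are hypothetical. It covers the ops that
are defined for every input and leaves out division, remainder, and
shifts, where some inputs (division by zero, a shift amount of 1) have no
defined result to compare against.

```python
from itertools import product


def wrap(value):
    """Keep the low bit, i.e. wrap to 1-bit two's complement storage."""
    return value & 1


def signed(bit):
    """Interpret a stored bit as a signed i1: 0 -> 0, 1 -> -1."""
    return -bit


# Reference semantics on 1-bit integers.
REFERENCE = {
    "addi": lambda a, b: wrap(a + b),
    "subi": lambda a, b: wrap(a - b),
    "muli": lambda a, b: wrap(a * b),
    "maxui": lambda a, b: max(a, b),
    "minui": lambda a, b: min(a, b),
    "maxsi": lambda a, b: wrap(max(signed(a), signed(b))),
    "minsi": lambda a, b: wrap(min(signed(a), signed(b))),
}

# Boolean forms from the list above.
LOWERED = {
    "addi": lambda a, b: a != b,   # LogicalNotEqual
    "subi": lambda a, b: a != b,   # LogicalNotEqual
    "muli": lambda a, b: a and b,  # LogicalAnd
    "maxui": lambda a, b: a or b,  # LogicalOr
    "minui": lambda a, b: a and b,
    "maxsi": lambda a, b: a and b,
    "minsi": lambda a, b: a or b,
}


def mismatches():
    """Return every (op, a, b) where the boolean form disagrees."""
    bad = []
    for name, (a, b) in product(REFERENCE, product((0, 1), repeat=2)):
        if REFERENCE[name](a, b) != int(bool(LOWERED[name](a, b))):
            bad.append((name, a, b))
    return bad


print(mismatches())  # []
```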

Assisted-by: Claude Code
2026-03-31 17:56:51 +02:00
AidinT
67c34294a6
[mlir][docs] dialect interfaces and mlir reduce documentation fix (#189258)
Two modifications:

1. Reflect newly added dialect interface methods in the documentation
2. Remove the bug in the `MLIR Reduce` documentation
2026-03-31 15:23:13 +00:00
Arseniy Obolenskiy
09c54a8f7a
[mlir][SPIR-V] Support spirv.loop_control attribute on scf.for and scf.while (#189392)
Propagate the `spirv.loop_control` attribute from `scf.for` and
`scf.while` operations to the generated `spirv.mlir.loop` during
SCFToSPIRV conversion
2026-03-31 16:49:37 +02:00
Leandro Lupori
a30a8e9474
Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#183794)" (#188851)
Linear iteration variables were being treated as private. This fixes
one of the issues reported in #170784.

The regression reported in #188536 occurred because
LinearClauseProcessor was rewriting all basic blocks whose names
contained a given substring, including those that were not part of the
translated SIMD region.
This didn't cause problems before because linear variables were always
privatized, which doesn't happen with this change.
The issue is fixed by rewriting only the basic blocks that correspond to
the omp.simd operation.
2026-03-31 09:36:08 -03:00
Davide Grohmann
d5f7acdbc1
[mlir][spirv] Add Cast/Rescale ops in TOSA Ext Inst Set (#189028)
This patch introduces the following operators:

spirv.Tosa.Cast
spirv.Tosa.Rescale

Dialect and serialization round-trip tests have also been added.

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
2026-03-31 14:31:04 +02:00
Guray Ozen
97b78d6ff3
[MLIR][NVVM] Fix predicate operand index in BasicPtxBuilderInterface (#189552)
The predicate index computation was incorrect: it did not count
write/readwrite symbols.

Wrong case:
```
  // CHECK: %{{.*}} =  llvm.inline_asm has_side_effects asm_dialect = att "@$1 ex2.approx.ftz.f32 $0, $1;", "=f,f,b" %{{.*}}, %{{.*}} : (f32, i1) -> f32
  %1 = nvvm.inline_ptx "ex2.approx.ftz.f32 {$w0}, {$r0};" ro (%input : f32), predicate = %pred  -> f32
```

With this PR, the predicate index becomes `@$2`:
```
// CHECK: %{{.*}} =  llvm.inline_asm has_side_effects asm_dialect = att "@$2 ex2.approx.ftz.f32 $0, $1;", "=f,f,b" %{{.*}}, %{{.*}} : (f32, i1) -> f32
  %1 = nvvm.inline_ptx "ex2.approx.ftz.f32 {$w0}, {$r0};" ro (%input : f32), predicate = %pred  -> f32
  ```
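
The index arithmetic for the example above can be sketched as follows.
This models only the case shown (one write operand and one read operand);
`predicate_prefix` and its `count_writes` flag are hypothetical, and
read-write operands are left out because the message does not show how
they are numbered.

```python
def predicate_prefix(num_write, num_read, count_writes=True):
    """Return the '@$N' guard for a predicate placed after all other operands."""
    index = num_read + (num_write if count_writes else 0)
    return f"@${index}"


body = "ex2.approx.ftz.f32 $0, $1;"  # $0 is the write, $1 is the read

# Old behaviour in this model: writes are not counted, so the guard
# collides with $1.
print(predicate_prefix(1, 1, count_writes=False), body)
# Fixed behaviour: the predicate gets the next free index.
print(predicate_prefix(1, 1), body)
```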
2026-03-31 13:36:58 +02:00
Sergei Lebedev
b544ad5703
[MLIR] [Python] Added a way to extend MLIR->Python type mappings (#189368)
The idea is to use TableGen records for both custom type constraints and
attributes:

* `PythonTypeName` is for type constraints, while
* `PythonAttrType` is for attributes.

The key types differ between these two records. `PythonTypeName` is
keyed by C++ type because multiple type constraints map to the same C++
type (e.g. `I32` and `I64` both map to `::mlir::IntegerType`), so a
single entry covers all of them. `PythonAttrType` is keyed by TableGen
def name because different attributes can share the same C++ storage
type but need distinct Python types (e.g. `I32ArrayAttr` and
`StrArrayAttr` are both `::mlir::ArrayAttr`).

We could in theory reimplement `getPythonAttrName` using the same
approach, but I decided to leave it for future PRs.
2026-03-31 11:00:40 +01:00
Mehdi Amini
7ad564e54b
[MLIR][MemRef] Fix LoadOpOfExpandShapeOpFolder returning failure after IR change (#188964)
LoadOpOfExpandShapeOpFolder<vector::TransferReadOp>::matchAndRewrite
called resolveSourceIndicesExpandShape (which creates
AffineLinearizeIndexOp ops via the rewriter) before checking whether the
vector::TransferReadOp preconditions hold. When those checks failed
(sourceRank < vectorRank or permutation map mismatch), the pattern
returned failure() after already modifying the IR, triggering "pattern
returned failure but IR did change" under
MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.

Fix by hoisting the vector::TransferReadOp precondition checks to before
the resolveSourceIndicesExpandShape call. The source rank is derived
from expandShapeOp.getViewSource()'s type (no IR creation needed), and
the permutation map check only uses op attributes. Only if all checks
pass do we proceed to create the linearized-index ops.

Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
2026-03-31 11:39:03 +02:00
Slava Zakharin
35f89458fa
[mlir] Made DefaultResource the root of memory resource hierarchy. (#187423)
DefaultResource is made the root of the memory resource hierarchy,
so now it overlaps with all resources.

RFC:
https://discourse.llvm.org/t/rfc-mlir-memory-region-hierarchy-for-mlir-side-effects/89811/32
2026-03-30 17:52:45 -07:00
Mehdi Amini
acbf3f3186
[MLIR][SCF] Fix scf.index_switch lowering to preserve large case values (#189230)
`IndexSwitchLowering` stored case values as `SmallVector<int32_t>`,
which silently truncated any `int64_t` case value larger than INT32_MAX
(e.g. `4294967296` became `0`). The `cf.switch` flag was also created
via `arith.index_cast index -> i32`, losing the upper 32 bits on 64-bit
platforms.

Fix: store case values as `SmallVector<APInt>` with 64-bit width, cast
the index argument to `i64`, and use the `ArrayRef<APInt>` overload of
`cf::SwitchOp::create` so the resulting switch correctly uses `i64` case
values and flag type.
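
The truncation itself is simple to demonstrate. The sketch below models a
C-style narrowing to `int32_t` in Python (`to_int32` is a hypothetical
helper) and shows two distinct 64-bit case values collapsing to the same
32-bit value.

```python
def to_int32(value):
    """Model narrowing a 64-bit integer to int32_t (two's complement wrap)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


cases = [0, 4294967296]  # two distinct scf.index_switch case values

print(to_int32(4294967296))             # 0
print(len({to_int32(c) for c in cases}))  # 1: both cases now look identical
```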

Fixes #111589

Assisted-by: Claude Code
2026-03-31 00:46:28 +02:00
Mehdi Amini
5da2546594
[mlir][scf] Fix FoldTensorCastOfOutputIntoForallOp write order bug (#189162)
`FoldTensorCastOfOutputIntoForallOp` incorrectly updated the
destinations of `tensor.parallel_insert_slice` ops in the `in_parallel`
block by zipping `getYieldingOps()` with `getRegionIterArgs()`
positionally. This assumed that the i-th yielding op writes to the i-th
shared output, which is not required by the IR semantics. When slices
are written to shared outputs in non-positional order, the
canonicalization would silently reverse the write targets, producing
incorrect output.

Fix by replacing the positional zip with a per-destination check: for
each yielding op's destination operand, if it is a `tensor.cast` result
whose source is one of the new `scf.forall` region iter args (i.e., a
cast we introduced to bridge the type change), replace the destination
with the cast's source directly. This correctly handles all orderings.
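
The difference between the two strategies can be modelled with strings
standing in for SSA values. This is a sketch, not the canonicalization
pattern: the value names and both helper functions are hypothetical.

```python
def retarget_positional(yield_dests, new_iter_args):
    """Old approach: the i-th yielding op is assumed to write the i-th output."""
    return [new_iter_args[i] for i in range(len(yield_dests))]


def retarget_by_destination(yield_dests, cast_source):
    """Fixed approach: replace a destination only if it is one of our casts."""
    return [cast_source.get(dest, dest) for dest in yield_dests]


new_iter_args = ["arg0", "arg1"]
# Casts introduced to bridge the type change, mapped to their sources.
cast_source = {"cast0": "arg0", "cast1": "arg1"}
# The in_parallel block writes the outputs in non-positional order.
yield_dests = ["cast1", "cast0"]

print(retarget_positional(yield_dests, new_iter_args))    # ['arg0', 'arg1']
print(retarget_by_destination(yield_dests, cast_source))  # ['arg1', 'arg0']
```

The positional result swaps the write targets, while the per-destination
lookup keeps each write on the output it originally targeted.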

Add a regression test that exercises the multi-result case where
`parallel_insert_slice` ops write to shared outputs in non-sequential
order.

Fixes #172981

Assisted-by: Claude Code
2026-03-31 00:35:13 +02:00
Mehdi Amini
e097875417
[MLIR][SparseTensor] Fix fingerprint changes in SparseFuncAssembler (#188958)
SparseFuncAssembler::matchAndRewrite was calling funcOp.setName(),
funcOp.setPrivate(), and funcOp->removeAttr() directly without notifying
the rewriter, causing "operation fingerprint changed" errors under
MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.

Wrap all in-place funcOp mutations with rewriter.modifyOpInPlace.

Assisted-by: Claude Code

Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 00:33:47 +02:00
Mehdi Amini
27b9ea5ea0
[MLIR][SparseTensor] Fix domination violation in co-iteration for dense iterators (#188959)
In exitWhileLoop, random-accessible (dense) iterators were being located
using whileOp.getResults().back() while the insertion point was still
inside the while loop's after block. This caused a domination violation:
the ADDI created by locate() was inside the after block, but it was
later used (via derefImpl's SUBI) after the while loop exits.

Move the locate() calls for random-accessible iterators to after
builder.setInsertionPointAfter(whileOp), where the while results are
properly in scope.

Fixes 10 failing tests under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.

Assisted-by: Claude Code

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 00:33:28 +02:00
Keshav Vinayak Jha
54b723097b
[MLIR][Affine] Add vector support to affine.linearize_index and affine.delinearize_index (#188369)
Allow `affine.delinearize_index` and `affine.linearize_index` to operate
on `vector<...x index>` types in addition to scalar indices.

---------

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:26:39 -07:00
Eric Feng
ae835dea74
[mlir][amdgpu] implement amdgpu.global_load_async_to_lds for gfx1250 (#189279)
This patch introduces an amdgpu wrapper for
`rocdl.global.load.async.to.lds.bN` intrinsics, which were introduced in
gfx1250.

Assisted-by: Claude

---------

Signed-off-by: Eric Feng <Eric.Feng@amd.com>
2026-03-30 14:20:59 -07:00
Stanislav Mekhanoshin
5f99854d01
[AMDGPU] Drop A and B neg modifier from amdgcn_wmma_bf16_16x16x32_bf16 (#189468)
Fixes: LCOMPILER-1673
2026-03-30 14:14:22 -07:00
Nishant Patel
e50f08b548
[MLIR] [XeGPU] Add distribution patterns for vector transpose, bitcast & mask ops in sg to wi pass (#187392)
This PR adds patterns for the following vector ops in the new sg-to-wi pass:

1. Transpose
2. BitCast
3. CreateMask
4. ConstantMask
2026-03-30 14:06:03 -07:00
Berke Ates
b6e4d27c48
[MLIR][Mem2Reg] Extract shared utilities for PromotableRegionOpInterface (#188514)
The `PromotableRegionOpInterface` implementations use two helpers that
are likely useful for other dialects implementing this interface as
well:
- `updateTerminator`: Appends the reaching definition as an operand to a
block's terminator, falling back to a default when the block has no
entry (e.g. dead code).
- `replaceWithNewResults`: Clones an operation with additional result
types while preserving its regions, then replaces the original.

This PR extracts them into a common utility header so that downstream
dialects can reuse them directly.
I'm open to discussion about the location of these utilities.
2026-03-30 22:20:39 +02:00
Maksim Levental
f10dccd458
[MLIR][SparseTensor] Add #undef FAILURE_IF_FAILED and ERROR_IF (#188685)
Both DimLvlMapParser.cpp and LvlTypeParser.cpp define FAILURE_IF_FAILED
and ERROR_IF macros that are never undefined, which can leak into
subsequent translation units in unity builds. Add #undef at the end of
each file. See
https://discourse.llvm.org/t/rfc-enabling-unity-build/90306 for more
info.

"clauded" not coded
2026-03-30 12:27:48 -07:00
Maksim Levental
03869c74b6
[MLIR][SparseTensor] Add missing #undef REMUI and DIVUI (#188686)
LoopEmitter.cpp and SparseTensorIterator.cpp define REMUI and DIVUI
macros but the existing #undef block at the end of each file omits them.
This can leak the macros into subsequent translation units in unity
builds. See https://discourse.llvm.org/t/rfc-enabling-unity-build/90306
for more info.

"clauded" not coded
2026-03-30 12:27:31 -07:00