llvm-project

Author	SHA1	Message	Date
Julian Oppermann	018e048daf	[MLIR][Linalg] Generic to category specialization for unary elementwise ops (#187217 ) Handle specialization of `linalg.generic` ops representing a unary elementwise computation to the `linalg.elementwise` category op. This implements a previously absent path in the linalg morphism.	2026-04-02 10:50:21 +02:00
yebinchon	495e1a4257	[mlir] added a check in the walk to prevent catching a cos in a nested region (#190064 ) The walk in SincosFusion may detect a cos within a nested region of the sin block. This triggers an assertion in `isBeforeInBlock` later on. Added a check within the walk so it filters operations in nested regions, which are not in the same block and should not be fused anyway. --------- Co-authored-by: Yebin Chon <ychon@nvidia.com>	2026-04-01 20:10:56 -07:00
Nishant Patel	b3ca423a78	[MLIR][Vector] Enhance vector.multi_reduction unrolling to handle scalar result (#188633 ) Previously, UnrollMultiReductionPattern bailed out when all the dimensions were reduced to a scalar. This PR adds support for this case by tiling the source vector and chaining partial reductions through the accumulator operand.	2026-04-01 14:59:08 -07:00
Jianhui Li	1a1fbf967a	[MLIR][XeGPU] Support round-robin layout for constant and broadcast in wg-to-sg distribution (#189798 ) As title.	2026-04-01 14:58:01 -07:00
Nishant Patel	9f50004651	[MLIR][XeGPU] Enhance the peephole optimization to remove the convert_layout after multi-reduction rewrite (#188849 )	2026-04-01 13:55:11 -07:00
Jianhui Li	401ba6df84	[MLIR][XeGPU] Add Layout Propagation support for multi-reduction/reduction op with scalar result (#189133 ) This PR adds layout propagation support for multi-reduction/reduction ops with a scalar result: 1) Enhance setupMultiReductionResultLayout() and LayoutInfoPropagation::visitVectorMultiReductionOp() to support a scalar result. 2) Add propagation support for the vector.reduction op at the lane level, since the op is only introduced at the lane level.	2026-04-01 13:01:34 -07:00
Jeff Sandoval	95a76886c1	[OpenMP][MLIR] Fix GPU teams reduction buffer size for by-ref reductions (#185460 ) The `ReductionDataSize` field in `KernelEnvironmentTy` and the `MaxDataSize` used to compute the `reduce_data_size` argument to `__kmpc_nvptx_teams_reduce_nowait_v2` were both computed using pointer types for by-ref reductions instead of the actual element types. This caused the global teams reduction buffer to be undersized relative to the offsets used by the copy/reduce callbacks, resulting in out-of-bounds access faults at runtime. For example, a by-ref reduction over `[4 x i32]` (16 bytes) would allocate buffer slots based on `sizeof(ptr)` = 8 bytes, but the generated callbacks would access 16 bytes per slot. Fix both computation sites: 1. In MLIR's `getReductionDataSize()`, use `DeclareReductionOp::getByrefElementType()` instead of `getType()` when the reduction is by-ref, so the reduction buffer struct layout (and more importantly its size) matches that emitted by the `OMPIRBuilder`. 2. In `OMPIRBuilder::createReductionsGPU()`, use `ReductionInfo::ByRefElementType` instead of `ElementType` for by-ref reductions when computing `MaxDataSize`. It seems that `MaxDataSize` isn't actually used in the deviceRTL, but it's better to fix it to avoid future propagation of this bug. Finally, add CHECK lines to the existing array-descriptor reduction test to verify both the kernel environment `ReductionDataSize` and the `reduce_data_size` call argument reflect the actual element type size. Assisted-by: Claude Opus 4.6 --------- Co-authored-by: Jeffrey Sandoval <jeffrey.sandoval@hpe.com>	2026-04-01 14:59:16 -05:00
Nishant Patel	150aa6f2d3	[MLIR][XeGPU] Add support for convert layout with scalar in Sg to WI distribution (#189721 )	2026-04-01 12:05:32 -07:00
Zhen Wang	3b3b556a12	[mlir][NVVM] Add managed attribute for global variables (#189751 ) Add support for the `nvvm.managed` attribute on `llvm.mlir.global` ops. When present, the LLVM IR translation emits `!nvvm.annotations` metadata with `!"managed"` for the global variable, which the NVPTX backend uses to generate `.attribute(.managed)` in PTX output. This enables CUDA managed memory support for frontends that lower through MLIR.	2026-04-01 18:18:20 +00:00
Vito Secona	fbf484009c	[mlir][sparse] add GPU num threads to sparsifier options (#189078 ) This change adds a `gpu-num-threads` option to the sparsifier. This allows users to specify the number of threads used for GPU codegen, similar to the `num-threads` option in the `-sparse-gpu-codegen` pass.	2026-04-01 10:42:26 -07:00
Jan Leyonberg	91adaeceb1	[CIR][MLIR][OpenMP] Enable the MarkDeclareTarget pass for ClangIR (#189420 ) This patch enables the MarkDeclareTarget for CIR by adding the pass to the lowerings and attaching the declare target interface to the cir::FuncOp. The MarkDeclareTarget is also generalized to work on the FunctionOpInterface instead of func::Op since it needs to be able to handle cir::FuncOp as well. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-01 12:50:09 -04:00
Razvan Lupusoru	9506f20b4d	[flang][acc] Add AA implementation for acc operations (#189772 ) This PR extends flang's alias analysis so it can reason about values that originate from OpenACC data and privatization operations, including values passed through block arguments.	2026-04-01 09:03:47 -07:00
Ming Yan	158f10fe24	[mlir][memref] Fold memref.reinterpret_cast operations with valid offset or size constants. (#189533 ) When encountering an invalid offset or size, we only skip the current invalid value and continue attempting to fold other valid offsets or sizes.	2026-04-01 15:30:12 +02:00
AidinT	461a1c51bf	[mlir][ods] resolve the wrong indent issue (#189277 ) The `emitSummaryAndDescComments` is used to generate summary and description for tablegen generated classes and structs such as Dialects and Interfaces. The generated summary and description is indented incorrectly in the output generated file. For example `NVGPUDialect.h.inc ` looks like the following: ```cpp namespace mlir::nvgpu { /// The `NVGPU` dialect provides a bridge between higher-level target-agnostic /// dialects (GPU and Vector) and the lower-level target-specific dialect /// (LLVM IR based NVVM dialect) for NVIDIA GPUs. This allow representing PTX /// specific operations while using MLIR high level dialects such as Memref /// and Vector for memory and target-specific register operands, respectively. class NVGPUDialect : public ::mlir::Dialect { ... }; } // namespace mlir::nvgpu ``` This is because the `emitSummaryAndDescComments` trims the summary and description from both sides, rendering the re-indentation useless. This PR resolves this bug.	2026-04-01 15:20:54 +02:00
Erick Ochoa Lopez	5072c020aa	[mlir][vector] Drop trailing 1-dims from constant_mask (#187383 ) Generalize TransferReadDropUnitDimsPattern to also drop unit dimensions when `vector::ConstantMaskOp` is used. Previously TransferReadDropUnitDimsPattern would only drop unit dimensions when `vector::CreateMaskOp` with a statically known operand was used. Assisted-by: Cursor	2026-04-01 09:05:25 -04:00
Mehdi Amini	f6ffdbcbae	[MLIR][Affine] Fix dead store elimination for vector stores with different types (#189248 ) affine-scalrep's findUnusedStore incorrectly classified an affine.vector_store as dead when a subsequent store wrote to the same base index but with a smaller vector type. A vector<1xi64> store at [0,0] does not fully overwrite a vector<5xi64> store at [0,0], so the first store must be preserved. The loadCSE function in the same file already had the correct type-equality check for loads; this patch adds the analogous check for stores in findUnusedStore. Fixes #113687 Assisted-by: Claude Code	2026-04-01 10:40:53 +00:00
lonely eagle	6b2b0da40d	[mlir][CSE] Fix double-counting of numCSE statistic (#189802 ) This PR fixes a regression where the numCSE statistic was being incremented twice for a single operation elimination. The numCSE counter is already internally incremented within the replaceUsesAndDelete function. Manually incrementing it again after the function call leads to an inaccurate total count. This is part of the https://github.com/llvm/llvm-project/pull/180556.	2026-04-01 17:10:20 +08:00
Mehdi Amini	249e871fa4	[MLIR][ArithToLLVM] Fix index_cast on memref types generating invalid LLVM IR (#189227 ) `arith.index_cast` and `arith.index_castui` accept memref operands (via `IndexCastTypeConstraint`), but `IndexCastOpLowering::matchAndRewrite` did not handle this case. When the operand was a memref, the conversion framework substituted the converted LLVM struct type, and the lowering incorrectly attempted to emit `llvm.sext`/`llvm.zext`/`llvm.trunc` on a struct value, producing invalid LLVM IR. Since LLVM uses opaque pointers, all memrefs with integer or index element types lower to the same `!llvm.struct<(ptr, ptr, i64, ...)>` type, making `arith.index_cast` on memrefs a no-op at the LLVM level. Add a check that treats the memref case as an identity conversion (same as the same-bit-width path). Fixes #92377 Assisted-by: Claude Code	2026-04-01 11:03:14 +02:00
Mehdi Amini	b1f8c28559	[MLIR] Validate APInt bitwidth in IntegerAttr::get(Type, APInt) (#188725 ) IntegerAttr::get(Type, APInt) did not validate that the APInt's bit width matched the expected bit width for the given type. For integer types, the APInt width must equal the integer type's width. For index types, the APInt width must equal IndexType::kInternalStorageBitWidth (64 bits). Passing an APInt with the wrong bit width could cause a non-deterministic crash in StorageUniquer when comparing two IntegerAttr instances for the same type but with different APInt widths. This commit adds assertions in the get(Type, APInt) builder to catch such misuse early in debug builds, providing a clear error message at the call site rather than a cryptic crash in the storage uniquer. Fixes #56401 Assisted-by: Claude Code	2026-04-01 10:47:36 +02:00
Lukas Sommer	6a31be68e3	[mlir][NFC] Remove conditionally unused type alias (#189894 ) The `RawType` type alias is unused (`-Wunused-local-typedef`) in build with asserts deactivated. In combination with `-Werror`, this causes builds to fail. Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>	2026-04-01 10:25:58 +02:00
AidinT	585e2a015b	[MLIR] Convert BytecodeDialectInterface to ods (#188852 ) This PR converts `BytecodeDialectInterface` to ODS.	2026-04-01 05:41:07 +02:00
AidinT	d52a5e8a5a	[MLIR] convert ConvertToEmitCPatternInterface to ODS (#188621 ) This PR converts `ConvertToEmitCPatternInterface` dialect interface to ODS. Also makes changes to derived classes.	2026-04-01 05:30:12 +02:00
Krzysztof Drewniak	7fce7631a0	[mlir] Refactor opaque properties to make them type-safe (#185157 ) At its core, this commit changes `OpaqueProperties` (aka a void*) to `PropertyRef`, which is a {TypeID, void*}, where the TypeID is the ID of the storage type of the given property (which can, as is often the case for operations, be a struct of other properties). Long-term, this change will allow for 1) Some sort of getFooPropertyRef() on property structs, allowing individual members to be extracted generically 2) By having a property kind that is an OwningProprtyRef, generic parsing (in combination with a bunch of other changes) 3) Probably a safer C/Python API because we'll be able to indicate what's supposed to be under a given void* --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 19:49:40 -07:00
Krzysztof Drewniak	b813b0b4e4	[mlir][MemRef] Migrate memref dialect alias op folding to interface (#187168 ) This PR adds code to FoldMemRefAliasOps / --fold-memref-alias-ops to use the new IndexedMemoryAccessOpInterface and IndexedMemCopyOpInterface and implement those operations for relevant operations in the memref dialect. This is a reordering of the changes planned in #177014 and #177016 to make them more testable. There are no behavior changes expected for how memref.load and memref.store behave within the alias ops folding pass, though support for new operations, like memref.prefetch, has been added. Some error messages have been updated because certain laws of memref.load/memref.store have been moved to IndexedAccessOpInterface. Assisted-by: Claude 4.6 (helped deal with some of the boilerplate in the rewrite patterns and with extracting the patch)	2026-03-31 14:44:27 -07:00
Zhen Wang	3ed48bf648	Revert "[flang][cuda] Support non-allocatable module-level managed variables" (#189745 ) Reverts llvm/llvm-project#188526	2026-03-31 20:53:50 +00:00
Artem Gindinson	0454de8b54	[mlir][Arith] Avoid sign overflow when narrowing signed operations (#189676 ) Whether an arith operation can be truncated to a given bitwidth should also depend on the sign semantics of the operation itself. Consider: ``` %input = /* upper bound > INT32_MAX, <= UINT32_MAX */ : index %c0 = arith.constant 0 : index %cmp = arith.cmpi sle, %input, %c0 : index ``` Previously, `checkTruncatability()` would correctly judge that only an unsigned truncation could be legal, however the narrowing would still proceed despite the fact that the `sle` predicate treated the MSB as the sign. Ensure that the sign is checked for signed comparison predicates and for signed elementwise operations by enforcing a `CastKind::Signed` restriction, whereby the narrowing patterns bail out on incompatible input range/operation signedness. AI tooling usage disclaimer: LIT tests were expanded from manual reproducer examples with LLM assistance. Those additional test cases were verified to regression-test, proofread and edited manually in accordance with the "Human in the loop" policy. LLMs/generative tooling were not used for implementation/documentation purposes. --------- Signed-off-by: Artem Gindinson <gindinson@roofline.ai> Co-authored-by: GPT 5.4 <codex@openai.com>	2026-03-31 22:45:18 +02:00
Zhen Wang	c4e6cf0abf	[flang][cuda] Support non-allocatable module-level managed variables (#188526 ) Add support for non-allocatable module-level CUDA managed variables using pointer indirection through a companion global in __nv_managed_data__. The CUDA runtime populates this pointer with the unified memory address via __cudaRegisterManagedVar and __cudaInitModule. 1. Create a .managed.ptr companion global in the __nv_managed_data__ section and register it with _FortranACUFRegisterManagedVariable (CUFAddConstructor.cpp) 2. Call __cudaInitModule after registration to populate the managed pointer (registration.cpp) 3. Annotate managed globals in gpu.module with nvvm.managed for PTX .attribute(.managed) generation (cuda-code-gen.mlir) 4. Suppress cuf.data_transfer for assignments to/from non-allocatable module managed variables, since cudaMemcpy would target the shadow address rather than the actual unified memory (tools.h) 5. Preserve cuf.data_transfer for device_var = managed_var assignments where explicit transfer is still required	2026-03-31 16:27:08 +00:00
Nishant Patel	65720adc15	[MLIR][XeGPU] Switch to the new sg to wi pass (#188627 ) This PR has changes required to switch the pipeline to use the new sg to wi pass.	2026-03-31 09:20:11 -07:00
Chi-Chun, Chen	9e77a45935	[mlir][OpenMP][NFC] Refactor fillAffinityIteratorLoop (#189418 ) Extract affinity-specific logic from fillAffinityIteratorLoop into a callback so that the iterator loop codegen logic can be shared with other clauses such as depend clause and target clause.	2026-03-31 11:12:00 -05:00
Chi-Chun, Chen	7ff0dc4b9f	[mlir][OpenMP] Add iterator support to depend clause (#189090 ) Extend the depend clause to support `!omp.iterated<Ty>` handles alongside plain depend vars, so the IR can represent both forms. Assisted with copilot This is part of feature work for https://github.com/llvm/llvm-project/issues/188061	2026-03-31 11:11:08 -05:00
Mehdi Amini	6477f3aa16	[mlir][ArithToSPIRV] Fix invalid SPIRV and crashes when lowering integer ops on i1 (#189239 ) Several arith integer operations on i1 / vector<Ni1> types were either crashing or producing invalid SPIRV. The i1 type maps to spirv.bool in SPIRV, not to a SPIRV integer — so standard integer SPIRV ops (spirv.IAdd, spirv.UDiv, spirv.GLSMax, etc.) are illegal on it. Add dedicated boolean patterns for all affected arith integer ops, each with benefit=2 to take priority over the generic elementwise patterns. The semantics for i1 follow from treating true = 1 / false = 0 with two's complement wrapping: - addi, subi → spirv.LogicalNotEqual (XOR on bits) - muli, divui, divsi → spirv.LogicalAnd - remui, remsi, shli, shrui → spirv.LogicalAnd(a, spirv.LogicalNot(b)) (a & ~b) - shrsi → identity (arithmetic right shift of a 1-bit signed value is always the input) - maxui, minsi → spirv.LogicalOr (unsigned max / signed min treats true as larger) - maxsi, minui → spirv.LogicalAnd (signed max / unsigned min treats false as larger) Fixes #61162 Assisted-by: Claude Code	2026-03-31 17:56:51 +02:00
AidinT	67c34294a6	[mlir][docs] dialect interfaces and mlir reduce documentation fix (#189258 ) Two modifications: 1. Reflect newly added dialect interface methods in the documentation 2. Remove the bug in the `MLIR Reduce` documentation	2026-03-31 15:23:13 +00:00
Arseniy Obolenskiy	09c54a8f7a	[mlir][SPIR-V] Support spirv.loop_control attribute on scf.for and scf.while (#189392 ) Propagate the `spirv.loop_control` attribute from `scf.for` and `scf.while` operations to the generated `spirv.mlir.loop` during SCFToSPIRV conversion	2026-03-31 16:49:37 +02:00
Leandro Lupori	a30a8e9474	Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#183794 )" (#188851 ) Linear iteration variables were being treated as private. This fixes one of the issues reported in #170784. The regression reported in #188536 occurred because LinearClauseProcessor was rewriting all basic blocks whose names contained a given substring, including those that were not part of the translated SIMD region. This didn't cause problems before because linear variables were always privatized, which doesn't happen with this change. The issue is fixed by rewriting only the basic blocks that correspond to the omp.simd operation.	2026-03-31 09:36:08 -03:00
Davide Grohmann	d5f7acdbc1	[mlir][spirv] Add Cast/Rescale ops in TOSA Ext Inst Set (#189028 ) This patch introduces the following operators: spirv.Tosa.Cast spirv.Tosa.Rescale Also dialect and serialization round-trip tests have been added. Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>	2026-03-31 14:31:04 +02:00
Guray Ozen	97b78d6ff3	[MLIR][NVVM] Fix predicate operand index in BasicPtxBuilderInterface (#189552 ) The predicate index computation was incorrect: it did not count write/readwrite symbols. Before the fix, the predicate was emitted as `@$1`: ``` // CHECK: %{{.*}} = llvm.inline_asm has_side_effects asm_dialect = att "@$1 ex2.approx.ftz.f32 $0, $1;", "=f,f,b" %{{.*}}, %{{.*}} : (f32, i1) -> f32 %1 = nvvm.inline_ptx "ex2.approx.ftz.f32 {$w0}, {$r0};" ro (%input : f32), predicate = %pred -> f32 ``` With this PR, the predicate index becomes `@$2`: ``` // CHECK: %{{.*}} = llvm.inline_asm has_side_effects asm_dialect = att "@$2 ex2.approx.ftz.f32 $0, $1;", "=f,f,b" %{{.*}}, %{{.*}} : (f32, i1) -> f32 %1 = nvvm.inline_ptx "ex2.approx.ftz.f32 {$w0}, {$r0};" ro (%input : f32), predicate = %pred -> f32 ```	2026-03-31 13:36:58 +02:00
Sergei Lebedev	b544ad5703	[MLIR] [Python] Added a way to extend MLIR->Python type mappings (#189368 ) The idea is to use TableGen records for both custom type constraints and attributes: * `PythonTypeName` is for type constraints, while * `PythonAttrType` is for attributes. The key types differ between these two records. `PythonTypeName` is keyed by C++ type because multiple type constraints map to the same C++ type (e.g. `I32` and `I64` both map to `::mlir::IntegerType`), so a single entry covers all of them. `PythonAttrType` is keyed by TableGen def name because different attributes can share the same C++ storage type but need distinct Python types (e.g. `I32ArrayAttr` and `StrArrayAttr` are both `::mlir::ArrayAttr`). We could in theory reimplement `getPythonAttrName` using the same approach, but I decided to leave it for future PRs.	2026-03-31 11:00:40 +01:00
Mehdi Amini	7ad564e54b	[MLIR][MemRef] Fix LoadOpOfExpandShapeOpFolder returning failure after IR change (#188964 ) LoadOpOfExpandShapeOpFolder<vector::TransferReadOp>::matchAndRewrite called resolveSourceIndicesExpandShape (which creates AffineLinearizeIndexOp ops via the rewriter) before checking whether the vector::TransferReadOp preconditions hold. When those checks failed (sourceRank < vectorRank or permutation map mismatch), the pattern returned failure() after already modifying the IR, triggering "pattern returned failure but IR did change" under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS. Fix by hoisting the vector::TransferReadOp precondition checks to before the resolveSourceIndicesExpandShape call. The source rank is derived from expandShapeOp.getViewSource()'s type (no IR creation needed), and the permutation map check only uses op attributes. Only if all checks pass do we proceed to create the linearized-index ops. Assisted-by: Claude Code Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.	2026-03-31 11:39:03 +02:00
Slava Zakharin	35f89458fa	[mlir] Made DefaultResource the root of memory resource hierarchy. (#187423 ) DefaultResource is made the root of the memory resource hierarchy, so now it overlaps with all resources. RFC: https://discourse.llvm.org/t/rfc-mlir-memory-region-hierarchy-for-mlir-side-effects/89811/32	2026-03-30 17:52:45 -07:00
Mehdi Amini	acbf3f3186	[MLIR][SCF] Fix scf.index_switch lowering to preserve large case values (#189230 ) `IndexSwitchLowering` stored case values as `SmallVector<int32_t>`, which silently truncated any `int64_t` case value larger than INT32_MAX (e.g. `4294967296` became `0`). The `cf.switch` flag was also created via `arith.index_cast index -> i32`, losing the upper 32 bits on 64-bit platforms. Fix: store case values as `SmallVector<APInt>` with 64-bit width, cast the index argument to `i64`, and use the `ArrayRef<APInt>` overload of `cf::SwitchOp::create` so the resulting switch correctly uses `i64` case values and flag type. Fixes #111589 Assisted-by: Claude Code	2026-03-31 00:46:28 +02:00
Mehdi Amini	5da2546594	[mlir][scf] Fix FoldTensorCastOfOutputIntoForallOp write order bug (#189162 ) `FoldTensorCastOfOutputIntoForallOp` incorrectly updated the destinations of `tensor.parallel_insert_slice` ops in the `in_parallel` block by zipping `getYieldingOps()` with `getRegionIterArgs()` positionally. This assumed that the i-th yielding op writes to the i-th shared output, which is not required by the IR semantics. When slices are written to shared outputs in non-positional order, the canonicalization would silently reverse the write targets, producing incorrect output. Fix by replacing the positional zip with a per-destination check: for each yielding op's destination operand, if it is a `tensor.cast` result whose source is one of the new `scf.forall` region iter args (i.e., a cast we introduced to bridge the type change), replace the destination with the cast's source directly. This correctly handles all orderings. Add a regression test that exercises the multi-result case where `parallel_insert_slice` ops write to shared outputs in non-sequential order. Fixes #172981 Assisted-by: Claude Code	2026-03-31 00:35:13 +02:00
Mehdi Amini	e097875417	[MLIR][SparseTensor] Fix fingerprint changes in SparseFuncAssembler (#188958 ) SparseFuncAssembler::matchAndRewrite was calling funcOp.setName(), funcOp.setPrivate(), and funcOp->removeAttr() directly without notifying the rewriter, causing "operation fingerprint changed" errors under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS. Wrap all in-place funcOp mutations with rewriter.modifyOpInPlace. Assisted-by: Claude Code Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 00:33:47 +02:00
Mehdi Amini	27b9ea5ea0	[MLIR][SparseTensor] Fix domination violation in co-iteration for dense iterators (#188959 ) In exitWhileLoop, random-accessible (dense) iterators were being located using whileOp.getResults().back() while the insertion point was still inside the while loop's after block. This caused a domination violation: the ADDI created by locate() was inside the after block, but it was later used (via derefImpl's SUBI) after the while loop exits. Move the locate() calls for random-accessible iterators to after builder.setInsertionPointAfter(whileOp), where the while results are properly in scope. Fixes 10 failing tests under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS. Assisted-by: Claude Code Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 00:33:28 +02:00
Keshav Vinayak Jha	54b723097b	[MLIR][Affine] Add vector support to affine.linearize_index and affine.delinearize_index (#188369 ) Allow `affine.delinearize_index` and `affine.linearize_index` to operate on `vector<...x index>` types in addition to scalar indices. --------- Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 14:26:39 -07:00
Eric Feng	ae835dea74	[mlir][amdgpu] implement amdgpu.global_load_async_to_lds for gfx1250 (#189279 ) This patch introduces an amdgpu wrapper for `rocdl.global.load.async.to.lds.bN` intrinsics, which were introduced in gfx1250. Assisted-by: Claude --------- Signed-off-by: Eric Feng <Eric.Feng@amd.com>	2026-03-30 14:20:59 -07:00
Stanislav Mekhanoshin	5f99854d01	[AMDGPU] Drop A and B neg modifier from amdgcn_wmma_bf16_16x16x32_bf16 (#189468 ) Fixes: LCOMPILER-1673	2026-03-30 14:14:22 -07:00
Nishant Patel	e50f08b548	[MLIR] [XeGPU] Add distribution patterns for vector transpose, bitcast & mask ops in sg to wi pass (#187392 ) This PR adds patterns for following vector ops in the new sg-to-wi pass 1. Transpose 2. BitCast 3. CreateMask 4. ConstantMask	2026-03-30 14:06:03 -07:00
Berke Ates	b6e4d27c48	[MLIR][Mem2Reg] Extract shared utilities for PromotableRegionOpInterface (#188514 ) The `PromotableRegionOpInterface` implementations use two helpers that are likely useful for other dialects implementing this interface as well: - `updateTerminator`: Appends the reaching definition as an operand to a block's terminator, falling back to a default when the block has no entry (e.g. dead code). - `replaceWithNewResults`: Clones an operation with additional result types while preserving its regions, then replaces the original. This PR extracts them into a common utility header so that downstream dialects can reuse them directly. I'm open to discussion about the location of these utilities.	2026-03-30 22:20:39 +02:00
Maksim Levental	f10dccd458	[MLIR][SparseTensor] Add #undef FAILURE_IF_FAILED and ERROR_IF (#188685 ) Both DimLvlMapParser.cpp and LvlTypeParser.cpp define FAILURE_IF_FAILED and ERROR_IF macros that are never undefined, which can leak into subsequent translation units in unity builds. Add #undef at the end of each file. See https://discourse.llvm.org/t/rfc-enabling-unity-build/90306 for more info. "clauded" not coded	2026-03-30 12:27:48 -07:00
Maksim Levental	03869c74b6	[MLIR][SparseTensor] Add missing #undef REMUI and DIVUI (#188686 ) LoopEmitter.cpp and SparseTensorIterator.cpp define REMUI and DIVUI macros but the existing #undef block at the end of each file omits them. This can leak the macros into subsequent translation units in unity builds. See https://discourse.llvm.org/t/rfc-enabling-unity-build/90306 for more info. "clauded" not coded	2026-03-30 12:27:31 -07:00
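
Note on acbf3f3186 ([MLIR][SCF] Fix scf.index_switch lowering to preserve large case values): the truncation described in that message can be reproduced without MLIR. The following is a minimal standalone sketch, not the code from the patch. `narrowCaseValue` and `fitsInInt32` are hypothetical helpers introduced here for illustration only; they mimic what happens when a 64-bit case value is stored in a 32-bit container.

```cpp
#include <cstdint>

// Hypothetical helper (not an MLIR API): mimics storing a 64-bit case value
// in a 32-bit slot, as a SmallVector<int32_t> would.
constexpr int32_t narrowCaseValue(int64_t caseValue) {
  // Keep only the low 32 bits, then reinterpret them as a signed value.
  return static_cast<int32_t>(static_cast<uint32_t>(caseValue));
}

// Hypothetical helper: true when the value survives the round trip through
// int32_t unchanged, i.e. when 32-bit storage would have been safe.
constexpr bool fitsInInt32(int64_t caseValue) {
  return static_cast<int64_t>(narrowCaseValue(caseValue)) == caseValue;
}

// The example from the commit message: 4294967296 (2^32) collapses to 0.
static_assert(narrowCaseValue(4294967296LL) == 0, "2^32 truncates to 0");
static_assert(!fitsInInt32(4294967296LL), "2^32 needs 64-bit storage");
static_assert(fitsInInt32(42), "small case values are unaffected");
```

Under this model, two distinct case values such as 0 and 4294967296 become indistinguishable after narrowing, which is consistent with the commit's choice of 64-bit `APInt` case values and an `i64` flag.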

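Note on 6477f3aa16 ([mlir][ArithToSPIRV] Fix invalid SPIRV and crashes when lowering integer ops on i1): the boolean forms listed in that message can be checked exhaustively with a small model of 1-bit two's complement arithmetic. This is a sketch under stated assumptions, not the lowering code itself: every function below is a hypothetical stand-in named after the arith op it models, and only the ops that are defined for all four input pairs are covered (division, remainder, and shifts are left out).

```cpp
// Hypothetical model of i1 arithmetic (not MLIR code). A bool is read as
// unsigned {0, 1} or as signed {0, -1}, matching a 1-bit two's complement value.
constexpr int asUnsigned(bool b) { return b ? 1 : 0; }
constexpr int asSigned(bool b) { return b ? -1 : 0; }

// Wrap an integer result back to one bit (odd results map to true).
constexpr bool wrap(int v) { return v % 2 != 0; }

constexpr bool addi(bool a, bool b) { return wrap(asUnsigned(a) + asUnsigned(b)); }
constexpr bool subi(bool a, bool b) { return wrap(asUnsigned(a) - asUnsigned(b)); }
constexpr bool muli(bool a, bool b) { return wrap(asUnsigned(a) * asUnsigned(b)); }
constexpr bool maxui(bool a, bool b) {
  return wrap(asUnsigned(a) > asUnsigned(b) ? asUnsigned(a) : asUnsigned(b));
}
constexpr bool minui(bool a, bool b) {
  return wrap(asUnsigned(a) < asUnsigned(b) ? asUnsigned(a) : asUnsigned(b));
}
constexpr bool maxsi(bool a, bool b) {
  return wrap(asSigned(a) > asSigned(b) ? asSigned(a) : asSigned(b));
}
constexpr bool minsi(bool a, bool b) {
  return wrap(asSigned(a) < asSigned(b) ? asSigned(a) : asSigned(b));
}

// Compare the arithmetic model with the boolean forms from the commit message
// over all four (a, b) input pairs.
constexpr bool matchesBooleanForms() {
  for (int i = 0; i < 4; ++i) {
    const bool a = (i & 1) != 0;
    const bool b = (i & 2) != 0;
    if (addi(a, b) != (a != b)) return false;   // LogicalNotEqual
    if (subi(a, b) != (a != b)) return false;   // LogicalNotEqual
    if (muli(a, b) != (a && b)) return false;   // LogicalAnd
    if (maxui(a, b) != (a || b)) return false;  // LogicalOr
    if (minsi(a, b) != (a || b)) return false;  // LogicalOr
    if (maxsi(a, b) != (a && b)) return false;  // LogicalAnd
    if (minui(a, b) != (a && b)) return false;  // LogicalAnd
  }
  return true;
}

static_assert(matchesBooleanForms(), "arithmetic model agrees with boolean forms");
```

The signed cases are the least intuitive: because `true` reads as -1 when signed, it is the smaller value, so `minsi` behaves like OR and `maxsi` like AND.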
26551 Commits