llvm-project

Author	SHA1	Message	Date
Jared Hoberock	90ec5f2f62	[MLIR][test] Re-disable FileCheck on async.mlir integration test (#190702 ) #190563 re-enabled FileCheck on `Integration/GPU/CUDA/async.mlir`, but the buildbot has shown intermittent wrong-output failures ([example](https://lab.llvm.org/buildbot/#/builders/116/builds/27026)): the test produces `[42, 42]` instead of the expected `[84, 84]`. This wrong-output flakiness is distinct from the cleanup-time `cuModuleUnload` errors that #190563 actually fixes — it's the underlying issue tracked by #170833. The merged commit message for #190563 incorrectly says `Fixes #170833`; that issue should be reopened, since the cleanup-error fix doesn't address the wrong-output behavior. This PR puts the test back in its previously-disabled state. The runtime cleanup fix in #190563 is unaffected.	2026-04-07 01:14:56 +02:00
Jared Hoberock	7087ece044	[MLIR][ExecutionEngine] Tolerate CUDA_ERROR_DEINITIALIZED in mgpuModuleUnload (#190563 ) `mgpuModuleUnload` may be called from a global destructor (registered by `SelectObjectAttr`'s `appendToGlobalDtors`) after the CUDA primary context has already been destroyed during program shutdown. In this case, `cuModuleUnload` returns `CUDA_ERROR_DEINITIALIZED`, which is benign since the module's resources are already freed with the context. ## Reproduction Any program that uses `gpu.launch_func` and is AOT-compiled (via `mlir-translate --mlir-to-llvmir | llc | cc -lmlir_cuda_runtime`) will print `'cuModuleUnload(module)' failed with '<unknown>'` on exit. This is because `SelectObjectAttr` registers the module unload as a global destructor, which runs after the CUDA primary context is released. This script reproduces the error message from `mgpuModuleUnload` on my system:
```
#!/bin/bash
set -e
LLVM_BUILD=${LLVM_BUILD:-$HOME/dev/git/llvm-project-22/build}

cat > /tmp/repro.mlir << 'MLIR'
func.func @main() {
  %c1 = arith.constant 1 : index
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
             threads(%tx, %ty, %tz) in (%bsx = %c1, %bsy = %c1, %bsz = %c1) {
    gpu.terminator
  }
  return
}
MLIR

$LLVM_BUILD/bin/mlir-opt /tmp/repro.mlir \
  -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin" \
  | $LLVM_BUILD/bin/mlir-translate --mlir-to-llvmir -o /tmp/repro.ll
$LLVM_BUILD/bin/llc -relocation-model=pic -filetype=obj /tmp/repro.ll -o /tmp/repro.o
cc /tmp/repro.o \
  -L$LLVM_BUILD/lib -Wl,-rpath,$LLVM_BUILD/lib \
  -lmlir_cuda_runtime -lmlir_runner_utils -o /tmp/repro

echo "Running:"
/tmp/repro 2>&1
echo "Exit code: $?"
```
## Context This matches how other projects handle the same shutdown ordering issue: - Clang CUDA (D48613) switched module cleanup from `__attribute__((destructor))` to `atexit()` - GCC libgomp checks context validity before `cuModuleUnload` - Apache TVM silently ignores `CUDA_ERROR_DEINITIALIZED` on module unload Fixes #170833	2026-04-06 21:11:58 +00:00
Jianhui Li	9bddf47198	[MLIR][XeGPU] Extend Wg-to-Sg Distribution of Multi-Reduction Op for round-robin layout (#189988 ) This PR enhances the multi-reduction op pattern of the wg-to-sg distribution pass: 1. Allows each sg to have multiple distributions of sg_data tiles. 2. Expands the SLM buffer size. 3. Constructs the layout based on the partially reduced vector and uses layout.computeDistributedCoords() to compute coordinates. The layout is constructed so that the store is cooperative and the load overlaps with neighbouring threads. 4. Performs the store and load.	2026-04-06 14:07:50 -07:00
adams381	9265f9284c	[mlir][ABI] Add writable, dead_on_unwind, dead_on_return, nofpclass param attrs to LLVM dialect (#188374 ) The MLIR LLVM dialect is missing support for several parameter attributes that exist in LLVM IR: `writable`, `dead_on_unwind`, `dead_on_return`, and `nofpclass`. This adds them to the kind-to-name mapping in `AttrKindDetail.h` and the corresponding name accessors in `LLVMDialect.td`. The existing generic conversion infrastructure in `ModuleTranslation` and `ModuleImport` picks them up automatically — `writable` and `dead_on_unwind` round-trip as `UnitAttr`, while `dead_on_return` and `nofpclass` round-trip as `IntegerAttr`. CIR needs these to match classic codegen's ABI output (sret gets `writable dead_on_unwind`, indirect args get `dead_on_return`, fast-math FP args get `nofpclass`).	2026-04-06 11:26:11 -05:00
Eric Feng	930ef7736e	[mlir][amdgpu] Add optional write mask to amdgpu.global_load_async_to_lds (#190498 )	2026-04-06 09:21:32 -07:00
Srinivasa Ravi	63231ebfe7	[MLIR][NVVM] Add new narrow FP convert Ops (#184291 ) This change adds the following NVVM Ops for new narrow FP conversions introduced in PTX 9.1: - `convert.{f32x2/bf16x2}.to.s2f6x2` - `convert.s2f6x2.to.bf16x2` - `convert.bf16x2.to.f8x2` (extended for `f8E4M3FN` and `f8E5M2` types) - `convert.{f16x2/bf16x2}.to.f6x2` - `convert.{f16x2/bf16x2}.to.f4x2` PTX ISA Reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt	2026-04-06 12:06:25 +05:30
lonely eagle	ce1a9fd766	Reland "[mlir][reducer] Add eraseRedundantBlocksInRegion and getSuccessorForwardOperands API to BranchOpInterface" (#189253 ) After fixing the undefined symbol and memory leak issues (see the previous PR https://github.com/llvm/llvm-project/pull/189150), this PR relands https://github.com/llvm/llvm-project/pull/187864.	2026-04-06 12:54:15 +08:00
lonely eagle	d08bb68080	[mlir][reducer] Add opt-pass-file option to opt-reduction pass (#189353 ) Currently, the opt-reduction-pass only accepts the optimization pipeline on the command line, which becomes cumbersome when the pipeline is long. To address this, this PR introduces the opt-pass-file option, which allows users to save the pipeline in a file and pass its filename, from which the pipeline is parsed.	2026-04-05 12:55:21 +08:00
Ivan R. Ivanov	420111e9e4	[mlir][LLVM] Fix incorrect verification of atomicrmw f{min,max}imumnum (#190474 ) Fix llvm.atomicrmw fminimumnum and fmaximumnum to correctly take the float operation verification path.	2026-04-04 17:17:37 +00:00
lonely eagle	230d757a13	[mlir][reducer] make opt-reduction pass clone topOp after check (NFC) (#189356 ) To avoid potential memory leaks, this PR defers the ModuleOp cloning until after the verification check. In the original implementation, if the check failed, the moduleVariant might not be properly deallocated, leading to a memory leak. Therefore, this PR ensures that the clone is only performed after a successful check. It is part of https://github.com/llvm/llvm-project/pull/189353.	2026-04-04 22:08:34 +08:00
Yue Huang	38f8945362	[MLIR][Presburger] Fix stale pivot in Smith normal form (#189789 ) The pivot used to fix divisibility in Smith normal form is stale. This does not affect correctness, but it can lower efficiency, since the outer loop is executed more times. Thanks to @benquike for discovering this.	2026-04-03 23:17:57 -07:00
Nishant Patel	6f68502a98	[MLIR][XeGPU] Port tests from the XeGPUSubgroupDistribute to XeGPUSgToWiDistributeExperimental (#189747 ) This PR ports tests from subgroup-distribute.mlir (old pass) to sg-to-wi-experimental.mlir (new pass)	2026-04-03 17:14:16 -07:00
Nishant Patel	b5936d4989	[MLIR][XeGPU] Remove verifyLayouts from sg to wi pass (#190360 ) The verifyLayouts function walked the IR before distribution and failed the pass if any XeGPU anchor op or vector-typed result was missing a layout attribute. This was added as a temporary guard while the pass was being developed. It is now replaced by a target check for each op.	2026-04-03 17:00:05 -07:00
Nishant Patel	9c0a9bb3c6	[MLIR][XeGPU] Add support for reducing to scalar in sg to wi pass (#190193 )	2026-04-03 16:56:28 -07:00
Tom Tromey	8d34545792	Introduce and use Verifier::visitDIType (#189067 ) This adds a new method, Verifier::visitDIType, and changes the methods for subclasses of DIType to call it. The new method just dispatches to DIScope and adds a file/line check inspired by Verifier::visitDISubprogram.	2026-04-03 12:40:37 -06:00
Md Abdullah Shahneous Bari	ffd29734cc	[mlir][gpu] Extend `mgpuModuleLoadJIT` API to add assemblySize parameter (#189429 ) When JITing SPIR-V using the LevelZero API, it expects the length of the input, since the passed input data is a `void *`. The length cannot be obtained with something like `strlen(reinterpret_cast<char *>(data))` in the `mgpuModuleLoadJIT` implementation, because the SPIR-V binary contains null bytes (i.e., the data is binary SPIR-V, not null-terminated text). As a result, we need to pass the `assemblySize` via `mgpuModuleLoadJIT(void *data, int optLevel, size_t assemblySize)`.	2026-04-03 12:45:40 -05:00
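A minimal illustration of why `strlen` cannot size a SPIR-V binary. It assumes a little-endian module that starts with the standard magic number `0x07230203` followed by the version word for SPIR-V 1.0 (`0x00010000`). The byte array and `sizeViaStrlen` are hypothetical stand-ins, not code from the MLIR runtime:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// First two 32-bit words of a SPIR-V module, laid out little-endian:
// the magic number 0x07230203 and the version word 0x00010000 (SPIR-V 1.0).
// A real module continues with generator, bound, schema and instructions.
const unsigned char kSpirvHeader[] = {0x03, 0x02, 0x23, 0x07,
                                      0x00, 0x00, 0x01, 0x00};

// Hypothetical size inference that treats the binary as a C string.
// It stops at the first zero byte, which already occurs inside the version word.
std::size_t sizeViaStrlen(const void *data) {
  assert(data != nullptr && "expects a valid buffer");
  return std::strlen(static_cast<const char *>(data));
}
```

Here `sizeViaStrlen(kSpirvHeader)` yields 4 even though the header alone is 8 bytes. This is why the size has to be passed in explicitly.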
Mehdi Amini	8c81064169	[MLIR][Arith] Fix index_cast/index_castui chain folding to check intermediate width (#189042 ) The patterns `IndexCastOfIndexCast` and `IndexCastUIOfIndexCastUI` in ArithCanonicalization.td incorrectly eliminated a pair of index casts whenever the outer result type equalled the original source type, without verifying that the intermediate cast was lossless. For example, the following was wrongly folded to `%arg0`: `%0 = index_castui %arg0 : i64 to index`; `%1 = index_castui %0 : index to i8` (truncates to 8 bits); `%2 = index_castui %1 : i8 to index` (incorrectly removed). The pattern matched `%1`/`%2` because `i8.to(index)` has the same result type as `i64.to(index)`, even though the i8 intermediate silently drops 56 bits. The same bug existed for the signed `index_cast` variant. Fix: move the optimization into the `fold` methods of `IndexCastOp` and `IndexCastUIOp` with an explicit check that the intermediate type is at least as wide as the source type (using `IndexType::kInternalStorageBitWidth` as the representative width for `index`). Only then is the round-trip guaranteed lossless, so the chain can be collapsed. Fixes #90238 Fixes #90296 Assisted-by: Claude Code	2026-04-03 16:05:08 +02:00
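The width condition described above can be sketched as follows. This sketch models `index` as 64 bits, with `kIndexWidth` standing in for `IndexType::kInternalStorageBitWidth`. `roundTripUI` and `canFoldCastChain` are hypothetical helpers for illustration, not the actual `fold` implementation:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for IndexType::kInternalStorageBitWidth.
constexpr unsigned kIndexWidth = 64;

// Models index_castui through a `midWidth`-bit intermediate and back to 64 bits:
// truncation keeps the low `midWidth` bits, and zero-extension restores the width.
uint64_t roundTripUI(uint64_t value, unsigned midWidth) {
  assert(midWidth >= 1 && midWidth <= 64 && "unsupported intermediate width");
  if (midWidth == 64)
    return value;
  return value & ((uint64_t{1} << midWidth) - 1);
}

// A chain src -> mid -> dst may collapse to the source value only when the
// outer result type matches the source and the intermediate is at least as
// wide as the source, so that no bits are dropped along the way.
bool canFoldCastChain(unsigned srcWidth, unsigned midWidth, unsigned dstWidth) {
  return dstWidth == srcWidth && midWidth >= srcWidth;
}
```

For the `index -> i8 -> index` pair in the example, `canFoldCastChain(kIndexWidth, 8, kIndexWidth)` is false. Likewise, `roundTripUI(300, 8)` returns 44 rather than 300, which is the kind of value change the old pattern hid.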
Mehdi Amini	c2ec012098	[mlir][linalg] Fix crash in tile_reduction when output map has constant exprs (#189166 ) `generateInitialTensorForPartialReduction` and the `getInitSliceInfo*` helpers unconditionally cast every result expression of the partial result AffineMap to `AffineDimExpr`. When the original output indexing map contains a constant (e.g. `affine_map<(d0,d1,d2)->(d0,0,d2)>`), the constant expression propagates into the partial map and the cast triggers an assertion. Fixes #173025 Assisted-by: Claude Code	2026-04-03 11:09:26 +00:00
Mehdi Amini	73bcfb6824	[mlir][Affine] Fix LICM incorrectly hoisting stores from zero-trip-count loops (#189165 ) The affine-loop-invariant-code-motion pass was hoisting side-effectful operations (e.g. affine.store) out of loops whose trip count is statically known to be zero. This caused stores to execute unconditionally even though the loop body should never run, producing incorrect results. The fix skips hoisting of non-memory-effect-free ops when getConstantTripCount returns 0. Pure/side-effect-free ops are still eligible for hoisting because they cannot change observable program state. Fixes #128273 Assisted-by: Claude Code	2026-04-03 13:07:26 +02:00
Mehdi Amini	ff86be21de	[MLIR][MemRef] Fix AllocOp/AllocaOp flattening domination violation (#188980 ) The generic MemRefRewritePattern handles AllocOp/AllocaOp by calling getFlattenMemrefAndOffset with the op's own result as the source memref. This inserts ExtractStridedMetadataOp and ReinterpretCastOp that consume op.result before the alloc op itself in the block. After replaceOpWithNewOp, op.result is RAUW'd to the new ReinterpretCastOp result, leaving those earlier ops with forward references: a domination violation caught by MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS. Replace the AllocOp/AllocaOp cases in MemRefRewritePattern with a dedicated AllocLikeFlattenPattern that never touches op.result until the final replaceOpWithNewOp: - sizes come from op.getMixedSizes() (operands, not the result) - strides come from getStridesAndOffset on the MemRefType - the flat allocation size is computed via getLinearizedMemRefOffsetAndSize plus the static base offset so the buffer covers [0, offset+extent) - castAllocResult is simplified to take the pre-computed sizes and strides rather than inserting an ExtractStridedMetadataOp on the original op - non-zero static base offsets are now correctly preserved in the reinterpret_cast (the old code hardcoded offset=0, which was a verifier error for layouts with offset != 0) - dynamic offsets or strides bail out via notifyMatchFailure Also remove the now-dead AllocOp/AllocaOp branches from replaceOp() and the constexpr specialisation in getIndices(). Assisted-by: Claude Code	2026-04-03 11:21:00 +02:00
Mehdi Amini	d725513e7d	[MLIR][Affine] Fix null operands in simplifyConstrainedMinMaxOp (#189246 ) `mlir::affine::simplifyConstrainedMinMaxOp` called `canonicalizeMapAndOperands` with `newOperands` that could contain null `Value()`s. These nulls came from `unpackOptionalValues(constraints.getMaybeValues(), newOperands)` where internal constraint variables added by `appendDimVar` (for `dimOp`, `dimOpBound`, and `resultDimStart*`) have no associated SSA values. Passing null Values to `canonicalizeMapAndOperands` risks undefined behavior: - `seenDims.find(null_value)` in the DenseMap causes all null operands to collide at the same key, producing incorrect dim remapping. - Any null operand that remains referenced in the result map would propagate as a null Value into `AffineValueMap`, crashing callers that try to use those operands to create ops. Fix: Before calling `canonicalizeMapAndOperands`, filter null operands from `newOperands` by replacing their dim/symbol positions in `newMap` with constant 0 (safe because internal constraint dims should not appear in the bound map expression) and compacting `newOperands` to contain only non-null Values. Fixes #127436 Assisted-by: Claude Code	2026-04-03 10:17:50 +02:00
Zhewen Yu	a7bf24919f	[mlir][IntRangeAnalysis] Fix assertion in inferAffineExpr for mod with range crossing modulus boundary (#188842 ) The "small range with constant divisor" optimization in `inferAffineExpr` for `AffineExprKind::Mod` assumed that if the dividend range span (`lhsMax - lhsMin`) is less than the divisor, then the mod results form a contiguous range. This is not always true, as the range can straddle a modulus boundary. For example, `[14, 17] mod 8`: - Span is 3 < 8, so the old condition passed - But `14%8=6` and `17%8=1` (wraps at 16) - `umin=6, umax=1` → assertion `umin.ule(umax)` fails The fix adds a same-quotient check (`lhsMin/rhs == lhsMax/rhs`) to ensure both endpoints fall within the same modular period. When they don't, we fall back to the conservative `[0, divisor-1]` range. Assisted-by: Cursor (Claude) Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>	2026-04-03 10:15:52 +02:00
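The corrected condition can be sketched for the unsigned case. `modRange` is a hypothetical helper that assumes a non-negative dividend range `[lo, hi]` and a positive constant divisor. It mirrors the same-quotient check and the `[0, divisor-1]` fallback described in the message, not the exact `inferAffineExpr` code. Note that equal quotients already imply `hi - lo < d`:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns an inclusive range {min, max} for `x mod d` over x in [lo, hi].
std::pair<uint64_t, uint64_t> modRange(uint64_t lo, uint64_t hi, uint64_t d) {
  assert(d > 0 && lo <= hi && "expects a positive divisor and lo <= hi");
  // Same-quotient check: both endpoints lie in one period [k*d, k*d + d).
  // Within that period x mod d increases with x, so the endpoints bound it.
  if (lo / d == hi / d)
    return {lo % d, hi % d};
  // The range crosses a multiple of d (e.g. [14, 17] crosses 16 for d = 8),
  // so fall back to the conservative range [0, d - 1].
  return {0, d - 1};
}
```

With the example from the message, `modRange(14, 17, 8)` takes the fallback and returns `{0, 7}` instead of the inverted `{6, 1}`.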
lonely eagle	8db1f6492a	[mlir][reducer] Remove the restriction that OptReductionPass must be a ModuleOp (#189038 ) This PR aims to make the pass more generic by removing the ModuleOp restriction. This PR reimplements the logic using a standalone PassManager. Additionally, the isInteresting method has been updated to accept Operation* for better flexibility. Finally, a dedicated test directory has been added to improve the organization of OptReductionPass tests.	2026-04-03 14:49:01 +08:00
xys-syx	1474e3e4f4	[MLIR][NVVM] Derive NVVM_SyncWarpOp from NVVM_IntrOp for import support (#188415 ) Change `NVVM_SyncWarpOp` base class from `NVVM_Op` to `NVVM_IntrOp<"bar.warp.sync">`, which auto-generates `llvmEnumName = nvvm_bar_warp_sync` and registers it with `-gen-intr-from-llvmir-conversions` and `-gen-convertible-llvmir-intrinsics`. This enables LLVM IR to MLIR import. The hand-written `llvmBuilder` is removed as the default `LLVM_IntrOpBase` builder is equivalent.	2026-04-02 15:41:50 -04:00
Bangtian Liu	86b5f11ecc	[mlir][GPU] Add constant address space to GPU dialect (#190211 ) This PR adds a `constant` address space to the GPU dialect, along with lowerings to all GPU backends. Signed-off-by: Bangtian Liu <liubangtian@gmail.com>	2026-04-02 15:02:12 -04:00
Leandro Lupori	d59356aac5	Revert "Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#183794 )"" (#190180 ) Reverts llvm/llvm-project#188851	2026-04-02 11:09:30 -03:00
Mehdi Amini	a36f821e77	[mlir][linalg] Add test for ReduceOp empty-input verifier; remove dead empty-output check (#189614 ) Add a FileCheck test covering the 'expected at least one input' error in ReduceOp::verify(). The companion 'expected at least one output' check was dead code: SameVariadicOperandSize fires first whenever inputs.size() != inits.size(), and when both are empty the input check fires first; remove the unreachable branch. Assisted-by: Claude Code	2026-04-02 15:57:53 +02:00
Dhruv Chauhan	b87be02cc7	Revert "[mlir][tensor] Forward concat insert_slice destination into DPS provider" (#190143 ) This reverts commit 1418f80. The change can cause an infinite rewrite loop when ForwardConcatInsertSliceDest interacts with FoldEmptyTensorWithExtractSliceOp.	2026-04-02 14:48:44 +01:00
Julian Oppermann	018e048daf	[MLIR][Linalg] Generic to category specialization for unary elementwise ops (#187217 ) Handle specialization of `linalg.generic` ops representing a unary elementwise computation to the `linalg.elementwise` category op. This implements a previously absent path in the linalg morphism.	2026-04-02 10:50:21 +02:00
yebinchon	495e1a4257	[mlir] added a check in the walk to prevent catching a cos in a nested region (#190064 ) The walk in SincosFusion may detect a cos within a nested region of the sin block. This triggers an assertion in `isBeforeInBlock` later on. Added a check within the walk so it filters operations in nested regions, which are not in the same block and should not be fused anyway. --------- Co-authored-by: Yebin Chon <ychon@nvidia.com>	2026-04-01 20:10:56 -07:00
Nishant Patel	b3ca423a78	[MLIR][Vector] Enhance vector.multi_reduction unrolling to handle scalar result (#188633 ) Previously, UnrollMultiReductionPattern bailed out when all the dimensions were reduced to a scalar. This PR adds support for this case by tiling the source vector and chaining partial reductions through the accumulator operand.	2026-04-01 14:59:08 -07:00
Jianhui Li	1a1fbf967a	[MLIR][XeGPU] Support round-robin layout for constant and broadcast in wg-to-sg distribution (#189798 ) As title.	2026-04-01 14:58:01 -07:00
Nishant Patel	9f50004651	[MLIR][XeGPU] Enhance the peephole optimization to remove the convert_layout after multi-reduction rewrite (#188849 )	2026-04-01 13:55:11 -07:00
Jianhui Li	401ba6df84	[MLIR][XeGPU] Add Layout Propagation support for multi-reduction/reduction op with scalar result (#189133 ) This PR adds Layout Propagation support for multi-reduction/reduction ops with scalar results: 1) Enhances setupMultiReductionResultLayout() and LayoutInfoPropagation::visitVectorMultiReductionOp() to support scalar results. 2) Adds propagation support for the vector.reduction op at the lane level, since the op is only introduced at the lane level.	2026-04-01 13:01:34 -07:00
Jeff Sandoval	95a76886c1	[OpenMP][MLIR] Fix GPU teams reduction buffer size for by-ref reductions (#185460 ) The `ReductionDataSize` field in `KernelEnvironmentTy` and the `MaxDataSize` used to compute the `reduce_data_size` argument to `__kmpc_nvptx_teams_reduce_nowait_v2` were both computed using pointer types for by-ref reductions instead of the actual element types. This caused the global teams reduction buffer to be undersized relative to the offsets used by the copy/reduce callbacks, resulting in out-of-bounds access faults at runtime. For example, a by-ref reduction over `[4 x i32]` (16 bytes) would allocate buffer slots based on `sizeof(ptr)` = 8 bytes, but the generated callbacks would access 16 bytes per slot. Fix both computation sites: 1. In MLIR's `getReductionDataSize()`, use `DeclareReductionOp::getByrefElementType()` instead of `getType()` when the reduction is by-ref, so the reduction buffer struct layout (and more importantly its size) matches that emitted by the `OMPIRBuilder`. 2. In `OMPIRBuilder::createReductionsGPU()`, use `ReductionInfo::ByRefElementType` instead of `ElementType` for by-ref reductions when computing `MaxDataSize`. It seems that `MaxDataSize` isn't actually used in the deviceRTL, but it's better to fix it to avoid future propagation of this bug. Finally, add CHECK lines to the existing array-descriptor reduction test to verify that both the kernel environment `ReductionDataSize` and the `reduce_data_size` call argument reflect the actual element type size. Assisted-by: Claude Opus 4.6 --------- Co-authored-by: Jeffrey Sandoval <jeffrey.sandoval@hpe.com>	2026-04-01 14:59:16 -05:00
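The size mismatch can be illustrated with a simplified model. `ReductionVar` and `slotBytes` are hypothetical. The pointer size assumes a 64-bit target, and padding and alignment of the real buffer struct are ignored. The sketch only shows which size each by-ref reduction contributes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one reduction variable.
struct ReductionVar {
  bool byRef;
  std::size_t elementBytes;  // size of the reduced value, e.g. 16 for [4 x i32]
};

constexpr std::size_t kPointerBytes = 8;  // assumes 64-bit pointers

// Bytes per buffer slot, ignoring padding. With useElementType == false this
// reproduces the bug: a by-ref reduction is counted as sizeof(ptr).
std::size_t slotBytes(const std::vector<ReductionVar> &vars, bool useElementType) {
  std::size_t total = 0;
  for (const ReductionVar &v : vars) {
    assert(v.elementBytes > 0 && "reduction element must have a size");
    total += (v.byRef && !useElementType) ? kPointerBytes : v.elementBytes;
  }
  return total;
}
```

For a single by-ref `[4 x i32]` reduction, `slotBytes({{true, 16}}, false)` gives 8, while the callbacks touch 16 bytes per slot. Passing `true` gives the 16 bytes that the fix accounts for.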
Nishant Patel	150aa6f2d3	[MLIR][XeGPU] Add support for convert layout with scalar in Sg to WI distribution (#189721 )	2026-04-01 12:05:32 -07:00
Zhen Wang	3b3b556a12	[mlir][NVVM] Add managed attribute for global variables (#189751 ) Add support for the `nvvm.managed` attribute on `llvm.mlir.global` ops. When present, the LLVM IR translation emits `!nvvm.annotations` metadata with `!"managed"` for the global variable, which the NVPTX backend uses to generate `.attribute(.managed)` in PTX output. This enables CUDA managed memory support for frontends that lower through MLIR.	2026-04-01 18:18:20 +00:00
Vito Secona	fbf484009c	[mlir][sparse] add GPU num threads to sparsifier options (#189078 ) This change adds a `gpu-num-threads` option to the sparsifier. This allows users to specify the number of threads used for GPU codegen, similar to the `num-threads` option in the `-sparse-gpu-codegen` pass.	2026-04-01 10:42:26 -07:00
Jan Leyonberg	91adaeceb1	[CIR][MLIR][OpenMP] Enable the MarkDeclareTarget pass for ClangIR (#189420 ) This patch enables the MarkDeclareTarget pass for CIR by adding the pass to the lowerings and attaching the declare target interface to cir::FuncOp. The MarkDeclareTarget pass is also generalized to work on FunctionOpInterface instead of func::Op, since it needs to handle cir::FuncOp as well. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-01 12:50:09 -04:00
Razvan Lupusoru	9506f20b4d	[flang][acc] Add AA implementation for acc operations (#189772 ) This PR extends flang's alias analysis so it can reason about values that originate from OpenACC data and privatization operations, including values passed through block arguments.	2026-04-01 09:03:47 -07:00
Ming Yan	158f10fe24	[mlir][memref] Fold memref.reinterpret_cast operations with valid offset or size constants. (#189533 ) When encountering an invalid offset or size, we only skip the current invalid value and continue attempting to fold other valid offsets or sizes.	2026-04-01 15:30:12 +02:00
AidinT	461a1c51bf	[mlir][ods] resolve the wrong indent issue (#189277 ) `emitSummaryAndDescComments` is used to generate the summary and description for TableGen-generated classes and structs such as Dialects and Interfaces. The generated summary and description are indented incorrectly in the generated output file. For example, `NVGPUDialect.h.inc` looks like the following: ```cpp namespace mlir::nvgpu { /// The `NVGPU` dialect provides a bridge between higher-level target-agnostic /// dialects (GPU and Vector) and the lower-level target-specific dialect /// (LLVM IR based NVVM dialect) for NVIDIA GPUs. This allow representing PTX /// specific operations while using MLIR high level dialects such as Memref /// and Vector for memory and target-specific register operands, respectively. class NVGPUDialect : public ::mlir::Dialect { ... }; } // namespace mlir::nvgpu ``` This is because `emitSummaryAndDescComments` trims the summary and description on both sides, which makes the re-indentation useless. This PR fixes this bug.	2026-04-01 15:20:54 +02:00
Erick Ochoa Lopez	5072c020aa	[mlir][vector] Drop trailing 1-dims from constant_mask (#187383 ) Generalize TransferReadDropUnitDimsPattern to also drop unit dimensions when `vector::ConstantMaskOp` is used. Previously TransferReadDropUnitDimsPattern would only drop unit dimensions when `vector::CreateMaskOp` with a statically known operand was used. Assisted-by: Cursor	2026-04-01 09:05:25 -04:00
Mehdi Amini	f6ffdbcbae	[MLIR][Affine] Fix dead store elimination for vector stores with different types (#189248 ) affine-scalrep's findUnusedStore incorrectly classified an affine.vector_store as dead when a subsequent store wrote to the same base index but with a smaller vector type. A vector<1xi64> store at [0,0] does not fully overwrite a vector<5xi64> store at [0,0], so the first store must be preserved. The loadCSE function in the same file already had the correct type-equality check for loads; this patch adds the analogous check for stores in findUnusedStore. Fixes #113687 Assisted-by: Claude Code	2026-04-01 10:40:53 +00:00
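The added condition can be sketched as a predicate over a simplified store description. `VectorStore` and `overwritesCompletely` are hypothetical. The real check in `findUnusedStore` compares MLIR types; this model approximates that comparison by checking element count and element bit width. The sketch ignores the other conditions the pass checks before declaring a store dead:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of an affine.vector_store.
struct VectorStore {
  int memrefId;                  // identifies the destination memref
  std::vector<int64_t> indices;  // constant base indices, e.g. {0, 0}
  unsigned numElements;          // vector<5xi64> -> 5
  unsigned elementBits;          // vector<5xi64> -> 64
};

// A later store can make an earlier one dead only if it writes the same memref
// at the same base index with the same vector type. A narrower vector leaves
// part of the earlier store visible.
bool overwritesCompletely(const VectorStore &earlier, const VectorStore &later) {
  assert(earlier.numElements > 0 && later.numElements > 0 && "empty vector store");
  return earlier.memrefId == later.memrefId && earlier.indices == later.indices &&
         earlier.numElements == later.numElements &&
         earlier.elementBits == later.elementBits;
}
```

In the reported case, a `vector<1xi64>` store at `[0, 0]` does not satisfy the predicate against an earlier `vector<5xi64>` store at the same index, so the earlier store is kept.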
lonely eagle	6b2b0da40d	[mlir][CSE] Fix double-counting of numCSE statistic (#189802 ) This PR fixes a regression where the numCSE statistic was being incremented twice for a single operation elimination. The numCSE counter is already internally incremented within the replaceUsesAndDelete function. Manually incrementing it again after the function call leads to an inaccurate total count. This is part of the https://github.com/llvm/llvm-project/pull/180556.	2026-04-01 17:10:20 +08:00
Mehdi Amini	249e871fa4	[MLIR][ArithToLLVM] Fix index_cast on memref types generating invalid LLVM IR (#189227 ) `arith.index_cast` and `arith.index_castui` accept memref operands (via `IndexCastTypeConstraint`), but `IndexCastOpLowering::matchAndRewrite` did not handle this case. When the operand was a memref, the conversion framework substituted the converted LLVM struct type, and the lowering incorrectly attempted to emit `llvm.sext`/`llvm.zext`/`llvm.trunc` on a struct value, producing invalid LLVM IR. Since LLVM uses opaque pointers, all memrefs with integer or index element types lower to the same `!llvm.struct<(ptr, ptr, i64, ...)>` type, making `arith.index_cast` on memrefs a no-op at the LLVM level. Add a check that treats the memref case as an identity conversion (same as the same-bit-width path). Fixes #92377 Assisted-by: Claude Code	2026-04-01 11:03:14 +02:00
Mehdi Amini	b1f8c28559	[MLIR] Validate APInt bitwidth in IntegerAttr::get(Type, APInt) (#188725 ) IntegerAttr::get(Type, APInt) did not validate that the APInt's bit width matched the expected bit width for the given type. For integer types, the APInt width must equal the integer type's width. For index types, the APInt width must equal IndexType::kInternalStorageBitWidth (64 bits). Passing an APInt with the wrong bit width could cause a non-deterministic crash in StorageUniquer when comparing two IntegerAttr instances for the same type but with different APInt widths. This commit adds assertions in the get(Type, APInt) builder to catch such misuse early in debug builds, providing a clear error message at the call site rather than a cryptic crash in the storage uniquer. Fixes #56401 Assisted-by: Claude Code	2026-04-01 10:47:36 +02:00
Lukas Sommer	6a31be68e3	[mlir][NFC] Remove conditionally unused type alias (#189894 ) The `RawType` type alias is unused (`-Wunused-local-typedef`) in builds with assertions disabled. Combined with `-Werror`, this causes builds to fail. Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>	2026-04-01 10:25:58 +02:00
AidinT	585e2a015b	[MLIR] Convert BytecodeDialectInterface to ods (#188852 ) This PR converts `BytecodeDialectInterface` to ODS.	2026-04-01 05:41:07 +02:00
AidinT	d52a5e8a5a	[MLIR] convert ConvertToEmitCPatternInterface to ODS (#188621 ) This PR converts `ConvertToEmitCPatternInterface` dialect interface to ODS. Also makes changes to derived classes.	2026-04-01 05:30:12 +02:00


26579 Commits