llvm-project

Author	SHA1	Message	Date
Durgadoss R	36dc6146b8	[MLIR][NVVM] Update TMA tensor prefetch Op (#153464 ) This patch updates the TMA Tensor prefetch Op to add support for im2col_w/w128 and tile_gather4 modes. This completes support for all modes available in Blackwell. * lit tests are added for all possible combinations. * The invalid tests are moved to a separate file with more coverage. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-08-22 12:51:29 +05:30
Rajat Bajpai	b08b219650	[MLIR][NVVM] Add "blocksareclusters" kernel attribute support (#154519 ) This change adds "nvvm.blocksareclusters" kernel attribute support in NVVM Dialect/MLIR.	2025-08-22 11:32:21 +05:30
Yang Bai	f1f194bf10	[mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774 ) ### Problem PR #142944 introduced a new canonicalization pattern which caused failures in the following GPU-related integration tests: - mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir - mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir The issue occurs because the new canonicalization pattern can generate multi-dimensional `vector.from_elements` operations (rank > 1), but the GPU lowering pipelines were not equipped to handle these during the conversion to LLVM. ### Fix This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`: - `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass. - `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass. Co-authored-by: Yang Bai <yangb@nvidia.com>	2025-08-21 21:46:06 -05:00
Tim Gymnich	e20fa4f412	[mlir][AMDGPU] Add PermlaneSwapOp (#154345 ) - Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and `rocdl.permlane32.swap` --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-08-21 18:21:43 +02:00
Guray Ozen	5c36fb3303	[MLIR][NVVM] Improve inline_ptx, add readwrite support (#154358 ) Key Features 1. Multiple SSA returns – no struct packing/unpacking required. 2. Automatic struct unpacking – values are directly usable. 3. Readable register mapping * {$rwN} → read-write * {$roN} → read-only * {$woN} → write-only 4. Full read-write support (+ modifier). 5. Simplified operand specification – avoids cryptic "=r,=r,=f,=f,f,f,0,1" constraints. 6. Predicate support: PTX `@p` predication support IR Example: ``` %wo0, %wo1 = nvvm.inline_ptx """ .reg .pred p; setp.ge.s32 p, {$r0}, {$r1}; selp.s32 {$rw0}, {$r0}, {$r1}, p; selp.s32 {$rw1}, {$r0}, {$r1}, p; selp.s32 {$w0}, {$r0}, {$r1}, p; selp.s32 {$w1}, {$r0}, {$r1}, p; """ ro(%a, %b : f32, f32) rw(%c, %d : i32, i32) -> f32, f32 ``` After lowering ``` %0 = llvm.inline_asm has_side_effects asm_dialect = att "{ .reg .pred p;\ setp.ge.s32 p, $4, $5; \ selp.s32 $0, $4, $5, p;\ selp.s32 $1, $4, $5, p;\ selp.s32 $2, $4, $5, p;\ selp.s32 $3, $4, $5, p;\ }" "=r,=r,=f,=f,f,f,0,1" %c500_i32, %c400_i32, %cst, %cst_0 : (i32, i32, f32, f32) -> !llvm.struct<(i32, i32, f32, f32)> %1 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> %2 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> %3 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> %4 = llvm.extractvalue %0 : !llvm.struct<(i32, i32, f32, f32)> // Unpacked result from nvvm.inline_ptx %5 = arith.addi %1, %2 : i32 // read only %6 = arith.addf %cst, %cst_0 : f32 // write only %7 = arith.addf %3, %4 : f32 ```	2025-08-21 17:42:18 +02:00
Chao Chen	68d6866428	[mlir][XeGPU] add WgToSg distribution pattern for load_matrix and store_matrix. (#154403 )	2025-08-21 10:02:45 -05:00
Renato Golin	32a5adbd42	[MLIR][Linalg] Rename convolution pass (#154400 ) Rename the pass `LinalgNamedOpConversionPass` to `SimplifyDepthwiseConvPass` to avoid conflating it with the new morphisms we are creating between the norms.	2025-08-21 15:57:16 +01:00
Guray Ozen	3d41197d68	[MLIR] Introduce RemarkEngine + pluggable remark streaming (YAML/Bitstream) (#152474 ) This PR implements structured, tooling-friendly optimization remarks with zero cost unless enabled. It implements: - `RemarkEngine` collects finalized remarks within `MLIRContext`. - `MLIRRemarkStreamerBase` abstract class streams them to a backend. - Backends: `MLIRLLVMRemarkStreamer` (bridges to llvm::remarks → YAML/Bitstream) or your own custom streamer. - Optional mirroring to DiagnosticEngine (printAsEmitRemarks + categories). - Off by default; no behavior change unless enabled. Thread-safe; ordering best-effort. ## Overview ``` Passes (reportOptimization*) │ ▼ +-------------------+ \| RemarkEngine \| collects +-------------------+ │ │ │ mirror │ stream ▼ ▼ emitRemark MLIRRemarkStreamerBase (abstract) │ ├── MLIRLLVMRemarkStreamer → llvm::remarks → YAML \| Bitstream └── CustomStreamer → your sink ``` ## Enable Remark engine and Plug LLVM's Remark streamer ``` // Enable once per MLIRContext. This uses `MLIRLLVMRemarkStreamer` mlir::remark::enableOptimizationRemarksToFile( ctx, path, llvm::remarks::Format::YAML, cats); ``` ## API to emit remark ``` // Emit from a pass remark::passed(loc, categoryVectorizer, myPassname1) << "vectorized loop"; remark::missed(loc, categoryUnroll, "MyPass") << remark::reason("not profitable at this size") // Creates structured reason arg << remark::suggest("increase unroll factor to >=4"); // Creates structured suggestion arg remark::passed(loc, categoryVectorizer, myPassname1) << "vectorized loop" << remark::metric("tripCount", 128); // Create structured metric on-the-fly ```	2025-08-21 16:02:31 +02:00
donald chen	5af7263d42	[mlir] add getViewDest method to viewLikeOpInterface (#154524 ) The viewLikeOpInterface abstracts the behavior of an operation view one buffer as another. However, the current interface only includes a "getViewSource" method and lacks a "getViewDest" method. Previously, it was generally assumed that viewLikeOpInterface operations would have only one return value, which was the view dest. This assumption was broken by memref.extract_strided_metadata, and more operations may break these silent conventions in the future. Calling "viewLikeInterface->getResult(0)" may lead to a core dump at runtime. Therefore, we need 'getViewDest' method to standardize our behavior. This patch adds the getViewDest function to viewLikeOpInterface and modifies the usage points of viewLikeOpInterface to standardize its use.	2025-08-21 20:09:52 +08:00
Mehdi Amini	b20c291bae	[MLIR] Adopt LDBG() debug macro in PatternApplicator.cpp (NFC) (#154724 )	2025-08-21 10:32:21 +00:00
Mehdi Amini	acda808304	[MLIR] Adopt LDBG() macro in BuiltinAttributes.cpp (NFC) (#154723 )	2025-08-21 10:31:18 +00:00
Mehdi Amini	b916df3a08	[MLIR] Adopt LDBG() in Transform/IR/Utils.cpp (NFC) (#154722 )	2025-08-21 10:30:01 +00:00
Mehdi Amini	30f9428f14	[MLIR] Adopt LDBG() macro in LLVM/NVVM/Target.cpp (#154721 )	2025-08-21 10:29:37 +00:00
Mehdi Amini	db0529dca3	[MLIR] Use LDBG() macro in Dialect.cpp (NFC) (#154720 )	2025-08-21 10:28:51 +00:00
Mehdi Amini	62b29d9f76	[MLIR] Adopt LDBG() debug macro in BytecodeWriter.cpp (NFC) (#154642 )	2025-08-20 22:45:39 +00:00
Mehdi Amini	908eebcb93	[MLIR] Adopt LDBG() macro in PDL ByteCodeExecutor (NFC) (#154641 )	2025-08-20 22:40:52 +00:00
Mehdi Amini	dbbd3f0d07	[MLIR] Adopt LDBG() macro in Affine/Analysis/Utils.cpp (NFC) (#154626 )	2025-08-20 21:56:03 +00:00
Mehdi Amini	d20a74e631	[MLIR] Adopt LDBG() macro in BasicPtxBuilderInterface.cpp (NFC) (#154625 )	2025-08-20 21:51:17 +00:00
Mehdi Amini	4be19e27b5	[MLIR] Adopt LDBG() debug macros in Affine LoopAnalysis.cpp (NFC) (#154621 )	2025-08-20 21:45:42 +00:00
Mehdi Amini	6445a75c98	[MLIR] Update MLIRContext to use the LDBG() style debug macro (NFC) (#154619 )	2025-08-20 21:30:11 +00:00
Mehdi Amini	ffbc8da8b5	[MLIR] Migrate LICM utils to the LDBG() macro style logging (NFC) (#154615 )	2025-08-20 21:29:50 +00:00
Mehdi Amini	780750bbf9	[MLIR] Adopt LDBG() debug macro in ConvertToLLVMPass (NFC) (#154616 )	2025-08-20 21:29:35 +00:00
Mehdi Amini	5683baea6d	[MLIR] Adopt LDBG() debug macro in bufferization (NFC) (#154614 )	2025-08-20 21:14:02 +00:00
Rolf Morel	cbfa265e98	[MLIR][LLVMIR][DLTI] Add `LLVM::TargetAttrInterface` and `#llvm.target` attr (#145899 ) Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features = $FEATURES>` attribute and along with a `-llvm-target-to-data-layout` pass to derive a MLIR data layout from the LLVM data layout string (using the existing `DataLayoutImporter`). The attribute implements the relevant DLTI-interfaces, to expose the `triple`, `chip` (AKA `cpu`) and `features` on `#llvm.target` and the full `DataLayoutSpecInterface`. The pass combines the generated `#dlti.dl_spec` with an existing `dl_spec` in case one is already present, e.g. a `dl_spec` which is there to specify size of the `index` type. Adds a `TargetAttrInterface` which can be implemented by all attributes representing LLVM targets. Similar to the Draft PR https://github.com/llvm/llvm-project/pull/78073. RFC on which this PR is based: https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875	2025-08-20 22:00:30 +01:00
Akash Banerjee	d69ccded4f	[MLIR] Add cpow support in ComplexToROCDLLibraryCalls (#153183 ) This PR adds support for complex power operations (`cpow`) in the `ComplexToROCDLLibraryCalls` conversion pass, specifically targeting AMDGPU architectures. The implementation optimises complex exponentiation by using mathematical identities and special-case handling for small integer powers. - Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa` target instead of using library calls - Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using mathematical identity	2025-08-20 17:18:30 +00:00
Matthias Springer	6a285cc8e6	[mlir][IR] Fix `Block::without_terminator` for blocks without terminator (#154498 ) Blocks without a terminator are not handled correctly by `Block::without_terminator`: the last operation is excluded, even when it is not a terminator. With this commit, only terminators are excluded. If the last operation is unregistered, it is included for safety.	2025-08-20 18:02:24 +02:00
Matthias Springer	0499d3a8cf	[mlir][Interfaces] Add `hasUnknownEffects` helper function (#154523 ) I have seen misuse of the `hasEffect` API in downstream projects: users sometimes think that `hasEffect == false` indicates that the operation does not have a certain memory effect. That's not necessarily the case. When the op does not implement the `MemoryEffectsOpInterface`, it is unknown whether it has the specified effect. "false" can also mean "maybe". This commit clarifies the semantics in the documentation. Also adds `hasUnknownEffects` and `mightHaveEffect` convenience functions. Also simplifies a few call sites.	2025-08-20 15:24:53 +00:00
Hank	c075fb8c37	[MLIR] Fix duplicated attribute nodes in MLIR bytecode deserialization (#151267 ) Fixes #150163 MLIR bytecode does not preserve alias definitions, so each attribute encountered during deserialization is treated as a new one. This can generate duplicate `DISubprogram` nodes during deserialization. The patch adds a `StringMap` cache that records attributes and fetches them when encountered again.	2025-08-20 13:03:26 +00:00
Luc Forget	95fbc18a70	[MLIR][Wasm] Extending Wasm binary to WasmSSA dialect importer (#154452 ) This is a cherry pick of #154053 with a fix for bad handling of endianess when loading float and double litteral from the binary. --------- Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global> Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global> Co-authored-by: Luc Forget <luc.forget@woven.toyota>	2025-08-20 10:55:55 +02:00
Ian Wood	961b052e98	[mlir][tensor][NFC] Refactor common methods for bubbling extract_slice op (#153675 ) Exposes the `tensor.extract_slice` reshaping logic in `BubbleUpExpandShapeThroughExtractSlice` and `BubbleUpCollapseShapeThroughExtractSlice` through two corresponding utility functions. These compute the offsets/sizes/strides of an extract slice after either collapsing or expanding. This should also make it easier to implement the two other bubbling cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape` is a consumer. --------- Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>	2025-08-19 19:31:30 +00:00
Kazu Hirata	2c4f0e7ac6	[mlir] Replace SmallSet with SmallPtrSet (NFC) (#154265 ) This patch replaces SmallSet<T , N> with SmallPtrSet<T , N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; We only have 30 instances that rely on this "redirection". Since the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types. I'm planning to remove the redirection eventually.	2025-08-19 07:11:47 -07:00
Yang Bai	b4c31dc98d	[mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements (#142944 ) ## Description This change introduces a new canonicalization pattern for the MLIR Vector dialect that optimizes chains of insertions. The optimization identifies when a vector is completely initialized through a series of vector.insert operations and replaces the entire chain with a single `vector.from_elements` operation. Please be aware that the new pattern doesn't work for poison vectors where only some elements are set, as MLIR doesn't support partial poison vectors for now. New Pattern: InsertChainFullyInitialized * Detects chains of vector.insert operations. * Validates that all insertions are at static positions, and all intermediate insertions have only one use. * Ensures the entire vector is completely initialized. * Replaces the entire chain with a single vector.from_elementts operation. Refactored Helper Function * Extracted `calculateInsertPosition` from `foldDenseElementsAttrDestInsertOp` to avoid code duplication. ## Example ``` // Before: %v1 = vector.insert %c10, %v0[0] : i64 into vector<2xi64> %v2 = vector.insert %c20, %v1[1] : i64 into vector<2xi64> // After: %v2 = vector.from_elements %c10, %c20 : vector<2xi64> ``` It also works for multidimensional vectors. ``` // Before: %v1 = vector.insert %cv0, %v0[0] : vector<3xi64> into vector<2x3xi64> %v2 = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64> // After: %0:3 = vector.to_elements %arg1 : vector<3xi64> %1:3 = vector.to_elements %arg2 : vector<3xi64> %v2 = vector.from_elements %0#0, %0#1, %0#2, %1#0, %1#1, %1#2 : vector<2x3xi64> ``` --------- Co-authored-by: Yang Bai <yangb@nvidia.com> Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>	2025-08-19 13:43:31 +01:00
Mehdi Amini	dc82b2cc70	Revert "[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer" (#154314 ) Reverts llvm/llvm-project#154053 Seems like an endianness sensitivity failing a big-endian bot.	2025-08-19 14:05:09 +02:00
Luc Forget	df57bb8c49	[MLIR][WASM] Extending the Wasm binary to WasmSSA dialect importer (#154053 ) This is the continuation of #152131 This PR adds support for parsing the global initializer and function body, and support for decoding scalar numerical instructions and variable related instructions. --------- Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global> Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global> Co-authored-by: Luc Forget <luc.forget@woven.toyota>	2025-08-19 13:42:47 +02:00
Md Asghar Ahmad Shahid	c24c23d9ab	[NFC][mlir][vector] Handle potential static cast assertion. (#152957 ) In FoldArithToVectorOuterProduct pattern, static cast to vector type causes assertion when a scalar type was encountered. It seems the author meant to have a dyn_cast instead. This NFC patch handles it by using dyn_cast.	2025-08-19 09:27:20 +05:30
Jianjian Guan	1eb5b18a04	[mlir][emitc] Support dense as init value for ShapedType (#144826 )	2025-08-19 09:41:15 +08:00
Mehdi Amini	89abccc9a6	[MLIR] Update GreedyRewriter to use the LDBG() debug log mechanism (NFC) (#153961 ) Also improve a bit the LDBG() implementation	2025-08-18 21:05:34 +00:00
Mehdi Amini	8c605bd1f4	[MLIR] Add logging to eraseUnreachableBlocks (NFC) (#153968 )	2025-08-18 21:02:53 +00:00
Mehdi Amini	dfaebe7f48	[MLIR] Fix Liveness analysis handling of unreachable code (#153973 ) This patch is forcing all values to be initialized by the LivenessAnalysis, even in dead blocks. The dataflow framework will skip visiting values when its already knows that a block is dynamically unreachable, so this requires specific handling. Downstream code could consider that the absence of liveness is the same a "dead". However as the code is mutated, new value can be introduced, and a transformation like "RemoveDeadValue" must conservatively consider that the absence of liveness information meant that we weren't sure if a value was dead (it could be a newly introduced value. Fixes #153906	2025-08-18 20:50:36 +00:00
Mehdi Amini	191e7eba93	[MLIR] Stop visiting unreachable blocks in the walkAndApplyPatterns driver (#154038 ) This is similar to the fix to the greedy driver in #153957 ; except that instead of removing unreachable code, we just ignore it. Operations like: ``` %add = arith.addi %add, %add : i64 ``` are legal in unreachable code. Unfortunately many patterns would be unsafe to apply on such IR and can lead to crashes or infinite loops.	2025-08-18 20:46:59 +00:00
Charitha Saumya	9617ce4862	[vector][distribution] Bug fix in `moveRegionToNewWarpOpAndAppendReturns` (#153656 )	2025-08-18 13:26:08 -07:00
Yang Bai	4eb1a07d7d	[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175 ) This patch introduces a new unrolling-based approach for lowering multi-dimensional `vector.from_elements` operations. Implementation Details: 1. New Transform Pattern: Added `UnrollFromElements` that unrolls a N-D(N>=2) from_elements op to a (N-1)-D from_elements op align the outermost dimension. 2. Utility Functions: Added `unrollVectorOp` to reuse the unroll algo of vector.gather for vector.from_elements. 3. Integration: Added the unrolling pattern to the convert-vector-to-llvm pass as a temporal transformation. 4. Use direct LLVM dialect operations instead of intermediate vector.insert operations for efficiency in `VectorFromElementsLowering`. Example: ```mlir // unroll %v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32> => %poison_2d = ub.poison : vector<2x2xf32> %vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32> %vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32> %vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32> %result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32> // convert-vector-to-llvm %v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32> => %poison_2d = ub.poison : vector<2x2xf32> %poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>> %poison_1d_0 = llvm.mlir.poison : vector<2xf32> %c0_0 = llvm.mlir.constant(0 : i64) : i64 %vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32> %c1_0 = llvm.mlir.constant(1 : i64) : i64 %vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32> %vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>> %poison_1d_1 = llvm.mlir.poison : vector<2xf32> %c0_1 = llvm.mlir.constant(0 : i64) : i64 %vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32> %c1_1 = llvm.mlir.constant(1 : i64) : i64 %vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32> %vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>> %result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32> ``` --------- Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com> Co-authored-by: Yang Bai <yangb@nvidia.com> Co-authored-by: James Newling <james.newling@gmail.com> Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>	2025-08-18 10:09:12 -07:00
Nishant Patel	4a9d038acd	[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432 ) This PR adds pattern to distribute the load/store/prefetch nd ops with offsets from workgroup to subgroup IR. This PR is part of the transition to move offsets from create_nd to load/store/prefetch nd ops. Create_nd PR : #152351	2025-08-18 09:45:29 -07:00
Jeremy Kun	c67d27dad0	[mlir][Presburger] NFC: return var index from IntegerRelation::addLocalFloorDiv (#153463 ) addLocalFloorDiv currently returns void and requires the caller to know that the newly added local variable is in a particular index. This commit returns the index of the newly added variable so that callers need not tie themselves to this implementation detail. I found one relevant callsite demonstrating this and updated it. I am using this API out of tree and wanted to make our out-of-tree code a bit more resilient to upstream changes.	2025-08-18 08:47:47 -07:00
Jacques Pienaar	4bf33958da	[mlir] Update builders to use new form. (#154132 ) Mechanically applied using clang-tidy.	2025-08-18 15:19:34 +00:00
Matthias Springer	f84aaa6eaa	[mlir][Transforms] Dialect conversion: Add flag to dump materialization kind (#119532 ) Add a debugging flag to the dialect conversion to dump the materialization kind. This flag is useful to find out whether a missing materialization rule is for source or target materializations. Also add missing test coverage for the `buildMaterializations` flag.	2025-08-18 13:25:18 +00:00
Chaitanya	4a3bf27c69	[OpenMP] Introduce omp.target_allocmem and omp.target_freemem omp dialect ops. (#145464 ) This PR introduces two new ops in omp dialect, omp.target_allocmem and omp.target_freemem. omp.target_allocmem: Allocates heap memory on device. Will be lowered to omp_target_alloc call in llvm. omp.target_freemem: Deallocates heap memory on device. Will be lowered to omp+target_free call in llvm. Example: %1 = omp.target_allocmem %device : i32, i64 omp.target_freemem %device, %1 : i32, i64 The work in this PR is C-P/inspired from @ivanradanov commit from coexecute implementation: [Add fir omp target alloc and free ops](`be860ac8ba`) [Lower omp_target_{alloc,free} to llvm](`6e2d584dc9`)	2025-08-18 18:15:11 +05:30
Mehdi Amini	cfe5975eaf	[MLIR] Fix SCF verifier crash (#153974 ) An operand of the nested yield op can be null and hasn't been verified yet when processing the enclosing operation. Using `getResultTypes()` will dereference this null Value and crash in the verifier.	2025-08-18 12:48:55 +02:00
Mehdi Amini	16aa283344	[MLIR] Refactor the walkAndApplyPatterns driver to remove the recursion (#154037 ) This is in preparation of a follow-up change to stop traversing unreachable blocks. This is not NFC because of a subtlety of the early_inc. On a test case like: ``` scf.if %cond { "test.move_after_parent_op"() ({ "test.any_attr_of_i32_str"() {attr = 0 : i32} : () -> () }) : () -> () } ``` We recursively traverse the nested regions, and process an op when the region is done (post-order). We need to pre-increment the iterator before processing an operation in case it gets deleted. However we can do this before or after processing the nested region. This implementation does the latter.	2025-08-18 09:07:19 +00:00
Mehdi Amini	87e6fd161a	[MLIR] Erase unreachable blocks before applying patterns in the greedy rewriter (#153957 ) Operations like: %add = arith.addi %add, %add : i64 are legal in unreachable code. Unfortunately many patterns would be unsafe to apply on such IR and can lead to crashes or infinite loops. To avoid this we can remove unreachable blocks before attempting to apply patterns. We may have to do this also whenever the CFG is changed by a pattern, it is left up for future work right now. Fixes #153732	2025-08-18 10:59:43 +02:00

1 2 3 4 5 ...

17998 Commits