Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features =
$FEATURES>` attribute along with a `-llvm-target-to-data-layout`
pass to derive an MLIR data layout from the LLVM data layout string
(using the existing `DataLayoutImporter`). The attribute implements the
relevant DLTI interfaces to expose the `triple`, `chip` (AKA `cpu`), and
`features` on `#llvm.target`, as well as the full
`DataLayoutSpecInterface`. The pass combines the generated
`#dlti.dl_spec` with an existing `dl_spec` in case one is already
present, e.g. a `dl_spec` that specifies the size of the `index` type.
Adds a `TargetAttrInterface` which can be implemented by all attributes
representing LLVM targets.
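As a rough usage sketch (the name of the module-level attribute and the
exact `features` encoding are illustrative assumptions, not taken from
this patch):
```
// Hypothetical example: attach an LLVM target to a module. Running
// -llvm-target-to-data-layout would then derive a #dlti.dl_spec from
// the target's LLVM data layout string.
module attributes {llvm.target = #llvm.target<
    triple = "x86_64-unknown-linux-gnu",
    chip = "skylake",
    features = "+avx2">} {
}
```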
Similar to the Draft PR https://github.com/llvm/llvm-project/pull/78073.
RFC on which this PR is based:
https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875
I have seen misuse of the `hasEffect` API in downstream projects: users
sometimes think that `hasEffect == false` indicates that the operation
does not have a certain memory effect. That's not necessarily the case:
when the op does not implement the `MemoryEffectsOpInterface`, it is
unknown whether it has the specified effect, so "false" can also mean
"maybe".
This commit clarifies the semantics in the documentation, adds
`hasUnknownEffects` and `mightHaveEffect` convenience functions, and
simplifies a few call sites.
Exposes the `tensor.extract_slice` reshaping logic in
`BubbleUpExpandShapeThroughExtractSlice` and
`BubbleUpCollapseShapeThroughExtractSlice` through two corresponding
utility functions. These compute the offsets/sizes/strides of an extract
slice after either collapsing or expanding.
This should also make it easier to implement the two other bubbling
cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape`
is a consumer.
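A sketch of the expand-shape case that these utilities back (shapes
chosen for illustration):
```
// Before: slicing the expanded tensor.
%e = tensor.expand_shape %src [[0, 1]] output_shape [4, 8]
    : tensor<32xf32> into tensor<4x8xf32>
%s = tensor.extract_slice %e[2, 0] [2, 8] [1, 1]
    : tensor<4x8xf32> to tensor<2x8xf32>
// After: slice first, then expand; the utility computes the linearized
// offset (2 * 8 = 16) and size (2 * 8 = 16) of the new slice.
%s2 = tensor.extract_slice %src[16] [16] [1]
    : tensor<32xf32> to tensor<16xf32>
%e2 = tensor.expand_shape %s2 [[0, 1]] output_shape [2, 8]
    : tensor<16xf32> into tensor<2x8xf32>
```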
---------
Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
```
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
```
We only have 30 instances that rely on this "redirection". Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.
I'm planning to remove the redirection eventually.
## Description
This change introduces a new canonicalization pattern for the MLIR
Vector dialect that optimizes chains of insertions. The optimization
identifies when a vector is **completely** initialized through a series
of vector.insert operations and replaces the entire chain with a
single `vector.from_elements` operation.
Please be aware that the new pattern **doesn't** work for poison
vectors where only **some** elements are set, as MLIR doesn't currently
support partial poison vectors.
**New Pattern: InsertChainFullyInitialized**
* Detects chains of vector.insert operations.
* Validates that all insertions are at static positions, and all
intermediate insertions have only one use.
* Ensures the entire vector is **completely** initialized.
* Replaces the entire chain with a single `vector.from_elements`
operation.
**Refactored Helper Function**
* Extracted `calculateInsertPosition` from
`foldDenseElementsAttrDestInsertOp` to avoid code duplication.
## Example
```
// Before:
%v1 = vector.insert %c10, %v0[0] : i64 into vector<2xi64>
%v2 = vector.insert %c20, %v1[1] : i64 into vector<2xi64>
// After:
%v2 = vector.from_elements %c10, %c20 : vector<2xi64>
```
It also works for multidimensional vectors.
```
// Before:
%v1 = vector.insert %cv0, %v0[0] : vector<3xi64> into vector<2x3xi64>
%v2 = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64>
// After:
%0:3 = vector.to_elements %cv0 : vector<3xi64>
%1:3 = vector.to_elements %cv1 : vector<3xi64>
%v2 = vector.from_elements %0#0, %0#1, %0#2, %1#0, %1#1, %1#2 : vector<2x3xi64>
```
---------
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
In the FoldArithToVectorOuterProduct pattern, a static cast to vector
type causes an assertion failure when a scalar type is encountered. It
seems the author meant to use a dyn_cast instead.
This NFC patch handles it by using dyn_cast.
This PR adds a pattern to distribute the load/store/prefetch nd ops
with offsets from workgroup IR to subgroup IR. This PR is part of the
transition to move offsets from create_nd to the load/store/prefetch nd
ops.
Create_nd PR: #152351
addLocalFloorDiv currently returns void and requires the caller to know
that the newly added local variable is at a particular index. This
commit returns the index of the newly added variable so that callers
need not tie themselves to this implementation detail.
I found one relevant callsite demonstrating this and updated it. I am
using this API out of tree and wanted to make our out-of-tree code a bit
more resilient to upstream changes.
This PR introduces two new ops in the omp dialect: omp.target_allocmem
and omp.target_freemem.
- omp.target_allocmem: allocates heap memory on the device; will be
lowered to an omp_target_alloc call in LLVM.
- omp.target_freemem: deallocates heap memory on the device; will be
lowered to an omp_target_free call in LLVM.
Example:
```
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
```
The work in this PR is copied from / inspired by @ivanradanov's commits
from the coexecute implementation:
[Add fir omp target alloc and free ops](be860ac8ba)
[Lower omp_target_{alloc,free} to llvm](6e2d584dc9)
An operand of the nested yield op can be null and has not yet been
verified at the point where the enclosing operation is processed. Using
`getResultTypes()` would dereference this null Value and crash in the
verifier.
This MR adds a verifier for the `emitc.get_field` op.
- The verifier checks that the `emitc.get_field` operation is nested
inside an `emitc.class` op.
- Additionally, tests for the erroneous cases were added for
class-related operations in `invalid_ops.mlir`.
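A rough sketch of the nesting that the verifier enforces (the op
syntax is approximated and the names are made up for illustration):
```
emitc.class @modelClass {
  emitc.field @buffer : !emitc.array<1xf32>
  emitc.func @execute() {
    // OK: emitc.get_field is nested inside an emitc.class.
    %0 = emitc.get_field @buffer : !emitc.array<1xf32>
    emitc.return
  }
}
// Error: an emitc.get_field outside of an emitc.class is now rejected.
```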
* Add `requiresArgsAndResultsAttr` to `LLVM_OneResultIntrOp`
* Add `args_attrs` to `llvm.intr.masked.{expandload,compressstore}`
The LLVM intrinsics
[`llvm.intr.masked.expandload`](https://llvm.org/docs/LangRef.html#llvm-masked-expandload-intrinsics)
and
[`llvm.intr.masked.compressstore`](https://llvm.org/docs/LangRef.html#llvm-masked-compressstore-intrinsics)
both allow an optional `align` parameter attribute, which defaults to 1.
Quoting the LangRef documentation for
[`llvm.intr.masked.expandload`'s](https://llvm.org/docs/LangRef.html#id1522)
and
[`llvm.intr.masked.compressstore`'s](https://llvm.org/docs/LangRef.html#id1522)
arguments, respectively:
> The `align` parameter attribute can be provided for the first
argument. The pointer alignment defaults to 1.
> The `align` parameter attribute can be provided for the second
argument. The pointer alignment defaults to 1.
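A minimal sketch of what this enables, assuming the `args_attrs`
spelling from the bullets above and the usual `llvm.align`
parameter-attribute convention (not copied from the patch's tests):
```
// Hypothetical: the pointer argument of the expandload carries an
// align parameter attribute instead of the default alignment of 1.
%r = "llvm.intr.masked.expandload"(%ptr, %mask, %passthru)
    {args_attrs = [{llvm.align = 4 : i64}, {}, {}]}
    : (!llvm.ptr, vector<8xi1>, vector<8xf32>) -> vector<8xf32>
```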
Like we did for the 'private' clause, this adds an easier-to-use helper
function to add the 'firstprivate' clause + recipe to the Parallel and
Serial ops.
Lowering transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high-level
steps:
1. Compute the strides.
2. Compute the offsets.
3. Collapse the memref to 1D.
4. Create the load_gather or store_scatter op.
Both `linalg.map` and `linalg.reduce` are sometimes printed in short
form incorrectly, resulting in a round-trip output with different
semantics. This patch adds additional `yield` operand checks to ensure
that all criteria for short-form printing are satisfied. It also
updates/adds comments and renames the `findPayloadOp` function to
`canUseShortForm`, which more accurately reflects its purpose. A couple
of new lit tests check that the long form is used when the short-form
conditions are not met.
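As an illustration (a sketch, not a test from this patch), the payload
below yields a block argument rather than the payload op's result, so
printing it in short form as `linalg.map { arith.addf }` would change
its semantics and the printer must keep the long form:
```
%res = linalg.map ins(%a, %b : tensor<8xf32>, tensor<8xf32>)
                  outs(%init : tensor<8xf32>)
  (%x: f32, %y: f32) {
    %0 = arith.addf %x, %y : f32
    // Yields %x, not %0: short-form printing would be incorrect.
    linalg.yield %x : f32
  }
```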
Fixes #117528
This PR builds upon #146531 and enables scalable
vectorization for `batch_mmt4d` as well.
---------
Signed-off-by: Ege Beysel <beyselege@gmail.com>
As part of XeVM dialect upstreaming, this covers the remaining parts
required for XeVM dialect integration and testing.
It has two high-level components:
- XeVM target and serialization support
- XeVM dialect integration tests using the Level Zero runtime
Co-Authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
This commit improves the `allowPatternRollback` flag handling in the
dialect conversion driver. Previously, this flag was merely used to
detect cases that are incompatible with the new One-Shot Dialect
Conversion driver. This commit implements the driver itself: when the
flag is set to "false", all IR changes are materialized immediately,
bypassing the `IRRewrite` and `ConversionValueMapping` infrastructure.
A few selected test cases now run with both the old and the new driver.
RFC:
https://discourse.llvm.org/t/rfc-a-new-one-shot-dialect-conversion-driver/79083
This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes #149318.
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics. In particular,
the new intrinsics introduced with Blackwell are not accessible through
the NVVM ops. This commit extends the ldmatrix operation to support all
available ldmatrix intrinsics.
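For reference, a minimal sketch of the op in its pre-existing form (the
Blackwell variants are reached through additional attributes that this
sketch does not attempt to reproduce):
```
// Load four 8x8 tiles in row-major layout from shared memory; this
// lowers to the matching llvm.nvvm.ldmatrix intrinsic.
%l = nvvm.ldmatrix %ptr {num = 4 : i32, layout = #nvvm.mma_layout<row>}
    : (!llvm.ptr<3>) -> !llvm.struct<(i32, i32, i32, i32)>
```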
This PR adds a pattern to distribute the create_nd_desc op without
offsets from workgroup (Wg) IR to subgroup (Sg) IR.
The round-robin distribution logic (which involves offset calculation)
will now happen in the load/store/prefetch nd ops instead of create_nd.
Add the missing xor AtomicRMWKind enum case in arith. Also add support
for xor to memref.atomic_rmw so the change can be tested.
This does NOT add it for all users of the enum (e.g. Affine, Vector).
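A sketch of the memref.atomic_rmw side of the change (assuming the new
enum case is spelled `xori`, in line with the existing `andi`/`ori`
cases):
```
// Atomically XOR %val into %m[%i], returning the old value.
%old = memref.atomic_rmw xori %val, %m[%i] : (i32, memref<16xi32>) -> i32
```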
Fold `broadcast(shape_cast(x))` into `broadcast(x)` if the type of x is
compatible with broadcast's result type and the shape_cast only adds or
removes leading unit dimensions.
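For example (shapes are illustrative):
```
// Before: the shape_cast only prepends a unit dimension.
%0 = vector.shape_cast %x : vector<4xf32> to vector<1x4xf32>
%1 = vector.broadcast %0 : vector<1x4xf32> to vector<2x1x4xf32>
// After: broadcast directly from %x.
%1 = vector.broadcast %x : vector<4xf32> to vector<2x1x4xf32>
```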
---------
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
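In MLIR's LLVM dialect, the change would look roughly as follows (a
sketch; the size-less form of the assembly is assumed here):
```
// Before: explicit size in bytes.
llvm.intr.lifetime.start 8, %alloca : !llvm.ptr
// After: the size is implied by the underlying alloca.
llvm.intr.lifetime.start %alloca : !llvm.ptr
```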
In the case of mixed-precision inputs, the inputs are generally cast to
match the output type, thereby introducing arith.extFOp/extIOp
instructions. Folding such patterns into vector.contract is desirable
for hardware with mixed-precision ISA support.
This patch adds optional folding of the mixed-precision pattern into
vector.contract, which can be enabled using the
`fold_type_extensions_into_contract` attribute.
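A sketch of the fold on a simple dot-product-shaped contraction (shapes
and maps chosen for illustration):
```
// Before: operands are extended to f32 ahead of the contraction.
%le = arith.extf %lhs : vector<4xf16> to vector<4xf32>
%re = arith.extf %rhs : vector<4xf16> to vector<4xf32>
%r = vector.contract {
       indexing_maps = [affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> ()>],
       iterator_types = ["reduction"], kind = #vector.kind<add>}
     %le, %re, %acc : vector<4xf32>, vector<4xf32> into f32
// After: a mixed-precision contraction consumes the f16 operands
// directly.
%r = vector.contract {
       indexing_maps = [affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> ()>],
       iterator_types = ["reduction"], kind = #vector.kind<add>}
     %lhs, %rhs, %acc : vector<4xf16>, vector<4xf16> into f32
```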
`gpu::LaunchOp` is updated in the following way:
- Change the attribute type of the kernel function and module from
`SymbolRefAttr` to `FlatSymbolRefAttr` to avoid nested symbol
references.
- Rename variables from camel case (kernelFunc, kernelModule) to lower
case (function, module) and update the syntax.
- `LaunchOp::build` now supports passing `module` and `function`
attributes.