23848 Commits

Author SHA1 Message Date
Tim Gymnich
ffaba758fb
[MLIR][ROCDL] Add permlane16.swap and permlane32.swap (#153804)
Add rocdl.permlane16.swap and rocdl.permlane32.swap.
2025-08-15 17:35:31 +02:00
Kazu Hirata
f4bc3151bb [mlir] Fix warnings
This patch fixes:

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:82:1: error: unused
  variable 'wasmSectionName<(anonymous
  namespace)::WasmSectionType::DATACOUNT>'
  [-Werror,-Wunused-const-variable]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:100:5: error: unused
  variable 'valueTypesEncodings' [-Werror,-Wunused-const-variable]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:735:13: error: unused
  function 'buildLiteralType<unsigned int>'
  [-Werror,-Wunused-function]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:740:13: error: unused
  function 'buildLiteralType<unsigned long>'
  [-Werror,-Wunused-function]

  mlir/lib/Target/Wasm/TranslateFromWasm.cpp:292:33: error: private
  field 'symbols' is not used [-Werror,-Wunused-private-field]
2025-08-15 07:24:31 -07:00
Guray Ozen
4c389178ee
[MLIR][NVVM] Print readable modifier (NFC) (#153779)
Currently, the modifier is printed as an address, so it is neither
readable nor useful. This PR adds readable printing for it.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-15 15:47:39 +02:00
Guray Ozen
af92cabdef
[MLIR][NVVM] Combine griddepcontrol Ops (#152525)
There are two ops:
1. nvvm.griddepcontrol.wait
2. nvvm.griddepcontrol.launch_dependents

Both relate to Grid Dependent Launch (programmatic dependent launch in
CUDA) and express the same concept. This PR unifies them into a single
op.
2025-08-15 15:47:12 +02:00
Erick Ochoa Lopez
61caab7789
[mlir][llvm] Add align attribute to llvm.intr.masked.{expandload,compressstore} (#153063)
* Add `requiresArgsAndResultsAttr` to `LLVM_OneResultIntrOp`
* Add `args_attrs` to `llvm.intr.masked.{expandload,compressstore}`

The LLVM intrinsics
[`llvm.intr.masked.expandload`](https://llvm.org/docs/LangRef.html#llvm-masked-expandload-intrinsics)
and
[`llvm.intr.masked.compressstore`](https://llvm.org/docs/LangRef.html#llvm-masked-compressstore-intrinsics)
both allow an optional align parameter attribute to be set which
defaults to one.

Inlining the documentation below for
[`llvm.intr.masked.expandload`'s](https://llvm.org/docs/LangRef.html#id1522)
and
[`llvm.intr.masked.compressstore`'s](https://llvm.org/docs/LangRef.html#id1522)
arguments, respectively:

> The `align` parameter attribute can be provided for the first
argument. The pointer alignment defaults to 1.

> The `align` parameter attribute can be provided for the second
argument. The pointer alignment defaults to 1.
2025-08-15 08:34:14 -04:00
Mehdi Amini
69453d7021
[MLIR] Fix memory leak in importWebAssemblyToModule when it fails to import (#153794) 2025-08-15 12:33:25 +00:00
Mehdi Amini
7640645f79
[MLIR][Wasm] Remove statistics as they depend on global ctors (#153795)
Use a debug log instead for now.
2025-08-15 12:29:20 +00:00
Markus Böck
8582025f1f
[mlir][Transforms] Turn 1:N -> 1:1 dispatch fatal error into match failure (#153605)
Prior to this PR, the default behaviour of a conversion pattern that
received operands of a 1:N type conversion was to abort the compilation. This has
historically been useful when the 1:N type conversion got merged into
the dialect conversion as it allowed us to easily find patterns that
should be capable of handling 1:N type conversions but didn't.

However, this behaviour has the disadvantage of being non-composable:
While the pattern in question cannot handle the 1:N type conversion,
another pattern part of the set might, but doesn't get the chance as
compilation is aborted.

This PR fixes this behaviour by failing to match instead of
aborting, giving other patterns the chance to legalize the op. The
implementation uses a reusable function called `dispatchTo1To1` to allow
derived conversion patterns to also implement the behaviour.
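
To illustrate why match failure composes better than aborting, here is a minimal standalone sketch. It does not use the MLIR API: `Operand`, `Pattern`, `oneToOneOnly`, `oneToNAware`, and `legalize` are hypothetical names, and the driver loop is a simplification of what a pattern driver does.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an operand after type conversion: one original
// value may map to several converted values (a 1:N conversion).
struct Operand {
  std::vector<int> converted;
};

// A pattern returns the name of the rewrite it applied, or std::nullopt to
// signal "failed to match".
using Pattern =
    std::function<std::optional<std::string>(const std::vector<Operand> &)>;

// A pattern that only understands 1:1 conversions. Instead of aborting on a
// 1:N operand, it reports match failure.
std::optional<std::string> oneToOneOnly(const std::vector<Operand> &operands) {
  for (const Operand &operand : operands)
    if (operand.converted.size() != 1)
      return std::nullopt;
  return "one-to-one";
}

// A pattern that can handle any number of converted values per operand.
std::optional<std::string> oneToNAware(const std::vector<Operand> &) {
  return "one-to-n";
}

// Simplified driver: try patterns in order and stop at the first match.
std::optional<std::string> legalize(const std::vector<Pattern> &patterns,
                                    const std::vector<Operand> &operands) {
  assert(!patterns.empty() && "expected at least one pattern");
  for (const Pattern &pattern : patterns)
    if (std::optional<std::string> result = pattern(operands))
      return result;
  return std::nullopt;
}
```

With both patterns registered, an op with a 1:N operand falls through the first pattern and is legalized by the second; had the first pattern aborted, the second would never have run.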
2025-08-15 11:45:25 +02:00
Matthias Springer
21b607adbe
[mlir][SCF] scf.for: Add support for unsigned integer comparison (#153379)
Add a new unit attribute to allow for unsigned integer comparison.

Example:
```mlir
scf.for unsigned %iv_32 = %lb_32 to %ub_32 step %step_32 : i32 {
  // body
}
```

Discussion:
https://discourse.llvm.org/t/scf-should-scf-for-support-unsigned-comparison/84655
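
As a rough illustration of why the comparison kind matters, the sketch below counts loop iterations for the same 32-bit bounds under signed and unsigned comparison. The helper names are hypothetical, and the sketch assumes that only the `iv < ub` comparison differs between the two forms; it is not taken from the `scf.for` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of `for (iv = lb; iv < ub; iv += step)` with a signed compare.
// The induction variable is widened to 64 bits so it cannot overflow.
uint64_t tripCountSigned(int32_t lb, int32_t ub, int32_t step) {
  assert(step > 0 && "sketch only handles positive steps");
  uint64_t count = 0;
  for (int64_t iv = lb; iv < ub; iv += step)
    ++count;
  return count;
}

// Same loop, but the bounds are interpreted as unsigned 32-bit values.
uint64_t tripCountUnsigned(uint32_t lb, uint32_t ub, uint32_t step) {
  assert(step > 0 && "sketch only handles positive steps");
  uint64_t count = 0;
  for (uint64_t iv = lb; iv < ub; iv += step)
    ++count;
  return count;
}
```

For example, the bit pattern `0x80000002` is a negative upper bound when read as signed, so a loop starting at 0 does not execute, whereas read as unsigned it is a large positive bound and the loop runs.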
2025-08-15 10:59:14 +02:00
Ferdinand Lemaire
6bb8f6f2d0
[MLIR][WASM] Introduce an importer for Wasm binaries (#152131)
First step in introducing the wasm-import target to mlir-translate.
This is the first PR to introduce the pass. With this PR, there is very
little support for the actual WebAssembly language; it is mostly there to
introduce the skeleton of the importer. A follow-up will come with
support for a wider range of operators. It was split to make it easier
to review, since it's a good chunk of work.

---------

Co-authored-by: Luc Forget <dev@alias.lforget.fr>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
2025-08-15 10:54:40 +02:00
Chenguang Wang
3f797a8342
[mlir][spirv] Add missing #include in SPIRVImageInterfaces.h (#153727)
SPIRVImageInterfaces.h.inc uses some types, e.g. mlir::TypedValue,
without including the necessary headers. This is fine most of the time,
but we did run into a weird case where bazel fails to compile
//mlir:SPIRVImageInterfaces on clang19 for ChromiumOS when parse_headers
(see [1]) is specified.

[1]: https://bazel.build/docs/bazel-and-cpp#toolchain-features
2025-08-14 19:07:54 -07:00
Erich Keane
e5e3e4bdb5
[OpenACC] Add firstprivate recipe helper methods to ACC dialect (#153604)
Like we did for the 'private' clause, this adds an easier to use helper
function to add the 'firstprivate' clause + recipe to the Parallel and
Serial ops.
2025-08-14 13:07:59 -07:00
Jianhui Li
98728d9dc8
[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter (#152429)
Lower transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high-level
steps:
  1. compute strides;
  2. compute offsets;
  3. collapse the memref to 1D (collapseMemrefTo1D);
  4. create the load_gather or store_scatter op.
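
The first two steps can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: the memref is contiguous and row-major with a static shape, the transfer reads a 1-D vector along the innermost dimension, and `computeStrides` / `computeOffsets` are hypothetical helpers, not the functions in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major strides for a static shape: strides[i] is the product of all
// dimension sizes to the right of dimension i.
std::vector<int64_t> computeStrides(const std::vector<int64_t> &shape) {
  std::vector<int64_t> strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * shape[i + 1];
  return strides;
}

// Per-lane linear offsets into the collapsed 1-D buffer for a vector of
// `vectorLength` elements starting at `indices` along the innermost dim.
std::vector<int64_t> computeOffsets(const std::vector<int64_t> &strides,
                                    const std::vector<int64_t> &indices,
                                    int64_t vectorLength) {
  assert(!strides.empty() && strides.size() == indices.size());
  int64_t base = 0;
  for (std::size_t i = 0; i < strides.size(); ++i)
    base += indices[i] * strides[i];
  std::vector<int64_t> offsets;
  for (int64_t lane = 0; lane < vectorLength; ++lane)
    offsets.push_back(base + lane * strides.back());
  return offsets;
}
```

The resulting offsets are what a gather or scatter over the 1-D collapsed buffer would consume, one offset per vector lane.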
2025-08-14 11:27:07 -07:00
Boyana Norris
ada191136b
[mlir][cmake] Fix mlir target export (#153341)
In https://github.com/llvm/llvm-project/pull/152195, target export was
accidentally moved inside a conditional, but it should have been left
outside. This patch undoes that change.
2025-08-14 11:24:44 -06:00
Matthias Springer
e2ae634cc1
[mlir][LLVM][NFC] Simplify copyUnrankedDescriptors (#153597)
Split the function into two: one that copies a single unranked
descriptor and one that copies multiple unranked descriptors. This is in
preparation of adding 1:N support to the Func->LLVM lowering patterns.
2025-08-14 18:25:19 +02:00
Boyana Norris
1945753700
[mlir][linalg] Fix incorrect linalg short form printing (#153219)
Both `linalg.map` and `linalg.reduce` are sometimes printed in short
form incorrectly, resulting in a round-trip output with different
semantics. This patch adds additional `yield` operand checks to ensure
that all criteria for short-form printing are satisfied. Updated/added
comments and renamed the `findPayloadOp` function to `canUseShortForm`,
which more accurately reflects its purpose. A couple of new lit tests
check for the proper use of long form when short-form conditions are not
met.

Fixes #117528
2025-08-14 17:19:16 +01:00
Renato Golin
8cc22ee674
[MLIR][Maintainers] Add maintainer list for core sub-categories (#152136)
Ref: https://discourse.llvm.org/t/mlir-project-maintainers/87189

See also:
 * #151721 
 * #150945

Compared to the original proposal, one change is included:
* The `ub` dialect has @Hardcode84 as maintainer.

Please accept to validate your nomination; let's keep new nominations
for follow-up PRs.
2025-08-14 16:08:15 +01:00
Matthias Springer
0ff92fe2f0
[mlir][LLVM][NFC] Simplify computeSizes function (#153588)
Rename `computeSizes` to `computeSize` and make it compute just a single
size. This is in preparation of adding 1:N support to the Func->LLVM
lowering patterns.
2025-08-14 17:00:03 +02:00
Jaden Angella
bfda0e777d
[mlir][EmitC] Expand the MemRefToEmitC pass - Lowering CopyOp (#151206)
This patch lowers `memref.copy` to `emitc.call_opaque "memcpy"`.
From:
```
func.func @copying(%arg0 : memref<9x4x5x7xf32>, %arg1 : memref<9x4x5x7xf32>) {
  memref.copy %arg0, %arg1 : memref<9x4x5x7xf32> to memref<9x4x5x7xf32>
  return
}
```
To:
```cpp
#include <cstring>
void copying(float v1[9][4][5][7], float v2[9][4][5][7]) {
  size_t v3 = 0;
  float* v4 = &v2[v3][v3][v3][v3];
  float* v5 = &v1[v3][v3][v3][v3];
  size_t v6 = sizeof(float);
  size_t v7 = 1260;
  size_t v8 = v6 * v7;
  memcpy(v5, v4, v8);
  return;
}
```
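
The constant `1260` in the generated code is the element count of the memref (9 × 4 × 5 × 7), which is then scaled by the element size. A minimal sketch of that arithmetic, using a hypothetical helper `copySizeInBytes` that is not part of the pass:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of bytes to copy for a statically shaped, contiguous buffer:
// the product of all dimension sizes times the size of one element.
std::size_t copySizeInBytes(const std::vector<std::size_t> &shape,
                            std::size_t elementSize) {
  assert(elementSize > 0 && "element size must be non-zero");
  std::size_t numElements = 1;
  for (std::size_t dim : shape)
    numElements *= dim;
  return numElements * elementSize;
}
```

For the `memref<9x4x5x7xf32>` above, `copySizeInBytes({9, 4, 5, 7}, sizeof(float))` yields `1260 * sizeof(float)`, matching `v8`.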
2025-08-14 05:25:55 -07:00
lonely eagle
6d08a39eeb
[mlir][nvgpu] Add tma last dim bytes check (#153451)
Add a check that the number of bytes in the last dimension of a TMA
tensor is a multiple of 16.
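
The condition can be sketched as below. `isValidTmaLastDim` is a hypothetical helper, not the verifier added by the patch, and the sketch assumes byte-sized element types (sub-byte element types are not handled).

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the innermost dimension occupies a multiple of 16 bytes.
// Assumes the element bit width is a whole number of bytes.
bool isValidTmaLastDim(int64_t lastDimElements, int64_t elementBitWidth) {
  assert(elementBitWidth > 0 && elementBitWidth % 8 == 0 &&
         "sketch only handles byte-sized element types");
  int64_t lastDimBytes = lastDimElements * (elementBitWidth / 8);
  return lastDimBytes % 16 == 0;
}
```

For instance, 8 elements of a 16-bit type occupy 16 bytes and pass, while 4 such elements occupy 8 bytes and would be rejected.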
2025-08-14 20:14:20 +08:00
Igor Wodiany
87de48d11f
[mlir][spirv] Add spirv validation for module.mlir target test (#153227)
Creating this patch as an example on using the new `mlir-translate`
flag. Eventually all tests will be updated to validate SPIR-V modules.
2025-08-14 12:45:55 +01:00
Andrzej Warzyński
8d4f3171fa
[mlir][linalg] Fix UnPackOp::getTiledOuterDims (#152960)
Fixes `getTiledOuterDims` by making sure that the `outer_dims_perm`
attribute from `linalg.unpack` is taken into account.

Fixes #152037
2025-08-14 11:39:50 +01:00
Ege Beysel
8de85e753f
[mlir][linalg] Add support for scalable vectorization of linalg.batch_mmt4d (#152984)
This PR builds upon the previous #146531 and enables scalable
vectorization for `batch_mmt4d` as well.

---------

Signed-off-by: Ege Beysel <beyselege@gmail.com>
2025-08-14 11:47:51 +02:00
Jordan Rupprecht
1d55b70ec3
[MLIR][GPU][XeVM] Add missing #include for standalone header build (#153532)
This header uses GPUModuleOp but does not directly include the header:
`error: no type named 'GPUModuleOp' in namespace 'mlir::gpu'; did you
mean 'ModuleOp'?`

Needed for #148286
2025-08-14 04:13:41 +00:00
Sayan Saha
8432f24831
[mlir][tosa] Don't fold mul with zero lhs/rhs if resulting type is dynamic (#153420)
Canonicalizing the following IR:

```
func.func @mul_zero_dynamic_nofold(%arg0: tensor<?x17xf32>) -> tensor<?x17xf32> {
  %0 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x1xf32>}> : () -> tensor<1x1xf32>
  %1 = "tosa.const"() <{values = dense<0> : tensor<1xi8>}> : () -> tensor<1xi8>
  %2 = tosa.mul %arg0, %0, %1 : (tensor<?x17xf32>, tensor<1x1xf32>, tensor<1xi8>) -> tensor<?x17xf32>
  return %2 : tensor<?x17xf32>
}
```

resulted in a crash

```
 #0 0x000056513187e8db backtrace (./build-release/bin/mlir-opt+0x9d698db)
 #1 0x0000565131b17737 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:838:8
 #2 0x0000565131b187f3 PrintStackTraceSignalHandler(void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:918:1
 #3 0x0000565131b18c30 llvm::sys::RunSignalHandlers() /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Signals.cpp:105:18
 #4 0x0000565131b18c30 SignalHandler(int, siginfo_t*, void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:409:3
 #5 0x00007f2e4165b050 (/lib/x86_64-linux-gnu/libc.so.6+0x3c050)
 #6 0x00007f2e416a9eec __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #7 0x00007f2e4165afb2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f2e41645472 abort ./stdlib/abort.c:81:7
 #9 0x00007f2e41645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007f2e41653ec2 (/lib/x86_64-linux-gnu/libc.so.6+0x34ec2)
#11 0x00005651443ec4ba mlir::DenseIntOrFPElementsAttr::getRaw(mlir::ShapedType, llvm::ArrayRef<char>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:1361:3
#12 0x00005651443f1209 mlir::DenseElementsAttr::resizeSplat(mlir::ShapedType) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:0:10
#13 0x000056513f76f2b6 mlir::tosa::MulOp::fold(mlir::tosa::MulOpGenericAdaptor<llvm::ArrayRef<mlir::Attribute>>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp:0:0
```

from the folder for `tosa::mul` since the zero value was being reshaped
to `?x17` size which isn't supported. AFAIK, `tosa.const` requires all
dimensions to be static. So in this case, the fix is not to fold the
op.
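
The shape of the guard can be sketched outside of MLIR as follows. Everything here is an assumption made for illustration: `kDynamic` stands in for a dynamic dimension marker, and `foldMulByZero` is a hypothetical helper, not the actual `tosa::MulOp::fold` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical marker for a dimension whose size is unknown at compile time.
constexpr int64_t kDynamic = -1;

// True if every dimension of the shape is known statically.
bool hasStaticShape(const std::vector<int64_t> &shape) {
  for (int64_t dim : shape)
    if (dim == kDynamic)
      return false;
  return true;
}

// Fold `x * 0` into an all-zero constant of the result shape, but only when
// that shape is fully static. Otherwise decline to fold (std::nullopt),
// because a constant cannot be materialized with an unknown size.
std::optional<std::vector<float>>
foldMulByZero(const std::vector<int64_t> &resultShape) {
  if (!hasStaticShape(resultShape))
    return std::nullopt;
  std::size_t numElements = 1;
  for (int64_t dim : resultShape) {
    assert(dim >= 0 && "static dimensions must be non-negative");
    numElements *= static_cast<std::size_t>(dim);
  }
  return std::vector<float>(numElements, 0.0f);
}
```

With a result shape of `?x17`, the helper returns no value and the multiplication is left in place, which mirrors the intent of the fix.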
2025-08-13 19:45:06 -04:00
Sang Ik Lee
9f953fa62f
[MLIR] XeVM Target: Add missing SPIR-V backend dependency libraries. (#153505)
Add the missing dependencies SPIRVDesc and SPIRVInfo.
Fixes a post-commit build issue with #148286.
2025-08-13 16:03:16 -07:00
Krzysztof Drewniak
bbe3d64b39
[mlir][ROCDL] Annotate lane ID functions with noundef, ranges (#151396)
Now that we have general support for setting argument and result
attributes on LLVM intrinsics, extend the definitions of mbcnt.lo and
mbcnt.hi to carry such attributes. With that, update the construction of
the mbcnt.lo/mbcnt.hi calls used to get the lane ID to be `noundef`
(since the lane ID is always defined) and to be annotated with the
correct ranges (so that generic LLVM passes can correctly optimize
based on the fact that there are never more than 32/64 lanes).

(Also, handle a pattern that wasn't using getLaneId() and get rid of a
dead argument)
2025-08-13 17:44:03 -05:00
Nishant Patel
af87214b84
[MLIR][XeGPU] Add pattern for arith.constant for wg to sg distribution (#151977) 2025-08-13 13:52:07 -07:00
James Newling
2796336152
[mlir][vector] Improve vector.gather description (#153278)
Improve/elaborate example describing semantics
2025-08-13 13:50:06 -07:00
Sang Ik Lee
baae949f19
[MLIR][GPU][XeVM] Add XeVM target and XeVM dialect integration tests. (#148286)
As part of XeVM dialect upstreaming, this covers the remaining parts required for XeVM dialect integration and testing.
It has two high-level components:
- XeVM target and serialization support
- XeVM dialect integration tests using level zero runtime

Co-Authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
2025-08-13 13:17:10 -07:00
Mehdi Amini
bfd490e0cd
Revert "[MLIR] Split ExecutionEngine Initialization out of ctor into an explicit method call" (#153477)
Reverts llvm/llvm-project#153373

Sanitizer bot is broken
2025-08-13 19:43:04 +00:00
Matthias Springer
c888addc9f
[mlir][Transforms] Fix build (#153447)
Fix build after #151865.
2025-08-13 18:21:45 +02:00
Igor Wodiany
d4045a448d
[mlir][spirv] Add .spv extension to validation files (#153440) 2025-08-13 16:13:06 +00:00
Matthias Springer
7e7c9d975e
[mlir][Transforms] Dialect Conversion Driver without Rollback (#151865)
This commit improves the `allowPatternRollback` flag handling in the
dialect conversion driver. Previously, this flag was used to merely
detect cases that are incompatible with the new One-Shot Dialect
Conversion driver. This commit implements the driver itself: when the
flag is set to "false", all IR changes are materialized immediately,
bypassing the `IRRewrite` and `ConversionValueMapping` infrastructure.

A few selected test cases now run with both the old and the new driver.

RFC:
https://discourse.llvm.org/t/rfc-a-new-one-shot-dialect-conversion-driver/79083
2025-08-13 17:40:55 +02:00
Mohammadreza Ameri Mahabadian
187f2967df
[mlir][spirv] Conditionally add SPV_KHR_non_semantic_info extension upon serialization (#152686)

If serialization option `emitDebugInfo` is enabled, then it is required
to serialize `SPV_KHR_non_semantic_info` extension provided that it is
available in the target environment.

---------

Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
2025-08-13 11:33:10 -04:00
Shenghang Tsai
2f93693f76
[MLIR] Split ExecutionEngine Initialization out of ctor into an explicit method call (#153373)
This PR introduces a mechanism to defer JIT engine initialization,
enabling registration of required symbols before global constructor
execution.

## Problem

Modules containing `gpu.module` generate global constructors (e.g.,
kernel load/unload) that execute *during* engine creation. This can
force premature symbol resolution, causing failures when:
- Symbols are registered via `mlirExecutionEngineRegisterSymbol` *after*
creation
- Global constructors exist (even if not directly using unresolved
symbols, e.g., an external function declaration)
   - GPU modules introduce mandatory binary loading logic

## Usage
```c
// Create engine without initialization
MlirExecutionEngine jit = mlirExecutionEngineCreate(...);

// Register required symbols
mlirExecutionEngineRegisterSymbol(jit, ...);

// Explicitly initialize (runs global constructors)
mlirExecutionEngineInitialize(jit);
```

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2025-08-13 15:22:01 +02:00
Matthias Springer
2fcdabaf39
[mlir][DialectUtils] Fix div by zero crash (#153380) 2025-08-13 13:38:57 +02:00
Baz
e141da8a62
[mlir][ExecutionEngine] fix default free function in OwningMemRef. (#153133)
`basePtr` should be freed instead of `data` because `basePtr` is the
pointer that stores the result of `malloc`. In `allocAligned()`, the
`data` is malloced and then assigned to `basePtr`.
2025-08-13 10:23:04 +00:00
Adam Siemieniuk
7d1b9cad87
[mlir][amx] Vector to AMX conversion pass (#151121)
Adds a pass for Vector to AMX operation conversion.

Initially, a direct rewrite for vector contraction in packed VNNI layout
is supported. Operations are expected to already be in shapes which are
AMX-compatible for the rewriting to occur.
2025-08-13 11:08:52 +02:00
Longsheng Mou
2edee0bc79
[mlir][gpu] Support outlining nested gpu.launch (#152696)
This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes #149318.
2025-08-13 11:42:52 +08:00
Maksim Levental
2b842e5600
[mlir][python] fix PyThreadState_GetFrame again (#153333)
add more APIs missing from 3.8 (fix rocm builder)
2025-08-12 21:29:23 -05:00
Maksim Levental
9df846bf71
[mlir][python] fix PyThreadState_GetFrame (#153325)
`PyThreadState_GetFrame` wasn't added until 3.9 (fixes currently failing
rocm builder)
2025-08-13 01:16:04 +00:00
Maksim Levental
a40f47c972
[mlir][python] automatic location inference (#151246)
This PR implements "automatic" location inference in the bindings. The
way it works is it walks the frame stack collecting source locations
(Python captures these in the frame itself). It is inspired by JAX's
[implementation](523ddcfbca/jax/_src/interpreters/mlir.py (L462))
but moves the frame stack traversal into the bindings for better
performance.

The system supports registering "included" and "excluded" filenames;
frames originating from functions in included filenames **will not** be
filtered and frames originating from functions in excluded filenames
**will** be filtered (in that order). This allows excluding all the
generated `*_ops_gen.py` files.
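
The ordering of the two checks can be sketched as follows. This is an illustration only: `keepFrame` is a hypothetical helper, substring matching is an assumption (the bindings may match filenames differently), and keeping frames that match neither list is also assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Decide whether a stack frame from `filename` is kept in the location.
// Included patterns are checked first and always win; excluded patterns are
// checked second; frames matching neither list are kept (assumed default).
bool keepFrame(const std::string &filename,
               const std::vector<std::string> &included,
               const std::vector<std::string> &excluded) {
  assert(!filename.empty() && "expected a non-empty filename");
  auto matchesAny = [&](const std::vector<std::string> &patterns) {
    return std::any_of(patterns.begin(), patterns.end(),
                       [&](const std::string &pattern) {
                         return filename.find(pattern) != std::string::npos;
                       });
  };
  if (matchesAny(included))
    return true;
  if (matchesAny(excluded))
    return false;
  return true;
}
```

Under these assumptions, excluding `_ops_gen.py` drops frames from generated op files, while a file that also matches an included pattern is still kept because the included check runs first.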

The system is also "toggleable" and off by default to save people who
have their own systems (such as JAX) from the added cost.

Note, the system stores the entire stacktrace (subject to
`locTracebackFramesLimit`) in the `Location` using specifically a
`CallSiteLoc`. This can be useful for profiling tools (flamegraphs
etc.).

Shoutout to the folks at JAX for coming up with a good system.

---------

Co-authored-by: Jacques Pienaar <jpienaar@google.com>
2025-08-12 16:59:59 -05:00
Nick Smith
21473462f7
[MLIR][Python] MLIR Enum Python bindings infinite recursion (#151584) (#151588)
Fixes an infinite recursion bug when using I32BitEnumAttrCaseGroup with
python bindings.

For more info, see issue:
- https://github.com/llvm/llvm-project/issues/151584
2025-08-12 14:27:05 -04:00
modiking
38d854c6e8
[MLIR][NVVM] Update MLIR mapa to reflect new address space (#146031)
The mapa.shared.cluster variant that takes in address-space 3 now should
output address-space 7. This patch updates the NVVMOps.td file to reflect this.
2025-08-12 21:43:51 +05:30
Gao Yanfeng
24f5385a85
[MLIR][NVVM] Support generating all the ldmatrix intrinsics from NVVM ops (#148783)
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics. The intrinsics
for the new variants introduced in Blackwell were not accessible through
the NVVM ops. This commit extends the ldmatrix operation to support all
available ldmatrix intrinsics.
2025-08-12 15:13:15 +01:00
Akash Banerjee
e1a694cd16 [NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls 2025-08-12 15:06:03 +01:00
Akash Banerjee
c1f410779a Revert "[NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls"
This reverts commit b8104fa320f006bacd3e16afb431b5980dd5000a.
2025-08-12 14:18:57 +01:00
Matthias Springer
ef2b8805bf
[mlir][vector] Implement InferTypeOpInterface on vector.to_elements (#153172)
Just for convenience. This auto-generates an additional builder that
infers the result type.
2025-08-12 15:15:30 +02:00
Akash Banerjee
b8104fa320 [NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls 2025-08-12 14:05:00 +01:00