Currently, modifier is printed as address, so it is not readable and not
useful. This PR adds readable printing for it.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
We've 2 ops:
1. nvvm.griddepcontrol.wait
2. nvvm.griddepcontrol.launch_dependents
They are related to Grid Dependent Launch (or programmatic dependent
launch in CUDA) and same concept. This PR unifies both ops into a single
one.
* Add `requiresArgsAndResultsAttr` to `LLVM_OneResultIntrOp`
* Add `args_attrs` to `llvm.intr.masked.{expandload,compressstore}`
The LLVM intrinsics
[`llvm.intr.masked.expandload`](https://llvm.org/docs/LangRef.html#llvm-masked-expandload-intrinsics)
and
[`llvm.intr.masked.compressstore`](https://llvm.org/docs/LangRef.html#llvm-masked-compressstore-intrinsics)
both allow an optional align parameter attribute to be set which
defaults to one.
Inlining the documentation below for [`llvm.intr.masked.expandload` 's
](https://llvm.org/docs/LangRef.html#id1522) and
[`llvm.intr.masked.compressstore`'s](https://llvm.org/docs/LangRef.html#id1522)
arguments respectively
> The `align` parameter attribute can be provided for the first
argument. The pointer alignment defaults to 1.
> The `align` parameter attribute can be provided for the second
argument. The pointer alignment defaults to 1.
Prior to this PR, the default behaviour of a conversion pattern which
receives operands of a 1:N is to abort the compilation. This has
historically been useful when the 1:N type conversion got merged into
the dialect conversion as it allowed us to easily find patterns that
should be capable of handling 1:N type conversions but didn't.
However, this behaviour has the disadvantage of being non-composable:
While the pattern in question cannot handle the 1:N type conversion,
another pattern part of the set might, but doesn't get the chance as
compilation is aborted.
This PR fixes this behaviour by failing to match and instead of
aborting, giving other patterns the chance to legalize an op. The
implementation uses a reusable function called `dispatchTo1To1` to allow
derived conversion patterns to also implement the behaviour.
First step in introducing the wasm-import target to mlir-translate.
This is the first PR to introduce the pass, with this PR, there is very
little support for the actual WebAssembly language, it's mostly there to
introduce the skeleton of the importer. A follow-up will come with
support for a wider range of operators. It was split to make it easier
to review, since it's a good chunk of work.
---------
Co-authored-by: Luc Forget <dev@alias.lforget.fr>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
SPIRVImageInterfaces.h.inc uses some types, e.g. mlir::TypedValue,
without #include the necessary headers. This is fine most of the time,
but we did run into a weird case where bazel fails to compile
//mlir:SPIRVImageInterfaces on clang19 for ChromiumOS when parse_headers
(see [1]) is specified.
[1]: https://bazel.build/docs/bazel-and-cpp#toolchain-features
Like we did for the 'private' clause, this adds an easier to use helper
function to add the 'firstprivate' clause + recipe to the Parallel and
Serial ops.
Lowering transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high level
steps:
1. compute Strides;
2. compute Offsets;
3. collapseMemrefTo1D;
4. create Load gather or store_scatter op
Split the function into two: one that copies a single unranked
descriptor and one that copies multiple unranked descriptors. This is in
preparation of adding 1:N support to the Func->LLVM lowering patterns.
Both `linalg.map` and `linalg.reduce` are sometimes printed in short
form incorrectly, resulting in a round-trip output with different
semantics. This patch adds additional `yield` operand checks to ensure
that all criteria for short-form printing are satisfied. Updated/added
comments and renamed the `findPayloadOp` function to `canUseShortForm`,
which more accurately reflects its purpose. A couple of new lit tests
check for the proper use of long form when short-form conditions are not
met.
Fixes#117528
Ref: https://discourse.llvm.org/t/mlir-project-maintainers/87189
See also:
* #151721
* #150945
Compared to the original proposal, one change is included:
* The `ub` dialect has @Hardcode84 as maintainer.
Please accept to validate your nomination, let's keep new nominations
for follow up PRs.
Rename `computeSizes` to `computeSize` and make it compute just a single
size. This is in preparation of adding 1:N support to the Func->LLVM
lowering patterns.
This PR builds upon the previous #146531 and enables scalable
vectorization for `batch_mmt4d` as well.
---------
Signed-off-by: Ege Beysel <beyselege@gmail.com>
This header uses GPUModuleOp but does not directly include the header:
`error: no type named 'GPUModuleOp' in namespace 'mlir::gpu'; did you
mean 'ModuleOp'?`
Needed for #148286
Now that we have general support for setting argument and result
attributes on LLVM intrinsics, extend the definitions of mbcnt.lo and
mbcnt.hi to carry such attributes. With that, update the construction of
the mbcnt.lo/mbcnt.hi calls used to get the lane ID to be `noundef`
(since the lane ID is always defined) and to be annotated with the
correct ranges (so that generic LLVM passes can correctly optimized
based on the fact that there are never more than 32/64 lanes).
(Also, handle a pattern that wasn't using getLaneId() and get rid of a
dead argument)
As part of XeVM dialect upsteaming, covers remaining parts required for XeVM dialect integration and testing.
It has two high level components
- XeVM target and serialization support
- XeVM dialect integration tests using level zero runtime
Co-Authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
This commit improves the `allowPatternRollback` flag handling in the
dialect conversion driver. Previously, this flag was used to merely
detect cases that are incompatible with the new One-Shot Dialect
Conversion driver. This commit implements the driver itself: when the
flag is set to "false", all IR changes are materialized immediately,
bypassing the `IRRewrite` and `ConversionValueMapping` infrastructure.
A few selected test cases now run with both the old and the new driver.
RFC:
https://discourse.llvm.org/t/rfc-a-new-one-shot-dialect-conversion-driver/79083
…pon serialization
If serialization option `emitDebugInfo` is enabled, then it is required
to serialize `SPV_KHR_non_semantic_info` extension provided that it is
available in the target environment.
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
This PR introduces a mechanism to defer JIT engine initialization,
enabling registration of required symbols before global constructor
execution.
## Problem
Modules containing `gpu.module` generate global constructors (e.g.,
kernel load/unload) that execute *during* engine creation. This can
force premature symbol resolution, causing failures when:
- Symbols are registered via `mlirExecutionEngineRegisterSymbol` *after*
creation
- Global constructors exist (even if not directly using unresolved
symbols, e.g., an external function declaration)
- GPU modules introduce mandatory binary loading logic
## Usage
```c
// Create engine without initialization
MlirExecutionEngine jit = mlirExecutionEngineCreate(...);
// Register required symbols
mlirExecutionEngineRegisterSymbol(jit, ...);
// Explicitly initialize (runs global constructors)
mlirExecutionEngineInitialize(jit);
```
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
`basePtr` should be freed instead of `data` because it is the one which
is storing the output of `malloc`. In `allocAligned()`, the `data` is
malloced and then assigned to `basePtr`.
Adds a pass for Vector to AMX operation conversion.
Initially, a direct rewrite for vector contraction in packed VNNI layout
is supported. Operations are expected to already be in shapes which are
AMX-compatible for the rewriting to occur.
This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes#149318.
This PR implements "automatic" location inference in the bindings. The
way it works is it walks the frame stack collecting source locations
(Python captures these in the frame itself). It is inspired by JAX's
[implementation](523ddcfbca/jax/_src/interpreters/mlir.py (L462))
but moves the frame stack traversal into the bindings for better
performance.
The system supports registering "included" and "excluded" filenames;
frames originating from functions in included filenames **will not** be
filtered and frames originating from functions in excluded filenames
**will** be filtered (in that order). This allows excluding all the
generated `*_ops_gen.py` files.
The system is also "toggleable" and off by default to save people who
have their own systems (such as JAX) from the added cost.
Note, the system stores the entire stacktrace (subject to
`locTracebackFramesLimit`) in the `Location` using specifically a
`CallSiteLoc`. This can be useful for profiling tools (flamegraphs
etc.).
Shoutout to the folks at JAX for coming up with a good system.
---------
Co-authored-by: Jacques Pienaar <jpienaar@google.com>
The mapa.shared.cluster variant that takes in address-space 3 now should
output address-space 7. This patch updates the NVVMOps.td file to reflect this.
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics. The intrinsics
generating new ops introduced in BlackWell are not accessible through
the NVVM ops. This commit extends the ldmatrix operation to support all
available ldmatrix intrinsics.