Retry landing https://github.com/llvm/llvm-project/pull/153373
## Major changes from previous attempt
- remove the test in the C API because no existing C API tests deal with
sanitizer exemptions
- update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: GPU
binaries are loaded in global constructors instead of at the call site
- skip the test on AArch64 since we have an issue with initialization there
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
This PR introduces a mechanism to defer JIT engine initialization,
enabling registration of required symbols before global constructor
execution.
## Problem
Modules containing `gpu.module` generate global constructors (e.g.,
kernel load/unload) that execute *during* engine creation. This can
force premature symbol resolution, causing failures when:
- Symbols are registered via `mlirExecutionEngineRegisterSymbol` *after*
creation
- Global constructors exist (even if not directly using unresolved
symbols, e.g., an external function declaration)
- GPU modules introduce mandatory binary loading logic
## Usage
```c
// Create engine without initialization
MlirExecutionEngine jit = mlirExecutionEngineCreate(...);
// Register required symbols
mlirExecutionEngineRegisterSymbol(jit, ...);
// Explicitly initialize (runs global constructors)
mlirExecutionEngineInitialize(jit);
```
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Reland https://github.com/llvm/llvm-project/pull/150805
Shared libs build was broken.
Add `${dialect_libs}` and `${conversion_libs}` to
`MLIRRegisterAllExtensions` because it depends on
`registerConvert***ToLLVMInterface` functions.
`InitAll***` functions are used by `opt`-style tools to initialize all MLIR
dialects/passes/extensions. Currently they are implemented as inline
functions and include essentially the entire MLIR header tree. Each file
that includes this header (~10 currently) takes 10+ seconds and multiple GB
of RAM to compile (tested with clang-19), which limits the number of
parallel compile jobs that can be run. Also, flang just includes this
file into one of its headers.
Move the actual registration code to the static library, so it's
compiled only once.
Discourse thread:
https://discourse.llvm.org/t/rfc-moving-initall-implementation-into-static-library/87559
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow for `ShapedType::isStaticDim` to be used in
ADT functions without having to wrap it in a lambda performing the
negation.
Also add the new functions to C and Python bindings.
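As an illustration from the Python side, a minimal sketch; the binding name
`is_static_dim` used below is an assumption mirroring the C++
`ShapedType::isStaticDim`:
```python
from mlir import ir

with ir.Context():
    t = ir.RankedTensorType.get(
        [2, ir.ShapedType.get_dynamic_size(), 4], ir.F32Type.get())
    # A positive predicate: no need to negate an `is_dynamic_dim` check.
    # `is_static_dim` is the assumed Python name of the new function.
    static_dims = [d for d in range(t.rank) if t.is_static_dim(d)]
    print(static_dims)  # expected: [0, 2]
```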
This threads proper error handling / reporting capabilities through, to
avoid hitting `llvm_unreachable` while parsing linalg ops.
Fixes #132755, fixes #132740, fixes #129185.
This PR is mainly about exposing the python bindings for
`linalg::isaConvolutionOpInterface` and `linalg::inferConvolutionDims`.
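A hedged usage sketch; the snake_case names below are assumptions modeled
on the C++ functions being exposed:
```python
from mlir import ir
from mlir.dialects import linalg

def convolution_dims(op: ir.Operation):
    # Binding names are assumptions mirroring linalg::isaConvolutionOpInterface
    # and linalg::inferConvolutionDims.
    if linalg.isa_convolution_op_interface(op):  # assumed name
        return linalg.infer_convolution_dims(op)  # batch/image/filter/depth dims
    return None
```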
---------
Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
This PR is mainly about exposing the python bindings for
`linalg::isaContractionOpInterface` and `linalg::inferContractionDims`.
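Analogously, a hedged sketch for the contraction queries (names and fields
assumed, mirroring the C++ API):
```python
from mlir import ir
from mlir.dialects import linalg

def describe_contraction(op: ir.Operation):
    # Names are assumptions mirroring linalg::isaContractionOpInterface and
    # linalg::inferContractionDims.
    if not linalg.isa_contraction_op_interface(op):  # assumed name
        return None
    dims = linalg.infer_contraction_dims(op)  # assumed name
    # The inferred grouping of loop indices for a contraction.
    return dims.m, dims.n, dims.k, dims.batch  # assumed fields
```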
---------
Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
I observed that we have boundary comments in the codebase like:
```
//===----------------------------------------------------------------------===//
// ...
//===----------------------------------------------------------------------===//
```
I also observed that there are incomplete boundary comments which are
missing the closing line, like the one below. The revision is generated by
a script that completes them:
```
//===----------------------------------------------------------------------===//
// ...
...
```
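The script itself is not part of this revision; a minimal sketch of the
idea, assuming the canonical 80-column bar, could look like:
```python
import sys

BAR = "//===" + "-" * 70 + "===//"

def complete_boundary_comments(lines):
    out, i = [], 0
    while i < len(lines):
        out.append(lines[i])
        # An opening bar followed by a "// ..." title but no closing bar
        # gets the closing bar appended after the title line.
        if (lines[i].rstrip() == BAR
                and i + 1 < len(lines) and lines[i + 1].startswith("// ")
                and (i + 2 >= len(lines) or lines[i + 2].rstrip() != BAR)):
            out.append(lines[i + 1])
            out.append(BAR + "\n")
            i += 2
            continue
        i += 1
    return out

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        fixed = complete_boundary_comments(f.readlines())
    with open(sys.argv[1], "w") as f:
        f.writelines(fixed)
```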
Signed-off-by: hanhanW <hanhan0912@gmail.com>
This is an implementation for [RFC: Supporting Sub-Channel Quantization
in
MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694).
In order to make the review process easier, the PR has been divided into
the following commit labels:
1. **Add implementation for sub-channel type:** Includes the class
design for `UniformQuantizedSubChannelType`, printer/parser and bytecode
read/write support. The existing types (per-tensor and per-axis) are
unaltered.
2. **Add lowering for sub-channel type:** Lowering of
`quant.qcast` and `quant.dcast` operations to Linalg operations.
3. **Adding C/Python APIs:** We first define the C APIs and build the
Python APIs on top of those.
4. **Add pass to normalize generic ....:** This pass normalizes
sub-channel quantized types to per-tensor per-axis types, if possible.
A design note:
- **Explicitly storing the `quantized_dimensions`, even when they can be
derived for ranked tensors.**
While it's possible to infer quantized dimensions from the static shape
of the scales (or zero-points) tensor for ranked data tensors
([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3)
for background), there are cases where this can lead to ambiguity and
issues with round-tripping.
Consider the example:
```
tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{s00:z00, s01:z01}}>>
```
The shape of the scales tensor is [1, 2], which might suggest that only
axis 1 is quantized. While this inference is technically correct, as the
block size for axis 0 is a degenerate case (equal to the dimension
size), it can cause problems with round-tripping. Therefore, even for
ranked tensors, we are explicitly storing the quantized dimensions.
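As a round-trip illustration from Python (the scale/zero-point values are
made up; this assumes the quant dialect is registered in the context, as it
is in the upstream bindings):
```python
from mlir import ir

with ir.Context():
    t = ir.Type.parse(
        "tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{2.0:10, 3.0:20}}>>")
    # Both quantized dimensions and their block sizes print back explicitly,
    # so the type round-trips without ambiguity.
    print(t)
```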
Suggestions welcome!
PS: I understand that the upcoming holidays may impact your schedule, so
please take your time with the review. There's no rush.
In some projects like JAX, `ir.Context` is used with multi-threading
disabled to avoid caching multiple threading pools:
623865fe95/jax/_src/interpreters/mlir.py (L606-L611)
However, when the context has multi-threading enabled, it also takes locks
on the StorageUniquers, and this can be helpful to avoid data races in
multi-threaded execution (for example with free-threaded CPython,
https://github.com/jax-ml/jax/issues/26272).
With this PR the user can enable multi-threading, which 1) enables the
additional locking and 2) sets a shared threading pool such that cached
contexts can share one global pool.
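A minimal sketch of the resulting pattern; `enable_multithreading` is the
existing Python binding:
```python
from mlir import ir

# Create the context single-threaded (as JAX does today), then opt back in.
ctx = ir.Context()
ctx.enable_multithreading(False)  # no per-context thread pool
# ... cache and reuse the context ...
ctx.enable_multithreading(True)   # StorageUniquer locking on; with this PR,
                                  # cached contexts can share one global pool
```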
This PR extends the python bindings for CallSiteLoc, FileLineColRange,
FusedLoc, NameLoc with field accessors. It also adds the missing
`value.location` accessor.
I also did some "spring cleaning" here (`cast` -> `dyn_cast`) after
running into some of my own illegal casts.
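A short sketch of the new accessors; the property names shown are
assumptions following the C++ getters:
```python
from mlir import ir

with ir.Context():
    callee = ir.Location.file("callee.mlir", 1, 1)
    caller = ir.Location.file("caller.mlir", 10, 2)
    cs = ir.Location.callsite(callee, [caller])
    # `callee` / `caller` are assumed accessor names mirroring the C++ getters.
    print(cs.callee, cs.caller)
```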
Apparently, trying to look up a function pointer using the C API
`mlirExecutionEngineLookup` triggers an assert instead of just
returning a nullptr on builds with asserts enabled.
The docs themselves say it returns a nullptr when no function is found,
so it seems sensible not to assert in this case.
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
Re-lands
[4e14b8a](4e14b8afb4).
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
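A hedged sketch of the intended use; `Module.parse_file` below is an
assumed name for the new binding:
```python
from mlir import ir

with ir.Context():
    # Parse straight from disk instead of reading the file into a Python
    # string first; `parse_file` is an assumed binding name.
    module = ir.Module.parse_file("large_model.mlir")
    print(module.operation.name)
```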
This patch fixes:
mlir/lib/CAPI/Target/LLVMIR.cpp:38:63: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
mlir/lib/CAPI/Target/LLVMIR.cpp:60:61: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
Model the `IndexType` as `uint64_t` when converting to a python integer.
With the python bindings,
```python
DenseIntElementsAttr(op.attributes["attr"])
```
used to hit an `assert` when `attr` had the `index` element type, as in
`dense<[1, 2, 3, 4]> : vector<4xindex>`.
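With this change, a sketch like the following converts the elements cleanly
instead of asserting:
```python
from mlir import ir

with ir.Context():
    attr = ir.DenseIntElementsAttr(
        ir.Attribute.parse("dense<[1, 2, 3, 4]> : vector<4xindex>"))
    # `index` elements are modeled as uint64_t when converted to Python ints.
    print([attr[i] for i in range(len(attr))])  # [1, 2, 3, 4]
```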
---------
Co-authored-by: Christopher McGirr <christopher.mcgirr@amd.com>
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
[v3 with more `-DBUILD_SHARED_LIBS=ON` fixes]
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
[v2 with fixed `-DBUILD_SHARED_LIBS=ON` build]
The greedy rewriter is used in many different flows and it has a lot of
conveniences (work-list management, debugging actions, tracing, etc.). But
it combines two kinds of greedy behavior: 1) how ops are matched, and 2)
folding wherever it can.
These are independent forms of greediness, and coupling them leads to
inefficiency, e.g., in cases where one needs to create different phases in
lowering and is required to apply patterns in a specific order split across
different passes. Using the driver, one ends up needlessly retrying
folding / having multiple rounds of folding attempts, where one final run
would have sufficed.
Of course folks can locally avoid this behavior by just building their own
driver, but this is also a commonly requested feature that folks keep
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating from
the deprecated API should just be a find-and-replace (e.g., of the `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety),
as the API arguments haven't changed between the two.