For Python versions <3.12, pytype complains that:
```
error: in <module>: collections.abc.Buffer not supported yet [not-supported-yet]
from collections.abc import Buffer as _Buffer
```
Since this code appears intended to support <3.12, disable the
type error on this line.
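As a rough illustration of the situation (not the actual patched file), the import can stay usable on older interpreters while pytype is silenced on that one line. The `try`/`except` fallback and the `is_buffer` helper below are hypothetical and not part of the original code.

```python
# Sketch only: the fallback branch and is_buffer() are assumptions for illustration.
try:
    # Exists at runtime on Python 3.12+; pytype reports not-supported-yet
    # on this line when checking against older versions.
    from collections.abc import Buffer as _Buffer  # pytype: disable=not-supported-yet
except ImportError:
    # Hypothetical stand-in for Python < 3.12: common buffer-like types.
    _Buffer = (bytes, bytearray, memoryview)


def is_buffer(obj) -> bool:
    """Return True if obj looks like a buffer (approximate on Python < 3.12)."""
    return isinstance(obj, _Buffer)
```

The disable comment is scoped to a single line, so other pytype diagnostics in the module remain active.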
BasicBlock::getTerminator() is frequently called on valid IR, yet the
function has to check that the last instruction is in fact a terminator,
even in release builds. This check can only be optimized away when the
instruction is dereferenced.
Therefore, introduce the functions hasTerminator() and
getTerminatorOrNull() as replacement and require (assert) that
getTerminator() always returns a valid terminator. As a side effect,
this forces explicit expression of intent at call sites when unfinished
basic blocks should be supported.
Add a new acc::VariableInfoAttr attribute that can be extended and implemented by
language dialects to carry language-specific information about variables that is
not reflected in the MLIR type system and is needed in the implementation
of the init/copy/destroy APIs.
A new genPrivateVariableInfo API is added to the MappableTypeInterface to generate
such an attribute from an mlir::Value for the host variable.
The use case and motivation is the Fortran OPTIONAL attribute. This patch adds
a new fir::OpenACCFortranVariableInfoAtt that implements the acc::VariableInfoAttr
to carry the OPTIONAL information around.
`ReinterpretCastOpConstantFolder` could fold `memref.reinterpret_cast`
ops whose offset or sizes contain negative constants (e.g. `-1 :
index`).
- A negative constant size passed into `ReinterpretCastOp::create` reaches
  `MemRefType::get`, which asserts that all static dimension sizes are
  non-negative, causing a crash.
- A negative constant offset produces an op with a static negative offset,
  which the `ViewLikeInterface` verifier then rejects ("expected offsets to
  be non-negative").
Fix by skipping the fold when any constant size or the offset is
negative.
Negative strides are intentionally left foldable: they are valid in
strided MemRef layouts (e.g. for reverse iteration) and neither
`MemRefType::get` nor `ViewLikeInterface` places a non-negativity
constraint on strides.
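The guard described above can be modeled in a few lines. This is a toy Python sketch, not the C++ folder itself; `can_fold_constants` is a hypothetical name and `None` stands for a dynamic (non-constant) value.

```python
from typing import Optional, Sequence

# None models a dynamic (non-constant) value in this toy version.
MaybeConst = Optional[int]


def can_fold_constants(offset: MaybeConst,
                       sizes: Sequence[MaybeConst],
                       strides: Sequence[MaybeConst]) -> bool:
    """Return False when folding would yield a negative static offset or size."""
    if offset is not None and offset < 0:
        return False
    if any(s is not None and s < 0 for s in sizes):
        return False
    # Strides are deliberately not inspected: negative strides are valid.
    return True
```

For example, `can_fold_constants(0, [4, -1], [1, 1])` declines the fold, while a negative stride such as `[-1]` is still accepted.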
Fixes https://github.com/llvm/llvm-project/issues/188407
Assisted-by: Claude Code
In WarpOpScfIfOp and WarpOpScfForOp, the walk that updates users of
escaping values (after moving them to the inner WarpOp) was calling
operand.set() directly, bypassing the rewriter API. This causes the
MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS fingerprint check to fail.
Fix by wrapping the operand updates with rewriter.modifyOpInPlace().
Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
The FoldCast canonicalization pattern was calling
op.getRefMutable().assign(src) directly, bypassing the rewriter. This
violates the pattern API contract and causes fingerprint change failures
when MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS is enabled. Wrap the
modification with b.modifyOpInPlace() to properly notify the rewriter of
the changes.
Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
The existing patterns for `affine.delinearize_index` of `affine.linearize_index` pairs
(`CancelDelinearizeOfLinearizeDisjointExactTail`) only match when basis
elements are exactly equal as `OpFoldResult` values. This means they
cannot simplify cases where dynamic basis products are semantically
equal but represented by different SSA values or affine expressions.
This patch adds a new pass `affine-simplify-with-bounds` with two
rewrite patterns that use `ValueBoundsConstraintSet` to prove equality
of basis products:
- **`SimplifyDelinearizeOfLinearizeDisjointManyToOneTail`**: matches
when multiple consecutive linearize dimensions have a product equal to a
single delinearize dimension (many-to-one).
- **`SimplifyDelinearizeOfLinearizeDisjointOneToManyTail`**: matches
when a single linearize dimension equals the product of multiple
consecutive delinearize dimensions (one-to-many).
Both patterns scan from the tail (innermost dimensions) and support
partial matching. Unmatched prefix dimensions are left as residual
linearize/delinearize operations.
Assisted-by: Cursor (Claude)
---------
Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
This MR fixes #188187 and #187974. Tighten TOSA constant folding and
identity-style folds so they do not produce invalid or type-incorrect
results when the op’s result type is unranked, rank-dynamic, or
otherwise not a static `RankedTensorType`. Several paths previously
assumed ranked/static shapes or folded through to the operand without
checking that the result type matched the value being returned.
`DenseElementsAttr::get`, `SplatElementsAttr::get` and similar builders
need a static shape; folding with `tensor<*xT>` or dynamic dims must not
fabricate dense attributes with the wrong shape.
Returning the operand from a “no-op” fold is only valid when
`operand.getType() == op.getType()`; otherwise the folder would change
the IR’s type semantics (e.g. ranked → unranked). In the larger
pipeline, that case is supposed to be handled by `-tosa-infer-shapes`.
Assisted-by: CLion code completion, GPT 5.3 - Codex
---------
Co-authored-by: Sayan Saha <sayans@mathworks.com>
Previously, the verifier for `llvm.call` and `llvm.invoke` would reject
calls where the callee was an `llvm.mlir.alias`, reporting that the
symbol does not reference a valid LLVM function or IFunc. Similarly, the
MLIR-to-LLVM-IR translation had no handling for aliases as callees.
This patch extends both the verifier and the translation to accept
`llvm.mlir.alias` as a valid callee for `llvm.call` and `llvm.invoke`,
mirroring the existing support for `llvm.mlir.ifunc`. The function type
for alias calls is derived from the call operands and result types, and
the translation emits a call through the alias global value.
Fixes #147057
Assisted-by: Claude Code
As with operand/result types this only handles standard dialects, but I think it is still useful as is.
We could consider extensibility if/when necessary.
Fixes undefined references in debug shared libs when building MLIR:
-DLLVM_ENABLE_PROJECTS="mlir"
-DCMAKE_BUILD_TYPE=Debug
-DBUILD_SHARED_LIBS=1
Debug build (-O0) disables dead code elimination, resulting in undefined
references in the following shared libs:
MLIROpenMPDialect (needs to link with TargetParser)
MLIRXeVMDialect (needs to link with TargetParser and MLIROpenMPDialect)
MLIRNVVMDialect (needs to link with TargetParser and MLIROpenMPDialect)
Fixes #189206
Assisted-by: Claude Code
From:
d7e60d5250
Utils were added to OpenMP, particularly [[maybe_unused]]
setOffloadModuleInterfaceAttributes() which calls
llvm::Triple::normalize() creating a new dependency on TargetParser.
When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a
module that contains no test dialect operations, the reader type
callback in `runTest0` called
`reader.getDialectVersion<test::TestDialect>()` and then immediately
asserted that it succeeded. However, if the test dialect was never
referenced in the bytecode (because no test dialect types appear in the
module), the dialect's version information is not stored in the
bytecode, so `getDialectVersion` legitimately returns failure.
When the test dialect version is unavailable in the bytecode being read,
the module contains no test dialect types, so no "funky"-group overrides
are needed and the callback can safely skip by returning `success()`.
A regression test is added with a module that has no test dialect ops,
exercising the `test-dialect-version=2.0` path that previously crashed.
Fixes #128321
Fixes #128325
Assisted-by: Claude Code
When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into
the static position of a vector.insert op,
foldDenseElementsAttrDestInsertOp would proceed to call
calculateInsertPosition, which returned -1. The subsequent iterator
arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing
an assertion in DenseElementsAttr::get.
Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any
static position equals kPoisonIndex, consistent with how
InsertChainFullyInitialized already guards this case.
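A simplified Python model of the early bail-out is sketched below. It only handles a scalar insert into a flattened row-major constant; `fold_insert` and its signature are hypothetical, and only the sentinel value -1 comes from the description above.

```python
from typing import List, Optional, Sequence

K_POISON_INDEX = -1  # sentinel value named in the text above


def fold_insert(dest: List[int], shape: Sequence[int], value: int,
                position: Sequence[int]) -> Optional[List[int]]:
    """Insert a scalar into a flattened row-major constant, or decline with None."""
    # Bail out before any offset arithmetic if a position is the poison sentinel.
    if any(p == K_POISON_INDEX for p in position):
        return None
    # Row-major linearization of the static position.
    offset = 0
    for p, dim in zip(position, shape):
        offset = offset * dim + p
    result = list(dest)
    result[offset] = value
    return result
```

Checking the sentinel first means the linear offset is computed only from in-range indices.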
Fixes #188404
Assisted-by: Claude Code
To simplify the output of the reduction-tree pass, this PR introduces
eraseRedundantBlocksInRegion. For regions containing multiple
execution paths, this functionality selects the shortest 'interesting'
path. Additionally, this PR adds the getSuccessorForwardOperands API to
BranchOpInterface. This allows us to extract the ForwardOperands for a
specific path chosen from multiple alternatives, enabling the creation
of a cf.br operation for the redirected jump.
Previously, it generated extra single quote marks around the outer
braces (i.e., `'{'` `6442:\220,1\22` `'}'`). The SPIR-V backend does not
expect that. It expects `{6442:\220,1\22}`.
When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a
non-last position within a struct() assembly format directive, the
printed output is ambiguous: the comma-separated array elements are
indistinguishable from the struct-level commas separating key-value
pairs.
Fix this by wrapping such parameters in square brackets in both the
generated printer and parser. The printer emits '[' before and ']' after
the array value; the parser calls parseLSquare()/parseRSquare() around
the FieldParser call. Parameters with a custom printer or parser are
unaffected (the user controls the format in that case).
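The ambiguity and the bracketing rule can be illustrated with a toy printer. This is a Python sketch under the assumption that a struct prints as comma-separated `key = value` pairs; `print_struct` is a hypothetical helper, not the generated C++ printer.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[int, Sequence[int]]


def print_struct(fields: List[Tuple[str, Value]]) -> str:
    """Print key-value pairs, bracketing array values that are not in last position."""
    parts = []
    for i, (name, value) in enumerate(fields):
        if isinstance(value, (list, tuple)):
            text = ", ".join(str(v) for v in value)
            # Only a non-last array is ambiguous with the struct-level commas.
            if i != len(fields) - 1:
                text = f"[{text}]"
        else:
            text = str(value)
        parts.append(f"{name} = {text}")
    return ", ".join(parts)
```

Without the brackets, `dims = 1, 2, 3, rank = 3` could not be split back into fields unambiguously; with them it prints as `dims = [1, 2, 3], rank = 3`.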
Fixes #156623
Assisted-by: Claude Code
XeVM currently doesn't support native binary generation. This PR enables
Ahead of Time (AOT) compilation of gpu module to native binary using
`ocloc`.
Currently, this only works with LevelZeroRuntimeWrappers.
`loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds,
which sign-extends the constant to `int64_t`. For unsigned `scf.for`
loops with narrow integer types (e.g. i1, i2, i3), this produces wrong
results: a bound such as `1 : i1` has `getSExtValue() == -1` but should
be treated as `1` (unsigned).
Two bugs were introduced by this:
1. **Wrong epilogue detection**: the comparison `upperBoundUnrolledCst <
ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the
sign-extended i1 value 1) evaluated to false, suppressing the epilogue
that should execute the remaining iterations.
2. **Zero step after overflow**: when `tripCountEvenMultiple == 0` (all
iterations go to the epilogue), `stepUnrolledCst = stepCst *
unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A
zero step causes `constantTripCount` to return `nullopt`, preventing the
zero-trip main loop from being elided.
Fix:
- Use zero-extension (`getZExtValue`) instead of sign-extension when
reading bounds for unsigned loops.
- When `tripCountEvenMultiple == 0`, keep the original step for the main
loop to avoid the zero-step issue (the step value is irrelevant for a
zero-trip loop anyway).
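The arithmetic behind both bugs can be reproduced with plain integers. The helpers below are a Python sketch (hypothetical names, not the MLIR API) that interprets a raw bit pattern of a given width as signed or unsigned.

```python
def zext(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of raw as an unsigned integer."""
    return raw & ((1 << bits) - 1)


def sext(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of raw as a two's-complement signed integer."""
    value = raw & ((1 << bits) - 1)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def wrapped_mul(a: int, b: int, bits: int) -> int:
    """Multiply and wrap to the given bit width, as fixed-width arithmetic would."""
    return (a * b) & ((1 << bits) - 1)


# The bound `1 : i1` reads as -1 when sign-extended but 1 when zero-extended.
ub_signed = sext(1, 1)
ub_unsigned = zext(1, 1)
# Epilogue detection: `0 < ub` is false with the signed reading, true with the unsigned one.
epilogue_signed = 0 < ub_signed
epilogue_unsigned = 0 < ub_unsigned
# Step overflow: step 1 times unroll factor 2 wraps to 0 in one bit.
step_unrolled = wrapped_mul(1, 2, 1)
```

With these values, the signed comparison suppresses the epilogue and the unrolled step becomes zero, matching the two failure modes listed above.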
Fixes #163743
Assisted-by: Claude Code
This PR adds scalar type support to the convert_layout op's result and
operand. It also enhances the convert_layout patterns in wg-to-sg,
unrolling, and sg-to-lane distribution.
This supports reduction to scalar, since layout propagation currently
does not allow a scalar to carry any layout. The design choice is to
insert a convert_layout op after the reduction-to-scalar op to record
the layout information permanently across passes.
This revision clears a long-overdue TODO: it supports the lowering when
transfer_read/write ops have a mask by inserting a vector.shape_cast op
for the masked value.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Issue #152743 reports that DefaultValuedProp is printed even when the
property value equals the default, unlike DefaultValuedAttr which is not
printed in that case.
The fix for this was already present in the codebase since commit
8955e285e1ac ("[mlir] Add property combinators, initial ODS support"),
which added elision of default-valued properties in the
genPropDictPrinter function in OpFormatGen.cpp.
This commit adds:
- Documentation in Operations.md clarifying that DefaultValuedProp is
also elided from prop-dict output when the value equals the default,
consistent with the existing documentation for DefaultValuedAttr.
- An explicit test in properties.mlir verifying that DefaultValuedProp
with value equal to default is elided from prop-dict output, and that
DefaultValuedProp with a non-default value is still printed.
Fixes #152743
Assisted-by: Claude Code
Fix two issues in `MatchingSubsets::populateSubsetOpsAtIterArg`:
1. The `collectHoistableOps` parameter was declared but never used when
inserting subset ops via `insert(subsetOp)`. As a result, when recursing
into nested loops with `collectHoistableOps=false`, the nested loop's
subset ops were incorrectly added to the hoistable extraction/insertion
pairs of the parent loop. This caused spurious failures in the
`allDisjoint` check, preventing valid hoisting when nested loop ops
overlapped with outer loop ops. Fix by passing the parameter:
`insert(subsetOp, collectHoistableOps)`.
2. In the nested loop handling branch, there was no guard to detect when
a value has multiple nested loop uses (i.e., is used as an init arg in
more than one nested loop). Without the guard, `nextValue` would be
silently overwritten, leading to an incorrect use-def chain traversal.
Add `if (nextValue) return failure()` before setting `nextValue` for the
nested loop case, mirroring the existing guard for insertion ops.
Fixes #147096
Assisted-by: Claude Code
SwitchOp::getEntrySuccessorRegions and getRegionInvocationBounds called
IntegerAttr::getInt() to retrieve the constant switch argument, but
getInt() asserts that the attribute type must be a signless integer or
index. For unsigned integer types (e.g. ui32), this assertion fired and
crashed the process.
Fix by selecting the appropriate accessor based on the attribute type:
getInt() for signless/index, getSInt() for signed, and getUInt() (cast
to int64_t) for unsigned integer types. Unknown types fall back to the
conservative "all regions possible" path.
The same fix is applied to getRegionInvocationBounds, which had an
identical call to getInt().
Fixes #187973
Assisted-by: Claude Code
The RemoveDeadValuesPass previously emitted an error and skipped
optimization when the IR contained non-function symbol ops, non-call
symbol user ops, or branch ops. This restriction was later removed, but
the comments in RemoveDeadValues.cpp and Passes.td still described the
pass as operating "iff the IR doesn't have any non-function symbol ops,
non-call symbol user ops and branch ops."
Remove the stale restriction text from both the .cpp file comment and
the Passes.td description. Also add a test that verifies dead function
arguments are correctly removed inside a module that defines a symbol
(has a sym_name attribute), which was the original failure case reported
in issue #98700.
Fixes #98700
Assisted-by: Claude Code
Attribute setters and MutableOperandRange::assign() were called without
going through the PatternRewriter. Bypassing the rewriter's change
tracking triggered "operation finger print changed" after the pattern
returned success under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.
Assisted-by: Claude Code
LLVMLoadStoreToOCLPattern::matchAndRewrite was calling op->removeAttr()
and op->setOperand() directly without going through the rewriter API.
This caused MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS to report "expected
pattern to replace the root operation or modify it in place".
Fix: wrap the direct mutations in rewriter.modifyOpInPlace().
Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
Add explicit tests for condition propagation in scf.if then and else
branches, including the void-return case. These tests serve as
regression tests for the bug reported in #159165 where the
SCFIfConditionPropagationPass (since reverted) had a visited-set that
was never populated, causing the pass to not propagate conditions into
nested scf.if statements.
The current ConditionPropagation canonicalization pattern in SCF.cpp
correctly handles both nested ifs and direct condition uses within
branches using the getParentType() ancestor check.
Fixes #159165
Assisted-by: Claude Code
LegalizeLaunchFuncOpPattern previously rejected gpu.launch_func ops with
more than one async dependency. This change removes that limitation by
synchronizing additional dependencies onto the primary stream using
CUDA/HIP events, following the same approach already used in
ConvertWaitAsyncOpToGpuRuntimeCallPattern for gpu.wait async.
For each additional async dependency beyond the first:
- If it is a stream (produced by mgpuStreamCreate), create an event,
record it on that stream, wait for it on the primary stream, then
destroy the event.
- If it is already an event, wait for it directly on the primary stream
and destroy it.
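The per-dependency handling above can be sketched as a function that lists the runtime calls in order. This is an illustrative Python model only: apart from mgpuStreamCreate, which is named above, the wrapper names used in the strings (mgpuEventCreate, mgpuEventRecord, mgpuStreamWaitEvent, mgpuEventDestroy) are assumptions, and `sync_extra_dependencies` is a hypothetical helper.

```python
from typing import List, Tuple


def sync_extra_dependencies(primary_stream: str,
                            extra_deps: List[Tuple[str, str]]) -> List[str]:
    """List the calls that synchronize extra dependencies onto the primary stream.

    Each dependency is a (kind, name) pair where kind is "stream" or "event".
    """
    calls: List[str] = []
    for n, (kind, name) in enumerate(extra_deps):
        if kind == "stream":
            # Create an event, record it on the other stream, wait, destroy.
            event = f"ev{n}"
            calls.append(f"{event} = mgpuEventCreate()")
            calls.append(f"mgpuEventRecord({event}, {name})")
            calls.append(f"mgpuStreamWaitEvent({primary_stream}, {event})")
            calls.append(f"mgpuEventDestroy({event})")
        elif kind == "event":
            # Already an event: wait on it directly, then destroy it.
            calls.append(f"mgpuStreamWaitEvent({primary_stream}, {name})")
            calls.append(f"mgpuEventDestroy({name})")
        else:
            raise ValueError(f"unknown dependency kind: {kind}")
    return calls
```

The first async dependency is not passed in here, since it supplies the primary stream itself.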
Fixes #156984
Assisted-by: Claude Code
Motivated by the need from IREE side to support ArgMax/ArgMin-like
operation using dpp and ballot operation (refer to
https://github.com/iree-org/iree/discussions/23609#discussioncomment-16311655
for more details), this PR adds `gpu.ballot` operation to the MLIR GPU
dialect with ROCDL, NVVM, and SPIR-V lowering support.
Assisted-by: [Claude Code](https://claude.ai/code)
---------
Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
This PR adds `global_prefetch` op to prefetch a cache line to high-level
caches using the aligned address of the source `memref` and an offset
provided by the indices of the element containing the cache line. This
provides temporal hints (e.g., regular or high-priority). Note that
out-of-bounds access is allowed in speculative mode. Ensure the source
`memref` is in address space `1`.
---------
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
The `--int-range-optimizations` pass runs the `DataFlowSolver` once,
then calls `applyPatternsGreedily` with a `DataFlowListener` that erases
solver state when ops are deleted. However, the greedy driver's
`simplifyRegions` step (which calls `runRegionDCE` between pattern
iterations) can remove block arguments without notifying the listener.
This frees the `BlockArgumentImpl` storage, which may be reused by a
subsequent allocation. The solver then finds stale lattice state keyed
at the reused address and incorrectly treats the new block argument as a
known constant, causing a miscompile.
The existing `enableFolding(false)` was added for the same class of bug
(folding can also remove block arguments). This patch extends the fix by
also disabling region simplification, preventing dead-arg elimination
from causing the same address-reuse problem.
Fixes #137281
Fixes #126195
Assisted-by: Claude Code
The scf-for-loop-range-folding pass transforms loops of the form
for (i = lb; i < ub; i += step) { use(i * c) }
into
for (j = lb*c; j < ub*c; j += step*c) { use(j) }
This transformation is only valid when c is strictly positive, since
scf.for requires a positive step. When c is zero or negative, the new
step becomes zero or effectively negative (wrapping in unsigned
arithmetic for index type), producing an incorrect loop.
Add a guard that restricts the MulIOp folding to cases where the
loop-invariant multiplier is a statically known positive integer
constant. Non-constant loop-invariant multipliers are also excluded
since their sign cannot be determined at compile time.
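A small simulation shows why the multiplier must be strictly positive. This is a Python sketch with hypothetical helper names; it uses unbounded signed integers, so it does not model the unsigned wrapping of the index type mentioned above.

```python
def scf_for(lb, ub, step, max_iters=1000):
    """Model `for (i = lb; i < ub; i += step)`; return the visited induction values."""
    values = []
    i = lb
    while i < ub:
        if len(values) >= max_iters:
            raise RuntimeError("loop does not terminate")
        values.append(i)
        i += step
    return values


def original(lb, ub, step, c):
    """Values of i * c seen by the body of the untransformed loop."""
    return [i * c for i in scf_for(lb, ub, step)]


def folded(lb, ub, step, c):
    """Values of j seen by the body after folding the multiplication into the range."""
    return scf_for(lb * c, ub * c, step * c)


def can_fold_multiplier(c):
    """Guard: the multiplier must be a known constant (not None) and strictly positive."""
    return c is not None and c > 0
```

For `c = 3` both versions visit `[0, 3, 6, 9]` over `0..4`, while for `c = 0` or `c = -2` the folded loop runs zero iterations even though the original runs four.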
Fixes #56235
Fixes #116664
Assisted-by: Claude Code
This patch introduces the following operators:
spirv.Tosa.Gather
spirv.Tosa.Scatter
spirv.Tosa.Resize
Also dialect and serialization round-trip tests have been added.
Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
Nested symbol references (e.g. `@module::@func`) inside a `gpu.launch`
body cannot be resolved after the body is outlined into a new
`gpu.module`. Previously, `createKernelModule` used `getLeafReference()`
to look up each symbol use, which silently skipped nested references
when the leaf name could not be found in the parent symbol table. This
left unresolvable cross-module references in the outlined kernel.
This patch detects nested symbol references whose root exists in the
parent symbol table — meaning the reference was valid before outlining
but will become dangling after it — and emits a diagnostic error.
Phantom references whose root does not exist in the parent are left
as-is, preserving existing behavior for unregistered-op attributes
(regression test from #185357).
The existing `@nested_launch` test was inadvertently testing this broken
behavior (silently producing invalid IR with a dangling
`@nested_launch_kernel::@nested_launch_kernel` reference inside the
outlined outer kernel module); it is updated to expect the new error.
Fixes #187942
Assisted-by: Claude Code
Two related bugs in generate-test-checks.py when a top-level operation
carries attribute alias references (e.g. `#map`, `#map1`) in its
signature:
1. The attribute reference substitution (replacing `#map` with
`#[[$ATTR_0]]`) ran *before* the pending attribute definitions were
processed, so the names were not yet available and the references were
left as-is in the output.
2. CHECK-LABEL lines do not support FileCheck variable references (e.g.
`#[[$ATTR_0]]`), so even after substitution the generated check would be
syntactically wrong.
Fix both issues:
- In the CHECK-LABEL branch, re-apply `process_attribute_references` to
the label prefix and SSA-split rest after flushing pending attribute
definitions, so that names are resolved.
- Split the label prefix at attribute reference boundaries; keep only
the text before the first reference in the CHECK-LABEL line and emit the
remainder on a CHECK-SAME line.
Before:
// CHECK-LABEL: func.func @test() attributes {amap = #map, bmap = #map1} {
After:
// CHECK-LABEL: func.func @test() attributes {amap =
// CHECK-SAME: #[[$ATTR_0]], bmap = #[[$ATTR_1]]} {
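The splitting step can be sketched in isolation. The function below is a minimal Python illustration, not the code in generate-test-checks.py; `split_label_line` and the `ATTR_REF` pattern are hypothetical, and the input is assumed to already have its attribute references substituted.

```python
import re
from typing import List

# Matches a substituted attribute reference such as #[[$ATTR_0]].
ATTR_REF = re.compile(r"#\[\[\$ATTR_\d+\]\]")


def split_label_line(line: str) -> List[str]:
    """Keep text before the first attribute reference on CHECK-LABEL, rest on CHECK-SAME."""
    match = ATTR_REF.search(line)
    if match is None:
        # No variable references: the whole line is a valid label.
        return [f"// CHECK-LABEL: {line}"]
    head = line[:match.start()].rstrip()
    tail = line[match.start():]
    return [f"// CHECK-LABEL: {head}", f"// CHECK-SAME: {tail}"]
```

Applied to the substituted signature from the example, this yields the two lines shown under "After".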
Fixes #162310
Assisted-by: Claude Code
collectFreedValues used cast<MemoryEffectOpInterface> unconditionally on
every op encountered during the walk. Ops that do not implement the
interface (e.g. pdl.pattern, pdl.operands, pdl.types inside a
transform.with_pdl_patterns region) trigger the cast assertion, crashing
mlir-opt when -transform-dialect-check-uses is requested.
Change the cast to dyn_cast and skip ops that don't implement the
interface; they cannot free transform values and are safely ignored.
Fixes #120944
Assisted-by: Claude Code