When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a
module that contains no test dialect operations, the reader type
callback in `runTest0` called
`reader.getDialectVersion<test::TestDialect>()` and then immediately
asserted that it succeeded. However, if the test dialect was never
referenced in the bytecode (because no test dialect types appear in the
module), the dialect's version information is not stored in the
bytecode, so `getDialectVersion` legitimately returns failure.
When the test dialect version is unavailable in the bytecode being read,
the module contains no test dialect types, so no "funky"-group overrides
are needed and the callback can safely skip by returning `success()`.
A regression test is added with a module that has no test dialect ops,
exercising the `test-dialect-version=2.0` path that previously crashed.
Fixes#128321Fixes#128325
Assisted-by: Claude Code
When the bytecode type callback (test-kind=2) calls iface->readType()
for every builtin type, complex types like MemRefType could crash
because the generated reading code used get<T>() which asserts on
invalid parameters, rather than getChecked<T>() which returns null
gracefully.
This change:
- Adds a getChecked<T>() free function helper in
BytecodeImplementation.h that calls T::getChecked(emitError, params)
(no-context form) when a specific override exists, otherwise falls back
to get<T>(). The with-context second branch is intentionally omitted to
avoid instantiating StorageUserBase::getChecked<Args> for types that
only inherit the base template (e.g. ArrayAttr), which would require
complete storage types unavailable in the bytecode reading TU.
- Updates BytecodeBase.td default cBuilder for
DialectAttribute/DialectType to use getChecked<> instead of get<>.
- Updates all custom cBuilder strings in BuiltinDialectBytecode.td.
- Updates the no-args codegen case in BytecodeDialectGen.cpp.
- Adds a regression test in bytecode_callback_with_custom_type.mlir.
Fixes#128308
Assisted-by: Claude Code
When using test-kind=2 in the bytecode roundtrip test, integer types
(i32) are replaced by a custom type (TestI32Type) via a type callback.
This exposed two crash scenarios:
1. Reading IntegerAttr with an unsupported type: `getIntegerBitWidth`
returns 0 for unsupported types and emits an error, but
`readAPIntWithKnownWidth` would proceed to call
`reader.readAPIntWithKnownWidth(0)`, creating a zero-width APInt with a
potentially non-zero value. Fix: early-return failure when `bitWidth ==
0`.
2. Reading VectorType with an unsupported element type:
`VectorType::get` asserts that the element type implements
VectorElementTypeInterface. When the element type is replaced by a
custom type that doesn't implement this interface, the program crashes.
Fix: use `VectorType::getChecked` with a diagnostic emitter lambda
instead of `get<VectorType>` in the bytecode builder.
Fixes#128312
`DenseIntOrFPElementsAttr` was recently generalized to accept any type
that implement the `DenseElementType` interface. The name
`DenseIntOrFPElementsAttr` does not make sense anymore. This commit
renames the attribute to `DenseTypedElementsAttr`. An alias is kept for
migration purposes. The alias will be removed after some time.
Previously the arith folder test would emit `dense<255>` (`0xFF` zero
extended). In-memory without bytecode is `0x01`, so this change ensures
in-memory formats match.
Also changes `0xFF` to `~0x00` since compilation on machines with signed
chars was causing issues, this should ensure it is set to all ones
regardless of char interpretation:
```
[1083/5044] Building CXX object tools/mlir/lib/IR/CMakeFiles/obj.MLIRIR.dir/BuiltinDialectBytecode.cpp.o
/.../BuiltinDialectBytecode.cpp:184:35: warning: result of comparison of constant 255 with expression of type 'const char' is always false [-Wtautological-constant-out-of-range-compare]
184 | if (blob.size() == 1 && blob[0] == 0xFF) {
| ~~~~~~~ ^ ~~~~
1 warning generated.
```
Fixesllvm/llvm-project#186178
When parseCustomEntry() calls a user attribute/type callback that
internally reads sub-attributes/types via the bytecode reader, the
reader may add entries to the deferredWorklist if the depth limit is
exceeded. If the callback then returns success with an empty entry
(falling through to the regular dialect reader), the reader position is
reset but deferredWorklist retains stale entries from the failed partial
read.
This causes an assert(deferredWorklist.empty()) failure in debug builds
when the fallback dialect reader successfully parses the attribute.
Fix by saving and restoring deferredWorklist.size() around each callback
invocation, discarding any stale entries added during a callback's
partial read when the reader position is rolled back.
Fixes#163337
Assisted-by: Claude Code
When a bytecode type callback substitutes a type that does not implement
DenseElementTypeInterface (e.g., \!test.i32 replacing i32), the bytecode
reader attempted to reconstruct a DenseIntOrFPElementsAttr with that
type. This unconditionally called getDenseElementBitWidth() which hit an
llvm_unreachable on unsupported types.
Fix this by validating the element type implements
DenseElementTypeInterface in readDenseIntOrFPElementsAttr before
proceeding. If the check fails, a proper diagnostic is emitted and
reading fails gracefully instead of crashing.
Fixes#128317
Extra cost is in serialization layer localized while resulting in
smaller bytecode files, this also keeps the format compatible with what
was previously.
The bytecode reader could enter an infinite loop when parsing deeply
nested attributes containing type references. The deferred worklist
stored only indices without distinguishing between attributes and types,
causing type indexes to be misinterpreted as attributes.
This patch changes the deferred worklist to store pairs of (index, kind)
to track whether each deferred entry is a type or attribute. The
worklist processing logic is updated to resolve the correct entry type.
resource keys have the problem that you can’t parse them from mlir
assembly if they have special or non-printable characters, but nothing
prevents you from specifying such a key when you create e.g. a
DenseResourceElementsAttr, and it works fine in other ways, including
bytecode emission and parsing
this PR solves the parsing by quoting and escaping keys with special or
non-printable characters in mlir assembly, in the same way as symbols,
e.g.:
```
module attributes {
fst = dense_resource<resource_fst> : tensor<2xf16>,
snd = dense_resource<"resource\09snd"> : tensor<2xf16>
} {}
{-#
dialect_resources: {
builtin: {
resource_fst: "0x0200000001000200",
"resource\09snd": "0x0200000008000900"
}
}
#-}
```
by not quoting keys without special or non-printable characters, the
change is effectively backwards compatible
the change is tested by:
1. adding a test with a dense resource handle key with special
characters to `dense-resource-elements-attr.mlir`
2. adding special and unprintable characters to some resource keys in
the existing lit tests `pretty-resources-print.mlir` and
`mlir/test/Bytecode/resources.mlir`
When serializing to bytecode, users can select the option to elide
resources from the bytecode file. This will instruct the bytecode writer
to serialize only the key and resource kind, while skipping
serialization of the data buffer. At parsing, the IR is built in memory
with valid (but empty) resource handlers.
When emitting bytecode, clients can specify a target dialect version to
emit in `BytecodeWriterConfig`. This exposes a target dialect version to
the DialectBytecodeWriter, which can be queried by name and used to
back-deploy attributes, types, and properties.
This renaming started with the native ODS support for properties, this is completing it.
A mass automated textual rename seems safe for most codebases.
Drop also the ods prefix to keep the accessors the same as they were before
this change:
properties.odsOperandSegmentSizes
reverts back to:
properties.operandSegementSizes
The ODS prefix was creating divergence between all the places and make it harder to
be consistent.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D157173
Leaving this field unitialized could led to crashes when it'll diverge from the
IRNumbering phase.
Differential Revision: https://reviews.llvm.org/D156965
[mlir] Add support for custom readProperties/writeProperties methods.
Currently, operations that opt-in to adopt properties will see auto-generated readProperties/writeProperties methods to emit and parse bytecode. If a dialects opts in to use `usePropertiesForAttributes`, those definitions will be generated for the current definition of the op without the possibility to handle attribute versioning.
The patch adds the capability for an operation to define its own read/write methods for the encoding of properties so that versioned operations can handle upgrading properties encodings.
In addition to this, the patch adds an example showing versioning on NamedProperties through the dialect version API exposed by the reader.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D155340
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.
Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, fallback path will execute the default parsers and printers for the dialect.
Testing shows how to leverage this functionality to support back-deployment and backward-compatibility usecases when roundtripping to bytecode a client dialect with type/attributes dependencies on upstream.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D153383
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.
Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, fallback path will execute the default parsers and printers for the dialect.
Testing shows how to leverage this functionality to support back-deployment and backward-compatibility usecases when roundtripping to bytecode a client dialect with type/attributes dependencies on upstream.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D153383
We currently only support lazy loading for regions that
statically implement the IsolatedFromAbove trait, but that
limits the amount of operations that can be lazily loaded. This review
lifts that restriction by computing which operations have isolated
regions when numbering, allowing any operation to be lazily loaded
as long as it doesn't use values defined above.
Differential Revision: https://reviews.llvm.org/D156199
We currently encode each region as a separate section, but
the reader expects all of the regions to be in the same section.
This updates the writer to match the behavior that the reader
expects.
Differential Revision: https://reviews.llvm.org/D156198
The bytecode reader currently assumes all regions can be lazy
loaded, which breaks reading any non-isolated region. This patch
fixes that by properly handling nested non-lazy regions, and only
considers isolated regions as lazy.
Differential Revision: https://reviews.llvm.org/D153795
This makes the bytecode reader/writer work on big-endian platforms.
The only problem was related to encoding of multi-byte integers,
where both reader and writer code make implicit assumptions about
endianness of the host platform.
This fixes the current test failures on s390x, and in addition allows
to remove the UNSUPPORTED markers from all other bytecode-related
test cases - they now also all pass on s390x.
Also adding a GFAIL_SKIP to the MultiModuleWithResource unit test,
as this still fails due to an unrelated endian bug regarding
decoding of external resources.
Differential Revision: https://reviews.llvm.org/D153567
Reviewed By: mehdi_amini, jpienaar, rriddle
Currently desired bytecode version is clamped to the maximum. This allows requesting bytecode versions that do not exist. We have added callsite validation for this in StableHLO to ensure we don't pass an invalid version number, probably better if this is managed upstream. If a user wants to use the current version, then omitting `setDesiredBytecodeVersion` is the best way to do that (as opposed to providing a large number).
Adding this check will also properly error on older version numbers as we increment the minimum supported version. Silently claming on minimum version would likely lead to unintentional forward incompatibilities.
Separately, due to bytecode version being `int64_t` and using methods to read/write uints, we can generate payloads with invalid version numbers:
```
mlir-opt file.mlir --emit-bytecode --emit-bytecode-version=-1 | mlir-opt
<stdin>:0:0: error: bytecode version 18446744073709551615 is newer than the current version 5
```
This is fixed with version bounds checking as well.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D151838
This is adding a new interface (`BytecodeOpInterface`) to allow operations to
opt-in skipping conversion to attribute and serializing properties to native
bytecode.
The scheme relies on a new section where properties are stored in sequence
{ size, serialize_properties }, ...
The operations are storing the index of a properties, a table of offset is
built when loading the properties section the first time.
This is a re-commit of 837d1ce0dc which conflicted with another patch upgrading
the bytecode and the collision wasn't properly resolved before.
Differential Revision: https://reviews.llvm.org/D151065
This reverts commit ca5a12fd69d4acf70c08f797cbffd714dd548348
and follow-up fixes:
df34c288c428eb4b867c8075def48b3d1727d60b
07dc906883af660780cf6d0cc1044f7e74dab83e
ab80ad0095083fda062c23ac90df84c40b4332c8
837d1ce0dc8eec5b17255291b3462e6296cb369b
The first commit was incomplete and broken, I'll prepare a new version
later, in the meantime pull this work out of tree.
This is adding a new interface (`BytecodeOpInterface`) to allow operations to
opt-in skipping conversion to attribute and serializing properties to native
bytecode.
The scheme relies on a new section where properties are stored in sequence
{ size, serialize_properties }, ...
The operations are storing the index of a properties, a table of offset is
built when loading the properties section the first time.
Back-deployment to version prior to 4 are relying on getAttrDictionnary() which
we intend to deprecate and remove: that is putting a de-factor end-of-support
horizon for supporting deployments to version older than 4.
Differential Revision: https://reviews.llvm.org/D151065
For block arg locs a common case is no/uknown location (where the producer
signifies they don't care about blockarg location). Also avoid needing to
dynamically resize opnames during parsing.
Assumed to be post lazy loading change, so chose version 3.
Differential Revision: https://reviews.llvm.org/D151038
At the moment we accept (in tests) unregistered dialects and in particular:
"new_processor_id_and_range"()
where there is no `.` separator. We probably will remove support for this
from the parser, but for now we're adding compatibility support in the
reader.
Differential Revision: https://reviews.llvm.org/D151386
This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-lists orders are loaded in memory and used at the end of parsing to sort the existing use-list chains.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D149755
IsolatedRegions are emitted in sections in order for the reader to be
able to skip over them. A new class is exposed to manage the state and
allow the readers to load these IsolatedRegions on-demand.
Differential Revision: https://reviews.llvm.org/D149515
We were querying the wrong EncReader along some paths that resulted in
failures depending on if one encountered an Attribute from an unloaded
dialect before encountering an operation from that dialect.
Also fix error where we were able to emit "custom" form for an attribute
without custom form in TestDialect.
Differential Revision: https://reviews.llvm.org/D150260
This new features enabled to dedicate custom storage inline within operations.
This storage can be used as an alternative to attributes to store data that is
specific to an operation. Attribute can also be stored inside the properties
storage if desired, but any kind of data can be present as well. This offers
a way to store and mutate data without uniquing in the Context like Attribute.
See the OpPropertiesTest.cpp for an example where a struct with a
std::vector<> is attached to an operation and mutated in-place:
struct TestProperties {
int a = -1;
float b = -1.;
std::vector<int64_t> array = {-33};
};
More complex scheme (including reference-counting) are also possible.
The only constraint to enable storing a C++ object as "properties" on an
operation is to implement three functions:
- convert from the candidate object to an Attribute
- convert from the Attribute to the candidate object
- hash the object
Optional the parsing and printing can also be customized with 2 extra
functions.
A new options is introduced to ODS to allow dialects to specify:
let usePropertiesForAttributes = 1;
When set to true, the inherent attributes for all the ops in this dialect
will be using properties instead of being stored alongside discardable
attributes.
The TestDialect showcases this feature.
Another change is that we introduce new APIs on the Operation class
to access separately the inherent attributes from the discardable ones.
We envision deprecating and removing the `getAttr()`, `getAttrsDictionary()`,
and other similar method which don't make the distinction explicit, leading
to an entirely separate namespace for discardable attributes.
Recommit d572cd1b067f after fixing python bindings build.
Differential Revision: https://reviews.llvm.org/D141742
This new features enabled to dedicate custom storage inline within operations.
This storage can be used as an alternative to attributes to store data that is
specific to an operation. Attribute can also be stored inside the properties
storage if desired, but any kind of data can be present as well. This offers
a way to store and mutate data without uniquing in the Context like Attribute.
See the OpPropertiesTest.cpp for an example where a struct with a
std::vector<> is attached to an operation and mutated in-place:
struct TestProperties {
int a = -1;
float b = -1.;
std::vector<int64_t> array = {-33};
};
More complex scheme (including reference-counting) are also possible.
The only constraint to enable storing a C++ object as "properties" on an
operation is to implement three functions:
- convert from the candidate object to an Attribute
- convert from the Attribute to the candidate object
- hash the object
Optional the parsing and printing can also be customized with 2 extra
functions.
A new options is introduced to ODS to allow dialects to specify:
let usePropertiesForAttributes = 1;
When set to true, the inherent attributes for all the ops in this dialect
will be using properties instead of being stored alongside discardable
attributes.
The TestDialect showcases this feature.
Another change is that we introduce new APIs on the Operation class
to access separately the inherent attributes from the discardable ones.
We envision deprecating and removing the `getAttr()`, `getAttrsDictionary()`,
and other similar method which don't make the distinction explicit, leading
to an entirely separate namespace for discardable attributes.
Differential Revision: https://reviews.llvm.org/D141742
Add method to set a desired bytecode file format to generate. Change
write method to be able to return status including the minimum bytecode
version needed by reader. This enables generating an older version of
the bytecode (not dialect ops, attributes or types). But this does not
guarantee that an older version can always be generated, e.g., if a
dialect uses a new encoding only available at later bytecode version.
This clamps setting to at most current version.
Differential Revision: https://reviews.llvm.org/D146555
A dialect can opt-in to handle versioning through the
`BytecodeDialectInterface`. Few hooks are exposed to the dialect to allow
managing a version encoded into the bytecode file. The version is loaded
lazily and allows to retrieve the version information while parsing the input
IR, and gives an opportunity to each dialect for which a version is present
to perform IR upgrades post-parsing through the `upgradeFromVersion` method.
Custom Attribute and Type encodings can also be upgraded according to the
dialect version using readAttribute and readType methods.
There is no restriction on what kind of information a dialect is allowed to
encode to model its versioning. Currently, versioning is supported only for
bytecode formats.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D143647
Resources are encoded in two separate sections similarly to
attributes/types, one for the actual data and one for the data
offsets. Unlike other sections, the resource sections are optional
given that in many cases they won't be present. For testing,
bytecode serialization is added for DenseResourceElementsAttr.
Differential Revision: https://reviews.llvm.org/D132729
Some tests still pass even though we don't claim big-endian support. Using
UNSUPPORTED is a better indicator than XFAIL that we don't guarantee that
the tests work.
This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).
The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.
https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518
Differential Revision: https://reviews.llvm.org/D131747