llvm-project

Author	SHA1	Message	Date
Sandeep Dasgupta	81d7eef134	Sub-channel quantized type implementation (#120172) This is an implementation for [RFC: Supporting Sub-Channel Quantization in MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694). In order to make the review process easier, the PR has been divided into the following commit labels: 1. Add implementation for sub-channel type: Includes the class design for `UniformQuantizedSubChannelType`, printer/parser and bytecode read/write support. The existing types (per-tensor and per-axis) are unaltered. 2. Add implementation for sub-channel type: Lowering of `quant.qcast` and `quant.dcast` operations to Linalg operations. 3. Adding C/Python Apis: We first define the C-APIs and build the Python-APIs on top of those. 4. Add pass to normalize generic ....: This pass normalizes sub-channel quantized types to per-tensor or per-axis types, if possible. A design note: - Explicitly storing the `quantized_dimensions`, even when they can be derived for ranked tensors. While it's possible to infer quantized dimensions from the static shape of the scales (or zero-points) tensor for ranked data tensors ([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3) for background), there are cases where this can lead to ambiguity and issues with round-tripping. Consider the example: `tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{s00:z00, s01:z01}}>>` The shape of the scales tensor is [1, 2], which might suggest that only axis 1 is quantized. While this inference is technically correct, as the block size for axis 0 is a degenerate case (equal to the dimension size), it can cause problems with round-tripping. Therefore, even for ranked tensors, we are explicitly storing the quantized dimensions. Suggestions welcome! PS: I understand that the upcoming holidays may impact your schedule, so please take your time with the review. There's no rush.	2025-03-23 07:37:55 -05:00
Ulrich Weigand	bb0bbed610	Fix bytecode reader/writer on big-endian platforms This makes the bytecode reader/writer work on big-endian platforms. The only problem was related to encoding of multi-byte integers, where both reader and writer code make implicit assumptions about endianness of the host platform. This fixes the current test failures on s390x, and in addition allows removing the UNSUPPORTED markers from all other bytecode-related test cases - they now also all pass on s390x. Also adding a GTEST_SKIP to the MultiModuleWithResource unit test, as this still fails due to an unrelated endian bug regarding decoding of external resources. Differential Revision: https://reviews.llvm.org/D153567 Reviewed By: mehdi_amini, jpienaar, rriddle	2023-06-23 09:22:55 +02:00
Paul Robinson	f79d941575	[MLIR/S390x] Convert tests to check 'target=...' Part of the project to eliminate special handling for triples in lit expressions.	2022-12-09 07:28:36 -08:00
Jacques Pienaar	7732c97f52	[mlir][quant] Initial bytecode encoding for quantized types Add bytecode encoding for quantized types. The encodings mostly follow the types' storage representation. Differential Revision: https://reviews.llvm.org/D136004	2022-10-17 16:28:46 -07:00
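
The design note in 81d7eef134 can be illustrated with a small sketch. It is not code from the commit: `scales_shape` is a hypothetical helper, and it assumes that each dimension is evenly divisible by its block size and that an axis without an explicit block size is treated as one block spanning the whole dimension. Under those assumptions, a 2x4 tensor with block size 2 on both axes and a 2x4 tensor quantized only along axis 1 produce the same scales shape, so the shape alone cannot recover which axes were listed.

```python
def scales_shape(data_shape, block_sizes):
    """Return the shape of the scales (or zero-points) tensor.

    data_shape:  static shape of the data tensor, e.g. [2, 4].
    block_sizes: dict mapping a quantized axis to its block size.
                 Assumption: an absent axis is one block covering the
                 whole dimension.
    """
    shape = []
    for axis, dim in enumerate(data_shape):
        block = block_sizes.get(axis, dim)
        if block <= 0 or dim % block != 0:
            raise ValueError(f"axis {axis}: block size {block} does not divide {dim}")
        shape.append(dim // block)
    return shape


# Both axes listed explicitly; axis 0 is the degenerate case (block == dim).
explicit = scales_shape([2, 4], {0: 2, 1: 2})
# Only axis 1 listed.
axis1_only = scales_shape([2, 4], {1: 2})

print(explicit, axis1_only)  # [1, 2] [1, 2]
```

Because both calls return `[1, 2]`, a printer that inferred the quantized axes from the scales shape would drop axis 0, and the type would not round-trip to its original textual form.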

4 Commits
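
The endianness problem described in bb0bbed610 comes down to a general point: a multi-byte integer copied to or from a buffer in host byte order changes its on-disk layout with the platform. The sketch below is not the MLIR bytecode implementation; `write_u64_le` and `read_u64_le` are hypothetical helpers that assume a little-endian on-disk order and fix that order explicitly, so the same bytes are produced and decoded on any host.

```python
import struct


def write_u64_le(value):
    """Encode an unsigned 64-bit integer with an explicit little-endian order."""
    return value.to_bytes(8, "little")


def read_u64_le(data):
    """Decode the first 8 bytes of data as a little-endian unsigned integer."""
    return int.from_bytes(data[:8], "little")


value = 0x0102030405060708
encoded = write_u64_le(value)

# A big-endian host copying the integer in native order would emit these
# bytes instead, which a little-endian reader would misinterpret.
big_endian_native = struct.pack(">Q", value)

print(encoded.hex())            # 0807060504030201
print(big_endian_native.hex())  # 0102030405060708
print(read_u64_le(encoded) == value)  # True
```

Pinning the byte order in both the writer and the reader is what makes the encoding independent of the host.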