Following up on #122188, this PR adds support for poison indices to
`ExtractOp` and `InsertOp`. It also includes canonicalization patterns
to turn extract/insert ops with poison indices into `ub.poison`.
Model the `IndexType` as `uint64_t` when converting to a python integer.
With the python bindings,
```python
DenseIntElementsAttr(op.attributes["attr"])
```
used to `assert` when `attr` had `index` type like `dense<[1, 2, 3, 4]>
: vector<4xindex>`.
---------
Co-authored-by: Christopher McGirr <christopher.mcgirr@amd.com>
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
The TOSA-v1.0 specification makes the shift attribute of the MUL
(Hammard product) operator an input. Move the `shift` parameter of the
MUL operator in the MILR TOSA dialect from an attribute to an input and
update any lit tests appropriately.
Expand the verifier of the `tosa::MulOp` operation to check the various
constraints defined in the TOSA-v1.0 specification. Specifically, ensure
that all input operands (excluding the optional shift) are of the same
rank. This means that broadcasting tests which previously checked rank-0
tensors would be broadcast are no longer valid and are removed.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
On lowering from `memref` to LLVM, `malloc` and other intrinsic
functions from `libc` will be declared in the current module. User's
redefinition of these reserved functions will poison the internal
analysis with wrong prototype. This patch adds assertion on the found
function's type and reports if it mismatch with the intended type.
Related to #120950
---------
Co-authored-by: Luohao Wang <Luohaothu@users.noreply.github.com>
Detailed writeup is in https://github.com/google/heir/issues/1153. See
also https://github.com/llvm/llvm-project/pull/120881. In short,
`propagateIfChanged` is used outside of the `DataFlowAnalysis` scope,
because it is public, but it does not propagate as expected as the
`DataFlowSolver` has stopped running.
To solve such misuse, `propagateIfChanged` should be made
protected/private.
For downstream users affected by this, to correctly propagate the
change, the Analysis should be re-run (check #120881) instead of just a
`propagateIfChanged`
The change to `IntegerRangeAnalysis` is just a expansion of the
`solver->propagateIfChanged`. The `Lattice` has already been updated by
the `join`. Propagation is done by `onUpdate`.
Cc @Mogball for review
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.
This PR acts as the first part of it
* Add `OpAsmTypeInterface` and `getAsmName` API for deducing ASM name
from type
* Add default impl in `OpAsmOpInterface` to respect this API when
available.
The `OpAsmAttrInterface` / hooking into Alias system part should be
another PR, using a `getAlias` API.
### Discussion
* Instead of using `StringRef getAsmName()` as the API, I use `void
getAsmName(OpAsmSetNameFn)`, as returning StringRef might be unsafe
(std::string constructed inside then returned a _ref_; and this aligns
with the design of `getAsmResultNames`.
* On the result packing of an op, the current approach is that when not
all of the result types are `OpAsmTypeInterface`, then do nothing (old
default impl)
### Review
Cc @j2kun and @Alexanderviand-intel for downstream; Cc @River707 and
@joker-eph for relevent commit history; Cc @ftynse for discourse.
The recipe operations should have AutomaticAllocationScope so recipes can
be converted using operators that require parent ops to have
AutomaticAllocationScope
The op carries the output-shape directly. This can be used directly.
Also adds a method to get the shape as a `SmallVector<OpFoldResult>`.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
# Overview
This PR adds 2:4 structured sparsity (sparse A, dense B) matrix multiply
instructions to ROCDL.
# Testing
I've added tests to Dialect/mlir and Target/mlir
This was discussed during the original review but I made it stricter
than discussed. Making it a pure view but adding a helper for bytecode
serialization (I could avoid the helper, but it ends up with more logic
and stronger coupling).
The convertFloorOp pattern incurs precision loss when floating-point
numbers exceed the representable range of int64. This pattern should be
removed.
Fixes https://github.com/llvm/llvm-project/issues/119836
This PR adds mfma.scale.f32.32x32x64.f8f6f4 and
mfma.scale.f32.16x16x128.f8f6f4 to the ROCDL dialect. They are converted
to the corresponding intrinsics in the mlir-to-llvmir pass.
Intrinsics are available for the 'cpSize'
variants also. So, this patch migrates the Op
to lower to the intrinsics for all cases.
* Update the existing tests to check the lowering to intrinsics.
* Add newer cp_async_zfill tests to verify the lowering for the 'cpSize'
variants.
* Tidy-up CHECK lines in cp_async() function in nvvmir.mlir (NFC)
PTX spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
The TOSA-V1.0 specification adds "nan propagation" modes as attributes
for several operators. Adjust the ODS definitions of the relevant
operations to include this attribute.
The defined modes are "PROPAGATE" and "IGNORE" and the PROPAGATE mode is
set by default.
MAXIMUM, MINIMUM, REDUCE_MAX, REDUCE_MIN, MAX_POOL, CLAMP, and ARGMAX
support this attribute.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
This patch changes PadOp's padding input to type !tosa.shape<2 * rank>,
(where rank is the rank of the PadOp's input), instead of a <rank x 2>
tensor.
This patch is also a part of TOSA v1.0 effort:
https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708
This patch updates the PadOp to match all against the TOSA v1.0 form.
Original Authors include:
@Tai78641
@wonjeon
Co-authored-by: Tai Ly <tai.ly@arm.com>
Scan directive allows to specify scan reductions within an worksharing
loop, worksharing loop simd or simd directive which should have an
`InScan` modifier associated with it. This change adds the mlir support
for the same.
Related PR: [Parsing and Semantic Support for
scan](https://github.com/llvm/llvm-project/pull/102792)
This is hopefully the first patch in the series of patches adding some
missing SPIR-V ops to MLIR over the next weeks/months, starting with
something simple: `OpEmitVertex` and `OpEndPrimitive`. Since the ops
have no input and outputs, and the only condition is "This instruction
must only be used when only one stream is present.", which I don't think
can be validate at the instruction level in isolation, I set
`hasVerifier` to 0. I hope I didn't miss anything, but I'm more than
happy to address any comments.
…the distributed IR case.
This patch allows `nd_load` and `nd_store` to preserve the tensor
descriptor shape during distribution to SIMT. The validation now expects
the distributed instruction to retain the `sg_map` attribute and uses it
to verify the consistency.
[note: this is blocked by:
https://github.com/tensorflow/tensorflow/pull/73891 otherwise tensorflow
may have lit test failures]
This patch adds SameOperandsAndResultRank trait to TOSA operators with
ResultsBroadcastableShape trait. SameOperandsAndResultRank trait
requiring that all operands and results have matching ranks unless the
operand/result is unranked.
This also renders the TosaMakeBroadcastable pass unnecessary - but this
pass is left in for now just in case it is still used in some flows. The
lit test, broadcast.mlir, is removed.
This also adds verify of the SameOperandsAndResultRank trait in the
TosaInferShapes pass to validate inferred shapes.
Signed-off-by: Tai Ly <tai.ly@arm.com>
This flag enables the configuration of some transformation such as the
lowering of contractions and transposes. The default configuration
preserves the existing behavior.
Add ROCDL support to the following instructions:
V_MFMA_F32_16X16X32_BF16
V_MFMA_I32_16X16X64_I8
V_MFMA_F32_16X16X32_F16
V_MFMA_F32_32X32X16_BF16
V_MFMA_I32_32X32X32_I8
V_MFMA_F32_32X32X16_F16
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Jungwook Park <jungwook.park@amd.com>
This PR adds missing ds\.read.tr4\.b64, ds\.read\.tr8\.b64,
ds\.read\.tr6\.b96,
ds\.read\.tr16\.b64 and global\.load\.lds ops to
the ROCDL dialect.
The ops are converted to the corresponding intrinsic calls during the
translation from MLIR to LLVM IRs.
---------
Co-authored-by: Ognjen Plavsic <plognjen@amd.com>
This follows up on 733be4ed7dcf976719f424c0cb81b77a14f91f5a, which made
mlir-vulkan-runner and its associated passes redundant, and completes
the main goal of #73457. The mlir-vulkan-runner tests become part of the
integration test suite, and the Vulkan runner runtime components become
part of ExecutionEngine, just as was done when removing other
target-specific runners.
This commit is a follow-up to 99a562b3cb17e89273ba0fe77129f2fb17a19381,
which migrated some of the mlir-vulkan-runner tests to mlir-cpu-runner
using a new pipeline and set of wrappers. That commit could not migrate
all the tests, because the existing calling conventions/ABIs for kernel
arguments generated by GPUToLLVMConversionPass were not a good fit for
the Vulkan runtime. This commit fixes this and migrates the remaining
tests. With this commit, mlir-vulkan-runner and many related components
are now unused, and they will be removed in a later commit (see #73457).
The old calling conventions require both the caller (host LLVM code) and
callee (device code) to have compile-time knowledge of the precise
argument types. This works for CUDA, ROCm and SYCL, where there is a
C-like calling convention agreed between the host and device code, and
the runtime passes through arguments as raw data without comprehension.
For Vulkan, however, the interface declared by the shader/kernel is in a
more abstract form, so the device code has indirect access to the
argument data, and the runtime must process the arguments to set up and
bind appropriately-sized buffer descriptors.
This commit introduces a new calling convention option to meet the
Vulkan runtime's needs. It lowers memref arguments to {void*, size_t}
pairs, which can be trivially interpreted by the runtime without it
needing to know the original argument types. Unlike the stopgap measure
in the previous commit, this system can support memrefs of various ranks
and element types, which unblocked migrating the remaining tests.
PR #122344 adds intrinsics for Bulk Async Copy
(non-tensor variants) using TMA. This patch
adds the corresponding NVVM Dialect Ops.
lit tests are added to verify the lowering to all
variants of the intrinsics.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This PR allows to lower **unsigned** `tosa.max_pool2d` to linalg.
```
// CHECK-LABEL: @max_pool_ui8
func.func @max_pool_ui8(%arg0: tensor<1x6x34x62xui8>) -> tensor<1x4x32x62xui8> {
// CHECK: builtin.unrealized_conversion_cast {{.*}} : tensor<1x6x34x62xui8> to tensor<1x6x34x62xi8>
// CHECK: arith.constant 0
// CHECK: linalg.pooling_nhwc_max_unsigned {{.*}} : (tensor<1x4x32x62xi8>) -> tensor<1x4x32x62xi8>
// CHECK: builtin.unrealized_conversion_cast {{.*}} : tensor<1x4x32x62xi8> to tensor<1x4x32x62xui8>
%0 = tosa.max_pool2d %arg0 {pad = array<i64: 0, 0, 0, 0>, kernel = array<i64: 3, 3>, stride = array<i64: 1, 1>} : (tensor<1x6x34x62xui8>) -> tensor<1x4x32x62xui8>
return %0 : tensor<1x4x32x62xui8>
}
```
It does this by
- converting the MaxPool2dConverter from OpRewriterPattern to
OpConversion Pattern
- adjusting the padding value to the the minimum unsigned value when the
max_pool is unsigned
- lowering to `linalg.pooling_nhwc_max_unsigned` (which uses
`arith.maxui`) when the max_pool is unsigned
This commit introduces a new MathToEmitC conversion pass that lowers
selected math operations from the Math dialect to the emitc.call_opaque
operation in the EmitC dialect.
**Supported Math Operations:**
The following operations are converted:
- math.floor -> emitc.call_opaque<"floor">
- math.round -> emitc.call_opaque<"round">
- math.exp -> emitc.call_opaque<"exp">
- math.cos -> emitc.call_opaque<"cos">
- math.sin -> emitc.call_opaque<"sin">
- math.acos -> emitc.call_opaque<"acos">
- math.asin -> emitc.call_opaque<"asin">
- math.atan2 -> emitc.call_opaque<"atan2">
- math.ceil -> emitc.call_opaque<"ceil">
- math.absf -> emitc.call_opaque<"fabs">
- math.powf -> emitc.call_opaque<"pow">
**Target Language Standards:**
The pass supports targeting different language standards:
- C99: Generates calls with suffixes (e.g., floorf, fabsf) for
single-precision floats.
- CPP11: Prepends std:: to functions (e.g., std::floor, std::fabs).
**Design Decisions:**
The pass uses emitc.call_opaque instead of emitc.call to better emulate
C-style function overloading.
emitc.call_opaque does not require a unique type signature, making it
more suitable for operations like <math.h> functions that may be
overloaded for different types.
This design choice ensures compatibility with C/C++ conventions.
In order to meaningfully generate getters and setters from IRDL, it
makes sense to embed the names of operands, results, etc. in the IR
definition. This PR introduces this feature. Names are constrained
similarly to TableGen names.
Remove builder API (e.g., `b.getFloat4E2M1FNType()`) and caching in
`MLIRContext` for low-precision FP types. Types are still cached in the
type uniquer.
For details, see:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361/28
Note for LLVM integration: Use `b.getType<Float4E2M1FNType>()` or
`Float4E2M1FNType::get(b.getContext())` instead of
`b.getFloat4E2M1FNType()`.
PR #121507 added 'cvt' intrinsics to convert
float to tf32, with the valid set of rounding and
saturation modes. This PR adds an NVVM Dialect Op
for the same.
* lit tests are added to verify the lowering to intrinsics.
* Negative tests are also added to check the error-handling of invalid
combinations.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This commit adds new wrappers around the MLIR Vulkan runtime which
implement the mgpu* APIs (as generated by GPUToLLVMConversionPass), adds
an optional LLVM lowering to the Vulkan runner mlir-opt pipeline based
on GPUToLLVMConversionPass, and migrates several of the
mlir-vulkan-runner tests to use mlir-cpu-runner instead, together with
the new pipeline and wrappers.
This is a further incremental step towards eliminating
mlir-vulkan-runner and its associated pipeline, passes and wrappers
(#73457). This commit does not migrate all of the tests to the new
system, because changes to the mgpuLaunchKernel ABI will be necessary to
support the tests that use multi-dimensional memref arguments.