11109 Commits

Author SHA1 Message Date
Srinivasa Ravi
ab9e447fb1
[MLIR][NVVM] Add support for mapa MLIR Ops (#124514)
Adds `mapa` and `mapa.shared.cluster` MLIR Ops to generate mapa
instructions.

`mapa` - Map the address of the shared variable in the target CTA.

- `mapa` - source is a register containing generic address pointing to
shared memory.
- `mapa.shared.cluster` - source is a shared memory variable or a
register containing a valid shared memory address.

PTX Spec Reference:

https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-mapa
2025-01-30 11:05:12 +05:30
Krzysztof Drewniak
cdc09a118a
[mlir][IntRangeInference] Infer values for {memref,tensor}.dim (#122945)
Implement the integer range inference niterface for memref.dim and
tetnor.dim using shared code. The inference will infer the `dim` of
dynamic dimensions to [0, index_max] and take the union of all the
dimensions that the `dim` argument could be validly referring to.
2025-01-29 18:43:53 -06:00
Jerry-Ge
956c0707d9
[mlir][tosa] Change the start and size of slice to tosa shape type (#124209)
Update to use getConstShapeValue to collect shape information along the
graph.

Change-Id: Ic6fc2341e3bcfbec06a1d08986e26dd08573bd9c

Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
2025-01-29 13:43:35 -08:00
Jay Foad
aa2952165c
Fix typo "tranpose" (#124929) 2025-01-29 17:49:54 +00:00
Rolf Morel
0d4efa2725
[MLIR][Linalg] Introduce linalg.contract (#123618)
A new op that allows for representing arbitrary contractions on operands
of arbitrary rank, with arbitrary transposes and arbitrary broadcasts
specified through its indexing_maps attribute.

Supports the expected lowerings to linalg.generic and to
vector.contract.

Corresponding RFC is here:
https://discourse.llvm.org/t/mlir-rfc-introduce-linalg-contract/83589
2025-01-29 17:28:52 +00:00
Matthias Springer
6900768719
[mlir][Conversion] Fix typos in MemRef descriptor comments (#124923) 2025-01-29 17:13:14 +01:00
Igor Wodiany
a01097faca
[mlir][spirv] Add definition for VectorTimesMatrixOp (#124571)
Adding op as defined in section 3.52.13. (Arithmetic Instructions) of
the SPIR-V specification.
2025-01-29 10:16:38 -05:00
Andrea Faulds
25ae1a266d [mlir][spirv] Make ConvertToSPIRVPass into a test pass (non-public)
With the removal of mlir-vulkan-runner (as part of #73457) in
e7e3c45bc70904e24e2b3221ac8521e67eb84668, this pass no longer has to be
public (previously it had to be so the runner could use it). This commit
makes it instead only available for use by mlir-opt.

This is a recommit of 058d183980a2f334d085a46c32abded0557aa789 (#124301)
which had been reverted in 4573c857da88b3210d497d9a88a89351a74b5964 due
to a missing linker dependency on MLIRSPIRVTransforms in
mlir/test/lib/Pass/CMakeLists.txt (fixed in this commit).
2025-01-29 15:51:42 +01:00
Andrea Faulds
4573c857da Revert "[mlir][spirv] Make ConvertToSPIRVPass into a test pass (non-public) (#124301)"
This reverts commit 058d183980a2f334d085a46c32abded0557aa789 due to
build failures (missing symbols when linking).
2025-01-29 15:06:32 +01:00
Andrea Faulds
058d183980
[mlir][spirv] Make ConvertToSPIRVPass into a test pass (non-public) (#124301)
With the removal of mlir-vulkan-runner (as part of #73457) in
e7e3c45bc70904e24e2b3221ac8521e67eb84668, this pass no longer has to be
public (previously it had to be so the runner could use it). This commit
makes it instead only available for use by mlir-opt.
2025-01-29 14:36:55 +01:00
Ingo Müller
8d6b24167b
[mlir] Make TypedStrAttr actually enforce the string type. (#124770)
The tablgen definition `TypedStrAttr` is an attribute constraints that
is meant to restrict the type of a `StringAttr` to the type given as
parameter. However, the definition did not previously restrict the type;
any `StringAttr` was accepted. This PR makes the definition actually
enforce the type.

To test the constraints, the PR also changes the test op that was
previously used to test this constraint such that the enforced type is
`AnyInteger` instead of `AnyType`. The latter allowed any type, so not
enforcing that constraint had no observable effect. The PR then adds a
test case with a wrong type and ensures that diagnostics are produced.

Signed-off-by: Ingo Müller <ingomueller@google.com>
2025-01-29 13:32:36 +01:00
Adam Siemieniuk
87782b216f
[mlir][x86vector] AVX512-BF16 Dot op (#124800)
Adds AVX512 bf16 dot-product operation and defines lowering to LLVM
intrinsics.

AVX512 intrinsic operation definition is extended with an optional
extension field that allows specifying necessary LLVM mnemonic suffix
e.g., `"bf16"` for `x86_avx512bf16_` intrinsics.
2025-01-29 13:07:41 +01:00
Dmitriy Smirnov
f20b8e35b3
[MLIR][Linalg] Fixes for Winograd decomposition and for tiling (#123675)
The PR addresses issues with the filters of 1 x r and of r x 1 and with
the tiling.

---------

Signed-off-by: Dmitriy Smirnov <dmitriy.smirnov@arm.com>
2025-01-29 10:38:29 +00:00
Henrich Lauko
2a1f79582f
[MLIR] Fix import of invokes with mismatched variadic types (#124828)
This resolves the same issue addressed in
https://github.com/llvm/llvm-project/pull/124286, but for invoke
operations. The issue arose from duplicated logic for both imports. This
PR also refactors the common import code for call and invoke
instructions to mitigate issues in the future.
2025-01-29 10:43:00 +01:00
Matthias Gehre
5d3ae51612
Reapply "[mlir][python] allow DenseIntElementsAttr for index type (#118947)" (#124804)
This reapplies #118947 and adapts to nanobind.
2025-01-29 09:14:37 +01:00
Matthias Gehre
2ec27848c0
[MLIR] normalize-memrefs: Normalize memref.alloca (#123293)
The pass was only handling `memref.alloc` and this extends it to also
handle `memref.alloca`.
2025-01-29 08:34:33 +01:00
Srinivasa Ravi
d4159e2a1d
[MLIR][NVVM] Add support for griddepcontrol Ops (#124603)
Adds `griddepcontrol.wait` and `griddepcontrol.launch.dependents`
MLIR Ops to generate griddepcontrol instructions.

`griddepcontrol` - Allows dependent and prerequisite grids as defined by
the runtime to control execution in the following ways:

- `griddepcontrol.wait` - causes the executing thread to wait until all
  prerequisite grids in flight have completed and all the memory
  operations from the prerequisite grids are performed and made visible
  to the current grid.

- `griddepcontrol.launch.dependents` - signals that specific dependents
  the runtime system designated to react to this instruction can be
  scheduled as soon as all other CTAs in the grid issue the same
  instruction or have completed.

PTX Spec Reference:

https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol
2025-01-29 10:57:51 +05:30
Diego Caballero
35df525fd0
[mlir][Vector] Add support for poison indices to Extract/IndexOp (#123488)
Following up on #122188, this PR adds support for poison indices to
`ExtractOp` and `InsertOp`. It also includes canonicalization patterns
to turn extract/insert ops with poison indices into `ub.poison`.
2025-01-28 13:51:50 -08:00
Matthias Gehre
1b729c3d70 Revert "[mlir][python] allow DenseIntElementsAttr for index type (#118947)"
This reverts commit 9dd762e8b10586e749b0ddf3542e5dccf8392395.
2025-01-28 18:35:50 +01:00
Matthias Gehre
9dd762e8b1
[mlir][python] allow DenseIntElementsAttr for index type (#118947)
Model the `IndexType` as `uint64_t` when converting to a python integer. 

With the python bindings, 
```python
DenseIntElementsAttr(op.attributes["attr"])
```
used to `assert` when `attr` had `index` type like `dense<[1, 2, 3, 4]>
: vector<4xindex>`.

---------

Co-authored-by: Christopher McGirr <christopher.mcgirr@amd.com>
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
2025-01-28 18:31:58 +01:00
Jack Frankland
a58e774fba
[mlir][tosa] Make TOSA MUL's Shift an Input (#121953)
The TOSA-v1.0 specification makes the shift attribute of the MUL
(Hammard product) operator an input. Move the `shift` parameter of the
MUL operator in the MILR TOSA dialect from an attribute to an input and
update any lit tests appropriately.

Expand the verifier of the `tosa::MulOp` operation to check the various
constraints defined in the TOSA-v1.0 specification. Specifically, ensure
that all input operands (excluding the optional shift) are of the same
rank. This means that broadcasting tests which previously checked rank-0
tensors would be broadcast are no longer valid and are removed.

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
2025-01-28 16:25:22 +00:00
Luohao Wang
e84f6b6a88
[mlir] Fix conflict of user defined reserved functions with internal prototypes (#123378)
On lowering from `memref` to LLVM, `malloc` and other intrinsic
functions from `libc` will be declared in the current module. User's
redefinition of these reserved functions will poison the internal
analysis with wrong prototype. This patch adds assertion on the found
function's type and reports if it mismatch with the intended type.

Related to #120950


---------

Co-authored-by: Luohao Wang <Luohaothu@users.noreply.github.com>
2025-01-28 14:40:47 +01:00
Hongren Zheng
3a439e2caf
[mlir][dataflow] disallow outside use of propagateIfChanged for DataFlowSolver (#120885)
Detailed writeup is in https://github.com/google/heir/issues/1153. See
also https://github.com/llvm/llvm-project/pull/120881. In short,
`propagateIfChanged` is used outside of the `DataFlowAnalysis` scope,
because it is public, but it does not propagate as expected as the
`DataFlowSolver` has stopped running.

To solve such misuse, `propagateIfChanged` should be made
protected/private.

For downstream users affected by this, to correctly propagate the
change, the Analysis should be re-run (check #120881) instead of just a
`propagateIfChanged`

The change to `IntegerRangeAnalysis` is just a expansion of the
`solver->propagateIfChanged`. The `Lattice` has already been updated by
the `join`. Propagation is done by `onUpdate`.

Cc @Mogball for review
2025-01-28 13:32:28 +08:00
Hongren Zheng
3c64f86314
[mlir] Add OpAsmTypeInterface for pretty-print (#121187)
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.

This PR acts as the first part of it
* Add `OpAsmTypeInterface` and `getAsmName` API for deducing ASM name
from type
* Add default impl in `OpAsmOpInterface` to respect this API when
available.

The `OpAsmAttrInterface` / hooking into Alias system part should be
another PR, using a `getAlias` API.

### Discussion

* Instead of using `StringRef getAsmName()` as the API, I use `void
getAsmName(OpAsmSetNameFn)`, as returning StringRef might be unsafe
(std::string constructed inside then returned a _ref_; and this aligns
with the design of `getAsmResultNames`.
* On the result packing of an op, the current approach is that when not
all of the result types are `OpAsmTypeInterface`, then do nothing (old
default impl)

### Review 

Cc @j2kun and @Alexanderviand-intel for downstream; Cc @River707 and
@joker-eph for relevent commit history; Cc @ftynse for discourse.
2025-01-28 13:31:41 +08:00
Scott Manley
e492083f55
[OpenACC] Add AutomaticAllocationScope to recipe ops (#124337)
The recipe operations should have AutomaticAllocationScope so recipes can
be converted using operators that require parent ops to have
AutomaticAllocationScope
2025-01-27 07:47:45 -08:00
MaheshRavishankar
092372da15
[mlir][Tensor] Rework ReifyRankedShapedTypeInterface implementation for tensor.expand_shape op. (#113501)
The op carries the output-shape directly. This can be used directly.
Also adds a method to get the shape as a `SmallVector<OpFoldResult>`.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-01-27 07:05:34 -08:00
Samuel Ginzburg
43a50deb63
[MLIR][ROCDL] Add GFX940 SMFMAC (2:4 sparsity) instructions to the ROCDL dialect (#124435)
# Overview

This PR adds 2:4 structured sparsity (sparse A, dense B) matrix multiply
instructions to ROCDL.

# Testing

I've added tests to Dialect/mlir and Target/mlir
2025-01-27 11:58:26 +01:00
Longsheng Mou
8f17f51deb
[mlir][tosa] Fix comments format(NFC) (#124520)
This PR corrects the formatting of comments in Markdown. The previous
format was as follows:
https://mlir.llvm.org/docs/Dialects/TOSA/#tosaerf-mlirtosaerfop

![image](https://github.com/user-attachments/assets/1d1d10d5-c960-4724-9fb4-29c17ea39b11)

https://mlir.llvm.org/docs/Dialects/TOSA/#tosarescale-mlirtosarescaleop

![image](https://github.com/user-attachments/assets/fb23cbf6-be10-4a60-8b43-b28dc2db6918)
2025-01-27 10:50:53 +00:00
Jacques Pienaar
3b35b4c7f9
[mlir] Allow fallback from file line col range to loc (#124321)
This was discussed during the original review but I made it stricter
than discussed. Making it a pure view but adding a helper for bytecode
serialization (I could avoid the helper, but it ends up with more logic
and stronger coupling).
2025-01-24 18:08:44 -08:00
Jianjian Guan
990837f91d
[mlir][arith][tensor] Disable index type for bitcast (#121455)
Fixes #121397.
2025-01-24 16:53:04 +08:00
donald chen
45d83ae7df
[mlir] [math] Fix the precision issue of expand math (#120865)
The convertFloorOp pattern incurs precision loss when floating-point
numbers exceed the representable range of int64. This pattern should be
removed.

Fixes https://github.com/llvm/llvm-project/issues/119836
2025-01-24 14:46:41 +08:00
Yi Qian
c118864223
[MLIR][ROCDL]Add MFMA_*_F8F6F4 instructions to the ROCDL dialect (#123830)
This PR adds mfma.scale.f32.32x32x64.f8f6f4 and
mfma.scale.f32.16x16x128.f8f6f4 to the ROCDL dialect. They are converted
to the corresponding intrinsics in the mlir-to-llvmir pass.
2025-01-23 19:27:56 +00:00
Durgadoss R
2e6cc79f81
[MLIR][NVVM] Migrate CpAsyncOp to intrinsics (#123789)
Intrinsics are available for the 'cpSize'
variants also. So, this patch migrates the Op
to lower to the intrinsics for all cases.

* Update the existing tests to check the lowering to intrinsics.
* Add newer cp_async_zfill tests to verify the lowering for the 'cpSize'
   variants.
* Tidy-up CHECK lines in cp_async() function in nvvmir.mlir (NFC)

PTX spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-23 16:15:52 +05:30
Jack Frankland
8388040fc9
[mlir][tosa] Add NaN Propagation Mode Support (#121951)
The TOSA-V1.0 specification adds "nan propagation" modes as attributes
for several operators. Adjust the ODS definitions of the relevant
operations to include this attribute.

The defined modes are "PROPAGATE" and "IGNORE" and the PROPAGATE mode is
set by default.

MAXIMUM, MINIMUM, REDUCE_MAX, REDUCE_MIN, MAX_POOL, CLAMP, and ARGMAX
support this attribute.

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
2025-01-23 10:14:00 +00:00
Jakub Kuderski
98de5dfe6a
[mlir] Add NamedAttribute ctor taking StringRef. NFC. (#123974)
This is a small QoL improvement so that we don't have to go through
helpers when building `NamedAttribute`s.
2025-01-22 19:02:17 -05:00
Jacques Pienaar
a77250fd78
[mlir] Add C and Python interface for file range (#123276)
Plumbs through creating file ranges to C and Python.
2025-01-22 14:33:19 -08:00
Jerry-Ge
7e622b6132
[TOSA] Change PadOp padding to tosa.shape (#123133)
This patch changes PadOp's padding input to type !tosa.shape<2 * rank>,
(where rank is the rank of the PadOp's input), instead of a <rank x 2>
tensor.

This patch is also a part of TOSA v1.0 effort:
https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708

This patch updates the PadOp to match all against the TOSA v1.0 form. 

Original Authors include: 
@Tai78641 
@wonjeon

Co-authored-by: Tai Ly <tai.ly@arm.com>
2025-01-22 12:36:48 -08:00
Anchu Rajendran S
afcbcae668
[mlir][OpenMP] inscan reduction modifier and scan op mlir support (#114737)
Scan directive allows to specify scan reductions within an worksharing
loop, worksharing loop simd or simd directive which should have an
`InScan` modifier associated with it. This change adds the mlir support
for the same.

Related PR: [Parsing and Semantic Support for
scan](https://github.com/llvm/llvm-project/pull/102792)
2025-01-22 09:53:54 -08:00
Igor Wodiany
f78359cf43
[mlir][spirv] Add definition for OpEmitVertex and OpEndPrimitive (#123759)
This is hopefully the first patch in the series of patches adding some
missing SPIR-V ops to MLIR over the next weeks/months, starting with
something simple: `OpEmitVertex` and `OpEndPrimitive`. Since the ops
have no input and outputs, and the only condition is "This instruction
must only be used when only one stream is present.", which I don't think
can be validate at the instruction level in isolation, I set
`hasVerifier` to 0. I hope I didn't miss anything, but I'm more than
happy to address any comments.
2025-01-22 12:45:23 -05:00
Petr Kurapov
fa6f88af10
[MLIR][XeGPU] Allow some nd ops to have argument shapes mismatch for … (#120566)
…the distributed IR case.

This patch allows `nd_load` and `nd_store` to preserve the tensor
descriptor shape during distribution to SIMT. The validation now expects
the distributed instruction to retain the `sg_map` attribute and uses it
to verify the consistency.
2025-01-22 18:03:36 +01:00
Tai Ly
729f958c4f
[TOSA] Add SameOperandsAndResultRank to TOSA Ops (#104501)
[note: this is blocked by:
https://github.com/tensorflow/tensorflow/pull/73891 otherwise tensorflow
may have lit test failures]

This patch adds SameOperandsAndResultRank trait to TOSA operators with
ResultsBroadcastableShape trait. SameOperandsAndResultRank trait
requiring that all operands and results have matching ranks unless the
operand/result is unranked.

This also renders the TosaMakeBroadcastable pass unnecessary - but this
pass is left in for now just in case it is still used in some flows. The
lit test, broadcast.mlir, is removed.

This also adds verify of the SameOperandsAndResultRank trait in the
TosaInferShapes pass to validate inferred shapes.

Signed-off-by: Tai Ly <tai.ly@arm.com>
2025-01-22 13:21:04 +00:00
Diego Caballero
d25a1f8887
[mlir][Vector][NFC] Add vector-transform-options flag to ConvertVectorToLLVMPass (#123491)
This flag enables the configuration of some transformation such as the
lowering of contractions and transposes. The default configuration
preserves the existing behavior.
2025-01-21 16:41:57 -08:00
Hyunsung Lee
83cdcd5da4
[MLIR/linalg] Update arg name of generalizeNamedOp in Transforms.h (#123679)
`Generalization.cpp:53`
```cpp
FailureOr<GenericOp> mlir::linalg::generalizeNamedOp(RewriterBase &rewriter,
                                                     LinalgOp linalgOp) {
  if (failed(generalizeNamedOpPrecondition(linalgOp)))
    return rewriter.notifyMatchFailure(linalgOp, "preconditions not met");

  SmallVector<Value> inputs = linalgOp.getDpsInputs();
  ValueRange outputs = linalgOp.getDpsInits();
  SmallVector<AffineMap> indexingMaps = linalgOp.getIndexingMapsArray();
  SmallVector<utils::IteratorType> iterators = linalgOp.getIteratorTypesArray();
  SmallVector<Type> resultTypes = linalgOp.hasPureTensorSemantics()
                                      ? TypeRange(ValueRange(outputs))
                                      : TypeRange{};
...
```

`generalizeNamedOp` in `Generalization.cpp` has a different arg name
than `generalizeNamedOp` in `Transforms.h`

Sync to use `linalgOp`
2025-01-21 23:54:06 +00:00
Yi Qian
4564ac91e1
Add gfx950 mfma instructions to ROCDL dialect (#123361)
Add ROCDL support to the following instructions:
V_MFMA_F32_16X16X32_BF16
V_MFMA_I32_16X16X64_I8
V_MFMA_F32_16X16X32_F16
V_MFMA_F32_32X32X16_BF16
V_MFMA_I32_32X32X32_I8
V_MFMA_F32_32X32X16_F16

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Jungwook Park <jungwook.park@amd.com>
2025-01-21 18:21:30 +00:00
plognjen
27f15add7c
[MLIR][ROCDL] Add ops for LDS read transpose and global to LDS intrinsics (#123530)
This PR adds missing ds\.read.tr4\.b64, ds\.read\.tr8\.b64,
ds\.read\.tr6\.b96,
ds\.read\.tr16\.b64 and global\.load\.lds ops to
 the ROCDL dialect.
The ops are converted to the corresponding intrinsic calls during the
translation from MLIR to LLVM IRs.

---------

Co-authored-by: Ognjen Plavsic <plognjen@amd.com>
2025-01-21 16:40:46 +00:00
Andrea Faulds
e7e3c45bc7
[mlir] Remove mlir-vulkan-runner and GPUToVulkan conversion passes (#123750)
This follows up on 733be4ed7dcf976719f424c0cb81b77a14f91f5a, which made
mlir-vulkan-runner and its associated passes redundant, and completes
the main goal of #73457. The mlir-vulkan-runner tests become part of the
integration test suite, and the Vulkan runner runtime components become
part of ExecutionEngine, just as was done when removing other
target-specific runners.
2025-01-21 16:51:27 +01:00
Karlo Basioli
a53abb2386
[mlir][IR] CommonTypeConstraints: fix syntax error (#123765) 2025-01-21 16:31:50 +01:00
Emilio Cota
67a412f072
[mlir][IR] CommonTypeConstraints: fully qualify low-precision FP type… (#123738)
…s isa<> calls in isa<> calls

To ease integration with downstream projects.

Follow-up to PR #123326.
2025-01-21 13:46:09 +00:00
Andrea Faulds
733be4ed7d
[mlir][spirv] Add GpuToLLVM cconv suited to Vulkan, migrate last tests (#123384)
This commit is a follow-up to 99a562b3cb17e89273ba0fe77129f2fb17a19381,
which migrated some of the mlir-vulkan-runner tests to mlir-cpu-runner
using a new pipeline and set of wrappers. That commit could not migrate
all the tests, because the existing calling conventions/ABIs for kernel
arguments generated by GPUToLLVMConversionPass were not a good fit for
the Vulkan runtime. This commit fixes this and migrates the remaining
tests. With this commit, mlir-vulkan-runner and many related components
are now unused, and they will be removed in a later commit (see #73457).

The old calling conventions require both the caller (host LLVM code) and
callee (device code) to have compile-time knowledge of the precise
argument types. This works for CUDA, ROCm and SYCL, where there is a
C-like calling convention agreed between the host and device code, and
the runtime passes through arguments as raw data without comprehension.
For Vulkan, however, the interface declared by the shader/kernel is in a
more abstract form, so the device code has indirect access to the
argument data, and the runtime must process the arguments to set up and
bind appropriately-sized buffer descriptors.

This commit introduces a new calling convention option to meet the
Vulkan runtime's needs. It lowers memref arguments to {void*, size_t}
pairs, which can be trivially interpreted by the runtime without it
needing to know the original argument types. Unlike the stopgap measure
in the previous commit, this system can support memrefs of various ranks
and element types, which unblocked migrating the remaining tests.
2025-01-21 13:27:45 +01:00
Durgadoss R
0f9e913466
[MLIR][NVVM] Add TMA Bulk Copy Ops (#123186)
PR #122344 adds intrinsics for Bulk Async Copy
(non-tensor variants) using TMA. This patch
adds the corresponding NVVM Dialect Ops.

lit tests are added to verify the lowering to all
variants of the intrinsics.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-21 13:56:59 +05:30