3051 Commits

Author SHA1 Message Date
Yang Bai
f1f194bf10
[mlir][vector] fix: unroll vector.from_elements in gpu pipelines (#154774)
### Problem

PR #142944 introduced a new canonicalization pattern which caused
failures in the following GPU-related integration tests:

-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

The issue occurs because the new canonicalization pattern can generate
multi-dimensional `vector.from_elements` operations (rank > 1), but the
GPU lowering pipelines were not equipped to handle these during the
conversion to LLVM.

### Fix

This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the
GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:

- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-08-21 21:46:06 -05:00
Tim Gymnich
e20fa4f412
[mlir][AMDGPU] Add PermlaneSwapOp (#154345)
- Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and
`rocdl.permlane32.swap`

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-08-21 18:21:43 +02:00
Guray Ozen
5c36fb3303
[MLIR][NVVM] Improve inline_ptx, add readwrite support (#154358)
Key Features
1. Multiple SSA returns – no struct packing/unpacking required.
2. Automatic struct unpacking – values are directly usable.
3. Readable register mapping
    * {$rwN} → read-write
    * {$roN} → read-only
    * {$woN} → write-only
4. Full read-write support (+ modifier).
5. Simplified operand specification – avoids cryptic
"=r,=r,=f,=f,f,f,0,1" constraints.
6. Predicate support: PTX `@p` predication support

IR Example:
```
%wo0, %wo1 = nvvm.inline_ptx """
 .reg .pred p;
 setp.ge.s32 p,   {$r0}, {$r1};
 selp.s32 {$rw0}, {$r0}, {$r1}, p;
 selp.s32 {$rw1}, {$r0}, {$r1}, p;
 selp.s32 {$w0},  {$r0}, {$r1}, p;
 selp.s32 {$w1},  {$r0}, {$r1}, p;
""" ro(%a, %b : f32, f32) rw(%c, %d : i32, i32) -> f32, f32
```

After lowering
```
 %0 = llvm.inline_asm has_side_effects asm_dialect = att
 "{
                              .reg .pred p;\
                              setp.ge.s32 p, $4, $5;   \
                              selp.s32   $0, $4, $5, p;\
                              selp.s32   $1, $4, $5, p;\
                              selp.s32   $2, $4, $5, p;\
                              selp.s32   $3, $4, $5, p;\
   }"
   "=r,=r,=f,=f,f,f,0,1"
   %c500_i32, %c400_i32, %cst, %cst_0
   : (i32, i32, f32, f32)
   -> !llvm.struct<(i32, i32, f32, f32)>

 %1 = llvm.extractvalue %0[0] : !llvm.struct<(i32, i32, f32, f32)>
 %2 = llvm.extractvalue %0[1] : !llvm.struct<(i32, i32, f32, f32)>
 %3 = llvm.extractvalue %0[2] : !llvm.struct<(i32, i32, f32, f32)>
 %4 = llvm.extractvalue %0[3] : !llvm.struct<(i32, i32, f32, f32)>

 // Unpacked result from nvvm.inline_ptx
 %5 = arith.addi %1, %2 : i32
 // read only
 %6 = arith.addf %cst, %cst_0 : f32
 // write only
 %7 = arith.addf %3, %4 : f32
```
2025-08-21 17:42:18 +02:00
Mehdi Amini
780750bbf9
[MLIR] Adopt LDBG() debug macro in ConvertToLLVMPass (NFC) (#154616) 2025-08-20 21:29:35 +00:00
Akash Banerjee
d69ccded4f
[MLIR] Add cpow support in ComplexToROCDLLibraryCalls (#153183)
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.

- Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa`
target instead of using library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
mathematical identity
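
The identity in the last bullet can be checked numerically. The sketch below uses Python's standard `cmath` module purely as an illustration; it is not the ROCDL lowering, and the helper name `pow_via_exp_log` is hypothetical.

```python
import cmath

def pow_via_exp_log(z: complex, w: complex) -> complex:
    """Compute z**w as exp(w * log(z)) using the principal branch of log.

    Hypothetical helper for illustration only. z must be nonzero,
    because log(0) is undefined.
    """
    return cmath.exp(w * cmath.log(z))

z, w = complex(1.5, -2.0), complex(0.5, 0.25)
direct = z ** w
via_identity = pow_via_exp_log(z, w)
# Both routes agree up to floating-point rounding.
assert cmath.isclose(direct, via_identity, rel_tol=1e-12)
```

The identity breaks down at `z = 0`, which is one reason a real lowering would likely need special-case handling.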
2025-08-20 17:18:30 +00:00
Yang Bai
4eb1a07d7d
[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175)
This patch introduces a new unrolling-based approach for lowering
multi-dimensional `vector.from_elements` operations.

**Implementation Details:**
1. **New Transform Pattern**: Added `UnrollFromElements`, which unrolls an
N-D (N >= 2) `from_elements` op into (N-1)-D `from_elements` ops along the
outermost dimension.
2. **Utility Functions**: Added `unrollVectorOp` to reuse the unrolling
algorithm of `vector.gather` for `vector.from_elements`.
3. **Integration**: Added the unrolling pattern to the
`convert-vector-to-llvm` pass as a temporary transformation.
4. Use direct LLVM dialect operations instead of intermediate
`vector.insert` operations for efficiency in `VectorFromElementsLowering`.

**Example:**
```mlir
// unroll
%v = vector.from_elements  %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32>
%vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32>
%vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32>
%result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32>

// convert-vector-to-llvm
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>>
%poison_1d_0 = llvm.mlir.poison : vector<2xf32>
%c0_0 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32>
%c1_0 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32>
%vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>>
%poison_1d_1 = llvm.mlir.poison : vector<2xf32>
%c0_1 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32>
%c1_1 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32>
%vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>>
%result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32>
```
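
The unrolling step can be modelled outside MLIR as slicing a row-major element list along the outermost dimension. The following Python sketch is only an illustration of that idea; the function names are hypothetical and do not correspond to the C++ implementation.

```python
from math import prod

def unroll_outermost(elements, shape):
    """Split a row-major element list for an N-D shape (N >= 2) into
    shape[0] element lists, one per (N-1)-D slice."""
    if len(shape) < 2:
        raise ValueError("unrolling applies to rank >= 2 only")
    if len(elements) != prod(shape):
        raise ValueError("element count does not match shape")
    inner = prod(shape[1:])
    return [elements[i * inner:(i + 1) * inner] for i in range(shape[0])]

def unroll_to_1d(elements, shape):
    """Apply the unrolling repeatedly until only 1-D slices remain."""
    if len(shape) == 1:
        return [list(elements)]
    rows = []
    for chunk in unroll_outermost(elements, shape):
        rows.extend(unroll_to_1d(chunk, shape[1:]))
    return rows

# Mirrors the vector<2x2xf32> example: two 1-D slices of two elements each.
slices = unroll_outermost(["e0", "e1", "e2", "e3"], (2, 2))
```

Each returned slice corresponds to one lower-rank `from_elements` op that is then inserted into the result at its outermost index.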

---------

Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>
2025-08-18 10:09:12 -07:00
Jacques Pienaar
4bf33958da
[mlir] Update builders to use new form. (#154132)
Mechanically applied using clang-tidy.
2025-08-18 15:19:34 +00:00
Matthias Springer
f7b09ad700
[mlir][LLVM] ArithToLLVM: Add 1:N support for arith.select lowering (#153944)
Add 1:N support for the `arith.select` lowering. Only cases where the
entire true/false value is selected are supported.
2025-08-18 09:42:37 +02:00
Matthias Springer
2692ff8213
[mlir][LLVM] Fix build (#153947)
Fix build after #153937.
2025-08-16 13:06:58 +02:00
Matthias Springer
f8f23e838a
[mlir][LLVM] ControlFlowToLLVM: Add 1:N type conversion support (#153937)
Add support for 1:N type conversions to the `ControlFlowToLLVM` lowering
patterns. Not applicable to `cf.switch` and `cf.assert`.

---------

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
2025-08-16 12:51:40 +02:00
Matthias Springer
f0967fca04
[mlir][LLVM] FuncToLLVM: Add 1:N type conversion support (#153823)
Add support for 1:N type conversions to the `FuncToLLVM` lowering
patterns. This commit does not change the lowering of any types (such as
`MemRefType`). It just sets up the infrastructure, such that 1:N type
conversions can be used during `FuncToLLVM`.

Note: When the converted result types of a `func.func` have more than 1
type, then the results are wrapped in an `llvm.struct`. That's because
`llvm.func` does not support multiple result values. This "wrapping" was
already implemented for cases where the original `func.func` has
multiple results. With 1:N conversions, even a single result can now
expand to multiple converted results, triggering the same wrapping
mechanism.
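
The wrapping rule described in the note can be modelled roughly as follows. This is a hypothetical Python model of the rule, not the `FuncToLLVM` code; the type strings are illustrative only.

```python
def pack_results(converted_results):
    """Model the result signature of a lowered function.

    `converted_results` holds one entry per original result; each entry
    is the list of converted types for that result (1:N conversion).
    """
    flat = [t for types in converted_results for t in types]
    if len(flat) > 1:
        # llvm.func supports at most one result, so several values are
        # wrapped in a single struct.
        return ["!llvm.struct<(" + ", ".join(flat) + ")>"]
    return flat

# One original result that converts 1:1 needs no wrapping.
single = pack_results([["i32"]])
# One original result that expands to two values triggers the wrapping.
expanded = pack_results([["!llvm.ptr", "i64"]])
```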

The test cases are exercised with both the old and the new no-rollback
conversion driver.
2025-08-16 09:45:08 +02:00
Guray Ozen
4c389178ee
[MLIR][NVVM] Print readable modifer (NFC) (#153779)
Currently, the modifier is printed as an address, which is neither
readable nor useful. This PR adds readable printing for it.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-15 15:47:39 +02:00
Matthias Springer
21b607adbe
[mlir][SCF] scf.for: Add support for unsigned integer comparison (#153379)
Add a new unit attribute to allow for unsigned integer comparison.

Example:
```mlir
scf.for unsigned %iv_32 = %lb_32 to %ub_32 step %step_32 : i32 {
  // body
}
```
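
To see why the comparison kind matters, consider bounds whose top bit is set. The Python sketch below is a simplified model that assumes a positive step and ignores wraparound of the induction variable; the helper names are hypothetical.

```python
def as_signed(bits: int, width: int = 32) -> int:
    """Interpret a `width`-bit pattern as a two's-complement signed integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def trip_count(lb: int, ub: int, step: int, unsigned: bool, width: int = 32) -> int:
    """Count iterations of `for (iv = lb; iv < ub; iv += step)` when the
    bounds are given as raw bit patterns."""
    if step <= 0:
        raise ValueError("step must be positive")
    mask = (1 << width) - 1
    lb, ub = lb & mask, ub & mask
    if not unsigned:
        lb, ub = as_signed(lb, width), as_signed(ub, width)
    # Ceiling division, clamped at zero when ub <= lb.
    return max(0, -(-(ub - lb) // step))

UB = 0x8000_0000  # 2**31 when unsigned, -2**31 when signed
unsigned_trips = trip_count(0, UB, 0x1000_0000, unsigned=True)
signed_trips = trip_count(0, UB, 0x1000_0000, unsigned=False)
```

With the same bit patterns, the unsigned interpretation runs the loop 8 times, while the signed interpretation sees a negative upper bound and runs it zero times.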

Discussion:
https://discourse.llvm.org/t/scf-should-scf-for-support-unsigned-comparison/84655
2025-08-15 10:59:14 +02:00
Jianhui Li
98728d9dc8
[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter (#152429)
Lower transfer_read/transfer_write to load_gather/store_scatter when
the target uArch doesn't support load_nd/store_nd. The high-level
steps:
  1. compute the strides;
  2. compute the offsets;
  3. collapse the memref to 1-D (collapseMemrefTo1D);
  4. create the load_gather or store_scatter op.
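
The steps above can be sketched in plain Python. The sketch assumes a contiguous row-major memref and uses hypothetical helper names; it only illustrates how per-element linear offsets into the collapsed 1-D buffer could be derived.

```python
from itertools import product

def row_major_strides(shape):
    """Strides (in elements) of a contiguous row-major buffer."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def gather_offsets(start, tile_shape, strides):
    """Linear offsets into the collapsed 1-D buffer for every element of
    a tile that begins at index `start`."""
    base = sum(s * st for s, st in zip(start, strides))
    return [
        base + sum(i * st for i, st in zip(idx, strides))
        for idx in product(*(range(n) for n in tile_shape))
    ]

strides = row_major_strides((4, 8))                 # step 1
offsets = gather_offsets((1, 2), (2, 3), strides)   # step 2
# Steps 3 and 4: the offsets index the flattened buffer directly.
flat = list(range(4 * 8))
tile = [flat[o] for o in offsets]
```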
2025-08-14 11:27:07 -07:00
Matthias Springer
e2ae634cc1
[mlir][LLVM][NFC] Simplify copyUnrankedDescriptors (#153597)
Split the function into two: one that copies a single unranked
descriptor and one that copies multiple unranked descriptors. This is in
preparation of adding 1:N support to the Func->LLVM lowering patterns.
2025-08-14 18:25:19 +02:00
Matthias Springer
0ff92fe2f0
[mlir][LLVM][NFC] Simplify computeSizes function (#153588)
Rename `computeSizes` to `computeSize` and make it compute just a single
size. This is in preparation of adding 1:N support to the Func->LLVM
lowering patterns.
2025-08-14 17:00:03 +02:00
Jaden Angella
bfda0e777d
[mlir][EmitC] Expand the MemRefToEmitC pass - Lowering CopyOp (#151206)
This patch lowers `memref.copy` to `emitc.call_opaque "memcpy"`.
From:
```
func.func @copying(%arg0 : memref<9x4x5x7xf32>, %arg1 : memref<9x4x5x7xf32>) {
  memref.copy %arg0, %arg1 : memref<9x4x5x7xf32> to memref<9x4x5x7xf32>
  return
}
```
To:
```cpp
#include <cstring>
void copying(float v1[9][4][5][7], float v2[9][4][5][7]) {
  size_t v3 = 0;
  float* v4 = &v2[v3][v3][v3][v3];
  float* v5 = &v1[v3][v3][v3][v3];
  size_t v6 = sizeof(float);
  size_t v7 = 1260;
  size_t v8 = v6 * v7;
  memcpy(v5, v4, v8);
  return;
}
```
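
The constant `1260` in the generated code is the element count of the memref. A quick Python check of that arithmetic, assuming a 4-byte `float` (the helper name is hypothetical):

```python
from math import prod
import struct

def copy_size_in_bytes(shape, element_size):
    """Bytes to copy for a contiguous buffer of the given shape."""
    return element_size * prod(shape)

element_count = prod((9, 4, 5, 7))
float_size = struct.calcsize("<f")  # standard size of a 32-bit float
total_bytes = copy_size_in_bytes((9, 4, 5, 7), float_size)
```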
2025-08-14 05:25:55 -07:00
Krzysztof Drewniak
bbe3d64b39
[mlir][ROCDL] Annotate lane ID functions with noundef, ranges (#151396)
Now that we have general support for setting argument and result
attributes on LLVM intrinsics, extend the definitions of mbcnt.lo and
mbcnt.hi to carry such attributes. With that, update the construction of
the mbcnt.lo/mbcnt.hi calls used to get the lane ID to be `noundef`
(since the lane ID is always defined) and to be annotated with the
correct ranges (so that generic LLVM passes can correctly optimize
based on the fact that there are never more than 32/64 lanes).
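
As a rough model of why these ranges hold, the Python sketch below assumes that mbcnt.lo/mbcnt.hi count the set mask bits at lane positions below the current lane (low and high 32 lanes respectively) and add an accumulator. That reading of the intrinsics and all function names here are assumptions, not taken from this commit.

```python
def mbcnt_lo(mask_lo: int, acc: int, lane: int) -> int:
    """Assumed model: count bits of mask_lo below `lane` in lanes 0..31."""
    thread_mask = ((1 << lane) - 1) & 0xFFFFFFFF
    return bin(mask_lo & thread_mask).count("1") + acc

def mbcnt_hi(mask_hi: int, acc: int, lane: int) -> int:
    """Assumed model: count bits of mask_hi below `lane` in lanes 32..63."""
    thread_mask = (((1 << lane) - 1) >> 32) & 0xFFFFFFFF
    return bin(mask_hi & thread_mask).count("1") + acc

def lane_id(lane: int, wave_size: int = 64) -> int:
    """Lane ID computed with an all-ones mask."""
    lo = mbcnt_lo(0xFFFFFFFF, 0, lane)
    return lo if wave_size == 32 else mbcnt_hi(0xFFFFFFFF, lo, lane)

wave64_ids = [lane_id(lane) for lane in range(64)]
wave32_ids = [lane_id(lane, wave_size=32) for lane in range(32)]
```

Under this model the result always lies in `[0, 32)` for wave32 and `[0, 64)` for wave64, which is the kind of range information the annotations expose to LLVM.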

(Also, handle a pattern that wasn't using getLaneId() and get rid of a
dead argument)
2025-08-13 17:44:03 -05:00
Matthias Springer
7e7c9d975e
[mlir][Transforms] Dialect Conversion Driver without Rollback (#151865)
This commit improves the `allowPatternRollback` flag handling in the
dialect conversion driver. Previously, this flag was used to merely
detect cases that are incompatible with the new One-Shot Dialect
Conversion driver. This commit implements the driver itself: when the
flag is set to "false", all IR changes are materialized immediately,
bypassing the `IRRewrite` and `ConversionValueMapping` infrastructure.

A few selected test cases now run with both the old and the new driver.

RFC:
https://discourse.llvm.org/t/rfc-a-new-one-shot-dialect-conversion-driver/79083
2025-08-13 17:40:55 +02:00
Adam Siemieniuk
7d1b9cad87
[mlir][amx] Vector to AMX conversion pass (#151121)
Adds a pass for Vector to AMX operation conversion.

Initially, a direct rewrite for vector contraction in packed VNNI layout
is supported. Operations are expected to already be in shapes which are
AMX-compatible for the rewriting to occur.
2025-08-13 11:08:52 +02:00
Gao Yanfeng
24f5385a85
[MLIR][NVVM] Support generating all the ldmatrix intrinsics from NVVM ops (#148783)
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics. The intrinsics
that generate the new instructions introduced in Blackwell were not
accessible through the NVVM ops. This commit extends the ldmatrix
operation to support all available ldmatrix intrinsics.
2025-08-12 15:13:15 +01:00
Akash Banerjee
e1a694cd16 [NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls 2025-08-12 15:06:03 +01:00
Akash Banerjee
c1f410779a Revert "[NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls"
This reverts commit b8104fa320f006bacd3e16afb431b5980dd5000a.
2025-08-12 14:18:57 +01:00
Akash Banerjee
b8104fa320 [NFC] Remove invalid conversions in ComplexToROCDLLibraryCalls 2025-08-12 14:05:00 +01:00
Scott Manley
e72335192d
[Arith][MemRef] add AtomicRMWKind::xori to enum (#151701)
Add missing xor AtomicRMWKind enum in arith. Also add support for xor to
memref.atomic_rmw so the change can be tested.

This does NOT add it for all users of the enum (e.g. Affine, Vector)
2025-08-11 08:46:06 -04:00
Erick Ochoa Lopez
a1672d7c6a
[mlir][vector] Add alignment attribute to maskedload and maskedstore (#151690)
These commits continue the work done in
https://github.com/llvm/llvm-project/pull/144344 of adding alignment
attributes to operations in the vector and memref dialects. These commits
focus on adding the alignment attribute to the `maskedload` and `maskedstore`
operations. The `VectorLoadConversion` pattern in VectorToLLVM is a
template for `load`, `store`, `maskedload` and `maskedstore` operations.
Having the alignment attribute in all these operations would allow for
an easy way to propagate the alignment attribute from the vector dialect
to the LLVM dialect.

This patchset also includes changes to the conversion from VectorToLLVM
to propagate the alignment attribute for the
vector.{,masked}{load,store} operations.
2025-08-08 09:23:44 -04:00
Erick Ochoa Lopez
6d231fbb05
[mlir] MemRefToSPIRV propagate alignment attributes from MemRef ops. (#151723)
This patchset:
* propagates alignment attributes from memref operations into the SPIR-V
dialect,
* fixes an error in the logic which previously propagated alignment
attributes but did not add other MemoryAccess attributes.
* adds a failure condition in the case where the alignment attribute
from the memref dialect (64-bit wide) does not fit in SPIR-V's alignment
attribute (specified to be 32-bit wide).
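
The failure condition in the last bullet amounts to a range check. A minimal Python sketch, assuming an unsigned 32-bit limit and a hypothetical function name:

```python
UINT32_MAX = (1 << 32) - 1

def to_spirv_alignment(memref_alignment: int):
    """Return the alignment if it is representable in 32 bits.

    Returns None to model a failed conversion when the 64-bit memref
    alignment does not fit.
    """
    if memref_alignment < 0 or memref_alignment > UINT32_MAX:
        return None
    return memref_alignment

fits = to_spirv_alignment(64)
too_wide = to_spirv_alignment(1 << 40)
```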
2025-08-07 12:18:23 -04:00
Erick Ochoa Lopez
d72e58e422
[MLIR][LLVM] Propagate alignment attribute from memref to LLVM (#151380)
Propagate alignment attribute from operations in the memref dialect to
the LLVM dialect.

Possible improvements: maybe the alignment attribute in LLVM's store and
load operations should be confined/constrained to i64? I believe that
way one can avoid typing the value in the attribute dictionary. I.e.,
from `{ alignment = 32 : i64 }` to `{ alignment = 32}`
2025-08-05 12:06:57 -04:00
Longsheng Mou
f1ca88cebf
[mlir][tosa] Use typeConverter->convertType<T> (#150578)
Since `resultTy` might be nullptr, we should use `dyn_cast` instead of
`cast`. Additionally, `typeConverter->convertType<T>` is more
appropriate in this context.
2025-08-04 17:28:31 +08:00
Jack Frankland
96c8b9e508
[mlir][memref][spirv] Add SPIR-V Image Lowering (#150978)
Adds an initial conversion in the Memref -> SPIR-V lowering for images.
Any memref in the "Image" storage-class/address-space will be considered
for lowering to the `!spirv.image` type during Memref to SPIR-V
conversion. Initially, only reading of sampled images is supported, and
images are read via the `OpImageFetch` instruction. Future work should
expand the conversion patterns to target non-sampled images and add
support for image write operations.

Images are supported for fp32, fp16, int32, uint32, int16 and uint16
types and lit tests have been added to verify this is the case along
with negative testing to check the cases where images aren't supported.

---------

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
2025-08-04 10:16:26 +01:00
Thomas Raoux
0b37de2968
[MLIR][SCF] Propagate loop annotation during while op lowering (#151746)
This expands on https://github.com/llvm/llvm-project/pull/102562.

It also allows propagating attributes during scf.while lowering.
2025-08-01 12:14:58 -07:00
Jaeho Kim
103461f119
[mlir][spirv] Fix lookup logic spirv.target_env for gpu.module (#147262)
The `gpu.module` operation can contain `spirv.target_env` attributes
within an array attribute named `"targets"`. The lookup now accounts for
that case by iterating over the `"targets"` attribute, if present, and
looking up `spirv.target_env`.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-08-01 06:54:04 -04:00
Xiaolei Feng
1bc5885186
[MLIR][SPIRV] Add spirv.IsFinite and lower math.{isfinite,isinf,isnan} to spirv. (#151552)
This patch adds support for lowering several float classification ops
from the Math dialect to the SPIR-V dialect.

### Highlights:
- Introduced a new `spirv.IsFinite` operation corresponding to the
SPIR-V `OpIsFinite` instruction.
- Lowered `math.isfinite`, `math.isinf`, and `math.isnan` to SPIR-V
using `CheckedElementwiseOpPattern`.
- Added corresponding tests for op definition and conversion lowering.

This addresses the discussion in:
https://github.com/llvm/llvm-project/issues/150778

---

Let me know if any additional adjustments are needed!

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-07-31 13:54:14 -04:00
Md Abdullah Shahneous Bari
b9a627e6fb
[mlir][spirv] Add 8-bit float type emulation (#148811)
8-bit floats are not supported in SPIR-V. They are emulated as 8-bit
integer during conversion.
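
Emulating as an 8-bit integer means the value travels as its raw bit pattern. The Python sketch below decodes one such pattern, taking f8E5M2 (1 sign bit, 5 exponent bits, 2 mantissa bits, bias 15) as an example format; the choice of format and the function name are assumptions for illustration.

```python
def decode_f8e5m2(byte: int) -> float:
    """Decode an f8E5M2 bit pattern stored in an 8-bit integer."""
    byte &= 0xFF
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 2) & 0x1F
    mantissa = byte & 0x3
    if exponent == 0x1F:
        # All-ones exponent encodes infinity or NaN.
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:
        # Subnormal: no implicit leading one.
        return sign * (mantissa / 4.0) * 2.0 ** -14
    return sign * (1.0 + mantissa / 4.0) * 2.0 ** (exponent - 15)

one = decode_f8e5m2(0x3C)
minus_two = decode_f8e5m2(0xC0)
```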
2025-07-30 17:39:49 -05:00
Mehdi Amini
75e5a70577
[MLIR] Migrate some conversion passes and dialects to LDBG() macro (NFC) (#151349) 2025-07-30 17:58:54 +02:00
Mehdi Amini
0d8abc2188
[MLIR] Migrate NVVM to the new LDBG debug macro (NFC) (#151162) 2025-07-30 13:28:51 +02:00
Akash Banerjee
0a4c6522a6
[MLIR] Add conversion support for more ops from ComplexToROCDLLibraryCalls (#151166) 2025-07-29 17:11:46 +01:00
Hsiangkai Wang
0d21522c00
[mlir][gpu] Make offset and width in gpu.rotate as attributes (#150901)
`offset` and `width` must be constants and there are constraints on
their values. Update the operation definition to use attributes instead
of operands.
2025-07-29 09:02:42 +01:00
Jaden Angella
5949f4596e
[mlir][EmitC]Expand the MemRefToEmitC pass - Lowering AllocOp (#148257)
This aims to lower `memref.alloc` to `emitc.call_opaque "malloc"` or
`emitc.call_opaque "aligned_alloc"`.
From:
```
module{
  func.func @allocating() {
  %alloc_5 = memref.alloc() : memref<999xi32>
  return
  }
}
```

To:
```
module {
  emitc.include <"stdlib.h">
  func.func @allocating() {
    %0 = emitc.call_opaque "sizeof"() {args = [i32]} : () -> !emitc.size_t
    %1 = "emitc.constant"() <{value = 999 : index}> : () -> index
    %2 = emitc.mul %0, %1 : (!emitc.size_t, index) -> !emitc.size_t
    %3 = emitc.call_opaque "malloc"(%2) : (!emitc.size_t) -> !emitc.ptr<!emitc.opaque<"void">>
    %4 = emitc.cast %3 : !emitc.ptr<!emitc.opaque<"void">> to !emitc.ptr<i32>
    return
  }
}
```
Which is then translated as:
```
#include <stdlib.h>
void allocating() {
  size_t v1 = sizeof(int32_t);
  size_t v2 = 999;
  size_t v3 = v1 * v2;
  void* v4 = malloc(v3);
  int32_t* v5 = (int32_t*) v4;
  return;
}
```
2025-07-28 18:48:26 -07:00
Diego Caballero
33465bb2bb
[mlir][Vector] Remove vector.extractelement and vector.insertelement ops (#149603)
This PR removes `vector.extractelement` and `vector.insertelement` ops
from the code base in favor of the `vector.extract` and `vector.insert`
counterparts.

See RFC:
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops
2025-07-28 11:01:14 -07:00
Matthias Springer
cccde9b2b1
[mlir][SCF] Do not access erased operation in scf.while lowering (#150741)
Do not access the erased `scf.while` operation in the lowering pattern.
That won't work anymore in a One-Shot Dialect Conversion and triggers a
use-after-free sanitizer error.

After the One-Shot Dialect Conversion refactoring, a
`ConversionPatternRewriter` will behave more like a normal
`PatternRewriter`.
2025-07-26 11:07:06 +02:00
Maksim Levental
9e7834cadf
[mlir][NFC] update mlir/lib create APIs (35/n) (#150708)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 16:47:57 -07:00
Fabian Mora
34a08cb89c
[mlir][LLVM] Remove llvm deps from the LLVM dialect (#150692)
This patch removes spurious includes of `llvm/IR` files, and unnecessary
link components in the LLVM dialect.

The only major dependencies still coming from LLVM are
`llvm::DataLayout`, which is used by `verifyDataLayoutString` and some
`dwarf` symbols in some attributes. Both of them should likely be
removed in the future.

Finally, I also removed one constructor from `LLVM::AssumeOp` that used
[OperandBundleDefT](https://llvm.org/doxygen/classllvm_1_1OperandBundleDefT.html)
without good reason and introduced a header unnecessarily.
2025-07-25 16:51:47 -04:00
Maksim Levental
258daf5395
[mlir][NFC] update mlir create APIs (34/n) (#150660)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 12:36:54 -05:00
Maksim Levental
c610b24493
[mlir][NFC] update mlir/Dialect create APIs (27/n) (#150638)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 11:48:32 -05:00
Frank Schlimbach
b2d4963ee9
[NFC][mlir][mesh,shard] Fixing misnomers in mesh dialect, renaming 'mesh' dialect to 'shard' (#150177)
Dialect to 'shard' (discourse 87053)
  - dialect name mesh -> shard
  - (device) mesh -> (device) grid
  - spmdize -> partition

A lot of diffs, but simple renames only.

@tkarna @yaochengji
2025-07-25 16:53:08 +02:00
Maksim Levental
8e8f195322
[mlir][amd] fix LLVM::InsertValueOp::create failure to disambiguate (#150605)
fixes
https://github.com/llvm/llvm-project/pull/149879#issuecomment-3117145615

Note this happens because ADL can't disambiguate between
`mlir::DenseI64ArrayAttr` and `llvm::ArrayRef<int64_t>` **for the value
0** which I guess is equal to nullptr on some (most?) systems.

Note, this only occurs with the value 0.
2025-07-25 07:56:27 -04:00
Longsheng Mou
f047b735e9
[mlir][NFC] Use getDefiningOp<OpTy>() instead of dyn_cast<OpTy>(getDefiningOp()) (#150428)
This PR uses `val.getDefiningOp<OpTy>()` to replace `dyn_cast<OpTy>(val.getDefiningOp())` , `dyn_cast_or_null<OpTy>(val.getDefiningOp())` and `dyn_cast_if_present<OpTy>(val.getDefiningOp())`.
2025-07-25 10:35:51 +08:00
Kazu Hirata
1a0f482de8
[mlir] Remove unused includes (NFC) (#150476)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-24 11:23:53 -07:00
Krzysztof Drewniak
a4dd51d72f
[mlir][ArithToAMDGPU] Use native packing support (#150342)
The current arith-to-amdgpu patterns for scaling_extf and scaling_truncf
don't take full advantage of the native packing ability of the
intrinsics being targeted. Scaling extension takes the location of the
two elements to be extended as a constant argument (byte for fp4, half
for fp8), and scaling truncation takes a 32-bit input register and a
byte or half to write the truncated values to.

Not using these features would cause excess unneeded register pressure.
This PR resolves the inefficiency.

It also adds a test for the expected use case of extending or
truncating a block of 32 values to/from fp4 with a uniform scale to
ensure that this usage has a minimal amount of vector shuffling.
2025-07-24 12:26:03 -05:00