### Problem
PR #142944 introduced a new canonicalization pattern which caused
failures in the following GPU-related integration tests:
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir
The issue occurs because the new canonicalization pattern can generate
multi-dimensional `vector.from_elements` operations (rank > 1), but the
GPU lowering pipelines were not equipped to handle these during the
conversion to LLVM.
### Fix
This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the
GPU lowering passes integrated in `gpu-lower-to-nvvm-pipeline` (see the
sketch after this list):
- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.
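A minimal sketch of the wiring, assuming the populate function is declared
in the Vector dialect's lowering-patterns header (the helper name here is
hypothetical; each pass calls the populate function while assembling its
pattern set):

```cpp
#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"

using namespace mlir;

// Decomposes rank > 1 vector.from_elements into rank-1 operations that the
// vector-to-LLVM conversion patterns can handle.
static void populateFromElementsLowering(RewritePatternSet &patterns) {
  vector::populateVectorFromElementsLoweringPatterns(patterns);
}
```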
Co-authored-by: Yang Bai <yangb@nvidia.com>
- Add PermlaneSwapOp that lowers to `rocdl.permlane16.swap` and
`rocdl.permlane32.swap`
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.
- For the `amdgcn-amd-amdhsa` target, lower through `complex.pow`
operations instead of emitting library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
the mathematical identity (illustrated below)
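As an illustration (types chosen for the example), the identity-based
expansion rewrites a power into log/multiply/exp:

```mlir
// Before:
%r = complex.pow %z, %w : complex<f32>

// After (conceptually):
%log = complex.log %z : complex<f32>
%mul = complex.mul %w, %log : complex<f32>
%res = complex.exp %mul : complex<f32>
```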
Add support for 1:N type conversions to the `ControlFlowToLLVM` lowering
patterns. Not applicable to `cf.switch` and `cf.assert`.
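A sketch of what 1:N means here, assuming a hypothetical type `!test.pair`
whose conversion yields two `i64` values:

```mlir
// Before: one block argument of the original type.
cf.br ^bb1(%p : !test.pair)

// After: the single value is expanded into its two converted values.
llvm.br ^bb1(%a, %b : i64, i64)
```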
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Add support for 1:N type conversions to the `FuncToLLVM` lowering
patterns. This commit does not change the lowering of any types (such as
`MemRefType`). It just sets up the infrastructure, such that 1:N type
conversions can be used during `FuncToLLVM`.
Note: When the converted result types of a `func.func` have more than 1
type, then the results are wrapped in an `llvm.struct`. That's because
`llvm.func` does not support multiple result values. This "wrapping" was
already implemented for cases where the original `func.func` has
multiple results. With 1:N conversions, even a single result can now
expand to multiple converted results, triggering the same wrapping
mechanism.
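A sketch under the same kind of assumption, with a hypothetical type
`!test.pair` that 1:N-converts to `(i64, i64)`:

```mlir
// Before: a single original result.
func.func private @f() -> !test.pair

// After: two converted results, wrapped in a struct because llvm.func
// supports only one result value.
llvm.func @f() -> !llvm.struct<(i64, i64)>
```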
The test cases are exercised with both the old and the new no-rollback
conversion driver.
Currently, the modifier is printed as an address, so it is neither readable
nor useful. This PR adds readable printing for it.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Lower `transfer_read`/`transfer_write` to `load_gather`/`store_scatter`
when the target uArch doesn't support `load_nd`/`store_nd`. The high-level
steps (sketched below):
1. compute strides;
2. compute offsets;
3. collapse the memref to 1D;
4. create the `load_gather` or `store_scatter` op.
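A minimal sketch of steps 1-3 using standard memref ops (the ops the
pattern actually uses may differ, and step 4's gather/scatter syntax is
dialect-specific, so it is elided):

```mlir
// Steps 1-2: strides and the base offset come from the memref metadata.
%base, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %src
    : memref<8x16xf32> -> memref<f32>, index, index, index, index, index

// Step 3: collapse to 1-D so per-element offsets can be linearized as
// %offset + i * %strides#0 + j * %strides#1.
%flat = memref.collapse_shape %src [[0, 1]]
    : memref<8x16xf32> into memref<128xf32>

// Step 4: issue the load_gather / store_scatter on %flat using the
// linearized offsets.
```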
Split the function into two: one that copies a single unranked
descriptor and one that copies multiple unranked descriptors. This is in
preparation for adding 1:N support to the Func->LLVM lowering patterns.
Rename `computeSizes` to `computeSize` and make it compute just a single
size. This is in preparation for adding 1:N support to the Func->LLVM
lowering patterns.
Now that we have general support for setting argument and result
attributes on LLVM intrinsics, extend the definitions of mbcnt.lo and
mbcnt.hi to carry such attributes. With that, update the construction of
the mbcnt.lo/mbcnt.hi calls used to get the lane ID so that the result is
`noundef` (since the lane ID is always defined) and annotated with the
correct ranges (so that generic LLVM passes can correctly optimize based
on the fact that there are never more than 32/64 lanes).
(Also, handle a pattern that wasn't using getLaneId() and get rid of a
dead argument)
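For a wave64 target, the annotated calls in the generated LLVM IR should
look roughly like this (operand values illustrative; the exact ranges are
whatever the PR attaches per intrinsic):

```llvm
%lo = call noundef i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
%id = call noundef range(i32 0, 64) i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo)
```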
This commit improves the `allowPatternRollback` flag handling in the
dialect conversion driver. Previously, this flag was used to merely
detect cases that are incompatible with the new One-Shot Dialect
Conversion driver. This commit implements the driver itself: when the
flag is set to "false", all IR changes are materialized immediately,
bypassing the `IRRewrite` and `ConversionValueMapping` infrastructure.
A few selected test cases now run with both the old and the new driver.
RFC:
https://discourse.llvm.org/t/rfc-a-new-one-shot-dialect-conversion-driver/79083
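A sketch of how a pass opts in, assuming the dialect-conversion API at the
time of writing (`ConversionConfig` carries the flag); this fragment would
live inside a pass's `runOnOperation`:

```cpp
// Opt into the one-shot driver: IR changes are materialized immediately,
// so patterns must not rely on rollback.
ConversionConfig config;
config.allowPatternRollback = false;
if (failed(applyPartialConversion(op, target, std::move(patterns), config)))
  signalPassFailure();
```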
Adds a pass for Vector to AMX operation conversion.
Initially, a direct rewrite for vector contraction in packed VNNI layout
is supported. Operations are expected to already be in AMX-compatible
shapes for the rewriting to occur.
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics; the new intrinsic
variants introduced with Blackwell were not accessible through the NVVM
ops. This commit extends the ldmatrix operation to support all available
ldmatrix intrinsics.
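Illustrative use of the op (this variant already existed; the newly exposed
variants differ in their shape/type attributes):

```mlir
%l = nvvm.ldmatrix %ptr {num = 4 : i32, layout = #nvvm.mma_layout<row>}
    : (!llvm.ptr<3>) -> !llvm.struct<(i32, i32, i32, i32)>
```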
Add the missing xor AtomicRMWKind enum value in arith. Also add support for
xor to memref.atomic_rmw so the change can be tested.
This does NOT add it for all users of the enum (e.g., Affine, Vector).
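A hedged example, assuming the new kind is spelled `xori` in line with the
existing `andi`/`ori` kinds:

```mlir
%old = memref.atomic_rmw xori %val, %mem[%i] : (i32, memref<16xi32>) -> i32
```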
These commits continue the work done in
https://github.com/llvm/llvm-project/pull/144344 of adding alignment
attributes to operations in the vector and memref dialects. These commits
focus
on adding the alignment attribute to the `maskedload` and `maskedstore`
operations. The `VectorLoadConversion` pattern in VectorToLLVM is a
template for `load`, `store`, `maskedload` and `maskedstore` operations.
Having the alignment attribute in all these operations would allow for
an easy way to propagate the alignment attribute from the vector dialect
to the LLVM dialect.
This patchset also includes changes to the VectorToLLVM conversion to
propagate the alignment attribute for the vector.{,masked}{load,store}
operations (see the example below).
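An illustrative masked load carrying the attribute (alignment in bytes; the
printed position of the attribute dictionary may vary):

```mlir
%l = vector.maskedload %base[%c0], %mask, %pass {alignment = 8 : i64}
    : memref<?xf32>, vector<16xi1>, vector<16xf32> into vector<16xf32>
```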
This patchset:
* propagates alignment attributes from memref operations into the SPIR-V
dialect,
* fixes an error in the logic that previously propagated alignment
attributes but did not add other MemoryAccess attributes, and
* adds a failure condition for the case where the alignment attribute
from the memref dialect (64-bit wide) does not fit in SPIR-V's alignment
attribute (specified to be 32-bit wide).
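A sketch of the resulting access on the SPIR-V side (storage class and type
illustrative):

```mlir
%v = spirv.Load "StorageBuffer" %ptr ["Aligned", 4] : f32
```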
Propagate alignment attribute from operations in the memref dialect to
the LLVM dialect.
Possible improvement: maybe the alignment attribute in LLVM's store and
load operations should be confined to i64? I believe that way one can avoid
spelling out the type in the attribute dictionary, i.e., going from
`{ alignment = 32 : i64 }` to `{ alignment = 32 }`.
Since `resultTy` might be nullptr, we should use `dyn_cast` instead of
`cast`. Additionally, `typeConverter->convertType<T>` is more
appropriate in this context.
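A sketch of the fixed pattern (op and type names illustrative):

```cpp
// convertType<T>() returns a null T when conversion fails or yields a
// different type, folding the dyn_cast into the conversion itself.
auto resultTy = typeConverter->convertType<VectorType>(op.getType());
if (!resultTy)
  return rewriter.notifyMatchFailure(op, "unsupported result type");
```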
Adds an initial conversion in the Memref -> SPIR-V lowering for images.
Any memref in the "Image" storage-class/address-space will be considered
for lowering to the `!spirv.image` type during Memref to SPIR-V
conversion. Initially, only reading of sampled images is supported, and
images are read via the `OpImageFetch` instruction. Future work should
expand the conversion patterns to target non-sampled images and add
support for image write operations.
Images are supported for fp32, fp16, int32, uint32, int16, and uint16
types. Lit tests have been added to verify these cases, along with negative
tests checking the cases where images aren't supported.
---------
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
The `gpu.module` operation can contain `spirv.target_env` attributes
within an array attribute named `"targets"`. This patch accounts for that
case by iterating over the `"targets"` attribute, if present, and looking
up `spirv.target_env`.
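The handled form looks like this (VCE triple and resource limits
illustrative):

```mlir
gpu.module @kernels [#spirv.target_env<#spirv.vce<v1.0, [Shader], []>,
                                       #spirv.resource_limits<>>] {
}
```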
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This patch adds support for lowering several float classification ops
from the Math dialect to the SPIR-V dialect.
### Highlights:
- Introduced a new `spirv.IsFinite` operation corresponding to the
SPIR-V `OpIsFinite` instruction.
- Lowered `math.isfinite`, `math.isinf`, and `math.isnan` to SPIR-V
using `CheckedElementwiseOpPattern`.
- Added corresponding tests for op definition and conversion lowering.
This addresses the discussion in:
https://github.com/llvm/llvm-project/issues/150778
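Illustrative conversions (each op returns `i1`):

```mlir
%nan = math.isnan %x : f32     // -> spirv.IsNan %x : f32
%inf = math.isinf %x : f32     // -> spirv.IsInf %x : f32
%fin = math.isfinite %x : f32  // -> spirv.IsFinite %x : f32
```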
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
`offset` and `width` must be constants, and there are constraints on their
values. Update the operation definition to use attributes instead of
operands.
Do not access the erased `scf.while` operation in the lowering pattern.
That won't work anymore in a One-Shot Dialect Conversion and triggers a
use-after-free sanitizer error.
After the One-Shot Dialect Conversion refactoring, a
`ConversionPatternRewriter` will behave more like a normal
`PatternRewriter`.
This patch removes spurious includes of `llvm/IR` files, and unnecessary
link components in the LLVM dialect.
The only major dependencies still coming from LLVM are
`llvm::DataLayout`, which is used by `verifyDataLayoutString` and some
`dwarf` symbols in some attributes. Both of them should likely be
removed in the future.
Finally, I also removed one constructor from `LLVM::AssumeOp` that used
[OperandBundleDefT](https://llvm.org/doxygen/classllvm_1_1OperandBundleDefT.html)
without good reason and introduced a header unnecessarily.
Fixes
https://github.com/llvm/llvm-project/pull/149879#issuecomment-3117145615
Note this happens because overload resolution can't disambiguate between
`mlir::DenseI64ArrayAttr` and `llvm::ArrayRef<int64_t>` **for the literal
0**: `0` is a null pointer constant (and attributes wrap a pointer), while
it is also an integer convertible to `int64_t`. This only occurs with the
value 0.
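An illustrative reduction of the ambiguity (not the actual call site):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "mlir/IR/BuiltinAttributes.h"

void f(mlir::DenseI64ArrayAttr attr);
void f(llvm::ArrayRef<int64_t> values);

// f(0); // error: ambiguous. The literal 0 is a null pointer constant
//       // (viable for the attribute overload) and also an int convertible
//       // to int64_t (viable for ArrayRef's one-element constructor).
```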
This PR uses `val.getDefiningOp<OpTy>()` to replace `dyn_cast<OpTy>(val.getDefiningOp())`, `dyn_cast_or_null<OpTy>(val.getDefiningOp())`, and `dyn_cast_if_present<OpTy>(val.getDefiningOp())`.
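For example (op name illustrative):

```cpp
// Before: two steps, with the null check baked into the cast.
auto cst = dyn_cast_or_null<arith::ConstantOp>(val.getDefiningOp());
// After: one call that is null-safe by construction.
auto cst2 = val.getDefiningOp<arith::ConstantOp>();
```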
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The current arith-to-amdgpu patterns for scaling_extf and scaling_truncf
don't take full advantage of the native packing ability of the
intrinsics being targeted. Scaling extension takes the location of the
two elements to be extended as a constant argument (a byte for fp4, a half
for fp8), and scaling truncation takes a 32-bit input register and a byte
or half to write the truncated values to.
Not using these features would cause excess unneeded register pressure.
This PR resolves the inefficiency.
It also adds a test for the expected use case of extending or truncating a
block of 32 values to/from fp4 with a uniform scale, to ensure that this
usage involves a minimal amount of vector shuffling.