llvm-project

Author	SHA1	Message	Date
Jakub Kuderski	59e44799bd	[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487 ) Pre-committing this before landing the new check in https://github.com/llvm/llvm-project/pull/177892	2026-01-28 19:13:47 +00:00
Krzysztof Drewniak	003b28d031	[mlir] Move affine's FoldMemRefAliasOps into its own pass (#172548 ) I'm planning to introduce an interface that'll allow FoldMemRefAliasOps to not know about dialects like NVVM or GPU. To do this, however, I need to get the `affine` ops (which need special handling in order to handle their implicit affine maps) into a separate pass, analogously to how `amdgpu` ops have these patterns under their dialect and not under `memref`. This commit also changes the expand/collapse_shape index resolvers to return `void`, since they never actually failed and to make it clearer that they modify IR. (Note: An LLM did the initial refactoring and test movement, I've reviewed the results and edited them some.)	2026-01-02 10:13:42 -08:00
Krzysztof Drewniak	3f2e3e67c1	[mlir][AMDGPU][NFC] Fix overlapping masked load refinements (#159805 ) The two patterns for handling vector.maskedload on AMD GPUs had an overlap - both the "scalar mask becomes an if statement" pattern and the "masked loads become a normal load + a select on buffers" patterns could handle a load with a broadcast mask on a fat buffer resource. This commit adds checks to resolve the overlap.	2025-12-02 11:02:45 -08:00
x12301450	b582670f6b	[MLIR][BUG] fix {$VARIABLE} usage in CMakeLists.txt (#156183 ) This pr fixed #156182	2025-09-02 10:32:51 +08:00
sebvince	8949dc7f9c	[mlir][amdgpu] fold memref.subview/expand_shape/collapse_shape into amdgpu.gather_to_lds for DST operand (#152277 )	2025-08-08 05:47:33 -07:00
Maksim Levental	967626b842	[mlir][NFC] update `mlir/Dialect` create APIs (14/n) (#149920 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-24 13:03:47 -05:00
Alan Li	1c3e4e994b	Reapply "[AMDGPU] fold `memref.subview/expand_shape/collapse_shape` into `amdgpu.gather_to_lds`" (#150334 ) This is a reapply of patch #149851. The reapply also fixes a CMake/Bazel build issue, which was the reason for the revert. (Thanks @rupprecht ) Original patch (#149851) message: ----- This PR adds a new optimization pass to fold `memref.subview/expand_shape/collapse_shape` ops into consumer `amdgpu.gather_to_lds` operations. * Implements a new pass `AmdgpuFoldMemRefOpsPass` with pattern `FoldMemRefOpsIntoGatherToLDSOp` * Adds corresponding folding tests	2025-07-24 09:23:15 -04:00
Kazu Hirata	0925d7572a	[mlir] Remove unused includes (NFC) (#150266 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-23 15:18:53 -07:00
Alan Li	9cb5c00bf7	Revert "[AMDGPU] fold `memref.subview/expand_shape/collapse_shape` into `amdgpu.gather_to_lds` (#149851)" (#150256 ) This reverts commit dbc63f1e3724b6f2348c431dc1216537d9c042e8. Having build deps issue.	2025-07-23 12:50:26 -04:00
Alan Li	dbc63f1e37	[AMDGPU] fold `memref.subview/expand_shape/collapse_shape` into `amdgpu.gather_to_lds` (#149851 ) This PR adds a new optimization pass to fold `memref.subview/expand_shape/collapse_shape` ops into consumer `amdgpu.gather_to_lds` operations. * Implements a new pass `AmdgpuFoldMemRefOpsPass` with pattern `FoldMemRefOpsIntoGatherToLDSOp` * Adds corresponding folding tests --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-23 11:22:41 -04:00
Kazu Hirata	cac806bcc5	[mlir] Remove unused includes (NFC) (#148535 )	2025-07-13 13:13:01 -07:00
Kunwar Grover	f96492221d	[mlir][AMDGPU] Add better load/store lowering for full mask (#146748 ) This patch adds a better maskedload/maskedstore lowering on the amdgpu backend for loads which are either fully masked or fully unmasked. For these cases, we can either generate an OOB buffer load with no if condition, or we can generate a normal load with an if condition (if no fat_raw_buffer space).	2025-07-10 16:11:19 +01:00
Zhuoran Yin	6a97b56ce5	[MLIR][AMDGPU] Redirect transfer read to masked load lowering (#146705 ) This PR reworks https://github.com/llvm/llvm-project/pull/131803. Instead of applying the optimization on the transfer_read op, which is too high level, it redirects the pre-existing pattern onto the maskedload op. This simplifies the implementation of the lowering pattern. This also allows moving the usage of the pass to a target dependent pipeline. Signed-off-by: jerryyin <zhuoryin@amd.com>	2025-07-02 18:24:44 +01:00
Kunwar Grover	6729da647a	[mlir][amdgpu][nfc] Add PatternBenefit to populate methods (#144663 )	2025-06-18 15:19:17 +02:00
Max Graey	8aaac80ddd	[NFC] Use more isa and isa_and_nonnull instead of dyn_cast for predicates (#137393 ) Also fix some typos in comments --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2025-05-13 22:34:42 +08:00
Andrzej Warzyński	c45cc3e420	[mlir][vector] Standardize `base` Naming Across Vector Ops (NFC) (#137859 ) This change standardizes the naming convention for the argument representing the value to read from or write to in Vector ops that interface with Tensors or MemRefs. Specifically, it ensures that all such ops use the name `base` (i.e., the base address or location to which offsets are applied). Updated operations: * `vector.transfer_read`, * `vector.transfer_write`. For reference, these ops already use `base`: * `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`, `vector.expandload`, `vector.compressstore`, `vector.maskedstore`, `vector.maskedload`. This is a non-functional change (NFC) and does not alter the semantics of these operations. However, it does require users of the XFer ops to switch from `op.getSource()` to `op.getBase()`. To ease the transition, this PR temporarily adds a `getSource()` interface method for compatibility. This is intended for downstream use only and should not be relied on upstream. The method will be removed prior to the LLVM 21 release. Implements #131602	2025-05-12 09:44:50 +01:00
Zhuoran Yin	53e8ff13bd	[MLIR] Fixing the memref linearization size computation for non-packed memref (#138922 ) Credit to @krzysz00 who discovered this subtle bug in `MemRefUtils`. The problem is in the `getLinearizedMemRefOffsetAndSize()` utility. In particular, how this subroutine computes the linearized size of a memref is incorrect when given a non-packed memref. ### Background As context, in a packed memref of `memref<8x8xf32>`, we'd compute the size by multiplying the sizes of the dimensions together. This is implemented by composing an affine_map of `affine_map<()[s0, s1] -> (s0 * s1)>` and then computing the resulting size via `%size = affine.apply #map()[%c8, %c8]`. However, this is wrong for a non-packed memref of `memref<8x8xf32, strided<[1024, 1]>>`. Since the previously computed multiplication map only considers the dimension sizes, it would still conclude that the size of the non-packed memref is 64. ### Solution This PR comes up with a fix such that the linearized size computation takes strides into consideration. It computes the maximum of (dim size * dim stride) for each dimension. We'd compute the size via the affine_map of `affine_map<()[stride0, size0, size1] -> ((stride0 * size0), 1 * size1)>` and then compute the size via `%size = affine.max #map()[%stride0, %size0, %size1]`. In particular for the new non-packed memref, the size will be derived as max(1024 * 8, 1 * 8) = 8192 (rather than the wrong size 64 computed by the packed memref equation).	2025-05-08 13:14:32 -04:00
Kazu Hirata	15f7c6ed70	[mlir] Remove unused local variables (NFC) (#138481 )	2025-05-05 10:08:00 -07:00
Zhuoran Yin	47f4f39265	[MLIR][AMDGPU] Fixing word alignment check for bufferload fastpath (#135982 ) The `delta_bytes % (32 ceilDiv elementBitwidth) != 0` condition in https://github.com/llvm/llvm-project/pull/135014 is incorrect. For example, suppose the last load is issued to load only the one last fp16 element. Then `delta bytes = 2` and `(32 ceildiv 16) = 2`, so the load is judged to be word aligned. It is sent to the fast path but gets all zeros for the fp16 because it crosses the word boundary. In reality the equation should be just `delta_bytes % 4`, since a word is 4 bytes. This PR fixes the bug by amending the mod target to 4.	2025-04-17 08:50:31 -04:00
Zhuoran Yin	2b983a2458	[MLIR][AMDGPU] Adding dynamic size check to avoid subword buffer load (#135014 ) Motivation: the amdgpu buffer load instruction will return all zeros when loading sub-word values. For example, assuming the buffer size is exactly one word and we attempt to invoke `llvm.amdgcn.raw.ptr.buffer.load.v2i32` starting from byte 2 of the word, we will not receive the actual value of the buffer but all zeros for the first word. This is because the boundary has been crossed for the first word. This PR fixes the problem by creating a bounds check against the buffer load instruction. It compares the offset + vector size against the buffer size to see if the upper bound of the address will exceed the buffer size. If it does, the masked transfer read will be optimized to `vector.load` + `arith.select`; otherwise, it will continue to fall back to the default lowering of the masked vector load.	2025-04-15 16:36:25 -04:00
Zhuoran Yin	ea03bdee70	[MLIR][AMDGPU] Adding Vector transfer_read to load rewrite pattern (#131803 ) This PR adds the Vector transfer_read to load rewrite pattern. The pattern creates a transfer read op lowering. A vector transfer read op will be lowered to a combination of `vector.load`, `arith.select` and `vector.broadcast` if: - The transfer op is masked. - The memref is in buffer address space. - Other conditions introduced from `TransferReadToVectorLoadLowering` The motivation for this PR is the lack of support for masked loads in the amdgpu backend. `llvm.intr.masked.load` lowers to a series of conditional scalar loads (see the `scalarize-masked-mem-intrin` pass). This PR makes it possible for a masked transfer_read to be lowered towards a buffer load with bounds check, allowing a more optimized global load access pattern compared with the existing implementation of `llvm.intr.masked.load` on vectors.	2025-03-21 08:42:04 -04:00
Daniel Hernandez-Juarez	64f67f870d	[mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd for bf16 on gfx942 (#129029 ) - Change to make sure architectures < gfx950 emulate bf16 buffer_atomic_fadd - Add tests for bf16 buffer_atomic_fadd and architectures: gfx12, gfx942 and gfx950 --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>	2025-03-13 14:30:45 -05:00
Krzysztof Drewniak	42526d240c	[mlir][AMDGPU] Plumb address space 7 through MLIR, add address_space attr. (#125594 ) This commit adds support for casting memrefs into fat raw buffer pointers to the AMDGPU dialect. Fat raw buffer pointers - or, in LLVM terms, ptr addrspace(7) - allow encapsulating a buffer descriptor (as produced by the make.buffer.rsrc intrinsic or provided from some API) into a pointer that supports ordinary pointer operations like load or store. This allows people to take advantage of the additional semantics that buffer_load and similar instructions provide without forcing the use of entirely separate amdgpu.raw_buffer_* operations. Operations on fat raw buffer pointers are translated to the corresponding LLVM intrinsics by the backend. This commit also goes and defines a #amdgpu.address_space<> attribute so that AMDGPU-specific memory spaces can be represented. Only #amdgpu.address_space<fat_raw_buffer> will work correctly with the memref dialect, but the other possible address spaces are included for completeness. --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Co-authored-by: Prashant Kumar <pk5561@gmail.com>	2025-02-26 16:02:39 -06:00
Fabian Ritter	8900e412ae	[AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in MLIR (#125836 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631	2025-02-19 10:05:45 +01:00
Krzysztof Drewniak	9596e83b2a	[mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd on gfx11 (#108312 ) * Fix a bug introduced by the Chipset refactoring in #107720 where atomics emulation for adds was mistakenly applied to gfx11+ * Add the case needed for gfx11+ atomic emulation, namely that gfx11 doesn't support atomically adding a v2f16 or v2bf16, thus requiring MLIR-level legalization for buffer intrinsics that attempt to do such an addition * Add tests, including tests for gfx11 atomic emulation Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>	2024-09-12 09:47:52 -05:00
Jakub Kuderski	763bc9249c	[mlir][amdgpu] Align Chipset with TargetParser (#107720 ) Update the Chipset struct to follow the `IsaVersion` definition from llvm's `TargetParser`. This is a follow up to https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012. * Add the stepping version. Note: This may break downstream code that compares against the minor version directly. * Use comparisons with full Chipset version where possible. Note that we can't use the code in `TargetParser` directly because the chipset utility is outside of `mlir/Target` that re-exports llvm's target library.	2024-09-09 11:12:26 -04:00
Jakub Kuderski	44718311de	[mlir][amdgpu] Remove shared memory optimization pass (#88225 ) This implementation has a number of issues and ultimately does not work on gfx9. * It does not reduce bank conflicts with wide memory accesses. * It does not correctly account for when LDS bank conflicts occur on amdgpu. * The implementation is too fragile to be used on real-world code. For example, the code bails out on any `memref.subview` in the root op, even when the subview is not a user of any of the `memref.alloc` ops. I do not see how these can be easily fixed, therefore I think it's better to delete this code.	2024-04-11 11:07:17 -04:00
Thomas Preud'homme	36cf982d6c	[MLIR] Add missing MLIRFuncDialect dep to MLIRAMDGPUTransforms (#84550 ) This fixes the following failure when doing a clean build (in particular no .ninja* lying around) of lib/libMLIRAMDGPUTransforms.a only: ``` In file included from mlir/lib/Dialect/AMDGPU/Transforms/OptimizeSharedMemory.cpp:21: mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory ```	2024-03-11 23:08:56 +00:00
erman-gurses	87c0260f45	[AMDGPU] Add parameterization for optimized shared memory variables (#82508 ) - This PR adds parameterization for shared memory variables that are used for optimization: `sharedMemoryLineSizeBytes` and `defaultVectorSizeBits`. - The default values are set to 128 for both variables since that gives zero bank conflicts.	2024-02-27 23:28:12 -05:00
erman-gurses	04381c106f	[MLIR][AMDGPU]Add refactoring for shared-mem optimization (#81791 ) Addressing the issues in this PR: https://github.com/llvm/llvm-project/pull/81550	2024-02-15 13:53:15 -05:00
erman-gurses	29d1aca05c	[AMDGPU][MLIR]Add shmem-optimization as an op using transform dialect (#81550 ) This PR adds functionality to use shared memory optimization as an op using transform dialect.	2024-02-13 17:42:04 -08:00
erman-gurses	3f37df5b71	[reland][mlir][amdgpu] Shared memory access optimization pass (#79164 ) - Reland: https://github.com/llvm/llvm-project/pull/75627 - Reproduced then fixed the build issue	2024-01-25 07:44:45 -08:00
Mehdi Amini	e611a4cf80	Revert "[mlir][amdgpu] Shared memory access optimization pass" (#78822 ) Reverts llvm/llvm-project#75627 ; it broke the bot: https://lab.llvm.org/buildbot/#/builders/61/builds/53218	2024-01-19 16:41:43 -08:00
erman-gurses	b7360fbe8c	[mlir][amdgpu] Shared memory access optimization pass (#75627 ) It implements transformation to optimize accesses to shared memory. Reference: https://reviews.llvm.org/D127457 _This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by newColIdx = col % vecSize + perm[row](col / vecSize, row) where perm is a permutation function indexed by row and vecSize is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory._	2024-01-19 15:44:45 -08:00
Krzysztof Drewniak	ba6d7a0f25	[mlir][AMDGPU] Add gfx941 to buffer atomics emulation Reviewed By: fmorac Differential Revision: https://reviews.llvm.org/D152299	2023-09-13 16:07:07 +00:00
Daniil Dudkin	8a6e54c9b3	[mlir][arith] Rename operations: `maxf` → `maximumf`, `minf` → `minimumf` (#65800 ) This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671. This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.	2023-09-11 22:02:19 -07:00
Mehdi Amini	363b655920	Finish renaming getOperandSegmentSizeAttr() from `operand_segment_sizes` to `operandSegmentSizes` This renaming started with the native ODS support for properties, this is completing it. A mass automated textual rename seems safe for most codebases. Drop also the ods prefix to keep the accessors the same as they were before this change: properties.odsOperandSegmentSizes reverts back to: properties.operandSegmentSizes The ODS prefix was creating divergence between all the places and made it harder to be consistent. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D157173	2023-08-09 19:37:01 -07:00
Matthias Springer	71d50c890b	[mlir][IR] Improve listener notifications for ops without results `RewriterBase::Listener::notifyOperationReplaced` notifies observers that an op is about to be replaced with a range of values. This notification is not very useful for ops without results, because it does not specify the replacement op (and it cannot be deduced from the replacement values). It provides no additional information over the `notifyOperationRemoved` notification. This revision adds an additional notification when a rewriter replaces an op with another op. By default, this notification triggers the original "op replaced with values" notification, so there is no functional change for existing code. This new API is useful for the transform dialect, which needs to track op replacements. (Updated in a subsequent revision.) Also includes minor documentation improvements. Differential Revision: https://reviews.llvm.org/D152814	2023-06-14 08:51:14 +02:00
Tres Popp	5550c82189	[mlir] Move casting calls from methods to function calls The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name. This change begins the migration of uses of the method to the corresponding function call as has been decided as more consistent. Note that there still exist classes that only define methods directly, such as AffineExpr, and this does not include work currently to support a functional cast/isa call. Caveats include: - This clang-tidy script probably has more problems. - This only touches C++ code, so nothing that is being generated. Context: - https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…" - Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443 Implementation: This first patch was created with the following steps. The intention is to only do automated changes at first, so I waste less time if it's reverted, and so the first mass change is more clear as an example to other teams that will need to follow similar steps. Steps are described per line, as comments are removed by git: 0. Retrieve the change from the following to build clang-tidy with an additional check: https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check 1. Build clang-tidy 2. Run clang-tidy over your entire codebase while disabling all checks and enabling the one relevant one. Run on all header files also. 3. Delete .inc files that were also modified, so the next build rebuilds them to a pure state. 4. Some changes have been deleted for the following reasons: - Some files had a variable also named cast - Some files had not included a header file that defines the cast functions - Some files are definitions of the classes that have the casting methods, so the code still refers to the method instead of the function without adding a prefix or removing the method declaration at the same time. ``` ninja -C $BUILD_DIR clang-tidy run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\ -header-filter=mlir/ mlir/ -fix rm -rf $BUILD_DIR/tools/mlir/**/*.inc git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\ mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\ mlir/lib/**/IR/\ mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\ mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\ mlir/test/lib/Dialect/Test/TestTypes.cpp\ mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\ mlir/test/lib/Dialect/Test/TestAttributes.cpp\ mlir/unittests/TableGen/EnumsGenTest.cpp\ mlir/test/python/lib/PythonTestCAPI.cpp\ mlir/include/mlir/IR/ ``` Differential Revision: https://reviews.llvm.org/D150123	2023-05-12 11:21:25 +02:00
Krzysztof Drewniak	cc4703745f	[mlir][AMDGPU] Add emulation pass for atomics on AMDGPU targets Not all AMDGPU targets support all atomic operations. For example, there are no atomic floating-point adds on the gfx10 series. Add a pass to emulate these operations using a compare-and-swap loop, by analogy to the generic atomicrmw rewrite in MemrefToLLVM. This pass is named generally, as in the future we may have a memref-to-amdgpu that translates constructs like atomicrmw fmax (which doesn't generally exist in LLVM) to the relevant intrinsics, which may themselves require emulation. Since the AMDGPU dialect now has a pass that operates on it, the dialect's directory structure is reorganized to match other similarly complex dialects. The pass should be run before amdgpu-to-rocdl if desired. This commit also adds f64 support to atomic_fmax. Depends on D148722 Reviewed By: nirvedhmeshram Differential Revision: https://reviews.llvm.org/D148724	2023-05-03 21:18:48 +00:00
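
Note on 53e8ff13bd: the size arithmetic in that message can be checked with a small static-shape sketch. This is not the `getLinearizedMemRefOffsetAndSize()` utility itself, which builds affine maps over possibly dynamic values. The helper names `packedSize` and `linearizedSize` are hypothetical, and the sketch assumes rank >= 1 with positive static sizes and strides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: the packed-layout formula, i.e. the product of the
// dimension sizes. This is the computation the commit describes as wrong
// for non-packed memrefs.
int64_t packedSize(const std::vector<int64_t> &sizes) {
  int64_t product = 1;
  for (int64_t size : sizes)
    product *= size;
  return product;
}

// Hypothetical helper: the stride-aware formula from the commit message,
// i.e. the maximum over all dimensions of (dim size * dim stride).
int64_t linearizedSize(const std::vector<int64_t> &sizes,
                       const std::vector<int64_t> &strides) {
  assert(sizes.size() == strides.size() && "one stride per dimension");
  int64_t extent = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i)
    extent = std::max(extent, sizes[i] * strides[i]);
  return extent;
}
```

For `memref<8x8xf32, strided<[1024, 1]>>`, `packedSize({8, 8})` gives 64 while `linearizedSize({8, 8}, {1024, 1})` gives 8192, matching the numbers in the message. For a packed row-major layout (strides `{8, 1}`) the two helpers agree.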

40 Commits
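
Note on 47f4f39265: the fp16 example in that message can be reproduced with plain integer arithmetic. The sketch below only restates the two predicates quoted in the message. The function names are hypothetical and are not taken from the MLIR sources, where the check is emitted as IR rather than evaluated on host integers.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division for positive integers.
constexpr int64_t ceilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Hypothetical name: the predicate the commit message calls incorrect,
// delta_bytes % (32 ceilDiv elementBitwidth) == 0.
bool isWordAlignedOld(int64_t deltaBytes, int64_t elementBitwidth) {
  assert(elementBitwidth > 0 && "element bit width must be positive");
  return deltaBytes % ceilDiv(32, elementBitwidth) == 0;
}

// Hypothetical name: the corrected predicate. A word is 4 bytes, so the
// remaining byte count is word aligned only if it is a multiple of 4.
bool isWordAligned(int64_t deltaBytes) { return deltaBytes % 4 == 0; }
```

With one trailing fp16 element (`deltaBytes = 2`, `elementBitwidth = 16`), `isWordAlignedOld(2, 16)` returns true because `32 ceilDiv 16 = 2`, whereas `isWordAligned(2)` returns false. That difference is the misclassification the commit describes.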