llvm-project

Author	SHA1	Message	Date
Matthias Springer	5fcf907b34	[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260 ) This commit renames 4 pattern rewriter API functions: * `updateRootInPlace` -> `modifyOpInPlace` * `startRootUpdate` -> `startOpModification` * `finalizeRootUpdate` -> `finalizeOpModification` * `cancelRootUpdate` -> `cancelOpModification` The term "root" is a misnomer. The root is the op that a rewrite pattern matches against (https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional). A rewriter must be notified of all in-place op modifications, not just in-place modifications of the root (https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old function names were confusing and have contributed to various broken rewrite patterns. Note: The new function names use the term "modify" instead of "update" for consistency with the `RewriterBase::Listener` terminology (`notifyOperationModified`).	2024-01-17 11:08:59 +01:00
Matthias Springer	0a8e3dd432	[mlir][Interfaces] `DestinationStyleOpInterface`: Rename `hasTensor/BufferSemantics` (#77574 ) Rename interface functions as follows: * `hasTensorSemantics` -> `hasPureTensorSemantics` * `hasBufferSemantics` -> `hasPureBufferSemantics` These two functions return "true" if the op has tensor/buffer operands but not buffer/tensor operands. Also drop the "ranked" part from the interface, i.e., do not distinguish between ranked/unranked types. The new function names describe the functions more accurately. They also align their semantics with the notion of "tensor semantics" with the bufferization framework. (An op is supposed to be bufferized if it has tensor operands, and we don't care if it also has memref operands.) This change is in preparation of #75273, which adds `BufferizableOpInterface::hasTensorSemantics`. By renaming the functions in the `DestinationStyleOpInterface`, we can avoid name clashes between the two interfaces.	2024-01-12 10:02:54 +01:00
Diego Caballero	98f6289a34	[mlir][Vector] Add support for Value indices to vector.extract/insert `vector.extract/insert` ops only support constant indices. This PR is extending them so that arbitrary values can be used instead. This work is part of the RFC: https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops Differential Revision: https://reviews.llvm.org/D155034	2023-09-22 00:39:32 +00:00
Matthias Springer	15ea2306a4	[mlir][NVGPU] Support N-D masks in transform.nvgpu.create_async_groups Support IR that is generated by the vector-to-scf lowering of N-D vector transfers with a mask. (Until now only 1-D and 2-D transfers were supported.) Only transfers that were fully unrolled are supported. Differential Revision: https://reviews.llvm.org/D157286	2023-08-08 14:30:03 +02:00
Matthias Springer	39d8876da3	[mlir][NVGPU] Support 2D masks in transform.nvgpu.create_async_groups Support IR that is generated by the vector-to-scf lowering of 2D vector transfers with a mask. Only 2D transfers that were fully unrolled are supported at the moment. Differential Revision: https://reviews.llvm.org/D156695	2023-08-07 15:46:54 +02:00
Matthias Springer	db393288ff	[mlir][NVGPU][transform] Add `create_async_groups` transform op This transform looks for suitable vector transfers from global memory to shared memory and converts them to async device copies. Differential Revision: https://reviews.llvm.org/D155569	2023-07-18 14:36:41 +02:00
Matthias Springer	a4f4d82c35	[mlir][NVGPU][NFC] Clean up code structure * Move passes to `Transforms` directory. * Add `Utils.h` (will be utilized in a subsequent change). Differential Revision: https://reviews.llvm.org/D155427	2023-07-17 14:15:42 +02:00
Matthias Springer	61223c49dd	[mlir][GPU] Rename MLIRGPUOps CMake target to MLIRGPUDialect This is for consistency with other dialects. Differential Revision: https://reviews.llvm.org/D150659	2023-05-16 14:25:08 +02:00
Tres Popp	5550c82189	[mlir] Move casting calls from methods to function calls The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name. This change begins the migration of uses of the method to the corresponding function call as has been decided as more consistent. Note that there still exist classes that only define methods directly, such as AffineExpr, and this does not include work currently to support a functional cast/isa call. Caveats include: - This clang-tidy script probably has more problems. - This only touches C++ code, so nothing that is being generated. Context: - https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…" - Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443 Implementation: This first patch was created with the following steps. The intention is to only do automated changes at first, so I waste less time if it's reverted, and so the first mass change is more clear as an example to other teams that will need to follow similar steps. Steps are described per line, as comments are removed by git: 0. Retrieve the change from the following to build clang-tidy with an additional check: https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check 1. Build clang-tidy 2. Run clang-tidy over your entire codebase while disabling all checks and enabling the one relevant one. Run on all header files also. 3. Delete .inc files that were also modified, so the next build rebuilds them to a pure state. 4. Some changes have been deleted for the following reasons: - Some files had a variable also named cast - Some files had not included a header file that defines the cast functions - Some files are definitions of the classes that have the casting methods, so the code still refers to the method instead of the function without adding a prefix or removing the method declaration at the same time. ``` ninja -C $BUILD_DIR clang-tidy run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-,misc-cast-functions'\ -header-filter=mlir/ mlir/ -fix rm -rf $BUILD_DIR/tools/mlir/*/.inc git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\ mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\ mlir/lib/**/IR/\ mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\ mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\ mlir/test/lib/Dialect/Test/TestTypes.cpp\ mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\ mlir/test/lib/Dialect/Test/TestAttributes.cpp\ mlir/unittests/TableGen/EnumsGenTest.cpp\ mlir/test/python/lib/PythonTestCAPI.cpp\ mlir/include/mlir/IR/ ``` Differential Revision: https://reviews.llvm.org/D150123	2023-05-12 11:21:25 +02:00
Matthias Springer	3230a64936	[mlir][NVGPU] Fix incorrect API usage in RewritePatterns Incorrect API usage was detected by D144552. Differential Revision: https://reviews.llvm.org/D145156	2023-03-02 16:15:09 +01:00
Christopher Bate	6ca1a09f03	[mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr) This is a purely mechanical change that introduces an enum attribute in the GPU dialect to represent the various memref memory spaces as opposed to the hard-coded integer attributes that are currently used. The following steps were taken to make the transition across the codebase: 1. Introduce a pass "gpu-lower-memory-space-attributes": The pass updates all memref types that have a memory space attribute that is a `gpu::AddressSpaceAttr`. These attributes are changed to `IntegerAttr`'s using a mapping that is given by the caller. This pass is based on the "map-memref-spirv-storage-class" pass and the common functions can probably be refactored into a set of utilities under the MemRef dialect. 2. Update the verifiers of GPU/NVGPU dialect operations. If a verifier currently checks the address space of an operand using e.g.`getWorkspaceAddressSpace`, then it can continue to do so. However, the checks are changed to only fail if the memory space is either missing or a wrong value of type `gpu::AddressSpaceAttr`. Otherwise, it just assumes the address space is correct because it was specifically lowered to something other than a `gpu::AddressSpaceAttr`. 3. Update existing gpu-to-llvm conversion infrastructure. In the existing gpu-to-X passes, we add a full conversion equivalent to `gpu-lower-memory-space-attributes` just before doing the conversion to the LLVMDialect. This is done because currently both the gpu-to-llvm passes (rocdl,nvvm) run gpu-to-gpu rewrites within the pass, which introduce `AddressSpaceAttr` memory space annotations. Therefore, I inserted the memory space conversion between the gpu-to-gpu rewrites and the LLVM conversion. For more context see the below discourse discussion: https://discourse.llvm.org/t/gpu-workgroup-shared-memory-address-space-is-hard-coded/ Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D140644	2023-01-13 11:00:10 -07:00
Ramkumar Ramachandra	22426110c5	mlir/tblgen: use std::optional in generation This is part of an effort to migrate from llvm::Optional to std::optional. This patch changes the way mlir-tblgen generates .inc files, and modifies tests and documentation appropriately. It is a "no compromises" patch, and doesn't leave the user with an unpleasant mix of llvm::Optional and std::optional. A non-trivial change has been made to ControlFlowInterfaces to split one constructor into two, relating to a build failure on Windows. See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716 Signed-off-by: Ramkumar Ramachandra <r@artagnon.com> Differential Revision: https://reviews.llvm.org/D138934	2022-12-17 11:13:26 +01:00
Jakub Kuderski	abc362a107	[mlir][arith] Change dialect name from Arithmetic to Arith Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22. Tested with: `ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples` and `bazel build --config=generic_clang @llvm-project//mlir:all`. Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini Differential Revision: https://reviews.llvm.org/D134762	2022-09-29 11:23:28 -04:00
Mehdi Amini	1e7b48a6c3	Apply clang-tidy fixes for readability-identifier-naming in OptimizeSharedMemory.cpp (NFC)	2022-09-06 09:59:30 +00:00
Michele Scuttari	67d0d7ac0a	[MLIR] Update pass declarations to new autogenerated files The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure. Reviewed By: mehdi_amini, rriddle Differential Review: https://reviews.llvm.org/D132838	2022-08-31 12:28:45 +02:00
Michele Scuttari	039b969b32	Revert "[MLIR] Update pass declarations to new autogenerated files" This reverts commit 2be8af8f0e0780901213b6fd3013a5268ddc3359.	2022-08-30 22:21:55 +02:00
Michele Scuttari	2be8af8f0e	[MLIR] Update pass declarations to new autogenerated files The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure. Reviewed By: mehdi_amini, rriddle Differential Review: https://reviews.llvm.org/D132838	2022-08-30 21:56:31 +02:00
Thomas Raoux	91594b5b98	[mlir][nvpu] Prevent F32ToTF32 pattern to generate illegal IR We shouldn't apply this pattern to non F32->F32 mma.sync operations. Differential Revision: https://reviews.llvm.org/D131902	2022-08-15 16:46:18 +00:00
Manish Gupta	14d79afeae	[mlir][NVGPU] nvgpu.mmasync on F32 through TF32 Adds optional attribute to support tensor cores on F32 datatype by lowering to `mma.sync` with TF32 operands. Since, TF32 is not a native datatype in LLVM we are adding `tf32Enabled` as an attribute to allow the IR to be aware of `MmaSyncOp` datatype. Additionally, this patch adds placeholders for nvgpu-to-nvgpu transformation targeting higher precision tf32x3. For mma.sync on f32 input using tensor cores there are two possibilites: (a) tf32 (1 `mma.sync` per warp-level matrix-multiply-accumulate) (b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate) Typically, tf32 tensor core acceleration comes at a cost of accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10 precision bits. tf32x3 aims to recover the precision bits by splitting each operand into two tf32 values and issue three `mma.sync` tensor core operations. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D130294	2022-08-01 23:23:27 +00:00
Jacques Pienaar	136d746ec7	[mlir] Flip accessors to prefixed form (NFC) Another mechanical sweep to keep diff small for flip to _Prefixed.	2022-07-10 21:19:11 -07:00
Jacques Pienaar	8df54a6a03	[mlir] Update accessors to prefixed form (NFC) Follow up from flipping dialects to both, flip accessor used to prefixed variant ahead to flipping from _Both to _Prefixed. This just flips to the accessors introduced in the preceding change which are just prefixed forms of the existing accessor changed from. Mechanical change using helper script https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.	2022-06-18 17:53:22 -07:00
Christopher Bate	829c84ec5b	[mlir][nvgpu] fix MSVC warning regarding left shift Differential Revision: https://reviews.llvm.org/D128088	2022-06-17 16:26:40 -06:00
Christopher Bate	d089d68a2c	[mlir][nvgpu] fix missing build dependency for NVGPUTransforms Fixes build failure caused by 51b925df941a66349deff2467203acc200de5e78	2022-06-17 09:42:49 -06:00
Christopher Bate	51b925df94	[mlir][nvgpu] shared memory access optimization pass This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col/vecSize,row)` where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory. Differential Revision: https://reviews.llvm.org/D127457	2022-06-17 09:31:05 -06:00

24 Commits