llvm-project

Author	SHA1	Message	Date
erman-gurses	3f37df5b71	[reland][mlir][amdgpu] Shared memory access optimization pass (#79164) - Reland: https://github.com/llvm/llvm-project/pull/75627 - Reproduced then fixed the build issue	2024-01-25 07:44:45 -08:00
Mehdi Amini	e611a4cf80	Revert "[mlir][amdgpu] Shared memory access optimization pass" (#78822) Reverts llvm/llvm-project#75627; it broke the bot: https://lab.llvm.org/buildbot/#/builders/61/builds/53218	2024-01-19 16:41:43 -08:00
erman-gurses	b7360fbe8c	[mlir][amdgpu] Shared memory access optimization pass (#75627) It implements a transformation to optimize accesses to shared memory. Reference: https://reviews.llvm.org/D127457 _This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by newColIdx = col % vecSize + perm[row](col / vecSize, row) where perm is a permutation function indexed by row and vecSize is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory._	2024-01-19 15:44:45 -08:00
Krzysztof Drewniak	2ebd633f14	[mlir][AMDGPU] Add packed 8-bit float conversion ops and lowering Define operations that wrap the gfx940's new operations for converting between f32 and registers containing packed sets of four 8-bit floats. Define rocdl operations for the intrinsics and an AMDGPU dialect wrapper around them (to account for the fact that MLIR distinguishes the two float formats at the type level but that the LLVM IR does not). Define an ArithToAMDGPU pass, meant to run before conversion to LLVM, that replaces relevant calls to arith.extf and arith.truncf with the packed operations in the AMDGPU dialect. Note that the conversion currently only handles scalars and vectors of rank <= 1, as we do not have a use case for multi-dimensional vector support right now. Reviewed By: jsjodin Differential Revision: https://reviews.llvm.org/D152457	2023-09-28 14:44:16 +00:00
Giuseppe Rossini	4b3eaee270	[mlir][AMDGPU] Define wrappers for WMMA matrix ops Wave Matrix Multiply Accumulate (WMMA) is the instruction to accelerate matrix multiplication on RDNA3 architectures. LLVM already provides a set of intrinsics to generate wmma instructions. This change uses those intrinsics to enable the feature in MLIR. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D152451	2023-07-20 18:38:35 +00:00
Tres Popp	c1fa60b4cd	[mlir] Update method cast calls to function calls The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name. This change begins the migration of uses of the method to the corresponding function call as has been decided as more consistent. Note that there still exist classes that only define methods directly, such as AffineExpr, and this does not include work currently to support a functional cast/isa call. Context: * https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…" * Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443 Implementation: This follows a previous patch that updated calls `op.cast<T>()` -> `cast<T>(op)`. However some cases could not handle an unprefixed `cast` call due to occurrences of variables named cast, or occurring inside of class definitions which would resolve to the method. All C++ files that did not work automatically with `cast<T>()` are updated here to `llvm::cast` and similar with the intention that they can be easily updated after the methods are removed through a find-replace. See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check for the clang-tidy check that is used and then update printed occurrences of the function to include `llvm::` before. One can then run the following: ``` ninja -C $BUILD_DIR clang-tidy run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\ -export-fixes /tmp/cast/casts.yaml mlir/*\ -header-filter=mlir/ -fix rm -rf $BUILD_DIR/tools/mlir/**/*.inc ``` Differential Revision: https://reviews.llvm.org/D150348	2023-05-12 11:21:30 +02:00
Krzysztof Drewniak	cc4703745f	[mlir][AMDGPU] Add emulation pass for atomics on AMDGPU targets Not all AMDGPU targets support all atomic operations. For example, there are no atomic floating-point adds on the gfx10 series. Add a pass to emulate these operations using a compare-and-swap loop, by analogy to the generic atomicrmw rewrite in MemrefToLLVM. This pass is named generally, as in the future we may have a memref-to-amdgpu that translates constructs like atomicrmw fmax (which doesn't generally exist in LLVM) to the relevant intrinsics, which may themselves require emulation. Since the AMDGPU dialect now has a pass that operates on it, the dialect's directory structure is reorganized to match other similarly complex dialects. The pass should be run before amdgpu-to-rocdl if desired. This commit also adds f64 support to atomic_fmax. Depends on D148722 Reviewed By: nirvedhmeshram Differential Revision: https://reviews.llvm.org/D148724	2023-05-03 21:18:48 +00:00
Krzysztof Drewniak	98c1104d41	[mlir][AMDGPU] Define atomic compare-and-swap for raw buffers This commit adds the buffer cmpswap intrinsic to the ROCDL dialect and its corresponding AMDGPU dialect wrappers. Reviewed By: nirvedhmeshram Differential Revision: https://reviews.llvm.org/D148722	2023-05-03 21:11:20 +00:00
Manupa Karunaratne	584f64365a	[MLIR][AMDGPU][ROCDL] Adding raw.buffer.atomic.fmax/smax/umin support This commit adds atomic fmax/smax/umin support to the AMDGPU dialect and the dependent dialects to allow such a lowering. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D144097	2023-02-28 16:58:35 +00:00
Krzysztof Drewniak	22f0c7a451	[mlir][AMDGPU] 8-bit float usage in the AMDGPU dialect Upcoming AMD hardware will include functions that accept 8-bit floats. Specifically, there are MFMA instructions that accept 8-bit floats, either using the same or mixed formats. This patch adds MLIR wrappers for these intrinsics and explicitly adds support for 8-bit floats in the gpu-to-rocdl conversion by way of amdgpu-to-rocdl. Since LLVM does not have f8 types, when targeting LLVM for compilation on an AMD GPU, both f8 types used on AMD hardware (f8E5M2FNUZ and f8E4M3FNUZ) are rewritten to i8. This patch also relaxes the restriction that the types of both source operands to an amdgpu.mfma instruction match exactly, as this is not necessarily required for the bf8 (f8E5M2FNUZ) and fp8 (f8E4M3FNUZ) instructions. In addition, since the buffer_{load,store} operations maintain a whitelist of permitted types, we add the relevant f8 types to that list. This patch does not add any implementations of arithmetic operations for f8 types. Reviewed By: jakeh-gc Differential Revision: https://reviews.llvm.org/D143956	2023-02-15 16:46:08 +00:00
Krzysztof Drewniak	499abb243c	Add generic type attribute mapping infrastructure, use it in GpuToX Remapping memory spaces is a function often needed in type conversions, most often when going to LLVM or to/from SPIR-V (a future commit), and it is possible that such remappings may become more common in the future as dialects take advantage of the more generic memory space infrastructure. Currently, memory space remappings are handled by running a special-purpose conversion pass before the main conversion that changes the address space attributes. In this commit, this approach is replaced by adding a notion of type attribute conversions to TypeConverter, which is then used to convert memory space attributes. Then, we use this infrastructure throughout the ToLLVM conversions. This has the advantage of loosening the requirements on the inputs to those passes from "all address spaces must be integers" to "all memory spaces must be convertible to integer spaces", a looser requirement that reduces the coupling between portions of MLIR. On top of that, this change leads to the removal of most of the calls to getMemorySpaceAsInt(), bringing us closer to removing it. (A rework of the SPIR-V conversions to use this new system will be in a follow-up commit.) As a note, one long-term motivation for this change is that I would eventually like to add an allocaMemorySpace key to MLIR data layouts and then call getMemRefAddressSpace(allocaMemorySpace) in the relevant ToLLVM in order to ensure all alloca()s, whether incoming or produced during the LLVM lowering, have the correct address space for a given target. I expect that the type attribute conversion system may be useful in other contexts. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D142159	2023-02-09 18:00:46 +00:00
Kazu Hirata	0a81ace004	[mlir] Use std::optional instead of llvm::Optional (NFC) This patch replaces (llvm::\|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 01:25:58 -08:00
Kazu Hirata	a1fe1f5f77	[mlir] Add #include <optional> (NFC) This patch adds #include <optional> to those files containing llvm::Optional<...> or Optional<...>. I'll post a separate patch to actually replace llvm::Optional with std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-13 21:05:06 -08:00
Fangrui Song	cbb0981388	[mlir] llvm::Optional::value => operator*/operator-> std::optional::value() has undesired exception checking semantics and is unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The call sites block std::optional migration.	2022-12-17 19:07:38 +00:00
Kazu Hirata	1a36588ec6	[mlir] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-03 18:50:27 -08:00
Krzysztof Drewniak	d6abdf46bc	[mlir][AMDGPU] Remove buffer ops that are statically out of bounds When the bounds check attribute is true, the raw buffer load, store, and atomic operations have well-defined behavior (returning 0 for loads and ignoring stores) when the buffer access exceeds the bounds of the memory being accessed. Because of how LLVM currently implements these buffer operations (as opaque intrinsics), the backend cannot optimize out this known behavior and eliminate the memory operations. Therefore, use MLIR's canonicalization system to eliminate these operations. Reviewed By: nirvedhmeshram Differential Revision: https://reviews.llvm.org/D138146	2022-11-21 16:47:21 +00:00
Rob Suderman	7e52e0fc72	[mlir][amdgpu] Fix signed/unsigned comparison for abid/cbsz comparison Unsigned/signed comparison failure due to implicit signed value. Reviewed By: stella.stamenova Differential Revision: https://reviews.llvm.org/D133061	2022-08-31 15:29:31 -07:00
Krzysztof Drewniak	c55b41d519	[mlir][AMDGPU] Define amdgpu.mfma operator The amdgpu.mfma operator is a wrapper around the Matrix Fused Multiply Add (MFMA) instructions on some AMD GPUs (the CDNA-based MI-* cards). This interface allows for selecting the operation to be performed by specifying the dimensions of the multiplication to be performed and any additional attributes (such as whether to use reduced-precision floating-point math) that are needed to select the relevant mfma instruction and set its parameters. Reviewed By: ThomasRaoux, nirvedhmeshram Differential Revision: https://reviews.llvm.org/D132956	2022-08-31 21:06:12 +00:00
Jacques Pienaar	8df54a6a03	[mlir] Update accessors to prefixed form (NFC) Follow-up to flipping dialects to both: flip the accessors used to the prefixed variant ahead of flipping from _Both to _Prefixed. This just flips to the accessors introduced in the preceding change, which are just prefixed forms of the existing accessors. Mechanical change using helper script https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.	2022-06-18 17:53:22 -07:00
Krzysztof Drewniak	f1f05a91ca	[MLIR][AMDGPU] Add AMDGPU dialect, wrappers around raw buffer intrinsics By analogy with the NVGPU dialect, introduce an AMDGPU dialect for AMD-specific intrinsic wrappers. The dialect initially includes wrappers around the raw buffer intrinsics. On AMD GPUs, a memref can be converted to a "buffer descriptor" that allows more precise control of memory access, such as by allowing for out-of-bounds loads/stores to be replaced by 0/ignored without adding additional conditional logic, which is important for performance. The repository currently contains a limited conversion from transfer_read/transfer_write to Mubuf intrinsics, which are older, deprecated intrinsics for the same functionality. The new amdgpu.raw_buffer_* ops allow these operations to be used explicitly and for including metadata such as whether the target chipset is an RDNA chip or not (which impacts the interpretation of some bits in the buffer descriptor), while still maintaining an MLIR-like interface. (This change also exposes the floating-point atomic add intrinsic.) Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D122765	2022-05-10 14:59:58 +00:00

20 Commits