llvm-project

Author	SHA1	Message	Date
Guray Ozen	763109e346	[mlir][gpu] Use `known_block_size` to set `maxntid` for NVVM target (#77301) Setting the thread block size with `maxntid` on the kernel brings significant performance benefits, because the downstream PTX compiler can use it to do better register allocation. MLIR's `gpu.launch` and `gpu.launch_func` already have an attribute (`known_block_size`) that keeps the thread block size when it is known. This PR simply uses this attribute to set `maxntid`.	2024-01-08 14:49:19 +01:00
Guray Ozen	ea84897ba3	[mlir][gpu] Introduce `gpu.dynamic_shared_memory` Op (#71546) While the `gpu.launch` Op allows setting the size via the `dynamic_shared_memory_size` argument, accessing the dynamic shared memory is very convoluted. This PR implements the proposed Op, `gpu.dynamic_shared_memory`, which aims to simplify the use of dynamic shared memory. RFC: https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/
Proposal from the RFC: this PR adds the `gpu.dynamic_shared_memory` Op to use the dynamic shared memory feature efficiently. Dynamic shared memory is a powerful feature that enables the allocation of shared memory at runtime with the kernel launch on the host. Afterwards, the memory can be accessed directly from the device. I believe a similar story exists for AMDGPU.
Current way: Using Dynamic Shared Memory with MLIR
Let me illustrate the challenges of using dynamic shared memory in MLIR with the example below. The process involves several steps:
- `memref.global`: the 0-sized array that LLVM's NVPTX backend expects
- `dynamic_shared_memory_size`: set the size of dynamic shared memory
- `memref.get_global`: access the global symbol
- `reinterpret_cast` and `subview`: many Ops for pointer arithmetic
```
// Step 1. Create 0-sized global symbol. Manually set the alignment
memref.global "private" @dynamicShmem : memref<0xf16, 3> { alignment = 16 }
func.func @main() {
  // Step 2. Allocate shared memory
  gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 {
    // Step 3. Access the global object
    %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3>
    // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations.
    %4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128], strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3>
    %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3>
    %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3>
    %7 = memref.subview %6[0, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>
    %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>
    // Step 5. Use
    "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index)
    "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index)
    gpu.terminator
  }
}
```
With the new Op, the program above can be written as:
```
func.func @main() {
  gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 {
    %i = arith.constant 18 : index
    // Step 1: Obtain shared memory directly
    %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3>
    %c147456 = arith.constant 147456 : index
    %c155648 = arith.constant 155648 : index
    %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3>
    %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3>
    // Step 2: Utilize the shared memory
    "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index)
    "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index)
    gpu.terminator
  }
}
```
This PR resolves #72513	2023-11-16 14:42:17 +01:00
Matthias Springer	ce254598b7	[mlir][Conversion] Store const type converter in ConversionPattern ConversionPatterns do not (and should not) modify the type converter that they are using. * Make `ConversionPattern::typeConverter` const. * Make member functions of the `LLVMTypeConverter` const. * Conversion patterns take a const type converter. * Various helper functions (that are called from patterns) now also take a const type converter. Differential Revision: https://reviews.llvm.org/D157601	2023-08-14 09:03:11 +02:00
Nicolas Vasilache	888717e853	[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD This revision untangles a few more conversion pieces and allows rewriting the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass in a declarative fashion that provides a much better understanding and control. Differential Revision: https://reviews.llvm.org/D157617	2023-08-10 15:30:48 +00:00
Krzysztof Drewniak	499abb243c	Add generic type attribute mapping infrastructure, use it in GpuToX Remapping memory spaces is a function often needed in type conversions, most often when going to LLVM or to/from SPIR-V (a future commit), and it is possible that such remappings may become more common in the future as dialects take advantage of the more generic memory space infrastructure. Currently, memory space remappings are handled by running a special-purpose conversion pass before the main conversion that changes the address space attributes. In this commit, this approach is replaced by adding a notion of type attribute conversions to TypeConverter, which is then used to convert memory space attributes. Then, we use this infrastructure throughout the ToLLVM conversions. This has the advantage of loosening the requirements on the inputs to those passes from "all address spaces must be integers" to "all memory spaces must be convertible to integer spaces", a looser requirement that reduces the coupling between portions of MLIR. On top of that, this change leads to the removal of most of the calls to getMemorySpaceAsInt(), bringing us closer to removing it. (A rework of the SPIR-V conversions to use this new system will be in a follow-up commit.) As a note, one long-term motivation for this change is that I would eventually like to add an allocaMemorySpace key to MLIR data layouts and then call getMemRefAddressSpace(allocaMemorySpace) in the relevant ToLLVM conversions in order to ensure all alloca()s, whether incoming or produced during the LLVM lowering, have the correct address space for a given target. I expect that the type attribute conversion system may be useful in other contexts. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D142159	2023-02-09 18:00:46 +00:00
Christopher Bate	6ca1a09f03	[mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr) This is a purely mechanical change that introduces an enum attribute in the GPU dialect to represent the various memref memory spaces, as opposed to the hard-coded integer attributes that are currently used. The following steps were taken to make the transition across the codebase:
1. Introduce a pass "gpu-lower-memory-space-attributes": The pass updates all memref types that have a memory space attribute that is a `gpu::AddressSpaceAttr`. These attributes are changed to `IntegerAttr`s using a mapping that is given by the caller. This pass is based on the "map-memref-spirv-storage-class" pass, and the common functions can probably be refactored into a set of utilities under the MemRef dialect.
2. Update the verifiers of GPU/NVGPU dialect operations. If a verifier currently checks the address space of an operand using e.g. `getWorkspaceAddressSpace`, then it can continue to do so. However, the checks are changed to only fail if the memory space is either missing or a wrong value of type `gpu::AddressSpaceAttr`. Otherwise, it just assumes the address space is correct because it was specifically lowered to something other than a `gpu::AddressSpaceAttr`.
3. Update existing gpu-to-llvm conversion infrastructure. In the existing gpu-to-X passes, we add a full conversion equivalent to `gpu-lower-memory-space-attributes` just before doing the conversion to the LLVMDialect. This is done because currently both the gpu-to-llvm passes (rocdl, nvvm) run gpu-to-gpu rewrites within the pass, which introduce `AddressSpaceAttr` memory space annotations. Therefore, I inserted the memory space conversion between the gpu-to-gpu rewrites and the LLVM conversion.
For more context, see the following Discourse discussion: https://discourse.llvm.org/t/gpu-workgroup-shared-memory-address-space-is-hard-coded/ Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D140644	2023-01-13 11:00:10 -07:00
Thomas Raoux	7efdc117b1	[mlir][nvvm] Add lowering of gpu.printf to nvvm When converting to NVVM, lowering gpu.printf to vprintf allows us to support printing when running on CUDA. Differential Revision: https://reviews.llvm.org/D141049	2023-01-06 17:29:30 +00:00
Christian Sigg	b251b608b5	[mlir][gpu] Unroll ops on vectors which map to intrinsic calls Unroll ops that map to intrinsics when lowering to LLVM, because intrinsics don't support vector operands/results. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D136345	2022-10-28 10:33:38 +02:00
Mogball	d7ef488bb6	[mlir][gpu] Move GPU headers into IR/ and Transforms/ Depends on D127350 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127352	2022-06-09 22:49:03 +00:00
Krzysztof Drewniak	e1da62910e	[MLIR][GPU] Define gpu.printf op and its lowerings
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments.
- Define the lowering of gpu.printf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under, or that the runtime is unknown. This enum controls how gpu.printf is lowered.
This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle. And: [MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support. In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D110448	2021-12-09 15:54:31 +00:00
River Riddle	195730a650	[mlir][NFC] Replace references to Identifier with StringAttr This is part of the replacement of Identifier with StringAttr. Differential Revision: https://reviews.llvm.org/D113953	2021-11-16 17:36:26 +00:00
River Riddle	ef976337f5	[mlir:OpConversion] Remove the remaining usages of the deprecated matchAndRewrite methods This commit updates the remaining usages of the ArrayRef<Value> based matchAndRewrite/rewrite methods in favor of the new OpAdaptor overload. Differential Revision: https://reviews.llvm.org/D110360	2021-09-24 17:51:41 +00:00
Alex Zinenko	75e5f0aac9	[mlir] factor memref-to-llvm lowering out of std-to-llvm After the MemRef dialect was split out of the Standard dialect, the conversion to the LLVM dialect remained a huge monolithic pass. This is undesirable for the same complexity management reasons as having a huge Standard dialect itself, and is even more confusing given the existence of a separate dialect. Extract the conversion of the MemRef dialect operations to LLVM into a separate library and a separate conversion pass. Reviewed By: herhut, silvas Differential Revision: https://reviews.llvm.org/D105625	2021-07-09 14:49:52 +02:00
Alex Zinenko	4c4876c314	[mlir] Use target-specific GPU kernel attributes in lowering pipelines Until now, the GPU translation to NVVM or ROCDL intrinsics relied on the presence of the generic `gpu.kernel` attribute to attach additional LLVM IR metadata to the relevant functions. This would be problematic if each dialect were to handle the conversion of its own options, which is the intended direction for the translation infrastructure. Introduce `nvvm.kernel` and `rocdl.kernel` in addition to `gpu.kernel` and base translation on these new attributes instead. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D96591	2021-02-12 14:09:24 +01:00
Alex Zinenko	2230bf99c7	[mlir] replace LLVMIntegerType with built-in integer type The LLVM dialect type system has been closed until now, i.e. did not support types from other dialects inside containers. While this has had obvious benefits of deriving from a common base class, it has led to some simple types being almost identical with the built-in types, namely integer and floating point types. This in turn has led to a lot of larger-scale complexity: simple types must still be converted, numerous operations that correspond to LLVM IR intrinsics are replicated to produce versions operating on either LLVM dialect or built-in types leading to quasi-duplicate dialects, lowering to the LLVM dialect is essentially required to be one-shot because of type conversion, etc. In this light, it is reasonable to trade off some local complexity in the internal implementation of LLVM dialect types for removing larger-scale system complexity. Previous commits to the LLVM dialect type system have adapted the API to support types from other dialects. Replace LLVMIntegerType with the built-in IntegerType plus additional checks that such types are signless (these are isolated in a utility function that replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice. Reviewed By: mehdi_amini, silvas, antiagainst Differential Revision: https://reviews.llvm.org/D94178	2021-01-07 19:48:31 +01:00
Alex Zinenko	c69c9e0f0f	[mlir] Remove LLVMType, LLVM dialect types now derive Type directly This class has become a simple `isa` hook with no proper functionality. Removing it will allow us to eventually make the LLVM dialect type infrastructure open, i.e., support non-LLVM types inside container types, which itself will make the type conversion more progressive. Introduce a call `LLVM::isCompatibleType` to be used instead of `isa<LLVMType>`. For now, this is strictly equivalent. Depends On D93681 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D93713	2021-01-05 17:36:54 +01:00
Alex Zinenko	7ed9cfc7b1	[mlir] Remove static constructors from LLVMType LLVMType contains numerous static constructors that were initially introduced for API compatibility with LLVM. Most of these merely forward their arguments to `SpecificType::get` (MLIR defines classes for all types, unlike LLVM IR), while some introduce subtle semantic differences due to different modeling of MLIR types (e.g., structs are not auto-renamed in case of conflicts). Furthermore, these constructors don't match MLIR idioms and actively prevent us from making the LLVM dialect type system more open. Remove them and use `SpecificType::get` instead. Depends On D93680 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D93681	2020-12-23 13:12:47 +01:00
Alex Zinenko	8de43b926f	[mlir] Remove instance methods from LLVMType LLVMType contains multiple instance methods that were introduced initially for compatibility with LLVM API. These methods boil down to `cast` followed by a type-specific call. Arguably, they are mostly used in an LLVM cast-follows-isa anti-pattern. This doesn't connect nicely to the rest of the MLIR infrastructure and actively prevents us from making the LLVM dialect type system more open, e.g., reusing built-in types when appropriate. Remove such instance methods and replace their uses with appropriate casts and methods on derived classes. In some cases, the result may look slightly more verbose, but most cases should actually use a stricter subtype of LLVMType anyway and avoid the isa/cast. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D93680	2020-12-22 23:34:54 +01:00
Rahul Joshi	563879b6f9	[NFC] Use ConvertOpToLLVMPattern instead of ConvertToLLVMPattern. - Use ConvertOpToLLVMPattern to avoid explicit casting; in most cases the constructor can be reused, saving a few lines of code. Differential Revision: https://reviews.llvm.org/D92989	2020-12-10 09:33:43 -08:00
Christian Sigg	dcec2ca5bd	Remove typeConverter from ConvertToLLVMPattern and use the existing one in ConversionPattern. ftynse Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D92564	2020-12-04 14:27:16 +01:00
River Riddle	431bb8b318	[mlir][ODS] Use c++ types for integer attributes of fixed width when possible. Unsigned and Signless attributes use uintN_t and signed attributes use intN_t, where N is the fixed width. The 1-bit variants use bool. Differential Revision: https://reviews.llvm.org/D86739	2020-09-01 13:43:32 -07:00
Alex Zinenko	5446ec8507	[mlir] take MLIRContext instead of LLVMDialect in getters of LLVMType's Historical modeling of the LLVM dialect types had been wrapping LLVM IR types and therefore needed access to the instance of LLVMContext stored in the LLVMDialect. The new modeling does not rely on that and only needs the MLIRContext that is used for uniquing, similarly to other MLIR types. Change LLVMType::get<Kind>Ty functions to take `MLIRContext *` instead of `LLVMDialect *` as the first argument. This brings the code base closer to completely removing the dependence on LLVMContext from the LLVMDialect, together with additional support for thread-safety of its use. Depends On D85371 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D85372	2020-08-06 11:05:40 +02:00
River Riddle	8d67d187ba	[mlir][DialectConversion] Refactor how block argument types get converted This revision removes the TypeConverter parameter passed to the apply* methods, and instead moves the responsibility of region type conversion to patterns. The types of a region can be converted using the 'convertRegionTypes' method, which acts similarly to the existing 'applySignatureConversion'. This method ensures that all blocks within, and including those moved into, a region will have the block argument types converted using the provided converter. This has the benefit of making more of the legalization logic controlled by patterns, instead of being handled explicitly by the driver. It also opens up the possibility to support multiple type conversions at some point in the future. This revision also adds a new utility class `FailureOr<T>` that provides a LogicalResult friendly facility for returning a failure or a valid result value. Differential Revision: https://reviews.llvm.org/D81681	2020-06-18 15:59:22 -07:00
River Riddle	2265009fbe	[mlir][GPUOpsLowering] Add missing include for FormatVariadic	2020-05-01 15:58:20 -07:00
Wen-Heng (Jack) Chung	9ad5e57316	[mlir][nvvm][rocdl] refactor NVVM and ROCDL dialect. NFC. - Extract common logic between -convert-gpu-to-nvvm and -convert-gpu-to-rocdl. - Cope with the fact that alloca operates on different addrspaces between NVVM and ROCDL. - Modernize unit tests for ROCDL dialect. Differential Revision: https://reviews.llvm.org/D79021	2020-05-01 00:13:26 +02:00

25 Commits
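The offsets in the `gpu.dynamic_shared_memory` example from commit ea84897ba3 follow from row-major stride arithmetic. The Python sketch below recomputes them as a sanity check. It assumes f16 elements occupy 2 bytes and that `memref.view` takes a byte offset into the `i8` buffer, which is why the constants in the second program are twice the element offsets of the first. The helper `linear_offset` is hypothetical and written only for this check; it is not an MLIR API.

```python
# Sanity check of the offsets in the ea84897ba3 example.
# `linear_offset` is a hypothetical helper, not part of MLIR.

F16_BYTES = 2  # assumption: each f16 element occupies 2 bytes


def linear_offset(base, indices, strides):
    """Element offset of a subview: base + sum(index_i * stride_i)."""
    return base + sum(i * s for i, s in zip(indices, strides))


# %4: reinterpret_cast to memref<14x64x128xf16> with strides [8192, 128, 1]
strides_3d = [8192, 128, 1]
assert strides_3d[0] == 64 * 128  # row-major: 64 * 128 elements per outer slice

off5 = linear_offset(0, [7, 0, 0], strides_3d)     # %5 = subview %4[7, 0, 0]
off6 = linear_offset(off5, [2, 0, 0], strides_3d)  # %6 = subview %5[2, 0, 0]
off7 = linear_offset(off6, [0, 0], [128, 1])       # %7 = subview %6[0, 0]
off8 = linear_offset(off6, [32, 0], [128, 1])      # %8 = subview %6[32, 0]

# Element offsets, as in the `offset:` fields of the strided layouts.
print(off5, off6, off7, off8)
# Byte offsets, as passed to memref.view on the i8 buffer.
print(off7 * F16_BYTES, off8 * F16_BYTES)
```

Running it prints `57344 73728 73728 77824` and then `147456 155648`, matching the `offset:` values in the first program and the `arith.constant` byte offsets in the second.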