25 Commits

Author SHA1 Message Date
Guray Ozen
763109e346
[mlir][gpu] Use known_block_size to set maxntid for NVVM target (#77301)
Setting thread block size with `maxntid` on the kernel has great
performance benefits. In this way, downstream PTX compiler can do better
register allocation.

MLIR's `gpu.launch` and `gpu.launch_func` already has an attribute
(`known_block_size`) that keeps the thread block size when it is known.
This PR simply uses this attribute to set `maxntid`.
2024-01-08 14:49:19 +01:00
Guray Ozen
ea84897ba3
[mlir][gpu] Introduce gpu.dynamic_shared_memory Op (#71546)
While the `gpu.launch` Op allows setting the size via the
`dynamic_shared_memory_size` argument, accessing the dynamic shared
memory is very convoluted. This PR implements the proposed Op,
`gpu.dynamic_shared_memory` that aims to simplify the utilization of
dynamic shared memory.

RFC:
https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/

**Proposal from RFC**
This PR `gpu.dynamic.shared.memory` Op to use dynamic shared memory
feature efficiently. It is is a powerful feature that enables the
allocation of shared memory at runtime with the kernel launch on the
host. Afterwards, the memory can be accessed directly from the device. I
believe similar story exists for AMDGPU.

**Current way Using Dynamic Shared Memory with MLIR**

Let me illustrate the challenges of using dynamic shared memory in MLIR
with an example below. The process involves several steps:
- memref.global 0-sized array LLVM's NVPTX backend expects
- dynamic_shared_memory_size Set the size of dynamic shared memory
- memref.get_global Access the global symbol
- reinterpret_cast and subview Many OPs for pointer arithmetic

```
// Step 1. Create 0-sized global symbol. Manually set the alignment
memref.global "private" @dynamicShmem  : memref<0xf16, 3> { alignment = 16 }
func.func @main() {
  // Step 2. Allocate shared memory
  gpu.launch blocks(...) threads(...)
    dynamic_shared_memory_size %c10000 {
    // Step 3. Access the global object
    %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3>
    // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations.
    %4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128],  strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3>
    %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3>
    %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3>
    %7 = memref.subview %6[0, 0][64, 64][1,1]  : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>
    %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>
    // Step.5 Use
    "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index)
    "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index)
    gpu.terminator
  }
```

Let’s write the program above with that:

```
func.func @main() {
    gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 {
    	%i = arith.constant 18 : index
        // Step 1: Obtain shared memory directly
        %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3>
        %c147456 = arith.constant 147456 : index
        %c155648 = arith.constant 155648 : index
        %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3>
        %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3>

        // Step 2: Utilize the shared memory
        "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index)
        "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index)
    }
}
```

This PR resolves #72513
2023-11-16 14:42:17 +01:00
Matthias Springer
ce254598b7 [mlir][Conversion] Store const type converter in ConversionPattern
ConversionPatterns do not (and should not) modify the type converter that they are using.

* Make `ConversionPattern::typeConverter` const.
* Make member functions of the `LLVMTypeConverter` const.
* Conversion patterns take a const type converter.
* Various helper functions (that are called from patterns) now also take a const type converter.

Differential Revision: https://reviews.llvm.org/D157601
2023-08-14 09:03:11 +02:00
Nicolas Vasilache
888717e853 [mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD
This revision untangles a few more conversion pieces and allows rewriting
the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass
in a declarative fashion that provides a much better understanding and control.

Differential Revision: https://reviews.llvm.org/D157617
2023-08-10 15:30:48 +00:00
Krzysztof Drewniak
499abb243c Add generic type attribute mapping infrastructure, use it in GpuToX
Remapping memory spaces is a function often needed in type
conversions, most often when going to LLVM or to/from SPIR-V (a future
commit), and it is possible that such remappings may become more
common in the future as dialects take advantage of the more generic
memory space infrastructure.

Currently, memory space remappings are handled by running a
special-purpose conversion pass before the main conversion that
changes the address space attributes. In this commit, this approach is
replaced by adding a notion of type attribute conversions
TypeConverter, which is then used to convert memory space attributes.

Then, we use this infrastructure throughout the *ToLLVM conversions.
This has the advantage of loosing the requirements on the inputs to
those passes from "all address spaces must be integers" to "all
memory spaces must be convertible to integer spaces", a looser
requirement that reduces the coupling between portions of MLIR.

ON top of that, this change leads to the removal of most of the calls
to getMemorySpaceAsInt(), bringing us closer to removing it.

(A rework of the SPIR-V conversions to use this new system will be in
a folowup commit.)

As a note, one long-term motivation for this change is that I would
eventually like to add an allocaMemorySpace key to MLIR data layouts
and then call getMemRefAddressSpace(allocaMemorySpace) in the
relevant *ToLLVM in order to ensure all alloca()s, whether incoming or
produces during the LLVM lowering, have the correct address space for
a given target.

I expect that the type attribute conversion system may be useful in
other contexts.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D142159
2023-02-09 18:00:46 +00:00
Christopher Bate
6ca1a09f03 [mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr)
This is a purely mechanical change that introduces an enum attribute in the GPU
dialect to represent the various memref memory spaces as opposed to the
hard-coded integer attributes that are currently used.

The following steps were taken to make the transition across the codebase:

1. Introduce a pass "gpu-lower-memory-space-attributes":

The pass updates all memref types that have a memory space attribute that is a
`gpu::AddressSpaceAttr`. These attributes are changed to `IntegerAttr`'s using a
mapping that is given by the caller. This pass is based on the
"map-memref-spirv-storage-class" pass and the common functions can probably
be refactored into a set of utilities under the MemRef dialect.

2. Update the verifiers of GPU/NVGPU dialect operations.

If a verifier currently checks the address space of an operand using
e.g.`getWorkspaceAddressSpace`, then it can continue to do so. However, the
checks are changed to only fail if the memory space is either missing or a wrong
value of type `gpu::AddressSpaceAttr`. Otherwise, it just assumes the address
space is correct because it was specifically lowered to something other than a
`gpu::AddressSpaceAttr`.

3. Update existing gpu-to-llvm conversion infrastructure.

In the existing gpu-to-X passes, we add a full conversion equivalent to
`gpu-lower-memory-space-attributes` just before doing the conversion to the
LLVMDialect. This is done because currently both the gpu-to-llvm passes
(rocdl,nvvm) run gpu-to-gpu rewrites within the pass, which introduce
`AddressSpaceAttr` memory space annotations. Therefore, I inserted the
memory space conversion between the gpu-to-gpu rewrites and the LLVM
conversion.

For more context see the below discourse discussion:
https://discourse.llvm.org/t/gpu-workgroup-shared-memory-address-space-is-hard-coded/

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D140644
2023-01-13 11:00:10 -07:00
Thomas Raoux
7efdc117b1 [mlir][nvvm] Add lowering of gpu.printf to nvvm
When converting to nvvm lowering gpu.printf to vprintf allows us to
support printing when running on cuda.

Differential Revision: https://reviews.llvm.org/D141049
2023-01-06 17:29:30 +00:00
Christian Sigg
b251b608b5 [mlir][gpu] Unroll ops on vectors which map to intrinsic calls
Unroll ops that map to intrinsics when lowering to LLVM, because intrinsics don't support vector operands/results.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D136345
2022-10-28 10:33:38 +02:00
Mogball
d7ef488bb6 [mlir][gpu] Move GPU headers into IR/ and Transforms/
Depends on D127350

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D127352
2022-06-09 22:49:03 +00:00
Krzysztof Drewniak
e1da62910e [MLIR][GPU] Define gpu.printf op and its lowerings
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448
2021-12-09 15:54:31 +00:00
River Riddle
195730a650 [mlir][NFC] Replace references to Identifier with StringAttr
This is part of the replacement of Identifier with StringAttr.

Differential Revision: https://reviews.llvm.org/D113953
2021-11-16 17:36:26 +00:00
River Riddle
ef976337f5 [mlir:OpConversion] Remove the remaing usages of the deprecated matchAndRewrite methods
This commits updates the remaining usages of the ArrayRef<Value> based
matchAndRewrite/rewrite methods in favor of the new OpAdaptor
overload.

Differential Revision: https://reviews.llvm.org/D110360
2021-09-24 17:51:41 +00:00
Alex Zinenko
75e5f0aac9 [mlir] factor memref-to-llvm lowering out of std-to-llvm
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.

Reviewed By: herhut, silvas

Differential Revision: https://reviews.llvm.org/D105625
2021-07-09 14:49:52 +02:00
Alex Zinenko
4c4876c314 [mlir] Use target-specific GPU kernel attributes in lowering pipelines
Until now, the GPU translation to NVVM or ROCDL intrinsics relied on the
presence of the generic `gpu.kernel` attribute to attach additional LLVM IR
metadata to the relevant functions. This would be problematic if each dialect
were to handle the conversion of its own options, which is the intended
direction for the translation infrastructure. Introduce `nvvm.kernel` and
`rocdl.kernel` in addition to `gpu.kernel` and base translation on these new
attributes instead.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D96591
2021-02-12 14:09:24 +01:00
Alex Zinenko
2230bf99c7 [mlir] replace LLVMIntegerType with built-in integer type
The LLVM dialect type system has been closed until now, i.e. did not support
types from other dialects inside containers. While this has had obvious
benefits of deriving from a common base class, it has led to some simple types
being almost identical with the built-in types, namely integer and floating
point types. This in turn has led to a lot of larger-scale complexity: simple
types must still be converted, numerous operations that correspond to LLVM IR
intrinsics are replicated to produce versions operating on either LLVM dialect
or built-in types leading to quasi-duplicate dialects, lowering to the LLVM
dialect is essentially required to be one-shot because of type conversion, etc.
In this light, it is reasonable to trade off some local complexity in the
internal implementation of LLVM dialect types for removing larger-scale system
complexity. Previous commits to the LLVM dialect type system have adapted the
API to support types from other dialects.

Replace LLVMIntegerType with the built-in IntegerType plus additional checks
that such types are signless (these are isolated in a utility function that
replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility
to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice.

Reviewed By: mehdi_amini, silvas, antiagainst

Differential Revision: https://reviews.llvm.org/D94178
2021-01-07 19:48:31 +01:00
Alex Zinenko
c69c9e0f0f [mlir] Remove LLVMType, LLVM dialect types now derive Type directly
BEGIN_PUBLIC
[mlir] Remove LLVMType, LLVM dialect types now derive Type directly

This class has become a simple `isa` hook with no proper functionality.
Removing will allow us to eventually make the LLVM dialect type infrastructure
open, i.e., support non-LLVM types inside container types, which itself will
make the type conversion more progressive.

Introduce a call `LLVM::isCompatibleType` to be used instead of
`isa<LLVMType>`. For now, this is strictly equivalent.
END_PUBLIC

Depends On D93681

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D93713
2021-01-05 17:36:54 +01:00
Alex Zinenko
7ed9cfc7b1 [mlir] Remove static constructors from LLVMType
LLVMType contains numerous static constructors that were initially introduced
for API compatibility with LLVM. Most of these merely forward to arguments to
`SpecificType::get` (MLIR defines classes for all types, unlike LLVM IR), while
some introduce subtle semantics differences due to different modeling of MLIR
types (e.g., structs are not auto-renamed in case of conflicts). Furthermore,
these constructors don't match MLIR idioms and actively prevent us from making
the LLVM dialect type system more open. Remove them and use `SpecificType::get`
instead.

Depends On D93680

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D93681
2020-12-23 13:12:47 +01:00
Alex Zinenko
8de43b926f [mlir] Remove instance methods from LLVMType
LLVMType contains multiple instance methods that were introduced initially for
compatibility with LLVM API. These methods boil down to `cast` followed by
type-specific call. Arguably, they are mostly used in an LLVM cast-follows-isa
anti-pattern. This doesn't connect nicely to the rest of the MLIR
infrastructure and actively prevents it from making the LLVM dialect type
system more open, e.g., reusing built-in types when appropriate. Remove such
instance methods and replaces their uses with apporpriate casts and methods on
derived classes. In some cases, the result may look slightly more verbose, but
most cases should actually use a stricter subtype of LLVMType anyway and avoid
the isa/cast.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D93680
2020-12-22 23:34:54 +01:00
Rahul Joshi
563879b6f9 [NFC] Use ConvertOpToLLVMPattern instead of ConvertToLLVMPattern.
- use ConvertOpToLLVMPattern to avoid explicit casting and in most cases the
  constructor can be reused to save a few lines of code.

Differential Revision: https://reviews.llvm.org/D92989
2020-12-10 09:33:43 -08:00
Christian Sigg
dcec2ca5bd Remove typeConverter from ConvertToLLVMPattern and use the existing one in ConversionPattern.
ftynse

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D92564
2020-12-04 14:27:16 +01:00
River Riddle
431bb8b318 [mlir][ODS] Use c++ types for integer attributes of fixed width when possible.
Unsigned and Signless attributes use uintN_t and signed attributes use intN_t, where N is the fixed width. The 1-bit variants use bool.

Differential Revision: https://reviews.llvm.org/D86739
2020-09-01 13:43:32 -07:00
Alex Zinenko
5446ec8507 [mlir] take MLIRContext instead of LLVMDialect in getters of LLVMType's
Historical modeling of the LLVM dialect types had been wrapping LLVM IR types
and therefore needed access to the instance of LLVMContext stored in the
LLVMDialect. The new modeling does not rely on that and only needs the
MLIRContext that is used for uniquing, similarly to other MLIR types. Change
LLVMType::get<Kind>Ty functions to take `MLIRContext *` instead of
`LLVMDialect *` as first argument. This brings the code base closer to
completely removing the dependence on LLVMContext from the LLVMDialect,
together with additional support for thread-safety of its use.

Depends On D85371

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D85372
2020-08-06 11:05:40 +02:00
River Riddle
8d67d187ba [mlir][DialectConversion] Refactor how block argument types get converted
This revision removes the TypeConverter parameter passed to the apply* methods, and instead moves the responsibility of region type conversion to patterns. The types of a region can be converted using the 'convertRegionTypes' method, which acts similarly to the existing 'applySignatureConversion'. This method ensures that all blocks within, and including those moved into, a region will have the block argument types converted using the provided converter.

This has the benefit of making more of the legalization logic controlled by patterns, instead of being handled explicitly by the driver. It also opens up the possibility to support multiple type conversions at some point in the future.

This revision also adds a new utility class `FailureOr<T>` that provides a LogicalResult friendly facility for returning a failure or a valid result value.

Differential Revision: https://reviews.llvm.org/D81681
2020-06-18 15:59:22 -07:00
River Riddle
2265009fbe [mlir][GPUOpsLowering] Add missing include for FormatVariadic 2020-05-01 15:58:20 -07:00
Wen-Heng (Jack) Chung
9ad5e57316 [mlir][nvvm][rocdl] refactor NVVM and ROCDL dialect. NFC.
- Extract common logic between -convert-gpu-to-nvvm and -convert-gpu-to-rocdl.
- Cope with the fact that alloca operates on different addrspaces between NVVM
  and ROCDL.
- Modernize unit tests for ROCDL dialect.

Differential Revision: https://reviews.llvm.org/D79021
2020-05-01 00:13:26 +02:00