llvm-project

Author	SHA1	Message	Date
Guray Ozen	63389326f5	[mlir][nvvm] Support predicates in `BasicPtxBuilder` (#67102) This PR enhances `BasicPtxBuilder` to support predicates in PTX code generation. The `BasicPtxBuilder` interface was initially introduced for generating PTX code automatically for Ops that aren't supported by LLVM core. Predicates, which are typically not supported in LLVM core, are now supported using the same mechanism. In PTX programming, instructions can be guarded by predicates as shown below. Here `@p` is a predicate register that guards the execution of the instruction. ``` @p ptx.code op1, op2, op3 ``` This PR introduces the `getPredicate` function in the `BasicPtxBuilder` interface to set an optional predicate. When a predicate is provided, the instruction is generated with the predicate as its guard; otherwise, no predicate is generated. Note that the predicate value must always appear as the last argument in the Op definition. Additionally, this PR implements predicate usage for the following ops: - mbarrier.init - mbarrier.init.shared - mbarrier.arrive.expect_tx - mbarrier.arrive.expect_tx.shared - cp.async.bulk.tensor.shared.cluster.global - cp.async.bulk.tensor.global.shared.cta For more detail, see the PTX programming model: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-instructions	2023-10-17 12:42:36 +02:00
Guray Ozen	25da11541c	[mlir][nvvm] Move `BasicPtxBuilder` Interface to Its Own File (NFC) (#68095)	2023-10-12 07:13:58 -07:00
Guray Ozen	6e8ffab9b6	[mlir][nvvm] Introduce `elect.sync` Op (#68323) The Op selects a leader thread from a set of threads. For more information, see: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync	2023-10-09 10:21:45 -07:00
Guray Ozen	18e161f9e1	[MLIR][NVVM] Introduction of the `wgmma.mma_async` Op This work introduces the `wgmma.mma_async` Op along with PTX generation using `BasicPtxBuilderOpInterface`. The Op is designed to execute the matrix multiply-and-accumulate operation across a warpgroup (128 threads). It's important to note that this operation works for devices with the sm_90a capability. The matrix multiply-and-accumulate operation can take one of the following forms. In both cases, matrix D is referred to as the accumulator: `D = A * B + D`: the result is added to the accumulator matrix D. `D = A * B`: the input from the accumulator matrix D is not used. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D157370	2023-08-09 23:08:00 +02:00
Matthias Springer	876a480cac	[mlir][Conversion] Add type converter parameter to ConvertToLLVMPatternInterface Most `*-to-llvm` conversion patterns require a type converter. This revision adds a type converter to the `populateConvertToLLVMConversionPatterns` function and implements the interface for the MemRef dialect. Differential Revision: https://reviews.llvm.org/D157387	2023-08-09 09:00:46 +02:00
Mehdi Amini	4529797a9d	Add a generic "convert-to-llvm" pass delegating to an interface The multiple -convert-XXX-to-llvm passes are really nice testing tools for individual dialects, but the expectation is that a proper conversion should assemble the conversion patterns using the `populateXXXToLLVMConversionPatterns()` APIs. However, most customers just chain the conversion passes for convenience. This pass makes it more transparent and composable to assemble the required patterns for conversion to the LLVM dialect by using an interface. The Pass will scan the input and collect all the dialects present, and for those that implement the `ConvertToLLVMPatternInterface` it will use it to populate the conversion patterns, and possibly the conversion target. Since these conversions can involve intermediate dialects, or target dialects other than LLVM (for example AVX or NVVM), this pass can't statically declare the required `getDependentDialects()` before the pass pipeline begins. This is worked around by using an extension in the dialectRegistry that will be invoked for every newly loaded dialect in the context. This makes it possible to look up the interface ahead of time and use it to query the dependent dialects. Differential Revision: https://reviews.llvm.org/D157183	2023-08-07 18:46:08 -07:00
Guray Ozen	eda52f3cd3	[mlir][nvvm] Add populate function (nfc) This work adds a populate function for the NVVM-to-LLVM conversion patterns. Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D155189	2023-07-13 14:53:51 +02:00
Guray Ozen	ffbca7e9f3	[mlir][nvvm] Change return type of std::string of getPtx of PtxBuilder `getPtx` used to return `const char*`, which is not flexible when one needs to build the string inside the function. This work changes the return type to `std::string`. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D155056	2023-07-12 14:59:54 +02:00
Guray Ozen	b6bf775f58	[mlir][nvvm] fix potential bug (NFC) Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155048	2023-07-12 10:09:21 +02:00
Guray Ozen	dd080c7579	[mlir][nvvm] Add NVVMToLLVM Pass It introduces an NVVMToLLVM Pass and a `BasicPtxBuilderOpInterface` interface. The Pass performs pattern matching on all the NVVM Ops that implement the `BasicPtxBuilderOpInterface` interface to generate LLVM Inline Assembly Ops. The `BasicPtxBuilderOpInterface` interface is utilized in the `convert-nvvm-to-llvm` pass, which lowers Ops that support this interface to inline assembly Ops. The interface provides several methods that are used for this lowering. The `getPtx` method returns PTX code. The `hasSideEffect` method is used to determine whether the op has any side effects on the memory. The `hasIntrinsic` method indicates whether the operation has intrinsic support in LLVM. This is particularly useful for Ops that don't have intrinsic support for each case. The `getAsmValues` method returns the arguments to be passed to the PTX code. The arguments are ordered with the results first, which are used for write operations, followed by the operands and attributes. Example: If we have the following Op definition that returns PTX code through `getPtx`: ```tablegen def NVVM_MBarrierArriveExpectTxOp : NVVM_Op<"mbarrier.arrive.expect_tx", [DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]>, Results<(outs LLVM_Type:$res)>, Arguments<(ins LLVM_i64ptr_any:$addr, I32:$txcount)> { ... let extraClassDefinition = [{ const char* $cppClass::getPtx() { return "mbarrier.arrive.expect_tx.b64 %0, [%1], %2;"; } }]; } ``` The NVVM Op will look like below: ```mlir %0 = nvvm.mbarrier.arrive.expect_tx %barrier, %txcount : !llvm.ptr, i32 -> i32 ``` The `convert-nvvm-to-llvm` Pass generates the following inline assembly containing the PTX code, while keeping the order of arguments the same. The read/write modifiers are set based on the input and result types. 
```mlir %0 = llvm.inline_asm has_side_effects asm_dialect = att "mbarrier.arrive.expect_tx.b64 %0, [%1], %2;", "=r,l,r" %arg0, %arg1 : (!llvm.ptr, i32) -> i32 ``` Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D154060	2023-07-11 12:14:24 +02:00
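
The argument ordering described in commit dd080c7579 (results first and marked for writing, then operands) and the predicate guard described in commit 63389326f5 can be illustrated with a small standalone sketch. This is not the code from those commits: `buildConstraints` and `guardWithPredicate` are hypothetical helper names, the constraint letters are supplied by the caller rather than derived from MLIR types, and the `@%N` placeholder form for the guard is an assumption based on the `@p` syntax quoted in the commit message.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not an MLIR API): joins inline-asm register
// constraints so that results come first, each marked write-only with '=',
// followed by the read-only operands.
std::string buildConstraints(const std::vector<std::string> &results,
                             const std::vector<std::string> &operands) {
  std::string out;
  for (const std::string &r : results) {
    if (!out.empty())
      out += ",";
    out += "=" + r;
  }
  for (const std::string &o : operands) {
    if (!out.empty())
      out += ",";
    out += o;
  }
  return out;
}

// Hypothetical helper: prefixes a PTX string with a predicate guard when a
// predicate argument index is given. The "@%N" form is an assumption; the
// commit message only states that the predicate is the last argument.
std::string guardWithPredicate(const std::string &ptx,
                               std::optional<unsigned> predicateIndex) {
  if (!predicateIndex)
    return ptx; // No predicate: emit the instruction unguarded.
  return "@%" + std::to_string(*predicateIndex) + " " + ptx;
}
```

With one 32-bit result and a 64-bit plus a 32-bit operand, `buildConstraints({"r"}, {"l", "r"})` yields the `"=r,l,r"` string seen in the `llvm.inline_asm` example above.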

10 Commits