This PR enhances `BasicPtxBuilder` to support predicates in PTX code
generation. The `BasicPtxBuilder` interface was initially introduced for
generating PTX code automatically for Ops that aren't supported by LLVM
core. Predicates, which are typically not supported in LLVM core, are
now supported using the same mechanism.
In PTX programming, instructions can be guarded by predicates as shown
below:. Here `@p` is a predicate register and guard the execution of the
instruction.
```
@p ptx.code op1, op2, op3
```
This PR introduces the `getPredicate` function in the `BasicPtxBuilder`
interface to set an optional predicate. When a predicate is provided,
the instruction is generated with predicate and guarded, otherwise,
predicate is not genearted. Note that the predicate value must always
appear as the last argument on the Op definition.
Additionally, this PR implements predicate usage for the following ops:
- mbarrier.init
- mbarrier.init.shared
- mbarrier.arrive.expect_tx
- mbarrier.arrive.expect_tx.shared
- cp.async.bulk.tensor.shared.cluster.global
- cp.async.bulk.tensor.global.shared.cta
See for more detail in PTX programing model
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-instructions
This work introduces the `wgmma.mma_async` Op along PTX generation using `BasicPtxBuilderOpInterface`. The Op is designed to execute the matrix multiply-and-accumulate operation across a warpgroup (128 threads). It's important to note that this operation works for devices with the sm_90a capability.
The matrix multiply-and-accumulate operation can take one of the following forms. In both cases, matrix D is referred to as the accumulator:
D = A * B + D : Result is added to the accumulator matrix D.
D = A * B : The input from the accumulator matrix D is not utilized.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D157370
Most `*-to-llvm` conversion patterns require a type converter. This
revision adds a type converter to the
`populateConvertToLLVMConversionPatterns` function and implements the
interface for the MemRef dialect.
Differential Revision: https://reviews.llvm.org/D157387
The multiple -convert-XXX-to-llvm passes are really nice testing tools for
individual dialects, but the expectation is that a proper conversion should
assemble the conversion patterns using `populateXXXToLLVMConversionPatterns()
APIs. However most customers just chain the conversion passes by convenience.
This pass makes it composable more transparently to assemble the required
patterns for conversion to LLVM dialect by using an interface.
The Pass will scan the input and collect all the dialect present, and for
those who implement the `ConvertToLLVMPatternInterface` it will use it to
populate the conversion pattern, and possible the conversion target.
Since these conversions can involve intermediate dialects, or target other
dialects than LLVM (for example AVX or NVVM), this pass can't statically
declare the required `getDependentDialects()` before the pass pipeline
begins. This is worked around by using an extension in the dialectRegistry
that will be invoked for every new loaded dialects in the context. This
allows to lookup the interface ahead of time and use it to query the
dependent dialects.
Differential Revision: https://reviews.llvm.org/D157183
getPtx used to return `const char*`. It is not flexible when one needs to build string in the function. This work changes return type.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D155056
It introduces an NVVMToLLVM Pass and a `BasicPtxBuilderOpInterface` interface. The Pass performs pattern matching on all the NVVM Ops that implement the BasicPtxBuilderOpInterface interface to generate LLVM Inline Assembly Ops.
The BasicPtxBuilderOpInterface interface is utilized in the convert-nvvm-to-llvm pass, which lowers Ops that support this interface to inline assembly Ops. The interface provides several methods that are used for this lowering.
The `getPtx` method returns PTX code. The `hasSideEffect` method is used to determine whether the op has any side effects on the memory. The `hasIntrinsic` method indicates whether the operation has intrinsic support in LLVM. This is particularly useful for Ops that don't have intrinsic support for each case. The `getAsmValues` method returns the arguments to be passed to the PTX code. The order of arguments starts with the results and they are used for write operations, followed by the operands and attributes.
Example:
If we have the following Op definition that returns PTX code through getPtx:
```tablegen
def NVVM_MBarrierArriveExpectTxOp : NVVM_Op<\"mbarrier.arrive.expect_tx\",
[DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]>,
Results<(outs LLVM_Type:$res)>, Arguments<(ins LLVM_i64ptr_any:$addr, I32:$txcount)> {
...
let extraClassDefinition = [{
const char* $cppClass::getPtx() { return \"mbarrier.arrive.expect_tx.b64 %0, [%1], %2;\"; }
}\];
}
```
The NVVM Op will look like below:
```mlir
%0 = nvvm.mbarrier.arrive.expect_tx %barrier, %txcount : !llvm.ptr, i32 -> i32
```
The `convert-nvvm-to-llvm` Pass generates the following PTX code, while keeping the order of arguments the same. The read/write modifiers are set based on the input and result types.
```mlir
%0 = llvm.inline_asm has_side_effects asm_dialect = att "mbarrier.arrive.expect_tx.b64 %0, [%1], %2;", "=r,l,r" %arg0, %arg1 : (!llvm.ptr, i32) -> i32
```
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D154060