Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode as well. Update the GPU to NVVM lowerings to support
`transpose` mode and update integration tests as well.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D139021
Enables transposed gpu.subgroup_mma_load_matrix and updates the lowerings in Vector to GPU and GPU to SPIRV. Needed to enable B transpose matmuls lowering to wmma ops.
Taken over from author: stanley-nod <stanley@nod-labs.com>
Reviewed By: ThomasRaoux, antiagainst
Differential Revision: https://reviews.llvm.org/D138770
This allows for incrementally updating the old API usages without
needing to update everything at once. These will be left on Both
for a little bit and then flipped to prefixed when all APIs have been
updated.
Differential Revision: https://reviews.llvm.org/D134386
This patch "modernizes" the LLVM `insertvalue` and `extractvalue`
operations to use DenseI64ArrayAttr, since they only require an array of
indices and previously there was confusion about whether to use i32 or
i64 arrays, and to use assembly format.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D131537
The maxf implementation of wmma elementwise op was incorrect as the
operands of the select to check for Nan were swapped.
Differential Revision: https://reviews.llvm.org/D127879
insert is soft deprecated, so remove all references so it's less likely
to be used and can be easily removed in the future.
Differential Revision: https://reviews.llvm.org/D120021
Support load with broadcast, elementwise divf op and remove the
hardcoded restriction on the vector size. Picking the right size should
be enfored by user and will fail conversion to llvm/spirv if it is not
supported.
Differential Revision: https://reviews.llvm.org/D113618
Use existing helper instead of handling only a subset of indices lowering
arithmetic. Also relax the restriction on the memref rank for the GPU mma ops
as we can now support any rank.
Differential Revision: https://reviews.llvm.org/D113383
In order to support fusion with mma matrix type we need to be able to
execute elementwise operations on them. This add an op to be able to
support some basic elementwise operations. This is a is not a full
solution as it only supports a limited scope or operations. Ideally we would
want to be able to fuse with more kind of operations.
Differential Revision: https://reviews.llvm.org/D112857
wmma intrinsics have a large number of combinations, ideally we want to be able
to target all the different variants. To avoid a combinatorial explosion in the
number of mlir op we use attributes to represent the different variation of
load/store/mma ops. We also can generate with tablegen helpers to know which
combinations are available. Using this we can avoid having too hardcode a path
for specific shapes and can support more types.
This patch also adds boiler plates for tf32 op support.
Differential Revision: https://reviews.llvm.org/D112689
Allow lowering of wmma ops with 64bits indexes. Change the default
version of the test to use default layout.
Differential Revision: https://reviews.llvm.org/D112479
This commits updates the remaining usages of the ArrayRef<Value> based
matchAndRewrite/rewrite methods in favor of the new OpAdaptor
overload.
Differential Revision: https://reviews.llvm.org/D110360
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
This allow creating a matrix with all elements set to a given value. This is
needed to be able to implement a simple dot op.
Differential Revision: https://reviews.llvm.org/D103870
Consolidate the type conversion in a single function to make it simpler
to use. This allow to re-use the type conversion for up coming ops.
Differential Revision: https://reviews.llvm.org/D103868
In order to allow large matmul operations using the MMA ops we need to chain
operations this is not possible unless "DOp" and "COp" type have matching
layout so remove the "DOp" layout and force accumulator and result type to
match.
Added a test for the case where the MMA value is accumulated.
Differential Revision: https://reviews.llvm.org/D103023