The `gpu.module` operation can contain `spirv.target_env` attributes
within an array attribute named `"targets"`. So it accounts for that
case by iterating over the `"targets"` attribute, if present, and
looking up `spirv.target_env`.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
`offset` and `width` must be constants and there are constraints on
their values. Update the operation definition to use attributes instead
of operands.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch migrates away from TypeRagne(std::nullopt) and
ValueRange(std::nullopt).
From the description of gpu.shuffle operation, shuffle up/down rotates
values in the subgroup because it applies modulo on the shifted value to
calculate the result lane ID. It is inconsistent with the definition of
SPIR-V shuffle up/down and NVVM data movement definitions within
subgroup.
In NVVM, it says
"If the computed source lane index j is in range, the returned i32 value
will be the value of %a from lane j; otherwise, it will be the the value
of %a from the current thread."
It will keep the original value if the result land ID is out of range.
In SPIR-V OpGroupNonUniformShuffleUp and OpGroupNonUniformShuffleDown,
it says
"The resulting value is undefined if Delta is greater than the current
invocation’s id within the scope or if the identified invocation is not
in scope restricted tangle."
It's an undefined value if the result land ID is out of range.
Anyway, there is no circular movement in shuffle up/down from these 2
specifications. This patch removes the circular movement in gpu.shuffle
up/down and lower gpu.shuffle up/down to SPIR-V
OpGroupNonUniformShuffleUp and OpGroupNonUniformShuffleDown directly.
Reference:
https://docs.nvidia.com/cuda/archive/12.2.1/nvvm-ir-spec/index.html#data-movementhttps://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformShuffleUphttps://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformShuffleDown
- Introduced `gpu.subgroup_mma_extract` operation to extract values from
`!gpu.mma_matrix` by invocation and indices.
- Introduced `gpu.subgroup_mma_insert` operation to insert values into
`!gpu.mma_matrix` by invocation and indices.
- Updated the conversion patterns to SPIR-V for both extract and insert
operations.
- Added test cases to validate the new operations in the GPU to SPIR-V
conversion.
RFC:
https://discourse.llvm.org/t/rfc-add-gpu-operations-to-permute-data-in-2-loaded-mma-matrix/86148?u=hsiangkai
This is a code cleanup. Update a few places in MLIR that should use
`hasSingleElement`/`getSingleElement`.
Note: `hasSingleElement` is faster than `.getSize() == 1` when it is
used with linked lists etc.
Depends on #131508.
This commit is a further incremental step toward moving the whole
mlir-vulkan-runner MLIR pass pipeline into mlir-opt (see #73457). The
previous step was b225b3adf7b78387c9fcb97a3ff0e0a1e26eafe2, which moved
all device passes prior to SPIR-V serialization into a new mlir-opt test
pass, `-test-vulkan-runner-pipeline`.
This commit changes how SPIR-V serialization is accomplished for Vulkan
runner tests. Until now, this was done by the Vulkan-specific
ConvertGpuLaunchFuncToVulkanLaunchFunc pass. With this commit, this
responsibility is removed from that pass, and is instead done with the
existing generic GpuModuleToBinaryPass. In addition, the SPIR-V
serialization step is no longer done inside mlir-vulkan-runner, but
rather inside mlir-opt (in the `-test-vulkan-runner-pipeline` pass).
Both of these changes represent a greater alignment between
mlir-vulkan-runner and the other GPU integration tests. Notably, the IR
shapes produced by the mlir-opt pipelines for the Vulkan and SYCL
runners are now much more similar, with both using a gpu.binary op for
the serialized SPIR-V kernel.
In order to enable this, this commit includes these supporting changes:
- ConvertToSPIRVPass is enhanced to support producing the IR shape where
a spirv.module is nested inside a gpu.module, since this is what
GpuModuleToBinaryPass expects.
- ConvertGPULaunchFuncToVulkanLaunchFunc is changed to remove its SPIR-V
serialization functionality, and instead now extracts the SPIR-V from a
gpu.binary operation (as produced by ConvertToSPIRVPass).
- `-test-vulkan-runner-pipeline` now attaches SPIR-V target information
required by GpuModuleToBinaryPass.
- The WebGPU pass option, which had been removed from mlir-vulkan-runner
in the previous commit in this series, is restored as an option to
`-test-vulkan-runner-pipeline` instead, so that the WebGPU pass
continues being inserted into the pipeline just before SPIR-V
serialization.
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate pattern now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
This change contains following:
- adds lowering of printf op to spirv.CL.printf op in GPUToSPIRV pass.
- Fixes Constant decoration parsing for spirv GlobalVariable.
- minor modification to spirv.CL.printf op assembly format.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This enables performing several reductions in parallel, each smaller
than the size of the subgroup.
One potential application is flash attention with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which then constrains the shape the following reduction can
have if we want to keep data in registers.
- Replace hand-written parser/printer with auto-generated assembly
format.
- Remove implicit `gpu.module_end` terminator and use the `NoTerminator`
trait instead. (Same as `builtin.module`.)
- Turn the region into a graph region. (Same as `builtin.module`.)
This PR tries to reland #93595 which was reverted in #93732 due to some
issues. The original PR:
- Add integration test for `vector.shuffle` and `vector.interleave`
- Add `VectorToSPIRV` patterns to `GPUToSPIRVPass`
Description of the issue:
-
https://github.com/llvm/llvm-project/pull/93595#issuecomment-2138541700
- Using either `vector.load` or `vector.store` in the kernel function
will cause the validation layer to report an error
- Trying to bypass the issue by using `memref.load` and `memref.store`
to load/store individual elements from/to the vectors, and populate the
vectors using `vector.insertelement` and `vector.extractelement`
instead.
- Add integration test for `vector.shuffle` and `vector.interleave`,
mentioned in issue #91978
- Add `VectorToSPIRV` patterns to `GPUToSPIRVPass`
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This extension has been superseded by SPV_KHR_cooperative_matrix which
is supported across major vendors GPU like Nvidia, AMD, and Intel.
Given that the KHR version has been supported for nearly half a year,
drop the NV-specific extension to reduce the maintenance burden and code
duplication.
Each vector element is reduced independently, which is a form of
multi-reduction.
The plan is to allow for gradual lowering of multi-reduction that
results in fewer `gpu.shuffle` ops at the end:
1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d
`gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32
For example we can perform 2 independent f16 reductions with a series of
`gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
Issue: https://github.com/llvm/llvm-project/issues/72354
The motivation for this change is explained in
https://github.com/llvm/llvm-project/issues/72354.
Before this change, we could not tell between signed/unsigned
minimum/maximum and NaN treatment for floating point values.
The mapping of old reduction operations to the new ones is as follows:
* `min` --> `minsi` for ints, `minf` for floats
* `max` --> `maxsi` for ints, `maxf` for floats
New reduction kinds not represented in the old enum: `minui`, `maxui`,
`minimumf`, `maximumf`.
As a next step, I would like to have a common definition of combining
kinds used by the `vector` and `gpu` dialects. Separately, the GPU to
SPIR-V lowering does not yet properly handle zero and NaN values -- the
behavior of floating point min/max group reductions is not specified by
the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459.
Issue: https://github.com/llvm/llvm-project/issues/72354
This includes a couple of changes to pass behavior for OpenCL kernels.
Vulkan shaders are not impacted by the changes.
1. SPIR-V module is placed inside GPU module. This change is required for
gpu-module-to-binary to work correctly as it expects kernel function to
be inside the GPU module.
2. A dummy func.func with same kernel name as gpu.func is created. GPU
compilation pipeline defers lowering of gpu launch kernel op. Since
spirv.func is not directly tied to gpu launch kernel, a dummy func.func
is required to avoid legalization issues.
3. Use correct mapping when mapping MemRef memory space to SPIR-V
storage class for OpenCL kernels.
- Now that the KHR coop matrix implementation is robust, switch the gpu
conversion pass to default to it.
- Use a populate function for MMA to coop matrix type conversions. This
makes the API surface area smaller.
These do not produce extension-specific ops and are handled via common
patterns for both the KHR and the NV coop matrix extension.
Also improve match failure reporting and error handling in type
conversion.
This is plugged in as an alternative lowering path in the gpu to spirv
dialect conversion. Add custom op builders for coop matrix ops to make
the create functions nicer to work with and less error-prone. The latter
is accomplished by following the op syntax and also requiring stride to
be a constant op to avoid confusion around the order of arguments.
The remaining lowering patterns will be added in a future patch.
This is a cleanup in preparation for adding a second conversion path
using the KHR cooperative matrix extension.
Make the existing lowering explicit about emitting ops from the NV coop
matrix extension. Clean up surrounding code.
ConversionPatterns do not (and should not) modify the type converter that they are using.
* Make `ConversionPattern::typeConverter` const.
* Make member functions of the `LLVMTypeConverter` const.
* Conversion patterns take a const type converter.
* Various helper functions (that are called from patterns) now also take a const type converter.
Differential Revision: https://reviews.llvm.org/D157601
This commit adds support for arith.extf in the supported list of
elementwise ops for subgroup MMA ops, and enables lowering to
SPIR-V.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D156847
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
We use `use64bitIndex` in the option to decide the target device
address bitwidth. This makes it consistent with index type
conversion too.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D144827
Vulkan requires GPU processor ID/count builtin variables to be
32-bit scalar or vector for all the cases. Similarly there
are special requirements for OpenCL. We need to make sure those
rules are respected when converting using 64bit for index.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D144819
This commit just adds options to control index type bitwidth in
GPUToSPIRV conversion, and updates tests to prepare for 64bit
index conversion.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D144826
This is part of an effort to migrate from llvm::Optional to
std::optional. 22426110c5ef changed the way mlir-tblgen generates .inc
files, emitting std::optional when an Optional attribute is specified in
a .td file. It also changed several .td files hard-coding llvm::Optional
to use std::optional. However, the patch excluded a few .td files in
SPIRV and Bufferization hard-coding llvm::Optional. This patch fixes
that defect, and after this patch, references to llvm::Optional in .cpp
and .h files can be replaced mechanically.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D140329
Conversion from gpu.subgroup_mma_constant_matrix to spirv.MatrixTimesScalar didn't check that the op type was a multiplication and thus would incorrectly convert other elementwise scalar operations.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D140081