This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRAMDGPUTransforms.a only:
```
In file included from mlir/lib/Dialect/AMDGPU/Transforms/OptimizeSharedMemory.cpp:21:
mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory
```
- This PR adds parameterization for shared memory variables that are
used for optimization: `sharedMemoryLineSizeBytes` and
`defaultVectorSizeBits.`
- The default values are set to 128 for both variables since it gives
zero bank conflicts.
It implements transformation to optimize accesses to shared memory.
Reference: https://reviews.llvm.org/D127457
_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses
common to loading matrix multiplication operands to/from shared memory._
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Define operations that wrap the gfx940's new operations for converting
between f32 and registers containing packed sets of four 8-bit floats.
Define rocdl operations for the intrinsics and an AMDGPU dialect
wrapper around them (to account for the fact that MLIR distinguishes
the two float formats at the type level but that the LLVM IR does
not).
Define an ArithToAMDGPU pass, meant to run before conversion to LLVM,
that replaces relevant calls to arith.extf and arith.truncf with the
packed operations in the AMDGPU dialect. Note that the conversion
currently only handles scalars and vectors of rank <= 1, as we do not
have a usecase for multi-dimensional vector support right now.
Reviewed By: jsjodin
Differential Revision: https://reviews.llvm.org/D152457
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
This renaming started with the native ODS support for properties, this is completing it.
A mass automated textual rename seems safe for most codebases.
Drop also the ods prefix to keep the accessors the same as they were before
this change:
properties.odsOperandSegmentSizes
reverts back to:
properties.operandSegementSizes
The ODS prefix was creating divergence between all the places and make it harder to
be consistent.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D157173
Wave Matrix Multiply Accumulate (WMMA) is the instruction to accelerate
matrix multiplication on RDNA3 architectures. LLVM already provides a
set of intrinsics to generate wmma instructions. This change uses those
intrinsics to enable the feature in MLIR.
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D152451
`RewriterBase::Listener::notifyOperationReplaced` notifies observers that an op is about to be replaced with a range of values. This notification is not very useful for ops without results, because it does not specify the replacement op (and it cannot be deduced from the replacement values). It provides no additional information over the `notifyOperationRemoved` notification.
This revision adds an additional notification when a rewriter replaces an op with another op. By default, this notification triggers the original "op replaced with values" notification, so there is no functional change for existing code.
This new API is useful for the transform dialect, which needs to track op replacements. (Updated in a subsequent revision.)
Also includes minor documentation improvements.
Differential Revision: https://reviews.llvm.org/D152814
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This follows a previous patch that updated calls
`op.cast<T>()-> cast<T>(op)`. However some cases could not handle an
unprefixed `cast` call due to occurrences of variables named cast, or
occurring inside of class definitions which would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are
updated here to `llvm::cast` and similar with the intention that they
can be easily updated after the methods are removed through a
find-replace.
See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that is used and then update printed
occurrences of the function to include `llvm::` before.
One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-export-fixes /tmp/cast/casts.yaml mlir/*\
-header-filter=mlir/ -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D150348
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
Not all AMDGPU targets support all atomic operations. For example,
there are not atomic floating-point adds on the gfx10 series. Add a
pass to emulate these operations using a compare-and-swap loop, by
analogy to the generic atomicrmw rewrite in MemrefToLLVM.
This pass is named generally, as in the future we may have a
memref-to-amdgpu that translates constructs like atomicrmw fmax (which
doesn't generally exist in LLVM) to the relevant intrinsics, which may
themselves require emulation.
Since the AMDGPU dialect now has a pass that operates on it, the
dialect's directory structure is reorganized to match other similarly
complex dialects.
The pass should be run before amdgpu-to-rocdl if desired.
This commit also adds f64 support to atomic_fmax.
Depends on D148722
Reviewed By: nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D148724
This commit adds the buffer cmpswap intrinsic to the ROCDL dialect and
its corresponding AMDGPU dialect wrappers.
Reviewed By: nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D148722
This commit adds support for atomic fmax/smax/umin support
for AMDGPU dialect and the dependent dialects to allow such
a lowering.
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D144097
Upcoming AMD hardware will include functions that accept 8-bit floats.
Specifically, there are MFMA instructions that accept 8-bit floats,
either using the same or mixed formats. This patch adds MLIR wrappers
for these intrinsics and explicitly adds support for 8-bit floats in
the gpu-to-rocdl conversion by way of amdgpu-to-rocdl.
Since LLVM does not have f8 types, when targeting LLVM for compilation
on an AMD GPU, both f8 types used on AMD hardware (f8E5M2FNUZ and
f8E4M3FNUZ) are rewritten to i8.
This patch also relaxes the restriction that the types of both source
operands to a amdgpu.mfma instructions match exactly, as this is not
necessarily required for the bf8 (f8E5M2FNUZ) and fp8 (f8E4M3FNUZ)
instructions. In addition, since the buffer_{load,store} operations
maintain a whitelist of permitted types, we add the relevant f8 types
to that list.
This patch does not add any implementations of arithmetic operations
for f8 types.
Reviewed By: jakeh-gc
Differential Revision: https://reviews.llvm.org/D143956
Remapping memory spaces is a function often needed in type
conversions, most often when going to LLVM or to/from SPIR-V (a future
commit), and it is possible that such remappings may become more
common in the future as dialects take advantage of the more generic
memory space infrastructure.
Currently, memory space remappings are handled by running a
special-purpose conversion pass before the main conversion that
changes the address space attributes. In this commit, this approach is
replaced by adding a notion of type attribute conversions
TypeConverter, which is then used to convert memory space attributes.
Then, we use this infrastructure throughout the *ToLLVM conversions.
This has the advantage of loosing the requirements on the inputs to
those passes from "all address spaces must be integers" to "all
memory spaces must be convertible to integer spaces", a looser
requirement that reduces the coupling between portions of MLIR.
ON top of that, this change leads to the removal of most of the calls
to getMemorySpaceAsInt(), bringing us closer to removing it.
(A rework of the SPIR-V conversions to use this new system will be in
a folowup commit.)
As a note, one long-term motivation for this change is that I would
eventually like to add an allocaMemorySpace key to MLIR data layouts
and then call getMemRefAddressSpace(allocaMemorySpace) in the
relevant *ToLLVM in order to ensure all alloca()s, whether incoming or
produces during the LLVM lowering, have the correct address space for
a given target.
I expect that the type attribute conversion system may be useful in
other contexts.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D142159
std::optional::value() has undesired exception checking semantics and is
unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The
call sites block std::optional migration.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
When the bounds check attribute is true, the raw buffer load, store,
and atomic operations have well-defined behavior (returning 0 for
loads and ignoring stores) when the buffer access exceeds the bounds
of the memory being accessed.
Because of how LLVM currently implements these buffer operations (as
opaque intrinsics), the backend cannot optimize out this known
behavior and eliminate the memory operations. Therefore, use MLIR's
canonicalization system to eliminate these operations.
Reviewed By: nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D138146
The amdgpu.mfma operator is a wrapper around the Matrix Fused Multiply
Add (MFMA) instructions on some AMD GPUs (the CDNA-based MI-* cards).
This interface allows for selecting the operation to be performed by
specifying the dimensions of the multiplication to be performed and
any additional attributes (such as whether to use reduced-precision
floating-point math) that are needed to select the relevant mfma
instruction and set its parameters.
Reviewed By: ThomasRaoux, nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D132956
Follow up from flipping dialects to both, flip accessor used to prefixed
variant ahead to flipping from _Both to _Prefixed. This just flips to
the accessors introduced in the preceding change which are just prefixed
forms of the existing accessor changed from.
Mechanical change using helper script
https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.
By analogy with the NVGPU dialect, introduce an AMDGPU dialect for
AMD-specific intrinsic wrappers.
The dialect initially includes wrappers around the raw buffer intrinsics.
On AMD GPUs, a memref can be converted to a "buffer descriptor" that
allows more precise control of memory access, such as by allowing for
out of bounds loads/stores to be replaced by 0/ignored without adding
additional conditional logic, which is important for performance.
The repository currently contains a limited conversion from
transfer_read/transfer_write to Mubuf intrinsics, which are an older,
deprecated intrinsic for the same functionality.
The new amdgpu.raw_buffer_* ops allow these operations to be used
explicitly and for including metadata such as whether the target
chipset is an RDNA chip or not (which impacts the interpretation of
some bits in the buffer descriptor), while still maintaining an
MLIR-like interface.
(This change also exposes the floating-point atomic add intrinsic.)
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D122765