This commit removes the no longer required bitcast inserting pattern in
LLVM dialect's type consistency pattern. This was previously required to
enable Mem2Reg and SROA to promote accesses that had different types.
Recent changes to both passes added direct support for this feature to
them, so the pattern has no further use.
This commit relaxes the assumption of type consistency for LLVM dialect
load and store operations in SROA. Instead, there is now a check that
loads and stores are in the bounds specified by the sub-slot they
access.
This commit additionally removes the corresponding patterns from the
type consistency pass, as they are no longer necessary.
Note: It will be necessary to extend Mem2Reg with the logic for
differently sized accesses as well. This is non-the-less a strict
upgrade for productive flows, as the type consistency pass can produce
invalid IR for some odd cases.
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples of such are:
```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
```
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
This commit changes the MLIR to LLVMIR export to also attach subprogram
debug attachements to function declarations.
This commit additonally fixes the two passes that produce subprograms to
not attach the "Definition" flag to function declarations. This
otherwise results in invalid LLVM IR.
Add a rewriter for DIExpressions & use it to run legalization patterns
before exporting to llvm (because LLVM dialect allows DI Expressions
that may not be valid in LLVM IR).
The rewriter driver works similarly to the existing mlir rewriter
drivers, except it operates on lists of DIExpressionElemAttr (i.e.
DIExpressionAttr). Each rewrite pattern transforms a range of
DIExpressionElemAttr into a new list of DIExpressionElemAttr.
In addition, this PR sets up a place to add legalization patterns that
are broadly applicable internally to the LLVM dialect, and they will
always be applied prior to export. This PR adds one pattern for merging
fragment operators.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
This commit adds an optional distinct attribute parameter to the
DISubprogramAttr. This enables modeling of distinct subprograms, as
required for LLVM IR. This change is required to avoid accidential
uniquing of subprograms on functions that would lead to invalid LLVM IR
post export.
This commit adds a distinct attribute parameter to the DICompileUnit to
enable the modeling of distinctness. LLVM requires DICompileUnits to be
distinct and there are cases where one gets two equivalent compilation
units but LLVM still requires differentiates them. We observed such
cases for combinations of LTO and inline functions.
This patch also changes the DIScopeForLLVMFuncOp pass to a module pass,
to ensure that only one distinct DICompileUnit is created, instead of
one for each function.
Fixes https://github.com/llvm/llvm-project/issues/74453.
The `gepToByteOffset` was implicitly casting an signed integer to an
unsigned integer even though negative dimensions are valid for
`llvm.getelementptr`.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
This commit removes the support for typed pointers from the LLVM
dialect. Typed pointers have been deprecated for a while and thus this
removal was announced in a PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
This change includes:
- Changing the ` LLVMPointerType`
- Removing remaining usages of the builders and the now removed element
type
- Fixing assembly formats that require fully qualified pointer types
- Updating ODS pointer constraints
This adds a new pass to add an Any comdat to each linkonce
and linkonce_odr function in the LLVM dialect. These comdats are
necessary on Windows
to allow the default system linker to link binaries containing these
functions.
The pattern so far has only supported splitting stores into struct types, marking arrays as explicitly unsupported. This would lead to stores into arrays not being made type-consistent like structs and therefore also not being properly split by SROA and mem2reg.
This patch adds support for array types by creating a common abstraction for both structs and arrays, making an array of size n essentially be treated like a struct with n fields of the arrays element type.
This gives us immediate feature parity without special casing for either of the types.
Differential Revision: https://reviews.llvm.org/D154840
The pattern was previously only capable of storing into struct fields which are primitive types. If the struct contained a nested struct it immediately aborted the pattern rewrite.
This patch introduces the capability of recursively splitting stores into sub-structs as well. This is achieved by splitting an aggregate sized integer from the original store argument and letting repeated pattern applications further split it into field stores.
Additionally, the pattern is also capable of handling partial writes into aggregates, which is a pattern clang may generate as well. Special care had to be taken to make sure no stores are created that weren't in the original code.
Differential Revision: https://reviews.llvm.org/D154707
The goal of this pattern is to eliminate all GEPs that have more than two indices by splitting it into multiple GEPs.
The advantage of this change is that the resulting GEPs only ever index into one aggregate at the time. This enables handling sub-aggregates in other patterns and also creates IR with easier to deduce pointer element types.
As a minor note, `getResultPtrElementType` for `GEPOp` was rewritten since it did not properly handle dynamic indices. The way GEPOp is specified, the resulting pointer element type can *always* be deduced from its base type and indices.
Differential Revision: https://reviews.llvm.org/D154692
Current patterns attempt to immediately create type-consistent stores by bitcasting the value to the type-hint leading to needless code duplication.
This patch extracts that case into its own pattern, allowing other patterns to create type-inconsistent stores and have subsequent pattern applications turn it into a type-consistent store.
Differential Revision: https://reviews.llvm.org/D154587
The original plan was to turn this into its own pattern, but one of the difficulties was deeming when splitting the vector is required.
`SplitIntegerStores` essentially already did that by checking for field overlap.
Therefore, it was renamed to `SplitStores` and extended to splitting stores with values of vector and integer type.
The vector splitting is done in a simple manner by simply using `extractelement` to get each vector element. Subsequent pattern applications are responsible for further cleaning up the output and making it type-consistent.
Worst case, if the code cannot be transformed into a type-consistent form (due to e.g. the code explicitly doing partial writes to elements or similar), we might needlessly do a vector split.
Differential Revision: https://reviews.llvm.org/D154583
This is a common pattern produced by clang and similar.
Essentially, it coalesces stores into adjacent integer fields into a single integer store.
This violates our definition of type-consistency that the pass is supposed to enforce and also prevents SROA and mem2reg from eliminating `alloca`s.
This patch fixes that by splitting these stores into multiple stores.
It does so by simply using logical shift rights and truncating the produced value to the size of the field, optionally bitcasting before storing into the field.
The implementation is currently very simple, only working on struct types of a single depth and adjacent fields in that struct, with no padding inbetween. Future work could improve on these once required.
Differential Revision: https://reviews.llvm.org/D154449
This revision introduces new rewrites to improve the type consistency of
a program expressed in the LLVM dialect.
Type consistency means that a given opaque pointer is consistently used
assuming the same pointee type, in a best effort basis. The introduced
rewrites modify the program to improve type consistency while preserving
the same semantics. This is useful for two main reasons:
- Transformation passes in the LLVM dialect like SROA or Mem2Reg can
analyse code better if type information and structure is used in a
consistent manner. Opaque pointers make this difficult to enforce, but
type consistency improvements increase the amount of occurences where
reasonable analysis can pick up on transformable patterns.
- While LLVM IR is not particularly picky about inconsistent type uses,
it may be of interest to lift LLVM IR into higher level dialects.
Having more instances of consistent type information would help
lifting into dialects that do care about consistent types.
In order to detect cases of inconsistent uses, operations returning an
LLVMPointer can implement the GetResultPtrElementType interface, which
allows getting a hint of which type the provided pointer should see its
pointee as, if such hint is available. The provided rewrites will then
use this hint to attempt to modify operations using the pointers so they
use the hinted type when dealing with the pointer.
Two transformations have been implemented in this revision:
- When a load/store uses a struct ptr directly to write to the first
element of the struct, introduce a GEP to the first element so the
type structure is preserved.
- When a GEP statically indexes in a pointer with a base type
inconsistent with the hint, try to find indices using the hint as a
base type that would result in the same offset, and replace the GEP
with this indexing.
More transformations are possible and I hope this is only a beginning
for this simplification effort.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D153973
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
This may be seen as a hack, but it allows for any piece of MLIR to be able
to end up with DWARF debug info through LLVM.
Assuming the operations in the function have location such as FileLineCol,
this provides backtraces with line tables and allows to step in a debugger.
That makes this pass a perfect companion to -snapshot-op-locations
It was also the default behavior of MLIR to LLVM IR translation until MLIR
got support for proper debug info attributes.
Differential Revision: https://reviews.llvm.org/D144069
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Review: https://reviews.llvm.org/D132838
The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface
wrappers to be emitted for every builtin function in the module. While this has
been useful to bootstrap the interface, it is problematic in the longer term as
it may unintentionally affect the functions that should retain their existing
interface, e.g., libm functions obtained by lowering math operations (see
D126964 for an example). Since D77314, we have a finer-grain control over
interface generation via an attribute that avoids the problem entirely. Remove
the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass
that can be run in any pipeline that needs blanket emission of functions to
annotate all builtin functions with the attribute before performing the usual
lowering that accounts for the attribute.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D127952
This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% in average, sometimes more) because:
- it performs less Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor
Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"
// This is a variant of CUDA's own __hdiv which is fast than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
float fa = __half2float(a);
float fb = __half2float(b);
float rcp;
asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));
float result = fa * rcp;
auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
if (exponent != 0 && exponent != 0x7f800000) {
float err = __fmaf_rn(-fb, result, fa);
result = __fmaf_rn(rcp, err, result);
}
return __float2half(result);
}
// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
return __float2half(__half2float(a) / __half2float(b));
}
// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
float fa = __half2float(a);
float fb = __half2float(b);
float result;
asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
return __float2half(result);
}
__global__ void CheckCorrectness() {
int i = threadIdx.x + blockIdx.x * blockDim.x;
half x = reinterpret_cast<const half&>(i);
for (int j = 0; j < 65536; ++j) {
half y = reinterpret_cast<const half&>(j);
half d1 = hdiv_newton(x, y);
half d2 = hdiv_promote(x, y);
auto s1 = reinterpret_cast<const short&>(d1);
auto s2 = reinterpret_cast<const short&>(d2);
if (s1 != s2) {
printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
__half2float(x), i, __half2float(y), j, __half2float(d1), s1,
__half2float(d2), s2);
//__trap();
}
}
}
__device__ half dst;
__global__ void ProfileBuiltin(half x) {
#pragma unroll 1
for (int i = 0; i < 10000000; ++i) {
x = x / x;
}
dst = x;
}
__global__ void ProfilePromote(half x) {
#pragma unroll 1
for (int i = 0; i < 10000000; ++i) {
x = hdiv_promote(x, x);
}
dst = x;
}
__global__ void ProfileNewton(half x) {
#pragma unroll 1
for (int i = 0; i < 10000000; ++i) {
x = hdiv_newton(x, x);
}
dst = x;
}
__global__ void ProfileApprox(half x) {
#pragma unroll 1
for (int i = 0; i < 10000000; ++i) {
x = hdiv_approx(x, x);
}
dst = x;
}
int main() {
CheckCorrectness<<<256, 256>>>();
half one = __float2half(1.0f);
ProfileBuiltin<<<1, 1>>>(one); // 1.001s
ProfilePromote<<<1, 1>>>(one); // 0.560s
ProfileNewton<<<1, 1>>>(one); // 0.508s
ProfileApprox<<<1, 1>>>(one); // 0.304s
auto status = cudaDeviceSynchronize();
printf("%s\n", cudaGetErrorString(status));
}
```
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D126158
BlockArguments gained the ability to have locations attached a while ago, but they
have always been optional. This goes against the core tenant of MLIR where location
information is a requirement, so this commit updates the API to require locations.
Fixes#53279
Differential Revision: https://reviews.llvm.org/D117633
The translation from the MLIR LLVM dialect to LLVM IR includes a mechanism that
ensures the successors of a block to be different blocks in case block
arguments are passed to them since the opposite cannot be expressed in LLVM IR.
This mechanism previously only worked for functions because it was written
prior to the introduction of other region-carrying operations such as the
OpenMP dialect, which also translates directly to LLVM IR. Modify this
mechanism to handle all regions in the module and not only functions.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D117548
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.
Differential Revision: https://reviews.llvm.org/D91572
- Exports MLIR targets to be used out-of-tree.
- mimicks `add_clang_library` and `add_flang_library`.
- Fixes libMLIR.so
After https://reviews.llvm.org/D77515 libMLIR.so was no longer containing
any object files. We originally had a cludge there that made it work with
the static initalizers and when switchting away from that to the way the
clang shlib does it, I noticed that MLIR doesn't create a `obj.{name}` target,
and doesn't export it's targets to `lib/cmake/mlir`.
This is due to MLIR using `add_llvm_library` under the hood, which adds
the target to `llvmexports`.
Differential Revision: https://reviews.llvm.org/D78773
[MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIB
Primarily, this patch moves all mlir references to LLVM libraries into
either LLVM_LINK_COMPONENTS or LINK_COMPONENTS. This enables magic in
the llvm cmake files to automatically replace reference to LLVM components
with references to libLLVM.so when necessary. Among other things, this
completes fixing libMLIR.so, which has been broken for some configurations
since D77515.
Unlike previously, the pattern is now that mlir libraries should almost
always use add_mlir_library. Previously, some libraries still used
add_llvm_library. However, this confuses the export of targets for use
out of tree because libraries specified with add_llvm_library are exported
by LLVM. Instead users which don't need/can't be linked into libMLIR.so
can specify EXCLUDE_FROM_LIBMLIR
A common error mode is linking with LLVM libraries outside of LINK_COMPONENTS.
This almost always results in symbol confusion or multiply defined options
in LLVM when the same object file is included as a static library and
as part of libLLVM.so. To catch these errors more directly, there's now
mlir_check_all_link_libraries.
To simplify usage of add_mlir_library, we assume that all mlir
libraries depend on LLVMSupport, so it's not necessary to separately specify
it.
tested with:
BUILD_SHARED_LIBS=on,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79067
[MLIR] Move from using target_link_libraries to LINK_LIBS
This allows us to correctly generate dependencies for derived targets,
such as targets which are created for object libraries.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79243
Three commits have been squashed to avoid intermediate build breakage.
Summary:
This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex.
Differential Revision: https://reviews.llvm.org/D77367
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
Differential Revision: https://reviews.llvm.org/D77350
This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument).
Differential Revision: https://reviews.llvm.org/D76659
This generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes.
Differential Revision: https://reviews.llvm.org/D76657
MLIR supports terminators that have the same successor block with different
block operands, which cannot be expressed in the LLVM's phi-notation as the
block identifier is used to tell apart the predecessors. This limitation can be
worked around by branching to a new block instead, with this new block
unconditionally branching to the original successor and forwarding the
argument. Until now, this transformation was performed during the conversion
from the Standard to the LLVM dialect. This does not scale well to multiple
dialects targeting the LLVM dialect as all of them would have to be aware of
this limitation and perform the preparatory transformation. Instead, do it as a
separate pass and run it immediately before the translation.
Differential Revision: https://reviews.llvm.org/D75619