llvm-project

Author	SHA1	Message	Date
Alex Zinenko	610139d2d9	[mlir] replace 'emit_c_wrappers' func->llvm conversion option with a pass The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface wrappers to be emitted for every builtin function in the module. While this has been useful to bootstrap the interface, it is problematic in the longer term as it may unintentionally affect the functions that should retain their existing interface, e.g., libm functions obtained by lowering math operations (see D126964 for an example). Since D77314, we have a finer-grain control over interface generation via an attribute that avoids the problem entirely. Remove the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass that can be run in any pipeline that needs blanket emission of functions to annotate all builtin functions with the attribute before performing the usual lowering that accounts for the attribute. Reviewed By: chelini Differential Revision: https://reviews.llvm.org/D127952	2022-06-17 11:10:31 +02:00
Mogball	e16d13322b	[mlir] (NFC) Clean up bazel and CMake target names All dialect targets in bazel have been named Dialect and all dialect targets in CMake have been named MLIRDialect.	2022-06-13 16:24:15 +00:00
Christian Sigg	400fef081a	Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This change rolls bcfc0a9051014437b55ab932d9aca5ecdca6776b forward (i.e., reverting 369ce54bb302f209239b8ebc77ad824add9df089) with fixed CMakeLists.txt.	2022-06-05 09:11:43 +02:00
Mehdi Amini	369ce54bb3	Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This reverts commit bcfc0a9051014437b55ab932d9aca5ecdca6776b. The build is broken with shared library enabled.	2022-06-04 08:35:45 +00:00
Christian Sigg	bcfc0a9051	[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration. This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:
- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor
Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```
Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D126158	2022-06-04 08:03:29 +02:00
River Riddle	e084679f96	[mlir] Make locations required when adding/creating block arguments BlockArguments gained the ability to have locations attached a while ago, but they have always been optional. This goes against the core tenet of MLIR where location information is a requirement, so this commit updates the API to require locations. Fixes #53279 Differential Revision: https://reviews.llvm.org/D117633	2022-01-19 17:35:35 -08:00
Alex Zinenko	1ad48d6de2	[mlir] handle nested regions in llvm-legalize-for-export The translation from the MLIR LLVM dialect to LLVM IR includes a mechanism that ensures the successors of a block to be different blocks in case block arguments are passed to them since the opposite cannot be expressed in LLVM IR. This mechanism previously only worked for functions because it was written prior to the introduction of other region-carrying operations such as the OpenMP dialect, which also translates directly to LLVM IR. Modify this mechanism to handle all regions in the module and not only functions. Reviewed By: wsmoses Differential Revision: https://reviews.llvm.org/D117548	2022-01-18 17:09:14 +01:00
Mehdi Amini	be0a7e9f27	Adjust "end namespace" comment in MLIR to match the newly agreed coding style. See D115115 and this mailing list discussion: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154199.html Differential Revision: https://reviews.llvm.org/D115309	2021-12-08 06:05:26 +00:00
River Riddle	65fcddff24	[mlir][BuiltinDialect] Resolve comments from D91571 * Move ops to a BuiltinOps.h * Add file comments	2020-11-19 11:12:49 -08:00
River Riddle	73ca690df8	[mlir][NFC] Remove references to Module.h and Function.h These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp. Differential Revision: https://reviews.llvm.org/D91572	2020-11-17 00:55:47 -08:00
Rahul Joshi	d150662024	[MLIR][NFC] Eliminate .getBlocks() when not needed Differential Revision: https://reviews.llvm.org/D82229	2020-06-19 14:16:21 -07:00
Stephen Neuendorffer	5469f434bb	[MLIR] Reapply: Adjust libMLIR building to more closely follow libClang This reverts commit ab1ca6e60fc58b857cc5030ca6e024d20d919cb9.	2020-05-04 20:47:57 -07:00
Stephen Neuendorffer	ab1ca6e60f	Revert "[MLIR] Adjust libMLIR building to more closely follow libClang" This reverts commit 4f0f436749c264c16eb226c9b9b132e07e3650a6. This seems to show some compile dependence problems, and also breaks flang.	2020-05-04 12:40:12 -07:00
Valentin Churavy	4f0f436749	[MLIR] Adjust libMLIR building to more closely follow libClang
- Exports MLIR targets to be used out-of-tree.
- Mimics `add_clang_library` and `add_flang_library`.
- Fixes libMLIR.so
After https://reviews.llvm.org/D77515 libMLIR.so was no longer containing any object files. We originally had a kludge there that made it work with the static initializers, and when switching away from that to the way the clang shlib does it, I noticed that MLIR doesn't create an `obj.{name}` target and doesn't export its targets to `lib/cmake/mlir`. This is due to MLIR using `add_llvm_library` under the hood, which adds the target to `llvmexports`.
Differential Revision: https://reviews.llvm.org/D78773
[MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIB
Primarily, this patch moves all mlir references to LLVM libraries into either LLVM_LINK_COMPONENTS or LINK_COMPONENTS. This enables magic in the llvm cmake files to automatically replace references to LLVM components with references to libLLVM.so when necessary. Among other things, this completes fixing libMLIR.so, which has been broken for some configurations since D77515.
Unlike previously, the pattern is now that mlir libraries should almost always use add_mlir_library. Previously, some libraries still used add_llvm_library. However, this confuses the export of targets for use out of tree because libraries specified with add_llvm_library are exported by LLVM. Instead, users which don't need/can't be linked into libMLIR.so can specify EXCLUDE_FROM_LIBMLIR.
A common error mode is linking with LLVM libraries outside of LINK_COMPONENTS. This almost always results in symbol confusion or multiply defined options in LLVM when the same object file is included as a static library and as part of libLLVM.so. To catch these errors more directly, there's now mlir_check_all_link_libraries.
To simplify usage of add_mlir_library, we assume that all mlir libraries depend on LLVMSupport, so it's not necessary to separately specify it.
Tested with: BUILD_SHARED_LIBS=on, BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB, BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com> Differential Revision: https://reviews.llvm.org/D79067
[MLIR] Move from using target_link_libraries to LINK_LIBS
This allows us to correctly generate dependencies for derived targets, such as targets which are created for object libraries.
By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com> Differential Revision: https://reviews.llvm.org/D79243
Three commits have been squashed to avoid intermediate build breakage.	2020-05-04 11:40:46 -07:00
River Riddle	1834ad4a69	[mlir][Pass] Update the PassGen to generate base classes instead of utilities Summary: This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex. Differential Revision: https://reviews.llvm.org/D77367	2020-04-07 14:08:52 -07:00
River Riddle	80aca1eaf7	[mlir][Pass] Remove the use of CRTP from the Pass classes This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend. Differential Revision: https://reviews.llvm.org/D77350	2020-04-07 14:08:52 -07:00
River Riddle	9a277af2d4	[mlir][Pass] Add support for generating pass utilities via tablegen This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument). Differential Revision: https://reviews.llvm.org/D76659	2020-04-01 02:10:46 -07:00
River Riddle	e3d834a54a	[mlir][Pass] Move the registration of dialect passes to tablegen This generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes. Differential Revision: https://reviews.llvm.org/D76657	2020-04-01 02:10:46 -07:00
Alex Zinenko	e119980f3f	[mlir] LLVM dialect: move ensureDistinctSuccessors out of std->LLVM conversion MLIR supports terminators that have the same successor block with different block operands, which cannot be expressed in LLVM's phi notation because the block identifier is used to tell apart the predecessors. This limitation can be worked around by branching to a new block instead, with this new block unconditionally branching to the original successor and forwarding the argument. Until now, this transformation was performed during the conversion from the Standard to the LLVM dialect. This does not scale well to multiple dialects targeting the LLVM dialect as all of them would have to be aware of this limitation and perform the preparatory transformation. Instead, do it as a separate pass and run it immediately before the translation. Differential Revision: https://reviews.llvm.org/D75619	2020-03-17 15:22:14 +01:00
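
Illustration for e119980f3f (not part of the commit): a minimal C++ sketch of the successor-splitting idea described in that message, applied to a toy control-flow graph. The `Edge` and `Block` structs and this `ensureDistinctSuccessors` signature are hypothetical stand-ins chosen for the example, not MLIR's API; the actual pass works on MLIR blocks with typed block arguments, which this sketch reduces to value names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: one outgoing edge of a terminator.
struct Edge {
  int target;                     // index of the successor block
  std::vector<std::string> args;  // values passed as block arguments
};

// Hypothetical toy model: a block is a name plus its terminator's edges.
struct Block {
  std::string name;
  std::vector<Edge> successors;
};

// If a terminator names the same successor more than once while passing
// arguments, route every repeated edge through a fresh forwarding block
// that branches unconditionally to the original successor with the same
// arguments. Returns the number of forwarding blocks created.
std::size_t ensureDistinctSuccessors(std::vector<Block> &blocks) {
  std::size_t inserted = 0;
  const std::size_t originalCount = blocks.size();
  for (std::size_t b = 0; b < originalCount; ++b) {
    std::set<int> seen;
    for (std::size_t e = 0; e < blocks[b].successors.size(); ++e) {
      // Copy the edge: push_back below may reallocate `blocks`.
      const Edge edge = blocks[b].successors[e];
      if (seen.insert(edge.target).second)
        continue;  // first edge to this successor stays as it is
      if (edge.args.empty())
        continue;  // no arguments, so nothing to disambiguate

      Block forwarder;
      forwarder.name = blocks[b].name + "_fwd" + std::to_string(e);
      forwarder.successors.push_back(edge);  // forward to original target

      const int forwarderIndex = static_cast<int>(blocks.size());
      blocks.push_back(forwarder);
      blocks[b].successors[e] = Edge{forwarderIndex, edge.args};
      ++inserted;
    }
  }
  return inserted;
}
```

With a block `entry` that branches to block 1 twice, once passing `%a` and once passing `%b`, the second edge is redirected to a new block that in turn branches to block 1 with `%b`, so each predecessor of block 1 is a distinct block.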

19 Commits