19 Commits

Author SHA1 Message Date
Alex Zinenko
610139d2d9 [mlir] replace 'emit_c_wrappers' func->llvm conversion option with a pass
The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface
wrappers to be emitted for every builtin function in the module. While this has
been useful to bootstrap the interface, it is problematic in the longer term as
it may unintentionally affect the functions that should retain their existing
interface, e.g., libm functions obtained by lowering math operations (see
D126964 for an example). Since D77314, we have a finer-grain control over
interface generation via an attribute that avoids the problem entirely. Remove
the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass
that can be run in any pipeline that needs blanket emission of functions to
annotate all builtin functions with the attribute before performing the usual
lowering that accounts for the attribute.

Reviewed By: chelini

Differential Revision: https://reviews.llvm.org/D127952
2022-06-17 11:10:31 +02:00
Mogball
e16d13322b [mlir] (NFC) Clean up bazel and CMake target names
All dialect targets in bazel have been named *Dialect and all dialect
targets in CMake have been named MLIR*Dialect.
2022-06-13 16:24:15 +00:00
Christian Sigg
400fef081a Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."
This change rolls bcfc0a9051014437b55ab932d9aca5ecdca6776b forward (i.e., reverting 369ce54bb302f209239b8ebc77ad824add9df089) with fixed CMakeLists.txt.
2022-06-05 09:11:43 +02:00
Mehdi Amini
369ce54bb3 Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."
This reverts commit bcfc0a9051014437b55ab932d9aca5ecdca6776b.

The build is broken with shared library enabled.
2022-06-04 08:35:45 +00:00
Christian Sigg
bcfc0a9051 [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.
This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% in average, sometimes more) because:

- it performs less Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is fast than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D126158
2022-06-04 08:03:29 +02:00
River Riddle
e084679f96 [mlir] Make locations required when adding/creating block arguments
BlockArguments gained the ability to have locations attached a while ago, but they
have always been optional. This goes against the core tenant of MLIR where location
information is a requirement, so this commit updates the API to require locations.

Fixes #53279

Differential Revision: https://reviews.llvm.org/D117633
2022-01-19 17:35:35 -08:00
Alex Zinenko
1ad48d6de2 [mlir] handle nested regions in llvm-legalize-for-export
The translation from the MLIR LLVM dialect to LLVM IR includes a mechanism that
ensures the successors of a block to be different blocks in case block
arguments are passed to them since the opposite cannot be expressed in LLVM IR.
This mechanism previously only worked for functions because it was written
prior to the introduction of other region-carrying operations such as the
OpenMP dialect, which also translates directly to LLVM IR. Modify this
mechanism to handle all regions in the module and not only functions.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D117548
2022-01-18 17:09:14 +01:00
Mehdi Amini
be0a7e9f27 Adjust "end namespace" comment in MLIR to match new agree'd coding style
See D115115 and this mailing list discussion:
https://lists.llvm.org/pipermail/llvm-dev/2021-December/154199.html

Differential Revision: https://reviews.llvm.org/D115309
2021-12-08 06:05:26 +00:00
River Riddle
65fcddff24 [mlir][BuiltinDialect] Resolve comments from D91571
* Move ops to a BuiltinOps.h
* Add file comments
2020-11-19 11:12:49 -08:00
River Riddle
73ca690df8 [mlir][NFC] Remove references to Module.h and Function.h
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.

Differential Revision: https://reviews.llvm.org/D91572
2020-11-17 00:55:47 -08:00
Rahul Joshi
d150662024 [MLIR][NFC] Eliminate .getBlocks() when not needed
Differential Revision: https://reviews.llvm.org/D82229
2020-06-19 14:16:21 -07:00
Stephen Neuendorffer
5469f434bb [MLIR] Reapply: Adjust libMLIR building to more closely follow libClang
This reverts commit ab1ca6e60fc58b857cc5030ca6e024d20d919cb9.
2020-05-04 20:47:57 -07:00
Stephen Neuendorffer
ab1ca6e60f Revert "[MLIR] Adjust libMLIR building to more closely follow libClang"
This reverts commit 4f0f436749c264c16eb226c9b9b132e07e3650a6.

This seems to show some compile dependence problems, and also breaks flang.
2020-05-04 12:40:12 -07:00
Valentin Churavy
4f0f436749 [MLIR] Adjust libMLIR building to more closely follow libClang
- Exports MLIR targets to be used out-of-tree.
- mimicks `add_clang_library` and `add_flang_library`.
- Fixes libMLIR.so

After https://reviews.llvm.org/D77515 libMLIR.so was no longer containing
any object files. We originally had a cludge there that made it work with
the static initalizers and when switchting away from that to the way the
clang shlib does it, I noticed that MLIR doesn't create a `obj.{name}` target,
and doesn't export it's targets to `lib/cmake/mlir`.

This is due to MLIR using `add_llvm_library` under the hood, which adds
the target to `llvmexports`.

Differential Revision: https://reviews.llvm.org/D78773

[MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIB

Primarily, this patch moves all mlir references to LLVM libraries into
either LLVM_LINK_COMPONENTS or LINK_COMPONENTS.  This enables magic in
the llvm cmake files to automatically replace reference to LLVM components
with references to libLLVM.so when necessary.  Among other things, this
completes fixing libMLIR.so, which has been broken for some configurations
since D77515.

Unlike previously, the pattern is now that mlir libraries should almost
always use add_mlir_library.  Previously, some libraries still used
add_llvm_library.  However, this confuses the export of targets for use
out of tree because libraries specified with add_llvm_library are exported
by LLVM.  Instead users which don't need/can't be linked into libMLIR.so
can specify EXCLUDE_FROM_LIBMLIR

A common error mode is linking with LLVM libraries outside of LINK_COMPONENTS.
This almost always results in symbol confusion or multiply defined options
in LLVM when the same object file is included as a static library and
as part of libLLVM.so.  To catch these errors more directly, there's now
mlir_check_all_link_libraries.

To simplify usage of add_mlir_library, we assume that all mlir
libraries depend on LLVMSupport, so it's not necessary to separately specify
it.

tested with:
BUILD_SHARED_LIBS=on,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB,
BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB.

By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79067

[MLIR] Move from using target_link_libraries to LINK_LIBS

This allows us to correctly generate dependencies for derived targets,
such as targets which are created for object libraries.

By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Differential Revision: https://reviews.llvm.org/D79243

Three commits have been squashed to avoid intermediate build breakage.
2020-05-04 11:40:46 -07:00
River Riddle
1834ad4a69 [mlir][Pass] Update the PassGen to generate base classes instead of utilities
Summary:
This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex.

Differential Revision: https://reviews.llvm.org/D77367
2020-04-07 14:08:52 -07:00
River Riddle
80aca1eaf7 [mlir][Pass] Remove the use of CRTP from the Pass classes
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.

Differential Revision: https://reviews.llvm.org/D77350
2020-04-07 14:08:52 -07:00
River Riddle
9a277af2d4 [mlir][Pass] Add support for generating pass utilities via tablegen
This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things(e.g. the pass argument).

Differential Revision: https://reviews.llvm.org/D76659
2020-04-01 02:10:46 -07:00
River Riddle
e3d834a54a [mlir][Pass] Move the registration of dialect passes to tablegen
This generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes.

Differential Revision: https://reviews.llvm.org/D76657
2020-04-01 02:10:46 -07:00
Alex Zinenko
e119980f3f [mlir] LLVM dialect: move ensureDistinctSuccessors out of std->LLVM conversion
MLIR supports terminators that have the same successor block with different
block operands, which cannot be expressed in the LLVM's phi-notation as the
block identifier is used to tell apart the predecessors. This limitation can be
worked around by branching to a new block instead, with this new block
unconditionally branching to the original successor and forwarding the
argument. Until now, this transformation was performed during the conversion
from the Standard to the LLVM dialect. This does not scale well to multiple
dialects targeting the LLVM dialect as all of them would have to be aware of
this limitation and perform the preparatory transformation. Instead, do it as a
separate pass and run it immediately before the translation.

Differential Revision: https://reviews.llvm.org/D75619
2020-03-17 15:22:14 +01:00