
This builtin is a little more involved than others, as targets deal with fma in various different ways. Fundamentally, the CLC __clc_fma builtin compiles to __builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic. However, in the case of fp32 fma, some targets call the __clc_sw_fma function, which provides a software implementation of the builtin. In principle this is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a runtime decision, depending on how the target defines that macro.

All targets build the CLC fma functions for all types, so that the CLC library can have a reliable internal implementation for its own purposes.

For the AMD/NVPTX targets there are no meaningful changes to the generated LLVM bitcode. Some blocks of code have moved around, which confounds llvm-diff.

For the clspv and SPIR-V/Mesa targets, only fp32 fma is of interest. Its use in libclc is tightly controlled by checking __CLC_HAVE_HW_FMA32 first. This can be either a compile-time constant (1, for clspv) or a runtime function (for SPIR-V/Mesa).

The SPIR-V/Mesa target only provided fp32 fma in the OpenCL layer. It unconditionally mapped that to the __clc_sw_fma builtin, even though the generic version in theory had a runtime toggle through __CLC_HAVE_HW_FMA32 specifically for that target. Callers of fma, though, would end up using the fma ExtInst, *not* calling the _Z3fmafff function provided by libclc.

This commit keeps that system in place in the OpenCL layer by mapping fma to __clc_sw_fma. Where other builtins would previously call fma (i.e., result in the ExtInst), they now call __clc_fma. This function checks the __CLC_HAVE_HW_FMA32 runtime toggle, which selects between the slow version (__clc_sw_fma) and the quick version: the LLVM fma intrinsic, which llvm-spirv translates to the fma ExtInst.

The clspv target had its own software implementation of fp32 fma, which it called unconditionally, even though __CLC_HAVE_HW_FMA32 is 1 for that target. This is potentially just so its library ships a software version which it can fall back on. In the OpenCL layer, the target doesn't provide fp64 fma, and maps fp16 fma to fp32 mad.

This commit keeps that system roughly in place: in the OpenCL layer it maps fp32 fma to __clc_sw_fma, and fp16 fma to mad. Where builtins would previously call into fma, they now call __clc_fma, which compiles to the LLVM intrinsic. If this goes through a translation to SPIR-V it will become the fma ExtInst, or the intrinsic could be replaced by the _Z3fmafff software implementation.

The clspv and SPIR-V/Mesa targets could potentially be cleaned up later, depending on their needs.
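As a minimal sketch of the fp32 dispatch described above (assuming libclc's _CLC_OVERLOAD/_CLC_DEF attribute macros; the real definition in the tree may differ):

/* Illustrative only, not libclc's verbatim source: fp32 fma gated on
 * __CLC_HAVE_HW_FMA32. Depending on the target, that macro may expand
 * to a compile-time constant (1, for clspv) or to a runtime query
 * (SPIR-V/Mesa). */
_CLC_OVERLOAD _CLC_DEF float __clc_fma(float a, float b, float c) {
  if (__CLC_HAVE_HW_FMA32)
    /* Quick path: hardware fma via the LLVM intrinsic; llvm-spirv
     * translates this to the fma ExtInst. */
    return __builtin_elementwise_fma(a, b, c);
  /* Slow path: the software implementation. */
  return __clc_sw_fma(a, b, c);
}

When the macro is a compile-time constant, the branch folds away entirely; when it is a runtime query, the selection happens at execution time.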
libclc
libclc is an open source implementation of the library requirements of the OpenCL C programming language, as specified by the OpenCL 1.1 Specification. The following sections of the specification impose library requirements:
- 6.1: Supported Data Types
- 6.2.3: Explicit Conversions
- 6.2.4.2: Reinterpreting Types Using as_type() and as_typen()
- 6.9: Preprocessor Directives and Macros
- 6.11: Built-in Functions
- 9.3: Double Precision Floating-Point
- 9.4: 64-bit Atomics
- 9.5: Writing to 3D image memory objects
- 9.6: Half Precision Floating-Point
libclc is intended to be used with the Clang compiler's OpenCL frontend.
libclc is designed to be portable and extensible. To this end, it provides generic implementations of most library requirements, allowing the target to override the generic implementation at the granularity of individual functions.
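As a sketch of what that granularity means in practice (the function here is hypothetical, purely to illustrate the mechanism; _CLC_OVERLOAD and _CLC_DEF are libclc's overload/definition attribute macros):

/* Generic, portable implementation of a hypothetical builtin. */
_CLC_OVERLOAD _CLC_DEF float __clc_rsqrt_example(float x) {
  return 1.0f / sqrt(x);
}

A target with, say, a native reciprocal square root instruction can provide its own definition of the same overload in its target directory; the build then uses the target's file instead of the generic one, while every other function continues to come from the generic implementation.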
libclc currently supports PTX, AMDGPU, SPIRV and CLSPV targets, but support for more targets is welcome.
Compiling and installing
(In the following instructions you can use make or ninja.)
For an in-tree build, Clang must also be built at the same time:
$ cmake <path-to>/llvm-project/llvm -DLLVM_ENABLE_PROJECTS="libclc;clang" \
    -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja
Then install:
$ ninja install
Note that you can use the DESTDIR Makefile variable to do staged installs:
$ DESTDIR=/path/for/staged/install ninja install
To build out of tree, i.e. against an existing LLVM build or install:
$ cmake <path-to>/llvm-project/libclc -DCMAKE_BUILD_TYPE=Release \
    -G Ninja -DLLVM_DIR=$(<path-to>/llvm-config --cmakedir)
$ ninja
Then install as before.
In both cases this will include all supported targets. You can choose which targets are enabled by passing -DLIBCLC_TARGETS_TO_BUILD to CMake. The default is all.
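For example, to enable just a couple of targets in an out-of-tree build (the exact target names are defined by libclc's CMake and may vary between checkouts; the two used here are illustrative):

$ cmake <path-to>/llvm-project/libclc -DCMAKE_BUILD_TYPE=Release \
    -G Ninja -DLLVM_DIR=$(<path-to>/llvm-config --cmakedir) \
    -DLIBCLC_TARGETS_TO_BUILD="amdgcn--;nvptx64--nvidiacl"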
In both cases, the LLVM used must include the targets you want libclc support for (AMDGPU and NVPTX are enabled in LLVM by default). The exception is SPIRV, where you do not need an LLVM target but you do need the llvm-spirv tool available. Either build this in-tree, or place it in the directory pointed to by LLVM_TOOLS_BINARY_DIR.
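With an existing LLVM install, one way to locate that directory is via llvm-config (assuming LLVM_TOOLS_BINARY_DIR matches llvm-config's --bindir, which is typically the case):

$ cp <path-to>/llvm-spirv $(<path-to>/llvm-config --bindir)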