This was failing in the float case without -cl-denorms-are-zero and failing for double. This now passes in all cases.

This was originally ported from rocm device libs in 8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port of more recent changes, with a few modifications:

- Templatification, which almost but doesn't quite enable vectorization yet due to the outer branch and loop.
- Merging of the 3 types into one shared code path, instead of duplicating per type with 3 different functions implemented together. There are only some slight differences for the half case, which mostly evaluates as float.
- Splitting out of the is_odd tracking, instead of deriving it from the accumulated quotient. This costs an extra register, but saves several instructions. This also enables automatic elimination of all of the quo output handling when this code is reused for remainder. I'm guessing this would be unnecessary if SimplifyDemandedBits handled phis.
- Removal of the slow FMA path. I don't see how this would ever be faster with the number of instructions replacing it. This is really a problem for the compiler to solve anyway.
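The point about tracking is_odd separately from the accumulated quotient can be illustrated with a small sketch. This is not the libclc code: it is a hypothetical integer stand-in (the names remquo_sketch, remainder_sketch and remquo_result are invented here) that performs shift-subtract long division and then rounds the quotient to nearest, ties to even. The parity needed for the tie case is kept in its own variable, so nothing that feeds the remainder depends on q.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this illustration only. */
typedef struct {
    int64_t rem;   /* remainder after rounding the quotient to nearest-even */
    uint32_t quo;  /* rounded quotient (a real remquo keeps only its low bits) */
} remquo_result;

/* Integer stand-in for a remquo-style loop; assumes y > 0. */
static remquo_result remquo_sketch(uint32_t x, uint32_t y)
{
    assert(y > 0);

    uint64_t r = 0;       /* running partial remainder, always < y after each step */
    uint32_t q = 0;       /* accumulated quotient */
    bool is_odd = false;  /* parity of the quotient, tracked on its own */

    /* Restoring long division, one quotient bit per iteration. */
    for (int bit = 31; bit >= 0; --bit) {
        r = (r << 1) | ((x >> bit) & 1u);
        bool ge = r >= y;
        if (ge)
            r -= y;
        q = (q << 1) | (uint32_t)ge;
        is_odd = ge;  /* the last bit produced is the low bit of the quotient */
    }

    /* Round to nearest; on an exact tie, round up only if the quotient is odd. */
    bool round_up = (2 * r > y) || (2 * r == y && is_odd);

    remquo_result out;
    out.rem = round_up ? (int64_t)r - (int64_t)y : (int64_t)r;
    out.quo = q + (uint32_t)round_up;
    return out;
}

/* Reusing the same code for remainder: q is never read on this path. */
static int64_t remainder_sketch(uint32_t x, uint32_t y)
{
    return remquo_sketch(x, y).rem;
}
```

In remainder_sketch the value of q only reaches the discarded quo field, so a compiler that inlines the call is free to drop the quotient accumulation entirely. If is_odd were instead computed as q & 1, the loop-carried q would stay live just to supply that one bit.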
libclc
libclc is an open source implementation of the library requirements of the OpenCL C programming language, as specified by the OpenCL 1.1 Specification. The following sections of the specification impose library requirements:
- 6.1: Supported Data Types
- 6.2.3: Explicit Conversions
- 6.2.4.2: Reinterpreting Types Using as_type() and as_typen()
- 6.9: Preprocessor Directives and Macros
- 6.11: Built-in Functions
- 9.3: Double Precision Floating-Point
- 9.4: 64-bit Atomics
- 9.5: Writing to 3D image memory objects
- 9.6: Half Precision Floating-Point
libclc is intended to be used with the Clang compiler's OpenCL frontend.
libclc is designed to be portable and extensible. To this end, it provides generic implementations of most library requirements, allowing the target to override the generic implementation at the granularity of individual functions.
libclc currently supports PTX, AMDGPU, SPIRV and CLSPV targets, but support for more targets is welcome.
Compiling and installing
(In the following instructions you can use either make or ninja.)
For an in-tree build, Clang must also be built at the same time:
$ cmake <path-to>/llvm-project/llvm/CMakeLists.txt -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="libclc" -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja
Then install:
$ ninja install
Note that you can use the DESTDIR environment variable to do staged installs:
$ DESTDIR=/path/for/staged/install ninja install
To build out of tree, or in other words, against an existing LLVM build or install:
$ cmake <path-to>/llvm-project/libclc/CMakeLists.txt -DCMAKE_BUILD_TYPE=Release \
    -G Ninja -DLLVM_DIR=$(<path-to>/llvm-config --cmakedir)
$ ninja
Then install as before.
In both cases this will include all supported targets. You can choose which
targets are enabled by passing -DLIBCLC_TARGETS_TO_BUILD to CMake. The default
is all.
In both cases, the LLVM used must include the targets you want libclc support for
(AMDGPU and NVPTX are enabled in LLVM by default). The exception is SPIRV, which
does not need an LLVM target but does need the llvm-spirv tool to be available.
Either build this tool in-tree, or place it in the directory pointed to by
LLVM_TOOLS_BINARY_DIR.