Matt Arsenault 1a9fe1769a
libclc: Update remquo (#187998)
This was failing in the float case without -cl-denorms-are-zero
and failing for double. This now passes in all cases.

This was originally ported from rocm device libs in
8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port
of more recent changes, with a few modifications:

- Templatification, which almost, but not quite, enables
  vectorization; the outer branch and loop still prevent it.

- Merging of the 3 types into one shared code path, instead of
  duplicating the code per type in 3 separate functions. There are
  only some slight differences for the half case, which mostly
  evaluates as float.

- Splitting out of the is_odd tracking, instead of deriving it from the
  accumulated quotient. This costs an extra register, but saves several
  instructions. This also enables automatic elimination of all of the quo
  output handling when this code is reused for remainder. I'm guessing
  this would be unnecessary if SimplifyDemandedBits handled phis.

- Removal of the slow FMA path. I don't see how this would ever be
  faster with the number of instructions replacing it. This is really a
  problem for the compiler to solve anyway.
2026-03-23 10:06:59 +00:00

libclc

libclc is an open source implementation of the library requirements of the OpenCL C programming language, as specified by the OpenCL 1.1 Specification. The following sections of the specification impose library requirements:

  • 6.1: Supported Data Types
  • 6.2.3: Explicit Conversions
  • 6.2.4.2: Reinterpreting Types Using as_type() and as_typen()
  • 6.9: Preprocessor Directives and Macros
  • 6.11: Built-in Functions
  • 9.3: Double Precision Floating-Point
  • 9.4: 64-bit Atomics
  • 9.5: Writing to 3D image memory objects
  • 9.6: Half Precision Floating-Point

libclc is intended to be used with the Clang compiler's OpenCL frontend.

libclc is designed to be portable and extensible. To this end, it provides generic implementations of most library requirements, allowing the target to override the generic implementation at the granularity of individual functions.

libclc currently supports PTX, AMDGPU, SPIRV and CLSPV targets, but support for more targets is welcome.

Compiling and installing

(In the following instructions you can use either make or ninja.)

For an in-tree build, Clang must also be built at the same time:

$ cmake <path-to>/llvm-project/llvm/CMakeLists.txt -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="libclc" -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja

Then install:

$ ninja install

Note that you can set the DESTDIR variable to do a staged install:

$ DESTDIR=/path/for/staged/install ninja install

To build out of tree, or in other words, against an existing LLVM build or install:

$ cmake <path-to>/llvm-project/libclc/CMakeLists.txt -DCMAKE_BUILD_TYPE=Release \
  -G Ninja -DLLVM_DIR=$(<path-to>/llvm-config --cmakedir)
$ ninja

Then install as before.

In both cases this will include all supported targets. You can choose which targets are enabled by passing -DLIBCLC_TARGETS_TO_BUILD to CMake. The default is all.
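
As a rough illustration of restricting the target list, an out-of-tree configure step might look like the following. The two target names used here (amdgcn-- and nvptx64--) are assumptions made for the example, as is the semicolon-separated list syntax (the usual CMake list convention). Check libclc's CMakeLists.txt for the names your checkout actually accepts.

```shell
# Sketch only: the target names below are assumed for illustration and
# may not match the targets defined in your version of libclc.
$ cmake <path-to>/llvm-project/libclc/CMakeLists.txt -DCMAKE_BUILD_TYPE=Release \
  -G Ninja -DLLVM_DIR=$(<path-to>/llvm-config --cmakedir) \
  -DLIBCLC_TARGETS_TO_BUILD="amdgcn--;nvptx64--"
$ ninja
```

The quotes around the list keep the shell from treating the semicolon as a command separator.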

In both cases, the LLVM used must include the targets you want libclc support for (AMDGPU and NVPTX are enabled in LLVM by default). The exception is SPIRV, which does not need an LLVM target but does need the llvm-spirv tool to be available. Either build this tool in-tree, or place it in the directory pointed to by LLVM_TOOLS_BINARY_DIR.

Website

https://libclc.llvm.org/