Fraser Cormack 586cacdbdd
[libclc] Optimize generic CLC fmin/fmax (#128506)
With this commit, the CLC fmin/fmax builtins use clang's
__builtin_elementwise_(min|max)imumnum which helps us generate LLVM
minimumnum/maximumnum intrinsics directly. These intrinsics uniformly
select the non-NaN input over the (quiet or signalling) NaN input, which
matches the behaviour that the OpenCL CTS tests for.

These intrinsics preserve the vector types, whereas the previous
implementation scalarized them. This commit therefore helps to optimize
codegen for targets that use the generic implementation.

Note that there is ongoing discussion regarding how these builtins
should handle signalling NaNs in the OpenCL specification and whether
they should be able to return a quiet NaN as per the IEEE behaviour. If
the specification and/or CTS is ever updated to allow or mandate
returning a qNaN, these builtins could/should be updated to use
__builtin_elementwise_(min|max)num instead, which would lower to LLVM
minnum/maxnum intrinsics.

The SPIR-V targets maintain the old implementations, as the LLVM ->
SPIR-V translator can't currently handle the LLVM intrinsics. The
implementation has been simplified to consistently use clang builtins,
as opposed to before where the half version was explicitly defined.

[1] https://github.com/KhronosGroup/OpenCL-CTS/pull/2285
2025-07-29 13:21:42 +01:00

libclc

libclc is an open source implementation of the library requirements of the OpenCL C programming language, as specified by the OpenCL 1.1 Specification. The following sections of the specification impose library requirements:

  • 6.1: Supported Data Types
  • 6.2.3: Explicit Conversions
  • 6.2.4.2: Reinterpreting Types Using as_type() and as_typen()
  • 6.9: Preprocessor Directives and Macros
  • 6.11: Built-in Functions
  • 9.3: Double Precision Floating-Point
  • 9.4: 64-bit Atomics
  • 9.5: Writing to 3D image memory objects
  • 9.6: Half Precision Floating-Point

libclc is intended to be used with the Clang compiler's OpenCL frontend.

libclc is designed to be portable and extensible. To this end, it provides generic implementations of most library requirements, allowing the target to override the generic implementation at the granularity of individual functions.

libclc currently supports PTX, AMDGPU, SPIRV and CLSPV targets, but support for more targets is welcome.

Compiling and installing

(In the following instructions you can use either make or ninja.)

For an in-tree build, Clang must also be built at the same time:

$ cmake <path-to>/llvm-project/llvm/CMakeLists.txt -DLLVM_ENABLE_PROJECTS="libclc;clang" \
    -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja

Then install:

$ ninja install

Note that you can use the DESTDIR environment variable to do staged installs:

$ DESTDIR=/path/for/staged/install ninja install

To build out of tree, or in other words, against an existing LLVM build or install:

$ cmake <path-to>/llvm-project/libclc/CMakeLists.txt -DCMAKE_BUILD_TYPE=Release \
  -G Ninja -DLLVM_DIR=$(<path-to>/llvm-config --cmakedir)
$ ninja

Then install as before.

In both cases this will include all supported targets. You can choose which targets are enabled by passing -DLIBCLC_TARGETS_TO_BUILD to CMake. The default is all.

In both cases, the LLVM used must include the targets you want libclc support for (AMDGPU and NVPTX are enabled in LLVM by default). The exception is SPIRV, for which you do not need an LLVM target but you do need the llvm-spirv tool available. Either build this tool in-tree, or place it in the directory pointed to by LLVM_TOOLS_BINARY_DIR.

Website

https://libclc.llvm.org/