llvm-project/SOURCES at users/tblah/openmp-parse-errors-1 - llvm-project - shylie's gitea


Fraser Cormack 586cacdbdd

[libclc] Optimize generic CLC fmin/fmax (#128506)

With this commit, the CLC fmin/fmax builtins use clang's
__builtin_elementwise_(min|max)imumnum which helps us generate LLVM
minimumnum/maximumnum intrinsics directly. These intrinsics uniformly
select the non-NaN input over the (quiet or signalling) NaN input, which
corresponds to what the OpenCL CTS tests.
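
As a rough illustration of the NaN selection described above, here is a minimal
scalar reference model in C. It is a sketch, not the libclc implementation: the
names ref_fmin, ref_fmax and ref_selfcheck are hypothetical, and the ordering of
signed zeros is deliberately ignored.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical reference model: prefer the non-NaN operand.
 * If both operands are NaN, a NaN is returned. */
static float ref_fmin(float x, float y) {
    if (isnan(x)) return y;   /* x is NaN: pick y, whatever it is */
    if (isnan(y)) return x;   /* only y is NaN: pick x */
    return x < y ? x : y;     /* neither is NaN: ordinary minimum */
}

static float ref_fmax(float x, float y) {
    if (isnan(x)) return y;
    if (isnan(y)) return x;
    return x > y ? x : y;
}

/* Small self-check of the model above (assumed helper, for illustration). */
static void ref_selfcheck(void) {
    assert(ref_fmin(NAN, 1.0f) == 1.0f);
    assert(ref_fmin(1.0f, NAN) == 1.0f);
    assert(ref_fmax(NAN, -2.0f) == -2.0f);
    assert(ref_fmax(3.0f, 4.0f) == 4.0f);
    assert(isnan(ref_fmin(NAN, NAN)));
}
```

Portable C cannot easily distinguish quiet from signalling NaNs, so this model
treats both the same way, which matches the "uniformly select the non-NaN
input" behaviour described above.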

These intrinsics maintain the vector types, whereas the previous
implementation scalarized them. This commit therefore helps to optimize
codegen for targets that use the generic implementation.

Note that there is ongoing discussion regarding how these builtins
should handle signalling NaNs in the OpenCL specification and whether
they should be able to return a quiet NaN as per the IEEE behaviour. If
the specification and/or CTS is ever updated to allow or mandate
returning a qNaN, these builtins could/should be updated to use
__builtin_elementwise_(min|max)num instead which would lower to LLVM
minnum/maxnum intrinsics.

The SPIR-V targets maintain the old implementations, as the LLVM ->
SPIR-V translator can't currently handle the LLVM intrinsics. The
implementation has been simplified to consistently use clang builtins,
whereas previously the half version was explicitly defined.

[1] https://github.com/KhronosGroup/OpenCL-CTS/pull/2285

2025-07-29 13:21:42 +01:00

4 lines

67 B

Plaintext


	`math/clc_fmax.cl`
	`math/clc_fmin.cl`
	`math/clc_runtime_has_hw_fma32.cl`