This commit moves the rotate builtin to the CLC library.
It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n)
intrinsic, for both scalar and vector types. The previous implementation
was too cautious in its handling of the shift amount; the OpenCL rules
state that the shift amount is always treated as an unsigned value
modulo the bitwidth.
This commit moves the mad_sat builtin to the CLC library.
It also optimizes it for vector types by avoiding scalarization. To help
do this it transforms the previous control-flow code into vector select
code. This has also been done for the scalar versions for simplicity.
This commit moves over the OpenCL clz, hadd, mad24, mad_hi, mul24,
mul_hi, popcount, rhadd, and upsample builtins to the CLC library.
This commit also optimizes the vector forms of the mul_hi and upsample
builtins to consistently remain in vector types, instead of recursively
splitting vectors down to the scalar form.
The OpenCL mad_hi builtin wasn't previously publicly available from the
CLC libraries, as it was hash-defined to mul_hi in the header files.
That issue has been fixed, and mad_hi is now exposed.
The custom AMD implementation/workaround for popcount has been removed
as it was only required for clang < 7.
There are still two integer functions which haven't been moved over. The
OpenCL mad_sat builtin uses many of the other integer builtins, and
would benefit from optimization for vector types. That can take place in
a follow-up commit. The rotate builtin could similarly use some more
dedicated focus, potentially using clang builtins.
Using the `__builtin_elementwise_(add|sub)_sat` functions allows us to
directly optimize to the desired intrinsic, and avoid scalarization for
vector types.