5 Commits

Author SHA1 Message Date
Fraser Cormack
76d1cb22c1
[libclc] Move rotate to CLC library; optimize (#125713)
This commit moves the rotate builtin to the CLC library.

It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n)
intrinsic, for both scalar and vector types. The previous implementation
was too cautious in its handling of the shift amount; the OpenCL rules
state that the shift amount is always treated as an unsigned value
modulo the bitwidth.
2025-02-05 10:38:23 +00:00
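To illustrate the shift-amount rule from the rotate commit above, here is a minimal sketch in plain C rather than OpenCL C. The helper name `rotate_u32` is hypothetical and this is not the libclc source; it only shows that reducing the shift amount modulo the bit width yields a rotate-left, which is the pattern a funnel shift `fshl(x, x, n)` computes.

```c
#include <stdint.h>

/* Hypothetical helper, not the libclc implementation: 32-bit rotate-left. */
static uint32_t rotate_u32(uint32_t x, uint32_t n) {
    /* The shift amount is treated as unsigned, modulo the bit width. */
    n &= 31u;
    /* (32 - n) & 31 avoids an undefined shift by 32 when n == 0. */
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

Because the amount is masked first, `rotate_u32(x, 32)` returns `x` unchanged and `rotate_u32(x, 36)` behaves like a rotate by 4.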
Fraser Cormack
fe694b18dc
[libclc] Move mad_sat to CLC; optimize for vector types (#125517)
This commit moves the mad_sat builtin to the CLC library.

It also optimizes mad_sat for vector types by avoiding scalarization. To
do this, it transforms the previous control-flow code into vector select
code. The scalar versions use the same select-based code for simplicity.
2025-02-03 17:50:42 +00:00
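The select-style formulation described in the mad_sat commit above can be sketched for a single 32-bit signed lane in plain C. The name `mad_sat_s32` and the widening approach are assumptions for illustration, not the code from the commit; in OpenCL C the two conditional expressions would correspond to per-lane `select` operations on a vector.

```c
#include <stdint.h>

/* Hypothetical scalar sketch of a saturating multiply-add, a * b + c. */
static int32_t mad_sat_s32(int32_t a, int32_t b, int32_t c) {
    /* A 64-bit intermediate holds the exact result without overflow. */
    int64_t r = (int64_t)a * (int64_t)b + (int64_t)c;
    /* Clamp with conditional selects instead of early-return branches. */
    r = (r > INT32_MAX) ? (int64_t)INT32_MAX : r;
    r = (r < INT32_MIN) ? (int64_t)INT32_MIN : r;
    return (int32_t)r;
}
```

Expressing the clamp as data-dependent selects rather than branches is what lets the same code shape apply to every lane of a vector at once.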
Fraser Cormack
7441e87fe0
[libclc] Move several integer functions to CLC library (#116786)
This commit moves over the OpenCL clz, hadd, mad24, mad_hi, mul24,
mul_hi, popcount, rhadd, and upsample builtins to the CLC library.

This commit also optimizes the vector forms of the mul_hi and upsample
builtins to consistently remain in vector types, instead of recursively
splitting vectors down to the scalar form.

The OpenCL mad_hi builtin wasn't previously publicly available from the
CLC libraries, as it was defined as a macro (#define) for mul_hi in the
header files. That issue has been fixed, and mad_hi is now exposed.

The custom AMD implementation/workaround for popcount has been removed
as it was only required for clang < 7.

There are still two integer functions which haven't been moved over. The
OpenCL mad_sat builtin uses many of the other integer builtins, and
would benefit from optimization for vector types. That can take place in
a follow-up commit. The rotate builtin could similarly use some more
dedicated focus, potentially using clang builtins.
2025-01-29 13:45:33 +00:00
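For readers unfamiliar with the builtins named in the commit above, here is a minimal C sketch of what mul_hi and mad_hi compute for unsigned 32-bit values. The helper names are hypothetical and the widening multiply is just one way to express the result; it is not the libclc implementation.

```c
#include <stdint.h>

/* Hypothetical helper: high 32 bits of the full 64-bit product a * b. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Hypothetical helper: mul_hi(a, b) + c, wrapping on overflow. */
static uint32_t mad_hi_u32(uint32_t a, uint32_t b, uint32_t c) {
    return mul_hi_u32(a, b) + c;
}
```

The sketch also shows why mad_hi cannot simply alias mul_hi: it takes a third operand that is added to the high half of the product.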
Fraser Cormack
12cdf4330d
[libclc] Move (add|sub)_sat to CLC; optimize (#124903)
Using the `__builtin_elementwise_(add|sub)_sat` functions allows us to
directly optimize to the desired intrinsic, and avoid scalarization for
vector types.
2025-01-29 11:12:40 +00:00
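As a rough illustration of the approach in the (add|sub)_sat commit above, the following C sketch calls the clang `__builtin_elementwise_(add|sub)_sat` builtins when the compiler reports them and otherwise falls back to a widening clamp. The function names, the `HAVE_ELEMENTWISE_SAT` macro, and the fallback path are assumptions for illustration only. The sketch is scalar; per the commit message, the benefit in libclc comes from applying the same builtins to vector types.

```c
#include <stdint.h>

/* Detect the clang builtin where __has_builtin is available. */
#if defined(__has_builtin)
#if __has_builtin(__builtin_elementwise_add_sat) && \
    __has_builtin(__builtin_elementwise_sub_sat)
#define HAVE_ELEMENTWISE_SAT 1
#endif
#endif

/* Hypothetical helper: clamp a 64-bit value into the int32_t range. */
static int32_t clamp_s32(int64_t r) {
    r = (r > INT32_MAX) ? (int64_t)INT32_MAX : r;
    r = (r < INT32_MIN) ? (int64_t)INT32_MIN : r;
    return (int32_t)r;
}

/* Hypothetical helper: saturating signed add. */
static int32_t add_sat_s32(int32_t a, int32_t b) {
#ifdef HAVE_ELEMENTWISE_SAT
    return __builtin_elementwise_add_sat(a, b);
#else
    return clamp_s32((int64_t)a + (int64_t)b);
#endif
}

/* Hypothetical helper: saturating signed subtract. */
static int32_t sub_sat_s32(int32_t a, int32_t b) {
#ifdef HAVE_ELEMENTWISE_SAT
    return __builtin_elementwise_sub_sat(a, b);
#else
    return clamp_s32((int64_t)a - (int64_t)b);
#endif
}
```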
Fraser Cormack
7be30fd533
[libclc] Move abs/abs_diff to CLC library
2024-11-06 09:16:35 +00:00