844 Commits

Author SHA1 Message Date
Harald van Dijk
46ee7f1908
[libclc] Avoid out-of-range float-to-int. (#145698)
For a kernel such as

    kernel void foo(__global double3 *z) {
      double3 x = {0.6631661088,0.6612268107,0.1513627528};
      int3 y = {-1980459213,-660855407,615708204};
      *z = pown(x, y);
    }

we were not storing anything to z, because the implementation of pown
relied on an floating-point-to-integer conversion where the
floating-point value was outside of the integer's range. Although in
LLVM IR we permit that operation so long as we end up ignoring its
result -- that is the general rule for poison -- one thing we are not
permitted to do is have conditional branches that depend on it, and
through the call to __clc_ldexp, we did have that.

To fix this, rather than changing expv at the end to INFINITY/0, we can
change v at the start to values that we know will produce INFINITY/0
without performing such out-of-range conversions.

Tested with

    clang --target=nvptx64 -S -O3 -o - test.cl \
      -Xclang -mlink-builtin-bitcode \
      -Xclang runtimes/runtimes-bins/libclc/nvptx64--.bc

A grep showed that this exact same code existed in three more places, so
I changed it there too, though I did not do a broader search for other
similar code that potentially has the same problem.
2025-06-25 16:37:06 +01:00
Wenju He
13a9b86f62
[NFC][libclc] Replace and delete _CLC_DEFINE_UNARY/BINARY/TERNARY_BUILTIN macros (#145458)
Also delete unused _CLC_DEFINE_BINARY_BUILTIN_WITH_SCALAR_SECOND_ARG,
_CLC_DEFINE_UNARY_BUILTIN_FP16 and _CLC_DEFINE_BINARY_BUILTIN_FP16.

llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc
2025-06-25 13:48:53 +08:00
Wenju He
de3a9ea510
[NFC][libclc] Simplify clc_dot and dot implementation (#142922)
llvm-diff shows no change to amdgcn--amdhsa.bc
2025-06-06 08:09:53 +08:00
Fraser Cormack
6306f0fa21
[libclc] Support LLVM_ENABLE_RUNTIMES when building (#141574)
This commit deprecates the use of LLVM_ENABLE_PROJECTS in favour of
LLVM_ENABLE_RUNTIMES when building libclc.

Alternatively, using -DLLVM_RUNTIME_TARGETS=<triple> combined with
-DRUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES=libclc also gets pretty far but
fails due to zlib problems building the LLVM utility 'prepare_builtins'.
I'm not sure what's going on there but I don't think it's required at
this stage. More work would be required to support that option.

This does nothing to change how the host tools are found in order to be
used to actually build the libclc libraries.

Note that under such a configuration the final libclc builtin libraries
are placed in `<build>/runtimes/runtimes-bins/libclc/`, which differs
from a non-runtimes build. The installation location remains the same.

Fixes #124013.
2025-06-05 17:56:21 +01:00
Fraser Cormack
8c3019ecf4
[libclc] Add (fast) normalize to CLC; add half overloads (#139759)
For simplicity the half overloads just call into the float versions of
the builtin. Otherwise there are no codegen changes to any target.
2025-06-05 09:11:36 +01:00
Romaric Jodin
8f3ccd1674
libclc: clspv: do not set generic_addrspace_val (#141912)
This is breaking clspv: https://github.com/google/clspv/issues/1493
2025-06-02 10:10:52 +01:00
Wenju He
6e3d668206
[libclc] Move prefetch to clc library (#141721)
llvm-diff shows no change to amdgcn--amdhsa.bc
2025-05-29 09:11:06 +08:00
Fraser Cormack
b474c3f69e
[libclc] Move vload & vstore to CLC library (#141755)
This commit moves the various vload and vstore builtins (including
vload_half, vloada_half, etc.) to the CLC library.

This is almost entirely a code move and does not make any attempt to
clean up or optimize the definitions of these builtins. There is no
change to any of the targets' builtin libraries, except that the vstore
helper rounding functions are now internalized.

Cleanups can come in future work. The new CLC declarations and new
OpenCL wrappers show how these CLC implementations could be defined more
simply. The builtins could probably also be vectorized in future work;
right now all of the 'half' versions for both vload and vstore are
essentially scalarized.
2025-05-28 16:16:12 +01:00
Fraser Cormack
9fa81a486e
[libclc] Move step to the CLC library; add missing half variants (#140936)
The half variants were missing but are trivial to implement. There were
some incorrect mixed type overloads (step(float, double)) which aren't
in the OpenCL specification and so have been removed.

Like certain other builtins the CLC step function only deals with
identical types. The OpenCL layer is responsible for casting the scalar
argument to a vector.

This commit also trivially vectorizes the CLC function, generating
better bytecode.
2025-05-22 09:54:27 +01:00
Fraser Cormack
94142d9bb0
[libclc] Support the generic address space (#137183)
This commit provides definitions of builtins with the generic address
space.

One concept to consider is the difference between supporting the generic
address space from the user's perspective and the requirement for libclc
as a compiler implementation detail to define separate generic address
space builtins. In practice a target (like NVPTX) might notionally
support the generic address space, but it's mapped to the same LLVM
target address space as another address space (often the private one).

In such cases libclc must be careful not to define both private and
generic overloads of the same builtin. We track these two concepts
separately, and make the assumption that if the generic address space
does clash with another, it's with the private one. We track the
concepts separately because there are some builtins such as atomics that
are defined for the generic address space but not the private address
space.
2025-05-21 17:50:00 +01:00
Fraser Cormack
0bc7f41db8
[libclc] Move all remquo address spaces to CLC library (#140871)
Previously the OpenCL address space overloads of remquo would call into
the one and only 'private' CLC remquo. This was an outlier compared with
the other pointer-argumented maths builtins.

This commit moves the definitions of all address space overloads to the
CLC library to give more control over each address space to CLC
implementers.

There are some minor changes to the generated bytecode but it's simply
moving IR instructions around.
2025-05-21 11:26:04 +01:00
Fraser Cormack
80913b44a4 [libclc][NFC] Reuse inc file for OpenCL frexp decl 2025-05-21 10:19:31 +01:00
Wenju He
e70568e28e
[libclc] Re-use shuffle_decl.inc in OpenCL shuffle2 declaration (#140679)
Also internalize __clc_get_el_* symbols in clc_shuffle2. llvm-diff shows
no change to amdgcn--amdhsa.bc.
2025-05-21 09:49:24 +01:00
Fraser Cormack
2fb6ff46f6 [libclc] Fix header inclusion issues
For some reason these weren't picked up by pre-commit CI.
2025-05-20 10:19:09 +01:00
Fraser Cormack
32cf55aef3
[libclc] Reorganize OpenCL builtins (#140557)
This commits moves all OpenCL builtins under a top-level 'opencl'
directory, akin to how the CLC builtins are organized. This new
structure aims to better convey the separation of the two layers and
that 'CLC' is not a subset of OpenCL or a libclc target.

In doing so this commit moves the location of the 'lib' directory to
match CLC: libclc/generic/lib/ becomes libclc/opencl/lib/generic/. This
allows us to remove some special casing in CMake and ensure a common
directory structure.

It also tries to better communicate that the OpenCL headers are
libclc-specific OpenCL headers and should not be confused with or used
as standard OpenCL headers. It does so by ensuring includes are of the
form <clc/opencl/*>. It might be that we don't specifically need the
libclc OpenCL headers and we simply could use clang's built-in
declarations, but we can revisit that later.

Aside from the code move, there is some code formatting and updating a
couple of OpenCL builtin includes to use the readily available gentype
helpers. This allows us to remove some '.inc' files.
2025-05-20 09:51:30 +01:00
Fraser Cormack
c27e10fa65
[libclc] Mov erf & erfc to CLC library (#140524)
This completes the set of maths builtins.

No attempt to vectorize or optimize this code. The implementation is
licensed to SunPro so will probably need to be replaced at some point in
the future anyway. Calls to other builtins have been replaced with the
CLC equivalents, and some bit-hacking was replaced with the fabs
builtin.
2025-05-19 11:32:35 +01:00
Wenju He
d779b8f92b
[libclc] Append file_specific_compile_options after ARG_COMPILE_FLAGS (#139871)
This enables file_specific_compile_options to take precedence over
ARG_COMPILE_FLAGS. For example, if we add -fno-slp-vectorize to
COMPILE_OPTIONS of a file, the behavior changes as follows:
* Before this PR: -fno-slp-vectorize is overwritten by -O3, resulting in
SLP vectorizer remaining enabled.
* After this PR: -fno-slp-vectorize overwrites -O3, effectively
disabling SLP vectorizer.
2025-05-16 10:21:45 +01:00
Wenju He
299a278db1
[libclc] Improving vector code generated from scalar code (#140008)
The previous method splits vector data into two halves. shuffle_vector
concatenates the two results into a vector data of original size. This
PR eliminates the use of shuffle_vector.
2025-05-16 10:20:32 +01:00
Fraser Cormack
7a4af40896
[libclc] Move cross to CLC library; add missing half overloads (#139713)
The half overloads are trivially identical to the float and double ones.

It didn't seem worth using 'gentype' for the OpenCL layer or CLC
declarations so they're just written out explicitly. It does help avoid
less trivial repetition in the CLC implementation, though.
2025-05-13 17:07:07 +01:00
Fraser Cormack
95c683fc1b
[libclc] Move logb/ilogb to CLC library; optimize (#128028)
This commit moves the logb and ilogb builtins to the CLC library.

It simultaneously optimizes them both for vector types and for half
types. Vector types were being scalarized in some cases. Half types were
previously promoting to float, whereas this commit provides them a
native implementation.

Everything passes the OpenCL-CTS.

I had to intuit some magic numbers used by these implementations in
order to generate the half variants. I gave them clearer definitions
derived from what I believe are their actual component numbers, but
named them 'magic' to convey that they weren't derived from first
principles.
2025-05-13 11:47:35 +01:00
Fraser Cormack
0e8f0b51ff [libclc][NFC] Fix return after else 2025-05-13 11:46:26 +01:00
Fraser Cormack
655151a7e0
[libclc] Move (fast) length & distance to CLC library (#139701)
This commit also refactors how geometric builtins are defined and
declared, by sharing more helpers. It also removes an unnecessary
gentype-like helper in favour of the more complete math/gentype.inc.

There are no changes to the IR for any of these four builtins.

The 'normalize' builtin will follow in a subsequent commit because it
would involve the addition of missing halfn-type overloads for
completeness.
2025-05-13 11:45:55 +01:00
Fraser Cormack
dd89af7f55
[libclc] Move 'half' builtins to CLC library (#139563)
There are no changes to the generated bytecode.
2025-05-12 17:32:05 +01:00
Fraser Cormack
87978ea272
[libclc] Move tan to the CLC library (#139547)
There was already a __clc_tan in the OpenCL layer. This commit moves the
function over whilst vectorizing it.

The function __clc_tan is no longer a public symbol, which should have
never been the case.
2025-05-12 14:55:27 +01:00
Fraser Cormack
4f107cd8f8
[libclc] Move sin, cos & sincos to CLC library (#139527)
This commit moves the remaining FP64 sin and cos helper functions to the
CLC library. As a consequence, it formally moves all sin, cos and sincos
builtins to the CLC library. Previously, the FP16 and FP32 were
nominally there but still in the OpenCL layer while waiting for the FP64
ones.

The FP64 builtins are now vectorized as the FP16 and FP32 ones were
earlier.

One helper table had to be changed. It was previously a table of bytes
loaded by each work-item as uint4. Since this doesn't vectorize well,
the table was split to load two ulongNs per work-item. While this might
not be as efficient on some devices, one mitigating factor is that we
were previously loading 48 bytes per work-item in total, but only using
40 of them. With this commit we only load the bytes we need.
2025-05-12 11:32:15 +01:00
Fraser Cormack
c5b750f5af [libclc] Move log2/log10 tables to CLC tables impl
These two tables were being used by the CLC library but their
definitions still remained in the OpenCL layer. This worked out after
linking the two together but is a layering violation.

This had a side effect of removing the two table getters from the final
bytecode library, which were never intended to be exposed.

These two tables should probably be refactored so allow better
vectorization of log/log2/log10, but that is left to future work.
2025-05-01 10:23:28 +01:00
Fraser Cormack
6c4dd8d1d2
[libclc] Move minmag & maxmag to the CLC library (#137982) 2025-05-01 09:43:40 +01:00
Fraser Cormack
75f040ab3e
[libclc] Clean up unnecessary #undef __CLC_BODYs (#137959)
This macro is automatically undefined by the various gentype-like
helpers.
2025-04-30 16:13:04 +01:00
Fraser Cormack
1180740ced [libclc][NFC] Remove unused integer-gentype.inc
This is presumably replaced with uses of integer/gentype.inc.
2025-04-30 13:39:23 +01:00
Wenju He
5b6fc61091
[libclc] Add v3 variants of async_work_group_copy/async_work_group_strided_copy/prefetch (#137932)
3-component vector type is supported for them per OpenCL spec.
2025-04-30 13:19:08 +01:00
Fraser Cormack
694a42f018
[libclc] Avoid casting NANs & literals to 'gentype' (#137824)
By having these already defined as type 'gentype' we can avoid
unnecessary casting.
2025-04-29 17:33:21 +01:00
Fraser Cormack
ea688c031e
[libclc] Move fdim to CLC library; simplify (#137811)
This commit moves the fdim builtin to the CLC library. It simultaneously
simplifies the codegen, unifying it between scalar and vector and
avoiding bithacking for vector types.
2025-04-29 16:41:07 +01:00
Fraser Cormack
6ffccea1c2 [libclc][NFC] Remove binary_decl_tt.inc
This was performing the same role as binary_decl.inc.
2025-04-29 15:48:46 +01:00
Fraser Cormack
837d5a740f
[libclc][NFC] Remove unary_builtin.inc (#137656)
We had two ways of achieving the same thing. This commit removes
unary_builtin.inc in favour of the approach combining gentype.inc with
unary_def.inc.

There is no change to the codegen for any target.
2025-04-29 14:17:17 +01:00
Fraser Cormack
78d95cc544
[libclc] Move fract to the CLC library (#137785)
The builtin was already vectorized so there's no difference to codegen
for non-SPIR-V targets.
2025-04-29 13:58:13 +01:00
Fraser Cormack
4609b6a3e7
[libclc] Move fmin & fmax to CLC library (#134218)
This is an alternative to #128506 which doesn't attempt to change the
codegen for fmin and fmax on their way to the CLC library.

The amdgcn and r600 custom definitions of fmin/fmax are now converted to
custom definitions of __clc_fmin and __clc_fmax.

For simplicity, the CLC library doesn't provide vector/scalar versions
of these builtins. The OpenCL layer wraps those up to the vector/vector
versions.

The only codegen change is that non-standard vector/scalar overloads of
fmin/fmax have been removed. We were currently (accidentally,
presumably) providing overloads with mixed elment types such as
fmin(double2, float), fmax(half4, double), etc. The only vector/scalar
overloads in the OpenCL spec are those with scalars of the same element
type as the vector in the first argument.
2025-04-29 10:51:24 +01:00
Fraser Cormack
139e30e215
[libclc] Remove (vload|vstore)_half helpers (#137181)
These were only being used when compiling with versions of clang older
than clang 6. As such they were essentially unsupported and untested.

This somewhat simplifies the codebase, producing fewer helper functions
in the final builtins library. It also avoids typed pointer IR.

There's no change to any of the targets' bytecode other than removing
these helper functions.
2025-04-24 15:08:05 +01:00
Fraser Cormack
2edade2824 [libclc][NFC] Clang-format vload/vstore code 2025-04-24 11:42:18 +01:00
Fraser Cormack
d664c42baa
[libclc] Remove unnecessary clcmacros.h (#137149)
The macros defined by this file (not to be confused with clcmacro.h)
don't appear necessary for building libclc.

The language version macros should be handled by clang, and there are no
uses of NULL or kernel_exec in the source code.
2025-04-24 11:24:24 +01:00
Wenju He
77fe6aaeaa
[libclc] only check filename part of the source for avoiding duplication (#135710)
llvm-diff shows this PR has no changes to amdgcn--amdhsa.bc.

Motivation is that in our downstream the same category of target
built-ins, e.g. math, are organized in several different folders. For
example, in target SOURCES we have math-common/cos.cl, while in generic
SOURCES it is math/cos.cl. Based on current check rule that compares
both folder name and base filename, target math-common/cos.cl won't
override math/cos.cl when collecting source files from SOURCES files in
cmake function libclc_configure_lib_source.

With this PR, we allow folder name to be different in the process.

A notable change of this PR is that two entries in SOURCES with the same
base filename must not implements the same built-in.
2025-04-24 05:35:16 +01:00
Fraser Cormack
6c56160433
[libclc] Re-enable compiler warning (#136872)
libclc is now clean of code that triggers the
bitwise-conditional-parentheses warning, so we can finally remove the
workaround.
2025-04-23 15:59:15 +01:00
Fraser Cormack
806d59eecd
[libclc] Fix unguarded use of image types (#136871)
Commit 8292e05 which switched the OpenCL C version to 3.0 exposed this
issue, which wasn't caught in pre-commit CI.
2025-04-23 15:54:48 +01:00
Wenju He
8292e050e6
[libclc] Build for OpenCL 3.0 (#135733)
This PR is modified cherry-pick of
https://github.com/intel/llvm/commit/cba338e5fb1c
This PR sets OpenCL language version to be the same, which is 3.0,
for every target and device, in order to unify the build process.
Target should define supported extensions and features via
setSupportedOpenCLOpts API.

llvm-diff shows one change to amdgcn--amdhsa.bc:
* ctz symbols are added since they are now enabled for amdgcn.
2025-04-23 13:15:47 +01:00
Wenju He
552902455c
[libclc] Add ctz built-in implementation to clc and generic (#135309) 2025-04-15 15:23:25 +01:00
Fraser Cormack
4cb1803ff9 [libclc][NFC] Fix typo in comment 2025-04-14 14:38:58 +01:00
Wenju He
0c21d6b4c8
[libclc] Fix commands in compile_to_bc are executed sequentially (#130755)
In libclc, we observe that compiling OpenCL source files to bitcode is
executed sequentially on Windows, which increases debug build time by
about an hour.
add_custom_command may introduce additional implicit dependencies, see
https://gitlab.kitware.com/cmake/cmake/-/issues/17097
This PR adds a target for each command, enabling parallel builds of
OpenCL source files.
CMake 3.27 has fixed above issue with DEPENDS_EXPLICIT_ONLY. When LLVM
upgrades cmake vertion to 3.7, we can switch to DEPENDS_EXPLICIT_ONLY.
2025-04-14 14:11:04 +01:00
Wenju He
cbda72a547
[NFC][libclc] Merge atomic extension built-ins with identical name into a single file (#134489)
llvm-diff shows there is no change to amdgcn--amdhsa.bc.

Similar to how cl_khr_fp64 and cl_khr_fp16 implementations are put in a
same file for math built-ins, this PR do the same to atom_* built-ins.

The main motivation is to prevent that two files with same base name
implementats different built-ins. In a follow-up PR, I'd like to relax
libclc_configure_lib_source to only compare filename instead of path for
overriding, since in our downstream the same category of built-ins, e.g.
math, are organized in several different folders.
2025-04-14 10:27:48 +01:00
Fraser Cormack
7d32d72f10 [libclc][NFC] Remove blank line at end of file 2025-04-10 10:02:51 +01:00
Romaric Jodin
135a7874dc
libclc: clspv: fma: remove fp16 implementation (#135002)
clspv is already handling generation of fp16. This implementation is
preventing clspv from making the best choice to use an emulation on top
of fp32-fma, or the native fp16-fma, depending on the command-line
arguments.
2025-04-10 10:01:57 +01:00
Fraser Cormack
b0338c3d6c
[libclc] Move shuffle/shuffle2 to the CLC library (#135000)
This commit moves the shuffle and shuffle2 builtins to the CLC library.
In so doing it makes the headers simpler and re-usable for other builtin
layers to hook into the CLC functions, if they wish.

An additional gentype utility has been made available, which provides a
consistent vector-size-or-1 macro for use.

The existing __CLC_VECSIZE is defined but empty which is useful in
certain applications, such as in concatenation with a type to make a
correctly sized scalar or vector type. However, this isn't usable in the
same preprocessor lines when wanting to check for specific vector sizes,
as e.g., '__CLC_VECSIZE == 2' resolves to '== 2' which is invalid. In
local testing this is also useful for the geometric builtins which are
only available for scalar types and vector types of 2, 3, or 4 elements.

No codegen changes are observed, except the internal shuffle/shuffle2
utility functions are no longer made publicly available.
2025-04-09 15:52:25 +01:00