885 Commits

Author SHA1 Message Date
Wenju He
e6d095e89c
[libclc] Only create a target per each compile command for cmake MSVC generator (#154479)
The libclc sequential build issue addressed in commit 0c21d6b4c8ad is
specific to the CMake MSVC generator. Therefore, this PR avoids creating a
large number of targets when a non-MSVC generator is used, such as the
Ninja generator, which is used in pre-merge CI on Windows in
llvm-project repo. We plan to migrate from the MSVC generator to the Ninja
generator in our downstream CI to fix the flaky CMake bug `Cannot restore
timestamp`, which might be related to the large number of targets.
2025-08-22 07:45:42 +08:00
Fraser Cormack
5c411b3c0b
[libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535)
Using the elementwise builtin optimizes the vector case; instead of
scalarizing we can compile directly to the vector intrinsics.
2025-08-21 09:32:03 +01:00
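To make the semantics behind the commit above concrete, here is a minimal Python model of OpenCL `clz`/`ctz` applied lane by lane. It is only a sketch: the helper name `elementwise` and the default 32-bit width are assumptions made for illustration, and the real libclc code uses clang builtins rather than anything shown here.

```python
def clz(x: int, bits: int = 32) -> int:
    """Count leading zero bits of x viewed as an unsigned `bits`-wide integer."""
    x &= (1 << bits) - 1          # reinterpret as unsigned
    return bits - x.bit_length()  # x == 0 yields `bits`, as OpenCL specifies


def ctz(x: int, bits: int = 32) -> int:
    """Count trailing zero bits; x == 0 yields `bits`."""
    x &= (1 << bits) - 1
    if x == 0:
        return bits
    return (x & -x).bit_length() - 1  # isolate the lowest set bit


def elementwise(fn, vec, bits: int = 32):
    """Hypothetical helper: apply fn to every lane, keeping the vector shape."""
    return [fn(lane, bits) for lane in vec]
```

For example, `elementwise(clz, [1, 0, 255, -1])` evaluates each lane independently, which is the behaviour a vector intrinsic provides without scalarizing in the source.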
Wenju He
a450dc80bf
[libclc] Implement __clc_get_local_size/__clc_get_max_sub_group_size for amdgcn (#153785)
This simplifies downstream refactoring of libspirv workitem function in
https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/generic
2025-08-19 07:51:17 +08:00
Wenju He
76bb98746b
[NFC][libclc] add missing __CLC_ prefix to all internal macros (#153523)
This unifies the naming scheme of macros to address the review comment
https://github.com/intel/llvm/pull/19779#discussion_r2272194357

Math constant value macros are not changed, e.g.
`#define AU0 -9.86494292470009928597e-03`
2025-08-18 07:21:04 +08:00
Wenju He
bce14c69db
[libclc] Fix out-of-bound value for workitem functions according to OpenCL spec (#153784) 2025-08-18 06:51:01 +08:00
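The commit above carries no description, so as background only: the OpenCL specification says that work-item queries return a fixed value when `dimindx` is not a valid dimension (1 for size-like queries, 0 for id-like and offset queries). The Python sketch below models that rule; the `query` helper and the tuple-based arguments are assumptions for illustration and do not reflect libclc's actual code.

```python
def query(values, dimindx, default):
    """Return values[dimindx], or `default` when dimindx is out of range."""
    return values[dimindx] if 0 <= dimindx < len(values) else default


# Size-like queries fall back to 1 for an invalid dimension.
def get_global_size(sizes, dimindx):
    return query(sizes, dimindx, 1)


def get_local_size(sizes, dimindx):
    return query(sizes, dimindx, 1)


def get_num_groups(counts, dimindx):
    return query(counts, dimindx, 1)


# Id-like and offset queries fall back to 0.
def get_global_id(ids, dimindx):
    return query(ids, dimindx, 0)


def get_local_id(ids, dimindx):
    return query(ids, dimindx, 0)


def get_group_id(ids, dimindx):
    return query(ids, dimindx, 0)


def get_global_offset(offsets, dimindx):
    return query(offsets, dimindx, 0)
```

Here the length of the tuple stands in for `get_work_dim()`, so a 2-D range queried with `dimindx == 2` takes the fallback path.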
Wenju He
111cdaac99
[libclc] Add __attribute__((const)) to functions that don't access memory (#152456)
Before this PR, PostOrderFunctionAttrsPass could deduce memory(none) for
these functions during the opt run.

This PR explicitly adds the attribute to align with Clang's OpenCL
headers and ensures the attribute is present throughout the compilation
flow. Generated bitcode files amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc
become slightly smaller.
2025-08-12 17:19:08 +08:00
Wenju He
68c609b6c8
[libclc] Fix libclc install on Windows when MSVC generator is used (#152703)
Fix a regression of df7473673214.

The CMake MSVC generator is multi-configuration: the build type is not known
at configure time, and CMAKE_CFG_INTDIR evaluates to $(Configuration)
at configure time. The libclc install fails because $(Configuration) in the
bitcode file path is unresolved in libclc/cmake_install.cmake at install time.

We need a solution that resolves libclc bitcode file path at install
time. This PR fixes the issue using CMAKE_INSTALL_CONFIG_NAME which can
be evaluated at install time. This is the same solution as in
https://reviews.llvm.org/D76827
2025-08-11 18:05:42 +08:00
Wenju He
12cec437c6
[libclc] Implement clc_log/sinpi/sqrt with __nv_* functions (#150174)
This is to upstream implementations in
https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/ptx-nvidiacl/math
2025-08-11 08:53:49 +08:00
Wenju He
1458eb206f
[NFC][libclc] Delete unused clc/shared/binary_decl_with_scalar_second_arg.inc (#152463) 2025-08-08 08:50:14 +08:00
Wenju He
d618c36cb7
[libclc] Add missing clc/lib/ptx-nvidiacl/SOURCES to CMAKE_CONFIGURE_DEPENDS (#152431) 2025-08-07 16:42:13 +08:00
Wenju He
3d1c1a5277
[libclc] Set TARGET_FILE property for prepare-${obj_suffix} target (#152245)
The target's output bitcode `libclc_builtins_lib` is located in a
sub-directory in clang resource directory since df7473673214. Setting
TARGET_FILE property can allow targets in non-libclc project to obtain
the path to `libclc_builtins_lib`.
2025-08-07 08:28:43 +08:00
Wenju He
af16fc2e2a
[libclc] Move mem_fence and barrier to clc library (#151446)
The __clc_mem_fence and __clc_work_group_barrier functions take two
parameters, memory_scope and memory_order. This design allows the clc
functions to implement the SPIR-V ControlBarrier and MemoryBarrier
functions in the future.

The default memory ordering in clc is set to __ATOMIC_SEQ_CST, which is
also the default and strongest ordering in OpenCL and C++.

The OpenCL cl_mem_fence_flags parameter is converted to a combination of
__MEMORY_SCOPE_DEVICE and __MEMORY_SCOPE_WRKGRP, which is passed to clc.

llvm-diff shows no change to nvptx64--nvidiacl.bc.
llvm-diff shows a small change to amdgcn--amdhsa.bc, and the number of
LLVM IR instructions is reduced by 1: https://alive2.llvm.org/ce/z/_Uhqvt
2025-08-06 09:49:28 +08:00
Wenju He
04691aae0d
[libclc] Refine id in async_work_group_copy STRIDED_COPY (#151644)
Move id first along 0th dimension to achieve coalesced memory access
when stride is 1.
2025-08-05 08:00:17 +08:00
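One way to picture the indexing idea in the commit above is the Python model below. It is a sketch under assumptions: the function names, the flattening formula, and the loop structure are illustrative and are not taken from the libclc source. It flattens the local id so that dimension 0 varies fastest, which means neighbouring work-items along dimension 0 touch neighbouring elements when the stride is 1.

```python
def linear_local_id(lid, lsz):
    """Flatten a 3-D local id so that dimension 0 varies fastest."""
    return lid[0] + lsz[0] * (lid[1] + lsz[1] * lid[2])


def strided_copy(src, num_elements, stride, lsz):
    """Model all work-items of one work-group copying their share of elements."""
    dst = [None] * num_elements
    group_size = lsz[0] * lsz[1] * lsz[2]
    for z in range(lsz[2]):
        for y in range(lsz[1]):
            for x in range(lsz[0]):
                first = linear_local_id((x, y, z), lsz)
                # Each work-item handles elements first, first + group_size, ...
                for i in range(first, num_elements, group_size):
                    dst[i] = src[i * stride]
    return dst
```

With a local size of `(4, 2, 1)`, work-items `(0,0,0)` and `(1,0,0)` receive linear ids 0 and 1, so in each pass they read adjacent source elements when `stride` is 1.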
Fraser Cormack
df74736732
[clang] Add the ability to link libclc OpenCL libraries (#146503)
This commit adds driver support for linking libclc OpenCL libraries. It
takes the form of a new optional flag: --libclc-lib=namespec. Nothing is
linked unless this flag is specified.

Not all libclc targets have corresponding clang targets. For this reason
it is desirable for users to be able to specify a libclc library name.
We support this by taking either a library name (without the .bc suffix)
or a filename. Both of these are searched for in the clang resource
directory. Filenames are also checked themselves so that absolute paths
can be provided. The
syntax for specifying filenames (as opposed to library names) uses a
leading colon (:), inspired by the -l option.

To accommodate this option, libclc libraries are now placed into clang's
resource directory in an in-tree configuration. The libraries are all
placed in <resource-dir>/lib/libclc and are not grouped under
host-specific directories as some other runtime
libraries are; it is not expected that OpenCL libraries will differ
depending on the host toolchain.

Currently only the AMDGPU toolchain supports this option as a proof of
concept. Other targets such as NVPTX or SPIR/SPIR-V could support it
too. We could optionally let target toolchains search for libclc
libraries themselves, possibly when passed an empty --libclc-lib.
2025-08-04 15:37:22 +01:00
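The lookup rules described in the commit above can be sketched in Python as follows. This is a hedged illustration rather than the driver's implementation: the function names are hypothetical, and the order in which candidates are tried is an assumption, since the message does not state it.

```python
import os
import posixpath


def libclc_candidates(namespec, resource_dir):
    """List the paths a --libclc-lib=namespec value could refer to."""
    libdir = posixpath.join(resource_dir, "lib", "libclc")
    if namespec.startswith(":"):
        # Leading colon: the rest is a filename. Look in the resource
        # directory and also try the filename itself (covers absolute paths).
        filename = namespec[1:]
        return [posixpath.join(libdir, filename), filename]
    # Otherwise it is a library name given without the .bc suffix.
    return [posixpath.join(libdir, namespec + ".bc")]


def resolve_libclc(namespec, resource_dir, exists=os.path.exists):
    """Return the first existing candidate, or None if nothing is found."""
    for candidate in libclc_candidates(namespec, resource_dir):
        if exists(candidate):
            return candidate
    return None
```

The `exists` parameter is injectable so the lookup can be exercised without touching the file system.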
Fraser Cormack
b0313adefa
[libclc] Add an option to build SPIR-V targets with the LLVM backend (#151347)
This removes the dependency on an external tool to build the SPIR-V
files. It may be of interest to projects such as Mesa.

Note that the option is off by default as using the SPIR-V backend, at
least on my machine, uses a *lot* of memory and the process is often
killed in a parallelized build. It does complete, however.

Fixes #135327.
2025-08-01 09:48:40 +01:00
Fraser Cormack
586cacdbdd
[libclc] Optimize generic CLC fmin/fmax (#128506)
With this commit, the CLC fmin/fmax builtins use clang's
__builtin_elementwise_(min|max)imumnum which helps us generate LLVM
minimumnum/maximumnum intrinsics directly. These intrinsics uniformly
select the non-NaN input over the (quiet or signalling) NaN input, which
corresponds to what the OpenCL CTS tests.

These intrinsics maintain the vector types, as opposed to scalarizing,
which was previously happening. This commit therefore helps to optimize
codegen for those targets.

Note that there is ongoing discussion regarding how these builtins
should handle signalling NaNs in the OpenCL specification and whether
they should be able to return a quiet NaN as per the IEEE behaviour. If
the specification and/or CTS is ever updated to allow or mandate
returning a qNAN, these builtins could/should be updated to use
__builtin_elementwise_(min|max)num instead which would lower to LLVM
minnum/maxnum intrinsics.

The SPIR-V targets maintain the old implementations, as the LLVM ->
SPIR-V translator can't currently handle the LLVM intrinsics. The
implementation has been simplified to consistently use clang builtins,
as opposed to before where the half version was explicitly defined.

[1] https://github.com/KhronosGroup/OpenCL-CTS/pull/2285
2025-07-29 13:21:42 +01:00
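The NaN handling described in the commit above, where the non-NaN input is always preferred, can be modelled with a few lines of Python. This is only a sketch of the selection rule: the names `fmin_num` and `fmax_num` are made up for illustration, signalling NaNs cannot be distinguished in Python, and the ordering of signed zeros is not modelled.

```python
import math


def fmin_num(a: float, b: float) -> float:
    """Return the smaller input, preferring a number over a NaN."""
    if math.isnan(a):
        return b  # also covers the both-NaN case, which yields NaN
    if math.isnan(b):
        return a
    return min(a, b)


def fmax_num(a: float, b: float) -> float:
    """Return the larger input, preferring a number over a NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def fmin_vec(xs, ys):
    """Lane-wise form, keeping the vector shape instead of scalarizing."""
    return [fmin_num(x, y) for x, y in zip(xs, ys)]
```

A NaN result is only possible when both inputs of a lane are NaN.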
Fraser Cormack
76bebb5be9
[libclc] Fix building top-level 'libclc' target (#150972)
With libclc being a 'runtime', the top-level build assumes that there is
a corresponding 'libclc' target. We previously weren't providing this,
leading to a build failure if the user tried to build it.

This commit remedies this by adding support for building the 'libclc'
target. It does so by adding dependencies from the OpenCL builtins to
this target. It uses a configurable in-between target -
libclc-opencl-builtins - to ease the possibility of adding non-OpenCL
builtin libraries in the future.
2025-07-29 10:53:31 +01:00
Wenju He
5223317210
[libclc] Add generic native half implementation of __clc_normalize (#150165)
This is ported from
https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/generic/geometric/normalize.cl
and can pass a closed-source OpenCL CTS
"test_geometrics geom_normalize --half CL_DEVICE_TYPE_GPU" on intel GPU.

llvm-diff amdgcn--amdhsa.bc shows fpext/fptrunc insts are now removed
from normalize function.
2025-07-29 08:29:12 +08:00
Wenju He
bcd0d97224
[libclc] Simplify unary_def_scalarize.inc's use in __clc_erf/erfc/tgamma (#150181)
Also delete unary_def_via_fp32.inc. There are small changes in
amdgcn--amdhsa.bc because the vector conversion is now scalarized, e.g.
  %2 = fpext <4 x half> %0 to <4 x float>
  %3 = extractelement <4 x float> %2, i64 0
  %4 = tail call float @llvm.fabs.f32(float %3)
->
  %2 = extractelement <4 x half> %0, i64 0
  %3 = tail call half @llvm.fabs.f16(half %2)
  %4 = fpext half %3 to float
2025-07-29 08:25:58 +08:00
Michał Górny
abe93d9d7e
[libclc] Fix installed symlinks to be relative again (#149728)
Fix the symlink creation logic to use relative paths instead of
absolute, in order to ensure that the installed symlinks actually refer
to the installed .bc files rather than the ones from the build
directory. This was broken in #146833. The change is a bit roundabout
but it attempts to preserve the spirit of #146833, that is the ability
to use multiple output directories (provided they all reside in
`${LIBCLC_OUTPUT_LIBRARY_DIR}` and preserve the same structure in the
installed tree).

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2025-07-21 20:59:31 +02:00
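The difference between absolute and relative link targets in the commit above can be illustrated with a short Python sketch. The paths and file names below are invented for the example, and the real logic lives in CMake, so treat this purely as a model of the idea: a relative target keeps pointing at the sibling file after the whole tree is copied to the install location.

```python
import posixpath


def relative_link_target(target, link):
    """Compute the target of `link` relative to the directory holding it."""
    return posixpath.relpath(target, start=posixpath.dirname(link))


def resolve_link(link, link_target):
    """Resolve a (possibly relative) link target from the link's location."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(link), link_target))
```

For instance, a link `/build/out/alias.bc` pointing at `/build/out/device.bc` gets the target `device.bc`; after installation to `/usr/share/clc/alias.bc` the same target resolves to `/usr/share/clc/device.bc` rather than back into the build directory.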
Michał Górny
58c3affdaa
[libclc] Expose prepare_builtins_* variables in top-level CMakeLists (#149657)
Fix `libclc/utils/CMakeLists.txt` to expose `prepare_builtins_*`
variables in parent scope. This was a regression introduced in #148815
where the code was moved into subdirectory, and the variables would no
longer be accessible to calls in top-level CMakeLists, resulting in
attempting to build targets with empty command:

```
[1566/1676] cd /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build && -o /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/obj.libclc.dir/clspv--/builtins.opt.clspv--.bc
FAILED: clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc
cd /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build && -o /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/obj.libclc.dir/clspv--/builtins.opt.clspv--.bc
/bin/sh: line 1: -o: command not found
```
2025-07-20 12:26:51 +09:00
Wenju He
9c26f37ce3
[libclc] Add generic implementation of some atomic functions in OpenCL spec section 6.15.12.7 (#146814)
Add corresponding clc functions, which are implemented with clang
__scoped_atomic builtins. OpenCL functions are implemented as a wrapper
over clc functions.

Also change legacy atomic_inc and atomic_dec to re-use the newly added
clc_atomic_inc/dec implementations. llvm-diff shows no change to
atomic_inc and atomic_dec in bitcode.

Notes:
* Generic OpenCL built-in functions use __ATOMIC_SEQ_CST and
__MEMORY_SCOPE_DEVICE for memory order and memory scope parameters.
* OpenCL atomic_*_explicit, atomic_flag* built-ins are not implemented
yet.
* OpenCL built-ins of atomic_intptr_t, atomic_uintptr_t, atomic_size_t
and atomic_ptrdiff_t types are not implemented yet.
* llvm-diff shows no change to nvptx64--nvidiacl.bc and
amdgcn--amdhsa.bc since __opencl_c_atomic_order_seq_cst and
__opencl_c_atomic_scope_device are not defined in these two targets.
2025-07-18 08:09:14 +08:00
Wenju He
c0294f497d
[libclc] Add generic implementation of bitfield_insert/extract,bit_reverse (#149070)
The implementation is based on reference implementation in
OpenCL-CTS/test_integer_ops. The generic implementations pass
OpenCL-CTS/test_integer_ops tests on Intel GPU.
2025-07-18 08:06:29 +08:00
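As a rough reference for what the builtins in the commit above compute, here is a Python model restricted to 32-bit values. It is a sketch, not the libclc or CTS code; the fixed width and the `MASK32` constant are assumptions, and cases where `offset + count` exceeds the bit width are not handled because they are outside the defined behaviour.

```python
MASK32 = 0xFFFFFFFF


def bitfield_insert(base, insert, offset, count):
    """Replace `count` bits of base, starting at `offset`, with the low bits of insert."""
    field = ((1 << count) - 1) << offset
    return ((base & ~field) | ((insert << offset) & field)) & MASK32


def bitfield_extract_unsigned(base, offset, count):
    """Extract `count` bits starting at `offset`, zero-extended."""
    return ((base & MASK32) >> offset) & ((1 << count) - 1)


def bitfield_extract_signed(base, offset, count):
    """Extract `count` bits starting at `offset`, sign-extended."""
    value = bitfield_extract_unsigned(base, offset, count)
    if count and value & (1 << (count - 1)):
        value -= 1 << count  # top bit of the field set: make it negative
    return value


def bit_reverse(base):
    """Reverse the order of the 32 bits of base."""
    return int(format(base & MASK32, "032b")[::-1], 2)
```

A `count` of 0 leaves `base` unchanged on insert and yields 0 on extract.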
Wenju He
3abecfe9e3
[NFC][libclc] Delete clc/include/clc/relational/floatn.inc (#149252)
llvm-diff shows no change to amdgcn--amdhsa.bc.
2025-07-18 08:05:07 +08:00
Wenju He
cf36f49c04
[libclc] Enable clang fp reciprocal in clc_native_divide/recip/rsqrt/tan (#149269)
The pragma adds `arcp` flag to `fdiv` instruction in these functions.
The flag can provide better performance.
2025-07-18 07:50:35 +08:00
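For context on the commit above: the `arcp` flag permits an optimizer to rewrite a division as a multiplication by the reciprocal. The Python sketch below contrasts the two forms in double precision; the function names are made up, and it only illustrates the numerical trade-off, not how clang or any target actually lowers the operation.

```python
def divide_exact(x: float, y: float) -> float:
    """Plain division, correctly rounded once."""
    return x / y


def divide_arcp(x: float, y: float) -> float:
    """Reciprocal form that `arcp` allows: two roundings instead of one."""
    return x * (1.0 / y)


def max_abs_difference(pairs):
    """Largest absolute gap between the two forms over some (x, y) pairs."""
    return max(abs(divide_exact(x, y) - divide_arcp(x, y)) for x, y in pairs)
```

The two forms agree whenever the reciprocal is exactly representable (for example when `y` is a power of two) and can otherwise differ by a rounding error in the last bits, which is the accuracy given up in exchange for speed in the native_* functions.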
Wenju He
9d78eb5cc5
[libclc] Enable -fdiscard-value-names build flag to reduce bitcode size (#149016)
The flag reduces nvptx64--nvidiacl.bc size from 10.6MB to 5.2MB.
2025-07-17 08:04:33 +08:00
Fraser Cormack
8a7a64873b
[libclc] Move CMake for prepare_builtins to a subdirectory (#148815)
This simply makes things better self-contained.
2025-07-15 12:26:11 +01:00
Mészáros Gergely
7a089bc4c0
[libclc] Delete .gitignore (#147939)
The file is listing build artifacts to ignore, but LLVM has long had the
policy that in-tree builds are not supported, so the ignore rules
shouldn't serve their original purpose anymore.
The rules however are annoying because although they probably intended
only to ignore top-level build artifacts, they lack the leading `/` so
they match any file with the ignored name anywhere under `libclc/`.
2025-07-10 14:07:59 +02:00
Wenju He
28aa5a64ef
[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc (#144333)
Changes in this PR:
* Declare most of workitem functions in clc and opencl folders.
* Call clc workitem function in corresponding OpenCL workitem function.
* Move ptx-nvidiacl workitem built-in implementations into clc.
* Move a few amdgcn workitem built-in implementations into clc.
* Include only needed headers in OpenCL workitem functions.
* Implement get_local_linear_id, get_max_sub_group_size,
get_num_sub_groups,
get_sub_group_id, get_sub_group_local_id, get_sub_group_size for
ptx-nvidiacl.

llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc.
llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and
nvptx64--.bc.
2025-07-10 08:04:16 +08:00
Fraser Cormack
9b5959dd9a
[libclc] Change symlinks to copies on Windows (#147759)
This mirrors how other LLVM libraries handle symlinks
2025-07-09 17:20:56 +01:00
Fraser Cormack
9d11bd0db8
[libclc] Remove catch-all opencl/clc.h (#147490)
This commit finishes the work started in #146840 and #147276. It makes
each OpenCL header self-contained and each implementation file include
only the headers it needs. It removes the need for a catch-all include
file of all OpenCL builtin declarations.
2025-07-08 10:37:06 +01:00
Fraser Cormack
b67504c461
[libclc] Tighten OpenCL builtin include strategy (#147276)
This commit continues the work from #146840 and extends it to the maths,
geometric, common, and relational directories.

All headers have include guards and, where appropriate, include the
minimal code required for their specific definitions. Implementation
files no longer include the large catch-all header of all OpenCL builtin
declarations.
2025-07-08 09:04:43 +01:00
Wenju He
7cd179612d
[libclc] Fix typo in OpenCL header math/sincos.h (#147244)
llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc
2025-07-07 17:30:40 +08:00
Fraser Cormack
ea685890b8
[libclc] Reduce include usage in OpenCL builtins (#146840)
This commit starts the process of reducing the amount of code included
by OpenCL builtins, hopefully reducing build times in the process.

It introduces a minimal OpenCL header - opencl-base.h - which includes
only the OpenCL type definitions and the macros necessary for
declaring/defining functions.

Where the OpenCL builtin implementations would currently include the
whole of <clc/opencl/clc.h>, which defines *all* OpenCL builtins, now
they include only the specific declaration they need.

This mirrors how the CLC builtins are defined.
2025-07-07 10:20:28 +01:00
Wenju He
fa9cd47328
[NFC][libclc] Rename __CLC_FUNCTION to either FUNCTION or __IMPL_FUNCTION (#146999)
Rename to FUNCTION if it is for declaration, since it doesn't make much
sense to use __CLC_FUNCTION for OpenCL function declaration. Rename to
__IMPL_FUNCTION if it is for definition, since in some cases
implementation function isn't clc_* function.
2025-07-07 08:07:51 +08:00
Fraser Cormack
222e795347
[libclc] Fix target dependency
The prepare target was depending on the output of a custom command, but
wasn't using the full path to that file. This tripped up CMake if the file was
removed as it didn't know how to rebuild that file.
2025-07-04 11:08:00 +01:00
Fraser Cormack
81e6552a3d
[libclc] Make library output directories explicit (#146833)
These changes were split off from #146503.

This commit makes the output directories of libclc artefacts explicit.
It creates a variable for the final output directory -
LIBCLC_OUTPUT_LIBRARY_DIR - which has not changed. This allows future
changes to alter the output directory more simply, such as by pointing
it to somewhere inside clang's resource directory.

This commit also changes the output directory of each target's
intermediate builtins.*.bc files. They are now placed into each
respective libclc target's object directory, rather than the top-level
libclc binary directory. This should help keep the binary directory a
bit tidier.
2025-07-04 10:35:15 +01:00
Fraser Cormack
85d09de5fa
[libclc] Add prepare-<triple> targets (#146700)
This target provides a unified build target for all devices under the
single triple. This way a user doesn't have to know device names to
build a specific target's bitcode libraries.

Device names may be considered as internal implementation details as
they are not exposed to users of CMake; users only specify triples to
build. Now, instead of `prepare-{barts,cayman,cedar,cypress}-r600--.bc`,
for example, a user may now build simply `prepare-r600--` and have all
four of those libraries built.

This commit also refactors the CMake somewhat. We were previously
diverging between the SPIR-V and other targets, and duplicating a bit of
logic like the creation of the 'prepare' targets, the targets'
properties, and the installation directory. It's cleaner and hopefully
more robust to share this code between all targets. This commit also
takes this opportunity to improve some comments around this code.
2025-07-03 08:30:33 +01:00
Wenju He
b0e6faae08
[libclc] Add missing clc_lgamma_r with generic address space pointer arg (#146495)
There is no change to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc because
__opencl_c_generic_address_space is not defined for them.
2025-07-02 08:28:01 +08:00
Wenju He
93fe52f19e
[libclc] Add __clc_nan implementation with signed nancode argument (#146485)
In the OpenCL Extended Instruction Set Specification, nancode can be a signed
integer or a vector of signed integer values.
This PR has no change to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc
because the newly added clc functions are not used in OpenCL library.
2025-07-02 08:27:46 +08:00
Wenju He
338dee0742
[NFC][libclc] Refactor _CLC_*_VECTORIZE macros to functions in .inc files (#145678)
With this PR, if we have a customized implementation for the scalar or
vector length 2 case, we don't need to write new macros, e.g.
https://github.com/intel/llvm/blob/fb18321705f6/libclc/clc/include/clc/clcmacro.h#L15

Undef __HALF_ONLY, __FLOAT_ONLY and __DOUBLE_ONLY at the end of
clc/include/clc/math/gentype.inc

llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc
2025-06-30 17:19:19 +08:00
Harald van Dijk
46ee7f1908
[libclc] Avoid out-of-range float-to-int. (#145698)
For a kernel such as

    kernel void foo(__global double3 *z) {
      double3 x = {0.6631661088,0.6612268107,0.1513627528};
      int3 y = {-1980459213,-660855407,615708204};
      *z = pown(x, y);
    }

we were not storing anything to z, because the implementation of pown
relied on a floating-point-to-integer conversion where the
floating-point value was outside of the integer's range. Although in
LLVM IR we permit that operation so long as we end up ignoring its
result -- that is the general rule for poison -- one thing we are not
permitted to do is have conditional branches that depend on it, and
through the call to __clc_ldexp, we did have that.

To fix this, rather than changing expv at the end to INFINITY/0, we can
change v at the start to values that we know will produce INFINITY/0
without performing such out-of-range conversions.

Tested with

    clang --target=nvptx64 -S -O3 -o - test.cl \
      -Xclang -mlink-builtin-bitcode \
      -Xclang runtimes/runtimes-bins/libclc/nvptx64--.bc

A grep showed that this exact same code existed in three more places, so
I changed it there too, though I did not do a broader search for other
similar code that potentially has the same problem.
2025-06-25 16:37:06 +01:00
Wenju He
13a9b86f62
[NFC][libclc] Replace and delete _CLC_DEFINE_UNARY/BINARY/TERNARY_BUILTIN macros (#145458)
Also delete unused _CLC_DEFINE_BINARY_BUILTIN_WITH_SCALAR_SECOND_ARG,
_CLC_DEFINE_UNARY_BUILTIN_FP16 and _CLC_DEFINE_BINARY_BUILTIN_FP16.

llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc
2025-06-25 13:48:53 +08:00
Wenju He
de3a9ea510
[NFC][libclc] Simplify clc_dot and dot implementation (#142922)
llvm-diff shows no change to amdgcn--amdhsa.bc
2025-06-06 08:09:53 +08:00
Fraser Cormack
6306f0fa21
[libclc] Support LLVM_ENABLE_RUNTIMES when building (#141574)
This commit deprecates the use of LLVM_ENABLE_PROJECTS in favour of
LLVM_ENABLE_RUNTIMES when building libclc.

Alternatively, using -DLLVM_RUNTIME_TARGETS=<triple> combined with
-DRUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES=libclc also gets pretty far but
fails due to zlib problems building the LLVM utility 'prepare_builtins'.
I'm not sure what's going on there but I don't think it's required at
this stage. More work would be required to support that option.

This does nothing to change how the host tools are found in order to be
used to actually build the libclc libraries.

Note that under such a configuration the final libclc builtin libraries
are placed in `<build>/runtimes/runtimes-bins/libclc/`, which differs
from a non-runtimes build. The installation location remains the same.

Fixes #124013.
2025-06-05 17:56:21 +01:00
Fraser Cormack
8c3019ecf4
[libclc] Add (fast) normalize to CLC; add half overloads (#139759)
For simplicity the half overloads just call into the float versions of
the builtin. Otherwise there are no codegen changes to any target.
2025-06-05 09:11:36 +01:00
Romaric Jodin
8f3ccd1674
libclc: clspv: do not set generic_addrspace_val (#141912)
This is breaking clspv: https://github.com/google/clspv/issues/1493
2025-06-02 10:10:52 +01:00
Wenju He
6e3d668206
[libclc] Move prefetch to clc library (#141721)
llvm-diff shows no change to amdgcn--amdhsa.bc
2025-05-29 09:11:06 +08:00
Fraser Cormack
b474c3f69e
[libclc] Move vload & vstore to CLC library (#141755)
This commit moves the various vload and vstore builtins (including
vload_half, vloada_half, etc.) to the CLC library.

This is almost entirely a code move and does not make any attempt to
clean up or optimize the definitions of these builtins. There is no
change to any of the targets' builtin libraries, except that the vstore
helper rounding functions are now internalized.

Cleanups can come in future work. The new CLC declarations and new
OpenCL wrappers show how these CLC implementations could be defined more
simply. The builtins could probably also be vectorized in future work;
right now all of the 'half' versions for both vload and vstore are
essentially scalarized.
2025-05-28 16:16:12 +01:00
Fraser Cormack
9fa81a486e
[libclc] Move step to the CLC library; add missing half variants (#140936)
The half variants were missing but are trivial to implement. There were
some incorrect mixed type overloads (step(float, double)) which aren't
in the OpenCL specification and so have been removed.

Like certain other builtins the CLC step function only deals with
identical types. The OpenCL layer is responsible for casting the scalar
argument to a vector.

This commit also trivially vectorizes the CLC function, generating
better bitcode.
2025-05-22 09:54:27 +01:00
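The layering described in the commit above, where the CLC function only accepts identical types and the OpenCL layer widens a scalar edge, can be sketched in Python. The names `clc_step` and `opencl_step` are illustrative stand-ins rather than the real symbols, and lists stand in for OpenCL vector types.

```python
def clc_step(edge, x):
    """Same-type form: edge and x are vectors (lists) of equal length."""
    # step returns 0.0 where x < edge and 1.0 otherwise, lane by lane.
    return [0.0 if xi < ei else 1.0 for ei, xi in zip(edge, x)]


def opencl_step(edge, x):
    """Wrapper layer: splat a scalar edge to a vector, then defer to clc_step."""
    if not isinstance(edge, list):
        edge = [edge] * len(x)
    return clc_step(edge, x)
```

Keeping the scalar-to-vector cast in the wrapper means the inner function needs only one overload per vector type.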