llvm-project

Author	SHA1	Message	Date
Fraser Cormack	586cacdbdd	[libclc] Optimize generic CLC fmin/fmax (#128506 ) With this commit, the CLC fmin/fmax builtins use clang's __builtin_elementwise_(min|max)imumnum which helps us generate LLVM minimumnum/maximumnum intrinsics directly. These intrinsics uniformly select the non-NaN input over the (quiet or signalling) NaN input, which corresponds to what the OpenCL CTS tests. These intrinsics maintain the vector types, as opposed to scalarizing, which was previously happening. This commit therefore helps to optimize codegen for those targets. Note that there is ongoing discussion regarding how these builtins should handle signalling NaNs in the OpenCL specification and whether they should be able to return a quiet NaN as per the IEEE behaviour. If the specification and/or CTS is ever updated to allow or mandate returning a qNaN, these builtins could/should be updated to use __builtin_elementwise_(min|max)num instead which would lower to LLVM minnum/maxnum intrinsics. The SPIR-V targets maintain the old implementations, as the LLVM -> SPIR-V translator can't currently handle the LLVM intrinsics. The implementation has been simplified to consistently use clang builtins, as opposed to before where the half version was explicitly defined. [1] https://github.com/KhronosGroup/OpenCL-CTS/pull/2285	2025-07-29 13:21:42 +01:00
Fraser Cormack	76bebb5be9	[libclc] Fix building top-level 'libclc' target (#150972 ) With libclc being a 'runtime', the top-level build assumes that there is a corresponding 'libclc' target. We previously weren't providing this, leading to a build failure if the user tried to build it. This commit remedies this by adding support for building the 'libclc' target. It does so by adding dependencies from the OpenCL builtins to this target. It uses a configurable in-between target - libclc-opencl-builtins - to ease the possibility of adding non-OpenCL builtin libraries in the future.	2025-07-29 10:53:31 +01:00
Wenju He	5223317210	[libclc] Add generic native half implementation of __clc_normalize (#150165 ) This is ported from https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/generic/geometric/normalize.cl and passes a closed-source OpenCL CTS "test_geometrics geom_normalize --half CL_DEVICE_TYPE_GPU" run on an Intel GPU. llvm-diff of amdgcn--amdhsa.bc shows that fpext/fptrunc instructions are now removed from the normalize function.	2025-07-29 08:29:12 +08:00
Wenju He	bcd0d97224	[libclc] Simplify unary_def_scalarize.inc's use in __clc_erf/erfc/tgamma (#150181 ) Also delete unary_def_via_fp32.inc. There are small changes in amdgcn--amdhsa.bc because the vector conversion is now scalarized, e.g. %2 = fpext <4 x half> %0 to <4 x float> %3 = extractelement <4 x float> %2, i64 0 %4 = tail call float @llvm.fabs.f32(float %3) -> %2 = extractelement <4 x half> %0, i64 0 %3 = tail call half @llvm.fabs.f16(half %2) %4 = fpext half %3 to float	2025-07-29 08:25:58 +08:00
Michał Górny	abe93d9d7e	[libclc] Fix installed symlinks to be relative again (#149728 ) Fix the symlink creation logic to use relative paths instead of absolute, in order to ensure that the installed symlinks actually refer to the installed .bc files rather than the ones from the build directory. This was broken in #146833. The change is a bit roundabout but it attempts to preserve the spirit of #146833, that is the ability to use multiple output directories (provided they all reside in `${LIBCLC_OUTPUT_LIBRARY_DIR}` and preserve the same structure in the installed tree). Signed-off-by: Michał Górny <mgorny@gentoo.org>	2025-07-21 20:59:31 +02:00
Michał Górny	58c3affdaa	[libclc] Expose `prepare_builtins_` variables in top-level CMakeLists (#149657 ) Fix `libclc/utils/CMakeLists.txt` to expose `prepare_builtins_` variables in parent scope. This was a regression introduced in #148815 where the code was moved into subdirectory, and the variables would no longer be accessible to calls in top-level CMakeLists, resulting in attempting to build targets with empty command: ``` [1566/1676] cd /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build && -o /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/obj.libclc.dir/clspv--/builtins.opt.clspv--.bc FAILED: clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc cd /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build && -o /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/obj.libclc.dir/clspv--/builtins.opt.clspv--.bc /bin/sh: line 1: -o: command not found ```	2025-07-20 12:26:51 +09:00
Wenju He	9c26f37ce3	[libclc] Add generic implementation of some atomic functions in OpenCL spec section 6.15.12.7 (#146814 ) Add corresponding clc functions, which are implemented with clang __scoped_atomic builtins. OpenCL functions are implemented as wrappers over clc functions. Also change legacy atomic_inc and atomic_dec to re-use the newly added clc_atomic_inc/dec implementations. llvm-diff shows no change to atomic_inc and atomic_dec in bitcode. Notes: * Generic OpenCL built-in functions use __ATOMIC_SEQ_CST and __MEMORY_SCOPE_DEVICE for memory order and memory scope parameters. * OpenCL atomic_*_explicit, atomic_flag built-ins are not implemented yet. * OpenCL built-ins of atomic_intptr_t, atomic_uintptr_t, atomic_size_t and atomic_ptrdiff_t types are not implemented yet. * llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc since __opencl_c_atomic_order_seq_cst and __opencl_c_atomic_scope_device are not defined in these two targets.	2025-07-18 08:09:14 +08:00
Wenju He	c0294f497d	[libclc] Add generic implementation of bitfield_insert/extract,bit_reverse (#149070 ) The implementation is based on reference implementation in OpenCL-CTS/test_integer_ops. The generic implementations pass OpenCL-CTS/test_integer_ops tests on Intel GPU.	2025-07-18 08:06:29 +08:00
Wenju He	3abecfe9e3	[NFC][libclc] Delete clc/include/clc/relational/floatn.inc (#149252 ) llvm-diff shows no change to amdgcn--amdhsa.bc.	2025-07-18 08:05:07 +08:00
Wenju He	cf36f49c04	[libclc] Enable `clang fp reciprocal` in clc_native_divide/recip/rsqrt/tan (#149269 ) The pragma adds `arcp` flag to `fdiv` instruction in these functions. The flag can provide better performance.	2025-07-18 07:50:35 +08:00
Wenju He	9d78eb5cc5	[libclc] Enable -fdiscard-value-names build flag to reduce bitcode size (#149016 ) The flag reduces nvptx64--nvidiacl.bc size from 10.6MB to 5.2MB.	2025-07-17 08:04:33 +08:00
Fraser Cormack	8a7a64873b	[libclc] Move CMake for prepare_builtins to a subdirectory (#148815 ) This simply makes things better self-contained.	2025-07-15 12:26:11 +01:00
Mészáros Gergely	7a089bc4c0	[libclc] Delete .gitignore (#147939 ) The file is listing build artifacts to ignore, but LLVM has long had the policy that in-tree builds are not supported, so the ignore rules shouldn't serve their original purpose anymore. The rules however are annoying because although they probably intended only to ignore top-level build artifacts, they lack the leading `/` so they match any file with the ignored name anywhere under `libclc/`.	2025-07-10 14:07:59 +02:00
Wenju He	28aa5a64ef	[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc (#144333 ) Changes in this PR: * Declare most of workitem functions in clc and opencl folders. * Call clc workitem function in corresponding OpenCL workitem function. * Move ptx-nvidiacl workitem built-in implementations into clc. * Move a few amdgcn workitem built-in implementations into clc. * Include only needed headers in OpenCL workitem functions. * Implement get_local_linear_id, get_max_sub_group_size, get_num_sub_groups, get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl. llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc. llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.	2025-07-10 08:04:16 +08:00
Fraser Cormack	9b5959dd9a	[libclc] Change symlinks to copies on Windows (#147759 ) This mirrors how other LLVM libraries handle symlinks	2025-07-09 17:20:56 +01:00
Fraser Cormack	9d11bd0db8	[libclc] Remove catch-all opencl/clc.h (#147490 ) This commit finishes the work started in #146840 and #147276. It makes each OpenCL header self-contained and each implementation file include only the headers it needs. It removes the need for a catch-all include file of all OpenCL builtin declarations.	2025-07-08 10:37:06 +01:00
Fraser Cormack	b67504c461	[libclc] Tighten OpenCL builtin include strategy (#147276 ) This commit continues the work from #146840 and extends it to the maths, geometric, common, and relational directories. All headers have include guards and, where appropriate, include the minimal code required for their specific definitions. Implementation files no longer include the large catch-all header of all OpenCL builtin declarations.	2025-07-08 09:04:43 +01:00
Wenju He	7cd179612d	[libclc] Fix typo in OpenCL header math/sincos.h (#147244 ) llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc	2025-07-07 17:30:40 +08:00
Fraser Cormack	ea685890b8	[libclc] Reduce include usage in OpenCL builtins (#146840 ) This commit starts the process of reducing the amount of code included by OpenCL builtins, hopefully reducing build times in the process. It introduces a minimal OpenCL header - opencl-base.h - which includes only the OpenCL type definitions and the macros necessary for declaring/defining functions. Where the OpenCL builtin implementations would currently include the whole of <clc/opencl/clc.h>, which defines all OpenCL builtins, now they include only the specific declaration they need. This mirrors how the CLC builtins are defined.	2025-07-07 10:20:28 +01:00
Wenju He	fa9cd47328	[NFC][libclc] Rename __CLC_FUNCTION to either FUNCTION or __IMPL_FUNCTION (#146999 ) Rename to FUNCTION if it is for declaration, since it doesn't make much sense to use __CLC_FUNCTION for OpenCL function declaration. Rename to __IMPL_FUNCTION if it is for definition, since in some cases implementation function isn't clc_* function.	2025-07-07 08:07:51 +08:00
Fraser Cormack	222e795347	[libclc] Fix target dependency The prepare target was depending on the output of a custom command, but wasn't the full path to that file. This tripped up CMake if the file was removed as it didn't know how to rebuild that file.	2025-07-04 11:08:00 +01:00
Fraser Cormack	81e6552a3d	[libclc] Make library output directories explicit (#146833 ) These changes were split off from #146503. This commit makes the output directories of libclc artefacts explicit. It creates a variable for the final output directory - LIBCLC_OUTPUT_LIBRARY_DIR - which has not changed. This allows future changes to alter the output directory more simply, such as by pointing it to somewhere inside clang's resource directory. This commit also changes the output directory of each target's intermediate builtins.*.bc files. They are now placed into each respective libclc target's object directory, rather than the top-level libclc binary directory. This should help keep the binary directory a bit tidier.	2025-07-04 10:35:15 +01:00
Fraser Cormack	85d09de5fa	[libclc] Add prepare-<triple> targets (#146700 ) This target provides a unified build target for all devices under a single triple. This way a user doesn't have to know device names to build a specific target's bytecode libraries. Device names may be considered as internal implementation details as they are not exposed to users of CMake; users only specify triples to build. Instead of `prepare-{barts,cayman,cedar,cypress}-r600--.bc`, for example, a user may now simply build `prepare-r600--` and have all four of those libraries built. This commit also refactors the CMake somewhat. We were previously diverging between the SPIR-V and other targets, and duplicating a bit of logic like the creation of the 'prepare' targets, the targets' properties, and the installation directory. It's cleaner and hopefully more robust to share this code between all targets. This commit also takes this opportunity to improve some comments around this code.	2025-07-03 08:30:33 +01:00
Wenju He	b0e6faae08	[libclc] Add missing clc_lgamma_r with generic address space pointer arg (#146495 ) There is no change to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc because __opencl_c_generic_address_space is not defined for them.	2025-07-02 08:28:01 +08:00
Wenju He	93fe52f19e	[libclc] Add __clc_nan implementation with signed nancode argument (#146485 ) In OpenCL Extended Instruction Set Specification, nancode can be signed integer or vector of signed integers values. This PR has no change to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc because the newly added clc functions are not used in OpenCL library.	2025-07-02 08:27:46 +08:00
Wenju He	338dee0742	[NFC][libclc] Refactor _CLC_*_VECTORIZE macros to functions in .inc files (#145678 ) With this PR, if we have customized implementation for scalar or vector length = 2, we don't need to write new macros, e.g. https://github.com/intel/llvm/blob/fb18321705f6/libclc/clc/include/clc/clcmacro.h#L15 Undef __HALF_ONLY, __FLOAT_ONLY and __DOUBLE_ONLY at the end of clc/include/clc/math/gentype.inc llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc	2025-06-30 17:19:19 +08:00
Harald van Dijk	46ee7f1908	[libclc] Avoid out-of-range float-to-int. (#145698 ) For a kernel such as kernel void foo(__global double3 *z) { double3 x = {0.6631661088,0.6612268107,0.1513627528}; int3 y = {-1980459213,-660855407,615708204}; *z = pown(x, y); } we were not storing anything to z, because the implementation of pown relied on a floating-point-to-integer conversion where the floating-point value was outside of the integer's range. Although in LLVM IR we permit that operation so long as we end up ignoring its result -- that is the general rule for poison -- one thing we are not permitted to do is have conditional branches that depend on it, and through the call to __clc_ldexp, we did have that. To fix this, rather than changing expv at the end to INFINITY/0, we can change v at the start to values that we know will produce INFINITY/0 without performing such out-of-range conversions. Tested with clang --target=nvptx64 -S -O3 -o - test.cl \ -Xclang -mlink-builtin-bitcode \ -Xclang runtimes/runtimes-bins/libclc/nvptx64--.bc A grep showed that this exact same code existed in three more places, so I changed it there too, though I did not do a broader search for other similar code that potentially has the same problem.	2025-06-25 16:37:06 +01:00
Wenju He	13a9b86f62	[NFC][libclc] Replace and delete _CLC_DEFINE_UNARY/BINARY/TERNARY_BUILTIN macros (#145458 ) Also delete unused _CLC_DEFINE_BINARY_BUILTIN_WITH_SCALAR_SECOND_ARG, _CLC_DEFINE_UNARY_BUILTIN_FP16 and _CLC_DEFINE_BINARY_BUILTIN_FP16. llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc	2025-06-25 13:48:53 +08:00
Wenju He	de3a9ea510	[NFC][libclc] Simplify clc_dot and dot implementation (#142922 ) llvm-diff shows no change to amdgcn--amdhsa.bc	2025-06-06 08:09:53 +08:00
Fraser Cormack	6306f0fa21	[libclc] Support LLVM_ENABLE_RUNTIMES when building (#141574 ) This commit deprecates the use of LLVM_ENABLE_PROJECTS in favour of LLVM_ENABLE_RUNTIMES when building libclc. Alternatively, using -DLLVM_RUNTIME_TARGETS=<triple> combined with -DRUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES=libclc also gets pretty far but fails due to zlib problems building the LLVM utility 'prepare_builtins'. I'm not sure what's going on there but I don't think it's required at this stage. More work would be required to support that option. This does nothing to change how the host tools are found in order to be used to actually build the libclc libraries. Note that under such a configuration the final libclc builtin libraries are placed in `<build>/runtimes/runtimes-bins/libclc/`, which differs from a non-runtimes build. The installation location remains the same. Fixes #124013.	2025-06-05 17:56:21 +01:00
Fraser Cormack	8c3019ecf4	[libclc] Add (fast) normalize to CLC; add half overloads (#139759 ) For simplicity the half overloads just call into the float versions of the builtin. Otherwise there are no codegen changes to any target.	2025-06-05 09:11:36 +01:00
Romaric Jodin	8f3ccd1674	libclc: clspv: do not set generic_addrspace_val (#141912 ) This is breaking clspv: https://github.com/google/clspv/issues/1493	2025-06-02 10:10:52 +01:00
Wenju He	6e3d668206	[libclc] Move prefetch to clc library (#141721 ) llvm-diff shows no change to amdgcn--amdhsa.bc	2025-05-29 09:11:06 +08:00
Fraser Cormack	b474c3f69e	[libclc] Move vload & vstore to CLC library (#141755 ) This commit moves the various vload and vstore builtins (including vload_half, vloada_half, etc.) to the CLC library. This is almost entirely a code move and does not make any attempt to clean up or optimize the definitions of these builtins. There is no change to any of the targets' builtin libraries, except that the vstore helper rounding functions are now internalized. Cleanups can come in future work. The new CLC declarations and new OpenCL wrappers show how these CLC implementations could be defined more simply. The builtins could probably also be vectorized in future work; right now all of the 'half' versions for both vload and vstore are essentially scalarized.	2025-05-28 16:16:12 +01:00
Fraser Cormack	9fa81a486e	[libclc] Move step to the CLC library; add missing half variants (#140936 ) The half variants were missing but are trivial to implement. There were some incorrect mixed type overloads (step(float, double)) which aren't in the OpenCL specification and so have been removed. Like certain other builtins the CLC step function only deals with identical types. The OpenCL layer is responsible for casting the scalar argument to a vector. This commit also trivially vectorizes the CLC function, generating better bytecode.	2025-05-22 09:54:27 +01:00
Fraser Cormack	94142d9bb0	[libclc] Support the generic address space (#137183 ) This commit provides definitions of builtins with the generic address space. One concept to consider is the difference between supporting the generic address space from the user's perspective and the requirement for libclc as a compiler implementation detail to define separate generic address space builtins. In practice a target (like NVPTX) might notionally support the generic address space, but it's mapped to the same LLVM target address space as another address space (often the private one). In such cases libclc must be careful not to define both private and generic overloads of the same builtin. We track these two concepts separately, and make the assumption that if the generic address space does clash with another, it's with the private one. We track the concepts separately because there are some builtins such as atomics that are defined for the generic address space but not the private address space.	2025-05-21 17:50:00 +01:00
Fraser Cormack	0bc7f41db8	[libclc] Move all remquo address spaces to CLC library (#140871 ) Previously the OpenCL address space overloads of remquo would call into the one and only 'private' CLC remquo. This was an outlier compared with the other pointer-argumented maths builtins. This commit moves the definitions of all address space overloads to the CLC library to give more control over each address space to CLC implementers. There are some minor changes to the generated bytecode but it's simply moving IR instructions around.	2025-05-21 11:26:04 +01:00
Fraser Cormack	80913b44a4	[libclc][NFC] Reuse inc file for OpenCL frexp decl	2025-05-21 10:19:31 +01:00
Wenju He	e70568e28e	[libclc] Re-use shuffle_decl.inc in OpenCL shuffle2 declaration (#140679 ) Also internalize __clc_get_el_* symbols in clc_shuffle2. llvm-diff shows no change to amdgcn--amdhsa.bc.	2025-05-21 09:49:24 +01:00
Fraser Cormack	2fb6ff46f6	[libclc] Fix header inclusion issues For some reason these weren't picked up by pre-commit CI.	2025-05-20 10:19:09 +01:00
Fraser Cormack	32cf55aef3	[libclc] Reorganize OpenCL builtins (#140557 ) This commit moves all OpenCL builtins under a top-level 'opencl' directory, akin to how the CLC builtins are organized. This new structure aims to better convey the separation of the two layers and that 'CLC' is not a subset of OpenCL or a libclc target. In doing so this commit moves the location of the 'lib' directory to match CLC: libclc/generic/lib/ becomes libclc/opencl/lib/generic/. This allows us to remove some special casing in CMake and ensure a common directory structure. It also tries to better communicate that the OpenCL headers are libclc-specific OpenCL headers and should not be confused with or used as standard OpenCL headers. It does so by ensuring includes are of the form <clc/opencl/*>. It might be that we don't specifically need the libclc OpenCL headers and we simply could use clang's built-in declarations, but we can revisit that later. Aside from the code move, there is some code formatting and updating a couple of OpenCL builtin includes to use the readily available gentype helpers. This allows us to remove some '.inc' files.	2025-05-20 09:51:30 +01:00
Fraser Cormack	c27e10fa65	[libclc] Move erf & erfc to CLC library (#140524 ) This completes the set of maths builtins. No attempt was made to vectorize or optimize this code. The implementation is licensed to SunPro so will probably need to be replaced at some point in the future anyway. Calls to other builtins have been replaced with the CLC equivalents, and some bit-hacking was replaced with the fabs builtin.	2025-05-19 11:32:35 +01:00
Wenju He	d779b8f92b	[libclc] Append file_specific_compile_options after ARG_COMPILE_FLAGS (#139871 ) This enables file_specific_compile_options to take precedence over ARG_COMPILE_FLAGS. For example, if we add -fno-slp-vectorize to COMPILE_OPTIONS of a file, the behavior changes as follows: * Before this PR: -fno-slp-vectorize is overwritten by -O3, resulting in SLP vectorizer remaining enabled. * After this PR: -fno-slp-vectorize overwrites -O3, effectively disabling SLP vectorizer.	2025-05-16 10:21:45 +01:00
Wenju He	299a278db1	[libclc] Improving vector code generated from scalar code (#140008 ) The previous method splits the vector data into two halves, and shuffle_vector concatenates the two results into a vector of the original size. This PR eliminates the use of shuffle_vector.	2025-05-16 10:20:32 +01:00
Fraser Cormack	7a4af40896	[libclc] Move cross to CLC library; add missing half overloads (#139713 ) The half overloads are trivially identical to the float and double ones. It didn't seem worth using 'gentype' for the OpenCL layer or CLC declarations so they're just written out explicitly. It does help avoid less trivial repetition in the CLC implementation, though.	2025-05-13 17:07:07 +01:00
Fraser Cormack	95c683fc1b	[libclc] Move logb/ilogb to CLC library; optimize (#128028 ) This commit moves the logb and ilogb builtins to the CLC library. It simultaneously optimizes them both for vector types and for half types. Vector types were being scalarized in some cases. Half types were previously promoting to float, whereas this commit provides them a native implementation. Everything passes the OpenCL-CTS. I had to intuit some magic numbers used by these implementations in order to generate the half variants. I gave them clearer definitions derived from what I believe are their actual component numbers, but named them 'magic' to convey that they weren't derived from first principles.	2025-05-13 11:47:35 +01:00
Fraser Cormack	0e8f0b51ff	[libclc][NFC] Fix return after else	2025-05-13 11:46:26 +01:00
Fraser Cormack	655151a7e0	[libclc] Move (fast) length & distance to CLC library (#139701 ) This commit also refactors how geometric builtins are defined and declared, by sharing more helpers. It also removes an unnecessary gentype-like helper in favour of the more complete math/gentype.inc. There are no changes to the IR for any of these four builtins. The 'normalize' builtin will follow in a subsequent commit because it would involve the addition of missing halfn-type overloads for completeness.	2025-05-13 11:45:55 +01:00
Fraser Cormack	dd89af7f55	[libclc] Move 'half' builtins to CLC library (#139563 ) There are no changes to the generated bytecode.	2025-05-12 17:32:05 +01:00
Fraser Cormack	87978ea272	[libclc] Move tan to the CLC library (#139547 ) There was already a __clc_tan in the OpenCL layer. This commit moves the function over whilst vectorizing it. The function __clc_tan is no longer a public symbol, which should have never been the case.	2025-05-12 14:55:27 +01:00
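
The fmin/fmax entry (586cacdbdd) in the log above says the chosen intrinsics select the non-NaN input over a NaN input. A minimal scalar sketch of that selection rule in plain C follows. The names `sketch_fmin` and `sketch_fmax` are hypothetical and are not libclc functions; the real builtins rely on clang's elementwise builtins and also operate on vectors, and this sketch does not attempt to order -0.0 against +0.0.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical illustration only, not libclc code.
 * Returns the smaller input, preferring a number over a NaN.
 * The result is NaN only when both inputs are NaN. */
float sketch_fmin(float x, float y) {
    if (isnan(x)) return y; /* x is NaN: take y, whatever it is */
    if (isnan(y)) return x; /* y is NaN: take x */
    return x < y ? x : y;   /* signed zeros are not ordered here */
}

/* Same rule for the larger input. */
float sketch_fmax(float x, float y) {
    if (isnan(x)) return y;
    if (isnan(y)) return x;
    return x > y ? x : y;
}

/* Small self-check that exercises the NaN-selection rule. */
void sketch_fminmax_selfcheck(void) {
    assert(sketch_fmin(NAN, 1.0f) == 1.0f);
    assert(sketch_fmax(2.0f, NAN) == 2.0f);
    assert(isnan(sketch_fmin(NAN, NAN)));
}
```

Calling `sketch_fminmax_selfcheck()` from a test harness would exercise the three NaN cases; the point is only to show the selection rule, not how the intrinsics are lowered.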

870 Commits
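
The "Avoid out-of-range float-to-int" entry (46ee7f1908) describes adjusting the value before the conversion so that no out-of-range conversion is ever performed. A rough C analogue of that idea is sketched below, under the assumption of a saturating policy; the helper name `sketch_saturating_to_int` and the choice to map NaN to 0 are illustrative assumptions and do not reflect the actual pown change. In C, converting an out-of-range floating-point value to `int` is undefined behaviour, which loosely parallels the poison result the commit message mentions for LLVM IR.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

/* Hypothetical illustration only, not libclc code.
 * Clamps the input before converting, so the cast below is
 * only ever reached for values that fit in an int. */
int sketch_saturating_to_int(double v) {
    if (isnan(v)) return 0;                    /* assumed policy for NaN */
    if (v >= (double)INT_MAX) return INT_MAX;  /* too large: saturate */
    if (v <= (double)INT_MIN) return INT_MIN;  /* too small: saturate */
    return (int)v;                             /* in range: truncates toward zero */
}

/* Small self-check covering in-range and out-of-range inputs. */
void sketch_saturating_selfcheck(void) {
    assert(sketch_saturating_to_int(3.9) == 3);
    assert(sketch_saturating_to_int(1e30) == INT_MAX);
    assert(sketch_saturating_to_int(-1e30) == INT_MIN);
}
```

The design point is that the range check happens on the floating-point side, so any branch taken afterwards depends only on a well-defined integer.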