llvm-project

Author	SHA1	Message	Date
Wenju He	76bb98746b	[NFC][libclc] Add missing __CLC_ prefix to all internal macros (#153523 ) This unifies the naming scheme of macros to address review comment https://github.com/intel/llvm/pull/19779#discussion_r2272194357 Math constant value macros are not changed, e.g. `#define AU0 -9.86494292470009928597e-03`	2025-08-18 07:21:04 +08:00
Wenju He	111cdaac99	[libclc] Add __attribute__((const)) to functions that don't access memory (#152456 ) Before this PR, PostOrderFunctionAttrsPass in an opt run can deduce memory(none) for these functions. This PR explicitly adds the attribute to align with Clang's OpenCL headers and ensures the attribute is present throughout the compilation flow. The generated bitcode files amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc become slightly smaller.	2025-08-12 17:19:08 +08:00
Wenju He	af16fc2e2a	[libclc] Move mem_fence and barrier to clc library (#151446 ) The __clc_mem_fence and __clc_work_group_barrier functions have two parameters, memory_scope and memory_order. The design allows the clc functions to implement SPIR-V ControlBarrier and MemoryBarrier functions in the future. The default memory ordering in clc is set to __ATOMIC_SEQ_CST, which is also the default and strongest ordering in OpenCL and C++. The OpenCL cl_mem_fence_flags parameter is converted to a combination of __MEMORY_SCOPE_DEVICE and __MEMORY_SCOPE_WRKGRP, which is passed to clc. llvm-diff shows no change to nvptx64--nvidiacl.bc. llvm-diff shows a small change to amdgcn--amdhsa.bc and the number of LLVM IR instructions is reduced by 1: https://alive2.llvm.org/ce/z/_Uhqvt	2025-08-06 09:49:28 +08:00
Wenju He	04691aae0d	[libclc] Refine id in async_work_group_copy STRIDED_COPY (#151644 ) Move id first along 0th dimension to achieve coalesced memory access when stride is 1.	2025-08-05 08:00:17 +08:00
Wenju He	9c26f37ce3	[libclc] Add generic implementation of some atomic functions in OpenCL spec section 6.15.12.7 (#146814 ) Add corresponding clc functions, which are implemented with clang __scoped_atomic builtins. OpenCL functions are implemented as wrappers over clc functions. Also change legacy atomic_inc and atomic_dec to re-use the newly added clc_atomic_inc/dec implementations. llvm-diff shows no change to atomic_inc and atomic_dec in bitcode. Notes: * Generic OpenCL built-in functions use __ATOMIC_SEQ_CST and __MEMORY_SCOPE_DEVICE for memory order and memory scope parameters. * OpenCL atomic__explicit, atomic_flag built-ins are not implemented yet. * OpenCL built-ins of atomic_intptr_t, atomic_uintptr_t, atomic_size_t and atomic_ptrdiff_t types are not implemented yet. * llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc since __opencl_c_atomic_order_seq_cst and __opencl_c_atomic_scope_device are not defined in these two targets.	2025-07-18 08:09:14 +08:00
Wenju He	c0294f497d	[libclc] Add generic implementation of bitfield_insert/extract,bit_reverse (#149070 ) The implementation is based on reference implementation in OpenCL-CTS/test_integer_ops. The generic implementations pass OpenCL-CTS/test_integer_ops tests on Intel GPU.	2025-07-18 08:06:29 +08:00
Wenju He	3abecfe9e3	[NFC][libclc] Delete clc/include/clc/relational/floatn.inc (#149252 ) llvm-diff shows no change to amdgcn--amdhsa.bc.	2025-07-18 08:05:07 +08:00
Wenju He	28aa5a64ef	[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc (#144333 ) Changes in this PR: * Declare most of workitem functions in clc and opencl folders. * Call clc workitem function in corresponding OpenCL workitem function. * Move ptx-nvidiacl workitem built-in implementations into clc. * Move a few amdgcn workitem built-in implementations into clc. * Include only needed headers in OpenCL workitem functions. * Implement get_local_linear_id, get_max_sub_group_size, get_num_sub_groups, get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl. llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc. llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.	2025-07-10 08:04:16 +08:00
Fraser Cormack	9d11bd0db8	[libclc] Remove catch-all opencl/clc.h (#147490 ) This commit finishes the work started in #146840 and #147276. It makes each OpenCL header self-contained and each implementation file include only the headers it needs. It removes the need for a catch-all include file of all OpenCL builtin declarations.	2025-07-08 10:37:06 +01:00
Fraser Cormack	b67504c461	[libclc] Tighten OpenCL builtin include strategy (#147276 ) This commit continues the work from #146840 and extends it to the maths, geometric, common, and relational directories. All headers have include guards and, where appropriate, include the minimal code required for their specific definitions. Implementation files no longer include the large catch-all header of all OpenCL builtin declarations.	2025-07-08 09:04:43 +01:00
Wenju He	7cd179612d	[libclc] Fix typo in OpenCL header math/sincos.h (#147244 ) llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc	2025-07-07 17:30:40 +08:00
Fraser Cormack	ea685890b8	[libclc] Reduce include usage in OpenCL builtins (#146840 ) This commit starts the process of reducing the amount of code included by OpenCL builtins, hopefully reducing build times in the process. It introduces a minimal OpenCL header - opencl-base.h - which includes only the OpenCL type definitions and the macros necessary for declaring/defining functions. Where the OpenCL builtin implementations would currently include the whole of <clc/opencl/clc.h>, which defines all OpenCL builtins, now they include only the specific declaration they need. This mirrors how the CLC builtins are defined.	2025-07-07 10:20:28 +01:00
Wenju He	fa9cd47328	[NFC][libclc] Rename __CLC_FUNCTION to either FUNCTION or __IMPL_FUNCTION (#146999 ) Rename to FUNCTION if it is for declaration, since it doesn't make much sense to use __CLC_FUNCTION for OpenCL function declaration. Rename to __IMPL_FUNCTION if it is for definition, since in some cases implementation function isn't clc_* function.	2025-07-07 08:07:51 +08:00
Wenju He	338dee0742	[NFC][libclc] Refactor _CLC_*_VECTORIZE macros to functions in .inc files (#145678 ) With this PR, if we have a customized implementation for scalar or vector length = 2, we don't need to write new macros, e.g. https://github.com/intel/llvm/blob/fb18321705f6/libclc/clc/include/clc/clcmacro.h#L15 Also undefine __HALF_ONLY, __FLOAT_ONLY and __DOUBLE_ONLY at the end of clc/include/clc/math/gentype.inc. llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc.	2025-06-30 17:19:19 +08:00
Wenju He	13a9b86f62	[NFC][libclc] Replace and delete _CLC_DEFINE_UNARY/BINARY/TERNARY_BUILTIN macros (#145458 ) Also delete unused _CLC_DEFINE_BINARY_BUILTIN_WITH_SCALAR_SECOND_ARG, _CLC_DEFINE_UNARY_BUILTIN_FP16 and _CLC_DEFINE_BINARY_BUILTIN_FP16. llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc	2025-06-25 13:48:53 +08:00
Wenju He	de3a9ea510	[NFC][libclc] Simplify clc_dot and dot implementation (#142922 ) llvm-diff shows no change to amdgcn--amdhsa.bc	2025-06-06 08:09:53 +08:00
Fraser Cormack	8c3019ecf4	[libclc] Add (fast) normalize to CLC; add half overloads (#139759 ) For simplicity the half overloads just call into the float versions of the builtin. Otherwise there are no codegen changes to any target.	2025-06-05 09:11:36 +01:00
Wenju He	6e3d668206	[libclc] Move prefetch to clc library (#141721 ) llvm-diff shows no change to amdgcn--amdhsa.bc	2025-05-29 09:11:06 +08:00
Fraser Cormack	b474c3f69e	[libclc] Move vload & vstore to CLC library (#141755 ) This commit moves the various vload and vstore builtins (including vload_half, vloada_half, etc.) to the CLC library. This is almost entirely a code move and does not make any attempt to clean up or optimize the definitions of these builtins. There is no change to any of the targets' builtin libraries, except that the vstore helper rounding functions are now internalized. Cleanups can come in future work. The new CLC declarations and new OpenCL wrappers show how these CLC implementations could be defined more simply. The builtins could probably also be vectorized in future work; right now all of the 'half' versions for both vload and vstore are essentially scalarized.	2025-05-28 16:16:12 +01:00
Fraser Cormack	9fa81a486e	[libclc] Move step to the CLC library; add missing half variants (#140936 ) The half variants were missing but are trivial to implement. There were some incorrect mixed type overloads (step(float, double)) which aren't in the OpenCL specification and so have been removed. Like certain other builtins the CLC step function only deals with identical types. The OpenCL layer is responsible for casting the scalar argument to a vector. This commit also trivially vectorizes the CLC function, generating better bytecode.	2025-05-22 09:54:27 +01:00
Fraser Cormack	94142d9bb0	[libclc] Support the generic address space (#137183 ) This commit provides definitions of builtins with the generic address space. One concept to consider is the difference between supporting the generic address space from the user's perspective and the requirement for libclc as a compiler implementation detail to define separate generic address space builtins. In practice a target (like NVPTX) might notionally support the generic address space, but it's mapped to the same LLVM target address space as another address space (often the private one). In such cases libclc must be careful not to define both private and generic overloads of the same builtin. We track these two concepts separately, and make the assumption that if the generic address space does clash with another, it's with the private one. We track the concepts separately because there are some builtins such as atomics that are defined for the generic address space but not the private address space.	2025-05-21 17:50:00 +01:00
Fraser Cormack	0bc7f41db8	[libclc] Move all remquo address spaces to CLC library (#140871 ) Previously the OpenCL address space overloads of remquo would call into the one and only 'private' CLC remquo. This was an outlier compared with the other pointer-argumented maths builtins. This commit moves the definitions of all address space overloads to the CLC library to give more control over each address space to CLC implementers. There are some minor changes to the generated bytecode but it's simply moving IR instructions around.	2025-05-21 11:26:04 +01:00
Fraser Cormack	80913b44a4	[libclc][NFC] Reuse inc file for OpenCL frexp decl	2025-05-21 10:19:31 +01:00
Wenju He	e70568e28e	[libclc] Re-use shuffle_decl.inc in OpenCL shuffle2 declaration (#140679 ) Also internalize __clc_get_el_* symbols in clc_shuffle2. llvm-diff shows no change to amdgcn--amdhsa.bc.	2025-05-21 09:49:24 +01:00
Fraser Cormack	2fb6ff46f6	[libclc] Fix header inclusion issues For some reason these weren't picked up by pre-commit CI.	2025-05-20 10:19:09 +01:00
Fraser Cormack	32cf55aef3	[libclc] Reorganize OpenCL builtins (#140557 ) This commit moves all OpenCL builtins under a top-level 'opencl' directory, akin to how the CLC builtins are organized. This new structure aims to better convey the separation of the two layers and that 'CLC' is not a subset of OpenCL or a libclc target. In doing so this commit moves the location of the 'lib' directory to match CLC: libclc/generic/lib/ becomes libclc/opencl/lib/generic/. This allows us to remove some special casing in CMake and ensure a common directory structure. It also tries to better communicate that the OpenCL headers are libclc-specific OpenCL headers and should not be confused with or used as standard OpenCL headers. It does so by ensuring includes are of the form <clc/opencl/*>. It might be that we don't specifically need the libclc OpenCL headers and we simply could use clang's built-in declarations, but we can revisit that later. Aside from the code move, there is some code formatting and updating a couple of OpenCL builtin includes to use the readily available gentype helpers. This allows us to remove some '.inc' files.	2025-05-20 09:51:30 +01:00
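
Commit 111cdaac99 in the log above adds __attribute__((const)) to functions that don't access memory. As a rough illustration of what that attribute promises, here is a minimal sketch in plain C rather than OpenCL C. The macro and function names (EXAMPLE_CONST, example_clamp, example_load) are hypothetical and are not taken from libclc; the attribute itself is the GCC/Clang function attribute named in the commit message.

```c
#include <assert.h>

/* Hypothetical macro name; libclc's own spelling is not shown in this log. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_CONST __attribute__((const))
#else
#define EXAMPLE_CONST /* attribute unsupported: expands to nothing */
#endif

/* The result depends only on the argument values: no loads, no stores and
 * no global state. That is the contract the const attribute states, so the
 * compiler may merge or hoist repeated calls with identical arguments. */
EXAMPLE_CONST int example_clamp(int x, int lo, int hi) {
  return x < lo ? lo : (x > hi ? hi : x);
}

/* Counter-example: this function reads memory through its argument, so it
 * must NOT be marked const. */
int example_load(const int *p) { return *p; }

/* Small self-check showing both functions in use. */
void example_self_check(void) {
  int v = 7;
  assert(example_clamp(5, 0, 3) == 3);
  assert(example_load(&v) == 7);
}
```

Stating the attribute in the source, instead of relying on an optimizer pass to deduce it, means the property is visible from the first compilation stage onward, which matches the motivation given in the commit message.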

26 Commits
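
Commit c0294f497d in the log above adds generic implementations of bitfield_insert, bitfield_extract and bit_reverse. The sketch below illustrates the semantics of those operations for 32-bit values in plain C. It is not the libclc code and not the OpenCL-CTS reference the commit mentions; the function names with the 32 suffix and the helper low_mask32 are hypothetical, and the sketch assumes offset + count must not exceed 32.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `count` bits set; count == 32 is handled separately
 * because shifting a 32-bit value by 32 is undefined in C. */
static uint32_t low_mask32(uint32_t count) {
  return count >= 32 ? UINT32_MAX : ((UINT32_C(1) << count) - 1);
}

/* Extract `count` bits starting at bit `offset`, zero-extended. */
uint32_t bitfield_extract_unsigned32(uint32_t base, uint32_t offset,
                                     uint32_t count) {
  assert(offset + count <= 32); /* assumed precondition */
  if (count == 0)
    return 0;
  return (base >> offset) & low_mask32(count);
}

/* Extract `count` bits starting at bit `offset`, sign-extended from the
 * top bit of the field. */
int32_t bitfield_extract_signed32(int32_t base, uint32_t offset,
                                  uint32_t count) {
  assert(offset + count <= 32); /* assumed precondition */
  if (count == 0)
    return 0;
  uint32_t field = ((uint32_t)base >> offset) & low_mask32(count);
  uint32_t sign = UINT32_C(1) << (count - 1);
  /* (field ^ sign) - sign sign-extends using unsigned arithmetic; the final
   * cast relies on the usual two's-complement conversion. */
  return (int32_t)((field ^ sign) - sign);
}

/* Replace bits [offset, offset + count) of `base` with the low `count`
 * bits of `insert`. */
uint32_t bitfield_insert32(uint32_t base, uint32_t insert, uint32_t offset,
                           uint32_t count) {
  assert(offset + count <= 32); /* assumed precondition */
  if (count == 0)
    return base;
  uint32_t mask = low_mask32(count) << offset;
  return (base & ~mask) | ((insert << offset) & mask);
}

/* Reverse the bit order by swapping progressively larger groups. */
uint32_t bit_reverse32(uint32_t x) {
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
  return (x >> 16) | (x << 16);
}
```

For example, bitfield_insert32(0, 0xAB, 4, 8) places 0xAB at bits 4 to 11 and gives 0xAB0, and bit_reverse32(1) moves the lowest bit to the highest position.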