llvm-project

Author	SHA1	Message	Date
Wenju He	6e3d668206	[libclc] Move prefetch to clc library (#141721 ) llvm-diff shows no change to amdgcn--amdhsa.bc	2025-05-29 09:11:06 +08:00
Fraser Cormack	b474c3f69e	[libclc] Move vload & vstore to CLC library (#141755 ) This commit moves the various vload and vstore builtins (including vload_half, vloada_half, etc.) to the CLC library. This is almost entirely a code move and does not make any attempt to clean up or optimize the definitions of these builtins. There is no change to any of the targets' builtin libraries, except that the vstore helper rounding functions are now internalized. Cleanups can come in future work. The new CLC declarations and new OpenCL wrappers show how these CLC implementations could be defined more simply. The builtins could probably also be vectorized in future work; right now all of the 'half' versions for both vload and vstore are essentially scalarized.	2025-05-28 16:16:12 +01:00
Fraser Cormack	9fa81a486e	[libclc] Move step to the CLC library; add missing half variants (#140936 ) The half variants were missing but are trivial to implement. There were some incorrect mixed type overloads (step(float, double)) which aren't in the OpenCL specification and so have been removed. Like certain other builtins the CLC step function only deals with identical types. The OpenCL layer is responsible for casting the scalar argument to a vector. This commit also trivially vectorizes the CLC function, generating better bytecode.	2025-05-22 09:54:27 +01:00
Fraser Cormack	94142d9bb0	[libclc] Support the generic address space (#137183 ) This commit provides definitions of builtins with the generic address space. One concept to consider is the difference between supporting the generic address space from the user's perspective and the requirement for libclc as a compiler implementation detail to define separate generic address space builtins. In practice a target (like NVPTX) might notionally support the generic address space, but it's mapped to the same LLVM target address space as another address space (often the private one). In such cases libclc must be careful not to define both private and generic overloads of the same builtin. We track these two concepts separately, and make the assumption that if the generic address space does clash with another, it's with the private one. We track the concepts separately because there are some builtins such as atomics that are defined for the generic address space but not the private address space.	2025-05-21 17:50:00 +01:00
Fraser Cormack	0bc7f41db8	[libclc] Move all remquo address spaces to CLC library (#140871 ) Previously the OpenCL address space overloads of remquo would call into the one and only 'private' CLC remquo. This was an outlier compared with the other pointer-argumented maths builtins. This commit moves the definitions of all address space overloads to the CLC library to give more control over each address space to CLC implementers. There are some minor changes to the generated bytecode but it's simply moving IR instructions around.	2025-05-21 11:26:04 +01:00
Wenju He	e70568e28e	[libclc] Re-use shuffle_decl.inc in OpenCL shuffle2 declaration (#140679 ) Also internalize __clc_get_el_* symbols in clc_shuffle2. llvm-diff shows no change to amdgcn--amdhsa.bc.	2025-05-21 09:49:24 +01:00
Fraser Cormack	2fb6ff46f6	[libclc] Fix header inclusion issues For some reason these weren't picked up by pre-commit CI.	2025-05-20 10:19:09 +01:00
Fraser Cormack	c27e10fa65	[libclc] Move erf & erfc to CLC library (#140524 ) This completes the set of maths builtins. No attempt is made to vectorize or optimize this code. The implementation is licensed to SunPro so will probably need to be replaced at some point in the future anyway. Calls to other builtins have been replaced with the CLC equivalents, and some bit-hacking was replaced with the fabs builtin.	2025-05-19 11:32:35 +01:00
Wenju He	299a278db1	[libclc] Improving vector code generated from scalar code (#140008 ) The previous method splits vector data into two halves. shuffle_vector concatenates the two results into a vector of the original size. This PR eliminates the use of shuffle_vector.	2025-05-16 10:20:32 +01:00
Fraser Cormack	7a4af40896	[libclc] Move cross to CLC library; add missing half overloads (#139713 ) The half overloads are trivially identical to the float and double ones. It didn't seem worth using 'gentype' for the OpenCL layer or CLC declarations so they're just written out explicitly. It does help avoid less trivial repetition in the CLC implementation, though.	2025-05-13 17:07:07 +01:00
Fraser Cormack	95c683fc1b	[libclc] Move logb/ilogb to CLC library; optimize (#128028 ) This commit moves the logb and ilogb builtins to the CLC library. It simultaneously optimizes them both for vector types and for half types. Vector types were being scalarized in some cases. Half types were previously promoting to float, whereas this commit provides them a native implementation. Everything passes the OpenCL-CTS. I had to intuit some magic numbers used by these implementations in order to generate the half variants. I gave them clearer definitions derived from what I believe are their actual component numbers, but named them 'magic' to convey that they weren't derived from first principles.	2025-05-13 11:47:35 +01:00
Fraser Cormack	0e8f0b51ff	[libclc][NFC] Fix return after else	2025-05-13 11:46:26 +01:00
Fraser Cormack	655151a7e0	[libclc] Move (fast) length & distance to CLC library (#139701 ) This commit also refactors how geometric builtins are defined and declared, by sharing more helpers. It also removes an unnecessary gentype-like helper in favour of the more complete math/gentype.inc. There are no changes to the IR for any of these four builtins. The 'normalize' builtin will follow in a subsequent commit because it would involve the addition of missing halfn-type overloads for completeness.	2025-05-13 11:45:55 +01:00
Fraser Cormack	dd89af7f55	[libclc] Move 'half' builtins to CLC library (#139563 ) There are no changes to the generated bytecode.	2025-05-12 17:32:05 +01:00
Fraser Cormack	87978ea272	[libclc] Move tan to the CLC library (#139547 ) There was already a __clc_tan in the OpenCL layer. This commit moves the function over whilst vectorizing it. The function __clc_tan is no longer a public symbol, which should have never been the case.	2025-05-12 14:55:27 +01:00
Fraser Cormack	4f107cd8f8	[libclc] Move sin, cos & sincos to CLC library (#139527 ) This commit moves the remaining FP64 sin and cos helper functions to the CLC library. As a consequence, it formally moves all sin, cos and sincos builtins to the CLC library. Previously, the FP16 and FP32 were nominally there but still in the OpenCL layer while waiting for the FP64 ones. The FP64 builtins are now vectorized as the FP16 and FP32 ones were earlier. One helper table had to be changed. It was previously a table of bytes loaded by each work-item as uint4. Since this doesn't vectorize well, the table was split to load two ulongNs per work-item. While this might not be as efficient on some devices, one mitigating factor is that we were previously loading 48 bytes per work-item in total, but only using 40 of them. With this commit we only load the bytes we need.	2025-05-12 11:32:15 +01:00
Fraser Cormack	c5b750f5af	[libclc] Move log2/log10 tables to CLC tables impl These two tables were being used by the CLC library but their definitions still remained in the OpenCL layer. This worked out after linking the two together but is a layering violation. This had a side effect of removing the two table getters from the final bytecode library, which were never intended to be exposed. These two tables should probably be refactored to allow better vectorization of log/log2/log10, but that is left to future work.	2025-05-01 10:23:28 +01:00
Fraser Cormack	6c4dd8d1d2	[libclc] Move minmag & maxmag to the CLC library (#137982 )	2025-05-01 09:43:40 +01:00
Fraser Cormack	75f040ab3e	[libclc] Clean up unnecessary #undef __CLC_BODYs (#137959 ) This macro is automatically undefined by the various gentype-like helpers.	2025-04-30 16:13:04 +01:00
Fraser Cormack	694a42f018	[libclc] Avoid casting NANs & literals to 'gentype' (#137824 ) By having these already defined as type 'gentype' we can avoid unnecessary casting.	2025-04-29 17:33:21 +01:00
Fraser Cormack	ea688c031e	[libclc] Move fdim to CLC library; simplify (#137811 ) This commit moves the fdim builtin to the CLC library. It simultaneously simplifies the codegen, unifying it between scalar and vector and avoiding bithacking for vector types.	2025-04-29 16:41:07 +01:00
Fraser Cormack	837d5a740f	[libclc][NFC] Remove unary_builtin.inc (#137656 ) We had two ways of achieving the same thing. This commit removes unary_builtin.inc in favour of the approach combining gentype.inc with unary_def.inc. There is no change to the codegen for any target.	2025-04-29 14:17:17 +01:00
Fraser Cormack	78d95cc544	[libclc] Move fract to the CLC library (#137785 ) The builtin was already vectorized so there's no difference to codegen for non-SPIR-V targets.	2025-04-29 13:58:13 +01:00
Fraser Cormack	4609b6a3e7	[libclc] Move fmin & fmax to CLC library (#134218 ) This is an alternative to #128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins. The OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were previously (accidentally, presumably) providing overloads with mixed element types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.	2025-04-29 10:51:24 +01:00
Wenju He	552902455c	[libclc] Add ctz built-in implementation to clc and generic (#135309 )	2025-04-15 15:23:25 +01:00
Fraser Cormack	b0338c3d6c	[libclc] Move shuffle/shuffle2 to the CLC library (#135000 ) This commit moves the shuffle and shuffle2 builtins to the CLC library. In so doing it makes the headers simpler and re-usable for other builtin layers to hook into the CLC functions, if they wish. An additional gentype utility has been made available, which provides a consistent vector-size-or-1 macro for use. The existing __CLC_VECSIZE is defined but empty which is useful in certain applications, such as in concatenation with a type to make a correctly sized scalar or vector type. However, this isn't usable in the same preprocessor lines when wanting to check for specific vector sizes, as e.g., '__CLC_VECSIZE == 2' resolves to '== 2' which is invalid. In local testing this is also useful for the geometric builtins which are only available for scalar types and vector types of 2, 3, or 4 elements. No codegen changes are observed, except the internal shuffle/shuffle2 utility functions are no longer made publicly available.	2025-04-09 15:52:25 +01:00
Fraser Cormack	949bf518fc	[libclc][NFC] Fix up inconsistent copyright headers Some files were accidentally given two copyright headers. Another was missing one. This commit also converts that file's dos line endings to unix ones and reformats a comment.	2025-04-09 12:00:08 +01:00
Romaric Jodin	0e98817458	libclc: frexp: fix implementation regarding denormals (#134823 ) Devices not supporting denormals can compare them true against zero. This leads to results that do not match the CTS expectation, whether or not denormals are supported. For example for 0x1.008p-140 we get {0x1.008p-140, 0} while the CTS expects {0x1.008p-1, -139} when supporting denormals, or {0, 0} when not supporting denormals (flushed to zero). Ref #129871	2025-04-08 14:50:26 +01:00
Romaric Jodin	7baa7edc00	[libclc]: clspv: add a dummy implementation for mul_hi (#134094 ) clspv uses a better implementation that does not use a bigger size when one is not available. Add a dummy implementation for mul_hi to avoid overriding the implementation of clspv with the one in libclc.	2025-04-03 10:18:39 +01:00
Fraser Cormack	ddc48fefe3	[libclc] Move native_(exp10|powr|tan) to CLC library (#134080 ) These are the three remaining native builtins not yet ported. There are elementwise versions of exp10 and tan which correspond to the intrinsics, which may be preferable to the current versions which route through other native builtins. Those could be changed in a follow-up if desired.	2025-04-02 17:37:17 +01:00
Fraser Cormack	f186041553	[libclc] Move sinh, cosh & tanh to the CLC library (#134063 ) This commit also vectorizes the builtins.	2025-04-02 15:22:42 +01:00
Fraser Cormack	d51525ba36	[libclc] Move lgamma, lgamma_r & tgamma to CLC library (#134053 ) Also enable half-precision variants of tgamma, which were previously missing. Note that unlike recent work, these builtins are not vectorized as part of this commit. Ultimately all three call into lgamma_r, which has heavy control flow (including switch statements) that would be difficult to vectorize. Additionally the lgamma_r algorithm is copyrighted to SunPro so may need a rewrite in the future anyway. There are no codegen changes (to non-SPIR-V targets) with this commit, aside from the new half builtins.	2025-04-02 15:20:32 +01:00
Fraser Cormack	dd19e7eaaa	[libclc] Move cbrt to the CLC library; vectorize (#133940 )	2025-04-02 10:18:24 +01:00
Fraser Cormack	f14ff59da7	[libclc] Move exp, exp2 and expm1 to the CLC library (#133932 ) These all share the use of a common helper function so are handled in one go. These builtins are also now vectorized.	2025-04-01 18:15:37 +01:00
Fraser Cormack	bcf0f8d8aa	[libclc] Move exp10 to the CLC library (#133899 ) The builtin was already nominally in the CLC library; this commit just moves it over. It also vectorizes the builtin on its way.	2025-04-01 14:39:17 +01:00
Fraser Cormack	13a313fe58	[libclc] Move sinpi/cospi/tanpi to the CLC library (#133889 ) Additionally, these builtins are now vectorized. This also moves the native_recip and native_divide builtins as they are used by the tanpi builtin.	2025-04-01 12:03:21 +01:00
Fraser Cormack	ad48fffb53	[libclc] Move several 'native' builtins to CLC library (#129679 ) This commit moves the 'native' builtins that use asm statements to generate LLVM intrinsics to the CLC library. In doing so it converts them to use the appropriate elementwise builtin to generate the same intrinsic; there are no codegen changes to any target except to AMDGPU targets where `native_log` is no longer custom implemented and instead uses the clang elementwise builtin. This work forms part of #127196 and indeed with this commit there are no 'generic' builtins using/abusing asm statements - the remaining builtins are specific to the amdgpu and r600 targets.	2025-04-01 09:20:54 +01:00
Fraser Cormack	7a2b160e76	[libclc] Move rootn to the CLC library; optimize (#133735 ) The function was already nominally in the CLC namespace; this commit just moves it over. This commit also vectorizes the builtin to avoid scalarization.	2025-04-01 09:19:50 +01:00
Fraser Cormack	87602f6d03	[libclc] Fix unresolved reference to missing table (#133691 ) Splitting the 'ln_tbl' into two in db98e292 wasn't done thoroughly enough as some references to the old table still remained. This commit fixes the unresolved references by updating to the new split table.	2025-03-31 16:55:23 +01:00
Fraser Cormack	b52977b868	[libclc] Move pow, powr & pown to the CLC library (#133294 ) These functions were already nominally in the CLC library. Similar to others, these builtins are now vectorized and are not broken down into scalar types.	2025-03-28 08:23:24 +00:00
Fraser Cormack	d32e71d7c7	[libclc] Move fmod, remainder & remquo to the CLC library (#132054 ) These functions were already nominally in the CLC namespace; this commit just formally moves them over. Note that 'half' versions of these CLC functions are now provided. Previously the corresponding OpenCL builtins would forward directly to the 'float' versions of the CLC builtins. Now the OpenCL builtins call the 'half' CLC builtins, which themselves call the 'float' CLC versions. This keeps the interface between the OpenCL and CLC libraries neater and keeps the CLC library self-contained. No changes to the generated code for non-SPIR-V targets are observed.	2025-03-27 14:53:19 +00:00
Fraser Cormack	3284559cca	[libclc] Move atan2/atan2pi to the CLC library (#133226 ) As with other work in this area, these builtins are now vectorized. A further table has been split into two. There was discrepancy between comments above the table describing the values as "lead" and "tail" and variables taken from the table called "head" and "tail", so these have been unified as head/tail.	2025-03-27 10:59:09 +00:00
Fraser Cormack	db98e2922f	[libclc] Move log1p/asinh/acosh/atanh to the CLC library (#132956 ) These four functions are all related in that they share tables and helper functions. Furthermore, the acosh and atanh builtins call log1p. As with other work in this area, these builtins are now vectorized. To enable this, there are new table accessor functions which return a vector of table values using a vector of indices. These are internally scalarized, in the absence of gather operations. Some tables which were tables of multiple entries (e.g., double2) are split into two separate "low" and "high" tables. This might affect the performance of memory operations but is hopefully mitigated by better codegen overall.	2025-03-27 09:19:07 +00:00
Fraser Cormack	3013458a79	[libclc] Move asinpi/acospi/atanpi to the CLC library (#132918 ) Similar to d46a6999, this commit simultaneously moves these three functions to the CLC library and optimizes them for vector types by avoiding scalarization.	2025-03-25 13:31:53 +00:00
Fraser Cormack	d46a699953	[libclc] Move asin/acos/atan to the CLC library (#132788 ) This commit simultaneously moves these three functions to the CLC library and optimizes them for vector types by avoiding scalarization.	2025-03-25 09:11:32 +00:00
Fraser Cormack	70c325bf6a	[libclc] Move fp32 sincos helpers to CLC library (#132753 ) This commit moves most of the sincos helper functions to the CLC library. It simultaneously vectorizes them with the aim to increase performance for vector types by avoiding scalarization. Some helpers for double types remain as they use various features not yet ready, like 'fract' which in turn relies on 'fmin'; neither of these are in the CLC library. They also use table lookups and type punning which don't translate well to vector versions. As a proof of concept, float and half versions of the sin and cos builtins are now vectorized and use the CLC helpers to do so. They remain in the OpenCL layer but will be simpler to move to the CLC library when the double versions are ready.	2025-03-24 16:09:31 +00:00
Fraser Cormack	7d048674a4	[libclc] Add license headers to files missing them (#132239 ) This commit bulk updates all '.h', '.cl', '.inc', and '.cpp' files to add any missing license headers. The remaining files are generally CMake, SOURCES, scripts, markdown, etc. There are still some '.ll' files which may benefit from a license header. I can't find an example of an LLVM IR file with a license header in the rest of LLVM, but unlike most other (sub)projects, libclc has examples of LLVM IR as source files, compiled and built into the library.	2025-03-24 10:10:38 +00:00
Fraser Cormack	82912fd620	[libclc] Update license headers (#132070 ) This commit bulk-updates the libclc license headers to the current Apache-2.0 WITH LLVM-exception license in situations where they were previously attributed to AMD - and occasionally under an additional single individual contributor - under an MIT license. AMD signed the LLVM relicensing agreement and so agreed for their past contributions to be covered under the new LLVM license. The LLVM project also has had a long-standing, unwritten, policy of not adding copyright notices to source code. This policy was recently written up [1]. This commit therefore also removes these copyright notices at the same time. Note that there are outstanding copyright notices attributed to others - and many files missing copyright headers - which will be dealt with in future work. [1] https://llvm.org/docs/DeveloperPolicy.html#embedded-copyright-or-contributed-by-statements	2025-03-20 11:40:09 +00:00
Fraser Cormack	760eeac6a2	[libclc] Reduce bithacking in CLC frexp (#129871 ) Also replace some magic constants with named ones. Checking against FP zero and using isnan and isinf functions allows the optimizer to create one unified @llvm.is.fpclass intrinsic. This results in fewer, more canonical IR instructions.	2025-03-05 14:18:51 +00:00
Fraser Cormack	e5d5503e4e	[libclc] Move hypot to CLC library; optimize (#129551 ) This was already nominally in the CLC library; this commit just formally moves it over. It simultaneously optimizes it for vector types by avoiding scalarization.	2025-03-04 14:16:16 +00:00
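
The frexp denormal fix listed above (#134823) quotes the input 0x1.008p-140 and the CTS expectation {0x1.008p-1, -139} on devices that support denormals. The following is a minimal host-side sketch in C of how that pair arises. It is not the libclc implementation: the name `sketch_frexp` is hypothetical, and the code assumes 32-bit IEEE-754 floats on a host that does not flush denormals. It classifies the input by its bit pattern rather than by a floating-point comparison against zero, since the commit message notes that such a comparison can be true for denormals on some devices.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not libclc code.
 * Splits x into a mantissa in [0.5, 1) and a power-of-two exponent.
 * Assumes IEEE-754 binary32 and a host with denormal support. */
static float sketch_frexp(float x, int *exponent) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    uint32_t exp_field = (bits >> 23) & 0xFFu; /* biased exponent */
    uint32_t mantissa = bits & 0x7FFFFFu;      /* fraction bits */
    int adjust = 0;

    /* Zero, infinity and NaN are returned unchanged with exponent 0. */
    if (exp_field == 0xFFu || (exp_field == 0u && mantissa == 0u)) {
        *exponent = 0;
        return x;
    }

    /* Denormal: detected from the bits, then scaled by 2^24 so that it
     * becomes a normal number. The scaling is undone in the exponent. */
    if (exp_field == 0u) {
        x *= 16777216.0f; /* 2^24, exact */
        adjust = -24;
        memcpy(&bits, &x, sizeof bits);
        exp_field = (bits >> 23) & 0xFFu;
    }

    /* A biased exponent of 126 corresponds to a value in [0.5, 1). */
    *exponent = (int)exp_field - 126 + adjust;
    bits = (bits & 0x807FFFFFu) | (126u << 23);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Calling `sketch_frexp(0x1.008p-140f, &e)` returns 0x1.008p-1 and sets `e` to -139, which is the pair the commit message gives for devices that support denormals. On a device that flushes denormals to zero, the scaling multiply would itself see zero, so this sketch does not model the {0, 0} case.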

96 Commits