llvm-project

Author	SHA1	Message	Date
Wenju He	99a4d78bef	[libclc][NFC] Simplify __CLC_GENTYPE/__CLC_U_GENTYPE/__CLC_S_GENTYPE define in gentype.inc (#188027 ) Reduce macro re-definition overhead.	2026-03-24 08:32:07 +08:00
Matt Arsenault	befad798a9	libclc: Implement remainder with remquo (#187999) This fixes conformance failures for double and without -cl-denorms-are-zero. Optimizations are able to eliminate the unused quo handling without duplicating most of the code.	2026-03-23 11:08:13 +01:00
Matt Arsenault	1a9fe1769a	libclc: Update remquo (#187998 ) This was failing in the float case without -cl-denorms-are-zero and failing for double. This now passes in all cases. This was originally ported from rocm device libs in 8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port in of more recent changes with a few changes. - Templatification, which almost but doesn't quite enable vectorization yet due to the outer branch and loop. - Merging of the 3 types into one shared code path, instead of duplicating per type with 3 different functions implemented together. There are only some slight differences for the half case, which mostly evaluates as float. - Splitting out of the is_odd tracking, instead of deriving it from the accumulated quotient. This costs an extra register, but saves several instructions. This also enables automatic elimination of all of the quo output handling when this code is reused for remainder. I'm guessing this would be unnecessary if SimplifyDemandedBits handled phis. - Removal of the slow FMA path. I don't see how this would ever be faster with the number of instructions replacing it. This is really a problem for the compiler to solve anyway.	2026-03-23 10:06:59 +00:00
Wenju He	8eccc21e47	[libclc] Replace llvm-dis with llvm-nm in check-external-funcs.test (#187190 ) llvm-nm is covered by extra_deps in runtime build when LLVM_INCLUDE_TESTS is true.	2026-03-21 09:01:44 +08:00
Matt Arsenault	22f5b8db12	libclc: Update acos (#187666 ) This was originally ported from rocm device libs in efeafa1bdaa715733fc100bcd9d21f93c7272368, merge in more recent changes.	2026-03-20 12:43:44 +01:00
Matt Arsenault	c8dd82916b	libclc: Override cbrt for AMDGPU (#187560 )	2026-03-20 08:34:17 +01:00
Matt Arsenault	edbe8277c1	libclc: Use log intrinsic for half and float cases for amdgpu (#187538 ) This is pretty verbose and ugly. We're pulling the base implementation in for the double cases, and scalarizing it. Also fully defining the half and float cases to directly use the intrinsic, for all vector types. It would be much more convenient if we had linker based overrides for the generic implementations, rather than per source file.	2026-03-20 08:33:31 +01:00
Matt Arsenault	a5de509e4e	libclc: Rewrite log implementation as gentype inc file (#187537 ) Follow the ordinary gentype conventions for the log implementation, instead of using a plain header. This doesn't quite yet enable vectorization, due to how the table is currently indexed. This should make it easier for targets to selectively overload the function for a subset of types.	2026-03-20 08:33:16 +01:00
Matt Arsenault	421bf13e4b	libclc: Update trigpi functions (#187579 ) These were originally ported from rocm device libs in bc81ebefb7d9d9d71d20bfee2ce4cccb09701e9b. Merge in more recent changes.	2026-03-20 07:24:23 +00:00
Matt Arsenault	7f8e236136	libclc: Implement sin and cos with sincos (#187571 ) This eliminates duplicated epilog code. The unused half optimizes out just fine after inlining.	2026-03-20 08:09:57 +01:00
Matt Arsenault	090c40545f	libclc: Replace flush_if_daz implementation (#187569 ) The fallback non-canonicalize path didn't work. Use a more straightforward implementation. Eventually this should use the pattern from #172998	2026-03-20 08:09:16 +01:00
Wenju He	366da1252b	[libclc] Restore previous generic fmod implementation (#187470 ) Restore from before 3c7f70bb9cee for targets that do not yet implement frem. Keep the __builtin_elementwise_fmod-based implementation for AMDGPU.	2026-03-20 07:42:36 +08:00
Matt Arsenault	1f8da27714	libclc: Really implement half trig functions (#187457 ) Previously these just cast to float.	2026-03-19 09:06:28 +00:00
Matt Arsenault	1ba5b6e875	libclc: Stop implementing sincos as separate sin and cos (#187456 )	2026-03-19 09:52:30 +01:00
Matt Arsenault	6e8ca5edde	libclc: Fix nextafter with -cl-denorms-are-zero (#187358 ) Follow the suggested behavior of returning +/-FLT_MIN for logical zeros.	2026-03-19 09:43:58 +01:00
Matt Arsenault	85e9ac5898	libclc: Add canonicalize utility functions (#187357 ) This is mostly to work around spirv's canonicalize still being broken.	2026-03-19 09:43:35 +01:00
Matt Arsenault	9b7c437033	libclc: Update f64 trig functions (#187455 ) Most of this was originally ported from rocm device libs in 2e6ff0c66e180998425776a27579559dc099732f. Merge in more recent changes.	2026-03-19 08:34:59 +00:00
Matt Arsenault	0960f0b8fe	libclc: Really implement denormal config checks (#187356 ) These should be implementable by checking the behavior of the canonicalize intrinsic. Hack around spirv still failing on canonicalize by overriding and assuming DAZ for float.	2026-03-19 08:34:43 +00:00
Matt Arsenault	a54c149061	libclc: Invert subnormal checks (#187355 ) The base case is correct denormal handling, not flushing. This also matches the spec controls, which starts at IEEE and flushing is enabled with -cl-denorms-are-zero. Also fix wrong defaults for half and double. Denormal support is not optional for these.	2026-03-19 08:25:16 +00:00
Matt Arsenault	bdfd9725af	libclc: Move subnormal config file to clc (#187354 )	2026-03-19 08:26:57 +01:00
Matt Arsenault	e3198dbe59	libclc: Move FLT_MIN gentype macros (#187272 )	2026-03-19 08:16:52 +01:00
Matt Arsenault	9e6ce65962	libclc: Fix vector float tan (#187387 )	2026-03-19 08:16:10 +01:00
Matt Arsenault	b15fa374ff	libclc: Improve float trig function handling (#187264 ) Most of this was originally ported from rocm device libs in c0ab2f81e3ab5c7a4c2e0b812a873c3a7f9dca8b, so merge in more recent changes.	2026-03-18 13:10:58 +00:00
Matt Arsenault	9b8532dd2a	libclc: Clean up sincos macro usage (#187260 ) Handle this more like fract, and implement other address spaces on top of the private overload with a temporary variable.	2026-03-18 13:56:58 +01:00
Matt Arsenault	2ecd001215	libclc: Use select function instead of ?: for some fp selects (#187253 ) It seems that ?: is not quite equivalent to select for floating-point vectors. With ?:, the resulting IR involves integer bitcasts and integer vector typed select. Use select so this is an fp-select. This enables finite math only contexts to optimize out the select. This feels like it's a clang bug though.	2026-03-18 13:44:35 +01:00
Wenju He	350385e792	[libclc][NFC] Change include style from <...> to "..." (#186537 ) project-specific headers should use "". Keep #include <amdhsa_abi.h> llvm-diff shows no change to libclc.bc for spir--, spir64--, nvptx64--, nvptx64--nvidiacl, nvptx64-nvidia-cuda and amdgcn-amd-amdhsa-llvm when LIBCLC_TARGETS_TO_BUILD is "all". Verified that reversing spir64--/libclc.spv and spir--/libclc.spv to LLVM bitcode shows no diff. Also fix `__CLC_INTEGER_CLC_BITFIELD_EXTRACT_SIGNED_H__` guard per copilot review. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-18 10:29:54 +08:00
Wenju He	b14eea0b23	[libclc] Fix check-libclc dependency on llvm-dis (#186978 ) Add llvm-dis to libclc runtime dependencies.	2026-03-17 18:09:36 +08:00
Matt Arsenault	527496bb10	libclc: Improve large float trig reduction (#186984 )	2026-03-17 10:36:19 +01:00
Matt Arsenault	107b113b67	libclc: Use small trig reduction for nan (#186983 ) Nan should work on either path, but the small reduction path is smaller. There are also possible codegen benefits to knowing the large reduction will not need to handle nans.	2026-03-17 10:35:01 +01:00
Matt Arsenault	a0d6e97142	libclc: Use frexp and ldexp in trig reduction instead of bit hacking (#186982 )	2026-03-17 10:30:40 +01:00
Matt Arsenault	77ba0d9e24	libclc: Update pow functions (#186890 ) The 4 flavors of pow were originally ported from rocm device libs between c45ec604f593fcb03d770f4398142d2446017f68, cc5c65b2c25e0a82fbad95f0ce3bb5262e29eeee, and fe8e00bc3c65115b2e3d2a43cf3d0d756a934a52. Update to a newer version. Additionally expose fast variants for use by the libcall optimizer (e.g, __pow_fast) for float types.	2026-03-17 10:25:46 +01:00
Matt Arsenault	fae024aca9	libclc: Move edge case handling of trig functions (#186429 ) The explicit handling of nan is unnecessary. Clamp infinities to nan at the input. This allows optimizations of the following implementation code to take advantage of the knowledge that it does not need to handle infinities.	2026-03-17 10:16:01 +01:00
Matt Arsenault	19460ff859	libclc: Use fshr builtin in sincos helpers (#186427 )	2026-03-17 09:57:08 +01:00
Matt Arsenault	096371b7e3	libclc: Use struct for ep pair (#186973 ) This will enable use with vector types	2026-03-17 09:30:37 +01:00
Wenju He	4abb927bac	[libclc][CMake] Use clang/llvm-ar on Windows (#186726 ) When LLVM_TARGETS_TO_BUILD contains host target, runtime build sets CMAKE_C_COMPILER to clang-cl on Windows. Changes to fix build on Windows: - libclc struggles to pass specific flags to clang-cl MSVC-like interface. - compile flag handling will be consistent across all host systems. - libclc build is cross-compilation for offloading targets.	2026-03-17 09:45:52 +08:00
Wenju He	1c04e7fada	[libclc] fix compiler check with --target=spirv64 and -disable-llvm-passes (#185376 ) Fix "unknown target triple" errors when LLVM_TARGETS_TO_BUILD is empty. Adding -disable-llvm-passes reduces this to a very basic sanity check of Clang frontend. This allows the test to pass even if SPIR-V backend is not enabled, as the frontend can still generate IR for the target.	2026-03-17 07:59:14 +08:00
Joseph Huber	50f471fc62	[libclc] Add generic clc_mem_fence instruction (#185889 ) Summary: This can be made generic, which works as expected on NVPTX and SPIR-V. We do not replace this for AMDGPU because the dedicated built-in has an extra argument that controls whether or not local memory or global memory will be invalidated. It would be correct to use this generic operation there, but we'd lose that minor optimization so we likely should not regress.	2026-03-16 08:15:49 -05:00
Matt Arsenault	524b0b8b84	libclc: Remove attempt at subnormal flush from trig functions (#186424 )	2026-03-14 08:29:09 +01:00
Matt Arsenault	df4df088d8	libclc: Disable contract in trig reductions (#186432 )	2026-03-14 08:28:40 +01:00
Wenju He	e945f7afbe	[libclc][CMake] Rename opencl to clc in add_libclc_library, update comment (#186544 ) Align with cmake function name.	2026-03-14 10:04:19 +08:00
Wenju He	8175bd92ea	[libclc][CMake] Check SOURCES and LIBRARIES arguments are not empty (#186542 )	2026-03-14 08:52:32 +08:00
Wenju He	5d3aae962d	[libclc][NFC] Rename three .inc files to avoid name conflicts (#186384 ) Follow-up of 9b96ebc. There are binary_def.inc and unary_def.inc in header directory. - clc_ep.inc -> clc_ep_decl.inc - relational/binary_def.inc -> relational/relational_binary_def.inc - relational/unary_def.inc -> relational/relational_unary_def.inc	2026-03-14 07:44:11 +08:00
Wenju He	9b96ebcba5	[libclc] Rename declaration .inc files to *_decl.inc (#186340 ) These .inc files in the header directory have the same name as .inc files in implementation directory. Rename them to avoid name conflict and avoid wrong file being used in implementation. This fixes bitcode change when changing `#include <>` to `#include ""`.	2026-03-13 19:33:17 +08:00
Matt Arsenault	ecc4d3edc9	libclc: Fix mismatch in declared and defined function name (#186227 )	2026-03-12 20:21:20 +00:00
Wenju He	d352aac32c	[libclc][CMake] Add check-libclc umbrella test target (#186053 ) This allows running the full test suite using `ninja check-libclc`.	2026-03-12 19:55:18 +08:00
Matt Arsenault	a372eca60d	libclc: Improve minmag and maxmag (#186092 ) Gives slightly better codegen.	2026-03-12 12:24:07 +01:00
Matt Arsenault	85e542fff3	libclc: Improve fdim handling (#186085 ) The maxnum is somewhat overconstraining. This gives slightly better codegen and avoids the noise from the select and convert, and saves the cost of materializing the nan literal.	2026-03-12 11:52:51 +01:00
Matt Arsenault	ea86511528	libclc: Replace nextafter implementation (#186082 ) Use a more straightforward version which allows optimizations to delete the edge case checks, and also codegens better. Implement in terms of new nextup and nextdown helper functions, which are IEEE functions, and usable in other functions.	2026-03-12 11:52:34 +01:00
Matt Arsenault	3c7f70bb9c	libclc: Replace fmod implementation with elementwise builtin (#186083 ) This corresponds to frem, which for whatever reason is a first class IR instruction. The backend has a heroic freestanding implementation that should be nearly identical to what was here.	2026-03-12 11:47:39 +01:00
Matt Arsenault	d2c9ebf369	libclc: Update f64 log implementations (#186048 ) The log implementation was originally ported from rocm device libs way back in 44b6117dfde30d6cc292fabca8ecb0cef4657f7a. Update this to a version derived from the latest. Leaves the float and half cases alone.	2026-03-12 09:08:09 +00:00
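
Commit ea86511528 describes implementing nextafter in terms of IEEE nextup and nextdown helpers. The following is a minimal C sketch of that decomposition for float only, not the libclc source. It assumes IEEE 754 binary32 with subnormals supported (the default, without -cl-denorms-are-zero); it does not model the +/-FLT_MIN behavior for logical zeros that commit 6e8ca5edde adopts under -cl-denorms-are-zero. The names `sketch_nextup`, `sketch_nextdown`, `sketch_nextafter`, `as_uint` and `as_float` are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Bit reinterpretation via memcpy avoids strict-aliasing problems. */
static uint32_t as_uint(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float as_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical sketch of IEEE nextUp: the smallest float greater than x. */
static float sketch_nextup(float x) {
  if (isnan(x)) return x + x;          /* propagate NaN */
  if (x == INFINITY) return x;         /* +inf has no successor */
  if (x == 0.0f) return as_float(1u);  /* +0 or -0 -> smallest positive subnormal */
  uint32_t u = as_uint(x);
  /* Positive x: growing the magnitude moves toward +inf.
     Negative x (including -inf): shrinking the magnitude moves toward +inf. */
  u = (x > 0.0f) ? u + 1u : u - 1u;
  return as_float(u);
}

/* nextDown is nextUp mirrored through negation. */
static float sketch_nextdown(float x) { return -sketch_nextup(-x); }

/* nextafter expressed with the two one-directional helpers. */
static float sketch_nextafter(float x, float y) {
  if (isnan(x) || isnan(y)) return x + y;
  if (x == y) return y;  /* C99 semantics: returns y, e.g. nextafter(0, -0) is -0 */
  return (y > x) ? sketch_nextup(x) : sketch_nextdown(x);
}
```

Keeping the zero, infinity and NaN edge cases inside `sketch_nextup` means `sketch_nextafter` reduces to a comparison and one helper call; for example, `sketch_nextafter(1.0f, 2.0f)` yields `1.0f + 0x1p-23f`.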


1069 Commits