llvm-project

Author	SHA1	Message	Date
Matt Arsenault	eed1f2749d	libclc: Use special division for atan2 for DAZ (#190248 ) The AMDGPU DAZ fdiv works fine in this case, so there's maybe something better we could do here.	2026-04-02 22:18:17 +02:00
Matt Arsenault	15bc5b0267	libclc: Simplify fract implementation (#189080 )	2026-03-28 00:16:07 +01:00
Wenju He	7be9972cb2	[libclc] Fix llvm-spirv dependency when llvm-spirv is built in-tree (#188896 ) When SPIRV-LLVM-Translator is built in-tree (i.e., placed in llvm/projects folder), llvm-spirv target exists. Drop legacy llvm-spirv_target dependency (was for non-runtime build) and add llvm-spirv to runtimes dependencies.	2026-03-28 07:06:23 +08:00
Romaric Jodin	e0a1e78738	libclc: do not use int64 in sincos helpers (#188056 ) int64 is optional, thus we do not want to force its usage for clspv.	2026-03-27 18:36:31 +08:00
Matt Arsenault	35781a7d43	libclc: Partially implement nonuniform subgroup reduce functions (#188929 ) For AMDGPU these are identical to the uniform case. Stub out the missing cases with traps to avoid test failures from undefined symbols while keeping the structure consistent.	2026-03-27 09:47:44 +00:00
Matt Arsenault	56e1510d21	libclc: Add work group scan functions (#188829 )	2026-03-27 09:53:35 +01:00
Matt Arsenault	1a32a4185b	libclc: Add subgroup scan functions (#188828 ) Add the base implementation using ds_swizzle which should work on all subtargets. There are at least 2 more paths available for newer targets.	2026-03-27 09:37:27 +01:00
Matt Arsenault	8430e9e8f0	libclc: Update atan2pi (#188707 ) This was originally ported from rocm device libs in 37406a209c75a09f850cd5e5498568d34a6f05d1. Merge in more recent changes.	2026-03-27 08:59:00 +01:00
Matt Arsenault	8e9e9228d4	libclc: Fix missing overloads for atomic_fetch_add/sub (#188478 )	2026-03-27 08:27:36 +01:00
Matt Arsenault	9a51b1eebe	libclc: Implement get_sub_group_id with get_local_linear_id (#188713 )	2026-03-26 11:20:45 +01:00
Matt Arsenault	158282b176	libclc: Update atan2 (#188706 ) This was originally ported from rocm device libs in f9caca8b9dd26a9e7a13e5ca8be57100578e3432. Update for more recent changes.	2026-03-26 11:00:17 +01:00
Matt Arsenault	3aeea10371	libclc: Update erfc (#188570 ) This was originally ported from rocm device libs in 2cf4d5f31204c873d76953bfe3c8b5602b29e789. Merge in more recent changes.	2026-03-26 09:31:06 +01:00
Matt Arsenault	2f11484baa	libclc: Update erf (#188569 ) This was originally ported from rocm device libs in c374cb76f467f01a3f60740703f995a0e1f7a89a. Merge in more recent changes. Also enables vectorization.	2026-03-26 09:21:49 +01:00
Matt Arsenault	180ae2f2b5	libclc: Use prefetch builtin to implement default prefetch (#188491 )	2026-03-25 23:59:21 +01:00
Matt Arsenault	fce6d53850	libclc: Fix directly adding vector booleans (#188540 ) Vector compares return -1 instead of 1, so explicitly select a 0 or 1 instead of directly adding the result.	2026-03-25 23:59:06 +01:00
Matt Arsenault	e46a3299ac	libclc: Fix amdgpu sub_group_broadcast for double (#188594 )	2026-03-25 23:58:40 +01:00
Matt Arsenault	dd57b45522	libclc: Fix amdgpu subgroup reduce for max u64 (#188598 )	2026-03-25 22:02:56 +01:00
Romaric Jodin	b492b55b8c	libclc: clspv does not need workaround for flush_if_daz (#188555 )	2026-03-25 18:23:18 +00:00
Matt Arsenault	d642f598ba	libclc: Update acospi (#188455 ) This was originally ported from rocm device libs in 084124a8fab6fd71d49ac4928d17c3ef8b350ead. Merge in more recent changes.	2026-03-25 13:51:15 +01:00
Matt Arsenault	0effa7caf1	libclc: Update asinpi (#188454 ) This was originally ported from rocm device libs in eea0997566cad3be13df897a06dfda74cbd684b9. Update for more recent changes.	2026-03-25 10:27:09 +00:00
Matt Arsenault	ebe2454dc5	libclc: Update atanpi (#188424 ) This was originally ported from rocm device libs in 03dc366e79cd01afe0bbfad2a7ede3087d6c9356. Merge in more recent changes.	2026-03-25 08:49:01 +00:00
Matt Arsenault	0e49be0d98	libclc: Force assuming fast float fma for AMDGPU (#188245 ) Currently the build uses the default dummy target, which assumes FMA is slow. Force this to assume fast fma, which is the case on any remotely new hardware. In the future if we want better support for older targets, there should be a separate build of the math functions for the slow fma case.	2026-03-25 08:39:45 +00:00
Matt Arsenault	c1cd0d57c3	libclc: Unify fast FMA controls (#188244 ) This was defined in multiple places with different names. Consolidate on one, with a gentype wrapper for it. Also set the value based on the standard FP_FAST_FMA* macros.	2026-03-25 09:24:23 +01:00
Matt Arsenault	46205ff9b4	libclc: Update atanh (#188225 )	2026-03-24 13:07:33 +01:00
Matt Arsenault	8b224162fa	libclc: Update acosh (#188224 ) This was originally ported from rocm device libs in ca4d382e119e1389c83dbb07d9ca0085e88b2944. Merge in more recent changes. Remove unused ep_log.	2026-03-24 13:07:00 +01:00
Matt Arsenault	80245f3b14	libclc: Update asinh (#188219 ) This was originally ported from rocm device libs in 2b4ef39b2f46acf29294f1fbb223ea5a243c2567. Merge in more recent changes.	2026-03-24 12:20:34 +01:00
Matt Arsenault	f046f4518d	libclc: Update tanh (#188215 ) This was originally ported from rocm device libs in f51df5ba8c4512dbeb7828ac0c34f89177b551d6. Merge in more recent changes.	2026-03-24 12:01:30 +01:00
Matt Arsenault	aeba5d62e5	libclc: Update cosh (#188214 ) This was originally ported from rocm device libs in 9cb070f96a8a9af5f513ffba0a8eed362623f216. Merge in more recent changes.	2026-03-24 10:58:59 +00:00
Matt Arsenault	d491f70206	libclc: Update sinh (#188213 ) This was originally ported from rocm device libs in 9f7172965c650627c020264e9dbdb32d005ce69e. Merge in more recent changes.	2026-03-24 10:56:05 +00:00
Matt Arsenault	1b255ef5ac	libclc: Update expm1 (#188209 ) This was originally ported from rocm device libs in 900bd7eb7f5426ad13f624cbf29716afe376c878. Merge in more recent changes.	2026-03-24 11:28:34 +01:00
Matt Arsenault	641569de4e	libclc: Improve tgamma handling (#188066 )	2026-03-24 10:45:07 +01:00
Matt Arsenault	ff3632cdf3	libclc: Update lgamma_r (#188065 ) This was originally ported from rocm device libs in 0ab07e1bde7d002f1a4c30babb6241c0cc366320. Merge in more recent changes.	2026-03-24 10:43:00 +01:00
Wenju He	b929f2f018	[libclc][NFC] Remove __CLC_BIT_INTN macro (#188023 ) This macro was originally introduced in 64735ad63975 for relational built-ins. It is functionally identical to __CLC_S_GENTYPE. Replacing it simplifies gentype.inc, which is widely used in the library.	2026-03-24 15:55:10 +08:00
Matt Arsenault	1a5e52168c	libclc: Update log1p (#188186 )	2026-03-24 08:41:30 +01:00
Matt Arsenault	32c6a53b76	libclc: Fix -cl-denorms-are-zero for rtp and rtn conversions (#188148 )	2026-03-24 07:27:11 +00:00
Matt Arsenault	cba948b2b9	libclc: Update atan (#188095 ) This was originally ported from rocm device libs in 47882923c7b48c00d6c0ea9960b5457e957093c4. Merge in more recent changes.	2026-03-24 08:11:41 +01:00
Matt Arsenault	ff68154480	libclc: Update asin (#188094 ) This was originally ported from rocm device libs in 64a8e1b83e14836f97dab4d28dae498e897804e6. Update for more recent changes.	2026-03-24 08:11:24 +01:00
Matt Arsenault	31aa52086f	libclc: Use nextup and nextdown in place of nextafter (#188141 ) Unfortunately it seems the optimizer isn't able to clean this up, so this is a code quality improvement.	2026-03-24 08:06:01 +01:00
Matt Arsenault	ef5658a289	libclc: Simplify rtz conversion (#188137 )	2026-03-24 08:05:38 +01:00
Wenju He	99a4d78bef	[libclc][NFC] Simplify __CLC_GENTYPE/__CLC_U_GENTYPE/__CLC_S_GENTYPE define in gentype.inc (#188027 ) Reduce macro re-definition overhead.	2026-03-24 08:32:07 +08:00
Matt Arsenault	befad798a9	libclc: Implement remainder with remquo (#187999) This fixes conformance failures for double and without -cl-denorms-are-zero. Optimizations are able to eliminate the unused quo handling without duplicating most of the code.	2026-03-23 11:08:13 +01:00
Matt Arsenault	1a9fe1769a	libclc: Update remquo (#187998 ) This was failing in the float case without -cl-denorms-are-zero and failing for double. This now passes in all cases. This was originally ported from rocm device libs in 8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port in of more recent changes with a few changes. - Templatification, which almost but doesn't quite enable vectorization yet due to the outer branch and loop. - Merging of the 3 types into one shared code path, instead of duplicating per type with 3 different functions implemented together. There are only some slight differences for the half case, which mostly evaluates as float. - Splitting out of the is_odd tracking, instead of deriving it from the accumulated quotient. This costs an extra register, but saves several instructions. This also enables automatic elimination of all of the quo output handling when this code is reused for remainder. I'm guessing this would be unnecessary if SimplifyDemandedBits handled phis. - Removal of the slow FMA path. I don't see how this would ever be faster with the number of instructions replacing it. This is really a problem for the compiler to solve anyway.	2026-03-23 10:06:59 +00:00
Wenju He	8eccc21e47	[libclc] Replace llvm-dis with llvm-nm in check-external-funcs.test (#187190 ) llvm-nm is covered by extra_deps in runtime build when LLVM_INCLUDE_TESTS is true.	2026-03-21 09:01:44 +08:00
Matt Arsenault	22f5b8db12	libclc: Update acos (#187666 ) This was originally ported from rocm device libs in efeafa1bdaa715733fc100bcd9d21f93c7272368, merge in more recent changes.	2026-03-20 12:43:44 +01:00
Matt Arsenault	c8dd82916b	libclc: Override cbrt for AMDGPU (#187560 )	2026-03-20 08:34:17 +01:00
Matt Arsenault	edbe8277c1	libclc: Use log intrinsic for half and float cases for amdgpu (#187538 ) This is pretty verbose and ugly. We're pulling the base implementation in for the double cases, and scalarizing it. Also fully defining the half and float cases to directly use the intrinsic, for all vector types. It would be much more convenient if we had linker based overrides for the generic implementations, rather than per source file.	2026-03-20 08:33:31 +01:00
Matt Arsenault	a5de509e4e	libclc: Rewrite log implementation as gentype inc file (#187537 ) Follow the ordinary gentype conventions for the log implementation, instead of using a plain header. This doesn't quite yet enable vectorization, due to how the table is currently indexed. This should make it easier for targets to selectively overload the function for a subset of types.	2026-03-20 08:33:16 +01:00
Matt Arsenault	421bf13e4b	libclc: Update trigpi functions (#187579 ) These were originally ported from rocm device libs in bc81ebefb7d9d9d71d20bfee2ce4cccb09701e9b. Merge in more recent changes.	2026-03-20 07:24:23 +00:00
Matt Arsenault	7f8e236136	libclc: Implement sin and cos with sincos (#187571 ) This eliminates duplicated epilog code. The unused half optimizes out just fine after inlining.	2026-03-20 08:09:57 +01:00
Matt Arsenault	090c40545f	libclc: Replace flush_if_daz implementation (#187569 ) The fallback non-canonicalize path didn't work. Use a more straightforward implementation. Eventually this should use the pattern from #172998	2026-03-20 08:09:16 +01:00


1108 Commits