1108 Commits

Author SHA1 Message Date
Matt Arsenault
eed1f2749d
libclc: Use special division for atan2 for DAZ (#190248)
The AMDGPU DAZ fdiv works fine in this case, so there's
maybe something better we could do here.
2026-04-02 22:18:17 +02:00
Matt Arsenault
15bc5b0267
libclc: Simplify fract implementation (#189080) 2026-03-28 00:16:07 +01:00
Wenju He
7be9972cb2
[libclc] Fix llvm-spirv dependency when llvm-spirv is built in-tree (#188896)
When SPIRV-LLVM-Translator is built in-tree (i.e., placed in
llvm/projects folder), llvm-spirv target exists.

Drop legacy llvm-spirv_target dependency (was for non-runtime build) and
add llvm-spirv to runtimes dependencies.
2026-03-28 07:06:23 +08:00
Romaric Jodin
e0a1e78738
libclc: do not use int64 in sincos helpers (#188056)
int64 is optional, so we do not want to force its use for clspv.
2026-03-27 18:36:31 +08:00
Matt Arsenault
35781a7d43
libclc: Partially implement nonuniform subgroup reduce functions (#188929)
For AMDGPU these are identical to the uniform case. Stub out the missing
cases with traps to avoid test failures from undefined symbols while
keeping the structure consistent.
2026-03-27 09:47:44 +00:00
Matt Arsenault
56e1510d21
libclc: Add work group scan functions (#188829) 2026-03-27 09:53:35 +01:00
Matt Arsenault
1a32a4185b
libclc: Add subgroup scan functions (#188828)
Add the base implementation using ds_swizzle which should work
on all subtargets. There are at least 2 more paths available for
newer targets.
2026-03-27 09:37:27 +01:00
Matt Arsenault
8430e9e8f0
libclc: Update atan2pi (#188707)
This was originally ported from rocm device libs in
37406a209c75a09f850cd5e5498568d34a6f05d1. Merge in more
recent changes.
2026-03-27 08:59:00 +01:00
Matt Arsenault
8e9e9228d4
libclc: Fix missing overloads for atomic_fetch_add/sub (#188478) 2026-03-27 08:27:36 +01:00
Matt Arsenault
9a51b1eebe
libclc: Implement get_sub_group_id with get_local_linear_id (#188713) 2026-03-26 11:20:45 +01:00
Matt Arsenault
158282b176
libclc: Update atan2 (#188706)
This was originally ported from rocm device libs in
f9caca8b9dd26a9e7a13e5ca8be57100578e3432. Update for more
recent changes.
2026-03-26 11:00:17 +01:00
Matt Arsenault
3aeea10371
libclc: Update erfc (#188570)
This was originally ported from rocm device libs in
2cf4d5f31204c873d76953bfe3c8b5602b29e789. Merge in more
recent changes.
2026-03-26 09:31:06 +01:00
Matt Arsenault
2f11484baa
libclc: Update erf (#188569)
This was originally ported from rocm device libs in
c374cb76f467f01a3f60740703f995a0e1f7a89a. Merge in more
recent changes. Also enables vectorization.
2026-03-26 09:21:49 +01:00
Matt Arsenault
180ae2f2b5
libclc: Use prefetch builtin to implement default prefetch (#188491) 2026-03-25 23:59:21 +01:00
Matt Arsenault
fce6d53850
libclc: Fix directly adding vector booleans (#188540)
Vector compares return -1 instead of 1, so explicitly select a 0 or 1
instead of directly adding the result.
2026-03-25 23:59:06 +01:00
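The vector-boolean pitfall described in the entry above can be reproduced in host C. This is a minimal sketch, not the libclc change itself. It assumes the GCC/Clang `vector_size` extension, whose vector compares also yield -1 per true lane, and the names `sketch_int4` and `count_true_lanes` are hypothetical. It reduces the mask with `& 1`, where the commit describes an explicit select of 0 or 1.

```c
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit lanes (assumed host stand-in
   for an OpenCL int4; the type name is hypothetical). */
typedef int32_t sketch_int4 __attribute__((vector_size(16)));

/* Per lane, count how many of the two comparisons hold.
   A vector compare yields 0 for false and -1 (all bits set) for true,
   so adding the raw results would give 0, -1 or -2. Each mask is
   reduced to 0 or 1 before the addition. */
static sketch_int4 count_true_lanes(sketch_int4 a, sketch_int4 b,
                                    sketch_int4 c) {
    const sketch_int4 one = {1, 1, 1, 1};
    sketch_int4 lt = (a < b) & one; /* 1 where a < b, else 0 */
    sketch_int4 gt = (a > c) & one; /* 1 where a > c, else 0 */
    return lt + gt;
}
```

Adding `(a < b) + (a > c)` directly would produce negated counts, which is the kind of bug the commit message describes.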
Matt Arsenault
e46a3299ac
libclc: Fix amdgpu sub_group_broadcast for double (#188594) 2026-03-25 23:58:40 +01:00
Matt Arsenault
dd57b45522
libclc: Fix amdgpu subgroup reduce for max u64 (#188598) 2026-03-25 22:02:56 +01:00
Romaric Jodin
b492b55b8c
libclc: clspv does not need workaround for flush_if_daz (#188555) 2026-03-25 18:23:18 +00:00
Matt Arsenault
d642f598ba
libclc: Update acospi (#188455)
This was originally ported from rocm device libs in
084124a8fab6fd71d49ac4928d17c3ef8b350ead. Merge in more
recent changes.
2026-03-25 13:51:15 +01:00
Matt Arsenault
0effa7caf1
libclc: Update asinpi (#188454)
This was originally ported from rocm device libs in
eea0997566cad3be13df897a06dfda74cbd684b9. Update for more recent
changes.
2026-03-25 10:27:09 +00:00
Matt Arsenault
ebe2454dc5
libclc: Update atanpi (#188424)
This was originally ported from rocm device libs in
03dc366e79cd01afe0bbfad2a7ede3087d6c9356. Merge in more
recent changes.
2026-03-25 08:49:01 +00:00
Matt Arsenault
0e49be0d98
libclc: Force assuming fast float fma for AMDGPU (#188245)
Currently the build uses the default dummy target, which assumes
FMA is slow. Force this to assume fast fma, which is the case on
any remotely new hardware. In the future if we want better support
for older targets, there should be a separate build of the math
functions for the slow fma case.
2026-03-25 08:39:45 +00:00
Matt Arsenault
c1cd0d57c3
libclc: Unify fast FMA controls (#188244)
This was defined in multiple places with different names. Consolidate
on one, with a gentype wrapper for it. Also set the value based on the
standard FP_FAST_FMA* macros.
2026-03-25 09:24:23 +01:00
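As an illustration of the entry above, one control derived from the standard macro might look like the following. This is a sketch under assumptions: `SKETCH_HAVE_FAST_FMA_F32` and `sketch_mul_add` are hypothetical names, not libclc's, and only the float case is shown. `FP_FAST_FMAF` is the standard C macro that `<math.h>` defines when `fmaf` is about as fast as a separate multiply and add.

```c
#include <math.h>

/* One consolidated control, set from the standard macro.
   The macro name is hypothetical, chosen for this sketch. */
#ifdef FP_FAST_FMAF
#define SKETCH_HAVE_FAST_FMA_F32 1
#else
#define SKETCH_HAVE_FAST_FMA_F32 0
#endif

/* Computes a * b + c, using the fused operation only when the target
   advertises it as fast. */
static float sketch_mul_add(float a, float b, float c) {
#if SKETCH_HAVE_FAST_FMA_F32
    return fmaf(a, b, c); /* single rounding */
#else
    return a * b + c;     /* separate multiply and add, unless the
                             compiler contracts the expression */
#endif
}
```

Callers then test a single name instead of several differently spelled ones.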
Matt Arsenault
46205ff9b4
libclc: Update atanh (#188225) 2026-03-24 13:07:33 +01:00
Matt Arsenault
8b224162fa
libclc: Update acosh (#188224)
This was originally ported from rocm device libs
in ca4d382e119e1389c83dbb07d9ca0085e88b2944. Merge in
more recent changes.

Remove unused ep_log.
2026-03-24 13:07:00 +01:00
Matt Arsenault
80245f3b14
libclc: Update asinh (#188219)
This was originally ported from rocm device libs in
2b4ef39b2f46acf29294f1fbb223ea5a243c2567. Merge in more
recent changes.
2026-03-24 12:20:34 +01:00
Matt Arsenault
f046f4518d
libclc: Update tanh (#188215)
This was originally ported from rocm device libs in
f51df5ba8c4512dbeb7828ac0c34f89177b551d6. Merge in more
recent changes.
2026-03-24 12:01:30 +01:00
Matt Arsenault
aeba5d62e5
libclc: Update cosh (#188214)
This was originally ported from rocm device libs in
9cb070f96a8a9af5f513ffba0a8eed362623f216. Merge in more
recent changes.
2026-03-24 10:58:59 +00:00
Matt Arsenault
d491f70206
libclc: Update sinh (#188213)
This was originally ported from rocm device libs in
9f7172965c650627c020264e9dbdb32d005ce69e. Merge in more
recent changes.
2026-03-24 10:56:05 +00:00
Matt Arsenault
1b255ef5ac
libclc: Update expm1 (#188209)
This was originally ported from rocm device libs in
900bd7eb7f5426ad13f624cbf29716afe376c878. Merge in more
recent changes.
2026-03-24 11:28:34 +01:00
Matt Arsenault
641569de4e
libclc: Improve tgamma handling (#188066) 2026-03-24 10:45:07 +01:00
Matt Arsenault
ff3632cdf3
libclc: Update lgamma_r (#188065)
This was originally ported from rocm device libs in
0ab07e1bde7d002f1a4c30babb6241c0cc366320. Merge
in more recent changes.
2026-03-24 10:43:00 +01:00
Wenju He
b929f2f018
[libclc][NFC] Remove __CLC_BIT_INTN macro (#188023)
This macro was originally introduced in 64735ad63975 for relational
built-ins. It is functionally identical to __CLC_S_GENTYPE. Replacing it
simplifies gentype.inc, which is widely used in the library.
2026-03-24 15:55:10 +08:00
Matt Arsenault
1a5e52168c
libclc: Update log1p (#188186) 2026-03-24 08:41:30 +01:00
Matt Arsenault
32c6a53b76
libclc: Fix -cl-denorms-are-zero for rtp and rtn conversions (#188148) 2026-03-24 07:27:11 +00:00
Matt Arsenault
cba948b2b9
libclc: Update atan (#188095)
This was originally ported from rocm device libs in
47882923c7b48c00d6c0ea9960b5457e957093c4. Merge in more
recent changes.
2026-03-24 08:11:41 +01:00
Matt Arsenault
ff68154480
libclc: Update asin (#188094)
This was originally ported from rocm device libs in
64a8e1b83e14836f97dab4d28dae498e897804e6. Update for more
recent changes.
2026-03-24 08:11:24 +01:00
Matt Arsenault
31aa52086f
libclc: Use nextup and nextdown in place of nextafter (#188141)
Unfortunately it seems the optimizer isn't able to clean this
up, so this is a code quality improvement.
2026-03-24 08:06:01 +01:00
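To illustrate why a fixed-direction step is simpler than nextafter, here is a host C sketch of nextup and nextdown for float using bit manipulation. It is an assumption-laden illustration, not the libclc implementation: the names `sketch_nextup` and `sketch_nextdown` are hypothetical, and IEEE 754 binary32 layout is assumed. Because the direction is known up front, there is no comparison between the two operands of the kind nextafter needs.

```c
#include <stdint.h>
#include <string.h>

/* Next representable float above x (assumes IEEE 754 binary32). */
static float sketch_nextup(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    uint32_t mag = bits & 0x7fffffffu;
    if (mag > 0x7f800000u)        /* NaN: returned unchanged */
        return x;
    if (bits == 0x7f800000u)      /* +inf: nothing above it */
        return x;

    if (mag == 0)                 /* +0 or -0: smallest positive subnormal */
        bits = 1u;
    else if (bits & 0x80000000u)  /* negative: shrink the magnitude */
        bits -= 1u;
    else                          /* positive: grow the magnitude */
        bits += 1u;

    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Next representable float below x, by symmetry. */
static float sketch_nextdown(float x) {
    return -sketch_nextup(-x);
}
```

For example, `sketch_nextup(1.0f)` is the first float above 1, and `sketch_nextdown(1.0f)` is the first float below it.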
Matt Arsenault
ef5658a289
libclc: Simplify rtz conversion (#188137) 2026-03-24 08:05:38 +01:00
Wenju He
99a4d78bef
[libclc][NFC] Simplify __CLC_GENTYPE/__CLC_U_GENTYPE/__CLC_S_GENTYPE define in gentype.inc (#188027)
Reduce macro re-definition overhead.
2026-03-24 08:32:07 +08:00
Matt Arsenault
befad798a9
libclc: Implement remainder with remquo (#187999)
This fixes conformance failures for double and
without -cl-denorms-are-zero. Optimizations are
able to eliminate the unused quo handling without
duplicating most of the code.
2026-03-23 11:08:13 +01:00
Matt Arsenault
1a9fe1769a
libclc: Update remquo (#187998)
This was failing in the float case without -cl-denorms-are-zero
and failing for double. This now passes in all cases.

This was originally ported from rocm device libs in
8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port
of more recent changes, with a few modifications:

- Templatification, which almost but doesn't quite enable
  vectorization yet due to the outer branch and loop.

- Merging of the 3 types into one shared code path, instead of
  duplicating per type with 3 different functions implemented together.
  There are only some slight differences for the half case, which mostly
  evaluates as float.

- Splitting out of the is_odd tracking, instead of deriving it from the
  accumulated quotient. This costs an extra register, but saves several
  instructions. This also enables automatic elimination of all of the quo
  output handling when this code is reused for remainder. I'm guessing
  this would be unnecessary if SimplifyDemandedBits handled phis.

- Removal of the slow FMA path. I don't see how this would ever be
  faster with the number of instructions replacing it. This is really a
  problem for the compiler to solve anyway.
2026-03-23 10:06:59 +00:00
Wenju He
8eccc21e47
[libclc] Replace llvm-dis with llvm-nm in check-external-funcs.test (#187190)
llvm-nm is covered by extra_deps in runtime build when
LLVM_INCLUDE_TESTS is true.
2026-03-21 09:01:44 +08:00
Matt Arsenault
22f5b8db12
libclc: Update acos (#187666)
This was originally ported from rocm device libs in
efeafa1bdaa715733fc100bcd9d21f93c7272368. Merge in more
recent changes.
2026-03-20 12:43:44 +01:00
Matt Arsenault
c8dd82916b
libclc: Override cbrt for AMDGPU (#187560) 2026-03-20 08:34:17 +01:00
Matt Arsenault
edbe8277c1
libclc: Use log intrinsic for half and float cases for amdgpu (#187538)
This is pretty verbose and ugly. We're pulling the base implementation
in for the double cases, and scalarizing it. Also fully defining the
half and float cases to directly use the intrinsic, for all vector
types. It would be much more convenient if we had linker based overrides
for the generic implementations, rather than per source file.
2026-03-20 08:33:31 +01:00
Matt Arsenault
a5de509e4e
libclc: Rewrite log implementation as gentype inc file (#187537)
Follow the ordinary gentype conventions for the log implementation,
instead of using a plain header. This doesn't quite yet enable
vectorization, due to how the table is currently indexed. This should
make it easier for targets to selectively overload the function for
a subset of types.
2026-03-20 08:33:16 +01:00
Matt Arsenault
421bf13e4b
libclc: Update trigpi functions (#187579)
These were originally ported from rocm device
libs in bc81ebefb7d9d9d71d20bfee2ce4cccb09701e9b.
Merge in more recent changes.
2026-03-20 07:24:23 +00:00
Matt Arsenault
7f8e236136
libclc: Implement sin and cos with sincos (#187571)
This eliminates duplicated epilog code. The unused half
optimizes out just fine after inlining.
2026-03-20 08:09:57 +01:00
Matt Arsenault
090c40545f
libclc: Replace flush_if_daz implementation (#187569)
The fallback non-canonicalize path didn't work. Use a more
straightforward implementation. Eventually this should use
the pattern from #172998.
2026-03-20 08:09:16 +01:00