1095 Commits

Author SHA1 Message Date
Matt Arsenault
180ae2f2b5
libclc: Use prefetch builtin to implement default prefetch (#188491) 2026-03-25 23:59:21 +01:00
Matt Arsenault
fce6d53850
libclc: Fix directly adding vector booleans (#188540)
Vector compares return -1 instead of 1, so explicitly select a 0 or 1
instead of directly adding the result.
2026-03-25 23:59:06 +01:00
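The pitfall behind this fix can be reproduced in plain C with the GCC/Clang vector extension, where a lane-wise compare also yields -1 for true. The sketch below is only an illustration under that assumption, not the libclc code; `int4` and `count_less` are names made up for the example.

```c
#include <stdint.h>

/* Four 32-bit lanes via the GCC/Clang vector extension (not OpenCL C). */
typedef int32_t int4 __attribute__((vector_size(16)));

/* Per lane, count how many of the tests (a < b) and (a < c) hold. */
static int4 count_less(int4 a, int4 b, int4 c) {
    const int4 one = {1, 1, 1, 1};
    int4 m1 = a < b; /* each lane is -1 when true, 0 when false */
    int4 m2 = a < c;
    /* m1 + m2 would give 0, -1 or -2, so map each mask to 0 or 1 first. */
    return (m1 & one) + (m2 & one);
}
```

Adding the raw masks would produce a negated count, which is the kind of error the commit message describes.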
Matt Arsenault
e46a3299ac
libclc: Fix amdgpu sub_group_broadcast for double (#188594) 2026-03-25 23:58:40 +01:00
Matt Arsenault
dd57b45522
libclc: Fix amdgpu subgroup reduce for max u64 (#188598) 2026-03-25 22:02:56 +01:00
Romaric Jodin
b492b55b8c
libclc: clspv does not need workaround for flush_if_daz (#188555) 2026-03-25 18:23:18 +00:00
Matt Arsenault
d642f598ba
libclc: Update acospi (#188455)
This was originally ported from rocm device libs in
084124a8fab6fd71d49ac4928d17c3ef8b350ead. Merge in more
recent changes.
2026-03-25 13:51:15 +01:00
Matt Arsenault
0effa7caf1
libclc: Update asinpi (#188454)
This was originally ported from rocm device libs in
eea0997566cad3be13df897a06dfda74cbd684b9. Update for more recent
changes.
2026-03-25 10:27:09 +00:00
Matt Arsenault
ebe2454dc5
libclc: Update atanpi (#188424)
This was originally ported from rocm device libs in
03dc366e79cd01afe0bbfad2a7ede3087d6c9356. Merge in more
recent changes.
2026-03-25 08:49:01 +00:00
Matt Arsenault
0e49be0d98
libclc: Force assuming fast float fma for AMDGPU (#188245)
Currently the build uses the default dummy target, which assumes
FMA is slow. Force this to assume fast fma, which is the case on
any remotely new hardware. In the future if we want better support
for older targets, there should be a separate build of the math
functions for the slow fma case.
2026-03-25 08:39:45 +00:00
Matt Arsenault
c1cd0d57c3
libclc: Unify fast FMA controls (#188244)
This was defined in multiple places with different names. Consolidate
on one, with a gentype wrapper for it. Also set the value based on the
standard FP_FAST_FMA* macros.
2026-03-25 09:24:23 +01:00
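A minimal sketch of deriving a single fast-FMA switch from the standard `<math.h>` macro, in plain C rather than OpenCL C. The names `EXAMPLE_HAVE_FAST_FMA_F32` and `example_mad` are hypothetical and are not the macro or wrapper this commit introduces.

```c
#include <math.h>

/* One project-wide switch derived from the standard <math.h> macro.
   EXAMPLE_HAVE_FAST_FMA_F32 is a hypothetical name, not libclc's. */
#ifdef FP_FAST_FMAF
#define EXAMPLE_HAVE_FAST_FMA_F32 1
#else
#define EXAMPLE_HAVE_FAST_FMA_F32 0
#endif

/* Compute a * b + c, requesting a fused operation only where the
   target reports that FMA is fast. */
static float example_mad(float a, float b, float c) {
#if EXAMPLE_HAVE_FAST_FMA_F32
    return fmaf(a, b, c);
#else
    return a * b + c;
#endif
}
```

Keeping the decision in one macro means every caller sees the same answer, which is the consolidation the message describes.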
Matt Arsenault
46205ff9b4
libclc: Update atanh (#188225) 2026-03-24 13:07:33 +01:00
Matt Arsenault
8b224162fa
libclc: Update acosh (#188224)
This was originally ported from rocm device libs
in ca4d382e119e1389c83dbb07d9ca0085e88b2944. Merge in
more recent changes.

Remove unused ep_log.
2026-03-24 13:07:00 +01:00
Matt Arsenault
80245f3b14
libclc: Update asinh (#188219)
This was originally ported from rocm device libs in
2b4ef39b2f46acf29294f1fbb223ea5a243c2567. Merge in more
recent changes.
2026-03-24 12:20:34 +01:00
Matt Arsenault
f046f4518d
libclc: Update tanh (#188215)
This was originally ported from rocm device libs in
f51df5ba8c4512dbeb7828ac0c34f89177b551d6. Merge in more
recent changes.
2026-03-24 12:01:30 +01:00
Matt Arsenault
aeba5d62e5
libclc: Update cosh (#188214)
This was originally ported from rocm device libs in
9cb070f96a8a9af5f513ffba0a8eed362623f216. Merge in more
recent changes.
2026-03-24 10:58:59 +00:00
Matt Arsenault
d491f70206
libclc: Update sinh (#188213)
This was originally ported from rocm device libs in
9f7172965c650627c020264e9dbdb32d005ce69e. Merge in more
recent changes.
2026-03-24 10:56:05 +00:00
Matt Arsenault
1b255ef5ac
libclc: Update expm1 (#188209)
This was originally ported from rocm device libs in
900bd7eb7f5426ad13f624cbf29716afe376c878. Merge in more
recent changes.
2026-03-24 11:28:34 +01:00
Matt Arsenault
641569de4e
libclc: Improve tgamma handling (#188066) 2026-03-24 10:45:07 +01:00
Matt Arsenault
ff3632cdf3
libclc: Update lgamma_r (#188065)
This was originally ported from rocm device libs in
0ab07e1bde7d002f1a4c30babb6241c0cc366320. Merge
in more recent changes.
2026-03-24 10:43:00 +01:00
Wenju He
b929f2f018
[libclc][NFC] Remove __CLC_BIT_INTN macro (#188023)
This macro was originally introduced in 64735ad63975 for relational
built-ins. It is functionally identical to __CLC_S_GENTYPE. Replacing it
simplifies gentype.inc, which is widely used in the library.
2026-03-24 15:55:10 +08:00
Matt Arsenault
1a5e52168c
libclc: Update log1p (#188186) 2026-03-24 08:41:30 +01:00
Matt Arsenault
32c6a53b76
libclc: Fix -cl-denorms-are-zero for rtp and rtn conversions (#188148) 2026-03-24 07:27:11 +00:00
Matt Arsenault
cba948b2b9
libclc: Update atan (#188095)
This was originally ported from rocm device libs in
47882923c7b48c00d6c0ea9960b5457e957093c4. Merge in more
recent changes.
2026-03-24 08:11:41 +01:00
Matt Arsenault
ff68154480
libclc: Update asin (#188094)
This was originally ported from rocm device libs in
64a8e1b83e14836f97dab4d28dae498e897804e6. Update for more
recent changes.
2026-03-24 08:11:24 +01:00
Matt Arsenault
31aa52086f
libclc: Use nextup and nextdown in place of nextafter (#188141)
Unfortunately it seems the optimizer isn't able to clean this
up, so this is a code quality improvement.
2026-03-24 08:06:01 +01:00
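For context, `nextafter(x, y)` has to compare its operands to pick a direction, while a next-up operation always steps the same way. The following is a rough C sketch of IEEE 754 nextUp on a float bit pattern; `example_nextup` is a hypothetical name, the code ignores any denormal flushing mode, and it is not the libclc implementation.

```c
#include <stdint.h>
#include <string.h>

/* Smallest float greater than x (IEEE 754 nextUp), by stepping the bits. */
static float example_nextup(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if (x != x || bits == 0x7f800000u)
        return x;                      /* NaN and +inf are returned unchanged */
    if ((bits & 0x7fffffffu) == 0)
        bits = 1u;                     /* +0 or -0: smallest positive subnormal */
    else if (bits >> 31)
        bits -= 1u;                    /* negative: step toward zero */
    else
        bits += 1u;                    /* positive: step away from zero */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```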
Matt Arsenault
ef5658a289
libclc: Simplify rtz conversion (#188137) 2026-03-24 08:05:38 +01:00
Wenju He
99a4d78bef
[libclc][NFC] Simplify __CLC_GENTYPE/__CLC_U_GENTYPE/__CLC_S_GENTYPE define in gentype.inc (#188027)
Reduce macro re-definition overhead.
2026-03-24 08:32:07 +08:00
Matt Arsenault
befad798a9
libclc: Implement remainder with remquo (#187999)
This fixes conformance failures for double and
without -cl-denorms-are-zero. Optimizations are
able to eliminate the unused quo handling without
duplicating most of the code.
2026-03-23 11:08:13 +01:00
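The wrapper pattern can be sketched with an integer analogue, which keeps the arithmetic exact and avoids any floating-point detail. Everything here is an assumption made for illustration: `example_iremquo` and `example_iremainder` are hypothetical names, only positive divisors and small operands are handled, and the real libclc code works on floating-point types.

```c
#include <assert.h>

/* Integer analogue of remquo: returns x - q * y, where q is x / y rounded
   to nearest with ties going to the even quotient; q is stored in *quo. */
static int example_iremquo(int x, int y, int *quo) {
    assert(y > 0);                       /* this sketch handles positive y only */
    int q = x / y;                       /* truncated quotient */
    int r = x % y;                       /* remainder, carries the sign of x */
    int twice = 2 * (r < 0 ? -r : r);
    int is_odd = q & 1;                  /* parity decides exact ties */
    if (twice > y || (twice == y && is_odd)) {
        if (x < 0) { q -= 1; r += y; }
        else       { q += 1; r -= y; }
    }
    *quo = q;
    return r;
}

/* remainder is remquo with the quotient thrown away. */
static int example_iremainder(int x, int y) {
    int unused_quo;
    return example_iremquo(x, y, &unused_quo);
}
```

Once the helper is inlined into the wrapper, the store to `unused_quo` is dead, so an optimizer can drop it without a second copy of the algorithm.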
Matt Arsenault
1a9fe1769a
libclc: Update remquo (#187998)
This was failing in the float case without -cl-denorms-are-zero
and failing for double. This now passes in all cases.

This was originally ported from rocm device libs in
8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port
of more recent changes, with a few modifications:

- Templatification, which almost, but not quite, enables
  vectorization; the outer branch and loop still block it.

- Merging of the 3 types into one shared code path, instead of
  duplicating per type with 3 different functions implemented together.
  There are only some slight differences for the half case, which mostly
  evaluates as float.

- Splitting out of the is_odd tracking, instead of deriving it from the
  accumulated quotient. This costs an extra register, but saves several
  instructions. This also enables automatic elimination of all of the quo
  output handling when this code is reused for remainder. I'm guessing
  this would be unnecessary if SimplifyDemandedBits handled phis.

- Removal of the slow FMA path. I don't see how this would ever be
  faster with the number of instructions replacing it. This is really a
  problem for the compiler to solve anyway.
2026-03-23 10:06:59 +00:00
Wenju He
8eccc21e47
[libclc] Replace llvm-dis with llvm-nm in check-external-funcs.test (#187190)
llvm-nm is covered by extra_deps in runtime build when
LLVM_INCLUDE_TESTS is true.
2026-03-21 09:01:44 +08:00
Matt Arsenault
22f5b8db12
libclc: Update acos (#187666)
This was originally ported from rocm device libs in
efeafa1bdaa715733fc100bcd9d21f93c7272368. Merge in more
recent changes.
2026-03-20 12:43:44 +01:00
Matt Arsenault
c8dd82916b
libclc: Override cbrt for AMDGPU (#187560) 2026-03-20 08:34:17 +01:00
Matt Arsenault
edbe8277c1
libclc: Use log intrinsic for half and float cases for amdgpu (#187538)
This is pretty verbose and ugly. We're pulling the base implementation
in for the double cases, and scalarizing it. Also fully defining the
half and float cases to directly use the intrinsic, for all vector
types. It would be much more convenient if we had linker based overrides
for the generic implementations, rather than per source file.
2026-03-20 08:33:31 +01:00
Matt Arsenault
a5de509e4e
libclc: Rewrite log implementation as gentype inc file (#187537)
Follow the ordinary gentype conventions for the log implementation,
instead of using a plain header. This doesn't quite yet enable
vectorization, due to how the table is currently indexed. This should
make it easier for targets to selectively overload the function for
a subset of types.
2026-03-20 08:33:16 +01:00
Matt Arsenault
421bf13e4b
libclc: Update trigpi functions (#187579)
These were originally ported from rocm device
libs in bc81ebefb7d9d9d71d20bfee2ce4cccb09701e9b.
Merge in more recent changes.
2026-03-20 07:24:23 +00:00
Matt Arsenault
7f8e236136
libclc: Implement sin and cos with sincos (#187571)
This eliminates duplicated epilog code. The unused half
optimizes out just fine after inlining.
2026-03-20 08:09:57 +01:00
Matt Arsenault
090c40545f
libclc: Replace flush_if_daz implementation (#187569)
The fallback non-canonicalize path didn't work. Use a more
straightforward implementation. Eventually this should use
the pattern from #172998.
2026-03-20 08:09:16 +01:00
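As a rough picture of what a flush-if-DAZ helper does, here is a C sketch that replaces a subnormal input with a zero of the same sign. It is not the replacement implementation from this commit: `example_flush_if_daz` is a hypothetical name, and the mode is passed as a plain parameter purely for simplicity.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Replace a subnormal input with a zero of the same sign when daz is nonzero. */
static float example_flush_if_daz(float x, int daz) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if (daz && x > -FLT_MIN && x < FLT_MIN)
        bits &= 0x80000000u;           /* keep only the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;                          /* normals, infinities and NaN pass through */
}
```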
Wenju He
366da1252b
[libclc] Restore previous generic fmod implementation (#187470)
Restore from before 3c7f70bb9cee for targets that do not yet implement
frem. Keep the __builtin_elementwise_fmod-based implementation for
AMDGPU.
2026-03-20 07:42:36 +08:00
Matt Arsenault
1f8da27714
libclc: Really implement half trig functions (#187457)
Previously these just cast to float.
2026-03-19 09:06:28 +00:00
Matt Arsenault
1ba5b6e875
libclc: Stop implementing sincos as separate sin and cos (#187456) 2026-03-19 09:52:30 +01:00
Matt Arsenault
6e8ca5edde
libclc: Fix nextafter with -cl-denorms-are-zero (#187358)
Follow the suggested behavior of returning +/-FLT_MIN for logical
zeros.
2026-03-19 09:43:58 +01:00
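The behavior named in this message can be sketched for the one case it concerns: an input that counts as zero once denormals are flushed. The helper below is an assumption-laden illustration in C, with the hypothetical name `example_nextafter_from_logical_zero`; how the real code treats a target `y` that is itself a logical zero is not stated in the message, so returning 0 there is a guess.

```c
#include <assert.h>
#include <float.h>

/* nextafter for an input x that is a "logical zero" under DAZ
   (a true zero or a subnormal): step straight to +/-FLT_MIN. */
static float example_nextafter_from_logical_zero(float x, float y) {
    assert(x > -FLT_MIN && x < FLT_MIN); /* precondition: x flushes to zero */
    if (y != y)
        return y;                        /* propagate NaN */
    if (y >= FLT_MIN)
        return FLT_MIN;                  /* stepping up skips all subnormals */
    if (y <= -FLT_MIN)
        return -FLT_MIN;                 /* stepping down likewise */
    return 0.0f;                         /* assumed: y is also a logical zero */
}
```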
Matt Arsenault
85e9ac5898
libclc: Add canonicalize utility functions (#187357)
This is mostly to work around spirv's canonicalize still
being broken.
2026-03-19 09:43:35 +01:00
Matt Arsenault
9b7c437033
libclc: Update f64 trig functions (#187455)
Most of this was originally ported from rocm
device libs in 2e6ff0c66e180998425776a27579559dc099732f. Merge
in more recent changes.
2026-03-19 08:34:59 +00:00
Matt Arsenault
0960f0b8fe
libclc: Really implement denormal config checks (#187356)
These should be implementable by checking the behavior of
the canonicalize intrinsic. Hack around spirv still failing
on canonicalize by overriding and assuming DAZ for float.
2026-03-19 08:34:43 +00:00
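The idea of probing the denormal mode through an operation's behavior can be imitated in portable C. This sketch multiplies a subnormal by one as a stand-in for the canonicalize intrinsic, which is an assumption of the example; `example_float_denormals_are_zero` is a hypothetical name and the code is not what libclc does.

```c
/* Report whether float subnormals are flushed to zero in this environment,
   by pushing a subnormal through an arithmetic identity. */
static int example_float_denormals_are_zero(void) {
    volatile float tiny = 0x1p-149f;  /* smallest positive subnormal */
    volatile float one = 1.0f;        /* volatile blocks constant folding */
    float probed = tiny * one;        /* stand-in for canonicalize */
    return probed == 0.0f;            /* 1 if the subnormal did not survive */
}
```

The result depends on how the program is compiled and on the floating-point mode in effect when it runs.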
Matt Arsenault
a54c149061
libclc: Invert subnormal checks (#187355)
The base case is correct denormal handling, not flushing. This
also matches the spec controls, which start at IEEE, with
flushing enabled by -cl-denorms-are-zero.

Also fix wrong defaults for half and double. Denormal support is
not optional for these.
2026-03-19 08:25:16 +00:00
Matt Arsenault
bdfd9725af
libclc: Move subnormal config file to clc (#187354) 2026-03-19 08:26:57 +01:00
Matt Arsenault
e3198dbe59
libclc: Move FLT_MIN gentype macros (#187272) 2026-03-19 08:16:52 +01:00
Matt Arsenault
9e6ce65962
libclc: Fix vector float tan (#187387) 2026-03-19 08:16:10 +01:00
Matt Arsenault
b15fa374ff
libclc: Improve float trig function handling (#187264)
Most of this was originally ported from rocm device libs in
c0ab2f81e3ab5c7a4c2e0b812a873c3a7f9dca8b, so merge
in more recent changes.
2026-03-18 13:10:58 +00:00
Matt Arsenault
9b8532dd2a
libclc: Clean up sincos macro usage (#187260)
Handle this more like fract, and implement other
address spaces on top of the private overload with
a temporary variable.
2026-03-18 13:56:58 +01:00