1069 Commits

Author SHA1 Message Date
Wenju He
99a4d78bef
[libclc][NFC] Simplify __CLC_GENTYPE/__CLC_U_GENTYPE/__CLC_S_GENTYPE define in gentype.inc (#188027)
Reduce macro re-definition overhead.
2026-03-24 08:32:07 +08:00
Matt Arsenault
befad798a9
libclc: Implement remainder with remquo (#187999)

This fixes conformance failures for double, and for float
without -cl-denorms-are-zero. Optimizations are
able to eliminate the unused quo handling without
duplicating most of the code.
2026-03-23 11:08:13 +01:00
Matt Arsenault
1a9fe1769a
libclc: Update remquo (#187998)
This was failing in the float case without -cl-denorms-are-zero
and failing for double. This now passes in all cases.

This was originally ported from rocm device libs in
8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port
of more recent changes, with a few modifications:

- Templatification, which almost but doesn't quite enable
  vectorization yet due to the outer branch and loop.

- Merging of the 3 types into one shared code path, instead of
  duplicating per type with 3 different functions implemented together.
  There are only some slight differences for the half case, which mostly
  evaluates as float.

- Splitting out of the is_odd tracking, instead of deriving it from the
  accumulated quotient. This costs an extra register, but saves several
  instructions. This also enables automatic elimination of all of the quo
  output handling when this code is reused for remainder. I'm guessing
  this would be unnecessary if SimplifyDemandedBits handled phis.

- Removal of the slow FMA path. I don't see how this would ever be
  faster with the number of instructions replacing it. This is really a
  problem for the compiler to solve anyway.
2026-03-23 10:06:59 +00:00
Wenju He
8eccc21e47
[libclc] Replace llvm-dis with llvm-nm in check-external-funcs.test (#187190)
llvm-nm is covered by extra_deps in runtime build when
LLVM_INCLUDE_TESTS is true.
2026-03-21 09:01:44 +08:00
Matt Arsenault
22f5b8db12
libclc: Update acos (#187666)
This was originally ported from rocm device libs in
efeafa1bdaa715733fc100bcd9d21f93c7272368, merge in more
recent changes.
2026-03-20 12:43:44 +01:00
Matt Arsenault
c8dd82916b
libclc: Override cbrt for AMDGPU (#187560)
2026-03-20 08:34:17 +01:00
Matt Arsenault
edbe8277c1
libclc: Use log intrinsic for half and float cases for amdgpu (#187538)
This is pretty verbose and ugly. We're pulling the base implementation
in for the double cases, and scalarizing it. Also fully defining the
half and float cases to directly use the intrinsic, for all vector
types. It would be much more convenient if we had linker based overrides
for the generic implementations, rather than per source file.
2026-03-20 08:33:31 +01:00
Matt Arsenault
a5de509e4e
libclc: Rewrite log implementation as gentype inc file (#187537)
Follow the ordinary gentype conventions for the log implementation,
instead of using a plain header. This doesn't quite yet enable
vectorization, due to how the table is currently indexed. This should
make it easier for targets to selectively overload the function for
a subset of types.
2026-03-20 08:33:16 +01:00
Matt Arsenault
421bf13e4b
libclc: Update trigpi functions (#187579)
These were originally ported from rocm device
libs in bc81ebefb7d9d9d71d20bfee2ce4cccb09701e9b.
Merge in more recent changes.
2026-03-20 07:24:23 +00:00
Matt Arsenault
7f8e236136
libclc: Implement sin and cos with sincos (#187571)
This eliminates duplicated epilog code. The unused half
optimizes out just fine after inlining.
2026-03-20 08:09:57 +01:00
Matt Arsenault
090c40545f
libclc: Replace flush_if_daz implementation (#187569)
The fallback non-canonicalize path didn't work. Use a more
straightforward implementation. Eventually this should use
the pattern from #172998
2026-03-20 08:09:16 +01:00
Wenju He
366da1252b
[libclc] Restore previous generic fmod implementation (#187470)
Restore from before 3c7f70bb9cee for targets that do not yet implement
frem. Keep the __builtin_elementwise_fmod-based implementation for
AMDGPU.
2026-03-20 07:42:36 +08:00
Matt Arsenault
1f8da27714
libclc: Really implement half trig functions (#187457)
Previously these just cast to float.
2026-03-19 09:06:28 +00:00
Matt Arsenault
1ba5b6e875
libclc: Stop implementing sincos as separate sin and cos (#187456)
2026-03-19 09:52:30 +01:00
Matt Arsenault
6e8ca5edde
libclc: Fix nextafter with -cl-denorms-are-zero (#187358)
Follow the suggested behavior of returning +/-FLT_MIN for logical
zeros.
2026-03-19 09:43:58 +01:00
Matt Arsenault
85e9ac5898
libclc: Add canonicalize utility functions (#187357)
This is mostly to work around spirv's canonicalize still
being broken.
2026-03-19 09:43:35 +01:00
Matt Arsenault
9b7c437033
libclc: Update f64 trig functions (#187455)
Most of this was originally ported from rocm
device libs in 2e6ff0c66e180998425776a27579559dc099732f. Merge
in more recent changes.
2026-03-19 08:34:59 +00:00
Matt Arsenault
0960f0b8fe
libclc: Really implement denormal config checks (#187356)
These should be implementable by checking the behavior of
the canonicalize intrinsic. Hack around spirv still failing
on canonicalize by overriding and assuming DAZ for float.
2026-03-19 08:34:43 +00:00
Matt Arsenault
a54c149061
libclc: Invert subnormal checks (#187355)
The base case is correct denormal handling, not flushing. This
also matches the spec controls, which start at IEEE and
flushing is enabled with -cl-denorms-are-zero.

Also fix wrong defaults for half and double. Denormal support is
not optional for these.
2026-03-19 08:25:16 +00:00
Matt Arsenault
bdfd9725af
libclc: Move subnormal config file to clc (#187354)
2026-03-19 08:26:57 +01:00
Matt Arsenault
e3198dbe59
libclc: Move FLT_MIN gentype macros (#187272)
2026-03-19 08:16:52 +01:00
Matt Arsenault
9e6ce65962
libclc: Fix vector float tan (#187387)
2026-03-19 08:16:10 +01:00
Matt Arsenault
b15fa374ff
libclc: Improve float trig function handling (#187264)
Most of this was originally ported from rocm device libs in
c0ab2f81e3ab5c7a4c2e0b812a873c3a7f9dca8b, so merge
in more recent changes.
2026-03-18 13:10:58 +00:00
Matt Arsenault
9b8532dd2a
libclc: Clean up sincos macro usage (#187260)
Handle this more like fract, and implement other
address spaces on top of the private overload with
a temporary variable.
2026-03-18 13:56:58 +01:00
Matt Arsenault
2ecd001215
libclc: Use select function instead of ?: for some fp selects (#187253)
It seems that ?: is not quite equivalent to select for floating-point
vectors. With ?:, the resulting IR involves integer bitcasts and
integer vector typed select. Use select so this is an fp-select. This
enables finite math only contexts to optimize out the select.

This feels like it's a clang bug though.
2026-03-18 13:44:35 +01:00
Wenju He
350385e792
[libclc][NFC] Change include style from <...> to "..." (#186537)
Project-specific headers should use "". Keep #include <amdhsa_abi.h>.

llvm-diff shows no change to libclc.bc for spir--, spir64--, nvptx64--,
nvptx64--nvidiacl, nvptx64-nvidia-cuda and amdgcn-amd-amdhsa-llvm when
LIBCLC_TARGETS_TO_BUILD is "all".
Verified that reversing spir64--/libclc.spv and spir--/libclc.spv to
LLVM bitcode shows no diff.

Also fix `__CLC_INTEGER_CLC_BITFIELD_EXTRACT_SIGNED_H__` guard per
copilot review.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-03-18 10:29:54 +08:00
Wenju He
b14eea0b23
[libclc] Fix check-libclc dependency on llvm-dis (#186978)
Add llvm-dis to libclc runtime dependencies.
2026-03-17 18:09:36 +08:00
Matt Arsenault
527496bb10
libclc: Improve large float trig reduction (#186984)
2026-03-17 10:36:19 +01:00
Matt Arsenault
107b113b67
libclc: Use small trig reduction for nan (#186983)
Nan should work on either path, but the small reduction
path is smaller. There are also possible codegen benefits to
knowing the large reduction will not need to handle nans.
2026-03-17 10:35:01 +01:00
Matt Arsenault
a0d6e97142
libclc: Use frexp and ldexp in trig reduction instead of bit hacking (#186982)
2026-03-17 10:30:40 +01:00
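To make the "frexp and ldexp instead of bit hacking" contrast in a0d6e97142 concrete, here is a small host-side C sketch for `float`. The three function names are hypothetical and this is not the reduction code itself; it only shows the two ways of getting at the exponent.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Bit-hacking version: reads the biased exponent field directly.
   Only meaningful for normal, finite, nonzero inputs. */
static int exponent_from_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)((bits >> 23) & 0xffu) - 127;
}

/* frexp version: frexpf returns m in [0.5, 1) with x == m * 2^e, so the
   power-of-two exponent of x is e - 1. This also works for subnormals. */
static int exponent_from_frexp(float x) {
    int e;
    (void)frexpf(x, &e);
    return e - 1;
}

/* Rescale by a power of two without editing the bit pattern by hand. */
static float scale_by_pow2(float x, int n) {
    return ldexpf(x, n);
}
```

Besides handling subnormal inputs, the library-call form leaves the compiler free to pick whatever instruction the target provides for these operations.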
Matt Arsenault
77ba0d9e24
libclc: Update pow functions (#186890)
The 4 flavors of pow were originally ported from rocm
device libs between c45ec604f593fcb03d770f4398142d2446017f68,
cc5c65b2c25e0a82fbad95f0ce3bb5262e29eeee, and
fe8e00bc3c65115b2e3d2a43cf3d0d756a934a52. Update to a newer
version. Additionally expose fast variants for use by the
libcall optimizer (e.g., __pow_fast) for float types.
2026-03-17 10:25:46 +01:00
Matt Arsenault
fae024aca9
libclc: Move edge case handling of trig functions (#186429)
The explicit handling of nan is unnecessary. Clamp infinities
to nan at the input. This allows optimizations of the following
implementation code to take advantage of the knowledge that it
does not need to handle infinities.
2026-03-17 10:16:01 +01:00
Matt Arsenault
19460ff859
libclc: Use fshr builtin in sincos helpers (#186427)
2026-03-17 09:57:08 +01:00
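For readers unfamiliar with the funnel shift that 19460ff859 refers to, this is a portable C sketch of a 32-bit "funnel shift right". The name `fshr32` is hypothetical; the commit itself uses a compiler builtin, which is not reproduced here.

```c
#include <stdint.h>

/* Funnel shift right: treat hi:lo as one 64-bit value, shift it right by
   (shift mod 32), and return the low 32 bits of the result. */
static uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t shift) {
    uint64_t concat = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(concat >> (shift & 31u));
}
```

This is the operation needed when pulling a 32-bit window out of a multi-word constant at an arbitrary bit offset, which is a common step in large-argument trig reduction.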
Matt Arsenault
096371b7e3
libclc: Use struct for ep pair (#186973)
This will enable use with vector types
2026-03-17 09:30:37 +01:00
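A sketch of what returning a pair as a struct can look like, assuming "ep" in 096371b7e3 refers to an extended-precision hi/lo pair (the commit does not spell this out). The type `ep_pair` and the function `ep_add` are hypothetical, and the body is the classic error-free two-sum rather than anything taken from libclc.

```c
/* Hypothetical hi/lo pair: hi is the rounded result, lo the rounding error. */
typedef struct {
    double hi;
    double lo;
} ep_pair;

/* Error-free addition (two-sum): hi + lo equals a + b exactly, barring
   overflow. Assumes strict IEEE evaluation, i.e. no fast-math reassociation. */
static ep_pair ep_add(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);
    ep_pair r = { s, err };
    return r;
}
```

Returning a struct by value, instead of writing the low part through a pointer, is what lets the same source be instantiated for vector element types.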
Wenju He
4abb927bac
[libclc][CMake] Use clang/llvm-ar on Windows (#186726)
When LLVM_TARGETS_TO_BUILD contains host target, runtime build sets
CMAKE_C_COMPILER to clang-cl on Windows.
Changes to fix build on Windows:
- libclc struggles to pass specific flags through clang-cl's MSVC-like interface.
- compile flag handling will be consistent across all host systems.
- libclc build is cross-compilation for offloading targets.
2026-03-17 09:45:52 +08:00
Wenju He
1c04e7fada
[libclc] fix compiler check with --target=spirv64 and -disable-llvm-passes (#185376)
Fix "unknown target triple" errors when LLVM_TARGETS_TO_BUILD is empty.

Adding -disable-llvm-passes reduces this to a very basic sanity check
of Clang frontend. This allows the test to pass even if SPIR-V backend
is not enabled, as the frontend can still generate IR for the target.
2026-03-17 07:59:14 +08:00
Joseph Huber
50f471fc62
[libclc] Add generic clc_mem_fence instruction (#185889)
Summary:
This can be made generic, which works as expected on NVPTX and SPIR-V.
We do not replace this for AMDGPU because the dedicated built-in has an
extra argument that controls whether or not local memory or global
memory will be invalidated. It would be correct to use this generic
operation there, but we'd lose that minor optimization so we likely
should not regress.
2026-03-16 08:15:49 -05:00
Matt Arsenault
524b0b8b84
libclc: Remove attempt at subnormal flush from trig functions (#186424)
2026-03-14 08:29:09 +01:00
Matt Arsenault
df4df088d8
libclc: Disable contract in trig reductions (#186432)
2026-03-14 08:28:40 +01:00
Wenju He
e945f7afbe
[libclc][CMake] Rename opencl to clc in add_libclc_library, update comment (#186544)
Align with cmake function name.
2026-03-14 10:04:19 +08:00
Wenju He
8175bd92ea
[libclc][CMake] Check SOURCES and LIBRARIES arguments are not empty (#186542)
2026-03-14 08:52:32 +08:00
Wenju He
5d3aae962d
[libclc][NFC] Rename three .inc files to avoid name conflicts (#186384)
Follow-up of 9b96ebc. There are binary_def.inc and unary_def.inc in
header directory.
- clc_ep.inc -> clc_ep_decl.inc
- relational/binary_def.inc -> relational/relational_binary_def.inc
- relational/unary_def.inc -> relational/relational_unary_def.inc
2026-03-14 07:44:11 +08:00
Wenju He
9b96ebcba5
[libclc] Rename declaration .inc files to *_decl.inc (#186340)
These .inc files in the header directory have the same names as .inc
files in the implementation directory. Rename them to avoid name conflicts
and to keep the wrong file from being used in the implementation. This fixes
the bitcode change seen when changing `#include <>` to `#include ""`.
2026-03-13 19:33:17 +08:00
Matt Arsenault
ecc4d3edc9
libclc: Fix mismatch in declared and defined function name (#186227)
2026-03-12 20:21:20 +00:00
Wenju He
d352aac32c
[libclc][CMake] Add check-libclc umbrella test target (#186053)
This allows running the full test suite using `ninja check-libclc`.
2026-03-12 19:55:18 +08:00
Matt Arsenault
a372eca60d
libclc: Improve minmag and maxmag (#186092)
Gives slightly better codegen.
2026-03-12 12:24:07 +01:00
Matt Arsenault
85e542fff3
libclc: Improve fdim handling (#186085)
The maxnum is somewhat overconstraining. This gives slightly
better codegen and avoids the noise from the select and convert,
and saves the cost of materializing the nan literal.
2026-03-12 11:52:51 +01:00
Matt Arsenault
ea86511528
libclc: Replace nextafter implementation (#186082)
Use a more straightforward version which allows
optimizations to delete the edge case checks, and also
codegens better. Implement in terms of new nextup and nextdown
helper functions, which are IEEE functions, and usable in other
functions.
2026-03-12 11:52:34 +01:00
Matt Arsenault
3c7f70bb9c
libclc: Replace fmod implementation with elementwise builtin (#186083)
This corresponds to frem, which for whatever reason is a first
class IR instruction. The backend has a heroic freestanding
implementation that should be nearly identical to what was here.
2026-03-12 11:47:39 +01:00
Matt Arsenault
d2c9ebf369
libclc: Update f64 log implementations (#186048)
The log implementation was originally ported from
rocm device libs way back in 44b6117dfde30d6cc292fabca8ecb0cef4657f7a.
Update this to a version derived from the latest. Leaves the float and
half cases alone.
2026-03-12 09:08:09 +00:00