1059 Commits

Author SHA1 Message Date
Matt Arsenault
090c40545f
libclc: Replace flush_if_daz implementation (#187569)
The fallback non-canonicalize path didn't work. Use a more
straightforward implementation. Eventually this should use
the pattern from #172998
2026-03-20 08:09:16 +01:00
Wenju He
366da1252b
[libclc] Restore previous generic fmod implementation (#187470)
Restore from before 3c7f70bb9cee for targets that do not yet implement
frem. Keep the __builtin_elementwise_fmod-based implementation for
AMDGPU.
2026-03-20 07:42:36 +08:00
Matt Arsenault
1f8da27714
libclc: Really implement half trig functions (#187457)
Previously these just cast to float.
2026-03-19 09:06:28 +00:00
Matt Arsenault
1ba5b6e875
libclc: Stop implementing sincos as separate sin and cos (#187456)
2026-03-19 09:52:30 +01:00
Matt Arsenault
6e8ca5edde
libclc: Fix nextafter with -cl-denorms-are-zero (#187358)
Follow the suggested behavior of returning +/-FLT_MIN for logical
zeros.
2026-03-19 09:43:58 +01:00
Matt Arsenault
85e9ac5898
libclc: Add canonicalize utility functions (#187357)
This is mostly to work around spirv's canonicalize still
being broken.
2026-03-19 09:43:35 +01:00
Matt Arsenault
9b7c437033
libclc: Update f64 trig functions (#187455)
Most of this was originally ported from rocm
device libs in 2e6ff0c66e180998425776a27579559dc099732f. Merge
in more recent changes.
2026-03-19 08:34:59 +00:00
Matt Arsenault
0960f0b8fe
libclc: Really implement denormal config checks (#187356)
These should be implementable by checking the behavior of
the canonicalize intrinsic. Hack around spirv still failing
on canonicalize by overriding and assuming DAZ for float.
2026-03-19 08:34:43 +00:00
Matt Arsenault
a54c149061
libclc: Invert subnormal checks (#187355)
The base case is correct denormal handling, not flushing. This
also matches the spec controls, which start at IEEE behavior, with
flushing enabled by -cl-denorms-are-zero.

Also fix wrong defaults for half and double. Denormal support is
not optional for these.
2026-03-19 08:25:16 +00:00
Matt Arsenault
bdfd9725af
libclc: Move subnormal config file to clc (#187354)
2026-03-19 08:26:57 +01:00
Matt Arsenault
e3198dbe59
libclc: Move FLT_MIN gentype macros (#187272)
2026-03-19 08:16:52 +01:00
Matt Arsenault
9e6ce65962
libclc: Fix vector float tan (#187387)
2026-03-19 08:16:10 +01:00
Matt Arsenault
b15fa374ff
libclc: Improve float trig function handling (#187264)
Most of this was originally ported from rocm device libs in
c0ab2f81e3ab5c7a4c2e0b812a873c3a7f9dca8b, so merge
in more recent changes.
2026-03-18 13:10:58 +00:00
Matt Arsenault
9b8532dd2a
libclc: Clean up sincos macro usage (#187260)
Handle this more like fract, and implement other
address spaces on top of the private overload with
a temporary variable.
2026-03-18 13:56:58 +01:00
Matt Arsenault
2ecd001215
libclc: Use select function instead of ?: for some fp selects (#187253)
It seems that ?: is not quite equivalent to select for floating-point
vectors. With ?:, the resulting IR involves integer bitcasts and
integer vector typed select. Use select so this is an fp-select. This
enables finite math only contexts to optimize out the select.

This feels like it's a clang bug though.
2026-03-18 13:44:35 +01:00
Wenju He
350385e792
[libclc][NFC] Change include style from <...> to "..." (#186537)
Project-specific headers should use "". Keep #include <amdhsa_abi.h>.

llvm-diff shows no change to libclc.bc for spir--, spir64--, nvptx64--,
nvptx64--nvidiacl, nvptx64-nvidia-cuda and amdgcn-amd-amdhsa-llvm when
LIBCLC_TARGETS_TO_BUILD is "all".
Verified that reversing spir64--/libclc.spv and spir--/libclc.spv to
LLVM bitcode shows no diff.

Also fix `__CLC_INTEGER_CLC_BITFIELD_EXTRACT_SIGNED_H__` guard per
copilot review.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-03-18 10:29:54 +08:00
Wenju He
b14eea0b23
[libclc] Fix check-libclc dependency on llvm-dis (#186978)
Add llvm-dis to libclc runtime dependencies.
2026-03-17 18:09:36 +08:00
Matt Arsenault
527496bb10
libclc: Improve large float trig reduction (#186984)
2026-03-17 10:36:19 +01:00
Matt Arsenault
107b113b67
libclc: Use small trig reduction for nan (#186983)
NaN should work on either path, but the small reduction
path is smaller. There are also possible codegen benefits to
knowing the large reduction will not need to handle NaNs.
2026-03-17 10:35:01 +01:00
Matt Arsenault
a0d6e97142
libclc: Use frexp and ldexp in trig reduction instead of bit hacking (#186982)
2026-03-17 10:30:40 +01:00
Matt Arsenault
77ba0d9e24
libclc: Update pow functions (#186890)
The 4 flavors of pow were originally ported from rocm
device libs between c45ec604f593fcb03d770f4398142d2446017f68,
cc5c65b2c25e0a82fbad95f0ce3bb5262e29eeee, and
fe8e00bc3c65115b2e3d2a43cf3d0d756a934a52. Update to a newer
version. Additionally expose fast variants for use by the
libcall optimizer (e.g., __pow_fast) for float types.
2026-03-17 10:25:46 +01:00
Matt Arsenault
fae024aca9
libclc: Move edge case handling of trig functions (#186429)
The explicit handling of NaN is unnecessary. Clamp infinities
to NaN at the input. This allows optimizations of the following
implementation code to take advantage of the knowledge that it
does not need to handle infinities.
2026-03-17 10:16:01 +01:00
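A minimal C sketch of the input clamp described in this commit, for illustration only. The helper name `clamp_inf_to_nan` is hypothetical and not taken from libclc; the point is that once infinities are folded into NaN at the entry, the code after it only has to cope with finite values and NaN.

```c
#include <math.h>

/* Hypothetical helper (not a libclc function): map +/-infinity to NaN and
   pass every other value, including NaN, through unchanged. */
static float clamp_inf_to_nan(float x) {
    return isinf(x) ? NAN : x;
}
```

A trig routine would call this once on its argument and then run its reduction and polynomial code without a separate infinity branch, since NaN propagates through the arithmetic on its own.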
Matt Arsenault
19460ff859
libclc: Use fshr builtin in sincos helpers (#186427)
2026-03-17 09:57:08 +01:00
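For readers unfamiliar with the operation, here is a portable C sketch of what a 32-bit funnel shift right computes. The function name `fshr32` is an assumption made for this example; the commit itself uses a compiler builtin, whose exact spelling is not shown in the log.

```c
#include <stdint.h>

/* Funnel shift right: concatenate hi:lo into a 64-bit value, shift it right
   by (shift mod 32), and keep the low 32 bits. */
static uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t shift) {
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31u));
}
```

With a shift of 0 the result is simply `lo`, and with `hi == lo` the operation degenerates to a rotate right.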
Matt Arsenault
096371b7e3
libclc: Use struct for ep pair (#186973)
This will enable use with vector types
2026-03-17 09:30:37 +01:00
Wenju He
4abb927bac
[libclc][CMake] Use clang/llvm-ar on Windows (#186726)
When LLVM_TARGETS_TO_BUILD contains host target, runtime build sets
CMAKE_C_COMPILER to clang-cl on Windows.
Use clang and llvm-ar instead to fix the build on Windows. Rationale:
- libclc struggles to pass specific flags through clang-cl's MSVC-like interface.
- Compile flag handling becomes consistent across all host systems.
- The libclc build is a cross-compilation for offloading targets.
2026-03-17 09:45:52 +08:00
Wenju He
1c04e7fada
[libclc] fix compiler check with --target=spirv64 and -disable-llvm-passes (#185376)
Fix "unknown target triple" errors when LLVM_TARGETS_TO_BUILD is empty.

Adding -disable-llvm-passes reduces this to a very basic sanity check
of the Clang frontend. This allows the test to pass even if the SPIR-V
backend is not enabled, as the frontend can still generate IR for the target.
2026-03-17 07:59:14 +08:00
Joseph Huber
50f471fc62
[libclc] Add generic clc_mem_fence instruction (#185889)
Summary:
This can be made generic, which works as expected on NVPTX and SPIR-V.
We do not replace this for AMDGPU because the dedicated built-in has an
extra argument that controls whether or not local memory or global
memory will be invalidated. It would be correct to use this generic
operation there, but we'd lose that minor optimization so we likely
should not regress.
2026-03-16 08:15:49 -05:00
Matt Arsenault
524b0b8b84
libclc: Remove attempt at subnormal flush from trig functions (#186424)
2026-03-14 08:29:09 +01:00
Matt Arsenault
df4df088d8
libclc: Disable contract in trig reductions (#186432)
2026-03-14 08:28:40 +01:00
Wenju He
e945f7afbe
[libclc][CMake] Rename opencl to clc in add_libclc_library, update comment (#186544)
Align with cmake function name.
2026-03-14 10:04:19 +08:00
Wenju He
8175bd92ea
[libclc][CMake] Check SOURCES and LIBRARIES arguments are not empty (#186542)
2026-03-14 08:52:32 +08:00
Wenju He
5d3aae962d
[libclc][NFC] Rename three .inc files to avoid name conflicts (#186384)
Follow-up to 9b96ebc. There are binary_def.inc and unary_def.inc in
the header directory.
- clc_ep.inc -> clc_ep_decl.inc
- relational/binary_def.inc -> relational/relational_binary_def.inc
- relational/unary_def.inc -> relational/relational_unary_def.inc
2026-03-14 07:44:11 +08:00
Wenju He
9b96ebcba5
[libclc] Rename declaration .inc files to *_decl.inc (#186340)
These .inc files in the header directory have the same names as .inc
files in the implementation directory. Rename them to avoid the name
conflict and to avoid the wrong file being used in the implementation.
This fixes the bitcode change seen when changing `#include <>` to
`#include ""`.
2026-03-13 19:33:17 +08:00
Matt Arsenault
ecc4d3edc9
libclc: Fix mismatch in declared and defined function name (#186227)
2026-03-12 20:21:20 +00:00
Wenju He
d352aac32c
[libclc][CMake] Add check-libclc umbrella test target (#186053)
This allows running the full test suite using `ninja check-libclc`.
2026-03-12 19:55:18 +08:00
Matt Arsenault
a372eca60d
libclc: Improve minmag and maxmag (#186092)
Gives slightly better codegen.
2026-03-12 12:24:07 +01:00
Matt Arsenault
85e542fff3
libclc: Improve fdim handling (#186085)
The maxnum is somewhat overconstraining. This gives slightly
better codegen and avoids the noise from the select and convert,
and saves the cost of materializing the nan literal.
2026-03-12 11:52:51 +01:00
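As a rough illustration of how fdim can be written without a maxnum call, a NaN literal, or an extra select on NaN, consider the C sketch below. It is an assumption about the general shape of such code, not the implementation from this patch; the name `fdim_sketch` is made up for the example.

```c
#include <math.h>

/* Positive difference: x - y when x > y, otherwise +0.
   If either input is NaN, the comparison is false, so x - y is evaluated
   and produces NaN without any explicit NaN handling. */
static float fdim_sketch(float x, float y) {
    return (x <= y) ? 0.0f : x - y;
}
```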
Matt Arsenault
ea86511528
libclc: Replace nextafter implementation (#186082)
Use a more straightforward version which allows
optimizations to delete the edge case checks, and also
codegens better. Implement in terms of new nextup and nextdown
helper functions, which are IEEE functions, and usable in other
functions.
2026-03-12 11:52:34 +01:00
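The structure described here, nextafter layered on nextup and nextdown helpers, can be sketched in host C as follows. All names ending in `_sketch` and the bit-cast helpers are hypothetical; the libclc helpers may differ in detail, and this sketch assumes IEEE binary32 `float` and does not model -cl-denorms-are-zero.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float bits_float(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* Smallest float strictly greater than x (IEEE 754 nextUp).
   NaN and +infinity are returned unchanged. */
static float nextup_sketch(float x) {
    if (isnan(x) || x == INFINITY) return x;
    if (x == 0.0f) return bits_float(1u);   /* smallest positive subnormal */
    uint32_t u = float_bits(x);
    /* Positive values grow with the bit pattern, negative values shrink. */
    return bits_float(x > 0.0f ? u + 1u : u - 1u);
}

/* nextDown is nextUp mirrored around zero. */
static float nextdown_sketch(float x) { return -nextup_sketch(-x); }

static float nextafter_sketch(float x, float y) {
    if (isnan(x) || isnan(y)) return x + y;  /* propagate NaN */
    if (x == y) return y;
    return (x < y) ? nextup_sketch(x) : nextdown_sketch(x);
}
```

Because the direction is chosen by a single comparison, a caller that passes a constant `y` such as infinity leaves the optimizer free to drop the unused helper entirely.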
Matt Arsenault
3c7f70bb9c
libclc: Replace fmod implementation with elementwise builtin (#186083)
This corresponds to frem, which for whatever reason is a first
class IR instruction. The backend has a heroic freestanding
implementation that should be nearly identical to what was here.
2026-03-12 11:47:39 +01:00
Matt Arsenault
d2c9ebf369
libclc: Update f64 log implementations (#186048)
The log implementation was originally ported from
rocm device libs way back in 44b6117dfde30d6cc292fabca8ecb0cef4657f7a.
Update this to a version derived from the latest. Leaves the float and
half cases alone.
2026-03-12 09:08:09 +00:00
Matt Arsenault
285f2debff
libclc: Add ep utility (#186047)
Add utility for compensated arithmetic, which should be used
by a number of the large functions.
2026-03-12 10:05:25 +01:00
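Compensated ("extended precision") arithmetic typically carries a value as an unevaluated sum of a high part and a low part. The C sketch below shows the classic TwoSum building block as one example of the kind of primitive such a utility provides. The type and function names are assumptions for illustration and are not the names used in libclc; the code also assumes `float` operations are rounded to single precision and compiled without fast-math.

```c
/* Hypothetical pair type: the represented value is hi + lo. */
typedef struct {
    float hi;
    float lo;
} ep_pair;

/* Knuth's TwoSum: hi is the rounded sum a + b, and lo is the rounding
   error, so hi + lo equals a + b exactly. */
static ep_pair two_sum(float a, float b) {
    ep_pair r;
    float s = a + b;
    float bb = s - a;                  /* the part of b that made it into s */
    r.hi = s;
    r.lo = (a - (s - bb)) + (b - bb);  /* what was lost to rounding */
    return r;
}
```

For example, adding 2^-30 to 1.0f leaves `hi` at 1.0f, while `lo` retains the 2^-30 that a plain float addition would discard.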
Matt Arsenault
acc2285d18
libclc: Update logb implementation (#185881)
Similar to the previous ilogb change, use a common
bithacking-free implementation.
2026-03-12 06:57:38 +00:00
Matt Arsenault
5365c57e14
libclc: Update ilogb implementation (#185877)
This was originally ported from rocm device libs in
d6d0454231ac489c50465d608ddf3f5d900e1535. Update for
more recent changes that were made there. This avoids
bithacking and improves value tracking. This also allows
using a common code path for all types.
2026-03-12 06:50:02 +00:00
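To show what "avoiding bithacking" can look like, here is a host C sketch of ilogb built on frexp rather than on masking and shifting the exponent field. This is an assumed shape for illustration, not the code from the patch; `ilogb_sketch` is a made-up name, and the special-case return values follow the C library conventions.

```c
#include <limits.h>
#include <math.h>

/* ilogb via frexp: frexpf returns m with 0.5 <= |m| < 1 and sets e such that
   x == m * 2^e, so the unbiased exponent of x is e - 1. Subnormal inputs
   need no special case because frexpf normalizes them. */
static int ilogb_sketch(float x) {
    if (x == 0.0f) return FP_ILOGB0;
    if (isnan(x)) return FP_ILOGBNAN;
    if (isinf(x)) return INT_MAX;
    int e;
    (void)frexpf(x, &e);
    return e - 1;
}
```

The same body works for other floating-point types by swapping in the matching frexp variant, which is in line with the commit's note about a common code path for all types.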
Matt Arsenault
3da14eeacb
libclc: Update hypot implementation (#185873)
This avoids bithacking on the values and improves value
tracking.
2026-03-12 06:43:18 +00:00
Wenju He
aa16859162
[libclc][amdgpu] Add clc qualifier functions with non-const pointer (#186015)
Fix unresolved external functions: _Z15__clc_get_fencePv,
_Z15__clc_to_globalPv, _Z14__clc_to_localPv, _Z16__clc_to_privatePv
2026-03-12 14:08:34 +08:00
Matt Arsenault
7265d89ed6
libclc: Add frexp_exp utility function (#185872)
2026-03-12 06:54:55 +01:00
Michał Górny
eeef27e8a7
Revert "[libclc][NFC] Change include style from <...> to "..."" (#185888)
Reverts llvm/llvm-project#185788. This change is causing test
regressions in libclc, so it's definitely not "NFC", and with its size
it's hard to figure out what exactly went wrong.
2026-03-12 06:08:01 +08:00
Matt Arsenault
dc93e6e3a9
libclc: Add gentype infinity macro (#185864)
2026-03-11 13:09:28 +01:00
Wenju He
c0c8b992ef
[libclc][NFC] Change include style from <...> to "..." (#185788)
Project-specific headers should use "". Keep #include <amdhsa_abi.h>.
2026-03-11 19:04:24 +08:00
Matt Arsenault
9aba26bf47
libclc: Use frexp builtins to implement frexp for amdgpu (#185637)
This should really be the default implementation.
2026-03-11 11:17:39 +01:00