These should be implementable by checking the behavior of
the canonicalize intrinsic. Hack around spirv still failing
on canonicalize by overriding and assuming DAZ for float.
The base case is correct denormal handling, not flushing. This
also matches the spec controls, which starts at IEEE and
flushing is enabled with -cl-denorms-are-zero.
Also fix wrong defaults for half and double. Denormal support is
not optional for these.
It seems that ?: is not quite equivalent to select for floating-point
vectors. With ?:, the resulting IR involves integer bitcasts and
integer vector typed select. Use select so this is an fp-select. This
enables finite math only contexts to optimize out the select.
This feels like it's a clang bug though.
project-specific headers should use "". Keep #include <amdhsa_abi.h>
llvm-diff shows no change to libclc.bc for spir--, spir64--, nvptx64--,
nvptx64--nvidiacl, nvptx64-nvidia-cuda and amdgcn-amd-amdhsa-llvm when
LIBCLC_TARGETS_TO_BUILD is "all".
Verified that reversing spir64--/libclc.spv and spir--/libclc.spv to
LLVM bitcode shows no diff.
Also fix `__CLC_INTEGER_CLC_BITFIELD_EXTRACT_SIGNED_H__` guard per
copilot review.
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Nan should work on either path, but the small reduction
path is smaller. There's also possible codegen benefits to
knowing the large reduction will not need to handle nans.
The 4 flavors of pow were originally ported from rocm
device libs between c45ec604f593fcb03d770f4398142d2446017f68,
cc5c65b2c25e0a82fbad95f0ce3bb5262e29eeee, and
fe8e00bc3c65115b2e3d2a43cf3d0d756a934a52. Update to a newer
version. Additionally expose fast variants for use by the
libcall optimizer (e.g, __pow_fast) for float types.
The explicit handling of nan is unnecessary. Clamp infinities
to nan at the input. This allows optimizations of the following
implementation code to take advantage of the knowledge that it
does not need to handle infinities.
When LLVM_TARGETS_TO_BUILD contains host target, runtime build sets
CMAKE_C_COMPILER to clang-cl on Windows.
Changes to fix build on Windows:
- libclc struggles to pass specific flags to clang-cl MSVC-like interface.
- compile flag handling will be consistent across all host systems.
- libclc build is cross-compilation for offloading targets.
Fix "unknown target triple" errors when LLVM_TARGETS_TO_BUILD is empty.
Adding -disable-llvm-passes reduces this to a very basic sanity check
of Clang frontend. This allows the test to pass even if SPIR-V backend
is not enabled, as the frontend can still generate IR for the target.
Summary:
This can be made generic, which works as expected on NVPTX and SPIR-V.
We do not replace this for AMDGPU because the dedicated built-in has an
extra argument that controls whether or not local memory or global
memory will be invalidated. It would be correct to use this generic
operation there, but we'd lose that minor optimization so we likely
should not regress.
Follow-up of 9b96ebc. There are binary_def.inc and unary_def.inc in
header directory.
- clc_ep.inc -> clc_ep_decl.inc
- relational/binary_def.inc -> relational/relational_binary_def.inc
- relational/unary_def.inc -> relational/relational_unary_def.inc
These .inc files in the header directory have the same name as .inc
files in implementation directory. Rename them to avoid name conflict
and avoid wrong file being used in implementation. This fixes bitcode
change when changing `#include <>` to `#include ""`.
The maxnum is somewhat overconstraining. This gives slightly
better codegen and avoids the noise from the select and convert,
and saves the cost of materializing the nan literal.
Use a more straightforward version which allows
optimizations to delete the edge case checks, and also
codegens better. Implement in terms of new nextup and nextdown
helper functions, which are IEEE functions, and usable in other
functions.
This corresponds to frem, which for whatever reason is a first
class IR instruction. The backend has a heroic freestanding
implementation that should be nearly identical to what was here.
The log implementation was originally ported from
rocm device libs way back in 44b6117dfde30d6cc292fabca8ecb0cef4657f7a.
Update this to a version derived from the latest. Leaves the float and
half cases alone.
This was originally ported from rocm device libs in
d6d0454231ac489c50465d608ddf3f5d900e1535. Update for
more recent changes that were made there. This avoids
bithacking and improves value tracking. This also allows
using a common code path for all types.
Reverts llvm/llvm-project#185788. This change is causing test
regressions in libclc, so it's definitely not "NFC", and with its size
it's hard to figure out what exactly went wrong.