When SPIRV-LLVM-Translator is built in-tree (i.e., placed in the
llvm/projects folder), the llvm-spirv target exists.
Drop the legacy llvm-spirv_target dependency (it was for the non-runtime
build) and add llvm-spirv to the runtimes dependencies.
For AMDGPU these are identical to the uniform case. Stub out the missing
cases with traps to avoid test failures from undefined symbols while
keeping the structure consistent.
This was originally ported from rocm device libs in
c374cb76f467f01a3f60740703f995a0e1f7a89a. Merge in more
recent changes. Also enables vectorization.
Currently the build uses the default dummy target, which assumes
FMA is slow. Force this to assume fast FMA, which is the case on
any remotely new hardware. In the future, if we want better support
for older targets, there should be a separate build of the math
functions for the slow FMA case.
This was defined in multiple places with different names. Consolidate
on one, with a gentype wrapper for it. Also set the value based on the
standard FP_FAST_FMA* macros.
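
A minimal sketch of deriving such a flag from the standard macros, in
plain C. The SKETCH_* names and sketch_mad are hypothetical and are not
the identifiers used in libclc; the gentype wrapper is not shown.

```c
#include <math.h>

/* Hypothetical flags: 1 when the standard macro reports that fma/fmaf
   is at least as fast as a separate multiply and add, else 0. */
#ifdef FP_FAST_FMAF
#define SKETCH_HAVE_FAST_FMA32 1
#else
#define SKETCH_HAVE_FAST_FMA32 0
#endif

#ifdef FP_FAST_FMA
#define SKETCH_HAVE_FAST_FMA64 1
#else
#define SKETCH_HAVE_FAST_FMA64 0
#endif

/* Hypothetical user of the flag: pick the fused form only when it is
   reported as fast for double. */
static double sketch_mad(double a, double b, double c) {
#if SKETCH_HAVE_FAST_FMA64
    return fma(a, b, c);
#else
    return a * b + c;
#endif
}
```

With a single definition like this, code that previously tested
differently named macros can test one flag per type width.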
libclc: Update acosh
This was originally ported from rocm device libs
in ca4d382e119e1389c83dbb07d9ca0085e88b2944. Merge in
more recent changes.
Remove unused ep_log.
This macro was originally introduced in 64735ad63975 for relational
built-ins. It is functionally identical to __CLC_S_GENTYPE. Replacing it
simplifies gentype.inc, which is widely used in the library.
(#187999)
This fixes conformance failures for double, and for float
without -cl-denorms-are-zero. Optimizations are
able to eliminate the unused quo handling, so the code can be
shared without duplicating most of it.
This was failing in the float case without -cl-denorms-are-zero
and failing for double. This now passes in all cases.
This was originally ported from rocm device libs in
8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port
of more recent changes, with a few differences:
- Templatification, which almost, but not quite, enables
vectorization yet due to the outer branch and loop.
- Merging of the 3 types into one shared code path, instead of
duplicating per type with 3 different functions implemented together.
There are only some slight differences for the half case, which mostly
evaluates as float.
- Splitting out of the is_odd tracking, instead of deriving it from the
accumulated quotient. This costs an extra register, but saves several
instructions. This also enables automatic elimination of all of the quo
output handling when this code is reused for remainder. I'm guessing
this would be unnecessary if SimplifyDemandedBits handled phis.
- Removal of the slow FMA path. I don't see how this would ever be
faster with the number of instructions replacing it. This is really a
problem for the compiler to solve anyway.
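
The is_odd point can be illustrated with a simplified integer analogue.
This is only a sketch under stated assumptions: it works on unsigned
integers rather than floating point, all sketch_* names are
hypothetical, and it is not the libclc implementation. The rounding
decision reads is_odd rather than the accumulated quotient, so when the
quotient output is unused the quotient becomes dead code.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared core (hypothetical): remainder of x / y with the quotient
   rounded to nearest, ties to even. Requires y != 0. The quotient is
   stored only when quo is non-NULL. */
static inline int64_t sketch_remquo_core(uint32_t x, uint32_t y,
                                         uint32_t *quo) {
    uint64_t r = x; /* running remainder */
    uint32_t q = 0; /* accumulated quotient, needed only for *quo */
    int is_odd = 0; /* quotient parity, tracked separately from q */

    /* Restoring long division, one quotient bit per iteration. */
    for (int shift = 31; shift >= 0; --shift) {
        uint64_t d = (uint64_t)y << shift;
        int bit = r >= d;
        if (bit)
            r -= d;
        q = (q << 1) | (uint32_t)bit;
        is_odd = bit; /* ends as the lowest quotient bit */
    }

    int64_t rem = (int64_t)r;
    /* Round to nearest; on an exact tie, move to the even quotient. */
    if (2 * r > y || (2 * r == y && is_odd)) {
        rem -= y;
        q += 1;
    }
    if (quo)
        *quo = q;
    return rem;
}

/* Remainder reuses the core and discards the quotient. */
static int64_t sketch_remainder(uint32_t x, uint32_t y) {
    return sketch_remquo_core(x, y, NULL);
}
```

Once sketch_remquo_core is inlined into sketch_remainder, nothing reads
q, so an optimizer is free to drop every update to it.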
This is pretty verbose and ugly. We're pulling the base implementation
in for the double cases, and scalarizing it. Also fully defining the
half and float cases to directly use the intrinsic, for all vector
types. It would be much more convenient if we had linker-based overrides
for the generic implementations, rather than per source file.
Follow the ordinary gentype conventions for the log implementation,
instead of using a plain header. This doesn't quite yet enable
vectorization, due to how the table is currently indexed. This should
make it easier for targets to selectively overload the function for
a subset of types.