148 Commits

Author SHA1 Message Date
Jan Vesely
a95db14461 half_rsqrt: Cleanup implementation
Passes CTS on carrizo
v2: Use full precision implementation

Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 322887
2018-01-18 21:11:35 +00:00
Jan Vesely
fe8e00bc3c rootn: Port from amd_builtins
Passes piglit on turks and carrizo
fp64 passes CTS on carrizo

v2: fix formatting
    check fp32 denormal support at runtime

Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 322763
2018-01-17 21:22:14 +00:00
Jan Vesely
c45ec604f5 powr: Port from amd_builtins
Passes piglit on turks and carrizo
fp64 passes cts on carrizo

v2: fix formatting
    check fp32 denormal support at runtime

Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 322762
2018-01-17 21:22:06 +00:00
Jan Vesely
5efc8fe321 pown: Port from amd_builtins
Passes piglit on turks and carrizo
fp64 passes CTS on carrizo

v2: fix formatting
    check fp32 denormal support at runtime

Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 322761
2018-01-17 21:22:03 +00:00
Jan Vesely
cc5c65b2c2 pow: Port from amd_builtins
Passes piglit on turks and carrizo
fp64 passes CTS on carrizo

v2: fix formatting
    check fp32 denormal support at runtime

Reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 322760
2018-01-17 21:21:35 +00:00
Jan Vesely
fe7c045753 math: Implement minmag
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318265
2017-11-15 04:10:39 +00:00
Jan Vesely
7ba243cc3d math: Implement maxmag
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318264
2017-11-15 04:10:37 +00:00
Jan Vesely
383fbd050c native_powr: Switch implementation to native_exp2 and native_log2
v2: don't use assume
    check only for x<0, the other conditions are handled transparently
v3: don't check inputs at all, nan propagation works as expected

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318204
2017-11-14 21:55:41 +00:00
Jan Vesely
f38b40daf7 native_divide: provide function implementation instead of macro
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318067
2017-11-13 18:28:56 +00:00
Jan Vesely
1b9825f982 native_recip: provide function implementation instead of macro
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318066
2017-11-13 18:28:53 +00:00
Jan Vesely
a6758c94ef native_rsqrt: Switch implementation to 1 / native_sqrt
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318065
2017-11-13 18:28:51 +00:00
Jan Vesely
541a3f0758 native_tan: Switch implementation to use native_sin/native_cos
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318064
2017-11-13 18:28:48 +00:00
Jan Vesely
79b7566210 math: Use precomputed constant for log2(10.0)
exp10 CTS fails with or without this change

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 318063
2017-11-13 18:28:45 +00:00
Jan Vesely
6b4a625438 native_exp10: Switch implementation to llvm intrinsic
v2: Use native_log2 instead of wrong constant

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317941
2017-11-10 22:16:41 +00:00
Jan Vesely
4301e6d0c9 native_sqrt: Switch implementation to llvm intrinsic
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317940
2017-11-10 22:16:39 +00:00
Jan Vesely
1f34c851e0 native_sin: Switch implementation to llvm intrinsic
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317939
2017-11-10 22:16:36 +00:00
Jan Vesely
0750b7df51 native_cos: Switch implementation to llvm intrinsic
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317938
2017-11-10 22:16:33 +00:00
Jan Vesely
edbde58de0 native_exp2: Switch implementation to llvm intrinsic
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317937
2017-11-10 22:16:31 +00:00
Jan Vesely
504f85c551 native_exp: Switch implementation to llvm intrinsic
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317936
2017-11-10 22:16:28 +00:00
Jan Vesely
adc1eaedf8 native_log10: Switch to generic native intrinsic inc file
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317934
2017-11-10 22:16:22 +00:00
Jan Vesely
086e796053 native_log: Switch to generic native intrinsic inc file
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317933
2017-11-10 22:16:20 +00:00
Jan Vesely
f58dee9f3a native_log2: Switch to generic native intrinsic inc file
v2: Add __CLC_XCONCAT instead of function name redirection
    Use __CLC_XCONCAT for intrinsic functions as well

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317932
2017-11-10 22:16:15 +00:00
Jan Vesely
47e093da9b math: Implement native_log10
Use llvm intrinsic by default
Provide amdgpu workaround

v2: drop old amd copyrights

Reviewer: Aaron Watry
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316588
2017-10-25 16:49:22 +00:00
Jan Vesely
9f7172965c math: Implement sinh function
mostly copied from amd_builtins

llvm-svn: 296233
2017-02-25 02:46:53 +00:00
Aaron Watry
c606efabb7 math: Add logb builtin
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292335
2017-01-18 03:14:10 +00:00
Aaron Watry
900bd7eb7f math: Add expm1 builtin function
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292334
2017-01-18 03:13:37 +00:00
Aaron Watry
af569547fa math: Implement tgamma
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281566
2016-09-15 00:17:34 +00:00
Aaron Watry
e9009cdd21 math: Implement lgamma
Just use lgamma_r and ignore the value returned in the second argument

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281565
2016-09-15 00:17:31 +00:00
Aaron Watry
0ab07e1bde math: Implement lgamma_r
Ported from the amd-builtins branch, which is itself based on the
Sun Microsystems implementation.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281564
2016-09-15 00:17:28 +00:00
Matt Arsenault
fbfd828d2a Replace nextafter implementation
This one passes conformance.

llvm-svn: 280961
2016-09-08 16:37:56 +00:00
Tom Stellard
d835b3f1af Implement cbrt builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276497
2016-07-22 23:45:15 +00:00
Tom Stellard
9cb070f96a Implement cosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276496
2016-07-22 23:45:13 +00:00
Jan Vesely
973c1fa5f5 math: Use single precision fmax in sp path
Fixes fdim piglit on Turks

v2: use CL fmax instead of __builtin

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom.stellard@amd.com>
llvm-svn: 269807
2016-05-17 19:44:01 +00:00
Jan Vesely
c374cb76f4 math: Add erf ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in float function body that
requires subnormals.

reviewers: jvesely

Patch by: Vedran Miletić <rivanvx@gmail.com>

llvm-svn: 268766
2016-05-06 18:02:30 +00:00
Aaron Watry
55a8e0fd6d math: Add fdim implementation
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.

Passes piglit (float) tests on pitcairn.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
2016-05-06 03:34:45 +00:00
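Note on the fdim commit above: fdim(x, y) is the positive difference, x - y when x > y and +0 otherwise, with NaN returned if either argument is NaN. A scalar host C sketch (the name is hypothetical; the commit's implementation is vectorized, which is not shown here):

```c
#include <math.h>

/* Sketch: positive difference of x and y. */
float fdim_sketch(float x, float y) {
    if (isnan(x) || isnan(y)) {
        return NAN;                      /* NaN in, NaN out */
    }
    return (x > y) ? (x - y) : 0.0f;     /* +0 when x <= y */
}
```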
Aaron Watry
09f3c99a86 math: Fix ilogb(double) return type
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 261714
2016-02-24 00:52:15 +00:00
Aaron Watry
d6d0454231 math: Add ilogb ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste
with usage of the CLC wrappers to vectorize them.

This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are
equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.

v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

llvm-svn: 261639
2016-02-23 14:43:09 +00:00
Jan Vesely
7fbb96b907 math: Fix log2 vectorization on non-fp64 hw
reviewer: tstellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260301
2016-02-09 22:17:42 +00:00
Aaron Watry
8872800eff math: Add frexp ported from amd-builtins
The float implementation is almost a direct port from the amd-builtins,
but instead of just having a scalar and float4 implementation, it has
a scalar and arbitrary width vector implementation.

The double scalar is also a direct port from AMD's builtin release.

The double vector implementation copies the logic in the float vector
implementation using the values from the double scalar version.

Both have been tested in piglit using tests sent to that project's
mailing list.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260114
2016-02-08 17:07:21 +00:00
Tom Stellard
37d19875fa Implement modf math builtin
V2: use the reference implementation as suggested by Matt Arsenault

Patch By: Pavel Ondračka

llvm-svn: 258933
2016-01-27 14:52:10 +00:00
Niels Ole Salscheider
f51df5ba8c Implement tanh builtin
This is a port from the AMD builtin library.

llvm-svn: 248780
2015-09-29 06:39:09 +00:00
Tom Stellard
7a09e88b6e Fix double implementation of log
We need to use M_LOG2E instead of M_LOG2E_F.

llvm-svn: 243132
2015-07-24 18:07:14 +00:00
Tom Stellard
44b6117dfd Implement accurate log2 function
Use the implementation ported from the AMD builtin library rather
than LLVM intrinsics.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 243131
2015-07-24 18:07:12 +00:00
Tom Stellard
f01ffa9ddc Use llvm intrinsics for native_log and native_log2
llvm-svn: 243130
2015-07-24 18:07:06 +00:00
Tom Stellard
2ef5ec6b2b Fix implementation of sqrt v2
Passing values less than 0 to the llvm.sqrt() intrinsic results in
undefined behavior, so we need to check the input and return NaN if
it is less than 0.

v2:
  - Fix build failures.

llvm-svn: 241906
2015-07-10 13:37:07 +00:00
Tom Stellard
a64bad8338 Use a more accurate implementation for exp
Using exp2(x * M_LOG2E_F) does not give us accurate enough results for
OpenCL.  If you look at the new exp implementation you'll see that
it does multiply the input by M_LOG2E_F, but it still uses the original
input in part of the calculation.

This exp implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237229
2015-05-13 03:55:09 +00:00
Tom Stellard
d538fdc217 Implement exp2 using OpenCL C rather than using an intrinsic
Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.

This exp2 implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237228
2015-05-13 03:55:07 +00:00
Tom Stellard
4294541290 Implement sin for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237155
2015-05-12 17:18:47 +00:00
Tom Stellard
2e6ff0c66e Implement cos for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237154
2015-05-12 17:18:46 +00:00
Tom Stellard
37406a209c Implement atan2pi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237138
2015-05-12 14:48:26 +00:00