Tue Ly
f320fefc4a
[libc][math] Implement erff function correctly rounded to all rounding modes.
...
Implement a correctly rounded `erff` function.
For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, and `0x1.fffffep-1` (the largest `float` below 1) for `FE_DOWNWARD` or `FE_TOWARDZERO`.
For `0 <= x < 4`, we divide the range into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```
For `x < 0`, we can use the same formula as above, since the odd part is factored out.
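A minimal Python sketch of the structure described above, for illustration only. It is not the libc code: the commit's per-interval minimax coefficients are not reproduced here, so the sketch uses the Maclaurin coefficients of `erf(x)/x` as stand-ins for the first sub-interval `[0, 1/8)` and falls back to `math.erf` for the other 31. The names `erff_sketch` and `COEFFS_FIRST_INTERVAL` are hypothetical, everything is computed in double precision, and only the round-to-nearest behavior is modelled.
```python
import math

# Stand-in coefficients (assumption): Maclaurin series of erf(x)/x in powers
# of x^2, c_n = 2/sqrt(pi) * (-1)^n / (n! * (2n + 1)) for n = 0..7.
# They are only adequate on the first sub-interval [0, 1/8).
COEFFS_FIRST_INTERVAL = [
    2.0 / math.sqrt(math.pi) * (-1) ** n / (math.factorial(n) * (2 * n + 1))
    for n in range(8)
]


def erff_sketch(x: float) -> float:
    """Structure of the piecewise odd-polynomial scheme (round-to-nearest only)."""
    if math.isnan(x):
        return x
    ax = abs(x)
    if ax >= 4.0:
        # Saturation: +/-1 under round-to-nearest.
        return math.copysign(1.0, x)
    idx = int(ax * 8.0)  # index of the sub-interval of length 1/8
    if idx != 0:
        # A real implementation would select the coefficient row for idx;
        # this sketch has no such table and defers to the standard library.
        return math.erf(x)
    # Horner evaluation of c0 + c1*x^2 + ... + c7*x^14, then multiply by x.
    # Because x is factored out, the same code handles negative x.
    x2 = x * x
    p = 0.0
    for c in reversed(COEFFS_FIRST_INTERVAL):
        p = p * x2 + c
    return x * p
```
Since the polynomial in `x^2` is identical for `x` and `-x`, the final multiplication by `x` makes the result exactly odd, which is why no separate branch is needed for negative inputs.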
Performance tested with the `perf.sh` tool from the CORE-MATH project on an AMD Ryzen 9 5900X:
Reciprocal throughput (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with -march=native (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput -- with -march=x86-64-v2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;
-- LIBC reciprocal throughput -- with -mavx2 -mfma (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput -- with -msse4.2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```
and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with -march=native (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency -- with -march=x86-64-v2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;
-- LIBC latency -- with -mavx2 -mfma (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency -- with -msse4.2 (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D153683
2023-06-28 13:58:37 -04:00
Tue Ly
e557b8a142
[libc][RISCV] Add log, log2, log1p, log10 for RISC-V64 entrypoints.
...
Add log, log2, log1p, log10 RISCV64 entrypoints.
Reviewed By: michaelrj, sivachandra
Differential Revision: https://reviews.llvm.org/D151674
2023-05-30 14:18:19 -04:00
Tue Ly
0bda541829
[libc][doc] Update math function status page to show more targets.
...
Show availability of math functions on each target.
Reviewed By: jeffbailey
Differential Revision: https://reviews.llvm.org/D151489
2023-05-25 19:24:33 -04:00
Kazu Hirata
9a515d8142
[libc] Fix typos in documentation
2023-05-22 23:27:59 -07:00
Kazu Hirata
e042efdab6
[libc] Fix typos in documentation
2023-04-24 23:31:48 -07:00
Tue Ly
f63025f52f
[libc][Obvious] Fix the performance table in math function documentation.
2023-04-18 14:10:26 -04:00
Tue Ly
9af8dca70f
[libc][math] Update range reduction step for log10f and reduce its latency.
...
Simplify the range reduction steps by choosing the reduction constants
carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double
precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the
polynomial evaluations to be parallelized more efficiently.
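The exactness claim can be illustrated with a small Python check. This is a sketch under stated assumptions, not the commit's table: `reduction_constant` is a hypothetical choice that rounds the reciprocal of each sub-interval's midpoint to a multiple of 2^-8, indexed by the top 7 mantissa bits. With such a short `r` and a 24-bit `m_x`, the product needs at most 32 bits, so `v = r*m_x - 1` and `v^2` are exact in double precision without FMA. This naive table only gives `|v| < 2^-7`; the tighter `-2^-8 <= v` bound in the commit comes from its more careful choice of constants.
```python
from fractions import Fraction
import random


def reduction_constant(i: int) -> float:
    """Hypothetical r_i: 1/midpoint of sub-interval i, rounded to a multiple of 2^-8."""
    midpoint = 1.0 + (i + 0.5) / 128.0
    return round(256.0 / midpoint) / 256.0


def reduce_mantissa(m_x: float) -> float:
    """Return v = r*m_x - 1 for a float32-representable m_x in [1, 2)."""
    i = int((m_x - 1.0) * 128.0)  # top 7 bits of the mantissa
    return reduction_constant(i) * m_x - 1.0


def check_exact(samples: int = 10000, seed: int = 0) -> float:
    """Verify exactness of v and v^2 with rational arithmetic; return max |v|."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        # Random float32 mantissa: a multiple of 2^-23 in [1, 2).
        m_x = 1.0 + rng.randrange(1 << 23) / float(1 << 23)
        r = reduction_constant(int((m_x - 1.0) * 128.0))
        v = reduce_mantissa(m_x)
        # The double results must equal the exact rational values.
        assert Fraction(v) == Fraction(r) * Fraction(m_x) - 1
        assert Fraction(v * v) == Fraction(v) ** 2
        worst = max(worst, abs(v))
    return worst
```
Calling `check_exact()` raises an `AssertionError` if any sampled product or square had to be rounded, and otherwise returns the largest `|v|` observed.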
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D147676
2023-04-07 10:31:46 -04:00
Tue Ly
6c7894a8e6
[libc][doc] Move docs/math.rst to docs/math/index.rst
...
Move docs/math.rst to docs/math/index.rst
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D144028
2023-02-14 13:41:44 -05:00
Tue Ly
5814b7b279
[libc][math] Implement log10 function correctly rounded for all rounding modes
...
Implement double precision log10 function correctly rounded for all
rounding modes. This implementation currently needs FMA instructions for
correctness.
Use 2 passes:
Fast pass:
- 1 step range reduction with a lookup table of `2^7 = 128` elements to reduce the ranges to `[-2^-7, 2^-7]`.
- Use a degree-7 minimax polynomial generated by Sollya, evaluated using a mix of double-double and double precision.
- Apply Ziv's test for accuracy.
Accurate pass:
- Apply 5 more range reduction steps to reduce the ranges further to `[-2^-27, 2^-27]`.
- Use a degree-4 minimax polynomial generated by Sollya, evaluated using 192-bit precision.
- By the result of Lefevre (add quote), this is more than enough for correct rounding to all rounding modes.
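The two-pass control flow with Ziv's test can be sketched in Python as follows. This only emulates the idea and is not the libc implementation: the fast pass is imitated with a 22-digit `decimal` computation split into a double-double pair `(hi, lo)`, the accurate pass with a 60-digit computation, and the error bound `2^-68 * |hi|` is an assumption chosen to cover the 22-digit rounding error. All function names are hypothetical, and `x` is assumed to be a positive finite double.
```python
from decimal import Decimal, localcontext


def _log10_digits(x: float, digits: int) -> Decimal:
    """log10(x) rounded to the given number of significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(x).log10()


def accurate_pass(x: float) -> float:
    """Stand-in for the 192-bit accurate pass (60 decimal digits here)."""
    return float(_log10_digits(x, 60))


def fast_pass(x: float):
    """Stand-in for the fast pass: double-double result plus an error bound."""
    d = _log10_digits(x, 22)
    hi = float(d)
    with localcontext() as ctx:
        ctx.prec = 60
        lo = float(d - Decimal(hi))  # low part of the double-double pair
    err = abs(hi) * 2.0 ** -68  # assumed bound on the fast-pass error
    return hi, lo, err


def log10_two_pass(x: float):
    """Return (result, used_accurate_pass)."""
    hi, lo, err = fast_pass(x)
    upper = hi + (lo + err)
    lower = hi + (lo - err)
    if upper == lower:
        # Ziv's test: both ends of the error interval round to the same
        # double, so the fast result is already correctly rounded.
        return upper, False
    return accurate_pass(x), True
```
For example, `log10_two_pass(1000.0)` is resolved by the fast pass, because the error interval around 3 is far narrower than the spacing of doubles there.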
In progress: adding detailed documentation about the algorithm.
Depends on: https://reviews.llvm.org/D136799
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D139846
2023-01-08 17:41:54 -05:00