llvm-project

Author	SHA1	Message	Date
Anton Rydahl	e774482c4c	Fixed typo in GPU libm device library warning (#69752) Correcting a small typo in the error message when the CUDA device libraries are not detected.	2023-10-20 12:17:26 -07:00
Anton Rydahl	c73ad025b1	[libc][libm][GPU] Add missing vendor entrypoints to the GPU version of `libm` (#66034) This patch populates the GPU version of `libm` with missing vendor entrypoints. The vendor math entrypoints are disabled by default but can be enabled with the CMake option `LIBC_GPU_VENDOR_MATH=ON`.	2023-10-19 12:24:50 -07:00
Joseph Huber	fa23a2396b	[libc] Fix linking of AMDGPU device runtime control constants for math (#65676) Summary: Currently, `libc` temporarily provides math by linking against existing vendor implementations. To use the AMDGPU DeviceRTL we need to define a handful of control constants that alter behaviour for architecture specific things. Previously these were marked `extern const` because they must be present when we link-in the vendor bitcode library. However, this causes linker errors if more than one math function was used. This patch fixes the issue by marking these functions as used and inline on top of being external. This means that they are linkable, but it gives us `linkonce_odr` semantics. The downside is that these globals won't be optimized out, but it allows us to perform constant propagation on them unlike using `weak`.	2023-10-06 21:50:35 -05:00
lntue	da28593d71	[libc][math] Implement double precision expm1 function correctly rounded for all rounding modes. (#67048) Implementing expm1 function for double precision based on exp function algorithm: - Reduced x = log(2) * (hi + mid1 + mid2) + lo, where: * hi is an integer * mid1 * 2^6 is an integer * mid2 * 2^12 is an integer * \|lo\| < 2^-13 + 2^-30 - Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi * (2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi) ) - We evaluate the fast pass with P(lo) a degree-3 Taylor polynomial of (e^lo - 1) / lo in double precision - If the Ziv accuracy test fails, we use a degree-6 Taylor polynomial of (e^lo - 1) / lo in double-double precision - If the Ziv accuracy test still fails, we re-evaluate everything in 128-bit precision.	2023-09-28 16:43:15 -04:00
Guillaume Chatelet	b6bc9d72f6	[libc] Mass replace enclosing namespace (#67032) This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079	2023-09-26 11:45:04 +02:00
Guillaume Chatelet	270547f3bf	[libc][clang-tidy] Add llvm-header-guard to get consistent naming and prevent file copy/paste issues. (#66477)	2023-09-21 11:14:47 +02:00
Tue Ly	84c899b235	[libc][math] Extract non-MPFR math tests into libc-math-smoke-tests. Extract non-MPFR math tests into libc-math-smoke-tests. Reviewed By: sivachandra, jhuber6 Differential Revision: https://reviews.llvm.org/D159477	2023-09-19 12:10:21 -04:00
Joseph Huber	d6cc3410ab	[libc] Fix missing GPU math implementations (#65616) These functions were implemented by simply calling their `__builtin_*` equivalents. The builtins were resolving to the libc functions back again. This patch adds explicit vendor versions for these functions to avoid the recursion.	2023-09-07 11:48:44 -05:00
Tue Ly	f0d05bb699	[libc][math] Fix signed zeros for acosf, acoshf, and atanf in FE_DOWNWARD mode. Fix signed zeros for acosf, acoshf, and atanf in FE_DOWNWARD mode. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D159476	2023-09-07 15:21:33 +00:00
Tue Ly	76bb278ebb	[libc][math] Implement double precision exp10 function correctly rounded for all rounding modes. Implement double precision exp10 function correctly rounded for all rounding modes. Using the same algorithm as double precision exp (https://reviews.llvm.org/D158551) and exp2 (https://reviews.llvm.org/D158812) functions. Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D159143	2023-08-30 08:43:50 -04:00
Tue Ly	8ca614aa22	[libc][math] Implement double precision exp2 function correctly rounded for all rounding modes. Implement double precision exp2 function correctly rounded for all rounding modes. Using the same algorithm as double precision exp function in https://reviews.llvm.org/D158551. Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D158812	2023-08-25 10:15:08 -04:00
Tue Ly	434bf16084	[libc][math] Implement double precision exp function correctly rounded for all rounding modes. Implement double precision exp function correctly rounded for all rounding modes. Using 4 stages: - Range reduction: reduce to `exp(x) = 2^hi * 2^mid1 * 2^mid2 * exp(lo)`. - Use 64 + 64 LUT for 2^mid1 and 2^mid2, and use cubic Taylor polynomial to approximate `(exp(lo) - 1) / lo` in double precision. Relative error in this step is bounded by 1.5 * 2^-63. - If the rounding test fails, use degree-6 Taylor polynomial to approximate `exp(lo)` in double-double precision. Relative error in this step is bounded by 2^-99. - If the rounding test still fails, use degree-7 Taylor polynomial to compute `exp(lo)` in ~128-bit precision. Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D158551	2023-08-24 10:17:17 -04:00
Joseph Huber	b20a385422	[libc] Do not run tests on vendor implemented math We currently remap vendor implementations of math functions to provide a temporarily functional `libm.a` for the GPU. However, we should not run tests on any files that depend on these vendor implementations as they are not under our control and are not always present. The goal in the future is to remove the need for this by replacing all the vendor functionality, but for now this is a workaround. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D158213	2023-08-17 17:22:55 -05:00
Joseph Huber	bb830bca04	[libc][NFC] Suppress warnings on floating point conversions Summary: We implement round by implicitly converting these floating point values. Sometimes this emits warnings that we should silence by making these explicit casts.	2023-08-15 12:47:44 -05:00
Michael Jones	e328d19302	[libc] Fix final conversion warnings This patch fixes the floating point conversion warnings found with `-Wconversion` and `-Wno-sign-conversion`. These were the last warnings I found, meaning that once this lands https://reviews.llvm.org/D156630 should be unblocked. Reviewed By: mcgrathr, lntue Differential Revision: https://reviews.llvm.org/D157449	2023-08-09 10:24:03 -07:00
Anton Rydahl	53f5bfdb58	[libc][libm][GPU] Populating 'libmgpu.a' for math on the GPU This commit populates `libmgpu.a` with wrappers for the following built-ins - modf, modff - nearbyint, nearbyintf - remainder, remainderf - remquo, remquof - rint, rintf - scalbn, scalbnf - sqrt, sqrtf - tan, tanf - tanh, tanhf - trunc, truncf and wrappers for the following vendor implementations - nextafter, nextafterf - sincos, sincosf - sinh, sinhf - sinf - tan, tanf - tanh, tanhf Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D153395	2023-08-01 13:34:43 -07:00
Joseph Huber	d1417f431f	[libc] Fix missing bitcode flags passed to GPU vendor math Summary: A previous patch missed adding these to all the definitions.	2023-07-26 09:53:38 -05:00
Ethan Luis McDonough	546c9b3f6a	[libc] Add math functions to AMD/NVPTX libm Related to D152486. The following functions are included in this revision: `acosf`, `acoshf`, `asinf`, `asinhf`, `atanf`, `atanhf`, `ceil`, `ceilf`, `copysign`, `copysignf`, `cos`, `cosf`, `cosh`, `coshf`, `exp10f`, `exp2f`, `expf`, `expm1f`, `fabs`, `fabsf`, `fdim`, `fdimf`, `floor`, `floorf`, `fma`, `fmaf`, `fmax`, `fmaxf`, `fmin`, `fminf`, `fmod`, `fmodf`, `frexp`, `frexpf`, `hypot`, `hypotf`, `ilogb`, `ilogbf`, `ldexp`, `ldexpf`, `llrint`, `llrintf`, `llround`, `llroundf`, `pow`, and `powf`. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D152603	2023-07-26 03:02:24 -05:00
Joseph Huber	7c8a52f90c	[libc][Obvious] Fix AMDGPU control constant for vendor sqrt Summary: This is supposed to be enabled to say that we want correct sqrt by default.	2023-07-25 07:59:23 -05:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Tue Ly	f320fefc4a	[libc][math] Implement erff function correctly rounded to all rounding modes. Implement correctly rounded `erff` functions. For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`. For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval: ``` erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14). ``` For `x < 0`, we can use the same formula as above, since the odd part is factored out. Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X: Reciprocal throughput (clock cycles / op) ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call; -- CORE-MATH reciprocal throughput -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call; -- LIBC reciprocal throughput -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call; -- LIBC reciprocal throughput -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call; ``` and latency (clock cycles / op): ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 38.566 + 0.391 
clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call; -- CORE-MATH latency -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call; -- LIBC latency -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call; -- LIBC latency -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153683	2023-06-28 13:58:37 -04:00
Tue Ly	9532074a9d	[libc][math] Clean up exhaustive tests implementations. Clean up exhaustive tests. Let check functions return number of failures instead of passed/failed. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D153682	2023-06-28 07:58:46 -04:00
Tue Ly	46aa659a32	[libc][math] Improve exp2f performance. Re-organize special cases and add a special case when `\|x\| < 2^-5`. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153134	2023-06-20 09:34:20 -04:00
Tue Ly	0ae409c0d7	[libc][math] Slightly improve sinhf and coshf performance. Re-order exceptional branches and slightly adjust the evaluation. Depends on https://reviews.llvm.org/D153026 . Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153062	2023-06-20 09:27:28 -04:00
Tue Ly	5dbd5118ec	[libc][math] Improve tanhf performance. Re-order exceptional branches and slightly adjust the evaluation. Performance tested with the CORE-MATH project on AMD EPYC 7B12 (clocks/op) Reciprocal throughputs: ``` --- BEFORE --- $ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call; [####################] 100 %. (with -msse4.2) Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call; --- AFTER --- $ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call; [####################] 100 % (with -msse4.2) Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call; ``` Latency: ``` --- BEFORE --- $ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call; [####################] 100 %. (with -msse4.2) Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call; [####################] 100 %. 
(SSE2) Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call; --- AFTER --- $ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf [####################] 100 % (with -mavx2 -mfma) Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call; [####################] 100 % (with -msse4.2) Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call; [####################] 100 %. (SSE2) Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153026	2023-06-20 09:25:07 -04:00
Joseph Huber	8060d96aed	[libc] Begin implementing a 'libmgpu.a' for math on the GPU This patch adds an outline to begin adding a `libmgpu.a` file for providing math on the GPU. Currently, this is most likely going to be wrapping around existing vendor libraries and placing them in a more usable format. Long term, we would like to provide our own implementations of math functions that can be used instead. This patch works by simply forwarding the calls to the standard C math library calls like `sin` to the appropriate vendor call like `__nv_sin`. Currently, we will use the vendor libraries directly and link them in via `-mlink-builtin-bitcode`. This is necessary because of bizarre interactions with the generic bitcode: `-mlink-builtin-bitcode` internalizes and only links in the used symbols; furthermore, it propagates the target's default attributes and it's the only "truly" correct way to pull in these vendor bitcode libraries without error. If the vendor libraries are not available at build time, we will still create the `libmgpu.a`, but we will expect that the vendor library definitions will be provided by the user's compilation as is made possible by https://reviews.llvm.org/D152442. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152486	2023-06-14 12:59:15 -05:00
Tue Ly	1557256ab0	[libc] Add Int<> type and fix (U)Int<128> compatibility issues. Add Int<> and Int128 types to replace the usage of __int128_t in math functions. Clean up to make sure that (U)Int128 and __(u)int128_t are interchangeable in the code base. Reviewed By: sivachandra, mikhail.ramalho Differential Revision: https://reviews.llvm.org/D152459	2023-06-13 09:40:48 -04:00
Tue Ly	a982431295	[libc] Add platform independent floating point rounding mode checks. Many math functions need to check for floating point rounding modes to return correct values. Currently most of them use the internal implementation of `fegetround`, which is platform-dependent and blocks math functions from being enabled on platforms with unimplemented `fegetround`. In this change, we add platform independent rounding mode checks and switch math functions to use them instead. https://github.com/llvm/llvm-project/issues/63016 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152280	2023-06-12 09:36:41 -04:00
Tue Ly	c0a751ae3d	[libc] Fix undefined behavior of left shifting signed integer in exp2f.cpp. Fix undefined behavior of left shifting signed integer in exp2f.cpp. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152336	2023-06-07 01:15:18 -04:00
Tue Ly	a2ac3678cd	[libc][bazel] Add log, log2, log10, log1p to bazel layout. Add log, log2, log10, log1p and their unit tests to bazel layout. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D151252	2023-05-24 07:43:58 -04:00
Tue Ly	b91e78da37	[libc][math] Implement double precision log1p correctly rounded to all rounding modes. Implement double precision log1p function correctly rounded to all rounding modes. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%. - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call). - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log1p GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log1p --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call; 
-- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log1p --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 760.444 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 827.880 -- LIBC latency -- with FMA 711.837 -- LIBC latency -- without FMA 764.317 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D151049	2023-05-23 11:04:04 -04:00
Tue Ly	111d274841	[libc][math] Implement double precision log2 function correctly rounded to all rounding modes. Implement double precision log2 function correctly rounded to all rounding modes. See https://reviews.llvm.org/D150014 for a more detail description of the algorithm. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%. - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log2 --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call; -- 
LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log2 --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 177.632 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 231.332 -- LIBC latency -- with FMA 459.751 -- LIBC latency -- without FMA 463.850 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150374	2023-05-23 10:49:30 -04:00
Tue Ly	a68bbf42fa	[libc][math] Implement double precision log function correctly rounded to all rounding modes. Implement double precision log function correctly rounded to all rounding modes. See https://reviews.llvm.org/D150014 for a more detail description of the algorithm. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%. - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call; -- LIBC 
latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 598.306 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 632.925 -- LIBC latency -- with FMA 455.632 -- LIBC latency -- without FMA 488.564 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150131	2023-05-23 10:35:15 -04:00
Tue Ly	a0c92a3817	[libc][math] Make log10 correctly rounded for non-FMA targets and improve itsperformance. Make log10 correctly rounded for non-FMA targets and improve its performance. Implemented fast pass and accurate pass: Fast Pass: - Range reduction step 0: Extract exponent and mantissa ``` x = 2^(e_x) * m_x ``` - Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to: ``` -2^-8 <= v = r * m_x - 1 < 2^-7 where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) ) and k = trunc( (m_x - 1) * 2^7 ) ``` - Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with: ``` > P = fpminimax((log(1 + x) - x)/x^2, 5, [\|D...\|], [-2^-8, 2^-7]); ``` - Combine the results: ``` log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e) ``` - Perform additive Ziv's test with errors bounded by `P_ERR * v^2`. Return the result if Ziv's test passed. Accurate Pass: - Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass. - Perform 3 more range reduction steps: - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]` ``` v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) ) and k = trunc( v * 2^14 + 0.5 ). ``` - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]` ``` v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2 where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) ) and k = trunc( v * 2^21 + 0.5 ). ``` - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]` ``` v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3 where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) ) and k = trunc( v * 2^28 + 0.5 ). 
``` - Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with: ``` > P = fpminimax(log10(1 + x)/x, 3, [\|128...\|], [-0x1.0002143p-29 , 0x1p-29]); ``` - Combine the results: ``` log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v * P(v) ``` - The combined results are computed using floating points of 128-bit precision. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%. - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log10 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log10 --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call; -- System LIBC latency -- 
[####################] 100 % Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call; -- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log10 --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 640.688 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 667.354 -- LIBC latency -- with FMA 495.593 -- LIBC latency -- without FMA 504.143 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150014	2023-05-23 10:18:23 -04:00
Tue Ly	f79264b5f6	[libc][math] Remove placeholder implementations of asin and pow. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D148781	2023-04-20 01:28:16 -04:00
Siva Chandra Reddy	9a579d62e8	[libc][math] Remove the unused dp_trig target and tests.	2023-04-19 23:11:34 +00:00
Tue Ly	92bc7f5428	[libc][math] Update range reduction step for log2f and improve its performance. Simplify the range reduction steps by choosing the reduction constants carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double precision, even without FMA instructions, and -2^-8 <= v < 2^-7. Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D147759	2023-04-11 09:43:23 -04:00
Tue Ly	bc8e87ef4a	[libc][math] Update range reduction step for logf and reduce its latency. Simplify the range reduction steps by choosing the reduction constants carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the polynomial evaluations to be parallelized more efficiently. Reviewed By: santoshn, zimmermann6 Differential Revision: https://reviews.llvm.org/D147755	2023-04-10 13:00:37 -04:00
Tue Ly	9af8dca70f	[libc][math] Update range reduction step for log10f and reduce its latency. Simplify the range reduction steps by choosing the reduction constants carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the polynomial evaluations to be parallelized more efficiently. Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D147676	2023-04-07 10:31:46 -04:00
Alex Brachet	2e62cab31e	[libc][NFC] Fix conversion warning	2023-03-28 20:24:20 +00:00
Roland McGrath	0be1fbac2a	[libc] Remove unused aarch64 sqrt and sqrtf implementations These files are not used because the generic sqrt and sqrtf functions already go through internal layers that reach the machine-specific internal implementations. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D146865	2023-03-24 21:15:20 -07:00
Alex Brachet	a86cc8341d	[libc] Move fma and fmaf into generic dir Differential Revision: https://reviews.llvm.org/D146740	2023-03-23 18:43:09 +00:00
Alex Brachet	7d11a592c5	[libc] Fix some math conversion warnings Differential Revision: https://reviews.llvm.org/D146738	2023-03-23 17:07:19 +00:00
Tue Ly	e35c71493b	[libc][NFC] Clean up clang-tidy warnings for `src/__support` and `src/math`. Clean up some warnings from running libc-lint for these folders. Reviewed By: michaelrj, sivachandra Differential Revision: https://reviews.llvm.org/D146048	2023-03-15 18:47:31 -04:00
Siva Chandra Reddy	adff2b291c	[libc][NFC] Switch all uses of errno in math and math tests to libc_errno.	2023-03-13 22:22:00 +00:00
Tue Ly	31c39439a8	[libc][math] Switch math functions to use libc_errno and fix some errno and floating point exceptions. Switch math functions to use libc_errno and fix some errno and floating point exceptions Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D145349	2023-03-07 00:51:16 -05:00
Tue Ly	0aa9593c2f	[libc][math] Set floating point exceptions for exp*f, sinhf, and coshf. Set FE_OVERFLOW and FE_UNDERFLOW for expf, exp2f, exp10f, expm1f, sinhf and coshf. Reviewed By: sivachandra, renyichen Differential Revision: https://reviews.llvm.org/D144340	2023-02-24 12:56:39 -05:00
Tue Ly	4663d784dd	[libc] Update macros/optimization.h build dependency for CMake and Bazel. Update macros/optimization.h build dependency for CMake and Bazel. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D143805	2023-02-11 01:24:48 -05:00
Tue Ly	ae2d8b4971	[libc][math] Update exceptional cases for logf, log10f, log2f, log1pf. Properly set floating point exceptions and add more exceptional values for non-FMA x86-64 targets. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D143699	2023-02-10 14:08:50 -05:00
Guillaume Chatelet	737e1cd161	[libc] Move likely/unlikely to the optimization header	2023-02-10 15:31:28 +00:00
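
The range reduction that commits 434bf16084, 8ca614aa22, 76bb278ebb, and da28593d71 describe for the double precision exponentials can be sketched in a few lines. This is only an illustration of the decomposition `exp(x) = 2^hi * 2^mid1 * 2^mid2 * exp(lo)`, which holds when `x = (hi + mid1 + mid2) * log(2) + lo`. It is not the code from those commits: the function name is hypothetical, `std::exp2` stands in for the two 64-entry lookup tables, and the rounding test, the double-double pass, and the 128-bit pass are all omitted.

```cpp
#include <cmath>

// Illustrative sketch only: std::exp2 replaces the lookup tables, and there is
// no overflow, underflow, or correct-rounding handling.
inline double exp_range_reduction_sketch(double x) {
  const double LOG2_E = 1.4426950408889634; // log2(e)
  const double LN_2 = 0.6931471805599453;   // log(2)

  // k = round(x * log2(e) * 2^12) encodes hi, mid1 and mid2 in one integer.
  const double k = std::round(x * LOG2_E * 4096.0);
  const double hi = std::floor(k / 4096.0);          // integer part
  const int idx = static_cast<int>(k - hi * 4096.0); // 12 fractional bits, 0..4095
  const double mid1 = (idx / 64) / 64.0;             // top 6 bits, a multiple of 2^-6
  const double mid2 = (idx % 64) / 4096.0;           // low 6 bits, a multiple of 2^-12

  // lo = x - (hi + mid1 + mid2) * log(2), so |lo| is roughly at most 2^-13.
  const double lo = x - k * (LN_2 / 4096.0);

  // Cubic Taylor polynomial approximating (exp(lo) - 1) / lo.
  const double p = 1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo * (1.0 / 24.0)));

  // exp(x) = 2^hi * (2^mid1 * 2^mid2) * (1 + lo * P(lo)).
  const double scale = std::exp2(mid1) * std::exp2(mid2);
  return std::ldexp(scale * (1.0 + lo * p), static_cast<int>(hi));
}
```

For moderate inputs the sketch should agree with `std::exp` to within a few units in the last place, which corresponds to the fast pass only; the later stages in the commit messages exist to settle the cases the fast pass cannot round with certainty.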

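
Commit a982431295 replaces `fegetround` with platform independent rounding mode checks. One way such checks can work is to observe how a single float addition rounds. The sketch below is an assumption-laden illustration, not the code from that change: the helper names are hypothetical, and it assumes IEEE 754 binary32 `float`, hexadecimal float literals (C++17), and hardware that honors the dynamic rounding mode.

```cpp
#include <cfenv>

// Hypothetical helpers, not the names used in LLVM libc. Operands and results
// go through volatile variables so the additions happen at run time and are
// rounded to float under the current rounding mode.
inline bool rounds_upward() {
  volatile float tiny = 0x1.0p-25f; // less than half an ulp of 1.0f
  volatile float sum = 1.0f + tiny; // stays 1.0f unless rounding toward +infinity
  return sum != 1.0f;
}

inline bool rounds_downward() {
  volatile float tiny = 0x1.0p-25f;
  volatile float sum = -1.0f - tiny; // stays -1.0f unless rounding toward -infinity
  return sum != -1.0f;
}

inline bool rounds_to_nearest() {
  volatile float half_ulp = 0x1.0p-24f; // exactly half an ulp of 1.5f
  volatile float above = 1.5f + half_ulp;
  volatile float below = 1.5f - half_ulp;
  return above == below; // both ties resolve to 1.5f only in to-nearest mode
}

// Maps the arithmetic checks onto the standard FE_* constants.
inline int detect_rounding_mode() {
  if (rounds_to_nearest()) return FE_TONEAREST;
  if (rounds_upward()) return FE_UPWARD;
  if (rounds_downward()) return FE_DOWNWARD;
  return FE_TOWARDZERO;
}
```

Because the checks use only float arithmetic, they do not depend on a working `fegetround`, which is the portability problem the commit message describes.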
200 Commits