1859 Commits

Author SHA1 Message Date
Joseph Huber
4311246a3a Revert "[libc] Enable hermetic floating point tests"
This passed locally but unfortauntely it seems some tests are not ready
to be made hermetic. Revert for now until we can investigate
specifically which tests are failing and mark those as `UNIT_TEST_ONLY`.

This reverts commit 417ea79e792a87d53f5ac4f5388af4b25aa04d7d.
2023-05-25 19:15:02 -05:00
Joseph Huber
417ea79e79 [libc] Enable hermetic floating point tests
This patch enables us to run the floating point tests as hermetic.
Importantly we now use the internal versions of the `fesetround` and
`fegetround` functions.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151123
2023-05-25 19:08:44 -05:00
Tue Ly
0aa7ea4e22 [libc][darwin] Add OSUtil for darwin arm64 target so that unit tests can be run.
Currently unit tests cannot be run on macOS due to missing OSUtil.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151377
2023-05-25 19:25:24 -04:00
Tue Ly
0bda541829 [libc][doc] Update math function status page to show more targets.
Show availability of math functions on each target.

Reviewed By: jeffbailey

Differential Revision: https://reviews.llvm.org/D151489
2023-05-25 19:24:33 -04:00
Roland McGrath
65c78933ae [libc] Support LIBC_COPT_USE_C_ASSERT build flag
In this mode, LIBC_ASSERT is just standard C assert.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D151498
2023-05-25 14:10:33 -07:00
Roland McGrath
1d4e8f0ea6 [libc] Fix compilation issues in memory_check_utils.h
Strict warnings require explicit static_cast to counteract
default widening of types narrower than int.

Functions in header files should have vague linkage (inline
keyword), not internal linkage (static) or external linkage
(no inline keyword) even for template functions.  Note these
don't use the LIBC_INLINE macro since this is only for test code.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D151494
2023-05-25 14:09:14 -07:00
Siva Chandra Reddy
daeee56798 [libc] Add macro LIBC_THREAD_LOCAL.
It resolves to thread_local on all platform except for the GPUs on which
it resolves to nothing. The use of thread_local in the source code has been
replaced with the new macro.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D151486
2023-05-25 19:53:52 +00:00
Guillaume Chatelet
298843cd66 [libc][test] Drastically reduce mem test runtime
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D151450
2023-05-25 15:05:21 +00:00
Tobias Hieta
f98ee40f4b
[NFC][Py Reformat] Reformat python files in the rest of the dirs
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: jhenderson, #libc, Mordante, sivachandra

Differential Revision: https://reviews.llvm.org/D150784
2023-05-25 11:17:05 +02:00
Siva Chandra Reddy
99a493e3e3 [libc] Make hermetic test depend on the unit test if it exists.
We want to do this so that build system like ninja don't end up running
the hermetic and unit tests in parallel. Running in parallel can cause
problems for tests which read/write disk files as the hermetic and unit
tests can end up stepping on each other.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D151291
2023-05-25 06:27:00 +00:00
Siva Chandra Reddy
25174976e1 [libc] Rearrange error and signal tables.
This is largely a cosmetic change done with a few goals:
1. Reduce the conditionals in picking the correct set of tables for the
   platform.
2. Avoid exposing, for example Linux errors, when building for non-Linux
   platforms. This also prevents build failures when Linux errors are not
   defined on the target non-Linux platform.
3. Some "_table" suffixes have been removed to avoid repeated
   occurance of "table" like "tables/linux_error_table.h".

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151367
2023-05-25 05:51:32 +00:00
Guillaume Chatelet
bb4f88f9b9 [libc] simplify test for getrandom
`getrandom` is implemented as a syscall.
We don't want to test linux implementation of the syscall. We just want to verify that it reacts as expected to sensible values.

Runtime before
```
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (took 83 ms)
[ RUN      ] LlvmLibcGetRandomTest.PiEstimation
[       OK ] LlvmLibcGetRandomTest.PiEstimation (took 9882 ms)
```

Runtime after
```
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.CheckValue
[       OK ] LlvmLibcGetRandomTest.CheckValue (took 0 ms)
```

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D151336
2023-05-24 15:07:51 +00:00
Tue Ly
8c6b83dcfd [libc] Reduce the sizes of some math tests that take longest time.
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D151256
2023-05-24 08:55:51 -04:00
Tue Ly
a2ac3678cd [libc][bazel] Add log, log2, log10, log1p to bazel layout.
Add log, log2, log10, log1p and their unit tests to bazel layout.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D151252
2023-05-24 07:43:58 -04:00
Tue Ly
7cbcc581a5 [libc] Change UInt integer conversion operators to use standard types.
This fixes an issue with missing `unsigned long` conversion on macOS.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151234
2023-05-23 14:12:46 -04:00
Joseph Huber
99c9515b37 [libc][obvious] Correctly hoist mask out of the loop
Summry:
This was accidentally dropped from a previous patch following a rebase.
Fix it to where it's consistent.

Differential Revision: https://reviews.llvm.org/D151232
2023-05-23 12:21:10 -05:00
Joseph Huber
e826762a08 [libc] More efficiently send bytes via send_n and recv_n
Currently we have the `send_n` and `recv_n` routines to stream data,
such as a string to print, to the other side. The first operation is to
send the size so the other side knows the number of bytes to recieve.
However, this wasted 56 bytes that could've been sent. This meant that
small values, like the arguments to a function to call on the host for
example, needed to perform an extra send. This patch sends the first 56
bytes in the first packet and continues if necessary.

Depends on D150992

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D151041
2023-05-23 10:59:47 -05:00
Joseph Huber
29d3da3b86 [libc] Fix the send_n and recv_n utilities under divergent lanes
We provide the `send_n` and `recv_n` utilities as a generic way to
stream data between both sides of the process. This was previously
tested and performed as expected when using a string of constant size.
However, when the size was allowed to diverge between the threads in the
warp or wavefront this could deadlock. This did not occur on NVPTX
because of the use of the explicit warp sync. However, on AMD one of the
work items in the wavefront could continue executing and hit the next
`recv` call before the other threads, then we would deadlock as we
violated the RPC invariants.

This patch replaces the for loop with a thread ballot. This will cause
every thread in the warp or wavefront to continue executing the loop
until all of them can exit. This acts as a more explicit wavefront sync.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D150992
2023-05-23 10:59:47 -05:00
Tue Ly
b91e78da37 [libc][math] Implement double precision log1p correctly rounded to all rounding modes.
Implement double precision log1p function correctly rounded to all
rounding modes.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
  - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call).
  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
760.444

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880

-- LIBC latency -- with FMA
711.837

-- LIBC latency -- without FMA
764.317
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D151049
2023-05-23 11:04:04 -04:00
Tue Ly
111d274841 [libc][math] Implement double precision log2 function correctly rounded to all rounding modes.
Implement double precision log2 function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detail description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
177.632

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332

-- LIBC latency -- with FMA
459.751

-- LIBC latency -- without FMA
463.850
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150374
2023-05-23 10:49:30 -04:00
Tue Ly
a68bbf42fa [libc][math] Implement double precision log function correctly rounded to all rounding modes.
Implement double precision log function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detail description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
598.306

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925

-- LIBC latency -- with FMA
455.632

-- LIBC latency -- without FMA
488.564
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150131
2023-05-23 10:35:15 -04:00
Joseph Huber
ad00a3db4d [libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc
The AMDGPU backend has a built-in pass to lower constructors. We do this
manually in the `start.cpp` implementation so we can disable this to
keep the binaries smaller.

Differential Revision: https://reviews.llvm.org/D151213
2023-05-23 09:20:41 -05:00
Tue Ly
a0c92a3817 [libc][math] Make log10 correctly rounded for non-FMA targets and improve itsperformance.
Make log10 correctly rounded for non-FMA targets and improve its
performance.

Implemented fast pass and accurate pass:

**Fast Pass**:

  - Range reduction step 0: Extract exponent and mantissa
```
  x = 2^(e_x) * m_x
```
  - Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to:
```
   -2^-8 <= v = r * m_x - 1 < 2^-7
  where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) )
  and k = trunc( (m_x - 1) * 2^7 )
```
  - Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with:
```
 > P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]);
```
  - Combine the results:
```
  log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e)
```
  - Perform additive Ziv's test with errors bounded by `P_ERR * v^2`.  Return the result if Ziv's test passed.

**Accurate Pass**:

  - Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass.
  - Perform 3 more range reduction steps:
    - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]`
```
   v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v
  where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) )
  and k = trunc( v * 2^14 + 0.5 ).
```
    - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]`
```
   v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2
  where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) )
  and k = trunc( v * 2^21 + 0.5 ).
```
    - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]`
```
   v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3
  where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) )
  and k = trunc( v * 2^28 + 0.5 ).
```
  - Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with:
```
  > P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]);
```
  - Combine the results:
```
  log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v * P(v)
```
  - The combined results are computed using floating points of 128-bit precision.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log10 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
640.688

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
667.354

-- LIBC latency -- with FMA
495.593

-- LIBC latency -- without FMA
504.143
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150014
2023-05-23 10:18:23 -04:00
Guillaume Chatelet
04e066df5e [libc] Display unit test runtime for hosted environments
With more tests added to LLVM libc each week we want to keep track of unittest's runtime, especially for low end build bots.

Top offender can be tracked with a bit of scripting (spoiler alert, mem function sweep tests are in the top ones)
```
ninja check-libc | grep "ms)" | awk '{print $(NF-1),$0}' | sort -nr | cut -f2- -d' '
```

Unfortunately this doesn't work for hermetic tests since `clock` is unavailable.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151097
2023-05-23 09:23:12 +00:00
Kazu Hirata
9a515d8142 [libc] Fix typos in documentation 2023-05-22 23:27:59 -07:00
Joseph Huber
4fafa39b76 [libc] Add an option to make libc only build the libc-hdrgen tool
The `libc-hdergen` tool is required for cross-builds, however some cases
can cause issues when configuring this build. This patch adds an
ovveride option `LIBC_HDRGEN_ONLY` to allow us to retain the old
(incorrect) behaviour where `libc` would not build with any other
runtimes enabled.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151155
2023-05-22 22:24:57 -05:00
Siva Chandra Reddy
872e46a93e [libc] Add -fno-exceptions and -fno-rtti to integration test build.
Also adjust pthread_create_test to accomodate large page sizes. Both
these changes should now fix the full build builders.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151158
2023-05-22 22:10:36 +00:00
Noah Goldstein
e0b8f98b1f Add some missing [[noreturn]] attributes
Missing in header for `pthread_exit` and `exit`.

Missing in spec file for `pthread_exit`.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151143
2023-05-22 15:54:19 -05:00
Noah Goldstein
916e9be4aa Cleanup code in thread_exit
1) Avoid proper function calls and referencing local variables after
the stack has been deallocated. A proper function call/return or local
variable reference that may have spilled will cause invalid memory
reads after the stack has been deallocated.

2) Mark the function as [[noreturn]] and place
`__builtin_unreachable()` after the `SYS_exit` syscalls.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151142
2023-05-22 15:54:19 -05:00
Noah Goldstein
6a185718d4 Support custom attributes in pthread_create
Only functional for stack growsdown (same as before), but custom
`stack`, `stacksize`, `guardsize`, and `detachstate` all should be
working.

Differential Revision: https://reviews.llvm.org/D148290
2023-05-22 15:54:19 -05:00
Michael Jones
ae3b59e623 [libc] Use MPFR for strtofloat fuzzing
The previous string to float tests didn't check correctness, but due to
the atof differential test proving unreliable the strtofloat fuzz test
has been changed to use MPFR for correctness checking. Some minor bugs
have been found and fixed as well.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D150905
2023-05-22 11:04:53 -07:00
Aiden Grossman
742446a837 [CMake][libc] Fix non-runtime build when other runtimes are enabled
Before this patch, when other runtimes were enabled by setting
LLVM_ENABLE_RUNTIMES and llvm libc was built as a project by setting
LLVM_ENABLE_PROJECTS, the llvm libc CMake system would delay
configuration until the runtime build which never started since libc
wasn't declared as one of the runtime builds. This patch fixes this
behavior by explicitly checking that libc is within LLVM_ENABLE_RUNTIMES
rather than just the variable being set at all.

Reviewed By: Intue

Differential Revision: https://reviews.llvm.org/D151011
2023-05-20 22:33:14 +00:00
Joseph Huber
2ac91712cc [libc] Fix test failing on the GPU due to undefined floating point type
We got build failures due to the handling here, explicitly set the type.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D150983
2023-05-19 14:03:42 -05:00
Siva Chandra Reddy
00bd8e9011 [libc] Add a str() method to FPBits which returns a string representation.
Unit tests for the str() method have also been added.

Previously, a separate test only helper function was being used by the
test matchers which has regressed over many cleanups. Moreover, being a
test only utility, it was not tested separately (and hence the
regression).

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D150906
2023-05-19 06:20:41 +00:00
Siva Chandra Reddy
b095aa3f9a [libc] Use the new wide integer to hex string facility in LibcTest.
The old code, which has regressed over many cleanups, has been replaced
with the new wide integer to hex string facility.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D150901
2023-05-18 20:48:21 +00:00
Joseph Huber
9698265e44 [libc] Add comparison specialization for cpp:string type
We compare this type in the string_test. It had no specialization here
so it could cause linker errors. This patch simply extends the interface
to support it.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D150904
2023-05-18 15:24:00 -05:00
Joseph Huber
3ac8b647c7 [libc] Disable newly added test from running on NVPTX
A recent patch enabled this test which turns out to segfault on the
NVPTX architecture. We disable it for now.

Differential Revision: https://reviews.llvm.org/D150898
2023-05-18 14:05:27 -05:00
Siva Chandra Reddy
d3b3304222 [libc] Add a functioning realloc for hermetic tests.
Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D150846
2023-05-18 16:51:18 +00:00
Siva Chandra Reddy
625d6928a8 [libc] Extend IntegerToString to convert UInt* numbers to hex string.
This new functionality will help us avoid duplicated code in various
places in the testing infrastructure. Since the string representation
of the wide numbers is to be used by tests, to keep it simple, we
zero-pad the strings.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D150849
2023-05-18 16:44:43 +00:00
Joseph Huber
155191e9fd [libc] Restrict access to the RPC Process internals
This patch changes the `Process` struct to only provide the functions
expected to be visible by the interface. So, now we only export the
open, reset, and size query functions. This prevents users of the
interface from messing with the internals of the process, so now the
only existing failure mode is mismatched send and recieve calls.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D150703
2023-05-17 17:48:12 -05:00
Siva Chandra Reddy
4dc205f016 [libc] Add a convenience CMake function add_unittest_framework_library.
This function is used to add unit test and hermetic test framework libraries.
It avoids the duplicated code to add compile options to each every test
framework libraries.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D150727
2023-05-17 21:13:50 +00:00
Nico Weber
d763c6e5e2 Revert "Reland "[CMake] Bumps minimum version to 3.20.0.""
This reverts commit 65429b9af6a2c99d340ab2dcddd41dab201f399c.

Broke several projects, see https://reviews.llvm.org/D144509#4347562 onwards.

Also reverts follow-up commit "[OpenMP] Compile assembly files as ASM, not C"

This reverts commit 4072c8aee4c89c4457f4f30d01dc9bb4dfa52559.

Also reverts fix attempt  "[cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump"

This reverts commit 7d47dac5f828efd1d378ba44a97559114f00fb64.
2023-05-17 10:53:33 -04:00
Roland McGrath
2c874d2128 [libc] Fix definition and use of LIBC_INLINE macro
LIBC_INLINE was doubly defined in two headers.  Define it only in
one place. Also update a few uses to make sure it's always placed
where a function attribute is valid and is used consistently on
every declaration of the same function in case the attributes used
in its definition must match on declarations and definitions.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D150731
2023-05-16 15:11:37 -07:00
Siva Chandra Reddy
a5d98be515 [libc][Obvious] Bump hermetic alloc space to 64KB.
Few hermetic tests are failing as they are running out of memory.

Differential Revision: https://reviews.llvm.org/D150724
2023-05-16 21:23:16 +00:00
Siva Chandra Reddy
9d877369b7 [libc] Remove *TestMain libraries and combine them with the main test libraries.
There are not tests currently which use the main test framework but not
the `main` function from LibcTestMain.cpp. So, this change essentially
simplifies by merging the *TestMain libraries with the main test
libraries.

Reviewed By: michaelrj, jhuber6

Differential Revision: https://reviews.llvm.org/D150698
2023-05-16 20:55:36 +00:00
Joseph Huber
64d169c74d [libc][NFC] Simplifly inbox and outbox state handling
Currently we use a template parameter called `InvertInbox` to invert the
inbox when we load it. This is more easily understood as a static check
on whether or not the process running it is the server. Inverting the
inbox makes the states 1 0 and 0 1 own the buffer, so it's easier to
simply say that the server own the buffer if in != out. Also clean up some of
the comments.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D150365
2023-05-16 13:12:00 -05:00
Guillaume Chatelet
893f02c2af [libc] Add optimized memcmp for RISCV
This patch adds two versions of `bcmp` optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_memcmp` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
----------------------------------------------------------------------
Benchmark            Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------
BM_Memcmp/0/0        110 ns         66.4 ns     10404864 bytes_per_cycle=0.107646/s bytes_per_second=153.989M/s items_per_second=15.071M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0        318 ns          211 ns      3026944 bytes_per_cycle=0.131539/s bytes_per_second=188.167M/s items_per_second=4.73691M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0        204 ns          115 ns      6118400 bytes_per_cycle=0.121675/s bytes_per_second=174.058M/s items_per_second=8.70241M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0        143 ns         99.6 ns      7013376 bytes_per_cycle=0.117974/s bytes_per_second=168.763M/s items_per_second=10.0437M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0       81.3 ns         58.2 ns     11426816 bytes_per_cycle=0.101125/s bytes_per_second=144.661M/s items_per_second=17.1805M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0        177 ns          118 ns      5952512 bytes_per_cycle=0.120612/s bytes_per_second=172.537M/s items_per_second=8.45549M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0        342 ns          220 ns      3483648 bytes_per_cycle=0.132004/s bytes_per_second=188.834M/s items_per_second=4.54739M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0        208 ns          130 ns      5681152 bytes_per_cycle=0.12468/s bytes_per_second=178.356M/s items_per_second=7.6674M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0        123 ns         79.1 ns      8387584 bytes_per_cycle=0.110593/s bytes_per_second=158.204M/s items_per_second=12.6439M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0      20707 ns        10643 ns        67584 bytes_per_cycle=0.142401/s bytes_per_second=203.707M/s items_per_second=93.9559k/s __llvm_libc::memcmp,uniform 384 to 4096
```

After
```
BM_Memcmp/0/0       80.4 ns         55.8 ns     12648448 bytes_per_cycle=0.132703/s bytes_per_second=189.834M/s items_per_second=17.9256M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0        140 ns         80.5 ns      8230912 bytes_per_cycle=0.337273/s bytes_per_second=482.474M/s items_per_second=12.4165M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0        101 ns         66.4 ns     10571776 bytes_per_cycle=0.208539/s bytes_per_second=298.317M/s items_per_second=15.0687M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0        118 ns         67.6 ns     10533888 bytes_per_cycle=0.176822/s bytes_per_second=252.946M/s items_per_second=14.7946M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0        106 ns         53.0 ns     12722176 bytes_per_cycle=0.111141/s bytes_per_second=158.988M/s items_per_second=18.8591M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0        141 ns         70.2 ns     10436608 bytes_per_cycle=0.26032/s bytes_per_second=372.39M/s items_per_second=14.2458M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0        144 ns         79.3 ns      8932352 bytes_per_cycle=0.353168/s bytes_per_second=505.211M/s items_per_second=12.612M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0        123 ns         71.7 ns      9945088 bytes_per_cycle=0.22143/s bytes_per_second=316.758M/s items_per_second=13.9421M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0       97.0 ns         56.2 ns     12509184 bytes_per_cycle=0.160526/s bytes_per_second=229.635M/s items_per_second=17.7784M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0       1840 ns          989 ns       676864 bytes_per_cycle=1.4894/s bytes_per_second=2.08067G/s items_per_second=1010.92k/s __llvm_libc::memcmp,uniform 384 to 4096
```

glibc
```
BM_Memcmp/0/0       72.6 ns         51.7 ns     12963840 bytes_per_cycle=0.141261/s bytes_per_second=202.075M/s items_per_second=19.3246M/s glibc::memcmp,memcmp Google A
BM_Memcmp/1/0        118 ns         75.2 ns      9280512 bytes_per_cycle=0.354054/s bytes_per_second=506.478M/s items_per_second=13.3046M/s glibc::memcmp,memcmp Google B
BM_Memcmp/2/0        114 ns         62.9 ns     11152384 bytes_per_cycle=0.222675/s bytes_per_second=318.539M/s items_per_second=15.8943M/s glibc::memcmp,memcmp Google D
BM_Memcmp/3/0       84.0 ns         63.5 ns     11030528 bytes_per_cycle=0.186353/s bytes_per_second=266.581M/s items_per_second=15.7378M/s glibc::memcmp,memcmp Google L
BM_Memcmp/4/0       93.5 ns         51.2 ns     13462528 bytes_per_cycle=0.119215/s bytes_per_second=170.539M/s items_per_second=19.5384M/s glibc::memcmp,memcmp Google M
BM_Memcmp/5/0        123 ns         61.7 ns     11376640 bytes_per_cycle=0.225262/s bytes_per_second=322.239M/s items_per_second=16.1993M/s glibc::memcmp,memcmp Google Q
BM_Memcmp/6/0        122 ns         71.6 ns      9967616 bytes_per_cycle=0.380844/s bytes_per_second=544.802M/s items_per_second=13.9579M/s glibc::memcmp,memcmp Google S
BM_Memcmp/7/0        118 ns         65.6 ns     10555392 bytes_per_cycle=0.238677/s bytes_per_second=341.43M/s items_per_second=15.2334M/s glibc::memcmp,memcmp Google U
BM_Memcmp/8/0       90.4 ns         54.0 ns     12920832 bytes_per_cycle=0.161987/s bytes_per_second=231.724M/s items_per_second=18.5169M/s glibc::memcmp,memcmp Google W
BM_Memcmp/9/0       1045 ns          601 ns      1195008 bytes_per_cycle=2.53677/s bytes_per_second=3.54383G/s items_per_second=1.66423M/s glibc::memcmp,uniform 384 to 4096
```

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D150663
2023-05-16 17:40:26 +00:00
Guillaume Chatelet
7c1f279328 [libc] Add optimized bcmp for RISCV
[libc] Add optimized bcmp for RISCV

This patch adds two versions of bcmp optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Bcmp on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
Load Average: 7.03, 5.98, 3.71
----------------------------------------------------------------------
Benchmark            Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------
BM_Bcmp/0/0        102 ns         60.5 ns     11662336 bytes_per_cycle=0.122696/s bytes_per_second=175.518M/s items_per_second=16.5258M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0        328 ns          172 ns      3737600 bytes_per_cycle=0.15256/s bytes_per_second=218.238M/s items_per_second=5.80575M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0        199 ns         99.7 ns      7019520 bytes_per_cycle=0.141897/s bytes_per_second=202.986M/s items_per_second=10.032M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0        173 ns         86.5 ns      8361984 bytes_per_cycle=0.13863/s bytes_per_second=198.312M/s items_per_second=11.5669M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0        105 ns         51.8 ns     13213696 bytes_per_cycle=0.116399/s bytes_per_second=166.51M/s items_per_second=19.2931M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0        167 ns         93.9 ns      7853056 bytes_per_cycle=0.139432/s bytes_per_second=199.459M/s items_per_second=10.6503M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0        262 ns          165 ns      3931136 bytes_per_cycle=0.151516/s bytes_per_second=216.745M/s items_per_second=6.07091M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0        168 ns          105 ns      6665216 bytes_per_cycle=0.143159/s bytes_per_second=204.791M/s items_per_second=9.52163M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0        108 ns         68.0 ns     10175488 bytes_per_cycle=0.125504/s bytes_per_second=179.535M/s items_per_second=14.701M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0      15371 ns         9007 ns        78848 bytes_per_cycle=0.166128/s bytes_per_second=237.648M/s items_per_second=111.031k/s __llvm_libc::bcmp,uniform 384 to 4096
```

After
```
BM_Bcmp/0/0       74.2 ns         49.7 ns     14306304 bytes_per_cycle=0.148927/s bytes_per_second=213.042M/s items_per_second=20.1101M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0        108 ns         68.1 ns     10350592 bytes_per_cycle=0.411197/s bytes_per_second=588.222M/s items_per_second=14.6849M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0       80.2 ns         56.0 ns     12386304 bytes_per_cycle=0.258588/s bytes_per_second=369.912M/s items_per_second=17.8585M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0       92.4 ns         55.7 ns     12555264 bytes_per_cycle=0.206835/s bytes_per_second=295.88M/s items_per_second=17.943M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0       79.3 ns         46.8 ns     14288896 bytes_per_cycle=0.125872/s bytes_per_second=180.061M/s items_per_second=21.3611M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0       98.0 ns         57.9 ns     12232704 bytes_per_cycle=0.268815/s bytes_per_second=384.543M/s items_per_second=17.2711M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0        132 ns         65.5 ns     10474496 bytes_per_cycle=0.417246/s bytes_per_second=596.875M/s items_per_second=15.2673M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0        101 ns         60.9 ns     11505664 bytes_per_cycle=0.253733/s bytes_per_second=362.968M/s items_per_second=16.4202M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0       72.5 ns         50.2 ns     14082048 bytes_per_cycle=0.183262/s bytes_per_second=262.158M/s items_per_second=19.9271M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0        852 ns          803 ns       854016 bytes_per_cycle=1.85028/s bytes_per_second=2.58481G/s items_per_second=1.24597M/s __llvm_libc::bcmp,uniform 384 to 4096
```

For comparison with glibc
```
BM_Bcmp/0/0        106 ns         52.6 ns     12906496 bytes_per_cycle=0.142072/s bytes_per_second=203.235M/s items_per_second=19.0271M/s glibc::bcmp,memcmp Google A
BM_Bcmp/1/0        132 ns         77.1 ns      8905728 bytes_per_cycle=0.365072/s bytes_per_second=522.239M/s items_per_second=12.9782M/s glibc::bcmp,memcmp Google B
BM_Bcmp/2/0        122 ns         62.3 ns     10909696 bytes_per_cycle=0.222667/s bytes_per_second=318.527M/s items_per_second=16.0563M/s glibc::bcmp,memcmp Google D
BM_Bcmp/3/0       99.5 ns         64.2 ns     11074560 bytes_per_cycle=0.185126/s bytes_per_second=264.825M/s items_per_second=15.5674M/s glibc::bcmp,memcmp Google L
BM_Bcmp/4/0       86.6 ns         50.2 ns     13488128 bytes_per_cycle=0.117941/s bytes_per_second=168.717M/s items_per_second=19.9053M/s glibc::bcmp,memcmp Google M
BM_Bcmp/5/0        106 ns         61.4 ns     11344896 bytes_per_cycle=0.248968/s bytes_per_second=356.151M/s items_per_second=16.284M/s glibc::bcmp,memcmp Google Q
BM_Bcmp/6/0        145 ns         71.9 ns     10046464 bytes_per_cycle=0.389814/s bytes_per_second=557.633M/s items_per_second=13.9019M/s glibc::bcmp,memcmp Google S
BM_Bcmp/7/0        119 ns         65.6 ns     10718208 bytes_per_cycle=0.243756/s bytes_per_second=348.696M/s items_per_second=15.2329M/s glibc::bcmp,memcmp Google U
BM_Bcmp/8/0       86.4 ns         54.5 ns     13250560 bytes_per_cycle=0.154831/s bytes_per_second=221.488M/s items_per_second=18.3532M/s glibc::bcmp,memcmp Google W
BM_Bcmp/9/0       1090 ns          604 ns      1186816 bytes_per_cycle=2.53848/s bytes_per_second=3.54622G/s items_per_second=1.65598M/s glibc::bcmp,uniform 384 to 4096
```

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D150567
2023-05-16 17:28:34 +00:00
Mikhail R. Gadelha
25a2aeb190 Revert "[libc] Add explicit constructor calls to fix compilation when using UInt<T>"
This reverts commit b663993067ffb5800632ad41ea7f2f92caab1093.

This caused a regression on aarch64:
https://lab.llvm.org/buildbot#builders/138/builds/43983
2023-05-16 13:06:37 -03:00
Mikhail R. Gadelha
b663993067 [libc] Add explicit constructor calls to fix compilation when using UInt<T>
This patch is similar to 86fe88c8d9 and adds several explicit
constructor calls (bool(...), uint64_t(...), uint8_t(...)) that are
needed when we use UInt<T> (in my case UInt<128> in riscv32).

This patch also adds two operators to UInt<T>:
* operator/= required by printf_core/float_hex_converter.h:148
* operator-- required by FPUtil/ManipulationFunctions.h:166

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D149594
2023-05-16 12:59:32 -03:00