3534 Commits

Author SHA1 Message Date
Joseph Huber
ba8c96593c
[Clang] Do not implicitly link C libraries for the GPU targets (#109052)
Summary:
I initially thought that it would be convenient to automatically link
these libraries like they are for standard C/C++ targets. However, this
created issues when trying to use C++ as a GPU target. This patch moves
the logic to now implicitly pass it as part of the offloading toolchain
instead, if found. This means that the user needs to set the target
toolchain for the link job to get automatic detection, but linking can
still be done manually via `-Xoffload-linker -lc`.
2024-09-18 06:44:07 -07:00
Зишан Мирза
b9e13045ab
[libc] add ctime and ctime_r to date_and_time documentation (#108665)
closes #108664
2024-09-17 09:50:07 -07:00
Youngsuk Kim
c3d78a7af8
[libc][benchmarks] Tidy uses of raw_string_ostream (NFC)
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly

(See 65b13610a5226b84889b923bae884ba395ad084d for further reference.)

Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
2024-09-17 10:25:18 -05:00
Зишан Мирза
000a3f0a54
[libc][c11] implement ctime (#107285)
This patch implements `ctime` and `ctime_r`.

According to the documentation, `ctime` and `ctime_r` are declared as
follows:

```c
char *ctime(const time_t *timep);
char *ctime_r(const time_t *restrict timep, char buf[restrict 26]);
```
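
As a rough illustration of how the reentrant variant is typically called, here is a minimal sketch. The helper name `format_timestamp` is hypothetical and not part of the patch; it assumes a POSIX host that provides `ctime_r`, and the result depends on the local time zone.

```c
#define _POSIX_C_SOURCE 200809L /* exposes ctime_r on POSIX systems */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not part of the patch): formats `t` into `buf`
 * using ctime_r and strips the trailing newline. Returns the length of
 * the resulting string, or 0 if the conversion failed. */
size_t format_timestamp(time_t t, char buf[26]) {
  assert(buf != NULL);
  if (ctime_r(&t, buf) == NULL)
    return 0;
  size_t len = strlen(buf);
  if (len > 0 && buf[len - 1] == '\n')
    buf[--len] = '\0';
  return len;
}
```

Because the caller owns the 26-byte buffer, this form avoids the shared static storage that plain `ctime` returns.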

closes #86567
2024-09-16 11:27:11 -07:00
Jeff Bailey
50985d23e5
[libc][nfc] Fix typo in header generation message. (#108813)
Fix a typo in the header generation message.

Before:
Generating header from
/home/vscode/llvm-project/llvm/../libc/newhdrgen/yaml/ctype.yaml and
/home/vscode/llvm-project/libc/include/ctype.h.def

After:
Generating header ctype.h from
/home/vscode/llvm-project/llvm/../libc/newhdrgen/yaml/ctype.yaml and
/home/vscode/llvm-project/libc/include/ctype.h.def
2024-09-16 16:53:43 +01:00
Job Henandez Lara
a205a854e0
[libc][math] Improve fmul performance by using double-double arithmetic. (#107517)
```
 Performance tests with inputs in denormal range:
-- My function --
     Total time      : 2731072304 ns 
     Average runtime : 68.2767 ns/op 
     Ops per second  : 14646276 op/s 
-- Other function --
     Total time      : 3259744268 ns 
     Average runtime : 81.4935 ns/op 
     Ops per second  : 12270913 op/s 
-- Average runtime ratio --
     Mine / Other's  : 0.837818 

 Performance tests with inputs in normal range:
-- My function --
     Total time      : 93467258 ns 
     Average runtime : 2.33668 ns/op 
     Ops per second  : 427957777 op/s 
-- Other function --
     Total time      : 637295452 ns 
     Average runtime : 15.9324 ns/op 
     Ops per second  : 62765299 op/s 
-- Average runtime ratio --
     Mine / Other's  : 0.146662 

 Performance tests with inputs in normal range with exponents close to each other:
-- My function --
     Total time      : 95764894 ns 
     Average runtime : 2.39412 ns/op 
     Ops per second  : 417690014 op/s 
-- Other function --
     Total time      : 639866770 ns 
     Average runtime : 15.9967 ns/op 
     Ops per second  : 62513075 op/s 
-- Average runtime ratio --
     Mine / Other's  : 0.149664 
```
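
For readers unfamiliar with double-double arithmetic, the sketch below shows the core building block: an error-free product that returns the rounded result `hi` together with the exact rounding error `lo`. It uses the classic Veltkamp/Dekker formulation, which is an assumption made for illustration; the patch itself may use a different (for example FMA-based) formulation, and the names `double_double`, `split`, and `two_prod` are hypothetical.

```c
#include <assert.h>

/* Unevaluated sum hi + lo that carries more precision than one double. */
typedef struct {
  double hi;
  double lo;
} double_double;

/* Veltkamp split: a == *hi + *lo, with each half short enough that
 * products of halves are exact in double precision. */
static void split(double a, double *hi, double *lo) {
  const double splitter = 134217729.0; /* 2^27 + 1 */
  double c = splitter * a;
  *hi = c - (c - a);
  *lo = a - *hi;
}

/* Dekker product: r.hi is the rounded product a * b and r.lo is the
 * rounding error, so r.hi + r.lo equals a * b exactly (barring
 * overflow and underflow). */
double_double two_prod(double a, double b) {
  /* The sketch does not handle NaN or inputs large enough to overflow
   * the split. */
  assert(a > -0x1p996 && a < 0x1p996);
  assert(b > -0x1p996 && b < 0x1p996);
  double ah, al, bh, bl;
  split(a, &ah, &al);
  split(b, &bh, &bl);
  double_double r;
  r.hi = a * b;
  r.lo = ((ah * bh - r.hi) + ah * bl + al * bh) + al * bl;
  return r;
}
```

With the exact product available as `hi + lo`, a narrowing multiply can decide the final rounding from both parts instead of rounding twice.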

---------

Co-authored-by: Tue Ly <lntue@google.com>
2024-09-14 17:32:22 -04:00
Job Henandez Lara
c0b7f1bb58
[libc][math][c23] add darwin entrypoints for fmul (#108680)
2024-09-14 00:21:32 -04:00
lntue
b659abef48
[libc] Fix vdso VER_FLG_BASE redefinition in overlay mode. (#108628)
2024-09-13 15:06:20 -04:00
Schrodinger ZHU Yifan
82987bd9da
[libc] fix dependency path for vDSO (#108591)
2024-09-13 12:13:27 -04:00
Schrodinger ZHU Yifan
a6438360d4
[libc] fix build issue in overlay mode (#108583)
2024-09-13 11:10:10 -04:00
Schrodinger ZHU Yifan
99fe5954d2
[libc] implement clock_gettime using vDSO (#108458)
supersedes https://github.com/llvm/llvm-project/pull/91805
2024-09-13 10:58:39 -04:00
Sirui Mu
ded080152a
[libc] Add osutils for Windows and make libc and its tests build on Windows target (#104676)
This PR first adds osutils for Windows, and changes some libc code to
make libc and its tests build on the Windows target. It then temporarily
disables some libc tests that are currently problematic on Windows.

Specifically, the changes besides the addition of osutils include:

- Macro `LIBC_TYPES_HAS_FLOAT16` is disabled on Windows. `clang-cl`
generates calls to functions in `compiler-rt` to handle float16
arithmetic and these functions are currently not linked in on Windows.
- Macro `LIBC_TYPES_HAS_INT128` is disabled on Windows.
- The invocation of `::aligned_malloc` is changed to an invocation of
`::_aligned_malloc`.
- The following unit tests are temporarily disabled because they
currently fail on Windows:
  - `test.src.__support.big_int_test`
  - `test.src.__support.arg_list_test`
  - `test.src.fenv.getenv_and_setenv_test`
- Tests involving `__m128i`, `__m256i`, and `__m512i` in
`test.src.string.memory_utils.op_tests.cpp`
- `test_range_errors` in `libc/test/src/math/smoke/AddTest.h` and
`libc/test/src/math/smoke/SubTest.h`
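
The `_aligned_malloc` change above reflects a general portability difference, which the following sketch illustrates. The wrapper names are hypothetical and this is not the code from the PR; it assumes `_aligned_malloc`/`_aligned_free` from `<malloc.h>` on Windows and C11 `aligned_alloc` elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h> /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical portable wrapper; the name is illustrative only. */
void *portable_aligned_alloc(size_t alignment, size_t size) {
  /* The alignment must be a nonzero power of two. */
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef _WIN32
  /* Note the argument order: size first, then alignment. */
  return _aligned_malloc(size, alignment);
#else
  /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
  size_t rounded = (size + alignment - 1) / alignment * alignment;
  return aligned_alloc(alignment, rounded);
#endif
}

/* Memory from _aligned_malloc must be released with _aligned_free. */
void portable_aligned_free(void *ptr) {
#ifdef _WIN32
  _aligned_free(ptr);
#else
  free(ptr);
#endif
}
```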
2024-09-11 23:41:32 -04:00
Joseph Huber
666a3f4ed4
[libc] Stub TLS functions on the GPU temporarily (#108267)
Summary:
There's an extern weak symbol for this; we should factor these into a
more common interface. Stub them temporarily to make the bots happy.
PTXAS does not handle extern weak symbols.
2024-09-11 11:36:07 -07:00
lntue
1896ee3889
[libc] Fix undefined behavior for nan functions. (#106468)
Currently the nan* functions dereference `nullptr` to crash with SIGSEGV
if the input is `nullptr`. Both `nan(nullptr)` and dereferencing
`nullptr` are undefined behavior according to the C standard.
Dereferencing `nullptr` in the `nan` implementation is fine if users
only link against the pre-built library, but the dereference might be
removed entirely by compiler optimizations if libc is built from source
together with the users' code.

See for instance:  https://godbolt.org/z/fd8KcM9bx

This PR uses a volatile load to prevent the undefined behavior when libc
is built without sanitizers, and leaves the current undefined behavior
in place when libc is built with sanitizers, so that it can be caught in
users' code.
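
A minimal sketch of the volatile-load idea follows. The function `nan_sketch` is a hypothetical stand-in, not the libc implementation: it omits payload parsing and only shows how reading through a `volatile`-qualified pointer keeps the compiler from dropping the access.

```c
#include <math.h>

/* Hypothetical stand-in for a nan()-like entry point. */
double nan_sketch(const char *tag) {
  /* Volatile load: the compiler is expected to emit this read, so a
   * null `tag` faults here instead of the access being optimized away. */
  const volatile char *probe = tag;
  char first = *probe;
  (void)first;
  return NAN; /* payload parsing omitted in this sketch */
}
```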
2024-09-11 14:13:31 -04:00
Schrodinger ZHU Yifan
d8e124dffa
[libc] implement vdso (#91572)
2024-09-11 12:51:11 -04:00
Schrodinger ZHU Yifan
779a444009
[libc] fix tls teardown while being used (#108229)
The call chain to `Mutex::lock` can be polluted by the stack protector.
To be completely safe, postpone the main TLS teardown to a separate
phase.

fix #108030
2024-09-11 12:22:35 -04:00
Schrodinger ZHU Yifan
ce9f987295
[libc] fix locale dependency for stdlib (#108042)
Address the following issue:
```
❯ ninja libc.test.src.__support.OSUtil.linux.vdso_test.__unit__
[91/127] Building CXX object libc/test/src/__support/OSUtil/linux/CMakeFiles/libc.test.src.__support.OSUtil.linux.vdso_test.__unit__.__build__.dir/vdso_test.cpp.o
FAILED: libc/test/src/__support/OSUtil/linux/CMakeFiles/libc.test.src.__support.OSUtil.linux.vdso_test.__unit__.__build__.dir/vdso_test.cpp.o 
sccache /usr/bin/clang++ -DLIBC_NAMESPACE=__llvm_libc_20_0_0_git -D_DEBUG -I/home/schrodingerzy/Documents/llvm-project/libc -isystem /home/schrodingerzy/Documents/llvm-project/build/libc/include -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -g -std=gnu++17 -fpie -DLIBC_FULL_BUILD -ffreestanding -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -MD -MT libc/test/src/__support/OSUtil/linux/CMakeFiles/libc.test.src.__support.OSUtil.linux.vdso_test.__unit__.__build__.dir/vdso_test.cpp.o -MF libc/test/src/__support/OSUtil/linux/CMakeFiles/libc.test.src.__support.OSUtil.linux.vdso_test.__unit__.__build__.dir/vdso_test.cpp.o.d -o libc/test/src/__support/OSUtil/linux/CMakeFiles/libc.test.src.__support.OSUtil.linux.vdso_test.__unit__.__build__.dir/vdso_test.cpp.o -c /home/schrodingerzy/Documents/llvm-project/libc/test/src/__support/OSUtil/linux/vdso_test.cpp
In file included from /home/schrodingerzy/Documents/llvm-project/libc/test/src/__support/OSUtil/linux/vdso_test.cpp:21:
In file included from /home/schrodingerzy/Documents/llvm-project/libc/test/UnitTest/ErrnoSetterMatcher.h:13:
In file included from /home/schrodingerzy/Documents/llvm-project/libc/src/__support/FPUtil/fpbits_str.h:12:
In file included from /home/schrodingerzy/Documents/llvm-project/libc/src/__support/CPP/string.h:20:
/home/schrodingerzy/Documents/llvm-project/build/libc/include/stdlib.h:13:10: fatal error: 'llvm-libc-types/locale_t.h' file not found
   13 | #include "llvm-libc-types/locale_t.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[123/127] Building CXX object libc/test/UnitTest/CMakeFiles/LibcTest.unit.dir/LibcTestMain.cpp.o
ninja: build stopped: subcommand failed.
```
2024-09-10 13:04:19 -04:00
lntue
277371943f
[libc][bazel] Update bazel overlay for math functions and their tests. (#107862)
2024-09-09 14:15:46 -04:00
wldfngrs
3d7af093f3
[libc] Add proxy header for the jmp_buf type (#107712)
Added a proxy header for the jmp_buf type, changed all uses of
__jmp_buf * to the typedef alias jmp_buf, and fixed the link to LLVM in
the stack_t.h description.
2024-09-08 20:55:00 -04:00
Rahul Joshi
98563b19c2
[libc][TableGen] Migrate libc-hdrgen backend to use const RecordKeeper (#107542)
Migrate libc-hdrgen backend to use const RecordKeeper
2024-09-07 15:14:07 -07:00
wldfngrs
056a1676cb
[libc] Add proxy header for the stack_t type (#107559)
Added a proxy header for the stack_t type and modified the corresponding
CMakeLists.txt files.
2024-09-07 10:01:33 -04:00
lntue
fc7a893620
[libc] Remove -ffreestanding when building MPFR wrapper. (#107637)
MPFR/GMP headers do not work with the -ffreestanding flag.
2024-09-06 16:54:36 -04:00
lntue
876b0e60fe
[libc] Fix signal's dependency on the proxy header sighandler_t. (#107605)
2024-09-06 16:23:53 -04:00
lntue
80cf21dad1
[libc] Fix unit test compile flags propagation. (#106128)
With this change, I was able to build and test for aarch64 & riscv64 on
an x86-64 host as follows:

Prerequisites:
- cross build toolchain for aarch64
```
$ sudo apt install binutils-aarch64-linux-gnu gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
```
- cross build toolchain for riscv64
```
$ sudo apt install binutils-riscv64-linux-gnu gcc-riscv64-linux-gnu g++-riscv64-linux-gnu
```
- qemu user:
```
$ sudo apt install qemu qemu-user qemu-user-static
```

CMake invocation:
```
$ cmake ../runtimes -GNinja -DLLVM_ENABLE_RUNTIMES=libc -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLIBC_TARGET_TRIPLE=<aarch64-linux-gnu/riscv64-linux-gnu> -DCMAKE_BUILD_TYPE=Release -DLIBC_TEST_COMPILE_OPTIONS_DEFAULT="-static"
$ ninja libc
$ ninja check-libc
```
2024-09-06 11:56:07 -04:00
Vitaly Goldshteyn
66a03295de
[libc] Implement branchless head-tail comparison for bcmp (#107540)
Binary size changes:

| Bytes (cache lines) | before   | after   |
|---------------------|----------|---------|
| sse4                | 419 (7)  | 288 (5) |
| avx                 | 430 (7)  | 308 (5) |
| avx512f             | 589 (10) | 390 (7) |
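
To illustrate what a branchless head-tail comparison looks like, here is a scalar sketch for sizes between 8 and 16 bytes. It is an assumption-laden simplification: the function name is hypothetical, and the actual patch works on the x86 vector paths (sse4, avx, avx512f) rather than 64-bit integer loads. The idea is to load the first and last 8 bytes of each buffer (the two loads may overlap) and combine the differences without branching on the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 8-byte load; memcpy keeps this well-defined in C. */
static uint64_t load64(const unsigned char *p) {
  uint64_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Hypothetical helper: bcmp for 8 <= n <= 16 with no data-dependent
 * branch. Returns 0 if the buffers are equal, nonzero otherwise. */
int bcmp_head_tail_8_to_16(const void *lhs, const void *rhs, size_t n) {
  assert(n >= 8 && n <= 16);
  const unsigned char *a = lhs;
  const unsigned char *b = rhs;
  /* XOR is zero exactly when the loaded words match. */
  uint64_t head = load64(a) ^ load64(b);
  uint64_t tail = load64(a + n - 8) ^ load64(b + n - 8);
  /* OR the two differences so a single test covers the whole range. */
  return (head | tail) != 0;
}
```

Since `bcmp` only has to report equality, not ordering, the two differences can be merged with a bitwise OR instead of being checked one after the other.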

Benchmarks for different CPUs using
https://github.com/google/fleetbench.

 - indus-cascadelake

```
name                                                       old speed            new speed            delta
BM_LIBC_Bcmp_Fleet_L1                                      1.96GB/s ± 1%        2.19GB/s ± 0%  +11.49%  (p=0.000 n=29+24)
BM_LIBC_Bcmp_Fleet_L2                                      1.90GB/s ± 1%        2.14GB/s ± 1%  +12.68%  (p=0.000 n=29+24)
BM_LIBC_Bcmp_Fleet_LLC                                      513MB/s ± 4%         531MB/s ± 4%   +3.53%  (p=0.000 n=24+24)
BM_LIBC_Bcmp_Fleet_Cold                                     452MB/s ± 3%         456MB/s ± 4%     ~     (p=0.103 n=30+30)
BM_LIBC_Bcmp_0_L1                                [Bcmp_0]  2.98GB/s ± 1%        3.15GB/s ± 1%   +5.59%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_0_L2                                [Bcmp_0]  2.86GB/s ± 1%        3.07GB/s ± 1%   +7.21%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_0_LLC                               [Bcmp_0]   738MB/s ± 7%         751MB/s ± 3%   +1.68%  (p=0.000 n=24+25)
BM_LIBC_Bcmp_0_Cold                              [Bcmp_0]   643MB/s ± 3%         642MB/s ± 4%     ~     (p=0.522 n=29+30)
BM_LIBC_Bcmp_1_L1                                [Bcmp_1]  3.08GB/s ± 0%        3.25GB/s ± 0%   +5.35%  (p=0.000 n=28+30)
BM_LIBC_Bcmp_1_L2                                [Bcmp_1]  2.97GB/s ± 1%        3.17GB/s ± 1%   +6.65%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_1_LLC                               [Bcmp_1]   901MB/s ±59%         871MB/s ±36%     ~     (p=0.676 n=29+27)
BM_LIBC_Bcmp_1_Cold                              [Bcmp_1]   686MB/s ± 4%         686MB/s ± 3%     ~     (p=0.934 n=29+30)
BM_LIBC_Bcmp_2_L1                                [Bcmp_2]  1.63GB/s ± 0%        1.80GB/s ± 1%  +10.19%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_2_L2                                [Bcmp_2]  1.57GB/s ± 1%        1.75GB/s ± 1%  +11.46%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_2_LLC                               [Bcmp_2]   451MB/s ±61%         427MB/s ±28%     ~     (p=0.469 n=29+25)
BM_LIBC_Bcmp_2_Cold                              [Bcmp_2]   353MB/s ± 4%         354MB/s ± 5%     ~     (p=0.467 n=30+30)
BM_LIBC_Bcmp_3_L1                                [Bcmp_3]  1.91GB/s ± 1%        2.10GB/s ± 1%   +9.90%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_3_L2                                [Bcmp_3]  1.84GB/s ± 1%        2.03GB/s ± 1%  +10.63%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_LLC                               [Bcmp_3]   491MB/s ±24%         538MB/s ±24%   +9.66%  (p=0.000 n=24+27)
BM_LIBC_Bcmp_3_Cold                              [Bcmp_3]   417MB/s ± 4%         421MB/s ± 3%     ~     (p=0.063 n=30+29)
BM_LIBC_Bcmp_4_L1                                [Bcmp_4]   761MB/s ± 1%         867MB/s ± 1%  +14.02%  (p=0.000 n=28+30)
BM_LIBC_Bcmp_4_L2                                [Bcmp_4]   748MB/s ± 1%         860MB/s ± 1%  +15.04%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_LLC                               [Bcmp_4]   227MB/s ±29%         260MB/s ±64%  +14.70%  (p=0.000 n=26+27)
BM_LIBC_Bcmp_4_Cold                              [Bcmp_4]   187MB/s ± 3%         191MB/s ± 5%   +2.26%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_5_L1                                [Bcmp_5]  1.48GB/s ± 1%        1.71GB/s ± 1%  +15.26%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_5_L2                                [Bcmp_5]  1.42GB/s ± 1%        1.67GB/s ± 1%  +17.68%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_LLC                               [Bcmp_5]   412MB/s ±34%         519MB/s ±80%  +25.87%  (p=0.000 n=27+30)
BM_LIBC_Bcmp_5_Cold                              [Bcmp_5]   336MB/s ± 4%         343MB/s ± 6%   +2.05%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_L1                                [Bcmp_6]  2.87GB/s ± 0%        3.24GB/s ± 1%  +12.88%  (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_L2                                [Bcmp_6]  2.78GB/s ± 1%        3.20GB/s ± 1%  +15.15%  (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_LLC                               [Bcmp_6]   926MB/s ±43%        1227MB/s ±76%  +32.53%  (p=0.000 n=27+30)
BM_LIBC_Bcmp_6_Cold                              [Bcmp_6]   716MB/s ± 4%         737MB/s ± 6%   +3.02%  (p=0.000 n=28+29)
BM_LIBC_Bcmp_7_L1                                [Bcmp_7]  1.54GB/s ± 1%        1.56GB/s ± 0%   +1.40%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_7_L2                                [Bcmp_7]  1.47GB/s ± 1%        1.52GB/s ± 1%   +2.97%  (p=0.000 n=27+30)
BM_LIBC_Bcmp_7_LLC                               [Bcmp_7]   351MB/s ±23%         436MB/s ±83%  +24.04%  (p=0.005 n=24+29)
BM_LIBC_Bcmp_7_Cold                              [Bcmp_7]   283MB/s ± 4%         282MB/s ± 4%     ~     (p=0.644 n=30+30)
BM_LIBC_Bcmp_8_L1                                [Bcmp_8]   824MB/s ± 1%        1048MB/s ± 1%  +27.18%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_8_L2                                [Bcmp_8]   808MB/s ± 1%        1027MB/s ± 1%  +27.12%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_8_LLC                               [Bcmp_8]   317MB/s ±79%         332MB/s ±74%     ~     (p=0.338 n=30+29)
BM_LIBC_Bcmp_8_Cold                              [Bcmp_8]   207MB/s ± 5%         212MB/s ± 5%   +2.27%  (p=0.000 n=30+30)
```

 - indus-skylake

```
name                                                       old speed            new speed            delta
BM_LIBC_Bcmp_Fleet_L1                                      2.06GB/s ± 2%        2.25GB/s ± 3%   +9.66%  (p=0.000 n=27+24)
BM_LIBC_Bcmp_Fleet_L2                                      1.96GB/s ± 2%        2.17GB/s ± 2%  +10.61%  (p=0.000 n=30+24)
BM_LIBC_Bcmp_Fleet_LLC                                     1.18GB/s ± 6%        1.32GB/s ± 5%  +12.27%  (p=0.000 n=28+28)
BM_LIBC_Bcmp_Fleet_Cold                                     456MB/s ± 2%         466MB/s ± 2%   +2.22%  (p=0.000 n=28+28)
BM_LIBC_Bcmp_0_L1                                [Bcmp_0]  3.08GB/s ± 2%        3.20GB/s ± 1%   +3.72%  (p=0.000 n=28+22)
BM_LIBC_Bcmp_0_L2                                [Bcmp_0]  2.92GB/s ± 1%        3.05GB/s ± 2%   +4.49%  (p=0.000 n=23+23)
BM_LIBC_Bcmp_0_LLC                               [Bcmp_0]  1.83GB/s ± 8%        1.94GB/s ± 4%   +6.24%  (p=0.000 n=25+27)
BM_LIBC_Bcmp_0_Cold                              [Bcmp_0]   654MB/s ± 2%         659MB/s ± 2%   +0.76%  (p=0.012 n=30+29)
BM_LIBC_Bcmp_1_L1                                [Bcmp_1]  3.19GB/s ± 2%        3.34GB/s ± 2%   +4.41%  (p=0.000 n=26+23)
BM_LIBC_Bcmp_1_L2                                [Bcmp_1]  3.05GB/s ± 2%        3.21GB/s ± 2%   +5.32%  (p=0.000 n=28+25)
BM_LIBC_Bcmp_1_LLC                               [Bcmp_1]  1.95GB/s ± 4%        2.03GB/s ±10%   +3.61%  (p=0.000 n=27+30)
BM_LIBC_Bcmp_1_Cold                              [Bcmp_1]   700MB/s ± 2%         702MB/s ± 2%     ~     (p=0.150 n=30+30)
BM_LIBC_Bcmp_2_L1                                [Bcmp_2]  1.69GB/s ± 2%        1.85GB/s ± 1%   +9.31%  (p=0.000 n=30+26)
BM_LIBC_Bcmp_2_L2                                [Bcmp_2]  1.60GB/s ± 2%        1.78GB/s ± 2%  +10.90%  (p=0.000 n=26+27)
BM_LIBC_Bcmp_2_LLC                               [Bcmp_2]  1.01GB/s ± 5%        1.12GB/s ± 5%  +11.40%  (p=0.000 n=27+28)
BM_LIBC_Bcmp_2_Cold                              [Bcmp_2]   355MB/s ± 3%         360MB/s ± 3%   +1.46%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_3_L1                                [Bcmp_3]  1.98GB/s ± 2%        2.15GB/s ± 2%   +8.89%  (p=0.000 n=29+27)
BM_LIBC_Bcmp_3_L2                                [Bcmp_3]  1.87GB/s ± 3%        2.05GB/s ± 2%  +10.06%  (p=0.000 n=30+26)
BM_LIBC_Bcmp_3_LLC                               [Bcmp_3]  1.19GB/s ± 4%        1.31GB/s ± 6%   +9.82%  (p=0.000 n=27+29)
BM_LIBC_Bcmp_3_Cold                              [Bcmp_3]   424MB/s ± 3%         431MB/s ± 3%   +1.58%  (p=0.000 n=28+30)
BM_LIBC_Bcmp_4_L1                                [Bcmp_4]   849MB/s ± 2%         949MB/s ± 2%  +11.84%  (p=0.000 n=27+28)
BM_LIBC_Bcmp_4_L2                                [Bcmp_4]   815MB/s ± 3%         913MB/s ± 3%  +12.06%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_4_LLC                               [Bcmp_4]   512MB/s ± 9%         571MB/s ± 7%  +11.40%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_Cold                              [Bcmp_4]   187MB/s ± 3%         192MB/s ± 2%   +2.56%  (p=0.000 n=30+28)
BM_LIBC_Bcmp_5_L1                                [Bcmp_5]  1.55GB/s ± 2%        1.77GB/s ± 3%  +13.93%  (p=0.000 n=30+28)
BM_LIBC_Bcmp_5_L2                                [Bcmp_5]  1.47GB/s ± 2%        1.70GB/s ± 2%  +15.96%  (p=0.000 n=27+26)
BM_LIBC_Bcmp_5_LLC                               [Bcmp_5]   939MB/s ± 5%        1084MB/s ± 4%  +15.36%  (p=0.000 n=28+27)
BM_LIBC_Bcmp_5_Cold                              [Bcmp_5]   340MB/s ± 2%         347MB/s ± 3%   +1.93%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_L1                                [Bcmp_6]  3.06GB/s ± 3%        3.40GB/s ± 2%  +11.13%  (p=0.000 n=30+28)
BM_LIBC_Bcmp_6_L2                                [Bcmp_6]  2.89GB/s ± 3%        3.24GB/s ± 2%  +12.20%  (p=0.000 n=29+26)
BM_LIBC_Bcmp_6_LLC                               [Bcmp_6]  1.93GB/s ± 4%        2.09GB/s ±11%   +8.16%  (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_Cold                              [Bcmp_6]   746MB/s ± 2%         762MB/s ± 2%   +2.11%  (p=0.000 n=30+28)
BM_LIBC_Bcmp_7_L1                                [Bcmp_7]  1.59GB/s ± 2%        1.62GB/s ± 2%   +1.72%  (p=0.000 n=25+27)
BM_LIBC_Bcmp_7_L2                                [Bcmp_7]  1.49GB/s ± 2%        1.53GB/s ± 2%   +2.62%  (p=0.000 n=27+29)
BM_LIBC_Bcmp_7_LLC                               [Bcmp_7]   852MB/s ±10%         909MB/s ± 6%   +6.71%  (p=0.000 n=30+29)
BM_LIBC_Bcmp_7_Cold                              [Bcmp_7]   283MB/s ± 3%         283MB/s ± 2%     ~     (p=0.617 n=30+27)
BM_LIBC_Bcmp_8_L1                                [Bcmp_8]   891MB/s ± 2%        1083MB/s ± 2%  +21.64%  (p=0.000 n=27+24)
BM_LIBC_Bcmp_8_L2                                [Bcmp_8]   855MB/s ± 2%        1045MB/s ± 1%  +22.31%  (p=0.000 n=25+23)
BM_LIBC_Bcmp_8_LLC                               [Bcmp_8]   568MB/s ± 7%         659MB/s ± 8%  +16.04%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_8_Cold                              [Bcmp_8]   207MB/s ± 2%         212MB/s ± 2%   +2.31%  (p=0.000 n=30+27)
```

 - arcadia-rome

```
name                                                       old speed            new speed            delta
BM_LIBC_Bcmp_Fleet_L1                                      2.16GB/s ± 2%        2.27GB/s ± 2%   +5.13%  (p=0.000 n=26+30)
BM_LIBC_Bcmp_Fleet_L2                                      2.15GB/s ± 2%        2.25GB/s ± 2%   +4.64%  (p=0.000 n=27+30)
BM_LIBC_Bcmp_Fleet_LLC                                     1.73GB/s ± 3%        1.81GB/s ± 3%   +4.66%  (p=0.000 n=25+28)
BM_LIBC_Bcmp_Fleet_Cold                                     494MB/s ± 1%         496MB/s ± 2%   +0.45%  (p=0.023 n=22+24)
BM_LIBC_Bcmp_0_L1                                [Bcmp_0]  3.30GB/s ± 1%        3.24GB/s ± 2%   -1.70%  (p=0.000 n=27+30)
BM_LIBC_Bcmp_0_L2                                [Bcmp_0]  3.23GB/s ± 2%        3.19GB/s ± 2%   -1.28%  (p=0.000 n=28+28)
BM_LIBC_Bcmp_0_LLC                               [Bcmp_0]  2.59GB/s ± 3%        2.58GB/s ± 2%   -0.65%  (p=0.010 n=26+26)
BM_LIBC_Bcmp_0_Cold                              [Bcmp_0]   720MB/s ± 1%         707MB/s ± 3%   -1.75%  (p=0.000 n=22+25)
BM_LIBC_Bcmp_1_L1                                [Bcmp_1]  3.37GB/s ± 1%        3.36GB/s ± 2%     ~     (p=0.102 n=28+29)
BM_LIBC_Bcmp_1_L2                                [Bcmp_1]  3.32GB/s ± 2%        3.30GB/s ± 2%   -0.51%  (p=0.038 n=28+29)
BM_LIBC_Bcmp_1_LLC                               [Bcmp_1]  2.67GB/s ± 4%        2.70GB/s ± 4%   +0.96%  (p=0.009 n=28+27)
BM_LIBC_Bcmp_1_Cold                              [Bcmp_1]   755MB/s ± 1%         751MB/s ± 2%   -0.57%  (p=0.000 n=22+25)
BM_LIBC_Bcmp_2_L1                                [Bcmp_2]  1.79GB/s ± 1%        1.86GB/s ± 2%   +3.92%  (p=0.000 n=27+29)
BM_LIBC_Bcmp_2_L2                                [Bcmp_2]  1.77GB/s ± 2%        1.82GB/s ± 2%   +2.99%  (p=0.000 n=28+29)
BM_LIBC_Bcmp_2_LLC                               [Bcmp_2]  1.41GB/s ± 4%        1.47GB/s ± 3%   +3.97%  (p=0.000 n=28+28)
BM_LIBC_Bcmp_2_Cold                              [Bcmp_2]   386MB/s ± 1%         389MB/s ± 1%   +0.60%  (p=0.000 n=21+23)
BM_LIBC_Bcmp_3_L1                                [Bcmp_3]  2.07GB/s ± 2%        2.17GB/s ± 2%   +4.87%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_L2                                [Bcmp_3]  2.07GB/s ± 2%        2.13GB/s ± 2%   +3.02%  (p=0.000 n=28+30)
BM_LIBC_Bcmp_3_LLC                               [Bcmp_3]  1.66GB/s ± 2%        1.73GB/s ± 2%   +4.08%  (p=0.000 n=29+26)
BM_LIBC_Bcmp_3_Cold                              [Bcmp_3]   466MB/s ± 2%         469MB/s ± 3%   +0.66%  (p=0.001 n=22+25)
BM_LIBC_Bcmp_4_L1                                [Bcmp_4]   861MB/s ± 1%         964MB/s ± 2%  +11.98%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_4_L2                                [Bcmp_4]   853MB/s ± 2%         935MB/s ± 2%   +9.54%  (p=0.000 n=28+29)
BM_LIBC_Bcmp_4_LLC                               [Bcmp_4]   707MB/s ± 3%         743MB/s ± 4%   +5.08%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_4_Cold                              [Bcmp_4]   199MB/s ± 3%         199MB/s ± 2%     ~     (p=0.107 n=29+25)
BM_LIBC_Bcmp_5_L1                                [Bcmp_5]  1.65GB/s ± 1%        1.75GB/s ± 2%   +6.15%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_L2                                [Bcmp_5]  1.64GB/s ± 3%        1.73GB/s ± 2%   +5.37%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_LLC                               [Bcmp_5]  1.32GB/s ± 2%        1.40GB/s ± 2%   +6.21%  (p=0.000 n=28+27)
BM_LIBC_Bcmp_5_Cold                              [Bcmp_5]   370MB/s ± 3%         371MB/s ± 2%   +0.16%  (p=0.008 n=29+25)
BM_LIBC_Bcmp_6_L1                                [Bcmp_6]  3.25GB/s ± 2%        3.47GB/s ± 2%   +6.74%  (p=0.000 n=28+29)
BM_LIBC_Bcmp_6_L2                                [Bcmp_6]  3.26GB/s ± 1%        3.44GB/s ± 1%   +5.43%  (p=0.000 n=28+29)
BM_LIBC_Bcmp_6_LLC                               [Bcmp_6]  2.66GB/s ± 2%        2.79GB/s ± 3%   +4.90%  (p=0.000 n=27+29)
BM_LIBC_Bcmp_6_Cold                              [Bcmp_6]   812MB/s ± 3%         799MB/s ± 2%   -1.57%  (p=0.000 n=29+25)
BM_LIBC_Bcmp_7_L1                                [Bcmp_7]  1.71GB/s ± 2%        1.66GB/s ± 2%   -3.14%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_7_L2                                [Bcmp_7]  1.63GB/s ± 2%        1.59GB/s ± 2%   -2.50%  (p=0.000 n=29+28)
BM_LIBC_Bcmp_7_LLC                               [Bcmp_7]  1.25GB/s ± 4%        1.25GB/s ± 2%     ~     (p=0.530 n=28+26)
BM_LIBC_Bcmp_7_Cold                              [Bcmp_7]   311MB/s ± 3%         308MB/s ± 1%     ~     (p=0.127 n=29+24)
BM_LIBC_Bcmp_8_L1                                [Bcmp_8]   869MB/s ± 2%        1098MB/s ± 2%  +26.28%  (p=0.000 n=27+29)
BM_LIBC_Bcmp_8_L2                                [Bcmp_8]   873MB/s ± 2%        1075MB/s ± 1%  +23.06%  (p=0.000 n=27+29)
BM_LIBC_Bcmp_8_LLC                               [Bcmp_8]   743MB/s ± 4%         859MB/s ± 4%  +15.58%  (p=0.000 n=27+27)
BM_LIBC_Bcmp_8_Cold                              [Bcmp_8]   221MB/s ± 4%         221MB/s ± 3%   +0.14%  (p=0.034 n=29+25)
```

 - ixion-haswell

```
name                                                       old speed            new speed            delta
BM_LIBC_Bcmp_Fleet_L1                                      2.27GB/s ± 5%        2.41GB/s ± 6%   +6.10%  (p=0.000 n=29+28)
BM_LIBC_Bcmp_Fleet_L2                                      2.14GB/s ± 6%        2.33GB/s ± 5%   +9.21%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_Fleet_LLC                                     1.30GB/s ± 9%        1.43GB/s ± 8%   +9.85%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_Fleet_Cold                                     475MB/s ± 6%         475MB/s ± 5%     ~     (p=0.839 n=30+29)
BM_LIBC_Bcmp_0_L1                                [Bcmp_0]  3.38GB/s ± 7%        3.46GB/s ± 6%   +2.35%  (p=0.009 n=30+29)
BM_LIBC_Bcmp_0_L2                                [Bcmp_0]  3.20GB/s ± 5%        3.32GB/s ± 6%   +3.52%  (p=0.000 n=28+30)
BM_LIBC_Bcmp_0_LLC                               [Bcmp_0]  1.88GB/s ± 9%        2.00GB/s ± 6%   +6.63%  (p=0.000 n=30+28)
BM_LIBC_Bcmp_0_Cold                              [Bcmp_0]   664MB/s ± 6%         655MB/s ± 6%   -1.32%  (p=0.025 n=30+30)
BM_LIBC_Bcmp_1_L1                                [Bcmp_1]  3.50GB/s ± 8%        3.61GB/s ±10%   +3.09%  (p=0.001 n=29+30)
BM_LIBC_Bcmp_1_L2                                [Bcmp_1]  3.32GB/s ± 7%        3.48GB/s ± 8%   +4.89%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_1_LLC                               [Bcmp_1]  2.02GB/s ± 7%        2.14GB/s ± 9%   +5.82%  (p=0.000 n=28+29)
BM_LIBC_Bcmp_1_Cold                              [Bcmp_1]   716MB/s ± 6%         709MB/s ± 5%   -0.97%  (p=0.040 n=30+28)
BM_LIBC_Bcmp_2_L1                                [Bcmp_2]  1.83GB/s ± 7%        1.97GB/s ± 8%   +7.90%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_2_L2                                [Bcmp_2]  1.74GB/s ± 6%        1.92GB/s ± 6%  +10.29%  (p=0.000 n=30+29)
BM_LIBC_Bcmp_2_LLC                               [Bcmp_2]  1.05GB/s ± 9%        1.15GB/s ± 9%   +9.73%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_2_Cold                              [Bcmp_2]   379MB/s ± 6%         372MB/s ± 6%   -1.74%  (p=0.012 n=30+30)
BM_LIBC_Bcmp_3_L1                                [Bcmp_3]  2.17GB/s ± 5%        2.29GB/s ± 6%   +5.61%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_L2                                [Bcmp_3]  2.02GB/s ± 6%        2.20GB/s ± 6%   +8.75%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_LLC                               [Bcmp_3]  1.22GB/s ± 8%        1.34GB/s ± 9%   +9.19%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_3_Cold                              [Bcmp_3]   447MB/s ± 3%         441MB/s ± 7%   -1.40%  (p=0.033 n=30+30)
BM_LIBC_Bcmp_4_L1                                [Bcmp_4]   902MB/s ± 6%         995MB/s ±10%  +10.37%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_L2                                [Bcmp_4]   863MB/s ± 5%         945MB/s ±11%   +9.50%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_4_LLC                               [Bcmp_4]   528MB/s ±11%         559MB/s ±12%   +5.75%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_Cold                              [Bcmp_4]   183MB/s ± 4%         181MB/s ± 7%     ~     (p=0.088 n=28+30)
BM_LIBC_Bcmp_5_L1                                [Bcmp_5]  1.70GB/s ± 6%        1.87GB/s ± 8%  +10.14%  (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_L2                                [Bcmp_5]  1.60GB/s ± 5%        1.80GB/s ± 9%  +12.61%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_5_LLC                               [Bcmp_5]   994MB/s ±13%        1094MB/s ± 8%  +10.10%  (p=0.000 n=29+30)
BM_LIBC_Bcmp_5_Cold                              [Bcmp_5]   362MB/s ± 6%         358MB/s ± 7%     ~     (p=0.123 n=30+30)
BM_LIBC_Bcmp_6_L1                                [Bcmp_6]  3.31GB/s ± 5%        3.67GB/s ± 6%  +10.90%  (p=0.000 n=28+30)
BM_LIBC_Bcmp_6_L2                                [Bcmp_6]  3.11GB/s ± 5%        3.53GB/s ± 5%  +13.59%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_LLC                               [Bcmp_6]  1.98GB/s ± 9%        2.18GB/s ± 8%  +10.34%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_Cold                              [Bcmp_6]   754MB/s ± 5%         752MB/s ± 5%     ~     (p=0.592 n=30+30)
BM_LIBC_Bcmp_7_L1                                [Bcmp_7]  1.72GB/s ± 5%        1.72GB/s ± 6%     ~     (p=0.549 n=29+29)
BM_LIBC_Bcmp_7_L2                                [Bcmp_7]  1.61GB/s ± 7%        1.63GB/s ± 8%     ~     (p=0.191 n=30+29)
BM_LIBC_Bcmp_7_LLC                               [Bcmp_7]   913MB/s ± 8%         905MB/s ± 9%     ~     (p=0.423 n=30+30)
BM_LIBC_Bcmp_7_Cold                              [Bcmp_7]   304MB/s ± 6%         287MB/s ± 4%   -5.57%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_L1                                [Bcmp_8]   961MB/s ± 5%        1124MB/s ± 6%  +16.94%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_L2                                [Bcmp_8]   915MB/s ± 8%        1100MB/s ± 7%  +20.16%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_LLC                               [Bcmp_8]   593MB/s ± 8%         669MB/s ± 8%  +12.92%  (p=0.000 n=30+30)
BM_LIBC_Bcmp_8_Cold                              [Bcmp_8]   220MB/s ± 4%         220MB/s ± 6%     ~     (p=0.572 n=30+30)
```

Co-authored-by: goldvitaly@google.com <%username%@google.com>
2024-09-06 11:19:01 +02:00
wldfngrs
73514f6831
[libc] Add proxy header for __sighandler_t type (#107354)
Added proxy headers for the __sighandler_t type and modified the
corresponding CMakeLists.txt files and test files.
2024-09-05 18:04:35 -04:00
Michael Jones
8e28f0471b
[libc] Correct the entrypoints list for ARM/darwin (#107331)
These entrypoints were added to every target without testing. They don't
work on ARM Macs.
2024-09-05 09:38:05 -07:00
Petr Hosek
8b77aa990b
[libc] Use correct names for locale variants in spec.td (#106806)
This addresses an issue introduced in #105718.
2024-08-30 15:13:23 -07:00
Joseph Huber
5c019bdb7a
[libc] Add support for 'string.h' locale variants (#105719)
Summary:
This adds the locale variants of the string functions. As previously,
these do not use the locale information at all and simply copy the
non-locale version which expects the "C" locale.
2024-08-29 14:20:15 -05:00
Joseph Huber
a87105121d
[libc] Implement locale variants for 'stdlib.h' functions (#105718)
Summary:
This provides the `_l` variants for the `stdlib.h` functions. These are
just copies of the same entrypoint and don't do anything with the locale
information.
2024-08-29 14:18:37 -05:00
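The `_l` pattern described in #105718 above can be illustrated with a minimal sketch. The names `sketch_locale_t` and `sketch_strtol_l` are hypothetical and are not the LLVM libc entrypoints; the sketch only shows a locale-taking variant that ignores its locale argument and behaves like the plain function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for locale_t; the real type comes from <locale.h>. */
struct sketch_locale;
typedef struct sketch_locale *sketch_locale_t;

/* Hypothetical `_l` variant: accepts a locale but never reads it, so the
   result is whatever the non-locale function returns. */
static long sketch_strtol_l(const char *str, char **str_end, int base,
                            sketch_locale_t locale) {
  assert(str != NULL);
  (void)locale; /* intentionally unused */
  return strtol(str, str_end, base);
}
```

Passing a null locale is fine in this sketch because the argument is never dereferenced.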
Job Henandez Lara
1ace91f925
[libc][math] Add performance tests for fmul and fmull. (#106262) 2024-08-29 14:14:18 -04:00
Guillaume Chatelet
73ef397fcb
[libc][x86] Use prefetch for write for memcpy (#90450)
Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we
prefetch memory for read on the source buffer. This patch adds prefetch
for write on the destination buffer.
2024-08-29 14:17:23 +02:00
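As a rough illustration of the idea in #90450 above, the sketch below issues a read prefetch on the source and a write prefetch on the destination while copying in blocks. It is not the LLVM libc implementation: `copy_with_prefetch`, `BLOCK_SIZE`, and `PREFETCH_DISTANCE` are hypothetical, and it assumes a GCC or Clang compiler that provides `__builtin_prefetch(addr, rw, locality)`, where `rw` is 0 for read and 1 for write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning constants, not taken from the LLVM libc sources. */
enum { BLOCK_SIZE = 64, PREFETCH_DISTANCE = 256 };

/* Copies `count` bytes block by block. Buffers must not overlap. */
static void *copy_with_prefetch(void *dst, const void *src, size_t count) {
  assert(dst != NULL && src != NULL);
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t offset = 0;
  while (count - offset >= BLOCK_SIZE) {
    /* Only prefetch addresses that are still inside both buffers. */
    if (count - offset > PREFETCH_DISTANCE) {
      __builtin_prefetch(s + offset + PREFETCH_DISTANCE, 0, 3); /* read */
      __builtin_prefetch(d + offset + PREFETCH_DISTANCE, 1, 3); /* write */
    }
    memcpy(d + offset, s + offset, BLOCK_SIZE);
    offset += BLOCK_SIZE;
  }
  /* Copy the tail that is smaller than one block. */
  memcpy(d + offset, s + offset, count - offset);
  return dst;
}
```

The prefetches are hints only; whether they help depends on the buffer sizes and the target microarchitecture.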
Joseph Huber
439d7de14d [libc] Disable failing scanf test on AMDGPU temporarily
Summary:
This test currently fails in the `amdgpu-attributor` pass. I haven't
figured out anything beyond that yet as it's difficult to reduce.
2024-08-28 07:04:15 -05:00
Joseph Huber
8fd9ec5817 [libc] Fix incorrect check for NVPTX backend 2024-08-28 07:04:15 -05:00
Michael Jones
bfc7540e15
[libc] Fix file collision causing test flake (#106119)
In patch #105293, tests for vfscanf were added, meant to be identical to
the fscanf tests. Unfortunately, the author forgot to rename the target
file, causing an occasional test flake where one test writes to the file
while the other is trying to read it. This patch fixes the issue by
renaming the target test file for the vfscanf test.
2024-08-26 12:04:24 -07:00
Joseph Huber
b8f134faba
[libc] Implement 'vfscanf' and 'vscanf' (#105293)
Summary:
These are simply forwarding the vlist to the existing helper.
2024-08-26 09:00:10 -05:00
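The relationship between the variadic and `va_list` forms mentioned in #105293 above can be sketched from the caller's side. The wrapper `scan_from` is hypothetical and is not the internal helper the commit refers to; it only shows a variadic front end that packs its arguments into a `va_list` and forwards them to the standard `vfscanf`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical variadic wrapper that forwards its arguments to vfscanf. */
static int scan_from(FILE *stream, const char *format, ...) {
  assert(stream != NULL && format != NULL);
  va_list args;
  va_start(args, format);
  /* The va_list form does the actual parsing. */
  int converted = vfscanf(stream, format, args);
  va_end(args);
  return converted;
}
```

The return value is whatever `vfscanf` reports, that is, the number of input items successfully assigned, or `EOF` on an input failure before the first conversion.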
Joseph Huber
c2a96a243b [libc] Fix locale structs with old headergen 2024-08-22 13:51:54 -05:00
Joseph Huber
856dadb33c [libc] Add ctype.h locale variants (#102711)
Summary:
This patch adds all the libc ctype variants. These ignore the locale
information completely, so they're pretty much just stubs. Because these
use locale information, which is system scope, we do not enable building
them outside of full build mode.
2024-08-22 13:51:54 -05:00
Joseph Huber
518b1f0283 [libc] Fix leftover thread local 2024-08-22 13:09:56 -05:00
Joseph Huber
78d8ab2ab9
[libc] Initial support for 'locale.h' in the LLVM libc (#102689)
Summary:
This patch adds the macros and entrypoints associated with the
`locale.h` header. These are mostly stubs, as we (for now and the
foreseeable future) only expect to support the C and maybe C.UTF-8
locales in the LLVM libc.
2024-08-22 12:58:46 -05:00
Joseph Huber
2f4232db0b Revert "[libc] Add ctype.h locale variants (#102711)"
This reverts commit 8f005f8306dc52577b3b9482d271fb463f0152a5.
2024-08-22 12:45:16 -05:00
Joseph Huber
8f005f8306
[libc] Add ctype.h locale variants (#102711)
Summary:
This patch adds all the libc ctype variants. These ignore the locale
information completely, so they're pretty much just stubs. Because these
use locale information, which is system scope, we do not enable building
them outside of full build mode.
2024-08-22 12:41:20 -05:00
Joseph Huber
2b66417d08 [libc] Fix accidentally using system file on GPU
Summary:
Forgot to delete this
2024-08-21 20:08:55 -05:00
Joseph Huber
6b98a72365
[libc] Add scanf support to the GPU build (#104812)
Summary:
The `scanf` function has a "system file" configuration, which is pretty
much what the GPU implementation does at this point. So we should be
able to use it in much the same way.
2024-08-21 18:02:04 -05:00
Michael Jones
b89fef8f67
[libc][docs] Update docs to reflect new headergen (#102381)
Since new headergen is now the default for building LLVM-libc, the docs
need to be updated to reflect that. While I was editing those docs, I
took a quick pass at updating other out-of-date pages.
2024-08-21 10:50:39 -07:00
Michael Jones
a3c66c8f35
[libc] move newheadergen back to safe_load (#105374)
In #100024 we moved from safe_load to load for reading the yaml in
newheadergen due to dependency issues. Those should be resolved by now
so this should be a simple safety improvement.
2024-08-20 14:22:10 -07:00
Michael Jones
2353f484a5
[libc] Include startup code when installing all (#105203)
Previously the libc startup code was marked `EXCLUDE_FROM_ALL` due to
build issues. This patch removes that as no longer necessary.
2024-08-20 13:54:09 -07:00
lntue
54c6b93bcb
[libc][NFC] Add sollya script to compute worst case range reduction. (#104803) 2024-08-19 17:58:46 -04:00
jameshu15869
deb6b45c32
[libc][gpu] Add Atan2 Benchmarks (#104708)
This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and
`__ocml_atan2_f64()` using the same setup as `sin()`. This PR also adds
support for throughput benchmarking for functions with 2 inputs.
2024-08-18 12:50:30 -05:00
Joseph Huber
5c13f9aea2
[libc] Add single threaded kernel attributes to AMDGPU startup utility (#104651)
Summary:
I fixed the errors here recently, so I can actually use these. This
shouldn't impact much; it should just make the generated code slightly
better.
2024-08-18 12:50:15 -05:00