Guillaume Chatelet
bc4f3e31a9
[libc][NFC] Selectively disable GCC warnings (#78462)
2024-01-18 10:36:21 +01:00
AtariDreams
e06b5a2435
[libc] Give more functions restrict qualifiers (NFC) (#78061)
strsep, strtok_r, strlcpy, and strlcat take restricted pointers as
parameters.
Add the restrict qualifiers to them.
Sources:
https://man7.org/linux/man-pages/man3/strsep.3.html
https://man7.org/linux/man-pages/man3/strtok_r.3.html
https://man.freebsd.org/cgi/man.cgi?strlcpy
2024-01-15 12:12:09 -06:00
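To illustrate what a restrict-qualified signature promises, here is a minimal sketch of a strlcpy-style function. The name `sketch_strlcpy` is hypothetical and this is not LLVM libc's implementation. C spells the qualifier `restrict`; C++ has no standard equivalent, so the sketch uses `__restrict`, which GCC and Clang accept as an extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical strlcpy-style copy. `__restrict` tells the compiler that `dst`
// and `src` do not alias, so it need not reload from `src` after storing to `dst`.
std::size_t sketch_strlcpy(char *__restrict dst, const char *__restrict src,
                           std::size_t size) {
  const std::size_t src_len = std::strlen(src);
  if (size != 0) {
    // Copy at most size - 1 bytes and always NUL-terminate.
    const std::size_t n = src_len < size - 1 ? src_len : size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
  }
  // Like strlcpy, return the length of the string it tried to create,
  // so a result >= size signals truncation.
  return src_len;
}
```

Calling it with overlapping `dst` and `src` would break the `__restrict` promise and is undefined, which matches the contract documented for these functions.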
Guillaume Chatelet
5794854213
[libc][NFC] Use 16-byte indices for _mmXXX_shuffle_epi8 (#77781)
This is less confusing, since the shuffle only uses the lower 4 bits of
each index byte to select a source byte.
2024-01-11 16:25:55 +01:00
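The following scalar model shows why indices in the 0..15 range are enough. It follows the documented per-lane behaviour of `_mm_shuffle_epi8` (PSHUFB): if bit 7 of an index byte is set, the output byte is zero; otherwise only the low 4 bits pick a source byte. For the 256-bit AVX2 form the shuffle works independently inside each 128-bit lane, so the same 16-byte view applies per lane. `sketch_shuffle_epi8` is a hypothetical name, not an LLVM libc function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of one 128-bit lane of PSHUFB.
Bytes16 sketch_shuffle_epi8(const Bytes16 &src, const Bytes16 &indices) {
  Bytes16 out{};
  for (std::size_t i = 0; i < 16; ++i) {
    // Bit 7 set: zero the output byte. Otherwise bits 4..6 are ignored.
    out[i] = (indices[i] & 0x80) ? std::uint8_t{0}
                                 : src[indices[i] & 0x0F];
  }
  return out;
}
```

Under this model an index of 17 (`0x11`) selects the same byte as 1, so writing 1 states the intent directly.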
Guillaume Chatelet
9ca6e5bb86
[libc] Fix buggy AVX2 / AVX512 memcmp (#77081)
Fixes #77080.
2024-01-11 11:45:37 +01:00
Nick Desaulniers
1689bbea17
[libc] fix up #77384
2024-01-08 16:18:31 -08:00
Nick Desaulniers
6958986f77
[libc] fix -Wconversion (#77384)
Fixes the following from GCC:
llvm-project/libc/src/string/memory_utils/op_x86.h:236:24: error: conversion from ‘long unsigned int’ to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
236 | return (xored >> 32) | (xored & 0xFFFFFFFF);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
Link: https://lab.llvm.org/buildbot/#/builders/250/builds/16236/steps/8/logs/stdio
Link: https://github.com/llvm/llvm-project/pull/74506
2024-01-08 16:08:22 -08:00
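Judging from the quoted line, the expression folds a 64-bit XOR difference into 32 bits so that the result is zero exactly when the 64-bit value is zero. One way to satisfy `-Wconversion` is to make the intended truncation explicit. This is a sketch under that assumption, with the hypothetical name `sketch_fold_xor64`; it is not necessarily the committed change.

```cpp
#include <cassert>
#include <cstdint>

// Folds the high and low halves together. The OR is non-zero iff `xored` is
// non-zero, and the explicit cast documents the 64 -> 32 bit truncation.
std::uint32_t sketch_fold_xor64(std::uint64_t xored) {
  return static_cast<std::uint32_t>((xored >> 32) | (xored & 0xFFFFFFFF));
}
```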
Nick Desaulniers
5352ce32fc
[libc] fix -Warray-bounds in block_offset (#77001)
GCC reports an instance of -Warray-bounds in block_offset. Reimplement
block_offset in terms of memcpy_inline, which was created to avoid this
diagnostic. See the linked issue for the full trace of the diagnostic.
Fixes: https://github.com/llvm/llvm-project/issues/76877
2024-01-05 08:19:04 -08:00
Guillaume Chatelet
64671dbebc
[libc] Remove unnecessary call in memfunction dispatchers (#75800)
Before this patch, the compiler could generate unnecessary calls to the
selected implementation instead of inlining it.
https://clang.llvm.org/docs/AttributeReference.html#flatten
2023-12-19 13:57:44 +01:00
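The `flatten` attribute linked above asks the compiler to inline every call made inside the annotated function. A minimal sketch of a dispatcher using it, with hypothetical names (`sketch_inline_memcpy`, `sketch_memcpy_dispatch`) rather than LLVM libc's actual dispatch code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical "selected implementation" that the dispatcher forwards to.
static inline void sketch_inline_memcpy(char *dst, const char *src,
                                        std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// [[gnu::flatten]] (GCC and Clang) inlines all calls in this body, so the
// dispatcher should not emit a separate call to sketch_inline_memcpy.
[[gnu::flatten]] void *sketch_memcpy_dispatch(void *dst, const void *src,
                                              std::size_t n) {
  sketch_inline_memcpy(static_cast<char *>(dst),
                       static_cast<const char *>(src), n);
  return dst;
}
```

The attribute affects code generation only; the observable behaviour of the function is unchanged.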
Guillaume Chatelet
1d89478830
[reland][libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939) (#74446)
Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h`
that was still using `deferred_static_assert`.
2023-12-05 11:35:13 +01:00
Guillaume Chatelet
de7fdc5b54
Revert "[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead" (#74444)
Reverts llvm/llvm-project#73939
This broke the libc-aarch64-ubuntu buildbot:
https://lab.llvm.org/buildbot/#/builders/138/builds/56186
2023-12-05 11:25:39 +01:00
Guillaume Chatelet
b140948850
[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939)
2023-12-05 11:21:07 +01:00
Guillaume Chatelet
8628ca29aa
[libc] Fix UB in memory utils (#74295)
The [standard](https://eel.is/c++draft/expr.add#4.3) makes it undefined
behavior to form a pointer outside the bounds of an array object (other
than one past its end), even if that pointer is never read from or
written to. This patch makes sure that we don't do pointer arithmetic
that produces such invalid pointers.
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
2023-12-04 10:57:35 +01:00
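A minimal sketch of the idea, using a hypothetical block-copy loop (`sketch_copy_blocks`, `kBlockSize`) rather than the code the patch touched: all bounds checks are done on integer offsets, so no pointer beyond one past the end of the buffer is ever formed. A pointer-based check such as `dst + offset + kBlockSize <= dst + count` could create an out-of-range pointer on the last iteration, which is exactly what [expr.add] forbids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kBlockSize = 16; // hypothetical block size

// Copies `count` bytes in kBlockSize chunks plus a tail. `offset` never
// exceeds `count`, so `dst + offset` is at most one past the end.
void sketch_copy_blocks(char *dst, const char *src, std::size_t count) {
  std::size_t offset = 0;
  // `count - offset` cannot underflow because offset <= count.
  for (; count - offset >= kBlockSize; offset += kBlockSize)
    std::memcpy(dst + offset, src + offset, kBlockSize);
  std::memcpy(dst + offset, src + offset, count - offset); // tail
}
```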
Guillaume Chatelet
e2a37e5130
[libc][NFC] Fix missing LIBC_INLINE + style (#73659)
2023-11-29 10:37:54 +01:00
doshimili
3153aa4c95
[libc] Adding a version of memset with software prefetching (#70857)
Software prefetching helps recover performance when hardware prefetching
is disabled. The `LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING`
compile-time option lets users opt into this implementation.
2023-11-10 10:56:16 +01:00
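A rough sketch of what software prefetching in a memset loop can look like, using the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The function name, the 64-byte line size and the 256-byte look-ahead distance are assumptions for illustration; this is not the code behind `LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kLineSize = 64;          // assumed cache-line size
constexpr std::size_t kPrefetchDistance = 256; // hypothetical look-ahead

void *sketch_memset_prefetch(void *dst, int value, std::size_t count) {
  char *p = static_cast<char *>(dst);
  std::size_t offset = 0;
  for (; count - offset >= kLineSize; offset += kLineSize) {
    // Hint that a line further ahead will soon be written (rw = 1) and should
    // stay in all cache levels (locality = 3). Only prefetch inside the buffer.
    if (count - offset > kPrefetchDistance)
      __builtin_prefetch(p + offset + kPrefetchDistance, 1, 3);
    std::memset(p + offset, value, kLineSize);
  }
  std::memset(p + offset, value, count - offset); // tail, < kLineSize bytes
  return dst;
}
```

Prefetching is only a hint, so the result is identical with or without it; the benefit depends on the hardware prefetcher being disabled or ineffective, as the commit message notes.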
Dmitry Vyukov
d275277544
[libc] Optimize memcpy size thresholds (#70049)
Adjust boundary conditions for sizes = 16/32/64.
See the added comment for explanations.
Results on a machine with AVX2, where the affected sizes are 64 and 128:
```
│ baseline │ adjusted │
│ sec/op │ sec/op vs base │
memcpy/Google_A 5.701n ± 0% 5.551n ± 1% -2.63% (n=100)
memcpy/Google_B 3.817n ± 0% 3.776n ± 0% -1.07% (p=0.000 n=100)
memcpy/Google_D 11.35n ± 1% 11.32n ± 0% ~ (p=0.066 n=100)
memcpy/Google_U 3.874n ± 1% 3.821n ± 1% -1.37% (p=0.001 n=100)
memcpy/64 3.843n ± 0% 3.105n ± 3% -19.22% (n=50)
memcpy/128 4.842n ± 0% 3.818n ± 0% -21.15% (p=0.000 n=50)
```
2023-11-07 08:37:19 +01:00
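The commit message defers to a code comment for the details, so the following is a hedged sketch under the assumption that small sizes are copied with two possibly overlapping fixed-size blocks (head and tail). `sketch_small_memcpy` and `head_tail_copy` are hypothetical names. It shows why an inclusive upper bound matters: with `count <= 64`, a 64-byte copy is exactly two 32-byte copies instead of the first case of the next, more expensive bracket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Valid for kBlock <= count <= 2 * kBlock: copy the first kBlock bytes and the
// last kBlock bytes; the two copies overlap when count < 2 * kBlock.
template <std::size_t kBlock>
void head_tail_copy(char *dst, const char *src, std::size_t count) {
  std::memcpy(dst, src, kBlock);
  std::memcpy(dst + count - kBlock, src + count - kBlock, kBlock);
}

// Hypothetical small-size dispatcher for count <= 128. Each bracket is
// inclusive at its upper end, so 8, 16, 32 and 64 take the smaller bracket.
void sketch_small_memcpy(char *dst, const char *src, std::size_t count) {
  if (count < 4) {
    for (std::size_t i = 0; i < count; ++i) dst[i] = src[i];
    return;
  }
  if (count <= 8) return head_tail_copy<4>(dst, src, count);
  if (count <= 16) return head_tail_copy<8>(dst, src, count);
  if (count <= 32) return head_tail_copy<16>(dst, src, count);
  if (count <= 64) return head_tail_copy<32>(dst, src, count);
  return head_tail_copy<64>(dst, src, count); // assumes count <= 128
}
```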
Guillaume Chatelet
bdac972071
Fix load64_aligned (#71391)
Fixes #64758: `load64_aligned` was missing a case for `alignment == 6`.
2023-11-06 14:59:26 +01:00
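A sketch of why the `alignment == 6` case needs its own decomposition, assuming "alignment" means the address modulo 8. When the address is 6 mod 8, a 64-bit value can be assembled from only naturally aligned pieces: 2 bytes, then 4 bytes (now 8-aligned), then 2 bytes. The function names are hypothetical, and the shifts assume a little-endian host; this is not LLVM libc's `load64_aligned`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a T from an address the caller guarantees is aligned for T
// (memcpy keeps the sketch free of strict-aliasing problems).
template <typename T> T load_piece(const unsigned char *p) {
  T value;
  std::memcpy(&value, p, sizeof(T));
  return value;
}

// Assumes a little-endian host and an address that is 6 mod 8.
std::uint64_t sketch_load64_misaligned_by_6(const unsigned char *p) {
  assert(reinterpret_cast<std::uintptr_t>(p) % 8 == 6);
  const std::uint64_t lo = load_piece<std::uint16_t>(p);      // bytes 0..1
  const std::uint64_t mid = load_piece<std::uint32_t>(p + 2); // bytes 2..5
  const std::uint64_t hi = load_piece<std::uint16_t>(p + 6);  // bytes 6..7
  return lo | (mid << 16) | (hi << 48);
}
```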
Dmitry Vyukov
0e110fb429
[libc] memmove optimizations (#70043)
1. Remove the is_disjoint check for smaller sizes and reduce code bloat.
inline_memmove can handle some small sizes as efficiently as
inline_memcpy, so for these sizes the is_disjoint check can be skipped.
This both avoids extra code on the most frequent smaller sizes and
removes code bloat (we don't need the memcpy logic for small sizes).
This relies heavily on inlining and dead code elimination: the first
inline_memmove should reduce to handling of small sizes only, and the
second inline_memmove and inline_memcpy should reduce to handling of
larger sizes only.
2. Use the memcpy thresholds for memmove.
The memcpy thresholds were tuned more carefully, and this matters more
now that the memmove path is always used for small sizes.
3. Fix boundary conditions for sizes = 16/32/64.
See the added comment for explanations.
The memmove function size drops from 885 to 715 bytes
due to the removed duplication.
```
│ baseline │ small-size │
│ sec/op │ sec/op vs base │
memmove/Google_A 3.208n ± 0% 2.911n ± 0% -9.25% (n=100)
memmove/Google_B 4.113n ± 1% 3.428n ± 0% -16.65% (n=100)
memmove/Google_D 5.838n ± 0% 4.158n ± 0% -28.78% (n=100)
memmove/Google_S 4.712n ± 1% 3.899n ± 0% -17.25% (n=100)
memmove/Google_U 3.609n ± 0% 3.247n ± 1% -10.02% (n=100)
memmove/0 2.982n ± 0% 2.169n ± 0% -27.26% (n=50)
memmove/1 3.253n ± 0% 2.168n ± 0% -33.34% (n=50)
memmove/2 3.255n ± 0% 2.169n ± 0% -33.38% (n=50)
memmove/3 3.259n ± 2% 2.175n ± 0% -33.27% (p=0.000 n=50)
memmove/4 3.259n ± 0% 2.168n ± 5% -33.46% (p=0.000 n=50)
memmove/5 2.488n ± 0% 1.926n ± 0% -22.57% (p=0.000 n=50)
memmove/6 2.490n ± 0% 1.928n ± 0% -22.59% (p=0.000 n=50)
memmove/7 2.492n ± 0% 1.927n ± 0% -22.65% (p=0.000 n=50)
memmove/8 2.737n ± 0% 2.711n ± 0% -0.97% (p=0.000 n=50)
memmove/9 2.736n ± 0% 2.711n ± 0% -0.94% (p=0.000 n=50)
memmove/10 2.739n ± 0% 2.711n ± 0% -1.04% (p=0.000 n=50)
memmove/11 2.740n ± 0% 2.711n ± 0% -1.07% (p=0.000 n=50)
memmove/12 2.740n ± 0% 2.711n ± 0% -1.09% (p=0.000 n=50)
memmove/13 2.744n ± 0% 2.711n ± 0% -1.22% (p=0.000 n=50)
memmove/14 2.742n ± 0% 2.711n ± 0% -1.14% (p=0.000 n=50)
memmove/15 2.742n ± 0% 2.711n ± 0% -1.15% (p=0.000 n=50)
memmove/16 2.997n ± 0% 2.981n ± 0% -0.52% (p=0.000 n=50)
memmove/17 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/18 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/19 2.999n ± 0% 2.982n ± 0% -0.59% (p=0.000 n=50)
memmove/20 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50)
memmove/21 3.000n ± 0% 2.981n ± 0% -0.61% (p=0.000 n=50)
memmove/22 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50)
memmove/23 3.002n ± 0% 2.981n ± 0% -0.67% (p=0.000 n=50)
memmove/24 3.002n ± 0% 2.981n ± 0% -0.70% (n=50)
memmove/25 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50)
memmove/26 3.004n ± 0% 2.982n ± 0% -0.74% (p=0.000 n=50)
memmove/27 3.005n ± 0% 2.981n ± 0% -0.79% (n=50)
memmove/28 3.005n ± 0% 2.982n ± 0% -0.77% (n=50)
memmove/29 3.009n ± 0% 2.981n ± 0% -0.92% (n=50)
memmove/30 3.008n ± 0% 2.981n ± 0% -0.89% (n=50)
memmove/31 3.007n ± 0% 2.982n ± 0% -0.86% (n=50)
memmove/32 3.540n ± 0% 2.998n ± 0% -15.31% (p=0.000 n=50)
memmove/33 3.544n ± 0% 2.997n ± 0% -15.44% (p=0.000 n=50)
memmove/34 3.546n ± 0% 2.999n ± 0% -15.42% (n=50)
memmove/35 3.545n ± 0% 2.999n ± 0% -15.40% (n=50)
memmove/36 3.548n ± 0% 2.998n ± 0% -15.52% (p=0.000 n=50)
memmove/37 3.546n ± 0% 3.000n ± 0% -15.41% (n=50)
memmove/38 3.549n ± 0% 2.999n ± 0% -15.49% (p=0.000 n=50)
memmove/39 3.549n ± 0% 2.999n ± 0% -15.48% (p=0.000 n=50)
memmove/40 3.549n ± 0% 3.000n ± 0% -15.46% (p=0.000 n=50)
memmove/41 3.550n ± 0% 3.001n ± 0% -15.47% (n=50)
memmove/42 3.549n ± 0% 3.001n ± 0% -15.43% (n=50)
memmove/43 3.552n ± 0% 3.001n ± 0% -15.52% (p=0.000 n=50)
memmove/44 3.552n ± 0% 3.001n ± 0% -15.51% (n=50)
memmove/45 3.552n ± 0% 3.002n ± 0% -15.48% (n=50)
memmove/46 3.554n ± 0% 3.001n ± 0% -15.55% (p=0.000 n=50)
memmove/47 3.556n ± 0% 3.002n ± 0% -15.58% (p=0.000 n=50)
memmove/48 3.555n ± 0% 3.003n ± 0% -15.54% (n=50)
memmove/49 3.557n ± 0% 3.002n ± 0% -15.59% (p=0.000 n=50)
memmove/50 3.557n ± 0% 3.004n ± 0% -15.55% (p=0.000 n=50)
memmove/51 3.556n ± 0% 3.004n ± 0% -15.53% (p=0.000 n=50)
memmove/52 3.561n ± 0% 3.004n ± 0% -15.65% (p=0.000 n=50)
memmove/53 3.558n ± 0% 3.004n ± 0% -15.57% (p=0.000 n=50)
memmove/54 3.561n ± 0% 3.005n ± 0% -15.62% (n=50)
memmove/55 3.560n ± 0% 3.006n ± 0% -15.57% (n=50)
memmove/56 3.562n ± 0% 3.006n ± 0% -15.60% (p=0.000 n=50)
memmove/57 3.563n ± 0% 3.006n ± 0% -15.64% (n=50)
memmove/58 3.565n ± 0% 3.007n ± 0% -15.64% (p=0.000 n=50)
memmove/59 3.564n ± 0% 3.006n ± 0% -15.66% (p=0.000 n=50)
memmove/60 3.570n ± 0% 3.008n ± 0% -15.74% (p=0.000 n=50)
memmove/61 3.566n ± 0% 3.009n ± 0% -15.63% (p=0.000 n=50)
memmove/62 3.567n ± 0% 3.007n ± 0% -15.70% (p=0.000 n=50)
memmove/63 3.568n ± 0% 3.008n ± 0% -15.71% (p=0.000 n=50)
memmove/64 4.104n ± 0% 3.008n ± 0% -26.70% (p=0.000 n=50)
memmove/65 4.126n ± 0% 3.662n ± 0% -11.26% (p=0.000 n=50)
memmove/66 4.128n ± 0% 3.662n ± 0% -11.29% (n=50)
memmove/67 4.129n ± 0% 3.662n ± 0% -11.31% (n=50)
memmove/68 4.129n ± 0% 3.661n ± 0% -11.33% (p=0.000 n=50)
memmove/69 4.130n ± 0% 3.662n ± 0% -11.34% (p=0.000 n=50)
memmove/70 4.130n ± 0% 3.662n ± 0% -11.33% (n=50)
memmove/71 4.132n ± 0% 3.662n ± 0% -11.38% (p=0.000 n=50)
memmove/72 4.131n ± 0% 3.661n ± 0% -11.39% (n=50)
memmove/73 4.135n ± 0% 3.661n ± 0% -11.45% (p=0.000 n=50)
memmove/74 4.137n ± 0% 3.662n ± 0% -11.49% (n=50)
memmove/75 4.138n ± 0% 3.662n ± 0% -11.51% (p=0.000 n=50)
memmove/76 4.139n ± 0% 3.661n ± 0% -11.56% (p=0.000 n=50)
memmove/77 4.136n ± 0% 3.662n ± 0% -11.47% (p=0.000 n=50)
memmove/78 4.143n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/79 4.142n ± 0% 3.661n ± 0% -11.60% (n=50)
memmove/80 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/81 4.140n ± 0% 3.661n ± 0% -11.57% (n=50)
memmove/82 4.146n ± 0% 3.661n ± 0% -11.69% (n=50)
memmove/83 4.143n ± 0% 3.661n ± 0% -11.63% (p=0.000 n=50)
memmove/84 4.143n ± 0% 3.661n ± 0% -11.63% (n=50)
memmove/85 4.147n ± 0% 3.661n ± 0% -11.73% (p=0.000 n=50)
memmove/86 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50)
memmove/87 4.147n ± 0% 3.661n ± 0% -11.72% (p=0.000 n=50)
memmove/88 4.148n ± 0% 3.661n ± 0% -11.74% (n=50)
memmove/89 4.152n ± 0% 3.661n ± 0% -11.84% (n=50)
memmove/90 4.151n ± 0% 3.661n ± 0% -11.81% (n=50)
memmove/91 4.150n ± 0% 3.661n ± 0% -11.78% (n=50)
memmove/92 4.153n ± 0% 3.661n ± 0% -11.86% (n=50)
memmove/93 4.158n ± 0% 3.661n ± 0% -11.95% (n=50)
memmove/94 4.157n ± 0% 3.661n ± 0% -11.95% (p=0.000 n=50)
memmove/95 4.155n ± 0% 3.661n ± 0% -11.90% (p=0.000 n=50)
memmove/96 4.149n ± 0% 3.660n ± 0% -11.79% (n=50)
memmove/97 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/98 4.157n ± 0% 3.661n ± 0% -11.94% (n=50)
memmove/99 4.168n ± 0% 3.661n ± 0% -12.17% (p=0.000 n=50)
memmove/100 4.159n ± 0% 3.660n ± 0% -12.00% (p=0.000 n=50)
memmove/101 4.161n ± 0% 3.660n ± 0% -12.03% (p=0.000 n=50)
memmove/102 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/103 4.164n ± 0% 3.661n ± 0% -12.08% (n=50)
memmove/104 4.164n ± 0% 3.660n ± 0% -12.11% (n=50)
memmove/105 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50)
memmove/106 4.166n ± 0% 3.660n ± 0% -12.15% (n=50)
memmove/107 4.171n ± 0% 3.660n ± 1% -12.26% (p=0.000 n=50)
memmove/108 4.173n ± 0% 3.660n ± 0% -12.30% (p=0.000 n=50)
memmove/109 4.170n ± 0% 3.660n ± 0% -12.24% (n=50)
memmove/110 4.174n ± 0% 3.660n ± 0% -12.31% (n=50)
memmove/111 4.176n ± 0% 3.660n ± 0% -12.35% (p=0.000 n=50)
memmove/112 4.174n ± 0% 3.659n ± 0% -12.34% (p=0.000 n=50)
memmove/113 4.176n ± 0% 3.660n ± 0% -12.35% (n=50)
memmove/114 4.182n ± 0% 3.660n ± 0% -12.49% (n=50)
memmove/115 4.185n ± 0% 3.660n ± 0% -12.55% (n=50)
memmove/116 4.184n ± 0% 3.659n ± 0% -12.54% (n=50)
memmove/117 4.182n ± 0% 3.660n ± 0% -12.50% (n=50)
memmove/118 4.188n ± 0% 3.660n ± 0% -12.61% (n=50)
memmove/119 4.186n ± 0% 3.660n ± 0% -12.57% (p=0.000 n=50)
memmove/120 4.189n ± 0% 3.659n ± 0% -12.63% (n=50)
memmove/121 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/122 4.186n ± 0% 3.660n ± 0% -12.58% (n=50)
memmove/123 4.187n ± 0% 3.660n ± 0% -12.60% (n=50)
memmove/124 4.189n ± 0% 3.659n ± 0% -12.65% (n=50)
memmove/125 4.195n ± 0% 3.659n ± 0% -12.78% (n=50)
memmove/126 4.197n ± 0% 3.659n ± 0% -12.81% (n=50)
memmove/127 4.194n ± 0% 3.659n ± 0% -12.75% (n=50)
memmove/128 5.035n ± 0% 3.659n ± 0% -27.32% (n=50)
memmove/129 5.127n ± 0% 5.164n ± 0% +0.73% (p=0.000 n=50)
memmove/130 5.130n ± 0% 5.176n ± 0% +0.88% (p=0.000 n=50)
memmove/131 5.127n ± 0% 5.180n ± 0% +1.05% (p=0.000 n=50)
memmove/132 5.131n ± 0% 5.169n ± 0% +0.75% (p=0.000 n=50)
memmove/133 5.137n ± 0% 5.179n ± 0% +0.81% (p=0.000 n=50)
memmove/134 5.140n ± 0% 5.178n ± 0% +0.74% (p=0.000 n=50)
memmove/135 5.141n ± 0% 5.187n ± 0% +0.88% (p=0.000 n=50)
memmove/136 5.133n ± 0% 5.184n ± 0% +0.99% (p=0.000 n=50)
memmove/137 5.148n ± 0% 5.186n ± 0% +0.73% (p=0.000 n=50)
memmove/138 5.143n ± 0% 5.189n ± 0% +0.88% (p=0.000 n=50)
memmove/139 5.142n ± 0% 5.192n ± 0% +0.97% (p=0.000 n=50)
memmove/140 5.141n ± 0% 5.192n ± 0% +1.01% (p=0.000 n=50)
memmove/141 5.155n ± 0% 5.188n ± 0% +0.64% (p=0.000 n=50)
memmove/142 5.146n ± 0% 5.192n ± 0% +0.90% (p=0.000 n=50)
memmove/143 5.142n ± 0% 5.203n ± 0% +1.19% (p=0.000 n=50)
memmove/144 5.146n ± 0% 5.197n ± 0% +0.99% (p=0.000 n=50)
memmove/145 5.146n ± 0% 5.196n ± 0% +0.97% (p=0.000 n=50)
memmove/146 5.151n ± 0% 5.207n ± 0% +1.10% (p=0.000 n=50)
memmove/147 5.151n ± 0% 5.205n ± 0% +1.06% (p=0.000 n=50)
memmove/148 5.156n ± 0% 5.190n ± 0% +0.66% (p=0.000 n=50)
memmove/149 5.158n ± 0% 5.212n ± 0% +1.04% (p=0.000 n=50)
memmove/150 5.160n ± 0% 5.203n ± 0% +0.84% (p=0.000 n=50)
memmove/151 5.167n ± 0% 5.210n ± 0% +0.83% (p=0.000 n=50)
memmove/152 5.157n ± 0% 5.206n ± 0% +0.94% (p=0.000 n=50)
memmove/153 5.170n ± 0% 5.211n ± 0% +0.80% (p=0.000 n=50)
memmove/154 5.169n ± 0% 5.222n ± 0% +1.02% (p=0.000 n=50)
memmove/155 5.171n ± 0% 5.215n ± 0% +0.87% (p=0.000 n=50)
memmove/156 5.174n ± 0% 5.214n ± 0% +0.78% (p=0.000 n=50)
memmove/157 5.171n ± 0% 5.218n ± 0% +0.92% (p=0.000 n=50)
memmove/158 5.168n ± 0% 5.224n ± 0% +1.09% (p=0.000 n=50)
memmove/159 5.179n ± 0% 5.218n ± 0% +0.76% (p=0.000 n=50)
memmove/160 5.170n ± 0% 5.219n ± 0% +0.95% (p=0.000 n=50)
memmove/161 5.187n ± 0% 5.220n ± 0% +0.64% (p=0.000 n=50)
memmove/162 5.189n ± 0% 5.234n ± 0% +0.86% (p=0.000 n=50)
memmove/163 5.199n ± 0% 5.250n ± 0% +0.99% (p=0.000 n=50)
memmove/164 5.205n ± 0% 5.260n ± 0% +1.04% (p=0.000 n=50)
memmove/165 5.208n ± 0% 5.261n ± 0% +1.01% (p=0.000 n=50)
memmove/166 5.227n ± 0% 5.275n ± 0% +0.91% (p=0.000 n=50)
memmove/167 5.233n ± 0% 5.281n ± 0% +0.92% (p=0.000 n=50)
memmove/168 5.236n ± 0% 5.295n ± 0% +1.12% (p=0.000 n=50)
memmove/169 5.256n ± 0% 5.297n ± 0% +0.79% (p=0.000 n=50)
memmove/170 5.259n ± 0% 5.302n ± 0% +0.80% (p=0.000 n=50)
memmove/171 5.269n ± 0% 5.321n ± 0% +0.97% (p=0.000 n=50)
memmove/172 5.266n ± 0% 5.318n ± 0% +0.98% (p=0.000 n=50)
memmove/173 5.272n ± 0% 5.330n ± 0% +1.09% (p=0.000 n=50)
memmove/174 5.284n ± 0% 5.331n ± 0% +0.89% (p=0.000 n=50)
memmove/175 5.284n ± 0% 5.322n ± 0% +0.72% (p=0.000 n=50)
memmove/176 5.298n ± 0% 5.337n ± 0% +0.74% (p=0.000 n=50)
memmove/177 5.282n ± 0% 5.338n ± 0% +1.04% (p=0.000 n=50)
memmove/178 5.299n ± 0% 5.337n ± 0% +0.71% (p=0.000 n=50)
memmove/179 5.296n ± 0% 5.343n ± 0% +0.88% (p=0.000 n=50)
memmove/180 5.292n ± 0% 5.343n ± 0% +0.97% (p=0.000 n=50)
memmove/181 5.303n ± 0% 5.335n ± 0% +0.60% (p=0.000 n=50)
memmove/182 5.305n ± 0% 5.338n ± 0% +0.62% (p=0.000 n=50)
memmove/183 5.298n ± 0% 5.329n ± 0% +0.59% (p=0.000 n=50)
memmove/184 5.299n ± 0% 5.333n ± 0% +0.64% (p=0.000 n=50)
memmove/185 5.291n ± 0% 5.330n ± 0% +0.73% (p=0.000 n=50)
memmove/186 5.296n ± 0% 5.332n ± 0% +0.68% (p=0.000 n=50)
memmove/187 5.297n ± 0% 5.320n ± 0% +0.44% (p=0.000 n=50)
memmove/188 5.286n ± 0% 5.314n ± 0% +0.53% (p=0.000 n=50)
memmove/189 5.293n ± 0% 5.318n ± 0% +0.46% (p=0.000 n=50)
memmove/190 5.294n ± 0% 5.318n ± 0% +0.45% (p=0.000 n=50)
memmove/191 5.292n ± 0% 5.314n ± 0% +0.40% (p=0.032 n=50)
memmove/192 5.272n ± 0% 5.304n ± 0% +0.60% (p=0.000 n=50)
memmove/193 5.279n ± 0% 5.310n ± 0% +0.57% (p=0.000 n=50)
memmove/194 5.294n ± 0% 5.308n ± 0% +0.26% (p=0.018 n=50)
memmove/195 5.302n ± 0% 5.311n ± 0% +0.18% (p=0.010 n=50)
memmove/196 5.301n ± 0% 5.316n ± 0% +0.28% (p=0.023 n=50)
memmove/197 5.302n ± 0% 5.327n ± 0% +0.47% (p=0.000 n=50)
memmove/198 5.310n ± 0% 5.326n ± 0% +0.30% (p=0.003 n=50)
memmove/199 5.303n ± 0% 5.319n ± 0% +0.30% (p=0.009 n=50)
memmove/200 5.312n ± 0% 5.330n ± 0% +0.35% (p=0.001 n=50)
memmove/201 5.307n ± 0% 5.333n ± 0% +0.50% (p=0.000 n=50)
memmove/202 5.311n ± 0% 5.334n ± 0% +0.44% (p=0.000 n=50)
memmove/203 5.313n ± 0% 5.335n ± 0% +0.41% (p=0.006 n=50)
memmove/204 5.312n ± 0% 5.332n ± 0% +0.36% (p=0.002 n=50)
memmove/205 5.318n ± 0% 5.345n ± 0% +0.50% (p=0.000 n=50)
memmove/206 5.311n ± 0% 5.333n ± 0% +0.42% (p=0.002 n=50)
memmove/207 5.310n ± 0% 5.338n ± 0% +0.52% (p=0.000 n=50)
memmove/208 5.319n ± 0% 5.341n ± 0% +0.40% (p=0.004 n=50)
memmove/209 5.330n ± 0% 5.346n ± 0% +0.30% (p=0.004 n=50)
memmove/210 5.329n ± 0% 5.349n ± 0% +0.38% (p=0.002 n=50)
memmove/211 5.318n ± 0% 5.340n ± 0% +0.41% (p=0.000 n=50)
memmove/212 5.339n ± 0% 5.343n ± 0% ~ (p=0.396 n=50)
memmove/213 5.329n ± 0% 5.343n ± 0% +0.25% (p=0.017 n=50)
memmove/214 5.339n ± 0% 5.358n ± 0% +0.35% (p=0.035 n=50)
memmove/215 5.342n ± 0% 5.346n ± 0% ~ (p=0.063 n=50)
memmove/216 5.338n ± 0% 5.359n ± 0% +0.39% (p=0.002 n=50)
memmove/217 5.341n ± 0% 5.362n ± 0% +0.39% (p=0.015 n=50)
memmove/218 5.354n ± 0% 5.373n ± 0% +0.36% (p=0.041 n=50)
memmove/219 5.352n ± 0% 5.362n ± 0% ~ (p=0.143 n=50)
memmove/220 5.344n ± 0% 5.370n ± 0% +0.50% (p=0.001 n=50)
memmove/221 5.345n ± 0% 5.373n ± 0% +0.53% (p=0.000 n=50)
memmove/222 5.348n ± 0% 5.360n ± 0% +0.23% (p=0.014 n=50)
memmove/223 5.354n ± 0% 5.377n ± 0% +0.43% (p=0.024 n=50)
memmove/224 5.352n ± 0% 5.363n ± 0% ~ (p=0.052 n=50)
memmove/225 5.372n ± 0% 5.380n ± 0% ~ (p=0.481 n=50)
memmove/226 5.368n ± 0% 5.386n ± 0% +0.34% (p=0.004 n=50)
memmove/227 5.386n ± 0% 5.402n ± 0% +0.29% (p=0.028 n=50)
memmove/228 5.400n ± 0% 5.408n ± 0% ~ (p=0.174 n=50)
memmove/229 5.423n ± 0% 5.427n ± 0% ~ (p=0.444 n=50)
memmove/230 5.411n ± 0% 5.429n ± 0% +0.33% (p=0.020 n=50)
memmove/231 5.420n ± 0% 5.433n ± 0% +0.24% (p=0.034 n=50)
memmove/232 5.435n ± 0% 5.441n ± 0% ~ (p=0.235 n=50)
memmove/233 5.446n ± 0% 5.462n ± 0% ~ (p=0.590 n=50)
memmove/234 5.467n ± 0% 5.461n ± 0% ~ (p=0.921 n=50)
memmove/235 5.472n ± 0% 5.478n ± 0% ~ (p=0.883 n=50)
memmove/236 5.466n ± 0% 5.478n ± 0% ~ (p=0.324 n=50)
memmove/237 5.471n ± 0% 5.489n ± 0% ~ (p=0.132 n=50)
memmove/238 5.485n ± 0% 5.489n ± 0% ~ (p=0.460 n=50)
memmove/239 5.484n ± 0% 5.488n ± 0% ~ (p=0.833 n=50)
memmove/240 5.483n ± 0% 5.495n ± 0% ~ (p=0.095 n=50)
memmove/241 5.498n ± 0% 5.514n ± 0% ~ (p=0.077 n=50)
memmove/242 5.518n ± 0% 5.517n ± 0% ~ (p=0.481 n=50)
memmove/243 5.514n ± 0% 5.511n ± 0% ~ (p=0.503 n=50)
memmove/244 5.510n ± 0% 5.497n ± 0% -0.24% (p=0.038 n=50)
memmove/245 5.516n ± 0% 5.505n ± 0% ~ (p=0.317 n=50)
memmove/246 5.513n ± 1% 5.494n ± 0% ~ (p=0.147 n=50)
memmove/247 5.518n ± 0% 5.499n ± 0% -0.36% (p=0.011 n=50)
memmove/248 5.503n ± 0% 5.492n ± 0% ~ (p=0.267 n=50)
memmove/249 5.498n ± 0% 5.497n ± 0% ~ (p=0.765 n=50)
memmove/250 5.485n ± 0% 5.493n ± 0% ~ (p=0.348 n=50)
memmove/251 5.503n ± 0% 5.482n ± 0% -0.37% (p=0.013 n=50)
memmove/252 5.497n ± 0% 5.485n ± 0% ~ (p=0.077 n=50)
memmove/253 5.489n ± 0% 5.496n ± 0% ~ (p=0.850 n=50)
memmove/254 5.497n ± 0% 5.491n ± 0% ~ (p=0.548 n=50)
memmove/255 5.484n ± 1% 5.494n ± 0% ~ (p=0.888 n=50)
memmove/256 6.952n ± 0% 7.676n ± 0% +10.41% (p=0.000 n=50)
geomean 4.406n 4.127n -6.33%
```
2023-10-26 13:40:25 +02:00
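A simplified sketch of point 1, with hypothetical names (`sketch_memmove`, `head_tail_move`) and thresholds that are assumptions rather than the tuned values: for small sizes both the head and the tail block are loaded before anything is stored, so overlapping buffers are handled without any disjointness check. Only larger sizes pay for the overlap test and then pick memcpy, a forward copy, or a backward copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overlap-safe for kBlock <= count <= 2 * kBlock: both blocks are read into
// locals before anything is written to dst.
template <std::size_t kBlock>
void head_tail_move(char *dst, const char *src, std::size_t count) {
  unsigned char head[kBlock], tail[kBlock];
  std::memcpy(head, src, kBlock);
  std::memcpy(tail, src + count - kBlock, kBlock);
  std::memcpy(dst, head, kBlock);
  std::memcpy(dst + count - kBlock, tail, kBlock);
}

void *sketch_memmove(void *dst_v, const void *src_v, std::size_t count) {
  char *dst = static_cast<char *>(dst_v);
  const char *src = static_cast<const char *>(src_v);
  // Small sizes: no disjointness check needed.
  if (count == 0) return dst_v;
  if (count == 1) { dst[0] = src[0]; return dst_v; }
  if (count <= 4) { head_tail_move<2>(dst, src, count); return dst_v; }
  if (count <= 8) { head_tail_move<4>(dst, src, count); return dst_v; }
  if (count <= 16) { head_tail_move<8>(dst, src, count); return dst_v; }
  if (count <= 32) { head_tail_move<16>(dst, src, count); return dst_v; }
  // Larger sizes: only now pay for the overlap check.
  const std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
  const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
  if ((d > s ? d - s : s - d) >= count) {
    std::memcpy(dst, src, count); // disjoint: plain memcpy is safe
  } else if (d < s) {
    for (std::size_t i = 0; i < count; ++i) dst[i] = src[i]; // forward
  } else {
    for (std::size_t i = count; i-- > 0;) dst[i] = src[i]; // backward
  }
  return dst_v;
}
```

In a real implementation the large-size path would be vectorized; the byte loops here only keep the sketch short.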
Dmitry Vyukov
f364a7a8b4
[libc] Speed up memmove overlapping check (#70017)
Use a cheaper overlap check that requires fewer instructions.
Current code:
```
1b704: 48 39 f7 cmp %rsi,%rdi
1b707: 48 89 f0 mov %rsi,%rax
1b70a: 48 0f 47 c7 cmova %rdi,%rax
1b70e: 48 89 f9 mov %rdi,%rcx
1b711: 48 0f 47 ce cmova %rsi,%rcx
1b715: 48 01 d1 add %rdx,%rcx
1b718: 48 39 c1 cmp %rax,%rcx
```
New code:
```
1b704: 48 89 f8 mov %rdi,%rax
1b707: 48 29 f0 sub %rsi,%rax
1b70a: 48 89 c1 mov %rax,%rcx
1b70d: 48 f7 d9 neg %rcx
1b710: 48 0f 48 c8 cmovs %rax,%rcx
1b714: 48 39 d1 cmp %rdx,%rcx
```
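The new sequence above computes `dst - src`, takes its absolute value (`neg` + `cmovs`) and compares it with the size. In C++ the same test can be written as follows; `sketch_is_disjoint` is a hypothetical name, and the sketch illustrates the technique rather than reproducing the committed source. It is equivalent to the older min/max form: the ranges are disjoint exactly when the distance between the two start addresses is at least `count`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when [dst, dst + count) and [src, src + count) do not overlap.
bool sketch_is_disjoint(const void *dst, const void *src, std::size_t count) {
  // One subtraction (wrapping on uintptr_t, reinterpreted as signed) ...
  const auto diff = static_cast<std::ptrdiff_t>(
      reinterpret_cast<std::uintptr_t>(dst) -
      reinterpret_cast<std::uintptr_t>(src));
  // ... an absolute value ...
  const auto distance = static_cast<std::size_t>(diff < 0 ? -diff : diff);
  // ... and a single compare against the size.
  return distance >= count;
}
```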
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/Google_A 3.910n ± 0% 3.861n ± 1% -1.26% (p=0.000 n=50)
```
```
│ baseline │ disjoint │
│ sec/op │ sec/op vs base │
memmove/1 2.724n ± 3% 2.441n ± 0% -10.37% (n=50)
memmove/2 2.878n ± 0% 2.713n ± 0% -5.73% (n=50)
memmove/3 2.835n ± 0% 2.593n ± 0% -8.54% (n=50)
memmove/4 3.032n ± 0% 2.776n ± 0% -8.45% (p=0.000 n=50)
memmove/5 2.833n ± 0% 2.600n ± 0% -8.20% (p=0.000 n=50)
memmove/6 2.758n ± 0% 2.744n ± 0% -0.52% (p=0.000 n=50)
memmove/7 2.762n ± 0% 2.744n ± 0% -0.63% (p=0.000 n=50)
memmove/8 2.763n ± 0% 2.750n ± 0% -0.46% (p=0.000 n=50)
memmove/9 3.182n ± 0% 3.269n ± 0% +2.75% (p=0.000 n=50)
memmove/10 3.185n ± 0% 3.270n ± 0% +2.64% (p=0.000 n=50)
memmove/11 3.188n ± 0% 3.277n ± 0% +2.79% (p=0.000 n=50)
memmove/12 3.190n ± 0% 3.279n ± 0% +2.82% (p=0.000 n=50)
memmove/13 3.194n ± 0% 3.281n ± 0% +2.73% (p=0.000 n=50)
memmove/14 3.197n ± 0% 3.285n ± 0% +2.77% (p=0.000 n=50)
memmove/15 3.198n ± 0% 3.282n ± 0% +2.62% (p=0.000 n=50)
memmove/16 3.201n ± 0% 3.284n ± 0% +2.61% (p=0.000 n=50)
memmove/17 3.564n ± 0% 3.320n ± 0% -6.86% (p=0.000 n=50)
memmove/18 3.572n ± 0% 3.313n ± 0% -7.25% (p=0.000 n=50)
memmove/19 3.572n ± 0% 3.325n ± 0% -6.94% (p=0.000 n=50)
memmove/20 3.575n ± 0% 3.319n ± 0% -7.15% (p=0.000 n=50)
memmove/21 3.578n ± 0% 3.327n ± 0% -7.03% (p=0.000 n=50)
memmove/22 3.581n ± 0% 3.330n ± 0% -7.01% (p=0.000 n=50)
memmove/23 3.582n ± 0% 3.354n ± 1% -6.37% (p=0.000 n=50)
memmove/24 3.587n ± 0% 3.347n ± 1% -6.71% (p=0.000 n=50)
memmove/25 3.591n ± 0% 3.320n ± 0% -7.55% (p=0.000 n=50)
memmove/26 3.593n ± 0% 3.348n ± 0% -6.82% (p=0.000 n=50)
memmove/27 3.596n ± 0% 3.346n ± 0% -6.94% (p=0.000 n=50)
memmove/28 3.597n ± 0% 3.357n ± 0% -6.67% (p=0.000 n=50)
memmove/29 3.601n ± 0% 3.340n ± 0% -7.23% (p=0.000 n=50)
memmove/30 3.602n ± 0% 3.345n ± 0% -7.12% (p=0.000 n=50)
memmove/31 3.608n ± 0% 3.357n ± 0% -6.94% (p=0.000 n=50)
memmove/32 3.605n ± 0% 3.352n ± 0% -7.01% (p=0.000 n=50)
memmove/33 4.128n ± 1% 3.829n ± 0% -7.23% (p=0.000 n=50)
memmove/34 4.149n ± 0% 3.836n ± 0% -7.54% (p=0.000 n=50)
memmove/35 4.134n ± 0% 3.839n ± 0% -7.15% (n=50)
memmove/36 4.151n ± 0% 3.842n ± 0% -7.45% (n=50)
memmove/37 4.152n ± 0% 3.841n ± 0% -7.49% (p=0.000 n=50)
memmove/38 4.159n ± 0% 3.844n ± 0% -7.58% (p=0.000 n=50)
memmove/39 4.165n ± 0% 3.841n ± 0% -7.78% (p=0.000 n=50)
memmove/40 4.162n ± 0% 3.837n ± 0% -7.81% (p=0.000 n=50)
memmove/41 4.161n ± 0% 3.845n ± 0% -7.58% (p=0.000 n=50)
memmove/42 4.164n ± 0% 3.851n ± 0% -7.53% (p=0.000 n=50)
memmove/43 4.165n ± 0% 3.843n ± 0% -7.74% (p=0.000 n=50)
memmove/44 4.175n ± 0% 3.847n ± 0% -7.83% (p=0.000 n=50)
memmove/45 4.170n ± 0% 3.849n ± 0% -7.70% (p=0.000 n=50)
memmove/46 4.175n ± 0% 3.850n ± 0% -7.79% (p=0.000 n=50)
memmove/47 4.180n ± 0% 3.851n ± 0% -7.87% (p=0.000 n=50)
memmove/48 4.178n ± 0% 3.852n ± 0% -7.81% (p=0.000 n=50)
memmove/49 4.175n ± 0% 3.851n ± 0% -7.76% (n=50)
memmove/50 4.178n ± 0% 3.855n ± 0% -7.73% (p=0.000 n=50)
memmove/51 4.190n ± 0% 3.859n ± 0% -7.91% (p=0.000 n=50)
memmove/52 4.188n ± 0% 3.859n ± 0% -7.84% (p=0.000 n=50)
memmove/53 4.191n ± 0% 3.863n ± 0% -7.82% (p=0.000 n=50)
memmove/54 4.192n ± 0% 3.860n ± 0% -7.91% (p=0.000 n=50)
memmove/55 4.192n ± 0% 3.869n ± 0% -7.70% (p=0.000 n=50)
memmove/56 4.204n ± 0% 3.866n ± 0% -8.05% (p=0.000 n=50)
memmove/57 4.198n ± 0% 3.864n ± 0% -7.95% (p=0.000 n=50)
memmove/58 4.202n ± 0% 3.865n ± 0% -8.02% (p=0.000 n=50)
memmove/59 4.208n ± 0% 3.868n ± 0% -8.09% (p=0.000 n=50)
memmove/60 4.205n ± 0% 3.873n ± 0% -7.89% (p=0.000 n=50)
memmove/61 4.212n ± 0% 3.872n ± 0% -8.08% (p=0.000 n=50)
memmove/62 4.214n ± 0% 3.870n ± 0% -8.16% (p=0.000 n=50)
memmove/63 4.215n ± 0% 3.877n ± 0% -8.02% (p=0.000 n=50)
memmove/64 4.217n ± 0% 3.881n ± 0% -7.99% (p=0.000 n=50)
memmove/65 4.990n ± 0% 4.683n ± 0% -6.15% (p=0.000 n=50)
memmove/66 5.022n ± 0% 4.719n ± 0% -6.03% (p=0.000 n=50)
memmove/67 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/68 5.035n ± 0% 4.724n ± 0% -6.18% (p=0.000 n=50)
memmove/69 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50)
memmove/70 5.040n ± 0% 4.728n ± 0% -6.19% (p=0.000 n=50)
memmove/71 5.053n ± 0% 4.728n ± 0% -6.43% (p=0.000 n=50)
memmove/72 5.050n ± 0% 4.732n ± 0% -6.29% (p=0.000 n=50)
memmove/73 5.049n ± 0% 4.733n ± 0% -6.24% (p=0.000 n=50)
memmove/74 5.054n ± 0% 4.734n ± 0% -6.34% (p=0.000 n=50)
memmove/75 5.063n ± 0% 4.736n ± 0% -6.46% (p=0.000 n=50)
memmove/76 5.046n ± 0% 4.741n ± 0% -6.04% (p=0.000 n=50)
memmove/77 5.057n ± 0% 4.741n ± 0% -6.25% (p=0.000 n=50)
memmove/78 5.077n ± 0% 4.739n ± 0% -6.65% (p=0.000 n=50)
memmove/79 5.074n ± 0% 4.746n ± 0% -6.46% (p=0.000 n=50)
memmove/80 5.085n ± 0% 4.747n ± 0% -6.65% (p=0.000 n=50)
memmove/81 5.077n ± 0% 4.735n ± 0% -6.74% (p=0.000 n=50)
memmove/82 5.087n ± 0% 4.747n ± 0% -6.68% (p=0.000 n=50)
memmove/83 5.087n ± 0% 4.754n ± 0% -6.56% (p=0.000 n=50)
memmove/84 5.096n ± 0% 4.753n ± 0% -6.73% (p=0.000 n=50)
memmove/85 5.082n ± 0% 4.749n ± 0% -6.55% (p=0.000 n=50)
memmove/86 5.103n ± 0% 4.752n ± 0% -6.87% (p=0.000 n=50)
memmove/87 5.096n ± 0% 4.760n ± 0% -6.61% (p=0.000 n=50)
memmove/88 5.099n ± 0% 4.765n ± 0% -6.55% (p=0.000 n=50)
memmove/89 5.104n ± 0% 4.757n ± 0% -6.79% (p=0.000 n=50)
memmove/90 5.117n ± 0% 4.767n ± 0% -6.84% (p=0.000 n=50)
memmove/91 5.100n ± 0% 4.766n ± 0% -6.54% (p=0.000 n=50)
memmove/92 5.103n ± 0% 4.763n ± 0% -6.67% (p=0.000 n=50)
memmove/93 5.115n ± 0% 4.772n ± 0% -6.71% (p=0.000 n=50)
memmove/94 5.117n ± 0% 4.769n ± 0% -6.80% (p=0.000 n=50)
memmove/95 5.131n ± 0% 4.775n ± 0% -6.94% (p=0.000 n=50)
memmove/96 5.129n ± 0% 4.772n ± 0% -6.97% (p=0.000 n=50)
memmove/97 5.130n ± 0% 4.764n ± 0% -7.13% (p=0.000 n=50)
memmove/98 5.134n ± 0% 4.780n ± 0% -6.89% (p=0.000 n=50)
memmove/99 5.141n ± 0% 4.780n ± 0% -7.03% (p=0.000 n=50)
memmove/100 5.141n ± 0% 4.780n ± 0% -7.02% (p=0.000 n=50)
memmove/101 5.150n ± 0% 4.782n ± 0% -7.14% (p=0.000 n=50)
memmove/102 5.150n ± 0% 4.790n ± 0% -6.99% (p=0.000 n=50)
memmove/103 5.156n ± 0% 4.788n ± 0% -7.14% (n=50)
memmove/104 5.157n ± 0% 4.793n ± 0% -7.05% (p=0.000 n=50)
memmove/105 5.147n ± 0% 4.791n ± 0% -6.90% (p=0.000 n=50)
memmove/106 5.167n ± 0% 4.793n ± 0% -7.23% (p=0.000 n=50)
memmove/107 5.165n ± 0% 4.801n ± 0% -7.06% (p=0.000 n=50)
memmove/108 5.173n ± 0% 4.800n ± 0% -7.21% (p=0.000 n=50)
memmove/109 5.173n ± 0% 4.797n ± 0% -7.27% (p=0.000 n=50)
memmove/110 5.171n ± 0% 4.808n ± 0% -7.01% (p=0.000 n=50)
memmove/111 5.180n ± 0% 4.799n ± 0% -7.36% (p=0.000 n=50)
memmove/112 5.185n ± 0% 4.812n ± 0% -7.19% (p=0.000 n=50)
memmove/113 5.187n ± 0% 4.797n ± 0% -7.53% (p=0.000 n=50)
memmove/114 5.183n ± 0% 4.809n ± 0% -7.21% (n=50)
memmove/115 5.193n ± 0% 4.811n ± 0% -7.36% (p=0.000 n=50)
memmove/116 5.196n ± 0% 4.815n ± 0% -7.32% (p=0.000 n=50)
memmove/117 5.199n ± 0% 4.816n ± 0% -7.37% (p=0.000 n=50)
memmove/118 5.198n ± 0% 4.811n ± 0% -7.45% (p=0.000 n=50)
memmove/119 5.203n ± 0% 4.818n ± 0% -7.40% (p=0.000 n=50)
memmove/120 5.195n ± 0% 4.823n ± 0% -7.16% (p=0.000 n=50)
memmove/121 5.203n ± 0% 4.812n ± 0% -7.51% (p=0.000 n=50)
memmove/122 5.204n ± 0% 4.818n ± 0% -7.42% (n=50)
memmove/123 5.202n ± 0% 4.822n ± 0% -7.31% (p=0.000 n=50)
memmove/124 5.216n ± 0% 4.823n ± 0% -7.54% (p=0.000 n=50)
memmove/125 5.227n ± 0% 4.823n ± 0% -7.72% (p=0.000 n=50)
memmove/126 5.235n ± 0% 4.830n ± 0% -7.74% (p=0.000 n=50)
memmove/127 5.237n ± 0% 4.833n ± 0% -7.72% (p=0.000 n=50)
memmove/128 5.241n ± 0% 4.832n ± 0% -7.81% (p=0.000 n=50)
memmove/129 6.460n ± 0% 5.858n ± 0% -9.31% (p=0.000 n=50)
memmove/130 7.539n ± 0% 6.634n ± 0% -12.00% (p=0.000 n=50)
memmove/131 7.542n ± 0% 6.623n ± 0% -12.18% (p=0.000 n=50)
memmove/132 7.527n ± 0% 6.667n ± 1% -11.43% (p=0.000 n=50)
memmove/133 7.521n ± 0% 6.631n ± 0% -11.83% (p=0.000 n=50)
memmove/134 7.531n ± 0% 6.642n ± 0% -11.81% (p=0.000 n=50)
memmove/135 7.541n ± 0% 6.692n ± 1% -11.25% (p=0.000 n=50)
memmove/136 7.549n ± 0% 6.657n ± 0% -11.81% (p=0.000 n=50)
memmove/137 7.544n ± 0% 6.646n ± 0% -11.90% (p=0.000 n=50)
memmove/138 7.557n ± 0% 6.673n ± 1% -11.70% (p=0.000 n=50)
memmove/139 7.545n ± 0% 6.654n ± 0% -11.81% (n=50)
memmove/140 7.559n ± 0% 6.680n ± 1% -11.63% (p=0.000 n=50)
memmove/141 7.560n ± 0% 6.664n ± 0% -11.85% (p=0.000 n=50)
memmove/142 7.556n ± 0% 6.679n ± 0% -11.62% (p=0.000 n=50)
memmove/143 7.570n ± 0% 6.683n ± 1% -11.71% (p=0.000 n=50)
memmove/144 7.586n ± 0% 6.683n ± 0% -11.91% (p=0.000 n=50)
memmove/145 7.593n ± 0% 6.665n ± 0% -12.22% (p=0.000 n=50)
memmove/146 7.591n ± 0% 6.665n ± 0% -12.20% (p=0.000 n=50)
memmove/147 7.598n ± 0% 6.665n ± 0% -12.27% (p=0.000 n=50)
memmove/148 7.598n ± 0% 6.670n ± 0% -12.21% (p=0.000 n=50)
memmove/149 7.593n ± 0% 6.691n ± 0% -11.88% (p=0.000 n=50)
memmove/150 7.625n ± 0% 6.713n ± 1% -11.97% (p=0.000 n=50)
memmove/151 7.603n ± 0% 6.710n ± 1% -11.74% (p=0.000 n=50)
memmove/152 7.613n ± 0% 6.701n ± 1% -11.97% (p=0.000 n=50)
memmove/153 7.595n ± 0% 6.710n ± 0% -11.65% (p=0.000 n=50)
memmove/154 7.614n ± 0% 6.721n ± 0% -11.74% (p=0.000 n=50)
memmove/155 7.615n ± 0% 6.709n ± 0% -11.89% (p=0.000 n=50)
memmove/156 7.613n ± 0% 6.693n ± 0% -12.08% (p=0.000 n=50)
memmove/157 7.628n ± 0% 6.708n ± 0% -12.05% (p=0.000 n=50)
memmove/158 7.629n ± 0% 6.706n ± 0% -12.10% (p=0.000 n=50)
memmove/159 7.639n ± 0% 6.724n ± 0% -11.98% (p=0.000 n=50)
memmove/160 7.619n ± 0% 6.702n ± 0% -12.04% (p=0.000 n=50)
memmove/161 7.653n ± 0% 6.698n ± 0% -12.49% (p=0.000 n=50)
memmove/162 8.104n ± 0% 7.140n ± 1% -11.89% (p=0.000 n=50)
memmove/163 8.141n ± 0% 7.187n ± 1% -11.72% (p=0.000 n=50)
memmove/164 8.154n ± 0% 7.107n ± 0% -12.84% (p=0.000 n=50)
memmove/165 8.143n ± 0% 7.117n ± 0% -12.59% (p=0.000 n=50)
memmove/166 8.176n ± 0% 7.110n ± 0% -13.04% (p=0.000 n=50)
memmove/167 8.194n ± 0% 7.168n ± 1% -12.52% (p=0.000 n=50)
memmove/168 8.214n ± 0% 7.188n ± 1% -12.50% (p=0.000 n=50)
memmove/169 8.220n ± 0% 7.242n ± 1% -11.90% (p=0.000 n=50)
memmove/170 8.228n ± 0% 7.244n ± 1% -11.96% (p=0.000 n=50)
memmove/171 8.263n ± 0% 7.184n ± 0% -13.06% (p=0.000 n=50)
memmove/172 8.259n ± 0% 7.325n ± 1% -11.31% (p=0.000 n=50)
memmove/173 8.271n ± 0% 7.225n ± 0% -12.65% (p=0.000 n=50)
memmove/174 8.284n ± 0% 7.287n ± 1% -12.04% (p=0.000 n=50)
memmove/175 8.289n ± 0% 7.282n ± 1% -12.15% (p=0.000 n=50)
memmove/176 8.309n ± 0% 7.328n ± 1% -11.81% (p=0.000 n=50)
memmove/177 8.317n ± 0% 7.264n ± 1% -12.67% (p=0.000 n=50)
memmove/178 8.302n ± 0% 7.342n ± 1% -11.57% (p=0.000 n=50)
memmove/179 8.309n ± 0% 7.357n ± 1% -11.45% (p=0.000 n=50)
memmove/180 8.304n ± 0% 7.318n ± 1% -11.87% (p=0.000 n=50)
memmove/181 8.312n ± 0% 7.363n ± 1% -11.42% (p=0.000 n=50)
memmove/182 8.315n ± 0% 7.320n ± 1% -11.96% (p=0.000 n=50)
memmove/183 8.330n ± 0% 7.286n ± 1% -12.53% (p=0.000 n=50)
memmove/184 8.310n ± 0% 7.324n ± 1% -11.86% (p=0.000 n=50)
memmove/185 8.303n ± 0% 7.267n ± 1% -12.47% (p=0.000 n=50)
memmove/186 8.287n ± 0% 7.312n ± 1% -11.76% (p=0.000 n=50)
memmove/187 8.298n ± 0% 7.395n ± 2% -10.88% (p=0.000 n=50)
memmove/188 8.296n ± 0% 7.339n ± 1% -11.54% (p=0.000 n=50)
memmove/189 8.306n ± 0% 7.299n ± 1% -12.12% (p=0.000 n=50)
memmove/190 8.281n ± 0% 7.309n ± 1% -11.74% (p=0.000 n=50)
memmove/191 8.299n ± 0% 7.282n ± 1% -12.26% (p=0.000 n=50)
memmove/192 8.281n ± 0% 7.335n ± 1% -11.41% (p=0.000 n=50)
memmove/193 8.299n ± 0% 7.325n ± 1% -11.74% (p=0.000 n=50)
memmove/194 8.641n ± 0% 8.034n ± 0% -7.02% (p=0.000 n=50)
memmove/195 8.667n ± 0% 8.073n ± 0% -6.85% (p=0.000 n=50)
memmove/196 8.666n ± 0% 8.030n ± 0% -7.34% (p=0.000 n=50)
memmove/197 8.660n ± 0% 8.096n ± 1% -6.51% (p=0.000 n=50)
memmove/198 8.688n ± 0% 8.047n ± 0% -7.39% (p=0.000 n=50)
memmove/199 8.678n ± 0% 8.061n ± 0% -7.11% (p=0.000 n=50)
memmove/200 8.669n ± 0% 8.034n ± 0% -7.32% (p=0.000 n=50)
memmove/201 8.692n ± 0% 8.061n ± 0% -7.26% (p=0.000 n=50)
memmove/202 8.668n ± 0% 8.060n ± 0% -7.02% (p=0.000 n=50)
memmove/203 8.687n ± 0% 8.066n ± 0% -7.15% (p=0.000 n=50)
memmove/204 8.699n ± 0% 8.076n ± 0% -7.16% (p=0.000 n=50)
memmove/205 8.676n ± 0% 8.085n ± 0% -6.82% (p=0.000 n=50)
memmove/206 8.684n ± 0% 8.101n ± 1% -6.71% (p=0.000 n=50)
memmove/207 8.725n ± 0% 8.099n ± 0% -7.18% (p=0.000 n=50)
memmove/208 8.674n ± 0% 8.073n ± 0% -6.92% (p=0.000 n=50)
memmove/209 8.697n ± 0% 8.088n ± 0% -7.01% (p=0.000 n=50)
memmove/210 8.733n ± 0% 8.076n ± 0% -7.53% (p=0.000 n=50)
memmove/211 8.732n ± 0% 8.104n ± 0% -7.19% (p=0.000 n=50)
memmove/212 8.730n ± 0% 8.091n ± 0% -7.32% (p=0.000 n=50)
memmove/213 8.728n ± 0% 8.100n ± 0% -7.19% (p=0.000 n=50)
memmove/214 8.744n ± 1% 8.081n ± 1% -7.57% (p=0.000 n=50)
memmove/215 8.734n ± 0% 8.150n ± 0% -6.68% (p=0.000 n=50)
memmove/216 8.748n ± 0% 8.116n ± 0% -7.23% (p=0.000 n=50)
memmove/217 8.751n ± 0% 8.129n ± 1% -7.11% (p=0.000 n=50)
memmove/218 8.747n ± 0% 8.114n ± 0% -7.23% (p=0.000 n=50)
memmove/219 8.733n ± 0% 8.159n ± 0% -6.57% (p=0.000 n=50)
memmove/220 8.764n ± 0% 8.145n ± 0% -7.06% (p=0.000 n=50)
memmove/221 8.764n ± 0% 8.142n ± 0% -7.10% (p=0.000 n=50)
memmove/222 8.775n ± 0% 8.152n ± 0% -7.10% (p=0.000 n=50)
memmove/223 8.771n ± 0% 8.143n ± 0% -7.16% (p=0.000 n=50)
memmove/224 8.778n ± 0% 8.175n ± 1% -6.87% (p=0.000 n=50)
memmove/225 8.794n ± 0% 8.138n ± 0% -7.45% (p=0.000 n=50)
memmove/226 10.13n ± 0% 10.06n ± 0% -0.71% (p=0.000 n=50)
memmove/227 10.14n ± 0% 10.08n ± 0% -0.53% (p=0.000 n=50)
memmove/228 10.13n ± 0% 10.08n ± 0% -0.56% (p=0.000 n=50)
memmove/229 10.17n ± 0% 10.11n ± 0% -0.56% (p=0.000 n=50)
memmove/230 10.17n ± 0% 10.13n ± 0% -0.38% (p=0.003 n=50)
memmove/231 10.16n ± 0% 10.12n ± 0% -0.41% (p=0.001 n=50)
memmove/232 10.19n ± 0% 10.12n ± 0% -0.67% (p=0.000 n=50)
memmove/233 10.21n ± 0% 10.14n ± 0% -0.71% (p=0.000 n=50)
memmove/234 10.24n ± 0% 10.16n ± 0% -0.79% (p=0.000 n=50)
memmove/235 10.24n ± 0% 10.16n ± 0% -0.76% (p=0.000 n=50)
memmove/236 10.25n ± 0% 10.16n ± 0% -0.81% (p=0.000 n=50)
memmove/237 10.24n ± 0% 10.17n ± 0% -0.69% (p=0.000 n=50)
memmove/238 10.27n ± 0% 10.19n ± 0% -0.79% (p=0.000 n=50)
memmove/239 10.29n ± 0% 10.19n ± 0% -0.90% (p=0.000 n=50)
memmove/240 10.30n ± 0% 10.20n ± 0% -0.95% (p=0.000 n=50)
memmove/241 10.29n ± 0% 10.20n ± 0% -0.91% (p=0.000 n=50)
memmove/242 10.30n ± 0% 10.22n ± 0% -0.80% (p=0.000 n=50)
memmove/243 10.32n ± 0% 10.23n ± 0% -0.87% (p=0.000 n=50)
memmove/244 10.32n ± 0% 10.24n ± 0% -0.74% (p=0.000 n=50)
memmove/245 10.33n ± 0% 10.23n ± 0% -0.97% (p=0.000 n=50)
memmove/246 10.33n ± 0% 10.24n ± 0% -0.92% (p=0.000 n=50)
memmove/247 10.31n ± 0% 10.24n ± 0% -0.69% (p=0.000 n=50)
memmove/248 10.32n ± 0% 10.26n ± 0% -0.55% (p=0.000 n=50)
memmove/249 10.33n ± 0% 10.28n ± 0% -0.52% (p=0.000 n=50)
memmove/250 10.34n ± 0% 10.27n ± 0% -0.66% (p=0.000 n=50)
memmove/251 10.32n ± 0% 10.27n ± 0% -0.45% (p=0.000 n=50)
memmove/252 10.34n ± 0% 10.30n ± 0% -0.39% (p=0.005 n=50)
memmove/253 10.33n ± 0% 10.27n ± 0% -0.57% (p=0.000 n=50)
memmove/254 10.33n ± 0% 10.27n ± 0% -0.54% (p=0.000 n=50)
memmove/255 10.34n ± 0% 10.29n ± 0% -0.50% (p=0.002 n=50)
memmove/256 10.36n ± 0% 10.31n ± 0% -0.44% (p=0.006 n=50)
memmove/257 10.33n ± 0% 10.29n ± 0% -0.36% (p=0.004 n=50)
geomean 6.142n 5.696n -7.26%
```
2023-10-24 16:05:27 +02:00
Joseph Huber
4cb6c1c7cb
[libc] Enable missing memory tests on the GPU ( #68111 )
Summary:
There were a few tests that weren't enabled on the GPU because the
logic caused them to be skipped, as we don't use CPU features on the
host. This also disables the logic that creates multiple versions of the
memory functions.
2023-10-06 08:27:36 -05:00
Joseph Huber
452fa6b86d
[libc] Change the GPU to use builtin memory functions ( #68003 )
Summary:
The GPU build is special in the sense that we know an up-to-date `clang`
will always be the compiler. This allows us to rely directly on
builtins, which push a lot of this complexity into the backend. Backend
implementations are favored on the GPU because they allow us to do a lot
more target-specific optimizations. This patch switches the common memory
functions over to builtin versions when building for AMDGPU or NVPTX.
2023-10-04 07:02:55 -05:00
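As a rough illustration of relying on builtins for the GPU, the sketch below forwards to `__builtin_memcpy` when compiling for AMDGPU or NVPTX (detected through clang's `__AMDGPU__` and `__NVPTX__` predefined macros) and uses a plain byte loop elsewhere. The function name `inline_memcpy_sketch` is hypothetical and this is not the libc source; the CPU branch merely stands in for the hand-tuned implementations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper illustrating the GPU/CPU split; not the LLVM libc implementation.
inline void inline_memcpy_sketch(void *__restrict dst, const void *__restrict src,
                                 std::size_t count) {
#if defined(__AMDGPU__) || defined(__NVPTX__)
  // GPU targets: let the backend lower the builtin with target-specific knowledge.
  __builtin_memcpy(dst, src, count);
#else
  // Other targets: a byte-per-byte loop stands in for the optimized CPU code paths.
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < count; ++i)
    d[i] = s[i];
#endif
}
```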
Guillaume Chatelet
b6bc9d72f6
[libc] Mass replace enclosing namespace ( #67032 )
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-09-26 11:45:04 +02:00
Guillaume Chatelet
270547f3bf
[libc][clang-tidy] Add llvm-header-guard to get consistent naming and prevent file copy/paste issues. ( #66477 )
2023-09-21 11:14:47 +02:00
Siva Chandra
17114f8b19
[libc] Remove common_libc_tuners.cmake and move options into config.json. ( #66226 )
The name has been changed to adhere to the config option naming format.
The necessary build changes to use the new option have also been made.
2023-09-13 22:17:00 -07:00
Siva Chandra
c5ad6c7781
[libc] Fix a typo in a CMakeLists.txt - replace DEPS with DEPENDS. ( #66130 )
2023-09-12 12:24:27 -07:00
Roland McGrath
019a477c88
[libc] Clean up required LIBC_INLINE uses in src/string
This was generated using clang-tidy and clang-apply-replacements,
on src/string/*.cpp for just the llvmlibc-inline-function-decl
check, after applying https://reviews.llvm.org/D157164 , and then
some manual fixup.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D157169
2023-08-07 12:21:22 -07:00
Guillaume Chatelet
1f5783474f
[libc][NFC] Rename files
This patch mostly renames files so their names better reflect the functions they declare.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155607
2023-07-19 09:06:29 +00:00
Guillaume Chatelet
bc8c3f4998
[libc][memfunctions] Explicit error when platform is not supported
Reviewed By: JonChesterfield, jhuber6
Differential Revision: https://reviews.llvm.org/D155597
2023-07-19 08:45:04 +00:00
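An explicit error of this kind can be pictured as a preprocessor dispatch whose last branch is `#error`. In the sketch below every listed architecture maps to the same portable fallback (a real tree would include a per-platform header in each branch); all names are hypothetical. The first landing of this change was reverted because it broke the amdgpu libc bot, which shows the trade-off: any target missing from the list fails to build instead of silently getting a fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {
// Portable fallback: copies byte per byte in a direction that is safe when the
// source and destination ranges overlap.
inline void memmove_byte_per_byte(unsigned char *dst, const unsigned char *src,
                                  std::size_t count) {
  if (reinterpret_cast<std::uintptr_t>(dst) < reinterpret_cast<std::uintptr_t>(src)) {
    for (std::size_t i = 0; i < count; ++i)
      dst[i] = src[i]; // forward copy
  } else {
    for (std::size_t i = count; i-- > 0;)
      dst[i] = src[i]; // backward copy
  }
}
} // namespace sketch

// Hypothetical dispatch: each branch would normally pull in a platform-specific
// header; here they all reuse the fallback to keep the sketch self-contained.
#if defined(__x86_64__) || defined(__i386__)
#define SKETCH_MEMMOVE_IMPL sketch::memmove_byte_per_byte // x86 implementation here
#elif defined(__aarch64__) || defined(__riscv)
#define SKETCH_MEMMOVE_IMPL sketch::memmove_byte_per_byte // aarch64 / RISC-V here
#else
// Unknown target: stop the build instead of silently selecting an untested path.
#error "memmove: unsupported platform"
#endif

inline void *dispatch_memmove_sketch(void *dst, const void *src, std::size_t count) {
  SKETCH_MEMMOVE_IMPL(static_cast<unsigned char *>(dst),
                      static_cast<const unsigned char *>(src), count);
  return dst;
}
```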
Joseph Huber
8759f1b030
Revert "[libc] Default the GPU build to the default memory utilities"
This reverts commit eca8b54a5f76c65a055bac05556b70c2a0ec63a1.
Another user reverted the patch this one was based on, leaving this one
in a broken state.
2023-07-18 11:01:38 -05:00
Joseph Huber
eca8b54a5f
[libc] Default the GPU build to the default memory utilities
A previous patch made this cause an error on the GPU. We have not yet
dedicated time to an optimal implementation there, but we do not
want it to cause an error, so we simply use the fallback routines.
Differential Revision: https://reviews.llvm.org/D155615
2023-07-18 10:49:51 -05:00
Jon Chesterfield
f717c2d4f2
Revert "[libc][memfunctions] Explicit error when platform is not supported"
Broke amdgpu libc bot
This reverts commit a39c951730aa92894e27da038e834229d4613db1.
2023-07-18 16:41:47 +01:00
Guillaume Chatelet
a39c951730
[libc][memfunctions] Explicit error when platform is not supported
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D155597
2023-07-18 13:53:03 +00:00
Guillaume Chatelet
23dcdbfba7
[libc][NFC] Split memmove implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155515
2023-07-18 12:20:23 +00:00
Guillaume Chatelet
b38dda74fa
[libc][NFC] Split memcmp implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155181
2023-07-17 11:35:31 +00:00
Guillaume Chatelet
83f3920854
[libc][NFC] Split memset implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155174
2023-07-17 11:12:19 +00:00
Guillaume Chatelet
8cc440b3e7
[libc][NFC] Split memcpy implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155099
2023-07-13 10:30:38 +00:00
Guillaume Chatelet
1c4e4e03bd
[libc][NFC] Split bcmp implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155076
2023-07-13 10:19:00 +00:00
Guillaume Chatelet
bfd94882f2
[libc][NFC] Move aligned access implementations to separate header
Follow up on https://reviews.llvm.org/D154770
Differential Revision: https://reviews.llvm.org/D154800
2023-07-09 22:17:05 +00:00
Guillaume Chatelet
dbaa5838c1
[libc][NFC] Move memfunction's byte per byte implementations to a separate header
There will be subsequent patches to move things around and make the file layout more principled.
Differential Revision: https://reviews.llvm.org/D154770
2023-07-09 07:21:58 +00:00
Guillaume Chatelet
cb1468d3cb
[libc] Adding a version of memcpy w/ software prefetching
For machines with many cores, hardware prefetchers can saturate the memory bus when utilization is high.
In this case it is desirable to turn off the hardware prefetcher completely.
This has a big impact on the performance of memory functions such as `memcpy`, which rely on the next cache line being readily available.
This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile-time option, which generates a version of memcpy with software prefetching. While it does not fully restore the original performance, it mitigates the impact to an acceptable level.
Reviewed By: rtenneti
Differential Revision: https://reviews.llvm.org/D154494
2023-07-07 10:37:32 +00:00
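A minimal sketch of software prefetching in a copy loop, assuming GCC/Clang's `__builtin_prefetch` and hypothetical block size and prefetch distance values (not the tuned values from the patch). In libc the prefetching variant is selected at compile time through `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING`, which this standalone snippet does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative tuning knobs (hypothetical values, not the ones used by the patch).
constexpr std::size_t kBlockSize = 64;         // one cache line on common x86 parts
constexpr std::size_t kPrefetchDistance = 256; // how far ahead of the copy to prefetch

inline void memcpy_prefetch_sketch(char *__restrict dst, const char *__restrict src,
                                   std::size_t count) {
  std::size_t offset = 0;
  for (; offset + kBlockSize <= count; offset += kBlockSize) {
    // Request a future source cache line (read access, high temporal locality),
    // staying inside the buffer so no out-of-range pointer is ever formed.
    if (offset + kPrefetchDistance < count)
      __builtin_prefetch(src + offset + kPrefetchDistance, /*rw=*/0, /*locality=*/3);
    std::memcpy(dst + offset, src + offset, kBlockSize);
  }
  // Copy the remaining tail bytes, if any.
  if (offset < count)
    std::memcpy(dst + offset, src + offset, count - offset);
}
```

Prefetching is only a hint, so the result is identical with or without it; only the timing changes.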
Roland McGrath
5bf8efd269
[libc] Fix more inline definitions
Fix a bunch more instances of incorrect use of the `static`
keyword and missing use of the LIBC_INLINE and LIBC_INLINE_VAR
macros. Note that even forward declarations and generic template
declarations must follow the prescribed patterns for libc code so
that they match every definition, including all template specializations.
Reviewed By: Caslyn
Differential Revision: https://reviews.llvm.org/D154260
2023-06-30 14:46:25 -07:00
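The rule about template declarations matters because, in C++, an explicit specialization of a function template is inline only if it is itself declared `inline`; it does not inherit the specifier from the primary template. The sketch below uses a hypothetical `SKETCH_LIBC_INLINE` macro as a stand-in for `LIBC_INLINE`, whose real definition lives elsewhere in the libc tree and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for LIBC_INLINE.
#define SKETCH_LIBC_INLINE inline

namespace sketch {
// The generic declaration carries the same macro as every definition below.
template <typename T> SKETCH_LIBC_INLINE T load(const unsigned char *ptr);

// Each explicit specialization must spell out the macro again; otherwise a
// header-defined specialization would be a non-inline definition.
template <> SKETCH_LIBC_INLINE std::uint8_t load<std::uint8_t>(const unsigned char *ptr) {
  return *ptr;
}

template <> SKETCH_LIBC_INLINE std::uint32_t load<std::uint32_t>(const unsigned char *ptr) {
  std::uint32_t value;
  std::memcpy(&value, ptr, sizeof(value)); // unaligned-safe load in host byte order
  return value;
}
} // namespace sketch
```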
Roland McGrath
dbd38b1219
[libc] Add missing cast in x86 big_endian_cmp_mask
Implicit narrowing conversions from int to uint16_t
get a compiler warning with the warning settings used
in the Fuchsia build.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D154256
2023-06-30 14:15:59 -07:00
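Warnings like this usually come from integral promotion: arithmetic on `uint16_t` operands is carried out in `int`, so storing the result back into a `uint16_t` is an implicit narrowing conversion. The hypothetical helper below (not the actual `big_endian_cmp_mask`) shows how an explicit cast keeps such warnings quiet while making the intended truncation visible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: `a - b` is computed as int after integral promotion.
inline std::uint16_t diff_mask_sketch(std::uint16_t a, std::uint16_t b) {
  // return a - b;  // implicit int -> uint16_t narrowing; -Wconversion may flag this
  return static_cast<std::uint16_t>(a - b); // explicit, intentional truncation
}
```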
Guillaume Chatelet
1c814c99aa
[libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.
The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
2023-06-30 13:00:58 +00:00
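One common branchless formulation of a three-way comparison, shown here as an assumption rather than the code from D148717, subtracts two boolean results. Loading the bytes most-significant-first makes the integer comparison agree with `memcmp`'s lexicographic byte order. All helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Branchless three-way comparison: yields -1, 0 or 1; compilers typically emit
// flag-setting instructions and a subtraction rather than a conditional jump.
inline int cmp_u32_sketch(std::uint32_t a, std::uint32_t b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}

// Assemble 4 bytes most-significant-first, so the first differing byte decides the order.
inline std::uint32_t load_be32_sketch(const unsigned char *p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) | (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) | static_cast<std::uint32_t>(p[3]);
}

// Hypothetical 4-byte memcmp built from the two helpers above.
inline int memcmp4_sketch(const void *lhs, const void *rhs) {
  return cmp_u32_sketch(load_be32_sketch(static_cast<const unsigned char *>(lhs)),
                        load_be32_sketch(static_cast<const unsigned char *>(rhs)));
}
```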
Guillaume Chatelet
177583c914
[libc][NFC] Use SIZE_MAX instead of size_t(-1)
2023-06-29 12:21:43 +00:00
Guillaume Chatelet
b3b54131d0
[libc][NFC] Separate avx/no-avx x86 memcpy implementations
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D153958
2023-06-28 13:56:56 +00:00
Guillaume Chatelet
bd1cba9f4f
Revert D148717 "[libc] Improve memcmp latency and codegen"
Once integrated in our codebase the patch triggered a bunch of failing
tests. We do not yet understand where the bug is but we revert it to
move forward with integration.
This reverts commit 5e32765c15ab8df3d2635a2bb5078c5b1d5714d5.
2023-06-21 12:37:14 +00:00
Alex Brachet
61c9052cec
[libc] Add LIBC_INLINE_VAR for inline variables
These are the only variables I could find that use LIBC_INLINE. Note that these are namespace-scoped constexpr variables, so internal linkage is implied. `inline` is useful here to silence clang's -Wunused-const-variable warning. For Fuchsia, the distinction between LIBC_INLINE and LIBC_INLINE_VAR is helpful because we define LIBC_INLINE as `[[gnu::always_inline]] inline` when building with gcc. This isn't meaningful on variables.
Alternatively, we could simply make these variables constexpr and also add `[[maybe_unused]]`.
Reviewed By: sivachandra, mcgrathr
Differential Revision: https://reviews.llvm.org/D152951
2023-06-16 15:46:32 +00:00
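The split between the two macros can be sketched with hypothetical stand-ins for `LIBC_INLINE` and `LIBC_INLINE_VAR` (the real definitions are not reproduced here): on GCC the function macro carries `[[gnu::always_inline]]`, which is meaningless on a variable, so the variable macro expands to plain `inline`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; the real LIBC_INLINE / LIBC_INLINE_VAR are defined elsewhere.
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_LIBC_INLINE [[gnu::always_inline]] inline
#else
#define SKETCH_LIBC_INLINE inline
#endif
#define SKETCH_LIBC_INLINE_VAR inline // no function-only attributes on variables

namespace sketch {
// `inline` on a namespace-scope constexpr variable (C++17) avoids clang's
// -Wunused-const-variable when a translation unit does not use it.
SKETCH_LIBC_INLINE_VAR constexpr std::size_t kBlockSize = 32;

SKETCH_LIBC_INLINE std::size_t round_up_to_block(std::size_t n) {
  return (n + kBlockSize - 1) / kBlockSize * kBlockSize;
}
} // namespace sketch
```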
Alex Brachet
10e7b451ad
[libc][NFC] Fix some issues with LIBC_INLINE
We define LIBC_INLINE to include [[clang::internal_linkage]], and such
attributes must appear before other specifiers. There was also a
missing cast that was causing warnings.
Differential Revision: https://reviews.llvm.org/D152865
2023-06-14 14:09:11 +00:00
Guillaume Chatelet
2cfae7cdf4
[libc] Dispatch memmove to memcpy when buffers are disjoint
Most of the time `memmove` is called on buffers that are disjoint; in that case we can use `memcpy`, which is faster.
The additional test is branchless on x86, aarch64, and RISC-V with the zbb (bitmanip) extension.
On x86 this patch adds a latency of 2 to 3 cycles.
Before
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------
BM_Memmove/0/0_median 5.00 ns 5.00 ns 10 bytes_per_cycle=1.25477/s bytes_per_second=2.62933G/s items_per_second=199.87M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 6.21 ns 6.21 ns 10 bytes_per_cycle=3.22173/s bytes_per_second=6.75106G/s items_per_second=160.955M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 8.09 ns 8.09 ns 10 bytes_per_cycle=5.31462/s bytes_per_second=11.1366G/s items_per_second=123.603M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 5.95 ns 5.95 ns 10 bytes_per_cycle=2.71865/s bytes_per_second=5.69687G/s items_per_second=167.967M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 5.63 ns 5.63 ns 10 bytes_per_cycle=2.28294/s bytes_per_second=4.78383G/s items_per_second=177.615M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 5.68 ns 5.68 ns 10 bytes_per_cycle=2.16798/s bytes_per_second=4.54295G/s items_per_second=176.015M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 7.46 ns 7.46 ns 10 bytes_per_cycle=3.97619/s bytes_per_second=8.332G/s items_per_second=134.044M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 5.40 ns 5.40 ns 10 bytes_per_cycle=1.79695/s bytes_per_second=3.76546G/s items_per_second=185.211M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 5.62 ns 5.62 ns 10 bytes_per_cycle=3.18747/s bytes_per_second=6.67927G/s items_per_second=177.983M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 101 ns 101 ns 10 bytes_per_cycle=9.77359/s bytes_per_second=20.4803G/s items_per_second=9.9333M/s __llvm_libc::memmove,uniform 384 to 4096
```
After
```
BM_Memmove/0/0_median 3.57 ns 3.57 ns 10 bytes_per_cycle=1.71375/s bytes_per_second=3.59112G/s items_per_second=280.411M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 4.52 ns 4.52 ns 10 bytes_per_cycle=4.47557/s bytes_per_second=9.37843G/s items_per_second=221.427M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 5.70 ns 5.70 ns 10 bytes_per_cycle=7.37396/s bytes_per_second=15.4519G/s items_per_second=175.399M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 4.47 ns 4.47 ns 10 bytes_per_cycle=3.4148/s bytes_per_second=7.15563G/s items_per_second=223.743M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 4.53 ns 4.53 ns 10 bytes_per_cycle=2.86071/s bytes_per_second=5.99454G/s items_per_second=220.69M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 4.19 ns 4.19 ns 10 bytes_per_cycle=2.5484/s bytes_per_second=5.3401G/s items_per_second=238.924M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 5.02 ns 5.02 ns 10 bytes_per_cycle=5.94164/s bytes_per_second=12.4505G/s items_per_second=199.14M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 4.03 ns 4.03 ns 10 bytes_per_cycle=2.47028/s bytes_per_second=5.17641G/s items_per_second=247.906M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 4.70 ns 4.70 ns 10 bytes_per_cycle=3.84975/s bytes_per_second=8.06706G/s items_per_second=212.72M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 90.7 ns 90.7 ns 10 bytes_per_cycle=10.8681/s bytes_per_second=22.7739G/s items_per_second=11.02M/s __llvm_libc::memmove,uniform 384 to 4096
```
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D152811
2023-06-14 08:29:15 +00:00
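The dispatch can be sketched as follows, assuming a simple absolute-distance test: two `count`-byte ranges cannot overlap when their start addresses are at least `count` bytes apart. The helper names are hypothetical, the overlapping path is a plain byte loop rather than libc's optimized one, and whether the select compiles to a conditional move depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when [p1, p1 + size) and [p2, p2 + size) cannot overlap, i.e. when
// the start addresses are at least `size` bytes apart.
inline bool is_disjoint_sketch(const void *p1, const void *p2, std::size_t size) {
  const auto a = reinterpret_cast<std::uintptr_t>(p1);
  const auto b = reinterpret_cast<std::uintptr_t>(p2);
  // Absolute distance; a select like this is often lowered to a conditional move.
  const std::uintptr_t distance = a >= b ? a - b : b - a;
  return distance >= size;
}

inline void *memmove_sketch(void *dst, const void *src, std::size_t count) {
  if (is_disjoint_sketch(dst, src, count))
    return std::memcpy(dst, src, count); // common case: plain copy
  // Overlapping buffers: copy byte per byte in the safe direction.
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  if (reinterpret_cast<std::uintptr_t>(d) < reinterpret_cast<std::uintptr_t>(s)) {
    for (std::size_t i = 0; i < count; ++i)
      d[i] = s[i];
  } else {
    for (std::size_t i = count; i-- > 0;)
      d[i] = s[i];
  }
  return dst;
}
```

Adjacent ranges (distance exactly `count`) count as disjoint, so they also take the `memcpy` path.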
Guillaume Chatelet
5e32765c15
[libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.
The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
2023-06-12 13:47:16 +00:00
Guillaume Chatelet
1ec995cc1c
Revert D148717 "[libc] Improve memcmp latency and codegen"
This broke aarch64 debug buildbot https://lab.llvm.org/buildbot/#/builders/223/builds/21703
This reverts commit bd4f978754758d5ef29d1f10370f45362da3de37.
2023-06-12 08:32:00 +00:00