llvm-project

Author	SHA1	Message	Date
Leandro Lacerda	15a192cde5	[libc] Enable double math functions on the GPU (#154857 ) This patch adds the `acos` math function to the NVPTX build. It also adds the `sincos` math function to the `math.h` header.	2025-08-22 06:52:13 -05:00
Muhammad Bassiouni	4d323206ed	[libc][math] Refactor cospif16 implementation to header-only in src/__support/math folder. (#154222 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-22 05:04:13 +03:00
Muhammad Bassiouni	783859b2a0	[libc][math] Refactor cospif implementation to header-only in src/__support/math folder. (#154215 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-22 04:53:18 +03:00
William Huynh	1b9e9e29e2	[libc] Add boot code for AArch64 (#154789 ) This is required in hermetic testing downstream. It is not complete, and will not work on hardware, however it runs on QEMU, and can report a pass/fail on our tests.	2025-08-21 18:47:34 +00:00
enh-google	71dd4e17dd	[libc] fix strsep()/strtok()/strtok_r() "subsequent searches" behavior. (#154370 ) These functions turned out to have the same bug that was in wcstok() (fixed by 4fc9801), so add the missing tests and fix the code in a way that matches wcstok(). Also fix incorrect test expectations in existing tests. Also update the BUILD.bazel files to actually build the strsep() test.	2025-08-21 09:32:35 -04:00
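The "subsequent searches" wording in the commit above comes from the C standard's description of `strtok`: once the input is exhausted, further calls must keep returning a null pointer. The following is a minimal sketch of a check for that behavior, not the test added by the commit; the helper name `tokenize_and_probe` is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the commit): collects every token that
// strtok() yields from `input`, then calls strtok() `extra_calls` more times
// after the first null result. Returns true only if each of those
// subsequent searches also returns nullptr.
bool tokenize_and_probe(char *input, const char *delims,
                        std::vector<std::string> &tokens, int extra_calls) {
  assert(input != nullptr && delims != nullptr);
  for (char *tok = std::strtok(input, delims); tok != nullptr;
       tok = std::strtok(nullptr, delims))
    tokens.emplace_back(tok);
  for (int i = 0; i < extra_calls; ++i)
    if (std::strtok(nullptr, delims) != nullptr)
      return false;
  return true;
}
```

Called on a writable buffer holding `"a,b,,c"` with the delimiter `","`, this should collect `a`, `b`, and `c` and report that every later call returned a null pointer.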
Joseph Huber	e90ce511e0	Disable asan on last wide string function	2025-08-21 08:30:47 -05:00
Joseph Huber	3596005148	Fix wide read defaults	2025-08-21 08:17:06 -05:00
Joseph Huber	6ac01d12d6	Reapply "[libc] Enable wide-read memory operations by default on Linux (#154602 )" (#154640 ) Reland after the sanitizer and arm32 builds complained.	2025-08-21 07:40:23 -05:00
Mikhail R. Gadelha	2b1dcf5383	[libc] Remove hardcoded sizeof in __barrier_type.h (#153718 ) This PR modifies the static_asserts checking the expected sizes in __barrier_type.h, so that we can guarantee that our internal implementation fits the public header.	2025-08-21 09:37:56 -03:00
Joseph Huber	27fc9671f9	Revert "[libc] Enable wide-read memory operations by default on Linux (#154602 )" This reverts commit c80d1483c6d787edf62ff9e86b1e97af5eb5abf9.	2025-08-20 17:27:13 -05:00
Joseph Huber	c80d1483c6	[libc] Enable wide-read memory operations by default on Linux (#154602 ) Summary: This patch changes the Linux build to use wide reads in the memory operations by default. These memory functions will now potentially read outside of the bounds explicitly allowed by the current function. While technically undefined behavior in the standard, plenty of C library implementations do this. It will not cause a segmentation fault on Linux as long as you do not cross a page boundary, and because we are only reading memory it should not have atomic effects.	2025-08-20 17:17:12 -05:00
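The page-boundary argument in the commit above can be illustrated with address arithmetic alone. The sketch below assumes a 4096-byte page and a power-of-two read width; the names `kPageSize`, `crosses_page`, and `align_down` are hypothetical, and the commit message does not say how the libc implementation itself arranges its reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 KiB pages. The real page size is platform-dependent.
constexpr std::uintptr_t kPageSize = 4096;

// True if a read of `width` bytes starting at `addr` touches two pages.
constexpr bool crosses_page(std::uintptr_t addr, std::size_t width) {
  return (addr / kPageSize) != ((addr + width - 1) / kPageSize);
}

// Rounds `addr` down to a multiple of `width` (width must be a power of two).
constexpr std::uintptr_t align_down(std::uintptr_t addr, std::size_t width) {
  return addr & ~(static_cast<std::uintptr_t>(width) - 1);
}
```

An unaligned 8-byte read at address 4093 would span two pages, while the same read started from the aligned-down address 4088 stays inside the first page. More generally, an aligned read whose width divides the page size never straddles a page.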
Guillaume Chatelet	4dd9e99284	[libc] Fix `constexpr` `add_with_carry`/`sub_with_borrow` (#154282 ) The previous version of the code would prevent the use of the compiler builtins.	2025-08-20 10:41:51 +02:00
Sterling-Augustine	317920063b	Add vector-based strlen implementation for x86_64 and aarch64 (#152389 ) These replace the default LIBC_CONF_STRING_UNSAFE_WIDE_READ implementation on x86_64 and aarch64. These are substantially faster than both the character-by-character implementation and the original unsafe_wide_read implementation. Some below I have been unable to performance-test the aarch64 version, but I suspect speedups similar to avx2. ``` Function: strlen Variant: char wide ull sse2 avx2 avx512 ============================================================================================================================================================= length=1, alignment=1: 13.18 20.47 (-55.24%) 20.21 (-53.27%) 32.50 (-146.54%) 26.05 (-97.61%) 18.03 (-36.74%) length=1, alignment=0: 12.80 34.92 (-172.89%) 20.01 (-56.39%) 17.52 (-36.86%) 17.78 (-38.92%) 18.04 (-40.94%) length=2, alignment=2: 9.91 19.02 (-91.95%) 12.64 (-27.52%) 11.06 (-11.59%) 9.48 ( 4.38%) 9.48 ( 4.34%) length=2, alignment=0: 9.56 26.88 (-181.24%) 12.64 (-32.31%) 11.06 (-15.73%) 11.06 (-15.72%) 11.83 (-23.80%) length=3, alignment=3: 8.31 10.45 (-25.84%) 8.28 ( 0.32%) 8.28 ( 0.36%) 6.21 ( 25.28%) 6.21 ( 25.24%) length=3, alignment=0: 8.39 14.53 (-73.20%) 8.28 ( 1.33%) 7.24 ( 13.69%) 7.56 ( 9.94%) 7.25 ( 13.65%) length=4, alignment=4: 9.84 21.76 (-121.24%) 15.55 (-58.11%) 6.57 ( 33.18%) 5.02 ( 48.98%) 6.00 ( 39.00%) length=4, alignment=0: 8.64 13.70 (-58.51%) 7.28 ( 15.73%) 6.37 ( 26.31%) 6.36 ( 26.36%) 6.36 ( 26.36%) length=5, alignment=5: 11.85 23.81 (-100.97%) 12.17 ( -2.67%) 5.68 ( 52.09%) 4.87 ( 58.94%) 6.48 ( 45.33%) length=5, alignment=0: 11.82 13.64 (-15.42%) 7.27 ( 38.45%) 6.36 ( 46.15%) 6.37 ( 46.11%) 6.36 ( 46.14%) length=6, alignment=6: 10.50 19.37 (-84.56%) 13.64 (-29.93%) 6.54 ( 37.71%) 6.89 ( 34.35%) 9.45 ( 10.01%) length=6, alignment=0: 14.96 14.05 ( 6.04%) 6.49 ( 56.62%) 5.68 ( 62.04%) 5.68 ( 62.04%) 13.15 ( 12.05%) length=7, alignment=7: 10.97 18.02 (-64.35%) 14.59 (-33.06%) 6.36 
( 41.96%) 5.46 ( 50.25%) 5.46 ( 50.25%) length=7, alignment=0: 10.96 15.76 (-43.77%) 15.37 (-40.15%) 6.96 ( 36.51%) 5.68 ( 48.22%) 7.04 ( 35.83%) length=4, alignment=0: 8.66 13.69 (-58.02%) 7.28 ( 16.00%) 6.37 ( 26.44%) 6.37 ( 26.52%) 6.61 ( 23.74%) length=4, alignment=7: 8.87 17.35 (-95.73%) 12.18 (-37.39%) 5.68 ( 35.94%) 4.87 ( 45.11%) 6.00 ( 32.36%) length=4, alignment=2: 8.67 10.05 (-15.91%) 7.28 ( 16.01%) 7.37 ( 15.02%) 5.46 ( 37.02%) 5.47 ( 36.89%) length=2, alignment=2: 5.64 10.01 (-77.64%) 7.29 (-29.34%) 6.37 (-13.04%) 5.46 ( 3.19%) 5.46 ( 3.19%) length=8, alignment=0: 12.78 16.52 (-29.33%) 18.27 (-43.00%) 11.82 ( 7.47%) 9.83 ( 23.03%) 11.46 ( 10.27%) length=8, alignment=7: 14.24 17.30 (-21.49%) 12.16 ( 14.59%) 5.68 ( 60.14%) 4.87 ( 65.83%) 6.23 ( 56.28%) length=8, alignment=3: 12.34 26.15 (-111.98%) 12.20 ( 1.14%) 6.50 ( 47.34%) 4.87 ( 60.54%) 6.18 ( 49.94%) length=5, alignment=3: 10.95 19.74 (-80.30%) 12.17 (-11.11%) 5.68 ( 48.16%) 4.87 ( 55.56%) 5.96 ( 45.55%) length=16, alignment=0: 20.33 29.29 (-44.08%) 36.18 (-77.97%) 5.68 ( 72.06%) 5.68 ( 72.08%) 10.60 ( 47.86%) length=16, alignment=7: 19.29 17.52 ( 9.16%) 12.98 ( 32.73%) 7.05 ( 63.47%) 4.87 ( 74.75%) 6.23 ( 67.71%) length=16, alignment=4: 20.54 25.18 (-22.56%) 15.42 ( 24.92%) 7.31 ( 64.43%) 4.87 ( 76.29%) 5.98 ( 70.88%) length=10, alignment=4: 14.59 21.26 (-45.71%) 12.17 ( 16.58%) 5.68 ( 61.07%) 4.87 ( 66.65%) 6.00 ( 58.91%) length=32, alignment=0: 35.46 22.00 ( 37.95%) 16.22 ( 54.26%) 7.32 ( 79.35%) 5.68 ( 83.98%) 7.01 ( 80.22%) length=32, alignment=7: 35.23 24.14 ( 31.48%) 16.22 ( 53.96%) 7.30 ( 79.28%) 8.76 ( 75.12%) 6.14 ( 82.58%) length=32, alignment=5: 35.16 28.56 ( 18.76%) 16.22 ( 53.87%) 7.30 ( 79.23%) 6.77 ( 80.75%) 9.82 ( 72.07%) length=21, alignment=5: 26.47 27.66 ( -4.49%) 15.04 ( 43.17%) 6.90 ( 73.95%) 4.87 ( 81.60%) 6.04 ( 77.18%) length=64, alignment=0: 66.45 25.16 ( 62.14%) 22.70 ( 65.83%) 12.99 ( 80.44%) 7.47 ( 88.77%) 8.70 ( 86.90%) length=64, alignment=7: 64.75 27.78 ( 57.10%) 
22.72 ( 64.91%) 10.85 ( 83.25%) 7.46 ( 88.48%) 8.68 ( 86.60%) length=64, alignment=6: 67.26 28.58 ( 57.51%) 22.70 ( 66.24%) 11.26 ( 83.25%) 9.46 ( 85.94%) 13.90 ( 79.33%) length=42, alignment=6: 73.42 27.97 ( 61.91%) 19.46 ( 73.49%) 8.92 ( 87.84%) 6.49 ( 91.16%) 6.00 ( 91.83%) length=128, alignment=0: 172.07 39.18 ( 77.23%) 35.68 ( 79.26%) 13.02 ( 92.43%) 12.98 ( 92.46%) 9.76 ( 94.33%) length=128, alignment=7: 163.98 43.79 ( 73.30%) 36.03 ( 78.03%) 15.68 ( 90.44%) 11.35 ( 93.08%) 10.51 ( 93.59%) length=128, alignment=7: 185.86 40.27 ( 78.33%) 36.04 ( 80.61%) 13.78 ( 92.58%) 11.35 ( 93.89%) 10.49 ( 94.36%) length=85, alignment=7: 121.61 55.66 ( 54.23%) 32.34 ( 73.40%) 13.88 ( 88.59%) 7.30 ( 94.00%) 8.72 ( 92.83%) length=256, alignment=0: 295.54 66.48 ( 77.50%) 61.63 ( 79.15%) 19.54 ( 93.39%) 12.97 ( 95.61%) 12.45 ( 95.79%) length=256, alignment=7: 308.06 78.92 ( 74.38%) 61.63 ( 80.00%) 22.90 ( 92.57%) 12.97 ( 95.79%) 13.23 ( 95.71%) length=256, alignment=8: 295.32 65.83 ( 77.71%) 61.62 ( 79.13%) 23.19 ( 92.15%) 12.97 ( 95.61%) 13.50 ( 95.43%) length=170, alignment=8: 234.39 48.79 ( 79.18%) 43.79 ( 81.32%) 16.22 ( 93.08%) 13.97 ( 94.04%) 10.48 ( 95.53%) length=512, alignment=0: 563.75 116.89 ( 79.27%) 114.99 ( 79.60%) 62.71 ( 88.88%) 19.58 ( 96.53%) 17.76 ( 96.85%) length=512, alignment=7: 580.53 120.91 ( 79.17%) 114.47 ( 80.28%) 37.75 ( 93.50%) 19.55 ( 96.63%) 18.68 ( 96.78%) length=512, alignment=9: 584.05 128.35 ( 78.02%) 114.74 ( 80.35%) 39.09 ( 93.31%) 19.76 ( 96.62%) 18.71 ( 96.80%) length=341, alignment=9: 405.84 90.87 ( 77.61%) 78.79 ( 80.59%) 28.77 ( 92.91%) 14.60 ( 96.40%) 14.15 ( 96.51%) length=1024, alignment=0: 1143.61 247.03 ( 78.40%) 243.70 ( 78.69%) 75.59 ( 93.39%) 67.02 ( 94.14%) 28.99 ( 97.46%) length=1024, alignment=7: 1124.55 267.87 ( 76.18%) 259.16 ( 76.95%) 64.96 ( 94.22%) 33.05 ( 97.06%) 30.91 ( 97.25%) length=1024, alignment=10: 1459.58 257.79 ( 82.34%) 239.91 ( 83.56%) 65.00 ( 95.55%) 33.10 ( 97.73%) 30.33 ( 97.92%) length=682, alignment=10: 
732.89 163.67 ( 77.67%) 170.54 ( 76.73%) 46.48 ( 93.66%) 24.32 ( 96.68%) 21.44 ( 97.07%) length=2048, alignment=0: 2141.96 451.61 ( 78.92%) 448.00 ( 79.08%) 133.24 ( 93.78%) 61.22 ( 97.14%) 80.08 ( 96.26%) length=2048, alignment=7: 2145.05 458.26 ( 78.64%) 449.99 ( 79.02%) 140.19 ( 93.46%) 60.26 ( 97.19%) 51.71 ( 97.59%) length=2048, alignment=11: 2162.61 463.37 ( 78.57%) 448.07 ( 79.28%) 140.29 ( 93.51%) 59.51 ( 97.25%) 51.59 ( 97.61%) length=1365, alignment=11: 1439.74 322.86 ( 77.58%) 310.84 ( 78.41%) 116.08 ( 91.94%) 42.43 ( 97.05%) 36.15 ( 97.49%) length=4096, alignment=0: 4278.68 871.60 ( 79.63%) 865.25 ( 79.78%) 252.50 ( 94.10%) 161.17 ( 96.23%) 94.97 ( 97.78%) length=4096, alignment=7: 4253.01 871.62 ( 79.51%) 864.21 ( 79.68%) 243.90 ( 94.27%) 171.17 ( 95.98%) 95.14 ( 97.76%) length=4096, alignment=12: 4252.18 879.66 ( 79.31%) 863.68 ( 79.69%) 244.26 ( 94.26%) 185.36 ( 95.64%) 93.61 ( 97.80%) length=2730, alignment=12: 2868.22 597.65 ( 79.16%) 586.22 ( 79.56%) 175.09 ( 93.90%) 120.35 ( 95.80%) 101.35 ( 96.47%) length=0, alignment=0: 4.87 8.11 (-66.73%) 6.49 (-33.34%) 5.80 (-19.26%) 5.68 (-16.67%) 6.86 (-40.91%) length=32, alignment=0: 33.82 22.36 ( 33.89%) 17.03 ( 49.66%) 7.30 ( 78.42%) 5.68 ( 83.22%) 7.50 ( 77.83%) length=64, alignment=0: 66.20 26.76 ( 59.58%) 23.22 ( 64.93%) 12.99 ( 80.37%) 7.34 ( 88.92%) 8.44 ( 87.25%) length=96, alignment=0: 130.26 31.62 ( 75.72%) 30.00 ( 76.97%) 11.39 ( 91.26%) 10.54 ( 91.91%) 8.68 ( 93.34%) length=128, alignment=0: 164.66 39.05 ( 76.29%) 35.68 ( 78.33%) 13.07 ( 92.07%) 12.97 ( 92.12%) 9.59 ( 94.18%) length=160, alignment=0: 196.63 45.18 ( 77.02%) 42.16 ( 78.56%) 14.65 ( 92.55%) 10.87 ( 94.47%) 9.31 ( 95.27%) length=192, alignment=0: 225.50 52.71 ( 76.63%) 49.61 ( 78.00%) 16.22 ( 92.81%) 11.36 ( 94.96%) 11.08 ( 95.09%) length=224, alignment=0: 261.08 57.57 ( 77.95%) 55.82 ( 78.62%) 17.84 ( 93.17%) 12.16 ( 95.34%) 11.51 ( 95.59%) length=256, alignment=0: 295.13 65.56 ( 77.79%) 62.59 ( 78.79%) 19.46 ( 93.41%) 13.12 ( 
95.56%) 12.33 ( 95.82%) length=288, alignment=0: 325.69 72.16 ( 77.84%) 69.20 ( 78.75%) 21.08 ( 93.53%) 13.94 ( 95.72%) 12.32 ( 96.22%) length=320, alignment=0: 364.18 78.78 ( 78.37%) 75.69 ( 79.21%) 22.71 ( 93.77%) 14.70 ( 95.96%) 14.46 ( 96.03%) length=352, alignment=0: 391.40 84.87 ( 78.32%) 82.15 ( 79.01%) 24.50 ( 93.74%) 15.62 ( 96.01%) 14.27 ( 96.35%) length=384, alignment=0: 428.50 91.43 ( 78.66%) 88.70 ( 79.30%) 26.16 ( 93.90%) 17.29 ( 95.97%) 15.04 ( 96.49%) length=416, alignment=0: 457.30 98.23 ( 78.52%) 95.02 ( 79.22%) 27.81 ( 93.92%) 17.22 ( 96.23%) 15.05 ( 96.71%) length=448, alignment=0: 488.38 104.52 ( 78.60%) 101.87 ( 79.14%) 31.22 ( 93.61%) 18.07 ( 96.30%) 16.89 ( 96.54%) length=480, alignment=0: 526.44 109.61 ( 79.18%) 108.11 ( 79.46%) 31.11 ( 94.09%) 18.88 ( 96.41%) 17.10 ( 96.75%) length=512, alignment=0: 556.50 117.29 ( 78.92%) 113.78 ( 79.56%) 62.57 ( 88.76%) 19.88 ( 96.43%) 17.80 ( 96.80%) length=576, alignment=0: 622.17 152.93 ( 75.42%) 127.58 ( 79.49%) 39.34 ( 93.68%) 21.31 ( 96.58%) 19.99 ( 96.79%) length=640, alignment=0: 691.01 142.56 ( 79.37%) 161.78 ( 76.59%) 39.20 ( 94.33%) 22.98 ( 96.67%) 20.13 ( 97.09%) length=704, alignment=0: 756.90 156.31 ( 79.35%) 176.19 ( 76.72%) 45.03 ( 94.05%) 24.82 ( 96.72%) 22.33 ( 97.05%) length=768, alignment=0: 826.23 193.17 ( 76.62%) 188.41 ( 77.20%) 50.81 ( 93.85%) 27.46 ( 96.68%) 23.25 ( 97.19%) length=832, alignment=0: 890.17 204.81 ( 76.99%) 201.61 ( 77.35%) 53.77 ( 93.96%) 27.73 ( 96.88%) 25.06 ( 97.18%) length=896, alignment=0: 959.52 217.89 ( 77.29%) 213.86 ( 77.71%) 57.99 ( 93.96%) 29.53 ( 96.92%) 26.29 ( 97.26%) length=960, alignment=0: 1024.52 231.06 ( 77.45%) 227.05 ( 77.84%) 60.36 ( 94.11%) 32.29 ( 96.85%) 27.94 ( 97.27%) length=1024, alignment=0: 1086.71 244.17 ( 77.53%) 239.87 ( 77.93%) 64.72 ( 94.04%) 72.38 ( 93.34%) 28.72 ( 97.36%) length=1152, alignment=0: 1231.48 270.22 ( 78.06%) 266.47 ( 78.36%) 73.38 ( 94.04%) 40.24 ( 96.73%) 32.42 ( 97.37%) length=1280, alignment=0: 1349.29 295.45 ( 
78.10%) 292.69 ( 78.31%) 111.80 ( 91.71%) 42.44 ( 96.85%) 34.59 ( 97.44%) length=1408, alignment=0: 1487.13 322.57 ( 78.31%) 318.18 ( 78.60%) 84.47 ( 94.32%) 44.35 ( 97.02%) 37.31 ( 97.49%) length=1536, alignment=0: 1623.52 347.98 ( 78.57%) 344.24 ( 78.80%) 108.31 ( 93.33%) 49.82 ( 96.93%) 39.94 ( 97.54%) length=1664, alignment=0: 1748.88 373.80 ( 78.63%) 370.03 ( 78.84%) 118.76 ( 93.21%) 52.89 ( 96.98%) 42.93 ( 97.55%) length=1792, alignment=0: 1886.22 399.59 ( 78.82%) 397.39 ( 78.93%) 127.32 ( 93.25%) 53.64 ( 97.16%) 45.39 ( 97.59%) length=1920, alignment=0: 2018.37 425.98 ( 78.89%) 422.31 ( 79.08%) 126.70 ( 93.72%) 57.08 ( 97.17%) 48.12 ( 97.62%) length=2048, alignment=0: 2167.09 451.70 ( 79.16%) 447.70 ( 79.34%) 141.68 ( 93.46%) 61.63 ( 97.16%) 79.06 ( 96.35%) length=2304, alignment=0: 2422.03 503.63 ( 79.21%) 502.23 ( 79.26%) 149.62 ( 93.82%) 73.10 ( 96.98%) 56.97 ( 97.65%) length=2560, alignment=0: 2678.68 556.84 ( 79.21%) 553.24 ( 79.35%) 161.06 ( 93.99%) 127.74 ( 95.23%) 58.81 ( 97.80%) length=2816, alignment=0: 2941.95 608.70 ( 79.31%) 604.03 ( 79.47%) 171.85 ( 94.16%) 87.11 ( 97.04%) 67.08 ( 97.72%) length=3072, alignment=0: 3229.89 660.14 ( 79.56%) 659.19 ( 79.59%) 183.85 ( 94.31%) 140.25 ( 95.66%) 73.01 ( 97.74%) length=3328, alignment=0: 3496.08 713.05 ( 79.60%) 710.00 ( 79.69%) 209.72 ( 94.00%) 138.78 ( 96.03%) 77.81 ( 97.77%) length=3584, alignment=0: 3756.52 766.19 ( 79.60%) 763.94 ( 79.66%) 214.16 ( 94.30%) 146.36 ( 96.10%) 83.43 ( 97.78%) length=3840, alignment=0: 4017.15 817.43 ( 79.65%) 819.77 ( 79.59%) 242.07 ( 93.97%) 164.56 ( 95.90%) 89.72 ( 97.77%) length=4096, alignment=0: 4281.59 867.87 ( 79.73%) 864.71 ( 79.80%) 243.33 ( 94.32%) 173.11 ( 95.96%) 95.65 ( 97.77%) length=4608, alignment=0: 4810.30 977.80 ( 79.67%) 985.03 ( 79.52%) 271.13 ( 94.36%) 190.62 ( 96.04%) 107.82 ( 97.76%) length=5120, alignment=0: 5380.16 1075.77 ( 80.00%) 1071.80 ( 80.08%) 294.27 ( 94.53%) 206.04 ( 96.17%) 141.90 ( 97.36%) length=5632, alignment=0: 5925.70 1195.61 
( 79.82%) 1193.68 ( 79.86%) 323.42 ( 94.54%) 223.55 ( 96.23%) 125.28 ( 97.89%) length=6144, alignment=0: 6402.20 1285.52 ( 79.92%) 1281.04 ( 79.99%) 342.68 ( 94.65%) 234.84 ( 96.33%) 167.01 ( 97.39%) length=6656, alignment=0: 6997.01 1387.32 ( 80.17%) 1384.21 ( 80.22%) 365.93 ( 94.77%) 269.89 ( 96.14%) 176.40 ( 97.48%) length=7168, alignment=0: 7454.76 1492.10 ( 79.98%) 1488.45 ( 80.03%) 391.92 ( 94.74%) 280.81 ( 96.23%) 187.73 ( 97.48%) length=7680, alignment=0: 8163.34 1608.43 ( 80.30%) 1615.98 ( 80.20%) 460.03 ( 94.36%) 299.86 ( 96.33%) 201.40 ( 97.53%) ```	2025-08-19 15:18:04 -07:00
Michael Jones	d2b2d6ff10	[libc] Fix missing close at the end of file test (#154392 ) The test added by #150802 was missing a close at the end.	2025-08-19 10:26:05 -07:00
codefaber	fd7f69bfe7	[libc] Fix copy/paste error in file.cpp (#150802 ) Fix using wrong variable due to copy/paste error. --------- Co-authored-by: codefaber <codefaber>	2025-08-19 10:05:38 -07:00
Krishna Pandey	550dbec03a	[libc][math][c++23] Add {,u}fromfp{,x}bf16 math functions (#153992 ) This PR adds the following basic math functions for BFloat16 type along with the tests: - fromfpbf16 - fromfpxbf16 - ufromfpbf16 - ufromfpxbf16 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-19 22:19:03 +05:30
William Huynh	0c622d72fc	[libc] Add _Returns_twice to C++ code (#153602 ) Fixes issue with `<csetjmp>` which requires `_Returns_twice` but in C++ mode	2025-08-19 09:28:23 +01:00
Muhammad Bassiouni	523c3a0197	[libc][math] fix coshf16 build errors. (#154226 )	2025-08-19 03:39:24 +03:00
Muhammad Bassiouni	2c79dc1082	[libc][math] Refactor coshf16 implementation to header-only in src/__support/math folder. (#153582 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-19 02:08:07 +03:00
Mohamed Emad	40833eea21	Reland "[libc][math][c23] Implement C23 math function asinpif16" (#152690 ) Relands #146226, fixing the asinpi MPFR number function and making it work when MPFR < `4.2.0`.	2025-08-18 00:04:47 +03:00
Aiden Grossman	71925a90c8	[libc] Setup hdrgen for ioctl (#153976 ) This patch adds some hdrgen yaml for ioctl(). Otherwise the function never actually ends up being available in a full build. This is the last thing that is needed to enable turning on LIBCXX_ENABLE_RANDOM_DEVICE.	2025-08-17 08:52:29 -07:00
Aiden Grossman	29d49c8a37	[libc] Correct standard for getcpu (#153982 )	2025-08-16 16:05:45 -07:00
Leandro Lacerda	75bf739208	[libc][gpu] Disable loop unrolling in the throughput benchmark loop (#153971 ) This patch makes GPU throughput benchmark results more comparable across targets by disabling loop unrolling in the benchmark loop. Motivation: * PTX (post-LTO) evidence on NVPTX: for libc `sin`, the generated PTX shows the `throughput` loop unrolled 8x at `N=128` (one iteration advances the input pointer by 64 bytes = 8 doubles), interleaving eight independent chains before the back-edge. This hides latency and significantly reduces cycles/call as the batch size `N` grows. * Observed scaling (NVPTX measurements): with unrolling enabled, `sin` dropped from ~3,100 cycles/call at `N=1` to ~360 at `N=128`. After enforcing `#pragma clang loop unroll(disable)`, results stabilized (e.g., from ~3100 cycles/call at `N=1` to ~2700 at `N=128`). * libdevice contrast: the libdevice `sin` path did not exhibit a similar drop in our measurements, and the PTX appears as compact internal calls rather than a long FMA chain, leaving less ILP for the outer loop to extract. What this change does: * Applies `#pragma clang loop unroll(disable)` to the GPU `throughput()` loop in both NVPTX and AMDGPU backends. Leaving unrolling entirely to the optimizer makes apples-to-apples comparisons uneven (e.g., libc vs. vendor). Disabling unrolling yields fairer, more consistent numbers.	2025-08-16 20:14:26 +00:00
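For readers unfamiliar with the pragma named in the commit above, here is a minimal sketch of how `#pragma clang loop unroll(disable)` attaches to a loop. The functions `work` and `run_batch` are hypothetical stand-ins, not the benchmark's `throughput()` code, and the `__clang__` guard is added only so that other compilers ignore the pragma.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the function under test; not the libc `sin`.
inline double work(double x) { return x * 1.5 + 1.0; }

// Calls work() once per input and accumulates the results. The pragma asks
// Clang not to unroll the loop that immediately follows it.
double run_batch(const double *inputs, std::size_t n) {
  assert(inputs != nullptr || n == 0);
  double acc = 0.0;
#if defined(__clang__)
#pragma clang loop unroll(disable)
#endif
  for (std::size_t i = 0; i < n; ++i)
    acc += work(inputs[i]);
  return acc;
}
```

The pragma changes only how the loop is compiled, not what it computes, so the returned sum is the same with or without it.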
Leandro Lacerda	cf5f311b26	[libc] Polish GPU benchmarking (#153900 ) This patch provides cleanups and improvements for the GPU benchmarking infrastructure. The key changes are: - Fix benchmark convergence bug: Round up the scaled iteration count (ceil) to ensure it grows properly. The previous truncation logic causes the iteration count to get stuck. - Resolve remaining compiler warning. - Remove unused `BenchmarkLogger` files: This is dead code that added maintenance and cognitive overhead without providing functionality. - Improve build hygiene: Clean up headers and CMake dependencies to strictly follow the 'include what you use' (IWYU) principle.	2025-08-15 19:51:52 -05:00
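The convergence bug described in the commit above can be reproduced in a few lines. This is a sketch under assumed values: the scale factor 1.5 and the names `next_trunc`, `next_ceil`, and `grow` are hypothetical, since the commit message does not give the benchmark's actual scaling code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scales the iteration count and truncates toward zero (the stuck behavior).
std::uint64_t next_trunc(std::uint64_t n, double scale) {
  return static_cast<std::uint64_t>(static_cast<double>(n) * scale);
}

// Scales the iteration count and rounds up, so the count always grows
// whenever scale > 1.
std::uint64_t next_ceil(std::uint64_t n, double scale) {
  return static_cast<std::uint64_t>(std::ceil(static_cast<double>(n) * scale));
}

// Applies `step` to `n` for `rounds` rounds and returns the final count.
std::uint64_t grow(std::uint64_t n, double scale, int rounds,
                   std::uint64_t (*step)(std::uint64_t, double)) {
  assert(rounds >= 0);
  for (int i = 0; i < rounds; ++i)
    n = step(n, scale);
  return n;
}
```

Starting from one iteration with a scale of 1.5, truncation maps 1.5 back to 1 on every round, so the count never moves. Rounding up gives 1, 2, 3, 5, 8 over four rounds.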
Leandro Lacerda	08ff017fb0	[libc] Improve GPU benchmarking (#153512 ) This patch improves the GPU benchmarking in this way: * Replace `rand`/`srand` with a deterministic per-thread RNG seeded by `call_index`: reproducible, apples-to-apples libc vs vendor comparisons. * Fix input generation: sample the unbiased exponent uniformly in `[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and `+0.0`. * Fix standard deviation: use an explicit estimator from sums and sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples. * Fix throughput overhead: subtract a loop-only baseline inside NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call already corrected (no `overhead()` call). * Adapt existing math benchmarks to the new RNG/timing plumbing (plumb `call_index`, drop `rand/srand`, clean includes). * Correct inter-thread aggregation: use iteration-weighted pooling to compute the global mean/variance, ensuring statistically sound `Cycles (Mean)` and `Stddev`. * Remove `Time / Iteration` column from the results table: it reported per-thread convergence time (not per-call latency) and was redundant/misleading next to `Cycles (Mean)`. * Remove unused `BenchmarkLogger` files: dead code that added maintenance and cognitive overhead without providing functionality. --- ## TODO (before merge) * [ ] Investigate compiler warnings and address their root causes. * [x] Review how per-thread results are aggregated into the overall result. ## Follow-ups (future PRs) * Add support to run throughput benchmarks with uniform (linear) input distributions, alongside the current log2-uniform scheme. * Review/adjust the configuration and coverage of existing math benchmarks. * Add more math benchmarks (e.g., `exp`/`expf`, others).	2025-08-15 11:00:17 -05:00
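The estimator `sqrt(E[x^2] − E[x]^2)` and the iteration-weighted pooling mentioned in the commit above can be sketched as follows. The type `SampleSums` and the helper functions are hypothetical names chosen for illustration and are not taken from the benchmark sources. Pooling by adding the raw counts and sums weights each thread by its number of iterations automatically.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-thread accumulator: sample count, sum, and sum of squares.
struct SampleSums {
  std::uint64_t count;
  double sum;
  double sum_sq;
};

// Records one sample.
inline void add_sample(SampleSums &s, double x) {
  s.count += 1;
  s.sum += x;
  s.sum_sq += x * x;
}

// Merges two accumulators; threads with more samples carry more weight.
inline SampleSums pool(const SampleSums &a, const SampleSums &b) {
  return SampleSums{a.count + b.count, a.sum + b.sum, a.sum_sq + b.sum_sq};
}

inline double mean(const SampleSums &s) {
  assert(s.count > 0);
  return s.sum / static_cast<double>(s.count);
}

// Population standard deviation: sqrt(E[x^2] - E[x]^2), clamped at zero to
// guard against small negative values caused by rounding.
inline double stddev(const SampleSums &s) {
  const double m = mean(s);
  const double var = s.sum_sq / static_cast<double>(s.count) - m * m;
  return std::sqrt(var > 0.0 ? var : 0.0);
}
```

As a check, one thread with samples 2 and 4 pooled with another holding the single sample 6 gives a global mean of 4 and a variance of 8/3, the same values obtained by treating 2, 4, and 6 as one data set.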
Mikhail R. Gadelha	d7199544af	[libc] Fix mbrtowc test (#153721 ) Previously, we were trying to memset a pointer that wasn't being initialized, and the test would randomly fail. This PR replaces the pointers with actual objects.	2025-08-15 11:44:33 -03:00
Krishna Pandey	6602d6c7a7	[libc][math][docs] Add documentation for BFloat16 type (#153475 ) Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-15 20:07:33 +05:30
William Huynh	6b16a276ef	[libc] Add startup code for ARM v7-A, ARM v7-R variants (#153576 ) These variants require a different exception table that requires a bit of initialisation. This allows us to enable testing for these variants downstream.	2025-08-15 09:17:50 +00:00
Muhammad Bassiouni	9ddc85f6d5	[libc][math] Refactor coshf implementation to header-only in src/__support/math folder. (#153427 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-14 17:19:47 +03:00
Krishna Pandey	41c9510d72	[libc][math][c++23] Add bf16fma{,f,l,f128} math functions (#153231 ) This PR adds the following basic math functions for BFloat16 type along with the tests: - bf16fma - bf16fmaf - bf16fmal - bf16fmaf128 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-13 23:26:15 +05:30
Muhammad Bassiouni	0f6d3ad0fe	[libc][math] Refactor cosf16 implementation to header-only in src/__support/math folder. (#152871 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-13 18:04:35 +03:00
Jin Huang	91de0a2c43	[libc] Refactor libc code to improve readability. (#153308 ) The PR is going to improve the readability for the files under `llvm-project/libc/src/wchar` directory. --------- Co-authored-by: Jin Huang <jingold@google.com>	2025-08-12 21:41:21 -07:00
Alexey Samsonov	04081caa09	[libc] Remove LIBC_ERRNO_MODE_SYSTEM mode. (#153077 ) Use LIBC_ERRNO_MODE_SYSTEM_INLINE instead as the default for the "public packaging" (i.e. release mode) of an overlay build. The Bazel build has already switched to use it by default in 5ccc734fa0355f971f8f515457a0bece33ab6642. This should be a safe change, as LIBC_ERRNO_MODE_SYSTEM_INLINE works as a drop-in (but simpler) replacement for LIBC_ERRNO_MODE_SYSTEM. Remove the associated code paths and config settings. Fixes issue #143454.	2025-08-12 19:52:40 -07:00
Krishna Pandey	c819c246f3	[libc][math][c++23] Add bf16div{,f,l,f128} math functions (#153191 ) This PR adds the following basic math functions for BFloat16 type along with the tests: - bf16div - bf16divf - bf16divl - bf16divf128 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-12 21:46:22 +05:30
Krishna Pandey	8c5e9399f6	[libc][math][c++23] Add f{max,min}imum{,_mag,_mag_num,_num}bf16 math functions (#152881 ) This PR adds the following basic math functions for BFloat16 type along with the tests: - fmaximumbf16 - fmaximum_magbf16 - fmaximum_mag_numbf16 - fmaximum_numbf16 - fminimumbf16 - fminimum_magbf16 - fminimum_mag_numbf16 - fminimum_numbf16 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-12 20:37:31 +05:30
William Huynh	33fe6353ef	Revert "[libc] Add -Wextra for libc tests" (#153169 ) Reverts llvm/llvm-project#133643	2025-08-12 11:40:14 +00:00
Vinay Deshmukh	e617dc80bf	[libc] Add -Wextra for libc tests (#133643 ) * Relates to: https://github.com/llvm/llvm-project/issues/119281	2025-08-12 12:27:13 +01:00
Joseph Huber	005895290d	[libc] Simplify slab waiting in GPU memory allocator (#152872 ) Summary: This moves the waiting to be done inside of the `try_lock` routine instead. This makes the logic much simpler since it's just a single loop on a load. We should have the same effect here, and since we don't care about this being a generic interface it shouldn't matter that it waits a bit. Still wait free since it's guaranteed to make progress eventually.	2025-08-11 13:11:39 -05:00
Muhammad Bassiouni	200a99073f	[libc][math] Refactor cosf implementation to header-only in src/__support/math folder. (#152069 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-11 21:08:21 +03:00
William Huynh	372d86dcf1	[libc] Cleanup startup/baremetal/arm/start.cpp (#151532 ) Post-commit review changes as suggested by @petrhosek in #146863	2025-08-11 15:11:36 +01:00
William Huynh	0adcbc0228	[libc] Disable LlvmLibcTimespecGet.Monotonic for baremetal targets (#152290 ) This test was caught by our hermetic testing downstream. The baremetal implementation does not support monotonic, so we disable it.	2025-08-11 14:48:11 +01:00
Krishna Pandey	628c0e33e4	[libc][math][c++23] Add bf16mul{,f,l,f128} math functions (#152847 ) This PR adds the following basic math functions for BFloat16 type along with the tests: - bf16mul - bf16mulf - bf16mull - bf16mulf128 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-10 21:38:27 -04:00
Muhammad Bassiouni	e86c9b6b1b	[libc][math] Refactor cos implementation to header-only in src/__support/math folder. (#151883 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-09 18:51:54 +03:00
Joseph Huber	0c139883f4	[libc] Fix server code when GPU is acting as the server Summary: Small fix that just ignores all the extra lanes if we're running the server from a platform that potentially has more.	2025-08-08 19:15:13 -05:00
Krishna Pandey	246f92324f	[libc][math][c++23] Add f{max,min}bf16 math functions (#152782 ) Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-08 17:54:02 -04:00
Krishna Pandey	10088b64ef	[libc][math] Update entrypoints for bf16{add,sub}{,f,l,f128} math functions (#152784 ) Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-08 17:06:48 -04:00
Joseph Huber	ca006898b3	[libc] Cache old slabs when allocating GPU memory (#151866 ) Summary: This patch introduces a lock-free stack used to store a fixed number of slabs. Instead of going directly through RPC memory, we instead can consult the cache and use that. Currently, this means that ~64 MiB of memory will remain in-use if the user completely fills the cache. However, because we always fully destroy the object, the chunk size can be reset so they can be fully reused. This greatly improves performance in cases where the user has previously accessed malloc, lowering the difference between an implementation that does not free slabs at all and one that does. We can also skip the expensive zeroing step if the old chunk size was smaller than the previous one. Smaller chunk sizes need a larger bitfield, and because we know for a fact that the number of users remaining in this slab is zero thanks to the reference counting we can guarantee that the bitfield is all zero like when it was initialized.	2025-08-08 14:28:41 -05:00
Krishna Pandey	1ffb99520d	[libc][math][c++23] Add bf16{add,sub}{,f,l,f128} math functions (#152774 ) This PR implements the following basic math functions for BFloat16 type along with the tests: - bf16add - bf16addf - bf16addl - bf16addf128 - bf16sub - bf16subf - bf16subl - bf16subf128 --------- Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>	2025-08-08 15:20:24 -04:00
Muhammad Bassiouni	45b15946b1	[libc][hdrgen] Fix hdrgen when using macros as guards in stdlib.yaml. (#152732 )	2025-08-08 18:39:47 +03:00
Muhammad Bassiouni	66734f4c3c	[libc][math] Refactor cbrtf implementation to header-only in src/__support/math folder. (#151846 ) Part of #147386 in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450	2025-08-08 18:28:50 +03:00

4543 Commits