llvm-project

Author	SHA1	Message	Date
Leandro Lacerda	cf5f311b26	[libc] Polish GPU benchmarking (#153900 ) This patch provides cleanups and improvements for the GPU benchmarking infrastructure. The key changes are: - Fix benchmark convergence bug: Round up the scaled iteration count (ceil) to ensure it grows properly. The previous truncation logic causes the iteration count to get stuck. - Resolve remaining compiler warning. - Remove unused `BenchmarkLogger` files: This is dead code that added maintenance and cognitive overhead without providing functionality. - Improve build hygiene: Clean up headers and CMake dependencies to strictly follow the 'include what you use' (IWYU) principle.	2025-08-15 19:51:52 -05:00
Leandro Lacerda	08ff017fb0	[libc] Improve GPU benchmarking (#153512 ) This patch improves the GPU benchmarking in this way: * Replace `rand`/`srand` with a deterministic per-thread RNG seeded by `call_index`: reproducible, apples-to-apples libc vs vendor comparisons. * Fix input generation: sample the unbiased exponent uniformly in `[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and `+0.0`. * Fix standard deviation: use an explicit estimator from sums and sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples. * Fix throughput overhead: subtract a loop-only baseline inside NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call already corrected (no `overhead()` call). * Adapt existing math benchmarks to the new RNG/timing plumbing (plumb `call_index`, drop `rand/srand`, clean includes). * Correct inter-thread aggregation: use iteration-weighted pooling to compute the global mean/variance, ensuring statistically sound `Cycles (Mean)` and `Stddev`. * Remove `Time / Iteration` column from the results table: it reported per-thread convergence time (not per-call latency) and was redundant/misleading next to `Cycles (Mean)`. * Remove unused `BenchmarkLogger` files: dead code that added maintenance and cognitive overhead without providing functionality. --- ## TODO (before merge) * [ ] Investigate compiler warnings and address their root causes. * [x] Review how per-thread results are aggregated into the overall result. ## Follow-ups (future PRs) * Add support to run throughput benchmarks with uniform (linear) input distributions, alongside the current log2-uniform scheme. * Review/adjust the configuration and coverage of existing math benchmarks. * Add more math benchmarks (e.g., `exp`/`expf`, others).	2025-08-15 11:00:17 -05:00
lntue	66603dd1f1	[libc][NFC] Add stdint.h proxy header to fix dependency issue with <stdint.h> includes. (#150303 ) https://github.com/llvm/llvm-project/issues/149993	2025-07-23 20:19:52 -04:00
jameshu15869	deb6b45c32	[libc][gpu] Add Atan2 Benchmarks (#104708 ) This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and `__ocml_atan2_f64()` using the same setup as `sin()`. This PR also adds support for throughput benchmarking for functions with 2 inputs.	2024-08-18 12:50:30 -05:00
jameshu15869	9a070d6d0f	[libc] [gpu] Add Generic, NvSin, and OcmlSinf64 Throughput Benchmark (#101917 ) This PR implements `2a158426d4` to provide better throughput benchmarking for libc `sin()` and `__nv_sin()`. These changes have not been tested on AMDGPU yet, only compiled.	2024-08-08 15:05:34 -05:00
jameshu15869	677796cab3	[libc] Add Generic and NVPTX Sin Benchmark (#99795 ) This PR adds sin benchmarking for a range of values and on a pregenerated random distribution.	2024-07-29 22:09:11 -05:00
jameshu15869	a09c0f676d	[libc] Add Minimum Time and Iterations, Reduce Epsilon (#100838 ) This PR adds minimums (50 iterations, 500 us, and epsilon of 0.0001) to ensure that all benchmarks run at least a set number of times before outputting a final measurement.	2024-07-26 20:30:19 -05:00
jameshu15869	197b142232	[libc] Add N Threads Benchmark Helper (#99834 ) This PR adds a `BENCHMARK_N_THREADS()` helper to register benchmarks with a specific number of threads. This PR replaces the flags used originally to allow any number of threads.	2024-07-21 21:56:40 -05:00
jameshu15869	a964f2e8a1	[libc] Improve Benchmark UI (#99796 ) This PR changes the output to resemble Google Benchmark. e.g. ``` Running Suite: LlvmLibcIsAlNumGpuBenchmark Benchmark \| Cycles \| Min \| Max \| Iterations \| Time (ns) \| Stddev \| Threads \| ----------------------------------------------------------------------------------------------------- IsAlnum \| 92 \| 76 \| 482 \| 23 \| 86500 \| 76 \| 64 \| IsAlnumSingleThread \| 87 \| 76 \| 302 \| 20 \| 72000 \| 49 \| 1 \| IsAlnumSingleWave \| 87 \| 76 \| 302 \| 20 \| 72000 \| 49 \| 32 \| IsAlnumCapital \| 89 \| 76 \| 299 \| 17 \| 78500 \| 52 \| 64 \| IsAlnumNotAlnum \| 87 \| 76 \| 303 \| 20 \| 76000 \| 49 \| 64 \| ```	2024-07-21 16:40:01 -05:00
jameshu15869	8badfccefe	[libc] Add Multithreaded GPU Benchmarks (#98964 ) This PR runs benchmarks on 32 threads (a single warp on NVPTX) by default, adding the option for single-threaded benchmarks. We can specify that a benchmark should be run on a single thread using the `SINGLE_THREADED_BENCHMARK()` macro. I chose to use a flag here so that other options could be added in the future.	2024-07-18 07:18:23 -05:00
jameshu15869	b42c332d73	[libc] Use Atomics in GPU Benchmarks (#98842 ) This PR replaces our old method of reducing the benchmark results, which used an array, with atomics. This should help us implement single-threaded benchmarks.	2024-07-15 07:08:23 -05:00
Petr Hosek	5ff3ff33ff	[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597 ) This is a part of #97655.	2024-07-12 09:28:41 -07:00
Mehdi Amini	ce9035f5bd	Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593 ) Reverts llvm/llvm-project#98075. Bots are broken.	2024-07-12 09:12:13 +02:00
Petr Hosek	3f30effe1b	[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075 ) This is a part of #97655.	2024-07-11 12:35:22 -07:00
jameshu15869	f4e6ddbc2e	[libc] Fix Cppcheck Issues (#96999 ) This PR fixes linting issues discovered by `cppcheck`. Fixes: https://github.com/llvm/llvm-project/issues/96863	2024-07-06 17:53:36 -05:00
jameshu15869	02b57dedb7	[libc] NVPTX Profiling (#92009 ) PR for adding microbenchmarking infrastructure for NVPTX. `nvlink` cannot perform LTO, so we cannot inline `libc` functions and this function call overhead is not adjusted for during microbenchmarking.	2024-06-26 16:38:39 -05:00

16 Commits
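
Two of the fixes described in the log above (iteration-weighted pooling with a standard deviation computed from sums and sums-of-squares, and rounding the scaled iteration count up) can be illustrated with a small host-side sketch. This is not the code from the commits: `ThreadSums`, `PooledStats`, `pool`, `scale_iterations`, and the growth factor passed to it are hypothetical names and values chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-thread accumulator (not the actual libc benchmark type).
struct ThreadSums {
  std::uint64_t iterations; // number of samples this thread collected
  double sum;               // sum of per-sample cycle counts
  double sum_sq;            // sum of squared per-sample cycle counts
};

struct PooledStats {
  double mean;
  double stddev;
};

// Pool per-thread sums so that every sample counts once, i.e. each thread
// is weighted by its iteration count rather than averaged per thread.
inline PooledStats pool(const ThreadSums *threads, int count) {
  std::uint64_t n = 0;
  double sum = 0.0;
  double sum_sq = 0.0;
  for (int i = 0; i < count; ++i) {
    n += threads[i].iterations;
    sum += threads[i].sum;
    sum_sq += threads[i].sum_sq;
  }
  assert(n > 0 && "need at least one sample");
  const double mean = sum / static_cast<double>(n);
  // Population variance: E[x^2] - E[x]^2.
  double variance = sum_sq / static_cast<double>(n) - mean * mean;
  if (variance < 0.0)
    variance = 0.0; // guard against tiny negative values from rounding
  return {mean, std::sqrt(variance)};
}

// Grow the iteration count by a factor, rounding up. With truncation,
// a count of 1 scaled by any factor below 2 would stay at 1 forever.
inline std::uint32_t scale_iterations(std::uint32_t iterations, double factor) {
  return static_cast<std::uint32_t>(std::ceil(iterations * factor));
}
```

For example, a thread with 2 samples of 10 cycles and a thread with 6 samples of 20 cycles pool to a mean of 17.5 cycles, whereas averaging the two per-thread means would give 15.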