16 Commits

Author SHA1 Message Date
Leandro Lacerda
cf5f311b26
[libc] Polish GPU benchmarking (#153900)
This patch provides cleanups and improvements for the GPU benchmarking
infrastructure. The key changes are:

- Fix benchmark convergence bug: Round up the scaled iteration count
(ceil) to ensure it grows properly. The previous truncation logic caused
the iteration count to get stuck.
- Resolve remaining compiler warning.
- Remove unused `BenchmarkLogger` files: This is dead code that added
maintenance and cognitive overhead without providing functionality.
- Improve build hygiene: Clean up headers and CMake dependencies to
strictly follow the 'include what you use' (IWYU) principle.
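
To illustrate the convergence fix described above, here is a minimal sketch of why truncation can stall the iteration count while rounding up cannot. The helper names and the growth factor of 1.4 are assumptions for illustration, not taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: scale the iteration count by a growth factor and
// round up, so the count strictly increases whenever factor > 1.
inline uint32_t next_iterations(uint32_t current, double factor) {
  assert(current > 0 && factor > 1.0);
  return static_cast<uint32_t>(std::ceil(current * factor));
}

// Truncating variant, shown only for contrast: for small counts the
// fractional growth is discarded and the count never changes.
inline uint32_t next_iterations_truncated(uint32_t current, double factor) {
  return static_cast<uint32_t>(current * factor);
}
```

With a count of 1 and a factor of 1.4, the truncating variant returns 1 again on every step, whereas the rounding variant moves on to 2 and then 3.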
2025-08-15 19:51:52 -05:00
Leandro Lacerda
08ff017fb0
[libc] Improve GPU benchmarking (#153512)
This patch improves GPU benchmarking in the following ways:

* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by
`call_index`: reproducible, apples-to-apples libc vs vendor comparisons.
* Fix input generation: sample the unbiased exponent uniformly in
`[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and
`+0.0`.
* Fix standard deviation: use an explicit estimator from sums and
sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside
NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call
already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (plumb
`call_index`, drop `rand/srand`, clean includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to
compute the global mean/variance, ensuring statistically sound `Cycles
(Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported
per-thread convergence time (not per-call latency) and was
redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added
maintenance and cognitive overhead without providing functionality.
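
The standard-deviation and aggregation items above can be sketched together. The following is a minimal host-side illustration, assuming each thread reports its iteration count, mean, and population variance; the `ThreadStats` and `Pooled` types and the `pool` function are hypothetical names, not the ones used in the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-thread record (not the actual libc benchmark struct).
struct ThreadStats {
  uint64_t iterations; // number of samples this thread collected
  double mean;         // per-thread mean cycles per call
  double variance;     // per-thread population variance
};

struct Pooled {
  double mean;
  double stddev;
};

// Iteration-weighted pooling: rebuild the global sum and sum of squares
// from each thread's moments, then apply sqrt(E[x^2] - E[x]^2).
inline Pooled pool(const ThreadStats *threads, int count) {
  assert(count > 0);
  double n = 0.0, sum = 0.0, sum_sq = 0.0;
  for (int i = 0; i < count; ++i) {
    const double w = static_cast<double>(threads[i].iterations);
    n += w;
    sum += w * threads[i].mean;
    // For one thread, E[x^2] = variance + mean^2.
    sum_sq += w * (threads[i].variance + threads[i].mean * threads[i].mean);
  }
  assert(n > 0.0);
  const double mean = sum / n;
  const double var = sum_sq / n - mean * mean;
  // Guard against a tiny negative value caused by rounding.
  return {mean, std::sqrt(var > 0.0 ? var : 0.0)};
}
```

Weighting by iteration count matters because threads may converge after different numbers of iterations; an unweighted average of per-thread means would give a thread with few samples the same influence as one with many.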

---

## TODO (before merge)

* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall
result.

## Follow-ups (future PRs)

* Add support to run throughput benchmarks with uniform (linear) input
distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math
benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).
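
For context on the log2-uniform scheme mentioned above, here is a rough sketch of drawing a double whose unbiased exponent is approximately uniform in `[min_exp, max_exp]`. It is an illustration under stated assumptions: SplitMix64 stands in for the per-thread RNG, the function names are hypothetical, and only positive values are produced.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// SplitMix64 step, used here as a stand-in deterministic RNG
// (an assumption; the patch's actual generator may differ).
inline uint64_t splitmix64(uint64_t &state) {
  uint64_t z = (state += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

// Hypothetical sampler: returns a positive normal double with unbiased
// exponent in [min_exp, max_exp] and a random 52-bit mantissa.
inline double sample_log2_uniform(uint64_t &state, int min_exp, int max_exp) {
  // Clamp to the normal exponent range of IEEE 754 binary64.
  if (min_exp < -1022) min_exp = -1022;
  if (max_exp > 1023) max_exp = 1023;
  assert(min_exp <= max_exp);
  const uint64_t span = static_cast<uint64_t>(max_exp - min_exp) + 1;
  // The modulo reduction has negligible bias for spans this small.
  const int offset = static_cast<int>(splitmix64(state) % span);
  // Biased exponent stays in [1, 2046], so zero, Inf, and NaN cannot occur.
  const uint64_t exp_bits = static_cast<uint64_t>(min_exp + offset + 1023);
  const uint64_t mantissa = splitmix64(state) >> 12; // keep 52 random bits
  const uint64_t bits = (exp_bits << 52) | mantissa;
  double out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}
```

Seeding the state from a value such as a call index makes the sequence reproducible, so two implementations can be fed identical inputs.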
2025-08-15 11:00:17 -05:00
lntue
66603dd1f1
[libc][NFC] Add stdint.h proxy header to fix dependency issue with <stdint.h> includes. (#150303)
https://github.com/llvm/llvm-project/issues/149993
2025-07-23 20:19:52 -04:00
jameshu15869
deb6b45c32
[libc][gpu] Add Atan2 Benchmarks (#104708)
This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and
`__ocml_atan2_f64()` using the same setup as `sin()`. This PR also adds
support for throughput benchmarking for functions with 2 inputs.
2024-08-18 12:50:30 -05:00
jameshu15869
9a070d6d0f
[libc] [gpu] Add Generic, NvSin, and OcmlSinf64 Throughput Benchmark (#101917)
This PR implements 2a158426d4 to provide better throughput benchmarking
for libc `sin()` and `__nv_sin()`.

These changes have not been tested on AMDGPU yet, only compiled.
2024-08-08 15:05:34 -05:00
jameshu15869
677796cab3
[libc] Add Generic and NVPTX Sin Benchmark (#99795)
This PR adds sin benchmarking for a range of values and on a
pregenerated random distribution.
2024-07-29 22:09:11 -05:00
jameshu15869
a09c0f676d
[libc] Add Minimum Time and Iterations, Reduce Epsilon (#100838)
This PR adds minimums (50 iterations and 500 us) and reduces epsilon to
0.0001, to ensure that all benchmarks run at least a set number of times
before outputting a final measurement.
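
A stopping check built on these thresholds might look like the sketch below. The struct, the function name, and the use of epsilon as a bound on the relative change between successive estimates are assumptions for illustration; only the three numeric values come from the message above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical options struct holding the thresholds named in the commit.
struct StopOptions {
  uint32_t min_iterations = 50;
  uint64_t min_duration_ns = 500000; // 500 us
  double epsilon = 0.0001;
};

// Returns true only when both minimums are met and the estimate has
// stabilized (relative change below epsilon, an assumed criterion).
inline bool may_stop(const StopOptions &opts, uint32_t iterations,
                     uint64_t elapsed_ns, double prev_estimate,
                     double cur_estimate) {
  assert(prev_estimate > 0.0);
  if (iterations < opts.min_iterations)
    return false;
  if (elapsed_ns < opts.min_duration_ns)
    return false;
  const double rel = std::fabs(cur_estimate - prev_estimate) / prev_estimate;
  return rel < opts.epsilon;
}
```

Checking the minimums first keeps a benchmark from reporting after a handful of lucky, nearly identical samples.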
2024-07-26 20:30:19 -05:00
jameshu15869
197b142232
[libc] Add N Threads Benchmark Helper (#99834)
This PR adds a `BENCHMARK_N_THREADS()` helper to register benchmarks
with a specific number of threads. It replaces the flags used
originally, allowing any number of threads.
2024-07-21 21:56:40 -05:00
jameshu15869
a964f2e8a1
[libc] Improve Benchmark UI (#99796)
This PR changes the output to resemble Google Benchmark, e.g.:

```
Running Suite: LlvmLibcIsAlNumGpuBenchmark
Benchmark            |  Cycles |     Min |     Max | Iterations |   Time (ns) |   Stddev |  Threads |
-----------------------------------------------------------------------------------------------------
IsAlnum              |      92 |      76 |     482 |         23 |       86500 |       76 |       64 |
IsAlnumSingleThread  |      87 |      76 |     302 |         20 |       72000 |       49 |        1 |
IsAlnumSingleWave    |      87 |      76 |     302 |         20 |       72000 |       49 |       32 |
IsAlnumCapital       |      89 |      76 |     299 |         17 |       78500 |       52 |       64 |
IsAlnumNotAlnum      |      87 |      76 |     303 |         20 |       76000 |       49 |       64 |
```
2024-07-21 16:40:01 -05:00
jameshu15869
8badfccefe
[libc] Add Multithreaded GPU Benchmarks (#98964)
This PR runs benchmarks on 32 threads (a single warp on NVPTX) by
default, adding the option for single-threaded benchmarks. We can
specify that a benchmark should be run on a single thread using the
`SINGLE_THREADED_BENCHMARK()` macro.

I chose to use a flag here so that other options could be added in the
future.
2024-07-18 07:18:23 -05:00
jameshu15869
b42c332d73
[libc] Use Atomics in GPU Benchmarks (#98842)
This PR replaces our old array-based reduction of the benchmark results
with atomics. This should help us implement single-threaded benchmarks.
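
As a rough host-side illustration of reducing results with atomics instead of a per-thread array, consider the sketch below. It uses `std::atomic` as a stand-in for whatever atomic type the GPU code uses, and the `AtomicResult` struct is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared accumulator that every thread updates directly.
struct AtomicResult {
  std::atomic<uint64_t> total_cycles{0};
  std::atomic<uint64_t> samples{0};
  std::atomic<uint64_t> min_cycles{UINT64_MAX};
  std::atomic<uint64_t> max_cycles{0};

  void record(uint64_t cycles) {
    total_cycles.fetch_add(cycles, std::memory_order_relaxed);
    samples.fetch_add(1, std::memory_order_relaxed);
    // Min and max need a compare-exchange loop; on failure `seen` is
    // refreshed with the current value and the condition is rechecked.
    uint64_t seen = min_cycles.load(std::memory_order_relaxed);
    while (cycles < seen && !min_cycles.compare_exchange_weak(
                                seen, cycles, std::memory_order_relaxed)) {
    }
    seen = max_cycles.load(std::memory_order_relaxed);
    while (cycles > seen && !max_cycles.compare_exchange_weak(
                                seen, cycles, std::memory_order_relaxed)) {
    }
  }

  double mean() const {
    const uint64_t n = samples.load(std::memory_order_relaxed);
    assert(n > 0);
    return static_cast<double>(total_cycles.load(std::memory_order_relaxed)) /
           static_cast<double>(n);
  }
};
```

Because no slot is reserved per thread, the same accumulator works whether one thread or a full warp records into it.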
2024-07-15 07:08:23 -05:00
Petr Hosek
5ff3ff33ff
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)
This is a part of #97655.
2024-07-12 09:28:41 -07:00
Mehdi Amini
ce9035f5bd
Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593)
Reverts llvm/llvm-project#98075

bots are broken
2024-07-12 09:12:13 +02:00
Petr Hosek
3f30effe1b
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075)
This is a part of #97655.
2024-07-11 12:35:22 -07:00
jameshu15869
f4e6ddbc2e
[libc] Fix Cppcheck Issues (#96999)
This PR fixes linting issues discovered by `cppcheck`.

Fixes: https://github.com/llvm/llvm-project/issues/96863
2024-07-06 17:53:36 -05:00
jameshu15869
02b57dedb7
[libc] NVPTX Profiling (#92009)
This PR adds microbenchmarking infrastructure for NVPTX. `nvlink`
cannot perform LTO, so `libc` functions cannot be inlined, and the
resulting function-call overhead is not adjusted for during
microbenchmarking.
2024-06-26 16:38:39 -05:00