llvm-project

Author	SHA1	Message	Date
Leandro Lacerda	b1f5da4328	[libc][gpu] Add exp/log benchmarks and flexible input generation (#155727) This patch adds GPU benchmarks for the exp (`exp`, `expf`, `expf16`) and log (`log`, `logf`, `logf16`) families of math functions. Adding these benchmarks revealed a key limitation in the existing framework: the input generation mechanism was hardcoded to a single strategy that sampled numbers with a uniform distribution of their unbiased exponents. While this strategy is effective for values spanning multiple orders of magnitude, it is not suitable for linear ranges, and the previous framework lacked the flexibility to support anything else. Summary of changes: (1) Framework refactoring for flexible input sampling: the GPU benchmark framework was refactored to support multiple, pluggable input sampling strategies. `Random.h`: a new header houses the `RandomGenerator` and the new distribution classes. Distribution classes: two sampling strategies were implemented. `UniformExponent` formalizes the previous logic of sampling numbers with a uniform distribution of their unbiased exponents; it can now also be configured to produce only positive values, which is essential for functions like `log`. `UniformLinear` is a new strategy that samples numbers from a uniform distribution over a linear interval `[min, max)`. `MathPerf` update: the `MathPerf` class gained a generic `run_throughput` method templated on a distribution object, which makes the framework extensible to future sampling strategies. (2) New benchmarks for `exp` and `log`: using the refactored framework, benchmarks were added for `exp`, `expf`, `expf16`, `log`, `logf`, and `logf16`. The test intervals were chosen to measure the performance of distinct behavioral regions of each function.	2025-08-27 21:05:10 -05:00
Leandro Lacerda	08ff017fb0	[libc] Improve GPU benchmarking (#153512) This patch improves GPU benchmarking as follows. Replace `rand`/`srand` with a deterministic per-thread RNG seeded by `call_index`: reproducible, apples-to-apples libc vs. vendor comparisons. Fix input generation: sample the unbiased exponent uniformly in `[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and `+0.0`. Fix standard deviation: use an explicit estimator from sums and sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples. Fix throughput overhead: subtract a loop-only baseline inside the NVPTX/AMDGPU timing backends so `benchmark()` receives cycles-per-call already corrected (no `overhead()` call). Adapt existing math benchmarks to the new RNG/timing plumbing (plumb `call_index`, drop `rand`/`srand`, clean up includes). Correct inter-thread aggregation: use iteration-weighted pooling to compute the global mean/variance, ensuring statistically sound `Cycles (Mean)` and `Stddev`. Remove the `Time / Iteration` column from the results table: it reported per-thread convergence time (not per-call latency) and was redundant and misleading next to `Cycles (Mean)`. Remove the unused `BenchmarkLogger` files: dead code that added maintenance and cognitive overhead without providing functionality. TODO (before merge): [ ] Investigate compiler warnings and address their root causes; [x] Review how per-thread results are aggregated into the overall result. Follow-ups (future PRs): add support for running throughput benchmarks with uniform (linear) input distributions alongside the current log2-uniform scheme; review and adjust the configuration and coverage of existing math benchmarks; add more math benchmarks (e.g., `exp`/`expf`, others).	2025-08-15 11:00:17 -05:00
lntue	66603dd1f1	[libc][NFC] Add stdint.h proxy header to fix dependency issue with <stdint.h> includes. (#150303) https://github.com/llvm/llvm-project/issues/149993	2025-07-23 20:19:52 -04:00
Joseph Huber	de59e7b86c	[libc] Fix GPU benchmarking	2025-07-18 14:36:23 -05:00
jameshu15869	deb6b45c32	[libc][gpu] Add Atan2 Benchmarks (#104708) This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and `__ocml_atan2_f64()` using the same setup as `sin()`. This PR also adds support for throughput benchmarking of functions with two inputs.	2024-08-18 12:50:30 -05:00
jameshu15869	2b592b16c1	[libc][gpu] Add Sinf Benchmarks (#102532) This PR adds benchmarking for `sinf()` using the same setup as `sin()` but with a smaller range for floats.	2024-08-08 16:26:26 -05:00
jameshu15869	1248698e9b	[libc] [gpu] Fix Minor Benchmark UI Issues (#102529) Previously, `AmdgpuSinTwoPow_128` and others were too long for their table cells. This PR shortens the name to `AmdSin...`. Some `-` characters were also missing from the separator; this PR now builds the separator string from the length of the headers.	2024-08-08 15:32:20 -05:00
jameshu15869	9a070d6d0f	[libc] [gpu] Add Generic, NvSin, and OcmlSinf64 Throughput Benchmark (#101917) This PR implements `2a158426d4` to provide better throughput benchmarking for libc `sin()` and `__nv_sin()`. These changes have not been tested on AMDGPU yet, only compiled.	2024-08-08 15:05:34 -05:00
Joseph Huber	ebdcb76d1a	[libc] Only link in the appropriate architecture's device libs	2024-07-30 18:36:41 -05:00
jameshu15869	8f7910a4fc	[libc] Add AMDGPU Sin Benchmark (#101120) This PR adds support for benchmarking `__ocml_sin_f64()` against `sin()`. This PR is currently a draft because I do not have access to an AMD GPU and was not able to test the PR, but the code compiled when I ran `ninja gpu-benchmark` from `runtimes-amdgcn-amd-amdhsa-bins`. Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-07-30 10:19:48 -05:00
jameshu15869	677796cab3	[libc] Add Generic and NVPTX Sin Benchmark (#99795) This PR adds sin benchmarking for a range of values and on a pregenerated random distribution.	2024-07-29 22:09:11 -05:00

11 Commits