10 Commits

Author SHA1 Message Date
jameshu15869
deb6b45c32
[libc][gpu] Add Atan2 Benchmarks (#104708)
This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and
`__ocml_atan2_f64()` using the same setup as `sin()`. This PR also adds
support for throughout bencmarking for functions with 2 inputs.
2024-08-18 12:50:30 -05:00
jameshu15869
2b592b16c1
[libc][gpu] Add Sinf Benchmarks (#102532)
This PR adds benchmarking for `sinf()` using the same set up as `sin()`
but with a smaller range for floats.
2024-08-08 16:26:26 -05:00
jameshu15869
1248698e9b
[libc] [gpu] Fix Minor Benchmark UI Issues (#102529)
Previously, `AmdgpuSinTwoPow_128` and others were too large for their
table cells. This PR shortens the name to `AmdSin...`

There were also some `-` missing in the separator. This PR instead
creates the separator string using the length of the headers.
2024-08-08 15:32:20 -05:00
jameshu15869
9a070d6d0f
[libc] [gpu] Add Generic, NvSin, and OcmlSinf64 Throughput Benchmark (#101917)
This PR implements
2a158426d4
to provide better throughput benchmarking for libc `sin()` and
`__nv_sin()`.

These changes have not been tested on AMDGPU yet, only compiled.
2024-08-08 15:05:34 -05:00
Joseph Huber
ebdcb76d1a [libc] Only link in the appropriate architecture's device libs 2024-07-30 18:36:41 -05:00
jameshu15869
8f7910a4fc
[libc] Add AMDGPU Sin Benchmark (#101120)
This PR adds support for benchmarking `__ocml_sin_f64()` against
`sin()`. This PR is currently a draft because I do not have access to an
AMD GPU and was not able to test the PR, but the code compiled when I
ran `ninja gpu-benchmark` from `runtimes-amdgcn-amd-amdhsa-bins`

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-07-30 10:19:48 -05:00
jameshu15869
677796cab3
[libc] Add Generic and NVPTX Sin Benchmark (#99795)
This PR adds sin benchmarking for a range of values and on a
pregenerated random distribution.
2024-07-29 22:09:11 -05:00
jameshu15869
8badfccefe
[libc] Add Multithreaded GPU Benchmarks (#98964)
This PR runs benchmarks on a 32 threads (A single warp on NVPTX) by
default, adding the option for single threaded benchmarks. We can
specify that a benchmark should be run on a single thread using the
`SINGLE_THREADED_BENCHMARK()` macro.

I chose to use a flag here so that other options could be added in the
future.
2024-07-18 07:18:23 -05:00
jameshu15869
eeed5896de
[libc] Correctly Run Multiple Benchmarks in the Same File (#98467)
There was previously an issue where registering multiple benchmarks in
the same file would only give the results for the last benchmark to run.
This PR fixes the issue.

@jhuber6
2024-07-11 06:58:10 -05:00
jameshu15869
02b57dedb7
[libc] NVPTX Profiling (#92009)
PR for adding microbenchmarking infrastructure for NVPTX. `nvlink`
cannot perform LTO, so we cannot inline `libc` functions and this
function call overhead is not adjusted for during microbenchmarking.
2024-06-26 16:38:39 -05:00