26 Commits

Author SHA1 Message Date
jameshu15869
deb6b45c32
[libc][gpu] Add Atan2 Benchmarks (#104708)
This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and
`__ocml_atan2_f64()` using the same setup as `sin()`. This PR also adds
support for throughput benchmarking of functions with 2 inputs.
2024-08-18 12:50:30 -05:00
Schrodinger ZHU Yifan
b7c7dbd473
Revert "libc: Remove extern "C" from main declarations" (#102827)
Reverts llvm/llvm-project#102825
2024-08-11 13:40:50 -07:00
David Blaikie
1b71c471c7
libc: Remove extern "C" from main declarations (#102825)
This is invalid in C++, and clang recently started warning on it as of
#101853
2024-08-11 13:17:27 -07:00
jameshu15869
2b592b16c1
[libc][gpu] Add Sinf Benchmarks (#102532)
This PR adds benchmarking for `sinf()` using the same setup as `sin()`
but with a smaller range for floats.
2024-08-08 16:26:26 -05:00
jameshu15869
1248698e9b
[libc] [gpu] Fix Minor Benchmark UI Issues (#102529)
Previously, `AmdgpuSinTwoPow_128` and others were too large for their
table cells. This PR shortens the name to `AmdSin...`

There were also some `-` missing in the separator. This PR instead
creates the separator string using the length of the headers.
2024-08-08 15:32:20 -05:00
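The separator fix described in #102529 can be illustrated with a minimal sketch. The helper name `make_separator` is hypothetical and is not taken from the libc sources; the sketch only shows the idea of deriving the dash count from the header row rather than hard-coding it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the actual libc code): build a separator that is
// exactly as wide as the header row, so the two cannot drift apart.
std::string make_separator(const std::string &header) {
  assert(!header.empty() && "header row must not be empty");
  return std::string(header.size(), '-');
}
```

Because the width comes from `header.size()`, adding or widening a column changes the separator automatically.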
jameshu15869
9a070d6d0f
[libc] [gpu] Add Generic, NvSin, and OcmlSinf64 Throughput Benchmark (#101917)
This PR implements
2a158426d4
to provide better throughput benchmarking for libc `sin()` and
`__nv_sin()`.

These changes have not been tested on AMDGPU yet, only compiled.
2024-08-08 15:05:34 -05:00
jameshu15869
39826b1030
[libc] [gpu] Change Time To Be Per Iteration (#101919)
Previously, the time field was the total time taken to run all iterations
of the benchmark. This PR changes the value displayed to be the average
time taken by each iteration.
2024-08-05 08:27:31 -05:00
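The change in #101919 amounts to dividing the accumulated time by the iteration count before display. A minimal sketch, assuming nanosecond totals and integer division; the function name `time_per_iteration_ns` is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert the total time spent across all iterations
// into the average time of a single iteration (integer division, truncating).
uint64_t time_per_iteration_ns(uint64_t total_time_ns, uint64_t iterations) {
  assert(iterations > 0 && "at least one iteration must have run");
  return total_time_ns / iterations;
}
```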
Joseph Huber
ebdcb76d1a
[libc] Only link in the appropriate architecture's device libs
2024-07-30 18:36:41 -05:00
jameshu15869
8f7910a4fc
[libc] Add AMDGPU Sin Benchmark (#101120)
This PR adds support for benchmarking `__ocml_sin_f64()` against
`sin()`. This PR is currently a draft because I do not have access to an
AMD GPU and was not able to test the PR, but the code compiled when I
ran `ninja gpu-benchmark` from `runtimes-amdgcn-amd-amdhsa-bins`

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-07-30 10:19:48 -05:00
jameshu15869
677796cab3
[libc] Add Generic and NVPTX Sin Benchmark (#99795)
This PR adds sin benchmarking for a range of values and on a
pregenerated random distribution.
2024-07-29 22:09:11 -05:00
Joseph Huber
79afb94da1
[libc] Make NVPTX benchmarks use LTO for linking
Summary:
Now that we can do LTO, we can make the benchmarks more accurate by
allowing optimization + inlining of the implementation.
2024-07-27 06:53:12 -05:00
jameshu15869
a09c0f676d
[libc] Add Minimum Time and Iterations, Reduce Epsilon (#100838)
This PR adds minimums (50 iterations, 500 us, and epsilon of 0.0001) to
ensure that all benchmarks run at least a set number of times before
outputting a final measurement.
2024-07-26 20:30:19 -05:00
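One way to read the minimums from #100838 is as a stopping rule: the benchmark loop may only finish once both floors are met and the running estimate has stabilized. The sketch below is an assumption about how such a rule could be combined, not the actual libc implementation. The struct, field, and function names are hypothetical; only the three default values come from the commit message.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical options; defaults mirror the values quoted in the commit
// message (50 iterations, 500 us, epsilon of 0.0001).
struct StopOptions {
  uint32_t min_iterations = 50;
  uint64_t min_duration_ns = 500000; // 500 us
  double epsilon = 0.0001;
};

// Returns true only when both minimums are satisfied and the relative change
// between two successive estimates has dropped below epsilon.
bool should_stop(const StopOptions &opts, uint32_t iterations,
                 uint64_t elapsed_ns, double prev_estimate,
                 double curr_estimate) {
  assert(opts.epsilon > 0.0 && "epsilon must be positive");
  if (iterations < opts.min_iterations || elapsed_ns < opts.min_duration_ns)
    return false;
  double change = prev_estimate != 0.0
                      ? std::fabs(curr_estimate - prev_estimate) /
                            std::fabs(prev_estimate)
                      : std::fabs(curr_estimate);
  return change < opts.epsilon;
}
```

With these defaults, a benchmark that converges after 10 iterations still keeps running until it has completed 50 iterations and 500 us of measured time.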
Joseph Huber
6911f823ad
[libc] Fix invalid format specifier in benchmark
Summary:
This value is a uint32_t but is printed as a uint64_t, leading to
invalid offsets when done on AMDGPU due to its packed format extending
past the buffer.
2024-07-22 11:21:22 -05:00
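The class of bug fixed in 6911f823ad can be avoided by pairing each fixed-width integer with its matching `<cinttypes>` macro. The following is a generic host-side sketch, not the GPU benchmark code itself; `format_iterations` is a hypothetical name.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a 32-bit count with the specifier that matches
// its width. Using a 64-bit specifier here would make printf read more bytes
// than were passed for the argument.
std::string format_iterations(uint32_t iterations) {
  char buffer[16]; // a uint32_t needs at most 10 digits plus the terminator
  int written = std::snprintf(buffer, sizeof(buffer), "%" PRIu32, iterations);
  assert(written > 0 && static_cast<size_t>(written) < sizeof(buffer));
  return std::string(buffer, static_cast<size_t>(written));
}
```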
jameshu15869
197b142232
[libc] Add N Threads Benchmark Helper (#99834)
This PR adds a `BENCHMARK_N_THREADS()` helper to register benchmarks
with a specific number of threads. This PR replaces the flags used
originally to allow any number of threads.
2024-07-21 21:56:40 -05:00
jameshu15869
a964f2e8a1
[libc] Improve Benchmark UI (#99796)
This PR changes the output to resemble Google Benchmark. e.g.

```
Running Suite: LlvmLibcIsAlNumGpuBenchmark
Benchmark            |  Cycles |     Min |     Max | Iterations |   Time (ns) |   Stddev |  Threads |
-----------------------------------------------------------------------------------------------------
IsAlnum              |      92 |      76 |     482 |         23 |       86500 |       76 |       64 |
IsAlnumSingleThread  |      87 |      76 |     302 |         20 |       72000 |       49 |        1 |
IsAlnumSingleWave    |      87 |      76 |     302 |         20 |       72000 |       49 |       32 |
IsAlnumCapital       |      89 |      76 |     299 |         17 |       78500 |       52 |       64 |
IsAlnumNotAlnum      |      87 |      76 |     303 |         20 |       76000 |       49 |       64 |
```
2024-07-21 16:40:01 -05:00
jameshu15869
ef47bbb471
[libc] Add AMDGPU Timing to CMake (#99603)
`libc/benchmarks/gpu/timing/CMakeLists.txt` did not correctly build
`amdgpu` utils. This PR fixes that issue by adding `amdgpu` to the loop
that adds the correct sub directories.
2024-07-19 06:57:55 -05:00
jameshu15869
8badfccefe
[libc] Add Multithreaded GPU Benchmarks (#98964)
This PR runs benchmarks on 32 threads (a single warp on NVPTX) by
default, adding the option for single threaded benchmarks. We can
specify that a benchmark should be run on a single thread using the
`SINGLE_THREADED_BENCHMARK()` macro.

I chose to use a flag here so that other options could be added in the
future.
2024-07-18 07:18:23 -05:00
jameshu15869
1ecffdaf27
[libc] Add Kernel Resource Usage to nvptx-loader (#97503)
This PR allows `nvptx-loader` to read the resource usage of `_start`,
`_begin`, and `_end` when executing CUDA binaries.

Example output:
```
$ nvptx-loader --print-resource-usage libc/benchmarks/gpu/src/ctype/libc.benchmarks.gpu.src.ctype.isalnum_benchmark.__build__
[ RUN      ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper
[       OK ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper: 93 cycles, 76 min, 470 max, 23 iterations, 78000 ns, 80 stddev
_begin registers: 25
_start registers: 80
_end registers: 62
```

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-07-17 16:07:12 -05:00
jameshu15869
b42c332d73
[libc] Use Atomics in GPU Benchmarks (#98842)
This PR replaces our old array-based method of reducing the benchmark
results with atomics. This should help us implement single threaded
benchmarks.
2024-07-15 07:08:23 -05:00
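The atomics-based reduction from #98842 can be sketched on the host with `std::atomic` as a stand-in for whatever atomic type the GPU code uses. The struct and member names below are hypothetical. The point is that each thread folds its own measurement into shared counters, so no per-thread array slot is needed and the same code works for one thread or many.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared accumulator; every participating thread calls update().
struct AtomicBenchmarkResult {
  std::atomic<uint64_t> cycles_sum{0};
  std::atomic<uint64_t> min{UINT64_MAX};
  std::atomic<uint64_t> max{0};
  std::atomic<uint32_t> samples{0};

  void update(uint64_t cycles) {
    cycles_sum.fetch_add(cycles, std::memory_order_relaxed);
    samples.fetch_add(1, std::memory_order_relaxed);
    // Compare-and-swap loops: retry until our value is stored or another
    // thread has already published a better one.
    uint64_t seen = min.load(std::memory_order_relaxed);
    while (cycles < seen &&
           !min.compare_exchange_weak(seen, cycles, std::memory_order_relaxed)) {
    }
    seen = max.load(std::memory_order_relaxed);
    while (cycles > seen &&
           !max.compare_exchange_weak(seen, cycles, std::memory_order_relaxed)) {
    }
  }

  // Average cycles per sample (integer division, truncating).
  uint64_t mean() const {
    uint32_t n = samples.load(std::memory_order_relaxed);
    assert(n > 0 && "no samples recorded");
    return cycles_sum.load(std::memory_order_relaxed) / n;
  }
};
```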
Petr Hosek
5ff3ff33ff
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)
This is a part of #97655.
2024-07-12 09:28:41 -07:00
Mehdi Amini
ce9035f5bd
Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593)
Reverts llvm/llvm-project#98075

bots are broken
2024-07-12 09:12:13 +02:00
Petr Hosek
3f30effe1b
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075)
This is a part of #97655.
2024-07-11 12:35:22 -07:00
jameshu15869
eeed5896de
[libc] Correctly Run Multiple Benchmarks in the Same File (#98467)
There was previously an issue where registering multiple benchmarks in
the same file would only give the results for the last benchmark to run.
This PR fixes the issue.

@jhuber6
2024-07-11 06:58:10 -05:00
jameshu15869
eb66e31bc2
[libc] Add Timing Utils for AMDGPU (#96828)
PR for adding AMDGPU timing utils for benchmarking.

I was not able to test this code since I do not have an AMD GPU, but I
was able to successfully compile this code using
-DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TEST_ARCHITECTURE=gfx90a
-DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_LOADER_EXECUTABLE=echo
-DRUNTIMES_amdgcn_amd-amdhsa_LIBC_GPU_TARGET_ARCHITECTURE=gfx90a to
force the code to compile without having an AMD gpu on my machine.

@jhuber6
2024-07-10 16:04:56 -05:00
jameshu15869
f4e6ddbc2e
[libc] Fix Cppcheck Issues (#96999)
This PR fixes linting issues discovered by `cppcheck`.

Fixes: https://github.com/llvm/llvm-project/issues/96863
2024-07-06 17:53:36 -05:00
jameshu15869
02b57dedb7
[libc] NVPTX Profiling (#92009)
PR for adding microbenchmarking infrastructure for NVPTX. `nvlink`
cannot perform LTO, so we cannot inline `libc` functions and this
function call overhead is not adjusted for during microbenchmarking.
2024-06-26 16:38:39 -05:00