This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and
`__ocml_atan2_f64()` using the same setup as `sin()`. It also adds
support for throughput benchmarking of functions with two inputs.
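As a rough illustration of what throughput benchmarking a two-input function involves, here is a minimal host-side C++ sketch. It is not the code in this PR: the helper name `ns_per_call`, the use of `std::chrono`, and the `volatile` sink are all assumptions made for the example, and the actual GPU benchmarks rely on device timing utilities instead.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): average wall-clock nanoseconds per
// call of a two-input function `f`, measured over N independent input pairs.
template <typename F>
double ns_per_call(F f, const std::vector<double> &a,
                   const std::vector<double> &b) {
  // Both input arrays must have the same, non-zero length.
  assert(a.size() == b.size() && !a.empty());

  // A volatile sink keeps the compiler from discarding the calls.
  volatile double sink = 0.0;

  auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < a.size(); ++i)
    sink = f(a[i], b[i]); // calls are independent, so they can overlap
  auto stop = std::chrono::steady_clock::now();
  (void)sink;

  // Total elapsed time divided by the number of calls.
  std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(a.size());
}
```

A caller would pass something like `[](double y, double x) { return std::atan2(y, x); }` together with two equally sized input vectors. Timing the whole batch and dividing by N is what makes this a throughput figure rather than a single-call latency.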
This PR implements
2a158426d4
to provide better throughput benchmarking for libc `sin()` and
`__nv_sin()`.
These changes have not been tested on AMDGPU yet, only compiled.
This PR adds AMDGPU timing utilities for benchmarking.
I was not able to test this code since I do not have an AMD GPU, but I
was able to compile it successfully using
`-DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TEST_ARCHITECTURE=gfx90a`,
`-DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_LOADER_EXECUTABLE=echo`, and
`-DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TARGET_ARCHITECTURE=gfx90a` to
force the build to go through without an AMD GPU on my machine.
@jhuber6