There was previously an issue where registering multiple benchmarks in
the same file would only give the results for the last benchmark to run.
This PR fixes the issue.
@jhuber6
PR for adding microbenchmarking infrastructure for NVPTX. `nvlink`
cannot perform LTO, so we cannot inline `libc` functions and this
function call overhead is not adjusted for during microbenchmarking.