Summary: This PR enables the basic builtins unit tests to run on GPU architectures. Other targets, such as profiling, are also supported, but their host-device nature makes them harder to unit test adequately. It may still be possible to add basic tests for them that simply verify that the counters are present and in the correct format when they are copied to the host.