This PR enables the `test-lower-to-nvvm` pass pipeline for the integration
tests that target the NVIDIA sm_90 architecture.
It adjusts the `test-lower-to-nvvm` pipeline in two ways:
1) Calls `createConvertNVGPUToNVVMPass` before GPU kernel outlining.
This pass generates both device and host code, so it has to run before
the two are separated. On the host, it calls the CUDA driver to build
the TMA descriptor (`cuTensorMap`).
2) Adds `createConvertNVVMToLLVMPass` to the pipeline to generate PTX
for NVVM ops.