Implement per-stage performance measurement (LLVM IR -> ISA, ISA -> binary).
The results are stored in the properties, so users can read them after
running the GpuModuleToBinary pass.
Added the missing register_translations in Python to mirror what the
C API already provides.
Cleaned up the current pass-registration calls, removing those that are
already covered by mlirRegisterAllPasses.
Found here:
https://discourse.llvm.org/t/opencl-example/74187