This PR makes the compilation log from the ISA compiler available to users
by returning it as part of the `gpu::ObjectAttr` properties, following
the existing pattern used for `LLVMIRToISATimeInMs`.
Currently, the compiler log (which contains useful information such as
spill statistics when --verbose is passed) is only accessible in debug
builds via `LLVM_DEBUG`. However, there are good reasons to make this
information available in release builds as well:
1. Both `ptxas` and `libnvptxcompiler` are publicly available
tools/libraries distributed with the CUDA Toolkit. The `--verbose` flag
and its output are documented public features, not internal debug
information.
2. The verbose output gives users valuable insights, such as the spill
statistics mentioned above.
A new `SerializedObject` class is used to carry the metadata alongside
the binary when returning from `serializeObject`.
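The idea of returning metadata next to the binary can be sketched in plain C++ as follows. This is a minimal, standalone illustration and not the MLIR class: the names `SerializedObjectSketch` and `serializeWithLog`, the `"CompilerLog"` key, and the placeholder log text are all hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that carries metadata alongside the
// binary. It only illustrates the shape of the idea, not the MLIR API.
class SerializedObjectSketch {
public:
  SerializedObjectSketch(std::vector<char> bytes,
                         std::map<std::string, std::string> meta = {})
      : binary(std::move(bytes)), metadata(std::move(meta)) {}

  const std::vector<char> &getBinary() const { return binary; }
  const std::map<std::string, std::string> &getMetadata() const {
    return metadata;
  }

private:
  std::vector<char> binary;                     // compiled object bytes
  std::map<std::string, std::string> metadata;  // e.g. a compiler log
};

// Hypothetical serializer: copies the input into the "binary" and, when
// verbose output is requested, attaches a placeholder log under an assumed
// "CompilerLog" key.
SerializedObjectSketch serializeWithLog(const std::string &isa, bool verbose) {
  std::vector<char> bytes(isa.begin(), isa.end());
  std::map<std::string, std::string> meta;
  if (verbose)
    meta["CompilerLog"] = "placeholder compiler log";
  return SerializedObjectSketch(std::move(bytes), std::move(meta));
}
```

With this shape, callers that only need the binary keep using `getBinary()`, while tools that want the log look it up in the metadata and tolerate its absence.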
This change adds the `verify-target-arch` option to the
`nvvm-attach-target` pass to control the `verifyTarget` parameter of the
attached `NVVMTargetAttr`, which enables or disables verification of the
target architecture with respect to the NVVM Ops.
This PR adds `cmd-options` to the `gpu-lower-to-nvvm-pipeline` pipeline
and the `nvvm-attach-target` pass, allowing users to pass flags to the
downstream compiler, *ptxas*.
Example:
```
mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_80 ptxas-cmd-options='-v --register-usage-level=8'"
```
This revision avoids the registration of dialect extensions in `Pass::getDependentDialects`.
Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is
guaranteed to return false for extensions (i.e. there is no mechanism to track
whether a lambda is already in the list of registered extensions).
When the context is already in a multi-threaded mode, this is guaranteed to assert.
Arguably a more structured registration mechanism for extensions with a unique ExtensionID
could be envisioned in the future.
In the process of cleaning this up, multiple usage inconsistencies surfaced around the
registration of translation extensions that this revision also cleans up.
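The subset-check problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not MLIR's `DialectRegistry`: `ToyRegistry` and its members are hypothetical, and the string identifiers only stand in for the kind of unique ExtensionID the message suggests as a possible future direction.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy registry (hypothetical): anonymous callbacks cannot be compared for
// equality, so a subset check must answer conservatively when any exist.
struct ToyRegistry {
  std::vector<std::function<void()>> anonymousExtensions;
  std::set<std::string> identifiedExtensions;

  void addAnonymous(std::function<void()> fn) {
    anonymousExtensions.push_back(std::move(fn));
  }
  void addIdentified(const std::string &id) { identifiedExtensions.insert(id); }

  bool isSubsetOf(const ToyRegistry &other) const {
    // std::function has no operator== between two callables, so there is no
    // way to tell whether `other` already holds the same callback.
    if (!anonymousExtensions.empty())
      return false;
    // Extensions with an identifier can be checked by simple lookup.
    for (const std::string &id : identifiedExtensions)
      if (other.identifiedExtensions.count(id) == 0)
        return false;
    return true;
  }
};
```

In this model, registering the same lambda in two registries still makes `isSubsetOf` return false, whereas identifier-keyed extensions compare as expected.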
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157703
Adds the passes `nvvm-attach-target` & `rocdl-attach-target` for attaching `nvvm.target` & `rocdl.target` attributes to GPU Modules.
These passes search for GPU Modules in the immediate region of the Op being acted on and attach the target attribute to each selected module.
Modules can be selected using a regex string, allowing fine-grained attachment of targets. See the test `attach-target.mlir` for an example.
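Regex-based module selection can be sketched in standalone C++ as below. This is an illustration only: `selectModules` and the module names are hypothetical, and the sketch assumes search semantics (the pattern may match anywhere in the name) and that an empty pattern selects every module, which may differ from what the passes actually do.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the module names selected by `pattern`.
// Assumption: an empty pattern selects all modules.
std::vector<std::string> selectModules(const std::vector<std::string> &names,
                                       const std::string &pattern) {
  if (pattern.empty())
    return names;
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const std::string &name : names)
    if (std::regex_search(name, re))  // assumed search, not full-match
      selected.push_back(name);
  return selected;
}
```

For example, with the hypothetical names `kernel_a`, `kernel_b`, and `host_helper`, the pattern `^kernel_` selects only the first two.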
Depends on D154153
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D157351