This PR makes the compilation log from ISA compiler available to users
by returning it as part of the `gpu::ObjectAttr` properties, following
the existing pattern like `LLVMIRToISATimeInMs`.
Currently, the compiler log (which contains useful information such as
spill statistics when --verbose is passed) is only accessible in debug
builds via `LLVM_DEBUG`. However, there are good reasons to make this
information available in release builds as well:
1. Both `ptxas` and `libnvptxcompiler` are publicly available
tools/libraries distributed with the CUDA Toolkit. The `--verbose` flag
and its output are documented public features, not internal debug
information.
2. The verbose output provides valuable insights for users.
A new `SerializedObject` class is used to carry the metadata alongside
the binary when returning from `serializeObject`.
Changes:
- Change `ptxCode.c_str()` to `ptxCode.str().c_str()` to avoid error:
`error: 'class llvm::StringRef' has no member named 'c_str'; did you
mean 'str'?`
- Change `std::nullopt;` to `return mlir::failure();` to avoid error:
`could not convert 'std::nullopt' from 'const std::nullopt_t' to
'llvm::FailureOr<llvm::SmallVector<char, 0> >'`
Extra info:
- Tested versions: tried`llvmorg-21.1.8`, `llvmorg-22.1.0-rc1`,
`llvmorg-23-init`, `main`, all cannot compile without these fixes
- Test environment: `nvidia/cuda:13.1.0-devel-ubuntu24.04` docker image
(comes with gcc 13.3.0 and nvcc 13.1)
- Compile command: just turn-on
`-DLLVM_TARGETS_TO_BUILD="Native;NVPTX"`,
`-DMLIR_ENABLE_NVPTXCOMPILER=ON` and you will see the bugs.
A full command for example:
```
cmake llvm \
-B build \
-G Ninja \
-DLLVM_ENABLE_PROJECTS=mlir \
-DCMAKE_BUILD_TYPE=Release \
-DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-DPython3_EXECUTABLE=python3 \
-DLLVM_ENABLE_RTTI=ON \
-DLLVM_INSTALL_UTILS=ON \
-DLLVM_INCLUDE_TESTS=OFF \
-DMLIR_INCLUDE_TESTS=OFF \
-DLLVM_BUILD_EXAMPLES=OFF \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_CCACHE_BUILD=ON \
-DCUDACXX=nvcc \
-DCUDA_PATH=/usr/local/cuda \
-DCMAKE_CUDA_ARCHITECTURES="80;89;120" \
-DCMAKE_C_COMPILER=gcc \
-DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_CUDA_COMPILER=nvcc \
-DLLVM_TARGETS_TO_BUILD="Native;NVPTX" \
-DMLIR_ENABLE_CUDA_RUNNER=ON \
-DMLIR_ENABLE_CUDA_CONVERSIONS=ON \
-DMLIR_ENABLE_NVPTXCOMPILER=ON
cmake --build build -j$(nproc) -t install
```
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Linux has a limitation of 256 characters for a path. Large function
names being serialized will cause this to fail. As createTemporaryFile
already unique's the file (up to 128 retries for different name
variations), truncating should suffice
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
When enabling or disabling a target we typically need to rebuild most
of LLVM because of the change to the values of the LLVM_HAS_*_TARGET
macros in llvm-config.h, which is included by most of the code, but
are unused by LLVM itself. To avoid this, move the LLVM_HAS_*_TARGET
macros to a separate header, Targets.h.
Update the only in-tree user of the macros (MLIR) to refer to the new
header. I expect that out-of-tree users will detect the change either
at compile time if they build with -Wundef, or at runtime. As far as
I can tell, the usage of these macros is rare in out-of-tree projects,
I found no out-of-tree users in projects indexed by Debian code search
[1], and one user [2] in projects indexed by GitHub code search [3]
(excluding forks of LLVM).
[1] https://codesearch.debian.net/search?q=%23.*LLVM_HAS_.*_TARGET&literal=0
[2] 238706b12b/lib/gc/Target/LLVM/XeVM/Target.cpp (L72)
[3] https://github.com/search?q=%2F%23.*LLVM_HAS_.*_TARGET%2F&type=code
Reviewers: nico, grypp, mstorsjo, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/136388
As title, we need to call `Timer::clear` to avoid extra log like this:
```
===-------------------------------------------------------------------------===
...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0000 seconds (0.0000 wall clock)
---Wall Time--- --- Name ---
----- ....
----- Total
```
PTX source files are expected to only contain ASCII text
(https://docs.nvidia.com/cuda/parallel-thread-execution/#source-format) and no null terminators.
`ptxas` has so far not enforced this but is moving towards doing so.
This revealed a problem where the null terminator is getting printed out
in the output file in MLIR path when outputting ptx directly. Only add the null on the assembly output path for JIT instead of in output of `moduleToObject `.
We hope that the timer can be cleared normally when the target-format is
`offload`, so as to avoid output like this:
```
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===
---Wall Time--- --- Name ---
----- Timer for perf llvm-ir -> isa and isa -> binary.
...
```
Co-authored-by: Guray Ozen <guray.ozen@gmail.com>
This PR adds `cmd-options` to the `gpu-lower-to-nvvm-pipeline` pipeline
and the `nvvm-attach-target` pass, allowing users to pass flags to the
downstream compiler, *ptxas*.
Example:
```
mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_80 ptxas-cmd-options='-v --register-usage-level=8'"
```
Implement the feature about perf by stage(llvm-ir -> isa, isa->binary).
The results will be stored into the properties, then users can use them
after using GpuModuleToBinary Pass.
When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be
a warning
```
build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow]
```
which is followed by a large number of characters in stdout.
Fix this to avoid stdout outputting a large number of characters (3e5).
This removes a runtime dependency on the CUDA Toolkit path, instead of
looking up the filesystem we use a version of libdevice embedded in the
binary at build time.
This change allows to expose through an interface attributes wrapping
content as external resources, and the usage inside the ModuleToObject
show how we will be able to provide runtime libraries without relying on
the filesystem.
This is a follow-up of #117246.
I thought then it would be easy to edit a DictionaryAttr but it turns
out that these attributes are immutable and need to be passed during the
construction of the gpu.binary Op.
The first commit was using the NVVMTargetAttr to pass the information.
After feedback from @fabianmcg, this PR now passes the information
through a new option of the gpu-module-to-binary pass.
Please add reviewers, as you see fit.
### Background
In `lib/Target/LLVM/NVVM/Target.cpp`, `NVPTXSerializer` compile PTX to
binary with two different flows controlled by
`MLIR_ENABLE_NVPTXCOMPILER`.
If building mlir with `-DMLIR_ENABLE_NVPTXCOMPILER=ON`, the flow does
not check if the target is `gpu::CompilationTarget::Fatbin`, and compile
PTX to cubin directly, which is not consistent with another flow.
### Implement
Use static [nvfatbin](https://docs.nvidia.com/cuda/nvfatbin/index.html)
library.
I have tested it locally, the two flows can return the same Fatbin
result after inputing the same `GpuModule`.
Here is the [merged
MR](https://github.com/llvm/llvm-project/pull/116007) which caused a
failure and [was
reverted](https://github.com/llvm/llvm-project/pull/116811).
Thanks to @joker-eph for the help, I fix it (miss constructing
`ModuleObject` with callback functions in
`mlir/lib/Target/LLVM/NVVM/Target.cpp`) and split unit tests from origin
test which don't need `ptxas` to make the test runs more widely.
This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table`
attributes. The `#gpu.kernel_metadata` attribute allows storing metadata
related to a compiled kernel, for example, the number of scalar
registers used by the kernel. The attribute only has 2 required
parameters, the name and function type. It also has 2 optional
parameters, the arguments attributes and generic dictionary for storing
all other metadata.
The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`,
mapping the name of the kernel to the metadata.
Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to
collect ELF metadata from a binary, and to test the class methods in
both attributes.
Example:
```mlir
gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[
#gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>,
#gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]>
]> , bin = "BLOB">]
```
The motivation behind these attributes is to provide useful information
for things like tunning.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
This patch adds an argument to `gpu::TargetAttrInterface::createObject`
to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a
property dict for metadata, hence the module can be used for extracting
things like the symbol table and adding it to the property dict.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
Most of the code here does not depend on the NVPTX target. In particular
the simple offload can just emit LLVM IR and we can use this without the
NVVM backend being built, which can be useful for a frontend that just
need to serialize the IR and leave it up to the runtime to JIT further.
LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX
was enabled when building LLVM. Use this instead of manually defining
MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).
This is another follow-up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit
exposes mentioned CMake variable through `mlir-config.h` and uses the
macro that is introduced with the same name. This replaces the macro
`MLIR_NVPTXCOMPILER_ENABLED`, which the CMake files previously defined
manually.
This is another follow-up of #83004. `NVVM/Target.cpp` uses the macro
`MLIR_NVPTXCOMPILER_ENABLED`, which is defined in `llvm-config.h` but
did not include that file, yielding a warning when compiled with
`-Wundef`. This PR adds the include.
~~This is another follow-up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit
exposes mentioned CMake variable through `mlir-config.h` and uses the
macro that is introduced with the same name. This replaces the macro
`MLIR_NVPTXCOMPILER_ENABLED`, which the CMake files previously defined
manually.~~
That macro was not defined in some cases and thus yielded warnings if
compiled with `-Wundef`. In particular, they were not defined in the
BUILD files, so the GPU targets were broken when built with Bazel. This
commit exposes mentioned CMake variable through mlir-config.h and uses
the macro that is introduced with the same name. This replaces the macro
MLIR_CUDA_CONVERSIONS_ENABLED, which the CMake files previously defined
manually.
Some specific implementation of the offload may want more customization, and
even avoid using LLVM in-tree to dispatch the ISA translation to a custom
solution. This refactoring makes it possible for such implementation to work
without even configuring the target backend in LLVM.
Reviewers: fabianmcg
Reviewed By: fabianmcg
Pull Request: https://github.com/llvm/llvm-project/pull/71165
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
Currently, the NVPTX tool compilation path only calls `ptxas`; thus, the
GPU running the binary must be an exact match of the arch of the target,
or else the runtime throws an error due to the arch mismatch.
This patch adds a call to `fatbinary`, creating a fat binary with the
cubin object and the PTX code, allowing the driver to JIT the PTX at
runtime if there's an arch mismatch.
This revision avoids the registration of dialect extensions in Pass::getDependentDialects.
Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is
always guaranteed to return false for extensions (i.e. there is no mechanism to track
whether a lambda is already in the list of already registered extensions).
When the context is already in a multi-threaded mode, this is guaranteed to assert.
Arguably a more structured registration mechanism for extensions with a unique ExtensionID
could be envisioned in the future.
In the process of cleaning this up, multiple usage inconsistencies surfaced around the
registration of translation extensions that this revision also cleans up.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157703
Fix a build failure in GCC 7 caused by the function:
`std::optional<SmallVector<std::unique_ptr<llvm::Module>>> loadBitcodeFiles` ,
in `NVVMTarget` & `ROCDLTarget`.
The failure is caused because GCC fails to use the move constructor in
`std::optional` for constructing the return value, which prompts a call to the
deleted copy constructor in `std::unique_ptr`, resulting in a failure.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D157804
**For an explanation of these patches see D154153.**
Commit message:
This patch adds the NVVM target attribute for serializing GPU modules into
strings containing cubin.
Depends on D154113 and D154100 and D154097
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D154117