llvm-project

Author	SHA1	Message	Date
Zichen Lu	fbffdaa174	[MLIR][GPU] Update serializeToObject to use SerializedObject wrapper and include ISA compiler logs (#176697 ) This PR makes the compilation log from ISA compiler available to users by returning it as part of the `gpu::ObjectAttr` properties, following the existing pattern like `LLVMIRToISATimeInMs`. Currently, the compiler log (which contains useful information such as spill statistics when --verbose is passed) is only accessible in debug builds via `LLVM_DEBUG`. However, there are good reasons to make this information available in release builds as well: 1. Both `ptxas` and `libnvptxcompiler` are publicly available tools/libraries distributed with the CUDA Toolkit. The `--verbose` flag and its output are documented public features, not internal debug information. 2. The verbose output provides valuable insights for users. A new `SerializedObject` class is used to carry the metadata alongside the binary when returning from `serializeObject`.	2026-01-30 12:56:20 +01:00
Jay Zhuang	9341067a73	Fix MLIR compilation bugs for NVPTX target (#177024 ) Changes: - Change `ptxCode.c_str()` to `ptxCode.str().c_str()` to avoid error: `error: 'class llvm::StringRef' has no member named 'c_str'; did you mean 'str'?` - Change `std::nullopt;` to `return mlir::failure();` to avoid error: `could not convert 'std::nullopt' from 'const std::nullopt_t' to 'llvm::FailureOr<llvm::SmallVector<char, 0> >'` Extra info: - Tested versions: tried`llvmorg-21.1.8`, `llvmorg-22.1.0-rc1`, `llvmorg-23-init`, `main`, all cannot compile without these fixes - Test environment: `nvidia/cuda:13.1.0-devel-ubuntu24.04` docker image (comes with gcc 13.3.0 and nvcc 13.1) - Compile command: just turn-on `-DLLVM_TARGETS_TO_BUILD="Native;NVPTX"`, `-DMLIR_ENABLE_NVPTXCOMPILER=ON` and you will see the bugs. A full command for example: ``` cmake llvm \ -B build \ -G Ninja \ -DLLVM_ENABLE_PROJECTS=mlir \ -DCMAKE_BUILD_TYPE=Release \ -DMLIR_ENABLE_BINDINGS_PYTHON=ON \ -DPython3_EXECUTABLE=python3 \ -DLLVM_ENABLE_RTTI=ON \ -DLLVM_INSTALL_UTILS=ON \ -DLLVM_INCLUDE_TESTS=OFF \ -DMLIR_INCLUDE_TESTS=OFF \ -DLLVM_BUILD_EXAMPLES=OFF \ -DLLVM_ENABLE_ASSERTIONS=ON \ -DLLVM_CCACHE_BUILD=ON \ -DCUDACXX=nvcc \ -DCUDA_PATH=/usr/local/cuda \ -DCMAKE_CUDA_ARCHITECTURES="80;89;120" \ -DCMAKE_C_COMPILER=gcc \ -DCMAKE_CXX_COMPILER=g++ \ -DCMAKE_CUDA_COMPILER=nvcc \ -DLLVM_TARGETS_TO_BUILD="Native;NVPTX" \ -DMLIR_ENABLE_CUDA_RUNNER=ON \ -DMLIR_ENABLE_CUDA_CONVERSIONS=ON \ -DMLIR_ENABLE_NVPTXCOMPILER=ON cmake --build build -j$(nproc) -t install ``` --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2026-01-21 18:43:01 +04:00
Victor Chernyakin	c438773432	[LLVM][ADT] Migrate users of `make_scope_exit` to CTAD (#174030 ) This is a followup to #173131, which introduced the CTAD functionality.	2026-01-02 20:42:56 -08:00
Ivan Butygin	c7af990cb7	Reland [mlir][gpu] Use `SmallString`, `FailureOr` and `StringRef` in `module-to-binary` infra (NFC) (#172390 ) Reland https://github.com/llvm/llvm-project/pull/172284 `MCAsmParser` expects buffer to be null terminated, had to use `getMemBufferCopy` which is unfortunate.	2025-12-17 16:42:54 +03:00
Ivan Butygin	4c7d765d05	Revert "[mlir][gpu] Use `SmallString`, `FailureOr` and `StringRef` in `module-to-binary` infra (NFC) (#172284 )" (#172386 ) This reverts commit c8b8b2f1f9ced8db1458dfdf1ea6a15152c695f0. broke the bot	2025-12-16 00:06:33 +00:00
Ivan Butygin	c8b8b2f1f9	[mlir][gpu] Use `SmallString`, `FailureOr` and `StringRef` in `module-to-binary` infra (NFC) (#172284 ) Instead of `std::string`, `std::optional` and `const std::string&`.	2025-12-16 02:43:08 +03:00
Ivan Butygin	b3ec8be22b	[mlir][gpu] Expose some utility functions from `gpu-to-binary` infra (#172205 ) For people who do not want to use a single monolithic pass.	2025-12-15 13:39:19 +03:00
William Moses	b12e03315a	[MLIR][GPU] Truncate temp filename path size to avoid linux limitations (#155108 ) Linux has a limitation of 256 characters for a path. Large function names being serialized will cause this to fail. As createTemporaryFile already uniques the file (up to 128 retries for different name variations), truncating should suffice.	2025-08-23 16:29:01 +00:00
Mehdi Amini	30f9428f14	[MLIR] Adopt LDBG() macro in LLVM/NVVM/Target.cpp (#154721 )	2025-08-21 10:29:37 +00:00
Mehdi Amini	ce535c8700	[MLIR] Add missing includes to NVVM/Target.cpp (fix build) (#150637 ) Depending on the CMake configuration, these missing headers trigger a compilation error.	2025-07-25 19:54:46 +02:00
Kazu Hirata	1a0f482de8	[mlir] Remove unused includes (NFC) (#150476 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-24 11:23:53 -07:00
Mehdi Amini	fbb2fa92cb	[MLIR] Add missing includes The build was broken when MLIR_NVVM_EMBED_LIBDEVICE was enabled.	2025-07-09 03:49:02 -07:00
Kazu Hirata	38b8ef16f7	[mlir] Remove unused includes (NFC) (#147158 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-05 10:38:27 -07:00
Rahul Joshi	b17f3c63de	[NFC][MLIR] Add {} for `else` when `if` body has {} (#139422 )	2025-05-12 10:29:03 -07:00
Peter Collingbourne	667209e451	Config: Move LLVM_HAS__TARGET definitions to a new header. When enabling or disabling a target we typically need to rebuild most of LLVM because of the change to the values of the LLVM_HAS__TARGET macros in llvm-config.h, which is included by most of the code, but are unused by LLVM itself. To avoid this, move the LLVM_HAS__TARGET macros to a separate header, Targets.h. Update the only in-tree user of the macros (MLIR) to refer to the new header. I expect that out-of-tree users will detect the change either at compile time if they build with -Wundef, or at runtime. As far as I can tell, the usage of these macros is rare in out-of-tree projects, I found no out-of-tree users in projects indexed by Debian code search [1], and one user [2] in projects indexed by GitHub code search [3] (excluding forks of LLVM). [1] https://codesearch.debian.net/search?q=%23.LLVM_HAS_._TARGET&literal=0 [2] `238706b12b/lib/gc/Target/LLVM/XeVM/Target.cpp (L72)` [3] https://github.com/search?q=%2F%23.LLVM_HAS_.*_TARGET%2F&type=code Reviewers: nico, grypp, mstorsjo, MaskRay Reviewed By: MaskRay Pull Request: https://github.com/llvm/llvm-project/pull/136388	2025-04-18 18:11:18 -07:00
Zichen Lu	1d190065d9	[mlir][target] RAII wrap moduleToObject timer to ensure call `clear` function (#136142 ) As title, we need to call `Timer::clear` to avoid extra log like this: ``` ===-------------------------------------------------------------------------=== ... ===-------------------------------------------------------------------------=== Total Execution Time: 0.0000 seconds (0.0000 wall clock) ---Wall Time--- --- Name --- ----- .... ----- Total ```	2025-04-18 12:33:31 +02:00
modiking	9f2feeb189	[mlir][gpu][nvptx] Remove null terminator when outputting PTX (#133019 ) PTX source files are expected to only contain ASCII text (https://docs.nvidia.com/cuda/parallel-thread-execution/#source-format) and no null terminators. `ptxas` has so far not enforced this but is moving towards doing so. This revealed a problem where the null terminator is getting printed out in the output file in the MLIR path when outputting PTX directly. Only add the null on the assembly output path for JIT instead of in the output of `moduleToObject`.	2025-04-03 15:50:54 -07:00
Guray Ozen	9910d34d6c	[MLIR][NVVM] Print ptxas path in debug output for "serialize-to-binary" (#132373 )	2025-03-25 12:21:06 +01:00
Zichen Lu	4398a222ad	[mlir][target] Adjust the start and end position of the moduleToObject timer (#132693 ) We hope that the timer can be cleared normally when the target-format is `offload`, so as to avoid output like this: ``` ===-------------------------------------------------------------------------=== Miscellaneous Ungrouped Timers ===-------------------------------------------------------------------------=== ---Wall Time--- --- Name --- ----- Timer for perf llvm-ir -> isa and isa -> binary. ... ``` Co-authored-by: Guray Ozen <guray.ozen@gmail.com>	2025-03-24 11:35:14 +01:00
Zichen Lu	2ec16ee28b	[mlir][target] Adjust the start position of the moduleToObject timer (#129835 ) As title. To avoid `Ungrouped Timers` when the target is `Assembly`.	2025-03-06 11:09:12 +01:00
Guray Ozen	837b89fc0f	[MLIR][NVVM] Add `ptxas-cmd-options` to pass flags to the downstream compiler (#127457 ) This PR adds `cmd-options` to the `gpu-lower-to-nvvm-pipeline` pipeline and the `nvvm-attach-target` pass, allowing users to pass flags to the downstream compiler, ptxas. Example: ``` mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_80 ptxas-cmd-options='-v --register-usage-level=8'" ```	2025-02-17 12:09:27 +01:00
Zichen Lu	2a5050aa5e	[mlir][target][nvvm] Perf by stage and store into properties (#126178 ) Implement the feature about perf by stage(llvm-ir -> isa, isa->binary). The results will be stored into the properties, then users can use them after using GpuModuleToBinary Pass.	2025-02-11 12:58:58 +01:00
Zichen Lu	a61ca99de2	[mlir] fix overflow warning when generating embedded libdevice (#125801 ) When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be a warning ``` build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow] ``` which is followed by a large number of characters in stdout. Fix this to avoid stdout outputting a large number of characters (3e5).	2025-02-05 12:41:11 +01:00
Mehdi Amini	6a7d6c5f69	[MLIR] Add a MLIR_NVVM_EMBED_LIBDEVICE CMake option that embeds libdevice in the binary (#120238 ) This removes a runtime dependency on the CUDA Toolkit path, instead of looking up the filesystem we use a version of libdevice embedded in the binary at build time.	2024-12-17 16:53:38 +01:00
Mehdi Amini	72e8b9aeaa	[MLIR] Add a BlobAttr interface for attribute to wrap arbitrary content and use it as linkLibs for ModuleToObject (#120116 ) This change allows to expose through an interface attributes wrapping content as external resources, and the usage inside the ModuleToObject show how we will be able to provide runtime libraries without relying on the filesystem.	2024-12-17 01:30:56 +01:00
Renaud Kauffmann	9919295cfd	[mlir][gpu] Adding ELF section option to the gpu-module-to-binary pass (#119440 ) This is a follow-up of #117246. I thought then it would be easy to edit a DictionaryAttr but it turns out that these attributes are immutable and need to be passed during the construction of the gpu.binary Op. The first commit was using the NVVMTargetAttr to pass the information. After feedback from @fabianmcg, this PR now passes the information through a new option of the gpu-module-to-binary pass. Please add reviewers, as you see fit.	2024-12-16 09:09:41 -08:00
Zichen Lu	4971e53612	[mlir][Target] Support Fatbin target for static nvptxcompiler (#118044 ) ### Background In `lib/Target/LLVM/NVVM/Target.cpp`, `NVPTXSerializer` compiles PTX to binary with two different flows controlled by `MLIR_ENABLE_NVPTXCOMPILER`. If building mlir with `-DMLIR_ENABLE_NVPTXCOMPILER=ON`, the flow does not check if the target is `gpu::CompilationTarget::Fatbin`, and compiles PTX to cubin directly, which is not consistent with the other flow. ### Implement Use static [nvfatbin](https://docs.nvidia.com/cuda/nvfatbin/index.html) library. I have tested it locally, the two flows can return the same Fatbin result after inputting the same `GpuModule`.	2024-12-10 11:45:24 +01:00
Zichen Lu	08e7609692	[mlir][fix] Add callback functions for ModuleToObject (#116916 ) Here is the [merged MR](https://github.com/llvm/llvm-project/pull/116007) which caused a failure and [was reverted](https://github.com/llvm/llvm-project/pull/116811). Thanks to @joker-eph for the help, I fixed it (constructing `ModuleObject` with callback functions was missing in `mlir/lib/Target/LLVM/NVVM/Target.cpp`) and split the unit tests that don't need `ptxas` out of the original test so that the tests run more widely.	2024-11-20 13:22:08 +01:00
Mehdi Amini	af41c55673	Revert "[MLIR] Add callback functions for ModuleToObject" (#116811 ) Reverts llvm/llvm-project#116007 Bot is broken.	2024-11-19 15:28:17 +01:00
Zichen Lu	2153672ba3	[MLIR] Add callback functions for ModuleToObject (#116007 ) In ModuleToObject flow, users may want to add some callback functions invoked with LLVM IR/ISA for debugging or other purposes.	2024-11-19 13:51:08 +01:00
Guray Ozen	816134b333	[MLIR] Dump sass (#110227 ) This PR dumps SASS by using nvdisasm.	2024-09-27 13:52:15 +02:00
Fabian Mora	016e1eb9c8	[mlir][gpu] Add metadata attributes for storing kernel metadata in GPU objects (#95292 ) This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table` attributes. The `#gpu.kernel_metadata` attribute allows storing metadata related to a compiled kernel, for example, the number of scalar registers used by the kernel. The attribute only has 2 required parameters, the name and function type. It also has 2 optional parameters, the arguments attributes and generic dictionary for storing all other metadata. The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`, mapping the name of the kernel to the metadata. Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to collect ELF metadata from a binary, and to test the class methods in both attributes. Example: ```mlir gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[ #gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>, #gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]> ]> , bin = "BLOB">] ``` The motivation behind these attributes is to provide useful information for things like tuning. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-08-27 18:44:50 -04:00
Fabian Mora	fd36a7b944	[mlir][gpu] Pass GPU module to `TargetAttrInterface::createObject`. (#94910 ) This patch adds an argument to `gpu::TargetAttrInterface::createObject` to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a property dict for metadata, hence the module can be used for extracting things like the symbol table and adding it to the property dict. --------- Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>	2024-08-27 11:05:04 -04:00
Mehdi Amini	d7da0ae4f4	[MLIR][NVVM] Reduce the scope of the LLVM_HAS_NVPTX_TARGET guard (#97349 ) Most of the code here does not depend on the NVPTX target. In particular the simple offload can just emit LLVM IR and we can use this without the NVVM backend being built, which can be useful for a frontend that just need to serialize the IR and leave it up to the runtime to JIT further.	2024-07-02 11:31:12 +02:00
Adam Paszke	3320249688	[MLIR][NVVM] Make the call to findTool optional for fatbinary (#93968 )	2024-05-31 21:08:57 +02:00
tyb0807	8178a3ad1b	[mlir] Replace MLIR_ENABLE_CUDA_CONVERSIONS with LLVM_HAS_NVPTX_TARGET (#93008 ) LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX was enabled when building LLVM. Use this instead of manually defining MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).	2024-05-24 17:31:28 +02:00
Ingo Müller	099045a045	[mlir][nvvm] Expose MLIR_NVPTXCOMPILER_ENABLED in mlir-config.h. (#84007 ) This is another follow-up of #83004, which made the same change for `MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit exposes mentioned CMake variable through `mlir-config.h` and uses the macro that is introduced with the same name. This replaces the macro `MLIR_NVPTXCOMPILER_ENABLED`, which the CMake files previously defined manually.	2024-03-06 14:14:53 +01:00
Ingo Müller	d70254a623	[mlir][nvvm] Add missing include to llvm-config.h. (#83998 ) This is another follow-up of #83004. `NVVM/Target.cpp` uses the macro `MLIR_NVPTXCOMPILER_ENABLED`, which is defined in `llvm-config.h` but did not include that file, yielding a warning when compiled with `-Wundef`. This PR adds the include. ~~This is another follow-up of #83004, which made the same change for `MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit exposes mentioned CMake variable through `mlir-config.h` and uses the macro that is introduced with the same name. This replaces the macro `MLIR_NVPTXCOMPILER_ENABLED`, which the CMake files previously defined manually.~~	2024-03-06 10:13:12 +01:00
Ingo Müller	5f2097dbed	[mlir] Expose MLIR_CUDA_CONVERSIONS_ENABLED in mlir-config.h. (#83004 ) That macro was not defined in some cases and thus yielded warnings if compiled with `-Wundef`. In particular, they were not defined in the BUILD files, so the GPU targets were broken when built with Bazel. This commit exposes mentioned CMake variable through mlir-config.h and uses the macro that is introduced with the same name. This replaces the macro MLIR_CUDA_CONVERSIONS_ENABLED, which the CMake files previously defined manually.	2024-02-28 14:48:40 +01:00
Mehdi Amini	332a504985	Apply clang-tidy fixes for readability-container-size-empty in Target.cpp (NFC)	2024-02-15 16:02:41 -08:00
Mehdi Amini	867678dd81	Apply clang-tidy fixes for llvm-qualified-auto in Target.cpp (NFC)	2024-02-15 16:02:41 -08:00
Mehdi Amini	d9dadfda85	Refactor ModuleToObject to offer more flexibility to subclass (NFC) Some specific implementation of the offload may want more customization, and even avoid using LLVM in-tree to dispatch the ISA translation to a custom solution. This refactoring makes it possible for such implementation to work without even configuring the target backend in LLVM. Reviewers: fabianmcg Reviewed By: fabianmcg Pull Request: https://github.com/llvm/llvm-project/pull/71165	2023-11-03 13:41:45 -07:00
Fabian Mora	5093413a50	[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220 ) This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were performed: 1. Adding a format field to the GPU object attribute, allowing the translation attribute to use the correct runtime function to load the module. Likewise, a dictionary attribute was added to add any possible extra options. 2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string. 3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD. 4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.	2023-09-14 18:00:27 -04:00
Fabian Mora	c16adb0dcb	[mlir][Target][NVPTX] Add fatbin support to NVPTX compilation. (#65398 ) Currently, the NVPTX tool compilation path only calls `ptxas`; thus, the GPU running the binary must be an exact match of the arch of the target, or else the runtime throws an error due to the arch mismatch. This patch adds a call to `fatbinary`, creating a fat binary with the cubin object and the PTX code, allowing the driver to JIT the PTX at runtime if there's an arch mismatch.	2023-09-07 07:44:41 -04:00
Adrian Kuegel	658a4fdb26	[mlir] Apply ClangTidy fix (NFC) Prefer to use .empty() instead of checking size().	2023-09-05 10:16:55 +02:00
Nicolas Vasilache	7c4e8c6a27	[mlir] Disentangle dialect and extension registrations. This revision avoids the registration of dialect extensions in Pass::getDependentDialects. Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is always guaranteed to return false for extensions (i.e. there is no mechanism to track whether a lambda is already in the list of already registered extensions). When the context is already in a multi-threaded mode, this is guaranteed to assert. Arguably a more structured registration mechanism for extensions with a unique ExtensionID could be envisioned in the future. In the process of cleaning this up, multiple usage inconsistencies surfaced around the registration of translation extensions that this revision also cleans up. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D157703	2023-08-22 00:40:09 +00:00
Fabian Mora	d923f4c2d8	[mlir][NVVM\|ROCDL] Explicitly construct the return type in loadBitcodeFiles in (NVVM\|ROCDL)Target Fix a build failure in GCC 7 caused by the function: `std::optional<SmallVector<std::unique_ptr<llvm::Module>>> loadBitcodeFiles` , in `NVVMTarget` & `ROCDLTarget`. The failure is caused because GCC fails to use the move constructor in `std::optional` for constructing the return value, which prompts a call to the deleted copy constructor in `std::unique_ptr`, resulting in a failure. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D157804	2023-08-13 00:25:30 +00:00
Fabian Mora	211c9752c8	[mlir][NVVM] Adds the NVVM target attribute. For an explanation of these patches see D154153. Commit message: This patch adds the NVVM target attribute for serializing GPU modules into strings containing cubin. Depends on D154113 and D154100 and D154097 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154117	2023-08-08 19:21:36 +00:00

48 Commits