llvm-project

Author	SHA1	Message	Date
Shenghang Tsai	7610b13729	[MLIR] Split ExecutionEngine Initialization out of ctor into an explicit method call (#153524 ) Retry landing https://github.com/llvm/llvm-project/pull/153373 ## Major changes from previous attempt - remove the test in CAPI because no existing tests in CAPI deal with sanitizer exemptions - update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: load GPU binary in global ctors, instead of loading them at call site. - skip the test on Aarch64 since we have an issue with initialization there --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2025-08-17 23:07:24 +02:00
Mehdi Amini	bfd490e0cd	Revert "[MLIR] Split ExecutionEngine Initialization out of ctor into an explicit method call" (#153477 ) Reverts llvm/llvm-project#153373 Sanitizer bot is broken	2025-08-13 19:43:04 +00:00
Shenghang Tsai	2f93693f76	[MLIR] Split ExecutionEngine Initialization out of ctor into an explicit method call (#153373 ) This PR introduces a mechanism to defer JIT engine initialization, enabling registration of required symbols before global constructor execution. ## Problem Modules containing `gpu.module` generate global constructors (e.g., kernel load/unload) that execute during engine creation. This can force premature symbol resolution, causing failures when: - Symbols are registered via `mlirExecutionEngineRegisterSymbol` after creation - Global constructors exist (even if not directly using unresolved symbols, e.g., an external function declaration) - GPU modules introduce mandatory binary loading logic ## Usage ```c // Create engine without initialization MlirExecutionEngine jit = mlirExecutionEngineCreate(...); // Register required symbols mlirExecutionEngineRegisterSymbol(jit, ...); // Explicitly initialize (runs global constructors) mlirExecutionEngineInitialize(jit); ``` --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2025-08-13 15:22:01 +02:00
Md Abdullah Shahneous Bari	281e6d2cc4	[mlir][ExecutionEngine] Add LevelZeroRuntimeWrapper. (#151038 ) Adds LevelZeroRuntime wrapper and tests. Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com> Co-authored-by: Nishant Patel <nishant.b.patel@intel.com> --------- Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com> Co-authored-by: Nishant Patel <nishant.b.patel@intel.com>	2025-08-06 16:48:59 -05:00
Momchil Velikov	962c4217bc	[MLIR][AArch64] Change some tests to ensure SVE vector length is the same throughout the function (#147506 ) This change only applies to functions that can be reasonably expected to use SVE registers. Modifying vector length in the middle of a function might cause incorrect stack deallocation if there are callee-saved SVE registers or incorrect access to SVE stack slots. Addresses (non-issue) https://github.com/llvm/llvm-project/issues/143670	2025-07-09 09:32:25 +01:00
Karlo Basioli	cd585864c0	Pass memory buffer to RuntimeDyld::MemoryManager factory (#142930 ) `RTDyldObjectLinkingLayer` is currently creating a memory manager without any parameters. In this PR I am passing the MemoryBuffer that will be emitted to the MemoryManager so that the user can use it to configure the behaviour of the MemoryManager.	2025-06-06 00:44:39 +01:00
Sang Ik Lee	3fa65dee14	[mlir] SYCL runtime wrapper: add memcpy support. (#141647 )	2025-05-28 11:33:15 -07:00
Longsheng Mou	34be80aa6e	[mlir-runner] Check entry function does not expect arguments (#136825 ) This PR fixes a crash if entry function has inputs. Fixes #136143.	2025-05-15 09:19:39 +08:00
Rahul Joshi	b17f3c63de	[NFC][MLIR] Add {} for `else` when `if` body has {} (#139422 )	2025-05-12 10:29:03 -07:00
Christian Sigg	7851b1bcf1	[mlir][gpu] Change GPU modules to globals (#135478 ) Load/unload GPU modules in global ctors/dtors instead of each time when launching a kernel. Loading GPU modules is a heavy-weight operation and synchronizes the GPU context. Now that the modules are loaded ahead of time, asynchronously launched kernels can run concurrently, see https://discourse.llvm.org/t/how-to-lower-the-combination-of-async-gpu-ops-in-gpu-dialect. The implementations of `embedBinary()` and `launchKernel()` use slightly different mechanics at the moment but I prefer to not change the latter more than necessary as part of this PR. I will prepare a follow-up NFC for `launchKernel()` to align them again.	2025-04-22 13:49:58 +02:00
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
Lang Hames	b18e5b6a36	Re-apply "[ORC] Remove the Triple argument from LLJITBuilder::..." with fixes. This re-applies f905bf3e1ef860c4d6fe67fb64901b6bbe698a91, which was reverted in c861c1a046eb8c1e546a8767e0010904a3c8c385 due to compiler errors, with a fix for MLIR.	2025-03-06 17:17:05 +11:00
Andrea Faulds	eb206e9ea8	[mlir] Rename mlir-cpu-runner to mlir-runner (#123776 ) With the removal of mlir-vulkan-runner (as part of #73457) in e7e3c45bc70904e24e2b3221ac8521e67eb84668, mlir-cpu-runner is now the only runner for all CPU and GPU targets, and the "cpu" name has been misleading for some time already. This commit renames it to mlir-runner.	2025-01-24 14:08:38 +01:00
Michał Górny	047e8e47c1	Reapply "[mlir] Link libraries that aren't included in libMLIR to libMLIR" (#123910 ) Use `mlir_target_link_libraries()` to link dependencies of libraries that are not included in libMLIR, to ensure that they link to the dylib when they are used in Flang. Otherwise, they implicitly pull in all their static dependencies, effectively causing Flang binaries to simultaneously link to the dylib and to static libraries, which is never a good idea. I have only covered the libraries that are used by Flang. If you wish, I can extend this approach to all non-libMLIR libraries in MLIR, making MLIR itself also link to the dylib consistently. [v3 with more `-DBUILD_SHARED_LIBS=ON` fixes]	2025-01-22 09:01:50 +00:00
Michał Górny	9decc24c6b	Revert "[mlir] Link libraries that aren't included in libMLIR to libMLIR (#123781 )" This reverts commit 4c6242ebf50dde0597df2bace49d534b61122496. More BUILD_SHARED_LIBS=ON regressions, sigh.	2025-01-22 09:09:52 +01:00
Michał Górny	4c6242ebf5	[mlir] Link libraries that aren't included in libMLIR to libMLIR (#123781 ) Use `mlir_target_link_libraries()` to link dependencies of libraries that are not included in libMLIR, to ensure that they link to the dylib when they are used in Flang. Otherwise, they implicitly pull in all their static dependencies, effectively causing Flang binaries to simultaneously link to the dylib and to static libraries, which is never a good idea. I have only covered the libraries that are used by Flang. If you wish, I can extend this approach to all non-libMLIR libraries in MLIR, making MLIR itself also link to the dylib consistently. [v2 with fixed `-DBUILD_SHARED_LIBS=ON` build]	2025-01-22 07:54:54 +00:00
Andrea Faulds	e7e3c45bc7	[mlir] Remove mlir-vulkan-runner and GPUToVulkan conversion passes (#123750 ) This follows up on 733be4ed7dcf976719f424c0cb81b77a14f91f5a, which made mlir-vulkan-runner and its associated passes redundant, and completes the main goal of #73457. The mlir-vulkan-runner tests become part of the integration test suite, and the Vulkan runner runtime components become part of ExecutionEngine, just as was done when removing other target-specific runners.	2025-01-21 16:51:27 +01:00
Michał Górny	8b879d106b	Revert "[mlir] Link libraries that aren't included in libMLIR to libMLIR (#123477 )" This reverts commit af6616676fb7f9dd4898290ea684ee0c90f1701d. It broke builds with `-DBUILD_SHARED_LIBS=ON`.	2025-01-20 19:33:51 +01:00
Michał Górny	af6616676f	[mlir] Link libraries that aren't included in libMLIR to libMLIR (#123477 ) Use `mlir_target_link_libraries()` to link dependencies of libraries that are not included in libMLIR, to ensure that they link to the dylib when they are used in Flang. Otherwise, they implicitly pull in all their static dependencies, effectively causing Flang binaries to simultaneously link to the dylib and to static libraries, which is never a good idea. I have only covered the libraries that are used by Flang. If you wish, I can extend this approach to all non-libMLIR libraries in MLIR, making MLIR itself also link to the dylib consistently.	2025-01-20 17:25:20 +00:00
Andrea Faulds	0e39b1348e	[mlir] Remove the mlir-spirv-cpu-runner (move to mlir-cpu-runner) (#114563 ) This commit builds on and completes the work done in 9f6c632ecda08bfff76b798c46d5d7cfde57b5e9 to eliminate the need for a separate mlir-spirv-cpu-runner binary. Since the MLIR processing is already done outside this runner, the only real difference between it and the mlir-cpu-runner is the final linking step between the nested LLVM IR modules. By moving this step into mlir-cpu-runner behind a new command-line flag (`--link-nested-modules`), this commit is able to completely remove the runner component of the mlir-spirv-cpu-runner. The runtime libraries and the tests are moved and renamed to fit into the Execution Engine and Integration tests, following the model of the similar migration done for the CUDA Runner in D97463.	2024-11-08 08:01:52 -05:00
Zentrik	74e1062e34	[MLIR] Don't build MLIRExecutionEngineShared on Windows (#109524 ) This disables the build of `MLIRExecutionEngineShared` because it causes linkage issues on Windows for currently unknown reasons. Related issue: https://github.com/llvm/llvm-project/issues/106859.	2024-10-09 21:43:11 +02:00
Umang Yadav	9f8f1d9890	[MLIR][AMDGPU] Add ability to do 16-bit Memset with HIP APIs (#108587 ) CC: @krzysz00 @manupak	2024-09-20 09:53:41 -05:00
JOE1994	884221eddb	[mlir] Tidy uses of llvm::raw_stream_ostream (NFC) As specified in the docs, 1) raw_string_ostream is always unbuffered and 2) the underlying buffer may be used directly ( 65b13610a5226b84889b923bae884ba395ad084d for further reference ) * Don't call raw_string_ostream::flush(), which is essentially a no-op. * Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.	2024-09-16 23:23:25 -04:00
Guray Ozen	20861f1f2f	[mlir][gpu] Use alloc OP's `host_shared` in cuda runtime (#99035 )	2024-07-17 07:25:11 +02:00
Christian Ulmann	631ae59d30	[MLIR][ExecutionEngine] Introduce shared library (#87067 ) This commit introduces a shared library for the MLIR execution engine. This library is only built when `LLVM_BUILD_LLVM_DYLIB` is set. Having such a library allows downstream users to depend on the execution engine without giving up dynamic linkage. This is especially important for CPU runner-style tools, as they link against large parts of MLIR and LLVM. It is alternatively possible to modify the `MLIRExecutionEngine` target when `LLVM_BUILD_LLVM_DYLIB` is set, to avoid duplicated libraries.	2024-03-30 09:53:19 +01:00
Aart Bik	dc4cfdbb8f	[mlir][sparse] provide an AoS "view" into sparse runtime support lib (#87116 ) Note that even though the sparse runtime support lib always uses SoA storage for COO storage (and provides correct codegen by means of views into this storage), in some rare cases we need the true physical SoA storage as a coordinate buffer. This PR provides that functionality by means of a (costly) coordinate buffer call. Since this is currently only used for testing/debugging by means of the sparse_tensor.print method, this solution is acceptable. If we ever want a performing version of this, we should truly support AoS storage of COO in addition to the SoA used right now.	2024-03-29 15:30:36 -07:00
Kai Sasaki	cb898e26f3	[mlir] Make the print function in CRunnerUtil platform agnostic (#86767 ) The platform running on Apple Silicon does not seem to support the negative nan. It causes the test failure where we explicitly specify the negative nan bit pattern and check the output printed by the CRunnerUtil function. We can make the print function in the utility platform agnostic by using the standard library functions (i.e. `std::isnan` and `std::signbit`) so that we can run the test across platforms that do not support the negative bit pattern. I have added two test cases that would fail in the Apple Silicon platform without print function changes. ``` $ uname -a Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:44 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6000 arm64 ``` See: https://discourse.llvm.org/t/test-failure-of-sparse-sign-test-in-apple-silicon/77876/3	2024-03-28 09:40:17 +09:00
Justin Holewinski	5e78417db5	[MLIR][CUDA] Use _alloca instead of alloca on Windows (#85853 ) MSVC/Windows does not support `alloca()`; instead it defines `_alloca()` in `malloc.h`.	2024-03-20 00:32:19 -07:00
Benjamin Kramer	9a3ece232c	[mlir][sparse] Fix the calling convention of __truncsfbf2 on windows x64 It also wants us to return the value in XMM0.	2024-03-19 13:48:10 +01:00
Guray Ozen	7d55b916a5	[mlir][nvgpu] Support strided memref when creating TMA descriptor (#85652 )	2024-03-18 19:47:39 +01:00
Aart Bik	4daf86ef3f	[mlir][sparse] refactoring sparse runtime lib into less paths (#85332 ) Two constructors could be easily refactored into one after a lot of previous deprecated code has been removed.	2024-03-14 17:06:39 -07:00
Mehdi Amini	716042a63f	Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702 ) The base class llvm::ThreadPoolInterface will be renamed llvm::ThreadPool in a subsequent commit. This is a breaking change: clients who used to create a ThreadPool must now create a DefaultThreadPool instead.	2024-03-05 18:00:46 -08:00
Mehdi Amini	4a4fb930a5	Use the new ThreadPoolInterface base class instead of the concrete implementation (NFC) (#84056 )	2024-03-05 12:37:11 -08:00
Aart Bik	1c2456d659	[mlir][sparse] remove very thin header file from sparse runtime support (#82820 )	2024-02-23 12:37:36 -08:00
Aart Bik	f8ce460e48	[mlir][sparse] cleanup sparse runtime library (#82807 ) remove some obsoleted APIs from the library that have been fully replaced with actual direct IR codegen	2024-02-23 10:52:28 -08:00
Mehdi Amini	744616b3ae	Rename `ThreadPool::getThreadCount()` to `getMaxConcurrency()` (NFC) (#82296 ) This is addressing a long-time TODO to rename this misleading API. The old one is preserved for now but marked deprecated.	2024-02-19 18:07:12 -08:00
Mehdi Amini	bf4480d923	Apply clang-tidy fixes for readability-identifier-naming in SparseTensorRuntime.cpp (NFC)	2024-02-14 10:11:37 -08:00
Yinying Li	e5924d6499	[mlir][sparse] Implement parsing n out of m (#79935 ) 1. Add parsing methods for block[n, m]. 2. Encode n and m with the newly extended 64-bit LevelType enum. 3. Update 2:4 methods names/comments to n:m.	2024-02-08 14:38:42 -05:00
Benjamin Maxwell	e280c287e4	[mlir] Add `mlir_arm_runner_utils` library for use in integration tests (#78583 ) This adds a new `mlir_arm_runner_utils` library that contains utils specific to Arm/AArch64. This is for use in MLIR integration tests. This initial patch adds `setArmVLBits()` and `setArmSVLBits()`. This allows changing vector length or streaming vector length at runtime (or setting it to a known minimum, i.e. 128-bits).	2024-01-22 09:28:13 +00:00
Fabian Mora	01dbc5da33	Reland [mlir][ExecutionEngine] Add support for global constructors and destructors #78070 (#78170 ) This patch adds support for executing global constructors and destructors in the ExecutionEngine.	2024-01-15 12:10:14 -05:00
Cullen Rhodes	3295b88a66	Revert "[mlir][ExecutionEngine] Add support for global constructors and destructors" (#78164 ) this is causing test failures on AArch64 linux, hitting the following assert: # \| mlir-cpu-runner: /home/culrho01/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:519: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const SectionEntry &, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed. Seeing the same in buildbot as well, e.g. https://lab.llvm.org/buildbot/#/builders/179/builds/9094/steps/12/logs/FAIL__MLIR__sparse_codegen_dim_mlir Reverts llvm/llvm-project#78070	2024-01-15 14:21:41 +00:00
Fabian Mora	48e8cd8345	[mlir][ExecutionEngine] Add support for global constructors and destructors (#78070 ) This patch adds support for executing global constructors and destructors in the `ExecutionEngine`.	2024-01-14 21:41:23 -05:00
Yinying Li	753dc0a01c	[mlir][verifyMemref] Fix bug and support more types for verifyMemref (#77682 ) 1. Fix a bug in verifyMemref to pass in `data` instead of `baseptr`, which didn't verify data correctly. 2. Add `==` for f16 and bf16. 3. Add a comprehensive test of verifyMemref for all supported types.	2024-01-10 20:04:43 -05:00
Yinying Li	412d784188	[mlir][sparse][CRunnerUtils] Add shuffle in CRunnerUtils (#77124 ) Shuffle can generate an array of unique and random numbers from 0 to size-1. It can be used to generate tensors with specified sparsity level.	2024-01-09 19:46:35 -05:00
Aart Bik	41a07e668c	[mlir][sparse] recognize NVidia 2:4 type for matmul (#76758 ) This removes the temporary DENSE24 attribute and replaces it with proper recognition of dense to 24 conversion. The compression will be performed on the device prior to performing the matrix mult. Note that we no longer need to start with the linalg version, we can lift this to the proper named linalg op. Also renames some files into more consistent names.	2024-01-02 14:44:24 -08:00
Adrian Kuegel	ac8b53fc92	[mlir] Apply ClangTidy performance fix - Use '\n' instead of std::endl; https://clang.llvm.org/extra/clang-tidy/checks/performance/avoid-endl.html	2024-01-02 10:00:29 +00:00
Adam Paszke	12e4332501	[mlir][nvgpu] Fix the TMA stride setup (#75838 ) There were two issues with the previous computation: * it never looked at dimensions past the second one * the definition was recursive, making each dimension have an extra `elementSize` power	2023-12-19 08:40:26 +01:00
Yinying Li	7bc6c4abe8	[mlir][print]Add functions for printing memref f16/bf16/i16 (#75094 ) 1. Added functions for printMemrefI16/f16/bf16. 2. Added a new integration test for all the printMemref functions.	2023-12-14 13:06:25 -05:00
Adam Paszke	65aab9e722	[mlir][gpu] Generate multiple rank-specializations for tensor map creation (#74082 ) The previous code was technically incorrect in that the type indicated that the memref only has 1 dimension, while the code below was happily dereferencing the size array out of bounds. Now, if the compiler doesn't get too smart about optimizations, this code might even work. But, if the compiler realizes that the array has 1 element it might start doing silly things. This generates a specialization per each supported rank, making sure we don't do any UB.	2023-12-01 15:51:48 +01:00
Aart Bik	6fb7c2d713	[mlir][sparse] bug fix on all-dense lex insertion (#73987 ) Fixes a bug that appended values after insertion completed. Also slight optimization by avoiding all-Dense computation for every lexInsert call	2023-11-30 14:19:02 -08:00


591 Commits
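
Commit cb898e26f3 above describes making the CRunnerUtils print function platform agnostic by using `std::isnan` and `std::signbit`. A minimal sketch of that idea follows. The helper name `formatFloat` and the exact output strings are assumptions for illustration only and are not taken from the commit.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper (not the actual CRunnerUtils code): formats a value so
// that a NaN is rendered from its classification and sign bit, instead of
// relying on how the platform's printf renders the NaN bit pattern.
std::string formatFloat(double value) {
  if (std::isnan(value))
    return std::signbit(value) ? "-nan" : "nan";
  // Non-NaN values use the default fixed-point formatting of std::to_string.
  return std::to_string(value);
}
```

With this approach the printed text for a NaN depends only on the sign bit reported by the standard library, so a test can check for a negative NaN without depending on platform-specific formatting.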