Retry landing https://github.com/llvm/llvm-project/pull/153373
## Major changes from previous attempt
- remove the test in CAPI because no existing tests in CAPI deal with
sanitizer exemptions
- update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: load
GPU binary in global ctors, instead of loading them at call site.
- skip the test on Aarch64 since we have an issue with initialization there
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
This PR introduces a mechanism to defer JIT engine initialization,
enabling registration of required symbols before global constructor
execution.
## Problem
Modules containing `gpu.module` generate global constructors (e.g.,
kernel load/unload) that execute *during* engine creation. This can
force premature symbol resolution, causing failures when:
- Symbols are registered via `mlirExecutionEngineRegisterSymbol` *after*
creation
- Global constructors exist (even if not directly using unresolved
symbols, e.g., an external function declaration)
- GPU modules introduce mandatory binary loading logic
## Usage
```c
// Create engine without initialization
MlirExecutionEngine jit = mlirExecutionEngineCreate(...);
// Register required symbols
mlirExecutionEngineRegisterSymbol(jit, ...);
// Explicitly initialize (runs global constructors)
mlirExecutionEngineInitialize(jit);
```
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
`RTDyldObjectLinkingLayer` is currently creating a memory manager
without any parameters.
In this PR I am passing the MemoryBuffer that will be emitted to the
MemoryManager so that the user can use it to configure the behaviour of
the MemoryManager.
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
This re-applies f905bf3e1ef860c4d6fe67fb64901b6bbe698a91, which was reverted in
c861c1a046eb8c1e546a8767e0010904a3c8c385 due to compiler errors, with a fix for
MLIR.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
this is causing test failures on AArch64 linux, hitting the
following assert:
# | mlir-cpu-runner: /home/culrho01/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:519: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const SectionEntry &, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
Seeing the same in buildbot as well, e.g.
https://lab.llvm.org/buildbot/#/builders/179/builds/9094/steps/12/logs/FAIL__MLIR__sparse_codegen_dim_mlir
Reverts llvm/llvm-project#78070
Replace some uses of `Type::getPointerTo` via 2 ways
* Remove entirely if it's only used to support an unnecessary bitcast
(remove the bitcast as well).
* Replace with `PointerType::get`/`PointerType::getUnqual`
NFC opaque pointer clean-up effort.
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.
Note that I choose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.
This issues came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D150549
In https://reviews.llvm.org/D153029, I moved the loading/unloading
mechanisms of shared libraries from the JIT runner to the execution
engine in order to make that mechanism available in the latter
(including its Python bindings). However, I realized that I introduced a
small change in semantic: previously, the JIT runner checked for the
presence of init/destroy functions and only loaded the library as
JITDyLib if they were not present. After I moved the code, all libraries
were loaded as JITDyLib, even if they registered their symbols
explicitly in their init function. I am not sure if this is really a
problem but (1) the previous behavior was different and (2) I guess it
could cause a problem if some symbols are exported through the init
function *and* have public visibility. This patch reestablishes the
original behaviour in the new place of the code.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D153249
Both the mlir-cpu-runner and the execution engine allow to provide a
list of shared libraries that should be loaded into the process such
that the jitted code can use the symbols from those libraries. The
runner had implemented a protocol that allowed libraries to control
which symbols it wants to provide in that context (with a function
called __mlir_runner_init). In absence of that, the runner would rely on
the loading mechanism of the execution engine, which didn't do anything
particular with the symbols, i.e., only symbols with public visibility
were visible to jitted code.
Libraries used a mix of the two mechanisms: while the runner utils and C
runner utils libs (and potentially others) used public visibility, the
async runtime lib (as the only one in the monorepo) used the loading
protocol. As a consequence, the async runtime library could not be used
through the Python bindings of the execution engine.
This patch moves the loading protocol from the runner to the execution
engine. For the runner, this should not change anything: it lets the
execution engine handle the loading which now implements the same
protocol that the runner had implemented before. However, the Python
binding now get to benefit from the loading protocol as well, so the
async runtime library (and potentially other out-of-tree libraries) can
now be used in that context.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D153029
This patch adds support for `-mattr` and `-march` in mlir-cpu-runner.
With this change, one should be able to consistently use mlir-cpu-runner
for MLIR's integration tests (instead of e.g. resorting to lli when some
additional flags are needed). This is demonstrated in
concatenate_dim_1.mlir.
In order to support the new flags, this patch makes sure that
MLIR's ExecutionEngine/JITRunner (that mlir-cpu-runner is built on top of):
* takes into account the new command line flags when creating
TargetMachine,
* avoids recreating TargetMachine if one is already available,
* creates LLVM's DataLayout based on the previously configured
TargetMachine.
This is necessary in order to make sure that the command line
configuration is propagated correctly to the backend code generator.
A few additional updates are made in order to facilitate this change,
including support for debug dumps from JITRunner.
Differential Revision: https://reviews.llvm.org/D146917
Replace references to enumerate results with either result_pairs
(reference wrapper type) or structured bindings. I did not use
structured bindings everywhere as it wasn't clear to me it would
improve readability.
This is in preparation to the switch to zip semantics which won't
support non-const lvalue reference to elements:
https://reviews.llvm.org/D144503.
I chose to use values instead of const lvalue-refs because MLIR is
biased towards avoiding `const` local variables. This won't degrade
performance because currently `result_pair` is cheap to copy (size_t
+ iterator), and in the future, the enumerator iterator dereference
will return temporaries anyway.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D146006
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
This adds a `--no-implicit-module` option, which disables the insertion
of a top-level `builtin.module` during parsing. The top-level op is
required to have the `SymbolTable` trait.
The majority of the change here is removing `ModuleOp` from interfaces.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D134238
The enableObjectCache option was added in
https://reviews.llvm.org/rG06e8101034e, defaulting to false. However,
the init code added there got its logic reversed
(cache(enableObjectCache ? nullptr : new SimpleObjectCache()), which was
fixed in https://reviews.llvm.org/rGd1186fcb04 by setting the default to
true, thereby preserving the existing behavior even if it was
unintentional.
Default now the object cache to false as it was originally intended.
While at it, mention in enableObjectCache's documentation how the
cache can be dumped.
Reviewed-by: mehdi_amini
Differential Revision: https://reviews.llvm.org/D121291
By specifying a sectionMemoryMapper, users can control how
memory for JIT code is allocated.
In particular, I need this in order to use a named memory
region so that profilers such as perf(1) can correctly label
execution cycles coming from JIT'ed code.
Reviewed-by: ezhulenev
Differential Revision: https://reviews.llvm.org/D120415
Its number of optional parameters has grown too large,
which makes adding new optional parameters quite a chore.
Fix this by using an options struct.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D120380
By default, the listeners do nothing unless linked in.
This revision allows the "Perf" and "Intel" Jit event listeners to be used.
The "OProfile" event listener is not enabled at this time,
the associated library structure is not well-isolated.
Differential Revision: https://reviews.llvm.org/D116552
The purpose of the change is to make clear whether the user is
retrieving the original function or the wrapper function, in line with
the invoke commands. This new functionality is useful for users that
already have defined their own packed interface, so they do not want the
extra layer of indirection, or for users wanting to the look at the
resulting primary function rather than the wrapper function.
All locations, except the python bindings now have a `lookupPacked`
method that matches the original `lookup` functionality. `lookup`
still exists, but with new semantics.
- `lookup` returns the function with a given name. If `bool f(int,int)`
is compiled, `lookup` will return a reference to `bool(*f)(int,int)`.
- `lookupPacked` returns the packed wrapper of the function with the
given name. If `bool f(int,int)` is compiled, `lookupPacked` will return
`void(*mlir_f)(void**)`.
Differential Revision: https://reviews.llvm.org/D114352
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
Remove uses of to-be-deprecated API. In cases where the correct
element type was not immediately obvious to me, fall back to
explicit getPointerElementType().
For the use in LLVMOps.td I used the getPointerElementType()
escape hatch, as it's not obvious to me how the load type
should be properly obtained here.
A series of preceding patches changed the mechanism for translating MLIR to
LLVM IR to use dialect interface with delayed registration. It is no longer
necessary for specific dialects to derive from ModuleTranslation. Remove all
virtual methods from ModuleTranslation and factor out the entry point to be a
free function.
Also perform some cleanups in ModuleTranslation internals.
Depends On D96774
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96775
This new invoke will pack a list of argument before calling the
`invokePacked` method. It accepts returned value as output argument
wrapped in `ExecutionEngine::Result<T>`, and delegate the packing of
arguments to a trait to allow for customization for some types.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D95961
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.
The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.
Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.
Reviewed By: tejohnson, MaskRay, jpienaar
Differential Revision: https://reviews.llvm.org/D91410
These includes have been deprecated in favor of BuiltinDialect.h, which contains the definitions of ModuleOp and FuncOp.
Differential Revision: https://reviews.llvm.org/D91572
This patch introduces a SPIR-V runner. The aim is to run a gpu
kernel on a CPU via GPU -> SPIRV -> LLVM conversions. This is a first
prototype, so more features will be added in due time.
- Overview
The runner follows similar flow as the other runners in-tree. However,
having converted the kernel to SPIR-V, we encode the bind attributes of
global variables that represent kernel arguments. Then SPIR-V module is
converted to LLVM. On the host side, we emulate passing the data to device
by creating in main module globals with the same symbolic name as in kernel
module. These global variables are later linked with ones from the nested
module. We copy data from kernel arguments to globals, call the kernel
function from nested module and then copy the data back.
- Current state
At the moment, the runner is capable of running 2 modules, nested one in
another. The kernel module must contain exactly one kernel function. Also,
the runner supports rank 1 integer memref types as arguments (to be scaled).
- Enhancement of JitRunner and ExecutionEngine
To translate nested modules to LLVM IR, JitRunner and ExecutionEngine were
altered to take an optional (default to `nullptr`) function reference that
is a custom LLVM IR module builder. This allows to customize LLVM IR module
creation from MLIR modules.
Reviewed By: ftynse, mravishankar
Differential Revision: https://reviews.llvm.org/D86108
Due to the original type system implementation, LLVMDialect in MLIR contains an
LLVMContext in which the relevant objects (types, metadata) are created. When
an MLIR module using the LLVM dialect (and related intrinsic-based dialects
NVVM, ROCDL, AVX512) is converted to LLVM IR, it could only live in the
LLVMContext owned by the dialect. The type system no longer relies on the
LLVMContext, so this limitation can be removed. Instead, translation functions
now take a reference to an LLVMContext in which the LLVM IR module should be
constructed. The caller of the translation functions is responsible for
ensuring the same LLVMContext is not used concurrently as the translation no
longer uses a dialect-wide context lock.
As an additional bonus, this change removes the need to recreate the LLVM IR
module in a different LLVMContext through printing and parsing back, decreasing
the compilation overhead in JIT and GPU-kernel-to-blob passes.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D85443
MLIR tests in "mlir/test/mlir-cpu-runner" fails in SystemZ (z14) because
of incompatible datalayout error. This patch fixes it by setting host
CPU name in createTargetMachine()
Differential Revision: https://reviews.llvm.org/D80130
This change makes the ModuleTranslation threadsafe by locking on the
LLVMContext. Furthermore, we now clone the llvm module into a new
context when compiling to PTX similar to what the OrcJit does.
Differential Revision: https://reviews.llvm.org/D78207