=============================================
Machine Learning - Guided Optimization (MLGO)
=============================================

Introduction
============

MLGO refers to integrating ML techniques (primarily) to replace heuristics within
LLVM with machine learned models.

Currently the following heuristics feature such integration:

* Inlining for size
* Register allocation (LLVM greedy eviction heuristic) for performance

This document is an outline of the tooling and APIs facilitating MLGO.

.. note::

  The tools for orchestrating ML training are not part of LLVM, as they are
  dependency-heavy - both on the ML infrastructure choice, as well as choices of
  distributed computing. For the training scenario, LLVM only contains facilities
  enabling it, such as corpus extraction, training data extraction, and evaluation
  of models during training.

.. contents::

Corpus Tooling
==============

Within the LLVM monorepo, there is the ``mlgo-utils`` Python package, which
lives at ``llvm/utils/mlgo-utils``. This package primarily contains tooling
for working with corpora, or collections of LLVM bitcode. We use these corpora
to train and evaluate ML models. Corpora consist of a description in JSON
format at ``corpus_description.json`` in the root of the corpus, and then
a bitcode file and command line flags file for each extracted module. The
corpus structure is designed to contain sufficient information to fully
compile the bitcode to bit-identical object files.

.. program:: extract_ir.py

Synopsis
--------

Extracts a corpus from some form of a structured compilation database. This
tool supports a variety of different scenarios and input types.

Options
-------

.. option:: --input

  The path to the input. This should be a path to a supported structured
  compilation database. Currently only ``compile_commands.json`` files, linker
  parameter files, a directory containing object files (for the local
  ThinLTO case only), or a JSON file containing a bazel aquery result are
  supported.

.. option:: --input_type

  The type of input that has been passed to the ``--input`` flag.

.. option:: --output_dir

  The output directory to place the corpus in.

.. option:: --num_workers

  The number of workers to use for extracting bitcode into the corpus. This
  defaults to the number of hardware threads available on the host system.

.. option:: --llvm_objcopy_path

  The path to the llvm-objcopy binary to use when extracting bitcode.

.. option:: --obj_base_dir

  The base directory for object files. Bitcode files that get extracted into
  the corpus will be placed into the output directory based on where their
  source object files are placed relative to this path.

.. option:: --cmd_filter

  Allows filtering of modules by command line. If set, only modules that match
  the filter will be extracted into the corpus. Regular expressions are
  supported in some instances.

.. option:: --thinlto_build

  If the build was performed with ThinLTO, this should be set to either
  ``distributed`` or ``local`` depending upon how the build was performed.

.. option:: --cmd_section_name

  This flag allows specifying the command line section name. This is needed
  on non-ELF platforms where the section name might differ.

.. option:: --bitcode_section_name

  This flag allows specifying the bitcode section name. This is needed on
  non-ELF platforms where the section name might differ.

Example: CMake
--------------

CMake can output a ``compile_commands.json`` compilation database if the
``CMAKE_EXPORT_COMPILE_COMMANDS`` switch is turned on when configuring the
build. It is also necessary to enable bitcode embedding (done by passing
``-Xclang -fembed-bitcode=all`` to all C/C++ compilation actions in the
non-ThinLTO case). For example, to extract a corpus from clang, you would
run the following commands (assuming that the system C/C++ compiler is clang):

.. code-block:: bash

  cmake -GNinja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
    -DCMAKE_C_FLAGS="-Xclang -fembed-bitcode=all" \
    -DCMAKE_CXX_FLAGS="-Xclang -fembed-bitcode=all" \
    ../llvm
  ninja

After running CMake and building the project, there should be a
``compile_commands.json`` file within the build directory. You can then
run the following command to create a corpus:

.. code-block:: bash

  python3 ./extract_ir.py \
    --input=./build/compile_commands.json \
    --input_type=json \
    --output_dir=./corpus

After running the above command, there should be a full
corpus of bitcode within the ``./corpus`` directory.

Example: Bazel Aquery
---------------------

This tool also supports extracting bitcode from bazel in multiple ways
depending upon the exact configuration. For ThinLTO, a linker parameters file
is preferred. For the non-ThinLTO case, the script will accept the output of
``bazel aquery``, which it uses to find all the object files that are linked
into a specific target and then extract bitcode from them. First, you need
to generate the aquery output:

.. code-block:: bash

  bazel aquery --output=jsonproto //path/to:target > /path/to/aquery.json

Afterwards, assuming that the build is already complete, you can run this
script to create a corpus:

.. code-block:: bash

  python3 ./extract_ir.py \
    --input=/path/to/aquery.json \
    --input_type=bazel_aquery \
    --output_dir=./corpus \
    --obj_base_dir=./bazel-bin

This will again leave a corpus that contains all the bitcode files. Note that
this mode does not capture all object files in the build, only the ones that
are involved in the link for the binary passed to the ``bazel aquery``
invocation.

.. program:: make_corpus.py

Synopsis
--------

Creates a corpus from a collection of bitcode files.

Options
-------

.. option:: --input_dir

  The input directory to search for bitcode files in.

.. option:: --output_dir

  The output directory to place the constructed corpus in.

.. option:: --default_args

  A list of space-separated flags that are put into the corpus description.
  These are used by some tooling when compiling the modules within the corpus.

.. program:: combine_training_corpus.py

Synopsis
--------

Combines two training corpora that share the same parent folder by generating
a new ``corpus_description.json`` that contains all the modules in both corpora.

Options
-------

.. option:: --root_dir

  The root directory that contains subfolders consisting of the corpora that
  should be combined.

Interacting with ML models
==========================

We interact with ML models in two primary scenarios: one is training such a
model; the other, inference, is using a trained model during compilation to
make optimization decisions.

For a specific optimization problem - e.g. inlining, or regalloc eviction - we
first separate correctness-preserving decisions from optimization decisions.
For example, not inlining functions marked "no inline" is an example of the
former; so is not evicting an unevictable live range. An example of the latter
is deciding to inline a function that will bloat the caller size, just because
we have reason to believe that, later, the effect will be some constant
propagation that will actually reduce the size (or dynamic instruction count).

ML models can be understood as functions. Their inputs are tensors - buffers of
scalars. The output (in our case, singular) is a scalar. For example, for
inlining, the inputs are properties of the caller, callee, and the callsite
being analyzed for inlining. The output is a boolean.

Inputs and outputs are named, have a scalar type (e.g. int32_t) and a shape
(e.g. 3x4). These are the elements we use to bind to an ML model.

In both training and inference, we want to expose to ML (training algorithms or
the trained model, respectively) the features we want to make optimization
decisions on. In that regard, the interface from the compiler side to the ML
side is the same: pass features, and get a decision. It's essentially a function
call, where the parameters and result are bound by name and are described by
(name, scalar type, shape) tuples.

The main types in LLVM are:

- ``MLModelRunner`` - an abstraction for the decision making mechanism
- ``TensorSpec``, which describes a tensor.

TensorSpec
----------

See ``llvm/Analysis/TensorSpec.h``. This is a simple data bag, identifying a
tensor by name (a string), scalar type, and shape (a vector of ints). The scalar
type can only be int (8, 16, 32, or 64), signed or unsigned; float; or double.

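For illustration, here is a minimal sketch of describing tensors with
``TensorSpec`` via its templated ``TensorSpec::createSpec`` factory. The
feature names and shapes below are made up for the example:

.. code-block:: c++

  #include "llvm/Analysis/TensorSpec.h"

  using namespace llvm;

  // A hypothetical scalar int64 feature; a shape of {1} means a single value.
  TensorSpec CalleeUsersSpec =
      TensorSpec::createSpec<int64_t>("callee_users", /*Shape=*/{1});

  // A hypothetical 3x4 float tensor, matching the shape example above.
  TensorSpec SomeMatrixSpec =
      TensorSpec::createSpec<float>("some_matrix", /*Shape=*/{3, 4});
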
MLModelRunner
-------------

See ``llvm/Analysis/MLModelRunner.h``. The abstraction has a pure virtual,
``evaluateUntyped``, but the contract with implementers is a bit more involved:

Implementers
^^^^^^^^^^^^

At construction, the implementer is expected to receive a list of ``TensorSpec``
for input features and the ``TensorSpec`` of the output (e.g.
``std::vector<TensorSpec>``). The list type is not contractual, but it must be
a 0-based indexing array-like container. Given a ``TensorSpec`` at index "I" in
the input list, that has a name "N", shape "D1 x D2 x ... x Dn", and scalar type
"T", the implementer must:

- set up a contiguous buffer sized ``sizeof(T) * D1 * D2 * ... * Dn``. This
  buffer's lifetime must be the same as the lifetime of the implementer object.
- call ``MLModelRunner::setUpBufferForTensor`` passing I, the ``TensorSpec``,
  and the buffer above.

Internally, the expectation is that the implementer uses the name (and maybe
shape) of a ``TensorSpec`` for binding (e.g. lookup in an underlying ML model).

``MLModelRunner::setUpBufferForTensor`` stores each buffer at the corresponding
index (i.e. its position in the list used at construction). The expectation is
that the user will use that position when calling ``MLModelRunner::getTensor``
to retrieve the underlying buffer (more on that in a bit).

The implementation of ``evaluateUntyped`` is expected to use the values in the
buffers described above, carry out whatever computation (e.g. evaluate an ML
model) and then place the outcome in an output buffer which will be returned to
the caller. Importantly, ``evaluateUntyped`` must not reset the input buffers.
This is because during training we may want to log the features and decisions,
and since the data is already buffered, there's no reason to force backing it
up elsewhere.

Users
^^^^^

The users must pass the input ``TensorSpec`` list at the construction of a
specific ``MLModelRunner`` object. After that, users can be agnostic of the
specific implementation, and would typically follow this workflow (a sketch
follows the list):

- call ``getTensor`` or ``getTensorUntyped``, for each input tensor, identified
  by its index (i.e. the index of the corresponding ``TensorSpec`` in the list
  used at construction).
- populate the tensor buffer of each input tensor with values. Users can take
  advantage of the stability of the tensor buffers, e.g. by setting only once
  those that don't change, or by caching the buffer address.
- call ``evaluate`` and use its result.

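Below is a minimal sketch of that workflow. It assumes an already-constructed
runner (any of the implementations described later) whose first ``TensorSpec``
is a scalar ``int64_t`` feature; the helper name and the decision threshold are
illustrative only:

.. code-block:: c++

  #include "llvm/Analysis/MLModelRunner.h"

  #include <cstdint>

  using namespace llvm;

  // Hypothetical helper: feed one feature to the model and interpret the
  // scalar result as a yes/no decision.
  bool shouldTransform(MLModelRunner &Runner, int64_t SomeFeatureValue) {
    // Index 0 refers to the first TensorSpec in the list used at construction.
    // The buffer is stable, so its address could also be cached.
    *Runner.getTensor<int64_t>(0) = SomeFeatureValue;
    // Evaluate and use the (singular) output.
    return Runner.evaluate<int64_t>() > 0;
  }
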
Versioning
^^^^^^^^^^

We support a model "knowing" fewer inputs than the compiler. This is supported by
``MLModelRunner::setUpBufferForTensor``. If a ``TensorSpec`` requested by the
compiler is not supported by the underlying model, the ``MLModelRunner``
implementer must still call ``setUpBufferForTensor`` with a ``nullptr`` value
for the buffer. In turn, ``MLModelRunner`` will allocate an appropriately-sized
buffer and track its lifetime. The user can safely populate that buffer. Since
the rest of the inputs are still provided, this allows an evolution model where
we first add features to the compiler and continue using older models without
regressing. Then, the new compiler can be used to train new models. Deprecating
features in the compiler involves, then, first training a model without those
features.

``MLModelRunner`` implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We currently feature 4 implementations:

- ``ModelUnderTrainingRunner``. This requires the compiler be built with TFLite
  support. It allows loading a TFLite model dynamically and is primarily
  intended for training scenarios, but it can be used relatively easily in
  production build environments, as it does not change how the compiler operates
  (why this remark is necessary will become clear in a few paragraphs).

- ``ReleaseModeModelRunner``. This is intended for inference scenarios. This
  uses the rules defined in ``llvm/cmake/modules/TensorFlowCompile.cmake`` to
  convert, at the time the compiler is built, TensorFlow Saved Models into a
  header (.h) and native object (.o). The latter is a CPU-based implementation of
  the neural network, together with its weights (essentially, loops performing
  matrix multiplications).

  .. note::

    We are actively working on replacing this with an EmitC implementation
    requiring no out-of-tree build-time dependencies.

- ``InteractiveModelRunner``. This is intended for training scenarios where the
  training algorithm drives compilation. This model runner has no special
  dependencies, and relies on I/O pipes to communicate with a separate process,
  presumably a python training algorithm. We do not envision using this in a
  production environment.

- ``NoInferenceModelRunner``. This serves as a store for feature values, and its
  ``evaluate`` should never be called. It's used for training scenarios, when we
  want to capture the behavior of the default (non-ML) heuristic (a usage sketch
  follows this list).

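As an illustration of the last case, the following sketch stores a feature value
without ever asking for a decision. The feature name is hypothetical, and the
exact constructor shape should be checked against
``llvm/Analysis/NoInferenceModelRunner.h``:

.. code-block:: c++

  #include "llvm/Analysis/NoInferenceModelRunner.h"
  #include "llvm/Analysis/TensorSpec.h"
  #include "llvm/IR/LLVMContext.h"

  #include <cstdint>
  #include <vector>

  using namespace llvm;

  void captureFeatures(LLVMContext &Ctx) {
    // Hypothetical feature list; in practice this mirrors the features the
    // ML-enabled heuristic would consume.
    std::vector<TensorSpec> Features{
        TensorSpec::createSpec<int64_t>("some_feature", /*Shape=*/{1})};

    NoInferenceModelRunner Runner(Ctx, Features);
    // Store a feature value; a logger can later read it back from the same
    // buffer. evaluate() is never called on this runner.
    *Runner.getTensor<int64_t>(0) = 42;
  }
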
Note that training leaves it to the training infrastructure to handle
distributed computing. The assumed architecture has python processes
communicating remotely between themselves, but managing local communication with
clang.

Logging Facility
----------------

When training models, we need to expose the features we will want to use during
inference, as well as outcomes, to guide reward-based learning techniques. This
can happen in 2 forms:

- when running the compiler on some input, as a capture of the features and
  actions taken by some policy or a model currently being used.
  For example, see ``DevelopmentModeInlineAdvisor`` or ``DevelopmentModeEvictAdvisor``
  in ``MLRegallocEvictAdvisor.cpp``. In more detail, in the former case, if
  ``-training-log`` is specified, the features and actions (inline/no inline)
  from each inlining decision are saved to the specified file. Since
  ``MLModelRunner`` implementations hold on to feature values (they don't get
  cleared by ``evaluate``), logging is easily supported by just looping over the
  model runner's features and passing the tensor buffers to the logger. Note how
  we use the ``NoInferenceModelRunner`` to capture the features observed when
  using the default policy.

- as a serialization mechanism for the ``InteractiveModelRunner``. Here, we need
  to pass the observed features over IPC (a file descriptor, likely a named
  pipe).

Both cases require serializing the same kind of data, and we support both with
``Analysis/Utils/TrainingLogger``.

The goal of the logger design was avoiding any new dependency, and optimizing
for the tensor scenario - i.e. exchanging potentially large buffers of fixed
size, containing scalars. We explicitly assume the reader of the format has the
same endianness as the compiler host, and we further expect the reader and the
compiler to run on the same host. This is because we expect the training
scenarios to have a (typically python) process managing the compiler process,
and we leave it to the training side to handle remoting.

The logger produces the following sequence:

- a header describing the structure of the log. This is a one-line textual JSON
  dictionary with the following elements:

  - ``features``: a list of JSON-serialized ``TensorSpec`` values. The position
    in the list matters, as it will be the order in which values will be
    subsequently recorded. If we are just logging (i.e. not using the
    ``InteractiveModelRunner``), the last feature should be that of the action
    (e.g. "inline/no inline", or "index of evicted live range")
  - (optional) ``score``: a ``TensorSpec`` describing a value we will include to
    help formulate a reward. This could be a size estimate or a latency estimate.
  - (optional) ``advice``: a ``TensorSpec`` describing the action. This is used
    for the ``InteractiveModelRunner``, in which case it shouldn't be in the
    ``features`` list.

- a sequence of ``contexts``. Contexts are independent traces of the optimization
  problem. For module passes, there is only one context; for function passes,
  there is a context per function. The start of a context is marked with a
  one-line JSON dictionary of the form ``{"context": <context name, a string>}``

Each context has a sequence of:

- ``observations``. An observation is:

  - one-line JSON ``{"observation": <observation number, 0-indexed>}``
  - a binary dump of the tensor buffers, in the order in which they were
    specified in the header.
  - a new line character
  - if ``score`` was specified in the header:

    - a one-line JSON object ``{"outcome": <value>}``, where the ``value``
      conforms to the ``TensorSpec`` defined for the ``score`` in the header.
    - the outcome value, as a binary dump
    - a new line character.

The format uses a mix of textual JSON (for headers) and binary dumps (for tensors)
because the headers are not expected to dominate the payload - the tensor values
are. We wanted to avoid overburdening the log reader - likely python - with
additional dependencies; and the one-line JSON makes it rudimentarily possible
to inspect a log without additional tooling.

A python utility for reading logs, used for tests, is available at
``Analysis/models/log_reader.py``. A utility showcasing the ``InteractiveModelRunner``,
which uses this reader as well, is at ``Analysis/models/interactive_host.py``.
The latter is also used in tests.

There is no C++ implementation of a log reader. We do not have a scenario
motivating one.

IR2Vec Embeddings
=================

IR2Vec is a program embedding approach designed specifically for LLVM IR. It
is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
capture syntactic, semantic, and structural properties of the IR through
learned representations. These representations are obtained as a JSON
vocabulary that maps the entities of the IR (opcodes, types, operands) to
n-dimensional floating point vectors (embeddings).

With IR2Vec, representations at different granularities of IR, such as
instructions, functions, and basic blocks, can be obtained. Representations
of loops and regions can be derived from these, which can be
useful in different scenarios. The representations can be useful for various
downstream tasks, including ML-guided compiler optimizations.

The core components are:

- **Vocabulary**: A mapping from IR entities (opcodes, types, etc.) to their
  vector representations. This is managed by ``IR2VecVocabAnalysis``. The
  vocabulary (.json file) contains three sections -- Opcodes, Types, and
  Arguments, each containing the representations of the corresponding
  entities.

  .. note::

    It is mandatory to have these three sections present in the vocabulary file
    for it to be valid; the order in which they appear does not matter.

- **Embedder**: A class (``ir2vec::Embedder``) that uses the vocabulary to
  compute embeddings for instructions, basic blocks, and functions.

Using IR2Vec
------------

.. note::

  This section describes how to use IR2Vec within LLVM passes. A standalone
  tool :doc:`CommandGuide/llvm-ir2vec` is available for generating the
  embeddings and triplets from LLVM IR files, which can be useful for
  training vocabularies and generating embeddings outside of compiler passes.

For generating embeddings, first the vocabulary should be obtained. Then, the
embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.

1. **Get the Vocabulary**:
   In a ModulePass, get the vocabulary analysis result:

   .. code-block:: c++

      auto &VocabRes = MAM.getResult<IR2VecVocabAnalysis>(M);
      if (!VocabRes.isValid()) {
        // Handle error: vocabulary is not available or invalid
        return;
      }
      const ir2vec::Vocab &Vocabulary = VocabRes.getVocabulary();

   Note that the ``IR2VecVocabAnalysis`` pass is immutable.

2. **Create Embedder instance**:
   With the vocabulary, create an embedder for a specific function:

   .. code-block:: c++

      // Assuming F is an llvm::Function&
      // For example, using IR2VecKind::Symbolic:
      std::unique_ptr<ir2vec::Embedder> Emb =
          ir2vec::Embedder::create(IR2VecKind::Symbolic, F, Vocabulary);

3. **Compute and Access Embeddings**:
   Call ``getFunctionVector()`` to get the embedding for the function.

   .. code-block:: c++

      const ir2vec::Embedding &FuncVector = Emb->getFunctionVector();

   Currently, ``Embedder`` can generate embeddings at three levels: Instructions,
   Basic Blocks, and Functions. Appropriate getters are provided to access the
   embeddings at these levels.

   .. note::

     The validity of an ``Embedder`` instance (and the embeddings it generates)
     is tied to the function it is associated with remaining unchanged. If the
     function is modified, the embeddings may become stale and should be
     recomputed accordingly.

4. **Working with Embeddings:**
   Embeddings are represented as ``std::vector<double>``. These vectors can be
   used as features for machine learning models, to compute similarity scores
   between different code snippets, or to perform other analyses as needed
   (a rough sketch follows this list).

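As a rough sketch of that last point, and assuming ``ir2vec::Embedding`` exposes
``size()`` and element access consistent with its ``std::vector<double>``
representation, a cosine similarity between two embeddings of equal dimension
could be computed along these lines:

.. code-block:: c++

  #include "llvm/Analysis/IR2Vec.h"

  #include <cassert>
  #include <cmath>

  // Illustrative only: compare two embeddings (e.g. two function vectors).
  double cosineSimilarity(const llvm::ir2vec::Embedding &A,
                          const llvm::ir2vec::Embedding &B) {
    assert(A.size() == B.size() && "embeddings must have the same dimension");
    double Dot = 0.0, NormA = 0.0, NormB = 0.0;
    for (size_t I = 0, E = A.size(); I != E; ++I) {
      Dot += A[I] * B[I];
      NormA += A[I] * A[I];
      NormB += B[I] * B[I];
    }
    return Dot / (std::sqrt(NormA) * std::sqrt(NormB) + 1e-9);
  }
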
Further Details
---------------

For more detailed information about the IR2Vec algorithm, its parameters, and
advanced usage, please refer to the original paper:
`IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.

For information about using the IR2Vec tool for generating embeddings and
triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.

The LLVM source code for ``IR2Vec`` can also be explored to understand the
implementation details.

Building with ML support
========================

.. note::

  For up-to-date information on custom builds, see the ``ml-*``
  `build bots <http://lab.llvm.org>`_. They are set up
  `like this <https://github.com/google/ml-compiler-opt/blob/main/buildbot/buildbot_init.sh>`_.

Embed pre-trained models (aka "release" mode)
---------------------------------------------

This supports the ``ReleaseModeModelRunner`` model runners.

You need a tensorflow pip package for the AOT (ahead-of-time) Saved Model compiler
and a thin wrapper for the native function generated by it. We currently support
TF 2.15. We recommend using a python virtual env (in which case, remember to
pass ``-DPython3_ROOT_DIR`` to ``cmake``).

Once you install the pip package, find where it was installed:

.. code-block:: console

  TF_PIP=$(sudo -u buildbot python3 -c "import tensorflow as tf; import os; print(os.path.dirname(tf.__file__))")

Then build LLVM:

.. code-block:: console

  cmake -DTENSORFLOW_AOT_PATH=$TF_PIP \
    -DLLVM_INLINER_MODEL_PATH=<path to inliner saved model dir> \
    -DLLVM_RAEVICT_MODEL_PATH=<path to regalloc eviction saved model dir> \
    <...other options...>

The example shows the flags for both inlining and regalloc, but either may be
omitted.

You can also specify a URL for the path, and it is also possible to pre-compile
the header and object and then just point to the precompiled artifacts. See for
example ``LLVM_OVERRIDE_MODEL_HEADER_INLINERSIZEMODEL``.

.. note::

  We are transitioning away from the AOT compiler shipping with the
  tensorflow package, and to an EmitC, in-tree solution, so these details will
  change soon.

Using TFLite (aka "development" mode)
-------------------------------------

This supports the ``ModelUnderTrainingRunner`` model runners.

Build the TFLite package using `this script <https://raw.githubusercontent.com/google/ml-compiler-opt/refs/heads/main/buildbot/build_tflite.sh>`_.
Then, assuming you ran that script in ``/tmp/tflitebuild``, just pass
``-C /tmp/tflitebuild/tflite.cmake`` to the ``cmake`` for LLVM.

Interactive Mode (for training / research)
------------------------------------------

The ``InteractiveModelRunner`` is available with no extra dependencies. For the
optimizations that are currently MLGO-enabled, it may be used as follows:

- for inlining: ``-mllvm -enable-ml-inliner=release -mllvm -inliner-interactive-channel-base=<name>``
- for regalloc eviction: ``-mllvm -regalloc-evict-advisor=release -mllvm -regalloc-evict-interactive-channel-base=<name>``

where ``<name>`` is a path fragment. The compiler will expect to find 2 files,
``<name>.in`` (readable, data incoming from the managing process) and
``<name>.out`` (writable, the model runner sends data to the managing process).