Currenty, setting the -mbb-profile-dump dumps a CSV file with blocks
inside an individual function identified by their MBB numbers. This
patch changes the MBBs to be identified by their ID which is set at MBB
creation and not changed afterwards, making it inherently stable
throughout the backend. This alleviates concerns with the MBB IDs
changing between the profile dump and what ends up in the final object
file. The MBBs inside the SHT_LLVM_BB_ADDR_MAP sections are also
identified using their MBB ID rather than number, so if we want to match
them up we need to identify the MBBs here by number.
Reviewed By: mtrofin, rahmanl
Differential Revision: https://reviews.llvm.org/D147366
In the very initial version of the patch that introduced this test I had
it only being conditionally compiled when in MLGO development mode (ie
when HAVE_TFLITE is true), but after some revision it is enabled all the
time and thus should be tested all the time. This patch fixes this
leftover behavior that should've been fixed in the original change set.
This patch updates the test for the MBB profile dump to include a
function that has multiple basic blocks so that we can test the
numbering of multiple basic blocks within an individual function.
This patch adds a basic block profile dump option within the AsmPrinter
and dumps basic block profile information so that cost models can use
the data for downstream analysis.
Differential Revision: https://reviews.llvm.org/D143311
There's an early-exit case for regalloc when we don't even get a chance
to ask for an advisor (priority or eviction), and switch the context.
Then, when we want to log the reward for that function (==the one with
the early exit case), we hit the error case where the function's name
doesn't match the last-seen context.
There are a few possible fixes, one would be to just switch context when
output-ing the reward, which would be correct. This patch opts for the
alternative where we check any loging happened in the first place - just
to re-validate that no function would have been regaloc-ed without first
log-ing its reward.
Differential Revision: https://reviews.llvm.org/D143359
This reverts commit a772f0bb920a4957fb94dd8dbe45943809fd0ec3.
The main problem was related to how we handled `dbgs()` from the hosted
compiler. Using explicit `subprocess.communicate`, and not relying on
dbgs() being flushed until the end appears to address the problem.
Also some fixes due to some bots running older pythons, so we can't have
nice things like `int | float` and such.
This reverts commit a7354899d1a235a796b3a2ccb45f6596983c8672.
The way stdout/stderr get routed seems to work differently locally and
on the bots. Investigating.
This hooks up the interactive model runner to the passes that support
ml-based decisions. Because the interface to this runner is the exact
same as the one used during inference, we just reuse the exact same
setup we have for "release mode". This makes "release mode" a misnomer -
and that's something we needed to resolve sooner or later (e.g.
supporting more than one embedded model for the same problem was another
reason to drop that nomenclature). That will happen in a subsequent
change.
To use this evaluator, just enable the pass in (currently) "release"
mode, but also pass the base name for the 2 channel files via the
pass-specific flag.
The 2 files are the responsibilty of the hosting process. The added
tests use a minimal, toy such host, illustrating setup and
communication.
Differential Revision: https://reviews.llvm.org/D143218
This leverages the new logging format in that we don't need to buffer
the training data, we can just write it out.
Differential Revision: https://reviews.llvm.org/D142168
The dependency was due to the log format. This change switches to the
previously-introduced (D139370) "dependency-free" logger instead of the
protobuf-based one.
A subsequent change will clean out the unnecessary abstraction left
behind.
This change drops the logger unittest, we have sufficient test coverage
via lit tests, and a unit test would require adding, unnecesarily, a log
reader (the reader is expected to be python, for the ML side, and there
is a reader for that under Analysis/models, used for tests).
Differential Revision: https://reviews.llvm.org/D141720
We are in the process of retiring LLVM_HAVE_TF_API in favor of
LLVM_HAVE_TFLITE. This patch takes care of the transition in
llvm/test.
Differential Revision: https://reviews.llvm.org/D140133
This is the next step in dropping the dependency on protobuf.
The simple logger produces an output consisting of lines of json
strings. Tensor values - which should constitute the bulk of the data -
are serialized as raw byte buffers. This allows for light-weight reading
of the values.
The next step is to switch the training logic to the new logging format,
following which the protobuf-based logger will be dropped, together with
the training dependency on protobuf.
Subsequent changes will also stop buffering and stream, instead - the
buffering model is just as a convenient point-in-time.
Differential Revision: https://reviews.llvm.org/D139370
Fix a test using invalid MLIR using different VRegs for the tied operands
of ADD64rr, which happened to trigger an assertion after my latest
changes.
Also attempting to adjust the MLRegalloc tests to the adjusted regalloc
(though I don't have a 100% working setup for them even without my
changes)
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information and then produce the log file.
Differential Revision: https://reviews.llvm.org/D133616
This commit adds in two new features to the ML regalloc eviction
analysis that can be used in ML models, a vector of MBB frequencies and
a vector of indicies mapping instructions to their corresponding basic
blocks. This will allow for further experimentation with per-instruction
features and give a lot more flexibility for future experimentation over
how we're extracting MBB frequency data currently.
Reviewed By: mtrofin, jacobhegna
Differential Revision: https://reviews.llvm.org/D134166
This patch adds in instruction based features to the regalloc advisor
gated behind a flag so a user can decide at runtime whether or not they
want to enable the feature. The features are only enabled when LLVM is
compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true.
To extract the instruction features, I'm taking a list of segments from
each LiveInterval and noting the start and end SlotIndices. This list is then
sorted based on the start SlotIndex and I iterate through each SlotIndex
to grab instructions, making sure to check for overlaps. This results in
a vector of opcodes and binary mapping matrix that maps live ranges to the
opcodes of the instructions within that LR.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D131930
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D132835
TLite is a lightweight, statically linkable[1], model evaluator, supporting a
subset of what the full tensorflow library does, sufficient for the
types of scenarios we envision having. It is also faster.
We still use saved models as "source of truth" - 'release' mode's AOT
starts from a saved model; and the ML training side operates in terms of
saved models.
Using TFLite solves the following problems compared to using the full TF
C API:
- a compiler-friendly implementation for runtime-loadable (as opposed
to AOT-embedded) models: it's statically linked; it can be built via
cmake;
- solves an issue we had when building the compiler with both AOT and
full TF C API support, whereby, due to a packaging issue on the TF
side, we needed to have the pip package and the TF C API library at
the same version. We have no such constraints now.
The main liability is it supporting a subset of what the full TF
framework does. We do not expect that to cause an issue, but should that
be the case, we can always revert back to using the full framework
(after also figuring out a way to address the problems that motivated
the move to TFLite).
Details:
This change switches the development mode to TFLite. Models are still
expected to be placed in a directory - i.e. the parameters to clang
don't change; what changes is the directory content: we still need
an `output_spec.json` file; but instead of the saved_model protobuf and
the `variables` directory, we now just have one file, `model.tflite`.
The change includes a utility showing how to take a saved model and
convert it to TFLite, which it uses for testing.
The full TF implementation can still be built (not side-by-side). We
intend to remove it shortly, after patching downstream dependencies. The
build behavior, however, prioritizes TFLite - i.e. trying to enable both
full TF C API and TFLite will just pick TFLite.
[1] thanks to @petrhosek's changes to TFLite's cmake support and its deps!
Some generic tests are not supported by the nvptx now. Moreover, they
are no plans to fix the tested features in nvptx. So, suggest to mark
them as UNSUPPORTED
Differential Revision: https://reviews.llvm.org/D123928
These test cases all rely on a default target being specified. Adding
the requirement gets the tests properly skipped when
LLVM_DEFAULT_TARGET_TRIPLE is unset.
Factoring it out so we can subsequently cache it. This should be a NFC,
however, for the float quantities, we see small errors in the least
significant digits. This is because, before, we were summing up one by
one. Now, we sum up results of sums.
This shouldn't matter for ML, and will require rework when we do
quantization (avoiding floats altogether), but meanwhile, it did require
an update to the reference file used for testing.
The patch also bumps the precision of the variables involved in this, to
reduce the error (note they are casted back to float at the end by the
SET macro, since we only work with float and not double in TF)
Differential Revision: https://reviews.llvm.org/D118659
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.
To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.
This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.
The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.
Differential Revision: https://reviews.llvm.org/D117752
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.
This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.
Differential Revision: https://reviews.llvm.org/D117147
This patch introduces the eviction analysis and the eviction advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.
Differential Revision: https://reviews.llvm.org/D115707