This helps training algorithms that may want to sometimes replicate the
default decision. The default decision is presented as an extra feature
called `inlining_default`. It's not normally exported to save
computation time.
This is only available in interactive mode.
Differential Revision: https://reviews.llvm.org/D147794
This reverts commit a772f0bb920a4957fb94dd8dbe45943809fd0ec3.
The main problem was related to how we handled `dbgs()` from the hosted
compiler. Using explicit `subprocess.communicate`, and not relying on
dbgs() being flushed until the end appears to address the problem.
Also some fixes due to some bots running older pythons, so we can't have
nice things like `int | float` and such.
This reverts commit a7354899d1a235a796b3a2ccb45f6596983c8672.
The way stdout/stderr get routed seems to work differently locally and
on the bots. Investigating.
This hooks up the interactive model runner to the passes that support
ml-based decisions. Because the interface to this runner is the exact
same as the one used during inference, we just reuse the exact same
setup we have for "release mode". This makes "release mode" a misnomer -
and that's something we needed to resolve sooner or later (e.g.
supporting more than one embedded model for the same problem was another
reason to drop that nomenclature). That will happen in a subsequent
change.
To use this evaluator, just enable the pass in (currently) "release"
mode, but also pass the base name for the 2 channel files via the
pass-specific flag.
The 2 files are the responsibilty of the hosting process. The added
tests use a minimal, toy such host, illustrating setup and
communication.
Differential Revision: https://reviews.llvm.org/D143218
In `MLInlineAdvisor::getAdviceImpl`, we call `getCachedFPI` twice, once
for the caller, once for the callee, so the second may invalidate the
reference obtained by the first because the underlying implementation of
the cache is a `DenseMap`. `std::map` doesn't have that problem.
We really just need to invalidate loop info and the dominator tree, in
addition to the FunctionPropertiesInfo we were invalidating originally.
Doing more adds unnecessary compile time overhead.
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127693
There could be successors that were reached before but now are only
reachable from elsewhere in the CFG.
Suppose the following diamond CFG (lines are arrows pointing down):
A
/ \
B C
\ /
D
There's a call site in C that is inlined. Upon doing that, it turns out
it expands to:
call void @llvm.trap()
unreachable
D isn't reachable from C anymore, but we did discount it when we set up
FunctionPropertiesUpdater, so we need to re-include it here.
The patch also updates loop accounting to use LoopInfo rather than
traverse BBs.
Differential Revision: https://reviews.llvm.org/D127353
Once ForceStop is set to true, we only return positive inlining advice when it is mandatory; There is no need for further node/edge accounting.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127245
Re-computing FunctionPropertiesInfo after each inlining may be very time
consuming: in certain cases, e.g. large caller with lots of callsites,
and when the overall IR doesn't increase (thus not tripping a size bloat
threshold).
This patch addresses this by incrementally updating
FunctionPropertiesInfo.
Differential Revision: https://reviews.llvm.org/D125841
This allows the compiler to support more features than those supported by a
model. The only requirement (development mode only) is that the new
features must be appended at the end of the list of features requested
from the model. The support is transparent to compiler code: for
unsupported features, we provide a valid buffer to copy their values;
it's just that this buffer is disconnected from the model, so insofar
as the model is concerned (AOT or development mode), these features don't
exist. The buffers are allocated at setup - meaning, at steady state,
there is no extra allocation (maintaining the current invariant). These
buffers has 2 roles: one, keep the compiler code simple. Second, allow
logging their values in development mode. The latter allows retraining
a model supporting the larger feature set starting from traces produced
with the old model.
For release mode (AOT-ed models), this decouples compiler evolution from
model evolution, which we want in scenarios where the toolchain is
frequently rebuilt and redeployed: we can first deploy the new features,
and continue working with the older model, until a new model is made
available, which can then be picked up the next time the compiler is built.
Differential Revision: https://reviews.llvm.org/D124565
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.
To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.
This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.
The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.
Differential Revision: https://reviews.llvm.org/D117752
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.
Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.
This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the Module-wide performance of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)
Differential Revision: https://reviews.llvm.org/D116964
This prepares it for the regalloc work. Part of it is making model
evaluation accross 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.
Differential Revision: https://reviews.llvm.org/D115306
This avoids unnecessary re-calculation of module-wide features in the
MLInlineAdvisor. In cases where function passes don't invalidate
functions (and, thus, don't invalidate the module), but we re-process a
CGSCC, we currently refreshed module features unnecessarily. The
overhead of fetching cached results (albeit they weren't themselves
invalidated) was noticeable in certain modules' compilations.
We don't want to just invalidate the advisor object, though, via the
analysis manager, because we'd then need to re-create expensive state
(like the model evaluator in the ML 'development' mode).
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D113644
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.
Differential Revision: https://reviews.llvm.org/D104751
This reverts commit d97f776be5f8cd3cd446fe73827cd355f6bab4e1.
The original problem was due to build failures in shared lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking
this.
When using 2 InlinePass instances in the same CGSCC - one for other
mandatory inlinings, the other for the heuristic-driven ones - the order
in which the ImportedFunctionStats would be output-ed would depend on
the destruction order of the inline passes, which is not deterministic.
This patch moves the ImportedFunctionStats responsibility to the
InlineAdvisor to address this problem.
Differential Revision: https://reviews.llvm.org/D94982
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.
Differential Revision: https://reviews.llvm.org/D94825
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.
Note that this patch does not yet enable much in terms of post-always
inline function optimization.
Differential Revision: https://reviews.llvm.org/D91567
(This reverts commit a5e0194709c40212694370e0ea789a1ca14548b5, and
corrects author).
Rename the pass to be able to extend it to function properties other than inliner features.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D82044
Rename the pass to be able to extend it to function properties other than inliner features.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D82044
Outside of compiler-rt (where it's arguably an anti-pattern too),
LLVM tries to keep its build files as simple as possible. See e.g.
llvm/docs/SupportLibrary.rst, "Code Organization".
Differential Revision: https://reviews.llvm.org/D84243