Reverts c992690179eb5de6efe47d5c8f3a23f2302723f2.
The problem is that if there is a sequence "{delete A->B} {delete A->B}
{insert A->B}" the net result is "{delete A->B}", which is not what we
want.
Duplicate successors may happen in cases like switch statements (as
shown in the unit test).
The second problem was that, in `invoke` cases, some edges we speculate may get deleted don't, yet they are also not reachable from the inlined call site's basic block. We just need to check which edges are actually no longer present.
The fix is to sanitize the list of deletes, just like we do for inserts.
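For illustration, a hedged sketch (helper name hypothetical) of what
sanitizing the deletes can look like: drop any {Delete, A, B} update whose
edge is still present in the CFG, mirroring the existing sanitization of
inserts.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Dominators.h"

// Hypothetical helper: keep a delete only if the edge is really gone.
void sanitizeDeletes(
    llvm::SmallVectorImpl<llvm::DominatorTree::UpdateType> &Updates) {
  llvm::erase_if(Updates, [](const llvm::DominatorTree::UpdateType &U) {
    if (U.getKind() != llvm::DominatorTree::Delete)
      return false;
    // Erase the update if the edge From -> To still exists in the CFG.
    return llvm::is_contained(llvm::successors(U.getFrom()), U.getTo());
  });
}
```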
This seems to cause asserts in our builds:
llvm/include/llvm/Support/GenericDomTreeConstruction.h:927:
static void llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<BasicBlock, false>>::DeleteEdge(DomTreeT &, const BatchUpdatePtr, const NodePtr, const NodePtr) [DomTreeT = llvm::DominatorTreeBase<BasicBlock, false>]:
Assertion `!IsSuccessor(To, From) && "Deleted edge still exists in the CFG!"' failed.
and
llvm/lib/Analysis/FunctionPropertiesAnalysis.cpp:390:
DominatorTree &llvm::FunctionPropertiesUpdater::getUpdatedDominatorTree(FunctionAnalysisManager &) const:
Assertion `DT.getNode(BB)' failed.
See comment on the PR.
> We need the dominator tree analysis for loop info analysis, which we need to compute features like the maximum loop nesting depth and the number of top-level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for the dom tree, the API supports incrementally updating the analysis result.
>
> This change addresses the dom tree part. The loop info is still recomputed from scratch. This does already reduce compile time quite significantly, though (~5x in a specific case).
>
> The loop info change might be more involved and would follow in a subsequent PR.
This reverts commit a2a5508bdae7d115b6c3ace461beb7a987a44407 and the
follow-up commit cdd11d694a406a98a16d6265168ee2fbe1b6a87c.
We need the dominator tree analysis for loop info analysis, which we need to compute features like the maximum loop nesting depth and the number of top-level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for the dom tree, the API supports incrementally updating the analysis result.
This change addresses the dom tree part. The loop info is still recomputed from scratch. This does already reduce compile time quite significantly, though (~5x in a specific case).
The loop info change might be more involved and would follow in a subsequent PR.
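A hedged illustration of the incremental API alluded to above (variable
names hypothetical): instead of recomputing the dominator tree, we describe
just the changed CFG edges and apply them as a batch.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Dominators.h"

// Hypothetical example: one edge disappeared, one appeared.
void applyCFGDelta(llvm::DominatorTree &DT, llvm::BasicBlock *CallBB,
                   llvm::BasicBlock *OldSucc, llvm::BasicBlock *NewSucc) {
  llvm::SmallVector<llvm::DominatorTree::UpdateType, 4> Updates;
  Updates.push_back({llvm::DominatorTree::Delete, CallBB, OldSucc});
  Updates.push_back({llvm::DominatorTree::Insert, CallBB, NewSucc});
  DT.applyUpdates(Updates); // incremental update, no recomputation from scratch
}
```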
With #94815, the nodes belonging to dead functions are no longer
invalidated, but are kept around to be batch-deleted at the end of the
call graph walk.
The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.
AvailableExternally linkage is interesting because, in ThinLTO cases, it
means the function may get elided if it survives inlining - see the
`elim-avail-extern` pass.
This applies to the AOT case where we embed models in the compiler. The
change adds support for multiple models for the same agent and allows the
user to select one via a command-line flag. "Agent" refers to e.g. the
inline advisor or the register allocator eviction advisor.
To avoid build setup complexity, the support is delegated to the saved
model. Since saved models define computational graphs, we can generate a
composite model (this happens prior to building and embedding it in LLVM
and is not shown in this change) that exposes an extra feature with a
predefined name: `_model_selector`. The model, then, delegates
internally to contained models based on that feature value.
Model selection is expected to happen at model instantiation; there is no
current scenario for switching models afterwards.
If the model doesn't expose such a feature but the user passes one, we
report an error. If the model exposes such a feature but the user doesn't
pass one, we also report an error.
Invalid model selector values are expected to be handled by the saved
model.
Internally, the model uses a pair of uint64 values: the high and low
halves of the MD5 hash of the name.
A tool composing models would then need to:
- expose the extra feature `_model_selector`, with shape (2,) and uint64
data type
- test its value (`tf.cond` or `tf.case` in TensorFlow) against the MD5
hash, in [high, low] order, of each contained model's user-specified name
(which the user will then pass as the flag value to the compiler), as
sketched below
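For illustration, a hedged sketch (function name hypothetical) of producing
that [high, low] pair on the compiler side with `llvm::MD5`:

```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MD5.h"
#include <cstdint>
#include <utility>

// Hypothetical helper: hash a user-specified model name into the
// [high, low] pair that is matched against `_model_selector`.
std::pair<uint64_t, uint64_t> modelSelectorHash(llvm::StringRef Name) {
  llvm::MD5 Hasher;
  Hasher.update(Name);
  llvm::MD5::MD5Result Result;
  Hasher.final(Result);
  return {Result.high(), Result.low()};
}
```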
Agents just need to add a flag to capture the name of a model and pass it
to `ReleaseModeModelRunner` at construction. This can be passed in all
cases without checking: if the model is not composite and we pass an empty
name, everything works as before.
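A hedged sketch of that construction-time wiring; the options type and
setter name below reflect the string-flag factoring described next, and are
assumptions, not verbatim API:

```cpp
#include "llvm/Analysis/ReleaseModeModelRunner.h"
#include "llvm/Analysis/TensorSpec.h"
#include <memory>
#include <vector>

// Hypothetical wiring; EmbeddedModelRunnerOptions / setModelSelector are
// assumptions. An empty UserModelName (non-composite model) behaves as
// before.
template <typename CompiledModel>
auto makeRunner(llvm::LLVMContext &Ctx,
                const std::vector<llvm::TensorSpec> &InputFeatures,
                llvm::StringRef DecisionName, llvm::StringRef UserModelName) {
  return std::make_unique<llvm::ReleaseModeModelRunner<CompiledModel>>(
      Ctx, InputFeatures, DecisionName,
      llvm::EmbeddedModelRunnerOptions().setModelSelector(UserModelName));
}
```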
This change also factors out the string flags we pass to the
`ReleaseModeModelRunner`, for better maintainability (otherwise we risk
confusing the various string parameters with one another).
This aligns the inlining macros more closely with how the regalloc
macros are defined.
- Explicitly specify the dtype/shape
- Remove separate names for python/C++
- Add docstring for inline cost features
Differential Revision: https://reviews.llvm.org/D149384
This helps training algorithms that may sometimes want to replicate the
default decision. The default decision is presented as an extra feature
called `inlining_default`. It's not normally exported, to save computation
time.
This is only available in interactive mode.
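For illustration, a hedged sketch of how such an extra feature could be
described with LLVM's `TensorSpec` (the exact shape used by the pass is an
assumption here):

```cpp
#include "llvm/Analysis/TensorSpec.h"
#include <cstdint>

// Assumed spec: the default decision exported as a scalar int64 feature.
llvm::TensorSpec DefaultDecisionSpec =
    llvm::TensorSpec::createSpec<int64_t>("inlining_default", {1});
```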
Differential Revision: https://reviews.llvm.org/D147794
This reverts commit a772f0bb920a4957fb94dd8dbe45943809fd0ec3.
The main problem was related to how we handled `dbgs()` from the hosted
compiler. Using an explicit `subprocess.communicate`, and not relying on
`dbgs()` being flushed before the end, appears to address the problem.
Also some fixes because some bots run older Pythons, so we can't have nice
things like `int | float` annotations.
This reverts commit a7354899d1a235a796b3a2ccb45f6596983c8672.
The way stdout/stderr get routed seems to work differently locally and
on the bots. Investigating.
This hooks up the interactive model runner to the passes that support
ML-based decisions. Because the interface to this runner is exactly the
same as the one used during inference, we just reuse the same setup we
have for "release mode". This makes "release mode" a misnomer - something
we needed to resolve sooner or later anyway (e.g. supporting more than one
embedded model for the same problem was another reason to drop that
nomenclature). That will happen in a subsequent change.
To use this evaluator, just enable the pass in (currently) "release" mode,
but also pass the base name for the 2 channel files via the pass-specific
flag.
The 2 files are the responsibility of the hosting process. The added tests
use a minimal toy host, illustrating setup and communication.
Differential Revision: https://reviews.llvm.org/D143218
In `MLInlineAdvisor::getAdviceImpl`, we call `getCachedFPI` twice - once
for the caller, once for the callee - so the second call may invalidate
the reference obtained by the first, because the underlying cache is a
`DenseMap`. `std::map` doesn't have that problem.
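A minimal sketch of the hazard (cache simplified; not the actual advisor
code):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/Analysis/FunctionPropertiesAnalysis.h"
#include "llvm/IR/Function.h"
#include <map>

void illustrateHazard(const llvm::Function *Caller,
                      const llvm::Function *Callee) {
  llvm::DenseMap<const llvm::Function *, llvm::FunctionPropertiesInfo> Cache;
  auto &CallerFPI = Cache[Caller]; // reference into the DenseMap's storage
  auto &CalleeFPI = Cache[Callee]; // this insertion may rehash and move the
                                   // storage, leaving CallerFPI dangling
  (void)CallerFPI;
  (void)CalleeFPI;
  // std::map nodes are stable: references survive later insertions.
  std::map<const llvm::Function *, llvm::FunctionPropertiesInfo> StableCache;
  (void)StableCache;
}
```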
We really just need to invalidate loop info and the dominator tree, in
addition to the FunctionPropertiesInfo we were invalidating originally.
Doing more adds unnecessary compile time overhead.
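A hedged sketch of what such targeted invalidation can look like with the
pass manager API (helper name hypothetical):

```cpp
#include "llvm/Analysis/FunctionPropertiesAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/PassManager.h"

// Hypothetical helper: preserve everything except the analyses the CFG
// change actually affects, then let the manager do the invalidation.
void invalidateAffected(llvm::FunctionAnalysisManager &FAM,
                        llvm::Function &F) {
  auto PA = llvm::PreservedAnalyses::all();
  PA.abandon<llvm::LoopAnalysis>();
  PA.abandon<llvm::DominatorTreeAnalysis>();
  PA.abandon<llvm::FunctionPropertiesAnalysis>();
  FAM.invalidate(F, PA);
}
```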
Previously, if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in `onPassEntry` resolves this.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127693
There could be successors that were reachable before but now are only
reachable from elsewhere in the CFG.
Suppose the following diamond CFG (lines are arrows pointing down):
A
/ \
B C
\ /
D
There's a call site in C that is inlined. Upon doing that, it turns out
it expands to:
call void @llvm.trap()
unreachable
D isn't reachable from C anymore, but we did discount it when we set up
the FunctionPropertiesUpdater, so we need to re-include it here.
The patch also updates loop accounting to use LoopInfo rather than
traversing BBs.
Differential Revision: https://reviews.llvm.org/D127353
Once ForceStop is set to true, we only return positive inlining advice when it is mandatory; there is no need for further node/edge accounting.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127245
Re-computing FunctionPropertiesInfo after each inlining may be very time
consuming in certain cases - e.g. a large caller with lots of call sites -
especially when the overall IR doesn't increase (thus never tripping a
size bloat threshold).
This patch addresses this by incrementally updating
FunctionPropertiesInfo.
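A hedged sketch of the intended usage pattern (exact signatures in the
patch may differ):

```cpp
#include "llvm/Analysis/FunctionPropertiesAnalysis.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"

void inlineAndUpdate(llvm::FunctionPropertiesInfo &FPI, llvm::CallBase &CB,
                     llvm::FunctionAnalysisManager &FAM) {
  // Capture the state around the call site before it is mutated...
  llvm::FunctionPropertiesUpdater FPU(FPI, CB);
  // ... inline CB here (e.g. via llvm::InlineFunction) ...
  // ... then re-ingest only the affected basic blocks, instead of
  // recomputing the whole function's properties.
  FPU.finish(FAM);
}
```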
Differential Revision: https://reviews.llvm.org/D125841
This allows the compiler to support more features than those supported by a
model. The only requirement (development mode only) is that the new
features must be appended at the end of the list of features requested
from the model. The support is transparent to compiler code: for
unsupported features, we provide a valid buffer to copy their values into;
it's just that this buffer is disconnected from the model, so as far as
the model is concerned (AOT or development mode), these features don't
exist. The buffers are allocated at setup - meaning, at steady state,
there is no extra allocation (maintaining the current invariant). These
buffers have two roles: first, keep the compiler code simple; second,
allow logging their values in development mode. The latter allows
retraining a model supporting the larger feature set starting from traces
produced with the old model.
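A hedged, simplified sketch of the buffer arrangement described above (all
names hypothetical):

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// For a model-supported feature, the compiler writes into the model's own
// buffer; for an unsupported one, into a detached scratch buffer allocated
// once at setup, so steady state incurs no extra allocation.
struct FeatureBuffer {
  std::unique_ptr<char[]> Scratch; // owned only for unsupported features
  void *Data = nullptr;            // model buffer, or Scratch.get()
};

FeatureBuffer makeFeatureBuffer(void *ModelBuffer, std::size_t Size) {
  FeatureBuffer FB;
  if (ModelBuffer) {
    FB.Data = ModelBuffer; // the model sees writes to this feature
  } else {
    FB.Scratch = std::make_unique<char[]>(Size);
    std::memset(FB.Scratch.get(), 0, Size);
    FB.Data = FB.Scratch.get(); // valid to write, invisible to the model
  }
  return FB;
}
```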
For release mode (AOT-ed models), this decouples compiler evolution from
model evolution, which we want in scenarios where the toolchain is
frequently rebuilt and redeployed: we can first deploy the new features
and continue working with the older model until a new model is made
available, which can then be picked up the next time the compiler is built.
Differential Revision: https://reviews.llvm.org/D124565
The TensorFlow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to a
build host where the AOT compiler can't run but clang can otherwise be
built.
To simplify such scenarios, given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case-by-case control. A builder can opt out of an AOT case by passing that
case's model path as `none`. Note that the overrides still take precedence.
This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.
The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.
Differential Revision: https://reviews.llvm.org/D117752
The global state refers to the number of nodes currently in the module and
the number of direct calls between nodes, across the module.
Node counts are not a problem; edge counts are, because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iterating over the whole module.
This patch avoids relying on analysis invalidation, because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while CGSCC passes are run
over the module - and CGSCC pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the module-wide run of the CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which is
the case here).
Differential Revision: https://reviews.llvm.org/D116964
This prepares it for the regalloc work. Part of it is making model
evaluation across 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer (see the sketch
after this list).
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.
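For the row-major convention mentioned in the first bullet, a small sketch
(names hypothetical): element (i, j) of an M x N tensor lives at flat
offset i * N + j in the exposed buffer.

```cpp
#include <cstddef>
#include <vector>

// Row-major addressing into a flat buffer for a 2-D tensor of N columns.
float elementAt(const std::vector<float> &Buffer, std::size_t N,
                std::size_t I, std::size_t J) {
  return Buffer[I * N + J]; // rows are contiguous
}
```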
Differential Revision: https://reviews.llvm.org/D115306
This avoids unnecessary re-calculation of module-wide features in the
MLInlineAdvisor. In cases where function passes don't invalidate functions
(and, thus, don't invalidate the module), but we re-process a CGSCC, we
were refreshing module features unnecessarily. The overhead of fetching
cached results (even though they weren't themselves invalidated) was
noticeable in certain modules' compilations.
We don't want to just invalidate the advisor object via the analysis
manager, though, because we'd then need to re-create expensive state (like
the model evaluator in the ML 'development' mode).
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D113644
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.
Differential Revision: https://reviews.llvm.org/D104751
This reverts commit d97f776be5f8cd3cd446fe73827cd355f6bab4e1.
The original problem was due to build failures in shared-lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.
When using 2 InlinePass instances in the same CGSCC - one for the
mandatory inlinings, the other for the heuristic-driven ones - the order
in which the ImportedFunctionStats would be emitted depended on the
destruction order of the inline passes, which is not deterministic.
This patch moves the ImportedFunctionStats responsibility to the
InlineAdvisor to address this problem.
Differential Revision: https://reviews.llvm.org/D94982
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into the
core InlineAdvisor: advisors have to handle that case anyway, so this
change also factors that out a bit better.
Differential Revision: https://reviews.llvm.org/D94825
Enable performing mandatory inlinings upfront by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the following
benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before the full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: fewer call sites, more contextualization,
and, depending on the additional function optimization passes run between
the 2 inliners, higher accuracy of cost models / decision policies.
Note that this patch does not yet enable much in terms of
post-always-inline function optimization.
Differential Revision: https://reviews.llvm.org/D91567
(This reverts commit a5e0194709c40212694370e0ea789a1ca14548b5, and
corrects author).
Rename the pass to be able to extend it to function properties other than inliner features.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D82044
Rename the pass to be able to extend it to function properties other than inliner features.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D82044
Outside of compiler-rt (where it's arguably an anti-pattern too),
LLVM tries to keep its build files as simple as possible. See e.g.
llvm/docs/SupportLibrary.rst, "Code Organization".
Differential Revision: https://reviews.llvm.org/D84243