79 Commits

Author SHA1 Message Date
S. VenkataKeerthy
dcf8ec9218
Reland "[MLGO][IR2Vec] Integrating IR2Vec with MLInliner (#143479)" (#145664)
Relanding #143479 after fixes. 

Removed `NumberOfFeatures` from the `FeatureIndex` enum as the number of features used depends on whether IR2Vec embeddings are used.
2025-06-30 22:37:39 +02:00
S. VenkataKeerthy
d37325ea95
Revert "[MLGO][IR2Vec] Integrating IR2Vec with MLInliner (#143479)" (#145418)
This reverts commit af2c06ecd610735dfa5d236c6d5f109e4f2334e6 as it
causes failure of lit test (Transforms/Inline/ML/interactive-mode.ll)
2025-06-24 00:48:16 +02:00
S. VenkataKeerthy
af2c06ecd6
[MLGO][IR2Vec] Integrating IR2Vec with MLInliner (#143479)
Changes to use Symbolic embeddings in MLInliner. 

(Fixes #141836, Tracking issue - #141817)
2025-06-23 14:07:45 -07:00
vporpo
08dda4dcbf
[Analysis][EphemeralValuesCache][InlineCost] Ephemeral values caching for the CallAnalyzer (#130210)
This patch does two things:

1. It implements an ephemeral values cache analysis pass that collects the ephemeral values of a function and caches them for fast lookups. The collection of the ephemeral values is done lazily when the user calls `EphemeralValuesCache::ephValues()`.

2. It adds caching of ephemeral values using the `EphemeralValuesCache` to speed up `CallAnalyzer::analyze()`. Without caching, this can take a long time to run in cases where the function contains a large number of `@llvm.assume()` calls and a large number of callsites. The time is spent in `collectEphemeralValues()`.
2025-03-19 18:18:45 -07:00
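The lazy collection described in the entry above can be illustrated with a minimal, self-contained sketch. This is not LLVM's `EphemeralValuesCache`: the class name `LazyEphemeralCache`, the use of plain `int` values, and the injected collection callback are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// Hypothetical stand-in for an ephemeral values cache: values are plain ints
// and the expensive collection step is an injected callback.
class LazyEphemeralCache {
  std::function<std::set<int>()> Collect; // expensive collection routine
  std::set<int> Values;                   // cached result
  bool Collected = false;                 // has Collect run since clear()?

public:
  explicit LazyEphemeralCache(std::function<std::set<int>()> C)
      : Collect(std::move(C)) {
    assert(this->Collect && "a collection callback is required");
  }

  // Runs the collection on first use only; later calls reuse the cached set.
  const std::set<int> &ephValues() {
    if (!Collected) {
      Values = Collect();
      Collected = true;
    }
    return Values;
  }

  // Drops the cached set, e.g. after the function body has changed.
  void clear() {
    Values.clear();
    Collected = false;
  }
};
```

Repeated lookups pay for the collection once; only an explicit `clear()` triggers a second collection.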
Jonas Hahnfeld
44ff94e99e
[Diagnostics] Return rvalue reference from temporary argument (#127400)
This fixes compilation issues with GCC and C++23:
```
error: cannot bind non-const lvalue reference of type
'llvm::OptimizationRemarkMissed&' to an rvalue of type
'llvm::OptimizationRemarkMissed'
```

Closes #105778
2025-03-13 08:07:07 +07:00
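The kind of fix named in the entry above can be sketched as a pair of overloads: one for named objects and one that keeps a temporary an rvalue on the way out. The `Remark` type and `makeRemark` helper are hypothetical stand-ins, not LLVM's diagnostic classes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a remark type such as OptimizationRemarkMissed.
struct Remark {
  std::string Text;
};

// Lvalue overload: append and hand back the same named object.
inline Remark &operator<<(Remark &R, const std::string &S) {
  R.Text += S;
  return R;
}

// Rvalue overload: a temporary argument is returned as an rvalue reference,
// so the result never has to bind to a non-const lvalue reference.
inline Remark &&operator<<(Remark &&R, const std::string &S) {
  R.Text += S;
  return std::move(R);
}

// Builds a remark from a temporary in a single expression.
inline Remark makeRemark(const std::string &Callee, const std::string &Caller) {
  assert(!Callee.empty() && !Caller.empty());
  return Remark{} << Callee << " not inlined into " << Caller;
}
```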
Michele Scandale
ab4253f6df
[Analysis] Remove global state from PluginInline{Advisor,Order}Analysis. (#114615)
The plugin analysis for `InlineAdvisor` and `InlineOrder` currently
relies on shared global state to track whether the analysis is
available.
This causes issues when pipelines using plugins and pipelines not using
plugins are run in the same process.
The shared global state can be easily replaced by checking in the given
instance of `ModuleAnalysisManager` if the plugin analysis has been
registered.
2024-11-18 10:24:09 -08:00
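As a rough sketch of the idea in the entry above, availability can be answered by the manager instance itself rather than by a process-wide flag. `ToyAnalysisManager`, `hasPluginAdvisor`, and the string key are hypothetical and do not reflect the `ModuleAnalysisManager` API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical miniature of an analysis manager: it only records which
// analyses were registered on this particular instance.
class ToyAnalysisManager {
  std::set<std::string> Registered;

public:
  void registerAnalysis(const std::string &Name) {
    assert(!Name.empty() && "analysis name must not be empty");
    Registered.insert(Name);
  }

  bool isRegistered(const std::string &Name) const {
    return Registered.count(Name) != 0;
  }
};

// Per-instance query: the answer depends only on the manager passed in, so
// two pipelines running in one process cannot influence each other.
inline bool hasPluginAdvisor(const ToyAnalysisManager &MAM) {
  return MAM.isRegistered("plugin-inline-advisor");
}
```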
Shilei Tian
e34e27f198
[TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus via getInliningLastCallToStaticBonus (#111311)
Currently we will not be able to inline a large function even if it only
has one live use because the inline cost is still very high after
applying `LastCallToStaticBonus`, which is a constant. This could
significantly impact the performance because CSR spill is very
expensive.

This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to
allow targets to customize this value.

Fixes SWDEV-471398.
2024-10-11 10:19:54 -04:00
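The hook described in the entry above can be pictured as a virtual function with a constant default that a target may override. The types below are hypothetical miniatures, and both bonus values are illustrative assumptions rather than values taken from any real target.

```cpp
#include <cassert>

// Hypothetical miniature of a target-information interface.
struct ToyTargetInfo {
  virtual ~ToyTargetInfo() = default;
  // Default bonus applied to the last call to a static function.
  // The value is illustrative.
  virtual int getInliningLastCallToStaticBonus() const { return 15000; }
};

// A made-up target on which calls are expensive, so it raises the bonus.
struct ToyExpensiveCallTarget : ToyTargetInfo {
  int getInliningLastCallToStaticBonus() const override { return 165000; }
};

// Simplified decision: the bonus is subtracted from the cost before the
// comparison against the threshold.
inline bool shouldInlineLastCall(const ToyTargetInfo &TTI, int Cost,
                                 int Threshold) {
  assert(Threshold >= 0 && "threshold is expected to be non-negative");
  return Cost - TTI.getInliningLastCallToStaticBonus() < Threshold;
}
```

With these made-up numbers, a callee of cost 20000 is rejected under the default bonus but accepted on the target that overrides it.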
Nikita Popov
4169338e75
[IR] Don't include Module.h in Analysis.h (NFC) (#97023)
Replace it with a forward declaration instead. Analysis.h is pulled in
by all passes, but not all passes need to access the module.
2024-06-28 14:30:47 +02:00
Mohammed Keyvanzadeh
7b57a1b401
[llvm] format and terminate namespaces with closing comment (#94917)
Namespaces are terminated with a closing comment in the majority of the
codebase so do the same here for consistency. Also format code within
some namespaces to make clang-format happy.
2024-06-21 23:50:53 +03:30
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Mircea Trofin
ab2e7666c2 [mlgo][inl] Interactive mode: optionally tell the default decision
This helps training algorithms that may want to sometimes replicate the
default decision. The default decision is presented as an extra feature
called `inlining_default`. It's not normally exported to save
computation time.

This is only available in interactive mode.

Differential Revision: https://reviews.llvm.org/D147794
2023-04-10 12:20:09 -07:00
Mircea Trofin
5fd51fcba6 Reland "[mlgo] Hook up the interactive runner to the mlgo-ed passes"
This reverts commit a772f0bb920a4957fb94dd8dbe45943809fd0ec3.

The main problem was related to how we handled `dbgs()` from the hosted
compiler. Using explicit `subprocess.communicate`, and not relying on
dbgs() being flushed until the end appears to address the problem.

Also some fixes due to some bots running older pythons, so we can't have
nice things like `int | float` and such.
2023-02-03 17:54:42 -08:00
Mircea Trofin
a772f0bb92 Revert "[mlgo] Hook up the interactive runner to the mlgo-ed passes"
This reverts commit a7354899d1a235a796b3a2ccb45f6596983c8672.

The way stdout/stderr get routed seems to work differently locally and
on the bots. Investigating.
2023-02-03 16:34:31 -08:00
Mircea Trofin
a7354899d1 [mlgo] Hook up the interactive runner to the mlgo-ed passes
This hooks up the interactive model runner to the passes that support
ml-based decisions. Because the interface to this runner is the exact
same as the one used during inference, we just reuse the exact same
setup we have for "release mode". This makes "release mode" a misnomer -
and that's something we needed to resolve sooner or later (e.g.
supporting more than one embedded model for the same problem was another
reason to drop that nomenclature). That will happen in a subsequent
change.

To use this evaluator, just enable the pass in (currently) "release"
mode, but also pass the base name for the 2 channel files via the
pass-specific flag.

The 2 files are the responsibility of the hosting process. The added
tests use a minimal, toy such host, illustrating setup and
communication.

Differential Revision: https://reviews.llvm.org/D143218
2023-02-03 16:22:57 -08:00
ibricchi
07af0e2d3e Reapply "[InlineAdvisor] Allow loading advisors as plugins"
This reverts commit 8d22a63e2c8b4931113ca9d1ee8b17f7ff453e81.

Fix was missing dependency.
2022-12-17 10:35:14 -08:00
Mircea Trofin
8d22a63e2c Revert "[InlineAdvisor] Allow loading advisors as plugins"
This reverts commit a00aaf2b1317fbc224dc6606ef7c2a10d617f28f.

Example failures:
    https://lab.llvm.org/buildbot#builders/68/builds/44933
    https://lab.llvm.org/buildbot#builders/230/builds/6938
2022-12-16 16:10:22 -08:00
ibricchi
a00aaf2b13 [InlineAdvisor] Allow loading advisors as plugins
Adds the ability to load InlineAdvisors as plugins. This allows developing and distributing inlining heuristics out of tree.

The PluginInlineAdvisorAnalysis class serves as the entry point for dynamic advisors. Plugins must register instances of this class to provide their own InlineAdvisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D139644
2022-12-16 16:00:37 -08:00
Fangrui Song
d4b6fcb32e [Analysis] llvm::Optional => std::optional 2022-12-14 07:32:24 +00:00
Kazu Hirata
edc83a15b4 [mlgo] Use LLVM_HAVE_TFLITE instead of LLVM_HAVE_TF_API in C++ code (NFC)
We use LLVM_HAVE_TFLITE as the key to enable the mlgo work these days,
and LLVM_HAVE_TF_API is defined whenever LLVM_HAVE_TFLITE is defined.

I'm posting this patch because it's purely mechanical.

I'll post a follow-up patch to remove LLVM_HAVE_TF_API in non-C++
files, and that will not be as mechanical as this one.

Differential Revision: https://reviews.llvm.org/D139863
2022-12-12 11:28:40 -08:00
Kazu Hirata
9f252e5567 [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:31:17 -08:00
Kazu Hirata
19aff0f37d [Analysis] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 19:43:04 -08:00
Fangrui Song
fa71c16455 [Inliner] Move cl::opt inside llvm:: 2022-11-24 20:31:13 -08:00
Aiden Grossman
65abca4611 [MLGO] Fix InlineAdvisor and ModelUnderTrainingRunner after hasValue removal
Recently, in 4b6b248, llvm::Optional's hasValue method was removed as
described in
https://discourse.llvm.org/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor
This breaks InlineAdvisor and ModelUnderTrainingRunner. This patch fixes
them by changing the method to has_value, which hasValue was evaluating
to before.

Differential Revision: https://reviews.llvm.org/D138635
2022-11-24 03:48:34 +00:00
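The `llvm::Optional` to `std::optional` migration that this and neighbouring entries refer to maps the old spellings onto standard ones. The sketch below uses only the standard library; the helper names and the threshold value are hypothetical.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: returns a threshold override if one was configured.
inline std::optional<int> thresholdOverride(bool Configured) {
  if (!Configured)
    return std::nullopt; // formerly `None`
  return 225;
}

// Explicit form: has_value() replaces hasValue(), operator* replaces getValue().
inline int effectiveThreshold(const std::optional<int> &Override, int Default) {
  assert(Default >= 0 && "default threshold is expected to be non-negative");
  if (Override.has_value())
    return *Override;
  return Default;
}

// Compact form: value_or() replaces getValueOr().
inline int effectiveThresholdShort(const std::optional<int> &Override,
                                   int Default) {
  return Override.value_or(Default);
}
```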
Kazu Hirata
30d3f56e33 [Analysis] clang-format InlineAdvisor.cpp (NFC) 2022-07-13 13:38:50 -07:00
Mingming Liu
e0d069598b [Inline] Annotate inline pass name with link phase information for analysis.
The annotation is flag gated; flag is turned off by default.

Differential Revision: https://reviews.llvm.org/D125495
2022-06-24 10:06:43 -07:00
Mingming Liu
bc856eb3fc [SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information.
Differential Revision: https://reviews.llvm.org/D126833
2022-06-22 17:00:53 -07:00
Kazu Hirata
7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata
e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Jin Xin Ng
9f2b873a7d [inliner] Add per-SCC-pass InlineAdvisor printing option
Adds option to print the contents of the Inline Advisor after each SCC Inliner pass

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127689
2022-06-14 08:06:52 -07:00
Mingming Liu
8601f269f1 [Inline][Remark][NFC] Optionally provide inline context to inline advisor.

This patch has no functional change; it is merely a preparatory patch for
the main functional change. The motivating use case is to annotate the
inline remark pass name with context information (e.g. prelink or postlink,
CGSCC or always-inliner); see D125495 for more details.

Differential Revision: https://reviews.llvm.org/D126824
2022-06-02 13:14:30 -07:00
Kazu Hirata
9aa52ba574 [Analysis] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC) 2022-03-20 18:21:40 -07:00
serge-sans-paille
71c3a5519d Cleanup includes: LLVMAnalysis
Number of lines output by preprocessor:
before: 1065940348
after:  1065307662

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120659
2022-03-01 18:01:54 +01:00
Mircea Trofin
f29256a64a [MLGO] Improved support for AOT cross-targeting scenarios
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.

To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.

This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.

The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.

Differential Revision: https://reviews.llvm.org/D117752
2022-01-20 07:05:39 -08:00
Mircea Trofin
3e8553aab4 [mlgo][inline] Improve global state tracking
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.

Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.

This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D115847
2022-01-18 17:45:34 +00:00
Mircea Trofin
248d55af3e [NFC][MLGO] Use LazyCallGraph::Node to track functions.
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the Module-wide performance of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)

Differential Revision: https://reviews.llvm.org/D116964
2022-01-11 19:23:47 -08:00
Nikita Popov
a8c2ba105d [Inline] Disable deferred inlining
After the switch to the new pass manager, we have observed multiple
instances of catastrophic inlining, where the inliner produces huge
functions with many hundreds of thousands of instructions from small
input IR. We were forced to back out the switch to the new pass
manager for this reason. This patch fixes at least one of the root
cause issues.

LLVM uses a bottom-up inliner, and the fact that functions are processed
bottom-up is not just a question of optimality -- it is an important
requirement to prevent runaway inlining. The premise of the current
inlining approach and cost model is that after all calls inside a function
have been inlined, it may get large enough that inlining it into its
callers is no longer considered profitable. This safeguard does not
exist if inlining doesn't happen bottom-up, as inlining the callees,
and their callees, and their callees etc. will always seem individually
profitable, and the inliner can easily flatten the whole call tree.

There are instances where we necessarily have to deviate from bottom-up
inlining: When inlining in an SCC there is no natural "bottom", so
inlining effectively happens top-down. This requires special care,
and the inliner avoids exponential blowup by ensuring that functions
in the SCC grow in a balanced way and will eventually hit the threshold.

However, there is one instance where the inlining advisor explicitly
violates the bottom-up principle: Deferred inlining tries to "defer"
inlining a call if it determines that inlining the caller into all
its call-sites would be more profitable. Something very important to
understand about deferred inlining is that it doesn't make one inlining
choice in place of another -- it effectively chooses to do both. If we
have a call chain A -> B -> C and cost modelling tells us that inlining
B -> C is profitable, but we defer this and instead inline A -> B first,
then we'll now have a call A -> C, and the cost model will (a few special
cases notwithstanding) still tell us that this is profitable. So the end
result is that we inlined *both* B and C, even though under the usual
cost model function B would have been too large to further inline after
C has been integrated into it.

Because deferred inlining violates the bottom-up invariant of the inliner,
it can result in exponential inlining. The exponential-deferred-inlining.ll
test case illustrates this on a simple example (see
https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a
much more catastrophic case with about 5000x size blowup). If the call
chain A -> B -> C is not a chain but a tree of calls, then we end up
deferring inlining across the tree and end up flattening everything into
the root node.

This patch proposes to address this by disabling deferred inlining
entirely (currently still behind an option). Beyond the issue of
exponential inlining, I don't think that the whole concept makes sense,
at least as long as deferred inlining still ends up inlining both call
edges.

I believe the motivation for having deferred inlining in the first place
is that you might have a small wrapper function with local linkage that
could be eliminated if inlined. This would automatically happen if there
was a single caller, due to the large "last call to local" bonus. However,
this bonus is not extended if there are multiple callers, even if we
would eventually end up inlining into all of them (if the bonus were
extended).

Now, unlike the normal inlining cost model, the deferred inlining cost
model does look at all callers, and will extend the "last call to local"
bonus if it determines that we could inline all of them as long as we
defer the current inlining decision. This makes very little sense.
The "last call to local" bonus doesn't really cost model anything.
It's basically an "infinite" bonus that ensures we always inline the
last call to a local. The fact that it's not literally infinite just
prevents inlining of huge functions, which can easily result in
scalability issues. I very much doubt that it was an intentional
cost-modelling choice to say that getting rid of a small local function
is worth adding 15000 instructions elsewhere, yet this is exactly how
this value is getting used here.

The main alternative I see to complete removal is to change deferred
inlining to an actual either/or decision. That is, to mark deferred
calls as noinline so we're actually trading off one inlining decision
against another, and not just adding a side-channel to the cost model
to do both.

Apart from fixing the catastrophic inlining case, the effect on rustc
is a modest compile-time improvement on average (up to 8% for a
parsing-type crate, where tree-like calls are expected) and pretty
neutral where run-time performance is concerned (mix of small wins
and losses, usually in the sub-1% category).

Differential Revision: https://reviews.llvm.org/D115497
2021-12-16 09:59:50 +01:00
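The size argument in the entry above can be made concrete with a toy model. It assumes a complete binary call tree in which every function has the same body size, a callee is inlined only if its current size is at most a threshold, and a call that is not inlined costs one instruction. These are simplifications for illustration, not the inliner's real cost model.

```cpp
#include <cassert>

// Bottom-up: a callee is judged after its own callees were integrated, so once
// it has grown past the threshold its callers stop inlining it.
inline long bottomUpSize(int Depth, long Body, long Threshold) {
  assert(Depth >= 0 && Body > 0 && Threshold >= 0);
  if (Depth == 0)
    return Body;
  long Child = bottomUpSize(Depth - 1, Body, Threshold);
  long Size = Body;
  for (int I = 0; I < 2; ++I)
    Size += (Child <= Threshold) ? Child : 1; // 1 = the remaining call
  return Size;
}

// Deferred (effectively top-down): each callee is judged while it is still
// small, so every step looks profitable and the whole tree lands in the root.
// This assumes Body alone never exceeds the threshold.
inline long topDownSize(int Depth, long Body) {
  assert(Depth >= 0 && Body > 0);
  if (Depth == 0)
    return Body;
  return Body + 2 * topDownSize(Depth - 1, Body);
}
```

In this model a bottom-up result never exceeds `Body + 2 * Threshold`, while the top-down result grows as `Body * (2^(Depth + 1) - 1)`.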
Nikita Popov
7abf299fed [InlineAdvisor] Add option to control deferred inlining (NFC)
This change is split out from D115497 to add the option
independently from the switch of the default value.
2021-12-14 15:46:11 +01:00
Nikita Popov
3beafecedf [InlineAdvisor] Remove outdated comment (NFC)
This just returns None nowadays, so this comment doesn't apply
anymore.
2021-12-09 15:11:56 +01:00
modimo
5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these, together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope`, allow tweaking how replay is applied. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be Function scope, AlwaysInline fallback, and Line format with negative remarks, which more closely matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
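The fallback options listed in the entry above amount to a small decision rule for call sites that have no matching remark. The sketch below assumes a hypothetical replay table keyed by a call-site string; the enumerator names follow the commit message, while `ReplayTable` and `decide` are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class ReplayFallback { Original, AlwaysInline, NeverInline };

// Hypothetical replay table: call-site key -> recorded decision.
// true models an "inlined into" remark, false a "will not be inlined into" one.
using ReplayTable = std::map<std::string, bool>;

inline bool decide(const ReplayTable &Replay, const std::string &Site,
                   ReplayFallback Fallback, bool OriginalAdvice) {
  auto It = Replay.find(Site);
  if (It != Replay.end())
    return It->second; // a positive or negative remark covers this site

  // The site is not present in the replay: apply the configured fallback.
  switch (Fallback) {
  case ReplayFallback::Original:
    return OriginalAdvice; // defer to the original advisor
  case ReplayFallback::AlwaysInline:
    return true;
  case ReplayFallback::NeverInline:
    return false;
  }
  assert(false && "unhandled fallback mode");
  return false;
}
```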
modimo
313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from DWARF information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Mircea Trofin
7d541eb4d4 [inliner] Mandatory inlining decisions produce remarks
This also removes the need to disable the mandatory inlining phase in
tests.

In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.

Differential Revision: https://reviews.llvm.org/D110891
2021-10-05 14:01:25 -07:00
Fangrui Song
0bb767e7db [InlineAdvisor] Use one single quote 2021-09-23 12:16:15 -07:00
Arthur Eubanks
3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Fangrui Song
76093b1739 [InlineAdvisor] Add single quotes around caller/callee names
Clang diagnostics refer to identifier names in quotes.
This patch makes inline remarks conform to the convention.
New behavior:

```
% clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c
a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline]
int bar(int a) { return foo(a); }
                        ^
```

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D107791
2021-08-10 11:51:31 -07:00
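The quoting convention shown in the entry above can be reproduced with a few lines of string formatting. This is only a sketch of the message shape; `quoted` and `inlineRemark` are hypothetical helpers, and the location and `[-Rpass=inline]` suffix are left out.

```cpp
#include <cassert>
#include <string>

// Wraps an identifier in single quotes, as Clang diagnostics do.
inline std::string quoted(const std::string &Name) { return "'" + Name + "'"; }

// Formats the core of a positive inline remark.
inline std::string inlineRemark(const std::string &Callee,
                                const std::string &Caller, int Cost,
                                int Threshold) {
  assert(!Callee.empty() && !Caller.empty());
  return quoted(Callee) + " inlined into " + quoted(Caller) + " with (cost=" +
         std::to_string(Cost) + ", threshold=" + std::to_string(Threshold) +
         ")";
}
```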
Mircea Trofin
935dea2cb2 [MLGO] fix silly LLVM_DEBUG misuse 2021-07-27 15:10:28 -07:00
Mircea Trofin
eb76ca573d [NFC][MLGO] Debug messages for what inline advisor is selected
We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.
2021-07-27 15:05:39 -07:00
serge-sans-paille
1ce2b58454 [NFC] Use llvm::raw_string_ostream instead of std::stringstream
That's more efficient and we don't lose any valuable feature when doing so.
2021-03-12 18:43:59 +01:00
modimo
ce7f9cdb50 [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks
This change builds on the work done in D83743, which added replay to the SampleProfile inliner, so that replay can also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only.

The added switch `-cgscc-inline-replay=<remarks file>` replays the inlining decisions in that file, where the remarks file is generated via `-Rpass=inline`. The aim is to make it easier to analyze a change that would otherwise alter inlining decisions by separating it from that inlining behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline, rather than digging through the large churn caused by inlining.

In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places.

Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay.

Testing:
ninja check-llvm
newly added test correctly replays CGSCC inlining decisions

Reviewed By: mtrofin, wenlei

Differential Revision: https://reviews.llvm.org/D94334
2021-01-25 15:38:57 -08:00
Mircea Trofin
ccec2cf1d9 Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor"
This reverts commit d97f776be5f8cd3cd446fe73827cd355f6bab4e1.

The original problem was due to build failures in shared lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking
this.
2021-01-20 13:33:43 -08:00