Currently BlockAddresses store both the Function and the BasicBlock they
reference, and the BlockAddress is part of the use list of both the
Function and BasicBlock.
This is quite awkward, because this is not really a use of the function
itself (and walks of function uses generally skip block addresses for
that reason). This also has weird implications on function RAUW (as that
will replace the function in block addresses in a way that generally
doesn't make sense), and causes other peculiar issues, like the ability
to have multiple block addresses for one block (with different
functions).
Instead, I believe it makes more sense to specify only the basic block
and let the function be implied by the BB parent. This does mean that we
may have block addresses without a function (if the BB is not inserted),
but this should only happen during IR construction.
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
This patch changes the type of VisitedMap to DenseSet from DenseMap
because the value side of the map is always "true".
Technically:
if (VisitedMap[*SI])
inserts "false" as a value, but the value is immediately overridden
with:
VisitedMap[*SI] = true;
While we are at it, this patch removes the repeated hash lookups
around the "if" statement.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
In the cost model of partial inlining, cost for intrinsics will be applied. However, some intrinsics for vector have invalid cost, which is not allowed for partial inlining. Instead of assertion, we directly do not do partial inlining in this circumstance to avoid compiling errors.
This is a stacked PR following on from #68415
This patch has two purposes:
(1) It tries to make inlining more likely when it can avoid a
streaming-mode change.
(2) It avoids inlining when inlining causes more streaming-mode changes.
An example of (1) is:
```
void streaming_compatible_bar(void);
void foo(void) __arm_streaming {
/* other code */
streaming_compatible_bar();
/* other code */
}
void f(void) {
foo(); // expensive streaming mode change
}
->
void f(void) {
/* other code */
streaming_compatible_bar();
/* other code */
}
```
where it wouldn't have inlined the function when foo would be a
non-streaming function.
An example of (2) is:
```
void streaming_bar(void) __arm_streaming;
void foo(void) __arm_streaming {
streaming_bar();
streaming_bar();
}
void f(void) {
foo(); // expensive streaming mode change
}
-> (do not inline into)
void f(void) {
streaming_bar(); // these are now two expensive streaming mode changes
streaming_bar();
}```
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
Partial Inlining identifies basic blocks that can be outlined into a
function. It is possible that an unreachable basic block is marked for
outlining. During costing of the outlined region, such unreachable basic
blocks are included as well. However, the CodeExtractor eliminates such
unreachable basic blocks and emits outlined function without them.
Thus, during costing of the outlined function, it is possible that the
cost of the outlined function comes out to be lesser than the cost of
outlined region, which triggers an assert.
Assertion `OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined
function cost should be no less than the outlined region"' failed.
This patch adds code to eliminate unreachable blocks from the function
body before passing it on to be inlined. It also adds a test that checks
for behaviour of costing in case of unreachable basic blocks.
Discussion: https://discourse.llvm.org/t/incorrect-costing-in-partialinliner-if-ir-has-unreachable-basic-blocks/70163
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D149130
The legacy pass is only used in AMDGPU codegen, which doesn't care about running it in call graph order (it actually has to work around that fact).
Make the legacy pass a module pass and share code with the new pass.
This allows us to remove the legacy inliner infrastructure.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D146446
It seems unnecessarily limiting to disallow recursive partial
inlining, and there are clearly cases where it can benefit
code by avoiding a function call and potentially enabling
other transformations like dead argument elimination
in cases where an argument is only used prior to the early-out
test at the top of the function.
The pass already properly rewrites the recursive calls
within the body of the freshly cloned function, so the only
change here is removing the bail-out when recursion is
detected.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D136383
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.
This patch reorders the parameters so that callers do not have to
specify as many default parameters.
Differential Revision: https://reviews.llvm.org/D134125
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
Fixing ICE when partial inline tries to deal with blockaddress uses of function which is typical for asm-goto/callbr. We ran into this with PGO multi-region partial inline.
Differential Revision: https://reviews.llvm.org/D117509
This reverts commit fd4808887ee47f3ec8a030e9211169ef4fb094c3.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.
Differential Revision: https://reviews.llvm.org/D113839
This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidate for partial inlining.
Reviewed By: fhann
Differential Revision: https://reviews.llvm.org/D89911
Make member function const where possible, use LLVM_DEBUG to print debug traces
rather than a custom option, pass by reference to avoid null checking, ...
Reviewed By: fhann
Differential Revision: https://reviews.llvm.org/D89895
https://bugs.llvm.org/show_bug.cgi?id=45932
assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") getting triggered in computeBBInlineCost.
Intrinsics like "assume" are considered regular function calls while computing costs.
This patch enables computeBBInlineCost to queries TTI for intrinsic call cost.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87132
This reverts commit 454de99a6fec705e76ed7743bf538f7a77296f59.
The problem was that one of the ctor arguments of CallAnalyzer was left
to be const std::function<>&. A function_ref was passed for it, and then
the ctor stored the value in a function_ref field. So a std::function<>
would be created as a temporary, and not survive past the ctor
invocation, while the field would.
Tested locally by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Original Differential Revision: https://reviews.llvm.org/D79917