This is a stacked PR following on from #68415
This patch has two purposes:
(1) It tries to make inlining more likely when it can avoid a
streaming-mode change.
(2) It avoids inlining when inlining causes more streaming-mode changes.
An example of (1) is:
```
void streaming_compatible_bar(void);
void foo(void) __arm_streaming {
/* other code */
streaming_compatible_bar();
/* other code */
}
void f(void) {
foo(); // expensive streaming mode change
}
->
void f(void) {
/* other code */
streaming_compatible_bar();
/* other code */
}
```
where it wouldn't have inlined the function when foo would be a
non-streaming function.
An example of (2) is:
```
void streaming_bar(void) __arm_streaming;
void foo(void) __arm_streaming {
streaming_bar();
streaming_bar();
}
void f(void) {
foo(); // expensive streaming mode change
}
-> (do not inline into)
void f(void) {
streaming_bar(); // these are now two expensive streaming mode changes
streaming_bar();
}```
- The motivation is to expose tunable knobs to control the aggressiveness of inlines for different backend (e.g., machines with different icache size, and workload with different icache/itlb PMU counters). Tuning inline aggressiveness shows a small (~+0.3%) but stable improvement on workload/hardware that is more frontend bound.
- Both multipliers could be overridden from command line.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D153154
Multiplying raw block frequency with an integer carries a high risk
of overflow.
- Add `BlockFrequency::mul` return an std::optional with the product
or `nullopt` to indicate an overflow.
- Fix two instances where overflow was likely.
On AMDGPU, alloca instructions have penalty that can
be avoided when SROA is applied after inlining.
This patch introduces the default implementation of
TargetTransformInfo::getCallerAllocaCost.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D149740
When we inline a callee into a caller, the compiler needs to make sure
that the caller supports a superset of instruction sets that the
callee is allowed to use. Normally, we check for the compatibility of
target features via functionsHaveCompatibleAttributes, but that
happens after we decide to honor call site attribute
Attribute::AlwaysInline. If the caller contains a call marked with
Attribute::AlwaysInline, which can happen with
__attribute__((flatten)) placed on the caller, the caller could end up
with code that cannot be lowered to assembly code.
This patch fixes the problem by checking the target feature
compatibility before we honor Attribute::AlwaysInline.
Fixes https://github.com/llvm/llvm-project/issues/62664
Differential Revision: https://reviews.llvm.org/D150396
!make.implicit metadata attached to branch means it will very likely
be eliminated (together with associated cmp instruction).
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D149747
This aligns the inlining macros more closely with how the regalloc
macros are defined.
- Explicitly specify the dtype/shape
- Remove separate names for python/C++
- Add docstring for inline cost features
Differential Revision: https://reviews.llvm.org/D149384
Instead use ConstantFoldSelectInstruction(), which will return
nullptr if it cannot be folded and a constant expression would
be produced instead.
In preparation for removing select constant expressions.
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This commit fixes LLVMAnalysis and its dependencies.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
InlineCostCallAnalyzer encourages inlining of the last call to the
static function by subtracting LastCallToStaticBonus from Cost.
This patch introduces getStaticBonusApplied to make available the
amount of LastCallToStaticBonus applied.
The intent is to allow the module inliner to determine whether
inlining a given call site is expected to reduce the caller size with
an expression like:
IC.getCost() + IC.getStaticBonusApplied() < 0
This patch does not add a use of getStaticBonus yet.
Differential Revision: https://reviews.llvm.org/D134373
We check to see if a given CallBase is a sole call to a local function
at multiple places in InlineCost.cpp. This patch factors out the
common code.
Differential Revision: https://reviews.llvm.org/D134114
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
The value of the attribute is a size in bytes. It has the effect of
suppressing inlining of functions whose stacksizes exceed the given value.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D129904
For recursive callers, we want to be conservative when inlining callees with large stack size. We currently have a limit `InlineConstants::TotalAllocaSizeRecursiveCaller`, but that is hard coded.
We found the current limit insufficient to suppress problematic inlining that bloats stack size for deep recursion. This change adds a switch to make the limit tunable as a mitigation.
Differential Revision: https://reviews.llvm.org/D129411
Use a common ConstantFoldInstOperands-based constant folding
implementation, instead of specifying the folding function for
each function individually. Going through the generic handling
doesn't appear to have any significant compile-time impact.
As the test change shows, this is not NFC, because we now use
DataLayout-aware constant folding, which can do slightly better
in some cases (e.g. those involving GEPs).
This removes the extractvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
extractvalue is already not supported in bitcode, so we do not need
to worry about bitcode auto-upgrade.
Uses of ConstantExpr::getExtractValue() should be replaced with
IRBuilder::CreateExtractValue() (if the fact that the result is
constant is not important) or ConstantFoldExtractValueInstruction()
(if it is). Though for this particular case, it is also possible
and usually preferable to use getAggregateElement() instead.
The C API function LLVMConstExtractValue() is removed, as the
underlying constant expression no longer exists. Instead,
LLVMBuildExtractValue() should be used (which will constant fold
or create an instruction). Depending on the use-case,
LLVMGetAggregateElement() may also be used instead.
Differential Revision: https://reviews.llvm.org/D125795
The hidden option max-inline-stacksize=<N> prevents the inlining of functions
with a stack size larger than N.
Reviewed By: mtrofin, aeubanks
Differential Review: https://reviews.llvm.org/D127988
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889