Always require functions to be larger than MinFunctionSize when
SpecializeLiteralConstant is enabled, and increase MinFunctionSize to
500, to prevent excessive triggering of specialisations on small
functions.
When propagating a constant to a select instruction we only consider the
condition operand as the use. I am extending the logic to consider the
true and false values too, in case the condition had been found to be
constant in a previous propagation but halted.
When using the LLVM flang compiler with alias analysis (AA) enabled,
SPEC2017:548.exchange2_r was running significantly slower than wihtout
the AA.
This was caused by the GVN pass replacing many of the loads in the
pre-AA code with phi-nodes that form a long chain of dependencies, which
the function specialization was unable to follow.
This adds a function to discover phi-nodes in a transitive set, with
some limitations to avoid spending ages analysing phi-nodes.
The minimum latency savings also had to be lowered - fewer load
instructions means less saving.
Adding some more prints to help debugging the isProfitable decision.
No significant change in compile time or generated code-size.
(A previous attempt to fix this was abandoned: https://github.com/llvm/llvm-project/pull/71442)
---------
Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
Currently the naming scheme is a bit funky; the specializations are named
after the original function followed by an arbitrary decimal number. This
makes it hard to debug inlined specializations of recursive functions.
With this patch I am adding ".specialized." in between of the original
name and the suffix, which is now a single increment counter.
The code has changed significantly over time making the description
outdated. In this patch I am re-writing the description with an
emphasis to the cost model, where most of the changes have happened.
Differential Revision: https://reviews.llvm.org/D158723
* Changes the default value of FuncSpecMaxIters from 1 to 10.
This allows specialization of recursive functions.
* Adds an option to control the maximum codesize growth per function.
* Measured ~45% performance uplift for SPEC2017:548.exchange2_r on
AWS Graviton3.
Differential Revision: https://reviews.llvm.org/D145819
Currently we make an arbitrary comparison between codesize and latency
in order to decide whether to keep a specialization or not. Sometimes
the latency savings are biased in favor of loops because of imprecise
block frequencies, therefore this metric contains a lot of noise. This
patch tries to address the problem as follows:
* Reject specializations whose codesize savings are less than X% of
the original function size.
* Reject specializations whose latency savings are less than Y% of
the original function size.
* Reject specializations whose inlining bonus is less than Z% of
the original function size.
I am not saying this is super precise, but at least X, Y and Z are
configurable, allowing us to tweak the cost model. Moreover, it lets
us prioritize codesize over latency, which is a less noisy metric.
I am also increasing the minimum size a function should have to be
considered a candidate for specialization. Initially the cost of
a function was calculated as
CodeMetrics::NumInsts * InlineConstants::getInstrCost()
which later in D150464 was altered into CodeMetrics::NumInsts since
the metric is supposed to model TargetTransformInfo::TCK_CodeSize.
However, we omitted adjusting MinFunctionSize in that commit.
Differential Revision: https://reviews.llvm.org/D157123
Currently we only consider basic blocks with a unique predecessor when
estimating the size of dead code. However, we could expand to this to
consider blocks with a back-edge, or blocks preceded by dead blocks.
Differential Revision: https://reviews.llvm.org/D156903
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.
Differential Revision: https://reviews.llvm.org/D155103
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.
Differential Revision: https://reviews.llvm.org/D154852
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.
In addition to the last revision this one fixes the bug reported on
Phabricator.
Differential Revision: https://reviews.llvm.org/D154852
Reverting due to the crash reported in D154852.
Also reverting the subsequent commit as collateral damage:
"[FuncSpec] Split the specialization bonus into CodeSize and Latency."
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.
Differential Revision: https://reviews.llvm.org/D155103
Adds a TODO for checking inlinining opportunities while traversing
the users of the specialization arguments. This was brought up in
the review of D154852.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.
Differential Revision: https://reviews.llvm.org/D154852
Before looking up a value in the map of known constants we attempt
to dynamically cast it. The code looks cleaner if we move the cast
inside findConstantFor(), where the look up happens.
Differential Revision: https://reviews.llvm.org/D155177
As shown in D154820, the DataLayout-independent constant folding
interface is not good enough for handling GEPs. Instead we should
be using the DataLayout-aware constant folding interface. Since
there isn't a method to specifically handle GEPs we can use the
one which folds generic instruction operands.
Differential Revision: https://reviews.llvm.org/D154821
D150464 updated the cost model for function specialization. Unfortunately, this
also crashes when trying to build stage2 LLD with thinLTO and assertions. It looks
like the issue is caused by a mishandling of the Constant in a SwitchInst since the
Constant cannot always be assumed to safely casted to a ConstantInt. In the case
of the crash, Constant was a ConstantExpr which triggered the assertion.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D154159
After each iteration of the function specializer, constant stack values
are promoted to constant globals in order to enable recursive function
specialization. This should also be done once before running the
specializer. Enables specialization of _QMbrute_forcePdigits_2 from
SPEC2017:548.exchange2_r.
Differential Revision: https://reviews.llvm.org/D152799
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.
Differential Revision: https://reviews.llvm.org/D150464
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.
Differential Revision: https://reviews.llvm.org/D150375
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.
This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
To do so we have to tweak the cost model such that specialization
does not trigger excessively.
Differential Revision: https://reviews.llvm.org/D150649
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.
Differential Revision: https://reviews.llvm.org/D150464
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.
Differential Revision: https://reviews.llvm.org/D150375
There are a few inaccuracies with how FuncSpec handles global
variables.
When specialisation on non-const global variables is disabled (the
default) the pass could nevertheless perform some specializations,
e.g. on a constant GEP expression, or on a SSA variable, for which the
Solver has determined it has the value of a global variable.
When specialisation on non-const global variables is enabled, the pass
would skip non-scalars, e.g. a global array, but this should be
completely inconsequential, a pointer is a pointer.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D149476
Change-Id: Ic73051b2f8602587306760bf2ec552e5860f8d39
To track the return values of specializations, we need to invalidate all
the lattice values across the use-def chain which originates from the
callsites, recompute and propagate.
Differential Revision: https://reviews.llvm.org/D146158
Allow a function to be specialised even if it has its address taken or
it's global. For such functions, consider all of the arguments as
overdefined. Don't delete the functions even if all the apparent calls
were redirected to specialised instances.
Reviewed By: labrinea, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D148345
* Remove redundant variable `NbFunctionsSpecialized` as it is no longer
used by the cost model.
* Rename statistic `NumFuncSpecialized` to `NumSpecsCreated` as a better
description (the old name confusingly implied number of functions we have
created clones for).
* Same for variable `SpecializedFuncs`. Renamed to `Specializations`.
* Move debug message in the destructor (avoids repetition when MaxIters > 1).
Differential Revision: https://reviews.llvm.org/D145375
If the only user of the Alloca argument provided to getPromotableAlloca()
is the same as the Call argument, StoreValue is never set and results
in an assertion failure that isa<> was used on a nullptr when passed into
getCandidateConstant().
This was originally seen when trying to build SPEC 2006 416.gamess using
flang with lto enabled.
Differential Revision: https://reviews.llvm.org/D143457
The `FunctionSpecialization` pass chooses specializations among the
opportunities presented by a single function and its calls,
progressively penalizing subsequent specialization attempts by
artificially increasing the cost of a specialization, depending on how
many specialization were applied before. Thus the chosen
specializations are sensitive to the order the functions appear in the
module and may be worse than others, had those others been considered
earlier.
This patch makes the `FunctionSpecialization` pass rank the
specializations globally, i.e. choose the "best" specializations
among the all possible specializations in the module, for all
functions.
Since this involved quite a bit of redesign of the pass data
structures, this patch also carries:
* removal of duplicate specializations
* optimization of call sites update, by collecting per
specialization the list of call sites that can be directly
rewritten, without prior expensive check if the call constants and
their positions match those of the specialized function.
A bit of a write-up up about the FuncSpec data structures and
operation:
Each potential function specialisation is kept in a single vector
(`AllSpecs` in `FunctionSpecializer::run`). This vector is populated
by `FunctionSpecializer::findSpecializations`.
The `findSpecializations` member function has a local `DenseMap` to
eliminate duplicates - with each call to the current function,
`findSpecializations` builds a specialisation signature (`SpecSig`)
and looks it in the duplicates map. If the signature is present, the
function records the call to rewrite into the existing specialisation
instance. If the signature is absent, it means we have a new
specialisation instance - the function calculates the gain and creates
a new entry in `AllSpecs`. Negative gain specialisation are ignored at
this point, unless forced.
The potential specialisations for a function form a contiguous range
in the `AllSpecs` [1]. This range is recorded in `SpecMap SM`, so we
can quickly find all specialisations for a function.
Once we have all the potential specialisations with their gains we
need to choose the best ones, which fit in the module specialisation
budget. This is done by using a max-heap (`std::make_heap`,
`std::push_heap`, etc) to find the best `NSpec` specialisations with a
single traversal of the `AllSpecs` vector. The heap itself is
contained with a small vector (`BestSpecs`) of indices into
`AllSpecs`, since elements of `AllSpecs` are a bit too heavy to
shuffle around.
Next the chosen specialisation are performed, that is, functions
cloned, `SCCPSolver` primed, and known call sites updated.
Then we run the `SCCPSolver` to propagate constants in the cloned
functions, after which we walk the calls of the original functions to
update them to call the specialised functions.
---
[1] This range may contain specialisation that were discarded and is
not ordered in any way. One alternative design is to keep a vector
indices of all specialisations for this function (which would
initially be, `i`, `i+1`, `i+2`, etc) and later sort them by gain,
pushing non-applied ones to the back. This has the potential to speed
`updateCallSites` up.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D139346
Change-Id: I708851eb38f07c42066637085b833ca91b195998
Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent)
has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with
8f1e11c5a7d70f96943a72649daa69f152d73e90.
Differential Revision: https://reviews.llvm.org/D126455
Reland 42c2dc401742266da3e0251b6c1ca491f4779963 which was reverted
in cb03b1bd99313a728d47060b909a73e7f5991231. The fix for the link
errors was to reintroduce one of the two occurences of 'Scalar'
under the LINK_COMPONENTS.
Differential Revision: https://reviews.llvm.org/D138654
This reverts commit 42c2dc401742266da3e0251b6c1ca491f4779963.
This broke some buildbots:
undefined reference to `llvm::createBitTrackingDCEPass()'
undefined reference to `llvm::createAlignmentFromAssumptionsPass()'
undefined reference to `llvm::createLoopUnrollPass(int, bool, bool, int, int, int, int, int, int)'
undefined reference to `llvm::createLICMPass(unsigned int, unsigned int, bool)'
undefined reference to `llvm::createWarnMissedTransformationsPass()'
undefined reference to `llvm::createAlignmentFromAssumptionsPass()'
undefined reference to `llvm::createCallSiteSplittingPass()'
undefined reference to `llvm::createCFGSimplificationPass(llvm::SimplifyCFGOptions, std::function<bool (llvm::Function const&)>)'
undefined reference to `llvm::createFloat2IntPass()'
undefined reference to `llvm::createLowerConstantIntrinsicsPass()'
undefined reference to `llvm::createLoopRotatePass(int, bool)'
undefined reference to `llvm::createLoopDistributePass()'
undefined reference to `llvm::createLoopSinkPass()'
undefined reference to `llvm::createInstSimplifyLegacyPass()'
undefined reference to `llvm::createDivRemPairsPass()'
undefined reference to `llvm::createCFGSimplificationPass(llvm::SimplifyCFGOptions, std::function<bool (llvm::Function const&)>)'
undefined reference to `llvm::SetLicmMssaOptCap'
undefined reference to `llvm::SetLicmMssaNoAccForPromotionCap'
undefined reference to `llvm::ForgetSCEVInLoopUnroll'
This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1.
It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963
which needs to be reverted as it broke some buildbots, so reverting both.
The aim of this patch is to minimize the compilation time overhead of
running Function Specialization. It is about 40% slower to run as a
standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according
to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO
and measured single threaded [user + system] time of IPSCCP and FuncSpec
by passing the '-time-passes' option to lld. Then I compared the two
configurations in terms of Instruction Count of the total compilation
(not of the individual passes) as in https://llvm-compile-time-tracker.com.
Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately.
You can find more info below:
https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518
Differential Revision: https://reviews.llvm.org/D126455
The LLVMipo library no longer depends on the Scalar component.
The shared functions between IPSCCP and SCCP have been moved
under Utils, in the SCCPSolver.
This is preliminary work for D126455, in order to break a cyclic
dependency between LLVM libraries.
Differential Revision: https://reviews.llvm.org/D138654
Deleting a fully specialised function left dangling pointers in
`FunctionAnalysisManager`, which causes an internal compiler error
when the function's storage was reused.
Fixes bug #58759.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D138909
Change-Id: Ifed378c748af35e8fe7dcbdddb0f41b8777cbe87