* Remove redundant variable `NbFunctionsSpecialized` as it is no longer
used by the cost model.
* Rename statistic `NumFuncSpecialized` to `NumSpecsCreated` as a better
description (the old name confusingly implied number of functions we have
created clones for).
* Same for variable `SpecializedFuncs`. Renamed to `Specializations`.
* Move debug message in the destructor (avoids repetition when MaxIters > 1).
Differential Revision: https://reviews.llvm.org/D145375
If the only user of the Alloca argument provided to getPromotableAlloca()
is the same as the Call argument, StoreValue is never set and results
in an assertion failure that isa<> was used on a nullptr when passed into
getCandidateConstant().
This was originally seen when trying to build SPEC 2006 416.gamess using
flang with lto enabled.
Differential Revision: https://reviews.llvm.org/D143457
The `FunctionSpecialization` pass chooses specializations among the
opportunities presented by a single function and its calls,
progressively penalizing subsequent specialization attempts by
artificially increasing the cost of a specialization, depending on how
many specialization were applied before. Thus the chosen
specializations are sensitive to the order the functions appear in the
module and may be worse than others, had those others been considered
earlier.
This patch makes the `FunctionSpecialization` pass rank the
specializations globally, i.e. choose the "best" specializations
among the all possible specializations in the module, for all
functions.
Since this involved quite a bit of redesign of the pass data
structures, this patch also carries:
* removal of duplicate specializations
* optimization of call sites update, by collecting per
specialization the list of call sites that can be directly
rewritten, without prior expensive check if the call constants and
their positions match those of the specialized function.
A bit of a write-up up about the FuncSpec data structures and
operation:
Each potential function specialisation is kept in a single vector
(`AllSpecs` in `FunctionSpecializer::run`). This vector is populated
by `FunctionSpecializer::findSpecializations`.
The `findSpecializations` member function has a local `DenseMap` to
eliminate duplicates - with each call to the current function,
`findSpecializations` builds a specialisation signature (`SpecSig`)
and looks it in the duplicates map. If the signature is present, the
function records the call to rewrite into the existing specialisation
instance. If the signature is absent, it means we have a new
specialisation instance - the function calculates the gain and creates
a new entry in `AllSpecs`. Negative gain specialisation are ignored at
this point, unless forced.
The potential specialisations for a function form a contiguous range
in the `AllSpecs` [1]. This range is recorded in `SpecMap SM`, so we
can quickly find all specialisations for a function.
Once we have all the potential specialisations with their gains we
need to choose the best ones, which fit in the module specialisation
budget. This is done by using a max-heap (`std::make_heap`,
`std::push_heap`, etc) to find the best `NSpec` specialisations with a
single traversal of the `AllSpecs` vector. The heap itself is
contained with a small vector (`BestSpecs`) of indices into
`AllSpecs`, since elements of `AllSpecs` are a bit too heavy to
shuffle around.
Next the chosen specialisation are performed, that is, functions
cloned, `SCCPSolver` primed, and known call sites updated.
Then we run the `SCCPSolver` to propagate constants in the cloned
functions, after which we walk the calls of the original functions to
update them to call the specialised functions.
---
[1] This range may contain specialisation that were discarded and is
not ordered in any way. One alternative design is to keep a vector
indices of all specialisations for this function (which would
initially be, `i`, `i+1`, `i+2`, etc) and later sort them by gain,
pushing non-applied ones to the back. This has the potential to speed
`updateCallSites` up.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D139346
Change-Id: I708851eb38f07c42066637085b833ca91b195998
Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent)
has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with
8f1e11c5a7d70f96943a72649daa69f152d73e90.
Differential Revision: https://reviews.llvm.org/D126455
Reland 42c2dc401742266da3e0251b6c1ca491f4779963 which was reverted
in cb03b1bd99313a728d47060b909a73e7f5991231. The fix for the link
errors was to reintroduce one of the two occurences of 'Scalar'
under the LINK_COMPONENTS.
Differential Revision: https://reviews.llvm.org/D138654
This reverts commit 42c2dc401742266da3e0251b6c1ca491f4779963.
This broke some buildbots:
undefined reference to `llvm::createBitTrackingDCEPass()'
undefined reference to `llvm::createAlignmentFromAssumptionsPass()'
undefined reference to `llvm::createLoopUnrollPass(int, bool, bool, int, int, int, int, int, int)'
undefined reference to `llvm::createLICMPass(unsigned int, unsigned int, bool)'
undefined reference to `llvm::createWarnMissedTransformationsPass()'
undefined reference to `llvm::createAlignmentFromAssumptionsPass()'
undefined reference to `llvm::createCallSiteSplittingPass()'
undefined reference to `llvm::createCFGSimplificationPass(llvm::SimplifyCFGOptions, std::function<bool (llvm::Function const&)>)'
undefined reference to `llvm::createFloat2IntPass()'
undefined reference to `llvm::createLowerConstantIntrinsicsPass()'
undefined reference to `llvm::createLoopRotatePass(int, bool)'
undefined reference to `llvm::createLoopDistributePass()'
undefined reference to `llvm::createLoopSinkPass()'
undefined reference to `llvm::createInstSimplifyLegacyPass()'
undefined reference to `llvm::createDivRemPairsPass()'
undefined reference to `llvm::createCFGSimplificationPass(llvm::SimplifyCFGOptions, std::function<bool (llvm::Function const&)>)'
undefined reference to `llvm::SetLicmMssaOptCap'
undefined reference to `llvm::SetLicmMssaNoAccForPromotionCap'
undefined reference to `llvm::ForgetSCEVInLoopUnroll'
This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1.
It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963
which needs to be reverted as it broke some buildbots, so reverting both.
The aim of this patch is to minimize the compilation time overhead of
running Function Specialization. It is about 40% slower to run as a
standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according
to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO
and measured single threaded [user + system] time of IPSCCP and FuncSpec
by passing the '-time-passes' option to lld. Then I compared the two
configurations in terms of Instruction Count of the total compilation
(not of the individual passes) as in https://llvm-compile-time-tracker.com.
Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately.
You can find more info below:
https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518
Differential Revision: https://reviews.llvm.org/D126455
The LLVMipo library no longer depends on the Scalar component.
The shared functions between IPSCCP and SCCP have been moved
under Utils, in the SCCPSolver.
This is preliminary work for D126455, in order to break a cyclic
dependency between LLVM libraries.
Differential Revision: https://reviews.llvm.org/D138654
Deleting a fully specialised function left dangling pointers in
`FunctionAnalysisManager`, which causes an internal compiler error
when the function's storage was reused.
Fixes bug #58759.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D138909
Change-Id: Ifed378c748af35e8fe7dcbdddb0f41b8777cbe87
The `FunctionSpecialization` pass needs loop analysis results for its
cost function. For this purpose, it computes the `DominatorTree` and
`LoopInfo` for a function in `getSpecializationBonus`. This function,
however, is called O(number of call sites x number of arguments), but
the DominatorTree/LoopInfo can be computed just once.
This patch plugs into the PassManager infrastructure to obtain
LoopInfo for a function and removes ad-hoc computation from
`getSpecializatioBonus`.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136332
[recommitting after recommitting a dependency]
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` )
only once per argument, i.e. doing N-args checks instead of
N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls
lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
Change-Id: I7d21081a2479cbdb62deac15f903d6da4f3b8529
When calculating the specialization bonus for a given function argument,
we recursively traverse the chain of (certain) users, accumulating the
instruction costs. Then we exponentially increase the bonus to account
for loop nests. This is problematic for two reasons: (a) the users might
not themselves be inside the loop nest, (b) if they are we are accounting
for it multiple times. Instead we should be adjusting the bonus before
traversing the user chain.
This reduces the instruction count for CTMark (newPM-O3) when Function
Specialization is enabled without actually reducing the amount of
specializations performed (geomean: -0.001% non-LTO, -0.406% LTO).
Differential Revision: https://reviews.llvm.org/D136692
[fixed test to work with reverse iteration]
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
Is also changes the criteria for picking up an actual argument to
specialise if the argument is:
* a LLVM IR constant
* has `constant` lattice value
has `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
Change-Id: Iea273423176082ec51339aa66a5fe9fea83557ee
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
the call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
Is also changes the criteria for picking up an actual argument to
specialise if the argument is:
* a LLVM IR constant
* has `constant` lattice value
has `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Small functions with size under a given threshold are not
considered for specialisaion on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135862
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
This patch improves the fix in D110529 to prevent from crashing on value
with byval attribute that is not added in SCCP solver.
Authored-by: sinan.lin@linux.alibaba.com
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D126355
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.
On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.
I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.
Differential Revision: https://reviews.llvm.org/D127131
This isn't expected to reduce compilation times as 'max-iters' is set to
one by default, but it helps with recursive functions that require higher
iteration counts.
Differential Revision: https://reviews.llvm.org/D122819
This fixes a TODO in constantArgPropagation() to make it feature complete.
However, I do find myself in agreement with the review comments in
https://reviews.llvm.org/D106426. I don't think we should pursue
specializing such recursive functions as the code size increase becomes
linear to 'max-iters'. Compiling the modified test just with -O3 (no
function specialization) generates the same code.
Differential Revision: https://reviews.llvm.org/D122755
The current implementation of Function Specialization does not allow
specializing more than one arguments per function call, which is a
limitation I am lifting with this patch.
My main challenge was to choose the most suitable ADT for storing the
specializations. We need an associative container for binding all the
actual arguments of a specialization to the function call. We also
need a consistent iteration order across executions. Lastly we want
to be able to sort the entries by Gain and reject the least profitable
ones.
MapVector fits the bill but not quite; erasing elements is expensive
and using stable_sort messes up the indices to the underlying vector.
I am therefore using the underlying vector directly after calculating
the Gain.
Differential Revision: https://reviews.llvm.org/D119880
We will check a bit later that the constant is in fact a function,
so the separate check for a function pointer type is largely
redunant. Also simplify the cast stripping with
stripPointerCasts().
`ArgInfo` is reduced to only contain a pair of {formal,actual} values.
The specialized function `Fn` and the `Partial` flag are redundant in
this structure. The `Gain` is moved to a new struct `SpecializationInfo`.
The value mappings created by cloneCandidateFunction() are being used
by rewriteCallSites() for matching the formal arguments of recursive
functions.
The list of specializations is passed by reference to calculateGains()
instead of being returned by value.
The `IsPartial` flag is removed from isArgumentInteresting() and
getPossibleConstants() as it's no longer used anywhere in the code.
Differential Revision: https://reviews.llvm.org/D120753
A function is basically dead when:
* it has no uses
* it has only self-referencing uses (it's recursive)
Differential Revision: https://reviews.llvm.org/D119878
We only need to do propagation on use instructions of the original
value, rather than the replacing const value which might have lots
of irrelavant uses. This is done by caching uses before replacing.
Differential Revision: https://reviews.llvm.org/D119815
This is a fix for a use-after-free found by the address sanitizer when
compiling GCC: https://github.com/llvm/llvm-project/issues/52821
The Function Specialization pass may remove instructions, cached
inside the PredicateBase class, which are later being dereferenced
from the SCCPInstVisitor class. To prevent the dangling references
I am lazily deleting the dead instructions after the Solver has run.
Differential Revision: https://reviews.llvm.org/D118591
Rename option MaxConstantsThreshold to MaxClonesThreshold. Not only is this
more descriptive, this is also in preparation of introducing another threshold
to analyse more than just 1 constant argument as we currently do, and to better
distinguish these options/thresholds.
This is a follow up of D115458 and truncates the worklist of actual arguments
that can be specialised to 'MaxConstantsThreshold' candidates if
MaxConstantsThreshold was exceeded. Thus, this changes the behaviour of option
-func-specialization-max-constants. Before it didn't specialise at all when
this threshold was exceeded, but now it specialises up to MaxConstantsThreshold
candidates from the sorted worklist.
Differential Revision: https://reviews.llvm.org/D115509
This reverts commit 20b03d65364d963585bf16f175b367f3842f223a.
This shows some failed tests on a bot with expensive checks enabled that I need
to look at.
This mostly is the same code that is refactored to decouple the cost and
benefit analysis. The biggest change is top-level function specializeFunctions
that now drives the transformation more like this:
specializeFunctions() {
Cost = getSpecializationCost(F);
calculateGains(F, Cost);
specializeFunction(F);
}
while this is just a restructuring, it helps the functional change in
calculateGains. I.e., we now sort the candidates based on the expected
specialisation gain, which we didn't do before. For this, a book keeping struct
ArgInfo was introduced. If we have a list of N candidates, but we only want
specialise less than N as set by option -func-specialization-max-constants, we
sort the list and discard the candidates that give the least benefit.
Given a formal argument, this change results in selecting the best actual
argument(s). This is NFC'ish in that this shouldn't change the current output
(hence no test change here), but in follow ups starting with D115509, it
should and I want to go one step further and compare all functions and all
arguments, which will mostly build on top of this refactoring and change.
Differential Revision: https://reviews.llvm.org/D115458