Enable specialization on literal constant arguments by default in
Function Specialization.
---------
Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
Currently the naming scheme is a bit funky; the specializations are named
after the original function followed by an arbitrary decimal number. This
makes it hard to debug inlined specializations of recursive functions.
With this patch I am adding ".specialized." between the original
name and the suffix, which is now a single incrementing counter.
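A minimal sketch of the naming scheme described above. The helper name
`specializedName` and its signature are hypothetical and only illustrate
how the name is composed; they are not taken from the pass itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the pass's actual code): composes the name of a
// specialization as <original name> + ".specialized." + <counter>.
std::string specializedName(const std::string &Original, unsigned Counter) {
  assert(!Original.empty() && "expected a named function");
  return Original + ".specialized." + std::to_string(Counter);
}
```

With such a scheme, a specialization of an already specialized recursive
function still shows which function it was derived from, since the
original name remains a readable prefix.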
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.
This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
To do so we have to tweak the cost model such that specialization
does not trigger excessively.
Differential Revision: https://reviews.llvm.org/D150649
The `FunctionSpecialization` pass chooses specializations among the
opportunities presented by a single function and its calls,
progressively penalizing subsequent specialization attempts by
artificially increasing the cost of a specialization, depending on how
many specializations were applied before. Thus the chosen
specializations are sensitive to the order in which the functions
appear in the module and may be worse than others, had those others
been considered earlier.
This patch makes the `FunctionSpecialization` pass rank the
specializations globally, i.e. choose the "best" specializations
among all possible specializations in the module, for all
functions.
Since this involved quite a bit of redesign of the pass data
structures, this patch also carries:
* removal of duplicate specializations
* optimization of call site updates, by collecting, per
specialization, the list of call sites that can be rewritten
directly, without a prior expensive check of whether the call
constants and their positions match those of the specialized function.
A bit of a write-up about the FuncSpec data structures and
operation:
Each potential function specialisation is kept in a single vector
(`AllSpecs` in `FunctionSpecializer::run`). This vector is populated
by `FunctionSpecializer::findSpecializations`.
The `findSpecializations` member function has a local `DenseMap` to
eliminate duplicates - for each call to the current function,
`findSpecializations` builds a specialisation signature (`SpecSig`)
and looks it up in the duplicates map. If the signature is present,
the function records the call to rewrite into the existing
specialisation instance. If the signature is absent, it means we have
a new specialisation instance - the function calculates the gain and
creates a new entry in `AllSpecs`. Negative gain specialisations are
ignored at this point, unless forced.
The potential specialisations for a function form a contiguous range
in `AllSpecs` [1]. This range is recorded in `SpecMap SM`, so we
can quickly find all specialisations for a function.
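The duplicate elimination and the contiguous per-function range can be
sketched as follows. This is a simplified model, not the pass's code:
`std::map` stands in for `DenseMap`, and the `ArgPair`, `Spec` and `Call`
types, the precomputed per-call `Gain`, and the `Force` parameter are all
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Illustrative stand-ins for the pass's data structures.
using ArgPair = std::pair<unsigned, long>; // (argument position, constant)
using SpecSig = std::vector<ArgPair>;      // signature of one specialisation

struct Spec {
  SpecSig Sig;
  int Gain;
  std::vector<int> CallSites; // ids of calls that can be rewritten directly
};

struct Call {
  int Id;
  SpecSig Sig;
  int Gain; // assumed to be precomputed; the real pass calculates it
};

// Appends the specialisations found for one function to AllSpecs and
// returns the half-open index range [begin, end) they occupy.
std::pair<std::size_t, std::size_t>
findSpecializations(const std::vector<Call> &Calls, std::vector<Spec> &AllSpecs,
                    bool Force = false) {
  std::map<SpecSig, std::size_t> Seen; // signature -> index into AllSpecs
  const std::size_t Begin = AllSpecs.size();
  for (const Call &C : Calls) {
    auto It = Seen.find(C.Sig);
    if (It != Seen.end()) {
      // Known signature: only record the extra call site.
      AllSpecs[It->second].CallSites.push_back(C.Id);
      continue;
    }
    if (C.Gain < 0 && !Force)
      continue; // negative gain is ignored unless forced
    Seen.emplace(C.Sig, AllSpecs.size());
    AllSpecs.push_back({C.Sig, C.Gain, {C.Id}});
  }
  assert(Begin <= AllSpecs.size());
  return {Begin, AllSpecs.size()};
}
```

Storing the returned range per function plays the role that `SpecMap SM`
has in the description above: all candidates of a function can be found
without scanning the whole vector.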
Once we have all the potential specialisations with their gains we
need to choose the best ones, which fit in the module specialisation
budget. This is done by using a max-heap (`std::make_heap`,
`std::push_heap`, etc.) to find the best `NSpec` specialisations with a
single traversal of the `AllSpecs` vector. The heap itself is
contained in a small vector (`BestSpecs`) of indices into
`AllSpecs`, since elements of `AllSpecs` are a bit too heavy to
shuffle around.
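A sketch of selecting the best `NSpec` candidates in one traversal with
a heap of indices. The function name `pickBest` and the plain vector of
gains are assumptions for illustration. The comparator orders the heap
so that its front is the lowest gain kept so far, which is one common
way to implement this kind of selection; whether the pass does exactly
this is not stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of (up to) NSpec entries of Gains with the highest
// gain. Only indices are moved around; the gains themselves stay in place.
std::vector<std::size_t> pickBest(const std::vector<int> &Gains,
                                  std::size_t NSpec) {
  assert(NSpec > 0 && "budget must allow at least one specialisation");
  // With this comparator the heap front is the weakest kept candidate.
  auto Cmp = [&](std::size_t I, std::size_t J) { return Gains[I] > Gains[J]; };
  std::vector<std::size_t> Best;
  for (std::size_t I = 0; I < Gains.size(); ++I) {
    if (Best.size() < NSpec) {
      Best.push_back(I);
      std::push_heap(Best.begin(), Best.end(), Cmp);
      continue;
    }
    if (Gains[I] <= Gains[Best.front()])
      continue; // not better than the weakest kept candidate
    // Replace the weakest kept candidate with the new one.
    std::pop_heap(Best.begin(), Best.end(), Cmp);
    Best.back() = I;
    std::push_heap(Best.begin(), Best.end(), Cmp);
  }
  return Best;
}
```

The returned indices are in heap order, not sorted by gain, so a caller
that needs a ranking would sort them afterwards.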
Next the chosen specialisations are performed, that is, functions are
cloned, `SCCPSolver` is primed, and known call sites are updated.
Then we run the `SCCPSolver` to propagate constants in the cloned
functions, after which we walk the calls of the original functions to
update them to call the specialised functions.
---
[1] This range may contain specialisations that were discarded and is
not ordered in any way. One alternative design is to keep a vector of
indices of all specialisations for this function (which would
initially be `i`, `i+1`, `i+2`, etc.) and later sort them by gain,
pushing non-applied ones to the back. This has the potential to speed
`updateCallSites` up.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D139346
Change-Id: I708851eb38f07c42066637085b833ca91b195998
Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent)
has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with
8f1e11c5a7d70f96943a72649daa69f152d73e90.
Differential Revision: https://reviews.llvm.org/D126455
This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1.
It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963
which needs to be reverted as it broke some buildbots, so reverting both.
The aim of this patch is to minimize the compilation time overhead of
running Function Specialization. It is about 40% slower to run as a
standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according
to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO
and measured single threaded [user + system] time of IPSCCP and FuncSpec
by passing the '-time-passes' option to lld. Then I compared the two
configurations in terms of Instruction Count of the total compilation
(not of the individual passes) as in https://llvm-compile-time-tracker.com.
Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately.
You can find more info below:
https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518
Differential Revision: https://reviews.llvm.org/D126455
The current implementation of Function Specialization does not allow
specializing more than one argument per function call, which is a
limitation I am lifting with this patch.
My main challenge was to choose the most suitable ADT for storing the
specializations. We need an associative container for binding all the
actual arguments of a specialization to the function call. We also
need a consistent iteration order across executions. Lastly we want
to be able to sort the entries by Gain and reject the least profitable
ones.
MapVector almost fits the bill, but erasing elements is expensive
and using stable_sort messes up the indices into the underlying vector.
I am therefore using the underlying vector directly after calculating
the Gain.
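A rough sketch of working on the underlying vector directly: sort the
entries by Gain and drop the least profitable ones. The `SpecInfo` and
`Entry` types, the string key, and the `MaxKept` parameter are
hypothetical stand-ins; the real entries bind the actual arguments of a
specialization to a function call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins for the key/value pairs held by the container.
struct SpecInfo {
  int Gain;
};
using Entry = std::pair<std::string, SpecInfo>; // (call label, info)

// Sorts by descending Gain, keeping insertion order among equal gains,
// then rejects everything beyond the first MaxKept entries.
void keepMostProfitable(std::vector<Entry> &Specs, std::size_t MaxKept) {
  std::stable_sort(Specs.begin(), Specs.end(),
                   [](const Entry &L, const Entry &R) {
                     return L.second.Gain > R.second.Gain;
                   });
  if (Specs.size() > MaxKept)
    Specs.erase(Specs.begin() + static_cast<std::ptrdiff_t>(MaxKept),
                Specs.end());
  assert(Specs.size() <= MaxKept);
}
```

Because the order of equal-gain entries is preserved, the result is the
same on every run for the same input, which matches the requirement for
a consistent iteration order across executions.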
Differential Revision: https://reviews.llvm.org/D119880