The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
Currently we make an arbitrary comparison between codesize and latency
in order to decide whether to keep a specialization or not. Sometimes
the latency savings are biased in favor of loops because of imprecise
block frequencies, therefore this metric contains a lot of noise. This
patch tries to address the problem as follows:
* Reject specializations whose codesize savings are less than X% of
the original function size.
* Reject specializations whose latency savings are less than Y% of
the original function size.
* Reject specializations whose inlining bonus is less than Z% of
the original function size.
I am not saying this is super precise, but at least X, Y and Z are
configurable, allowing us to tweak the cost model. Moreover, it lets
us prioritize codesize over latency, which is a less noisy metric.
I am also increasing the minimum size a function should have to be
considered a candidate for specialization. Initially the cost of
a function was calculated as
CodeMetrics::NumInsts * InlineConstants::getInstrCost()
which later in D150464 was altered into CodeMetrics::NumInsts since
the metric is supposed to model TargetTransformInfo::TCK_CodeSize.
However, we omitted adjusting MinFunctionSize in that commit.
Differential Revision: https://reviews.llvm.org/D157123
Currently we only consider basic blocks with a unique predecessor when
estimating the size of dead code. However, we could expand to this to
consider blocks with a back-edge, or blocks preceded by dead blocks.
Differential Revision: https://reviews.llvm.org/D156903
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.
Differential Revision: https://reviews.llvm.org/D155103
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.
Differential Revision: https://reviews.llvm.org/D154852
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.
In addition to the last revision this one fixes the bug reported on
Phabricator.
Differential Revision: https://reviews.llvm.org/D154852
Those are added by the SCCP Solver before invoking the Specializer.
They need to be removed otherwise the destructor of PredicateInfo
complains.
Differential Revision: https://reviews.llvm.org/D156365
Reverting due to the crash reported in D154852.
Also reverting the subsequent commit as collateral damage:
"[FuncSpec] Split the specialization bonus into CodeSize and Latency."
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.
Differential Revision: https://reviews.llvm.org/D155103
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.
Differential Revision: https://reviews.llvm.org/D154852
As shown in D154820, the DataLayout-independent constant folding
interface is not good enough for handling GEPs. Instead we should
be using the DataLayout-aware constant folding interface. Since
there isn't a method to specifically handle GEPs we can use the
one which folds generic instruction operands.
Differential Revision: https://reviews.llvm.org/D154821
The InstCostVisitor is currently using the DataLayout-independent constant
folding interface. This is a workaround since we can't directly call
ConstantExpr::getGetElementPtr due to deprecation. This patch shows that
the constant folding interface we are using is not good enough.
Differential Revision: https://reviews.llvm.org/D154820
The specialization bonus is zero in some unittests because the basic blocks
containing the users of the constant arguments are executed less frequently
than the entry block. Sinking them into loops solves that.
Differential Revision: https://reviews.llvm.org/D153230
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.
Differential Revision: https://reviews.llvm.org/D150464
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.
This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
To do so we have to tweak the cost model such that specialization
does not trigger excessively.
Differential Revision: https://reviews.llvm.org/D150649
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.
Differential Revision: https://reviews.llvm.org/D150464