121 Commits

Author SHA1 Message Date
Nikita Popov
35bad229c1
[PredicateInfo] Use bitcast instead of ssa.copy (#151174)
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
    
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
2025-08-11 09:25:01 +02:00
Slava Zakharin
bec9ac2daf
[llvm] Lower latency bonus threshold in function specialization. (#143954)
Related to #143219.

Function specialization does not kick in if flang sets `noalias`
attributes on the function arguments of `digits_2`, because PRE
optimizes several `srem` instructions and other memory accesses
from the inner loops causing the latency bonus to be lower than
the current 40% threshold.

While looking at this, I did not really get why we compute the latency
bonus as a ratio of the latency of the "eliminated" instructions
and the code-size of the whole function. It did not make much sense
to me.

I tried computing the total latency as a sum of latencies
of the instructions that belong to non-dead code (including
the instructions that would be executed had they not been
"eliminated" due to the constant propagation). This total
latency should identify the total cost of executing the function
with the given argument being dynamically equal to the tried
constant value. Then the latency bonus would be computed
as the ratio between the latency of the "eliminated" instructions
and the total latency. Unfortunately, this did not given me a good
heuristics either. The bonus was close to 0% on some targets,
and as big as 3-5% on other targets. This does match very well
with the performance gain achieved by function specialization
for exchange2, so it seemd like another artificial heuristic
not better than the current one.

It seems that GCC uses a set of different heuristics for function
specialization, but I am not an expert here and I cannot say
if we can match them in LLVM.

With all that said, I decided to try to lower the threshold
to avoid the regression and be able to re-enable the generally
good change for `noalias` attribute.

With this patch, I was able to reduce the effect of `noalias`,
so that `-force-no-alias=true` is only ~10% slower than
`-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than
`-force-no-alias=false` regardless of this patch.

This threshold has been changed before also due to improved
alias information:
2fb51fba8c (diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd)

Please let me know what testing I should run to make sure this change
is safe. As I understand, it may affect the compilation time
performance,
and I will appreciate it if someone points out which benchmarks
need to be checked before merging this.
2025-06-17 16:13:42 -07:00
David Green
98b6f8dc69
[CostModel] Remove optional from InstructionCost::getValue() (#135596)
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.
2025-04-23 07:46:27 +01:00
David Green
52bffdf9f5
[IPSCCP][FuncSpec] Protect against metadata access from call args. (#124284)
Fixes an issue reported from #114964, where metadata arguments were
attempted to be accessed as constants.
2025-01-25 10:59:50 +00:00
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) 2025-01-08 11:07:23 +01:00
Hari Limaye
88e9b373c0
[FuncSpec] Query SCCPSolver in more places (#114964)
When traversing the use-def chain of an Argument in a candidate
specialization, also query the SCCPSolver to see if a Value is constant.
This allows us to better estimate the codesize savings of a candidate in
the presence of instructions that are a user of the argument we are
estimating savings for which also use arguments that have been found
constant by IPSCCP.

Similarly when estimating the dead basic blocks from branch and switch
instructions which become constant, also query the SCCPSolver to see if
a predecessor is unreachable.
2024-11-06 13:25:15 +00:00
Hari Limaye
5f30b1aae0
[FuncSpec] Improve handling of BinaryOperator instructions (#114534)
When visiting BinaryOperator instructions during estimation of codesize
savings for a candidate specialization, don't bail when the other
operand is not found to be constant. This allows us to find more
constants than we otherwise would, for example `and(false, x)`.
2024-11-04 11:39:53 +00:00
Hari Limaye
5ed3f46359
[FuncSpec] Improve handling of Comparison Instructions (#114073)
When visiting comparison instructions during computation of a
specializations's bonus, make use of information from the lattice value
of the other operand in the case where we have not found this to have a
specific constant value.
2024-11-04 11:39:00 +00:00
Hari Limaye
daa9af179f
[FuncSpec] Handle ssa_copy intrinsic calls in InstCostVisitor (#114247)
Look through ssa_copy intrinsic calls when computing codesize bonus for
a specialization.

Also remove redundant logic to skip computing codesize bonus for
ssa_copy intrinsics, now these are considered zero-cost by TTI (in PR
#75294).
2024-11-04 10:09:50 +00:00
Hari Limaye
e19a5fc6d3
[FuncSpec] Improve accounting of specialization codesize growth (#113448)
Only accumulate the codesize increase of functions that are actually
specialized, rather than for every candidate specialization that we
analyse.

This fixes a subtle bug where prior analysis of candidate
specializations that were deemed unprofitable could prevent subsequent
profitable candidates from being recognised.
2024-10-29 11:53:12 +00:00
Hari Limaye
06664fdc76
[FuncSpec] Enable SpecializeLiteralConstant by default (#113442)
Enable specialization on literal constant arguments by default in
Function Specialization.

---------

Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
2024-10-29 11:41:25 +00:00
Ellis Hoag
6ab26eab4f
Check hasOptSize() in shouldOptimizeForSize() (#112626) 2024-10-28 09:45:03 -07:00
Hari Limaye
c6931c2525
[FuncSpec] Only compute Latency bonus when necessary (#113159)
Only compute the Latency component of a specialisation's Bonus when
necessary, to avoid unnecessarily computing the Block Frequency
Information for a Function.
2024-10-23 09:05:44 +01:00
Hari Limaye
0d1a91e8f9
[FuncSpec] Update MinFunctionSize logic (#112711)
Always require functions to be larger than MinFunctionSize when
SpecializeLiteralConstant is enabled, and increase MinFunctionSize to
500, to prevent excessive triggering of specialisations on small
functions.
2024-10-18 09:21:19 +01:00
Alexandros Lamprineas
6472cb1e21
[FuncSpec] Improve estimation of select instruction. (#111176)
When propagating a constant to a select instruction we only consider the
condition operand as the use. I am extending the logic to consider the
true and false values too, in case the condition had been found to be
constant in a previous propagation but halted.
2024-10-09 10:25:20 +01:00
Yingwei Zheng
f364b2ee22
[LLVM] Don't peek through bitcast on pointers and gep with zero indices. NFC. (#102889)
Since we are using opaque pointers now, we don't need to peek through
bitcast on pointers and gep with zero indices.
2024-08-13 22:38:50 +08:00
Hans Wennborg
575e68e571 FunctionSpecialization: Make the ordering of BestSpecs stricter
otherwise it's not guaranteed which of two candidates with the same
score would get specialized first, or at all.
2024-06-12 14:02:18 +02:00
Mats Petersson
2fb51fba8c
[FuncSpec] Update function specialization to handle phi-chains (#72903)
When using the LLVM flang compiler with alias analysis (AA) enabled,
SPEC2017:548.exchange2_r was running significantly slower than wihtout
the AA.

This was caused by the GVN pass replacing many of the loads in the
pre-AA code with phi-nodes that form a long chain of dependencies, which
the function specialization was unable to follow.

This adds a function to discover phi-nodes in a transitive set, with
some limitations to avoid spending ages analysing phi-nodes.

The minimum latency savings also had to be lowered - fewer load
instructions means less saving.

Adding some more prints to help debugging the isProfitable decision.

No significant change in compile time or generated code-size.

(A previous attempt to fix this was abandoned: https://github.com/llvm/llvm-project/pull/71442)

---------

Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
2023-11-22 10:41:01 +00:00
Nikita Popov
c4c0ac10f1 [IPO] Remove unnecessary bitcasts (NFC) 2023-11-06 16:49:45 +01:00
Matthias Braun
5181156b37
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.

- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
2023-10-05 11:40:17 -07:00
Alexandros Lamprineas
e15d72adac
[FuncSpec] Adjust the names of specializations and promoted stack values
Currently the naming scheme is a bit funky; the specializations are named
after the original function followed by an arbitrary decimal number. This
makes it hard to debug inlined specializations of recursive functions.
With this patch I am adding ".specialized." in between of the original
name and the suffix, which is now a single increment counter.
2023-09-19 14:40:31 +01:00
Alexandros Lamprineas
7afea8a8c3 [NFC][FuncSpec] Update the description of function specialization.
The code has changed significantly over time making the description
outdated. In this patch I am re-writing the description with an
emphasis to the cost model, where most of the changes have happened.

Differential Revision: https://reviews.llvm.org/D158723
2023-08-26 11:16:51 +01:00
Alexandros Lamprineas
386aa2ab9d [FuncSpec] Increase the maximum number of times the specializer can run.
* Changes the default value of FuncSpecMaxIters from 1 to 10.
  This allows specialization of recursive functions.
* Adds an option to control the maximum codesize growth per function.
* Measured ~45% performance uplift for SPEC2017:548.exchange2_r on
  AWS Graviton3.

Differential Revision: https://reviews.llvm.org/D145819
2023-08-22 09:36:12 +01:00
Kazu Hirata
3c03a15104 [llvm] Use DenseMap::lookup (NFC) 2023-08-10 18:44:16 -07:00
Alexandros Lamprineas
d1b376fd7b [FuncSpec] Rework the discardment logic for unprofitable specializations.
Currently we make an arbitrary comparison between codesize and latency
in order to decide whether to keep a specialization or not. Sometimes
the latency savings are biased in favor of loops because of imprecise
block frequencies, therefore this metric contains a lot of noise. This
patch tries to address the problem as follows:

* Reject specializations whose codesize savings are less than X% of
  the original function size.
* Reject specializations whose latency savings are less than Y% of
  the original function size.
* Reject specializations whose inlining bonus is less than Z% of
  the original function size.

I am not saying this is super precise, but at least X, Y and Z are
configurable, allowing us to tweak the cost model. Moreover, it lets
us prioritize codesize over latency, which is a less noisy metric.

I am also increasing the minimum size a function should have to be
considered a candidate for specialization. Initially the cost of
a function was calculated as

  CodeMetrics::NumInsts * InlineConstants::getInstrCost()

which later in D150464 was altered into CodeMetrics::NumInsts since
the metric is supposed to model TargetTransformInfo::TCK_CodeSize.
However, we omitted adjusting MinFunctionSize in that commit.

Differential Revision: https://reviews.llvm.org/D157123
2023-08-09 10:28:46 +01:00
Alexandros Lamprineas
c2d19002ae [FuncSpec] Estimate dead blocks more accurately.
Currently we only consider basic blocks with a unique predecessor when
estimating the size of dead code. However, we could expand to this to
consider blocks with a back-edge, or blocks preceded by dead blocks.

Differential Revision: https://reviews.llvm.org/D156903
2023-08-07 11:04:23 +01:00
Alexandros Lamprineas
5bfefff1c4 Reland [FuncSpec] Split the specialization bonus into CodeSize and Latency.
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.

Differential Revision: https://reviews.llvm.org/D155103
2023-08-02 12:41:13 +01:00
Alexandros Lamprineas
893d3a61c0 Reland [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-31 08:25:48 +01:00
Douglas Yung
32683b231e Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
This reverts commit 96ff464dd3aac255adc52787a1e28487a9cd4c35.

The test in this change was failing on many buildbots:

https://lab.llvm.org/buildbot/#/builders/164/builds/41292
https://lab.llvm.org/buildbot/#/builders/258/builds/4491
https://lab.llvm.org/buildbot/#/builders/192/builds/3566
https://lab.llvm.org/buildbot/#/builders/123/builds/20411
https://lab.llvm.org/buildbot/#/builders/58/builds/42553
https://lab.llvm.org/buildbot/#/builders/247/builds/7037
https://lab.llvm.org/buildbot/#/builders/139/builds/46259
https://lab.llvm.org/buildbot/#/builders/216/builds/24650
https://lab.llvm.org/buildbot/#/builders/234/builds/12571
https://lab.llvm.org/buildbot/#/builders/232/builds/12574
https://lab.llvm.org/buildbot/#/builders/235/builds/975
2023-07-27 13:47:52 -07:00
Alexandros Lamprineas
96ff464dd3 [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

In addition to the last revision this one fixes the bug reported on
Phabricator.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-27 19:24:11 +01:00
Alexandros Lamprineas
c52ab9ea2f Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
Reverting due to the crash reported in D154852.

Also reverting the subsequent commit as collateral damage:

"[FuncSpec] Split the specialization bonus into CodeSize and Latency."
2023-07-26 12:33:41 +01:00
Alexandros Lamprineas
20c8f58c11 [FuncSpec] Split the specialization bonus into CodeSize and Latency.
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.

Differential Revision: https://reviews.llvm.org/D155103
2023-07-26 12:03:46 +01:00
Alexandros Lamprineas
59a5c582dc [FuncSpec][NFC] Leave a comment for future improvements.
Adds a TODO for checking inlinining opportunities while traversing
the users of the specialization arguments. This was brought up in
the review of D154852.
2023-07-25 11:58:42 +01:00
Alexandros Lamprineas
03f1d09fe4 [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-25 11:00:20 +01:00
Alexandros Lamprineas
bb6d60bf9d [FuncSpec][NFC] Sink cast into function.
Before looking up a value in the map of known constants we attempt
to dynamically cast it. The code looks cleaner if we move the cast
inside findConstantFor(), where the look up happens.

Differential Revision: https://reviews.llvm.org/D155177
2023-07-14 14:00:23 +01:00
Alexandros Lamprineas
1d0476cb4d [FuncSpec] Prefer DataLayout-aware constant folding of GEPs.
As shown in D154820, the DataLayout-independent constant folding
interface is not good enough for handling GEPs. Instead we should
be using the DataLayout-aware constant folding interface. Since
there isn't a method to specifically handle GEPs we can use the
one which folds generic instruction operands.

Differential Revision: https://reviews.llvm.org/D154821
2023-07-11 13:24:26 +01:00
Vincent Lee
2fc0d17e8b [FuncSpec] Avoid crashing when SwitchInst doesn't see ConstantInt
D150464 updated the cost model for function specialization. Unfortunately, this
also crashes when trying to build stage2 LLD with thinLTO and assertions. It looks
like the issue is caused by a mishandling of the Constant in a SwitchInst since the
Constant cannot always be assumed to safely casted to a ConstantInt. In the case
of the crash, Constant was a ConstantExpr which triggered the assertion.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D154159
2023-06-30 21:02:50 -07:00
Alexandros Lamprineas
ce9d3f09b4 [FuncSpec] Promote stack values before specialization.
After each iteration of the function specializer, constant stack values
are promoted to constant globals in order to enable recursive function
specialization. This should also be done once before running the
specializer. Enables specialization of _QMbrute_forcePdigits_2 from
SPEC2017:548.exchange2_r.

Differential Revision: https://reviews.llvm.org/D152799
2023-06-19 14:32:41 +01:00
Alexandros Lamprineas
5400257ded [FuncSpec] Add Freeze and CallBase to the InstCostVisitor.
Allows constant folding of such instructions when estimating user bonus.

Differential Revision: https://reviews.llvm.org/D153036
2023-06-19 10:53:08 +01:00
Alexandros Lamprineas
4d13896d8a Reland "[FuncSpec] Improve the accuracy of the cost model"
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.

Differential Revision: https://reviews.llvm.org/D150464
2023-06-08 17:44:48 +01:00
Alexandros Lamprineas
475ddca56e Reland "[FuncSpec] Replace LoopInfo with BlockFrequencyInfo"
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.

Differential Revision: https://reviews.llvm.org/D150375
2023-06-08 17:44:47 +01:00
Nikita Popov
96a14f388b Revert "[FuncSpec] Replace LoopInfo with BlockFrequencyInfo"
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.

This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
2023-05-30 14:49:03 +02:00
Alexandros Lamprineas
0524534d52 [FuncSpec] Enable specialization of literal constants.
To do so we have to tweak the cost model such that specialization
does not trigger excessively.

Differential Revision: https://reviews.llvm.org/D150649
2023-05-25 09:55:46 +01:00
Alexandros Lamprineas
ced90d1ff6 [FuncSpec] Improve the accuracy of the cost model.
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.

Differential Revision: https://reviews.llvm.org/D150464
2023-05-24 11:40:12 +01:00
Alexandros Lamprineas
1b1232047e [FuncSpec] Replace LoopInfo with BlockFrequencyInfo.
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.

Differential Revision: https://reviews.llvm.org/D150375
2023-05-22 17:49:52 +01:00
Alexandros Lamprineas
929a8c9f72 [FuncSpec][NFC] Rename cryptic variable to better describe it.
UM -> UniqueSpecs

Brought up on the review of D145379. Committing it seperately for now
since the Cost model improvements need rethink.
2023-05-09 11:55:40 +01:00
Alexandros Lamprineas
93ac2dbefc [FuncSpec][NFC] Add an alias for InstructionCost.
Split from D145379. I'll rethink the Cost model improvements and
commit separately.
2023-05-09 11:37:48 +01:00
Momchil Velikov
de24d08459 [FuncSpec] Fix inconsistent treatment of global variables
There are a few inaccuracies with how FuncSpec handles global
variables.

When specialisation on non-const global variables is disabled (the
default) the pass could nevertheless perform some specializations,
e.g. on a constant GEP expression, or on a SSA variable, for which the
Solver has determined it has the value of a global variable.

When specialisation on non-const global variables is enabled, the pass
would skip non-scalars, e.g. a global array, but this should be
completely inconsequential, a pointer is a pointer.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D149476

Change-Id: Ic73051b2f8602587306760bf2ec552e5860f8d39
2023-05-05 09:56:06 +01:00
Alexandros Lamprineas
54e5fb789c [FuncSpec] Track the return values of specializations.
To track the return values of specializations, we need to invalidate all
the lattice values across the use-def chain which originates from the
callsites, recompute and propagate.

Differential Revision: https://reviews.llvm.org/D146158
2023-04-24 13:18:49 +01:00
Momchil Velikov
cc7bb7080f [FuncSpec] Relax restrictions on candidates for specialisation
Allow a function to be specialised even if it has its address taken or
it's global. For such functions, consider all of the arguments as
overdefined. Don't delete the functions even if all the apparent calls
were redirected to specialised instances.

Reviewed By: labrinea, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D148345
2023-04-20 12:06:22 +01:00