Before this patch, the compiler gave a bump to the inline-threshold
when the total size of the allocas passed as arguments to the
callee was below 256 bytes.
This heuristic ignores that some of these allocas could have be removed
by SROA if inlining was applied.
Ideally, this bonus would be attributed to the threshold once the
size of all the allocas that could not be handled by SROA is known:
at the end of the InlineCost analysis.
However, we may never reach this point if the inline-cost analysis exits
early when the inline cost goes over the threshold mid-analysis.
This patch proposes:
* Attribute the bonus in the inline-threshold when allocas are passed
as arguments (regardless of their total size).
* Assigns a cost to each alloca proportional to its size,
such that the cost of all the allocas cancels the bonus.
Potential problems:
* This patch assumes that removing alloca instructions with SROA is
always profitable. This may not be the case if the total size of the
allocas is still too big to be promoted to registers/LDS.
* Redundant calls to getTotalAllocaSize
* Awkwardly, the threshold attributed contributes to the single-bb and
vector bonus.
Reviewed By: scchan
Differential Revision: https://reviews.llvm.org/D149741
-amdgpu-inline-max-bb option could lead to a suboptimal
codegen preventing inlining of really simple functions
including pure wrapper calls. Relax the cutoff by allowing
to call a function with a single block on the grounds
that it will not increase total number of blocks after
inlining.
Differential Revision: https://reviews.llvm.org/D97744
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.
amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
some limit
2) It increases the threshold if there are pointers to private arrays(?)
These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.
This way we can remove the custom amdgpu-inline pass.
This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D94153
This seems to only have overridden cold handling, which we probably
shouldn't do. As far as I can tell the wrapper library functions are
still inlined as appropriate.
Summary:
Most of IR instructions got better code size estimations after commit 47a5c36b.
So default parameters values should be updated to improve inlining and
unrolling for the target.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70391
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.
amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348