This reverts commit 86bfeb906e3a95ae428f3e97d78d3d22a7c839f3.
This is a long time coming re-application that was originally reverted due to
regressions, unrelated to the actual inlining change. These regressions have since
been fixed due to another long-in-the-making change of a66051c6 landing.
Original commit message for reference:
---
We have several situations where it's beneficial for code size to ensure that every
call to always-inline functions are inlined before normal inlining decisions are
made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this,
it only does it on a per-SCC basis, rather than the whole module. Ensuring that
all mandatory inlinings are done before any heuristic based decisions are made
just makes sense.
Despite being referred to the "legacy" AlwaysInliner pass, it's already necessary
for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0.
This also fixes an exponential compile time blow up in
https://github.com/llvm/llvm-project/issues/59126
Differential Revision: https://reviews.llvm.org/D143624
---
This patch adds additional logic to add additional facts for A != B, if
A is a monotonically increasing induction phi. The motivating use case
for this is removing checks when using iterators with hardened libc++,
e.g. https://godbolt.org/z/zhKEP37vG.
The patch pulls in SCEV to detect AddRecs. If possible, the patch adds
the following facts for a AddRec phi PN with StartValue as incoming
value from the loo preheader and B being an upper bound for PN from a
condition in the loop header.
* (ICMP_UGE, PN, StartValue)
* (ICMP_ULT, PN, B) [if (ICMP_ULE, StartValue, B)]
The patch also adds an optional precondition to FactOrCheck (the new
DoesHold field) , which can be used to only add a fact if the
precondition holds at the point the fact is added to the constraint
system.
Depends on D151799.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152730
Adjust the pipeline slightly to move ConstraintElim just before the loop
simplification pipeline. This increases the number of cases where SCEV
should can preserved in the future.
This also enables slightly more opportunities, by benefiting from
earlier CFG simplifications, which allow more conditions to be added.
Reviewed By: nikic, antoniofrighetto
Differential Revision: https://reviews.llvm.org/D158843
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.
Differential Revision: https://reviews.llvm.org/D150375
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.
This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.
Differential Revision: https://reviews.llvm.org/D150375
This reverts commit 6f29d1adf29820daae9ea7a01ae2588b67735b9e.
https://reviews.llvm.org/D149768 is causing size regressions for -Oz
with FullLTO, and I'm reverting that one while investigating. This
commit depends on that one, so it needs to be reverted as well.
This is a cheap pass so there's no need to limit to -O3.
This removes some differences between various pipelines.
Code size regressions should be addressed with
https://reviews.llvm.org/D149768.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148269
Original patch (50b2a113db197a97f60ad2aace8b7382dc9b8c31) ignored the
fact that -ftrivial-auto-var-init could affect function parameters with
the sret attribute.
Just do not move instruction that don't affect alloca.
Also add missing test case for volatile instruction.
Differential Revision: https://reviews.llvm.org/D148507
This is effectively a debugging pass to adjust function attributes.
I don't think it makes sense to run it in the post-link pipeline.
Differential Revision: https://reviews.llvm.org/D148904
LICM does not use ORE from the pass manager, it constructs its
own instance. As such, explicitly requiring the analysis in the
pipeline is unnecessary.
This is a cheap pass so there's no need to limit to -O3.
This removes some differences between various pipelines.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148269
The pre-link pipeline already ran the pass and it only needs to be run once.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D145978
This could also move initialization of sret args, causing actually
initialized parts of such return values to be uninitialized. See
discussion on the code review.
> As a result of -ftrivial-auto-var-init, clang generates instructions to
> set alloca'd memory to a given pattern, right after the allocation site.
> In some cases, this (somehow costly) operation could be delayed, leading
> to conditional execution in some cases.
>
> This is not an uncommon situation: it happens ~500 times on the cPython
> code base, and much more on the LLVM codebase. The benefit greatly
> varies on the execution path, but it should not regress on performance.
>
> This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
> MemorySSA update fixes.
>
> Differential Revision: https://reviews.llvm.org/D137707
This reverts commit 50b2a113db197a97f60ad2aace8b7382dc9b8c31
and follow-up commit ad9ad3735c4821ff4651fab7537a75b8f0bb60f8.
Next step after https://reviews.llvm.org/D113179
Recently a set of patches by @anton-afanasyev improved many cases (better and cleaner vectorized code) thanks to improvements to AIC's TruncInstCombine (IC cannot handle it) motivated by real examples in bug reports. There was a discussion that -O2 could benefit from AIC as well, but discussion then stalled, so I would like restart it, with new numbers from LLVM compile time tracker.
As -O2 pipeline is not tracked by LLVM compile time tracker, I disabled AIC for -O3 to get an idea how expensive is it. Without AIC, I observed that geomean was cca -0.10%. Given that it seems like AIC is quite cheap, heavily tested by -O3 pipeline, I am proposing to enable it also with -O2 and similar to improve quality to vectorized code.
https://llvm-compile-time-tracker.com/compare.php?from=a1df5abef5f27646c809c7b85cf6170eb68f7735&to=e1ba6068f58c6ca862b920b8750faccb42a5843c&stat=instructions:u
Differential Revision: https://reviews.llvm.org/D147604
Reviewed-By: nikic
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somehow costly) operation could be delayed, leading
to conditional execution in some cases.
This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit greatly
varies on the execution path, but it should not regress on performance.
This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
MemorySSA update fixes.
Differential Revision: https://reviews.llvm.org/D137707
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somehow costly) operation could be delayed, leading
to conditional execution in some cases.
This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit greatly
varies on the execution path, but it should not regress on performance.
Differential Revision: https://reviews.llvm.org/D137707
The function order in some tests had to be changed because they relied on ordering of functions returned in an SCC which is consistent but unspecified.
With opaque pointers, all function pointer types are the same, meaning there should be no bitcasts.
Internal benchmarks with SampleFDO look neutral.
This was added in D36333.
Reviewed By: tejohnson, davidxl
Differential Revision: https://reviews.llvm.org/D146099
This is a modified version of commit b374423304a8 by
Arthur (https://reviews.llvm.org/D143424).
Here we invoke to the pass independent of PGOOPT. We now check if the
profile is available through the program summary. This ensures CHR is
called in distributed ThinLTO BE compilation (where PGOOPT might not
be created).
Differential Revision: https://reviews.llvm.org/D144769
This pass isn't a simplification, it's a non-canonical optimization.
This makes it only run once in a (Thin)LTO pipeline during postlink, just like all the other optimization pipeline passes.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D143424
This seems to cause large regressions in existing code, as much as 75% slower
(4x the time taken). Small always inline functions seem to be used a lot in the
cmsis-dsp library.
I would add a phase ordering test to show the problems, but one already exists!
The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by
removing alwaysinline to hide the problems that existed.
This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f.
This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.
We have several situations where it's beneficial for code size to ensure that every
call to always-inline functions are inlined before normal inlining decisions are
made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this,
it only does it on a per-SCC basis, rather than the whole module. Ensuring that
all mandatory inlinings are done before any heuristic based decisions are made
just makes sense.
Despite being referred to the "legacy" AlwaysInliner pass, it's already necessary
for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0.
This also fixes an exponential compile time blow up in
https://github.com/llvm/llvm-project/issues/59126
Differential Revision: https://reviews.llvm.org/D143624
This reverts commit fb13dcf3431cd83911fe56899d2fade808dc5b8d.
A large compile-time regression for code generated by sanitizers has
been reported. Revert while I investigate the issue. Details and
reproducers are available here: https://reviews.llvm.org/D135915
This reverts commit 2656572d485127cc30b8fe9752024d2a0f1c50db.
It looks like CINT2017rate/502.gcc_r gets mis-compiled with LTO + PGO on
AArch64 with function specialization.
This patch enables Function Specialization by default at all
optimization levels except Os, Oz.
Compilation Time Overhead:
--------------------------
Measured the Instruction Count increase (Geomean) for CTMark from
the llvm-testsuite as in https://llvm-compile-time-tracker.com.
* {-O3, Non-LTO}: +0.136% Instruction Count
* {-O3, LTO}: +0.346% Instruction Count
Performance Uplift:
-------------------
Measured +9.121% score increase for 505.mcf_r from SPEC Int 2017
(Tested on Neoverse N1 with -O3 + LTO)
Correctness Testing:
--------------------
* Passes bootstrap Clang with ASAN + LTO + FuncSpec aggressive options:
{ MaxClonesThreshold=10,
SmallFunctionThreshold=10,
AvgLoopIterationCount=30,
SpecializeOnAddresses=true,
EnableSpecializationForLiteralConstant=true,
FuncSpecializationMaxIters=10 }
* Builds Chromium and passes its unittests with the above options + ThinLTO.
For more info please refer to
https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518
Differential Revision: https://reviews.llvm.org/D140210
BasicAA currently has an optional dependency on the PhiValues
analysis. However, at least with our current pipeline setup, we
never actually make use of it. It's possible that this used to work
with the legacy pass manager, but I'm not sure of that either.
Given that this analysis has not actually been in use for a long
time, and nobody noticed or complained, I think we should drop
support for it and focus on one code path. It is worth noting that
analysis quality for the non-PhiValues case has significantly
improved in the meantime.
If we really wanted to make use of PhiValues, the right way would
probably be to pass it in via AAQI in places we want to use it,
rather than using an optional pass manager dependency (which are
an unpredictable PITA and should really only ever be used for
analyses that are only preserved and not used).
Differential Revision: https://reviews.llvm.org/D139719
The sanitizer bots turned green again after another change went in, i.e.
revert 26dd64ba9cfabe5474bb207f3b7099965f81fed7, so I don't think this
patch was causing the problems.