As reported in
https://github.com/llvm/llvm-project/issues/154668#issuecomment-3233294078,
we missed invalidating analyses because we don't set MadeChanges to true
after removing dead functions.
This patch explicitly removes the dead functions marked by FuncSpec in
SCCP and sets MadeChanges correctly.
Fixes #155738.
The original assumption "we already replaced its users with a constant"
for the global variable becomes incorrect after #154668: the users in
the dead function are in fact not simplified.
This patch poisons all unsimplified users of the constant global
variable.
Fixes #153295.
For the test case below:
```llvm
define i32 @caller() {
entry:
  %call1 = call i32 @callee(i32 1)
  %call2 = call i32 @callee(i32 0)
  %cond = icmp eq i32 %call2, 0
  br i1 %cond, label %common.ret, label %if.then

common.ret:                                       ; preds = %entry
  ret i32 0

if.then:                                          ; preds = %entry
  %unreachable_call = call i32 @callee(i32 2)
  ret i32 %unreachable_call
}

define internal i32 @callee(i32 %ac) {
entry:
  br label %ai

ai:                                               ; preds = %ai, %entry
  %add = or i32 0, 0
  %cond = icmp eq i32 %ac, 1
  br i1 %cond, label %aj, label %ai

aj:                                               ; preds = %ai
  ret i32 0
}
```
Before specialization, the SCCP solver determines that
`unreachable_call` is unexecutable, as the return value of `callee` can
only be zero.
After specializing the call sites `call1` and `call2`, FnSpecializer
announces `callee` is a dead function since all executable call sites
are specialized. However, the unexecutable call sites can become
executable again after solving specialized calls.
In this testcase, `call2` is considered `Overdefined` after
specialization, making `cond` also `Overdefined`. Thus,
`unreachable_call` becomes executable.
This patch skips SCCP on the blocks in dead functions, and poisons the
call sites of dead functions.
Currently BlockAddresses store both the Function and the BasicBlock they
reference, and the BlockAddress is part of the use list of both the
Function and BasicBlock.
This is quite awkward, because this is not really a use of the function
itself (and walks of function uses generally skip block addresses for
that reason). This also has weird implications on function RAUW (as that
will replace the function in block addresses in a way that generally
doesn't make sense), and causes other peculiar issues, like the ability
to have multiple block addresses for one block (with different
functions).
Instead, I believe it makes more sense to specify only the basic block
and let the function be implied by the BB parent. This does mean that we
may have block addresses without a function (if the BB is not inserted),
but this should only happen during IR construction.
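
As a rough illustration of the proposed shape (not the actual LLVM classes), the sketch below stores only the block in the block address and derives the function from the block's parent. The `ToyFunction`, `ToyBasicBlock` and `ToyBlockAddress` names are hypothetical stand-ins invented for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for Function / BasicBlock / BlockAddress.
struct ToyFunction {
  std::string Name;
};

struct ToyBasicBlock {
  ToyFunction *Parent; // null while the block is not inserted anywhere
};

struct ToyBlockAddress {
  ToyBasicBlock *BB; // the only thing the block address references

  // The function is implied by the block's parent. It may be null if the
  // block has not been inserted into a function yet.
  ToyFunction *getFunction() const {
    assert(BB && "a block address always references a block");
    return BB->Parent;
  }
};
```

In this model there is no separate function operand that a function RAUW could rewrite, and a block cannot end up with block addresses naming different functions.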
Model the C/C++ `errno` macro by adding a corresponding `errno`
memory location kind to the IR. This is preliminary work to separate
`errno` writes from other memory accesses, to the benefit of
alias analyses and optimization correctness.
Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.
Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
Similarly to the existing range attribute inference, also infer the
nonnull attribute on function return values.
I think in practice FunctionAttrs will handle nearly all cases; the main
exception, I think, is cases involving branch conditions. But as we
already have the information here, we may as well materialize it.
It might seem obvious, but it's not a good idea to convert a
debug-intrinsic instruction into an UnreachableInst, as this means
things operate differently with and without the -g option. However this
can happen due to the "mutate the next instruction" API calls we make.
With RemoveDIs eliminating debug intrinsics, this behaviour is at risk
of changing, hence this patch ensures we only ever mutate the next _non_
debuginfo instruction into an Unreachable.
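
A minimal sketch of the intended behaviour, using a toy instruction list rather than the real LLVM API: the `Kind` enum and `markNextNonDebugUnreachable` helper are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; Debug stands in for a debug intrinsic.
enum class Kind { Debug, Normal, Unreachable };

// Turn the first non-debug instruction after position Pos into an
// Unreachable, leaving any debug instructions in between untouched.
// Returns the index that was mutated, or Insts.size() if there was none.
std::size_t markNextNonDebugUnreachable(std::vector<Kind> &Insts,
                                        std::size_t Pos) {
  assert(Pos < Insts.size() && "position must be inside the block");
  for (std::size_t I = Pos + 1; I < Insts.size(); ++I) {
    if (Insts[I] == Kind::Debug)
      continue; // skip debug-info so -g does not change the outcome
    Insts[I] = Kind::Unreachable;
    return I;
  }
  return Insts.size();
}
```

With or without interleaved `Debug` entries, the same `Normal` instruction ends up as the `Unreachable`, which is the property the patch is after.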
The tests instrumented with the --try... flag all exercise this; I've
also added some metadata to a SCCP test to ensure it's exercised.
* Changes the default value of FuncSpecMaxIters from 1 to 10.
This allows specialization of recursive functions.
* Adds an option to control the maximum codesize growth per function.
* Measured ~45% performance uplift for SPEC2017:548.exchange2_r on
AWS Graviton3.
Differential Revision: https://reviews.llvm.org/D145819
In a follow-up we will reuse the logic in MemoryEffectsBase to merge
AAMemoryLocation and AAMemoryBehavior without duplicating all the bit
fiddling code already available in MemoryEffectsBase.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153305
Move `AttributeMask` out of `llvm/IR/Attributes.h` to a new file
`llvm/IR/AttributeMask.h`. After doing this we can remove the
`#include <bitset>` and `#include <set>` directives from `Attributes.h`.
Many headers include `Attributes.h` without needing the definition of
`AttributeMask`, so these includes unnecessarily bloat the translation
units and slow down compilation.
This commit adds in the include directive for `llvm/IR/AttributeMask.h`
to the handful of source files that need to see the definition.
This reduces the total number of preprocessing tokens across the LLVM
source files in lib from (roughly) 1,917,509,187 to 1,902,982,273 - a
reduction of ~0.76%. This should result in a small improvement in
compilation time.
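
For reference, the quoted percentage can be re-derived from the two token counts above. The helper below is only an illustrative check; `percentReduction` is a hypothetical name and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Relative reduction from Before to After, expressed in percent.
double percentReduction(std::uint64_t Before, std::uint64_t After) {
  assert(Before > 0 && After <= Before && "expects a non-increasing count");
  return 100.0 * static_cast<double>(Before - After) /
         static_cast<double>(Before);
}
```

Calling `percentReduction(1917509187, 1902982273)` gives a value between 0.75 and 0.77, consistent with the ~0.76% figure.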
Differential Revision: https://reviews.llvm.org/D153728
In replaceSignedInst, if a signed instruction can be replaced with an
unsigned instruction, we create a new instruction and remove the old
instruction's value state. If a following instruction has this new
instruction as a use operand, transformations like replaceSignedInst and
refineInstruction are blocked, because there is no value state for the
new instruction.
This patch sets the new instruction's value state to the removed
instruction's value state. I believe this is correct because when we
replace a signed instruction with an unsigned instruction, the value
state does not change.
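
The idea can be sketched with a toy solver that maps value names to ranges. This is an illustrative model only: `ToySolver`, `ValueState` and `replaceKeepingState` are hypothetical names, not the SCCPSolver interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the solver's per-value lattice state.
struct ValueState {
  long Lo, Hi; // inclusive range the value is known to lie in
  bool isNonNegative() const { return Lo >= 0; }
};

// Toy solver: tracks a state for each value, keyed by name.
class ToySolver {
  std::map<std::string, ValueState> States;

public:
  void set(const std::string &V, ValueState S) { States[V] = S; }
  bool has(const std::string &V) const { return States.count(V) != 0; }
  const ValueState &get(const std::string &V) const {
    assert(has(V) && "no state recorded for this value");
    return States.at(V);
  }

  // Replace Old by New and carry the state over instead of dropping it,
  // so later transforms can still reason about users of New.
  void replaceKeepingState(const std::string &Old, const std::string &New) {
    auto It = States.find(Old);
    if (It == States.end())
      return;
    ValueState S = It->second;
    States.erase(It);
    States[New] = S;
  }
};
```

After the replacement, a user of the new value can still query a non-negative range and be refined in turn.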
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152337
Using AvgLoopIters on any loop is too imprecise, making the cost model
favor users inside loop nests regardless of the actual trip count.
Differential Revision: https://reviews.llvm.org/D150375
The SCCPSolver is using a structure (AnalysisResultsForFn) where it keeps
pointers to various analyses needed by the IPSCCP pass. These analyses are
requested all at the same time, which can become problematic in some cases.
For example one could be retrieved via getCachedAnalysis() prior to the
actual execution of the analysis. In more detail:
The IPSCCP pass uses a DomTreeUpdater to preserve the PostDominatorTree
in case the PostDominatorTreeAnalysis had run before IPSCCP. Starting with
commit 1b1232047e83b the IPSCCP pass may use BlockFrequencyAnalysis for
some functions in the module. As a result, the PostDominatorTreeAnalysis
may not run until the BlockFrequencyAnalysis has run, since the latter
analysis depends on the former. Currently, we set up the DomTreeUpdater
using getCachedAnalysis to retrieve a PostDominatorTree. This happens
before BlockFrequencyAnalysis has run, therefore the cached analysis can
become invalid by the time we use it.
Differential Revision: https://reviews.llvm.org/D151666
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.
This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
Using AvgLoopIters on any loop is too imprecise, making the cost model
favor users inside loop nests regardless of the actual trip count.
Differential Revision: https://reviews.llvm.org/D150375
Since https://reviews.llvm.org/D141386, !range violations return
poison instead of causing immediate undefined behavior. As such,
it is fine for IPSCCP to infer !range even if the value might be
poison. (The value cannot be undef as this would promote undef to
poison, but this is already checked separately.)
This basically undoes the late change done to D83952, restoring
it to its original version (which is now valid).
Differential Revision: https://reviews.llvm.org/D144467
When replacing return values with undef, we should also drop the
noundef attribute (and other UB implying attributes).
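
A small sketch of the attribute clean-up, modelled on plain strings rather than LLVM's attribute classes. The set of UB-implying attributes listed here (`noundef`, `dereferenceable`, `dereferenceable_or_null`) is an assumption made for illustration, and `dropUBImplyingAttrs` is a hypothetical helper.

```cpp
#include <cassert>
#include <set>
#include <string>

// Attributes assumed (for this sketch) to cause immediate UB on undef.
const std::set<std::string> &ubImplyingAttrs() {
  static const std::set<std::string> Attrs = {
      "noundef", "dereferenceable", "dereferenceable_or_null"};
  return Attrs;
}

// Return the return-value attributes that may stay once the returned
// value has been replaced with undef.
std::set<std::string>
dropUBImplyingAttrs(const std::set<std::string> &RetAttrs) {
  std::set<std::string> Kept;
  for (const std::string &A : RetAttrs)
    if (ubImplyingAttrs().count(A) == 0)
      Kept.insert(A);
  assert(Kept.size() <= RetAttrs.size() && "attributes are only removed");
  return Kept;
}
```

Attributes that do not imply UB, such as `zeroext`, are left in place.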
Differential Revision: https://reviews.llvm.org/D144461
This patch adds several missing GlobalList modifier functions, like
removeGlobalVariable(), eraseGlobalVariable() and insertGlobalVariable().
There is no longer a need to access the list directly, so it also makes
getGlobalList() private.
Differential Revision: https://reviews.llvm.org/D144027
This patch moves a couple of helper functions from the global llvm::
namespace into the SCCPSolver class. This reduces the need for separate
SCCPSolver arguments and also limits the scope of those functions that
have quite generic names.
(The remaining isConstant and isOverdefined should ideally be removed)
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D142370
This includes 2 different, related fixes:
1. Fix asserting on direct assume-like intrinsic uses of a function
address
2. Fix asserting on constant expression casts used by assume-like
intrinsics.
By default hasAddressTaken permits assume-like intrinsic uses, which
ignores assume-like calls and pointer casts of the address used by
assume-like calls.
Fixes #59602, but there are additional issues I encountered when
debugging this. For instance, the original failing bitcast expression
was really unused. Clang tentatively created it for the function type,
but it was unnecessary after applyGlobalValReplacements. That did not
clean up the now dead ConstantExpr, which hung around on the user
list, so this assert only reproduced when running clang from the
original testcase, and not when just running opt -passes=ipsccp. I don't
know who is responsible for cleaning up unused ConstantExprs, but I've
run into similar issues several times recently.
Additionally, I found a few assertions with llvm.ssa.copy with
functions and casts of functions as the argument.
Another issue theoretically exists if hasAddressTaken chooses to
respect nocapture when passed function addresses. The search here
would need to do additional work to look at the users of the constant
cast to see if any call sites need the returned attribute to be stripped.
Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent)
has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with
8f1e11c5a7d70f96943a72649daa69f152d73e90.
Differential Revision: https://reviews.llvm.org/D126455
Reland 42c2dc401742266da3e0251b6c1ca491f4779963 which was reverted
in cb03b1bd99313a728d47060b909a73e7f5991231. The fix for the link
errors was to reintroduce one of the two occurrences of 'Scalar'
under the LINK_COMPONENTS.
Differential Revision: https://reviews.llvm.org/D138654
This reverts commit 42c2dc401742266da3e0251b6c1ca491f4779963.
This broke some buildbots:
undefined reference to `llvm::createBitTrackingDCEPass()'
undefined reference to `llvm::createAlignmentFromAssumptionsPass()'
undefined reference to `llvm::createLoopUnrollPass(int, bool, bool, int, int, int, int, int, int)'
undefined reference to `llvm::createLICMPass(unsigned int, unsigned int, bool)'
undefined reference to `llvm::createWarnMissedTransformationsPass()'
undefined reference to `llvm::createAlignmentFromAssumptionsPass()'
undefined reference to `llvm::createCallSiteSplittingPass()'
undefined reference to `llvm::createCFGSimplificationPass(llvm::SimplifyCFGOptions, std::function<bool (llvm::Function const&)>)'
undefined reference to `llvm::createFloat2IntPass()'
undefined reference to `llvm::createLowerConstantIntrinsicsPass()'
undefined reference to `llvm::createLoopRotatePass(int, bool)'
undefined reference to `llvm::createLoopDistributePass()'
undefined reference to `llvm::createLoopSinkPass()'
undefined reference to `llvm::createInstSimplifyLegacyPass()'
undefined reference to `llvm::createDivRemPairsPass()'
undefined reference to `llvm::createCFGSimplificationPass(llvm::SimplifyCFGOptions, std::function<bool (llvm::Function const&)>)'
undefined reference to `llvm::SetLicmMssaOptCap'
undefined reference to `llvm::SetLicmMssaNoAccForPromotionCap'
undefined reference to `llvm::ForgetSCEVInLoopUnroll'
This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1.
It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963
which needs to be reverted as it broke some buildbots, so reverting both.
The aim of this patch is to minimize the compilation time overhead of
running Function Specialization. It is about 40% slower to run as a
standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according
to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO
and measured single threaded [user + system] time of IPSCCP and FuncSpec
by passing the '-time-passes' option to lld. Then I compared the two
configurations in terms of Instruction Count of the total compilation
(not of the individual passes) as in https://llvm-compile-time-tracker.com.
Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately.
You can find more info below:
https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518
Differential Revision: https://reviews.llvm.org/D126455
The LLVMipo library no longer depends on the Scalar component.
The shared functions between IPSCCP and SCCP have been moved
under Utils, in the SCCPSolver.
This is preliminary work for D126455, in order to break a cyclic
dependency between LLVM libraries.
Differential Revision: https://reviews.llvm.org/D138654
Deleting a fully specialised function left dangling pointers in
`FunctionAnalysisManager`, which caused an internal compiler error
when the function's storage was reused.
Fixes bug #58759.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D138909
Change-Id: Ifed378c748af35e8fe7dcbdddb0f41b8777cbe87
The `FunctionSpecialization` pass needs loop analysis results for its
cost function. For this purpose, it computes the `DominatorTree` and
`LoopInfo` for a function in `getSpecializationBonus`. This function,
however, is called O(number of call sites x number of arguments) times,
while the DominatorTree/LoopInfo only need to be computed once per
function.
This patch plugs into the PassManager infrastructure to obtain
LoopInfo for a function and removes the ad-hoc computation from
`getSpecializationBonus`.
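
The effect of computing the analysis once per function, instead of once per query, can be sketched with a simple cache. This is a toy model, not the PassManager API: `ToyLoopInfo` and `ToyAnalysisCache` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive per-function analysis result.
struct ToyLoopInfo {
  int NumLoops;
};

// Caches one result per function name, computing it on first request only.
class ToyAnalysisCache {
  std::function<ToyLoopInfo(const std::string &)> Compute;
  std::map<std::string, ToyLoopInfo> Cache;

public:
  int NumComputations = 0; // how often the expensive callback actually ran

  explicit ToyAnalysisCache(
      std::function<ToyLoopInfo(const std::string &)> C)
      : Compute(std::move(C)) {
    assert(Compute && "a compute callback is required");
  }

  const ToyLoopInfo &get(const std::string &Fn) {
    auto It = Cache.find(Fn);
    if (It == Cache.end()) {
      ++NumComputations;
      It = Cache.emplace(Fn, Compute(Fn)).first;
    }
    return It->second;
  }
};
```

Querying the same function from every (call site, argument) pair then triggers a single computation rather than one per pair.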
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136332