Recently some users reported large increases in runtime (up to +600% on some
translation units) when they upgraded to a
more recent (slightly patched, internal) clang version. Bisection
revealed that the bulk of this increase was probably caused by my
earlier commit bb27d5e5c6b194a1440b8ac4e5ace68d0ee2a849 ("Don't assume
third iteration in loops").
When I evaluated that earlier commit on several open source projects, it
turned out that on average it is runtime-neutral (or slightly helpful: it
reduced the total analysis time by 1.5%), but it can cause runtime spikes
on some code: in particular, it more than doubled the time to analyze
`tmux` (one of the smaller test projects).
Further profiling and investigation proved that these spikes were caused
by an _increase of analysis scope_: there was a heuristic that placed
functions on a "don't inline this" blacklist if they reached the
`-analyzer-max-loop` limit (anywhere, on any one execution path), and
hitting that limit became significantly rarer when my commit ensured that
the analyzer no longer "just assumes" a third (and fourth) iteration.
(With more inlining, significantly more entry points use up their
allocated budgets, which leads to the increased runtime.)
I feel that this heuristic for the "don't inline" blacklist is
unjustified and arbitrary, because reaching the "retry without inlining"
limit on one path does not imply that inlining the function won't be
valuable on other paths -- so I hope that we can eventually replace it
with more "natural" limits of the analysis scope.
However, the runtime increases are annoying for the users whose projects
are affected, so I created this quick workaround commit that approximates
the "don't inline" blacklist effects of ambiguous loops (where the
analyzer doesn't understand the loop condition) without fully reverting
the "Don't assume third iteration" commit (to avoid reintroducing the
false positives that were eliminated by it).
Investigating this issue was a team effort: I'm grateful to Endre Fülöp
(gamesh411) who did the bisection and shared his time measurement setup,
and Gábor Tóthvári (tigbr) who helped me with the profiling.
VirtualCallChecker.cpp implements two related checkers:
- `optin.cplusplus.VirtualCall` which reports situations when
constructors or destructors call virtual methods (which is bug-prone,
because such calls do not use virtual dispatch, but can be legitimate).
- `cplusplus.PureVirtualCall` reports situations when constructors or
destructors call _pure_ virtual methods, which is an error.
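For illustration, a minimal hypothetical example of both situations:

```cpp
struct Widget {
  Widget() { reset(); }  // optin.cplusplus.VirtualCall: during construction this
                         // calls Widget::reset(), never an override from a subclass
  virtual void reset() {}
  virtual ~Widget() = default;
};

struct Base {
  Base() { setup(); }            // the constructor indirectly reaches...
  void setup() { configure(); }  // ...cplusplus.PureVirtualCall: calling a pure
                                 // virtual method here is undefined behavior
  virtual void configure() = 0;
  virtual ~Base() = default;
};
```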
Six years ago these two bug types were both reported by the same checker
(called `optin.cplusplus.VirtualCall`) and it had an option called
`PureOnly` which limited its output to the pure case.
When (in 2019) the two checker parts were separated by the commit
d3971fe97b64785c079d64bf4c8c3e2b5e1f85a1, the option `PureOnly` was
preserved for the sake of compatibility, but it is no longer useful
(when it is set to true, it just suppresses all reports from
`optin.cplusplus.VirtualCall`) so it was marked as deprecated.
I'm removing this deprecated option now because it is no longer relevant
and its presence caused minor complications when I was porting
`VirtualCallChecker.cpp` to the new multipart checker framework
(introduced in 27099982da2f5a6c2d282d6b385e79d080669546).
So far, CSA has been relying on the LLVM Statistic package, which allowed us
to gather some data about the analysis of an entire translation unit.
However, a translation unit consists of a collection of loosely related
entry points, and aggregating data across multiple such entry points is
often counterproductive.
This change introduces a new lightweight always-on facility to collect
Boolean or numerical statistics for each entry point and dump them in a
CSV format. This format makes it easy to aggregate data across multiple
translation units and analyze it with common data-processing tools.
We break down the existing statistics that were collected on the per-TU
basis into values per entry point.
Additionally, we enable the statistics unconditionally (STATISTIC ->
ALWAYS_ENABLED_STATISTIC) to facilitate their use (you can gather the
data with a simple run-time flag rather than having to recompile the
analyzer). These statistics are very light and add virtually no
overhead.
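As a rough sketch (not the actual CSA code), the difference between the two
macro families looks like this:

```cpp
#include "llvm/ADT/Statistic.h"

#define DEBUG_TYPE "ExprEngine"

// Only tracks values when LLVM is built with statistics enabled:
STATISTIC(NumOldStyle, "An example per-TU statistic");

// Always tracks values, so the data can be dumped via a run-time flag in any build:
ALWAYS_ENABLED_STATISTIC(NumSteps, "An example always-on statistic");

void noteStep() { ++NumSteps; }
```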
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
CPP-6160
In our test pool, the max entry point RT was improved by this change:
1'181 seconds (~19.7 minutes) -> 94 seconds (1.6 minutes)
BTW, the 1.6 minutes is still really bad, but it is over an order of
magnitude better than it was before.
This was the most severe RT edge case, as you can see from the numbers.
There are more known RT bottlenecks, such as:
- Large environment sizes, and `removeDead`. See more about the failed
attempt on improving it at:
https://discourse.llvm.org/t/unsuccessful-attempts-to-fix-a-slow-analysis-case-related-to-removedead-and-environment-size/84650
- A large chunk of time can be spent inside `assume`, reaching a fixed
point. This is something we want to look into a bit later if we have
time.
We have 3'075'607 entry points in our test set.
About 393'352 entry points ran longer than 1 second when measured.
To give a sense of the distribution, if we ignore the slowest 500 entry
points, then the maximum entry point runs for about 14 seconds. These
500 slow entry points are in 332 translation units.
Out of the slowest 500 entry points, 72 were improved by at least 10x by
this change.
We measured no RT regression on the "usual" entry points.

(A figure was attached to the original PR; its dashed lines represent the
maximum entry-point RT.)
CPP-6092
This commit adds the new analyzer option
`assume-at-least-one-iteration`, which is `false` by default, but can be
set to `true` to ensure that the analyzer always assumes at least one
iteration in loops.
In some situations this "loop is skipped" execution path is an important
corner case that may evade the notice of the developer and hide
significant bugs -- however, there are also many situations where it's
guaranteed that at least one iteration will happen (e.g. some data
structure is always nonempty), but the analyzer cannot realize this and
will produce false positives when it assumes that the loop is skipped.
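A hypothetical example of such a false positive, assuming the callers
guarantee `n >= 1` (which the analyzer cannot see):

```cpp
int sumAll(const int *values, unsigned n) {  // precondition (invisible here): n >= 1
  int result;  // deliberately uninitialized; the first iteration sets it
  for (unsigned i = 0; i < n; ++i) {
    if (i == 0)
      result = values[0];
    else
      result += values[i];
  }
  return result;  // false positive on the "loop is skipped" (n == 0) path
}
```

With `assume-at-least-one-iteration=true` the analyzer drops the `n == 0` path,
so this kind of report should disappear.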
This commit refactors some logic around the implementation of the new
feature, but the only functional change is introducing the new analyzer
option. If the new option is left in its default state (false), then the
analysis is functionally equivalent to an analysis done with a version
before this commit.
If a refutation Z3 query times out (UNDEF), allow a couple of retries to
improve the stability of the result. By default, allow 2 retries, which
gives us a maximum of 3 solve attempts per query.
Retries should help mitigate flaky Z3 queries.
See the details in the following RFC:
https://discourse.llvm.org/t/analyzer-rfc-retry-z3-crosscheck-queries-on-timeout/83711
Note that with each attempt, we spend more time per query.
Currently, we have a 15-second timeout per query, which is also in
effect for the retry attempts.
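A minimal sketch of the retry logic (hypothetical names, not the actual
implementation):

```cpp
enum class Z3Answer { SAT, UNSAT, UNDEF };

Z3Answer solveOnce();  // assumed: runs one Z3 query bounded by the per-query timeout

Z3Answer solveWithRetries(unsigned MaxRetries = 2) {
  Z3Answer Result = Z3Answer::UNDEF;
  for (unsigned Attempt = 0; Attempt <= MaxRetries; ++Attempt) {
    Result = solveOnce();
    if (Result != Z3Answer::UNDEF)
      break;  // got a definitive SAT/UNSAT answer; no retry needed
  }
  return Result;  // may still be UNDEF if every attempt timed out
}
```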
---
Why should this help?
In short, retrying queries should bring stability because if a query runs
long, it is more likely that it did so due to some runtime anomaly than
because it was on the edge of succeeding. This is because most queries run
quickly, and the queries that run long usually run long by a fair amount.
Consequently, retries should improve the stability of the outcome of the
Z3 query.
In general, the retries shouldn't increase the overall analysis time,
because we only rarely hit the ~0.1% of cases where retries happen. But
keep in mind that the retry attempts can add up if many retries are
allowed, or if the individual query timeout is large.
CPP-5920
Discussion here:
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus
The original patch, #97298 introduced new timeouts backed by thorough
testing and measurements to keep the running time of Z3 within
reasonable limits. The measurements also showed that only certain
reports and certain TUs were responsible for the poor performance of Z3
refutation.
Unfortunately, it seems that on machines with different characteristics
(slower machines) the current timeouts don't just axe 0.01% of reports,
but many more as well. Considering that timeouts are inherently
nondeterministic as a cutoff point, this led to report sets being vastly
different on the same projects with the same configuration.
The discussion link shows that all configurations introduced in the
patch with their default values lead to severe nondeterminism of the
analyzer. As we, and others, use the analyzer as a gating tool for PRs,
we should revert to the original defaults.
We should respect that
* There are still parts of the analyzer that are either proven or
suspected to contain nondeterministic code (like pointer sets),
* A 15s timeout is more likely to hit the same reports every time on a
wider range of machines, but it is still inherently nondeterministic,
while an infinite timeout leads to the tool hanging,
* If you measure the performance of the analyzer on your machines, you
can and should achieve some speedup with little or no observable
nondeterminism.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
This commit moves the **alpha.security.taint.TaintPropagation** and
**alpha.security.taint.GenericTaint** checkers to the **optin.taint**
optional package.
These checkers were stabilized and improved by recent commits, so
they are ready for production use.
As discussed in
https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10
Some `isTainted()` queries can blow up the analysis time and effectively
halt the analysis under specific workloads.
We don't really have the time now to do a caching re-implementation of
`isTainted()`, so we need to work around the issue.
The workaround with the smallest blast radius was to limit the complexity
of the symbols on which `isTainted()` performs the query (by walking the
SymExpr). So far, a threshold of 10 has worked for us, but this value can
be overridden using the "max-tainted-symbol-complexity" config value.
This new option is "deprecated" from the get-go, as I expect this issue
to be fixed within the next few months and I don't want users to override
this value anyway. If they do, the deprecation message will let them know
that they are on their own, and that the next release may break them (as
we will no longer recognize this option once we drop it).
Mitigates #89720
CPP-5414
This is exactly as originally landed in #95129,
but now the minimal Z3 version was increased to meet this change in #96682.
https://discourse.llvm.org/t/bump-minimal-z3-requirements-from-4-7-1-to-4-8-9/79664/4
---
This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520
As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in at
most 1 second.
The heuristic should have only a marginal observable impact, except for
the cases where we had big report eqclasses with long-running (15s) Z3
queries, where previously CSA effectively halted. After this
patch, CSA will tackle such extreme cases as well.
(cherry picked from commit eacc3b3504be061f7334410dd0eb599688ba103a)
This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520
As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in
at most 1 second.
The heuristic should have only a marginal observable impact, except for
the cases where we had big report eqclasses with long-running (15s) Z3
queries, where previously CSA effectively halted.
After this patch, CSA will tackle such extreme cases as well.
Reviewers: NagyDonat, haoNoQ, Xazax-hun, Szelethus, mikhailramalho
Reviewed By: NagyDonat
Pull Request: https://github.com/llvm/llvm-project/pull/95129
The checker now creates failure branches for stream write operations only
if the new option "pedantic" is set to true.
The result of a write operation is often not checked in typical code. If
failure branches are created, the checker warns about unchecked write
operations and generates a lot of "false positives" (these are valid
warnings, but the programmer often does not care about this problem).
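A hypothetical example of code that is affected only in pedantic mode:

```cpp
#include <stdio.h>

void save(FILE *f, const char *buf, size_t len) {
  fwrite(buf, 1, len, f);  // return value not checked, as in most real-world code
  // With "pedantic" enabled, the checker also explores the branch where this
  // fwrite failed, so problems on that branch (e.g. further use of the stream
  // in an error state) can be reported.
  fputs("done\n", f);
}
```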
The default value of the checker option `ModelPOSIX` is changed to `true`,
and the documentation is updated.
This is a re-apply of commit 7af4e8bcc354d2bd7e46ecf547172b1f19ddde3e
that was reverted because of a test failure (which is fixed now).
Thanks to recent improvements in #67663, InvalidPtr checker does not
emit any false positives on the following open-source projects: memcached,
tmux, curl, twin, vim, openssl, sqlite, ffmpeg, postgres, tinyxml2, libwebm,
xerces, bitcoin, protobuf, qtbase, contour, acid, openrct2. (Before the
changes mentioned above, there were 27 reports, catching cases where
`getenv` invalidates the results of previous `getenv` calls. That strict
behaviour is now disabled by default.)
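For reference, a hypothetical example of the strict behaviour that is now off
by default:

```cpp
#include <stdio.h>
#include <stdlib.h>

void printHome(void) {
  const char *home = getenv("HOME");
  (void)getenv("PATH");    // per POSIX, this call may invalidate `home`
  if (home)
    printf("%s\n", home);  // flagged only when InvalidatingGetEnv=true
}
```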
Introduce 'InvalidatingGetEnv' checker option for 'getenv' calls.
- POSIX suggests that consecutive 'getenv' calls may invalidate
previously returned pointers. This is often too strict in real-world scenarios.
- New 'InvalidatingGetEnv' checker option provides a more
pragmatic default that doesn't treat consecutive 'getenv'
calls as invalidating.
- Now also handles main function specifications with an
environment pointer as the third parameter.
Original Phabricator review:
https://reviews.llvm.org/D154603
This commit releases a checker that was developed to a stable level in
the Ericsson-internal fork of Clang Static Analyzer.
Note that the functionality of this checker overlaps with
core.UndefinedBinaryOperatorResult ("UBOR"), but there are several
differences between them:
(1) UBOR is only triggered when the constant folding performed by the
Clang Static Analyzer engine determines that the value of a binary
operator expression is undefined; this checker can report issues where
the operands are not constants.
(2) UBOR has unrelated checks for handling other binary operators, this
checker only examines bitwise shifts.
(3) This checker has a Pedantic flag and by default does not report
expressions (e.g. `-2 << 2`) that are undefined by the standard but
consistently supported in practice.
(4) UBOR exhibits buggy behavior in code that involves cast expressions,
e.g.

    void foo(unsigned short s) {
      if (s == 2) {
        (void) (((unsigned int) s) << 16);
      }
    }
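For contrast, a few hypothetical cases that the new checker itself reports
(assuming 32-bit `int`):

```cpp
int shifts(int x) {
  int a = x << 34;  // reported: the shift amount is too large for the type
  int b = x << -2;  // reported: negative shift amount
  int c = -2 << 2;  // reported only with Pedantic=true (undefined by the standard,
                    // but consistently supported in practice)
  return a + b + c;
}
```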
Later it would be good to eliminate this overlap (perhaps by deprecating
and then eliminating the bitwise shift handling in UBOR), but in my
opinion that belongs in separate commits.
Differential Revision: https://reviews.llvm.org/D156312
Co-authored-by: Endre Fulop <endre.fulop@sigmatechnology.se>
The main reason for this change is that these checkers were implemented in
the same class but had different dependency ordering. (NonNullParamChecker
should run before StdCLibraryFunctionArgs to emit its more specific warning
about null arguments, but apiModeling.StdCLibraryFunctions was a modeling
checker that should run before other non-modeling checkers. The modeling
checker changes the state in a way that makes it impossible for
NonNullParamChecker to detect a null argument.)
To simplify this, the modeling part is removed as a separate checker and can
only be used if the checker StdCLibraryFunctions, which also produces the
warnings, is turned on. Modeling the functions without bug detection (for
invalid arguments) is not possible anymore. From this change on, the
modeling of standard functions does not happen by default.
Reviewed By: Szelethus
Differential Revision: https://reviews.llvm.org/D151225
According to my measurement in https://reviews.llvm.org/D108230#3933232,
it seems like there is no drawback to enabling this analyzer-config by default.
Actually, enabling this by default would make it consistent with the
codegen of clang, which, according to `-fstrict-flex-arrays`, assumes by
default that all trailing arrays could be FAMs, whether their size is
undefined, zero, one, or anything else.
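To illustrate, the trailing-array shapes in question (hypothetical structs;
the last two rely on language extensions):

```cpp
struct PacketA { int len; char data[]; };   // C99 flexible array member
struct PacketB { int len; char data[0]; };  // zero-length array extension
struct PacketC { int len; char data[1]; };  // pre-C99 "fake FAM" idiom
```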
Speaking of `-fstrict-flex-arrays`, in the next patch I'll deprecate
the analyzer-config FAM option in favor of that flag. That way, CSA will
always be in sync with what the codegen will think of FAMs.
So, if a new codebase sets `-fstrict-flex-arrays` to some value above 0,
CSA will also make sure that only arrays of the right size will be
considered as FAMs.
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D138657
Extend the checker 'ErrnoModeling' with a state of 'errno' that indicates
the importance of the 'errno' value and how it should be used.
Add a new checker 'ErrnoChecker' that observes uses of 'errno' and finds
possibly wrong uses, based on the "errno state".
The "errno state" should be set (together with the value of 'errno') by
other checkers (that perform modeling of the given function) in the
future. Currently only a test function can set this value.
The new checker has no user-observable effect yet.
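As a hypothetical illustration of the kind of misuse the checker is intended
to catch once modeling checkers set the errno state:

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>

long parseNumber(const char *s) {
  errno = 0;
  long value = strtol(s, nullptr, 10);  // a modeling checker would mark errno as "must check"
  printf("parsed\n");                   // this call itself may overwrite errno...
  if (errno == ERANGE)                  // ...so checking errno only at this point is suspicious
    return 0;
  return value;
}
```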
Reviewed By: martong, steakhal
Differential Revision: https://reviews.llvm.org/D122150
If a LazyCompoundVal of a struct is bound to the store, there is a policy
which decides whether a copy gets created instead.
This patch introduces a similar policy for arrays, which is required to
model structured bindings to arrays without false negatives.
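A minimal hypothetical example of the construct in question:

```cpp
int sumOfThree() {
  int arr[3] = {1, 2, 3};
  auto [a, b, c] = arr;  // binds to a copy of the whole array; that copy has to be
                         // modeled precisely to avoid false negatives
  return a + b + c;
}
```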
Differential Revision: https://reviews.llvm.org/D128064
This new CTU implementation is the natural extension of the normal single TU
analysis. The approach consists of two analysis phases. During the first phase,
we do a normal single TU analysis. During this phase, if we find a foreign
function (one that could be inlined from another TU) then we don't inline
it immediately; instead, we mark it to be analyzed later.
When the first phase is finished then we start the second phase, the CTU phase.
In this phase, we continue the analysis from that point (exploded node)
which had been enqueued during the first phase. We gradually extend the
exploded graph of the single TU analysis with the new node that was
created by the inlining of the foreign function.
We count the number of analysis steps of the first phase and we limit the
second (ctu) phase with this number.
This new implementation makes it convenient for the users to run the
single-TU and the CTU analysis in one go; they don't need to run the two
analyses separately. Thus, we call this new implementation "onego" CTU.
Discussion:
https://discourse.llvm.org/t/rfc-much-faster-cross-translation-unit-ctu-analysis-implementation/61728
Differential Revision: https://reviews.llvm.org/D123773
This change adds an option to detect all null dereferences for
non-default address spaces, except for address spaces 256, 257 and 258.
Those address spaces are special since null dereferences are not errors.
All address spaces can be considered (except for 256, 257, and 258) by
using -analyzer-config
core.NullDereference:DetectAllNullDereferences=true. This option is
false by default, retaining the original behavior.
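A hypothetical example of a report that only appears with the new option:

```cpp
#define AS1 __attribute__((address_space(1)))

int load(AS1 int *p) {
  if (!p)
    return *p;  // null dereference in address space 1; reported only with
                // DetectAllNullDereferences=true
  return *p;
}
```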
A LIT test was enhanced to cover this case, and the rst documentation
was updated to describe this behavior.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D122841
Summary: Produce SymbolCast for integral types in the `evalCast` function, and apply several simplification techniques while producing the symbols. Add a boolean option `handle-integral-cast-for-ranges` under the `-analyzer-config` flag. The feature is disabled by default.
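A minimal hypothetical example of an expression affected when the option is
enabled:

```cpp
unsigned char lowByte(int x) {
  // With the option on, evalCast produces a SymbolCast such as `(unsigned char) x`
  // instead of dropping the cast, so range constraints can be attached to it.
  unsigned char c = (unsigned char)x;
  return c;
}
```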
Differential Revision: https://reviews.llvm.org/D105340
Some projects [1,2,3] have flex-generated files besides bison-generated
ones.
Unfortunately, the comment `"/* A lexical scanner generated by flex */"`
generated by the tools is not necessarily at the beginning of the file,
thus we need to quickly skim through the file for this needle string.
Luckily, StringRef can do this operation in an efficient way.
That being said, now the bison comment is not required to be at the very
beginning of the file. This allows us to detect a couple more cases
[4,5,6].
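A rough sketch of the detection idea (made-up helper name, not the actual
implementation):

```cpp
#include "llvm/ADT/StringRef.h"

bool looksFlexGenerated(llvm::StringRef Buffer) {
  // A single linear scan over the buffer; cheap enough even for large files.
  return Buffer.contains("/* A lexical scanner generated by flex");
}
```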
Alternatively, we could say that we only allow whitespace characters
before matching the bison/flex header comment. That would prevent the
(probably) unnecessary string search in the buffer. However, I could not
verify that these tools would actually respect this assumption.
In addition, the Twin project [1], for example, has other non-whitespace
characters (some preprocessor directives) before the flex-generated
header comment. So the heuristic in the previous paragraph won't work
with that.
Thus, I would advocate the current implementation.
According to my measurement, this patch won't introduce measurable
performance degradation, even though we will do 2 linear scans.
I introduce the `ignore-bison-generated-files` and
`ignore-flex-generated-files` options so that skipping these files can be
disabled. Both of these options are true by default.
[1]: https://github.com/cosmos72/twin/blob/master/server/rcparse_lex.cpp#L7
[2]: 22362cdcf9/sandbox/count-words/lexer.c (L6)
[3]: 11abdf6462/lab1/lex.yy.c (L6)
[4]: 47f5b2cfe2/B_yacc/1/y1.tab.h (L2)
[5]: 71d1bf9b1e/src/VBox/Additions/x11/x11include/xorg-server-1.8.0/parser.h (L2)
[6]: 3f773ceb13/Framework/OpenEars.framework/Versions/A/Headers/jsgf_parser.h (L2)
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D114510
If `assume-controlled-environment` is `true`, we expect `getenv()` to
succeed, and its result is not considered tainted.
By default, the option is `false`.
Reviewed By: NoQ, martong
Differential Revision: https://reviews.llvm.org/D111296
D105819 added NoOwnershipChangeVisitor, but it is only registered when an
off-by-default, hidden checker option is enabled. The reason behind this was
that it grossly overestimated the set of functions that really needed a note:

    std::string getTrainName(const Train *T) {
      return T->name;
    } // note: Returning without changing the ownership of or deallocating memory
      // Umm... I mean duh? Nor would I expect this function to do anything like that...

    void foo() {
      Train *T = new Train("Land Plane");
      print(getTrainName(T)); // note: calling getTrainName / returning from getTrainName
    } // warn: Memory leak
This patch adds a heuristic that guesses that any function that has an
explicit operator delete call could have been responsible for deallocating
the memory that ended up leaking. This is way too conservative (see the
TODOs in the new function), but it is safer to err on the side of too
little than too much, and it allows us to enable the option by default
*now* and add refinements one-by-one.
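A rough sketch of the heuristic (hypothetical helper; the real visitor is more
involved, see the TODOs mentioned above):

```cpp
#include "clang/AST/Decl.h"
#include "clang/AST/Stmt.h"

using namespace clang;

// Assumed helper: recursively walks the body looking for a delete expression
// (or a call that looks like a deallocator).
bool containsDeallocation(const Stmt *Body);

bool mayChangeOwnership(const FunctionDecl *FD) {
  if (const Stmt *Body = FD->getBody())
    return containsDeallocation(Body);  // no delete anywhere: suppress the note for FD
  return true;                          // body not available: stay conservative
}
```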
Differential Revision: https://reviews.llvm.org/D108753
Quoting https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html:
> In the absence of the zero-length array extension, in ISO C90 the contents
> array in the example above would typically be declared to have a single
> element.
We should not assume that the //flexible array member// field has a single
element, because in some cases it is used as a fallback for not having the
//zero-length array// language extension.
In this case, the analyzer should return `Unknown` as the extent of the
field instead.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D108230
The previous behavior was to deduplicate reports based on md5 of the
html file. This algorithm might have worked originally, but right now
HTML reports contain information rich enough to make them virtually
always distinct, which breaks deduplication entirely.
The new strategy is to (finally) take advantage of IssueHash - the
stable report identifier provided by clang that is the same if and only if
the reports are duplicates of each other.
Additionally, scan-build no longer performs deduplication on its own.
Instead, the report file name is now based on the issue hash,
and clang instances will silently refuse to produce a new html file
when a duplicate already exists. This eliminates the problem entirely.
The '-analyzer-config stable-report-filename' option is deprecated
because report filenames are no longer unstable. A new option is
introduced, '-analyzer-config verbose-report-filename', to produce
verbose file names that look similar to the old "stable" file names.
The old option acts as an alias to the new option.
Differential Revision: https://reviews.llvm.org/D105167
This is rather common feedback we get about our leak checkers: bug reports
are really short, and contain barely any usable information on what the
analyzer did to conclude that a leak actually happened.
This happens because of our bug report minimizing effort. We construct bug
reports by inspecting the ExplodedNodes that lead to the error from the bottom
up (from the error node all the way to the root of the exploded graph), and mark
entities that were the cause of a bug, or have interacted with it, as
interesting. In order to make the bug report a bit less verbose, whenever we
find an entire function call (from CallEnter to CallExitEnd) that didn't talk
about any interesting entity, we prune it (see the documentation on bug
report generation for more info) -- even when the event to highlight is
exactly this lack of interaction with interesting entities.
D105553 generalized the visitor that creates notes for these cases. This patch
adds a new kind of NoStateChangeVisitor that leaves notes in functions that
took, as a parameter, a piece of dynamically allocated memory that later
leaked, and didn't change its ownership status.
Differential Revision: https://reviews.llvm.org/D105553
Summary:
The default CTUImportThreshold (8) seems to be too conservative with C projects.
We increase this value to 24, and we introduce another threshold for C++
source files (defaulted to 8) because their AST is way more complicated
than that of C source files.
Differential Revision: https://reviews.llvm.org/D83475
Adding file handling functions from the POSIX standard (2017).
A new checker option is introduced to enable them.
In follow-up patches I am going to upstream networking, pthread, and other
groups of POSIX functions.
Differential Revision: https://reviews.llvm.org/D82288
Summary:
The default value of 100 makes the analysis slow. Projects of considerable
size can take more time to finish than is practical. The new default
setting of 8 is based on the analysis of LLVM itself. With the old default
value of 100, the analysis time was over an order of magnitude slower.
Thresholding the load of ASTUnits is to be extended in the future with a
more fine-tunable solution that accommodates the specifics of the project
being analyzed.
Reviewers: martong, balazske, Szelethus
Subscribers: whisperity, xazax.hun, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, Charusso, steakhal, ASDenysPetrov, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82561
Pass EvalCallOptions via runCheckersForEvalCall into defaultEvalCall.
Update the AnalysisOrderChecker to support evalCall for testing.
Differential Revision: https://reviews.llvm.org/D82256
This reverts commit 97e07d0c352ca469eb07a0cb3162c2807ff1099d.
Reason: OSX broke for a different reason; this really only seems to work
on Linux and very generic Windows builds.
As per http://lists.llvm.org/pipermail/cfe-dev/2019-August/063215.html, let's get rid of this option.
It presents 2 issues that have bugged me for years now:
* OSObject is NOT a boolean option. It in fact has 3 states:
* osx.OSObjectRetainCount is enabled but OSObject is set to false: RetainCount
regards the option as disabled.
* osx.OSObjectRetainCount is enabled and OSObject is set to true: RetainCount
regards the option as enabled.
* osx.OSObjectRetainCount is disabled: RetainCount regards the option as
disabled.
* The hack involves directly modifying AnalyzerOptions::ConfigTable, which
shouldn't even be public in the first place.
This still isn't really ideal, because it would be better to preserve the option
and remove the checker (we want visible checkers to be associated with
diagnostics, and hidden options like this one to be associated with changing how
the modeling is done), but backwards compatibility is an issue.
Differential Revision: https://reviews.llvm.org/D78097
As listed in the summary of D77846, we have 5 different categories of bugs we're
checking for in CallAndMessage. I think the documentation placed in the code
explains my thought process behind my decisions quite well.
A non-obvious change I had here is removing the entry for
CallAndMessageUnInitRefArg. In fact, I removed the CheckerNameRef-typed field
back in D77845 (it was dead code), so that checker didn't really exist in any
meaningful way anyway.
Differential Revision: https://reviews.llvm.org/D77866