Previously some checkers attached explicitly created program point tags
to some of the exploded graph nodes that they created. In most of the
checkers this ad-hoc tagging only affected the debug dump of the
exploded graph (and they weren't too relevant for debugging) so this
commit removes them.
There were two checkers where the tagging _did_ have a functional role:
- In `RetainCountChecker` the presence of tags were checked by
`RefCountReportVisitor`.
- In `DynamicTypePropagation` the checker sometimes wanted to create two
identical nodes and had to apply an explicit tag on the second one to
avoid "caching out".
In these two situations I preserved the tags but switched to using
`SimpleProgramPointTag` instead of `CheckerProgramPointTag` because
`CheckerProgramPointTag` didn't provide enough benefits to justify its
existence.
Note that this commit depends on the earlier commit "[analyzer] Fix
tagging of PostAllocatorCall" ec96c0c072ef3f78813c378949c00e1c07aa44e5
and would introduce crashes when cherry-picked onto a branch that
doesn't contain that commit.
For more details about the background see the discourse thread
https://discourse.llvm.org/t/role-of-programpointtag-in-the-static-analyzer/
As a tangentially related changes, this commit also adds some comments
to document the surprising behavior of `CheckerContext::addTransition`
and an assertion in the constructor of `PathSensitiveBugReport` to get a
more readable crash dump in the case when the report is constructed with
`nullptr` as the `ErrorNode`. (This can happen due to "caching out".)
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The checker classes (i.e. classes derived from `CheckerBase` via the
utility template `Checker<...>`) act as intermediates between the user
and the analyzer engine, so they have two interfaces:
- On the frontend side, they have a public name, can be enabled or
disabled, can accept checker options and can be reported as the source
of bug reports.
- On the backend side, they can handle various checker callbacks and
they "leave a mark" on the `ExplodedNode`s that are created by them.
(These `ProgramPointTag` marks are internal: they appear in debug logs
and can be queried by checker logic; but the user doesn't see them.)
In a significant majority of the checkers there is 1:1 correspondence
between these sides, but there are also many checker classes where
several related user-facing checkers share the same backend class.
Historically each of these "multi-part checker" classes had its own
hacks to juggle its multiple names, which led to lots of ugliness like
lazy initialization of `mutable std::unique_ptr<BugType>` members and
redundant data members (when a checker used its custom `CheckNames`
array and ignored the inherited single `Name`).
My recent commit 27099982da2f5a6c2d282d6b385e79d080669546 tried to unify
and standardize these existing solutions to get rid of some of the
technical debt, but it still used enum values to identify the checker
parts within a "multi-part" checker class, which led to some ugliness.
This commit introduces a new framework which takes a more direct,
object-oriented approach: instead of identifying checker parts with
`{parent checker object, index of part}` pairs, the parts of a
multi-part checker become stand-alone objects that store their own name
(and enabled/disabled status) as a data member.
This is implemented by separating the functionality of `CheckerBase`
into two new classes: `CheckerFrontend` and `CheckerBackend`. The name
`CheckerBase` is kept (as a class derived from both `CheckerFrontend`
and `CheckerBackend`), so "simple" checkers that use `CheckerBase` and
`Checker<...>` continues to work without changes. However we also get
first-class support for the "many frontends - one backend" situation:
- The class `CheckerFamily<...>` works exactly like `Checker<...>` but
inherits from `CheckerBackend` instead of `CheckerBase`, so it won't
have a superfluous single `Name` member.
- Classes deriving from `CheckerFamily` can freely own multiple
`CheckerFrontend` data members, which are enabled within the
registration methods corresponding to their name and can be used to
initialize the `BugType`s that they can emit.
In this scheme each `CheckerFamily` needs to override the pure virtual
method `ProgramPointTag::getTagDescription()` which returns a string
which represents that class for debugging purposes. (Previously this
used the name of one arbitrary sub-checker, which was passable for
debugging purposes, but not too elegant.)
I'm planning to implement follow-up commits that convert all the
"multi-part" checkers to this `CheckerFamily` framework.
Previously the class `ExplodedGraph` had a data member called `Roots`
which could (in theory) store multiple root nodes -- but in practice
exploded graphs always had at most one root node (zero was possible for
empty and partially copied graphs) and introducing a graph with multiple
roots would've caused severe malfuncitons (e.g. in code that used the
pattern `*roots_begin()` to refer to _the_ root node).
I don't see any practical use case for adding multiple root nodes in a
single graph (this seems to be yet another of the "generalize for the
sake of generalization" decisions which were common in the early history
of the analyzer), so this commit replaces the vector `Roots` with
`ExplodedNode *Root` (which may be null when the graph is empty or under
construction).
Note that the complicated logic of `ExplodedGraph::trim` deserves a
through cleanup, but I left that for a follow-up commit.
So far CSA was relying on the LLVM Statistic package that allowed us to
gather some data about analysis of an entire translation unit. However,
the translation unit consists of a collection of loosely related entry
points. Aggregating data across multiple such entry points is often
counter productive.
This change introduces a new lightweight always-on facility to collect
Boolean or numerical statistics for each entry point and dump them in a
CSV format. Such format makes it easy to aggregate data across multiple
translation units and analyze it with common data-processing tools.
We break down the existing statistics that were collected on the per-TU
basis into values per entry point.
Additionally, we enable the statistics unconditionally (STATISTIC ->
ALWAYS_ENABLED_STATISTIC) to facilitate their use (you can gather the
data with a simple run-time flag rather than having to recompile the
analyzer). These statistics are very light and add virtually no
overhead.
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
CPP-6160
The method name `getCheckerName` would imply "get the name of the
checker associated with `this`", so it's suitable for e.g.
`BugType::getCheckerName` -- but a method that just "gets the name of
`this`" should be simply called `getName`.
This change eliminates the redundant and ugly pattern
`Checker->getCheckerName()` and helps to visually distinguish this
method from `BugType::getCheckerName`.
In the constructor of `BugType` the call of this method was completely
removed (instead of just changing the name) because that call was dead
code (when the data member `Checker` is non-null, the string stored in
`CheckerName` is irrelevant) and was often querying the name of the
checker before it was properly initialized.
Moreover, in `ReturnValueChecker.cpp` the static helper function
`getName` (which gets a function name from a `CallEvent`) was renamed to
the more specific `getFunctionName` to avoid the name collision.
This change is yet another cleanup commit before my planned changes that
would add support for multi-part checkers to this method.
`CheckerNameRef` is a trivial wrapper around a `StringRef` which is
guaranteed to be owned by the `CheckerRegistry` (the only `friend` of
the class) because other code can't call the private constructor.
This class had offered two ways to recover the plain `StringRef`: an an
`operator StringRef()` for implicit conversion and a method `StringRef
getName()` which could be called explicitly.
However this method name was really confusing, because it implies "get
the name of this object" instead of "get this name as a plain
`StringRef`"; so I removed it from the codebase and used
`static_cast<StringRef>` in the two locations where the cast wasn't
performed implicitly.
This commit "prepares the ground" for planned improvements in checker
name handling.
Well, yes. It's not pretty.
At least after this we would have a bit more unique pointers than
before.
This is for fixing the memory leak diagnosed by:
https://lab.llvm.org/buildbot/#/builders/24/builds/5580
And that caused the revert of #127409.
After these uptrs that patch can re-land finally.
The real paths resolves symlinks and makes the tests fail when the
filesystem is a symlink tree over a content-addressable storage (our
internal environment).
Specifically, add a scope for
- each work-list step,
- each entry point,
- each checker run within a step, and
- bug-suppression phase at the end of the analysis of an entry-point.
These scopes add no perceptible run-time overhead when time-tracing is
disabled. You can enable it and generate a time trace using the
`-ftime-trace=file.json` option.
See also the RFC:
https://discourse.llvm.org/t/analyzer-rfc-ftime-trace-time-scopes-for-steps-and-entry-points/84343
--
CPP-6065
The motivating example: https://compiler-explorer.com/z/WjsxYfs43
```C++
#include <stdlib.h>
void inf_loop_break_callee() {
void* data = malloc(10);
while (1) {
(void)data; // line 3
break; // -> execution continues on line 3 ?!!
}
}
```
To correct the flow steps in this example (see the fixed version in the
added test case) I changed two things in the engine:
- Make `processCallExit` create a new StmtPoint only for return
statements. If the last non-jump statement is not a return statement,
e.g. `(void)data;`, it is no longer inserted in the exploded graph after
the function exit.
- Skip the purge program points. In the example above, purge
points are still inserted after the `break;` executes. Now, when the bug
reporter is looking for the next statement executed after the function
execution is finished, it will ignore the purge program points, so it
won't confusingly pick the `(void)data;` statement.
CPP-5778
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
Fix some false negatives of StackAddrEscapeChecker:
- Output parameters
```
void top(int **out) {
int local = 42;
*out = &local; // Noncompliant
}
```
- Indirect global pointers
```
int **global;
void top() {
int local = 42;
*global = &local; // Noncompliant
}
```
Note that now StackAddrEscapeChecker produces a diagnostic if a function
with an output parameter is analyzed as top-level or as a callee. I took
special care to make sure the reports point to the same primary location
and, in many cases, feature the same primary message. That is the
motivation to modify Core/BugReporter.cpp and Core/ExplodedGraph.cpp
To avoid false positive reports when a global indirect pointer is
assigned a local address, invalidated, and then reset, I rely on the
fact that the invalidation symbol will be a DerivedSymbol of a
ConjuredSymbol that refers to the same memory region.
The checker still has a false negative for non-trivial escaping via a
returned value. It requires a more sophisticated traversal akin to
scanReachableSymbols, which out of the scope of this change.
CPP-4734
---------
This is the last of the 3 stacked PRs, it must not be merged before
https://github.com/llvm/llvm-project/pull/105652 and
https://github.com/llvm/llvm-project/pull/105653
Previously there were certain situations where
alpha.security.ArrayBoundV2 produced lots of very similar and redundant
reports that only differed in their full `Description` that contained
the (negative) byte offset value. (See
https://github.com/llvm/llvm-project/issues/86969 for details.)
This change updates the `Profile()` method of `PathSensitiveBugReport`
to ensure that it uses `getShortDescription()` instead of the full
`Description` so the standard report deduplication eliminates most of
these redundant reports.
Note that the effects of this change are very limited because there are
very few checkers that specify a separate short description, and so
`getShortDescription()` practically always defaults to returning the
full `Description`.
For the sake of consistency `BasicBugReport::Profile()` is also updated
to use the short description. (Right now there are no checkers that use
`BasicBugReport` with separate long and short descriptions.)
This commit also includes some small code quality improvements in
`ArrayBoundV2` that are IMO too trivial to be moved into a separate
commit.
This is exactly as originally landed in #95129,
but now the minimal Z3 version was increased to meet this change in #96682.
https://discourse.llvm.org/t/bump-minimal-z3-requirements-from-4-7-1-to-4-8-9/79664/4
---
This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520
As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in at
most 1 second.
The heuristic should have only really marginal observable impact -
except for the cases when we had big report eqclasses with long-running
(15s) Z3 queries, where previously CSA effectively halted. After this
patch, CSA will tackle such extreme cases as well.
(cherry picked from commit eacc3b3504be061f7334410dd0eb599688ba103a)
This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520
As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in
at most 1 second.
The heuristic should have only really marginal observable impact -
except for the cases when we had big report eqclasses with long-running
(15s) Z3 queries, where previously CSA effectively halted.
After this patch, CSA will tackle such extreme cases as well.
Reviewers: NagyDonat, haoNoQ, Xazax-hun, Szelethus, mikhailramalho
Reviewed By: NagyDonat
Pull Request: https://github.com/llvm/llvm-project/pull/95129
When debugging CSA issues, sometimes it would be useful to have a
dedicated note for the analysis entry point, aka. the function name you
would need to pass as "-analyze-function=XYZ" to reproduce a specific
issue.
One way we use (or will use) this downstream is to provide tooling on
top of creduce to enhance to supercharge productivity by automatically
reduce cases on crashes for example.
This will be added only if the "-analyzer-note-analysis-entry-points" is
set or the "analyzer-display-progress" is on.
This additional entry point marker will be the first "note" if enabled,
with the following message: "[debug] analyzing from XYZ". They are
prefixed by "[debug]" to remind the CSA developer that this is only
meant to be visible for them, for debugging purposes.
CPP-5012
There are currently a few checkers that don't fill in the bug report's
"decl-with-issue" field (typically a function in which the bug is
found).
The new attribute `[[clang::suppress]]` uses decl-with-issue to reduce
the size of the suppression source range map so that it didn't need to
do that for the entire translation unit.
I'm already seeing a few problems with this approach so I'll probably
redesign it in some point as it looks like a premature optimization. Not
only checkers shouldn't be required to pass decl-with-issue (consider
clang-tidy checkers that never had such notion), but also it's not
necessarily uniquely determined (consider leak suppressions at
allocation site).
For now I'm adding a simple stop-gap solution that falls back to
building the suppression map for the entire TU whenever decl-with-issue
isn't specified. Which won't happen in the default setup because luckily
all default checkers do provide decl-with-issue.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
The new attribute can be placed on statements in order to suppress
arbitrary warnings produced by static analysis tools at those statements.
Previously such suppressions were implemented as either informal comments
(eg. clang-tidy `// NOLINT:`) or with preprocessor macros (eg.
clang static analyzer's `#ifdef __clang_analyzer__`). The attribute
provides a universal, formal, flexible and neat-looking suppression mechanism.
Implement support for the new attribute in the clang static analyzer;
clang-tidy coming soon.
The attribute allows specifying which specific warnings to suppress,
in the form of free-form strings that are intended to be specific to
the tools, but currently none are actually supported; so this is also
going to be a future improvement.
Differential Revision: https://reviews.llvm.org/D93110
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
...because it provides no useful functionality compared to its base
class `BugType`.
A long time ago there were substantial differences between `BugType` and
`BuiltinBug`, but they were eliminated by commit 1bd58233 in 2009 (!).
Since then the only functionality provided by `BuiltinBug` was that it
specified `categories::LogicError` as the bug category and it stored an
extra data member `desc`.
This commit sets `categories::LogicError` as the default value of the
third argument (bug category) in the constructors of BugType and
replaces use of the `desc` field with simpler logic.
Note that `BugType` has a data member `Description` and a non-virtual
method `BugType::getDescription()` which queries it; these are distinct
from the member `desc` of `BuiltinBug` and the identically named method
`BuiltinBug::getDescription()` which queries it. This confusing name
collision was a major motivation for the elimination of `BuiltinBug`.
As this commit touches many files, I avoided functional changes and left
behind FIXME notes to mark minor issues that should be fixed later.
Differential Revision: https://reviews.llvm.org/D158855
I'm involved with the Static Analyzer for the most part.
I think we should embrace newer language standard features and gradually
move forward.
Differential Revision: https://reviews.llvm.org/D154325
If ignored, the subexpr is a UnaryOperator (&) which cannot be evaluated
(assertion failed).
#define offsetof(type,memb) ((unsigned long)&((type*)0)->memb)
Patch By danix800!
Differential Revision: https://reviews.llvm.org/D144780
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
It turns out llvm::isa<> is variadic, and we could have used this at a
lot of places.
The following patterns:
x && isa<T1>(x) || isa<T2>(x) ...
Will be replaced by:
isa_and_non_null<T1, T2, ...>(x)
Sometimes it caused further simplifications, when it would cause even
more code smell.
Aside from this, keep in mind that within `assert()` or any macro
functions, we need to wrap the isa<> expression within a parenthesis,
due to the parsing of the comma.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111982
This commit adds a very first version of this feature.
It is off by default and has to be turned on by checking the
corresponding box. For this reason, HTML reports still keep
control notes (aka grey bubbles).
Further on, we plan on attaching arrows to events and having all arrows
not related to a currently selected event barely visible. This will
help with reports where control flow goes back and forth (eg in loops).
Right now, it can get pretty crammed with all the arrows.
Differential Revision: https://reviews.llvm.org/D92639
This patch handles the `std::swap` function specialization
for `std::unique_ptr`. Implemented to be very similar to
how `swap` method is handled
Differential Revision: https://reviews.llvm.org/D104300
`PathSensitiveBughReport` has a function to mark a symbol as interesting but
it was not possible to clear this flag. This can be useful in some cases,
so the functionality is added.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D105637
D66572 separated BugReport and BugReporter into basic and path sensitive
versions. As a result, checker silencing, which worked deep in the path
sensitive report generation facilities became specific to it. DeadStoresChecker,
for instance, despite being in the static analyzer, emits non-pathsensitive
reports, and was impossible to silence.
This patch moves the corresponding code before the call to the virtual function
generateDiagnosticForConsumerMap (which is overriden by the specific kinds of
bug reporters). Although we see bug reporting as relatively lightweight compared
to the analysis, this will get rid of several steps we used to throw away.
Quoting from D65379:
At a very high level, this consists of 3 steps:
For all BugReports in the same BugReportEquivClass, collect all their error
nodes in a set. With that set, create a new, trimmed ExplodedGraph whose leafs
are all error nodes.
Until a valid report is found, construct a bug path, which is yet another
ExplodedGraph, that is linear from a given error node to the root of the graph.
Run all visitors on the constructed bug path. If in this process the report got
invalidated, start over from step 2.
Checker silencing used to kick in after all of these. Now it does before any of
them :^)
Differential Revision: https://reviews.llvm.org/D102914
Change-Id: Ice42939304516f2bebd05a1ea19878b89c96a25d
The majority of all `addVisitor` callers follow the same pattern:
addVisitor(std::make_unique<SomeVisitor>(arg1, arg2, ...));
This patches introduces additional overload for `addVisitor` to simplify
that pattern:
addVisitor<SomeVisitor>(arg1, arg2, ...);
Differential Revision: https://reviews.llvm.org/D103457
The program point created by the checker, even if it is an error node,
might not be the same as the name under which the report is emitted.
Make sure we're checking the name of the checker, because thats what
we're silencing after all.
Differential Revision: https://reviews.llvm.org/D102683