120 Commits

Author SHA1 Message Date
Donát Nagy
9600a12f0d
[analyzer] Workaround for slowdown spikes (unintended scope increase) (#136720)
Recently some users reported that they observed large increases of
runtime (up to +600% on some translation units) when they upgraded to a
more recent (slightly patched, internal) clang version. Bisection
revealed that the bulk of this increase was probably caused by my
earlier commit bb27d5e5c6b194a1440b8ac4e5ace68d0ee2a849 ("Don't assume
third iteration in loops").

As I evaluated that earlier commit on several open source project, it
turns out that on average it's runtime-neutral (or slightly helpful: it
reduced the total analysis time by 1.5%) but it can cause runtime spikes
on some code: in particular it more than doubled the time to analyze
`tmux` (one of the smaller test projects).

Further profiling and investigation proved that these spikes were caused
by an _increase of analysis scope_ because there was an heuristic that
placed functions on a "don't inline this" blacklist if they reached the
`-analyzer-max-loop` limit (anywhere, on any one execution path) --
which became significantly rarer when my commit ensured the analyzer no
longer "just assumes" four iterations. (With more inlining significantly
more entry points use up their allocated budgets, which leads to the
increased runtime.)

I feel that this heuristic for the "don't inline" blacklist is
unjustified and arbitrary, because reaching the "retry without inlining"
limit on one path does not imply that inlining the function won't be
valuable on other paths -- so I hope that we can eventually replace it
with more "natural" limits of the analysis scope.

However, the runtime increases are annoying for the users whose project
is affected, so I created this quick workaround commit that approximates
the "don't inline" blacklist effects of ambiguous loops (where the
analyzer doesn't understand the loop condition) without fully reverting
the "Don't assume third iteration" commit (to avoid reintroducing the
false positives that were eliminated by it).

Investigating this issue was a team effort: I'm grateful to Endre Fülöp
(gamesh411) who did the bisection and shared his time measurement setup,
and Gábor Tóthvári (tigbr) who helped me in profiling.
2025-05-12 10:56:29 +02:00
Donát Nagy
03adb0ec7d
[analyzer] Remove deprecated option VirtualCall:PureOnly (#131823)
VirtualCallChecker.cpp implements two related checkers:
- `optin.cplusplus.VirtualCall` which reports situations when
constructors or destructors call virtual methods (which is bugprone
because it does not trigger virtual dispatch, but can be legitmate).
- `cplusplus.PureVirtualCall` reports situations when constructors or
destructors call _pure_ virtual methods, which is an error.

Six years ago these two bug types were both reported by the same checker
(called `optin.cplusplus.VirtualCall`) and it had an option called
`PureOnly` which limited its output to the pure case.

When (in 2019) the two checker parts were separated by the commit
d3971fe97b64785c079d64bf4c8c3e2b5e1f85a1, the option `PureOnly` was
preserved for the sake of compatibility, but it is no longer useful
(when it is set to true, it just suppresses all reports from
`optin.cplusplus.VirtualCall`) so it was marked as deprecated.

I'm removing this deprecated option now because it is no longer relevant
and its presence caused minor complications when I was porting
`VirtualCallChecker.cpp` to the new multipart checker framework
(introduced in 27099982da2f5a6c2d282d6b385e79d080669546).
2025-03-19 18:22:00 +01:00
Arseniy Zaostrovnykh
57e36419b2
[analyzer] Introduce per-entry-point statistics (#131175)
So far CSA was relying on the LLVM Statistic package that allowed us to
gather some data about analysis of an entire translation unit. However,
the translation unit consists of a collection of loosely related entry
points. Aggregating data across multiple such entry points is often
counter productive.

This change introduces a new lightweight always-on facility to collect
Boolean or numerical statistics for each entry point and dump them in a
CSV format. Such format makes it easy to aggregate data across multiple
translation units and analyze it with common data-processing tools.

We break down the existing statistics that were collected on the per-TU
basis into values per entry point.

Additionally, we enable the statistics unconditionally (STATISTIC ->
ALWAYS_ENABLED_STATISTIC) to facilitate their use (you can gather the
data with a simple run-time flag rather than having to recompile the
analyzer). These statistics are very light and add virtually no
overhead.

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
CPP-6160
2025-03-17 08:23:31 +01:00
Balázs Kéri
da7403ed1d
[clang][analyzer] Add checker 'alpha.core.FixedAddressDereference' (#127191) 2025-03-03 10:44:05 +01:00
Balázs Benics
22a5bb32b7
[analyzer] Limit Store by region-store-binding-limit (#127602)
In our test pool, the max entry point RT was improved by this change:
1'181 seconds (~19.7 minutes) -> 94 seconds (1.6 minutes)

BTW, the 1.6 minutes is still really bad. But a few orders of magnitude
better than it was before.

This was the most servere RT edge-case as you can see from the numbers.
There are are more known RT bottlenecks, such as:

- Large environment sizes, and `removeDead`. See more about the failed
attempt on improving it at:
https://discourse.llvm.org/t/unsuccessful-attempts-to-fix-a-slow-analysis-case-related-to-removedead-and-environment-size/84650

- Large chunk of time could be spend inside `assume`, to reach a fixed
point. This is something we want to look into a bit later if we have
time.

We have 3'075'607 entry points in our test set.
About 393'352 entry points ran longer than 1 second when measured.

To give a sense of the distribution, if we ignore the slowest 500 entry
points, then the maximum entry point runs for about 14 seconds. These
500 slow entry points are in 332 translation units.

By this patch, out of the slowest 500 entry points, 72 entry points were
improved by at least 10x after this change.

We measured no RT regression on the "usual" entry points.


![slow-entrypoints-before-and-after-bind-limit](https://github.com/user-attachments/assets/44425a76-f1cb-449c-bc3e-f44beb8c5dc7)
(The dashed lines represent the maximum of their RT)

CPP-6092
2025-02-24 15:48:06 +01:00
Donát Nagy
edbc1fb228
[analyzer] Add option assume-at-least-one-iteration (#125494)
This commit adds the new analyzer option
`assume-at-least-one-iteration`, which is `false` by default, but can be
set to `true` to ensure that the analyzer always assumes at least one
iteration in loops.

In some situations this "loop is skipped" execution path is an important
corner case that may evade the notice of the developer and hide
significant bugs -- however, there are also many situations where it's
guaranteed that at least one iteration will happen (e.g. some data
structure is always nonempty), but the analyzer cannot realize this and
will produce false positives when it assumes that the loop is skipped.

This commit refactors some logic around the implementation of the new
feature, but the only functional change is introducing the new analyzer
option. If the new option is left in its default state (false), then the
analysis is functionally equivalent to an analysis done with a version
before this commit.
2025-02-12 11:56:02 +01:00
Balazs Benics
55391f85ac
[analyzer] Retry UNDEF Z3 queries 2 times by default (#120239)
If we have a refutation Z3 query timed out (UNDEF), allow a couple of
retries to improve stability of the query. By default allow 2 retries,
which will give us in maximum of 3 solve attempts per query.

Retries should help mitigating flaky Z3 queries.
See the details in the following RFC:

https://discourse.llvm.org/t/analyzer-rfc-retry-z3-crosscheck-queries-on-timeout/83711

Note that with each attempt, we spend more time per query.
Currently, we have a 15 seconds timeout per query - which are also in
effect for the retry attempts.

---

Why should this help?
In short, retrying queries should bring stability because if a query
runs long
it's more likely that it did so due to some runtime anomaly than it's on
the edge of succeeding. This is because most queries run quick, and the
queries that run long, usually run long by a fair amount.
Consequently, retries should improve the stability of the outcome of the
Z3 query.

In general, the retries shouldn't increase the overall analysis time
because it's really rare we hit the 0.1% of the cases when we would do
retries. But keep in mind that the retry attempts can add up if many
retries are allowed, or the individual query timeout is large.

CPP-5920
2025-01-06 18:08:12 +01:00
Kristóf Umann
ea8e328ae2
[analyzer][Z3] Restore the original timeout of 15s (#118291)
Discussion here:

https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus

The original patch, #97298 introduced new timeouts backed by thorough
testing and measurements to keep the running time of Z3 within
reasonable limits. The measurements also showed that only certain
reports and certain TUs were responsible for the poor performance of Z3
refutation.

Unfortunately, it seems like that on machines with different
characteristics (slower machines) the current timeouts don't just axe
0.01% of reports, but many more as well. Considering that timeouts are
inherently nondeterministic as a cutoff point, this lead reports sets
being vastly different on the same projects with the same configuration.
The discussion link shows that all configurations introduced in the
patch with their default values lead to severa nondeterminism of the
analyzer. As we, and others use the analyzer as a gating tool for PRs,
we should revert to the original defaults.

We should respect that
* There are still parts of the analyzer that are either proven or
suspected to contain nondeterministic code (like pointer sets),
* A 15s timeout is more likely to hit the same reports every time on a
wider range of machines, but is still inherently nondeterministic, but
an infinite timeout leads to the tool hanging,
* If you measure the performance of the analyzer on your machines, you
can and should achieve some speedup with little or no observable
nondeterminism.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-12-13 14:31:06 +01:00
Daniel Krupp
f82fb06cd1
[analyzer] Moving TaintPropagation checker out of alpha (#67352)
This commit moves the **alpha.security.taint.TaintPropagation** and
**alpha.security.taint.GenericTaint** checkers to the **optin.taint**
optional package.

These checkers were stabilized and improved by recent commits thus 
they are ready for production use.
2024-09-26 14:00:13 +02:00
Balazs Benics
848658955a
[analyzer] Limit isTainted() by skipping complicated symbols (#105493)
As discussed in

https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10

Some `isTainted()` queries can blow up the analysis times, and
effectively halt the analysis under specific workloads.

We don't really have the time now to do a caching re-implementation of
`isTainted()`, so we need to workaround the case.

The workaround with the smallest blast radius was to limit what symbols
`isTainted()` does the query (by walking the SymExpr). So far, the
threshold 10 worked for us, but this value can be overridden using the
"max-tainted-symbol-complexity" config value.

This new option is "deprecated" from the getgo, as I expect this issue
to be fixed within the next few months and I don't want users to
override this value anyways. If they do, this message will let them know
that they are on their own, and the next release may break them (as we
no longer recognize this option if we drop it).

Mitigates #89720

CPP-5414
2024-08-21 14:24:56 +02:00
Balázs Kéri
9440a49b0f
[clang][analyzer] MmapWriteExecChecker improvements (#97078)
Read the 'mmap' flags from macro values and use a better test for the
error situation. Checker messages are improved too.
2024-07-26 10:10:36 +02:00
Balazs Benics
ae570d82e8
Reland "[analyzer] Harden safeguards for Z3 query times" (#97298)
This is exactly as originally landed in #95129,
but now the minimal Z3 version was increased to meet this change in #96682.

https://discourse.llvm.org/t/bump-minimal-z3-requirements-from-4-7-1-to-4-8-9/79664/4

---

This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in at
most 1 second.

The heuristic should have only really marginal observable impact -
except for the cases when we had big report eqclasses with long-running
(15s) Z3 queries, where previously CSA effectively halted. After this
patch, CSA will tackle such extreme cases as well.

(cherry picked from commit eacc3b3504be061f7334410dd0eb599688ba103a)
2024-07-01 17:22:24 +02:00
Balazs Benics
8fc9c03cde
[analyzer] Revert Z3 changes (#95916)
Requested in:
https://github.com/llvm/llvm-project/pull/95128#issuecomment-2176008007

Revert "[analyzer] Harden safeguards for Z3 query times"
Revert "[analyzer][NFC] Reorganize Z3 report refutation"

This reverts commit eacc3b3504be061f7334410dd0eb599688ba103a.
This reverts commit 89c26f6c7b0a6dfa257ec090fcf5b6e6e0c89aab.
2024-06-18 14:59:28 +02:00
Balazs Benics
eacc3b3504
[analyzer] Harden safeguards for Z3 query times
This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in
at most 1 second.

The heuristic should have only really marginal observable impact -
except for the cases when we had big report eqclasses with long-running
(15s) Z3 queries, where previously CSA effectively halted.
After this patch, CSA will tackle such extreme cases as well.

Reviewers: NagyDonat, haoNoQ, Xazax-hun, Szelethus, mikhailramalho

Reviewed By: NagyDonat

Pull Request: https://github.com/llvm/llvm-project/pull/95129
2024-06-18 09:48:22 +02:00
Balázs Kéri
09f160c629
[clang][analyzer] Move StreamChecker out of the alpha package. (#89247) 2024-04-30 09:01:45 +02:00
Balázs Kéri
c2067c1f47
[clang][analyzer] Add "pedantic" mode to StreamChecker. (#87322)
The checker may create failure branches for all stream write operations
only if the new option "pedantic" is set to true.
Result of the write operations is often not checked in typical code. If
failure branches are created the checker will warn for unchecked write
operations and generate a lot of "false positives" (these are valid
warnings but the programmer does not care about this problem).
2024-04-08 12:19:03 +02:00
Balázs Kéri
bbeb946652 [clang][analyzer] Change value of checker option in unix.StdCLibraryFunctions (second try). (#80457)
Default value of checker option `ModelPOSIX` is changed to `true`.
Documentation is updated.

This is a re-apply of commit 7af4e8bcc354d2bd7e46ecf547172b1f19ddde3e
that was reverted because a test failure (this is fixed now).
2024-03-04 15:28:20 +01:00
Balázs Kéri
da5966e0c1 Revert "[clang][analyzer] Change default value of checker option in unix.StdCLibraryFunctions. (#80457)"
This reverts commit 7af4e8bcc354d2bd7e46ecf547172b1f19ddde3e.
2024-03-04 09:50:36 +01:00
Balázs Kéri
7af4e8bcc3
[clang][analyzer] Change default value of checker option in unix.StdCLibraryFunctions. (#80457)
Default value of checker option `ModelPOSIX` is changed to `true`.
Documentation is updated.
2024-03-04 09:29:18 +01:00
Endre Fülöp
b98a594977
[clang][analyzer] Move security.cert.env.InvalidPtr out of alpha (#71912)
Thanks to recent improvements in #67663, InvalidPtr checker does not
emit any false positives on the following OS projects: memcached, tmux,
curl, twin, vim, openssl, sqlite, ffmpeg, postgres, tinyxml2, libwebm,
xerces, bitcoin, protobuf, qtbase, contour, acid, openrct2. (Before the
changes mentioned above, there were 27 reports, catching the `getenv`
invalidates previous `getenv` results cases. That strict behaviour is
disabled by default)
2023-11-24 10:02:56 +01:00
Balázs Kéri
72d3bf2b87
[clang][Analyzer] Move checker 'alpha.unix.Errno' to 'unix.Errno'. (#69469) 2023-11-21 13:34:03 +01:00
Endre Fülöp
f7a46d700f
[analyzer][clangsa] Add new option to alpha.security.cert.InvalidPtrChecker (#67663)
Introduce 'InvalidatingGetEnv' checker option for 'getenv' calls.

- POSIX suggests consecutive 'getenv' calls may invalidate 
  pointer pointers. This is often too strict in real-world scenarios.
- New 'InvalidatingGetEnv' checker option provides a more 
  pragmatic default that doesn't treat consecutive 'getenv' 
  calls as invalidating.
- Now also handles main function specifications with an 
  environment pointer as the third parameter.

Original Phabricator review:
https://reviews.llvm.org/D154603
2023-10-24 13:59:54 +02:00
Balázs Kéri
c202a17d02
[clang][analyzer] Move checker alpha.unix.StdCLibraryFunctions out of alpha. (#66207) 2023-10-16 14:51:05 +02:00
Donát Nagy
25b9696b61 [analyzer] Upstream BitwiseShiftChecker
This commit releases a checker that was developed to a stable level in
the Ericsson-internal fork of Clang Static Analyzer.

Note that the functionality of this checker overlaps with
core.UndefinedBinaryOperatorResult ("UBOR"), but there are several
differences between them:
(1) UBOR is only triggered when the constant folding performed by the
Clang Static Analyzer engine determines that the value of a binary
operator expression is undefined; this checker can report issues where
the operands are not constants.
(2) UBOR has unrelated checks for handling other binary operators, this
checker only examines bitwise shifts.
(3) This checker has a Pedantic flag and by default does not report
expressions (e.g. -2 << 2) that're undefined by the standard but
consistently supported in practice.
(4) UBOR exhibits buggy behavior in code that involves cast expressions,
e.g.
    void foo(unsigned short s) {
      if (s == 2) {
        (void) ((unsigned int) s) << 16;
      }
    }

Later it would be good to eliminate this overlap (perhaps by deprecating
and then eliminating the bitwise shift handling in UBOR), but in my
opinion that belongs to separate commits.

Differential Revision: https://reviews.llvm.org/D156312

Co-authored-by: Endre Fulop <endre.fulop@sigmatechnology.se>
2023-08-18 10:47:05 +02:00
Balazs Benics
7cd1f3ad22 [analyzer] Remove deprecated analyzer-config options
The `consider-single-element-arrays-as-flexible-array-members` analyzer
option was deprecated in clang-16, and now removed from clang-17 as
promised in
https://releases.llvm.org/16.0.0/tools/clang/docs/ReleaseNotes.html#static-analyzer

This shouldn't change observable behavior.

Differential Revision: https://reviews.llvm.org/D154481
2023-07-07 13:24:33 +02:00
Balázs Kéri
4f0436dd15 [clang][analyzer] Merge apiModeling.StdCLibraryFunctions and StdCLibraryFunctionArgs checkers into one.
Main reason for this change is that these checkers were implemented in the same class
but had different dependency ordering. (NonNullParamChecker should run before StdCLibraryFunctionArgs
to get more special warning about null arguments, but the apiModeling.StdCLibraryFunctions was a modeling
checker that should run before other non-modeling checkers. The modeling checker changes state in a way
that makes it impossible to detect a null argument by NonNullParamChecker.)
To make it more simple, the modeling part is removed as separate checker and can be only used if
checker StdCLibraryFunctions is turned on, that produces the warnings too. Modeling the functions
without bug detection (for invalid argument) is not possible. The modeling of standard functions
does not happen by default from this change on.

Reviewed By: Szelethus

Differential Revision: https://reviews.llvm.org/D151225
2023-06-01 09:54:35 +02:00
Balazs Benics
3648175839 [analyzer] Consider single-elem arrays as FAMs by default
According to my measurement in https://reviews.llvm.org/D108230#3933232,
it seems like there is no drawback to enabling this analyzer-config by default.

Actually, enabling this by default would make it consistent with the
codegen of clang, which according to `-fstrict-flex-arrays`, assumes
by default that all trailing arrays could be FAMs, let them be of size
undefined, zero, one, or anything else.

Speaking of `-fstrict-flex-arrays`, in the next patch I'll deprecate
the analyzer-config FAM option in favor of that flag. That way, CSA will
always be in sync with what the codegen will think of FAMs.

So, if a new codebase sets `-fstrict-flex-arrays` to some value above 0,
CSA will also make sure that only arrays of the right size will be
considered as FAMs.

Reviewed By: xazax.hun

Differential Revision: https://reviews.llvm.org/D138657
2022-11-25 10:24:56 +01:00
Balázs Kéri
60f3b07118 [clang][analyzer] Add checker for bad use of 'errno'.
Extend checker 'ErrnoModeling' with a state of 'errno' to indicate
the importance of the 'errno' value and how it should be used.
Add a new checker 'ErrnoChecker' that observes use of 'errno' and
finds possible wrong uses, based on the "errno state".
The "errno state" should be set (together with value of 'errno')
by other checkers (that perform modeling of the given function)
in the future. Currently only a test function can set this value.
The new checker has no user-observable effect yet.

Reviewed By: martong, steakhal

Differential Revision: https://reviews.llvm.org/D122150
2022-06-20 10:07:31 +02:00
isuckatcs
92bf652d40 [Static Analyzer] Small array binding policy
If a lazyCompoundVal to a struct is bound to the store, there is a policy which decides
whether a copy gets created instead.

This patch introduces a similar policy for arrays, which is required to model structured
binding to arrays without false negatives.

Differential Revision: https://reviews.llvm.org/D128064
2022-06-17 18:56:13 +02:00
Gabor Marton
56b9b97c1e [clang][analyzer][ctu] Make CTU a two phase analysis
This new CTU implementation is the natural extension of the normal single TU
analysis. The approach consists of two analysis phases. During the first phase,
we do a normal single TU analysis. During this phase, if we find a foreign
function (that could be inlined from another TU) then we don’t inline that
immediately, we rather mark that to be analysed later.
When the first phase is finished then we start the second phase, the CTU phase.
In this phase, we continue the analysis from that point (exploded node)
which had been enqueued during the first phase. We gradually extend the
exploded graph of the single TU analysis with the new node that was
created by the inlining of the foreign function.

We count the number of analysis steps of the first phase and we limit the
second (ctu) phase with this number.

This new implementation makes it convenient for the users to run the
single-TU and the CTU analysis in one go, they don't need to run the two
analysis separately. Thus, we name this new implementation as "onego" CTU.

Discussion:
https://discourse.llvm.org/t/rfc-much-faster-cross-translation-unit-ctu-analysis-implementation/61728

Differential Revision: https://reviews.llvm.org/D123773
2022-05-18 10:35:52 +02:00
Vince Bridgers
3566bbe62f [analyzer] Add option for AddrSpace in core.NullDereference check
This change adds an option to detect all null dereferences for
    non-default address spaces, except for address spaces 256, 257 and 258.
    Those address spaces are special since null dereferences are not errors.

    All address spaces can be considered (except for 256, 257, and 258) by
    using -analyzer-config
    core.NullDereference:DetectAllNullDereferences=true. This option is
    false by default, retaining the original behavior.

    A LIT test was enhanced to cover this case, and the rst documentation
    was updated to describe this behavior.

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D122841
2022-04-24 03:51:49 -05:00
Denys Petrov
d835dd4cf5 [analyzer] Produce SymbolCast symbols for integral types in SValBuilder::evalCast
Summary: Produce SymbolCast for integral types in `evalCast` function. Apply several simplification techniques while producing the symbols. Added a boolean option `handle-integral-cast-for-ranges` under `-analyzer-config` flag. Disabled the feature by default.

Differential Revision: https://reviews.llvm.org/D105340
2022-01-18 16:08:04 +02:00
Balazs Benics
9873ef409c [analyzer] Ignore flex generated files
Some projects [1,2,3] have flex-generated files besides bison-generated
ones.
Unfortunately, the comment `"/* A lexical scanner generated by flex */"`
generated by the tools is not necessarily at the beginning of the file,
thus we need to quickly skim through the file for this needle string.

Luckily, StringRef can do this operation in an efficient way.

That being said, now the bison comment is not required to be at the very
beginning of the file. This allows us to detect a couple more cases
[4,5,6].

Alternatively, we could say that we only allow whitespace characters
before matching the bison/flex header comment. That would prevent the
(probably) unnecessary string search in the buffer. However, I could not
verify that these tools would actually respect this assumption.

Additionally to this, e.g. the Twin project [1] has other non-whitespace
characters (some preprocessor directives) before the flex-generated
header comment. So the heuristic in the previous paragraph won't work
with that.
Thus, I would advocate the current implementation.

According to my measurement, this patch won't introduce measurable
performance degradation, even though we will do 2 linear scans.

I introduce the ignore-bison-generated-files and
ignore-flex-generated-files to disable skipping these files.
Both of these options are true by default.

[1]: https://github.com/cosmos72/twin/blob/master/server/rcparse_lex.cpp#L7
[2]: 22362cdcf9/sandbox/count-words/lexer.c (L6)
[3]: 11abdf6462/lab1/lex.yy.c (L6)

[4]: 47f5b2cfe2/B_yacc/1/y1.tab.h (L2)
[5]: 71d1bf9b1e/src/VBox/Additions/x11/x11include/xorg-server-1.8.0/parser.h (L2)
[6]: 3f773ceb13/Framework/OpenEars.framework/Versions/A/Headers/jsgf_parser.h (L2)

Reviewed By: xazax.hun

Differential Revision: https://reviews.llvm.org/D114510
2021-12-06 10:20:17 +01:00
Balazs Benics
edde4efc66 [analyzer] Introduce the assume-controlled-environment config option
If the `assume-controlled-environment` is `true`, we should expect `getenv()`
to succeed, and the result should not be considered tainted.
By default, the option will be `false`.

Reviewed By: NoQ, martong

Differential Revision: https://reviews.llvm.org/D111296
2021-10-13 10:50:26 +02:00
Kristóf Umann
9d359f6c73 [analyzer] MallocChecker: Add notes from NoOwnershipChangeVisitor only when a function "intents", but doesn't change ownership, enable by default
D105819 Added NoOwnershipChangeVisitor, but it is only registered when an
off-by-default, hidden checker option was enabled. The reason behind this was
that it grossly overestimated the set of functions that really needed a note:

std::string getTrainName(const Train *T) {
  return T->name;
} // note: Retuning without changing the ownership of or deallocating memory
// Umm... I mean duh? Nor would I expect this function to do anything like that...

void foo() {
  Train *T = new Train("Land Plane");
  print(getTrainName(T)); // note: calling getTrainName / returning from getTrainName
} // warn: Memory leak

This patch adds a heuristic that guesses that any function that has an explicit
operator delete call could have be responsible for deallocating the memory that
ended up leaking. This is waaaay too conservative (see the TODOs in the new
function), but it safer to err on the side of too little than too much, and
would allow us to enable the option by default *now*, and add refinements
one-by-one.

Differential Revision: https://reviews.llvm.org/D108753
2021-09-13 15:01:20 +02:00
Balazs Benics
91c07eb8ee [analyzer] Ignore single element arrays in getStaticSize() conditionally
Quoting https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html:
> In the absence of the zero-length array extension, in ISO C90 the contents
> array in the example above would typically be declared to have a single
> element.

We should not assume that the size of the //flexible array member// field has
a single element, because in some cases they use it as a fallback for not
having the //zero-length array// language extension.
In this case, the analyzer should return `Unknown` as the extent of the field
instead.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D108230
2021-09-04 10:19:57 +02:00
Artem Dergachev
7309359928 [analyzer] Fix scan-build report deduplication.
The previous behavior was to deduplicate reports based on md5 of the
html file. This algorithm might have worked originally but right now
HTML reports contain information rich enough to make them virtually
always distinct which breaks deduplication entirely.

The new strategy is to (finally) take advantage of IssueHash - the
stable report identifier provided by clang that is the same if and only if
the reports are duplicates of each other.

Additionally, scan-build no longer performs deduplication on its own.
Instead, the report file name is now based on the issue hash,
and clang instances will silently refuse to produce a new html file
when a duplicate already exists. This eliminates the problem entirely.

The '-analyzer-config stable-report-filename' option is deprecated
because report filenames are no longer unstable. A new option is
introduced, '-analyzer-config verbose-report-filename', to produce
verbose file names that look similar to the old "stable" file names.
The old option acts as an alias to the new option.

Differential Revision: https://reviews.llvm.org/D105167
2021-08-26 13:34:29 -07:00
Kristóf Umann
2d3668c997 [analyzer] MallocChecker: Add a visitor to leave a note on functions that could have, but did not change ownership on leaked memory
This is a rather common feedback we get from out leak checkers: bug reports are
really short, and are contain barely any usable information on what the analyzer
did to conclude that a leak actually happened.

This happens because of our bug report minimizing effort. We construct bug
reports by inspecting the ExplodedNodes that lead to the error from the bottom
up (from the error node all the way to the root of the exploded graph), and mark
entities that were the cause of a bug, or have interacted with it as
interesting. In order to make the bug report a bit less verbose, whenever we
find an entire function call (from CallEnter to CallExitEnd) that didn't talk
about any interesting entity, we prune it (click here for more info on bug
report generation). Even if the event to highlight is exactly this lack of
interaction with interesting entities.

D105553 generalized the visitor that creates notes for these cases. This patch
adds a new kind of NoStateChangeVisitor that leaves notes in functions that
took a piece of dynamically allocated memory that later leaked as parameter,
and didn't change its ownership status.

Differential Revision: https://reviews.llvm.org/D105553
2021-08-16 16:19:00 +02:00
Gabor Marton
d12d0b73f1 [analyzer] Add CTUImportCppThreshold for C++ files
Summary:
The default CTUImportThreshold (8) seems to be too conservative with C projects.
We increase this value to 24 and we introduce another threshold for C++ source
files (defaulted to 8) because their AST is way more compilcated than C source
files.

Differential Revision: https://reviews.llvm.org/D83475
2020-07-09 15:36:33 +02:00
Nithin Vadukkumchery Rajendrakumar
20e271a98d [analyzer] Warning for default constructed unique_ptr dereference
Summary: Add support for warning incase of default constructed unique pointer dereferences

Reviewed By: NoQ, Szelethus, vsavchenko, xazax.hun

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81315
2020-07-08 09:51:02 +02:00
Gabor Marton
db4d5f7048 [analyzer][StdLibraryFunctionsChecker] Add POSIX file handling functions
Adding file handling functions from the POSIX standard (2017).
A new checker option is introduced to enable them.
In follow-up patches I am going to upstream networking, pthread, and other
groups of POSIX functions.

Differential Revision: https://reviews.llvm.org/D82288
2020-07-02 14:28:05 +02:00
Endre Fülöp
52f6532366 [analyzer][CrossTU] Lower CTUImportThreshold default value
Summary:
The default value of 100 makes the analysis slow. Projects of considerable
size can take more time to finish than it is practical. The new default
setting of 8 is based on the analysis of LLVM itself. With the old default
value of 100 the analysis time was over a magnitude slower. Thresholding the
load of ASTUnits is to be extended in the future with a more fine-tuneable
solution that accomodates to the specifics of the project analyzed.

Reviewers: martong, balazske, Szelethus

Subscribers: whisperity, xazax.hun, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, Charusso, steakhal, ASDenysPetrov, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D82561
2020-07-01 10:08:52 +02:00
Nithin Vadukkumchery Rajendrakumar
37c1bf21d1 [analyzer] Enable constructor support in evalCall event.
Pass EvalCallOptions via runCheckersForEvalCall into defaultEvalCall.
Update the AnalysisOrderChecker to support evalCall for testing.

Differential Revision: https://reviews.llvm.org/D82256
2020-06-25 09:47:13 -07:00
Endre Fülöp
5cc18516c4 [analyzer] On-demand parsing capability for CTU
Summary:
Introduce on-demand parsing of needed ASTs during CTU analysis.
The index-file format is extended, and analyzer-option CTUInvocationList
is added to specify the exact invocations needed to parse the needed
source-files.

Reviewers: martong, balazske, Szelethus, xazax.hun, whisperity

Reviewed By: martong, xazax.hun

Subscribers: gribozavr2, thakis, ASDenysPetrov, ormris, mgorny, whisperity, xazax.hun, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, Charusso, steakhal, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75665
2020-06-11 10:56:59 +02:00
Endre Fülöp
435b458ad0 Revert "[analyzer] On-demand parsing capability for CTU"
This reverts commit 97e07d0c352ca469eb07a0cb3162c2807ff1099d.
Reason: OSX broke for a different reason, this really only seem to work
on linux and very generic windows builds
2020-06-10 17:55:37 +02:00
Endre Fülöp
97e07d0c35 [analyzer] On-demand parsing capability for CTU
Summary:
Introduce on-demand parsing of needed ASTs during CTU analysis.
The index-file format is extended, and analyzer-option CTUInvocationList
is added to specify the exact invocations needed to parse the needed
source-files.

Reviewers: martong, balazske, Szelethus, xazax.hun, whisperity

Reviewed By: martong, xazax.hun

Subscribers: gribozavr2, thakis, ASDenysPetrov, ormris, mgorny, whisperity, xazax.hun, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, Charusso, steakhal, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75665
2020-06-10 13:43:51 +02:00
Endre Fülöp
c640779494 Revert "[analyzer] On-demand parsing capability for CTU"
This reverts commit 020815fafd15ddac0f2b5539e7766107d7b25ddc.
Reason: PS4 buildbot broke
2020-06-10 10:30:10 +02:00
Endre Fülöp
020815fafd [analyzer] On-demand parsing capability for CTU
Summary:
Introduce on-demand parsing of needed ASTs during CTU analysis.
The index-file format is extended, and analyzer-option CTUInvocationList
is added to specify the exact invocations needed to parse the needed
source-files.

Reviewers: martong, balazske, Szelethus, xazax.hun, whisperity

Reviewed By: martong, xazax.hun

Subscribers: gribozavr2, thakis, ASDenysPetrov, ormris, mgorny, whisperity, xazax.hun, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, Charusso, steakhal, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75665
2020-06-10 08:59:04 +02:00
Kirstóf Umann
6f5431846b [analyzer][RetainCount] Remove the CheckOSObject option
As per http://lists.llvm.org/pipermail/cfe-dev/2019-August/063215.html, lets get rid of this option.

It presents 2 issues that have bugged me for years now:

* OSObject is NOT a boolean option. It in fact has 3 states:
  * osx.OSObjectRetainCount is enabled but OSObject it set to false: RetainCount
    regards the option as disabled.
  * sx.OSObjectRetainCount is enabled and OSObject it set to true: RetainCount
    regards the option as enabled.
  * osx.OSObjectRetainCount is disabled: RetainCount regards the option as
    disabled.
* The hack involves directly modifying AnalyzerOptions::ConfigTable, which
  shouldn't even be public in the first place.

This still isn't really ideal, because it would be better to preserve the option
and remove the checker (we want visible checkers to be associated with
diagnostics, and hidden options like this one to be associated with changing how
the modeling is done), but backwards compatibility is an issue.

Differential Revision: https://reviews.llvm.org/D78097
2020-05-26 13:22:58 +02:00
Kirstóf Umann
1c8f999e0b [analyzer][CallAndMessage] Add checker options for each bug type
iAs listed in the summary D77846, we have 5 different categories of bugs we're
checking for in CallAndMessage. I think the documentation placed in the code
explains my thought process behind my decisions quite well.

A non-obvious change I had here is removing the entry for
CallAndMessageUnInitRefArg. In fact, I removed the CheckerNameRef typed field
back in D77845 (it was dead code), so that checker didn't really exist in any
meaningful way anyways.

Differential Revision: https://reviews.llvm.org/D77866
2020-05-21 15:31:37 +02:00