24 Commits

Author SHA1 Message Date
Donát Nagy
a807e8ea9f
[analyzer] Prettify checker registration and unittest code (#147797)
This commit tweaks the interface of `CheckerRegistry::addChecker` to
make it more practical for plugins and tests:
- The parameter `IsHidden` now defaults to `false` even in the
  non-templated overload (because setting it to true is unusual,
  especially in plugins).
- The parameter `DocsUri` defaults to the dummy placeholder string
  `"NoDocsUri"` because (as of now) nothing queries its value from the
  checker registry (it's only used by the logic that generates the
  clang-tidy documentation, but that loads it directly from `Checkers.td`
  without involving the `CheckerRegistry`), so there is no reason to
  demand specifying this value.

In addition to propagating these changes, this commit clarifies,
corrects and extends lots of comments and performs various minor code
quality improvements in the code of unit tests and example plugins.

I originally wrote the bulk of this commit when I was planning to add an
extra parameter to `addChecker` in order to implement some technical
details of the CheckerFamily framework. At the end I decided against
adding that extra parameter, so this cleanup was left out of the PR
https://github.com/llvm/llvm-project/pull/139256 and I'm merging it now
as a separate commit (after minor tweaks).

This commit is mostly NFC: the only functional change is that the
analyzer will be compatible with plugins that rely on the default
argument values and don't specify `IsHidden` or `DocsUri`. (But existing
plugin code will remain valid as well.)
2025-07-22 13:36:58 +02:00
Kazu Hirata
1cc9e6e5aa
[StaticAnalyzer] Use llvm::count (NFC) (#141370) 2025-05-24 14:45:46 -07:00
Kazu Hirata
f002f300c5
[clang] Remove unused local variables (NFC) (#138453) 2025-05-04 10:51:40 -07:00
Donát Nagy
58bad2862c
[analyzer][NFC] Require explicit matching mode for CallDescriptions (#92454)
This commit deletes the "simple" constructor of `CallDescription` which
did not require a `CallDescription::Mode` argument and always used the
"wildcard" mode `CDM::Unspecified`.

A few months ago, this vague matching mode was used by many checkers,
which caused bugs like https://github.com/llvm/llvm-project/issues/81597
and https://github.com/llvm/llvm-project/issues/88181. Since then, my
commits improved the available matching modes and ensured that all
checkers explicitly specify the right matching mode.

After those commits, the only remaining references to the "simple"
constructor were some unit tests; this commit updates them to use an
explicitly specified matching mode (often `CDM::SimpleFunc`).

The mode `CDM::Unspecified` was not deleted in this commit because it's
still a reasonable choice in `GenericTaintChecker` and a few unit tests.
2024-05-17 13:08:45 +02:00
NagyDonat
fb299cae51
[analyzer] Make recognition of hardened __FOO_chk functions explicit (#86536)
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.

Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.

Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.

This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.

This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).

The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.

After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.

This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.

After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.

This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.

As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-04-05 11:20:27 +02:00
NagyDonat
52a460f9d4
[analyzer] Refactor CallDescription match mode (NFC) (#83432)
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.

However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597

To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.

Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
 - `CDF_None` was the default "match anything" mode,
 - `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).

Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
2024-03-04 15:43:37 +01:00
isuckatcs
d65379c8d4 [analyzer] Remove the loop from the exploded graph caused by missing information in program points
This patch adds CFGElementRef to ProgramPoints
and helps the analyzer to differentiate between
two otherwise identically looking ProgramPoints.

Fixes #60412

Differential Revision: https://reviews.llvm.org/D143328
2023-03-04 02:01:45 +01:00
serge-sans-paille
d9ab3e82f3
[clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.

The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-27 09:55:19 +01:00
Vitaly Buka
aa171833ab Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information"
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4 (part 2)"
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4"

GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104
compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d

The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0.

This reverts commit caa713559bd38f337d7d35de35686775e8fb5175.
This reverts commit 06b90e2e9c991e211fecc97948e533320a825470.
This reverts commit e953ae5bbc313fd0cc980ce021d487e5b5199ea4.
2022-12-25 23:12:47 -08:00
serge-sans-paille
e953ae5bbc
[clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile
time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This is a recommit of 719d98dfa841c522d8d452f0685e503538415a53 that into
account a GGC issue (probably
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92181) when dealing with
intiailizer_list and constant expressions.

Workaround this by avoiding initializer list, at the expense of a
temporary plain old array.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-24 10:25:06 +01:00
serge-sans-paille
07d9ab9aa5
Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information"
There are still remaining issues with GCC 12, see for instance

https://lab.llvm.org/buildbot/#/builders/93/builds/12669

This reverts commit 5ce4e92264102de21760c94db9166afe8f71fcf6.
2022-12-23 13:29:21 +01:00
serge-sans-paille
5ce4e92264
[clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile
time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This is a recommit of 719d98dfa841c522d8d452f0685e503538415a53 with a
change to llvm/utils/TableGen/OptParserEmitter.cpp to cope with GCC bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158

Differential Revision: https://reviews.llvm.org/D139881
2022-12-23 12:48:17 +01:00
serge-sans-paille
b7065a31b5
Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information"
Failing builds: https://lab.llvm.org/buildbot#builders/9/builds/19030
This is GCC specific and has been reported upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158

This reverts commit 719d98dfa841c522d8d452f0685e503538415a53.
2022-12-23 11:36:56 +01:00
serge-sans-paille
719d98dfa8
[clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile
time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D139881
2022-12-23 10:31:47 +01:00
Kazu Hirata
a41fbb1fc2 [clang/unittests] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 12:14:19 -08:00
Kristóf Umann
32ac21d049 [NFC][analyzer] Allow CallDescriptions to be matched with CallExprs
Since CallDescriptions can only be matched against CallEvents that are created
during symbolic execution, it was not possible to use it in syntactic-only
contexts. For example, even though InnerPointerChecker can check with its set of
CallDescriptions whether a function call is interested during analysis, its
unable to check without hassle whether a non-analyzer piece of code also calls
such a function.

The patch adds the ability to use CallDescriptions in syntactic contexts as
well. While we already have that in Signature, we still want to leverage the
ability to use dynamic information when we have it (function pointers, for
example). This could be done with Signature as well (StdLibraryFunctionsChecker
does it), but it makes it even less of a drop-in replacement.

Differential Revision: https://reviews.llvm.org/D119004
2022-03-01 17:13:04 +01:00
Balazs Benics
abc873694f [analyzer] Restrict CallDescription fuzzy builtin matching
`CallDescriptions` for builtin functions relaxes the match rules
somewhat, so that the `CallDescription` will match for calls that have
some prefix or suffix. This was achieved by doing a `StringRef::contains()`.
However, this is somewhat problematic for builtins that are substrings
of each other.

Consider the following:

`CallDescription{ builtin, "memcpy"}` will match for
`__builtin_wmemcpy()` calls, which is unfortunate.

This patch addresses/works around the issue by checking if the
characters around the function's name are not part of the 'name'
semantically. In other words, to accept a match for `"memcpy"` the call
should not have alphanumeric (`[a-zA-Z]`) characters around the 'match'.

So, `CallDescription{ builtin, "memcpy"}` will not match on:

 - `__builtin_wmemcpy: there is a `w` alphanumeric character before the match.
 - `__builtin_memcpyFOoBar_inline`: there is a `F` character after the match.
 - `__builtin_memcpyX_inline`: there is an `X` character after the match.

But it will still match for:
 - `memcpy`: exact match
 - `__builtin_memcpy`: there is an _ before the match
 - `__builtin_memcpy_inline`: there is an _ after the match
 - `memcpy_inline_builtinFooBar`: there is an _ after the match

Reviewed By: NoQ

Differential Revision: https://reviews.llvm.org/D118388
2022-02-11 10:45:18 +01:00
Balazs Benics
0b9d3a6e53 [analyzer][NFC] Separate CallDescription from CallEvent
`CallDescriptions` deserve its own translation unit.
This patch simply moves the corresponding parts.
Also includes the `CallDescription.h` where it's necessary.

Reviewed By: martong, xazax.hun, Szelethus

Differential Revision: https://reviews.llvm.org/D113587
2021-11-15 19:10:46 +01:00
Balazs Benics
72d04d7b2b [analyzer] Allow matching non-CallExprs using CallDescriptions
Fallback to stringification and string comparison if we cannot compare
the `IdentifierInfo`s, which is the case for C++ overloaded operators,
constructors, destructors, etc.

Examples:
  { "std", "basic_string", "basic_string", 2} // match the 2 param std::string constructor
  { "std", "basic_string", "~basic_string" }  // match the std::string destructor
  { "aaa", "bbb", "operator int" } // matches the struct bbb conversion operator to int

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D111535
2021-10-18 14:57:24 +02:00
Balazs Benics
5644d15257 [analyzer][NFC] Add unittests for CallDescription and split the old ones
This NFC change accomplishes three things:
1) Splits up the single unittest into reasonable segments.
2) Extends the test infra using a template to select the AST-node
   from which it is supposed to construct a `CallEvent`.
3) Adds a *lot* of different tests, documenting the current
   capabilities of the `CallDescription`. The corresponding tests are
   marked with `FIXME`s, where the current behavior should be different.

Both `CXXMemberCallExpr` and `CXXOperatorCallExpr` are derived from
`CallExpr`, so they are matched by using the default template parameter.
On the other hand, `CXXConstructExpr` is not derived from `CallExpr`.
In case we want to match for them, we need to pass the type explicitly
to the `CallDescriptionAction`.

About destructors:
They have no AST-node, but they are generated in the CFG machinery in
the analyzer. Thus, to be able to match against them, we would need to
construct a CFG and walk on that instead of simply walking the AST.

I'm also relaxing the `EXPECT`ation in the
`CallDescriptionConsumer::performTest()`, to check the `LookupResult`
only if we matched for the `CallDescription`.
This is necessary to allow tests in which we expect *no* matches at all.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D111794
2021-10-18 14:57:24 +02:00
Dmitri Gribenko
b22804b354 [Tooling] Migrated APIs that take ownership of objects to unique_ptr
Subscribers: jkorous, arphaman, kadircet, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D66960

llvm-svn: 370451
2019-08-30 09:29:34 +00:00
Jonas Devlieghere
2b3d49b610 [Clang] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

Differential revision: https://reviews.llvm.org/D66259

llvm-svn: 368942
2019-08-14 23:04:18 +00:00
Artem Dergachev
f301096f51 [analyzer] NFC: CallDescription: Implement describing C library functions.
When matching C standard library functions in the checker, it's easy to forget
that they are often implemented as macros that are expanded to builtins.

Such builtins would have a different name, so matching the callee identifier
would fail, or may sometimes have more arguments than expected, so matching
the exact number of arguments would fail, but this is fine as long as we have
all the arguments that we need in their respective places.

This patch adds a set of flags to the CallDescription class so that to handle
various special matching rules, and adds the first flag into this set,
which enables a more fuzzy matching for functions that
may be implemented as compiler builtins.

Differential Revision: https://reviews.llvm.org/D62556

llvm-svn: 364867
2019-07-01 23:02:07 +00:00
Artem Dergachev
ec8e95640f [analyzer] NFC: Add a convenient CallDescriptionMap class.
It encapsulates the procedure of figuring out whether a call event
corresponds to a function that's modeled by a checker.

Checker developers no longer need to worry about performance of
lookups into their own custom maps.

Add unittests - which finally test CallDescription itself as well.

Differential Revision: https://reviews.llvm.org/D62441

llvm-svn: 364866
2019-07-01 23:02:03 +00:00