This commit deletes the "simple" constructor of `CallDescription` which
did not require a `CallDescription::Mode` argument and always used the
"wildcard" mode `CDM::Unspecified`.
A few months ago, this vague matching mode was used by many checkers,
which caused bugs like https://github.com/llvm/llvm-project/issues/81597
and https://github.com/llvm/llvm-project/issues/88181. Since then, my
commits improved the available matching modes and ensured that all
checkers explicitly specify the right matching mode.
After those commits, the only remaining references to the "simple"
constructor were some unit tests; this commit updates them to use an
explicitly specified matching mode (often `CDM::SimpleFunc`).
The mode `CDM::Unspecified` was not deleted in this commit because it's
still a reasonable choice in `GenericTaintChecker` and a few unit tests.
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.
Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.
This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.
This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).
The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.
After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.
This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.
After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.
This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.
As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
This avoids recomputing string length that is already known at compile time.
It has a slight impact on preprocessing / compile time, see
https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u
This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.
The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.
Differential Revision: https://reviews.llvm.org/D139881
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4 (part 2)"
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4"
GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104
compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d
The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0.
This reverts commit caa713559bd38f337d7d35de35686775e8fb5175.
This reverts commit 06b90e2e9c991e211fecc97948e533320a825470.
This reverts commit e953ae5bbc313fd0cc980ce021d487e5b5199ea4.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Since CallDescriptions can only be matched against CallEvents that are created
during symbolic execution, it was not possible to use it in syntactic-only
contexts. For example, even though InnerPointerChecker can check with its set of
CallDescriptions whether a function call is interested during analysis, its
unable to check without hassle whether a non-analyzer piece of code also calls
such a function.
The patch adds the ability to use CallDescriptions in syntactic contexts as
well. While we already have that in Signature, we still want to leverage the
ability to use dynamic information when we have it (function pointers, for
example). This could be done with Signature as well (StdLibraryFunctionsChecker
does it), but it makes it even less of a drop-in replacement.
Differential Revision: https://reviews.llvm.org/D119004
`CallDescriptions` have a `RequiredArgs` and `RequiredParams` members,
but they are of different types, `unsigned` and `size_t` respectively.
In the patch I use only `unsigned` for both, that should be large enough
anyway.
I also introduce the `MaybeUInt` type alias for `Optional<unsigned>`.
Additionally, I also avoid the use of the //smart// less-than operator.
template <typename T>
constexpr bool operator<=(const Optional<T> &X, const T &Y);
Which would check if the optional **has** a value and compare the data
only after. I found it surprising, thus I think we are better off
without it.
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D113594
Previously, CallDescription simply referred to the qualified name parts
by `const char*` pointers.
In the future we might want to dynamically load and populate
`CallDescriptionMaps`, hence we will need the `CallDescriptions` to
actually **own** their qualified name parts.
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D113593
This patch introduces `CallDescription::matches()` member function,
accepting a `CallEvent`.
Semantically, `Call.isCalled(CD)` is the same as `CD.matches(Call)`.
The patch also introduces the `matchesAny()` variadic free function template.
It accepts a `CallEvent` and at least one `CallDescription` to match
against.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D113590
Sometimes we only want to decide if some function is called, and we
don't care which of the set.
This `CallDescriptionSet` will have the same behavior, except
instead of `lookup()` returning a pointer to the mapped value,
the `contains()` returns `bool`.
Internally, it uses the `CallDescriptionMap<bool>` for implementing the
behavior. It is preferred, to reuse the generic
`CallDescriptionMap::lookup()` logic, instead of duplicating it.
The generic version might be improved by implementing a hash lookup or
something along those lines.
Reviewed By: martong, Szelethus
Differential Revision: https://reviews.llvm.org/D113589
`CallDescriptions` deserve its own translation unit.
This patch simply moves the corresponding parts.
Also includes the `CallDescription.h` where it's necessary.
Reviewed By: martong, xazax.hun, Szelethus
Differential Revision: https://reviews.llvm.org/D113587