This patch improves MallocChecker to detect use-after-free bugs when
a freed structure's field is passed by address (e.g., `&ptr->field`).
Previously, MallocChecker would miss such cases, as it only checked the
top-level symbol of argument values.
This patch analyzes the base region of arguments and extracts the
symbolic region (if any), allowing UAF detection even for field address
expressions.
Fixes#152446
Changed the warning message:
- **From**: 'Attempt to free released memory'
**To**: 'Attempt to release already released memory'
- **From**: 'Attempt to free non-owned memory'
**To**: 'Attempt to release non-owned memory'
- **From**: 'Use of memory after it is freed'
**To**: 'Use of memory after it is released'
All connected tests and their expectations have been changed
accordingly.
Inspired by [this
PR](https://github.com/llvm/llvm-project/pull/147542#discussion_r2195197922)
MallocChecker.cpp has a complex heuristic that supresses reports where
the memory release happens during the release of a reference-counted
object (to suppress a significant amount of false positives).
Previously this logic asserted that there is at most one release point
corresponding to a symbol, but it turns out that there is a rare corner
case where the symbol can be released, forgotten and then released
again. This commit removes that assertion to avoid the crash. (As this
issue just affects a bug suppression heuristic, I didn't want to dig
deeper and modify the way the state of the symbol is changed.)
Fixes#149754
The checks for the 'z' and 't' format specifiers added in the original
PR #143653 had some issues and were overly strict, causing some build
failures and were consequently reverted at
4c85bf2fe8.
In the latest commit
27c58629ec,
I relaxed the checks for the 'z' and 't' format specifiers, so warnings
are now only issued when they are used with mismatched types.
The original intent of these checks was to diagnose code that assumes
the underlying type of `size_t` is `unsigned` or `unsigned long`, for
example:
```c
printf("%zu", 1ul); // Not portable, but not an error when size_t is unsigned long
```
However, it produced a significant number of false positives. This was
partly because Clang does not treat the `typedef` `size_t` and
`__size_t` as having a common "sugar" type, and partly because a large
amount of existing code either assumes `unsigned` (or `unsigned long`)
is `size_t`, or they define the equivalent of size_t in their own way
(such as
sanitizer_internal_defs.h).2e67dcfdcd/compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h (L203)
Including the results of `sizeof`, `sizeof...`, `__datasizeof`,
`__alignof`, `_Alignof`, `alignof`, `_Countof`, `size_t` literals, and
signed `size_t` literals, the results of pointer-pointer subtraction and
checks for standard library functions (and their calls).
The goal is to enable clang and downstream tools such as clangd and
clang-tidy to provide more portable hints and diagnostics.
The previous discussion can be found at #136542.
This PR implements this feature by introducing a new subtype of `Type`
called `PredefinedSugarType`, which was considered appropriate in
discussions. I tried to keep `PredefinedSugarType` simple enough yet not
limited to `size_t` and `ptrdiff_t` so that it can be used for other
purposes. `PredefinedSugarType` wraps a canonical `Type` and provides a
name, conceptually similar to a compiler internal `TypedefType` but
without depending on a `TypedefDecl` or a source file.
Additionally, checks for the `z` and `t` format specifiers in format
strings for `scanf` and `printf` were added. It will precisely match
expressions using `typedef`s or built-in expressions.
The affected tests indicates that it works very well.
Several code require that `SizeType` is canonical, so I kept `SizeType`
to its canonical form.
The failed tests in CI are allowed to fail. See the
[comment](https://github.com/llvm/llvm-project/pull/135386#issuecomment-3049426611)
in another PR #135386.
This commit removes the DoubleDelete bug type from `MallocChecker.cpp`
because it's completely redundant with the `DoubleFree` bug (which is
already used for all allocator families, including new/delete).
This simplifies the code of the checker and prevents the potential
confusion caused by two semantically equivalent and very similar, but
not identical bug report messages.
This commit converts MallocChecker to the new checker family framework
that was introduced in the recent commit
6833076a5d9f5719539a24e900037da5a3979289 -- and gets rid of some awkward
unintended interactions between the checker frontends.
Placement new does not allocate memory, so it should not be reported as
a memory leak. A recent MallocChecker refactor changed inlining of
placement-new calls with manual evaluation by MallocChecker.
339282d49f
This change avoids marking the value returned by placement new as
allocated and hence avoids the false leak reports.
Note that the there are two syntaxes to invoke placement new:
`new (p) int` and an explicit operator call `operator new(sizeof(int), p)`.
The first syntax was already properly handled by the engine.
This change corrects handling of the second syntax.
CPP-6375
Previously some checkers attached explicitly created program point tags
to some of the exploded graph nodes that they created. In most of the
checkers this ad-hoc tagging only affected the debug dump of the
exploded graph (and they weren't too relevant for debugging) so this
commit removes them.
There were two checkers where the tagging _did_ have a functional role:
- In `RetainCountChecker` the presence of tags were checked by
`RefCountReportVisitor`.
- In `DynamicTypePropagation` the checker sometimes wanted to create two
identical nodes and had to apply an explicit tag on the second one to
avoid "caching out".
In these two situations I preserved the tags but switched to using
`SimpleProgramPointTag` instead of `CheckerProgramPointTag` because
`CheckerProgramPointTag` didn't provide enough benefits to justify its
existence.
Note that this commit depends on the earlier commit "[analyzer] Fix
tagging of PostAllocatorCall" ec96c0c072ef3f78813c378949c00e1c07aa44e5
and would introduce crashes when cherry-picked onto a branch that
doesn't contain that commit.
For more details about the background see the discourse thread
https://discourse.llvm.org/t/role-of-programpointtag-in-the-static-analyzer/
As a tangentially related changes, this commit also adds some comments
to document the surprising behavior of `CheckerContext::addTransition`
and an assertion in the constructor of `PathSensitiveBugReport` to get a
more readable crash dump in the case when the report is constructed with
`nullptr` as the `ErrorNode`. (This can happen due to "caching out".)
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Closes#57270.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
There is a caveat at loop widening, where the correct location is the
CFG terminator (which is not an element and does not have a ref). In
this case, the first element in the block is passed as a location.
Previous PR #128251, Reverted at #137304.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
Closes#57270
`CheckerNameRef` is a trivial wrapper around a `StringRef` which is
guaranteed to be owned by the `CheckerRegistry` (the only `friend` of
the class) because other code can't call the private constructor.
This class had offered two ways to recover the plain `StringRef`: an an
`operator StringRef()` for implicit conversion and a method `StringRef
getName()` which could be called explicitly.
However this method name was really confusing, because it implies "get
the name of this object" instead of "get this name as a plain
`StringRef`"; so I removed it from the codebase and used
`static_cast<StringRef>` in the two locations where the cast wasn't
performed implicitly.
This commit "prepares the ground" for planned improvements in checker
name handling.
In general, if we see an allocation, we associate the immutable memory
space with the constructed memory region.
This works fine if we see the allocation.
However, with symbolic regions it's not great because there we don't
know anything about their memory spaces, thus put them into the Unknown
space.
The unfortunate consequence is that once we learn about some aliasing
with this Symbolic Region, we can't change the memory space to the
deduced one.
In this patch, we open up the memory spaces as a trait, basically
allowing associating a better memory space with a memregion that
was created with the Unknown memory space.
As a side effect, this means that now queriing the memory space of a
region depends on the State, but many places in the analyzer, such as
the Store, doesn't have (and cannot have) access to the State by design.
This means that some uses must solely rely on the memspaces of the
region, but any other users should use the getter taking a State.
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Fixes#62985Fixes#58820
When 3rd-party header files are included as system headers, their
overloaded `new` and `delete` operators are also considered as the std
ones. However, those overloaded operator functions will also be inlined.
This makes the same
symbolic memory marked as released twice: during `checkPreCall` of the
overloaded `delete` operator and when calling `::operator delete` after
inlining the overloaded operator function (if it has).
This patch attempts to fix this bug by adjusting the strategy of
verifying whether the callee is a standard `new` or `delete` operator in
the `isStandardNewDelete` function.
There is no good way to tell CSA if function with `ownership_returns`
attribute returns initialized or not initialized memory. To make FP rate
lower, let's assume that memory returned from such functions is unknown
and do not reason about it.
In future it would be great to add a way to annotate such behavior
Current MalloChecker logic suppresses FP caused by refcounting only for
C++ destructors. The same pattern occurs a lot in C in objects with
intrusive refcounting. See #104229 for code example.
To extend current logic to C, suspect all release functions as candidate
for suppression.
Closes: #104229
PR refactors `MallocChecker` to not violate invariant of `BindExpr`,
which should be called only during `evalCall` to avoid conflicts.
To achieve this, most of `postCall` logic was moved to `evalCall` with
addition return value binding in case of processing of allocation
functions. Check functions prototypes was changed to use `State` with
bound return value.
`checkDelim` logic was left in `postCall` to avoid conflicts with
`StreamChecker` which also evaluates `getline` and friends.
PR also introduces breaking change in the unlikely case when the
definition of an allocation function (e.g. `malloc()`) is visible: now
checker does not try to inline allocation functions and assumes their
initial semantics.
Closes#73830
In preparation for adding essentially the same visitor to StreamChecker,
this patch factors this visitor out to a common header.
I'll be the first to admit that the interface of these classes are not
terrific, but it rather tightly held back by its main technical debt,
which is NoStoreFuncVisitor, the main descendant of
NoStateChangeVisitor.
Change-Id: I99d73ccd93a18dd145bbbc83afadbb432dd42b90
A new optional checker (optin.taint.TaintedAlloc) will warn if a memory
allocation function (malloc, calloc, realloc, alloca, operator new[]) is
called with a tainted (attacker controlled) size parameter.
A large, maliciously set size value can trigger memory exhaustion. To
get this warning, the alpha.security.taint.TaintPropagation checker also
needs to be switched on.
The warning will only be emitted, if the analyzer cannot prove that the
size is below reasonable bounds (<SIZE_MAX/4).
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
Fixes#90498.
Same as 5337efc69cdd5 for atomic builtins, but for `std::atomic` this
time. This is useful because even though the actual builtin atomic is
still there, it may be buried beyond the inlining depth limit.
Also add one popular custom smart pointer class name to the name-based
heuristics, which isn't necessary to fix the bug but arguably a good
idea regardless.
Before this commit the the checker alpha.security.taint.TaintPropagation always reported warnings when the size argument of a memcpy-like or malloc-like function was tainted. However, this produced false positive reports in situations where the size was tainted, but correctly performed bound checks guaranteed the safety of the call.
This commit removes the rough "always warn if the size argument is tainted" heuristic; but it would be good to add a more refined "warns if the size argument is tainted and can be too large" heuristic in follow-up commits. That logic would belong to CStringChecker and MallocChecker, because those are the checkers responsible for the more detailed modeling of memcpy-like and malloc-like functions. To mark this plan, TODO comments are added in those two checkers.
There were several test cases that used these sinks to test generic properties of taint tracking; those were adapted to use different logic.
As a minor unrelated change, this commit ensures that strcat (and its wide variant, wcsncat) propagates taint from the first argument to the first argument, i.e. a tainted string remains tainted if we concatenate it with another string. This change was required because the adapted variant of multipleTaintedArgs is relying on strncat to compose a value that combines taint from two different sources.
This commit ensures that the `CallDescription`s in `MallocChecker` are
matched with the mode `CDM::CLibrary`, so:
- they don't match methods or functions within user-defined namespaces;
- they also match builtin variants of these functions (if any), so the
checker can model `__builtin_alloca()` like `alloca()`.
This change fixes https://github.com/llvm/llvm-project/issues/81597. New
tests were added to verify that `std::malloc` and `std::free` (from
`<cstdlib>`) are modeled, but a method that's named e.g. `free` isn't
confused with the memory release function.
The responsibility for modeling `__builtin_alloca` and
`__builtin_alloca_with_align` was moved from `BuiltinFunctionChecker` to
`MallocChecker`, to avoid buggy interactions between the checkers and
ensure that the builtin and non-builtin variants are handled by exactly
the same logic.
This change might be a step backwards for the users who don't have
`unix.Malloc` enabled; but I suspect that `__builtin_alloca()` is so
rare that it would be a waste of time to implement backwards
compatibility for them.
There were several test files that relied on `__builtin_alloca()` calls
to get an `AllocaRegion`, these were modified to enable `unix.Malloc`.
One of these files (cxx-uninitialized-object-ptr-ref.cpp) had some tests
that relied on the fact that `malloc()` was treated as a "black box" in
them, these were updated to use `calloc()` (to get initialized memory)
and `free()` (to avoid memory leak reports).
While I was developing this change, I found a very suspicious assert in
`MallocChecker`. As it isn't blocking the goals of this commit, I just
marked it with a FIXME, but I'll try to investigate and fix it in a
follow-up change.
This reapplies 80ab8234ac309418637488b97e0a62d8377b2ecf again, after
fixing a name collision warning in the unit tests (see the revert commit
13ccaf9b9d4400bb128b35ff4ac733e4afc3ad1c for details).
In addition to the previously applied changes, this commit also clarifies the
code in MallocChecker that distinguishes POSIX "getline()" and C++ standard
library "std::getline()" (which are two completely different functions). Note
that "std::getline()" was (accidentally) handled correctly even without this
clarification; but it's better to explicitly handle and test this corner case.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
According to POSIX 2018.
1. lineptr, n and stream can not be NULL.
2. If *n is non-zero, *lineptr must point to a region of at least *n
bytes, or be a NULL pointer.
Additionally, if *lineptr is not NULL, *n must not be undefined.
`getdelim` and `getline` may free, allocate, or re-allocate the input
buffer, ensuring its size is enough to hold the incoming line, the
delimiter, and the null terminator.
`*lineptr` must be a valid argument to `free`, which means it can be
either
1. `NULL`, in which case these functions perform an allocation
equivalent to a call to `malloc` even on failure.
2. A pointer returned by the `malloc` family of functions. Other
pointers are UB (`alloca`, a pointer to a static, to a stack variable, etc.)
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
...to model the results of alloca() and _alloca() calls. Previously it
acted as if these functions were returning memory from the heap, which
led to alpha.security.ArrayBoundV2 producing incorrect messages.
This trivial commit removes a 13-year-old FIXME comment from
MallocChecker.cpp because it was fixed at least 10 years ago when
`getConjuredHeapSymbolVal()` was added.
I'm involved with the Static Analyzer for the most part.
I think we should embrace newer language standard features and gradually
move forward.
Differential Revision: https://reviews.llvm.org/D154325
This patch adds CFGElementRef to ProgramPoints
and helps the analyzer to differentiate between
two otherwise identically looking ProgramPoints.
Fixes#60412
Differential Revision: https://reviews.llvm.org/D143328
This avoids recomputing string length that is already known at compile time.
It has a slight impact on preprocessing / compile time, see
https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u
This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.
The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.
Differential Revision: https://reviews.llvm.org/D139881
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4 (part 2)"
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4"
GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104
compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d
The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0.
This reverts commit caa713559bd38f337d7d35de35686775e8fb5175.
This reverts commit 06b90e2e9c991e211fecc97948e533320a825470.
This reverts commit e953ae5bbc313fd0cc980ce021d487e5b5199ea4.
std::optional::value() has undesired exception checking semantics and is
unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The
call sites block std::optional migration.
This makes `ninja clang` work in the absence of llvm::Optional::value.