Eight StaticAnalyzer checker files define a REGISTER_CHECKER macro that
is never undefined, which can leak into subsequent translation units in
unity builds. Add #undef REGISTER_CHECKER at the end of each file. See
https://discourse.llvm.org/t/rfc-enabling-unity-build/90306 for more
info.
"clauded" not coded
It turns out, that some checks for cstring functions happened as a side
effect of other checks. For example, whether the arguments to memcpy
were uninitialized happened during buffer overflow checking.
The way this was implemented is that if alpha.unix.cstring.OutOfBounds
was disabled, alpha.unix.cstring.UninitializedRead couldn't emit any
warnings. It turns out that major modeling steps are early-exited if a
certain checker is disabled!
This patch moved the early returns to the report emission parts --
modeling still happens, only the bug report construction is omitted.
This would mean that if we find a fatal error (like buffer overflow) we
_should_ stop analysis even if we don't emit a warning (thats a part of
doing modeling), but I decided against implementing that.
One hurdle is that CStringChecker is a dependency of MallocChecker, and
the current tests rely on the CStringChecker _not_ terminating execution
paths prematurely. Considering that the checkers that would do that are
in alpha anyways, this doesn't seem to be an urgent step immediately.
I added FIXMEs to all tests would have failed if the patch sank the
analysis at the fatal cstring function call, but didn't. I also added a
new test case for buffers overlapping, but not being quite equal.
Original Author: Kristóf Umann <dkszelethus@gmail.com>
Co-Author: Endre Fülöp <endre.fulop@sigmatechnology.com> (rebasing, cleaning up comments)
Reviewers added valuable suggestions to comments/test code organisation.
Signature:
```c
size_t strxfrm(char *dest, const char *src, size_t n);
```
The modeling covers:
* `src` can never be null
* `dest` can be null only if n is 0, and then the return value is some unspecified positive integer
* `src` and `dest` must not overlap
* `dest` must have at least `n` bytes of capacity
* The return value can either be:
- `< n`, and the contents of the buffer pointed by `dest` is invalidated
- `>= n`, and the contents of the buffer pointed by `dest` is marked as undefined
CPP-6854
Prevent an assertion failure in the cstring checker when library
functions like memcpy are defined with non-default address spaces.
Adds a test for this case.
CStringChecker had an AdditionOverflow bug type which was intended for a
situation where the analyzer concludes that the addition of two
size/length values overflows `size_t`.
I strongly suspect that the analyzer could emit bugs of this type in
certain complex corner cases (e.g. due to inaccurate cast modeling), but
these reports would be all false positives because in the real world the
sum of two size/length values is always far below SIZE_MAX. (Although
note that there was no test where the analyzer emitted a bug with this
type.)
To simplify the code (and perhaps eliminate false positives), I
eliminated this bug type and replaced code that emits it by a simple
`addSink()` call (because we still want to get rid of the execution
paths where the analyzer has invalid assumptions).
This commit converts the class CStringChecker to the checker family
framework and slightly simplifies the implementation.
This commit is NFC and preserves the confused garbage descriptions and
categories of the bug types (which was inherited from old mistakes). I'm
planning to clean that up in a follow-up commit.
Bounded string functions takes smallest of two values as it's copy size
(`amountCopied` variable in `evalStrcpyCommon`), and it's used to
decided whether this operation will cause out-of-bound access and
invalidate it's super region if it does.
for `strlcat`: `amountCopied = min (size - dstLen - 1 , srcLen)`
for others: `amountCopied = min (srcLen, size)`
Currently when one of two values is unknown or `SValBuilder` can't
decide which one is smaller, `amountCopied` will remain `UnknownVal`,
which will invalidate copy destination's super region unconditionally.
This patch add check to see if one of these two values is definitely
in-bound, if so `amountCopied` has to be in-bound too, because it‘s less
than or equal to them, we can avoid the invalidation of super region and
some related false positives in this situation.
Note: This patch uses `size` as an approximation of `size - dstLen - 1`
in `strlcat` case because currently analyzer doesn't handle complex
expressions like this very well.
Closes#143807.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Closes#57270.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
There is a caveat at loop widening, where the correct location is the
CFG terminator (which is not an element and does not have a ref). In
this case, the first element in the block is passed as a location.
Previous PR #128251, Reverted at #137304.
Per suggestion in
https://github.com/llvm/llvm-project/pull/128251#discussion_r2055916229,
adding a new helper function in `SValBuilder` to conjure a symbol when
given a `CallEvent`.
Tested manually (with assertions) that the `LocationContext *` obtained
from the `CallEvent` are identical to those passed in the original
argument.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
Closes#57270
Before this commit, there were two alpha checkers that used different
algorithms/logic for detecting out of bounds memory access: the old
`alpha.security.ArrayBound` and the experimental, more complex
`alpha.security.ArrayBoundV2`.
After lots of quality improvement commits ArrayBoundV2 is now stable
enough to be moved out of the alpha stage. As indexing (and dereference)
are common operations, it still produces a significant amount of false
positives, but not much more than e.g. `core.NullDereference` or
`core.UndefinedBinaryOperatorResult`, so it should be acceptable as a
non-`core` checker.
At this point `alpha.security.ArrayBound` became obsolete (there is a
better tool for the same task), so I'm removing it from the codebase.
With this I can eliminate the ugly "V2" version mark almost everywhere
and rename `alpha.security.ArrayBoundV2` to `security.ArrayBound`.
(The version mark is preserved in the filename "ArrayBoundCheckerV2", to
ensure a clear git history. I'll rename it to "ArrayBoundChecker.cpp" in
a separate commit.)
This commit adapts the unit tests of `alpha.security.ArrayBound` to
testing the new `security.ArrayBound` (= old ArrayBoundV2). Currently
the names of the test files are very haphazard, I'll probably create a
separate followup commit that consolidates this.
One could create dangling APSInt references in various ways in the past, that were sometimes assumed to be persisted in the BasicValueFactor.
One should always use BasicValueFactory to create persistent APSInts, that could be used by ConcreteInts or SymIntExprs and similar long-living objects.
If one used a temporary or local variables for this, these would dangle.
To enforce the contract of the analyzer BasicValueFactory and the uses of APSInts, let's have a dedicated strong-type for this.
The idea is that APSIntPtr is always owned by the BasicValueFactory, and that is the only component that can construct it.
These PRs are all NFC - besides fixing dangling APSInt references.
…ring.UninitRead
This is a drastic simplification of #106982. If you read that patch,
this is the same thing with all BugReporterVisitors.cpp and
SValBuilder.cpp changes removed! (since all replies came regarding
changed to those files, I felt the new PR was justified)
The patch was inspired by a pretty poor bug report on FFMpeg:

In this bug report, block is uninitialized, hence the bug report that it
should not have been passed to memcpy. The confusing part is in line 93,
where block was passed as a non-const pointer to seq_unpack_rle_block,
which was obviously meant to initialize block. As developers, we know
that clang likely didn't skip this function and found a path of
execution on which this initialization failed, but NoStoreFuncVisitor
failed to attach the usual "returning without writing to block" message.
I fixed this by instead of tracking the entire array, I tracked the
actual element which was found to be uninitialized (Remember, we
heuristically only check if the first and last-to-access element is
initialized, not the entire array). This is how the bug report looks
now, with 'seq_unpack_rle_block' having notes describing the path of
execution and lack of a value change:


Since NoStoreFuncVisitor was a TU-local class, I moved it back to
BugReporterVisitors.h, and registered it manually in CStringChecker.cpp.
This was done because we don't have a good trackRegionValue() function,
only a trackExpressionValue() function. We have an expression for the
array, but not for its first (or last-to-access) element, so I only had
a MemRegion on hand.
This commit explicitly specifies the matching mode (C library function,
any non-method function, or C++ method) for the `CallDescription`s
constructed in various checkers.
Some code was simplified to use `CallDescriptionSet`s instead of
individual `CallDescription`s.
This change won't cause major functional changes, but isn't NFC because
it ensures that e.g. call descriptions for a non-method function won't
accidentally match a method that has the same name.
Separate commits have already performed this change in other checkers:
- easy cases: e2f1cbae45f81f3cd9a4d3c2bcf69a094eb060fa
- MallocChecker: d6d84b5d1448e4f2e24b467a0abcf42fe9d543e9
- iterator checkers: 06eedffe0d2782922e63cc25cb927f4acdaf7b30
- InvalidPtr checker: 024281d4d26344f9613b9115ea1fcbdbdba23235
... and follow-up commits will handle the remaining checkers.
My goal is to ensure that the call description mode is always explicitly
specified and eliminate (or strongly restrict) the vague "may be either
a method or a simple function" mode that's the current default.
Before this commit the the checker alpha.security.taint.TaintPropagation always reported warnings when the size argument of a memcpy-like or malloc-like function was tainted. However, this produced false positive reports in situations where the size was tainted, but correctly performed bound checks guaranteed the safety of the call.
This commit removes the rough "always warn if the size argument is tainted" heuristic; but it would be good to add a more refined "warns if the size argument is tainted and can be too large" heuristic in follow-up commits. That logic would belong to CStringChecker and MallocChecker, because those are the checkers responsible for the more detailed modeling of memcpy-like and malloc-like functions. To mark this plan, TODO comments are added in those two checkers.
There were several test cases that used these sinks to test generic properties of taint tracking; those were adapted to use different logic.
As a minor unrelated change, this commit ensures that strcat (and its wide variant, wcsncat) propagates taint from the first argument to the first argument, i.e. a tainted string remains tainted if we concatenate it with another string. This change was required because the adapted variant of multipleTaintedArgs is relying on strncat to compose a value that combines taint from two different sources.
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.
Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.
This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.
This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).
The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.
After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.
This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.
After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.
This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.
As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
`CE->getCalleeDecl()` returns `VarDecl` if the callee is actually a
function pointer variable. Consequently, calling `getAsFunction()` will
return null.
To workaround the case, we should use the `CallEvent::parameters()`,
which will internally recover the function being called and do the right
thing.
Fixes#74269
Depends on "[analyzer][NFC] Prefer CallEvent over CallExpr in APIs"
This change only uplifts existing APIs, without any semantic changes.
This is the continuation of 44820630dfa45bc47748a5abda7d4a9cb86da2c1.
Benefits of using CallEvents over CallExprs:
The callee decl is traced through function pointers if possible.
This will be important to fix#74269 in a follow-up patch.
...because it provides no useful functionality compared to its base
class `BugType`.
A long time ago there were substantial differences between `BugType` and
`BuiltinBug`, but they were eliminated by commit 1bd58233 in 2009 (!).
Since then the only functionality provided by `BuiltinBug` was that it
specified `categories::LogicError` as the bug category and it stored an
extra data member `desc`.
This commit sets `categories::LogicError` as the default value of the
third argument (bug category) in the constructors of BugType and
replaces use of the `desc` field with simpler logic.
Note that `BugType` has a data member `Description` and a non-virtual
method `BugType::getDescription()` which queries it; these are distinct
from the member `desc` of `BuiltinBug` and the identically named method
`BuiltinBug::getDescription()` which queries it. This confusing name
collision was a major motivation for the elimination of `BuiltinBug`.
As this commit touches many files, I avoided functional changes and left
behind FIXME notes to mark minor issues that should be fixed later.
Differential Revision: https://reviews.llvm.org/D158855
Older versions of clang (for example, v12) throw an error when compiling
CStringChecker.cpp that the initializers for `SourceArgExpr`,
`DestinationArgExpr`, and `SizeArgExpr` are missing braces around
initialization of subobject. Newer clang versions don't throw this
error. This patch adds the initialization braces to satisfy clang.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D154871
I'm involved with the Static Analyzer for the most part.
I think we should embrace newer language standard features and gradually
move forward.
Differential Revision: https://reviews.llvm.org/D154325
Fixing GitHub issue: https://github.com/llvm/llvm-project/issues/55019
Following the previous fix https://reviews.llvm.org/D12571 on issue https://github.com/llvm/llvm-project/issues/23328
The two issues report false memory leaks after calling string-copy APIs with a buffer field in an object as the destination.
The buffer invalidation incorrectly drops the assignment to a heap memory block when no overflow problems happen.
And the pointer of the dropped assignment is declared in the same object of the destination buffer.
The previous fix only considers the `memcpy` functions whose copy length is available from arguments.
In this issue, the copy length is inferable from the buffer declaration and string literals being copied.
Therefore, I have adjusted the previous fix to reuse the copy length computed before.
Besides, for APIs that never overflow (strsep) or we never know whether they can overflow (std::copy),
new invalidation operations have been introduced to inform CStringChecker::InvalidateBuffer whether or not to
invalidate the super region that encompasses the destination buffer.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D152435
string_view has a slightly weaker contract, which only specifies whether
the value is bigger or smaller than 0. Adapt users accordingly and just
forward to the standard function (that also compiles down to memcmp)
This avoids recomputing string length that is already known at compile time.
It has a slight impact on preprocessing / compile time, see
https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u
This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.
The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.
Differential Revision: https://reviews.llvm.org/D139881
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4 (part 2)"
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4"
GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104
compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d
The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0.
This reverts commit caa713559bd38f337d7d35de35686775e8fb5175.
This reverts commit 06b90e2e9c991e211fecc97948e533320a825470.
This reverts commit e953ae5bbc313fd0cc980ce021d487e5b5199ea4.
Support for functions wmempcpy, wmemmove, wmemcmp is added to the checker.
The same tests are copied that exist for the non-wide versions, with
non-wide functions and character types changed to the wide version.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D130470
Support for functions wmemcpy, wcslen, wcsnlen is added to the checker.
Documentation and tests are updated and extended with the new functions.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D130091
CStringChecker is using getByteLength to get the length of a string
literal. For targets where a "char" is 8-bits, getByteLength() and
getLength() will be equal for a C string, but for targets where a "char"
is 16-bits getByteLength() returns the size in octets.
This is verified in our downstream target, but we have no way to add a
test case for this case since there is no target supporting 16-bit
"char" upstream. Since this cannot have a test case, I'm asserted this
change is "correct by construction", and visually inspected to be
correct by way of the following example where this was found.
The case that shows this fails using a target with 16-bit chars is here.
getByteLength() for the string literal returns 4, which fails when
checked against "char x[4]". With the change, the string literal is
evaluated to a size of 2 which is a correct number of "char"'s for a
16-bit target.
```
void strcpy_no_overflow_2(char *y) {
char x[4];
strcpy(x, "12"); // with getByteLength(), returns 4 using 16-bit chars
}
```
This change exposed that embedded nulls within the string are not
handled. This is documented as a FIXME for a future fix.
```
void strcpy_no_overflow_3(char *y) {
char x[3];
strcpy(x, "12\0");
}
```
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D129269