In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.
Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.
This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.
This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).
The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.
After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.
This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.
After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.
This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.
As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
`CE->getCalleeDecl()` returns `VarDecl` if the callee is actually a
function pointer variable. Consequently, calling `getAsFunction()` will
return null.
To workaround the case, we should use the `CallEvent::parameters()`,
which will internally recover the function being called and do the right
thing.
Fixes#74269
Depends on "[analyzer][NFC] Prefer CallEvent over CallExpr in APIs"
This change only uplifts existing APIs, without any semantic changes.
This is the continuation of 44820630dfa45bc47748a5abda7d4a9cb86da2c1.
Benefits of using CallEvents over CallExprs:
The callee decl is traced through function pointers if possible.
This will be important to fix#74269 in a follow-up patch.
...because it provides no useful functionality compared to its base
class `BugType`.
A long time ago there were substantial differences between `BugType` and
`BuiltinBug`, but they were eliminated by commit 1bd58233 in 2009 (!).
Since then the only functionality provided by `BuiltinBug` was that it
specified `categories::LogicError` as the bug category and it stored an
extra data member `desc`.
This commit sets `categories::LogicError` as the default value of the
third argument (bug category) in the constructors of BugType and
replaces use of the `desc` field with simpler logic.
Note that `BugType` has a data member `Description` and a non-virtual
method `BugType::getDescription()` which queries it; these are distinct
from the member `desc` of `BuiltinBug` and the identically named method
`BuiltinBug::getDescription()` which queries it. This confusing name
collision was a major motivation for the elimination of `BuiltinBug`.
As this commit touches many files, I avoided functional changes and left
behind FIXME notes to mark minor issues that should be fixed later.
Differential Revision: https://reviews.llvm.org/D158855
Older versions of clang (for example, v12) throw an error when compiling
CStringChecker.cpp that the initializers for `SourceArgExpr`,
`DestinationArgExpr`, and `SizeArgExpr` are missing braces around
initialization of subobject. Newer clang versions don't throw this
error. This patch adds the initialization braces to satisfy clang.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D154871
I'm involved with the Static Analyzer for the most part.
I think we should embrace newer language standard features and gradually
move forward.
Differential Revision: https://reviews.llvm.org/D154325
Fixing GitHub issue: https://github.com/llvm/llvm-project/issues/55019
Following the previous fix https://reviews.llvm.org/D12571 on issue https://github.com/llvm/llvm-project/issues/23328
The two issues report false memory leaks after calling string-copy APIs with a buffer field in an object as the destination.
The buffer invalidation incorrectly drops the assignment to a heap memory block when no overflow problems happen.
And the pointer of the dropped assignment is declared in the same object of the destination buffer.
The previous fix only considers the `memcpy` functions whose copy length is available from arguments.
In this issue, the copy length is inferable from the buffer declaration and string literals being copied.
Therefore, I have adjusted the previous fix to reuse the copy length computed before.
Besides, for APIs that never overflow (strsep) or we never know whether they can overflow (std::copy),
new invalidation operations have been introduced to inform CStringChecker::InvalidateBuffer whether or not to
invalidate the super region that encompasses the destination buffer.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D152435
string_view has a slightly weaker contract, which only specifies whether
the value is bigger or smaller than 0. Adapt users accordingly and just
forward to the standard function (that also compiles down to memcmp)
This avoids recomputing string length that is already known at compile time.
It has a slight impact on preprocessing / compile time, see
https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u
This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.
The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.
Differential Revision: https://reviews.llvm.org/D139881
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4 (part 2)"
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4"
GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104
compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d
The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0.
This reverts commit caa713559bd38f337d7d35de35686775e8fb5175.
This reverts commit 06b90e2e9c991e211fecc97948e533320a825470.
This reverts commit e953ae5bbc313fd0cc980ce021d487e5b5199ea4.
Support for functions wmempcpy, wmemmove, wmemcmp is added to the checker.
The same tests are copied that exist for the non-wide versions, with
non-wide functions and character types changed to the wide version.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D130470
Support for functions wmemcpy, wcslen, wcsnlen is added to the checker.
Documentation and tests are updated and extended with the new functions.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D130091
CStringChecker is using getByteLength to get the length of a string
literal. For targets where a "char" is 8-bits, getByteLength() and
getLength() will be equal for a C string, but for targets where a "char"
is 16-bits getByteLength() returns the size in octets.
This is verified in our downstream target, but we have no way to add a
test case for this case since there is no target supporting 16-bit
"char" upstream. Since this cannot have a test case, I'm asserted this
change is "correct by construction", and visually inspected to be
correct by way of the following example where this was found.
The case that shows this fails using a target with 16-bit chars is here.
getByteLength() for the string literal returns 4, which fails when
checked against "char x[4]". With the change, the string literal is
evaluated to a size of 2 which is a correct number of "char"'s for a
16-bit target.
```
void strcpy_no_overflow_2(char *y) {
char x[4];
strcpy(x, "12"); // with getByteLength(), returns 4 using 16-bit chars
}
```
This change exposed that embedded nulls within the string are not
handled. This is documented as a FIXME for a future fix.
```
void strcpy_no_overflow_3(char *y) {
char x[3];
strcpy(x, "12\0");
}
```
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D129269
A recent review emphasized the preference to use DefaultBool instead of
bool for checker options. This change is a NFC and cleans up some of the
instances where bool was used, and could be changed to DefaultBool.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D123464
Under the hood this prints the same as `QualType::getAsString()` but cuts out the middle-man when that string is sent to another raw_ostream.
Also cleaned up all the call sites where this occurs.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123926
This change fixes an assert that occurs in the SMT layer when refuting a
finding that uses pointers of two different sizes. This was found in a
downstream build that supports two different pointer sizes, The CString
Checker was attempting to compute an overlap for the 'to' and 'from'
pointers, where the pointers were of different sizes.
In the downstream case where this was found, a specialized memcpy
routine patterned after memcpy_special is used. The analyzer core hits
on this builtin because it matches the 'memcpy' portion of that builtin.
This cannot be duplicated in the upstream test since there are no
specialized builtins that match that pattern, but the case does
reproduce in the accompanying LIT test case. The amdgcn target was used
for this reproducer. See the documentation for AMDGPU address spaces here
https://llvm.org/docs/AMDGPUUsage.html#address-spaces.
The assert seen is:
`*Solver->getSort(LHS) == *Solver->getSort(RHS) && "AST's must have the same sort!"'
Ack to steakhal for reviewing the fix, and creating the test case.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D118050
Few weeks back I was experimenting with reading the uninitialized values from src , which is actually a bug but the CSA seems to give up at that point . I was curious about that and I pinged @steakhal on the discord and according to him this seems to be a genuine issue and needs to be fix. So I goes with fixing this bug and thanks to @steakhal who help me creating this patch. This feature seems to break some tests but this was the genuine problem and the broken tests also needs to fix in certain manner. I add a test but yeah we need more tests,I'll try to add more tests.Thanks
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D120489
Few weeks back I was experimenting with reading the uninitialized values from src , which is actually a bug but the CSA seems to give up at that point . I was curious about that and I pinged @steakhal on the discord and according to him this seems to be a genuine issue and needs to be fix. So I goes with fixing this bug and thanks to @steakhal who help me creating this patch. This feature seems to break some tests but this was the genuine problem and the broken tests also needs to fix in certain manner. I add a test but yeah we need more tests,I'll try to add more tests.Thanks
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D120489
There is different bug types for different types of bugs but the **emitAdditionOverflowbug** seems to use bugtype **BT_NotCSting** but actually it have to use **BT_AdditionOverflow** .
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D119462
This patch replaces each use of the previous API with the new one.
In variadic cases, it will use the ADL `matchesAny(Call, CDs...)`
variadic function.
Also simplifies some code involving such operations.
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D113591
`CallDescriptions` deserve its own translation unit.
This patch simply moves the corresponding parts.
Also includes the `CallDescription.h` where it's necessary.
Reviewed By: martong, xazax.hun, Szelethus
Differential Revision: https://reviews.llvm.org/D113587
This is mostly a mechanical change, but a testcase that contains
parts of the StringRef class (clang/test/Analysis/llvm-conventions.cpp)
isn't touched.
Currently, parameters of functions without their definition present cannot
be represented as regions because it would be difficult to ensure that the
same declaration is used in every case. To overcome this, we split
`VarRegion` to two subclasses: `NonParamVarRegion` and `ParamVarRegion`.
The latter does not store the `Decl` of the parameter variable. Instead it
stores the index of the parameter which enables retrieving the actual
`Decl` every time using the function declaration of the stack frame. To
achieve this we also removed storing of `Decl` from `DeclRegion` and made
`getDecl()` pure virtual. The individual `Decl`s are stored in the
appropriate subclasses, such as `FieldRegion`, `ObjCIvarRegion` and the
newly introduced `NonParamVarRegion`.
Differential Revision: https://reviews.llvm.org/D80522
Summary:
I wanted to extend the diagnostics of the CStringChecker with taintedness.
This requires the CStringChecker to be refactored to support a more flexible
reporting mechanism.
This patch does only refactorings, such:
- eliminates always false parameters (like WarnAboutSize)
- reduces the number of parameters
- makes strong types differentiating *source* and *destination* buffers
(same with size expressions)
- binds the argument expression and the index, making diagnostics accurate
and easy to emit
- removes a bunch of default parameters to make it more readable
- remove random const char* warning message parameters, making clear where
and what is going to be emitted
Note that:
- CheckBufferAccess now checks *only* one buffer, this removed about 100 LOC
code duplication
- not every function was refactored to use the /new/ strongly typed API, since
the CString related functions are really closely coupled monolithic beasts,
I will refactor them separately
- all tests are preserved and passing; only the message changed at some places.
In my opinion, these messages are holding the same information.
I would also highlight that this refactoring caught a bug in
clang/test/Analysis/string.c:454 where the diagnostic did not reflect reality.
This catch backs my effort on simplifying this monolithic CStringChecker.
Reviewers: NoQ, baloghadamsoftware, Szelethus, rengolin, Charusso
Reviewed By: NoQ
Subscribers: whisperity, xazax.hun, szepet, rnkovacs, a.sidorin,
mikhail.ramalho, donat.nagy, dkrupp, Charusso, martong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74806
Some checkers may not only depend on language options but also analyzer options.
To make this possible this patch changes the parameter of the shouldRegister*
function to CheckerManager to be able to query the analyzer options when
deciding whether the checker should be registered.
Differential Revision: https://reviews.llvm.org/D75271
Summary:
This patch introduces a placeholder for representing the dynamic size of
regions. It also moves the `getExtent()` method of `SubRegions` to the
`MemRegionManager` as `getStaticSize()`.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D69540
While analyzing code `memcmp(a, NULL, n);', where `a' has an unconstrained
symbolic value, the analyzer was emitting a warning about the *first* argument
being a null pointer, even though we'd rather have it warn about the *second*
argument.
This happens because CStringChecker first checks whether the two argument
buffers are in fact the same buffer, in order to take the fast path.
This boils down to assuming `a == NULL' to true. Then the subsequent check
for null pointer argument "discovers" that `a' is null.
Don't take the fast path unless we are *sure* that the buffers are the same.
Otherwise proceed as normal.
Differential Revision: https://reviews.llvm.org/D71322