Original commit message:
"
Commit https://github.com/llvm/llvm-project/commit/46f3ade introduced a notion
of printing the attributes on the left to improve the printing of attributes
attached to variable declarations. The intent was to produce more GCC compatible
code because clang tends to print the attributes on the right hand side which is
not accepted by gcc.
This approach has increased the complexity in tablegen and the attrubutes
themselves as now the are supposed to know where they could appear. That lead to
mishandling of the `override` keyword which is modelled as an attribute in
clang.
This patch takes an inspiration from the existing approach and tries to keep the
position of the attributes as they were written. To do so we use simpler
heuristic which checks if the source locations of the attribute precedes the
declaration. If so, it is considered to be printed before the declaration.
Fixes https://github.com/llvm/llvm-project/issues/87151
"
The reason for the bot breakage is that attributes coming from ApiNotes are not
marked implicit even though they do not have source locations. This caused an
assert to trigger. This patch forces attributes with no source location
information to be printed on the left. That change is consistent to the overall
intent of the change to increase the chances for attributes to compile across
toolchains and at the same time the produced code to be as close as possible to
the one written by the user.
Commit https://github.com/llvm/llvm-project/commit/46f3ade introduced a
notion of printing the attributes on the left to improve the printing of
attributes attached to variable declarations. The intent was to produce
more GCC compatible code because clang tends to print the attributes on
the right hand side which is not accepted by gcc.
This approach has increased the complexity in tablegen and the
attrubutes themselves as now the are supposed to know where they could
appear. That lead to mishandling of the `override` keyword which is
modelled as an attribute in clang.
This patch takes an inspiration from the existing approach and tries to
keep the position of the attributes as they were written. To do so we
use simpler heuristic which checks if the source locations of the
attribute precedes the declaration. If so, it is considered to be
printed before the declaration.
Fixes https://github.com/llvm/llvm-project/issues/87151
The checker may create failure branches for all stream write operations
only if the new option "pedantic" is set to true.
Result of the write operations is often not checked in typical code. If
failure branches are created the checker will warn for unchecked write
operations and generate a lot of "false positives" (these are valid
warnings but the programmer does not care about this problem).
Until now function `fseek` returned nonzero on error, this is changed to
-1 only. And it does not produce EOF error any more.
This complies better with the POSIX standard.
When debugging CSA issues, sometimes it would be useful to have a
dedicated note for the analysis entry point, aka. the function name you
would need to pass as "-analyze-function=XYZ" to reproduce a specific
issue.
One way we use (or will use) this downstream is to provide tooling on
top of creduce to enhance to supercharge productivity by automatically
reduce cases on crashes for example.
This will be added only if the "-analyzer-note-analysis-entry-points" is
set or the "analyzer-display-progress" is on.
This additional entry point marker will be the first "note" if enabled,
with the following message: "[debug] analyzing from XYZ". They are
prefixed by "[debug]" to remind the CSA developer that this is only
meant to be visible for them, for debugging purposes.
CPP-5012
This reapplies 80ab8234ac309418637488b97e0a62d8377b2ecf again, after
fixing a name collision warning in the unit tests (see the revert commit
13ccaf9b9d4400bb128b35ff4ac733e4afc3ad1c for details).
In addition to the previously applied changes, this commit also clarifies the
code in MallocChecker that distinguishes POSIX "getline()" and C++ standard
library "std::getline()" (which are two completely different functions). Note
that "std::getline()" was (accidentally) handled correctly even without this
clarification; but it's better to explicitly handle and test this corner case.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
The checker finds a type of undefined behavior, where if the type of a
pointer to an object-array is different from the objects' underlying
type, calling `delete[]` is undefined, as the size of the two objects
might be different.
The checker has been in alpha for a while now, it is a simple checker
that causes no crashes, and considering the severity of the issue, it
has a low result-count on open-source projects (in my last test-run on
my usual projects, it had 0 results).
This commit cleans up the documentation and adds docs for the limitation
related to tracking through references, in addition to moving it to
`cplusplus`.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Co-authored-by: whisperity <whisperity@gmail.com>
According to POSIX 2018.
1. lineptr, n and stream can not be NULL.
2. If *n is non-zero, *lineptr must point to a region of at least *n
bytes, or be a NULL pointer.
Additionally, if *lineptr is not NULL, *n must not be undefined.
Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.
When a constructor (of which we evaluate the initializer-list) is
analyzed in top-level context, then the `this` pointer will be
represented by a `SymbolicRegion`, (as it should be).
This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.
```c++
class Bear {
public:
void brum() const;
};
class Door {
public:
// PostInitializer would refer to "FieldRegion{SymRegion{this}}"
// whereas in the store and everywhere else it would be:
// "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
Door() : ptr(nullptr) {
ptr->brum(); // Bug
}
private:
Bear* ptr;
};
```
We (as CSA folks) decided to avoid the creation of FieldRegions directly
of symbolic regions in the past:
f8643a9b31
---
In this patch, I propose to also canonicalize it as in the mentioned
patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0,
Ty}`
This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting in
between.
This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we
use `PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.
Note that in really rare cases it suppresses now dereference bugs, as
demonstrated in the test. It is because in the past we failed to follow
the region of the PostInitializer inside the StoreSiteFinder visitor -
because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional<PostInitializer> PIP =
Pred->getLocationAs<PostInitializer>()) {
const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP->getInitializer()->getInit();
}
}
```
Notice that the equality check didn't pass for the regions I'm
canonicalizing in this patch.
Given the nature of this change, we would rather upstream this patch.
CPP-4954
Predefined macro FUNCTION in clang is not returning the same string than
MS for templated functions.
See https://godbolt.org/z/q3EKn5zq4
For the same test case MSVC is returning:
function: TestClass::TestClass
function: TestStruct::TestStruct
function: TestEnum::TestEnum
The initial work for this was in the reverted patch
(https://github.com/llvm/llvm-project/pull/66120). This patch solves the
issues raised in the reverted patch.
The checker alpha.security.ArrayBoundV2 performs bounds checking in two
steps: first it checks for underflow, and if it isn't guaranteed then it
assumes that there is no underflow. After this, it checks for overflow,
and if that's guaranteed or the index is tainted then it reports it.
This meant that in situations where overflow and underflow are both
possible (but the index is either tainted or guaranteed to be invalid),
the checker was reporting just an overflow error.
This commit modifies the messages printed in these cases to mention the
possibility of an underflow.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
* Add support for multiple, potentially overlapping critical sections:
The checker can now simultaneously handle several mutex's critical
sections without confusing them.
* Implement the handling of recursive mutexes:
By identifying the lock events, recursive mutexes are now supported.
A lock event is a pair of a lock expression, and the SVal of the mutex
that it locks, so even multiple locks of the same mutex (and even by
the same expression) is now supported.
* Refine the note tags generated by the checker:
The note tags now correctly show just for mutexes that are
active at the point of error, and multiple acquisitions of the same mutex
are also noted.
StaticAnalyzer didn't check if the variable is declared in
`CompoundStmt` under `SwitchStmt`, which make static analyzer reach root
without finding the declaration.
Fixes#68819
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
These functions should not be allowed if the file position is
indeterminate (they return the file position).
This condition is now checked, and tests are improved to check it.
This PR makes alpha.webkit.UncountedLocalVarsChecker ignore raw
references and pointers to a ref counted type which appears within
"trival" statements. To do this, this PR extends TrivialFunctionAnalysis
so that it can also analyze "triviality" of statements as well as that
of functions Each Visit* function is now augmented with
withCachedResult, which is responsible for looking up and updating the
cache for each Visit* functions.
As this PR dramatically improves the false positive rate of the checker,
it also deletes the code to ignore raw pointers and references within if
and for statements.
`getdelim` and `getline` may free, allocate, or re-allocate the input
buffer, ensuring its size is enough to hold the incoming line, the
delimiter, and the null terminator.
`*lineptr` must be a valid argument to `free`, which means it can be
either
1. `NULL`, in which case these functions perform an allocation
equivalent to a call to `malloc` even on failure.
2. A pointer returned by the `malloc` family of functions. Other
pointers are UB (`alloca`, a pointer to a static, to a stack variable, etc.)
This commit adds a testcase which highlights the current incorrect
behavior of the CSA diagnostic generation: it produces a note which says
"Assuming 'arg' is >= 0" in a situation where this is not a fresh
assumption because 'arg' is an unsigned integer.
I also created ticket 78440 to track this bug.
`va_list` is a platform-specific type. On some, it is a struct instead
of a pointer to a struct, so `lookupFn` was ignoring calls to `vfprintf`
and `vfscanf`.
`stream.c` now runs in four different platforms to make sure the logic
works across targets.
Emit a warning if pointer/reference to compound literal is returned from
a function.
In C, compound literals in block scope are lvalues that have automatic
storage duration. In C++, compound literals in block scope are
temporaries.
In either case, returning a pointer/reference to a compound literal can
cause a use-after-free bug.
Fixes#8678
Default value of checker option `ModelPOSIX` is changed to `true`.
Documentation is updated.
This is a re-apply of commit 7af4e8bcc354d2bd7e46ecf547172b1f19ddde3e
that was reverted because a test failure (this is fixed now).
If a stream operation fails the position can become "indeterminate".
This may cause warning from the checker at a later operation. The new
note tag shows the place where the position becomes "indeterminate",
this is where a failure occurred.
Model `getc` and `putc` as equivalent to `fgetc` and `fputc` respectively.
Model `vfscanf` and `vfprintf` as `fscanf` and `fprintf`, except that
`vfscanf` can not invalidate the parameters due to the indirection via a
`va_list`. Nevertheless, we can still track EOF and errors as for `fscanf`.
The checker reported a false positive on this code
void testTaintedSanitizedVLASize(void) {
int x;
scanf("%d", &x);
if (x<1)
return;
int vla[x]; // no-warning
}
After the fix, the checker only emits tainted warning if the vla size is
coming from a tainted source and it cannot prove that it is positive.
Specific arguments passed to stream handling functions are changed by
the function, this means these should be invalidated ("escaped") by the
analyzer. This change adds the argument invalidation (in specific cases)
to the checker.
A memory access is an out of bounds error if the offset is < the extent
of the memory region. Notice that here "<" is a _mathematical_
comparison between two numbers and NOT a C/C++ operator that compares
two typed C++ values: for example -1 < 1000 is true in mathematics, but
if the `-1` is an `int` and the `1000` is a `size_t` value, then
evaluating the C/C++ operator `<` will return false because the `-1`
will be converted to `SIZE_MAX` by the automatic type conversions.
This means that it's incorrect to perform a bounds check with
`evalBinOpNN(State, BO_LT, ...)` which performs automatic conversions
and can produce wildly incorrect results.
ArrayBoundsCheckerV2 already had a special case where it avoided calling
`evalBinOpNN` in a situation where it would have performed an automatic
conversion; this commit replaces that code with a more general one that
covers more situations. (It's still not perfect, but it's better than
the previous version and I think it will cover practically all
real-world code.)
Note that this is not a limitation/bug of the simplification algorithm
defined in `getSimplifedOffsets()`: the simplification is not applied in
the test case `test_comparison_with_extent_symbol` (because the `Extent`
is not a concrete int), but without the new code it would still run into
a `-1 < UNSIGNED` comparison that evaluates to false because
`evalBinOpNN` performs an automatic type conversion.
Function 'fileno' fails only if invalid pointer is passed, this is a
case that is often ignored in source code. The failure case leads to
many "false positive" reports when `fileno` returns -1 and this is not
checked in the program. Because this, the function is now assumed
to not fail (this is assumption that the passed file pointer is correct).
The change affects `StdCLibraryFunctionsChecker` and
`StreamChecker`.
This PR adds the support for WebKit's RefAllowingPartiallyDestroyed and
RefPtrAllowingPartiallyDestroyed, which are smart pointer types which
may be used after the destructor had started running.
This PR makes the checker ignore / skip calls to methods of Web Template
Platform's container types such as HashMap, HashSet, WeakHashSet,
WeakHashMap, Vector, etc...
Now calling `open` with the `O_CREAT` flag and no mode parameter will
raise an issue in any system that defines `O_CREAT`.
The value for this flag is obtained after the full source code has been
parsed, leveraging `checkASTDecl`.
Hence, any `#define` or `#undefine` of `O_CREAT` following an `open` may
alter the results. Nevertheless, since redefining reserved identifiers
is UB, this is probably ok.
Allow address-of operator (&), enum constant, and a reference to
constant as well as materializing temporqary expression and an
expression with cleanups to appear within a trivial function.
This PR introduces the concept of a "trivial function" which applies to
a function that only calls other trivial functions and contain literals
and expressions that don't result in heap mutations (specifically it
does not call deref). This is implemented using ConstStmtVisitor and
checking each statement and expression's trivialness.
This PR also introduces the concept of a "ingleton function", which is a
static member function or a free standing function which ends with the
suffix "singleton". Such a function's return value is understood to be
safe to call any function with.
This PR makes the checker not emit warning when a function is called
with a return value of another function when the return value is of type
Ref<T> or RefPtr<T>.
This PR makes alpha.webkit.UncountedCallArgsChecker eplicitly check the
safety of the object argument in a member function call. It also removes
the exemption of local variables from this checker so that each local
variable's safety is checked if it's used in a function call instead of
relying on the local variable checker to find those since local variable
checker currently has exemption for "for" and "if" statements.
The attribute is now allowed on an assortment of declarations, to
suppress warnings related to declarations themselves, or all warnings in
the lexical scope of the declaration.
I don't necessarily see a reason to have a list at all, but it does look
as if some of those more niche items aren't properly supported by the
compiler itself so let's maintain a short safe list for now.
The initial implementation raised a question whether the attribute
should apply to lexical declaration context vs. "actual" declaration
context. I'm using "lexical" here because it results in less warnings
suppressed, which is the conservative behavior: we can always expand it
later if we think this is wrong, without breaking any existing code. I
also think that this is the correct behavior that we will probably never
want to change, given that the user typically desires to keep the
suppressions as localized as possible.
This PR aligns the evaluation of default arguments with other kinds of
arguments by extracting the expressions within them as argument values
to be evaluated.
The bug was caused by isRefCountable erroneously returning false for a
class with both ref() and deref() functions defined because we were not
resetting the base paths results between looking for "ref()" and
"deref()"