The current implementation of MemRegion::getDescriptiveName fails for
FieldRegions whose SuperRegion is an ElementRegion. As outlined below:
```Cpp
struct val_struct { int val; };
extern struct val_struct val_struct_array[3];
void func(){
// FieldRegion with ElementRegion as SuperRegion.
val_struct_array[0].val;
}
```
For this special case, the expression cannot be pretty printed and must
therefore be obtained separately.
This change enables more accurate modeling of the write effects of
`fread`. In particular, instead of invalidating the whole buffer, in a
best-effort basis, we would try to invalidate the actually accesses
elements of the buffer. This preserves the previous value of the buffer
of the unaffected slots. As a result, diagnose more uninitialized buffer
uses for example.
Currently, this refined invalidation only triggers for `fread` if and
only if the `count` parameter and the buffer pointer's index component
are concrete or perfectly-constrained symbols.
Additionally, if the `fread` would read more than 64 elements, the whole
buffer is invalidated as before. This is to have safeguards against
performance issues.
Refer to the comments of the assertions in the following example to see
the changes in the diagnostics:
```c++
void demo() {
FILE *fp = fopen("/home/test", "rb+");
if (!fp) return;
int buffer[10]; // uninitialized
int read_items = fread(buffer+1, sizeof(int), 5, fp);
if (5 == read_items) {
int v1 = buffer[1]; // Unknown value but not garbage.
clang_analyzer_isTainted(v1); // expected-warning {{YES}} <-- Would be "NO" without this patch.
clang_analyzer_dump(v1); // expected-warning {{conj_}} <-- Not a "derived" symbol, so it's directly invalidated now.
int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before.
(void)(v1 + v0);
} else {
// If 'fread' had an error.
int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before.
(void)v0;
}
fclose(fp);
}
```
CPP-3247, CPP-3802
Co-authored by Marco Borgeaud (marco-antognini-sonarsource)
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and
4-bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
Resolves#85124.
Fixes https://github.com/llvm/llvm-project/issues/84463
Changes:
- Adapted MemRegion::getDescriptiveName
- Added unittest to check name for a given clang::ento::ElementRegion
- Some format changes due to clang-format
---------
Co-authored-by: Andreas Steinhausen <andreas.steinhausen@concenrio.io>
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
When we construct a `NonParamVarRegion`, we canonicalize the decl to
always use the same entity for consistency.
At the moment that is the canonical decl - which is the first decl in
the redecl chain.
However, this can cause problems with tentative declarations and extern
declarations if we declare an array with unknown bounds.
Consider this C example: https://godbolt.org/z/Kdvr11EqY
```lang=C
typedef typeof(sizeof(int)) size_t;
size_t clang_analyzer_getExtent(const void *p);
void clang_analyzer_dump(size_t n);
extern const unsigned char extern_redecl[];
const unsigned char extern_redecl[] = { 1,2,3,4 };
const unsigned char tentative_redecl[];
const unsigned char tentative_redecl[] = { 1,2,3,4 };
const unsigned char direct_decl[] = { 1,2,3,4 };
void test_redeclaration_extent(void) {
clang_analyzer_dump(clang_analyzer_getExtent(direct_decl)); // 4
clang_analyzer_dump(clang_analyzer_getExtent(extern_redecl)); // should be 4 instead of Unknown
clang_analyzer_dump(clang_analyzer_getExtent(tentative_redecl)); // should be 4 instead of Unknown
}
```
The `getType()` of the canonical decls for the forward declared globals,
will return `IncompleteArrayType`, unlike the
`getDefinition()->getType()`, which would have returned
`ConstantArrayType` of 4 elements.
This makes the `MemRegionManager::getStaticSize()` return `Unknown` as
the extent for the array variables, leading to FNs.
To resolve this, I think we should prefer the definition decl (if
present) over the canonical decl when constructing `NonParamVarRegion`s.
FYI The canonicalization of the decl was introduced by D57619 in 2019.
Differential Revision: https://reviews.llvm.org/D154827
I'm involved with the Static Analyzer for the most part.
I think we should embrace newer language standard features and gradually
move forward.
Differential Revision: https://reviews.llvm.org/D154325
This patch introduces a new `CXXLifetimeExtendedObjectRegion` as a representation
of the memory for the temporary object that is lifetime extended by the reference
to which they are bound.
This separation provides an ability to detect the use of dangling pointers
(either binding or dereference) in a robust manner.
For example, the `ref` is conditionally dangling in the following example:
```
template<typename T>
T const& select(bool cond, T const& t, T const& u) { return cond ? t : u; }
int const& le = Composite{}.x;
auto&& ref = select(cond, le, 10);
```
Before the change, regardless of the value of `cond`, the `select()` call would
have returned a `temp_object` region.
With the proposed change we would produce a (non-dangling) `lifetime_extended_object`
region with lifetime bound to `le` or a `temp_object` region for the dangling case.
We believe that such separation is desired, as such lifetime extended temporaries
are closer to the variables. For example, they may have a static storage duration
(this patch removes a static temporary region, which was an abomination).
We also think that alternative approaches are not viable.
While for some cases it may be possible to determine if the region is lifetime
extended by searching the parents of the initializer expr, this quickly becomes
complex in the presence of the conditions operators like this one:
```
Composite cc;
// Ternary produces prvalue 'int' which is extended, as branches differ in value category
auto&& x = cond ? Composite{}.x : cc.x;
// Ternary produces xvalue, and extends the Composite object
auto&& y = cond ? Composite{}.x : std::move(cc).x;
```
Finally, the lifetime of the `CXXLifetimeExtendedObjectRegion` is tied to the lifetime of
the corresponding variables, however, the "liveness" (or reachability) of the extending
variable does not imply the reachability of all symbols in the region.
In conclusion `CXXLifetimeExtendedObjectRegion`, in contrast to `VarRegions`, does not
need any special handling in `SymReaper`.
RFC: https://discourse.llvm.org/t/rfc-detecting-uses-of-dangling-references/70731
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D151325
The last use was removed by:
commit e2e37b9afc0a0a66a1594377a88221e115d95348
Author: Ted Kremenek <kremenek@apple.com>
Date: Thu Jul 28 23:08:02 2011 +0000
MemRegion::getMemorySpace() is annotated with
LLVM_ATTRIBUTE_RETURNS_NONNULL (which triggers instant UB if a null
pointer is returned), and callers indeed don't check the return value
for null. Thus, even though llvm::dyn_cast is called, it can never
return null in this context. Therefore, we can safely call llvm::cast.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D151727
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
By default, clang assumes that all trailing array objects could be a
FAM. So, an array of undefined size, size 0, size 1, or even size 42 is
considered as FAMs for optimizations at least.
One needs to override the default behavior by supplying the
`-fstrict-flex-arrays=<N>` flag, with `N > 0` value to reduce the set of
FAM candidates. Value `3` is the most restrictive and `0` is the most
permissive on this scale.
0: all trailing arrays are FAMs
1: only incomplete, zero and one-element arrays are FAMs
2: only incomplete, zero-element arrays are FAMs
3: only incomplete arrays are FAMs
If the user is happy with consdering single-element arrays as FAMs, they
just need to remove the
`consider-single-element-arrays-as-flexible-array-members` from the
command line.
Otherwise, if they don't want to recognize such cases as FAMs, they
should specify `-fstrict-flex-arrays` anyway, which will be picked up by
CSA.
Any use of the deprecated analyzer-config value will trigger a warning
explaining what to use instead.
The `-analyzer-config-help` is updated accordingly.
Depends on D138657
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D138659
The -fstrict-flex-arrays=3 is the most restrictive type of flex arrays.
No number, including 0, is allowed in the FAM. In the cases where a "0"
is used, the resulting size is the same as if a zero-sized object were
substituted.
This is needed for proper _FORTIFY_SOURCE coverage in the Linux kernel,
among other reasons. So while the only reason for specifying a
zero-length array at the end of a structure is for specify a FAM,
treating it as such will cause _FORTIFY_SOURCE not to work correctly;
__builtin_object_size will report -1 instead of 0 for a destination
buffer size to keep any kernel internals from using the deprecated
members as fake FAMs.
For example:
struct broken {
int foo;
int fake_fam[0];
struct something oops;
};
There have been bugs where the above struct was created because "oops"
was added after "fake_fam" by someone not realizing. Under
__FORTIFY_SOURCE, doing:
memcpy(p->fake_fam, src, len);
raises no warnings when __builtin_object_size(p->fake_fam, 1) returns -1
and may stomp on "oops."
Omitting a warning when using the (invalid) zero-length array is how GCC
treats -fstrict-flex-arrays=3. A warning in that situation is likely an
irritant, because requesting this option level is explicitly requesting
this behavior.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
Differential Revision: https://reviews.llvm.org/D134902
It turns out that in certain cases `SymbolRegions` are wrapped by
`ElementRegions`; in others, it's not. This discrepancy can cause the
analyzer not to recognize if the two regions are actually referring to
the same entity, which then can lead to unreachable paths discovered.
Consider this example:
```lang=C++
struct Node { int* ptr; };
void with_structs(Node* n1) {
Node c = *n1; // copy
Node* n2 = &c;
clang_analyzer_dump(*n1); // lazy...
clang_analyzer_dump(*n2); // lazy...
clang_analyzer_dump(n1->ptr); // rval(n1->ptr): reg_$2<int * SymRegion{reg_$0<struct Node * n1>}.ptr>
clang_analyzer_dump(n2->ptr); // rval(n2->ptr): reg_$1<int * Element{SymRegion{reg_$0<struct Node * n1>},0 S64b,struct Node}.ptr>
clang_analyzer_eval(n1->ptr != n2->ptr); // UNKNOWN, bad!
(void)(*n1);
(void)(*n2);
}
```
The copy of `n1` will insert a new binding to the store; but for doing
that it actually must create a `TypedValueRegion` which it could pass to
the `LazyCompoundVal`. Since the memregion in question is a
`SymbolicRegion` - which is untyped, it needs to first wrap it into an
`ElementRegion` basically implementing this untyped -> typed conversion
for the sake of passing it to the `LazyCompoundVal`.
So, this is why we have `Element{SymRegion{.}, 0,struct Node}` for `n1`.
The problem appears if the analyzer evaluates a read from the expression
`n1->ptr`. The same logic won't apply for `SymbolRegionValues`, since
they accept raw `SubRegions`, hence the `SymbolicRegion` won't be
wrapped into an `ElementRegion` in that case.
Later when we arrive at the equality comparison, we cannot prove that
they are equal.
For more details check the corresponding thread on discourse:
https://discourse.llvm.org/t/are-symbolicregions-really-untyped/64406
---
In this patch, I'm eagerly wrapping each `SymbolicRegion` by an
`ElementRegion`; basically canonicalizing to this form.
It seems reasonable to do so since any object can be thought of as a single
array of that object; so this should not make much of a difference.
The tests also underpin this assumption, as only a few were broken by
this change; and actually fixed a FIXME along the way.
About the second example, which does the same copy operation - but on
the heap - it will be fixed by the next patch.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D132142
leaking in ARC mode
When ARC (automatic reference count) is enabled, (objective-c) block
objects are automatically retained and released thus they do not leak.
Without ARC, they still can leak from an expiring stack frame like
other stack variables.
With this commit, the static analyzer now puts a block object in an
"unknown" region if ARC is enabled because it is up to the
implementation to choose whether to put the object on stack initially
(then move to heap when needed) or in heap directly under ARC.
Therefore, the `StackAddrEscapeChecker` has no need to know
specifically about ARC at all and it will not report errors on objects
in "unknown" regions.
Reviewed By: NoQ (Artem Dergachev)
Differential Revision: https://reviews.llvm.org/D131009
Some code [0] consider that trailing arrays are flexible, whatever their size.
Support for these legacy code has been introduced in
f8f632498307d22e10fab0704548b270b15f1e1e but it prevents evaluation of
__builtin_object_size and __builtin_dynamic_object_size in some legit cases.
Introduce -fstrict-flex-arrays=<n> to have stricter conformance when it is
desirable.
n = 0: current behavior, any trailing array member is a flexible array. The default.
n = 1: any trailing array member of undefined, 0 or 1 size is a flexible array member
n = 2: any trailing array member of undefined or 0 size is a flexible array member
This takes into account two specificities of clang: array bounds as macro id
disqualify FAM, as well as non standard layout.
Similar patch for gcc discuss here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
[0] https://docs.freebsd.org/en/books/developers-handbook/sockets/#sockets-essential-functions
This reverts D126864 and related fixes.
This reverts commit 572b08790a69f955ae0cbb1b4a7d4a215f15dad9.
This reverts commit 886715af962de2c92fac4bd37104450345711e4a.
Some code [0] consider that trailing arrays are flexible, whatever their size.
Support for these legacy code has been introduced in
f8f632498307d22e10fab0704548b270b15f1e1e but it prevents evaluation of
__builtin_object_size and __builtin_dynamic_object_size in some legit cases.
Introduce -fstrict-flex-arrays=<n> to have stricter conformance when it is
desirable.
n = 0: current behavior, any trailing array member is a flexible array. The default.
n = 1: any trailing array member of undefined, 0 or 1 size is a flexible array member
n = 2: any trailing array member of undefined or 0 size is a flexible array member
n = 3: any trailing array member of undefined size is a flexible array member (strict c99 conformance)
Similar patch for gcc discuss here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
[0] https://docs.freebsd.org/en/books/developers-handbook/sockets/#sockets-essential-functions
The arithmetic restriction seems to be artificial.
The comment below seems to be stale.
Thus, we remove both.
Depends on D127306.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D127763
Previously, system globals were treated as immutable regions, unless it
was the `errno` which is known to be frequently modified.
D124244 wants to add a check for stores to immutable regions.
It would basically turn all stores to system globals into an error even
though we have no reason to believe that those mutable sys globals
should be treated as if they were immutable. And this leads to
false-positives if we apply D124244.
In this patch, I'm proposing to treat mutable sys globals actually
mutable, hence allocate them into the `GlobalSystemSpaceRegion`, UNLESS
they were declared as `const` (and a primitive arithmetic type), in
which case, we should use `GlobalImmutableSpaceRegion`.
In any other cases, I'm using the `GlobalInternalSpaceRegion`, which is
no different than the previous behavior.
---
In the tests I added, only the last `expected-warning` was different, compared to the baseline.
Which is this:
```lang=C++
void test_my_mutable_system_global_constraint() {
assert(my_mutable_system_global > 2);
clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{TRUE}}
invalidate_globals();
clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{UNKNOWN}} It was previously TRUE.
}
void test_my_mutable_system_global_assign(int x) {
my_mutable_system_global = x;
clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{TRUE}}
invalidate_globals();
clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{UNKNOWN}} It was previously TRUE.
}
```
---
Unfortunately, the taint checker will be also affected.
The `stdin` global variable is a pointer, which is assumed to be a taint
source, and the rest of the taint propagation rules will propagate from
it.
However, since mutable variables are no longer treated immutable, they
also get invalidated, when an opaque function call happens, such as the
first `scanf(stdin, ...)`. This would effectively remove taint from the
pointer, consequently disable all the rest of the taint propagations
down the line from the `stdin` variable.
All that said, I decided to look through `DerivedSymbol`s as well, to
acquire the memregion in that case as well. This should preserve the
previously existing taint reports.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D127306
This patch annotates the most important analyzer function APIs.
Also adds a couple of assertions for uncovering any potential issues
earlier in the constructor; in those cases, the member functions were
already dereferencing the members unconditionally anyway.
Measurements showed no performance impact, nor crashes.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D126198
Under the hood this prints the same as `QualType::getAsString()` but cuts out the middle-man when that string is sent to another raw_ostream.
Also cleaned up all the call sites where this occurs.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123926
WG14 has elected to remove support for K&R C functions in C2x. The
feature was introduced into C89 already deprecated, so after this long
of a deprecation period, the committee has made an empty parameter list
mean the same thing in C as it means in C++: the function accepts no
arguments exactly as if the function were written with (void) as the
parameter list.
This patch implements WG14 N2841 No function declarators without
prototypes (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2841.htm)
and WG14 N2432 Remove support for function definitions with identifier
lists (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2432.pdf).
It also adds The -fno-knr-functions command line option to opt into
this behavior in other language modes.
Differential Revision: https://reviews.llvm.org/D123955
Add a checker to maintain the system-defined value 'errno'.
The value is supposed to be set in the future by existing or
new checkers that evaluate errno-modifying function calls.
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D120310
Add a checker to maintain the system-defined value 'errno'.
The value is supposed to be set in the future by existing or
new checkers that evaluate errno-modifying function calls.
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D120310
This avoids an unnecessary copy required by 'return OS.str()', allowing
instead for NRVO or implicit move. The .str() call (which flushes the
stream) is no longer required since 65b13610a5226b84889b923bae884ba395ad084d,
which made raw_string_ostream unbuffered by default.
Differential Revision: https://reviews.llvm.org/D115374
Replace variable and functions names, as well as comments that contain whitelist with
more inclusive terms.
Reviewed By: aaron.ballman, martong
Differential Revision: https://reviews.llvm.org/D112642
It turns out llvm::isa<> is variadic, and we could have used this at a
lot of places.
The following patterns:
x && isa<T1>(x) || isa<T2>(x) ...
Will be replaced by:
isa_and_non_null<T1, T2, ...>(x)
Sometimes it caused further simplifications, when it would cause even
more code smell.
Aside from this, keep in mind that within `assert()` or any macro
functions, we need to wrap the isa<> expression within a parenthesis,
due to the parsing of the comma.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111982
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.
Differential Revision: https://reviews.llvm.org/D110808
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
`SVB.getStateManager().getOwningEngine().getAnalysisManager().getAnalyzerOptions()`
is quite a mouthful and might involve a few pointer indirections to get
such a simple thing like an analyzer option.
This patch introduces an `AnalyzerOptions` reference to the `SValBuilder`
abstract class, while refactors a few cases to use this /simpler/ accessor.
Reviewed By: martong, Szelethus
Differential Revision: https://reviews.llvm.org/D108824
Quoting https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html:
> In the absence of the zero-length array extension, in ISO C90 the contents
> array in the example above would typically be declared to have a single
> element.
We should not assume that the size of the //flexible array member// field has
a single element, because in some cases they use it as a fallback for not
having the //zero-length array// language extension.
In this case, the analyzer should return `Unknown` as the extent of the field
instead.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D108230
This reverts commit df1f4e0cc6ec9a734aae41ffd48ee8b2007fcabb.
Now the test case explicitly specifies the target triple.
I decided to use x86_64 for that matter, to have a fixed
bitwidth for `size_t`.
Aside from that, relanding the original changes of:
https://reviews.llvm.org/D105184
Currently only `ConstantArrayType` is considered for flexible array
members (FAMs) in `getStaticSize()`.
However, `IncompleteArrayType` also shows up in practice as FAMs.
This patch will ignore the `IncompleteArrayType` and return Unknown
for that case as well. This way it will be at least consistent with
the current behavior until we start modeling them accurately.
I'm expecting that this will resolve a bunch of false-positives
internally, caused by the `ArrayBoundV2`.
Reviewed By: ASDenysPetrov
Differential Revision: https://reviews.llvm.org/D105184
Currently, parameters of functions without their definition present cannot
be represented as regions because it would be difficult to ensure that the
same declaration is used in every case. To overcome this, we split
`VarRegion` to two subclasses: `NonParamVarRegion` and `ParamVarRegion`.
The latter does not store the `Decl` of the parameter variable. Instead it
stores the index of the parameter which enables retrieving the actual
`Decl` every time using the function declaration of the stack frame. To
achieve this we also removed storing of `Decl` from `DeclRegion` and made
`getDecl()` pure virtual. The individual `Decl`s are stored in the
appropriate subclasses, such as `FieldRegion`, `ObjCIvarRegion` and the
newly introduced `NonParamVarRegion`.
Differential Revision: https://reviews.llvm.org/D80522
Summary:
`ScopeContext` wanted to be a thing, but sadly it is dead code.
If you wish to continue the work in D19979, here was a tiny code which
could be reused, but that tiny and that dead, I felt that it is unneded.
Note: Other changes are truly uninteresting.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D73519