I'm involved with the Static Analyzer for the most part.
I think we should embrace newer language standard features and gradually
move forward.
Differential Revision: https://reviews.llvm.org/D154325
In the following example, we will end up hitting the `llvm_unreachable()`:
https://godbolt.org/z/5sccc95Ec
```lang=C++
enum class E {};
const E glob[] = {{}};
void initlistWithinInitlist() {
  clang_analyzer_dump(glob[0]); // crashes at loading from `glob[0]`
}
```
We should just return `std::nullopt` instead for these cases.
It's better than crashing.
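A minimal sketch of the idea, not the analyzer's actual code: a lookup that cannot interpret an initializer element reports "no value" through `std::optional` instead of aborting. `InitElem` and `constantAt` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an initializer element: it either holds a scalar
// constant or is a nested (possibly empty) initializer list such as `{}`.
struct InitElem {
  bool IsNestedList;
  int Scalar; // meaningful only when !IsNestedList
};

// Returns the constant at Idx if it can be read directly. std::nullopt means
// "unknown", which the caller can handle conservatively.
std::optional<int> constantAt(const std::vector<InitElem> &Init,
                              std::size_t Idx) {
  if (Idx >= Init.size())
    return std::nullopt;
  const InitElem &E = Init[Idx];
  if (E.IsNestedList)
    return std::nullopt; // the case that used to hit an unreachable marker
  return E.Scalar;
}
```

The caller decides what "unknown" means, which is usually a safer default than terminating the analysis.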
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D146538
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
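For illustration, the kind of mechanical rewrite involved looks roughly like this. `parseDigit` is a hypothetical helper, not code touched by the patch.

```cpp
#include <cassert>
#include <optional>

// Before the migration this would have returned `llvm::Optional<int>` and
// used `None` for the empty case.
std::optional<int> parseDigit(char C) {
  if (C < '0' || C > '9')
    return std::nullopt; // was: return None;
  return C - '0';
}
```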
Casting a pointer to a suitably large integral type by reinterpret-cast
should result in the same value as by using the `__builtin_bit_cast()`.
The compiler exploits this: https://godbolt.org/z/zMP3sG683
However, the analyzer does not bind the same symbolic value to these
expressions, resulting in weird situations, such as failing equality
checks and even crashes: https://godbolt.org/z/oeMP7cj8q
Previously, in the `RegionStoreManager::getBinding()` even if `T` was
non-null, we replaced it with `TVR->getValueType()` in case the `MR` was
`TypedValueRegion`.
It doesn't make much sense to auto-detect the type if the type is
already given. By not doing the auto-detection, we would just do the
right thing and perform the load by that type.
This means that we will cast the value to that type.
So, in this patch, I'm proposing to do auto-detection only if the type
was null.
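The proposed rule can be sketched as follows. This is a simplified model under stated assumptions: types are modeled as strings, a null type as an empty optional, and `Region` and `chooseLoadType` are hypothetical names rather than the real `RegionStoreManager` API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins: a type is a string, a null type is std::nullopt.
using Type = std::optional<std::string>;

struct Region {
  Type ValueType; // set only for "typed value" regions
};

// Picks the type to load with: the caller's type wins; the region's own value
// type is only a fallback when no type was requested.
Type chooseLoadType(const Region &R, const Type &Requested) {
  if (Requested)
    return Requested;
  return R.ValueType;
}
```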
Here is a snippet of code, annotated by the previous and new dump values.
`LocAsInteger` should wrap the `SymRegion`, since we want to load the
address as if it was an integer.
In none of the following cases should type auto-detection be triggered,
hence we should eventually reach an `evalCast()` to lazily cast the loaded
value into that type.
```lang=C++
void LValueToRValueBitCast_dumps(void *p, char (*array)[8]) {
  clang_analyzer_dump(p);     // remained: &SymRegion{reg_$0<void * p>}
  clang_analyzer_dump(array); // remained: &SymRegion{reg_$1<char (*)[8] array>}
  clang_analyzer_dump((unsigned long)p);
  // remained: &SymRegion{reg_$0<void * p>} [as 64 bit integer]
  clang_analyzer_dump(__builtin_bit_cast(unsigned long, p)); // <--- change #1
  // previously: &SymRegion{reg_$0<void * p>}
  // now: &SymRegion{reg_$0<void * p>} [as 64 bit integer]
  clang_analyzer_dump((unsigned long)array);
  // remained: &SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]
  clang_analyzer_dump(__builtin_bit_cast(unsigned long, array)); // <--- change #2
  // previously: &SymRegion{reg_$1<char (*)[8] array>}
  // now: &SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]
}
```
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D136603
Previously, `LazyCompoundVal` bindings to subregions referred to by
`LazyCompoundVals` were not marked as //lazily copied//.
This change returns `LazyCompoundVals` from `getInterestingValues()`,
so their regions can be marked as //lazily copied// in `RemoveDeadBindingsWorker::VisitBinding()`.
Depends on D134947
Authored by: Tomasz Kamiński <tomasz.kamiński@sonarsource.com>
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D135136
To illustrate our current understanding, let's start with the following program:
https://godbolt.org/z/33f6vheh1
```lang=c++
void clang_analyzer_printState();
struct C {
  int x;
  int y;
  int more_padding;
};
struct D {
  C c;
  int z;
};
C foo(D d, int new_x, int new_y) {
  d.c.x = new_x;      // B1
  assert(d.c.x < 13); // C1
  C c = d.c;          // L
  assert(d.c.y < 10); // C2
  assert(d.z < 5);    // C3
  d.c.y = new_y;      // B2
  assert(d.c.y < 10); // C4
  return c;           // R
}
```
In the code, we create a few bindings to subregions of root region `d` (`B1`, `B2`), a few constraints on the values (`C1`, `C2`, ...), and a `lazyCompoundVal` for the part of the region `d` at point `L`, which is returned at point `R`.
Now, the question is which of these should remain live as long as the return value of the `foo` call is live. In a perfect world we should preserve:
# only the bindings of the subregions of `d.c` which were created before the copy at `L`. In our example, this includes `B1` but not `B2`. In other words, `new_x` should be live but `new_y` shouldn't.
# constraints on the values of `d.c` that are reachable through `c`. These can be created either before or after the point of making the copy (`L`). In our case, that would be `C1` and `C2`, but not `C3` (the value of `d.z` is not reachable through `c`) or `C4` (the original value of `d.c.y` was overridden at `B2` after the creation of `c`).
The current code in the `RegionStore` covers use case (1) by using `getInterestingValues()` to extract bindings to parts of the referred region present in the store at the point of copy. This also partially covers point (2) when constraints are applied to a location that has a binding at the point of the copy (in our case `d.c.x` in `C1`, which has the value `new_x`), but it fails to preserve the constraints that require creating a new symbol for the location (`d.c.y` in `C2`).
We introduce the concept of //lazily copied// locations (regions) to the `SymbolReaper`, i.e. locations for which a program can access the stored value but not the address. These locations are constructed as the set of regions referred to by a `lazyCompoundVal`. A //readable// location (region) is a location that is //live// or //lazily copied//. Symbols that refer to values in regions are alive if the region is //readable//.
For simplicity, we follow the current approach to live regions: we mark the base region as //lazily copied// and consider any of its subregions //readable//. This makes some symbols falsely live (`d.z` in our example) and keeps the corresponding constraints alive.
The rename of `Regions` to `LiveRegions` inside `RegionStore` is an NFC change, made to clarify the difference between the regions stored in these two sets.
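The //readable// rule can be sketched with a toy model. Everything here is an assumption made for illustration: regions are named by dotted paths such as `d.c.y`, the base region is the part before the first dot, and `Reaper` is a hypothetical stand-in, not the real `SymbolReaper` interface.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model: the base region of "d.c.x" is "d".
std::string baseRegion(const std::string &Region) {
  return Region.substr(0, Region.find('.'));
}

struct Reaper {
  std::set<std::string> LiveBases;         // address and value are reachable
  std::set<std::string> LazilyCopiedBases; // only the value is reachable

  // A region is readable if its base region is live or lazily copied.
  bool isReadable(const std::string &Region) const {
    const std::string Base = baseRegion(Region);
    return LiveBases.count(Base) != 0 || LazilyCopiedBases.count(Base) != 0;
  }
};
```

With `d` marked as lazily copied, both `d.c.y` and `d.z` come out readable, which matches the over-approximation described above.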
Regression Test: https://reviews.llvm.org/D134941
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D134947
`LazyCompoundVals` should only appear as `default` bindings in the
store. This fixes the second case in this patch-stack.
Depends on: D132142
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D132143
It turns out that in certain cases `SymbolRegions` are wrapped by
`ElementRegions`; in others, they are not. This discrepancy can cause the
analyzer not to recognize that the two regions actually refer to
the same entity, which can then lead to exploring unreachable paths.
Consider this example:
```lang=C++
struct Node { int* ptr; };
void with_structs(Node* n1) {
  Node c = *n1; // copy
  Node* n2 = &c;
  clang_analyzer_dump(*n1); // lazy...
  clang_analyzer_dump(*n2); // lazy...
  clang_analyzer_dump(n1->ptr); // rval(n1->ptr): reg_$2<int * SymRegion{reg_$0<struct Node * n1>}.ptr>
  clang_analyzer_dump(n2->ptr); // rval(n2->ptr): reg_$1<int * Element{SymRegion{reg_$0<struct Node * n1>},0 S64b,struct Node}.ptr>
  clang_analyzer_eval(n1->ptr != n2->ptr); // UNKNOWN, bad!
  (void)(*n1);
  (void)(*n2);
}
```
The copy of `n1` will insert a new binding to the store; but for doing
that it actually must create a `TypedValueRegion` which it could pass to
the `LazyCompoundVal`. Since the memregion in question is a
`SymbolicRegion` - which is untyped, it needs to first wrap it into an
`ElementRegion` basically implementing this untyped -> typed conversion
for the sake of passing it to the `LazyCompoundVal`.
So, this is why we have `Element{SymRegion{.}, 0,struct Node}` for `n1`.
The problem appears if the analyzer evaluates a read from the expression
`n1->ptr`. The same logic won't apply for `SymbolRegionValues`, since
they accept raw `SubRegions`, hence the `SymbolicRegion` won't be
wrapped into an `ElementRegion` in that case.
Later when we arrive at the equality comparison, we cannot prove that
they are equal.
For more details check the corresponding thread on discourse:
https://discourse.llvm.org/t/are-symbolicregions-really-untyped/64406
---
In this patch, I'm eagerly wrapping each `SymbolicRegion` by an
`ElementRegion`; basically canonicalizing to this form.
It seems reasonable to do so since any object can be thought of as a single
array of that object; so this should not make much of a difference.
The tests also underpin this assumption, as only a few were broken by
this change, and it actually fixed a FIXME along the way.
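As a rough model of why canonicalizing helps: if region identity is a key, a wrapped and an unwrapped form of the same symbol compare unequal until both are brought to one form. `RegionKey` and `canonicalize` are hypothetical names; the real region classes are considerably richer.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a region key: a symbolic base that may or may not be
// wrapped into a zero-index element of the pointee type.
struct RegionKey {
  std::string Symbol;
  bool WrappedInElement;
  std::string ElementType; // meaningful only when WrappedInElement

  bool operator==(const RegionKey &O) const {
    return Symbol == O.Symbol && WrappedInElement == O.WrappedInElement &&
           ElementType == O.ElementType;
  }
};

// Canonical form: always wrap, so both access paths produce the same key.
RegionKey canonicalize(RegionKey K, const std::string &PointeeType) {
  if (!K.WrappedInElement) {
    K.WrappedInElement = true;
    K.ElementType = PointeeType;
  }
  return K;
}
```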
About the second example, which does the same copy operation - but on
the heap - it will be fixed by the next patch.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D132142
Prior to this patch, when the analyzer encountered a non-POD 0 length array,
it still invoked the constructor for 1 element, which led to false positives.
This patch makes sure that we no longer construct any elements when we see a
0 length array.
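The runtime behavior being modeled can be checked with a small program. The message does not say which forms of 0 length arrays are covered, so this sketch uses `new T[0]`, which standard C++ permits, as a portable stand-in. `Counted` and `constructorCallsFor` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many times the non-trivial constructor runs.
struct Counted {
  static int Constructed;
  Counted() { ++Constructed; }
};
int Counted::Constructed = 0;

// Allocates an array of N elements and reports how many constructor calls
// that caused. For N == 0 no element is constructed.
int constructorCallsFor(std::size_t N) {
  Counted::Constructed = 0;
  Counted *Arr = new Counted[N];
  delete[] Arr;
  return Counted::Constructed;
}
```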
Differential Revision: https://reviews.llvm.org/D131501
If a lazyCompoundVal to a struct is bound to the store, there is a policy which decides
whether a copy gets created instead.
This patch introduces a similar policy for arrays, which is required to model structured
binding to arrays without false negatives.
Differential Revision: https://reviews.llvm.org/D128064
Region store was not able to see through this case to the actual
initialized value of STRUCT ff. This change addresses this case by
getting the direct binding. This was found and debugged in a downstream
compiler, with debug guidance from @steakhal. A positive and negative
test case is added.
The specific case where this issue was exposed:
typedef struct {
  int a:1;
  int b[2];
} STRUCT;
int main() {
  STRUCT ff = {0};
  STRUCT* pff = &ff;
  int a = ((int)pff + 1);
  return a;
}
Reviewed By: steakhal, martong
Differential Revision: https://reviews.llvm.org/D124349
Essentially, having a default member initializer for a constant member
does not necessarily imply the member will have the given default value.
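A small example of why the default value cannot be assumed: a constructor's mem-initializer takes precedence over the default member initializer. `WithDefault` and `readX` are illustrative names.

```cpp
#include <cassert>

struct WithDefault {
  const int X = 1;                      // default member initializer
  WithDefault() = default;              // X == 1
  explicit WithDefault(int V) : X(V) {} // mem-initializer overrides the default
};

int readX(const WithDefault &W) { return W.X; }
```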
Remove part of a2e053638bbf ([analyzer] Treat more const variables and
fields as known contants., 2018-05-04).
Fix #47878
Reviewed By: r.stahl, steakhal
Differential Revision: https://reviews.llvm.org/D124621
Usages of makeNull need to be deprecated in favor of makeNullWithWidth
for architectures where the pointer size should not be assumed. This can
occur when pointers have different sizes depending on the address
space, for example. See https://reviews.llvm.org/D118050 as an example.
This was uncovered initially in a downstream compiler project, and
tested through those systems tests.
steakhal performed systems testing across a large set of open source
projects.
Co-authored-by: steakhal
Resolves: https://github.com/llvm/llvm-project/issues/53664
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D119601
Summary: Add support of multi-dimensional arrays in `RegionStoreManager::getBindingForElement`. Handle nested ElementRegion's getting offsets and checking for being in bounds. Get values from the nested initialization lists using obtained offsets.
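A simplified sketch of the lookup, restricted to two dimensions and to a toy representation of the initializer. `Row` and `elementAt` are hypothetical; the real code walks nested `ElementRegion`s and initializer list expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of `const int a[2][3] = {{1, 2}, {4}}`: each row keeps
// only the initializers that were actually written.
using Row = std::vector<int>;

// Returns the element at [I][J]. std::nullopt means an index is out of
// bounds, so no constant can be produced.
std::optional<int> elementAt(const std::vector<Row> &Init, std::size_t Rows,
                             std::size_t Cols, std::size_t I, std::size_t J) {
  if (I >= Rows || J >= Cols)
    return std::nullopt; // out-of-bounds access
  if (I >= Init.size() || J >= Init[I].size())
    return 0; // elements without an initializer are zero-initialized
  return Init[I][J];
}
```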
Differential Revision: https://reviews.llvm.org/D111654
Summary: Assuming that values of constant arrays never change, we can retrieve the value at a specific position (index) right from the initializer, if present. Retrieve a character code by index from a StringLiteral which is the initializer of a constant array in global scope.
This patch has a known issue of getting access to characters past the end of the literal. The declaration in which the literal is used is an implicit cast of kind `array-to-pointer`. The offset should be within the bounds of the literal's length. This should be distinguished from the cases described in C++20 [dcl.init.string] 9.4.2.3. Example:
const char arr[42] = "123";
char c = arr[41]; // OK
const char * const str = "123";
char c = str[41]; // NOK
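The distinction between the two cases can be sketched as a bounds rule. This is an illustrative model, not the patch's code: `charAt` is a hypothetical helper, and the extent is passed in explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Extent is the number of elements of the array being indexed: 42 for
// `const char arr[42] = "123"`, and 4 (three characters plus the terminator)
// when the literal itself is indexed through a pointer.
std::optional<char> charAt(std::string_view Literal, std::size_t Extent,
                           std::size_t Idx) {
  if (Idx >= Extent)
    return std::nullopt; // past the end: no value can be given
  if (Idx >= Literal.size())
    return '\0'; // terminator or zero-filled tail of the array
  return Literal[Idx];
}
```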
Differential Revision: https://reviews.llvm.org/D107339
Summary: Fix a case when the extent can not be retrieved correctly from an incomplete array declaration. Use redeclaration to get the array extent.
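For reference, this is the kind of declaration pair involved, where only the later redeclaration carries the extent. `Values` and `valuesExtent` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

extern const int Values[];          // incomplete type: extent unknown here
const int Values[3] = {10, 20, 30}; // redeclaration supplies the extent

// After the complete redeclaration the extent is available.
constexpr std::size_t valuesExtent() {
  return sizeof(Values) / sizeof(Values[0]);
}
```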
Differential Revision: https://reviews.llvm.org/D111542
Summary:
1. Improve readability by moving a deeply nested block of code from RegionStoreManager::getBindingForElement to new separate functions:
- getConstantValFromConstArrayInitializer;
- getSValFromInitListExpr.
2. Handle the case when index is a symbolic value. Write specific test cases.
3. Add test cases for when no initialization expression is present.
This patch is intended to make the next patches clearer and easier to review.
Differential Revision: https://reviews.llvm.org/D106681
It turns out llvm::isa<> is variadic, and we could have used this in a
lot of places.
The following patterns:
x && (isa<T1>(x) || isa<T2>(x) ...)
Will be replaced by:
isa_and_nonnull<T1, T2, ...>(x)
Sometimes it led to further simplifications, where leaving the code as
it was would have been an even bigger code smell.
Aside from this, keep in mind that within `assert()` or any macro
functions, we need to wrap the isa<> expression in parentheses,
due to the parsing of the comma.
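To show the shape of a variadic type test without depending on LLVM headers, here is a stand-in built on `dynamic_cast` and a C++17 fold expression. It is not LLVM's implementation; `isaAny` and `isaAnyAndNonNull` are hypothetical names.

```cpp
#include <cassert>

struct Base { virtual ~Base() = default; };
struct A : Base {};
struct B : Base {};
struct C : Base {};

// True if X points to an object of any of the types Ts.
template <typename... Ts> bool isaAny(const Base *X) {
  assert(X && "isaAny expects a non-null pointer");
  return ((dynamic_cast<const Ts *>(X) != nullptr) || ...);
}

// Null-tolerant variant: a null pointer is simply "not any of Ts".
template <typename... Ts> bool isaAnyAndNonNull(const Base *X) {
  return X && isaAny<Ts...>(X);
}
```

Inside a macro the call needs its own parentheses, for example `assert((isaAny<A, B>(P)));`, because the preprocessor would otherwise split the argument at the comma between `A` and `B`.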
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111982
Summary: Move logic from CastRetrievedVal to evalCast and replace CastRetrievedVal with evalCast. Also move guts from SimpleSValBuilder::dispatchCast inside evalCast.
evalCast is intended to replace dispatchCast, evalCastFromNonLoc and evalCastFromLoc in the future. OriginalTy provides additional information for casting, which is useful in some cases and useless in others. If `OriginalTy.isNull()` is true, then the cast is performed based on CastTy only. For now, evalCast operates in two ways: it retains all previous behavior and takes over the behavior of dispatchCast. dispatchCast, evalCastFromNonLoc and evalCastFromLoc are considered buggy, since they don't take the OriginalTy of the SVal into account, and should be improved.
From this patch on, use evalCast instead of the dispatchCast, evalCastFromNonLoc and evalCastFromLoc functions. dispatchCast redirects to evalCast.
This patch shall not change any behavior.
Differential Revision: https://reviews.llvm.org/D96090
Summary:
CompoundLiteralRegions have been properly modeled before, but
`getBindingForElement` was not changed to accommodate this change
properly.
rdar://problem/46144644
Differential Revision: https://reviews.llvm.org/D78990
The `SubEngine` interface is an interface with only one implementation,
`ExprEngine`. Adding other implementations is difficult and very
unlikely in the near future. Currently, if anything from `ExprEngine` is
to be exposed to other classes, it is moved to `SubEngine`, which
restricts the alternative implementations. The virtual methods have
a slight performance impact. Furthermore, instead of the `LLVM`-style
inheritance a native inheritance is used here, which renders `LLVM`
functions like e.g. `cast<T>()` unusable here. This patch removes this
interface and allows usage of `ExprEngine` directly.
Differential Revision: https://reviews.llvm.org/D80548
Summary:
This patch uses the new `DynamicSize.cpp` to serve dynamic information.
Previously it was static and probably imprecise data.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D69599
Summary:
This patch introduces a placeholder for representing the dynamic size of
regions. It also moves the `getExtent()` method of `SubRegions` to the
`MemRegionManager` as `getStaticSize()`.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D69540
When implementation of the block runtime is available, we should not
warn that block layout fields are uninitialized simply because they're
on the stack.
Write tests for the actual crash that was found. Write comments and refactor
code around 17 style bugs and suppress 3 false positives.
Differential Revision: https://reviews.llvm.org/D66847
llvm-svn: 370246
If the global variable has an initializer, we'll ignore it because we're usually
not analyzing the program from the beginning, which means that the global
variable may have changed before we start our analysis.
However when we're analyzing main() as the top-level function, we can rely
on global initializers to still be valid. At least in C; in C++ we have global
constructors that can still break this logic.
This patch allows the Static Analyzer to load constant initializers from
global variables if the top-level function of the current analysis is main().
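The reasoning can be seen in a tiny program: the initializer describes the value only at program start, so a function analyzed in isolation cannot rely on it. `Counter`, `readCounter` and `bump` are illustrative names.

```cpp
#include <cassert>

int Counter = 1; // the initializer is only guaranteed at program start

// Analyzed on its own, this function cannot assume Counter == 1: any caller
// may have changed it first.
int readCounter() { return Counter; }

// Some earlier code path that modifies the global.
void bump() { Counter = 7; }
```

At the very beginning of `main()` a call to `readCounter()` still sees the initializer; after `bump()` it no longer does.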
Differential Revision: https://reviews.llvm.org/D65361
llvm-svn: 370244
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368942
Quotes around StringRegions are now escaped and unescaped correctly,
producing valid JSON.
Additionally, add a forgotten escape for Store values.
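As a generic illustration of the escaping involved (not the analyzer's own routine), a string placed inside a JSON string literal needs at least its quotes and backslashes escaped. `escapeForJson` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Escapes the two characters that would terminate or corrupt a JSON string
// literal. Control characters are left out to keep the sketch short.
std::string escapeForJson(const std::string &In) {
  std::string Out;
  Out.reserve(In.size());
  for (char C : In) {
    if (C == '"' || C == '\\')
      Out += '\\';
    Out += C;
  }
  return Out;
}
```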
Differential Revision: https://reviews.llvm.org/D63519
llvm-svn: 363897
Include a unique pointer so that it is possible to figure out if it's
the same cluster in different program states. This allows comparing
dumps of different states against each other.
Differential Revision: https://reviews.llvm.org/D63362
llvm-svn: 363896