`CXXInheritedCtorInitExpr` is another of the node kinds that should be
considered an "original initializer". An assertion failure in
`assert(Children.size() == 1)` happens without this fix.
---------
Co-authored-by: martinboehme <mboehme@google.com>
Assume in fewer places that the analysis is of a `FunctionDecl`, and
initialize the `Environment` properly for `Stmt`s.
Moves constructors for `Environment` to header to make it more obvious
that there are only minor differences between them and very little
initialization in the constructors.
Tested with check-clang-tooling.
This class no longer serves any purpose; see also the discussion here:
https://reviews.llvm.org/D155204#inline-1503204
A lot of existing tests in TransferTest.cpp check for the existence of
`RecordValue`s. Some of these checks are now simply redundant and have
been
removed. In other cases, tests were checking for the existence of a
`RecordValue` as a way of testing whether a record has been initialized.
I have
typically changed these test to instead check whether a field of the
record has
a value.
Previously, we were propagating storage locations the other way around,
i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This
gave
the wrong behavior in some cases -- see the newly added or fixed tests
in this
patch.
In addition, this patch now unblocks removing the `RecordValue` class
entirely,
as we no longer need `RecordValue::getLoc()`.
With this patch, the test `TransferTest.DifferentReferenceLocInJoin`
started to
fail because the framework now always uses the same storge location for
a
`MaterializeTemporaryExpr`, meaning that the code under test no longer
set up
the desired state where a variable of reference type is mapped to two
different
storage locations in environments being joined. Rather than trying to
modify
this test to set up the test condition again, I have chosen to replace
the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the
test
condition directly; because this test is more direct, it will also be
less
brittle in the face of future changes.
In https://github.com/llvm/llvm-project/pull/72985, I made a change to
discard
expression state (`ExprToLoc` and `ExprToVal`) at the beginning of each
basic
block. I did so with the claim that "we never need to access entries
from these
maps outside of the current basic block", noting that there are
exceptions to
this claim when control flow happens inside a full-expression (the
operands of
`&&`, `||`, and the conditional operator live in different basic blocks
than the
operator itself) but that we already have a mechanism for retrieving the
values
of these operands from the environment for the block they are computed
in.
It turns out, however, that the operands of these operators aren't the
only
expressions whose values can be accessed from a different basic block;
when
control flow happens within a full-expression, that control flow can be
"interposed" between an expression and its parent. Here is an example:
```cxx
void f(int*, int);
bool cond();
void target() {
int i = 0;
f(&i, cond() ? 1 : 0);
}
```
([godbolt](https://godbolt.org/z/hrbj1Mj3o))
In the CFG[^1] , note how the expression for `&i` is computed in block
B4,
but the parent of this expression (the `CallExpr`) is located in block
B1.
The the argument expression `&i` and the `CallExpr` are essentially
"torn apart"
into different basic blocks by the conditional operator in the second
argument.
In other words, the edge between the `CallExpr` and its argument `&i`
straddles
the boundary between two blocks.
I used to think that this scenario -- where an edge between an
expression and
one of its children straddles a block boundary -- could only happen
between the
expression that triggers the control flow (`&&`, `||`, or the
conditional
operator) and its children, but the example above shows that other
expressions
can be affected as well; the control flow is still triggered by `&&`,
`||` or
the conditional operator, but the expressions affected lie outside these
operators.
Discarding expression state too soon is harmful. For example, an
analysis that
checks the arguments of the `CallExpr` above would not be able to
retrieve a
value for the `&i` argument.
This patch therefore ensures that we don't discard expression state
before the
end of a full-expression. In other cases -- when the evaluation of a
full-expression is complete -- we still want to discard expression state
for the
reasons explained in https://github.com/llvm/llvm-project/pull/72985
(avoid
performing joins on boolean values that are no longer needed, which
unnecessarily extends the flow condition; improve debuggability by
removing
clutter from the expression state).
The impact on performance from this change is about a 1% slowdown in the
Crubit nullability check benchmarks:
```
name old cpu/op new cpu/op delta
BM_PointerAnalysisCopyPointer 71.9µs ± 1% 71.9µs ± 2% ~ (p=0.987 n=15+20)
BM_PointerAnalysisIntLoop 190µs ± 1% 192µs ± 2% +1.06% (p=0.000 n=14+16)
BM_PointerAnalysisPointerLoop 325µs ± 5% 324µs ± 4% ~ (p=0.496 n=18+20)
BM_PointerAnalysisBranch 193µs ± 0% 192µs ± 4% ~ (p=0.488 n=14+18)
BM_PointerAnalysisLoopAndBranch 521µs ± 1% 525µs ± 3% +0.94% (p=0.017 n=18+19)
BM_PointerAnalysisTwoLoops 337µs ± 1% 341µs ± 3% +1.19% (p=0.004 n=17+19)
BM_PointerAnalysisJoinFilePath 1.62ms ± 2% 1.64ms ± 3% +0.92% (p=0.021 n=20+20)
BM_PointerAnalysisCallInLoop 1.14ms ± 1% 1.15ms ± 4% ~ (p=0.135 n=16+18)
```
[^1]:
```
[B5 (ENTRY)]
Succs (1): B4
[B1]
1: [B4.9] ? [B2.1] : [B3.1]
2: [B4.4]([B4.6], [B1.1])
Preds (2): B2 B3
Succs (1): B0
[B2]
1: 1
Preds (1): B4
Succs (1): B1
[B3]
1: 0
Preds (1): B4
Succs (1): B1
[B4]
1: 0
2: int i = 0;
3: f
4: [B4.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int *, int))
5: i
6: &[B4.5]
7: cond
8: [B4.7] (ImplicitCastExpr, FunctionToPointerDecay, _Bool (*)(void))
9: [B4.8]()
T: [B4.9] ? ... : ...
Preds (1): B5
Succs (2): B2 B3
[B0 (EXIT)]
Preds (1): B1
```
This is to be consistent with `getValue()`, which also uses
`ignoreCFGOmittedNodes()`.
Before this fix, it was not possible to retrieve a `Value` from a "CFG
omitted"
node that had previously been set using `setValue()`; see the
accompanying test,
which fails without the fix.
I discovered this issue while running internal integration tests on
https://github.com/llvm/llvm-project/pull/78127.
…ng member pointers.
This initially landed with a broken test due to a mid-air collision with
a new requirement for Environment initialization before field modeling.
Have added that initialization in the test.
From first landing:
getMethodDecl does not handle pointers to members and returns nullptr
for them. getMethodDecl contains a decade-plus-old FIXME to handle
pointers to members, but two approaches I looked at for fixing it are
more invasive or complex than simply swapping to getCalleeDecl.
The first, have getMethodDecl call getCalleeDecl, creates a large tree
of const-ness mismatches due to getMethodDecl returning a non-const
value while being a const member function and getCalleeDecl only being a
const member function when it returns a const value.
The second, implementing an AST walk to match how
CXXMemberCallExpr::getImplicitObjectArgument grabs the LHS of the binary
operator, is basically reimplementing Expr::getReferencedDeclOfCallee,
which is used by Expr::getCalleeDecl. We don't need another copy of that
code.
… pointers.
getMethodDecl does not handle pointers to members and returns nullptr
for them. getMethodDecl contains a decade-plus-old FIXME to handle
pointers to members, but two approaches I looked at for fixing it are
more invasive or complex than simply swapping to getCalleeDecl.
The first, have getMethodDecl call getCalleeDecl, creates a large tree
of const-ness mismatches due to getMethodDecl returning a non-const
value while being a const member function and getCalleeDecl only being a
const member function when it returns a const value.
The second, implementing an AST walk to match how
CXXMemberCallExpr::getImplicitObjectArgument grabs the LHS of the binary
operator, is basically reimplementing Expr::getReferencedDeclOfCallee,
which is used by Expr::getCalleeDecl. We don't need another copy of that
code.
Synthetic fields are intended to model the internal state of a class
(e.g. the value stored in a `std::optional`) without having to depend on
that class's implementation details.
Today, this is typically done with properties on `RecordValue`s, but
these have several drawbacks:
* Care must be taken to call `refreshRecordValue()` before modifying a
property so that the modified property values aren’t seen by other
environments that may have access to the same `RecordValue`.
* Properties aren’t associated with a storage location. If an analysis
needs to associate a location with the value stored in a property (e.g.
to model the reference returned by `std::optional::value()`), it needs
to manually add an indirection using a `PointerValue`. (See for example
the way this is done in UncheckedOptionalAccessModel.cpp, specifically
in `maybeInitializeOptionalValueMember()`.)
* Properties don’t participate in the builtin compare, join, and widen
operations. If an analysis needs to apply these operations to
properties, it needs to override the corresponding methods of
`ValueModel`.
* Longer-term, we plan to eliminate `RecordValue`, as by-value
operations on records aren’t really “a thing” in C++ (see
https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This
would obviously eliminate the ability to set properties on
`RecordValue`s.
To demonstrate the advantages of synthetic fields, this patch converts
UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly
simplifies the implementation of the check.
This PR is pretty big; to make it easier to review, I have broken it
down into a stack of three commits, each of which contains a set of
logically related changes. I considered submitting each of these as a
separate PR, but the commits only really make sense when taken together.
To review, I suggest first looking at the changes in
UncheckedOptionalAccessModel.cpp. This gives a flavor for how the
various API changes work together in the context of an analysis. Then,
review the rest of the changes.
We never need to access entries from these maps outside of the current
basic
block. This could only ever become a consideration when flow control
happens
inside a full-expression (i.e. we have multiple basic blocks for a full
expression); there are two kinds of expression where this can happen,
but we
already deal with these in other ways:
* Short-circuiting logical operators (`&&` and `||`) have operands that
live in
different basic blocks than the operator itself, but we already have
code in
the framework to retrieve the value of these operands from the
environment
for the block they are computed in, rather than in the environment of
the
block containing the operator.
* The conditional operator similarly has operands that live in different
basic
blocks. However, we currently don't implement a transfer function for
the
conditional operator. When we do this, we need to retrieve the values of
the
operands from the environments of the basic blocks they live in, as we
already do for logical operators. This patch adds a comment to this
effect
to the code.
Clearing out `ExprToLoc` and `ExprToVal` has two benefits:
* We avoid performing joins on boolean expressions contained in
`ExprToVal` and
hence extending the flow condition in cases where this is not needed.
Simpler
flow conditions should reduce the amount of work we do in the SAT
solver.
* Debugging becomes easier when flow conditions are simpler and
`ExprToLoc` /
`ExprToVal` don’t contain any extraneous entries.
Benchmark results on Crubit's `pointer_nullability_analysis_benchmark
show a
slight runtime increase for simple benchmarks, offset by substantial
runtime
reductions for more complex benchmarks:
```
name old cpu/op new cpu/op delta
BM_PointerAnalysisCopyPointer 29.8µs ± 1% 29.9µs ± 4% ~ (p=0.879 n=46+49)
BM_PointerAnalysisIntLoop 101µs ± 3% 104µs ± 4% +2.96% (p=0.000 n=55+57)
BM_PointerAnalysisPointerLoop 378µs ± 3% 245µs ± 3% -35.09% (p=0.000 n=47+55)
BM_PointerAnalysisBranch 118µs ± 2% 122µs ± 3% +3.37% (p=0.000 n=59+59)
BM_PointerAnalysisLoopAndBranch 779µs ± 3% 413µs ± 5% -47.01% (p=0.000 n=56+45)
BM_PointerAnalysisTwoLoops 187µs ± 3% 192µs ± 5% +2.80% (p=0.000 n=57+58)
BM_PointerAnalysisJoinFilePath 17.4ms ± 3% 7.2ms ± 3% -58.75% (p=0.000 n=58+57)
BM_PointerAnalysisCallInLoop 14.7ms ± 4% 10.3ms ± 2% -29.87% (p=0.000 n=56+58)
```
This allows querying whether, given the flow condition, a certain
formula still
has a solution (though it is not necessarily implied by the flow
condition, as
`flowConditionImplies()` would check).
This can be checked today, but only with a double negation, i.e. to
check
whether, given the flow condition, a formula F has a solution, you can
check
`!Env.flowConditionImplies(Arena.makeNot(F))`. The double negation makes
this
hard to reason about, and it would be nicer to have a way of directly
checking
this.
For consistency, this patch also renames `flowConditionImplies()` to
`proves()`;
the old name is kept around for compatibility but deprecated.
We want to eliminate the `RecordStorageLocation` from `RecordValue` and,
ultimately, eliminate `RecordValue` entirely (see the discussion linked
in the
`RecordValue` class comment). This is one step in that direction.
To eliminate `RecordValue::getChild()`, we also eliminate the last
remaining
caller, namely the `getFieldValue(const RecordValue *, ...)` overload.
Calls
to this overload have been rewritten to use the
`getFieldValue(const RecordStorageLocation *, ...)` overload. Note that
this
also makes the code slightly simpler in many cases.
Now that prvalue expressions map directly to values (see
https://reviews.llvm.org/D158977), it's no longer guaranteed that
`RecordValue`s
associated with the same expression will always have the same storage
location.
In other words, D158977 invalidated the assertion in
`mergeDistinctValues()`.
The newly added test causes this assertion to fail without the other
changes in
the patch.
This patch fixes the issue. However, the real fix will be to eliminate
the
`StorageLocation` from `RecordValue` entirely.
- Both of these constructs are used to represent structs, classes, and unions;
Clang uses the collective term "record" for these.
- The term "aggregate" in `AggregateStorageLocation` implies that, at some
point, the intention may have been to use it also for arrays, but it don't
think it's possible to use it for arrays. Records and arrays are very
different and therefore need to be modeled differently. Records have a fixed
set of named fields, which can have different type; arrays have a variable
number of elements, but they all have the same type.
- Futhermore, "aggregate" has a very specific meaning in C++
(https://en.cppreference.com/w/cpp/language/aggregate_initialization).
Aggregates of class type may not have any user-declared or inherited
constructors, no private or protected non-static data members, no virtual
member functions, and so on, but we use `AggregateStorageLocations` to model all objects of class type.
In addition, for consistency, we also rename the following:
- `getAggregateLoc()` (in `RecordValue`, formerly known as `StructValue`) to
simply `getLoc()`.
- `refreshStructValue()` to `refreshRecordValue()`
We keep the old names around as deprecated synonyms to enable clients to be migrated to the new names.
Reviewed By: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D156788
For the time being, we're keeping the `Strict` versions around as deprecated synonyms so that clients can be migrated, but these synonyms will be removed soon.
Depends On D156673
Reviewed By: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D156674
In the case where the expression was not yet associated with a storage location, we created a new storage location but failed to associate it with the expression.
The newly added test fails without the fix.
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D155465
These insulate tests against changes to the `getChild()` functions of `AggregateStorageLocation` and `StructValue` that will happen as part of the migration to strict handling of value categories (see https://discourse.llvm.org/t/70086 for details):
- `AggregateStorageLocation::getChild()` will soon return a `StorageLocation *`
instead of a `StorageLocation &`. When this happens, `getFieldValue()` will be
changed to return null if `AggregateStorageLocation::getChild()` returns null;
test code will not need to change as it should already be checking whether the
return value of `getFieldValue()` is null.
- `StructValue::getChild()` will soon return a `StorageLocation *` instead of a
`Value *`. When this happens, `getFieldValue()` will be changed to look up the
`Value *` in the `Environment`. Again, test code will not need to change.
The test helpers will continue to serve a useful purpose once the API changes are complete, so the intent is to leave them in place.
This patch changes DataflowEnvironmentTest.cpp and RecordOpsTest.cpp to use the test helpers. TransferTest.cpp will be changed in an upcoming patch to help keep patch sizes manageable for review.
Depends On D154934
Reviewed By: ymandel, xazax.hun, gribozavr2
Differential Revision: https://reviews.llvm.org/D154935
This parameter was already a no-op, so removing it doesn't change behavior.
Reviewed By: ymandel
Differential Revision: https://reviews.llvm.org/D150137
This parameter was already a no-op, so removing it doesn't change behavior.
Depends On D149144
Reviewed By: ymandel, xazax.hun, gribozavr2
Differential Revision: https://reviews.llvm.org/D149151
When building the set of referenced fields for the `DataflowAnalysisContext`,
include fields referenced only in default member initializers. These
initializers are visited in the CFGs of constructors and so the fields must be
included when analysing constructor bodies.
Differential Revision: https://reviews.llvm.org/D144987
There were two (small) bugs causing crashes in the analysis. This patch fixes both of them.
1. An enum value was accessed as a class member. Now, the engine gracefully
ignores such member expressions.
2. Field access in `MemberExpr` of struct/class-typed global variables. Analysis
didn't interpret fields of global vars, because the vars were initialized before
the fields were added to the "allowlist". Now, the allowlist is set _before_
init of globals.
Differential Revision: https://reviews.llvm.org/D141384
This reverts commit 2b1a517a92bfdfa3b692a660e19a2bb22513a567. It's a fix forward
with two memory errors fixed, one of which was the cause of the build breakage
in the buildbots.
Original message:
Previously, the model for structs modeled all fields in a struct when
`createValue` was called for that type. This patch adds a prepass on the
function under analysis to discover the fields referenced in the scope and then
limits modeling to only those fields. This reduces wasted memory usage
(modeling unused fields) which can be important for programs that use large
structs.
Note: This patch obviates the need for https://reviews.llvm.org/D123032.
Previously, the model for structs modeled all fields in a struct when
`createValue` was called for that type. This patch adds a prepass on the
function under analysis to discover the fields referenced in the scope and then
limits modeling to only those fields. This reduces wasted memory usage
(modeling unused fields) which can be important for programss that use large
structs.
Note: This patch obviates the need for https://reviews.llvm.org/D123032.
Differential Revision: https://reviews.llvm.org/D140694
Previously, the analysis modeled global variables appearing in the _body_ of
any function (including constructors). But, that misses those appearing in
constructor _initializers_. This patch adds the initializers to the set of
expressions used to determine which globals to model.
Differential Revision: https://reviews.llvm.org/D140501
This patch moves `Analysis/FlowSensitive/NoopAnalysis.h` from `clang/unittests/` to `clang/include/clang/`, so that we can use it for doing context-sensitive analysis.
Reviewed By: ymandel, gribozavr2, sgatev
Differential Revision: https://reviews.llvm.org/D130304
To keep functionality of creating boolean expressions in a consistent location.
Depends On D128357
Reviewed By: gribozavr2, sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D128519
This enables tests out of clang/unittests/Analysis/FlowSensitive to
use the testing support utilities.
Reviewed-by: ymandel, gribozavr2
Differential Revision: https://reviews.llvm.org/D121285
Adds a dataflow analysis that detects unsafe accesses to values of type
`std::optional`, `absl::optional`, or `base::Optional`.
Reviewed-by: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D121197
When pre-initializing fields in the environment, the code assumed that all
fields of a struct would be initialized. However, given limits on value
construction, that assumption is incorrect. This patch changes the code to drop
that assumption and thereby avoid dereferencing a nullptr.
Differential Revision: https://reviews.llvm.org/D121158
This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.
Reviewed-by: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D120711