`CXXInheritedCtorInitExpr` is another of the node kinds that should be
considered an "original initializer". An assertion failure in
`assert(Children.size() == 1)` happens without this fix.
---------
Co-authored-by: martinboehme <mboehme@google.com>
This is one of the node kinds that should be considered an "original
initializer". The patch adds a test that was causing an assertion
failure in
`assert(Children.size() == 1)` without the fix.
We previously had a hand-rolled recursive traversal here that was
exactly what
`RecursiveASTVistor` does anyway. Using the visitor not only eliminates
the
explicit traversal logic but also allows us to introduce a common
visitor base
class for `getReferencedDecls()` and `ResultObjectVisitor`, ensuring
that the
two are consistent in terms of the nodes they visit. Inconsistency
between these
two has caused crashes in the past when `ResultObjectVisitor` tried to
propagate
result object locations to entities that weren't modeled becasue
`getReferencedDecls()` didn't visit them.
Assume in fewer places that the analysis is of a `FunctionDecl`, and
initialize the `Environment` properly for `Stmt`s.
Moves constructors for `Environment` to header to make it more obvious
that there are only minor differences between them and very little
initialization in the constructors.
Tested with check-clang-tooling.
Trying to do so can cause crashes -- see newly added test and the
comments in
the fix.
We're starting to see a repeating pattern here: We're getting crashes
because
`ResultObjectVisitor` and `getReferencedDecls()` don't agree on which
parts of
the AST to visit and, hence, which fields should be modeled.
I think we should ensure consistency between these two parts of the code
by
using a `RecursiveASTVisitor` in `getReferencedDecls()`[^1]; the
`Traverse...()` functions that control which parts of the AST we visit
would go
in a common base class that would be used for both `ResultObjectVisitor`
and
`getReferencedDecls()`.
I'd like to focus this PR, however, on a targeted fix for the current
crash and
postpone the refactoring to a later PR (which will be easier to revert
if there
are unintended side-effects).
[^1]: As an added bonus, this would make the code better structured and
more
efficient than the current sequence of `if (dyn_cast<T>(...))`
statements).
We used to crash if the previous iteration contained a `BoolValue` and
the
current iteration contained an `IntegerValue`. The accompanying test
sets up
this situation -- see comments there for details.
While I'm here, clean up the tests for integral casts to use the test
helpers we
have available now. I was looking at these tests to understand how we
handle
integral casts, and the test helpers make the tests easier to read.
I reverted https://github.com/llvm/llvm-project/pull/89213 beause it was
causing buildbots to fail with assertion failures.
Embarrassingly, it turns out I had been running tests locally in
`Release` mode, i.e. with `assert()` compiled away.
This PR re-lands #89213 with fixes for the failing assertions.
For some reason, when I merged #89235, two lines were mis-formatted.
This patch corrects this; while I'm here, I'm also correcting other
existing formatting errors.
This class no longer serves any purpose; see also the discussion here:
https://reviews.llvm.org/D155204#inline-1503204
A lot of existing tests in TransferTest.cpp check for the existence of
`RecordValue`s. Some of these checks are now simply redundant and have
been
removed. In other cases, tests were checking for the existence of a
`RecordValue` as a way of testing whether a record has been initialized.
I have
typically changed these test to instead check whether a field of the
record has
a value.
Moves free functions from DataflowEnvironment.h/cc and
DataflowAnalysisContext.h/cc to RecordOps and a new ASTOps and exposes
them as needed for current use and to expose getReferencedDecls for
out-of-tree use.
Minimal change in functionality, only to modify the return type of
getReferenceDecls to return the collected decls instead of using output
params.
Tested with `ninja check-clang-tooling`.
Previously, we were propagating storage locations the other way around,
i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This
gave
the wrong behavior in some cases -- see the newly added or fixed tests
in this
patch.
In addition, this patch now unblocks removing the `RecordValue` class
entirely,
as we no longer need `RecordValue::getLoc()`.
With this patch, the test `TransferTest.DifferentReferenceLocInJoin`
started to
fail because the framework now always uses the same storge location for
a
`MaterializeTemporaryExpr`, meaning that the code under test no longer
set up
the desired state where a variable of reference type is mapped to two
different
storage locations in environments being joined. Rather than trying to
modify
this test to set up the test condition again, I have chosen to replace
the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the
test
condition directly; because this test is more direct, it will also be
less
brittle in the face of future changes.
The previous API relied on pointer equality of inputs and outputs to
signal whether a change occured. This was too subtle and led to bugs in
practice. It was also very limiting: the override could not return an equivalent (but
not identical) value.
This is currently only used in one place, but I'm working on a patch
that will
use this from a second place. And I think this already improves the
readability
of the one place this is used so far.
Reported by Static Analyzer Tool:
In clang::dataflow::Environment::initialize(): Using the auto keyword
without an & causes the copy of an object of type LambdaCapture
In https://github.com/llvm/llvm-project/pull/72985, I made a change to
discard
expression state (`ExprToLoc` and `ExprToVal`) at the beginning of each
basic
block. I did so with the claim that "we never need to access entries
from these
maps outside of the current basic block", noting that there are
exceptions to
this claim when control flow happens inside a full-expression (the
operands of
`&&`, `||`, and the conditional operator live in different basic blocks
than the
operator itself) but that we already have a mechanism for retrieving the
values
of these operands from the environment for the block they are computed
in.
It turns out, however, that the operands of these operators aren't the
only
expressions whose values can be accessed from a different basic block;
when
control flow happens within a full-expression, that control flow can be
"interposed" between an expression and its parent. Here is an example:
```cxx
void f(int*, int);
bool cond();
void target() {
int i = 0;
f(&i, cond() ? 1 : 0);
}
```
([godbolt](https://godbolt.org/z/hrbj1Mj3o))
In the CFG[^1] , note how the expression for `&i` is computed in block
B4,
but the parent of this expression (the `CallExpr`) is located in block
B1.
The the argument expression `&i` and the `CallExpr` are essentially
"torn apart"
into different basic blocks by the conditional operator in the second
argument.
In other words, the edge between the `CallExpr` and its argument `&i`
straddles
the boundary between two blocks.
I used to think that this scenario -- where an edge between an
expression and
one of its children straddles a block boundary -- could only happen
between the
expression that triggers the control flow (`&&`, `||`, or the
conditional
operator) and its children, but the example above shows that other
expressions
can be affected as well; the control flow is still triggered by `&&`,
`||` or
the conditional operator, but the expressions affected lie outside these
operators.
Discarding expression state too soon is harmful. For example, an
analysis that
checks the arguments of the `CallExpr` above would not be able to
retrieve a
value for the `&i` argument.
This patch therefore ensures that we don't discard expression state
before the
end of a full-expression. In other cases -- when the evaluation of a
full-expression is complete -- we still want to discard expression state
for the
reasons explained in https://github.com/llvm/llvm-project/pull/72985
(avoid
performing joins on boolean values that are no longer needed, which
unnecessarily extends the flow condition; improve debuggability by
removing
clutter from the expression state).
The impact on performance from this change is about a 1% slowdown in the
Crubit nullability check benchmarks:
```
name old cpu/op new cpu/op delta
BM_PointerAnalysisCopyPointer 71.9µs ± 1% 71.9µs ± 2% ~ (p=0.987 n=15+20)
BM_PointerAnalysisIntLoop 190µs ± 1% 192µs ± 2% +1.06% (p=0.000 n=14+16)
BM_PointerAnalysisPointerLoop 325µs ± 5% 324µs ± 4% ~ (p=0.496 n=18+20)
BM_PointerAnalysisBranch 193µs ± 0% 192µs ± 4% ~ (p=0.488 n=14+18)
BM_PointerAnalysisLoopAndBranch 521µs ± 1% 525µs ± 3% +0.94% (p=0.017 n=18+19)
BM_PointerAnalysisTwoLoops 337µs ± 1% 341µs ± 3% +1.19% (p=0.004 n=17+19)
BM_PointerAnalysisJoinFilePath 1.62ms ± 2% 1.64ms ± 3% +0.92% (p=0.021 n=20+20)
BM_PointerAnalysisCallInLoop 1.14ms ± 1% 1.15ms ± 4% ~ (p=0.135 n=16+18)
```
[^1]:
```
[B5 (ENTRY)]
Succs (1): B4
[B1]
1: [B4.9] ? [B2.1] : [B3.1]
2: [B4.4]([B4.6], [B1.1])
Preds (2): B2 B3
Succs (1): B0
[B2]
1: 1
Preds (1): B4
Succs (1): B1
[B3]
1: 0
Preds (1): B4
Succs (1): B1
[B4]
1: 0
2: int i = 0;
3: f
4: [B4.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int *, int))
5: i
6: &[B4.5]
7: cond
8: [B4.7] (ImplicitCastExpr, FunctionToPointerDecay, _Bool (*)(void))
9: [B4.8]()
T: [B4.9] ? ... : ...
Preds (1): B5
Succs (2): B2 B3
[B0 (EXIT)]
Preds (1): B1
```
This fixes a crash introduced by
https://github.com/llvm/llvm-project/pull/82348
but also adds additional handling to make sure that we treat empty
initializer
lists for both unions and structs/classes correctly (see tests added in
this
patch).
Reverts llvm/llvm-project#82348, which caused crashes when analyzing
empty InitListExprs for unions, e.g.
```cc
union U {
double double_value;
int int_value;
};
void target() {
U value;
value = {};
}
```
Co-authored-by: Samira Bazuzi <bazuzi@users.noreply.github.com>
This function will be useful when we change the behavior of record-type
prvalues
so that they directly initialize the associated result object. See also
the
comment here for more details:
9e73656af5/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (L354)
As part of this patch, we document and assert that synthetic fields may
not have
reference type.
There is no practical use case for this: A `StorageLocation` may not
have
reference type, and a synthetic field of the corresponding non-reference
type
can serve the same purpose.
This patch adds a new interface for the join operation, now properly
called `join`. Originally, the framework offered a single `merge`
operation, which could serve either as a join or a widening. In
practice, though we found this conflation didn't work for non-trivial
anlyses, and split of the widening operation (`widen`). This change
completes the transition by introducing a proper `join` with strict join
semantics.
In the process, it drops an odd (and often misused) aspect of `merge`
wherein callees could implictly instruct the framework to drop the
current entry by returning `false`. This features was never used
correctly in analyses and doesn't belong in a join operation, so it is
omitted.
---------
Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>
Co-authored-by: martinboehme <mboehme@google.com>
The CFG doesn't contain a CFGElement for the
`CXXDefaultInitExpr::getInit()`, so
it makes sense to consider the `CXXDefaultInitExpr` to be the expression
that
originally constructs the object.
This is to be consistent with `getValue()`, which also uses
`ignoreCFGOmittedNodes()`.
Before this fix, it was not possible to retrieve a `Value` from a "CFG
omitted"
node that had previously been set using `setValue()`; see the
accompanying test,
which fails without the fix.
I discovered this issue while running internal integration tests on
https://github.com/llvm/llvm-project/pull/78127.
In various places, we would previously call `FunctionDecl::hasBody()`
(which
checks whether any redeclaration of the function has a body, not
necessarily the
one on which `hasBody()` is being called).
This is bug-prone, as a recent bug in Crubit's nullability checker has
shown
([fix](4b01ed0f14),
[fix for the
fix](e0c5d8ddd7)).
Instead, we now use `FunctionDecl::doesThisDeclarationHaveABody()`
which, as the
name implies, checks whether the specific redeclaration it is being
called on
has a body.
Alternatively, I considered being more lenient and "canonicalizing" to
the
`FunctionDecl` that has the body if the `FunctionDecl` being passed is a
different redeclaration. However, this also risks hiding bugs: A caller
might
inadverently perform the analysis for all redeclarations of a function
and end
up duplicating work without realizing it. By accepting only the
redeclaration
that contains the body, we prevent this.
I've checked, and all clients that I'm aware of do currently pass in the
redeclaration that contains the function body. Typically this is because
they
use the `ast_matchers::hasBody()` matcher which, unlike
`FunctionDecl::hasBody()`, only matches for the redeclaration containing
the
body.
This template function casts the result of `getValue()` or
`getStorageLocation()` to a given subclass of `Value` or
`StorageLocation` (using `cast_or_null`).
It's a common pattern to do something like this:
```cxx
auto *Val = cast_or_null<PointerValue>(Env.getValue(E));
```
This can now be expressed more concisely like this:
```cxx
auto *Val = Env.get<PointerValue>(E);
```
Instead of adding a new method `get()`, I had originally considered
simply adding a template parameter to `getValue()` and
`getStorageLocation()` (with a default argument of `Value` or
`StorageLocation`), but this results in an undesirable repetition at the
callsite, e.g. `getStorageLocation<RecordStorageLocation>(...)`. The
`Value` and `StorageLocation` in the method name adds nothing of value
when the template argument already contains this information, so it
seemed best to shorten the method name to simply `get()`.
So far, if there was a chain of record type prvalues,
`getResultObjectLocation()` would assign a different result object
location to
each one. This makes no sense, of course, as all of these prvalues end
up
initializing the same result object.
This patch fixes this by propagating storage locations up through the
entire
chain of prvalues.
The new implementation also has the desirable effect of making it
possible to
make `getResultObjectLocation()` const, which seems appropriate given
that,
logically, it is just an accessor.
…ng member pointers.
This initially landed with a broken test due to a mid-air collision with
a new requirement for Environment initialization before field modeling.
Have added that initialization in the test.
From first landing:
getMethodDecl does not handle pointers to members and returns nullptr
for them. getMethodDecl contains a decade-plus-old FIXME to handle
pointers to members, but two approaches I looked at for fixing it are
more invasive or complex than simply swapping to getCalleeDecl.
The first, have getMethodDecl call getCalleeDecl, creates a large tree
of const-ness mismatches due to getMethodDecl returning a non-const
value while being a const member function and getCalleeDecl only being a
const member function when it returns a const value.
The second, implementing an AST walk to match how
CXXMemberCallExpr::getImplicitObjectArgument grabs the LHS of the binary
operator, is basically reimplementing Expr::getReferencedDeclOfCallee,
which is used by Expr::getCalleeDecl. We don't need another copy of that
code.