Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
In the dataflow framework, the builtin transfer function currently only
handles the GLValue result case of ConditionalOperator when the
true and false expression StorageLocations are exactly the same.
Ideally / we have wanted to introduce alias sets to handle when the Locs
are different. However, that is a larger change to the framework
(and we may need to introduce weak updates).
For now, do something simpler to at least handle when the GLValue is
immediately cast to an RValue, by making up a distinct StorageLocation
that holds the join of the true and false expression values (when not a
record). This seems like the most common case, so seems worth covering.
The case when an LValue is needed and can be updated later (and
thus needs a link to the original storage locations) seems more rare,
and we currently do not handle such updates either, so this intermediate
step is no different (for that case).
…opy/move assignments.
This mirrors the handling of copy/move constructors.
Also fix a couple capitalizations of variables in an adjacent and
related test.
Reverts llvm/llvm-project#157148
Adds fixes to `TransferVisitor::VisitCXXConstructExpr` and `copyRecord`
to avoid crashing on base class initialization from sibling-derived
class instances. I believe this is the only use of copyRecord where we
need this special handling for a shared base class.
Reverts llvm/llvm-project#153066
copyRecord crashes if copying from the RecordStorageLocation shared by
the base/derived objects after a DerivedToBase cast because the source
type is still `Derived` but the copy destination could be of a sibling
type derived from Base that has children not present in `Derived`.
For example, running the dataflow analysis over the following produces
UB by nullptr deref, or fails asserts if enabled:
```cc
struct Base {};
struct DerivedOne : public Base {
int DerivedOneField;
};
struct DerivedTwo : public Base {
int DerivedTwoField;
DerivedTwo(const DerivedOne& d1)
: Base(d1), DerivedTwoField(d1.DerivedOneField) {}
};
```
The constructor initializer for `DerivedTwoField` serves the purpose of
forcing `DerivedOneField` to be modeled, which is necessary to trigger
the crash but not the assert failure.
This reintroduces `Type.h`, having earlier been renamed to `TypeBase.h`,
as a redirection to `TypeBase.h`, and redirects most users to include
the former instead.
This is a preparatory patch for being able to provide inline definitions
for `Type` methods which would otherwise cause a circular dependency
with `Decl{,CXX}.h`.
Doing these operations into their own NFC patch helps the git rename
detection logic work, preserving the history.
This patch makes clang just a little slower to build (~0.17%), just
because it makes more code indirectly include `DeclCXX.h`.
This is a preparatory patch, to be able to provide inline definitions
for `Type` functions which depend on `Decl{,CXX}.h`. As the latter also
depends on `Type.h`, this would not be possible without some
reorganizing.
Splitting this rename into its own patch allows git to track this as a
rename, and preserve all git history, and not force any code
reformatting.
A later NFC patch will reintroduce `Type.h` as redirection to
`TypeBase.h`, rewriting most places back to directly including `Type.h`
instead of `TypeBase.h`, leaving only a handful of places where this is
necessary.
Then yet a later patch will exploit this by making more stuff inline.
Transfer all casts by kind as we currently do implicit casts. This
obviates the need for specific handling of static casts.
Also transfer CK_BaseToDerived and CK_DerivedToBase and add tests for
these and missing tests for already-handled cast types.
Ensure that CK_BaseToDerived casts result in modeling of the fields of the derived class.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Relands #129502.
Previously when the framework encountered unsupported values (such as
enum classes), they were always treated as equal when comparing with
`==`, regardless of their actual values being different.
Now the two sides are only equal if there's a Value assigned to them.
Added handling for the special case of `nullptr == nullptr`, to preserve
the behavior of untyped `nullptr` having no value.
Previously when the framework encountered unsupported values (such as
enum classes), they were always treated as equal when comparing with
`==`, regardless of their actual values being different.
Now the two sides are only equal if there's a Value assigned to them.
Added a Value assignment for `nullptr`, to handle the special case of
`nullptr == nullptr`.
This was missing a call to `ignoreCFGOmittedNodes()`. As a result, the
function
would erroneously conclude that a block did not contain an expression
consumed
in a different block if the expression in question was surrounded by a
`ParenExpr` in the consuming block. The patch adds a test that triggers
this
scenario (and fails without the fix).
To prevent this kind of bug in the future, the patch also adds a new
method
`blockForStmt()` to `AdornedCFG` that calls `ignoreCFGOmittedNodes()`
and is
preferred over accessing `getStmtToBlock()` directly.
We definitely know that these operations change the value of their
operand, so
clear out any value associated with it. We don't create a new value,
instead
leaving it to the analysis to do this if desired.
- Instead of comparing the identity of the `PointerValue`s, compare the
underlying `StorageLocation`s.
- If the `StorageLocation`s are the same, return a definite "true" as
the
result of the comparison. Before, if the `PointerValue`s were different,
we
would return an atom, even if the storage locations themselves were the
same.
- If the `StorageLocation`s are different, return an atom (as before).
Pointers
that have different storage locations may still alias, so we can't
return a
definite "false" in this case.
The application-level gains from this are relatively modest. For the
Crubit
nullability check running on an internal codebase, this change reduces
the
number of functions on which the SAT solver times out from 223 to 221;
the
number of "pointer expression not modeled" errors reduces from 3815 to
3778.
Still, it seems that the gain in precision is generally worthwhile.
@Xazax-hun inspired me to think about this with his
[comments](https://github.com/llvm/llvm-project/pull/73860#pullrequestreview-1761484615)
on a different PR.
The existing code was full of comments about how we assume this is
always the
case, but it's not mandated by the standard, and there is code out there
that
returns a different type. So check that the result type is in fact the
same as
the destination type before attempting to copy to the result.
To make sure that we don't bail out in more cases than intended, I've
extended
existing tests to verify that in the common case, we do return the
destination
object (by reference or value, as the case may be).
`ConstantExpr` does not appear as a `CFGStmt` in the CFG, so
`StmtToEnvMap::getEnvironment()` was not finding an entry for it in the
map,
causing a crash when we tried to access the iterator resulting from the
map
lookup.
The fix is to make `ignoreCFGOmittedNodes()` ignore `ConstantExpr`, but
in
addition, I'm hardening `StmtToEnvMap::getEnvironment()` to make sure
release
builds don't crash in similar situations in the future.
I reverted https://github.com/llvm/llvm-project/pull/89213 beause it was
causing buildbots to fail with assertion failures.
Embarrassingly, it turns out I had been running tests locally in
`Release` mode, i.e. with `assert()` compiled away.
This PR re-lands #89213 with fixes for the failing assertions.
This class no longer serves any purpose; see also the discussion here:
https://reviews.llvm.org/D155204#inline-1503204
A lot of existing tests in TransferTest.cpp check for the existence of
`RecordValue`s. Some of these checks are now simply redundant and have
been
removed. In other cases, tests were checking for the existence of a
`RecordValue` as a way of testing whether a record has been initialized.
I have
typically changed these test to instead check whether a field of the
record has
a value.
Moves free functions from DataflowEnvironment.h/cc and
DataflowAnalysisContext.h/cc to RecordOps and a new ASTOps and exposes
them as needed for current use and to expose getReferencedDecls for
out-of-tree use.
Minimal change in functionality, only to modify the return type of
getReferenceDecls to return the collected decls instead of using output
params.
Tested with `ninja check-clang-tooling`.
Previously, we were propagating storage locations the other way around,
i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This
gave
the wrong behavior in some cases -- see the newly added or fixed tests
in this
patch.
In addition, this patch now unblocks removing the `RecordValue` class
entirely,
as we no longer need `RecordValue::getLoc()`.
With this patch, the test `TransferTest.DifferentReferenceLocInJoin`
started to
fail because the framework now always uses the same storge location for
a
`MaterializeTemporaryExpr`, meaning that the code under test no longer
set up
the desired state where a variable of reference type is mapped to two
different
storage locations in environments being joined. Rather than trying to
modify
this test to set up the test condition again, I have chosen to replace
the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the
test
condition directly; because this test is more direct, it will also be
less
brittle in the face of future changes.
This is currently only used in one place, but I'm working on a patch
that will
use this from a second place. And I think this already improves the
readability
of the one place this is used so far.
This is a relatively rare case, but
- It's still nice to get this right,
- We can remove the special case for this in
`VisitCXXOperatorCallExpr()` (that
simply bails out), and
- With this in place, I can avoid having to add a similar special case
in an
upcoming patch.
This expresses better what the class actually does, and it reduces the
number of
`Context`s that we have in the codebase.
A deprecated alias `ControlFlowContext` is available from the old
header.
This fixes a crash introduced by
https://github.com/llvm/llvm-project/pull/82348
but also adds additional handling to make sure that we treat empty
initializer
lists for both unions and structs/classes correctly (see tests added in
this
patch).
Reverts llvm/llvm-project#82348, which caused crashes when analyzing
empty InitListExprs for unions, e.g.
```cc
union U {
double double_value;
int int_value;
};
void target() {
U value;
value = {};
}
```
Co-authored-by: Samira Bazuzi <bazuzi@users.noreply.github.com>
Although in a normal implementation the assumption is reasonable, it
seems that some esoteric implementation are not returning a T&. This
should be handled correctly and the values be propagated.
---------
Co-authored-by: martinboehme <mboehme@google.com>
This occurs in rewritten candidates for binary operators (a C++20
feature).
The patch modifies UncheckedOptionalAccessModelTest to run in C++20 mode
(as
well as C++17 mode, as before) and to use rewritten candidates. The
modified
test fails without the newly added support for
`CXXRewrittenBinaryOperator`.
When calling `Environment::getResultObjectLocation` with a
CXXOperatorCallExpr that is a prvalue, we just hit an assert because no
record was ever created.
---------
Co-authored-by: martinboehme <mboehme@google.com>
In particular, it's important that we create the "fallback" atomic at
this point
(which we produce if the transfer function didn't produce a value for
the
expression) so that it is placed in the correct environment.
Previously, we processed the terminator condition in the
`TerminatorVisitor`,
which put the fallback atomic in a copy of the environment that is
produced as
input for the _successor_ block, rather than the environment for the
block
containing the expression for which we produce the fallback atomic.
As a result, we produce different fallback atomics every time we process
the
successor block, and hence we don't have a consistent representation of
the
terminator condition in the flow condition.
This patch includes a test (authored by ymand@) that fails without the
fix.
This template function casts the result of `getValue()` or
`getStorageLocation()` to a given subclass of `Value` or
`StorageLocation` (using `cast_or_null`).
It's a common pattern to do something like this:
```cxx
auto *Val = cast_or_null<PointerValue>(Env.getValue(E));
```
This can now be expressed more concisely like this:
```cxx
auto *Val = Env.get<PointerValue>(E);
```
Instead of adding a new method `get()`, I had originally considered
simply adding a template parameter to `getValue()` and
`getStorageLocation()` (with a default argument of `Value` or
`StorageLocation`), but this results in an undesirable repetition at the
callsite, e.g. `getStorageLocation<RecordStorageLocation>(...)`. The
`Value` and `StorageLocation` in the method name adds nothing of value
when the template argument already contains this information, so it
seemed best to shorten the method name to simply `get()`.
So far, if there was a chain of record type prvalues,
`getResultObjectLocation()` would assign a different result object
location to
each one. This makes no sense, of course, as all of these prvalues end
up
initializing the same result object.
This patch fixes this by propagating storage locations up through the
entire
chain of prvalues.
The new implementation also has the desirable effect of making it
possible to
make `getResultObjectLocation()` const, which seems appropriate given
that,
logically, it is just an accessor.
Synthetic fields are intended to model the internal state of a class
(e.g. the value stored in a `std::optional`) without having to depend on
that class's implementation details.
Today, this is typically done with properties on `RecordValue`s, but
these have several drawbacks:
* Care must be taken to call `refreshRecordValue()` before modifying a
property so that the modified property values aren’t seen by other
environments that may have access to the same `RecordValue`.
* Properties aren’t associated with a storage location. If an analysis
needs to associate a location with the value stored in a property (e.g.
to model the reference returned by `std::optional::value()`), it needs
to manually add an indirection using a `PointerValue`. (See for example
the way this is done in UncheckedOptionalAccessModel.cpp, specifically
in `maybeInitializeOptionalValueMember()`.)
* Properties don’t participate in the builtin compare, join, and widen
operations. If an analysis needs to apply these operations to
properties, it needs to override the corresponding methods of
`ValueModel`.
* Longer-term, we plan to eliminate `RecordValue`, as by-value
operations on records aren’t really “a thing” in C++ (see
https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This
would obviously eliminate the ability to set properties on
`RecordValue`s.
To demonstrate the advantages of synthetic fields, this patch converts
UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly
simplifies the implementation of the check.
This PR is pretty big; to make it easier to review, I have broken it
down into a stack of three commits, each of which contains a set of
logically related changes. I considered submitting each of these as a
separate PR, but the commits only really make sense when taken together.
To review, I suggest first looking at the changes in
UncheckedOptionalAccessModel.cpp. This gives a flavor for how the
various API changes work together in the context of an analysis. Then,
review the rest of the changes.
We never need to access entries from these maps outside of the current
basic
block. This could only ever become a consideration when flow control
happens
inside a full-expression (i.e. we have multiple basic blocks for a full
expression); there are two kinds of expression where this can happen,
but we
already deal with these in other ways:
* Short-circuiting logical operators (`&&` and `||`) have operands that
live in
different basic blocks than the operator itself, but we already have
code in
the framework to retrieve the value of these operands from the
environment
for the block they are computed in, rather than in the environment of
the
block containing the operator.
* The conditional operator similarly has operands that live in different
basic
blocks. However, we currently don't implement a transfer function for
the
conditional operator. When we do this, we need to retrieve the values of
the
operands from the environments of the basic blocks they live in, as we
already do for logical operators. This patch adds a comment to this
effect
to the code.
Clearing out `ExprToLoc` and `ExprToVal` has two benefits:
* We avoid performing joins on boolean expressions contained in
`ExprToVal` and
hence extending the flow condition in cases where this is not needed.
Simpler
flow conditions should reduce the amount of work we do in the SAT
solver.
* Debugging becomes easier when flow conditions are simpler and
`ExprToLoc` /
`ExprToVal` don’t contain any extraneous entries.
Benchmark results on Crubit's `pointer_nullability_analysis_benchmark
show a
slight runtime increase for simple benchmarks, offset by substantial
runtime
reductions for more complex benchmarks:
```
name old cpu/op new cpu/op delta
BM_PointerAnalysisCopyPointer 29.8µs ± 1% 29.9µs ± 4% ~ (p=0.879 n=46+49)
BM_PointerAnalysisIntLoop 101µs ± 3% 104µs ± 4% +2.96% (p=0.000 n=55+57)
BM_PointerAnalysisPointerLoop 378µs ± 3% 245µs ± 3% -35.09% (p=0.000 n=47+55)
BM_PointerAnalysisBranch 118µs ± 2% 122µs ± 3% +3.37% (p=0.000 n=59+59)
BM_PointerAnalysisLoopAndBranch 779µs ± 3% 413µs ± 5% -47.01% (p=0.000 n=56+45)
BM_PointerAnalysisTwoLoops 187µs ± 3% 192µs ± 5% +2.80% (p=0.000 n=57+58)
BM_PointerAnalysisJoinFilePath 17.4ms ± 3% 7.2ms ± 3% -58.75% (p=0.000 n=58+57)
BM_PointerAnalysisCallInLoop 14.7ms ± 4% 10.3ms ± 2% -29.87% (p=0.000 n=56+58)
```
The code assumed that the source parameter of an assignment operator is
always
passed by reference, but it is legal for it to be passed by value.
This patch includes a test that assert-fails without the fix.
The assertion fails on the test
TransferTest.EvaluateBlockWithUnreachablePreds
(which I think, ironically, was introuced in the same patch as the
assertion).
This just wasn't obvious because the assertion is inside an `LLVM_DEBUG`
block
and is thus only executed if the command-line flag `-debug` is passed.
We don't
have any CI builds that do this, so it's almost guaranteed that
assertions like
this will start failing over time (if they ever passed in the first
place --
which I'm not sure about here).
It's not clear to me whether there's _some_ assertion we might be able
to make
here -- I've looked at this for a while but haven't been able to come up
with
anything obvious. For the time being, I think it's best to simply delete
the
assertion.