This consolidates the code used in various places to initialize objects (usually for variables) into one central location.
It will also help reduce the number of changes needed when we make the upcoming API changes to `AggregateStorageLocation` and `StructValue`.
Depends On D155074
Reviewed By: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D155075
This moves the formatting job to a shell script, which should also fix
the clang pre-commit CI.
Differential Revision: https://reviews.llvm.org/D153920
- Rename `getReferencedFields()` to `getModeledFields()`. Move the logic that
returns all object fields when doing a context-sensitive analysis to here from
`DataflowAnalysisContext::createStorageLocation()`. I think all callers of
the previous `getReferencedFields()` should use this logic; the fact that they
were not doing so looks like a bug.
- Make `getModeledFields()` public. I have an upcoming patch that will need to
use this function from Transfer.cpp, and there doesn't seem to be any reason
why this function should not be public.
- Use a `SmallSetVector` to get deterministic iteration order. I have a pending
patch where I'm getting flaky tests because
`Environment::createValueUnlessSelfReferential()` is non-deterministically
populating different fields depending on the iteration order. This change
fixes those flaky tests.
Reviewed By: gribozavr2
Differential Revision: https://reviews.llvm.org/D154586
This reverts commit 7a72ce98224be76d9328e65eee472381f7c8e7fe.
Test problems were due to unspecified order of function arg evaluation.
Reland "[dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)"
This properly frees the Value hierarchy from managing boolean formulas.
We still distinguish AtomicBoolValue; this type is used in client code.
However we expect to convert such uses to BoolValue (where the
distinction is not needed) or Atom (where atomic identity is intended),
and then fold AtomicBoolValue into FormulaBoolValue.
We also distinguish TopBoolValue; this has distinct rules for
widen/join/equivalence, and top-ness is not represented in Formula.
It'd be nice to find a cleaner representation (e.g. the absence of a
formula), but no immediate plans.
For now, BoolValues with the same Formula are deduplicated. This doesn't
seem desirable, as Values are mutable by their creators (properties).
We can probably drop this for FormulaBoolValue immediately (not in this
patch, to isolate changes). For AtomicBoolValue we first need to update
clients to stop using value pointers for atom identity.
The data structures around flow conditions are updated:
- flow condition tokens are Atom, rather than AtomicBoolValue*
- conditions are Formula, rather than BoolValue
Most APIs were changed directly, some with many clients had a
new version added and the existing one deprecated.
The factories for BoolValues in Environment keep their existing
signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue)
and are not deprecated. These have very many clients and finding the
most ergonomic API & migration path still needs some thought.
Differential Revision: https://reviews.llvm.org/D153469
These changes are OK, but they break downstream stuff that needs more time to adapt :-(
This reverts commit 71579569f4399d3cf6bc618dcd449b5388d749cc.
This reverts commit 5e4ad816bf100b0325ed45ab1cfea18deb3ab3d1.
This reverts commit 1c3ac8dfa16c42a631968aadd0396cfe7f7778e0.
And simplify formulas containing true/false
It's unclear to me how useful this is, it does make formulas more
conveniently self-contained now (we can usefully print them without
carrying around the "true/false" labels)
(while here, simplify !!X to X, too)
Differential Revision: https://reviews.llvm.org/D153485
This properly frees the Value hierarchy from managing boolean formulas.
We still distinguish AtomicBoolValue; this type is used in client code.
However we expect to convert such uses to BoolValue (where the
distinction is not needed) or Atom (where atomic identity is intended),
and then fold AtomicBoolValue into FormulaBoolValue.
We also distinguish TopBoolValue; this has distinct rules for
widen/join/equivalence, and top-ness is not represented in Formula.
It'd be nice to find a cleaner representation (e.g. the absence of a
formula), but no immediate plans.
For now, BoolValues with the same Formula are deduplicated. This doesn't
seem desirable, as Values are mutable by their creators (properties).
We can probably drop this for FormulaBoolValue immediately (not in this
patch, to isolate changes). For AtomicBoolValue we first need to update
clients to stop using value pointers for atom identity.
The data structures around flow conditions are updated:
- flow condition tokens are Atom, rather than AtomicBoolValue*
- conditions are Formula, rather than BoolValue
Most APIs were changed directly, some with many clients had a
new version added and the existing one deprecated.
The factories for BoolValues in Environment keep their existing
signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue)
and are not deprecated. These have very many clients and finding the
most ergonomic API & migration path still needs some thought.
Differential Revision: https://reviews.llvm.org/D153469
This is the first step in untangling the two current jobs of BoolValue.
=== Desired end-state: ===
- BoolValue will model C++ booleans e.g. held in StorageLocations.
this includes describing uncertainty (e.g. "top" is a Value concern)
- Formula describes analysis-level assertions in terms of SAT atoms.
These can still be linked together: a BoolValue may have a corresponding
SAT atom which is constrained by formulas.
=== Done in this patch: ===
BoolValue is left intact, Formula is just the input type to the
SAT solver, and we build formulas as needed to invoke the solver.
=== Incidental changes to debug string printing: ===
- variables renamed from B0 etc to V0 etc
B0 collides with the names of basic blocks, which is confusing when
debugging flow conditions.
- debug printing of formulas (Formula and Atom) uses operator<<
rather than debugString(), so works with gtest.
Therefore moved out of DebugSupport.h
- Did the same to Solver::Result, and some helper changes to SolverTest,
so that we get useful messages on unit test failures
- formulas are now printed as infix expressions on one line, rather than
wrapped/indented S-exprs. My experience is that this is easier to scan
FCs for small examples, and large ones are unreadable either way.
- most of the several debugString() functions for constraints/results
are unused, so removed them rather than updating tests.
Inlined the one that was actually used into its callsite.
Differential Revision: https://reviews.llvm.org/D153366
The SAT solver imported its constraints by iterating over an unordered DenseSet.
The path taken, and ultimately the runtime, the specific solution found, and
whether it would time out or complete could depend on the iteration order.
Instead, have the caller specify an ordered collection of constraints.
If this is built in a deterministic way, the system can have deterministic
behavior.
(The main alternative is to sort the constraints by value, but this option
is simpler today).
A lot of nondeterminism still appears to be remain in the framework, so today
the solver's inputs themselves are not deterministic yet.
Differential Revision: https://reviews.llvm.org/D153584
When introducing this new overload in https://reviews.llvm.org/D151183, I didn't consider that the `ASTContext` parameter was unnecessary because it could also be obtained from the `FunctionDecl`.
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D151549
This is the most common use case, so it makes sense to have a specific overload for it.
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D151183
Attempting to analyze templated code doesn't have a good cost-benefit ratio. We
have so far done a best-effort attempt at this, but maintaining this support has
an ongoing high maintenance cost because the AST for templates can violate a lot
of the invariants that otherwise hold for the AST of concrete code. As just one
example, in concrete code the operand of a UnaryOperator '*' is always a prvalue
(https://godbolt.org/z/s3e5xxMd1), but in templates this isn't true
(https://godbolt.org/z/6W9xxGvoM).
Further rationale for not analyzing templates:
* The semantics of a template itself are weakly defined; semantics can depend
strongly on the concrete template arguments. Analyzing the template itself (as
opposed to an instantiation) therefore has limited value.
* Analyzing templates requires a lot of special-case code that isn't necessary
for concrete code because dependent types are hard to deal with and the AST
violates invariants that otherwise hold for concrete code (see above).
* There's precedent in that neither Clang Static Analyzer nor the flow-sensitive
warnings in Clang (such as uninitialized variables) support analyzing
templates.
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D150352
With -dataflow-log=/dir we will write /dir/0.html etc for each
function analyzed.
These files show the function's code and CFG, and the path through
the CFG taken by the analysis. At each analysis point we can see the
lattice state.
Currently the lattice state dump is not terribly useful but we can
improve this: showing values associated with the current Expr,
simplifying flow condition, highlighting changes etc.
(Trying not to let this patch scope-creep too much, so I ripped out the
half-finished features)
Demo: 9718fdd484/analysis.html
Differential Revision: https://reviews.llvm.org/D146591
DataflowAnalysisContext has a few too many responsibilities, this narrows them.
It also allows the Arena to be shared with analysis steps, which need to create
Values, without exposing the whole DACtx API (flow conditions etc).
This means Environment no longer needs to proxy all these methods.
(For now it still does, because there are many callsites to update, and maybe
if we separate bool formulas from values we can avoid churning them twice)
In future, if we untangle the concepts of Values from boolean formulas/atoms,
Arena would also be responsible for creating formulas and managing atom IDs.
Differential Revision: https://reviews.llvm.org/D148554
This is less verbose than checking for class, struct, and union individually,
and I believe it's also more efficient (not that that should be the overriding
concern).
Reviewed By: sammccall, xazax.hun
Differential Revision: https://reviews.llvm.org/D147603
Replace:
1. createAtomicBoolValue() --> create<AtomicBoolValue>()
2. createTopBoolValue() --> create<TopBoolValue>()
/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:386:19: error: 'createAtomicBoolValue' is deprecated: use create<AtomicBoolValue> instead [-Werror,-Wdeprecated-declarations]
return DACtx->createAtomicBoolValue();
^~~~~~~~~~~~~~~~~~~~~
create<AtomicBoolValue>
/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:215:3: note: 'createAtomicBoolValue' has been explicitly marked deprecated here
LLVM_DEPRECATED("use create<AtomicBoolValue> instead",
^
/Users/jiefu/llvm-project/llvm/include/llvm/Support/Compiler.h:143:50: note: expanded from macro 'LLVM_DEPRECATED'
^
In file included from /Users/jiefu/llvm-project/clang/lib/Analysis/FlowSensitive/Transfer.cpp:14:
In file included from /Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/Transfer.h:19:
/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:391:19: error: 'createTopBoolValue' is deprecated: use create<TopBoolValue> instead [-Werror,-Wdeprecated-declarations]
return DACtx->createTopBoolValue();
^~~~~~~~~~~~~~~~~~
create<TopBoolValue>
/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:227:3: note: 'createTopBoolValue' has been explicitly marked deprecated here
LLVM_DEPRECATED("use create<TopBoolValue> instead", "create<TopBoolValue>")
^
/Users/jiefu/llvm-project/llvm/include/llvm/Support/Compiler.h:143:50: note: expanded from macro 'LLVM_DEPRECATED'
^
2 errors generated.
These methods provide a less verbose way of allocating `StorageLocation`s and
`Value`s than the existing `takeOwnership(make_unique(...))` pattern.
In addition, because allocation of `StorageLocation`s and `Value`s now happens
within the `DataflowAnalysisContext`, the `create<T>()` open up the possibility
of using `BumpPtrAllocator` to allocate these objects if it turns out this
helps performance.
Reviewed By: ymandel, xazax.hun, gribozavr2
Differential Revision: https://reviews.llvm.org/D147302
The goal is to be able to understand how the analysis executes, and what its
incremental and final findings are, by enabling logging and reading the logs.
This should include both framework and analysis-specific information.
Ad-hoc printf-debugging doesn't seem sufficient for my understanding, at least.
Being able to check in logging, turn it on in a production binary, and quickly
find particular analysis steps within complex functions seem important.
This can be enabled programmatically through DataflowAnalysisOptions, or
via the flag -dataflow-log. (Works in unittests, clang-tidy, standalone
tools...)
Important missing pieces here:
- a logger implementation that produces an interactive report (HTML file)
which can be navigated via timeline/code/CFG.
(I think the Logger interface is sufficient for this, but need to prototype).
- display of the application-specific lattice
- more useful display for the built-in environment
(e.g. meaningful & consistent names for values, hiding redundant variables in
the flow condition, hiding unreachable expressions)
Differential Revision: https://reviews.llvm.org/D144730
Merges `TransferOptions` into the newly-introduced
`DataflowAnalysisContext::Options` and removes explicit parameter for
`TransferOptions`, relying instead on the common options carried by the analysis
context. Given that there was no intent to allow different options between calls
to `transfer`, a common value for the options is preferable.
Differential Revision: https://reviews.llvm.org/D140703
This reverts commit 2b1a517a92bfdfa3b692a660e19a2bb22513a567. It's a fix forward
with two memory errors fixed, one of which was the cause of the build breakage
in the buildbots.
Original message:
Previously, the model for structs modeled all fields in a struct when
`createValue` was called for that type. This patch adds a prepass on the
function under analysis to discover the fields referenced in the scope and then
limits modeling to only those fields. This reduces wasted memory usage
(modeling unused fields) which can be important for programs that use large
structs.
Note: This patch obviates the need for https://reviews.llvm.org/D123032.
Previously, the model for structs modeled all fields in a struct when
`createValue` was called for that type. This patch adds a prepass on the
function under analysis to discover the fields referenced in the scope and then
limits modeling to only those fields. This reduces wasted memory usage
(modeling unused fields) which can be important for programss that use large
structs.
Note: This patch obviates the need for https://reviews.llvm.org/D123032.
Differential Revision: https://reviews.llvm.org/D140694
This patch modifies context-sensitive analysis of functions to use a cache,
rather than recreate the `ControlFlowContext` from a function decl on each
encounter. However, this is just step 1 (of N) in adding support for a
configurable map of "modeled" function decls (see issue #56879). The map will go
from the actual function decl to the `ControlFlowContext` used to model it. Only
functions pre-configured in the map will be modeled in a context-sensitive way.
We start with a cache because it introduces the desired map, while retaining the
current behavior. Here, functions are mapped to their actual implementations
(when available).
Differential Revision: https://reviews.llvm.org/D131039
Rename `DataflowAnalysisContext::getStableStorageLocation(QualType)`
to `createStorageLocation`, to make it clear that it doesn't return
a stable storage location.
Differential Revision: https://reviews.llvm.org/D131021
Reviewed-by: ymandel, xazax.hun, gribozavr2
Previously we used to desugar implications and biconditionals into
equivalent CNF/DNF as soon as possible. However, this desugaring makes
debug output (Environment::dump()) less readable than it could be.
Therefore, it makes sense to keep the sugared representation of a
boolean formula, and desugar it in the solver.
Reviewed By: sgatev, xazax.hun, wyt
Differential Revision: https://reviews.llvm.org/D130519
The latter way to abbreviate is a lot more common in the LLVM codebase.
Reviewed By: sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D130423
Treat `std::nullptr_t` as a regular scalar type to avoid tripping
assertions when analyzing code that uses `std::nullptr_t`.
Differential Revision: https://reviews.llvm.org/D129097
When a `nullptr` is assigned to a pointer variable, it is wrapped in a `ImplicitCastExpr` with cast kind `CK_NullTo(Member)Pointer`. This patch assigns singleton pointer values representing null to these expressions.
For each pointee type, a singleton null `PointerValue` is created and stored in the `NullPointerVals` map of the `DataflowAnalysisContext` class. The pointee type is retrieved from the implicit cast expression, and used to initialise the `PointeeLoc` field of the `PointerValue`. The `PointeeLoc` created is not mapped to any `Value`, reflecting the absence of value indicated by null pointers.
Reviewed By: gribozavr2, sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D128056
This patch introduces `buildAndSubstituteFlowCondition` - given a flow condition token, this function returns the expression of constraints defining the flow condition, with values substituted where specified.
As an example:
Say we have tokens `FC1`, `FC2`, `FC3`:
```
FlowConditionConstraints: {
FC1: C1,
FC2: C2,
FC3: (FC1 v FC2) ^ C3,
}
```
`buildAndSubstituteFlowCondition(FC3, /*Substitutions:*/{{C1 -> C1'}})`
returns a value corresponding to `(C1' v C2) ^ C3`.
Note:
This function returns the flow condition expressed directly as its constraints, which differs to how we currently represent the flow condition as a token bound to a set of constraints and dependencies. Making the representation consistent may be an option to consider in the future.
Depends On D128357
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D128363
`createStorageLocation` in `DataflowEnvironment` is now a trivial wrapper around the logic in `DataflowAnalysisContext`.
Additionally, `getObjectFields` and `getFieldsFromClassHierarchy` (required for the implementation of `createStorageLocation`) are also moved to `DataflowAnalysisContext`.
Reviewed By: gribozavr2, sgatev
Differential Revision: https://reviews.llvm.org/D128359
`equivalentBoolValues` compares equivalence between two booleans. The current implementation does not consider constraints imposed by flow conditions on the booleans and its subvalues.
Depends On D128520
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D128521
Given a set of `Constraints`, `querySolver` adds common background information across queries (`TrueVal` is always true and `FalseVal` is always false) and passes the query to the solver.
`checkUnsatisfiable` is a simple wrapper around `querySolver` for checking that the solver returns an unsatisfiable result.
Depends On D128519
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D128520
To keep functionality of creating boolean expressions in a consistent location.
Depends On D128357
Reviewed By: gribozavr2, sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D128519
A flow condition is represented with an atomic boolean token, and it is bound to a set of constraints: `(FC <=> C1 ^ C2 ^ ...)`. \
This was internally represented as `(FC v !C1 v !C2 v ...) ^ (C1 v !FC) ^ (C2 v !FC) ^ ...` and tracked by 2 maps:
- `FlowConditionFirstConjunct` stores the first conjunct `(FC v !C1 v !C2 v ...)`
- `FlowConditionRemainingConjuncts` stores the remaining conjuncts `(C1 v !FC) ^ (C2 v !FC) ^ ...`
This patch simplifies the tracking of the constraints by using a single `FlowConditionConstraints` map which stores `(C1 ^ C2 ^ ...)`, eliminating the use of two maps.
Reviewed By: gribozavr2, sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D128357
A follow-up to 62b2a47 to centralize the logic that skips expressions
that the CFG does not emit. This allows client code to avoid
sprinkling this logic everywhere.
Add redirects in the transfer function to similarly skip such
expressions by forwarding the visit to the sub-expression.
Differential Revision: https://reviews.llvm.org/D124965
Enable efficient implementation of context-aware joining of distinct
boolean values. It can be used to join distinct boolean values while
preserving flow condition information.
Flow conditions are represented as Token <=> Clause iff formulas. To
perform context-aware joining, one can simply add the tokens of flow
conditions to the formula when joining distinct boolean values, e.g:
`makeOr(makeAnd(FC1, Val1), makeAnd(FC2, Val2))`. This significantly
simplifies the implementation of `Environment::join`.
This patch removes the `DataflowAnalysisContext::getSolver` method.
The `DataflowAnalysisContext::flowConditionImplies` method should be
used instead.
Reviewed-by: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D124395
This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.
Reviewed-by: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D120711