28 Commits

Author SHA1 Message Date
Florian Mayer
564fd62aed
[FlowSensitive] Allow to dump nested RecordStorageLocation (#112457)
We have an internal analysis that uses them, and the HTML dump would
fail on the assertion.
2024-10-24 11:05:39 -07:00
martinboehme
e8fce95887
[clang][nullability] Remove RecordValue. (#89052)
This class no longer serves any purpose; see also the discussion here:
https://reviews.llvm.org/D155204#inline-1503204

A lot of existing tests in TransferTest.cpp check for the existence of
`RecordValue`s. Some of these checks are now simply redundant and have
been
removed. In other cases, tests were checking for the existence of a
`RecordValue` as a way of testing whether a record has been initialized.
I have
typically changed these test to instead check whether a field of the
record has
a value.
2024-04-19 09:39:52 +02:00
martinboehme
59ff3adcc1
[clang][dataflow][NFC] Rename ControlFlowContext to AdornedCFG. (#85640)
This expresses better what the class actually does, and it reduces the
number of
`Context`s that we have in the codebase.

A deprecated alias `ControlFlowContext` is available from the old
header.
2024-03-19 08:44:08 +01:00
martinboehme
a11ab139e4
[clang][dataflow] Fix u8 string error with C++20. (#84302)
See also discussion on https://github.com/llvm/llvm-project/pull/84291.
2024-03-07 12:53:26 +01:00
martinboehme
5830d1a2df
Revert "[dataflow][nfc] Fix u8 string usage with c++20" (#84301)
Reverts llvm/llvm-project#84291

The patch broke Windows builds.
2024-03-07 11:48:51 +01:00
Vincent Lee
6e79f77adb
[dataflow][nfc] Fix u8 string usage with c++20 (#84291)
Clang returns an error when compiling this file with c++20
```
error: ISO C++20 does not permit initialization of char array with UTF-8 string literal
```
It seems like c++20 treats u8strings differently than strings (probably
needs char8_t).
Make this a string to fix the error.
2024-03-07 10:46:36 +01:00
martinboehme
994493ce05
[clang][dataflow][NFC] Rename a confusingly named variable. (#80182) 2024-02-01 05:31:32 +01:00
martinboehme
82324bc991
[clang][dataflow] In the CFG visualization, mark converged blocks. (#79999)
Here's an example of the output:


![image](https://github.com/llvm/llvm-project/assets/29098113/63cd509e-c2a7-4794-b758-ea73812ff09f)
2024-01-31 08:31:08 +01:00
Kazu Hirata
46a5693125 [FlowSensitive] Fix warnings
This patch fixes:

  clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp:376:22: error:
  comparison of integers of different signs: 'unsigned int' and
  'TokenInfo::(unnamed enum at
  clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp:356:7)'
  [-Werror,-Wsign-compare]

  clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp:385:23: error:
  comparison of integers of different signs: 'unsigned int' and
  'TokenInfo::(unnamed enum at
  clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp:356:7)'
  [-Werror,-Wsign-compare]

etc
2023-12-08 07:56:45 -08:00
Aaron Ballman
11dfb3cb32 Work around an ICE in MSVC; NFC 2023-12-08 10:20:31 -05:00
martinboehme
71f2ec2db1
[clang][dataflow] Add synthetic fields to RecordStorageLocation (#73860)
Synthetic fields are intended to model the internal state of a class
(e.g. the value stored in a `std::optional`) without having to depend on
that class's implementation details.

Today, this is typically done with properties on `RecordValue`s, but
these have several drawbacks:

* Care must be taken to call `refreshRecordValue()` before modifying a
property so that the modified property values aren’t seen by other
environments that may have access to the same `RecordValue`.

* Properties aren’t associated with a storage location. If an analysis
needs to associate a location with the value stored in a property (e.g.
to model the reference returned by `std::optional::value()`), it needs
to manually add an indirection using a `PointerValue`. (See for example
the way this is done in UncheckedOptionalAccessModel.cpp, specifically
in `maybeInitializeOptionalValueMember()`.)

* Properties don’t participate in the builtin compare, join, and widen
operations. If an analysis needs to apply these operations to
properties, it needs to override the corresponding methods of
`ValueModel`.

* Longer-term, we plan to eliminate `RecordValue`, as by-value
operations on records aren’t really “a thing” in C++ (see
https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This
would obviously eliminate the ability to set properties on
`RecordValue`s.

To demonstrate the advantages of synthetic fields, this patch converts
UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly
simplifies the implementation of the check.

This PR is pretty big; to make it easier to review, I have broken it
down into a stack of three commits, each of which contains a set of
logically related changes. I considered submitting each of these as a
separate PR, but the commits only really make sense when taken together.

To review, I suggest first looking at the changes in
UncheckedOptionalAccessModel.cpp. This gives a flavor for how the
various API changes work together in the context of an analysis. Then,
review the rest of the changes.
2023-12-04 09:29:22 +01:00
martinboehme
526c9b7e37
[clang][nullability] Use proves() and assume() instead of deprecated synonyms. (#70297) 2023-10-30 13:18:57 +01:00
martinboehme
2be7c651c4
[clang][dataflow] HTML logger: Mark iterations that have converged. (#68204)
I've eliminated the `logText("Block converged")` call entirely because

a) These logs are associated with an individual `CFGElement`, while
convergence
   should be associated with a block, and

b) The log message was being associated with the wrong block:
`recordState()`
dumps all of the pending log messages, but `blockConverged()` is called
after
   the last `recordState()` call for a given block, so that the "Block
converged" log message was being associated with the first element of
the
   _next_ block to be processed.

Example:


![image](https://github.com/llvm/llvm-project/assets/29098113/6a19095c-2dbb-4771-9485-e8e45c9d26fb)
2023-10-04 13:47:24 +02:00
martinboehme
ed65ced22a
[clang][dataflow] Identify post-visit state changes in the HTML logger. (#66746)
Previously, post-visit state changes were indistinguishable from
ordinary
iterations, which could give a confusing picture of how many iterations
a block
needs to converge.

Now, post-visit state changes are marked with "post-visit" instead of an
iteration number:


![screenshot](https://github.com/llvm/llvm-project/assets/29098113/5e9553d6-dfaa-45d3-8ea4-e623a14ee4c5))
2023-09-20 15:18:57 +02:00
Kinuko Yasuda
e791535b13
[clang][dataflow] Remove RecordValue.getLog() usage in HTMLLogger (#65645)
We can dump the same information from RecordStorageLocation.

Tested the behavior before and after patch, that generates the field
values in the HTML
in both cases (and also made sure that removing the relevant code makes
the field values
in the HTML go away)
2023-09-12 11:25:40 +02:00
Martin Braenne
e6cd409fc6 [clang][dataflow] In ControlFlowContext, handle Decl by reference instead of pointer.
`build()` guarantees that we'll always have a `Decl`, so we can simplify the code.

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D156859
2023-08-03 06:59:29 +00:00
Martin Braenne
9ecdbe3855 [clang][dataflow] Rename AggregateStorageLocation to RecordStorageLocation and StructValue to RecordValue.
- Both of these constructs are used to represent structs, classes, and unions;
  Clang uses the collective term "record" for these.

- The term "aggregate" in `AggregateStorageLocation` implies that, at some
  point, the intention may have been to use it also for arrays, but it don't
  think it's possible to use it for arrays. Records and arrays are very
  different and therefore need to be modeled differently. Records have a fixed
  set of named fields, which can have different type; arrays have a variable
  number of elements, but they all have the same type.

- Futhermore, "aggregate" has a very specific meaning in C++
  (https://en.cppreference.com/w/cpp/language/aggregate_initialization).
  Aggregates of class type may not have any user-declared or inherited
  constructors, no private or protected non-static data members, no virtual
  member functions, and so on, but we use `AggregateStorageLocations` to model all objects of class type.

In addition, for consistency, we also rename the following:

- `getAggregateLoc()` (in `RecordValue`, formerly known as `StructValue`) to
  simply `getLoc()`.

- `refreshStructValue()` to `refreshRecordValue()`

We keep the old names around as deprecated synonyms to enable clients to be migrated to the new names.

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D156788
2023-08-01 20:29:40 +00:00
Martin Braenne
b244b6ae0b [clang][dataflow] Remove Strict suffix from accessors.
For the time being, we're keeping the `Strict` versions around as deprecated synonyms so that clients can be migrated, but these synonyms will be removed soon.

Depends On D156673

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D156674
2023-07-31 19:40:09 +00:00
Martin Braenne
f76f6674d8 [clang][dataflow] Use Strict accessors where we weren't using them yet.
This eliminates all uses of the deprecated accessors.

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D156672
2023-07-31 19:40:04 +00:00
Martin Braenne
1b334a2ae7 [clang][dataflow] Eliminate ReferenceValue.
There are no remaining uses of this class in the framework.

This patch is part of the ongoing migration to strict handling of value categories (see https://discourse.llvm.org/t/70086 for details).

Reviewed By: ymandel, xazax.hun, gribozavr2

Differential Revision: https://reviews.llvm.org/D155922
2023-07-27 13:14:47 +00:00
Martin Braenne
771d7d71df [clang][dataflow] HTMLLogger: Don't crash if CFG contains unreachable blocks.
Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D156411
2023-07-27 13:02:42 +00:00
Martin Braenne
44f98d0101 [clang][dataflow] Eliminate duplication between AggregateStorageLocation and StructValue.
After this change, `StructValue` is just a wrapper for an `AggregateStorageLocation`. For the wider context, see https://discourse.llvm.org/t/70086.

## How to review

- Start by looking at the comments added / changed in Value.h, StorageLocation.h,
  and DataflowEnvironment.h. This will give you a good overview of the semantic
  changes.

- Look at the corresponding .cpp files that implement the semantic changes.

- Transfer.cpp, TypeErasedDataflowAnalysis.cpp, and RecordOps.cpp show how the
  core of the framework is affected by the semantic changes.

- UncheckedOptionalAccessModel.cpp shows how this complex model is affected by
  the changes.

- Many of the changes in the rest of the patch are mechanical in nature.

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D155446
2023-07-24 13:20:01 +00:00
Sam McCall
fc9821a877 Reland "[dataflow] Add dedicated representation of boolean formulas"
This reverts commit 7a72ce98224be76d9328e65eee472381f7c8e7fe.

Test problems were due to unspecified order of function arg evaluation.

Reland "[dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)"

This properly frees the Value hierarchy from managing boolean formulas.

We still distinguish AtomicBoolValue; this type is used in client code.
However we expect to convert such uses to BoolValue (where the
distinction is not needed) or Atom (where atomic identity is intended),
and then fold AtomicBoolValue into FormulaBoolValue.

We also distinguish TopBoolValue; this has distinct rules for
widen/join/equivalence, and top-ness is not represented in Formula.
It'd be nice to find a cleaner representation (e.g. the absence of a
formula), but no immediate plans.

For now, BoolValues with the same Formula are deduplicated. This doesn't
seem desirable, as Values are mutable by their creators (properties).
We can probably drop this for FormulaBoolValue immediately (not in this
patch, to isolate changes). For AtomicBoolValue we first need to update
clients to stop using value pointers for atom identity.

The data structures around flow conditions are updated:
- flow condition tokens are Atom, rather than AtomicBoolValue*
- conditions are Formula, rather than BoolValue
Most APIs were changed directly, some with many clients had a
new version added and the existing one deprecated.

The factories for BoolValues in Environment keep their existing
signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue)
and are not deprecated. These have very many clients and finding the
most ergonomic API & migration path still needs some thought.

Differential Revision: https://reviews.llvm.org/D153469
2023-07-07 11:58:33 +02:00
Sam McCall
2d8cd19512 Revert "Reland "[dataflow] Add dedicated representation of boolean formulas" and followups
These changes are OK, but they break downstream stuff that needs more time to adapt :-(

This reverts commit 71579569f4399d3cf6bc618dcd449b5388d749cc.
This reverts commit 5e4ad816bf100b0325ed45ab1cfea18deb3ab3d1.
This reverts commit 1c3ac8dfa16c42a631968aadd0396cfe7f7778e0.
2023-07-05 14:32:25 +02:00
Sam McCall
5e4ad816bf [dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)
This properly frees the Value hierarchy from managing boolean formulas.

We still distinguish AtomicBoolValue; this type is used in client code.
However we expect to convert such uses to BoolValue (where the
distinction is not needed) or Atom (where atomic identity is intended),
and then fold AtomicBoolValue into FormulaBoolValue.

We also distinguish TopBoolValue; this has distinct rules for
widen/join/equivalence, and top-ness is not represented in Formula.
It'd be nice to find a cleaner representation (e.g. the absence of a
formula), but no immediate plans.

For now, BoolValues with the same Formula are deduplicated. This doesn't
seem desirable, as Values are mutable by their creators (properties).
We can probably drop this for FormulaBoolValue immediately (not in this
patch, to isolate changes). For AtomicBoolValue we first need to update
clients to stop using value pointers for atom identity.

The data structures around flow conditions are updated:
- flow condition tokens are Atom, rather than AtomicBoolValue*
- conditions are Formula, rather than BoolValue
Most APIs were changed directly, some with many clients had a
new version added and the existing one deprecated.

The factories for BoolValues in Environment keep their existing
signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue)
and are not deprecated. These have very many clients and finding the
most ergonomic API & migration path still needs some thought.

Differential Revision: https://reviews.llvm.org/D153469
2023-07-05 13:54:32 +02:00
Sam McCall
b56b15ed71 [dataflow] HTMLLogger - show the value of the current expr
Differential Revision: https://reviews.llvm.org/D148949
2023-04-26 17:15:23 +02:00
Sam McCall
0304aa25e0 [dataflow] allow specifying path to dot with $GRAPHVIZ_DOT
I'd like to use this in a CI system where it's easier to tell the
program about paths than manipulate $PATH.
2023-04-21 05:29:11 +02:00
Sam McCall
a443b3d18e [dataflow] add HTML logger: browse code/cfg/analysis timeline/state
With -dataflow-log=/dir we will write /dir/0.html etc for each
function analyzed.

These files show the function's code and CFG, and the path through
the CFG taken by the analysis. At each analysis point we can see the
lattice state.

Currently the lattice state dump is not terribly useful but we can
improve this: showing values associated with the current Expr,
simplifying flow condition, highlighting changes etc.

(Trying not to let this patch scope-creep too much, so I ripped out the
half-finished features)

Demo: 9718fdd484/analysis.html

Differential Revision: https://reviews.llvm.org/D146591
2023-04-19 15:37:06 +02:00