llvm-project

Author	SHA1	Message	Date
martinboehme	c83ec847ac	[clang][dataflow] Extend debug output for `Environment`. (#79982 ) * Print `ReturnLoc`, `ReturnVal`, and `ThisPointeeLoc` if applicable. * For entries in `LocToVal` that correspond to declarations, print the names of the declarations next to them. I've removed the FIXME because all relevant fields are now being dumped. I'm not sure we actually need the capability for the caller to specify which fields to dump, so I've simply deleted this part of the comment. Some examples of the output: ![image](https://github.com/llvm/llvm-project/assets/29098113/17d0978f-b86d-4555-8a61-d1f2021f8d59) ![image](https://github.com/llvm/llvm-project/assets/29098113/021dbb24-5fe2-4720-8a08-f48dcf4b88f8)	2024-01-31 08:11:13 +01:00
martinboehme	7a6c2628e9	[clang][dataflow] Eliminate two uses of `RecordValue::getLoc()`. (#79163 ) This is a small step towards eventually eliminating `RecordValue` entirely.	2024-01-24 08:06:32 +01:00
martinboehme	a2caa4929e	[clang][dataflow] Treat comma operator correctly in `getResultObjectLocation()`. (#78427 )	2024-01-22 09:23:06 +01:00
martinboehme	f1226eea52	[clang][dataflow] Consider `CXXDefaultInitExpr` to be an "original record ctor". (#78423 ) The CFG doesn't contain a CFGElement for the `CXXDefaultInitExpr::getInit()`, so it makes sense to consider the `CXXDefaultInitExpr` to be the expression that originally constructs the object.	2024-01-18 08:59:26 +01:00
martinboehme	23bfc271a3	[clang][dataflow] Use `ignoreCFGOmittedNodes()` in `setValue()`. (#78245 ) This is to be consistent with `getValue()`, which also uses `ignoreCFGOmittedNodes()`. Before this fix, it was not possible to retrieve a `Value` from a "CFG omitted" node that had previously been set using `setValue()`; see the accompanying test, which fails without the fix. I discovered this issue while running internal integration tests on https://github.com/llvm/llvm-project/pull/78127.	2024-01-16 15:48:44 +01:00
martinboehme	c19cacfa34	[clang][dataflow] Tighten checking for existence of a function body. (#78163 ) In various places, we would previously call `FunctionDecl::hasBody()` (which checks whether any redeclaration of the function has a body, not necessarily the one on which `hasBody()` is being called). This is bug-prone, as a recent bug in Crubit's nullability checker has shown ([fix](`4b01ed0f14`), [fix for the fix](`e0c5d8ddd7`)). Instead, we now use `FunctionDecl::doesThisDeclarationHaveABody()` which, as the name implies, checks whether the specific redeclaration it is being called on has a body. Alternatively, I considered being more lenient and "canonicalizing" to the `FunctionDecl` that has the body if the `FunctionDecl` being passed is a different redeclaration. However, this also risks hiding bugs: A caller might inadverently perform the analysis for all redeclarations of a function and end up duplicating work without realizing it. By accepting only the redeclaration that contains the body, we prevent this. I've checked, and all clients that I'm aware of do currently pass in the redeclaration that contains the function body. Typically this is because they use the `ast_matchers::hasBody()` matcher which, unlike `FunctionDecl::hasBody()`, only matches for the redeclaration containing the body.	2024-01-16 12:52:55 +01:00
martinboehme	2ee396b0b1	[clang][dataflow] Add `Environment::get<>()`. (#76027 ) This template function casts the result of `getValue()` or `getStorageLocation()` to a given subclass of `Value` or `StorageLocation` (using `cast_or_null`). It's a common pattern to do something like this: ```cxx auto Val = cast_or_null<PointerValue>(Env.getValue(E)); ``` This can now be expressed more concisely like this: ```cxx auto Val = Env.get<PointerValue>(E); ``` Instead of adding a new method `get()`, I had originally considered simply adding a template parameter to `getValue()` and `getStorageLocation()` (with a default argument of `Value` or `StorageLocation`), but this results in an undesirable repetition at the callsite, e.g. `getStorageLocation<RecordStorageLocation>(...)`. The `Value` and `StorageLocation` in the method name adds nothing of value when the template argument already contains this information, so it seemed best to shorten the method name to simply `get()`.	2023-12-21 09:02:20 +01:00
martinboehme	ca1034341c	[clang][dataflow] Fix an issue with `Environment::getResultObjectLocation()`. (#75483 ) So far, if there was a chain of record type prvalues, `getResultObjectLocation()` would assign a different result object location to each one. This makes no sense, of course, as all of these prvalues end up initializing the same result object. This patch fixes this by propagating storage locations up through the entire chain of prvalues. The new implementation also has the desirable effect of making it possible to make `getResultObjectLocation()` const, which seems appropriate given that, logically, it is just an accessor.	2023-12-18 09:10:03 +01:00
Samira Bazuzi	40381d1264	[clang][dataflow] Re-land: Retrieve members from accessors called usi… (#74336 ) …ng member pointers. This initially landed with a broken test due to a mid-air collision with a new requirement for Environment initialization before field modeling. Have added that initialization in the test. From first landing: getMethodDecl does not handle pointers to members and returns nullptr for them. getMethodDecl contains a decade-plus-old FIXME to handle pointers to members, but two approaches I looked at for fixing it are more invasive or complex than simply swapping to getCalleeDecl. The first, have getMethodDecl call getCalleeDecl, creates a large tree of const-ness mismatches due to getMethodDecl returning a non-const value while being a const member function and getCalleeDecl only being a const member function when it returns a const value. The second, implementing an AST walk to match how CXXMemberCallExpr::getImplicitObjectArgument grabs the LHS of the binary operator, is basically reimplementing Expr::getReferencedDeclOfCallee, which is used by Expr::getCalleeDecl. We don't need another copy of that code.	2023-12-05 12:09:33 +01:00
martinboehme	3b6d63c519	Revert "[clang][dataflow] Retrieve members from accessors called using member…" (#74299 ) Reverts llvm/llvm-project#73978	2023-12-04 11:27:31 +01:00
Samira Bazuzi	a3fe9cb24d	[clang][dataflow] Retrieve members from accessors called using member… (#73978 ) … pointers. getMethodDecl does not handle pointers to members and returns nullptr for them. getMethodDecl contains a decade-plus-old FIXME to handle pointers to members, but two approaches I looked at for fixing it are more invasive or complex than simply swapping to getCalleeDecl. The first, have getMethodDecl call getCalleeDecl, creates a large tree of const-ness mismatches due to getMethodDecl returning a non-const value while being a const member function and getCalleeDecl only being a const member function when it returns a const value. The second, implementing an AST walk to match how CXXMemberCallExpr::getImplicitObjectArgument grabs the LHS of the binary operator, is basically reimplementing Expr::getReferencedDeclOfCallee, which is used by Expr::getCalleeDecl. We don't need another copy of that code.	2023-12-04 10:10:07 +01:00
martinboehme	71f2ec2db1	[clang][dataflow] Add synthetic fields to `RecordStorageLocation` (#73860 ) Synthetic fields are intended to model the internal state of a class (e.g. the value stored in a `std::optional`) without having to depend on that class's implementation details. Today, this is typically done with properties on `RecordValue`s, but these have several drawbacks: * Care must be taken to call `refreshRecordValue()` before modifying a property so that the modified property values aren’t seen by other environments that may have access to the same `RecordValue`. * Properties aren’t associated with a storage location. If an analysis needs to associate a location with the value stored in a property (e.g. to model the reference returned by `std::optional::value()`), it needs to manually add an indirection using a `PointerValue`. (See for example the way this is done in UncheckedOptionalAccessModel.cpp, specifically in `maybeInitializeOptionalValueMember()`.) * Properties don’t participate in the builtin compare, join, and widen operations. If an analysis needs to apply these operations to properties, it needs to override the corresponding methods of `ValueModel`. * Longer-term, we plan to eliminate `RecordValue`, as by-value operations on records aren’t really “a thing” in C++ (see https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This would obviously eliminate the ability to set properties on `RecordValue`s. To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check. This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together. To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes.	2023-12-04 09:29:22 +01:00
martinboehme	5bd643e145	[clang][dataflow] Strengthen widening of boolean values. (#73484 ) Before we widen to top, we now check if both values can be proved either true or false in their respective environments; if so, widening returns a true or false literal. The idea is that we avoid losing information if posssible. This patch includes a test that fails without this change to widening. This change does mean that we call the SAT solver in more places, but this seems acceptable given the additional precision we gain. In tests on an internal codebase, the number of SAT solver timeouts we observe with Crubit's nullability checker does increase by about 25%. They can be brought back to the previous level by doubling the SAT solver work limit.	2023-11-27 14:55:49 +01:00
martinboehme	c4c59192e6	[clang][dataflow] Clear `ExprToLoc` and `ExprToVal` at the start of a block. (#72985 ) We never need to access entries from these maps outside of the current basic block. This could only ever become a consideration when flow control happens inside a full-expression (i.e. we have multiple basic blocks for a full expression); there are two kinds of expression where this can happen, but we already deal with these in other ways: * Short-circuiting logical operators (`&&` and `\|\|`) have operands that live in different basic blocks than the operator itself, but we already have code in the framework to retrieve the value of these operands from the environment for the block they are computed in, rather than in the environment of the block containing the operator. * The conditional operator similarly has operands that live in different basic blocks. However, we currently don't implement a transfer function for the conditional operator. When we do this, we need to retrieve the values of the operands from the environments of the basic blocks they live in, as we already do for logical operators. This patch adds a comment to this effect to the code. Clearing out `ExprToLoc` and `ExprToVal` has two benefits: * We avoid performing joins on boolean expressions contained in `ExprToVal` and hence extending the flow condition in cases where this is not needed. Simpler flow conditions should reduce the amount of work we do in the SAT solver. * Debugging becomes easier when flow conditions are simpler and `ExprToLoc` / `ExprToVal` don’t contain any extraneous entries. Benchmark results on Crubit's `pointer_nullability_analysis_benchmark show a slight runtime increase for simple benchmarks, offset by substantial runtime reductions for more complex benchmarks: ``` name old cpu/op new cpu/op delta BM_PointerAnalysisCopyPointer 29.8µs ± 1% 29.9µs ± 4% ~ (p=0.879 n=46+49) BM_PointerAnalysisIntLoop 101µs ± 3% 104µs ± 4% +2.96% (p=0.000 n=55+57) BM_PointerAnalysisPointerLoop 378µs ± 3% 245µs ± 3% -35.09% (p=0.000 n=47+55) BM_PointerAnalysisBranch 118µs ± 2% 122µs ± 3% +3.37% (p=0.000 n=59+59) BM_PointerAnalysisLoopAndBranch 779µs ± 3% 413µs ± 5% -47.01% (p=0.000 n=56+45) BM_PointerAnalysisTwoLoops 187µs ± 3% 192µs ± 5% +2.80% (p=0.000 n=57+58) BM_PointerAnalysisJoinFilePath 17.4ms ± 3% 7.2ms ± 3% -58.75% (p=0.000 n=58+57) BM_PointerAnalysisCallInLoop 14.7ms ± 4% 10.3ms ± 2% -29.87% (p=0.000 n=56+58) ```	2023-11-22 16:34:24 +01:00
martinboehme	a0700532dd	[clang][dataflow] Replace one remaining call to deprecated `addToFlowCondition()`. (#71547 )	2023-11-08 05:32:04 +01:00
martinboehme	7c636728c0	[clang][dataflow] Simplify flow conditions displayed in HTMLLogger. (#70848 ) This can make the flow condition significantly easier to interpret; see below for an example. I had hoped that adding the simplification as a preprocessing step before the SAT solver (in `DataflowAnalysisContext::querySolver()`) would also speed up SAT solving and maybe even eliminate SAT solver timeouts, but in my testing, this actually turns out to be a pessimization. It appears that these simplifications are easy enough for the SAT solver to perform itself. Nevertheless, the improvement in debugging alone makes this a worthwhile change. Example of flow condition output with these changes: ``` Flow condition token: V37 Constraints: (V16 = (((V15 & (V19 = V12)) & V22) & V25)) (V15 = ((V12 & ((V14 = V9) \| (V14 = V4))) & (V13 = V14))) True atoms: (V0, V1, V2, V5, V6, V7, V29, V30, V32, V34, V35, V37) False atoms: (V3, V8, V17) Equivalent atoms: (V11, V15) Flow condition constraints before simplification: V37 ((!V3 & !V8) & !V17) (V37 = V34) (V34 = (V29 & (V35 = V30))) (V29 = (((V16 \| V2) & V32) & (V30 = V32))) (V16 = (((V15 & (V19 = V12)) & V22) & V25)) (V15 = V11) (V11 = ((((V7 \| V2) & V12) & ((V7 & (V14 = V9)) \| (V2 & (V14 = V4)))) & (V13 = V14))) (V2 = V1) (V1 = V0) V0 (V7 = V6) (V6 = V5) (V5 = V2) ```	2023-11-07 15:18:34 +01:00
martinboehme	d1f59544cf	[clang][dataflow] Add `Environment::allows()`. (#70046 ) This allows querying whether, given the flow condition, a certain formula still has a solution (though it is not necessarily implied by the flow condition, as `flowConditionImplies()` would check). This can be checked today, but only with a double negation, i.e. to check whether, given the flow condition, a formula F has a solution, you can check `!Env.flowConditionImplies(Arena.makeNot(F))`. The double negation makes this hard to reason about, and it would be nicer to have a way of directly checking this. For consistency, this patch also renames `flowConditionImplies()` to `proves()`; the old name is kept around for compatibility but deprecated.	2023-10-25 16:02:22 +02:00
martinboehme	14b039c1dd	[clang][dataflow] Remove `declToLocConsistent()` assertion. (#69819 ) As described [here](https://discourse.llvm.org/t/70086/6), there are legitimate non-bug scenarios where two `DeclToLoc` maps to be joined contain different storage locations for the same declaration. This patch also adds a test containing an example of such a situation. (The test fails without the other changes in this patch.) With the assertion removed, the existing logic in `intersectDenseMaps()` will remove the corresponding declaration from the joined DeclToLoc map. We also remove `removeDecl()`'s precondition (that the declaration must be associated with a storage location) because this may no longer hold if the declaration was previously removed during a join, as described above.	2023-10-24 08:42:30 +02:00
Stanislav Gatev	52d0696355	[clang][dataflow] Add support for lambda captures (#68558 ) This adds support for copy, ref, and this lambda captures to the core framework and also adds relevant tests in UncheckedOptionalAccessTest.	2023-10-11 22:18:46 +02:00
Corentin Jabot	af4751738d	[C++] Implement "Deducing this" (P0847R7) This patch implements P0847R7 (partially), CWG2561 and CWG2653. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D140828	2023-10-02 14:33:02 +02:00
martinboehme	834cb919b3	[clang][dataflow] Remove declarations from `DeclToLoc` when their lifetime ends. (#67300 ) After https://reviews.llvm.org/D153273, we're now able to use `CFGLifetimeEnds` together with the other CFG options we use.	2023-09-26 08:41:09 +02:00
martinboehme	1d7b59ca8d	[clang][dataflow] Fix two null pointer dereferences in `getMemberForAccessor()`. (#66742 ) The additions to the test trigger crashes without the fixes.	2023-09-19 09:03:20 +02:00
Kinuko Yasuda	03be486ecc	[clang][dataflow] Model the fields that are accessed via inline accessors (#66368 ) So that the values that are accessed via such accessors can be analyzed as a limited version of context-sensitive analysis. We can potentially do this only when some option is set, but doing additional modeling like this won't be expensive and intrusive, so we do it by default for now.	2023-09-18 10:46:36 +02:00
martinboehme	7f66cc7d7a	[clang][dataflow] Merge `RecordValue`s with different locations correctly. (#65319 ) Now that prvalue expressions map directly to values (see https://reviews.llvm.org/D158977), it's no longer guaranteed that `RecordValue`s associated with the same expression will always have the same storage location. In other words, D158977 invalidated the assertion in `mergeDistinctValues()`. The newly added test causes this assertion to fail without the other changes in the patch. This patch fixes the issue. However, the real fix will be to eliminate the `StorageLocation` from `RecordValue` entirely.	2023-09-12 08:43:29 +02:00
Yitzhak Mandelbaum	80f0dc3aa4	[clang][dataflow] Unsoundly treat "Unknown" as "Equivalent" in widening. This change makes widening act the same as equivalence checking. When the analysis does not provide an answer regarding the equivalence of two distinct values, the framework treats them as equivalent. This is an unsound choice that enables convergence. Differential Revision: https://reviews.llvm.org/D159355	2023-09-07 19:06:35 +00:00
Martin Braenne	266c12a1bd	[clang][dataflow] When dumping `ExprToVal`, dump the `Value`, not just its location. This makes `ExprToVal` dumping consistent with `LocToVal` dumping. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D159274	2023-09-04 07:38:33 +00:00
Martin Braenne	330d5bcbf6	[clang][dataflow] Don't associate prvalue expressions with storage locations. Instead, map prvalue expressions directly to values in a newly introduced map `Environment::ExprToVal`. This change introduces an additional member variable in `Environment` but is an overall win: - It is more conceptually correctly, since prvalues don't have storage locations. - It eliminates complexity from `Environment::setValue(const Expr &E, Value &Val)`. - It reduces the amount of data stored in `Environment`: A prvalue now has a single entry in `ExprToVal` instead of one in `ExprToLoc` and one in `LocToVal`. - Not allocating `StorageLocation`s for prvalues additionally reduces memory usage. This patch is the last step in the migration to strict handling of value categories (see https://discourse.llvm.org/t/70086 for details). The changes here are almost entirely internal to `Environment`. The only externally observable change is that when associating a `RecordValue` with the location returned by `Environment::getResultObjectLocation()` for a given expression, callers additionally need to associate the `RecordValue` with the expression themselves. Reviewed By: xazax.hun Differential Revision: https://reviews.llvm.org/D158977	2023-08-29 07:28:46 +00:00
Martin Braenne	9ecdbe3855	[clang][dataflow] Rename `AggregateStorageLocation` to `RecordStorageLocation` and `StructValue` to `RecordValue`. - Both of these constructs are used to represent structs, classes, and unions; Clang uses the collective term "record" for these. - The term "aggregate" in `AggregateStorageLocation` implies that, at some point, the intention may have been to use it also for arrays, but it don't think it's possible to use it for arrays. Records and arrays are very different and therefore need to be modeled differently. Records have a fixed set of named fields, which can have different type; arrays have a variable number of elements, but they all have the same type. - Futhermore, "aggregate" has a very specific meaning in C++ (https://en.cppreference.com/w/cpp/language/aggregate_initialization). Aggregates of class type may not have any user-declared or inherited constructors, no private or protected non-static data members, no virtual member functions, and so on, but we use `AggregateStorageLocations` to model all objects of class type. In addition, for consistency, we also rename the following: - `getAggregateLoc()` (in `RecordValue`, formerly known as `StructValue`) to simply `getLoc()`. - `refreshStructValue()` to `refreshRecordValue()` We keep the old names around as deprecated synonyms to enable clients to be migrated to the new names. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D156788	2023-08-01 20:29:40 +00:00
Martin Braenne	b244b6ae0b	[clang][dataflow] Remove `Strict` suffix from accessors. For the time being, we're keeping the `Strict` versions around as deprecated synonyms so that clients can be migrated, but these synonyms will be removed soon. Depends On D156673 Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D156674	2023-07-31 19:40:09 +00:00
Martin Braenne	17ba278f76	[clang][dataflow] Remove deprecated accessors as well as `SkipPast`. Depends On D156672 Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D156673	2023-07-31 19:40:06 +00:00
Martin Braenne	f76f6674d8	[clang][dataflow] Use `Strict` accessors where we weren't using them yet. This eliminates all uses of the deprecated accessors. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D156672	2023-07-31 19:40:04 +00:00
Martin Braenne	8c0acbf837	[clang][dataflow] Avoid -Wunused-variable.	2023-07-28 07:57:50 +00:00
Martin Braenne	e95134b9cb	[clang][dataflow] Reverse course on `getValue()` deprecation. In the [value categories RFC](https://discourse.llvm.org/t/70086), I proposed that the end state of the migration should be that `getValue()` should only be legal to call on prvalues. As a stepping stone, to allow migrating off existing calls to `getValue()`, I proposed introducing `getValueStrict()`, which would already have the new semantics. However, I've now reconsidered this. Any expression, whether prvalue or glvalue, has a value, so really there isn't any reason to forbid calling `getValue()` on glvalues. I'm therefore removing the deprecation from `getValue()` and transitioning existing `getValueStrict()` calls back to `getValue()`. The other "strict" accessors are a different case. `setValueStrict()` should only be called on prvalues because glvalues need to have a storage location associated with them; it doesn't make sense to only set a value for them. And, of course, `getStorageLocationStrict()` and `setStorageLocationStrict()` should obviously only be called on glvalues because prvalues don't have storage locations. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D155921	2023-07-27 13:14:49 +00:00
Martin Braenne	1b334a2ae7	[clang][dataflow] Eliminate `ReferenceValue`. There are no remaining uses of this class in the framework. This patch is part of the ongoing migration to strict handling of value categories (see https://discourse.llvm.org/t/70086 for details). Reviewed By: ymandel, xazax.hun, gribozavr2 Differential Revision: https://reviews.llvm.org/D155922	2023-07-27 13:14:47 +00:00
Jie Fu	e71bae94b0	[clang][dataflow] Fix build failure due to -Wunused-variable in DataflowEnvironment.cpp (NFC) /data/llvm-project/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:125:11: error: unused variable 'StructVal2' [-Werror,-Wunused-variable] auto *StructVal2 = cast<StructValue>(&Val2); ^ 1 error generated.	2023-07-24 21:35:52 +08:00
Martin Braenne	44f98d0101	[clang][dataflow] Eliminate duplication between `AggregateStorageLocation` and `StructValue`. After this change, `StructValue` is just a wrapper for an `AggregateStorageLocation`. For the wider context, see https://discourse.llvm.org/t/70086. ## How to review - Start by looking at the comments added / changed in Value.h, StorageLocation.h, and DataflowEnvironment.h. This will give you a good overview of the semantic changes. - Look at the corresponding .cpp files that implement the semantic changes. - Transfer.cpp, TypeErasedDataflowAnalysis.cpp, and RecordOps.cpp show how the core of the framework is affected by the semantic changes. - UncheckedOptionalAccessModel.cpp shows how this complex model is affected by the changes. - Many of the changes in the rest of the patch are mechanical in nature. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D155446	2023-07-24 13:20:01 +00:00
Martin Braenne	0f6cf55567	[clang][dataflow] Bugfix for `refreshStructValue()`. In the case where the expression was not yet associated with a storage location, we created a new storage location but failed to associate it with the expression. The newly added test fails without the fix. Reviewed By: xazax.hun Differential Revision: https://reviews.llvm.org/D155465	2023-07-17 18:56:25 +00:00
Martin Braenne	d17f455a63	[clang][dataflow] Add `refreshStructValue()`. Besides being a useful abstraction, this function will help insulate existing clients of the framework from upcoming changes to the API of `StructValue` and `AggregateStorageLocation`. Depends On D155202 Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D155204	2023-07-17 07:26:13 +00:00
Martin Braenne	6d768548ec	[clang][dataflow] Add `DataflowEnvironment::createObject()`. This consolidates the code used in various places to initialize objects (usually for variables) into one central location. It will also help reduce the number of changes needed when we make the upcoming API changes to `AggregateStorageLocation` and `StructValue`. Depends On D155074 Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D155075	2023-07-17 07:26:10 +00:00
Sam McCall	6272226b9f	[dataflow] Remove deprecated BoolValue flow condition accessors Use the Formula versions instead, now. Differential Revision: https://reviews.llvm.org/D155152	2023-07-13 09:39:23 +02:00
Sam McCall	7d935d0836	[dataflow] improve determinism of generated SAT system Fixes two places where we relied on map iteration order when processing values, which leaked nondeterminism into the generated SAT formulas. Adds a couple of tests that directly assert that the SAT system is equivalent on each run. It's desirable that the formulas are deterministic based on the input: - our SAT solver is naive and perfermance is sensitive to even simple semantics-preserving transformations like A\|B to B\|A. (e.g. it's likely to choose a different variable to split on). Timeout failures are bad, but flaky ones are terrible to debug. - similarly when debugging, it's important to have a consistent understanding of what e.g. "V23" means across runs. --- Both changes in this patch were isolated from a nullability analysis of real-world code which was extremely slow, spending ages in the SAT solver at "random" points that varied on each run. I've included a reduced version of the code as a regression test. One of the changes shows up directly as flow-condition nondeterminism with a no-op analysis, the other relied on bits of the nullability analysis but I found a synthetic example to show the problem. Differential Revision: https://reviews.llvm.org/D154948	2023-07-12 08:09:10 +02:00
Martin Braenne	b47bdcbc72	[clang][dataflow] Include fields initialized in an `InitListExpr` in `getModeledFields()`. Previously, we were including these fields only in the specific instance that was initialized by the `InitListExpr`, but not in other instances of the same type. This is inconsistent and error-prone. Depends On D154952 Reviewed By: xazax.hun, gribozavr2 Differential Revision: https://reviews.llvm.org/D154961	2023-07-12 04:52:29 +00:00
Martin Braenne	e53da3eab4	[clang][dataflow][NFC] Expand a comment. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D154834	2023-07-11 06:32:08 +00:00
Martin Braenne	e8a1560d1d	[clang][dataflow] Various changes to handling of modeled fields. - Rename `getReferencedFields()` to `getModeledFields()`. Move the logic that returns all object fields when doing a context-sensitive analysis to here from `DataflowAnalysisContext::createStorageLocation()`. I think all callers of the previous `getReferencedFields()` should use this logic; the fact that they were not doing so looks like a bug. - Make `getModeledFields()` public. I have an upcoming patch that will need to use this function from Transfer.cpp, and there doesn't seem to be any reason why this function should not be public. - Use a `SmallSetVector` to get deterministic iteration order. I have a pending patch where I'm getting flaky tests because `Environment::createValueUnlessSelfReferential()` is non-deterministically populating different fields depending on the iteration order. This change fixes those flaky tests. Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D154586	2023-07-10 06:45:53 +00:00
Sam McCall	fc9821a877	Reland "[dataflow] Add dedicated representation of boolean formulas" This reverts commit 7a72ce98224be76d9328e65eee472381f7c8e7fe. Test problems were due to unspecified order of function arg evaluation. Reland "[dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)" This properly frees the Value hierarchy from managing boolean formulas. We still distinguish AtomicBoolValue; this type is used in client code. However we expect to convert such uses to BoolValue (where the distinction is not needed) or Atom (where atomic identity is intended), and then fold AtomicBoolValue into FormulaBoolValue. We also distinguish TopBoolValue; this has distinct rules for widen/join/equivalence, and top-ness is not represented in Formula. It'd be nice to find a cleaner representation (e.g. the absence of a formula), but no immediate plans. For now, BoolValues with the same Formula are deduplicated. This doesn't seem desirable, as Values are mutable by their creators (properties). We can probably drop this for FormulaBoolValue immediately (not in this patch, to isolate changes). For AtomicBoolValue we first need to update clients to stop using value pointers for atom identity. The data structures around flow conditions are updated: - flow condition tokens are Atom, rather than AtomicBoolValue* - conditions are Formula, rather than BoolValue Most APIs were changed directly, some with many clients had a new version added and the existing one deprecated. The factories for BoolValues in Environment keep their existing signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue) and are not deprecated. These have very many clients and finding the most ergonomic API & migration path still needs some thought. Differential Revision: https://reviews.llvm.org/D153469	2023-07-07 11:58:33 +02:00
Sam McCall	2d8cd19512	Revert "Reland "[dataflow] Add dedicated representation of boolean formulas" and followups These changes are OK, but they break downstream stuff that needs more time to adapt :-( This reverts commit 71579569f4399d3cf6bc618dcd449b5388d749cc. This reverts commit 5e4ad816bf100b0325ed45ab1cfea18deb3ab3d1. This reverts commit 1c3ac8dfa16c42a631968aadd0396cfe7f7778e0.	2023-07-05 14:32:25 +02:00
Sam McCall	5e4ad816bf	[dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate) This properly frees the Value hierarchy from managing boolean formulas. We still distinguish AtomicBoolValue; this type is used in client code. However we expect to convert such uses to BoolValue (where the distinction is not needed) or Atom (where atomic identity is intended), and then fold AtomicBoolValue into FormulaBoolValue. We also distinguish TopBoolValue; this has distinct rules for widen/join/equivalence, and top-ness is not represented in Formula. It'd be nice to find a cleaner representation (e.g. the absence of a formula), but no immediate plans. For now, BoolValues with the same Formula are deduplicated. This doesn't seem desirable, as Values are mutable by their creators (properties). We can probably drop this for FormulaBoolValue immediately (not in this patch, to isolate changes). For AtomicBoolValue we first need to update clients to stop using value pointers for atom identity. The data structures around flow conditions are updated: - flow condition tokens are Atom, rather than AtomicBoolValue* - conditions are Formula, rather than BoolValue Most APIs were changed directly, some with many clients had a new version added and the existing one deprecated. The factories for BoolValues in Environment keep their existing signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue) and are not deprecated. These have very many clients and finding the most ergonomic API & migration path still needs some thought. Differential Revision: https://reviews.llvm.org/D153469	2023-07-05 13:54:32 +02:00
Martin Braenne	1e7329cd79	[clang][dataflow] Model variables / fields / funcs used in default initializers. The newly added test fails without the other changes in this patch. Reviewed By: sammccall, gribozavr2 Differential Revision: https://reviews.llvm.org/D154420	2023-07-04 12:06:10 +00:00
Martin Braenne	74d8455ba6	[clang][dataflow] Make `getThisPointeeStorageLocation()` return an `AggregateStorageLocation`. This avoids the need for casts at callsites. Depends On D153852 Reviewed By: sammccall, xazax.hun, gribozavr2 Differential Revision: https://reviews.llvm.org/D153854	2023-06-29 04:07:08 +00:00
Martin Braenne	d36324866e	[clang][dataflow] Initialize fields of anonymous records correctly. Previously, the newly added test would crash. Depends On D153851 Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D153852	2023-06-29 04:07:04 +00:00

1 2 3 4

153 Commits