This checks if the layout of `std::initializer_list` is something Clang
can handle much earlier and deduplicates the checks in
CodeGen/CGExprAgg.cpp and AST/ExprConstant.cpp
Also now diagnose `union initializer_list` (Fixes#95495), bit-field for
the size (Fixes a crash that would happen during codegen if it were
unnamed), base classes (that wouldn't be initialized) and polymorphic
classes (whose vtable pointer wouldn't be initialized).
This is one of the node kinds that should be considered an "original
initializer". The patch adds a test that was causing an assertion
failure in
`assert(Children.size() == 1)` without the fix.
# Observed erroneous behavior
Prior to this change, a `MemberExpr` that accesses an anonymous class
might have a prvalue as its base (even though C++ mandates that the base
of a `MemberExpr` must be a glvalue), if the code containing the
`MemberExpr` was in a template.
Here's an example on [godbolt](https://godbolt.org/z/Gz1Mer9oz) (that is
essentially identical to the new test this patch adds).
This example sets up a struct containing an anonymous struct:
```cxx
struct S {
struct {
int i;
};
};
```
It then accesses the member `i` using the expression `S().i`.
When we do this in a non-template function, we get the following AST:
```
`-ExprWithCleanups <col:10, col:14> 'int'
`-ImplicitCastExpr <col:10, col:14> 'int' <LValueToRValue>
`-MemberExpr <col:10, col:14> 'int' xvalue .i 0xbdcb3c0
`-MemberExpr <col:10, col:14> 'S::(anonymous struct at line:2:3)' xvalue .S::(anonymous struct at line:2:3) 0xbdcb488
`-MaterializeTemporaryExpr <col:10, col:12> 'S' xvalue
`-CXXTemporaryObjectExpr <col:10, col:12> 'S' 'void () noexcept' zeroing
```
As expected, the AST contains a `MaterializeTemporarExpr` to materialize
the prvalue `S()` before accessing its members.
When we perform this access in a function template (that doesn't
actually even use its template parameter), the AST for the template
itself looks the same as above. However, the AST for an instantiation of
the template looks different:
```
`-ExprWithCleanups <col:10, col:14> 'int'
`-ImplicitCastExpr <col:10, col:14> 'int' <LValueToRValue>
`-MemberExpr <col:10, col:14> 'int' xvalue .i 0xbdcb3c0
`-MaterializeTemporaryExpr <col:10, col:14> 'S::(anonymous struct at line:2:3)' xvalue
`-MemberExpr <col:10, col:14> 'S::(anonymous struct at line:2:3)' .S::(anonymous struct at line:2:3) 0xbdcb488
`-CXXTemporaryObjectExpr <col:10, col:12> 'S' 'void () noexcept' zeroing
```
Note how the inner `MemberExpr` (the one accessing the anonymous struct)
acts on a prvalue.
Interestingly, this does not appear to cause any problems for CodeGen,
probably because CodeGen is set up to deal with `MemberExpr`s on rvalues
in C. However, it does cause issues in the dataflow framework, which
only supports C++ today and expects the base of a `MemberExpr` to be a
glvalue.
Beyond the issues with the dataflow framework, I think this issue should
be fixed because it goes contrary to what the C++ standard mandates, and
the AST produced for the non-template case indicates that we want to
follow the C++ rules here.
# Reasons for erroneous behavior
Here's why we're getting this malformed AST.
First of all, `TreeTransform` [strips any
`MaterializeTemporaryExpr`s](cd132dcbeb/clang/lib/Sema/TreeTransform.h (L14853))
from the AST.
It is therefore up to
[`TreeTransform::RebuildMemberExpr()`](cd132dcbeb/clang/lib/Sema/TreeTransform.h (L2853))
to recreate a `MaterializeTemporaryExpr` if needed. In the [general
case](cd132dcbeb/clang/lib/Sema/TreeTransform.h (L2915)),
it does this: It calls `Sema::BuildMemberReferenceExpr()`, which ensures
that the base is a glvalue by [materializing a
temporary](cd132dcbeb/clang/lib/Sema/SemaExprMember.cpp (L1016))
if needed. However, when `TreeTransform::RebuildMemberExpr()` encounters
an anonymous class, it [calls
`Sema::BuildFieldReferenceExpr()`](cd132dcbeb/clang/lib/Sema/TreeTransform.h (L2880)),
which, unlike `Sema::BuildMemberReferenceExpr()`, does not make sure
that the base is a glvalue.
# Proposed fix
I considered several possible ways to fix this issue:
- Add logic to `Sema::BuildFieldReferenceExpr()` that materializes a
temporary if needed. This appears to work, but it feels like the fix is
in the wrong place:
- AFAIU, other callers of `Sema::BuildFieldReferenceExpr()` don't need
this logic.
- The issue is caused by `TreeTransform` removing the
`MaterializeTemporaryExpr`, so it seems the fix should also be in
`TreeTransform`
- Materialize the temporary directly in
`TreeTransform::RebuildMemberExpr()` if needed (within the case that
deals with anonymous classes).
This would work, too, but it would duplicate logic that already exists
in `Sema::BuildMemberReferenceExpr()` (which we leverage for the general
case).
- Use `Sema::BuildMemberReferenceExpr()` instead of
`Sema::BuildFieldReferenceExpr()` for the anonymous class case, so that
it also uses the existing logic for materializing the temporary.
This is the option I've decided to go with here. There's a slight
wrinkle in that we create a `LookupResult` that claims we looked up the
unnamed field for the anonymous class -- even though we would obviously
never be able to look up an unnamed field. I think this is defensible
and still better than the other alternatives, but I would welcome
feedback on this from others who know the code better.
- Instead of comparing the identity of the `PointerValue`s, compare the
underlying `StorageLocation`s.
- If the `StorageLocation`s are the same, return a definite "true" as
the
result of the comparison. Before, if the `PointerValue`s were different,
we
would return an atom, even if the storage locations themselves were the
same.
- If the `StorageLocation`s are different, return an atom (as before).
Pointers
that have different storage locations may still alias, so we can't
return a
definite "false" in this case.
The application-level gains from this are relatively modest. For the
Crubit
nullability check running on an internal codebase, this change reduces
the
number of functions on which the SAT solver times out from 223 to 221;
the
number of "pointer expression not modeled" errors reduces from 3815 to
3778.
Still, it seems that the gain in precision is generally worthwhile.
@Xazax-hun inspired me to think about this with his
[comments](https://github.com/llvm/llvm-project/pull/73860#pullrequestreview-1761484615)
on a different PR.
The existing code was full of comments about how we assume this is
always the
case, but it's not mandated by the standard, and there is code out there
that
returns a different type. So check that the result type is in fact the
same as
the destination type before attempting to copy to the result.
To make sure that we don't bail out in more cases than intended, I've
extended
existing tests to verify that in the common case, we do return the
destination
object (by reference or value, as the case may be).
Trying to do so can cause crashes -- see newly added test and the
comments in
the fix.
We're starting to see a repeating pattern here: We're getting crashes
because
`ResultObjectVisitor` and `getReferencedDecls()` don't agree on which
parts of
the AST to visit and, hence, which fields should be modeled.
I think we should ensure consistency between these two parts of the code
by
using a `RecursiveASTVisitor` in `getReferencedDecls()`[^1]; the
`Traverse...()` functions that control which parts of the AST we visit
would go
in a common base class that would be used for both `ResultObjectVisitor`
and
`getReferencedDecls()`.
I'd like to focus this PR, however, on a targeted fix for the current
crash and
postpone the refactoring to a later PR (which will be easier to revert
if there
are unintended side-effects).
[^1]: As an added bonus, this would make the code better structured and
more
efficient than the current sequence of `if (dyn_cast<T>(...))`
statements).
`ConstantExpr` does not appear as a `CFGStmt` in the CFG, so
`StmtToEnvMap::getEnvironment()` was not finding an entry for it in the
map,
causing a crash when we tried to access the iterator resulting from the
map
lookup.
The fix is to make `ignoreCFGOmittedNodes()` ignore `ConstantExpr`, but
in
addition, I'm hardening `StmtToEnvMap::getEnvironment()` to make sure
release
builds don't crash in similar situations in the future.
We used to crash if the previous iteration contained a `BoolValue` and
the
current iteration contained an `IntegerValue`. The accompanying test
sets up
this situation -- see comments there for details.
While I'm here, clean up the tests for integral casts to use the test
helpers we
have available now. I was looking at these tests to understand how we
handle
integral casts, and the test helpers make the tests easier to read.
I reverted https://github.com/llvm/llvm-project/pull/89213 beause it was
causing buildbots to fail with assertion failures.
Embarrassingly, it turns out I had been running tests locally in
`Release` mode, i.e. with `assert()` compiled away.
This PR re-lands #89213 with fixes for the failing assertions.
This class no longer serves any purpose; see also the discussion here:
https://reviews.llvm.org/D155204#inline-1503204
A lot of existing tests in TransferTest.cpp check for the existence of
`RecordValue`s. Some of these checks are now simply redundant and have
been
removed. In other cases, tests were checking for the existence of a
`RecordValue` as a way of testing whether a record has been initialized.
I have
typically changed these test to instead check whether a field of the
record has
a value.
Previously, we were propagating storage locations the other way around,
i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This
gave
the wrong behavior in some cases -- see the newly added or fixed tests
in this
patch.
In addition, this patch now unblocks removing the `RecordValue` class
entirely,
as we no longer need `RecordValue::getLoc()`.
With this patch, the test `TransferTest.DifferentReferenceLocInJoin`
started to
fail because the framework now always uses the same storge location for
a
`MaterializeTemporaryExpr`, meaning that the code under test no longer
set up
the desired state where a variable of reference type is mapped to two
different
storage locations in environments being joined. Rather than trying to
modify
this test to set up the test condition again, I have chosen to replace
the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the
test
condition directly; because this test is more direct, it will also be
less
brittle in the face of future changes.
This is a relatively rare case, but
- It's still nice to get this right,
- We can remove the special case for this in
`VisitCXXOperatorCallExpr()` (that
simply bails out), and
- With this in place, I can avoid having to add a similar special case
in an
upcoming patch.
I'm making some changes to `Environment::getResultObjectLocation()`,
with the
ultimate goal of eliminating `RecordValue` entirely, and I'd like to
make sure
I don't break this behavior (and I've realized we don't have a test for
it yet).
This fixes a crash introduced by
https://github.com/llvm/llvm-project/pull/82348
but also adds additional handling to make sure that we treat empty
initializer
lists for both unions and structs/classes correctly (see tests added in
this
patch).
Reverts llvm/llvm-project#82348, which caused crashes when analyzing
empty InitListExprs for unions, e.g.
```cc
union U {
double double_value;
int int_value;
};
void target() {
U value;
value = {};
}
```
Co-authored-by: Samira Bazuzi <bazuzi@users.noreply.github.com>
Although in a normal implementation the assumption is reasonable, it
seems that some esoteric implementation are not returning a T&. This
should be handled correctly and the values be propagated.
---------
Co-authored-by: martinboehme <mboehme@google.com>
When calling `Environment::getResultObjectLocation` with a
CXXOperatorCallExpr that is a prvalue, we just hit an assert because no
record was ever created.
---------
Co-authored-by: martinboehme <mboehme@google.com>
In particular, it's important that we create the "fallback" atomic at
this point
(which we produce if the transfer function didn't produce a value for
the
expression) so that it is placed in the correct environment.
Previously, we processed the terminator condition in the
`TerminatorVisitor`,
which put the fallback atomic in a copy of the environment that is
produced as
input for the _successor_ block, rather than the environment for the
block
containing the expression for which we produce the fallback atomic.
As a result, we produce different fallback atomics every time we process
the
successor block, and hence we don't have a consistent representation of
the
terminator condition in the flow condition.
This patch includes a test (authored by ymand@) that fails without the
fix.
Previously, we hard-coded the cap on block visits inside the framework.
This
patch enables the caller to specify the cap in the APIs for running an
analysis.
The CFG doesn't contain a CFGElement for the
`CXXDefaultInitExpr::getInit()`, so
it makes sense to consider the `CXXDefaultInitExpr` to be the expression
that
originally constructs the object.
In particular, it's important that we create the "fallback" atomic at
this point
(which we produce if the transfer function didn't produce a value for
the
expression) so that it is placed in the correct environment.
Previously, we processed the terminator condition in the
`TerminatorVisitor`,
which put the fallback atomic in a copy of the environment that is
produced as
input for the _successor_ block, rather than the environment for the
block
containing the expression for which we produce the fallback atomic.
As a result, we produce different fallback atomics every time we process
the
successor block, and hence we don't have a consistent representation of
the
terminator condition in the flow condition.
This patch includes a test (authored by ymand@) that fails without the
fix.
So far, if there was a chain of record type prvalues,
`getResultObjectLocation()` would assign a different result object
location to
each one. This makes no sense, of course, as all of these prvalues end
up
initializing the same result object.
This patch fixes this by propagating storage locations up through the
entire
chain of prvalues.
The new implementation also has the desirable effect of making it
possible to
make `getResultObjectLocation()` const, which seems appropriate given
that,
logically, it is just an accessor.
Before we widen to top, we now check if both values can be proved either
true or
false in their respective environments; if so, widening returns a true
or false
literal. The idea is that we avoid losing information if posssible.
This patch includes a test that fails without this change to widening.
This change does mean that we call the SAT solver in more places, but
this seems
acceptable given the additional precision we gain.
In tests on an internal codebase, the number of SAT solver timeouts we
observe
with Crubit's nullability checker does increase by about 25%. They can
be
brought back to the previous level by doubling the SAT solver work
limit.
The code assumed that the source parameter of an assignment operator is
always
passed by reference, but it is legal for it to be passed by value.
This patch includes a test that assert-fails without the fix.