This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
For lambdas that are converted to C function pointers,
```
int (*ret_zero)() = []() { return 0; };
```
clang will generate conversion method like:
```
CXXConversionDecl implicit used constexpr operator int (*)() 'int (*() const noexcept)()' inline
-CompoundStmt
-ReturnStmt
-ImplicitCastExpr 'int (*)()' <FunctionToPointerDecay>
-DeclRefExpr 'int ()' lvalue CXXMethod 0x5ddb6fe35b18 '__invoke' 'int ()'
-CXXMethodDecl implicit used __invoke 'int ()' static inline
-CompoundStmt (empty)
```
Based on comment in Sema, `__invoke`'s function body is left empty
because it's will be filled in CodeGen, so in AST analysis phase we
should get lambda's `operator()` directly instead of calling `__invoke`
itself.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Closes#57270.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
There is a caveat at loop widening, where the correct location is the
CFG terminator (which is not an element and does not have a ref). In
this case, the first element in the block is passed as a location.
Previous PR #128251, Reverted at #137304.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
Closes#57270
I noticed recently that this code (that I wrote xD) uses the
`getRuntimeDefinition()` which isn't quite necessary for the simple task
this function was designed for.
Why would it be better not using this API here?
I'm experimenting with improving how virtual functions are inlined,
where depending on our ability of deducing the dynamic type of the
object we may end up with inaccurate type information. Such inaccuracy
would mean that we may have multiple runtime definitions. After that,
this code would become ambiguous.
To resolve this, I decided to refactor this and use a simpler - but
equivalent approach.
When instantiating "callable<T>", the "class CallableType" nested type
will only have a declaration in the copy for the instantiation - because
it's not refereed to directly by any other code that would need a
complete definition.
However, in the past, when conservative eval calling member function, we
took the static type of the "this" expr, and looked up the CXXRecordDecl
it refereed to to see if it has any mutable members (to decide if it
needs to refine invalidation or not). Unfortunately, that query needs a
definition, and it asserts otherwise, thus we crashed.
To fix this, we should consult the dynamic type of the object, because
that will have the definition.
I anyways added a check for "hasDefinition" just to be on the safe side.
Fixes#77378
If a mutex interface is split in inheritance chain, e.g. struct mutex
has `unlock` and inherits `lock` from __mutex_base then calls m.lock()
and m.unlock() have different "this" targets: m and the __mutex_base of
m, which used to confuse the `ActiveCritSections` list.
Taking base region canonicalizes the region used to identify a critical
section and enables search in ActiveCritSections list regardless of
which class the callee is the member of.
This likely fixes#104241
CPP-5541
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
As my BSc thesis I've implemented a checker for std::variant and
std::any, and in the following weeks I'll upload a revised version of
them here.
# Prelude
@Szelethus and I sent out an email with our initial plans here:
https://discourse.llvm.org/t/analyzer-new-checker-for-std-any-as-a-bsc-thesis/65613/2
We also created a stub checker patch here:
https://reviews.llvm.org/D142354.
Upon the recommendation of @haoNoQ , we explored an option where instead
of writing a checker, we tried to improve on how the analyzer natively
inlined the methods of std::variant and std::any. Our attempt is in this
patch https://reviews.llvm.org/D145069, but in a nutshell, this is what
happened: The analyzer was able to model much of what happened inside
those classes, but our false positive suppression machinery erroneously
suppressed it. After months of trying, we could not find a satisfying
enhancement on the heuristic without introducing an allowlist/denylist
of which functions to not suppress.
As a result (and partly on the encouragement of @Xazax-hun) I wrote a
dedicated checker!
The advantage of the checker is that it is not dependent on the
standard's implementation and won't put warnings in the standard library
definitions. Also without the checker it would be difficult to create
nice user-friendly warnings and NoteTags -- as per the standard's
specification, the analysis is sinked by an exception, which we don't
model well now.
# Design ideas
The working of the checker is straightforward: We find the creation of
an std::variant instance, store the type of the variable we want to
store in it, then save this type for the instance. When retrieving type
from the instance we check what type we want to retrieve as, and compare
it to the actual type. If the two don't march we emit an error.
Distinguishing variants by instance (e.g. MemRegion *) is not the most
optimal way. Other checkers, like MallocChecker uses a symbol-to-trait
map instead of region-to-trait. The upside of using symbols (which would
be the value of a variant, not the variant itself itself) is that the
analyzer would take care of modeling copies, moves, invalidation, etc,
out of the box. The problem is that for compound types, the analyzer
doesn't create a symbol as a result of a constructor call that is fit
for this job. MallocChecker in contrast manipulates simple pointers.
My colleges and I considered the option of making adjustments directly
to the memory model of the analyzer, but for the time being decided
against it, and go with the bit more cumbersome, but immediately viable
option of simply using MemRegions.
# Current state and review plan
This patch contains an already working checker that can find and report
certain variant/any misuses, but still lands it in alpha. I plan to
upload the rest of the checker in later patches.
The full checker is also able to "follow" the symbolic value held by the
std::variant and updates the program state whenever we assign the value
stored in the variant. I have also built a library that is meant to
model union-like types similar to variant, hence some functions being a
bit more multipurpose then is immediately needed.
I also intend to publish my std::any checker in a later commit.
---------
Co-authored-by: Gabor Spaits <gabor.spaits@ericsson.com>
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Workaround the case when the `this` pointer is actually a `NonLoc`, by
returning `Unknown` instead.
The solution isn't ideal, as `this` should be really a `Loc`, but due to
how casts work, I feel this is our easiest and best option.
As this patch presents, I'm evaluating a cast to transform the `NonLoc`.
However, given that `evalCast()` can't be cast from `NonLoc` to a
pointer type thingy (`Loc`), we end up with `Unknown`.
It is because `EvalCastVisitor::VisitNonLocSymbolVal()` only evaluates
casts that happen from NonLoc to NonLocs.
When I tried to actually implement that case, I figured:
1) Create a `SymbolicRegion` from that `nonloc::SymbolVal`; but
`SymbolRegion` ctor expects a pointer type for the symbol.
2) Okay, just have a `SymbolCast`, getting us the pointer type; but
`SymbolRegion` expects `SymbolData` symbols, not generic `SymExpr`s, as
stated:
> // Because pointer arithmetic is represented by ElementRegion layers,
> // the base symbol here should not contain any arithmetic.
3) We can't use `ElementRegion`s to perform this cast because to have an
`ElementRegion`, you already have to have a `SubRegion` that you want to
cast, but the point is that we don't have that.
At this point, I gave up, and just left a FIXME instead, while still
returning `Unknown` on that path.
IMO this is still better than having a crash.
Fixes#69922
I actually visited each link and added relevant context directly to the code.
This is related to the effort to eliminate internal bug tracker links
(d618f1c, e0ac46e).
Test files still have a lot of rdar links and ids in them.
I haven't touched them yet.
This patch adds CFGElementRef to ProgramPoints
and helps the analyzer to differentiate between
two otherwise identically looking ProgramPoints.
Fixes#60412
Differential Revision: https://reviews.llvm.org/D143328
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
It turns out we can reach the `Init.castAs<nonlock::CompoundVal>()`
expression with other kinds of SVals. Such as by `nonloc::ConcreteInt`
in this example: https://godbolt.org/z/s4fdxrcs9
```lang=C++
int buffer[10];
void b();
void top() {
b(&buffer);
}
void b(int *c) {
*c = 42; // would crash
}
```
In this example, we try to store `42` to the `Elem{buffer, 0}`.
This situation can appear if the CallExpr refers to a function
declaration without prototype. In such cases, the engine will pick the
redecl of the referred function decl which has function body, hence has
a function prototype.
This weird situation will have an interesting effect to the AST, such as
the argument at the callsite will miss a cast, which would cast the
`int (*)[10]` expression into `int *`, which means that when we evaluate
the `*c = 42` expression, we want to bind `42` to an array, causing the
crash.
Look at the AST of the callsite with and without the function prototype:
https://godbolt.org/z/Gncebcbdb
The only difference is that without the proper function prototype, we
will not have the `ImplicitCastExpr` `BitCasting` from `int (*)[10]`
to `int *` to match the expected type of the parameter declaration.
In this patch, I'm proposing to emit a cast in the mentioned edge-case,
to bind the argument value of the expected type to the parameter.
I'm only proposing this if the runtime definition has exactly the same
number of parameters as the callsite feeds it by arguments.
If that's not the case, I believe, we are better off by binding `Unknown`
to those parameters.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D136162
In case when the prvalue is returned from the function (kind is one
of `SimpleReturnedValueKind`, `CXX17ElidedCopyReturnedValueKind`),
then it construction happens in context of the caller.
We pass `BldrCtx` explicitly, as `currBldrCtx` will always refer to callee
context.
In the following example:
```
struct Result {int value; };
Result create() { return Result{10}; }
int accessValue(Result r) { return r.value; }
void test() {
for (int i = 0; i < 2; ++i)
accessValue(create());
}
```
In case when the returned object was constructed directly into the
argument to a function call `accessValue(create())`, this led to
inappropriate value of `blockCount` being used to locate parameter region,
and as a consequence resulting object (from `create()`) was constructed
into a different region, that was later read by inlined invocation of
outer function (`accessValue`).
This manifests itself only in case when calling block is visited more
than once (loop in above example), as otherwise there is no difference
in `blockCount` value between callee and caller context.
This happens only in case when copy elision is disabled (before C++17).
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D132030
This new CTU implementation is the natural extension of the normal single TU
analysis. The approach consists of two analysis phases. During the first phase,
we do a normal single TU analysis. During this phase, if we find a foreign
function (that could be inlined from another TU) then we don’t inline that
immediately, we rather mark that to be analysed later.
When the first phase is finished then we start the second phase, the CTU phase.
In this phase, we continue the analysis from that point (exploded node)
which had been enqueued during the first phase. We gradually extend the
exploded graph of the single TU analysis with the new node that was
created by the inlining of the foreign function.
We count the number of analysis steps of the first phase and we limit the
second (ctu) phase with this number.
This new implementation makes it convenient for the users to run the
single-TU and the CTU analysis in one go, they don't need to run the two
analysis separately. Thus, we name this new implementation as "onego" CTU.
Discussion:
https://discourse.llvm.org/t/rfc-much-faster-cross-translation-unit-ctu-analysis-implementation/61728
Differential Revision: https://reviews.llvm.org/D123773
This patch restores the symmetry between how operator new and operator delete
are handled by also inlining the content of operator delete when possible.
Patch by Fred Tingaud.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D124845
Summary: Refactor return value of `StoreManager::attemptDownCast` function by removing the last parameter `bool &Failed` and replace the return value `SVal` with `Optional<SVal>`. Make the function consistent with the family of `evalDerivedToBase` by renaming it to `evalBaseToDerived`. Aligned the code on the call side with these changes.
Differential Revision: https://reviews.llvm.org/
This patch replaces each use of the previous API with the new one.
In variadic cases, it will use the ADL `matchesAny(Call, CDs...)`
variadic function.
Also simplifies some code involving such operations.
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D113591
This patch introduces `CallDescription::matches()` member function,
accepting a `CallEvent`.
Semantically, `Call.isCalled(CD)` is the same as `CD.matches(Call)`.
The patch also introduces the `matchesAny()` variadic free function template.
It accepts a `CallEvent` and at least one `CallDescription` to match
against.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D113590
`CallDescriptions` deserve its own translation unit.
This patch simply moves the corresponding parts.
Also includes the `CallDescription.h` where it's necessary.
Reviewed By: martong, xazax.hun, Szelethus
Differential Revision: https://reviews.llvm.org/D113587
Previously, if accidentally multiple checkers `eval::Call`-ed the same
`CallEvent`, in debug builds the analyzer detected this and crashed
with the message stating this. Unfortunately, the message did not state
the offending checkers violating this invariant.
This revision addresses this by printing a more descriptive message
before aborting.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D112889
It turns out llvm::isa<> is variadic, and we could have used this at a
lot of places.
The following patterns:
x && isa<T1>(x) || isa<T2>(x) ...
Will be replaced by:
isa_and_non_null<T1, T2, ...>(x)
Sometimes it caused further simplifications, when it would cause even
more code smell.
Aside from this, keep in mind that within `assert()` or any macro
functions, we need to wrap the isa<> expression within a parenthesis,
due to the parsing of the comma.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111982
Fallback to stringification and string comparison if we cannot compare
the `IdentifierInfo`s, which is the case for C++ overloaded operators,
constructors, destructors, etc.
Examples:
{ "std", "basic_string", "basic_string", 2} // match the 2 param std::string constructor
{ "std", "basic_string", "~basic_string" } // match the std::string destructor
{ "aaa", "bbb", "operator int" } // matches the struct bbb conversion operator to int
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111535
Refactor the code to make it more readable.
It will set up further changes, and improvements to this code in
subsequent patches.
This is a non-functional change.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111534
'(self.prop)' produces a surprising AST where ParenExpr
resides inside `PseudoObjectExpr.
This breaks ObjCMethodCall::getMessageKind() which in turn causes us
to perform unnecessary dynamic dispatch bifurcation when evaluating
body-farmed property accessors, which in turn causes us
to explore infeasible paths.
This cleanup patch refactors a bunch of functional duplicates of
getDecltypeForParenthesizedExpr into a common implementation.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: aaronpuchert
Differential Revision: https://reviews.llvm.org/D100713
This renames the expression value categories from rvalue to prvalue,
keeping nomenclature consistent with C++11 onwards.
C++ has the most complicated taxonomy here, and every other language
only uses a subset of it, so it's less confusing to use the C++ names
consistently, and mentally remap to the C names when working on that
context (prvalue -> rvalue, no xvalues, etc).
Renames:
* VK_RValue -> VK_PRValue
* Expr::isRValue -> Expr::isPRValue
* SK_QualificationConversionRValue -> SK_QualificationConversionPRValue
* JSON AST Dumper Expression nodes value category: "rvalue" -> "prvalue"
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D103720
This change groups
* Rename: `ignoreParenBaseCasts` -> `IgnoreParenBaseCasts` for uniformity
* Rename: `IgnoreConversionOperator` -> `IgnoreConversionOperatorSingleStep` for uniformity
* Inline `IgnoreNoopCastsSingleStep` into a lambda inside `IgnoreNoopCasts`
* Refactor `IgnoreUnlessSpelledInSource` to make adequate use of `IgnoreExprNodes`
Differential Revision: https://reviews.llvm.org/D86880
Pass EvalCallOptions via runCheckersForEvalCall into defaultEvalCall.
Update the AnalysisOrderChecker to support evalCall for testing.
Differential Revision: https://reviews.llvm.org/D82256