`StdLibraryFunctionsChecker` contained the following condition for
`getcwd`:
```
.Case({NotNull(0),
ArgumentCondition(1, WithinRange, Range(1, SizeMax)),
ReturnValueCondition(BO_EQ, ArgNo(0))},
ErrnoMustNotBeChecked, GenericSuccessMsg)
```
In this case argument 1 should be not zero and return value is set to be
equal to argument 1. This would mean that return value is implicitly not
zero. But for unknown reason (probably analyzer inaccuracy) it can occur
that the return value is still assumable to be zero after this condition
was applied. This results in false positive if `ErrnoChecker` is enabled
because when the return value is 0 value of `errno` should be allowed to
be read but in this case it is not.
The bug is fixed by adding an extra (theoretically redundant) condition
for the return value to be non-zero.
Fixes#175136
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
The checks for the 'z' and 't' format specifiers added in the original
PR #143653 had some issues and were overly strict, causing some build
failures and were consequently reverted at
4c85bf2fe8.
In the latest commit
27c58629ec,
I relaxed the checks for the 'z' and 't' format specifiers, so warnings
are now only issued when they are used with mismatched types.
The original intent of these checks was to diagnose code that assumes
the underlying type of `size_t` is `unsigned` or `unsigned long`, for
example:
```c
printf("%zu", 1ul); // Not portable, but not an error when size_t is unsigned long
```
However, it produced a significant number of false positives. This was
partly because Clang does not treat the `typedef` `size_t` and
`__size_t` as having a common "sugar" type, and partly because a large
amount of existing code either assumes `unsigned` (or `unsigned long`)
is `size_t`, or they define the equivalent of size_t in their own way
(such as
sanitizer_internal_defs.h).2e67dcfdcd/compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h (L203)
Including the results of `sizeof`, `sizeof...`, `__datasizeof`,
`__alignof`, `_Alignof`, `alignof`, `_Countof`, `size_t` literals, and
signed `size_t` literals, the results of pointer-pointer subtraction and
checks for standard library functions (and their calls).
The goal is to enable clang and downstream tools such as clangd and
clang-tidy to provide more portable hints and diagnostics.
The previous discussion can be found at #136542.
This PR implements this feature by introducing a new subtype of `Type`
called `PredefinedSugarType`, which was considered appropriate in
discussions. I tried to keep `PredefinedSugarType` simple enough yet not
limited to `size_t` and `ptrdiff_t` so that it can be used for other
purposes. `PredefinedSugarType` wraps a canonical `Type` and provides a
name, conceptually similar to a compiler internal `TypedefType` but
without depending on a `TypedefDecl` or a source file.
Additionally, checks for the `z` and `t` format specifiers in format
strings for `scanf` and `printf` were added. It will precisely match
expressions using `typedef`s or built-in expressions.
The affected tests indicates that it works very well.
Several code require that `SizeType` is canonical, so I kept `SizeType`
to its canonical form.
The failed tests in CI are allowed to fail. See the
[comment](https://github.com/llvm/llvm-project/pull/135386#issuecomment-3049426611)
in another PR #135386.
Closes#57270.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
There is a caveat at loop widening, where the correct location is the
CFG terminator (which is not an element and does not have a ref). In
this case, the first element in the block is passed as a location.
Previous PR #128251, Reverted at #137304.
Per suggestion in
https://github.com/llvm/llvm-project/pull/128251#discussion_r2055916229,
adding a new helper function in `SValBuilder` to conjure a symbol when
given a `CallEvent`.
Tested manually (with assertions) that the `LocationContext *` obtained
from the `CallEvent` are identical to those passed in the original
argument.
This PR changes the `Stmt *` field in `SymbolConjured` with
`CFGBlock::ConstCFGElementRef`. The motivation is that, when conjuring a
symbol, there might not always be a statement available, causing
information to be lost for conjured symbols, whereas the CFGElementRef
can always be provided at the callsite.
Following the idea, this PR changes callsites of functions to create
conjured symbols, and replaces them with appropriate `CFGElementRef`s.
Closes#57270
`NotNullConstraint` is used to check both null and non-null of a pointer.
So the name, which was created originally for just checking non-nullness, becomes less suitable.
The same reason applies to `NotNullBufferConstraint`. This commit renames them.
In addition, messages of the assertions in `describe` and ` describeArgumentValue`
are updated to indicate that these two functions can be called on any constraint
though they were partially implemented for `NotNullConstraint` & `NotNullBufferConstraint`.
For the unix function
`int setsockopt(int, int, int, const void *, socklen_t);`, the last two
parameters represent a buffer and a size.
In case the size is zero, buffer can be null. Previously, the hard-coded
pre-condition requires the buffer to never be null, which can cause
false positives.
(rdar://146678142)
One could create dangling APSInt references in various ways in the past, that were sometimes assumed to be persisted in the BasicValueFactor.
One should always use BasicValueFactory to create persistent APSInts, that could be used by ConcreteInts or SymIntExprs and similar long-living objects.
If one used a temporary or local variables for this, these would dangle.
To enforce the contract of the analyzer BasicValueFactory and the uses of APSInts, let's have a dedicated strong-type for this.
The idea is that APSIntPtr is always owned by the BasicValueFactory, and that is the only component that can construct it.
These PRs are all NFC - besides fixing dangling APSInt references.
This could deduce some complex syms derived from simple ones whose
values could be constrainted to be concrete during execution, thus
reducing some overconstrainted states.
This commit also fix `unix.StdCLibraryFunctions` crash due to these
overconstrainted states being added to the graph, which is marked as
sink node (PosteriorlyOverconstrained). The 'assume' API is used in
non-dual style so the checker should protectively test whether these
newly added nodes are actually impossible.
1. The crash: https://godbolt.org/z/8KKWeKb86
2. The solver needs to solve equivalent: https://godbolt.org/z/ed8WqsbTh
Change formatv() to validate that the number of arguments passed matches
number of replacement fields in the format string, and that the replacement
indices do not contain holes.
To support cases where this cannot be guaranteed, introduce a formatv()
overload that allows disabling validation with a bool flag as its first argument.
Function 'fileno' fails only if invalid pointer is passed, this is a
case that is often ignored in source code. The failure case leads to
many "false positive" reports when `fileno` returns -1 and this is not
checked in the program. Because this, the function is now assumed
to not fail (this is assumption that the passed file pointer is correct).
The change affects `StdCLibraryFunctionsChecker` and
`StreamChecker`.
Some stream functions were recently added to `StreamChecker` that were
not modeled by `StdCLibraryFunctionsChecker`. To ensure consistency
these functions are added to the other checker too.
Some of the related tests are re-organized.
Cleanup most of the lazy-init `BugType` legacy.
Some will be preserved, as those are slightly more complicated to
refactor.
Notice, that the default category for `BugType` is `LogicError`. I
omitted setting this explicitly where I could.
Please, actually have a look at the diff. I did this manually, and we
rarely check the bug type descriptions and stuff in tests, so the
testing might be shallow on this one.
This patch fixes:
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:609:23:
error: 'describe' overrides a member function but is not marked
'override' [-Werror,-Winconsistent-missing-override]
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:627:23:
error: 'describe' overrides a member function but is not marked
'override' [-Werror,-Winconsistent-missing-override]
The checker now displays one combined note tag for errno-related and
"case"-related notes. Previous functions in the errno-modeling part that
were used for construction of note tags are removed. The note tag added
by StdLibraryFunctionsChecker contains the code to display the note tag
for 'errno' (this was done previously by these removed functions).
Recent changes in StdLibraryFunctionsChecker introduced a situation
where the checker sequentially performed two state transitions to add
two separate note tags.
In the unlikely case when the updated state (the variable `NewState`)
was posteriorly overconstrained, the engine marked the node after the
first state transition as a sink to stop the "natural" graph exploration
after that point.
However, in this particular case the checker tried to directly add a
second node, and this triggered an assertion in the `addPredecessor()`
method of `ExplodedNode`.
This commit introduces an explicit `isSink()` check to avoid this crash.
To avoid similar bugs in the future, perhaps it would be possible to
tweak `addTransition()` and ensure that it returns `nullptr` when it
would return a sink node (to unify the two possible error conditions).
This crash was observed in an analysis of the curl project (in a very
long and complex function), and there I validated that this is the root
cause, but I don't have a self-contained testcase that can trigger the
creation of a PosteriorlyOverconstrained node in this situation.
The modeling of send, recv, sendmsg, recvmsg, sendto, recvfrom is changed:
These functions do not return 0, except if the message length is 0.
(In sendmsg, recvmsg the length is not checkable but it is more likely
that a message with 0 length is invalid for these functions.)
Reviewed By: donat.nagy
Differential Revision: https://reviews.llvm.org/D155715
Success or failure messages are now shown at all checked functions, if the call
(return value) is interesting.
Additionally new functions are added: open, openat, socket, shutdown
Reviewed By: donat.nagy
Differential Revision: https://reviews.llvm.org/D154423
The note tag that was previously added in all cases when a standard function call
is found is displayed now only if the function call (return value) is "interesting".
This results in less unneeded notes but some of the previously good notes disappear
too. This is because interestingness is not always set as it should be.
Reviewed By: donat.nagy
Differential Revision: https://reviews.llvm.org/D153776
Change 1: ErrnoChecker notes show only messages related to errno,
not to assumption of success or failure of functions.
Change 2: StdLibraryFunctionsChecker adds its own note about success
or failure of functions, and the errno related note, independently.
Change 3: Every modeled function in StdLibraryFunctionsChecker
should have a note tag message in all "cases". This is not implemented yet,
only for file (stream) related functions.
Reviewed By: donat.nagy
Differential Revision: https://reviews.llvm.org/D153612
I'm involved with the Static Analyzer for the most part.
I think we should embrace newer language standard features and gradually
move forward.
Differential Revision: https://reviews.llvm.org/D154325
Main reason for this change is that these checkers were implemented in the same class
but had different dependency ordering. (NonNullParamChecker should run before StdCLibraryFunctionArgs
to get more special warning about null arguments, but the apiModeling.StdCLibraryFunctions was a modeling
checker that should run before other non-modeling checkers. The modeling checker changes state in a way
that makes it impossible to detect a null argument by NonNullParamChecker.)
To make it more simple, the modeling part is removed as separate checker and can be only used if
checker StdCLibraryFunctions is turned on, that produces the warnings too. Modeling the functions
without bug detection (for invalid argument) is not possible. The modeling of standard functions
does not happen by default from this change on.
Reviewed By: Szelethus
Differential Revision: https://reviews.llvm.org/D151225
If a wrong (too small) buffer argument is found, the dynamic buffer size and
values of connected arguments are displayed in the warning message, if
these are simple known integer values.
Reviewed By: Szelethus
Differential Revision: https://reviews.llvm.org/D149321
Some file and directory related functions have an integer file descriptor argument
that can be a valid file descriptor or a special value AT_FDCWD. This value is
relatively often used in open source projects and is usually defined as a negative
number, and the checker reports false warnings (a valid file descriptor is not
negative) if this fix is not included.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D149160