This commit implements the entirety of the now-accepted [N3017 -
Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule. In general, when the
system cannot detect that it is initializing an array object in a variable
declaration, it generates an EmbedExpr AST node, which is expanded by
AST consumers (CodeGen or constant expression evaluators), or it expands
the embed directive as a comma expression.
This reverts commit 682d461d5a.
---------
Co-authored-by: The Phantom Derpstorm <phdofthehouse@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520
As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in
at most 1 second.
The heuristic should have only really marginal observable impact -
except for the cases when we had big report eqclasses with long-running
(15s) Z3 queries, where previously CSA effectively halted.
After this patch, CSA will tackle such extreme cases as well.
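The effect of the two limits can be illustrated with a small standalone model. This is only a sketch and not the analyzer's code: the names `processEqClass`, `RefutationStats`, `PerQueryTimeout` and `PerEqClassBudget` are hypothetical, and clamping each query to the remaining budget is an assumption made to keep the model simple.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Millis = std::chrono::milliseconds;

// Hypothetical constants mirroring the numbers quoted in this message.
constexpr Millis PerQueryTimeout{300};
constexpr Millis PerEqClassBudget{1000};

struct RefutationStats {
  std::size_t QueriesRun = 0;    // queries that were actually issued
  std::size_t QueriesCutOff = 0; // queries stopped before they finished
  Millis TimeSpent{0};           // total time charged to the eqclass
};

// WouldTake[i] is how long query i would run if nothing stopped it.
RefutationStats processEqClass(const std::vector<Millis> &WouldTake) {
  RefutationStats Stats;
  for (Millis Needed : WouldTake) {
    if (Stats.TimeSpent >= PerEqClassBudget)
      break; // budget exhausted: stop issuing queries for this eqclass
    // Assumption of this model: a query never outlives the remaining budget.
    Millis Remaining = PerEqClassBudget - Stats.TimeSpent;
    Millis Allowed = std::min(PerQueryTimeout, Remaining);
    Millis Spent = std::min(Needed, Allowed);
    if (Needed > Allowed)
      ++Stats.QueriesCutOff;
    ++Stats.QueriesRun;
    Stats.TimeSpent += Spent;
  }
  assert(Stats.TimeSpent <= PerEqClassBudget);
  return Stats;
}
```

With five queries that would each take 15s, this model issues four of them and charges exactly one second to the equivalence class.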
Reviewers: NagyDonat, haoNoQ, Xazax-hun, Szelethus, mikhailramalho
Reviewed By: NagyDonat
Pull Request: https://github.com/llvm/llvm-project/pull/95129
This change enables more accurate modeling of the write effects of
`fread`. In particular, instead of invalidating the whole buffer, on a
best-effort basis, we try to invalidate only the actually accessed
elements of the buffer. This preserves the previous values of the
unaffected slots of the buffer. As a result, the analyzer can diagnose
more uninitialized buffer uses, for example.
Currently, this refined invalidation only triggers for `fread` if and
only if the `count` parameter and the buffer pointer's index component
are concrete or perfectly-constrained symbols.
Additionally, if the `fread` would read more than 64 elements, the whole
buffer is invalidated as before. This is to have safeguards against
performance issues.
Refer to the comments of the assertions in the following example to see
the changes in the diagnostics:
```c++
void demo() {
  FILE *fp = fopen("/home/test", "rb+");
  if (!fp) return;
  int buffer[10]; // uninitialized
  int read_items = fread(buffer+1, sizeof(int), 5, fp);
  if (5 == read_items) {
    int v1 = buffer[1]; // Unknown value but not garbage.
    clang_analyzer_isTainted(v1); // expected-warning {{YES}} <-- Would be "NO" without this patch.
    clang_analyzer_dump(v1); // expected-warning {{conj_}} <-- Not a "derived" symbol, so it's directly invalidated now.
    int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before.
    (void)(v1 + v0);
  } else {
    // If 'fread' had an error.
    int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before.
    (void)v0;
  }
  fclose(fp);
}
```
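The invalidation rule described above can also be pictured with a toy model that runs without the analyzer. This is a sketch under assumptions: `Slot`, `modelFread` and `MaxElementsToInvalidate` are hypothetical names, and a per-element state vector stands in for the analyzer's store.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Per-element state of the modeled buffer.
enum class Slot { Uninitialized, Unknown };

// Threshold quoted in this message: larger reads invalidate everything.
constexpr std::size_t MaxElementsToInvalidate = 64;

// Models the write effect of reading Count elements starting at index Start.
void modelFread(std::vector<Slot> &Buffer, std::size_t Start,
                std::size_t Count) {
  if (Count > MaxElementsToInvalidate) {
    // Fallback: invalidate the whole buffer, as before the patch.
    for (Slot &S : Buffer)
      S = Slot::Unknown;
    return;
  }
  // Refined path: only the possibly written elements become unknown.
  for (std::size_t I = Start; I < Buffer.size() && I < Start + Count; ++I)
    Buffer[I] = Slot::Unknown;
}
```

For a 10-element buffer and `modelFread(Buffer, 1, 5)`, elements 1 to 5 become unknown while element 0 stays uninitialized, which matches the diagnostics in the example above.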
CPP-3247, CPP-3802
Co-authored by Marco Borgeaud (marco-antognini-sonarsource)
This commit implements the entirety of the now-accepted [N3017 -
Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule. In general, when the
system cannot detect that it is initializing an array object in a variable
declaration, it generates an EmbedExpr AST node, which is expanded
by AST consumers (CodeGen or constant expression evaluators), or it
expands the embed directive as a comma expression.
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
Co-authored-by: Podchishchaeva, Mariya <mariya.podchishchaeva@intel.com>
This patch implements the 'loop' construct AST, as well as the basic
appertainment rule. Additionally, it sets up the 'parent' compute
construct, which is necessary for codegen/other diagnostics.
A 'loop' can apply to a for or range-for loop, otherwise it has no other
restrictions (though some of its clauses do).
Depends on https://github.com/llvm/llvm-project/pull/92527
Clang now supports the following:
- Extending the lifetime of objects bound to reference members of
aggregates that are created from a default member initializer.
- Rebuilding `CXXDefaultArgExpr` and `CXXDefaultInitExpr` as needed where
called or constructed.
But CFG and ExprEngine need to be updated to address this change.
This PR adds `CXXDefaultArgExpr` and `CXXDefaultInitExpr` to the CFG and
correctly handles these expressions in ExprEngine.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This commit deletes the "simple" constructor of `CallDescription` which
did not require a `CallDescription::Mode` argument and always used the
"wildcard" mode `CDM::Unspecified`.
A few months ago, this vague matching mode was used by many checkers,
which caused bugs like https://github.com/llvm/llvm-project/issues/81597
and https://github.com/llvm/llvm-project/issues/88181. Since then, my
commits improved the available matching modes and ensured that all
checkers explicitly specify the right matching mode.
After those commits, the only remaining references to the "simple"
constructor were some unit tests; this commit updates them to use an
explicitly specified matching mode (often `CDM::SimpleFunc`).
The mode `CDM::Unspecified` was not deleted in this commit because it's
still a reasonable choice in `GenericTaintChecker` and a few unit tests.
Resolves #89264
Values should not be stored in addresses of labels; this patch throws a
fatal error when that happens.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
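The readability argument can be tried out without LLVM headers. The sketch below uses `std::string_view` as a stand-in for `StringRef` (an assumption made so that the snippet is self-contained); `Option`, `isFoo` and `isNotStr` are hypothetical names.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical type whose member plays the role of Long.Expression.
struct Option {
  std::string_view Name;
};

// Reads like the S == "foo" form preferred in this message.
bool isFoo(std::string_view S) { return S == "foo"; }

// Replaces the harder-to-scan !X.equals("str") spelling.
bool isNotStr(const Option &O) { return O.Name != "str"; }
```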
OpenACC is going to need an array sections implementation that is a
simpler, more restrictive version of the OpenMP one.
This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`,
then adds an enum to choose between the two.
This also fixes a couple of 'drive-by' issues that I discovered on the way,
but leaves the OpenACC Sema parts reasonably unimplemented (no semantic
analysis implementation), as that will be a followup patch.
Interestingly, this case has crashed from the very beginning of the
project, at least since clang-3.
As a "fix" I just do the same thing as we do for concrete integers. It
might not be the best we could do, but arguably, it's still better than
crashing.
Fixes #89185
The class `KnownSVal` was a very magical abstract class within the `SVal`
class hierarchy: with a hacky `classof` method it acted as if it was the
common ancestor of the classes `UndefinedSVal` and `DefinedSVal`.
However, it was only used in two `getAs<KnownSVal>()` calls and the
signatures of two methods, which does not "pay for" its weird behavior,
so I created this commit that removes it and replaces its use with more
straightforward solutions.
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in
addition to the "usual" function) with a fairly lenient heuristic.
Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.
This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.
This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).
The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.
After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.
This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.
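A strict name check of this kind can be sketched as follows. This is a standalone illustration and not the code in `CallDescription`: the function `matchesMaybeHardened` is hypothetical, and it only knows the two hardened spellings mentioned in this message.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true if Callee is exactly Name, or one of the two hardened
// spellings "__<Name>_chk" and "__builtin___<Name>_chk".
bool matchesMaybeHardened(std::string_view Callee, std::string_view Name) {
  if (Callee == Name)
    return true;
  const std::string N(Name);
  return Callee == "__" + N + "_chk" ||
         Callee == "__builtin___" + N + "_chk";
}
```

Because the whole name is compared, `__wmemcpy_chk` is not treated as a hardened variant of `memcpy`, while `__memcpy_chk` and `__builtin___memcpy_chk` are.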
After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.
This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.
As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
HLSL constant sized array function parameters do not decay to pointers.
Instead constant sized array types are preserved as unique types for
overload resolution, template instantiation and name mangling.
This implements the change by adding a new `ArrayParameterType` which
represents a non-decaying `ConstantArrayType`. The new type behaves the
same as `ConstantArrayType` except that it does not decay to a pointer.
Values of `ConstantArrayType` in HLSL decay during overload resolution
via a new `HLSLArrayRValue` cast to `ArrayParameterType`.
`ArrayParameterType` values are passed indirectly by-value to functions
in IR generation resulting in callee generated memcpy instructions.
The behavior of HLSL function calls is documented in the [draft language
specification](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf)
under the Expr.Post.Call heading.
Additionally the design of this implementation approach is documented in
[Clang's
documentation](https://clang.llvm.org/docs/HLSL/FunctionCalls.html)
Resolves #70123
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and
4-bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
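The storage layout described above can be sketched with two bit-fields sharing one 64-bit word. This is a simplified illustration, not the actual `ConstantArrayType` members: `PackedArraySize`, `fitsInline`, `pack` and `bitWidth` are hypothetical names, and what happens to values that do not fit is left out.

```cpp
#include <cassert>
#include <cstdint>

// 60 bits for the element count plus 4 bits for the width (in bytes) of
// the integer type the count was originally expressed in.
struct PackedArraySize {
  std::uint64_t Value : 60;
  std::uint64_t SizeWidth : 4;
};

constexpr std::uint64_t MaxInlineValue = (std::uint64_t{1} << 60) - 1;

// True if both fields can be stored without losing information.
bool fitsInline(std::uint64_t Count, unsigned WidthInBytes) {
  return Count <= MaxInlineValue && WidthInBytes < 16;
}

PackedArraySize pack(std::uint64_t Count, unsigned WidthInBytes) {
  assert(fitsInline(Count, WidthInBytes));
  PackedArraySize P;
  P.Value = Count;
  P.SizeWidth = WidthInBytes;
  return P;
}

// Enough information to rebuild an arbitrary-precision integer later:
// the value itself and its bit width.
unsigned bitWidth(const PackedArraySize &P) { return P.SizeWidth * 8u; }
```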
Resolves #85124.
When debugging CSA issues, sometimes it would be useful to have a
dedicated note for the analysis entry point, aka. the function name you
would need to pass as "-analyze-function=XYZ" to reproduce a specific
issue.
One way we use (or will use) this downstream is to provide tooling on
top of creduce to supercharge productivity by automatically reducing
cases on crashes, for example.
This note will be added only if "-analyzer-note-analysis-entry-points" is
set or "analyzer-display-progress" is on.
This additional entry point marker will be the first "note" if enabled,
with the following message: "[debug] analyzing from XYZ". They are
prefixed by "[debug]" to remind the CSA developer that this is only
meant to be visible for them, for debugging purposes.
CPP-5012
This reapplies 80ab8234ac309418637488b97e0a62d8377b2ecf again, after
fixing a name collision warning in the unit tests (see the revert commit
13ccaf9b9d4400bb128b35ff4ac733e4afc3ad1c for details).
In addition to the previously applied changes, this commit also clarifies the
code in MallocChecker that distinguishes POSIX "getline()" and C++ standard
library "std::getline()" (which are two completely different functions). Note
that "std::getline()" was (accidentally) handled correctly even without this
clarification; but it's better to explicitly handle and test this corner case.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
According to POSIX 2018:
1. lineptr, n and stream cannot be NULL.
2. If *n is non-zero, *lineptr must point to a region of at least *n
bytes, or be a NULL pointer.
Additionally, if *lineptr is not NULL, *n must not be undefined.
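A call that satisfies all of these preconditions might look like the sketch below. The helper name `readFirstLine` is hypothetical, and the snippet assumes a POSIX system where `getline` is declared by `<stdio.h>`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h> // ssize_t

// Reads the first line of Stream (including the newline, if any).
// Returns an empty string if getline reports failure or end of file.
std::string readFirstLine(std::FILE *Stream) {
  assert(Stream != nullptr);   // rule 1: stream must not be NULL
  char *Line = nullptr;        // *lineptr may be NULL ...
  std::size_t Capacity = 0;    // ... and *n then holds a defined value
  ssize_t Length = ::getline(&Line, &Capacity, Stream);
  std::string Result;
  if (Length >= 0)
    Result.assign(Line, static_cast<std::size_t>(Length));
  std::free(Line); // release the buffer on both success and failure
  return Result;
}
```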
Fixes https://github.com/llvm/llvm-project/issues/84463
Changes:
- Adapted MemRegion::getDescriptiveName
- Added unittest to check name for a given clang::ento::ElementRegion
- Some format changes due to clang-format
---------
Co-authored-by: Andreas Steinhausen <andreas.steinhausen@concenrio.io>
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.
When a constructor (of which we evaluate the initializer-list) is
analyzed in top-level context, then the `this` pointer will be
represented by a `SymbolicRegion`, (as it should be).
This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.
```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ElementRegion{SymRegion{Ty*, this}, 0, Ty}}".
  Door() : ptr(nullptr) {
    ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```
We (as CSA folks) decided in the past to avoid creating FieldRegions
directly on top of symbolic regions:
f8643a9b31
---
In this patch, I propose to also canonicalize it as in the mentioned
patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0,
Ty}}`
This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting in
between.
This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we
use `PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.
Note that in really rare cases it now suppresses dereference bugs, as
demonstrated in the test. It is because in the past we failed to follow
the region of the PostInitializer inside the StoreSiteFinder visitor -
because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional<PostInitializer> PIP =
        Pred->getLocationAs<PostInitializer>()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
    StoreSite = Pred;
    InitE = PIP->getInitializer()->getInit();
  }
}
```
Notice that the equality check didn't pass for the regions I'm
canonicalizing in this patch.
Given the nature of this change, we would rather upstream this patch.
CPP-4954
StaticAnalyzer didn't check whether the variable is declared in a
`CompoundStmt` under a `SwitchStmt`, which made the static analyzer reach
the root without finding the declaration.
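The source shape in question is a variable declared directly in the body of a `switch`, before any `case` label. The snippet below is only an assumed illustration of that shape (the function `classify` is hypothetical); the exact reproducer is in the linked issue.

```cpp
#include <cassert>

int classify(int Kind) {
  switch (Kind) {
    // Declared in the CompoundStmt under the SwitchStmt. Jumping past a
    // scalar declaration without an initializer is valid C++.
    int Result;
  case 0:
    Result = 10;
    return Result;
  case 1:
    Result = 20;
    return Result;
  default:
    return -1;
  }
}
```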
Fixes #68819
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
This reverts commit e48d5a838f69e0a8e0ae95a8aed1a8809f45465a.
Fails to build on x86-64 w/gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
with the following message:
../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:41:28: error: declaration of ‘std::unique_ptr<clang::ASTUnit> IsCLibraryFunctionTest::ASTUnit’ changes meaning of ‘ASTUnit’ [-fpermissive]
41 | std::unique_ptr<ASTUnit> ASTUnit;
| ^~~~~~~
In file included from ../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:4:
../llvm-project/clang/include/clang/Frontend/ASTUnit.h:89:7: note: ‘ASTUnit’ declared here as ‘class clang::ASTUnit’
89 | class ASTUnit {
| ^~~~~~~
From issue #73088. I changed `NodeBuilderContext` into a class.
Additionally, there were some other mentions of the former being a
struct which I also changed into a class. This is my first time working
with an issue so I will be open to hearing any advice or changes that
need to be done.
Previously, the function `isCLibraryFunction()` and logic relying on it
only accepted functions that are declared directly within a TU (i.e. not
in a namespace or a class). However C++ headers like <cstdlib> declare
many C standard library functions within the namespace `std`, so this
commit ensures that functions within the namespace `std` are also
accepted.
After this commit it will be possible to match functions like `malloc`
or `free` with `CallDescription::Mode::CLibrary`.
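For illustration, this is the kind of user code whose calls are spelled through the `std` namespace. It is a plain C++ sketch with a hypothetical helper name (`roundTrip`) and says nothing about how any particular checker reacts to it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Copies Value through a heap allocation obtained via the std:: spellings
// that <cstdlib> provides. Returns false if the allocation fails.
bool roundTrip(int Value, int &Out) {
  void *Mem = std::malloc(sizeof(int));
  if (!Mem)
    return false;
  std::memcpy(Mem, &Value, sizeof(Value));
  std::memcpy(&Out, Mem, sizeof(Out));
  std::free(Mem);
  return true;
}
```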
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
`getdelim` and `getline` may free, allocate, or re-allocate the input
buffer, ensuring its size is enough to hold the incoming line, the
delimiter, and the null terminator.
`*lineptr` must be a valid argument to `free`, which means it can be
either
1. `NULL`, in which case these functions perform an allocation
equivalent to a call to `malloc` even on failure.
2. A pointer returned by the `malloc` family of functions. Other
pointers are UB (`alloca`, a pointer to a static, to a stack variable, etc.)
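The second case, a buffer that already comes from `malloc` and may be grown by the call, can be sketched like this. The helper `readField` is hypothetical and the snippet assumes a POSIX system that declares `getdelim` in `<stdio.h>`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h> // ssize_t

// Reads up to and including Delim, starting from a small malloc'ed
// buffer that getdelim is allowed to reallocate.
std::string readField(std::FILE *Stream, char Delim) {
  std::size_t Capacity = 4;
  char *Buffer = static_cast<char *>(std::malloc(Capacity));
  if (!Buffer)
    return {};
  ssize_t Length = ::getdelim(&Buffer, &Capacity, Delim, Stream);
  std::string Result;
  if (Length >= 0)
    Result.assign(Buffer, static_cast<std::size_t>(Length));
  // Buffer may no longer be the pointer malloc returned above, so free
  // whatever getdelim left in it.
  std::free(Buffer);
  return Result;
}
```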
This commit adds a testcase which highlights the current incorrect
behavior of the CSA diagnostic generation: it produces a note which says
"Assuming 'arg' is >= 0" in a situation where this is not a fresh
assumption because 'arg' is an unsigned integer.
I also created ticket 78440 to track this bug.
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
because I don't think that there is a natural use case to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
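The idea of restricting a name pattern by a mode can be shown with a toy model that does not depend on Clang. Everything here is hypothetical (`ToyMode`, `ToyCall`, `matches`); the real `CallDescription` matches `CallEvent`s and has its own set of modes.

```cpp
#include <cassert>
#include <string_view>

// Toy counterpart of a matching mode: anything, plain functions only,
// or methods only.
enum class ToyMode { Unspecified, SimpleFunc, Method };

struct ToyCall {
  std::string_view Name;
  bool IsMethod;
};

bool matches(ToyMode Mode, std::string_view Pattern, const ToyCall &Call) {
  if (Call.Name != Pattern)
    return false;
  switch (Mode) {
  case ToyMode::Unspecified:
    return true; // "match anything": functions and methods alike
  case ToyMode::SimpleFunc:
    return !Call.IsMethod;
  case ToyMode::Method:
    return Call.IsMethod;
  }
  return false;
}
```

In this model a pattern for `free` in `ToyMode::SimpleFunc` no longer matches a method that happens to be called `free`, which is the ambiguity described above.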
HLSL supports vector truncation and element conversions as part of
standard conversion sequences. The vector truncation conversion is a C++
second conversion in the conversion sequence. If a vector truncation is
in a conversion sequence an element conversion may occur after it before
the standard C++ third conversion.
Vector element conversions can be boolean conversions, floating point or
integral conversions or promotions.
[HLSL Draft
Specification](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf)
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
The attribute is now allowed on an assortment of declarations, to
suppress warnings related to declarations themselves, or all warnings in
the lexical scope of the declaration.
I don't necessarily see a reason to have a list at all, but it does look
as if some of those more niche items aren't properly supported by the
compiler itself so let's maintain a short safe list for now.
The initial implementation raised a question whether the attribute
should apply to lexical declaration context vs. "actual" declaration
context. I'm using "lexical" here because it results in fewer warnings
suppressed, which is the conservative behavior: we can always expand it
later if we think this is wrong, without breaking any existing code. I
also think that this is the correct behavior that we will probably never
want to change, given that the user typically desires to keep the
suppressions as localized as possible.
'serial', 'parallel', and 'kernel' constructs are all considered
'Compute' constructs. This patch creates the AST type, plus the required
infrastructure for such a type, plus some base types that will be useful
in the future for breaking this up.
The only difference between the three is the 'kind' (plus some minor
clause legalization rules, but those can be differentiated easily
enough), so rather than representing them as separate AST nodes, it
seems to make sense to make them the same.
Additionally, no clause AST functionality is being implemented yet, as
that fits better in a separate patch, and this is enough to get the
'naked' constructs implemented.
This is otherwise an 'NFC' patch, as it doesn't alter execution at all,
so there aren't any tests. I did this to break up the review workload
and to get feedback on the layout.
This is a follow-up for 721dd3bc2 [analyzer] NFC: Don't regenerate
duplicate HTML reports.
Because HTMLRewriter re-runs the Lexer for syntax highlighting and macro
expansion purposes, it may get fairly expensive when the rewriter is
invoked multiple times on the same file. In the static analyzer (which
uses HTMLRewriter for HTML output mode) we only get away with this
because there are usually very few reports emitted per file. But if loud
checkers are enabled, such as `webkit.*`, this may explode in complexity
and even cause the compiler to run over the 32-bit SourceLocation
addressing limit.
This patch caches intermediate results so that re-lexing only needs to
happen once.
As the clever __COUNTER__ test demonstrates, "once" is still too many.
Ideally we shouldn't re-lex anything at all, which remains a TODO.
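The caching idea itself is ordinary memoization keyed by file. The sketch below is a standalone illustration with hypothetical names (`TokenCache`, `tokensFor`, `lexerRuns`) and a whitespace splitter standing in for the real Lexer; it does not reflect HTMLRewriter's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

class TokenCache {
public:
  // Returns the tokens for FileID, lexing Source only on the first request.
  const std::vector<std::string> &tokensFor(int FileID,
                                            const std::string &Source) {
    auto It = Cache.find(FileID);
    if (It == Cache.end()) {
      ++LexerRuns;
      It = Cache.emplace(FileID, lex(Source)).first;
    }
    return It->second;
  }

  std::size_t lexerRuns() const { return LexerRuns; }

private:
  // Stand-in for an expensive lexer: splits on whitespace.
  static std::vector<std::string> lex(const std::string &Source) {
    std::istringstream In(Source);
    std::vector<std::string> Tokens;
    for (std::string Tok; In >> Tok;)
      Tokens.push_back(Tok);
    return Tokens;
  }

  std::map<int, std::vector<std::string>> Cache;
  std::size_t LexerRuns = 0;
};
```

Requesting the same file for several reports bumps `lexerRuns()` only once, which is the behavior this patch aims for.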
There are currently a few checkers that don't fill in the bug report's
"decl-with-issue" field (typically a function in which the bug is
found).
The new attribute `[[clang::suppress]]` uses decl-with-issue to reduce
the size of the suppression source range map so that it doesn't need to
be built for the entire translation unit.
I'm already seeing a few problems with this approach so I'll probably
redesign it at some point as it looks like a premature optimization. Not
only should checkers not be required to pass decl-with-issue (consider
clang-tidy checkers that never had such a notion), but it's also not
necessarily uniquely determined (consider leak suppressions at the
allocation site).
For now I'm adding a simple stop-gap solution that falls back to
building the suppression map for the entire TU whenever decl-with-issue
isn't specified, which won't happen in the default setup because luckily
all default checkers do provide decl-with-issue.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
`OpaqueValueExpr` doesn't necessarily contain a source expression.
Particularly, after #78041, it is used to carry the type and the value
kind of a non-type template argument of floating-point type or referring
to a subobject (those are so called `StructuralValue` arguments).
This fixes #79575.
Implements https://isocpp.org/files/papers/P2662R3.pdf
The feature is exposed as an extension in older language modes.
Mangling is not yet supported and that is something we will have to do before release.
Previously the function `RangeConstraintManager::printValue()` crashed
when it encountered an empty rangeset (because `RangeSet::getBitwidth()`
and `RangeSet::isUnsigned()` assert that the rangeset is not empty).
This commit adds a special case that avoids this behavior.
As `printValue()` is only used by the checker debug.ExprInspection (and
during manual debugging), the impacts of this commit are very limited.
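The shape of such a special case is shown below on a toy range set. All names (`ToyRange`, `ToyRangeSet`, `printValue`) and the output format are hypothetical; only the pattern of checking for emptiness before calling an asserting accessor is taken from the description above.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyRange {
  long long From;
  long long To;
  unsigned BitWidth;
};

struct ToyRangeSet {
  std::vector<ToyRange> Ranges;

  bool isEmpty() const { return Ranges.empty(); }

  // Like the accessors mentioned above, this asserts on an empty set.
  unsigned getBitWidth() const {
    assert(!isEmpty());
    return Ranges.front().BitWidth;
  }
};

std::string printValue(const ToyRangeSet &RS) {
  if (RS.isEmpty())
    return "<empty rangeset>"; // special case: never touch getBitWidth()
  std::string Out = std::to_string(RS.getBitWidth()) + "b:{";
  for (const ToyRange &R : RS.Ranges)
    Out += " [" + std::to_string(R.From) + ", " + std::to_string(R.To) + "]";
  return Out + " }";
}
```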
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>