162 Commits

Author SHA1 Message Date
Matheus Izvekov
91cdd35008
[clang] Improve nested name specifier AST representation (#147835)
This is a major change on how we represent nested name qualifications in
the AST.

* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.

This has great results on compile-time-tracker as well:

![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831)

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.

It has some other miscelaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.

The rest and bulk of the changes are mostly consequences of the changes
in API.

PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.

Fixes #136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
2025-08-09 05:06:53 -03:00
Endre Fülöp
4f2ed926db
[analyzer][NFCi] Pass if bind is to a Decl or not to checkBind (#152137)
Binding a value to location can happen when a new value is created or
when and existing value is updated. This modification exposes whether
the value binding happens at a declaration.
This helps simplify the hacky logic of the BindToImmutable checker.
2025-08-08 19:16:47 +02:00
Donát Nagy
a807e8ea9f
[analyzer] Prettify checker registration and unittest code (#147797)
This commit tweaks the interface of `CheckerRegistry::addChecker` to
make it more practical for plugins and tests:
- The parameter `IsHidden` now defaults to `false` even in the
  non-templated overload (because setting it to true is unusual,
  especially in plugins).
- The parameter `DocsUri` defaults to the dummy placeholder string
  `"NoDocsUri"` because (as of now) nothing queries its value from the
  checker registry (it's only used by the logic that generates the
  clang-tidy documentation, but that loads it directly from `Checkers.td`
  without involving the `CheckerRegistry`), so there is no reason to
  demand specifying this value.

In addition to propagating these changes, this commit clarifies,
corrects and extends lots of comments and performs various minor code
quality improvements in the code of unit tests and example plugins.

I originally wrote the bulk of this commit when I was planning to add an
extra parameter to `addChecker` in order to implement some technical
details of the CheckerFamily framework. At the end I decided against
adding that extra parameter, so this cleanup was left out of the PR
https://github.com/llvm/llvm-project/pull/139256 and I'm merging it now
as a separate commit (after minor tweaks).

This commit is mostly NFC: the only functional change is that the
analyzer will be compatible with plugins that rely on the default
argument values and don't specify `IsHidden` or `DocsUri`. (But existing
plugin code will remain valid as well.)
2025-07-22 13:36:58 +02:00
Kazu Hirata
6ab6321d03
[clang] Use range-based for loops (NFC) (#143153)
Note that use of llvm::for_each is discouraged unless we have functors
readily available.
2025-06-06 09:16:41 -07:00
Balázs Benics
104f5d1ff8
[analyzer] Introduce the check::BlockEntrance checker callback (#140924)
Tranersing the CFG blocks of a function is a fundamental operation. Many
C++ constructs can create splits in the control-flow, such as `if`,
`for`, and similar control structures or ternary expressions, gnu
conditionals, gotos, switches and possibly more.

Checkers should be able to get notifications about entering or leaving a
CFG block of interest.

Note that in the ExplodedGraph there is always a BlockEntrance
ProgramPoint right after the BlockEdge ProgramPoint. I considered naming
this callback check::BlockEdge, but then that may leave the observer of
the graph puzzled to see BlockEdge points followed more BlockEdge nodes
describing the same CFG transition. This confusion could also apply to
Bug Report Visitors too.

Because of this, I decided to hook BlockEntrance ProgramPoints instead.
The same confusion applies here, but I find this still a better place
TBH. There would only appear only one BlockEntrance ProgramPoint in the
graph if no checkers modify the state or emit a bug report. Otherwise
they modify some GDM (aka. State) thus create a new ExplodedNode with
the same BlockEntrance ProgramPoint in the graph.

CPP-6484
2025-05-27 10:11:12 +02:00
Kazu Hirata
1cc9e6e5aa
[StaticAnalyzer] Use llvm::count (NFC) (#141370) 2025-05-24 14:45:46 -07:00
Kazu Hirata
f002f300c5
[clang] Remove unused local variables (NFC) (#138453) 2025-05-04 10:51:40 -07:00
Reid Kleckner
e3c0565b74
Reapply "[cmake] Refactor clang unittest cmake" (#134195)
This reapplies 5ffd9bdb50b57 (#133545) with fixes.

The BUILD_SHARED_LIBS=ON build was fixed by adding missing LLVM
dependencies to the InterpTests binary in
unittests/AST/ByteCode/CMakeLists.txt .
2025-04-02 21:07:30 -07:00
dpalermo
03a791f703
Revert "[cmake] Refactor clang unittest cmake" (#134022)
Reverts llvm/llvm-project#133545

This change is breaking several buildbots as well as developer's builds.
Reverting to allow people to make progress.
2025-04-01 22:19:27 -05:00
Reid Kleckner
5ffd9bdb50
[cmake] Refactor clang unittest cmake (#133545)
Pass all the dependencies into add_clang_unittest. This is consistent
with how it is done for LLDB. I borrowed the same named argument list
structure from add_lldb_unittest. This is a necessary step towards
consolidating unit tests into fewer binaries, but seems like a good
refactoring in its own right.
2025-04-01 14:12:44 -07:00
Qinkun Bao
0cd82327ff
Fix some typos (NFC) (#133558) 2025-03-29 20:54:15 +01:00
Donát Nagy
ea107d5c63
[NFC][analyzer] Use CheckerBase::getName in checker option handling (#131612)
The virtual method `ProgramPointTag::getTagDescription` had two very
distinct use cases:
- It is printed in the DOT graph visualization of the exploded graph
(that is, a debug printout).
- The checker option handling code used it to query the name of a
checker, which relied on the coincidence that in `CheckerBase` this
method is defined to be equivalent with `getName()`.

This commit switches to using `getName` in the second use case, because
this way we will be able to properly support checkers that have multiple
(separately named) parts.

The method `reportInvalidCheckerOptionName` is extended with an
additional overload that allows specifying the `CheckerPartIdx`. The
methods `getChecker*Option` could be extended analogously in the future,
but they are just convenience wrappers around the variants that directly
take `StringRef CheckerName`, so I'll only do this extension if it's
needed.
2025-03-18 16:11:43 +01:00
T-Gruber
9c542bcf0a
[analyzer] performTrivialCopy triggers checkLocation before binding (#129016)
The triggered callbacks for the default copy constructed instance and
the instance used for initialization now behave in the same way. The LHS
already calls checkBind. To keep this consistent, checkLocation is now
triggered accordingly for the RHS.
Further details on the previous discussion:
https://discourse.llvm.org/t/checklocation-for-implicitcastexpr-of-kind-ck-noop/84729

---------

Authored-by: tobias.gruber <tobias.gruber@concentrio.io>
2025-03-04 17:00:55 +01:00
Balázs Benics
22a5bb32b7
[analyzer] Limit Store by region-store-binding-limit (#127602)
In our test pool, the max entry point RT was improved by this change:
1'181 seconds (~19.7 minutes) -> 94 seconds (1.6 minutes)

BTW, the 1.6 minutes is still really bad. But a few orders of magnitude
better than it was before.

This was the most servere RT edge-case as you can see from the numbers.
There are are more known RT bottlenecks, such as:

- Large environment sizes, and `removeDead`. See more about the failed
attempt on improving it at:
https://discourse.llvm.org/t/unsuccessful-attempts-to-fix-a-slow-analysis-case-related-to-removedead-and-environment-size/84650

- Large chunk of time could be spend inside `assume`, to reach a fixed
point. This is something we want to look into a bit later if we have
time.

We have 3'075'607 entry points in our test set.
About 393'352 entry points ran longer than 1 second when measured.

To give a sense of the distribution, if we ignore the slowest 500 entry
points, then the maximum entry point runs for about 14 seconds. These
500 slow entry points are in 332 translation units.

By this patch, out of the slowest 500 entry points, 72 entry points were
improved by at least 10x after this change.

We measured no RT regression on the "usual" entry points.


![slow-entrypoints-before-and-after-bind-limit](https://github.com/user-attachments/assets/44425a76-f1cb-449c-bc3e-f44beb8c5dc7)
(The dashed lines represent the maximum of their RT)

CPP-6092
2025-02-24 15:48:06 +01:00
Balazs Benics
3dc159431b
[analyzer] Clean up slightly the messed up ownership model of the analyzer (#128368)
Well, yes. It's not pretty.
At least after this we would have a bit more unique pointers than
before.

This is for fixing the memory leak diagnosed by:
https://lab.llvm.org/buildbot/#/builders/24/builds/5580

And that caused the revert of #127409.

After these uptrs that patch can re-land finally.
2025-02-24 11:34:36 +01:00
Ziqing Luo
536606f6f6
[StaticAnalyzer] Fix state update in VisitObjCForCollectionStmt (#124477)
In `VisitObjCForCollectionStmt`, the function does `evalLocation` for
the current element at the original source state `Pred`. The evaluation
may result in a new state, say `PredNew`. I.e., there is a transition:
`Pred -> PredNew`, though it is a very rare case that `Pred` is NOT
identical to `PredNew`. (This explains why the bug exists for many years
but no one noticed until recently a crash observed downstream.) Later,
the original code does NOT use `PredNew` as the new source state in
`StmtNodeBuilder` for next transitions. In cases `Pred != PredNew`, the
program ill behaves.

(rdar://143280254)
2025-01-30 16:21:46 -08:00
Balazs Benics
5f6b714507
[analyzer][NFC] Simplify PositiveAnalyzerOption handling (#121910)
This simplifies #120239
Addresses my comment at:
https://github.com/llvm/llvm-project/pull/120239#issuecomment-2574600543

CPP-5920
2025-01-07 15:19:16 +01:00
Balazs Benics
55391f85ac
[analyzer] Retry UNDEF Z3 queries 2 times by default (#120239)
If we have a refutation Z3 query timed out (UNDEF), allow a couple of
retries to improve stability of the query. By default allow 2 retries,
which will give us in maximum of 3 solve attempts per query.

Retries should help mitigating flaky Z3 queries.
See the details in the following RFC:

https://discourse.llvm.org/t/analyzer-rfc-retry-z3-crosscheck-queries-on-timeout/83711

Note that with each attempt, we spend more time per query.
Currently, we have a 15 seconds timeout per query - which are also in
effect for the retry attempts.

---

Why should this help?
In short, retrying queries should bring stability because if a query
runs long
it's more likely that it did so due to some runtime anomaly than it's on
the edge of succeeding. This is because most queries run quick, and the
queries that run long, usually run long by a fair amount.
Consequently, retries should improve the stability of the outcome of the
Z3 query.

In general, the retries shouldn't increase the overall analysis time
because it's really rare we hit the 0.1% of the cases when we would do
retries. But keep in mind that the retry attempts can add up if many
retries are allowed, or the individual query timeout is large.

CPP-5920
2025-01-06 18:08:12 +01:00
Kristóf Umann
ea8e328ae2
[analyzer][Z3] Restore the original timeout of 15s (#118291)
Discussion here:

https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus

The original patch, #97298 introduced new timeouts backed by thorough
testing and measurements to keep the running time of Z3 within
reasonable limits. The measurements also showed that only certain
reports and certain TUs were responsible for the poor performance of Z3
refutation.

Unfortunately, it seems like that on machines with different
characteristics (slower machines) the current timeouts don't just axe
0.01% of reports, but many more as well. Considering that timeouts are
inherently nondeterministic as a cutoff point, this lead reports sets
being vastly different on the same projects with the same configuration.
The discussion link shows that all configurations introduced in the
patch with their default values lead to severa nondeterminism of the
analyzer. As we, and others use the analyzer as a gating tool for PRs,
we should revert to the original defaults.

We should respect that
* There are still parts of the analyzer that are either proven or
suspected to contain nondeterministic code (like pointer sets),
* A 15s timeout is more likely to hit the same reports every time on a
wider range of machines, but is still inherently nondeterministic, but
an infinite timeout leads to the tool hanging,
* If you measure the performance of the analyzer on your machines, you
can and should achieve some speedup with little or no observable
nondeterminism.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-12-13 14:31:06 +01:00
Balazs Benics
e67e03a22c
[analyzer] EvalBinOpLL should return Unknown less often (#114222)
SValBuilder::getKnownValue, getMinValue, getMaxValue use
SValBuilder::simplifySVal.

simplifySVal does repeated simplification until a fixed-point is
reached. A single step is done by SimpleSValBuilder::simplifySValOnce,
using a Simplifier visitor. That will basically decompose SymSymExprs,
and apply constant folding using the constraints we have in the State.
Once it decomposes a SymSymExpr, it simplifies both sides and then uses
the SValBuilder::evalBinOp to reconstruct the same - but now simpler -
SymSymExpr, while applying some caching to remain performant.

This decomposition, and then the subsequent re-composition poses new
challenges to the SValBuilder::evalBinOp, which is built to handle
expressions coming from real C/C++ code, thus applying some implicit
assumptions.

One previous assumption was that nobody would form an expression like
"((int*)0) - q" (where q is an int pointer), because it doesn't really
makes sense to write code like that.

However, during simplification, we may end up with a call to evalBinOp
similar to this.

To me, simplifying a SymbolRef should never result in Unknown or Undef,
unless it was Unknown or Undef initially or, during simplification we
realized that it's a division by zero once we did the constant folding,
etc.

In the following case the simplified SVal should not become UnknownVal:
```c++
void top(char *p, char *q) {
  int diff = p - q; // diff: reg<p> - reg<q>
  if (!p) // p: NULL
    simplify(diff); // diff after simplification should be: 0(loc) - reg<q>
}
```

Returning Unknown from the simplifySVal can weaken analysis precision in
other places too, such as in SValBuilder::getKnownValue, getMinValue, or
getMaxValue because we call simplifySVal before doing anything else.

For nonloc::SymbolVals, this loss of precision is critical, because for
those the SymbolRef carries an accurate type of the encoded computation,
thus we should at least have a conservative upper or lower bound that we
could return from getMinValue or getMaxValue - yet we would just return
nullptr.

```c++
const llvm::APSInt *SimpleSValBuilder::getKnownValue(ProgramStateRef state,
                                                      SVal V) {
  return getConstValue(state, simplifySVal(state, V));
}

const llvm::APSInt *SimpleSValBuilder::getMinValue(ProgramStateRef state,
                                                    SVal V) {
  V = simplifySVal(state, V);

  if (const llvm::APSInt *Res = getConcreteValue(V))
    return Res;

  if (SymbolRef Sym = V.getAsSymbol())
    return state->getConstraintManager().getSymMinVal(state, Sym);

  return nullptr;
}
```

For now, I don't plan to make the simplification bullet-proof, I'm just
explaining why I made this change and what you need to look out for in
the future if you see a similar issue.

CPP-5750
2024-10-31 11:01:47 +01:00
T-Gruber
86d65ae794
[analyzer] Improve FieldRegion descriptive name (#112313)
The current implementation of MemRegion::getDescriptiveName fails for
FieldRegions whose SuperRegion is an ElementRegion. As outlined below:
```Cpp
struct val_struct { int val; };
extern struct val_struct val_struct_array[3];

void func(){
  // FieldRegion with ElementRegion as SuperRegion.
  val_struct_array[0].val;
}
```

For this special case, the expression cannot be pretty printed and must
therefore be obtained separately.
2024-10-25 11:59:16 +02:00
JOE1994
918972bded [clang] Strip unneeded calls to raw_string_ostream::str() (NFC)
Avoid extra layer of indirection.

p.s.
Also, remove calls to raw_string_ostream::flush(), which are no-ops.
2024-09-14 04:38:50 -04:00
T-Gruber
87c51e2af0
Run PreStmt/PostStmt checker for GCCAsmStmt (#95409)
Fixes #94940

Run PreStmt and PostStmt checker for GCCAsmStmt.
Unittest to validate that corresponding callback functions are triggered.
2024-07-10 14:15:53 +02:00
Martin Storsjö
45b360d4a2
[clang] Disable C++14 sized deallocation by default for MinGW targets (#97232)
This reverts 130e93cc26ca9d3ac50ec5a92e3109577ca2e702 for the MinGW
target.

This avoids the issue that is discussed in
https://github.com/llvm/llvm-project/issues/96899 (and which is
summarized in the code comment). This is intended as a temporary
workaround until the issue is handled better within libc++.
2024-07-02 00:03:15 +03:00
Xing Xue
668ee3f547
[clang] Default to -fno-sized-deallocation for AIX (#97076)
Some `libc++` LIT test cases and user code define their own version of
`operator delete` that are not sized. With `-fno-sized-deallocation`,
destructors call the non-sized `operator delete` and it will be resolved
to the user defined version. However, with `-fsized-deallocation`,
destructors will call the sized `operator delete` which will be resolved
to the weak definition in `libc++abi` because the user code does not
define the corresponding sized version. The `libc++abi` sized `operator
delete` in turn calls the non-sized version of `operator delete` of the
same shared object inside `libc++abi` instead of the user defined
version on AIX because runtime linking is not the default for AIX and
therefore, fails the tests or user code. This patch sets
`-fno-sized-deallocation` as the default for AIX if neither
`-fsize-deallocation` nor `-fno-sized-deallocation` is explicitly set,
similar to what is done for ZOS.
2024-07-01 15:37:52 -04:00
Balazs Benics
ae570d82e8
Reland "[analyzer] Harden safeguards for Z3 query times" (#97298)
This is exactly as originally landed in #95129,
but now the minimal Z3 version was increased to meet this change in #96682.

https://discourse.llvm.org/t/bump-minimal-z3-requirements-from-4-7-1-to-4-8-9/79664/4

---

This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in at
most 1 second.

The heuristic should have only really marginal observable impact -
except for the cases when we had big report eqclasses with long-running
(15s) Z3 queries, where previously CSA effectively halted. After this
patch, CSA will tackle such extreme cases as well.

(cherry picked from commit eacc3b3504be061f7334410dd0eb599688ba103a)
2024-07-01 17:22:24 +02:00
Balazs Benics
b3b0d09cce
Reland "[analyzer][NFC] Reorganize Z3 report refutation" (#97265)
This is exactly as originally landed in #95128,
but now the minimal Z3 version was increased to meet this change in #96682.

https://discourse.llvm.org/t/bump-minimal-z3-requirements-from-4-7-1-to-4-8-9/79664/4

---

This change keeps existing behavior, namely that if we hit a Z3 timeout
we will accept the report as "satisfiable".

This prepares for the commit "Harden safeguards for Z3 query times".
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

(cherry picked from commit 89c26f6c7b0a6dfa257ec090fcf5b6e6e0c89aab)
2024-07-01 16:03:18 +02:00
Balazs Benics
8fc9c03cde
[analyzer] Revert Z3 changes (#95916)
Requested in:
https://github.com/llvm/llvm-project/pull/95128#issuecomment-2176008007

Revert "[analyzer] Harden safeguards for Z3 query times"
Revert "[analyzer][NFC] Reorganize Z3 report refutation"

This reverts commit eacc3b3504be061f7334410dd0eb599688ba103a.
This reverts commit 89c26f6c7b0a6dfa257ec090fcf5b6e6e0c89aab.
2024-06-18 14:59:28 +02:00
Balazs Benics
eacc3b3504
[analyzer] Harden safeguards for Z3 query times
This patch is a functional change.
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

As a result of this patch, individual Z3 queries in refutation will be
bound by 300ms. Every report equivalence class will be processed in
at most 1 second.

The heuristic should have only really marginal observable impact -
except for the cases when we had big report eqclasses with long-running
(15s) Z3 queries, where previously CSA effectively halted.
After this patch, CSA will tackle such extreme cases as well.

Reviewers: NagyDonat, haoNoQ, Xazax-hun, Szelethus, mikhailramalho

Reviewed By: NagyDonat

Pull Request: https://github.com/llvm/llvm-project/pull/95129
2024-06-18 09:48:22 +02:00
Balazs Benics
89c26f6c7b
[analyzer][NFC] Reorganize Z3 report refutation
This change keeps existing behavior, namely that if we hit a Z3 timeout
we will accept the report as "satisfiable".

This prepares for the commit "Harden safeguards for Z3 query times".
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520

Reviewers: NagyDonat, haoNoQ, Xazax-hun, mikhailramalho, Szelethus

Reviewed By: NagyDonat

Pull Request: https://github.com/llvm/llvm-project/pull/95128
2024-06-18 09:42:29 +02:00
Martin Storsjö
f31b197d9d
[analyzer] Fix a test issue in mingw configurations (#92737)
On Windows, long is always 32 bit, thus one can't use long for casting
pointers to integers, on 64 bit architectures.

Instead use long long, which should be large enough.

This avoids errors like "error: cast from pointer to smaller type 'long'
loses information" in this testcase.

This condition only seems to be an error in mingw mode; in MSVC mode
(clang-cl), this is only a warning.
2024-05-27 10:18:03 +03:00
Pengcheng Wang
130e93cc26
Reland "[clang] Enable sized deallocation by default in C++14 onwards" (#90373)
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.

This is another try of https://reviews.llvm.org/D112921.

The original commit cf5a8b4 was reverted by 2e5035a due to some
failures (see #83774).

Fixes #60061
2024-05-22 12:37:27 +08:00
Donát Nagy
58bad2862c
[analyzer][NFC] Require explicit matching mode for CallDescriptions (#92454)
This commit deletes the "simple" constructor of `CallDescription` which
did not require a `CallDescription::Mode` argument and always used the
"wildcard" mode `CDM::Unspecified`.

A few months ago, this vague matching mode was used by many checkers,
which caused bugs like https://github.com/llvm/llvm-project/issues/81597
and https://github.com/llvm/llvm-project/issues/88181. Since then, my
commits improved the available matching modes and ensured that all
checkers explicitly specify the right matching mode.

After those commits, the only remaining references to the "simple"
constructor were some unit tests; this commit updates them to use an
explicitly specified matching mode (often `CDM::SimpleFunc`).

The mode `CDM::Unspecified` was not deleted in this commit because it's
still a reasonable choice in `GenericTaintChecker` and a few unit tests.
2024-05-17 13:08:45 +02:00
Vitaly Buka
2e5035aeed
Revert "[clang] Enable sized deallocation by default in C++14 onwards (#83774)" (#90299)
https://lab.llvm.org/buildbot/#/builders/168/builds/20063 (should be
fixed with #90292)

More details in #83774

This reverts commit cf5a8b489464d09dfdd7a48ce7c8b41d3c9bf819.
2024-04-26 17:14:43 -07:00
Pengcheng Wang
cf5a8b4894
[clang] Enable sized deallocation by default in C++14 onwards (#83774)
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.

This is another try of https://reviews.llvm.org/D112921.

Fixes #60061
2024-04-26 16:59:12 +08:00
NagyDonat
fb299cae51
[analyzer] Make recognition of hardened __FOO_chk functions explicit (#86536)
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.

Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.

Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.

This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.

This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).

The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.

After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.

This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.

After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.

This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.

As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-04-05 11:20:27 +02:00
NagyDonat
e1d4ddb0c6
Reapply "[analyzer] Accept C library functions from the std namespace" again (#85791)
This reapplies 80ab8234ac309418637488b97e0a62d8377b2ecf again, after
fixing a name collision warning in the unit tests (see the revert commit
13ccaf9b9d4400bb128b35ff4ac733e4afc3ad1c for details).

In addition to the previously applied changes, this commit also clarifies the
code in MallocChecker that distinguishes POSIX "getline()" and C++ standard
library "std::getline()" (which are two completely different functions). Note
that "std::getline()" was (accidentally) handled correctly even without this
clarification; but it's better to explicitly handle and test this corner case.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-03-25 12:43:51 +01:00
T-Gruber
86d479fd7c
Adapted MemRegion::getDescriptiveName to handle ElementRegions (#85104)
Fixes https://github.com/llvm/llvm-project/issues/84463

Changes:
- Adapted MemRegion::getDescriptiveName
- Added unittest to check name for a given clang::ento::ElementRegion
- Some format changes due to clang-format

---------

Co-authored-by: Andreas Steinhausen <andreas.steinhausen@concenrio.io>
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-03-21 18:27:53 +01:00
Philip Reames
13ccaf9b9d Revert "Reapply "[analyzer] Accept C library functions from the std namespace""
This reverts commit e48d5a838f69e0a8e0ae95a8aed1a8809f45465a.

Fails to build on x86-64 w/gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
with the following message:

../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:41:28: error: declaration of ‘std::unique_ptr<clang::ASTUnit> IsCLibraryFunctionTest::ASTUnit’ changes meaning of ‘ASTUnit’ [-fpermissive]
   41 |   std::unique_ptr<ASTUnit> ASTUnit;
      |                            ^~~~~~~
In file included from ../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:4:
../llvm-project/clang/include/clang/Frontend/ASTUnit.h:89:7: note: ‘ASTUnit’ declared here as ‘class clang::ASTUnit’
   89 | class ASTUnit {
      |       ^~~~~~~
2024-03-13 10:19:42 -07:00
NagyDonat
e48d5a838f Reapply "[analyzer] Accept C library functions from the std namespace"
This reapplies f32b04d4ea91ad1018c25a1d4178cc4392d34968i, after fixing
the use-after-free of ASTUnit in the unittest.
https://github.com/llvm/llvm-project/pull/84469#issuecomment-1992163439

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-03-13 14:48:42 +01:00
NagyDonat
f32b04d4ea
Revert "[analyzer] Accept C library functions from the std namespace" (#84926)
Reverts llvm/llvm-project#84469 because it causes buildbot failures.
I'll examine them and re-submit the change.
2024-03-12 16:01:04 +01:00
NagyDonat
80ab8234ac
[analyzer] Accept C library functions from the std namespace (#84469)
Previously, the function `isCLibraryFunction()` and logic relying on it
only accepted functions that are declared directly within a TU (i.e. not
in a namespace or a class). However C++ headers like <cstdlib> declare
many C standard library functions within the namespace `std`, so this
commit ensures that functions within the namespace `std` are also
accepted.

After this commit it will be possible to match functions like `malloc`
or `free` with `CallDescription::Mode::CLibrary`.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-03-12 13:51:12 +01:00
NagyDonat
52a460f9d4
[analyzer] Refactor CallDescription match mode (NFC) (#83432)
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.

However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597

To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.

Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
 - `CDF_None` was the default "match anything" mode,
 - `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).

Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
2024-03-04 15:43:37 +01:00
Balazs Benics
18f219c5ac
[analyzer][NFC] Cleanup BugType lazy-init patterns (#76655)
Cleanup most of the lazy-init `BugType` legacy.
Some will be preserved, as those are slightly more complicated to
refactor.

Notice, that the default category for `BugType` is `LogicError`. I
omitted setting this explicitly where I could.

Please, actually have a look at the diff. I did this manually, and we
rarely check the bug type descriptions and stuff in tests, so the
testing might be shallow on this one.
2024-01-01 18:53:36 +01:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Jan Svoboda
8e0c9bb91f [clang] NFCI: Change returned AnalyzerOptions smart pointer to reference 2023-09-05 13:23:53 -07:00
Aaron Ballman
a02f9a7756 Revert "[clang] Enable sized deallocation by default in C++14 onwards"
This reverts commit 2916b125f686115deab2ba573dcaff3847566ab9.

Reverting due to failures on:
https://lab.llvm.org/buildbot/#/builders/216/builds/26407
https://lab.llvm.org/staging/#/builders/247/builds/5659
http://45.33.8.238/win/83485/step_7.txt
2023-08-29 09:36:59 -04:00
wangpc
2916b125f6 [clang] Enable sized deallocation by default in C++14 onwards
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.

Reviewed By: rnk, aaron.ballman, #libc, ldionne, Mordante, MaskRay

Differential Revision: https://reviews.llvm.org/D112921
2023-08-29 15:42:50 +08:00
Donát Nagy
8a5cfdf785 [analyzer][NFC] Remove useless class BuiltinBug
...because it provides no useful functionality compared to its base
class `BugType`.

A long time ago there were substantial differences between `BugType` and
`BuiltinBug`, but they were eliminated by commit 1bd58233 in 2009 (!).
Since then the only functionality provided by `BuiltinBug` was that it
specified `categories::LogicError` as the bug category and it stored an
extra data member `desc`.

This commit sets `categories::LogicError` as the default value of the
third argument (bug category) in the constructors of BugType and
replaces use of the `desc` field with simpler logic.

Note that `BugType` has a data member `Description` and a non-virtual
method `BugType::getDescription()` which queries it; these are distinct
from the member `desc` of `BuiltinBug` and the identically named method
`BuiltinBug::getDescription()` which queries it. This confusing name
collision was a major motivation for the elimination of `BuiltinBug`.

As this commit touches many files, I avoided functional changes and left
behind FIXME notes to mark minor issues that should be fixed later.

Differential Revision: https://reviews.llvm.org/D158855
2023-08-28 15:20:14 +02:00
isuckatcs
d65379c8d4 [analyzer] Remove the loop from the exploded graph caused by missing information in program points
This patch adds CFGElementRef to ProgramPoints
and helps the analyzer to differentiate between
two otherwise identically looking ProgramPoints.

Fixes #60412

Differential Revision: https://reviews.llvm.org/D143328
2023-03-04 02:01:45 +01:00