This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 30 instances that rely on this "redirection", with about
half of them under clang/. Since the redirection doesn't improve
readability, this patch replaces SmallSet with SmallPtrSet for pointer
element types.
I'm planning to remove the redirection eventually.
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
This commit fixes the false positive that C++ class methods with libc
function names would be false warned about. For example,
```
struct T {void strcpy() const;};
void test(const T& t) { str.strcpy(); // no warn }
```
rdar://156264388
The character buffer passed to a "%.*s" specifier may be safely bound if
the precision is properly specified, even if the buffer does not
guarantee null-termination.
For example,
```
void f(std::span<char> span) {
printf("%.*s", (int)span.size(), span.data()); // "span.data()" does not guarantee null-termination but is safely bound by "span.size()", so this call is safe
}
```
rdar://154072130
Refactor the safe pattern analysis of pointer and size expression pairs
so that the check can be re-used in more places. For example, it can be
used to check whether the following cases are safe:
- `std::span<T>{ptr, size} // span construction`
- `snprintf(ptr, size, "%s", ...) // unsafe libc call`
- `printf("%.*s", size, ptr) // unsafe libc call`
Support safe construction of `std::span` from `begin` and `end` calls on
hardened containers or views or `std::initializer_list`s.
For example, the following code is safe:
```
void create(std::initializer_list<int> il) {
std::span<int> input{ il.begin(), il.end() }; // no warn
}
```
rdar://152637380
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The CI on my PR https://github.com/llvm/llvm-project/pull/138895 is
failing with errors like
`clang/lib/Analysis/UnsafeBufferUsage.cpp:971:45: error: reference to
'PointerType' is ambiguous`
This patch should resolve it by removing `using namespace llvm`
The -Wunsafe-buffer-usage analysis currently warns when a const sized
array is safely accessed, with an index that is bound by the remainder
operator. The false alarm is now suppressed.
Example:
int circular_buffer(int array[10], uint idx) {
return array[idx %10]; // Safe operation
}
rdar://148443453
---------
Co-authored-by: MalavikaSamak <malavika2@apple.com>
This changes exposes a low-level helper that is used to implement
`forEachArgumentWithParamType` but can also be used without matchers,
e.g. if performance is a concern.
Commit f5ee10538b68835112323c241ca7db67ca78bf62 introduced a copy of the
implementation of the `forEachArgumentWithParamType` matcher that was
needed for optimizing performance of `-Wunsafe-buffer-usage`.
This change shares the code between the two so that we do not repeat
ourselves and any bugfixes or changes will be picked up by both
implementations in the future.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
The Clang disgnostic `-Wunsafe-buffer-usage` was adding up to +15%
compilation time when used. Profiling showed that most of the overhead
comes from the use of ASTMatchers.
This change replaces the ASTMatcher infrastructure with simple matching
functions and keeps the functionality unchanged. It reduces the overhead
added by `-Wunsafe-buffer-usage` by 87.8%, leaving a negligible
additional compilation time of 1.7% when the diagnostic is used.
**Old version without -Wunsafe-buffer-usage:**
```
$ hyperfine -i -w 1 --runs 5 '/tmp/old_clang -c -std=c++20 /tmp/preprocessed.cc -ferror-limit=20'
Benchmark 1: /tmp/old_clang -c -std=c++20 /tmp/preprocessed.cc -ferror-limit=20
Time (mean ± σ): 231.035 s ± 3.210 s [User: 229.134 s, System: 1.704 s]
Range (min … max): 228.751 s … 236.682 s 5 runs
```
**Old version with -Wunsafe-buffer-usage:**
```
$ hyperfine -i -w 1 --runs 10 '/tmp/old_clang -c -std=c++20 /tmp/preprocessed.cc -ferror-limit=20 -Wunsafe-buffer-usage'
Benchmark 1: /tmp/old_clang -c -std=c++20 /tmp/preprocessed.cc -ferror-limit=20 -Wunsafe-buffer-usage
Time (mean ± σ): 263.840 s ± 0.854 s [User: 262.043 s, System: 1.575 s]
Range (min … max): 262.442 s … 265.142 s 10 runs
```
**New version with -Wunsafe-buffer-usage:**
```
$ hyperfine -i -w 1 --runs 10 '/tmp/new_clang -c -std=c++20 /tmp/preprocessed.cc -ferror-limit=20 -Wunsafe-buffer-usage'
Benchmark 1: /tmp/new_clang -c -std=c++20 /tmp/preprocessed.cc -ferror-limit=20 -Wunsafe-buffer-usage
Time (mean ± σ): 235.169 s ± 1.408 s [User: 233.406 s, System: 1.561 s]
Range (min … max): 232.221 s … 236.792 s 10 runs
```
We can take advantage of the attribute `alloc_size`. For example,
```
void * malloc(size_t size) __attribute__((alloc_size(1)));
std::span<char>{(char *)malloc(x), x}; // this is safe
```
rdar://136634730
`MeasureTokenLength` may return an unsigned 0 representing failure in
obtaining length of a token. The analysis now gives up on such cases.
Otherwise, there might be issues caused by unsigned integer "overflow".
Do not warn when a constant sized array is indexed with an expression
that contains bitwise and operation
involving constants and it always results in a bound safe access.
(rdar://136684050)
---------
Co-authored-by: MalavikaSamak <malavika2@apple.com>
EvaluateAsConstExpr() can return an lvalue which is not compatible
with a subsequent getInt() call. Instead, use EvaluateAsInt() which
will use all techniques availble to get an int result compatible
with the subsequent getInt() call.
This reverts commit 7dd34baf5505d689161c3a8678322a394d7a2929.
Fixed the assertion violation reported by
7dd34baf5505d689161c3a8678322a394d7a2929
Co-authored-by: MalavikaSamak <malavika2@apple.com>
Do not warn when constant sized array is indexed by expressions that
evaluate to a const value. For instance, sizeof(T) expression value can
be evaluated at compile time and if an array is indexed by such an
expression, it's bounds can be validated.
(rdar://140320289)
Co-authored-by: MalavikaSamak <malavika2@apple.com>
Do not warn about unsafe buffer access, when multi-dimensional constant
arrays are accessed and their indices are within the bounds of the
buffer. Warning in such cases would be a false positive. Such a
suppression already exists for 1-d
arrays and it is now extended to multi-dimensional arrays.
(rdar://137926311)
(rdar://140320139)
Co-authored-by: MalavikaSamak <malavika2@apple.com>
Do not warn when two parameter constructor receives pointer address from
a std::addressof method and the span size is set to 1.
(rdar://139298119)
Co-authored-by: MalavikaSamak <malavika2@apple.com>
Do not warn when a string literal is indexed and the idex value is
within the bounds of the length of the string.
(rdar://139106996)
Co-authored-by: MalavikaSamak <malavika2@apple.com>
CXXCtorInitializers are not statements , but they point to an
initializer expression which is. When visiting a FunctionDecl, also
walk through any constructor initializers and run the warning
checks/matchers against their initializer expressions. This catches
warnings for initializing fields and calling other constructors, such
as:
struct C {
C(P* Ptr) : AnUnsafeCtor(Ptr) {}
}
Field initializers can be found by traversing CXXDefaultInitExprs. This
catches warnings in places such as:
struct C {
P* Ptr;
AnUnsafeCtor U{Ptr};
};
We add tests for explicit construction, for field initialization, base
class constructor calls, delegated constructor calls, and aggregate
initialization.
Note that aggregate initialization is not fully covered where a field
specifies an initializer and it's not overridden in the aggregate initialization,
such as in:
struct AggregateViaValueInit {
UnsafeMembers f1;
// FIXME: A construction of this class does initialize the field
// through this initializer, so it should warn. Ideally it should
// also point to where the site of the construction is in
// testAggregateViaValueInit().
UnsafeMembers f2{3};
};
void testAggregateViaValueInit() {
auto A = AggregateViaValueInit();
};
There are 3 tests for different types of aggregate initialization with
FIXMEs documenting this future work.
One attempt to fix this involved returning true from
MatchDescendantVisitor::shouldVisitImplicitCode(), however, it breaks expectations
for field in-class initializers by moving the SourceLocation, possibly
to inside the implicit ctor instead of on the line where the field
initialization happens.
struct C {
P* Ptr;
AnUnsafeCtor U{Ptr}; // expected-warning{{this is never seen then}}
};
Tests are also added for std::span(ptr, size) constructor being called
from a field initializer and a constructor initializer.
Issue #80482
Emit a warning when the raw pointer retrieved from std::vector and
std::array instances are cast to a larger type. Such a cast followed by
a field dereference to the resulting pointer could cause an OOB access.
This is similar to the existing span::data warning.
(rdar://136704278)
Co-authored-by: MalavikaSamak <malavika2@apple.com>
For `snprintf(a, sizeof a, ...)`, the first two arguments form a safe
pattern if `a` is a constant array. In such a case, this commit will
suppress the warning.
(rdar://117182250)
The commit d7dd2c468fecae871ba67e891a3519c758c94b63 crashes for such
an example:
```
void printf() { printf(); }
```
Because it assumes `printf` must have arguments. This commit fixes
this issue.
(rdar://117182250)
Revert commit 23457964392d00fc872fa6021763859024fb38da, and re-land
with a new flag "-Wunsafe-buffer-usage-in-libc-call" for the new
warning.
(rdar://117182250)
[-Wunsafe-buffer-usage] Add warn on unsafe calls to libc functions
Warning about calls to libc functions involving buffer access. Warned
functions are hardcoded by names.
(rdar://117182250)
`QualType::isConstantArrayType()` checks canonical type. So a following
cast should be applied to canonical type as well:
```
if (Ty->isConstantArrayType())
cast<ConstantArrayType>(Ty.getCanonicalType()); // cast<ConstantArrayType>(Ty) is incorrect
```
Extend the unsafe_buffer_usage attribute, so they can also be added to
struct fields. This will cause the compiler to warn about the unsafe
field at their access sites.
Co-authored-by: MalavikaSamak <malavika2@apple.com>
The -Wunsafe-buffer-usage warning should fire on any call to a function
annotated with [[clang::unsafe_buffer_usage]], however it omitted calls
to constructors, since the expression is a CXXConstructExpr which does
not subclass CallExpr. Thus the matcher on callExpr() does not find
these expressions.
Add a new WarningGadget that matches cxxConstructExpr that are calling a
CXXConstructDecl annotated by [[clang::unsafe_buffer_usage]] and fires
the warning. The new UnsafeBufferUsageCtorAttrGadget gadget explicitly
avoids matching against the std::span(ptr, size) constructor because
that is handled by SpanTwoParamConstructorGadget and we never want two
gadgets to match the same thing (and this is guarded by asserts).
The gadgets themselves do not report the warnings, instead each gadget's
Stmt is passed to the UnsafeBufferUsageHandler (implemented by
UnsafeBufferUsageReporter). The Reporter is previously hardcoded that a
CXXConstructExpr statement must be a match for std::span(ptr, size), but
that is no longer the case. We want the Reporter to generate different
warnings (in the -Wunsafe-buffer-usage-in-container subgroup) for the
span contructor. And we will want it to report more warnings for other
std-container-specific gadgets in the future. To handle this we allow
the gadget to control if the warning is general (it calls
handleUnsafeBufferUsage()) or is a std-container-specific warning (it
calls handleUnsafeOperationInContainer()).
Then the WarningGadget grows a virtual method to dispatch to the
appropriate path in the UnsafeBufferUsageHandler. By doing so, we no
longer need getBaseStmt in the Gadget interface. The only use of it for
FixableGadgets was to get the SourceLocation, so we make an explicit
virtual method for that on Gadget. Then the handleUnsafeOperation()
dispatcher can be a virtual method that is only in WarningGadget.
The SpanTwoParamConstructorGadget gadget dispatches to
handleUnsafeOperationInContainer() while the other WarningGadgets all
dispatch to the original handleUnsafeBufferUsage().
Tests are added for annotated constructors, conversion operattors, call
operators, fold expressions, and regular methods.
Issue #80482
UPCAddressofArraySubscriptGadget::getClaimedVarUseSites should skip
parentheses when accessing the DeclRefExpr, otherwise a crash happens
with parenthesized references.
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and
4-bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
Resolves#85124.
Without the fix gcc warned like
../../clang/lib/Analysis/UnsafeBufferUsage.cpp:2203:26: warning: unused variable 'CArrTy' [-Wunused-variable]
2203 | } else if (const auto *CArrTy = Ctx.getAsConstantArrayType(
| ^~~~~~
Example:
int * const my_var = my_initializer;
Currently when transforming my_var to std::span the fixits:
- replace "int * const my_var = " with "std::span<int> const my_var {"
- add ", SIZE}" after "my_initializer" where SIZE is either inferred or
a placeholder
This patch makes that behavior less intrusive by not modifying variable
cv-qualifiers and initialization syntax.
The new behavior is:
- replace "int *" with "std::span<int>"
- add "{" before "my_initializer"
- add ", SIZE}" after "my_initializer"
This is an improvement on its own - since we don't touch the identifier,
we automatically can handle macros in them.
It also simplifies future work on initializer fixits.