llvm-project/clang-tools-extra/clang-tidy/misc/ConfusableIdentifierCheck.h
Richard Smith c56c349d39
[clang-tidy] Switch misc-confusable-identifiers check to a faster algorithm. (#130369)
Optimizations:

- Only build the skeleton for each identifier once, rather than once for
each declaration of that identifier.
- Only compute the contexts in which identifiers are declared for
identifiers that have the same skeleton as another identifier in the
translation unit.
- Only compare pairs of declarations that are declared in related
contexts, rather than comparing all pairs of declarations with the same
skeleton.

Also simplify by removing the caching of enclosing `DeclContext` sets,
because with the above changes we don't even compute the enclosing
`DeclContext` sets in common cases. Instead, we terminate the traversal
to enclosing `DeclContext`s immediately if we've already found another
declaration in that context with the same identifier. (This optimization
is not currently applied to the `forallBases` traversal, but could be
applied there too if needed.)

This also fixes two bugs that together caused the check to fail to find
some of the issues it was looking for:

- The old check skipped comparisons of declarations from different
contexts unless both declarations were type template parameters. This
caused the checker to not warn on some instances of the CVE it is
intended to detect.
- The old check skipped comparisons of declarations in all base classes
other than the first one found by the traversal. This appears to be an
oversight, incorrectly returning `false` rather than `true` from the
`forallBases` callback, which terminates traversal.

This also fixes an issue where the check would have false positives for
template parameters and function parameters in some cases, because those
parameters sometimes have a parent `DeclContext` that is the parent of
the parameterized entity, or sometimes is the translation unit. In
either case, this would cause warnings about declarations that are never
visible together in any scope.

This decreases the runtime of this check, especially in the common case
where there are few or no skeletons with two or more different
identifiers. Running this check over LLVM, clang, and clang-tidy, the
wall time for the check as reported by clang-tidy's internal profiler is
reduced from 5202.86s to 3900.90s.
2025-07-01 13:31:46 -07:00

45 lines
1.6 KiB
C++

//===--- ConfusableIdentifierCheck.h - clang-tidy ---------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_CONFUSABLE_IDENTIFIER_CHECK_H
#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_CONFUSABLE_IDENTIFIER_CHECK_H
#include "../ClangTidyCheck.h"
#include "llvm/ADT/DenseMap.h"
namespace clang::tidy::misc {
/// Finds symbol which have confusable identifiers, i.e. identifiers that look
/// the same visually but have a different Unicode representation.
/// If symbols are confusable but don't live in conflicting namespaces, they are
/// not reported.
class ConfusableIdentifierCheck : public ClangTidyCheck {
public:
ConfusableIdentifierCheck(StringRef Name, ClangTidyContext *Context);
~ConfusableIdentifierCheck();
void registerMatchers(ast_matchers::MatchFinder *Finder) override;
void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
void onEndOfTranslationUnit() override;
std::optional<TraversalKind> getCheckTraversalKind() const override {
return TK_IgnoreUnlessSpelledInSource;
}
private:
void addDeclToCheck(const NamedDecl *ND, const Decl *Parent);
llvm::DenseMap<
const IdentifierInfo *,
llvm::SmallVector<std::pair<const NamedDecl *, const Decl *>, 1>>
NameToDecls;
};
} // namespace clang::tidy::misc
#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_CONFUSABLE_IDENTIFIER_CHECK_H