Currently SpecialCaseList created at least twice,
one on by `Driver`, for diagnostics only, and then
the real one by the `ASTContext`.
Also, deppending on enabled sanitizers, not all
sections will be used.
In both cases there is unnecessary RadixTree
construction.
This patch changes `GlobMatcher` to do initialization
lazily only when needed.
And remove empty one from `RegexMatcher`.
This saves saves 0.5% of clang time building large project.
This commit moves the `RegexMatcher`, `GlobMatcher`, `Matcher` and
`Section` classes into an anonymous namespace within
`SpecialCaseList.cpp`. These classes are implementation details of
`SpecialCaseList` and do not need to be exposed in the header.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The SectionStr is already copied in addSection, so
there is no need to copy it again in the Section
constructor.
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This commit introduces `SpecialCaseList::Match`, a small struct to hold
the matched rule and its line number. This simplifies the `match`
methods by allowing them to return a single value instead of using a
callback.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Preparing to moving most of implementation out of the header file.
* https://github.com/llvm/llvm-project/pull/167280
---------
Co-authored-by: Naveen Seth Hanig <naveen.hanig@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This commit adds a new RadixTree to `SpecialCaseList` for handling
substring matches. Previously, `SpecialCaseList` only supported prefix
and suffix matching. With this change, patterns that have neither
prefixes nor suffixes can now be efficiently filtered.
According to SpecialCaseListBM:
Lookup benchmarks (significant improvements):
```
OVERALL_GEOMEAN -0.7809
```
Lookup `*test*` like benchmarks (huge improvements):
```
OVERALL_GEOMEAN -0.9947
```
https://gist.github.com/vitalybuka/ee7f681b448eb18974386ab35e2d4d27
This commit enhances the `SpecialCaseList::GlobMatcher` to filter globs
more efficiently by considering both prefixes and suffixes.
Previously, the `GlobMatcher` used a `RadixTree` to store globs based
on their prefixes. This allowed for quick lookup of potential matches
by matching the query string's prefix against the stored prefixes.
However, for globs with common prefixes but different suffixes,
unnecessary glob matching attempts could still occur.
This change introduces a nested `RadixTree` structure:
`PrefixSuffixToGlob: RadixTree<Prefix, RadixTree<Suffix, Globs>>`.
Now, when a query string is matched, it first finds matching prefixes,
and then within those prefix matches, it further filters by matching
the reversed suffix of the query string against the reversed suffixes
of the globs. This significantly reduces the number of `Glob::match`
calls, especially for large special case lists with many globs sharing
common prefixes but differing in their suffixes.
According to SpecialCaseListBM:
Lookup benchmarks (significant improvements):
```
OVERALL_GEOMEAN -0.5815
```
Lookup `*suffix` and `prefix*suffix` like benchmarks (huge
improvements):
```
OVERALL_GEOMEAN -0.9316
```
https://gist.github.com/vitalybuka/e586751902760ced6beefcdf0d7b26fd
This commit optimizes `SpecialCaseList` by using a `RadixTree` to filter
glob patterns based on their prefixes. When matching a query, the
`RadixTree` quickly identifies all glob patterns whose prefixes match
the query's prefix. This significantly reduces the number of glob
patterns that need to be fully evaluated, leading to performance
improvements, especially when dealing with a large number of patterns.
According to SpecialCaseListBM:
Lookup benchmarks (significant improvements):
```
OVERALL_GEOMEAN -0.8177
```
Lookup like `prefix*` benchmarks (huge improvements):
```
OVERALL_GEOMEAN -0.9819
```
https://gist.github.com/vitalybuka/824884bcbc1713e815068c279159dafe
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
On average it saves half positive of Glob matching.
However, in real build most SpecialCaseList unmatched,
this change should not affect this case.
To be able to do so without breaking behavior, we
need to re-order matches according precedence.
Usually it's LineNo, and it's already ordered,
but Diagnostic requires reordering by rule length.
Co-authored-by: Rahul Joshi <rjoshi@nvidia.com>
This extends approach used in WarningsSpecialCaseList
on other types of SpecialCaseList.
SpecialCaseList will remove leading "./" from the Query
for prefixes which looks like file names.
Now affected prefixes are "src", "!src", "mainfile", "source".
To avoid breaking users, this behavior is enabled for
files starting "#!special-case-list-v3".
This way we can roll out new breaking features as opt-int.
E.g. "#!special-case-list-v3" will enabled something new.
Nothing to enabled yet, but with pinpointed default it's an option.
Follow up #162350.
report_fatal_error is not a good way to report diagnostics to the users, so this switches to using actual diagnostic reporting mechanisms instead.
Fixes#147187
https://reviews.llvm.org/D154014 addes glob support and enables it when
`#!special-case-list-v2` is the first line. This patch makes the glob
support the default (faster than regex after
https://reviews.llvm.org/D156046) and switches to the deprecated regex
support if `#!special-case-list-v1` is the first line.
I have surveyed many ignore lists. All ignore lists I find only use
basic `*` `.` and don't use regex metacharacters such as `(` and `)`.
(As neither `src:` nor `fun:` benefits from using regex.)
They are unaffected by the transition (with a caution that regex
`src:x/a.pb.*` matches `x/axpbx` but glob `src:x/a.pb.*` doesn't).
There is no deprecating warning. If a user finds
`#!special-case-list-v1`, they shall read that the old syntax is
deprecated.
Link:
https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666
Add an option in `SpecialCaseList` to use Globs instead of Regex to match patterns. `GlobPattern` was extended in https://reviews.llvm.org/D153587 to support brace expansions which allows us to use patterns like `*/src/foo.{c,cpp}`. It turns out that most patterns only take advantage of `*` so using Regex was overkill and required lots of escaping in practice. This often led to bugs due to forgetting to escape special characters.
Since this would be a breaking change, we temporarily support Regex by default and use Globs when `#!special-case-list-v2` is the first line in the file. Users should switch to the glob format described in https://llvm.org/doxygen/classllvm_1_1GlobPattern.html. For example, `(abc|def)` should become `{abc,def}`.
See discussion in https://reviews.llvm.org/D152762 and https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D154014
`TrigramIndex` was added back in https://reviews.llvm.org/D27188 as an optimization to make `SpecialCaseList::match()` faster. I've found that `TrigramIndex` actually makes the function slower and it has no functional use, so we can remove it.
I grabbed the list of queries passed to `SpecialCaseList::match()` on a random very large file (`AArch64ISelLowering.cpp`) and measured the runtime to call `match()` on all of them with [this line](8e1f820bb4/llvm/lib/Support/SpecialCaseList.cpp (L64)) disabled and then enabled.
```
$ hyperfine --warmup 3 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests' 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests'
Benchmark 1: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests
Time (mean ± σ): 575.9 ms ± 20.3 ms [User: 573.1 ms, System: 2.7 ms]
Range (min … max): 555.5 ms … 620.0 ms 10 runs
Benchmark 2: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests
Time (mean ± σ): 283.4 ms ± 6.7 ms [User: 280.3 ms, System: 3.0 ms]
Range (min … max): 277.0 ms … 294.9 ms 10 runs
Summary
'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests' ran
2.03 ± 0.09 times faster than 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests'
```
Using `perf` I found that most of the runtime in `TrigramIndex::isDefinitelyOut()` comes from a division operation that seems to come from `std::unordered_map`: 8e1f820bb4/llvm/include/llvm/Support/TrigramIndex.h (L62)
Removing `TrigramIndex` will make it easier to potentially switch to using `GlobPattern` instead of a full regex for `SpecialCaseList`. See discussion in https://reviews.llvm.org/D152762 for details.
Reviewed By: MaskRay, #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D153171
The cleanup was manual, but assisted by "include-what-you-use". It consists in
1. Removing unused forward declaration. No impact expected.
2. Removing unused headers in .cpp files. No impact expected.
3. Removing unused headers in .h files. This removes implicit dependencies and
is generally considered a good thing, but this may break downstream builds.
I've updated llvm, clang, lld, lldb and mlir deps, and included a list of the
modification in the second part of the commit.
4. Replacing header inclusion by forward declaration. This has the same impact
as 3.
Notable changes:
- llvm/Support/TargetParser.h no longer includes llvm/Support/AArch64TargetParser.h nor llvm/Support/ARMTargetParser.h
- llvm/Support/TypeSize.h no longer includes llvm/Support/WithColor.h
- llvm/Support/YAMLTraits.h no longer includes llvm/Support/Regex.h
- llvm/ADT/SmallVector.h no longer includes llvm/Support/MemAlloc.h nor llvm/Support/ErrorHandling.h
You may need to add some of these headers in your compilation units, if needs be.
As an hint to the impact of the cleanup, running
clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Support/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 8000919 lines
after: 7917500 lines
Reduced dependencies also helps incremental rebuilds and is more ccache
friendly, something not shown by the above metric :-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
With updates to various LLVM tools that use SpecialCastList.
It was tempting to use RealFileSystem as the default, but that makes it
too easy to accidentally forget passing VFS in clang code.
Summary:
This is a follow-up to 590f279c456bbde632eca8ef89a85c478f15a249, which
moved some of the callers to use VFS.
It turned out more code in Driver calls into real filesystem APIs and
also needs an update.
Reviewers: gribozavr2, kadircet
Reviewed By: kadircet
Subscribers: ormris, mgorny, hiraditya, llvm-commits, jkorous, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70440
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
Extends SCL functionality to allow users to find the line number in the file the SCL is built from through SpecialCaseList::inSectionBlame(...).
Also removes the need to compile the SCL before use. As the matcher now contains a list of regexes to test against instead of a single regex, the regexes can be individually built on each insertion rather than one large compilation at the end of construction.
This change also fixes a bug where blank lines would cause the parser to become out-of-sync with the line number. An error on line `k` was being reported as being on line `k - num_blank_lines_before_k`.
Note: This change has a cyclical dependency on D39486. Both these changes must be submitted at the same time to avoid a build breakage.
Reviewers: vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Subscribers: kcc, pcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D39485
llvm-svn: 317617
Summary:
Sanitizer blacklist entries currently apply to all sanitizers--there
is no way to specify that an entry should only apply to a specific
sanitizer. This is important for Control Flow Integrity since there are
several different CFI modes that can be enabled at once. For maximum
security, CFI blacklist entries should be scoped to only the specific
CFI mode(s) that entry applies to.
Adding section headers to SpecialCaseLists allows users to specify more
information about list entries, like sanitizer names or other metadata,
like so:
[section1]
fun:*fun1*
[section2|section3]
fun:*fun23*
The section headers are regular expressions. For backwards compatbility,
blacklist entries entered before a section header are put into the '[*]'
section so that blacklists without sections retain the same behavior.
SpecialCaseList has been modified to also accept a section name when
matching against the blacklist. It has also been modified so the
follow-up change to clang can define a derived class that allows
matching sections by SectionMask instead of by string.
Reviewers: pcc, kcc, eugenis, vsk
Reviewed By: eugenis, vsk
Subscribers: vitalybuka, llvm-commits
Differential Revision: https://reviews.llvm.org/D37924
llvm-svn: 314170